Koruk Kasimcan 202208 MASc

DEFINITION OF GEOLOGICAL DOMAINS WITH AN ENSEMBLE
IMPLEMENTATION OF SUPPORT VECTOR CLASSIFICATION
By
Kasimcan Koruk
A thesis submitted to the Department of Mining Engineering
in conformity with the requirements for
the degree of Master of Applied Science
Queen’s University
Kingston, Ontario, Canada
(August, 2022)
Copyright © Kasimcan Koruk, 2022

To my wife and my son.
i
Abstract
Building a resource model can be a challenging process because of the complex nature of geology.
Employing limited information to build the resource model inevitably gives rise to uncertainties,
and therefore uncertainty is an important phenomenon that needs to be taken into account while
building the model. In fact, quantifying the uncertainties can be a significant advantage for the
evaluation of mineral resources. Uncertainties can be quantified by traditional approaches, such as
geostatistical simulation methods. While traditional approaches generally focus directly on
domains to build geological models, today huge geochemical datasets obtained during exploration
campaigns are seen as an opportunity to better evaluate the geological models. In this context,
machine learning algorithms stand as a strong alternative to traditional approaches (explicit or
implicit modeling, or geostatistical techniques), because they can easily incorporate large and
multivariate datasets to reach consistent and semi-autonomous results.
This study proposes an ensemble learning approach to define geological domains and their inherent
uncertainty. The study adopts an unsupervised binary clustering method to label the domains of a
multivariate geochemical dataset, and the information obtained by this unsupervised method is
used to inform a supervised learning algorithm, which is Support Vector Classification. The
supervised learning step is applied to assign domaining in locations of a gridded model. The
unsupervised and supervised learning steps are repeated with subsets of geochemical variables and
with subsets of training samples to realize an Ensemble model. These models are combined to
define a strong model, and the uncertainty can be incorporated with the model. Finally, a
hierarchical application is applied on the results to build multiple domains. We demonstrate the
proposed workflow with an application to a real database from a porphyry copper deposit. The
ii
workflow was capable to build a robust model with high accuracy and with well-defined, curvy
boundaries.
iii
Acknowledgements
I wish to express my gratitude to my supervisor Dr. Julian Ortiz for his guiding and supportive
attitude throughout the research and daily life. The ideas given by Dr. Ortiz shaped my study.
I am also grateful for the scholarship funded by Ministry of National Education of the Republic of
Turkey. The financial support provided by the Ministry is priceless, and it gave me confidence and
assurance to complete the degree.
I would thank to Sebastian Avalos for his friendship and support during the education period in
Canada.
I would like to offer my sincere gratitude to my mother Meftun and my brother Nevzat Cagatay
since they always supported all of my decisions throughout my career. I am also thankful to my
son Timur because he joined our life during this period, and he gave us days with full of joy and
happiness. Lastly and most importantly, I feel eternal gratitude to my wife Gorkem. Every process
was easier thanks to her unconditional support and sacrifices throughout this education.
iv
Table of Contents
Abstract ................................................................................................................................................... ii
Acknowledgements...................................................................................................................................... iv
List of Figures .............................................................................................................................................. vii
List of Tables ............................................................................................................................................... xii
List of Abbreviations ...................................................................................................................................xiv
Chapter 1 Introduction .............................................................................................................................. 1
1.1 Evaluation of Mineral Resources .................................................................................................. 1
1.2 Objective and Approach................................................................................................................ 2
Chapter 2 Literature Review ...................................................................................................................... 5
2.1 Geological Modeling for Resource Estimation.............................................................................. 5
2.1.1 Explicit and Implicit Modeling ............................................................................................... 5
2.1.2 Deterministic and Stochastic Modeling ................................................................................ 7
2.2 Overview of Machine Learning Algorithms ................................................................................ 13
2.2.1 Support Vector Machine and Support Vector Classifier ..................................................... 20
2.2.2 Ensemble Learning .............................................................................................................. 24
Chapter 3 Methodology ........................................................................................................................... 26
3.1 Unsupervised Classification for the Definition of Geological Domains ...................................... 26
3.1.1 Exploratory Data Analysis ................................................................................................... 27
3.1.2 Optimization........................................................................................................................ 29
3.1.3 Domain Assignment ............................................................................................................ 34
3.2 Ensemble Learning with Support Vector Classification .............................................................. 37
3.2.1 Application of Support Vector Classification ...................................................................... 37
3.2.2 Ensemble Learning with 3-Combination of Variables and Bagging .................................... 39
3.2.3 Hierarchical Application ...................................................................................................... 41
3.3 Comparison with BlockSIS ........................................................................................................... 41
Chapter 4 Case Study ............................................................................................................................... 44
4.1 First Level of Hierarchical Application......................................................................................... 44
4.1.1 Unsupervised Clustering ..................................................................................................... 45
4.1.2 Supervised Learning and Ensemble Learning ..................................................................... 55
4.1.3 Comparison with BlockSIS ................................................................................................... 62
4.2 Second Level of Hierarchical Application .................................................................................... 71
4.2.1 Unsupervised Clustering ..................................................................................................... 71
v
4.2.2 Supervised Learning and Ensemble Learning ..................................................................... 80
4.2.3 Comparison with BlockSIS ................................................................................................... 86
Chapter 5 Conclusions and Recommendations ....................................................................................... 96
References ................................................................................................................................................. 99
Appendices ............................................................................................................................................... 104
vi
List of Figures
Figure 2.1 (a) An example of a failed explicit model. The model requires phantom points to build
the boundary represented by orange color. (b) An implicit model with successful folding structure
......................................................................................................................................................... 7
Figure 2.2 Deterministic model (left) and stochastic model with alternate models (right) ............ 8
Figure 2.3 Two of the positive definite functions to define variogram models: nugget effect model
and spherical model ...................................................................................................................... 11
Figure 2.4 General workflow of machine learning systems ......................................................... 13
Figure 2.5 Linear regression: an example of a simple regression model...................................... 17
Figure 2.6 Algorithm of neural networks ..................................................................................... 18
Figure 2.7 Bias-variance tradeoff ................................................................................................. 20
Figure 2.8 Hard margin linear SVC .............................................................................................. 21
Figure 2.9 Soft margin linear SVC; encircled samples are misclassified samples ....................... 22
Figure 2.10 A simple example of bootstrapping ........................................................................... 24
Figure 3.1 Schematic summary of unsupervised classification .................................................... 27
Figure 3.2 Log-normally distributed global histogram of a variable containing two distinct domains
....................................................................................................................................................... 29
Figure 3.3 A near-perfect fit of a combined distribution (red line) to the global data (blue dots) and
the optimum distributions of two domains forming the combined distribution (green and yellow
lines).............................................................................................................................................. 31
Figure 3.4 Topographical view of SE according to change in parameter µ of domain 1 and domain
2..................................................................................................................................................... 32
Figure 3.5 An example of quadratic form problem with two controlling parameters .................. 33
vii
Figure 3.6 Result of domain assignment according to optimal distributions ............................... 35
Figure 3.7 Result of domain assignment in tri-variate space ........................................................ 36
Figure 3.8 (a) Accuracy assessment with recurring loops and (b) accuracy assessment within one
loop. Notice that red circles show when the algorithm reaches to minimum Euclidean distance.
While the last points in graphs have the best fit between combined weighted CDFs and global data,
results with best clustering are obtained earlier. ........................................................................... 36
Figure 3.9 Main steps of SVC and technique of training-test split............................................... 39
Figure 3.10 Workflow of the study ............................................................................................... 40
Figure 3.11 Hierarchical application of domaining ...................................................................... 41
Figure 3.12 Parameter file of BlockSIS ........................................................................................ 42
Figure 3.13 Effect of image cleaning on the results ..................................................................... 43
Figure 4.1 Two-level hierarchical domaining ............................................................................... 44
Figure 4.2 Histograms - natural logarithms of variables .............................................................. 47
Figure 4.3 Cumulative distributions - natural logarithms of variables ........................................ 48
Figure 4.4 Alteration data obtained from field logging in tri-variate space ................................. 49
Figure 4.5 Optimization result of first case: cumulative probability plots of variables with initial
parameters of normal distribution on the left, cumulative probability plots of variables with
optimum parameters of distributions on the right ......................................................................... 51
Figure 4.6 Improvement in domain assignment ........................................................................... 53
Figure 4.7 Result of domain assignment - distribution of samples in tri-variate space ................ 53
Figure 4.8 Domain assignment result of first case: probability plots comparing achieved and
optimal domaining ........................................................................................................................ 54
Figure 4.9 Result of domain assignment process in coordinate system ....................................... 55
viii
Figure 4.10 Node model of the variables Mg/Al, Mn/K and Zn .................................................. 57
Figure 4.11 Final ensemble model ................................................................................................ 58
Figure 4.12 Vertical cross-sections taken along North axis of the SVC Ensemble model ........... 58
Figure 4.13 Boxplots showing the distribution of accuracies for each model according to subset of
variables. Notice that vertical axis of the graph is limited from bottom and top, 0.89 and 0.92
respectively. .................................................................................................................................. 59
Figure 4.14 Categorization of Ensemble learning according to optimum proportion of domains 60
Figure 4.15 Categorization of Ensemble model – Horizontal profiles ......................................... 61
Figure 4.16 Experimental variograms created along horizontal direction .................................... 63
Figure 4.17 Variogram models built along chosen directions ...................................................... 63
Figure 4.18 Parameter file of BlockSIS ........................................................................................ 64
Figure 4.19 Block models of BlockSIS with light image cleaning and heavy image cleaning .... 65
Figure 4.20 Effect of image cleaning on BlockSIS – horizontal profiles ..................................... 66
Figure 4.21 BlockSIS with heavy image cleaning - Vertical cross-sections taken along North axis
....................................................................................................................................................... 67
Figure 4.22 Categorization of BlockSIS model with heavy image cleaning ................................ 68
Figure 4.23 Categorization of BlockSIS – horizontal profiles ..................................................... 69
Figure 4.24 Comparison of two models with vertical cross-sections ........................................... 70
Figure 4.25 Variables chosen for sub-domains of domain 1 ........................................................ 72
Figure 4.26 Variables chosen for sub-domains of domain 2 ........................................................ 74
Figure 4.27 Optimization of domain 1 with variables Y, Mo and Re: cum. probability plots of
variables with initial parameters of normal dist. on the left, cum. probability plots of variables with
optimum parameters of distributions on the right. ........................................................................ 76
ix
Figure 4.28 Optimization of domain 2 with variables Zn, In and Te: cum. probability plots of
variables with initial parameters of normal dist. on the left, cum. probability plots of variables with
optimum parameters of distributions on the right. ........................................................................ 77
Figure 4.29 Hierarchical domain assignment of domain 1: probability plots comparing achieved
and optimal domaining for the case with variables Y, Mo and Re ............................................... 78
and optimal domaining for the case with variables Zn, In and Te ................................................ 79
Figure 4.31 Methodology followed to realize hierarchical ensemble model ................................ 81
Figure 4.32 Boxplots showing balanced accuracies of SVM models ........................................... 82
Figure 4.33 Categorization of ensemble model ............................................................................ 83
Figure 4.34 Result of second-level hierarchical domain assignment ........................................... 84
Figure 4.35 Vertical cross-sections taken from final ensemble model ......................................... 85
Figure 4.36 Two hierarchical ensemble models along horizontal benches .................................. 86
Figure 4.37 Variogram models built along chosen directions for each sub-domain .................... 87
Figure 4.38 Cumulative probability of E-Type BlockSIS model. Imposing local proportion on
BlockSIS increases the deterministic zones (red dashed circles). Assuming deterministic zones are
correctly estimated: correctly (a) and wrongly (b) assigned samples when global proportion is
neglected and considered, respectively. ........................................................................................ 90
Figure 4.39 Three consecutive cross-sections of BlockSIS model taken along north .................. 92
Figure 4.40 Comparison of BlockSIS models along horizontal benches with two categories (left)
and with four categories (right)..................................................................................................... 93
Figure 4.41 Final ensemble model (left) and BlockSIS model (right) along horizontal benches 94
x
Figure 4.42 Vertical cross-sections of final ensemble model (top) and BlockSIS model (bottom)
....................................................................................................................................................... 95
xi
List of Tables
Table 2.1 A confusion matrix ....................................................................................................... 19
Table 2.2 Types of kernel functions ............................................................................................. 23
Table 3.1 Optimization of thirteen parameters ............................................................................. 30
Table 4.1 Descriptive statistics of sample coordinates and variables. Elements are in ppm, and
ratios are adimensional. ................................................................................................................ 46
Table 4.2 Combinations of 3 out of 6 variables ............................................................................ 49
Table 4.3 Functions used to define the optimization problem ...................................................... 50
Table 4.4 Initial parameters of variables....................................................................................... 50
Table 4.5 Optimum parameters for each subset of variables ........................................................ 52
Table 4.6 Functions used to perform support vector classifier ..................................................... 55
Table 4.7 Scores of SVC with different combination of parameters according to 5-fold cross-
validation....................................................................................................................................... 56
Table 4.8 Mesh grid along which predictions made ..................................................................... 57
Table 4.9 Confusion matrix of categorized ensemble model ....................................................... 62
Table 4.10 Confusion matrices of all realizations with balanced accuracy .................................. 68
Table 4.11 Confusion matrix of categorized BlockSIS model ..................................................... 69
Table 4.12 Descriptive statistics of variables chosen for domain 1. Elements are in ppm, and ratio
is adimensional.............................................................................................................................. 72
Table 4.13 Descriptive statistics of variables chosen for domain 2. Elements are in ppm, and ratio
is adimensional.............................................................................................................................. 73
Table 4.14 Combinations of 3 out of 4 variables chosen for domain 1 and domain 2 ................. 74
Table 4.15 Initial parameters of variables chosen for domain 1 ................................................... 75
xii
Table 4.16 Initial parameters of variables chosen for domain 2 ................................................... 75
Table 4.17 Optimum parameters for each subset of variables chosen for domain 1 .................... 75
Table 4.18 Optimum parameters for each subset of variables chosen for domain 2 .................... 75
Table 4.19 Confusion matrix of final ensemble model................................................................. 84
Table 4.20 Confusion matrix of BlockSIS considering deterministic zones. ............................... 91
Table 4.21 Confusion matrix of BlockSIS considering global proportions.................................. 91
xiii
List of Abbreviations
CDF Cumulative Distribution Function
CG Conjugate Gradient Method
EDA Exploratory Data Analysis
KNN K-Nearest Neighbor
MLA Machine Learning Algorithms
Newton-CG Newton Conjugate Gradient Method
PCA Principal Component Analysis
RBF Radial Basis Function
SE Squared Error
SIS Sequential Indicator Simulation
SVC Support Vector Classifier
SVM Support Vector Machines
t-SNE t-distributed Stochastic Neighbor Embedding
xiv
Chapter 1 Introduction
1.1 Evaluation of Mineral Resources
Building a resource model to determine the value of ore deposits plays a crucial role in the mine
development stage. Scarcely made observations, both on surface and underground, are gathered in
resource models by geologists to predict the geological structure of the ore deposits at target
locations. Traditionally, geologists make use of 2D cross-sections drawn along the target space
according to the observations made through drillhole samples to build a geological model, and
how the orebody extends must be interpreted with a detailed work to make the model accurate.
While the traditional modeling, which is called explicit, is a time-demanding work, implicit
models, which are widely used thanks to computers now, semi-automate and support the process
of prediction much faster with particular functions used for the interpolation.
Whether the model is explicit or implicit, one must interpret the continuity between the drillhole
samples, thus this gives rise inevitably to uncertainties. In fact, the complex nature of geological
structures caused by discontinuities like faults, or geological non-deposition periods makes models
uncertain (Boisvert, 2010), and therefore any uncertainty must be taken into account to make a
resource model reliable. At this point, geostatistics is utilized both for the prediction of unsampled
locations and for assessment of their level of uncertainty. Geostatistics is the field of study which
employs statistical tools to estimate the spatial features and uncertainties of a geological model
(Isaaks and Srivastava, 1989; Rossi and Deutsch, 2016). To better define the uncertainties,
geostatistical simulations are performed to determine a distribution of possible values rather than
a single estimation (Boisvert, 2010). The simulated resource models allow geologists to determine
a confidence interval for the resource models. This is termed probabilistic approach.
1
Traditionally, kriging estimation and simulation methods are considered robust geostatistical
methods to make estimation and to define uncertainties, respectively. While the traditional
geostatistical methods like kriging and simulation methods remain the industry standard, the
mining industry is open to new technologies and methods to effectively manage big exploration
datasets (Hodkiewicz, 2014). In this context, machine learning algorithms (MLA) have been
utilized in several geostatistical studies to support the traditional methods (Cevik and Ortiz,
2020a). In fact, MLA can save important amounts of time and cost spent on labor-intensive works
like geological logging and modeling (Faraj and Ortiz, 2021; Gwynn et al., 2013). Moreover, MLA
offers flexible options thanks to several algorithms available which can be adapted to solve
different geostatistical problems. The applications of MLA performed so far have shown distinct
success (Ortiz, 2019; Shen et al., 2018). Today, machine learning keeps growing in geostatistics
just like all the other fields of scientific studies. Development of new learning algorithms and
theories has driven this ongoing growth (Jordan and Mitchell, 2015).
Although MLA is proven to be successful, MLA has also some problems when it comes to
accuracy of the modeling. To increase the accuracy of the MLA, ensemble approaches have been
developed. Basically, an ensemble approach combines several models to increase the accuracy of
the final model. While one model can be a weak learner with a restricted information or approach,
combining multiple weak models can yield a strong model. The ensemble methods become robust
especially when the knowledge informing the method is also robust.
1.2 Objective and Approach
The main goal of the study is to form a workflow to build a semi-autonomous geological model
utilizing multiple geochemical variables. The statement of the research is that an ensemble
approach based on Support Vector Classifier (SVC), used to determine the extent of geological
2
domains based on sets of geochemical variables, can provide a reliable model to obtain domain
volumes and to determine their uncertainty. One of the main features proposed by the research is
that the model is capable to handle the uncertainty, and it realizes multiple scenarios with different
possible extent of the domains. To do this, the workflow utilizes geochemical information and
multiple variables are combined to realize each scenario. The key point of the work is that the
model can be applied on any geological environment as long as the variables are chosen with
geological expertise. The proposed workflow can be applicable during advanced exploration as a
part of feasibility stage starting after initial exploration campaigns.
The workflow consists of two steps: a binary-unsupervised clustering approach originally
proposed by Faraj and Ortiz (2021) is combined with the second step, an ensemble learning method
formed by SVC. At the first step, the geochemical dataset is analyzed and the variables relevant to
the objective of clustering are determined by an exploratory data analysis (EDA). After EDA, the
unsupervised clustering approach is applied on the variables to split the dataset into two distinct
geological domains. At the second step, SVC is applied on the defined domains to predict the
unknown target locations. The key parameters controlling the SVC models are validated and
determined by k-fold cross-validation method. The individual SVC model is a weak classifier
alone. A stronger classifier is built by the combination of weak classifiers, repeating the process
over subsets of samples and subsets of variables. The uncertainty is also quantified under the strong
classifier by combining the weak classifiers. Finally, a hierarchical model having multiple domains
is built by repeating the process on each domain individually.
The details of the study are explained in the following chapters. A broad overview related to the
types of geological modeling is made in chapter 2. How a general machine learning approach
should be applied and some traditional geostatistical tools as the alternatives of MLA are also
3
discussed in chapter 2. The details of the methodology followed in the study is explained in chapter
3. A case study, on which a geochemical dataset from a Copper Porphyry Deposit is employed to
apply the proposed workflow, is investigated in chapter 4. For comparison, a model is built using
a geostatistical simulation technique, also documented in chapter 4. Finally, conclusions and
recommendations related to future studies are outlined in the last chapter.
4
Chapter 2 Literature Review
Earth sciences are always in search of advancements when it comes to geological modeling. It is
well-understood that traditional geological modeling approaches must be combined with data
analysis techniques which take advantage of the data to learn features, so that the features can be
imposed on the models. In this context, this section covers the current state of the traditional
geological modeling techniques, and then focuses on MLAs to clarify what possible improvements
can be applied on traditional geological modeling techniques.
2.1 Geological Modeling for Resource Estimation
Geological modeling is an important aspect of the resource estimation process. While early
modeling attempts were based on mostly explicit and deterministic applications due to
computational restrictions, today implicit and stochastic approaches are easily applicable on
geological models thanks to the advancements on computer technologies. This section is dedicated
to elaborate on the aforementioned concepts, which are explicit, implicit, deterministic and
stochastic modeling.
2.1.1 Explicit and Implicit Modeling
The earliest efforts to understand geology dates back more than 200 years. Cross-sections drawn
by William Smith in the early 1800s to conceive the underground have been the foundation of
geological mapping and modeling (MacCormack et al., 2015). Traditional geological models have
been built combining 2D cross-sections with linear interpolation. A borehole sample is represented
by a point and polylines are obtained by combining points to represent boundaries between
geological units, finally combining polylines with linear functions yields volumetric models
(Newell, 2018). This type of modeling is termed explicit in the literature. While explicit modeling
is highly capable to represent geology based on a conceptual assumption, any change on the model
5
is challenging because it is time demanding. The need for changes can arise either because new
hard data become available, or when the functions used are not capable to build logical layers. A
new conceptual model can be required if a new hard data is discordant with the current conceptual
model. On the other hand, phantom points, which are manually added points in the space, are
sometimes required to let the functions build logical layers due to the insufficient number of points
available. Limited number of points lead to subjectivity in the model and therefore simple
geological structures like folding and stratigraphic continuum can be further challenges for the
explicit modeling approach. The two challenges expressed above make explicit models hard to
update. Even if the time limitation is neglected, the facts that explicit models are built based on
subjective decisions and not accounting for uncertainty make explicit models inefficient.
These challenges forced researchers to look for faster and more flexible alternatives to explicit
modeling. Contrary to explicit models, implicit models are quick and capable to represent
thickness and continuity of stratigraphy of rocks (Figure 2.1) (Gautier, 2016). While explicit
modeling analyzes data on a section-by-section basis, implicit modeling uses input data
simultaneously to interpolate 3-D scalar fields (Newell, 2018). Advanced mathematical functions
used in implicit models define surfaces successfully. A general approach in an implicit model is
that first a scalar field is described by a signed distance function, and then the surface is
interpolated tracing the field values with the help of another function along tetrahedral mesh points
(Hillier et al., 2017; Guo et al., 2020). The most used interpolation function is the radial basis
function and its derivatives. Today, implicit modeling has been adopted by commercial software
packages like Leapfrog and GoCad/SKUA, and has been endorsed by the mining industry.
6
Figure 2.1 (a) An example of a failed explicit model. The model requires phantom points to build
the boundary represented by orange color. (b) An implicit model with successful folding structure
2.1.2 Deterministic and Stochastic Modeling
Besides the concepts of implicit and explicit modeling, two additional concepts, which are crucial
for a well-developed understanding on modeling, are the deterministic and stochastic approaches.
By definition, a deterministic approach is an assumption made about an aspect of a model which
is fixed and imposed by the user. On the other hand, probability can be described as an aspect of a
model which is guided by a random (stochastic) outcome of a probabilistic approach (Bentley and
Ringrose, 2021). Early applications of geostatistics have focused on deterministic approaches,
therefore they were devoid of the uncertainty assessment (Galli and Beucher, 1997). In
geostatistics, the terms probability and stochasticity are frequently used interchangeably. Today
the requirement of a probabilistic approach by means of random processes, also known as
stochastic approach, is an accepted fact in geological modeling (Deutsch and Journel, 1997). The
fact that there is only limited information available to represent a model makes randomness a
natural part of geological models. Stochastic models are generally obtained as collection of all
possible scenarios, while deterministic models depict only one average scenario (Figure 2.2). A
model considering all possible scenarios provides flexibility at the stage of decision-making.
Although focusing on only one scenario can cause overfitting of models, models represented by
7
high variations can also become inefficient. An optimum approach would be to develop some
deterministic approaches about geological features like faults, which can be identified by
geophysical applications, and then build the stochastic model under deterministic constraints (Galli
and Beucher, 1997).
Figure 2.2 Deterministic model (left) and stochastic model with alternate models (right)
Stochastic models can be reviewed under two methods, which are pixel-oriented and object-
oriented methods. As the name implies pixel-oriented models gives outputs at nodes that discretize
the working space, utilizing two-point variogram statistics (although more advanced techniques
with multiple-point statistics have been proposed). On the other hand, object-oriented models are
aimed to identify geological objects without fixing them in a strict Cartesian grid system
(Hassanpour and Deutsch, 2010).
Sequential indicator simulation (SIS) can be considered one of the most well-known traditional
methods of pixel-oriented stochastic models. It is important to review SIS to better understand
8
traditional pixel-oriented probabilistic models. Indicator variables are generally used to define
geological rock types or facies prior to definition of petrophysical properties of units (Deutsch,
2006). An indicator is a function describing 𝐾 categories with binary values, such that value is 1
if the location belongs to a category, otherwise 0. The indicator function is
1, 𝑖𝑓 𝑠𝑘 𝑝𝑟𝑒𝑣𝑎𝑖𝑙𝑠 𝑎𝑡 𝑢
𝐼𝑘 (𝑢; 𝑠𝑘 ) = { (2.1)
0, 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
where 𝑠𝑘 is the true category at location 𝑢. Indeed, the expected value of each indicator can be
viewed as probability of the corresponding category and its variance is given by
𝑉𝑎𝑟{𝐼(𝑢; 𝑘)} = 𝑝𝑘 (1 − 𝑝𝑘 ) (2.2)
where 𝑝𝑘 is the prior probability of category 𝑘. To realize SIS, the spatial continuity of the indicator
variables should be defined by the corresponding indicator variograms:
1
𝛾𝐼 (ℎ; 𝑠𝑘 ) = ∑ [𝐼(𝑢; 𝑠𝑘 ) − 𝐼(𝑢 + ℎ; 𝑠𝑘 )]2 (2.3)
2|𝑁(ℎ)|
𝑁(ℎ)
where ℎ is the distance between two points and 𝑁(ℎ) is the number of pairs considered in the
variogram calculation and 𝑠𝑘 is the 𝑘-th category. The traditional approach to discover the spatial
continuity of an indicator variable is that variograms are calculated experimentally along certain
directions at some lags, so that an understanding about the spatial continuity of the indicator
variable can be developed. Different directions in space are inspected to check if there is an
anisotropy of the spatial continuity. Finally, variogram models are built along the directions which
represent the anisotropy best, using nested structures of functions, so that SIS can use this spatial
continuity to determine the probability distribution of the categories at each point in space based
on the available information. The variogram model combines a nugget effect to consider short-
9
range randomness of variables, and n other valid variogram functions (Deutsch, 2003). The nugget
effect model is given by
0 𝑖𝑓 ℎ = 0
𝛾(ℎ) = { (2.4)
𝐶 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
where h is the distance and C is the sill contribution of the function to the variogram. Other than
the nugget effect, one of the most common functions to define variogram models is the spherical
model:
3 ℎ 1 ℎ3
𝐶( − ) 𝑖𝑓 ℎ ≤ 𝑎
𝛾(ℎ) = { 2 𝑎 2 𝑎3 (2.5)
𝐶 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
where h is the distance, a is the range, and C is the sill contribution of the function to the variogram
model (Figure 2.3). There are also other functions to define variogram models with different
characteristics, which are exponential, gaussian, power and sine cardinal models, among others.
For simplicity, the number of nested structures should not be more than three. A nested structure
of a variogram model with one nugget effect model and two more models is given by
𝛾(ℎ) = 𝐶𝑁 ∙ 𝑁𝑢𝑔(ℎ) + 𝐶1 ∙ 𝑀𝑜𝑑𝑒𝑙𝑎1 (ℎ) + 𝐶2 ∙ 𝑀𝑜𝑑𝑒𝑙𝑎2 (ℎ). (2.6)
10
Figure 2.3 Two of the positive definite functions to define variogram models: nugget effect model
and spherical model
Point-based evaluations are performed ‘sequentially’ using Monte Carlo simulation based on the
knowledge of the spatial distribution obtained using variograms and considering the closest points,
which are either available samples or previously simulated locations. Different order of point
evaluations and different drawings by Monte Carlo simulation over the conditional cumulative
distributions provides different realizations, and thus this ensemble of realizations provides the
stochastic model. Although SIS is a simple and robust algorithm, the realizations obtained by SIS
may contain short-scale variations which make a geological body modeling unrealistic. Post-
processing algorithms for image cleaning may be used on the problem of short-scale variations
(Deutsch, 2006). While SIS is normally considered a point-oriented stochastic simulation method,
when image cleaning techniques are applied, this renders the models akin an object-oriented
approach. BlockSIS offered by Deutsch (2006) is one of the programs which makes use of this
image cleaning technique, providing smoothed models that are comparable with interpreted
geological models obtained by explicit or implicit modeling. There are also other point-based
11
simulation techniques like truncated Gaussian simulation (TGS), truncated pluriGaussian
simulation (TPGS) and multiple point statistics (MPS), which are valid today. The name Gaussian
is given because raw data is transformed to Gaussian, and then a truncation rule is applied on the
continuous Gaussian data to transform simulated Gaussian values into discrete values according
to thresholds. TPGS is applied in the same way except multiple Gaussian fields are used to define
the domains. All the techniques aforementioned here are variogram-based techniques. On the other
hand, relatively newer techniques called multiple point statistics MPS search for patterns in the
data rather than two-point statistics of variography (Bentley and Ringrose, 2021).
Although pixel-oriented models are simpler and more popular, object-oriented models have also
been used in recent years. The main motivation of using object-oriented models is to determine
the volume and location of a well-understood geological object (Hassanpour and Deutsch, 2010).
While pixel-oriented models can encounter problems when it comes to highly curved boundaries,
object-oriented models can handle such boundaries. Typical examples of geological objects for
which object-oriented modeling is ideal are fluvial systems like paleo-channels, or submarine fan
systems. Traditional object-oriented approaches model the geological objects with simple
volumetric formulations and interpolation methods. Interpolation methods can be critical to model
objects with specific geometries. The process which object-oriented models employ is called
marked point process (Bentley and Ringrose, 2021; Holden et al., 1998). Points are randomly
picked, and an object is defined according to the imposed proportions and volumetric parameters
like width and orientation. The objects which do not satisfy the prior data constraints are rejected,
and this process is repeated until the model is completed. While traditional object-oriented models
are simpler in terms of formulation, it frequently requires user intervention or correction. These
models are hard to condition when many data are available.
12
2.2 Overview of Machine Learning Algorithms
Géron (2017) describes machine learning as the science of programming which lets computers
learn from data. Indeed, the word “learning” describes the inspirational part of the technique. In a
broad sense, raw data is processed to extract knowledge, and the knowledge is trained with
machine learning algorithms (MLA) after the preparation process. Model prediction and validation
is performed to check if the output is accurate or not. Finally, the output feedback the environment
to make possible improvements. This process is repeated until a satisfactory output is obtained
from the algorithm (Figure 2.4).
Figure 2.4 General workflow of machine learning systems
MLA can be classified under two main types which are supervised learning, where learning is
performed to make predictions according to a given set of information where inputs and outputs
are known, and unsupervised learning, where learning is performed without prior assumptions
13
about the outputs, to detect the potential clusters of input samples and identify patterns.
Unsupervised learning seeks for samples which may form possible patterns or groups given a set
of input variables. The main aim of unsupervised learning algorithms is to gather descriptive
information rather than modeling labels, or categories. On the other hand, supervised learning
algorithms tries to fit a model on the input based on the imposed response given by the known
outputs. Both supervised and unsupervised learning methods are frequently utilized in resource
modeling individually for specific purposes and sometimes an unsupervised learning method is
used to inform the subsequent supervised learning methods (Cevik and Ortiz, 2020b).
Unsupervised learning algorithms can be classified under three sub-topics, which are clustering,
dimensionality reduction and association rule learning (Géron, 2017). Generally, clustering is the
main goal of unsupervised learning algorithms. k-Means clustering is one of the traditional
unsupervised clustering algorithms. Basically, k-Means clustering method assigns samples
randomly to k number of clusters, then the algorithm tries to increase the accuracy of the clustering
by changing the assigned label of randomly picked samples. Accuracy of each cluster is assessed
by squared Euclidean distance (James et al., 2013):
|𝐶𝑘 | |𝐶𝑘 |
1
𝑊(𝐶𝑘 ) = ∑ ∑(𝑥𝑖 − 𝑥𝑗 )2 (2.7)
|𝐶𝑘 |
𝑖=1 𝑗=1
where 𝑊(𝐶𝑘 ) is the measure of how samples differ from each other, |𝐶𝑘 | is the number of
observations in the cluster 𝐶𝑘 , 𝑥𝑖 and 𝑥𝑗 are the pair of samples which belong to cluster 𝐶𝑘 . The
equation is in fact the sum of Euclidean distances of pair of samples picked from the same cluster
divided by the number of observations in cluster 𝐶𝑘 . As 𝑊(𝐶𝑘 ) decreases, the accuracy of 𝐶𝑘
increases. The main controlling parameter of the algorithm is the number of clusters into which
14
samples are assigned. Overall, k-Means clustering turns into a minimization problem given below.
As a result of Eq. ((2.8)), samples are assigned to a cluster in a way that average pairwise distance
of samples are minimized for each respective cluster.
minimize {∑ 𝑊(𝐶𝑘 )} (2.8)

𝐶1 ,…,𝐶𝑘
𝑘=1
Dimensionality reduction techniques are seen as necessary when the number of variables is too
large to develop a logical relationship between variables. Dimensionality reduction techniques aim
at reducing the number of variables to a reasonable amount with minimum loss of information
from the input data. To do this, dimensionality reduction techniques check for correlations and
variations between variables and record the information with maximum variation in the minimum
number of variables possible. A traditional example of dimensionality reduction techniques can be
considered Principal Component Analysis (PCA) offered by Hotelling (1933). PCA makes use of
eigenvalues and eigenvectors, and it detects the major linear directions of variation in a dataset.
While PCA is a linear-based dimensionality reduction technique, relationships between variables
may not be only explained linearly but also non-linearly. As an alternative to PCA, t-distributed
Stochastic Neighbor Embedding (t-SNE) is another dimensionality reduction technique which
accounts for non-linear relationships between the original variables. t-SNE constructs probability
distributions for both high dimensional data and low dimensional embeddings (van der Maaten
and Hinton, 2008), and it tries to match the two probability distributions to better represent the
data in the lower dimension. Finally, association rule learning is used to detect how variables are
associated to each other and final groups with related variables are built accordingly (Dobilas,
2021). Association rule does not use a prior knowledge about the variables, instead it tries to extract
the co-existing patterns in a dataset (Agrawal et al., 1993). While association rule methods are
15
originally developed to discover customers’ shopping habits, it is also applied in different fields
(Kumbhare and Chobe, 2014).
Some of the most common supervised learning algorithms are k-Nearest Neighbors (KNN), linear
regression, logistic regression, Naïve Bayes, Support Vector Machines (SVM), decision trees,
random forests, and neural network (Cevik and Ortiz, 2020b). Supervised learning algorithms can
be divided into two sub-topics according to the objective: classification and regression.
Classification algorithms train several variables based on the prior knowledge to make prediction
on discrete output variables, e.g., 0s for the false labels and 1s for the true labels. On the other
hand, regression algorithms try to fit a model on several variables and make predictions on
continuous samples (Figure 2.5). Linear regression and logistic regression can be good examples
of regression models, and KNN, SVM, and decision trees can be both applied for regression and
for classification tasks. The algorithm of SVM utilized for classification tasks is called Support
Vector Classifier (SVC). SVC will be explained in detail later. KNN, which is one of the most
frequently used supervised learning algorithm, makes predictions on test samples to assign a class
calculating the distance to the training samples (Christopher, 2021). The basic algorithm of KNN
selects K samples from the training data closest to the test sample and the most frequent sample is
assigned to test data. Decision trees builds decision rules to predict the value of a target (Pedregosa
et al., 2011). Decision trees are built from roots, where general rules and features are located, to
leaves where more detailed rules and features are located. Decision trees are generally preferrable
since they are relatively easier to understand and easy to visualize. On the other hand, neural
networks are conceived harder to understand because of their relatively complex structure. Neural
networks employ hidden layers made of tuning parameters between input and output layers to
make predictions. The tuning parameters of hidden layers are fit to the model to improve the
16
predictions (Figure 2.6), and fitting is controlled by a loss function. Neural network algorithms are
employed in complex artificial intelligence tasks like image recognition and search engines
(Géron, 2017).
Logistic regression, another popular algorithm, models the samples’ probability of belonging to a
category rather than an estimation (James et al. 2013). Logistic regression improves linear
regression function and limits the range of probabilities between 0 and 1 using logistic function
(2.9).
Figure 2.5 Linear regression: an example of a simple regression model
𝑒 𝛽0+𝛽1𝑋
𝑝(𝑋) =
1 + 𝑒 𝛽0+𝛽1𝑋 (2.9)
The function takes the final form when the exponential part of the function is left alone and the
logarithm of both side is taken,
𝑝(𝑋)
log( ) = 𝛽0 + 𝛽1 𝑋
1 − 𝑝(𝑋) (2.10)
17
and hence probability of 𝑋 has logarithmic behavior, while 𝑋 has linear behavior. The coefficients
𝛽0 and 𝛽1 are estimated using approaches like maximum likelihood to fit the model. Once the
coefficients are estimated, probability prediction can be done like a linear regression problem.
Figure 2.6 Algorithm of neural networks
A model always requires a validation process to understand whether it represents the data well or
not. Therefore, assessing the accuracy of the model is crucial for the final decisions made on the
models. Accuracy of an unsupervised learning algorithms is assessed either with qualitatively, or
with mathematical measures applied on the task since there is no prior knowledge related to the
data. Mathematical measures applied on the unsupervised learning tasks generally assess how well
clustering is performed.
On the other hand, a direct quantitative assessment on final results is possible for supervised
learning tasks since there are prior knowledge related to the data. To assess the accuracy of
supervised learning tasks, data is split into training and validation sets prior to modeling. While
training data is used to train the model, performance of the model is evaluated with validation set
18
of the data (James et al., 2013). If the supervised learning task is a classification task, a confusion
matrix can be built which compares predicted results with validation set (Table 2.1), so that
performance of the model can be observed for each label.
Table 2.1 A confusion matrix

Label Negative True-Negative False-Positive
True
Positive False-Negative True-Positive

Negative Positive
Predicted Label
While it is possible to obtain an overall accuracy from the confusion matrix, a balanced accuracy
can be calculated, which accounts for the arithmetic mean of true-positive and true-negatives,
when the proportion between binary labels is imbalanced (Pedregosa et al., 2011). The equation
for balanced accuracy is
1 𝑇𝑃 𝑇𝑁
𝐵𝑎𝑙𝑎𝑛𝑐𝑒𝑑 − 𝑎𝑐𝑐𝑢𝑟𝑎𝑐𝑦 = ( + )
2 𝑇𝑃 + 𝐹𝑁 𝑇𝑁 + 𝐹𝑃 (2.11)
where TP is true-positives, FN is false-negatives, TN is true-negatives, and FP is false-positives
obtained from the model prediction.
Basically, all MLA models should have low bias, which means having less difference between
training data and model, and low variance, meaning that having less change on the model when
the training data set changes (James et al. 2013). However, it is a general characteristic for a model
that bias and variance are inversely proportional to each other, and this is called bias-variance
tradeoff (Rocca, 2019). Because bias and variance are inversely proportional to each other, an
optimum model complexity should be determined for the pairs of optimum bias and variance
values (Figure 2.7).
19
Figure 2.7 Bias-variance tradeoff
2.2.1 Support Vector Machine and Support Vector Classifier
SVM is a powerful method for machine learning because it can handle complex regression and
classification problems (Bishop, 2006) and it has been popular for many years (Géron, 2017). The
name of SVM comes from the subset of training samples, known as support vectors, utilized in the
decision function. Support vectors let the computer use a small amount of training subsets, and
thus the model becomes memory efficient (Pedregosa et al., 2011). In the case of a classification
problem, support vectors optimize class margins to a maximum possible span. To do this, SVC
utilizes a complex mathematical algorithm including sparse kernel techniques.
The mathematics behind SVC can be challenging for a learner. The best approach to understand
the logic of SVC is to start from a simple binary-class and two-parameter case. For this simple
case, SVC model takes the linear form
20
𝑦(𝑥𝑛 ) = 𝑤 𝑇 Φ(𝑥𝑛 ) + 𝑏 (2.12)
where 𝑥𝑛 is the training input from 1 to N number of inputs, 𝑦(𝑥𝑛 ) is the target class for the input
𝑥𝑛 , Φ(𝑥𝑛 ) is a fixed transformation function of 𝑥, 𝑤 𝑇 is a coefficient vector which is maximizing
the distance between the binary classes, and b is the parameter to control the bias (Bishop, 2006).
The equation is quite similar to logistic regression technique, however the main difference between
logistic regression and SVC is that SVC maximizes the distance between classes with the help of
margins at each side of the class boundary (Figure 2.8). The samples located on these margins are
called support vectors.
Going back to the equation, SVC tries to approximate best values to the unknowns, 𝑤 𝑇 and 𝑏, to
maximize the margins of the boundary. The optimum condition is provided when 1/‖𝑤‖ is
maximized. If classification can be performed without letting any misclassification, the
classification is called hard margin classification like in Figure 2.8. In its simplest manner, the
problem takes the form
𝑡𝑛 (𝑤 𝑇 Φ(𝑥𝑛 ) + 𝑏) = 1 (2.13)
where 𝑡𝑛 is the target for samples from 1 to N number of samples.
Figure 2.8 Hard margin linear SVC

21
However, in most cases hard margin classification is not possible. At this point, soft margin
classification comes as a solution. Soft margin classification lets the model misclassify some
samples to reach the optimum condition (Figure 2.9). To make misclassification possible, slack
variables, ξ𝑛 , are introduced into the equation (Bishop, 2006). ξ𝑛 = 0 when data points are correctly
classified, and the rest is ξ𝑛 = |𝑡𝑛 − 𝑦(𝑥𝑛 )|, meaning that samples located on boundary or samples
passing boundary are penalized up to 1. After the introduction of slack variables, the classification
problem takes the form:
𝑡𝑛 𝑦(𝑥𝑛 ) ≥ 1 − ξ𝑛 (2.14)
Theoretically, minimizing the summation of condition (2.14) for n ranging from 1 to N number of
samples, can provide the best classification:
𝑁
1
𝑚𝑖𝑛 ∑ ξ𝑛 + ‖𝑤‖2 (2.15)
2
𝑛=1
The Eq. ((2.15)) is scaled by a parameter C, and C, with the condition of being higher than 0,
regularizes the complexity of the model.
Figure 2.9 Soft margin linear SVC; encircled samples are misclassified samples
22
So far, the mathematical explanation of SVC is done as simple as possible with focus on linearly
separable class problems. However, most of the real scenarios require non-linear solutions with
quadratic function problems. Detailed explanations for quadratic function problems can be found
in Bishop (2006). For the sake of an easier comprehension, the problem is kept simple. However,
it is worth to mention kernel functions to better understand the non-linear solutions. The
transformation functions we called earlier Φ(Xn) are kernel functions. Technically, Kernel
functions work as if more features are added to the sample space to make classes separable (Géron,
2017). The most popular kernel functions are tabulated in Table 2.2 below:
Table 2.2 Types of kernel functions

Kernel Types Function
Linear Kernel 𝑘(𝑥, 𝑥 ′ ) = 𝑥 𝑇 𝑥 ′
Polynomial Kernel 𝑘(𝑥, 𝑥 ′ ) = (𝑥 𝑇 𝑥 ′ + 𝑟)𝑑
Radial Basis Kernel ′ 2
𝑘(𝑥, 𝑥 ′ ) = 𝑒 (−𝛾 𝑥−𝑥 )
Sigmoid Kernel 𝑘(𝑥, 𝑥 ′ ) = tanh(𝛾(𝑥 𝑇 𝑥 ′ ) + 𝑟)
Linear kernel, polynomial and radial basis kernel are derived from the same equation. While d
equals 1 in polynomial kernel, polynomial kernel becomes linear kernel. Increasing d can be
considered as increasing the dimension of the feature space in the training process. If d
approximates to ∞, polynomial kernel approximates to radial basis function. The approximation is
performed by making use of Taylor Expansion Series. Fundamentally, employing radial basis
function can yield similar but much faster performance compared to polynomial kernel when d is
∞. Therefore, radial basis function (RBF) generally yields better results on the implementation of
SVC.
23
2.2.2 Ensemble Learning
Ensemble learning methods basically combine several models to build complex models with
optimum bias and variance. The models used to build ensemble model are called weak learners.
Typically, weak learners have high bias or high variance. Combining all weak learners reduces
bias and variance, hence the process yields better models. The new models are called strong
learners. The most well-known ensemble learning algorithms are bagging, boosting, and stacking.
Bagging stands for bootstrap aggregating meaning that averaging of bootstrapped subsets of data.
Bootstrapping is the process of sampling available data with replacement up to a determined
amount (Figure 2.10). After each random sampling, the sample is replaced back into the global
data, so that it can be sampled more than once for each attempt. While each model built using a
subset of data is considered weak typically with high variance, averaging all data can yield a strong
model with relatively low variance (James et al., 2013). Bagging can be applied to optimize the
parameters of a single MLA, and it is applicable to all MLA.
Figure 2.10 A simple example of bootstrapping
24
Boosting, which is another ensemble learning method, is also performed to optimize the
parameters of a single MLA. While boosting is performed using resampling with replacement like
bagging, the difference is that models are combined sequentially. After each resampling, the model
is trained according to a resampled subset of data and testing is performed according to the original
data. The next subset of data is more likely to be chosen from the poorly represented part of the
previous data, so that this time bias can be decreased. This process continues up to a determined
number of weak learners, and the final output model is obtained by combining all sequential
models. The weak learners with higher accuracy get a higher weight in the combination to increase
the accuracy of the ensemble model. This type of boosting is called adaboost standing for adaptive
boosting. In contrast to bagging, boosting utilizes weak learners with high bias to build a strong
model with low bias. Although the process decreases the bias substantially, it should be noticed
that keeping the number of weak learners too high may cause overfitting.
Stacking can be considered complex compared to bagging and boosting since it utilizes different
MLAs to build a strong model. The data is split into two, and one half is trained with some MLAs.
The outputs of the MLAs are combined with different techniques, such as stacked models,
blending or super ensemble. The combined outputs are used as input in a main MLA which is
called meta-model (Rocca, 2019). Here, outputs of MLAs used to train this meta-model are
considered weak learners. Finally, training the combined outputs as inputs in the meta-model
yields the final strong learner.
25
Chapter 3 Methodology
3.1 Unsupervised Classification for the Definition of Geological Domains
The unsupervised classification method adopted for the first step is the workflow outlined in Faraj
and Ortiz (2021). The workflow leans on the idea that certain geochemical elements of samples
belonging to a distinct geological unit show certain distribution characteristics in nature. The log-
normal distribution is frequently observed in geochemical elements. Moreover, the log-normal
distribution can approximate to normal distribution as the parameters approach small values.
Furthermore, the ratio of two elements with log-normally distributed populations also show a log-
normal distribution.
Based on the idea expressed above the distinct geological domains in a data can be identified if
the parameters of the log-normally distributed populations are determined correctly. These
parameters are the mean (µ) and standard deviation (σ) of each population, and their corresponding
proportion in the global data (w). The weighted combination of cumulative distributions of these
domains yields the global distribution. Considering that clustering more than 2 domains can be
costly in terms of computation process, this study adopted a binary clustering approach. The
method can be applied hierarchically to define multiple domains. The workflow is formed by three
main steps, which are:
1. Exploratory data analysis to determine the relevant geochemical variables,
2. Optimization of the parameters of the log-normal distributions of variables, and
3. Domain assignment according to the determined ideal distribution.
These steps are illustrated in Figure 3.1. The three steps are repeated using different sets of 3
variables of data to later utilize on the second step of the study.
26
Figure 3.1 Schematic summary of unsupervised classification
The word “domain” used in the thesis is a generalized term which refers to a volume of material
that shows consistent properties. It is a concept used to help model a phenomenon in space.
Domains are usually inferred from samples and variables available at sample locations. If the
variables used to define domains are elements of rock-forming minerals, then the domains likely
refer to lithology. Domains can be attributed to mineralization or alteration if the variables allow
the modeler distinguishing between different populations referred to these attributes. However, the
information used to define domains is not constrained to geochemical elements. For instance,
quantified geotechnical or structural properties of samples like texture, stability, resistance to
weathering or permeability can be employed to define geotechnical or structural domains.
3.1.1 Exploratory Data Analysis
The most relevant geochemical variables identifying the distinct domains are chosen in this step.
Important criteria of how geochemical variables are determined is whether the chosen variables
are discriminatory enough to show the distinct log-normally distributed domains. Geochemical
elements for domaining can be chosen visually with the help of plots like histograms (Figure 3.2).
27
On the other hand, specific domain knowledge (in this case, geological knowledge) plays a critical
role in the selection of variables, and the geological literature can significantly assist to determine
the geochemical variables that are relevant to discriminate among geological domains. Domain
knowledge can be based on alteration, rock type or mineralization. While some elements give
information about the rock type and geological environments, others can be useful to define
alteration and mineralization. In the case of porphyry copper deposits, the composition of major
elements like Zn, Pb, Al, Mg, and K can be attributed to the chlorite-sericite, quartz-sericite,
potassic and argillic alteration groups (Sillitoe, 2010; Faraj and Ortiz, 2021). The major alteration
groups can be roughly split into binary domains as chlorite-sericite & potassic and quartz-sericite
& argillic.
Although one variable can be enough to identify distinct domains theoretically, utilizing a higher
number of variables increases the accuracy of the domaining. Three geochemical variables are
utilized to identify domains in the study, so that variables having fuzzy boundaries between
domains can be combined with other variables to contribute to a better definition of these domains.
Once a set of variables is determined out of all variables, it is going to be employed in the following
steps.
28
Figure 3.2 Log-normally distributed global histogram of a variable containing two distinct
domains
3.1.2 Optimization
The optimization process is critical to have an accurate domaining. Optimization is performed
based on the assumption that the geochemical variables belonging to a geological domain are log-
normally distributed. It is important to notice that if a variable is log-normally distributed, the
variable’s natural logarithm shows a normal distribution with parameters µ and σ. Considering this
assumption, all the works are done utilizing variable’s natural logarithm.
When binary classification is performed using three variables, the number of parameters used to
define the distribution of each variable in each domain is in total thirteen. Considering each
variable under a domain should have the parameters µ and σ to define the log-normal distribution,
three variables with two geological domains require twelve parameters in total to define the
29
distribution of the global data. Adding the proportion of the two domains in the global data makes
the number of parameters required to be optimized thirteen in total (Table 3.1).
Table 3.1 Optimization of thirteen parameters

Domain 1 Domain 2
Variable 1 µ11, σ11 µ12, σ12
Proportion w (1-w)
The objective of the optimization is to fit a combined weighted cumulative distribution function
(CDF) of the two domains to the global distribution of the data for each one of the variables. The
level of the fit between combined CDF and the global distribution is assessed by calculating the
squared error (SE). An exact fitting combined CDF to the global distribution means that the
weighted combination of the CDFs assigned to the two domains matches exactly the global CDF
(Figure 3.3), and in that case, the SE converges to zero (Figure 3.4). Therefore, a minimization
approach is adopted to solve the optimization problem. Optimum parameters for each of the
variables are defined by solving the minimization problem expressed below:
𝑉 𝑁
𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒{∑ ∑(𝐺𝑙𝑜𝑏𝑣 (𝑧𝑛 ) − 𝐶𝑜𝑚𝑏𝑣,𝑘 (𝑧𝑛 ))2 } (3.1)

𝑣=1 𝑛=1
where 𝐺𝑙𝑜𝑏𝑣 (𝑧𝑛 ) is the percentile rank of the sample n of variable v in the global data, and
𝐶𝑜𝑚𝑏𝑣,𝑘 (𝑧𝑛 ) is the combined CDF of the sample n of variable v according to the proportions of
k-domains:
𝐶𝑜𝑚𝑏𝑣,𝑘 (𝑧𝑛 ) = 𝑤1 ∗ 𝐶𝐷𝐹𝑣,1 (𝑧𝑛 ) + (1 − 𝑤1 ) ∗ 𝐶𝐷𝐹𝑣,2 (𝑧𝑛 ) (3.2)
where k is 2 for the binary classification case, 𝑤1 is the weight of domain 1 in the global data and
𝐶𝐷𝐹𝑣,𝑘 (𝑧𝑛 ) is the cumulative distribution of sample n of variable v in domain k.
30
In case of a poor fit between the combined weighted CDF of two domains and the global
distribution, it can be inferred that either the assumption made about the type of populations’
distribution is wrong, or the number of most distinct populations are more than 2. If the poor fit is
caused by a wrong assumption about the distributions, a better fit could be obtained when different
types of distribution, such as normal or Poisson distributions, are assumed for the populations.
Figure 3.3 A near-perfect fit of a combined distribution (red line) to the global data (blue dots)
and the optimum distributions of two domains forming the combined distribution (green and
yellow lines)
31
Figure 3.4 Topographical view of SE according to change in parameter µ of domain 1 and
domain 2.
There are different approaches to solve minimization problems. Some of the methods include
Powell’s method, Conjugate Gradient method (CG), Newton Conjugate Gradient method
(Newton-CG), BFGS which stands for Broyden, Fletcher, Goldfarb and Shanno method (Adams,
1996). Out of the methods mentioned, Powell’s method can be considered the simplest method.
Powell’s method is a quadratically convergent iterative function, meaning that it can be effective
on quadratic problems (Figure 3.5). However, considering most functions are quadratic near their
extrema, parameters defining the problem should be well defined to let the function converge to
extrema. Powell’s method is considered greedy because it can be challenging to finalize the
optimization when the parameters and constraints are not determined well. As an alternative, CG
32
is also a simple method used for quadratic problems aimed to find the solution giving zero gradient
(Shewchuk, 1994). CG method makes use of Eigenvalues and Eigenvectors to find the extrema.
BFGS and Newton-CG can be considered improved versions of CG method. However, while CG
does not require any extra information, BFGS and Newton-CG require first order partial
derivatives of the problem, which form the Jacobian matrix, and second order partial derivatives,
which is the Hessian matrix. Considering all, CG method is utilized in the minimization problem.
Figure 3.5 An example of quadratic form problem with two controlling parameters
33
3.1.3 Domain Assignment
Once the optimum parameters are determined, each sample must be assigned to one of the
domains. Initially, domains of samples are randomly assigned according to the optimum weight of
domains determined during the optimization stage, and hence the optimum weight is already
provided before starting the domain assignment step. This initial assignment creates two
distributions that do not match the parametrized distributions obtained in the previous step. The
final domain assignment is performed utilizing an algorithm which checks if swapping randomly
picked samples from each domain yields a lower relative error, meaning better fit for both domains
CDFs (Figure 3.6 & Figure 3.7). Notice that each domain swapping passes the value for all three
variables from one domain to another for each of the samples swapped. The equation of the relative
error is
𝐾 𝑉
|𝑓𝑣,𝑘 (𝑍𝑛 ) − 𝑓𝑣,𝑘 (𝑧𝑛 )|
∑∑ (3.3)
𝑓𝑣,𝑘 (𝑍𝑛 )
𝑘=1 𝑣=1
where 𝑓𝑣,𝑘 (𝑍𝑛 ) is the ideal CDF of sample n of variable v in domain k, and 𝑓𝑣,𝑘 (𝑧𝑛 ) is the current
rank of sample n of variable v in domain k. If the relative error decreases after the potential swap,
then the swapping of the randomly picked samples is realized by the algorithm. The loop continues
running until there is no change in domains up to a determined threshold value.
At the end of the loop, the improvement of the accuracy is assessed using the equation applied for
k-Means Clustering method:
𝐾 |𝐶𝑘 | |𝐶𝑘 |
1
∑ ∑ ∑(𝑥𝑖 − 𝑥𝑗 )2 (3.4)
|𝐶𝑘 |
𝑘=1 𝑖=1 𝑗=1
34
where 𝐶𝑘 is the domain, |𝐶𝑘 | is the number of observations in domain 𝐶𝑘 , K is 2 for the binary
clustering, and 𝑥𝑖 and 𝑥𝑗 are the pair of samples in domain 𝐶𝑘 . After computing the accuracy, the
loop of swapping samples starts again. This recurring system of loops stops when there is no
change in accuracy of the domains, and the domain assignment with the best K-Means Clustering
score is recorded at the end of the algorithm (Figure 3.8-a). As an alternative to the method of
accuracy assessment expressed above, accuracy can be calculated after certain number of
swapping attempts within only one loop, which ends when there is no swap after a determined
threshold value (Figure 3.8-b). While the first method of accuracy assessment provides better
clustering, the latter performs the clustering faster with insignificant amount of accuracy loss. It is
important to notice that the K-Means Clustering scorer minimizes the pair-wise distance between
samples belonging to a cluster, while maximizing the distance between clusters in trivariate space
(Figure 3.7). Because the objective of K-Means Clustering and of searching for the best fit between
the combined weighted CDFs and global data are different, the best fit does not necessarily give
the best clustering result (Figure 3.8). The final output data of the unsupervised clustering
algorithm is used to inform the next step, Ensemble learning with Support Vector Classification.
Figure 3.6 Result of domain assignment according to optimal distributions
35
Figure 3.7 Result of domain assignment in tri-variate space
Figure 3.8 (a) Accuracy assessment with recurring loops and (b) accuracy assessment within
one loop. Notice that red circles show when the algorithm reaches to minimum Euclidean
distance. While the last points in graphs have the best fit between combined weighted CDFs and
global data, results with best clustering are obtained earlier.
36
3.2 Ensemble Learning with Support Vector Classification
The second stage of the study focuses on the uncertainty modeling utilizing an ensemble approach.
The knowledge obtained after repeating the stage one on different set of variables is used to inform
SVC. SVC is applied on a subset of samples. This process is known as bootstrap aggregating, or
bagging, and the process is repeated with the different sets of geochemical variables. All the
models obtained employing different subsets of samples and different subsets of variables are
considered weak learners, and weak learners are combined to build a strong learner. This is the
principle of ensemble learning. At this point, it is important to notice that the number of variables
in the geochemical dataset should be large so that subsets of variables can be chosen to apply
ensemble learning. For example, if five variables are found to be useful for the classification, and
three variables are used for each learner, then there will be twenty combinations to find weak
learners to be used for the ensemble learning approach.
3.2.1 Application of Support Vector Classification
As stated previously, SVC is a supervised MLA used generally for binary classification problems.
A certain portion of the input data is used to train SVC, and the remaining part of the input is
reserved for validation of the model. Since the objective of the approach is domaining, for this
study, the coordinates of samples in 3-D space are utilized as input data, and the domains are
employed as output of the model (labels). The goal is to classify points in the 3D space to determine
the extent of each of these two domains. First, data are split into training and test datasets before
modeling. Twenty five percent of the data is left for testing and the rest is used for training.
Secondly, two important parameters of SVC, C and gamma, are tuned to optimum values. To tune
the parameters, the GridSearchCV module is utilized on the training dataset. GridSearchCV
(Pedregosa et al., 2011) is a module of scikit-learn used to perform cross-validated grid search.
37
Different combinations of parameters are assessed using the balanced-accuracy presented in Eq.
(2.11), under the application of 5-fold cross-validation.
During the operation of grid search, the training data is split into five equal subsets, and model
performance is assessed five times by leaving one subset of samples out for validation each time
and training is performed using the remaining four fifths of the data. The mean accuracy of the
five splits yields overall accuracy of the model. The parameters providing the highest accuracy are
chosen from the grid search. However, choosing extreme values for parameters may cause
overfitting of the model, therefore the user can make a subjective decision about the choice of
parameters.
After determining the optimum parameters, one more split is applied on the training dataset to
perform ensemble learning. Two third of the training data is selected to perform the actual training,
and the rest is neglected for one SVC model. The ratio of training data to all data becomes 50%
after the second split. The model is trained, and predictions are performed along the target
locations. This process is repeated for the subset of samples (Figure 3.9). This is generally called
bagging, however the key difference between the bagging and the technique utilized in this study
is that sampling is performed without replacement. The reason why sampling is performed without
replacement is that a reproduced sample does not have any effect on the training data, instead they
only cause model to be informed less. While the approach may cause bias, high bias may be
overcome by employing fewer samples during training.
38
Figure 3.9 Main steps of SVC and technique of training-test split
3.2.2 Ensemble Learning with 3-Combination of Variables and Bagging
Once SVC is performed for all the subsets of samples and subsets of variables, the ensemble model
can be built combining all the models. The total number of models to combine is the 3-combination
of n variables times the number of resampling done during bagging. For example, the total number
6
of models for the case of 6 variables and 20 resampling of SVC with bagging is ( ) 𝑥20 =
3
6!
𝑥20 = 400. Considering the outputs of all models are the predictions along the target
3! 𝑥 3!
locations of 3-D space, combining all the models yields a probabilistic model. The accuracy of
models can be assessed using Eq. (2.11). To assess the performance of the ensemble model, values
at each location are categorized back to binary values and assigned to one of the two domains,
according to the global proportions of domains obtained during the first step of the workflow, and
compared with the domains assigned using K-means clustering algorithm from the drillhole dataset
39
reserved for validation. In this way, a confusion matrix can be built, and the accuracy can be
assessed again using Eq. (2.11). A schematic summary of the methodology is illustrated in Figure
3.10.
Figure 3.10 Workflow of the study
40
3.2.3 Hierarchical Application
The workflow defined above can be repeated hierarchically to define more than two domains.
Once an ensemble model is built with two domains, the application can be repeated for each
domain, so that subdomains of each domain can be built (Figure 3.11). The final model is built by
constraining the subdomains to the preceding domains.
Figure 3.11 Hierarchical application of domaining
3.3 Comparison with BlockSIS
To compare the result of the ensemble model, a sequential indicator simulation (SIS) model is
built. BlockSIS is the implementation used to create the SIS simulations. The model uses soft
secondary data, which constrain the traditional SIS to impose local proportions (Deutsch, 2006),
and image cleaning provides further smoothing of the results. BlockSIS is a program developed
within the GSLIB environment (Figure 3.12), which means it has a similar programming structure
with the conventional algorithm in GSLIB, named SISIM. There are different options in BlockSIS
to use the secondary data. Simple kriging with locally varying mean probabilities (Figure 3.12,
Line1 - option 2) is chosen to obtain models that reproduce the local trends in the categories.
41
Figure 3.12 Parameter file of BlockSIS
BlockSIS is performed on the same target locations used for Ensemble learning, to make
comparison easy. To apply BlockSIS, a model of the local proportions, p(u), is generated using
the nearest neighbor method. First, the closest N composite drillhole samples within a certain
distance are recorded for each target location where BlockSIS is performed, and then p(u) is
calculated for each location with an estimation approach like inverse-distance weighting or simply
average of the nearest neighbors. The global proportions are assigned to the target locations which
has no neighbor drillhole samples within a certain distance. Once the local proportions are obtained
for one domain, the other domain is obtained subtracting p(u) from 1, 1-p(u), since the case is
42
binary. Beside local proportions, the rest of the requirements to run BlockSIS are the same as SIS.
Variogram models are built to represent the spatial continuity of the model, and global proportion
of domains corresponding to their prior probabilities are defined. The number of realizations and
some other details related to kriging should also be determined. When the application is completed,
image cleaning is applied on the data according to the chosen level of intensity (Figure 3.13).
Figure 3.13 Effect of image cleaning on the results
For the accuracy assessment of BlockSIS, the same process applied for the Ensemble model is
followed. This time, the accuracy assessment of BlockSIS is performed for each realization and
for the E-type. First, the E-type model is categorized back to binary values according to the
optimum proportions of the domains, and then, the closest domains are assigned using the nearest
neighbor algorithm from the drillhole dataset reserved for validation. In this way, a confusion
matrix can be built, and the accuracy can be assessed again using Eq. (2.11).
43
Chapter 4 Case Study
A geochemical dataset is employed to test the implementation. The geochemical data is obtained
from a drillhole campaign of a Porphyry Copper deposit. Implementation is applied on the data
with a two-level hierarchical approach (Figure 4.1). The case study is explained under two sub-
sections for each level of the hierarchical application, and each application includes three sections,
which explain (1) unsupervised clustering, (2) supervised learning and Ensemble implementation
and (3) validation of the work respectively. All the works are performed using Python
programming language.
Figure 4.1 Two-level hierarchical domaining
4.1 First Level of Hierarchical Application
The first step of the application divides the data into two distinct domains as domain 1 and domain
2. The following sub-sections explain how domaining is obtained step by step starting from
unsupervised clustering, which include exploratory data analysis, optimization of the distribution
of selected variables and domain assignment of samples. This is followed by ensemble modeling
according to the obtained domaining, and finally SIS modeling using the domaining information
obtained from unsupervised clustering.
44
4.1.1 Unsupervised Clustering
Unsupervised clustering is explained under three main steps: data preparation and exploratory data
analysis, optimization of the parameters defining the distribution of domains, and domain
assignment according to the optimized parameters of log-normally distributed data.
4.1.1.1 Data Preparation and Exploratory Data Analysis
The data contain 1138 drillholes, and each drillhole consists of 15-meter-long composites.
Considering the composites are samples, the data have 21641 samples. The data contain 49
chemical elements, which are Au, Ag, Mo, Re, Cd, Pb, Zn, Te, Bi, Sb, Hg, Co, Ni, Se, Al, As, B,
Ba, Be, Ca, Ce, Cr, Cs, Cu, Ga, Ge, Hf, In, K, La, Li, Mg, Mn, Na, Nb, P, Rb, S, Sc, Sn, Sr, Ta,
Th, Ti, Tl, U, V, W, Y and the other available information are coordinates, lithology, alteration,
and mineralization.
Samples with unidentified alteration are removed from the data. Data are further constrained to a
certain area where the number of samples are enough to represent the space. After this, the number
of samples decreased to 17074. The middle coordinates of each composite are used for their spatial
location (Table 4.1). Domain knowledge obtained in the first step of the workflow is expected to
represent alteration minerals of Copper Porphyry Deposit. The relevant variables representing
alteration are determined with EDA and literature (Sillitoe, 2010). Out of 49 geochemical
elements, Mg, Al, Zn, Pb, Y and Ga showed clear binary clustering. Furthermore, ratios of
elements are used to highlight some specific geological features. The final variables to be used in
the unsupervised clustering are Mn/K, Mg/Al, Zn, Pb, Y, Ga (Table 4.1).
45
Table 4.1 Descriptive statistics of sample coordinates and variables. Elements are in ppm, and
ratios are adimensional.
Count Mean St. Dev. Min 25% 50% 75% Max
x 17074 16711.02 601.49 14971.13 16227.29 16781.55 17163.53 18050.26
y 17074 106896.80 357.86 106000.01 106601.30 106916.51 107199.63 107500.00
z 17074 2843.23 108.77 2640.03 2758.09 2839.45 2927.61 3138.85
Mn/K 17074 923.10 1616.95 9.83 96.00 189.16 1094.44 18642.86
Mg/Al 17074 0.25 0.27 0.01 0.05 0.08 0.48 3.39
Zn 17074 398.79 742.73 2.00 19.00 73.00 486.92 10000.00
Pb 17074 60.97 188.66 0.20 3.83 16.00 54.93 10000.00
Y 17074 4.18 5.53 0.05 0.49 0.92 7.47 52.20
Ga 17074 2.38 2.01 0.19 0.95 1.47 3.30 12.81
While the objective of the implementation is to define domaining using only geochemical data,
alteration is turned into a binary variable to see how field-logging is distributed in relation to sets
of three of the selected variables. For this purpose, chloritic-sericitic and potassic alteration are
combined under domain 1, and argillic alteration and quartz-sericitic alteration are combined in
domain 2 as suggested in Faraj and Ortiz, 2021. The results which are going to be obtained by the
first level of hierarchical application are expected to be similar with the binary domaining obtained
by field-logging.
It is important to notice that the natural logarithm of log-normally distributed data follows a normal
distribution. For practical purposes, natural logarithms of the variables are preferred when
displaying histograms and probability plots (Figure 4.2 & Figure 4.3). Notice that Figure 4.2 and
Figure 4.3 show that two distinct clusters of samples are clearly observable, which is reflected by
the bimodal distributions. Distribution of alteration along natural logarithms of Mn/K, Mg/Al and
Zn is shown in Figure 4.4. Figure 4.4 shows that clustering between two domains is clearly visible,
although the boundary between two clusters is not clear likely because of man-made errors. In the
following sections, optimization and domain assignment are performed for each 3-combination of
6 variables (Table 4.2).
46
Figure 4.2 Histograms - natural logarithms of variables
47
Figure 4.3 Cumulative distributions - natural logarithms of variables
48
Figure 4.4 Alteration data obtained from field logging in tri-variate space
Table 4.2 Combinations of 3 out of 6 variables
Variables Variables
1 2 3 1 2 3
1 ln(Mn/K) ln(Mg/Al) ln(Zn) 11 ln(Mg/Al) ln(Zn) ln(Pb)
2 ln(Mn/K) ln(Mg/Al) ln(Pb) 12 ln(Mg/Al) ln(Zn) ln(Y)
3 ln(Mn/K) ln(Mg/Al) ln(Y) 13 ln(Mg/Al) ln(Zn) ln(Ga)
4 ln(Mn/K) ln(Mg/Al) ln(Ga) 14 ln(Mg/Al) ln(Pb) ln(Y)
5 ln(Mn/K) ln(Zn) ln(Pb) 15 ln(Mg/Al) ln(Pb) ln(Ga)
6 ln(Mn/K) ln(Zn) ln(Y) 16 ln(Mg/Al) ln(Y) ln(Ga)
7 ln(Mn/K) ln(Zn) ln(Ga) 17 ln(Zn) ln(Pb) ln(Y)
8 ln(Mn/K) ln(Pb) ln(Y) 18 ln(Zn) ln(Pb) ln(Ga)
9 ln(Mn/K) ln(Pb) ln(Ga) 19 ln(Zn) ln(Y) ln(Ga)
10 ln(Mn/K) ln(Y) ln(Ga) 20 ln(Pb) ln(Y) ln(Ga)
49
4.1.1.2 Optimization of Parameters Defining the Distribution of Variables
As explained before, an optimization process for the parameters, which define the distribution of
variables, is required to have an accurate domaining. The library NumPy is utilized to manage the
data and functions of SciPy are used to define the minimization problem, normal distribution, and
global distribution (Table 4.3). The minimization problem is defined by Eq. (3.1), which is
expressed in the methodology section.
Table 4.3 Functions used to define the optimization problem
Library Package Function

NumPy - -
SciPy stats norm
SciPy stats percentileofscore
SciPy optimize minimize
To minimize the approximation errors, initial parameters are approximately determined by visually
inspecting the distribution of global data in histograms (Figure 4.2). Initial parameters assigned to
each variable are tabulated in (Table 4.4). Because domain 2 has a slightly larger population size,
the initial parameter w is assigned as 0.4. Then, optimizations are performed for the twenty subsets
of variables tabulated in Table 4.2. Optimization for each case approximately lasted for 5 hours.
Optimization results of the first case can be seen in Figure 4.5. Optimized parameters of each case
are tabulated in Table 4.5. All the optimum parameters are utilized in the following step of domain
assignment.
Table 4.4 Initial parameters of variables

Mn/K Mg/Al Zn Pb Y Ga
μ σ μ σ μ σ μ σ μ σ μ σ w
Domain1 8 1.5 -0.5 0.5 6.5 1 4 1 2.4 0.75 1.5 0.5 0.4
Domain2 4 1.5 -3 0.5 3 1 2 1 -0.5 0.75 0.1 0.5 0.6
50
Figure 4.5 Optimization result of first case: cumulative probability plots of variables with initial
parameters of normal distribution on the left, cumulative probability plots of variables with
optimum parameters of distributions on the right
51
Table 4.5 Optimum parameters for each subset of variables
Three - Combination of Variable 1 Variable 2 Variable 3
Variables Domain1 Domain2 Domain1 Domain2 Domain1 Domain2
# Variable1 Variable2 Variable3 μ σ μ σ μ σ μ σ μ σ μ σ w
1 ln(Mn/K) ln(Mg/Al) ln(Zn) 7.39 0.85 4.79 0.70 -0.57 0.37 -2.80 0.45 6.53 0.83 3.43 1.14 0.40
2 ln(Mn/K) ln(Mg/Al) ln(Pb) 7.42 0.83 4.81 0.71 -0.56 0.35 -2.79 0.46 4.21 1.00 1.77 1.36 0.39
3 ln(Mn/K) ln(Mg/Al) ln(Y) 7.35 0.89 4.77 0.69 -0.59 0.39 -2.82 0.43 2.11 0.55 -0.52 0.63 0.41
4 ln(Mn/K) ln(Mg/Al) ln(Ga) 7.43 0.82 4.81 0.72 -0.55 0.35 -2.79 0.46 1.36 0.50 0.07 0.43 0.39
5 ln(Mn/K) ln(Zn) ln(Pb) 7.23 1.01 4.72 0.65 6.40 0.92 3.29 1.02 4.11 1.04 1.61 1.27 0.45
6 ln(Mn/K) ln(Zn) ln(Y) 7.22 1.02 4.72 0.65 6.39 0.93 3.28 1.01 2.02 0.65 -0.59 0.56 0.45
7 ln(Mn/K) ln(Zn) ln(Ga) 7.32 0.93 4.76 0.67 6.47 0.87 3.36 1.08 1.30 0.54 0.05 0.42 0.43
8 ln(Mn/K) ln(Pb) ln(Y) 7.26 0.99 4.73 0.66 4.12 1.03 1.63 1.28 2.04 0.63 -0.57 0.58 0.44
9 ln(Mn/K) ln(Pb) ln(Ga) 7.46 0.78 4.83 0.73 4.23 0.99 1.81 1.38 1.38 0.48 0.09 0.44 0.38
10 ln(Mn/K) ln(Y) ln(Ga) 7.28 0.97 4.74 0.66 2.06 0.61 -0.56 0.59 1.28 0.56 0.04 0.41 0.44
11 ln(Mg/Al) ln(Zn) ln(Pb) -0.58 0.38 -2.81 0.44 6.51 0.84 3.41 1.12 4.18 1.01 1.73 1.33 0.41
12 ln(Mg/Al) ln(Zn) ln(Y) -0.61 0.41 -2.83 0.42 6.47 0.87 3.36 1.08 2.09 0.58 -0.54 0.61 0.42
13 ln(Mg/Al) ln(Zn) ln(Ga) -0.58 0.37 -2.81 0.45 6.52 0.84 3.42 1.13 1.34 0.52 0.06 0.42 0.41
14 ln(Mg/Al) ln(Pb) ln(Y) -0.60 0.40 -2.82 0.43 4.16 1.02 1.70 1.32 2.10 0.57 -0.53 0.62 0.42
15 ln(Mg/Al) ln(Pb) ln(Ga) -0.56 0.36 -2.79 0.46 4.20 1.00 1.76 1.35 1.35 0.51 0.07 0.43 0.40
16 ln(Mg/Al) ln(Y) ln(Ga) -0.60 0.40 -2.82 0.43 2.10 0.56 -0.53 0.63 1.32 0.53 0.05 0.42 0.42
17 ln(Zn) ln(Pb) ln(Y) 6.33 0.97 3.23 0.97 4.06 1.05 1.55 1.23 1.97 0.71 -0.62 0.53 0.47
18 ln(Zn) ln(Pb) ln(Ga) 6.19 1.08 3.11 0.86 3.96 1.09 1.41 1.16 1.14 0.65 -0.01 0.38 0.52
19 ln(Zn) ln(Y) ln(Ga) 6.35 0.96 3.25 0.98 1.99 0.69 -0.60 0.54 1.23 0.59 0.02 0.40 0.47
20 ln(Pb) ln(Y) ln(Ga) 4.09 1.04 1.60 1.26 2.01 0.66 -0.59 0.55 1.25 0.58 0.03 0.40 0.46
4.1.1.3 Domain Assignment
The same libraries and functions used for the optimization step are used for domain assignment
(Table 4.3). Moreover, a compiler of a library specifically built to run faster codes containing
NumPy is adopted at this step. The compiler JIT of Numba is employed to calculate Eq. (3.4). The
methodology section can be reviewed for details of the domain assignment process. Domain
assignment is performed for each subset of variables, and this process lasts in average 1.5 hours
for each case. Figure 4.6 clearly shows that domain assignment reached to a certain maturity in
terms of clustering for the first case. Figure 4.7 also shows that the domain assignment splits the
data into two clearly distinct clusters. The quality of the achieved assignment for each variable can
be seen in Figure 4.8. As seen in Figure 4.8, there is no perfect fit between optimum distribution
52
and the achieved domaining. The results observed in Figure 4.8 suggest that there are more than
two sub-domains within the larger domains used for this binary split, which is an expected behavior
for an ore deposit.
Figure 4.6 Improvement in domain assignment
Figure 4.7 Result of domain assignment - distribution of samples in tri-variate space
53
Figure 4.8 Domain assignment result of first case: probability plots comparing achieved and
optimal domaining
54
4.1.2 Supervised Learning and Ensemble Learning
Coordinates of composite samples and domain labels defined after the unsupervised learning step
are used as input and output data in SVC, respectively (Figure 4.9). The libraries scikit-learn and
NumPy are employed for all steps of SVC (Table 4.6). The kernel function used for SVC was
RBF. Combination of possible SVC parameters are tested with 5-fold cross-validation technique
and the parameters are determined accordingly. GridSearchCV function of scikit-learn is utilized
to perform 5-fold cross-validation. The results of 5-fold cross-validation are tabulated in Table 4.7.
Figure 4.9 Result of domain assignment process in coordinate system
Table 4.6 Functions used to perform support vector classifier
Library Package Function

NumPy - -
Scikit-learn svm SVC
Scikit-learn preprocessing StandardScaler
Scikit-learn model_selection GridSearchCV
Scikit-learn model_selection train_test_split
Scikit-learn metrics Confusion_matrix
55
Table 4.7 Scores of SVC with different combination of parameters according to 5-fold cross-
validation
Parameters Test Score Mean of St. Dev.
Index C Gamma Split1 Split2 Split3 Split4 Split5 Scores of Scores Rank
1 1 scale 0.86 0.847 0.861 0.858 0.859 0.857 0.005 27
2 1 10 0.905 0.896 0.904 0.897 0.903 0.901 0.004 6
3 1 5 0.894 0.883 0.892 0.898 0.897 0.893 0.005 12
4 1 2 0.88 0.872 0.883 0.883 0.886 0.881 0.005 18
5 1 1 0.873 0.859 0.871 0.873 0.874 0.87 0.006 21
6 1 0.1 0.852 0.84 0.852 0.852 0.852 0.85 0.005 30
7 10 scale 0.867 0.854 0.868 0.865 0.869 0.865 0.006 24
8 10 10 0.911 0.896 0.907 0.9 0.908 0.904 0.005 3
9 10 5 0.904 0.894 0.9 0.898 0.904 0.9 0.004 7
10 10 2 0.89 0.883 0.888 0.89 0.898 0.89 0.005 14
11 10 1 0.88 0.87 0.882 0.879 0.885 0.879 0.005 19
12 10 0.1 0.858 0.847 0.858 0.857 0.857 0.855 0.004 29
13 50 scale 0.872 0.856 0.867 0.87 0.873 0.868 0.006 23
14 50 10 0.91 0.9 0.906 0.905 0.903 0.905 0.004 1
15 50 5 0.909 0.895 0.909 0.902 0.904 0.904 0.005 4
16 50 2 0.894 0.884 0.887 0.893 0.897 0.891 0.005 13
17 50 1 0.886 0.875 0.884 0.885 0.889 0.884 0.005 17
18 50 0.1 0.861 0.846 0.86 0.86 0.857 0.857 0.005 28
19 100 scale 0.873 0.858 0.868 0.872 0.874 0.869 0.006 22
20 100 10 0.906 0.9 0.908 0.899 0.901 0.903 0.003 5
21 100 5 0.911 0.896 0.906 0.902 0.907 0.905 0.005 2
22 100 2 0.898 0.886 0.89 0.897 0.898 0.894 0.005 11
23 100 1 0.889 0.879 0.886 0.888 0.892 0.887 0.004 16
24 100 0.1 0.861 0.849 0.861 0.857 0.862 0.858 0.005 26
25 1000 scale 0.878 0.862 0.873 0.877 0.879 0.874 0.006 20
26 1000 10 0.898 0.892 0.905 0.893 0.889 0.895 0.005 10
27 1000 5 0.904 0.894 0.903 0.902 0.895 0.9 0.004 8
28 1000 2 0.901 0.889 0.899 0.899 0.902 0.898 0.005 9
29 1000 1 0.892 0.879 0.886 0.887 0.895 0.888 0.005 15
30 1000 0.1 0.866 0.85 0.864 0.865 0.867 0.862 0.006 25
While applying SVC, the data was split into training and validation sets with 25% validation ratio,
and then one third of the training data are left out for bagging. Predictions were made along a
determined mesh grid (Table 4.8), and each point of the mesh corresponds to a node of the model
centered in a volume of 15x15x10 cubic meter with a total of 2.3 billion cubic meter for the entire
model. Considering an average rock density is 2.7 tonne/m3 modelled volume is equivalent to 6.3
billion tonne material. SVC is performed for each subset of variables, which are in total 20, and
for each subset of samples, which are again 20. In total, SVC is performed 400 times. The node
56
model determined when applying SVC to the variables Mg/Al, Mn/K and Zn is illustrated in Figure
4.10, and the ensemble model, which is the average of all SVC models for all sets of variables, is
illustrated in Figure 4.11. Vertical cross-sections overlaid by composite samples (Figure 4.12)
confirmed that the ensemble model represents boundaries well.
Table 4.8 Mesh grid along which predictions made

Range Number of points
Easting 15000 18000 201
Northing 106000 107500 101
Elevation 2640 3140 51
Figure 4.10 Node model of the variables Mg/Al, Mn/K and Zn
57
Figure 4.11 Final ensemble model
Figure 4.12 Vertical cross-sections taken along North axis of the SVC Ensemble model
58
The accuracy of all node models is tracked using the balanced-accuracy technique Eq. (2.11). All
models show high accuracy of approximately 90% (Figure 4.13). The ensemble model is
categorized back to binary values according to the global proportion of domains which is
0.41/0.59, so that an overall accuracy can be quantified. To provide the original proportion of the
domains, 1.4 is decided as threshold value (Figure 4.14), meaning that if the average of domain
labels is equal or less than 1.4, then the block becomes domain 1, else it becomes domain 2 (Figure
4.15). Total mass of domain 1 of the final model is equivalent to 2.6 billion tonnes of material, and
that of domain 2 is equivalent to 3.7 billion tonnes of material.
The domains of nodes closest to the validation samples are found using nearest neighbor method,
so that the categorized ensemble model can be compared with the validation dataset. Finally, a
91.3% balanced accuracy of the proposed ensemble model is obtained from the resulting confusion
matrix (Table 4.9).
Figure 4.13 Boxplots showing the distribution of accuracies for each model according to subset
of variables. Notice that vertical axis of the graph is limited from bottom and top, 0.89 and 0.92
respectively.
59
Figure 4.14 Categorization of Ensemble learning according to optimum proportion of domains
60
Figure 4.15 Categorization of Ensemble model – Horizontal profiles
61
Table 4.9 Confusion matrix of categorized ensemble model
Domain
1 0.898 0.102
True
2 0.071 0.929
1 2
Predicted Domain
4.1.3 Comparison with BlockSIS
BlockSIS is applied on the labelled data and implements simple indicator kriging with local
proportions, to determine the conditional distribution of categories at every node. Firstly, the local
proportion within a neighborhood is calculated using the k-means clustering method along the grid
specified in Table 4.8. An average of the closest 8 samples within 50 meters distance is used as
local proportion. The global average proportion is assigned to points when there are no samples
within 50 meters distance. The global proportions imposed on the program were 0.41/0.59.
Secondly, the spatial continuity of the domains is defined with indicator variogram models.
Horizontal continuity is analyzed with experimental variograms created along 12 directions from
Azimuth 0° to Azimuth 165° (Figure 4.16). Azimuth 75°, Azimuth 165° and vertical directions are
determined as orthogonal directions with maximum anisotropy. Variogram models were built
fitting the experimental variograms (Figure 4.17) with 0.05 nugget effect and 3 spherical structures
Eq. (4.1). The number of realizations was defined as 20. The rest of the details can be seen in
Figure 4.18 showing the parameter file of BlockSIS.
62
Figure 4.16 Experimental variograms created along horizontal direction
Figure 4.17 Variogram models built along chosen directions
63
𝛾𝐴−75° (ℎ) = 0.05 ∙ 𝑁𝑢𝑔(ℎ) + 0.63 ∙ 𝑆𝑝ℎ850 (ℎ) + 0.21 ∙ 𝑆𝑝ℎ55 (ℎ) + 0.11 ∙ 𝑆𝑝ℎ600 (ℎ)
𝛾𝐴−165° (ℎ) = 0.05 ∙ 𝑁𝑢𝑔(ℎ) + 0.63 ∙ 𝑆𝑝ℎ5000 (ℎ) + 0.21 ∙ 𝑆𝑝ℎ110 (ℎ) + 0.11 ∙ 𝑆𝑝ℎ20 (ℎ) (4.1)
𝛾𝑉𝑒𝑟𝑡 (ℎ) = 0.05 ∙ 𝑁𝑢𝑔(ℎ) + 0.63 ∙ 𝑆𝑝ℎ1500 (ℎ) + 0.21 ∙ 𝑆𝑝ℎ250 (ℎ) + 0.11 ∙ 𝑆𝑝ℎ40 (ℎ)
Figure 4.18 Parameter file of BlockSIS
64
Two scenarios were created with image cleaning; one is with light image cleaning and the other is
with heavy image cleaning options (Figure 4.19 & Figure 4.20). Pointy results occur because local
proportions are imposed on the simulation. Increasing the search neighborhood of the k-means
clustering method can smooth the pointy appearance, but leads to little uncertainty in the
boundaries between categories.
It can be observed from the results that as the intensity of image cleaning increases, models look
like object-based models. Moreover, the model obtained from BlockSIS with heavy image
cleaning is closer to the that of the Ensemble model, therefore comparison and validation is
performed using heavy image cleaning.
Figure 4.19 Block models of BlockSIS with light image cleaning and heavy image cleaning
65
Figure 4.20 Effect of image cleaning on BlockSIS – horizontal profiles
Cross-sections overlaid by composite samples (Figure 4.21) show that BlockSIS with heavy image
cleaning has less uncertainty along boundaries in comparison to the Ensemble model. Moreover,
extrapolation in BlockSIS is limited, resulting in the global proportions in areas without data.
Accuracy is assessed with the balanced accuracy technique again for each realization (Table 4.10).
To make the final E-type model comparable, values are categorized back to domains considering
the original proportion of domains (Figure 4.22 & Figure 4.23). Just like with the Ensemble model,
domains of closest blocks are assigned to validation data samples using the nearest neighbor
method, so that the categorized BlockSIS model can be compared with the validation dataset. The
66
balanced accuracy of the categorized E-type model obtained from the confusion matrix (Table
4.11) is 92.5%.
Figure 4.21 BlockSIS with heavy image cleaning - Vertical cross-sections taken along North axis
67
Table 4.10 Confusion matrices of all realizations with balanced accuracy
Confusion Matrix (Row, Column) Balanced
Realizations (0,0) (0,1) (1,0) (1,1) Accuracy
1 1319 187 100 2642 0.920
2 1325 181 123 2619 0.917
3 1331 175 128 2614 0.919
4 1326 180 108 2634 0.921
5 1333 173 138 2604 0.917
6 1324 182 119 2623 0.918
7 1308 198 114 2628 0.913
8 1300 206 119 2623 0.910
9 1295 211 95 2647 0.913
10 1324 182 135 2607 0.915
11 1331 175 127 2615 0.919
12 1339 167 131 2611 0.921
13 1308 198 123 2619 0.912
14 1333 173 148 2594 0.916
15 1317 189 125 2617 0.914
16 1312 194 114 2628 0.915
17 1328 178 126 2616 0.918
18 1309 197 120 2622 0.913
19 1322 184 115 2627 0.918
20 1313 193 108 2634 0.916
Figure 4.22 Categorization of BlockSIS model with heavy image cleaning
68
Figure 4.23 Categorization of BlockSIS – horizontal profiles
Table 4.11 Confusion matrix of categorized BlockSIS model
Domain
1 0.889 0.111
True
2 0.046 0.954
1 2
Predicted Domain
69
It is clear that both Ensemble model and BlockSIS model have high accuracy and the difference
of accuracy between the two models is small. The reason why accuracies are high is because the
validation set is determined by extracting individual composite samples from the original data.
This means that the validation data are always very close to neighboring composite samples, which
condition the block at the validation sample location very strongly. If the validation set was defined
by removing entire drillholes instead of sample, the accuracy would be much lower.
While there is no clear difference between the two models, there are some details worth
mentioning. First, when the two models are compared (Figure 4.24), Ensemble model depicts
uncertainty better especially at boundaries with smoother transitions. Second, the Ensemble model
provides extrapolations that follow the geological shape of the orebodies. Third, the pointy
appearance of BlockSIS caused using a local mean can be diminished if the neighborhood size of
the k-means clustering is increased, however this may reduce the local accuracy of the model and
increase smoothing.
Figure 4.24 Comparison of two models with vertical cross-sections
70
4.2 Second Level of Hierarchical Application
Each domain obtained from first level of hierarchical application are further split into two domains.
Domain 1 is split into domains 10 and 20, and domain 2 is split into domains 30 and 40. The
methodology followed for the first level are followed in the second level of hierarchical
application. While domain 10 and 20 represent chlorite-sericite group and potassic group
respectively, domain 30 and 40 represent quartz-sericite and argillic alteration groups respectively.
4.2.1 Unsupervised Clustering
Out of 17074 samples, 6142 samples were labelled as domain 1 and the rest were labelled as
domain 2 at first level of hierarchical application. Unsupervised clustering is performed using
samples of each domain separately.
4.2.1.1 Data Preparation and Exploratory Data Analysis
Best variables representing sub-domains of domain 1 and domain 2 are determined with
exploratory data analysis. For each domain, four variables are determined which has similar
distribution characteristics. The variables used for domain 1 are yttrium (Y), molybdenum (Mo),
rhenium (Re) and magnesium divided by aluminum (Mg/Al) (Table 4.12 and Figure 4.25).
Histograms of variables in Figure 4.25 clearly show that data has negative skewness which can be
the sign of two domains, one of which located at negative side has a smaller proportion.
71
Table 4.12 Descriptive statistics of variables chosen for domain 1. Elements are in ppm, and
ratio is adimensional
St.
Count Mean Dev. Min 25% 50% 75% Max
x 6142 16181.52 498.82 14971.13 15904.13 16120.12 16395.51 18050.26
y 6142 106931.66 370.18 106000.01 106632.29 106982.95 107254.12 107500.00
z 6142 2834.24 104.20 2640.03 2753.45 2828.08 2911.87 3106.07
Y 6142 9.83 5.43 0.88 6.28 9.54 12.22 52.20
Mo 6142 88.64 92.36 1.30 40.08 70.53 112.56 1870.00
Re 6142 0.34 0.36 0.00 0.12 0.25 0.44 5.73
Mg/Al 6142 0.57 0.19 0.09 0.44 0.59 0.71 1.09
Figure 4.25 Variables chosen for sub-domains of domain 1
72
The variables used for domain 2 are zinc (Zn), indium (In), tellurium (Te) and manganese divided
by potassium (Mn/K) (Table 4.13 and Figure 4.26). Likewise the variables chosen for domain 1,
the histograms of variables chosen for domain 2 also show skewness (Figure 4.26) which can be a
sign of data consisting of two domains, one of which is located at the positive side this time has
smaller proportion. Optimization and domain assignment steps of two domains are performed for
each 3 combinations of 4 variables (Table 4.14).
Table 4.13 Descriptive statistics of variables chosen for domain 2. Elements are in ppm, and
ratio is adimensional
St.
Count Mean Dev. Min 25% 50% 75% Max
x 10932 17008.52 423.36 15500.36 16749.28 17034.09 17319.83 17853.72
y 10932 106877.22 349.24 106000.01 106599.90 106900.23 107152.00 107500.00
z 10932 2848.29 110.95 2640.29 2761.28 2851.13 2935.05 3138.85
Zn 10932 92.64 257.88 2.00 14.00 26.00 64.00 8283.00
In 10932 0.22 0.37 0.01 0.06 0.12 0.24 6.15
Te 10932 1.16 2.51 0.03 0.26 0.48 1.01 61.18
Mn/K 10932 166.68 269.70 9.83 70.56 114.01 177.24 10437.81
73
Figure 4.26 Variables chosen for sub-domains of domain 2
Table 4.14 Combinations of 3 out of 4 variables chosen for domain 1 and domain 2
Variables of Domain 1 Variables of Domain 2

1 2 3 1 2 3
1 ln(Y) ln(Mo) ln(Re) 1 ln(Zn) ln(In) ln(Te)
2 ln(Y) ln(Mo) ln(Mg/Al) 2 ln(Zn) ln(In) ln(Mn/K)
3 ln(Y) ln(Re) ln(Mg/Al) 3 ln(Zn) ln(Te) ln(Mn/K)
4 ln(Mo) ln(Re) ln(Mg/Al) 4 ln(In) ln(Te) ln(Mn/K)
4.2.1.2 Optimization of Parameters Defining the Distribution of Variables
Initial parameters are approximately determined by visual inspection on the distribution of global
data. Initial parameters assigned to each variable and optimum parameters are tabulated in Table
4.15, Table 4.16, Table 4.17 and Table 4.18. One of the results is illustrated for both domain 1 and
domain 2 in Figure 4.27 and Figure 4.28 respectively.
74
Table 4.15 Initial parameters of variables chosen for domain 1
Y Mo Re Mg/Al
μ σ μ σ μ σ μ σ w
Domain10 0.7 0.5 2 0.5 -4.2 0.5 -2.5 0.5 0.15
Domain20 2.4 0.5 4.4 0.6 -1.5 1 -0.4 0.4 0.85
Table 4.16 Initial parameters of variables chosen for domain 2

Zn In Te Mn/K
μ σ μ σ μ σ μ σ w
Domain30 2.9 0.6 -2.2 0.7 -1 0.7 4.7 0.8 0.85
Domain40 6 0.5 0 0.5 0.5 2 6.5 0.5 0.15
Table 4.17 Optimum parameters for each subset of variables chosen for domain 1
1 ln(Y) ln(Mo) ln(Re) 1.36 0.50 2.38 0.32 3.25 1.29 4.40 0.59 -2.73 1.47 -1.18 0.72 0.26
2 ln(Y) ln(Mo) ln(Mg/Al) 1.09 0.91 2.35 0.34 3.60 1.28 4.40 0.56 -1.78 0.94 -0.46 0.23 0.39
3 ln(Y) ln(Re) ln(Mg/Al) 1.79 0.76 2.38 0.28 -2.03 1.46 -1.16 0.61 -0.91 0.46 -0.41 0.18 0.45
4 ln(Mo) ln(Re) ln(Mg/Al) 3.54 1.25 4.41 0.55 -2.34 1.53 -1.17 0.68 -1.06 0.42 -0.42 0.20 0.34
Table 4.18 Optimum parameters for each subset of variables chosen for domain 2
1 ln(Zn) ln(In) ln(Te) 2.81 0.62 4.26 1.23 -2.59 0.76 -1.51 0.95 -1.10 0.68 -0.04 1.14 0.53
2 ln(Zn) ln(In) ln(Mn/K) 2.79 0.55 3.98 1.29 -2.63 0.74 -1.69 0.99 4.79 0.43 4.66 1.02 0.42
3 ln(Zn) ln(Te) ln(Mn/K) 2.79 0.58 4.09 1.27 -1.12 0.65 -0.16 1.15 4.78 0.46 4.66 1.06 0.47
4 ln(In) ln(Te) ln(Mn/K) -2.63 0.74 -1.67 0.99 -1.12 0.63 -0.21 1.15 4.79 0.44 4.66 1.03 0.44
75
Figure 4.27 Optimization of domain 1 with variables Y, Mo and Re: cum. probability plots of
variables with initial parameters of normal dist. on the left, cum. probability plots of variables
with optimum parameters of distributions on the right.
76
Figure 4.28 Optimization of domain 2 with variables Zn, In and Te: cum. probability plots of
variables with initial parameters of normal dist. on the left, cum. probability plots of variables
with optimum parameters of distributions on the right.
77
4.2.1.3 Domain Assignment
Domain assignment is applied for each subset of variables of both domain 1 and domain 2. Domain
1 is split into two domains as domain 10 and domain 20 (Figure 4.29), and domain 2 is split into
two domains as domain 30 and domain 40 (Figure 4.30) according to optimum cumulative
distributions.
and optimal domaining for the case with variables Y, Mo and Re
78
and optimal domaining for the case with variables Zn, In and Te
79
4.2.2 Supervised Learning and Ensemble Learning
Supervised learning is applied on subdomains of domain 1 and domain 2, which are obtained from
the domain assignment step, separately. First, Ensemble models are built for domain 1 and domain
2, and then domaining of ensemble models are constrained to the first-level categorized ensemble
model (Figure 4.31). Accuracies of SVM models built for domain 1 range from 0.72 to 0.81, and
SVM models built for domain 2 range from 0.65 to 0.71 (Figure 4.32).
The average of global proportions, which are the optimized distributions from the unsupervised
classification step, is used to determine the categorization threshold of the Ensemble model (Figure
4.33). Again, the average of the assigned categories for each sample is categorized into four
domains to assess the accuracy of the Ensemble model (Figure 4.34). Assuming density of the
rocks is 2.7 tonne/m3, total mass of domains 10, 20, 30 and 40 are 1 billion tonne, 1.6 billion tonne,
1.7 billion tonne and 2 billion tonne, respectively. The confusion matrix was built for the final
model, now considering the four categories (Table 4.19). While domain-wise accuracies range
between 0.56 and 0.90, the balanced accuracy of the final model is 0.73. Vertical and horizontal
cross-sections are illustrated in Figure 4.35 and Figure 4.36, respectively. As seen in Figure 4.35,
there are some misclassifications in the final model. Moreover, it can be observed from the cross-
sections that continuity along vertical direction is higher than that of horizontal direction.
80
Figure 4.31 Methodology followed to realize hierarchical ensemble model
81
Figure 4.32 Boxplots showing balanced accuracies of SVM models
82
Figure 4.33 Categorization of ensemble model
83
Figure 4.34 Result of second-level hierarchical domain assignment
Table 4.19 Confusion matrix of final ensemble model

True Domain
10 0.557 0.255 0.083 0.106

20 0.049 0.904 0.034 0.013
30 0.026 0.015 0.799 0.160
40 0.033 0.057 0.263 0.648
10 20 30 40
Predicted Domain
84
Figure 4.35 Vertical cross-sections taken from final ensemble model
85
Figure 4.36 Two hierarchical ensemble models along horizontal benches
4.2.3 Comparison with BlockSIS
BlockSIS model is built following the same methodology which was followed in the first level of
the hierarchical application, but it is not constrained by the binary model initially built. Results of
the domain assignment step are employed to condition the model. First, indicator variograms are
built for each individual domain against the rest of the domains (Figure 4.37 and Eq. (4.2).
BlockSIS model is built imposed local proportions and with heavy image cleaning. As was the
case in the first-level hierarchical application, gridded prior proportions were computed using
86
Nearest Neighbor method with 50 meters distance. The global proportions determined for the
domains of model from domain 10 to domain 40 were 0.158, 0.247, 0.274 and 0.321 respectively.
Twenty realizations were performed.
Figure 4.37 Variogram models built along chosen directions for each sub-domain
87
Domain 10 Variogram models
𝛾𝐴−120° (ℎ) = 0.1 ∙ 𝑁𝑢𝑔(ℎ) + 04 ∙ 𝑆𝑝ℎ120 (ℎ) + 0.3 ∙ 𝑆𝑝ℎ40 (ℎ) + 0.2 ∙ 𝑆𝑝ℎ1000 (ℎ)
(4.2)
Next, average of global proportions is used to assign the four categories. Because local proportions
are imposed on the model, some zones with drillholes samples were modelled deterministically,
therefore global proportions were also affected by the results (Figure 4.38). The E-type model is
categorized considering deterministic values are correctly estimated since the zones with
deterministic values are highly populated by drillhole samples, however it is important to notice
that this choice comes with the neglection of the global proportion of the model (Figure 4.38-a).
The confusion matrix is built for the categorized model (Table 4.20). The accuracy of the model
for each domain ranges between 0.482 and 0.854, and the balanced accuracy of the model is 0.694
which is slightly less than the Ensemble model. If E-type model was categorized back to four
88
domains considering the global proportions, the balanced accuracy would decrease to 54.4%
(Table 4.21 and Figure 4.38-b), which would be a substantial decrease.
89
Figure 4.38 Cumulative probability of E-Type BlockSIS model. Imposing local proportion on
BlockSIS increases the deterministic zones (red dashed circles). Assuming deterministic zones
are correctly estimated: correctly (a) and wrongly (b) assigned samples when global proportion
is neglected and considered, respectively.
90
Table 4.20 Confusion matrix of BlockSIS considering deterministic zones.
True Domain
10 0.631 0.190 0.050 0.129
20 0.164 0.810 0.013 0.013
30 0.012 0.028 0.482 0.478
40 0.022 0.048 0.076 0.854
10 20 30 40
Predicted Domain
Table 4.21 Confusion matrix of BlockSIS considering global proportions

True Domain
10 0.691 0.131 0.05 0.129

20 0.787 0.187 0.012 0.014
30 0.016 0.024 0.447 0.513
40 0.026 0.044 0.069 0.861
10 20 30 40
Predicted Domain
Vertical cross-sections taken along North axis are illustrated in Figure 4.39. As seen in Figure
4.39, the effect of simulation is observable especially in zones where samples are scarce.
Compared to BlockSIS model of the first-level hierarchical application, the model gave similar
results (Figure 4.40). As was the case in the first level of the application, results of the BlockSIS
model have more alternating zones than the Ensemble model (Figure 4.41). In zones where there
91
are scarce drillhole samples, BlockSISmodel has pointy and discontinuous domains, whereas the
Ensemble model remains continuous (Figure 4.42).
Figure 4.39 Three consecutive cross-sections of BlockSIS model taken along north
92
Figure 4.40 Comparison of BlockSIS models along horizontal benches with two categories (left)
and with four categories (right)
93
Figure 4.41 Final ensemble model (left) and BlockSIS model (right) along horizontal benches
94
Figure 4.42 Vertical cross-sections of final ensemble model (top) and BlockSIS model (bottom)
95
Chapter 5 Conclusions and Recommendations
The main goal of the study was to propose a workflow to build a semi-autonomous domain model
for resource estimation, using a geochemical dataset. The motivation behind the goal was to
employ multiple variables that are linked to the geological domains in a consistent and semi-
autonomous way. This supports the construction of geological models that constrain the estimation
of the elements of interest in the resource model. This motivation was realized by adopting the
unsupervised classification method proposed by Faraj and Ortiz (2021) for the first step of the
study. Although the unsupervised classification proposed by Faraj and Ortiz (2021) is open to
developments, it was capable to cluster the data into distinct geological domains successfully.
There are some points worth to mention about the first step of the study. Global proportions during
the optimization of the parameters that define the distributions do not consider the spatial
distribution of the samples, but only the statistical distribution of the underlying variables. Using
global proportions obtained by optimization in the subsequent supervised classification step may
cause that the final model is biased towards oversampled zones. Correcting this issue can
significantly improve the Ensemble model. A possible solution can be proposed by applying
declustering techniques to the samples.
Another point related to unsupervised classification is choosing the right variables and determining
the importance of each variable. Investigating data assuming variables are log-normally distributed
increases the accuracy of the unsupervised classification substantially. However, it is important to
notice that some variables may affect domaining more compared to the rest (Faraj and Ortiz, 2021).
The unsupervised algorithm which considers the weight of variables can further improve the
domaining. A final consideration about the unsupervised classification can be developing an
algorithm which can model more than two domains, so that the hierarchical application is no longer
96
required. However, optimization of multiple domains may require a complex algorithm which can
be the focus of future studies.
Successful results were also obtained from supervised learning and ensemble learning, the next
step of the study. SVC was employed as a classifier, and the domaining obtained by SVC was
curvy and well-defined, which was another motivation of the study. SVC offers flexible models in
terms of continuity and shape of the domain thanks to the two parameters controlling the algorithm.
An optimum choice of parameters balancing the accuracy of the model and the expected shape of
domains can yield a successful final model. Reproducing the model with Ensemble learning further
improved the result, and thus results were able to define the spatial uncertainties. Accuracies of
Ensemble models were very high. However, it should be noticed that accuracy would be less if
training data was built considering drillholes rather than individual samples. One of the most
important challenges both on the first and second step of the study was the computing time.
Especially as the number of variables increases, building an Ensemble model becomes inefficient
in terms of time. Therefore, any improvement on algorithms to decrease the time spent on each
step can make the workflow more appealing for further studies.
The hierarchical application was another challenge for the proposed workflow. The most important
issue with the hierarchical application is building subdomains with an assumption that the
preceding domains are correct. However, when there are odd-numbered domains in a population,
the hierarchical application is not capable to assign true domaining using binary classification. In
such conditions, decisions made by users to merge two subdomains, which are previously split
into different branches of the hierarchical tree, can be a solution to the problem, but this may
require labor-intensive work and moreover model becomes prone to subjective errors. There is
also an accuracy issue related to the hierarchical application. It can be observed that the accuracy
97
decreases as the number of hierarchical levels increases. Another issue with the hierarchical
application is that the variables used for the preceding hierarchical level may not be efficient for
the descending level and using a different subset of variables comes with other challenges. When
a different subset of variables is employed for the descending level, then model must be built
according to the first level categorized ensemble model, which give rise to inconsistencies in zones
with high variation. Considering these two problems of hierarchical application, a model capable
of dealing with multiple domains in one step would be advantageous rather than the hierarchical
application.
In conclusion, the workflow proposed in this study stands as a robust alternative to traditional
modeling techniques, although there are issues that require further improvements. An ensemble
model informed by an unsupervised classification can be a promising roadmap to follow for the
field of resource estimation. The results of the case study proved that the workflow is capable to
build a robust model with high accuracy and with well-defined, curvy boundaries. Moreover, the
applications adopted for each step can be replaced by other alternative applications according to
the objectives of the studies.
98
References
Adams, L. M., Nazareth, J. L., (1996). Linear and Nonlinear Conjugate Gradient-related Methods.
AMS-IMS-SIAM Joint Summer Research Conference. Society for Industrial and Applied
Mathematics. Philadelphia
Agrawal, R., Imielinski, T., Swami, A., (1993). Mining Association Rules between Sets of Items
in Large Databases. ACM SIGMOD Rec. 22. 207-216. 10.1145/170035.170072.
Boisvert, J., (2010). Geostatistics with Locally Varying Anisotropy (Doctoral thesis, University of
Alberta, Edmonton, Canada). Available from ERA: Education and Research Archive of University
of Alberta. (https://doi.org/10.7939/R31X5C)
Bentley M., Ringrose P. (2021). The Rock Model. In: Reservoir Model Design. Springer, Cham.
https://doi.org/10.1007/978-3-030-70163-5_2
Bishop, Cristopher M., 2006. Kernel Methods & Sparse Kernel Machines. Pattern Recognition
and Machine Learning. Springer Science+Business Media, LLC, Singapore.
Cevik S. I., Ortiz J. M., (2020-a). Machine learning applied in mineral resource sector: An
overview, Predictive Geometallurgy and Geostatistics Lab, Queen’s University, Annual Report
2020, paper 2020-07, 106-129.
Cevik S. I., Ortiz J. M., (2020-b). Machine learning applied in mineral resource sector: An
overview, Predictive Geometallurgy and Geostatistics Lab, Queen’s University, Annual Report
2020, paper 2020-07, 106-129.
Christopher, A. (2021). K-Nearest Neighbor. medium.com, February 2, 2021. accessed June 2022,
https://medium.com/swlh/k-nearest-neighbor-ca2593d7a3c4
99
Deutsch C. V., (2003). Geostatistics, Encyclopedia of Physical Science and Technology, Third
Edition, pp. 697-707.
Deutsch C. V., (2006). A Sequential Indicator Simulation Program for Categorical Variables with
Point and Block Data: BlockSIS, Computer & Geosciences, Volume 32, Issue 10, 1669-1681,
https://doi.org/10.1016/j.cageo.2006.03.005.
Deutsch, C.V. and Journel, A.G., (1997). GSLIB Geostatistical Software Library and User’s
Guide, Oxford University Press, New York, second edition. 369 pages.
Dobilas, S., (2021). Apriori Algorithm for Association Rule Learning — How To Find Clear Links
Between Transactions. Towards Data Science, July 10, 2021, accessed April 2022,
https://towardsdatascience.com/apriori-algorithm-for-association-rule-learning-how-to-find-
clear-links-between-transactions-bf7ebc22cf0a
Faraj, F., Ortiz, J. M., (2021). A Simple Unsupervised Classification Workflow for Defining
Geological Domains Using Multivariate Data. Mining, Metallurgy & Exploration 38, 1609-1623,
https://doi.org/10.1007/s42461-021-00428-5.
Galli A., Beucher H., (1997). Stochastic models for reservoir characterization: a user-friendly
review. Paper presented at the Latin American and Caribbean Petroleum Engineering Conference,
Rio de Janeiro, Brazil, August 1997. doi: https://doi.org/10.2118/38999-MS.
Gautier, L., (2016). Iterative Thickness Regularization of Stratigraphic Layers in Discrete Implicit
Modeling. Mathematical Geosciences, Springer Verlag, 2016, 10.1007/s11004-016-9637-y. hal-
01338981.
100
Géron, A., (2017). Support Vector Machines. Hands-On Machine Learning with Scikit-Learn,
Keras, and TensorFlow., 2nd Edition, O'Reilly Media, Inc.
Gwynn, X. P., Brown M. C., Mohr P. J., (2013). Combined use of traditional core logging and
televiewer imaging for practical geotechnical data collection. In Proceedings of the 2013
International Symposium on Slope Stability in Open Pit Mining and Civil Engineering: 261-272.
Australian Centre for Geomechanics
Guo J., Wang J., Wu L., Liu C., Li C., Li F., Lin M., Jessell M. W., Li P., Dai X., Tang J., (2020).
Explicit-implicit-intergrated 3-D geological modelling approach: A case study of the Xianyan
Demolition Volcano (Fujian, China). Tectonophysics 795, 228648.
Hassanpour, R. M., Deutsch, C. V., (2010). An Introduction to Grid-Free Object-based Facies
Modeling. Centre for Computational Geostatistics Report 12, 107. University of Alberta, Canada.
Hillier, M.J., de Kemp, E.A., and Schetselaar, E.M., 2017. Implicit 3-D modelling of geological
surfaces with the Generalized Radial Basis Functions (GRBF) algorithm; Geological Survey of
Canada, Open File 7814, 15 p. https://doi.org/10.4095/301665
Hodkiewicz, P. F., (2014). Rapid Resource Modelling – A Vision for Faster and Better Mining
Decisions. Ninth International Mining Geology Conference / Adelaide, SA. 18-20 August 2014.
Hotelling, H. (1933) Analysis of a complex of statistical variables into principal components.
Journal of Educational Psychology, 24, 417-441. http://dx.doi.org/10.1037/h0071325
Isaaks, E. H., Srivastava, R. M., (1989). An Introduction to Applied Geostatistics. Oxford
University, New York, New York, USA.
101
James, G., Witten, D., Hastie, T., Tibshirani, R., (2013). An Introduction to Statistical Learning
with Applications in R. Springer, New York, USA.
Jordan, M. I., Mitchell, T. M., (2015). Machine Learning: Trends, perspectives, and prospects.
Science 349, 255-260. DOI: 10.1126/science.aaa8415
Kumbhare, T. A., Chobe, S. V., (2014). An Overview of Association Rule Mining Algorithms.
IJCSIT, International Journal of Computer Science and Information Technology Vol. 5 (1), 927-
930.
MacCormack, K.E., Thorleifson, L.H., Berg, R.C. and Russell, H.A.J., ed. (2016): Three-
Dimensional Geological Mapping: Workshop Extended Abstracts; Geological Society of America
Annual Meeting, Baltimore, Maryland, October 31, 2015; Alberta Energy Regulator, AER/AGS
Special Report 101, 99 p.
Newell, A. J., (2018). Implicit Geological Modelling: a new approach to 3D volumetric national-
scale geological models. British Geological Survey Internal Report, OR/19/004. 37pp.
Ortiz J. M., (2019). Geometallurgical modeling framework, Predictive Geometallurgy and
Geostatistics Lab, Queen’s University, Annual Report 2019, paper 2019-01, 6-16.
Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., Blondel M.,
Prettenhofer P., Weiss R., Dubourg V., Vanderplas J., Passos A., Cournapeau D., Brucher M.,
Perrot M., Duchesnay E., (2011). Scikit-learn: Machine Learning in Python. Journal of Machine
Learning Research, vol. 12, pp. 2825-2830.
Platt, JC., 1999. Probabilistic Outputs for Support Vector Machines and Comparisons to
Regularized Likelihood Methods. MIT Press.
102
Rocca, J., (2019). Ensemble methods: bagging, boosting and stacking. Towards Data Science.
Retrieved from https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-
stacking-c9214a10a205
Rossi, M. E., Deutsch, C. V. (2016). Mineral resource estimation. SPRINGER.
Shen C, Laloy E, Albert A, Chang F-J, Elshorbagy A, Ganguly S, Hsu K-L, Kifer D, Fang Z, Fang
K, Li D, Li X, Tsai W-P (2018). HESS Opinions: Deep learning as a promising avenue towards
knowledge discovery in water sciences, Hydrology and Earth System Sciences Discussions,
doi.org/10.5194/hess-2018-168.
Shewchuk J. R., (1994). An Introduction to the Conjugate Gradient Method without the Agonizing
Pain, School of Computer Science Carnegie Mellon University, Pittsburgh.
Sillitoe, R. H., (2010). Porphyry copper systems. Econ Geol 105(1):3–41
Tipping, Michael E., 2000. The relevance vector machine. Advances in neural information
processing systems, pp. 652 – 658.
van der Maaten, L and Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine
Learning Research. 9. 2579-2605.
103
Appendices
1. Optimization of the case with variables Mn/K, Mg/Al and Pb: cumulative probability plots
of variables with initial parameters of normal distribution on the left, cumulative
probability plots of variables with optimum parameters of distributions on the right.
104
2. Domain assignment of the case with variables Mn/K, Mg/Al and Pb: probability plots
comparing achieved and optimal domaining.
105
3. Optimization of the case with variables Mn/K, Mg/Al and Y: cumulative probability plots
106
4. Domain assignment of the case with variables Mn/K, Mg/Al and Y: probability plots
107
5. Optimization of the case with variables Mn/K, Mg/Al and Ga: cumulative probability plots
108
6. Domain assignment of the case with variables Mn/K, Mg/Al and Ga: probability plots
109
7. Optimization of the case with variables Mn/K, Zn and Pb: cumulative probability plots of
variables with initial parameters of normal distribution on the left, cumulative probability
plots of variables with optimum parameters of distributions on the right.
110
8. Domain assignment of the case with variables Mn/K, Zn and Pb: probability plots
111
9. Optimization of the case with variables Mn/K, Zn and Y: cumulative probability plots of
112
10. Domain assignment of the case with variables Mn/K, Zn and Y: probability plots
113
11. Optimization of the case with variables Mn/K, Zn and Ga: cumulative probability plots of
114
12. Domain assignment of the case with variables Mn/K, Zn and Ga: probability plots
115
13. Optimization of the case with variables Mn/K, Pb and Y: cumulative probability plots of
116
14. Domain assignment of the case with variables Mn/K, Pb and Y: probability plots
117
15. Optimization of the case with variables Mn/K, Pb and Ga: cumulative probability plots of
118
16. Domain assignment of the case with variables Mn/K, Pb and Ga: probability plots
119
17. Optimization of the case with variables Mn/K, Y and Ga: cumulative probability plots of
120
18. Domain assignment of the case with variables Mn/K, Y and Ga: probability plots
121
19. Optimization of the case with variables Mg/Al, Zn and Pb: cumulative probability plots of
122
20. Domain assignment of the case with variables Mg/Al, Zn and Pb: probability plots
123
21. Optimization of the case with variables Mg/Al, Zn and Y: cumulative probability plots of
124
22. Domain assignment of the case with variables Mg/Al, Zn and Y: probability plots
125
23. Optimization of the case with variables Mg/Al, Zn and Ga: cumulative probability plots of
126
24. Domain assignment of the case with variables Mg/Al, Zn and Ga: probability plots
127
25. Optimization of the case with variables Mg/Al, Pb and Y: cumulative probability plots of
128
26. Domain assignment of the case with variables Mg/Al, Pb and Y: probability plots
129
27. Optimization of the case with variables Mg/Al, Pb and Ga: cumulative probability plots of
130
28. Domain assignment of the case with variables Mg/Al, Pb and Ga: probability plots
131
29. Optimization of the case with variables Mg/Al, Y and Ga: cumulative probability plots of
132
30. Domain assignment of the case with variables Mg/Al, Y and Ga: probability plots
133
31. Optimization of the case with variables Zn, Pb and Y: cumulative probability plots of
134
32. Domain assignment of the case with variables Zn, Pb and Y: probability plots comparing
achieved and optimal domaining.
135
33. Optimization of the case with variables Zn, Pb and Ga: cumulative probability plots of
136
34. Domain assignment of the case with variables Zn, Pb and Ga: probability plots comparing
137
35. Optimization of the case with variables Zn, Y and Ga: cumulative probability plots of
138
36. Domain assignment of the case with variables Zn, Y and Ga: probability plots comparing
139
37. Optimization of the case with variables Pb, Y and Ga: cumulative probability plots of
140
38. Domain assignment of the case with variables Pb, Y and Ga: probability plots comparing
141
39. Optimization of domain1 distribution with variables Y, Mo and Mg/Al: cum. probability
plots of variables with initial parameters of normal dist. on the left, cum. probability plots
of variables with optimum parameters of distributions on the right.
142
40. Hierarchical application of domain1: probability plots compare achieved and optimal
domaining after domain assignment according to variables Y, Mo and Mg/Al.
143
41. Optimization of domain1 distribution with variables Y, Re and Mg/Al: cum. probability
144
domaining after domain assignment according to variables Y, Re and Mg/Al.
145
43. Optimization of domain1 distribution with variables Mo, Re and Mg/Al: cum. probability
146
domaining after domain assignment according to variables Mo, Re and Mg/Al.
147
45. Optimization of domain2 distribution with variables Zn, In and Mn/K: cum. probability
148
domaining after domain assignment according to variables Zn, In and Mn/K.
149
47. Optimization of domain2 distribution with variables Zn, Te and Mn/K: cum. probability
150
domaining after domain assignment according to variables Zn, Te and Mn/K.
151
49. Optimization of domain2 distribution with variables In, Te and Mn/K: cum. probability
152
domaining after domain assignment according to variables In, Te and Mn/K.
153

Koruk Kasimcan 202208 MASc

Uploaded by

Copyright:

Available Formats

You might also like

Koruk Kasimcan 202208 MASc

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Koruk Kasimcan 202208 MASc

Uploaded by

Copyright:

Available Formats

DEFINITION OF GEOLOGICAL DOMAINS WITH AN ENSEMBLE

IMPLEMENTATION OF SUPPORT VECTOR CLASSIFICATION

A thesis submitted to the Department of Mining Engineering

in conformity with the requirements for

the degree of Master of Applied Science

Kingston, Ontario, Canada

Copyright © Kasimcan Koruk, 2022

evaluation of mineral resources. Uncertainties can be quantified by traditional approaches, such as

geostatistical simulation methods. While traditional approaches generally focus directly on

machine learning algorithms stand as a strong alternative to traditional approaches (explicit or

multivariate datasets to reach consistent and semi-autonomous results.

assurance to complete the degree.

and spherical model ...................................................................................................................... 11

Figure 2.4 General workflow of machine learning systems ......................................................... 13

Figure 2.5 Linear regression: an example of a simple regression model...................................... 17

Figure 2.6 Algorithm of neural networks ..................................................................................... 18

Figure 2.7 Bias-variance tradeoff ................................................................................................. 20

Figure 2.8 Hard margin linear SVC .............................................................................................. 21

Figure 2.10 A simple example of bootstrapping ........................................................................... 24

Figure 3.1 Schematic summary of unsupervised classification .................................................... 27

Figure 3.7 Result of domain assignment in tri-variate space ........................................................ 36

results with best clustering are obtained earlier. ........................................................................... 36

Figure 3.9 Main steps of SVC and technique of training-test split............................................... 39

Figure 3.10 Workflow of the study ............................................................................................... 40

Figure 3.11 Hierarchical application of domaining ...................................................................... 41

Figure 3.12 Parameter file of BlockSIS ........................................................................................ 42

Figure 3.13 Effect of image cleaning on the results ..................................................................... 43

Figure 4.1 Two-level hierarchical domaining ............................................................................... 44

Figure 4.2 Histograms - natural logarithms of variables .............................................................. 47

Figure 4.3 Cumulative distributions - natural logarithms of variables ........................................ 48

optimum parameters of distributions on the right ......................................................................... 51

Figure 4.6 Improvement in domain assignment ........................................................................... 53

optimal domaining ........................................................................................................................ 54

Figure 4.9 Result of domain assignment process in coordinate system ....................................... 55

Figure 4.11 Final ensemble model ................................................................................................ 58

Figure 4.14 Categorization of Ensemble learning according to optimum proportion of domains 60

Figure 4.15 Categorization of Ensemble model – Horizontal profiles ......................................... 61

Figure 4.16 Experimental variograms created along horizontal direction .................................... 63

Figure 4.17 Variogram models built along chosen directions ...................................................... 63

Figure 4.18 Parameter file of BlockSIS ........................................................................................ 64

Figure 4.20 Effect of image cleaning on BlockSIS – horizontal profiles ..................................... 66

Figure 4.23 Categorization of BlockSIS – horizontal profiles ..................................................... 69

Figure 4.24 Comparison of two models with vertical cross-sections ........................................... 70

Figure 4.25 Variables chosen for sub-domains of domain 1 ........................................................ 72

Figure 4.26 Variables chosen for sub-domains of domain 2 ........................................................ 74

optimum parameters of distributions on the right. ........................................................................ 76

optimum parameters of distributions on the right. ........................................................................ 77

Figure 4.31 Methodology followed to realize hierarchical ensemble model ................................ 81

Figure 4.32 Boxplots showing balanced accuracies of SVM models ........................................... 82

Figure 4.33 Categorization of ensemble model ............................................................................ 83

Figure 4.34 Result of second-level hierarchical domain assignment ........................................... 84

neglected and considered, respectively. ........................................................................................ 90

and with four categories (right)..................................................................................................... 93

Table 2.1 A confusion matrix ....................................................................................................... 19

Table 2.2 Types of kernel functions ............................................................................................. 23

Table 3.1 Optimization of thirteen parameters ............................................................................. 30

ratios are adimensional. ................................................................................................................ 46

Table 4.2 Combinations of 3 out of 6 variables ............................................................................ 49

Table 4.3 Functions used to define the optimization problem ...................................................... 50