International Journal of Mining, Reclamation and
Environment
ISSN: 1748-0930 (Print) 1748-0949 (Online) Journal homepage: http://www.tandfonline.com/loi/nsme20
A regression-tree-based model for mining capital cost estimation
Hamidreza Nourali & Morteza Osanloo
To cite this article: Hamidreza Nourali & Morteza Osanloo (2018): A regression-tree-based model
for mining capital cost estimation, International Journal of Mining, Reclamation and Environment,
DOI: 10.1080/17480930.2018.1510300
To link to this article: https://doi.org/10.1080/17480930.2018.1510300
Published online: 03 Sep 2018.
Department of Mining and Metallurgical Engineering, Amirkabir University of Technology, Tehran, Iran
ABSTRACT

Determination of mining capital cost is always a challenging issue for mining engineers. Underestimating the capital cost may postpone the construction and production phases. In addition, overestimating may decrease the value of the project. Currently available capital cost estimation models cannot predict mining capital cost within a reliable range of error. Therefore, in this paper, a model based on the regression tree method is developed to estimate the mining capital cost. According to the obtained results, the capability of the presented model to estimate the mining capital cost over a wide range of mining capacity is significant.

ARTICLE HISTORY
Received 18 February 2018
Accepted 7 August 2018

KEYWORDS
Capital cost estimation; initial investment; regression tree; machine learning

1. Introduction
Capital cost estimation is one of the fundamental components of a mining feasibility study. Because capital is spent during the early years of mine life, it has a strong impact on the net present value (NPV) of a project. Generally, the main goal of mine planning, design and scheduling is to determine the optimal ultimate pit limit, with regard to the production scheduling horizon, so as to achieve the maximum NPV. Fixing the annual production rate and specifying a suitable mine fleet and equipment are both key elements of design and production scheduling aimed at maximum profit. They are also the determining factors for mining capital cost estimation [1]. Equipment size is directly related to capital cost [2]. In addition, new technologies affect mining capital cost and can potentially increase it [3]. Capital cost assessment can play a critical role in deciding whether projects proceed, are delayed or are abandoned [1]. It is therefore important that capital cost estimation is carried out accurately, as determined by the estimation guidelines for the level of estimation conducted [4].
A literature survey shows that, over the last four decades, the tendency to underestimate capital cost has been remarkably pronounced. There have been relatively few prior studies of the uncertainty in mining project capital cost estimates, and these have been fairly unsophisticated in their analysis of the data. Yet the results still reveal just how poor these capital cost estimates are [5].
A comparison of the capital cost estimate in the feasibility study with the actual cost incurred during development, for 17 international ferrous metal, non-ferrous metal, uranium and coal mining projects initiated between 1965 and 1980, shows that 12 experienced capital cost overruns, 10 of them by more than 15%. The average overrun was 35% [6]. An investigation of 16 projects completed between 1990 and 1995 found that as-built capital costs exceeded the feasibility study cost estimate by an average of 27% [7]. Another study, citing the escalating rate of inflation, reported an average cost overrun of 17% on 21 projects [8]. Yet, even after taking this into account, there appears to remain a downward bias in the initial feasibility study estimate. A study of 60 mining projects covering the period from 1980 to 2001 showed average cost overruns of 22%. In more than half of the cases, as-built capital costs exceeded the feasibility study estimate by at least 20%, and 25% of the projects had cost under-runs [9]. Also, an investigation of 63 worldwide mining and smelting projects completed between 1980 and 2001 indicates that feasibility study capital cost estimates contain both bias and error. Of the 63 projects, 44 had cost overruns, six had cost under-runs and 13 were within 0.5% of the feasibility cost estimate [5]. The bias in capital cost estimation is particularly troubling given its persistence over four decades, and given that engineers providing capital cost estimates for large public sector capital projects financed by the World Bank tended to overestimate the real costs of construction [10].

CONTACT Morteza Osanloo morteza.osanloo@gmail.com
© 2018 Informa UK Limited, trading as Taylor & Francis Group

Underestimating capital cost presents the project at a higher NPV than its real NPV. It can therefore cause the construction phase, and accordingly the production phase, to be postponed while the required finance is secured. On the other hand, overestimating presents the project at a lower NPV than its real NPV. Both errors are disadvantageous for the mine planner. According to the literature, the problem is one of completion risk, which is recognised by both mining companies and financiers. In fact, through completion guarantees, additional equity subscriptions or standby facilities, financial institutions have measures in place to mitigate this risk [7]. Yet the presence of deviation in capital cost estimation in the mining industry has not been discussed and analysed well [5]. Table 1 lists some recent mining projects whose estimated capital cost differed seriously from the real capital cost [11]. The completion time of these projects was April 2006. The difference between the capital cost estimated during the feasibility study and the actual amount is notable. The actual and predicted capital costs in Table 1 are reported in US dollars for some projects and in Australian dollars for others. With regard to the time of project completion, the estimation error is very significant in most of the projects.
In the preliminary stages of a project study, data are usually scarce, so estimating capital cost at this stage is challenging. To overcome this obstacle, machine-learning-based models are recommended, since currently available capital cost estimation models and methods cannot predict mining capital cost within a reliable range of error.
Some researchers have classified a number of approaches to product cost estimation that can also be employed for capital cost estimation [12,13]. Generally, the regression-based approach is the most common technique for developing a cost model [14]. Various researchers have sought to offer cost estimation models using univariate regression [15–19]. The general equation of mining cost estimation is shown in Equation (1) [20].

$$ Y = K X^{n} \qquad (1) $$

where $Y$ is the estimated cost, $X$ is a variable that drives $Y$, $K$ is a unit cost related to $X$, and $n$ is a power that controls the trend of the curve.
Table 1. Examples of mining projects and the miscalculation of their investment [11].

| Name of project | Company | Predicted capital cost | Actual capital cost | Error percentage |
|---|---|---|---|---|
| Ravensthorpe/Yabilu (Australia) | BHP Billiton | 1.4 billion Australian $ | 1.82 billion Australian $ | 30% |
| Spence (Chile) | BHP Billiton | 0.99 billion US $ | 1.09 billion US $ | 10% |
| Telfer (Australia) | Newcrest | 1.19 billion Australian $ | 1.4 billion Australian $ | 18% |
| Stanwell (Australia) | AMC | 1.3 billion Australian $ | 1.69 billion Australian $ | 30% |
| Boddington (Australia) | Newmont | 0.886 billion Australian $ | 1.772 billion Australian $ | 100% |
| Goro (Indonesia) | Inco | 1.45 billion US $ | 1.67 billion US $ | 15% |
| Prominent Hill (Australia) | Oxiana | 0.35 billion Australian $ | 0.529 billion Australian $ | 51% |

One of the best-known methods in this area is the O’Hara model [21,22], which is based on a polynomial least squares approach. It was derived from collected Canadian mining capital cost data such as the cost of site preparation, the cost of overburden stripping, and open pit and underground mining capital cost [5,10]. The O’Hara model considers only capacity; other important parameters are ignored. The model is shown in Equation (2), and the expected capital cost is calculated in mid-1989 US dollars.

$$ C = \text{US\$}\; 600{,}000\, T^{0.6} \qquad (2) $$

where $C$ is the estimated capital cost and $T$ is the daily ore milled (tonne).
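As a minimal sketch of how Equation (2) behaves, the snippet below evaluates it for a few capacities. The function name and the sample capacities are illustrative assumptions, not values from the paper, and the result remains a rough estimate in mid-1989 US dollars.

```python
def ohara_capital_cost(daily_ore_tonnes: float) -> float:
    """Rough capital cost in mid-1989 US dollars, following Equation (2)."""
    if daily_ore_tonnes <= 0:
        raise ValueError("daily ore milled must be positive")
    return 600_000 * daily_ore_tonnes ** 0.6


# Illustrative capacities only (assumed, not taken from the paper's data set).
for tonnes_per_day in (5_000, 20_000, 80_000):
    cost = ohara_capital_cost(tonnes_per_day)
    print(f"{tonnes_per_day:>7,} t/day -> US$ {cost / 1e6:,.0f} million (mid-1989)")
```

Because the exponent is below one, doubling the capacity multiplies the estimate by 2^0.6 rather than by 2, which is the economy-of-scale effect that a capacity-only model captures.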
The O’Hara model is subject to error in most situations, so it should be used only for rough estimation of mining capital cost [22]. Ignoring other influential variables when constructing a model may lead to considerable error in the results. Multivariate regression can therefore be considered an alternative for providing a reliable cost estimation model [23]. Accordingly, a multiple linear regression model for the capital and operating cost of the backhoe loader was developed [24]. Likewise, another multiple-regression-based cost model was developed to estimate the capital and operating cost of a flotation machine [25]. Such models can be used only for the capital and operating cost of a single machine. Most of the cited cost models were constructed for special cases such as estimating the cost of a machine or a product.
To compare the estimation accuracy of regression with that of other approaches, one study compared regression with a neural network approach. The results demonstrated that an artificial neural network may be an alternative to regression [14]. Of course, a reliable neural network requires adequate data for training, validating and testing the network. Furthermore, several models with a wide range of accuracy levels have been suggested specifically for estimating mining capital cost.
There is a rule of thumb for capital cost estimation called the six-tenths rule [26]. Investigation of this rule shows that it often comes with up to 30% error [27]. Some regression models based on mining capacity have been presented [28,29]. A linear multivariate regression model has also been constructed from capital cost data for 27 porphyry copper mines [30]. That study involved the following parameters: (1) mill recovery, (2) strip ratio and (3) distance from the railway. The advantage of that research is that it draws on other influential parameters in capital cost estimation, but it still suffers from a wide range of error. Because of the complexity of the subject, previous studies did not consider other influential parameters such as annual ore production and annual waste stripping. Hence, existing studies do not reach an acceptable result. Moreover, most of them were developed only for particular aspects of mining activity and therefore cannot be used for other mining cost estimation problems. Nevertheless, some of them can be used for a rough estimate of capital cost at the feasibility study stage. It is clear that the influence of other parameters must be considered during estimation in order to develop a reliable model for capital cost estimation.
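For reference, the six-tenths rule cited at the start of this paragraph is usually stated as a capacity-scaling relation. In the notation below, which is an assumption of this sketch rather than the paper's, $C_1$ is the known capital cost of a facility of capacity $Q_1$ and $C_2$ is the estimated cost at capacity $Q_2$:

$$ C_2 = C_1 \left( \frac{Q_2}{Q_1} \right)^{0.6} $$

This has the same power-law form as Equation (1) with $n = 0.6$, so doubling capacity raises the estimated cost by a factor of $2^{0.6} \approx 1.52$ rather than 2.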
Capital cost estimation is an intertwined task that depends on many mechanisms, which makes the process complicated. Given this complexity and the large initial investment in such projects, a flexible and robust method is needed that is capable of forecasting under complex conditions. Recently, researchers have employed soft computing methods to overcome these issues. In recent decades, along with the development of artificial-intelligence-based methods, many useful mathematical tools have been developed. One of these is the regression tree training algorithm. Because of its outstanding generalisation performance and reliable outputs, it has been applied to a variety of problems in different areas [31–35]. Hence, this paper presents a model based on a regression tree to approximate the capital cost of mining projects. As a basis for this model, capital cost data for 28 porphyry copper open pit mines were first stored in a database.
2. Data collection
A review of the literature on mining capital cost estimation shows that ‘Production Capacity’ is the most important factor used in mining capital cost estimation models. Although a multivariate regression model has been developed based on other factors such as ‘stripping ratio’ and ‘distance from railroad’ to reach a reliable range of error, the ‘capacity’ factor is still widely used in constructing capital cost estimation models [30]. Furthermore, the quality of the collected data, in terms of the research area and the dispersion of capacity over different mines, should be investigated. To achieve a dependable model for capital cost estimation, the research area must be limited to one type of mineral, and other influential factors (besides capacity) ought to take part in the model construction process. Moreover, the range of annual rock (waste and ore) production in that specific area should be considered. A reliable cost model must be developed from actual data; therefore, previously collected capital cost data for 28 porphyry copper mines with a wide range of Annual Rock (Ore & Waste) Production were compiled in a database, reported in Table 2 [36]. This set of technical and economic data was gathered by CRU Incorporation, and the capital cost data are escalated to 2016 US dollars [36]. The annual rock production (ore and waste) histogram is shown in Figure 1. As can be seen, the gathered data vary over a wide range, which can increase the generality of the investigation. With respect to the cumulative relative frequency diagram, the difference between the distribution of mine annual rock production and a normal distribution is negligible.
The descriptive statistics of the collected data are given in Table 3. Specifically, the mean, standard deviation and number of observations indicate that this data set has sufficient dispersion to develop a global predictor model.
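The summary statistics for the capital cost column can be recomputed directly from the Table 2 values. The sketch below uses only the Python standard library; the variable and function names are assumptions made for illustration.

```python
from statistics import mean, stdev

# Capital cost (US$ millions) of the 28 mines, in the row order of Table 2.
capital_cost_musd = [
    1983, 1826, 941, 1099, 1989, 2087, 5498, 1247, 3096, 2094,
    5761, 5950, 3500, 1822, 2664, 8300, 4581, 1652, 567, 3196,
    2450, 1584, 3059, 6178, 1600, 5323, 1133, 2348,
]


def describe(values):
    """Return the statistics reported in Table 3 for one variable."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "sd": stdev(values),  # sample standard deviation (n - 1 denominator)
    }


summary = describe(capital_cost_musd)
for name, value in summary.items():
    print(f"{name:>4}: {value:,.3f}")
```

The same helper can be applied to the production and life-of-mine columns to compare against the remaining rows of Table 3.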
2.1. Multicollinearity diagnostic

One diagnostic for multicollinearity is to perform auxiliary regressions, regressing each covariate on the remaining covariates [37,38]. The variance inflation factor (VIF) is defined as:

$$ \mathrm{VIF} = \frac{1}{1 - R_i^2} \qquad (3) $$

where $R_i^2$ is the $R^2$ obtained when the $i$th covariate is regressed on the remaining covariates in an auxiliary regression. It is the most commonly used regression diagnostic for multicollinearity in standard statistical software [37–41].
In general, the standard errors of regression coefficients are inflated when the VIF is large (e.g. multicollinearity is usually considered a problem when VIF > 10, though this threshold is arbitrary). The VIF values in Table 4 show that the selected variables do not have significant multicollinearity for the estimation process.
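To make Equation (3) concrete, the sketch below computes a VIF for the simplest case of exactly two covariates, where the auxiliary regression is a simple linear regression and $R_i^2$ equals the squared Pearson correlation. This restriction, the function names and the toy data are assumptions of the example; with more covariates the auxiliary regression is a multiple regression, and the snippet does not attempt to reproduce Table 4.

```python
def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


def vif_two_covariates(x1, x2):
    """VIF from Equation (3) when x1 is regressed on the single covariate x2."""
    r_squared = pearson_r(x1, x2) ** 2  # R^2 of a simple regression
    return 1.0 / (1.0 - r_squared)


# Toy data (hypothetical): r = 0.6, so VIF = 1 / (1 - 0.36) = 1.5625.
x1 = [1, 2, 3, 4]
x2 = [2, 1, 4, 3]
print(f"VIF = {vif_two_covariates(x1, x2):.4f}")
```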
3. Development of the model

There are various methods for developing predictive models. Regression-based models are among the most popular and efficient. The earliest form of regression was the method of least squares. Many researchers have used this method to build cost estimation models, but such models can become deficient in complicated conditions, such as the presence of several variables, which leads to unacceptable estimation errors. In this regard, artificial intelligence methods have been greatly developed, and they are capable of forecasting. An intelligent method can be trained in various ways, and there are many statistical and machine-learning approaches. The decision tree approach is one of the most common approaches in automatic learning and decision-making. It is popular for its simplicity of construction, its efficient use in decision-making and its simple representation, which is easily understood by humans. Given the complexity of mining engineering problems, such methods are promising for solving many of them. For example, decision tree classification has been applied as the predictive model in a real option analysis for managing geological uncertainty [42]. Therefore, in view of these capabilities, a regression tree cost estimation model is presented in the following.

Table 2. Copper mines properties [36].

| Num. | Name | Country | Type of Mine | Annual Waste Production (tonne) | Annual Ore Production (tonne) | Annual Rock Production (tonne) | LOM* (Year) | Capital Cost (US$ millions) |
|---|---|---|---|---|---|---|---|---|
| 1 | Productora | Chile | Open Pit | 46,200,000 | 11,000,000 | 57,200,000 | 50 | 1,983 |
| 2 | Xietongmen-Xiongcun | China | Open Pit | 28,785,360 | 13,578,000 | 42,363,360 | 31 | 1,826 |
| 3 | Tominskoye | Russia | Open Pit | 20,177,772 | 25,222,215 | 45,399,987 | 13 | 941 |
| 4 | Rosemont Ranch | USA | Open Pit | 52,000,000 | 26,000,000 | 78,000,000 | 22 | 1,099 |
| 5 | Santo Domingo | Chile | Open Pit | 63,738,125 | 23,177,500 | 86,915,625 | 22 | 1,989 |
| 6 | Mirador | Ecuador | Open Pit | 17,739,000 | 21,900,000 | 39,639,000 | 26 | 2,087 |
| 7 | Cristalino | Brazil | Open Pit | 40,000,000 | 16,000,000 | 56,000,000 | 18 | 5,498 |
| 8 | Qulong | China | Open Pit | 43,160,000 | 33,000,000 | 76,160,000 | 21 | 1,247 |
| 9 | Aktogay | Kazakhstan | Open Pit | 6,000,000 | 25,000,000 | 31,000,000 | 21 | 3,096 |
| 10 | Casino | Canada | Open Pit | 25,842,000 | 43,800,000 | 69,642,000 | 24 | 2,094 |
| 11 | El Arco | Centre | Open Pit | 31,450,000 | 37,000,000 | 68,450,000 | 20 | 5,761 |
| 12 | Agua Rica Yamana | Argentina | Open Pit | 70,664,000 | 40,150,000 | 110,814,000 | 38 | 5,950 |
| 13 | Rio Blanco | Peru | Open Pit | 25,250,000 | 25,000,000 | 50,250,000 | 32 | 3,500 |
| 14 | El Galeno | Peru | Open Pit | 8,792,000 | 31,400,000 | 40,192,000 | 23 | 1,822 |
| 15 | Schaft Creek Teck | Canada | Open Pit | 94,900,000 | 47,450,000 | 142,350,000 | 22 | 2,664 |
| 16 | Quellaveco | Quellaveco | Open Pit | 55,845,000 | 46,537,500 | 102,382,500 | 20 | 8,300 |
| 17 | Corredor | Chile | Open Pit | 109,500,000 | 36,500,000 | 146,000,000 | 17 | 4,581 |
| 18 | Minas Conga | Peru | Open Pit | 38,617,000 | 33,580,000 | 72,197,000 | 25 | 1,652 |
| 19 | Galore Creek | Canada | Open Pit | 50,800,000 | 30,660,000 | 81,460,000 | 15 | 567 |
| 20 | Cerro Casale – Aldebaran | Chile | Open Pit | 103,104,000 | 57,600,000 | 160,704,000 | 28 | 3,196 |
| 21 | Cobre Panama | Panama | Open Pit | 74,000,000 | 74,000,000 | 148,000,000 | 20 | 2,450 |
| 22 | Las Bambas | Peru | Open Pit | 102,200,000 | 51,100,000 | 153,300,000 | 18 | 1,584 |
| 23 | Altar | Argentina | Open Pit | 51,100,000 | 51,100,000 | 102,200,000 | 30 | 3,059 |
| 24 | Collahuasi Expansion | Chile | Open Pit | 89,610,791 | 58,931,378 | 148,542,169 | 30 | 6,178 |
| 25 | Los Pelambres Expansion | Chile | Open Pit | 10,950,000 | 18,950,000 | 29,900,000 | 21 | 1,600 |
| 26 | Quebrada Blanca Sulphides | Chile | Open Pit | 100,405,500 | 55,275,000 | 155,680,500 | 39 | 5,323 |
| 27 | Toquepala Expansion | Peru | Open Pit | 36,700,900 | 26,710,000 | 63,410,900 | 45 | 1,133 |
| 28 | Zaldivar – Sulphides | Chile | Open Pit | 49,706,000 | 24,900,000 | 74,606,000 | 20 | 2,348 |

*LOM: Life Of Mine
3.1. Decision tree model

Data mining is a term coined to describe the process of sifting through large databases in search of interesting and previously unknown patterns. The accessibility and abundance of data today make data mining a matter of considerable importance and necessity. The field of data mining provides the techniques and tools by which large quantities of data can be automatically analysed [43]. Some researchers consider the term ‘Data Mining’ misleading and prefer the term ‘Knowledge Mining’, as it provides a better analogy to gold mining [44].

Figure 1. Annual Rock (Ore & Waste) Production histogram of collected data. (a) Histogram of Annual Rock Production (Ore & Waste): density against Annual Rock Production (million tonne). (b) Cumulative distribution of Annual Rock Production (Ore & Waste): cumulative relative frequency against Annual Rock Production (million tonne).

Table 3. Descriptive statistics of collected data.

| Statistic | Num. of observations | Minimum | Maximum | Mean | Standard deviation (n-1) |
|---|---|---|---|---|---|
| Annual Waste Production (million tonne) | 28 | 6.000 | 109.500 | 51.687 | 30.978 |
| Annual Ore Production (million tonne) | 28 | 11.000 | 74.000 | 35.197 | 15.493 |
| Annual Rock Production (million tonne) | 28 | 29.900 | 160.704 | 86.884 | 42.683 |
| LOM (Year) | 28 | 13.000 | 50.000 | 25.393 | 8.854 |
| Capital Cost (US$ millions) | 28 | 567.000 | 8300.000 | 2983.143 | 1947.223 |

Table 4. Multicollinearity analysis for predictors.

| Predictors | VIF |
|---|---|
| Annual Ore Production (tonne) | 2.588 |
| Annual Rock Production (tonne) | 1.357 |

a. Dependent variable: capital cost (US$ millions)

Most data mining techniques are based on inductive learning [45], where a model is constructed explicitly or implicitly by generalising from a sufficient number of training examples. The underlying assumption of the inductive approach is that the trained model is applicable to future unseen examples. Strictly speaking, any form of inference in which the conclusions are not deductively implied by the premises can be thought of as induction.
In data mining, a decision tree is a predictive model that can be used to represent both classifiers and regression models, whereas in operations research decision trees refer to a hierarchical model of decisions and their consequences. The decision-maker employs decision trees to identify the strategy most likely to reach the goal. When a decision tree is used for classification tasks, it is most commonly referred to as a classification tree. When it is used for regression tasks, it is called a regression tree.
Decision trees can also be used for other data mining tasks, one of the most common being regression. Regression models map the input space into a real-valued domain. For example, a regressor can predict the mining capital cost for a mine of a certain size given its characteristics. Formally, the goal is to examine Y | X for a response Y and a set of predictors X. Regression trees are decision trees that deal with a continuous target. The basic idea is to combine decision trees and linear regression to forecast a numerical target attribute based on a set of input attributes. These methods perform induction by means of an efficient recursive partitioning algorithm. The choice of the best split at each node of the tree is usually guided by a least squares error criterion. Regression tree models are successfully used for day-ahead forecasting of some phenomena using input features.
Several decision tree induction algorithms exist, including ID3, C4.5, and Classification and Regression Tree (CART). All of these algorithms use splitting criteria and pruning methods, and each has advantages and disadvantages. The CART and CHAID regression tree algorithms assign a constant value to the leaves, usually the mean value of the corresponding leaf. The acronym CHAID stands for chi-squared Automatic Interaction Detector. It is one of the oldest tree classification methods. CHAID builds non-binary trees (i.e. trees where more than two branches can attach to a single root or node) based on a relatively simple algorithm that is particularly well suited to the analysis of larger data sets. Also, because the CHAID algorithm often effectively yields many multi-way frequency tables (e.g. when classifying a categorical response variable with many categories, based on categorical predictors with many classes), it has been particularly popular in marketing research, in the context of market segmentation studies.
A popular algorithm for building model trees with linear models in the leaves is M5 [46]. Another well-known approach is RETIS, which uses a more sophisticated heuristic [47]. The main difference between the M5′ and RETIS approaches is that M5′ first learns a standard regression tree (with constants in the leaves) and only afterwards, during a pruning phase, turns it into a model tree [48]. RETIS aims immediately at building a model tree and uses a heuristic tuned towards this task. This heuristic, however, is quite expensive to compute, which may render the RETIS approach infeasible for certain practical problems. Both RETIS and M5 can use linear regression models at the leaves. Linear regression is a well-known and rigorously studied statistical method.
CART stands for Classification and Regression Trees. It constructs binary trees, that is, each internal node has exactly two outgoing edges [49]. The splits are selected using the twoing criterion and the obtained tree is pruned by cost-complexity pruning. When provided, CART can consider misclassification costs in the tree induction. It also enables users to provide a prior probability distribution. An important feature of CART is its ability to generate regression trees, in which the leaves predict a real number rather than a class. For regression, CART looks for splits that minimise the squared prediction error (the least-squared deviation). The prediction in each leaf is based on the weighted mean for the node. CART can be applied to both categorical and quantitative response variables. When CART is applied with a quantitative response variable, the procedure is known as ‘Regression Trees’. At each step, heterogeneity is measured by the within-node sum of squares of the response:

$$ i(\tau) = \sum \left( y_i - \bar{y}(\tau) \right)^2 \qquad (4) $$

where, for node $\tau$, the summation is over all cases in that node and $\bar{y}(\tau)$ is the mean of those cases.
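Equation (4), together with the usual CART rule of choosing the binary split whose two child nodes have the smallest combined heterogeneity, can be sketched for a single predictor as follows. This is an illustrative sketch, not the MATLAB implementation used in the paper; the function names, the midpoint threshold convention and the toy data are assumptions.

```python
def node_sse(y):
    """Within-node sum of squares i(tau) from Equation (4)."""
    if not y:
        return 0.0
    mean_y = sum(y) / len(y)
    return sum((v - mean_y) ** 2 for v in y)


def best_split(x, y):
    """Return (threshold, combined_sse) for the split x < threshold
    that minimises the sum of the two child sums of squares.
    Returns (None, parent_sse) if no split improves on the parent node."""
    pairs = sorted(zip(x, y))
    best = (None, node_sse(y))
    for k in range(1, len(pairs)):
        lo, hi = pairs[k - 1][0], pairs[k][0]
        if lo == hi:
            continue  # equal predictor values cannot be separated
        left = [v for _, v in pairs[:k]]
        right = [v for _, v in pairs[k:]]
        combined = node_sse(left) + node_sse(right)
        if combined < best[1]:
            best = ((lo + hi) / 2, combined)  # midpoint threshold (assumed convention)
    return best


# Toy data (hypothetical): two clearly separated groups of responses.
x = [1, 2, 3, 10, 11, 12]
y = [5, 6, 7, 20, 21, 22]
threshold, sse = best_split(x, y)
print(f"split at x < {threshold}, combined SSE = {sse}")
```

A full tree is obtained by applying the same search recursively to each child node, over every predictor, until a stopping or pruning rule is met.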
The heterogeneity of each potential split is the sum of the two sums of squares for the two nodes that would result. The split that most reduces this within-node sum of squares is chosen; the sum of squares of the parent node is compared with the combined sum of squares from each potential split into two offspring nodes. Generalisation to Poisson regression (for count data) follows with the deviance used in place of the sum of squares.
Several software packages provide implementations of regression tree models. MATLAB provides the fitrtree function, which fits a regression tree model according to the threshold and tree pruning settings. To construct the regression tree model, we used MATLAB 2014b. Following common machine learning practice, 80% and 20% of the source data were used as the training and test data sets, respectively. The developed regression tree model is shown in Figure 2. As mentioned above, each parent node splits into two offspring nodes. Given the annual rock and ore production of a particular mine, its capital cost can be obtained by moving down the regression tree until the appropriate leaf is reached. The result of training the model is shown in Figure 3. The blue dashed line shows the actual capital cost data and the red dashed line shows the predicted data. According to Figure 3, the predicted curve follows the real capital cost data closely. A closer look at this graph shows that, in the training stage, the model is able to reproduce capital costs across different production capacities. Owing to the robust design of the framework of this method, the percentage error lies within a tolerable range in this stage.
Specifically, the fitness error percentage of the proposed model is very close to zero in the training process. Therefore, to illustrate its estimation capability, the model was tested with the test data after training. The results obtained from the cross validation demonstrate the significant capability of the presented model to estimate the mining capital cost over a wide range of mining capacity. The fitness performance in the training stage and the cross-validation of the test data are illustrated in Figure 4.

Figure 2. Mining capital cost regression tree model.

Figure 3. Implementation of the model on the data set.

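Reading a prediction off a fitted regression tree, by moving down from the root to a leaf as described above, can be sketched as follows. The tree here is purely hypothetical: its structure, thresholds and leaf values are placeholders chosen for illustration and are not the values of Figure 2.

```python
# Hypothetical tree: thresholds (million tonne per year) and leaf values
# (US$ millions) are placeholders, NOT the fitted model of Figure 2.
TREE = {
    "feature": "rock_mt", "threshold": 90.0,
    "left": {
        "feature": "ore_mt", "threshold": 25.0,
        "left": {"value": 1500.0},
        "right": {"value": 2200.0},
    },
    "right": {"value": 4000.0},
}


def predict(node, mine):
    """Walk from the root to a leaf and return the leaf's constant value."""
    while "value" not in node:
        go_left = mine[node["feature"]] < node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["value"]


# Usage with a hypothetical mine: 60 Mt/year of rock, 30 Mt/year of ore.
print(predict(TREE, {"rock_mt": 60.0, "ore_mt": 30.0}))
```

Each internal node tests one predictor against one threshold, so a prediction costs at most as many comparisons as the depth of the tree.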
4. Validation
In this paper, an acceptable error range of 10% was adopted for constructing the model [50]. For this purpose, the constructed models were assessed by the fitness error percentage in the training and testing stages. The training fitness error percentages of the regression tree model are very close to zero, and in the testing stage the model estimated the test data within, or very close to, this range (the largest test error is 10.06%; see Table 5). This can be seen in Figure 5.
Finally, to evaluate the model, the root-mean-square error (RMSE) and mean absolute error (MAE) were calculated using Equations (5) and (6). The RMSE measures the difference between the actual values and the values predicted by the model.

$$ \mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (t_i - y_i)^2}{n}} \qquad (5) $$

$$ \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| t_i - y_i \right| \qquad (6) $$

where $t_i$ is the input (actual) value, $y_i$ is the predicted value and $n$ is the number of data points.
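Equations (5) and (6) can be checked against the six test cases of Table 5 with a few lines of Python. The function and variable names are assumptions made for this sketch; the data are the actual and predicted capital costs (US$ millions) from Table 5.

```python
from math import sqrt

# Actual and predicted capital cost (US$ millions), in the row order of Table 5.
actual = [3059, 6178, 1600, 5323, 1133, 2348]
predicted = [3196, 5950, 1652, 5761, 1247, 2450]


def rmse(t, y):
    """Root-mean-square error, Equation (5)."""
    return sqrt(sum((ti - yi) ** 2 for ti, yi in zip(t, y)) / len(t))


def mae(t, y):
    """Mean absolute error, Equation (6)."""
    return sum(abs(ti - yi) for ti, yi in zip(t, y)) / len(t)


print(f"RMSE = {rmse(actual, predicted):.2f}")  # 219.36, as in Table 6
print(f"MAE  = {mae(actual, predicted):.1f}")   # 178.5, as in Table 6
```

Both figures are in US$ millions, so they are best read against the scale of the test-set capital costs (roughly 1,100 to 6,200).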
The test data and the corresponding predictions of the model are reported in Table 5. The RMSE and MAE of the model obtained from this evaluation are reported in Table 6. RMSE is the standard deviation of the residuals (prediction errors). It is a frequently used measure of the differences between the values predicted by a model or an estimator and the values observed, and is commonly used in regression analysis to verify experimental results. In addition, MAE measures how close forecasts or predictions are to the eventual outcomes.

Figure 4. Regression tree model cross validation. (a) Performance of the regression tree model in fitting the training data set. (b) Cross validation of the testing data set.

Figure 5. The fitness error percentages of the training and testing process. (a) Regression tree training fitness error percentage. (b) Regression tree testing fitness error percentage.

Table 5. The test and predicted data obtained from the model.

| Num. | Name | Country | Annual Ore Production (tonne) | Annual Rock Production (tonne) | Life Of Mine (Year) | Actual Capital Cost (US$ millions) | Predicted Capital Cost (US$ millions) | Error Percentage |
|---|---|---|---|---|---|---|---|---|
| 1 | Altar | Argentina | 51,100,000 | 102,200,000 | 30 | 3,059 | 3,196 | 4.48% |
| 2 | Collahuasi Expansion | Chile | 58,931,378 | 148,542,169 | 30 | 6,178 | 5,950 | −3.69% |
| 3 | Los Pelambres Expansion | Chile | 18,950,000 | 29,900,000 | 21 | 1,600 | 1,652 | 3.25% |
| 4 | Quebrada Blanca Sulphides | Chile | 55,275,000 | 155,680,500 | 39 | 5,323 | 5,761 | 8.23% |
| 5 | Toquepala Expansion | Peru | 26,710,000 | 63,410,900 | 45 | 1,133 | 1,247 | 10.06% |
| 6 | Zaldivar – Sulphides | Chile | 24,900,000 | 74,606,000 | 20 | 2,348 | 2,450 | 4.34% |

Table 6. RMSE and MAE of the regression tree model.

| RMSE | MAE |
|---|---|
| 219.36 | 178.5 |

The predicted values obtained by the regression tree model, and the resulting RMSE and MAE values, show that the model is capable of predicting the mining capital cost within a reliable range of error.
Overall, in view of the evaluation results, the regression tree model can be used as a reliable model for estimating mining capital cost over a wide range of mining capacity.
5. Results and discussion

Highly accurate estimation of capital cost is crucial and has an undeniable impact on project cash
flow, since the mining capital cost is usually large and is spent in the early stage of mine life. The
estimate must also satisfy the financier and guarantee the payback of the investment over a sustained
period. An accurate estimate of the capital cost required to set up a mine project can contribute to the
success of the project and to achieving the highest possible efficiency. This requires careful consideration
at each level of study in each phase of a mining project. According to research reviewing the differences
between actual and estimated capital cost, there appears to be a consistent tendency to underestimate
capital cost during the feasibility study.
As mentioned above, a credible estimator model that takes into account the parameters affecting
capital cost, with an acceptable error range, is essential. Therefore, to meet this need, four
major rules that should be considered in devising a reliable estimator model are suggested as
follows:
(1) Collected data should relate to a specific mineral and a specific extraction and processing
method.
(2) The mining capacities, and the scale of mining in general, should be suitably dispersed over
their range.
(3) The factors that affect capital cost should be included in the model.
(4) Given the complexity of the problem, an appropriate methodology should be selected to
construct the model.
For this purpose, capital cost data of 28 porphyry copper mines were collected. If the
collected data have a suitable dispersion of mining size (annual rock and ore production),
the model will be more comprehensive and will therefore be able to predict capital cost
under various circumstances. In this study, two factors are selected as the drivers of the
capital cost needed for an open pit mine. The primary factor is Annual Ore Production (AOP).
There is a direct relationship between AOP and the amount of capital cost; AOP also
represents the feed capacity of the mill. The second factor is Annual Rock Production
(ARP). When ARP and AOP increase, more capital is needed to support the mining
activities, so large-scale mines with high annual production require more capital cost.
Complicated relationships among predictors can make the estimation process a complex
task. This study therefore uses a regression tree to develop a cost estimation model.
Specifically, the expected mean-squared error of regression trees and the expected
misclassification cost of classification trees converge to the lowest possible values as the
training sample size increases [49]. Of course, the number of predictors (cost drivers)
strongly determines the sample size needed to support a reliable regression tree model.
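To make the role of the mean-squared error concrete, the sketch below shows how a regression tree can choose a single split on one predictor by minimising the within-node sum of squared errors. This is the standard CART-style criterion and is an assumption here, since this section does not state which splitting rule the authors used. The functions `sse` and `best_split` are hypothetical helpers, and the six records from Table 5 serve only as toy data; they are the test set, not the records on which the authors' tree was trained.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) of the single split on x that minimises
    the combined squared error of y in the two child nodes."""
    pairs = sorted(zip(x, y))
    best_threshold, best_total = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical predictor values
        # Candidate threshold: midpoint between neighbouring predictor values.
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yv for xv, yv in pairs if xv <= threshold]
        right = [yv for xv, yv in pairs if xv > threshold]
        total = sse(left) + sse(right)
        if total < best_total:
            best_threshold, best_total = threshold, total
    return best_threshold, best_total


# Toy data from Table 5: AOP in tonnes, actual capital cost in US$ millions.
aop = [51_100_000, 58_931_378, 18_950_000, 55_275_000, 26_710_000, 24_900_000]
cost = [3059, 6178, 1600, 5323, 1133, 2348]

threshold, total = best_split(aop, cost)
print(f"split at AOP <= {threshold:,.0f} tonnes, total SSE = {total:,.1f}")
```

A full tree repeats this search over every predictor (here AOP and ARP) and recursively within each child node, and predicts the mean cost of the leaf a new mine falls into. With only two predictors, the number of candidate splits stays small, which is one reason a modest sample can still support the model.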
Given the 28 data records in the database (80% for training and 20% for testing), the performance of the
proposed model should be examined in both the training and testing stages. As shown in Figure 5, the
training fitness error percentages of the proposed model are very close to zero, and the model has been
able to estimate all the test data within a reliable range of error. Therefore, the sample size is
adequately large for developing a reliable regression tree model. Also, the result of the validation
process indicates that the model is able to estimate different amounts of capital cost.
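The 80/20 division of the 28 records can be illustrated with a short hold-out split. This sketch is an assumption rather than the authors' procedure: the text does not say how records were assigned to each set, so the random shuffle and the seed below are purely illustrative.

```python
import math
import random

N_RECORDS = 28        # size of the database reported in the text
TEST_FRACTION = 0.20  # share held out for testing, as reported in the text

# 20% of 28 is 5.6 records; rounding up gives six test records,
# which matches the six mines listed in Table 5.
n_test = math.ceil(N_RECORDS * TEST_FRACTION)
n_train = N_RECORDS - n_test

# Hypothetical random assignment of record indices (arbitrary seed).
rng = random.Random(0)
indices = list(range(N_RECORDS))
rng.shuffle(indices)
test_idx = sorted(indices[:n_test])
train_idx = sorted(indices[n_test:])

print(f"{n_train} training records, {n_test} test records")
```

With six test records, each one contributes about 17% of the weight in the MAE, so a single large residual can noticeably shift the reported error statistics.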
6. Conclusion
High initial investment in the mining industry requires careful management of the sources of risk.
Given the high investment volume in the early years of a project, the determination of the capital
cost has an undeniable impact on the intended NPV. Both underestimating and overestimating
mining capital cost have negative impacts on mining projects. Underestimation may postpone the
construction phase and, accordingly, the production phase, while overestimation may decrease the
value of the project. Currently available capital cost estimation models cannot predict mining
capital cost within a reliable range of error. Therefore, in this study, an estimator model with an
acceptable confidence level was developed using the regression tree method. According to the
results, the proposed model is capable of estimating the mining capital cost. Overall, in view of
the evaluation results, this model can be used as a reliable model for estimating mining capital
cost.
Acknowledgment
We express our appreciation to the National Iranian Copper Industry Co. (NICICO) for sharing their
available data with us during the course of this research. These data were extracted from a report
that NICICO purchased from CRU Group Company.
Disclosure statement
No potential conflict of interest was reported by the authors.
References
[1] M. Mohutsiwa and C. Musingwini, Parametric estimation of capital costs for establishing a coal mine: South
Africa case study, J. South. Afr. Inst. Mining Metallurgy 115 (2015), pp. 789–797.
doi:10.17159/2411-9717/2015/v115n8a17
[2] A. Bozorgebrahimi, R. Hall, and M. Morin, Equipment size effects on open pit mining performance, Int. J.
Surface Mining, Reclamation Environ. 19 (2005), pp. 41–56. doi:10.1080/13895260412331326821
[3] C.A. Wheeler, Development of the rail conveyor technology, Int. J. Mining, Reclamation Environ. (2017), pp.
1–15. doi:10.1080/17480930.2017.1352058
[4] S. Shafiee and E. Topal, New approach for estimating total mining costs in surface coal mines, Mining
Technol. 121 (2012), pp. 109–116. doi:10.1179/1743286312Y.0000000011
[5] J. Bertisen and G.A. Davis, Bias and error in mine project capital cost estimation, Eng. Economist 53 (2008),
pp. 118–139. doi:10.1080/00137910802058533
[6] G. Castle, Feasibility studies and other pre-project estimates: How reliable are they?, Proceedings of the
Finance for the Minerals Industry, New York, 1985.
[7] R. Bennet, Technical due diligence requirements for mining project finance, Randol at Vancouver ’96 85th
Annual Global Mining Opportunities and 2nd Annual Copper Hydromet Roundtable, Vancouver, 1996.
[8] S. Thomas, Project development costs—Estimates versus reality, Mineral Economics and Management
Society, Tenth Annual Conference, Houghton, Michigan, 2001.
[9] C. Gypton, How have we done? Eng. Mining J. 203 (2002), pp. 40.
[10] G. Pohl and D. Mihaljek, Project evaluation and uncertainty in practice: A statistical analysis of rate-of-return
divergences of 1,015 World Bank projects, World Bank Econ. Rev. 6 (1992), pp. 255–277.
doi:10.1093/wber/6.2.255
[11] M. Noakes, and T. Lanz, Cost estimation handbook for the Australian mining industry: MinCost 90,
Australasian Inst. of Mining and Metallurgy, Sydney, 1993.
[12] X.X. Huang, L.B. Newnes, and G.C. Parry, The adaptation of product cost estimation techniques to estimate
the cost of service, Int. J. Comput. Integr. Manufacturing 25 (2012), pp. 417–431.
doi:10.1080/0951192X.2011.596281
[13] A. Niazi, J.S. Dai, S. Balabani, and L. Seneviratne, Product cost estimation: Technique classification and
methodology review, J. Manuf. Sci. Eng. 128 (2006), pp. 563–575. doi:10.1115/1.2137750
[14] A.E. Smith and A.K. Mason, Cost estimation predictive modeling: Regression versus neural network, Eng.
Economist 42 (1997), pp. 137–161. doi:10.1080/00137919708903174
[15] B.H. Daud, A Model for Preliminary Evaluation of Underground Coal Mines, in Computer Methods for the
80’s in the Mineral Industry, Mine Development and Valuation, A. Weiss ed., Society for Mining, Metallurgy,
and Exploration, New York, 1979.
[16] A. Petrick, and R. Dewey, Microcomputer cost models for mining and milling, in Mineral Resource
Management by Personal Computer, T. M. Li, S. D. H. Handelsman and L. Kovisaars eds., Society of
Mining Engineers, New York, 1987.
[17] L. Prasad, Mineral processing plant design and cost estimation, Processors Division of the Canadian Institute
of Mining, Metallurgy and Petroleum, Canada, Montreal, 1969.
[18] J.S. Redpath Ltd, Estimating pre-production and operating costs of small underground deposits, Canada
Centre for Mineral and Energy Technology Minister of Supply and Services Canada, Ottawa, 1986, pp. 252.
[19] S.A. Stebbins, Cost estimation handbook for small placer mines, U.S. Dept. of the Interior, Bureau of Mines,
Pittsburgh, 1987.
[20] P. Darling, SME mining engineering handbook, Omnipress, Madison (Wis.), 2011.
[21] T.A. O’Hara, Quick guide to the evaluation of ore bodies, CIM Bull. 73 (1980), pp. 87–99.
[22] T.A. O’Hara, A Parametric Cost Estimation Method for Open Pit Mines, in Mining Engineering Handbook, H.
L. Hartman ed., Society of mining engineers (SME), New York, 1980.
[23] A.R. Sayadi, M.R. Khalesi, and M.K. Borji, A parametric cost model for mineral grinding mills, Minerals Eng.
55 (2014), pp. 96–102. doi:10.1016/j.mineng.2013.09.013
[24] B. Oraee, A. Lashgari, and A.R. Sayadi, Estimation of capital and operation costs of backhoe loaders, SME
Annual Meeting, Denver, CO, 2011.
[25] S. Arfania, A. Sayadi, and M. Khalesi, Cost modelling for flotation machines, J. South. Afr. Inst. Mining
Metallurgy 117 (2017), pp. 89–96. doi:10.17159/2411-9717/2017/v117n1a13
[26] A. Mular, The estimation of preliminary capital costs, in Mineral Processing Plant Design, A. L. Mular and R. B.
Bhappu eds., Canadian Institute of Mining and Metallurgy, Montreal, 1978.
[27] M. Noakes, and T. Lanz, Cost estimation handbook for the Australian mining industry: MinCost 90,
Australasian Inst. of Mining and Metallurgy, Sydney, 1993.
[28] T.W. Camm, The development of cost models using regression analysis, SME Annual Meeting, Arizona, 1992.
[29] F.-W. Wellmer, M. Dalheimer, and M. Wagner, Economic evaluations in exploration, Second ed., Springer
Science & Business Media, Berlin, Heidelberg, 2007.
[30] K. Long, Statistical methods of estimating mining costs, SME Annual Meeting and Exhibit and CMA 113th
National Western Mining Conference New York, 2011.
[31] H. Boström, H. Linusson, T. Löfström, and U. Johansson, Accelerating difficulty estimation for conformal
regression forests, Ann. Math. Artif. Intell. 81 (2017), pp. 125–144. doi:10.1007/s10472-017-9539-9
[32] A. D’Ambrosio, M. Aria, C. Iorio, and R. Siciliano, Regression trees for multivalued numerical response
variables, Expert. Syst. Appl 69 (2017), pp. 21–28. doi:10.1016/j.eswa.2016.10.021
[33] U. Johansson, H. Bostrom, and T. Lofstrom, Conformal prediction using decision trees, IEEE 13th
International Conference on Data Mining, Dallas, TX, USA, 2013.
[34] U. Johansson, H. Boström, T. Löfström, and H. Linusson, Regression conformal prediction with random
forests, Mach. Learn. 97 (2014), pp. 155–176. doi:10.1007/s10994-014-5453-0
[35] W.Y. Loh, Fifty years of classification and regression trees, Int. Stat. Rev. 82 (2014), pp. 329–348.
doi:10.1111/insr.12016
[36] D. Duckworth, and P.S. John, Copper Mine Project Profiles - 2016 Edition, CRU, London, United Kingdom,
2016.
[37] S.A. Glantz, B.K. Slinker, and T.B. Neilands, Primer of Applied Regression and Analysis of Variance, Vol. 309,
McGraw-Hill, New York, 1990.
[38] E.J. Pedhazur and F.N. Kerlinger, Multiple Regression in Behavioral Research, Holt, Rinehart and Winston,
New York, 1973.
[39] S. Chatterjee, and A.S. Hadi, Regression analysis by example, Fourth ed., John Wiley & Sons, Hoboken, New
Jersey, 2015.
[40] G.S. Maddala, and K. Lahiri, Introduction to econometrics, John Wiley & Sons, Hoboken, New Jersey, 2009.
[41] B. Slinker and S. Glantz, Multiple regression for physiological data analysis: The problem of multicollinearity,
Am. J. Physiology-Regulatory, Integr. Comp. Physiol. 249 (1985), pp. R1–R12.
doi:10.1152/ajpregu.1985.249.1.R1
[42] A.D. Ajak, E. Lilford, and E. Topal, Application of predictive data mining to create mine plan flexibility in the
face of geological uncertainty, Resour. Policy 55 (2018), pp. 62–79. doi:10.1016/j.resourpol.2017.10.016
[43] L. Rokach, and O. Maimon, Data Mining with Decision Trees: Theory and Applications, World Scientific
Publishing Co. Pte. Ltd., Singapore, 2014.
[44] W. Klosgen, and J. Zytkow, KDD: The purpose, necessity and challenges, illustrated ed., Handbook of Data
Mining and Knowledge Discovery, Oxford University Press, London, UK, 2002.
[45] T.M. Mitchell, Machine Learning. 1997, Vol. 45, McGraw Hill, Burr Ridge, IL, 1997, pp. 870–877.
[46] J.R. Quinlan, Learning with continuous classes, 5th Australian joint conference on artificial intelligence,
Hobart, Tasmania, 1992.
[47] A. Karalič, Employing linear regression in regression tree leaves, Proceedings of the 10th European conference
on Artificial intelligence, Vienna, Austria, 1992.
[48] C. Vens and H. Blockeel, A simple regression based heuristic for learning model trees, Intell. Data Anal. 10
(2006), pp. 215–236.
[49] L. Breiman, Classification and regression trees, 1st ed., Routledge, New York, 2017.
[50] W.A. Hustrulid, M. Kuchta, and R.K. Martin, Open pit mine planning and design, two volume set & CD-ROM
pack, CRC Press, London, 2013.
