Environmental Modelling & Software 34 (2012) 67–86


Numerical assessment of metamodelling strategies in computationally intensive optimization

Saman Razavi*, Bryan A. Tolson, Donald H. Burn
Department of Civil and Environmental Engineering, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada
* Corresponding author. E-mail addresses: ssrazavi@uwaterloo.ca (S. Razavi), btolson@uwaterloo.ca (B.A. Tolson), dhburn@uwaterloo.ca (D.H. Burn).

Article history: Received 19 November 2010; received in revised form 15 September 2011; accepted 27 September 2011; available online 1 November 2011.

Keywords: Metamodelling; Optimization; Computationally intensive simulation models; Neural networks; Kriging; Radial basis functions

Abstract: Metamodelling is an increasingly popular approach for alleviating the computational burden associated with computationally intensive optimization/management problems in environmental and water resources systems. Some studies refer to the metamodelling approach as function approximation, surrogate modelling, response surface methodology or model emulation. A metamodel-enabled optimizer approximates the objective (or constraint) function in a way that eliminates the need to always evaluate this function via a computationally expensive simulation model. There is a sizeable body of literature developing and applying a variety of metamodelling strategies to various environmental and water resources related problems including environmental model calibration, water resources systems analysis and management, and water distribution network design and optimization. Overall, this literature generally implies metamodelling yields enhanced solution efficiency and (almost always) effectiveness of computationally intensive optimization problems. This paper initially develops a comparative assessment framework which presents a clear computational budget dependent definition for the success/failure of the metamodelling strategies, and then critically evaluates metamodelling strategies, through numerical experiments, against other common optimization strategies not involving metamodels. Three different metamodel-enabled optimizers involving radial basis functions, kriging, and neural networks are employed. A robust numerical assessment within different computational budget availability scenarios is conducted over four test functions commonly used in optimization as well as two real-world computationally intensive optimization problems in environmental and water resources systems. Numerical results show that metamodelling is not always an efficient and reliable approach to optimizing computationally intensive problems. For simpler response surfaces, metamodelling can be very efficient and effective. However, in some cases, and in particular for complex response surfaces when computational budget is not very limited, metamodelling can be misleading and a hindrance, and better solutions are achieved with optimizers not involving metamodels. Results also demonstrate that neural networks are not appropriate metamodelling tools for limited computational budgets, while metamodels employing kriging and radial basis functions show comparable overall performance when the available computational budget is very limited.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction and objective

Computational demand of some modern computer simulation models tends to keep pace with rapidly advancing computing facilities. A single run of such models can be prohibitively long, in the order of minutes to days (Keating et al., 2010; Razavi et al., 2010; Wang and Shan, 2007; Zhang et al., 2009); while in many optimization-based engineering practices, such as design optimization, operation management, and automatic calibration, these computer simulation models should be recursively run hundreds or thousands of times. An optimization setting which involves such a computationally expensive simulation model in evaluating either the objective function or a constraint (or both) for any given candidate solution can be referred to as computationally intensive optimization.

One of the most commonly used approaches to dealing with this type of computationally intensive optimization is metamodelling (also called response surface surrogate modelling) or the use of function approximation (Razavi et al., 2010).


Arising from various disciplines, there are a variety of metamodelling strategies which are concerned with developing and utilizing cheap-to-run “surrogates” of expensive simulation models based on function approximation. For a fixed computational budget, this enables more solutions to be at least approximately evaluated in the optimization procedure. The existence of the metamodelling concept perhaps dates back to well before the formalizing of modern metamodelling approaches as, for example, the traditional local search non-linear optimization techniques working with Taylor series expansion (i.e., different variations of Newton’s method) use simple approximations. As classified by Razavi et al. (2010), there are three other main (sometimes complementary) approaches to alleviating this computational burden as well, including: utilizing computationally efficient optimization algorithms (Doherty, 2005; Kuzmin et al., 2008; Tan et al., 2008; Tolson et al., 2009; Tolson and Shoemaker, 2007), including the efficient population-based evolutionary algorithms by Hansen et al. (2003) and Vrugt et al. (2009); utilizing parallel computing networks (Cheng et al., 2005; Feyen et al., 2007; He and Hui, 2007; Vrugt et al., 2006); and utilizing strategies for opportunistically evading model evaluations (Joslin et al., 2006; Ostfeld and Salomons, 2005; Razavi et al., 2010) such as model preemption (Razavi et al., 2010).

The wide and extensive application of metamodelling strategies over more than four decades in a wide variety of optimization problems from different disciplines (Simpson et al., 2008) suggests that this research field is sufficiently mature so that the pros and cons of metamodelling can be fairly evaluated. There is a sizeable body of literature developing and applying a variety of metamodelling strategies in various environmental and water resources related problems including surface water model and groundwater model calibration (Bliznyuk et al., 2008; Keating et al., 2010; Khu et al., 2004; Khu and Werner, 2003; Liong et al., 2001; Mugunthan and Shoemaker, 2006; Mugunthan et al., 2005; Ostfeld and Salomons, 2005; Regis and Shoemaker, 2009; Shoemaker et al., 2007; Zhang et al., 2009; Zou et al., 2007, 2009), groundwater systems analysis and management (Bau and Mayer, 2006; Fen et al., 2009; Johnson and Rogers, 2000; Kourakos and Mantoglou, 2009; Regis and Shoemaker, 2004, 2007a,b, 2009; Yan and Minsker, 2006), and water distribution network design and optimization (Behzadian et al., 2009; Broad et al., 2005; di Pierro et al., 2009). These publications typically give the impression that metamodelling increases the overall computational efficiency (i.e., attaining good quality solutions while consuming less computational budget compared to when metamodelling is not applied) and almost always effectiveness (i.e., attaining better quality solutions than the solutions achieved without metamodelling) in computationally intensive optimization problems.

The main objective of this study is to test this common belief through a comparative assessment of metamodel-enabled optimizers against optimizers without metamodelling. This comparison is focused on serial optimization algorithms only and does not extend to parallel optimization algorithms. In general, since metamodel-enabled optimizers and optimizers without metamodelling can both be implemented to take advantage of a parallel computing network, we expect our findings would be similar if parallel optimization algorithms were considered. Two major factors affecting the performance of metamodel-enabled optimizers, namely the shape/complexity of the original computationally expensive function being optimized and computational budget availability, are directly addressed. The experiments and the detailed descriptions of the metamodels we implemented were designed in a way that they give metamodel users a clear view of metamodel characteristics, benefits and shortcomings, and demonstrate the complexities and subjective decisions required by analysts building a metamodel-enabled optimizer.

The organization of this paper is as follows. Section 2 raises some fundamental metamodelling considerations that metamodel-enabled optimizer users should consider before any metamodel development/application and any subsequent comparison with optimizers that do not rely on metamodelling. Section 3 describes three different metamodel-enabled optimizers representing different metamodelling strategies from the current literature that are adopted and implemented in this study. This section contains detailed practical information regarding the metamodel-enabled optimizers we implemented as such details seem largely unreported in the current literature. Conversely, there is only limited information in Section 3 on theories and metamodel equations (e.g., kriging and neural networks theories and equations), as they are available elsewhere. Section 4 benchmarks the optimization algorithms (without metamodelling) used in this study to develop the baseline for metamodelling assessment. Section 5 presents the test functions and the real-world case studies used as well as the details of experimental settings. Experiment results are reported in Section 6 and followed by a discussion in Section 7 and conclusions and final remarks in Section 8.

2. Fundamental considerations

2.1. Evaluating the effectiveness and efficiency of metamodelling

Based on our analysis of the metamodelling studies in the environmental and water resources optimization literature, the lowest CPU time saving observed due to metamodelling is 21% in Broad et al. (2005) and the highest saving is 97% reported in Zou et al. (2007). Typically, efficiency or effectiveness of a metamodel-enabled algorithm is comparatively quantified with respect to a benchmark algorithm without metamodelling. Thus, for any given metamodel-enabled algorithm, it is important to carefully choose an appropriate benchmark algorithm to make appropriate and logical conclusions.

Metamodels become attractive when the maximum possible number of original simulation model evaluations is limited. However, there are other computationally efficient optimization algorithms independent of metamodelling (e.g., Kuzmin et al., 2008; Tan et al., 2008; Tolson et al., 2009; Tolson and Shoemaker, 2007) that have been designed to work in a limited computational budget (i.e., a limited number of simulation model evaluations or, equivalently, objective function evaluations, or simply function evaluations), and there also exist strategies to enhance the efficiency of some inefficient metamodel-independent optimization algorithms (Ostfeld and Salomons, 2005; Razavi et al., 2010). Unlike these fast algorithms, there are algorithms like standard genetic algorithms (GAs) which are robust and effective given a large number of function evaluations but are not typically designed for cases where the number of function evaluations is quite limited. As such, assessing the efficiency/effectiveness of a metamodel-enabled algorithm with respect to a single inefficient algorithm without metamodelling is not appropriate when more efficient alternative algorithms without metamodelling are available.

Moreover, there is a substantial amount of randomness inherently involved in metamodel-enabled optimization algorithms. Any single application of such an algorithm is a single performance level observation from a (statistical) population of possible performance levels. Thus, to ensure that the findings are representative of the population of possible performance levels, empirical assessment and comparison of these algorithms must be based on performing multiple replicates (despite the obvious computational burden). Regrettably, there are many example studies in the water resources related metamodelling literature which make conclusions based on only a single run of their developed stochastic metamodel-enabled optimization algorithm(s).

We believe that there are three possible outcomes (cases) in the evaluation of a metamodel-enabled optimizer relative to an optimizer without metamodelling for a given optimization problem (see Fig. 1). As stated in Section 1, the current metamodelling literature as a whole presents results in a way that implies relative performance of metamodel-enabled optimizers compared to optimizers without metamodelling most often looks like Case A (“idealized relative metamodel performance”) shown in Fig. 1. In Case A, which is too optimistic in our view, the metamodel-enabled optimizer always outperforms (or equals) the optimizer without metamodelling for any computational budget. Conversely, from a pessimistic point of view, there may exist metamodel-enabled optimizers that result in inferior relative performance for any given computational budget (Case C in Fig. 1). If lying in Case C, the metamodel-enabled optimizer fails and its application is not justifiable. Clearly, another case must exist (i.e., Case B in Fig. 1) in between Case A and Case C. We refer to Case B as “computational budget dependent relative metamodel performance”. In Case B, as its name reflects, the relative performance of a metamodel-enabled optimizer is a function of the available computational budget (or equivalently the total number of original function evaluations). The most likely scenario for Case B is that in smaller computational budgets, the metamodel-enabled optimizer outperforms the optimizer without metamodelling; while in higher computational budgets, the performance of the metamodel-enabled optimizer is inferior. Notably, the “equivalence time”, t*, in Case B (see Fig. 1) is not known a priori, and for any specific problem lying under Case B, determination of t* requires multiple comparative numerical experiments. Practically, we do not consider relative performance given an infinite computational budget and, as such, the computational budgets in Fig. 1 should be viewed as a range representing relatively limited or practical computational budgets. This study aims at demonstrating the three possible relative performance cases through extensive numerical experiments with different metamodel-enabled optimizers and different optimizers without metamodelling.
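Because any single run of these stochastic algorithms is only one draw from a distribution of possible performance levels, the Case A/B/C distinction is most meaningfully judged from multiple replicates. The snippet below is a hypothetical illustration of one way to do that (summarizing replicates by their median best-objective trajectory and comparing the two optimizers at each budget); the function names and the median-based summary are our own choices, not a procedure defined in this paper.

```python
import numpy as np

def median_trajectory(replicates):
    """replicates: (n_replicates, n_budgets) array of best-objective-so-far values
    recorded at a common set of computational budget checkpoints for one optimizer."""
    return np.median(np.asarray(replicates, dtype=float), axis=0)

def classify_case(traj_m, traj_o, budgets):
    """Compare median trajectories of a metamodel-enabled optimizer (M) and an
    optimizer without metamodelling (O) at each budget, assuming minimization:
    Case A if M is never worse, Case C if M is never better, otherwise Case B
    together with a rough estimate of the equivalence time t* (first budget at
    which O overtakes M)."""
    m_not_worse = np.asarray(traj_m) <= np.asarray(traj_o)
    if m_not_worse.all():
        return "Case A", None
    if not m_not_worse.any():
        return "Case C", None
    t_star = budgets[int(np.argmax(~m_not_worse))]
    return "Case B", t_star
```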
[Figure 1 comprises three panels, Case A (idealized relative metamodel performance), Case B (computational budget dependent relative metamodel performance, with equivalence time t* marked), and Case C (failure of metamodelling), each plotting the best objective function value against the computational budget for curves M and O.]
Fig. 1. Three possible cases for relative performance of metamodel-enabled optimizers (M) compared to optimizers without metamodelling (O) in minimization problems.

2.2. Metamodel-enabled optimizer framework development

Development of a successful metamodel-enabled optimizer for a given problem involves several subjective, non-trivial decisions. These decisions are important as they affect the behaviour and performance of the algorithm. The major decisions include the selection of the appropriate function approximation technique capable of acting as a metamodel (see Section 2.2.1) and the selection of the framework through which the original model, metamodel, and the optimizer can be effectively linked (see Section 2.2.2). There are also a number of subjective decisions to be made for each selected metamodel-enabled optimizer, as discussed in Section 3 for our adopted algorithms. Obviously, any metamodelling effort would be burdened by metamodelling time and analyst time, as highlighted in Section 2.2.3. Design of experiment (also referred to as DoE) is another subjective component in many metamodel-enabled optimizers. DoE is focused on identifying a logical initial set of fully evaluated solutions on which to train or fit the first metamodel. A review of all DoE methods in metamodelling is beyond the scope of this paper and instead DoE considerations for the metamodelling frameworks we implement are discussed later where these frameworks are introduced (Section 3).

2.2.1. Selecting a function approximation technique

Metamodels approximate the response surface of a computationally intensive simulation model (i.e., original model) by fitting a simplified function over a set of previously evaluated points, commonly referred to as design sites, in the decision variable space. A variety of function approximation techniques have been developed and applied as metamodels. Some examples of such techniques include: polynomials (Crestaux et al., 2009; Fen et al., 2009; Hussain et al., 2002; Sudret, 2008), kriging, which is sometimes called design and analysis of computer experiments (DACE) in the metamodelling context (Kleijnen, 2005; Sacks et al., 1989; Sakata et al., 2003; Santner et al., 2003; Simpson and Mistree, 2001), smoothing spline ANOVA models (Gu, 2002; Ratto and Pagano, 2010; Storlie et al., 2011; Wahba, 1990), artificial neural networks (ANNs – Behzadian et al., 2009; Khu and Werner, 2003; Papadrakakis et al., 1998) and radial basis function (RBF) models (Hussain et al., 2002; Mugunthan et al., 2005; Mullur and Messac, 2006; Nakayama et al., 2002; Regis and Shoemaker, 2007b). In some publications (e.g. Jin, 2005; Khu et al., 2004), as well as in the MATLAB neural network toolbox (Beale et al., 2010), RBF models are considered as a simple type of feed-forward neural networks under the large umbrella of what can be deemed a neural network.

Function approximation techniques can be classified as interpolating (exact emulator) or non-interpolating (inexact emulator). DACE and RBF models, which are interpolants, exactly predict all design sites to represent the underlying function. Polynomials and ANNs, which are non-interpolants, produce a varying bias (unpredictable in ANNs) at different design sites (note that if there are as many coefficients in a polynomial as there are design sites, it becomes an interpolant). Note that kriging with the so-called “Nugget effect” (Cressie, 1993) can also be a non-interpolating approximator that produces a statistics-based bias at design sites and serves as a smoother. As most environmental and water resources simulation models perform deterministic simulation (i.e., the outputs of running a model multiple times with the same input are identical), exact emulators seem more appealing. Section 3.3.2 further deals with the metamodelling issues related to interpolation versus smoothing.

2.2.2. How to link the metamodel, original model, and optimizer

The most important part of developing metamodel-enabled optimizers is the framework through which the optimizer, metamodel, and original computationally expensive model interact. There is a large body of literature developing frameworks that combine metamodels with optimization concepts. The main focus of the water resources related metamodelling literature is also on this framework development (e.g., as in Behzadian et al., 2009; Broad et al., 2005; Mugunthan and Shoemaker, 2006; Zou et al., 2007, 2009). The existing metamodelling frameworks can be classified into three main approaches.

The simplest approach to metamodelling is what we refer to as the basic sequential approach (also called off-line in Jin, 2005); it has been used in different environmental and water resources problems (e.g., in Broad et al., 2005; Johnson and Rogers, 2000; Khu and Werner, 2003; Liong et al., 2001; Zhang et al., 2009; Zou et al., 2007). This approach involves a minimum of communication/feedback between the metamodel and the original expensive model, where the metamodel is essentially fitted only once.

A more mature approach to metamodelling is what we call the “adaptive–recursive approach”. This metamodelling approach starts with a DoE (Step 1) to identify the initial set of design sites. In Step 2, a global/local metamodel is fitted on the initial design sites. In Step 3, usually an optimization algorithm or sometimes another type of search engine/sampler (even a uniform/normal random sampler as in Regis and Shoemaker, 2007b, 2009) is employed to find the optimal/near-optimal point (or multiple high-quality points) on the metamodel. Then these candidate points are evaluated by the original function and added to the set of design sites to recursively update the metamodel. Step 2 and Step 3 are subsequently repeated many times to adaptively evolve the metamodel until some convergence or stopping criterion is met. The best solution found during the entire trial (available in the final set of design sites) is returned as the optimal/near-optimal solution to the original function. The adaptive–recursive approach is probably the most commonly used metamodelling approach in the environmental and water resources context (Bau and Mayer, 2006; Bliznyuk et al., 2008; di Pierro et al., 2009; Fen et al., 2009; Mugunthan and Shoemaker, 2006; Mugunthan et al., 2005; Regis and Shoemaker, 2007a,b, 2009; Zou et al., 2007, 2009). Therefore, we selected three different metamodel-enabled optimization frameworks (see Section 3) based on the adaptive–recursive approach in our numerical comparisons.
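The Steps 1–3 loop just described can be sketched compactly. The following is a generic, minimal illustration rather than any of the DACE–GA, ANN–GA, or MLMSRBF implementations used later in this study: the DoE is plain uniform random sampling, scikit-learn's Gaussian-process regressor stands in for a kriging metamodel, and the "search" over the metamodel is simple random-candidate screening.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def adaptive_recursive_optimize(expensive_f, bounds, n_init, n_total, rng=None):
    """Minimize expensive_f over the box 'bounds' (list of (lo, hi) pairs) with a
    simple adaptive-recursive metamodel loop: DoE -> fit metamodel -> search the
    metamodel -> evaluate the selected candidate with the original function ->
    refit, and repeat until the budget of n_total original evaluations is spent."""
    rng = rng or np.random.default_rng(0)
    lo, hi = np.array(bounds, dtype=float).T
    # Step 1: design of experiment (plain uniform sampling as a placeholder)
    X = rng.uniform(lo, hi, size=(n_init, len(bounds)))
    y = np.array([expensive_f(x) for x in X])
    while len(y) < n_total:
        # Step 2: (re)fit the metamodel on all design sites evaluated so far
        surrogate = GaussianProcessRegressor().fit(X, y)
        # Step 3: cheap search on the metamodel; here random-candidate screening
        cand = rng.uniform(lo, hi, size=(5000, len(bounds)))
        x_new = cand[np.argmin(surrogate.predict(cand))]
        # Evaluate the selected candidate with the original expensive function
        X = np.vstack([X, x_new])
        y = np.append(y, expensive_f(x_new))
    return X[np.argmin(y)], y.min()

# e.g. best_x, best_f = adaptive_recursive_optimize(lambda x: float(np.sum(x**2)),
#                                                   [(-5, 5)] * 3, n_init=8, n_total=40)
```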
The third approach in the metamodelling literature, which we call the “metamodel embedded evolution approach”, shares some characteristics with the adaptive–recursive approach, but it has significant differences. For example, the metamodel embedded approach is inherently designed to be used with evolutionary (i.e., population-based) optimization algorithms in a way that an evolutionary algorithm selectively switches between the original expensive function and the metamodel to evaluate the individuals at each generation. Examples of metamodelling frameworks in the environmental and water resources field following this approach are Behzadian et al. (2009), Khu et al. (2004), Kourakos and Mantoglou (2009), Ostfeld and Salomons (2005), Regis and Shoemaker (2004), Shoemaker et al. (2007), and Yan and Minsker (2006).

Some researchers have developed surface functions that incorporate metamodel approximation uncertainty (Jones, 2001; Jones et al., 1998; Regis and Shoemaker, 2007b; Sobester et al., 2005). In this approach a measure is needed to quantify the predictive uncertainty of metamodels. Such a measure is explicitly available in some metamodels, e.g., in polynomials, kriging (DACE), Gaussian radial basis function models (Sobester et al., 2005), and smoothing spline ANOVA (Gu, 2002). Regis and Shoemaker (2007b) utilize a distance-based metric (i.e., the minimum distance from previously evaluated design sites) to provide a heuristic measure of the predictive uncertainty for any given metamodel; this measure is combined with the metamodel function approximation value in order to guide which solution is selected for evaluation by the original function.

2.2.3. Assessing metamodelling time and analyst time

To evaluate the true efficiency of a metamodel-enabled optimizer, besides the computational burden associated with the original or expensive simulation model evaluations, other time consuming efforts should be taken into account, namely “metamodelling time” and “analyst time”. Metamodelling time is the time needed for determining the location of initial DoE sites, metamodel fitting and refitting (which can be prohibitively long, especially when the metamodel is a neural network as in Broad et al., 2005), metamodel evaluations, and the search procedure on the metamodel. Metamodelling time is considered in some water resources metamodelling studies such as Behzadian et al. (2009), Broad et al. (2005), and Kourakos and Mantoglou (2009), and in these studies, metamodelling time is based on tracking total optimization run time as opposed to simply counting the number of original model runs required. Unfortunately, tracking (and then comparing) total optimization run times requires more care than tracking the number of original model runs as it is dependent on the computer programming language, the skill of the person writing the code, and the processor on which the code is executed.

Analyst time is the human time required to develop and/or apply a metamodel-enabled optimization algorithm. None of the water resources related metamodelling studies we are aware of considered the analyst time in their comparative assessment against existing metamodel-independent optimizers. Obviously, if a practitioner uses readily available software implementing a metamodel-enabled algorithm, the analyst time for such an experiment is zero (or equivalent to the analyst time using an optimizer without metamodelling). Unfortunately, analyst time is not something that can be ignored currently since most water resources related metamodelling studies introducing new frameworks or example metamodel-enabled optimizers do not make corresponding software available for future researchers/users.

Fig. 2 shows the computational budget dependent relative metamodel performance (introduced in Fig. 1 – Case B) when metamodelling time and analyst time are taken into account. As can be seen, when metamodelling time and/or analyst time are considered, the equivalence time, t*, decreases and, in situations with substantial analyst time, relative metamodel performance can transfer from Case B to Case C.

Fig. 2. Case B relative performance of metamodel-enabled optimizers compared to optimizers without metamodelling (O) when metamodelling time and analyst time are taken into account. M0: computational budget considered for comparison is only based on the number of original function evaluations; M1: computational budget also includes metamodelling time; M2: both metamodelling time and analyst time are considered.

2.3. Difficulties in high-dimensional problems

Metamodelling becomes less attractive or even infeasible due to the curse of dimensionality when the number of decision variables is large (Wang and Shan, 2007). In such a problem, the primary issue is that the minimum number of design sites required to fit some metamodels can be excessively large. For example, to determine the coefficients of a second-order polynomial in a D-dimensional input space, at least p = (D + 1)(D + 2)/2 design sites are required. Note that this curse of dimensionality problem exists in all other metamodels being augmented by second-order polynomials (e.g., RBF models and kriging if used in conjunction with second-order polynomials). For example, the kriging software we utilized in this paper (Lophaven et al., 2002) allows users an option to use second-order polynomials. The minimum number of design sites in RBF models and kriging augmented by first-order polynomials is D + 1, which is not very limiting. In ANNs, there is no clear mathematical minimum number of design sites, but practically, it is commonly accepted that this number should be greater than the number of network weights.

Most importantly, high-dimensional problems have an extremely large search space. As such, the number of design sites required to reasonably cover the space becomes extremely large for a higher number of decision variables (DVs). As a result, the number of DVs reportedly tackled by metamodel-enabled optimizers is typically not large, and most metamodel applications in the environmental and water resources context are on functions having fewer than 15–20 DVs (as in Bau and Mayer, 2006; Bliznyuk et al., 2008; Fen et al., 2009; Khu et al., 2004; Khu and Werner, 2003; Liong et al., 2001; Mugunthan and Shoemaker, 2006; Mugunthan et al., 2005; Ostfeld and Salomons, 2005; Regis and Shoemaker, 2004, 2007a,b, 2009; Shoemaker et al., 2007; Zhang et al., 2009; Zou et al., 2007, 2009). Therefore, the numerical experiments conducted in this study utilized case studies with only 7–15 DVs.

Screening is the most commonly used compensating solution for difficulties associated with metamodelling in high-dimensional problems. Based on the fact that models never respond strongly to all inputs, the DV space is typically screened to identify and remove DVs that are less important. Various approaches to screening, especially for high-dimensional model representation, are available in the literature (e.g., Ratto et al., 2007; Young and Ratto, 2009). However, it can be difficult to obtain substantial reductions of dimensionality for large-scale problems (Koch et al., 1999), and any reduction in dimensionality is also accompanied by a decrease in the overall accuracy of approximation unless only the absolutely irrelevant parameters (if they exist) are fixed.

3. Metamodel-enabled optimizers

We adopted three metamodel-enabled optimizers to be evaluated against the benchmark optimizers without metamodelling. Although various metamodel-enabled optimizers have been developed, software packages implementing them are not as commonly available as optimizers without metamodelling. One available metamodel-enabled optimization software package in the environmental and water resources metamodelling literature (available at http://www.sju.edu/~rregis/pages/software.html) implements multistart local metric stochastic RBF (MLMSRBF) developed by Regis and Shoemaker (2007b). MLMSRBF, described in Section 3.1, was employed in this study as a well-established, readily available metamodel-enabled optimizer. Two other metamodel-enabled optimizers, which are called DACE–GA and ANN–GA hereafter, were also tested in this study. Both ANN–GA and DACE–GA were implemented in almost the same GA-based metamodel-enabled optimization framework, with the only difference being the function approximation techniques (ANN or DACE) and their associated design/fitting procedures. Sections 3.2 and 3.3 deal with developing/implementing DACE–GA and ANN–GA, respectively. All three of the above metamodel-enabled optimizers follow the general adaptive–recursive approach described in Section 2.2.2.

3.1. Multistart local metric stochastic RBF

Multistart local metric stochastic RBF (MLMSRBF) is an efficient RBF-embedded optimizer that has shown superior performance over multiple existing metamodel-enabled optimizers, specifically for problems with 8–15 decision variables (Regis and Shoemaker, 2007b). MLMSRBF has been successfully applied in multiple studies (e.g. Mugunthan and Shoemaker, 2006; Mugunthan et al., 2005; Regis and Shoemaker, 2009). MLMSRBF implicitly considers the metamodel approximation uncertainty (see Section 2.2.2). The algorithm starts with a DoE and iteratively generates candidate points by perturbing the current best solution through a normal distribution with zero mean and a specified covariance matrix. MLMSRBF implements a local search in the close vicinity of the current best solution since the spread of this normal distribution is relatively small compared to the size of the feasible space. To obtain a global search capability, MLMSRBF includes multiple independent restarts, each initialized using a new DoE, whenever it appears to have converged to a local minimum. See Regis and Shoemaker (2007b) for full algorithm details.
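The candidate-generation idea at the core of MLMSRBF can be pictured as follows. This is only a sketch of the concept described above (normally distributed perturbations of the current best solution, with candidates scored by combining the RBF prediction and the minimum distance to already-evaluated design sites); the spread, the weights, and the use of scipy's RBF interpolator are illustrative stand-ins, not the exact settings of Regis and Shoemaker (2007b).

```python
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.spatial.distance import cdist

def propose_candidate(X, y, x_best, lo, hi, n_cand=100, w_pred=0.7, rng=None):
    """Generate normally distributed candidates around the current best point and
    return the one with the best weighted score of (low) RBF-predicted objective
    value and (high) minimum distance from the existing design sites X."""
    rng = rng or np.random.default_rng()
    sigma = 0.1 * (hi - lo)                       # small spread => local search
    cand = np.clip(x_best + rng.normal(0.0, sigma, size=(n_cand, len(lo))), lo, hi)
    surrogate = RBFInterpolator(X, y)             # RBF metamodel on design sites
    pred = surrogate(cand)
    dist = cdist(cand, X).min(axis=1)
    # normalize both criteria to [0, 1]; lower combined score is better
    p = (pred - pred.min()) / (np.ptp(pred) + 1e-12)
    d = 1.0 - (dist - dist.min()) / (np.ptp(dist) + 1e-12)
    return cand[np.argmin(w_pred * p + (1.0 - w_pred) * d)]
```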
3.2. Design and analysis of computer experiment–genetic algorithm (DACE–GA)

DACE–GA employs the DACE function approximation technique in conjunction with a GA. DACE has been used as a metamodel in Bau and Mayer (2006) for pump-and-treat optimization and in di Pierro et al. (2009) for water distribution system design optimization. Although not as common in the environmental and water resources optimization literature, DACE has been widely used in other fields for approximating computer simulation models, and hence, we selected it as one of our metamodelling techniques. We selected a GA as the optimization engine to search over the approximated response surface (metamodel) because GAs are so commonly used in conjunction with metamodelling (Broad et al., 2005; di Pierro et al., 2009; Fen et al., 2009; Khu et al., 2004; Khu and Werner, 2003; Ostfeld and Salomons, 2005; Yan and Minsker, 2006; Zou et al., 2007, 2009).

In addition, Johnson and Rogers (2000) report that the quality of solutions obtained through an adaptive–recursive approach based metamodel-enabled optimizer is mostly controlled by metamodelling performance, not the search (optimization) algorithm (which is a GA in DACE–GA). This is also consistent with the results of our preliminary experiments. In other words, different global optimization algorithms require different computational budgets to converge to the near-global optimum, but when applied on fast-to-run metamodels, the difference, which is a small part of the total computational budget needed for metamodel-enabled optimization, is negligible. The GA used in developing our metamodel-enabled optimizers is the same as the GA used as a benchmark optimization algorithm without metamodelling that is detailed in Section 4.1.

It is worth noting that, initially, our goal was to exactly replicate previously published implementations of the DACE–GA (and ANN–GA, explained later in Section 3.3) frameworks. Unfortunately, since software was unavailable and there is a shortage of required framework implementation details in the publications from various sources, exact replication was not possible. Instead, we implemented our own interpretation of the framework based on the adaptive–recursive approach that shares features with other similar metamodel-enabled optimizers (e.g., Regis and Shoemaker, 2007a; Zou et al., 2007, 2009). Fig. 3 shows the flowchart of our DACE–GA metamodel-enabled optimizer. Details of the metamodel, the size of the initial set of design sites, p, and the procedure and frequency of metamodel (re)fitting are presented in the following paragraphs.

Fig. 3. Flowchart of the developed DACE–GA and ANN–GA metamodel-enabled optimizers – the term ‘metamodel’ in this flowchart interchangeably represents ANNs and DACE.

To design the metamodelling framework and initial DoE, there is a basic question which needs to be answered first: how much extra computational budget or equivalent time should be allocated to metamodel associated efforts, called metamodelling time (see Section 2.2.3)? Clearly, the metamodelling time allocated can be in a direct relationship with the computational time required for the original computationally expensive model evaluations. For instance, it is quite logical to allocate 5 s on average to metamodelling time in order to avoid an original model evaluation when the original model takes 5 min to run, while it seems illogical to allocate the same 5 s when the running time of the original model is only 10 s. In this study, for the two real-world computationally intensive case studies (see Section 5.2), we limited ourselves to using at most 5% of our available computational budget for metamodelling time. Note that since the four test functions used in this study (see Section 5.1) were assumed to represent the response surface of computationally intensive simulation models, we ignored the issue of metamodelling time in these problems as it was assumed negligible.

In this study, a well-established, well documented implementation of DACE written by Lophaven et al. (2002) was utilized. It is a MATLAB add-on toolbox (available at http://www2.imm.dtu.dk/~hbn/dace/) which supports a variety of user-selected correlation functions. The DACE model applied in this study utilizes a Gaussian correlation function augmented with a first-order polynomial. The minimum number of design sites required to initially fit our selected DACE approximation in a D-dimensional space is D + 1, while the optimal number is highly function- and computational-budget dependent and very difficult to determine. The term “optimal” here reflects the fact that increasing the number of initial design sites would enhance the accuracy of fit (a positive effect); however, after some point (which is the optimum) this enhancement would be at the expense of unnecessarily increasing the computational budget that has to be initially spent on the DoE while it could have been spent more intelligently in the next steps (a negative effect). In our DACE–GA implementation, the size of the initial set of design sites was calculated as follows based on some preliminary experiments:

p = max[2(D + 1), 0.1n]        (1)

where n is the total number of original function evaluations (which typically accounts for almost all of the available computational budget) to be evaluated during the optimization. When n is relatively small, 2(D + 1) initial design sites are used, which is twice as many design sites as the minimum requirement (Regis and Shoemaker, 2007b, recommend 2(D + 1) initial design sites). When n becomes larger, in order to design a more detailed metamodel, 0.1n is used as the size of the initial DoE.
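Equation (1) written out as a small helper; the function name is hypothetical.

```python
def initial_doe_size(D, n):
    """Size of the initial set of design sites per Eq. (1): p = max[2(D + 1), 0.1n],
    where D is the number of decision variables and n is the total number of
    original function evaluations available for the whole optimization run."""
    return max(2 * (D + 1), int(round(0.1 * n)))

# e.g. a 10-dimensional problem with a budget of 500 original evaluations:
# initial_doe_size(10, 500) -> max(22, 50) = 50
```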
Fitting/refitting the metamodel can become computationally demanding, such that the frequency of refitting the metamodel must not be too high. Inversion of an i × i matrix (i = number of design sites, p ≤ i ≤ n) is the main computational effort of DACE output calculation for a given input. Therefore, assuming Gauss–Jordan elimination is applied for matrix inversion (the actual inversion method is unclear in the DACE software), the complexity of the (re)fitting problem can be assumed to be proportional to O(i^3). Refitting of DACE in our DACE–GA is performed in two levels: the first level is to simply add the new points to the set of design sites in DACE, but the second level is to re-optimize (re-tune) the DACE hyper-parameters (correlation function parameters) over the entire set of design sites through maximum likelihood estimation. As the first level is relatively fast, even for large i values (in this study i ≤ n ≤ 1000), this level is performed whenever the algorithm goes back to the refitting step (second box in the flowchart shown in Fig. 3). However, since the second level can be computationally demanding for large i values, in order to limit metamodelling time, the frequency of performing the second level should be decreased as i becomes larger. This decreasing trend can exactly follow the polynomial form of the complexity equation mentioned above.

In our DACE–GA implementation, the second level is always performed at the DACE refitting step while i ≤ 100, but afterwards, this level is performed less often following the complexity equation, so that, for example, for i values around 200, the second level is performed after receiving every 8 new design sites (8 = 200^3/100^3), and for i values around 500, after receiving every 125 new design sites (125 = 500^3/100^3). Our two-level refitting approach seems unique in comparison with previous DACE-based studies, although exact details of the refitting strategies employed in previous papers are not always clearly reported.
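The two-level refitting schedule can be summarized in a few lines. Reading the rule above as "re-tune the hyper-parameters after every (i/100)^3 new design sites once i exceeds 100" is our interpretation of the text; the helper and its name are illustrative.

```python
def should_retune_hyperparameters(i, new_sites_since_last_retune):
    """Level 1 (simply appending new design sites) happens at every refitting step;
    this helper only schedules level 2, the maximum-likelihood re-tuning of the
    DACE correlation parameters, whose cost grows roughly like O(i^3)."""
    if i <= 100:
        return True                       # always re-tune while it is still cheap
    interval = (i / 100.0) ** 3           # ~8 at i = 200, ~125 at i = 500
    return new_sites_since_last_retune >= interval
```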
3.3. Artificial neural network–genetic algorithm (ANN–GA)

ANN–GA utilizes the same framework presented in Section 3.2 for DACE–GA. As explained later in this section, we believe that there are multiple problems and shortcomings associated with the application of ANNs in metamodel-enabled optimizers for computationally intensive problems. Despite this view, it is hard to ignore the empirical evidence showing that ANNs are the most commonly employed metamodel in environmental and water resources optimization problems (as in Behzadian et al., 2009; Broad et al., 2005; Johnson and Rogers, 2000; Khu and Werner, 2003; Kourakos and Mantoglou, 2009; Liong et al., 2001; Yan and Minsker, 2006; Zhang et al., 2009; Zou et al., 2007, 2009) and are mostly used in conjunction with GAs (as in Broad et al., 2005; Khu and Werner, 2003; Liong et al., 2001; Yan and Minsker, 2006; Zou et al., 2007, 2009). Therefore, we adopt such a combination (ANN–GA) in this study. Similar to the DACE–GA discussion in Section 3.2, we could not exactly replicate previously published ANN–GA frameworks due to ANN–GA software unavailability and a lack of framework implementation details in various ANN–GA publications. Fig. 3 shows the flowchart of ANN–GA when an ANN is used as the metamodel. For the ANN implementation, the MATLAB Neural Network Toolbox (Beale et al., 2010) was used.

3.3.1. ANN structure and training

The ANN design and fitting/refitting procedures, along with the difficulties associated with the application of ANNs in the metamodelling context (and how we tried to address them), are presented in this section. Determination of the optimal or proper structure of ANNs is a main step in ANN-based metamodel design. In a multilayer perceptron neural network (MLP), the number of hidden layers, the number of neurons in each hidden layer, and the transfer functions are the subjective decisions (structure parameters) the user must make. ANNs with one sigmoidal hidden layer and a linear output layer have been proven capable of approximating any function with any desired accuracy provided that the associated conditions are satisfied (Hornik et al., 1989; Leshno et al., 1993). Almost all metamodel-enabled optimization frameworks using ANNs have utilized single-hidden-layer neural networks. For example, we are only aware of one ANN-based metamodelling study (Liong et al., 2001) that used an MLP with more than one hidden layer. Accordingly, in our study, a single-hidden-layer neural network with a tangent sigmoid function in the hidden layer and a linear function in the output layer was used. The number of parameters (weights and biases) to be adjusted in such an ANN is m × (2 + D) + 1, where D is the number of inputs (DVs of the optimization problem) and m is the number of hidden neurons. The optimal number of hidden neurons, m, is a function of the form and complexity of the underlying function (Xiang et al., 2005) as well as the training data availability. In the optimization context, the form of the function to be optimized is often unclear; therefore, the number of data points available for training, p, is the main factor involved in determining m. It is usually preferred that the number of ANN parameters be less (or much less) than p, as discussed in Maier and Dandy (2000), although mathematically there is no limitation when the number of parameters is higher than p. A possibly good idea is to enlarge m dynamically as more design sites become available. In this study, we followed the trial-and-error approach used in all similar studies (Behzadian et al., 2009; Broad et al., 2005; Johnson and Rogers, 2000; Khu and Werner, 2003; Kourakos and Mantoglou, 2009; Liong et al., 2001; Yan and Minsker, 2006; Zhang et al., 2009; Zou et al., 2007, 2009). Clearly, this trial-and-error step considerably adds to both analyst time and metamodelling time. Generally, for a specific problem, there are multiple good structures, and for each structure, there are many good/acceptable parameter sets. However, the error surface (the ANN error function with respect to network weights and biases) of the more parsimonious networks is more complex and harder to train, while the more flexible ones may become over-parameterized and degrade in generalization ability (Razavi and Tolson, 2011). In this study, significant effort was also devoted to determining a proper number of initial design sites, p (these numbers are presented in Section 6).

As there is no guarantee of reaching a good solution when whatever ANN training algorithm is applied has converged, at the first ANN training step we conduct 50 independent training trials starting from different solutions initialized through the Nguyen–Widrow method (Nguyen and Widrow, 1990) and take the best one (the one with the lowest error function value – see equation (2) as introduced in Section 3.3.2). Like DACE–GA, refitting the ANN in our ANN–GA is also performed in two levels: the first level is to add the new point to the set of design sites and re-train the ANN starting from the current state (current weights and biases), but the second level is to re-train an ANN by exactly the same procedure as the one used in the first ANN training step (50 independent training trials). Like DACE–GA, the first level is performed whenever the algorithm goes back to the refitting step (second box in the flowchart shown in Fig. 3). But the second level is performed after every 50 new design sites are collected. The logic behind frequently re-training the ANN from scratch is that it helps ANN–GA escape from false regions of attraction, which sometimes may be captured by an inappropriately formed ANN, and thus explore the search space more effectively and globally. The second-order variations of backpropagation algorithms (i.e., quasi-Newton algorithms) are the most computationally efficient training algorithms (Hamm et al., 2007). In this study, the highly efficient Levenberg–Marquardt algorithm available in the MATLAB neural network toolbox (Hagan and Menhaj, 1994) was used. ANN training even through the Levenberg–Marquardt algorithm is computationally demanding. The analyst and metamodelling times spent for designing and (re)training an ANN metamodel can be prohibitively long.
3.3.2. Non-interpolation issue in emulation

In addition to the ANN challenges associated with its subjective design process and computationally demanding training, there is a shortcoming in applying ANNs to approximate the deterministic response of computer simulation models. As stated in Section 2.2.1, neural networks are non-interpolating approximators, also called inexact emulators (as opposed to interpolating approximators such as DACE), producing a varying bias (usually unpredictable) at different design sites. ANNs may suffer the most from this varying bias when emulating the deterministic response of a system as, according to Villa-Vialaneix et al. (2011), other non-interpolating emulation techniques may perform better in such cases. The interpolative/non-interpolative distinction is important since there are two general types of problems involving function approximation: physical experiment and computer experiment.

There may exist substantial random errors in physical experiments due to different error sources; whereas computer simulation models are usually deterministic (noise-free), which means observations generated by a computer model experiment with the same set of inputs are identical. This distinction has also been acknowledged in some publications, e.g., Sacks et al. (1989) and Jones (2001). Accordingly, non-interpolating approximation techniques such as ANNs are more suitable for physical experiments than computer experiments, as the usual objective is to have an approximation that is insensitive (or less sensitive) to noise. As a result, ANNs have been widely and successfully applied in relating or approximating different error-prone variables existing in water resources problems including: meteorological variable forecasting (e.g., Karamouz et al., 2008; Luk et al., 2000), rainfall-runoff modelling (e.g., Hsu et al., 1995; Khan and Coulibaly, 2006; Shamseldin, 1997), and streamflow modelling and prediction (e.g., Razavi and Araghinejad, 2009; Razavi and Karamouz, 2007).

Jones (2001) pointed out that non-interpolating surfaces are unreliable because they might not sufficiently capture the shape of the deterministic underlying function. However, Ratto and Pagano (2010) showed that the most advanced smoothing (non-interpolating) methods compare favourably with kriging, which is an interpolant. Overall, in metamodel-enabled optimization, it is sometimes beneficial to have a smooth approximate response surface passing between design sites in regions of the feasible space with inferior quality, as it may lead the search smoothly to the regions of attraction. On the other hand, it can be very misleading in regions of attraction, where the superiority of candidate solutions over each other is very important and the key to continuing the search. To our knowledge, this issue has not been addressed in the metamodelling literature. In this study, to diminish this inherent negative effect, we devised a weighted error function to be used in ANN training instead of the usual error functions, which put equal emphasis on all design sites (and in our preliminary ANN–GA experiments yielded inferior results compared to when our weighted version was used). An ANN error function quantifies the discrepancies between the ANN outputs and design sites, which are to be minimized in the training process. The new error function is a weighted sum of squared errors as follows:

wsse = Σ_{j=1}^{i} w_j e_j^2        (2)

where i is the number of design sites, e_j is the approximation error for the jth design site, and w_j is its corresponding weight. The w_j values are linearly proportional to the quality (objective function value) of a design site in a way that the best design site gets a weight value of 1 and the worst one gets 0.1. This strategy leads the neural network to learn the better design sites (which are more likely in the main region of attraction) more accurately and only capture the general trend (less accuracy) of the underlying response surface in the poorer quality regions. We implemented our wsse through the customization capabilities of the MATLAB neural network toolbox.
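Equation (2) written out directly; the linear mapping of objective function values to weights between 1 (best design site) and 0.1 (worst) follows the description above, assuming a minimization problem, and the function name is our own.

```python
import numpy as np

def weighted_sse(ann_outputs, targets):
    """wsse = sum_j w_j * e_j^2 (Eq. 2), where e_j is the approximation error at
    design site j and w_j scales linearly from 1.0 for the best (lowest objective
    function value) design site down to 0.1 for the worst one."""
    targets = np.asarray(targets, dtype=float)
    errors = np.asarray(ann_outputs, dtype=float) - targets
    spread = targets.max() - targets.min()
    if spread == 0:
        weights = np.ones_like(targets)
    else:
        weights = 1.0 - 0.9 * (targets - targets.min()) / spread
    return float(np.sum(weights * errors ** 2))
```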
3.3.3. Over-fitting issue

The other issue in ANN-based metamodels is over-training. Although the likely occurrence of this phenomenon is widely accepted when ANNs are applied to approximate physical processes (fitting on physical experiments), surprisingly, some researchers believe that over-training never occurs when ANNs are fitted over noise-free data (i.e., data obtained from deterministic computer experiments). For example, in water resources independent publications, this belief has been explicitly stated in Sexton et al. (1998), expressing “when there is no error in the data, as with the examples so far, an NN cannot be over-trained”. Similarly, in the paper by Jin et al. (2002), which proposes the concept of evolution control in metamodel-enabled optimizers based on the metamodel embedded evolution approach (see Section 2.2.2), no attention/effort was devoted to avoiding ANN over-training. Likewise, among ANN enabled optimization papers in the water resources literature, there are papers involving ANNs as metamodels which were not concerned about over-training at all.

In statistics, over-fitting is usually described as when the model dominantly fits the noise existing in the data rather than the underlying function. This phenomenon is more likely when the model has a large degree of freedom (is over-parameterized) compared to the amount of available data. However, there is another factor affecting the potential for over-fitting, which is the conformability of the model structure with the shape of the available data. Curve-fitting (regression analysis) practices are less prone to the negative effects of this factor because they have a global pre-specified model structure (form) covering the input variable space. In the ANN context, by contrast, as the response of neural networks is formed by a union of a number of local flexible non-linear units, the problem associated with the conformability factor can be substantial (especially when the highly efficient quasi-Newton training algorithms are used). This is not only an issue with noisy data; it may also occur when the data are noise-free, especially when the underlying function in the noise-free data is non-smooth or complex (i.e., highly multi-modal). Therefore, the so-called over-training issue in the ANN context can be caused by both aforementioned factors.

As in the metamodelling context (noise-free data) the second factor may cause over-training (and it is sometimes ignored), Fig. 4 demonstrates how it may happen and how significant its effect can be through a simple illustrative example. In this figure, the underlying function is y = x^2 − 0.1 cos(10πx), and the objective is to approximate it with 50 random design sites. Two single-hidden-layer ANNs differing in the number of hidden neurons, 15 and 8, are applied. The response of both ANNs after 1000 epochs with the Levenberg–Marquardt training algorithm is shown in Fig. 4. The first ANN, which has 46 parameters (weights and biases), is a relatively flexible ANN when compared to the number of design sites (Fig. 4a), while the second one, with 25 parameters, is relatively parsimonious (Fig. 4b). As demonstrated, although in a majority of the problem domain the ANNs have properly captured the underlying function, both ANNs have presented wild and unpredictable fluctuations in some other parts of the domain. The occurrence probability of such behaviour is higher in the areas where fewer design sites are found.

Fig. 4. Examples of over-trained ANN responses over noise-free data: (a) a relatively flexible ANN, and (b) a relatively parsimonious ANN.
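The setup of this illustrative example can be reproduced approximately as below. The input domain (here [-1, 1]) is an assumption not stated in the text, and scikit-learn's L-BFGS training stands in for the Levenberg–Marquardt algorithm used in the paper, so the exact fluctuations of Fig. 4 will not be reproduced; with 15 and 8 hidden neurons the parameter counts match the 46 and 25 quoted above for a one-dimensional input.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 50)                       # 50 random design sites
y = x**2 - 0.1 * np.cos(10 * np.pi * x)              # noise-free underlying function

X = x.reshape(-1, 1)
x_dense = np.linspace(-1, 1, 400).reshape(-1, 1)
for m in (15, 8):                                    # flexible vs parsimonious ANN
    net = MLPRegressor(hidden_layer_sizes=(m,), activation="tanh", solver="lbfgs",
                       max_iter=1000, random_state=0).fit(X, y)
    # inspect net.predict(x_dense) between design sites to look for the wild,
    # unpredictable fluctuations discussed above
    print(m, "hidden neurons, training MSE:", np.mean((net.predict(X) - y) ** 2))
```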

behind this technique, which tries to keep a balance between accuracy and smoothness, and the steps involved are very sophisticated, it can be easily used through its efficient MATLAB implementation. Our understanding is that Bayesian regularization may substantially sacrifice neural network accuracy for smoothness, particularly for noise-free data. We have verified this statement through some experiments which are not presented in this paper. In this study, the early stopping approach was used to avoid over-training. As higher quality design sites are of more importance in metamodel fitting (higher accuracy in more promising regions is desired), the testing sites were not selected from the first one-third best sites.

Over-fitting may also occur in other function approximation techniques, including DACE, even when being fitted over noise-free data, due to the aforementioned conformability issue. However, the risk and extent of over-fitting in DACE is typically less compared to ANNs. The risk of over-fitting in DACE is higher when there are very few design sites relative to the number of DACE hyper-parameters (i.e., correlation function parameters) to be tuned (Welch et al., 1992). Note that, for example, the DACE model used in this study with a Gaussian correlation function has D correlation function parameters (hyper-parameters), each associated with one dimension of the design site space (D equals the number of decision variables in the original optimization problem), and i parameters (i is the number of design sites) determined through the BLUP (best linear unbiased predictor) (Sacks et al., 1989). As such, the number of hyper-parameters in DACE is typically considerably less than the number of parameters in ANNs. As an example, for the first test problem in this study, as demonstrated in Section 6.1, the number of ANN parameters was determined as 85, 121, and 181 for the different computational budgets, while the number of DACE hyper-parameters was 10 for the same test problem (for all computational budget scenarios). As over-fitting in DACE is not a major challenge, it has not been directly addressed in some DACE studies, including this work. Over-fitting has also been addressed in other function approximation techniques, including smoothing spline ANOVA models (Gu, 2002; Storlie et al., 2011).
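These parameter counts can be recovered from a simple bookkeeping exercise. The expression below assumes the ANN metamodels are single-hidden-layer feedforward networks with one output and bias terms, an assumption that is consistent with the counts quoted above:

\[ n_{\mathrm{ANN}} \;=\; \underbrace{m(D+1)}_{\text{input-to-hidden weights and biases}} \;+\; \underbrace{(m+1)}_{\text{hidden-to-output weights and bias}} \;=\; m(D+2)+1, \]

so that, for the 10-dimensional first test problem (D = 10), m = 7, 10, and 15 hidden neurons yield 85, 121, and 181 adjustable parameters, respectively, compared with only D = 10 correlation-function hyper-parameters for the DACE metamodel.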
4. Benchmark optimization algorithms

Two benchmark optimization algorithms (without metamodelling), GA and dynamically dimensioned search (DDS), were used in this study to develop a baseline for assessing the applied metamodel-enabled optimizers. Whenever applicable, DDS was also enabled with the so-called "model preemption" strategy to enhance the efficiency of the optimization process.

4.1. Genetic algorithm (GA)

The GA was used as a benchmark optimization algorithm, despite its potential ineffectiveness for limited computational budgets, because GAs are one of the most commonly used families of optimization algorithms in environmental and water resources (Nicklow et al., 2010) and are one of the most common benchmark algorithms applied in metamodelling studies (Broad et al., 2005; Fen et al., 2009; Khu et al., 2004; Ostfeld and Salomons, 2005; Zou et al., 2007). In this study, the GA in the MATLAB global optimization toolbox (MathWorks, 2010) was used. The specific GA reproduction steps utilized here were tournament selection, scattered crossover, and adaptive feasible mutation, chosen based on our previous experience with this GA. Note that scattered crossover and adaptive feasible mutation are the default operators for bounded optimization problems. Details of these operators are well documented in MathWorks (2010). The GA parameters that should be tuned for any specific problem are presented in Section 5.3.

4.2. Dynamically dimensioned search (DDS)

DDS (Tolson and Shoemaker, 2007) was selected here as the second benchmark optimization algorithm because it was designed, and has been demonstrated, to work very well when the computational budget is very limited. DDS, which is a single-solution based algorithm, is unique compared to other optimization algorithms with respect to the way that the neighbourhood is dynamically defined by changing the dimension of the search as a function of the current iteration number and the user-specified maximum number of function evaluations. One valuable feature of DDS is that it requires no algorithm parameter adjustment, unlike other commonly used stochastic global search algorithms such as GAs. This feature becomes more important in computationally intensive optimization problems, as tuning algorithm parameters in such problems can be prohibitively time-consuming, forcing practitioners to use default algorithm parameter settings which may be far from optimal.
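Because DDS serves as a key benchmark in the comparisons that follow, a minimal sketch of its neighbourhood logic is given below. This is a simplified illustration following the description above and Tolson and Shoemaker (2007), not the authors' implementation; the objective function f, the bounds lb and ub, and the perturbation parameter r = 0.2 are placeholders or assumed defaults.

import numpy as np

def dds(f, x0, lb, ub, max_evals, r=0.2, seed=None):
    # Minimal sketch of dynamically dimensioned search (DDS): a greedy,
    # single-solution search whose neighbourhood shrinks in dimension
    # as the iteration counter approaches the user-specified budget.
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    x_best = np.asarray(x0, float)
    f_best = f(x_best)
    d = x_best.size
    for i in range(1, max_evals):
        # Probability of perturbing each decision variable decreases with
        # iteration count, dynamically reducing the dimension of the search.
        p = 1.0 - np.log(i) / np.log(max_evals)
        perturb = rng.random(d) < p
        if not perturb.any():
            perturb[rng.integers(d)] = True  # perturb at least one variable
        x_new = x_best.copy()
        sigma = r * (ub - lb)
        x_new[perturb] += sigma[perturb] * rng.standard_normal(perturb.sum())
        # Reflect candidate values that fall outside the bound constraints.
        for j in np.where(perturb)[0]:
            if x_new[j] < lb[j]:
                x_new[j] = lb[j] + (lb[j] - x_new[j])
                if x_new[j] > ub[j]:
                    x_new[j] = lb[j]
            elif x_new[j] > ub[j]:
                x_new[j] = ub[j] - (x_new[j] - ub[j])
                if x_new[j] < lb[j]:
                    x_new[j] = ub[j]
        f_new = f(x_new)
        if f_new <= f_best:  # greedy acceptance of the candidate
            x_best, f_best = x_new, f_new
    return x_best, f_best

A call such as dds(lambda x: float((x ** 2).sum()), x0=np.full(10, 100.0), lb=np.full(10, -600.0), ub=np.full(10, 600.0), max_evals=100) illustrates the kind of limited-budget trial considered in this study.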
4.3. DDS with preemption

This study also uses a more efficient extension of DDS, called "DDS with preemption" hereafter. The deterministic model preemption concept, developed and formalized recently in Razavi et al. (2010), can be used in conjunction with a variety of optimization algorithms to enhance their computational efficiency. Model preemption is an approach to opportunistically evade unnecessary evaluations of computationally expensive simulation models.
Deterministic preemption is applicable when the objective function value monotonically increases (decreases) in minimization (maximization) problems as the simulation proceeds. For example, in hydrologic model automatic calibration, preemption is applicable when the objective function is a summation of model prediction error terms accumulating throughout the simulation time period. As such, model preemption monitors the intermediate results of a model simulation and terminates the simulation early (i.e., prior to simulating the entire time period) once it recognizes that this solution is so poor that it will not contribute to guiding the calibration algorithm. The attractive feature of the deterministic preemption strategy is that its application leads to exactly the same result as when it is not applied. As reported in Razavi et al. (2010), preemption may lead to up to 60% computational saving in an optimization problem. In this study, DDS was used for the test functions (see Section 5.1) and DDS with preemption was used for the real-world computationally intensive automatic calibration case studies (see Section 5.2).
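As an illustration of the monitoring idea described above, the following sketch shows deterministic preemption for an error-accumulating calibration objective. The function simulate_one_step is a hypothetical placeholder for advancing an expensive simulation model by one time step; the sketch assumes a minimization objective and a greedy (best-so-far) acceptance rule.

def preemptive_sse(simulate_one_step, n_steps, observed, f_best_so_far):
    # Sketch of deterministic model preemption for a minimization objective
    # that accumulates as the simulation proceeds (e.g., a sum of squared
    # errors over the simulation period). simulate_one_step(t) is assumed
    # to advance the expensive model one time step and return the simulated
    # value at that step.
    sse = 0.0
    for t in range(n_steps):
        sim_t = simulate_one_step(t)
        sse += (sim_t - observed[t]) ** 2
        # The partial objective can only grow, so once it exceeds the best
        # objective found so far this candidate can never be accepted:
        # terminate the simulation early and report the preempted value.
        if sse > f_best_so_far:
            return sse, True   # preempted after t + 1 of n_steps steps
    return sse, False          # full simulation completed

Because a greedy algorithm such as DDS accepts a candidate only if it is at least as good as the current best, discarding a candidate once its partial objective exceeds f_best_so_far cannot change the search trajectory, which is why preempted and non-preempted runs return identical solutions.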
5. Design of numerical assessments

This section aims to design a fair and comprehensive numerical assessment of metamodel-enabled optimizers versus the optimizers without metamodelling. As outlined in Section 1, we believe that the shape and the complexity of the original function and the computational budget availability are the most important factors affecting the performance of metamodelling. Therefore, as presented in Section 5.1, four test functions with different characteristics and different levels of complexity were used for the comparative assessment. Two computationally intensive water resources optimization problems, as presented in Section 5.2, were also used as representatives of real-world problems. The experiments were conducted within different computational budget availability scenarios. Details of the experimental settings are presented in Section 5.3.

Fig. 5. Perspectives of the 2-D versions of the applied test functions.

5.1. Test functions

Four mathematical functions commonly used as performance test problems for optimization algorithms, namely the Griewank, Ackley, Rastrigin, and Schwefel functions, were used in this study. Fig. 5 shows the perspectives of the 2-dimensional versions of these test functions, which have different characteristics. As can be seen in Fig. 5, the general form of Griewank's function is well-behaved and convex, with numerous local minima attached to the global form. Ackley's function is a more difficult function which has a large non-informative area (due to the exponential form of the function) containing numerous local minima, where there is not a detectable trend towards the global region of attraction. Rastrigin's function is a fairly difficult, highly multi-modal function with regularly distributed local minima and a large search space. Schwefel's function, consisting of a large number of peaks and valleys, is a difficult and deceptive function in that the global minimum, which is located near the bound of the feasible space, is geometrically distant from the next best local minima; as such, optimization algorithms are potentially prone to converging far from the true optimum. Table 1 presents the formulations, bound constraints, and the arbitrarily selected numbers of dimensions of these test functions used in this study. As stated in Section 2.3, the number of dimensions (i.e., number of decision variables of the optimization problem) in metamodel-enabled optimizers cannot typically be large (typically less than 15-20), and the performance of metamodel-enabled optimizers is expected to degrade when applied to higher dimensional problems. The global optima of the Griewank, Ackley, and Rastrigin functions are located at the center of the feasible space (xi = 0, i = 1, ..., D), while the global minimum of the Schwefel function is at xi = 420.9687, i = 1, ..., D. Note that increasing the number of dimensions does not necessarily correspond to increasing complexity. For example, in the Griewank function, although the number of local minima increases exponentially with the number of dimensions, Locatelli (2003) has shown that the function becomes very easy to optimize for large
numbers of dimensions (because the local minima become extremely small, resulting in an almost uni-modal form of the function) by any optimization algorithm, especially derivative-based ones. On the other hand, in the Ackley function, the ratio of the size of the main region of attraction to the size of the entire feasible space becomes smaller as the number of dimensions increases: for example, from roughly 25% for the 2-D Ackley function to less than 0.5% for the 15-D Ackley function (obtained through Monte-Carlo analysis). It is worth noting that there are metamodelling studies optimizing the Ackley function in a reduced DV range, while the reduced range only covers the informative part around the global minimum, which has a clear, almost linear general trend (with smaller scale sinusoidal fluctuations) towards the optimum. As a result, metamodel-enabled optimizers have substantially less difficulty optimizing the Ackley function in the reduced range.

Table 1
Summary of optimization test functions.

Griewank: \( f(x) = \frac{1}{4000}\sum_{i=1}^{D} x_i^2 - \prod_{i=1}^{D} \cos\!\left(\frac{x_i}{\sqrt{i}}\right) + 1 \); number of dimensions D = 10; bound constraints [-600, 600]^D; min = 0.

Ackley: \( f(x) = -20\exp\!\left(-0.2\sqrt{\frac{1}{D}\sum_{i=1}^{D} x_i^2}\right) - \exp\!\left(\frac{1}{D}\sum_{i=1}^{D}\cos(2\pi x_i)\right) + 20 + \exp(1) \); number of dimensions D = 15; bound constraints [-32.768, 32.768]^D; min = 0.

Rastrigin: \( f(x) = 10D + \sum_{i=1}^{D}\left[x_i^2 - 10\cos(2\pi x_i)\right] \); number of dimensions D = 10; bound constraints [-5.12, 5.12]^D; min = 0.

Schwefel (a): \( f(x) = 418.9829D + \sum_{i=1}^{D}\left[-x_i\sin\!\left(\sqrt{|x_i|}\right)\right] \); number of dimensions D = 15; bound constraints [-500, 500]^D; min = 0.

(a) The first term in the equation does not exist in the original form of the function. Here, it is added to set zero as the minimum of the function.
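For readers who wish to reproduce the test bed, the four Table 1 formulations translate directly into code. The short Python sketch below follows the equations and the footnote in Table 1 (all four functions are to be minimized).

import numpy as np

def griewank(x):
    x = np.asarray(x, float)
    i = np.arange(1, x.size + 1)
    return np.sum(x**2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i))) + 1.0

def ackley(x):
    x = np.asarray(x, float)
    return (-20.0 * np.exp(-0.2 * np.sqrt(np.mean(x**2)))
            - np.exp(np.mean(np.cos(2.0 * np.pi * x))) + 20.0 + np.e)

def rastrigin(x):
    x = np.asarray(x, float)
    return 10.0 * x.size + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x))

def schwefel(x):
    # The additive constant shifts the global minimum value to zero,
    # as noted in the footnote to Table 1.
    x = np.asarray(x, float)
    return 418.9829 * x.size + np.sum(-x * np.sin(np.sqrt(np.abs(x))))

For instance, griewank(np.zeros(10)), ackley(np.zeros(15)) and rastrigin(np.zeros(10)) all return 0, and schwefel(np.full(15, 420.9687)) returns a value very close to 0, consistent with the global optima listed above.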
5.2. Computationally intensive calibration problems

In addition to the test functions, two benchmark real-world computationally expensive optimization problems were utilized in this study. The first one is an automatic calibration problem of the SWAT2000 streamflow model being calibrated over the Cannonsville Reservoir watershed in upstate New York. This case study, which was originally developed by Tolson and Shoemaker (2007), seeks to calibrate 14 parameters by maximizing the Nash-Sutcliffe coefficient for daily flow at the Walton gauging station. Details of this case study are available in Tolson and Shoemaker (2007); for range constraints see Table 2 therein. This case study, which is called SWAT hereafter, is exactly the same as the so-called SWAT-1 case study in Razavi et al. (2010) used to demonstrate the performance of DDS with preemption. A single evaluation of this watershed model requires about 1.8 min on average to execute on a 2.8 GHz Intel Pentium processor with 2 GB of RAM running the Windows XP operating system.

The second benchmark problem is based on a recently introduced groundwater flow and reactive transport model that is designed to aid in the interpretation of aquifer tests. The dipole flow and reactive tracer test (DFRTT) is a single-well test proposed for in situ aquifer parameter estimation to aid in the design of remedial systems for contaminated sites. The observed breakthrough curve (BTC) obtained through this test is analyzed by a DFRTT interpretation model (DFRTT-IM) to estimate aquifer parameters. DFRTT-IM is a high-resolution, two-dimensional, radially symmetric finite volume model. For details, interested readers are referred to Roos (2009). This case study, which was adopted from Razavi et al. (2010), seeks 7 aquifer parameter values of an unconfined sand aquifer at the Canadian Forces Base (CFB) Borden near Alliston, ON, Canada. The objective is to minimize a weighted sum of squared deviations of DFRTT-IM outputs from an observed BTC. A detailed description as well as the parameter range constraints are available in Razavi et al. (2010). A single evaluation of this model takes about 37 min on average to execute on a 2.8 GHz Intel Pentium processor with 2 GB of RAM running the Windows XP operating system.

5.3. Experimental setting

Computational budget availability is a limiting factor in solving computationally intensive optimization problems. In such problems, practitioners are sometimes restricted to finding a reasonably good solution within a very small number of function evaluations (for instance, as small as 100). To replicate possible common practices, the following four computational budget availability scenarios based on the total number of original function evaluations were assumed to be available: 100, 200, 500, and 1000. Optimization within only 100 or 200 original function evaluations is consistent with the benchmark metamodelling study by Regis and Shoemaker (2007b) in which MLMSRBF (see Section 3.1) was proposed. One thousand function evaluations was deemed the maximum practical budget available as it might require a very long computational time; for example, about 26 days of serial processing are needed to run the DFRTT model 1000 times, which might be practically infeasible. The performance of the optimization algorithms (with and without metamodelling) within each computational budget scenario can be accurately compared in terms of algorithm effectiveness: the final solution quality attained by different algorithms at each computational budget scenario can be directly compared.

Due to the excessively high computational demand of the experiment with DFRTT, only the two very limited computational budget scenarios (i.e., the scenarios with 100 and 200 function evaluations) were evaluated for this case study. Note that in all cases, the experiments in each computational budget scenario were independent from the experiments for the other scenarios. For example, for the budget of 200 function evaluations, we ran the experiments from scratch, without getting any feedback from the experiments with the budget of 100 function evaluations.

As stated in Section 4.3, DDS was used in conjunction with the model preemption strategy (i.e., DDS with preemption) for the SWAT and DFRTT case studies. Razavi et al. (2010) show that deterministic model preemption can save approximately 15% and 50% of the computational budgets required by DDS to run on the SWAT and DFRTT case studies, respectively, while it can find exactly the same near-optimal solutions as when preemption is not applied. Therefore, for DDS with preemption, we add 15% and 50% to the total number of function evaluations allocated in each scenario for the SWAT and DFRTT case studies. For example, we consider 230 (200 + 0.15 × 200) function evaluations for DDS with preemption on the SWAT case study when the computational budget is equivalent to the 200 full function evaluations scenario.
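Stated compactly, the allocation for DDS with preemption in each scenario is

\[ N_{\mathrm{alloc}} = (1+s)\,N_{\mathrm{scenario}}, \qquad s_{\mathrm{SWAT}} = 0.15, \quad s_{\mathrm{DFRTT}} = 0.50, \]

so that, for instance, the 230 model starts allowed on SWAT in the 200-evaluation scenario are expected to cost roughly the same as 200 full model evaluations once the preempted (partially executed) runs are accounted for.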
To ensure a fair comparison and make statistically valid conclusions, multiple replicates with different initial conditions (i.e., random seeds) should be conducted. On the test functions, all optimization algorithms were conducted for 30 replicates;
however, only five optimization replicates were conducted for each algorithm on the real-world computationally intensive case studies, SWAT and DFRTT, due to their high computational demand.

For DDS, DDS with preemption, and MLMSRBF, there is no need to tune algorithm parameters, while the GA (in GA without metamodelling and in DACE-GA and ANN-GA) has multiple algorithm parameters that require tuning. In this study, the tournament size and crossover ratio were fixed equal to 4 and 0.8, respectively (the MATLAB default values, MathWorks, 2010), and the population size, number of generations, and elite count were considered as the GA algorithm parameters that need to be tuned for any specific problem. In DACE-GA and ANN-GA, where the GA runs over a fast-to-run metamodel (no strict limit on the total number of metamodel evaluations), the population size, number of generations, and elite count were selected to be large and equal to 100, 100, and 4, respectively. However, as the GA without metamodelling was to run on an original function (which is assumed computationally expensive), selection of a proper GA algorithm parameter set needed more careful attention. Obviously, when the computational budget is limited and fixed (i.e., the total number of function evaluations is fixed in advance), the number of generations is a known function of the population size, elite count and total number of function evaluations. For the test functions, different configurations of these GA parameters were tested and only the best GA results are presented in the Results Section (Section 6). Importantly, the algorithm parameter tuning process substantially adds to the computational burden of an experiment, and it is not feasible when the original simulation model is computationally expensive. However, as the GA parameters highly affect the algorithm performance, we wanted to solicit (almost) the best performance of the GA on the test functions, which was to be compared with the performance of the other algorithms. For the real-world computationally intensive case studies, the GA parameters were selected based on our evaluation of the GA results over the test functions.

In the metamodelling part of DACE-GA, we used the metamodelling parameters presented in Section 3.2, while for ANN-GA we still had to determine a proper number of hidden neurons for any given case study by conducting multiple trials with various numbers of hidden neurons. We also had to determine a proper number of initial design sites for ANN-GA in a given computational budget, and this was also selected by trial-and-error experiments. The selected numbers of hidden neurons and initial design sites for ANN-GA are presented in Section 6.

To quantify the computational budget required for a metamodelling experiment, especially for comparison purposes, it is common and convenient to only consider and compare the total number of original model evaluations in a given optimization trial (Mugunthan and Shoemaker, 2006; Regis and Shoemaker, 2007b; Zou et al., 2007, 2009). Similarly, in the comparative assessments performed in this study, we also ignored the analyst time and the metamodelling time in our experiments, and comparisons were based on the number of original model evaluations.

6. Results

The average performance of the metamodel-enabled optimizers (DACE-GA, ANN-GA, and MLMSRBF) as well as the optimizers without metamodelling (DDS and GA) over the test functions and real-world case studies is shown in Fig. 6. Average performance for each computational budget scenario is the average of the best original objective function values found in all replicates. The empirical cumulative distribution functions (CDFs) of the final best function values found within the various computational budget scenarios from all 30 optimization replicates are shown in Fig. 7, Fig. 8, Fig. 9, and Fig. 10 for the Griewank, Ackley, Rastrigin, and Schwefel functions, respectively. As an example of how to interpret the CDFs, according to Fig. 7, the probability that the GA with 100 function evaluations attains an objective function value of at most 30 on the Griewank function is about 0.15. A more vertical CDF indicates less variable algorithm performance (more robust), and as such, CDFs that are vertical and as far to the left as possible are ideal. When comparing algorithms A and B in terms of their respective empirical CDFs of best (minimum) objective function values attained, FA and FB, algorithm A dominates algorithm B stochastically at first order if, for any desired objective function value f, FA(f) ≥ FB(f). As the number of replicates for the real-world case studies is small (i.e., 5), instead of CDFs, the dispersion of the final solutions found through each algorithm is compared using the simple plots shown in Fig. 11 and Fig. 12 for the SWAT and DFRTT case studies, respectively. When the final objective function values found in all replicates through an algorithm are closer to each other, the algorithm is less variable. In the following, we have allocated a sub-section to each test function and each real-world case study to elaborate the different and function-dependent performance of the metamodel-enabled optimizers versus the optimizers without metamodelling. Due to the reasons outlined in Section 6.1, ANN-GA was only assessed on the Griewank function.
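The CDF comparison and the first-order dominance test used throughout this section can be computed directly from replicate results. The sketch below is a self-contained illustration with synthetic final best values; the variable names and sample values are hypothetical, not data from this study.

import numpy as np

def empirical_cdf(values):
    # Step-function estimate of F(f) = P(final best objective <= f).
    v = np.sort(np.asarray(values, dtype=float))
    return lambda f: np.searchsorted(v, f, side="right") / v.size

def dominates_first_order(values_a, values_b):
    # Algorithm A dominates algorithm B stochastically at first order
    # (for minimization) if F_A(f) >= F_B(f) at every objective value f;
    # for step CDFs it suffices to check the pooled sample values.
    fa, fb = empirical_cdf(values_a), empirical_cdf(values_b)
    grid = np.union1d(values_a, values_b)
    return bool(all(fa(f) >= fb(f) for f in grid))

# Synthetic example with 30 replicates per algorithm:
rng = np.random.default_rng(0)
best_a = rng.uniform(0.8, 1.2, 30)    # e.g. a metamodel-enabled optimizer
best_b = rng.uniform(20.0, 40.0, 30)  # e.g. an optimizer without a metamodel
print(dominates_first_order(best_a, best_b))  # prints True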
6.1. Griewank function

On the Griewank function, DACE-GA and MLMSRBF drastically outperformed DDS and GA, and both were comparable in terms of approximating the global minimum very quickly and efficiently. Within only 100 original function evaluations, the average function values found through both of these metamodel-enabled algorithms are approximately 1 and the standard deviations are less than 0.2. However, ANN-GA was completely unable to find a trajectory toward optimality within 100 function evaluations. In other words, the ANN metamodel was misleading in this very limited budget, and the ANN-GA search was hardly ever able to find a better solution based on the metamodelling guidance than the solutions initially evaluated through the original function for the initial DoE. As a result, the final best function values reported for ANN-GA with 100 function evaluations are almost always the best point in the initial design sites. In the 200 function evaluations scenario, the ANN metamodel was not as misleading and ANN-GA performance was almost the same as GA in terms of mean function value but inferior in terms of robustness (see the associated CDFs in Fig. 7). In the 500 and 1000 function evaluations scenarios, the ANN became able to play the metamodelling role properly, although it was still outperformed by DACE-GA and MLMSRBF.

Note that we went to great efforts to improve the very poor performance of ANN-GA through testing various algorithm parameter configurations, especially the number of initial design sites, p, and the number of hidden neurons, m. The values for p and m in ANN-GA were finally selected equal to (p =) 50, 100, 200, and 300 and (m =) 7, 10, 15, and 15 for the computational budget scenarios with 100, 200, 500, and 1000 function evaluations, respectively. These poor results on such a simple function confirm that ANNs are handicapped when few design sites are available (i.e., in the case of very limited computational budgets), whereas DACE-GA and MLMSRBF worked effectively even though they used fewer initial design sites, p. This is consistent with the fact that, to properly train neural networks, relatively large sets of design sites are required; the minimum number we found reported in the literature for initial design sites in ANN-enabled optimizers is 150 (in Yan and Minsker, 2006; in Zou et al., 2007).

The relatively poor performance of ANN-GA was observed despite the fact that at least four extra parameters/decisions that exist in ANN-GA but not in DACE-GA and MLMSRBF
(i.e., how to handle over-training prevention and inexact emulation behaviour, as well as determination of the number of hidden neurons and the initial DoE size) were manipulated and fine-tuned to optimize its performance. As such, substantially more analyst time and metamodelling time were spent on ANN-GA fine-tuning compared to the other metamodel-enabled optimizers used herein. As a result, and considering its other shortcomings pointed out in Sections 3.3.1-3.3.3, ANN-GA was deemed unsuitable for such computationally intensive problems and was not assessed on the other test functions and real-world case studies.

Fig. 6. Algorithms' average performance comparisons on the 10-D Griewank, 15-D Ackley, 10-D Rastrigin, and 15-D Schwefel functions (over 30 replicates) as well as the SWAT and DFRTT case studies (over 5 replicates) at different computational budget scenarios.

According to Fig. 7, in all computational budget scenarios, both DACE-GA and MLMSRBF are stochastically dominant over the optimizers without metamodelling, and their relative performance lies under Case A introduced in Section 2.1 (idealized relative metamodel performance). Regarding the optimizers without metamodelling, DDS substantially outperformed GA and approached the performance of the metamodel-enabled optimizers in the 500 and 1000 function evaluations scenarios.

Fig. 7. Empirical cumulative distribution function of final best function values on the 10-D Griewank function.

6.2. Ackley function

On the Ackley function, the metamodel-enabled optimizers, DACE-GA and MLMSRBF, demonstrated completely different behaviours. MLMSRBF outperformed DDS and GA, while DACE-GA
performance was the worst of all: DDS and GA stochastically dominate DACE-GA in the 100, 200, and 500 scenarios (see Fig. 8). In other words, the relative performance of MLMSRBF compared to both GA and DDS is in Case A (idealized relative metamodel performance), whereas the relative performance of DACE-GA compared to GA and DDS is in Case C (failure of metamodelling). As noted in Section 5.1, a very small portion of the feasible space of the 15-D Ackley function is informative, and there is a very large region having an almost flat (plateau) general form with numerous regularly distributed local valleys. Fitting a metamodel on the details of the non-informative region can be quite misleading, and it does not give useful information to locate the main region of attraction. In other words, as there is not an effective clue in the general form for locating the main region of attraction, a metamodel-based search may wander around, wasting the entire computational budget, unless a point on the main region of attraction is evaluated through the original function either in the initial DoE or while searching. As MLMSRBF restarts once it converges to a local minimum, the number of points in the feasible space being evaluated through DoE in a single optimization is relatively high in comparison to DACE-GA. In addition, MLMSRBF searches locally around the best solution found so far in a trial; therefore, since the main region of attraction is located at the center of the feasible space, the chance of finding it is higher (compared to the case when it is located near the bounds or at a corner; this was verified through an experiment not reported here). Once MLMSRBF locates the main region of attraction, it can reach a good solution efficiently due to its local search capability. Accordingly, as can be seen in Fig. 8, the probability of MLMSRBF returning a relatively poor solution from the non-informative region diminishes as the computational budget increases: in the 100 and 200 function evaluations scenarios 3-4 replicates (out of 30), with 500 function evaluations only one replicate, and with 1000 none of the 30 replicates are from the non-informative region.

Unlike MLMSRBF, DACE-GA starts with an initial DoE and then searches globally around the feasible space until it exhausts the computational budget. As such, DACE-GA performance is drastically inferior compared to MLMSRBF because, firstly, the probability of finding a point on the main region of attraction through the initial DoE is lower. Secondly, DACE-GA involves a global search method as opposed to MLMSRBF, and it is not as efficient in improving a solution found in the main region of attraction. As a result, in DACE-GA the metamodel is mostly focused on emulating the deceptive local valleys distributed over the non-informative region. According to Fig. 8, DACE-GA in the 100 and 200 function evaluations scenarios became mired in the non-informative region in all 30 replicates, while in the larger computational budgets, particularly in 1000 function evaluations, some replicates can find the main region of attraction but are unable to efficiently approach the global minimum.

Fig. 8. Empirical cumulative distribution function of final best function values on the 15-D Ackley function.

6.3. Rastrigin function

On the Rastrigin function, DDS stochastically dominates the metamodel-enabled optimizers, DACE-GA and MLMSRBF, and GA in all computational budget scenarios (see Fig. 9; note that only the CDFs for 100 and 1000 function evaluations are depicted). Therefore, compared to DDS, the performance of both metamodel-enabled optimizers is in Case C (failure of metamodelling). In the 100 and 200 function evaluations scenarios, the metamodel-
enabled optimizers outperformed GA, whereas, interestingly, in the 500 and 1000 function evaluations scenarios GA surpasses DACE-GA and MLMSRBF. This behaviour confirms that the superiority/suitability of a metamodel-enabled optimizer for a specific problem can be a function of the computational budget, as in Case B (computational budget dependent relative metamodel performance) discussed in Section 2.1. The equivalence time, t*, in this case is between the 200 and 500 function evaluations scenarios for both MLMSRBF and DACE-GA. Moreover, DDS and GA are more robust compared to the metamodel-enabled optimizers, especially in the larger computational budgets. The performance of DACE-GA is superior to MLMSRBF in the 100 function evaluations scenario, while it is inferior in the 500 and 1000 function evaluations scenarios. DACE-GA and MLMSRBF performed comparably in the 200 function evaluations scenario.

Fig. 9. Empirical cumulative distribution function of final best function values on the 10-D Rastrigin function.

6.4. Schwefel function

For the Schwefel function, the performance of DACE-GA and DDS in the 100 function evaluations scenario was comparable, and both were superior to MLMSRBF and GA, but in the larger computational budgets DDS outperformed all other algorithms. DACE-GA
performance is considerably better (almost stochastically dominant, see Fig. 10; note that only the CDFs for 100 and 1000 function evaluations are depicted) than MLMSRBF in all computational budget scenarios. GA is the least effective optimizer in the 100 and 200 function evaluations scenarios, while it outperforms MLMSRBF in the 500 and 1000 function evaluations scenarios. Although GA and MLMSRBF performed comparably in terms of mean best function values, GA is more robust as the associated CDFs are more vertical. Note that none of the optimizers (with and without metamodelling) were able to reach close to the global minimum function value of zero in the allocated computational budgets. As can be seen in Fig. 6, MLMSRBF reached a plateau in performance after the 200 function evaluations scenario. One possible reason for this poor behaviour can be that the main regions of attraction of the Schwefel function are located at the corners of the feasible space, geometrically distant and separated from each other, and, besides, MLMSRBF searches locally around the best solution found. As a result, MLMSRBF can easily get trapped in the low quality local minima located far from the main regions of attraction. Compared to GA, the relative performance of both metamodel-enabled optimizers is in Case B (the equivalence time is between the 200 and 500 function evaluations scenarios for MLMSRBF and between 500 and 1000 for DACE-GA), while compared to DDS, their relative performance is in Case C.

Fig. 10. Empirical cumulative distribution function of final best function values on the 15-D Schwefel function.

6.5. SWAT case study

On the SWAT case study, DACE-GA outperformed all other algorithms in the 100 function evaluations scenario, but DDS with preemption performed the best in the larger budget scenarios in terms of both the mean objective function value and the dispersion over the 5 replicates. GA performance was the worst in the 100 and 200 function evaluations scenarios, but it surpassed MLMSRBF in the 500 and 1000 function evaluations scenarios. In the 100 function evaluations scenario, the dispersion of the final solutions found in the 5 replicates is relatively high for all algorithms (see Fig. 11). However, for the higher computational budget scenarios, DDS with preemption resulted in final solutions with relatively close quality (close objective function values), indicating that, among all, DDS with preemption is the most robust and reliable algorithm on this case study. In addition, DACE-GA, GA, and MLMSRBF are the second-, third- and fourth-ranked algorithms, respectively, in terms of robustness and reliability. Compared to GA, the relative performance of MLMSRBF is in Case B (the equivalence time is between the 200 and 500 function evaluations scenarios), and the relative performance of DACE-GA is in Case A. Compared to DDS, the relative performance of MLMSRBF is in Case C and the relative performance of DACE-GA is in Case B (the equivalence time is between 100 and 200).

Fig. 11. Algorithm performance comparisons on the SWAT model: a maximization problem.

6.6. DFRTT case study

The DFRTT case study, as stated in Section 5.3, was only used to assess the optimization algorithms within the two very limited computational budget scenarios with 100 and 200 function evaluations. Overall, DDS with preemption was the most effective and robust optimization algorithm in both the 100 and 200 function evaluations scenarios. MLMSRBF was almost as effective; however, as
shown in Fig. 12, in one of the five replicates in the 100 function evaluations scenario it failed to approach a good quality solution. This one poor result or outlier is why the mean best objective function value obtained through MLMSRBF in the 100 function evaluations scenario is not desirable in comparison with the other algorithms. The MLMSRBF outlier emphasizes the importance of running multiple replicates in an algorithm assessment process. GA performance was the worst in both the 100 and 200 function evaluations scenarios. Compared to GA, the relative performance of both metamodel-enabled optimizers lies in Case A, while their relative performance compared to DDS lies in Case C.

Fig. 12. Algorithm performance comparisons on the DFRTT model: a minimization problem.

7. Discussion

Metamodelling is not always a solution to computationally intensive optimization problems. As observed, a given metamodelling strategy can sometimes be completely misleading and a waste of computational budget (see, e.g., DACE-GA performance on the Ackley function or MLMSRBF performance on the Schwefel function). In other cases, metamodels can be very effective and efficient (see, e.g., the performance of DACE-GA and MLMSRBF on Griewank and MLMSRBF performance on the Ackley function). Experiments on the Rastrigin and Schwefel test functions as well as the SWAT and DFRTT case studies confirm that efficient optimizers without metamodelling, such as DDS/DDS with preemption, can outperform or perform equally well compared to metamodel-enabled optimizers. Note that with comparable performance, optimizers without metamodelling are preferred over metamodel-enabled optimizers, because they do not require the extra metamodelling computational burden (e.g., metamodelling time and analyst time, both of which were not explicitly evaluated in this study).

As the experimental results on the four test functions confirm, the more complex the original response surface, the less effective the metamodelling performance. In other words, when the original response surface is very complex, the metamodel inaccuracy may drive the search process towards poor regions in the search domain. This risk is higher especially when the original response surface has multiple geometrically distant regions of attraction (like the Schwefel function).

Importantly, it is likely that the relative performance of a metamodel-enabled optimizer depends on the allocated computational budget (Case B as discussed in Section 2.1 and shown in Fig. 1). Therefore, computational budget availability (i.e., the total number of original function evaluations) is an important factor affecting the suitability and superiority of a metamodel-enabled optimizer over an optimizer without metamodelling. Typically, the quality of the final solution found through an optimizer without metamodelling is enhanced at a reasonably considerable improving rate as the computational budget availability increases, whereas this improving rate with computational budget in a metamodel-enabled optimizer can be less if the metamodel inaccuracies are substantial (see, e.g., MLMSRBF on Schwefel's function or DACE-GA on Rastrigin's function). As a result, and as can be seen in Fig. 6, the performance plots at some computational budget may cross each other; this "crossing behaviour" (Case B) was observed 6 times (out of 24 one-by-one comparisons over the 6 case studies) between the metamodel-enabled optimizers and the optimizers without metamodelling. For example, on the Rastrigin function, in the very limited budget scenarios (i.e., with 100 and 200 function evaluations) both DACE-GA and MLMSRBF were superior to GA, while for the larger budget scenarios (i.e., with 500 and 1000 function evaluations) GA outperformed the metamodel-enabled optimizers. This result suggests that when the total number of function evaluations can be high (in cases when a relatively large computational budget is available or the original simulation model is not very computationally expensive), the optimizers without metamodelling may be more appealing. The results also confirm that the equivalence time, t*, in Case B is case study- and algorithm-specific and requires extensive numerical experiments to be determined.

The choice of the benchmark optimizer (without a metamodel) to which a metamodel-enabled optimizer is compared can have a large impact on relative performance conclusions. We selected DDS as an appropriate benchmark based on our experience developing and using the algorithm to solve computationally intensive optimization problems. We selected GA as another benchmark, despite the fact that it is not an appropriate algorithm for limited computational budgets, largely because it appears as a benchmark optimizer without metamodelling in multiple metamodel-enabled optimization studies (Broad et al., 2005; Fen et al., 2009; Khu et al., 2004; Ostfeld and Salomons, 2005; Zou et al., 2007). If we consider the GA as the benchmark optimizer without metamodelling, the relative performance of the metamodel-enabled optimizers (MLMSRBF and DACE-GA) in 6 out of 12 comparisons (12 = 6 case studies × 2 metamodel-enabled optimizers) is in Case A (idealized relative metamodel performance), 5 times in Case B (computational budget dependent relative metamodel performance), and only once in Case C (failure of metamodelling). However, when DDS/DDS with preemption is considered as the benchmark optimizer without metamodelling, the relative performance of the metamodel-enabled optimizers lies 3 times in Case A (out of 12 comparisons), once in Case B, and 8 times in Case C. Thus, using an inappropriate benchmark optimizer (the GA) for this study showed only an 8% failure rate of metamodelling versus a 67% failure rate of metamodelling when a more appropriate benchmark optimizer (DDS) was utilized.

We believe that future metamodelling studies evaluating relative metamodel-enabled optimizer efficiency and/or effectiveness should select benchmark optimizers without metamodelling which are designed or demonstrated to work well with a limited computational budget. At the least, instead of simply comparing a GA to metamodel-enabled optimizer performance, the GA parameters should first be tuned in a way to improve GA performance at the computational budget of interest (i.e., via multiple optimization test functions). A much better approach would be to benchmark the performance of a metamodel-enabled optimizer against metamodel-independent algorithms designed to work relatively efficiently, such as DDS/DDS with preemption as demonstrated herein, or other algorithms such as the MicroGA technique (see Nicklow et al., 2010 for discussion) or multistart derivative-based algorithms (e.g., Doherty, 2005). Other algorithms we believe to be more effective than the GA, such as CMA-ES (Hansen et al., 2003) or AMALGAM (Vrugt et al., 2009), might be good additional algorithms
in the comparison, and in particular for computational budgets above 1000 function evaluations.

Numerical experiments in this study also suggest that none of our metamodelling implementations work best on all problems and all computational budgets, as MLMSRBF performed better than DACE-GA in some cases (e.g., on the Ackley function in all computational budget scenarios and on the Rastrigin function in the larger budget scenarios with 500 and 1000 function evaluations) and worse in some other cases (e.g., on the Schwefel function and SWAT in all computational budget scenarios). Furthermore, the number of times that the performance of MLMSRBF relative to GA lies in Case A, Case B, and Case C is 3, 3, and 0, respectively, and relative to DDS these numbers are 2, 0, and 4. Similarly, for DACE-GA relative to GA, the numbers of times in Case A, Case B, and Case C are 3, 2, and 1, respectively, and for DACE-GA relative to DDS, the same figures are 1, 1, and 4.

Figs. 6-12 show the dispersion of the best found solutions across the case studies and confirm the essential need for multiple replicates in the (comparative) assessment of optimization algorithms, despite the obvious extreme computational burden that results with computationally intensive optimization problems. As an example, consider if the performance of MLMSRBF in the 100 function evaluations scenario in Fig. 12 was judged only based on the highly inferior replicate instead of all five replicates (the four other solutions are all of much higher quality).

8. Conclusions

To provide metamodelling practitioners with a clear view of metamodelling characteristics, benefits and shortcomings, this study conducted a numerical assessment of three well-established metamodelling strategies involving radial basis functions, kriging and neural networks as metamodels within different computational budget availability scenarios on four commonly used test functions and two real-world water resources case studies. DDS/DDS with preemption and a GA were used as the benchmark optimizers without metamodelling to provide the baseline for assessment. The results clearly show that developing a new, or approximately replicating an existing, metamodel-enabled optimization framework is not enough to warrant the assumption that such a product is a more effective approach than optimizers without metamodelling. In fact, our results show multiple instances where optimizers without metamodels clearly outperform metamodel-enabled optimizers. Such a result has rarely been reported in the literature; nonetheless, it is consistent with Willmes et al. (2003), who developed two metamodel-enabled optimizers, applied them to three test functions, and concluded that "Neither the kriging model nor the neural network could clearly demonstrate an advantageous performance over an evolutionary optimization without metamodels. Very often, optimization assisted with a metamodel leads to a degraded performance." With these metamodel-enabled optimization failures relative to optimization without metamodels in mind, we hope this study motivates similar robust comparisons between new metamodel-enabled optimizers and optimizers without metamodelling so as to better focus metamodelling research on only the most promising metamodel-enabled optimization frameworks.

This study proposed a comparative assessment framework which presents a clear computational budget dependent definition for the success/failure of metamodelling strategies. Analyzing our results within such a framework empirically demonstrated that the suitability and superiority of metamodelling strategies can be affected by computational budget availability (i.e., the total number of original function evaluations). For example, the success/failure characterization of metamodelling strategies often changed in our results when computational budgets were varied between 100 and 1000 original model evaluations. Typically, the likelihood that a metamodel-enabled optimizer outperforms an optimizer without metamodelling is higher when a very limited computational budget is available; however, this is not the case when the metamodel is a neural network. In other words, neural networks are severely handicapped in limited computational budgets, as their effective training typically requires a relatively large set of design sites, and thus they are not recommended for use in these situations.

Furthermore, the numerical results confirmed that metamodelling is an effective and efficient approach when the original response surface is relatively simple. However, the metamodelling performance degrades considerably in the case of more complex original response surfaces. In addition, our results do not identify a single preferred metamodelling strategy between the MLMSRBF and DACE-GA metamodel-enabled optimizers. In practice, as the general form and the level of complexity of a simulation model response surface are not usually known a priori, any decision on the appropriate type of metamodelling strategy as well as any prediction about its performance are non-trivial.

Future research efforts could be focused on predicting the success/suitability of a metamodel-enabled optimizer, for example based on preliminary experiments measuring metamodel accuracy and generalizability (e.g., through the initial DoE in Fig. 3). Conceptually, we believe metamodel-enabled optimizers can be developed that would almost always show computational budget dependent relative performance (Case B in Fig. 2) such that they would be preferred over any metamodel-independent optimizer for at least some limited range of reduced computational budgets. Therefore, future research should continue to strive for improved metamodel-enabled optimization algorithms.

Acknowledgment

This research was supported with funding from Bryan Tolson's NSERC Discovery Grant. We would like to thank the guest editors and the four anonymous reviewers for their insightful comments which significantly improved our manuscript.

References

Bau, D.A., Mayer, A.S., 2006. Stochastic management of pump-and-treat strategies using surrogate functions. Advances in Water Resources 29 (12), 1901-1917.

Beale, M.H., Hagan, M.T., Demuth, H.B., 2010. Neural Network Toolbox 7 User's Guide. MathWorks, Inc. Available at: www.mathworks.com/help/pdf_doc/nnet/nnet.pdf.

Behzadian, K., Kapelan, Z., Savic, D., Ardeshir, A., 2009. Stochastic sampling design using a multi-objective genetic algorithm and adaptive neural networks. Environmental Modelling & Software 24 (4), 530-541.

Bliznyuk, N., Ruppert, D., Shoemaker, C., Regis, R., Wild, S., Mugunthan, P., 2008. Bayesian calibration and uncertainty analysis for computationally expensive models using optimization and radial basis function approximation. Journal of Computational and Graphical Statistics 17 (2), 270-294.

Broad, D.R., Dandy, G.C., Maier, H.R., 2005. Water distribution system optimization using metamodels. Journal of Water Resources Planning and Management-ASCE 131 (3), 172-180.

Cheng, C.T., Wu, X.Y., Chau, K.W., 2005. Multiple criteria rainfall-runoff model calibration using a parallel genetic algorithm in a cluster of computers. Hydrological Sciences Journal-Journal des Sciences Hydrologiques 50 (6), 1069-1087.

Cressie, N.A.C., 1993. Statistics for Spatial Data. John Wiley & Sons, New York.

Crestaux, T., Le Maitre, O., Martinez, J.-M., 2009. Polynomial chaos expansion for sensitivity analysis. Reliability Engineering & System Safety 94 (7), 1161-1172.

di Pierro, F., Khu, S.T., Savic, D., Berardi, L., 2009. Efficient multi-objective optimal design of water distribution networks on a budget of simulations using hybrid algorithms. Environmental Modelling & Software 24 (2), 202-213.

Doherty, J., 2005. PEST: Model Independent Parameter Estimation, fifth ed. Watermark Numerical Computing, Brisbane, Australia.

Fen, C.S., Chan, C.C., Cheng, H.C., 2009. Assessing a response surface-based optimization approach for soil vapor extraction system design. Journal of Water Resources Planning and Management-ASCE 135 (3), 198-207.

Feyen, L., Vrugt, J.A., Nuallain, B.O., van der Knijff, J., De Roo, A., 2007. Parameter Mugunthan, P., Shoemaker, C.A., 2006. Assessing the impacts of parameter uncer-
optimisation and uncertainty assessment for large-scale streamflow simulation tainty for computationally expensive groundwater models. Water Resources
with the LISFLOOD model. Journal of Hydrology 332 (3e4), 276e289. Research 42 (10).
Foresee, F.D., Hagan, M.T., 1997. GausseNewton approximation to Bayesian regu- Mugunthan, P., Shoemaker, C.A., Regis, R.G., 2005. Comparison of function
larization. In: Proceedings of the 1997 International Joint Conference on Neural approximation, heuristic, and derivative-based methods for automatic cali-
Networks, Houston, Texas. IEEE, New York, pp. 1930e1935. bration of computationally expensive groundwater bioremediation models.
Gu, C., 2002. Smoothing Spline ANOVA Models. Springer, New York, USA. Water Resources Research 41 (11), 1e17.
Hagan, M.T., Menhaj, M.B., 1994. Training feedforward networks with Marquardt Mullur, A.A., Messac, A., 2006. Metamodeling using extended radial basis functions:
algorithm. IEEE Transactions on Neural Networks 5 (6), 989e993. a comparative approach. Engineering with Computers 21 (3), 203e217.
Hamm, L., Brorsen, B.W., Hagan, M.T., 2007. Comparison of stochastic global opti- Nakayama, H., Arakawa, M., Sasaki, R., 2002. Simulation-based optimization using
mization methods to estimate neural network weights. Neural Processing computational intelligence. Optimization and Engineering 3 (2), 201e214.
Letters 26, 145e158. Nguyen, D., Widrow, B., 1990. Improving the learning speed of 2-layer neural
Hansen, N., Muller, S.D., Koumoutsakos, P., 2003. Reducing the time complexity of networks by choosing initial values of the adaptive weights. IJCNN International
the derandomized evolution strategy with covariance matrix adaptation (CMA- Joint Conference on Neural Networks 1e3, C21eC26.
ES). Evolutionary Computation 11 (1), 1e18. Nicklow, J., Reed, P., Savic, D., Dessalegne, T., Harrell, L., Chan-Hilton, A., Karamouz, M.,
He, Y., Hui, C.W., 2007. Genetic algorithm based on heuristic rules for high- Minsker, B., Ostfeld, A., Singh, A., Zechman, E., Evolutionary, A.T.C., 2010. State of
constrained large-size single-stage multi-product scheduling with parallel the art for genetic algorithms and beyond in water resources planning and
units. Chemical Engineering and Processing 46 (11), 1175e1191. management. Journal of Water Resources Planning and Management-ASCE 136
Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward networks are (4), 412e432.
universal approximators. Neural Networks 2 (5), 359e366. Ostfeld, A., Salomons, S., 2005. A hybrid genetic e instance based learning
Hsu, K.L., Gupta, H.V., Sorooshian, S., 1995. Artificial neural network modeling of algorithm for CE-QUAL-W2 calibration. Journal of Hydrology 310 (1e4),
rainfall-runoff process. Water Resources Research 31 (10), 2517e2530. 122e142.
Hussain, M.F., Barton, R.R., Joshi, S.B., 2002. Metamodeling: radial basis functions, Papadrakakis, M., Lagaros, N.D., Tsompanakis, Y., 1998. Structural optimization using
versus polynomials. European Journal of Operational Research 138 (1), 142e154. evolution strategies and neural networks. Computer Methods in Applied
Jin, Y., 2005. A comprehensive survey of fitness approximation in evolutionary Mechanics and Engineering 156 (1e4), 309e333.
computation. Soft Computing 9 (1), 3e12. Ratto, M., Pagano, A., 2010. Using recursive algorithms for the efficient identification
Jin, Y.C., Olhofer, M., Sendhoff, B., 2002. A framework for evolutionary optimization of smoothing spline ANOVA models. ASTA-Advances in Statistical Analysis 94
with approximate fitness functions. IEEE Transactions on Evolutionary (4), 367e388.
Computation 6 (5), 481e494. Ratto, M., Pagano, A., Young, P., 2007. State dependent parameter metamodelling
Johnson, V.M., Rogers, L.L., 2000. Accuracy of neural network approximators in and sensitivity analysis. Computer Physics Communications 177 (11), 863e876.
simulationeoptimization. Journal of Water Resources Planning and Manage- Razavi, S., Araghinejad, S., 2009. Reservoir inflow modeling using temporal neural
ment-ASCE 126 (2), 48e56. networks with forgetting factor approach. Water Resources Management 23
Jones, D.R., 2001. A taxonomy of global optimization methods based on response (1), 39e55.
surfaces. Journal of Global Optimization 21 (4), 345e383. Razavi, S., Karamouz, M., 2007. Adaptive neural networks for flood routing in river
Jones, D.R., Schonlau, M., Welch, W.J., 1998. Efficient global optimization of systems. Water International 32 (3), 360e375.