
STATUS OF AUTOMATIC CALIBRATION FOR HYDROLOGIC MODELS:

COMPARISON WITH MULTILEVEL EXPERT CALIBRATION


By Hoshin Vijai Gupta,1 Soroosh Sorooshian,2 and Patrice Ogou Yapo3

ABSTRACT: The usefulness of a hydrologic model depends on how well the model is calibrated. Therefore, the calibration procedure must be conducted carefully to maximize the reliability of the model. In general, manual procedures for calibration can be extremely time-consuming and frustrating, and this has been a major factor inhibiting the widespread use of the more sophisticated and complex hydrologic models. A global optimization algorithm entitled shuffled complex evolution recently was developed that has proved to be consistent,
Downloaded from ascelibrary.org by University of Technology, Sydney on 09/27/22. Copyright ASCE. For personal use only; all rights reserved.

effective, and efficient in locating the globally optimal model parameters of a hydrologic model. In this paper, the capability of the shuffled complex evolution automatic procedure is compared with the interactive multilevel calibration multistage semiautomated method developed for calibration of the Sacramento soil moisture accounting streamflow forecasting model of the U.S. National Weather Service. The results suggest that the state of the art in automatic calibration now can be expected to perform with a level of skill approaching that of a well-trained hydrologist. This enables the hydrologist to take advantage of the power of automated methods to obtain good parameter estimates that are consistent with the historical data and to then use personal judgment to refine these estimates and account for other factors and knowledge not incorporated easily into the automated procedure. The analysis also suggests that simple split-sample testing of model performance is not capable of reliably indicating the existence of model divergence and that more robust performance evaluation criteria are needed.

1 Dept. of Hydro. and Water Resour., Univ. of Arizona, Tucson, AZ 85721. E-mail: hoshin@hwr.arizona.edu
2 Dept. of Hydro. and Water Resour., Univ. of Arizona, Tucson, AZ.
3 Dept. of Sys. and Industrial Engrg., Univ. of Arizona, Tucson, AZ.
Note. Discussion open until September 1, 1999. To extend the closing date one month, a written request must be filed with the ASCE Manager of Journals. The manuscript for this paper was submitted for review and possible publication on June 23, 1997. This paper is part of the Journal of Hydrologic Engineering, Vol. 4, No. 2, April, 1999. ©ASCE, ISSN 1084-0699/99/0002-0135-0143/$8.00 + $.50 per page. Paper No. 16054.

INTRODUCTION AND SCOPE

Computer-based hydrologic models are becoming increasingly sophisticated and are used routinely for tasks ranging from real-time prediction of impending flood events to the design of policies and structures for mitigating the effects of extreme hydrologic events (e.g., floods and droughts). Because many of these hydrologic models are intended to be applicable to different watersheds, they typically have parameters that must be adjusted to make the model behavior match the behavior of the watershed of interest. The process of adjusting these parameters is called parameter estimation or model calibration.

The usefulness of hydrologic models for the purpose of operational predictions depends on how well the model is calibrated. No matter how sophisticated the model structure is, if the parameters are poorly specified, the model-simulated fluxes (e.g., streamflows) can be quite different from those actually observed. Therefore, the calibration procedure must be conducted carefully to maximize the reliability of the model. In many models, the values of certain parameters often can be estimated by direct measurement of physically observable characteristics of the watershed (e.g., area of the watershed, fraction of watershed that is impervious, etc.). However, it is typical for a model to have several parameters that are not directly observable and that therefore must be estimated using the indirect technique of matching the model to historical data. The technique involves obtaining historical records of the important hydrologic inputs and outputs for the watershed (e.g., mean areal precipitation, streamflow, estimated evapotranspiration, etc.), processing the input data through the model, and then adjusting the unknown parameters to obtain model-simulated outputs that match the observed watershed outputs as closely as possible.

Before the widespread availability of high-speed computing, most hydrologic models were calibrated exclusively in a "manual" fashion; a hydrologist having knowledge of the watershed and experience with the model would use a trial-and-error procedure to adjust the parameters while visually comparing the observed and simulated outputs using graphical plots. The feasibility of the manual method relies on the fact that many hydrologic models are designed so the parameters have some conceptual relationship to characteristics of the watershed. Therefore, the hydrologist expects that adjustment of certain parameters in certain directions will have predictable effects on the way in which the simulated outputs will change. However, a large number of interacting parameters can result in unpredictable effects when multiple parameters are adjusted. For example, the Sacramento soil moisture accounting (SAC-SMA) model used by the U.S. National Weather Service (NWS) has 16 or more adjustable parameters in addition to those associated with channel routing (Brazil and Hudlow 1981). Because the model structures are highly nonlinear, it is difficult to predict just how sensitive the different portions of the simulated outputs will be to parameter adjustments. In general, the manual calibration procedure can be extremely time-consuming and frustrating, and this has been a major factor inhibiting wider use of the more sophisticated and complex models. Nonetheless, with considerable experience, many hydrologists have become very skilled at manual calibration of their hydrologic models, especially because modern high-speed computing enables the effects of parameter adjustments to be observed very quickly. By adding semiautomated calibration tools, the degree of training required before a hydrologist can perform an acceptable manual calibration also has been relaxed. For example, Brazil (1988) developed a sophisticated and powerful interactive multilevel calibration (MLC) method to assist in manual calibration of the SAC-SMA model.

Because of the time-consuming and difficult nature of manual calibration, researchers in the 1960s and early 1970s began to explore methods by which the calibration process could be sped up and made more objective through automation. Early attempts were reported by Dawdy and O'Donnell (1965), Chapman (1970), Ibbitt (1970), Nash and Sutcliffe (1970), Ibbitt and O'Donnell (1971), Monro (1971), and Johnston and Pilgrim (1976), among others. The basic concept underlying the following "automatic" calibration procedure has changed little since its inception:
JOURNAL OF HYDROLOGIC ENGINEERING / APRIL 1999 / 135

J. Hydrol. Eng., 1999, 4(2): 135-143


1. A period of calibration data is selected.
2. An initial guess is made as to the probable values (or range of values) for the parameters.
3. The model is run using these values for the parameters.
4. The "distance" between the model output and the observed data is measured using a mathematical equation called an objective function or estimation criterion.
5. An automatic optimization procedure (called a search algorithm) is used to search for the parameter values that optimize the value of the objective function.

The foregoing automatic procedure requires the selection of an objective function, a search algorithm, and a criterion by which to terminate the search. The early studies took guidance from what was then the state of the art in statistics and systems theory in the selection of these items. However, it soon became apparent that the automatic methods were not providing very good results. One problem was that the parameter values obtained by the automatic procedure often were unrealistic conceptually. Another problem was that, although the solution parameters might provide a good match to the calibration data, the model performance on different periods of data often was quite poor. A third problem was that the solution provided by the automatic calibration procedure varied considerably with choice of the calibration data, initial parameter guess, objective function, and search procedure. For example, Johnston and Pilgrim (1976) reported that, in more than 2 years of calibration research, they were unable to find a unique "best" set of parameters for the Boughton model (Boughton 1965), and that this seemed largely due to a highly irregular form of the "response surface" of the objective function in the parameter space. Ibbitt (1970), Ibbitt and O'Donnell (1971), and several other researchers reported similar problems.

During the past two decades, a great deal of research has been devoted to improving the automatic procedure. That research has focused primarily on the following four issues: (1) development of specialized techniques for handling the kinds of errors present in the measured data; (2) search for an optimization strategy that can reliably solve the parameter estimation problem; (3) determination of the appropriate quantity and most informative kind of data; and (4) efficient representation of the uncertainty of the calibrated model (structure and parameters) and translation of that uncertainty into uncertainty in the model response. Research into techniques for accounting for data error has led to the development of maximum likelihood functions for measuring the "closeness" of the model and the data (Sorooshian and Dracup 1980; Sorooshian et al. 1981; James and Burges 1982; Sefe and Boughton 1982; Kuczera 1983a,b; Lemmer and Rao 1983; Sorooshian and Gupta 1983; Ibbitt and Hutchinson 1984). Research into optimization methods has led to the use of population-evolution-based search strategies (Brazil and Krajewski 1987; Brazil 1988; Wang 1991; Duan et al. 1992; Duan et al. 1993; Sorooshian et al. 1993). In this regard, the shuffled complex evolution (SCE-UA) global optimization algorithm developed at the University of Arizona has proved to be consistent, effective, and efficient in locating the globally optimal model parameters of a hydrologic model (Duan et al. 1992; Luce and Cundy 1994; Tanakamaru 1995; Bonan 1996; Gan and Biftu 1996; Tanakamaru and Burges 1996; Kuczera 1997; Mroczkowski et al. 1997; J. G. Ndiritu and T. M. Daniell unpublished work 1997). Research into data requirements has led to the understanding that the informativeness of the data is far more important than the amount used for model calibration (Kuczera 1982; Sorooshian et al. 1983; Gupta and Sorooshian 1985; Yapo et al. 1996). Research into representation of model uncertainty has led to practical procedures for rigorous statistical analysis of model parameter uncertainty (Spear and Hornberger 1980; Jones 1983; Kuczera 1988). Finally, research into methods for estimating model prediction uncertainty has led to the development of the generalized likelihood uncertainty estimation method (Beven and Binley 1992; Freer et al. 1996), the Monte Carlo set membership procedure (Keesman 1990; van Straten and Keesman 1991), and the prediction uncertainty method (Klepper et al. 1991).

In this paper, we report on a study in which the capability of the SCE-UA automatic procedure was compared with the interactive MLC multistage semiautomated (manual) method developed by Brazil (1988) for calibration of the SAC-SMA model. The results indicate that the state of the art of automatic calibration methods has been elevated to the point where such methods now can be considered seriously as a viable alternative to the manual approach.

SCE-UA AUTOMATIC CALIBRATION METHOD

The SCE-UA method is a general purpose global optimization strategy designed to handle the various response surface problems encountered in the calibration of nonlinear simulation models, particularly the multilevel or "nested" optima problem encountered with conceptual hydrologic models (Duan et al. 1992). Detailed explanations of the method appear in Duan et al. (1992, 1993) and Duan et al. (1994). In the past 6 years, the method has been tested by numerous researchers on a variety of models with overwhelmingly positive results (Duan et al. 1992, 1993; Sorooshian et al. 1993; Luce and Cundy 1994; Tanakamaru 1995; Gan and Biftu 1996; Tanakamaru and Burges 1996; Kuczera 1997; Mroczkowski et al. 1997; J. G. Ndiritu and T. M. Daniell unpublished work). The algorithm typically is able to find consistently and efficiently the global optimum of the problem, whereas other optimization methods either fail or provide inconsistent results.

In brief, the SCE-UA method involves the initial selection of a "population" of points distributed randomly throughout the feasible parameter space. The population is partitioned into several "complexes," each consisting of 2n + 1 points, where n is the number of parameters to be optimized. Each complex "evolves" independently in a manner based on the downhill simplex algorithm (Nelder and Mead 1965; Press et al. 1992). The population is "shuffled" periodically, and new complexes are formed so that the information gained by the previous complexes is shared. The evolution and shuffling steps repeat until prescribed convergence criteria are satisfied. In general, the method is very easy to implement, having only one user-specified coefficient (the number of complexes), which should be chosen to suit the difficulty of the optimization problem. Recommendations on the use of the SCE-UA appear in Duan et al. (1994); the computer code can be obtained by contacting the first writer. In each of the studies reported in this paper involving calibration of the SAC-SMA model, the number of parameters optimized was 13 and the number of complexes used was 15 [this gave a population size of 15(2 × 13 + 1) = 405 points]; each optimization run was terminated after 20 shuffling loops.

Implementation of the SCE-UA method requires the selection of an objective function to be optimized with respect to the model parameters. In this study, we examined the performance of the following two objective functions: the daily root-mean square (DRMS) estimation criterion [$\mathrm{DRMS} = (\mathrm{SLS}/N)^{1/2}$, where SLS is the well-known simple least squares value and N is the number of data points] and the heteroscedastic maximum likelihood estimator (HMLE) criterion (Sorooshian and Dracup 1980). The DRMS criterion implicitly assumes that the error variance is constant (homogeneous), whereas in HMLE the error variance is assumed to vary with the level of the output (magnitude of the flows) in a manner believed to be common in streamflow data. The HMLE assumes that a


transformation of the streamflows can be used to stabilize the residual variance across flow ranges by using the relationship

$$Q_t = \begin{cases} \dfrac{q_t^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \log(q_t), & \lambda = 0 \end{cases} \tag{1}$$

where $\lambda$ = an unknown transformation parameter estimated from the data. The HMLE has the form

$$\min_{\theta,\lambda} \ \mathrm{HMLE} = \frac{\dfrac{1}{N}\sum_{t=1}^{N} w_t \varepsilon_t^2}{\left[\prod_{t=1}^{N} w_t\right]^{1/N}} \tag{2}$$

where $\varepsilon_t = q_t^{obs} - q_t^{sim}$ = model residual at time t; $q_t^{obs}$ and $q_t^{sim}$ = observed and simulated flows, respectively; and $w_t$ = weight assigned to time t, computed as

$$w_t = f_t^{2(\lambda - 1)} \tag{3}$$

where $f_t = q_t^{true}$ = expected true flow at time t, approximated using $q_t^{obs}$. Details of the recommended implementation procedure appear in Sorooshian et al. (1993).

INTERACTIVE MULTILEVEL CALIBRATION METHOD

The Interactive MLC strategy consists of a systematic three-stage methodology for estimating the parameters of conceptual hydrologic simulation models; it is designed to reduce the larger problem to a number of subproblems that can be solved using different optimization techniques. The procedure uses a number of computer-based tools to assist the user in a hybrid manual/automatic calibration of the model. In level I, a guided interactive initial parameter estimator based on the use of computer-generated graphics along with interactive input is used to lead the user through some of the steps of initial data quality control and estimation of the feasible parameter region. In level II, a random search technique is used to obtain an improved estimate of the region of the best parameter values using as many as 10 different statistical goodness-of-fit criteria. In level III, a recursive parameter estimation procedure based on the Kalman filter is used to refine the parameter estimates. Although the implementation of this procedure requires a fair degree of familiarity with the hydrologic model, Brazil (1988) indicated that it can reduce significantly the amount of previous calibration experience required to obtain a reasonable result. For specific details of these procedures, please refer to Brazil (1988).

MODEL, HYDROLOGIC DATA USED, AND PARAMETERS OPTIMIZED

The SAC-SMA model (Fig. 1) is one of the options used by the river forecast centers of the NWS to perform real-time river and flood forecasts as well as extended streamflow predictions. In this study, a research version of the model maintained by the Department of Hydrology and Water Resources, University of Arizona, was used. The inputs are mean areal precipitation (mm/6 h) and potential evapotranspiration (mm/day). The outputs are estimated evapotranspiration (mm/day) and channel inflow; the latter is converted into streamflow (m³/s) by means of a unit hydrograph.

The Leaf River basin (area 1944 km²) has been investigated extensively (Brazil and Hudlow 1981; Sorooshian et al. 1983; Brazil 1988). A reliable 40-water-year data set (October 1, 1948 to September 30, 1988), which represents a variety of hydrological conditions, was obtained from the NWS (see statis-

FIG. 1. Schematic Description of SAC-SMA Model [from Brazil (1988)]
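
As an illustration of Eqs. (1)-(3), the following minimal Python sketch evaluates the flow transformation and the HMLE criterion for a fixed value of λ. It is not the implementation recommended in Sorooshian et al. (1993); in particular, the joint minimization over the model parameters and λ is not shown. The function names `box_cox` and `hmle` are hypothetical, and the expected true flow f_t is approximated by the observed flow, as stated in the text. Flows are assumed to be strictly positive.

```python
import math


def box_cox(q, lam):
    """Eq. (1): transform a positive flow q using parameter lam."""
    if lam == 0:
        return math.log(q)
    return (q ** lam - 1.0) / lam


def hmle(q_obs, q_sim, lam):
    """Eqs. (2)-(3): HMLE value for a fixed lam.

    The weights are w_t = f_t ** (2 * (lam - 1)), with f_t approximated
    by the observed flow. Logs are used so that the product of the
    weights in the denominator does not overflow on long records.
    """
    n = len(q_obs)
    log_w = [2.0 * (lam - 1.0) * math.log(qo) for qo in q_obs]
    weighted_sse = sum(
        math.exp(lw) * (qo - qs) ** 2
        for lw, qo, qs in zip(log_w, q_obs, q_sim)
    )
    geometric_mean_w = math.exp(sum(log_w) / n)
    return (weighted_sse / n) / geometric_mean_w
```

With λ = 1 every weight equals 1 and the criterion reduces to SLS/N, that is, the square of DRMS, which is a convenient check on the sketch.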



tics in Table 1). The mean annual precipitation for this period is 1,432 mm, and the mean annual runoff is 502 mm (30.90 cms). The hyetograph and hydrograph for the water-year with the largest mean flow are shown in Fig. 2.

TABLE 1. Precipitation and Flow Statistics for Leaf River Basin

Statistic | Flow (cms) | Daily measurements precipitation (mm) | Precipitation^a (mm)
(1) | (2) | (3) | (4)
Mean | 30.90 | 3.92 | 8.69
Standard deviation | 65.46 | 10.14 | 13.66
Minimum | 1.58 | 0 | 0.0001
Maximum | 1,444.17 | 221.52 | 221.52
Coefficient of variation (%) | 221.85 | 258.76 | 157.22
Coefficient of skew | 1.00 | 1.00 | 1.00
Note: Area = 1,944 km².
^a Nonzero precipitation.

The model has 16 parameters (Table 2), an evapotranspiration adjustment curve, and a unit hydrograph to be determined by the user. Following Brazil (1988), the three parameters SIDE, RIVA, and RSERV were fixed at prespecified values (RSERV = 0.3, RIVA = 0.0, and SIDE = 0.0); the evapotranspiration adjustment curve and the ordinates for the unit hydrograph were assumed to have been estimated previously.

CASE STUDY

Methodology

In this study, we compared the performance of the single-stage SCE-UA procedure and the three-stage interactive MLC procedure with regard to the calibration of the SAC-SMA model to the Leaf River basin. In previous work, Brazil (1988) applied the interactive MLC procedure to this 13-parameter calibration problem using 11 water-years (WY 1952-1962) of data for the calibration and assuming the initial parameter uncertainty estimates listed in Table 2. Based on numerous runs, Brazil arrived at the best set of parameter estimates listed as "IMLC estimates" in Table 2. To ensure a fair comparison, the SCE-UA method was used to perform a single calibration run with the DRMS objective function (hereafter called the SCE-DRMS procedure) and a single calibration run with the HMLE objective function (hereafter called the SCE-HMLE procedure). In each case, the same 13 parameters estimated by Brazil were calibrated under slightly more difficult conditions, created by employing a much larger initial parameter uncertainty on most of the parameters (Table 2).

Parameter Estimates

Fig. 3 shows the final parameter estimates obtained using the SCE-DRMS (thin solid line) and the SCE-HMLE (dashed line) calibration runs in comparison with the best interactive MLC parameter estimates (heavy solid line). Also shown as the grey shaded region is the size of the initial parameter uncertainty region used by Brazil (1988) for the interactive MLC procedure in comparison with the larger one (the entire box) used for the SCE-UA procedure. Note that in the interactive MLC procedure, the estimates of ADIMP and REXP actually drifted outside of the initial uncertainty region during the third (unconstrained) optimization level of the search. The most important thing to notice here is that all three procedures (SCE-DRMS, SCE-HMLE, and interactive MLC) give quite similar parameter estimates for some of the parameters (PCTIM, ADIMP, ZPERC, LZTWM, LZFPM, and LZFSM) while differing on the others. However, in general the SCE-DRMS parameter estimates tend to be similar to the interactive MLC estimates, which probably reflects the fact that Brazil used the DRMS criterion as the primary statistic guiding his interactive parameter search.

Performance Comparison

To compare the performance of the parameter estimates obtained by the SCE-DRMS, SCE-HMLE, and interactive MLC methods, the three parameter estimates were used to simulate the streamflow hydrograph for the entire 40-year period for which data are available. Then, for each of the 40 years, the four error statistics DRMS, PBIAS (percent bias), NSE (Nash-Sutcliffe efficiency), and PME (persistence model efficiency) were computed, where

FIG. 2. Leaf River Basin for Water-Year with Largest Daily Mean Flow (WY 1979): (a) Observed Daily Hyetograph; (b) Observed Daily
Hydrograph
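
The population bookkeeping behind an SCE-UA run of the kind used in this study (13 parameters, 15 complexes, 2n + 1 points per complex) can be sketched as follows. This is only an illustrative outline, not the SCE-UA code of Duan et al.: the evolution of each complex by the simplex-based step is omitted, the function names are hypothetical, and the rule used here to deal sorted points into complexes (point k goes to complex k mod p) is an assumption, because the text states only that the population is partitioned and periodically shuffled.

```python
import random


def initial_population(bounds, n_complexes, objective, seed=0):
    """Sample n_complexes * (2n + 1) points uniformly within the bounds."""
    rng = random.Random(seed)
    n = len(bounds)                      # number of parameters optimized
    points_per_complex = 2 * n + 1
    size = n_complexes * points_per_complex
    points = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(size)]
    points.sort(key=objective)           # best (lowest objective) first
    return points, points_per_complex


def partition(sorted_points, n_complexes):
    """Deal sorted points into complexes (assumed rule: k mod n_complexes)."""
    return [sorted_points[k::n_complexes] for k in range(n_complexes)]


def shuffle(complexes, objective):
    """Pool all complexes and re-sort so information is shared."""
    pooled = [p for c in complexes for p in c]
    pooled.sort(key=objective)
    return pooled
```

For example, with 13 unit-interval bounds and a toy quadratic objective, `initial_population([(0.0, 1.0)] * 13, 15, lambda p: sum(x * x for x in p))` returns 405 points, and `partition` then yields 15 complexes of 27 points each, matching the population size quoted in the text.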



TABLE 2. Parameters and State Variables of SAC-SMA Model

Parameters optimized | Description | Study lower bound | Study upper bound | Brazil's initial lower bound | Brazil's initial upper bound | Interactive MLC estimates
(1) | (2) | (3) | (4) | (5) | (6) | (7)

(a) Maximum Capacity Thresholds
UZTWM | Upper zone tension water maximum storage (mm) | 1.0 | 150.0 | 10.0 | 150.0 | 9.0
UZFWM | Upper zone free water maximum storage (mm) | 1.0 | 150.0 | 10.0 | 75.0 | 39.8
LZTWM | Lower zone tension water maximum storage (mm) | 1.0 | 500.0 | 75.0 | 400.0 | 240.0
LZFPM | Lower zone free water primary maximum storage (mm) | 1.0 | 1,000.0 | 120.0 | 140.0 | 120.0
LZFSM | Lower zone free water supplemental maximum storage (mm) | 1.0 | 1,000.0 | 40.0 | 60.0 | 40.0
ADIMP | Additional impervious area (decimal fraction) | 0.0 | 0.4 | 0.0 | 0.2 | 0.25

(b) Recession Parameters
UZK | Upper zone free water lateral depletion rate (day⁻¹) | 0.1 | 0.5 | 0.2 | 0.4 | 0.200
LZPK | Lower zone primary free water depletion rate (day⁻¹) | 0.0001 | 0.025 | 0.006 | 0.01 | 0.006
LZSK | Lower zone supplemental free water depletion rate (day⁻¹) | 0.01 | 0.25 | 0.15 | 0.2 | 0.15

(c) Percolation and Other Parameters
ZPERC | Maximum percolation rate (dimensionless) | 1.0 | 250.0 | 5.0 | 250.0 | 250.0
REXP | Exponent of the percolation equation (dimensionless) | 1.0 | 5.0 | 1.1 | 4.0 | 4.27
PCTIM | Impervious fraction of the watershed area (decimal fraction) | 0.0 | 0.1 | 0.0 | 0.02 | 0.003
PFREE | Fraction of water percolating from upper zone directly to lower zone free water storage (decimal fraction) | 0.0 | 0.6 | 0.0 | 0.6 | 0.027

(d) Parameters Not Optimized
RIVA | Riparian vegetation area (decimal fraction) | | | | | 0.0
SIDE | Ratio of deep recharge to channel base flow (dimensionless) | | | | | 0.0
RSERV | Fraction of lower zone free water not transferrable to lower zone tension water (decimal fraction) | | | | | 0.3

(e) State Variables
UZTWC | Upper zone tension water storage content (mm)
UZFWC | Upper zone free water storage content (mm)
LZTWC | Lower zone tension water storage content (mm)
LZFPC | Lower zone free primary water storage content (mm)
LZFSC | Lower zone free secondary water storage content (mm)
ADIMC | Additional impervious area content (mm)
$$\mathrm{DRMS} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(q_t^{sim} - q_t^{obs}\right)^2} \tag{4}$$

$$\mathrm{PBIAS} = \frac{\sum_{t=1}^{N}\left(q_t^{obs} - q_t^{sim}\right)}{\sum_{t=1}^{N} q_t^{obs}} \times 100\% \tag{5}$$

$$\mathrm{NSE} = 1 - \frac{\sum_{t=1}^{N}\left(q_t^{sim} - q_t^{obs}\right)^2}{\sum_{t=1}^{N}\left(q_t^{obs} - q^{mean}\right)^2} \tag{6}$$

$$\mathrm{PME} = 1 - \frac{\sum_{t=1}^{N}\left(q_t^{sim} - q_t^{obs}\right)^2}{\sum_{t=1}^{N}\left(q_t^{obs} - q_{t-1}^{obs}\right)^2} \tag{7}$$

These four statistics provide interesting perspectives on model performance. The first statistic DRMS (units m³/s) simply computes the standard deviation of the model prediction error; a smaller value indicates a better model performance. The second statistic PBIAS (units percent) measures the average tendency of the simulated flows to be larger or smaller than their observed counterparts; the optimal value is 0.0; positive values indicate a model bias toward underestimation, whereas negative values indicate a bias toward overestimation. Both NSE and PME provide normalized indicators of model performance in relation to benchmarks. The NSE (unitless) measures the relative magnitude of the residual variance ("noise") to the variance of the flows ("information"); the optimal value is 1.0, and values should be larger than 0.0 to indicate "minimally acceptable" performance; a value of 0.0 indicates that the model is no better a predictor than the mean observed flow, and a negative value indicates that the mean is the better predictor. The PME measures the relative magnitude of the residual variance (noise) to the variance of the errors obtained by the use of a simple persistence model; the optimal value is 1.0, and values should be larger than 0.0 to indicate minimally acceptable performance. The simple persistence model repre-

FIG. 3. Comparison of Normalized Final Parameter Estimates Obtained Using SCE-UA Method with DRMS Criterion (Thin Solid Line) and HMLE Criterion (Dashed Line) to Interactive MLC Parameter Estimates [Shaded Region Denotes Initial Parameter Uncertainty Region Used by Brazil (1988)]

FIG. 4. Evaluation of Calibrated Model Performance on Annual Basis for 40 Years of Record
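
The four statistics of Eqs. (4)-(7), which are evaluated year by year in Fig. 4, can be sketched in plain Python as follows. This is a minimal illustration rather than the code used in the study; the function names are hypothetical, and one assumption is made for PME: the sums start at the second value of the series so that the previous observation needed by the persistence benchmark always exists.

```python
import math


def drms(obs, sim):
    """Eq. (4): root-mean-square of the simulation errors."""
    n = len(obs)
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / n)


def pbias(obs, sim):
    """Eq. (5): percent bias; positive values mean underestimation."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def nse(obs, sim):
    """Eq. (6): Nash-Sutcliffe efficiency (benchmark: mean observed flow)."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((s - o) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread


def pme(obs, sim):
    """Eq. (7): persistence model efficiency (benchmark: previous observation).

    Assumption: both sums run from the second value onward.
    """
    steps = range(1, len(obs))
    residual = sum((sim[t] - obs[t]) ** 2 for t in steps)
    persistence = sum((obs[t] - obs[t - 1]) ** 2 for t in steps)
    return 1.0 - residual / persistence
```

Two sanity checks follow directly from the definitions: a simulation equal to the mean observed flow gives NSE = 0, and a simulation equal to the previous observed flow gives PME = 0.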

sents a "minimal-information" situation in which we assume that the best estimate of streamflow at the next time step is given by the observed flow at the current time step. Clearly, if a rainfall-runoff model has a one-step-ahead forecast error that is larger than that provided by the simple persistence model (i.e., PME < 0), its suitability for streamflow forecasting should be questioned. The results of the comparison (Fig. 4) may be summarized as follows:

1. Fig. 4(a) indicates that the interactive MLC and SCE-DRMS methods generally provide similar DRMS statistic performance for the individual years in the 40-year period, with the SCE-DRMS method being slightly superior. The plot also indicates that the model performance varies from year to year (depending on the wetness of the year; more on this later). The poorer SCE-HMLE performance on the DRMS statistic is, of course, expected, given that the SCE-HMLE parameter estimates tend to provide model forecasts with nonconstant error variance.
2. Fig. 4(b) shows that the SCE-DRMS and interactive MLC methods also provide similar PBIAS statistic performance, with a definite tendency to underestimation. However, the SCE-HMLE method tends to provide forecasts that are relatively unbiased.
3. Fig. 4(c) indicates that the calibrated SAC-SMA model tends to be quite efficient, with NSE statistic values larger than 0.8 for most of the 40 years, but that it performs poorly on some of the years, with NSE less than 0.5. Also, it is interesting to note that the NSE statistic for four (WY 1956-1959) of the 11 years in the calibration period tends to be quite poor (NSE < 0.7), suggesting that it might have been better to select a different set of water-years for model calibration.
4. Fig. 4(d) reveals that the PME statistic is a more powerful test of model performance than the NSE statistic. The negative PME values on the graph indicate that, for some years, the SAC-SMA model is unable to provide (on a daily average basis) a forecast that is superior to that provided by a simple persistence model. For the years for which the PME statistic is positive, the SAC-SMA model performance is on the average only 25 to 50% better than the simple persistence model.

The foregoing results indicate that there is a great deal of variability in model performance from year to year as mea-

FIG. 5. Evaluation of Calibrated Model Performance on Annual Basis for 40 Years of Record Ordered by Annual Mean Flow
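
The ordering used in Fig. 5 (annual statistics arranged by annual mean flow) can be sketched as below for the DRMS statistic. The sketch assumes that a water-year runs from October 1 to September 30 and is labelled by the calendar year in which it ends, which is consistent with the 40-water-year record of October 1, 1948 to September 30, 1988 described earlier; the function names are hypothetical and the code is illustrative only.

```python
import math
from datetime import date


def water_year(day):
    """Water-year label for a date (assumed: Oct 1 to Sep 30, ending year)."""
    return day.year + 1 if day.month >= 10 else day.year


def annual_drms_by_wetness(dates, obs, sim):
    """Return (annual mean flow, water-year, DRMS) rows, driest year first."""
    by_year = {}
    for day, o, s in zip(dates, obs, sim):
        by_year.setdefault(water_year(day), []).append((o, s))
    rows = []
    for wy, pairs in by_year.items():
        mean_flow = sum(o for o, _ in pairs) / len(pairs)
        rms = math.sqrt(sum((s - o) ** 2 for o, s in pairs) / len(pairs))
        rows.append((mean_flow, wy, rms))
    rows.sort()                          # sorts on annual mean flow first
    return rows
```

Plotting the third element of each row against the first gives a display of the same kind as Fig. 5(a); the other statistics can be treated in the same way.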

sured by the four statistics. Some of the following reasons for rameter estimates provide the best below-mean DRMS
this variability become apparent when the results are plotted performance, whereas the SCE-DRMS parameter esti-
against the annual mean daily flow (cms) for the corresponding mates provide the best above-mean performance.
year (Fig. 5): 2. Figs. 6(c and d) indicate that the SCE-HMLE parameter
estimates provide the best below-mean PBIAS perfor-
1. Fig. 5(a) now clearly shows that the model error variance mance, whereas above-mean performance tends to be
(measured by DRMS) increases with wetness of the year, similar for the three estimators.
indicating that the forecast error variance is larger for 3. Fig. 6(e) indicates that the model error variance for be-
higher flows. This observation supports the assumptions low-mean flows tends to be much larger than the vari-
behind the use of the HMLE objective function for pa- ance of the flows themselves (NSE < 0), whereas Fig.
rameter estimation. It also indicates that the model is 6(f) shows that the above-mean forecasts performance
better able to match lower flows (on an absolute basis) tends to be very good for wet years and not so good for
than higher flows. This fact should be considered when drier years.
attempting to do split-sample performance evaluation of 4. Figs. 6(g and h) clearly show that the persistence model
the model; if it is found that the performance evaluation provides superior performance (PME < 0) for below-
period DRMS is quite a bit larger than the calibration mean flows and comparable performance (PME ≈ 0) for
period DRMS, this does not automatically mean that the dry-year above-mean flows. However, for wet-year
calibration is poor; more careful evaluation of the results above-mean flows the SAC-SMA model performance is
is required — we recommend a comparative evaluation of quite good (PME > 0).
the model fit by computing the statistics separately for
each of the years in the calibration and evaluation peri- CONCLUSIONS
ods and plotting them against an indicator of wetness
such as annual mean daily flow. In this paper, the capability of the SCE-UA automatic pro-
2. Fig. 5(b) indicates that the PBIAS performance for the cedure was shown to compare favorably with the interactive
three methods tends to be more different for dry water- MLC multistage semiautomated method developed by Brazil
years but is quite similar for the wetter years. (1988) for calibration of the SAC-SMA streamflow forecasting
3. Both of Figs. 5(c and d) show that the model NSE and model of the U.S. NWS. The results of this study suggest that
PME performance tends to be better for wetter years than the state of the art in automatic calibration of hydrologic mod-
for dry years. This indicates that the SCA-SMA model els may have reached the point where automated methods can
performs relatively well (PME > 0) for its intended func- be expected to perform with a level of skill approaching that
tion of flood forecasting (high flows), where the ‘‘per- of a well-trained hydrologist. However, we do not mean to
sistence model’’ performance must be necessarily poor, imply that the skill of the hydrologist is becoming unneces-
but that the low flow performance could stand improve- sary, rather that the trained hydrologist may now be able to
ment. place more confidence in the use of automated tools to assist
in the model calibration process. In particular, the hydrologist
The results of the foregoing paragraph indicate that the per- can take advantage of the power of automated methods to
formance of the model tends to vary with flow level. This fact obtain good parameter estimates that are consistent with the
is seen more clearly when, for each water-year, the data are historical data and to then refine these estimates to account for
separated into two portions — ‘‘below-mean’’ flows and other factors and knowledge not easily incorporated into the
‘‘above-mean’’ flows — and the four statistics are computed automated procedure. The major advantage of this is the tre-
again for each portion of the data. The following results are mendous savings in time and energy that can be realized. For
presented in Fig. 6 with the above-mean and below-mean sta- example, to calibrate the SAC-SMA model by manual methods
tistics shown side by side: to the more than 2,000 watersheds for which flood forecasting
must be performed by the U.S. NWS is practically infeasible;
1. Figs. 6(a and b) show clearly that the SCE-HMLE pa- the availability of reliable automated methods such as the
SCE-DRMS and SCE-HMLE now makes this task much more achievable.

FIG. 6. Evaluation of Calibrated Model Performance on Annual Basis for 40 Years of Record Ordered by Annual Mean Flow and Partitioned into Below-Mean and Above-Mean Flow Categories

The study also illuminates the fact that model performance evaluation (sometimes incorrectly called verification) can be a tricky process. It is clear that a simple split-sample test in which the overall error variances for the calibration and evaluation periods are compared is incapable of reliably indicating the existence of model divergence. Instead, it is necessary to conduct a careful and detailed analysis of the properties of the model residuals for both periods; this issue is discussed in more detail in a separate paper (Yapo et al. 1996). Another issue is the need to use performance evaluation criteria that have the power to clearly indicate poor model performance. In this study, the DRMS measure had the least power and the NSE measure was only marginally informative, whereas the PBIAS and PME measures were most useful. The power of the PME measure comes from the fact that the model performance is being compared with that of a simple persistence forecast model. Clearly, much stricter tests could be constructed by comparing model performance to less simple models such as trend forecasts or linear time series forecasts. Currently, we are investigating these issues and will report on our findings in due course.

We welcome dialog on issues related to the calibration of various kinds of hydrologic models.

ACKNOWLEDGMENTS

Partial financial support for this research was provided by the National Science Foundation (Grants EAR-9415347 and EAR-9418147), the Hydrologic Research Laboratory of the NWS (Grants NA47WH0408 and NA57WH0575), and by the National Aeronautics and Space Administration (NASA-EOS Grant NAGW2425). Financial assistance provided to Dr. Yapo by the University of Arizona Graduate College is gratefully acknowledged. The computer codes and data used in this study are available on request from the first author at hoshin@hwr.arizona.edu.

APPENDIX. REFERENCES

Beven, K. J., and Binley, A. M. (1992). “The future of distributed models: Model calibration and uncertainty prediction.” Hydrological Processes, Sussex, U.K., 6, 279–298.
Bonan, G. B. (1996). “Sensitivity of a GCM simulation to subgrid infiltration and surface runoff.” Climate Dyn., 12, 279–285.

Boughton, W. C. (1965). “A new estimation technique for estimation of catchment yield.” Rep. No. 78, Water Resources Laboratory, University of New South Wales, Manly Vale.
Brazil, L. E. (1988). “Multilevel calibration strategy for complex hydrologic simulation models,” PhD dissertation, Colorado State University, Fort Collins, Colo.
Brazil, L. E., and Hudlow, M. D. (1981). “Calibration procedures used with the National Weather Service river forecast system.” Water and related land resource systems, New York, 457–466.
Brazil, L. E., and Krajewski, W. F. (1987). “Optimization of complex hydrologic models using random search methods.” Conf. on Engrg. Hydro., Hydraulics Division, ASCE, New York, 726–731.
Chapman, T. G. (1970). “Optimization of a rainfall-runoff model for an arid zone catchment.” UNESCO Publication 96, United Nations Educational, Scientific, and Cultural Organization, Wallington, New Zealand.
Dawdy, D. R., and O’Donnell, T. (1965). “Mathematical models of catchment behavior.” J. Hydr. Div., ASCE, 91(4), 113–137.
Duan, Q. Y., Gupta, V. K., and Sorooshian, S. (1992). “Effective and efficient global optimization for conceptual rainfall-runoff models.” Water Resour. Res., 28(4), 1015–1031.
Duan, Q. Y., Gupta, V. K., and Sorooshian, S. (1993). “A shuffled complex evolution approach for effective and efficient global minimization.” J. Optimization Theory and Applications, 76(3), 501–521.
Duan, Q. Y., Sorooshian, S., and Gupta, V. K. (1994). “Optimal use of the SCE-UA global optimization method for calibrating watershed models.” J. Hydro., Amsterdam, 158, 265–284.
Freer, J., Beven, K. J., and Ambroise, B. (1996). “Bayesian estimation of uncertainty in runoff prediction and the value of data: An application of the GLUE approach.” Water Resour. Res., 32(7), 2161–2173.
Gan, T. Y., and Biftu, G. F. (1996). “Automatic calibration of conceptual rainfall-runoff models: Optimization algorithms, catchment conditions and model structure.” Water Resour. Res., 32(12), 3513–3524.
Gupta, V. K., and Sorooshian, S. (1985). “The relationship between data and the precision of parameter estimates of hydrologic models.” J. Hydro., Amsterdam, 81, 57–77.
Ibbitt, R. P. (1970). “Systematic parameter fitting for conceptual models of catchment hydrology,” PhD dissertation, University of London, London.
Ibbitt, R. P., and Hutchinson, P. D. (1984). “Model parameter consistency and fitting criteria.” Proc., IFAC 9th Triennial World Congr., International Federation of Automatic Control, 153–157.
Ibbitt, R. P., and O’Donnell, T. (1971). “Fitting methods for conceptual catchment models.” J. Hydr. Div., ASCE, 97(9), 1331–1342.
James, L. D., and Burges, S. J. (1982). “Selection, calibration, and testing of hydrologic models.” Hydrologic testing of small watersheds, C. T. Haan, H. P. Johnson, and D. L. Brakensiek, eds., American Society of Agricultural Engineers, St. Joseph, Mich., 435–472.
Johnston, P. R., and Pilgrim, D. (1976). “Parameter optimization for watershed models.” Water Resour. Res., 12(3), 477–486.
Jones, D. A. (1983). “Statistical analysis of empirical models fitted by optimization.” Biometrika, 70(1), 67–88.
Keesman, K. J. (1990). “Set-theoretic parameter estimation using random scanning and principal component analysis.” Math. Computation and Simulation, 32, 535–543.
Klepper, O., Scholten, H., and Kamer, J. P. G. v. d. (1991). “Prediction uncertainty in an ecological model of the Oosterschelde Estuary.” J. Forecasting, 10, 191–209.
Kuczera, G. (1982). “On the relationship of the reliability of parameter estimates and hydrologic time series data used in calibration.” Water Resour. Res., 18(1), 146–154.
Kuczera, G. (1983a). “Improved parameter inference in catchment models: 1. Evaluating parameter uncertainty.” Water Resour. Res., 19(5), 1151–1162.
Kuczera, G. (1983b). “Improved parameter inference in catchment models: 2. Combining different kinds of hydrologic data and testing their compatibility.” Water Resour. Res., 19(5), 1163–1172.
Kuczera, G. (1988). “On validity of first-order prediction limits for conceptual hydrological models.” J. Hydro., Amsterdam, 103, 229–247.
Kuczera, G. (1997). “Efficient subspace probabilistic parameter optimization for catchment models.” Water Resour. Res., 33(1), 177–185.
Lemmer, H. R., and Rao, A. R. (1983). “Critical duration analysis and parameter estimation in ILLUDAS.” Tech. Rep. No. 153, Water Resources Research Center, Purdue University, West Lafayette, Ind.
Luce, C. H., and Cundy, T. W. (1994). “Parameter identification for a runoff model for forest roads.” Water Resour. Res., 30(4), 1057–1069.
Monro, J. C. (1971). “Direct search optimization in mathematical modeling and a watershed model application.” NOAA Techn. Memo No. NWS HYDRO-12, U.S. Department of Commerce, Silver Spring, Md.
Mroczkowski, M., Raper, G. P., and Kuczera, G. (1997). “The quest for more powerful validation of conceptual catchment models.” Water Resour. Res., 33(10), 2325–2336.
Nash, J. E., and Sutcliffe, J. V. (1970). “River flow forecasting through conceptual models, Part 1: a discussion of principles.” J. Hydro., Amsterdam, 10(3), 282–290.
Nelder, J. A., and Mead, R. (1965). “A simplex method for function minimization.” Computer J., 7(4), 308–313.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing. Cambridge University Press, Cambridge, London.
Sefe, F. T., and Boughton, W. C. (1982). “Variation of model parameter values and sensitivity with type of objective function.” J. Hydro. (New Zealand), 21(1), 117–132.
Sorooshian, S., and Dracup, J. A. (1980). “Stochastic parameter estimation procedures for hydrologic rainfall-runoff models: Correlated and heteroscedastic error cases.” Water Resour. Res., 16(2), 430–442.
Sorooshian, S., Duan, Q. Y., and Gupta, V. K. (1993). “Calibration of rainfall-runoff models: Application of global optimization to the Sacramento soil moisture accounting model.” Water Resour. Res., 29(4), 1185–1194.
Sorooshian, S., and Gupta, V. K. (1983). “Automatic calibration of conceptual rainfall-runoff models: The question of parameter observability and uniqueness.” Water Resour. Res., 19(1), 260–268.
Sorooshian, S., Gupta, V. K., and Fulton, J. L. (1981). “Parameter estimation of conceptual rainfall-runoff models assuming autocorrelated data errors: a case study.” Int. Symp. on Rainfall-Runoff Modeling, Mississippi State, Miss., 491–504.
Sorooshian, S., Gupta, V. K., and Fulton, J. L. (1983). “Evaluation of maximum likelihood parameter estimation techniques for conceptual rainfall-runoff models: Influence of calibration data variability and length on model credibility.” Water Resour. Res., 19(1), 251–259.
Spear, R. C., and Hornberger, G. M. (1980). “Eutrophication in Peel Inlet: II. Identification of critical uncertainties via generalized sensitivity analysis.” Water Resour. Res., 14, 43–49.
Tanakamaru, H. (1995). “Parameter estimation for the tank model using global optimization.” Trans. JSIDRE, 178, 103–112.
Tanakamaru, H., and Burges, S. J. (1996). “Application of global optimization to parameter estimation of the tank model.” Proc., Int. Conf. on Water Resour. and Envir. Res.: Towards the 21st Century, 39–46.
van Straten, G., and Keesman, K. J. (1991). “Uncertainty propagation and speculation in projective forecasts of environmental change: A lake-eutrophication example.” J. Forecasting, 10, 163–190.
Wang, Q. J. (1991). “The genetic algorithm and its application to calibrating conceptual rainfall-runoff models.” Water Resour. Res., 27(9), 2467–2471.
Yapo, P. O., Gupta, H. V., and Sorooshian, S. (1996). “Automatic calibration of conceptual rainfall-runoff models: Sensitivity to calibration data.” J. Hydro., 181, 23–48.
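
SUPPLEMENTARY SKETCH. The evaluation recommended in the discussion of Figs. 5 and 6 (computing the four statistics separately for each water-year and for below-mean and above-mean flows, then ordering the years by wetness) could be sketched as follows. This is a minimal illustration, not the code used in the study. It assumes the commonly used forms of DRMS, PBIAS, NSE, and PME, a PBIAS sign convention in which positive values mean overestimation, and a partition threshold equal to each year's own observed mean flow. The function and variable names are hypothetical.

```python
import numpy as np

def drms(obs, sim):
    """Daily root mean square error (assumed form)."""
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def pbias(obs, sim):
    """Percent bias; positive = overestimation (sign convention assumed)."""
    return float(100.0 * np.sum(sim - obs) / np.sum(obs))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: error variance relative to flow variance."""
    return float(1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))

def pme(obs, sim, prev_obs):
    """Persistence model efficiency: model error relative to the error of
    the persistence forecast q[t] = q_obs[t-1]."""
    return float(1.0 - np.sum((sim - obs) ** 2) / np.sum((prev_obs - obs) ** 2))

def yearly_partitioned_stats(obs, sim, year):
    """Return one dict per (water-year, flow partition), ordered by the
    year's mean daily flow. Inputs are equal-length daily series."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    year = np.asarray(year)
    # Previous-day observation; undefined for the very first day.
    prev = np.concatenate(([np.nan], obs[:-1]))
    rows = []
    for y in np.unique(year):
        in_year = year == y
        annual_mean = obs[in_year].mean()
        for part, mask in (("below", obs <= annual_mean),
                           ("above", obs > annual_mean)):
            m = in_year & mask & ~np.isnan(prev)
            if m.sum() < 2:  # too few days to compute a variance
                continue
            rows.append({
                "year": int(y), "part": part,
                "annual_mean": float(annual_mean),
                "DRMS": drms(obs[m], sim[m]),
                "PBIAS": pbias(obs[m], sim[m]),
                "NSE": nse(obs[m], sim[m]),
                "PME": pme(obs[m], sim[m], prev[m]),
            })
    return sorted(rows, key=lambda r: r["annual_mean"])
```

Plotting each statistic against `annual_mean`, with separate panels for the "below" and "above" rows, would give a layout of the kind described for Fig. 6. A partition whose observed flows are all equal makes the NSE denominator zero, so real use would need to guard against that case.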

JOURNAL OF HYDROLOGIC ENGINEERING / APRIL 1999 / 143

J. Hydrol. Eng., 1999, 4(2): 135-143
