
Environmental Modelling & Software 20 (2005) 991–1001

www.elsevier.com/locate/envsoft

Does high forecast uncertainty preclude effective decision support?
Peter Reichert*, Mark E. Borsuk
Swiss Federal Institute for Environmental Science and Technology (EAWAG), 8600 Dübendorf, Switzerland

Abstract

The uncertainty in the predictions of models for the behaviour of environmental systems is usually very large. In many cases the
widths of the predictive probability distributions for outcomes of interest are significantly larger than the differences between the
expected values of the outcomes across different policy alternatives. This seems to lead to a serious problem for model-based
decision support because policy actions appear to have an insignificant effect on variables describing their consequences, relative to
the predictive uncertainty. However, in some cases it is evident that some of the alternatives at least lead to changes in the desired
direction. A formal analysis of this situation is made based on the dependence structure of the variables of interest across different
policy alternatives. This analysis leads to the conclusion that the uncertainty in the difference of model predictions corresponding to
different policies may be significantly smaller than the uncertainty in the predictions themselves. The knowledge about the
uncertainty in this difference may be relevant information for the decision maker in addition to the information usually provided.
The conceptual development is supplemented with a presentation of convenient methods for practical implementation. These are
illustrated with a simple, didactical model for the effect of phosphorus discharge reduction alternatives on phosphorus loading to
a lake.
© 2005 Elsevier Ltd. All rights reserved.

1. Introduction

Mathematical models are useful tools for quantitatively summarizing the state of knowledge about a system and predicting its future behaviour. To provide a basis for decision making, such models are frequently used to forecast the effects of different policy alternatives. For example, in their integrated assessment of climate change, Morgan and Dowlatabadi (1996) generate globally averaged mean radiative forcing projections for three different tax scenarios (Fig. 1).

An evaluation of the implications of these projections is useful information for decisions regarding tax policy. However, it has been realized that such forecasts are even more useful for decision support if accompanied by estimates of predictive uncertainty (Reckhow, 1994; Morgan and Dowlatabadi, 1996; Borsuk et al., 2002). Many techniques are available to perform uncertainty analysis, including Taylor approximations and Monte Carlo simulation (Beck, 1987; Morgan and Henrion, 1990). For environmental systems, such analyses typically lead to the result that the uncertainty in model-based forecasts is rather large (Morgan and Dowlatabadi, 1996; Omlin et al., 2001; Reichert and Vanrolleghem, 2001). If the expected values of model predictions for system variables describing the consequences of different policy alternatives are closer together than the uncertainty range of the predictions, questions may be raised regarding the usefulness of the predictions to support a choice between the alternatives. Such situations seem to imply that the current state of knowledge about the system and/or natural variability in system behaviour preclude predictions that are sufficiently accurate to distinguish among policy alternatives.

* Corresponding author. Swiss Federal Institute for Environmental Science and Technology (EAWAG), PO Box 611, 8600 Dübendorf, Switzerland. Tel.: +41 1 823 52 81; fax: +41 1 823 53 75.
E-mail address: reichert@eawag.ch (P. Reichert).

1364-8152/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.envsoft.2004.10.005
Fig. 1. Globally averaged mean radiative forcing projections for three different tax scenarios (0 [solid], 0.5 [dotted], and 2 [dashed] $/ton C/year) (Morgan and Dowlatabadi, 1996). Axes: year (2000 to 2040) versus radiative forcing [W/m²].

To return to the example of integrated assessment of climate change mentioned above, the difference in the predicted mean values corresponding to the three tax scenarios (Fig. 1) is likely to be significantly less than the uncertainty in each prediction. For example, one source of uncertainty in such predictions is the effect of aerosols on radiative forcing (Morgan and Dowlatabadi, 1996) (Fig. 2). Uncertainty regarding the relevance and magnitude of aerosol effects would likely overwhelm the difference between the mean predictions of the three tax scenarios. This seems to imply that the model forecasts do not provide a basis for deciding among the alternatives. However, this ambiguous result of formal analysis can be in disagreement with the argument that a tax policy is useful because it at least leads to a change in the desired direction.

Support for this argument depends on the nature of the major sources of uncertainty. If lack of scientific knowledge or natural variability in external influence factors affects the predictions for different policy alternatives in a similar way, then uncertainty in the difference between the two may be significantly smaller than the uncertainty in the predictions. However, if different alternatives are affected by different sources of natural variability or if the estimation of the effects of the policies is the major source of uncertainty, then the difference itself may be highly uncertain. This raises interest in calculating the probability distribution of the difference in model predictions across different policy alternatives as an additional piece of information relevant for the decision. A consideration of the distribution of the predicted differences, together with the distribution of the actual predictions, would allow the decision maker to better assess the degree of confidence that can be placed in the outcome. This information is not usually considered explicitly in decision analysis (Berger, 1985; Pratt et al., 1995; Clemen, 1996; French and Rios Insua, 2000). Therefore, our objective is to formally describe and illustrate how this information can be derived and used in the decision process. Our current analysis is an extension of an earlier analysis presented at the 2002 iEMSs conference in Lugano, Switzerland (Reichert and Borsuk, 2002).
Fig. 2. Globally averaged mean radiative forcing projections for three different tax scenarios (0 [solid], 0.5 [dotted], and 2 [dashed] $/ton C/year) with and without consideration of aerosol effects (Morgan and Dowlatabadi, 1996). Axes: year (2000 to 2040) versus radiative forcing [W/m²]; the two groups of curves are labelled "aerosols excluded" and "aerosols included".

2. Mathematical description of the problem

In the following, we use lower case letters for real values of deterministic functions and upper case letters for random variables or random functions. Upper case letters are also used to represent model structure and measurement layout. Bold symbols represent arrays.

A model, $M$, evaluated for a given observation layout, $L$ (specifying the variables to be evaluated and temporal and spatial locations of evaluation), generates predictions that are represented by a vector of $n_L$ (number of observations) random variables, $\mathbf{Y}^L_M = (Y^L_{M,1}, \ldots, Y^L_{M,n_L})^T$. These predictions are functions of $m_M$ model inputs, $\boldsymbol{\Theta}_M = (\Theta_{M,1}, \ldots, \Theta_{M,m_M})^T$, written as

$$\mathbf{Y}^L_M(\boldsymbol{\Theta}_M). \qquad (1)$$

In accordance with the Bayesian paradigm, model inputs, including substance loadings, model parameters,
initial conditions, boundary conditions and other external influence factors, are here considered random variables, the probability distributions of which represent the scientific knowledge available about their values.

The notation of model predictions given by expression (1) is extremely general. The model could consist of either: (a) deterministic model functions

$$\mathbf{y}^L_M(\boldsymbol{\theta}_M), \qquad (2a)$$

describing the dependence of model results on input values, $\boldsymbol{\theta}_M = (\theta_{M,1}, \ldots, \theta_{M,m_M})^T$, that become random variables by substituting random variables, $\boldsymbol{\Theta}_M$, for the inputs, (b) deterministic functions with additive random disturbance

$$\mathbf{y}^L_M(\boldsymbol{\theta}_M) + \mathbf{Z}^L \qquad (2b)$$

(with $E(\mathbf{Z}^L) = 0$ and often with the additional assumptions that the $Z_i$ are iid and/or normally distributed) which become random variables plus random disturbances when considering the distributions of the model inputs, or, finally, (c) stochastic model functions

$$\mathbf{Y}^L_M(\boldsymbol{\theta}_M), \qquad (2c)$$

which are already random variables before substituting random variables, $\boldsymbol{\Theta}_M$, for the input values, $\boldsymbol{\theta}_M$, due to the stochastic nature of the model functions. The model outcomes, $\mathbf{y}^L_M = (y^L_{M,1}, \ldots, y^L_{M,n_L})^T$, can result directly from the model functions or they can be functions of model variables. Examples include substance concentrations, ecological or human health impacts, monetary values, and utilities. In expression (1) it is assumed that model results are calculated at a set of discrete locations in space and time specified by the observation layout, $L$, although results of the underlying model equations can also be continuous functions. For state–space models, expressions (1) or (2a)–(2c) combine the state equations with the observation equations.

Explicit characterization of the model in expressions (1) or (2a)–(2c) is typically given by: (i) the probability density, $f^L_{M,\mathrm{cond}}$, of the model outcomes, $\mathbf{y}^L = (y^L_1, \ldots, y^L_{n_L})^T$, for the given observation layout, conditional on the input values, $\boldsymbol{\theta}_M = (\theta_{M,1}, \ldots, \theta_{M,m_M})^T$ (the likelihood function of the model), and (ii) the probability density of the model inputs, $f_{M,\mathrm{inp}}$:

$$f^L_{M,\mathrm{cond}}(\mathbf{y}^L \mid \boldsymbol{\theta}_M), \quad f_{M,\mathrm{inp}}(\boldsymbol{\theta}_M). \qquad (3)$$

The marginal probability density of the model predictions, combining the stochasticity of the model with the uncertainty of the inputs, is then given as

$$f^L_{M,\mathrm{pred}}(\mathbf{y}^L) = \int f^L_{M,\mathrm{cond}}(\mathbf{y}^L \mid \boldsymbol{\theta}_M)\, f_{M,\mathrm{inp}}(\boldsymbol{\theta}_M)\, d\boldsymbol{\theta}_M. \qquad (4)$$

When using model predictions in decision analysis, we characterize a decision alternative, $a$, by its model structure, $M_a$, and the random input variables, $\boldsymbol{\Theta}^a_{M_a}$, or their densities, $f^a_{M_a,\mathrm{inp}}(\boldsymbol{\theta}_{M_a})$:

$$a = (M_a, \boldsymbol{\Theta}^a_{M_a}) \quad \text{or} \quad a = (M_a, f^a_{M_a,\mathrm{inp}}(\boldsymbol{\theta}_{M_a})). \qquad (5)$$

In many cases, alternatives will be described by different input distributions for the same model structure. However, as alternatives can include interventions to the system that can be described by setting variables to given values and deleting the corresponding state equations (Pearl, 2000), or construction of technical facilities, the description of which requires model extensions, different model structures may be required to describe some of the alternatives. Decisions are usually based on the expected values, $E[\mathbf{Y}^L_{M_a}(\boldsymbol{\Theta}^a_{M_a})]$, of one or more of the model results. In many cases, the model result of interest is the value of a "utility" or "loss" function (Berger, 1985; Pratt et al., 1995; Clemen, 1996; French and Rios Insua, 2000).

The practice of assessing each alternative separately is appropriate when decisions are based on expected values because the linearity of the operation of taking expectations implies that the expected value of the difference between the variables of interest for two policy alternatives, $a_0$ and $a_1$, is equal to the difference in the expected values of the results,

$$E[\mathbf{Y}^L_{M_{a_0}}(\boldsymbol{\Theta}^{a_0}_{M_{a_0}}) - \mathbf{Y}^L_{M_{a_1}}(\boldsymbol{\Theta}^{a_1}_{M_{a_1}})] = E[\mathbf{Y}^L_{M_{a_0}}(\boldsymbol{\Theta}^{a_0}_{M_{a_0}})] - E[\mathbf{Y}^L_{M_{a_1}}(\boldsymbol{\Theta}^{a_1}_{M_{a_1}})], \qquad (6)$$

irrespective of the dependence of these results. However, this simple equation involving expected values may disguise the fact that for the full distribution of the difference,

$$\mathbf{Y}^L_{M_{a_1}}(\boldsymbol{\Theta}^{a_1}_{M_{a_1}}) - \mathbf{Y}^L_{M_{a_0}}(\boldsymbol{\Theta}^{a_0}_{M_{a_0}}), \qquad (7)$$

there is no such simple formula. This distribution depends in a nontrivial way on the dependence between the two (vectors of) random variables $\mathbf{Y}^L_{M_{a_1}}(\boldsymbol{\Theta}^{a_1}_{M_{a_1}})$ and $\mathbf{Y}^L_{M_{a_0}}(\boldsymbol{\Theta}^{a_0}_{M_{a_0}})$. As discussed in Section 1, this distribution of the difference would give the decision maker useful additional information. Of specific interest would be information about the improvement that could be achieved by a decision alternative, $a_1$, compared to a baseline scenario, $a_0$. Deriving this distribution of the difference (Eq. (7)) requires specification of the joint probability density of the model inputs

$$f^{a_0,a_1}_{M_{a_0},M_{a_1},\mathrm{inp}}(\boldsymbol{\theta}_{M_{a_0}}, \boldsymbol{\theta}_{M_{a_1}}). \qquad (8)$$

Specifying the marginals for the decision alternatives (Expression (5)) is not sufficient for this purpose. Unfortunately, in most applications, only this marginal information is considered. In the following section, we will discuss how the joint density (Expression (8)) can be assessed and parameterized.
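The statement that the marginals alone do not determine the distribution of the difference can be illustrated with a minimal Monte Carlo sketch. Everything in it is a hypothetical assumption made for illustration and not part of the paper: the toy outcomes Y0 = X + E0 and Y1 = X - 1 + E1, the normal distributions and their parameters, and the function names. Both cases use identical marginals for each alternative and differ only in whether the external factor X is shared. The expected difference is -1 in both cases, while the theoretical standard deviation of the difference is about 0.71 with a shared X and about 4.3 with independent draws.

```python
import random
import statistics


def simulate_differences(shared_input, n=20000, seed=1):
    """Sample Y1 - Y0 for two hypothetical alternatives.

    Y0 = X0 + E0 and Y1 = X1 - 1 + E1, where X0, X1 ~ N(10, 3) describe an
    uncertain external factor and E0, E1 ~ N(0, 0.5) are alternative-specific.
    If shared_input is True, X1 is the same draw as X0 (fully dependent);
    otherwise X1 is drawn separately (independent).
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n):
        x0 = rng.gauss(10.0, 3.0)
        x1 = x0 if shared_input else rng.gauss(10.0, 3.0)
        y0 = x0 + rng.gauss(0.0, 0.5)
        y1 = x1 - 1.0 + rng.gauss(0.0, 0.5)
        diffs.append(y1 - y0)
    return diffs


def summarize(diffs):
    """Return mean, standard deviation and P(difference < 0)."""
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    # Improvement is defined here (an assumption) as a decrease in the outcome.
    p_improve = sum(d < 0 for d in diffs) / len(diffs)
    return mean, sd, p_improve


dep_mean, dep_sd, dep_p = summarize(simulate_differences(shared_input=True))
ind_mean, ind_sd, ind_p = summarize(simulate_differences(shared_input=False))

print(f"fully dependent: mean={dep_mean:.2f} sd={dep_sd:.2f} P(improve)={dep_p:.2f}")
print(f"independent:     mean={ind_mean:.2f} sd={ind_sd:.2f} P(improve)={ind_p:.2f}")
```

In this toy setting the two cases agree on the expected difference, as Eq. (6) requires, but they give very different answers about how confident one can be that the alternative leads to an improvement.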
3. Methodological approach

The crucial problem in calculating the probability distribution of the difference of model results for different policy alternatives (Expression (7)) is the derivation of the joint density, $f^{a_0,a_1}_{M_{a_0},M_{a_1},\mathrm{inp}}$, of the two input variable sets that characterize the alternatives (Expression (8)). It must be decided on a case-by-case basis how the available information can best be used to formulate such a distribution. However, some guidelines may be useful. As the usual procedure of uncertainty analysis (Morgan and Henrion, 1990) is to formulate the marginal distributions for the two sets of input variables, $\boldsymbol{\Theta}^{a_0}_{M_{a_0}}$ and $\boldsymbol{\Theta}^{a_1}_{M_{a_1}}$, characterizing the two decision alternatives, this is a natural starting point. The major sources of uncertainty are usually lack of knowledge and variability in system behaviour (MacIntosh et al., 1994; Hession et al., 1996; Cullen and Frey, 1999).

Once the sources of uncertainty are identified and quantified as the marginal distributions of $\boldsymbol{\Theta}^{a_0}_{M_{a_0}}$ and $\boldsymbol{\Theta}^{a_1}_{M_{a_1}}$, an analysis of the dependence structure of the sources of uncertainty should be performed. This is a difficult step, as measured data will usually not be available because the decision alternatives have not yet been implemented. This implies that in most cases the dependence structure between decision alternatives will have to be quantified based on prior knowledge about the dependence structure of uncertainty sources for these alternatives. According to their dependence, the model inputs may be divided into three classes (Fig. 3): fully dependent inputs, independent inputs, and partially dependent inputs. We discuss these classes separately.

Fig. 3. Graphical depiction of possible dependence relationships among model inputs. Round nodes indicate model input variables, and hexagonal nodes indicate model results. Intermediate model variables may exist but are not shown here. Inputs $\boldsymbol{\Theta}^{a_i}_{M_{a_i},\mathrm{fdep}}$ are assumed identical (and therefore fully dependent) for the alternatives $a_0$ and $a_1$, $\boldsymbol{\Theta}^{a_i}_{M_{a_i},\mathrm{ind}}$ are assumed to be independent, and $\boldsymbol{\Theta}^{a_i}_{M_{a_i},\mathrm{pdep}}$ are assumed to be partially dependent with degree of dependence described by Kendall correlation coefficients $\mathbf{R}^*$. The input nodes of each alternative feed the result nodes $\mathbf{Y}^L_{M_{a_1}}(\boldsymbol{\Theta}^{a_1}_{M_{a_1}})$ and $\mathbf{Y}^L_{M_{a_0}}(\boldsymbol{\Theta}^{a_0}_{M_{a_0}})$, which in turn feed the difference node $\mathbf{Y}^L_{M_{a_1}}(\boldsymbol{\Theta}^{a_1}_{M_{a_1}}) - \mathbf{Y}^L_{M_{a_0}}(\boldsymbol{\Theta}^{a_0}_{M_{a_0}})$.

3.1. Fully dependent inputs

Fully dependent inputs for several decision alternatives are usually caused by common sources of uncertainty which are not affected by the decision alternatives. This will occur if lack of knowledge about input values is exactly the same for the alternatives or if there exists a common source of natural variability. Both of these reasons typically apply to the uncertainty in external influence factors that are not affected by the decision alternatives. Additionally, common lack of knowledge can lead to perfectly dependent distributions of "internal" model parameters. For inputs fulfilling this condition, the joint distribution consists of fully dependent identical distributions for the two subsets of input variables. It is expressed as the density of those input variables for one alternative multiplied by a Dirac delta function in the difference of the variables for the two alternatives:

$$f^{a_0,a_1}_{M_{a_0},M_{a_1},\mathrm{inp},\mathrm{fdep}}(\boldsymbol{\theta}_{M_{a_0},\mathrm{fdep}}, \boldsymbol{\theta}_{M_{a_1},\mathrm{fdep}}) = f^{a_0}_{M_{a_0},\mathrm{inp},\mathrm{fdep}}(\boldsymbol{\theta}_{M_{a_0},\mathrm{fdep}})\, \delta(\boldsymbol{\theta}_{M_{a_0},\mathrm{fdep}} - \boldsymbol{\theta}_{M_{a_1},\mathrm{fdep}}), \qquad (9)$$

with the index "fdep" selecting the fully dependent input variables.

3.2. Independent inputs

Input variables that characterize completely different mechanisms for different decision alternatives may often be approximated by independent distributions of these inputs for the different alternatives. In most cases, this may be a conservative assumption with respect to the estimation of uncertainty because positive correlations between the same inputs for different policy alternatives (neglecting them results in an overestimate of the uncertainty of the difference between outcomes) are more typical than negative correlations. For inputs fulfilling this condition, the joint density can be formulated as the product of the densities of the inputs for the alternatives:

$$f^{a_0,a_1}_{M_{a_0},M_{a_1},\mathrm{inp},\mathrm{ind}}(\boldsymbol{\theta}_{M_{a_0},\mathrm{ind}}, \boldsymbol{\theta}_{M_{a_1},\mathrm{ind}}) = f^{a_0}_{M_{a_0},\mathrm{inp},\mathrm{ind}}(\boldsymbol{\theta}_{M_{a_0},\mathrm{ind}})\, f^{a_1}_{M_{a_1},\mathrm{inp},\mathrm{ind}}(\boldsymbol{\theta}_{M_{a_1},\mathrm{ind}}), \qquad (10)$$

with the index "ind" selecting the independent input variables.

3.3. Partially dependent inputs

The remaining case of partially dependent input variables is the most difficult. Fortunately, in many applications, most of the inputs can be described to a reasonable degree of approximation by one of the two classes described above. Then, if at all, only for a small subset of inputs will a more complicated probabilistic
dependence formulation need to be developed. In the following paragraphs of this section, we develop a framework that can be used to parameterize such a dependent input distribution.

The copula representation of multivariate probability distributions (Schweizer, 1991; Nelson, 1995) provides a convenient means for structuring dependent inputs. The copula approach starts from the observation that any multivariate probability density function can be written as a product of all univariate marginal densities and a copula density which describes the dependence structure. When applied to the joint distribution of the two sets of input variables, $\boldsymbol{\Theta}^{a_0}_{M_{a_0}}$ and $\boldsymbol{\Theta}^{a_1}_{M_{a_1}}$ (we can apply this formalism to the complete sets of inputs, as it includes full dependence and independence as special cases), for the two decision alternatives, this leads to the following factorization of the joint density

$$
\begin{aligned}
f^{a_0,a_1}_{M_{a_0},M_{a_1},\mathrm{inp}}(\boldsymbol{\theta}_{M_{a_0}}, \boldsymbol{\theta}_{M_{a_1}})
={}& f^{a_0}_{M_{a_0},1}(\theta_{M_{a_0},1}) \cdots f^{a_0}_{M_{a_0},m_{M_{a_0}}}(\theta_{M_{a_0},m_{M_{a_0}}}) \\
&\cdot f^{a_1}_{M_{a_1},1}(\theta_{M_{a_1},1}) \cdots f^{a_1}_{M_{a_1},m_{M_{a_1}}}(\theta_{M_{a_1},m_{M_{a_1}}}) \\
&\cdot c\big(F^{a_0}_{M_{a_0},1}(\theta_{M_{a_0},1}), \ldots, F^{a_0}_{M_{a_0},m_{M_{a_0}}}(\theta_{M_{a_0},m_{M_{a_0}}}), \\
&\qquad F^{a_1}_{M_{a_1},1}(\theta_{M_{a_1},1}), \ldots, F^{a_1}_{M_{a_1},m_{M_{a_1}}}(\theta_{M_{a_1},m_{M_{a_1}}})\big). \qquad (11a)
\end{aligned}
$$

In this equation $f^{a_j}_{M_{a_j},i}$ and $F^{a_j}_{M_{a_j},i}$ are the marginal density and cumulative distribution functions, respectively, for input $\theta_{M_{a_j},i}$ and decision alternative $a_j$, and $c$ is the copula density.

The decomposition given by Eq. (11a) can be used to construct a multivariate density based on information about marginals and correlations. This has previously been done for combining expert opinions (Jouini and Clemen, 1996; Clemen and Reilly, 1999) and for constructing flexible sampling distributions in importance sampling for Bayesian inference (Reichert et al., 2002). It is a useful approach for the current problem as well because marginals and correlations may be easier to specify than either the full joint distribution or other factorizations, such as a sequence of conditionals (Clemen et al., 2000).

For technical convenience and because a normal approximation is a natural first approximation to a distribution close to its maximum, the copula of the multivariate normal distribution is often used to specify the correlation structure for given pairwise Kendall correlation coefficients, $\mathbf{R}^*$, and marginal distributions (Clemen and Reilly, 1999; Reichert et al., 2002). This density has the form

$$
c_{N(\mathbf{R})}(\mathbf{x}) = \frac{1}{|\mathbf{R}|^{1/2}} \exp\left( -\frac{1}{2}
\begin{pmatrix} F^{-1}_{N(0,1)}(x_1) \\ \vdots \\ F^{-1}_{N(0,1)}(x_m) \end{pmatrix}^{T}
\left(\mathbf{R}^{-1} - \mathbf{I}_m\right)
\begin{pmatrix} F^{-1}_{N(0,1)}(x_1) \\ \vdots \\ F^{-1}_{N(0,1)}(x_m) \end{pmatrix}
\right) \qquad (11b)
$$

(Clemen and Reilly, 1999; Reichert et al., 2002), where $\mathbf{I}_m$ is the $m$-dimensional identity matrix, $F^{-1}_{N(0,1)}$ is the inverse cumulative distribution function of the standard normal distribution, and $\mathbf{R}$ is a correlation matrix calculated from the given Kendall correlation coefficients $\mathbf{R}^*$ according to

$$\mathbf{R}: \quad R_{i,j} = \sin\left(\frac{\pi}{2} R^*_{i,j}\right), \quad 1 \le i, j \le m \qquad (11c)$$

(Kruskal, 1958; Clemen and Reilly, 1999; Reichert et al., 2002). In our application, the dimension, $m$, is equal to the sum of the dimensions of the model input spaces for the two decision alternatives:

$$m = m_{M_{a_0}} + m_{M_{a_1}}. \qquad (12)$$

Kendall correlation coefficients are used because, for the copula approach, it is easier to construct a multivariate distribution with given Kendall correlation coefficients than with moment-based correlation coefficients. This is because Kendall correlation coefficients are not affected by the monotone transformations of individual parameters that are necessary to generate the required marginal distribution from a uniform distribution. Kendall correlation coefficients can either be calculated directly from elicited concordance probabilities (Clemen et al., 2000) or adjusted to obtain distributions with elicited Pearson correlation coefficients. In the latter case, an inversion of Eq. (11c) can lead to a first estimate valid for normal marginals. Note that Eqs. (11a) and (11b) make it possible to use any one-dimensional marginal distribution for each input variable; only the dependence structure is derived from a multivariate normal distribution.

It should be noted that the copula approach can only be used to construct the joint density from one-dimensional marginals and Kendall correlation coefficients between the one-dimensional marginals. It cannot be applied using the multivariate marginals of the two alternatives directly (Genest et al., 1995). This means that the multivariate input distribution derived for each of the decision alternatives has to be first approximated in copula form using the same copula as used for the joint distribution. Future extensions may be
possible, however, that allow for a more general approach using multivariate marginals (Cuadras, 1992; Li et al., 1996).

In some cases, a more straightforward alternative to the methods described above may be to develop a model for the difference in the outcomes of the decision alternatives directly (Reckhow, 1980). This "incremental" approach focuses on modelling only the components of the system that are expected to change as the result of a decision and ignores all other components expected to remain the same. Such an approach is useful in predicting the differences among alternatives but does not yield predictions of the actual magnitude of the outcomes. As decision makers will usually also be interested in this quantity, the procedure we describe above may often be more convenient. It leads to both the distribution of the difference in outcomes as well as the distribution of the outcomes themselves.

Once the distribution of differences of important model outcomes for different decision alternatives has been calculated, several characteristic properties of this distribution may be useful to summarize its information. Of special importance are: (i) the expected value (the calculation of which does not require the distribution of the difference), (ii) the coefficient of variation (the standard deviation divided by the mean, as a measure of relative uncertainty), and (iii) the probability of improvement (the probability that the difference between the two outcomes is greater or less than zero, depending on the definition of improvement).

4. Practical implementation

Once the joint probability density of the model inputs for the decision alternatives, $f^{a_0,a_1}_{M_{a_0},M_{a_1},\mathrm{inp}}$, has been derived using the methods described in the previous section, the distribution of model results, $\mathbf{Y}^L_{M_{a_1}}(\boldsymbol{\Theta}^{a_1}_{M_{a_1}})$ and $\mathbf{Y}^L_{M_{a_0}}(\boldsymbol{\Theta}^{a_0}_{M_{a_0}})$, and of their differences, $\mathbf{Y}^L_{M_{a_1}}(\boldsymbol{\Theta}^{a_1}_{M_{a_1}}) - \mathbf{Y}^L_{M_{a_0}}(\boldsymbol{\Theta}^{a_0}_{M_{a_0}})$, can be calculated straightforwardly by Monte Carlo simulation (Rubinstein, 1981; Morgan and Henrion, 1990). Nevertheless, it may be worth elaborating on some special cases of this procedure.

When it is a reasonable approximation to represent one subset of model inputs as fully dependent across decision alternatives (Eq. (9)) and another as fully independent (Eq. (10)), we can use the same Monte Carlo sample for the dependent inputs and separate samples for the independent inputs (see Fig. 3). This technique has some similarity to the "two-stage" Monte Carlo method often used in probabilistic risk assessments to separate sources of uncertainty due to variability from those due to lack of knowledge (MacIntosh et al., 1994; Hession et al., 1996; Frey and Rhodes, 1998; Cullen and Frey, 1999; Kelly and Campbell, 2000). However, in our application, we are not interested in separating the components of uncertainty along these lines, but rather in distinguishing the components that are common or different across decision alternatives.

For situations in which the assumption of full dependence or full independence of inputs is not appropriate, the copula technique provides a convenient means for parameterizing the joint distribution. The method provides high flexibility regarding distributional shape, and a number of efficient sampling techniques can easily be applied (Reichert et al., 2002). A program package implementing these sampling techniques for distributions covering many typically applied marginals with a normal copula is publicly available (http://www.uncsim.eawag.ch).

Once samples of model inputs are drawn according to the specified dependence assumptions, each set is then used to generate model predictions as well as to calculate the difference between the predictions for different decision alternatives (see Fig. 3). This difference calculation implicitly accounts for the dependence assumptions employed in generating the input samples. The empirical distribution of this set of difference values is then used to approximate the actual distribution of the differences. Estimates of the characteristic properties of this distribution (e.g., the expected value, the coefficient of variation, and the probability of improvement) can easily be calculated from this sample.

5. Illustration with a phosphorus loading model

In this section, the ideas outlined above are illustrated with a simple, didactical model of phosphorus loading to a lake.

5.1. Model description

The total yearly phosphorus loading to a lake can be described by the phosphorus loading model, $M_P$,

$$Y_{M_P} = L_{\mathrm{base}} + L_{\mathrm{wwtp}} + L_{\mathrm{storm}} = L_{\mathrm{base}} + L_{\mathrm{wwtp}} + \sum_{i=1}^{n_{\mathrm{storm}}} l_{\mathrm{storm},i}, \qquad (13)$$

where $Y_{M_P}$ is the total phosphorus loading, $L_{\mathrm{wwtp}}$ is loading from wastewater treatment plants, $L_{\mathrm{base}}$ is the base loading associated with discharge from the catchment during typical and dry weather conditions, $L_{\mathrm{storm}}$ is loading associated with severe storm events, $l_{\mathrm{storm},i}$ is loading associated with the severe storm event $i$, and $n_{\mathrm{storm}}$ is the number of storm events in the year.

To make this phosphorus loading model suitable for illustration of our ideas, we make the following assumptions: (i) the loading contributions given in Eq. (13) are independent of each other; (ii) the base loading is distributed lognormally with mean $\mu_{\mathrm{base}}$ and standard
deviation $\sigma_{\mathrm{base}}$; (iii) treated wastewater loading is distributed lognormally with mean $\mu_{\mathrm{wwtp}}$ and standard deviation $\sigma_{\mathrm{wwtp}}$; (iv) the number of severe storm events during each year, $n_{\mathrm{storm}}$, is Poisson distributed with mean $\lambda$; (v) each severe storm event leads to flushing of phosphorus into the lake with a loading, $l_{\mathrm{storm},i}$, distributed lognormally with mean $\mu_{\mathrm{storm}}$ and standard deviation $\sigma_{\mathrm{storm}}$; and (vi) the loadings resulting from sequential severe storm events are assumed to be independent.

5.2. Policy alternatives

We investigate two policy alternatives to the baseline scenario, $a_0$, described above, which we will refer to as $a_1$ and $a_2$.

The first alternative, $a_1$, consists of improving the phosphorus removal procedure in the wastewater treatment plants. This leads to reduced phosphorus loading such that the new contribution is lognormally distributed with a reduced mean value, $\mu^{a_1}_{\mathrm{wwtp}}$, and the original standard deviation, $\sigma_{\mathrm{wwtp}}$.

The second alternative, $a_2$, consists of the implementation of new agricultural practices that reduce the average input of phosphorus into surface waters during severe storm events. The new input per severe storm event is lognormally distributed with a reduced mean, $\mu^{a_2}_{\mathrm{storm}}$, and the original standard deviation, $\sigma_{\mathrm{storm}}$. It is assumed that the change in agricultural practices does not affect the base loading. Values of the seven parameters of the loading distributions for each of the alternatives are given in Table 1.

Table 1
Values of the parameters describing the loading distributions for the baseline scenario and two policy alternatives

Alternative   $\mu_{\mathrm{base}}$ (t)   $\sigma_{\mathrm{base}}$ (t)   $\mu_{\mathrm{wwtp}}$ (t)   $\sigma_{\mathrm{wwtp}}$ (t)   $\mu_{\mathrm{storm}}$ (t)   $\sigma_{\mathrm{storm}}$ (t)   $\lambda$ (events)
$a_0$         4     1     4     0.5     4     1     2
$a_1$         4     1     2     0.5     4     1     2
$a_2$         4     1     4     0.5     3     1     2

5.3. Specifying dependence structure and implementation

To estimate the distribution of the differences in phosphorus loading across alternatives, we must formulate the joint probability densities $f^{a_1,a_0}_{M_P,M_P,\mathrm{inp}}(\boldsymbol{\theta}_{M_P}, \boldsymbol{\theta}_{M_P})$ and $f^{a_2,a_0}_{M_P,M_P,\mathrm{inp}}(\boldsymbol{\theta}_{M_P}, \boldsymbol{\theta}_{M_P})$. These joint distributions should have the marginals given in the description of the alternatives, above. The problem then focuses on assessing the dependence structure.

Upgrading the treatment plants for the alternative $a_1$ does not influence the base loading from the catchment or loading due to severe storm events. This implies that the distributions of $L^{a_0}_{\mathrm{base}}$ and $L^{a_1}_{\mathrm{base}}$, as well as the distributions of $L^{a_0}_{\mathrm{storm}}$ and $L^{a_1}_{\mathrm{storm}}$, represent additive contributions that are identical for alternatives $a_0$ and $a_1$. Therefore, according to Eq. (9), these contributions do not contribute to the difference $Y_{M_P}(\boldsymbol{\Theta}^{a_1}_{M_P}) - Y_{M_P}(\boldsymbol{\Theta}^{a_0}_{M_P})$. (Because we are interested in calculating the difference between the alternatives in the same year, we can ignore year-to-year variability of common sources in the loadings.)

The dependence of the distributions $L^{a_0}_{\mathrm{wwtp}}$ and $L^{a_1}_{\mathrm{wwtp}}$ may be considerably more complicated. It depends on the technology applied for upgrading the wastewater treatment plant and how similarly the phosphorus discharge for the two alternatives depends on external influence factors. To avoid a detailed discussion of this technology but still use a parameterization that allows consideration of different degrees of dependence, we describe the joint distribution by a copula distribution according to the copula equations of Section 3.3, with the marginal distributions described in Section 5.1 and multiple representative values for the Kendall correlation coefficient, $R^*_{a_1}$.

According to our assumptions, the alternative $a_2$ does not influence either base loading or discharge from wastewater treatment plants. This implies that the distributions of $L^{a_0}_{\mathrm{base}}$ and $L^{a_2}_{\mathrm{base}}$, as well as the distributions of $L^{a_0}_{\mathrm{wwtp}}$ and $L^{a_2}_{\mathrm{wwtp}}$, represent additive contributions that are identical for alternatives $a_0$ and $a_2$. Therefore, according to Eq. (9), these contributions do not contribute to the difference $Y_{M_P}(\boldsymbol{\Theta}^{a_2}_{M_P}) - Y_{M_P}(\boldsymbol{\Theta}^{a_0}_{M_P})$. The number of severe storm events in a certain year is also clearly not influenced by a change in agricultural practices. However, some dependence may need to be assumed between the distributions of phosphorus loading during each severe storm event under the two alternatives. Again, to keep our example straightforward, we parameterize the distribution of loading per severe storm event by a copula distribution based on the marginals for both alternatives described in Section 5.1 and multiple representative values for the Kendall correlation coefficient, $R^*_{a_2}$.

The probability distributions of load components were sampled using the routine randsamp of the program package UNCSIM (http://www.uncsim.eawag.ch). This routine allows the user to sample from distributions described by the copula formulation of Section 3.3. The fully dependent parameters were described using a copula distribution with the corresponding Kendall correlation coefficients set to unity.

5.4. Results

The conditions of our example problem lead to marginal distributions of the total load, $Y_{M_P}$, for the different policy alternatives that consist of sums of independent lognormal random variables for base
loading, wastewater treatment plant loading, and loading due to severe storm events. In this case, the alternatives are described by different distributions of the inputs for the same phosphorus loading model, $M_P$. According to our parameterization, the expected phosphorus loading to the lake for any alternative is equal to

$$E(Y_{M_P}) = \mu_{\mathrm{base}} + \mu_{\mathrm{wwtp}} + \lambda\,\mu_{\mathrm{storm}}. \tag{14}$$

Because each decision alternative only affects one phosphorus loading source, the expected value of the difference in phosphorus loading between each of the alternatives and the baseline scenario is given by

$$E\big(Y_{M_P}(\Theta^{a_1}_{M_P}) - Y_{M_P}(\Theta^{a_0}_{M_P})\big) = \mu^{a_1}_{\mathrm{wwtp}} - \mu_{\mathrm{wwtp}} \tag{15a}$$

and

$$E\big(Y_{M_P}(\Theta^{a_2}_{M_P}) - Y_{M_P}(\Theta^{a_0}_{M_P})\big) = \lambda\,\big(\mu^{a_2}_{\mathrm{storm}} - \mu_{\mathrm{storm}}\big), \tag{15b}$$

respectively. According to Eq. (14) the expected value of total phosphorus loading for the baseline scenario is 16 t. The expected reduction in phosphorus loading for both alternatives, a1 and a2, relative to the baseline scenario, a0, is equal to 2 t (Eqs. (15a) and (15b)).

Uncertainty in the prediction of total phosphorus loading is very large for all three alternatives (Fig. 4). Most of this uncertainty is due to the hydrologic variability associated with the uncertain number of severe storm events during a particular year. Peaks at the lower end of the distributions are caused by the occurrence of zero, one, or two severe storm events per year. For larger numbers of events, the peaks overlap more strongly so that they are no longer clearly distinguishable in the density functions. The distributions demonstrate that the average reduction of 2 t achieved by each of the policy alternatives is overwhelmed by the width of the distributions. This is especially the case for the agricultural policy alternative because it does not lead to a significant change in the lower tail of the distribution, in which no severe storm events occur. Of course, an alternative that only reduces phosphorus loading during severe storm events does not lead to a change in loading for these special cases.

Fig. 4. Phosphorus loading for the baseline scenario, a0 (solid), for the wastewater phosphorus removal alternative, a1 (dashed), and for the agricultural policy alternative, a2 (dotted). (Axes: phosphorus loading [t], horizontal; probability density [1/t], vertical.)

The distribution of the differences in phosphorus loading to the lake between the policy alternative a1 (phosphorus elimination in the wastewater treatment plant) and the baseline scenario, a0, depends highly on the dependence assumptions employed (Fig. 5). If all components of the load for both alternatives are considered independent, then the uncertainty in the difference is very large (Fig. 5, dotted line). However, if the more realistic assumption of perfect dependence among sources other than the wastewater treatment plant is included, then the uncertainty in the load reduction predicted to result from alternative a1 is greatly reduced (Fig. 5, solid and dashed lines). In fact the probability of improvement associated with a1 is nearly 100%, regardless of the assumed value of the Kendall correlation coefficient, $R^*_{a_1}$, describing the dependence of the distributions of the loading from the wastewater treatment plant across the two alternatives (Table 2). Nevertheless, assuming a greater dependence across the alternatives further reduces the prediction uncertainty for the resulting reduction.

Fig. 5. Distribution of the difference in phosphorus loading between the phosphorus removal alternative in the wastewater treatment plant, a1, and the baseline scenario, a0, for two values of the Kendall correlation coefficient describing the dependence of the distributions of the discharge from the wastewater treatment plant, $R^*_{a_1}$: $R^*_{a_1} = 0$ (solid) and $R^*_{a_1} = 0.8$ (dashed). For comparative purposes, the distribution of the differences under the assumption of full independence of the load distributions for alternatives a1 and a0 is also shown (dotted). (Axes: difference in phosphorus loading [t], horizontal; probability density [1/t], vertical.)

The distribution of the differences in phosphorus loading to the lake between the policy alternative a2 (change in agricultural practices) and the baseline scenario, a0, also depends on the dependence assumptions (Fig. 6). It is evident that consideration of the dependence of the loading sources other than the input during severe storm events reduces the prediction uncertainty for the reduction in phosphorus loading
considerably. Once this dependence is considered, the assumption of independence of the distributions of the discharge during severe storm events ($R^*_{a_2} = 0$) represents the most conservative realistic estimate of the uncertainty in the resulting load reduction. A higher value for $R^*_{a_2}$ leads to a tighter distribution for the difference and a resulting higher probability of improvement (Table 2). For this decision alternative, the probability of improvement cannot exceed 86%, because only phosphorus loading during storm events is affected and the probability of no storm event in a given year is approximately 14%. (This result also causes the peak in the density function at a value of zero. As this is a probability for a discrete value, the results shown in Fig. 6 might be more appropriately represented as a density function without this peak representing 86% of the cases and a discrete probability of 14% assigned to a value of zero. However, for ease of graphical presentation, we maintain the form of Fig. 6.)

Table 2
Estimates for the expected value (EV), coefficient of variation (CV) and probability of improvement (reduction) (p(Impr.)) of the difference in phosphorus loading for all six cases shown in Figs. 5 and 6

             Kendall corr.                                                    P load reduction (t)
Alternative  $L_{\mathrm{base}}$  $L_{\mathrm{wwtp}}$  $l_{\mathrm{storm}}$  EV  CV    p(Impr.)
a1           0                    0                    0                     2   4.3   0.60
a1           1                    0                    1                     2   0.36  0.99
a1           1                    0.8                  1                     2   0.16  1.00
a2           0                    0                    0                     2   3.9   0.60
a2           1                    1                    0                     2   1.2   0.73
a2           1                    1                    0.8                   2   0.74  0.86

The Kendall correlation coefficients represent correlation of the variable given in the column header for the alternative given in the left column with the same variable under the base scenario.

Fig. 6. Distribution of the difference in phosphorus loading between the agricultural policy alternative, a2, and the baseline scenario, a0, for two values of the Kendall correlation coefficient describing the dependence of the distributions of the discharge during storm events, $R^*_{a_2}$: $R^*_{a_2} = 0$ (solid) and $R^*_{a_2} = 0.8$ (dashed). For comparative purposes, the distribution of the differences under the assumption of full independence of the load distributions for alternatives a2 and a0 is also shown (dotted). (Axes: difference in phosphorus loading [t], horizontal; probability density [1/t], vertical.)

6. Discussion

The results of our illustrative example show that, under certain conditions, the distributions of the differences in phosphorus loading across policy alternatives are much tighter than those of the actual predictions. This occurs because: (i) some external influence factors affect all alternatives in the same way, and (ii) the total load is composed of three independent sources, only one of which is affected by each of the policy options. The first reason implies the same number of severe storm events in any particular year in which the alternatives are compared. The second reason implies that only source differences affected by the alternative need to be considered (other sources would lead to identical incremental loadings for the baseline and policy scenarios in any particular year).

The usefulness of our suggested approach to a particular decision situation depends on the objectives of the decision maker. For instance, in our example, if the major management objective is to achieve a phosphorus load reduction to the lake in any particular year, then it is important to correctly calculate the distributions of differences across alternatives. These represent the total uncertainty in achieving a phosphorus load reduction, given the underlying assumptions. It is not necessary in this case to distinguish between uncertainty due to year-to-year variability and uncertainty due to lack of scientific knowledge, for this distinction does not lead to relevant information concerning the objectives of the decision maker. The important point is that the distribution of the predicted improvement resulting from an alternative may be much narrower when dependence among alternatives is properly considered; knowing the nature of the uncertainty comprising that dependence is not necessary. The practical result for our example is that the confidence one can have in achieving a beneficial impact on phosphorus loading increases significantly by virtue of our approach. This is particularly the case for the wastewater treatment option, which has a nearly negligible risk of non-reductions.

If the major objective of the decision maker in our example is to reduce the cumulative phosphorus loading over many years, then the results of our detailed analysis of differences for a single year may be less important. In such a situation, the distributions of actual results (Fig. 4) and the expected values of the predicted reductions (Table 2) may provide sufficient information to decide among the alternatives, especially if there is a large difference in the effectiveness of the alternatives. However, when the reductions expected to result from the alternatives are of similar magnitude, additional insight
for the decision might be gained by distinguishing between uncertainty due to year-to-year variability and uncertainty due to lack of scientific knowledge. This can be done by factoring the joint distribution of model inputs and leads to conditional distributions, conditioning on either the "variability parameters" or "lack of knowledge parameters". The two-stage Monte Carlo approach cited earlier (MacIntosh et al., 1994; Hession et al., 1996) is one method for accomplishing this factorization.

For our example, such an analysis of uncertainty would likely lead to the insight that year-to-year variability in inputs is the dominant component. This implies that, in the long run and for systems that integrate the load, there is not much difference between the distributions shown by the dotted, dashed, and solid lines in Figs. 5 and 6. In our example, all have the same expected value for reduction, and the influence of fluctuations will decrease by integration. This argument might be made more precise by defining the management objective in terms of the long-term annual phosphorus load, rather than the load observed in any given year (as represented by the distribution in Fig. 4). Such a quantity can be predicted with significantly less uncertainty whether this prediction is calculated as the statistical mean of individual annual predictions or is produced by a model developed to directly predict the integral of phosphorus loadings over many years.

If, on the other hand, lack of scientific knowledge were found to be the major cause of uncertainty in our example, then, while the expected improvements resulting from each alternative would still be equal, our confidence in achieving an improvement for each alternative may differ greatly, depending on the dependence assumptions we employ (see Figs. 5 and 6 and Table 2). Thus, if knowledge uncertainty is the dominant source of uncertainty and we assume that public decision makers would prefer to avoid the risk of an ineffective policy, then in our hypothetical example the wastewater treatment option should be preferred. This is because, despite having an expected value for phosphorus reduction equal to the agricultural option, the wastewater option has a higher probability of improvement (see Table 2).

Regardless of the objective of the decision maker, it is important to note that the narrowness of the distributions shown by the solid and dashed lines in Figs. 5 and 6 does not imply that the effects of a policy will be easily detected upon implementation. The difference between any two alternatives in the same year is not an observable quantity (a policy is either implemented or not). This means that the actual predicted result is described by the wide distribution of the variable of interest (Fig. 4), even when the distribution of the difference relative to the baseline scenario is very narrow. Thus, detecting an improvement resulting from a policy action may still be very difficult.

Of course, for our simple example, many of the results described above are rather obvious (indeed, this is one reason why this example was chosen). Clearly, reductions in wastewater phosphorus inputs will lead to a reduction in total loads under nearly all conceivable situations. Reductions in inputs associated with high rainfall events, on the other hand, may be less certain depending on the frequency of those events. Our results support this intuition, but employing more naive assumptions of independence in the sources of uncertainty across scenarios (i.e. the dotted lines in Figs. 5 and 6) would have obscured this fact. Unfortunately, such assumptions are often implicit in the interpretation of predictive analyses. Depending on the criteria used for the decision, such implications may be quite misleading.

7. Conclusions

Our theoretical argumentation as well as a didactical example demonstrates that it may be worth studying the distribution of the difference in a variable of relevance across different policy alternatives, especially in cases in which the forecast uncertainty is larger than the difference between expected outcomes of different policy alternatives. This is not usually part of decision theory, but may lead to additional insight. Furthermore, we present methods for such an analysis which, while demonstrated on a simple example, are equally suitable for more complex models.

Acknowledgments

We thank John Norton and two anonymous reviewers for their suggestions for improving the manuscript.

References

Beck, M.B., 1987. Water quality modeling: a review of the analysis of uncertainty. Water Resources Research 23 (8), 1393–1442.
Berger, J.O., 1985. Statistical Decision Theory and Bayesian Analysis, second ed. Springer, New York.
Borsuk, M.E., Stow, C.A., Reckhow, K.H., 2002. Predicting the frequency of water quality standard violations: a probabilistic approach for TMDL development. Environmental Science & Technology 36 (10), 2109–2115.
Clemen, R.T., 1996. Making Hard Decisions, second ed. PWS-Kent, Boston.
Clemen, R.T., Fischer, G.W., Winkler, R.L., 2000. Assessing dependence: some experimental results. Management Science 46 (8), 1100–1115.
Clemen, R.T., Reilly, T., 1999. Correlations and copulas for decision and risk analysis. Management Science 45 (2), 208–224.
Cuadras, C.M., 1992. Probability distributions with given multivariate marginals and given dependence structure. Journal of Multivariate Analysis 42, 51–66.
Cullen, A.C., Frey, H.C., 1999. Probabilistic Techniques in Exposure Assessment, A Handbook for Dealing with Variability and Uncertainty in Models and Inputs. Plenum Press, New York.
French, S., Rios Insua, D., 2000. Statistical Decision Theory. Arnold, London, Kendall’s Library of Statistics 9.
Frey, H.C., Rhodes, D.S., 1998. Characterization and simulation of uncertain frequency distributions: effects of distribution choice, variability, uncertainty, and parameter dependence. Human and Ecological Risk Assessment 4 (2), 423–468.
Genest, C., Quesada Molina, J.J., Rodriguez Lallena, J.A., 1995. On the impossibility of constructing distributions with fixed multivariate marginals using copulas. Comptes Rendus de l’Académie des Sciences, Serie I: Mathématique 320, 723–726.
Hession, W.C., Storm, D.E., Haan, C.T., Reckhow, K.H., Smolen, M.D., 1996. Risk analysis of total maximum daily loads in an uncertain environment using EUTROMOD. Journal of Lake and Reservoir Management 12 (3), 331–347.
Jouini, M.N., Clemen, R.T., 1996. Copula models for aggregating expert opinions. Operations Research 44 (3), 444–457.
Kelly, E.J., Campbell, K., 2000. Separating variability and uncertainty in environmental risk assessment – making choices. Human and Ecological Risk Assessment 6 (1), 1–13.
Kruskal, W.H., 1958. Ordinal measures of association. Journal of the American Statistical Association 53, 814–861.
Li, H., Scarsini, M., Shaked, M., 1996. Linkages: A tool for the construction of multivariate distributions with given nonoverlapping multivariate marginals. Journal of Multivariate Analysis 56, 20–41.
MacIntosh, D.L., Suter II, G.W., Hoffman, F.O., 1994. Uses of probabilistic exposure models in ecological risk assessments of contaminated sites. Risk Analysis 14 (4), 405–419.
Morgan, G.M., Henrion, M., 1990. Uncertainty – A Guide to Dealing with Uncertainty in Quantitative Risk and Policy Analysis. Cambridge University Press.
Morgan, M.G., Dowlatabadi, H., 1996. Learning from integrated assessment of climate change. Climatic Change 34, 337–368.
Nelson, R., 1995. Copulas, characterization, correlation, and counterexamples. Mathematics Magazine 68 (3), 193–198.
Omlin, M., Brun, P., Reichert, P., 2001. Biogeochemical model of Lake Zürich: Sensitivity, identifiability and uncertainty analysis. Ecological Modelling 141, 105–123.
Pearl, J., 2000. Causality. Cambridge University Press, Cambridge, UK.
Pratt, J.W., Raiffa, H., Schlaifer, R., 1995. Introduction to Statistical Decision Theory. MIT Press, Cambridge.
Reckhow, K.H., 1980. An incremental phosphorus loading change approach for prediction error reduction. International Symposium on Inland Waters and Lake Restoration. Office of Water Regulations and Standards, US EPA, Portland, ME, USA.
Reckhow, K.H., 1994. Importance of scientific uncertainty in decision-making. Environmental Management 18, 161–166.
Reichert, P., Borsuk, M.E., 2002. Uncertainty in model predictions: does it preclude effective decision support? In: Rizzoli, A.E., Jakeman, A.J. (Eds.), Integrated Assessment and Decision Support – Proceedings of the first biennial meeting of the International Environmental Modelling and Software Society (iEMSs), vol. 2. iEMSs, pp. 43–48.
Reichert, P., Schervish, M., Small, M.J., 2002. An efficient sampling technique for Bayesian inference. Technometrics 44 (4), 318–327.
Reichert, P., Vanrolleghem, P., 2001. Identifiability and uncertainty analysis of the river water quality model no. 1 (RWQM1). Water Science and Technology 43 (7), 329–338.
Rubinstein, R.Y., 1981. Simulation and the Monte Carlo Method. John Wiley & Sons, New York.
Schweizer, B., 1991. Thirty years of copulas. In: Dall’Aglio, G., Kotz, S., Salinetti, G. (Eds.), Advances of Probability Distributions with Given Marginals. Kluwer Academic Publishers, Dordrecht, pp. 13–50.
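
The effect of the dependence assumptions discussed in Sections 5.3 and 5.4 can be explored with a short Monte Carlo experiment for the agricultural alternative, a2. The sketch below is an illustration only and is not the UNCSIM routine randsamp: it uses Python's standard library and rests on several assumptions that are not stated in this part of the text, namely that the standard deviations in Table 1 refer to the loads themselves rather than to their logarithms, and that the number of severe storm events per year is Poisson distributed with mean $\lambda$. It covers only two limiting cases: full independence of all load components across alternatives, and shared base load, treatment plant load, and storm count with independent per-event loads (the case $R^*_{a_2} = 0$). The names TABLE_1, a2_minus_a0, and summarize are hypothetical helpers introduced for this example.

```python
import math
import random

# Table 1 values for the baseline (a0) and the agricultural alternative (a2).
# Means (m_*) and standard deviations (s_*) are in t; lam is the expected
# number of severe storm events per year (assumed Poisson here).
TABLE_1 = {
    "a0": {"m_base": 4.0, "s_base": 1.0, "m_wwtp": 4.0, "s_wwtp": 0.5,
           "m_storm": 4.0, "s_storm": 1.0, "lam": 2.0},
    "a2": {"m_base": 4.0, "s_base": 1.0, "m_wwtp": 4.0, "s_wwtp": 0.5,
           "m_storm": 3.0, "s_storm": 1.0, "lam": 2.0},
}


def draw_lognormal(rng, mean, sd):
    """Lognormal draw parameterized by its (natural-scale) mean and sd."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return rng.lognormvariate(math.log(mean) - 0.5 * sigma2, math.sqrt(sigma2))


def draw_poisson(rng, lam):
    """Poisson draw by Knuth's multiplication method (fine for small lam)."""
    limit, count, product = math.exp(-lam), 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def storm_load(rng, p, n_events):
    """Total load from n_events independent severe storm events."""
    return sum(draw_lognormal(rng, p["m_storm"], p["s_storm"])
               for _ in range(n_events))


def a2_minus_a0(shared, n=20000, seed=1):
    """Sample the annual load difference a2 - a0.

    shared=False: every load component is drawn independently per alternative.
    shared=True: base load, treatment plant load and the number of storm
    events are common to both alternatives; per-event loads stay independent.
    """
    rng = random.Random(seed)
    p0, p2 = TABLE_1["a0"], TABLE_1["a2"]
    diffs = []
    for _ in range(n):
        base0 = draw_lognormal(rng, p0["m_base"], p0["s_base"])
        wwtp0 = draw_lognormal(rng, p0["m_wwtp"], p0["s_wwtp"])
        n0 = draw_poisson(rng, p0["lam"])
        if shared:
            base2, wwtp2, n2 = base0, wwtp0, n0
        else:
            base2 = draw_lognormal(rng, p2["m_base"], p2["s_base"])
            wwtp2 = draw_lognormal(rng, p2["m_wwtp"], p2["s_wwtp"])
            n2 = draw_poisson(rng, p2["lam"])
        y0 = base0 + wwtp0 + storm_load(rng, p0, n0)
        y2 = base2 + wwtp2 + storm_load(rng, p2, n2)
        diffs.append(y2 - y0)
    return diffs


def summarize(diffs):
    """Return (mean, standard deviation, fraction of years with a reduction)."""
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    return mean, sd, sum(d < 0 for d in diffs) / n


if __name__ == "__main__":
    for label, shared in (("independent", False), ("shared components", True)):
        mean, sd, p_impr = summarize(a2_minus_a0(shared))
        print(f"{label}: mean={mean:.2f} t, sd={sd:.2f} t, p(Impr.)={p_impr:.2f}")
```

Under these assumptions both cases have an expected difference close to -2 t, while the spread of the difference is much smaller when the unaffected components are shared. The estimated probabilities of improvement can be compared with the first two a2 rows of Table 2 (0.60 and 0.73), up to Monte Carlo error; in the shared case the probability cannot exceed the probability that at least one storm event occurs.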
