Ecological Modelling 2002
www.elsevier.com/locate/ecolmodel
Abstract
With the growth of statistical modeling in the ecological sciences, researchers are using more complex methods,
such as artificial neural networks (ANNs), to address problems associated with pattern recognition and prediction.
Although in many studies ANNs have been shown to exhibit superior predictive power compared to traditional
approaches, they have also been labeled a ‘‘black box’’ because they provide little explanatory insight into the relative
influence of the independent variables in the prediction process. This lack of explanatory power is a major concern
to ecologists since the interpretation of statistical models is desirable for gaining knowledge of the causal relationships
driving ecological phenomena. In this study, we describe a number of methods for understanding the mechanics of
ANNs (e.g. Neural Interpretation Diagram, Garson’s algorithm, sensitivity analysis). Next, we propose and
demonstrate a randomization approach for statistically assessing the importance of axon connection weights and the
contribution of input variables in the neural network. This approach provides researchers with the ability to eliminate
null-connections between neurons whose weights do not significantly influence the network output (i.e. predicted
response variable), thus facilitating the interpretation of individual and interacting contributions of the input variables
in the network. Furthermore, the randomization approach can identify variables that significantly contribute to
network predictions, thereby providing a variable selection method for ANNs. We show that by extending
randomization approaches to ANNs, the ‘‘black box’’ mechanics of ANNs can be greatly illuminated. Thus, by
coupling this new explanatory power of neural networks with their strong predictive abilities, ANNs promise to be a
valuable quantitative tool to evaluate, understand, and predict ecological phenomena. © 2002 Elsevier Science B.V.
All rights reserved.
Keywords: Connection weights; Sensitivity analysis; Neural Interpretation Diagram; Garson’s algorithm; Statistical models
* Corresponding author. Present address: Department of Biology, Graduate Degree Program in Ecology, Colorado State
University, Fort Collins, CO 80523-1878, USA. Tel.: + 1-970-491-5432; fax: + 1-970-491-0649.
E-mail addresses: olden@lamar.colostate.edu (J.D. Olden), jackson@zoo.utoronto.ca (D.A. Jackson).
0304-3800/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
PII: S0304-3800(02)00064-9
136 J.D. Olden, D.A. Jackson / Ecological Modelling 154 (2002) 135–150
eliminating null connection weights that minimally influence the predicted output, as well as provides a selection method for identifying independent variables that significantly contribute to network predictions. By using randomization protocols to partition the importance of connection weights (in terms of their magnitude and direction), researchers will be able to quantitatively assess both the individual and interactive effects of the input variables in the network prediction process, as well as evaluate the overall contributions of the variables. Using an empirical example describing the relationship between fish species richness and habitat characteristics of north-temperate lakes, we illustrate the utility of the ANN randomization test, and compare its results to two commonly used approaches: Garson's algorithm and sensitivity analysis.

2. Empirical example: fish species richness-habitat relationships in lakes

Throughout this paper we use an empirical example relating fish species richness to habitat conditions of 286 freshwater lakes located in Algonquin Provincial Park, south-central Ontario, Canada (45°50′ N, 78°20′ W). We tabulated species presence for each lake to examine relationships between fish species richness (ranging from 1 to 23) and a suite of habitat-related variables (8 in total). Predictor variables were chosen to include factors that have been shown to be related to critical habitat requirements of fish in this geographic region (Minns, 1989), including: surface area, lake volume, and total shoreline perimeter, which are correlated with habitat diversity; maximum depth, which is negatively correlated with winter dissolved-oxygen concentrations and related to thermal stratification; surface measurements (taken at depths ≤2.0 m) of pH and total dissolved solids, which provide an estimate of nutrient status and lake productivity; lake elevation, which is related to both habitat heterogeneity and colonization/extinction features of the lake; and growing degree-days, which is a surrogate for productivity.

3. Interpreting neural-network connection weights: an important caveat

We refrain from detailing the specifics of neural network optimization and design, and instead refer the reader to the extensive coverage provided in the texts by Smith (1994), Bishop (1995) and Ripley (1996), as well as articles by Ripley (1994) and Cheng and Titterington (1994). It is sufficient to say that the methods described in this paper refer to the classic family of one-hidden-layer, feed-forward neural networks trained by the backpropagation algorithm (Rumelhart et al., 1986). These neural networks are commonly used in ecological studies because they are suggested to be universal approximators of any continuous function (Hornik et al., 1989). N-fold cross-validation was used to determine the optimal network design, as it provides a nearly unbiased estimate of prediction success (Olden and Jackson, 2000). We found that a neural network with four hidden neurons exhibited good predictive power (r = 0.72 between observed and predicted species richness).

In the neural network, the connection weights between neurons are the links between the inputs and the outputs, and therefore are the links between the problem and the solution. The relative contributions of the independent variables to the predictive output of the neural network depend primarily on the magnitude and direction of the connection weights. Input variables with larger connection weights represent greater intensities of signal transfer, and therefore are more important in the prediction process compared to variables with smaller weights. Negative connection weights represent inhibitory effects on neurons (reducing the intensity of the incoming signal) and decrease the value of the predicted response, whereas positive connection weights represent excitatory effects on neurons (increasing the intensity of the incoming signal) and increase the value of the predicted response.

Given the obvious importance of connection weights in assessing the relative contributions of the independent variables, there is one topic that we believe warrants additional attention. During the optimization process, it is necessary that the network converges to the global minimum of the
fitting criterion (e.g. prediction error) rather than one of the many local minima. Connection weights in networks that have converged to a local minimum will differ from those in networks that have globally converged, thus resulting in the misinterpretation of variable contributions. Two approaches can be employed to ensure the greatest probability of network convergence to the global minimum. The first approach involves combining different local minima rather than choosing between them, for example by averaging the outputs of networks using the connection weights corresponding to different local minima (e.g. Wolpert, 1992; Perrone and Cooper, 1993; Ripley, 1995). This approach would also presumably involve averaging the contributions of input variables across the networks representing each of the local minima. The second approach employs global optimization procedures, where parameters such as learning rate, momentum or regularization are used during network training (e.g. White, 1989; Gelfand and Mitter, 1991; Ripley, 1994). The addition of learning rate (η) and momentum (α) parameters during optimization has been used in the ecological literature (e.g. Lek et al., 1996a; Mastrorillo et al., 1997a; Gozlan et al., 1999; Spitz and Lek, 1999; Olden and Jackson, 2001) because, in addition to reducing the problem of convergence to local minima, it also accelerates the optimization process. The learning rate η regulates the magnitude of changes in the weights and biases during optimization, and the momentum α mediates the contribution of the weight change in the previous iteration to the weight change in the current iteration. The values of η and α can be held constant or can vary during network optimization, although there are a number of disadvantages to holding them constant (see Bishop, 1995). Consequently, the values of both η and α are commonly modified by either increasing or decreasing them according to whether the error decreased or increased, respectively, during the previous iteration of network optimization (e.g. Hagan et al., 1996; Mastrorillo et al., 1998; Özesmi and Özesmi, 1999). In our study, we included learning rate and momentum parameters during the optimization process (defining them as a function of the error), and started the network optimization with random connection weights between −0.3 and 0.3. The variable learning rate and momentum parameters, together with the small interval of initial random weights, ensured a high probability of global network convergence and thus provided greater confidence regarding the validity of the connection weights and their interpretation.

4. Illuminating the ''black box''

4.1. Preparation of the data

Prior to neural network optimization, the data set must be transformed so that the dependent and independent variables exhibit particular distributional characteristics. The dependent variable must be converted to the range [0, 1] so that it conforms to the demands of the transfer function (a sigmoid function) used in building the neural network. This is accomplished using the formula:

rn = (yn − min(Y)) / (max(Y) − min(Y))    (1)

where rn is the converted response value for observation n, yn is the original response value for observation n, and min(Y) and max(Y) represent the minimum and maximum values, respectively, of the response variable Y. Note that the dependent variable does not have to be converted when modeling a binary response variable (e.g. species presence/absence) because its values already fall within this range.

To standardize the measurement scales of the network inputs, the independent variables are converted to z-scores (i.e. mean = 0, standard deviation = 1) using the formula:

zn = (xn − X̄) / σX    (2)

where zn is the standardized value of observation n, xn is the original value of observation n, and X̄ and σX are the mean and standard deviation of the variable X. It is essential to standardize the input variables so that the same percentage change in the weighted sum of the inputs causes a similar percentage change in the unit output. Both the dependent and independent variables of the richness-habitat data set were modified using the above formulas.
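The two transformations in Eqs. (1) and (2) can be sketched in a few lines (a minimal illustration using NumPy; the function and variable names are ours, not from the original study):

```python
import numpy as np

def scale_response(y):
    """Eq. (1): rescale the response variable to the range [0, 1]
    to match the output range of the sigmoid transfer function."""
    y = np.asarray(y, dtype=float)
    return (y - y.min()) / (y.max() - y.min())

def standardize_inputs(x):
    """Eq. (2): convert each predictor column to z-scores
    (mean = 0, standard deviation = 1)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)

# Example: species richness values spanning the observed range 1-23
richness = np.array([1, 5, 12, 23])
print(scale_response(richness))  # values now lie in [0, 1]
```

Note that `np.std` computes the population standard deviation by default; either convention works here, provided it is applied consistently to training and prediction data.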
4.2. Methods for quantifying input variable contributions in ANNs

In the following section, we detail a series of methods that are available to aid in the interpretation of connection weights and variable contributions in neural networks. These approaches have been used by ecologists and represent a set of techniques for understanding neuron connections in networks. Next, we extend a randomization approach to these methods, illustrating how connection weights and the overall influence of the input variables in the network can be assessed statistically.

4.2.1. Neural Interpretation Diagram (NID)

Recently, a number of investigators have advocated using axon connection weights to interpret predictor variable contributions in neural networks (e.g. Aoki and Komatsu, 1999; Chen and Ware, 1999). Özesmi and Özesmi (1999) proposed the Neural Interpretation Diagram (NID) for providing a visual interpretation of the connection weights among neurons, where the relative magnitude of each connection weight is represented by line thickness (i.e. thicker lines representing greater weights) and line shading represents the direction of the weight (i.e. black lines representing positive, excitator signals and gray lines representing negative, inhibitor signals). Tracking the magnitude and direction of weights between neurons enables researchers to identify individual and interacting effects of the input variables on the output. Fig. 1 illustrates the NID for the empirical example and shows the relative influence of each habitat factor in predicting fish species richness. The relationship between the inputs and outputs is determined in two steps, since there are input-hidden layer connections and hidden-output layer connections. Positive effects of input variables are depicted by positive input-hidden and positive hidden-output connection weights, or negative input-hidden and negative hidden-output connection weights. Negative effects of input variables are depicted by positive input-hidden and negative hidden-output connection weights, or negative input-hidden and positive hidden-output connection weights.

Fig. 1. NID for the neural network modeling fish species richness as a function of eight habitat variables. The thickness of the lines joining neurons is proportional to the magnitude of the connection weight, and the shade of the line indicates the direction of the interaction between neurons: black connections are positive (excitator) and gray connections are negative (inhibitor).
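The two-step sign logic for reading a NID can be stated compactly: the direction of an input's effect routed through a single hidden neuron is the sign of the product of its input-hidden and hidden-output weights. A minimal sketch (the weight values are invented for illustration):

```python
# Sign of (input-hidden weight) x (hidden-output weight) gives the
# direction of an input's effect routed through one hidden neuron:
# (+,+) or (-,-) -> positive effect; mixed signs -> negative effect.
def effect_direction(w_input_hidden, w_hidden_output):
    product = w_input_hidden * w_hidden_output
    return "positive (excitator)" if product > 0 else "negative (inhibitor)"

print(effect_direction(0.8, 0.5))    # positive (excitator)
print(effect_direction(-0.8, -0.5))  # positive (excitator)
print(effect_direction(0.8, -0.5))   # negative (inhibitor)
```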
Fig. 2. Bar plots showing the percentage relative importance of each habitat variable for predicting fish species richness based on
Garson’s algorithm (Garson, 1991). See Box 1 for sample calculations.
Fig. 3. Contribution plots from the sensitivity analysis illustrating the neural network response curves to changes in each habitat
variable with all other variables held at their 20th (· · · ·), 40th (-----), 60th (- - -) and 80th ( —) percentile.
4.2.4. Randomization test for artificial neural networks

We propose a randomization test for input-hidden-output connection weight selection in neural networks. By eliminating null-connection weights that do not differ significantly from random, we can simplify the interpretation of neural networks by reducing the number of axon pathways that have to be examined for direct and indirect (i.e. interaction) effects on the response variable, for instance when using NIDs. This objective is similar to statistical pruning techniques (e.g. asymptotic t-tests), yet does not have to conform to the assumptions of parametric and non-parametric methods, because the randomization approach empirically constructs the distribution of expected values under the null hypothesis for the test statistic (i.e. the weight connection) from the data at hand. Moreover, the randomization approach can be used as a variable selection method for ANNs by summing across input-hidden-output connection weights or calculating the relative importance (i.e. Garson's algorithm) for each input variable. This approach provides a quantitative tool for selecting statistically significant input variables for inclusion in the network, again reducing network complexity and assisting in the network interpretation. The following is the randomization protocol for testing the statistical significance of connection weights and input variables:
1. construct a number of neural networks using the original data with different initial random weights;
2. select the neural network with the best predictive performance, record the initial random connection weights used in constructing this network, and calculate and record:
   (a) input-hidden-output connection weights: the product of input-hidden and hidden-output connection weights for each input and hidden neuron (e.g. observed cA1: step 2, Box 1);
   (b) overall connection weight: the sum of the input-hidden-output connection weights for each input variable (e.g. observed c1 = cA1 + cB1);
   (c) relative importance (%) for each input variable based on Garson's algorithm (e.g. observed RI1: step 4, Box 1);
3. randomly permute the original response variable (yrandom);
4. construct a neural network using yrandom and the initial random connection weights; and
5. repeat steps (3) and (4) a large number of times (999 times in this study), each time recording 2(a), (b) and (c); e.g. randomized cA1, randomized c1, and randomized RI1.

The statistical significance of each input-hidden-output connection weight, overall connection weight and relative importance of each input variable (e.g. observed cA1, observed c1 and observed RI1) can be calculated as the proportion of randomized values (e.g. randomized cA1, randomized c1 and randomized RI1), including the observed, whose value is equal to or more extreme than the observed value. Fig. 4 illustrates the distribution of randomized input-hidden-output connection weights (for hidden neuron B), overall connection weight and relative importance of surface area for predicting species richness of lakes.

Table 1 contains the connection weight structure for the neural network and the associated p-values from the randomization tests. The results show that only a fraction of the total 32 input-hidden-output connections (i.e. 8 inputs × 4 hidden neurons) are statistically different from what would be expected based on chance alone. For instance, only six input-hidden-output connections are significant at α = 0.05. The results also show that when all connection weights are accounted for (i.e. the overall connection weight), lake size (i.e. surface area, maximum depth, volume and shoreline perimeter) and pH are positively associated with species richness, while elevation, total dissolved solids and growing degree-days are negatively associated with species richness. However, only the influences of maximum depth and shoreline perimeter are statistically significant (Table 1). Interestingly, the results from the randomization test using relative importance (derived from Garson's algorithm) differ from the results using overall connection weights. Using Garson's algorithm, surface area was the only significant factor correlated with species richness, and elevation was marginally nonsignificant (Table 1). The discrepancy between the two approaches results from the different ways that the methods use the
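The tail of the protocol, computing a p-value as the proportion of randomized statistics (observed value included) at least as extreme as the observed one, can be sketched as follows. This is a schematic only: the stand-in randomized values below are drawn from a normal distribution, whereas in the actual test each value would come from refitting the network to a permuted response, and the two-sided comparison on absolute values is our assumption.

```python
import numpy as np

def input_hidden_output_weights(W_ih, w_ho):
    """Step 2(a): product of input-hidden and hidden-output weights,
    one value per (input, hidden neuron) pair.
    W_ih: (n_inputs, n_hidden); w_ho: (n_hidden,)."""
    return W_ih * w_ho  # broadcasts w_ho across the hidden-neuron axis

def overall_connection_weights(W_ih, w_ho):
    """Step 2(b): sum of the input-hidden-output products per input."""
    return (W_ih * w_ho).sum(axis=1)

def randomization_p_value(observed, randomized):
    """Proportion of values (observed included) whose magnitude equals
    or exceeds the observed magnitude (two-sided comparison)."""
    values = np.append(np.abs(randomized), abs(observed))
    return np.mean(values >= abs(observed))

# Schematic usage with invented numbers:
rng = np.random.default_rng(0)
observed = 6.81                       # e.g. overall weight for max. depth
randomized = rng.normal(0.0, 2.0, 999)  # stand-in for 999 permuted refits
print(randomization_p_value(observed, randomized))
```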
Table 1
Axon connection weights for the neural network modeling fish species richness as a function of eight habitat variables

i  Predictor variable  Hidden neuron A  Hidden neuron B  Hidden neuron C  Hidden neuron D  Overall connection weight  Relative importance (%)
1  Area (ha)           −0.92 (0.275)    7.81 (0.003)     3.25 (0.183)     −2.71 (0.094)    7.43 (0.087)               18.27 (0.031)
2  Max. depth (m)      3.76 (0.063)     −0.60 (0.475)    6.23 (0.051)     −2.58 (0.123)    6.81 (0.002)               12.49 (0.511)
3  Volume (m³)         2.58 (0.110)     0.92 (0.236)     −1.76 (0.156)    0.80 (0.359)     2.54 (0.301)               5.94 (0.885)
4  Sh. per. (km)       −2.64 (0.102)    0.28 (0.432)     4.95 (0.053)     3.45 (0.088)     6.04 (0.021)               11.02 (0.585)
5  Elevation (m)       8.10 (0.024)     0.44 (0.458)     −2.55 (0.322)    −7.16 (0.012)    −1.17 (0.217)              17.94 (0.084)
6  TDS (mg/l)          −4.54 (0.043)    3.88 (0.138)     0.71 (0.334)     −0.49 (0.519)    −0.44 (0.433)              10.54 (0.577)
7  pH                  4.15 (0.049)     −0.67 (0.268)    1.31 (0.331)     −2.39 (0.177)    2.40 (0.067)               8.34 (0.808)
8  GDD                 4.36 (0.098)     2.49 (0.117)     −7.43 (0.005)    −1.45 (0.361)    −2.03 (0.114)              15.46 (0.194)

Each cell reports the statistic (input-hidden-output connection weight, overall connection weight, or Garson's relative importance) with its randomization p-value in parentheses. Wij represents the input-hidden-output connection weight for hidden neuron i (where i = A–D) and input variable j (where j = 1–8); the overall connection weight is the sum across hidden neurons A–D. P-values are based on 999 randomizations; values with p ≤ 0.05 are statistically significant at α = 0.05 (shown in italics in the original). Abbreviations Sh. per., TDS and GDD refer to shoreline perimeter, total dissolved solids, and growing degree-days, respectively.
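As an arithmetic check on Table 1, each overall connection weight is the sum of that predictor's four input-hidden-output weights; for surface area, −0.92 + 7.81 + 3.25 + (−2.71) = 7.43. A quick sketch using three rows of tabled values:

```python
# Input-hidden-output connection weights from Table 1 (hidden neurons A-D).
weights = {
    "Area":       [-0.92, 7.81, 3.25, -2.71],
    "Max. depth": [3.76, -0.60, 6.23, -2.58],
    "Elevation":  [8.10, 0.44, -2.55, -7.16],
}
for name, w in weights.items():
    print(f"{name}: overall connection weight = {sum(w):.2f}")
# Area: 7.43, Max. depth: 6.81, Elevation: -1.17, matching Table 1.
```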
Fig. 5. NID after non-significant input-hidden-output connection weights are eliminated using the randomization test (i.e. connection weights statistically different from zero based on α = 0.05 and α = 0.10). The thickness of the lines joining neurons is proportional to the magnitude of the connection weight, and the shade of the line indicates the direction of the interaction between neurons: black connections are positive (excitator) and gray connections are negative (inhibitor). Black input neurons indicate habitat variables that have an overall positive influence on species richness, and gray input neurons indicate an overall negative influence on species richness (based on overall connection weights).
Box 1. Garson’s algorithm for partitioning and quantifying neural network connection weights. Sample calculations shown for three
input neurons (1, 2 and 3), two hidden neurons (A and B), and one output neuron ().
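The partitioning in Box 1 can be sketched as follows; this is our paraphrase of Garson's (1991) algorithm as it is commonly implemented, not the authors' code, and the toy weights (three inputs, two hidden neurons, matching Box 1's layout) are invented:

```python
import numpy as np

def garson_relative_importance(W_ih, w_ho):
    """Garson's algorithm: partition the absolute connection weight
    magnitudes to obtain a relative importance (%) for each input.
    W_ih: (n_inputs, n_hidden) input-hidden weights
    w_ho: (n_hidden,) hidden-output weights."""
    c = np.abs(W_ih) * np.abs(w_ho)   # contribution of input i via hidden j
    r = c / c.sum(axis=0)             # share of each input within hidden j
    s = r.sum(axis=1)                 # summed shares per input
    return 100.0 * s / s.sum()        # relative importance, sums to 100

# Toy example with three inputs (rows) and two hidden neurons (columns):
W_ih = np.array([[1.0, 2.0],
                 [0.5, 1.0],
                 [1.5, 3.0]])
w_ho = np.array([0.8, -0.4])
print(garson_relative_importance(W_ih, w_ho))  # percentages summing to 100
```

Because the algorithm uses absolute values, it quantifies magnitude only; this is one reason its ranking can differ from that of the signed overall connection weights, as seen in Table 1.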
cies richness compared to high-elevation lakes with simple shorelines. The NID also identifies input variables that do not interact, for example lake volume, because this variable does not exhibit significant weights with contrasting effects at any single hidden neuron with any of the other variables.

The randomization test can also be used as a variable selection method for removing input and hidden neurons for which incoming or outgoing connection weights are not significantly different from random. For example, no significant weights originate from the maximum depth, volume and shoreline perimeter input neurons in Fig. 5 (α = 0.05), indicating that these variables do not contribute significantly to predicted values of species richness. These neurons (i.e. predictor variables) could be removed from the analysis with little loss of predictive power. Similarly, hidden neurons lacking connections to significant weights could also be removed (this does not occur in our empirical example, but see Olden and Jackson, 2001). In summary, a randomization approach to neural networks can aid greatly in the identification and interpretation of direct and indirect (i.e. interaction between input variables) contributions of input variables in ANNs. We refer the reader to Olden (2000) and Olden and Jackson (2001) for additional studies using the randomization test.

Two important components of the randomization test involved the optimization of the neural network. First, we conducted the randomization test for the product of the input-hidden and hidden-output weights rather than for each input-hidden and hidden-output connection weight separately, because the direction of the connection weights (i.e. positive or negative) can switch between different networks optimized with the same data (i.e. symmetric interchanges of weights: Ripley, 1994). For instance, the input-hidden and hidden-output weights might both be positive in one network and both negative in another, but in both cases the input variable exerts a positive influence on the response variable. To remove this problem, we examined the product of the input-hidden-output weights, because the true direction of the relationship between the input and output will be conserved. Second, optimizing the neural network several times (i.e. constructing a number of networks with the same data but different initial random weights) can result in neural networks with identical predictive performance but quite different connection weights. Therefore, if different initial random weights are used for each randomization, dissimilarities between the observed and random connection weights cannot be differentiated from differences arising solely from different initial connection weights. Using the same initial random weights for each randomized network alleviates this problem.

5. Conclusion

We reiterate the concern raised by a number of ecologists and explicit to our paper: are ANNs a black box approach for modeling ecological phenomena? In light of the synthesis provided here, we argue the answer is unequivocally no. We have reviewed a series of methods, ranging from qualitative (i.e. NIDs) to quantitative (i.e. Garson's algorithm and sensitivity analysis), for interpreting neural-network connection weights, and have demonstrated the utility of these methods for shedding light on the inner workings of neural networks. These methods provide a means for partitioning and interpreting the contribution of input variables in the neural network modeling process. In addition, we described a randomization procedure for testing the statistical significance of these contributions in terms of individual connection weights and the overall influence of each input variable. In the former case, the procedure facilitates the interpretation of direct and interacting effects of input variables on the response by removing connection weights that do not contribute significantly to the performance of the neural network. In the latter case, the randomization test assesses whether the contribution of a particular input variable to the response differs from what would be expected by chance. The randomization procedure enables the removal of null neural pathways and non-significant input variables, thereby aiding in the interpretation of the neural network by reducing its complexity. In conclusion, by coupling the explanatory insight of neural networks with their powerful predictive abilities, ANNs hold great promise in ecology as a tool to evaluate, understand, and predict ecological phenomena.

Acknowledgements

We thank Sovan Lek for his insightful comments regarding the finer points of artificial neural networks, and for providing some of the original MatLab code for this study. This manuscript was greatly improved by the comments of Nick Collins, Brian Shuter, Keith Somers and an anonymous reviewer. Funding for this research was provided by a Graduate Scholarship from the Natural Sciences and Engineering Research Council of Canada (NSERC) to J.D. Olden and an NSERC Research Grant to D.A. Jackson.
Özesmi, S.L., Özesmi, U., 1999. An artificial neural network approach to spatial habitat modelling with interspecific interaction. Ecol. Model. 116, 15–31.
Paruelo, J.M., Tomasel, F., 1997. Prediction of functional characteristics of ecosystems: a comparison of artificial neural networks and regression models. Ecol. Model. 98, 173–186.
Perrone, M.P., Cooper, L.N., 1993. When networks disagree: ensemble methods for hybrid neural networks. In: Mammone, R.J. (Ed.), Artificial Neural Networks for Speech and Vision. Chapman and Hall, London, pp. 126–147.
Ripley, B.D., 1994. Neural networks and related methods for classification. J. R. Stat. Soc. B 56, 409–456.
Ripley, B.D., 1995. Statistical ideas for selecting network architectures. In: Kappen, B., Gielen, S. (Eds.), Neural Networks: Artificial Intelligence and Industrial Applications. Springer, London, pp. 183–190.
Ripley, B.D., 1996. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge.
Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1986. Learning representations by back-propagating errors. Nature 323, 533–536.
Scardi, M., 2001. Advances in neural network modeling of phytoplankton primary production. Ecol. Model. 146, 33–46.
Scardi, M., Harding, L.W., 1999. Developing an empirical model of phytoplankton primary production: a neural network case study. Ecol. Model. 120, 213–223.
Schleiter, I.M., Borchardt, D., Wagner, R., Dapper, T., Schmidt, K.-D., Schmidt, H.-H., Werner, H., 1999. Modelling water quality, bioindication and population dynamics in lotic ecosystems using neural networks. Ecol. Model. 120, 271–286.
Smith, M., 1994. Neural Networks for Statistical Modeling. Van Nostrand Reinhold, New York.
Spitz, F., Lek, S., 1999. Environmental impact prediction using neural network modelling. An example in wildlife damage. J. Appl. Ecol. 36, 317–326.
White, H., 1989. Learning in artificial neural networks: a statistical perspective. Neural Computation 1, 425–464.
Wolpert, D.H., 1992. Stacked generalization. Neural Networks 5, 241–259.