
RIVER RESEARCH AND APPLICATIONS

River Res. Applic. 22: 503–523 (2006)


Published online 21 March 2006 in Wiley InterScience
(www.interscience.wiley.com). DOI: 10.1002/rra.918

A REVIEW OF STATISTICAL METHODS FOR THE EVALUATION OF AQUATIC HABITAT SUITABILITY FOR INSTREAM FLOW ASSESSMENT

BEHROUZ AHMADI-NEDUSHAN,a* ANDRÉ ST-HILAIRE,a MICHEL BÉRUBÉ,b ÉLAINE ROBICHAUD,b NATHALIE THIÉMONGEb and BERNARD BOBÉEa
a Chair in Statistical Hydrology, INRS-ETE, Université du Québec, 490 de la Couronne, Québec, G1K 9A9, Canada
b Hydro-Québec, 855 Ste-Catherine Street East, Montreal, Québec, H2L 1A4, Canada

ABSTRACT
Habitat models serve three main purposes: first, to predict species occurrences on the basis of abiotic and biotic variables;
second, to improve the understanding of species-habitat relationships; and third, to quantify habitat requirements. The use of
statistical models to predict the likely occurrence or distribution of species based on relevant variables is becoming an
increasingly important tool in conservation planning and wildlife management. This article aims to provide an overview of the current
status of development and application of statistical methodologies for analysing the species-environment association, with a
clear emphasis on aquatic habitat. It describes the main types of univariate and multivariate techniques available for analysis of
species-environment association, and specifically focuses on the assessment of the strengths and weaknesses of the available
statistical methods to estimate habitat suitability. A second objective of this article is to propose new approaches using existing
statistical methods. A wide array of statistical habitat models has been developed to analyse habitat-species relationships.
Generally, physical habitat is dependent on more than one variable (e.g. depth, velocity, substrate, cover) and several suitability
indices must be combined to define a composite index. Multivariate approaches are more appropriate for the analysis of aquatic
habitat as they inherently consider the interrelation and correlation structure of the environmental variables. Ordinary multiple
linear regression and logistic regression are popular methods often used for modelling species and their relationships with
the environment. Ridge regression and principal component regression are particularly useful when the independent variables are
highly correlated. More recent regression modelling paradigms like generalized linear models (GLMs) present advantages in
dealing with non-normal environmental variables. Generalized additive models (GAMs) and artificial neural networks are better
suited for analysis of non-linear relationships between species distribution and environmental variables. The fuzzy logic
approach presents advantages in dealing with uncertainties that often exist in habitat modelling. Appropriate methods for
analysis of multi-species data are also presented. Finally, the few existing comparative studies for predictive modelling are
reviewed, and advantages and disadvantages of different methods are discussed. Copyright © 2006 John Wiley & Sons, Ltd.
key words: instream habitat; habitat suitability; streams; statistical methods; habitat use; regression; Artificial Neural Networks; fuzzy logic

INTRODUCTION
Water flowing through natural stream channels supports a variety of needs, including habitat for fish and wildlife,
outdoor recreation, hydropower generation and navigation. Existing water demands and projected increase in these
demands have sometimes resulted in a conflict between the use of rivers as a water and energy source, and their
conservation as integrated ecosystems (Tharme, 2003). Conflicts between instream and offstream uses of water
continue to increase as human water needs increase (Caissie and El-Jabi, 2003; Tharme, 2003). It is estimated that
over 50% of the world's accessible surface water is already appropriated by humans, and this is projected to increase
to 70% by 2025 (Postel, 1998).
The amount of water needed to maintain stream habitat on a year-round basis is termed instream flow needs
(Cooperrider et al., 1986). Construction of dams to impound water, diversion of water for irrigation, and municipal
and industrial uses may deplete natural stream flows to the point where these needs are no longer met. Protection of
the value of stream resources, therefore, depends upon reserving a portion of the stream flow for instream uses

*Correspondence to: Behrouz Ahmadi-Nedushan, Chair in Statistical Hydrology, INRS-ETE, Université du Québec, 490 de la Couronne,
Québec, G1K 9A9, Canada. E-mail: Behrouz_Nedushan@ete.inrs.ca

Received 15 December 2004
Revised 15 June 2005
Accepted 10 July 2005

(Orth and Maughan, 1981). Many managers of hydroelectric facilities, as well as regulatory agencies, face
difficult decisions on how water will be allocated among multiple uses. When water allocation decisions
are made, regulatory agencies need to know how flow alterations will affect fish and aquatic habitats. Efficient
management depends partly on both the ability to predict future distributions of aquatic species and
knowledge of the habitats necessary to maintain their populations (Harvey et al., 2002). The need to sustain the
ecological values of rivers is widely recognized and implemented in different policies and legislation (e.g. the 'no net
loss of productive capacity of habitat' principle in the Canadian fish habitat management policy: Department of
Fisheries and Oceans (DFO), 1986; King et al., 1999; South African Water Law, 1998). Generally speaking, such
legislation insists on maintaining sufficient flow in a river to allow different species (mostly fish) to complete their
different life stages (Leclerc et al., 2003).
Initial efforts to set the optimum instream flow have focused on establishing a single minimum flow value.
Annear et al. (2004) reviewed the most common methods for developing instream needs in the United States
and Canada, and concluded that the predominance of single-flow recommendations has not been successful in pro-
tecting the integrity of aquatic ecosystems. It is widely accepted that there is no single flow value that will conserve
an ecosystem or ecosystem components, or is optimal for all organisms and life cycles (Katopodis, 2003). In a
recent review of the present status of environmental flow assessment (EFA) methodologies, Tharme (2003) recog-
nized the existence of 207 individual methodologies, recorded in 44 countries around the world. The methods can
be categorized into four types: hydrological, hydraulic, habitat simulation and holistic methodologies.
Hydrological methods, often regarded as the simplest approach, are the most widely used around the globe
(Tharme, 2003). These methods rely primarily on the statistical analysis of time series, usually in the form of
historical, monthly or daily flow records, for making recommendations on minimum flow requirements which cor-
respond to a percentage of mean annual or seasonal flow, or to a quantile established using low flow frequency
analysis (Caissie and El-Jabi, 2003; Tharme, 2003). Hydraulic methods use changes in simple hydraulic variables,
such as wetted perimeter or maximum depth as a surrogate for habitat factors known or assumed to be limiting
for target biota (Caissie and El-Jabi, 2003). Habitat simulation methodologies ranked second in frequency of
applications at a global scale (28% of the overall total). These methods include the Instream Flow Incremental
Methodology (IFIM), and its popular application software, the physical habitat simulation model, PHABSIM
(Bovee et al., 1998; Stalnaker et al., 1995). IFIM was developed to integrate aspects of instream flow problems,
including the water needs of aquatic ecosystems (Bovee et al., 1998). PHABSIM predicts how physical habitat
(e.g. depth, velocity, substrate, cover) changes with flow and combines this information with habitat suitability
criteria (HSC) to determine an index of the amount of habitat available (the so-called weighted usable area,
WUA) over the possible range of streamflows. In spite of the widespread application of PHABSIM, there has been
continued criticism of the biological, physical and methodological basis of its associated physical habitat simulation.
It has been criticized for not considering many interactions between species, life stages and other variables
that influence the state of the ecosystem (e.g. Gore et al., 1998; Jowette, 1997; Holm et al., 2001; Gore and
Nestler, 1988).
Statistical hydraulic models, proposed by Lamouroux et al. (1998), rely on statistical descriptions of river hydraulics.
It has been suggested that at the reach scale, statistical hydraulic models can provide estimates
of frequency distributions of hydraulic variables such as depth and width (Lamouroux et al., 1998; Lamouroux and
Capra, 2002) from simple input variables (e.g. discharge). It has been demonstrated that consistent patterns of such
distributions appear among different streams. Parasiewicz and Dunbar (2001) pointed out that these models are
most suited to rivers with relatively natural morphology. Because habitat models such as PHABSIM remain popular
in spite of these criticisms, the objective of this paper is to provide water resources managers and scientists
with a review of the various statistical methods available to establish the link between species abundance and the
abiotic and biotic variables used in such models. Unlike recent reviews (e.g. Tharme, 2003; Leclerc et al., 2003) on
different methodologies for environmental flow assessment, this review specifically focuses on the assessment of
the strengths and weaknesses of the available statistical methods to estimate habitat suitability. A second objective
of this article is to propose new approaches using existing statistical methods. The remainder of the paper is
divided as follows: First, a brief review of habitat models is provided. Then, Habitat Suitability Indices (HSI)
are defined. Finally, and most importantly, statistical methods available are described and assessed in terms of their
advantages and disadvantages for implementation in habitat modelling.


HABITAT MODELLING
Empirical habitat models are based on a description of the abiotic variables that affect the distribution of species.
Univariate or multivariate functions are used to link abiotic characteristics to habitat suitability. Univariate functions
consider the separate effect of each individual parameter, while multivariate analysis takes into account the interaction
of physical variables and determines the species response to the cumulative effect of a number of environmental
factors. A habitat model uses a set of habitat components to predict some attributes of wildlife populations. Many
aquatic species attributes can be considered in habitat studies. These may be either characteristics of a single spe-
cies population, multiple species populations, or community characteristics such as species richness. The simplest
measurement of individual species populations is presence-absence, which is used when the objective is to verify
the use of a habitat by a species. Abundance and density are other measures which have been widely used in
monitoring studies. Presence-absence data provide insight into how frequently a species occurs in a study region,
whereas fish abundance can provide additional information on how successful a species is in a stream (Wang et al.,
2003).

BIOTIC AND ABIOTIC FACTORS


Habitat requirements are often defined as abiotic features of the environment that are necessary for the survival and
persistence of individuals or populations (Armstrong et al., 2003; Rosenfield, 2003). In general, the abiotic components
can be divided into chemical and physical factors. Chemical processes affect habitat resources, and many
key biological processes for aquatic resources are linked to factors such as pH, dissolved oxygen, nutrients,
and organic and inorganic contaminants (Jackson et al., 2001).
Physical habitat structure is of paramount importance in determining both the abundance and species composi-
tion of stream fishes (e.g. Allan, 1995; Bovee et al., 1998; Peeters and Gardeniers, 1998; Vadas and Orth, 2001).
The most important physical habitat variables include water depth (e.g. Geist et al., 2000; Guay et al., 2000;
Beecher et al., 2002; Kynard et al., 2000), water velocity and flow (e.g. Geist et al., 2000; Kynard et al., 2000;
Mallet et al., 2000; Peeters and Gardeniers, 1998), cover (e.g. Vadas and Orth, 2001) and substratum composition
(e.g. Knapp and Preisler, 1999; Vadas and Orth, 2001). Water velocity and the associated physical forces collec-
tively represent the most important environmental factors affecting the organisms of running waters. Current
affects food resources via the delivery and removal of nutrients and food items. Moreover, velocity is related to
the physical force that the organisms experience within the water column (Allan, 1995). The substrate provides
habitat space for a variety of activities such as resting and movement, reproduction and for refuge from predators
and flow (Giller and Malmqvist, 1998). It also provides food directly (organic particles) or surfaces on which food
aggregates (e.g. epiphytic algae). Other abiotic variables which affect distribution of aquatic organisms include:
water temperature, light penetration and bottom slope. Stream temperature sets limits to where species can live and
species are generally adapted to certain temperature regimes. There are also indirect effects of temperature on biota
through its influence on dissolved oxygen and metabolic rates (Giller and Malmqvist, 1998). For large fluvial
systems with fetches, wave climate and organic content of sediments may be important habitat variables (Morin
et al., 2003). Armstrong et al. (2003) remarked that models developed for one localised stream type rarely work in
other stream types or in other regions. Better models (in terms of explaining variance in fish abundance or predict-
ing abundance in new data sets) can be obtained by combining both local site features (e.g. width, depth, substrate,
cover, flow type, bank side vegetation) and catchment-scale variables (e.g. altitude, flow, stream order, geology,
primary productivity indices).

HABITAT SUITABILITY CRITERIA/INDICES


The classic approach of quantifying habitat consists of estimating local habitat indices based on available knowl-
edge regarding optimum range of abiotic conditions for the targeted aquatic species (Leclerc et al., 2003). The
habitat suitability index (HSI), the most commonly used index of habitat, is an analytical tool used to represent
preferences of different aquatic species for various instream variables (e.g. velocity, depth, substrate, cover) at
different life stages (Beecher et al., 2002; Vadas and Orth, 2001; Vismara et al., 2001). Three different types of
habitat suitability indices are distinguished (Bovee, 1986): Category I: professional judgement indices; Category II:
habitat use indices; and Category III: habitat preference indices. Category I indices are derived from life history
studies in the literature, or professional judgement. Category II indices use data collected specifically for habitat
studies, based on frequency of occurrence of actual habitat conditions used by different species and life stages in a
stream. Category III data combine a category II frequency analysis with additional information on the availability
of habitat combinations in the sampling reaches. Preference for a specific range of physical factors such as water
depth, substrate diameter, or current speed can be determined as the ratio of percent utilization (percentage of
fishes observed that use this range of the variable) to percent availability (percentage of the surface area of the river
with this range of the variable) of these environmental conditions (Vadas and Orth, 2001). In general, the indices lie in
the range of 0 to 1 for each variable, with 0 meaning no preference for the particular habitat condition and 1 meaning
maximum preference for the particular condition. Suitability indices are usually created separately for different
microhabitat variables (e.g. cover, substratum size, depth, and velocity), and many of the established habitat models
in the literature are developed to deal with only a single factor such as depth, velocity, substrate or stream cover.
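The preference ratio described above can be sketched in a few lines of Python. This is only an illustration: the function name `preference_index`, the depth classes and the counts are hypothetical, and rescaling by the largest ratio so that the index falls between 0 and 1 is an assumption made here, not a procedure taken from the cited studies.

```python
import numpy as np

def preference_index(used_counts, available_area):
    """Preference per habitat class: percent utilization / percent availability.

    Assumption: the ratios are rescaled by their maximum so that the most
    preferred class receives an index of 1.
    """
    used = np.asarray(used_counts, dtype=float)
    avail = np.asarray(available_area, dtype=float)
    pct_used = used / used.sum()        # share of observed fish in each class
    pct_avail = avail / avail.sum()     # share of river surface in each class
    # Classes that are not available at all receive a ratio of 0.
    ratio = np.divide(pct_used, pct_avail,
                      out=np.zeros_like(pct_used), where=pct_avail > 0)
    top = ratio.max()
    return ratio / top if top > 0 else ratio

# Hypothetical data: fish counts and surface area (m^2) in four depth classes.
fish_counts = [5, 30, 50, 15]
area_m2 = [400, 300, 200, 100]
si_depth = preference_index(fish_counts, area_m2)
print(si_depth)
```

A class that is heavily used but scarce in the river receives a high index, which is what distinguishes a preference index from a simple habitat-use frequency.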
Generally, physical habitat is dependent on more than one variable (e.g. depth, velocity, cover) and several
suitability indices (SIs) must be combined to define a composite suitability index (Vadas and Orth, 2001). Different
methods have been used to combine the SIs obtained for each physical factor. Many researchers multiply
the SIs for individual habitat variables to obtain a composite HSI (Beecher et al., 2002; Vadas and Orth, 2001).
This method is based on the assumption that fish select each particular variable independently of the other variables
(Bovee, 1986), as multiplication of individual SIs is analogous to multiplying assumed independent probabilities
of different variables (Vadas and Orth, 2001). The product equation yields zero suitability whenever any single
habitat variable is unsuitable (SI = 0):

\[ \mathrm{HSI} = \mathrm{SI}_1 \times \mathrm{SI}_2 \times \cdots \times \mathrm{SI}_n \tag{1} \]

Several alternative methods are available for calculating a composite HSI. The arithmetic-mean HSI is based on
the assumption that good habitat conditions on one variable (e.g. velocity) can compensate for poor conditions on
other variables (e.g. depth). Another approach, the lowest-SI method, assumes that the most limiting factor determines the
upper limit of habitat suitability and that high SI values cannot compensate for low SI values in other
variables (Korman et al., 1994). The geometric-mean HSI is the nth root of the product of n individual indices
(e.g. the fourth root of the product of four indices). This approach also implies some compensation (Korman
et al., 1994; Layher and Maughan, 1985), yet like the product equation, it yields zero suitability whenever any
individual suitability index is zero (Brown et al., 2000). Once the composite suitability index has been determined,
the amount of weighted usable area (WUA) is computed by multiplying each cell area by the respective composite
suitability index and summing over all cells (Vismara et al., 2001).
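The four ways of combining suitability indices, and the resulting weighted usable area, can be compared with a short Python sketch. The function names (`composite_hsi`, `weighted_usable_area`) and the three-cell example are hypothetical and serve only to make the arithmetic concrete.

```python
import numpy as np

def composite_hsi(si, method="product"):
    """Combine per-variable suitability indices (rows = cells, columns = variables)."""
    si = np.asarray(si, dtype=float)
    if method == "product":        # independent selection of each variable
        return si.prod(axis=1)
    if method == "arithmetic":     # good conditions compensate for poor ones
        return si.mean(axis=1)
    if method == "minimum":        # most limiting factor sets the upper limit
        return si.min(axis=1)
    if method == "geometric":      # nth root of the product of n indices
        return si.prod(axis=1) ** (1.0 / si.shape[1])
    raise ValueError(f"unknown method: {method}")

def weighted_usable_area(cell_areas, hsi):
    """WUA: sum over cells of cell area times composite suitability."""
    return float(np.sum(np.asarray(cell_areas, dtype=float) * hsi))

# Hypothetical example: three cells, SI values for depth and velocity.
si = [[1.0, 0.5], [0.8, 0.0], [0.25, 1.0]]
areas = [10.0, 20.0, 40.0]   # cell areas in m^2
for method in ("product", "arithmetic", "minimum", "geometric"):
    hsi = composite_hsi(si, method)
    print(method, hsi, weighted_usable_area(areas, hsi))
```

In the second cell the velocity index is zero, so the product, minimum and geometric-mean forms all return zero suitability, whereas the arithmetic mean still credits the cell for its suitable depth.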
Two assumptions are implicit in the composite indices discussed above: (1) all variables are equally important to
the growth and survival of the aquatic organisms; and (2) all environmental variables are independent, with no
interaction between them (Beecher et al., 2002). The first assumption can be relaxed by using the weighted product
equation to account for the relative importance of each habitat variable to the aquatic organisms. The weighted product
equation assigns an exponent to each SI before multiplication (Equation 2):

\[ \mathrm{HSI} = \mathrm{SI}_1^{b_1} \times \mathrm{SI}_2^{b_2} \times \cdots \times \mathrm{SI}_n^{b_n} \tag{2} \]

The exponents $b_1, \ldots, b_n$ can be obtained by applying a log transformation and performing a multiple linear regression
(see Guay et al., 2000). The HSI models have been criticized because they do not consider the interrelation
and correlation structure of the habitat variables (Jowette, 2003; Leclerc et al., 2003). For example, the
speed of the current greatly influences the size and composition of the substrate particles (Giller and Malmqvist,
1998; Jowette, 2003), and depth and velocity are generally highly dependent. For instance, in a recent study,
Welker and Scarnecchia (2004) found a correlation of r > 0.9 between depth and velocity in North Dakota rivers.
Habitat selection by fish is undoubtedly a multivariate process in which location is selected based on several interacting
variables (Nykanen et al., 2001).
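The log-transformation step for the weighted product equation can be illustrated as follows. Taking logarithms of the weighted product gives log HSI = b_1 log SI_1 + ... + b_n log SI_n, which is linear in the exponents. The sketch below assumes strictly positive SI values and an observed composite index, fits the exponents by least squares without an intercept, and uses synthetic data with known exponents; it is not a reproduction of the procedure of Guay et al. (2000), and the function name is hypothetical.

```python
import numpy as np

def fit_weighted_product(si, hsi_observed):
    """Estimate the exponents b_i of HSI = prod(SI_i ** b_i) by regression on logs.

    Assumptions: all SI and HSI values are strictly positive (log is undefined
    at zero) and no intercept is included, matching the weighted product form.
    """
    log_si = np.log(np.asarray(si, dtype=float))
    log_hsi = np.log(np.asarray(hsi_observed, dtype=float))
    b, *_ = np.linalg.lstsq(log_si, log_hsi, rcond=None)
    return b

# Synthetic data generated with known exponents (for illustration only).
rng = np.random.default_rng(0)
si = rng.uniform(0.1, 1.0, size=(50, 3))      # depth, velocity, substrate SIs
true_b = np.array([2.0, 1.0, 0.5])
hsi_observed = np.prod(si ** true_b, axis=1)
b_hat = fit_weighted_product(si, hsi_observed)
print(b_hat)
```

With noise-free synthetic data the exponents are recovered essentially exactly; with field data the estimates carry the usual regression uncertainty, and cells with zero suitability would need separate treatment before taking logarithms.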


MULTIVARIATE STATISTICAL METHODS


Multivariate analysis methods take into account the interaction of physical variables and determine the species
response to the cumulative effect of a number of environmental characteristics. Multivariate approaches inherently
consider the interrelation and correlation structure of the environmental variables and are therefore more appropriate
for the analysis of fish habitat. The use of multivariate statistical methods to model species distribution and habitat
requirements has increased in the past twenty years with a wide variety of techniques (e.g. Braaten and Guy,
1999; Corbacho and Sanchez, 2001; Filipe et al., 2002; Garland et al., 2002; Geist et al., 2000; Guay et al., 2000,
2003; Harvey et al., 2002; Jutila et al., 2001; Mallet et al., 2000; Manel et al., 1999; Mérigoux et al., 2001;
Neumann and Wildman, 2002; Reash and Pigg, 1990; Tilma and Guy, 1998; Vadas and Orth, 2001; Vismara
et al., 2001; Yu and Lee, 2002). Different multivariate methods that have been reported in habitat modelling studies
are reviewed next.

MULTIPLE LINEAR REGRESSION


Multiple linear regression (MLR) is one of the most commonly used methods to describe relationships between a
dependent variable (e.g. species abundance) and independent variables (e.g. abiotic predictors). Regression analysis
uses associations between independent and dependent variables to relate a response variable to a single (simple
regression) or a combination (multiple regression) of environmental variables. Multiple-regression models use continuous
response data to describe the relationship between the abundance of fish and habitat variables such as depth,
substrate, water temperature, conductivity, channel width, or basin-scale features such as land use or forest cover.
The relationship between species response and environmental variables can be described as:

\[ Y = \beta_0 + X^{T}\beta + \varepsilon = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m + \varepsilon \tag{3} \]

where $Y$ represents the response variable (e.g. abundance), $X = (x_1, \ldots, x_m)$ is a vector of $m$ predictor variables, $\beta_0$
is a constant called the intercept, $\beta = (\beta_1, \ldots, \beta_m)$ is the vector of $m$ regression coefficients (one for each predictor),
and $\varepsilon$ is the error. Regression models have been widely used to predict species distribution, abundance and habitat
preferences. The MLR has been used for building HSI models for fish guilds in rivers (Vadas and Orth, 2001), as
well as association of species–habitat in rivers and lakes (e.g. Braaten and Guy, 1999; Brosse and Lek, 2002;
Brosse et al., 1999; Rathert et al., 1999; Reash and Pigg, 1990; Tilma and Guy, 1998; Vismara et al., 2001; Yu
and Lee, 2002).
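As an illustration of equation (3), the following Python sketch fits a multiple linear regression by least squares using NumPy. The predictor names, coefficient values and synthetic abundance data are hypothetical, and `fit_mlr` and `predict_mlr` are helper names introduced here for the example.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of y = b0 + X @ beta; returns (intercept, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])   # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]

def predict_mlr(intercept, beta, X):
    """Predicted response for new predictor values."""
    return intercept + np.asarray(X, dtype=float) @ beta

# Synthetic example: abundance as a function of depth (m) and velocity (m/s).
rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(0.1, 1.5, 40), rng.uniform(0.0, 1.2, 40)])
abundance = 1.5 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0.0, 0.2, 40)
b0, beta = fit_mlr(X, abundance)
print(b0, beta, predict_mlr(b0, beta, [[0.8, 0.4]]))
```

The fitted coefficients approximate the values used to generate the data; with real survey data, residual diagnostics would be needed before interpreting them.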
Regression coefficients are generally estimated using the Ordinary Least Squares (OLS) algorithm, which minimizes the
sum of squared differences between predicted and observed responses. If the response variable does not have a linear relationship
with a predictor, a transformed term of the environmental variable can be included in the model. A regression
model including higher order terms is called a polynomial regression. Second order polynomial regressions simu-
late unimodal symmetric responses, whereas third order or higher terms allow simulating skewed and bimodal
responses, or even a combination of both (Guisan and Zimmerman, 2000). Polynomial regression models have
been used in numerous studies to model the preference of fish species (e.g. Vadas and Orth, 2001; Vismara et
al., 2001). Vismara et al. (2001) used a second order polynomial regression to model the preference of brown trout
(Salmo trutta fario) in the Adda River, Italy. The data were fitted by varying the order of the depth and velocity terms
and adding or removing the interaction term. Different models were tested, and two final best-fit models were
selected. For juvenile trout, a model with a first-order depth term and a first-order interaction term was used
and a coefficient of determination ($R^2$) of 0.55 was found. For adult trout, a model with a second-order depth term
and a first-order interaction term was selected as the best model and the coefficient of determination ($R^2$) was 0.70.
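A polynomial regression with an interaction term of the general kind described above can be sketched as follows. The exact model form used by Vismara et al. (2001) is not given here, so the design matrix below (intercept, depth terms up to a chosen order, a velocity term and a depth-velocity interaction) is an assumption, and the data are synthetic.

```python
import numpy as np

def poly_design(depth, velocity, depth_order=2, interaction=True):
    """Design matrix: intercept, depth^1..depth^order, velocity, depth*velocity."""
    d = np.asarray(depth, dtype=float)
    v = np.asarray(velocity, dtype=float)
    cols = [np.ones_like(d)]
    cols += [d ** k for k in range(1, depth_order + 1)]
    cols.append(v)
    if interaction:
        cols.append(d * v)            # first-order interaction term
    return np.column_stack(cols)

def r_squared(y, y_hat):
    """Coefficient of determination of the fitted values."""
    y = np.asarray(y, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic preference data (illustrative values only).
rng = np.random.default_rng(42)
depth = rng.uniform(0.1, 1.5, size=60)
velocity = rng.uniform(0.0, 1.2, size=60)
preference = (0.2 + 1.1 * depth - 0.6 * depth ** 2 + 0.3 * depth * velocity
              + rng.normal(0.0, 0.05, size=60))
A = poly_design(depth, velocity, depth_order=2)
coef, *_ = np.linalg.lstsq(A, preference, rcond=None)
r2 = r_squared(preference, A @ coef)
print(coef, r2)
```

Changing `depth_order` or dropping the interaction column reproduces the kind of model comparison described in the text, with $R^2$ (or an adjusted criterion) used to choose among candidates.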
As many habitat features are correlated with each other (Armstrong et al., 2003), the use of equation (3) with
typical habitat variables may lead to a problem called multicollinearity. This situation occurs when some of the
predictor variables are highly correlated (Afifi and Clark, 1996; Montgomery and Peck, 1992). When multicollinearity
is present, the computed estimates of the regression coefficients are unstable and have large standard errors.
The regression coefficients fluctuate from sample to sample, and even a slight change in the data can result in
different regression coefficients.


A diagnostic test to assess multicollinearity can be performed by calculating pairwise correlations of independent
variables. Harraway (1995) suggested that a pairwise correlation exceeding 0.5 in magnitude could indicate a collinearity
problem. However, pairwise correlations do not reveal multicollinearity when one independent variable
is highly correlated with a linear combination of other independent variables. Variance Inflation Factor (VIF) analysis
is another multicollinearity diagnostic, which can reveal more complex relationships between independent variables
(Montgomery and Peck, 1992). The variance inflation factor for each independent variable $x_j$ is defined as:
\[ \mathrm{VIF}_j = \frac{1}{1 - R_j^2} \tag{4} \]

where $R_j^2$ is the coefficient of determination obtained by regressing $x_j$ on the remaining independent variables. Usually, VIF
values larger than 10 suggest that multicollinearity may be causing estimation problems (Montgomery and Peck,
1992). Another useful method for checking the stability of regression coefficients is to standardize the data, compute
the regression coefficients from the standardized data, and then compare them to standardized coefficients
obtained from the original regression equation (Afifi and Clark, 1996). If there is no significant difference between
the standardized regression coefficients from the original data and the regression coefficients from the standardized
data, then the occurrence of multicollinearity is less likely.
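The two diagnostics discussed above, pairwise correlations and variance inflation factors, can be computed directly with NumPy. The sketch below follows the VIF definition given in the text by regressing each predictor on all the others; the synthetic depth, velocity and substrate variables are hypothetical, and velocity is deliberately constructed to be nearly collinear with depth.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column: 1 / (1 - R_j^2)."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    out = np.empty(m)
    for j in range(m):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic predictors: velocity is almost a copy of depth, substrate is independent.
rng = np.random.default_rng(3)
depth = rng.normal(size=200)
velocity = depth + 0.1 * rng.normal(size=200)
substrate = rng.normal(size=200)
X = np.column_stack([depth, velocity, substrate])

print(np.corrcoef(X, rowvar=False))   # pairwise correlation matrix
print(vif(X))                         # large values flag depth and velocity
```

In this example the VIF values of depth and velocity far exceed the threshold of 10 quoted in the text, while that of substrate stays close to 1.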

VARIABLE SELECTION
Like any other model building technique, the goal of the multiple regression analysis is to find the best fitting and
most parsimonious, yet biologically reasonable model to describe the relationship between species distribution and
a set of environmental variables. In many applications, the major interest is to identify the most important factors
that affect the species distribution. In these situations one wishes to identify a small subset of environmental factors
that relate significantly to the species distribution. There are at least two different approaches for selecting the best
set of explanatory variables and building the best multiple regression model (Annoni et al., 1997): the first is
stepwise regression; the second considers all possible subsets and compares some overall measure of goodness of
fit for all possible subset models (Montgomery and Peck, 1992; Ryan, 1997).
Selection of variables can be achieved by using a strategy that adds one variable at a time to a model, or removes one
from it, according to a certain criterion of relative importance. Two available strategies are the forward and
backward selection procedures (Ryan, 1997). The forward selection process begins with the assumption that there
are no variables in the model other than the intercept. The optimal subset is chosen by inserting independent variables
into the model one at a time. The first variable selected for entry into the model is the one that has the largest
correlation with the response variable, that is, the variable explaining the highest percentage of
the variance; the other significant variables are then added iteratively. In contrast, backward elimination
begins with the model that includes all predictor variables and deletes the non-significant variables in an iterative
procedure (Ryan, 1997). The two procedures can be combined in a number of ways. One of
the most popular combinations is the stepwise regression algorithm. Stepwise regression is a modified version of forward
selection in which at each step all explanatory variables entered into the model previously are reassessed via their partial
F-statistics (Afifi and Clark, 1996). A variable entered at an early stage may become redundant later because of
its relationship with other variables now in the model. That variable may be removed if it meets the elimination
criteria; the model is then re-fitted with the remaining variables, and the forward process goes on. The whole process,
one step forward followed by one step backward, continues until no more variables can be added or removed
(Montgomery and Peck, 1992). Forward selection, backward elimination and stepwise regression do not necessarily
lead to the same final model. These algorithms have been criticized on various grounds, the most
common being that none of the procedures guarantees that the best subset regression model of any size
will be identified. It must be noted that the order in which independent variables enter or leave the model does not
necessarily imply an order of importance for the environmental variables.
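The stepwise procedure described above can be sketched in Python as follows. This is a simplified illustration rather than the implementation of any particular statistical package: entry and removal are decided with fixed partial F thresholds (`f_enter` and `f_remove`, whose values are assumptions chosen here), the data are synthetic, and a `max_steps` guard is added as a precaution.

```python
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of an OLS fit on the chosen columns plus an intercept."""
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def partial_f(X, y, cols, c):
    """Partial F statistic of variable c, given the other variables in cols."""
    full = rss(X, y, cols)
    reduced = rss(X, y, [k for k in cols if k != c])
    dof = len(y) - len(cols) - 1          # residual degrees of freedom, full model
    return (reduced - full) / (full / dof)

def stepwise(X, y, f_enter=4.0, f_remove=3.9, max_steps=100):
    """One forward step followed by one backward step, until nothing changes."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    selected = []
    for _ in range(max_steps):
        changed = False
        candidates = [c for c in range(X.shape[1]) if c not in selected]
        if candidates:                    # forward step: best candidate enters
            f_in = {c: partial_f(X, y, selected + [c], c) for c in candidates}
            best = max(f_in, key=f_in.get)
            if f_in[best] > f_enter:
                selected.append(best)
                changed = True
        if len(selected) > 1:             # backward step: weakest variable may leave
            f_out = {c: partial_f(X, y, selected, c) for c in selected}
            worst = min(f_out, key=f_out.get)
            if f_out[worst] < f_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return selected

# Synthetic data: only variables 0 and 2 influence the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.5 * rng.normal(size=100)
selected = stepwise(X, y)
print(selected)
```

Setting the removal threshold slightly below the entry threshold prevents a variable from being removed immediately after it has entered. The order of the returned list reflects the order of entry, which, as noted in the text, should not be read as an order of importance.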
Stepwise regression has been widely used in stream ecology for different species: for analysis of habitat selec-
tion by fish (e.g. Bult et al., 1999; Heggenes, 2002), analysis of fish abundance (Annoni et al., 1997; Inoue and
Nakano, 2001; Nakamoto, 1994; Reash and Pigg, 1990), and analysis of benthic diversity in streams (Matlock and
Maughan, 1988). Inoue and Nakano (2001) modelled population densities of salmonids, based on 12 habitat
variables measured in 55 reaches of forest and grassland streams in Japan. The stepwise regression indicated that
salmon density was best modelled by a combination of woody debris, cover area, mean depth and maximum water
temperature. Reash and Pigg (1990) used the stepwise regression to model fish assemblage parameters (species
richness, diversity, biomass) based on abiotic variables at seven sites in the Cimarron River, USA. Separate models
were developed for different fish assemblage parameters at each site. Generally temperature, velocity, conductivity
and depth were the most important abiotic variables. Armstrong et al. (2003) remarked that most habitat features
(width, depth, velocity, substrate size, gradient, etc.) are strongly correlated with each other, so the final set of
variables in the models may not necessarily include those which most influence species. This is because stepwise
regression cannot distinguish factors that directly control fish abundance from those which are merely
surrogates by virtue of their strong statistical association with the direct factors. Moreover, some important
variables may be omitted from models because of the dominance of correlated variables.
An alternative to stepwise algorithms is to consider all possible subsets of different sizes. The amount of
computation required to perform all-possible-subset regression increases rapidly with the number of variables,
because the number of possible sub-models doubles with each added variable. When the number of available
variables, k, is small (say, less than 6), it is practical to compute all $2^k - 1$ possible regression models
explicitly. When k is large, however, it is desirable to have all possible regressions computed implicitly. This
can be achieved with the branch-and-bound algorithm (Ryan, 1997). Different criteria such as the adjusted $R^2$,
Mallows' $C_p$ (Mallows, 1973) or the Akaike information criterion (AIC) (Akaike, 1974) can be used to compare
different models and select the best parsimonious model. Annoni et al. (1997) used an all-regressor method and
compared models of different sizes to predict salmonid biomass at different stations of an Italian river, based on
hydrological, chemical and biological variables. Two criteria, the adjusted $R^2$ and Mallows' $C_p$, were used to
compare the performance of the models. The selected model included six of the nine original variables and
explained 98% of the biomass variance. The selected variables were mean annual flow, altitude, slope, reach width,
macroinvertebrate density and diversity.
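As a rough illustration of the all-possible-subsets search described above, the following Python sketch fits every non-empty subset of predictors by ordinary least squares and scores each one with the adjusted $R^2$ and a Gaussian AIC. The data are synthetic and the predictor names are placeholders chosen for illustration; they are not taken from any of the cited studies. The search is exhaustive, not branch-and-bound, so it is only practical for small k.

```python
import itertools
import numpy as np

def residual_ss(X, y):
    """Residual sum of squares of a least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def all_subsets(X, y, names):
    """Score every non-empty subset of predictors (2**k - 1 models)."""
    n, k = X.shape
    tss = float(((y - y.mean()) ** 2).sum())
    results = []
    for size in range(1, k + 1):
        for cols in itertools.combinations(range(k), size):
            rss = residual_ss(X[:, list(cols)], y)
            p = size + 1                        # parameters, including intercept
            adj_r2 = 1.0 - (rss / (n - p)) / (tss / (n - 1))
            aic = n * np.log(rss / n) + 2 * p   # Gaussian AIC up to a constant
            results.append(([names[c] for c in cols], adj_r2, aic))
    return results

# Synthetic example: only "depth" and "cover" truly drive the response.
rng = np.random.default_rng(0)
n = 60
X = rng.normal(size=(n, 4))
y = 2.0 + 1.5 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.3, size=n)
names = ["depth", "velocity", "cover", "temperature"]

models = all_subsets(X, y, names)
best = min(models, key=lambda m: m[2])          # smallest AIC
print(len(models), best[0], round(best[1], 3))
```

Ranking by AIC is one choice among the criteria mentioned in the text; sorting `models` by adjusted $R^2$ instead would only require changing the key function.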
Neter et al. (1989) discussed the use of all-possible-subset regression in conjunction with stepwise regression.
They stated that a limitation of the stepwise regression search approach is that it presumes there is a single best
subset of variables and seeks to identify it. However, in many instances there is no unique best subset. Hence,
Neter et al. (1989) suggested that all possible regression models with a similar number of variables as in the
stepwise regression solution be fitted subsequently to study whether some other subsets of environmental variables
might be better. This approach implies that after finding a stepwise solution, the best of all the possible subsets
of the same number of variables should be examined to determine if the stepwise solution is among the best. If
there is a strong physical or biological justification for the inclusion of certain variables in the model, this
procedure might provide models that are physically more meaningful.

RIDGE REGRESSION
In situations where multicollinearity exists, the challenge is to minimize the possibility of including redundant
variables in the model. However, care must also be taken not to delete any important variables. One solution to
the problem of multicollinearity is the use of the ridge regression procedure. Hoerl and Kennard (1970) proposed
ridge regression, a biased estimation method, which provides more stable estimates of the regression coefficients.
These estimates are biased but they often result in a smaller mean square error for the estimates, and the regression
coefficients remain stable when used on different samples. The basic justification for ridge regression is trading off
bias against variance, as this procedure reduces the variance but increases the bias. By trading off these two
quantities, a model that best predicts unseen observations can be identified (Guisan and Zimmerman, 2000). In this
sense, a slightly biased estimator with smaller variance may be more advantageous than an unbiased estimator
with a large variance. In practice ridge regression is usually applied to predictor variables which have been
centered by subtracting the sample average and then scaled so that the diagonal elements of $X'X$ are equal to one. In
this form the $X'X$ matrix is the sample correlation matrix of the original predictor variables (Miller, 1990). The
expression for the ridge regression coefficients can be written as:

$$b(k) = (X'X + kI)^{-1} X'Y \qquad (5)$$

where I is an $m \times m$ identity matrix, and the ridge constant k is a positive quantity which is added to each diagonal
element of the correlation matrix. The optimum value of k is usually determined empirically by plotting the so-called
ridge trace (Afifi and Clark, 1996; Hoerl and Kennard, 1970; Montgomery and Peck, 1992), which is a plot of the ridge
regression coefficients as a function of k, and finding on this plot the point beyond which, for any value
of k, the regression coefficients remain relatively stable. The use of ridge regression is beneficial and preferable
when the amount of data is not large relative to the number of variables, as noisy variables can frequently be included in
the chosen subset (Ryan, 1997). This property makes ridge regression a useful tool in habitat modelling, as numbers
of observations are often limited. To the authors’ knowledge, ridge regression has not been applied in statistical
modelling of aquatic habitat; it may be used as an alternative to other statistical models when some of the
environmental variables used in the study are highly correlated.
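The ridge estimator of equation (5) and the ridge trace can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, assuming NumPy; the variable names (depth, width, velocity) and the grid of k values are hypothetical choices. Predictors are centred and scaled so that $X'X$ is the correlation matrix, as described above.

```python
import numpy as np

def ridge_trace(X, y, ks):
    """Ridge coefficients b(k) = (X'X + kI)^-1 X'y for each k in ks."""
    Xc = X - X.mean(axis=0)
    Z = Xc / np.sqrt((Xc ** 2).sum(axis=0))    # diag(Z'Z) == 1
    yc = y - y.mean()
    R = Z.T @ Z                                # sample correlation matrix
    m = R.shape[0]
    return np.array([np.linalg.solve(R + k * np.eye(m), Z.T @ yc) for k in ks])

# Synthetic example with two nearly collinear predictors.
rng = np.random.default_rng(1)
n = 40
depth = rng.normal(size=n)
width = depth + rng.normal(scale=0.05, size=n)  # almost a copy of depth
velocity = rng.normal(size=n)
X = np.column_stack([depth, width, velocity])
y = 1.0 * depth + 0.5 * velocity + rng.normal(scale=0.5, size=n)

ks = [0.0, 0.01, 0.1, 0.5, 1.0]
trace = ridge_trace(X, y, ks)
for k, b in zip(ks, trace):
    print(k, np.round(b, 3))                   # one row of the ridge trace
```

Plotting each column of `trace` against `ks` gives the ridge trace; with collinear predictors the coefficients at k = 0 are typically erratic and settle down as k increases.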

PRINCIPAL COMPONENT REGRESSION


Principal Component Analysis (PCA) is another multivariate statistical method that can be used with
multicollinear data. PCA is concerned with explaining the variance-covariance structure of a data set through a few linear
combinations of the original variables (Afifi and Clark, 1996). The purpose of PCA is to identify the dependence
structure of multivariate observations in order to obtain a compact representation. The analysis identifies
characteristic and uncorrelated modes of variation of the variables. Attractive features of this operation are that: (1) it
produces ‘new latent variables’ that are not correlated; and (2) it is an efficient method of reducing the number of
variables. PCA is applied to either the correlation matrix (R) or the covariance matrix (S) of the original variables.
Principal components are obtained from the solution of

$$(R - \lambda I)p = 0 \qquad (6)$$

where I is the identity matrix of order m. Solving (6) results in a set of eigenvalues $\lambda_j$ ($j = 1, \ldots, m$) and a
corresponding set of eigenvectors $p_j$ ($j = 1, \ldots, m$). Principal components or scores $f_j$ ($j = 1, \ldots, m$) are linear
combinations of the original variables, where the loadings on each variable are given by the eigenvectors. The
variance explained by each principal component is equal to its associated eigenvalue, so the percentage of variance
explained is that eigenvalue divided by the sum of all eigenvalues (which equals m when the correlation matrix is
used). In PCA, the original variables are transformed into new, uncorrelated variables called the principal
components or factors. When the observed variables are correlated, the number of variables can be reduced without
losing much of the information. This objective can be achieved by selecting only the first few principal components.
PCA offers a means of handling multicollinearity among variables. It takes advantage of the dependencies among
independent variables by eliminating redundancy and creating new orthogonal, uncorrelated variables. Since the
principal components are arranged in decreasing order of explained variance, the first few can be selected to be
representative of the variability of the original variables (Afifi and Clark, 1996). One way of using PCA in habitat
modelling is principal component regression. In this approach, principal components are derived from the data of
abiotic variables and then the response variable is regressed on the first few selected principal components. The
least squares procedure is used to obtain a prediction equation for the response variable as a function of the
selected principal components. Once the fitted equation is obtained in terms of the selected principal components,
it can be transformed back into a function of the original variables, if desired. In some cases the combination of
variables from different categories in the same principal component makes the interpretation of the principal
components difficult. In such cases, rotation of principal components might improve the interpretation, as the
pattern of loadings on the rotated principal components might be more conceptually meaningful. Varimax rotation is
probably the most popular orthogonal rotation procedure, in which the coordinate axes are rotated so as to maximize
the sum of variances of the squared loadings within each column of the loading matrix. The varimax criterion results
in a new set of orthogonal coordinate axes where each new coordinate axis has either large or small loadings of the
variables on it. This property allows better physical interpretation of principal components.
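The steps of principal component regression outlined above (eigen-decomposition of R, computation of scores, least squares on the leading components, back-transformation) might be sketched as follows. This is an illustrative sketch on synthetic data, assuming NumPy; the predictor names and the choice of two components are hypothetical, and no rotation is applied.

```python
import numpy as np

def pcr(X, y, n_comp):
    """Regress y on the first n_comp principal components of X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized predictors
    R = np.corrcoef(X, rowvar=False)                  # correlation matrix
    eigval, eigvec = np.linalg.eigh(R)                # solves (R - lambda I) p = 0
    order = np.argsort(eigval)[::-1]                  # largest eigenvalue first
    eigval, eigvec = eigval[order], eigvec[:, order]
    P = eigvec[:, :n_comp]                            # loadings of retained components
    F = Z @ P                                         # component scores
    gamma, *_ = np.linalg.lstsq(F, y - y.mean(), rcond=None)
    beta_std = P @ gamma                              # back to standardized variables
    explained = eigval / eigval.sum()                 # proportion of variance
    return beta_std, explained, F

# Synthetic example: two pairs of correlated habitat variables.
rng = np.random.default_rng(2)
n = 80
depth = rng.normal(size=n)
width = 0.9 * depth + 0.3 * rng.normal(size=n)
velocity = rng.normal(size=n)
gradient = 0.8 * velocity + 0.4 * rng.normal(size=n)
X = np.column_stack([depth, width, velocity, gradient])
y = 1.0 * depth - 0.7 * velocity + rng.normal(scale=0.5, size=n)

beta_std, explained, F = pcr(X, y, n_comp=2)
print(np.round(explained, 3))   # share of variance per component
print(np.round(beta_std, 3))    # coefficients on the standardized predictors
```

Because the scores in `F` are uncorrelated, the regression on components does not suffer from the multicollinearity present among the original variables.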
Principal component regression has been used in many aquatic habitat studies: for analysis of fish abundance in
tributary confluences of rivers (e.g. Braaten and Guy, 1999), analysis of fish abundance in rivers (Corbacho and
Sanchez, 2001; Jutila et al., 2001; Neumann and Wildman, 2002), seasonal and spatial variation of fishes in pools
(Elso and Giller, 2001), and regionalization of aquatic habitat (Cohen et al., 1998). PCA can also be used to
identify which groups of variables account for the greatest variability of the data. Annoni et al. (1997) used
PCA in combination with multiple linear regression to select the best subset of variables to model variation in
salmonid biomass at 13 stations in Italian Alpine rivers. In this approach principal components are calculated
and environmental variables highly correlated with the first few principal components are used to model the
response variable. If the first few components explain a high percentage of variance, environmental variables
which are not correlated with the first few principal components can be disregarded from the analysis (Toepfer
et al., 1998).

LOGISTIC REGRESSION
Multiple linear regression is used when the response variable (e.g. species abundance) is continuous, but is not an
appropriate method for analysing dichotomous response variables, as is the case for presence-absence data of
species. Application of multiple linear regression to dichotomous response variables would not produce values
restricted to 1 or 0 as desired; many unwanted values between 0 and 1, and even uninterpretable values
below 0 or greater than 1, could be obtained. An appropriate model in such cases is logistic regression, which can be used
to analyse the relationship between a Bernoulli (or binary) response (suitable versus unsuitable) and explanatory
environmental factors describing the quality of the habitat (e.g. depth, velocity and substrate). Logistic regression
allows for the simultaneous analysis of categorical variables (e.g. substrate, cover) and continuous variables such as depth
and current velocity (e.g. Filipe et al., 2002). The model estimates the probability of a positive response occurring
given a set of explanatory variables (Agresti, 1996; Jongman et al., 1995). Based on the presence-absence data, a
response curve of a species describes the probability of the species being present, p, as a function of environmental
variables (Fladung et al., 2003). The probability is transformed by the logit link function (Agresti, 1996),
which maps bounded probabilities (between 0 and 1) to unbounded values:
 
$$g(x) = \log\left(\frac{p_i}{1 - p_i}\right) \qquad (7)$$

where $p_i$ can be the probability of species presence in a cell or the probability that a habitat cell would be suitable
for the species, and g(x) is the linear combination of environmental factors (Garland et al., 2002; Geist et al., 2000).
Transforming the probability to odds removes the upper bound, and taking the logarithm of the odds removes the
lower bound (Allison, 1999). The logistic regression model is expressed as:
 
$$\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_m x_m \qquad (8)$$

The Maximum Likelihood Method (MLM) is generally used for estimating the parameters of the logistic
regression model. Logistic regression has been used extensively in studies of habitat use by various aquatic species (e.g.
Filipe et al., 2002; Garland et al., 2002; Geist et al., 2000; Guay et al., 2000, 2003; Harvey et al., 2002; Mallet
et al., 2000; Manel et al., 2001; Rosenfeld et al., 2000). These studies have shown that different variables (mainly
depth, velocity, cover and temperature) affect species distribution. Garland et al. (2002) applied logistic regression
to model presence-absence of subyearling salmon in Lake Wallula, USA. The logistic regression model indicated that
substrate was the most important factor determining subyearling habitat use. Guay et al. (2000) successfully
applied logistic regression to model the distribution of juvenile Atlantic salmon (Salmo salar) using the main physical
habitat variables (e.g. velocity, depth, substrate) in the Sainte-Marguerite River in Québec. The logistic regression
produced a habitat probabilistic index (HPI) representing the probability of observing a species under given
physical conditions. It was concluded that HPI may be a more powerful biological model than HSI for predicting local
variations in fish density. Harvey et al. (2002) used logistic regression to examine fish-habitat relationships in small
tributaries of the Eel River, California. For most species, maximum weekly water temperature was the most
important variable affecting fish distribution. Filipe et al. (2002) used logistic regression to identify a combination of
environmental variables that affect species distributions at different sites in the Guadiana river, a semi-arid stream
in Portugal. Their results suggest that the occurrence of species in semi-arid streams can be explained to a reason-
able degree of precision by a few variables. Variables relating to location in the drainage basin (e.g. stream order
and distance to the main river) and geomorphological variables (e.g. elevation, longitudinal slope and vegetation
cover of the banks) are important in discriminating the presence of fish species.
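To make the link between equation (8), maximum likelihood estimation and a probability of presence more concrete, the sketch below fits a logistic regression by Newton-Raphson iterations. It is a minimal illustration assuming NumPy and synthetic presence-absence data; the two predictors (standardized depth and velocity) and the "true" coefficients are hypothetical and are not taken from the studies cited above.

```python
import numpy as np

def predict_presence(X, beta):
    """Inverse logit of the linear predictor: probability of presence."""
    eta = beta[0] + X @ beta[1:]
    return 1.0 / (1.0 + np.exp(-eta))

def fit_logistic(X, y, n_iter=25):
    """Maximum likelihood estimates obtained by Newton-Raphson iterations."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = predict_presence(X, beta)
        score = A.T @ (y - p)                         # gradient of the log-likelihood
        info = A.T @ (A * (p * (1.0 - p))[:, None])   # information matrix
        beta = beta + np.linalg.solve(info, score)
    return beta

# Synthetic presence-absence data (1 = present, 0 = absent).
rng = np.random.default_rng(3)
n = 500
X = rng.normal(size=(n, 2))             # columns: standardized depth, velocity
true_beta = np.array([-0.5, 1.5, -2.0])
y = (rng.random(n) < predict_presence(X, true_beta)).astype(float)

beta_hat = fit_logistic(X, y)
p_hat = predict_presence(X, beta_hat)   # fitted probability for each cell
print(np.round(beta_hat, 2))
```

The fitted probabilities in `p_hat` play the role of a probabilistic habitat index in this toy setting: each value is the estimated probability of presence under the given physical conditions.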

GENERALIZED LINEAR MODELS


Although a powerful approach when appropriately applied, linear regression models are limited by
the following main assumptions: 1) the errors are assumed to be identically and independently distributed; this
includes the assumption that the variance of the response variable is constant across observations (homoscedasticity);
2) model errors are assumed to follow a normal (Gaussian) distribution; and 3) the regression function is linear
in the predictors. These assumptions are often far from reality and data rarely have normal errors (Austin and
Meyers, 1996). When these assumptions are not met, one can use transformations and perform the linear regression
on transformed data. A common motivation for the transformation is to achieve an appropriately stable variance. In
other cases, transformations are used when the random errors do not appear to follow a normal distribution. In
practice, data transformations work reasonably well at times. However, in some cases, it may be impossible for
the same transformation to create normally distributed errors, to stabilize the variance, and to lead to a linear model
(Myers et al., 2002). Modern regression methods like generalized linear models are better suited to analyze these
types of data.
An important statistical development of the last 30 years has been the improvement in regression analysis
provided by the Generalized Linear Model (GLM) and the Generalized Additive Model (GAM). Since their
development, both approaches have been widely used in ecological research due to their ability to deal with the different
types of distributions that characterize ecological data (Guisan et al., 2002). In a GLM, data may be assumed to come
from several families of probability distributions, many of which better fit the non-normal error structures found in
most ecological data. GLMs are more flexible for analysing habitat-species relationships, which can be poorly
represented by the classical Gaussian distribution. All generalized linear models comprise three components:
a response variable distribution, which specifies the distribution of the response Y; a linear predictor, which
specifies the environmental variables used as predictors in the model; and the link function g, which describes the
functional relationship between the linear predictor and the expected value $\mu = E(Y)$ of the response variable. The
GLM relates a function of the mean to environmental variables through a prediction equation of a linear form
(Agresti, 1996; Myers et al., 2002). Using the same notation as for multiple linear regression (i.e. equation 3):
$$g(\mu(x)) = \beta_0 + x^T \beta = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m \qquad (9)$$

The generalized linear model can be used to fit regression models for univariate response data that follow a very
general class of statistical distributions called the exponential family. The exponential family includes the normal,
binomial, Poisson, geometric, negative binomial, exponential, and inverse normal distributions (Myers et al.,
2002). In fact, ordinary linear regression and logistic regression, discussed in previous sections, are special cases
of GLMs. Ordinary linear regression is based on the assumption of a normal distribution and models the mean
directly. A GLM generalizes ordinary regression models in two ways: first, it allows the response variable to have a
distribution other than the normal; second, it can model some function of the mean (Agresti, 1996).
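The three components of a GLM can be illustrated with a count response. The sketch below assumes a Poisson distribution for the response, a linear predictor in two hypothetical habitat variables, and a log link, and estimates the coefficients by iteratively reweighted least squares (IRLS), a standard fitting scheme for GLMs. It relies on NumPy and synthetic data; none of the numbers come from the studies reviewed here.

```python
import numpy as np

def fit_poisson_glm(X, y, n_iter=50, tol=1e-10):
    """Poisson GLM with log link, fitted by iteratively reweighted least squares."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])
    beta[0] = np.log(y.mean())               # start from the intercept-only model
    for _ in range(n_iter):
        eta = A @ beta                       # linear predictor
        mu = np.exp(eta)                     # inverse link: mu = exp(eta)
        z = eta + (y - mu) / mu              # working response
        w = mu                               # working weights for the log link
        beta_new = np.linalg.solve(A.T @ (A * w[:, None]), A.T @ (w * z))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Synthetic fish counts driven by two standardized habitat variables.
rng = np.random.default_rng(5)
n = 300
X = rng.normal(size=(n, 2))                  # hypothetical: depth, velocity
true_beta = np.array([0.5, 0.8, -0.4])
y = rng.poisson(np.exp(true_beta[0] + X @ true_beta[1:])).astype(float)

beta_hat = fit_poisson_glm(X, y)
print(np.round(beta_hat, 2))
```

Replacing the log link and Poisson weights by the logit link and binomial weights would recover logistic regression, which is one way to see why the earlier models are special cases of the GLM.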
GLMs are receiving increased attention in ecological studies as a powerful modelling paradigm for
uncovering patterns in ecological data (Guisan et al., 2002), a fact demonstrated during a recent international
workshop held in Switzerland entitled ‘Advances in GLMs/GAMs modelling: from species distribution to
environmental management’ (conference papers are published in Ecological Modelling: Volume 157, Issue 2–3). Most
GLM applications in ecology have been in vegetation research and analysis of plant species (e.g. Austin and
Meyers, 1996; Cairns, 2001; Cawsey et al., 2002; Yee and Mackenzie, 2002), and applications of GLMs to aquatic
habitat modelling are limited. Labonne et al. (2003) applied GLMs for analyzing fish-habitat relationships in
the Beaume river. The main conclusion was that GLMs can be used as effective tools for the analysis of aquatic
habitat-species relationships, especially when the assumptions of simpler methods cannot be justified.

GENERALIZED ADDITIVE MODELS


Generalized additive models (GAMs) are non-parametric extensions of GLMs (Lehman et al., 2002), which are
themselves a generalization of multiple linear regression (MLR). While GLMs extend the application of classical
regression to other statistical distributions (binomial, Poisson, gamma, negative binomial), GAMs estimate
response curves with non-parametric smoothing functions instead of parametric terms (e.g. $ax + bx^2$).
The only underlying assumption made is that the functions are additive and that the components are smooth. A
GAM, like a GLM, uses a link function to establish a relationship between the mean of the response variable and a
smoothed function of the predictor variables. The generalized additive model approach provides more flexibility in
the model form. It is actually possible to combine a standard generalized linear regression model in some of the
predictors with nonparametric regressions in the others (Myers et al., 2002). The strength of GAMs is their ability
to deal with highly non-linear and non-monotonic relationships between the response and the set of predictor
variables. As with GLMs, the ability of this tool to handle non-linear data structures can aid in the development of habitat
models that better represent the underlying data, and hence increase our understanding of ecological systems.

$$g(\mu(x)) = \beta_0 + f_1(x_1) + \cdots + f_m(x_m) \qquad (10)$$

GAMs extend GLMs by replacing the linear predictor with a sum of smooth functions of the individual environmental
variables. These smooth functions are most often estimated by cubic smoothing splines (Yee and Mackenzie, 2002).
GAMs have been used in numerous ecological applications to predict species distribution as a function of their
environment, for example in modelling of forest biota (e.g. Austin and Meyers, 1996) and in predicting vegetation
distribution along environmental gradients (e.g. Lehman et al., 2002).
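The additive structure of equation (10) can be illustrated with the classical backfitting idea: each smooth function is re-estimated in turn by smoothing the partial residuals against its own predictor. The sketch below is deliberately simplified and rests on several assumptions: a Gaussian response with identity link, a Gaussian-kernel (Nadaraya-Watson) smoother swapped in for the cubic smoothing splines mentioned above, a fixed hypothetical bandwidth, and synthetic data. It requires NumPy.

```python
import numpy as np

def kernel_smooth(x, r, bandwidth):
    """Nadaraya-Watson smoother of r against x with a Gaussian kernel."""
    d = (x[:, None] - x[None, :]) / bandwidth
    w = np.exp(-0.5 * d ** 2)
    return (w @ r) / w.sum(axis=1)

def backfit(X, y, bandwidth=0.3, n_iter=20):
    """Fit y = b0 + f1(x1) + ... + fm(xm) by backfitting (identity link)."""
    n, m = X.shape
    b0 = y.mean()
    f = np.zeros((n, m))                      # f[:, j] holds f_j at the data points
    for _ in range(n_iter):
        for j in range(m):
            partial = y - b0 - f.sum(axis=1) + f[:, j]   # residual without f_j
            f[:, j] = kernel_smooth(X[:, j], partial, bandwidth)
            f[:, j] -= f[:, j].mean()         # keep each component centred
    return b0, f

# Synthetic response with a non-monotonic and a quadratic effect.
rng = np.random.default_rng(6)
n = 200
X = rng.uniform(-2.0, 2.0, size=(n, 2))
y = np.sin(2.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.2, size=n)

b0, f = backfit(X, y)
resid = y - b0 - f.sum(axis=1)
print(round(float(np.var(resid)), 3), round(float(np.var(y)), 3))
```

Plotting `f[:, j]` against `X[:, j]` gives the estimated response curve for each variable, which is how fitted GAMs are usually inspected since no closed-form equation is produced.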
A few applications of GAMs to aquatic habitat modelling are also reported in the literature. Knapp and Preisler
(1999) used nonparametric logistic regression, a subclass of GAMs, to predict habitat use of spawning salmonids
in a stream located in southern California from channel and microhabitat variables. A parametric model was
then developed from the nonparametric model. The nonparametric model explained 62% of the variation in spawning
location. Four of the eight habitat variables (substrate size, water depth, water velocity and stream width) explained
59% of the variation in spawning location. Lehman (1998) used GAMs to identify responses of aquatic macrophytes
to environmental factors in Lake Geneva (Switzerland). Milner et al. (2001) applied a GAM to predict
macroinvertebrate richness at the reach level across glacier-fed river sites. Brosse and Lek (2002) compared
the predictive ability of GAMs and Artificial Neural Networks (ANNs, see following section) to model fish
abundance in Lake Pareloup, France. It was concluded that ANNs are better suited to model the non-linear
relationships. Although GAMs offer many interesting options for developing statistical habitat models, several authors
have cautioned about their use. One potential drawback of GAMs, and other non-parametric methods such as ANNs, is
that they do not produce a parametric function of the model. This is particularly a problem when the aim is to build
a spatial prediction model using GIS (Lehman, 1998; Lehman et al., 2002). Myers et al. (2002) recommended
using GAMs only when no simpler model (e.g. linear regression) can provide an adequate fit to the data.

ARTIFICIAL NEURAL NETWORKS


Species habitat relationships are often governed by complex and non-linear interactions among various
environmental variables. To deal with this complexity, different transformations (e.g. logarithmic, power, or exponential
functions) are used. These transformations can appreciably improve the results, but have often failed to fit the data
(Brosse and Lek, 2002; Lek et al., 1996). Artificial neural networks (ANNs) are one of the various methods proposed
to overcome these problems. ANNs are receiving increased attention in ecological studies as a
powerful modelling paradigm for uncovering patterns in ecological data (Olden and Jackson, 2002a).
ANNs are computational models that consist of a set of processing elements (i.e. nodes or neurons) and
connections between them, as well as a training algorithm (Tambe et al., 1996). A suitably trained neural network has
the interesting property of being able to generalize from the training set, and provide reasonably accurate
predictions when presented with new inputs that were not part of the original training set. Neural networks are fitted to
applications through a computer-intensive methodology of learning from examples that requires relatively little
prior knowledge of the essential basis of the problem (Fine, 1999). ANNs are applied in many fields of science
and engineering owing to their attractive computational properties (Tambe et al., 1996): 1) suitability for nonlinear
systems, 2) ability to handle multivariable systems, 3) learning, adaptation and generalization, 4) fault tolerance
and 5) data compression. There are different types of neural networks, which differ from one another in
architecture and training algorithms. The multi-layer feed-forward neural network is one of the most widely used (Brosse
et al., 1999). It has been proven that these networks with at least one hidden layer can approximate any continuous
nonlinear function. This class of ANNs consists of three types of neural layers: an input layer, one or more hidden
layers, and an output layer (Figure 1). The input layer is made up of predictor variable nodes (neurons) and a bias
node used during neural network training. The hidden layer is the location where the neural network is trained.

Figure 1. Feed forward Artificial Neural Network

Feed forward networks have been extensively used, mainly as nonlinear function approximation and
classification tools (Tambe et al., 1996). A feed forward neural network model is presented with a set of example cases
(input and output cases) and the network connection weights and biases are optimized so that the output units
match the example cases as closely as possible. The examples are referred to as the ‘training set’. There are
several learning procedures for adjusting the weights, of which the most popular type is back propagation
(Fine, 1999). Back propagation is essentially a neural network implementation of gradient descent and
requires the specification of some additional parameters: the learning rate (which basically represents the search
step size) and, optionally, the momentum, which takes into account the effect of previous search steps in order to
lessen over-sensitivity to the training data.
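The training procedure described above can be sketched for a network with a single hidden layer. The example below is a minimal illustration, assuming NumPy, a tanh hidden layer, one linear output node, full-batch gradient descent with momentum, and synthetic data; the number of hidden nodes, learning rate, momentum and number of epochs are arbitrary illustrative values, not recommendations from the cited literature.

```python
import numpy as np

def train_ann(X, y, n_hidden=4, lr=0.01, momentum=0.9, n_epochs=3000, seed=0):
    """One-hidden-layer feed-forward network trained by back propagation."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    params = [rng.normal(scale=0.5, size=(m, n_hidden)),  # input -> hidden weights
              np.zeros(n_hidden),                         # hidden biases
              rng.normal(scale=0.5, size=n_hidden),       # hidden -> output weights
              np.zeros(1)]                                # output bias
    velocity = [np.zeros_like(p) for p in params]
    losses = []
    for _ in range(n_epochs):
        W1, b1, W2, b2 = params
        H = np.tanh(X @ W1 + b1)              # forward pass: hidden activations
        out = H @ W2 + b2                     # linear output node
        err = out - y
        losses.append(float(np.mean(err ** 2)))
        d_out = 2.0 * err / n                 # derivative of MSE w.r.t. output
        d_hidden = np.outer(d_out, W2) * (1.0 - H ** 2)   # back-propagated error
        grads = [X.T @ d_hidden, d_hidden.sum(axis=0),
                 H.T @ d_out, np.array([d_out.sum()])]
        # Momentum: each step blends the previous step with the new gradient.
        velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
        params = [p + v for p, v in zip(params, velocity)]
    return params, losses

def predict(X, params):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

# Synthetic training set: a nonlinear response to two habitat variables.
rng = np.random.default_rng(4)
X = rng.uniform(-2.0, 2.0, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

params, losses = train_ann(X, y)
print(round(losses[0], 3), round(losses[-1], 3))
```

In practice the fitted network would be checked on data held out from training, since a low training error alone says little about the generalization ability discussed above.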
Applications of ANNs in ecological and environmental sciences have been reported since the beginning of
the 1990s (Lee et al., 2003). Feed forward ANNs have been used in stream ecology for modelling fish species richness,
density and biomass of various fish populations (e.g. Gozlan et al., 1999; Ibarra et al., 2003; Lek and Guégan,
1999; Lek et al., 1996; Mastrorillo et al., 1997; Olden and Jackson, 2001; Olden and Jackson, 2002a, 2002b).
The general conclusion is that ANNs are very well suited for predicting fish abundance and analysing nonlinear
associations in aquatic habitat data. In all applications, only one hidden layer was sufficient to model the response
variables. The number of nodes in the hidden layer varied between two and eight depending on the type of application.
Single hidden layer networks are popular as they are considered to be universal approximators of any
continuous function. Furthermore, computational time is greatly reduced compared to multiple-layer ANNs (Olden
and Jackson, 2001).
Although ANNs have been proven to be powerful modelling tools in ecological sciences, researchers have often
criticized the explanatory value of ANNs, calling it a ‘black box’ approach to modelling ecological phenomena
(Lek and Guégan, 1999; Olden and Jackson, 2002a). This view is based on the fact that the contribution of the input
variables in predicting the output is difficult to disentangle within the network, and explanations regarding the
relative importance of each independent variable are not as straightforward as in the case of linear regression
methods (Olden and Jackson, 2002a; Laë et al., 1999). Nevertheless, there are methods available for studying
the contribution of environmental factors in ANNs; these methods have been the subject of a few recent articles in
ecology (Gevrey et al., 2003; Olden and Jackson, 2002a). Gevrey et al. (2003) presented and discussed several
methods for studying the contribution of environmental factors. The use of expert opinion in the selection of
physical variables is underlined. They recommend that, if available, expert opinion regarding the ranking of
importance of input variables be taken into consideration. If an ecologist’s opinion is unavailable, the results
of two of these methods, PaD (a method which calculates the partial derivatives of an ANN output with respect to an
input) and Perturb (a method that assesses the effect of small changes in each input on the ANN output), should be
compared. If the results of these two methods differ, the network may be poorly calibrated or the data may be very
difficult to analyse.

FUZZY-RULE BASED MODELLING


Uncertainties often exist in ecological modelling. A large inherent uncertainty of ecological data results from the
presence of random variables, incomplete or inaccurate data, and the use of approximate estimations instead of direct
measurements (Salski, 2003). One way of handling the uncertainty is to use the fuzzy approach. Compared to
conventional methods, fuzzy logic allows for a better use of imprecise and uncertain measurements and vague expert
knowledge in two ways: 1) the representation and handling of imprecise data defined as fuzzy sets, and 2) the
representation and processing of vague expert knowledge in the form of linguistic rules with imprecise terms defined as
fuzzy rules.
Fuzzy set theory (Zadeh, 1965) is an extension of classic set theory, and is built around the central concept
of fuzzy sets or membership functions. Fuzzy set theory enables the processing of imprecise information by means
of membership functions, in contrast to Boolean characteristic mappings (Zadeh, 1965). The conventional
characteristic mapping of a classical set (sometimes called crisp set) takes only two values: one, when an element
belongs to the set; and zero, when it does not. In fuzzy set theory, an element can belong to a fuzzy set with its
membership degree varying from zero to one (Adriaenssens et al., 2004). The fuzzy approach is particularly useful
for the representation of uncertainty in habitat modelling and it allows working with imprecise or fuzzy informa-
tion. These aspects present a significant advantage in habitat modelling as expert knowledge often readily available
from experienced fish biologists can be easily transferred into preference data sets (Jorde et al., 2001). Applica-
tions of fuzzy modelling in stream ecology are reported for habitat quality and instream flow assessment (Jorde
et al., 2001; Kerle et al., 2002; Schneider and Peter, 1999). The application of fuzzy logic modelling to habitat
models requires that fuzzy rules be used to consider combination of different physical factors (e.g. depth, substrate,
velocity and cover). Using a fuzzy approach has the advantage that the rules used for determining the suitability
can be expressed verbally (e.g. if the water depth is medium, the flow velocity is relatively low, substrate is small
and no cover is provided then suitability for a given species is medium), and different combinations of variables
can be considered in a manner deemed closer to the human way of thinking and communicating (Jorde et al., 2001;
Kerle et al., 2002). In the fuzzy approach, habitat suitability and the inputs (i.e. depth, velocity, substrate) are
first subdivided into different classes (e.g. low, medium, good and very good). The model is then built as follows. With
crisp input numbers from the hydraulic model (water depth, flow velocity), the fuzzy model first calculates the
degrees of membership of these parameters (Figure 2). In the next step, the degree of fulfilment of each fuzzy rule
is analyzed. Then, the fuzzy sets of the output variable (HSI) are weighted with these degrees of fulfilment and
combined into a final fuzzy set. In a last step called defuzzification, the final fuzzy set is transformed back into a
standardized number describing the HSI, defined between 0 and 1, representing unsuitable and most suitable,
respectively. The fuzzy rule-based approach has the following advantages: 1) it allows the numerical processing of
qualitative knowledge about fish habitat; 2) it considers multivariate effects without requiring independence of the
input parameters; 3) new parameters can be added easily, which allows the inclusion of numerous combinations
of physical parameters in habitat simulation tools; and 4) it is easy to implement (Jorde et al., 2001; Kerle
et al., 2002). Jorde et al. (2001) used fuzzy logic in a habitat simulation model (CASIMIR) for fish habitat
evaluation of several rivers in Switzerland and concluded that observed fish densities show a higher correlation
with fuzzy-based simulations than with those based on preference functions.
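The sequence of fuzzification, rule evaluation and defuzzification described above can be sketched in plain Python. Everything specific in this example is an assumption made for illustration: the class breakpoints for depth and velocity, the nine-rule table, the use of the minimum operator for the degree of fulfilment, and a defuzzification by weighted average of class centres (a simplification of combining and defuzzifying full output fuzzy sets). It is not the CASIMIR implementation.

```python
def ramp_down(x, a, b):
    """Membership 1 below a, falling linearly to 0 at b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)

def ramp_up(x, a, b):
    """Membership 0 below a, rising linearly to 1 at b."""
    return 1.0 - ramp_down(x, a, b)

def triangle(x, a, b, c):
    """Membership 0 outside (a, c), peaking at 1 when x == b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(x, a, b, c):
    """Degrees of membership of a crisp value in three overlapping classes."""
    return {"low": ramp_down(x, a, b),
            "medium": triangle(x, a, b, c),
            "high": ramp_up(x, b, c)}

# Hypothetical expert rules: (depth class, velocity class) -> HSI class.
RULES = {("low", "low"): "low", ("low", "medium"): "medium", ("low", "high"): "low",
         ("medium", "low"): "medium", ("medium", "medium"): "high",
         ("medium", "high"): "medium", ("high", "low"): "medium",
         ("high", "medium"): "high", ("high", "high"): "low"}
HSI_CENTRE = {"low": 0.0, "medium": 0.5, "high": 1.0}

def fuzzy_hsi(depth, velocity):
    """Habitat suitability index in [0, 1] from crisp depth (m) and velocity (m/s)."""
    d = fuzzify(depth, 0.2, 0.5, 0.8)       # assumed depth breakpoints
    v = fuzzify(velocity, 0.2, 0.6, 1.0)    # assumed velocity breakpoints
    num = den = 0.0
    for (d_class, v_class), out_class in RULES.items():
        fulfilment = min(d[d_class], v[v_class])   # "depth is A AND velocity is B"
        num += fulfilment * HSI_CENTRE[out_class]
        den += fulfilment
    return num / den                         # weighted average of class centres

print(fuzzy_hsi(0.5, 0.6), fuzzy_hsi(0.1, 0.1), round(fuzzy_hsi(0.35, 0.6), 3))
```

Adding a further input such as substrate would only require one more call to `fuzzify` and an extended rule table, which reflects the ease of adding parameters noted in the text.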

Figure 2. Fuzzy sets describing membership functions for habitat variables and habitat suitability index (adopted from Jorde et al., 2001)

STATISTICAL ANALYSIS OF MULTI-SPECIES AND COMMUNITIES


The statistical predictive models discussed in the previous sections generally focus on a particular species
and how this species is related to its surrounding environment. These methods are not well suited to the
simultaneous analysis of multiple species, as a distinct model must be developed for each species and many models
are needed. For example, when regression methods are applied to multi-species data, an individual regression
model must be developed for each species. Each regression focuses on a particular species and how this species is
associated with environmental variables (Jongman et al., 1995). Other classes of multivariate statistical methods are
needed for the analysis of multiple species at multiple sites. These methods, which quantify the degree of associa-
tion between sets of environmental variables and fish assemblage structures, are generally called ordination or
gradient analysis in ecology literature. Ordination methods are multivariate statistical methods designed to analyse
ecological data sets of species occurrence and environmental variables over numerous sites by ordering environ-
mental entities such as species (or sites) in such a way that similar entities are placed close to one another and
dissimilar entities farther apart (Jongman et al., 1995). Conclusions concerning environmental factors that affect
the species are drawn from relative proximity of these entities. Two categories of ordination methods can be dis-
tinguished: (1) direct and (2) indirect. Indirect ordination methods extract dominant, orthogonal axes of variation
in abundance indices for multiple species at multiple sites based on species data and without direct consideration of
environmental variables in the process. Typically, the ordination axes are then interpreted indirectly with the use of
environmental data, either qualitatively or with regression methods (Gauch, 1982). PCA and Correspondence
Analysis (CA) are two of the more widely used indirect methods. Correspondence Analysis calculates site and species
‘scores’ based on reciprocal averaging. Reciprocal averaging uses weighted averages, weighted either by species
abundances or by known species environmental optima or tolerances. For community ecology data, CA generally
performs better than principal component analysis, owing to the unimodal response of species abundance to
environmental gradients (Jongman et al., 1995). In contrast to indirect gradient analysis methods, direct gradient
analysis methods consider observed environmental variables directly in calculation of ordination axes. Direct gradient
analysis methods are designed to detect the pattern of species abundance that can best be explained by observed


environmental variables (Jongman et al., 1995), and use a supplied matrix of predictor (i.e. environmental)
variables to quantify the variation in a matrix of response variables (Jongman et al., 1995; Wang et al., 2003). This
property makes direct ordination methods a useful tool in aquatic habitat modelling of multiple species. Canonical
Correspondence Analysis (CCA) and Redundancy Analysis (RDA) are two direct gradient analysis techniques
that have been used in statistical habitat modelling (Jongman et al., 1995). Two types of species responses to
an environmental gradient can be assumed in habitat modelling: the linear and the unimodal response. However,
linear responses are valid only for short gradients (e.g. a limited range of an environmental factor). Nonlinear
(unimodal) responses are more appropriate for modelling aquatic habitat, as aquatic species abundance usually
has an upper and a lower limit of occurrence and an optimum value along gradients. For example, fish species
are rare at very low and high velocities and are more abundant at medium velocities (Heggenes, 2002; Vismara
et al., 2001). In RDA, species responses are assumed to be linear along an environmental gradient, whereas CCA is
based on nonlinear (unimodal) responses. This property makes CCA more appropriate in habitat modelling. In CCA, the
species ordination is done directly and iteratively in relation to supplied environmental variables. CCA relates species
abundance to measured variation in the environment by requiring the ordination axes to be linear combinations of
the environmental variables (Ter Braak and Verdonschot, 1995). The assumed distribution of species is Gaussian,
with an upper and a lower limit of occurrence and an optimum along gradients. CCA estimates a series of site
scores that are linear combinations of the environmental variables that maximize the species-environment correla-
tion. CCA has been widely used for analysis of multiple aquatic species: for different fish species (e.g. Fladung
et al., 2003; Godinho et al., 2000; Grift et al., 2003; Ostrand and Wilde, 2002; Taylor, 2000; Toepfer et al., 1998;
Watkins et al., 1997), for macroinvertebrate species (Cortes et al., 2002; Miserendino, 2001; Ruse and Davison,
2000), and for macrophyte vegetation along rivers (Bernez et al., 2004). In these studies, CCA determines which
physical habitat variables influence the spatial and temporal abundance of aquatic biota. For example, Grift et al. (2003)
used CCA to analyse the spatial distribution of yearling fish along gradients of environmental variables. The
analysis resulted in a set of environmental variables (flow velocity, depth, temperature, Secchi-disc depth,
inundated terrestrial vegetation, and substrate type) that were significantly associated with species occurrence. The
results of CCA are generally presented in a diagram (ordination bi-plot), on which species are represented by
points and environmental variables by arrows. The directions and relative lengths of the arrows for
environmental variables represent their contributions to the ordination. Model calibration of CCA is very close to multiple
regression, except that the goal is to minimize the ratio of the mean within-species sum of squares of the variate to the
overall sum of squares (Guisan et al., 1999). Variables can also be selected in a stepwise approach, in
which the environmental variables that successively explain the highest proportion of variance
in the species data as a whole are retained. CCA fit is measured both by the trace of the underlying correspondence
analysis (i.e. the total variance in the species data) and by the proportion of variance in the species data explained
by each canonical axis.
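The quantities named in the preceding paragraph (site scores that are linear combinations of the environmental variables, the total variance of the underlying correspondence analysis, and the proportion explained by each canonical axis) can be sketched with NumPy. The sketch follows the common eigen-analysis formulation of CCA (chi-square standardized residuals, weighted regression on the environmental variables, then a singular value decomposition) rather than the iterative algorithm mentioned in the text. The function name `cca`, the simulated velocity and depth gradients and the Gaussian response parameters are hypothetical.

```python
import numpy as np


def cca(Y, X):
    """Sketch of canonical correspondence analysis.

    Y: sites x species abundance matrix (no empty site or species).
    X: sites x environmental-variables matrix.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    P = Y / Y.sum()
    r = P.sum(axis=1)                      # site weights
    c = P.sum(axis=0)                      # species weights
    expected = np.outer(r, c)
    Q = (P - expected) / np.sqrt(expected)  # chi-square standardized residuals
    total_inertia = float((Q ** 2).sum())   # trace of the underlying CA
    Xc = X - r @ X                          # centre on the weighted means
    Xw = Xc * np.sqrt(r)[:, None]           # apply site weights
    coef, *_ = np.linalg.lstsq(Xw, Q, rcond=None)
    Q_fit = Xw @ coef                       # part of Q explained by X
    U, s, _ = np.linalg.svd(Q_fit, full_matrices=False)
    keep = s > 1e-8                         # drop numerically null axes
    eigenvalues = s[keep] ** 2
    # Site scores that are linear combinations of the environmental variables.
    site_scores = U[:, keep] / np.sqrt(r)[:, None]
    return {"eigenvalues": eigenvalues,
            "total_inertia": total_inertia,
            "explained": eigenvalues / total_inertia,
            "site_scores": site_scores,
            "site_weights": r}


# Hypothetical data: five species with Gaussian (unimodal) responses to velocity.
rng = np.random.default_rng(0)
velocity = np.linspace(0.0, 1.0, 30)
depth = rng.uniform(0.2, 1.5, 30)          # unrelated to the simulated species
optima = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
tolerance = 0.15
Y = np.round(20 * np.exp(-(velocity[:, None] - optima) ** 2
                         / (2 * tolerance ** 2))) + 1  # +1 avoids empty rows
result = cca(Y, np.column_stack([velocity, depth]))
print(result["explained"])
```

The number of canonical axes cannot exceed the number of environmental variables, and the sum of the canonical eigenvalues cannot exceed the total inertia, which is why the proportions are often reported alongside the trace.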

COMPARATIVE STUDIES
Despite the variety of statistical models available for predictive modelling of aquatic species distribution, few
studies directly compare different methods on a common data set. Comparative studies that use more than two
statistical models for aquatic habitat modelling are very limited, and most published studies use only one of several
available statistical methods. Most comparative studies concentrate on the comparison of ANNs and regression-based
methods (i.e. MLR, logistic regression). Several researchers have compared the predictive ability of ANNs
and MLR for modelling fish abundance (Brosse et al., 1999; Manel et al., 1999; Olden and Jackson, 2001; Olden
and Jackson, 2002b). A few studies have compared the predictive power of ANNs and logistic regression in modelling fish
presence-absence (Manel et al., 1999; Olden and Jackson, 2002b).
Brosse et al. (1999) used MLR, GAM and backpropagation ANNs to model fish abundance in a French reser-
voir. Eight independent environmental variables (depth, distance from the bank, slope of the bottom, flooded vege-
tation cover, percentage of boulders, percentage of pebbles, percentage of gravel and percentage of mud) were used
to model abundance of six main fish populations of Lake Pareloup in the southwest of France. The GAM models
had higher correlation coefficients with the observed data. The improvement obtained by using GAM models


indicated the existence of nonlinear relationships between dependent variables (i.e. abundance of fish populations)
and independent variables. ANN models performed better than both MLR and GAM models. Ibarra et al. (2003)
compared MLR and ANN in modelling fish guilds in rivers of the Garonne Basin, France. Five environmental variables
(i.e. altitude, distance from the river source, surface of catchment area, annual mean water temperature and annual
mean water flow) were used to predict presence-absence of fish. Results showed that ANNs performed better than
multiple regression models for predicting species richness of guilds. Olden and Jackson (2002b) compared different
methods for predicting presence-absence of 27 fish species in 286 lakes located in Ontario, Canada. Logistic
regression, linear discriminant analysis and ANNs were the methods used to model presence-absence and
abundance of the fish. On average, ANNs outperformed the other models, although all approaches had moderate to
excellent results for predicting species presence-absence. The same methods were applied to two simulated data sets
using linear and nonlinear species response curves. For simulated linear data, all methods showed similar
classification rates, while ANNs outperformed the other methods for nonlinear simulated data. It was concluded that a
nonlinear modelling approach (i.e. ANNs) is better able to capture complex patterns in ecological data. Olden and
Jackson (2001) compared ANNs and logistic regression for predicting species presence-absence in 160 lakes in the
province of Ontario (Canada). It was demonstrated that ANNs outperformed logistic regression models and had
correct classification rates greater than or equal to those of logistic models for most fish-habitat models. Manel et al. (1999)
compared the performance of logistic regression, ANNs and discriminant analysis in predicting the presence-absence
of a common river bird from 32 different independent variables describing habitat structure, stream, altitude, slope,
chemistry and invertebrate abundance. Their results contrast with other studies, as it was concluded that ANNs do
not have any major advantages over logistic regression and discriminant analysis. Results of these comparative
studies demonstrate the capability of ANNs to model nonlinear relationships existing between aquatic species
and their surrounding environment and introduce ANNs as a promising alternative to traditional statistical
approaches. Comparative studies that assess the performance of other statistical methods for modelling of aquatic
habitat were not found. However, an example is available in plant ecology for the comparison of CCA and GLM.
Guisan et al. (1999) compared the predictive power of GLM and CCA models of plant distribution in the Spring
Mountains of Nevada, USA. Results indicated that GLM models performed better than CCA models, as specific
subsets of environmental variables can be selected for each individual species, whereas in CCA all species are
modelled using the same set of environmental variables. Results of CCA can be more easily implemented in a GIS for
many species at once. Predictions of CCA are not probabilistic but are expressed as a distance from the centroid of
each species. An additional step can be taken by using a logistic regression of ordination axes versus environmental
variables. Results of GLM are probabilistic, but a distinct regression model must be developed for each species.
This makes the implementation of GLM models for many species less straightforward than with CCA. In general,
GLM provides better species-specific models, but CCA provides a broader overview of multiple species, diversity
and species communities.
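The kind of comparison on simulated linear and nonlinear response curves summarized above can be sketched as follows. This is not a reproduction of any cited study: the simulated data, the coefficients and the sample sizes are hypothetical, and a logistic regression with an added quadratic term is used here as a simple stand-in for a nonlinear method (it is not an ANN). Both models are fitted by Newton iterations written directly in NumPy and scored by the correct classification rate on held-out data.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))


def fit_logistic(X, y, n_iter=30):
    """Maximum-likelihood logistic regression by Newton iterations."""
    Xb = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ beta)
        w = p * (1.0 - p)
        hessian = Xb.T @ (Xb * w[:, None]) + 1e-8 * np.eye(Xb.shape[1])
        beta = beta + np.linalg.solve(hessian, Xb.T @ (y - p))
    return beta


def classification_rate(beta, X, y):
    """Share of sites whose presence-absence is predicted correctly."""
    Xb = np.column_stack([np.ones(len(y)), X])
    predicted_present = sigmoid(Xb @ beta) >= 0.5
    return float(np.mean(predicted_present == (y == 1)))


def simulate(kind, n, rng):
    """Presence-absence along one standardized gradient (hypothetical)."""
    x = rng.uniform(-3.0, 3.0, n)
    eta = 1.5 * x if kind == "linear" else 2.0 - 2.0 * x ** 2
    y = (rng.random(n) < sigmoid(eta)).astype(float)
    return x, y


def compare(kind, seed=1):
    rng = np.random.default_rng(seed)
    x_train, y_train = simulate(kind, 1000, rng)
    x_test, y_test = simulate(kind, 1000, rng)
    designs = {"linear terms": lambda x: x[:, None],
               "with quadratic term": lambda x: np.column_stack([x, x ** 2])}
    rates = {}
    for name, design in designs.items():
        beta = fit_logistic(design(x_train), y_train)
        rates[name] = classification_rate(beta, design(x_test), y_test)
    return rates


print("linear response:  ", compare("linear"))
print("unimodal response:", compare("unimodal"))
```

Under these assumptions the two models classify the linear data about equally well, while only the model that can represent a unimodal response does well on the unimodal data.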

SUMMARY AND DISCUSSION


This study provides an overview of existing methods and compares the relative advantages and drawbacks of
statistical methods used in aquatic habitat modelling. Statistical models are important tools in the prediction of species
distributions and abundance based on relevant environmental variables. A variety of statistical methods are already
in use to model the aquatic species-environment relationship. It was found that most of the statistical models
reported in the literature are based on multiple linear regression and logistic regression which have been used
for analysis of aquatic species density (mostly fish) as well as presence-absence. Other statistical methods
reviewed in this paper present advantages in different contexts (Table I). The choice of the appropriate model
of species-environment relationship depends on the goals and resources of study and especially on the types of
measured environmental and response variables. In situations where environmental variables are numerous and
are highly correlated (multicollinearity), principal component regression has been used. Principal component
regression allows for the use of new orthogonal uncorrelated components derived from environmental variables
in modelling species abundance. In this review, ridge regression is proposed as a novel approach to deal with
multicollinear data. Ridge regression allows the use of correlated environmental variables and provides more stable


Table I. Statistical approaches with application examples

Multiple linear regression
  Advantages: Straightforward to fit to observations and to analyze abundance data. Widely available in most computer packages.
  Disadvantages: Not appropriate when assumptions of normality, linear relationship, and constant variance of errors (homoscedasticity) are violated. The estimated parameters are not stable when the variables are highly correlated.
  Examples of habitat modelling applications: Annoni et al., 1997; Braaten and Guy, 1999; Brosse and Lek, 2002; Reash and Pigg, 1990; Vadas and Orth, 2001; Vismara et al., 2001; Yu and Lee, 2002

Logistic regression
  Advantages: Easy to use for the analysis of dichotomous (presence-absence) data and is implemented in many software packages.
  Disadvantages: Better adapted to dichotomous data and hence reduces the information available from continuous variables.
  Examples of habitat modelling applications: Filipe et al., 2002; Fladung et al., 2003; Garland et al., 2002; Geist et al., 2000; Guay et al., 2000, 2003; Manel et al., 2001

Generalized linear models
  Advantages: Not limited to normally distributed variables. Offers flexibility to choose different types of distribution for responses.
  Disadvantages: For small samples (less than 20), maximum-likelihood estimates are biased.
  Examples of habitat modelling applications: Labonne et al., 2003

Generalized additive models
  Advantages: Considers nonlinear relationship between response variables and environmental variables.
  Disadvantages: Does not produce a parametric mathematical function.
  Examples of habitat modelling applications: Knapp and Preisler, 1999; Lehman, 1998; Milner et al., 2001

Artificial neural networks
  Advantages: Ability to implicitly detect complex nonlinear relationships between response variables and environmental variables.
  Disadvantages: "Black-box" approach: contribution of the input variables in predicting species response is difficult to disentangle within the network (less straightforward than regression methods).
  Examples of habitat modelling applications: Gozlan et al., 1999; Ibarra et al., 2003; Lek et al., 1996; Mastrorillo et al., 1997; Olden and Jackson, 2002a, 2002b

Fuzzy logic
  Advantages: Knowledge from imprecise and uncertain measurements and vague expert knowledge can be considered.
  Disadvantages: The number of fuzzy rules increases rapidly as more parameters are considered.
  Examples of habitat modelling applications: Jorde et al., 2001; Kerle et al., 2002

Canonical correspondence analysis
  Advantages: Provides a broad overview of multiple species, diversity, and species communities. Models are more easily implemented for many species at once.
  Disadvantages: Does not provide the best specific models for each species.
  Examples of habitat modelling applications: Godinho et al., 2000; Grift et al., 2003; Ostrand and Wilde, 2002; Taylor, 2000; Watkins et al., 1997

Ridge regression
  Advantages: Can handle highly correlated environmental variables (multicollinearity). Is beneficial when the amount of data is not large relative to number of variables.
  Disadvantages: The estimates are biased.
  Examples of habitat modelling applications: Not used yet

regression coefficients than ordinary least squares regression. Where the assumptions of normality inherent in
linear regression cannot be justified, modern regression modelling paradigms like GLM are more appropriate. GLMs
are a more flexible family of regression models, which allow other distributions for the response variable. The use of
ANNs and GAMs is preferable when the species-habitat relationships are nonlinear, which is the case for most
commonly used data. ANNs are a promising area of predictive statistical habitat modelling owing to their attractive


computational properties: 1) suitability for nonlinear systems, 2) ability to handle multivariable systems, 3)
learning, adaptation and generalization, 4) fault tolerance and 5) data compression. GAMs are used to implement
non-parametric smoothers in regression models. This technique applies smoothers independently to each predictor and
additively calculates the component response. A drawback of GAMs and ANNs, both non-parametric methods, is
that they do not provide users with a conventional mathematical function, which makes it difficult to export a model. This
problem becomes particularly evident when the aim is to build a spatial prediction in a GIS.
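The stabilizing effect of ridge regression under multicollinearity, mentioned above, can be illustrated with a small simulation. The sketch assumes the standard ridge estimator, (X'X + λI)⁻¹X'y, applied to standardized predictors; the variable names, the simulated depth and velocity values, the sample size and the value λ = 1 are hypothetical choices for illustration.

```python
import numpy as np


def standardize(X):
    """Centre each column and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def ols_fit(X, y):
    """Ordinary least squares on centred response."""
    return np.linalg.lstsq(X, y - y.mean(), rcond=None)[0]


def ridge_fit(X, y, lam):
    """Ridge estimator: (X'X + lam * I)^-1 X'y on centred response."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ (y - y.mean()))


rng = np.random.default_rng(42)
ols_coefs, ridge_coefs = [], []
for _ in range(200):                                 # repeated simulated surveys
    depth = rng.normal(size=40)
    velocity = depth + 0.05 * rng.normal(size=40)    # almost collinear with depth
    X = standardize(np.column_stack([depth, velocity]))
    abundance = depth + velocity + rng.normal(size=40)
    ols_coefs.append(ols_fit(X, abundance))
    ridge_coefs.append(ridge_fit(X, abundance, lam=1.0))

# Spread of each coefficient across the simulated surveys.
ols_sd = np.std(ols_coefs, axis=0)
ridge_sd = np.std(ridge_coefs, axis=0)
print("OLS coefficient SD:  ", ols_sd)
print("Ridge coefficient SD:", ridge_sd)
```

The ridge coefficients vary much less from one simulated survey to the next, at the cost of some bias, which matches the trade-off listed for ridge regression in Table I. In practice λ would be chosen from the data, for example by cross-validation.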
The fuzzy logic approach presents advantages in dealing with the uncertainties that often exist in habitat
modelling. The fuzzy approach allows better use of expert knowledge and of imprecise and uncertain measurements in the
modelling. Ordination methods can be used for the analysis of multi-species data; the most useful of these is CCA, a
direct gradient analysis that allows for unimodal responses and uses environmental variables directly in the
calculation of ordination axes. Finally, the few existing comparative studies for predictive modelling were reviewed;
however it was noted that there is a shortage of comprehensive comparative studies (such as Olden and Jackson,
2002b; Manel et al., 1999) in which more than two statistical methods are applied to the same aquatic species data,
and most published papers use only one of the many statistical techniques. There is a need for more comparative
studies as the choice of an appropriate statistical method would be much facilitated with an increasing number of
studies that compare the advantages and disadvantages of different methods in a particular context.

REFERENCES

Adriaenssens V, De Baets B, Goethals P, De Pauw N. 2004. Fuzzy rule-based models for decision support in ecosystem management. The
Science of the Total Environment 319: 1–12.
Afifi AA, Clark V. 1996. Computer-Aided Multivariate Analysis. Chapman & Hall: New York.
Agresti A. 1996. An Introduction to Categorical Data Analysis. John Wiley & Sons: New York.
Akaike H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control 19: 716–723.
Allan JD. 1995. Stream Ecology: Structure and Function of Running Waters. Chapman & Hall: London.
Allison PD. 1999. Logistic Regression Using the SAS System: Theory and Application. SAS Institute Inc: Cary, NC.
Annear T, Chisholm I, Beecher H, Locke A, Aarrestad P, Burkardt N, Coomer C, Estes C, Hunt J, Jacobson R, Jobsis G, Kauffman J, Marshall J,
Mayes K, Stalnaker C, Wentworth R. 2004. Instream Flows for Riverine Resource Stewardship, Revised edition. Instream Flow Council:
Cheyenne, WY; 268.
Annoni P, Saccardo I, Gentili G, Guzzi L. 1997. A multivariate model to relate hydrological, chemical and biological parameters to salmonoid
biomass in Italian Alpine rivers. Fisheries Management and Ecology 4(6): 439–452.
Armstrong JD, Kemp PS, Kennedy GJA, Ladle M, Milner NJ. 2003. Habitat requirements of Atlantic salmon and brown trout in rivers and
streams. Fisheries Research 62: 143–170.
Austin MP, Meyers JA. 1996. Current approaches to modelling the environmental niche of eucalyptus: implication for management of forest
biodiversity. Forest Ecology and Management 85: 95–106.
Beecher HA, Caldwell BA, Demond SB. 2002. Evaluation of depth and velocity preferences of juvenile coho salmon in Washington streams.
North American Journal of Fisheries Management 22: 785–795.
Bernez I, Daniel H, Haury J, Ferreira MT. 2004. Combined effects of environmental factors and regulation on macrophyte vegetation along three
rivers in western France. River Research and Applications 20: 43–59.
Bovee KD. 1986. Development and Evaluation of Habitat Suitability Criteria for Use in the Instream Flow Incremental Methodology. U.S. Fish
and Wildlife Service Biological Report 86(7): 1–235.
Bovee KD, Lamb BL, Bartholow JM, Stalnaker CB, Taylor J, Henriksen J. 1998. Stream Habitat Analysis Using the Instream Flow Incremental
Methodology. U.S. Geological Survey, Biological Resources Discipline Information and Technology Report USGS/BRD-1998-0004.
Braaten PJ, Guy CS. 1999. Relations between physicochemical factors and abundance of fishes in tributary confluences of the lower channelized
Missouri river. Transactions of the American Fisheries Society 128(6): 1213–1221.
Brosse S, Guégan JF, Lek S. 1999. The use of artificial neural networks to assess fish abundance and spatial occupancy in the littoral zone of a
mesotrophic lake. Ecological Modelling 120: 299–311.
Brosse S, Lek S. 2002. Relationships between environmental characteristics and the density of age-0 Eurasian perch Perca fluviatilis in the
littoral zone of a lake: a nonlinear approach. Transactions of the American Fisheries Society 131: 1033–1043.
Brown SK, Buja KR, Jury SH, Monaco ME, Banner A. 2000. Habitat suitability index models for eight fish and invertebrate species in Casco
and Sheepscot Bays, Maine. North American Journal of Fisheries Management 20: 408–435.
Bult TP, Riley SC, Haedrich RL, Gibson RJ, Heggenes J. 1999. Density-dependent habitat selection by juvenile Atlantic salmon (Salmo salar L.)
in experimental riverine habitats. Canadian Journal of Fisheries and Aquatic Sciences 56: 1298–1306.
Cairns DM. 2001. A comparison of methods for predicting vegetation type. Plant Ecology 156: 3–18.
Caissie D, El-Jabi N. 2003. Instream flow assessment: from holistic approaches to habitat modelling. Canadian Water Resources Journal 28(2):
173–184.


Cawsey EM, Austin MP, Baker BL. 2002. Regional vegetation mapping in Australia: a case study in practical use of statistical modelling.
Biodiversity and Conservation 11: 2239–2274.
Cohen P, Andriamahefa H, Wasson J. 1998. Towards a regionalization of aquatic habitat: distribution of mesohabitat at the scale of a large basin.
Regulated Rivers Research and Management 14: 391–404.
Cooperrider A, Boyd RJ, Stuart HR. 1986. Inventory and Monitoring of Wildlife Habitat. U.S. Dept. of the Interior, Bureau of Land Manage-
ment: Washington, DC.
Corbacho C, Sánchez JM. 2001. Patterns of species richness and introduced species in native freshwater fish faunas of a Mediterranean-type
basin: the Guadiana River (southwest Iberian Peninsula). Regulated Rivers: Research and Management 17: 699–707.
Cortes RMV, Ferreira MT, Oliveira SV, Oliveira D. 2002. Macroinvertebrate community structure in a regulated river segment with different
flow conditions. River Research and Applications 18: 367–392.
Department of Fisheries and Oceans (DFO). 1986. Policy for the Management of Fish Habitat. Communication directorate: Ottawa, Canada.
Elso JI, Giller PS. 2001. Physical characteristics influencing the utilization of pools by brown trout in an afforested catchment in Southern
Ireland. Journal of Fish Biology 58: 201–221.
Filipe AF, Cowx IG, Collares-pereira MJ. 2002. Spatial modelling of freshwater fish in semi-arid river systems: a tool for conservation. River
Research and Applications 18: 123–136.
Fine TL. 1999. Feedforward Neural Network Methodology. Springer: New York.
Fladung M, Scolten M, Thiel R. 2003. Modelling the habitat preferences of preadult and adult fishes on the shoreline of the large, lowland Elbe
River. Journal of Applied Ichthyology 19: 303–314.
Garland RD, Tiffan KF, Rondorf DW, Clark LO. 2002. Comparison of subyearling fall chinook salmon’s use of riprap revetments and unaltered
habitats in Lake Wallula of the Columbia River. North American Journal of Fisheries Management 22: 1283–1289.
Gauch HG. 1982. Multivariate Analysis in Community Ecology. Cambridge University Press: Cambridge.
Geist DR, Jones J, Murray CJ, Dauble DD. 2000. Suitability criteria analyzed at the spatial scale of redd clusters improved estimates of fall chinook
salmon spawning habitat use in the Hanford Reach, Columbia River. Canadian Journal of Fisheries and Aquatic Sciences 57: 1636–1646.
Gevrey M, Dimopoulos I, Lek S. 2003. Review and comparison of methods to study the contribution of variables in artificial neural network
models. Ecological Modelling 160: 249–264.
Giller PS, Malmqvist B. 1998. The Biology of Streams and Rivers. Oxford University Press: New York.
Godinho FN, Ferreira MT, Santos JM. 2000. Variation in fish community composition along an Iberian river basin from low to high discharge:
relative contributions of environmental and temporal variables. Ecology of Freshwater Fish 9: 22–29.
Gore JA, Nestler JM. 1988. Instream flow studies in perspective. Regulated Rivers: Research and Management 2: 93–101.
Gore JA, Crawford DJ, Addison DS. 1998. An analysis of artificial riffles and enhancement of benthic community diversity by physical habitat
simulation (PHABSIM) and direct observation. Regulated Rivers: Research and Management 14: 69–77.
Gozlan RE, Mastrorillo S, Copp GH, Lek S. 1999. Predicting the structure and diversity of young-of-the-year fish assemblages in large rivers.
Freshwater Biology 41: 809–820.
Grift RE, Buijse AD, Van Densen WLT, Machiels MAM, Kranenbarg J, Klein Breteler JGP, Backx JJGM. 2003. Suitable habitats for 0-group
fish in rehabilitated floodplains along the River Rhine. River Research and Applications 19: 353–374.
Guay JC, Boisclair D, Rioux D, Leclerc M, Lapointe M, Legendre P. 2000. Development and validation of numerical habitat models for juve-
niles of Atlantic salmon (Salmo salar). Canadian Journal of Fisheries and Aquatic Sciences 57: 2065–2075.
Guay JC, Boisclair D, Leclerc M, Lapointe M. 2003. Assessment of the transferability of biological habitat models for juveniles of Atlantic
salmon (Salmo salar). Canadian Journal of Fisheries and Aquatic Sciences 60(11): 1398–1408.
Guisan A, Weiss SB, Weiss AD. 1999. GLM versus CCA spatial modelling of plant species distribution. Plant Ecology 143: 107–122.
Guisan A, Zimmermann NE. 2000. Predictive habitat distribution models in ecology. Ecological Modelling 135: 147–186.
Guisan A, Edwards TC, Hastie T. 2002. Generalized linear and generalized additive models in studies of species distributions: setting the scene.
Ecological Modelling 157: 89–100.
Harraway J. 1995. Regression Methods Applied. University of Otago Press: Dunedin, New Zealand.
Harvey BC, White JL, Nakamoto RJ. 2002. Habitat relationships and larval drift of native and nonindigenous fishes in neighboring tributaries of
a coastal California river. Transactions of the American Fisheries Society 131: 159–170.
Heggenes J. 2002. Flexible summer habitat selection by wild, allopatric brown trout in lotic environments. Transactions of the American Fish-
eries Society 131: 287–298.
Hoerl AE, Kennard RW. 1970. Ridge regression: applications to nonorthogonal problems. Technometrics 12(1): 55–67.
Holm CF, Armstrong JD, Gilvear DJ. 2001. Investigating a major assumption of predictive instream habitat models: is water velocity preference
of juvenile Atlantic salmon independent of discharge? Journal of Fish Biology 59: 1653–1666.
Ibarra AA, Gevrey M, Park YS, Lim P, Lek S. 2003. Modelling the factors that influence fish guilds composition using a back-propagation
network. Ecological Modelling 160: 281–290.
Inoue M, Nakano S. 2001. Fish abundance and habitat relationships in forest and grassland streams, northern Hokkaido, Japan. Ecological
Research 16: 233–247.
Jackson DA, Peres-Neto PA, Olden JD. 2001. What controls who is where in freshwater fish communities—the roles of biotic, abiotic, and
spatial factors. Canadian Journal of Fisheries and Aquatic Sciences 58: 157–170.
Jongman RHG, Ter Braak CJF, Tongeren OFR. 1995. Data Analysis in Community and Landscape Ecology. Cambridge University Press:
Cambridge.
Jorde K, Schneider M, Peter A, Zoellner F. 2001. Fuzzy based models for the evaluation of fish habitat quality and instream flow assessment.
Proceedings of the 3rd International Symposium on Environmental Hydraulics. 5–8 December, Tempe, AZ.


Jowett IG. 1997. Instream flow methods: a comparison of approaches. Regulated Rivers: Research and Management 13: 115–127.
Jowett IG. 2003. Hydraulic constraints on habitat suitability for benthic invertebrates in gravel-bed river. River Research and Applications 19:
495–507.
Jutila E, Ahvonen A, Julkunen M. 2001. Instream and catchment characteristics affecting the occurrence and population density of brown trout,
Salmo trutta L., in forest brooks of a boreal river basin. Fisheries Management and Ecology 8: 501–511.
Katopodis C. 2003. Case studies of instream flow modelling for fish habitat in Canadian Prairie rivers. Canadian Water Resources Journal 28(2):
199–216.
Kerle F, Zöllner F, Schneider M, Kappus B, Baptist MJ. 2002. Modelling of long-term fish habitat changes in restored secondary floodplain
channels of the River Rhine. Fourth International Ecohydraulics Symposium, 3–8 March, Cape Town, South Africa.
King JM, Tharme RE, Brown CA. 1999. Definition and Implementation of Instream Flows. Thematic report for World Commission on dams.
Southern Waters Ecological Research and consulting: Cape Town, South Africa.
Korman J, Perrin CJ, Lekstrum T. 1994. A Guide for the Selection of Standard Methods for Quantifying Sportfish Habitat Capability and Suit-
ability in Streams and Lakes of British Columbia. Report to B.C. Environment Fisheries Branch, Vancouver: British Columbia.
Knapp RA, Preisler HK. 1999. Is it possible to predict habitat use by spawning salmonids? A test using California golden trout (Oncorhynchus
mykiss aguabonita). Canadian Journal of Fisheries and Aquatic Sciences 56: 1576–1584.
Kynard B, Horgan M, Kieffer M. 2000. Habitats used by shortnose sturgeon in two Massachusetts rivers, with notes on estuarine Atlantic stur-
geon: a hierarchical approach. Transactions of the American Fisheries Society 129: 487–503.
Labonne J, Allouche S, Gaudin P. 2003. Use of a generalised linear model to test habitat preferences: the example of Zingel asper, an endemic
endangered percid of the River Rhône. Freshwater Biology 48: 687–697.
Laë R, Lek S, Moreau J. 1999. Predicting fish yield of African lakes using neural networks. Ecological Modelling 120: 325–335.
Lamouroux N, Capra H, Pouilly M. 1998. Predicting habitat suitability for lotic fish: linking statistical hydraulic models with multivariate
habitat use models. Regulated Rivers: Research and Management 14: 1–11.
Lamouroux N, Capra H. 2002. Simple predictions of instream habitat model outputs for target fish populations. Freshwater Biology 47: 1543–
1556.
Layher WG, Maughan OE. 1985. Spotted bass habitat evaluation using an unweighted geometric mean to determine HSI values. Proceedings of
the Oklahoma Academy of Science 65: 11–17.
Leclerc M, St-Hilaire A, Bechara JA. 2003. State-of-the-art and perspectives of habitat modelling. Canadian Water Resources Journal 28(2):
153–172.
Lee JHW, Huang Y, Dickman M, Jayawardena AW. 2003. Neural network modelling of coastal algal blooms. Ecological Modelling 159: 179–
201.
Lehman A. 1998. GIS modelling of submerged macrophyte distribution using Generalized Additive Models. Plant Ecology 139: 113–124.
Lehman A, Overton JM, Leathwick JR. 2002. GRASP: generalised regression analysis and spatial prediction. Ecological Modelling 157: 189–
207.
Lek S, Delacoste M, Baran P, Dimopoulos I, Lauga J, Aulagnier S. 1996. Application of neural networks to modelling nonlinear relationships in
ecology. Ecological Modelling 90: 39–52.
Lek S, Guégan JF. 1999. Artificial neural networks as a tool in ecological modelling, an introduction. Ecological Modelling 120: 65–73.
Mallet JP, Lamouroux N, Sagnes P, Persat H. 2000. Habitat preferences of European grayling in a medium size stream, the Ain river, France.
Journal of Fish Biology 56: 1312–1326.
Mallows CL. 1973. Some comments on Cp. Technometrics 15: 661–675.
Manel S, Dias J, Ormerod SJ. 1999. Comparing discriminant analysis, neural networks and logistic regression for predicting species
distributions: a case study with a Himalayan river bird. Ecological Modelling 120: 337–347.
Manel S, Williams HC, Ormerod SJ. 2001. Evaluating presence-absence models in ecology: the need to account for prevalence. Journal of
Applied Ecology 38: 921–931.
Mastrorillo S, Dauba F, Oberdorff T, Guégan JF, Lek S. 1997. Predicting local fish species richness in the Garonne River basin. C. R. Académie
des Science 321: 423–428.
Matlock JK, Maughan OE. 1988. The relationship between physical habitat factors and benthic diversity in southeastern Oklahoma streams.
Proceedings of the Oklahoma Academy of Science 68: 81–84.
Mérigoux S, Dolédec S, Statzner B. 2001. Species traits in relation to habitat variability and state: neotropical juvenile fish in floodplain creeks.
Freshwater Biology 46: 1251–1267.
Miller AJ. 1990. Subset Selection in Regression. Chapman and Hall: London.
Milner AM, Brittain JE, Castella E, Petts GE. 2001. Trends of macroinvertebrate community structure in glacier-fed rivers in relation to
environmental conditions: a synthesis. Freshwater Biology 46: 1833–1847.
Miserendino ML. 2001. Macroinvertebrate assemblages in Andean Patagonian rivers and streams: environmental relationships. Hydrobiologia
444: 147–158.
Montgomery DC, Peck EA. 1992. Introduction to Linear Regression Analysis, 2nd edn. John Wiley & Sons: New York.
Morin J, Mengelbier M, Bechara JA, Champoux O, Secretan Y, Jean M, Frenette JJ. 2003. Emergence of new explanatory variables for 2D
habitat modelling in large rivers: the St. Lawrence experience. Canadian Water Resources Journal 28(2): 249–272.
Myers RH, Montgomery DC, Vining GG. 2002. Generalized Linear Models with Applications in Engineering and the Sciences. John Wiley &
Sons: New York.
Nakamoto RJ. 1994. Characteristics of pools used by adult summer steelhead oversummering in the New River, California. Transactions of the
American Fisheries Society 123: 757–765.

Neter J, Wasserman W, Kutner MH. 1989. Applied Linear Regression Models, 2nd edn. Richard D. Irwin Inc: Homewood, IL.
Neumann RM, Wildman TL. 2002. Relationships between trout habitat use and woody debris in two southern New England streams. Ecology of
Freshwater Fish 11: 240–250.
Nykanen M, Huusko A, Maki-Petays A. 2001. Seasonal changes in the habitat use and movements of adult European grayling in a large
subarctic river. Journal of Fish Biology 58: 506–519.
Olden JD, Jackson DA. 2001. Fish-habitat relationships in lakes: gaining predictive and explanatory insight using artificial neural networks.
Transactions of the American Fisheries Society 130: 878–897.
Olden JD, Jackson DA. 2002a. Illuminating the ‘black box’: understanding variable contributions in artificial neural networks. Ecological
Modelling 154: 135–150.
Olden JD, Jackson DA. 2002b. A comparison of statistical approaches for modelling fish species distributions. Freshwater Biology 47: 1976–
1995.
Orth DJ, Maughan OE. 1981. Evaluation of the ‘Montana method’ for recommending instream flows in Oklahoma streams. Proceedings of the
Oklahoma Academy of Science 61: 62–66.
Ostrand KG, Wilde GR. 2002. Seasonal and spatial variation in a prairie stream-fish assemblage. Ecology of Freshwater Fish 11: 137–149.
Parasiewicz P, Dunbar MJ. 2001. Physical habitat modelling for fish—a developing approach. Large Rivers 12: 239–268.
Peeters ETHM, Gardeniers JJP. 1998. Logistic regression as a tool for defining habitat requirements of two common gammarids. Freshwater
Biology 39: 605–615.
Postel SL. 1998. Water for food production: will there be enough in 2025? BioScience 48: 629–637.
Rathert D, White D, Sifneos JC, Hughes RM. 1999. Environmental correlates of species richness for native freshwater fish in Oregon, USA.
Journal of Biogeography 26: 257–273.
Reash RJ, Pigg J. 1990. Physicochemical factors affecting the abundance and species richness of fishes in the Cimarron River. Proceedings of
the Oklahoma Academy of Science 70: 23–28.
Rosenfeld J, Porter M, Parkinson E. 2000. Habitat factors affecting the abundance and distribution of juvenile cutthroat trout (Oncorhynchus
clarki) and coho salmon (Oncorhynchus kisutch). Canadian Journal of Fisheries and Aquatic Sciences 57(4): 766–774.
Rosenfeld J. 2003. Assessing the habitat requirements of stream fishes: an overview and evaluation of different approaches. Transactions of the
American Fisheries Society 132: 953–968.
Ruse L, Davison M. 2000. Long-term data assessment of chironomid taxa structure and function in River Thames. Regulated Rivers: Research
and Management 16: 113–116.
Ryan TP. 1997. Modern Regression Methods. John Wiley & Sons Inc: New York.
Salski A. 2003. Ecological applications of fuzzy logic. In Ecological Informatics: Understanding Ecology by Biologically-Inspired
Computation, Recknagel F (ed.). Springer: New York.
Schneider M, Peter A. 1999. Okostrom: field study and use of the simulation model CASIMIR for determination of fish habitat in River Brenno.
Proceedings of the 3rd International Symposium on Ecohydraulics, Salt Lake City, Utah.
Stalnaker CB, Lamb BL, Henriksen J, Bovee K, Bartholow J. 1995. The Instream Flow Incremental Methodology: A Primer for IFIM.
Biological Report 29. United States National Biological Service, Fort Collins, Colorado.
Tambe SS, Kulkarni BD, Deshpande PB. 1996. Elements of Artificial Neural Networks with Selected Applications in Chemical Engineering, and
Chemical and Biological Sciences. Simulation and Advanced Controls, Inc: Louisville, Kentucky.
Taylor SM. 2000. A large-scale comparative analysis of riffle and pool fish communities in an upland stream system. Environmental Biology of
Fishes 58: 89–95.
Ter Braak CJF, Verdonschot PFM. 1995. Canonical correspondence analysis and related multivariate methods in aquatic ecology. Aquatic
Sciences 57: 255–289.
Tharme RE. 2003. A global perspective on environmental flow assessment: emerging trends in the development and application of
environmental flow methodologies for rivers. River Research and Applications 19: 397–441.
Tilma JS, Guy CS. 1998. Relations among habitat and population characteristics of spotted bass in Kansas streams. North American Journal of
Fisheries Management 18: 886–893.
Toepfer CS, Williams LR, Martinez AD, Fisher WL. 1998. Fish and habitat heterogeneity in four streams in the central Oklahoma/Texas plains
ecoregion. Proceedings of the Oklahoma Academy of Science 78: 41–48.
Vadas RL, Orth DJ. 2001. Formulation of habitat suitability models for stream fish guilds: do the standard methods work? Transactions of the
American Fisheries Society 130: 217–235.
Vismara R, Azzellino A, Bosi R, Crosa G, Gentili G. 2001. Habitat suitability curves for brown trout (Salmo trutta fario L.) in the River Adda,
northern Italy: comparing univariate and multivariate approaches. Regulated Rivers: Research and Management 17: 37–50.
Wang L, Lyons J, Rasmussen P, Seelbach P, Simon T, Wiley M, Kanehl P, Baker E, Niemela S, Stewart PM. 2003. Watershed, reach, and riparian
influences on stream fish assemblages in the Northern Lakes and Forest Ecoregion, U.S.A. Canadian Journal of Fisheries and Aquatic Sciences
60(5): 491–505.
Watkins MS, Doherty S, Copp GH. 1997. Microhabitat use by 0+ and older fishes in a small English chalk stream. Journal of Fish Biology 50: 1010–1024.
Welker TL, Scarnecchia DL. 2004. Habitat use and population structure of four native minnows (family Cyprinidae) in the upper Missouri and
lower Yellowstone rivers, North Dakota (USA). Ecology of Freshwater Fish 13: 8–22.
Yee TW, Mackenzie M. 2002. Vector generalized additive model in plant ecology. Ecological Modelling 157: 141–156.
Yu S, Lee T. 2002. Habitat preference of the stream fish, Sinogastromyzon puliensis (Homalopteridae). Zoological Studies 41(2): 183–187.
Zadeh LA. 1965. Fuzzy sets. Information and Control 8: 338–353.
