Professional Documents
Culture Documents
Multivariate Legendre Etal EM2005
Multivariate Legendre Etal EM2005
Abstract. Robert H. Whittaker defined beta diversity as the variation in species com-
position among sites in a geographic area. Beta diversity is a key concept for understanding
the functioning of ecosystems, for the conservation of biodiversity, and for ecosystem
management. This paper explains how hypotheses about the origin of beta diversity can
be tested by partitioning the spatial variation of community composition data (presence–
absence or abundance data) with respect to environmental variables and spatial base func-
tions. We compare two statistical methods to accomplish that. The sum-of-squares of a
community composition data table, which is one possible measure of beta diversity, is
correctly partitioned by canonical ordination; hence, canonical partitioning produces correct
estimates of the different portions of community composition variation. In recent years,
several authors interested in the variation in community composition among sites (beta
diversity) have used another method, variation partitioning on distance matrices (Mantel
approach). Their results led us to compare the two partitioning approaches, using simulated
data generated under hypotheses about the variation of community composition among
sites. The theoretical developments and simulation results led to the following observations:
(1) the variance of a community composition table is a measure of beta diversity. (2) The
variance of a dissimilarity matrix among sites is not the variance of the community com-
position table nor a measure of beta diversity; hence, partitioning on distance matrices
should not be used to study the variation in community composition among sites. (3) In
all of our simulations, partitioning on distance matrices underestimated the amount of
variation in community composition explained by the raw-data approach, and (4) the tests
of significance had less power than the tests of canonical ordination. Hence, the proper
statistical procedure for partitioning the spatial variation of community composition data
among environmental and spatial components, and for testing hypotheses about the origin
and maintenance of variation in community composition among sites, is canonical parti-
tioning. The Mantel approach is appropriate for testing other hypotheses, such as the var-
iation in beta diversity among groups of sites. Regression on distance matrices is also
appropriate for fitting models to similarity decay plots.
Key words: beta diversity; canonical ordination; community composition; Mantel test; PCNM
analysis; regression on distance matrices; simulation study; spatial variation; variation partitioning.
1) Species composition is uniform over large areas. be analyzed to determine if significant spatial patterns
This hypothesis, which plays the role of a null model, (different from random) are present in the data and if
emphasizes the role of biological interactions. It sug- the environmental variables explain a significant pro-
gests that communities are dominated by a limited suite portion of the community composition variation. In this
of competitively superior species (Pitman et al. 1999, contribution, we explain how hypotheses about the or-
2001); beta diversity is small. igin of beta diversity can be tested by partitioning the
2) Species composition fluctuates in a random, au- spatial variation of community composition (presence–
tocorrelated way. This hypothesis emphasizes spatially absence or abundance data) with respect to environ-
limited dispersal history. Models derived from neutral mental variables and spatial base functions. Next we
theory state that all species are demographically and set out to assess the appropriateness and robustness of
competitively equal. Differences are created through two partitioning approaches used by researchers to as-
spatially limited dispersal of species drawn at random sess the likelihood of different mechanisms structuring
from a metacommunity, plus possibly the appearance beta diversity.
of newly evolved species in different areas. Neutral
models differ in the details of the mechanisms that they BETA DIVERSITY
invoke (Bell 2001, Hubbell 2001, He 2005). Alpha (a) diversity is the diversity in species at in-
3) Species distributions are related to environmental dividual sites (e.g., plots, quadrats), or the variance in
conditions. This hypothesis emphasizes environmental the species identity of the individuals observed at a
control. Landscapes are mosaics where species com- site. A monoculture, for example, has the lowest pos-
position is controlled by environmental site character- sible alpha diversity because there is no variance in the
istics (Whittaker 1956, Bray and Curtis 1957, Hutch- species identity of individuals. It is measured either by
inson 1957, Gentry 1988, Tuomisto et al. 1995). the number of species present at the site (species rich-
Testing these hypotheses is important for under- ness), or by some other function (index) that takes into
Concepts & Synthesis
standing the functioning of ecosystems, for the con- account the relative frequencies of the species. The
servation of biodiversity, and for ecosystem manage- most widely used of such indices are the Shannon
ment. Regarding the establishment of natural reserves, (1948) (or Shannon-Weaver, or Shannon-Wiener) index
e.g., hypothesis 1 implies that all parts of the ecosystem of diversity and measures based on Simpson’s (1949)
are equivalent. Reserves can be located anywhere. Hy- concentration l, which are often referred to as Simp-
pothesis 2 implies that different parts of the ecosystem son’s indices: either (1 2 l) (Greenberg 1956) or 1/l
may, for historical reasons, sustain different species (Hill 1973). The form (1 2 l) is to be preferred be-
compositions, although these parts are environmentally cause, when applied to combined sites (g diversity, next
equivalent. Portions of space supporting the different paragraph), that measure cannot be smaller than the
species compositions should be preserved. Reserves mean of the a indices for the sites that have been com-
must be large, allowing the dynamics to go on without bined (Lande 1996).
many species going extinct. Hypothesis 3 implies that Gamma (g) diversity is that of the whole region of
all parts of the ecosystem are not equivalent. Reserves interest for the study. It is usually measured by pooling
must represent the different types of habitat and each the observations from a sample (in the statistical sense),
portion must be of sufficient size to be sustainable. The i.e., a large number of sites from the area, except in
parts representing favorable dispersion routes must be those rare instances in which the community compo-
especially preserved. sition of the entire area is known. Gamma diversity is
When studying the origin of beta diversity, one must measured using the same indices as alpha diversity.
consider different hypotheses and answer the following Beta (b) diversity is the variation in species com-
questions. (1) Is the variation in species composition position among sites in the geographic area of interest.
among sites random, i.e., devoid of significant spatial If the variation in community composition is random,
pattern? A positive answer will support hypothesis 1. and accompanied by biotic processes (e.g., reproduc-
(2) Is there significant spatial patchiness (different tion) that generate spatial autocorrelation, a gradient
from random) in the distributions of species? A positive in species composition may appear and beta diversity
answer will support hypothesis 2, and possibly hy- can be interpreted in terms of the rate of change, or
pothesis 3 if the environmental variables influencing turnover, in species composition along that gradient. If
species distributions are spatially structured. (3) Can differentiation among sites is due to environmental fac-
the environmental variables explain a significant pro- tors, beta diversity should be analyzed with respect to
portion of the community composition variation? A the hypothesized forcing variables. In many ecosys-
positive answer will support hypothesis 3, which is tems, beta diversity may be caused concurrently by
compatible with hypothesis 2. these two processes in different proportions. In two
Empirical data must be used to determine the like- seminal papers, Whittaker (1960, 1972) showed that
lihood of each hypothesis in different systems and at beta diversity could be measured from either presence–
different spatial scales. To test these hypotheses, the absence or quantitative species abundance data. Ecol-
variation of community composition at many sites must ogists actually use both to study beta diversity. Some
November 2005 ANALYZING BETA DIVERSITY 437
only refer to presence–absence when they talk about in dissimilarity form, by Odum [1950], and by Bray
the rate of species replacement, or turnover, along eco- and Curtis [1957]). In these cases, Whittaker (1972)
logical gradients. In the ordination literature, however, suggested that the mean (not the variance) of the dis-
ecologists most often use species abundances to study similarities among sites should be used as the measure
turnover rates by reference to the appearance and dis- of beta differentiation: ‘‘the mean CC [Jaccard’s co-
appearance of species with unimodal distributions efficient of community] for samples of a set compared
along a gradient. with one another [. . . ] is one expression [of] their
If one is interested in a global interpretation of beta relative dissimilarity, or beta differentiation’’ (Whit-
species diversity within a study area, one can turn to taker 1972:233). This idea had been suggested by Koch
the methods described in the section Analyzing beta (1957: Table III and analysis in the text), who was
diversity. When the goal is to compare different areas, looking for an index of biotal dispersity. Whittaker thus
or different taxonomic groups within the same area, recognized that the dissimilarities are themselves mea-
one may be interested in using a single number sum- sures of the differentiation between or among sites. In
marizing beta diversity. The total sum of squares in the Eq. 1, we will study in more detail the relationship
species composition data at the sampling sites, SS(Y), between dissimilarity matrices and the total sum of
is one such number. The total variance in the data table, squares of community composition data matrices, used
Var(Y) 5 SS(Y)/(n 2 1), would be better to compare in variation partitioning to assess the likelihood of hy-
areas with different numbers of study sites n. The beta potheses 2 and 3.
diversity indices proposed by Whittaker and others are Another approach to measuring a and b diversity is
other such measures. through the methods of ordination. Interrelationships
The first method for obtaining a global measure of between ordination techniques and diversity indices
beta diversity from species presence–absence data, pro- have been known since Gauch and Whittaker (1972).
posed by Whittaker (1960, 1972), is to compute the They were recently reviewed and synthesized by Pél-
species and their abundances. Studying variation in Mantel correlations between distance matrices are
species identity of individuals at a given site is studying known to be much smaller in absolute value than reg-
alpha diversity. (2) Variation in community composi- ular correlation coefficients computed on the same raw
tion among sites: the number, identity, and abundance data (Dutilleul et al. 2000, Legendre 2000). This is thus
of species define a community composition, which may also the case when comparing coefficients of deter-
vary among sites. Studying variation of community mination in regression on distance matrices and in mul-
composition among sites is studying beta diversity and tiple regression or canonical analysis. Canonical anal-
may be done using the raw-data approach. (3) Variation ysis partitions the variation in the species abundance
in beta diversity among groups of sites: if measured data and therefore explains a greater amount of the total
among pairs of sites, beta diversity may vary from pair variation than does the Mantel test, which partitions
to pair. Therefore, studying how and why beta diversity the variation in pairwise dissimilarities among sites.
varies among regions or groups of sites pertains to a Legendre (2000: Table II) reported simulations show-
third level of abstraction, which must not be confused ing that, for two simple variables, the Pearson corre-
with the previous one. The hypotheses arising at this lation, which is the simplest form of the raw-data ap-
level may be addressed using either the distance (i.e., proach, had more power than the Mantel test computed
Mantel) approach or the tests of significance recently on the same data. That paper showed that the Mantel
proposed by Kiflawi and Spencer (2004). This paper test was inappropriate for testing hypotheses concern-
is devoted to the methods that are appropriate for test- ing variation of the raw data, although it may be ap-
ing hypotheses that concern the variation in community propriate for hypotheses concerning variation of dis-
composition among sites, i.e., those that pertain to the tances. Using numerical simulations, we will confirm
second level of abstraction. the last point for multivariate data and compare the
Because of the ease with which spatial relationships statistical power of the two procedures for partitioning
among sampling sites can be incorporated into the anal- the variation of raw data. We will also present our most
Concepts & Synthesis
ysis in the form of a geographic distance matrix, Le- recent thoughts about differentiating the raw-data and
gendre and Troussellier (1988) proposed using partial Mantel approaches.
Mantel analysis to test hypotheses about the relative Two handicaps of the raw-data approach have been
influence of the spatial structure and environmental lifted in recent years: Legendre and Gallagher (2001)
variables on ecological response variables. Population showed how to transform community composition data
geneticists had already successfully experimented with in such a way that distances that are of interest for
the Mantel approach for similar problems (e.g., Smouse community ecology (e.g., the chord, chi-square, and
et al. 1986). A few years later, Borcard et al. (1992) Hellinger distances) would be preserved during canon-
proposed using variation partitioning through partial ical redundancy analysis. Borcard and Legendre
canonical analysis (redundancy analysis [RDA] or ca- (2002), Borcard et al. (2004), and S. Dray, P. Legendre,
nonical correspondence analysis [CCA]) to partition and P. Peres-Neto (unpublished manuscript) developed
community variation among environmental and spatial more complex forms of spatial representation (PCNM
components. At that time, canonical analysis was lim- analysis [principal coordinates of neighbor matrices]
ited to the Euclidean or chi-square distances for re- and related distance-based eigenvector maps) that al-
sponse data. The Mantel approach offered more flex- low ecologists to model spatial structures at multiple
ibility, allowing the use of other types of distance func- spatial scales and incorporate these representations in
tions such as Jaccard or Steinhaus/Bray-Curtis. Also, canonical analyses to answer the questions mentioned
the canonical approach was limited to analyzing spatial in the Introduction.
gradients or broadscale spatial structures modeled by Many authors have used either partial Mantel tests
a polynomial trend-surface function of the geographic (Smouse et al. 1986) or multiple regression on distance
coordinates. The Mantel approach was limited to sta- matrices (Legendre et al. [1994]; computer program,
tistical testing only, whereas canonical analysis pro- Casgrain [2001]), with the objective of analyzing the
vided estimates of the contributions of the response spatial variation of community composition among
and explanatory variables to the canonical relationship, sites, while in fact these methods should be limited to
as well as the visual analysis of patterns in ordination the study of the variation in beta diversity among
biplots. In 1993 and 1998, P. Legendre concluded that groups of sites. Here are examples from the recent lit-
both methods had been useful in enriching our under- erature in which authors used a Mantel approach, often
standing of spatial processes occurring in ecosystems citing Legendre and Legendre (1998) or Fortin and
(Legendre 1993, Legendre and Legendre 1998: Section Gurevitch (1993) as references, although they declared
13.6). In the absence of statistical reasons for choosing that the purpose of their study was the analysis of the
one or the other, he recommended using both approach- variation in community composition among sites. Ex-
es until comparative studies provided criteria for amples of vegetation analyses include: Svenning (1999;
choosing one or the other. Fortin and Gurevitch (1993) tropical palms), Potts et al. (2002; tropical trees), Phil-
also recommended the Mantel approach for vegetation lips et al. (2003; tropical trees), Reynolds and Houle
studies. (2003; Aster). Examples of animal analyses include:
November 2005 ANALYZING BETA DIVERSITY 439
Parris and McCarthy (1999; frog assemblages), So- For simplicity, the development that follows is written
merfield and Gage (2000; macrobenthos communities), in terms of sums of squares instead of variances.
Pusey et al. (2000; stream fish), Decaens and Rossi SS can also be computed as the sum of squared Eu-
(2001; earthworm communities), Orwig et al. (2002; clidean distances (D2hi ) among the row vectors of table
homopteran insects), Spencer et al. (2002; invertebrate Y (sites) divided by the number of sites (n):
O O (y OO
communities), Williams et al. (2003; fish and macro- n p n21 n
1
invertebrates), Mulder et al. (2003; soil nematode com- SS(Y) 5 ij 2 ȳ j ) 2 5 2
Dhi . (1)
munities). The present paper shows that canonical anal- i51 j51 n h51 i5h11
yses of the species abundance data would have pro- This well-known relationship is presented as Theorem
vided more powerful tests of significance than analyses 1 in Legendre and Anderson (1999: Appendix B); it is
based on distance matrices. They also would have pro- also illustrated in Legendre and Legendre (1998: Fig.
duced estimates of the contributions of the response 8.18).
and explanatory variables to the canonical relationship, Consider any real-data example involving two sites
as well as bi- or triplots to illustrate the relationships; only, each one represented by a vector of species pres-
the distance approach does not provide estimates of the ence or abundance data. First, compute the total sum
contribution of individual variables to the overall Man- of squares in that 2 3 p data table after centering the
tel correlation. species on their means (for the two sites only). Com-
Authors have also used Mantel tests to compare seed pute a 2 3 2 dissimilarity matrix among these sites
banks to actual vegetation; examples are Erkkilä (1998) using the Euclidean distance formula. The total sum of
and Jutila (2002). In the same vein, there are papers squares in the 2 3 p data table is equal to the squared
that used Mantel tests to study the concordance (or dissimilarity divided by 2 (2 is the number of sites).
congruence) of communities, comparing the spatial This shows that a single dissimilarity measure contains
patterns of two communities observed at a series of
FIG. 1. Partition of the variation of a response matrix Y between environmental (matrix X) and spatial (matrix W)
explanatory variables. The bold horizontal line, which corresponds to 100% of the variation in Y, is divided into four fractions
labeled [a] to [d]. The figure is adapted from Borcard et al. (1992) and Legendre (1993).
ysis of the dissimilarity matrix produces a rectangular either community dynamics creating spatial patterns,
data table that can replace the original data in further or environmental control, or both. These questions are
analyses (Legendre and Anderson 1999). Some of these particularly well suited for canonical analysis. Com-
dissimilarities (chord, chi-square, Hellinger) can be ob- munity composition is the response data table (Y) in
tained by directly transforming the original data table the analysis, whereas a table of environmental variables
and then computing the Euclidean distance on the trans- and/or some representation of the spatial relationships
formed data (Legendre and Gallagher 2001). Using the forms the explanatory table (X). The representation of
transformed data directly in canonical analyses permits spatial relationships in such analyses is described in
one to relate the species data table to the explanatory the next subsection. As in subsection 1, community
variables of beta diversity. Lessons learned: (1) a dis- composition can be analyzed as presence–absence or
similarity matrix among sites, computed from one of abundance data, raw or transformed.
the functions commonly used by community ecologists,
is an expression of the beta diversity in the area under Partitioning community composition variation among
study. (2) For a dissimilarity matrix computed using a groups of explanatory variables
Concepts & Synthesis
function that is considered appropriate for community Based on the general statistical method of partition-
composition data, the value 1/n SD2ij is equal to SS(Y), ing of the variation of a response variable among two
the sum of squares of the transformed community com- or more explanatory variables, called ‘‘commonality
position data table, which is a measure of beta diversity. analysis’’ by Kerlinger and Pedhazur (1973: Chapter
11) for the univariate case, the method of partitioning
Analyzing beta diversity by clustering or ordination the variation of multivariate community composition
Besides the estimation of the total sum of squares, data among environmental and spatial components,
the variation in community composition among sites through canonical analysis, was developed and de-
(beta diversity) can be analyzed in various ways using scribed by Borcard et al. (1992, 2004), Borcard and
the dissimilarity matrix D. For example, one may de- Legendre (1994, 2002), Legendre and Legendre (1998),
scribe it by looking for geographically compact clusters and Méot et al. (1998). That form of statistical analysis
of similar sites separated by discontinuities, using clus- allows users to decompose the variation found in a
tering or partitioning methods. One may also look for multivariate table of response variables (e.g., com-
gradients using ordination methods. munity composition data) as a function of a set of en-
vironmental variables (in a broad sense, i.e., variables
Testing hypotheses about the origin of beta diversity corresponding to hypotheses related to the environ-
using community composition data mental control model, the biotic control model, or de-
Hypotheses about the origin of beta diversity, such scribing historical dynamics) and a set of spatial var-
as those outlined in the Introduction, can be tested by iables constructed from the geographic coordinates of
using standard forms of linear modeling such as mul- the sites. One can calculate [a 1 b] how much of the
tivariate ANOVA or canonical analysis. Because the variation is explained by the environmental variables,
hypotheses concern the origin of the variation in com- [b 1 c] how much is spatially structured, [b] how much
munity composition among sites, tests of significance of the environmentally explained variation is spatially
must be carried out on the original (or transformed) structured, and [d] how much remains unexplained
community composition data table, not on distance ma- (Fig. 1).
trices. In some instances, one may use a proxy table There are different ways of representing geographic
obtained by principal coordinate analysis of an appro- information in the calculations, e.g., raw geographic
priate form of dissimilarity matrix (db-RDA approach; coordinates of the sampling sites if one wants to fit a
Legendre and Anderson 1999). The main forms of anal- flat (planar) trend surface response to the data, or a
ysis used by ecologists are canonical redundancy anal- polynomial function of the geographic coordinates in
ysis (Rao [1964], who called this method ‘‘principal order to model a wavy or saddle-shaped trend surface
components of instrumental variables’’) and canonical (Legendre 1990). Other types of spatial structures can
correspondence analysis (ter Braak 1986, 1987 a, b). be modeled, e.g., a river network connecting lakes
As stated in the Introduction, hypotheses about the (Magnan et al. 1994). A recent development for mod-
processes that create or maintain beta diversity invoke eling the spatial variation at multiple scales is PCNM
November 2005 ANALYZING BETA DIVERSITY 441
analysis, described by Borcard and Legendre (2002) tion of the variation of the community composition
and Borcard et al. (2004). Other forms of distance- distance matrix (R2).
based eigenvector maps (DBEM) related to PCNM, This method of analysis is inappropriate when the
based on different types of spatial weighting matrices, hypothesis to be tested concerns the raw data collected
are described in S. Dray, P. Legendre, and Peres-Neto to study the origin of variation in species composition
(unpublished manuscript). Another interesting devel- among sites (beta diversity), because it partitions the
opment is the suggestion by Wagner (2004) to take up variation of a dissimilarity matrix SS(D), not the var-
the fractions of variation obtained by Borcard et al. iation SS(Y) of the community composition data table.
(1992) partitioning, and reanalyze them by distance Although the mean of the squared dissimilarities pro-
classes through a multivariate variogram to identify vides a measure of beta diversity (Eq. 1 and upper
patterns corresponding to particular scales. That meth- portion of Eq. 2), the variation (sum of squares) of the
od allows one to further refine and test the hypotheses dissimilarities (lower portion of Eq. 2) does not, be-
stated in our Introduction. Is the among-sites beta di- cause it is not equal to, nor is it a simple function of,
versity spatially structured? Is the among-sites beta di- the variation of the original data, SS(Y):
OOD±
versity related to environmental variables in a scale- n21 n
1
dependent manner? Are the residuals spatially auto- SS(Y) 5 2
hi
correlated and, if so, can this be interpreted as the sig- n h51 i5h11
SS(D) 5 O O (D 2 D
nature of biotic processes or as the indication that n21 n
5 O O D 2
n21 n hi
as shown by Quinghong and Bråkenhielm (1995), An- 2
. (2)
derson and Gribble (1998), Pozzi and Borcard (2001), h51 i5h11
hi
n (n 2 1)/2
FIG. 2. (a) Community composition data table and (b) derived distance matrix (one-complements of Jaccard similarities).
(c) Total variation (sum of squares, SS) in table (a). (d) Total variation computed from matrix (b) for the Jaccard-transformed
data (a). (e) Total SS of the distances in the upper triangle of the distance matrix in (b).
cerning the origin of the variation in beta diversity tance matrices to estimate the relative contributions
among groups of sites (level-3 questions; see Analyzing (R2) of competing hypotheses to the variation of the
Concepts & Synthesis
beta diversity). For example, Tuomisto et al. (2003 b) floristic distances. These procedures are valid for test-
looked for areas of high variability in floristic distances ing hypotheses about the causes of the variation of the
and tried to interpret them. Another example is Hub- distances, such as the predictions of Hubbell’s model,
bell’s neutral model of biodiversity, which predicts that but perhaps not for parameter estimation. In any case,
the shape of the relationship between community sim- they should not be construed as explanations of the
ilarity (based on community composition, i.e., species among-sites variation in species composition, SS(Y),
presence–absence data) and geographic distance de- which lays in the variance of the raw species data.
pends on two parameters, the fundamental biodiversity Unanswered problems remain about the use of mul-
number u and the dispersal rate m. The shape of the tiple regression on distance matrices to address such
curve also depends on the size of the sampling units questions, particularly the lack of independence among
(Hubbell 2001: Chapter 7). the distances. The matrix permutation procedure allows
Following Nekola and White (1999), Condit et al. us to carry out a test of significance that has correct
(2002) and Tuomisto et al. (2003c) presented similarity Type I error, but what is the effect of the lack of in-
decay plots to model the decrease of species compo- dependence on parameter estimation? What does the
sition similarities among sites as a function of geo- coefficient of determination mean, besides a measure
graphic distances, a typical application of the distance- of adjustment of a model to the distances? Can we
based approach. These plots are interpreted like cor- interpret it in terms of a proportion of explained var-
relograms. Models can be fitted to the data, corre- iation, as in ordinary regression analysis? Statistical
sponding to various hypotheses. The coefficient of developments, as well as simulation studies, are needed
determination (R2) indicates the fit, or degree of ad- to determine the validity and the limits of application
justment of the data, to the model corresponding to the of that method for analyzing spatial variation in beta
hypothesis. Tests of significance of this R2 can be com- diversity measures, for example among regions. This
puted using either the Mantel test or statistical pro- procedure will have to be thoroughly validated before
cedures that use Mantel-like permutations, such as the we use it to draw conclusions about complex ecological
partial Mantel test or regression on distance matrices. problems and to design environmental policies from
Another interesting application is that of Tuomisto et these conclusions. We will not speculate any further,
al. (2003a), who wanted to relate reflectance differ- one way or another, in the present paper.
ences, visible in satellite images, to differences in veg-
etation composition, irrespective of the specific vege- SIMULATION STUDY: CANONICAL ANALYSIS
VS . MANTEL T EST
tation, in order to draw rough maps delineating vege-
tation patches in the tropical forest using satellite data. We carried out simulations to empirically assess the
Tuomisto et al. (2003c) used Mantel and partial Man- power of two partitioning methods, canonical and on
tel tests to compare competing hypotheses explaining distance matrices (Mantel approach), for the interpre-
the variation among floristic distances, which is ap- tation of the variation in community composition
propriate. They also used multiple regression on dis- among sites. This is a level-2 question, according to
November 2005 ANALYZING BETA DIVERSITY 443
the nomenclature established at the beginning of the autocorrelation to species j at site i; and «Sij is the
section Analyzing beta diversity. However, we have ‘‘innovation’’ value for species j at site i (normal error).
shown that several authors studying such questions Two groups of species were generated and written to
have used the Mantel approach, which should be re- the species data files: a group of species always un-
served for studying the variation in beta diversity related to the environment and another group possibly
among groups of sites (level-3 question); hence the related to the environmental variables. Environmental
pertinence of the comparison in the simulations re- variable k may contain any combination (controlled by
ported in the present section. Rather than analyzing the list of parameters of the computer run) of the fol-
actual data sets, we chose the simulation approach be- lowing three elements: a deterministic structure, spatial
cause the use of simulations allows us to compare the autocorrelation, and normal ‘‘innovation’’ at each site
outcome of the analysis to ‘‘the truth,’’ which we know simulated by random normal deviates N (0,1).
because we generated it. The analysis of real data sets The preliminary species data generated by Eq. 3 were
is often limited in terms of the number of data sets positive or negative real numbers, fairly symmetrically
available with all the necessary variables; it is also distributed. The following four steps were necessary
limited by the fact that one does not know whether the to transform them into final species-like data. (1) Spe-
null hypothesis H0 (there is no effect of the explanatory cies vectors containing the influence of an environ-
variables on the species data) or the alternative hy- mental variable had means and variances different from
pothesis H1 (there is an effect) is true in any particular the species unrelated to environmental variables. To
case. Monte Carlo simulations allow researchers to give equal importance to all species at this step, each
know the exact relationship between variables in the species vector was standardized. (2) To obtain a com-
data. No doubt exists as to which hypothesis, H0 or H1, bination of species with different mean abundances, a
is true in each particular simulated data set (Milligan normal deviate with mean 0 and standard deviation 1.5
1996). was drawn at random for each species; all values Sij
the first set of species. Five environmental variables (1) a geographic distance matrix D(XY) computed from
were created containing N (0,1) deviates, but they had the X and Y geographic coordinates, (2) a distance ma-
no influence on the species because the transfer param- trix D(polyn.) obtained by computing Euclidean dis-
eter b from environment to species was set to 0. These tances based on the third-order polynomial of the geo-
simulations will allow us to assess the frequency of graphic coordinates described in the previous para-
Type I error when testing the influence of the environ- graph, and (3) a distance matrix ln(D(XY)) obtained by
mental variables on the species data (test of fraction taking the natural logarithm of the values in D(XY).
[a 1 b]). Because that influence was set to b 5 0, a The detailed analysis of one of the sets of data tables
valid test of significance should have a rejection rate used in the simulations is presented in Appendix B for
equal to or smaller than the significance level a of the illustration.
tests. The first and second analyses are comparable for the
Section B.—In the second set of simulations, an in- two methods: the first one uses the X and Y coordinates,
dependently generated environmental variable was raw or in the form of distances, whereas the second
made to influence each of five species with a transfer one uses the third-order polynomial function of X and
parameter b 5 0.5. The environmental variables were Y, raw or in the form of distances. The third analysis
autocorrelated with range of 15 in the two geographic is possibly each method’s most favorable analysis and
directions of the plane; they also contained N (0,1) de- does not have an equivalent in the other method. For
viates. The data generation parameters of the stochastic raw data, PCNM analysis allows the modeling of spa-
simulations were chosen in such a way that the rejection tial relationships at multiple scales and is thus likely
rates in RDA were close to 1. to account for more variation than the two forms of
Section C.—The third set of simulations was similar trend-surface analysis, linear and polynomial. For dis-
to that of section B, with a lower transfer parameter b tance data generated under hypothesis 2 of the Intro-
5 0.25.
Concepts & Synthesis
TABLE 1. Simulation results: rates of rejection of the null hypothesis, H0 (trace of the fraction is 0), at the a 5 5% level
after 1000 simulations.
ation reported in Table C1 are in general agreement than RDA using the polynomial function. Forward se-
with the findings of Dutilleul et al. The coefficients of lection was applied in the case of PCNM analysis, and
determination (R2) obtained by regression on distance resulted in the selection of 4.5 PCNM variables on
matrices are thus clearly not estimates of the propor- average, whereas RDA using the polynomial was al-
tions of variation explained by the explanatory vari- ways conducted with the nine monomials. The adjusted
ables. coefficient of determination, R2adj, is more suitable than
Appendix A shows that the portions of variation es- R2 for comparing regression or canonical analysis re-
timated by canonical analysis are statistically correct sults obtained using different numbers of objects and
estimates of fractions of the community composition explanatory variables. Thus, R2adj would provide a fairer
variation (beta diversity). Hence, regression on dis- representation of the differences in explained variation
tance matrices, applied to data simulated to be at least between sets of simulations. For the first group of sim-
partly similar to the data collected to study the origin ulations, for example (A in Fig. 3 and Table C1), R2adj
of variation in species composition among sites, pro- for the total explained variation [a 1 b 1 c] is 0.0053
duced incorrect partitioning of that variation. for RDA using XY, 0.0200 for RDA using the poly-
The objective of these simulations was to compare nomial, and 0.0742 for RDA using PCNM after forward
RDA to regression on distance matrices, not RDA using selection. In all of our simulations, PCNM analysis
a cubic polynomial function of the geographic coor- always extracted more information from the species
dinates to PCNM analysis. PCNM analysis always pro- data than RDA based upon a cubic polynomial of the
duced a portion of explained variation not much larger geographic coordinates. R2adj could not be applied to the
November 2005 ANALYZING BETA DIVERSITY 447
whole simulation study, however, because we do not multiple regression or RDA. This is slightly higher than
know how to calculate it in the case of the Mantel test. the observed values. Accounting for the effect reported
It is interesting to note that there was not much dif- by Dutilleul et al. (2000) and described in paragraph
ference between the partitioning results obtained for 1 of this subsection, the portion of explained variation
the simulated presence–absence and abundance data. should be 0.01012 5 0.0001, which is slightly lower
2. Table 1A, columns [a 1 b] and [a].—Type I error, than the values reported in the table.
which is the frequency of rejection of the null hypoth-
esis when H0 is true, was correct for both methods. The CONCLUSION
significance level used in the simulation tests, a 5 0.05,
In recent years, several authors who were studying
was inside the 95% confidence intervals of the rejection
the variation in community composition among sites
rates in all cases. Had the species and environmental
(beta diversity) have used variation partitioning on dis-
data sets both been spatially structured, we would have
expected the tests to be too liberal and the rejection tance matrices (Mantel approach) instead of canonical
rates to be higher than a (property shown for partial partitioning. These authors have certainly used parti-
Mantel test by Oden and Sokal 1992, and for corre- tioning on distance matrices in good faith, after dis-
lation and regression analysis by Legendre et al. 2002). covering that partial Mantel tests (Smouse et al. 1986)
3. Table 1.—In all simulations in which an effect or our program for regression on distance matrices
was present (Table 1A, columns [b 1 c] and [c], and (Casgrain 2001) would do the job. Their results trig-
Table 1B–E), canonical analysis rejected the null hy- gered our interest and led us to compare variation par-
pothesis with frequencies much higher than regression titioning done in two different ways, on raw data and
on distance matrices. That was the case, in particular, on distance matrices, for data corresponding to ques-
when comparing each method’s most favorable form tions concerning the variation in community compo-
of spatial analysis, i.e., PCNM analysis in the canonical sition among sites (level-2 questions in Analyzing beta
1993) is simply a form of the Mantel test (Legendre Casgrain, P. 2001. Permute! Version 3.4 user’s manual. Dé-
and Legendre 1998: section 10.5). partement de sciences biologiques, Université de Montréal,
Montreal, Quebec, Canada.
The simulations reported in this paper provide im- Clarke, K. R. 1988. Detecting change in benthic community
portant results regarding the Type I error and power of structure. Pages 131–142 in R. Oger, editor. Proceedings
the permutation test commonly used in canonical anal- of Invited Papers, 14th International Biometric Conference,
ysis when applied to community composition variation. Namur, Belgium. Société Adolphe Quételet, Gembloux,
Belgium.
We also tested the power of PCNMs in spatial analysis Clarke, K. R. 1993. Non-parametric multivariate analyses of
and concluded that they provide a much better repre- changes in community structure. Australian Journal of
sentation of spatial structures than other procedures Ecology 18:117–143.
commonly used by ecologists, such as trend surface Condit, R., N. et al. 2002. Beta-diversity in tropical forest
analysis. trees. Science 295:666–669.
Decaens, T., and J. P. Rossi. 2001. Spatio-temporal structure
Simulated data are always a simplification compared of earthworm community and soil heterogeneity in a trop-
to the complexity of ecological data observed in the ical pasture. Ecography 24:671–682.
field. This is certainly true for the data analyzed in this Deutsch, C. V., and A. G. Journel. 1992. GSLIB: geosta-
paper, despite our efforts to build into them the key tistical software library and user’s guide. Oxford University
Press, New York, New York, USA.
properties necessary for the purpose of comparing the Dutilleul, P., J. D. Stockwell, D. Frigon, and P. Legendre.
two variation partitioning procedures: species data 2000. The Mantel-Pearson paradox: statistical consider-
structure close to those observed in nature, different ations and ecological implications. Journal of Agricultural,
intensities of the relationships between species and en- Biological and Environmental Statistics 5(2):131–150.
vironmental variables (including the absence of such Erkkilä, H. M. J. B. 1998. Seed banks of grazed and ungrazed
Baltic seashore meadows. Journal of Vegetation Science 9:
relationships), spatial structure due to autocorrelation 395–408.
in the species data, and to autocorrelation or deter- Fortin, M.-J., and J. Gurevitch. 1993. Mantel tests: spatial
ministic processes in the environmental data. structure in field experiments. Pages 342–359 in S. M.
Concepts & Synthesis
We hope that this paper will reach the ecologists who Scheiner and J. Gurevitch, editors. Design and analysis of
ecological experiments. Chapman and Hall, New York,
will be producing the environmental studies on which
New York, USA.
world deciders may base scientific management deci- Gauch, H. G., Jr., and R. H. Whittaker. 1972. Coenocline
sions about biodiversity in endangered environments. simulation. Ecology 53:446–451.
Gentry, A. H. 1988. Changes in plant community diversity
ACKNOWLEDGMENTS and floristic composition on environmental and geograph-
This paper has greatly benefited from discussions and de- ical gradients. Annals of the Missouri Botanical Garden 75:
tailed comments by Hanna Tuomisto and Kalle Ruokolainen, 1–34.
and from interesting comments from two anonymous review- Greenberg, J. H. 1956. The measurement of linguistic di-
ers. Access to the computers of the Environnement Scienti- versity. Language 32:109–115.
fique Intégré (ESI) of Université de Montréal for our simu- He, F. 2005. Deriving a neutral model of species abundance
lation work is gratefully acknowledged, with special thanks from fundamental mechanisms of population dynamics.
to Bernard Lorazo. This research was supported by NSERC Functional Ecology 19:187–193.
grant number OGP0007738 to P. Legendre. Hill, M. O. 1973. Diversity and evenness: a unifying notation
and its consequences. Ecology 54:427–432.
LITERATURE CITED Hubbell, S. P. 2001. The unified neutral theory of biodiversity
and biogeography. Princeton University Press, Princeton,
Allan, J. D. 1975. Components of diversity. Oecologia 18: New Jersey, USA.
359–367. Hutchinson, G. E. 1957. Concluding remarks. Cold Spring
Anderson, M. J., and N. A. Gribble. 1998. Partitioning the Harbor Symposia on Quantitative Biology 22:15–427.
variation among spatial, temporal and environmental com- Jutila, H. M. 2002. Seed banks of river delta meadows on
ponents in a multivariate data set. Australian Journal of the west coast of Finland. Annales Botanici Fennici 39:
Ecology 23:158–167. 49–61.
Bell, G. 2001. Neutral macroecology. Science 293:2413– Kerlinger, F. N., and E. J. Pedhazur. 1973. Multiple regression
2418. in behavioral research. Holt, Rinehart and Winston, New
Borcard, D., and P. Legendre. 1994. Environmental control York, New York, USA.
and spatial structure in ecological communities: an example Kiflawi, M., and M. Spencer. 2004. Confidence intervals and
using oribatid mites (Acari, Oribatei). Environmental and hypothesis testing for beta diversity. Ecology 85:2895–
Ecological Statistics 1:37–53. 2900.
Borcard, D., and P. Legendre. 2002. All-scale spatial analysis Koch, L. F. 1957. Index of biotal dispersity. Ecology 38:145–
of ecological data by means of principal coordinates of 148.
neighbour matrices. Ecological Modelling 153:51–68. Lande, R. 1996. Statistics and partitioning of species diver-
Borcard, D., P. Legendre, C. Avois-Jacquet, and H. Tuomisto. sity, and similarity among multiple communities. Oikos 76:
2004. Dissecting the spatial structure of ecological data at 5–13.
multiple scales. Ecology 85:1826–1832. Legendre, P. 1990. Quantitative methods and biogeographic
Borcard, D., P. Legendre, and P. Drapeau. 1992. Partialling analysis. Pages 9–34 in D. J. Garbary and R. G. South,
out the spatial component of ecological variation. Ecology editors. Evolutionary biogeography of the marine algae of
73:1045–1055. the North Atlantic. NATO ASI Series. Volume G 22.
Bray, R. J., and J. T. Curtis. 1957. An ordination of the upland Springer-Verlag, Berlin, Germany.
forest communities of southern Wisconsin. Ecological Legendre, P. 1993. Spatial autocorrelation: trouble or new
Monographs 27:325–349. paradigm? Ecology 74:1659–1673.
November 2005 ANALYZING BETA DIVERSITY 449
Legendre, P. 2000. Comparison of permutation methods for Parris, K. M., and M. A. McCarthy. 1999. What influences
the partial correlation and partial Mantel tests. Journal of the structure of frog assemblages at forest streams? Aus-
Statistical Computation and Simulation 67:37–73. tralian Journal of Ecology 24:495–502.
Legendre, P., and M. J. Anderson. 1999. Distance-based re- Paszkowski, C. A., and W. M. Tonn. 2000. Community con-
dundancy analysis: testing multispecies responses in mul- cordance between the fish and aquatic birds of lakes in
tifactorial ecological experiments. Ecological Monographs northern Alberta, Canada: the relative importance of en-
69:1–24. vironmental and biotic factors. Freshwater Biology 43:
Legendre, P., M. R. T. Dale, M.-J. Fortin, P. Casgrain, and J. 421–437.
Gurevitch. 2004. Effects of spatial structures on the results Pélissier, R., P. Couteron, S. Dray, and D. Sabatier. 2003.
of field experiments. Ecology 85:3202–3214. Consistency between ordination techniques and diversity
Legendre, P., M. R. T. Dale, M.-J. Fortin, J. Gurevitch, M. measurements: two strategies for species occurrence data.
Hohn, and D. Myers. 2002. The consequences of spatial Ecology 84:242–251.
structure for the design and analysis of ecological field Phillips, O. L., P. N. Vargas, A. L. Monteagudo, A. P. Cruz,
surveys. Ecography 25:601–615. M. E. C. Zans, W. G. Sanchez, M. Yli-Halla, and S. Rose.
Legendre, P., and E. D. Gallagher. 2001. Ecologically mean- 2003. Habitat association among Amazonian tree species:
ingful transformations for ordination of species data. Oec- a landscape-scale approach. Journal of Ecology 91:757–
ologia 129:271–280. 775.
Legendre, P., F.-J. Lapointe, and P. Casgrain. 1994. Modeling Pitman, N. C. A., J. Terborgh, M. R. Silman, and P. Nuñez
brain evolution from behavior: a permutational regression V. 1999. Tree species distributions in an upper Amazonian
approach. Evolution 48:1487–1499. forest. Ecology 80:2651–2661.
Legendre, P., and L. Legendre. 1998. Numerical ecology. Pitman, N. C. A., J. W. Terborgh, M. R. Silman, P. Núñez V.,
Second English edition. Elsevier, Amsterdam, The Neth- D. A. Neill, C. E. Cerón, W. A. Palacios, and M. Aulestia.
erlands. 2001. Dominance and distribution of tree species in two
Legendre, P., and M. Troussellier. 1988. Aquatic heterotro- upper Amazonian terra firme forests. Ecology 82:2101–
phic bacteria: modeling in the presence of spatial autocor- 2117.
relation. Limnolology and Oceanography 33:1055–1067. Potts, M. D., P. S. Ashton, L. S. Kaufman, and J. B. Plotkin.
Levins, R. 1968. Evolution in changing environments: some 2002. Habitat patterns in tropical rain forests: a comparison
theoretical explorations. Princeton University Press, of 105 plots in Northwest Borneo. Ecology 83:2782–2797.
ter Braak, C. J. F. 1987a. The analysis of vegetation—en- Tuomisto, H., K. Ruokolainen, R. Kalliola, A. Linna, W. Dan-
vironment relationships by canonical correspondence anal- joy, and Z. Rodriguez. 1995. Dissecting Amazonian bio-
ysis. Vegetatio 69:69–77. diversity. Science 269:63–66.
ter Braak, C. J. F. 1987b. Calibration. Pages 78–90 in R. H. Tuomisto, H., K. Ruokolainen, and M. Yli-Halla. 2003c. Dis-
G. Jongman, C. J. F. ter Braak, and O. F. R. van Tongeren, persal, environment, and floristic variation of western Am-
editors. Data analysis in community and landscape ecology. azonian forests. Science 299:241–244.
Pudoc, Wageningen, The Netherlands. [Reissued in 1995 Veech, J. A., K. S. Summerville, T. O. Crist, and J. C. Gering.
by Cambridge University Press, Cambridge, UK.] 2002. The additive partitioning of species diversity: recent
ter Braak, C. J. F., and A. P. Schaffers. 2004. Co-correspon- revival of an old idea. Oikos 99:3–9.
dence analysis: a new ordination method to relate two com- Whittaker, R. H. 1952. A study of summer foliage insect
munity compositions. Ecology 85:834–846. communities in the Great Smoky Mountains. Ecological
ter Braak, C. J. F., and P. Šmilauer. 1998. CANOCO reference Monographs 22:1–44.
manual and user’s guide to CANOCO for Windows. Soft- Whittaker, R. H. 1956. Vegetation of the Great Smoky Moun-
ware for canonical community ordination. Version 4. Cen- tains. Ecological Monographs 26:1–80.
tre for Biometry, Wageningen, The Netherlands, and Mi- Whittaker, R. H. 1960. Vegetation of the Siskiyou Mountains,
crocomputer Power, Ithaca, New York, USA. Oregon and California. Ecological Monographs 30:279–
Tuomisto, H., A. D. Poulsen, K. Ruokolainen, R. C. Moran, 338.
C. Quintana, J. Celi, and G. Canas. 2003a. Linking floristic Whittaker, R. H. 1972. Evolution and measurement of species
patterns with soil heterogeneity and satellite imagery in diversity. Taxon 21:213–251.
Ecuadorian Amazonia. Ecological Applications 13:352– Williams, L. R., C. M. Taylor, M. L. Warren, and J. A. Clin-
371. genpeel. 2003. Environmental variability, historical con-
Tuomisto, H., K. Ruokolainen, M. Aguilar, and A. Sarmiento. tingency, and the structure of regional fish and macroin-
2003b. Floristic patterns along a 43-km long transect in vertebrate faunas in Ouachita Mountain stream systems.
an Amazonian rain forest. Journal of Ecology 91:743–756. Environmental Biology of Fishes 67:203–216.
APPENDIX A
Concepts & Synthesis
Canonical variation partitioning with statistical details showing that canonical variation partitioning, as per Borcard et al.
(1992), provides a correct partitioning of the variation of a response data table Y, is available in ESA’s Electronic Data
Archive: Ecological Archives M075-017-A1.
APPENDIX B
Partitioning examples with an explanation, a table, and three figures showing the analysis of one of the sets of data tables
used in the simulation study, are available in ESA’s Electronic Data Archive: Ecological Archives M075-017-A2.
APPENDIX C
A table showing simulation results is available in ESA’s Electronic Data Archive: Ecological Archives M075-017-A3.
SUPPLEMENT 1
Data generation and analysis programs used in the simulation study are available in ESA’s Electronic Data Archive:
Ecological Archives M075-017-S1.
SUPPLEMENT 2
Data tables used in the example detailed in Appendix B are available in ESA’s Electronic Data Archive: Ecological Archives
M075-017-S2.