Professional Documents
Culture Documents
1.3. Principal Coordinate Analysis: Pierre Legendre Département de Sciences Biologiques Université de Montréal
1.3. Principal Coordinate Analysis: Pierre Legendre Département de Sciences Biologiques Université de Montréal
Pierre Legendre
Département de sciences biologiques
Université de Montréal
http://www.NumericalEcology.com/
1 Nonmetric D indices, not used in ecology, can produce –Inf values due to division by 0.
The objective of PCoA is
• to represent the points in a full-dimensional Euclidean space
• that allows one to fully reconstruct the original dissimilarities
among the objects.
Applications of PCoA
• Produce ordinations of the objects in reduced 2-D (or
sometimes 3-D) space.
• Act as a data transformation after computation of an
appropriately chosen dissimilarity measure. The coordinates of
the objects in full-dimensional PCoA space represent the
transformed data. They can be used as input to RDA or other
methods of data analysis.
Definitions
• Euclidean space: a space in which the distance among points is
measured by the Euclidean distance (Pythagora’s formula).
Synonym: Cartesian space.
• Euclidean property: a dissimilarity coefficient is Euclidean if
any resulting dissimilarity matrix can be fully represented in a
Euclidean space without distortion (Gower & Legendre 1986).
Criterion: Principal coordinate analysis (PCoA) of such a
dissimilarity matrix does not produce negative eigenvalues.
⎡ 2 1 ⎤
Example – ⎢ ⎥
⎢ 3 4 ⎥
The same data as for PCA: Y = ⎢ 5 0 ⎥
⎢ 7 6 ⎥
⎢ ⎥
⎢⎣ 9 2 ⎥⎦
3
4 2
⎡ 3.578 0.000 ⎤
2
⎢ ⎥ 2.61
3.85
1
⎢ 1.342 2.236 ⎥ 3.58
Axis.2
1
0
Pr.coo = ⎢ 1.342 −2.236 ⎥
-1
⎢ −3.130 2.236 ⎥
-2
⎢ ⎥ 5 3
⎢⎣ −3.130 −2.236 ⎥⎦
-3
-4 -2 0 2 4
Observation – Axis.1
In the centred matrix G, the ⎡ 12.8 4.8 4.8 −11.2 −11.2 ⎤
diagonal values are the ⎢ ⎥
⎢ 4.8 6.8 −3.2 0.8 −9.2 ⎥
squares of the distances of the G = ⎢ 4.8 −3.2 6.8 −9.2 0.8 ⎥
points to the origin in the ⎢ −11.2 0.8 −9.2 14.8 4.8 ⎥
⎢ ⎥
PCoA plot. ⎢⎣ −11.2 −9.2 0.8 4.8 14.8 ⎥⎦
Ex. sqrt(12.8) = 3.58
sqrt( 6.8) = 2.61
sqrt(14.8) = 3.85
The dissimilarities D are fully reconstructed by the principal coordinates
Can we verify that property? (Demonstrated by Gower, 1966)
3
4 2
⎡ 3.578 0.000 ⎤
2
⎢ ⎥
1
⎢ 1.342 2.236 ⎥
Axis.2
1
0
Pr.coo = ⎢ 1.342 −2.236 ⎥
-1
⎢ −3.130 2.236 ⎥
-2
⎢ ⎥ 5 3
⎢⎣ −3.130 −2.236 ⎥⎦
-3
-4 -2 0 2 4
Axis.1
3
• Axes may be inverted; 4 2
2
no consequence for interpretation.
1
Axis.2
1
0
• The distances among the points
-1
in the PCA and PCoA plots are the
-2
5 3
same.
-3
Reason:
-4 -2 0 2 4
The Euclidean distance was
Axis.1
computed among the objects
before the PCoA. PCA biplot
If another dissimilarity function
had been computed, the distances
among objects would not be the
same in the PCoA plot (other D)
and the PCA biplot (Euclidean D).
John C. Gower
Metric property
The attributes of a metric dissimilarity are the following 1:
1. Minimum 0: if a = b, then D(a, b) = 0;
2. Positiveness: if a ≠ b, then D(a, b) > 0;
3. Symmetry: D(a, b) = D(b, a);
4. Triangle inequality: D(a, b) + D(b, c) ≥ D(a, c).
The sum of two sides of a triangle drawn in ordinary Euclidean space
is equal to or larger than the third side. b
a c
1 Attributes
also described in the course on dissimilarity functions.
Note: A metric dissimilarity function is also called a distance.
Principal coordinate analysis
Some of the D functions used to analyse community composition data
are semimetrics1, meaning that the triangle inequality does not always
hold in D matrices computed with these functions.
Examples: percentage difference (aka Bray-Curtis dissimilarity),
Whittaker and Kulczynski indices, and the one-complements of some
widely used binary coefficients (Sørensen, Ochiai, etc.)
?
D(1,3) = 0.341 Site3 D(3,5) = 0.195
Site1 Site5
D(1,5) = 0.583
is.euclid(D.mat)
[1] FALSE # The D matrix is not Euclidean
sqrt(D)
Site3
D(1,3) = 0.584 D(3,5) = 0.442
Site1 Site5
D(1,5) = 0.764
1.0
0.00 0.0
0.01 0.1 Steep Slow
0.8
0.04 0.2 increase increase
0.09 0.3
0.6
sqrt(D)
0.16 0.4
0.4
0.25 0.5
0.36 0.6
0.2
0.49 0.7
0.64 0.8
0.0
0.81 0.9
0.0 0.2 0.4 0.6 0.8 1.0
1.00 1.0
D
res2$values$Eigenvalues
3.0
2.0
Eigenvalues
2.0
1.0
1.0
0.0
0.0
0 5 10 15 20 25 0 5 10 15 20 25
Index Index
Constants c1 and c2 increase the small D values relatively more than large ones.
Details: see Gower & Legendre (1986); Legendre & Legendre (2012, Sect. 9.3.4).
Non-Euclidean Euclidean solutions Euclidean solution
D matrix in (n – 2) = 2 dimensions in (n – 1) = 3 dimensions
x3
Lingoes
Original correction
matrix D
D̂hi = Dhi2 + 2c1
x3 sqrt(D)
x4
.49
9 x4
0
D=
4
0 .61 x3
x1 D = 0.864 x2 D=
x4? x1
.377
=0
D x3 D=
0.89
x1 x2 4
D = 0.8
Cailliez
correction x2
D̂hi = Dhi + c2
x4
8
0.57
D=
x1 D = 1.001 x2
Constants c1 and c2 increase the small D values relatively more than the large ones.
Example: spider data, n = 5 sites, p = 7 species
spiders[1:5,1:7]
is.euclid(D.mat)
[1] FALSE
0.4
0.3 1
1
0.2
0.1
2
4 2
Axis.2
Axis.2
4
0.0
-0.1
3
3
-0.2
5
-0.3
-0.4
-0.4 -0.2 0.0 0.2 0.4 -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6
Axis.1 Axis.1
PCoA, Lingoes correction PCoA, Cailliez correction
0.4
1 1
0.2
0.2
2 2
4 4
Axis.2
Axis.2
0.0
0.0
3 3
-0.2
-0.2
5 5
-0.4
-0.4
-0.4 -0.2 0.0 0.2 0.4 -0.6 -0.4 -0.2 0.0 0.2 0.4 0.6
Axis.1 Axis.1
PCoA biplots
0.4
Site 1
3
Alop.acce
2
0.2
1
Site 2
Site 4
0.0
0
Axis.2
Aulo.albi Alop.fabr
Pard.lugu Site 3
-1
Alop.cune
-0.2
Arct.lute
-2
Site 5
-3
-0.4
Axis.1
A full biplot example: PCoA ordination of the spider data.
6
0.4
Site
Site
Site20
19
15
18
Site 17 21
Site
Site 16
4
0.2
Site 25 Site 22
23
Site
Site28
24
2
Pard.lugu Site 26 Site 27
Site 8
Alop.fabr
Axis.2
Arct.peri
Site 9 Site 10
0.0
0
Site 11
Alop.acce
Site 12
Arct.lute
-2
-0.2
Troc.terr Zora.spin
Alop.cune
Pard.mont
Aulo.albi
-4
Site 2 Site 1
Site 6
Site 3
Pard.nigr
-0.4
SiteSite
14 5
Site Site
4 137
Pard.pull
-6
Site
Axis.1
What proportion of the species variance captured by the
percentage difference index is expressed on the first PCoA axes?
Examine the eigenvalue table (the first 4 and last lines are shown).
res2$values
Eigenvalues Relative_eig Broken_stick Cumul_eig Cumul_br_stick
1 2.7899889 0.34942841 0.14412803 0.3494284 0.1441280
2 1.9004365 0.23801761 0.10709099 0.5874460 0.2512190
3 0.7469060 0.09354524 0.08857247 0.6809913 0.3397915
4 0.3514708 0.04401949 0.07622679 0.7250107 0.4160183
[… ]
27 0.0184412 0.00230964 0.00137174 1.0000000 1.0000000
0.5
Pard.pull Pard.mont
Pard.nigr
Arct.lute
Aulo.albi
Alop.cune Alop.acce
0.0
Axis.2
Zora.spin
Alop.fabr
Troc.terr Arct.peri
-0.5
Pard.lugu
-1.0
Axis.1
Borcard, D., F. Gillet & P. Legendre. 2018. Numerical ecology with R, 2nd edition.
Use R! series, Springer Science, New York.
Gower, J. C. 1966. Some distance properties of latent root and vector methods used in
multivariate analysis. Biometrika 53: 325-338.
Gower, J. C. 1982. Euclidean distance geometry. Mathematical Scientist 7: 1-14.
Gower, J. C. & P. Legendre. 1986. Metric and Euclidean properties of dissimilarity
coefficients. Journal of Classification 3: 5-48.
Legendre, P. & L. Legendre. 2012. Numerical ecology, 3rd English edition. Elsevier
Science BV, Amsterdam. xvi + 990 pp. ISBN-13: 978-0444538680.
Ordination of sites
(c) Distance-based approach (PCoA)
Representation of elements:
Sites = symbols
Constrained
© Legendre & Legendreordination of species data
2012, Figure 9.8.
(d) Classical approach
Reference
Legendre, P. & L. Legendre. 2012. Numerical ecology, 3rd English edition. Elsevier
Science BV, Amsterdam. xvi + 990 pp. ISBN-13: 978-0444538680.