ESM 244 Lecture 4
● PCA continued
● Redundancy analysis
Recall: Ordination methods
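WHY scale (standardize) the variables before PCA?
• Because if the original variables have very different variances, those with much larger variances will disproportionately influence the principal components (PCs). This can be units dependent: e.g., changing a variable from meters to millimeters multiplies its variance by 1000² = 1,000,000.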
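A minimal Python sketch of the units problem, using made-up heights. It shows that converting meters to millimeters multiplies a variable's variance by 1000², while the standardized (z-score) values do not change at all. In R, `prcomp(x, scale. = TRUE)` performs this standardization before the PCA; the Python code here is only an illustration of the idea.

```python
from statistics import mean, stdev, variance

def standardize(values):
    """Return z-scores: (x - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(x - m) / s for x in values]

# Hypothetical heights of five individuals, recorded in two different units
heights_m = [1.62, 1.75, 1.80, 1.68, 1.71]
heights_mm = [h * 1000 for h in heights_m]

# Unscaled: switching to mm multiplies the variance by 1000**2,
# so in an unscaled PCA this variable would dominate the others
variance_ratio = variance(heights_mm) / variance(heights_m)

# Scaled: z-scores are identical regardless of units (up to rounding error)
z_m = standardize(heights_m)
z_mm = standardize(heights_mm)
same_after_scaling = all(abs(a - b) < 1e-9 for a, b in zip(z_m, z_mm))

print(f"variance ratio (mm vs m): {variance_ratio:.0f}")
print(f"identical after standardizing: {same_after_scaling}")
```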
What R gives us: the standard deviations of the new PCs. A higher SD means more variance is explained by that PC.
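The standard deviations are enough to get the proportion of variance each PC explains. Squaring an SD gives that PC's variance (its eigenvalue), and dividing by the sum of all eigenvalues gives the proportion. This is what `summary()` of an R `prcomp` object reports as "Proportion of Variance" and "Cumulative Proportion". A minimal Python sketch, with made-up SD values:

```python
from itertools import accumulate

def variance_explained(sdev):
    """From PC standard deviations, return (proportion, cumulative proportion) of variance."""
    eigenvalues = [s ** 2 for s in sdev]        # variance of each PC = SD squared
    total = sum(eigenvalues)
    proportion = [ev / total for ev in eigenvalues]
    cumulative = list(accumulate(proportion))   # running sum of the proportions
    return proportion, cumulative

# Hypothetical SDs for four PCs (in R these would come from prcomp(...)$sdev)
sdev = [2.0, 1.0, 0.5, 0.5]
prop, cum = variance_explained(sdev)
for i, (p, c) in enumerate(zip(prop, cum), start=1):
    print(f"PC{i}: proportion = {p:.3f}, cumulative = {c:.3f}")
```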
Generally, the eigenvalues fall off quickly and the cumulative proportion of variance explained rises quickly (this is especially helpful when there are many initial variables):
[Figure: scree plot of eigenvalues by principal component number, PCs 1 to 5]
So let’s say we keep the first 3 components, since we’ve decided that together they explain an acceptable amount of the total variance.
From: NYC Data Science Academy Higgs Boson Machine Learning Challenge
https://nycdatascience.com/blog/student-works/secretepipeline-higgs-boson-machine-learning-challenge/
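Deciding how many components to keep can be written as a simple rule: keep the smallest number of PCs whose cumulative proportion of variance reaches a chosen cutoff. The eigenvalues and the 80% cutoff below are hypothetical; what counts as "acceptable" is the analyst's judgment call. A minimal Python sketch:

```python
from itertools import accumulate

def n_components_for(eigenvalues, threshold):
    """Smallest k such that the first k PCs explain at least `threshold` of total variance."""
    total = sum(eigenvalues)
    for k, running in enumerate(accumulate(eigenvalues), start=1):
        if running / total >= threshold:
            return k
    return len(eigenvalues)  # cutoff never reached (e.g. threshold > 1): keep all PCs

# Hypothetical eigenvalues, sorted from largest to smallest
eigenvalues = [4.2, 2.5, 1.5, 0.9, 0.5, 0.4]
k = n_components_for(eigenvalues, threshold=0.80)  # hypothetical 80% cutoff
print(f"Keep the first {k} PCs")
```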
We can also visualize contributions of the different initial variables to the PCs
STHDA Factoextra R Package: Easy Multivariate Data Analyses and Elegant Visualization
http://www.sthda.com/english/wiki/factoextra-r-package-easy-multivariate-data-analyses-and-elegant-visualization#visualizing-dimension-reduction-analysis-outputs
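One common way to quantify a variable's contribution to a PC is its squared loading (its element of the unit-length eigenvector), expressed as a percentage. This sketch is intended to mirror the variable contributions that factoextra's `fviz_contrib()` displays for PCA, but that correspondence is an assumption here. The variable names and loadings are hypothetical.

```python
def contributions(eigenvector):
    """Percent contribution of each variable to one PC: squared loading / sum of squared loadings * 100."""
    squared = [v ** 2 for v in eigenvector]  # sign does not matter, only magnitude
    total = sum(squared)                     # equals 1 for a unit-length eigenvector
    return [100 * s / total for s in squared]

# Hypothetical PC1 loadings (unit-length eigenvector) for four variables
variables = ["height", "weight", "SBP", "cholesterol"]
pc1_loadings = [0.7, 0.5, 0.5, -0.1]

ranked = sorted(zip(variables, contributions(pc1_loadings)), key=lambda t: t[1], reverse=True)
for name, pct in ranked:
    print(f"{name}: {pct:.1f}%")
```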
If the point of ordination methods/dimensional
reduction is to simplify our understanding of
multivariate relationships, then there should also be a
way to visualize that simplified information.
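INTERPRETING BIPLOT OUTPUTS: LINES (variables)
(1) The length of a line (or arrow) indicates variance (longer length = larger variance)
(2) The angle between two lines indicates the correlation between those variables (correlation ≈ cosine of the angle):
0° difference = correlation of 1
180° difference = correlation of -1
90° or 270° difference = correlation of 0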
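The angle rules follow from the cosine: in a correlation biplot, the cosine of the angle between two variable arrows approximates their correlation (only approximately, since a 2-D biplot shows just two PCs). A minimal Python sketch; the arrow coordinates (loadings) for the two variables are hypothetical.

```python
import math

def arrow_angle(u, v):
    """Angle in degrees between two 2-D arrows, e.g. variable loadings on (PC1, PC2)."""
    cos_theta = (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos_theta))

def angle_to_correlation(degrees):
    """Approximate correlation implied by an angle: cos(0°) = 1, cos(90°) = 0, cos(180°) = -1."""
    return math.cos(math.radians(degrees))

# Hypothetical loadings: two arrows pointing in nearly the same direction
height = (0.90, 0.10)
weight = (0.85, 0.20)
theta = arrow_angle(height, weight)
print(f"angle = {theta:.1f} degrees, approx. correlation = {angle_to_correlation(theta):.2f}")
```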
INTERPRETING BIPLOT OUTPUTS: POINTS (observations)
(1) The closer points are to each other, the smaller the Euclidean distance between them, meaning they are more similar overall in multivariate space
(2) Biplots can make it easier to see multivariate groups and outliers
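Distances between points can be computed directly from their PC scores. A minimal Python sketch with hypothetical (PC1, PC2) scores for three observations. Distances on a 2-D biplot only approximate distances in the full multivariate space, to the extent that the plotted PCs capture most of the variance. Using the mean distance to all other points to flag the most isolated observation is just one simple, illustrative heuristic.

```python
import math

def euclidean(p, q):
    """Euclidean distance: square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical (PC1, PC2) scores for three observations
scores = {"A": (1.2, 0.3), "B": (1.0, 0.5), "C": (-2.0, 1.5)}

def mean_distance(name):
    """Average distance from one observation to all the others."""
    others = [other for other in scores if other != name]
    return sum(euclidean(scores[name], scores[o]) for o in others) / len(others)

print(f"d(A, B) = {euclidean(scores['A'], scores['B']):.2f}")  # close together: similar
print(f"d(A, C) = {euclidean(scores['A'], scores['C']):.2f}")  # far apart: dissimilar
print("most isolated point:", max(scores, key=mean_distance))
```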
In the PC1 direction, SBP (systolic blood pressure), height, weight, and cholesterol vary similarly (as one increases, the others increase)
[Figure: example biplot with PC1 and PC2 axes]
Biplots (and dimensional reduction in general):
Mohammad Ali Zare Chahouki (2012) Classification and Ordination Methods as a Tool for Analyzing of Plant Communities, Intech Open (online).
PCA: no distinction between explanatory and outcome variables; it’s just variables
[Figure: data table with one row per site (Site 2, Site 3, …, Site n)]
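Redundancy analysis does make that distinction: one matrix holds the response variables (e.g., leaf litter decomposition at each site) and another holds the explanatory variables (e.g., climate variables measured at the same sites). A common textbook formulation, sketched here assuming all columns are centered, is a multivariate linear regression of the responses on the explanatory variables followed by a PCA of the fitted values. In R, the vegan package's `rda()` function is one widely used implementation.

```latex
% Y: n x p matrix of centered response variables, one row per site
% X: n x m matrix of centered explanatory variables, same sites
% Step 1: multivariate linear regression of Y on X (fitted values)
\hat{Y} = X\,(X^{\top}X)^{-1}X^{\top}\,Y
% Step 2: PCA of the fitted values (eigen-decomposition of their covariance matrix)
S_{\hat{Y}} = \frac{1}{n-1}\,\hat{Y}^{\top}\hat{Y} = U\,\Lambda\,U^{\top}
% Site scores on the RDA axes (linear combinations of the explanatory variables)
Z = \hat{Y}\,U
% Proportion of the total variance in Y explained by X
R^{2} = \frac{\operatorname{tr}\left(S_{\hat{Y}}\right)}{\operatorname{tr}\left(S_{Y}\right)},
\qquad S_{Y} = \frac{1}{n-1}\,Y^{\top}Y
```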
“Redundancy analysis and Pearson correlations also revealed that leaf litter
decomposition (k) varied across the sites according to climatic factors (Figure
7). Specifically, it was positively correlated to growing season length (GSL),
degree-days (DD) and growing season average air temperature (Tair).
Additionally, leaf litter decomposition was related to moisture (negatively) and
temperature (positively) in the topsoil (Tsoil)…Similarly, willow leaf litter k and A
were positively correlated to leaf litter N concentration (N) and negatively
correlated to leaf litter C:N ratio (C:N)…”
Thanks to Sebastian Tapia for this example!