ESM 244 Lecture 4

ESM 244: Lecture 4

● PCA continued
● Redundancy analysis

Recall: Ordination methods

In PCA, the axes (PRINCIPAL COMPONENTS) are chosen based on the direction of the data with the greatest variance (therefore explaining the most variance possible using a simplified number of dimensions).
Cartesian Coordinate System

...but we can define it however we want to. We can redefine our primary axes. How do we describe our data in this new system?

Eigenvectors and eigenvalues are paired information:

● An eigenvector is the vector used to describe the new coordinate system (i.e., axes) in PCA, and is a linear* combination of the original variables

● The eigenvalue is a measure of how much of the total variance in the multivariate data is explained by an eigenvector
*Remember this...
Example: Let’s say instead of having two variables (x, y), we originally have 3
(age, hours watching TV, hours studying). For conceptual understanding, we’re
going to say that these observations miraculously fall in the general shape of an
ellipsoid (pancake-ish). Each point indicates an observation for a single person.
PCA: New axes are created (linear combinations of the original variables) such that the first (PC1) is the direction accounting for the most variance in the multivariate data, the second (PC2) accounts for the second most (after PC1 is taken into account), etc.

If PC1 and PC2 explain most of the variance in the data (see eigenvalues), then we'd still be seeing most of the important things about our data if we just view it on PC1 and PC2...
We’ve gone from 3 dimensions to 2 dimensions that explain the greatest
possible amount of variance. It doesn’t show us everything, but it does show us
a lot about the data in just 2 dimensions…
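A tiny R sketch of this idea (simulated data, not the lecture's): make three loosely related variables, run PCA, and check how much variance the first two PCs capture.

# Simulated example (illustrative only): three correlated variables,
# reduced from 3 dimensions to 2 with PCA.
set.seed(244)
n <- 200
age       <- rnorm(n, mean = 40, sd = 12)
tv_hrs    <- 0.15 * age + rnorm(n, sd = 1)        # loosely tied to age
study_hrs <- 8 - 0.5 * tv_hrs + rnorm(n, sd = 0.8)

df <- data.frame(age, tv_hrs, study_hrs)
pc <- prcomp(df, scale. = TRUE)

# Proportion of total variance captured by PC1 + PC2:
sum(pc$sdev[1:2]^2) / sum(pc$sdev^2)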
What did we just do?
Dimensionality Reduction

Converting complex multidimensional data into fewer dimensions to explain as much about the data as simply as possible.

OK, so that doesn't seem that cool going from 3 → 2 dimensions...but what if we could go from 15 → 2 dimensions and still describe 80% of the variance in the data? Then that becomes pretty cool.
Simplified data, loaded as .csv ‘Patients.csv’

Using these data, how many principal components will we get? Sure, you can do it by hand, but…
prcomp() function in R:
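A sketch of what that call might look like for these data, assuming 'Patients.csv' holds only numeric columns (e.g., Age, Height, Weight, SBP, Cholesterol; the column names are illustrative):

# Read the data and run PCA; scale. = TRUE is the scaling term discussed below.
patients <- read.csv("Patients.csv")
patients_pca <- prcomp(patients, center = TRUE, scale. = TRUE)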

What does this scaling term do?

For dataset 'Patients.csv':

Scaling data before PCA: you don't have to do it, but it's usually advisable.

WHEN?
• When you have variables that are on different SCALES in terms of values (rescaling makes ALL variables follow a z-distribution: mean 0, standard deviation 1)
• If you have variables with different UNITS, scaling can be useful

WHY?
• Because if your original data have very different variances, the variables with much larger variances will dominate the leading PCs (and this can be unit dependent, e.g., changing from meters to millimeters)
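A quick illustration of the units point (hypothetical numbers): the same measurement expressed in smaller units has a much larger variance, so without scaling it would dominate the leading PCs purely because of its units.

# Same height, two unit choices: variance differs by a factor of 1000^2.
height_m  <- rnorm(100, mean = 1.7, sd = 0.1)
height_mm <- height_m * 1000

var(height_m)    # roughly 0.01
var(height_mm)   # roughly 10,000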
What R gives us:
Standard deviations for the new PCs. Higher SD = more variance explained by that PC. (The squared standard deviations are the EIGENVALUES.)

The rotation matrix gives the EIGENVECTORS, also called "loadings" – the weight of each original variable in each component. Remember how we said that the new components (PCs) are linear combinations of the original variables? That's what these give us – the coefficients for those linear combinations.
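In code, assuming patients_pca from the earlier sketch, these pieces live in the prcomp() output:

# Eigenvectors ("loadings"): the weight of each original variable in each PC.
patients_pca$rotation

# Eigenvalues: the variance captured by each PC (squared standard deviations).
patients_pca$sdev^2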
But how much of the variance do the PCs actually explain?
These tell us how much of the total variance is extracted by EACH PC (note the order).

These tell us the cumulative amount of variance explained as you move from the first PC (PC1) to the final PC (here, PC5).
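These proportions can be reproduced directly from the eigenvalues (again assuming patients_pca from the earlier sketch):

# Proportion of variance per PC and the running (cumulative) total;
# compare with the rows printed by summary().
eig <- patients_pca$sdev^2
prop_var <- eig / sum(eig)
prop_var
cumsum(prop_var)

summary(patients_pca)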

So what do we choose for the 'cut-off' – the proportion of variance beyond which we say that additional components aren't so helpful (i.e., how do we know how many PCs to retain)?
There aren’t really rules about where the cut-off should be. Some say 80% is
pretty good, some say you need to look at the cumulative proportions, some say
you need to look at the eigenvalues…

It’s really a judgment call.

Generally, the eigenvalues fall off quickly and the cumulative proportions
increase quickly (especially useful for large numbers of initial variables):

[Figure: eigenvalues and cumulative proportion of variance plotted against principal component number (1–5).]
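Two common informal ways to turn those rules of thumb into a number of PCs (a sketch, assuming patients_pca as above; neither is a strict rule):

eig <- patients_pca$sdev^2

# Keep PCs whose eigenvalue exceeds 1 (only sensible for scaled data).
which(eig > 1)

# Keep the smallest number of PCs reaching ~80% cumulative variance.
which(cumsum(eig / sum(eig)) >= 0.80)[1]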
So let's say we pick the first 3 components to stick with, since we've decided that they explain an acceptable amount of the total variance:

What can we learn based on our truncated "model"?

Yes, in this case you might say "This hardly seems worth it to decrease my dimensions from 5 to 3"…but in some cases you'll have 50 variables and this can allow you to reduce them to just a few!
A scree-plot is useful for visualizing PC contributions

From: NYC Data Science Academy Higgs Boson Machine Learning Challenge
https://nycdatascience.com/blog/student-works/secretepipeline-higgs-boson-machine-learning-challenge/
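One way to draw it (a sketch, assuming patients_pca as above; factoextra is optional):

# Base R scree plot of the PCA object.
screeplot(patients_pca, type = "lines", main = "Scree plot")

# Or, with the factoextra package installed:
library(factoextra)
fviz_screeplot(patients_pca, addlabels = TRUE)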
We can also visualize contributions of the different initial variables to the PCs

STHDA Factoextra R Package: Easy Multivariate Data Analyses and Elegant Visualization

http://www.sthda.com/english/wiki/factoextra-r-package-easy-multivariate-data-analyses-and-elegant-visualization#visualizing-dimension-reduction-analysis-outputs
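For example, with factoextra (as in the STHDA link), assuming patients_pca as above:

library(factoextra)

# Variables plotted on PC1-PC2, colored by their contribution to those PCs.
fviz_pca_var(patients_pca, col.var = "contrib")

# Bar chart of variable contributions to PC1.
fviz_contrib(patients_pca, choice = "var", axes = 1)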
If the point of ordination methods/dimensional
reduction is to simplify our understanding of
multivariate relationships, then there should also be a
way to visualize that simplified information.

BIPLOTS: an approximation of the original multidimensional space, reduced to 2 dimensions, with information about variables (as vectors) and observations (as points)
BIPLOT EXAMPLE
A biplot for PCA shows two things:

(1) The loadings (eigenvector coefficients) of the variables on the first two principal components (red arrows)

(2) The score of each case (observation) on the first two principal components (numbered points)
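A sketch of how to draw one, assuming patients_pca as above:

# Base R biplot: arrows = variable loadings, labels = observation scores.
biplot(patients_pca, scale = 0)

# Or with factoextra:
library(factoextra)
fviz_pca_biplot(patients_pca, repel = TRUE)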
Interpreting biplot outputs: lines (variables)

(1) The length of a line (or arrow) indicates variance (longer length =
larger variance)

(2) The angle between lines (or arrows) indicates the strength of correlation

0° difference = correlation of 1
180° difference = correlation of -1
90° or 270° difference = correlation of 0
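A rough numeric check of the angle idea (a sketch, assuming patients_pca as above and that Height and Weight are columns; the approximation only holds well when PC1 and PC2 capture most of the variance):

# Variable coordinates on PC1-PC2 as in a correlation biplot (loading x sdev).
v1 <- patients_pca$rotation["Height", 1:2] * patients_pca$sdev[1:2]
v2 <- patients_pca$rotation["Weight", 1:2] * patients_pca$sdev[1:2]

# Cosine of the angle between the two arrows...
sum(v1 * v2) / (sqrt(sum(v1^2)) * sqrt(sum(v2^2)))

# ...compared to the actual correlation of the two variables.
cor(patients$Height, patients$Weight)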
INTERPRETING BIPLOT OUTPUTS: POINTS (observations)

(1) The closer points are to each other, the smaller their Euclidean distance – meaning they are more similar overall in multivariate space

(2) Biplots can make it easier to see multivariate groups and outliers
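In code (a sketch, assuming patients_pca as above):

# Scores of each observation on PC1 and PC2, and their pairwise Euclidean
# distances; smaller distance = more similar in the reduced space.
scores <- patients_pca$x[, 1:2]
dist(scores[1:5, ])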
Example interpretation of the Patients biplot:

- In the PC1 direction: SBP, Height, Weight, and Cholesterol vary similarly (as one increases, the others increase)

- In the PC2 direction: Height, Weight, and Cholesterol are minimally correlated with Age

- No observable clusters (no grouping was done)
Biplots (and dimensional reduction in general):

- Help us assess grouping/clustering of multivariate observations in a simpler way (2 dimensions)

- Let us visually explore data for interesting relationships and correlations between variables

- Are not a hypothesis test or a "model" – but that doesn't make them any less valuable!

- Yes, we sacrifice information – but it can be worth the trade-off for simplification and visualization
What else is there?

Mohammad Ali Zare Chahouki (2012) Classification and Ordination Methods as a Tool for Analyzing of Plant Communities, Intech Open (online).
PCA: No specification of explanatory variables and
outcome variables...it’s just variables

What if we have a scenario where we have explanatory variables and outcome variables?
Multivariate Approaches

● Unconstrained ordination (PCA, nMDS, etc.): find maximum-variance components for the variables; distance-based methods

● Constrained ordination (RDA, CCA, etc.): find maximum-variance components for the dependent variables, explained by the predictor variables

● Discrimination methods (MANOVA, etc.): test for significant differences in groups

● Cluster analysis: find similar groups of values/families
Redundancy Analysis:

• An extension of MLR with more than one DV

• > 2 IVs, > 2 DVs

• Same idea as PCA (constrained, because the principal components are combinations of the IVs)

• Similar interpretation, but tells us about correlations between IVs and DVs
An example of RDA: Exploring leaf litter
decomposition rates

GENERAL DATA STRUCTURE: one row per site (Site 1, Site 2, Site 3, …, Site n)

Environmental (independent) variables: N, Temp, C:N, etc.
Dependent (outcome) variables: k, A, DLV, C % increase, etc.
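A minimal RDA sketch in R with the vegan package. The leaf-litter data aren't available here, so this uses vegan's built-in dune data, which have the same layout (one outcome table and one predictor table, matched by site/row); swap in your own outcome and environmental data frames.

library(vegan)

data(dune)      # outcome (community) table: sites x response variables
data(dune.env)  # environmental predictors for the same sites

# Constrained ordination: outcome variation explained by the predictors.
dune_rda <- rda(dune ~ A1 + Management, data = dune.env)

summary(dune_rda)   # variance explained by constrained vs. unconstrained axes
plot(dune_rda)      # triplot: sites, outcome variables, predictor arrows
anova(dune_rda)     # permutation test of the overall constrained model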
“Redundancy analysis and Pearson correlations also revealed that leaf litter
decomposition (k) varied across the sites according to climatic factors (Figure
7). Specifically, it was positively correlated to growing season length (GSL),
degree-days (DD) and growing season average air temperature (Tair).
Additionally, leaf litter decomposition was related to moisture (negatively) and
temperature (positively) in the topsoil (Tsoil)…Similarly, willow leaf litter k and A
were positively correlated to leaf litter N concentration (N) and negatively
correlated to leaf litter C:N ratio (C:N)…”
Thanks to Sebastian Tapia for this example!

"We used a redundancy analysis to explore whether certain types of responses were related to the fishers' socioeconomic characteristics. Fishers
that would employ amplifying responses had greater economic wealth but
lacked options. Fishers who would adopt dampening responses possessed
characteristics associated with having livelihood options. Fishers who would
adopt neither amplifying nor dampening responses were less likely to belong
to community groups and sold the largest proportion of their catch.”
