
Shape of data

4/25
Today
1. Motivating neuroscience problem
2. Extended example: 2-dimensional data
3. Generalizing, PCA in full glory
4. Example
Motivating neuroscience problem

Question: How to obtain a compressed, quantitative description of the data?


Extended example: 2-dimensional data
• Come up with a heuristic for compressing data
• Turn this into a math problem
• Solve
A heuristic for compressing data
…a compressed, quantitative description of the data

1. Quantifying shape: Length(v), the length of the data along a direction v
2. Throw short directions away; keep long directions
PCA
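1. Writing down Length(p)
   1. Project onto $p$: $x_i \mapsto p^\top x_i$
   2. Measure length with the variance of the projections:

$$\mathrm{Length}(p) = \frac{1}{n}\sum_{i=1}^{n} (p^\top x_i)^2$$

   3. Simplify, using $(p^\top x_i)^2 = (p^\top x_i)(x_i^\top p) = p^\top x_i x_i^\top p$:

$$\mathrm{Length}(p) = \frac{1}{n}\sum_{i=1}^{n} p^\top x_i x_i^\top p = p^\top \Big( \frac{1}{n}\sum_{i=1}^{n} x_i x_i^\top \Big) p = p^\top C p$$

where $C = \frac{1}{n}\sum_{i=1}^{n} x_i x_i^\top$ is the covariance matrix.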
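A minimal numerical check of this identity, assuming NumPy and a synthetic 2-dimensional data set (the data, the rotation angle, and the helper names `length_by_projection` and `length_by_covariance` are illustrative assumptions, not from the slides). The data are mean-centered first, which is what makes the mean of squared projections equal to the variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D data: 500 points, long along one axis, then rotated by 30 degrees.
n = 500
Z = rng.normal(size=(n, 2)) * np.array([3.0, 0.5])
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = Z @ R.T                      # rows of X are the data points x_i
X = X - X.mean(axis=0)           # center the data (assumption stated above)

# Covariance matrix C = (1/n) * sum_i x_i x_i^T
C = X.T @ X / n

def length_by_projection(p, X):
    """Project every x_i onto p, then take the mean of the squared projections."""
    proj = X @ p
    return np.mean(proj ** 2)

def length_by_covariance(p, C):
    """The same quantity written as the quadratic form p^T C p."""
    return p @ C @ p

p = np.array([1.0, 1.0]) / np.sqrt(2.0)   # an arbitrary unit direction
a = length_by_projection(p, X)
b = length_by_covariance(p, C)
print(a, b)                                # the two values agree up to rounding
```

Computing `C` once and reusing it for every candidate direction is the practical payoff of the simplification: the data only need to be touched once.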
2. Quantitative criterion

$$\max_{p} \; p^\top C p \quad \text{s.t. } \|p\| = 1$$
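In two dimensions the criterion can be explored by brute force, since every unit vector is $(\cos\theta, \sin\theta)$ for some angle. The sketch below, assuming NumPy and a synthetic data set whose long direction sits near 30 degrees (both are illustrative assumptions), sweeps the angle and keeps the direction with the largest $p^\top C p$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical centered 2-D data with its long direction at about 30 degrees.
n = 500
Z = rng.normal(size=(n, 2)) * np.array([3.0, 0.5])
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = Z @ R.T
X = X - X.mean(axis=0)
C = X.T @ X / n

# p and -p have the same length, so half a circle of angles is enough.
angles = np.linspace(0.0, np.pi, 1801)
P = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # each row is a unit vector

# lengths[i] = P[i]^T C P[i] for every candidate direction
lengths = np.einsum("ij,jk,ik->i", P, C, P)

best = np.argmax(lengths)
p_best = P[best]
best_angle_deg = np.degrees(angles[best])
print(best_angle_deg, lengths[best])
```

A grid search like this only works in very low dimensions; it is here to make the constrained maximization concrete, not as a practical method.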
Solving our math problem
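$$\max_{p} \; p^\top C p \quad \text{s.t. } p^\top p = 1$$

Lagrangian:

$$\mathcal{L}(p, \lambda) = p^\top C p - \lambda (p^\top p - 1)$$

Setting the derivative with respect to $p$ to zero:

$$\frac{\partial \mathcal{L}}{\partial p} = 2Cp - 2\lambda p = 0 \quad\Rightarrow\quad Cp = \lambda p$$

The length along such a direction is

$$\mathrm{Length}(p) = p^\top C p = \lambda\, p^\top p = \lambda$$

In $Cp = \lambda p$: $C$ is the covariance matrix, $\lambda$ is the length, and $p$ is the direction.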
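The eigenvector condition can be checked numerically. The sketch below assumes NumPy and a synthetic 2-dimensional data set (an illustrative assumption); it uses `np.linalg.eigh`, which handles symmetric matrices and returns eigenvalues in ascending order, and compares the top eigenvector against many random unit directions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical centered 2-D data, long in one direction.
n = 500
Z = rng.normal(size=(n, 2)) * np.array([3.0, 0.5])
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = Z @ R.T
X = X - X.mean(axis=0)
C = X.T @ X / n

# eigh: eigenvalues ascending, eigenvectors in the columns.
eigvals, eigvecs = np.linalg.eigh(C)
lam = eigvals[-1]          # largest eigenvalue
p = eigvecs[:, -1]         # matching unit eigenvector

residual = np.linalg.norm(C @ p - lam * p)   # C p = lambda p, so this is ~0
length = p @ C @ p                           # equals lambda

# No random unit direction should beat the top eigenvector.
Q = rng.normal(size=(1000, 2))
Q /= np.linalg.norm(Q, axis=1, keepdims=True)
random_lengths = np.einsum("ij,jk,ik->i", Q, C, Q)

print(residual, length, lam, random_lengths.max())
```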
More directions? More eigenvectors!

$$C v_i = \lambda_i v_i$$
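A short sketch of what the full set of eigenvectors looks like, assuming NumPy and a synthetic 5-dimensional data set (the dimensions and scales are illustrative assumptions). Because $C$ is symmetric, its eigenvectors can be chosen orthonormal, each one's length is its eigenvalue, and the eigenvalues add up to the total variance (the trace of $C$).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical centered 5-D data with very different spreads per axis.
n, d = 2000, 5
scales = np.array([5.0, 3.0, 1.0, 0.5, 0.1])
X = rng.normal(size=(n, d)) * scales
X = X - X.mean(axis=0)
C = X.T @ X / n

# Eigen-decomposition, reordered from longest to shortest direction.
eigvals, V = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

gram = V.T @ V            # identity: the directions are orthonormal
diag = V.T @ C @ V        # diagonal, with the eigenvalues on the diagonal
lengths = np.array([V[:, i] @ C @ V[:, i] for i in range(d)])

print(eigvals)
print(eigvals.sum(), np.trace(C))
```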
Compression vs approximation
Summarizing
PCA: a recipe for reducing data to k dimensions
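1. Get the length function
   1. Compute the covariance matrix $C = \frac{1}{n}\sum_{i=1}^{n} x_i x_i^\top$
   2. Length (variance) is then just $p^\top C p$
2. Find the $k$ longest directions $v_1, \dots, v_k$, the eigenvectors with the largest eigenvalues: $C v_i = \lambda_i v_i$
3. With $V_k = [v_1, \dots, v_k]$:
   Compress: $x_i \mapsto V_k^\top x_i$
   or
   Approximate: $x_i \approx V_k V_k^\top x_i$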
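The recipe can be sketched end to end as follows, assuming NumPy. The function name `pca`, its return values, and the synthetic 5-dimensional data are illustrative assumptions rather than anything specified in the slides; the data are centered inside the function and the mean is added back when approximating.

```python
import numpy as np

def pca(X, k):
    """Reduce the rows of X (n points, d features) to k dimensions.

    Returns the compressed data (n x k), the approximation (n x d),
    and all eigenvalues sorted from largest to smallest.
    """
    mean = X.mean(axis=0)
    Xc = X - mean                              # center the data

    # Step 1: covariance matrix C = (1/n) * sum_i x_i x_i^T
    C = Xc.T @ Xc / len(Xc)

    # Step 2: k longest directions = eigenvectors with the largest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(C)       # ascending order
    order = np.argsort(eigvals)[::-1]
    V_k = eigvecs[:, order[:k]]                # d x k

    # Step 3: compress, or approximate in the original space
    Z = Xc @ V_k                               # compress: V_k^T x_i for each row
    X_hat = Z @ V_k.T + mean                   # approximate: V_k V_k^T x_i
    return Z, X_hat, eigvals[order]

rng = np.random.default_rng(0)

# Hypothetical data: 1000 points in 5-D, most variance in two directions.
scales = np.array([4.0, 2.0, 0.3, 0.2, 0.1])
X = rng.normal(size=(1000, 5)) * scales
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))   # random rotation
X = X @ Q.T

Z, X_hat, eigvals = pca(X, k=2)

# Mean squared approximation error equals the sum of the discarded eigenvalues.
mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
print(Z.shape, X_hat.shape, mse, eigvals[2:].sum())
```

Compression keeps only the k coordinates in `Z`; approximation maps them back to the original space, so `X_hat` has the same shape as `X` but lies in a k-dimensional subspace.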
An example: SNPs of 3,000 Europeans
