Dav LVC-1
Data Collection and Visualization for Exploratory Data Analysis
http://www.carolineuhler.com
This file is meant for personal use by adukejyme1990@gmail.com only.
Sharing or publishing the contents in part or full is liable for legal action.
Caroline Uhler (MIT) Exploratory Data Analysis Great Learning 2 / 28
Overview
Network analysis
Hypothesis testing
Does mammography speed up detection by enough to matter?
Instead compare the whole treatment group against the whole control
group (i.e., compare the death rates per 1,000: 1.3 versus 2.0)
Intention-to-treat analysis
Death rate from breast cancer in control group: 0.0020 (= 63/31000)
Death rate from breast cancer in treatment group: 0.0013 (= 39/31000)
Is the difference in death rates between the treatment and control group
sufficient to establish that mammography reduces the risk of death from
breast cancer?
[Figure: density of the control model, a binomial distribution with probability 63/31000 and sample size 31000. Horizontal axis: x-value; vertical axis: density. Marked on the plot: observed number of deaths with treatment = 39, observed number of deaths in control = 63, and the significance level = 5%.]
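The test pictured above can be sketched in a few lines of Python. This is a minimal illustration, not necessarily the computation used in the lecture: it assumes a one-sided test, treats the control death rate 63/31000 as the known null probability, and uses the treatment group size of 31000 from the slide. The helper names `binom_log_pmf` and `binom_cdf` are made up for this example.

```python
import math

def binom_log_pmf(k, n, p):
    """Log of the Binomial(n, p) probability of exactly k events."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.exp(binom_log_pmf(i, n, p)) for i in range(k + 1))

n = 31000                        # group size from the slide
p_control = 63 / n               # null model: death rate of the control group
observed_treatment_deaths = 39   # deaths observed in the treatment group
alpha = 0.05                     # significance level

# One-sided p-value: probability of seeing 39 or fewer deaths
# if the treatment group followed the control model.
p_value = binom_cdf(observed_treatment_deaths, n, p_control)

print(f"P(X <= 39) under the control model: {p_value:.4g}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

The p-value falls below the 5% level, so under these assumptions the null hypothesis that mammography has no effect on the death rate is rejected.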
measure 100 variables before and after taking the pill: weight, blood
pressure, etc.
perform a hypothesis test with a significance level of 5%
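To see why running 100 tests at the 5% level is a problem, the arithmetic below may help. It is a sketch under two simplifying assumptions that the slide does not state: the pill has no effect on any of the variables (every null hypothesis is true), and the 100 tests are independent.

```python
alpha = 0.05   # significance level of each individual test
m = 100        # number of variables tested

# Expected number of variables flagged as significant purely by chance.
expected_false_positives = m * alpha

# Probability that at least one of the m tests rejects by chance.
prob_at_least_one = 1 - (1 - alpha) ** m

print(f"expected false positives: {expected_false_positives:.1f}")
print(f"P(at least one false positive): {prob_at_least_one:.3f}")
```

Under these assumptions about five variables are expected to look significant even for a pill that does nothing, and at least one false positive is almost certain.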
Holm-Bonferroni correction:
Sort p-values in increasing order: p(1) ≤ · · · ≤ p(m)
Reject the hypothesis with the i-th smallest p-value when (m − j + 1)p(j) ≤ α for all j ≤ i (step-down procedure; more power than Bonferroni)
Holm-Bonferroni correction implies FWER ≤ α (FWER: family-wise error rate)
Benjamini-Hochberg correction:
Sort p-values in increasing order: p(1) ≤ · · · ≤ p(m)
Reject the hypotheses with the k smallest p-values, where k is the largest i with mp(i) /i ≤ α (step-up procedure)
Benjamini-Hochberg correction implies FDR ≤ α (FDR: false discovery rate)
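The two corrections can be sketched in plain Python as follows. This is an illustrative implementation of the standard step-down (Holm) and step-up (Benjamini-Hochberg) procedures; the function names and the five example p-values are made up for this sketch and do not come from the lecture.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 for every p-value with m * p <= alpha."""
    m = len(p_values)
    return [m * p <= alpha for p in p_values]

def holm_bonferroni(p_values, alpha=0.05):
    """Step-down: walk up the sorted p-values, stop at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if (m - rank + 1) * p_values[idx] <= alpha:
            reject[idx] = True
        else:
            break  # all larger p-values are not rejected either
    return reject

def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up: find the largest rank i with m * p(i) / i <= alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if m * p_values[idx] / rank <= alpha:
            k = rank
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject

# Hypothetical p-values, for illustration only.
p_values = [0.03, 0.001, 0.2, 0.012, 0.007]

n_bonferroni = sum(bonferroni(p_values))
n_holm = sum(holm_bonferroni(p_values))
n_bh = sum(benjamini_hochberg(p_values))
print(n_bonferroni, n_holm, n_bh)
```

On these example values Bonferroni rejects 2 hypotheses, Holm-Bonferroni 3, and Benjamini-Hochberg 4, which reflects the ordering in power: Holm is never less powerful than Bonferroni, and controlling the FDR instead of the FWER allows more rejections.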
Commonly accepted practice
Reference: J. Novembre et al., Genes mirror geography within Europe, Nature 456 (2008).
PCA (Version 1): Orthogonal directions
Definition 1: Maximize projection variance
Start with centered data X ∈ R^{n×p}
Definition 2: Minimize projection residuals
PC 1: Straight line with smallest orthogonal distance to all points
PC 1 & PC 2: Plane with smallest orthogonal distance to all points
etc.
PCA: Intuition in 2d
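The two definitions of the first principal component can be checked against each other in 2d with a short sketch. The five data points are hypothetical, and the closed-form angle used here is specific to the two-dimensional case; in higher dimensions one would compute eigenvectors of the covariance matrix instead.

```python
import math

def center(points):
    """Subtract the mean of each coordinate."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    return [(x - mx, y - my) for x, y in points]

def projected_variance(centered, theta):
    """Variance of the points projected onto the direction at angle theta."""
    ux, uy = math.cos(theta), math.sin(theta)
    return sum((x * ux + y * uy) ** 2 for x, y in centered) / len(centered)

def mean_squared_residual(centered, theta):
    """Mean squared orthogonal distance to the line at angle theta."""
    ux, uy = math.cos(theta), math.sin(theta)
    total = 0.0
    for x, y in centered:
        t = x * ux + y * uy              # coordinate along the line
        rx, ry = x - t * ux, y - t * uy  # orthogonal residual
        total += rx * rx + ry * ry
    return total / len(centered)

def first_pc_angle(centered):
    """Angle of the max-variance direction for 2d data (closed form)."""
    n = len(centered)
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n
    return 0.5 * math.atan2(2 * sxy, sxx - syy)

# Hypothetical 2d data, for illustration only.
points = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
centered = center(points)
theta = first_pc_angle(centered)

# Scan all directions in 1-degree steps and compare with the closed form.
grid = [k * math.pi / 180 for k in range(180)]
best_variance = max(grid, key=lambda t: projected_variance(centered, t))
best_residual = min(grid, key=lambda t: mean_squared_residual(centered, t))

print(f"PC 1 angle (closed form): {math.degrees(theta):.2f} degrees")
print(f"max-variance angle on grid: {math.degrees(best_variance):.0f} degrees")
print(f"min-residual angle on grid: {math.degrees(best_residual):.0f} degrees")
```

Both scans land on the same direction because, for centered data, projected variance plus mean squared residual equals the total variance, so maximizing one is the same as minimizing the other.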
Person  Age (years)  Height (cm)
A       35           190
B       40           190
C       35           160
D       40           160
Person  Age (years)  Height (feet)
A       35           6.232
B       40           6.232
C       35           5.248
D       40           5.248
Using covariance will find the variable with the largest spread as the 1st PC
Use correlation if different units are compared
[Figure: Example 1: scaled; 4 persons (A, B, C, D)]
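The effect of units can be reproduced with the four persons from the tables above. This is a minimal sketch using population variances from Python's `statistics` module; the `standardize` helper is made up for this example. In this data set age and height are uncorrelated, so the principal axes coincide with the coordinate axes and the first PC under covariance is simply the variable with the larger variance.

```python
from statistics import mean, pstdev, pvariance

ages = [35, 40, 35, 40]                    # years, persons A to D
height_cm = [190, 190, 160, 160]           # height in cm
height_ft = [6.232, 6.232, 5.248, 5.248]   # same heights in feet

def standardize(values):
    """Center and scale to unit variance (what using correlation amounts to)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

for unit, heights in (("cm", height_cm), ("feet", height_ft)):
    var_age, var_height = pvariance(ages), pvariance(heights)
    dominant = "height" if var_height > var_age else "age"
    print(f"height in {unit}: var(age)={var_age:.3f}, "
          f"var(height)={var_height:.3f}, 1st PC follows {dominant}")

# After standardizing, the unit of measurement no longer matters.
z_cm = standardize(height_cm)
z_ft = standardize(height_ft)
print([round(z, 6) for z in z_cm])
print([round(z, 6) for z in z_ft])
```

With height in cm the first PC follows height, with height in feet it follows age, and the standardized heights are identical in both cases.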
Stochastic neighbor embedding (tSNE)
For tSNE:
L. van der Maaten & G. E. Hinton. Visualizing Data using t-SNE.
JMLR, 2008.
G. E. Hinton & S. T. Roweis. Stochastic Neighbor Embedding. NIPS, 2002.