Week 3: Learning From Random Samples
Brandon Stewart (Princeton)
September 26/28, 2016
These slides are heavily influenced by Matt Blackwell, Adam Glynn, Justin Grimmer and Jens Hainmueller. Some illustrations by Shay O'Brien.
Where We’ve Been and Where We’re Going...
Last Week
- random variables
- joint distributions
This Week
- Monday: Point Estimation
  - sampling and sampling distributions
  - point estimates
  - properties (bias, variance, consistency)
- Wednesday: Interval Estimation
  - confidence intervals
  - comparing two groups
Next Week
- hypothesis testing
- what is regression?
Long Run
- probability → inference → regression
Questions?
Where We’ve Been and Where We’re Going...
Inference: given that we observe something in the data, what is our best guess of what it will be in other samples?
For the last few classes we have talked about probability: if we knew how the world worked, what kind of data should we expect to see?
Now we want to move the other way. If we have a set of data, can we estimate the various parts of the probability distributions that we have talked about? Can we estimate the mean, the variance, the covariance, etc.?
Moving forward this is going to be very important. Why? Because we are going to want to estimate the population conditional expectation in regression.
Primary Goals for This Week
An Overview
Population distribution: Y ∼ ?(µ, σ²)
Estimand/parameter: µ, σ²
Sample: (Y1, Y2, ..., YN)
Estimator/statistic: ĝ(Y1, Y2, ..., YN)
Estimate: ĝ(Y1 = y1, Y2 = y2, ..., YN = yN)
1 Populations, Sampling, Sampling Distributions
Conceptual
Mathematical
2 Overview of Point Estimation
3 Properties of Estimators
4 Review and Example
5 Fun With Hidden Populations
6 Interval Estimation
7 Large Sample Intervals for a Mean
Simple Example
Kuklinski Example
8 Small Sample Intervals for a Mean
9 Comparing Two Groups
10 Fun With Correlation
11 Appendix: χ2 and t-distribution
Populations
Population Distribution
For instance, suppose we have a binary r.v. such as intending to vote
for Hillary Clinton (X = 1). Then we might assume that the
population distribution is Bernoulli with unknown probability of
X = 1, θ.
Thus we would have:
f(x; θ) = θ^x (1 − θ)^(1−x)
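As a quick illustration (a minimal sketch, not from the original slides), we can simulate draws from such a Bernoulli population and see that the sample proportion estimates θ:

# simulate an i.i.d. Bernoulli sample and estimate theta
set.seed(1)
theta <- 0.48   # unknown in practice; fixed here only to run the simulation
x <- rbinom(1000, size = 1, prob = theta)
mean(x)         # the sample proportion, an estimate of theta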
Using Random Samples to Estimate Population Parameters
Here population is a vector that contains the Yi values for all units in the population, and the estimand is its mean µ.
Our estimators, µ̂, are functions of Y1, ..., Yn and will therefore be random variables with their own probability distributions.
Why Samples?
Estimands, Estimators, and Estimates
The goal of statistical inference is to learn about the unobserved
population distribution, which can be characterized by parameters.
What Are We Estimating?
Our goal is to learn about the data generating process that generated the sample.
We might assume that the data Y1, ..., Yn are i.i.d. draws from the population distribution with p.m.f. or p.d.f. of a certain form fY(·) indexed by unknown parameters θ.
Even without a full probability model we can estimate particular properties of a distribution, such as the mean E[Yi] = µ or the variance V[Yi] = σ².
An estimator θ̂ of some parameter θ is a function of the sample, θ̂ = h(Y1, ..., Yn), and thus is a random variable.
Why Study Estimators?
Two Goals:
1 Inference: How much uncertainty do we have in this estimate?
2 Evaluate Estimators: How do we choose which estimator to use?
We will consider the hypothetical sampling distribution of estimates
we would have obtained if we had drawn repeated samples of size n
from the population.
In real applications, we cannot draw repeated samples, so we attempt to approximate the sampling distribution (either by resampling or by mathematical formulas).
Sampling Distribution of the Sample Mean
Consider a toy example where we have the population and can construct
sampling distributions.
Repeated Sampling Procedure
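The code on this slide is only partially preserved, so here is a minimal sketch of the procedure; the object names (population, sample.means) and the Poisson population are illustrative assumptions:

# population data: the Yi values for all units in a toy population
set.seed(2139)
population <- rpois(10000, lambda = 45)

# draw 10,000 random samples of size n = 100 and keep each sample mean
sample.means <- replicate(
  10000,
  mean(sample(population, size = 100, replace = FALSE))
)

mean(sample.means)   # close to the population mean
sd(sample.means)     # the standard error of the sample mean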
[Figure: density histogram of the simulated sampling distribution of the sample mean.]
Sampling Distribution of the Sample Standard Deviation
[Figure: density histogram of the simulated sampling distribution of the sample standard deviation.]
Standard Error
The standard error of an estimator is the standard deviation of its sampling distribution; for the sample mean, it describes how much X̄n varies from sample to sample.
Bootstrapped Sampling Distributions
When we cannot draw repeated samples from the population, we can instead resample with replacement from the observed sample and use the distribution of estimates across resamples to approximate the sampling distribution.
Southern Responses to Baseline List
# of angering items    0    1    2    3
# of responses         2   37   66   34
# Bootstrapping
BootMeans <- replicate(
10000,
mean(sample(ysam,size=139,replace=TRUE))
)
> ysam <- c(rep(0,2), rep(1,37), rep(2,66), rep(3,34))
> n <- length(ysam)   # n = 139
>
> mean(ysam)   # sample mean
[1] 1.949640
>
> BootMeans <- replicate(
+   10000,
+   mean(sample(ysam, size = n, replace = TRUE))
+ )
>
> sd(BootMeans)   # estimated standard error
[1] 0.06234988
Bootstrapped Sampling Distribution of the Sample Mean
[Figure: density histogram of the 10,000 bootstrapped sample means.]
Independence for Random Variables
Recall: X and Y are independent iff
fX,Y(x, y) = fX(x) fY(y)
Independence implies
fY|X(y|x) = fY(y)
and thus
E[Y | X = x] = E[Y]
Notation for Sampling Distributions
Describing the Sampling Distribution for the Mean
We would like a full description of the sampling distribution for the mean,
but it will be useful to separate this description into three parts.
1. the expectation, E[X̄n]
2. the variance, V[X̄n]
3. the shape of the distribution (the "?")
Expectation of X̄n
E[X̄n] = E[(1/n) Σ(i=1 to n) Xi] = (1/n) Σ(i=1 to n) E[Xi] = µ
Variance of X̄n
For an i.i.d. sample, V[X̄n] = V[(1/n) Σ(i=1 to n) Xi] = (1/n²) Σ(i=1 to n) V[Xi] = σ²/n.
Question:
What about the “?”
X̄n ∼ N(µ, σ²/n)
Bernoulli (Coin Flip) Distribution
[Figure: simulated sampling distributions of X̄n for Bernoulli (coin flip) data at increasing sample sizes (n = 5, 30, 100); the distributions look increasingly normal.]
Poisson (Count) Distribution
[Figure: simulated sampling distributions of X̄n for Poisson (count) data at n = 5, 30, 100; again increasingly normal.]
Uniform Distribution
[Figure: simulated sampling distributions of X̄n for Uniform data at n = 2, 5, 30; again increasingly normal.]
Why would this be true?
The Central Limit Theorem
X̄n ∼approx N(µ, σ²/n)
The Central Limit Theorem: What are we glossing over?
We write this as
Xn →p a   or   plim(n→∞) Xn = a.
Definition (Law of Large Numbers)
Let X1, X2, ..., Xn be a sequence of i.i.d. random variables, each with finite mean µ. Then for all ε > 0,
X̄n →p µ as n → ∞,
or equivalently,
lim(n→∞) Pr(|X̄n − µ| ≥ ε) = 0.
[Figure: the running sample mean X̄n (labeled Xbar) plotted as n grows; the path settles down near the true mean, illustrating the LLN.]
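A short R sketch (not from the original slides) reproduces this picture with simulated coin flips:

# running mean of i.i.d. Bernoulli(0.5) draws settles down near 0.5
set.seed(1)
flips <- rbinom(1000, size = 1, prob = 0.5)
running.mean <- cumsum(flips) / seq_along(flips)
plot(running.mean, type = "l", xlab = "n", ylab = "Xbar")
abline(h = 0.5, lty = 2)   # the true mean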
The Central Limit Theorem: What are we glossing over?
Definition (Lindeberg-Lévy Central Limit Theorem)
Let X1, ..., Xn be a sequence of i.i.d. random variables each with mean µ and variance σ² < ∞. Then, for any population distribution of X,
√n (X̄n − µ) →d N(0, σ²).
As n grows, the √n-scaled sample mean converges to a normal random variable.
The CLT also implies that the standardized sample mean converges to a standard normal random variable:
Zn ≡ (X̄n − E[X̄n]) / √(V[X̄n]) = (X̄n − µ) / (σ/√n) →d N(0, 1).
Note that the CLT holds for a random sample from any population distribution (with finite mean and variance) — what a convenient result!
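A quick simulation (a sketch, not from the slides) shows the standardized sample mean looking standard normal even when the population is skewed:

# standardized sample means from an exponential population (mu = 1, sigma = 1)
set.seed(2)
n <- 30
z <- replicate(10000, {
  x <- rexp(n, rate = 1)
  (mean(x) - 1) / (1 / sqrt(n))
})
hist(z, breaks = 50, freq = FALSE)
curve(dnorm(x), add = TRUE)   # the N(0, 1) limit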
Question:
Point Estimation
Why Point Estimation?
Estimators for µ
Clearly, one of these estimators is better than the other, but how can we
define “better”?
Age population distribution in blue, sampling distributions in red
[Figure: the age population distribution with the sampling distributions of the candidate estimators, including X̃4, overlaid.]
Methods of Finding Estimators
Desirable Properties of Estimators
Properties of Estimators
Estimators are random variables, whose randomness comes from repeated sampling from the population.
The distribution of an estimator due to repeated sampling is called the sampling distribution.
The properties of an estimator refer to the characteristics of its sampling distribution.
Bias(µ̂) = E[µ̂ − E[X]] = E[µ̂] − µ
Bias is not the difference between a particular estimate and the parameter. For example,
Bias(X̄n) ≠ x̄n − E[X]
An estimator is unbiased when Bias(µ̂) = 0.
Example: Estimators for Population Mean
Candidate estimators:
Bias of Example Estimators
Estimators 1, 2, and 4 are unbiased because they get the right answer on average.
Estimator 3 is biased.
Age population distribution in blue, sampling distributions in red
[Figure: the unbiased estimators' sampling distributions are centered at the population mean; the biased estimator's is not.]
2: Efficiency (doesn’t change much sample to sample)
How should we choose among unbiased estimators?
All else equal, we prefer estimators that have a sampling distribution with
smaller variance.
Definition (Efficiency)
If θ̂1 and θ̂2 are two estimators of θ, then θ̂1 is more efficient relative to θ̂2 iff V[θ̂1] < V[θ̂2].
Variance of Example Estimators
Among the unbiased estimators, the sample average has the smallest variance. This means that Estimator 4 (the sample average) is likely to be closer to the true value µ than Estimators 1 and 2.
Age population distribution in blue, sampling distributions in red
[Figure: the candidate estimators' sampling distributions; the sample average is the most tightly concentrated around µ.]
3-4: Asymptotic Evaluations (what happens as sample size
increases)
Unbiasedness and efficiency are finite-sample properties of estimators,
which hold regardless of sample size
Estimators also have asymptotic properties, i.e., the characteristics of
sampling distributions when sample size becomes infinitely large
To define asymptotic properties, consider a sequence of estimators at
increasing sample sizes:
θ̂1 , θ̂2 , ..., θ̂n
Convergence in Probability
Definition (Convergence in Probability)
A sequence X1 , ..., Xn of random variables converges in probability towards a real
number a if, for all accuracy levels ε > 0,
lim(n→∞) Pr(|Xn − a| ≥ ε) = 0
We write this as
Xn →p a   or   plim(n→∞) Xn = a.
3: Consistency (does it get closer to the right answer as
sample size increases)
Definition (Consistency)
An estimator θ̂n is consistent if the sequence θ̂1, ..., θ̂n converges in probability to the true parameter value θ as the sample size n grows to infinity:
θ̂n →p θ   or   plim(n→∞) θ̂n = θ
Consistency
The sample mean X̄n is a consistent estimator for µ:
X̄n ∼approx N(µ, σ²/n)
As n increases, σ²/n approaches 0, so the sampling distribution collapses around the true value µ.
[Figure: sampling distributions of X̄n for n = 1, 25, 100, tightening around µ.]
Inconsistency
Consider the median estimator: X̃n = median(Y1, ..., Yn). Is this estimator consistent for the expectation?
[Figure: the sampling distribution of X̃100 (panels for n = 1, 25, 100); it concentrates, but not necessarily around the expectation.]
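A small simulation (a sketch with assumed settings, not the slide's own example) makes the point for a skewed population, where the median and the expectation differ:

# sample medians concentrate around the population median, not around E[Y]
set.seed(3)
pop <- rexp(100000, rate = 1/45)   # skewed population: mean 45, median about 31
medians <- replicate(5000, median(sample(pop, size = 1000)))
mean(medians)   # near the population median
mean(pop)       # the expectation is larger, so the median is inconsistent for it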
4: Asymptotic Distribution (known sampling distribution
for large sample size)
For example, we’ve seen that the sampling distribution of the sample mean
converges to the normal distribution.
Mean Squared Error
How can we choose between an unbiased estimator and a biased, but more
efficient estimator?
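The standard way to make this trade-off precise (a well-known identity, added here for completeness) is mean squared error, which combines both properties:

MSE(θ̂) = E[(θ̂ − θ)²] = Bias(θ̂)² + V[θ̂]

An estimator with a little bias but much smaller variance can have a lower MSE than an unbiased competitor.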
Review and Example
Basic Analysis
load("gerber_green_larimer.RData")
## turn turnout variable into a numeric
social$voted <- 1 * (social$voted == "Yes")
neigh.mean <- mean(social$voted[social$treatment == "Neighbors"])
neigh.mean
contr.mean <- mean(social$voted[social$treatment == "Civic Duty"])
contr.mean
neigh.mean - contr.mean
Stewart (Princeton) Week 3: Learning From Random Samples September 26/28, 2016 71 / 141
Population vs. Sampling Distribution
We want to think about the sampling distribution of the estimator.
[Figure: the population frequency distribution alongside the sampling distribution of the estimator.]
But remember that we only get to see one draw from the sampling distribution. Thus ideally we want an estimator with good properties.
Asymptotic Normality
We can plot this to get a feel for it.
[Figure: histogram of the simulated sampling distribution (sampling.dist) under no true difference, with the observed difference marked.]
Does the observed difference in means seem plausible if there really were no difference between the two groups in the population?
Summary of Today
Summary of Properties
Fun with Hidden Populations
Scale-up Estimator
N̂T = (Σi yi,T / Σi d̂i) × N
Network scale-up method
N̂T = (2/10) × 30 = 6
If yi,k ∼ Bin(di , Nk /N), then maximum likelihood estimator is
| {z }
basic scale-up model
P
yi,T
N̂T = Pi ×N
ˆ
i di
Stewart (Princeton) Week 3: Learning From Random Samples September 26/28, 2016 80 / 141
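A toy calculation in R (illustrative numbers chosen to match the example above, not real survey data):

# scale-up estimate: y counts hidden-population contacts per respondent,
# d.hat is each respondent's estimated network size, N is the population size
y <- c(1, 0, 1)       # contacts sum to 2
d.hat <- c(3, 4, 3)   # network sizes sum to 10
N <- 30
sum(y) / sum(d.hat) * N   # (2/10) * 30 = 6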
[Table: applications of the scale-up method, listing target population, location, and citation.]
Does it Work? Under What Conditions?
Meta points
Last Time
What is Interval Estimation?
A point estimator θ̂ estimates a scalar population parameter θ with a
single number.
However, because we are dealing with a random sample, we might
also want to report uncertainty in our estimate.
An interval estimator for θ takes the following form:
[θ̂lower, θ̂upper]
where θ̂lower and θ̂upper are random quantities that vary from sample
to sample.
The interval represents the range of possible values within which we
estimate the true value of θ to fall.
An interval estimate is a realized value from an interval estimator.
The estimated interval typically forms what we call a confidence
interval, which we will define shortly.
Normal Population with Known σ²
Suppose we have an i.i.d. random sample of size n, X1, ..., Xn, from X ∼ N(µ, 1).
From the previous lecture, we know that the sampling distribution of the sample average is
X̄n ∼ N(µ, σ²/n) = N(µ, 1/n), and so (X̄n − µ) / (1/√n) ∼ N(0, 1)
This implies
Pr(−1.96 < (X̄n − µ)/(1/√n) < 1.96) = .95
Why?
CDF of the Standard Normal Distribution
[Figure: the CDF of the standard normal distribution, with CDF(1.96) = .975 and CDF(−1.96) = .025 marked.]
Constructing a Confidence Interval with Known σ²
So we know that:
Pr(−1.96 < (X̄n − µ)/(1/√n) < 1.96) = .95
Rearranging yields:
Pr(X̄n − 1.96/√n < µ < X̄n + 1.96/√n) = .95
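In R the calculation is direct (a sketch with simulated data; σ = 1 is known by assumption):

# 95% confidence interval for mu with known sigma = 1
set.seed(4)
n <- 100
x <- rnorm(n, mean = 42, sd = 1)
xbar <- mean(x)
c(xbar - 1.96 / sqrt(n), xbar + 1.96 / sqrt(n))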
Kuklinski Example
Suppose the 1,161 respondents in the Kuklinski data set were the population, with µ = 42.7 and σ² = 257.9.
If we sampled 100 respondents, the sampling distribution of Ȳ100 would be approximately:
Ȳ100 ∼approx N(42.7, 2.579)
[Figure: the population distribution of age with the approximate sampling distribution of Ȳ100 overlaid.]
The standard error of Ȳ
The standard error is the standard deviation of the sampling distribution for Ȳ:
SE(Ȳ) = √(V(Ȳ)) = σ/√n
What is the probability that Ȳ falls within 1.96 SEs of µ?
[Figure: the sampling distribution of Ȳ.]
Normal Population with Unknown σ²
In practice, it is rarely the case that we somehow know the true value of σ².
Suppose now that we have an i.i.d. random sample of size n, X1, ..., Xn, from X ∼ N(µ, σ²), where σ² is unknown. Then, as before,
X̄n ∼ N(µ, σ²/n) and so (X̄n − µ)/(σ/√n) ∼ N(0, 1).
Estimators for the Population Variance
Two possible estimators of the population variance:
S²0n = (1/n) Σ(i=1 to n) (Xi − X̄n)²
S²1n = (1/(n−1)) Σ(i=1 to n) (Xi − X̄n)²
E[S²0n] = ((n−1)/n) σ²   and   E[S²1n] = σ²
S²1n (unbiased and consistent) is commonly called the sample variance.
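R's built-in var() and sd() use the n − 1 denominator, which is easy to check (a quick sketch, not from the slides):

# the sample variance divides by n - 1
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)
sum((x - mean(x))^2) / (n - 1)   # equals var(x)
var(x)
sum((x - mean(x))^2) / n         # the n-denominator version is smaller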
Estimating σ and the SE
Plugging in the sample standard deviation S = √(S²1n) gives the estimated standard error:
ŜE[µ̂] = S/√n
95% Confidence Intervals
µ̂ ∼approx N(µ, ŜE[µ̂]²)
µ̂ − µ ∼approx N(0, ŜE[µ̂]²)
(µ̂ − µ) / ŜE[µ̂] ∼approx N(0, 1)
We know that
P(−1.96 ≤ (µ̂ − µ)/ŜE[µ̂] ≤ 1.96) = 95%
[Figure: the standard normal density with the central 95% region between −1.96 and 1.96.]
95% Confidence Intervals
P(−1.96 ≤ (µ̂ − µ)/ŜE[µ̂] ≤ 1.96) = 95%
P(−1.96 ŜE[µ̂] ≤ µ̂ − µ ≤ 1.96 ŜE[µ̂]) = 95%
P(µ̂ − 1.96 ŜE[µ̂] ≤ µ ≤ µ̂ + 1.96 ŜE[µ̂]) = 95%
What does this mean?
1) Draw a random sample of ages. [Figure: the sampled ages on a 0-100 scale.]
2) Calculate µ̂ and ŜE[µ̂]: µ̂ = 43.53, ŜE[µ̂] = 1.555.
3) Form the 95% confidence interval: µ̂ ± 1.96 ŜE[µ̂] = (40.5, 46.6).
Interpreting a Confidence Interval
The 95% refers to the procedure, not to any one interval: across repeated samples, 95% of intervals constructed this way would cover the true parameter. A given realized interval either covers θ or it does not.
What makes a good confidence interval?
1 The coverage probability: how likely it is that the interval covers the
truth.
2 The length of the confidence interval:
- Infinite intervals (−∞, ∞) have coverage probability 1
- For a probability, a confidence interval of [0, 1] also has coverage probability 1
- Zero-length intervals, like [Ȳ, Ȳ], have coverage probability 0
Is 95% all there is?
Normal PDF
We pick z to control how much probability lies in the tails of the standard normal distribution.
When (1 − α) = 0.95, we want to pick z so that 2.5% of the probability is in each tail.
This gives us a value of 1.96 for z.
[Figure: the standard normal density with the tails beyond −1.96 and 1.96 shaded.]
Normal PDF
What if we want a 50% confidence interval?
When (1 − α) = 0.50, we want to pick z so that 25% of the probability is in each tail.
This gives us a value of 0.67 for z.
[Figure: the standard normal density with the tails beyond −0.67 and 0.67 shaded.]
(1 − α)% Confidence Intervals
In general, a (1 − α)% confidence interval for µ is µ̂ ± zα/2 ŜE[µ̂], where zα/2 is the value that puts probability α/2 in each tail of the standard normal distribution.
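In R, these critical values come from qnorm() (a quick sketch):

qnorm(1 - 0.05 / 2)   # 1.96 for a 95% interval
qnorm(1 - 0.50 / 2)   # 0.67 for a 50% interval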
The problem with small samples
If the sample is large enough, then the sampling distribution of the sample
mean follows a normal distribution.
If the sample is large enough, then the sample standard deviation (S) is a
good approximation for the population standard deviation (σ).
When the sample size is small, we need to know something about the
distribution in order to construct confidence intervals with the correct
coverage (because we can’t appeal to the CLT or assume that S is a good
approximation of σ).
Canonical Small Sample Example
[Figure: a small sample of percent-alcohol measurements with the normal-based interval µ̂ ± 1.96 ŜE[µ̂].]
The t distribution
Suppose X ∼ N(µ, σ²). If we know σ, then
(X̄ − µ) / (σ/√n) ∼ N(0, 1)
If we instead estimate σ with the sample standard deviation s, then
(X̄ − µ) / (s/√n) ∼ t(n−1)
The t distribution
For small n, the distribution of (X̄ − µ)/(s/√n) is still symmetric and bell-shaped, but it has heavier tails than the standard normal Z. As n grows, estimates of σ improve and extreme values of (X̄ − µ)/(s/√n) become less likely, so t(n−1) approaches N(0, 1).
[Figure: the t1 density against the standard normal Z, with the central 95% region marked.]
(1 − α)% Confidence Intervals
P(−tα/2 ≤ (µ̂ − µ)/ŜE[µ̂] ≤ tα/2) = (1 − α)%
P(µ̂ − tα/2 ŜE[µ̂] ≤ µ ≤ µ̂ + tα/2 ŜE[µ̂]) = (1 − α)%
where tα/2 is taken from the t distribution with n − 1 degrees of freedom.
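A small R sketch of the t-based interval (made-up data with n = 6):

# 95% t interval for a small sample
x <- c(4.2, 5.1, 4.8, 5.0, 4.6, 4.9)
n <- length(x)
se <- sd(x) / sqrt(n)
t.crit <- qt(1 - 0.05 / 2, df = n - 1)   # about 2.57 when n - 1 = 5
mean(x) + c(-1, 1) * t.crit * se
t.test(x)$conf.int   # the built-in version gives the same interval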
Small Sample Example
The normal-based interval µ̂ ± 1.96 ŜE[µ̂] is too short for the small percent-alcohol sample; the t-based interval µ̂ ± tα/2 ŜE[µ̂] = µ̂ ± 2.57 ŜE[µ̂] (using 5 degrees of freedom) has the correct coverage.
[Figure: the two intervals for the percent-alcohol data; the t interval is wider.]
Another Rationale for the t-Distribution
Kuklinski Example Returns
The Kuklinski et al. (1997) article compares responses to the baseline list
with responses to the treatment list.
Comparing Two Groups
Sampling Distribution for X̄1 − X̄2
E[X̄1 − X̄2] = E[X̄1] − E[X̄2]
            = (1/n1) Σi E[X1i] − (1/n2) Σj E[X2j]
            = (1/n1) Σi µ1 − (1/n2) Σj µ2
            = µ1 − µ2
Sampling Distribution for X̄1 − X̄2
For independent samples, V[X̄1 − X̄2] = V[X̄1] + V[X̄2] = σ1²/n1 + σ2²/n2, so in large samples
X̄1 − X̄2 ∼approx N(µ1 − µ2, σ1²/n1 + σ2²/n2)
CIs for µ1 − µ2
Using the same type of argument that we used for the univariate case, we
write a (1 − α)% CI as the following:
(X̄1 − X̄2) ± zα/2 √(σ1²/n1 + σ2²/n2)
Interval estimation of the population proportion
Let's say that we have a sample of i.i.d. Bernoulli random variables, Y1, ..., Yn, where each takes Yi = 1 with probability π. Note that this is also the population proportion of ones. We have shown in previous weeks that the expectation of one of these variables is just the probability of seeing a 1: E[Yi] = π.
The variance of a Bernoulli random variable is a simple function of its mean: Var(Yi) = π(1 − π).
Problem: Show that the sample proportion, π̂ = (1/n) Σ(i=1 to n) Yi, of the above i.i.d. Bernoulli sample is unbiased for the true population proportion, π, and that the sampling variance is equal to π(1 − π)/n.
Note that if we have an estimate of the population proportion, π̂, then we also have an estimate of the sampling variance: π̂(1 − π̂)/n.
Given the facts from the previous problem, we just apply the same logic as for the population mean to get the following confidence interval:
P(π̂ − zα/2 √(π̂(1 − π̂)/n) ≤ π ≤ π̂ + zα/2 √(π̂(1 − π̂)/n)) = (1 − α)
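In R (a sketch with simulated 0/1 responses):

# 95% confidence interval for a population proportion
set.seed(6)
y <- rbinom(500, size = 1, prob = 0.55)
n <- length(y)
pi.hat <- mean(y)
se <- sqrt(pi.hat * (1 - pi.hat) / n)
pi.hat + c(-1, 1) * qnorm(0.975) * se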
Gerber, Green, and Larimer experiment
Let's go back to the Gerber, Green, and Larimer experiment from last class and the results of their experiment (turnout by treatment group).
Let's use what we have learned up until now and the information in the table to calculate a 95% confidence interval for the difference in proportions voting between the Neighbors group and the Civic Duty group.
You may assume that the samples within each group are i.i.d. and the two samples are independent.
Calculating the CI for the social pressure effect
We know the distribution of the sample proportion who turned out among the Civic Duty group: π̂C ∼approx N(πC, πC(1 − πC)/nC)
Sample proportions are just sample means, so we can take the difference in means:
π̂N − π̂C ∼approx N(πN − πC, SEN² + SEC²)
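A sketch of the calculation in R; the group sizes and turnout proportions below are illustrative stand-ins, since the paper's table is not reproduced above:

# 95% CI for the difference in turnout proportions
n.N <- 38000; p.N <- 0.378   # Neighbors group (illustrative values)
n.C <- 38000; p.C <- 0.315   # Civic Duty group (illustrative values)
se <- sqrt(p.N * (1 - p.N) / n.N + p.C * (1 - p.C) / n.C)
(p.N - p.C) + c(-1, 1) * 1.96 * se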
Next Week
Hypothesis testing
What is regression?
Reading
- Aronow and Miller 3.2.2
- Fox Chapter 2: What is Regression Analysis?
- Fox Chapter 5.1: Simple Regression
- Aronow and Miller 4.1.1 (bivariate regression)
- "Momentous Sprint at the 2156 Olympics" by Andrew J. Tatem et al., Nature 2004
Fun with Anscombe’s Quartet
[Figure: Anscombe's quartet: four x-y scatterplots with strikingly different patterns, each with correlation 0.82.]
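The quartet ships with base R, so the correlations are easy to verify (a quick sketch):

# all four Anscombe pairs have essentially the same correlation (about 0.816)
data(anscombe)
sapply(1:4, function(i) cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]]))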
Fun Beyond Correlation
Reshef, D. N., Reshef, Y. A., Finucane, H. K., Grossman, S. R., McVean, G., Turnbaugh, P. J., Lander, E. S., Mitzenmacher, M., and Sabeti, P. C. (2011). "Detecting Novel Associations in Large Data Sets." Science 334(6062): 1518-1524.
Fun with Correlation Part 2
Enter the Maximal Information Coefficient
A Sketch of Why the Student t-Distribution Arises
χ2 Distribution
Suppose Z ∼ Normal(0, 1) and consider X = Z². Then
FX(x) = P(X ≤ x)
      = P(Z² ≤ x)
      = P(−√x ≤ Z ≤ √x)
      = ∫ from −√x to √x of (1/√(2π)) e^(−z²/2) dz
      = FZ(√x) − FZ(−√x)
Differentiating gives the density:
fX(x) = ∂FX(x)/∂x = fZ(√x) · (1/(2√x)) + fZ(−√x) · (1/(2√x))
Definition
Suppose X is a continuous, nonnegative random variable with pdf
f(x) = (1 / (2^(n/2) Γ(n/2))) x^(n/2 − 1) e^(−x/2)
Then X follows a χ2 distribution with n degrees of freedom: X ∼ χ2(n).
χ2 Properties
Suppose X ∼ χ2(n), so that X = Σ(i=1 to n) Zi² for i.i.d. Zi ∼ Normal(0, 1). Then
E[X] = E[Σ(i=1 to n) Zi²] = Σ(i=1 to n) E[Zi²]
Since var(Zi) = E[Zi²] − E[Zi]² and E[Zi] = 0, we have 1 = E[Zi²], so
E[X] = n
χ2 Properties
Suppose X ∼ χ2(n). Because the Zi are independent,
var(X) = Σ(i=1 to n) var(Zi²)
       = Σ(i=1 to n) (E[Zi⁴] − E[Zi²]²)
       = Σ(i=1 to n) (3 − 1) = 2n
using the normal fourth moment E[Zi⁴] = 3.
Student’s t-Distribution
Definition
Suppose Z ∼ Normal(0, 1) and U ∼ χ2(n), with Z and U independent. Define the random variable Y as
Y = Z / √(U/n)
Then Y follows Student's t-distribution with n degrees of freedom, with pdf
f(x) = (Γ((n+1)/2) / (√(πn) Γ(n/2))) (1 + x²/n)^(−(n+1)/2)
Student’s t-Distribution, Properties
Suppose n = 1: this is the Cauchy distribution.
[Figure: the running mean of simulated Cauchy draws across trials; it never settles down.]
If X ∼ Cauchy(1), then E[X] does not exist, so the Law of Large Numbers does not apply to the sample mean.
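A quick R sketch of that picture:

# the running mean of Cauchy (t with 1 df) draws never converges
set.seed(5)
x <- rcauchy(10000)
running.mean <- cumsum(x) / seq_along(x)
plot(running.mean, type = "l", xlab = "Trial", ylab = "Mean")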