6-Testing&conf Intervals PDF
Introduction to Econometrics
Professor Woodbury
Fall Semester 2015
Experiments and Basic Statistics
1. The probability framework for statistical inference
2. Estimation
3. Hypothesis testing
4. Confidence intervals
3. Hypothesis testing
The hypothesis testing problem (for the mean)
Definition: The statement being tested in a significance test is
called the null hypothesis. The significance test is designed to
assess the strength of the evidence against the null hypothesis.
Usually, the null is a statement of no effect or no difference.
Do mean hourly earnings of recent US college students equal
$20 per hour?
Are mean earnings the same for men and women?
Do students at private colleges take on greater debt than
those who go to publics?
But maybe the true mean really is $\mu_{Y,0}$ (the null is true), but $\bar{Y}$
differs from $\mu_{Y,0}$ because of random sampling (sampling error)
p-value = $\Pr_{H_0}\left[\,|\bar{Y} - \mu_{Y,0}| > |\bar{Y}^{act} - \mu_{Y,0}|\,\right]$
where $\bar{Y}^{act}$ is the value of $\bar{Y}$ actually observed (nonrandom)
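p-value = $\Pr_{H_0}\left[\left|\frac{\bar{Y} - \mu_{Y,0}}{\sigma_Y/\sqrt{n}}\right| > \left|\frac{\bar{Y}^{act} - \mu_{Y,0}}{\sigma_Y/\sqrt{n}}\right|\right]$
= $\Pr_{H_0}\left[\left|\frac{\bar{Y} - \mu_{Y,0}}{\sigma_{\bar{Y}}}\right| > \left|\frac{\bar{Y}^{act} - \mu_{Y,0}}{\sigma_{\bar{Y}}}\right|\right]$
$\cong$ probability under left + right N(0,1) tails
where $\sigma_{\bar{Y}}$ = std. dev. of the distribution of $\bar{Y}$ = $\sigma_Y/\sqrt{n}$.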
Calculating the p-value with $\sigma_Y$ known:
For large n, the p-value is the probability in the N(0,1) tails outside $|(\bar{Y}^{act} - \mu_{Y,0})/\sigma_{\bar{Y}}|$
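The known-$\sigma_Y$ calculation can be sketched in a few lines of Python. This is only an illustration under the large-n normal approximation: the function name `p_value_known_sigma` and the numbers passed to it are hypothetical, not taken from the lecture.

```python
from math import sqrt
from statistics import NormalDist

def p_value_known_sigma(y_bar_act, mu_0, sigma_y, n):
    """Two-sided large-n p-value when the population std. dev. sigma_Y is known."""
    sigma_y_bar = sigma_y / sqrt(n)            # std. dev. of the distribution of Y-bar
    z_act = (y_bar_act - mu_0) / sigma_y_bar   # standardized distance from the null value
    # Area in the left + right N(0,1) tails beyond |z_act|
    return 2 * (1 - NormalDist().cdf(abs(z_act)))

# Hypothetical inputs, chosen only for illustration
p = p_value_known_sigma(y_bar_act=21.0, mu_0=20.0, sigma_y=5.0, n=100)
print(round(p, 4))
```

Here the observed mean lies two standard deviations of $\bar{Y}$ away from the null value, so the p-value is the N(0,1) area outside $\pm 2$.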
Estimator of the variance of Y:
$s_Y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2$ = "sample variance of Y"
Fact: If $(Y_1, \ldots, Y_n)$ are i.i.d. and $E(Y^4) < \infty$, then
$s_Y^2 \xrightarrow{p} \sigma_Y^2$
Why does the law of large numbers apply?
Because $s_Y^2$ is a sample average (it averages over observations); see Appendix 3.3
$\hat{\sigma}_{\bar{Y}} = SE(\bar{Y}) = \sqrt{s_Y^2/n} = s_Y/\sqrt{n}$
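As a minimal sketch of the two formulas above, the sample variance (with the $n - 1$ divisor) and the standard error of $\bar{Y}$ can be computed directly. The helper names and the five data points are hypothetical and serve only to make the arithmetic concrete.

```python
from math import sqrt

def sample_variance(ys):
    """s_Y^2 = (1 / (n - 1)) * sum of squared deviations from the sample mean."""
    n = len(ys)
    y_bar = sum(ys) / n
    return sum((y - y_bar) ** 2 for y in ys) / (n - 1)

def standard_error(ys):
    """SE(Y-bar) = sqrt(s_Y^2 / n) = s_Y / sqrt(n)."""
    return sqrt(sample_variance(ys) / len(ys))

# Hypothetical hourly earnings, for illustration only
ys = [18.0, 22.0, 20.0, 25.0, 15.0]
s2 = sample_variance(ys)   # squared deviations sum to 58, divided by n - 1 = 4
se = standard_error(ys)
print(s2, round(se, 4))
```

With so few observations the large-n results in these slides would not apply; the tiny sample is used only so the calculation can be checked by hand.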
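Computing the p-value with $\sigma_Y^2$ estimated:
p-value = $\Pr_{H_0}\left[\,|\bar{Y} - \mu_{Y,0}| > |\bar{Y}^{act} - \mu_{Y,0}|\,\right]$
= $\Pr_{H_0}\left[\left|\frac{\bar{Y} - \mu_{Y,0}}{\sigma_Y/\sqrt{n}}\right| > \left|\frac{\bar{Y}^{act} - \mu_{Y,0}}{\sigma_Y/\sqrt{n}}\right|\right]$
$\cong \Pr_{H_0}\left[\left|\frac{\bar{Y} - \mu_{Y,0}}{s_Y/\sqrt{n}}\right| > \left|\frac{\bar{Y}^{act} - \mu_{Y,0}}{s_Y/\sqrt{n}}\right|\right]$ (large n)
so
p-value = $\Pr_{H_0}\left[\,|t| > |t^{act}|\,\right]$ ($\sigma_Y^2$ estimated)
where $t = \frac{\bar{Y} - \mu_{Y,0}}{s_Y/\sqrt{n}}$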
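The estimated-variance version differs from the known-$\sigma_Y$ case only in replacing $\sigma_Y$ with $s_Y$. The sketch below assumes the large-n normal approximation; the function names and the artificial data set (a short list repeated to reach n = 100) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def t_statistic(ys, mu_0):
    """t = (Y-bar - mu_0) / (s_Y / sqrt(n)), with s_Y the sample std. dev."""
    return (mean(ys) - mu_0) / (stdev(ys) / sqrt(len(ys)))

def large_n_p_value(t_act):
    """Two-sided p-value from the N(0,1) approximation to the t-statistic."""
    return 2 * (1 - NormalDist().cdf(abs(t_act)))

# Artificial data for illustration: sample mean is exactly 20, n = 100
ys = [18.0, 22.0, 20.0, 25.0, 15.0] * 20
t_act = t_statistic(ys, mu_0=19.0)   # test H0: mean = 19
p = large_n_p_value(t_act)
print(round(t_act, 2), round(p, 4))
```

`statistics.stdev` uses the $n - 1$ divisor, so it matches the definition of $s_Y$ used in these slides.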
Notes on significance
The significance level of a test is a pre-specified probability of
incorrectly rejecting the null, when the null is true
What is the link between the p-value and the
significance level?
Definition: If the p-value is as small or smaller than $\alpha$, then we
say the estimate is statistically significant at the level $\alpha$
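The link between the p-value and the significance level can be illustrated with a short sketch. It assumes the large-n normal approximation, under which "p-value at most $\alpha$" is the same decision as "|t| at least the two-sided N(0,1) critical value". The function names are hypothetical.

```python
from statistics import NormalDist

def is_significant(p_value, alpha=0.05):
    """Statistically significant at level alpha if the p-value is <= alpha."""
    return p_value <= alpha

def normal_critical_value(alpha=0.05):
    """Two-sided N(0,1) critical value: |t| beyond this gives p-value < alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)

print(is_significant(0.03))                   # significant at the 5% level
print(is_significant(0.03, alpha=0.01))       # not significant at the 1% level
print(round(normal_critical_value(0.05), 2))  # the familiar 5% critical value
```

The same p-value can therefore be significant at one level and not at another; the level has to be fixed before looking at the data.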
Degrees of freedom (n − 1)    5% t-distribution critical value
10                            2.23
20                            2.09
30                            2.04
60                            2.00
∞                             1.96
The assumption that Y is normally distributed, N($\mu_Y$, $\sigma_Y^2$), is rarely
plausible in practice (income? number of children?)
For n > 30, the t-distribution and N(0,1) are very close (as n
grows large, the $t_{n-1}$ distribution converges to N(0,1))
The t-distribution is an artifact from days when sample sizes
were small and computers were people
For historical reasons, statistical software typically uses the t-distribution to compute p-values, but this is irrelevant when
the sample size is moderate or large
For these reasons, we will focus on the large-n approximation
given by the CLT
$SE(\bar{Y}) = \sqrt{s_Y^2/n} = s_Y/\sqrt{n}$
4. Confidence Intervals
Definition: A 95% confidence interval for $\mu_Y$ (or any
parameter) is an interval calculated from sample data that contains
the true value of $\mu_Y$ in 95 percent of repeated samples
Remember what is random here
The values of $Y_1, \ldots, Y_n$, and thus any functions of them, are random,
including the confidence interval
So the confidence interval will differ from one sample to the next
The population parameter $\mu_Y$ is not random; we just don't
know it
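$\left\{\mu_Y : \left|\frac{\bar{Y} - \mu_Y}{s_Y/\sqrt{n}}\right| \le 1.96\right\} = \left\{\mu_Y : -1.96 \le \frac{\bar{Y} - \mu_Y}{s_Y/\sqrt{n}} \le 1.96\right\}$
$= \left\{\mu_Y : -1.96\,\frac{s_Y}{\sqrt{n}} \le \bar{Y} - \mu_Y \le 1.96\,\frac{s_Y}{\sqrt{n}}\right\}$
$= \left\{\mu_Y \in \left(\bar{Y} - 1.96\,\frac{s_Y}{\sqrt{n}},\ \bar{Y} + 1.96\,\frac{s_Y}{\sqrt{n}}\right)\right\}$
This confidence interval relies on the large-n results that $\bar{Y}$ is
approximately normally distributed and $s_Y^2 \xrightarrow{p} \sigma_Y^2$.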
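The "95 percent of repeated samples" interpretation can be checked with a small simulation. The sketch below is an illustration only: it assumes a normal population with hypothetical parameters (mean 20, standard deviation 5), draws many samples of size 100, and counts how often the interval $\bar{Y} \pm 1.96\, s_Y/\sqrt{n}$ contains the true mean. The function names are made up for this example.

```python
import random
from math import sqrt
from statistics import mean, stdev

def ci_95(ys):
    """Large-n 95% interval: Y-bar +/- 1.96 * s_Y / sqrt(n)."""
    y_bar = mean(ys)
    half_width = 1.96 * stdev(ys) / sqrt(len(ys))
    return y_bar - half_width, y_bar + half_width

def coverage(mu_y=20.0, sigma_y=5.0, n=100, reps=2000, seed=0):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)   # fixed seed so the run is reproducible
    hits = 0
    for _ in range(reps):
        ys = [rng.gauss(mu_y, sigma_y) for _ in range(n)]
        low, high = ci_95(ys)
        if low <= mu_y <= high:
            hits += 1
    return hits / reps

estimated_coverage = coverage()
print(estimated_coverage)   # should be close to 0.95
```

In each replication the interval changes while $\mu_Y$ stays fixed, which is exactly the sense in which the interval, not the parameter, is random.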
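95% confidence interval for $\mu_Y$ = $\{\bar{Y} \pm 1.96\, SE(\bar{Y})\}$
99% confidence interval for $\mu_Y$ = $\{\bar{Y} \pm 2.58\, SE(\bar{Y})\}$
90% confidence interval for $\mu_Y$ = $\{\bar{Y} \pm 1.64\, SE(\bar{Y})\}$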
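The three multipliers are the two-sided N(0,1) critical values for each confidence level. As a hedged sketch, the function below (a hypothetical helper, not part of the lecture) derives the multiplier from the level instead of hard-coding it, so its unrounded values (about 1.645, 1.960, 2.576) differ slightly from the rounded 1.64, 1.96, 2.58. The data are artificial.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(ys, level=0.95):
    """Large-n interval Y-bar +/- z * SE(Y-bar), z from the N(0,1) distribution."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # e.g. about 1.96 for level=0.95
    se = stdev(ys) / sqrt(len(ys))                  # SE(Y-bar) = s_Y / sqrt(n)
    y_bar = mean(ys)
    return y_bar - z * se, y_bar + z * se

# Artificial data for illustration: sample mean 20, n = 100
ys = [18.0, 22.0, 20.0, 25.0, 15.0] * 20
for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(ys, level)
    print(level, round(low, 2), round(high, 2))
```

Higher confidence levels give wider intervals around the same $\bar{Y}$.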
Example
In 2012, the mean hourly earnings of men and women who were
college graduates ages 25–34 were:
$\bar{Y}_{men} = \$25.30$
$\bar{Y}_{women} = \$21.50$
Summary
We have two assumptions:
1. Simple random sampling of a population, that is,
$\{Y_i,\ i = 1, \ldots, n\}$ are independent and identically distributed
2. $0 < E(Y^4) < \infty$ (so we don't have extreme outliers)
From these, we developed, for large samples (large n):
Question:
Are assumptions (1) & (2) plausible in practice?
Answer:
Yes
student/class?
Have we answered this question?