Lecture 1: Introduction
Introduction
Learning Resources
- Required textbook: Stock, J. and M. Watson (2019) "Introduction to Econometrics", 4th edition, Pearson.
- Recommended resource: Wooldridge, J. (2019) "Introductory Econometrics", 7th edition, Cengage.
- As the semester progresses, I will provide lecture slides, tutorial materials, data, assignments, etc.
- I strongly encourage you to read the relevant parts of the textbook after lecture.
Learning Activities
Assessment
Online Quizzes
Research Projects
- You will have TWO research projects (assignments). Each contributes 15% towards your final grade. You will have at least two weeks for each research project.
- You are allowed to work in groups on the projects, but you must write up your own answers in your own words.
- You are required to submit an electronic copy through the course webpage (Blackboard); see Section 5.3 of the ECP for penalties for late submission:
  - Submission within 1 day (24 hours) after the due date/time: you lose 10% of the whole project mark.
  - For each additional day, you lose 10% more.
  - After one week, no marks will be given for the project.
Final Problem Set
Brief Overview of the Course
This course is about using data to measure causal effects.
In this course you will:
Example: The California Test Score Data Set
Initial look at the data
Do districts with smaller classes have higher test scores?
We need to get some numerical evidence on whether districts with low STRs have higher test scores, but how?
1. Compare average test scores in districts with low STRs to those with high STRs ("estimation").
2. Test the "null" hypothesis that the mean test scores in the two types of districts are the same, against the "alternative" hypothesis that they differ ("hypothesis testing").
3. Estimate an interval for the difference in the mean test scores, high vs. low STR districts ("confidence interval").
Initial data analysis:
- Compare districts with "small" (STR < 20) and "large" (STR ≥ 20) class sizes:
- Summary:

Class Size   Average score (Ȳ)   Standard deviation (s_Y)   n
Small        657.4               19.4                       238
Large        650.0               17.9                       182
1. Estimation
2. Hypothesis testing
$$ t = \frac{\bar{Y}_s - \bar{Y}_\ell}{\sqrt{\dfrac{s_s^2}{n_s} + \dfrac{s_\ell^2}{n_\ell}}} = \frac{\bar{Y}_s - \bar{Y}_\ell}{SE(\bar{Y}_s - \bar{Y}_\ell)} $$
Compute the t-statistic:
$$ t = \frac{\bar{Y}_s - \bar{Y}_\ell}{\sqrt{\dfrac{s_s^2}{n_s} + \dfrac{s_\ell^2}{n_\ell}}} = \frac{657.4 - 650.0}{\sqrt{\dfrac{19.4^2}{238} + \dfrac{17.9^2}{182}}} = 4.05 $$

|t| > 1.96, so reject (at the 5% significance level) the null hypothesis that the two means are the same.
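As a check, the same calculation can be reproduced in a few lines of Python (a sketch using only the summary statistics reported above):

```python
import math

# Summary statistics from the California test score data (slide above)
ybar_s, s_s, n_s = 657.4, 19.4, 238  # small classes (STR < 20)
ybar_l, s_l, n_l = 650.0, 17.9, 182  # large classes (STR >= 20)

# SE of the difference in means, then the t-statistic
se_diff = math.sqrt(s_s**2 / n_s + s_l**2 / n_l)
t_stat = (ybar_s - ybar_l) / se_diff
print(round(t_stat, 2))  # 4.05
```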
3. Confidence interval
A 95% confidence interval for the difference between the means is:
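With the same summary statistics, the 95% interval can be computed directly (a sketch; the numbers come from the estimation and testing slides above):

```python
import math

# Difference in mean scores and its standard error (from the slides above)
diff = 657.4 - 650.0
se_diff = math.sqrt(19.4**2 / 238 + 17.9**2 / 182)

# 95% confidence interval: estimate +/- 1.96 standard errors
lower = diff - 1.96 * se_diff
upper = diff + 1.96 * se_diff
print(round(lower, 2), round(upper, 2))  # 3.82 10.98
```

Note that the interval excludes zero, which matches the rejection of the null hypothesis at the 5% level.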
What comes next?
Review random variable
- Y could be either 0 or 1 whenever the coin is tossed. Each time, the realization of Y, say y, would be different.
- The uncertainty is fully described by the probability function, e.g., Pr(Y = 0) = 0.5, Pr(Y = 1) = 0.5, and Pr(Y = y) = 0 for y ∉ {0, 1}.
Review random variable
Random Variable: some population parameters
- The most important feature might be the expectation of Y or P(·), which is often called the population mean, denoted by E[Y], µ_Y, or simply µ.
- The (population) mean measures the central tendency, or long-run average, of Y. So, it is a widely used measure of location.
- Also, the mean is the best predictor of Y in the sense that it minimizes the mean squared error (MSE).
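A small numerical illustration of this point (made-up data, not from the slides): among candidate constant predictors, the sample mean gives the smallest average squared error.

```python
# Illustration (made-up data, not from the slides): among constant
# predictors c, the sample mean minimizes the mean squared error.
data = [2.0, 3.0, 5.0, 10.0]
mean = sum(data) / len(data)  # 5.0

def mse(c, ys):
    """Average squared error from predicting the constant c."""
    return sum((y - c) ** 2 for y in ys) / len(ys)

# Scan a grid of candidate predictors around the mean.
candidates = [mean + step / 10 for step in range(-20, 21)]
best = min(candidates, key=lambda c: mse(c, data))
print(best)  # 5.0
```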
Random Variable: estimation of µ_Y

$$ Y_1, \ldots, Y_n \overset{iid}{\sim} P(\cdot) $$

$$ \bar{Y}_n := \frac{1}{n} \sum_{i=1}^{n} Y_i $$
Random Variable: statistical properties of Ȳ_n
¹ A function of the random sample {Y_1, ..., Y_n} is called a statistic, e.g., an estimator, a test statistic, etc. Since the random sample is a collection of random variables, statistics are also random. The distribution of a statistic is called a sampling distribution.
Random Variable: unbiasedness of Ȳ_n
Random Variable: consistency of Ȳ_n
Random Variable: standard error of Ȳ_n

$$ \hat{\sigma}_Y^2 := \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2 $$

and we know that $\hat{\sigma}_Y^2 \overset{p}{\to} \sigma_Y^2$. Note that $\frac{1}{n}\sum_{i=1}^{n} (Y_i - \bar{Y})^2 \overset{p}{\to} \sigma_Y^2$, too.
- The standard deviation of Ȳ_n is $\sqrt{V(\bar{Y}_n)} = \sigma_Y/\sqrt{n}$.
- We can consistently estimate the standard deviation of Ȳ_n by

$$ SE(\bar{Y}_n) := \frac{\hat{\sigma}_Y}{\sqrt{n}} $$
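In code, these formulas look like the following (a sketch with made-up data; `statistics.stdev` uses the 1/(n−1) divisor, matching σ̂_Y above):

```python
import math
import statistics

# Sketch: estimating the population mean and its standard error from a sample.
sample = [4.0, 7.0, 6.0, 5.0, 8.0, 6.0]  # made-up data
n = len(sample)

y_bar = statistics.mean(sample)   # point estimate of mu_Y
s_y = statistics.stdev(sample)    # sigma-hat, with the 1/(n-1) divisor
se = s_y / math.sqrt(n)           # SE(Y-bar) = sigma-hat / sqrt(n)
print(y_bar, round(se, 3))  # 6.0 0.577
```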
Random Variable: asymptotic distribution of Ȳ_n

$$ \frac{\bar{Y}_n - \mu_Y}{SE(\bar{Y}_n)} \overset{d}{\to} N(0, 1) \tag{3} $$

where $\overset{d}{\to}$ is read as 'converges in distribution to'.²
- The ratio $(\bar{Y}_n - \mu_Y)/SE(\bar{Y}_n)$ is asymptotically distributed as N(0, 1), regardless of the underlying P(·). This is a very powerful result!
² To show this result, we will need to review the 'Central Limit Theorem' and 'Slutsky Lemma'. See, for example, pages 728–729 (SW).
Random Variable: asymptotic distribution of Ȳ_n

$$ \frac{\bar{Y}_n - \mu_Y}{SE(\bar{Y}_n)} \overset{approx}{\sim} N(0, 1) $$

- We have full knowledge about N(0, 1), and we can use this approximate distribution for hypothesis testing and confidence intervals.
- As a side note, the ratio on the LHS is often called the t-ratio because if $Y_i \overset{iid}{\sim} N(\mu_Y, \sigma_Y^2)$,
Random Variable: asymptotic distribution of Ȳ_n
- Y_i = 1 with prob. p = 0.78 or Y_i = 0 with prob. 1 − p. Notice that this distribution is completely different from the normal distribution.
- Using a random sample of size n, {Y_1, ..., Y_n}, construct the t-ratio $(\bar{Y}_n - \mu_Y)/SE(\bar{Y}_n)$.
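A simulation sketch of this experiment (p = 0.78 as on the slide; the sample size and replication count here are my own choices): if the t-ratio is approximately N(0, 1), then |t| should exceed 1.96 in roughly 5% of samples, even though each Y_i is Bernoulli rather than normal.

```python
import math
import random

# Simulation sketch: for a Bernoulli(0.78) population, the t-ratio
# (Y-bar - mu) / SE(Y-bar) is approximately standard normal for large n.
random.seed(0)
p, n, reps = 0.78, 500, 2000
mu = p  # population mean of a Bernoulli(p) variable

outside = 0  # count of samples with |t| > 1.96; should be close to 5%
for _ in range(reps):
    ys = [1 if random.random() < p else 0 for _ in range(n)]
    y_bar = sum(ys) / n
    s2 = sum((y - y_bar) ** 2 for y in ys) / (n - 1)  # sigma-hat squared
    se = math.sqrt(s2 / n)
    t = (y_bar - mu) / se
    if abs(t) > 1.96:
        outside += 1

print(outside / reps)  # close to 0.05
```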
Random Variable: review of standard normal
- We'll get back to the t-ratio. But, since it is approximately distributed as the standard normal distribution N(0, 1), we will first review that distribution.
- Recall that when Z ∼ N(0, 1), Pr(|Z| ≤ 1.96) ≈ 0.95.
- That is, the event |Z| ≤ 1.96 happens with 95% probability, while the event |Z| > 1.96 happens with 5% probability.
- Conversely, when the event |Z| > 1.96 happens, you might suspect whether Z is really N(0, 1).
Hypothesis Testing
- For example, suppose you believe Z ∼ N(0, 1), but you still don't know whether your belief is really true (it is just a belief).
- Suppose you observe a realization z = 0.238 (lower case for realizations). Then, since |z| < 1.96, this is a likely outcome, and you will be comfortable with the belief.
- Alternatively, suppose you observe z = 4.212. Looking at this large number (large because |z| > 1.96), which is an unlikely outcome, you may think that your belief Z ∼ N(0, 1) is wrong!
- This is the idea of hypothesis testing.
Hypothesis Testing: using the t-statistic
Since

$$ \frac{\bar{Y}_n - \mu_Y}{SE(\bar{Y}_n)} \overset{approx}{\sim} N(0, 1), $$

consider the null hypothesis

$$ H_0: \mu_Y = \mu_Y^0, $$

where $\mu_Y^0$ is the known value under the null hypothesis H_0, i.e., a value chosen by some theory, convention, etc.
- Under H_0 (i.e., if H_0 is correct), we know that

$$ \frac{\bar{Y}_n - \mu_Y^0}{SE(\bar{Y}_n)} \overset{approx}{\sim} N(0, 1). $$
Hypothesis Testing: using the t-statistic
- Now, you observe realizations of the random sample, say y_1, ..., y_n. Then, you can compute the estimates, ȳ_n and SE(ȳ_n), to construct

$$ t := \frac{\bar{y}_n - \mu_Y^0}{SE(\bar{y}_n)} $$

- You know that |t| < 1.96 is likely if H_0 is correct, but |t| > 1.96 is unlikely if H_0 is correct.
- Therefore, if you find |t| > 1.96, you will doubt H_0. The usual story at this point is that you would reject the null, H_0.
- But, before making such a decision, let's examine carefully what we are actually doing.
Hypothesis Testing: type I error, significance level
- Suppose that H_0 is really correct (but you still don't know it).
- In this case, there is some chance that you have a sample with |t| > 1.96 and therefore mistakenly reject H_0.
- This error (rejecting H_0 when H_0 is correct) is called Type I error.³ The probability of making a Type I error is called the (significance) level (or size) of the test.
- In this example, the level of the test is 5% and its critical value is 1.96.
- Now, suppose we test H_0 at the 5% level and observe |t| > 1.96. Then, formally, we say that we reject H_0 at the 5% (significance) level.
- Here, notice that we explicitly say that we may make a mistake and give some information on how likely that error will arise.
³ Not rejecting H_0 when it is wrong is called Type II error.
Hypothesis Testing: using the p-value
- If you want to be more conservative, i.e., if you want to avoid Type I error more strongly, you use a smaller level, e.g., 1%. When α = 0.01, the critical value is 2.576.
- It is the researcher who chooses the level α for hypothesis testing. The choice of α depends on the researcher's personal tolerance for Type I error.
- Decision rule: reject H_0 if the absolute value of the test statistic is greater than the critical value: |t| > 1.96 if α = 0.05, or |t| > 2.576 if α = 0.01.
- Alternatively, we can use the p-value (R will compute the p-value). Decision rule: reject H_0 if p-value < α.
- In most cases, it is better to report the p-value (rather than the t-ratio), because then readers can use their own α to conduct a hypothesis test.
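Under the N(0, 1) approximation, the two-sided p-value is Pr(|Z| > |t|). The course uses R for this, but the same arithmetic can be sketched with the Python standard library alone:

```python
import math

def normal_cdf(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_sided_p(t):
    """Two-sided p-value under the N(0,1) approximation: Pr(|Z| > |t|)."""
    return 2.0 * (1.0 - normal_cdf(abs(t)))

t = 4.05  # t-statistic from the class-size example
p_value = two_sided_p(t)
print(p_value < 0.05, p_value < 0.01)  # True True
```

Because the p-value is far below both 0.05 and 0.01, H_0 is rejected at either level, consistent with |t| > 2.576.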
Hypothesis Testing: using confidence intervals
A 95% confidence interval: ȳ_n ± 1.96 × SE(ȳ_n)
A 99% confidence interval: ȳ_n ± 2.576 × SE(ȳ_n)
Covariance and Correlation
- Mostly, we will have more than one random variable and be interested in their statistical relationship, such as statistical association or a causal relationship.
- Suppose we have two random variables, say X and Y. The covariance of X and Y is defined as

$$ C(X, Y) = E[(X - E[X])(Y - E[Y])] $$

- If C(X, Y) > 0, then X and Y move in the same direction. If C(X, Y) < 0, they move in opposite directions.
- The magnitude of C(X, Y) depends on the units of X and Y. So, a large covariance does not necessarily mean that X and Y are closely related.⁴ The correlation coefficient removes this unit dependence:

$$ \rho_{XY} := \frac{C(X, Y)}{\sqrt{V(X)}\sqrt{V(Y)}} \in [-1, 1] $$

⁴ For example, the covariance of household income and household consumption, both measured in cents, is 10,000 times bigger than the covariance of the same variables measured in dollars.
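The footnote's point can be verified directly (a sketch with made-up income/consumption numbers): rescaling both variables from dollars to cents multiplies the covariance by 100 × 100 = 10,000 but leaves the correlation unchanged.

```python
import math

# Sketch: covariance depends on units, correlation does not.
xs = [1.0, 2.0, 3.0, 4.0]   # e.g., income in dollars (made-up)
ys = [2.0, 3.0, 5.0, 6.0]   # e.g., consumption in dollars (made-up)

def cov(a, b):
    """Sample covariance (1/n divisor) of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def corr(a, b):
    """Correlation coefficient rho = C(a, b) / (sd(a) * sd(b))."""
    return cov(a, b) / math.sqrt(cov(a, a) * cov(b, b))

xs_cents = [100 * x for x in xs]  # same variables measured in cents
ys_cents = [100 * y for y in ys]

print(cov(xs_cents, ys_cents) / cov(xs, ys))  # 10000.0: covariance scales
print(abs(corr(xs, ys) - corr(xs_cents, ys_cents)) < 1e-12)  # True
```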
Covariance and Correlation: examples
- This is because the covariance measures only the linear relationship between random variables.
Conditional Expectation
Regression Function
$$ E[Y|X] = \alpha + \beta X, $$
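For a linear regression function like this, the slope and intercept are tied to the moments reviewed above: β = C(X, Y)/V(X) and α = E[Y] − βE[X]. A sketch with noiseless made-up data, where Y = 1 + 2X exactly:

```python
# Sketch: recovering alpha and beta for a linear regression function
# E[Y|X] = alpha + beta * X, using beta = C(X, Y) / V(X).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]  # exactly Y = 1 + 2X, no noise

mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
var_x = sum((x - mx) ** 2 for x in xs) / len(xs)

beta = cov_xy / var_x    # slope
alpha = my - beta * mx   # intercept
print(alpha, beta)  # 1.0 2.0
```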