
ECON 7310 Elements of Econometrics

Lecture 1: Introduction

1 / 43
Introduction

- Ruby Nguyen (Coordinator & Lecturer)
  - Colin Clark Building (39) Room 625
  - econ7310@uq.edu.au
- Lectures: Monday, 10:00–11:50 am (in-person and Zoom ID: 85480199600)
- Lecturer's Consultations: Thursday, 16:30–18:30 (Zoom ID: 85480199600)
- Tutors' Consultations: Timetable is available on Blackboard
- Tutorials: Please sign on to a tutorial session through MyTimetable
- Lectures: Introduce theories and methods with illustrating examples
- Tutorials: Empirical analysis using R

2 / 43
Learning Resources

- Required Textbook:
  Stock, J. and M. Watson (2019) "Introduction to Econometrics", 4th edition, Pearson.
- Recommended Resource:
  Wooldridge, J. (2019) "Introductory Econometrics", 7th edition, Cengage.
- As the semester progresses, I will provide lecture slides, tutorial materials, data, assignments, etc.
- I strongly encourage you to read the relevant parts of the textbook after each lecture.

3 / 43
Learning Activities

4 / 43
Assessment

5 / 43
Online Quizzes

- Quizzes are 40-minute, online, open-book, non-invigilated tests.
- You can complete and submit your quiz any time before the deadline.
- Quizzes are designed to test your understanding of key concepts and methods, and will not require computation software (e.g., R).

6 / 43
Research Projects

- You will have TWO research projects (assignments). Each contributes 15% towards your final grade. You will have at least two weeks for each research project.
- You are allowed to work in groups on the projects, but you must write up your own answers in your own words.
- You are required to submit an electronic copy through the course webpage (Blackboard); see Section 5.3 of the ECP for late-submission penalties:
  - Submission within 1 day (24 hours) after the due date/time: you lose 10% of the whole project mark.
  - For each additional day, you lose 10% more.
  - After one week, no marks will be given for the project.

7 / 43
Final Problem Set

8 / 43
Brief Overview of the Course

- Economics suggests important relationships, often with policy implications, but virtually never suggests quantitative magnitudes of causal effects.
  - What is the quantitative effect of reducing class size on student achievement?
  - How does another year of education change earnings?
  - What is the price elasticity of cigarettes?
  - What is the effect on output growth of a 1 percentage point increase in interest rates by the Fed?
  - What is the effect on housing prices of environmental improvements?

9 / 43
This course is about using data to measure causal effects.

- Ideally, we would like an experiment.
  - What would be an experiment to estimate the effect of class size on standardized test scores?
- But almost always we only have observational (nonexperimental) data:
  - returns to education
  - cigarette prices
  - monetary policy
- Most of the course deals with difficulties arising from using observational data to estimate causal effects:
  - confounding effects (omitted factors)
  - simultaneous causality
  - correlation does not imply causation

10 / 43
In this course you will:

- Learn methods for estimating causal effects using observational data;
- Learn tools that can be used for other purposes, for example, forecasting using time series data;
- Learn to evaluate the regression analysis of others, which means you will be able to read and understand empirical economics papers in other economics courses;
- Get hands-on experience with regression analysis in your tutorials;
- Focus on applications; theory is used only as needed to understand the whys of the methods.

11 / 43
Example: The California Test Score Data Set

- Empirical problem: class size and educational output.
- Policy question: What is the effect on test scores of reducing class size by one student per class?
- We use data from California school districts (n = 420).
- We observe 5th-grade test scores (district average) and the student-teacher ratio (STR).

12 / 43
Initial look at the data

- You should already know how to interpret this table.
- This table doesn't tell us anything about the relationship between test scores and the student-teacher ratio (STR).

13 / 43
Do districts with smaller classes have higher test scores?

- Scatterplot of test score against student-teacher ratio.
- What does this figure show?

14 / 43
We need to get some numerical evidence on whether districts with low
STRs have higher test scores, but how?

1. Compare average test scores in districts with low STRs to those with high STRs ("estimation")
2. Test the "null" hypothesis that the mean test scores in the two types of districts are the same, against the "alternative" hypothesis that they differ ("hypothesis testing")
3. Estimate an interval for the difference in the mean test scores, high vs. low STR districts ("confidence interval")

15 / 43
Initial data analysis:

- Compare districts with "small" (STR < 20) and "large" (STR ≥ 20) class sizes.
- Summary:

  Class Size   Average score (Ȳ)   Standard deviation (SDev(Y))     n
  Small        657.4               19.4                           238
  Large        650.0               17.9                           182

- We will do the following:
  1. Estimate ∆, the difference between the group means
  2. Test the hypothesis that ∆ = 0
  3. Construct a confidence interval for ∆

16 / 43
1. Estimation

Ȳ_small − Ȳ_large = 657.4 − 650.0 = 7.4

Is this a large difference in a real-world sense?

- Standard deviation across districts = 19.1
- Difference between the 60th and 75th percentiles of the test score distribution is 666.7 − 659.4 = 7.3
- Is this a big enough difference to be important for school reform discussions, for parents, or for a school committee?

17 / 43
2. Hypothesis testing

- Difference-in-means test: compute the t-statistic,

      t = (Ȳ_s − Ȳ_ℓ) / √(s_s²/n_s + s_ℓ²/n_ℓ) = (Ȳ_s − Ȳ_ℓ) / SE(Ȳ_s − Ȳ_ℓ)

  where SE(Ȳ_s − Ȳ_ℓ) is the "standard error" of (Ȳ_s − Ȳ_ℓ), the subscripts s and ℓ refer to "small" and "large" STR districts, and s_s and s_ℓ are the standard deviations of test scores in "small" and "large" STR districts, respectively.

18 / 43
Compute the t-statistic:

  Class Size   Average score (Ȳ)   Standard deviation (SDev(Y))     n
  Small        657.4               19.4                           238
  Large        650.0               17.9                           182

      t = (Ȳ_s − Ȳ_ℓ) / √(s_s²/n_s + s_ℓ²/n_ℓ)
        = (657.4 − 650.0) / √(19.4²/238 + 17.9²/182) = 4.05

|t| > 1.96, so we reject (at the 5% significance level) the null hypothesis that the two means are the same.
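The calculation above takes only a few lines of code. A minimal sketch in Python (the tutorials use R, but the arithmetic is identical), using the summary statistics from the table:

```python
import math

# Summary statistics from the table above (California districts)
ybar_s, s_s, n_s = 657.4, 19.4, 238   # "small" districts (STR < 20)
ybar_l, s_l, n_l = 650.0, 17.9, 182   # "large" districts (STR >= 20)

# Standard error of the difference in means
se_diff = math.sqrt(s_s**2 / n_s + s_l**2 / n_l)
t = (ybar_s - ybar_l) / se_diff
print(round(se_diff, 2), round(t, 2))  # 1.83 4.05
```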

19 / 43
3. Confidence interval

A 95% confidence interval for the difference between the means is

      (Ȳ_s − Ȳ_ℓ) ± 1.96 × SE(Ȳ_s − Ȳ_ℓ) = 7.4 ± 1.96 × 1.83 = [3.8, 11.0]

Two equivalent statements:

- The 95% confidence interval for ∆ doesn't include 0;
- The hypothesis that ∆ = 0 is rejected at the 5% level.
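As a quick numerical check, the interval endpoints follow directly from the difference and its standard error; a small Python sketch using the numbers above:

```python
diff, se = 7.4, 1.83              # difference in means and its standard error
lower = diff - 1.96 * se          # lower endpoint of the 95% CI
upper = diff + 1.96 * se          # upper endpoint of the 95% CI
print(round(lower, 1), round(upper, 1))  # 3.8 11.0
```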

20 / 43
What comes next?

- The mechanics of estimation, hypothesis testing, and confidence intervals should be familiar.
- These concepts extend directly to regression and its variants.
- Before turning to regression, however, we will review some of the underlying theory of estimation, hypothesis testing, and confidence intervals.

21 / 43
Review random variable

- Informally, a Random Variable (RV) takes on numerical values determined by an experiment (so, by chance).
- Example: Consider an experiment in which a fair coin is tossed. The possible outcomes are Head and Tail, i.e., {H, T}. We define a random variable Y as follows:

      Y = 0 if the outcome is T
      Y = 1 if the outcome is H

- Y could be either 0 or 1 whenever the coin is tossed. Each time, the realization of Y, say y, would be different.
- The uncertainty is fully described by the probability function, e.g., Pr(Y = 0) = 0.5, Pr(Y = 1) = 0.5, and Pr(Y = y) = 0 for y ∉ {0, 1}.
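The coin-toss random variable is easy to simulate; a minimal Python sketch (the tutorials use R, but the idea carries over), with a fixed seed so the run is reproducible:

```python
import random

random.seed(0)  # fix the seed for reproducibility
n = 100_000
# Each toss realizes Y: 0 for Tail, 1 for Head, each with probability 0.5
realizations = [random.randint(0, 1) for _ in range(n)]
share_heads = sum(realizations) / n
print(share_heads)  # close to Pr(Y = 1) = 0.5
```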

22 / 43
Review random variable

- More generally, a random variable Y is fully described by its probability function P(·), which represents the population: Y ∼ P(·).
- The econometrician does not know the probability function (population), but has a random sample of size n from the population:

      Y_1, Y_2, ..., Y_n ~ IID P(·)     (1)

  where IID stands for 'independently and identically distributed'.
- Note here that the {Y_i} are random. The econometrician actually observes the realization of the random sample, say y_1, ..., y_n.
- But, using the sample and the fact that (1) holds with unknown P(·), the econometrician learns important features of P(·) (estimation) and summarizes uncertainty about what one learns (inference).

23 / 43
Random Variable: some population parameters

- The most important feature might be the expectation of Y (or of P(·)), which is often called the population mean, denoted by E[Y], µ_Y, or simply µ.
- The (population) mean measures the central tendency, or long-run average, of Y. So, it is a widely used location measure.
- Also, the mean is the best predictor of Y in the sense that it minimizes the mean squared error (MSE): µ_Y solves

      min_a E[(Y − a)²]     (2)

- Another very important feature of Y is the (population) variance V(Y) = σ_Y² and the (population) standard deviation σ_Y = √(σ_Y²). Note that the variance (or standard deviation) measures the variability (uncertainty) of the random variable Y.
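The claim in (2), that the mean is the MSE-minimizing predictor, can be checked numerically; a small Python sketch with made-up values standing in for draws of Y:

```python
# Hypothetical sample standing in for the distribution of Y
ys = [1.0, 2.0, 6.0, 3.0]
mean = sum(ys) / len(ys)  # 3.0

def mse(a):
    # Empirical mean squared error of predicting Y by the constant a
    return sum((y - a) ** 2 for y in ys) / len(ys)

# The MSE at the mean is no larger than at nearby candidate predictors
print(mse(mean), mse(mean - 0.5), mse(mean + 0.5))  # 3.5 3.75 3.75
```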

24 / 43
Random Variable: estimation of µ_Y

- The parameters, e.g., (µ_Y, σ_Y²), are generally unknown. Suppose we want to estimate µ_Y = E[Y] using

      Y_1, ..., Y_n ~ iid P(·)

- We can estimate µ_Y by the sample mean of Y_1, ..., Y_n, i.e.,

      Ȳ_n := (1/n) Σ_{i=1}^n Y_i

  Ȳ_n is an estimator for µ_Y, often written as Ȳ or, alternatively, µ̂_Y.
- Here, Ȳ_n is random because Y_1, ..., Y_n are random, each with Y_i ∼ P(·); in particular, E[Y_i] = µ_Y and V(Y_i) = σ_Y².

25 / 43
Random Variable: statistical properties of Ȳ_n

- Since Ȳ_n is a random variable, it must follow some distribution, which we need in order to study the properties of Ȳ_n, e.g., how precise and accurate it is.¹
- Without knowing the underlying distribution P(·), however, we cannot calculate the exact distribution of Ȳ_n in general.
- Nevertheless, we can say a few very important facts about the distribution of Ȳ_n using only the assumption that the Y_i are iid with E[Y_i] = µ_Y and V(Y_i) = σ_Y².
- It is important that µ_Y and σ_Y² describe some part of the population P(·) (so they are population parameters), and the econometrician does not know their values.
- We wish to learn µ_Y using the random sample Y_1, ..., Y_n with E[Y_i] = µ_Y and V(Y_i) = σ_Y².

¹ A function of the random sample {Y_1, ..., Y_n} is called a statistic, e.g., an estimator, a test statistic, etc. Since the random sample is a collection of random variables, statistics are also random. The distribution of a statistic is called a sampling distribution.
26 / 43
Random Variable: unbiasedness of Ȳ_n

- First, it is easy to show that

      E[Ȳ_n] = µ_Y

- When the expectation of an estimator (in this case Ȳ_n) is equal to the parameter to be estimated (in this case µ_Y), the estimator is said to be unbiased. All interpretations of expectation apply here: the estimator Ȳ_n is distributed centred around µ_Y; Ȳ_n equals µ_Y on average (in the long run). So, unbiasedness is good!
- Moreover, one can show that Ȳ_n is the solution to the sample analogue of (2), i.e., Ȳ_n solves

      min_a (1/n) Σ_{i=1}^n (Y_i − a)²

- Unbiasedness is not the only desirable property for estimators. For example, the first observation Y_1 in the random sample is also an unbiased estimator because E[Y_1] = µ_Y.
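Unbiasedness can be illustrated by simulation: across many repeated samples, the sample means average out to µ_Y. A Python sketch under assumed values (µ_Y = 5, σ_Y = 2, n = 30, all hypothetical):

```python
import random

random.seed(1)
mu_Y, sigma_Y, n = 5.0, 2.0, 30   # assumed population parameters, sample size

# Draw many independent samples and record the sample mean of each
sample_means = []
for _ in range(2000):
    sample = [random.gauss(mu_Y, sigma_Y) for _ in range(n)]
    sample_means.append(sum(sample) / n)

# The average of the sample means is close to mu_Y (unbiasedness)
avg_of_means = sum(sample_means) / len(sample_means)
print(avg_of_means)
```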

27 / 43
Random Variable: consistency of Ȳ_n

- Second, it is also known that

      V(Ȳ_n) = σ_Y²/n

- Note that the other unbiased estimator, Y_1, has variance V(Y_1) = σ_Y² ≥ V(Ȳ_n).
- When we have two estimators, the one with smaller variance is said to be more efficient (or more precise). If one estimator has the smallest variance in a class of estimators, that estimator is said to be efficient.
- The variability (uncertainty) of the estimator Ȳ_n disappears as the sample size n increases. Combined with the unbiasedness of Ȳ_n, this means we can be pretty sure that Ȳ_n is close to µ_Y when n is large.
- A related concept: an estimator is consistent for the parameter if it converges (in probability) to the parameter as n grows, e.g.,

      Ȳ_n →p µ_Y
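The shrinking variance V(Ȳ_n) = σ_Y²/n can also be seen by simulation; a Python sketch assuming standard normal draws (σ_Y² = 1):

```python
import random
import statistics

random.seed(2)

def variance_of_mean(n, reps=3000):
    # Simulated sampling variance of the sample mean for sample size n
    means = [sum(random.gauss(0.0, 1.0) for _ in range(n)) / n for _ in range(reps)]
    return statistics.variance(means)

# With sigma_Y^2 = 1, V(Ybar_n) is roughly 1/n: larger n, smaller variance
v10, v100 = variance_of_mean(10), variance_of_mean(100)
print(v10, v100)
```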

28 / 43
Random Variable: standard error of Ȳ_n

- One can show that an unbiased estimator for σ_Y² is given by

      σ̂_Y² := (1/(n−1)) Σ_{i=1}^n (Y_i − Ȳ)²

  and we know that σ̂_Y² →p σ_Y². Note that (1/n) Σ_{i=1}^n (Y_i − Ȳ)² →p σ_Y², too.
- The standard deviation of Ȳ_n is √V(Ȳ_n) = σ_Y/√n.
- We can consistently estimate the standard deviation of Ȳ_n by

      SE(Ȳ_n) := σ̂_Y/√n

  which is called the standard error of Ȳ_n.
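These formulas translate directly into code; a Python sketch with a small made-up sample (note that `statistics.stdev` uses the 1/(n−1) divisor, matching σ̂_Y above):

```python
import math
import statistics

y = [4.1, 5.3, 6.2, 5.8, 4.6, 5.0]   # hypothetical sample
n = len(y)

sigma_hat = statistics.stdev(y)       # sigma_hat_Y, with the 1/(n-1) divisor
se = sigma_hat / math.sqrt(n)         # standard error of the sample mean
print(round(sigma_hat, 3), round(se, 3))
```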

29 / 43
Random Variable: asymptotic distribution of Ȳ_n

- More generally, the standard error of an estimator is an estimated standard deviation of the estimator. For example, the standard error of the unbiased estimator Y_1 is simply σ̂_Y.
- We still do not know the distribution of Ȳ_n. However, one can show that

      (Ȳ_n − µ_Y) / SE(Ȳ_n) →d N(0, 1)     (3)

  where →d is read as 'converges in distribution to'.²
- The ratio (Ȳ_n − µ_Y)/SE(Ȳ_n) is asymptotically distributed as N(0, 1), regardless of the underlying P(·). This is a very powerful result!

² To show this result, we will need to review the 'Central Limit Theorem' and 'Slutsky Lemma'. See, for example, pages 728–729 (SW).
30 / 43
Random Variable: asymptotic distribution of Ȳ_n

- Thanks to (3), we can say that if n is large,

      (Ȳ_n − µ_Y) / SE(Ȳ_n) ~ approx. N(0, 1)

- We have full knowledge of N(0, 1), and we can use this approximate distribution for hypothesis testing and confidence intervals.
- As a side note, the ratio on the LHS is often called the t-ratio because, if Y_i ~ iid N(µ_Y, σ_Y²), the t-ratio is exactly distributed as the Student t distribution with n − 1 degrees of freedom.
- But this course does not make the additional normality assumption because (1) essentially no economic data are normally distributed, (2) most economic data sets are fairly large, and therefore (3) the approximation is good.

31 / 43
Random Variable: asymptotic distribution of Ȳ_n

- Example: Y_i = 1 with probability p = 0.78, or Y_i = 0 with probability 1 − p. Notice that this distribution is completely different from the normal distribution.
- Using a random sample of size n, {Y_1, ..., Y_n}, construct the t-ratio (Ȳ_n − µ_Y)/SE(Ȳ_n).
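This experiment is straightforward to simulate; a Python sketch checking that the t-ratio falls within ±1.96 about 95% of the time, even though each Y_i is Bernoulli rather than normal (sample size and repetition count are illustrative choices):

```python
import math
import random

random.seed(3)
p, n, reps = 0.78, 400, 2000   # Bernoulli probability, sample size, repetitions

inside = 0
for _ in range(reps):
    y = [1 if random.random() < p else 0 for _ in range(n)]
    ybar = sum(y) / n
    s = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))
    t = (ybar - p) / (s / math.sqrt(n))   # t-ratio with true mu_Y = p
    if abs(t) <= 1.96:
        inside += 1

cover = inside / reps
print(cover)  # close to 0.95 despite the non-normal population
```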

32 / 43
Random Variable: review of the standard normal

- We'll get back to the t-ratio. But, since it is approximately distributed as the standard normal distribution N(0, 1), we first review that distribution.
- Recall that when Z ∼ N(0, 1),

      Pr(|Z| ≤ 1.96) = 0.95   or, equivalently,   Pr(|Z| > 1.96) = 0.05

- That is, the event |Z| ≤ 1.96 happens with 95% probability, but the event |Z| > 1.96 happens with 5% probability.
- Similar, but let's repeat one more time: the event |Z| ≤ 1.96 is likely to happen, but the event |Z| > 1.96 is unlikely to happen.
- Conversely, when the event |Z| > 1.96 happens, you might suspect whether Z is really N(0, 1).

33 / 43
Hypothesis Testing

- For example, suppose you believe Z ∼ N(0, 1), but you still don't know whether your belief is really true (it is just a belief).
- Suppose you observe a realization z = 0.238 (lower case for realizations). This is likely because |z| < 1.96, and you will be comfortable with the belief.
- Alternatively, suppose you observe z = 4.212. Looking at this large number (large because |z| > 1.96), because it is unlikely, you may think that your belief Z ∼ N(0, 1) is wrong!
- This is the idea of hypothesis testing.

34 / 43
Hypothesis Testing: using the t-statistic

- Back to the t-ratio: we know

      (Ȳ_n − µ_Y) / SE(Ȳ_n) ~ approx. N(0, 1),

  but we don't know the true value of µ_Y.
- So, we make a hypothesis about it. Say,

      H0 : µ_Y = µ_Y⁰

  where µ_Y⁰ is the known value under the null hypothesis H0, i.e., a value that is chosen by some theory, convention, etc.
- Under H0 (i.e., if H0 is correct), then, we know that

      (Ȳ_n − µ_Y⁰) / SE(Ȳ_n) ~ approx. N(0, 1)

35 / 43
Hypothesis Testing: using the t-statistic

- Now, you observe realizations of the random sample, say y_1, ..., y_n. Then, you can compute the estimates, ȳ_n and SE(ȳ_n), to construct

      t := (ȳ_n − µ_Y⁰) / SE(ȳ_n)

- You know that |(ȳ_n − µ_Y⁰)/SE(ȳ_n)| < 1.96 is likely if H0 is correct, but |(ȳ_n − µ_Y⁰)/SE(ȳ_n)| > 1.96 is unlikely if H0 is correct.
- Therefore, if you find |(ȳ_n − µ_Y⁰)/SE(ȳ_n)| > 1.96, you will doubt H0. The usual story at this point is that you would reject the null, H0.
- But, before making such a decision, let's examine carefully what we are actually doing.

36 / 43
Hypothesis Testing: type I error, significance level

- Suppose that H0 is really correct (but you still don't know it).
- In this case, there is some chance that you have a sample with |(ȳ_n − µ_Y⁰)/SE(ȳ_n)| > 1.96 and therefore mistakenly reject H0.
- This error (rejecting H0 when H0 is correct) is called Type I error.³ The probability of making a Type I error is called the (significance) level (or size) of the test.
- In this example, the level of the test is 5% and its critical value is 1.96.
- Now, suppose we test H0 at the 5% level and observe |(ȳ_n − µ_Y⁰)/SE(ȳ_n)| > 1.96. Then, formally, we say that we reject H0 at the 5% (significance) level.
- Here, notice that we explicitly say that we may make a mistake and give some information on how likely that error will arise.

³ Not rejecting H0 when it is wrong is called Type II error.
37 / 43
Hypothesis Testing: using the p-value

- If you want to be more conservative, i.e., if you want to avoid Type I error more strongly, you use a smaller level, e.g., 1%. When α = 0.01, the critical value is 2.576.
- It is the researcher who chooses the level α for hypothesis testing. The choice of α depends on the researcher's personal tolerance for Type I error.
- Decision rule: reject H0 if the absolute value of the test statistic is greater than the critical value: |t| > 1.96 if α = 0.05, or |t| > 2.576 if α = 0.01.
- Alternatively, we can use the p-value (R will compute the p-value). Decision rule: reject H0 if p-value < α.
- In most cases, it is better to report the p-value (rather than the t-ratio), because then readers can use their own α to conduct a hypothesis test.
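For a two-sided test against N(0, 1), the p-value is Pr(|Z| > |t|), which can be computed from the standard normal CDF; a Python sketch (in R this is `2 * pnorm(-abs(t))`):

```python
import math

def normal_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_value(t):
    # Two-sided p-value: Pr(|Z| > |t|) under Z ~ N(0, 1)
    return 2.0 * (1.0 - normal_cdf(abs(t)))

print(round(p_value(1.96), 3))  # about 0.05, matching the 5% critical value
```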

38 / 43
Hypothesis Testing: using confidence intervals

- Another alternative way of conducting hypothesis testing is to use a confidence interval. For α = 5%, the (asymptotic) 95% confidence interval is given as

      ȳ_n ± 1.96 × SE(ȳ_n)

- Similarly, if α = 1%, the (asymptotic) 99% CI is

      ȳ_n ± 2.576 × SE(ȳ_n)

- We reject H0 at level α if µ_Y⁰ is outside the (1 − α) × 100% CI.
- Formally, the (1 − α) × 100% CI is the set of parameter values that cannot be rejected by a (two-sided) test of significance level α.

39 / 43
Covariance and Correlation

- Mostly, we will have more than one random variable and be interested in their statistical relationship, such as statistical association or a causal relationship.
- Suppose we have two random variables, say X and Y. The covariance of X and Y is defined as

      C(X, Y) = E[(X − E[X])(Y − E[Y])]

- If C(X, Y) > 0, X and Y move in the same direction. If C(X, Y) < 0, they move in opposite directions.
- The magnitude of C(X, Y) depends on the units of X and Y. So, a large covariance does not necessarily mean that X and Y are closely related.⁴
- So, we need a standardised measure of statistical association. Notice that it always holds that |C(X, Y)| ≤ √V(X) √V(Y). The correlation coefficient is defined as

      ρ_XY := C(X, Y) / (√V(X) √V(Y)) ∈ [−1, 1]

- |ρ_XY| = 1 indicates a deterministic (linear) relationship between X and Y.

⁴ For example, the covariance of household income and household consumption, both measured in cents, is 10,000 times bigger than the covariance of the same variables measured in dollars.
40 / 43
Covariance and Correlation, examples

- If X and Y are independent, then C(X, Y) = 0, so ρ_XY = 0. But the converse is not generally true.
- This is because the covariance measures only the linear relationship between random variables.

41 / 43
Conditional Expectation

- In econometrics, we are more interested in the conditional expectation, E[Y|X], as it is directly related to causal relationships.
- We illustrate E[Y|X] with an example. Suppose Y is an individual's annual income and X the individual's years of schooling in a country.
- Then, the marginal expectation E[Y] predicts the annual income of an average person in the country.
- But E[Y|X = 12] predicts the annual income of an average person with a high school diploma. Similarly, E[Y|X = 16] predicts the annual income of an average person with a university degree.
- Then, we may interpret E[Y|X = 16] − E[Y|X = 12] as the effect of university education on annual income (under some conditions).
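With data, E[Y|X = x] can be estimated by the average of Y within the group X = x; a Python sketch with hypothetical (income, schooling) pairs (all numbers invented for illustration):

```python
# Hypothetical data: (annual income in $1,000s, years of schooling)
data = [(30, 12), (34, 12), (32, 12), (50, 16), (46, 16), (54, 16)]

def cond_mean(x0):
    # Sample analogue of E[Y | X = x0]: average Y over observations with X = x0
    ys = [y for (y, x) in data if x == x0]
    return sum(ys) / len(ys)

# Estimated effect of university education: E[Y|X=16] - E[Y|X=12]
effect = cond_mean(16) - cond_mean(12)
print(cond_mean(12), cond_mean(16), effect)  # 32.0 50.0 18.0
```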

42 / 43
Regression Function

- More generally, E[Y|X = x] is a function of x, i.e., a prediction of Y for those individuals with X = x. When we do not specify the value of X, E[Y|X] is random because X is random.
- In most cases, we are interested in estimating the conditional expectation function E[Y|X] using a random sample {(Y_1, X_1), ..., (Y_n, X_n)}.
- Another name for E[Y|X] is the regression or regression function.
- When we specify E[Y|X] as a linear function, e.g.,

      E[Y|X] = α + βX,

  it is called a linear regression (function).
- Next week, we will start learning basic methodologies for linear regression.

43 / 43
