
Econ 131 – Quantitative Economics

Lecture 2.1: Simple regression model


Assumptions on the population
regression function
A simple assumption
• The average value of 𝑢, the error term, in the population
is 0. That is,

𝐸(𝑢) = 0
Zero conditional mean
• We need to make a crucial assumption about how
𝑢 and 𝑥 are related
• We want it to be the case that knowing something
about 𝑥 does not give us any information about 𝑢,
so that they are completely unrelated. That is,

E(u|x) = E(u) = 0
Zero conditional mean
E(u|x) = E(u) = 0, which implies

E(y|x) = β₀ + β₁x

Show!
Let y = β₀ + β₁x + u. Then
E(y|x) = E(β₀ + β₁x + u | x)
= E(β₀ | x) + β₁E(x | x) + E(u | x)
= β₀ + β₁x + 0 (since E(u|x) = 0)
= β₀ + β₁x
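To see this numerically, here is a minimal simulation sketch (not from the slides) with hypothetical parameter values β₀ = 1 and β₁ = 0.5: when u is drawn independently of x with mean zero, the average of y at each value of x is close to β₀ + β₁x.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters, chosen only for illustration
beta0, beta1 = 1.0, 0.5

# Draw x, and draw u independently of x with E(u) = 0
x = rng.integers(1, 6, size=100_000).astype(float)   # x takes the values 1, ..., 5
u = rng.normal(loc=0.0, scale=1.0, size=x.size)
y = beta0 + beta1 * x + u

# For each value of x, the sample average of y should be close to beta0 + beta1*x
for val in np.unique(x):
    print(val, round(y[x == val].mean(), 3), beta0 + beta1 * val)
```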
Zero Conditional Mean
Assumption: Implications
Population regression function (PRF)

E(y|x) = β₀ + β₁x

• E(y|x) is a linear function of x.
• The linearity means that a one-unit increase in x changes the expected value of y by the amount β₁.
• For any given value of x, the distribution of y is centered about E(y|x).
PRF and the zero conditional mean
[Figure: E(y|x) = β₀ + β₁x drawn as a linear function of x, with the density f(y) sketched at x₁ and x₂; for any x, the distribution of y is centered about E(y|x).]
Population regression line and data points
[Figure: four data points (x₁, y₁), ..., (x₄, y₄) scattered around the population regression line y = β₀ + β₁x, or E(y|x) = β₀ + β₁x; each point satisfies yᵢ = β₀ + β₁xᵢ + uᵢ, with uᵢ the vertical distance between the point and the line.]
Independence of 𝒙 and 𝒖
• To derive the OLS estimates, we need to recognize that our main assumption, E(u|x) = E(u) = 0, also implies that

Cov(x, u) = E(xu) = 0

• Remember from basic probability theory that


𝐶𝑜𝑣(𝑋, 𝑌) = 𝐸(𝑋𝑌) – 𝐸(𝑋)𝐸(𝑌)
Independence of 𝒙 and 𝒖
• Premise: E(u|x) = E(u) = 0 and Cov(x, u) = 0 (x and u are independent).
We need to show that E(xu) = 0.
• Remember from probability theory that
Cov(x, u) = E(xu) − E(x)E(u)
But E(u) = 0, from our assumption. Hence,
Cov(x, u) = E(xu).
But Cov(x, u) = 0, therefore E(xu) = 0.
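As a quick numerical illustration (a sketch with simulated data, not from the slides): when u is drawn independently of x with mean zero, both the sample covariance of x and u and the sample average of xu are close to zero.

```python
import numpy as np

rng = np.random.default_rng(1)

# Draw u independently of x, with E(u) = 0
x = rng.normal(loc=5.0, scale=2.0, size=100_000)
u = rng.normal(loc=0.0, scale=1.0, size=x.size)

# Both the sample covariance and the sample mean of x*u should be near zero
print(np.cov(x, u)[0, 1])   # close to 0
print(np.mean(x * u))       # close to 0, since E(xu) = Cov(x, u) + E(x)E(u) = 0
```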
Deriving the estimators
Simple Linear Regression using a Random Sample
Deriving the estimators:
Method of moments
Simple Linear Regression using a Random Sample
Recall: Methods of deriving the
estimators
1. Method of Moments
2. Least Squares Method
3. Maximum Likelihood Method
Deriving the estimators:
Method of Moments
• The method of moments approach to estimation imposes the population moment restrictions on their sample counterparts.
• These are called moment restrictions.
• Example: If the population moment restriction for the mean of X is E(X) = μ₀ (assumed to be a constant value), the corresponding sample moment condition is:

(1/n) Σᵢ₌₁ⁿ xᵢ = μ₀
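A minimal sketch of the idea (with a hypothetical population mean μ₀ = 3 and an arbitrarily chosen distribution, not from the slides): the estimator implied by this sample moment condition is simply the sample mean, which settles near μ₀ as n grows.

```python
import numpy as np

rng = np.random.default_rng(2)

mu0 = 3.0   # hypothetical population mean, E(X) = mu0

# The sample moment condition (1/n) * sum(x_i) = mu0 says: estimate the
# population mean by the sample mean. It approaches mu0 as n grows.
for n in (10, 1_000, 100_000):
    x = rng.exponential(scale=mu0, size=n)   # any distribution with mean mu0
    print(n, round(x.mean(), 3))
```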
Deriving the estimators:
Method of Moments
• From our previous assumptions, E(u) = 0 and E(xu) = 0, we can write these in terms of x, y, β₀, and β₁, since u = y − β₀ − β₁x:

E(y − β₀ − β₁x) = 0
E[x(y − β₀ − β₁x)] = 0

• These are now the moment restrictions for our regression.
Deriving the estimators:
Method of Moments
• We want to choose values of the parameters that will
ensure that the sample versions of our moment
restrictions are true.
Deriving the estimators:
Method of Moments
• The sample moment versions are as follows:

(1/n) Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ) = 0
(1/n) Σᵢ₌₁ⁿ xᵢ(yᵢ − β̂₀ − β̂₁xᵢ) = 0

• Using these equations, we can solve for β̂₀ and β̂₁.
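Rearranged, the two sample moment conditions form a 2×2 linear system in β̂₀ and β̂₁, so they can be solved directly. Here is a minimal numerical sketch (with a small hypothetical dataset, not from the slides):

```python
import numpy as np

# Hypothetical sample (x_i, y_i); any data would do
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# The two sample moment conditions, multiplied by n and rearranged:
#   n*b0      + sum(x)*b1   = sum(y)
#   sum(x)*b0 + sum(x^2)*b1 = sum(x*y)
A = np.array([[n,       x.sum()],
              [x.sum(), (x ** 2).sum()]])
b = np.array([y.sum(), (x * y).sum()])

b0_hat, b1_hat = np.linalg.solve(A, b)
print(b0_hat, b1_hat)   # the method of moments estimates
```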


Deriving the estimators:
Least squares method
Simple Linear Regression using a Random Sample
Many possible lines can be estimated to
describe the relationship
[Figure: a scatter of data points with one candidate fitted line ŷ = β̂₀ + β̂₁x drawn through them. Which among all the many possible lines would best represent the relationship?]
Possible criteria: Derive a line that has
the shortest distance from the points
[Figure: the fitted line ŷ = β̂₀ + β̂₁x with data points (x₁, y₁), ..., (x₄, y₄); each residual ûᵢ is the vertical distance between yᵢ and the fitted line.]
Goal: Derive a line that has
the shortest distance – line of best fit
• Given the intuitive idea of fitting a line, we can set up a formal minimization problem: minimizing the sum of the values of the error terms (residuals) across all data points.

min Σᵢ₌₁ⁿ ûᵢ

• But note that residuals can be negative, so large negative residuals can offset large positive ones: the sum can be small (or even very negative) while the line is still far from the data points.
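To make the problem concrete, here is a small sketch (hypothetical data, not from the slides): any line forced through the point of sample means has residuals that sum to zero no matter how wrong its slope is, so the raw sum of residuals cannot rank candidate lines.

```python
import numpy as np

# Hypothetical sample; any data would do
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Any line forced through (x_bar, y_bar) has residuals summing to zero,
# even with a wildly wrong slope, so the raw sum cannot rank candidate lines.
for b1 in (-10.0, 0.0, 2.0, 50.0):
    b0 = y.mean() - b1 * x.mean()
    residuals = y - b0 - b1 * x
    print(b1, residuals.sum())    # approximately 0 for every slope
```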
Goal: Derive a line that has
the shortest distance – line of best fit
• Instead, we can express the minimization problem in terms of the squares of the residuals:

min Σᵢ₌₁ⁿ ûᵢ² = Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ)²

Note: This is an unconstrained optimization problem!
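As a quick sketch (same hypothetical data as above), the squared criterion can be written as a small function and evaluated at different candidate intercept and slope pairs; unlike the raw sum of residuals, it does distinguish good lines from bad ones.

```python
import numpy as np

# Hypothetical sample; any data would do
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def ssr(b0, b1):
    """Sum of squared residuals for a candidate line y = b0 + b1*x."""
    return np.sum((y - b0 - b1 * x) ** 2)

# The squared criterion ranks candidate lines; smaller is better
for b0, b1 in [(0.0, 2.0), (1.0, 1.5), (-5.0, 10.0)]:
    print(b0, b1, round(ssr(b0, b1), 3))
```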


Goal: Derive a line that has
the shortest distance – line of best fit
• Recall: Calculus
y = ax² + bx + c
1. Get the first derivative of y with respect to x (y′ = dy/dx).
2. Set it equal to zero, then solve for x. This will be the optimal point.
3. To determine whether it is a minimum or a maximum, get the second derivative (y′′). If y′′ > 0, we have a minimum. If y′′ < 0, we have a maximum.
Goal: Derive a line that has
the shortest distance – line of best fit
• Recall: Calculus
y = ax² + bx + c
1. Get the first derivative of y with respect to x (y′ = dy/dx):

y′ = 2ax + b

2. Set it equal to zero, then solve for x. This will be the optimal point:

y′ = 2ax + b = 0
x = −b / (2a)
Goal: Derive a line that has
the shortest distance – line of best fit
• Recall: Calculus
y = ax² + bx + c
3. To determine whether it is a minimum or a maximum, get the second derivative (y′′). If y′′ > 0, we have a minimum. If y′′ < 0, we have a maximum.

y′ = 2ax + b
y′′ = 2a
If 2a > 0, then minimum.
If 2a < 0, then maximum.
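A tiny numerical check of this recipe (with an arbitrary quadratic, not from the slides):

```python
# Hypothetical quadratic y = 3x^2 - 12x + 5, so a = 3, b = -12, c = 5
a, b, c = 3.0, -12.0, 5.0

# First-order condition: y' = 2ax + b = 0  =>  x* = -b / (2a)
x_star = -b / (2 * a)                      # = 2.0

# Second derivative y'' = 2a = 6 > 0, so x* is a minimum
y = lambda t: a * t**2 + b * t + c
print(x_star, y(x_star))                                           # 2.0 -7.0
print(y(x_star) < y(x_star - 0.1), y(x_star) < y(x_star + 0.1))    # True True
```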
Goal: Derive a line that has
the shortest distance – line of best fit
• What are the steps in solving an unconstrained optimization problem?

1. Get the partial derivatives with respect to β̂₀ and β̂₁.
2. Equate the derivatives to zero (these are the first-order conditions). Then solve for β̂₀ and β̂₁.
3. Check whether β̂₀ and β̂₁ are really the optimal estimators by getting the second partial derivatives (i.e., the second-order conditions).
Goal: Derive a line that has
the shortest distance – line of best fit
min Σᵢ₌₁ⁿ ûᵢ² = Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ)²

1. Get the partial derivatives with respect to β̂₀ and β̂₁. The objective is a sum of n terms:

i = 1: (y₁ − β̂₀ − β̂₁x₁)²
i = 2: (y₂ − β̂₀ − β̂₁x₂)²
⋮
i = n: (yₙ − β̂₀ − β̂₁xₙ)²
Goal: Derive a line that has
the shortest distance – line of best fit
i = 1: (y₁ − β̂₀ − β̂₁x₁)²  →  d(·)/dβ̂₀ = −2(y₁ − β̂₀ − β̂₁x₁)
i = 2: (y₂ − β̂₀ − β̂₁x₂)²  →  d(·)/dβ̂₀ = −2(y₂ − β̂₀ − β̂₁x₂)
⋮
i = n: (yₙ − β̂₀ − β̂₁xₙ)²  →  d(·)/dβ̂₀ = −2(yₙ − β̂₀ − β̂₁xₙ)
Goal: Derive a line that has
the shortest distance – line of best fit
Hence,

d[Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ)²] / dβ̂₀ = −2 Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ)
Goal: Derive a line that has
the shortest distance – line of best fit
i = 1: (y₁ − β̂₀ − β̂₁x₁)²  →  d(·)/dβ̂₁ = −2x₁(y₁ − β̂₀ − β̂₁x₁)
i = 2: (y₂ − β̂₀ − β̂₁x₂)²  →  d(·)/dβ̂₁ = −2x₂(y₂ − β̂₀ − β̂₁x₂)
⋮
i = n: (yₙ − β̂₀ − β̂₁xₙ)²  →  d(·)/dβ̂₁ = −2xₙ(yₙ − β̂₀ − β̂₁xₙ)
Goal: Derive a line that has
the shortest distance – line of best fit
Hence,

d[Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ)²] / dβ̂₁ = −2 Σᵢ₌₁ⁿ xᵢ(yᵢ − β̂₀ − β̂₁xᵢ)
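These two derivative formulas can be spot-checked numerically. A minimal sketch (same hypothetical data as above, arbitrary candidate point): compare the analytic derivatives with finite-difference approximations of the sum of squared residuals.

```python
import numpy as np

# Hypothetical sample; any data would do
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def ssr(b0, b1):
    """Sum of squared residuals at candidate values (b0, b1)."""
    return np.sum((y - b0 - b1 * x) ** 2)

b0, b1 = 1.0, 1.5           # arbitrary candidate point
h = 1e-6                    # step size for finite differences

# Analytic derivatives from the slides
d_b0 = -2 * np.sum(y - b0 - b1 * x)
d_b1 = -2 * np.sum(x * (y - b0 - b1 * x))

# Finite-difference approximations of the same derivatives
fd_b0 = (ssr(b0 + h, b1) - ssr(b0 - h, b1)) / (2 * h)
fd_b1 = (ssr(b0, b1 + h) - ssr(b0, b1 - h)) / (2 * h)

print(d_b0, fd_b0)   # should agree (up to rounding)
print(d_b1, fd_b1)
```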
Goal: Derive a line that has
the shortest distance – line of best fit
2. Equate the derivatives to zero (these are the first-order conditions).
FOCs:

−2 Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ) = 0
−2 Σᵢ₌₁ⁿ xᵢ(yᵢ − β̂₀ − β̂₁xᵢ) = 0
Goal: Derive a line that has
the shortest distance – line of best fit
Then solve for β̂₀ and β̂₁.
From the first FOC, we have:

Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ) = 0
→ Σᵢ₌₁ⁿ yᵢ − Σᵢ₌₁ⁿ β̂₀ − Σᵢ₌₁ⁿ β̂₁xᵢ = 0
→ Σᵢ₌₁ⁿ yᵢ − nβ̂₀ − β̂₁ Σᵢ₌₁ⁿ xᵢ = 0
→ Σᵢ₌₁ⁿ yᵢ − β̂₁ Σᵢ₌₁ⁿ xᵢ = nβ̂₀
Goal: Derive a line that has
the shortest distance – line of best fit
→ Σᵢ₌₁ⁿ yᵢ − β̂₁ Σᵢ₌₁ⁿ xᵢ = nβ̂₀
→ nβ̂₀ = Σᵢ₌₁ⁿ yᵢ − β̂₁ Σᵢ₌₁ⁿ xᵢ
→ β̂₀ = (1/n) Σᵢ₌₁ⁿ yᵢ − β̂₁ (1/n) Σᵢ₌₁ⁿ xᵢ
→ β̂₀ = ȳ − β̂₁x̄

** We have solved for β̂₀.
Goal: Derive a line that has
the shortest distance – line of best fit

A result from this is that ȳ = β̂₀ + β̂₁x̄, which implies that the estimated sample regression line passes through the point of sample means (x̄, ȳ).
Goal: Derive a line that has
the shortest distance – line of best fit
Solve for β̂₁.
From the second FOC and the derived β̂₀, we have:

Σᵢ₌₁ⁿ xᵢ(yᵢ − β̂₀ − β̂₁xᵢ) = 0
→ Σᵢ₌₁ⁿ xᵢ(yᵢ − ȳ + β̂₁x̄ − β̂₁xᵢ) = 0
→ Σᵢ₌₁ⁿ xᵢ[(yᵢ − ȳ) − β̂₁(xᵢ − x̄)] = 0
Goal: Derive a line that has
the shortest distance – line of best fit
→ Σᵢ₌₁ⁿ xᵢ[(yᵢ − ȳ) − β̂₁(xᵢ − x̄)] = 0
→ Σᵢ₌₁ⁿ xᵢ(yᵢ − ȳ) − β̂₁ Σᵢ₌₁ⁿ xᵢ(xᵢ − x̄) = 0
→ β̂₁ Σᵢ₌₁ⁿ xᵢ(xᵢ − x̄) = Σᵢ₌₁ⁿ xᵢ(yᵢ − ȳ)
→ β̂₁ = Σᵢ₌₁ⁿ xᵢ(yᵢ − ȳ) / Σᵢ₌₁ⁿ xᵢ(xᵢ − x̄)
Goal: Derive a line that has
the shortest distance – line of best fit
Since Σᵢ₌₁ⁿ (yᵢ − ȳ) = 0 and Σᵢ₌₁ⁿ (xᵢ − x̄) = 0, subtracting these (zero-valued) terms from the numerator and the denominator changes nothing:

→ β̂₁ = [Σᵢ₌₁ⁿ xᵢ(yᵢ − ȳ) − x̄ Σᵢ₌₁ⁿ (yᵢ − ȳ)] / [Σᵢ₌₁ⁿ xᵢ(xᵢ − x̄) − x̄ Σᵢ₌₁ⁿ (xᵢ − x̄)]
→ β̂₁ = Σᵢ₌₁ⁿ [xᵢ(yᵢ − ȳ) − x̄(yᵢ − ȳ)] / Σᵢ₌₁ⁿ [xᵢ(xᵢ − x̄) − x̄(xᵢ − x̄)]
Goal: Derive a line that has
the shortest distance – line of best fit
→ β̂₁ = Σᵢ₌₁ⁿ [xᵢ(yᵢ − ȳ) − x̄(yᵢ − ȳ)] / Σᵢ₌₁ⁿ [xᵢ(xᵢ − x̄) − x̄(xᵢ − x̄)]
→ β̂₁ = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)(xᵢ − x̄)
→ β̂₁ = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)²
Goal: Derive a line that has
the shortest distance – line of best fit
→ β̂₁ = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)²

** We have solved for β̂₁.
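Putting the two results together, here is a minimal OLS sketch (same hypothetical data as before): it computes β̂₁ from the closed-form formula and β̂₀ = ȳ − β̂₁x̄, cross-checks against numpy's degree-1 polynomial fit, and verifies that the fitted line passes through (x̄, ȳ).

```python
import numpy as np

# Hypothetical sample; any data would do
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_bar, y_bar = x.mean(), y.mean()

# Slope and intercept from the closed-form OLS formulas
b1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0_hat = y_bar - b1_hat * x_bar
print(b0_hat, b1_hat)

# Cross-check against numpy's least-squares polynomial fit (degree 1)
b1_np, b0_np = np.polyfit(x, y, 1)
print(np.isclose(b0_hat, b0_np), np.isclose(b1_hat, b1_np))   # True True

# The fitted line passes through the point of sample means (x_bar, y_bar)
print(np.isclose(y_bar, b0_hat + b1_hat * x_bar))             # True
```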
Recap: Deriving the estimators:
Least squares method
• If one uses calculus to solve the minimization problem for the two parameters, one obtains the following first-order conditions, which are the same as the method of moments restrictions multiplied by n:

Σᵢ₌₁ⁿ (yᵢ − β̂₀ − β̂₁xᵢ) = 0
Σᵢ₌₁ⁿ xᵢ(yᵢ − β̂₀ − β̂₁xᵢ) = 0
Recap: Deriving the estimators:
Least squares method
• Given the definition of a sample mean, and properties of summation, we can rewrite the first condition as follows:

ȳ = β̂₀ + β̂₁x̄
or
β̂₀ = ȳ − β̂₁x̄
Recap: Deriving the estimators:
Least squares method
Solving for β̂₁, we get:

β̂₁ = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)²

provided that

Σᵢ₌₁ⁿ (xᵢ − x̄)² > 0

(i.e., the sample values of x are not all identical).
Exercise
Summary: Ordinary Least Squares
(OLS) estimates
• The slope estimate is the sample covariance
between 𝑥 and 𝑦 divided by the sample variance
of 𝑥
• If 𝑥 and 𝑦 are positively correlated, the slope
will be positive
• If 𝑥 and 𝑦 are negatively correlated, the slope
will be negative
Summary: OLS estimates
Recall: correlation coefficient

ρ = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / √[Σᵢ₌₁ⁿ (xᵢ − x̄)² · Σᵢ₌₁ⁿ (yᵢ − ȳ)²]

vis-à-vis the estimator of the slope coefficient

β̂₁ = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ₌₁ⁿ (xᵢ − x̄)²
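The two formulas share the same numerator, so the slope always has the same sign as the correlation; in fact β̂₁ equals ρ times the ratio of the sample standard deviation of y to that of x. A minimal numerical check (hypothetical data, not from the slides):

```python
import numpy as np

# Hypothetical sample; any data would do
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# OLS slope from the closed-form formula
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Sample correlation coefficient
rho = np.corrcoef(x, y)[0, 1]

# The slope equals the correlation scaled by the ratio of standard deviations
print(b1_hat, rho * y.std() / x.std())   # equal (up to rounding)
print(np.sign(b1_hat) == np.sign(rho))   # True: same sign
```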
More on OLS
• Intuitively, OLS fits a line through the sample points such that the sum of squared residuals is as small as possible, hence the term least squares.
• The residual, û, is an estimate of the error term, u, and is the difference between the sample point and the fitted line (the sample regression function):

yᵢ − ŷᵢ = ûᵢ
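A final minimal sketch (same hypothetical data as above): compute the fitted values and residuals, and verify numerically that the residuals satisfy the two first-order conditions, i.e., they sum to zero and are uncorrelated with x in the sample.

```python
import numpy as np

# Hypothetical sample; any data would do
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# OLS estimates from the closed-form formulas
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()

y_fit = b0_hat + b1_hat * x      # fitted values  y_hat_i
u_hat = y - y_fit                # residuals      u_hat_i = y_i - y_hat_i

# The first-order conditions hold for the residuals (up to rounding error)
print(np.isclose(u_hat.sum(), 0.0))         # True
print(np.isclose((x * u_hat).sum(), 0.0))   # True
```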
