Lecture 7

AIL 7310: MACHINE LEARNING FOR ECONOMICS
10th August, 2023
AIL 7310: ML for Econ Lecture 7 1 / 11

Statistical Inference
What is statistical inference?

Statistical Inference is the process of drawing conclusions about an
underlying population using a subset or sample of the data.


Statistical Inference is necessary to understand the exact nature of the
relationship (f) between the predictors and the response. This is important
in making decisions both in business and in policy.



Note: Statistical Inference is a vast literature (over 200 years old) and is
the subject of numerous textbooks. In this course, we will only focus on a
small part of it.



We will focus on a linear model estimated using ordinary least squares.

Simple Linear Regression Model
We wish to understand the relation between x and y in the population
using a simple linear regression model given by
y = β0 + β1 x + u (1)
In the inference setting, we use different terminology to describe these
variables.

y                     x
Dependent variable    Independent variable, covariate
Explained variable    Explanatory variable
Regressand            Regressor

The variable u in eq. (1) is called the error term or disturbance term. It
represents factors other than x that affect y, i.e. the unobserved factors.

If all other factors are kept constant, so that the change in u is zero
(∆u = 0), then x has a linear effect on y:
∆y = β1 ∆x
Thus the change in y is simply β1 multiplied by the change in x. β1 is the
slope parameter, which is of primary importance in econometrics. β0 is the
intercept parameter (also called the constant term); although it is not
central to econometric analysis, it is very useful in some cases.

As long as the intercept β0 is included in the model, we can assume that
the average value of u in the population is zero
E (u) = 0 (2)
This assumption only concerns the distribution of the unobserved factors
in the population; it says nothing about the relation between x and u.
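
The reason an intercept makes eq. (2) harmless can be sketched in one line. The symbol α0 below is introduced here purely for illustration (it is not in the lecture) and stands for a possibly non-zero population mean of u:

```latex
\begin{align*}
E(u) &= \alpha_0 \neq 0, \\
y &= \beta_0 + \beta_1 x + u
   = (\beta_0 + \alpha_0) + \beta_1 x + (u - \alpha_0), \\
E(u - \alpha_0) &= 0 .
\end{align*}
```

The rewritten model has intercept β0 + α0, the same slope β1, and an error term with mean zero, so the normalisation only changes what the intercept absorbs.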

How do we look at the relation between x and u?

A natural way would be to use the correlation coefficient. But correlation
only measures linear dependence. It is possible for u to be uncorrelated
with x and still be correlated with functions of x, such as x². So assuming
only that u is uncorrelated with x is not enough here, as it can cause
problems for interpreting the model and deriving statistical properties.
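
A small simulation may make this point concrete. The sketch below is an illustration, not part of the lecture: it assumes x is standard normal and constructs u = x² − 1, which has mean zero and is uncorrelated with x, yet is an exact function of x².

```python
import numpy as np

# Illustrative setup (an assumption, not from the lecture):
# x is standard normal and u = x^2 - 1, so E(u) = 0 but u depends on x.
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
u = x**2 - 1

# Correlation only picks up linear dependence.
corr_x_u = np.corrcoef(x, u)[0, 1]       # close to zero
corr_x2_u = np.corrcoef(x**2, u)[0, 1]   # essentially one: u is linear in x^2

print(f"corr(x, u)   = {corr_x_u:.3f}")
print(f"corr(x^2, u) = {corr_x2_u:.3f}")
```

Here the average of u clearly varies with x even though the correlation between the two is approximately zero.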

We assume that the average value of u does not depend on the value of x.
E (u|x) = E (u) (3)

Equation 3 says that the average value of unobservables is the same across
all slices of the population determined by x and this common average is
equal to the average of u over the entire population.

Combining equation 2 and equation 3 we get the zero conditional mean
assumption
E (u|x) = 0 (4)

The zero conditional mean assumption gives β1 another interpretation.

Taking the expected value of equation (1) conditional on x and using
E (u|x) = 0 gives
E (y |x) = β0 + β1 x (5)
Equation (5) shows the population regression line. E (y |x) is a linear
function of x. This linearity means that a one-unit increase in x changes
the expected value of y by the amount β1 .
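
The step from eq. (1) to eq. (5) can be written out as follows. It uses only linearity of the conditional expectation and the fact that β0 + β1 x is fixed once x is given:

```latex
\begin{align*}
E(y \mid x) &= E(\beta_0 + \beta_1 x + u \mid x) \\
            &= \beta_0 + \beta_1 x + E(u \mid x) \\
            &= \beta_0 + \beta_1 x .
\end{align*}
```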


Note: Eq (5) tells us how the average value of y changes with x. It does
not say that y equals β0 + β1 x for all units in the population.


The piece β0 + β1 x, which represents E (y |x), is called the systematic part
of y, i.e. the part of y that is explained by x. u represents the
unsystematic part, the part that is not explained by x.
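
The split between the systematic and unsystematic parts can be illustrated with simulated data. This is a minimal sketch under assumed values: β0 = 1, β1 = 2, x taking the values 1 to 4, and u drawn independently of x so that E(u|x) = 0. None of these numbers come from the lecture.

```python
import numpy as np

# Hypothetical population parameters, chosen only for illustration.
beta0, beta1 = 1.0, 2.0
rng = np.random.default_rng(1)
n = 40_000
x = rng.integers(1, 5, size=n)   # x takes the values 1, 2, 3, 4
u = rng.standard_normal(n)       # independent of x, so E(u|x) = 0
y = beta0 + beta1 * x + u

# Average y within each slice of the population defined by x.
for value in (1, 2, 3, 4):
    slice_mean = y[x == value].mean()
    line_value = beta0 + beta1 * value
    print(f"x = {value}: mean of y = {slice_mean:.2f}, beta0 + beta1*x = {line_value:.2f}")

# Individual units deviate from the line by their own u.
print("std. dev. of y - (beta0 + beta1*x):", np.std(y - beta0 - beta1 * x))
```

The slice averages track β0 + β1 x closely, while individual observations scatter around the line with the spread of u.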

Now we turn to the issue of how to estimate β0 and β1 in equation (1).

To do this we take a random sample of size n from the population.

Since the data come from eq. (1), we can write
yi = β0 + β1 xi + ui (6)
for each i. Here ui is the error term for observation i because it contains
the factors other than xi that affect yi.

We will use the assumption in eq. (2), E (u) = 0, and an important
implication of the assumption in eq. (3): u is uncorrelated with x.

Therefore u has zero expected value and the covariance between x and u
is zero.

E (u) = 0 (7)

Cov (x, u) = E (xu) = 0 (8)
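
Why the zero conditional mean assumption in eq. (4) delivers eq. (8) can be sketched with the law of iterated expectations, a standard result that the slides do not state explicitly:

```latex
\begin{align*}
E(xu) &= E\big[\,E(xu \mid x)\,\big] = E\big[\,x\,E(u \mid x)\,\big] = 0, \\
\operatorname{Cov}(x, u) &= E(xu) - E(x)\,E(u) = E(xu) = 0 .
\end{align*}
```

The second line uses E(u) = 0, which is why the covariance and E(xu) coincide.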

Rewriting the above equations in terms of observables we get
E (y − β0 − β1 x) = 0 (9)

E [x(y − β0 − β1 x)] = 0 (10)


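Given a sample of data, we choose estimates β̂0 and β̂1 to solve the
sample counterparts of eq. (9) and eq. (10):
\[ \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0 \tag{11} \]
\[ \frac{1}{n} \sum_{i=1}^{n} x_i \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0 \tag{12} \]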
This is the method of moments approach to estimation.
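
Because eqs. (11) and (12) are linear in the two unknowns, they can be solved numerically as a 2×2 linear system. The sketch below is one way to do this with NumPy. The function name `method_of_moments` and the simulated data (true intercept 1, true slope 2) are assumptions made for illustration.

```python
import numpy as np

def method_of_moments(x, y):
    """Solve the sample moment conditions (11) and (12) for (b0, b1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # (11) rearranged: n*b0      + b1*sum(x)   = sum(y)
    # (12) rearranged: b0*sum(x) + b1*sum(x^2) = sum(x*y)
    A = np.array([[n, x.sum()],
                  [x.sum(), (x**2).sum()]])
    b = np.array([y.sum(), (x * y).sum()])
    b0, b1 = np.linalg.solve(A, b)
    return b0, b1

# Simulated sample with illustrative parameters (not from the lecture).
rng = np.random.default_rng(2)
x = rng.normal(5.0, 2.0, size=500)
u = rng.normal(0.0, 1.0, size=500)
y = 1.0 + 2.0 * x + u

b0_hat, b1_hat = method_of_moments(x, y)
resid = y - b0_hat - b1_hat * x

print("estimates:", b0_hat, b1_hat)
# Both sample moments are zero up to floating-point error.
print("mean residual:", resid.mean())
print("mean of x * residual:", (x * resid).mean())
```

By construction the residuals average to zero and are uncorrelated with x in the sample, mirroring eqs. (7) and (8) in the population.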


These two equations (11) and (12) can be solved to get β̂0 and β̂1:
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{13} \]
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \tag{14} \]
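
The algebra behind this solution is short. A sketch, assuming the sample values of x are not all identical so that the denominator is non-zero:

```latex
\begin{align*}
\text{From (11):}\quad
  & \bar{y} - \hat{\beta}_0 - \hat{\beta}_1 \bar{x} = 0
    \;\Longrightarrow\; \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}. \\
\text{Substituting into (12):}\quad
  & \sum_{i=1}^{n} x_i \left[ (y_i - \bar{y}) - \hat{\beta}_1 (x_i - \bar{x}) \right] = 0 \\
\Longrightarrow\quad
  & \hat{\beta}_1
    = \frac{\sum_{i=1}^{n} x_i (y_i - \bar{y})}{\sum_{i=1}^{n} x_i (x_i - \bar{x})}
    = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.
\end{align*}
```

The last equality holds because the deviations from the mean sum to zero, so subtracting x̄ inside either sum leaves it unchanged.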

These are called the ordinary least squares (OLS) estimates of β1 and β0.
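
The closed-form expressions for the slope and intercept translate directly into code. The sketch below uses a hypothetical helper `ols_simple` and a small made-up data set, and cross-checks the result against NumPy's `np.polyfit` with degree 1, which fits the same least squares line.

```python
import numpy as np

def ols_simple(x, y):
    """Slope and intercept of simple linear regression via the closed form."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    sxx = ((x - x_bar) ** 2).sum()
    if sxx == 0:
        raise ValueError("x must vary in the sample to estimate the slope")
    b1 = ((x - x_bar) * (y - y_bar)).sum() / sxx   # slope, eq. (13)
    b0 = y_bar - b1 * x_bar                        # intercept, eq. (14)
    return b0, b1

# Made-up data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b0_hat, b1_hat = ols_simple(x, y)
slope, intercept = np.polyfit(x, y, 1)   # returns highest-degree coefficient first

print(f"closed form: b0 = {b0_hat:.4f}, b1 = {b1_hat:.4f}")
print(f"np.polyfit : b0 = {intercept:.4f}, b1 = {slope:.4f}")
```

The explicit check on the spread of x guards against the case where the slope is not identified.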
