Lecture 2
Y = f(X) + ϵ
There are two main reasons why we wish to estimate f : inference and
prediction.
In this setting, since the error term ϵ averages to zero, we can predict Y
using
Ŷ = fˆ(X)
where fˆ is our estimate for f and Ŷ is the predicted value of Y .
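As a minimal sketch of this prediction setup, assuming for illustration a linear truth f(X) = 2 + 3X with mean-zero noise, we can estimate f by least squares and then form Ŷ = fˆ(X) at a new point (the data, the linear form, and all numbers below are illustrative assumptions, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed truth for illustration: f(X) = 2 + 3X, and Y = f(X) + eps
# with mean-zero noise eps.
X = rng.uniform(0, 10, size=200)
Y = 2 + 3 * X + rng.normal(0, 1, size=200)

# Estimate f by least squares: f_hat(x) = b0 + b1 * x.
b1, b0 = np.polyfit(X, Y, deg=1)

# Predict Y at a new point via Y_hat = f_hat(x_new).
x_new = 5.0
y_hat = b0 + b1 * x_new
print(y_hat)  # the true f(5) is 17; y_hat should be close to it
```

Because ϵ averages to zero, the best we can do at x_new is predict fˆ(x_new); the noise itself cannot be predicted.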
For this type of task, the exact form of fˆ is not our concern. We care
about making Ŷ as accurate as possible.
fˆ is usually not a perfect estimate for f , and this imperfection introduces
some error.
This error is the reducible error, since we can reduce it by choosing a more
appropriate statistical learning technique.
In fact, the goal of most applied ML papers has been to choose, from the
variety of available ML tools, the one that minimizes this reducible error.
E(Y − Ŷ)² = E[f(X) + ϵ − fˆ(X)]²
          = [f(X) − fˆ(X)]² + Var(ϵ)
where [f(X) − fˆ(X)]² is the reducible error and Var(ϵ) is the irreducible
error.
Here, Var (ϵ) is the variance associated with the error term.
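This decomposition can be checked numerically at a single point x₀. The setup below is an assumed toy example (f, fˆ, and the noise variance are all made up for illustration): a deliberately imperfect fˆ contributes a reducible error of 1, while Var(ϵ) = 4 is irreducible, so E(Y − Ŷ)² ≈ 5.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy setup: true f(x) = x^2, noise sd sigma = 2 (so Var(eps) = 4),
# and a deliberately imperfect estimate f_hat(x) = x^2 + 1.
f = lambda x: x ** 2
f_hat = lambda x: x ** 2 + 1
x0, sigma = 3.0, 2.0

# Monte Carlo estimate of E(Y - Y_hat)^2 at x0.
eps = rng.normal(0, sigma, size=1_000_000)
Y = f(x0) + eps
mse = np.mean((Y - f_hat(x0)) ** 2)

reducible = (f(x0) - f_hat(x0)) ** 2   # [f(x0) - f_hat(x0)]^2 = 1
irreducible = sigma ** 2               # Var(eps) = 4
print(mse, reducible + irreducible)    # both should be about 5
```

No choice of fˆ can push the expected squared error below Var(ϵ); only the first term can be reduced.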
In this case too, we wish to estimate f but our goal is not to make
predictions for Y.
Note: Here, the exact form of f becomes very important. It can no longer
be treated as a black box.
Some predictors may have a positive relationship with Y , in the sense that
increasing the predictor is associated with increasing values of Y . Other
predictors may have the opposite relationship. Depending on the
complexity of f, the relationship between the response and a given
predictor may also depend on the values of the other predictors.
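In an inference task like this, the fitted coefficients themselves are the object of interest: their signs indicate the direction of each predictor's association with Y. A minimal sketch, assuming for illustration a linear f with one positively and one negatively associated predictor (all data and coefficient values below are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed linear truth for illustration: Y rises with X1 and falls with X2.
n = 500
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 1.0 + 2.0 * X1 - 1.5 * X2 + rng.normal(0, 1, size=n)

# Fit Y = b0 + b1*X1 + b2*X2 by least squares. For inference we inspect
# the estimated coefficients rather than the predictions Y_hat.
A = np.column_stack([np.ones(n), X1, X2])
b0, b1, b2 = np.linalg.lstsq(A, Y, rcond=None)[0]

# b1 > 0: X1 is positively associated with Y; b2 < 0: negative association.
print(b1 > 0, b2 < 0)
```

With a more complex f (interactions, nonlinearities), the effect of one predictor can depend on the values of the others, which is exactly why the form of f matters here.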
[Figure: the true function f , labeled "Truth"]
The values of the parameters are estimated from the data, and the model is
then used for inference and/or prediction. Thus the black box is filled in.
Note: For prediction, we usually divide the data into a training set and a
test set. We estimate f from the training set and use the estimated fˆ to
compute Ŷ on the test set.
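The train/test workflow above can be sketched as follows, on assumed toy data (the 70/30 split and the linear truth Y = 4 + 0.5X + ϵ are illustrative choices, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed toy data: Y = 4 + 0.5*X + eps, with noise variance 1.
X = rng.uniform(0, 10, size=100)
Y = 4 + 0.5 * X + rng.normal(0, 1, size=100)

# Randomly split the 100 observations into training and test sets.
idx = rng.permutation(100)
train, test = idx[:70], idx[70:]

# Estimate f on the training set only...
b1, b0 = np.polyfit(X[train], Y[train], deg=1)

# ...then evaluate Y_hat on the held-out test set.
y_hat = b0 + b1 * X[test]
test_mse = np.mean((Y[test] - y_hat) ** 2)
print(test_mse)  # near Var(eps) = 1 when f_hat is accurate
```

Evaluating on held-out data gives an honest estimate of how fˆ will perform on observations it was not fit to; the test MSE cannot be expected to fall below the irreducible error Var(ϵ).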