Lecture 2

AIL 7310: MACHINE LEARNING FOR ECONOMICS

Lecture 2

3rd August, 2023

AIL 7310: ML for Econ Lecture 2 1 / 18


Setting the stage

Suppose we observe a quantitative response Y and p different predictors, X1, X2, ..., Xp.

We assume that there is some relationship between Y and X = (X1, X2, ..., Xp), which can be written as

Y = f(X) + ϵ

Here f is some fixed but unknown function of X1, X2, ..., Xp and ϵ is a random error term. ϵ is independent of X and has mean zero.

f represents the systematic information that X provides about Y.

Statistical Learning/Machine Learning refers to the set of tools used to learn this relationship f.
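
One consequence of these assumptions is worth writing out, as a short sketch rather than a formal proof. Because ϵ has mean zero and is independent of X, averaging Y over the noise at a fixed value X = x recovers f(x), which is why f can be read as the systematic part of Y:

```latex
E[Y \mid X = x] = E[f(X) + \epsilon \mid X = x] = f(x) + E[\epsilon] = f(x)
```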

Inference Vs Prediction

There are two main reasons why we wish to estimate f: inference and prediction.

Prediction

Often, we encounter situations where a set of inputs (X) is readily available but the corresponding values of the output (Y) are not.

In this setting, since the error term ϵ averages to zero, we can predict Y using

Ŷ = f̂(X)

where f̂ is our estimate for f and Ŷ is the predicted value of Y.

In this type of task, the exact form of f̂ is not our concern. We care about making Ŷ as accurate as possible.

Prediction

The accuracy of Ŷ as a prediction for Y depends on two quantities: the reducible error and the irreducible error.

f̂ is usually not a perfect estimate for f, and this introduces some error.

This error is the reducible error, since we can reduce it by choosing a more appropriate statistical learning technique.

In fact, the goal of most applied ML papers has been to select, from the variety of available ML tools, the one that minimizes this reducible error.

Prediction

However, even if we were able to estimate f perfectly, so that our estimated response takes the form Ŷ = f(X), our prediction would still have some error in it.

This is because Y is also a function of ϵ, which by definition cannot be explained by X. Thus the variation associated with ϵ will also affect the accuracy of our predicted Ŷ.

This is known as the irreducible error, since no matter how well we estimate f, we cannot reduce the error introduced by ϵ.

Prediction
ϵ contains information from unmeasured variables that are useful in predicting Y. Since we don't measure them, they cannot be used when estimating f. Thus the irreducible error is always larger than zero.

Mathematically, consider a given estimate f̂ and a set of predictors X, which yields the prediction Ŷ = f̂(X). Assume that both f̂ and X are fixed, so that the only randomness comes from ϵ. Then the expected squared difference between the actual and predicted value of Y is

\[
\begin{aligned}
E(Y - \hat{Y})^2 &= E[f(X) + \epsilon - \hat{f}(X)]^2 \\
                 &= \underbrace{[f(X) - \hat{f}(X)]^2}_{\text{reducible}} + \underbrace{\mathrm{Var}(\epsilon)}_{\text{irreducible}}
\end{aligned}
\]

The cross term 2[f(X) − f̂(X)]E(ϵ) vanishes because ϵ has mean zero. Here, Var(ϵ) is the variance associated with the error term.
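
The decomposition can be checked numerically. The sketch below is only an illustration under assumed values: the "true" f, the imperfect estimate f_hat, the point x0 and the noise level sigma are all hypothetical choices, not part of the lecture. It simulates many draws of Y at a fixed X and compares the average squared prediction error with the sum of the reducible and irreducible parts.

```python
import random

def f(x):
    # Hypothetical "true" function, chosen only for illustration.
    return 2.0 + 3.0 * x

def f_hat(x):
    # Hypothetical imperfect estimate of f.
    return 1.5 + 3.2 * x

def simulate_mse(x, sigma, n_draws=200_000, seed=0):
    """Monte Carlo estimate of E(Y - Y_hat)^2 with x and f_hat held fixed."""
    rng = random.Random(seed)
    y_hat = f_hat(x)
    total = 0.0
    for _ in range(n_draws):
        y = f(x) + rng.gauss(0.0, sigma)  # Y = f(X) + eps, eps has mean zero
        total += (y - y_hat) ** 2
    return total / n_draws

x0, sigma = 1.0, 0.5                   # assumed values for the illustration
reducible = (f(x0) - f_hat(x0)) ** 2   # [f(X) - f_hat(X)]^2
irreducible = sigma ** 2               # Var(eps)
mse = simulate_mse(x0, sigma)

print(f"reducible + irreducible = {reducible + irreducible:.4f}")
print(f"simulated E(Y - Y_hat)^2 = {mse:.4f}")
```

With these assumed values the two printed numbers should agree up to simulation noise. Setting f_hat equal to f would drive the reducible part to zero, while the simulated error would stay near sigma squared.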

Inference

In other cases, we are interested in understanding the way Y is affected as X1, X2, ..., Xp change.

In this case too, we wish to estimate f, but our goal is not to make predictions for Y.

Instead we try to understand the relationship between X and Y. More specifically, we wish to understand how Y changes as X changes.

Note: Here, the exact form of f̂ becomes very important. It can no longer be treated as a black box.

Inference

Why do we care about inference?

We need to answer questions such as these.


Which predictors are associated with the response?
What is the relationship between the response and each predictor?
Can the relationship between Y and each predictor be adequately summarized using a linear equation, or is the relationship more complicated?
How much will an extra unit of X contribute to Y?

Inference

Let us think about these questions one by one.

Which predictors are associated with the response?

Often only a small fraction of the available predictors are substantially associated with Y. Identifying the few important predictors among a large set of possible variables can be extremely useful, depending on the application.

Inference

What is the relationship between the response and each predictor?

Some predictors may have a positive relationship with Y , in the sense that
increasing the predictor is associated with increasing values of Y . Other
predictors may have the opposite relationship. Depending on the
complexity of f, the relationship between the response and a given
predictor may also depend on the values of the other predictors.

Inference

Can the relationship between Y and each predictor be adequately summarized using a linear equation, or is the relationship more complicated?

Historically, most methods for estimating f have taken a linear form. In some situations, such an assumption is reasonable or even desirable. But often the true relationship is more complicated, in which case a linear model may not provide an accurate representation of the relationship between the input and output variables.

Inference

How much will an extra unit of X contribute to Y?

This is known as the marginal effect. It is estimated from the model parameters, and it tells us how large the effect of X on Y is. This information is central in economics, particularly for policy formulation and cost calculations, so the interpretability of the model becomes crucial here.

For a marginal effect, we look at its sign, size and significance (both statistical and economic).
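
As a small illustration of how a marginal effect is read off model parameters, consider a hypothetical fitted quadratic model f̂(x) = b0 + b1·x + b2·x². The coefficient values below are made up for the example and do not come from the lecture. The marginal effect is the derivative b1 + 2·b2·x, so both its sign and its size depend on where it is evaluated. The sketch also checks the formula against a finite-difference approximation.

```python
# Hypothetical estimated coefficients, chosen only for illustration.
B0, B1, B2 = 10.0, 4.0, -0.25

def f_hat(x):
    """Fitted model: f_hat(x) = B0 + B1*x + B2*x^2."""
    return B0 + B1 * x + B2 * x ** 2

def marginal_effect(x):
    """Analytic marginal effect d f_hat / dx = B1 + 2*B2*x."""
    return B1 + 2.0 * B2 * x

def numerical_marginal_effect(x, h=1e-6):
    """Central finite-difference approximation of the same derivative."""
    return (f_hat(x + h) - f_hat(x - h)) / (2.0 * h)

for x in (2.0, 8.0, 10.0):
    print(x, marginal_effect(x), round(numerical_marginal_effect(x), 6))
```

In a purely linear model (B2 = 0) the marginal effect is the single coefficient B1 at every x. Here it is positive at x = 2, zero at x = 8 and negative at x = 10, which is why the sign and size have to be reported together with the point of evaluation. Statistical significance would additionally require standard errors, which this sketch does not compute.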

Statistical Modeling: Two Cultures

Truth

Two goals in analyzing data: Prediction and Inference

Statistical Modeling: Two Cultures

Prediction - Algorithmic Modeling Culture

This approach is to find a function f(x), an algorithm that operates on x to predict the responses y.

Statistical Modeling: Two Cultures

Inference - Data Modeling Culture

The values of the parameters are estimated from the data, and the model is then used for information and/or prediction. Thus the black box is filled in.

Estimating f

So how do we estimate this f ?

There are many different techniques for estimating f .

Broadly defined, they can be characterized as parametric or non-parametric.

Our goal is to apply a statistical learning method to the training data in order to estimate the unknown function f. In other words, we want to find a function f̂ such that Y ≈ f̂(X) for any observation (X, Y).
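
To make the parametric versus non-parametric distinction concrete, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the true f is taken to be sin(x), the parametric method is a straight line fitted by least squares, and the non-parametric method is a k-nearest-neighbours average. The lecture does not prescribe these particular methods.

```python
import math
import random

rng = random.Random(0)

# Simulated training data from a hypothetical nonlinear f(x) = sin(x).
n = 300
xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
ys = [math.sin(x) + rng.gauss(0.0, 0.1) for x in xs]

def fit_line(x, y):
    """Parametric estimate: least squares for y = b0 + b1 * x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def knn_predict(x0, k=10):
    """Non-parametric estimate: average y over the k nearest training x values."""
    nearest = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

b0, b1 = fit_line(xs, ys)

# Compare both estimates with the true f on a grid of interior points.
grid = [0.5 + 0.5 * j for j in range(11)]
lin_err = sum((b0 + b1 * x - math.sin(x)) ** 2 for x in grid) / len(grid)
knn_err = sum((knn_predict(x) - math.sin(x)) ** 2 for x in grid) / len(grid)

print(f"linear fit, mean squared error vs f: {lin_err:.4f}")
print(f"k-NN fit,   mean squared error vs f: {knn_err:.4f}")
```

The straight line imposes a functional form that cannot follow sin(x), while the nearest-neighbour average makes no such assumption and tracks the curve more closely here. The price is that the non-parametric fit has no small set of parameters to interpret, which matters for inference.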

Estimating f

Note: For prediction, we usually divide the data into a training set and a test set. We estimate f from the training set and use the estimated f̂ to get Ŷ on the test set.

For inference alone, we use the entire dataset to estimate f. This is because, in the case of inference, the exact form of f̂ is very important and we want to use all the information available to us to get it right.
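
The two workflows can be sketched as follows. This is an illustration on simulated data, not a prescribed procedure: the data-generating line 1 + 2x, the noise level, the 75/25 split and the helper name fit_line are all assumptions made for the example.

```python
import random

rng = random.Random(42)

# Simulated data from a hypothetical f(x) = 1 + 2x with mean-zero noise.
n = 200
xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Prediction: estimate f on a training set, evaluate Y_hat on a held-out test set.
idx = list(range(n))
rng.shuffle(idx)
cut = int(0.75 * n)
train, test = idx[:cut], idx[cut:]
b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
test_mse = sum((ys[i] - (b0 + b1 * xs[i])) ** 2 for i in test) / len(test)

# Inference: use every observation to estimate the parameters.
b0_full, b1_full = fit_line(xs, ys)

print(f"train-only slope: {b1:.3f}, test MSE: {test_mse:.3f}")
print(f"full-data slope:  {b1_full:.3f}")
```

The test MSE is computed only on observations the fit never saw, so it measures predictive accuracy. The full-data fit uses all 200 observations, which is the natural choice when the estimated coefficients themselves are the object of interest.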
