Lecture 2
Y = f(X) + ϵ
There are two main reasons why we wish to estimate f : inference and
prediction.
In this setting, since the error term ϵ averages to zero, we can predict Y
using
Ŷ = fˆ(X)
where fˆ is our estimate for f and Ŷ is the predicted value of Y .
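As a minimal sketch of this prediction setup, assuming for illustration a linear truth f(X) = 2 + 3X with mean-zero noise, we can estimate f by least squares and then form Ŷ = fˆ(X) at a new point (the data, the linear form, and all numbers below are illustrative assumptions, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed truth for illustration: f(X) = 2 + 3X, and Y = f(X) + eps
# with mean-zero noise eps.
X = rng.uniform(0, 10, size=200)
Y = 2 + 3 * X + rng.normal(0, 1, size=200)

# Estimate f by least squares: f_hat(x) = b0 + b1 * x.
b1, b0 = np.polyfit(X, Y, deg=1)

# Predict Y at a new point via Y_hat = f_hat(x_new).
x_new = 5.0
y_hat = b0 + b1 * x_new
print(y_hat)  # the true f(5) is 17; y_hat should be close to it
```

Because ϵ averages to zero, the best we can do at x_new is predict fˆ(x_new); the noise itself cannot be predicted.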
For this type of task, the exact form of fˆ is not our concern. We care
about making Ŷ as accurate as possible.
fˆ is usually not a perfect estimate for f , and this imperfection introduces
some error.
This error is the reducible error, since we can reduce it by choosing a more
appropriate statistical learning technique.
In fact, the goal of most applied ML papers has been to choose, from the
variety of available ML tools, the one that minimizes this reducible error.
E(Y − Ŷ)² = E[f(X) + ϵ − fˆ(X)]²
          = [f(X) − fˆ(X)]² + Var(ϵ)
where [f(X) − fˆ(X)]² is the reducible error and Var(ϵ) is the irreducible
error.
Here, Var (ϵ) is the variance associated with the error term.
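This decomposition can be checked numerically at a single point x₀. The setup below is an assumed toy example (f, fˆ, and the noise variance are all made up for illustration): a deliberately imperfect fˆ contributes a reducible error of 1, while Var(ϵ) = 4 is irreducible, so E(Y − Ŷ)² ≈ 5.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy setup: true f(x) = x^2, noise sd sigma = 2 (so Var(eps) = 4),
# and a deliberately imperfect estimate f_hat(x) = x^2 + 1.
f = lambda x: x ** 2
f_hat = lambda x: x ** 2 + 1
x0, sigma = 3.0, 2.0

# Monte Carlo estimate of E(Y - Y_hat)^2 at x0.
eps = rng.normal(0, sigma, size=1_000_000)
Y = f(x0) + eps
mse = np.mean((Y - f_hat(x0)) ** 2)

reducible = (f(x0) - f_hat(x0)) ** 2   # [f(x0) - f_hat(x0)]^2 = 1
irreducible = sigma ** 2               # Var(eps) = 4
print(mse, reducible + irreducible)    # both should be about 5
```

No choice of fˆ can push the expected squared error below Var(ϵ); only the first term can be reduced.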
In this case too, we wish to estimate f but our goal is not to make
predictions for Y.
Note: Here, the exact form of f becomes very important. It can no longer
be treated as a black box.
Some predictors may have a positive relationship with Y , in the sense that
increasing the predictor is associated with increasing values of Y . Other
predictors may have the opposite relationship. Depending on the
complexity of f, the relationship between the response and a given
predictor may also depend on the values of the other predictors.
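In an inference task like this, the fitted coefficients themselves are the object of interest: their signs indicate the direction of each predictor's association with Y. A minimal sketch, assuming for illustration a linear f with one positively and one negatively associated predictor (all data and coefficient values below are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed linear truth for illustration: Y rises with X1 and falls with X2.
n = 500
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
Y = 1.0 + 2.0 * X1 - 1.5 * X2 + rng.normal(0, 1, size=n)

# Fit Y = b0 + b1*X1 + b2*X2 by least squares. For inference we inspect
# the estimated coefficients rather than the predictions Y_hat.
A = np.column_stack([np.ones(n), X1, X2])
b0, b1, b2 = np.linalg.lstsq(A, Y, rcond=None)[0]

# b1 > 0: X1 is positively associated with Y; b2 < 0: negative association.
print(b1 > 0, b2 < 0)
```

With a more complex f (interactions, nonlinearities), the effect of one predictor can depend on the values of the others, which is exactly why the form of f matters here.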
[Figure: the true function f , labeled "Truth"]
The values of the parameters are estimated from the data, and the model is
then used for inference and/or prediction. Thus the black box is filled in.
Note: For prediction, we usually divide the data into a training set and a
test set. We estimate f from the training set and use the estimated fˆ to
compute Ŷ on the test set.
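The train/test workflow above can be sketched as follows, on assumed toy data (the 70/30 split and the linear truth Y = 4 + 0.5X + ϵ are illustrative choices, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed toy data: Y = 4 + 0.5*X + eps, with noise variance 1.
X = rng.uniform(0, 10, size=100)
Y = 4 + 0.5 * X + rng.normal(0, 1, size=100)

# Randomly split the 100 observations into training and test sets.
idx = rng.permutation(100)
train, test = idx[:70], idx[70:]

# Estimate f on the training set only...
b1, b0 = np.polyfit(X[train], Y[train], deg=1)

# ...then evaluate Y_hat on the held-out test set.
y_hat = b0 + b1 * X[test]
test_mse = np.mean((Y[test] - y_hat) ** 2)
print(test_mse)  # near Var(eps) = 1 when f_hat is accurate
```

Evaluating on held-out data gives an honest estimate of how fˆ will perform on observations it was not fit to; the test MSE cannot be expected to fall below the irreducible error Var(ϵ).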