Professional Documents
Culture Documents
Introduction To Statistical Learning
Introduction To Statistical Learning
Introduction To Statistical Learning
(ISLR 2.1)
Yingbo Li
Clemson University
MATH 8050
1 Why Estimate f
2 How to Estimate f
5 Regression vs Classification
25
25
20
20
20
Sales
Sales
Sales
15
15
15
10
10
10
5
5
0 50 100 200 300 0 10 20 30 40 50 0 20 40 60 80 100
TV Radio Newspaper
f : unknown function
: random error with mean zero, i.e., E() = 0.
In the Advertising example:
f (X1 , X2 , X3 ) = E(Y | X1 , X2 , X3 )
Statistical learning, and this course, are all about how to estimate f .
Why?
Prediction.
Inference.
Prediction
If we can get a good estimate for f, we can make accurate predictions for
the response Y , based on a new value of X.
For a new market, given three media budgets, what’s the sales?
Just want to predict sales, not to know which media is more
important.
Suppose our estimate for f is fˆ, the output Y for input X is predicted as
Ŷ = fˆ(X)
Inference
We are often interested in understanding the relationship between that Y
and each of X1 , . . . , Xp . For example
1 Which predictors actually affect the response?
2 Is the relationship positive or negative?
3 Is the relationship a simple linear one or is it more complicated?
How to estimate f
Use the training data and a statistical method to estimate f .
We have observed a set of training data
Incom
e
y
rit
Ye
io
a
n
rs
Se
of
Ed
uc
ati
on
Parametric methods
Reduces the problem of estimating f down to one of estimating a (finite)
set of parameters. A two-step model based approach:
1 Come up with a model (some functional form assumption about f ).
The most common example is a linear model.
f (X) = β0 + β1 X1 + β2 X2 + · · · + βp Xp
Incom
e
ity
or
Ye
ni
ars
Se
of
Ed
uc
ati
on
Non-parametric methods
They do not make explicit assumptions about the functional form of f .
Advantages: accurately fit a wider range of possible shapes of f .
Disadvantages: require large n to obtain an accurate estimate.
Incom
Incom
e
e
ity
ity
or
Ye
or
Ye
ni
a
ni
a rs
Se
rs
Se
of of
E Edu
du ca
ca tio
tio n
n
Some trade-offs
Prediction accuracy vs model interpretability
Linear models are easy to interpret; thin-plate splines are not.
Good fit vs over-fit
A model that overfits the training data may not predict well.
High
Subset Selection
Lasso
Least Squares
Interpretability
Bagging, Boosting
Low High
Flexibility
Yingbo Li (Clemson) Intro to Statistical Learning MATH 8050 14 / 16
Supervised vs Unsupervised Learning
8
10
8
6
X2
X2
6
4
4
2
2
0 2 4 6 8 10 12 0 2 4 6
X1 X1
Regression vs classification
Regression: Y is continuous (quantitative).
I Predicting the value of the Dow in 6 months.
I Predicting the value of a given house based on various inputs.
Classification: Y is categorical (qualitative).
I Will the Dow be up (U) or down (D) in 6 months?
I Is this email a SPAM or not?