INSY 662 – Fall 2023

Data Mining and Visualization

Week 3-2: Regression Model Part 2


September 14, 2023
Elizabeth Han
Today’s class

§ Revisit linear regression
§ Regularization technique
– Ridge regression
– LASSO regression
§ Coding session
Linear Regression Revisited

[Figure: scatter plots showing the line estimated from training data (red), the “true” relationship (dark grey), and test data (blue)]

§ The core idea is to find a linear relationship between the predictors and the target variable
§ It works well from both the statistical and the data mining perspectives
§ But in data mining, there is one important issue
– The model is very sensitive to the training dataset
§ The relationship estimated from the training data (red) would be different from the “true” relationship (dark grey)
§ When the model is used on the test dataset (blue), the performance would be subpar
The Issue

§ This issue occurs because the objective of the linear regression model is to minimize the sum of squared errors on the training data
§ This leads to low bias & high variance, as the sketch below illustrates
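A minimal sketch of this behavior (my own synthetic example, not from the slides): ordinary least squares fit on a small training sample tracks that sample closely, so the training error understates the test error. The sample sizes and data-generating process below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# "True" relationship: y = X @ beta_true + noise
p = 10
beta_true = rng.normal(size=p)
X_train = rng.normal(size=(20, p))            # small training sample
y_train = X_train @ beta_true + rng.normal(0, 1, size=20)
X_test = rng.normal(size=(500, p))            # large held-out sample
y_test = X_test @ beta_true + rng.normal(0, 1, size=500)

ols = LinearRegression().fit(X_train, y_train)
print("train MSE:", mean_squared_error(y_train, ols.predict(X_train)))
print("test MSE: ", mean_squared_error(y_test, ols.predict(X_test)))
# With 20 observations and 10 predictors, the fitted model chases noise
# in the training data, so the test MSE comes out noticeably larger.
```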
Regularization

§ The idea is to add a small amount of bias to the model (i.e., deliberately making the model perform worse on the training data) in exchange for lower variance
§ There are several models that utilize the regularization technique
– Ridge regression
– LASSO regression
Ridge Regression

§ Adds bias by changing the objective of the model from minimizing the sum of squared errors (SSE) to minimizing:

$$\text{SSE} + \lambda \sum_{j=1}^{p} \beta_j^2$$

§ The second term, summed over the p predictors, is the additional penalty imposed by ridge regression (a.k.a. the shrinkage penalty)
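As a minimal sketch (assuming NumPy; this is my own illustration, not course code, and it ignores the usual practice of leaving the intercept unpenalized), the ridge objective can be written out directly, which makes the role of the penalty explicit:

```python
import numpy as np

def ridge_objective(X, y, beta, lam):
    """Ridge objective: SSE plus the shrinkage penalty lam * sum(beta_j ** 2)."""
    residuals = y - X @ beta
    sse = np.sum(residuals ** 2)          # sum of squared errors
    penalty = lam * np.sum(beta ** 2)     # shrinkage penalty
    return sse + penalty

# Toy usage: the same coefficients cost more as lambda grows
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
y = np.array([3.0, 3.0, 6.0])
print(ridge_objective(X, y, beta=np.array([1.0, 1.0]), lam=0.0))  # 0.0 (pure SSE)
print(ridge_objective(X, y, beta=np.array([1.0, 1.0]), lam=1.0))  # 2.0 (SSE + penalty)
```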
Ridge Regression

§ Intuitively, β represents the sensitivity of the target variable in response to a change in the value of the predictor(s)
§ The tuning parameter λ (always ≥ 0) controls how sensitive you want the target variable to be with respect to a change in the value of the predictor(s)

$$\text{SSE} + \lambda \sum_{j=1}^{p} \beta_j^2$$
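To make the effect of λ concrete, here is a short sketch (my own synthetic example) fitting scikit-learn’s Ridge at increasing values of alpha, which is scikit-learn’s name for λ; the coefficients shrink toward zero as the penalty grows:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 1, size=100)

for alpha in [0.01, 1, 10, 100]:
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:>6}: {np.round(coefs, 3)}")
# As alpha (lambda) grows, all three coefficients move toward zero,
# i.e. the model becomes less sensitive to each predictor.
```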
Ridge Regression

[Figure: the linear regression line vs. the ridge regression line]

§ Through trial and error, we find the value of λ that minimizes the SSE on the test set
§ In practice, we use cross-validation to find the optimal value of λ (in Python, RidgeCV()), as sketched below
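A minimal sketch of the RidgeCV() call the slide mentions; the grid of candidate λ values and the synthetic data are my own choices, not prescribed by the course:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 1, size=100)

# RidgeCV tries every alpha in the grid and keeps the one with the
# best cross-validated score
ridge_cv = RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5).fit(X, y)
print("chosen lambda (alpha):", ridge_cv.alpha_)
```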
LASSO Regression

§ Least Absolute Shrinkage and Selection Operator
§ Very similar to ridge regression, with one important difference
– The objective function:

$$\text{SSE} + \lambda \sum_{j=1}^{p} |\beta_j|$$

– Coefficients can be set exactly to zero (see the sketch below)
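A short sketch (my own synthetic example) contrasting the two penalties: only the first two predictors are truly relevant, so LASSO zeroes out the useless ones exactly, while ridge merely shrinks them:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 1, size=200)  # predictors 2-4 are pure noise

print("ridge:", np.round(Ridge(alpha=10).fit(X, y).coef_, 3))
print("lasso:", np.round(Lasso(alpha=0.5).fit(X, y).coef_, 3))
# Expected pattern: ridge leaves small but nonzero coefficients on the
# three irrelevant predictors; lasso sets them exactly to 0.0.
```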
Ridge and LASSO

§ The role of λ
– Penalizes the predictor(s) with respect to their influence on the target variable
– The imposed penalty differs across predictors
§ Need to standardize the predictors before applying ridge or LASSO, since the penalty depends on the scale of the coefficients (see the sketch below)
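A minimal sketch of that standardization step using a scikit-learn Pipeline (the Pipeline is my tooling choice, not prescribed by the slides), so the scaler is fit on the training data only:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# StandardScaler puts every predictor on the same scale (mean 0, sd 1)
# before the penalized fit, so lambda penalizes them comparably.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
# Usage: model.fit(X_train, y_train); model.score(X_test, y_test)
```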
Ridge Regression vs. LASSO

§ Objective function
– Ridge: $\text{SSE} + \lambda \sum_{j=1}^{p} \beta_j^2$
– LASSO: $\text{SSE} + \lambda \sum_{j=1}^{p} |\beta_j|$
§ Penalty on the slope
– Ridge: the slope can be asymptotically zero
– LASSO: the slope can be decreased to zero
§ When to use
– Ridge: when most predictors are useful
– LASSO: when there are a lot of useless predictors
Coding Session

§ Use the cereals.csv dataset
§ The dataset contains information about cereal products
§ We are going to predict the product rating based on the product’s nutritional information
§ We will apply linear regression, cross validation, ridge, and LASSO
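To preview the session, here is a hedged end-to-end sketch. The file name cereals.csv comes from the slide, but the column layout assumed below (a numeric "rating" target plus numeric nutritional predictors) is an assumption about the dataset and may need adjusting:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, RidgeCV, LassoCV

df = pd.read_csv("cereals.csv")
y = df["rating"]                                          # assumed target column
X = df.drop(columns=["rating"]).select_dtypes("number")   # assumed numeric predictors

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

models = {
    "linear": LinearRegression(),
    "ridge": make_pipeline(StandardScaler(),
                           RidgeCV(alphas=np.logspace(-3, 3, 13), cv=5)),
    "lasso": make_pipeline(StandardScaler(), LassoCV(cv=5)),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name:>6} test R^2: {model.score(X_test, y_test):.3f}")
```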
