
9/5/23

Polynomial Regression

Warm-up: Linear Regression


Linear Regression (Task)

Input: vectors 𝐱₁, ⋯, 𝐱ₙ ∈ ℝ^𝑑 and labels 𝑦₁, ⋯, 𝑦ₙ ∈ ℝ.
Output: a vector 𝐰 ∈ ℝ^𝑑 and a scalar 𝑏 ∈ ℝ such that 𝐱ᵢᵀ𝐰 + 𝑏 ≈ 𝑦ᵢ.
The task assumes 𝑦ᵢ is a linear function of 𝐱ᵢ.

Least Squares Regression (Method)

Input: vectors 𝐱₁, ⋯, 𝐱ₙ ∈ ℝ^𝑑 and labels 𝑦₁, ⋯, 𝑦ₙ ∈ ℝ.
1. Add one dimension to each 𝐱ⱼ ∈ ℝ^𝑑: 𝐱̄ⱼ = [𝐱ⱼ; 1] ∈ ℝ^(𝑑+1).
2. Solve the least squares regression: min_{𝐰∈ℝ^(𝑑+1)} ‖𝐗𝐰 − 𝐲‖².

Tasks → Methods: the Linear Regression task is solved by the Least Squares Regression method.
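Below is a minimal numpy sketch of the two steps above (the function name, toy data, and use of numpy.linalg.lstsq are my own choices, not from the slides; the slides later solve the same problem with the pseudo-inverse):

import numpy

def least_squares_fit(X, y):
    # Step 1: append a column of ones so the bias b is absorbed into w.
    Xbar = numpy.concatenate((X, numpy.ones((X.shape[0], 1))), axis=1)
    # Step 2: solve min_w ||Xbar w - y||^2 (rows of Xbar are the augmented vectors).
    w, _, _, _ = numpy.linalg.lstsq(Xbar, y, rcond=None)
    return w                       # w[:-1] are the weights, w[-1] is the bias b

# Toy usage: labels generated exactly as y = 2*x1 - x2 + 1.
X = numpy.array([[0., 1.], [2., 3.], [4., 5.], [1., 0.]])
y = 2 * X[:, 0] - X[:, 1] + 1
print(least_squares_fit(X, y))     # approximately [ 2. -1.  1.]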


Least Squares Regression (Method)

Input: vectors 𝐱₁, ⋯, 𝐱ₙ ∈ ℝ^𝑑 and labels 𝑦₁, ⋯, 𝑦ₙ ∈ ℝ.
1. Add one dimension to each 𝐱ⱼ ∈ ℝ^𝑑: 𝐱̄ⱼ = [𝐱ⱼ; 1] ∈ ℝ^(𝑑+1).
2. Solve the least squares regression: min_{𝐰∈ℝ^(𝑑+1)} ‖𝐗𝐰 − 𝐲‖².

Tasks → Methods → Algorithms: the Linear Regression task is solved by the Least Squares Regression method, which can be computed by the analytical solution, gradient descent, or conjugate gradient.

Polynomial Regression

The Regression Task

Input: vectors 𝐱₁, ⋯, 𝐱ₙ ∈ ℝ^𝑑 and labels 𝑦₁, ⋯, 𝑦ₙ ∈ ℝ.
Output: a function 𝑓: ℝ^𝑑 ↦ ℝ such that 𝑓(𝐱) ≈ 𝑦.

Question: 𝑓 is unknown! So how to learn 𝑓?
Answer: polynomial approximation; model 𝑓 as a polynomial function.

Taylor expansion: 𝑓(𝑥) = 𝑓(𝑎) + 𝑓′(𝑎)(𝑥 − 𝑎) + (𝑓″(𝑎)/2!)(𝑥 − 𝑎)² + ⋯
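To make the approximation idea concrete, here is a small numpy sketch (an illustration of my own, not taken from the slides): fitting polynomials of increasing degree to samples of a smooth function drives the approximation error down.

import numpy

# Pretend exp(x) is the unknown function f; we only observe its values on a grid.
x = numpy.linspace(-1.0, 1.0, 200)
y = numpy.exp(x)

for p in (1, 3, 5):
    coeffs = numpy.polyfit(x, y, deg=p)          # least-squares polynomial fit of degree p
    approx = numpy.polyval(coeffs, x)
    print(p, numpy.max(numpy.abs(approx - y)))   # the maximum error shrinks as p grows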



Polynomial Regression: 1D Example

Input: scalars 𝑥₁, ⋯, 𝑥ₙ ∈ ℝ and labels 𝑦₁, ⋯, 𝑦ₙ ∈ ℝ.
Output: a function 𝑓: ℝ ↦ ℝ such that 𝑓(𝑥) ≈ 𝑦.

One-dimensional example: 𝑓(𝑥) = 𝑤₀ + 𝑤₁𝑥 + 𝑤₂𝑥² + ⋯ + 𝑤ₚ𝑥^𝑝.

Polynomial regression:
1. Define a feature map 𝛟(𝑥) = [1, 𝑥, 𝑥², 𝑥³, ⋯, 𝑥^𝑝].
2. For 𝑗 = 1 to 𝑛, do the mapping 𝑥ⱼ ↦ 𝛟(𝑥ⱼ).
   • Let 𝚽 = [𝛟(𝑥₁); ⋯; 𝛟(𝑥ₙ)]ᵀ ∈ ℝ^(𝑛×(𝑝+1)).
3. Solve the least squares regression: min_{𝐰∈ℝ^(𝑝+1)} ‖𝚽𝐰 − 𝐲‖².
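A minimal numpy sketch of these three steps on toy 1D data (the data and names are mine; the slides later use sklearn for the feature map and the analytical solution for the least squares step):

import numpy

def poly_features_1d(x, p):
    # Steps 1 & 2: phi(x_j) = [1, x_j, x_j^2, ..., x_j^p], stacked into Phi row by row.
    return numpy.vander(x, N=p + 1, increasing=True)

# Toy data generated from an exact cubic: y = 2 - x + 0.5 x^3.
x = numpy.linspace(-2.0, 2.0, 30)
y = 2.0 - x + 0.5 * x**3

Phi = poly_features_1d(x, p=3)
w, _, _, _ = numpy.linalg.lstsq(Phi, y, rcond=None)   # Step 3: least squares
print(w)   # approximately [ 2.  -1.   0.   0.5]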


Polynomial Regression: 2D Example

Input: vectors 𝐱₁, ⋯, 𝐱ₙ ∈ ℝ² and labels 𝑦₁, ⋯, 𝑦ₙ ∈ ℝ.
Output: a function 𝑓: ℝ² ↦ ℝ such that 𝑓(𝐱ᵢ) ≈ 𝑦ᵢ.

Two-dimensional example: how to do the feature mapping?
Polynomial features (terms grouped by degree 0, 1, 2, 3):
𝛟(𝐱) = [1, 𝑥₁, 𝑥₂, 𝑥₁², 𝑥₂², 𝑥₁𝑥₂, 𝑥₁³, 𝑥₂³, 𝑥₁𝑥₂², 𝑥₁²𝑥₂].

Polynomial Regression

In [1]:
import numpy
X = numpy.arange(6).reshape(3, 2)
print('X = ')
print(X)

X =
[[0 1]
 [2 3]
 [4 5]]

In [2]:
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=3)
Phi = poly.fit_transform(X)
print('Phi = ')
print(Phi)

Phi =
[[ 1. 0. 1. 0. 0. 1. 0. 0. 0. 1.]
 [ 1. 2. 3. 4. 6. 9. 8. 12. 18. 27.]
 [ 1. 4. 5. 16. 20. 25. 64. 80. 100. 125.]]
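A small follow-up sketch (mine, not from the slides) to see which monomial each column of Phi corresponds to; the fitted PolynomialFeatures object exposes the exponents per output column through its powers_ attribute:

print(poly.powers_)
# Each row gives the exponents of (x1, x2) for one column of Phi:
# [0 0] -> 1, [1 0] -> x1, [0 1] -> x2, [2 0] -> x1^2, [1 1] -> x1*x2, [0 2] -> x2^2, ...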


Polynomial Regression

• 𝐱: 𝑑-dimensional
• 𝛟(𝐱): degree-𝑝 polynomial
• The dimension of 𝛟(𝐱) is 𝑂(𝑑^𝑝).
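To quantify the 𝑂(𝑑^𝑝) growth, here is a small sketch (my own, using the same sklearn class as above). With the bias term, the exact number of monomials of degree at most p in d variables is C(d+p, p): 10 when d = 2 and p = 3 as in the notebook output above, and already 560 when d = 13 and p = 3 (the setting of the housing data used below).

import numpy
from math import comb
from sklearn.preprocessing import PolynomialFeatures

d, p = 13, 3
poly = PolynomialFeatures(degree=p).fit(numpy.zeros((1, d)))
print(poly.n_output_features_)    # 560 polynomial features
print(comb(d + p, p))             # C(d+p, p) = 560, which grows like O(d^p) for fixed p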
Polynomial Regression

Input: vectors 𝐱₁, ⋯, 𝐱ₙ ∈ ℝ^𝑑 and labels 𝑦₁, ⋯, 𝑦ₙ ∈ ℝ.
Output: a function 𝑓: ℝ^𝑑 ↦ ℝ such that 𝑓(𝐱ᵢ) ≈ 𝑦ᵢ.

Example: least squares regression on the Boston housing data.

In [1]:
from keras.datasets import boston_housing
(x_train, y_train), (x_test, y_test) = boston_housing.load_data()
print('shape of x_train: ' + str(x_train.shape))
print('shape of x_test: ' + str(x_test.shape))
print('shape of y_train: ' + str(y_train.shape))
print('shape of y_test: ' + str(y_test.shape))

Using TensorFlow backend.
shape of x_train: (404, 13)
shape of x_test: (102, 13)
shape of y_train: (404,)
shape of y_test: (102,)

In [2]:
import numpy
n, d = x_train.shape
xbar_train = numpy.concatenate((x_train, numpy.ones((n, 1))), axis=1)
print('shape of x_train: ' + str(x_train.shape))
print('shape of xbar_train: ' + str(xbar_train.shape))

shape of x_train: (404, 13)
shape of xbar_train: (404, 14)

In [3]:
# the analytical solution
xx = numpy.dot(xbar_train.T, xbar_train)
xx_inv = numpy.linalg.pinv(xx)
xy = numpy.dot(xbar_train.T, y_train)
w = numpy.dot(xx_inv, xy)

In [4]:
# mean squared error (training)
y_lsr = numpy.dot(xbar_train, w)
diff = y_lsr - y_train
mse = numpy.mean(diff * diff)
print('Train MSE: ' + str(mse))

Train MSE: 22.00480083834814

Training, Test, and Overfitting
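For completeness, here is a sketch of the corresponding test-set evaluation (this cell is not shown in the slides, so no output value is quoted): the same bias-augmentation must be applied to x_test before using the fitted w.

# mean squared error (test) -- evaluation sketch, assuming the cells above were run
n_test = x_test.shape[0]
xbar_test = numpy.concatenate((x_test, numpy.ones((n_test, 1))), axis=1)
y_pred = numpy.dot(xbar_test, w)
print('Test MSE: ' + str(numpy.mean((y_pred - y_test) ** 2)))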

Polynomial Regression: Training

Input: vectors 𝐱₁, ⋯, 𝐱ₙ ∈ ℝ^𝑑 and labels 𝑦₁, ⋯, 𝑦ₙ ∈ ℝ.
Feature map: 𝛟(𝐱) = ⊗^𝑝 𝐱. Its dimension is 𝑂(𝑑^𝑝).
Least squares: min_𝐰 ‖𝚽𝐰 − 𝐲‖².

Question: what will happen as 𝑝 grows?
1. For sufficiently large 𝑝, the dimension of the feature 𝛟(𝐱) exceeds 𝑛.
2. Then you can find 𝐰 such that 𝚽𝐰 = 𝐲. (Zero training error!)
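A toy demonstration of this point (random data of my own, not from the slides): once the number of polynomial features exceeds 𝑛, least squares can fit arbitrary labels exactly.

import numpy
from sklearn.preprocessing import PolynomialFeatures

rng = numpy.random.default_rng(0)
X = rng.normal(size=(6, 2))                            # n = 6 samples, d = 2
y = rng.normal(size=6)                                 # arbitrary labels
Phi = PolynomialFeatures(degree=3).fit_transform(X)    # 10 features > 6 samples
w, _, _, _ = numpy.linalg.lstsq(Phi, y, rcond=None)
print(numpy.mean((Phi @ w - y) ** 2))                  # essentially 0: zero training error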


Training and Testing

Train:
• Input: vectors 𝐱₁, ⋯, 𝐱ₙ ∈ ℝ^𝑑 and labels 𝑦₁, ⋯, 𝑦ₙ ∈ ℝ.
• Output: a function 𝑓: ℝ^𝑑 ↦ ℝ such that 𝑓(𝐱ᵢ) ≈ 𝑦ᵢ.

Test:
• Input: a never-seen-before feature vector 𝐱′ ∈ ℝ^𝑑.
• Output: predict its label by 𝑓(𝐱′).

[Figure: fitted curves illustrating underfitting and overfitting.]



Training and Testing

[Figure: predictions at a new point 𝐱′ for three fits: a linear model (BAD, underfits), a degree-4 polynomial (GOOD), and a degree-15 polynomial (BAD, overfits).]


Hyper-Parameter Tuning

Question: for the polynomial regression model, how to determine the degree 𝑝?
Answer: pick the degree 𝑝 that leads to the smallest test error.

Train on the Training Set, evaluate on the Test Set:
• Train a degree-1 polynomial regression → Test MSE = 23.2
• Train a degree-2 polynomial regression → Test MSE = 19.0
• Train a degree-3 polynomial regression → Test MSE = 16.7
• Train a degree-4 polynomial regression → Test MSE = 12.2
• Train a degree-5 polynomial regression → Test MSE = 14.8
• Train a degree-6 polynomial regression → Test MSE = 25.1
• Train a degree-7 polynomial regression → Test MSE = 39.4
• Train a degree-8 polynomial regression → Test MSE = 53.0



Hyper-Parameter Tuning

Choosing 𝑝 by the smallest test MSE (previous slide) is not allowed:
• Wrong! The test labels are unavailable!
• Even if you have the test labels, never do this!


Select Models Using Test Labels
[Figure omitted.]

Cross-Validation (Naïve Approach) for Hyper-Parameter Tuning



Cross-Validation (Naïve Approach)

[Diagram: 𝑛 training samples (features and labels), 𝑚 test samples (features only, labels unknown), and 𝑛_val samples held out as a validation set.]


Cross-Validation (Naïve Approach)

Train on the Training Set and evaluate on the held-out Validation Set (the Test Set is never touched); the validation MSE closely tracks the test MSE:

degree   Valid. MSE   (Test MSE)
  1        23.1         23.2
  2        19.2         19.0
  3        16.3         16.7
  4        12.5         12.2
  5        14.4         14.8
  6        25.0         25.1
  7        39.1         39.4
  8        53.5         53.0
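A minimal sklearn sketch of this naïve validation approach on the housing data loaded earlier (the split ratio, random seed, and degree grid are my choices; raw polynomial features without scaling are used only for illustration, so the printed numbers will not match the table above):

from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# hold out part of the training data as a validation set
x_tr, x_val, y_tr, y_val = train_test_split(x_train, y_train, test_size=0.2, random_state=0)

for p in (1, 2, 3):
    model = make_pipeline(PolynomialFeatures(degree=p), LinearRegression())
    model.fit(x_tr, y_tr)
    print('degree', p, 'validation MSE:', mean_squared_error(y_val, model.predict(x_val)))
# pick the degree with the smallest validation MSE; only then evaluate once on the test set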



𝑘-Fold Cross-Validation

1. Propose a grid of hyper-parameters.
   • E.g., 𝑝 ∈ {1, 2, 3, 4, 5, 6}.
2. Randomly partition the training samples into 𝑘 parts.
   • 𝑘 − 1 parts are used for training.
   • The remaining part is held out for evaluation.
3. Compute the average of the held-out errors over the 𝑘 repeats.
   • The average is called the validation error.
4. Choose the hyper-parameter 𝑝 that leads to the smallest validation error.
(A minimal sklearn sketch of this procedure is given below.)

[Figure: example of 5-fold cross-validation.]
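Here is the sklearn sketch referenced above (again reusing x_train and y_train from the housing example; KFold with shuffling matches the random partition in step 2, while the degree grid and unscaled polynomial features are illustrative choices of mine):

from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

cv = KFold(n_splits=10, shuffle=True, random_state=0)   # random partition into k = 10 parts
for p in (1, 2, 3):
    model = make_pipeline(PolynomialFeatures(degree=p), LinearRegression())
    scores = cross_val_score(model, x_train, y_train, cv=cv,
                             scoring='neg_mean_squared_error')
    print('degree', p, 'validation error:', -scores.mean())
# choose the degree p with the smallest validation error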


Example: 10-Fold Cross-Validation

hyper-parameter    validation error
p = 1              23.19
p = 2              21.00
p = 3              18.54
p = 4              24.36
p = 5              27.96
p = 6              33.10

[Plot: validation error versus 𝑝; the minimum is at 𝑝 = 3.]



The Available Data (Real-World Machine Learning Competition)

               Training    Test Data (Public)    Test Data (Private)
Labels:        𝐲           unknown               unknown
Features:      𝐗           𝐗_public              𝐗_private

The public and private test data are mixed; participants cannot distinguish them.


Train A Model

Fit a model on the training data (features 𝐗 and labels 𝐲).

Prediction

Apply the trained model to the test features 𝐗_public and 𝐗_private to obtain predictions 𝐲_public and 𝐲_private.



Submission to Leaderboard

Submit the predictions 𝐲_public and 𝐲_private. The leaderboard shows the score on the public part (e.g., Score = 0.9527); the score on the private part is kept secret.

Question: Why two leaderboards?
Answer: The public score can be abused for hyper-parameter tuning (cheating); that is why the private score is kept secret.


Summary

• Polynomial regression for non-linear problems.
• Polynomial regression has a hyper-parameter 𝑝.
• Underfitting (very small 𝑝) and overfitting (very big 𝑝).
• Tune the hyper-parameters using cross-validation.
• Make your model parameters and hyper-parameters independent of the test set!!!
