
9/5/23

Polynomial Regression

Warm-up: Linear Regression


Linear Regression (Task)

Input: vectors 𝐱₁, ⋯, 𝐱ₙ ∈ ℝ^𝑑 and labels 𝑦₁, ⋯, 𝑦ₙ ∈ ℝ.
Output: a vector 𝐰 ∈ ℝ^𝑑 and a scalar 𝑏 ∈ ℝ such that 𝐱ᵢᵀ𝐰 + 𝑏 ≈ 𝑦ᵢ.
The task assumes 𝑦ᵢ is a linear function of 𝐱ᵢ.

Least Squares Regression (Method)

Input: vectors 𝐱₁, ⋯, 𝐱ₙ ∈ ℝ^𝑑 and labels 𝑦₁, ⋯, 𝑦ₙ ∈ ℝ.
1. Add one dimension to each 𝐱ⱼ ∈ ℝ^𝑑: 𝐱̄ⱼ = [𝐱ⱼ; 1] ∈ ℝ^(𝑑+1).
2. Solve the least squares regression: min_{𝐰∈ℝ^(𝑑+1)} ‖𝐗𝐰 − 𝐲‖².

Tasks → Methods: the Linear Regression task is solved by the Least Squares Regression method.
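Below is a minimal numpy sketch of the two steps above (the function name, toy data, and use of numpy.linalg.lstsq are my own choices, not from the slides; the slides later solve the same problem with the pseudo-inverse):

import numpy

def least_squares_fit(X, y):
    # Step 1: append a column of ones so the bias b is absorbed into w.
    Xbar = numpy.concatenate((X, numpy.ones((X.shape[0], 1))), axis=1)
    # Step 2: solve min_w ||Xbar w - y||^2 (rows of Xbar are the augmented vectors).
    w, _, _, _ = numpy.linalg.lstsq(Xbar, y, rcond=None)
    return w                       # w[:-1] are the weights, w[-1] is the bias b

# Toy usage: labels generated exactly as y = 2*x1 - x2 + 1.
X = numpy.array([[0., 1.], [2., 3.], [4., 5.], [1., 0.]])
y = 2 * X[:, 0] - X[:, 1] + 1
print(least_squares_fit(X, y))     # approximately [ 2. -1.  1.]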


Least Squares Regression (Method)

Input: vectors 𝐱₁, ⋯, 𝐱ₙ ∈ ℝ^𝑑 and labels 𝑦₁, ⋯, 𝑦ₙ ∈ ℝ.
1. Add one dimension to each 𝐱ⱼ ∈ ℝ^𝑑: 𝐱̄ⱼ = [𝐱ⱼ; 1] ∈ ℝ^(𝑑+1).
2. Solve the least squares regression: min_{𝐰∈ℝ^(𝑑+1)} ‖𝐗𝐰 − 𝐲‖².

Tasks → Methods → Algorithms: the Linear Regression task is solved by the Least Squares Regression method, which can be computed by the analytical solution, gradient descent, or conjugate gradient.

Polynomial Regression

The Regression Task

Input: vectors 𝐱₁, ⋯, 𝐱ₙ ∈ ℝ^𝑑 and labels 𝑦₁, ⋯, 𝑦ₙ ∈ ℝ.
Output: a function 𝑓: ℝ^𝑑 ↦ ℝ such that 𝑓(𝐱) ≈ 𝑦.

Question: 𝑓 is unknown! So how to learn 𝑓?
Answer: polynomial approximation; model 𝑓 as a polynomial function.

Taylor expansion: 𝑓(𝑥) = 𝑓(𝑎) + 𝑓′(𝑎)(𝑥 − 𝑎) + (𝑓″(𝑎)/2!)(𝑥 − 𝑎)² + ⋯
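To make the approximation idea concrete, here is a small numpy sketch (an illustration of my own, not taken from the slides): fitting polynomials of increasing degree to samples of a smooth function drives the approximation error down.

import numpy

# Pretend exp(x) is the unknown function f; we only observe its values on a grid.
x = numpy.linspace(-1.0, 1.0, 200)
y = numpy.exp(x)

for p in (1, 3, 5):
    coeffs = numpy.polyfit(x, y, deg=p)          # least-squares polynomial fit of degree p
    approx = numpy.polyval(coeffs, x)
    print(p, numpy.max(numpy.abs(approx - y)))   # the maximum error shrinks as p grows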



Polynomial Regression: 1D Example

Input: scalars 𝑥₁, ⋯, 𝑥ₙ ∈ ℝ and labels 𝑦₁, ⋯, 𝑦ₙ ∈ ℝ.
Output: a function 𝑓: ℝ ↦ ℝ such that 𝑓(𝑥) ≈ 𝑦.

One-dimensional example: 𝑓(𝑥) = 𝑤₀ + 𝑤₁𝑥 + 𝑤₂𝑥² + ⋯ + 𝑤ₚ𝑥^𝑝.

Polynomial regression:
1. Define a feature map 𝛟(𝑥) = [1, 𝑥, 𝑥², 𝑥³, ⋯, 𝑥^𝑝].
2. For 𝑗 = 1 to 𝑛, do the mapping 𝑥ⱼ ↦ 𝛟(𝑥ⱼ).
   • Let 𝚽 = [𝛟(𝑥₁); ⋯; 𝛟(𝑥ₙ)]ᵀ ∈ ℝ^(𝑛×(𝑝+1)).
3. Solve the least squares regression: min_{𝐰∈ℝ^(𝑝+1)} ‖𝚽𝐰 − 𝐲‖².
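A minimal numpy sketch of these three steps on toy 1D data (the data and names are mine; the slides later use sklearn for the feature map and the analytical solution for the least squares step):

import numpy

def poly_features_1d(x, p):
    # Steps 1 & 2: phi(x_j) = [1, x_j, x_j^2, ..., x_j^p], stacked into Phi row by row.
    return numpy.vander(x, N=p + 1, increasing=True)

# Toy data generated from an exact cubic: y = 2 - x + 0.5 x^3.
x = numpy.linspace(-2.0, 2.0, 30)
y = 2.0 - x + 0.5 * x**3

Phi = poly_features_1d(x, p=3)
w, _, _, _ = numpy.linalg.lstsq(Phi, y, rcond=None)   # Step 3: least squares
print(w)   # approximately [ 2.  -1.   0.   0.5]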


Polynomial Regression: 2D Example

Input: vectors 𝐱₁, ⋯, 𝐱ₙ ∈ ℝ² and labels 𝑦₁, ⋯, 𝑦ₙ ∈ ℝ.
Output: a function 𝑓: ℝ² ↦ ℝ such that 𝑓(𝐱ᵢ) ≈ 𝑦ᵢ.

Two-dimensional example: how to do the feature mapping?
Polynomial features (terms grouped by degree 0, 1, 2, 3):
𝛟(𝐱) = [1, 𝑥₁, 𝑥₂, 𝑥₁², 𝑥₂², 𝑥₁𝑥₂, 𝑥₁³, 𝑥₂³, 𝑥₁𝑥₂², 𝑥₁²𝑥₂].

Polynomial Regression

In [1]:
import numpy
X = numpy.arange(6).reshape(3, 2)
print('X = ')
print(X)

X =
[[0 1]
 [2 3]
 [4 5]]

In [2]:
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=3)
Phi = poly.fit_transform(X)
print('Phi = ')
print(Phi)

Phi =
[[ 1. 0. 1. 0. 0. 1. 0. 0. 0. 1.]
 [ 1. 2. 3. 4. 6. 9. 8. 12. 18. 27.]
 [ 1. 4. 5. 16. 20. 25. 64. 80. 100. 125.]]
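A small follow-up sketch (mine, not from the slides) to see which monomial each column of Phi corresponds to; the fitted PolynomialFeatures object exposes the exponents per output column through its powers_ attribute:

print(poly.powers_)
# Each row gives the exponents of (x1, x2) for one column of Phi:
# [0 0] -> 1, [1 0] -> x1, [0 1] -> x2, [2 0] -> x1^2, [1 1] -> x1*x2, [0 2] -> x2^2, ...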


Polynomial Regression

• 𝐱: 𝑑-dimensional
• 𝛟(𝐱): degree-𝑝 polynomial
• The dimension of 𝛟(𝐱) is 𝑂(𝑑^𝑝).
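To quantify the 𝑂(𝑑^𝑝) growth, here is a small sketch (my own, using the same sklearn class as above). With the bias term, the exact number of monomials of degree at most p in d variables is C(d+p, p): 10 when d = 2 and p = 3 as in the notebook output above, and already 560 when d = 13 and p = 3 (the setting of the housing data used below).

import numpy
from math import comb
from sklearn.preprocessing import PolynomialFeatures

d, p = 13, 3
poly = PolynomialFeatures(degree=p).fit(numpy.zeros((1, d)))
print(poly.n_output_features_)    # 560 polynomial features
print(comb(d + p, p))             # C(d+p, p) = 560, which grows like O(d^p) for fixed p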
Polynomial Regression

Input: vectors 𝐱₁, ⋯, 𝐱ₙ ∈ ℝ^𝑑 and labels 𝑦₁, ⋯, 𝑦ₙ ∈ ℝ.
Output: a function 𝑓: ℝ^𝑑 ↦ ℝ such that 𝑓(𝐱ᵢ) ≈ 𝑦ᵢ.

Example: least squares regression on the Boston housing data.

In [1]:
from keras.datasets import boston_housing
(x_train, y_train), (x_test, y_test) = boston_housing.load_data()
print('shape of x_train: ' + str(x_train.shape))
print('shape of x_test: ' + str(x_test.shape))
print('shape of y_train: ' + str(y_train.shape))
print('shape of y_test: ' + str(y_test.shape))

Using TensorFlow backend.
shape of x_train: (404, 13)
shape of x_test: (102, 13)
shape of y_train: (404,)
shape of y_test: (102,)

In [2]:
import numpy
n, d = x_train.shape
xbar_train = numpy.concatenate((x_train, numpy.ones((n, 1))), axis=1)
print('shape of x_train: ' + str(x_train.shape))
print('shape of xbar_train: ' + str(xbar_train.shape))

shape of x_train: (404, 13)
shape of xbar_train: (404, 14)

In [3]:
# the analytical solution
xx = numpy.dot(xbar_train.T, xbar_train)
xx_inv = numpy.linalg.pinv(xx)
xy = numpy.dot(xbar_train.T, y_train)
w = numpy.dot(xx_inv, xy)

In [4]:
# mean squared error (training)
y_lsr = numpy.dot(xbar_train, w)
diff = y_lsr - y_train
mse = numpy.mean(diff * diff)
print('Train MSE: ' + str(mse))

Train MSE: 22.00480083834814

Training, Test, and Overfitting
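For completeness, here is a sketch of the corresponding test-set evaluation (this cell is not shown in the slides, so no output value is quoted): the same bias-augmentation must be applied to x_test before using the fitted w.

# mean squared error (test) -- evaluation sketch, assuming the cells above were run
n_test = x_test.shape[0]
xbar_test = numpy.concatenate((x_test, numpy.ones((n_test, 1))), axis=1)
y_pred = numpy.dot(xbar_test, w)
print('Test MSE: ' + str(numpy.mean((y_pred - y_test) ** 2)))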

Polynomial Regression: Training

Input: vectors 𝐱₁, ⋯, 𝐱ₙ ∈ ℝ^𝑑 and labels 𝑦₁, ⋯, 𝑦ₙ ∈ ℝ.
Feature map: 𝛟(𝐱) = ⊗^𝑝 𝐱. Its dimension is 𝑂(𝑑^𝑝).
Least squares: min_𝐰 ‖𝚽𝐰 − 𝐲‖².

Question: what will happen as 𝑝 grows?
1. For sufficiently large 𝑝, the dimension of the feature 𝛟(𝐱) exceeds 𝑛.
2. Then you can find 𝐰 such that 𝚽𝐰 = 𝐲. (Zero training error!)
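A toy demonstration of this point (random data of my own, not from the slides): once the number of polynomial features exceeds 𝑛, least squares can fit arbitrary labels exactly.

import numpy
from sklearn.preprocessing import PolynomialFeatures

rng = numpy.random.default_rng(0)
X = rng.normal(size=(6, 2))                            # n = 6 samples, d = 2
y = rng.normal(size=6)                                 # arbitrary labels
Phi = PolynomialFeatures(degree=3).fit_transform(X)    # 10 features > 6 samples
w, _, _, _ = numpy.linalg.lstsq(Phi, y, rcond=None)
print(numpy.mean((Phi @ w - y) ** 2))                  # essentially 0: zero training error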


Training and Testing

Train:
• Input: vectors 𝐱₁, ⋯, 𝐱ₙ ∈ ℝ^𝑑 and labels 𝑦₁, ⋯, 𝑦ₙ ∈ ℝ.
• Output: a function 𝑓: ℝ^𝑑 ↦ ℝ such that 𝑓(𝐱ᵢ) ≈ 𝑦ᵢ.

Test:
• Input: a never-seen-before feature vector 𝐱′ ∈ ℝ^𝑑.
• Output: predict its label by 𝑓(𝐱′).

[Figure: fitted curves illustrating underfitting and overfitting.]



Training and Testing

[Figure: predictions at a new point 𝐱′ for three fits: a linear model (BAD, underfits), a degree-4 polynomial (GOOD), and a degree-15 polynomial (BAD, overfits).]


Hyper-Parameter Tuning

Question: for the polynomial regression model, how to determine the degree 𝑝?
Answer: pick the degree 𝑝 that leads to the smallest test error.

Train on the Training Set, evaluate on the Test Set:
• Train a degree-1 polynomial regression → Test MSE = 23.2
• Train a degree-2 polynomial regression → Test MSE = 19.0
• Train a degree-3 polynomial regression → Test MSE = 16.7
• Train a degree-4 polynomial regression → Test MSE = 12.2
• Train a degree-5 polynomial regression → Test MSE = 14.8
• Train a degree-6 polynomial regression → Test MSE = 25.1
• Train a degree-7 polynomial regression → Test MSE = 39.4
• Train a degree-8 polynomial regression → Test MSE = 53.0



Hyper-Parameter Tuning

Choosing 𝑝 by the smallest test MSE (previous slide) is not allowed:
• Wrong! The test labels are unavailable!
• Even if you have the test labels, never do this!


Select Models Using Test Labels
[Figure omitted.]

Cross-Validation (Naïve Approach) for Hyper-Parameter Tuning



Cross-Validation (Naïve Approach)

[Diagram: 𝑛 training samples (features and labels), 𝑚 test samples (features only, labels unknown), and 𝑛_val samples held out as a validation set.]


Cross-Validation (Naïve Approach)

Train on the Training Set and evaluate on the held-out Validation Set (the Test Set is never touched); the validation MSE closely tracks the test MSE:

degree   Valid. MSE   (Test MSE)
  1        23.1         23.2
  2        19.2         19.0
  3        16.3         16.7
  4        12.5         12.2
  5        14.4         14.8
  6        25.0         25.1
  7        39.1         39.4
  8        53.5         53.0
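A minimal sklearn sketch of this naïve validation approach on the housing data loaded earlier (the split ratio, random seed, and degree grid are my choices; raw polynomial features without scaling are used only for illustration, so the printed numbers will not match the table above):

from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# hold out part of the training data as a validation set
x_tr, x_val, y_tr, y_val = train_test_split(x_train, y_train, test_size=0.2, random_state=0)

for p in (1, 2, 3):
    model = make_pipeline(PolynomialFeatures(degree=p), LinearRegression())
    model.fit(x_tr, y_tr)
    print('degree', p, 'validation MSE:', mean_squared_error(y_val, model.predict(x_val)))
# pick the degree with the smallest validation MSE; only then evaluate once on the test set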



𝑘-Fold Cross-Validation

1. Propose a grid of hyper-parameters.
   • E.g., 𝑝 ∈ {1, 2, 3, 4, 5, 6}.
2. Randomly partition the training samples into 𝑘 parts.
   • 𝑘 − 1 parts are used for training.
   • The remaining part is held out for evaluation.
3. Compute the average of the held-out errors over the 𝑘 repeats.
   • The average is called the validation error.
4. Choose the hyper-parameter 𝑝 that leads to the smallest validation error.
(A minimal sklearn sketch of this procedure is given below.)

[Figure: example of 5-fold cross-validation.]
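Here is the sklearn sketch referenced above (again reusing x_train and y_train from the housing example; KFold with shuffling matches the random partition in step 2, while the degree grid and unscaled polynomial features are illustrative choices of mine):

from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

cv = KFold(n_splits=10, shuffle=True, random_state=0)   # random partition into k = 10 parts
for p in (1, 2, 3):
    model = make_pipeline(PolynomialFeatures(degree=p), LinearRegression())
    scores = cross_val_score(model, x_train, y_train, cv=cv,
                             scoring='neg_mean_squared_error')
    print('degree', p, 'validation error:', -scores.mean())
# choose the degree p with the smallest validation error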


Example: 10-Fold Cross-Validation

hyper-parameter    validation error
p = 1              23.19
p = 2              21.00
p = 3              18.54
p = 4              24.36
p = 5              27.96
p = 6              33.10

[Plot: validation error versus 𝑝; the minimum is at 𝑝 = 3.]



The Available Data (Real-World Machine Learning Competition)

               Training    Test Data (Public)    Test Data (Private)
Labels:        𝐲           unknown               unknown
Features:      𝐗           𝐗_public              𝐗_private

The public and private test data are mixed; participants cannot distinguish them.


Train A Model

Fit a model on the training data (features 𝐗 and labels 𝐲).

Prediction

Apply the trained model to the test features 𝐗_public and 𝐗_private to obtain predictions 𝐲_public and 𝐲_private.



Submission to Leaderboard

Submit the predictions 𝐲_public and 𝐲_private. The leaderboard shows the score on the public part (e.g., Score = 0.9527); the score on the private part is kept secret.

Question: Why two leaderboards?
Answer: The public score can be abused for hyper-parameter tuning (cheating); that is why the private score is kept secret.


Summary

• Polynomial regression for non-linear problems.
• Polynomial regression has a hyper-parameter 𝑝.
• Underfitting (very small 𝑝) and overfitting (very big 𝑝).
• Tune the hyper-parameters using cross-validation.
• Make your model parameters and hyper-parameters independent of the test set!!!
