02 01 Regression
9/5/23
• Transpose: 𝐛ᵀ.
• The ℓ₂-distance (Euclidean distance): ‖𝐚 − 𝐛‖₂ (the green line).
• Square matrix: a matrix with the same number of rows and columns.
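As a quick check of the ℓ₂-distance definition, it can be computed with numpy (a minimal sketch; the two vectors are made-up examples):

```python
import numpy

a = numpy.array([1.0, 2.0, 2.0])
b = numpy.array([1.0, 0.0, 0.0])

# ell_2 (Euclidean) distance: square root of the sum of squared differences
dist = numpy.linalg.norm(a - b)
print(dist)  # sqrt(0 + 4 + 4) = 2.828...
```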
Eigenvalue Decomposition
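The slide's worked example did not survive extraction, so as a hedged illustration: for a symmetric matrix, numpy.linalg.eigh returns eigenvalues and orthonormal eigenvectors satisfying A = QΛQᵀ (the matrix below is my own toy example):

```python
import numpy

# symmetric 2x2 matrix (made-up example)
A = numpy.array([[2.0, 1.0],
                 [1.0, 2.0]])

# eigh is for symmetric/Hermitian matrices; eigenvalues come back ascending
eigvals, Q = numpy.linalg.eigh(A)
print(eigvals)  # [1. 3.]

# reconstruct A = Q diag(eigvals) Q^T
A_rec = Q @ numpy.diag(eigvals) @ Q.T
print(numpy.allclose(A, A_rec))  # True
```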
Optimization: Basics
Optimization problem: min_𝐰 f(𝐰), s.t. 𝐰 ∈ 𝒞.
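A minimal numeric illustration (my own toy example, not from the slides): gradient descent on the unconstrained problem min_w f(w) with f(w) = (w − 3)²:

```python
# toy objective f(w) = (w - 3)^2, whose gradient is f'(w) = 2 (w - 3)
w = 0.0
lr = 0.1  # step size
for _ in range(100):
    grad = 2.0 * (w - 3.0)
    w -= lr * grad

print(round(w, 6))  # converges to the minimizer w = 3
```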
Output: a vector 𝐰 ∈ ℝᵈ and a scalar b ∈ ℝ such that 𝐱ᵢᵀ𝐰 + b ≈ yᵢ.

1-dim (d = 1) example. Solution: yᵢ ≈ 0.15 xᵢ + 5.0.

Question (regarding training): how to compute 𝐰 and b?
Absorb the bias b into the weights by appending a constant feature:

x̄ᵢ = [𝐱ᵢ; 1], w̄ = [𝐰; b], so that x̄ᵢᵀw̄ = 𝐱ᵢᵀ𝐰 + b.
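A quick numpy check of this identity (the vectors below are made-up examples):

```python
import numpy

x = numpy.array([2.0, -1.0, 0.5])   # original features
w = numpy.array([1.0, 4.0, 2.0])    # weights
b = 5.0                             # bias

# append a constant 1 to x and the bias b to w
x_bar = numpy.append(x, 1.0)
w_bar = numpy.append(w, b)

# the inner product of the augmented vectors absorbs the bias
print(x_bar @ w_bar == x @ w + b)  # True
```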
Least squares: min over w̄ ∈ ℝᵈ⁺¹ of ∑ᵢ₌₁ⁿ (x̄ᵢᵀw̄ − yᵢ)².
Stacking the per-sample residuals, with x̄ᵢᵀ as the i-th row of X̄:

‖X̄w̄ − 𝐲‖₂² = ∑ᵢ₌₁ⁿ (x̄ᵢᵀw̄ − yᵢ)².
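This identity is easy to sanity-check in numpy on random data (a sketch; the sizes and names are mine):

```python
import numpy

rng = numpy.random.default_rng(0)
n, d = 5, 3
X = rng.normal(size=(n, d))   # rows play the role of the x_i^T
w = rng.normal(size=d)
y = rng.normal(size=n)

# squared Euclidean norm of the stacked residual vector ...
lhs = numpy.sum((X @ w - y) ** 2)
# ... equals the sum of per-sample squared residuals
rhs = sum((X[i] @ w - y[i]) ** 2 for i in range(n))

print(numpy.isclose(lhs, rhs))  # True
```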
Matrix form: min over w̄ ∈ ℝᵈ⁺¹ of ‖X̄w̄ − 𝐲‖₂².
Analytical solution: w̄⋆ = (X̄ᵀX̄)⁻¹X̄ᵀ𝐲.

Conjugate Gradient (CG)
" "
Convergence: after 𝑂 𝜅 log iterations, Convergence: after 𝑂 𝜅 log iterations,
8 Algorithms 8 Algorithms
𝐗 𝐰! − 𝐰⋆ ≤ 𝜖 𝐗 𝐰$ − 𝐰⋆ . 𝐗 𝐰! − 𝐰⋆ #
≤ 𝜖 𝐗 𝐰$ − 𝐰⋆ #
.
# #
Analytical Solution Analytical Solution
%%&' 𝐗(𝐗 %%&' 𝐗(𝐗
𝜅= is the condition number. 𝜅= is the condition number.
%%)* 𝐗(𝐗 Gradient Descent (GD) %%)* 𝐗(𝐗 Gradient Descent (GD)
37 38
39 40
10
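The slides show no code for GD or CG, so here is a hedged sketch of gradient descent on the least-squares objective, run on random stand-in data (all names are mine). The gradient of ‖Xw − y‖₂² is 2Xᵀ(Xw − y), and a step size of 0.5/λmax(XᵀX) is safely inside the stable range:

```python
import numpy

rng = numpy.random.default_rng(0)
n, d = 100, 5
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

# direct least-squares solution, for reference
w_star, *_ = numpy.linalg.lstsq(X, y, rcond=None)

# gradient descent on ||Xw - y||^2 with a conservative step size
lam_max = numpy.linalg.eigvalsh(X.T @ X).max()
lr = 0.5 / lam_max
w = numpy.zeros(d)
for _ in range(2000):
    grad = 2.0 * X.T @ (X @ w - y)
    w -= lr * grad

print(numpy.allclose(w, w_star, atol=1e-6))  # True
```

The iteration count needed in practice grows with the condition number κ, which is what the convergence bound above captures.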
1. Load Data

In [5]:
from keras.datasets import boston_housing

(x_train, y_train), (x_test, y_test) = boston_housing.load_data()

print('shape of x_train: ' + str(x_train.shape))
print('shape of x_test: ' + str(x_test.shape))
print('shape of y_train: ' + str(y_train.shape))
print('shape of y_test: ' + str(y_test.shape))

shape of x_train: (404, 13)
shape of x_test: (102, 13)
shape of y_train: (404,)
shape of y_test: (102,)

2. Add A Feature

In [14]:
import numpy

n, d = x_train.shape
xbar_train = numpy.concatenate((x_train, numpy.ones((n, 1))), axis=1)

print('shape of x_train: ' + str(x_train.shape))
print('shape of xbar_train: ' + str(xbar_train.shape))

shape of x_train: (404, 13)
shape of xbar_train: (404, 14)
3. Solve the Least Squares

Analytical solution: w̄⋆ = (X̄ᵀX̄)⁻¹X̄ᵀ𝐲.

In [15]:
# the analytical solution
xx = numpy.dot(xbar_train.T, xbar_train)
xx_inv = numpy.linalg.pinv(xx)
xy = numpy.dot(xbar_train.T, y_train)
w = numpy.dot(xx_inv, xy)

Training Mean Squared Error (MSE): (1/n) ‖𝐲 − X̄w̄‖₂².

In [21]:
# mean squared error (training)
y_lsr = numpy.dot(xbar_train, w)
diff = y_lsr - y_train
mse = numpy.mean(diff * diff)
print('Train MSE: ' + str(mse))

Train MSE: 22.00480083834814

In [22]:
print(y_train[0:10])

[15.2 42.3 50.  21.1 17.7 18.5 11.3 15.6 15.6 14.4]
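A practical caveat (my note, not from the slides): forming X̄ᵀX̄ squares the condition number, so numpy.linalg.lstsq, which solves the same least-squares problem directly, is usually preferable and agrees with the pinv-based route. A sketch on random stand-in data, since the Boston arrays may not be loaded:

```python
import numpy

rng = numpy.random.default_rng(0)
xbar = rng.normal(size=(50, 4))
y = rng.normal(size=50)

# normal-equations route (as in the notebook above)
w_pinv = numpy.linalg.pinv(xbar.T @ xbar) @ (xbar.T @ y)

# direct least-squares route, better conditioned numerically
w_lstsq, *_ = numpy.linalg.lstsq(xbar, y, rcond=None)

print(numpy.allclose(w_pinv, w_lstsq))  # True
```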
Linear Regression for Housing Price

[Figure: training pipeline — the features 𝐱ᵢ and labels yᵢ are fed to the linear regressor to train w̄.]
4. Make Prediction for Test Samples

• Add a feature to the test feature matrix: 𝐗_test → X̄_test.
• Make the prediction by 𝐲_pred = X̄_test w̄.
• MSE (test): (1/n_test) ‖𝐲_pred − 𝐲_test‖₂². Also known as out-of-sample error.
• Test MSE is 23.2; the gap between training and test MSE is an indicator of generalization.

In [23]:
# prediction for the test samples
n_test, _ = x_test.shape
xbar_test = numpy.concatenate((x_test, numpy.ones((n_test, 1))), axis=1)
y_pred = numpy.dot(xbar_test, w)

In [25]:
# mean squared error (testing)
diff = y_pred - y_test
mse = numpy.mean(diff * diff)
print('Test MSE: ' + str(mse))

Test MSE: 23.195599256409857

5. Compare with Baseline

Trivial baseline: whatever the features are, the prediction is mean(𝐲).

In [26]:
# Baseline
y_mean = numpy.mean(y_train)
diff = y_pred - y_mean
mse = numpy.mean(diff * diff)
print('Test MSE: ' + str(mse))

Test MSE: 57.38297638530044

Summary

• Linear regression problem.
• Least squares model.
• 3 algorithms for solving the model.
• Make predictions for never-seen-before test data.
• Evaluation of the model (training MSE and test MSE).
• Compare with baselines.

Test MSE of least squares is 23.19.
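The whole pipeline in the summary can be rehearsed end to end on synthetic data, without keras (a self-contained sketch; all data and names here are made up):

```python
import numpy

rng = numpy.random.default_rng(0)

# synthetic regression data: y = x^T w_true + b_true + noise
n_train, n_test, d = 200, 50, 5
w_true = rng.normal(size=d)
b_true = 2.0
x_train = rng.normal(size=(n_train, d))
x_test = rng.normal(size=(n_test, d))
y_train = x_train @ w_true + b_true + 0.1 * rng.normal(size=n_train)
y_test = x_test @ w_true + b_true + 0.1 * rng.normal(size=n_test)

# add the constant feature and solve least squares analytically
xbar_train = numpy.concatenate((x_train, numpy.ones((n_train, 1))), axis=1)
xbar_test = numpy.concatenate((x_test, numpy.ones((n_test, 1))), axis=1)
w = numpy.linalg.pinv(xbar_train.T @ xbar_train) @ (xbar_train.T @ y_train)

# evaluate: test MSE of the model vs. the predict-the-mean baseline
test_mse = numpy.mean((xbar_test @ w - y_test) ** 2)
baseline_mse = numpy.mean((numpy.mean(y_train) - y_test) ** 2)
print(test_mse < baseline_mse)  # True
```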