02 01 Regression


Linear Regression

Warm-up: Vector and Matrix

Vector and Matrix

• Vector (n-dim):  a = [a_1, a_2, ⋯, a_n]^T.

• Matrix (n×d):

    A = [ a_11  a_12  ⋯  a_1d
          a_21  a_22  ⋯  a_2d
           ⋮     ⋮    ⋱    ⋮
          a_n1  a_n2  ⋯  a_nd ]

• Rows and columns:  A = [ a_:1  a_:2  ⋯  a_:d ], where a_:j is the j-th column and a_i: is the i-th row.

Vector Norms

• The ℓ_p norm:  ‖x‖_p := ( Σ_i |x_i|^p )^{1/p}.
• The ℓ_2 norm:  ‖x‖_2 = ( Σ_i x_i² )^{1/2}  (the Euclidean norm).
• The ℓ_1 norm:  ‖x‖_1 = Σ_i |x_i|.
• The ℓ_∞ norm:  ‖x‖_∞ = max_i |x_i|.
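
These norms can be checked quickly with numpy.linalg.norm; a minimal sketch (the vector values are made up for illustration):

import numpy

x = numpy.array([3.0, -4.0, 1.0])        # an arbitrary example vector

print(numpy.linalg.norm(x, 1))           # l1 norm: |3| + |-4| + |1| = 8
print(numpy.linalg.norm(x, 2))           # l2 (Euclidean) norm: sqrt(9 + 16 + 1)
print(numpy.linalg.norm(x, numpy.inf))   # l-infinity norm: max_i |x_i| = 4
print(numpy.linalg.norm(x, 3))           # a general l_p norm with p = 3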


Vector Norms

• The ℓ_2-distance (Euclidean distance) between a and b:  ‖a − b‖_2  (the green line in the figure).
• The ℓ_1-distance (Manhattan distance) between a and b:  ‖a − b‖_1  (the red, blue, and yellow lines in the figure).

[Figure: two points a and b on a grid, comparing the straight-line Euclidean path with the grid-following Manhattan paths.]

Transpose and Rank

• Transpose: A^T swaps the rows and columns of A, i.e., (A^T)_{ij} = A_{ji}.
• Square matrix: a matrix with the same number of rows and columns.
• Symmetric: a square matrix A is symmetric if A^T = A.
• Rank: the number of linearly independent rows (or columns).
• Full rank: a square matrix is full rank if its rank is equal to the number of columns.
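
These definitions are easy to verify numerically; a minimal sketch with made-up vectors and a made-up matrix:

import numpy

a = numpy.array([1.0, 2.0])
b = numpy.array([4.0, 6.0])
print(numpy.linalg.norm(a - b, 2))   # Euclidean distance: sqrt(9 + 16) = 5
print(numpy.linalg.norm(a - b, 1))   # Manhattan distance: 3 + 4 = 7

A = numpy.array([[2.0, 1.0],
                 [1.0, 3.0]])
print(numpy.array_equal(A, A.T))     # True: A is symmetric
print(numpy.linalg.matrix_rank(A))   # 2: A is full rank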

Eigenvalue Decomposition

• Let A be any n×n symmetric matrix.
• Eigenvalue decomposition:  A = Σ_{i=1}^n λ_i v_i v_i^T.
• The eigenvalues are ordered by magnitude: |λ_1| ≥ |λ_2| ≥ ⋯ ≥ |λ_n|.
• The eigenvectors are orthogonal: v_i^T v_j = 0 for all i ≠ j.
• A is full rank ⟺ all the eigenvalues are nonzero.
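
For a symmetric matrix the decomposition can be computed with numpy.linalg.eigh; a minimal sketch (the matrix is made up) that also verifies A = Σ_i λ_i v_i v_i^T:

import numpy

A = numpy.array([[2.0, 1.0],
                 [1.0, 3.0]])               # a small symmetric matrix

lam, V = numpy.linalg.eigh(A)               # eigenvalues (ascending) and orthonormal eigenvectors
print(lam)

# rebuild A from the sum of lambda_i * v_i v_i^T
A_rec = sum(lam[i] * numpy.outer(V[:, i], V[:, i]) for i in range(len(lam)))
print(numpy.allclose(A, A_rec))             # True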

Warm-up: Optimization

Optimization: Basics

Optimization problem:  min_w f(w)   s.t.  w ∈ 𝒞.

• w = (w_1, ⋯, w_d): the optimization variables.
• f : ℝ^d ↦ ℝ: the objective function.
• 𝒞 (a subset of ℝ^d): the feasible set, defined by the constraint.

Optimal solution:  w⋆ = argmin_{w ∈ 𝒞} f(w).

• f(w⋆) ≤ f(w) for all the vectors w in the set 𝒞.
• w⋆ may not exist, e.g., when 𝒞 is the empty set.
• If w⋆ exists, it may not be unique.
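
As a concrete (hypothetical) instance of this abstract setup, the sketch below minimizes a simple quadratic objective over a box-shaped feasible set with scipy.optimize.minimize; the objective and the bounds are made up for illustration:

import numpy
from scipy.optimize import minimize

def f(w):
    # objective: f(w) = (w_1 - 1)^2 + (w_2 + 2)^2
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2

# feasible set C = [0, 5] x [0, 5], expressed as box constraints
res = minimize(f, x0=numpy.zeros(2), bounds=[(0.0, 5.0), (0.0, 5.0)])
print(res.x)   # w* = (1, 0): the unconstrained minimizer (1, -2) pushed back into C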

Least Squares Regression

Linear Regression

Input: vectors x_1, ⋯, x_n ∈ ℝ^d and labels y_1, ⋯, y_n ∈ ℝ.
Output: a vector w ∈ ℝ^d and a scalar b ∈ ℝ such that x_i^T w + b ≈ y_i.

1-dim (d = 1) example: the fitted line is y_i ≈ 0.15 x_i + 5.0.

Question (regarding training): how do we compute w and b?
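
A fit of this kind is easy to reproduce on synthetic 1-dim data; a minimal sketch (the slope 0.15 and intercept 5.0 mirror the example above, while the noise level and sample size are made up). numpy.polyfit performs exactly the least squares fit developed in the next slides:

import numpy

rng = numpy.random.default_rng(0)
x = rng.uniform(0, 100, size=50)                   # 1-dim inputs
y = 0.15 * x + 5.0 + rng.normal(0, 1.0, size=50)   # noisy labels around the example line

slope, intercept = numpy.polyfit(x, y, deg=1)      # least squares line fit
print(slope, intercept)                            # close to 0.15 and 5.0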

Least Squares Regression

Input: vectors x_1, ⋯, x_n ∈ ℝ^d and labels y_1, ⋯, y_n ∈ ℝ.
Output: a vector w ∈ ℝ^d and a scalar b ∈ ℝ such that x_i^T w + b ≈ y_i.

Method: least squares regression.

• The optimization model:
    min_{w, b}  Σ_{i=1}^n ( x_i^T w + b − y_i )²,
  where b is the intercept (or bias).

Least Squares Regression

• The optimization model:
    min_{w, b}  Σ_{i=1}^n ( x_i^T w + b − y_i )².

• Absorb the intercept into the weight vector by appending a constant feature:
    x̃_i = [ x_i ; 1 ] ∈ ℝ^{d+1}   and   w̃ = [ w ; b ] ∈ ℝ^{d+1},
  so that
    x̃_i^T w̃ = x_i^T w + b.

• The optimization model then becomes
    min_{w̃ ∈ ℝ^{d+1}}  Σ_{i=1}^n ( x̃_i^T w̃ − y_i )².
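
In code, absorbing the intercept amounts to appending a constant 1 to each feature vector; a minimal sketch with made-up numbers, checking that x̃^T w̃ equals x^T w + b:

import numpy

x = numpy.array([2.0, -1.0, 0.5])        # a feature vector (d = 3)
w = numpy.array([0.3, 1.2, -0.7])        # weights
b = 5.0                                  # intercept

x_tilde = numpy.append(x, 1.0)           # x~ = [x; 1]
w_tilde = numpy.append(w, b)             # w~ = [w; b]

print(numpy.dot(x, w) + b)               # x^T w + b
print(numpy.dot(x_tilde, w_tilde))       # x~^T w~  (same value)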

Least Squares Regression

• The optimization model:
    min_{w̃ ∈ ℝ^{d+1}}  Σ_{i=1}^n ( x̃_i^T w̃ − y_i )².

• Stack the augmented feature vectors into a matrix and the labels into a vector:
    X̃ = [ x_1^T  1 ;  x_2^T  1 ;  ⋯ ;  x_n^T  1 ]   (one row per sample, so X̃ is n × (d+1)),
    y  = [ y_1, y_2, ⋯, y_n ]^T.

Least Squares Regression

• The optimization model:
    min_{w̃ ∈ ℝ^{d+1}}  Σ_{i=1}^n ( x̃_i^T w̃ − y_i )².

• With the stacked matrix and vector,
    X̃ w̃ = [ x̃_1^T w̃ ;  x̃_2^T w̃ ;  ⋯ ;  x̃_n^T w̃ ],
    X̃ w̃ − y = [ x̃_1^T w̃ − y_1 ;  x̃_2^T w̃ − y_2 ;  ⋯ ;  x̃_n^T w̃ − y_n ],
  and therefore
    ‖ X̃ w̃ − y ‖_2²  =  Σ_{i=1}^n ( x̃_i^T w̃ − y_i )².
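
The identity above is easy to verify numerically; a minimal sketch with random, made-up data:

import numpy

rng = numpy.random.default_rng(0)
n, d = 5, 3
X_tilde = numpy.hstack([rng.normal(size=(n, d)), numpy.ones((n, 1))])   # rows [x_i^T, 1]
w_tilde = rng.normal(size=d + 1)
y = rng.normal(size=n)

residual = X_tilde.dot(w_tilde) - y
matrix_form = numpy.linalg.norm(residual, 2) ** 2                  # ||X~ w~ - y||_2^2
sum_form = sum((X_tilde[i].dot(w_tilde) - y[i]) ** 2 for i in range(n))
print(numpy.isclose(matrix_form, sum_form))                        # True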

Least Squares Regression

• The optimization model:
    min_{w̃ ∈ ℝ^{d+1}}  Σ_{i=1}^n ( x̃_i^T w̃ − y_i )².

  Matrix form:
    min_{w ∈ ℝ^{d+1}}  ‖ X w − y ‖_2²,
  writing X for X̃ and w for w̃ from here on.

• Tasks, methods, and algorithms:
    Task: linear regression.
    Methods for linear regression: least squares regression, LASSO, least absolute deviations.
    Algorithms for least squares regression: analytical solution, gradient descent (GD), conjugate gradient (CG).

Least Squares Regression

• Solve the optimization model:
    min_{w ∈ ℝ^{d+1}}  ‖ X w − y ‖_2².

• Gradient:
    ∂‖Xw − y‖_2² / ∂w = 2 ( X^T X w − X^T y ).

• First-order optimality condition: set the gradient to zero,
    2 ( X^T X w − X^T y ) = 0.

Analytical Solution

• Normal equation:  X^T X w⋆ = X^T y.
• Assume X^T X is full rank. The analytical solution is
    w⋆ = ( X^T X )^{-1} X^T y.

Gradient Descent (GD)

• Gradient descent repeats:
    1. Compute the gradient:  g_t = X^T X w_t − X^T y.
    2. Update:  w_{t+1} = w_t − α g_t.
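
A minimal sketch of this gradient descent loop (the placeholder data, the step size α, and the iteration count are made up; the factor 2 in the gradient is folded into α, as on the slide):

import numpy

rng = numpy.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.normal(size=100)

# precompute X^T X and X^T y, since both appear in every gradient evaluation
xtx = X.T.dot(X)
xty = X.T.dot(y)

w = numpy.zeros(5)
alpha = 1.0 / numpy.linalg.norm(xtx, 2)   # a conservative step size based on the largest eigenvalue
for t in range(500):
    g = xtx.dot(w) - xty                  # gradient direction
    w = w - alpha * g                     # update

print(numpy.allclose(w, numpy.linalg.lstsq(X, y, rcond=None)[0]))   # close to the least squares solution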


Least Squares Regression

• Solve the optimization model:
    min_{w ∈ ℝ^{d+1}}  ‖ X w − y ‖_2².

• Convergence: after O( κ log(1/ε) ) iterations,
    ‖ X ( w_t − w⋆ ) ‖_2  ≤  ε ‖ X ( w_0 − w⋆ ) ‖_2,
  where κ = λ_max( X^T X ) / λ_min( X^T X ) is the condition number.

Conjugate Gradient (CG)

• CG is another iterative solver for this problem; its pseudo-code is available on Wikipedia.
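
Rather than hand-coding CG, the normal equations X^T X w = X^T y can be handed to scipy.sparse.linalg.cg, which implements CG for symmetric positive definite systems; a minimal sketch with made-up placeholder data:

import numpy
from scipy.sparse.linalg import cg

rng = numpy.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.normal(size=100)

# CG solves A w = b for symmetric positive definite A; here A = X^T X and b = X^T y
w_cg, info = cg(X.T.dot(X), X.T.dot(y))
print(info)                                                          # 0 means successful convergence
print(numpy.allclose(w_cg, numpy.linalg.lstsq(X, y, rcond=None)[0]))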

Least Squares Regression

• Solve the optimization model:
    min_{w ∈ ℝ^{d+1}}  ‖ X w − y ‖_2².

• Recap of tasks, methods, and algorithms:
    Task: linear regression.
    Methods for linear regression: least squares regression, LASSO, least absolute deviations.
    Algorithms for least squares regression: analytical solution, gradient descent (GD), conjugate gradient (CG).

Solve Least Squares in Python
1. Load Data

In [5]:
from keras.datasets import boston_housing

# Boston housing data: 404 training and 102 test samples, 13 features each
(x_train, y_train), (x_test, y_test) = boston_housing.load_data()

print('shape of x_train: ' + str(x_train.shape))
print('shape of x_test: ' + str(x_test.shape))
print('shape of y_train: ' + str(y_train.shape))
print('shape of y_test: ' + str(y_test.shape))

shape of x_train: (404, 13)
shape of x_test: (102, 13)
shape of y_train: (404,)
shape of y_test: (102,)

2. Add A Feature

In [14]:
import numpy

# append a constant-1 column so the intercept is absorbed into the weight vector
n, d = x_train.shape
xbar_train = numpy.concatenate((x_train, numpy.ones((n, 1))), axis=1)

print('shape of x_train: ' + str(x_train.shape))
print('shape of xbar_train: ' + str(xbar_train.shape))

shape of x_train: (404, 13)
shape of xbar_train: (404, 14)
3. Solve the Least Squares

Analytical solution:  w̃ = ( X̃^T X̃ )^{-1} X̃^T y

In [15]:
# the analytical solution
xx = numpy.dot(xbar_train.T, xbar_train)   # X~^T X~
xx_inv = numpy.linalg.pinv(xx)             # (X~^T X~)^{-1} via the pseudo-inverse
xy = numpy.dot(xbar_train.T, y_train)      # X~^T y
w = numpy.dot(xx_inv, xy)                  # w~ = (X~^T X~)^{-1} X~^T y

Training Mean Squared Error (MSE):  (1/n) ‖ y − X̃ w̃ ‖_2²

In [21]:
# mean squared error (training)
y_lsr = numpy.dot(xbar_train, w)
diff = y_lsr - y_train
mse = numpy.mean(diff * diff)
print('Train MSE: ' + str(mse))

Train MSE: 22.00480083834814
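
As a side note, forming the pseudo-inverse of X̃^T X̃ explicitly is not the most numerically stable route; numpy.linalg.lstsq solves the same least squares problem directly. A minimal alternative sketch, assuming xbar_train, y_train, and w from the cells above:

# alternative: let numpy solve the least squares problem directly
w_lstsq, residuals, rank, sv = numpy.linalg.lstsq(xbar_train, y_train, rcond=None)
print(numpy.allclose(w, w_lstsq))   # should match the analytical solution computed above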
Linear Regression for Housing Price

• Train: feed the training features x_i and labels y_i to the linear regressor to learn w̃.
• Predict: given the features of a house x′, extend it to x̃′ and predict its price as w̃^T x̃′ (e.g., $500K).
= numpy.dot(xbar_train.T, Housing Price 4. Make Prediction for Test Samples
In [22]:

w = numpy.dot(xx_inv, xy) print(y_train[0:10])


Add
[15.2 •42.3 a
feature
50. to the
21.1 17.7 18.5test feature
11.3 matrix:
15.6 15.6 0 ;<=; .
14.4]𝐗 ;<=; è 𝐗
Linear Price: • Make prediction by: 𝐲>?<@ = 𝐗 0 ;<=; 𝐰.
0
In [21]:
Regressor 0 # 𝐱. : = $500K
𝐰 In [23]:
𝐰0
# mean squared error (training) Predict # prediction for the test samples
Features of a House, 𝐱′
n_test, _ = x_test.shape
è y_lsr
Extend=it to
numpy.dot(xbar_train,
𝐱′ w)
xbar_test = numpy.concatenate((x_test, numpy.ones((n_test, 1))), axis=1)
diff = y_lsr - y_train y_pred = numpy.dot(xbar_test, w)
mse = numpy.mean(diff * diff)
print('Train MSE: ' + str(mse)) In [25]:

# mean squared error (testing)


Train MSE: 22.00480083834814
diff = y_pred - y_test
mse = numpy.mean(diff * diff)
In [22]: print('Test MSE: ' + str(mse))
49 print(y_train[0:10]) 50Test MSE: 23.195599256409857

[15.2 42.3 50. 21.1 17.7 18.5 11.3 15.6 15.6 14.4] In [26]:
# Baseline
In [23]:
y_mean = numpy.mean(y_train)
# prediction for the test samples
4. Make Prediction for Test Samples 4. Make Prediction for Test Samples
diff = y_pred - y_mean
mse = numpy.mean(diff * diff)
n_test, _ = x_test.shape print('Test MSE: ' + str(mse))
• Add a feature
xbar_test = numpy.concatenate((x_test, 0 ;<=; .
to the test feature matrix: 𝐗 ;<=;numpy.ones((n_test,
è𝐗 1))), axis=1)
Test MSE: 57.38297638530044
y_pred = numpy.dot(xbar_test, 0 w)0
• Make prediction by: 𝐲>?<@ = 𝐗 ;<=; 𝐰.
! Also known as out-of-sample error.
" Test MSE is 23.2
• MSE (test): 𝐲>?<@ − 𝐲;<=;
In [25]: & '()' !
The gap is an indicator of generalization.
# mean squared error (testing)

diff = y_pred - y_test Training MSE is 22.0


mse = numpy.mean(diff * diff) Also known as in-sample error.
print('Test MSE: ' + str(mse))

Test MSE: 23.195599256409857 Training MSE is 22.0
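
As a cross-check (not part of the original notebook), the same model can be fit with scikit-learn, assuming x_train, y_train, x_test, and y_test from above; LinearRegression handles the intercept itself, so the raw features are passed in:

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

model = LinearRegression()   # fits the intercept internally, no need to append the column of 1s
model.fit(x_train, y_train)

print(mean_squared_error(y_train, model.predict(x_train)))   # should be close to the training MSE above
print(mean_squared_error(y_test, model.predict(x_test)))     # should be close to the test MSE above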

5. Compare with Baseline

Trivial baseline: whatever the features are, the prediction is mean(y).

In [26]:
# Baseline: predict the training mean for every sample
y_mean = numpy.mean(y_train)
# note: as written, this measures how far the least squares predictions are from the
# constant baseline; comparing y_mean with y_test would give the baseline's own test MSE
diff = y_pred - y_mean
mse = numpy.mean(diff * diff)
print('Test MSE: ' + str(mse))

Test MSE: 57.38297638530044

For comparison, the test MSE of least squares is 23.19.

Summary

• The linear regression problem.
• The least squares model.
• 3 algorithms for solving the model.
• Making predictions for never-seen-before test data.
• Evaluation of the model (training MSE and test MSE).
• Comparison with baselines.