02 01 Regression


Linear Regression

Warm-up: Vector and Matrix

Vector and Matrix

• Vector (n-dim):  a = [a_1, a_2, ⋯, a_n]^T.

• Matrix (n×d):

    A = [ a_11  a_12  ⋯  a_1d
          a_21  a_22  ⋯  a_2d
           ⋮     ⋮    ⋱    ⋮
          a_n1  a_n2  ⋯  a_nd ]

• Rows and columns:  A = [ a_:1  a_:2  ⋯  a_:d ], where a_:j is the j-th column and a_i: is the i-th row.

Vector Norms

• The ℓ_p norm:  ‖x‖_p := ( Σ_i |x_i|^p )^{1/p}.
• The ℓ_2 norm:  ‖x‖_2 = ( Σ_i x_i² )^{1/2}  (the Euclidean norm).
• The ℓ_1 norm:  ‖x‖_1 = Σ_i |x_i|.
• The ℓ_∞ norm:  ‖x‖_∞ = max_i |x_i|.
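
These norms can be checked quickly with numpy.linalg.norm; a minimal sketch (the vector values are made up for illustration):

import numpy

x = numpy.array([3.0, -4.0, 1.0])        # an arbitrary example vector

print(numpy.linalg.norm(x, 1))           # l1 norm: |3| + |-4| + |1| = 8
print(numpy.linalg.norm(x, 2))           # l2 (Euclidean) norm: sqrt(9 + 16 + 1)
print(numpy.linalg.norm(x, numpy.inf))   # l-infinity norm: max_i |x_i| = 4
print(numpy.linalg.norm(x, 3))           # a general l_p norm with p = 3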


Vector Norms

• The ℓ_2-distance (Euclidean distance) between a and b:  ‖a − b‖_2  (the green line in the figure).
• The ℓ_1-distance (Manhattan distance) between a and b:  ‖a − b‖_1  (the red, blue, and yellow lines in the figure).

[Figure: two points a and b on a grid, comparing the straight-line Euclidean path with the grid-following Manhattan paths.]

Transpose and Rank

• Transpose: A^T swaps the rows and columns of A, i.e., (A^T)_{ij} = A_{ji}.
• Square matrix: a matrix with the same number of rows and columns.
• Symmetric: a square matrix A is symmetric if A^T = A.
• Rank: the number of linearly independent rows (or columns).
• Full rank: a square matrix is full rank if its rank is equal to the number of columns.
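
These definitions are easy to verify numerically; a minimal sketch with made-up vectors and a made-up matrix:

import numpy

a = numpy.array([1.0, 2.0])
b = numpy.array([4.0, 6.0])
print(numpy.linalg.norm(a - b, 2))   # Euclidean distance: sqrt(9 + 16) = 5
print(numpy.linalg.norm(a - b, 1))   # Manhattan distance: 3 + 4 = 7

A = numpy.array([[2.0, 1.0],
                 [1.0, 3.0]])
print(numpy.array_equal(A, A.T))     # True: A is symmetric
print(numpy.linalg.matrix_rank(A))   # 2: A is full rank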

Eigenvalue Decomposition

• Let A be any n×n symmetric matrix.
• Eigenvalue decomposition:  A = Σ_{i=1}^n λ_i v_i v_i^T.
• The eigenvalues are ordered by magnitude: |λ_1| ≥ |λ_2| ≥ ⋯ ≥ |λ_n|.
• The eigenvectors are orthogonal: v_i^T v_j = 0 for all i ≠ j.
• A is full rank ⟺ all the eigenvalues are nonzero.
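
For a symmetric matrix the decomposition can be computed with numpy.linalg.eigh; a minimal sketch (the matrix is made up) that also verifies A = Σ_i λ_i v_i v_i^T:

import numpy

A = numpy.array([[2.0, 1.0],
                 [1.0, 3.0]])               # a small symmetric matrix

lam, V = numpy.linalg.eigh(A)               # eigenvalues (ascending) and orthonormal eigenvectors
print(lam)

# rebuild A from the sum of lambda_i * v_i v_i^T
A_rec = sum(lam[i] * numpy.outer(V[:, i], V[:, i]) for i in range(len(lam)))
print(numpy.allclose(A, A_rec))             # True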

Warm-up: Optimization

Optimization: Basics

Optimization problem:  min_w f(w)   s.t.  w ∈ 𝒞.

• w = (w_1, ⋯, w_d): the optimization variables.
• f : ℝ^d ↦ ℝ: the objective function.
• 𝒞 (a subset of ℝ^d): the feasible set, defined by the constraint.

Optimal solution:  w⋆ = argmin_{w ∈ 𝒞} f(w).

• f(w⋆) ≤ f(w) for all the vectors w in the set 𝒞.
• w⋆ may not exist, e.g., when 𝒞 is the empty set.
• If w⋆ exists, it may not be unique.
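
As a concrete (hypothetical) instance of this abstract setup, the sketch below minimizes a simple quadratic objective over a box-shaped feasible set with scipy.optimize.minimize; the objective and the bounds are made up for illustration:

import numpy
from scipy.optimize import minimize

def f(w):
    # objective: f(w) = (w_1 - 1)^2 + (w_2 + 2)^2
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2

# feasible set C = [0, 5] x [0, 5], expressed as box constraints
res = minimize(f, x0=numpy.zeros(2), bounds=[(0.0, 5.0), (0.0, 5.0)])
print(res.x)   # w* = (1, 0): the unconstrained minimizer (1, -2) pushed back into C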

Least Squares Regression

Linear Regression

Input: vectors x_1, ⋯, x_n ∈ ℝ^d and labels y_1, ⋯, y_n ∈ ℝ.
Output: a vector w ∈ ℝ^d and a scalar b ∈ ℝ such that x_i^T w + b ≈ y_i.

1-dim (d = 1) example: the fitted line is y_i ≈ 0.15 x_i + 5.0.

Question (regarding training): how do we compute w and b?
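
A fit of this kind is easy to reproduce on synthetic 1-dim data; a minimal sketch (the slope 0.15 and intercept 5.0 mirror the example above, while the noise level and sample size are made up). numpy.polyfit performs exactly the least squares fit developed in the next slides:

import numpy

rng = numpy.random.default_rng(0)
x = rng.uniform(0, 100, size=50)                   # 1-dim inputs
y = 0.15 * x + 5.0 + rng.normal(0, 1.0, size=50)   # noisy labels around the example line

slope, intercept = numpy.polyfit(x, y, deg=1)      # least squares line fit
print(slope, intercept)                            # close to 0.15 and 5.0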

Least Squares Regression

Input: vectors x_1, ⋯, x_n ∈ ℝ^d and labels y_1, ⋯, y_n ∈ ℝ.
Output: a vector w ∈ ℝ^d and a scalar b ∈ ℝ such that x_i^T w + b ≈ y_i.

Method: least squares regression.

• The optimization model:
    min_{w, b}  Σ_{i=1}^n ( x_i^T w + b − y_i )²,
  where b is the intercept (or bias).

Least Squares Regression

• The optimization model:
    min_{w, b}  Σ_{i=1}^n ( x_i^T w + b − y_i )².

• Absorb the intercept into the weight vector by appending a constant feature:
    x̃_i = [ x_i ; 1 ] ∈ ℝ^{d+1}   and   w̃ = [ w ; b ] ∈ ℝ^{d+1},
  so that
    x̃_i^T w̃ = x_i^T w + b.

• The optimization model then becomes
    min_{w̃ ∈ ℝ^{d+1}}  Σ_{i=1}^n ( x̃_i^T w̃ − y_i )².
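
In code, absorbing the intercept amounts to appending a constant 1 to each feature vector; a minimal sketch with made-up numbers, checking that x̃^T w̃ equals x^T w + b:

import numpy

x = numpy.array([2.0, -1.0, 0.5])        # a feature vector (d = 3)
w = numpy.array([0.3, 1.2, -0.7])        # weights
b = 5.0                                  # intercept

x_tilde = numpy.append(x, 1.0)           # x~ = [x; 1]
w_tilde = numpy.append(w, b)             # w~ = [w; b]

print(numpy.dot(x, w) + b)               # x^T w + b
print(numpy.dot(x_tilde, w_tilde))       # x~^T w~  (same value)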

Least Squares Regression

• The optimization model:
    min_{w̃ ∈ ℝ^{d+1}}  Σ_{i=1}^n ( x̃_i^T w̃ − y_i )².

• Stack the augmented feature vectors into a matrix and the labels into a vector:
    X̃ = [ x_1^T  1 ;  x_2^T  1 ;  ⋯ ;  x_n^T  1 ]   (one row per sample, so X̃ is n × (d+1)),
    y  = [ y_1, y_2, ⋯, y_n ]^T.

Least Squares Regression

• The optimization model:
    min_{w̃ ∈ ℝ^{d+1}}  Σ_{i=1}^n ( x̃_i^T w̃ − y_i )².

• With the stacked matrix and vector,
    X̃ w̃ = [ x̃_1^T w̃ ;  x̃_2^T w̃ ;  ⋯ ;  x̃_n^T w̃ ],
    X̃ w̃ − y = [ x̃_1^T w̃ − y_1 ;  x̃_2^T w̃ − y_2 ;  ⋯ ;  x̃_n^T w̃ − y_n ],
  and therefore
    ‖ X̃ w̃ − y ‖_2²  =  Σ_{i=1}^n ( x̃_i^T w̃ − y_i )².
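
The identity above is easy to verify numerically; a minimal sketch with random, made-up data:

import numpy

rng = numpy.random.default_rng(0)
n, d = 5, 3
X_tilde = numpy.hstack([rng.normal(size=(n, d)), numpy.ones((n, 1))])   # rows [x_i^T, 1]
w_tilde = rng.normal(size=d + 1)
y = rng.normal(size=n)

residual = X_tilde.dot(w_tilde) - y
matrix_form = numpy.linalg.norm(residual, 2) ** 2                  # ||X~ w~ - y||_2^2
sum_form = sum((X_tilde[i].dot(w_tilde) - y[i]) ** 2 for i in range(n))
print(numpy.isclose(matrix_form, sum_form))                        # True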

Least Squares Regression

• The optimization model:
    min_{w̃ ∈ ℝ^{d+1}}  Σ_{i=1}^n ( x̃_i^T w̃ − y_i )².

  Matrix form:
    min_{w ∈ ℝ^{d+1}}  ‖ X w − y ‖_2²,
  writing X for X̃ and w for w̃ from here on.

• Tasks, methods, and algorithms:
    Task: linear regression.
    Methods for linear regression: least squares regression, LASSO, least absolute deviations.
    Algorithms for least squares regression: analytical solution, gradient descent (GD), conjugate gradient (CG).

Least Squares Regression

• Solve the optimization model:
    min_{w ∈ ℝ^{d+1}}  ‖ X w − y ‖_2².

• Gradient:
    ∂‖Xw − y‖_2² / ∂w = 2 ( X^T X w − X^T y ).

• First-order optimality condition: set the gradient to zero,
    2 ( X^T X w − X^T y ) = 0.

Analytical Solution

• Normal equation:  X^T X w⋆ = X^T y.
• Assume X^T X is full rank. The analytical solution is
    w⋆ = ( X^T X )^{-1} X^T y.

Gradient Descent (GD)

• Gradient descent repeats:
    1. Compute the gradient:  g_t = X^T X w_t − X^T y.
    2. Update:  w_{t+1} = w_t − α g_t.
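
A minimal sketch of this gradient descent loop (the placeholder data, the step size α, and the iteration count are made up; the factor 2 in the gradient is folded into α, as on the slide):

import numpy

rng = numpy.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.normal(size=100)

# precompute X^T X and X^T y, since both appear in every gradient evaluation
xtx = X.T.dot(X)
xty = X.T.dot(y)

w = numpy.zeros(5)
alpha = 1.0 / numpy.linalg.norm(xtx, 2)   # a conservative step size based on the largest eigenvalue
for t in range(500):
    g = xtx.dot(w) - xty                  # gradient direction
    w = w - alpha * g                     # update

print(numpy.allclose(w, numpy.linalg.lstsq(X, y, rcond=None)[0]))   # close to the least squares solution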


Least Squares Regression

• Solve the optimization model:
    min_{w ∈ ℝ^{d+1}}  ‖ X w − y ‖_2².

• Convergence: after O( κ log(1/ε) ) iterations,
    ‖ X ( w_t − w⋆ ) ‖_2  ≤  ε ‖ X ( w_0 − w⋆ ) ‖_2,
  where κ = λ_max( X^T X ) / λ_min( X^T X ) is the condition number.

Conjugate Gradient (CG)

• CG is another iterative solver for this problem; its pseudo-code is available on Wikipedia.
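
Rather than hand-coding CG, the normal equations X^T X w = X^T y can be handed to scipy.sparse.linalg.cg, which implements CG for symmetric positive definite systems; a minimal sketch with made-up placeholder data:

import numpy
from scipy.sparse.linalg import cg

rng = numpy.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.normal(size=100)

# CG solves A w = b for symmetric positive definite A; here A = X^T X and b = X^T y
w_cg, info = cg(X.T.dot(X), X.T.dot(y))
print(info)                                                          # 0 means successful convergence
print(numpy.allclose(w_cg, numpy.linalg.lstsq(X, y, rcond=None)[0]))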

Least Squares Regression

• Solve the optimization model:
    min_{w ∈ ℝ^{d+1}}  ‖ X w − y ‖_2².

• Recap of tasks, methods, and algorithms:
    Task: linear regression.
    Methods for linear regression: least squares regression, LASSO, least absolute deviations.
    Algorithms for least squares regression: analytical solution, gradient descent (GD), conjugate gradient (CG).

Solve Least Squares in Python
1. Load Data

In [5]:
from keras.datasets import boston_housing

# Boston housing data: 404 training and 102 test samples, 13 features each
(x_train, y_train), (x_test, y_test) = boston_housing.load_data()

print('shape of x_train: ' + str(x_train.shape))
print('shape of x_test: ' + str(x_test.shape))
print('shape of y_train: ' + str(y_train.shape))
print('shape of y_test: ' + str(y_test.shape))

shape of x_train: (404, 13)
shape of x_test: (102, 13)
shape of y_train: (404,)
shape of y_test: (102,)

2. Add A Feature

In [14]:
import numpy

# append a constant-1 column so the intercept is absorbed into the weight vector
n, d = x_train.shape
xbar_train = numpy.concatenate((x_train, numpy.ones((n, 1))), axis=1)

print('shape of x_train: ' + str(x_train.shape))
print('shape of xbar_train: ' + str(xbar_train.shape))

shape of x_train: (404, 13)
shape of xbar_train: (404, 14)
3. Solve the Least Squares

Analytical solution:  w̃ = ( X̃^T X̃ )^{-1} X̃^T y

In [15]:
# the analytical solution
xx = numpy.dot(xbar_train.T, xbar_train)   # X~^T X~
xx_inv = numpy.linalg.pinv(xx)             # (X~^T X~)^{-1} via the pseudo-inverse
xy = numpy.dot(xbar_train.T, y_train)      # X~^T y
w = numpy.dot(xx_inv, xy)                  # w~ = (X~^T X~)^{-1} X~^T y

Training Mean Squared Error (MSE):  (1/n) ‖ y − X̃ w̃ ‖_2²

In [21]:
# mean squared error (training)
y_lsr = numpy.dot(xbar_train, w)
diff = y_lsr - y_train
mse = numpy.mean(diff * diff)
print('Train MSE: ' + str(mse))

Train MSE: 22.00480083834814
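
As a side note, forming the pseudo-inverse of X̃^T X̃ explicitly is not the most numerically stable route; numpy.linalg.lstsq solves the same least squares problem directly. A minimal alternative sketch, assuming xbar_train, y_train, and w from the cells above:

# alternative: let numpy solve the least squares problem directly
w_lstsq, residuals, rank, sv = numpy.linalg.lstsq(xbar_train, y_train, rcond=None)
print(numpy.allclose(w, w_lstsq))   # should match the analytical solution computed above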
Linear Regression for Housing Price

• Train: feed the training features x_i and labels y_i to the linear regressor to learn w̃.
• Predict: given the features of a house x′, extend it to x̃′ and predict its price as w̃^T x̃′ (e.g., $500K).
= numpy.dot(xbar_train.T, Housing Price 4. Make Prediction for Test Samples
In [22]:

w = numpy.dot(xx_inv, xy) print(y_train[0:10])


Add
[15.2 •42.3 a
feature
50. to the
21.1 17.7 18.5test feature
11.3 matrix:
15.6 15.6 0 ;<=; .
14.4]𝐗 ;<=; è 𝐗
Linear Price: • Make prediction by: 𝐲>?<@ = 𝐗 0 ;<=; 𝐰.
0
In [21]:
Regressor 0 # 𝐱. : = $500K
𝐰 In [23]:
𝐰0
# mean squared error (training) Predict # prediction for the test samples
Features of a House, 𝐱′
n_test, _ = x_test.shape
è y_lsr
Extend=it to
numpy.dot(xbar_train,
𝐱′ w)
xbar_test = numpy.concatenate((x_test, numpy.ones((n_test, 1))), axis=1)
diff = y_lsr - y_train y_pred = numpy.dot(xbar_test, w)
mse = numpy.mean(diff * diff)
print('Train MSE: ' + str(mse)) In [25]:

# mean squared error (testing)


Train MSE: 22.00480083834814
diff = y_pred - y_test
mse = numpy.mean(diff * diff)
In [22]: print('Test MSE: ' + str(mse))
49 print(y_train[0:10]) 50Test MSE: 23.195599256409857

[15.2 42.3 50. 21.1 17.7 18.5 11.3 15.6 15.6 14.4] In [26]:
# Baseline
In [23]:
y_mean = numpy.mean(y_train)
# prediction for the test samples
4. Make Prediction for Test Samples 4. Make Prediction for Test Samples
diff = y_pred - y_mean
mse = numpy.mean(diff * diff)
n_test, _ = x_test.shape print('Test MSE: ' + str(mse))
• Add a feature
xbar_test = numpy.concatenate((x_test, 0 ;<=; .
to the test feature matrix: 𝐗 ;<=;numpy.ones((n_test,
è𝐗 1))), axis=1)
Test MSE: 57.38297638530044
y_pred = numpy.dot(xbar_test, 0 w)0
• Make prediction by: 𝐲>?<@ = 𝐗 ;<=; 𝐰.
! Also known as out-of-sample error.
" Test MSE is 23.2
• MSE (test): 𝐲>?<@ − 𝐲;<=;
In [25]: & '()' !
The gap is an indicator of generalization.
# mean squared error (testing)

diff = y_pred - y_test Training MSE is 22.0


mse = numpy.mean(diff * diff) Also known as in-sample error.
print('Test MSE: ' + str(mse))

Test MSE: 23.195599256409857 Training MSE is 22.0
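
As a cross-check (not part of the original notebook), the same model can be fit with scikit-learn, assuming x_train, y_train, x_test, and y_test from above; LinearRegression handles the intercept itself, so the raw features are passed in:

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

model = LinearRegression()   # fits the intercept internally, no need to append the column of 1s
model.fit(x_train, y_train)

print(mean_squared_error(y_train, model.predict(x_train)))   # should be close to the training MSE above
print(mean_squared_error(y_test, model.predict(x_test)))     # should be close to the test MSE above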

5. Compare with Baseline

Trivial baseline: whatever the features are, the prediction is mean(y).

In [26]:
# Baseline: predict the training mean for every sample
y_mean = numpy.mean(y_train)
# note: as written, this measures how far the least squares predictions are from the
# constant baseline; comparing y_mean with y_test would give the baseline's own test MSE
diff = y_pred - y_mean
mse = numpy.mean(diff * diff)
print('Test MSE: ' + str(mse))

Test MSE: 57.38297638530044

For comparison, the test MSE of least squares is 23.19.

Summary

• The linear regression problem.
• The least squares model.
• 3 algorithms for solving the model.
• Making predictions for never-seen-before test data.
• Evaluation of the model (training MSE and test MSE).
• Comparison with baselines.