Lecture 5: Least Squares and Linear Regression
Semester 2, 2021/2022
Acknowledgement:
EE2211 development team
(Thomas, Kar-Ann, Helen, Vincent, Chen Khong, Robby, and Haizhou)
Least Squares and Linear Regression
Module II Contents
• Notations, Vectors, Matrices (introduced in L3)
• Operations on Vectors and Matrices
• Systems of Linear Equations
• Sets and Functions
• Derivative and Gradient
• Least Squares, Linear Regression
• Linear Regression with Multiple Outputs
• Linear Regression for Classification
• Ridge Regression
• Polynomial Regression
Recap: Linear and Affine Functions
Linear function
A function f: R^d → R is linear if it satisfies the following two properties (superposition):
• Homogeneity: f(αx) = αf(x) for any scalar α
• Additivity: f(x + y) = f(x) + f(y)

Affine function
f(x) = aᵀx + b, where the scalar b is called the offset (or bias).
Ref: [Book4] Stephen Boyd and Lieven Vandenberghe, “Introduction to Applied Linear Algebra”, Cambridge University Press, 2018 (p31)
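As a quick numerical check (a minimal illustrative sketch, not part of the lecture demos), a linear map satisfies superposition while an affine map with a nonzero offset does not:

import numpy as np

a, b = np.array([2.0, -1.0, 0.5]), 3.0      # coefficients and offset
f_lin = lambda x: a @ x                     # linear:  f(x) = a^T x
f_aff = lambda x: a @ x + b                 # affine:  f(x) = a^T x + b

x, y, alpha, beta = np.random.randn(3), np.random.randn(3), 0.7, -1.2

# superposition f(ax + by) == a f(x) + b f(y) holds only for the linear map
print(np.isclose(f_lin(alpha*x + beta*y), alpha*f_lin(x) + beta*f_lin(y)))  # True
print(np.isclose(f_aff(alpha*x + beta*y), alpha*f_aff(x) + beta*f_aff(y)))  # False: the offset breaks it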
Functions: Maximum and Minimum
Figure: a local and a global minimum of a function.
Functions: Maximum and Minimum
Max and Arg Max
• Given a set of values X = {x_1, x_2, …, x_N},
• the operator max_{x∈X} f(x) returns the highest value of f(x) over all elements of the set X,
• the operator arg max_{x∈X} f(x) returns the element of the set X that maximizes f(x).
• When the set is implicit or infinite, we can simply write max_x f(x) or arg max_x f(x).

E.g. f(x) = 3x, x ∈ [0,1]  →  max_x f(x) = 3 and arg max_x f(x) = 1

Note: arg max returns a value from the domain of the function, and max returns a value from the range (codomain) of the function.
Ref: [Book1] Andriy Burkov, “The Hundred-Page Machine Learning Book”, 2019 (p6-7 of chp2).
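The same distinction shows up directly in NumPy (a small illustrative sketch, not one of the lecture demos): np.max returns the value attained, while np.argmax returns the position of the element that attains it.

import numpy as np

x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # a finite subset of [0, 1]
f = 3 * x                                   # f(x) = 3x evaluated on the set

print(np.max(f))          # 3.0  -> the maximum value (from the range)
print(x[np.argmax(f)])    # 1.0  -> the maximizer    (from the domain)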
Derivative and Gradient
• The derivative f′ of a function f is a function that describes how fast f grows (or decreases)
  – If the derivative is a constant value, e.g. 5 or −3,
    • the function f grows (or decreases) constantly at any point x of its domain
  – When the derivative f′ is itself a function,
    • if f′ is positive at some x, then the function f grows at this point
    • if f′ is negative at some x, then the function f decreases at this point
    • a derivative of zero at x means that the function's slope at x is horizontal (e.g. at maximum or minimum points)
Ref: [Book1] Andriy Burkov, “The Hundred-Page Machine Learning Book”, 2019 (p8 of chp2).
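To make this concrete, the sign of a numerical derivative tells you whether the function is rising or falling at a point. A minimal finite-difference sketch (the helper and step size h are my own illustration, not lecture code):

import numpy as np

f = lambda x: (x - 1.0) ** 2          # minimum at x = 1

def derivative(f, x, h=1e-6):
    # central finite-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

print(derivative(f, 0.0))   # ~ -2.0 -> f decreases at x = 0
print(derivative(f, 2.0))   # ~  2.0 -> f grows at x = 2
print(derivative(f, 1.0))   # ~  0.0 -> horizontal slope at the minimum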
Derivative and Gradient
The gradient of a function is a vector of partial derivatives.

Differentiation of a scalar function w.r.t. a vector
If f(x) is a scalar function of d variables and x is a d×1 vector, then differentiation of f(x) w.r.t. x results in the d×1 vector

  ∂f(x)/∂x = [ ∂f/∂x_1
                  ⁞
               ∂f/∂x_d ]

This is referred to as the gradient of f(x) and is often written as ∇_x f.

E.g. f(x) = a·x_1 + b·x_2  →  ∇_x f = [ a
                                        b ]

Ref: Duda, Hart, and Stork, "Pattern Classification", 2001 (Appendix)
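A quick numerical sanity check of this example (an illustrative sketch; the finite-difference helper is my own, not from the lecture):

import numpy as np

a, b = 2.0, -3.0
f = lambda x: a * x[0] + b * x[1]          # f(x) = a*x1 + b*x2

def gradient(f, x, h=1e-6):
    # finite-difference approximation of the gradient, one coordinate at a time
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

print(gradient(f, np.array([0.5, 1.5])))   # ~ [ 2. -3.] = [a, b]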
Derivative and Gradient
Partial Derivatives

Differentiation of a vector function w.r.t. a vector
If g(x) is a vector function of size h×1 and x is a d×1 vector, then differentiation of g(x) w.r.t. x results in the h×d matrix

  ∂g(x)/∂x = [ ∂g_1/∂x_1  ⋯  ∂g_1/∂x_d
                   ⁞      ⋱      ⁞
               ∂g_h/∂x_1  ⋯  ∂g_h/∂x_d ]

This matrix is referred to as the Jacobian of g(x).
Ref: Duda, Hart, and Stork, “Pattern Classification”, 2001 (Appendix)
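The same finite-difference idea extends to vector functions (again an illustrative sketch, not lecture code): perturbing each input coordinate fills one column of the h×d Jacobian.

import numpy as np

g = lambda x: np.array([x[0] * x[1], x[0] + 3 * x[1]])   # h=2 outputs, d=2 inputs

def jacobian(g, x, h=1e-6):
    cols = []
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        cols.append((g(x + e) - g(x - e)) / (2 * h))     # column i: dg/dx_i
    return np.column_stack(cols)                         # h x d matrix

print(jacobian(g, np.array([2.0, 5.0])))   # ~ [[5. 2.] [1. 3.]]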
Derivative and Gradient
Some Vector-Matrix Differentiation Formulae

  ∂(Ax)/∂x = Aᵀ

  ∂(aᵀx)/∂x = a          ∂(yᵀAx)/∂x = Aᵀy

  ∂(xᵀAx)/∂x = (A + Aᵀ)x
Derivations: https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf
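A minimal sketch verifying the quadratic-form rule numerically (the finite-difference check is my own illustration):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

f = lambda x: x @ A @ x                    # f(x) = x^T A x
analytic = (A + A.T) @ x                   # the formula above

h = 1e-6
numeric = np.array([(f(x + h*e) - f(x - h*e)) / (2*h)
                    for e in np.eye(3)])   # finite-difference gradient

print(np.allclose(analytic, numeric, atol=1e-4))   # True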
Linear Regression
Ref: [Book1] Andriy Burkov, “The Hundred-Page Machine Learning Book”, 2019 (p3 of chp3).
Linear Regression
Problem Statement: To predict the unknown y for a given x (testing)

• We have a collection of labeled examples (training) {(x_i, y_i)}, i = 1, …, m
  – m is the size of the collection
  – x_i is the d-dimensional feature vector of example i = 1, …, m (input)
  – y_i is a real-valued (1-D) target output scalar
  – Note:
    • when y_i is continuous-valued, it is a regression problem
    • when y_i is discrete-valued, it is a classification problem
Linear Regression
Learning objective function (using simplified notation hereon)

• To find the optimal values w* that minimize the following expression:

  Σ_{i=1}^{m} (f_w(x_i) − y_i)²

  with f_w(x_i) = x_iᵀw,

  where we define w = [b, w_1, …, w_d]ᵀ = [w_0, w_1, …, w_d]ᵀ (the offset b becomes w_0),
  and x_i = [1, x_{i,1}, …, x_{i,d}]ᵀ = [x_{i,0}, x_{i,1}, …, x_{i,d}]ᵀ, for i = 1, …, m.
Linear Regression    Σ_{i=1}^{m} (f_w(x_i) − y_i)²

• All model-based learning algorithms have a loss function.
• What we do to find the best model is to minimize the objective, known as the cost function.
• The cost function is a sum of loss functions over the training set, plus possibly a model-complexity penalty (regularization).
Linear Regression
Learning (Training)

• Consider the set of feature vectors x_i and target outputs y_i indexed by i = 1, …, m. The linear model f_w(x) = xᵀw can be stacked over all m samples as

  f_w(X) = Xw = ŷ,

with target vector y = [y_1, …, y_m]ᵀ, design matrix X = [x_1ᵀ; ⁞; x_mᵀ] (one row per sample), row x_iᵀ = [1, x_{i,1}, …, x_{i,d}], and weight vector w = [b, w_1, …, w_d]ᵀ. A sketch of building X follows below.

Note: The bias/offset term is responsible for translating the line/plane/hyperplane away from the origin.
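Building the stacked design matrix X (with the leading column of ones for the bias) is one line in NumPy. A minimal sketch, assuming the raw features sit in a matrix F (a name introduced here for illustration):

import numpy as np

F = np.array([[-9.], [-7.], [-5.], [1.], [5.], [9.]])   # m x d raw features
X = np.hstack([np.ones((F.shape[0], 1)), F])            # prepend the bias column
print(X.shape)   # (6, 2): each row is [1, x_i1, ..., x_id]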
Linear Regression
Example 1: one input, one output

Training set {(x_i, y_i)}, i = 1, …, 6:
  {x = −9} → {y = −6}
  {x = −7} → {y = −6}
  {x = −5} → {y = −4}
  {x =  1} → {y = −1}
  {x =  5} → {y =  1}
  {x =  9} → {y =  4}

Stacked as Xw = y:

  X = [ 1  −9 ]     w = [ w_0 ]     y = [ −6 ]
      [ 1  −7 ]         [ w_1 ]         [ −6 ]
      [ 1  −5 ]                         [ −4 ]
      [ 1   1 ]                         [ −1 ]
      [ 1   5 ]                         [  1 ]
      [ 1   9 ]                         [  4 ]

This over-determined system of linear equations has no exact solution. However, XᵀX is invertible, so the least-squares approximation is

  ŵ = X†y = (XᵀX)⁻¹Xᵀy
    = [  6   −6 ]⁻¹ [  1   1   1  1  1  1 ] [ −6  −6  −4  −1  1  4 ]ᵀ
      [ −6  262 ]   [ −9  −7  −5  1  5  9 ]
    = [ −1.4375 ]   (offset)
      [  0.5625 ]   (slope)
Linear Regression

The fitted model is ŷ = Xŵ with ŵ = [−1.4375, 0.5625]ᵀ, i.e.

  y = −1.4375 + 0.5625x

Prediction:
Test set {x = −1} → {y = ?}

  ŷ = [ 1  −1 ] [ −1.4375 ]  = −2
                [  0.5625 ]

Python demo 1 (linear regression on one-dimensional samples). The demo code from the slide, cleaned up into runnable form:

import numpy as np
from numpy.linalg import inv
from sklearn.metrics import mean_squared_error

# training data: each row of X is [1, x_i]; y holds the targets
X = np.array([[1, -9], [1, -7], [1, -5], [1, 1], [1, 5], [1, 9]])
y = np.array([[-6], [-6], [-4], [-1], [1], [4]])

# learn model: w = (X^T X)^{-1} X^T y
w = inv(X.T @ X) @ X.T @ y
print(w)                       # [[-1.4375] [ 0.5625]]

# predict the new sample x = -1
X_new = np.array([[1, -1]])
y_new = X_new @ w
print(y_new)                   # [[-2.]]

# compute mean squared error between targets and the model's predictions
y_pred = X @ w
MSE = mean_squared_error(y, y_pred)
print(MSE)
Linear Regression
Example 2: multiple inputs (a vector), one output

Training set {(x_i, y_i)}, i = 1, …, 4:
  {x_1 = 1, x_2 =  1, x_3 = 1} → {y =  1}
  {x_1 = 1, x_2 = −1, x_3 = 1} → {y =  0}
  {x_1 = 1, x_2 =  1, x_3 = 3} → {y =  2}
  {x_1 = 1, x_2 =  1, x_3 = 0} → {y = −1}

Stacked as Xw = y (the constant first feature acts as the bias input):

  X = [ 1   1  1 ]     w = [ w_1 ]     y = [  1 ]
      [ 1  −1  1 ]         [ w_2 ]         [  0 ]
      [ 1   1  3 ]         [ w_3 ]         [  2 ]
      [ 1   1  0 ]                         [ −1 ]

This set of linear equations has no exact solution. However, XᵀX is invertible, so the least-squares approximation is

  ŵ = X†y = (XᵀX)⁻¹Xᵀy
    = [ 4  2   5 ]⁻¹ [ 1   1  1  1 ] [ 1  0  2  −1 ]ᵀ
      [ 2  4   3 ]   [ 1  −1  1  1 ]
      [ 5  3  11 ]   [ 1   1  3  0 ]
    = [ −0.7500  0.1786  0.9286 ]ᵀ
Linear Regression
Prediction:

Test set:
  1st: {x_1 = 1, x_2 = 6, x_3 =  8} → {y = ?}
  2nd: {x_1 = 1, x_2 = 0, x_3 = −1} → {y = ?}

  ŷ = X_new ŵ

  ŷ = [ 1  6   8 ] [ −0.7500 ]  = [  7.7500 ]
      [ 1  0  −1 ] [  0.1786 ]    [ −1.6786 ]
                   [  0.9286 ]
Linear Regression
Learning of Vectored Function (Multiple Outputs)

For one sample: a linear model f_W(x) = xᵀW (a vector function).
For m samples: f_W(X) = XW = Ŷ.

Stacking the m samples (one row each) and the h outputs (one column each):

  X = [ x_1ᵀ ]   [ 1  x_{1,1} … x_{1,d} ]
      [  ⁞   ] = [ ⁞     ⁞    ⋱    ⁞    ]
      [ x_mᵀ ]   [ 1  x_{m,1} … x_{m,d} ]

  W = [ w_{0,1} … w_{0,h} ]      Y = [ y_{1,1} … y_{1,h} ]  ← sample 1's outputs
      [ w_{1,1} … w_{1,h} ]          [    ⁞    ⋱    ⁞    ]
      [    ⁞    ⋱    ⁞    ]          [ y_{m,1} … y_{m,h} ]  ← sample m's outputs
      [ w_{d,1} … w_{d,h} ]

with m samples, d features (plus the offset row w_0 in W), and h outputs:

  X ∈ R^{m×(d+1)},  W ∈ R^{(d+1)×h},  Y ∈ R^{m×h}
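The least-squares solution keeps exactly the same form as before; W is solved for all output columns at once. A minimal sketch with the shapes above (the data here is hypothetical, not from the lecture):

import numpy as np

m, d, h = 6, 2, 3
rng = np.random.default_rng(1)
X = np.hstack([np.ones((m, 1)), rng.standard_normal((m, d))])  # m x (d+1)
Y = rng.standard_normal((m, h))                                # m x h targets

W = np.linalg.inv(X.T @ X) @ X.T @ Y    # (d+1) x h: one column per output
print(W.shape)                          # (3, 3)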
Linear Regression
Objective:  Σ_{i=1}^{m} ‖f_W(x_i) − y_i‖² = eᵀe, where e stacks all prediction errors into one vector.
Ref: Hastie, Tibshirani, Friedman, “The Elements of Statistical Learning”, (2nd ed., 12th printing) 2017 (chp.3.2.4)
Linear Regression
Writing the errors as the m×h matrix E = Y − XW, with columns e_1, …, e_h (one per output):

  J(W) = trace(EᵀE)
       = trace( [ e_1ᵀ ]
                [  ⁞   ] [ e_1  e_2 … e_h ] )
                [ e_hᵀ ]
       = Σ_{j=1}^{h} e_jᵀe_j,  the total sum of squared errors over all outputs.
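A two-line numerical check that trace(EᵀE) is just the total sum of squared errors (an illustrative sketch):

import numpy as np

E = np.random.default_rng(2).standard_normal((6, 3))   # m x h error matrix
print(np.trace(E.T @ E), (E ** 2).sum())               # same value, up to float rounding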
Linear Regression of multiple outputs
Example 3

Training set {(x_i, y_i)}, i = 1, …, 4:
  {x_1 = 1, x_2 =  1, x_3 = 1} → {y_1 =  1, y_2 =  0}
  {x_1 = 1, x_2 = −1, x_3 = 1} → {y_1 =  0, y_2 =  1}
  {x_1 = 1, x_2 =  1, x_3 = 3} → {y_1 =  2, y_2 = −1}
  {x_1 = 1, x_2 =  1, x_3 = 0} → {y_1 = −1, y_2 =  3}

Stacked as XW = Y (the first column of X is the bias input):

  X = [ 1   1  1 ]     W = [ w_{1,1}  w_{1,2} ]     Y = [  1   0 ]
      [ 1  −1  1 ]         [ w_{2,1}  w_{2,2} ]         [  0   1 ]
      [ 1   1  3 ]         [ w_{3,1}  w_{3,2} ]         [  2  −1 ]
      [ 1   1  0 ]                                      [ −1   3 ]

This set of linear equations has NO exact solution. However, XᵀX is invertible, so the least-squares approximation is

  Ŵ = X†Y = (XᵀX)⁻¹XᵀY
     = [ 4  2   5 ]⁻¹ [ 1   1  1  1 ] [  1   0 ]
       [ 2  4   3 ]   [ 1  −1  1  1 ] [  0   1 ]
       [ 5  3  11 ]   [ 1   1  3  0 ] [  2  −1 ]
                                      [ −1   3 ]
     = [ −0.75     2.25   ]
       [  0.1786   0.0357 ]
       [  0.9286  −1.2143 ]
Linear Regression of multiple outputs
Example 3

Prediction:
Test set: two new samples
  {x_1 = 1, x_2 = 6, x_3 =  8} → {y_1 = ?, y_2 = ?}
  {x_1 = 1, x_2 = 0, x_3 = −1} → {y_1 = ?, y_2 = ?}

  Ŷ = X_new Ŵ

  Ŷ = [ 1  6   8 ] [ −0.75     2.25   ]  = [  7.75    −7.25   ]
      [ 1  0  −1 ] [  0.1786   0.0357 ]    [ −1.6786   3.4643 ]
                   [  0.9286  −1.2143 ]
Python demo 2
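The code residue on the Example 3 slide suggests demo 2 follows the same pattern as demo 1; a reconstruction (the variable names follow the demo style but are not verbatim from the slide):

import numpy as np
from numpy.linalg import inv

X = np.array([[1, 1, 1], [1, -1, 1], [1, 1, 3], [1, 1, 0]])
Y = np.array([[1, 0], [0, 1], [2, -1], [-1, 3]])

W = inv(X.T @ X) @ X.T @ Y          # least-squares weights, one column per output
X_new = np.array([[1, 6, 8], [1, 0, -1]])
print(X_new @ W)                    # ~ [[ 7.75 -7.25] [-1.6786 3.4643]]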
Linear Regression of multiple outputs

Example 4

The values of feature x and their corresponding values of multiple outputs form the training set (the training table and the plot on this slide are not reproduced here). The same least-squares solution gives

  Ŵ = X†Y = (XᵀX)⁻¹XᵀY = [  1.9     3.6 ]
                          [ −0.4667  0.5 ]

i.e. the two fitted lines ŷ_1 = 1.9 − 0.4667x and ŷ_2 = 3.6 + 0.5x.

Prediction for a new sample with x = 2:

  Ŷ_new = X_new Ŵ = [ 1  2 ] Ŵ = [ 0.9667  4.6 ]

Python demo 3 (fits both outputs and plots the training points and the two fitted lines).
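Since the training table itself is not recoverable from the slide, here is a demo-3-style sketch with hypothetical data (NOT the lecture's numbers) showing the fit-and-plot pattern the residue suggests:

import numpy as np
from numpy.linalg import inv
import matplotlib.pyplot as plt

# hypothetical 1-D inputs with two outputs each
x = np.array([0., 1., 3., 4.])
X = np.hstack([np.ones((4, 1)), x[:, None]])          # bias column + feature
Y = np.array([[2.1, 3.5], [1.4, 4.2], [0.3, 5.0], [0.1, 5.7]])

W = inv(X.T @ X) @ X.T @ Y                            # one fitted line per output column

plt.plot(x, Y[:, 0], 'o', label='Y1'); plt.plot(x, X @ W[:, 0], label='fit Y1')
plt.plot(x, Y[:, 1], 's', label='Y2'); plt.plot(x, X @ W[:, 1], label='fit Y2')
plt.xlabel('x'); plt.ylabel('y'); plt.legend(); plt.show()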
Summary
• Notations, Vectors, Matrices    [Midterm: 1 hour, covers L1 to L5]
• Operations on Vectors and Matrices
  • Dot-product, matrix inverse
• Systems of Linear Equations (Xw = y)
  • Matrix-vector notation, linear dependency, invertibility
  • Even-, over-, under-determined linear systems
• Functions, Derivative and Gradient
  • Inner product, linear/affine functions
  • Maximum and minimum, partial derivatives, gradient
• Least Squares, Linear Regression
  • Objective function, loss function
  • Least-squares solution, training/learning and testing/prediction
  • Linear regression with multiple outputs
    Learning/training:   ŵ = (X_trainᵀ X_train)⁻¹ X_trainᵀ y_train
    Prediction/testing:  ŷ_test = X_test ŵ
• Classification
• Ridge Regression
• Polynomial Regression

Python packages: numpy, pandas, matplotlib.pyplot, numpy.linalg, sklearn.metrics (for mean_squared_error), and numpy.linalg.pinv
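Pulling the training and prediction steps together in one compact sketch (numpy.linalg.pinv, listed above, handles the (XᵀX)⁻¹Xᵀ step in a single call), using the Example 1 data:

import numpy as np

X_train = np.array([[1, -9], [1, -7], [1, -5], [1, 1], [1, 5], [1, 9]])
y_train = np.array([-6, -6, -4, -1, 1, 4])

w = np.linalg.pinv(X_train) @ y_train      # learning/training: w = X^+ y
X_test = np.array([[1, -1]])
print(X_test @ w)                          # prediction/testing: [-2.]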