Machine Learning II: The Linear Model
Table of contents
Linear Regression
Logistic Regression
Input representation: one can use the raw input, $x = (x_0, x_1, x_2, \cdots, x_{256})$ (e.g. 256 pixel values plus the constant coordinate $x_0$), or a small number of extracted features, $x = (x_0, x_1, x_2)$.
[Figure: the data plotted in the extracted feature space $(x_1, x_2)$ for $x = (x_0, x_1, x_2)$.]
Recap: the perceptron. The hypothesis is $h(x) = \operatorname{sign}(w^T x)$. Given training data $(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)$, the perceptron learning algorithm repeatedly picks a point that is currently misclassified, i.e. one with $\operatorname{sign}(w^T x_n) \neq y_n$, and updates

$$w \leftarrow w + y_n x_n.$$

[Figure: the update vector $w + yx$ rotates $w$ toward $x$ when $y = +1$ and away from $x$ when $y = -1$.]
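A minimal NumPy sketch of the algorithm (the function name and data layout are illustrative; the first column of `X` is assumed to be the constant coordinate $x_0 = 1$):

```python
import numpy as np

def perceptron(X, y, max_iters=1000):
    """Perceptron learning algorithm.

    X: (N, d+1) array of inputs, first column is the constant 1.
    y: (N,) array of labels in {-1, +1}.
    Returns w with sign(X @ w) == y if the data are linearly
    separable and max_iters is large enough.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):
        misclassified = np.flatnonzero(np.sign(X @ w) != y)
        if misclassified.size == 0:
            return w                # all points classified correctly
        n = misclassified[0]        # pick a misclassified point
        w = w + y[n] * X[n]         # the PLA update: w <- w + y_n x_n
    return w                        # may not separate if data are not separable
```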
Linear Regression
The linear model uses input $x = (x_0, x_1, \cdots, x_d)$ with the convention $x_0 = 1$, and the hypothesis

$$h(x) = \sum_{i=0}^{d} w_i x_i = w^T x.$$
In linear regression the training data are $(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)$, where each target $y_n \in \mathbb{R}$ is a real value associated with the input $x_n$.
How to measure the error in linear regression

The squared error on a single point is $(h(x_n) - y_n)^2$, so the in-sample error is given by

$$E_{\text{in}}(w) = \frac{1}{N} \sum_{n=1}^{N} \left( h(x_n) - y_n \right)^2,$$

or, equivalently, by substituting $h(x_n) = w^T x_n$,

$$E_{\text{in}}(w) = \frac{1}{N} \sum_{n=1}^{N} \left( y_n - w^T x_n \right)^2.$$
In matrix form,

$$E_{\text{in}}(w) = \frac{1}{N} \sum_{n=1}^{N} \left( y_n - w^T x_n \right)^2 = \frac{1}{N} \, \| y - Xw \|_2^2,$$

where $\|\cdot\|_2$ is the Euclidean norm of a vector,

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} \in \mathbb{R}^N
\qquad\text{and}\qquad
X = \begin{bmatrix} \text{--- } x_1^T \text{ ---} \\ \text{--- } x_2^T \text{ ---} \\ \vdots \\ \text{--- } x_N^T \text{ ---} \end{bmatrix} \in \mathbb{R}^{N \times (d+1)}.$$
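To make the pointwise and matrix forms concrete, here is a small NumPy check on random data (a sketch; the names and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 100, 3
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d))])  # x_0 = 1 column
y = rng.normal(size=N)
w = rng.normal(size=d + 1)

# E_in as an average of squared errors over the data points
e_sum = np.mean([(y[n] - w @ X[n]) ** 2 for n in range(N)])

# E_in as a squared Euclidean norm
e_norm = np.linalg.norm(y - X @ w) ** 2 / N

assert np.isclose(e_sum, e_norm)   # the two forms agree
```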
Minimizing Ein

Expanding the quadratic:

$$
\begin{aligned}
E_{\text{in}}(w) &= \frac{1}{N} \sum_{n=1}^{N} (w^T x_n - y_n)^2 \\
&= \frac{1}{N} \| y - Xw \|_2^2 \\
&= \frac{1}{N} (y - Xw)^T (y - Xw) \qquad (\| v \|_2^2 = v^T v) \\
&= \frac{1}{N} \left( y^T y - 2 w^T X^T y + w^T X^T X w \right).
\end{aligned}
$$
We need to solve the following unconstrained optimization problem:

$$\min_{w \in \mathbb{R}^{d+1}} E_{\text{in}}(w).$$

Before doing so, we review the multivariate calculus needed to characterize its minimizers.
Multivariate calculus - Gradient

For a differentiable function $r : \mathbb{R}^d \to \mathbb{R}$, the gradient at $z$ is the vector of first partial derivatives,

$$\nabla r(z) = \left( \frac{\partial r}{\partial z_1}, \cdots, \frac{\partial r}{\partial z_d} \right)^T.$$

Multivariate calculus - Hessian

The Hessian at $z$ is the $d \times d$ matrix of second partial derivatives,

$$\left[ \nabla^2 r(z) \right]_{ij} = \frac{\partial^2 r}{\partial z_i \, \partial z_j},$$

where $i, j = 1, \ldots, d$.
If $r$ is single-valued and the partial derivatives $\frac{\partial r}{\partial z_i}$, $\frac{\partial r}{\partial z_j}$ and $\frac{\partial^2 r}{\partial z_i \partial z_j}$ are continuous, then $\frac{\partial^2 r}{\partial z_j \partial z_i}$ exists and $\frac{\partial^2 r}{\partial z_i \partial z_j} = \frac{\partial^2 r}{\partial z_j \partial z_i}$; the Hessian is therefore symmetric.
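A quick symbolic illustration of this symmetry, using SymPy on an arbitrary smooth test function (the choice of $r$ below is illustrative, not from the slides):

```python
import sympy as sp

z1, z2 = sp.symbols('z1 z2')
r = sp.sin(z1) * sp.exp(z2) + z1 * z2**2   # a sample smooth function

d12 = sp.diff(r, z1, z2)   # differentiate in z1, then in z2
d21 = sp.diff(r, z2, z1)   # differentiate in z2, then in z1
assert sp.simplify(d12 - d21) == 0   # the mixed partials coincide
print(d12)                 # exp(z2)*cos(z1) + 2*z2
```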
Convex functions

A twice-differentiable function $r$ is convex when its Hessian $\nabla^2 r(z)$ is positive semidefinite for every $z$.

Optimality conditions for unconstrained optimization

For a differentiable convex function $r$, a point $z^*$ is a global minimizer if and only if $\nabla r(z^*) = 0$.
Minimizing Ein

$E_{\text{in}}$ is convex (its Hessian $\frac{2}{N} X^T X$ is positive semidefinite), so it suffices to set the gradient to zero:

$$
\begin{aligned}
\nabla E_{\text{in}}(w) = 0
&\iff \frac{2}{N} \left( X^T X w - X^T y \right) = 0 \\
&\iff X^T X w = X^T y \\
&\iff w = (X^T X)^{-1} X^T y \quad \text{(if } X^T X \text{ is invertible)}.
\end{aligned}
$$
The linear regression algorithm. Construct the matrix $X$ and the vector $y$ from the data $(x_1, y_1), \cdots, (x_N, y_N)$:

$$X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_N^T \end{bmatrix},
\qquad
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}.$$

Compute the pseudo-inverse $X^{\dagger} = (X^T X)^{-1} X^T$ and return

$$w = X^{\dagger} y.$$
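A minimal NumPy sketch of this algorithm; rather than forming $(X^T X)^{-1}$ explicitly, the normal equations are solved as a linear system (in practice one might instead call `np.linalg.lstsq` or `np.linalg.pinv`):

```python
import numpy as np

def linear_regression(X, y):
    """One-shot linear regression via the pseudo-inverse.

    X: (N, d+1) data matrix with the constant coordinate x_0 = 1.
    y: (N,) vector of real-valued targets.
    """
    # w = (X^T X)^{-1} X^T y, computed as a solve for numerical stability
    return np.linalg.solve(X.T @ X, X.T @ y)

# Example usage on synthetic data
rng = np.random.default_rng(1)
N, d = 50, 2
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d))])
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=N)
print(linear_regression(X, y))   # close to w_true
```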
[Figure: the linear regression fit, a line in one dimension ($x$ versus $y$) and a plane in two dimensions ($x_1, x_2$ versus $y$).]
Learning curves in linear regression

For data sets $\mathcal{D}$ of size $N$, the learning curves plot the expected errors $\mathbb{E}_{\mathcal{D}}\!\left[ E_{\text{in}}\big( g^{(\mathcal{D})} \big) \right]$ and $\mathbb{E}_{\mathcal{D}}\!\left[ E_{\text{out}}\big( g^{(\mathcal{D})} \big) \right]$ as functions of $N$, where $g^{(\mathcal{D})}$ is the hypothesis learned from $\mathcal{D}$.
Consider a noisy linear target, $y = w^{*T} x + \epsilon$. Linear regression learns $w = (X^T X)^{-1} X^T y$; the in-sample error is computed from the residual $Xw - y$ on the training targets, while the out-of-sample error is computed against a fresh copy $y'$ of the targets drawn with new noise, via $Xw - y'$.
With noise variance $\sigma^2$, the expected errors work out to

$$\mathbb{E}_{\mathcal{D}}\!\left[ E_{\text{in}} \right] = \sigma^2 \left( 1 - \frac{d+1}{N} \right),
\qquad
\mathbb{E}_{\mathcal{D}}\!\left[ E_{\text{out}} \right] = \sigma^2 \left( 1 + \frac{d+1}{N} \right),$$

so the expected generalization error is $2\sigma^2 \frac{d+1}{N}$, and both curves approach $\sigma^2$ as $N$ grows.

[Figure: the two learning curves converging to $\sigma^2$ from below and above as $N$ increases.]
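The formula for the expected in-sample error can be checked by simulation. A sketch, assuming Gaussian inputs and Gaussian noise (an illustrative setup, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
d, N, sigma, trials = 5, 40, 0.5, 2000
w_star = rng.normal(size=d + 1)   # the target weights

e_in_sum = 0.0
for _ in range(trials):
    X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d))])
    y = X @ w_star + sigma * rng.normal(size=N)    # noisy linear target
    w = np.linalg.solve(X.T @ X, X.T @ y)          # least-squares fit
    e_in_sum += np.mean((X @ w - y) ** 2)

print(e_in_sum / trials)                # empirical expected E_in
print(sigma**2 * (1 - (d + 1) / N))     # theory: sigma^2 (1 - (d+1)/N)
```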
Linear regression for classification. Linear regression learns a real-valued target $y = f(x) \in \mathbb{R}$, and binary labels $\pm 1 \in \mathbb{R}$ are real values too. So we can run linear regression on classification data to obtain $w$ with $w^T x_n \approx y_n = \pm 1$; then $\operatorname{sign}(w^T x_n)$ is likely to agree with $y_n = \pm 1$, which makes the regression weights a sensible starting point for classification.
Logistic Regression
The logistic regression model is built on the linear signal

$$s = \sum_{i=0}^{d} w_i x_i.$$
The logistic function $\theta$ is

$$\theta(s) = \frac{e^s}{1 + e^s}.$$

[Figure: the S-shaped graph of $\theta(s)$, rising from 0 to 1 as $s$ goes from $-\infty$ to $+\infty$, with value $\tfrac{1}{2}$ at $s = 0$.]
The logistic regression hypothesis is

$$h(x) = \theta(s), \qquad s = w^T x,$$

so $h(x) \in (0, 1)$ can be interpreted as a probability.
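A sketch of the hypothesis in NumPy. The split of $\theta$ into two algebraically equivalent forms is a standard numerical-stability trick, not something from the slides:

```python
import numpy as np

def theta(s):
    """Logistic function theta(s) = e^s / (1 + e^s), computed stably."""
    s = np.asarray(s, dtype=float)
    # Never exponentiate a positive number: use 1/(1 + e^{-s}) for s >= 0
    # and e^s/(1 + e^s) for s < 0, so np.exp cannot overflow.
    neg = np.exp(np.minimum(s, 0.0))    # e^s where s < 0, else 1
    pos = np.exp(-np.maximum(s, 0.0))   # e^{-s} where s >= 0, else 1
    return np.where(s >= 0, 1.0 / (1.0 + pos), neg / (1.0 + neg))

def h(X, w):
    """Logistic regression hypothesis h(x) = theta(w^T x), row by row."""
    return theta(X @ w)
```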
The data $(x, y)$ come from a noisy target: the label $y$ is generated according to

$$P(y \mid x) = \begin{cases} f(x) & y = +1; \\ 1 - f(x) & y = -1, \end{cases}$$

where the target $f : \mathbb{R}^d \to [0, 1]$ is itself a probability.
Likelihood: given the data $(x, y)$, how plausible is it that the labels $y$ were generated from the inputs $x$ by our hypothesis? If $h = f$, the probability of observing label $y$ for input $x$ is

$$P(y \mid x) = \begin{cases} h(x) & y = +1; \\ 1 - h(x) & y = -1. \end{cases}$$
Formula for likelihood

$$
\begin{aligned}
\text{Maximize } \prod_{n=1}^{N} P(y_n \mid x_n)
&\equiv \text{Maximize } \ln \prod_{n=1}^{N} P(y_n \mid x_n) \\
&\equiv \text{Maximize } \frac{1}{N} \ln \prod_{n=1}^{N} P(y_n \mid x_n) \\
&\equiv \text{Minimize } -\frac{1}{N} \ln \prod_{n=1}^{N} P(y_n \mid x_n),
\end{aligned}
$$

where

$$P(y \mid x) = \begin{cases} h(x) & \text{for } y = +1; \\ 1 - h(x) & \text{for } y = -1. \end{cases}$$

Furthermore, we can write

$$
\begin{aligned}
-\frac{1}{N} \ln \prod_{n=1}^{N} P(y_n \mid x_n)
&= \frac{1}{N} \sum_{n=1}^{N} \ln \frac{1}{P(y_n \mid x_n)} \\
&= \frac{1}{N} \sum_{n=1}^{N} \left[ \mathbb{1}\{y_n = +1\} \ln \frac{1}{h(x_n)} + \mathbb{1}\{y_n = -1\} \ln \frac{1}{1 - h(x_n)} \right].
\end{aligned}
$$
From likelihood to Ein

We have $h(x) = \theta(w^T x)$, where $\theta(s) = \frac{e^s}{1 + e^s} = \frac{1}{1 + e^{-s}}$, with $\theta(-s) = 1 - \theta(s)$. Hence $P(y_n \mid x_n) = \theta(y_n w^T x_n)$ for both label values, and the expression above becomes

$$\underbrace{\frac{1}{N} \sum_{n=1}^{N} \ln\!\left( 1 + e^{-y_n w^T x_n} \right)}_{E_{\text{in}}(w)}.$$
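As a sanity check, the indicator form and the compact form of the error can be compared numerically; `np.logaddexp(0, t)` computes $\ln(1 + e^t)$ stably. A sketch on random data:

```python
import numpy as np

rng = np.random.default_rng(3)
N, d = 200, 3
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, d))])
y = rng.choice([-1.0, 1.0], size=N)
w = rng.normal(size=d + 1)

hx = 1.0 / (1.0 + np.exp(-(X @ w)))      # h(x_n) = theta(w^T x_n)

# Indicator form: ln 1/h(x_n) if y_n = +1, ln 1/(1 - h(x_n)) if y_n = -1
e_ind = np.mean(np.where(y == 1, -np.log(hx), -np.log(1.0 - hx)))

# Compact form: ln(1 + e^{-y_n w^T x_n})
e_cmp = np.mean(np.logaddexp(0.0, -y * (X @ w)))

assert np.isclose(e_ind, e_cmp)   # the two forms agree
```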
Cross-entropy

For probabilities $p$ and $q$, the cross-entropy is

$$p \log \frac{1}{q} + (1 - p) \log \frac{1}{1 - q}.$$

The error measure above has exactly this form, hence the name cross-entropy error.
To summarize, the two linear models use the following in-sample errors:

$$E_{\text{in}}(w) = \frac{1}{N} \sum_{n=1}^{N} \ln\!\left( 1 + e^{-y_n w^T x_n} \right) \longleftarrow \text{logistic regression (cross-entropy error)},$$

$$E_{\text{in}}(w) = \frac{1}{N} \sum_{n=1}^{N} \left( w^T x_n - y_n \right)^2 \longleftarrow \text{linear regression (squared error)}.$$