
Advanced Machine Learning

Lecture 2: Linear models


Sandjai Bhulai
Vrije Universiteit Amsterdam

s.bhulai@vu.nl
8 September 2023
Linear models

Advanced Machine Learning


Polynomial curve fitting
▪ 10 points sampled from sin(2πx) + disturbance

\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} - \cdots
Polynomial curve fitting
▪ Polynomial curve
y(x, w) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j

▪ Performance is measured by

E(w) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, w) - t_n \}^2
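As a concrete illustration (a minimal NumPy sketch, not part of the slides; the noise scale and random seed are assumptions), the data can be generated from sin(2πx) plus Gaussian noise and an order-M polynomial fitted by minimizing E(w):

```python
import numpy as np

rng = np.random.default_rng(0)

# 10 points from sin(2*pi*x) plus Gaussian noise (noise scale 0.2 is an assumption)
N = 10
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=N)

def fit_polynomial(x, t, M):
    """Minimize E(w) = 0.5 * sum_n (y(x_n, w) - t_n)^2 for an order-M polynomial."""
    Phi = np.vander(x, M + 1, increasing=True)    # columns 1, x, x^2, ..., x^M
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)   # least-squares minimizer of E(w)
    return w

def predict(x, w):
    return np.vander(x, len(w), increasing=True) @ w

w_star = fit_polynomial(x, t, M=3)
E = 0.5 * np.sum((predict(x, w_star) - t) ** 2)   # training error E(w*)
```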

Polynomial curve fitting: order 0

y(x, w) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j
Polynomial curve fitting: order 1

y(x, w) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j
Polynomial curve fitting: order 3

y(x, w) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j
Polynomial curve fitting: order 9

y(x, w) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j
Overfitting
▪ Root mean square (RMS) error: E_{\mathrm{RMS}} = \sqrt{2 E(w^*) / N}
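A short sketch of the overfitting diagnostic (assuming the same synthetic data generator as above and a 100-point test set): fit orders M = 0, …, 9 and compare training and test E_RMS.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(N, noise=0.2):          # synthetic sin(2*pi*x) data; noise scale is an assumption
    x = np.linspace(0.0, 1.0, N)
    return x, np.sin(2 * np.pi * x) + rng.normal(scale=noise, size=N)

def design(x, M):
    return np.vander(x, M + 1, increasing=True)

def rms_error(w, x, t):
    E = 0.5 * np.sum((design(x, len(w) - 1) @ w - t) ** 2)
    return np.sqrt(2 * E / len(t))    # E_RMS = sqrt(2 E(w*) / N)

x_train, t_train = make_data(10)
x_test, t_test = make_data(100)

for M in range(10):
    w, *_ = np.linalg.lstsq(design(x_train, M), t_train, rcond=None)
    print(M, rms_error(w, x_train, t_train), rms_error(w, x_test, t_test))
```

Typically the training error keeps decreasing with M while the test error eventually grows again, which is the overfitting pattern shown in the plot.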

Overfitting

Effect of dataset size
▪ Polynomial of order 9 and N = 15

Effect of dataset size
▪ Polynomial of order 9 and N = 100

Regularization
▪ Penalize large coefficient values

\tilde{E}(w) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, w) - t_n \}^2 + \frac{\lambda}{2} \| w \|^2

▪ λ becomes an additional model parameter that controls the strength of the regularization
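A sketch of the regularized fit (an illustration, not the lecture's code; the augmented least-squares construction below is one of several equivalent ways to minimize Ẽ(w)):

```python
import numpy as np

def fit_regularized(x, t, M, lam):
    """Minimize 0.5*sum_n (y(x_n, w) - t_n)^2 + 0.5*lam*||w||^2."""
    Phi = np.vander(x, M + 1, increasing=True)
    A = np.vstack([Phi, np.sqrt(lam) * np.eye(M + 1)])   # sqrt(lam)*I rows encode the penalty term
    b = np.concatenate([t, np.zeros(M + 1)])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

# With x, t as in the earlier sketch, the order-9 coefficients shrink as lambda grows:
# w_unreg = fit_regularized(x, t, M=9, lam=0.0)
# w_reg   = fit_regularized(x, t, M=9, lam=np.exp(-18))
```

Appending the rows √λ·I makes ‖Aw − b‖² equal to 2Ẽ(w), so ordinary least squares returns exactly the minimizer of Ẽ(w).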

Regularization
▪ Regularization with ln λ = −18

Regularization
▪ Regularization with ln λ = 0

Regularization
▪ E_RMS versus ln λ

Regularization

A deeper analysis

Advanced Machine Learning


What is the issue?

\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \frac{x^9}{9!} - \cdots
Linear basis function models
▪ General model is

y(x, w) = \sum_{j=0}^{M-1} w_j \varphi_j(x) = w^\top \varphi(x)

▪ The φ_j are known as basis functions


▪ Typically, φ_0(x) = 1, so that w_0 acts as a bias parameter
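A small sketch of the general model (my own illustration; the helper names are not from the slides): given any list of basis functions, build the design matrix with rows φ(x_n)^⊤ and evaluate y(x, w) = w^⊤φ(x).

```python
import numpy as np

def design_matrix(x, basis_funcs):
    """Row n holds phi(x_n)^T; basis_funcs[0] is typically the constant function (the bias)."""
    return np.column_stack([phi(x) for phi in basis_funcs])

def predict(x, w, basis_funcs):
    return design_matrix(x, basis_funcs) @ w   # y(x, w) = w^T phi(x) at every x

# Polynomial basis of order 3: phi_j(x) = x**j, with phi_0(x) = 1 acting as the bias
poly_basis = [lambda x, j=j: x ** j for j in range(4)]
```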

Linear basis function models
▪ General model is

y(x, w) = \sum_{j=0}^{M-1} w_j \varphi_j(x) = w^\top \varphi(x)

▪ Polynomial basis functions:


φ_j(x) = x^j

▪ These are global functions: a change in one region of the input affects the fit everywhere

Linear basis function models
▪ General model is

y(x, w) = \sum_{j=0}^{M-1} w_j \varphi_j(x) = w^\top \varphi(x)

▪ Gaussian basis functions:


\varphi_j(x) = \exp\left\{ -\frac{(x - \mu_j)^2}{2 s^2} \right\}

▪ These are local functions: each one responds only near its centre

> μ_j controls the location

> s controls the scale
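A sketch of Gaussian basis functions (the centres μ_j spread over [0, 1] and the common scale s = 0.1 are assumptions for the example); the resulting list can be fed to the design_matrix helper sketched earlier.

```python
import numpy as np

def gaussian_basis(mu, s):
    """phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)): local, responds only near mu_j."""
    return lambda x: np.exp(-((x - mu) ** 2) / (2 * s ** 2))

mus = np.linspace(0.0, 1.0, 9)                         # assumed centres
gauss_basis = [lambda x: np.ones_like(x)] + [gaussian_basis(mu, s=0.1) for mu in mus]
```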
Linear basis function models
▪ General model is

y(x, w) = \sum_{j=0}^{M-1} w_j \varphi_j(x) = w^\top \varphi(x)

▪ Sigmoidal basis functions:

\varphi_j(x) = \sigma\!\left( \frac{x - \mu_j}{s} \right), \qquad \text{where } \sigma(a) = \frac{1}{1 + \exp(-a)}
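The corresponding sketch for sigmoidal basis functions, again with assumed centres and scale:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))                    # logistic sigmoid sigma(a)

def sigmoidal_basis(mu, s):
    """phi_j(x) = sigma((x - mu_j) / s)."""
    return lambda x: sigmoid((x - mu) / s)

mus = np.linspace(0.0, 1.0, 9)                         # assumed centres
sigm_basis = [lambda x: np.ones_like(x)] + [sigmoidal_basis(mu, s=0.1) for mu in mus]
```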

Maximum likelihood
▪ Assume observations from a deterministic function with
added Gaussian noise:
t = y(x, w) + \epsilon, \quad \text{where } p(\epsilon \mid \beta) = \mathcal{N}(\epsilon \mid 0, \beta^{-1})

▪ Note that \mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{1}{2\sigma^2} (x - \mu)^2 \right\}

\beta = 1/\sigma^2

\mathcal{N}(x \mid \mu, \sigma^2) > 0

\int_{-\infty}^{\infty} \mathcal{N}(x \mid \mu, \sigma^2) \, dx = 1
Maximum likelihood
▪ Assume observations from a deterministic function with
added Gaussian noise:
t = y(x, w) + \epsilon, \quad \text{where } p(\epsilon \mid \beta) = \mathcal{N}(\epsilon \mid 0, \beta^{-1})

▪ Note that \mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{1}{2\sigma^2} (x - \mu)^2 \right\}

\mathbb{E}[x] = \int_{-\infty}^{\infty} x \, \mathcal{N}(x \mid \mu, \sigma^2) \, dx = \mu

\mathbb{E}[x^2] = \int_{-\infty}^{\infty} x^2 \, \mathcal{N}(x \mid \mu, \sigma^2) \, dx = \mu^2 + \sigma^2

\operatorname{var}[x] = \mathbb{E}[x^2] - \mathbb{E}[x]^2 = \sigma^2
Maximum likelihood
▪ Assume observations from a deterministic function with
added Gaussian noise:
t = y(x, w) + \epsilon, \quad \text{where } p(\epsilon \mid \beta) = \mathcal{N}(\epsilon \mid 0, \beta^{-1})

▪ This is the same as saying

p(t \mid x, w, \beta) = \mathcal{N}(t \mid y(x, w), \beta^{-1})

▪ Recall: y(x, w) = \sum_{j=0}^{M-1} w_j \varphi_j(x) = w^\top \varphi(x)

Maximum likelihood
▪ This is the same as saying

p(t \mid x, w, \beta) = \mathcal{N}(t \mid y(x, w), \beta^{-1})

▪ Given observed inputs X = \{x_1, \ldots, x_N\} and targets \mathbf{t} = [t_1, \ldots, t_N]^\top, we obtain the likelihood function:
p(\mathbf{t} \mid X, w, \beta) = \prod_{n=1}^{N} \mathcal{N}(t_n \mid w^\top \varphi(x_n), \beta^{-1})
Maximum likelihood
▪ Taking the logarithm, we get
\ln p(\mathbf{t} \mid w, \beta) = \ln \prod_{n=1}^{N} \mathcal{N}(t_n \mid w^\top \varphi(x_n), \beta^{-1}) = \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi) - \beta E_D(w)

where E_D(w) = \frac{1}{2} \sum_{n=1}^{N} \{ t_n - w^\top \varphi(x_n) \}^2

▪ Recall: \mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\frac{1}{2\sigma^2} (x - \mu)^2 \right\}
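Since the first two terms do not depend on w, maximizing the log-likelihood with respect to w is equivalent to minimizing the sum-of-squares error E_D(w). Maximizing with respect to β afterwards gives the noise precision

\frac{1}{\beta_{\mathrm{ML}}} = \frac{1}{N} \sum_{n=1}^{N} \{ t_n - w_{\mathrm{ML}}^\top \varphi(x_n) \}^2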
Maximum likelihood
▪ Computing the gradient and setting it to zero yields
\nabla_w \ln p(\mathbf{t} \mid w, \beta) = \beta \sum_{n=1}^{N} \{ t_n - w^\top \varphi(x_n) \} \varphi(x_n)^\top = 0

▪ Solving for w, we get

w_{\mathrm{ML}} = (\Phi^\top \Phi)^{-1} \Phi^\top \mathbf{t}

where (\Phi^\top \Phi)^{-1} \Phi^\top is the Moore-Penrose pseudo-inverse of \Phi, with

\Phi = \begin{pmatrix} \varphi_0(x_1) & \varphi_1(x_1) & \cdots & \varphi_{M-1}(x_1) \\ \varphi_0(x_2) & \varphi_1(x_2) & \cdots & \varphi_{M-1}(x_2) \\ \vdots & \vdots & \ddots & \vdots \\ \varphi_0(x_N) & \varphi_1(x_N) & \cdots & \varphi_{M-1}(x_N) \end{pmatrix}
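A direct sketch of this solution (illustrative only; the design matrix could come from any of the basis functions above):

```python
import numpy as np

def maximum_likelihood_weights(Phi, t):
    """w_ML = (Phi^T Phi)^{-1} Phi^T t, computed via the Moore-Penrose pseudo-inverse."""
    return np.linalg.pinv(Phi) @ t

# Numerically it is usually preferable to avoid forming (Phi^T Phi)^{-1} explicitly:
# w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
```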

Interpretation
▪ Consider y = Φw_ML = [φ_1, …, φ_M] w_ML

▪ y ∈ 𝒮 ⊆ 𝒯, where 𝒯 is the N-dimensional space of target vectors and 𝒮 is an M-dimensional subspace

▪ 𝒮 is spanned by φ_1, …, φ_M
▪ w_ML minimizes the distance between t and y; at the minimum, y is the orthogonal projection of t onto 𝒮
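A quick numerical check of this geometric picture (a sketch; Phi and t are assumed to come from one of the earlier examples): at w_ML the residual t − Φw_ML is orthogonal to every column φ_j, which is exactly the statement that y = Φw_ML is the orthogonal projection of t onto 𝒮.

```python
import numpy as np

def residual_orthogonality(Phi, t):
    w_ml = np.linalg.pinv(Phi) @ t
    y = Phi @ w_ml                      # projection of t onto the column space of Phi
    return Phi.T @ (t - y)              # numerically zero in every component
```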

Regularization
▪ Consider the error function

E_D(w) + \lambda E_W(w)


data term + regularization term

▪ With the sum-of-squares error function and a quadratic regularizer, we get

\frac{1}{2} \sum_{n=1}^{N} \{ t_n - w^\top \varphi(x_n) \}^2 + \frac{\lambda}{2} w^\top w

▪ This is minimized by w = (\lambda I + \Phi^\top \Phi)^{-1} \Phi^\top \mathbf{t}
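A sketch of this closed-form solution (np.linalg.solve is an implementation choice, not prescribed by the slides):

```python
import numpy as np

def ridge_weights(Phi, t, lam):
    """w = (lam*I + Phi^T Phi)^{-1} Phi^T t: minimizer of the quadratically regularized error."""
    M = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)
```

Adding λI also improves the conditioning of the system when Φ^⊤Φ is close to singular.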

Regularization
▪ With a more general regularizer, we have

\frac{1}{2} \sum_{n=1}^{N} \{ t_n - w^\top \varphi(x_n) \}^2 + \frac{\lambda}{2} \sum_{j=1}^{M} | w_j |^q

(q = 1: lasso; q = 2: quadratic regularizer)

Regularization
▪ Lasso tends to generate sparser solutions than a quadratic
regularizer
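A sketch that illustrates the sparsity effect (scikit-learn's Lasso and Ridge are used for convenience; the data, the regularization strength alpha, and the feature count are placeholders): with comparable regularization, many lasso coefficients are driven exactly to zero, while ridge coefficients are only shrunk.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]                # only a few features actually matter
y = X @ w_true + rng.normal(scale=0.1, size=50)

lasso = Lasso(alpha=0.1).fit(X, y)           # q = 1 penalty
ridge = Ridge(alpha=0.1).fit(X, y)           # q = 2 penalty
print("zero coefficients (lasso):", np.sum(lasso.coef_ == 0))
print("zero coefficients (ridge):", np.sum(ridge.coef_ == 0))
```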
