Lecture05 Notes
Iain Styles
25 October 2019
Regularisation
Differentiating the regularised least-squares loss L_LSE(w) = (y − Φw)ᵀ(y − Φw) + λwᵀw with respect to the weights gives

∂L_LSE(w)/∂w = −2Φᵀ(y − Φw) + 2λw. (4)
We set this to zero to obtain
Φᵀy − (ΦᵀΦ + λI)w∗ = 0, (5)
where we have employed the identity matrix I to enable the
expression to be factorised in this form. This method is referred to
in the literature by a number of different names:
• Ridge regression.
• Tikhonov regularisation.
• Weight decay (particularly in the neural networks literature).
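Rearranging equation (5) gives the closed-form solution w∗ = (ΦᵀΦ + λI)⁻¹Φᵀy, which can be computed directly. Below is a minimal sketch in Python/NumPy; the data and the degree-9 polynomial features (chosen to match Figure 1) are illustrative assumptions rather than part of the notes.

import numpy as np

def ridge_fit(Phi, y, lam):
    # Solve (Phi^T Phi + lam * I) w = Phi^T y, i.e. equation (5).
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y)

# Illustrative data: noisy samples of y = sin(2*pi*x).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=20)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=20)
Phi = np.vander(x, N=10, increasing=True)   # columns 1, x, ..., x^9
w_star = ridge_fit(Phi, y, lam=1e-3)

Larger values of lam shrink the weights towards zero, which is exactly the effect visible in Table 1.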
[Figure 1: Fitting y = sin(2πx) with polynomial fits of degree M = 9 for a range of regularisation strengths, from the smallest, log(λ) = −6, to the biggest; each panel plots y against x.]
log10 λ w0 w1 w2 w3 w4 w5 w6 w7 w8 w9
-6 1.06 8.31 -17.92 -76.20 76.16 250.58 -112.92 -350.94 53.88 168.61
-5 1.67 5.77 -41.44 -29.12 212.84 31.81 -356.24 1.30 183.44 -9.40
-4 0.74 6.51 -5.21 -33.75 1.46 33.96 20.81 13.89 -17.55 -20.25
-3 0.94 1.29 -9.30 5.42 15.29 -21.02 5.50 -4.36 -12.20 19.08
-2 0.20 4.26 2.26 -10.00 -5.63 -4.88 -0.83 2.60 4.33 8.47
-1 0.56 -2.12 -1.35 4.00 -0.63 1.44 0.50 -0.84 1.30 -2.35
Table 1: Model weights for different values of λ with L2 regularisation. The shrinkage effect of this regularisation term is clearly seen.
Other common regularisation functions

L2 regularisation is extremely common and many machine learning libraries use it by default (this is something to be aware of when working with external libraries), but it belongs to a much broader class of regulariser known as the Lp norms, which use a regularisation term of the form

R(w) = ∑i |wi|^p. (6)

[Figure 2: RMS Training Error of polynomial fit with M = 9 to y = sin(2πx) for different degrees of L2 regularisation; RMS Error plotted against log(λ).]

Some common choices for the value of p include:

• p = 2, the method we have already studied.
• p = 1, often called the lasso, which drives many weights exactly to zero and so produces sparse models (see Table 2 and the sketch that follows it).
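As a quick concrete check of equation (6), the penalty is easy to evaluate for any p; this small sketch (with an arbitrary example weight vector) shows the L2 and L1 cases used in Tables 1 and 2.

import numpy as np

def lp_penalty(w, p):
    # Regularisation term R(w) = sum_i |w_i|^p from equation (6).
    return np.sum(np.abs(w) ** p)

w = np.array([1.06, 8.31, -17.92])   # example weights
r2 = lp_penalty(w, p=2)              # L2 (ridge) penalty
r1 = lp_penalty(w, p=1)              # L1 (lasso) penalty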
log10 λ w0 w1 w2 w3 w4 w5 w6 w7 w8 w9
-5 0.00 3.59 -0.00 -15.72 0.00 11.78 0.00 1.96 0.00 -1.60
-4 0.00 3.54 0.00 -15.27 0.00 11.05 0.00 1.93 0.00 -1.24
-3 0.00 3.10 -0.00 -12.02 -0.00 5.44 -0.00 3.49 -0.00 0.00
-2 0.00 0.95 -0.00 -3.76 -0.00 -0.00 -0.00 0.00 -0.00 2.74
-1 0.00 -0.05 0.00 -0.00 -0.00 -0.00 -0.00 -0.00 -0.00 -0.00
0 0.00 -0.00 0.00 -0.00 -0.00 -0.00 -0.00 -0.00 -0.00 -0.00
Table 2: Model weights for different values of λ with L1 regularisation. The sparsifying effect of this regularisation term is clearly seen.
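Unlike the L2 case, L1 regularisation has no closed-form solution, so iterative solvers are used in practice. The following is a minimal sketch using scikit-learn's Lasso class; the use of scikit-learn and the data are assumptions for illustration, not part of the notes.

import numpy as np
from sklearn.linear_model import Lasso   # assumes scikit-learn is installed

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=20)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=20)
Phi = np.vander(x, N=10, increasing=True)   # degree-9 polynomial features

# An iterative coordinate-descent solver minimises the L1-penalised error.
model = Lasso(alpha=1e-3, fit_intercept=False, max_iter=100_000)
model.fit(Phi, y)
print(model.coef_)   # many weights are driven exactly to zero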
Multivariate Regression

The same framework extends to models with more than one input variable, for example

y = w0 + w1x1 + w2x2. (8)

With N observations, the design matrix Φ has one row (1, x1,n, x2,n) per observation, the last being (1, x1,N, x2,N). Minimising the squared error leads, as before, to the normal equations

ΦᵀΦW = ΦᵀY. (17)
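A minimal sketch of solving the normal equations (17) for the two-variable model (8); the synthetic data and the true weights used to generate them are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
N = 100
x1 = rng.normal(size=N)
x2 = rng.normal(size=N)
Y = 1.0 + 2.0 * x1 - 3.0 * x2 + 0.1 * rng.normal(size=N)

Phi = np.column_stack([np.ones(N), x1, x2])   # rows (1, x1_n, x2_n)
W = np.linalg.solve(Phi.T @ Phi, Phi.T @ Y)   # solves equation (17)
print(W)   # approximately (1.0, 2.0, -3.0)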
Non-linear Regression
This section is provided for information only and is not examinable content.
where the matrix J has components Jij = ∂f(xi)/∂wj, and the residual after the change is

r(w + Δw) = y − f(w) − JΔw, (20)

where fi(w) = f(xi, w). The error is therefore

E(w + Δw) = r(w + Δw)ᵀr(w + Δw) = (y − f(w) − JΔw)ᵀ(y − f(w) − JΔw).
        end
        if e′/e < TerminationCriterion then
            Converged ← true
        end
    end
Algorithm 1: The Levenberg-Marquardt Algorithm.
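As a concrete companion to Algorithm 1, here is a minimal Python sketch of a Levenberg-Marquardt loop for fitting a nonlinear model f(x, w). The exponential model, its Jacobian, the damping schedule, and the data are all illustrative assumptions; the termination test is a variant of the e′/e check in Algorithm 1.

import numpy as np

def f(x, w):
    # Example nonlinear model: f(x, w) = w0 * exp(w1 * x).
    return w[0] * np.exp(w[1] * x)

def jac(x, w):
    # J_ij = ∂f(x_i)/∂w_j for this model.
    e = np.exp(w[1] * x)
    return np.column_stack([e, w[0] * x * e])

def lm_fit(x, y, w, mu=1e-2, n_iter=200, tol=1e-9):
    err = np.sum((y - f(x, w)) ** 2)
    for _ in range(n_iter):
        J = jac(x, w)
        r = y - f(x, w)
        # Damped normal equations: (J^T J + mu * I) dw = J^T r.
        dw = np.linalg.solve(J.T @ J + mu * np.eye(len(w)), J.T @ r)
        err_new = np.sum((y - f(x, w + dw)) ** 2)
        if err_new < err:                        # accept step, relax damping
            w = w + dw
            mu /= 10
            if abs(err_new / err - 1) < tol:     # cf. the e′/e test above
                break
            err = err_new
        else:                                    # reject step, increase damping
            mu *= 10
    return w

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * np.exp(-1.5 * x) + 0.01 * rng.normal(size=50)
w_fit = lm_fit(x, y, w=np.array([1.0, -1.0]))
print(w_fit)   # approximately (2.0, -1.5)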
Reading