Professional Documents
Culture Documents
09 RegularizationBayesian
09 RegularizationBayesian
09 RegularizationBayesian
Pawel Herman
Outline
Regularisation Concept
Bayesian Framework
Pawel Herman
What is generalisation?
Generalisation is the capacity to apply learned knwoledge
to new situations
the capability of a learning machine to perform well on
unseen data
a tacit assumption that the test set is drawn from the same
distribution as the training one
Overfitting phenomenon
X T : T = f (X) +
The expected L2 -norm risk of the estimator is
h
i Z
R[f ] = E (t f (x))2 =
Pawel Herman
1
N
i =1
Bias-variance dilemma
The effectiveness of NN model F (x, w) as an estimator of the
regression f = E [t | x]
i
h
MSE = E (F (x, w) E [t | x])2
E over all possible training sets MSE helps to estimate egen
h
2 i
MSE = (E [F (x, w)] E [t | x])2 + E F (x, w) E [F (x, w)]
The need to balance bias (approximation error) and variance
(estimation error) on a limited data sample
Structural risk minimisation (SRM) and model complexity issues
Pawel Herman
Bias-variance illustration
Pawel Herman
Early stopping
An additional data set is required - validation set (split of
original data)
The network is trained with BP until the error monitored on the
validation set reaches minimum (further on only noise fitting)
Pawel Herman
Pawel Herman
Pawel Herman
Pawel Herman
Bootstrap
Bootstrap estimate:
boot
T
Etrue
= Eemp
+ B boot
B boot =
1
K
Tk
Tk T
Eemp
Eemp
Pawel Herman
1
K
T T
k
Eemp
n-fold cross-validation
CV
Etrue
=
1
E T rTk Tk
n emp
Pawel Herman
= E +
E
trade-off controlled by the regularisation parameter
smooth stabilisation of the solution due to
the complexity
term
n
in classical penalised ridging, = wYn
Pawel Herman
kDY k2
weight elimination
=
(wi /w0 )2
1 + (wi /w0 )2
w0 is a parameter
variable selection algorithm
large weights are penalised less than small ones
Pawel Herman
2 yout
xin2
!2
P (Hi ) P (D | Hi )
P (D)
Pawel Herman
Pawel Herman
P (w| D, Hi ) =
exp (Ew )
=R
P(w) =
Zw ()
exp( kwk2 )dw
The likelihood function P (D| w) can also be expressed in an
2
exponential form with the error function ED = N
i =1 (y (xi , w) ti )
P (D| w) =
exp ( ED )
ZD ( )
=
Zw ()
ZD ( )
ZM (, )
Pawel Herman
Pawel Herman
N (n )
(n )
(n+1)
=
2 ,
(n ) t 2
2 N
2
w(n)
i
i =1 y xi , w
Pawel Herman
P (Hi ) P (D | Hi )
P (D)
P (D | w, Hi ) P (w | Hi ) dw w P D | wopt , Hi
P wopt | Hi 4wposterior
Occam factor =
Pawel Herman
4wposterior
4wprior
1
4wprior
Pawel Herman
Pawel Herman
Pawel Herman
Pawel Herman
References
Pawel Herman