5 ML: Naïve Bayes
Dariush Hosseini
dariush.hosseini@ucl.ac.uk
Department of Computer Science
University College London
Lecture Overview

1 Lecture Overview
2 Generative Classification - Recap
3 Naïve Bayes
  Categorical Naïve Bayes
  Gaussian Naïve Bayes
  Gaussian Naïve Bayes & Logistic Regression
4 Summary
Generative Classification - Recap
Notation
Inputs:
$$\mathbf{x} = [1, x_1, \dots, x_m]^T \in \mathbb{R}^{m+1}$$

Binary Outputs:
$$y \in \{0, 1\}$$

Training Data:
$$S = \{(\mathbf{x}^{(i)}, y^{(i)})\}_{i=1}^{n}$$

Data-Generating Distribution, $\mathcal{D}$:
$$S \sim \mathcal{D}^n$$
Probabilistic Environment

We assume that $(\mathbf{x}, y)$ are drawn i.i.d. from some data-generating distribution, $\mathcal{D}$, i.e.:
$$(\mathbf{x}, y) \sim \mathcal{D}$$
and:
$$S \sim \mathcal{D}^n$$
Learning Problem

Representation:
$$f \in \mathcal{F}$$

Evaluation:
Loss Measure: $E(f(\mathbf{x}), y) = \mathbb{I}[y \neq f(\mathbf{x})]$
Generalisation Loss: $L(E, \mathcal{D}, f) = \mathbb{E}_{\mathcal{D}}\left[\mathbb{I}[Y \neq f(\mathbf{X})]\right]$

Optimisation:
$$f^* = \operatorname*{argmin}_{f \in \mathcal{F}} \mathbb{E}_{\mathcal{D}}\left[\mathbb{I}[Y \neq f(\mathbf{X})]\right]$$
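On a finite sample the generalisation loss is approximated by the empirical 0-1 loss, the fraction of misclassified points. A minimal sketch (the labels and predictions are invented for illustration):

```python
import numpy as np

# Invented labels and predictions, for illustration only
y_true = np.array([0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1])

# Empirical 0-1 loss: the fraction of points where f(x) != y
print(np.mean(y_true != y_pred))  # 0.2
```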
Probabilistic Classifier

The Bayes Optimal Classifier selects the most probable class under the posterior:
$$f^*(\mathbf{x}) = \operatorname*{argmax}_{y \in \{0,1\}} p_Y(y|\mathbf{x})$$
Generative Classification

In Generative Classification we seek to learn $p_Y(y=1|\mathbf{x})$ indirectly.

First we re-express the Bayes Optimal Classifier as follows, without loss of generality:
$$f^*(\mathbf{x}) = \operatorname*{argmax}_{y \in \{0,1\}} p_Y(y|\mathbf{x}) = \operatorname*{argmax}_{y \in \{0,1\}} \frac{p_X(\mathbf{x}|y)\,p_Y(y)}{\sum_{y' \in \{0,1\}} p_X(\mathbf{x}|y')\,p_Y(y')} \qquad \text{[Bayes' Theorem]}$$

Then we seek to infer the likelihood $p_X(\mathbf{x}|y)$ and the prior $p_Y(y)$ for each class separately.
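A sketch of this generative recipe in code; the prior and likelihood here are stand-ins for whatever fitted model supplies them:

```python
import numpy as np

def generative_predict(x, prior, likelihood):
    """Bayes classifier built from a learnt prior p_Y(y) and likelihood p_X(x|y).

    prior      -- array [p_Y(y=0), p_Y(y=1)]
    likelihood -- function (x, k) -> p_X(x | y=k)
    """
    joint = np.array([likelihood(x, k) * prior[k] for k in (0, 1)])
    posterior = joint / joint.sum()  # Bayes' theorem
    return int(np.argmax(posterior)), posterior

# Toy stand-in likelihood: a univariate Gaussian per class
def toy_likelihood(x, k):
    mu = (-1.0, 1.0)[k]
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

label, post = generative_predict(0.8, np.array([0.5, 0.5]), toy_likelihood)
print(label, post)  # class 1 is more probable at x = 0.8
```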
Inference Problem

Inferring $p_Y(y)$ is straightforward: in binary classification there is only one parameter, $\theta_y = p_Y(y=1)$, to learn.
Naïve Bayes
Conditional Independence

Suppose that $X$ is conditionally independent of $Y$ given $Z$. Then, $\forall\, i, j, k$:
$$P\left(x^{(i)}\,\middle|\,y^{(j)}, z^{(k)}\right) = P\left(x^{(i)}\,\middle|\,z^{(k)}\right)$$
$$\implies P\left(x^{(i)}\,\middle|\,y^{(j)}, z^{(k)}\right) P\left(y^{(j)}\,\middle|\,z^{(k)}\right) = P\left(x^{(i)}\,\middle|\,z^{(k)}\right) P\left(y^{(j)}\,\middle|\,z^{(k)}\right)$$
$$\implies P\left(x^{(i)}, y^{(j)}\,\middle|\,z^{(k)}\right) = P\left(x^{(i)}\,\middle|\,z^{(k)}\right) P\left(y^{(j)}\,\middle|\,z^{(k)}\right)$$
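A quick numeric check of this identity, using a small joint distribution over binary $X$, $Y$, $Z$ constructed so that $X \perp Y \mid Z$ (the numbers are arbitrary):

```python
import numpy as np

# Arbitrary conditionals chosen so that X and Y are independent given Z
p_z = np.array([0.3, 0.7])                # P(Z=z)
p_x_given_z = np.array([[0.2, 0.8],       # P(X=x | Z=z), rows indexed by z
                        [0.6, 0.4]])
p_y_given_z = np.array([[0.9, 0.1],
                        [0.5, 0.5]])

# Joint built from the factorisation P(x, y, z) = P(x|z) P(y|z) P(z)
joint = np.einsum('zx,zy,z->xyz', p_x_given_z, p_y_given_z, p_z)

for z in (0, 1):
    p_xy_given_z = joint[:, :, z] / p_z[z]            # P(x, y | z)
    outer = np.outer(p_x_given_z[z], p_y_given_z[z])  # P(x|z) P(y|z)
    assert np.allclose(p_xy_given_z, outer)
print("P(x, y | z) = P(x|z) P(y|z) for all z")
```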
Recall that each sample, $(\mathbf{x}, y)$, is an outcome of a random variable, $(\mathbf{X}, Y)$.

Furthermore, each element of $\mathbf{x}$, $x_i$, is the outcome of a corresponding random variable, $X_i$.

Thus: $p_X(\mathbf{x}) = p_{X_1, X_2, \dots, X_m}(x_1, x_2, \dots, x_m)$

Naïve Bayes seeks to simplify the likelihood by assuming that $\{X_i\}_{i=1}^m$ are all conditionally independent given $Y$:
$$p_X(\mathbf{x}|y) = \prod_{i=1}^m p_{X_i}(x_i|y)$$

This is a much simpler representation. So, for our vector of Boolean attributes we now need only $2m$ parameters, rather than $2^m - 1$, to characterise the likelihood.
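A minimal sketch of this factorised likelihood for Boolean attributes, where each class-conditional $p_{X_i}(x_i|y=k)$ is a Bernoulli with parameter $\theta_{ik}$ (all values invented for illustration):

```python
import numpy as np

m = 4  # number of Boolean attributes

# theta[k, i] = P(X_i = 1 | Y = k); invented values for illustration
theta = np.array([[0.1, 0.7, 0.4, 0.2],   # class 0
                  [0.8, 0.3, 0.5, 0.9]])  # class 1

def likelihood(x, k):
    """Naive Bayes likelihood: p(x | y=k) = prod_i p(x_i | y=k)."""
    return np.prod(theta[k] ** x * (1 - theta[k]) ** (1 - x))

x = np.array([1, 0, 1, 1])
print(likelihood(x, 0), likelihood(x, 1))
# Only 2*m parameters characterise the likelihood, versus 2**m - 1
# for an unrestricted joint over m Boolean attributes.
```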
Representation
But how do we learn the parameterisation of the prior and the likelihood?
Categorical Naïve Bayes
Evaluation: $p_Y(y)$

We model the prior as Bernoulli, $y \sim \text{Bernoulli}(\theta_y)$, so that $p_Y(y) = \theta_y^{\,y}(1 - \theta_y)^{1-y}$, and evaluate the log-likelihood of the training outputs:
$$\ln(L(\theta_y)) = \sum_{i=1}^n y^{(i)} \ln \theta_y + (1 - y^{(i)}) \ln(1 - \theta_y)$$
Optimisation: $p_Y(y)$

Differentiating the log-likelihood with respect to $\theta_y$:
$$\frac{\partial \ln(L(\theta_y))}{\partial \theta_y} = \sum_{i=1}^n \left( \frac{y^{(i)}}{\theta_y} - \frac{1 - y^{(i)}}{1 - \theta_y} \right)$$
For stationarity set this equal to zero:
$$\sum_{i=1}^n \left( \frac{y^{(i)}}{\theta_y^{\text{MLE}}} - \frac{1 - y^{(i)}}{1 - \theta_y^{\text{MLE}}} \right) = 0$$
$$\implies \sum_{i=1}^n \left( y^{(i)}\,(1 - \theta_y^{\text{MLE}}) - (1 - y^{(i)})\,\theta_y^{\text{MLE}} \right) = 0$$
$$\implies \sum_{i=1}^n y^{(i)} = \sum_{i=1}^n \theta_y^{\text{MLE}} = n\,\theta_y^{\text{MLE}}$$
$$\implies \theta_y^{\text{MLE}} = \frac{\sum_{i=1}^n y^{(i)}}{n} = \frac{n_1}{n}$$

Where $n_1$ is equal to the number of training points for which $y = 1$.
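The closed form above is just the sample proportion of positive labels; a one-line check (labels invented):

```python
import numpy as np

y = np.array([1, 0, 1, 1, 0, 1, 0, 1])  # invented training labels
theta_mle = y.sum() / len(y)            # n1 / n, i.e. y.mean()
print(theta_mle)                        # 0.625
```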
For the likelihood, each attribute $x_i$ is modelled as categorical within each class:
$$(x_i\,|\,y = k) \sim \text{Categorical}(\Theta_{ik})$$
Recap
Representation:
$$\mathcal{F} = \left\{ f_{\theta_y, \{\theta_{ijk}\}}(\mathbf{x}) = \operatorname*{argmax}_{y \in \{0,1\}}\, p_Y(y) \prod_{i=1}^m p_{X_i}(x_i|y) \;\middle|\; p_Y(y=1) = \theta_y,\; \left\{ p_{X_i}(x_{ij}|k) = \theta_{ijk} \right\}_{i=1,j=1,k=0}^{m,\,m_i,\,1} \right\}$$

Evaluation:
$$\ln(L(\theta_y)) \quad \text{and} \quad \left\langle \ln(L(\Theta_{ik})) \right\rangle_{i=1,k=0}^{m,\,1}$$

Optimisation:
$$\theta_y^{\text{MLE}} = \frac{n_1}{n} \quad \text{and} \quad \left\{ \theta_{ijk}^{\text{MLE}} = \frac{n_{ijk}}{n_k} \right\}_{i=1,j=1,k=0}^{m,\,m_i,\,1}$$
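A compact sketch putting this recap together: counting gives the MLEs $\theta_y^{\text{MLE}} = n_1/n$ and $\theta_{ijk}^{\text{MLE}} = n_{ijk}/n_k$ (the helper names and the toy dataset are invented):

```python
import numpy as np

def fit_categorical_nb(X, y, n_values):
    """MLE fit: X holds integer-coded attributes, n_values[i] states for column i."""
    theta_y = y.mean()                           # n1 / n
    theta = []                                   # theta[i][k, j] = n_ijk / n_k
    for i, v in enumerate(n_values):
        t = np.zeros((2, v))
        for k in (0, 1):
            col = X[y == k, i]
            t[k] = np.bincount(col, minlength=v) / len(col)
        theta.append(t)
    return theta_y, theta

def predict_categorical_nb(x, theta_y, theta):
    prior = np.array([1 - theta_y, theta_y])
    like = np.array([np.prod([theta[i][k, xi] for i, xi in enumerate(x)])
                     for k in (0, 1)])
    return int(np.argmax(prior * like))          # argmax of p_Y(y) * prod_i p(x_i|y)

# Invented toy data: two attributes with 2 and 3 states respectively
X = np.array([[0, 2], [1, 1], [1, 2], [0, 0], [1, 2], [0, 1]])
y = np.array([0, 1, 1, 0, 1, 0])
theta_y, theta = fit_categorical_nb(X, y, n_values=[2, 3])
print(predict_categorical_nb(np.array([1, 2]), theta_y, theta))  # -> 1
```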
Problem: Overfitting
We must take care when the training data contains no instances that satisfy $X_i = x_{ij}$.

If this occurs then the resulting parameter, $\theta_{ijk}$, would be zero, and for any data point for which $X_i = x_{ij}$, regardless of the state of $Y$:
$$p_{X_i}(x_{ij}|k) = 0 \quad \forall k$$
$$\implies p_X(\mathbf{x}|k) = \prod_{i=1}^m p_{X_i}(x_i|k) = 0 \quad \forall k$$
$$\implies p_Y(k|\mathbf{x}) = \frac{p_X(\mathbf{x}|k)\,p_Y(k)}{\sum_{\tilde{k}} p_X(\mathbf{x}|\tilde{k})\,p_Y(\tilde{k})} = \frac{0}{0} \quad \forall k$$
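A standalone numeric illustration of this failure (the parameter values are invented): an attribute value unseen in training zeroes both joint scores, so the posterior is $0/0$.

```python
import numpy as np

# Invented per-class parameters where attribute value j=2 was never observed
# in training, so its MLE probability is zero in both classes.
theta_i = np.array([[0.6, 0.4, 0.0],    # p(x_i = j | y = 0)
                    [0.3, 0.7, 0.0]])   # p(x_i = j | y = 1)
prior = np.array([0.5, 0.5])

joint = prior * theta_i[:, 2]           # both entries are exactly 0
posterior = joint / joint.sum()         # 0/0 -> nan (with a runtime warning)
print(posterior)                        # [nan nan]
```

A common remedy is to smooth the counts (e.g. Laplace smoothing) so that no estimated probability is exactly zero.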
Gaussian Naïve Bayes
We now model each attribute, conditional on the class, as a continuous random variable. We then attempt to learn the pmf for $p_Y(y)$ and the pdf for $p_{X_i}(x_i|y)$ in a frequentist setting using MLE.
$$(x_i\,|\,y = k) \sim \mathcal{N}(\mu_{ik}, \sigma_{ik})$$
$$p_{X_i}(x_i|k;\, \mu_{ik}, \sigma_{ik}) = \frac{1}{\sqrt{2\pi\sigma_{ik}^2}}\, e^{-\frac{(x_i - \mu_{ik})^2}{2\sigma_{ik}^2}}$$
[Figure: contours of $p_X(\mathbf{x}|y=0)$, centred at $(\mu_{10}, \mu_{20})$, and of $p_X(\mathbf{x}|y=1)$, centred at $(\mu_{11}, \mu_{21})$, plotted against $x_1$ and $x_2$.]
Given $n_k$ samples $\{x_i^{(j)}\}_{j=1}^{n_k}$ drawn from the Normal distribution, and for which $y = k$, the log-likelihood is given by:
$$\ln(L(\mu_{ik}, \sigma_{ik})) = \ln\left( \prod_{j=1}^{n_k} \frac{1}{\sqrt{2\pi\sigma_{ik}^2}}\, e^{-\frac{(x_i^{(j)} - \mu_{ik})^2}{2\sigma_{ik}^2}} \right) = -n_k \ln \sigma_{ik} - \sum_{j=1}^{n_k} \frac{(x_i^{(j)} - \mu_{ik})^2}{2\sigma_{ik}^2} + \text{const.}$$
Maximising with respect to $\mu_{ik}$ and $\sigma_{ik}$ yields:
$$\mu_{ik}^{\text{MLE}} = \frac{1}{n_k} \sum_{j=1}^{n_k} x_i^{(j)}$$
$$\sigma_{ik}^{2\,\text{MLE}} = \frac{1}{n_k} \sum_{j=1}^{n_k} \left( x_i^{(j)} - \mu_{ik}^{\text{MLE}} \right)^2$$
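These estimators are just the per-class sample means and (biased) sample variances, computed attribute-wise. A sketch with invented data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Invented data: m = 2 continuous attributes, class k centred at (k, -k)
y = rng.integers(0, 2, size=100)
X = rng.normal(size=(100, 2)) + np.stack([y, -y], axis=1)

mu = np.zeros((2, 2))    # mu[k, i]  = mu_ik^MLE
var = np.zeros((2, 2))   # var[k, i] = sigma_ik^2 MLE
for k in (0, 1):
    Xk = X[y == k]               # the n_k points with y = k
    mu[k] = Xk.mean(axis=0)      # (1 / n_k) * sum_j x_i^(j)
    var[k] = Xk.var(axis=0)      # biased MLE variance (ddof = 0)
print(mu)
print(var)
```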
Recap
Representation:
$$\mathcal{F} = \left\{ f_{\theta_y, \{\mu_{ik}, \sigma_{ik}\}}(\mathbf{x}) = \operatorname*{argmax}_{y \in \{0,1\}}\, p_Y(y) \prod_{i=1}^m \mathcal{N}(x_i;\, \mu_{ik}, \sigma_{ik}) \;\middle|\; p_Y(y=1) = \theta_y,\; \left\{ \mu_{ik} \in \mathbb{R},\; \sigma_{ik} > 0 \right\}_{i=1,k=0}^{m,\,1} \right\}$$

Evaluation:
$$\ln(L(\theta_y)) \quad \text{and} \quad \left\langle \ln(L(\mu_{ik}, \sigma_{ik})) \right\rangle_{i=1,k=0}^{m,\,1}$$

Optimisation:
$$\theta_y^{\text{MLE}} = \frac{n_1}{n} \quad \text{and} \quad \left\{ \mu_{ik}^{\text{MLE}} = \frac{1}{n_k}\sum_{j=1}^{n_k} x_i^{(j)},\;\; \sigma_{ik}^{2\,\text{MLE}} = \frac{1}{n_k}\sum_{j=1}^{n_k}\left(x_i^{(j)} - \mu_{ik}^{\text{MLE}}\right)^2 \right\}_{i=1,k=0}^{m,\,1}$$
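Putting the recap together, a minimal Gaussian Naïve Bayes sketch: fit by the MLEs above, predict by the argmax (data invented; log-probabilities are used for numerical stability):

```python
import numpy as np

def fit_gnb(X, y):
    theta_y = y.mean()
    mu = np.stack([X[y == k].mean(axis=0) for k in (0, 1)])
    var = np.stack([X[y == k].var(axis=0) for k in (0, 1)])
    return theta_y, mu, var

def predict_gnb(x, theta_y, mu, var):
    log_prior = np.log(np.array([1 - theta_y, theta_y]))
    # log N(x_i; mu_ik, sigma_ik), summed over the attributes i
    log_like = (-0.5 * np.log(2 * np.pi * var)
                - (x - mu) ** 2 / (2 * var)).sum(axis=1)
    return int(np.argmax(log_prior + log_like))

rng = np.random.default_rng(1)
X0 = rng.normal([-1, -1], 1.0, size=(50, 2))   # invented class-0 points
X1 = rng.normal([+1, +1], 1.0, size=(50, 2))   # invented class-1 points
X, y = np.vstack([X0, X1]), np.repeat([0, 1], 50)
params = fit_gnb(X, y)
print(predict_gnb(np.array([0.8, 0.5]), *params))  # -> 1, most likely
```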
Gaussian Naïve Bayes & Logistic Regression
The posterior induced by Gaussian Naïve Bayes can be brought into the same functional form as Logistic Regression. This gives an insight into the trade-off between Generative and Discriminative methods more generally.
By Bayes' theorem:
$$p_Y(y=1|\mathbf{x}) = \frac{p_Y(y=1)\,p_X(\mathbf{x}|y=1)}{p_Y(y=1)\,p_X(\mathbf{x}|y=1) + p_Y(y=0)\,p_X(\mathbf{x}|y=0)}$$
$$= \frac{1}{1 + \frac{p_Y(y=0)\,p_X(\mathbf{x}|y=0)}{p_Y(y=1)\,p_X(\mathbf{x}|y=1)}}$$
$$= \frac{1}{1 + \exp\left( \ln \frac{p_Y(y=0)\,p_X(\mathbf{x}|y=0)}{p_Y(y=1)\,p_X(\mathbf{x}|y=1)} \right)}$$
$$= \frac{1}{1 + \exp\left( \ln\left(\frac{p_Y(y=0)}{p_Y(y=1)}\right) + \ln\left(\frac{p_X(\mathbf{x}|y=0)}{p_X(\mathbf{x}|y=1)}\right) \right)}$$
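A numeric check that the posterior equals this logistic transform of the log-odds (all densities are invented scalars):

```python
import numpy as np

# Invented values of the prior and of the likelihoods at some fixed x
p1, p0 = 0.6, 0.4                 # p_Y(y=1), p_Y(y=0)
lik1, lik0 = 0.05, 0.02           # p_X(x|y=1), p_X(x|y=0)

direct = p1 * lik1 / (p1 * lik1 + p0 * lik0)
logistic = 1.0 / (1.0 + np.exp(np.log(p0 / p1) + np.log(lik0 / lik1)))
assert np.isclose(direct, logistic)
print(direct)  # both routes give the same posterior
```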
Logistic Regression

Recall that Logistic Regression models this posterior directly:
$$p_Y(y=1|\mathbf{x}; \mathbf{w}) = \frac{1}{1 + \exp(-\mathbf{w}^T\mathbf{x})}$$
and learns the weights $\mathbf{w}$ discriminatively.
Naïve Bayes
For Gaussian Naïve Bayes with shared class variances, $\sigma_{i0} = \sigma_{i1} = \sigma_i$, substituting the Gaussian likelihoods into the expression above yields the same logistic form:
$$p_Y(y=1|\mathbf{x}) = \frac{1}{1 + \exp\left( w_0 + \sum_{i=1}^m w_i x_i \right)}$$
Where:
$$w_0 = \ln\left( \frac{1 - \theta_y}{\theta_y} \right) + \sum_{i=1}^m \frac{\mu_{i1}^2 - \mu_{i0}^2}{2\sigma_i^2}$$
$$w_i = \frac{\mu_{i0} - \mu_{i1}}{\sigma_i^2}$$
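A check of these weight formulas against the direct Bayes computation, for invented shared-variance Gaussian parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 3
theta_y = 0.3                            # p_Y(y=1)
mu0, mu1 = rng.normal(size=m), rng.normal(size=m)
sigma2 = rng.uniform(0.5, 2.0, size=m)   # shared class variances sigma_i^2

# Weights from the closed-form expressions above
w0 = np.log((1 - theta_y) / theta_y) + ((mu1**2 - mu0**2) / (2 * sigma2)).sum()
w = (mu0 - mu1) / sigma2

def gaussian(x, mu, s2):
    return np.exp(-(x - mu) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)

x = rng.normal(size=m)
num = theta_y * np.prod(gaussian(x, mu1, sigma2))
den = num + (1 - theta_y) * np.prod(gaussian(x, mu0, sigma2))
direct = num / den                                   # Bayes' theorem
logistic = 1.0 / (1.0 + np.exp(w0 + w @ x))          # logistic form
assert np.isclose(direct, logistic)
print(direct)
```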
Summary
In this lecture we introduced the Naïve Bayes conditional independence assumption, applied it to categorical and Gaussian class-conditional likelihoods with parameters learnt by MLE, and showed that Gaussian Naïve Bayes with shared class variances induces a posterior of the Logistic Regression form.

In the next lecture we will return to more theoretical considerations and discuss the problem of Model Selection.