PRSL 9

Naive Bayesian Classifier: Digit Recognition Application


Naive Bayesian Classifier

• Classifier that uses the Bayes formula, which describes the conditional
probability of an event given another event:

P(A|B) = P(B|A) * P(A) / P(B)

A, B – events
P(A|B) – probability of A given that B is true
P(B|A) – probability of B given that A is true
P(A), P(B) – independent probabilities of A and B
Naive Bayesian Classifier

• Given n samples S = {s1, s2, …, sn} (the training set)

• Each sample has a set of features x = {x1, x2, …, xd}
• Each sample belongs to one of the classes C = {c1, c2, …, cJ}

• Given a new sample that does not belong to S (it belongs to the test
set) and its features x, to which class does this sample belong?
• We make the “naive” assumption that the features are independent of
one another in order to simplify the computation
Naive Bayesian Classifier
• Given a new sample and its features x, to which class does this
sample belong?
• For each class ci from C:
• Let Pi be the probability that the new sample belongs to class ci:
• Pi = P(c = ci | x) = P(ci) * P(x | c = ci) / P(x)

• For each feature xj from the feature vector x of the new sample:
• Pj = the fraction of samples of class ci in S whose j-th feature equals xj
• P(x | c = ci) *= Pj (compute the product over all features)

• The new sample belongs to the class ci for which Pi is maximum


Naive Bayesian Classifier
• Simple example
• Training set
• S = {s1, s2, s3, s4}
• xs1= {255, 0, 255, 0}
• xs2= {255, 255, 255, 0}
• xs3= {0, 0, 0, 255}
• xs4= {0, 0, 255, 255}
• C = {0, 1}
• s1, s2 – class 0
• s3, s4 - class 1
Naive Bayesian Classifier
• Simple example
• Training set: S = {s1, s2, s3, s4}
• xs1 = {255, 0, 255, 0}
• xs2 = {255, 255, 255, 0}
• xs3 = {0, 0, 0, 255}
• xs4 = {0, 0, 255, 255}
• C = {0, 1}
• s1, s2 – class 0
• s3, s4 – class 1
• Test set: T = {t1, t2}
• xt1 = {255, 0, 0, 0}
• xt2 = {0, 255, 0, 255}
• To which class do t1 and t2 belong?
Naive Bayesian Classifier
• Priors – the fraction of training samples in each class:
• P(c=0) = nr_samples_class_0 / total_samples = 2/4 = 0.5
• P(c=1) = nr_samples_class_1 / total_samples = 2/4 = 0.5
Naive Bayesian Classifier
• Likelihood – for each class, the fraction of its training samples with the given feature value:
• P(x0 == 255 | c=0) = 2/2 = 1
• P(x1 == 255 | c=0) = 1/2 = 0.5
• P(x2 == 255 | c=0) = 2/2 = 1
• P(x3 == 255 | c=0) = 0/2 = 0
• P(x0 == 255 | c=1) = 0/2 = 0
• P(x1 == 255 | c=1) = 0/2 = 0
• P(x2 == 255 | c=1) = 1/2 = 0.5
• P(x3 == 255 | c=1) = 2/2 = 1
Naive Bayesian Classifier
• xt1 = {255, 0, 0, 0} – what class? Use Bayes:
• P(c=0 | xt1) = P(xt1 | c=0) * P(c=0) / P(xt1)
∝ P(x0==255 | c=0) * P(x1==0 | c=0) * P(x2==0 | c=0) * P(x3==0 | c=0) * P(c=0)
= 1 * 0.5 * 0 * 1 * 0.5 = 0
• P(c=1 | xt1) = P(xt1 | c=1) * P(c=1) / P(xt1)
∝ P(x0==255 | c=1) * P(x1==0 | c=1) * P(x2==0 | c=1) * P(x3==0 | c=1) * P(c=1)
= 0 * 1 * 0.5 * 0 * 0.5 = 0
Naive Bayesian Classifier
Notes:
• We do not divide by P(xt1) as in the full Bayes formula: it scales the scores of all classes by the same factor, so it can be ignored when comparing them
• We have lots of 0 factors in the products, which force both scores to 0 => solve this
Naive Bayesian Classifier
• Laplace smoothing: add 1 to the numerator and the number of possible feature values (here 2: 0 or 255) to the denominator
• P(x0 == 255 | c=0) = (2+1)/(2+2) = 0.75
• P(x1 == 255 | c=0) = (1+1)/(2+2) = 0.5
• P(x2 == 255 | c=0) = (2+1)/(2+2) = 0.75
• P(x3 == 255 | c=0) = (0+1)/(2+2) = 0.25
• P(c=0 | xt1) ∝ 0.75 * (1 - 0.5) * (1 - 0.75) * (1 - 0.25) * 0.5 ≈ 0.035
• Compute P(c=1 | xt1) in the same way
• t1 will belong to the class with maximum probability
Naive Bayesian Classifier- Implementation

• Classify digits from the MNIST dataset


1. Create the training set:
• We will use only two classes (0 and 1): int C = 2;
• Load the images (28x28) into a feature matrix:
Mat features(num_samples, d, CV_8UC1);
• We will load the first 100 images from each of the classes 0 and 1 => num_samples = 200, d = 28x28 = 784
Naive Bayesian Classifier- Implementation

• Obtain binary images by thresholding at 128:

threshold(img, img, 128, 255, CV_THRESH_BINARY);
• Keep the class of each sample in y:
Mat y(num_samples, 1, CV_8UC1);
• Compute the priors matrix (the fraction of samples that belong to each class):
Mat priors(C, 1, CV_64FC1);

priors[c] = nr_samples_class_c / total_number_of_samples


Naive Bayesian Classifier- Implementation

• Compute the likelihood matrix:

• Mat likelihood(C, d, CV_64FC1, Scalar(1)); // init with 1 (the Laplace numerator) to avoid 0
• We will compute only the likelihood of each feature being equal to 255
• The likelihood of a feature being equal to 0 is then (1 – likelihood(255))

• The likelihood of feature j being equal to 255 given class i is the fraction of the
training instances which have feature j equal to 255 and are from class i:

likelihood[i][j] = (nr_samples_class_i_with_feature_j_255 + 1) / (nr_samples_class_i + 2)

• Adding +1 to the numerator and +2 to the denominator (the number of possible feature values, which here equals the number of classes C) is called Laplace smoothing


Naive Bayesian Classifier- Implementation

• Implement the classification function:


int classifyBayes(Mat img, Mat priors, Mat likelihood);
• Read an image from the test set
• Threshold the image: threshold(img, img, 128, 255, CV_THRESH_BINARY);
• Call classifyBayes(img, priors, likelihood);
Naive Bayesian Classifier- Implementation
int classifyBayes(Mat img, Mat priors, Mat likelihood) {
• Compute the feature vector for the new image
• Compute the log posterior of each class:

• To avoid precision problems, it is recommended to work with logarithms: a product of 784 values
smaller than 1 underflows
• Maximizing the posterior is equivalent to maximizing log(P(ci)) + sum_j log(P(xj | ci)):
• Mat prob(C, 1, CV_64FC1); // log probability of belonging to each class
• for c = 0 .. C-1:
• prob[c, 0] = log(priors[c, 0])
• for j = 0 .. d-1:
prob[c, 0] += log(likelihood[c, j]) if test_feat[0, j] == 255 else log(1 - likelihood[c, j])
• The sample is classified into the class c for which prob[c, 0] is maximum
}
Practical work
1. Load each image from the training set, perform binarization and save
the values in the training matrix X. Save the class label in the label
vector y. For the initial version use only the first 100 images from the
first two classes.
2. Implement the training method.
2.1. Compute and save the priors for each class.
2.2. Compute and save the likelihood values for each class and
each feature. Apply Laplace smoothing to avoid zero values.
3. Implement the Naive Bayes classifier for an unknown image
