PRSL 9

Naive Bayesian Classifier: Digit Recognition Application


Naive Bayesian Classifier

• Classifier that uses the Bayes formula, which describes the conditional
probability of an event given another event:

P(A|B) = P(B|A) * P(A) / P(B)

A, B – events
P(A|B) – probability of A given that B is true
P(B|A) – probability of B given that A is true
P(A), P(B) – independent probabilities of A and B
Naive Bayesian Classifier

• Given n samples S = {s1, s2, …, sn} (the training set)

• Each sample has a set of features x = {x1, x2, …, xd}
• Each sample belongs to one of the classes C = {c1, c2, …, cJ}

• Given a new sample that does not belong to S (it belongs to the test
set) and its features x, to which class does this sample belong?
• We make the “naive” assumption that the features are independent of
one another in order to simplify the computation
Naive Bayesian Classifier
• Given a new sample and its features x, to which class does this
sample belong?
• For each class ci from C:
• Let Pi be the probability that the new sample belongs to class ci:
• Pi = P(c = ci | x) = P(ci) * P(x | c = ci) / P(x)

• For each feature xj from the feature vector x of the new sample:
• Pj = the fraction of samples of class ci in S whose j-th feature equals xj
• P(x | c = ci) *= Pj (compute the product over all features)

• The new sample belongs to the class ci for which Pi is maximum


Naive Bayesian Classifier
• Simple example
• Training set
• S = {s1, s2, s3, s4}
• xs1= {255, 0, 255, 0}
• xs2= {255, 255, 255, 0}
• xs3= {0, 0, 0, 255}
• xs4= {0, 0, 255, 255}
• C = {0, 1}
• s1, s2 – class 0
• s3, s4 - class 1
Naive Bayesian Classifier
• Simple example
• Training set: S = {s1, s2, s3, s4}
• xs1 = {255, 0, 255, 0}
• xs2 = {255, 255, 255, 0}
• xs3 = {0, 0, 0, 255}
• xs4 = {0, 0, 255, 255}
• C = {0, 1}
• s1, s2 – class 0
• s3, s4 – class 1
• Test set: T = {t1, t2}
• xt1 = {255, 0, 0, 0}
• xt2 = {0, 255, 0, 255}
• To which class do t1 and t2 belong?
Naive Bayesian Classifier
• Priors – the fraction of training samples in each class:
• P(c=0) = nr_samples_class_0 / total_samples = 2/4 = 0.5
• P(c=1) = nr_samples_class_1 / total_samples = 2/4 = 0.5
Naive Bayesian Classifier
• Likelihood – for each class, the fraction of its training samples with the given feature value:
• P(x0 == 255 | c=0) = 2/2 = 1
• P(x1 == 255 | c=0) = 1/2 = 0.5
• P(x2 == 255 | c=0) = 2/2 = 1
• P(x3 == 255 | c=0) = 0/2 = 0
• P(x0 == 255 | c=1) = 0/2 = 0
• P(x1 == 255 | c=1) = 0/2 = 0
• P(x2 == 255 | c=1) = 1/2 = 0.5
• P(x3 == 255 | c=1) = 2/2 = 1
Naive Bayesian Classifier
• xt1 = {255, 0, 0, 0} – what class? Use Bayes:
• P(c=0 | xt1) = P(xt1 | c=0) * P(c=0) / P(xt1)
∝ P(x0==255 | c=0) * P(x1==0 | c=0) * P(x2==0 | c=0) * P(x3==0 | c=0) * P(c=0)
= 1 * 0.5 * 0 * 1 * 0.5 = 0
• P(c=1 | xt1) = P(xt1 | c=1) * P(c=1) / P(xt1)
∝ P(x0==255 | c=1) * P(x1==0 | c=1) * P(x2==0 | c=1) * P(x3==0 | c=1) * P(c=1)
= 0 * 1 * 0.5 * 0 * 0.5 = 0
Naive Bayesian Classifier
Notes:
• We do not divide by P(xt1) as in the full Bayes formula: it scales the scores of all classes by the same factor, so it can be ignored when comparing them
• We have lots of 0 factors in the products, which force both scores to 0 => solve this
Naive Bayesian Classifier
• Laplace smoothing: add 1 to the numerator and the number of possible feature values (here 2: 0 or 255) to the denominator
• P(x0 == 255 | c=0) = (2+1)/(2+2) = 0.75
• P(x1 == 255 | c=0) = (1+1)/(2+2) = 0.5
• P(x2 == 255 | c=0) = (2+1)/(2+2) = 0.75
• P(x3 == 255 | c=0) = (0+1)/(2+2) = 0.25
• P(c=0 | xt1) ∝ 0.75 * (1 - 0.5) * (1 - 0.75) * (1 - 0.25) * 0.5 ≈ 0.035
• Compute P(c=1 | xt1) in the same way
• t1 will belong to the class with maximum probability
Naive Bayesian Classifier- Implementation

• Classify digits from the MNIST dataset


1. Create the training set:
• We will use only two classes (0 and 1): int C = 2;
• Load the images (28x28) into a feature matrix:
Mat features(num_samples, d, CV_8UC1);
• We will load the first 100 images from each of the classes 0 and 1 => num_samples = 200, d = 28x28 = 784
Naive Bayesian Classifier- Implementation

• Obtain binary images by thresholding at 128:

threshold(img, img, 128, 255, CV_THRESH_BINARY);
• Keep the class of each sample in y:
Mat y(num_samples, 1, CV_8UC1);
• Compute the priors matrix (the fraction of samples that belong to each class):
Mat priors(C, 1, CV_64FC1);

priors[c] = nr_samples_class_c / total_number_of_samples


Naive Bayesian Classifier- Implementation

• Compute the likelihood matrix:

• Mat likelihood(C, d, CV_64FC1, Scalar(1)); // init with 1 (the Laplace numerator) to avoid 0
• We will compute only the likelihood of each feature being equal to 255
• The likelihood of a feature being equal to 0 is then (1 – likelihood(255))

• The likelihood of feature j being equal to 255 given class i is the fraction of the
training instances which have feature j equal to 255 and are from class i:

likelihood[i][j] = (nr_samples_class_i_with_feature_j_255 + 1) / (nr_samples_class_i + 2)

• Adding +1 to the numerator and +2 to the denominator (the number of possible feature values, which here equals the number of classes C) is called Laplace smoothing


Naive Bayesian Classifier- Implementation

• Implement the classification function:


int classifyBayes(Mat img, Mat priors, Mat likelihood);
• Read an image from the test set
• Threshold the image: threshold(img, img, 128, 255, CV_THRESH_BINARY);
• Call classifyBayes(img, priors, likelihood);
Naive Bayesian Classifier- Implementation
int classifyBayes(Mat img, Mat priors, Mat likelihood) {
• Compute the feature vector for the new image
• Compute the log posterior of each class:

• To avoid precision problems, it is recommended to work with logarithms: a product of 784 values
smaller than 1 underflows
• Maximizing the posterior is equivalent to maximizing log(P(ci)) + sum_j log(P(xj | ci)):
• Mat prob(C, 1, CV_64FC1); // log probability of belonging to each class
• for c = 0 .. C-1:
• prob[c, 0] = log(priors[c, 0])
• for j = 0 .. d-1:
prob[c, 0] += log(likelihood[c, j]) if test_feat[0, j] == 255 else log(1 - likelihood[c, j])
• The sample is classified into the class c for which prob[c, 0] is maximum
}
Practical work
1. Load each image from the training set, perform binarization and save
the values in the training matrix X. Save the class label in the label
vector y. For the initial version use only the first 100 images from the
first two classes.
2. Implement the training method.
2.1. Compute and save the priors for each class.
2.2. Compute and save the likelihood values for each class and
each feature. Apply Laplace smoothing to avoid zero values.
3. Implement the Naive Bayes classifier for an unknown image
