Student ID: 16-33040-3: Dowla, Md. Nozib Ud
Problem
Task:
1. Choose any five classifier algorithms from Weka and find the best classifier.
2. Find the ROC curve and the report.
Additional task:
1. Create one test data set from your assigned training data set.
2. Run this test data set on Weka with the training data set and show the performance of
test data.
Project Definition
The main aim of this project is to run different data mining algorithms in the Weka tool
and find the best classifier for a particular data set.
Results
NaiveBayes Classifier
From Figure 1, 44.7991% of the instances were correctly classified and 55.2009% were
incorrectly classified. The Kappa statistic is 0.2697. The recall for class “bus” is only 0.147,
which is very poor, and the recall for class “saab” is only 0.392, which is also not good. The
confusion matrix confirms this: for class “bus”, only 32 of 218 instances were classified as
“bus”, which indicates really poor performance. For class “saab”, only 85 of 217 instances were
classified correctly and 132 were misclassified, so more than 60% of the “saab” instances were
classified incorrectly. The ROC area for class “bus” is 0.843, or 84.3%, but the algorithm still
does not perform well on that class.
By that, it can be said that the performance of the NaiveBayes algorithm is not satisfactory.
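The recall values quoted above follow directly from the confusion-matrix counts. A minimal sketch of that arithmetic, assuming recall is defined as the correctly classified instances of a class divided by all instances of that class (the function name `recall` is introduced here for illustration only):

```python
def recall(correct: int, class_total: int) -> float:
    """Recall = correctly classified instances of a class / all instances of that class."""
    if class_total <= 0:
        raise ValueError("class_total must be positive")
    return correct / class_total


# Counts quoted in the NaiveBayes discussion above.
bus_recall = recall(32, 218)
saab_recall = recall(85, 217)

print(f"bus recall:  {bus_recall:.3f}")
print(f"saab recall: {saab_recall:.3f}")
print(f"saab misclassified: {1 - saab_recall:.1%}")
```

Rounded to three decimals, these reproduce the 0.147 and 0.392 reported for “bus” and “saab”.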
AdaBoostM1 Classifier
From Figure 2, 39.9527% of the instances were correctly classified and 60.0473% were
incorrectly classified. The Kappa statistic is 0.2059, or 20.59%. In the “bus” class the recall
is 0, and the recall of the “opel” class is 0.245, which is also not good. The confusion matrix
makes this clearer: in the “bus” class, none of the 218 instances were classified correctly. In
the “van” class the algorithm performed perfectly: all 119 of 119 instances were classified
correctly and none incorrectly.
By that, it can be said that the AdaBoostM1 algorithm is not performing well.
ZeroR Classifier
From Figure 3, 25.6501% of the instances were correctly classified and 74.3499% were
incorrectly classified. The Kappa statistic is -0.0014. The F-Measure for every class is below
0.4, which indicates that the algorithm is performing very poorly. In the “van” class the
F-Measure is 0, which is very bad.
To understand this, we have to analyze the confusion matrix. In the “van” class, the number of
correctly classified instances decreased compared to the previous AdaBoostM1 algorithm: not a
single instance was classified correctly, and the same is true of the “opel” class. In the “bus”
class, 196 of 218 instances were classified correctly. From this analysis we can say that ZeroR
performed by far the worst of all the algorithms used to evaluate this data set.
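The Kappa statistic quoted for each classifier measures agreement beyond what chance alone would give. Its standard definition (Cohen's kappa) is shown below; the symbols p_o (observed accuracy) and p_e (agreement expected by chance) are notation introduced here and do not appear in the Weka output.

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
```

For ZeroR, p_o = 0.256501 and the Kappa statistic is -0.0014, which means p_o is essentially equal to p_e: the classifier does no better than chance agreement.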
RandomForest Classifier
From Figure 4, 75.0591% of the instances were correctly classified and 24.9409% were
incorrectly classified. The Kappa statistic is 0.6675, or 66.75%. In the “bus” class, 214 of 218
instances were correctly classified and only 4 were incorrectly classified, giving a recall of
0.982, which is good. The previous ZeroR algorithm struggled with the “bus” class; with
RandomForest, the F-Measure for “bus” is significantly higher than with ZeroR. RandomForest
also has the highest Weighted Avg. F-Measure, 0.745, which indicates that it performs best
among the algorithms tested.
To justify this, we can look at the confusion matrix: the previous algorithms did not perform
well in classifying the “bus” class, but RandomForest does far better in this area.
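The per-class F-Measure and its weighted average can be derived from a confusion matrix. The sketch below assumes rows are actual classes and columns are predicted classes, and that the weighted average weights each class by its number of actual instances. The matrix `example` is hypothetical and is not the RandomForest output from Figure 4; the function names are also illustrative.

```python
from typing import List


def per_class_f_measure(matrix: List[List[int]]) -> List[float]:
    """F-Measure per class; rows are actual classes, columns are predicted classes."""
    scores = []
    for k in range(len(matrix)):
        correct = matrix[k][k]
        actual = sum(matrix[k])                    # row total: instances of class k
        predicted = sum(row[k] for row in matrix)  # column total: predictions of class k
        recall = correct / actual if actual else 0.0
        precision = correct / predicted if predicted else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores


def weighted_f_measure(matrix: List[List[int]]) -> float:
    """Average of per-class F-Measures, weighted by each class's instance count."""
    totals = [sum(row) for row in matrix]
    scores = per_class_f_measure(matrix)
    return sum(f * t for f, t in zip(scores, totals)) / sum(totals)


# HYPOTHETICAL 3-class confusion matrix, for illustration only.
example = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 0, 10],
]

scores = per_class_f_measure(example)
overall = weighted_f_measure(example)
print([round(s, 3) for s in scores], round(overall, 3))
```

A class that is never predicted correctly gets an F-Measure of 0 under this definition, which matches the behaviour described for “van” under ZeroR.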
SimpleCart Classifier
From Figure 5, 68.9125% of the instances were correctly classified and 31.0875% were
incorrectly classified. The Kappa statistic is 0.5856, or 58.56%. The “saab” class has an
F-Measure of 0.463, which is not good. Even so, the overall accuracy of about 68.9% is far
better than that of the NaiveBayes, AdaBoostM1 and ZeroR algorithms.
To understand this, we have to analyze the confusion matrix. In the “bus” class, 201 of 218
instances were classified correctly, which indicates good performance. In the “van” class, 173
instances were classified correctly and only 26 incorrectly. From this analysis we can say that
the SimpleCart algorithm performs well.
Thus, among all the algorithms applied to this data set, RandomForest has the best
performance.
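The comparison can be summarised by ranking the five classifiers on the accuracy and Kappa values reported in the Results section. This is only a small sketch of that ranking; the dictionary names are introduced here for illustration.

```python
# Percentage of correctly classified instances, as reported for each classifier.
accuracy_percent = {
    "NaiveBayes": 44.7991,
    "AdaBoostM1": 39.9527,
    "ZeroR": 25.6501,
    "RandomForest": 75.0591,
    "SimpleCart": 68.9125,
}

# Kappa statistic, as reported for each classifier.
kappa = {
    "NaiveBayes": 0.2697,
    "AdaBoostM1": 0.2059,
    "ZeroR": -0.0014,
    "RandomForest": 0.6675,
    "SimpleCart": 0.5856,
}

ranked_by_accuracy = sorted(accuracy_percent, key=accuracy_percent.get, reverse=True)
ranked_by_kappa = sorted(kappa, key=kappa.get, reverse=True)

for name in ranked_by_accuracy:
    print(f"{name:<13} accuracy={accuracy_percent[name]:.4f}%  kappa={kappa[name]:.4f}")

best = ranked_by_accuracy[0]
```

Both metrics give the same ordering, with RandomForest first and ZeroR last.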
Figure 8 ZeroR ROC curve
Additional task
1. Create one test data set from your assigned training data set.
Figure 11 shows the test data set that was created to evaluate algorithm performance.
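The report does not state how the test data set was derived from the training data. One possible approach, sketched below purely as an assumption, is to hold out a random fraction of the data rows of the ARFF file while keeping the header intact. The function name `split_arff`, the 20% fraction, the seed, and the tiny `sample` file are all hypothetical.

```python
import random
from typing import List, Tuple


def split_arff(lines: List[str], test_fraction: float = 0.2,
               seed: int = 1) -> Tuple[List[str], List[str]]:
    """Return (train_lines, test_lines); both keep the full ARFF header."""
    # The header is everything up to and including the "@data" line.
    data_idx = next(i for i, line in enumerate(lines)
                    if line.strip().lower() == "@data")
    header = lines[: data_idx + 1]
    # Data rows: skip blank lines and "%" comment lines.
    rows = [ln for ln in lines[data_idx + 1:]
            if ln.strip() and not ln.lstrip().startswith("%")]
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    rng.shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return header + rows[n_test:], header + rows[:n_test]


# HYPOTHETICAL miniature ARFF content, for illustration only.
sample = [
    "@relation demo",
    "@attribute x numeric",
    "@attribute class {bus,van}",
    "@data",
    "1,bus", "2,van", "3,bus", "4,van", "5,bus",
]

train, test = split_arff(sample, test_fraction=0.2, seed=1)
print(len(train) - 4, "training rows;", len(test) - 4, "test rows")
```

The two resulting files could then be loaded in Weka as the training set and as the supplied test set.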
2. Run this test data set on Weka with the training data set and show the performance of
test data.