DS CP Paper
Harshal Dhande, Manoj Dohale, Ayush Doshi, Ojas Dudhabaware, Abha Marathe
2 AGE
3 YELLOW FINGER
4 PEER PRESSURE Patient having yellow finger or not (Two values 2 and 1)
6 PEER PRESSURE Patient having peer pressure or not (Two values 2 and 1)
7 CHRONIC DISEASE Patient having chronic disease or not (Two values 2 and 1)
9 ALLERGY Patient having any kind of allergy or not (Two values 2 and 1)
Fig. 3. Decision Tree
B) Random Forest
Random Forest is an ensemble model that uses Decision Trees as its base learners. It consists of many Decision Trees, each of which operates independently and produces its own prediction for the target variable. To build each tree, the model uses bagging and feature randomness, which keeps the trees in the forest largely uncorrelated. As a result, the combined prediction is typically more accurate than that of any individual tree.
Fig. 5. SVM radial
E) Logistic Regression
Logistic regression is a statistical method for binary classification that can be generalized to multiclass classification. It is easy to implement and performs very well on linearly separable classes. Logistic regression has been extensively employed as a classification algorithm in industry.
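As an illustration of how a fitted logistic regression model turns a patient record into a binary decision, the following is a minimal sketch in plain Python. The weights, bias, feature values, and the 0.5 threshold are hypothetical and chosen only for the example; they are not taken from the paper's trained model.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class under a logistic model."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


def predict(features, weights, bias, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical parameters and inputs, for illustration only.
weights = [0.8, -0.4]
bias = -0.2
sample = [1.0, 0.5]

print(predict_proba(sample, weights, bias))
print(predict(sample, weights, bias))
```

The decision boundary is the set of points where the linear score equals zero, which is why the model works best when the classes are linearly separable.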
Fig. 6. Logistic Regression
A) PERFORMANCE EVALUATION MEASURE
Various evaluation metrics were used to check the performance of the classifiers. For this purpose, the confusion matrix was used. It is a 2×2 matrix because the dataset has two classes. The confusion matrix records the two types of correct prediction (true positives, TP, and true negatives, TN) and the two types of incorrect prediction (false positives, FP, and false negatives, FN). The confusion matrix is presented in Table 2.
TABLE 2. Confusion Matrix.
2) CLASSIFICATION ACCURACY
Classification accuracy is the rate of correct predictions. It is computed from the confusion matrix by equation 1:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100 \qquad (1)$$
3) CLASSIFICATION ERROR
Classification error is the rate of incorrect predictions. It is computed from the confusion matrix by equation 2:
$$\text{Error} = \frac{FP + FN}{TP + TN + FP + FN} \times 100 \qquad (2)$$
4) PRECISION
Precision is an important model performance evaluation metric. It is the fraction of relevant instances among the total retrieved instances, also known as the positive predictive value. The precision is calculated as follows in equation 3:
$$\text{Precision} = \frac{TP}{TP + FP} \times 100 \qquad (3)$$
5) RECALL
Recall is also an important model performance evaluation metric. It is the fraction of all relevant instances that were actually retrieved. The recall is calculated as follows in equation 4:
$$\text{Recall} = \frac{TP}{TP + FN} \times 100 \qquad (4)$$
6) SPECIFICITY
Specificity is also an important model performance evaluation metric. It is the proportion of truly negative cases that were classified as negative, so it measures how well the classifier identifies negative cases; it is also called the true negative rate. The specificity is calculated as follows in equation 5:
$$\text{Specificity} = \frac{TN}{TN + FP} \times 100 \qquad (5)$$
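The evaluation measures described above can all be derived from the four confusion-matrix counts. The sketch below tallies those counts from paired label lists and then applies equations 3 and 4 together with the usual definitions of accuracy, error, and specificity. The function names, the toy labels, and the choice of 2 as the positive class are assumptions made for illustration; the paper does not state how the target class is encoded.

```python
def confusion_counts(actual, predicted, positive=2):
    """Tally TP, TN, FP, FN for a two-class problem.

    `positive` is the label treated as the positive class
    (assumed to be 2 here, purely for illustration).
    """
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def evaluation_measures(tp, tn, fp, fn):
    """Return the five measures as percentages.

    No guard against zero denominators: this sketch assumes every
    denominator below is non-zero.
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": 100 * (tp + tn) / total,
        "error": 100 * (fp + fn) / total,
        "precision": 100 * tp / (tp + fp),
        "recall": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
    }


# Toy labels, not taken from the paper's dataset.
actual = [2, 2, 2, 1, 1, 2, 1, 1]
predicted = [2, 2, 1, 1, 2, 2, 1, 1]

counts = confusion_counts(actual, predicted)
print(counts)
print(evaluation_measures(*counts))
```

Note that accuracy and error always sum to 100, since every prediction is counted as either correct or incorrect.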
Table III. Accuracy by classifier (SVM Linear, SVM Radial, RF, LGR, Tree)
Fig. 7. Classifier Accuracy on Different Algorithms (accuracy, precision, and recall for SVM Linear, SVM Radial, RF, LGR, and Tree)
Among the applied classifiers, radial SVM (RSVM) gives the highest accuracy of 96.66% with a recall score of 100; linear SVM (LSVM) gives an accuracy of 92.13% with a recall score of 93.33; logistic regression gives an accuracy of 94.38% with a recall score of 93.47; Random Forest gives an accuracy of 91.667% with a recall score of 100; and Decision Tree has the lowest accuracy of 75% with a recall score of 72.41.
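To make the comparison easier to scan, the reported figures can be collected and ranked by accuracy. The short sketch below uses only the accuracy and recall values quoted in the text; the dictionary layout and variable names are illustrative.

```python
# (accuracy %, recall) as reported in the text for each classifier.
reported = {
    "SVM Radial": (96.66, 100.0),
    "SVM Linear": (92.13, 93.33),
    "Logistic Regression": (94.38, 93.47),
    "Random Forest": (91.667, 100.0),
    "Decision Tree": (75.0, 72.41),
}

# Sort classifiers from highest to lowest reported accuracy.
ranked = sorted(reported.items(), key=lambda item: item[1][0], reverse=True)

for name, (accuracy, recall) in ranked:
    print(f"{name:<20} accuracy={accuracy:7.3f}  recall={recall:6.2f}")
```

Ranking by accuracy alone hides the fact that Random Forest, despite its lower accuracy, matches radial SVM on recall.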