
Presentation on

Classification (2)
Section 8.5.1-8.5.3

K Saiveer – 30121101.
8.5.1 Metrics for Evaluating Classifier Performance

• Focus on the predictive capability of a model.


• Rather than how long it takes to build or apply the model, its scalability, etc.
• Model evaluation metrics:
• Accuracy (also known as recognition rate)
• Sensitivity (or recall)
• Specificity
• Precision
• F-measure (F1)
• Fβ
Confusion Matrix: A table that is often used to describe the
performance of a classification model (or classifier) on a set of test data
for which the true values are known.

• The confusion matrix is a table of size m × m, where m is the number of classes.

• An entry CMi,j in the first m rows and m columns indicates the number of
tuples of class i that were labelled by the classifier as class j.
Before we discuss the various measures, we need
to become comfortable with some terminology.
• True Positives (TP): positive tuples that were correctly labelled by the
classifier.
• True Negatives (TN): negative tuples that were correctly labelled by
the classifier.
• False Positives (FP): negative tuples that were incorrectly labelled as
positive.
• False Negatives (FN): positive tuples that were mislabelled as negative.
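A minimal sketch of how these four counts, and the 2 × 2 confusion matrix they form,
can be computed; the labels below are made up for illustration, and numpy is an
assumption (the slides do not name any library):

import numpy as np

# Hypothetical ground-truth and predicted labels (1 = positive, 0 = negative).
y_true = np.array([1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # positives correctly labelled positive
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # negatives correctly labelled negative
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # negatives incorrectly labelled positive
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # positives mislabelled as negative

# Confusion matrix with rows = actual class, columns = predicted class.
confusion = np.array([[tp, fn],
                      [fp, tn]])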
• Most widely used metric:

• Classifier accuracy, or recognition rate: the percentage of test-set tuples
that are correctly classified.

  Accuracy = (TP + TN) / (P + N)

• Error rate, or misclassification rate:

  Error rate = 1 - accuracy = (FP + FN) / (P + N)

where P is the number of positive tuples and N the number of negative tuples.
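Continuing the sketch above, accuracy and error rate follow directly from the four counts:

p = tp + fn                        # P: total actual positives
n = tn + fp                        # N: total actual negatives
accuracy = (tp + tn) / (p + n)
error_rate = (fp + fn) / (p + n)   # equal to 1 - accuracy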


Limitation of Accuracy
• Consider a 2-class problem
• Number of class 0 examples = 9990
• Number of class 1 examples = 10

• If the model predicts everything to be class 0:

• Accuracy is 9990/10000 = 99.9%.
• Accuracy is misleading because the model does not detect any class 1 example.

• Class Imbalance Problem:

• One class may be rare, e.g., fraud or cancer.
• There is a significant majority of the negative class and a minority of the positive class.
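A small numeric sketch of this exact scenario (numpy again being an assumption):

import numpy as np

# 9990 class-0 tuples and 10 class-1 tuples, as in the example above.
y_imb_true = np.array([0] * 9990 + [1] * 10)
y_imb_pred = np.zeros_like(y_imb_true)   # a "model" that predicts class 0 for everything

acc_imb = np.mean(y_imb_true == y_imb_pred)                    # 0.999
class1_found = np.sum((y_imb_true == 1) & (y_imb_pred == 1))   # 0
print(acc_imb, class1_found)   # high accuracy, yet not a single class 1 tuple is detected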
• Sensitivity is also referred to as the true positive (recognition) rate
(TPR): the proportion of positive tuples that are correctly identified.

  Sensitivity = TP / P

• Specificity is the true negative rate (TNR): the proportion of negative
tuples that are correctly identified.

  Specificity = TN / N

• It can be shown that accuracy is a function of sensitivity and specificity:

  Accuracy = Sensitivity × P / (P + N) + Specificity × N / (P + N)
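A short check of this identity, reusing the counts p, n and the accuracy value from the sketches above:

sensitivity = tp / p   # true positive rate, TP / P
specificity = tn / n   # true negative rate, TN / N

# Accuracy as a weighted combination of sensitivity and specificity.
accuracy_from_rates = sensitivity * p / (p + n) + specificity * n / (p + n)
assert abs(accuracy_from_rates - accuracy) < 1e-12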


• Precision: can be thought of as a measure of exactness.
• What percentage of tuples labelled as positive are actually positive?

  Precision = TP / (TP + FP)

• Recall: is a measure of completeness.

• What percentage of positive tuples are labelled as positive?

  Recall = TP / (TP + FN) = TP / P

• A perfect score is 1.0.

• There is an inverse relationship between precision and recall.
• If recall seems familiar, that is because it is the same as sensitivity (the true positive rate).
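Continuing the same counts, precision and recall are one line each (scikit-learn's
precision_score and recall_score would return the same values, though the slides do
not mention that library):

precision = tp / (tp + fp)  # exactness: of the tuples labelled positive, how many really are
recall = tp / (tp + fn)     # completeness: of the truly positive tuples, how many were found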
• F-measure (F1 or F-score): the harmonic mean of precision and recall.

  F1 = 2 × precision × recall / (precision + recall)

• It gives equal weight to precision and recall.

• Fβ: a weighted measure of precision and recall.

  Fβ = (1 + β²) × precision × recall / (β² × precision + recall)

• It assigns β times as much weight to recall as to precision.

• Commonly used Fβ measures are:
• F2, which weights recall twice as much as precision.
• F0.5, which weights precision twice as much as recall.
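A small sketch of both measures in terms of the precision and recall computed above
(sklearn.metrics.fbeta_score computes the same quantity directly from labels):

def f_beta(precision, recall, beta=1.0):
    # beta > 1 favours recall, beta < 1 favours precision, beta = 1 gives F1.
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

f1 = f_beta(precision, recall)              # equal weight to precision and recall
f2 = f_beta(precision, recall, beta=2.0)    # recall weighted twice as much as precision
f05 = f_beta(precision, recall, beta=0.5)   # precision weighted twice as much as recall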
Classifier Evaluation Metrics Example
8.5.2 Holdout Method and Random
Subsampling
• Holdout Method:
• This is the basic approach for estimating the accuracy of a classifier.

• The given data set is partitioned into two disjoint sets, called the training set and the testing set.

• The classifier is learned from the training set and evaluated on the testing set.

• The proportion of the training and testing sets is at the discretion of the analyst, typically 1:1 or 2:1, and
there is a trade-off between the sizes of these two sets.

• If the training set is too large, the model may be good, but the estimate may be less
reliable due to the small testing set, and vice versa.
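A minimal holdout sketch, assuming scikit-learn (not named in the slides), a synthetic
data set standing in for any real one, and a decision tree as the classifier; the 2:1
split is one of the ratios mentioned above:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic two-class data set (features X, labels y).
X, y = make_classification(n_samples=600, random_state=0)

# Disjoint training and testing sets in a 2:1 proportion.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)  # learn on the training set
holdout_accuracy = accuracy_score(y_test, clf.predict(X_test))      # evaluate on the testing set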
Random Subsampling
• It is a variation of the holdout method to overcome the drawback of over-
representing a class in one set and thus under-representing it in the other, and vice
versa.

• In this method, the holdout method is repeated k times, and each time two
disjoint sets are chosen at random with predefined sizes.

• The overall estimate is taken as the average of the estimates obtained from the individual
iterations.
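A sketch of repeated random subsampling, reusing X, y and the decision tree from the
holdout sketch above; scikit-learn's ShuffleSplit implements exactly this repeated
random partitioning:

import numpy as np
from sklearn.model_selection import ShuffleSplit
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

k = 10
scores = []
# k independent random splits into disjoint training and testing sets of predefined size.
for train_idx, test_idx in ShuffleSplit(n_splits=k, test_size=1/3, random_state=0).split(X):
    clf = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))

overall_estimate = np.mean(scores)   # average of the k per-iteration estimates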
8.5.3 Cross-Validation
• The main drawback of random subsampling is that it has no control over the number of times
each tuple is used for training and testing.

• Cross-validation is proposed to overcome this problem.

• There are two variations in the cross-validation method.

• k-fold cross-validation

• N-fold cross-validation
k-fold Cross-Validation
• A data set consisting of N tuples is divided into k (usually 5 or 10) equal,
mutually exclusive parts, or folds; if N is not divisible by k, then the last
fold has fewer tuples than the other (k-1) folds.

• A series of k runs is carried out with this decomposition; in the i-th iteration, fold i is
used as the test data and the other folds as the training data.
• Thus, each tuple is used the same number of times for training and exactly once for testing.

• The overall estimate is taken as the average of the estimates obtained from the individual
iterations.
[Figure: the data set is split into folds D1, …, Di, …, Dk; in each run one fold is held
out as the test set while the learning technique builds a classifier from the remaining
folds, and the classifier's accuracy/performance is recorded.]
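A k-fold sketch with the same assumptions as the sketches above (scikit-learn, the
synthetic X, y, a decision tree); cross_val_score wraps this loop in one call:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

k = 10
fold_scores = []
# Split the N tuples into k mutually exclusive folds; each fold is the test set exactly once.
for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
    clf = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    fold_scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))

cv_estimate = np.mean(fold_scores)   # average over the k folds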
N-fold Cross-Validation
• In the k-fold cross-validation method, only part of the given data is used for training
in each of the k tests.

• N-fold cross-validation is an extreme case of k-fold cross-validation, often
known as "leave-one-out" cross-validation.

• Here, the data set is divided into as many folds as there are instances; thus, almost
the entire data set forms the training set each time, and N classifiers are built.

• In this method, therefore, each of the N classifiers is built from N-1 instances, and
each classifier is used to classify a single test instance.

• The test sets are mutually exclusive and together cover the entire data set. In
effect, the model is trained on (almost) the entire data and also tested on the entire
data set.

• The overall estimate is then the average of the results of the N classifiers.
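A leave-one-out sketch, with the same assumptions as before (scikit-learn and the synthetic X, y):

import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

loo_scores = []
# One fold per instance: each classifier is trained on N-1 tuples and tested on the remaining one.
for train_idx, test_idx in LeaveOneOut().split(X):
    clf = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    loo_scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))

loocv_estimate = np.mean(loo_scores)   # average over all N single-instance tests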


N-fold Cross-Validation : Issue
• As far as the estimation of the accuracy and performance of a classifier model is
concerned, N-fold cross-validation is comparable to the other methods we have
just discussed.

• The drawback of the N-fold cross-validation strategy is that it is
computationally expensive, as the run has to be repeated N times; this
is particularly true when the data set is large.

• In practice, the method is most beneficial with very small data sets,
where as much data as possible needs to be used to train the classifier.
Thank
you.
