
CoSc 6211 Advanced Concepts of Data Mining

Classification and Prediction (3-1)

Model Evaluation and Selection


Model Evaluation and Selection

• Evaluation metrics: How can we measure accuracy? What other metrics should we consider?
• Use a validation (test) set of class-labeled tuples, rather than the training set, when assessing accuracy
• Methods for estimating a classifier’s accuracy:
– Holdout method, random subsampling
– Cross-validation
– Bootstrap
• Comparing classifiers:
– Confidence intervals
– Cost-benefit analysis and ROC Curves



Classifier Evaluation Metrics: Confusion Matrix
• The confusion matrix is a useful tool for analyzing how well your classifier can recognize tuples of different classes. A confusion matrix for two classes is shown below.

Actual class\Predicted class    C1                      ¬C1
C1                              True Positives (TP)     False Negatives (FN)
¬C1                             False Positives (FP)    True Negatives (TN)

• Given m classes, an entry CM(i,j) in a confusion matrix indicates the # of tuples in class i that were labeled by the classifier as class j
• May have extra rows/columns to provide totals
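
As a minimal illustration (assuming scikit-learn is available; the label vectors below are made-up placeholders, not data from these slides), a two-class confusion matrix can be computed as follows:

from sklearn.metrics import confusion_matrix

# Hypothetical actual and predicted class labels for six tuples
y_true = ["C1", "C1", "C1", "notC1", "notC1", "C1"]
y_pred = ["C1", "notC1", "C1", "notC1", "C1", "C1"]

# Rows correspond to the actual class, columns to the predicted class,
# in the order given by labels=; for the positive class C1 the layout is
# [[TP, FN],
#  [FP, TN]]
cm = confusion_matrix(y_true, y_pred, labels=["C1", "notC1"])
print(cm)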



The confusion matrix is a useful tool for analyzing how well your classifier can
recognize tuples of different classes. TP and TN tell us when the classifier is getting
things right, while FP and FN tell us when the classifier is getting things wrong (i.e.,
mislabeling).
                       Predicted class
Actual class           C (yes)                   ¬C (no)                   Total
C (yes)                TP (true positives)       FN (false negatives)      P
¬C (no)                FP (false positives)      TN (true negatives)       N
Total                  P’                        N’                        All

P’ is the number of tuples that were labeled as positive (TP + FP) and N’ is the number of tuples that were labeled as negative (TN + FN). The total number of tuples is TP + TN + FP + FN, or P + N, or P’ + N’. Note that although the confusion matrix shown is for a binary classification problem, confusion matrices can easily be drawn for multiple classes in a similar manner.



Classifier Evaluation Metrics: Accuracy, Error Rate, Sensitivity and Specificity
Class Imbalance Problem:
 One class may be rare, e.g., fraud or HIV-positive cases
 The negative class holds the significant majority of the tuples, and the positive class only a small minority

A\P      C     ¬C    Total
C        TP    FN    P
¬C       FP    TN    N
Total    P’    N’    All

 Classifier accuracy, or recognition rate: percentage of test set tuples that are correctly classified
   Accuracy = (TP + TN)/All
 Error rate: 1 – accuracy, or Error rate = (FP + FN)/All
 Sensitivity: true positive recognition rate
   Sensitivity = TP/P
 Specificity: true negative recognition rate
   Specificity = TN/N
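
A minimal plain-Python sketch of these four measures; the TP/FN/FP/TN counts are taken from the cancer example that appears later in these slides.

TP, FN, FP, TN = 90, 210, 140, 9560   # confusion-matrix entries (cancer example, later slide)
P, N = TP + FN, FP + TN               # actual positives and actual negatives
All = P + N

accuracy    = (TP + TN) / All         # recognition rate
error_rate  = (FP + FN) / All         # equivalently 1 - accuracy
sensitivity = TP / P                  # true positive recognition rate
specificity = TN / N                  # true negative recognition rate

print(accuracy, error_rate, sensitivity, specificity)   # 0.965, 0.035, 0.30, ~0.9856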
Classifier Evaluation Metrics: Precision and Recall, and F-measures
• Precision (exactness): what % of tuples that the classifier labeled as positive are actually positive?
  Precision = TP/(TP + FP)
• Recall (completeness): what % of positive tuples did the classifier label as positive?
  Recall = TP/(TP + FN)
• A perfect score is 1.0
• There is an inverse relationship between precision and recall
• F measure (F1 or F-score): harmonic mean of precision and recall
  F = 2 × precision × recall / (precision + recall)
• Fβ: weighted measure of precision and recall
  – assigns β times as much weight to recall as to precision
  Fβ = (1 + β²) × precision × recall / (β² × precision + recall)
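
A minimal sketch of these measures as plain Python functions; the example counts (again from the cancer example on the next slide) and the choice of β are only illustrative.

def precision(TP, FP):
    # fraction of tuples labeled positive that really are positive
    return TP / (TP + FP)

def recall(TP, FN):
    # fraction of actual positive tuples that were labeled positive
    return TP / (TP + FN)

def f_beta(prec, rec, beta=1.0):
    # beta = 1 gives F1, the harmonic mean of precision and recall;
    # beta = 2 weights recall more heavily, beta = 0.5 weights precision more
    b2 = beta ** 2
    return (1 + b2) * prec * rec / (b2 * prec + rec)

p = precision(TP=90, FP=140)        # ~0.3913
r = recall(TP=90, FN=210)           # 0.30
print(p, r, f_beta(p, r), f_beta(p, r, beta=2.0))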
Classifier Evaluation Metrics: Example

Actual class\Predicted class   cancer = yes   cancer = no   Total    Recognition (%)
cancer = yes                   90             210           300      30.00 (sensitivity)
cancer = no                    140            9560          9700     98.56 (specificity)
Total                          230            9770          10000    96.50 (accuracy)

Sensitivity = TP/P = 90/300 = 30.00%
Specificity = TN/N = 9560/9700 = 98.56%
Accuracy = (TP + TN)/All = 9650/10000 = 96.50%

Precision = TP/(TP + FP) = 90/230 = 39.13%
Recall = TP/(TP + FN) = 90/300 = 30.00%



Evaluating Classifier Accuracy: Holdout & Cross-Validation Methods
• Holdout method
  – The given data is randomly partitioned into two independent sets
    • Training set (e.g., 2/3) for model construction
    • Test set (e.g., 1/3) for accuracy estimation
  – Random subsampling: a variation of holdout
    • Repeat holdout k times; accuracy = average of the accuracies obtained
• Cross-validation (k-fold, where k = 10 is most popular)
  – Randomly partition the data into k mutually exclusive subsets D1, …, Dk, each of approximately equal size
  – At the i-th iteration, use Di as the test set and the remaining subsets together as the training set
  – Leave-one-out is a special case of k-fold cross-validation where k is set to the number of initial tuples; only one sample is “left out” at a time for the test set
  – In stratified cross-validation, the folds are stratified so that the class distribution of the tuples in each fold is approximately the same as that in the initial data
• In general, stratified 10-fold cross-validation is recommended for estimating accuracy (even if computation power allows using more folds) due to its relatively low bias and variance (see the sketch below)
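
A minimal sketch of stratified 10-fold cross-validation (assuming scikit-learn); the decision tree classifier and the synthetic, imbalanced data set are placeholders for whatever model and data are actually used.

from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, class-imbalanced data standing in for a real training set
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

clf = DecisionTreeClassifier(random_state=0)
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)  # stratified 10-fold

# One accuracy per fold; their mean is the cross-validated accuracy estimate
scores = cross_val_score(clf, X, y, cv=folds, scoring="accuracy")
print(scores.mean(), scores.std())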



Cross Validation: Holdout Method

— Break the data up into groups of the same size

— Hold aside one group for testing and use the rest to build the model

— Repeat, so that each group serves as the test set exactly once

[Figure: the k folds shown across iterations, with the held-out test fold highlighted in each iteration]



Evaluating Classifier Accuracy: Bootstrap
• Bootstrap
  – Works well with small data sets
  – Samples the given training tuples uniformly with replacement
    • i.e., each time a tuple is selected, it is equally likely to be selected again and re-added to the training set
• There are several bootstrap methods; a common one is the .632 bootstrap
  – A data set with d tuples is sampled d times, with replacement, resulting in a bootstrap training set of d samples. The data tuples that did not make it into the training set end up forming the test set. About 63.2% of the original tuples end up in the bootstrap sample, and the remaining 36.8% form the test set (since (1 − 1/d)^d ≈ e⁻¹ = 0.368 for large d)
  – Repeat the sampling procedure k times; the overall accuracy of the model is
    Acc(M) = Σ_{i=1..k} ( 0.632 × Acc(Mi)_test_set + 0.368 × Acc(Mi)_train_set )
    where Acc(Mi)_test_set is the accuracy of the model built on bootstrap sample i when applied to its test set, and Acc(Mi)_train_set is its accuracy on the original data set
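
A minimal sketch of the .632 bootstrap (assuming NumPy and scikit-learn); the decision tree and the synthetic data are placeholders, and the k per-repetition estimates are averaged at the end.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
d = len(X)
k = 50                                          # number of bootstrap repetitions
rng = np.random.default_rng(0)

estimates = []
for _ in range(k):
    sample = rng.integers(0, d, size=d)         # draw d tuples with replacement
    oob = np.setdiff1d(np.arange(d), sample)    # tuples never drawn form the test set
    model = DecisionTreeClassifier(random_state=0).fit(X[sample], y[sample])
    acc_test = model.score(X[oob], y[oob])      # accuracy on the held-out ~36.8%
    acc_train = model.score(X, y)               # accuracy on the original data set
    estimates.append(0.632 * acc_test + 0.368 * acc_train)

print(np.mean(estimates))                       # .632 bootstrap accuracy estimate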



Model Selection: ROC Curves
• ROC (Receiver Operating Characteristic) curves: for visual comparison of classification models
• Originated from signal detection theory
• Shows the trade-off between the true positive rate and the false positive rate
• The area under the ROC curve is a measure of the accuracy of the model
• Rank the test tuples in decreasing order: the one that is most likely to belong to the positive class appears at the top of the list
• The closer the curve is to the diagonal line (i.e., the closer the area is to 0.5), the less accurate the model
 The vertical axis represents the true positive rate
 The horizontal axis represents the false positive rate
 The plot also shows a diagonal line
 A model with perfect accuracy will have an area of 1.0



Issues Affecting Model Selection
• Accuracy
– classifier accuracy: predicting class label
• Speed
– time to construct the model (training time)
– time to use the model (classification/prediction time)
• Robustness: handling noise and missing values
• Scalability: efficiency in disk-resident databases
• Interpretability
– understanding and insight provided by the model
• Other measures, e.g., goodness of rules, such as decision tree size or
compactness of classification rules



Reading assignment

• Text Book: Data Mining: Concepts and Techniques, by J. Han
• Linear regression
• Model Evaluation and Selection

