
Precision, Recall and ROC curves
CIS4526 Complementary materials
Model Evaluation
• Metrics for Performance Evaluation
  • How to evaluate the performance of a model?
• Methods for Performance Evaluation
  • How to obtain reliable estimates?
• Methods for Model Comparison
  • How to compare the relative performance among competing models?
Metrics for Performance Evaluation
• Focus on the predictive capability of a model
  • Rather than how fast it classifies or builds models, scalability, etc.
• Confusion Matrix:

                          PREDICTED CLASS
                          Class=Yes    Class=No
  ACTUAL     Class=Yes    a (TP)       b (FN)
  CLASS      Class=No     c (FP)       d (TN)

  a: TP (true positive)    b: FN (false negative)
  c: FP (false positive)   d: TN (true negative)
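To make the four cells concrete, here is a minimal Python sketch; the label lists are hypothetical and the use of plain Python is an assumption for illustration, not something the slides prescribe.

# Hypothetical example labels: 1 = positive class (Yes), 0 = negative class (No).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # a
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # b
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # c
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # d
print(tp, fn, fp, tn)   # 3 1 1 3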
Metrics for Performance Evaluation…

                          PREDICTED CLASS
                          Class=Yes    Class=No
  ACTUAL     Class=Yes    a (TP)       b (FN)
  CLASS      Class=No     c (FP)       d (TN)

• Most widely-used metric:

  Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
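A minimal sketch of the accuracy formula, assuming the same hypothetical counts as above; scikit-learn's accuracy_score is used only as a cross-check and is not part of the slides.

from sklearn.metrics import accuracy_score

# Hypothetical confusion-matrix counts: a = TP, b = FN, c = FP, d = TN.
tp, fn, fp, tn = 3, 1, 1, 3
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)                          # 0.75

# The same number computed straight from the hypothetical labels:
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
print(accuracy_score(y_true, y_pred))    # 0.75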
Limitation of Accuracy
• Consider a 2-class problem
  • Number of Class 0 examples = 9990
  • Number of Class 1 examples = 10

• If the model predicts everything to be class 0, accuracy is 9990/10000 = 99.9%
• Accuracy is misleading because the model does not detect any class 1 example
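The slide's numbers can be reproduced with a short plain-Python sketch (only the 9990/10 class split comes from the slide; the rest is an illustrative assumption).

# 9990 class-0 examples, 10 class-1 examples, and a model that predicts class 0 for everything.
y_true = [0] * 9990 + [1] * 10
y_pred = [0] * 10000

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)          # 0.999 -- looks excellent

detected = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
print(detected)          # 0 -- yet not a single class-1 example is detected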
Precision-Recall

                          PREDICTED CLASS
                          Class=Yes    Class=No
  ACTUAL     Class=Yes    a (TP)       b (FN)
  CLASS      Class=No     c (FP)       d (TN)

  Precision (p) = a / (a + c) = TP / (TP + FP)

  Recall (r) = a / (a + b) = TP / (TP + FN)

  F-measure (F) = 1 / ((1/r + 1/p) / 2) = 2rp / (r + p) = 2a / (2a + b + c) = 2TP / (2TP + FP + FN)

• Precision: how many detected positives are true positives?
• Recall: among all true positives, how many are detected?
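A minimal sketch of the three formulas, assuming hypothetical counts; the scikit-learn calls are only a cross-check, not part of the slides.

from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical counts: a = TP, b = FN, c = FP.
tp, fn, fp = 3, 1, 1
precision = tp / (tp + fp)                                  # 0.75
recall    = tp / (tp + fn)                                  # 0.75
f_measure = 2 * precision * recall / (precision + recall)   # 0.75

# Cross-check on labels that produce the same counts:
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
print(precision_score(y_true, y_pred),
      recall_score(y_true, y_pred),
      f1_score(y_true, y_pred))          # 0.75 0.75 0.75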
Precision-Recall plot
• For parameterized models, the parameter (e.g., the decision threshold) controls the precision/recall tradeoff, as the sketch below illustrates

[Figure: Precision-Recall curve]
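One possible way to trace such a curve, assuming hypothetical labels and scores and scikit-learn's precision_recall_curve (an assumed tool choice):

import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical labels and model scores (e.g., predicted probabilities for the positive class).
y_true  = np.array([0, 0, 1, 0, 1, 1, 0, 1])
y_score = np.array([0.10, 0.30, 0.35, 0.40, 0.55, 0.80, 0.20, 0.90])

# Each score threshold yields one (recall, precision) point on the curve.
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
for p, r in zip(precision, recall):
    print(f"recall={r:.2f}  precision={p:.2f}")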
Model Evaluation
• Metrics for Performance Evaluation
  • How to evaluate the performance of a model?
• Methods for Performance Evaluation
  • How to obtain reliable estimates?
• Methods for Model Comparison
  • How to compare the relative performance among competing models?
ROC (Receiver Operating Characteristic)
• Developed in the 1950s for signal detection theory, to analyze noisy signals
• Characterizes the trade-off between positive hits and false alarms
• ROC curve plots TPR (on the y-axis) against FPR (on the x-axis)

  TPR = TP / (TP + FN)    fraction of positive instances predicted as positive
  FPR = FP / (FP + TN)    fraction of negative instances predicted as positive

                          PREDICTED CLASS
                          Yes          No
  ACTUAL     Yes          a (TP)       b (FN)
  CLASS      No           c (FP)       d (TN)
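A tiny sketch of the two rates, using assumed counts (chosen to match the later threshold-t example):

# Hypothetical confusion-matrix counts: a = TP, b = FN, c = FP, d = TN.
tp, fn, fp, tn = 50, 50, 12, 88

tpr = tp / (tp + fn)   # 0.50 -- fraction of positive instances predicted as positive
fpr = fp / (fp + tn)   # 0.12 -- fraction of negative instances predicted as positive
print(tpr, fpr)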
ROC (Receiver Operating Characteristic)
• Performance of a classifier is represented as a point on the ROC curve
• Changing some parameter of the algorithm, the sample distribution, or the cost matrix changes the location of the point
ROC Curve
- 1-dimensional data set containing 2 classes (positive and negative)
- any point located at x > t is classified as positive

At threshold t:
TP = 0.5, FN = 0.5, FP = 0.12, TN = 0.88
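A possible sketch of how sweeping the threshold t over made-up 1-D scores produces one (FPR, TPR) point per threshold; the data and the use of NumPy are assumptions for illustration.

import numpy as np

# Hypothetical scores: a higher score means "more positive". Mirroring the slide's rule,
# everything with score >= t is classified as positive.
y_true  = np.array([0, 0, 0, 1, 0, 1, 1, 0, 1, 1])
y_score = np.array([0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90])

for t in sorted(set(y_score), reverse=True):     # sweep the threshold t from high to low
    y_pred = (y_score >= t).astype(int)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    print(f"t={t:.2f}  TPR={tp / (tp + fn):.2f}  FPR={fp / (fp + tn):.2f}")
# Plotting the (FPR, TPR) pairs traces out the ROC curve for this scorer.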
ROC Curve
(TPR, FPR):
• (0,0): declare everything to be the negative class
• (1,1): declare everything to be the positive class
• (1,0): ideal

• Diagonal line: random guessing
• Below the diagonal line: prediction is the opposite of the true class

                          PREDICTED CLASS
                          Yes          No
  ACTUAL     Yes          a (TP)       b (FN)
  CLASS      No           c (FP)       d (TN)
Using ROC for Model Comparison
• Neither model consistently outperforms the other
  • M1 is better for small FPR
  • M2 is better for large FPR

• Area Under the ROC curve (AUC)
  • Ideal: Area = 1
  • Random guess: Area = 0.5
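A sketch of such a comparison, assuming hypothetical scores for two models M1 and M2 and scikit-learn's roc_auc_score (both assumptions, not part of the slides):

from sklearn.metrics import roc_auc_score

# Hypothetical test labels and the scores two competing models assign to them.
y_true    = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]
scores_m1 = [0.20, 0.30, 0.45, 0.10, 0.85, 0.80, 0.50, 0.90, 0.35, 0.75]
scores_m2 = [0.45, 0.50, 0.55, 0.30, 0.60, 0.70, 0.62, 0.40, 0.35, 0.65]

print("AUC(M1) =", roc_auc_score(y_true, scores_m1))   # ~0.96
print("AUC(M2) =", roc_auc_score(y_true, scores_m2))   # ~0.80
# An ideal ranking gives AUC = 1.0; random scoring gives AUC around 0.5.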
ROC curve vs Precision-Recall curve

Area Under the Curve (AUC) as a single number for evaluation
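As a sketch, assuming hypothetical data and scikit-learn (an assumed tool choice): roc_auc_score summarizes the ROC curve, while average_precision_score is the usual single-number summary of the precision-recall curve.

from sklearn.metrics import roc_auc_score, average_precision_score

# Hypothetical labels and scores.
y_true  = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.10, 0.30, 0.35, 0.40, 0.55, 0.80, 0.20, 0.90]

print("ROC AUC           =", roc_auc_score(y_true, y_score))
print("PR AUC (avg prec) =", average_precision_score(y_true, y_score))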
