Deep dive into Confusion Matrix
MODEL EVALUATION
In the field of Data Science, model evaluation is a key component of the training lifecycle. There are many metrics for evaluating a classification model, and Accuracy is the one used most often. However, Accuracy might not give a correct depiction of the model when there is class imbalance, and in such cases the Confusion Matrix should be used for evaluation.
The Confusion Matrix is pivotal to know, as many metrics are derived from it, be it Precision, Recall, F1-score, or Accuracy.
True Positive (TP) is the number of correct predictions where the actual class is positive.
True Negative (TN) is the number of correct predictions where the actual class is negative.
False Positive (FP) is the number of incorrect predictions where the actual class is negative but the model predicted positive, also referred to as a Type I Error.
False Negative (FN) is the number of incorrect predictions where the actual class is positive but the model predicted negative, also referred to as a Type II Error.
These four counts form the 2x2 confusion matrix shown below.
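scikit-learn's convention places the actual class on the rows and the predicted class on the columns:

                  Predicted Negative    Predicted Positive
Actual Negative           TN                    FP
Actual Positive           FN                    TP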
As a working example, let's train a logistic regression model on scikit-learn's breast cancer dataset:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.33,
                                                    random_state=42)
lr = LogisticRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)
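The four counts can be read off scikit-learn's confusion_matrix. A minimal sketch; note that for binary classification, ravel() on the 2x2 matrix returns TN, FP, FN, TP in that order:
from sklearn.metrics import confusion_matrix

# The binary confusion matrix is [[TN, FP], [FN, TP]],
# so ravel() yields the counts in the order TN, FP, FN, TP
TN, FP, FN, TP = confusion_matrix(y_test, y_pred).ravel()
print("TP:", TP)
print("TN:", TN)
print("FP:", FP)
print("FN:", FN)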
Output:
TP: 118
TN: 63
FP: 4
FN: 3
True Positive Rate (TPR), Sensitivity, Recall: It is the probability that a person who has the disease tests positive. In other words, Recall is the proportion of examples of a particular class that the model correctly predicts as belonging to that class.
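Recall can be computed directly with scikit-learn's recall_score, which is equivalent to TP/(TP+FN):
from sklearn.metrics import recall_score

# TPR = TP / (TP + FN)
TPR = recall_score(y_test, y_pred)
print(TPR)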
Output:
0.9752066115702479
True Negative Rate (TNR), Specificity: It is the probability that a person who does not have the disease tests negative.
False Positive Rate (FPR), fall-out: It is the probability that a person who does not have the disease tests positive.
False Negative Rate (FNR), miss rate: It is the probability that a person who has the disease tests negative.
TNR = TN/(TN+FP)
print("Specificity: ", TNR)
FPR = FP/(TN+FP)
print("FPR: ", FPR)
FNR = FN/(TP+FN)
print("FNR: ", FNR)
Output:
Specificity: 0.9402985074626866
FPR: 0.05970149253731343
FNR: 0.024793388429752067
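Positive Predictive Value (PPV), Precision: It is the probability that a person who tests positive actually has the disease. A minimal sketch with scikit-learn's precision_score, which is equivalent to TP/(TP+FP):
from sklearn.metrics import precision_score

# PPV = TP / (TP + FP)
PPV = precision_score(y_test, y_pred)
print(PPV)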
Output:
0.9672131147540983
Negative Predictive Value (NPV): It is the probability that a person who tests negative actually does not have the disease, i.e., NPV = TN/(TN+FN).
Positive Likelihood Ratio (LR+) and Negative Likelihood Ratio (LR-): LR+ = TPR/FPR tells how many times more likely a positive result is in a person with the disease than in a person without it; LR- = FNR/TNR does the same for a negative result.
NPV = TN/(TN+FN)
print("NPV: ", NPV)
LRp = TPR/FPR
print("LR+: ", LRp)
LRn = FNR/TNR
print("LR-: ", LRn)
Output:
NPV: 0.9545454545454546
LR+: 16.334710743801653
LR-: 0.02636757182211727
Accuracy: Accuracy is the proportion of examples that were correctly classified. To be more precise, it is the ratio of correct predictions to the total number of cases.
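With scikit-learn this is a one-liner, equivalent to (TP+TN)/(TP+TN+FP+FN):
from sklearn.metrics import accuracy_score

# Accuracy = (TP + TN) / (TP + TN + FP + FN)
print(accuracy_score(y_test, y_pred))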
Output:
0.9627659574468085
Balanced Accuracy: It is the arithmetic mean of TPR and TNR. Balanced Accuracy is useful when the data is imbalanced.
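Again a one-liner with scikit-learn, equivalent to (TPR + TNR)/2:
from sklearn.metrics import balanced_accuracy_score

# Balanced Accuracy = (TPR + TNR) / 2
print(balanced_accuracy_score(y_test, y_pred))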
Output:
0.9577525595164673
F1 Score: It is the harmonic mean of Precision and Recall, so it is an overall measure of the quality of a classifier's predictions. It is usually the metric of choice for most people because it captures both Precision and Recall, and it is also well suited to imbalanced data.
from sklearn.metrics import f1_score
f1_score(y_test, y_pred)
Output:
0.9711934156378601
F1 is a composite metric that considers both Precision and Recall. There are other composite evaluation tools, such as the precision-recall curve and the ROC curve with its AUC, which are important for assessing any classification model. To read more about these curves, please visit Precision-Recall and ROC Curve.
The code below is analogous to scikit-learn's classification_report; instead, it returns all the metrics that can be derived from the confusion matrix for binary classification.
def binary_classification_report(y_true, y_pred):
    """
    Args:
        y_true (ndarray)
        y_pred (ndarray)

    Return:
        report (dict)
    """

    from sklearn.metrics import f1_score
    from sklearn.metrics import recall_score
    from sklearn.metrics import accuracy_score
    from sklearn.metrics import precision_score
    from sklearn.metrics import balanced_accuracy_score

    from . import confusion_matrix

    conf_mat = confusion_matrix(y_true, y_pred, plot=False)
    TN, FP, FN, TP = conf_mat.ravel()
    TPR = recall_score(y_true, y_pred)
    TNR = TN/(TN+FP) if (TN+FP) != 0 else 0
    PPV = precision_score(y_true, y_pred)
    report = {'TP': TP, 'TN': TN, 'FP': FP, 'FN': FN,
              'TPR': TPR, 'Recall': TPR, 'Sensitivity': TPR,
              'TNR': TNR, 'Specificity': TNR,
              'FPR': FP/(FP+TN) if (FP+TN) != 0 else 0,
              'FNR': FN/(FN+TP) if (FN+TP) != 0 else 0,
              'PPV': PPV, 'Precision': PPV,
              'Accuracy': accuracy_score(y_true, y_pred),
              'Balanced Accuracy': balanced_accuracy_score(y_true, y_pred),
              'F1 Score': f1_score(y_true, y_pred)
              }
    return report
Output:
{'TP': 118,
'TN': 63,
'FP': 4,
'FN': 3,
'TPR': 0.9752066115702479,
'Recall': 0.9752066115702479,
'Sensitivity': 0.9752066115702479,
'TNR': 0.9402985074626866,
'Specificity': 0.9402985074626866,
'FPR': 0.05970149253731343,
'FNR': 0.024793388429752067,
'PPV': 0.9672131147540983,
'Precision': 0.9672131147540983,
'Accuracy': 0.9627659574468085,
'Balanced Accuracy': 0.9577525595164673,
'F1 Score': 0.9711934156378601}
Note: all the code snippets in this blog are for binary classification.
In this blog, we explored the confusion matrix for binary classification. If you are interested in the multiclass case, please refer to Multi-class Model Evaluation with Confusion Matrix and Classification Report, and if you are wondering about the "from . import confusion_matrix" statement, please refer to the Introduction to Confusion Matrix for the Python method.