

Performance Considerations for Binary Classification Machine Learning Models


Joseph Dixon

Binary Classification (BC) Machine Learning Models¹


BC models are algorithms in executable program form that categorize (new) observations into one of
two classes, after being trained to distinguish between the classes using known observations.
(Karabiber, 2023) Such models predict a condition’s presence or absence for an observation (generally,
positive or negative). In practice, BC models prove effective at such tasks as:

▪ Credit-worthiness assessment
▪ Spam filtering
▪ Image classification

Key Performance Metrics


A BC model’s classification performance is vital to the model’s effectiveness:

▪ for performance comparisons during model development and training;

▪ for performance assessment during use, where an observation’s actual classification is known.

At first glance, measuring performance appears to require only two considered outcomes:

correct classifications / [correct + incorrect classifications]

Full understanding, however, requires nuanced considerations, because –

A. There are actually four outcome possibilities (Irizarry, 2019): true positive (TP), false positive (FP), true negative (TN), and false negative (FN).

B. The classification task’s logical context determines which possible outcome(s) are relevant.
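For illustration (this example is not from the assignment’s figures), the four outcome counts can be tallied with scikit-learn’s confusion_matrix; the label and prediction vectors below are invented toy data:

# Toy illustration of the four outcome possibilities (TN, FP, FN, TP).
# The label vectors here are invented for demonstration only.
from sklearn.metrics import confusion_matrix

y_actual    = [1, 0, 1, 1, 0, 0, 1, 0]   # known classifications
y_predicted = [1, 0, 0, 1, 1, 0, 1, 0]   # model's predictions

# scikit-learn orders the 2x2 matrix as [[TN, FP], [FN, TP]] for labels [0, 1]
tn, fp, fn, tp = confusion_matrix(y_actual, y_predicted).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")   # TP=3, TN=3, FP=1, FN=1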

Key concepts for measuring classification performance are (Géron, 2017) (Starmer, 2020):

Precision – the model’s ability to predict positive only for actual positive cases – refers to “how precise are the model’s true-positive (i.e., correct positive) predictions relative to all positive predictions it made?” False positives (FP) penalize performance.

¹ For discussion purposes, ‘classification’ and ‘prediction’ are synonyms.



Sensitivity/Recall – the model’s ability to correctly predict actual positives in the set of observations – refers to “how many of the total actual positives does the model actually recall?”; or, “how sensitive are positive predictions to actual (positive) cases?” A negative prediction for an actual positive (false negative, FN) penalizes performance.

Specificity – the model’s ability to correctly predict all actual negative cases (TN) – refers to “how specific is the model in accurately classifying actual negative conditions?” Positive predictions for actual negatives (FP) penalize performance.

Figure 2 shows each measure’s mathematical formulation, while Figure 3 visualizes performance
concepts’ relations to outcome possibilities.
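Figure 2 is not reproduced here; for reference, the standard formulations of these measures in terms of the four outcome counts are:

Precision = TP / (TP + FP)
Sensitivity/Recall = TP / (TP + FN)
Specificity = TN / (TN + FP)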

Explaining to a Lay Audience


As already described, classification performance bears nuances that could challenge lay audiences.

Therefore, explanation to a lay audience likely warrants:

1. Avoiding math-based description, as the audience’s “percentages-challenged” members may be a substantial share. (Siegler, 2017)
2. Emphasizing why and how each measure’s term “makes sense”.
3. Illustrating/reinforcing concepts through (many) common-case examples.

Precision/Recall Trade-Off
Precision and recall (aka sensitivity) can contend: i.e., improving one can degrade the other. Significantly, Figure 2 shows that the precision and recall formulas:

▪ differ only in the denominator, by additive FP and FN terms, respectively.

Figure 4 illustrates how contention can arise between the metrics:

▪ Tactics to increase precision performance require lowering its denominator value, achieved by lowering the FP count;

▪ Lowering FP counts requires increasing the decision-making threshold value;



▪ However, a threshold value increase sets a “lower bar” for predicting negatives, raising the likelihood of negative predictions of actual positives (FNs);

▪ Increased FNs increase recall’s denominator, which decreases recall performance (see the sketch below).
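A minimal sketch of this trade-off, assuming a model that outputs probability scores (the scores and labels below are invented toy values):

# Precision/recall at two decision thresholds (toy data).
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_actual = np.array([1, 1, 1, 0, 0, 1, 0, 0, 1, 0])
scores   = np.array([0.9, 0.8, 0.7, 0.65, 0.6, 0.55, 0.5, 0.4, 0.35, 0.2])

for threshold in (0.5, 0.7):
    y_pred = (scores >= threshold).astype(int)   # predict positive at/above the threshold
    p = precision_score(y_actual, y_pred)
    r = recall_score(y_actual, y_pred)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")

Raising the threshold from 0.5 to 0.7 removes the false positives (precision rises from 0.57 to 1.00) but converts some true positives into false negatives (recall falls from 0.80 to 0.60).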

A Unique Example
Consider Mars robotic mineral sampling at candidate collection locales:

▪ TPs are important because quality sample analysis is a prime exploration objective

▪ Avoiding FPs (higher precision) is important – a sample collected that shouldn’t have been depletes the battery, reducing per-excursion quality yield; likewise, drill-bit wear reduces sampling lifetime.

▪ Avoiding FNs (higher recall) is important – a sample not collected that should have been directly risks failing the exploration objective.

ROC Curve and Use


The ROC (receiver operating characteristic) curve visually characterizes the prediction accuracy trade-off, as Figure 5 illustrates, by:

▪ plotting recall (sensitivity) against [1 – specificity], visually illustrating the trade-off between the two

▪ plotting a random classification scheme’s (RCS) trade-off – a [0, 0] to [1, 1] diagonal – as reference.

Area under the ROC curve (AUC, shaded) provides a calculable performance measure in the range [0, 1] that includes the relevant trade-offs. (Glen, 2019) (Wikipedia, 2023)

Finally, the model developer identifies a desirable trade-off point on the ROC curve, then selects the
point’s corresponding threshold cutoff value to include in the BC decision function.
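A minimal sketch of computing the ROC curve points and AUC with scikit-learn, again on invented scores:

# ROC curve points and AUC from model scores (toy data).
from sklearn.metrics import roc_curve, roc_auc_score

y_actual = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
scores   = [0.9, 0.8, 0.7, 0.65, 0.6, 0.55, 0.5, 0.4, 0.35, 0.2]

# fpr = 1 - specificity (x-axis); tpr = recall/sensitivity (y-axis)
fpr, tpr, thresholds = roc_curve(y_actual, scores)
auc = roc_auc_score(y_actual, scores)

for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}: 1-specificity={f:.2f}, recall={t:.2f}")
print(f"AUC = {auc:.2f}")

Each printed row corresponds to one candidate threshold (cut-off) point on the curve.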



Bibliography
Géron, A. (2017). Hands-On Machine Learning with Scikit-Learn and TensorFlow. O'Reilly Media, Inc.

Glen, S. (2019). ROC Curve Explained in One Picture. Retrieved from Data Science Central:
https://www.datasciencecentral.com/roc-curve-explained-in-one-picture/

Irizarry, R. (2019). Introduction to Data Science: Data Analysis and Prediction Algorithms with R.
Chapman & Hall.

Karabiber, F. (2023). Binary Classification. Retrieved from Learning Data Science: https://www.learndatasci.com/

Siegler, R. (2017, November 28). Fractions: Where It All Goes Wrong. Scientific American.

Starmer, J. (Director). (2020). Machine Learning Fundamentals: Sensitivity and Specificity [Motion
Picture].

Wikipedia. (2023). Receiver operating characteristic. Retrieved from Wikipedia: https://en.wikipedia.org/wiki/Receiver_operating_characteristic

