
University of Nairobi

Course: MSc. Computer Science (Computational Intelligence)


Date: 3/11/2023
SCS6105 – Machine Learning
Assignment 2
GROUP 3
Group Members

Name                    Email                             Phone Number   Reg Number
Clara Musenya Musyoka   musyokamusenya@gmail.com          0703755941     Not available yet
AGIRA CHRIS JAMES       chrisagira@students.uonbi.ac.ke   0755591046
NGUU JOHN KIKUVI        kikuvijohn@students.uonbi.ac.ke   0712523444

Evaluating Supervised Learning Models


There are numerous performance metrics that we can choose from to evaluate the performance of
supervised learning models (classification and regression). For classification, commonly used measures
include accuracy, recall, precision, F1 score, etc., while for regression, commonly used measures include
MSE, RMSE, MAE, and MAPE. Read up on these metrics and discuss the key considerations to be made
to guide selection of a given metric over another. Remember that metrics serve more than one purpose, so
adopt a non-limiting perspective in your discussions.

Prepare a summary (2-page max) document on your findings


REGRESSION METRICS

1. Mean Absolute Error (MAE) calculates the average magnitude of errors between predicted and actual values,
without considering their direction.
When to use: MAE is beneficial when you need a simple, interpretable metric to assess the performance of
a regression model. It's a suitable choice when you want to understand the average error magnitude without
being concerned about the direction of errors.
When not to use: when you want to heavily penalize large errors. MAE treats all errors equally, so it may
not effectively capture the impact of significant errors. In such cases, metrics like Mean Squared Error
(MSE) or Root Mean Squared Error (RMSE) could be more appropriate.
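
As a concrete illustration, the sketch below computes MAE both by hand and with scikit-learn's mean_absolute_error; the y_true and y_pred values are made up purely for illustration.

    import numpy as np
    from sklearn.metrics import mean_absolute_error

    y_true = np.array([3.0, 5.0, 2.5, 7.0])   # illustrative actual values
    y_pred = np.array([2.5, 5.0, 4.0, 8.0])   # illustrative predictions

    mae_manual = np.mean(np.abs(y_true - y_pred))        # (0.5 + 0 + 1.5 + 1.0) / 4 = 0.75
    mae_sklearn = mean_absolute_error(y_true, y_pred)    # same result, 0.75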

2. Mean Squared Error (MSE) measures the average squared difference between predicted and actual values,
giving more weight to larger errors.

When to use: MSE is a suitable choice when you want to place a higher emphasis on large errors, for
example in scenarios where occasional large deviations are particularly costly.

When to avoid: When you require a more easily interpretable metric (MSE is expressed in squared units of
the target) or when your dataset contains a significant number of outliers, as MSE can be sensitive to them.
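
Reusing the same made-up values, a minimal sketch of MSE using scikit-learn's mean_squared_error:

    import numpy as np
    from sklearn.metrics import mean_squared_error

    y_true = np.array([3.0, 5.0, 2.5, 7.0])
    y_pred = np.array([2.5, 5.0, 4.0, 8.0])

    mse_manual = np.mean((y_true - y_pred) ** 2)          # (0.25 + 0 + 2.25 + 1.0) / 4 = 0.875
    mse_sklearn = mean_squared_error(y_true, y_pred)      # same result, 0.875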

3. Root Mean Squared Error (RMSE) is the square root of MSE. It emphasizes large errors in the same way as
MSE, but with the added advantage of being expressed in the same unit as the target variable, which makes the
result easier to interpret in the problem context.
When to use: RMSE is a preferred choice when you want to penalize large errors, just as with MSE, while
also having an error metric that shares the same unit as the target variable. This makes it easier to relate the
error to the problem at hand.
When to avoid: When your dataset contains a substantial number of outliers, as RMSE, like MSE, is
sensitive to them, or when all errors should be weighted equally, in which case MAE is more appropriate.
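
A minimal sketch for RMSE, again with the same made-up values; taking the square root brings the error back into the units of the target variable:

    import numpy as np
    from sklearn.metrics import mean_squared_error

    y_true = np.array([3.0, 5.0, 2.5, 7.0])
    y_pred = np.array([2.5, 5.0, 4.0, 8.0])

    rmse = np.sqrt(mean_squared_error(y_true, y_pred))    # sqrt(0.875) ≈ 0.935, in the target's units
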
4. R-squared (Coefficient of Determination) reveals the proportion of the variance in the dependent
variable that the model's independent variables can explain.

When to use: R-squared is valuable when you want to understand how well the model explains the
variation in the target variable compared to a simpler average. It provides a meaningful comparison to a
basic mean model.

When not to use: When the model involves a large number of independent variables (plain R-squared never
decreases as predictors are added, so it can overstate fit) or when the data contain influential outliers. In
such cases, adjusted R-squared or alternative performance metrics may be more appropriate.
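
A minimal sketch of R-squared with the same toy values, showing the comparison against a simple mean model (ss_tot is the squared error of always predicting the mean of y_true):

    import numpy as np
    from sklearn.metrics import r2_score

    y_true = np.array([3.0, 5.0, 2.5, 7.0])
    y_pred = np.array([2.5, 5.0, 4.0, 8.0])

    ss_res = np.sum((y_true - y_pred) ** 2)               # residual sum of squares = 3.5
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)      # total sum of squares = 12.6875
    r2_manual = 1 - ss_res / ss_tot                       # ≈ 0.724
    r2_sklearn = r2_score(y_true, y_pred)                 # same result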

CLASSIFICATION METRICS

1. Accuracy is a classification metric that quantifies the ratio of correctly predicted instances to the total instances
in the dataset.

When to Use: When class distribution is approximately balanced, and false positives and negatives have
equal importance.

When Not to Use: When the dataset is imbalanced, meaning one class significantly dominates the other, or
when the cost of false positives and false negatives differs significantly.
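
A minimal sketch with made-up labels, using scikit-learn's accuracy_score:

    from sklearn.metrics import accuracy_score

    y_true = [1, 0, 1, 1, 0, 0]   # illustrative actual labels
    y_pred = [1, 0, 0, 1, 0, 1]   # illustrative predicted labels

    acc = accuracy_score(y_true, y_pred)   # 4 correct out of 6 ≈ 0.67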

2. Precision and Recall


Precision (P): Precision measures the proportion of true positive predictions among all positive predictions.
It focuses on minimizing false positive errors.

Recall (R) / Sensitivity / True Positive Rate: Recall measures the proportion of true positive predictions
among all actual positive instances. It focuses on minimizing false negative errors.
When to Use:

Use Recall when your primary objective is to minimize false negatives. Optimizing for a high recall value,
close to 100%, means capturing as many true positives as possible, even if it results in some false positives.
This is useful when missing positive instances is more critical than making a few false positive predictions.

Use Precision when your primary goal is to minimize false positives. Optimizing for a high precision value,
close to 100%, means the model should make positive predictions with high confidence, even if it results in
fewer true positives being captured. This is beneficial in scenarios where false positives are costly or undesirable.

When Not to Use: If your dataset is balanced, the costs of false positives and false negatives are roughly equal,
and you want an overall evaluation of your model, accuracy may be more appropriate. Precision and recall are
particularly valuable when the trade-off between false positives and false negatives is unbalanced or when a specific
aspect of model performance is critical.
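
A minimal sketch with the same illustrative labels; for this toy example the confusion counts are TP = 2, FP = 1, FN = 1:

    from sklearn.metrics import precision_score, recall_score

    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 0, 1]

    precision = precision_score(y_true, y_pred)   # TP / (TP + FP) = 2 / 3
    recall    = recall_score(y_true, y_pred)      # TP / (TP + FN) = 2 / 3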

3. F1-Score is a classification metric that represents the harmonic mean of precision and recall. It balances
both precision and recall in a single number, which is particularly useful for imbalanced datasets and when
both false positives and false negatives need to be considered.
When to Use: Use F1-Score when the class distribution in the dataset is imbalanced and you want a single
score that considers both precision and recall, for example when false negatives (missed positive instances)
and false positives (incorrect positive predictions) both carry real costs or implications. (If one error type is
strictly more important than the other, a weighted variant such as the F-beta score can be used instead.)
It is appropriate when you need a balanced evaluation metric that considers both types of errors.
When Not to Use: If your dataset is balanced, and the costs of false positives and false negatives are
roughly equal, and you want an overall evaluation of your model's performance, accuracy may be more
suitable. F1-Score is particularly valuable when there's an imbalance in the class distribution or when you
need to emphasize the trade-offs between precision and recall.
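
A minimal sketch of the F1-Score on the same toy labels; since precision and recall are both 2/3 here, their harmonic mean is also 2/3:

    from sklearn.metrics import f1_score

    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 0, 1]

    f1 = f1_score(y_true, y_pred)   # 2 * P * R / (P + R) = 2/3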

4. Area Under the Receiver Operating Characteristic Curve (AU-ROC) quantifies a model's ability to
discriminate between positive and negative classes.
When to use:
AU-ROC is ideal when comparing the performance of different classification models, especially in
scenarios where the class distribution is imbalanced.
It measures the quality of predictions in terms of their ranking, independent of specific threshold values,
making it valuable for assessing a model's overall discriminatory power.
When not to use:
AU-ROC is scale-invariant and threshold-invariant, so it says nothing about how well the predicted
probabilities are calibrated; if calibrated probability outputs are needed, they must be assessed separately.
In cases of balanced datasets with equal error costs, a simpler metric such as accuracy may be sufficient,
and when minimizing one specific type of classification error matters most, threshold-dependent metrics
such as precision or recall are more informative.
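
A minimal sketch of AU-ROC; note that it is computed from predicted scores or probabilities rather than hard class labels, and the scores below are made up for illustration:

    from sklearn.metrics import roc_auc_score

    y_true  = [1, 0, 1, 1, 0, 0]
    y_score = [0.9, 0.2, 0.4, 0.8, 0.3, 0.6]   # illustrative predicted probabilities for the positive class

    auc = roc_auc_score(y_true, y_score)   # fraction of (positive, negative) pairs ranked correctly ≈ 0.89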

