
Tutorial 6
Evaluation metrics for machine learning models
Classification and regression models

Classification vs. regression problems in machine learning
Classification problem:
• The output/dependent variables are often called labels or categories.
• The task is to predict the class or category for a given observation.
• A classification problem requires that examples be classified into one of two or more classes.
• A problem with two classes is often called a two-class or binary classification problem (e.g., spam or not spam).
• A problem with more than two classes is often called a multi-class classification problem (e.g., letter grade).
• A model that solves a classification problem is called a classification model (either a multi-class or binary classification model); it predicts a dependent variable that is a class.

Regression problem:
• The output/dependent variables are continuous (real-valued), such as an integer. These are often quantities, such as amounts and sizes (e.g., the dollar value of the selling price of a house).
• A model that solves a regression problem is called a regression model; it predicts a dependent variable that is a real value.
Evaluation metrics
• Purpose: to evaluate the predictive performance of the established model
• Categories:
o Metrics for regression models (problems)
o Metrics for classification models (problems)
Evaluation metrics for classification problems
1. Accuracy and Kappa (for binary and multiclass classification problems)
2. Area Under ROC Curve (AUC): for binary classification problems only
1. Accuracy and Kappa
(for binary and multiclass classification problems)

Accuracy
• Accuracy is the percentage of correctly classified instances out of all instances.
• It is more useful on binary classification problems than on multiclass classification problems.
• Accuracy = the number of correct predictions / the number of all predictions
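As a quick illustration (not part of the slides), accuracy can be computed by hand or with scikit-learn; the label vectors below are made up for the example.

```python
from sklearn.metrics import accuracy_score

# Made-up true labels and model predictions for a binary problem
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 1, 0, 0]

# Accuracy = number of correct predictions / number of all predictions
correct = sum(t == p for t, p in zip(y_true, y_pred))
print(correct / len(y_true))           # 0.8
print(accuracy_score(y_true, y_pred))  # 0.8, same result via scikit-learn
```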
Kappa
• Kappa is a more useful measure of predictive accuracy when used on problems that have an imbalance in the classes.
• Imbalance: imbalanced data typically refers to classification problems where the classes are not represented equally.
• For example, suppose you have a 2-class (binary) classification problem with 100 instances.
o A total of 80 instances are labeled with Class-1 and the remaining 20 instances are labeled with Class-2.
o This is an imbalanced dataset, and the ratio of Class-1 to Class-2 instances is 80:20.
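A minimal sketch (assuming scikit-learn is available) of why Kappa matters on the 80:20 example above; the "model" below simply guesses the majority class.

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Imbalanced dataset from the example: 80 Class-1 instances, 20 Class-2 instances
y_true = [1] * 80 + [2] * 20
# A naive model that always predicts the majority class (Class-1)
y_pred = [1] * 100

print(accuracy_score(y_true, y_pred))     # 0.8 -> looks good despite learning nothing
print(cohen_kappa_score(y_true, y_pred))  # 0.0 -> no agreement beyond chance
```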
2. AUC
Area under the ROC Curve
Confusion Matrix
• Accuracy: the number of correct predictions / the number of all predictions
• Accuracy = (TP + TN) / (TP + FP + FN + TN)
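A hedged sketch of building a confusion matrix with scikit-learn (the labels are made up). Note that confusion_matrix puts actual classes on rows and predicted classes on columns; the positive class is listed first here to mirror the layout used on the next slide.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for a binary diabetes classifier (1 = diabetes, 0 = no diabetes)
y_true = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 1, 0, 0]

# Rows = actual class, columns = predicted class; labels=[1, 0] puts the positive class first
cm = confusion_matrix(y_true, y_pred, labels=[1, 0])
tp, fn, fp, tn = cm.ravel()

print(cm)
print((tp + tn) / (tp + fp + fn + tn))  # accuracy = (TP+TN)/(TP+FP+FN+TN) = 0.8
```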
Sensitivity and Specificity

Example confusion matrix (counts):

                          Actual value (class)
Predicted value (class)   Positive (diabetes=1)   Negative (diabetes=0)   Total
Positive (diabetes=1)     25                      15                      40
Negative (diabetes=0)     5                       55                      60
Total                     30                      70                      100

In general:

                          Actual value (class)
Predicted value (class)   Positive (diabetes=1)   Negative (diabetes=0)
Positive (diabetes=1)     A = TP                  B = FP
Negative (diabetes=0)     C = FN                  D = TN

• Sensitivity: the true positive rate, also called the recall. It is the proportion of instances from the positive (first) class that were predicted correctly: Sensitivity = TP / (TP + FN).
• Specificity: also called the true negative rate. It is the proportion of instances from the negative (second) class that were predicted correctly: Specificity = TN / (TN + FP).
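Using the counts from the example table above (A = TP = 25, B = FP = 15, C = FN = 5, D = TN = 55), the two rates can be computed directly:

```python
# Counts taken from the example confusion matrix above
tp, fp, fn, tn = 25, 15, 5, 55

sensitivity = tp / (tp + fn)  # true positive rate (recall): 25 / 30
specificity = tn / (tn + fp)  # true negative rate:          55 / 70

print(round(sensitivity, 3))  # 0.833
print(round(specificity, 3))  # 0.786
```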
ROC curve (receiver operating characteristic curve)
• A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the predictive ability of a binary classification model.
• The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings.
• The TPR is also known as sensitivity, recall, or probability of detection in machine learning.
• The FPR is also known as the fall-out or probability of false alarm and can be calculated as (1 − specificity).
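A short sketch of computing the points of a ROC curve and the AUC with scikit-learn; the labels and predicted probabilities below are made up.

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Made-up true labels and predicted probabilities for the positive class
y_true  = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.35, 0.2, 0.15, 0.1]

# TPR (sensitivity) and FPR (1 - specificity) at each threshold;
# plotting TPR against FPR gives the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(list(zip(fpr, tpr)))

# Area under the ROC curve
print(roc_auc_score(y_true, y_score))  # 0.875 for these made-up scores
```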
AUC = 1 (ideal case)
The model is perfectly able to distinguish between the positive class and the negative class.

AUC = 0.7
When the AUC is 0.7, there is a 70% chance that the model will be able to distinguish between the positive class and the negative class (i.e., rank a randomly chosen positive instance above a randomly chosen negative one).

AUC = 0.5
When the AUC is approximately 0.5, the model has no discrimination capacity to distinguish between the positive class and the negative class.

AUC = 0
The model is predicting the negative class as the positive class and vice versa.
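One way to read the AUC values above, sketched here with made-up scores: the AUC equals the probability that a randomly chosen positive instance is scored higher than a randomly chosen negative instance.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Made-up scores: positives drawn slightly higher than negatives on average
pos_scores = rng.normal(loc=0.6, scale=0.2, size=2000)
neg_scores = rng.normal(loc=0.4, scale=0.2, size=2000)

y_true  = np.concatenate([np.ones(2000), np.zeros(2000)])
y_score = np.concatenate([pos_scores, neg_scores])

# AUC reported by scikit-learn (roughly 0.76 for these parameters)
print(roc_auc_score(y_true, y_score))

# The same quantity estimated as P(score of a random positive > score of a random negative)
print(np.mean(pos_scores[:, None] > neg_scores[None, :]))
```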
Evaluation metrics for regression problems
1. RMSE
2. MAE
3. R squared
1. RMSE
Root Mean Square Error
1. RMSE: Root Mean Square Error
• RMSE: standard deviation of the residuals (prediction errors).
• Residuals are a measure of how far data points are from the regression line;
• RMSE is a measure of how spread out these residuals are.
• In other words, it tells you how concentrated the data is around the
line of best fit.


RMSE = sqrt( (1/n) * sum_{j=1..n} ( TrueValue_j − ModelEstimate_j )^2 )
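A minimal numpy sketch of the formula above; the regression values are made up.

```python
import numpy as np

# Made-up true values and model estimates for a regression problem
true_values     = np.array([200.0, 150.0, 320.0, 275.0, 410.0])
model_estimates = np.array([210.0, 140.0, 300.0, 280.0, 400.0])

# RMSE = sqrt( (1/n) * sum of squared residuals )
residuals = true_values - model_estimates
rmse = np.sqrt(np.mean(residuals ** 2))
print(rmse)  # about 12.04 for these made-up values
```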
2. MAE
Mean Absolute Error
MAE
• Mean absolute error (MAE)
• MAE is the average of the absolute differences between predictions and actual observations.
• Compared to RMSE, MAE is easier to interpret but less sensitive to outliers:

MAE = (1/n) * sum_{j=1..n} | TrueValue_j − ModelEstimate_j |
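The same made-up values as in the RMSE sketch, assuming scikit-learn is available:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

true_values     = np.array([200.0, 150.0, 320.0, 275.0, 410.0])
model_estimates = np.array([210.0, 140.0, 300.0, 280.0, 400.0])

# MAE = (1/n) * sum of |TrueValue_j - ModelEstimate_j|
print(np.mean(np.abs(true_values - model_estimates)))     # 11.0
print(mean_absolute_error(true_values, model_estimates))  # same value via scikit-learn
```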
3. R squared
3. R squared
• also called the coefficient of determination
• provides a goodness-of-fit measure for the predictions.
• This is a value between 0 (no fit) and 1 (perfect fit).
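A short sketch with scikit-learn's r2_score, reusing the made-up regression values from the sketches above:

```python
from sklearn.metrics import r2_score

true_values     = [200.0, 150.0, 320.0, 275.0, 410.0]
model_estimates = [210.0, 140.0, 300.0, 280.0, 400.0]

# R squared (coefficient of determination); 1 means a perfect fit
print(r2_score(true_values, model_estimates))  # about 0.98 for these made-up values
```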
