Download as pdf or txt
Download as pdf or txt
You are on page 1of 20

▪ The outputs are finite and the developed model must assign a single

class to new inputs.


▪ the dataset is a collection of labeled-data records in the form: {independent
variables as inputs, and the associated classes (i.e., labels) as outputs}. The
objective is to develop a machine learning model to relate the inputs to the
outputs, and to predict the class of new inputs.
▪ In practice, the dataset is divided into two sets, which are the training and the
testing sets.
▪ The training set is employed to develop the classification model, while the testing
set is utilized afterwards to evaluate the accuracy of the developed model.
▪ For binary classification models, the outputs of the model can be presented in a
confusion matrix form.
▪ The confusion matrix shows how many instances are correctly classified by the
developed model, as seen in the following figure for a two-class problem.
True Predicted
0 1
1 1
1 0
0 0
1 1
0 0
0 1
1 1
• Compute the confusion matrix
0 0
1 1
Predicted (Model output)
0 1

Actual 0 TN FP
(desired
output) 1 FN TP
• Precision
The number of true positives (i.e., the number of correctly labeled instances) to a specific
class, divided by the total number of positive predictions.
• Precision
The number of true positives (i.e., the number of correctly labeled instances) to a specific
class, divided by the total number of instances assigned to this class.

The precision is a measure of the


model’s exactness.
• Precision
The number of true positives (i.e., the number of correctly labeled instances) to a specific
class, divided by the total number of positive predictions.

• Recall
The number of true positives to a specific class, divided by the total number of instances
actually in this class.
• Precision
The number of true positives (i.e., the number of correctly labeled instances) to a specific
class, divided by the total number of positive predictions.

The recall is a measure of the


model’s completeness.
• Recall
The number of true positives to a specific class, divided by the total number of instances
actually in this class.
• F-score
measures the weighted average of precision and recall.
▪ One example is the historical dataset of real estate values. If the characteristics and
corresponding prices for many houses within a certain city are provided, can the
price of a different house in this area be predicted by its characteristics?
▪ One example is the historical dataset of real estate values. If the characteristics and
corresponding prices for many houses within a certain city are provided, can the
price of a different house in this area be predicted by its characteristics?
▪ The evaluation metric that is routinely calculated to judge the model’s performance
is the mean squared error (MSE), as given by the following equation:

Where y is the output of the developed regression


model, d is the desired output, and n is the total number
of the instances.
Where y is the output of the developed regression
model, d is the desired output, and n is the total number
of the instances.
▪ In general, when exposed to more observations, the model improves its predictive
performance.
▪ However, too much adaptability will force the model to learn the noise within the
training data rather than the underlying input/output relations.
▪ Therefore, the resultant model overfits the training set, and will not perform equally
well on the testing set.
▪ Overfitting happens when the model highly adapts to the training set but by doing
so fails on the testing set.
Croteau et al. (2017) explain, “overfitting will
impact negatively on the degree of
generalization to new data and thus must be
avoided in order for solutions to be useful for
practical application” (p. 306)
▪ An efficient machine learning model attempts to decrease generalization errors
and thus have good predictions on data that the model was not trained for.
▪ On the other hand, underfitting happens when the model does not capture enough
of the inherent structure in the training data, which results in poor performance
with both training and testing sets.

You might also like