Management Development Institute: Murshidabad
Batch 2019-21
Project on
Business Analyst - I
Coconut water is trendy for its health benefits. These growing market opportunities have, however,
necessitated that coconut water be accessible in a more convenient format, which has led to the
development of technologies for preserving and selling the product in bottled form. Even so, local
players remain dominant in this sector.
Problem statement
The coconut seller wants to find out how many coconuts contain water and how many do not.
Evaluating the model comes down to a match count: how many data rows have been correctly
classified and how many data rows have been incorrectly classified by the model. These counts
are summarized in the confusion matrix.
To define this model, we need to answer several questions, which will help us quantify its
performance:
1. How many of the coconuts that actually contain water were predicted as with water?
2. How many were predicted as without water?
3. Were some coconuts without water predicted as with water?
4. How many coconuts without water were predicted correctly?
Confusion Matrix
The confusion matrix was initially introduced to evaluate the results of a classification against
the actual outcomes. The first thing to do is to take one of the two classes as the class of
interest, i.e. the positive class. In the target column, we need to choose one value as the
positive class (coconuts with water). The other value is then automatically considered the negative
class. Here we chose the coconuts with water as the positive class and the coconuts with no water
as the negative class.
The confusion matrix in the figure reports the count of:
The data rows (coconuts with water) belonging to the positive class and correctly classified as
such. These are called True Positives (TP). The number of true positives is placed in the top
left cell of the confusion matrix.
The data rows (coconuts with water) belonging to the positive class and incorrectly classified
as negative. These are called False Negatives (FN). The number of false negatives is placed in
the top right cell of the confusion matrix. (Type II error)
The data rows (coconuts with no water) belonging to the negative class and incorrectly classified
as positive. These are called False Positives (FP). The number of false positives is placed in
the lower left cell of the confusion matrix. (Type I error)
The data rows (coconuts with no water) belonging to the negative class and correctly classified
as such. These are called True Negatives (TN). The number of true negatives is placed in the
lower right cell of the confusion matrix.
Confusion Matrix

                    Predicted YES (1)   Predicted NO (0)
Actual YES (1)             240                 67          0.782 (sensitivity)
Actual NO (0)               34                311          0.901 (specificity)
                          0.876               0.823
                      (precision)     (neg. predictive value)
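The counts and marginal statistics in the table can be checked with a few lines of Python. This is a minimal sketch using only the four cell counts reported above; the variable names and the accuracy calculation are my own additions, not part of the original report:

```python
# Confusion-matrix counts taken from the table above.
TP, FN = 240, 67   # Actual YES row: predicted with water / without water
FP, TN = 34, 311   # Actual NO row:  predicted with water / without water

matrix = [[TP, FN],
          [FP, TN]]

# Overall accuracy: fraction of all rows classified correctly.
total = TP + FN + FP + TN
accuracy = (TP + TN) / total

print(matrix)              # [[240, 67], [34, 311]]
print(round(accuracy, 3))  # 0.845
```

Libraries such as scikit-learn can produce this matrix directly from predicted and actual labels, but the raw counts are all that the metrics below require.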
Sensitivity measures how apt the model is at detecting events in the positive class. So, given
that coconuts with water are the positive class, sensitivity quantifies how many of the coconuts
that actually contain water are correctly predicted as with water.
Sensitivity = TP / (TP + FN) = 240 / (240 + 67) = 0.782
We divide the number of true positives by the number of all positive events in the dataset: the
positive class events predicted correctly (TP) and the positive class events predicted incorrectly
(FN). The model in this example reaches a sensitivity value of 0.782. This means that about
78% of the coconuts that actually contain water were correctly predicted as with water.
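The sensitivity calculation above can be sketched directly from the matrix counts (variable names are my own):

```python
# Sensitivity (recall): true positives over all actual positives.
TP, FN = 240, 67
sensitivity = TP / (TP + FN)
print(round(sensitivity, 3))  # 0.782
```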
Specificity measures how apt the model is at detecting events in the negative class, in this
case, coconuts without water.
Specificity = TN / (FP + TN) = 311 / (34 + 311) = 0.901
We divide the number of true negatives by the number of all negative events in the dataset: the
negative class events predicted incorrectly (FP) and the negative class events predicted correctly
(TN). The model reaches a specificity value of 0.901, so about 10% of the coconuts without water
are predicted incorrectly as coconuts with water.
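The same check for specificity, again using the counts from the table (variable names are my own):

```python
# Specificity: true negatives over all actual negatives.
TN, FP = 311, 34
specificity = TN / (TN + FP)
print(round(specificity, 3))  # 0.901
```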
Precision measures how good the model is at assigning positive events to the positive class. That
is, how accurate the prediction is.
Precision = TP / (TP + FP) = 240 / (240 + 34) = 0.88
We divide the number of true positives by the number of all events assigned to the positive class,
i.e. the sum of true positives and false positives. The precision value for the model is 0.88.
Therefore, almost 88% of the coconuts predicted as with water actually contained water.
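Precision follows the same pattern; note that 240 / 274 is about 0.876, which the report rounds to 0.88 (variable names are my own):

```python
# Precision: true positives over all rows predicted positive.
TP, FP = 240, 34
precision = TP / (TP + FP)
print(round(precision, 3))  # 0.876
```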
Summary
In this article, we've laid the groundwork for the metrics used in model performance evaluation:
the confusion matrix.
Indeed, a confusion matrix shows the performance of a classification model: how many positive and
negative events are predicted correctly or incorrectly. These counts are the basis for the
calculation of more general class statistics metrics. Here, we reported those most commonly used:
sensitivity, specificity, and precision.