
Management Development Institute Murshidabad

Batch 2019-21

Project on
Business Analyst - I

Submitted by: Atul Anand (19PGPM081)
Submitted to: Dr. K. R. R. Gandhi
Executive Summary
Coconut water has traditionally been consumed as a refreshing beverage in most coconut-producing states of India. South Indian states account for close to 85 per cent of India's coconut production, with Tamil Nadu being a major producer and exporter. "Coconut and its varied by-products are seeing very good demand from European and Gulf countries, pushing the export numbers for products like desiccated coconut and virgin coconut oil higher."

Coconut water is popular for its health benefits. These growing market opportunities have, however, required that coconut water be available in a more convenient format, and have thus led to the development of technologies for preserving and selling the product in bottled form. Even so, local players remain dominant in this sector.

Problem statement
A coconut seller wants to find out how many coconuts contain water and how many do not.

Evaluating the model comes down to a match count: how many data rows the model has classified correctly and how many it has classified incorrectly. These counts are summarized in the confusion matrix.
To define this model, we need to answer several questions that help quantify its performance:

1. How many of the coconuts that actually contain water were predicted as with water?
2. How many were predicted as without water?
3. Were some coconuts without water predicted as with water?
4. How many coconuts without water were predicted correctly?
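The four questions above map directly onto the four cells of a confusion matrix. As a minimal sketch (with hypothetical labels, not the project's data), the counts can be tallied from a list of actual and predicted classes, where 1 denotes a coconut with water (the positive class) and 0 a coconut without water:

```python
def confusion_counts(actual, predicted):
    """Count TP, FN, FP, TN for a binary classification (1 = positive)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    return tp, fn, fp, tn

# Tiny illustrative example (hypothetical data):
actual    = [1, 1, 1, 0, 0, 1, 0, 0]
predicted = [1, 0, 1, 0, 1, 1, 0, 0]
tp, fn, fp, tn = confusion_counts(actual, predicted)
print(tp, fn, fp, tn)  # answers to questions 1-4: 3 1 1 3
```

Each count answers one question: TP (question 1), FN (question 2), FP (question 3), and TN (question 4).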
Confusion Matrix

The confusion matrix was originally introduced to evaluate the results of a classification against the actual outcomes. The first thing to do is to take one of the two classes as the class of interest, i.e. the positive class. In the target column, we need to choose one value as the positive class; the other value is then automatically considered the negative class. Here we chose the coconuts with water as the positive class and the coconuts with no water as the negative class.
The confusion matrix shown below reports the count of:

 The data rows (coconuts with water) belonging to the positive class and correctly
classified as such. These are called True Positives (TP). The number of true positives is
placed in the top left cell of the confusion matrix.

 The data rows (coconuts with water) belonging to the positive class and
incorrectly classified as negative. These are called False Negatives (FN). The number of
false negatives is placed in the top right cell of the confusion matrix. (Type II error)

 The data rows (coconuts with no water) belonging to the negative class and
incorrectly classified as positive. These are called False Positives (FP). The number of
false positives is placed in the lower left cell of the confusion matrix. (Type I error)

 The data rows (coconuts with no water) belonging to the negative class and correctly
classified as such. These are called True Negatives (TN). The number of true negatives is
placed in the lower right cell of the confusion matrix.

Confusion Matrix

                    Predicted YES (1)   Predicted NO (0)
Actual YES (1)           240 (TP)            67 (FN)      Sensitivity = 0.782
Actual NO (0)             34 (FP)           311 (TN)      Specificity = 0.901
                    Precision = 0.876   Neg. pred. value = 0.823

Class   True Positives   False Positives   True Negatives   False Negatives   Sensitivity   Specificity   Precision
  1          240               34               311               67              0.782         0.901        0.876
  0          311               67               240               34              0.901         0.782        0.823
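The class statistics table shows a symmetry: taking class 0 (no water) as the positive class turns class 1's true negatives into class 0's true positives, and so on, which swaps sensitivity and specificity. A short sketch with the report's counts (TP = 240, FN = 67, FP = 34, TN = 311) verifies this:

```python
# Counts with class 1 (coconuts with water) as the positive class.
tp, fn, fp, tn = 240, 67, 34, 311

sens_1 = tp / (tp + fn)  # sensitivity for class 1
spec_1 = tn / (tn + fp)  # specificity for class 1

# Swapping the positive class relabels the cells: class 0's TP is class 1's TN, etc.
tp0, fn0, fp0, tn0 = tn, fp, fn, tp
sens_0 = tp0 / (tp0 + fn0)  # equals spec_1
spec_0 = tn0 / (tn0 + fp0)  # equals sens_1

print(round(sens_1, 3), round(spec_1, 3))  # 0.782 0.901
print(round(sens_0, 3), round(spec_0, 3))  # 0.901 0.782
```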

Sensitivity measures how good the model is at detecting events in the positive class. So, given that coconuts with water are the positive class, sensitivity quantifies how many of the coconuts that actually contain water are correctly predicted as with water.
Sensitivity = TP / (TP + FN) = 240 / (240 + 67) = 0.782

We divide the number of true positives by the number of all positive events in the dataset: the positive class events predicted correctly (TP) and the positive class events predicted incorrectly (FN). The model in this example reaches a sensitivity value of 0.782. This means that about 78% of the coconuts that actually contain water were correctly predicted as with water.

Specificity measures how good the model is at detecting events in the negative class, in this case the coconuts without water.
Specificity = TN / (FP + TN) = 311 / (34 + 311) = 0.901

We divide the number of true negatives by the number of all negative events in the dataset: the negative class events predicted incorrectly (FP) and the negative class events predicted correctly (TN). The model reaches a specificity value of 0.901, so about 10% of all coconuts without water are predicted incorrectly as coconuts with water.

Precision measures how good the model is at assigning positive events to the positive class. That
is, how accurate the prediction is.
Precision = TP / (TP + FP) = 240 / (240 + 34) = 0.88

We divide the number of true positives by the number of all events assigned to the positive class,
i.e. the sum of true positives and false positives. The precision value for the model is 0.88.
Therefore, about 88% of the coconuts predicted as with water actually contained water.
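The three metrics discussed above can be checked directly from the report's counts. A minimal sketch:

```python
# Counts from the confusion matrix: class 1 (coconuts with water) is positive.
tp, fn, fp, tn = 240, 67, 34, 311

sensitivity = tp / (tp + fn)  # share of water coconuts correctly detected
specificity = tn / (tn + fp)  # share of no-water coconuts correctly detected
precision   = tp / (tp + fp)  # share of "with water" predictions that are right

print(round(sensitivity, 3))  # 0.782
print(round(specificity, 3))  # 0.901
print(round(precision, 3))    # 0.876 (about 88%)
```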
Summary

In this article, we’ve laid the first stone for the metrics used in model performance evaluation: the confusion matrix.
Indeed, a confusion matrix shows the performance of the coconut classification model: how many positive and negative events are predicted correctly or incorrectly. These counts are the basis for the calculation of more general class statistics metrics. Here, we reported the most commonly used: sensitivity, specificity, and precision.
