
Project1 - Cold Storage Case Study

Report
Table of Contents
1 Project Objective
2 Assumptions
3 Exploratory Data Analysis – Step by step approach
3.1 Environment Set up and Data Import
3.1.1 Install necessary Packages and Invoke Libraries
3.1.2 Set up working Directory
3.1.3 Import and Read the Dataset
3.2 Variable Identification
3.2.1 Variable Identification – Inferences
3.3 Univariate Analysis
3.4 Bi-Variate Analysis
3.5 Missing Value Identification
3.6 Outlier Identification
3.7 Variable Transformation / Feature Creation
4 Conclusion
5 Appendix A – Source Code
1 Project Objective
The objective of this report is to explore the Cold Storage data sets
(“Cold_Storage_Temp_Data” and “Cold_Storage_Mar2018”) in R and to generate
insights about them. The exploration consists of the following:

• Importing the datasets in R
• Understanding the structure of the datasets
• Graphical exploration
• Descriptive statistics
• Insights from the datasets

2 Assumptions

The hypothesis tests performed in this report rest on the following assumptions:

1) The sample of Cold Storage temperatures was randomly selected.

2) The sample size is reasonably large (n = 35, above the common rule-of-thumb
threshold of 30), so by the central limit theorem the sampling distribution of
the mean is approximately normal even if the temperatures themselves are not
exactly normally distributed.

3) The z-test and the t-test both assume that the data are independently sampled
from a normal distribution.

4) A z-test assumes that σ (the population standard deviation) is known; a t-test
does not and estimates it from the sample.

3 Exploratory Data Analysis – Step by step approach


A typical data exploration activity consists of the following steps:

1. Environment Set up and Data Import
2. Variable Identification
3. Univariate Analysis
4. Bi-Variate Analysis
5. Missing Value Treatment
6. Outlier Treatment
7. Variable Transformation / Feature Creation
8. Feature Exploration

3.1 Environment Set up and Data Import

3.1.1 Install necessary Packages and Invoke Libraries


The following packages were installed and loaded for the analysis:
1. dplyr - a grammar of data manipulation, providing a consistent set of verbs
that help you solve the most common data manipulation challenges. These all
combine naturally with group_by(), which allows you to perform any operation
“by group”.
2. readr - provides a fast and friendly way to read rectangular data (like 'csv',
'tsv', and 'fwf'). Our dataset is available in .csv format, so readr was
installed to read it.
3. ggplot2 - a system for declaratively creating graphics, based on The
Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to
aesthetics and what graphical primitives to use, and it takes care of the details.

3.1.2 Set up working Directory


Setting a working directory at the start of the R session makes importing and
exporting data files and code files easier. The working directory is the
location/folder on the PC where the data, code and other files related to the
project are kept.

Please refer to Appendix A for the source code.
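
As a minimal sketch of this step (not the code from Appendix A), the snippet below creates a folder and switches the working directory to it. The folder name cold_storage_project is hypothetical, and tempdir() is used only so that the example runs on any machine.

```r
# Hypothetical project folder; tempdir() is used only so the sketch runs anywhere.
project_dir <- file.path(tempdir(), "cold_storage_project")
dir.create(project_dir, showWarnings = FALSE, recursive = TRUE)

old_dir <- setwd(project_dir)  # setwd() returns the previous working directory
current_dir <- getwd()         # confirm where R now reads and writes files
print(current_dir)

setwd(old_dir)                 # restore the original directory when finished
```

In a real session the path passed to setwd() would simply be the folder that holds the two .csv files.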

3.1.3 Import and Read the Dataset
The given dataset is in .csv format. Hence, the command ‘read.csv’ is used to
import the file.

Please refer to Appendix A for the source code.
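
The import step might look roughly like the sketch below. The three data rows are hypothetical stand-ins written to a temporary file so that the example is self-contained; only the file name and the Season and Temperature column names are taken from this report.

```r
# Hypothetical rows standing in for the real file; only the Season and
# Temperature column names come from the report.
csv_path <- file.path(tempdir(), "Cold_Storage_Temp_Data.csv")
writeLines(c("Season,Temperature",
             "Winter,2.4",
             "Summer,3.1",
             "Rainy,2.9"), csv_path)

Cold_Storage_Temp_Data <- read.csv(csv_path, header = TRUE)

str(Cold_Storage_Temp_Data)      # column types
head(Cold_Storage_Temp_Data)     # first rows
summary(Cold_Storage_Temp_Data)  # descriptive statistics
```

With the real file in the working directory, the call reduces to read.csv("Cold_Storage_Temp_Data.csv").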

3.2 Variable Identification

• setwd() - sets the working directory.
• mean(x) - calculates the mean of the data sample x.
• function() - provides the base mechanism for defining new functions in the R
language.
• summarise() - used here to create a variable named mean_temp_season, which is
the average of the Temperature column for each season in the dataset.
• group_by() - groups the data so that summary statistics are computed by group.
dplyr automatically applies the function to each group passed to group_by().
• sd() - sd(Cold_Storage_Temp_Data$Temperature) calculates the standard
deviation of the Temperature column of the dataset.
• pnorm() - gives the cumulative distribution function (c.d.f.) of the normal
distribution, F(x) = P(X <= x) where X is normal. In this case study it is used to
calculate the probability of the temperature having gone above 4 deg C and/or
below 2 deg C.
• pnorm(-abs(zstat)) - used to calculate the p-value of the Z test.
• qnorm(p) - calculates the inverse c.d.f. F^-1 of the normal distribution. In
this case study it is used to calculate the Z critical point that separates the
rejection and acceptance regions at alpha = 0.10. We passed 0.90 as the
lower-tail probability, which gives about 1.2816.
• plot() - generic function for plotting R objects.
• boxplot() - produces box-and-whisker plot(s) of the given (grouped) values.
• qplot() - a shortcut designed to be familiar if you are used to base plot(). It
is a convenient wrapper for creating a number of different types of plots using a
consistent calling scheme.
• hist() - the generic function hist computes a histogram of the given data values.
• t.test() - used to perform a one-tailed t-test. As our test is left-tailed, we
used alternative = "less", conf.level = 0.90 for a 90% confidence level, and
mu = 3.9 as the hypothesized value of the mean.

Please refer to Appendix A for the source code.
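
To illustrate how pnorm() and qnorm() are used as described above, here is a small sketch. The mean and standard deviation (3.0 and 0.5) are hypothetical placeholders, not the values computed from the dataset; only the 2 deg C and 4 deg C limits and the 0.90 probability come from this report.

```r
# Hypothetical mean and standard deviation of the yearly temperature (deg C);
# the report's actual values are not reproduced here.
mu_temp <- 3.0
sd_temp <- 0.5

p_below_2 <- pnorm(2, mean = mu_temp, sd = sd_temp)                      # P(X <= 2)
p_above_4 <- pnorm(4, mean = mu_temp, sd = sd_temp, lower.tail = FALSE)  # P(X > 4)
p_outside <- p_below_2 + p_above_4                                       # either limit breached

# qnorm() inverts pnorm(): the value below which 90% of a standard normal lies
z_crit <- qnorm(0.90)

print(c(p_below_2 = p_below_2, p_above_4 = p_above_4,
        p_outside = p_outside, z_crit = z_crit))
```

Using lower.tail = FALSE for the upper limit avoids computing 1 - pnorm(4, ...) by hand.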

3.3 Univariate Analysis

1. The box plot of the overall temperature distribution shows one outlier in the
data, with a value of 5 degrees, whereas the upper limit for non-outliers is:
Q3 + 1.5 * IQR = 3.30 + (1.5 * 0.8) = 4.5 degrees.
The histogram of temperature shows that the temperature over the whole year
approximately follows the bell curve of a normal distribution.

2. The bar plot shows that there are approximately equal numbers of days in
the three seasons under observation (Rainy, Summer and Winter):

Rainy: 122
Summer: 120
Winter: 123
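
The upper-limit arithmetic in item 1 can be checked with a few lines of R. The first part uses the Q3 and IQR values quoted above; the second part applies the same rule to a short hypothetical vector (temps_example), since the full dataset is not reproduced here.

```r
# Values quoted in the report
q3  <- 3.30
iqr <- 0.8
upper_fence <- q3 + 1.5 * iqr     # 4.5
flagged <- 5 > upper_fence        # the 5-degree reading lies above the fence
print(c(upper_fence = upper_fence, flagged = flagged))

# Same rule on a hypothetical vector of readings (deg C)
temps_example <- c(2.4, 2.9, 3.0, 3.1, 3.3, 3.4, 5.0)
fence_example <- unname(quantile(temps_example, 0.75)) + 1.5 * IQR(temps_example)
outliers_example <- temps_example[temps_example > fence_example]
print(outliers_example)
```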
3.4 Bi-Variate Analysis

The following box plot shows the relationship between season and
temperature. It gives a pictorial representation of the distribution of
temperatures within each season.
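
A season-wise box plot of this kind could be produced along the lines of the sketch below. The fifteen temperature values are hypothetical; only the Season and Temperature column names and the three season labels come from this report.

```r
# Hypothetical temperatures; only the column names and season labels
# come from the report.
temps <- data.frame(
  Season = rep(c("Rainy", "Summer", "Winter"), each = 5),
  Temperature = c(2.8, 3.0, 3.1, 3.2, 4.6,
                  3.0, 3.2, 3.3, 3.4, 3.5,
                  2.4, 2.6, 2.7, 2.8, 2.9)
)

# plot = FALSE returns the box statistics without drawing;
# drop that argument to display the chart itself.
bp <- boxplot(Temperature ~ Season, data = temps, plot = FALSE)

print(bp$names)  # season labels, in the order the boxes would be drawn
print(bp$stats)  # whisker, hinge and median values, one column per season
print(bp$out)    # readings beyond the whiskers
print(bp$group)  # index of the season each such reading belongs to
```

The bp$out and bp$group components give the same per-season outlier information that is otherwise read off the chart by eye.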

3.5 Missing Value Identification

3.6 Outlier Identification

The given data set contains outliers in the Winter and Rainy seasons:
1. 3 outliers on the high side (towards the right) in the Winter season.
2. 1 outlier on the high side (towards the right) in the Rainy season.

3.7 Variable Transformation / Feature Creation

The “Season” variable has been converted to a factor using as.factor(), as
this is needed to categorize the season into 3 different types (“Winter”,
“Summer” and “Rainy”) and to report the data clearly for each season.
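
A minimal sketch of this conversion is shown below. The four rows are hypothetical, and tapply() from base R is used here for the per-season mean purely for illustration; it stands in for the dplyr group_by() and summarise() calls, which are not reproduced in this report.

```r
# Hypothetical rows; only the column names and season labels come from the report.
temps <- data.frame(
  Season = c("Winter", "Summer", "Rainy", "Winter"),
  Temperature = c(2.5, 3.2, 3.0, 2.7),
  stringsAsFactors = FALSE
)

temps$Season <- as.factor(temps$Season)  # character column becomes a 3-level factor

print(levels(temps$Season))  # levels are sorted alphabetically
print(table(temps$Season))   # number of rows per season

# Base-R per-season mean (illustrative stand-in for group_by() + summarise())
mean_by_season <- tapply(temps$Temperature, temps$Season, mean)
print(mean_by_season)
```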

4 Conclusion

In our Cold Storage case study, it is given that the optimal temperature to be
maintained is between 2 and 4 deg C, to ensure that there is no change of
texture or body appearance and no separation of fats.

In March 2018, customers started complaining about the dairy products going sour
and often smelling. According to the supervisor, he has been vigilant in
maintaining the temperature below 3.9 deg C. So according to the problem, we
can formulate our null and alternative hypotheses for the test as follows:

1. Ho (Null hypothesis): μ >= 3.9

2. Ha (Alternative hypothesis): μ < 3.9


This means that if the sample shows the temperature was actually maintained
below 3.9 deg C, the null hypothesis will be rejected and the supervisor’s
claim will be supported. If the sample does not show this, we fail to reject
the null hypothesis.

After studying the sample of 35 days that was pulled out by the
supervisor, we calculated the following:

1. Sample mean (x bar): 3.974

2. Hypothesized population mean μ (given): 3.9

3. Standard deviation (sd), assumed known for the z-test as given in the
question: 0.508

4. Number of sample values taken (n): 35

Z-test Statistics Analysis: For the one-tailed z-test, we obtain:

1. Zstat = 0.8617

2. P-value = 0.1944 (19.44%)

Zstat lies in the acceptance region, and the p-value is greater than the
significance level (alpha = 0.1), so we fail to reject the null hypothesis.
Note that pnorm(-abs(zstat)) returns the smaller of the two tail areas; since
Zstat is positive and the alternative is left-tailed, the left-tail
probability pnorm(zstat) is larger still (about 0.81), which leads to the same
decision. The sample therefore does not provide sufficient statistical
evidence, at the 0.1 level of significance, that the mean temperature is below
3.9 deg C.
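
The z statistic and p-value can be reproduced from the four summary values listed earlier. This is a sketch of the standard one-sample z-test formula rather than the code from Appendix A; the variable names are chosen here for illustration.

```r
x_bar <- 3.974   # sample mean
mu0   <- 3.9     # hypothesized mean
sigma <- 0.508   # standard deviation assumed known for the z-test
n     <- 35      # sample size

z_stat <- (x_bar - mu0) / (sigma / sqrt(n))

p_reported <- pnorm(-abs(z_stat))  # tail area as computed in the report
p_left     <- pnorm(z_stat)        # lower-tail area for the alternative mu < 3.9

z_crit <- -qnorm(0.90)             # left-tail critical value at alpha = 0.10
reject_null <- z_stat < z_crit

print(c(z_stat = z_stat, p_reported = p_reported, p_left = p_left))
print(reject_null)
```

Because z_stat is positive while the rejection region lies below z_crit, reject_null is FALSE whichever of the two tail areas is compared with alpha.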

T-test Statistics Analysis: For the one-tailed (left-tailed) t-test, the
output of t.test() in R shows that the sample mean is above 3.9, which places
the test statistic in the acceptance region. The p-value is 0.9953, far above
alpha = 0.1, so the null hypothesis cannot be rejected. (A p-value is not a
confidence level; it only measures how compatible the sample is with the null
hypothesis.)
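
The t.test() call described above could be written as in the sketch below. The 35 readings are hypothetical (a repeated pattern chosen only to give a sample mean just above 3.9), so the printed statistic and p-value will not match the report's figures; the arguments mu, alternative and conf.level are the ones named in Section 3.2.

```r
# Hypothetical 35 daily readings (deg C); the real Cold_Storage_Mar2018
# values are not reproduced here.
sample_temps <- rep(c(3.8, 3.9, 4.0, 4.1, 4.0), times = 7)

result <- t.test(sample_temps,
                 mu = 3.9,              # hypothesized mean
                 alternative = "less",  # left-tailed test
                 conf.level = 0.90)     # 90% confidence level

print(result$statistic)  # t value
print(result$parameter)  # degrees of freedom (n - 1)
print(result$p.value)

alpha <- 0.10
fail_to_reject <- result$p.value > alpha
print(fail_to_reject)
```

Unlike the z-test, t.test() estimates the standard deviation from the sample itself, which is why no sd argument is supplied.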

Therefore, after performing both hypothesis tests, it can be concluded that
the supervisor's claim of maintaining the temperature below 3.9 deg C is not
supported by the sample, and corrective measures should be taken by the Cold
Storage Plant to strictly monitor and maintain the temperature of the dairy
products between 2 and 4 deg C to ensure delivery of quality products to the
customers.

5 Appendix A – Source Code

PROBLEM 1:

PROBLEM 2:
