
STATISTICS FOR BUSINESS

CIA 3 – DATA ANALYSIS EXERCISE

A Report submitted in partial fulfillment of the requirements for the degree of


Master of Business Administration

By
ARITRA GHOSH (2327106)
DEBANIK HAZRA (2327113)
DIANA JOHN (2327114)
SHINY SHAH (2327150)
GARIKAPATI SANDEEP CHOWDARY (2327120)

Under the Guidance of

Dr. RAJASHREE KAMATH K

MBA PROGRAMME
SCHOOL OF BUSINESS AND MANAGEMENT
CHRIST (DEEMED TO BE UNIVERSITY), BANGALORE
JULY 2023

INDEX

S. No. Content


1 Introduction
2 Research Methodology
3 Business Problem
4 Data Dictionary
5 Sample Size Determination
6 Point & Interval Estimation
7 One Sample t-Test
8 One Sample Test for Proportion
9 Two Sample t-Test
10 ANOVA
11 Chi-Square Test
12 Correlation
13 Simple Linear Regression
14 Conclusion

INTRODUCTION

Sales prediction in the retail domain is the focus of this study. For this purpose, a retail sales
dataset covering outlets of varied types and sizes across different locations is used.

This dataset contains crucial variables like Item Weight, Item Fat Content, Item Visibility, Item Type,
Item MRP, Outlet Size, Outlet Location Type, Outlet Type, and Item Outlet Sales, relevant to the
study.

Point & Interval Estimation, One Sample t-Test, One Sample Test for Proportion, Two Sample t-Test,
ANOVA, Chi-Square Test, Correlation, and Simple Linear Regression are the statistical methods
used in this work.

RESEARCH METHODOLOGY

Secondary data is used for the purpose of this study.

Dataset URL:
https://www.kaggle.com/datasets/adarshkumarjha/big-mart-sales-prediction

This dataset was utilized to forecast the demand in the retail domain.

BUSINESS PROBLEM

Demand Forecasting: The study focuses on forecasting demand in the retail domain by analyzing
how product attributes like Item Weight, Item Fat Content, Item Visibility, Item Type, and Item MRP
and outlet attributes like Outlet Size, Outlet Location Type, and Outlet Type relate to Item Outlet Sales.

DATA DICTIONARY

Variable                   Description                                                           Data Type     Sub-classification

Item Identifier            Unique identifier for each item                                       Qualitative   Nominal
Item Weight                Weight of the product                                                 Quantitative  Continuous (Ratio)
Item Fat Content           Fat content of the product                                            Qualitative   Nominal
Item Visibility            Percentage of total display area occupied by the product on shelves   Quantitative  Continuous (Ratio)
Item Type                  Category/type of the product                                          Qualitative   Nominal
Item MRP                   Maximum Retail Price of the product                                   Quantitative  Continuous (Ratio)
Outlet Identifier          Unique identifier for each outlet                                     Qualitative   Nominal
Outlet Establishment Year  Year in which the outlet was established                              Quantitative  Discrete
Outlet Size                Size of the outlet (e.g., Small, Medium, High)                        Qualitative   Ordinal
Outlet Location Type       Type of location where the outlet is situated (e.g., Tier 1, Tier 2, Tier 3)  Qualitative  Ordinal
Outlet Type                Type of outlet (e.g., Supermarket Type1, Grocery Store)               Qualitative   Nominal
Item Outlet Sales          Sales of the product in the outlet                                    Quantitative  Continuous (Ratio)

SAMPLE SIZE DETERMINATION

DESCRIPTIVE STATISTICS FOR THE LEADER VARIABLE (ITEM OUTLET SALES)


Mean 2181.288914
Standard Error 18.48459547
Median 1794.331
Mode 958.752
Standard Deviation 1706.499616
Sample Variance 2912140.938
Kurtosis 1.615876681
Skewness 1.177530603
Range 13053.6748
Minimum 33.29
Maximum 13086.9648
Sum 18591125.41
Count 8523
Confidence Level (95.0%) 36.23428767

Out of a population of 8523 records, the optimum sample size for the study is computed below. Here:

Mean = 2181.2889
Assumed Mean = 2000
Margin of Error = 181
Significance level = Alpha = 0.05

Sample size, n = ((Z alpha/2*S.D.)/k)^2, where k = Margin of Error


n = ((Z 0.05/2*1706.4996)/181)^2
n = ((1.96*1706.4996)/181)^2
n = 341.4816

Hence, optimum sample size for the study = 341
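
The calculation above can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function name `sample_size_for_mean` is an illustrative choice, not part of the original workbook, and the standard deviation and margin of error are the values quoted above.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(std_dev, margin_of_error, alpha=0.05):
    """Return the unrounded sample size n = ((z * s) / k)^2."""
    # Two-sided critical value of the standard normal (about 1.96 for alpha = 0.05)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ((z * std_dev) / margin_of_error) ** 2


# Values quoted in the report: S.D. = 1706.4996, margin of error = 181
n_raw = sample_size_for_mean(1706.4996, 181)
# The report takes the whole-number part of n as the sample size
n_used = math.floor(n_raw)
print(round(n_raw, 2), n_used)
```

The report takes the integer part (341); rounding up to 342 would be the more conservative convention, since it guarantees the stated margin of error.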

STATISTICAL TOOLS USED FOR DATA ANALYSIS

• Point & Interval Estimation


• One sample t-test.
• One Sample Test for Proportion
• Two sample t-test.
• ANOVA.
• Chi-square test
• Correlation
• Simple Linear Regression.

1) Point & Interval Estimation

The sample mean and proportion are said to be the point estimators of the population mean and
proportion respectively. Interval estimation uses sample data to determine an interval of potential
values for an unknown population parameter.

The 100(1 - alpha)% confidence interval for μ is given by: Mean +- Margin of Error

Continuous Variables

DESCRIPTIVE STATISTICS ON ALL CONTINUOUS VARIABLES

Statistic  Item Weight  Item Visibility  Item MRP  Item Outlet Sales
Mean 11.03036765 0.062668024 141.8061314 2155.928737
Standard Error 0.336653637 0.00258931 3.273337088 87.11901211
Median 11 0.052836076 142.6496 1780.3492
Mode 0 0 110.157 1414.1592
Standard Deviation 6.207586347 0.047814676 60.44604925 1608.755822
Sample Variance 38.53412826 0.002286243 3653.72487 2588095.295
- -
Kurtosis 0.881443285 1.178610751 0.699547354 2.48453533
-
Skewness 0.297725837 1.061660226 0.173488172 1.294612023
Range 21 0.27321283 234.7984 10034.9376
Minimum 0 0 31.89 37.9506
Maximum 21 0.27321283 266.6884 10072.8882
Sum 3750.325 21.36979602 48355.8908 735171.6994
Count 341 341 341 341
Confidence Level (95.0%) 0.662193146 0.005093084 6.438541888 171.3601116

Upper Level 11.69256079 0.067761107 148.2446733 2327.288849


Lower Level 10.3681745 0.05757494 135.3675895 1984.568626

Item Weight

Point Estimator = Sample Mean = 11.0304


Interval = Mean +- Margin of Error
Interval = (11.0304 - 0.6622, 11.0304 + 0.6622)
Interval = (10.3682, 11.6926)

Item Visibility

Point Estimator = 0.0627

Interval = Mean +- Margin of Error


Interval = (0.0627 - 0.0051, 0.0627 + 0.0051)
Interval = (0.0576, 0.0678)

Item MRP

Point Estimator = 141.8061

Interval = Mean +- Margin of Error


Interval = (141.8061 - 6.4385, 141.8061 + 6.4385)
Interval = (135.3676, 148.2446)

Item Outlet Sales

Point Estimator = 2155.9287

Interval = Mean +- Margin of Error


Interval = (2155.9287 - 171.3601, 2155.9287 + 171.3601)
Interval = (1984.5686, 2327.2888)

Interpretation: Here, the point and confidence interval estimates obtained by manual calculation match
the Excel output.
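
As a cross-check of the manual intervals, the sketch below rebuilds the Item Weight interval from its mean and standard error. It is an illustration only: the function name is hypothetical, and the critical value is assumed to be the two-tail t value that Excel reports elsewhere in this report (1.966965734) rather than one computed here.

```python
def mean_confidence_interval(mean, std_error, t_critical):
    """Return (lower, upper, margin) for mean +- t_critical * std_error."""
    margin = t_critical * std_error
    return mean - margin, mean + margin, margin


# Assumption: two-tail t critical value copied from the report's Excel output
T_CRIT_TWO_TAIL = 1.966965734

# Item Weight: sample mean and standard error from the descriptive statistics table
lower, upper, margin = mean_confidence_interval(11.03036765, 0.336653637, T_CRIT_TWO_TAIL)
print(round(lower, 4), round(upper, 4), round(margin, 4))
```

The same function applies to Item Visibility, Item MRP, and Item Outlet Sales by substituting their means and standard errors.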

Categorical Variables

Item Fat Content  Count of Item Fat Content  Sample Proportion
Low Fat 218 0.639296188
Regular 123 0.360703812
Grand Total 341 1
Margin of Error 0.050968905
Upper Level 0.690265093
Lower Level 0.588327282
Point Estimator = Sample Proportion (Low Fat) = 0.6393

Interval = p bar +- Z alpha/2*sqrt(p bar*(1 - p bar)/n)


Alpha = Significance level = 0.05
Interval = 0.6393 +- Z 0.05/2*sqrt(0.6393*0.3607/341)
Interval = (0.5883, 0.6903)

Item Type

Item Type Count of Item Type Sample Proportion


Baking Goods 26 0.076246334
Breads 13 0.038123167
Breakfast 3 0.008797654
Canned 31 0.090909091
Dairy 20 0.058651026
Frozen Foods 32 0.093841642
Fruits and
Vegetables 53 0.15542522
Hard Drinks 10 0.029325513
Health and Hygiene 28 0.082111437
Household 43 0.126099707
Meat 13 0.038123167

Others 6 0.017595308
Seafood 3 0.008797654
Snack Foods 39 0.114369501
Soft Drinks 18 0.052785924
Starchy Foods 3 0.008797654
Grand Total 341 1
Margin of Error 0.038455519
Upper Level 0.193880739
Lower Level 0.116969701

Point Estimator = Sample Proportion (Fruits and Vegetables) = 0.1554

Interval = p bar +- Z alpha/2*sqrt(p bar*(1 - p bar)/n)


Alpha = Significance level = 0.05

Interval = 0.1554 +- Z 0.05/2*sqrt(0.1554*0.8446/341)


Interval = (0.1170, 0.1939)

Outlet Location Type

Outlet Location Type  Count of Outlet Location Type  Sample Proportion
Tier 1 92 0.269794721
Tier 2 120 0.351906158
Tier 3 129 0.37829912
Grand Total 341 1
Margin of Error 0.051473925
Upper Level 0.429773046
Lower Level 0.326825195

Point Estimator = Sample Proportion (Tier 3) = 0.3783

Interval = p bar +- Z alpha/2*sqrt(p bar*(1 - p bar)/n)


Alpha = Significance level = 0.05
Interval = 0.3783 +- Z 0.05/2*sqrt(0.3783*0.6217/341)
Interval = (0.3268, 0.4298)

Outlet Type

Outlet Type  Count of Outlet Type  Sample Proportion
Grocery Store 39 0.114369501
Supermarket Type1 235 0.68914956
Supermarket Type2 36 0.105571848
Supermarket Type3 31 0.090909091
Grand Total 341 1
Margin of Error 0.049125996
Upper Level 0.738275556
Lower Level 0.640023564

Point Estimator = Sample Proportion (Supermarket Type1) = 0.6891

Interval = p bar +- Z alpha/2*sqrt(p bar*(1 - p bar)/n)
Alpha = Significance level = 0.05

Interval = 0.6891 +- Z alpha/2*sqrt(0.6891*0.3109/341)


Interval = (0.6400, 0.7383)

Outlet Size

Outlet Size  Count of Outlet Size  Sample Proportion
High 77 0.225806452
Medium 135 0.395894428
Small 129 0.37829912
Grand Total 341 1
Margin of Error 0.051906889
Upper Level 0.447801317
Lower Level 0.34398754

Point Estimator = Sample Proportion (Medium) = 0.3959

Interval = p bar +- Z alpha/2*sqrt(p bar*(1 - p bar)/n)


Alpha = Significance level = 0.05

Interval = 0.3959 +- Z 0.05/2*sqrt(0.3959*0.6041/341)


Interval = (0.3440, 0.4478)

Interpretation: Here, the point and confidence interval estimates obtained by manual calculation match
the Excel output.
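
The proportion intervals in this section all follow the same recipe, which can be sketched as below. The helper name `proportion_confidence_interval` is hypothetical; the counts (218 Low Fat items out of 341) come from the Item Fat Content table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, alpha=0.05):
    """Return (lower, upper) for p_bar +- z * sqrt(p_bar * (1 - p_bar) / n)."""
    p_bar = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - margin, p_bar + margin


# Low Fat items: 218 of 341 sampled records
low, high = proportion_confidence_interval(218, 341)
print(round(low, 4), round(high, 4))
```

Passing the other counts (for example 53 for Fruits and Vegetables, or 129 for Tier 3) should reproduce the remaining intervals in the same way.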

2) One sample t-Test

In statistics, the process of hypothesis testing involves putting an analyst's presumption about a
population parameter to the test. The type of data used and the purpose of the study will determine the
methodology the analyst uses.

Using sample data, hypothesis testing is done to determine whether a claim is plausible. These data
could originate from a broader population or a process that creates data. In the descriptions that
follow, "population" will be used to refer to both situations.

Type I Error is the rejection of a null hypothesis that is actually true.


Type II Error is the failure to reject a null hypothesis that is actually false.

Conclusion   Population Condition: H0 True   Population Condition: Ha True (H0 False)
Accept H0    Correct (1 - alpha)             Type II Error (beta)
Reject H0    Type I Error (alpha)            Correct (1 - beta) (Power of the test)

Practical Question -> Statistical Question -> Test -> Statistical Decision -> Practical Solution

Step 1: Development of Null Hypothesis (H0)


H0 = --------------------
Step 2: Development of alternative Hypothesis (Ha)
Ha = ---------------------
Step 3: Type of test (One-tailed or Two-tailed)
Step 4: Significance level (5% or 1%)
Step 5: Critical Value (Table Value based on the level of significance)
Step 6: Calculation of test statistic (Use of formula)
Step 7: Decision Based on comparison of Test Statistic and critical value

Item Outlet Sales

t-Test: Two-Sample Assuming Unequal Variances

Item Outlet Sales


Mean 2155.928737
Variance 2588095.295
Observations 341
Hypothesized Mean Difference 0
df 341
t Stat 24.74693738
P(T<=t) one-tail 2.4061E-78
t Critical one-tail 1.649347611
P(T<=t) two-tail 4.81221E-78
t Critical two-tail 1.966965734

Step 1: Development of null hypothesis


H0: Average Item Outlet Sales is less than or equal to 2000 (u <= 2000)
Step 2: Development of alternate hypothesis
Ha: Average Item Outlet Sales is greater than 2000 (u > 2000)

Step 3: Type of test - One-tailed test (Upper Tail Test)


Step 4: Rejection Criteria: Reject H0 if Test Statistic > t alpha, (n-1)
Step 5: Significance level - 5% (0.05)
Step 6: Critical Value for one tailed test is 1.65

Step 7: Test Statistics


T.S. = (Xbar - u0)/Standard Error
T.S. = (2155.9287 - 2000)/87.119
T.S. = 1.789835921

Interpretation
Since, Test Statistic (1.7898) is greater than t 0.05, 340 (1.65), we reject H0 at the 5% level of
significance. This means that the Average Item Outlet Sales is greater than 2000, with a 5% chance of
error in judgement.
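
The test statistic and decision rule used above can be sketched in a few lines. The function names are illustrative, and the critical value is taken from the Excel output (t Critical one-tail) rather than computed.

```python
def one_sample_t_statistic(sample_mean, hypothesized_mean, std_error):
    """Return (Xbar - u0) / Standard Error."""
    return (sample_mean - hypothesized_mean) / std_error


def upper_tail_decision(t_stat, t_critical):
    """Apply the upper-tail rule: reject H0 when the statistic exceeds the critical value."""
    return "reject H0" if t_stat > t_critical else "fail to reject H0"


# Item Outlet Sales: sample mean, hypothesized mean, and standard error from the report
t_sales = one_sample_t_statistic(2155.928737, 2000, 87.11901211)
# Assumption: one-tail critical value copied from the Excel output
decision = upper_tail_decision(t_sales, 1.649347611)
print(round(t_sales, 4), decision)
```

For the lower-tail tests later in this section, the comparison flips: H0 is rejected when the statistic is below the negative critical value.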

Item MRP

Item MRP
Mean 141.8061314
Variance 3653.72487
Observations 341
Hypothesized Mean Difference 0
df 341
t Stat 43.32157904
P(T<=t) one-tail 8.9134E-141
t Critical one-tail 1.649347611
P(T<=t) two-tail 1.7827E-140
t Critical two-tail 1.966965734

Step 1: Development of null hypothesis


H0: Average Item MRP is greater than or equal to 150 (u >= 150)

Step 2: Development of alternate hypothesis


Ha: Average Item MRP is less than 150 (u < 150)

Step 3: Type of test - One-tailed test (Lower Tail Test)


Step 4: Rejection Criteria: Reject H0 if Test Statistic < -t alpha, (n-1)
Step 5: Significance level - 5% (0.05)
Step 6: Critical Value for one tailed test is 1.65

Step 7: Test Statistics


T.S. = (Xbar - u0)/Standard Error
T.S. = (141.8061 - 150)/3.2733
T.S. = -2.503215649

Interpretation
Since, Test Statistic (-2.5032) is less than -t 0.05, 340 (-1.65), we reject H0 at the 5% level of
significance. This means that the Average Item MRP is less than 150 with a 5% chance of error in
judgement.

Item Weight

t-Test: Two-Sample Assuming Unequal Variances

Item Weight
Mean 11.0304
Variance 38.5341
Observations 341
Hypothesized Mean Difference 0
df 339
t Stat 32.7647
P(T<=t) one-tail 2E-107
t Critical one-tail 1.64936
P(T<=t) two-tail 4E-107
t Critical two-tail 1.96699

Step 1: Development of null hypothesis
H0: Average Item Weight is greater than or equal to 15 (u >= 15)

Step 2: Development of alternate hypothesis


Ha: Average Item Weight is less than 15 (u < 15)

Step 3: Type of test - One-tailed test (Lower Tail Test)


Step 4: Rejection Criteria: Reject H0 if Test Statistic < -t alpha, (n-1)
Step 5: Significance level - 5% (0.05)
Step 6: Critical Value for one tailed test is 1.65

Step 7: Test Statistics


T.S. = (Xbar - u0)/Standard Error
T.S. = (11.0304 - 15)/0.3367
T.S. = -11.79144354

Interpretation
Since, Test Statistic (-11.7914) is less than -t 0.05,340 (-1.65), we reject H0 at the 5% level of
significance. This means that the Average Item Weight is less than 15, with a 5% chance of error in
judgement.

Item Visibility

t-Test: Two-Sample Assuming Unequal Variances

Item Visibility
Mean 0.0627
Variance 0.0023
Observations 341
Hypothesized Mean Difference 0
df 340
t Stat 24.203
P(T<=t) one-tail 3E-76
t Critical one-tail 1.6493
P(T<=t) two-tail 6E-76
t Critical two-tail 1.967

Step 1: Development of null hypothesis


H0: Average Item Visibility is less than or equal to 0.06 (u <= 0.06)

Step 2: Development of alternate hypothesis


Ha: Average Item Visibility is greater than 0.06 (u > 0.06)

Step 3: Type of test - One-tailed test (Upper Tail Test)


Step 4: Rejection Criteria: Reject H0 if Test Statistic > t alpha, (n-1)
Step 5: Significance level - 5% (0.05)
Step 6: Critical Value for one tailed test is 1.65

Step 7: Test Statistics


T.S. = (Xbar - u0)/Standard Error
T.S. = (0.0627 - 0.06)/0.0026
T.S. = 1.030399472

Interpretation
Since, Test Statistic (1.0304) is less than t 0.05, 340 (1.65), we fail to reject H0 at the 5% level of
significance. This means that there is insufficient evidence that the Average Item Visibility is greater
than 0.06.

3) One Sample Test for Proportion

Item Type

Sample
Item Type Count of Item Type Proportion
Baking Goods 26 0.076246334
Breads 13 0.038123167
Breakfast 3 0.008797654
Canned 31 0.090909091
Dairy 20 0.058651026
Frozen Foods 32 0.093841642
Fruits and
Vegetables 53 0.15542522
Hard Drinks 10 0.029325513
Health and Hygiene 28 0.082111437
Household 43 0.126099707
Meat 13 0.038123167
Others 6 0.017595308
Seafood 3 0.008797654
Snack Foods 39 0.114369501
Soft Drinks 18 0.052785924
Starchy Foods 3 0.008797654
Grand Total 341 1
Margin of Error 0.038455519
Upper Level 0.193880739
Lower Level 0.116969701

Step 1: Development of null hypothesis


H0: Proportion of Fruits & Vegetables item types is greater than or equal to 0.2 (p >= 0.2)

Step 2: Development of alternate hypothesis


Ha: Proportion of Fruits & Vegetables item types is less than 0.2 (p < 0.2)

Step 3: Type of test - One-tailed test (Lower Tail Test)


Step 4: Rejection Criteria: Reject H0 if Test Statistic < -Z alpha
Step 5: Significance level - 5% (0.05)
Step 6: Critical Value for one tailed test is 1.65
Step 7: Test Statistics
T.S. = (p bar - p0)/sqrt[p0*(1 - p0)/n]
T.S. = (0.1554 - 0.2)/sqrt[0.2*0.8/341]
T.S. = -2.057815372

Interpretation
Since, Test Statistic (-2.0578) is less than -Z 0.05 (-1.65), we reject H0 at the 5% level of
significance. This means that the proportion of Fruits & Vegetables item types is less than 0.2, with a
5% chance of error in judgement.
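
The proportion test statistic can be checked with the sketch below, which also reports the lower-tail p-value from the standard normal distribution. The function name is hypothetical, and the counts (53 Fruits and Vegetables items out of 341) come from the table above.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_z(successes, n, p0):
    """Return (p_bar - p0) / sqrt(p0 * (1 - p0) / n)."""
    p_bar = successes / n
    return (p_bar - p0) / sqrt(p0 * (1 - p0) / n)


# Fruits and Vegetables: 53 of 341 items, hypothesized proportion 0.2
z_fv = one_sample_proportion_z(53, 341, 0.2)
# Lower-tail p-value: area to the left of the statistic under the standard normal curve
p_lower = NormalDist().cdf(z_fv)
print(round(z_fv, 4), round(p_lower, 4))
```

A lower-tail p-value below 0.05 leads to the same decision as comparing the statistic with -1.65.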

Item Fat Content

Sample
Item Fat Content Count of Item Fat Content Proportion
Low Fat 218 0.639296188
Regular 123 0.360703812
Grand Total 341 1
Margin of Error 0.050968905
Upper Level 0.690265093
Lower Level 0.588327282

Step 1: Development of null hypothesis


H0: Proportion of Low-Fat products is less than or equal to 0.6 (p <= 0.6)

Step 2: Development of alternate hypothesis


Ha: Proportion of Low-Fat products is greater than 0.6 (p > 0.6)

Step 3: Type of test - One-tailed test (Upper Tail Test)


Step 4: Rejection Criteria: Reject H0 if Test Statistic > Z alpha
Step 5: Significance level - 5% (0.05)
Step 6: Critical Value for one tailed test is 1.65

Step 7: Test Statistics


T.S. = (p bar - p0)/sqrt[p0*(1 - p0)/n]
T.S. = (0.6393 - 0.6)/sqrt[0.6*0.4/341]
T.S. = 1.481228256

Interpretation
Since, Test Statistic (1.4812) is less than Z 0.05 (1.65), we fail to reject H0 at the 5% level of
significance. From the hypothesis testing, it can be concluded that there is insufficient evidence that
the proportion of Low-Fat products is greater than 0.6.

Outlet Location Type

Outlet Location Type  Count of Outlet Location Type  Sample Proportion
Tier 1 92 0.269794721
Tier 2 120 0.351906158
Tier 3 129 0.37829912
Grand Total 341 1
Margin of Error 0.051473925
Upper Level 0.429773046
Lower Level 0.326825195

Step 1: Development of null hypothesis


H0: Proportion of Tier 3 Outlet Locations is greater than or equal to 0.5 (p >= 0.5)

Step 2: Development of alternate hypothesis


Ha: Proportion of Tier 3 Outlet Locations is less than 0.5 (p < 0.5)

Step 3: Type of test - One-tailed test (Lower Tail Test)


Step 4: Rejection Criteria: Reject H0 if Test Statistic < -Z alpha
Step 5: Significance level - 5% (0.05)
Step 6: Critical Value for one tailed test is 1.65

Step 7: Test Statistics


T.S. = (p bar - p0)/sqrt[p0*(1 - p0)/n]
T.S. = (0.3783 - 0.5)/sqrt[0.5*0.5/341]
T.S. = -4.494701997

Interpretation
Since, Test Statistic (-4.4947) is less than -Z 0.05 (-1.65), we reject H0 at the 5% level of
significance. From the hypothesis testing, it can be concluded that the proportion of Tier 3 Outlet
Locations is less than 0.5, with a 5% chance of error in judgement.

Outlet Type

Sample
Outlet Type Count of Outlet Type Proportion
Grocery Store 39 0.114369501
Supermarket Type1 235 0.68914956
Supermarket Type2 36 0.105571848
Supermarket Type3 31 0.090909091
Grand Total 341 1
Margin of Error 0.049125996
Upper Level 0.738275556
Lower Level 0.640023564

Step 1: Development of null hypothesis


H0: Proportion of Supermarket Type 1 Outlet types is less than or equal to 0.4 (p <= 0.4)

Step 2: Development of alternate hypothesis


Ha: Proportion of Supermarket Type 1 Outlet types is greater than 0.4 (p > 0.4)

Step 3: Type of test - One-tailed test (Upper Tail Test)


Step 4: Rejection Criteria: Reject H0 if Test Statistic > Z alpha
Step 5: Significance level - 5% (0.05)
Step 6: Critical Value for one tailed test is 1.65

Step 7: Test Statistics


T.S. = (p bar - p0)/sqrt[p0*(1 - p0)/n]
T.S. = (0.6891 - 0.4)/sqrt[0.4*0.6/341]
T.S. = 10.89918702

Interpretation
Since, Test Statistic (10.8992) is greater than Z 0.05 (1.65), we reject H0 at the 5% level of
significance. From the hypothesis testing, it can be concluded that the proportion of Supermarket
Type 1 Outlet types is greater than 0.4, with a 5% chance of error in judgement.

Outlet Size

Outlet Size  Count of Outlet Size  Sample Proportion
High 77 0.225806452
Medium 135 0.395894428
Small 129 0.37829912
Grand Total 341 1
Margin of Error 0.051906889
Upper Level 0.447801317
Lower Level 0.34398754

Step 1: Development of null hypothesis


H0: Proportion of Medium Outlet Sizes is greater than or equal to 0.5 (p >= 0.5)

Step 2: Development of alternate hypothesis


Ha: Proportion of Medium Outlet Sizes is less than 0.5 (p < 0.5)

Step 3: Type of test - One-tailed test (Lower Tail Test)


Step 4: Rejection Criteria: Reject H0 if Test Statistic < -Z alpha
Step 5: Significance level - 5% (0.05)
Step 6: Critical Value for one tailed test is 1.65

Step 7: Test Statistics


T.S. = (p bar - p0)/sqrt[p0*(1 - p0)/n]
T.S. = (0.3959 - 0.5)/sqrt[0.5*0.5/341]
T.S. = -3.8449

Interpretation
Since, Test Statistic (-3.8449) is less than -Z 0.05 (-1.65), we reject H0 at the 5% level of
significance. From the hypothesis testing, it can be concluded that the proportion of Medium Outlet
Sizes is less than 0.5, with a 5% chance of error in judgement.

4) Two Sample t-Test

Low Fat Item Sales v/s Regular fat Item Sales

t-Test: Two-Sample Assuming Unequal Variances

                              Low Fat Item Sales   Regular Fat Item Sales
Mean 2269.220485 1955.135233
Variance 2991618.919 1827983.135
Observations 218 123
Hypothesized Mean Difference 0
df 305
t Stat 1.857722851
P(T<=t) one-tail 0.032085978
t Critical one-tail 1.649864893
P(T<=t) two-tail 0.064171956
t Critical two-tail 1.967772355
Now here sample size (n1) = 218; (n2) = 123
Significance level = alpha = 0.05

Step 1: Development of null hypothesis


H0: Average sales of Low-Fat food items is less than or equal to Regular Fat food items
(u1 <= u2)

Step 2: Development of alternate hypothesis
Ha: Average sales of Low-Fat food items is greater than Regular Fat food items (u1 > u2)

Step 3: Type of test - One-tailed test (Upper Tail Test)


Step 4: Rejection Criteria: Reject H0 if p value < alpha
Step 5: Significance level - 5% (0.05)
Step 6: p-value for one tailed test is 0.0321

Interpretation
Here, p-value (0.0321) < alpha (0.05), H0 is rejected. From the hypothesis testing, it can be concluded
that the average sales of Low-Fat food items is greater than Regular Fat food items with a 5% chance
of error in judgement.
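
The Excel output for the unequal-variance (Welch) test can be reproduced from the summary statistics alone. The following is a sketch under that assumption; `welch_t_from_summary` is an illustrative name, and the inputs are the means, variances, and counts from the Low Fat v/s Regular Fat table.

```python
from math import sqrt


def welch_t_from_summary(mean1, var1, n1, mean2, var2, n2):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    a = var1 / n1  # squared standard error contribution of sample 1
    b = var2 / n2  # squared standard error contribution of sample 2
    t_stat = (mean1 - mean2) / sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t_stat, df


# Low Fat (n = 218) v/s Regular Fat (n = 123) item sales
t_fat, df_fat = welch_t_from_summary(
    2269.220485, 2991618.919, 218,
    1955.135233, 1827983.135, 123,
)
print(round(t_fat, 4), round(df_fat, 2))
```

The fractional degrees of freedom computed here are close to the whole-number df (305) shown in the Excel table.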

Supermarket Type 1 Sales v/s Other Outlet Types Sales

t-Test: Two-Sample Assuming Unequal Variances

                              Supermarket Type 1 Sales   Other Outlet Types Sales
Mean 2264.904274 1914.332028
Variance 1945409.208 3959511.932
Observations 235 106
Hypothesized Mean Difference 0
df 153
t Stat 1.641125209
P(T<=t) one-tail 0.051412758
t Critical one-tail 1.654873847
P(T<=t) two-tail 0.102825515
t Critical two-tail 1.975590315

Now here sample size (n1) = 235; (n2) = 106


Significance level = alpha = 0.05

Step 1: Development of null hypothesis


H0: Average sales in Supermarket Type 1 is greater than or equal to other Outlet Types
(u1 >= u2)

Step 2: Development of alternate hypothesis


Ha: Average sales in Supermarket Type 1 is less than other Outlet Types (u1 < u2)

Step 3: Type of test - One-tailed test (Lower Tail Test)


Step 4: Rejection Criteria: Reject H0 if p value < alpha
Step 5: Significance level - 5% (0.05)
Step 6: The Excel one-tail p-value is 0.0514. Because t Stat (1.6411) is positive, this is the upper-tail
area; the p-value for this lower tail test is 1 - 0.0514 = 0.9486.

Interpretation
Here, p-value (0.9486) > alpha (0.05), H0 cannot be rejected. From the hypothesis testing, there is
insufficient evidence that the Average sales in Supermarket Type 1 is less than other Outlet Types at
the 5% level of significance.

Tier 1 Outlets Sales v/s Other Outlet Location Sales

t-Test: Two-Sample Assuming Unequal Variances

Tier 1 Outlets Sales Other Outlet Location Sales


Mean 2010.498891 2209.661853
Variance 2655332.263 2563114.735
Observations 92 249
Hypothesized Mean Difference 0
df 160
t Stat 1.006490651
P(T<=t) one-tail 0.157849658
t Critical one-tail 1.654432901
P(T<=t) two-tail 0.315699317
t Critical two-tail 1.97490156

Now here sample size (n1) = 92; (n2) = 249

Step 1: Development of null hypothesis


H0: Average sales in Tier1 Outlets is less than or equal to other Outlet Location Types
(u1 <= u2)

Step 2: Development of alternate hypothesis


Ha: Average sales in Tier 1 Outlets is greater than other Outlet Location Types (u1 > u2)

Step 3: Type of test - One-tailed test (Upper Tail Test)


Step 4: Rejection Criteria: Reject H0 if p value < alpha
Step 5: Significance level - 5% (0.05)
Step 6: p-value for one tailed test is 0.1578

Interpretation
Here, p-value (0.1578) > alpha (0.05), H0 cannot be rejected. From the hypothesis testing, there is
insufficient evidence that the Average sales in Tier 1 Outlets is greater than other Outlet Location
Types at the 5% level of significance.

5) Anova

ANOVA is a statistical method used to determine whether the means of two or more groups differ
from one another significantly. ANOVA compares the means of various samples to examine the
influence of one or more factors.

Outlet Size & Sales

Anova: Single Factor

SUMMARY
Groups Count Sum Average Variance
Small Outlet Size Sales 129 248017.8 1922.619 1993521
Medium Outlet Sales 135 332941.9 2466.237 3379642
High Outlet Sales 77 154211.9 2002.752 1974761

ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 21827861 2 10913930 4.298803 0.014335 3.022441
Within Groups 8.58E+08 338 2538830

Total 8.8E+08 340

Alpha: Significance Level: 0.05

Hypothesis

H0: The average sales across different outlet sizes are same (u1 = u2 = u3)
Ha: The average sales of at least one outlet size category is different

Reject H0 if p-value <= alpha

Interpretation
As, p-value (0.0143) < 0.05, H0 is rejected. This means that the average sales of at least one outlet
size category is different.
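
The F statistic in the ANOVA table can be rebuilt from the SUMMARY rows (count, average, variance) without the raw data. This is a minimal sketch with a hypothetical function name; because the averages are rounded in the table, the result matches the Excel value only approximately.

```python
def one_way_anova_from_summary(groups):
    """groups: list of (count, mean, sample_variance). Returns (F, df_between, df_within)."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Between-group sum of squares: spread of the group means around the grand mean
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-group sum of squares: pooled from the sample variances
    ss_within = sum((n - 1) * v for n, _, v in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Small, Medium, and High outlet size groups from the SUMMARY table
outlet_size_groups = [
    (129, 1922.619, 1993521),
    (135, 2466.237, 3379642),
    (77, 2002.752, 1974761),
]
f_size, df_b, df_w = one_way_anova_from_summary(outlet_size_groups)
print(round(f_size, 4), df_b, df_w)
```

The statistic is then compared with F crit (3.0224 in the table) or, equivalently, its p-value is compared with alpha.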

Outlet Location & Sales

Anova: Single Factor

SUMMARY
Groups Count Sum Average Variance
Tier 1 Outlet Location Sales 92 184965.9 2010.499 2655332
Tier 2 Outlet Location Sales 120 269192.3 2243.269 1583974
Tier 3 Outlet Location Sales 129 281013.5 2178.4 3491390

ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 2926317 2 1463159 0.563892 0.569525 3.022441
Within Groups 8.77E+08 338 2594752

Total 8.8E+08 340

Hypothesis
H0: The average sales across different outlet location types are same (u1 = u2 = u3)
Ha: The average sales of at least one outlet location type is different

Reject H0 if F Calculated > F critical value

Interpretation
As, F calculated (0.5639) < F critical value (3.0224), H0 cannot be rejected. The p-value (0.5695) is
also greater than alpha (0.05). This means that there is no evidence that the average sales differ
across outlet location types.

Outlet Type & Sales

Anova: Single Factor

SUMMARY
Groups Count Sum Average Variance
Grocery Store Sales 39 15611.01 400.2824 75349.21
Supermarket Type 1 Sales 274 547863.5 1999.502 2103973
Supermarket Type 2 Sales 36 75367.23 2093.534 2385182
Supermarket Type 3 Sales 31 111941 3610.999 4986918

ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 1.8E+08 3 60117003 27.89456 2.62E-16 2.628646
Within Groups 8.1E+08 376 2155151

Total 9.91E+08 379

Hypothesis
H0: The average sales across different outlet types are same (u1 = u2 = u3 = u4)
Ha: The average sales of at least one outlet type is different

Reject H0 if F Calculated > F critical value

Interpretation
As, F calculated (27.8946) > F critical value (2.6286), H0 is rejected. This means that at least the
average sales across one category of outlet type are different.

6) Chi-Square Test

The chi-square test is conducted to check the association between two categorical variables.

Outlet Type & Item Fat

Count of Item_Fat_Content (columns: Item Fat Content)
Outlet Type Low Fat Regular Grand Total
Grocery Store 20 19 39
Supermarket Type1 154 81 235
Supermarket Type2 24 12 36
Supermarket Type3 20 11 31
Grand Total 218 123 341

Expected Values

Count of Item_Fat_Content (columns: Item Fat Content)
Outlet Type Low Fat Regular Grand Total
Grocery Store 24.93255132 14.0674 39
Supermarket Type1 150.2346041 84.7654 235
Supermarket Type2 23.01466276 12.9853 36
Supermarket Type3 19.81818182 11.1818 31
Grand Total 218 123 341

Hypothesis
H0: There is no association between Outlet Type and Item Fat Content
Ha: There is association between Outlet Type and Item Fat Content

Reject H0 if p value < 0.05

alpha = 0.05
p - value = 0.3782

Interpretation
As, p-value (0.3782) > alpha (0.05), H0 cannot be rejected. This means that there is no significant
association between Outlet Type and Item Fat Content.
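
The expected values and the chi-square statistic behind this result can be sketched as follows. The function name is illustrative, and the 5% critical value for 3 degrees of freedom (7.815) is taken from a standard chi-square table rather than from the report.

```python
def chi_square_statistic(observed):
    """Return (statistic, degrees of freedom, expected counts) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count = row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


# Rows: Grocery Store, Supermarket Type1, Type2, Type3; columns: Low Fat, Regular
observed_counts = [[20, 19], [154, 81], [24, 12], [20, 11]]
chi2_stat, chi2_df, expected_counts = chi_square_statistic(observed_counts)

# Assumption: standard table value for alpha = 0.05 and 3 degrees of freedom
CHI2_CRITICAL_DF3 = 7.815
print(round(chi2_stat, 4), chi2_df, chi2_stat > CHI2_CRITICAL_DF3)
```

A statistic below the critical value corresponds to a p-value above 0.05, in line with the decision not to reject H0.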

Outlet Location Type & Outlet Size

Count of Outlet Size (columns: Outlet Size)
Outlet Location Type Medium Small Grand Total
Tier 1 34 58 92
Tier 2 28 65 93
Tier 3 73 6 79
Grand Total 135 129 264

Expected Values

Count of Outlet Size (columns: Outlet Size)
Outlet Location Type Medium Small Grand Total
Tier 1 47.04545455 44.95454545 92
Tier 2 47.55681818 45.44318182 93
Tier 3 40.39772727 38.60227273 79
Grand Total 135 129 264

Hypothesis
H0: There is no association between Outlet Location Type and Outlet Size
Ha: There is association between Outlet Location Type and Outlet Size

Reject H0 if p value < 0.05

alpha = 0.05
p - value = 1.3 x 10^-17

Interpretation
As, p-value (1.3 x 10^-17) < alpha (0.05), H0 is rejected. This means that there is a significant
association between Outlet Location Type and Outlet Size.

Item Type & Item fat Content

Count of Item Fat Content (columns: Item Fat Content)
Item Type Low Fat Regular Grand Total
Canned 14 17 31
Frozen Foods 15 17 32
Fruits and Vegetables 21 32 53
Snack Foods 26 13 39
Grand Total 76 79 155

Expected Values
Count of Item_Fat_Content (columns: Item Fat Content)
Item Type Low Fat Regular Grand Total
Canned 15.2 15.8 31
Frozen Foods 15.69032258 16.30968 32
Fruits and Vegetables 25.98709677 27.0129 53
Snack Foods 19.12258065 19.87742 39
Grand Total 76 79 155

Hypothesis
H0: There is no association between Item Type and Item Fat Content
Ha: There is association between Item Type and Item Fat Content

Reject H0 if p value < 0.05

alpha = 0.05
p - value = 0.0727

Interpretation
As, p-value (0.0727) > alpha (0.05), H0 cannot be rejected. This means that there is no significant
association between Item Type and Item Fat Content.

7) Correlation

A statistical measure called correlation shows how much two or more variables fluctuate in
connection to one another.
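
For reference, the Pearson correlation coefficient reported by Excel can be computed as in the sketch below. The toy inputs in the test lines are hypothetical, and the strength labels follow the bands used in the interpretations of this section; the 0.2 to 0.5 band is not defined in the report and is labelled "weak" here purely as an assumption.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_strength(r):
    """Label the magnitude of r using the bands applied in this report."""
    magnitude = abs(r)
    if magnitude < 0.2:
        return "very weak"
    if magnitude < 0.5:
        return "weak"  # assumption: this band is not defined in the report
    if magnitude <= 0.7:
        return "moderate"
    return "strong"


# Labels for the coefficients reported in this section
print(correlation_strength(-0.1244), correlation_strength(0.5514), correlation_strength(0.0024))
```

The sign of r gives the direction of the relationship, while the label depends only on its magnitude.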

Item Visibility and Item Outlet Sales

Step 1: Variables
X: Item Visibility
Y: Item Outlet Sales

Step 2: Correlation Values


                   Item Visibility   Item Outlet Sales
Item Visibility    1
Item Outlet Sales  -0.12439928       1

Correlation: -0.1244

Interpretation
Both the variables are negatively correlated. Since the magnitude of the correlation value (0.1244) is
between 0.0 and 0.2, there is very weak to negligible correlation between Item Visibility and Item
Outlet Sales. This means that as Item Visibility increases, Item Outlet Sales tends to decrease only slightly.

Item MRP and Item Outlet Sales

Step 1: Variables
X: Item MRP
Y: Item Outlet Sales

Step 2: Correlation Values

Item MRP Item Outlet Sales


Item_MRP 1
Item Outlet Sales 0.551435693 1

Correlation = 0.5514

Interpretation
Both the variables are positively correlated. Since the correlation value (0.5514) is between 0.5 and
0.7, there is moderate correlation between Item MRP and Item Outlet Sales. This means that as Item
MRP increases, Item Outlet Sales will also increase.

Item Weight & Item Outlet Sales

Objective: To check the relationship between the values of Item Weight and Item Outlet Sales.

Step 1: Variables
X: Item Weight
Y: Item Outlet Sales

Step 2: Correlation Values

Item Weight Item Outlet Sales


Item Weight 1
Item Outlet Sales 0.002445 1

Correlation: 0.0024

Interpretation: Both the variables are positively correlated. Since the correlation value (0.0024) is
between 0.0 and 0.2, there is very weak to negligible correlation between Item Weight and Item
Outlet Sales.

[Scatter plot: Item_Outlet_Sales (y-axis, 0 to 15000) against Item Weight (x-axis, 0 to 25), with trend line y = 148.02x]

Interpretation:
0 is the baseline Item Outlet Sales when the Item Weight is zero.

8) Simple Linear Regression

When attempting to predict a continuous dependent variable from one or more independent
factors, regression analysis is used. Simple linear regression uses a single independent factor.

Item Visibility & Item Outlet Sales

Significant F- Value

Regression Statistics
Multiple R 0.124399
R Square 0.015475
Adjusted R Square 0.012571
Standard Error 1598.612
Observations 341

Multiple R: 0.124399
Interpretation: As the value of Multiple R is below 0.2, it indicates a very weak correlation
between Item Outlet Sales and Item Visibility.

R Square: 0.015475
Interpretation: It indicates that 1.55% of the variance in Item Outlet Sales can be explained by Item
Visibility.

ANOVA
df SS MS F Significance F
Regression 1 13617423 13617423 5.328547 0.02158017
Residual 339 8.66E+08 2555560
Total 340 8.8E+08

Hypothesis
H0: Item Outlet Sales does not have a linear relationship with Item Visibility
Ha: Item Outlet Sales has a linear relationship with Item Visibility

Level of Significance alpha = 5% or 0.05

Reject H0 if p value < 0.05

Interpretation
As the p-value (0.0216) < alpha (0.05), H0 is rejected. This means that Item Outlet Sales has a linear
relationship with Item Visibility, at the 5% level of significance.
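
The F value in the ANOVA table is the ratio of the regression mean square to the residual mean square. In a simple linear regression it also equals the square of the slope's t statistic. The sketch below checks both relations using the figures reported for Item Visibility; the small difference between the two results comes from rounding in the reported values.

```python
ss_regression = 13617423.0   # reported regression sum of squares
df_regression = 1            # reported regression degrees of freedom
ms_residual = 2555560.0      # reported residual mean square
t_stat_slope = -2.30836      # reported t Stat for Item Visibility

ms_regression = ss_regression / df_regression
f_stat = ms_regression / ms_residual

# With a single predictor, F equals the squared t statistic of the slope
f_from_t = t_stat_slope ** 2

print(round(f_stat, 4), round(f_from_t, 4))
```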

Co-efficient Table
                   Coefficients   Standard Error   t Stat     P-value     Lower 95%     Upper 95%   Lower 95.0%   Upper 95.0%
Intercept          2418.225       142.8489         16.92855   5.06E-47    2137.24352    2699.2      2137.2        2699
Item Visibility    -4185.49       1813.186         -2.30836   0.02158     -7752.0074    -619        -7752         -619

Regression Equation
Y = (-4185.49*X) + 2418.225
Item Outlet Sales = (-4185.49*Item Visibility) + 2418.225
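
As an illustration of how the regression equation is used, the sketch below wraps it in a function. The function name and the example visibility value of 0.1 are illustrative only.

```python
INTERCEPT = 2418.225   # reported intercept
SLOPE = -4185.49       # reported coefficient of Item Visibility

def predict_sales(item_visibility):
    """Predicted Item Outlet Sales from the fitted line for a given Item Visibility."""
    return SLOPE * item_visibility + INTERCEPT

print(predict_sales(0.0))            # the intercept: prediction at zero visibility
print(round(predict_sales(0.1), 3))  # prediction at an illustrative visibility of 0.1
```

Since R Square is only about 1.5%, individual predictions from this line should be treated as rough.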

Intercept Hypothesis

H0: The intercept is zero


Ha: The intercept is non-zero

Reject H0 if p-value corresponding to Intercept < alpha


p-value = 5.06E-47
alpha = 0.05

Interpretation
Since the p-value corresponding to Intercept (5.06E-47) < alpha (0.05), H0 is rejected. This means
that the intercept is non-zero, i.e., the regression line does not pass through the origin. The intercept
(2418.225) is the predicted Item Outlet Sales when Item Visibility is zero; a significant intercept does
not by itself indicate outliers in the data.
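
The t Stat and the 95% limits in the coefficient table follow from the coefficient and its standard error. The sketch below reproduces them for the intercept. The critical value 1.967 is an assumed, rounded two-sided 5% t value for 339 degrees of freedom; it is not printed in the report.

```python
def t_stat_and_interval(coefficient, standard_error, t_critical):
    """Return (t statistic, lower limit, upper limit) for a regression coefficient."""
    t_stat = coefficient / standard_error
    margin = t_critical * standard_error
    return t_stat, coefficient - margin, coefficient + margin

# Assumption: approximate two-sided 5% critical t value for 339 degrees of freedom
T_CRITICAL = 1.967

t_stat, lower, upper = t_stat_and_interval(2418.225, 142.8489, T_CRITICAL)
print(round(t_stat, 3), round(lower, 1), round(upper, 1))
```

Because the interval does not contain zero, the conclusion matches the p-value test: the intercept is significantly different from zero.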

Assumption Checking

Linearity

[Figure: Scatter plot of Item Outlet Sales (vertical axis, 0 to 15000) against Item Visibility (horizontal axis, 0 to 0.3), with fitted line y = -4185.5x + 2418.2]

Interpretation: The points are more or less along the straight line. This implies that the relationship
between Item Visibility and Item Outlet Sales is linear in the parameters m (slope) and c (intercept).

Normality

[Figure: Normal Probability Plot of Item_Outlet_Sales (vertical axis) against Sample Percentile (horizontal axis)]

Interpretation: The Normal Probability Plot indicates that the points lie along the 45-degree line. This
implies that the error in estimating Item Outlet Sales, i.e., ε, and hence Item Outlet Sales, follows a
normal distribution.
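
One common way to examine normality of the errors numerically is a normal quantile comparison (Q-Q pairs) of the residuals. This is a swapped-in illustration, not the procedure used to produce the plot above. The residual values are hypothetical, and the plotting positions (i + 0.5) / n are one conventional choice among several.

```python
from statistics import NormalDist

def normal_qq_pairs(values):
    """Pair each sorted value with the standard normal quantile for its rank."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i + 0.5) / n keep the probabilities strictly inside (0, 1)
    quantiles = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(quantiles, ordered))

# Hypothetical residuals (not from the dataset)
residuals = [-310.0, 120.0, -45.0, 260.0, -80.0, 15.0, 40.0]
pairs = normal_qq_pairs(residuals)
```

If the residuals are roughly normal, plotting these pairs gives points close to a straight line.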

Homoscedasticity

[Figure: Item_Visibility Residual Plot, showing Residuals (vertical axis) against Item_Visibility (horizontal axis, 0 to 0.3)]

Interpretation: The points in the Item Visibility Residual plot are randomly placed without any
pattern. This implies that the error in estimating Item Outlet Sales, i.e., ε, has a constant variance
across all the values of Item Visibility.
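
A residual is the observed value minus the value predicted by the fitted line. The sketch below computes residuals and adds a rough, informal spread comparison between the lower and upper halves of the predictor. The helper names and the toy data are hypothetical, and the ratio is an illustrative check and not a formal test of constant variance.

```python
from statistics import pvariance

def residuals(x, y, slope, intercept):
    """Observed minus predicted values for the line y = slope * x + intercept."""
    return [b - (slope * a + intercept) for a, b in zip(x, y)]

def residual_spread_ratio(x, resid):
    """Variance of residuals in the upper half of x divided by that in the lower half."""
    paired = sorted(zip(x, resid))
    half = len(paired) // 2
    low = [r for _, r in paired[:half]]
    high = [r for _, r in paired[half:]]
    return pvariance(high) / pvariance(low)

# Hypothetical observations, evaluated against the fitted line reported above
visibility = [0.02, 0.05, 0.08, 0.12, 0.18, 0.25]
sales = [2500.0, 2100.0, 2300.0, 1700.0, 1900.0, 1200.0]
resid = residuals(visibility, sales, -4185.49, 2418.225)
ratio = residual_spread_ratio(visibility, resid)
```

A ratio far from 1 would hint that the spread of the errors changes with the predictor.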

Independence

Interpretation: The errors, across all the values of Item Visibility, are independent of each other.
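
The report states independence without showing a check. One commonly used numerical check is the Durbin-Watson statistic, sketched below as an illustration only. It is not part of the original analysis, and it is meaningful only if the rows have a natural order. Values near 2 suggest no first-order autocorrelation among the residuals.

```python
def durbin_watson(resid):
    """Durbin-Watson statistic: squared successive differences over squared residuals."""
    numerator = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    denominator = sum(e ** 2 for e in resid)
    return numerator / denominator

# Hypothetical residuals that alternate in sign (negative autocorrelation)
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
# Hypothetical residuals that never change (strong positive autocorrelation)
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))
```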

Item MRP & Item Outlet Sales

Regression Statistics
Multiple R           0.551436
R Square             0.304081
Adjusted R Square    0.302028
Standard Error       1344.03
Observations         341

Multiple R (0.551436)
Interpretation: As the value of Multiple R is greater than 0.5 but less than 0.7, it indicates a moderate
correlation between Item Outlet Sales and Item MRP.

R Square (0.304081)
Interpretation: It indicates that 30.4% of the variance in Item Outlet Sales can be explained by Item
MRP.

ANOVA
              df     SS             MS       F        Significance F
Regression    1      267577090.9    3E+08    148.1    1.60628E-28
Residual      339    612375309.3    2E+06
Total         340    879952400.2

Hypothesis
H0: Item Outlet Sales does not have a linear relationship with Item MRP
Ha: Item Outlet Sales has a linear relationship with Item MRP

Level of Significance alpha = 5% or 0.05

Reject H0 if p value < 0.05

Interpretation: As the p-value (1.60628E-28) < alpha (0.05), H0 is rejected. This means that Item
Outlet Sales has a linear relationship with Item MRP, at the 5% level of significance.
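
The ANOVA sums of squares tie the regression output together. The sketch below recomputes R Square, the F value and the Standard Error for the Item MRP model from the reported sums of squares and residual degrees of freedom. It uses only standard regression identities.

```python
import math

ss_regression = 267577090.9   # reported regression sum of squares
ss_residual = 612375309.3     # reported residual sum of squares
df_residual = 339             # reported residual degrees of freedom

ss_total = ss_regression + ss_residual

# Share of the total variation explained by the regression
r_square = ss_regression / ss_total

# Mean squares; the regression has 1 degree of freedom
ms_regression = ss_regression / 1
ms_residual = ss_residual / df_residual

f_stat = ms_regression / ms_residual
standard_error = math.sqrt(ms_residual)

print(round(r_square, 6), round(f_stat, 1), round(standard_error, 2))
```

The results agree with the reported R Square (0.304081), F (148.1) and Standard Error (1344.03).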

Co-efficient Table
             Coefficients   Standard Error   t Stat    P-value   Lower 95%       Upper 95%   Lower 95.0%   Upper 95.0%
Intercept    74.737         185.8453074      0.4021    0.688     -290.8182075    440.2922    -290.818      440.2922
Item_MRP     14.67632       1.205873101      12.171    2E-28     12.30438097     17.04825    12.30438      17.04825

Regression Equation
Y = 14.67632*X + 74.737
Item Outlet Sales = 14.67632*Item MRP + 74.737

Intercept Hypothesis

H0: The intercept is zero


Ha: The intercept is non-zero

Reject H0 if p-value corresponding to Intercept < alpha


p-value = 0.688
alpha = 0.05

Interpretation
Since the p-value corresponding to Intercept (0.688) > alpha (0.05), H0 cannot be rejected. This
means that the intercept is not significantly different from zero, i.e., the data are consistent with a
regression line that passes through the origin.

Assumption Checking

Linearity

[Figure: Scatter plot of Item Outlet Sales (vertical axis, 0 to 15000) against Item MRP (horizontal axis, 0 to 300), with fitted line y = 14.676x + 74.737]

Interpretation: The points are more or less along the straight line. This implies that the relationship
between Item MRP and Item Outlet Sales is linear in the parameters m (slope) and c (intercept).

Normality

[Figure: Normal Probability Plot of Item_Outlet_Sales (vertical axis) against Sample Percentile (horizontal axis)]

Interpretation: The Normal Probability Plot indicates that the points lie along the 45-degree line. This
implies that the error in estimating Item Outlet Sales, i.e., ε, and hence Item Outlet Sales, follows a
normal distribution.

Homoscedasticity

[Figure: Item_MRP Residual Plot, showing Residuals (vertical axis) against Item_MRP (horizontal axis, 0 to 300)]

Interpretation: The points in the Item MRP Residual plot are randomly placed without any
pattern. This implies that the error in estimating Item Outlet Sales, i.e., ε, has a constant variance
across all the values of Item MRP.

Independence

Interpretation: The errors, across all the values of Item MRP, are independent of each other.

CONCLUSION

The following factors could be considered by outlets in the retail domain to accelerate their sales:

Item Fat Content: The average sales of Low-Fat food items are greater than those of Regular Fat food
items. To accelerate the sales of Regular Fat food items, companies should focus on marketing the
feel, taste and other aspects of these products.

Outlet Type: The average sales in Supermarket Type 1 are greater than or equal to those in the other
Outlet Types (Grocery, Supermarket Type 2 & 3). To boost sales in outlet types other than Supermarket
Type 1, the outlets should adopt Competitive Pricing Strategies.

Outlet Location Type: The average sales in the Tier 1 Outlet Location Type are less than or equal to
those in the other Outlet Location Types. To accelerate the sales in these outlets, they should position
themselves as easy-to-access outlets where quality products are available at affordable rates. They
could also adopt Home Delivery services and provide free delivery to customers who have a
membership in their outlets.
