SFB - CIA 3 - Report - FINAL

STATISTICS FOR BUSINESS
CIA 3 – DATA ANALYSIS EXERCISE
A Report submitted in partial fulfillment of the requirements for the degree of

Master of Business Administration
By
ARITRA GHOSH (2327106)
DEBANIK HAZRA (2327113)
DIANA JOHN (2327114)
SHINY SHAH (2327150)
GARIKAPATI SANDEEP CHOWDARY (2327120)
Under the Guidance of
Dr. RAJASHREE KAMATH K
MBA PROGRAMME
SCHOOL OF BUSINESS AND MANAGEMENT
CHRIST (DEEMED TO BE UNIVERSITY), BANGALORE
JULY 2023
INDEX
S. No.  Content
1   Introduction
2   Research Methodology
3   Business Problem
4   Data Dictionary
5   Sample Size Determination
6   Point & Interval Estimation
7   One Sample t-test
8   One Sample Test for Proportion
9   Two Sample t-test
10  ANOVA
11  Chi-square Test
12  Correlation
13  Simple Linear Regression
14  Conclusion
INTRODUCTION
Sales prediction in the retail domain is the focus of this study. For this purpose, a retail sales dataset covering outlets of varied types and sizes across different locations is used.
This dataset contains the variables relevant to the study: Item Weight, Item Fat Content, Item Visibility, Item Type, Item MRP, Outlet Size, Outlet Location Type, Outlet Type, and Item Outlet Sales.
Point & Interval Estimation, One Sample t-test, One Sample Test for Proportion, Two Sample t-test, ANOVA, Chi-square test, Correlation, and Simple Linear Regression are the statistical methods used in this work.
RESEARCH METHODOLOGY
Secondary data is used for the purpose of this study.
Dataset URL:
https://www.kaggle.com/datasets/adarshkumarjha/big-mart-sales-prediction
This dataset was utilized to forecast the demand in the retail domain.
BUSINESS PROBLEM
Demand Forecasting: The study focuses on forecasting demand in the retail domain by analyzing how product attributes (Item Weight, Item Fat Content, Item Visibility, Item Type, and Item MRP) and outlet attributes (Outlet Size, Outlet Location Type, and Outlet Type) relate to Item Outlet Sales.
DATA DICTIONARY
Variable                    Description                                                            Data Type      Sub-classification
Item Identifier             Unique identifier for each item                                        Qualitative    Nominal
Item Weight                 Weight of the product                                                  Quantitative   Continuous (Ratio)
Item Fat Content            Fat content of the product                                             Qualitative    Nominal
Item Visibility             Percentage of total display area occupied by the product on shelves    Quantitative   Continuous (Ratio)
Item Type                   Category/type of the product                                           Qualitative    Nominal
Item MRP                    Maximum Retail Price of the product                                    Quantitative   Continuous (Ratio)
Outlet Identifier           Unique identifier for each outlet                                      Qualitative    Nominal
Outlet Establishment Year   Year in which the outlet was established                               Quantitative   Discrete
Outlet Size                 Size of the outlet (e.g., Small, Medium, High)                         Qualitative    Ordinal
Outlet Location Type        Type of location where the outlet is situated (e.g., Tier 1, Tier 2, Tier 3)   Qualitative    Ordinal
Outlet Type                 Type of outlet (e.g., Supermarket Type1, Grocery Store)                Qualitative    Nominal
Item Outlet Sales           Sales of the product in the outlet                                     Quantitative   Continuous (Ratio)
SAMPLE SIZE DETERMINATION
DESCRIPTIVE STATISTICS FOR THE LEADER VARIABLE (ITEM OUTLET SALES)

Mean 2181.288914
Standard Error 18.48459547
Median 1794.331
Mode 958.752
Standard Deviation 1706.499616
Sample Variance 2912140.938
Kurtosis 1.615876681
Skewness 1.177530603
Range 13053.6748
Minimum 33.29
Maximum 13086.9648
Sum 18591125.41
Count 8523
Confidence Level (95.0%) 36.23428767
From a population of 8,523 records, the optimum sample size for the study is computed below. Here:
Mean = 2181.2889
Assumed Mean = 2000
Margin of Error, k = 181
Significance level = Alpha = 0.05
Sample size, n = ((Z alpha/2 * S.D.)/k)^2

n = ((Z 0.05/2 * 1706.4996)/181)^2
n = ((1.96 * 1706.4996)/181)^2
n = 341.4816
Hence, optimum sample size for the study = 341
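The sample size formula above can be checked with a short Python sketch. The function name `sample_size` and its parameter names are illustrative, not part of the original analysis. It uses the exact normal quantile rather than the rounded 1.96, so the result differs slightly in the second decimal place.

```python
from statistics import NormalDist


def sample_size(std_dev: float, margin: float, alpha: float = 0.05) -> float:
    """Return n = ((z_{alpha/2} * std_dev) / margin) ** 2, unrounded."""
    # Two-sided critical value of the standard normal (about 1.96 for alpha = 0.05).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ((z * std_dev) / margin) ** 2


# Values taken from the descriptive statistics of Item Outlet Sales.
n = sample_size(std_dev=1706.4996, margin=181)
print(f"Required sample size: {n:.2f} (reported as {int(n)})")
```

A smaller margin of error or a larger standard deviation increases the required sample size quadratically.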
STATISTICAL TOOLS USED FOR DATA ANALYSIS
• Point & Interval Estimation
• One Sample t-test
• One Sample Test for Proportion
• Two Sample t-test
• ANOVA
• Chi-square test
• Correlation
• Simple Linear Regression
1) Point & Interval Estimation
The sample mean and sample proportion are the point estimators of the population mean and population proportion, respectively. Interval estimation uses sample data to determine an interval of plausible values for an unknown population parameter.
The 100(1 - alpha)% confidence interval for μ is given by: Mean +- Margin of Error
Continuous Variables
DESCRIPTIVE STATISTICS ON ALL CONTINUOUS VARIABLES
Statistic                   Item Weight     Item Visibility   Item MRP        Item Outlet Sales
Mean                        11.03036765     0.062668024       141.8061314     2155.928737
Standard Error              0.336653637     0.00258931        3.273337088     87.11901211
Median                      11              0.052836076       142.6496        1780.3492
Mode                        0               0                 110.157         1414.1592
Standard Deviation          6.207586347     0.047814676       60.44604925     1608.755822
Sample Variance             38.53412826     0.002286243       3653.72487      2588095.295
Kurtosis                    -0.881443285    1.178610751       -0.699547354    2.48453533
Skewness                    -0.297725837    1.061660226       0.173488172     1.294612023
Range                       21              0.27321283        234.7984        10034.9376
Minimum                     0               0                 31.89           37.9506
Maximum                     21              0.27321283        266.6884        10072.8882
Sum                         3750.325        21.36979602       48355.8908      735171.6994
Count                       341             341               341             341
Confidence Level (95.0%)    0.662193146     0.005093084       6.438541888     171.3601116
Upper Level                 11.69256079     0.067761107       148.2446733     2327.288849
Lower Level                 10.3681745      0.05757494        135.3675895     1984.568626
Item Weight
Point Estimator = Sample Mean = 11.0304

Interval = Mean +- Margin of Error
Interval = (11.0304 - 0.6622, 11.0304 + 0.6622)
Interval = (10.3682, 11.6926)
Item Visibility
Point Estimator = 0.0627

Interval = (0.0627 - 0.0051, 0.0627 + 0.0051)
Interval = (0.0576, 0.0678)
Item MRP
Point Estimator = 141.8061
Interval = (141.8061 - 6.4385, 141.8061 + 6.4385)
Interval = (135.3676, 148.2446)
Item Outlet Sales
Point Estimator = 2155.9287
Interval = (2155.9287 - 171.3601, 2155.9287 + 171.3601)
Interval = (1984.5686, 2327.2888)
Interpretation: The point and interval estimates calculated manually match the Excel output.
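As a minimal sketch, the interval for a mean can be reproduced from the sample mean, the standard error, and a critical t value. The helper name `mean_confidence_interval` is illustrative. The critical value 1.966965734 is an assumption here: it is the two-tail t critical value that Excel reports elsewhere in this report, and it reproduces the Item Weight margin to four decimals.

```python
def mean_confidence_interval(mean: float, std_error: float, t_crit: float) -> tuple[float, float]:
    """Return (lower, upper) limits of mean +- t_crit * std_error."""
    margin = t_crit * std_error
    return mean - margin, mean + margin


# Assumption: two-tail critical t value at alpha = 0.05, as printed by Excel.
T_CRIT = 1.966965734

# Item Weight: sample mean and standard error from the descriptive statistics table.
lower, upper = mean_confidence_interval(11.03036765, 0.336653637, T_CRIT)
print(f"Item Weight 95% CI: ({lower:.4f}, {upper:.4f})")
```

The same call with the mean and standard error of Item Visibility, Item MRP, or Item Outlet Sales gives the other three intervals.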
Categorical Variables
Item Fat Content
Item Fat Content    Count of Item Fat Content    Sample Proportion
Low Fat             218                          0.639296188
Regular             123                          0.360703812
Grand Total         341                          1
Margin of Error     0.050968905
Upper Level         0.690265093
Lower Level         0.588327282
Point Estimator = Sample Proportion (Low Fat) = 0.6393
Interval = p bar +- Z alpha/2 * sqrt(p bar*(1 - p bar)/n)
Alpha = Significance level = 0.05
Interval = 0.6393 +- Z 0.05/2 * sqrt(0.6393*0.3607/341)
Interval = (0.5883, 0.6903)
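The proportion interval can be sketched in the same way. The function name `proportion_confidence_interval` is illustrative. It assumes the normal approximation used in the formula above, which is reasonable for a sample of 341.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Return (lower, upper) for p_bar +- z_{alpha/2} * sqrt(p_bar * (1 - p_bar) / n)."""
    p_bar = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - margin, p_bar + margin


# Low Fat items: 218 out of 341 sampled records.
lower, upper = proportion_confidence_interval(218, 341)
print(f"Low Fat proportion 95% CI: ({lower:.4f}, {upper:.4f})")
```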
Item Type
Item Type                Count of Item Type    Sample Proportion
Baking Goods             26                    0.076246334
Breads                   13                    0.038123167
Breakfast                3                     0.008797654
Canned                   31                    0.090909091
Dairy                    20                    0.058651026
Frozen Foods             32                    0.093841642
Fruits and Vegetables    53                    0.15542522
Hard Drinks              10                    0.029325513
Health and Hygiene       28                    0.082111437
Household                43                    0.126099707
Meat                     13                    0.038123167
Others                   6                     0.017595308
Seafood                  3                     0.008797654
Snack Foods              39                    0.114369501
Soft Drinks              18                    0.052785924
Starchy Foods            3                     0.008797654
Grand Total              341                   1

Point Estimator (Fruits and Vegetables) = 0.1554
Interval = 0.1554 +- Z 0.05/2 * sqrt(0.1554*0.8446/341)
Interval = (0.1170, 0.1939)
Outlet Location Type
Outlet Location Type    Count of Outlet Location Type    Sample Proportion
Tier 1                  92                               0.269794721
Tier 2                  120                              0.351906158
Tier 3                  129                              0.37829912
Grand Total             341                              1

Point Estimator (Tier 3) = 0.3783
Interval = 0.3783 +- Z alpha/2 * sqrt(0.3783*0.6217/341)
Interval = (0.3268, 0.4298)
Outlet Type
Outlet Type          Count of Outlet Type    Sample Proportion
Grocery Store        39                      0.114369501
Supermarket Type1    235                     0.68914956
Grand Total          341                     1

Point Estimator (Supermarket Type1) = 0.6891
Interval = 0.6891 +- Z alpha/2 * sqrt(0.6891*0.3109/341)
Interval = (0.6400, 0.7383)
Outlet Size
Outlet Size    Count of Outlet_Size    Sample Proportion
High           77                      0.225806452
Medium         135                     0.395894428
Small          129                     0.37829912
Grand Total    341                     1

Point Estimator (Medium) = 0.3959
Interval = 0.3959 +- Z 0.05/2 * sqrt(0.3959*0.6041/341)
Interval = (0.3440, 0.4478)
Interpretation: The point and interval estimates calculated manually match the Excel output.
2) One sample t-Test
In statistics, the process of hypothesis testing involves putting an analyst's presumption about a
population parameter to the test. The type of data used and the purpose of the study will determine the
methodology the analyst uses.
Using sample data, hypothesis testing is done to determine whether a claim is plausible. These data
could originate from a broader population or a process that creates data. In the descriptions that
follow, "population" will be used to refer to both situations.
Type I Error is the rejection of a null hypothesis that is true, and
Type II Error is the failure to reject a null hypothesis that is false.

Conclusion    Population Condition: H0 True    Population Condition: Ha True (H0 False)
Accept H0     Correct (1 - alpha)              Type II Error (beta)
Reject H0     Type I Error (alpha)             Correct (1 - beta) (Power of the test)

Practical Question -> Statistical Question -> Test -> Statistical Decision -> Practical Solution
Step 1: Development of Null Hypothesis (H0)

H0 = --------------------
Step 2: Development of alternative Hypothesis (Ha)
Ha = ---------------------
Step 3: Type of test (One-tailed or Two-tailed)
Step 4: Significance level (5% or 1%)
Step 5: Critical Value (Table Value based on the level of significance)
Step 6: Calculation of test statistic (Use of formula)
Step 7: Decision Based on comparison of Test Statistic and critical value
Item Outlet Sales
t-Test: Two-Sample Assuming Unequal Variances
Item Outlet Sales

Mean 2155.928737
Variance 2588095.295
Observations 341
Hypothesized Mean Difference 0
df 341
t Stat 24.74693738
P(T<=t) one-tail 2.4061E-78
t Critical one-tail 1.649347611
P(T<=t) two-tail 4.81221E-78
t Critical two-tail 1.966965734
Step 1: Development of null hypothesis
H0: Average Item Outlet Sales is less than or equal to 2000 (u <= 2000)
Step 2: Development of alternate hypothesis
Ha: Average Item Outlet Sales is greater than 2000 (u > 2000)
Step 3: Type of test - One-tailed test (Upper Tail Test)
Step 4: Rejection Criteria: Reject H0 if Test Statistic > t alpha, (n-1)
Step 5: Significance level - 5% (0.05)
Step 6: Critical Value for one tailed test is 1.65
Step 7: Test Statistic
T.S. = (Xbar - u0)/Standard Error
T.S. = (2155.9287 - 2000)/87.119
T.S. = 1.789835921
Interpretation
Since the Test Statistic (1.7898) is greater than t 0.05, 340 (1.65), we reject H0 at the 5% level of significance. This means that the Average Item Outlet Sales is greater than 2000, with a 5% chance of error in judgement.
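The test statistic and decision for Item Outlet Sales can be sketched as follows. The function name is illustrative, and the critical value is the one-tail value printed in the Excel output above rather than one computed here.

```python
def one_sample_t_statistic(sample_mean: float, hypothesized_mean: float, std_error: float) -> float:
    """Return (sample mean - hypothesized mean) / standard error."""
    return (sample_mean - hypothesized_mean) / std_error


# Item Outlet Sales: H0: u <= 2000 against Ha: u > 2000 (upper-tail test).
t_stat = one_sample_t_statistic(2155.928737, 2000, 87.11901211)

# One-tail critical value at alpha = 0.05, taken from the Excel output.
T_CRITICAL_UPPER = 1.649347611
reject_h0 = t_stat > T_CRITICAL_UPPER
print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```

For a lower-tail test such as the Item MRP and Item Weight tests, the comparison becomes `t_stat < -T_CRITICAL_UPPER`.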
Item MRP
Item MRP
Mean 141.8061314
Variance 3653.72487
Observations 341
df 341
t Stat 43.32157904
P(T<=t) one-tail 8.9134E-141
P(T<=t) two-tail 1.7827E-140

H0: Average Item MRP is greater than or equal to 150 (u >= 150)

Ha: Average Item MRP is less than 150 (u < 150)
Step 3: Type of test - One-tailed test (Lower Tail Test)

Step 4: Rejection Criteria: Reject H0 if Test Statistic < -t alpha, (n-1)

T.S. = (141.8061 - 150)/3.2733
T.S. = -2.503215649
Interpretation
Since, Test Statistic (-2.5032) is less than -t 0.05, 340 (-1.65), we reject H0 at the 5% level of
significance. This means that the Average Item MRP is less than 150 with a 5% chance of error in
judgement.
Item Weight
t-Test: Two-Sample Assuming Unequal

Variances
Item Weight
Mean 11.0304
Variance 38.5341
Observations 341
df 339
t Stat 32.7647
P(T<=t) one-tail 2E-107
P(T<=t) two-tail 4E-107
H0: Average Item Weight is greater than or equal to 15 (u >= 15)

Ha: Average Item Weight is less than 15 (u < 15)

Step 4: Rejection Criteria: Reject H0 if Test Statistic < -t alpha, (n-1)

T.S. = (11.0304 - 15)/0.3367
T.S. = -11.79144354
Interpretation
Since, Test Statistic (-11.7914) is less than -t 0.05,340 (-1.65), we reject H0 at the 5% level of
significance. This means that the Average Item Weight is less than 15, with a 5% chance of error in
judgement.
Item Visibility
Item Visibility
Mean 0.0627
Variance 0.0023
Observations 341
df 340
t Stat 24.203
P(T<=t) one-tail 3E-76
P(T<=t) two-tail 6E-76

H0: Average Item Visibility is less than or equal to 0.06 (u <= 0.06)

Ha: Average Item Visibility is greater than 0.06 (u > 0.06)

Step 4: Rejection Criteria: Reject H0 if Test Statistic > t alpha, (n-1)

T.S. = (0.0627 - 0.06)/0.0026
T.S. = 1.030399472
Interpretation
Since the Test Statistic (1.0304) is less than t 0.05, 340 (1.65), we fail to reject H0 at the 5% level of significance. This means there is insufficient evidence to conclude that the Average Item Visibility is greater than 0.06.
3) One Sample Test for Proportion
Item Type
Item Type                Count of Item Type    Sample Proportion
Baking Goods             26                    0.076246334
Breads                   13                    0.038123167
Breakfast                3                     0.008797654
Canned                   31                    0.090909091
Dairy                    20                    0.058651026
Frozen Foods             32                    0.093841642
Fruits and Vegetables    53                    0.15542522
Hard Drinks              10                    0.029325513
Health and Hygiene       28                    0.082111437
Household                43                    0.126099707
Meat                     13                    0.038123167
Others                   6                     0.017595308
Seafood                  3                     0.008797654
Snack Foods              39                    0.114369501
Soft Drinks              18                    0.052785924
Starchy Foods            3                     0.008797654
Grand Total              341                   1

H0: Proportion of Fruits & Vegetables item types is greater than or equal to 0.2 (p >= 0.2)

Ha: Proportion of Fruits & Vegetables item types is less than 0.2 (p < 0.2)

Step 4: Rejection Criteria: Reject H0 if Test Statistic < -z alpha
T.S. = (p bar - p0)/sqrt[p0*(1 - p0)/n]
T.S. = (0.1554 - 0.2)/sqrt[0.2*0.8/341]
T.S. = -2.057815372
Interpretation
Since the Test Statistic (-2.0578) is less than -z 0.05 (-1.645), we reject H0 at the 5% level of significance. This means that the proportion of Fruits & Vegetables item types is less than 0.2, with a 5% chance of error in judgement.
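A minimal sketch of the one-sample test for a proportion, assuming the normal approximation. The function name `proportion_z_test` is illustrative. It returns the test statistic together with the lower-tail probability, which is the p-value for a lower-tail alternative such as p < 0.2.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes: int, n: int, p0: float) -> tuple[float, float]:
    """Return (z, lower-tail p-value) for H0: p = p0 using the normal approximation."""
    p_bar = successes / n
    z = (p_bar - p0) / sqrt(p0 * (1 - p0) / n)
    p_lower = NormalDist().cdf(z)  # P(Z <= z); use 1 - p_lower for an upper-tail test
    return z, p_lower


# Fruits and Vegetables: 53 of 341 items, H0: p >= 0.2 against Ha: p < 0.2.
z, p_lower = proportion_z_test(53, 341, 0.2)
print(f"z = {z:.4f}, reject H0 at 5%: {p_lower < 0.05}")
```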
Item Fat Content
Item Fat Content    Count of Item Fat Content    Sample Proportion
Low Fat             218                          0.639296188
Regular             123                          0.360703812
Grand Total         341                          1

H0: Proportion of Low-Fat products is less than or equal to 0.6 (p <= 0.6)

Ha: Proportion of Low-Fat products is greater than 0.6 (p > 0.6)

Step 4: Rejection Criteria: Reject H0 if Test Statistic > z alpha

T.S. = (p bar - p0)/sqrt[p0*(1 - p0)/n]
T.S. = (0.6393 - 0.6)/sqrt[0.6*0.4/341]
T.S. = 1.481228256
Interpretation
Since the Test Statistic (1.4812) is less than z 0.05 (1.645), we fail to reject H0 at the 5% level of significance. From the hypothesis testing, there is insufficient evidence to conclude that the proportion of Low-Fat products is greater than 0.6.
Outlet Location Type
Outlet Location Count of Outlet Location Sample

Type Type Proportion
Tier 1 92 0.269794721
Tier 2 120 0.351906158
Tier 3 129 0.37829912
Grand Total 341 1
Margin of Error 0

H0: Proportion of Tier 3 Outlet Locations is greater than or equal to 0.5 (p >= 0.5)

Ha: Proportion of Tier 3 Outlet Locations are less than 0.5 (p < 0.5)

13

T.S.= (p bar - p0)/ sqrt[p0*(1- p0)/n]
T.S. = (0.3783-0.5)/ sqrt[0.5*0.5/341]
T.S. -4.494701997
Interpretation
significance. From the hypothesis testing, it can be concluded that the proportion of Tier 3 Outlet
Locations are less than 0.5, with a 5% chance of error in judgement.
Outlet Type
Outlet Type          Count of Outlet Type    Sample Proportion
Grocery Store        39                      0.114369501
Supermarket Type1    235                     0.68914956
Grand Total          341                     1
Margin of Error      0

H0: Proportion of Supermarket Type 1 Outlet types is less than or equal to 0.4 (p <= 0.4)

Ha: Proportion of Supermarket Type 1 Outlet types is greater than 0.4 (p > 0.4)

Step 4: Rejection Criteria: Reject H0 if Test Statistic > z alpha

T.S. = (p bar - p0)/sqrt[p0*(1 - p0)/n]
T.S. = (0.6891 - 0.4)/sqrt[0.4*0.6/341]
T.S. = 10.89918702
Interpretation
Since the Test Statistic (10.8992) is greater than z 0.05 (1.645), we reject H0 at the 5% level of significance. From the hypothesis testing, it can be concluded that the proportion of Supermarket Type 1 Outlet types is greater than 0.4, with a 5% chance of error in judgement.
Outlet Size
Outlet Size        Count of Outlet_Size    Sample Proportion
High               77                      0.225806452
Medium             135                     0.395894428
Small              129                     0.37829912
Grand Total        341                     1
Margin of Error    0

H0: Proportion of Medium Outlet Sizes is greater than or equal to 0.5 (p >= 0.5)

Ha: Proportion of Medium Outlet Sizes is less than 0.5 (p < 0.5)

T.S. = (p bar - p0)/sqrt[p0*(1 - p0)/n]
T.S. = (0.3959 - 0.5)/sqrt[0.5*0.5/341]
T.S. = -3.8449
Interpretation
Since the Test Statistic (-3.8449) is less than -z 0.05 (-1.645), we reject H0 at the 5% level of significance. From the hypothesis testing, it can be concluded that the proportion of Medium Outlet Sizes is less than 0.5, with a 5% chance of error in judgement.
4) Two Sample t-Test
Low Fat Item Sales v/s Regular Fat Item Sales
                    Low Fat Item Sales    Regular Fat Item Sales
Mean                2269.220485           1955.135233
Variance            2991618.919           1827983.135
Observations        218                   123
df                  305
t Stat              1.857722851
P(T<=t) one-tail    0.032085978
P(T<=t) two-tail    0.064171956
Here, sample sizes are n1 = 218 and n2 = 123.
Significance level = alpha = 0.05

H0: Average sales of Low-Fat food items is less than or equal to Regular Fat food items (u1 <= u2)
Ha: Average sales of Low-Fat food items is greater than Regular Fat food items (u1 > u2)

Step 4: Rejection Criteria: Reject H0 if p value < alpha
Step 6: p-value for one tailed test is 0.0321
Interpretation
Here, p-value (0.0321) < alpha (0.05), so H0 is rejected. From the hypothesis testing, it can be concluded that the average sales of Low-Fat food items is greater than that of Regular Fat food items, with a 5% chance of error in judgement.
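The t Stat and df in the Excel output for the Low Fat v/s Regular Fat comparison can be reproduced from the group means, variances, and counts. The sketch below assumes the unequal-variance (Welch) form of the test, which is what the Excel tool "t-Test: Two-Sample Assuming Unequal Variances" applies. The function name `welch_t` is illustrative.

```python
from math import sqrt


def welch_t(mean1: float, var1: float, n1: int,
            mean2: float, var2: float, n2: int) -> tuple[float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    v1, v2 = var1 / n1, var2 / n2  # squared standard errors of each group mean
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Low Fat (group 1) v/s Regular Fat (group 2) Item Outlet Sales.
t_stat, df = welch_t(2269.220485, 2991618.919, 218,
                     1955.135233, 1827983.135, 123)
print(f"t = {t_stat:.4f}, df = {df:.1f}")
```

Excel rounds the degrees of freedom to the nearest whole number before looking up the p-value.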
Supermarket Type 1 Sales v/s Other Outlet Types Sales
                                Supermarket Type 1 Sales    Other Outlet Types Sales
Mean                            2264.904274                 1914.332028
Variance                        1945409.208                 3959511.932
Observations                    235                         106
Hypothesized Mean Difference    0
df                              153
t Stat                          1.641125209
P(T<=t) one-tail                0.051412758
P(T<=t) two-tail                0.102825515

Significance level = alpha = 0.05

H0: Average sales in Supermarket Type 1 is greater than or equal to other Outlet Types (u1 >= u2)

Ha: Average sales in Supermarket Type 1 is less than other Outlet Types (u1 < u2)

Interpretation
Here, p-value (0.0514) > alpha (0.05), so H0 cannot be rejected. Moreover, the sample mean for Supermarket Type 1 (2264.90) is higher than that of the other outlet types (1914.33), so the data give no support to Ha. There is insufficient evidence to conclude that the average sales in Supermarket Type 1 is less than in other Outlet Types at the 5% level of significance.
Tier 1 Outlets Sales v/s Other Outlet Location Sales
                    Tier 1 Outlets Sales    Other Outlet Location Sales
Mean                2010.498891             2209.661853
Variance            2655332.263             2563114.735
Observations        92                      249
df                  160
t Stat              1.006490651
P(T<=t) one-tail    0.157849658
P(T<=t) two-tail    0.315699317

H0: Average sales in Tier 1 Outlets is less than or equal to other Outlet Location Types (u1 <= u2)

Ha: Average sales in Tier 1 Outlets is greater than other Outlet Location Types (u1 > u2)

Interpretation
Here, p-value (0.1578) > alpha (0.05), so H0 cannot be rejected. The sample mean for Tier 1 outlets (2010.50) is in fact lower than that of the other outlet locations (2209.66). There is insufficient evidence to conclude that the average sales in Tier 1 Outlets is greater than in other Outlet Location Types at the 5% level of significance.
5) Anova
ANOVA is a statistical method used to determine whether the means of two or more groups differ
from one another significantly. ANOVA compares the means of various samples to examine the
influence of one or more factors.
Outlet Size & Sales
Anova: Single Factor
SUMMARY
Groups                     Count    Sum         Average     Variance
Small Outlet Size Sales    129      248017.8    1922.619    1993521
Medium Outlet Sales        135      332941.9    2466.237    3379642
High Outlet Sales          77       154211.9    2002.752    1974761
ANOVA
Source of Variation    SS          df     MS          F           P-value     F crit
Between Groups         21827861    2      10913930    4.298803    0.014335    3.022441
Within Groups          8.58E+08    338    2538830
Total                  8.8E+08     340
Alpha: Significance Level: 0.05
Hypothesis
H0: The average sales across different outlet sizes are the same (u1 = u2 = u3)
Ha: The average sales of at least one outlet size category is different
Reject H0 if p-value <= alpha
Interpretation
As the p-value (0.0143) < 0.05, H0 is rejected. This means that the average sales of at least one outlet size category differ from the others.
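The F value for Outlet Size can be recovered from the SUMMARY block alone, since the between-group and within-group sums of squares depend only on each group's count, mean, and variance. The following is a sketch under that assumption. The function name `one_way_anova_f` is illustrative, and small differences from Excel arise because the summary values are rounded.

```python
def one_way_anova_f(groups: list[tuple[int, float, float]]) -> tuple[float, int, int]:
    """groups holds (count, mean, sample variance); returns (F, df_between, df_within)."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / n_total
    # Between-group variation: spread of group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-group variation: pooled from the sample variances.
    ss_within = sum((n - 1) * var for n, _, var in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Small, Medium, and High outlet size groups from the SUMMARY table.
outlet_size_groups = [
    (129, 1922.619, 1993521),
    (135, 2466.237, 3379642),
    (77, 2002.752, 1974761),
]
f_stat, df_between, df_within = one_way_anova_f(outlet_size_groups)
print(f"F = {f_stat:.4f} with ({df_between}, {df_within}) degrees of freedom")
```

H0 is rejected when this F exceeds the F crit value in the ANOVA table.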
Outlet Location & Sales
Anova: Single Factor
SUMMARY
Groups                          Count    Sum         Average     Variance
Tier 1 Outlet Location Sales    92       184965.9    2010.499    2655332
Tier 2 Outlet Location Sales    120      269192.3    2243.269    1583974
Tier 3 Outlet Location Sales    129      281013.5    2178.4      3491390
ANOVA
Source of Variation    SS          df     MS         F           P-value     F crit
Between Groups         2926317     2      1463159    0.563892    0.569525    3.022441
Within Groups          8.77E+08    338    2594752
Total                  8.8E+08     340
Hypothesis
H0: The average sales across different outlet location types are the same (u1 = u2 = u3)
Ha: The average sales of at least one outlet location type is different
Reject H0 if F Calculated > F critical value
Interpretation
As F calculated (0.5639) < F critical value (3.0224), H0 cannot be rejected (the p-value, 0.5695, is also greater than 0.05). This means there is no significant evidence that average sales differ across outlet location types.
Outlet Type & Sales
Anova: Single Factor
SUMMARY
Groups                      Count    Sum         Average     Variance
Grocery Store Sales         39       15611.01    400.2824    75349.21
Supermarket Type 1 Sales    274      547863.5    1999.502    2103973
Supermarket Type 2 Sales    36       75367.23    2093.534    2385182
Supermarket Type 3 Sales    31       111941      3610.999    4986918
ANOVA
Source of Variation    SS          df     MS          F           P-value     F crit
Between Groups         1.8E+08     3      60117003    27.89456    2.62E-16    2.628646
Within Groups          8.1E+08     376    2155151
Total                  9.91E+08    379
Hypothesis
H0: The average sales across different outlet types are the same (u1 = u2 = u3 = u4)
Ha: The average sales of at least one outlet type is different
Reject H0 if F Calculated > F critical value
Interpretation
As F calculated (27.8946) > F critical value (2.6286), H0 is rejected. This means that the average sales of at least one outlet type differ from the others.
6) Chi-Square Test
The chi-square test is conducted to check the association between two categorical variables.
Outlet Type & Item Fat
Observed Values (Count of Item_Fat_Content)
Outlet Type          Low Fat    Regular    Grand Total
Grocery Store        20         19         39
Supermarket Type1    154        81         235
Grand Total          218        123        341
Expected Values
Outlet Type          Low Fat        Regular    Grand Total
Grocery Store        24.93255132    14.0674    39
Supermarket Type1    150.2346041    84.7654    235
Hypothesis
H0: There is no association between Outlet Type and Item Fat Content
Ha: There is an association between Outlet Type and Item Fat Content
Reject H0 if p value < 0.05
alpha = 0.05
p - value = 0.3782
Interpretation
As the p-value (0.3782) > alpha (0.05), H0 cannot be rejected. This means that there is no significant evidence of an association between Outlet Type and Item Fat Content.
Outlet Location Type & Outlet Size
Observed Values (Count of Outlet Size)
Outlet Location Type    Medium    Small    Grand Total
Tier 1                  34        58       92
Tier 2                  28        65       93
Tier 3                  73        6        79
Expected Values
Outlet Location Type    Medium         Small          Grand Total
Tier 1                  47.04545455    44.95454545    92
Tier 2                  47.55681818    45.44318182    93
Tier 3                  40.39772727    38.60227273    79
Hypothesis
H0: There is no association between Outlet Location Type and Outlet Size
Ha: There is an association between Outlet Location Type and Outlet Size
alpha = 0.05
p - value = 1.3 x 10^-17
Interpretation
As the p-value (1.3 x 10^-17) < alpha (0.05), H0 is rejected. This means that there is an association between Outlet Location Type and Outlet Size.
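The expected values and the p-value for the Outlet Location Type and Outlet Size table can be reproduced with the sketch below. Each expected count is (row total * column total) / grand total. The function name `chi_square_independence` is illustrative. The p-value line relies on a closed form that holds only when the degrees of freedom equal 2, as they do for this 3 x 2 table. Other table shapes would need a chi-square distribution routine instead.

```python
from math import exp


def chi_square_independence(observed: list[list[float]]) -> tuple[list[list[float]], float, int]:
    """Return (expected counts, chi-square statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, df


# Rows: Tier 1, Tier 2, Tier 3; columns: Medium, Small.
observed = [[34, 58], [28, 65], [73, 6]]
expected, chi2, df = chi_square_independence(observed)

# For df == 2 only, the upper-tail probability is exactly exp(-chi2 / 2).
p_value = exp(-chi2 / 2)
print(f"chi-square = {chi2:.3f}, df = {df}, p-value = {p_value:.2e}")
```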
Item Type & Item Fat Content
Observed Values (Count of Item Fat Content)
Item Type                Low Fat    Regular    Grand Total
Canned                   14         17         31
Frozen Foods             15         17         32
Fruits and Vegetables    21         32         53
Snack Foods              26         13         39
Expected Values
Item Type                Low Fat        Regular     Grand Total
Canned                   15.2           15.8        31
Frozen Foods             15.69032258    16.30968    32
Fruits and Vegetables    25.98709677    27.0129     53
Snack Foods              19.12258065    19.87742    39
Hypothesis
H0: There is no association between Item Type and Item Fat Content
Ha: There is an association between Item Type and Item Fat Content
alpha = 0.05
p - value = 0.0727
Interpretation
As the p-value (0.0727) > alpha (0.05), H0 cannot be rejected. This means that there is no significant evidence of an association between Item Type and Item Fat Content.
7) Correlation
Correlation is a statistical measure that shows the extent to which two variables fluctuate in relation to one another.
Item Visibility and Item Outlet Sales
Step 1: Variables
X: Item Visibility
Y: Item Outlet Sales
Step 2: Correlation Values
                     Item Visibility    Item Outlet Sales
Item Visibility      1
Item Outlet Sales    -0.12439928        1
Correlation: -0.1244
Interpretation
The variables are negatively correlated. Since the magnitude of the correlation value (0.1244) is below 0.2, there is only a very weak correlation between Item Visibility and Item Outlet Sales. This means that as Item Visibility increases, Item Outlet Sales tends to decrease slightly.
Item MRP and Item Outlet Sales
Step 1: Variables
X: Item MRP
Y: Item Outlet Sales
Step 2: Correlation Values
                     Item MRP       Item Outlet Sales
Item MRP             1
Item Outlet Sales    0.551435693    1
Correlation = 0.5514
Interpretation
The variables are positively correlated. Since the correlation value (0.5514) is between 0.5 and 0.7, there is moderate correlation between Item MRP and Item Outlet Sales. This means that as Item MRP increases, Item Outlet Sales also tends to increase.
Item Weight & Item Outlet Sales
Objective: To check the relationship between the values of Item Weight and Item Outlet Sales.
Step 1: Variables
X: Item Weight
Y: Item Outlet Sales
Step 2: Correlation Values
                     Item Weight    Item Outlet Sales
Item Weight          1
Item Outlet Sales    0.002445       1
Correlation: 0.0024
Interpretation: The correlation is positive but, since its value (0.0024) is between 0.0 and 0.2, there is very weak to negligible correlation between Item Weight and Item Outlet Sales.
[Figure: Item Outlet Sales (0 to 15000) plotted against Item Weight (0 to 25), with trendline y = 148.02x]
Interpretation: The trendline has no intercept term, so 0 is the baseline Item Outlet Sales when the Item Weight is zero.
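The verbal labels used in this section can be applied consistently with a small helper. The bands below 0.2 (very weak to negligible), 0.5 to 0.7 (moderate), and above 0.7 (strong) come from the interpretations above. The label "weak" for the 0.2 to 0.5 band is an assumption, since the report does not name that range. The function name `correlation_strength` is illustrative.

```python
def correlation_strength(r: float) -> str:
    """Describe a correlation coefficient by the magnitude bands used in this report."""
    magnitude = abs(r)  # strength depends on the magnitude, not the sign
    if magnitude < 0.2:
        strength = "very weak to negligible"
    elif magnitude < 0.5:
        strength = "weak"  # assumption: this band is not defined in the report
    elif magnitude <= 0.7:
        strength = "moderate"
    else:
        strength = "strong"
    direction = "positive" if r > 0 else "negative" if r < 0 else "zero"
    return f"{strength} {direction} correlation"


for name, r in [("Item Visibility", -0.1244), ("Item MRP", 0.5514), ("Item Weight", 0.0024)]:
    print(f"{name} vs Item Outlet Sales: r = {r} -> {correlation_strength(r)}")
```

Taking the absolute value first avoids classifying a negative coefficient by its signed value.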
8) Simple Linear Regression
Regression analysis is used to predict a continuous dependent variable from one or more independent variables. Simple linear regression uses a single independent variable.
Item Visibility & Item Outlet Sales
Regression Statistics
Multiple R           0.124399
R Square             0.015475
Adjusted R Square    0.012571
Observations         341
Multiple R: 0.124399
Interpretation: As the value of Multiple R (0.1244) is less than 0.2, it indicates a very weak correlation between Item Outlet Sales and Item Visibility.
R Square: 0.015475
Interpretation: It indicates that 1.55% of the variance in Item Outlet Sales can be explained by Item Visibility.
ANOVA
              df     SS          MS          F           Significance F
Regression    1      13617423    13617423    5.328547    0.02158017
Residual      339    8.66E+08    2555560
Total         340    8.8E+08
Hypothesis
H0: Item Outlet Sales does not have a linear relationship with Item Visibility
Ha: Item Outlet Sales has a linear relationship with Item Visibility
Level of Significance alpha = 5% or 0.05
Interpretation
As the p-value (0.0216) < alpha (0.05), H0 is rejected. This means that Item Outlet Sales has a linear relationship with Item Visibility, at the 5% level of significance.
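R Square and F follow directly from the sums of squares in the ANOVA table. The sketch below uses the unrounded total sum of squares (879952400.2) printed in the Item MRP regression later in this report. That substitution is an assumption that holds because both models use the same dependent variable and the same 341 observations. The function name `regression_summary` is illustrative.

```python
def regression_summary(ss_regression: float, ss_total: float, n: int, k: int = 1) -> tuple[float, float]:
    """Return (R squared, F statistic) for a regression with k predictors."""
    ss_residual = ss_total - ss_regression
    r_squared = ss_regression / ss_total
    f_stat = (ss_regression / k) / (ss_residual / (n - k - 1))
    return r_squared, f_stat


# Item Visibility model: regression SS from the table above; total SS as noted in the lead-in.
r_squared, f_stat = regression_summary(13617423, 879952400.2, n=341)
print(f"R Square = {r_squared:.6f}, F = {f_stat:.4f}")
```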
Co-efficient Table
                   Coefficients    Standard Error    t Stat      P-value     Lower 95%     Upper 95%
Intercept          2418.225        142.8489          16.92855    5.06E-47    2137.24352    2699.2
Item Visibility    -4185.49        1813.186          -2.30836    0.02158     -7752.0074    -619
Regression Equation
Y = (-4185.49*X) + 2418.225
Item Outlet Sales = (-4185.49*Item Visibility) + 2418.225
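As a short illustration, the fitted equation can be wrapped in a function to produce predictions. The function name is hypothetical. Predictions are only meaningful within the range of Item Visibility observed in the sample (0 to about 0.27), and with an R Square of 1.55% they carry a large error.

```python
def predict_sales_from_visibility(item_visibility: float) -> float:
    """Predicted Item Outlet Sales from the fitted line y = -4185.49 * x + 2418.225."""
    slope, intercept = -4185.49, 2418.225
    return slope * item_visibility + intercept


for visibility in (0.0, 0.05, 0.1):
    predicted = predict_sales_from_visibility(visibility)
    print(f"Item Visibility {visibility:.2f} -> predicted Item Outlet Sales {predicted:.2f}")
```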
Intercept Hypothesis
H0: The intercept is zero
Ha: The intercept is non-zero
Reject H0 if p-value corresponding to Intercept < alpha
p-value = 5.06E-47
alpha = 0.05
Interpretation
Since the p-value corresponding to Intercept (5.06E-47) < alpha (0.05), H0 is rejected. This means that the intercept is non-zero, i.e., the regression line does not pass through the origin. The intercept (2418.225) is the predicted Item Outlet Sales when Item Visibility is zero.
Assumption Checking
Linearity
[Figure: scatter plot of Item Outlet Sales against Item Visibility with fitted line y = -4185.5x + 2418.2]
Interpretation: The points are more or less along the straight line. This implies that the relationship between Item Visibility and Item Outlet Sales is linear in the parameters m (slope) and c (intercept).
Normality
[Figure: Normal Probability Plot of Item_Outlet_Sales against Sample Percentile]
Interpretation: The Normal Probability Plot indicates that the points lie along the 45-degree line. This implies that the error in estimating Item Outlet Sales, i.e., ε, and hence Item Outlet Sales, follows a normal distribution.
Homoscedasticity
[Figure: Item_Visibility Residual Plot (Residuals against Item_Visibility)]
Interpretation: The points in the Item Visibility Residual plot are randomly placed without any pattern. This implies that the error in estimating Item Outlet Sales, i.e., ε, has a constant variance across all the values of Item Visibility.
Independence
Interpretation: The errors, across all the values of Item Visibility, are independent of each other.
Item MRP & Item Outlet Sales
Regression Statistics
Multiple R           0.551436
R Square             0.304081
Adjusted R Square    0.302028
Observations         341
Multiple R (0.551436)
Interpretation: As the value of Multiple R is greater than 0.5 but less than 0.7, it indicates a moderate correlation between Item Outlet Sales and Item MRP.
R Square (0.304081)
Interpretation: It indicates that 30.4% of the variance in Item Outlet Sales can be explained by Item MRP.
ANOVA
              df     SS             MS       F        Significance F
Regression    1      267577090.9    3E+08    148.1    1.60628E-28
Residual      339    612375309.3    2E+06
Total         340    879952400.2
Hypothesis
H0: Item Outlet Sales does not have a linear relationship with Item MRP
Ha: Item Outlet Sales has a linear relationship with Item MRP
Level of Significance alpha = 5% or 0.05
Interpretation: As the p-value (1.60628E-28) < alpha (0.05), H0 is rejected. This means that Item Outlet Sales has a linear relationship with Item MRP, at the 5% level of significance.
Co-efficient Table
             Coefficients    Standard Error    t Stat    P-value    Lower 95%       Upper 95%
Intercept    74.737          185.8453074       0.4021    0.688      -290.8182075    440.2922
Item_MRP     14.67632        1.205873101       12.171    2E-28      12.30438097     17.04825
Regression Equation
Y = 14.67632*X + 74.737
Item Outlet Sales = 14.67632*Item MRP + 74.737
Intercept Hypothesis
H0: The intercept is zero
Ha: The intercept is non-zero
Reject H0 if p-value corresponding to Intercept < alpha
p-value = 0.688
alpha = 0.05
Interpretation
Since the p-value corresponding to Intercept (0.688) > alpha (0.05), H0 cannot be rejected. This means that the intercept is not significantly different from zero, i.e., a regression line passing through the origin is consistent with the data.
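The t Stat and the 95% limits in the coefficient table follow from each coefficient and its standard error. The sketch below reproduces them for the Item MRP slope. The critical value 1.967 is an assumption (the two-tail t value at alpha = 0.05 for roughly 339 degrees of freedom, rounded), and the function name `coefficient_inference` is illustrative.

```python
def coefficient_inference(coefficient: float, std_error: float, t_crit: float) -> tuple[float, float, float]:
    """Return (t statistic, lower limit, upper limit) for a regression coefficient."""
    t_stat = coefficient / std_error
    margin = t_crit * std_error
    return t_stat, coefficient - margin, coefficient + margin


# Assumption: rounded two-tail critical t value for about 339 degrees of freedom.
T_CRIT = 1.967

# Item_MRP slope and standard error from the coefficient table.
t_stat, lower, upper = coefficient_inference(14.67632, 1.205873101, T_CRIT)
print(f"t = {t_stat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

An interval that excludes zero, as here, agrees with rejecting the hypothesis that the slope is zero. The intercept's interval (-290.82, 440.29) contains zero, which matches the decision above.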
Assumption Checking
Linearity
[Figure: scatter plot of Item Outlet Sales against Item MRP with fitted line y = 14.676x + 74.737]
Interpretation: The points are more or less along the straight line. This implies that the relationship between Item MRP and Item Outlet Sales is linear in the parameters m (slope) and c (intercept).
Normality
[Figure: Normal Probability Plot of Item_Outlet_Sales against Sample Percentile]
Interpretation: The Normal Probability Plot indicates that the points lie along the 45-degree line. This implies that the error in estimating Item Outlet Sales, i.e., ε, and hence Item Outlet Sales, follows a normal distribution.
Homoscedasticity
[Figure: Item_MRP Residual Plot (Residuals against Item_MRP)]
Interpretation: The points in the Item MRP Residual plot are randomly placed without any pattern. This implies that the error in estimating Item Outlet Sales, i.e., ε, has a constant variance across all the values of Item MRP.
Independence
Interpretation: The errors, across all the values of Item MRP, are independent of each other.
CONCLUSION
The following factors could be considered by outlets in the retail domain to accelerate their sales:
Item Fat Content: The average sales of Low-Fat food items are greater than those of Regular Fat food items. To accelerate the sales of Regular Fat food items, companies should focus on marketing the feel, taste and other aspects of these products.
Outlet Type: The hypothesis test found no evidence that average sales in Supermarket Type 1 are lower than in other Outlet Types (Grocery Store, Supermarket Type 2 & 3). To boost sales in outlet types other than Supermarket Type 1, the outlets should adopt competitive pricing strategies.
Outlet Location Type: The hypothesis test found no evidence that average sales in Tier 1 outlets are greater than in other Outlet Location Types. To accelerate sales in these outlets, they should position themselves as easy-to-access outlets where quality products are available at affordable rates. They could also adopt home delivery services and provide free delivery to customers who hold a membership in their outlets.
