Download as pdf or txt
Download as pdf or txt
You are on page 1of 34

Business Report Project - SMDM

Raveena Kumari
1 – Wholesale Customer Data Analysis...............................................
1.1 Problem ..........................................................................
1.2 Problem ..........................................................................
1.3 Problem .........................................................................
1.4 Problem ..........................................................................
1.5 Problem ..........................................................................

2 – Clear Mountain State University (CMSU) Survey........................


2.1 Problem ….........................................................................
2.2 Problem.............................................................................
2.3 Problem.............................................................................
2.4 Problem.............................................................................
2.5 Problem.............................................................................
2.6 Problem.............................................................................
2.7 Problem.............................................................................
2.8 Problem ............................................................................

3 – Hypothesis Testing for Quality of Shingles…………….…….......


3.1 Problem .............................................................................
3.2 Problem .............................................................................

1
Wholesale Customers Analysis

Statement:
A wholesale distributor operating in different regions of Portugal has information on
annual spending of several items in their stores across different regions and channels. The
data consists of 440 large retailers’ annual spending on 6 different varieties of products in 3
different regions (Lisbon, Oporto, Other) and across different sales channel (Hotel, Retail).
We imported the ‘Wholesale Customer data’ dataset in python to analyse the spend under
each store items across regions and channel to find solutions to each problem.

The data is given in the file “Wholesale+Customers+Data” as shown below.

2
There are no null values in any column which is evident from the below result.

Dataset has 9 variables “Buyer/Spender”,” Channel”,” Region”,” Fresh”,


“Milk”,”Grocery”,”Frozen”,”Detergents_Paper”,”Delicatessen”.
Channel and Region both are categorial columns and all the others are integer type.
Dropping as no use column for our analysis 1 continuous types of feature (Buyer/Spender) will be
dropped as no use for our analysis.

3
1.1 Use methods of descriptive statistics to summarize data. Which Region
and which Channel spent the most ? Which Region and which Channel
spent the least ?

The following table is derived using Descriptive Statistics to summarize the data

4
5
1.2 There are 6 different varieties of items that are considered. Describe and
comment / explain all the varieties across Region and Channel? Provide
a detailed ? justification for your answer.

6
See Behaviour in all items across Channel and Region use Bar Plot. Here we see that they are

different in Channel and Region.

In Above Bar graph, it is represent that Fresh items are most demand in Hotel Channel and

highest used in other region of Hotel channel.

From the above graph, we can see that at Lisbon most spent product are Fresh products and the

least spent product is Delicatessen.

At Oporto, the most spent product are Fresh products and least spent products are Delicatessen.

In other category, the most spent product are Fresh products and least spent product are

Delicatessen

The above graph clearly shows that the most spent product in retail category is Grocery products

and least spent product in retail category is the Frozen food products. In Hotel category the most

spent product is the Fresh products and least spent product is the Detergents paper

7
1.3 On the basis of a descriptive measure of variability, which item shows the
most inconsistent behaviour? Which items show the least inconsistent
behaviour?

Fresh item have highest Standard deviation So that is Inconsistent. Delicatessen item have

smallest Standard deviation, So that is consistent.

The above table represents the descriptive statistics for all six food Items

Fresh,Milk,Grocery,Frozen,Detergents_Paper and Delicatessen.

Here the consistency of any food items can be calculated using Coefficient of

Variance(CV).The higher the Coefficient of Variance, greater the level of inconsistency and

vice versa.

Co efficient of Variance(CV).=Standard deviation/mean

8
Coefficient of variance of Fresh products is 1.0527 while that of Delicatessen is 1.8430.

Therefore, Fresh products show the most inconsistent behavior and Delicatessen shows the

least inconsistent behavior

9
1.4 Are there any outliers in the data?
To determine the presence of Outliers in the Data the best method is creating Box Plot of all the
variables as shown.

Yes there are outliers in all the items across the product range (Fresh, Milk, Grocery, Frozen,

Detergents_Paper & Delicatessen) Outliers are detected but not necessarily removed, it

depends of the situation. Here I will assume that the wholesale distributor provided us a

dataset with correct data, so I will keep them as is.

10
From the above it can be concluded that Yes,There are outliers in the data.

Outliers are presents in the variables Fresh, Milk, Grocery, Frozen, Dtergents_Paper and
Delicatessen.

1.5 On the basis of your analysis, what are your recommendations for the
business? How can your analysis help the business to solve its problem?
Answer from the business perspective.

As per the analysis:

I find out that there are inconsistencies in spending of different items (by calculating Coefficient of

Variation), which should be minimized. The spending of Hotel and Retail channel are different

11
which should be more or less equal. And also spent should equal for different regions. Need to

focus on other items also than “Fresh” and “Grocery”

Based on given sample data, the wholesale distributor must focus on below observations:
 In Hotel channel they must focus on Fresh items as it is more in used.
 In Retail channel they must focus on Grocery items as it is more in used.
 All the region spending more in Fresh items, hence they must increase the stock of Fresh
items.
 They must reduce the stock of Delicatessen items as it is less in demand in all the region.

12
Clear Mountain State University (CMSU) Survey

Problem 2 –

The Student News Service at Clear Mountain State University (CMSU) has decided to gather
data about the undergraduate students that attend CMSU. CMSU creates and distributes a
survey of 14 questions and receives responses from 62 undergraduates.

Summary–

This business report provides detailed explanation of approach to each problem


given in the assignment and provides relative information with regards to solving the problem.

CMSU Survey Data Analysis We imported the ‘CMSU Survey-1’ dataset in python to analyse
the data about the undergraduate students who attend CMSU. Below is the detailed
approach and answer.

13
14
2.1Problem. For this data, construct the following contingency tables (Keep
Gender as row variable).

Problem 2.1.1 Gender and Major Solution:

15
Problem 2.1.2. Gender and Grad Intention

Below is the output from Python,

Problem 2.1.3. Gender and Employment


Below is the output from Python,

Problem 2.1.4. Gender and Computer


Below is the output from Python,

2.2. Assume that the sample is representative of the population of CMSU.


Based on the data, answer the following question:

Solution: For this we need to find out total male students out of whole student from the
given data.

16
2.2.1. What is the probability that a randomly selected CMSU student will be
male?

After calculation we got the result that probability of 46.77% student will be male in CMSU if
randomly selected.

2.2.2. What is the probability that a randomly selected CMSU student will be
female?

Solution: For this we need to find out total female students out of whole student from the
given data. After calculation we got the result that probability of 53.23% student will be
female in CMSU if randomly selected.

17
2.3. Assume that the sample is representative of the population of CMSU.
Based on the data, answer the following question:
2.3.1. Find the conditional probability of different majors among the male
students in CMSU.

Solution:

The Probability of Male in Accounting: 13.79 %

The Probability of Male in CIS: 3.45 %

The Probability of Male in Economics/Finance: 13.79 %

18
The Probability of Male in International Business: 6.9 %

The Probability of Male in Management: 20.69 %

The Probability of Male in Other: 13.79 %

The Probability of Male in Retailing/Marketing: 17.24 %

The Probability of Male in Undecided: 10.34 %

2.3.2 Find the conditional probability of different majors among the female
students in CMSU.

Solution:

19
The Probability of Female in Accounting: 9.09 %

The Probability of Female in CIS: 9.09 %

The Probability of Female in Economics/Finance: 21.21 %

The Probability of Female in International Business: 12.12 %

The Probability of Female in Management: 12.12 %

The Probability of Female in Other: 9.09 %

The Probability of Female in Retailing/Marketing. is 27.27%

The Probability of Female in Undecided. is 0.00%

2.4 Problem. Assume that the sample is a representative of the population of


CMSU. Based on the data, answer the following question:

2.4.1 Find the probability That a randomly chosen student is a male and
intends to graduate.

Solution: Using contingency tables of Gender and Grad Intention we got the total numbers
of males and number of males intends to be graduate And post calculation we find out that -
Probability of Males and intends to be Graduate. is 58.62%

20
2.4.2 Find the probability that a randomly selected student is a female and
does NOT have a laptop.
Solution: Using contingency tables of Gender and Computer we got the total numbers of
females and number of females does not have a laptop

And post calculation we find out that - Probability of randomly selected student is a Female
and does NOT have a laptop. is 12.12%

2.5 Problem. Assume that the sample is representative of the population of


CMSU. Based on the data, answer the following question:
2.5.1 Find the probability that a randomly chosen student is either a male or
has fulltime employment?
Solution: Using contingency tables of Gender and Employment we got the total numbers of
males and number of males who are full time employed.

And post calculation we find out that - Probability of randomly chosen student is either Male
or has full time employment. is 51.61%

21
2.5.2. Find the conditional probability that given a female student is randomly
chosen, she is majoring in international business or management.

Solution:
Using contingency tables of Gender and Major we got the total numbers of females and
number of females majoring in international business or management. And post calculation
we find out that - Probability that given a female student is randomly chosen, she is majoring
in international business or management is 24.24%

22
2.6 Problem. Construct a contingency table of Gender and Intent to Graduate at
2 levels (Yes/No). The Undecided students are not considered now and the
table is a 2x2 table. Do you think the graduate intention and being female are
independent events?

The Probability that a randomly selected student ‘being female’

The Probability that a randomly selected student the graduate intention and being female

P(Grad Intention Yes) = 28/40*100 = 70

P(Grad Intention Yes | female) = 11 / 20*100 = 55

The probabilities are not equal. This suggests that the two events are independent.

2.7 Problem. Note that there are four numerical (continuous) variables in the
data set, GPA, Salary, Spending, and Text Messages.

Answer the following questions based on the data 2.7

2.7.1 If a student is chosen randomly, what is the probability that his/her GPA
is less than 3?

Solution: Using contingency tables of Gender and GPA we got the total numbers of
students and number of students GPA less than 3 And post calculation we find out that -
Probability that student is chosen randomly and that his/her GPA is less than 3 is 27.42%

23
2.7.2 Find the conditional probability that a randomly selected male earns 50 or
more. Find the conditional probability that a randomly selected female earns 50
or more.

Solution:
Using contingency tables of Gender and Salary we got the total numbers of Male and
Female and number of male and female earning 50 or more And post calculation we find out
that - Probability that randomly selected male earns 50 or more is 48.28% And Probability
that randomly selected female earns 50 or more is 54.55%

24
2.8.1 Problem Note that there are four numerical (continuous) variables in the
data set, GPA, Salary, Spending, and Text Messages. For each of them
comment whether they follow a normal distribution.

Solution: Used distplot to know the normal distribution of these four numerical (continuous)
variables in the data set – GPA, Salary, Spending and Text Message.

25
The most Simplest test for normality of any distribution is through graphing box plots of data sets.
Here,The distribution is assumed to be normal if the box is concentrated in the center with the
length of the whiskers considerably long.
For which, the GPA and Salary, satisfies the conditions, while Text Messages and Spending do
not because the box tend to concentrate a little bit lower.

By these details we confirm that out of the given four data sets ‘GPA’ and ‘Salary’ are following
normal distribution whereas other two ‘Spending’ and ‘Text Messages’ are not following the
normal distribution

26
2.8.2 Write a Note summarizing your conclusions

From this analysis, we can conclude that the sample survey conducted for the students from
central Missouri state university shows that there are multiple factors that affect the graduation of
a student. The survey conducted by Student News Service at Clear Mountain State University
(CMSU) has information about what major the undergrad students are pursuing, whether they
intent to graduate, what is their GPA, nature of their employment and their salary, social
networking, spending, satisfaction, computer and text messages. Using our analysis, we have
constructed contingency tables and calculated probabilities between these variables. We can
conclude that in order to help students graduate and find suitable employment the university can
work on improving the infrastructure by providing easy access to computers and conducting
social networking events. The probabilities of male students graduating is more than that of
female students, so female students need more support and choice of major.

27
Hypothesis Testing for Quality of Shingles

Problem 3 –
An important quality characteristic used by the manufacturers of ABC asphalt shingles is the
amount of moisture the shingles contain when they are packaged. Customers may feel that
they have purchased a product lacking in quality if they find moisture and wet shingles inside
the packaging. In some cases, excessive moisture can cause the granules attached to the
shingles for texture and coloring purposes to fall off the shingles resulting in appearance
problems. To monitor the amount of moisture present, the company conducts moisture tests.

A shingle is weighed and then dried. The shingle is then reweighed, and based on the
amount of moisture taken out of the product; the pounds of moisture per 100 square feet are
calculated. The company would like to show that the mean moisture content is less than 0.35
pound per 100 square feet. The file (A & B shingles.csv) includes 36 measurements (in
pounds per 100 square feet) for A shingles and 31 for B shingles.

Summary–This business report provides detailed explanation of approach to each problem


given in the assignment and provides relative information with regards to solving the problem.
3 – Asphalt Shingles Data Analysis We imported the ‘A & B shingles’ dataset in python to
analyze the data about the Asphalt Shingles. Below is the detailed approach and answer.

The file (A&B singles.csv) including 36 measurements(in pounds per 100 square feet) for A
shingles and 31 for B singles.

28
29
3.1Problem Do you think there is evidence that mean moisture contents in both
types of shingles are within the permissible limits? State your conclusions
clearly showing all steps.

Since pvalue > 0.05, do not reject H0 . There is not enough evidence to conclude that the mean

moisture content for Sample A shingles is less than 0.35 pounds per 100 square feet. p-value =

0.0747. If the population mean moisture content is in fact no less than 0.35 pounds per 100 square

feet, the probability of observing a sample of 36 shingles that will result in a sample mean moisture

content of 0.3167 pounds per 100 square feet or less is .0747.

Since pvalue < 0.05, reject H0 . There is enough evidence to conclude that the mean moisture

content for Sample B shingles is not less than 0.35 pounds per 100 square feet. p-value = 0.0021. If

the population mean moisture content is in fact no less than 0.35pounds per 100 square feet, the

probability of observing a sample of 31 shingles that will result in a sample mean moisture content of

0.2735 pounds per 100 square feet or less is .0021.

30
3.2 Problem Do you think that the population means for shingles A and B are
equal?

Form the hypothesis and conduct the test of the hypothesis. What assumption
do you need to check before the test for equality of means is performed?

Theoretical Assumptions for the Hypothesis Testing:

To perform a Test of equality of the population mean of the A singles and B singles, the null
and alternative hypothesis to test whether the population mean moisture content is equal
given:

H0: mean moisture content of A = mean moisture content of B

HA: mean moisture content of A != mean moisture content of B

Level of significance :0.05

We have a sample and don’t know the population standard deviation.

The sample is not a large sample. So we can use the t distribution and tSTAT test statistic.

Since we a testing for equality between sample A and B we use two sample T Test.

Solution:

As the pvalue > α , do not reject H0; and we can say that population mean for shingles A and B are
equal Test Assumptions When running a two-sample t-test, the basic assumptions are that the

31
distributions of the two populations are normal, and that the variances of the two distributions are the
same.
If those assumptions are not likely to be met, another testing procedure could be use

32
Thank You !

33

You might also like