
Statistical Method for Decision Making

Assignment -January 2021

PGP - DSBA

Submitted by:

Aamir Haque
Post Graduate Program in Data Science and Business Analytics

Contents:
1 – Wholesale Customer Data Analysis
    1.1 Problem 1.1
    1.2 Problem 1.2
    1.3 Problem 1.3
    1.4 Problem 1.4
    1.5 Problem 1.5

2 – Clear Mountain State University (CMSU) Survey
    2.1 Problem 2.1
    2.2 Problem 2.2
    2.3 Problem 2.3
    2.4 Problem 2.4
    2.5 Problem 2.5
    2.6 Problem 2.6
    2.7 Problem 2.7
    2.8 Problem 2.8

3 – Hypothesis Testing for Quality of Shingles
    3.1 Problem 3.1
    3.2 Problem 3.2
SMDM JAN-2021

Executive Summary

This project examines three different problem areas, each driven by its currently available
data. The primary objective of this project is to analyze the data thoroughly using different
statistical methods and to draw conclusions and recommendations, so that decisions are
backed by the results of statistical analysis of the data.
Note:
# Kindly refer to the input datasets on the SMDM project Olympus portal for all three problems.
# Kindly refer to the Python notebook file (SMDM Assignment Jan 2021 - Aamir Haque.ipynb)
for the code-wise execution and solution of the assignment.

Problem Statement 1

A wholesale distributor operating in different regions of Portugal has information on the annual
spending on several items in their stores across different regions and channels. The data
consists of 440 large retailers’ annual spending on 6 different varieties of products in 3
different regions (Lisbon, Oporto, Other) and across different sales channels (Hotel, Retail).

Exploratory Data Analysis: -


The dataset has 9 variables.

1. Channel and Region are both categorical columns.
2. Fresh, Milk, Grocery, Frozen, Detergents Paper and Delicatessen are of integer data type.
3. Buyer/Spender holds a unique row number for every record. There are
2 types of Channel (Hotel & Retail) and 3 Regions (Other, Lisbon & Oporto); the
remaining 6 columns are the product varieties for which spending has been provided.

Descriptive Data Analysis: -

1. The Channel column has two unique values, of which ‘Hotel’ is the most frequent
(298 rows). The Region column has three unique values, of which ‘Other’ is the most
frequent (316 rows).
2. The mean and standard deviation of each item are as follows: -

No.  Item               Mean          Standard Deviation
1    Fresh              12000.2977    12647.3289
2    Milk               5796.2659     7380.3772
3    Grocery            7951.2773     9503.1628
4    Frozen             3071.9318     4854.6733
5    Detergents Paper   2881.4932     4767.8544
6    Delicatessen       1524.8704     2820.1059

Checking for Null-values: -

The above data shows that there are no null-values in the data.

All the variables are in numerical format except Region and Channel, which are in object format.
In total there are 440 rows and 9 columns in the dataset.


Question 1.1: - Which Region and which Channel seems to spend more? Which Region and
which Channel seems to spend less?

Answer 1.1: -

In order to calculate the amount spent by Region and Channel, the data needs to be
aggregated separately by Region and by Channel. Based on the aggregation, the annual spend is
higher in the ‘Hotel’ sales channel than in Retail. The descriptive statistics above show that the
average spending on Fresh is 12,000, Milk is 5,796,
Grocery is 7,951, Frozen is 3,072, Detergents_Paper is 2,881 and Delicatessen is 1,525. From this
result we can say that the highest average spending is on Fresh, followed by Grocery.

1. The data shows that spending in the Other region is far higher than in the other two
regions combined. Annual spending in Other is 10,677,599, compared with
2,386,813 in Lisbon and 1,555,088 in Oporto.
2. Spending in the Lisbon region is about 53.5% higher than in Oporto. Oporto spends
the least, at 1,555,088.

3. Hotels as a Channel spend more (7,999,569) than
Retail (6,619,931).
4. That is, Hotels spend about 20.8% more than Retail.
5. The Retail channel and the Oporto region spend the least.
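
The percentages quoted in this answer can be re-derived from the reported totals. The following is a minimal Python sketch that assumes only the region and channel totals stated above; the helper name pct_more is illustrative and is not taken from the notebook.

```python
# Totals as reported in this answer (annual spend summed over all six items).
region_spend = {"Other": 10_677_599, "Lisbon": 2_386_813, "Oporto": 1_555_088}
channel_spend = {"Hotel": 7_999_569, "Retail": 6_619_931}


def pct_more(a, b):
    """Return how much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100


top_region = max(region_spend, key=region_spend.get)
low_region = min(region_spend, key=region_spend.get)
lisbon_vs_oporto = pct_more(region_spend["Lisbon"], region_spend["Oporto"])
hotel_vs_retail = pct_more(channel_spend["Hotel"], channel_spend["Retail"])

print("Highest / lowest region:", top_region, "/", low_region)
print(f"Lisbon vs Oporto: {lisbon_vs_oporto:.1f}% more")
print(f"Hotel vs Retail:  {hotel_vs_retail:.1f}% more")
```

Both dictionaries sum to the same grand total, which is a quick consistency check on the two aggregations.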


Question 1.2: - There are 6 different varieties of items considered. Do all varieties show
similar behavior across Region and Channel?

Answer 1.2: -

The behavioral characteristics of the data can be summarized by aggregating the data
using the describe function in Python. The data below shows the count, mean, std, min, max and
quartiles of all the varieties across the three regions and two channels.

1. The graph clearly shows that the amount spent on ‘Fresh’ items is higher through the Hotel
channel than through the Retail channel. Also, in every region, spending on Fresh
items is higher in the Hotel channel than in the Retail channel.

2. Looking at the above graphs, we see that some categories, namely Milk, Grocery and
Detergents_Paper, have higher spend in the Retail channel than in Hotel, across all regions.
On the other hand, Fresh and Frozen have higher spend in the Hotel channel than in
Retail, across all regions.

3. The average annual spending on ‘Fresh’ items is higher in the Lisbon region than in the
Oporto region in the Hotel channel, and the opposite trend can be seen in the Retail channel.


4. The average annual spending on ‘Milk’ items is higher in the Retail channel than
in Hotel. Lisbon spends more than the Oporto region on Milk items
through both channels.

5. The average amount spent on ‘Grocery’ items is higher in the Retail channel than
in Hotel. Also, the Lisbon region spends more on Grocery items via the Retail
channel, whereas in the Hotel channel Oporto’s annual spending is the highest across the
country.

6. The average annual spending on ‘Frozen’ items is higher in Hotel than in the Retail
channel. The Oporto region is the major contributor for Hotel, whereas the average spending is
highest in Retail channel.

7. The average annual spending on ‘Detergents Paper’ is much higher in the Retail channel
than in Hotel. The Oporto region spends the most via Retail, whereas
Hotels dominate the spending in the Lisbon region.

8. The average annual spending on ‘Delicatessen’ items is lower in Hotels than in the
Retail channel. The Lisbon region, on average, spends the most via Retail.

The skewness results for all the items are displayed above. They show that the data is right
skewed for all the columns. The item ‘Delicatessen’ is the most strongly right skewed and
asymmetric, with a skewness value of 11.11. Since the data is positively skewed, most
data points have low values, with a long tail of a few very high values.

Question 1.3: - On the basis of descriptive measure of variability, which item shows the
most inconsistent behavior? Which items show the least inconsistent behaviour?

Answer 1.3: -

Descriptive measures of variability are used to describe the amount of variability or spread in a set
of data. The most common measures of variability are the range, the interquartile range (IQR),
variance, standard deviation, and coefficient of variation. We will use coefficient of variation here.

CV= σ/ μ

Where σ=standard deviation and μ=mean

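Standard deviation and coefficient of variation of all items (CV computed as σ/μ from the
means and standard deviations in the descriptive table above):

Item               Standard Deviation   CV
Fresh              12647.3289           1.05
Milk               7380.3772            1.27
Grocery            9503.1628            1.20
Frozen             4854.6733            1.58
Detergents Paper   4767.8544            1.65
Delicatessen       2820.1059            1.85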

From the above results, we found that Delicatessen shows the most inconsistent behavior and Fresh
shows the least inconsistent behavior.
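
As a check on this ranking, the coefficient of variation can be recomputed from the means and standard deviations listed in the descriptive table. This is a minimal sketch, assuming only those reported figures; the variable names are illustrative.

```python
# (mean, standard deviation) per item, as listed in the descriptive table.
item_stats = {
    "Fresh": (12000.2977, 12647.3289),
    "Milk": (5796.2659, 7380.3772),
    "Grocery": (7951.2773, 9503.1628),
    "Frozen": (3071.9318, 4854.6733),
    "Detergents Paper": (2881.4932, 4767.8544),
    "Delicatessen": (1524.8704, 2820.1059),
}

# CV = standard deviation / mean
cv = {item: sd / mean for item, (mean, sd) in item_stats.items()}

most_inconsistent = max(cv, key=cv.get)
least_inconsistent = min(cv, key=cv.get)

for item, value in sorted(cv.items(), key=lambda kv: kv[1]):
    print(f"{item:<17} CV = {value:.2f}")
print("Most inconsistent: ", most_inconsistent)
print("Least inconsistent:", least_inconsistent)
```

Because CV divides the spread by the mean, an item with a small standard deviation can still rank as the most inconsistent when its mean is small.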

The histograms of the items show that ‘Fresh’ and ‘Grocery’ are the most widely spread
in absolute terms and have the highest standard deviations, whereas ‘Delicatessen’ has the
lowest standard deviation. Relative to its mean, however, ‘Delicatessen’ is the most variable
item, which is why the coefficient of variation is used for the comparison above.

When a distribution has lower variability, the values in a dataset are more consistent. However,
when the variability is higher, the data points are more dissimilar and extreme values become more
likely.


Question 1.4: - Are there any outliers in the data?

Answer 1.4: -

Several graphs and methods are available to check for outliers in the dataset.
Plots that can be used include the boxplot and the scatterplot. A data point lying far from
the other data points in the plot is considered an outlier.
We can also calculate the interquartile range (IQR = Q3 - Q1) for each variable. A data point
below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR is then considered an outlier.
The results for the earlier questions, such as the skewness and the behavioral characteristics,
indicate that there are many outliers in the data. We can confirm this with a boxplot of all
the items.

The box plots clearly show that every item in the data has outliers. We can also notice
that the median is not centered within the IQR box for some items, such as Detergents_Paper.
There are extreme outliers for all of these items.
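
The IQR rule that a boxplot applies can be sketched in a few lines of Python. This is an illustrative example only: the values in sample_spend are hypothetical and not taken from the dataset, and the function name iqr_outliers is an assumption, not the notebook's code.

```python
from statistics import quantiles


def iqr_outliers(values):
    """Return the values lying outside the 1.5 * IQR fences."""
    # "inclusive" interpolates between data points, like a typical boxplot.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical spend values, NOT taken from the wholesale dataset.
sample_spend = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
outliers = iqr_outliers(sample_spend)
print(outliers)
```

Applying the same function to each of the six item columns would list the points that the box plots draw beyond the whiskers.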

Question 1.5 - On the basis of your analysis, what are your recommendations for the
business? How can your analysis help the business to solve its problem? Answer from the
business perspective.

Answer 1.5: -

Business Recommendations-

1. Overall sales to Hotels are higher than sales to Retail. The
distributor may consider the Retail channel as a target area for further expansion and growth.
Spend in Hotel needs to be increased in Milk, Grocery and Detergents_Paper. Spend in Retail
needs to be increased in Fresh, Frozen and Delicatessen.

2. The annual spending on Grocery items appears to rise with the number of Retailers in the
region. So, the Retail channel should spend most on Grocery items. The spending should be
managed carefully, as Grocery spending is also quite variable.

3. The annual spending through both channels in all the regions should be managed carefully,
especially in the case of Fresh items. Fresh items have the highest standard deviation in
absolute terms, although they are the least inconsistent relative to their mean. So, the
spending on this item should be planned carefully.

4. The data is not normally distributed, and many outliers are present. This indicates that a
large share of sales can be attributed to some specific buyers. Additional consideration
should be given to these buyers to ensure long-term retention.

5. As the skewness of Delicatessen is very high, sales of this item depend on a few buyers,
which represents a high risk and a high dependency.

6. There needs to be a focus on increasing the spend in the Lisbon and Oporto regions and in the
Retail channel, to balance the spend and reduce risk while increasing business.


Problem Statement 2

The Student News Service at Clear Mountain State University (CMSU) has decided to gather
data about the undergraduate students that attend CMSU. CMSU creates and distributes a survey
of 14 questions and receives responses from 62 undergraduates.

Exploratory Data Analysis: -

The data has 14 variables. ID is the variable that holds the unique row number for each response.

1. There are 6 categorical variables: Gender, Class, Major, Grad Intention, Employment
and Computer.
2. There are 5 integer variables: Age, Social Networking, Satisfaction,
Spending and Text Messages.
3. There are 2 float variables: GPA and Salary.

There are no missing values in the dataset.

Question 2.1: - For this data, construct the following contingency tables

2.1.1. Gender and Major

2.1.2. Gender and Grad Intention

2.1.3. Gender and Employment

2.1.4. Gender and Computer


Answer 2.1.1: - The below table is created using the crosstab function denoting the varied
subjects as Major for each Male and Female.

Answer 2.1.2: - The below table is created using the crosstab function denoting the intention to
be a graduate for each Male and Female.

Answer 2.1.3: - The below table is created using the crosstab function denoting the Employment
status for each Male and Female.

Answer 2.1.4: - The below table is created using the crosstab function denoting the type of
computer available to each Male and Female.
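
A contingency table of this kind can be produced with pandas.crosstab. The sketch below is illustrative only: the six rows are hypothetical stand-ins for the 62 survey responses, and the column names Gender and Major are assumed to match the survey file.

```python
import pandas as pd

# Hypothetical mini-sample, NOT the survey data (the real file has 62 rows).
survey = pd.DataFrame(
    {
        "Gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
        "Major": ["Management", "CIS", "Management", "Accounting", "CIS", "Management"],
    }
)

# Rows: Gender, columns: Major; margins=True adds the "All" totals.
gender_major = pd.crosstab(survey["Gender"], survey["Major"], margins=True)
print(gender_major)
```

Swapping the second argument for the Grad Intention, Employment or Computer column would give the other three tables in the same way.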

Question 2.2: - Assume that the sample is representative of the population of CMSU. Based
on the data, answer the following question:

2.2.1. What is the probability that a randomly selected CMSU student will be male?

Answer 2.2.1: -

Total Students=62

Total Male Students=29

Probability that a randomly selected CMSU student will be Male = 29/62

P(Male) = 0.4677 or 46.77%

2.2.2. What is the probability that a randomly selected CMSU student will be female?

Answer 2.2.2: -

Total number of students = 62

Number of Female students = 33

Probability that a randomly selected CMSU student will be female = 33/62

P(Female) = 0.532 or 53.2%

Question 2.3: - Assume that the sample is representative of the population of CMSU. Based
on the data, answer the following question:

2.3.1 Find the conditional probability of different majors among the male students in
CMSU


Answer 2.3.1: -

Total number of students = 62

Number of males = 29

For each major, the conditional probability follows from

P (Major | Male) = P (Major ∩ Male) / P (Male) = (count/62) / (29/62) = count/29

No.  Major                    Males   P (Major ∩ Male)   P (Major | Male)
1    Accounting               4       4/62               4/29 = 0.138 or 13.8%
2    CIS                      1       1/62               1/29 = 0.034 or 3.4%
3    Economics/Finance        4       4/62               4/29 = 0.138 or 13.8%
4    International Business   2       2/62               2/29 = 0.069 or 6.9%
5    Management               6       6/62               6/29 = 0.207 or 20.7%
6    Retailing/Marketing      5       5/62               5/29 = 0.172 or 17.2%
7    Other                    4       4/62               4/29 = 0.138 or 13.8%
8    Undecided                3       3/62               3/29 = 0.103 or 10.3%

2.3.2 Find the conditional probability of different majors among the female students in
CMSU

Answer 2.3.2: -

Total number of students = 62

Number of females = 33

For each major, the conditional probability follows from

P (Major | Female) = P (Major ∩ Female) / P (Female) = (count/62) / (33/62) = count/33

No.  Major                    Females   P (Major ∩ Female)   P (Major | Female)
1    Accounting               3         3/62                 3/33 = 0.091 or 9.1%
2    CIS                      3         3/62                 3/33 = 0.091 or 9.1%
3    Economics/Finance        7         7/62                 7/33 = 0.212 or 21.2%
4    International Business   4         4/62                 4/33 = 0.121 or 12.1%
5    Management               4         4/62                 4/33 = 0.121 or 12.1%
6    Retailing/Marketing      9         9/62                 9/33 = 0.273 or 27.3%
7    Other                    3         3/62                 3/33 = 0.091 or 9.1%
8    Undecided                0         0/62                 0/33 = 0.0
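
The conditional probabilities for both genders can be reproduced from the counts alone. This is a minimal sketch, assuming only the counts stated in Answers 2.3.1 and 2.3.2; the function name conditional is illustrative.

```python
from fractions import Fraction

# Number of students per major, by gender, as stated in the answers above.
major_counts = {
    "Male": {
        "Accounting": 4, "CIS": 1, "Economics/Finance": 4,
        "International Business": 2, "Management": 6,
        "Retailing/Marketing": 5, "Other": 4, "Undecided": 3,
    },
    "Female": {
        "Accounting": 3, "CIS": 3, "Economics/Finance": 7,
        "International Business": 4, "Management": 4,
        "Retailing/Marketing": 9, "Other": 3, "Undecided": 0,
    },
}


def conditional(counts_for_gender):
    """P(Major | Gender) = count / number of students of that gender."""
    total = sum(counts_for_gender.values())
    return {major: Fraction(n, total) for major, n in counts_for_gender.items()}


p_major_given = {gender: conditional(c) for gender, c in major_counts.items()}

for gender, probs in p_major_given.items():
    for major, p in probs.items():
        print(f"P({major} | {gender}) = {float(p):.3f}")
```

Within each gender the probabilities sum to 1, which confirms that the counts add up to 29 males and 33 females.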

Question 2.4: - Assume that the sample is a representative of the population of CMSU.
Based on the data, answer the following question:

2.4.1 Find the probability that a randomly chosen student is a male and intends to
graduate.

Answer 2.4.1: -

Total number of students = 62

Number of males = 29

Number of males with Graduation Intent [Yes] = 17

The question asks for a joint probability ("male and intends to graduate"), so the
denominator is the whole sample:

P (Male ∩ Graduation Intent [Yes]) = 17/62 = 0.274 or 27.4%

For comparison, the conditional probability among males only is

P (Graduation Intent [Yes] | Male) = (17/62) / (29/62) = 17/29 = 0.586 or 58.6%

2.4.2 Find the probability that a randomly chosen student is a female and does NOT
have a laptop.

Answer 2.4.2: -

Total number of students = 62

Number of females = 33

Number of females not having a laptop = 4

Again the question asks for a joint probability, so the denominator is the whole sample:

P (Female ∩ No Laptop) = 4/62 = 0.065 or 6.5%

For comparison, the conditional probability among females only is

P (No Laptop | Female) = (4/62) / (33/62) = 4/33 = 0.121 or 12.1%
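
The difference between a joint probability ("A and B", out of all students) and a conditional probability ("B given A", out of group A only) lies entirely in the denominator. The following sketch, which assumes only the counts stated in Question 2.4, computes both for comparison; the variable names are illustrative.

```python
from fractions import Fraction

total_students, males, females = 62, 29, 33
male_grad_yes = 17       # males with Graduation Intent = Yes
female_no_laptop = 4     # females not having a laptop

# Joint probabilities: denominator is the whole sample.
joint_male_grad = Fraction(male_grad_yes, total_students)
joint_female_no_laptop = Fraction(female_no_laptop, total_students)

# Conditional probabilities: denominator is the subgroup only.
cond_grad_given_male = Fraction(male_grad_yes, males)
cond_no_laptop_given_female = Fraction(female_no_laptop, females)

for label, p in [
    ("P(Male and Grad Yes)", joint_male_grad),
    ("P(Grad Yes | Male)", cond_grad_given_male),
    ("P(Female and No Laptop)", joint_female_no_laptop),
    ("P(No Laptop | Female)", cond_no_laptop_given_female),
]:
    print(f"{label:<26} = {float(p):.3f}")
```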

Question 2.5: - Assume that the sample is representative of the population of CMSU. Based
on the data, answer the following question:

2.5.1 Find the probability that a randomly chosen student is either a male or has full-
time employment?

Answer 2.5.1: -

Total number of students = 62

Number of males = 29

Number of full-time employees = 10

Number of male full-time employees = 7

P (Male) = 29/62

P (Full-Time Employment) = 10/62

P (Male ∩ Full-Time Employment) = 7/62

P (Male U Full-Time Employment) =
P (Male) + P (Full-Time Employment) - P (Male ∩ Full-Time Employment)
= 29/62 + 10/62 - 7/62 = 32/62

P (Male U Full-Time Employment) = 0.516 or 51.6%


2.5.2 Find the conditional probability that given a female student is randomly chosen, she is
majoring in international business or management.

Answer 2.5.2: -

Total number of students = 62

Number of females = 33

Number of females majoring in International Business or Management = 4 + 4 = 8

P ((International Business or Management) ∩ Female) = 8/62

P (Female) = 33/62

P (International Business or Management | Female) =
P ((International Business or Management) ∩ Female) / P (Female) = 8/33

P (International Business or Management | Female) = 0.242 or 24.2%
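
Both parts of Question 2.5 reduce to short arithmetic: the addition rule for the union, and a ratio of counts for the conditional probability. A minimal sketch, assuming only the counts stated in the two answers:

```python
from fractions import Fraction

n = 62

# 2.5.1: addition rule, P(A or B) = P(A) + P(B) - P(A and B)
p_male = Fraction(29, n)
p_full_time = Fraction(10, n)
p_male_and_full_time = Fraction(7, n)
p_male_or_full_time = p_male + p_full_time - p_male_and_full_time

# 2.5.2: International Business and Management are mutually exclusive majors,
# so their female counts (4 and 4) can simply be added.
p_ib_or_mgmt_given_female = Fraction(4 + 4, 33)

print(f"P(Male or Full-Time)           = {float(p_male_or_full_time):.3f}")
print(f"P(IB or Management | Female)   = {float(p_ib_or_mgmt_given_female):.3f}")
```

Subtracting the intersection avoids counting the 7 male full-time employees twice.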

Question 2.6: - Construct a contingency table of Gender and Intent to Graduate at 2 levels
(Yes/No). The Undecided students are not considered now and the table is a 2x2 table. Do
you think the graduate intention and being female are independent events?

Answer 2.6: -

Two events are independent if P (F ∩ Yes) = P (F) × P (Yes).

Total number of students = 62

Total students Undecided on Grad Intention = 22

Students remaining in the 2x2 table = 62 - 22 = 40

Number of females = 20

Number of students with Graduation Intent Yes = 28

Number of females with Graduation Intent Yes = 11

The 2x2 contingency table implied by these counts is:

Gender   Yes   No   Total
Female   11    9    20
Male     17    3    20
Total    28    12   40

P (F) = 20/40 = 0.50

P (Yes) = 28/40 = 0.70

P (F) × P (Yes) = 0.50 × 0.70 = 0.35

P (F ∩ Yes) = 11/40 = 0.275

Since P (F ∩ Yes) = 0.275 differs from P (F) × P (Yes) = 0.35, graduate intention and being
female do not appear to be independent events in this sample.
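
The independence condition P (F ∩ Yes) = P (F) × P (Yes) can be checked directly from the 2x2 counts. The sketch below assumes the counts stated in this answer (40 students after dropping the 22 Undecided, 20 females, 28 Yes, 11 female Yes); the remaining cells are derived from those totals by subtraction. A formal test such as chi-square would be needed to judge whether the gap is statistically significant; this sketch only compares the sample proportions.

```python
from fractions import Fraction

# 2x2 counts after dropping the Undecided students.
# Female/No, Male/Yes and Male/No are derived from the stated totals.
counts = {
    ("Female", "Yes"): 11,
    ("Female", "No"): 20 - 11,
    ("Male", "Yes"): 28 - 11,
    ("Male", "No"): 40 - 20 - (28 - 11),
}

n = sum(counts.values())
p_female = Fraction(counts[("Female", "Yes")] + counts[("Female", "No")], n)
p_yes = Fraction(counts[("Female", "Yes")] + counts[("Male", "Yes")], n)
p_female_and_yes = Fraction(counts[("Female", "Yes")], n)

# Independence in the sample would require the joint to equal the product.
independent = p_female_and_yes == p_female * p_yes

print(f"P(F) * P(Yes) = {float(p_female * p_yes):.3f}")
print(f"P(F and Yes)  = {float(p_female_and_yes):.3f}")
print("Independent in the sample:", independent)
```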


Question 2.7: - Note that there are four numerical (continuous) variables in the data set,
GPA, Salary, Spending, and Text Messages. Answer the following questions based on the
data

2.7.1. If a student is chosen randomly, what is the probability that his/her GPA is
less than 3?

Answer 2.7.1: -

Total number of students = 62

Number of students’ GPA less than 3 = 17

Probability student’s GPA less than 3 = 17/62

P (GPA < 3.0) = 0.274 or 27.4%

2.7.2. Find the conditional probability that a randomly selected male earns 50 or
more. Find the conditional probability that a randomly selected female earns 50 or more.

Page | 31
Answer 2.7.2: -

Total number of students = 62

1. Total Male Students=29


Total Male Students sal_eq_grt50 = 14
Probability of Random Male sal_eq_grt50 = Total Male Students sal_eq_grt50/Total
Male Students
Probability of Random Male sal_eq_grt50 = 14/29

P (Salary >= 50|Male) = 0.483 or 48.3%

2. Number of females = 33
Number of females salary greater than 50 = 18

Probability of Random female sal_eq_grt50 = Total female Students sal_eq_grt50/Total


female Students
Probability of Random female sal_eq_grt50 = 18/33

P (Salary >= 50|Females) = 0.545 or 54.5%


Question 2.8: - Note that there are four numerical (continuous) variables in the data set,
GPA, Salary, Spending, and Text Messages. For each of them comment whether they
follow a normal distribution. Write a note summarizing your conclusions

Answer 2.8: -

Histograms, box plots and a comparison of mean and median are used to judge whether each of
these four numerical (continuous) variables (GPA, Salary, Spending and Text Messages) follows a
normal distribution.

GPA Variable:

GPA Mean: 3.13
GPA Median: 3.15
GPA Standard Deviation: 0.377

The mean and median are nearly equal, so GPA appears to follow an approximately normal
distribution, and it has no outliers.

Salary Variable:

Salary Mean: 48.55
Salary Median: 50.0
Salary Standard Deviation: 12.08

The visual representations suggest that the data does not follow a normal distribution.
The mean and median of the Salary column differ slightly, which means that the data is
slightly skewed. The boxplot shows that it has outliers.

Spending Variable:

Spending Mean: 482.02
Spending Median: 500.0
Spending Standard Deviation: 221.95

The visual representations suggest that the data does not follow a normal distribution.
The mean and median of the Spending column are not the same, which means the variable is
skewed, and it has outliers.

Text Messages Variable:

Text Messages Mean: 246.21
Text Messages Median: 200.0
Text Messages Standard Deviation: 214.47

The visual representations suggest that the data does not follow a normal distribution.
The mean of the Text Messages column is much larger than its median, so the data is highly
right skewed, and it has outliers.

In the box plots, the GPA whiskers are of roughly equal length, which is consistent with a
symmetric, approximately normal distribution, whereas the box plots of Salary, Spending and
Text Messages have whiskers of different lengths and hence do not look normally distributed.
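
The mean-versus-median comparison used above can be put on a common scale with Pearson's second skewness coefficient, 3 × (mean - median) / standard deviation. This is a rough screen swapped in for illustration, not the method used in the notebook and not a formal normality test; the sketch assumes only the summary figures reported in this answer.

```python
# (mean, median, standard deviation) as reported above for each variable.
summary = {
    "GPA": (3.13, 3.15, 0.377),
    "Salary": (48.55, 50.0, 12.08),
    "Spending": (482.02, 500.0, 221.95),
    "Text Messages": (246.21, 200.0, 214.47),
}


def pearson_median_skew(mean, median, sd):
    """Pearson's second skewness coefficient: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd


skew = {name: pearson_median_skew(*vals) for name, vals in summary.items()}

for name, value in skew.items():
    print(f"{name:<14} {value:+.2f}")
```

Values close to zero are consistent with a symmetric distribution, a positive value points to a right skew and a negative value to a left skew. The histograms and box plots remain necessary, since a symmetric distribution is not necessarily normal.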

Problem Statement 3

An important quality characteristic used by the manufacturers of ABC asphalt shingles is the
amount of moisture the shingles contain when they are packaged. Customers may feel that they
have purchased a product lacking in quality if they find moisture and wet shingles inside the
packaging. In some cases, excessive moisture can cause the granules attached to the shingles for
texture and coloring purposes to fall off the shingles resulting in appearance problems. To
monitor the amount of moisture present, the company conducts moisture tests. A shingle is
weighed and then dried. The shingle is then reweighed, and based on the amount of moisture
taken out of the product, the pounds of moisture per 100 square feet are calculated. The company
would like to show that the mean moisture content is less than 0.35 pound per 100 square feet.

Exploratory Data Analysis: -

The dataset has 2 variables, A and B, which hold the measurements of moisture present per
100 sq. ft. Both variables are of float data type.

Question 3.1. Do you think there is evidence that mean moisture contents in both types of
shingles are within the permissible limits? State your conclusions clearly showing all steps.


Answer 3.1.: -

Let

𝜇A = Mean moisture content in shingle A

𝜇B = Mean moisture content in shingle B

For Shingle A

Step I

The company would like to show that the mean moisture content is less than 0.35, so that
claim is placed in the alternative hypothesis.
Null hypothesis (𝐻0) states that the mean moisture content in shingles A, 𝜇A, is greater than
or equal to 0.35.
Alternative hypothesis (𝐻𝐴) states that the mean moisture content in shingles A, 𝜇A, is less
than 0.35.

𝐻0: 𝜇A >= 0.35
𝐻𝐴: 𝜇A < 0.35

Step II

Since 𝛼 is not given, we select 𝛼 = 0.05.

The sample size (n) for this problem is 36.

Step III

We do not know the population standard deviation and n = 36. So, we use the t
distribution and the 𝑡𝑆𝑇𝐴𝑇 test statistic.

Step IV

We conduct a one-sample t test on Shingles A (n = 36) using a Python function that
compares the mean of a sample to a pre-specified value and tests for a deviation from that
value.

One-sample t test, from the Python notebook:
t statistic: -1.473505
P value: 0.07477633

The reported P value for this negative t statistic is the lower-tail probability, which
matches the alternative hypothesis 𝜇A < 0.35.

Step V

Level of significance: 0.05

P value > Level of significance.
The P value is 0.074776, which is greater than the 5% level of significance.

So, the statistical decision is to fail to reject the null hypothesis at the 5% level of
significance.

Conclusion

Hence, at the 5% level of significance, there is not sufficient evidence to conclude that the
mean moisture content in A shingles is less than 0.35 pound per 100 square feet.

For Shingle B

Step I

Null hypothesis (𝐻0) states that the mean moisture content in shingles B, 𝜇B, is greater than
or equal to 0.35 pound per 100 square feet.
Alternative hypothesis (𝐻𝐴) states that the mean moisture content in shingles B, 𝜇B, is less
than 0.35 pound per 100 square feet.

𝐻0: 𝜇B >= 0.35
𝐻𝐴: 𝜇B < 0.35

Step II

Since 𝛼 is not given, we select 𝛼 = 0.05.

The sample size (n) for this problem is 31.

Step III

We do not know the population standard deviation and n = 31. So, we use the t
distribution and the 𝑡𝑆𝑇𝐴𝑇 test statistic.

Step IV

One-sample t test, from the Python notebook:

t statistic: -3.10033130
P value: 0.0020904774

As for shingles A, the reported P value is the lower-tail probability for the negative
t statistic.

Step V

Level of significance: 0.05

P value < Level of significance.
The P value is 0.00209, which is less than the 5% level of significance.

So, the statistical decision is to reject the null hypothesis at the 5% level of significance.

Conclusion

Hence, at the 5% level of significance, there is sufficient evidence to conclude that the mean
moisture content in B shingles is less than 0.35 pound per 100 square feet.
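
The reported P values can be cross-checked from the t statistics and sample sizes alone. The sketch below assumes SciPy is available and uses only the figures reported above; it evaluates the lower tail of the t distribution with n - 1 degrees of freedom, which corresponds to an alternative hypothesis of the form 𝜇 < 0.35. The helper name left_tail_p is illustrative and not taken from the notebook.

```python
from scipy import stats

ALPHA = 0.05


def left_tail_p(t_stat, n):
    """Lower-tail p-value of a one-sample t test with n - 1 degrees of freedom."""
    return stats.t.cdf(t_stat, df=n - 1)


# t statistics and sample sizes as reported for shingles A and B.
p_a = left_tail_p(-1.473505, 36)
p_b = left_tail_p(-3.10033130, 31)

reject_a = p_a < ALPHA
reject_b = p_b < ALPHA

print(f"Shingles A: p = {p_a:.5f}, reject H0: {reject_a}")
print(f"Shingles B: p = {p_b:.5f}, reject H0: {reject_b}")
```

With the raw samples at hand, scipy.stats.ttest_1samp with alternative="less" (available in recent SciPy versions) would give the same one-tailed P value directly.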

Question 3.2: - Do you think that the population mean for shingles A and B are equal?
Form the hypothesis and conduct the test of the hypothesis. What assumption do you need
to check before the test for equality of means is performed?

Answer 3.2: -

To test whether the population mean moisture content is the same for shingles A and B, the
following assumptions are made and should be checked:

1. We assume that the samples are random and independent, and that both populations are
normally distributed.
2. We assume unequal variances of the populations.
3. We assume that the scale of measurement applied to the data collected is
continuous or ordinal.
4. We assume that the sample sizes (n = 36 and n = 31) are large enough for the distribution
of the sample means to approach a normal bell-shaped curve.

Step I

Null hypothesis (𝐻0) states that the population mean in shingles is the same, 𝜇𝐴 equals
𝜇𝐵.
Alternative hypothesis (𝐻𝐴) states that the population mean in shingles is different, 𝜇𝐴 is
not equal to 𝜇𝐵.

𝐻0: 𝜇𝐴 - 𝜇𝐵 = 0 i.e. 𝜇𝐴 = 𝜇𝐵
𝐻𝐴: 𝜇𝐴 - 𝜇𝐵 ≠ 0 i.e. 𝜇𝐴 ≠ 𝜇𝐵

Step II

Since 𝛼 is not given, we select 𝛼 = 0.05.

Step III

We have two samples and we do not know the population standard deviations.

The samples are not very large. So, we use the t distribution and the 𝑡𝑆𝑇𝐴𝑇 test
statistic for a two-sample unpaired test.

Step IV

We use scipy.stats.ttest_ind to calculate the t-test for the means of two
independent samples of shingles, given the two sets of sample observations. This function
returns the t statistic and the two-tailed p value. This is a two-sided test for the null
hypothesis that 2 independent samples have identical average (expected) values. By default
the function assumes that the populations have identical variances; passing equal_var=False
performs Welch's t-test, which does not make that assumption.

Independent-sample t-test, assuming unequal variances:

t Stat = 1.28851
P Value = 0.20226

Step V

Level of significance: 0.05

P value > Level of significance.
The P value is 0.20226, which is greater than the 5% level of significance.

So, the statistical decision is to fail to reject the null hypothesis at the 5% level of
significance.

Conclusion

Hence, at the 5% level of significance, there is not sufficient evidence to conclude that the
population mean for shingles A differs from the population mean for shingles B.
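
A two-sample test with unequal variances can be sketched as follows. The readings below are hypothetical and are not the assignment data; they only illustrate the call scipy.stats.ttest_ind with equal_var=False (Welch's t-test) and show that its statistic matches the textbook formula.

```python
import math
from statistics import mean, variance

from scipy import stats

# Hypothetical moisture readings, NOT the shingles A and B samples.
sample_a = [0.31, 0.35, 0.29, 0.33, 0.36, 0.30, 0.34]
sample_b = [0.27, 0.30, 0.26, 0.31, 0.28, 0.25]

# equal_var=False requests Welch's t-test (no equal-variance assumption).
result = stats.ttest_ind(sample_a, sample_b, equal_var=False)

# Manual Welch statistic: difference of means over the combined standard error.
std_error = math.sqrt(
    variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
)
t_manual = (mean(sample_a) - mean(sample_b)) / std_error

print(f"t = {result.statistic:.4f}, two-tailed p = {result.pvalue:.4f}")
print(f"manual t = {t_manual:.4f}")
```

A P value above the chosen 𝛼 means the null hypothesis of equal means is not rejected; it does not prove that the means are equal.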
