
Great Learning

GRADED PROJECT

ADVANCED STATISTICS

Done By

Thejaswin S

LIST OF FIGURES

Fig 3.1 Distribution Plot
Fig 3.2 Distribution Plot
Fig 3.3 Distribution Plot
Fig 3.4 Distribution Plot
Fig 4.1 Distribution Plot
Fig 4.2 Distribution Plot
Fig 4.3 Distribution Plot
Fig 5 Distribution of Stone surface
Fig 6 Distribution of PushUps
Fig 7.6.1 Interaction Plot for Alloy 1
Fig 7.6.2 Interaction Plot for Alloy 2

LIST OF TABLES

Table 7.4.1 Tukey's HSD MultiComparison Alloy 1
Table 7.4.2 Tukey's HSD MultiComparison Alloy 2
Table 7.6 ANOVA comparison
Table 7.7.1 ANOVA Test for Methods, Dentists and Interaction
Table 7.7.2 Tukey's HSD Results for Comparison of Methods

Problem 1

A physiotherapist with a male football team is interested in studying, from the data
collected, the relationship between foot injuries and the positions at which the players play.

1.1 What is the probability that a randomly chosen player would suffer an injury?
P(Player Injured) = N(players injured)/ N(Total Players)
P(Player Injured) = 145/235
P(Player Injured) = 0.617

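1.2 What is the probability that a player is a forward or a winger?

Since each player plays in only one position, "forward" and "winger" are mutually
exclusive events, so their probabilities add:
P(forward or winger) = P(forward) + P(winger)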
P(forward or winger) = 94/235 + 29/235
P(forward or winger) = 0.523

1.3 What is the probability that a randomly chosen player plays in a striker position
and has a foot injury?
P(Striker and Injured) = 45/235
P(Striker and Injured) = 0.191

1.4 What is the probability that a randomly chosen injured player is a striker?
P(Striker / Injured) = P(Injured and Striker) / P(Injured)
P(Striker / Injured) = (45/235) / (145/235)
P(Striker / Injured) = 45/145
P(Striker / Injured) = 0.31

1.5 What is the probability that a randomly chosen injured player is either a forward
or an attacking midfielder?
P((forward or attacking midfielder) / Injured) =
(P(forward and Injured) + P(attacking midfielder and Injured)) / P(Injured)
P((forward or attacking midfielder) / Injured) = (56/235 + 24/235) / (145/235)
P((forward or attacking midfielder) / Injured) = 80/145
P((forward or attacking midfielder) / Injured) = 0.552
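The arithmetic in 1.1 to 1.5 can be cross-checked with a few lines of Python. This is a minimal sketch that hard-codes only the counts quoted in the answers above (the full position-by-injury table is not reproduced here); the constant and function names are illustrative, not taken from the original analysis.

```python
# Counts quoted in the Problem 1 answers (illustrative names).
TOTAL_PLAYERS = 235
INJURED = 145
FORWARDS = 94
WINGERS = 29
STRIKERS_INJURED = 45
FORWARDS_INJURED = 56
ATTACKING_MIDFIELDERS_INJURED = 24


def problem1_probabilities():
    """Return the probabilities asked for in 1.1 to 1.5."""
    return {
        # 1.1 marginal probability of an injury
        "injured": INJURED / TOTAL_PLAYERS,
        # 1.2 positions are mutually exclusive, so the counts simply add
        "forward_or_winger": (FORWARDS + WINGERS) / TOTAL_PLAYERS,
        # 1.3 joint probability: striker AND injured
        "striker_and_injured": STRIKERS_INJURED / TOTAL_PLAYERS,
        # 1.4 conditioning on injury restricts the denominator to injured players
        "striker_given_injured": STRIKERS_INJURED / INJURED,
        # 1.5 same idea, with two disjoint positions in the numerator
        "forward_or_am_given_injured":
            (FORWARDS_INJURED + ATTACKING_MIDFIELDERS_INJURED) / INJURED,
    }


for name, p in problem1_probabilities().items():
    print(f"{name}: {p:.3f}")
```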

Problem 2

An independent research organization is trying to estimate the probability that an
accident at a nuclear power plant will result in radiation leakage. The types of
accidents possible at the plant are fire hazards, mechanical failures, and human errors.
The research organization also knows that two or more types of accidents cannot
occur simultaneously.

According to the studies carried out by the organization, the probability of a radiation
leak in case of a fire is 20%, the probability of a radiation leak in case of a
mechanical failure is 50%, and the probability of a radiation leak in case of a human
error is 10%. The studies also showed the following:

The probability of a radiation leak occurring simultaneously with a fire is 0.1%.
The probability of a radiation leak occurring simultaneously with a mechanical failure
is 0.15%.
The probability of a radiation leak occurring simultaneously with a human error is
0.12%.
On the basis of the information available, answer the questions below:

2.1 What are the probabilities of a fire, a mechanical failure, and a human error,
respectively?
Let's denote the events as follows:
F = Accident is a fire hazard
M = Accident is a mechanical failure
H = Accident is a human error
R = Radiation leakage occurs

Given:
P(R/F) = 20% = 0.2
P(R/M) = 50% = 0.5
P(R/H) = 10% = 0.1
P(R∩F) = 0.1% = 0.001
P(R∩M) = 0.15% = 0.0015
P(R∩H) = 0.12% = 0.0012

Since P(R∩F) = P(R/F) * P(F), each accident probability equals the joint probability
divided by the conditional probability:

P(F) = P(R∩F) / P(R/F)
P(F) = 0.001 / 0.2 = 0.005

P(M) = P(R∩M) / P(R/M)
P(M) = 0.0015 / 0.5 = 0.003

P(H) = P(R∩H) / P(R/H)
P(H) = 0.0012 / 0.1 = 0.012

The probabilities of a fire, a mechanical failure, and a human error are 0.005,
0.003, and 0.012, respectively.

2.2 What is the probability of a radiation leak?

Because two accident types cannot occur simultaneously, and assuming that a radiation
leak can only result from one of these three accidents, the law of total probability gives:
P(R) = P(R/F) * P(F) + P(R/M) * P(M) + P(R/H) * P(H)
P(R) = (0.2 * 0.005) + (0.5 * 0.003) + (0.1 * 0.012)
P(R) = 0.001 + 0.0015 + 0.0012
P(R) = 0.0037
The probability of a radiation leak is 0.0037.

2.3 Suppose there has been a radiation leak in the reactor for which the definite
cause is not known. What is the probability that it has been caused by a fire, by a
mechanical failure, and by a human error?

P(F/R) = Probability of a fire given a radiation leak
P(M/R) = Probability of a mechanical failure given a radiation leak
P(H/R) = Probability of a human error given a radiation leak

Using Bayes' theorem:

P(F/R) = (P(R/F) * P(F)) / P(R)
P(M/R) = (P(R/M) * P(M)) / P(R)
P(H/R) = (P(R/H) * P(H)) / P(R)

P(F/R) = (0.2 * 0.005) / 0.0037 = 0.270
P(M/R) = (0.5 * 0.003) / 0.0037 = 0.405
P(H/R) = (0.1 * 0.012) / 0.0037 = 0.324

The probabilities that the radiation leak has been caused by a fire, a mechanical
failure, and a human error are approximately 0.270, 0.405, and 0.324,
respectively.
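The three steps of Problem 2 (recovering P(F), P(M) and P(H) from the joint and conditional probabilities, summing to get P(R), then applying Bayes' theorem) can be sketched in Python as below. It assumes, as the text does, that the accident types are mutually exclusive and that a leak only follows one of them; the dictionary and function names are illustrative.

```python
# Given quantities (F = fire, M = mechanical failure, H = human error).
p_leak_given = {"F": 0.2, "M": 0.5, "H": 0.1}        # P(R/X)
p_leak_and = {"F": 0.001, "M": 0.0015, "H": 0.0012}  # P(R∩X)


def solve_problem2():
    # 2.1: P(X) = P(R∩X) / P(R/X)
    priors = {k: p_leak_and[k] / p_leak_given[k] for k in p_leak_given}
    # 2.2: total probability over the three mutually exclusive accident types
    p_leak = sum(p_leak_given[k] * priors[k] for k in priors)
    # 2.3: Bayes' theorem, P(X/R) = P(R/X) * P(X) / P(R)
    posteriors = {k: p_leak_given[k] * priors[k] / p_leak for k in priors}
    return priors, p_leak, posteriors


priors, p_leak, posteriors = solve_problem2()
print(priors, round(p_leak, 4), {k: round(v, 3) for k, v in posteriors.items()})
```

Because the posteriors share the denominator P(R), they sum to 1 by construction, which is a quick sanity check on the hand calculation.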

Problem 3:

The breaking strength of gunny bags used for packaging cement is normally
distributed with a mean of 5 kg per sq. centimeter and a standard deviation of 1.5 kg
per sq. centimeter. The quality team of the cement company wants to know the
following about the packaging material to better understand wastage or pilferage
within the supply chain. Answer the questions below based on the given information.
(Provide an appropriate visual representation of your answers, without which marks
will be deducted)

Given: the breaking strength X is normally distributed with mean 5 and standard
deviation 1.5 (kg per sq cm).

3.1 What proportion of the gunny bags have a breaking strength less than 3.17 kg
per sq cm?

Fig 3.1 Distribution Plot

z = (3.17 - 5) / 1.5 = -1.22, so P(X < 3.17) = P(Z < -1.22) ≈ 0.1112.
The proportion of gunny bags that have a breaking strength less than 3.17 kg per sq
cm is 11.12%.

3.2 What proportion of the gunny bags have a breaking strength of at least 3.6 kg per
sq cm?

Fig 3.2 Distribution Plot
z = (3.6 - 5) / 1.5 ≈ -0.933, so P(X ≥ 3.6) = 1 - P(Z < -0.933) ≈ 0.8247.
The proportion of gunny bags that have a breaking strength of at least 3.6 kg per sq cm
is 82.47%.

3.3 What proportion of the gunny bags have a breaking strength between 5 and 5.5
kg per sq cm?

Fig 3.3 Distribution Plot

z = 0 at 5 kg and z = (5.5 - 5) / 1.5 ≈ 0.333 at 5.5 kg, so
P(5 < X < 5.5) = P(Z < 0.333) - 0.5 ≈ 0.1306.
The proportion of gunny bags that have a breaking strength between 5 and 5.5 kg per
sq cm is 13.06%.

3.4 What proportion of the gunny bags have a breaking strength NOT between 3 and
7.5 kg per sq cm?

Fig 3.4 Distribution Plot
z = (3 - 5) / 1.5 ≈ -1.333 and z = (7.5 - 5) / 1.5 ≈ 1.667, so
P(X < 3) + P(X > 7.5) ≈ 0.0912 + 0.0478 = 0.1390.
The proportion of gunny bags that have a breaking strength NOT between 3 and
7.5 kg per sq cm is 13.9%.
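The figures in Problem 3 are plots; the proportions themselves can be reproduced with SciPy's normal distribution. A minimal sketch, assuming `scipy` is available (the helper name is illustrative): `norm.cdf` gives the lower-tail area and `norm.sf` the upper-tail area, so every answer is a sum or difference of tail areas.

```python
from scipy.stats import norm

MEAN, SD = 5.0, 1.5  # breaking strength of gunny bags, kg per sq cm


def gunny_bag_proportions():
    """Return the proportions asked for in 3.1 to 3.4."""
    # 3.1: P(X < 3.17)
    p_below_317 = norm.cdf(3.17, loc=MEAN, scale=SD)
    # 3.2: P(X >= 3.6), the upper tail
    p_at_least_36 = norm.sf(3.6, loc=MEAN, scale=SD)
    # 3.3: P(5 < X < 5.5)
    p_5_to_55 = norm.cdf(5.5, loc=MEAN, scale=SD) - norm.cdf(5.0, loc=MEAN, scale=SD)
    # 3.4: P(X < 3) + P(X > 7.5)
    p_outside = norm.cdf(3.0, loc=MEAN, scale=SD) + norm.sf(7.5, loc=MEAN, scale=SD)
    return p_below_317, p_at_least_36, p_5_to_55, p_outside


for label, p in zip(("3.1", "3.2", "3.3", "3.4"), gunny_bag_proportions()):
    print(f"{label}: {p:.2%}")
```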

Problem 4:

Grades of the final examination in a training course are found to be normally
distributed, with a mean of 77 and a standard deviation of 8.5. Based on the given
information, answer the questions below.

4.1 What is the probability that a randomly chosen student gets a grade below 85 on
this exam?

Fig 4.1 Distribution Plot

z = (85 - 77) / 8.5 ≈ 0.941, so P(X < 85) = P(Z < 0.941) ≈ 0.827.
The probability that a randomly chosen student gets a grade below 85 on this exam
is 0.827.

4.2 What is the probability that a randomly selected student scores between 65 and
87?

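Fig 4.2 Distribution Plot

z = (65 - 77) / 8.5 ≈ -1.412 and z = (87 - 77) / 8.5 ≈ 1.176, so
P(65 < X < 87) ≈ 0.8803 - 0.0790 ≈ 0.801.
The probability that a randomly selected student scores between 65 and 87 is 0.801.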

4.3 What should be the passing cut-off so that 75% of the students clear the exam?

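Fig 4.3 Distribution Plot

A student clears the exam by scoring at or above the cut-off, so 75% of the grades
must lie above it; the cut-off is therefore the 25th percentile of the distribution:
77 - 0.6745 * 8.5 ≈ 71.27.
71.27 should be the passing cut-off so that 75% of the students clear the exam.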
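A similar sketch for Problem 4, again assuming SciPy. The key point for 4.3 is that `norm.ppf` returns a lower-tail quantile, so requiring 75% of students to pass means asking for the 25th percentile; the variable names are illustrative.

```python
from scipy.stats import norm

MEAN, SD = 77.0, 8.5  # final examination grades

# 4.1: P(X < 85)
p_below_85 = norm.cdf(85, loc=MEAN, scale=SD)
# 4.2: P(65 < X < 87)
p_65_to_87 = norm.cdf(87, loc=MEAN, scale=SD) - norm.cdf(65, loc=MEAN, scale=SD)
# 4.3: 75% of students must score ABOVE the cut-off, so the cut-off is the
# 25th percentile (norm.ppf(0.75, ...) would let only 25% of students pass).
cutoff = norm.ppf(0.25, loc=MEAN, scale=SD)

print(f"4.1: {p_below_85:.3f}  4.2: {p_65_to_87:.3f}  4.3: {cutoff:.2f}")
```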

Problem 5:

Zingaro stone printing is a company that specialises in printing images or patterns on
polished or unpolished stones. However, for the optimum level of printing of the
image, the stone surface has to have a Brinell's hardness index of at least 150.
Recently, Zingaro has received a batch of polished and unpolished stones from its
clients. Use the data provided to answer the following (assuming a 5% significance
level).

Fig 5 Distribution of Stone surface

5.1 Earlier experience of Zingaro with this particular client is favorable as the stone
surface was found to be of adequate hardness. However, Zingaro has reason to
believe now that the unpolished stones may not be suitable for printing. Do you think
Zingaro is justified in thinking so?

We can leverage the scipy.stats package, which offers functions to compute the
t-statistic and its p-value. We will perform a two-tailed test.

Let's say,
Null Hypothesis (H0): Unpolished stones are suitable for printing (their mean
Brinell hardness is at least 150).
Alternate Hypothesis (H1): Unpolished stones may not be suitable for printing (their
mean Brinell hardness is below 150).
Significance level: 5% (0.05)
t-statistic: -3.2422320501414053
p-value: 0.0014655150194628353

Since the p-value is less than alpha (0.05), we reject the null hypothesis.

Yes, Zingaro is justified in thinking that unpolished stones may not be suitable for
printing.
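The question asks whether the mean hardness of the unpolished stones falls below the 150 threshold, which can be framed as a one-sample, left-tailed t-test. A minimal sketch of that framing, assuming SciPy 1.6 or later (for the `alternative` argument); the helper name is hypothetical and the readings are invented in place of the Zingaro data.

```python
from scipy import stats

THRESHOLD = 150  # minimum Brinell hardness index for optimum printing


def unpolished_unsuitable(hardness, alpha=0.05):
    """One-sample, left-tailed t-test of H0: mean >= 150 against H1: mean < 150.

    Returns the t-statistic, the p-value and whether H0 is rejected at `alpha`.
    """
    result = stats.ttest_1samp(hardness, popmean=THRESHOLD, alternative="less")
    return result.statistic, result.pvalue, bool(result.pvalue < alpha)


# Hypothetical hardness readings for illustration; NOT the Zingaro data set.
sample = [132.0, 141.5, 128.3, 139.9, 135.2, 130.7, 144.1, 127.6]
t_stat, p_value, reject_h0 = unpolished_unsuitable(sample)
print(f"t = {t_stat:.3f}, p = {p_value:.5f}, reject H0: {reject_h0}")
```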

5.2 Is the mean hardness of the polished and unpolished stones the same?
Null Hypothesis (H0): The mean hardness of the polished and unpolished stones
is equal.
Alternate Hypothesis (H1): The mean hardness of the polished and unpolished
stones is not equal.
We therefore perform an independent two-sample, two-tailed t-test.
Ttest_indResult(statistic=-3.2422320501414053, pvalue=0.0014655150194628353)
Since the p-value is less than the 0.05 significance level, we reject the null hypothesis.
No, the mean hardness of the polished and unpolished stones is not the same.
The mean hardness of polished stones is 147.78.
The mean hardness of unpolished stones is 134.11.
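For 5.2, SciPy's `ttest_ind` performs the independent two-sample, two-tailed t-test described above. The sketch below wraps it in a small helper (the name is illustrative) and runs it on made-up readings, since the original data set is not reproduced in this report.

```python
from scipy import stats


def compare_mean_hardness(unpolished, polished, alpha=0.05):
    """Independent two-sample, two-tailed t-test of H0: equal mean hardness."""
    # ttest_ind pools the two variances by default (equal_var=True);
    # pass equal_var=False to get Welch's test instead.
    result = stats.ttest_ind(unpolished, polished)
    return result.statistic, result.pvalue, bool(result.pvalue < alpha)


# Hypothetical readings for illustration only.
unpolished = [130.2, 136.8, 128.9, 139.4, 133.1, 135.7]
polished = [146.3, 149.8, 144.2, 151.0, 147.5, 148.1]
t_stat, p_value, reject_h0 = compare_mean_hardness(unpolished, polished)
print(f"t = {t_stat:.3f}, p = {p_value:.5f}, reject H0: {reject_h0}")
```

Passing the unpolished sample first gives a negative t-statistic when its mean is lower, which matches the sign of the statistic reported above.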

Problem 6

Aquarius health club, one of the largest and most popular cross-fit gyms in the
country, has been advertising a rigorous program for body conditioning. The program
is considered successful if the candidate is able to do more than 5 additional push-ups
compared with when he/she enrolled in the program. Using the sample data provided,
can you conclude whether the program is successful? (Consider the level of
significance as 5%.)

Fig 6 Distribution of PushUps

This is a paired t-test problem. Since the claim is that the training increases the
push-up count by more than 5, the null and alternative hypotheses must be
formed accordingly.

We perform a paired t-test on the two related samples of push-up counts, because
each pair of counts (before and after the program) comes from the same candidate.
We compute the difference in push-ups (after minus before) for each candidate.

Let's say,
Null Hypothesis (H0): The mean difference in pushups is less than or equal to 5.
Alternative Hypothesis (H1): The mean difference in pushups is greater than 5.
Significance level: 5% (0.05)

t-statistic: 19.322619811082458
p-value: 2.292 × 10^(-35)
Since the p-value is less than alpha (0.05), we reject the null hypothesis.
Thus the program is successful: the mean increase is significantly greater than 5 push-ups.
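SciPy's `ttest_rel` tests whether the mean paired difference is zero and has no option for a non-zero margin, so one way to test "more than 5 push-ups" is a one-sample t-test on the differences against 5 with `alternative="greater"` (SciPy 1.6 or later). A minimal sketch with hypothetical counts and an illustrative helper name; it is not necessarily how the reported t-statistic was obtained.

```python
import numpy as np
from scipy import stats

MARGIN = 5  # the program must add more than 5 push-ups on average


def program_successful(before, after, alpha=0.05):
    """Paired, right-tailed test of H0: mean(after - before) <= 5."""
    diff = np.asarray(after, dtype=float) - np.asarray(before, dtype=float)
    # A paired t-test is a one-sample t-test on the per-candidate differences;
    # testing against MARGIN instead of 0 builds the "more than 5" claim into H0.
    result = stats.ttest_1samp(diff, popmean=MARGIN, alternative="greater")
    return result.statistic, result.pvalue, bool(result.pvalue < alpha)


# Hypothetical push-up counts for eight candidates (not the gym's sample).
before = [10, 12, 8, 15, 11, 9, 14, 13]
after = [19, 20, 17, 24, 21, 18, 22, 23]
t_stat, p_value, success = program_successful(before, after)
print(f"t = {t_stat:.3f}, p = {p_value:.2e}, successful: {success}")
```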

Problem 7

Dental implant data: The hardness of metal implants in dental cavities depends on
multiple factors, such as the method of implant, the temperature at which the metal is
treated, the alloy used, and the dentist, who may favor one method over another and
may work better with his/her favorite method. The response (implant hardness) is the
variable of interest.

7.1 Test whether there is any difference among the dentists on implant hardness.
State the null and alternative hypotheses. Note that both types of alloys cannot be
considered together. You must state the null and alternative hypotheses separately
for the two types of alloys.

To test whether there is any difference among the dentists in implant hardness, we
can use a one-way ANOVA (Analysis of Variance), since only one factor (dentist) is
involved. This test helps us determine whether the mean implant hardness differs
significantly across dentists.

The null and alternative hypotheses for the one-way ANOVA test are as follows:

For Alloy 1:
Null Hypothesis (H0): The mean implant hardness is the same for all dentists.
Alternative Hypothesis (H1): The mean implant hardness differs for at least one
dentist.

For Alloy 2:
Null Hypothesis (H0): The mean implant hardness is the same for all dentists.
Alternative Hypothesis (H1): The mean implant hardness differs for at least one
dentist.

7.2. Before the hypotheses may be tested, state the required assumptions. Are the
assumptions fulfilled? Comment separately on both alloy types.

The required assumptions are:

Independence of observations: The data points are independent of each other.
Each hardness measurement comes from a different implant, and there should
be no relationship between the measurements.
Normality: The hardness measurements for each group (dentist) should follow a
normal distribution.
Homogeneity of variance: The variance of the hardness measurements should be
equal across all groups (dentists).

For Alloy 1:
Independence of observations: Assuming that each hardness measurement comes from a
different implant, this assumption is likely to be fulfilled.

Normality: To check the normality assumption, we can perform a normality test, such as
the Shapiro-Wilk test, on the hardness measurements for each dentist group.
Let's say,
Null Hypothesis: The data follow a normal distribution.
Alternate Hypothesis: The data do not follow a normal distribution.
Shapiro-Wilk Test for Alloy 1:
Dentist 1: statistic = 0.9114, p-value = 0.3255
Dentist 2: statistic = 0.9642, p-value = 0.8415
Dentist 3: statistic = 0.8721, p-value = 0.1295

The p-value for every dentist is greater than alpha (0.05), so we fail to reject the
null hypothesis: there is no evidence against normality, and the assumption appears
to be fulfilled.

Homogeneity of variance: We can perform Levene's test (null hypothesis: all dentist
groups have equal variances) to check the homogeneity of variance assumption.
Levene's Test for Alloy 1: statistic = 0.1986, p-value = 0.8212
The p-value is greater than 0.05, so we fail to reject the null hypothesis; the
homogeneity of variance assumption is fulfilled.

For Alloy 2:
Independence of observations: Assuming that each hardness measurement comes from a
different implant, this assumption is likely to be fulfilled.

Normality: To check the normality assumption, we can perform a normality test, such as
the Shapiro-Wilk test, on the hardness measurements for each dentist group.
Let's say,
Null Hypothesis: The data follow a normal distribution.
Alternate Hypothesis: The data do not follow a normal distribution.
Shapiro-Wilk Test for Alloy 2:
Dentist 1: statistic = 0.9040, p-value = 0.2759
Dentist 2: statistic = 0.9392, p-value = 0.5735
Dentist 3: statistic = 0.9341, p-value = 0.5213

The p-value for every dentist is greater than alpha (0.05), so we fail to reject the
null hypothesis: there is no evidence against normality, and the assumption appears
to be fulfilled.

Homogeneity of variance: We can perform Levene's test (null hypothesis: all dentist
groups have equal variances) to check the homogeneity of variance assumption.
Levene's Test for Alloy 2: statistic = 1.1315, p-value = 0.3392
The p-value is greater than 0.05, so we fail to reject the null hypothesis; the
homogeneity of variance assumption is fulfilled.
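The assumption checks in 7.2 map onto `scipy.stats.shapiro` and `scipy.stats.levene`. A minimal sketch of a helper (hypothetical name) that runs both for one alloy, given the hardness values grouped by dentist; the example values are invented for illustration.

```python
from scipy import stats


def check_anova_assumptions(groups, alpha=0.05):
    """Check normality (per group) and equal variances (across groups).

    `groups` maps a group label, e.g. a dentist, to that group's hardness values.
    Returns ({label: normality_not_rejected}, equal_variance_not_rejected).
    """
    # Shapiro-Wilk: H0 = the group's values come from a normal distribution.
    normal = {label: bool(stats.shapiro(values).pvalue > alpha)
              for label, values in groups.items()}
    # Levene: H0 = all groups have equal variances. SciPy's default
    # center="median" is the Brown-Forsythe variant of the test.
    equal_var = bool(stats.levene(*groups.values()).pvalue > alpha)
    return normal, equal_var


# Hypothetical hardness values for three dentists (not the implant data).
example = {
    "Dentist 1": [740.0, 750.0, 760.0, 770.0, 780.0, 790.0],
    "Dentist 2": [700.0, 710.0, 720.0, 730.0, 740.0, 750.0],
    "Dentist 3": [760.0, 770.0, 780.0, 790.0, 800.0, 810.0],
}
normal, equal_var = check_anova_assumptions(example)
print(normal, equal_var)
```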

7.3. Irrespective of your conclusion in 2, we will continue with the testing procedure.
What do you conclude regarding whether implant hardness depends on dentists?
Clearly state your conclusion. If the null hypothesis is rejected, is it possible to
identify which pairs of dentists differ?

For Alloy 1:
Null Hypothesis (H0): The mean implant hardness is the same for all dentists.
Alternative Hypothesis (H1): The mean implant hardness differs for at least one
dentist.

F-statistic: 1.1232073892024739
p-value: 0.3417393954842689
As the p-value is greater than alpha (0.05), we fail to reject the null hypothesis.
There is no significant difference among the dentists in implant hardness for
Alloy 1, so there is no need to identify differing pairs of dentists with a post-hoc test.

For Alloy 2:
Null Hypothesis (H0): The mean implant hardness is the same for all dentists.
Alternative Hypothesis (H1): The mean implant hardness differs for at least one
dentist.

F-statistic: 0.26968540577569117
p-value: 0.7659030899578484
As the p-value is greater than alpha (0.05), we fail to reject the null hypothesis.
There is no significant difference among the dentists in implant hardness for
Alloy 2, so there is no need to identify differing pairs of dentists with a post-hoc test.
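The one-way ANOVA used in 7.3 corresponds to `scipy.stats.f_oneway`. A minimal sketch, assuming the hardness values for one alloy have already been grouped by dentist; the helper name, the grouping and the numbers below are hypothetical.

```python
from scipy import stats


def one_way_anova(groups, alpha=0.05):
    """One-way ANOVA of H0: every group (e.g. dentist) has the same mean hardness."""
    result = stats.f_oneway(*groups.values())
    return result.statistic, result.pvalue, bool(result.pvalue < alpha)


# Hypothetical hardness values grouped by dentist (not the implant data).
dentists = {
    "Dentist 1": [742.0, 768.0, 755.0, 731.0],
    "Dentist 2": [750.0, 739.0, 764.0, 748.0],
    "Dentist 3": [758.0, 744.0, 736.0, 761.0],
}
f_stat, p_value, reject_h0 = one_way_anova(dentists)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}, reject H0: {reject_h0}")
```

The same helper covers 7.4 and 7.5 if the values are grouped by method or by temperature level instead of by dentist.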

7.4. Now test whether there is any difference among the methods on the hardness of
dental implants, separately for the two types of alloys. What are your conclusions? If
the null hypothesis is rejected, is it possible to identify which pairs of methods differ?

For Alloy1:
Null Hypothesis (H0): There is no significant difference in implant hardness
among methods.
Alternate Hypothesis (H1): There is a significant difference in implant hardness
among methods.

p-value: 0.004
Since the p-value is less than alpha (0.05), we reject the null hypothesis.
Thus there is a significant difference in implant hardness among methods.
Yes, we can perform post-hoc tests (such as Tukey's HSD, or pairwise tests with a
Bonferroni correction) to identify which pairs of methods differ significantly in
implant hardness. These post-hoc tests pinpoint the specific methods between which
the differences exist for each type of alloy.

Table 7.4.1 Tukey's HSD MultiComparison Alloy 1

Method 1 and Method 2 do not have a significant difference in implant hardness
(p-adj > 0.05).
Method 1 and Method 3 have a significant difference in implant hardness (p-adj
< 0.05).
Method 2 and Method 3 have a significant difference in implant hardness (p-adj
< 0.05).

For Alloy2:
Null Hypothesis (H0): There is no significant difference in implant hardness
among methods.

Alternate Hypothesis (H1): There is a significant difference in implant hardness
among methods.

p-value: 0.000006
Since the p-value is less than alpha (0.05), we reject the null hypothesis.
Thus there is a significant difference in implant hardness among methods.
Yes, we can perform post-hoc tests (such as Tukey's HSD, or pairwise tests with a
Bonferroni correction) to identify which pairs of methods differ significantly in
implant hardness. These post-hoc tests pinpoint the specific methods between which
the differences exist for each type of alloy.

Table 7.4.2 Tukey's HSD MultiComparison Alloy 2

Method 1 and Method 2 do not have a significant difference in implant hardness
(p-adj > 0.05).
Method 1 and Method 3 have a significant difference in implant hardness (p-adj
< 0.05).
Method 2 and Method 3 have a significant difference in implant hardness (p-adj
< 0.05).
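The pairwise comparisons in Tables 7.4.1 and 7.4.2 were produced with a MultiComparison object (from statsmodels, judging by the name). As an alternative, SciPy 1.8 and later provide `scipy.stats.tukey_hsd`; the sketch below uses it to flag which pairs differ at the 5% level. The helper name and data are hypothetical.

```python
from scipy import stats


def tukey_pairs(groups, alpha=0.05):
    """Tukey's HSD over all pairs of groups (e.g. methods).

    Returns a list of (label_i, label_j, adjusted_p, differs) tuples.
    """
    labels = list(groups)
    result = stats.tukey_hsd(*(groups[label] for label in labels))
    pairs = []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            p_adj = float(result.pvalue[i, j])  # family-wise adjusted p-value
            pairs.append((labels[i], labels[j], p_adj, p_adj < alpha))
    return pairs


# Hypothetical hardness values grouped by method (not the implant data).
methods = {
    "Method 1": [800.0, 810.0, 820.0, 830.0, 840.0],
    "Method 2": [805.0, 815.0, 825.0, 835.0, 845.0],
    "Method 3": [700.0, 710.0, 720.0, 730.0, 740.0],
}
for a, b, p_adj, differs in tukey_pairs(methods):
    print(f"{a} vs {b}: p-adj = {p_adj:.4f}, differs: {differs}")
```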

7.5. Now test whether there is any difference among the temperature levels on the
hardness of dental implants, separately for the two types of alloys. What are your
conclusions? If the null hypothesis is rejected, is it possible to identify which levels of
temperature differ?

For Alloy1:
Null Hypothesis (H0): There is no significant difference in implant hardness
among temperature levels.
Alternate Hypothesis (H1): There is a significant difference in implant hardness
among temperature levels.

One-way ANOVA p-value: 0.7170741113686678
Since the p-value is greater than alpha (0.05), we fail to reject the null hypothesis.
Thus there is no significant difference in implant hardness among temperature
levels, so no post-hoc comparison of temperature levels is needed.

For Alloy2:
Null Hypothesis (H0): There is no significant difference in implant hardness
among temperature levels.
Alternate Hypothesis (H1): There is a significant difference in implant hardness
among temperature levels.

One-way ANOVA p-value: 0.16467846603141556
Since the p-value is greater than alpha (0.05), we fail to reject the null hypothesis.
Thus there is no significant difference in implant hardness among temperature
levels, so no post-hoc comparison of temperature levels is needed.

7.6. Consider the interaction effect of dentist and method and comment on the
interaction plot, separately for the two types of alloys.

We can perform a two-way ANOVA with interaction to test the main effects of
'Dentist' and 'Method' as well as their interaction.

Table 7.6 ANOVA comparison


For Alloy 1:

1. The p-value for the factor 'Dentist' is 0.01, which is less than the significance
level of 0.05. Therefore, we reject the null hypothesis and conclude that there is
a significant difference in implant hardness among different dentists for Alloy 1.
2. The p-value for the factor 'Method' is 0.0002, which is less than 0.05. Hence,
we reject the null hypothesis and conclude that there is a significant difference in
implant hardness among different methods for Alloy 1.
3. The p-value for the interaction term 'C(Dentist):C(Method)' is 0.006, which is
less than 0.05. Therefore, we reject the null hypothesis and conclude that there
is a significant interaction effect between 'Dentist' and 'Method' on implant
hardness for Alloy 1.

For Alloy 2:

1. The p-value for the factor 'Dentist' is 0.371, which is greater than 0.05. As a
result, we fail to reject the null hypothesis, indicating that there is no significant
difference in implant hardness among different dentists for Alloy 2.
2. The p-value for the factor 'Method' is 0.000004, which is less than 0.05. Thus, we
reject the null hypothesis and conclude that there is a significant difference in
implant hardness among different methods for Alloy 2.
3. The p-value for the interaction term 'C(Dentist):C(Method)' is 0.09, which is greater
than 0.05. Hence, we fail to reject the null hypothesis, suggesting that there is no
significant interaction effect between 'Dentist' and 'Method' on implant hardness
for Alloy 2.

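For Alloy 1, both dentists and methods significantly influence implant hardness,
and there is a significant interaction effect between dentists and methods.
For Alloy 2, only the method significantly influences implant hardness, while the
effect of dentists and the interaction between dentists and methods are not
significant.
The significant dentist effect for Alloy 1 differs from the one-way result in 7.3; a likely
reason is that the two-way model removes the variation explained by method and by
the dentist-method interaction from the error term, which makes the test for dentist
more sensitive.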

Fig 7.6.1 Interaction Plot for Alloy 1

Fig 7.6.2 Interaction Plot for Alloy 2
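The term names quoted in 7.6, such as 'C(Dentist):C(Method)', suggest the statsmodels formula interface (`ols` with `anova_lm`) was used. As a self-contained alternative, the sketch below computes a two-way ANOVA with interaction directly with NumPy and SciPy. It assumes a balanced design (the same number of replicates in every dentist-method cell), in which case the sums of squares do not depend on the order of the terms; the function name and the example data are hypothetical.

```python
import numpy as np
from scipy import stats


def two_way_anova_balanced(y):
    """Two-way ANOVA with interaction for a balanced design.

    y has shape (a, b, n): a levels of factor A (e.g. dentist), b levels of
    factor B (e.g. method) and n replicate measurements in every cell.
    Returns {term: (sum_sq, df, F, p_value)} for "A", "B" and "A:B".
    """
    y = np.asarray(y, dtype=float)
    a, b, n = y.shape
    grand = y.mean()
    cell = y.mean(axis=2)          # cell means, shape (a, b)
    mean_a = y.mean(axis=(1, 2))   # level means of factor A
    mean_b = y.mean(axis=(0, 2))   # level means of factor B

    ss_a = b * n * np.sum((mean_a - grand) ** 2)
    ss_b = a * n * np.sum((mean_b - grand) ** 2)
    ss_ab = n * np.sum((cell - mean_a[:, None] - mean_b[None, :] + grand) ** 2)
    ss_err = np.sum((y - cell[:, :, None]) ** 2)  # within-cell (residual) variation

    df_a, df_b = a - 1, b - 1
    df_ab, df_err = df_a * df_b, a * b * (n - 1)
    ms_err = ss_err / df_err

    table = {}
    for term, ss, df in (("A", ss_a, df_a), ("B", ss_b, df_b), ("A:B", ss_ab, df_ab)):
        f_stat = (ss / df) / ms_err
        table[term] = (ss, df, f_stat, stats.f.sf(f_stat, df, df_err))
    return table


# Hypothetical 3 x 3 design with 3 replicates per cell: factor A shifts the
# response by 10 per level, factor B has no effect, noise is -1, 0, +1.
y = np.array([[[10.0 * i + e for e in (-1, 0, 1)] for _ in range(3)] for i in range(3)])
for term, (ss, df, f_stat, p) in two_way_anova_balanced(y).items():
    print(f"{term:>3}: SS={ss:.1f} df={df} F={f_stat:.2f} p={p:.4g}")
```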

7.7. Now consider the effect of both factors, dentist and method, separately on each
alloy. What do you conclude? Is it possible to identify which dentists are different,
which methods are different, and which interaction levels are different?
Based on the ANOVA conclusions and post-hoc test results:

Table 7.7.1 ANOVA Test for Methods, Dentists and Interaction

Table 7.7.2 Tukey's HSD Results for Comparison of Methods

For Alloy 1:
The null hypothesis is rejected, indicating that there is a significant difference in
implant hardness among different methods for Alloy 1.
The post-hoc test (Tukey's HSD) results show that there is a significant
difference in implant hardness between Method 1 and Method 3, as well as
between Method 2 and Method 3. However, there is no significant difference
between Method 1 and Method 2.

For Alloy 2:
The null hypothesis is rejected, implying that there is a significant difference in
implant hardness among different methods for Alloy 2.
The post-hoc test (Tukey's HSD) results show that there is a significant
difference in implant hardness between Method 1 and Method 3, as well as
between Method 2 and Method 3. However, there is no significant difference
between Method 1 and Method 2.
Conclusions:

For both Alloy 1 and Alloy 2, there is a significant difference in implant hardness
among different methods.
In Alloy 1, Method 3 shows significantly different implant hardness compared to
Method 1 and Method 2. However, there is no significant difference between
Method 1 and Method 2.
In Alloy 2, Method 3 exhibits significantly different implant hardness compared to
Method 1 and Method 2. However, there is no significant difference between
Method 1 and Method 2.
