As Guided Project Report
PROJECT
REPORT
Contents:
Question 1: What are the probabilities of a fire, a mechanical failure, and a human error
respectively?
Question 2: What is the probability of a radiation leak?
Question 3: Suppose there has been a radiation leak in the reactor for which the definite cause
is not known. What is the probability that it has been caused by a) a fire b) a mechanical failure
c) a human error?
Question 4: What is the probability that a randomly chosen student gets a grade below 85 on
this exam?
Question 5: What is the probability that a randomly selected student scores between 65 and
87?
Question 6: What should be the passing cut-off so that 75% of the students clear the exam?
Question 7: Define the problem and perform an Exploratory Data Analysis
- Problem definition, questions to be answered - Data background and contents - Univariate
analysis - Bivariate analysis
Question 8: Illustrate the insights based on EDA Key meaningful observations on individual
variables and the relationship between variables
Question 9: Do the users spend more time on the new landing page than the old landing
page?
- State the null and alternate hypotheses - Conduct the hypothesis test and compute the p-value -
Write down conclusions from the test results
Question 10: Does the converted status depend on the preferred language?
- State the null and alternate hypotheses - Conduct the hypothesis test and compute the p-value -
Write down conclusions from the test results
Question 11: Is the mean time spent on the new page same for the different language users?
- State the null and alternate hypotheses - Check the assumptions of the hypothesis test. - Conduct
the hypothesis test and compute the p-value - Write down conclusions from the test results
Question 1: What are the probabilities of a fire, a mechanical failure, and a human
error respectively?
Ans: First, define the events: F = Fire, M = Mechanical Failure, H = Human Error,
R = Radiation Leak, N = No accident.
The problem gives the conditional probabilities P(R | F) = 0.2, P(R | M) = 0.5, P(R | H) = 0.1
and the joint probabilities P(R ∩ F) = 0.001, P(R ∩ M) = 0.0015, P(R ∩ H) = 0.0012.
Since P(R ∩ A) = P(R | A) * P(A) for any event A, each probability is the joint probability
divided by the conditional probability:
P(F) = P(R ∩ F) / P(R | F) = 0.001 / 0.2 = 0.005
P(M) = P(R ∩ M) / P(R | M) = 0.0015 / 0.5 = 0.003
P(H) = P(R ∩ H) / P(R | H) = 0.0012 / 0.1 = 0.012
Question 2: What is the probability of a radiation leak?
There are three possible types of accident: fire, mechanical failure, and human error.
Probability of no accident: P(N) = 1 - (0.005 + 0.003 + 0.012) = 0.98
No accident cannot cause a leak, so P(R | N) = 0 and P(R ∩ N) = P(R | N) * P(N) = 0.
By the total probability theorem:
P(R) = P(R ∩ F) + P(R ∩ M) + P(R ∩ H) + P(R ∩ N) = 0.001 + 0.0015 + 0.0012 + 0 = 0.0037
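The arithmetic for Questions 1 and 2 can be checked with a short Python sketch. The dictionary and variable names are illustrative choices, not part of the original report; only the six input probabilities come from the problem statement.

```python
# Values stated in the problem: P(R | cause) and P(R ∩ cause).
p_leak_given = {"fire": 0.2, "mechanical": 0.5, "human": 0.1}
p_leak_and = {"fire": 0.001, "mechanical": 0.0015, "human": 0.0012}

# Probability of each cause: P(cause) = P(R ∩ cause) / P(R | cause).
prior = {cause: p_leak_and[cause] / p_leak_given[cause] for cause in p_leak_and}

# "No accident" is the complement of the three causes and never produces a leak.
p_no_accident = 1 - sum(prior.values())
p_leak_and_no_accident = 0.0 * p_no_accident

# Total probability: P(R) is the sum of the joint probabilities over all cases.
p_leak = sum(p_leak_and.values()) + p_leak_and_no_accident

for cause, p in prior.items():
    print(f"P({cause}) = {p:.3f}")
print(f"P(no accident) = {p_no_accident:.2f}")
print(f"P(leak) = {p_leak:.4f}")
```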
Question 3: Suppose there has been a radiation leak in the reactor for which the
definite cause is not known. What is the probability that it has been caused by a) a fire
b) a mechanical failure c) a human error?
By Bayes' theorem, P(A | R) = P(R ∩ A) / P(R) for each cause A:
P(F | R) = 0.001 / 0.0037 = 0.270
P(M | R) = 0.0015 / 0.0037 = 0.405
P(H | R) = 0.0012 / 0.0037 = 0.324
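As a quick check of the Bayes step, the following sketch divides each joint probability by the total leak probability and confirms that the three results sum to 1. The names are illustrative; the inputs are the joint probabilities given in the problem.

```python
# Joint probabilities P(R ∩ cause) from the problem statement.
p_leak_and = {"fire": 0.001, "mechanical": 0.0015, "human": 0.0012}

# P(R): a leak can only arise from one of the three causes.
p_leak = sum(p_leak_and.values())

# Bayes' theorem: P(cause | R) = P(R ∩ cause) / P(R).
posterior = {cause: joint / p_leak for cause, joint in p_leak_and.items()}

for cause, p in posterior.items():
    print(f"P({cause} | leak) = {p:.3f}")

# The three causes are exhaustive given a leak, so the posteriors sum to 1.
print(f"sum = {sum(posterior.values()):.3f}")
```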
Question 4: What is the probability that a randomly chosen student gets a grade
below 85 on this exam?
Using the Z-score formula:
Z = (X - μ) / σ
where:
X = the value we want to find the probability for (85 in this case)
μ = the mean (77)
σ = the standard deviation (8.5)
Z = (85 - 77) / 8.5
Z = 0.941176
Now, we can use a standard normal distribution table or a calculator to find the cumulative
probability corresponding to the Z-score of 0.941176.
From the standard normal distribution table, the cumulative probability (area under the
curve) for a Z-score of 0.94 (0.941176 rounded to two decimals) is approximately 0.8264.
Therefore, the probability that a randomly chosen student gets a grade below 85 is
approximately 0.8264, or 82.64%.
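The table lookup can also be done in code. This is a minimal sketch using `statistics.NormalDist` from the Python standard library, assuming the scores are normally distributed with mean 77 and standard deviation 8.5 as stated above; the variable names are illustrative.

```python
from statistics import NormalDist

# Exam scores modelled as Normal(mean=77, sd=8.5), as stated in the problem.
scores = NormalDist(mu=77, sigma=8.5)

# Z-score for a grade of 85.
z = (85 - scores.mean) / scores.stdev

# P(X < 85) is the cumulative distribution function evaluated at 85.
p_below_85 = scores.cdf(85)

print(f"Z = {z:.6f}")
print(f"P(X < 85) = {p_below_85:.4f}")
```

Because the code uses the unrounded Z-score, its result can differ from the table value in the fourth decimal place.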
Question 5: What is the probability that a randomly selected student scores between 65 and 87?
Ans:
For 65:
Z1 = (65 - 77) / 8.5
Z1 = -1.411765
For 87:
Z2 = (87 - 77) / 8.5
Z2 = 1.176471
Using the standard normal distribution table or a calculator, we find the cumulative
probabilities corresponding to Z1 and Z2: approximately 0.0790 and 0.8803.
The probability of scoring between 65 and 87 is the difference between the cumulative
probabilities:
P(65 < X < 87) = P(Z < Z2) - P(Z < Z1) = 0.8803 - 0.0790 = 0.8013
Therefore, the probability that a randomly selected student scores between 65 and 87 is
approximately 0.80, or about 80%. (Reading the table at Z-scores truncated to -1.41 and 1.17
gives 0.7997.)
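The interval probability is the difference of two cumulative probabilities, which the sketch below computes directly. It assumes the same Normal(77, 8.5) model for the scores and uses only the Python standard library; the names are illustrative.

```python
from statistics import NormalDist

# Exam scores modelled as Normal(mean=77, sd=8.5), as stated in the problem.
scores = NormalDist(mu=77, sigma=8.5)

# Z-scores of the two interval endpoints.
z1 = (65 - scores.mean) / scores.stdev
z2 = (87 - scores.mean) / scores.stdev

# P(65 < X < 87) = P(X < 87) - P(X < 65).
p_between = scores.cdf(87) - scores.cdf(65)

print(f"Z1 = {z1:.6f}, Z2 = {z2:.6f}")
print(f"P(65 < X < 87) = {p_between:.4f}")
```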
Question 6: What should be the passing cut-off so that 75% of the students clear the
exam?
Ans: For 75% of the students to clear the exam, 75% of the scores must lie above the cut-off,
so the cut-off is the 25th percentile of the score distribution. From the standard normal
distribution table or a calculator, the Z-score corresponding to a cumulative probability of
0.25 is approximately -0.6745.
Using the Z-score formula:
Z = (X - μ) / σ
Substituting the known values:
-0.6745 = (X - 77) / 8.5
Solving for X:
X - 77 = -0.6745 * 8.5
X - 77 = -5.73325
X = 71.26675
Therefore, the passing cut-off should be set at approximately 71.27 for 75% of the
students to clear the exam.
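A cut-off that 75% of students exceed leaves 25% of scores below it, so it corresponds to the 25th percentile. The sketch below finds it with the inverse CDF from the standard library, assuming the Normal(77, 8.5) model stated in the problem; the names are illustrative.

```python
from statistics import NormalDist

# Exam scores modelled as Normal(mean=77, sd=8.5), as stated in the problem.
scores = NormalDist(mu=77, sigma=8.5)

# 75% of students must score at or above the cut-off, so 25% fall below it.
pass_fraction = 0.75
cutoff = scores.inv_cdf(1 - pass_fraction)

# Sanity check: the share of students above the cut-off should be 75%.
share_passing = 1 - scores.cdf(cutoff)

print(f"cut-off = {cutoff:.2f}")
print(f"share passing = {share_passing:.2f}")
```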
Question 7: Define the problem and perform an Exploratory Data Analysis
- Problem definition, questions to be answered - Data background and contents - Univariate
analysis - Bivariate analysis
- Observe the first few rows of the dataset to check whether the dataset has been
loaded properly.
- Get information about the number of rows and columns in the dataset.
- Find out the data types of the columns to ensure that data is stored in the preferred
format and the value of each property is as expected.
- Check the statistical summary of the dataset to get an overview of the numerical
columns of the data.
Shape of dataset: (100, 6)
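The inspection steps above can be sketched with pandas. The report does not show the file name or column names, so the small in-memory frame below is a hypothetical stand-in: its columns (`group`, `time_spent`, `converted`) and values are assumptions for illustration only, and the real dataset has 100 rows and 6 columns.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset; column names and values are
# illustrative assumptions, not taken from the report.
df = pd.DataFrame({
    "group": ["control", "treatment", "control", "treatment"],
    "time_spent": [3.5, 7.1, 4.2, 6.4],
    "converted": ["no", "yes", "no", "yes"],
})

print(df.head())       # first few rows, to confirm the data loaded properly
print(df.shape)        # (number of rows, number of columns)
print(df.dtypes)       # data type of each column
print(df.describe())   # statistical summary of the numerical columns

# Count of rows per group, e.g. to confirm a balanced control/treatment split.
group_counts = df["group"].value_counts()
print(group_counts)
```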
UNIVARIATE ANALYSIS:
control 50
treatment 50
BIVARIATE ANALYSIS:
Do the users spend more time on the new landing page than the existing landing page?
H0: The mean time spent by the users on the new page is equal to the mean time spent by the
users on the old page.
Ha: The mean time spent by the users on the new page is greater than the mean time spent by
the users on the old page.
This is a one-tailed test concerning two population means from two independent
populations.
The population standard deviations are unknown, so a two-sample independent t-test is the
appropriate test.
The sample standard deviation of the time spent on the new page is: 1.82
The sample standard deviation of the time spent on the old page is: 2.58
The p-value is 0.0001392381225166549 (approximately 0.0001).
As the p-value is less than the 5% level of significance, we reject the null hypothesis.
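A one-tailed two-sample t-test of this kind can be sketched with `scipy.stats.ttest_ind`. The report does not show its code or data, so two points here are assumptions: the samples below are made-up numbers, and `equal_var=False` (Welch's variant) is chosen because the two sample standard deviations differ. The `alternative` argument requires SciPy 1.6 or later.

```python
from scipy import stats

# Made-up time-spent samples (minutes); these are NOT the report's data.
time_new = [6.2, 7.1, 5.8, 6.9, 7.4, 5.5, 6.6, 7.0]
time_old = [4.1, 5.0, 3.2, 6.1, 4.8, 2.9, 5.3, 4.4]

# H0: mean(new) == mean(old); Ha: mean(new) > mean(old).
# equal_var=False selects Welch's t-test (an assumption, see the note above).
t_stat, p_value = stats.ttest_ind(
    time_new, time_old, equal_var=False, alternative="greater"
)

alpha = 0.05  # level of significance
print(f"t = {t_stat:.3f}, p-value = {p_value:.5f}")
if p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
```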
Is the conversion rate (the proportion of users who visit the landing page and get
converted) for the new page greater than the conversion rate for the old page?
H0: The conversion rate of the new page is equal to the conversion rate of the old page.
Ha: The conversion rate of the new page is greater than the conversion rate of the old page.
This is a one-tailed test concerning two population proportions from two independent
populations.
Based on this information, a two proportion z-test would be the most appropriate.
The numbers of users served the new and old pages are 50 and 50 respectively
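The two-proportion z-test can be written out with the standard library alone. In this sketch the function name and the conversion counts (30 and 20) are hypothetical; only the group sizes of 50 and 50 come from the report.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest_greater(x1, n1, x2, n2):
    """One-tailed z-test of H0: p1 == p2 against Ha: p1 > p2."""
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 both groups share one rate, estimated by pooling the counts.
    pooled = (x1 + x2) / (n1 + n2)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    # Upper-tail probability of the standard normal distribution.
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical conversion counts; the group sizes (50, 50) are from the report.
z_stat, p_value = two_proportion_ztest_greater(x1=30, n1=50, x2=20, n2=50)
print(f"z = {z_stat:.3f}, p-value = {p_value:.4f}")
```

With these illustrative counts the pooled rate is 0.5 and the standard error is 0.1, so z is 2.0.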
Does the converted status depend on the preferred language?
H0: The converted status is independent of the preferred language.
Ha: The converted status is dependent on the preferred language.
Both variables are categorical, so a chi-square test for independence would be the most
appropriate.
As the p-value 0.2129888748754345 is greater than the 5% level of significance, we fail to
reject the null hypothesis.
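A chi-square test for independence takes a contingency table of counts. The sketch below uses `scipy.stats.chi2_contingency` on a made-up 2 x 3 table (converted yes/no by three hypothetical language groups); the counts are not the report's data.

```python
from scipy.stats import chi2_contingency

# Made-up contingency table; rows are converted = yes / no, columns are three
# hypothetical language groups. These counts are NOT the report's data.
observed = [
    [12, 10, 14],  # converted: yes
    [8, 11, 9],    # converted: no
]

# H0: converted status is independent of preferred language.
chi2_stat, p_value, dof, expected = chi2_contingency(observed)

alpha = 0.05  # level of significance
print(f"chi2 = {chi2_stat:.3f}, dof = {dof}, p-value = {p_value:.4f}")
if p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
```

The degrees of freedom are (rows - 1) * (columns - 1), which is 2 for a 2 x 3 table.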
Is the time spent on the new page same for the different language users?
H0: The mean time spent on the new landing page is the same across all preferred languages.
Ha: At least one of the mean times spent on the new landing page differs among the
preferred languages.
This problem concerns three population means, so a one-way ANOVA test would be the most
appropriate.
The p-value is 0.8040016293525696
Levene’s test
H0: All the population variances are equal
Ha: At least one variance is different from the rest
Since the p-value is large, we fail to reject the null hypothesis, so the assumption of
equal variances is reasonable.
The p-value is 0.43204138694325955
Draw inference
Since the p-value is greater than the 5% level of significance, we fail to reject the null
hypothesis.
This means there is no significant evidence that the mean time spent on the new landing page
differs with the preferred language.
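The equal-variance check and the one-way ANOVA can be sketched with `scipy.stats.levene` and `scipy.stats.f_oneway`. The three samples below are made-up numbers for three hypothetical language groups, not the report's data.

```python
from scipy import stats

# Made-up time-spent samples (minutes) for three hypothetical language groups;
# these are NOT the report's data.
lang_a = [5.9, 6.4, 7.1, 5.2, 6.8, 6.0]
lang_b = [6.3, 5.7, 6.9, 7.4, 5.5, 6.1]
lang_c = [6.6, 5.8, 6.2, 7.0, 5.4, 6.5]

# Levene's test checks the equal-variance assumption behind one-way ANOVA.
levene_stat, levene_p = stats.levene(lang_a, lang_b, lang_c)

# One-way ANOVA: H0 says all three group means are equal.
f_stat, anova_p = stats.f_oneway(lang_a, lang_b, lang_c)

alpha = 0.05  # level of significance
print(f"Levene p-value = {levene_p:.4f}")
print(f"ANOVA F = {f_stat:.3f}, p-value = {anova_p:.4f}")
if anova_p < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")
```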
Conclusions:
To answer the question of whether users spend more time on the new landing page than on the
existing landing page, a two-sample independent t-test was performed. The test gave a p-value
of 0.0001, which is less than the 5% level of significance, so the null hypothesis is rejected.
In context, there is significant evidence that the mean time spent by the users on the new
page is greater than the mean time spent by the users on the old page.
To answer the question of whether the conversion rate for the new page is greater than the
conversion rate of the old page, a two-proportion z-test was performed. The test gave a
p-value of 0.008, which is less than the 5% level of significance, so the null hypothesis is
rejected. In context, there is significant evidence that the conversion rate of the new
landing page is greater than the conversion rate of the old landing page.
To answer the question of whether the conversion status and preferred language are related,
a chi-square test for independence was performed. The test gave a p-value of 0.213, which is
more than the 5% level of significance, so we fail to reject the null hypothesis. In context,
there is not enough evidence to conclude that conversion status depends on the preferred
language of the landing page.
To answer the question of whether the time spent on the new landing page differs by preferred
language, a one-way ANOVA test was performed. The test gave a p-value of 0.432, which is more
than the 5% level of significance, so we fail to reject the null hypothesis. In context, there
is no significant evidence that the mean time spent on the new landing page differs across
the preferred languages.
Recommendations:
- E-News Express should fully implement the new landing page, as it gains significantly more
traction than the old landing page.
- Users spend more time on the new landing page than on the old landing page, which suggests
that they prefer it.
- It might be beneficial to cut the losses with the old landing page, as it yields a lower
average time spent and a lower conversion rate.
- The new landing page has a higher conversion rate, so more resources should be directed
towards it, as it offers more opportunity to increase membership.
- Deploy the new landing page in all the existing preferred languages. As there is no
significant difference in the average time spent on the new page across the preferred
languages, and conversion status showed no significant dependence on language, the page can
be expected to perform similarly in each.
- Consider adding more languages to the portal to reach a wider audience.