Download as pdf or txt
Download as pdf or txt
You are on page 1of 7

Inferential Statistics - Assignment

Question 1:
The quality assurance checks on the previous batches of drugs found that — it is 4 times more
likely that a drug is able to produce a satisfactory result than not.
Given a small sample of 10 drugs, you are required to find the theoretical probability that at
most, 3 drugs are not able to do a satisfactory job.
a.) Propose the type of probability distribution that would accurately portray the above
scenario, and list out the three conditions that this distribution follows.
b.) Calculate the required probability.

Solution:
a) Based on the above scenario the Binomial distribution will be correct method to arrive
on the probability
Similar cases for binomial distribution:
i) Tossing a coin 20 times to see what is the probability of occurrence of 3 tails
ii) Probability of Drawing 4 red balls from a bag, putting each ball back after
drawing it in 10 trials
iii) Probability of a student to answer 3 questions correctly (by random guess - out
of fixed number of options) in a set of 10 questions

b) Based on the problem statement,


n = 10 -- number of samples
r = 3 -- number of un-successful drug trials (atmost)
p = 0.2 -- probability of being un-successful (…. it is 4 times more likely that a drug is
able to produce a satisfactory result than not)

Based on the Binomial distribution formula,

P (X=r) = nCr (p)r (1-p)(n-r)


Calculating repeatedly for X = 0, 1, 2, 3
P(X=0) = 10C0(0.2)0(1-0.2)(10-0)
P(X=0) = 1 * 1 * 0.11 = 0.11 ~ 11%

P(X=1) = 10C1(0.2)1(1-0.2)(10-1)
P = 10 * 0.2 * 0.134 = 0.268 ~ 26.8%

P(X=2) = 10C2(0.2)2(1-0.2)(10-2)
P = 45 * 0.04 * 0.168 = 0.3024 ~ 30.24%

P(X=3) = 10C3(0.2)3(1-0.2)(10-3)
P = 120 * 0.008 * 0.210 = 0.202 ~ 20.2%

P(X = atmost 3) = P(X=0) + P(X=1) + P(X=2) + P(X=3)


= 11% + 26.8% + 30.24% + 20.2% = 88.24%

Answer: P(X = atmost 3) = 88.24%

Question 2:

For the effectiveness test, a sample of 100 drugs was taken. The mean time of effect was 207
seconds, with the standard deviation coming to 65 seconds. Using this information, you are
required to estimate the range in which the population mean might lie — with a 95%
confidence level.

a.) Discuss the main methodology using which you will approach this problem. State all the
properties of the required method. Limit your answer to 150 words.
b.) Find the required range.
Solution:

a) The solution is based on the Central Limit Theorem.

The Central Limit Theorem states that, no matter how the original population is distributed,
the sampling distribution will follow these three properties –
1. Sampling Distribution’s Mean = Population Mean
2. Sampling Distribution’s Standard Deviation (Standard Error) = 𝜎/√𝑛, where σ is the
population’s standard deviation and n is the sample size
3. For n > 30, the sampling distribution becomes a normal distribution

Using CLT, we can estimate the population mean from the sample mean and standard
deviation, but the population mean’s value has to be reported with some error margin.

Now, the y% confidence interval (i.e., confidence interval corresponding to y% confidence


level) for μ will be given by the range –

Here, X̅ = Sample Mean


Z* is the Z-score associated with the confidence level
S = Standard error, sample standard deviation
n = sample size

b)
Since n(100) > 30, the sampling distribution is a normal distribution

n = 100
SE = 65 secs
µ(x̅) = 207 secs
Confidence Level = 95%
Z* score = 1.96

Population Mean = [X̅ − {(Z*S)/√𝑛}], [ X̅ + {(𝑍∗𝑆)/√𝑛}]


1.96 ∗ 65 1.96 ∗ 65
µ = 207 - µ = 207 +
√100 √100

µ = 194.26 µ = 219.74

Answer: Range of µ = (194.26, 219.74)

Question 3:

a) The painkiller drug needs to have a time of effect of at most 200 seconds to be considered
as having done a satisfactory job. Given the same sample data (size, mean, and standard
deviation) of the previous question, test the claim that the newer batch produces a
satisfactory result and passes the quality assurance test. Utilize 2 hypothesis testing methods
to make your decision. Take the significance level at 5 %. Clearly specify the hypotheses, the
calculated test statistics, and the final decision that should be made for each method.

b) You know that two types of errors can occur during hypothesis testing — namely Type-I
and Type-II errors — whose probabilities are denoted by α and β respectively. For the current
sample conditions (sample size, mean, and standard deviation), the value of α and β come
out to be 0.05 and 0.45 respectively.
Now, a different sampling procedure (with different sample size, mean, and standard
deviation) is proposed so that when the same hypothesis test is conducted, the values of α
and β are controlled at 0.15 each. Explain under what conditions would either method be
more preferred than the other, i.e. give an example of a situation where conducting a
hypothesis test having α and β as 0.05 and 0.45 respectively would be preferred over having
them both at 0.15. Similarly, give an example for the reverse scenario - a situation where
conducting the hypothesis test with both α and β values fixed at 0.15 would be preferred
over having them at 0.05 and 0.45 respectively. Also, provide suitable reasons for your choice
(Assume that only the values of α and β as mentioned above are provided to you and no
other information is available).

Solution:
a)
n = 100
SE = 65 secs (std dev of sample)
µ(x̅) = 207 secs
Significance Level = 5%

Hypothesis:
H0 : µ <= 200 seconds, H1 : µ > 200 seconds
(upper tailed test)

Calculation by UCV/LCV method:

Calculating the std dev of the sample distributions, as std dev of sample can be considered as
the std dev of the population.

𝜎 65
𝜎𝑥̅ = = = 6.5
√𝑛 √100

Zprob = 1 – 0.05 = 0.950


Zc = 1.645

UCV = µ + (Zc * 𝜎𝑥̅ ) = 200 + (1.645 * 6.5) = 210.69


LCV = µ - (Zc * 𝜎𝑥̅ ) = 200 - (1.645 * 6.5) = 189.31

As the sample mean (x̅ = 207 seconds) lies between the UCV and the LCV we cannot reject the
NULL hypothesis.

Calculation by p-value method:

X̅ = 207 seconds
𝜎 = 65 seconds
𝜇 = 200 seconds
α = 5% (significance level)
Calculating the Z score
𝑋̅− 𝜇 207−200
Z= = = 1.077 ~ 1.08
𝜎𝑥̅ 6.5

As Z score is +ve, the following formula will be applied for P-Value

P-Value = (1 - Zprob) * 1 … as the test is one-tailed


= (1 – 0.8599)
= 0.1401 ~ 14%
As the P-Value 14% is greater than significance level (α = 5%), we cannot reject the NULL
hypothesis.

b)
In hypothesis testing,

a) Type-1 (α) error refers to the scenario where we decided to reject the NULL hypothesis
even though it was TRUE, and
b) Type-2 (β) error refers to the scenario where we failed to reject the NULL hypothesis
when it was FALSE
Considering the definitions of the errors, if we apply the logic for the below scenarios
Scenario-1: the value of α and β come out to be 0.05 and 0.45 respectively
Scenario-2: the values of α and β are controlled at 0.15 each

We can conclude from above that, we generally tend to keep the α error limited to a very low
value when there are adverse impacts / side-effects related to acceptance of alternate
hypothesis.

For the scenario in question, if there is any side effect related to the overdose of the painkiller
drug, we need to keep the α error to a minimum. This will mean that we are conservative in
rejecting the null hypothesis.
So, if the painkiller drug has significant side effects, we would like to keep the value of α and β
come out to be 0.05 and 0.45 respectively.

Conversely, we tend to be relaxed with the α error if there were no major side effects in the
painkiller drug. This would mean that we can be aggressive about changing the status quo and
establish the alternate hypothesis. In such cases, our β error limit would be very restrictive.
Question 4:

Now, once the batch has passed all the quality tests and is ready to be launched in the
market, the marketing team needs to plan an effective online ad campaign to attract new
customers. Two taglines were proposed for the campaign, and the team is currently divided
on which option to use.

Explain why and how A/B testing can be used to decide which option is more effective. Give a
stepwise procedure for the test that needs to be conducted.

Solution:

The A/B test is an industry application of two sample proportion test, which is a way to
test two different versions of the same element and see which one performs better.
Procedure for the A/B test for the new painkiller medicine:
Step-1: Define a significance level for this test. Say 5%.
Step-2: Apply the first tagline in the campaign and perform the advertisements.

Step-3: Have a customer survey to get the feedback from the market. The number of
customers providing positive feedback to be denoted as 1 in the survey data. If the
customers were indifferent or gave a negative feed back note them as 0.

Step-4: Continue the campaign for a certain period and gather the information from the
survey.

Step-5: Apply the second tagline in the campaign and perform the advertisements

Step-6: Again, have a customer survey to get the feedback from the market. Note the
survey information as per the rule provided in Step-2.

Step-7: Collect the data from both surveys and then apply the A/B test.

Step-8: Calculate the p-value of both the data sets. If the p-value of the data set is more
than the significance level (=5%) the NULL hypothesis cannot be rejected. Incase the p-
value comes out less than 5% the alternate hypothesis is accepted.

You might also like