Assignment 4


STAT 5101: Foundations of Data Science

2021-22 Term 1
Assignment #4
Due: Dec 8th, 2021 (Wednesday) at 9:30pm
Total Score: 100 points
This assignment covers material from Chapters 7 to 9 of the lecture notes.
You are encouraged to show your calculation steps in detail so that you can earn partial
credit if an answer is incorrect.
How to turn in the assignment? During the lecture

Problem 1 [22 points]: A random sample consists of 3 independent observations $X_1$, $X_2$
and $X_3$ that follow the Normal distribution $N(\mu, \sigma^2)$. Consider four estimators for the
population mean $\mu$:
$$\hat{\mu}_1 = \frac{X_1 + 3X_2 - 2X_3}{2}, \quad \hat{\mu}_2 = \frac{5X_1 - 2X_2}{3}, \quad \hat{\mu}_3 = \frac{1}{2}X_1 + \frac{1}{2}\bar{X}, \quad \hat{\mu}_4 = \frac{2X_1 + 3X_3 - 2\bar{X}}{3}$$
where $\bar{X}$ is the sample mean of $X_1$, $X_2$, $X_3$.
(a) [4 points] Which of the above is/are the unbiased estimator(s) for 𝜇 ?
(b) [4 points] Which of the above is the best unbiased estimator for 𝜇 ?
(c) [3 points] Provide an estimator for $\mu$ that is better than the aforementioned $\hat{\mu}_1$, $\hat{\mu}_2$, $\hat{\mu}_3$
and $\hat{\mu}_4$. Justify your answer.
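Now suppose the random sample consists of 3 independent observations $X_1$, $X_2$ and $X_3$ that
follow the Binomial distribution $\mathrm{Binomial}(4, p)$. Consider the following estimators for $p$:
$$\hat{p}_1 = \frac{3X_1 - 2X_2}{4}, \quad \hat{p}_2 = \bar{X} = \frac{X_1 + X_2 + X_3}{3}, \quad \hat{p}_3 = \frac{1}{2}X_1 - \frac{1}{4}\bar{X}, \quad \hat{p}_4 = \frac{\bar{X}}{2} = \frac{X_1 + X_2 + X_3}{6}$$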
(d) [4 points] Which of the above is/are the unbiased estimator(s) for 𝑝 ?
(e) [4 points] Which of the above is the best unbiased estimator for 𝑝 ?
(f) [3 points] Provide an estimator for $p$ that is better than the aforementioned $\hat{p}_1$, $\hat{p}_2$, $\hat{p}_3$
and $\hat{p}_4$. Justify your answer.
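
Every estimator in Problem 1 can be written as a linear combination $w_1X_1 + w_2X_2 + w_3X_3$ of the observations. For i.i.d. observations with mean $m$ and variance $v$, such a combination has expectation $(\sum w_i)\,m$ and variance $(\sum w_i^2)\,v$. The sketch below computes these two coefficients exactly. The function name and the example weights are hypothetical and are deliberately not one of the estimators listed above.

```python
from fractions import Fraction

def linear_estimator_summary(weights):
    """For T = sum(w_i * X_i) with i.i.d. X_i (mean m, variance v):
    E[T] = (sum of w_i) * m and Var(T) = (sum of w_i^2) * v.
    Returns the two coefficients as exact fractions."""
    w = [Fraction(x) for x in weights]
    mean_coeff = sum(w)
    var_coeff = sum(x * x for x in w)
    return mean_coeff, var_coeff

# Hypothetical estimator, not taken from the assignment: T = X1/2 + X2/4 + X3/4
mean_coeff, var_coeff = linear_estimator_summary(
    [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
)
print(mean_coeff)  # 1, so E[T] equals the mean of a single observation
print(var_coeff)   # 3/8, so Var(T) is 3/8 of the single-observation variance
```

To use it on an estimator that involves $\bar{X}$, first expand $\bar{X} = (X_1 + X_2 + X_3)/3$ and collect the weight on each observation.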

Problem 2 [24 points]: A newly proposed antibiotic is taken by 16 patients, and their
serum-creatinine levels are measured 24 hours later. Assume that the serum-creatinine level
is normally distributed, and that the sample mean was 1.50 mg/dL and the sample variance
was 0.0784.
Suppose that the population standard deviation of the serum-creatinine level is known to be
0.28 mg/dL.
(a) [3 points] Construct a 95% confidence interval for the mean serum-creatinine level μ.
(b) [4 points] Instead of using $z_{0.025} = -1.96$ and $z_{0.975} = 1.96$ to construct the 95%
confidence interval for μ as in part (a), construct another 95% confidence interval for μ
based on $z_{0.02}$ and $z_{0.97}$ (i.e. the 2nd and 97th percentiles of $N(0,1)$).
(c) [4 points] Compare the intervals from parts (a) and (b). How do they differ? Which
confidence interval provides a more accurate estimate, and why? [Why do we commonly
use the symmetric setting ($z_{\alpha/2}$ and $z_{1-\alpha/2}$) in a confidence interval?]
(d) [4 points] If we want the width of the 90% confidence interval for μ to be less than
0.5 (margin of error less than 0.25), what is the minimum sample size $n$ we should
collect?
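
As a minimal sketch of the known-σ calculations in parts (a) to (d), the code below builds an interval for μ from any pair of standard normal percentiles and finds the smallest sample size for a target margin of error. It relies on the standard pivot $(\bar{X} - \mu)/(\sigma/\sqrt{n}) \sim N(0,1)$. The function names and all input values are hypothetical and are not the data of this problem.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)

def z_interval(xbar, sigma, n, lower_p, upper_p):
    """Interval for mu with coverage upper_p - lower_p, derived from
    P(z_lower < (Xbar - mu) / (sigma / sqrt(n)) < z_upper)."""
    se = sigma / sqrt(n)
    z_lower = STD_NORMAL.inv_cdf(lower_p)
    z_upper = STD_NORMAL.inv_cdf(upper_p)
    # Solving the inequality for mu swaps the roles of the two percentiles.
    return xbar - z_upper * se, xbar - z_lower * se

def min_sample_size(sigma, margin, level):
    """Smallest n such that z_{1 - alpha/2} * sigma / sqrt(n) <= margin."""
    z = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    return ceil((z * sigma / margin) ** 2)

# Hypothetical inputs: xbar = 10.0, sigma = 2.0, n = 25
lo, hi = z_interval(10.0, 2.0, 25, 0.025, 0.975)
print(round(lo, 3), round(hi, 3))       # 9.216 10.784
print(min_sample_size(2.0, 0.5, 0.95))  # 62
```

Passing a different pair of tail probabilities with the same total coverage (for example 0.02 and 0.97) gives the kind of asymmetric interval asked for in part (b).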

Now suppose that the population standard deviation of the serum-creatinine level is unknown.
(e) [5 points] Construct a 90% confidence interval for the mean serum-creatinine level μ
and a 90% confidence interval for the variance of the serum-creatinine level $\sigma^2$, based on
the sample mean and sample variance.
(f) [4 points] Construct a 95% confidence interval for the mean serum-creatinine level μ
and compare it to the confidence interval in part (a). Which confidence interval is
more accurate, and why?

Problem 4 [15 points] A study was conducted to examine whether asthma occurs more often in
children with smoking mothers. Of 600 children aged 0-4 with smoking mothers, 17
had asthma over the past year. Suppose the annual incidence of asthma
is 2.0% for children aged 0-4 in the general population.
(a) [3 points] Construct a 95% confidence interval for p, where p is the annual incidence of
asthma for children age 0-4 with smoking mothers.
(b) [3 points] Test whether having a smoking mother increases the annual incidence of
asthma at α = 0.05.
(c) [4 points] If we have no prior information about asthma, find the sample size $n$ such
that the length of the 90 percent confidence interval for $p$ is at most 0.04. [The length
of a confidence interval equals 2 × margin of error.]
Hint: Note that $\sqrt{\frac{y}{n}\left(1 - \frac{y}{n}\right)} \le \sqrt{\frac{1}{2}\left(1 - \frac{1}{2}\right)}$

(d) [5 points] To reconfirm the result, we collect a new sample of 10 children aged 0-4 with
smoking mothers and find that 2 of them had asthma over the past year. Based on the
new sample, test whether having a smoking mother increases the annual incidence of
asthma at α = 0.05. Hint: Use the exact method and calculate its p-value.
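
The exact method mentioned in the hint for part (d) can be sketched as an upper-tail binomial sum: under $H_0: p = p_0$, the one-sided p-value for observing $k$ cases in $n$ trials is $P(X \ge k)$ with $X \sim \mathrm{Binomial}(n, p_0)$. The function name and the numbers below are hypothetical and are not the data of this problem.

```python
from math import comb

def binomial_upper_tail(k, n, p0):
    """Exact P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, j) * p0**j * (1 - p0) ** (n - j) for j in range(k, n + 1))

# Hypothetical example: 3 cases among 8 subjects, null incidence 10%
p_value = binomial_upper_tail(3, 8, 0.10)
print(round(p_value, 4))  # 0.0381
```

The null hypothesis is rejected at level α when this p-value is below α.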

Problem 5 [23 points] We want to study the queueing system at a post office. Under the
traditional system, in which a single line feeds all windows, the standard deviation of the
waiting times for customers is 6.5 minutes. The post office now switches to individual lines
at its various windows, and we collect a sample of 15 customers. The sample mean is 14.0
minutes and the sample standard deviation is 5.0 minutes. Assume that the waiting time
follows a normal distribution.
(a) [3 points] Construct a 98% Confidence Interval for the Variance of waiting time 𝜎 2
based on the sample mean and sample variance.
(b) [3 points] Test whether the new queueing system (individual lines at the various windows)
has a larger variation in customer waiting time than the traditional one at $\alpha = 0.05$.
Suppose that the mean waiting time of the new queueing system is 15 minutes (known
population mean).
(c) [4 points] Construct an unbiased estimator for the variance of the new queueing system
based on the known population mean. Which distribution does this estimator follow?
Please also calculate its parameter.
(d) [4 points] Construct a 98% confidence interval for the variance of waiting time $\sigma^2$
based on the population mean and sample variance.
Hint: $(n-1)S^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2 = \sum_{i=1}^{n}(X_i - \mu + \mu - \bar{X})^2$
(e) [4 points] Compare the confidence intervals in part (a) and part (d). Which is more
accurate, and why?
(f) [5 points] Test whether the new queueing system (individual lines at the various windows)
has a larger variation in customer waiting time than the traditional one at $\alpha = 0.05$,
based on the new information. Is there any difference between part (f) and part (b)?
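
Expanding the square in the hint of part (d) gives $\sum_{i=1}^{n}(X_i - \bar{X})^2 = \sum_{i=1}^{n}(X_i - \mu)^2 - n(\bar{X} - \mu)^2$, because the cross term reduces to $-2n(\bar{X} - \mu)^2$. The short check below confirms this identity numerically. The waiting times and the value of μ are made up for illustration and are not the assignment's data.

```python
from statistics import mean, variance

def sum_sq_about(values, center):
    """Sum of squared deviations of the values from a given center."""
    return sum((x - center) ** 2 for x in values)

waits = [12.0, 9.5, 15.0, 18.5, 11.0]  # hypothetical waiting times (minutes)
mu = 14.0                              # hypothetical known population mean

n = len(waits)
xbar = mean(waits)
s2 = variance(waits)                   # sample variance with divisor n - 1

lhs = (n - 1) * s2                                   # sum of (X_i - Xbar)^2
rhs = sum_sq_about(waits, mu) - n * (xbar - mu) ** 2
print(abs(lhs - rhs) < 1e-9)  # True
```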

Problem 6 [16 points] The mean birth weight in the United States in 2010 was $\mu = 3315$
grams with a standard deviation of $\sigma = 575$. Ten years later, the health department claims
that the mean birth weight has decreased ($\mu < 3315$) and that $\mu = 3200$ is quite plausible,
with the same variation. So we may test the hypotheses $H_0: \mu = 3315$ vs $H_1: \mu = 3200$
$(< 3315)$. We collect a random sample of size $n = 100$: $X_1, X_2, \cdots, X_n$.
(a) [4 points] Define a critical region based on sample mean that has a significance level of
𝛼 = 0.05 (Find the critical value 𝑐, let Pr(𝑋̅ < 𝑐|𝜇 = 3315) = 0.05)
(b) [4 points] Based on the critical region found in part (a), calculate the power of the
test ($H_0: \mu = 3315$ vs $H_1: \mu = 3200$).
(c) [4 points] If the random sample of $n = 100$ yields $\bar{x} = 3189$, what is your conclusion?
(d) [4 points] Construct a 90% confidence interval for $\mu$. Do the hypothesis test and the
confidence interval lead to the same conclusion? What is the relationship between
them?
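
For a lower-tailed test on the mean with known σ, the critical value $c$ solves $\Pr(\bar{X} < c \mid \mu = \mu_0) = \alpha$, and the power against a simple alternative $\mu_1$ is $\Pr(\bar{X} < c \mid \mu = \mu_1)$. The sketch below computes both using the fact that $\bar{X} \sim N(\mu, \sigma^2/n)$. The function name and the inputs are hypothetical and differ from the values in this problem.

```python
from math import sqrt
from statistics import NormalDist

def lower_tail_test(mu0, mu1, sigma, n, alpha):
    """Critical value c with P(Xbar < c | mu0) = alpha, and the power
    P(Xbar < c | mu1), for a normal sample with known sigma."""
    se = sigma / sqrt(n)                     # standard error of Xbar
    c = NormalDist(mu0, se).inv_cdf(alpha)   # alpha-quantile of Xbar under H0
    power = NormalDist(mu1, se).cdf(c)       # rejection probability under H1
    return c, power

# Hypothetical inputs: H0 mean 100, H1 mean 95, sigma 15, n 36, alpha 0.05
c, power = lower_tail_test(100.0, 95.0, 15.0, 36, 0.05)
print(round(c, 3))  # 95.888
print(round(power, 3))
```

The test rejects $H_0$ whenever the observed sample mean falls below `c`.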

---------------End of the Assignment-----------
