GRMD2102 - Homework 3 - With - Answer

GRMD2102 Fundamental Statistics for Geographers

Term 1, 2022-23
Homework 3: Point and Interval Estimation, Hypothesis testing

Notes for Submission Requirements:


1. Round all calculation results to 3 decimal places, i.e. 0.001
2. A student should finish all questions independently with the necessary detailed
procedures.
3. Please submit the homework as a single electronic file in Word or PDF format.
⋅ You can type your answer immediately after each question in the question file.
⋅ If you answer a question by hand, take a clear and readable photo of that
answer and insert the photo immediately after the corresponding question.
⋅ You can also answer all the questions together by hand, clearly indicating the
number of each question, then take a clear and readable photo of all the
answers and insert it into the answer file.
4. Save the single electronic file with the file name:
• HW3_<surname first full name>.docx/pdf
5. Submit the single electronic file on the course Blackboard, in the Homework folder, via
the link “Submission of Homework Three”, using Attach Files.
6. Submission deadline: 5:00 PM on Friday, 25 November 2022

Questions for Homework Three:

1. Give a geographical example to illustrate the relationship between the level of significance
α and type I error in a hypothesis testing.
A Type I error occurs when the null hypothesis (H0) is true but is rejected. The level of
significance α is the probability of making this error. For example, suppose we want to
test the claim that the mean PM2.5 concentration in Hong Kong is 30 μg/m3. We collect
samples and calculate the sample mean. If we set the significance level to 0.01, there is
a 1% risk of rejecting the null hypothesis that the mean PM2.5 concentration in Hong Kong
is 30 μg/m3 when it is in fact true. If we set the significance level to 0.05, that risk is
5%. A smaller significance level α therefore lowers the risk of a Type I error.
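
The link between α and the Type I error rate can be checked with a small simulation. The sketch below is illustrative only: the population standard deviation (5 μg/m3), the sample size (30), the number of trials, the seed and the function name are all assumptions, not values from the question. It draws repeated samples from a population in which H0 is true and counts how often a two-sided z test rejects it.

```python
import random
from statistics import NormalDist


def type_i_error_rate(alpha, mu0=30.0, sigma=5.0, n=30, trials=10_000, seed=0):
    """Share of two-sided z tests that reject a TRUE null hypothesis mu = mu0.

    sigma, n, trials and seed are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    standard_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where H0 really holds.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / standard_error
        if abs(z) > z_crit:
            rejections += 1  # H0 is true but rejected: a Type I error
    return rejections / trials


rate_01 = type_i_error_rate(0.01)
rate_05 = type_i_error_rate(0.05)
print(f"alpha = 0.01 -> observed Type I error rate {rate_01:.4f}")
print(f"alpha = 0.05 -> observed Type I error rate {rate_05:.4f}")
```

Under these assumptions the observed rejection rates should land close to 0.01 and 0.05 respectively, which is what the level of significance means.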

2. Why is an interval estimate more useful than a point estimate in inferential statistics?
An interval estimate is more useful than a point estimate in inferential statistics because
it gives a range of plausible values for the parameter together with a confidence level,
whereas a point estimate is a single value that carries no information about its precision.

3. The quality control manager at a light bulb factory needs to estimate the mean life of a large
shipment of light bulbs. The standard deviation is 100 hours. A random sample of 64 light
bulbs indicated a sample mean life of 350 hours.

a. Construct a 90 % confidence interval estimate of the population mean life of light
bulbs in this shipment. (329.439, 370.561) hours
b. Construct a 99% confidence interval estimate of the population mean life of light
bulbs in this shipment. (317.802, 382.198) hours
c. Suppose that the standard deviation changes to 80 hours. What are your answers in
(a) and (b)?
90% confidence: (333.551, 366.449) hours
99% confidence: (324.242, 375.758) hours
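
The four intervals above can be reproduced with a short script. This is a minimal sketch, assuming the interval has the form (sample mean) ± z × σ/√n with a known σ; the function name `z_interval` is hypothetical, and the critical value is taken from Python's `statistics.NormalDist` rather than from a printed table.

```python
from statistics import NormalDist


def z_interval(x_bar, sigma, n, confidence):
    """Confidence interval for a mean when the population sigma is known."""
    # Upper critical value: e.g. about 1.645 for 90%, about 2.576 for 99%.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / n ** 0.5
    return x_bar - margin, x_bar + margin


for sigma in (100, 80):
    for confidence in (0.90, 0.99):
        low, high = z_interval(350, sigma, 64, confidence)
        print(f"sigma={sigma}, {confidence:.0%}: ({low:.3f}, {high:.3f}) hours")
```

A smaller σ or a lower confidence level both shrink the margin of error, which is why the intervals in (c) are narrower than those in (a) and (b).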

4. A random sample of 7 people, drawn from a normal population with mean µ and variance σ²,
were interviewed about their weekly reading hours. The sample data (weekly hours) are given
below:
9.1; 14; 14.5; 12.2; 11; 12; 9.8

a. Given that σ = 2, find a 95% confidence interval for µ.


Since the population is normally distributed, sampling distribution of the sample mean is
also a normal distribution.
From 1 − 𝛼 = 0.95, we know that 𝑍α/2 = −1.96.
From the question, we know that 𝜎 = 2, and 𝑛 = 7.
From the samples, we can calculate that 𝑋̅ = 11.800.
$$\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 11.800 \pm 1.96 \times \frac{2}{\sqrt{7}} = 11.800 \pm 1.482$$
The 95% confidence interval for 𝜇 is between 10.318 and 13.282.

b. If σ² is unknown, find an approximate 95% confidence interval for µ.


Since σ² is unknown, we need to estimate the standard deviation from the sample, and the
t distribution should be used.
From n = 7, we know that the degrees of freedom are n − 1 = 6.
From 1 − 𝛼 = 0.95, we know that 𝑡α/2,n−1 = −𝑡1−α/2,n−1 = −2.447.
From the samples, we can calculate that 𝑋̅ = 11.800 and 𝑠 = 2.011.
$$\bar{X} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} = 11.800 \pm 2.447 \times \frac{2.011}{\sqrt{7}} = 11.800 \pm 1.860$$
Since 𝜎 2 is unknown, the approximate 95% confidence interval for 𝜇 is between 9.940 and
13.660.

c. Explain the difference in the results of (a) and (b)


Question (a) has a known σ², while question (b) does not. In (b), we need to estimate the
standard deviation from the sample in order to obtain the standard error of the sample
mean. Estimating the standard deviation introduces additional uncertainty, so the
standardized sample mean no longer follows the standard normal distribution; it follows a
t distribution with n − 1 degrees of freedom.
The confidence interval in (b) is wider than in (a) because the critical value in (b)
comes from the t distribution (2.447), whereas in (a) it comes from the standard normal
distribution (1.96). In addition, the sample standard deviation s = 2.011 is slightly
larger than σ = 2. These differences lead to the different confidence intervals.
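
The two intervals in Question 4 can be recomputed side by side. This is a sketch under stated assumptions: the critical values 1.96 and 2.447 are hard-coded from the tables used in the solution (the Python standard library has no t distribution), and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

hours = [9.1, 14, 14.5, 12.2, 11, 12, 9.8]
n = len(hours)
x_bar = mean(hours)  # sample mean
s = stdev(hours)     # sample standard deviation (divides by n - 1)

Z_CRIT = 1.96   # table value, 95% confidence, sigma known
T_CRIT = 2.447  # table value, 95% confidence, 6 degrees of freedom

margin_z = Z_CRIT * 2 / sqrt(n)  # part (a): sigma = 2 is given
margin_t = T_CRIT * s / sqrt(n)  # part (b): sigma is estimated by s

interval_z = (x_bar - margin_z, x_bar + margin_z)
interval_t = (x_bar - margin_t, x_bar + margin_t)

print(f"mean = {x_bar:.3f}, s = {s:.3f}")
print(f"(a) z interval: ({interval_z[0]:.3f}, {interval_z[1]:.3f})")
print(f"(b) t interval: ({interval_t[0]:.3f}, {interval_t[1]:.3f})")
```

The t interval is wider mainly because its critical value is larger than the z value for the same confidence level.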

5. There is one sample of 100 students. The gender ratio is not known beforehand. It is
found that the sample contains 40 male students.

a. Let p be the probability that a male person has been obtained in one sample. Find an
approximate 99% confidence interval for p.
This question is about the confidence interval of proportion.
To check for normality: a random sample and 𝑛𝜋 > 5 and 𝑛(1 − 𝜋) > 5.
From 𝑛 = 100 and 𝑋 = 40, we have $p = \frac{X}{n} = \frac{40}{100} = 0.400$.
From 1 − 𝛼 = 0.99, we know that 𝑍𝛼/2 = −2.58,
$$p \pm Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} = 0.400 \pm 2.58\sqrt{\frac{0.400 \times 0.600}{100}} = 0.400 \pm 0.126$$
The 99% confidence interval for p is between 0.274 and 0.526.

b. Find an approximate 95% confidence interval for the number of the male persons
obtained in one sample.
From 1 − 𝛼 = 0.95, we know that 𝑍𝛼/2 = −1.96,
$$p \pm Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} = 0.400 \pm 1.96\sqrt{\frac{0.400 \times 0.600}{100}} = 0.400 \pm 0.096$$
The 95% confidence interval for p is between 0.304 and 0.496.
𝑁𝑢𝑝𝑝𝑒𝑟 = 100 × 0.496 = 49.6 < 50
𝑁𝑙𝑜𝑤𝑒𝑟 = 100 × 0.304 = 30.4 > 30
The approximate 95% confidence interval for the number of the male persons obtained is
between 30 and 50 persons.
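
The proportion intervals in Question 5 follow the same pattern and can be sketched in a few lines. The function name `proportion_interval` is hypothetical, and the normal approximation is assumed to hold (both np and n(1 − p) exceed 5 here). Critical values come from `statistics.NormalDist`, so they are slightly more precise than the table values 2.58 and 1.96.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(x, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low_99, high_99 = proportion_interval(40, 100, 0.99)
low_95, high_95 = proportion_interval(40, 100, 0.95)

print(f"99% interval for p: ({low_99:.3f}, {high_99:.3f})")
print(f"95% interval for p: ({low_95:.3f}, {high_95:.3f})")
# Scale the 95% bounds by n to express them as a number of male students.
print(f"95% interval for the count: ({100 * low_95:.1f}, {100 * high_95:.1f})")
```

Multiplying the bounds by n = 100 turns the interval for the proportion into an interval for the count, as in part (b).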

6. Given σ = 25, a researcher wants to estimate the mean µ to within ±5.

a. If 99% confidence is desired, what sample size is needed?


Given 𝜎 = 25 and 𝑒 = 5,
From 1 − 𝛼 = 0.99, we know that 𝑍𝛼/2 = −2.58,
$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{e}\right)^2 = \left(\frac{-2.58 \times 25}{5}\right)^2 = 166.41 \approx 167$$

b. If 95% confidence is desired, what sample size is necessary?


From 1 − 𝛼 = 0.95, we know that 𝑍𝛼/2 = −1.96,
$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{e}\right)^2 = \left(\frac{-1.96 \times 25}{5}\right)^2 = 96.04 \approx 97$$

c. Compare the results in (a) and (b) and provide the interpretation on the required
sample size.
The required sample size in (a) is larger than in (b).
As the confidence level increases, a larger sample size is required to keep the same
margin of error.
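
The sample-size calculation in Question 6 can be sketched as follows. The function name `required_sample_size` is hypothetical. The table values 2.58 and 1.96 are passed in explicitly so that the results match the hand calculation; the last lines show, as a side note, what an unrounded critical value would give.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, margin, z):
    """Smallest whole n for which z * sigma / sqrt(n) does not exceed margin."""
    return ceil((z * sigma / margin) ** 2)


n_99 = required_sample_size(25, 5, 2.58)  # table value for 99% confidence
n_95 = required_sample_size(25, 5, 1.96)  # table value for 95% confidence
print(f"99% confidence: n = {n_99}")
print(f"95% confidence: n = {n_95}")

# Side note: an unrounded 99% critical value (about 2.576) gives a slightly
# smaller result, because the table value 2.58 is rounded upward.
z_exact = NormalDist().inv_cdf(0.995)
n_99_exact = required_sample_size(25, 5, z_exact)
print(f"99% confidence with unrounded z: n = {n_99_exact}")
```

The result is always rounded up, since rounding down would leave the margin of error slightly above the target.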

7. You work for a clothing company and are interested in the average height of men aged 18-
30 in China. You sample 64 men in that age group and find the average height to be 70
inches. The standard deviation is known from past studies to be 3 inches. Is there evidence
at the 1% significance level that the average height of men 18-30 is above 69 inches? What
is the p-value?

Given: n = 64, hypothesized mean µ = 69, σ = 3, α = 0.01, 𝑋̅ = 70

Step 1: State null and alternative hypothesis.


𝐻0 : µ ≤ 69, 𝐻1 : µ > 69

Step 2: Select the level of significance.


𝛼 = 0.01 as stated in the problem.

Step 3: Select the test statistic.

Z-distribution since σ is known

Step 4: Formulate the decision rule.


Reject H0 if Z > Z1−α
Z1−α = 2.33
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{70 - 69}{3/\sqrt{64}} = 2.667$

Step 5: Make the decision. Since Z = 2.667 > Z1−α = 2.33, reject H0.

Implication: at the 1% significance level, there is evidence that the average height of men
aged 18-30 is above 69 inches.

p-value = P(Z > 2.667) = 0.0038
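
The test statistic and p-value for Question 7 can be checked numerically. This is a minimal sketch of a one-sided (upper-tail) z test with known σ; the variable names are illustrative, and the critical value is computed with `statistics.NormalDist` instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist

n, mu0, sigma, alpha, x_bar = 64, 69, 3, 0.01, 70

standard_normal = NormalDist()
z = (x_bar - mu0) / (sigma / sqrt(n))      # (70 - 69) / (3 / 8)
z_crit = standard_normal.inv_cdf(1 - alpha)  # upper-tail critical value
p_value = 1 - standard_normal.cdf(z)         # P(Z > z)

print(f"Z = {z:.3f}, critical value = {z_crit:.3f}")
print(f"p-value = {p_value:.4f}")
print("Reject H0" if z > z_crit else "Fail to reject H0")
```

Comparing Z with the critical value and comparing the p-value with α are two views of the same decision rule, so they always agree.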

8. CUHK conducted a survey which asked students if they have a plan to take a statistics
course; 552 had such a plan, and 532 did not.

a. At the 0.05 level of significance, use the hypothesis-testing approach to prove that
the percentage of people who want to study abroad is not more than 50%.
Calculate the p-value and interpret its meaning.

This question is about hypothesis test of proportion and p-value.


Given 𝑝=0.50, x=552, n=552+532=1084, 𝑝̂ =x/n=0.509 and 𝛼=0.05.
Step 1: State null and alternative hypothesis.
𝐻0 : 𝑝 ≤ 0.50;
𝐻1 : 𝑝 > 0.50.

Step 2: Select the level of significance.
𝛼 = 0.05 as stated in the problem.
Step 3: Select the test statistic.
Use the standard normal distribution since the assumptions are met and both
𝑛𝑝 > 5 and 𝑛(1 − 𝑝) > 5 (here 𝑛𝑝 = 𝑛(1 − 𝑝) = 542).

Step 4: Formulate the decision rule.


Reject 𝐻0 if 𝑍 > 𝑍1−𝛼 .
From the table, we know that 𝑍1−𝛼 = 1.65.
$$Z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{0.509 - 0.50}{\sqrt{\frac{0.50 \times 0.50}{1084}}} = 0.593$$
Step 5: Fail to reject 𝐻0 , 𝑍 = 0.593 < 𝑍1−𝛼 = 1.65.

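Interpretation: At the 0.05 level of significance, there is insufficient evidence that the
percentage of people who want to study abroad is more than 50%. Failing to reject H0 does
not prove H0; it only means the data are consistent with a percentage of not more than 50%.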

p-value = P(Z > 0.593) = 1 - P(Z ≤ 0.593) = 1 - 0.723 = 0.277.

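Since p-value = 0.277 > 0.05, we do not reject H0 at the significance level of 0.05.
The p-value is the probability of observing a sample proportion at least as large as 0.509
if the true proportion were 0.50. H0 would be rejected only at a significance level of
0.277 or above.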
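
The proportion test in Question 8 can be reproduced as follows. This is a sketch of an upper-tail z test for a proportion; the variable names are illustrative. The sample proportion is rounded to 3 decimal places first, an assumption made here so that the result follows the homework's rounding rule and the hand calculation.

```python
from math import sqrt
from statistics import NormalDist

x, n, p0, alpha = 552, 1084, 0.50, 0.05

# Rounded to 3 decimals to follow the homework's rounding rule (assumption).
p_hat = round(x / n, 3)

standard_normal = NormalDist()
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
z_crit = standard_normal.inv_cdf(1 - alpha)  # about 1.645
p_value = 1 - standard_normal.cdf(z)         # P(Z > z)

print(f"p_hat = {p_hat}, Z = {z:.3f}, critical value = {z_crit:.3f}")
print(f"p-value = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Using the unrounded sample proportion would change Z and the p-value slightly, but not the decision.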

***************************** END *****************************
