GEN1008 / MED1018 / GED1008 Introduction to Statistics – Mid-term Test

Name: Student ID:

INSTRUCTIONS
1. Time allowed: 1.5 hours.
2. Answer all questions on this question-answer paper.
3. The use of HKEA approved calculator(s) is allowed.
4. The use of dictionaries and other electronic devices is prohibited.

DO NOT TURN OVER UNTIL YOU ARE TOLD TO DO SO

Question Score
1
2
3
4
5
Total / 50

GEN1008 / MED1018 / GED1008 Mid-term Test Page 1 of 17


Question 1 (10 marks)

A health care professional wishes to analyse the Body Mass Index (BMI) of teenagers
aged from 13 to 19. He conducted a survey and collected the BMI readings (in kg/m²)
of 24 teenagers. The data are shown in the table below. It is given that the mean of
the data is 21.25 kg/m².

25 20 22 31 20 19 23 25
18 16 20 18 25 17 22 26
18 22 17 19 15 21 28 23

a) Construct a frequency distribution table for the above sample data using 6 classes.
Use 15 as the lower limit of the first class. Write a title and include the class
limits, the class boundaries and the frequency as columns in your table. [4]
b) What is the shape of the distribution of data in the sample? Compare the mean
and the median in the sample data. [2]
c) Given $\sum X^2 = 11200$, find the standard deviation of the sample data. [2]
d) How will the standard deviation in (c) be affected if the two largest values in the
sample are changed to numbers higher than 32 kg/m²? Explain your answer
briefly. [2]

[Solution]

a)
Frequency distribution of the Body Mass Index (kg/m²) of 24 teenagers in the sample

Class Limits    Class Boundaries    Frequency
15 - 17         14.5 - 17.5         4
18 - 20         17.5 - 20.5         8
21 - 23         20.5 - 23.5         6
24 - 26         23.5 - 26.5         4
27 - 29         26.5 - 29.5         1
30 - 32         29.5 - 32.5         1
Total                               24
(4 marks)
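
As a cross-check, the tally in (a) can be reproduced with a short Python sketch. The class width of 3 is an assumption derived here (range 31 - 15 = 16, divided by 6 classes and rounded up); the names `bmi`, `class_width` and `table` are illustrative only.

```python
# BMI readings copied from the question.
bmi = [25, 20, 22, 31, 20, 19, 23, 25,
       18, 16, 20, 18, 25, 17, 22, 26,
       18, 22, 17, 19, 15, 21, 28, 23]

first_lower = 15   # lower limit of the first class (given)
num_classes = 6    # number of classes (given)
class_width = 3    # assumption: (31 - 15) / 6 = 2.67, rounded up to 3

table = []
for i in range(num_classes):
    lower = first_lower + i * class_width
    upper = lower + class_width - 1
    frequency = sum(lower <= x <= upper for x in bmi)
    # Boundaries sit half a unit outside the limits because the data are whole numbers.
    table.append((lower, upper, lower - 0.5, upper + 0.5, frequency))

for lower, upper, lower_b, upper_b, frequency in table:
    print(f"{lower} - {upper}    {lower_b} - {upper_b}    {frequency}")
```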

b) The sample data is positively skewed. The mean is larger than the median.
(2 marks)
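
The comparison in (b) can be checked numerically. The sketch below uses Python's standard `statistics` module; a mean above the median is consistent with a positive skew but does not prove it on its own, so the shape should still be read from the frequency table.

```python
import statistics

# BMI readings copied from the question.
bmi = [25, 20, 22, 31, 20, 19, 23, 25,
       18, 16, 20, 18, 25, 17, 22, 26,
       18, 22, 17, 19, 15, 21, 28, 23]

mean_bmi = statistics.mean(bmi)      # matches the mean given in the question
median_bmi = statistics.median(bmi)  # average of the 12th and 13th ordered values

print(f"mean = {mean_bmi}, median = {median_bmi}")
print("mean > median:", mean_bmi > median_bmi)
```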

c) Since the sample mean is 21.25, $\sum X = 24 \times 21.25 = 510$.
(0.5 marks)

The sample variance is

$$s^2 = \frac{n\sum X^2 - (\sum X)^2}{n(n-1)} = \frac{24(11200) - 510^2}{24(24-1)} = 15.761$$

The sample standard deviation is

$$s = \sqrt{15.761} = 3.97 \text{ kg/m}^2$$
(1.5 marks)
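
The shortcut formula used in (c) can be evaluated directly from the summary figures. This is a minimal sketch; the variable names are illustrative and only the numbers given in the question are used.

```python
import math

n = 24
mean = 21.25
sum_x = n * mean       # sum of the readings recovered from the mean
sum_x_sq = 11200       # sum of squared readings (given)

# Shortcut formula for the sample variance.
variance = (n * sum_x_sq - sum_x ** 2) / (n * (n - 1))
sd = math.sqrt(variance)

print(f"sum = {sum_x}, variance = {variance:.3f}, sd = {sd:.2f} kg/m^2")
```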

d) The standard deviation will be larger since the values in the data set are more
dispersed. (2 marks)
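
The effect described in (d) can be illustrated with a small experiment. The replacement values 33 and 35 are hypothetical, chosen only because they exceed 32; any other values above 32 would show the same direction of change.

```python
import statistics

# BMI readings copied from the question.
bmi = [25, 20, 22, 31, 20, 19, 23, 25,
       18, 16, 20, 18, 25, 17, 22, 26,
       18, 22, 17, 19, 15, 21, 28, 23]

# Hypothetical change: drop the two largest readings (28 and 31)
# and replace them with 33 and 35.
modified = sorted(bmi)[:-2] + [33, 35]

sd_before = statistics.stdev(bmi)       # sample standard deviation
sd_after = statistics.stdev(modified)

print(f"before: {sd_before:.2f}, after: {sd_after:.2f}")
```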

Question 2 (10 marks)

Consider a population of university graduates whose scores in an aptitude test are
analysed. There are two groups of graduates: Group A graduates have degrees in
the humanities and Group B graduates have degrees in science. The mean and standard
deviation of the scores of both groups and of all graduates are shown in the table below. It is
found that the scores of all graduates follow a bimodal distribution.

            Mean    Standard Deviation
Group A     63.9    9.1
Group B     78.8    10.6
Overall     71.4    12.3

a) Is the variable in the study qualitative or quantitative? [1]


b) What are the coefficients of variation of the scores of Group A and Group B
graduates respectively? Compare the dispersion of scores of these two groups of
graduates. [3]
c) Using Chebyshev's Theorem with the overall mean and standard deviation,
find the range of scores within which at least 80% of all graduates fall. [3]
d) If the passing mark of the aptitude test is 40, is it reasonable to say that at least
80% of all graduates get a pass in the test? [1]
e) Explain whether there is any difference in your answer in (d) if the distribution
of scores of all graduates is not bimodal, but is uniform. [2]

[Solution]

a) The variable is quantitative. (1 mark)

b) For Group A graduates, the coefficient of variation is

$$CVar = \frac{\sigma}{\mu} \times 100\% = \frac{9.1}{63.9} \times 100\% = 14.24\%$$

For Group B graduates, the coefficient of variation is

$$CVar = \frac{\sigma}{\mu} \times 100\% = \frac{10.6}{78.8} \times 100\% = 13.45\%$$
(1 mark)
Since the coefficient of variation of the scores of Group A graduates is larger, the
scores of Group A graduates are relatively more dispersed.
(2 marks)
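
The two coefficients of variation can be verified with a small helper. The function name `coefficient_of_variation` is illustrative, not part of any library.

```python
def coefficient_of_variation(mean, sd):
    """Return the coefficient of variation as a percentage."""
    return sd / mean * 100

cv_a = coefficient_of_variation(mean=63.9, sd=9.1)    # Group A
cv_b = coefficient_of_variation(mean=78.8, sd=10.6)   # Group B

print(f"Group A: {cv_a:.2f}%  Group B: {cv_b:.2f}%")
print("Group A relatively more dispersed:", cv_a > cv_b)
```

Note that Group B has the larger standard deviation in absolute terms; it is only relative to the mean that Group A is more dispersed.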
c) Setting $1 - 1/k^2 = 80\%$ gives $k = \sqrt{5} = 2.236$. According to Chebyshev's
Theorem, at least 80% of scores will be within 2.236 standard deviations of the
mean. So, the lower limit is
$$\mu - k\sigma = 71.4 - 2.236(12.3) = 43.89$$
The upper limit is
$$\mu + k\sigma = 71.4 + 2.236(12.3) = 98.9$$

So, at least 80% of all graduates will have aptitude scores between 43.89 and 98.9
marks. (3 marks)
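
The Chebyshev calculation can be sketched as a small function. The name `chebyshev_interval` is hypothetical; because the code uses the unrounded value of k, its limits may differ from the worked figures in the second decimal place (both come to roughly 43.9 and 98.9).

```python
import math

def chebyshev_interval(mean, sd, proportion):
    """Interval holding at least `proportion` of the data for any distribution."""
    k = 1 / math.sqrt(1 - proportion)   # solves 1 - 1/k^2 = proportion
    return mean - k * sd, mean + k * sd, k

lower, upper, k = chebyshev_interval(mean=71.4, sd=12.3, proportion=0.80)

print(f"k = {k:.3f}, interval = ({lower:.1f}, {upper:.1f})")
print("lower limit above the passing mark of 40:", lower > 40)
```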

d) Yes, this is reasonable, since at least 80% of students have scores higher than
43.89, which is higher than the passing mark. (1 mark)

e) There is no difference, since Chebyshev's Theorem is applicable to any
distribution. (2 marks)

Question 3 (10 marks)

There is a study on the reading time (hours per week) of secondary school students in
Hong Kong. A recent report reveals that the population mean time is 3.2 hours per
week and the population standard deviation is 0.8 hours per week. It is assumed that
the reading time follows a normal distribution.

a) Find the proportion of secondary school students in Hong Kong whose reading
time is less than 2.4 hours per week. [2]
b) Suppose a researcher wishes to select students whose reading time is in the
middle 50% of the reading time of the population. Find the range of the reading
time of students who may be selected for the research. [4]
c) Another researcher selected a sample of 45 students from the whole population.
What is the probability that the mean reading time of the sample is less than 2.8
hours per week? [2]
d) Explain whether there are any changes in your answer in (c) if the reading time
follows a left-skewed distribution, instead of a normal distribution. [2]

[Solution]

a) The required proportion is

$$P(X < 2.4) = P\left(Z < \frac{2.4 - 3.2}{0.8}\right) = P(Z < -1) = 0.1587$$

About 15.87% of secondary school students in Hong Kong have a reading
time of less than 2.4 hours per week.
(2 marks)
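
The table lookup in (a) can be cross-checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch for verification only; in the test itself the value comes from the normal table.

```python
from statistics import NormalDist

# Reading time in hours per week, as stated in the question.
reading_time = NormalDist(mu=3.2, sigma=0.8)

z = (2.4 - reading_time.mean) / reading_time.stdev   # standard score of 2.4 hours
p_below = reading_time.cdf(2.4)                      # P(X < 2.4)

print(f"z = {z:.2f}, P(X < 2.4) = {p_below:.4f}")
```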

b) To find the lower and upper reading times $x_1$ and $x_2$,

$$P(x_1 < X < x_2) = 0.5$$
$$P\left(\frac{x_1 - 3.2}{0.8} < Z < \frac{x_2 - 3.2}{0.8}\right) = 0.5$$

So, since 25% of the distribution lies in each tail,

$$P\left(Z < \frac{x_1 - 3.2}{0.8}\right) = 0.25 ; \quad P\left(Z < \frac{x_2 - 3.2}{0.8}\right) = 0.75$$

Therefore,

$$\frac{x_1 - 3.2}{0.8} = -0.67 , \qquad \frac{x_2 - 3.2}{0.8} = 0.67$$
$$x_1 = 3.2 - 0.67(0.8) = 2.66 , \qquad x_2 = 3.2 + 0.67(0.8) = 3.74$$

Students whose reading time is between 2.66 and 3.74 hours per week may be
selected for the research.
(4 marks)
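
The two cut-off values in (b) are the first and third quartiles of the distribution, so they can be checked with the inverse CDF. The sketch below uses `NormalDist.inv_cdf`; it works with the exact z value (about 0.6745) rather than the table value 0.67, yet agrees with the worked answer to two decimal places.

```python
from statistics import NormalDist

# Reading time in hours per week, as stated in the question.
reading_time = NormalDist(mu=3.2, sigma=0.8)

x1 = reading_time.inv_cdf(0.25)   # 25% of students read less than this
x2 = reading_time.inv_cdf(0.75)   # 75% of students read less than this

print(f"middle 50%: {x1:.2f} to {x2:.2f} hours per week")
```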

c) The required probability is

$$P(\bar{X} < 2.8) = P\left(Z < \frac{2.8 - 3.2}{0.8/\sqrt{45}}\right) = P(Z < -3.35) = 0.0004$$

The probability that the mean reading time of the sample is less than 2.8 hours
per week is 0.0004.
(2 marks)
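
For (c), the sampling distribution of the mean has the same centre as the population but a smaller spread, the standard error. A minimal sketch of the calculation, again using `statistics.NormalDist` as a stand-in for the normal table:

```python
import math
from statistics import NormalDist

mu, sigma, n = 3.2, 0.8, 45

standard_error = sigma / math.sqrt(n)             # spread of the sample mean
sample_mean_dist = NormalDist(mu, standard_error)

z = (2.8 - mu) / standard_error                   # standard score of a mean of 2.8
p_below = sample_mean_dist.cdf(2.8)               # P(sample mean < 2.8)

print(f"z = {z:.2f}, P = {p_below:.4f}")
```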

d) The answer is essentially the same. (0.5 marks)

This is because, by the Central Limit Theorem, the distribution of sample means
is approximately normal when the sample size is large (n > 30), whatever the shape
of the population distribution. Here n = 45. (1.5 marks)

Remarks:
- If the student just states “because of the Central Limit Theorem”, deduct 0.5
marks.
- If the student just states “because the sample size is large”, deduct 0.5 marks.

Question 4 (10 marks)

Suppose you work in a research team to investigate whether cancer patients are
satisfied with their lives after receiving a radiation therapy treatment. In a pilot study
(i.e., a smaller scale study), a sample proportion 0.64 is used to determine a 90%
confidence interval, which is [0.5692, 0.7108]. The sample proportion is the
proportion of patients in the sample who are satisfied with their lives.

a) Based on the results from the pilot study, is it reasonable to conclude that more
than half of cancer patients are satisfied with their lives after receiving a
radiation therapy treatment? Explain your answer. [2]
b) What is the margin of error of the confidence interval in the pilot study? [1]
c) Suppose you want to conduct a larger scale study to find the confidence interval
of proportion. The desired level of confidence is 95% and the margin of error is
half of that in the pilot study result. By using the sample proportion in the pilot
study, find the minimum sample size required. [3]
d) You finally used a sample of 750 patients in your study and 492 of them are
satisfied with their lives after receiving the treatment. Find the 95% confidence
interval of the proportion using this sample data. Interpret your answer in the
context of the subject matter. [4]

[Solution]

a) Yes, it is reasonable, since the entire 90% confidence interval lies above 0.5
(its lower limit is 0.5692). (2 marks)

b) The margin of error is (0.7108 − 0.5692)/2 = 0.1416/2 = 0.0708. (1 mark)

c) At the 95% confidence level, $z = 1.96$.

The margin of error is $E = 0.0708/2 = 0.0354$.

The required sample size is

$$n = \hat{p}(1 - \hat{p})\left(\frac{z}{E}\right)^2 = 0.64(1 - 0.64)\left(\frac{1.96}{0.0354}\right)^2 = 706.29, \text{ rounded up to } 707$$

Therefore, a minimum of 707 patients is required. (3 marks)
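
The sample-size calculation can be sketched as follows. The value z = 1.96 is entered as the table value rather than computed, and the variable names are illustrative.

```python
import math

p_hat = 0.64                                   # pilot sample proportion
pilot_lower, pilot_upper = 0.5692, 0.7108      # pilot 90% confidence interval

pilot_margin = (pilot_upper - pilot_lower) / 2   # half the interval width
target_margin = pilot_margin / 2                 # half of the pilot margin
z = 1.96                                         # table value for 95% confidence

n_exact = p_hat * (1 - p_hat) * (z / target_margin) ** 2
n_required = math.ceil(n_exact)   # always round up to meet the margin

print(f"E = {target_margin:.4f}, n = {n_exact:.2f}, minimum sample size = {n_required}")
```

Rounding up rather than to the nearest integer matters here: rounding down would leave the margin of error slightly larger than required.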

d) The sample proportion is $\hat{p} = 492/750 = 0.656$.
The 95% confidence interval for the population proportion is

$$\hat{p} \pm z\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.656 \pm 1.96\sqrt{\frac{0.656(1 - 0.656)}{750}} = 0.656 \pm 0.034 = (0.622, 0.690)$$

(2 marks)

We are 95% confident that between 62.2% and 69.0% of cancer patients are
satisfied with their lives after receiving the treatment. (2 marks)
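
The interval in (d) can be reproduced with a few lines of Python. As before, z = 1.96 is taken from the table and the variable names are illustrative.

```python
import math

satisfied, n = 492, 750
z = 1.96                                   # table value for 95% confidence

p_hat = satisfied / n                      # sample proportion
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
ci_lower, ci_upper = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.3f}, margin = {margin:.3f}")
print(f"95% CI = ({ci_lower:.3f}, {ci_upper:.3f})")
```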

Question 5 (10 marks)

A treatment is given to patients with diabetes. To study the effectiveness of the
treatment, 56 patients with diabetes were chosen and took the treatment for
one week. The fasting glucose levels (in mmol/L) of these patients were then measured.
The sample mean is 5.3 mmol/L and the sample standard deviation is 0.7 mmol/L.

a) What is the level of measurement of the variable in the study? [1]


b) Find the 95% confidence interval for the population mean of the variable in the study. [2]
c) Interpret your answer in (b) in the context of the subject matter. [2]
d) What are the effects on the confidence interval width in (b) if (i) the sample
mean is higher than 5.3 mmol/L, and (ii) the sample standard deviation is higher
than 0.7 mmol/L, respectively? [2]
e) The sample is also used to test a hypothesis that the mean fasting glucose level of
patients after treatment is less than 6 mmol/L. Write down the null and the
alternative hypothesis for the test and identify the claim. [2]
f) Is the hypothesis test in (e) a left-tailed, right-tailed or two-tailed test? [1]

[Solution]

a) The variable is measured at ratio level. (1 mark)

b) The degrees of freedom are $n - 1 = 55$. At the 95% level of confidence, $t = 2.004$.

$$\bar{x} \pm t\left(\frac{s}{\sqrt{n}}\right) = 5.3 \pm 2.004\left(\frac{0.7}{\sqrt{56}}\right) = 5.3 \pm 0.1875 = (5.11, 5.49)$$
(2 marks)
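
The t interval can be checked with the sketch below. Python's standard library has no t distribution, so the critical value 2.004 is simply copied from the solution above rather than computed.

```python
import math

n, sample_mean, sample_sd = 56, 5.3, 0.7   # mmol/L
df = n - 1                                  # degrees of freedom
t_crit = 2.004                              # table value for 55 df, 95% confidence

margin = t_crit * sample_sd / math.sqrt(n)
ci_lower, ci_upper = sample_mean - margin, sample_mean + margin

print(f"df = {df}, margin = {margin:.4f}")
print(f"95% CI = ({ci_lower:.2f}, {ci_upper:.2f}) mmol/L")
```

The margin depends only on the critical value, the sample standard deviation and the sample size; the sample mean just shifts the centre of the interval.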

c) We are 95% confident that the population mean fasting glucose level of all
patients with diabetes is between 5.11 mmol/L and 5.49 mmol/L one week after
the treatment. (2 marks)

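d) (i) No change: the width depends on t, s and n, not on the sample mean.
(ii) The interval is wider, since the margin of error is proportional to s. (2 marks)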

e) 𝐻0 : 𝜇 = 6
𝐻1 : 𝜇 < 6 (claim)
(2 marks)

f) This is a left-tailed test. (1 mark)

THE END

Formulae

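Sample mean for individual data:

$$\bar{X} = \frac{\sum X}{n}$$

Sample mean for grouped data:

$$\bar{X} = \frac{\sum f \cdot X_m}{n}$$

Population variance:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} \quad \text{or} \quad \sigma^2 = \frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2$$

Sample variance:

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \quad \text{or} \quad s^2 = \frac{n(\sum X^2) - (\sum X)^2}{n(n - 1)}$$

Coefficient of Variation for population data:

$$CVar = \frac{\sigma}{\mu} \times 100\%$$

Coefficient of Variation for sample data:

$$CVar = \frac{s}{\bar{x}} \times 100\%$$

Standard score:

$$z = \frac{X - \mu}{\sigma}$$

Standard error of the mean:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

Central Limit Theorem formula:

$$z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$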

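z confidence interval for means:

$$\bar{X} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$

t confidence interval for means:

$$\bar{X} \pm t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$$

Sample size for means:

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$

Sample proportion:

$$\hat{p} = \frac{x}{n}$$

Confidence interval for a proportion:

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

Sample size for a proportion:

$$n = \hat{p}(1 - \hat{p})\left(\frac{z_{\alpha/2}}{E}\right)^2$$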

Statistical tables
