Stuvia 509464 St104a Statistics 1 Exams With Commentaries 2011 2018
commentaries 2011-2018
written by
mreducation
www.stuvia.com
BSc degrees and Diplomas for Graduates in Economics, Management, Finance and the
Social Sciences, the Diplomas in Economics and Social Sciences and Access Route for
External Students
Extracts from statistical tables are given after the final question on this paper
Graph paper is provided at the end of this question paper. If used, it must be detached and
fastened securely inside the answer book.
A calculator may be used when answering questions on this paper and it must comply in all
respects with the specification given with your Admission Notice. The make and type of
machine must be clearly stated on the front cover of the answer book.
SECTION A
UL11/0185 Page 2 of 6
D01
(6 marks)
(g) The summary statistics for 2 independent datasets from a population with a
normal distribution are as follows:
            Sample size    Sample mean    Sample standard deviation
𝑥 data          13             4.3                  1.2
𝑦 data          21             4.9                  1.4
Compute the mean and the variance of the combined dataset.
(6 marks)
(h) Assume that the marks of students at a certain university are normally
distributed with mean 52 and variance 100. Consider a randomly chosen
student from that university and find the probability
i. of failing the class (pass mark is 34).
ii. of obtaining a mark between 60 and 70.
(4 marks)
(i) Define random sampling and quota sampling. Provide an example where you
would prefer one method to the other.
(4 marks)
UL11/0186 Downloaded by: aruzhanyerbolatova | aruzhan.yerbolatovaa@gmail.com
D01 Page 3 of 19
Distribution of this document is illegal
Stuvia.com - The Marketplace to Buy and Sell your Study Material
SECTION B
2. (a) It is assumed that there is a linear relationship between the obtained yield
of apple trees and the amount of fertiliser supplied to them. In order to test
this assumption, nine apple trees of the same type were randomly selected and
supplied weekly with a fixed quantity (𝑥 grams) of fertiliser. The yield of each
apple tree (𝑦 kilograms) was recorded.
Tree 1 2 3 4 5 6 7 8 9
𝑥 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
𝑦 3.9 4.3 5.5 6.4 6.9 7.1 7.3 7.7 8.0
4. (a) The IQ scores for a sample of 30 students who are entering their first year of
high school are shown below:
95 95 97 98 101
102 103 104 105 106
106 107 108 108 110
111 115 115 117 119
119 121 121 126 126
128 133 134 136 142
i. Carefully construct, draw and label a histogram of these data on the graph
paper provided.
ii. Find the mean (given that the sum of the data is 3408) and the modal
group.
iii. Find the median and the upper quartile.
iv. Comment on the data given the shape of the histogram and the measures
you have calculated.
(12 marks)
(b) The student union of a large university gathered a random sample of 525
students to determine whether they are in favour of a new grading system.
The results are summarised in the table below:
END OF PAPER
Dennis V. Lindley, William F. Scott, New Cambridge Statistical Tables, (1995) © Cambridge University Press, reproduced with permission.
BSc degrees and Diplomas for Graduates in Economics, Management, Finance and the
Social Sciences, the Diplomas in Economics and Social Sciences and Access Route for
External Students
Extracts from statistical tables are given after the final question on this paper
Graph paper is provided at the end of this question paper. If used, it must be detached and
fastened securely inside the answer book.
A calculator may be used when answering questions on this paper and it must comply in all
respects with the specification given with your Admission Notice. The make and type of
machine must be clearly stated on the front cover of the answer book.
SECTION A
UL11/0186 Page 2 of 6
D01
(6 marks)
(g) The summary statistics for 2 independent datasets from a population with a
normal distribution are as follows:
            Sample size    Sample mean    Sample standard deviation
𝑥 data          18             5.3                  1.0
𝑦 data          15             4.1                  1.5
Compute the mean and the variance of the combined dataset.
(6 marks)
(h) Assume that the marks of students at a certain university are normally
distributed with mean 55 and variance 81. Consider a randomly chosen
student from that university and find the probability
i. of getting a first (70 or above).
ii. of obtaining a mark between 50 and 60.
(4 marks)
(i) Define random sampling and cluster sampling. Provide an example where
cluster sampling will be useful.
(4 marks)
SECTION B
2. (a) A company would like to predict how its trainees in sales will perform based
on the results of an aptitude test that is given to them at the beginning of the
training. The table below contains the test scores (x values) and the values of
the sales for these trainees during the first month of working at the company
(y values in hundreds of dollars).
Salesman 1 2 3 4 5 6 7 8 9
𝑥 1.8 2.6 2.8 3.4 3.6 4.2 4.8 5.2 5.4
𝑦 5.4 6.4 6.0 6.2 6.8 7.0 7.6 7.3 7.6
4. (a) The IQ scores for a sample of 30 students who are entering their first year of
high school are shown below:
END OF PAPER
Important note
This commentary reflects the examination and assessment arrangements for this course in the
academic year 2010–11. The format and structure of the examination may change in future years,
and any such changes will be publicised on the virtual learning environment (VLE).
Please note that all page references are to the 2011 subject guide.
SECTION A
Question 1
1 2 3 4 6
which gives a median of 3.
The mean can also be calculated to be (1 + 2 + 3 + 4 + 6)/5 = 3.2, which can be used to
calculate the variance.
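As a quick check of the working above, the following minimal sketch recomputes the median and mean of the five ordered values and then the variance. The commentary does not say which divisor is intended, so both the sample variance (divisor n − 1) and the population variance (divisor n) are shown; treating the sample version as the expected one is an assumption.

```python
import statistics

# The five ordered observations quoted in the commentary.
data = [1, 2, 3, 4, 6]

median = statistics.median(data)         # middle value of the ordered data
mean = statistics.mean(data)             # (1 + 2 + 3 + 4 + 6) / 5
sample_var = statistics.variance(data)   # divisor n - 1 (assumed convention)
pop_var = statistics.pvariance(data)     # divisor n, shown for comparison

print(median, mean, sample_var, pop_var)
```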
04a Statistics 1
ii. If the balls are of different colours, then only one ball is yellow. This can happen if
the first ball was yellow and the second ball was blue ($B_1 \cap B_2^c$), or if the first ball was
blue and the second ball was yellow ($B_1^c \cap B_2$). Adding the probabilities for these two
cases gives
\[
P(B_1 \cap B_2^c) + P(B_1^c \cap B_2) = \frac{4}{7} \cdot \frac{3}{6} + \frac{3}{7} \cdot \frac{4}{6} = \frac{4}{7}.
\]
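The sum of the two cases can be verified with exact fractions. This is only a sketch: the urn composition (4 yellow and 3 blue balls, drawn without replacement) is an assumption inferred from the fractions in the working, since the question itself is not reproduced here.

```python
from fractions import Fraction

# Assumed urn composition, inferred from the fractions in the working:
# 4 yellow and 3 blue balls, two draws without replacement.
yellow, blue = 4, 3
total = yellow + blue

# P(first yellow, second blue) and P(first blue, second yellow).
p_yellow_then_blue = Fraction(yellow, total) * Fraction(blue, total - 1)
p_blue_then_yellow = Fraction(blue, total) * Fraction(yellow, total - 1)

p_different = p_yellow_then_blue + p_blue_then_yellow
print(p_different)  # 4/7
```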
iii. Most candidates had difficulty in this part although it had similarities with (i.) and
(ii.). As before, the best way to start with such exercises is to define the relevant
events. In this case we have
• A : Test positive.
• B : Person has HIV.
The next step is to write down what is given for the above events, or their
complements, or combinations of events. In our case, for example, 10% of people have
HIV, so P (B) = 0.1. Another way that information can be given is through
conditional probabilities. Typical phrases to identify such cases are ‘given ..., the
probability of ... is’ or ‘if ..., then the probability of ... is’ etc. In this question we are
told that if the person has HIV (or else, given B) the diagnostic test is correct (hence
positive, or else A) with probability 90%. This means that P (A|B) = 0.9. Similarly,
we obtain that if the person does not have HIV (given $B^c$) the test is correct (hence
negative, or else $A^c$) with probability 95%, which leads to $P(A^c \mid B^c) = 0.95$.
\[
P(B^c \mid A) = \frac{P(A \mid B^c)\,P(B^c)}{P(A \mid B^c)\,P(B^c) + P(A \mid B)\,P(B)},
\]
\[
P(B^c \mid A) = \frac{(1 - 0.95) \times 0.9}{(1 - 0.95) \times 0.9 + 0.9 \times 0.1} = \frac{1}{3}.
\]
(8 marks)
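The Bayes' theorem calculation above can be reproduced numerically. The sketch below uses only the three probabilities stated in the commentary; the variable names are illustrative.

```python
# Probabilities stated in the commentary.
p_hiv = 0.10               # P(B): prevalence
p_pos_given_hiv = 0.90     # P(A | B): test correct for a person with HIV
p_neg_given_no_hiv = 0.95  # P(A^c | B^c): test correct for a person without HIV

# Complements.
p_no_hiv = 1 - p_hiv                          # P(B^c)
p_pos_given_no_hiv = 1 - p_neg_given_no_hiv   # P(A | B^c)

# Total probability of a positive test, then Bayes' theorem.
p_pos = p_pos_given_no_hiv * p_no_hiv + p_pos_given_hiv * p_hiv
p_no_hiv_given_pos = p_pos_given_no_hiv * p_no_hiv / p_pos

print(round(p_no_hiv_given_pos, 4))  # 0.3333
```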
SECTION B
Question 2
[Figure: scatter diagram of yield against fertiliser; horizontal axis label "x: grams of fertiliser".]
Candidates are reminded that they are asked to draw and label the scatter diagram
which should include a title (‘Scatter diagram’ alone will not suffice) and labelled axes
which also give their units. Far too many candidates threw away marks by neglecting
these points and consequently were only given one mark out of the possible four allocated
for this part of the question. Another common way of losing marks was failing to use the
graph paper which was provided, and required, in the question. Candidates who drew on
the ordinary paper in their booklet were not awarded marks for this part of the question.
(4 marks)
ii. The regression line can be written by the equation $\hat{y} = a + bx$ or $y = a + bx + \varepsilon$. The
formula for b is
\[
b = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2},
\]
and by substituting the summary statistics we get b = 1.03.
The formula for a is $a = \bar{y} - b\bar{x}$, so we get a = 3.25.
Hence the regression line can be written as $\hat{y} = 3.25 + 1.03x$ or $y = 3.25 + 1.03x + \varepsilon$.
(5 marks)
iii. The prediction will be ŷ = 3.25 + 1.03 × 3.2 = 6.55 kilograms. One mark was deducted in
cases where the units of measurement were not given.
(2 marks)
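The slope, intercept and prediction quoted above can be checked directly from the apple-tree data in the question paper. This is a minimal sketch using the same summary-statistic formulas; the variable names are illustrative, and the prediction point x = 3.2 is the one used in the commentary.

```python
# Data from the question: fertiliser (x, grams) and yield (y, kilograms).
x = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [3.9, 4.3, 5.5, 6.4, 6.9, 7.1, 7.3, 7.7, 8.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_xx = sum(xi * xi for xi in x)

# Least squares estimates from the formulas in the commentary.
b = (sum_xy - n * x_bar * y_bar) / (sum_xx - n * x_bar ** 2)
a = y_bar - b * x_bar

# Predicted yield (kilograms) at x = 3.2 grams.
y_hat = a + b * 3.2

print(round(b, 2), round(a, 2), round(y_hat, 2))  # 1.03 3.25 6.55
```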
iv. This could be a good idea due to a strong, positive, linear relationship but requires
extrapolation, so it has to be applied with caution. Answers such as ‘No, because it
requires extrapolation’ were given half credit whereas answers saying yes, but without
mentioning extrapolation, were not given any credit.
(2 marks)
H0 : µA = µB vs H1 : µA ≠ µB.
The test statistic value is 2.313 (or 2.289 if pooled variance is used). The critical
values, assuming a normal approximation as the number of observations is large, are
±1.96. If a t-distribution with 70 degrees of freedom is assumed, we have t = 2.00 (using
60 degrees of freedom, the nearest value in the table). Taking 5%, we reject the null
hypothesis and there is therefore evidence for a difference between the two. If we take an
α of 1%, the critical values are ±2.576, so we do not reject H0 . We conclude that there is
some evidence of a difference between the brands.
(7 marks)
ii. The assumptions for (ii.) were that:
• Assumption about whether $\sigma_A^2 = \sigma_B^2$.
• Assumption about whether nA + nB − 2 is ‘large’, hence t v. z.
• Assumption about independent samples.
(2 marks)
iii. In this case the question was whether the mean life of the tyres of brand B is longer than
that of the brand A tyres. Hence the hypotheses are
H0 : µA = µB vs H1 : µA < µB .
The statistic to use is the same as before. However, the critical values will be different.
We conclude that the result is highly significant so there is evidence that the mean life of
brand B is longer. The z-values are ≈ 1.645 for 5% and ≈ 2.32 for 1%.
(3 marks)
Question 3
Work out the expected values. For example, you should work out the expected value, if
there is no association, for the students below grade that attended pre-school as:
(30/100) × 57 = 17.1. Repeat for each cell to get the table below.
• Students who attended pre-school are more likely to obtain grade algebra marks than
students who did not.
• Pre-school attendance reduces the chances of a below grade level algebra mark.
There were some excellent answers to this, but many candidates ignored this part of the
question. (4 marks)
Question 4
[Figure: "Histogram of IQ scores"; horizontal axis: IQ scores; vertical axis: frequency densities.]
ii. The mean can be found to be 113.6, whereas the modal group is the one between IQ
scores of 100 and 110.
iii. The median is 110.5, whereas the upper quartile 121. The median had to be exactly
110.5, but for the upper quartile similar values based on different interpolations were also
accepted.
iv. There is positive (right) skewness in the distribution of the data. Most of the IQ scores
are around 100 and 110.
(12 marks)
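The mean, median and modal group reported above can be recomputed from the 30 IQ scores in the question paper. In this sketch the class intervals are assumed to be of width 10 starting at multiples of 10 (consistent with the modal group "between 100 and 110" in the commentary). The upper quartile is left out because, as the commentary notes, its value depends on the interpolation rule used.

```python
import statistics
from collections import Counter

# IQ scores from the question paper (their sum is given as 3408).
iq = [
    95, 95, 97, 98, 101, 102, 103, 104, 105, 106,
    106, 107, 108, 108, 110, 111, 115, 115, 117, 119,
    119, 121, 121, 126, 126, 128, 133, 134, 136, 142,
]

mean = statistics.mean(iq)
median = statistics.median(iq)  # average of the 15th and 16th ordered values

# Assumed class intervals [90, 100), [100, 110), ... of width 10.
counts = Counter((score // 10) * 10 for score in iq)
modal_lower = max(counts, key=counts.get)
modal_group = (modal_lower, modal_lower + 10)

print(mean, median, modal_group)
```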
H0 : π1 = π2 vs H1 : π1 ≠ π2 .
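\[
\text{s.e.}(p_1 - p_2) = \sqrt{0.6495 \times 0.3505 \times \left(\frac{1}{325} + \frac{1}{200}\right)} = 0.043
\]
or
\[
\text{s.e.}(p_1 - p_2) = \sqrt{\frac{0.68 \cdot 0.32}{325} + \frac{0.6 \cdot 0.4}{200}} = 0.043.
\]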
The test statistic value is 1.860. For α = 0.05, the critical values are ±1.96, so we do not
reject H0 at the 5% level.
We therefore choose a larger second α to be 10%, which gives critical values of ±1.645.
We therefore reject H0 at this level and conclude that there is weak evidence of a
difference in the proportions in favour of the new grading system between students in
humanities and science.
Candidates got full marks for this question if they either:
• provided an interpretation of the findings saying that ‘Students in humanities are
more in favour of the new grading system than students in science’, or
• justified the use of the normal distribution by the large sample.
(9 marks)
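The two standard errors and the test statistic can be reproduced from the sample sizes and sample proportions that appear in the working (325 and 0.68 for one group, 200 and 0.6 for the other). This is a sketch only; the group labels are left generic because the results table is not reproduced here. Using the unrounded standard error gives a statistic slightly above the commentary's 1.860, which is based on the rounded value 0.043; the conclusion is the same.

```python
from math import sqrt

# Sample sizes and sample proportions taken from the working.
n1, p1 = 325, 0.68
n2, p2 = 200, 0.60

# Pooled proportion under H0: pi1 = pi2.
p_pooled = (n1 * p1 + n2 * p2) / (n1 + n2)

# The two standard errors shown in the commentary.
se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Test statistic using the pooled standard error.
z = (p1 - p2) / se_pooled

print(round(p_pooled, 4), round(se_pooled, 3), round(se_unpooled, 3), round(z, 3))
```

The statistic lies between 1.645 and 1.96, which matches the commentary: H0 is not rejected at the 5% level but is rejected at the 10% level.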
ii. This asks for a 97% confidence interval. The normal distribution may be used as before.
The working is given below:
• Confidence interval formula: (p1 − p2 ) ± zα/2 × s.e.(p1 − p2 ).
• z-value: 2.17.
• End-points: 0.08 ± 2.17 × 0.043.
• Report as an interval: (−0.013, 0.173).
(4 marks)
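The 97% interval can be checked in the same way. This sketch takes the z-value from the standard normal distribution instead of tables and uses the unpooled standard error, which is an assumption about which of the two standard errors the commentary intends for the interval. Because the standard error is not rounded to 0.043 here, the end-points may differ from the reported interval in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

# Sample sizes and sample proportions taken from the working.
n1, p1 = 325, 0.68
n2, p2 = 200, 0.60

level = 0.97
# Upper alpha/2 point of the standard normal distribution.
z = NormalDist().inv_cdf(1 - (1 - level) / 2)

# Unpooled standard error (assumed choice for the interval).
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

diff = p1 - p2
lower, upper = diff - z * se, diff + z * se

print(round(z, 2), round(lower, 3), round(upper, 3))
```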
Important note
This commentary reflects the examination and assessment arrangements for this course in the
academic year 2010–11. The format and structure of the examination may change in future years,
and any such changes will be publicised on the virtual learning environment (VLE).
Please note that all page references are to the 2011 subject guide.
SECTION A
Question 1
ii. If the balls are of different colours, then only one ball is red. This can happen if the
first ball was red and the second ball was green ($B_1 \cap B_2^c$), or if the first ball was
green and the second ball was red ($B_1^c \cap B_2$). Adding the probabilities for these two
cases gives
\[
P(B_1 \cap B_2^c) + P(B_1^c \cap B_2) = \frac{5}{9} \cdot \frac{4}{8} + \frac{4}{9} \cdot \frac{5}{8} = \frac{5}{9}.
\]
iii. Most candidates had difficulty in this part although it had similarities with (i.) and
(ii.). As before, the best way to start with such exercises is to define the relevant
events. In this case we have
• A : Test positive.
• B : Person has HIV.
The next step is to write down what is given for the above events, or their
complements, or combinations of events. In our case, for example, 5% of people have
HIV, so P (B) = 0.05. Another way that information can be given is through
conditional probabilities. Typical phrases to identify such cases are ‘given ..., the
probability of ... is’ or ‘if ..., then the probability of ... is’ etc. In this question we are
told that if the person has HIV (or else, given B) the diagnostic test is correct (hence
positive, or else A) with probability 95%. This means that P (A|B) = 0.95. Similarly,
we obtain that if the person does not have HIV (given $B^c$) the test is correct (hence
negative, or else $A^c$) with probability 90%, which leads to $P(A^c \mid B^c) = 0.90$.
(8 marks)
• Random sampling: Each unit has a known, non-zero probability of being selected.
• Cluster sampling: Roughly speaking, random sampling within a cluster/subgroup of
the population (that usually has also been chosen at random). See also p.142 of the
subject guide.
Regarding the example, one could mention any kind of sample survey, e.g. data by
population density, age, and income within London boroughs in order to decide where
to locate new convenience stores. An advantage of random sampling is that it allows for
more accurate statistical methodology, whereas quota sampling surveys are easier to
conduct and have lower cost.
(4 marks)
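The difference between the two definitions can be illustrated with a small simulation. Everything in this sketch is hypothetical: the population of 20 clusters with 50 units each, the sample sizes and the unit labels are invented for illustration, and the two-stage scheme shown is just one common form of cluster sampling.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 20 clusters (e.g. boroughs) of 50 units each.
population = {
    c: [f"cluster{c}-unit{u}" for u in range(50)] for c in range(20)
}
all_units = [unit for units in population.values() for unit in units]

# Simple random sampling: every unit has the same chance of selection.
simple_random_sample = rng.sample(all_units, 100)

# Cluster sampling (two-stage form): choose clusters at random, then
# sample units at random within each chosen cluster.
chosen_clusters = rng.sample(sorted(population), 4)
cluster_sample = [
    unit for c in chosen_clusters for unit in rng.sample(population[c], 25)
]

print(len(simple_random_sample), len(cluster_sample), sorted(chosen_clusters))
```

Both samples contain 100 units, but the cluster sample is concentrated in four clusters, which is what makes fieldwork cheaper when clusters are geographically compact.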
SECTION B
Question 2
[Figure: scatter diagram; horizontal axis label "x: aptitude score".]
(4 marks)
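H0 : µA = µB vs H1 : µA ≠ µB.
Use the test statistic formula:
\[
\frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}
\quad \text{or} \quad
\frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_p^2}{n_A} + \dfrac{s_p^2}{n_B}}}
\]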
to find the test statistic value: 2.934 (or 3.071 if pooled variance used). The critical
values, assuming a normal approximation as the number of observations is large, are
±1.96. If a t-distribution with 70 degrees of freedom is assumed, we have t = 2.00 (using
60 degrees of freedom, the nearest value in the table). Taking 5%, we reject the null
hypothesis and there is therefore evidence for a difference between the two. If we take an
α of 1%, the critical values are ±2.576, so we do not reject H0 . We conclude that there is
some evidence of a difference between the brands.
(7 marks)
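The two versions of the test statistic can be sketched as a small function working from summary statistics. The function name and the numbers passed to it are hypothetical: the exam's own sample means, standard deviations and sample sizes are not reproduced in this commentary, so the values below are invented purely to show the calculation (the sample sizes were chosen to give 70 degrees of freedom). The pooled variance uses the usual weighted formula with n_A + n_B − 2 degrees of freedom.

```python
from math import sqrt


def two_sample_statistic(mean_a, s_a, n_a, mean_b, s_b, n_b, pooled=False):
    """Test statistic for H0: mu_A = mu_B computed from summary statistics."""
    if pooled:
        # Pooled variance estimate, assuming equal population variances.
        s2_p = ((n_a - 1) * s_a ** 2 + (n_b - 1) * s_b ** 2) / (n_a + n_b - 2)
        se = sqrt(s2_p / n_a + s2_p / n_b)
    else:
        # Separate variance estimates for each sample.
        se = sqrt(s_a ** 2 / n_a + s_b ** 2 / n_b)
    return (mean_a - mean_b) / se


# Hypothetical summary statistics (NOT the exam's data).
stat_unpooled = two_sample_statistic(10.0, 2.0, 40, 9.0, 1.5, 32)
stat_pooled = two_sample_statistic(10.0, 2.0, 40, 9.0, 1.5, 32, pooled=True)

print(round(stat_unpooled, 3), round(stat_pooled, 3))
```

Either value would then be compared with the critical values quoted in the commentary (±1.96 at 5%, ±2.576 at 1%).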
ii. The assumptions for (ii.) were that:
• Assumption about whether $\sigma_A^2 = \sigma_B^2$.
• Assumption about whether nA + nB − 2 is ‘large’, hence t v. z.
• Assumption about independent samples.
(2 marks)
iii. In this case the question was whether the mean life of the tyres of brand B is longer than
that of the brand A tyres. Hence the hypotheses are
H0 : µA = µB vs H1 : µA < µB .
The statistic to use is the same as before in absolute value but has a different sign; it is
2.934 (or 3.071 if pooled variance is used). The critical values take a positive value for any
significance level (≈ 1.645 for 5%), so we do not reject the hypothesis that the life of
brand B is longer.
This bit was a little confusing as the sample mean of brand B was in fact smaller. Some
candidates tested the hypothesis
H0 : µA = µB vs H1 : µB < µA
as they thought it might have been a more interesting question. Usually this is not
allowed and candidates should answer the question as set. But given the peculiarity of
this case these candidates were awarded full marks if they carried out their test correctly.
(3 marks)
Question 3
• Pre-school attendance reduces the chances of a below grade level algebra mark.
There were some excellent answers to this, but many candidates ignored this part of the
question. (4 marks)
Question 4
9 | 566899
10 | 112234477
11 | 1233457
12 | 13479
13 | 15
14 | 3
ii. The mean can be found to be 111.4, whereas the modal group is the one between IQ
scores of 100 and 110.
iii. The median is 109, whereas the lower quartile 101.25. The median had to be exactly 109,
but for the lower quartile similar values based on different interpolations were also
accepted.
iv. There is positive (right) skewness in the distribution of the data. Most of the IQ scores
are around 100 and 110.
(12 marks)
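The mean and median quoted above can be recovered from the stem-and-leaf display itself. This sketch assumes the stems are tens and the leaves are units, which is the natural reading for IQ scores. The lower quartile is not computed because its value depends on the interpolation rule.

```python
import statistics

# Stem-and-leaf display from the commentary (stem = tens, leaf = units).
display = """\
9 | 566899
10 | 112234477
11 | 1233457
12 | 13479
13 | 15
14 | 3"""

scores = []
for row in display.splitlines():
    stem, leaves = row.split("|")
    # Each leaf digit is one observation: stem * 10 + leaf.
    scores.extend(int(stem) * 10 + int(leaf) for leaf in leaves.strip())

mean = statistics.mean(scores)
median = statistics.median(scores)

print(len(scores), mean, median)
```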
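H0 : π1 = π2 vs H1 : π1 ≠ π2 .
\[
\text{s.e.}(p_1 - p_2) = \sqrt{0.55 \times 0.45 \times \left(\frac{1}{225} + \frac{1}{275}\right)} = 0.045
\]
or
\[
\text{s.e.}(p_1 - p_2) = \sqrt{\frac{0.4889 \cdot 0.5111}{225} + \frac{0.6 \cdot 0.4}{275}} = 0.045.
\]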
The test statistic value is 2.495. For α = 0.05, the critical values are ±1.96, so we reject
H0 at the 5% level.
We therefore choose a smaller second α to be 1%, which gives critical values of ±2.576.
We therefore do not reject H0 at this level and conclude that there is some evidence of a
difference in the proportions in favour of the new grading system between males and
females.
Candidates got full marks for this question if they either:
• provided an interpretation of the findings saying that ‘Females are more in favour of
the new grading system than males’, or
BSc degrees and Diplomas for Graduates in Economics, Management, Finance and the
Social Sciences, the Diplomas in Economics and Social Sciences and Access Route
Friday, 4 May 2012 : 10.00am to 12.00pm
A list of formulae and extracts from statistical tables are given after the final question on this
paper.
Graph paper is provided at the end of this question paper. If used, it must be detached and
fastened securely inside the answer book.
A calculator may be used when answering questions on this paper and it must comply in all
respects with the specification given with your Admission Notice. The make and type of
machine must be clearly stated on the front cover of the answer book.
SECTION A
1. (a) The following data represent different types of variables. Classify each one of
them as measurable (continuous) or categorical. If a variable is categorical,
further classify it as nominal or ordinal. Justify your answer. (Note that no
marks will be awarded without justification.)
i. The education level for a number of employees from a company (elementary
school, high school, university or postgraduate degree).
ii. The blood pressure from 30 hospital patients.
iii. The hair colour of 50 persons.
iv. The weights of 30 randomly selected cereal boxes.
(8 marks)
(b) The table below contains the number of graduates from eight high schools
in a particular year that are pursuing a university degree in humanities and
sciences:
Sciences: 65 76 104 67 75 88 77 116
Humanities: 46 65 76 50 72 51 40 87
i. Find the mean and the median for the number of students in each category
of degree.
ii. Find the lower quartile of the number of students in sciences and the upper
quartile for the number of students in humanities.
iii. Calculate the Spearman rank correlation coefficient and interpret its value.
(13 marks)
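As an illustration of the calculations asked for in part (b), the following Python sketch computes the means, medians and the Spearman rank correlation. The helper names `ranks` and `spearman` are hypothetical, and the simple ranking used here is only valid because neither data set contains ties. Quartiles are left out because the result depends on the interpolation convention.

```python
from statistics import mean, median

sciences = [65, 76, 104, 67, 75, 88, 77, 116]
humanities = [46, 65, 76, 50, 72, 51, 40, 87]

def ranks(values):
    """Rank from 1 (smallest); valid only when there are no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(x, y):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

r_s = spearman(sciences, humanities)
print(mean(sciences), median(sciences))
print(mean(humanities), median(humanities))
print(round(r_s, 3))
```

A value of about 0.62 would suggest a moderate positive association between the two rankings.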
(c) A test is taken by some students, their marks are recorded and we are
interested in the properties of the sample mean. Under the assumption that
the marks follow a Normal distribution with exact mean 60 and variance 81,
calculate the probability that the mark of a randomly selected student
i. is greater than 59.5 exactly; and
ii. lies between 59 and 60.5 exactly.
(4 marks)
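A minimal sketch of part (c), assuming the probabilities refer to a single mark X ~ N(60, 81), so that the standard deviation is 9. It uses `statistics.NormalDist` from the Python standard library instead of the printed tables, so the values may differ slightly from table-based answers.

```python
from statistics import NormalDist

# Variance 81 means a standard deviation of 9.
marks = NormalDist(mu=60, sigma=9)

p_above = 1 - marks.cdf(59.5)                # P(X > 59.5)
p_between = marks.cdf(60.5) - marks.cdf(59)  # P(59 < X < 60.5)

print(round(p_above, 4), round(p_between, 4))
```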
(d) A sample of 180 students was taken and each student was questioned regarding
their preferences for a number of courses. The course in Mathematics was
chosen by 65 students. Calculate a 95% confidence interval for the proportion
of students in favour of Mathematics in the population.
(3 marks)
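The interval in part (d) can be sketched as follows, using the formula sheet's form p ± z × sqrt(p(1 − p)/n) with z = 1.96 for 95% confidence. The variable names are illustrative only.

```python
from math import sqrt

n, x = 180, 65
p_hat = x / n

# Estimated standard error of the sample proportion.
se = sqrt(p_hat * (1 - p_hat) / n)

z = 1.96  # upper 2.5% point of the standard normal distribution
lower, upper = p_hat - z * se, p_hat + z * se

print(round(lower, 3), round(upper, 3))
```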
UL12/0217 Page 2 of 6
D00
i. $\sum_{i=3}^{i=5} 3(x_i - 1)$   ii. $\sum_{i=2}^{i=4} x_i y_i$   iii. $\sum_{i=1}^{i=3} x_i (y_i - 2)$
(6 marks)
x 0 1 3 5
pX (x) .5 .2 .2 .1
i. Find the probability that X is larger than 2.
ii. Find the expected value of X, E(X).
(4 marks)
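For the probability distribution tabulated above, a short sketch of the two calculations. The dictionary `pmf` simply restates the table.

```python
# Probability function p_X(x) from the table.
pmf = {0: 0.5, 1: 0.2, 3: 0.2, 5: 0.1}

# P(X > 2): add the probabilities of the values larger than 2.
p_gt_2 = sum(p for x, p in pmf.items() if x > 2)

# E(X): probability-weighted sum of the values.
expected = sum(x * p for x, p in pmf.items())

print(round(p_gt_2, 2), round(expected, 2))
```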
(h) State whether the following are true or false and give a brief explanation. (Note
that no marks will be awarded for a simple true/false answer.)
i. A 95% confidence interval for the mean is wider than a 90% one when
obtained from the same data.
ii. A p-value is the probability of the alternative hypothesis being true.
iii. As the p-value becomes larger the null hypothesis becomes more plausible.
(6 marks)
(i) Provide an example where response bias may occur. Be brief in explaining why
response bias may occur.
(2 marks)
SECTION B
Gender Sample size Working hours per day Sample standard deviation
Males 41 9.0 1.9
Females 29 7.5 1.1
39 40 44 47 32
37 25 71 56 33
64 63 42 43 34
25 28 35 24 45
35 22 53 55 36
46 46 27 27 38
END OF PAPER
ST104a Statistics 1
Examination Formula Sheet

Z-test of hypothesis for a single mean (σ known):
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

t-test of hypothesis for a single mean (σ unknown):
$$t = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$

Z-test of hypothesis for a single proportion:
$$Z \cong \frac{p - \pi}{\sqrt{\pi(1 - \pi)/n}}$$

Z-test for the difference between two means (variances known):
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$

t-test for the difference between two means (variances unknown):
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2 \left(1/n_1 + 1/n_2\right)}}$$

Confidence interval endpoints for the difference between two means:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{n_1 + n_2 - 2} \, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Confidence interval endpoints for the difference in means in paired samples:
$$\bar{x}_d \pm t_{n-1} \frac{s_d}{\sqrt{n}}$$

Z-test for the difference between two proportions:
$$Z = \frac{(P_1 - P_2) - (\pi_1 - \pi_2)}{\sqrt{P(1 - P)\left(1/n_1 + 1/n_2\right)}}$$
where $P$ is the pooled sample proportion from the combined $n_1 + n_2$ observations.

$$(p_1 - p_2) \pm z \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$

$$a = \bar{y} - b\bar{x}$$
Dennis V. Lindley, William F. Scott, New Cambridge Statistical Tables, (1995) © Cambridge University Press, reproduced with permission.
BSc degrees and Diplomas for Graduates in Economics, Management, Finance and the
Social Sciences, the Diplomas in Economics and Social Sciences and Access Route
A list of formulae and extracts from statistical tables are given after the final question on this
paper.
Graph paper is provided at the end of this question paper. If used, it must be detached and
fastened securely inside the answer book.
A calculator may be used when answering questions on this paper and it must comply in all
respects with the specification given with your Admission Notice. The make and type of
machine must be clearly stated on the front cover of the answer book.
SECTION A
1. (a) The following data represent different types of variables. Classify each one of
them as measurable (continuous) or categorical. If a variable is categorical,
further classify it as nominal or ordinal. Justify your answer. (Note that no
marks will be awarded without justification.)
i. The amount of time it takes each of 15 telephone installers to hook up a
wall phone.
ii. The style of music preferred by each of 30 randomly selected radio listeners.
iii. The lengths of 50 randomly selected cars.
iv. The classification of a student (First, Upper Second, Lower Second, Third,
Pass, Fail) in the course 04a: Statistics 1.
(8 marks)
(b) The number of raisins in each of 16 mini boxes for two brands are shown below:
Brand A: 22 27 20 29 24 31 25 26
Brand B: 26 29 25 33 24 35 31 27
i. Find the mean and the mode for each brand.
ii. Find the upper quartile of Brand A and the lower quartile of Brand B.
iii. The mini boxes were made in 8 different machines corresponding to each
column in the table above. Calculate the Spearman rank correlation
coefficient and interpret its value.
(13 marks)
(c) A test is taken by some students, their marks are recorded and we are
interested in the properties of the sample mean. Under the assumption that
the marks follow a Normal distribution with exact mean 65 and variance 144,
calculate the probability that the mark of a randomly selected student
i. is greater than 67.5 exactly; and
ii. lies between 63 and 67 exactly.
(4 marks)
(d) A sample of 160 students was taken and each student was questioned regarding
their preferences for a number of courses. The course in Economics was chosen
by 75 students. Calculate a 95% confidence interval for the proportion of
students in favour of Economics in the population.
(3 marks)
UL12/0218 Page 2 of 6
D00
i. $\sum_{i=1}^{i=4} 2(x_i - 2)$   ii. $\sum_{i=3}^{i=5} (x_i + y_i)$   iii. $\sum_{i=4}^{i=5} x_i (y_i - 3)$
(6 marks)
x 1 3 4 6
pX (x) .2 .3 .4 .1
i. Find the probability that X is an odd number.
ii. Find the expected value of X, E(X).
(4 marks)
(h) State whether the following are true or false and give a brief explanation. (Note
that no marks will be awarded for a simple true/false answer.)
i. A 95% confidence interval for the mean is wider than a 99% one when
obtained from the same data.
ii. A p-value is the probability of not rejecting the null hypothesis.
iii. As the value of a chi-squared test statistic becomes larger, the associated
p-value becomes smaller.
(6 marks)
(i) Provide an example where selection bias may occur. Be brief in explaining why
selection bias may occur.
(2 marks)
SECTION B
3. (a) We are interested in assessing the potential impact of the growth rate (X) of
the Gross National Product (GNP) on the birth rate (Y ) of a country. The
table below provides data for these quantities for 12 countries:
19 20 21 21 22
22 22 22 23 23
23 23 23 23 24
24 24 24 24 25
25 25 25 25 26
26 26 27 27 28
i. Carefully construct, draw and label a histogram of these data on the graph
paper provided.
ii. Find the mean, the median, the interquartile range and the modal group.
iii. Comment on the data given the shape of the histogram and the measures
you have calculated.
(13 marks)
END OF PAPER
ST104A ZA d0
BSc degrees and Diplomas for Graduates in Economics, Management, Finance and the
Social Sciences, the Diplomas in Economics and Social Sciences and Access Route
Statistics 1
Candidates should answer THREE of the following FOUR questions: QUESTION 1 of Section
A (50 marks) and TWO questions from Section B (25 marks each). Candidates are strongly
advised to divide their time accordingly.
A list of formulae and extracts from statistical tables are provided after the final question on this
paper.
Graph paper is provided at the end of this question paper. If used, it must be detached and
fastened securely inside the answer book.
A calculator may be used when answering questions on this paper and it must comply in all
respects with the specification given with your Admission Notice. The make and type of
machine must be clearly stated on the front cover of the answer book.
SECTION A
(b) The table below contains the ages of the volunteers for a project in two different
years:
2011 20 18 38 18 20 18
2012 20 22 18 22 20 22 24 22 20
i. Find the mean age and the median age for each year.
ii. Calculate the range of the ages for each year and give an explanation for
any differences you find.
iii. Calculate the standard deviation of the ages for each year and give an
explanation for any differences you find.
iv. Comment on the differences in the mean and median for the two years
that you found in part i. For this data set, which do you think would give
a better description of the difference in ages: the mean or the median?
Explain briefly.
[12 marks]
(d) We would like to design a survey to estimate the average number of hours
university students spend studying per week. How many students must we
randomly select to be 95 percent confident that the sample mean is within 2
hours of the population mean? Assume that a previous survey has shown that
the standard deviation of hours spent studying is 6.95 hours. [3 marks]
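The sample size in part (d) follows from requiring z × σ/√n ≤ 2, that is n ≥ (zσ/2)². A minimal sketch, assuming z = 1.96 for 95% confidence and treating the previous survey's standard deviation as the known σ:

```python
from math import ceil

z = 1.96      # upper 2.5% point of the standard normal distribution
sigma = 6.95  # standard deviation from the previous survey (hours)
margin = 2    # required half-width of the interval (hours)

n_exact = (z * sigma / margin) ** 2
n_required = ceil(n_exact)  # round up so the margin is not exceeded

print(round(n_exact, 2), n_required)
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.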
i. $\sum_{i=1}^{5} x_i$   ii. $\sum_{i=2}^{5} 2x_i (y_i + 1)$   iii. $x_2^2 + \sum_{i=1}^{3} (x_i + y_i^3)$
[6 marks]
(f) In an introductory economics class, the numbers of males and females are 16
and 24, respectively.
i. A student is selected randomly from the class. What is the probability the
student is female?
ii. A student is selected at random and removed from the class. A second
student is then selected. What is the probability that one of the students
is male and the other is female?
iii. What is the probability that the second student is male, given that the
first student is female and removed from the class?
iv. In previous years it was found that 80% of males pass the exam and 85%
of females pass the examination. Based on the available information, find
the probability that a student who passes the exam is female.
[8 marks]
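The four probabilities in part (f) can be sketched with exact fractions. For part iv. the sketch assumes that the class proportions of males and females are used as the prior probabilities in Bayes' theorem; that reading is an assumption, not something the question states explicitly.

```python
from fractions import Fraction

males, females = 16, 24
total = males + females

# i. P(female)
p_female = Fraction(females, total)

# ii. One male and one female, in either order, without replacement.
p_mixed = (Fraction(males, total) * Fraction(females, total - 1)
           + Fraction(females, total) * Fraction(males, total - 1))

# iii. P(second is male | first is female and removed)
p_second_male = Fraction(males, total - 1)

# iv. Bayes' theorem: P(female | pass)
p_pass_male, p_pass_female = Fraction(80, 100), Fraction(85, 100)
p_male = 1 - p_female
p_female_given_pass = (p_female * p_pass_female) / (
    p_female * p_pass_female + p_male * p_pass_male
)

print(p_female, p_mixed, p_second_male, p_female_given_pass)
```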
(g) State whether the following are true or false and give a brief explanation. (Note
that no marks will be awarded for a simple true/false answer.)
i. In an observational study, a control group provides an essential tool to
establish causal relationships.
ii. If two variables are correlated we can conclude that one causes the other.
iii. The mean income of British households can be expected to be larger than
the median income of British households.
[6 marks]
(h) In the context of sampling, explain the difference between item non-response
and unit non-response. [3 marks]
SECTION B
2. (a) A social survey in the United States asked subjects, ‘Would you say that
homeopathy is very scientific, sort of scientific, or not at all scientific?’ The table
below cross-classifies their responses with their highest level of education.
                         Homeopathy is scientific
Highest degree           Very        Sort of      Not at all    Total
Less than High school    46 (11%)    168 (41%)    196 (48%)     410 (100%)
High school              100 (5%)    572 (31%)    1148 (63%)    1820 (100%)
College or higher        32 (2%)     248 (18%)    1076 (79%)    1356 (100%)
Total                    178 (5%)    988 (28%)    2420 (67%)    3586 (100%)
i. Based on the data in the table, and without doing a significance test, how
would you describe the relationship between education and opinion on
whether or not homeopathy is scientific? [4 marks]
ii. Calculate the χ² statistic and use it to test for independence, using a 1%
significance level. What do you conclude? [9 marks]
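A sketch of the χ² calculation for part ii., using expected counts of (row total × column total) / grand total. The critical value 13.277 is the upper 1% point of the χ² distribution with 4 degrees of freedom, as found in standard tables; the variable names are illustrative.

```python
# Observed counts: rows are education levels, columns are
# "Very", "Sort of", "Not at all".
observed = [
    [46, 168, 196],
    [100, 572, 1148],
    [32, 248, 1076],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum of (O - E)^2 / E over all cells.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_1pct = 13.277  # chi-squared, 4 degrees of freedom, upper 1% point
reject_independence = chi2 > critical_1pct

print(round(chi2, 2), df, reject_independence)
```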
(b) i. Define each of the following:
– Simple random sampling
– Stratified random sampling.
[4 marks]
ii. Why might a researcher prefer to take a stratified random sample rather
than a simple random sample? Give two reasons. [3 marks]
iii. You have been asked to design a nation-wide survey in your country to find
out about the smoking habits of adults. Give two stratification factors you
might use, and explain why you have chosen them. [5 marks]
3. The level of infant mortality (y) is represented by the number of baby deaths for
every 1000 births. For 12 areas these are shown in the following table. For each
area, the percentage (x) of babies born into families earning at least £25,000 is also
shown.
Area A B C D E F G H I J K L
Percentage (x) 20 6 10 21 12 36 6 19 26 13 21 16
Infant mortality (y) 5 17 16 8 15 5 25 12 11 11 7 12
(a) i. Draw a scatter diagram of these data on the graph paper provided. Label
the diagram carefully. [4 marks]
ii. Calculate the sample correlation coefficient. Interpret your findings.
[3 marks]
iii. Calculate the least squares line of y on x and draw the line on the scatter
diagram. [4 marks]
iv. Using the equation you found in iii., obtain the predicted infant mortality
for an area where 38% of babies are born into families earning at least
£25,000. Do you think this value is realistic? Justify your answer. [2 marks]
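Parts ii. to iv. of (a) can be sketched as below, using the corrected sums of squares S_xx, S_yy and S_xy and the formula sheet's intercept a = ȳ − b x̄. The variable names are illustrative only.

```python
from math import sqrt

x = [20, 6, 10, 21, 12, 36, 6, 19, 26, 13, 21, 16]  # percentage
y = [5, 17, 16, 8, 15, 5, 25, 12, 11, 11, 7, 12]    # infant mortality

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Corrected sums of squares and cross-products.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

r = s_xy / sqrt(s_xx * s_yy)  # sample correlation coefficient
b = s_xy / s_xx               # least squares slope
a = y_bar - b * x_bar         # least squares intercept

predicted_38 = a + b * 38     # x = 38 lies outside the observed range

print(round(r, 3), round(a, 2), round(b, 3), round(predicted_38, 2))
```

Since 38% is beyond the largest observed percentage (36), the prediction is an extrapolation and should be treated with caution.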
(b) A survey is conducted to compare public local attitudes towards environmental
policies. A number of people in two areas of interest are sampled, and asked
if they are satisfied with their local environmental policy. The results of this
survey are shown in the following table.
4. (a) i. Carefully construct a box plot on the graph paper provided to display the
following yearly incomes of a group of people, measured in £1000:
9 6 12 24 21 57 6 15 9 12 30 36
[8 marks]
ii. Based on the shape of the box plot you have drawn, describe the
distribution of the data. [2 marks]
iii. Name two other types of graphical displays that would be suitable to
represent the data. Briefly explain your choices. [3 marks]
(b) A new treatment has been devised with the aim of reducing blood pressure
for people with high blood pressure. Each participant’s blood pressure was
measured before and after the programme to see if the treatment is effective. The
following data were obtained:
Before After
177 174
142 146
146 144
162 159
145 145
162 163
152 156
154 150
171 172
i. Carry out an appropriate hypothesis test to determine whether the
treatment is effective for reducing blood pressure. State the test
hypotheses, and specify your test statistic and its distribution under the
null hypothesis. Comment on your findings. [6 marks]
ii. State any assumptions you made. [2 marks]
iii. Give a 90% confidence interval for the difference in means. [2 marks]
iv. On the basis of the data alone, would you recommend the programme to
a friend who suffers from high blood pressure? Explain why or why not.
[2 marks]
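A sketch of the paired-samples analysis in part (b), assuming the differences (before minus after) are normally distributed and testing H0: μ_d = 0 against H1: μ_d > 0. The value 1.860 is the upper 5% point of the t distribution with 8 degrees of freedom from standard tables, which also gives the 90% two-sided interval.

```python
from math import sqrt
from statistics import mean, stdev

before = [177, 142, 146, 162, 145, 162, 152, 154, 171]
after = [174, 146, 144, 159, 145, 163, 156, 150, 172]

# Positive differences indicate a reduction in blood pressure.
d = [b - a for b, a in zip(before, after)]
n = len(d)

d_bar = mean(d)
s_d = stdev(d)  # sample standard deviation of the differences
se = s_d / sqrt(n)

t_stat = d_bar / se
t_crit = 1.860  # t distribution, 8 degrees of freedom, upper 5% point
reject = t_stat > t_crit

# 90% confidence interval for the mean difference.
ci = (d_bar - t_crit * se, d_bar + t_crit * se)

print(round(t_stat, 3), reject, tuple(round(v, 3) for v in ci))
```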
END OF PAPER
ST104A ZA d0
BSc degrees and Diplomas for Graduates in Economics, Management, Finance and the
Social Sciences, the Diplomas in Economics and Social Sciences and Access Route
Statistics 1
Candidates should answer THREE of the following FOUR questions: QUESTION 1 of Section
A (50 marks) and TWO questions from Section B (25 marks each). Candidates are strongly
advised to divide their time accordingly.
A list of formulae and extracts from statistical tables are provided after the final question on this
paper.
Graph paper is provided at the end of this question paper. If used, it must be detached and
fastened securely inside the answer book.
A calculator may be used when answering questions on this paper and it must comply in all
respects with the specification given with your Admission Notice. The make and type of
machine must be clearly stated on the front cover of the answer book.
SECTION A
(b) The table below contains the marks (out of 20) of all students taking an
examination for the same course in two years:
2011 10 9 19 9 10 9
2012 10 11 9 11 10 11 12 11 10
i. Find the mean mark and the median mark for each year.
ii. Calculate the range of the marks for each year and give an explanation for
any differences you find.
iii. Calculate the standard deviation of the marks for each year and give an
explanation for any differences you find.
iv. Comment on the differences in the mean and median for the two years
that you found in part i. For this data set, which do you think would give
a better description of the difference in marks: the mean or the median?
Explain briefly.
[12 marks]
(d) We would like to start an internet service provider and need to estimate the
average weekly internet usage of households for our business plan. Internet
usage is measured in minutes. How many households must we randomly select
to be 95 percent confident that the sample mean is within 2 minutes of the
population mean? Assume that a previous survey of household usage has shown
that the standard deviation of internet usage is 6.95 minutes. [3 marks]
UL12/0217 Page 2 of 6
D00
i. $\sum_{i=1}^{5} x_i$    ii. $\sum_{i=2}^{5} 2x_i(y_i + 1)$    iii. $x_2^2 + \sum_{i=1}^{3} (x_i + y_i^3)$
[6 marks]
(f) In an introductory statistics class, the numbers of males and females are 17
and 23, respectively.
i. A student is selected randomly from the class. What is the probability the
student is female?
ii. A student is selected at random and removed from the class. A second
student is then selected. What is the probability that one of the students
is male and the other is female?
iii. What is the probability that the second student is male, given that the
first student is female and removed from the class?
iv. In previous years it was found that 80% of males pass the exam and 85%
of females pass the examination. Based on the available information, find
the probability that a student who passes the exam is female.
[8 marks]
(g) State whether the following are true or false and give a brief explanation. (Note
that no marks will be awarded for a simple true/false answer.)
i. An important difference between an experimental design and an
observational study is that in an observational study data are collected
on units without any intervention.
ii. If two variables are correlated we can conclude that one causes the other.
iii. If a variable has a symmetric distribution, its mean and median are the
same.
[6 marks]
(h) In the context of sampling, explain the difference between item non-response
and unit non-response. [3 marks]
SECTION B
2. (a) The 2006 General Social Survey in the United States asked subjects, ‘Would
you say that astrology is very scientific, sort of scientific, or not at all
scientific?’ The table below cross-classifies their responses with their highest
level of education.
Astrology is scientific
Highest degree Very Sort of Not at all Total
Less than High school 23 (11%) 84 (41%) 98 (48%) 205 (100%)
High school 50 (5%) 286 (31%) 574 (63%) 910 (100%)
College or higher 16 (2%) 124 (18%) 538 (79%) 678 (100%)
Total 89 (5%) 494 (28%) 1210 (67%) 1793 (100%)
i. Based on the data in the table, and without doing a significance test, how
would you describe the relationship between education and opinion on
whether or not astrology is scientific? [4 marks]
ii. Calculate the $\chi^2$ statistic and use it to test for independence, using a 5%
significance level. What do you conclude? [9 marks]
(b) i. Define each of the following:
– Simple random sampling
– Stratified random sampling.
[4 marks]
ii. Why might a researcher prefer to take a stratified random sample rather
than a simple random sample? Give two reasons. [3 marks]
iii. You have been asked to design a nation-wide survey in your country to find
out about the smoking habits of adults. Give two stratification factors you
might use, and explain why you have chosen them. [5 marks]
3. The level of infant mortality (y) is represented by the number of baby deaths for
every 1000 births. For 12 areas these are shown in the following table. For each
area, the percentage (x) of babies born into families earning at least £25,000 is also
shown.
Area A B C D E F G H I J K L
Percentage (x) 19 5 9 20 11 35 5 18 25 12 20 15
Infant mortality (y) 3 15 14 6 13 3 23 10 9 9 5 10
(a) i. Draw a scatter diagram of these data on the graph paper provided. Label
the diagram carefully. [4 marks]
ii. Calculate the sample correlation coefficient. Interpret your findings.
[3 marks]
iii. Calculate the least squares line of y on x and draw the line on the scatter
diagram. [4 marks]
iv. Using the equation you found in iii., obtain the predicted infant mortality
for an area where 34% of babies are born into families earning at least
£25,000. Do you think this value is realistic? Justify your answer. [2 marks]
(b) A survey is conducted to compare public attitudes towards local policing. A
number of people in two areas of interest are sampled, and asked if they are
satisfied with their local police-community relationship. The results of this
survey are shown in the following table.
4. (a) i. Carefully construct a box plot on the graph paper provided to display the
following yearly incomes of a group of people, measured in £1000:
3 2 4 8 7 19 2 5 3 4 10 12
[8 marks]
ii. Based on the shape of the box plot you have drawn, describe the
distribution of the data. [2 marks]
iii. Name two other types of graphical displays that would be suitable to
represent the data. Briefly explain your choices. [3 marks]
(b) A new fitness programme is devised for obese people. Each participant’s weight
in kg was measured before and after the programme to see if the fitness programme
is effective in reducing their weights. The following data were obtained:
Before After
145 143
116 120
120 118
133 130
119 119
133 134
125 128
126 123
140 141
i. Carry out an appropriate hypothesis test to determine whether the fitness
programme is effective for reducing weight. State the test hypotheses, and
specify your test statistic and its distribution under the null hypothesis.
Comment on your findings. [6 marks]
ii. State any assumptions you made. [2 marks]
iii. Give an 80% confidence interval for the difference in means. [2 marks]
iv. On the basis of the data alone, would you recommend the programme to
a friend who wants to lose weight? Explain why or why not. [2 marks]
END OF PAPER
ST104a Statistics 1
Examination Formula Sheet
Z-test of hypothesis for a single mean (σ known):
$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
t-test of hypothesis for a single mean (σ unknown):
$t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$
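Z-test of hypothesis for a single proportion:
$Z \cong \dfrac{p - \pi}{\sqrt{\pi(1 - \pi)/n}}$
Z-test for the difference between two means (variances known):
$Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
t-test for the difference between two means (variances unknown):
$t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2 \left( 1/n_1 + 1/n_2 \right)}}$
Confidence interval endpoints for the difference between two means:
$(\bar{x}_1 - \bar{x}_2) \pm t_{n_1 + n_2 - 2} \, s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
Confidence interval endpoints for the difference in means in paired samples:
$\bar{x}_d \pm t_{n-1} \dfrac{s_d}{\sqrt{n}}$
Z-test for the difference between two proportions:
$Z = \dfrac{(P_1 - P_2) - (\pi_1 - \pi_2)}{\sqrt{P(1 - P) \left( 1/n_1 + 1/n_2 \right)}}$
Confidence interval endpoints for the difference between two proportions:
$(p_1 - p_2) \pm z \sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}$
$a = \bar{y} - b\bar{x}$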
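As a rough illustration of how two of the formulae on this sheet are evaluated, the following Python sketch computes the single-mean test statistic and the paired-sample confidence interval endpoints. The function names and the numbers in the usage lines are hypothetical and are not taken from the formula sheet or from any examination question; the critical value is assumed to be read from the statistical tables.

```python
import math


def one_sample_statistic(sample_mean, mu0, sd, n):
    """Return (sample_mean - mu0) / (sd / sqrt(n)).

    Read the result as Z when sd is the known sigma, and as t with
    n - 1 degrees of freedom when sd is the sample standard deviation S.
    """
    return (sample_mean - mu0) / (sd / math.sqrt(n))


def paired_ci(mean_diff, sd_diff, n, t_crit):
    """Return the endpoints mean_diff -/+ t_crit * sd_diff / sqrt(n).

    t_crit is the tabulated t value with n - 1 degrees of freedom.
    """
    half_width = t_crit * sd_diff / math.sqrt(n)
    return mean_diff - half_width, mean_diff + half_width


# Hypothetical numbers, chosen only to show the calls.
z = one_sample_statistic(sample_mean=52.0, mu0=50.0, sd=10.0, n=25)
# 2.306 is the tabulated two-sided 95% t value with 8 degrees of freedom.
lower, upper = paired_ci(mean_diff=1.5, sd_diff=3.0, n=9, t_crit=2.306)
print(round(z, 3), round(lower, 3), round(upper, 3))
```

Passing the critical value in as an argument keeps the sketch close to examination practice, where it is looked up in the tables rather than computed.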
Important note
This commentary reflects the examination and assessment arrangements for this course in the
academic year 2012–13. The format and structure of the examination may change in future years,
and any such changes will be publicised on the virtual learning environment (VLE).
A change that took place from 2011–12 onwards is the presence of a formula sheet. The purpose of
this change is to encourage candidates to devote more time to understanding the key concepts of the
syllabus rather than memorising a large number of formulae. Nevertheless, candidates should not rely
on this formula sheet entirely but only use it for verification. The formula sheet is available on the
virtual learning environment (VLE).
Unless otherwise stated, all cross-references will be to the latest version of the subject guide (2011).
You should always attempt to use the most recent edition of any Essential reading textbook, even if
the commentary and/or online reading list and/or subject guide refers to an earlier edition. If
different editions of Essential reading are listed, please check the VLE for reading supplements – if
none are available, please use the contents list and index of the new edition to find the relevant
section.
General remarks
Learning outcomes
By the end of this course, and having completed the Essential reading and activities, you should:
• be familiar with the key ideas of statistics that are accessible to a candidate with a
moderate mathematical competence
• be able to routinely apply a variety of methods for explaining, summarising and presenting
data and interpreting results clearly using appropriate diagrams, titles and labels when
required
• be able to summarise the ideas of randomness and variability, and the way in which these
link to probability theory to allow the systematic and logical collection of statistical
techniques of great practical importance in many applied areas
• have a grounding in probability theory and some grasp of the most common statistical
methods
• be able to perform inference to test the significance of common measures such as means and
proportions and conduct chi-squared tests of contingency tables
• be able to use simple regression and correlation analysis and know when it is appropriate to
do so.
You have two hours to complete this paper, which is in two parts. The first part, Section A, is
compulsory; it covers several subquestions and accounts for 50 per cent of the total marks.
Section B contains three questions, each worth 25 per cent, from which you are asked to choose two.
Remember that each of the Section B questions is likely to cover more than one topic. In 2013, for
example, the first part of Question 2 asked for a chi-squared test and survey design problems
appeared in the second. The first part of Question 3 was on regression and involved drawing a
diagram, while the second part was a hypothesis test comparing population means using the sample
data given. Question 4 had a series of questions involving drawing diagrams, hypothesis testing and
confidence intervals. This means that it is really important that you make sure you have a
reasonable idea of what topics are covered before you start work on the paper! We suggest you
divide your time as follows during the examination:
• Spend the first 10 minutes annotating the paper. Note the topics covered in each question
and subquestion.
• Allow yourself 45 minutes for Section A. Don’t allow yourself to get stuck on any one
question, but don’t just give up after two minutes!
• Once you have chosen your two Section B questions, give them about 25 minutes each.
• This leaves you with 15 minutes. Do not leave the examination hall at this point! Check
over any questions you may not have completely finished. Make sure you have labelled and
given a title to any tables or diagrams which were required and, if you did more than the
two questions required in Section B, decide which one to delete. Remember that only two of
your answers will be given credit in Section B and that you must choose which these are!
The Examiners are looking for very simple demonstrations from you. They want to be sure that you:
• have covered the syllabus as described and explained in the subject guide
• know the basic formulae given there and when and how to use them
• understand and answer the questions set.
You are not expected to write long essays where explanations or descriptions of sample design
are required, and note form answers are acceptable. However, clear and accurate language, both
mathematical and written, is expected and marked. The explanations below and in the specific
commentaries for the papers for each zone should make these requirements clear.
The most important thing you can do is answer the question set! This may sound very simple, but
these are some of the things that candidates did not do, though asked, in the 2013 examinations!
Remember:
• If you are asked to label a diagram (which is almost always the case!), please do so. Writing
‘Histogram’ or ‘Stem-and-leaf diagram’ in itself is insufficient. What do the data describe?
What are the units? What are the x and y axes?
• If you are specifically asked to carry out a hypothesis test, or a confidence interval, do so. It
is not acceptable to do one rather than the other! If you are asked to find a 5% value, this is
what will be marked.
• Do not waste time calculating things which are not required by the Examiners. If you are
asked to find the line of best fit, you will get no extra marks if you calculate the correlation
coefficient as well. If you are asked to use the confidence interval you have just calculated to
comment on the results, carrying out an additional hypothesis test will not help your marks.
How should you use the specific comments on each question given in the
Examiners’ commentaries?
We hope that you find these useful. For each question and subquestion, they give:
• further guidance for each question on the points made in the last section
• the answers, or keys to the answers, which the Examiners were looking for
• the relevant detailed reference to Newbold (seventh edition) and the subject guide (2011)
• where appropriate, suggested activities from the subject guide which should help you to
prepare, and similar questions from Newbold.
Any further references you might need are given in the part of the subject guide to which you are
referred for each answer.
Question spotting
Many candidates are disappointed to find that their examination performance is poorer
than they expected. This can be due to a number of different reasons and the Examiners’
commentaries suggest ways of addressing common problems and improving your performance.
We want to draw your attention to one particular failing – ‘question spotting’, that is,
confining your examination preparation to a few question topics which have come up in past
papers for the course. This can have very serious consequences.
We recognise that candidates may not cover all topics in the syllabus in the same depth, but
you need to be aware that Examiners are free to set questions on any aspect of the syllabus.
This means that you need to study enough of the syllabus to enable you to answer the required
number of examination questions.
The syllabus can be found in the ‘Course information sheet’ in the section of the VLE dedicated
to this course. You should read the syllabus very carefully and ensure that you cover sufficient
material in preparation for the examination.
Examiners will vary the topics and questions from year to year and may well set questions that
have not appeared in past papers – every topic on the syllabus is a legitimate examination
target. So although past papers can be helpful in revision, you cannot assume that topics or
specific questions that have come up in past examinations will occur again.
If you rely on a question spotting strategy, it is likely you will find yourself in
difficulties when you sit the examination paper. We strongly advise you not to
adopt this strategy.
Candidates should answer THREE of the following FOUR questions: QUESTION 1 of Section
A (50 marks) and TWO questions from Section B (25 marks each). Candidates are strongly
advised to divide their time accordingly.
Section A
Question 1
(b) The table below contains the ages of the volunteers for a project in two different
years:
2011 20 18 38 18 20 18
2012 20 22 18 22 20 22 24 22 20
i. Find the mean age and the median age for each year.
ii. Calculate the range of the ages for each year and give an explanation for any
differences you find.
iii. Calculate the standard deviation of the ages for each year and give an
explanation for any differences you find.
iv. Comment on the differences in the mean and median for the two years that
you found in part i. For this data set, which do you think would give a better
description of the difference in ages: the mean or the median? Explain
briefly.
(12 marks)
Reading for this question
This question contains material mostly from Chapter 3 of the subject guide and in
particular Section 3.8 (Measures of location) for parts (i) and (iv), and Section 3.9
(Measures of spread) for parts (ii) and (iii).
Approaching the question
It is important to do the summation carefully and divide by the correct number of
observations to obtain the mean. For questions that require calculations on the median (or
other percentiles like quartiles), a good strategy is to write the observations in order. Note
also that this question requires these measures for both years, so the calculations should be
done for each year separately.
i. In order to calculate the two means, you should sum the numbers corresponding to each
year and then divide them by the number of observations in each row. Doing so yields
(20 + 18 + 38 + 18 + 20 + 18)/6 = 22, for 2011,
and
(20 + 22 + 18 + 22 + 20 + 22 + 24 + 22 + 20)/9 = 21.11, for 2012.
For the median, if we put the numbers in ascending order we get
18 18 18 20 20 38, for 2011,
and
18 20 20 20 22 22 22 22 24, for 2012.
The median for 2011 is given by taking the average between the 3rd and the 4th number
in the first of the rows above, resulting in a value of (18 + 20)/2 = 19. The median for
2012 is obtained from the 5th number in the 2nd row above, which is 22.
ii. Note that the range of a variable equals the difference between the maximum value and
the minimum value. Hence, the range for 2011 was 38 − 18 = 20, whereas the range for
2012 was 24 − 18 = 6. Some candidates answered ‘from 18 to 38’. While this is true, note
that it does not correspond to the definition of the range so it is essential to give the
numbers 20 (2011) and 6 (2012) in your answer.
It is also essential to comment on the different ranges between 2011 and 2012. The
difference is big and is caused by the outlier 38 in 2011.
Some candidates confused ‘Range’ and ‘Interquartile range’. Make sure that you identify
what is being asked.
iii. In order to answer this question, candidates should be familiar with Section 3.9.3 (on
variance and standard deviation) and the chapter activities. It is very important to show
your work with relevant summations of the squared deviation from the mean. In this way
you may get some marks even if the numerical answer is wrong as you are demonstrating
knowledge of the method. The answer for 2011 is 7.90, whereas for 2012 it is 1.76.
It is also essential to comment on the different standard deviations between 2011 and
2012. The difference is big and is caused by the outlier 38 in 2011.
iv. The mean is higher in 2011 but the median is higher in 2012. This can be attributed to
the fact that 2011 contains an outlier (38) which results in a high mean. Apart from this
outlier, ages tend to be higher in 2012, so the median gives a somewhat better indication
of the ‘typical’ age for each year.
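The calculations in parts i. to iii. can be checked with a short Python sketch. It uses the standard-library `statistics` module, and `stdev` is the sample standard deviation (divisor n − 1), which is assumed here to be the version intended by the question. The helper name `summarise` is illustrative only.

```python
import statistics

# Ages of the volunteers, as given in the question.
ages = {
    2011: [20, 18, 38, 18, 20, 18],
    2012: [20, 22, 18, 22, 20, 22, 24, 22, 20],
}


def summarise(values):
    """Return the mean, median, range and sample standard deviation."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),  # sample sd, divisor n - 1
    }


summary = {year: summarise(values) for year, values in ages.items()}
for year, stats in summary.items():
    print(year, {name: round(value, 2) for name, value in stats.items()})
```

Removing the value 38 from the 2011 list and rerunning the sketch is a quick way to see how strongly a single outlier affects the mean, the range and the standard deviation while leaving the median almost unchanged.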
(d) We would like to design a survey to estimate the average number of hours
university students spend studying per week. How many students must we
randomly select to be 95 percent confident that the sample mean is within 2
hours of the population mean? Assume that a previous survey has shown that
the standard deviation of hours spent studying is 6.95 hours.
(3 marks)
Reading for this question
All of Chapter 6 is relevant, but the main reading for this question can be found in Section
6.1 (Choosing a sample size). It is essential to read this section carefully and attempt the
activities and exercises.
Approaching the question
This question asks you to determine a sample size. This is straightforward once the
distribution is identified. Since the sample size is large, a normal distribution can be used.
The working is given below:
• Identify the correct z-value: 1.96.
• Solve $1.96 \, \sigma/\sqrt{n} = 2$ for n, that is, $n = (1.96\sigma/2)^2$.
We can take σ = 6.95 to find n = 46.39.
• Round up to n = 47.
Some candidates forgot to round up. Remember that you are asked about a sample size.
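The working above can be reproduced with a minimal Python sketch. The function name `required_sample_size` is an illustrative choice; the z-value 1.96, σ = 6.95 and the tolerance of 2 hours come from the question.

```python
import math


def required_sample_size(sigma, margin, z=1.96):
    """Solve z * sigma / sqrt(n) = margin for n.

    Returns the unrounded solution and the sample size after rounding up,
    since a sample size must be a whole number that meets the tolerance.
    """
    raw = (z * sigma / margin) ** 2
    return raw, math.ceil(raw)


raw_n, n = required_sample_size(sigma=6.95, margin=2)
print(round(raw_n, 2), n)
```

Rounding up rather than to the nearest integer matters: any n below the unrounded solution would give a margin of error slightly larger than 2 hours.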
i. $\sum_{i=1}^{5} x_i$    ii. $\sum_{i=2}^{5} 2x_i(y_i + 1)$    iii. $x_2^2 + \sum_{i=1}^{3} (x_i + y_i^3)$
(6 marks)
Reading for this question
This question refers to the basic bookwork which can be found in Section 1.9 of the subject
guide and, in particular, in Activity A1.6.
Approaching the question
Be careful to leave the xs and ys in the order given and only cover the values of i asked for.
This question was generally done well; the answers are:
i. $\sum_{i=1}^{5} x_i = 4 + (-3) + 5 + 0 + 3 = 9$. (1 mark)
ii. $\sum_{i=2}^{5} 2x_i(y_i + 1) = 2\sum_{i=2}^{5} x_i(y_i + 1) = 2(-3 \times (2 + 1) + 5 \times (1 + 1) + 0 \times (0 + 1) + 3 \times (1 + 1)) = 2 \times 7 = 14$. (2 marks)
iii. $x_2^2 + \sum_{i=1}^{3} (x_i + y_i^3) = (-3)^2 + (4 + 3^3) + (-3 + 2^3) + (5 + 1^3) = 9 + 31 + 5 + 6 = 51$. (3 marks)
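The three sums can be checked with a short Python sketch. The data table for this question is not reproduced here, so the values of x and y below are inferred from the worked answers and should be treated as an assumption. Python lists are indexed from 0, so the mathematical index i corresponds to position i − 1.

```python
# Values inferred from the worked answers (assumption, not the original table).
x = [4, -3, 5, 0, 3]
y = [3, 2, 1, 0, 1]

# i. sum of x_i for i = 1, ..., 5
part_i = sum(x[i - 1] for i in range(1, 6))

# ii. sum of 2 * x_i * (y_i + 1) for i = 2, ..., 5 (note the lower limit of 2)
part_ii = sum(2 * x[i - 1] * (y[i - 1] + 1) for i in range(2, 6))

# iii. x_2 squared plus the sum of (x_i + y_i cubed) for i = 1, 2, 3
part_iii = x[1] ** 2 + sum(x[i - 1] + y[i - 1] ** 3 for i in range(1, 4))

print(part_i, part_ii, part_iii)
```

Writing the limits as `range(lower, upper + 1)` keeps the code visibly aligned with the limits printed under and over each summation sign.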
(g) State whether the following are true or false and give a brief explanation. (Note
that no marks will be awarded for a simple true/false answer.)
i. In an observational study, a control group provides an essential tool to
establish causal relationships.
ii. If two variables are correlated we can conclude that one causes the other.
iii. The mean income of British households can be expected to be larger than
the median income of British households.
(6 marks)
Reading for this question
This question contains material from various parts of the subject guide. Here, it is more
important to have a good intuitive understanding of the relevant concepts than a technical
level of knowledge in computations. Part (i) requires material from Chapter 10 and, in
particular, the sections on observational studies and designed experiments. Part (ii) is about
correlation and causation detailed in Section 11.7 of the subject guide. Finally part (iii)
targets the material covered in Chapter 3.
Approaching the question
Candidates always find this type of question tricky. It requires a brief explanation of the
reason for a true/false and not just a choice between the two. Some candidates also lost
marks for long rambling explanations without a decision as to whether a statement was true
or false.
i. True. One way to explain this is through an example: if we want to establish the causal
effects of fluoridated water, we need a control group without fluoride in the water, but
which is as similar as possible to the group with fluoridated water. Another way is to note
that randomised experiments are better tools for establishing causal relations, but we may
not be able to carry out a proper experiment (see p.156 of the subject guide).
ii. False; the correlation may be spurious, for example there may be a third variable
affecting both variables leading to a correlation.
iii. In this part it is important to realise that income is typically a right (positively) skewed
variable. Hence the statement is true since, due to the right skewness, the mean will be
bigger than the median.
(h) In the context of sampling, explain the difference between item non-response
and unit non-response.
(3 marks)
Reading for this question
This question requires knowledge about sampling and sample surveys. Useful background
reading may be found in Chapter 9 of the subject guide. The material directly related to
this question, item non-response and unit non-response, appears on p.145. See also the
references to Newbold and Carlson given in Chapter 9 of the subject guide.
Approaching the question
The relevant parts of p.145 are that:
• item non-response occurs when a sampled member fails to respond to a particular question
• unit non-response occurs when no information is collected from a sample member.
In addition to the definitions supplied above, it would also be useful to use an example.
Section B
Question 2
(a) A social survey in the United States asked subjects, ‘Would you say that
homeopathy is very scientific, sort of scientific, or not at all scientific?’ The
table below cross-classifies their responses with their highest level of education.
Homeopathy is scientific
Highest degree Very Sort of Not at all Total
Less than High school 46 (11%) 168 (41%) 196 (48%) 410 (100%)
High school 100 (5%) 572 (31%) 1148 (63%) 1820 (100%)
College or higher 32 (2%) 248 (18%) 1076 (79%) 1356 (100%)
Total 178 (5%) 988 (28%) 2420 (67%) 3586 (100%)
i. Based on the data in the table, and without doing a significance test, how
would you describe the relationship between education and opinion on
whether or not homeopathy is scientific?
(4 marks)
ii. Calculate the χ2 statistic and use it to test for independence, using a 1%
significance level. What do you conclude?
(9 marks)
Reading for this question
This part targets Chapter 8 on contingency tables and chi-square tests. Note that part (i) of
the question does not require any calculations, just understanding and interpreting
contingency tables. Candidates can attempt Activity A8.4 to practise. Part (ii) is a
straightforward chi-squared test and the reading is also given in Chapter 8.
Approaching the question
i. Using the percentages we see that the higher someone’s education, the smaller the belief
that homeopathy is very scientific and the higher the belief that it is not at all scientific.
For example, 79% of those who attended college or higher education responded that
homeopathy is not at all scientific, whereas the corresponding proportion for those with
less than high school education is 48%.
ii. Set out the null hypothesis that there is no association between education and views on
homeopathy against the alternative, that there is an association. Be careful to get these
the correct way round!
H0 : No association between education and views on homeopathy versus
H1 : Association between education and views on homeopathy.
Work out the expected values to obtain the table below
iii. You have been asked to design a nation-wide survey in your country to find
out about the smoking habits of adults. Give two stratification factors you
might use, and explain why you have chosen them.
(5 marks)
Reading for this question
This question covers basic material on survey design and requires background reading from
Chapters 9 and 10 of the subject guide which, along with the recommended reading, should
be looked at carefully. Candidates were expected to have studied and understood the main
elements of design in random sampling. It is also a good idea to try the
activities in Chapter 9.
Approaching the question
One of the main things to avoid here is writing an answer without any structure. This
exercise asks for specific things and each one of them requires one or two lines. If you are
unsure of what these specific things are, do not write lengthy essays. This is a waste of
your valuable examination time. If you can identify what is being asked, keep in mind that
the answer should not be long. Note also that in some cases there is no unique answer
to the question.
i. Simple random sampling:
• Every sample has equal probability.
• With replacement.
Stratified random sampling:
• Population divided into strata (or groups).
• Random sample from each group.
ii. There are generally two main reasons why one would prefer stratified to simple random
sampling.
• Potentially more precision of parameter estimates.
• Obtain information about subgroups.
iii. In this part you can choose factors based on two arguments. First, you can aim for
factors whose subgroups differ regarding smoking habits (e.g. gender, ethnic groups, age
groups etc.). In that way the stratified sampling scheme will have increased precision.
Alternatively you can just suggest factors that are interesting from a research point of
view.
Question 3
The level of infant mortality (y) is represented by the number of baby deaths for
every 1000 births. For 12 areas these are shown in the following table. For each
area, the percentage (x) of babies born into families earning at least £25,000 is also
shown.
Area A B C D E F G H I J K L
Percentage (x) 20 6 10 21 12 36 6 19 26 13 21 16
Infant mortality (y) 5 17 16 8 15 5 25 12 11 11 7 12
(a) i. Draw a scatter diagram of these data on the graph paper provided. Label the
diagram carefully.
(4 marks)
i. Candidates are reminded that they are asked to draw and label the scatter diagram
which should include a full title (‘Scatter diagram’ alone will not suffice) and labelled
axes, including information about units. Far too many candidates threw away marks by
neglecting these points and consequently were only given one mark out of the possible
four allocated for this part of the question. Another common way of losing marks was
failing to use the graph paper which was provided, and required, in the question.
Candidates who drew on the ordinary paper in their booklet were not awarded marks for
this part of the question.
[Figure: scatter diagram; vertical axis from 0 to 25, horizontal axis from 0 to 35.]
ii. The summary statistics can be substituted into the formula for the correlation coefficient
(make sure you know which one it is!) to obtain the value −0.8026. An interpretation of
this value is the following: The data suggest that the higher the percentage of families
earning at least a certain income, the lower the mortality. The fact that the value is very
close to −1 suggests that this is a strong (negative) association.
Many candidates did not give the measurement units here. These are essential in
answering such a question and a mark is deducted if they are not specified.
observations is large, is ±1.96. Hence, we reject the null hypothesis suggesting evidence
for a difference between the two areas. If we take a (smaller) α of 1%, the critical value is
±2.576, so we do not reject H0 . We conclude that there is some, but not strong, evidence
of a difference between the two areas.
ii. The assumptions included:
• Assumption about whether $\sigma_A^2 = \sigma_B^2$.
• Assumption about whether $n_A + n_B - 2$ is ‘large’, which determines whether the t or the
z distribution is used.
• Assumption about independent samples.
iii. The question is a standard exercise in confidence intervals. Note the question refers to
areas A and B combined. The working is given below:
• Correct quantile: zα/2 = 2.326.
• Correct endpoints: 0.635 and 0.746. (Also accept two decimal places.)
• Report as an interval: (0.635, 0.746). (Also accept between 0.635 and 0.746.)
Question 4
(a) i. Carefully construct a box plot on the graph paper provided to display the
following yearly incomes of a group of people, measured in £1000:
9 6 12 24 21 57 6 15 9 12 30 36
(8 marks)
ii. Based on the shape of the box plot you have drawn, describe the distribution
of the data
(2 marks)
iii. Name two other types of graphical displays that would be suitable to
represent the data. Briefly explain your choices.
(3 marks)
Reading for this question
Chapter 3 provides all the relevant material for this question. More specifically, information
on boxplots can be found in Section 3.9.2, but all of Sections 3.8 and 3.9 are highly relevant.
Approaching the question
i. The boxplot diagram the Examiners were hoping to see is shown below. Marks were
awarded for including the title, identifying the box and the whiskers and noting the
outlier, with reasonable accuracy.
In order to identify the box, the quartiles are needed; these are 9 and 25.5, giving an
interquartile range of 16.5. The median is also needed, which is 13.5.
Hence the outlier limits are from 0 to 50.25. (−15.75 to 50.25 is also allowed.)
The extreme outlier limits are then from 0 to 75. (−40.5 to 75 is also allowed.)
Hence 57 is an outlier but not an extreme outlier.
Note that you did not need to label the x axis and that the plot can be transposed.
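As a check on the box plot arithmetic, the quartiles and outlier limits can be sketched in Python. This is only an illustration: it assumes quartiles found by linear interpolation between ordered values (the `inclusive` method of the standard `statistics` module) and limits placed 1.5 and 3 interquartile ranges beyond the quartiles. Other quartile conventions give slightly different values, and all variable names are illustrative.

```python
import statistics

# Yearly incomes in £1000, as listed in the question.
incomes = [9, 6, 12, 24, 21, 57, 6, 15, 9, 12, 30, 36]

# Quartiles by linear interpolation between ordered values
# (an assumed convention; the subject guide's rule may differ in detail).
q1, median, q3 = statistics.quantiles(incomes, n=4, method="inclusive")
iqr = q3 - q1

# Outlier limits: 1.5 IQRs beyond the quartiles.
# Extreme outlier limits: 3 IQRs beyond the quartiles.
outlier_limits = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
extreme_limits = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)

outliers = [v for v in incomes if not outlier_limits[0] <= v <= outlier_limits[1]]
extreme = [v for v in incomes if not extreme_limits[0] <= v <= extreme_limits[1]]

print("Q1, median, Q3, IQR:", q1, median, q3, iqr)
print("Outlier limits:", outlier_limits)
print("Extreme outlier limits:", extreme_limits)
print("Outliers:", outliers, "Extreme outliers:", extreme)
```

Since incomes cannot be negative, the lower limits are reported as 0 in the commentary; the code leaves them unclipped.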
ii. Based on the shape of the boxplot, we can see that the distribution of the data is
positively skewed.
iii. A histogram or stem-and-leaf diagram are other types of suitable graphical displays. The
variable income is measurable and these graphs are suitable for displaying the
distribution of such variables.
(b) A new treatment has been devised with the aim of reducing blood pressure for
people with high blood pressure. Each participant’s blood pressure was
measured before and after the program to see if the treatment is effective. The
following data were obtained:
[Figure: box plot titled ‘Distribution of Income’; vertical axis: income in thousands of pounds, from 10 to 60.]
Before After
177 174
142 146
146 144
162 159
145 145
162 163
152 156
154 150
171 172
i. Carry out an appropriate hypothesis test to determine whether the
treatment is effective for reducing blood pressure. State the test hypotheses,
and specify your test statistic and its distribution under the null hypothesis.
Comment on your findings.
(6 marks)
ii. State any assumptions you made.
(2 marks)
iii. Give a 90% confidence interval for the difference in means.
(2 marks)
iv. On the basis of the data alone, would you recommend the programme to a
friend who suffers from high blood pressure? Explain why or why not.
(2 marks)
Reading for this question
Look up the sections about hypothesis testing for testing differences in means. However, it
is essential for this part of the question to focus on the section of the subject guide
regarding paired samples (Section 7.16.4).
Approaching the question
i. Regarding hypotheses, note that the word ‘effective’ suggests a one-sided test:
H0 : µbefore = µafter , H1 : µbefore > µafter
In this part, it is also essential to realise that we have a paired sample, as we have two
observations for each person (before and after treatment). Hence the difference for each
person should be calculated:
3 −4 2 3 0 −1 −4 4 −1
The next step is to calculate $s_d = 2.991$ and $\bar{x}_d = 0.2222$, in order to obtain the
value of the test statistic
$\frac{\bar{x}_d - 0}{s_d/\sqrt{n}} = 0.2229$.
We have the t distribution with eight degrees of freedom, hence the critical value (for a
one-sided test) is 1.860.
Hence, we do not reject H0 at the 5% level. Testing at the 10% level gives a critical value
of t8,0.1 = 1.397. Therefore, we still do not reject H0 . There is no significant evidence
that the treatment is effective.
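The paired test above can be checked with a short Python sketch. This is an illustration only: the variable names are not from the subject guide, and the critical value 1.860 is copied from the commentary rather than computed.

```python
import math
import statistics

# Blood pressure readings from the question, one pair per participant.
before = [177, 142, 146, 162, 145, 162, 152, 154, 171]
after = [174, 146, 144, 159, 145, 163, 156, 150, 172]

# Paired design: work with the within-person differences (before minus after).
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)

mean_d = statistics.mean(diffs)   # sample mean of the differences
sd_d = statistics.stdev(diffs)    # sample standard deviation (divisor n - 1)

# Test statistic for H0: mean difference = 0, t-distributed with n - 1 df under H0.
t_stat = (mean_d - 0) / (sd_d / math.sqrt(n))

# One-sided 5% critical value for 8 degrees of freedom, as quoted in the commentary.
T_CRIT_5PCT = 1.860
reject_h0 = t_stat > T_CRIT_5PCT

print(diffs)
print(round(mean_d, 4), round(sd_d, 3), round(t_stat, 4), reject_h0)
```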
ii. • Differences normally distributed (no marks for normally distributed blood pressure).
• Pairs of observations are independent (a weaker condition which suffices is that the
differences are independent, but this is unlikely if observations are not).
iii. This is a straightforward exercise for confidence intervals given the appropriate formula
from the formula sheet (make sure that you can recognise it). The requested confidence
interval is (−1.6316, 2.0766).
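A minimal sketch of the interval calculation follows, assuming the usual form of mean difference plus or minus the t quantile times the standard error. The quantile 1.860 (eight degrees of freedom, 5% in each tail) is taken from the commentary; small differences from the quoted endpoints in the third or fourth decimal place come from rounding of intermediate values.

```python
import math
import statistics

# Within-person differences (before minus after) from part i.
diffs = [3, -4, 2, 3, 0, -1, -4, 4, -1]
n = len(diffs)

mean_d = statistics.mean(diffs)
std_error = statistics.stdev(diffs) / math.sqrt(n)

# t quantile with 8 degrees of freedom for a 90% interval (from the commentary).
T_QUANTILE = 1.860

lower = mean_d - T_QUANTILE * std_error
upper = mean_d + T_QUANTILE * std_error

print(f"90% confidence interval: ({lower:.2f}, {upper:.2f})")
```

The interval contains zero, which is consistent with not rejecting the null hypothesis in part i.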
iv. The evidence in the data that the treatment works is close to negligible as can be seen,
for example, from the 90% confidence interval, so there is no reason to recommend the
treatment on the basis of the data alone.
Important note
This commentary reflects the examination and assessment arrangements for this course in the
academic year 2012–13. The format and structure of the examination may change in future years,
and any such changes will be publicised on the virtual learning environment (VLE).
A change that took place from 2011–12 onwards is the presence of a formula sheet. The purpose of
this change is to encourage candidates to devote more time to understanding the key concepts of the
syllabus rather than memorising a large number of formulae. Nevertheless, candidates should not rely
on this formula sheet entirely but only use it for verification. The formula sheet is available on the
virtual learning environment (VLE).
Unless otherwise stated, all cross-references will be to the latest version of the subject guide (2011).
You should always attempt to use the most recent edition of any Essential reading textbook, even if
the commentary and/or online reading list and/or subject guide refers to an earlier edition. If
different editions of Essential reading are listed, please check the VLE for reading supplements – if
none are available, please use the contents list and index of the new edition to find the relevant
section.
Candidates should answer THREE of the following FOUR questions: QUESTION 1 of Section
A (50 marks) and TWO questions from Section B (25 marks each). Candidates are strongly
advised to divide their time accordingly.
Section A
Question 1
a variable and be able to distinguish between discrete and continuous (measurable) data. In
addition to identifying whether a variable is categorical or measurable, further distinctions
between ordinal and nominal categorical variables should be made by candidates.
Approaching the question
A general tip for identifying continuous and categorical variables is to think of the possible
values they can take. If these are finite and represent specific entities the variable is
categorical. Otherwise, if these consist of numbers corresponding to measurements, the data
are continuous and the variable is measurable. Such variables may also have measurement
units or can be measured to various decimal places.
i. Each rank is a category, therefore this is a categorical variable. The values of this
variable are the ranks of each university. By definition the categories (ranks) are ordered,
thus resulting in a (categorical) ordinal variable.
ii. Each country is a category, so the possible values are one for each country. Hence, the
variable is categorical. Note also that countries do not have a natural ordering, so this
represents a categorical nominal variable.
iii. The data represent weights of babies at birth that can be measured to many decimal
places; for example 5.234 kg. This is, therefore, a measurable variable.
iv. Each pop group is a category and is also a potential value of this variable. Hence, the
variable is categorical. Moreover, pop groups do not have a natural ordering, therefore
this categorical variable is on a nominal scale.
Weak candidates did not provide a justification for their choices, described measurable
variables as nominal or categorical, and sometimes answered ordinal when their justification
was pointing to a nominal variable. Writing ‘It is measurable because it can be measured’
will not result in a high mark.
(b) The table below contains the marks (out of 20) of all students taking an
examination for the same course in two years:
2011 10 9 19 9 10 9
2012 10 11 9 11 10 11 12 11 10
i. Find the mean mark and the median mark for each year.
ii. Calculate the range of the marks for each year and give an explanation for
any differences you find.
iii. Calculate the standard deviation of the marks for each year and give an
explanation for any differences you find.
iv. Comment on the differences in the mean and median for the two years that
you found in part i. For this data set, which do you think would give a better
description of the difference in marks: the mean or the median? Explain
briefly.
(12 marks)
Reading for this question
This question contains material mostly from Chapter 3 of the subject guide and, in
particular, Section 3.8 (Measures of location) for parts (i) and (iv), and Section 3.9
(Measures of spread) for parts (ii) and (iii).
Approaching the question
It is important to do the summation carefully and divide by the correct number of
observations to obtain the mean. For questions that require calculations on the median (or
other percentiles like quartiles), a good strategy is to write the observations in order. Note
also that this question requires these measures for both years, so the calculations should be
done for each year separately.
i. In order to calculate the two means, you should sum the numbers corresponding to each
year and then divide them by the number of observations in each row. Doing so yields
(10 + 9 + 19 + 9 + 10 + 9)/6 = 11, for 2011,
and
(10 + 11 + 9 + 11 + 10 + 11 + 12 + 11 + 10)/9 = 10.56, for 2012.
For the median if we put the numbers in ascending order we get
9 9 9 10 10 19, for 2011,
and
9 10 10 10 11 11 11 11 12, for 2012.
The median for 2011 is given by taking the average between the 3rd and the 4th number
in the first of the rows above, resulting in a value of (9 + 10)/2 = 9.5. The median for
2012 is obtained from the 5th number in the 2nd row above, which is 11.
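The means and medians can be verified with a few lines of Python. This is a sketch using the standard `statistics` module; the dictionary layout and names are illustrative choices.

```python
import statistics

# Marks (out of 20) for each year, as given in the question.
marks = {
    2011: [10, 9, 19, 9, 10, 9],
    2012: [10, 11, 9, 11, 10, 11, 12, 11, 10],
}

# Mean and median for each year, computed separately.
location = {
    year: (statistics.mean(values), statistics.median(values))
    for year, values in marks.items()
}

for year, (mean, median) in location.items():
    print(year, "mean:", round(mean, 2), "median:", median)
```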
ii. Note that the range of a variable equals the difference between the maximum value and
the minimum value. Hence, the range for 2011 was 19 − 9 = 10, whereas the range for
2012 was 12 − 9 = 3. Some candidates answered ‘from 9 to 19’. While this is true, note
that it does not correspond to the definition of the range so it is essential to give the
numbers 10 (2011) and 3 (2012) in your answer.
It is also essential to comment on the different ranges between 2011 and 2012. The
difference is big and is caused by the outlier 19 in 2011.
Some candidates confused ‘Range’ and ‘Interquartile range’. Make sure that you identify
what is being asked.
iii. In order to answer this question, candidates should be familiar with Section 3.9.3 (on
variance and standard deviation) and the chapter activities. It is very important to show
your work with relevant summations of the squared deviation from the mean. In this way
you may get some marks even if the numerical answer is wrong as you are demonstrating
knowledge of the method. The answer for 2011 is 3.95, whereas for 2012 it is 0.88.
It is also essential to comment on the different standard deviations between 2011 and 2012.
The difference is big and is caused by the outlier 19 in 2011.
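The range and standard deviation for each year can be checked in the same way. The sketch below assumes the sample standard deviation (divisor n − 1), which is what `statistics.stdev` computes; names are illustrative.

```python
import statistics

# Marks (out of 20) for each year, as given in the question.
marks = {
    2011: [10, 9, 19, 9, 10, 9],
    2012: [10, 11, 9, 11, 10, 11, 12, 11, 10],
}

# Range is maximum minus minimum; stdev uses the n - 1 divisor.
spread = {
    year: (max(values) - min(values), statistics.stdev(values))
    for year, values in marks.items()
}

for year, (value_range, sd) in spread.items():
    print(year, "range:", value_range, "standard deviation:", round(sd, 2))
```

Dropping the mark of 19 from the 2011 list and re-running shows how strongly a single outlier affects both measures.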
iv. The mean is higher in 2011 but the median is higher in 2012. This can be attributed to
the fact that 2011 contains an outlier (19) which results in a high mean. Apart from this
outlier, marks tend to be higher in 2012, so the median gives a somewhat better
indication of the ‘typical’ mark for each year.
(d) We would like to start an internet service provider and need to estimate the
average weekly internet usage of households for our business plan. Internet
usage is measured in minutes. How many households must we randomly select
to be 95 percent confident that the sample mean is within 2 minutes of the
population mean? Assume that a previous survey of household usage has shown
that the standard deviation of internet usage is 6.95 minutes.
(3 marks)
Reading for this question
All of Chapter 6 is relevant, but the main reading for this question can be found in Section
6.1 (Choosing a sample size). It is essential to read this section carefully and attempt the
activities and exercises.
Approaching the question
This question asks you to determine a sample size. This is straightforward once the
distribution is identified. Since the sample size is large, a normal distribution can be used.
The working is given below:
• Identify the correct z-value: 1.96.
• Solve $1.96 \times \frac{\sigma}{\sqrt{n}} = 2$ for n.
Taking σ = 6.95 gives n = (1.96 × 6.95/2)² = 46.39.
• Round up to n = 47.
Some candidates forgot to round up. Remember that you are asked about a sample size.
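The sample size calculation can be sketched as follows. The values come from the question and the working above; the variable names are illustrative.

```python
import math

z_value = 1.96   # z-value for 95% confidence
sigma = 6.95     # standard deviation from the previous survey (minutes)
tolerance = 2.0  # required closeness of sample mean to population mean (minutes)

# Rearranging z * sigma / sqrt(n) = tolerance gives n = (z * sigma / tolerance)^2.
n_exact = (z_value * sigma / tolerance) ** 2

# A sample size must be a whole number, and rounding down would miss the target.
n_required = math.ceil(n_exact)

print(round(n_exact, 2), n_required)
```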
i. $\sum_{i=1}^{5} x_i$
ii. $\sum_{i=2}^{5} 2x_i(y_i + 1)$
iii. $x_2^2 + \sum_{i=1}^{3} (x_i + y_i^3)$
(6 marks)
Reading for this question
This question refers to the basic bookwork which can be found in Section 1.9 of the subject
guide and, in particular, in Activity A1.6.
Approaching the question
Be careful to leave the xs and ys in the order given and only cover the values of i asked for.
This question was generally done well; the answers are:
i. $\sum_{i=1}^{5} x_i = 2 + (-3) + 6 + 0 + 3 = 8$. (1 mark)
ii. $\sum_{i=2}^{5} 2x_i(y_i + 1) = 2\sum_{i=2}^{5} x_i(y_i + 1) = 2(-3 \times (2 + 1) + 6 \times (1 + 1) + 0 \times (0 + 1) + 3 \times (1 + 1)) = 2 \times 9 = 18$. (2 marks)
iii. $x_2^2 + \sum_{i=1}^{3} (x_i + y_i^3) = (-3)^2 + (2 + 3^3) + (-3 + 2^3) + (6 + 1^3) = 9 + 29 + 5 + 7 = 50$.
(3 marks)
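The three sums can be checked with a short Python sketch. The x and y values below are read off from the worked answers above rather than from the question statement, so they should be treated as an assumption. Note that Python lists are indexed from 0, so x_i corresponds to `x[i - 1]`.

```python
# Values inferred from the worked answers (assumed, not quoted from the question).
x = [2, -3, 6, 0, 3]
y = [3, 2, 1, 0, 1]

# i. Sum of x_i for i = 1, ..., 5.
part_i = sum(x)

# ii. Sum of 2 * x_i * (y_i + 1) for i = 2, ..., 5 (list indices 1 to 4).
part_ii = sum(2 * x[i] * (y[i] + 1) for i in range(1, 5))

# iii. x_2 squared plus the sum of (x_i + y_i cubed) for i = 1, 2, 3 (indices 0 to 2).
part_iii = x[1] ** 2 + sum(x[i] + y[i] ** 3 for i in range(0, 3))

print(part_i, part_ii, part_iii)
```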
(f ) In an introductory statistics class, the numbers of males and females are 17 and
23, respectively.
i. A student is selected randomly from the class. What is the probability the
student is female?
ii. A student is selected at random and removed from the class. A second
student is then selected. What is the probability that one of the students is
male and the other is female?
iii. What is the probability that the second student is male, given that the first
student is female and removed from the class?
iv. In previous years it was found that 80% of males pass the exam and 85% of
females pass the exam. Based on the available information, find the
probability that a student who passes the examination is female.
(8 marks)
Reading for this question
This is a question on probability and targets mostly the material covered in Chapter 4. It is
essential to practise this area by attempting the chapter activities and exercises as well as
accessing the material on the VLE. In particular you can attempt Activity A4.6 and Sample
examination question 4. It is also useful to familiarise yourself with probability trees as they
can be quite useful when completing such exercises.
Approaching the question
The first three parts were straightforward for those that were familiar with this section. Part
(iv) required knowledge of Bayes’ formula or a very good understanding of probability trees.
The working out is shown below:
i. There are 23 females and 17 males in the class. Hence the answer is
23/(17 + 23) = 23/40 = 0.575.
ii. The correct answer here is $\frac{17}{40} \times \frac{23}{39} + \frac{23}{40} \times \frac{17}{39} = 0.501$.
Although not necessary, the use of a probability tree would be quite helpful here.
iii. This part can be answered in a similar way to part (i) noting that there are now 17
males and 22 females in the class. Hence 17/39 = 0.436.
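iv.
\[
\begin{aligned}
P(\text{female} \mid \text{pass}) &= \frac{P(\text{pass} \mid \text{female})\,P(\text{female})}{P(\text{pass})} \\
&= \frac{0.85 \times 23/40}{P(\text{pass} \cap \text{female}) + P(\text{pass} \cap \text{male})} \\
&= \frac{0.85 \times 23/40}{0.85 \times 23/40 + 0.80 \times 17/40} \\
&= 0.5897.
\end{aligned}
\]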
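All four parts can be verified with exact fractions. The sketch below uses Python's standard `fractions` module; the variable names are illustrative, and the pass rates are those given in the question.

```python
from fractions import Fraction

males, females = 17, 23
total = males + females

# i. Probability that a randomly selected student is female.
p_female = Fraction(females, total)

# ii. One male and one female in two draws without replacement (either order).
p_one_each = (Fraction(males, total) * Fraction(females, total - 1)
              + Fraction(females, total) * Fraction(males, total - 1))

# iii. Second student male, given the first was female and has been removed.
p_male_second = Fraction(males, total - 1)

# iv. Bayes' formula: P(female | pass).
p_pass_given_female = Fraction(85, 100)
p_pass_given_male = Fraction(80, 100)
p_male = Fraction(males, total)
p_pass = p_pass_given_female * p_female + p_pass_given_male * p_male
p_female_given_pass = p_pass_given_female * p_female / p_pass

for label, p in [("i", p_female), ("ii", p_one_each),
                 ("iii", p_male_second), ("iv", p_female_given_pass)]:
    print(label, p, round(float(p), 4))
```

Working in fractions avoids rounding until the final step, which makes it easier to compare with a probability tree.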
(g) State whether the following are true or false and give a brief explanation. (Note
that no marks will be awarded for a simple true/false answer.)
i. An important difference between an experimental design and an
observational study is that in an observational study data are collected on
units without any intervention.
ii. If two variables are correlated we can conclude that one causes the other.
iii. If a variable has a symmetric distribution, its mean and median are the same.
(6 marks)
Reading for this question
This question contains material from various parts of the subject guide. Here, it is more
important to have a good intuitive understanding of the relevant concepts than a technical
level of knowledge in computations. Part (i) requires material from Chapter 10 and, in
particular, the sections on observational studies and designed experiments. Part (ii) is about
correlation and causation detailed in Section 11.7 of the subject guide. Finally part (iii)
targets the material covered in Chapter 3.
Approaching the question
Candidates always find this type of question tricky. It requires a brief explanation of the
reason for a true/false and not just a choice between the two. Some candidates also lost
marks for long rambling explanations without a decision as to whether a statement was true
or false.
i. True. A possible way to provide an explanation here is through an example: in an
experimental design some units are administered a treatment, whereas this is not
possible in an observational study.
Note: candidates should indicate in some way that they know what the assertion means,
such as via an example (see p.156 of the subject guide).
ii. False; the correlation may be spurious, for example there may be a third variable
affecting both variables leading to a correlation.
iii. True; mean and median are at the centre of symmetry.
(h) In the context of sampling, explain the difference between item non-response
and unit non-response.
(3 marks)
Reading for this question
This question requires knowledge about sampling and sample surveys. Useful background
reading may be found in Chapter 9 of the subject guide. The material directly related to
this question, item non-response and unit non-response, appears on p.145. See also the
references to Newbold and Carlson given in Chapter 9 of the subject guide.
Approaching the question
The relevant parts of p.145 are that:
• item non-response occurs when a sampled member fails to respond to a particular question
• unit non-response occurs when no information is collected from a sample member.
In addition to the definitions supplied above, it would also be useful to use an example.
Section B
Question 2
(a) The 2006 General Social Survey in the United States asked subjects, ‘Would you
say that astrology is very scientific, sort of scientific, or not at all scientific?’ The
table below cross-classifies their responses with their highest level of education.
                         Astrology is scientific
Highest degree           Very        Sort of      Not at all    Total
Less than High school    23 (11%)    84 (41%)     98 (48%)      205 (100%)
High school              50 (5%)     286 (31%)    574 (63%)     910 (100%)
College or higher        16 (2%)     124 (18%)    538 (79%)     678 (100%)
Total                    89 (5%)     494 (28%)    1210 (67%)    1793 (100%)
i. Based on the data in the table, and without doing a significance test, how
would you describe the relationship between education and opinion on
whether or not astrology is scientific?
(4 marks)
ii. Calculate the χ2 statistic and use it to test for independence, using a 1%
significance level. What do you conclude?
(9 marks)
Reading for this question
This part targets Chapter 8 on contingency tables and chi-square tests. Note that part (i) of
the question does not require any calculations, just understanding and interpreting
contingency tables. Candidates can attempt Activity A8.4 to practise. Part (ii) is a
straightforward chi-squared test and the reading is also given in Chapter 8.
Approaching the question
i. Using the percentages we see that the higher someone’s education, the smaller the belief
that astrology is very scientific and the higher the belief that it is not at all scientific.
For example, 79% of those who attended college or higher education responded that
astrology is not at all scientific, whereas the corresponding proportion for those with less
than high school education is 48%.
ii. Set out the null hypothesis that there is no association between education and views on
astrology against the alternative, that there is an association. Be careful to get these the
correct way round!
H0 : No association between education and views on astrology versus
H1 : Association between education and views on astrology.
Work out the expected values to obtain the table below
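The expected counts and the test statistic can be sketched in Python as follows. This is an illustration under stated assumptions: expected counts are taken as row total × column total / grand total, and the 1% critical value of 13.277 for four degrees of freedom is taken from standard chi-squared tables rather than from this commentary. Names are illustrative.

```python
# Observed counts from the question: rows are education levels,
# columns are 'Very', 'Sort of', 'Not at all'.
observed = [
    [23, 84, 98],     # Less than High school
    [50, 286, 574],   # High school
    [16, 124, 538],   # College or higher
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-squared statistic: sum over cells of (O - E)^2 / E.
chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# 1% critical value for 4 degrees of freedom (assumed, from standard tables).
CRITICAL_VALUE_1PCT = 13.277
reject_independence = chi_squared > CRITICAL_VALUE_1PCT

for row in expected:
    print([round(e, 1) for e in row])
print(round(chi_squared, 2), degrees_of_freedom, reject_independence)
```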
Question 3
The level of infant mortality (y) is represented by the number of baby deaths for
every 1000 births. For 12 areas these are shown in the following table. For each
area, the percentage (x) of babies born into families earning at least £25,000 is also
shown.
Area A B C D E F G H I J K L
Percentage (x) 19 5 9 20 11 35 5 18 25 12 20 15
Infant mortality (y) 3 15 14 6 13 3 23 10 9 9 5 10
(a) i. Draw a scatter diagram of these data on the graph paper provided. Label the
diagram carefully.
(4 marks)
ii. Calculate the sample correlation coefficient. Interpret your findings.
(3 marks)
iii. Calculate the least squares line of y on x and draw the line on the scatter
diagram.
(4 marks)
iv. Using the equation you found in iii., obtain the predicted infant mortality for
an area where 34% of babies are born into families earning at least £25,000.
Do you think this value is realistic? Justify your answer.
(2 marks)
Reading for this question
This is a standard regression question and the reading is to be found in Chapter 11. Section
11.6 provides details for scatter diagrams and is suitable for part (i) whereas the remaining
parts focus on correlation and regression and are covered in Sections 11.8 to 11.10 of the
subject guide. Section 11.7 is also relevant. Sample examination question 2 from this
chapter is recommended for practice on questions of this type.
Approaching the question
i. Candidates are reminded that they are asked to draw and label the scatter diagram
which should include a full title (‘Scatter diagram’ alone will not suffice) and labelled
axes, including information about units. Far too many candidates threw away marks by
neglecting these points and consequently were only given one mark out of the possible
four allocated for this part of the question. Another common way of losing marks was
failing to use the graph paper which was provided, and required, in the question.
Candidates who drew on the ordinary paper in their booklet were not awarded marks for
this part of the question.
[Figure: scatter diagram; vertical axis from 0 to 25, horizontal axis from 0 to 35.]
ii. The summary statistics can be substituted into the formula for the correlation coefficient
(make sure you know which one it is!) to obtain the value −0.8026. An interpretation of
this value is the following: The data suggest that the higher the percentage of families
earning at least a certain income, the lower the mortality. The fact that the value is very
close to −1 suggests that this is a strong (negative) association.
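The correlation coefficient can be checked with a short Python sketch. It assumes the usual computational form of the sample correlation, built from the sums of x, y, x², y² and xy; the variable names are illustrative.

```python
import math

# Data from the question for the 12 areas.
x = [19, 5, 9, 20, 11, 35, 5, 18, 25, 12, 20, 15]   # percentage
y = [3, 15, 14, 6, 13, 3, 23, 10, 9, 9, 5, 10]      # infant mortality per 1000 births

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross-products.
s_xy = sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar
s_xx = sum(a * a for a in x) - n * x_bar ** 2
s_yy = sum(b * b for b in y) - n * y_bar ** 2

r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 4))
```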
iii. The regression line can be written by the equation ŷ = a + bx or y = a + bx + ε. The
formula for b is
$b = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}$,
Many candidates did not give the measurement units here. These are essential in
answering such a question and a mark is deducted if they are not specified.
Question 4
(a) i. Carefully construct a box plot on the graph paper provided to display the
following yearly incomes of a group of people, measured in £1000:
3 2 4 8 7 19 2 5 3 4 10 12
(8 marks)
ii. Based on the shape of the box plot you have drawn, describe the distribution
of the data
(2 marks)
iii. Name two other types of graphical displays that would be suitable to
represent the data. Briefly explain your choices.
(3 marks)
Reading for this question
Chapter 3 provides all the relevant material for this question. More specifically, information
on boxplots can be found in Section 3.9.2, but all of Sections 3.8 and 3.9 are highly relevant.
Approaching the question
i. The boxplot diagram the Examiners were hoping to see is shown below. Marks were
awarded for including the title, identifying the box and the whiskers and noting the
outlier, at a reasonable accuracy.
[Figure: box plot titled 'Distribution of Income'; vertical axis 'Income in thousands of pounds', from 0 to 20, with one outlier marked above the upper whisker.]
In order to identify the box, the quartiles are needed. These are 3 and 8.5, giving an
interquartile range of 5.5. The median is also needed, which is 4.5.
Hence the outlier limits are from 0 to 16.75. (−5.25 to 16.75 is also allowed.)
The extreme outlier limits are then from 0 to 25 (−13.5 to 25 is also allowed.)
Hence 19 is an outlier but not an extreme outlier.
Note that you did not need to label the x axis and that the plot can be transposed.
ii. Based on the shape of the boxplot, we can see that the distribution of the data is
positively skewed.
iii. A histogram or stem-and-leaf diagram are other types of suitable graphical displays. The
variable income is measurable and these graphs are suitable for displaying the
distribution of such variables.
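The quantities needed for the box plot in part i. can be checked with a short script. This is a sketch under one assumption: quartiles are computed by linear interpolation between order statistics (the "inclusive" method of Python's statistics module), which happens to reproduce the quartiles 3 and 8.5 and the limits 16.75 and 25 quoted above. Other quartile conventions give slightly different values.

```python
import statistics

# Yearly incomes in thousands of pounds, as given in the question.
incomes = [3, 2, 4, 8, 7, 19, 2, 5, 3, 4, 10, 12]

# Assumed convention: linear interpolation between order statistics.
q1, median, q3 = statistics.quantiles(incomes, n=4, method="inclusive")
iqr = q3 - q1

# Outlier limits: 1.5 IQRs beyond the quartiles.
lower_limit, upper_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr
# Extreme outlier limits: 3 IQRs beyond the quartiles.
extreme_lower, extreme_upper = q1 - 3 * iqr, q3 + 3 * iqr

outliers = [v for v in incomes if v < lower_limit or v > upper_limit]
extreme_outliers = [v for v in incomes if v < extreme_lower or v > extreme_upper]

print("quartiles:", q1, q3, "median:", median, "IQR:", iqr)
print("outlier limits:", lower_limit, upper_limit)
print("extreme outlier limits:", extreme_lower, extreme_upper)
print("outliers:", outliers, "extreme outliers:", extreme_outliers)
```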
(b) A new fitness programme is devised for obese people. Each participant’s weight
in kg was measured before and after the programme to see if the fitness programme is
effective in reducing their weights. The following data were obtained:
Before After
145 143
116 120
120 118
133 130
119 119
133 134
125 128
126 123
140 141
i. Carry out an appropriate hypothesis test to determine whether the fitness
programme is effective for reducing weight. State the test hypotheses, and
specify your test statistic and its distribution under the null hypothesis.
Comment on your findings.
(6 marks)
ii. State any assumptions you made.
(2 marks)
iii. Give an 80% confidence interval for the difference in means.
(2 marks)
iv. On the basis of the data alone, would you recommend the programme to a
friend who wants to lose weight? Explain why or why not.
(2 marks)
Reading for this question
Look up the sections about hypothesis testing for testing differences in means. However, it
is essential for this part of the question to focus on the section of the subject guide
regarding paired samples (Section 7.16.4).
Approaching the question
i. Regarding hypotheses, note that the word ‘effective’ suggests a one-sided test:
H0 : µbefore = µafter , H1 : µbefore > µafter
In this part, it is also essential to realise that we have a paired sample, as we have two
observations for each person (before and after treatment). Hence the difference for each
person should be calculated:
2 −4 2 3 0 −1 −3 3 −1
The next step is to calculate s_d = 2.571 and x̄_d = 0.1111, in order to obtain the value of the
test statistic
t = \frac{\bar{x}_d - 0}{s_d / \sqrt{n}} = 0.1296.
We have the t distribution with eight degrees of freedom, hence the critical value (for a
one-sided test) is 1.860.
Hence, we do not reject H0 at the 5% level. Testing at the 10% level gives a critical value
of t8,0.1 = 1.397. Therefore, we still do not reject H0 . There is no significant evidence
that the fitness programme is effective.
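The paired-sample calculation in part i. can be reproduced as a quick check. This is a minimal sketch: the variable names are illustrative, and the critical value 1.860 is simply the tabulated value quoted above rather than something computed by the script.

```python
import math
import statistics

# Weights in kg for the nine participants, as given in the question.
before = [145, 116, 120, 133, 119, 133, 125, 126, 140]
after = [143, 120, 118, 130, 119, 134, 128, 123, 141]

# Paired sample: work with the within-person differences (before minus after).
d = [b - a for b, a in zip(before, after)]
n = len(d)
d_bar = statistics.mean(d)   # sample mean of the differences
s_d = statistics.stdev(d)    # sample standard deviation (divisor n - 1)

# Test statistic for H0: mean difference = 0, with n - 1 degrees of freedom.
t_stat = (d_bar - 0) / (s_d / math.sqrt(n))

critical_5pct = 1.860        # one-sided 5% point of t with 8 df, from tables
reject_h0 = t_stat > critical_5pct

print("differences:", d)
print(f"mean = {d_bar:.4f}, sd = {s_d:.3f}, t = {t_stat:.4f}")
print("reject H0 at the 5% level:", reject_h0)
```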
ii. • The differences are normally distributed (no marks for normally distributed weights).
• Pairs of observations are independent (a weaker condition which suffices is that the
differences are independent, but this is unlikely if the observations are not).
iii. This is a straightforward exercise for confidence intervals, given the appropriate formula
from the formula sheet (make sure that you can recognise it). With t8,0.1 = 1.397, the
requested confidence interval is 0.1111 ± 1.397 × 2.571/√9, that is (−1.086, 1.308).
iv. The evidence in the data that the programme works is close to negligible as can be seen,
for example, from the 80% confidence interval, so there is no reason to recommend the
programme on the basis of the data alone.
ST104A ZA d0
Statistics 1
A list of formulae and extracts from statistical tables are provided after the final question
on this paper.
Graph paper is provided at the end of this question paper. If used, it must be detached
and fastened securely inside the answer book.
A calculator may be used when answering questions on this paper and it must comply
in all respects with the specification given with your Admission Notice. The make and
type of machine must be clearly stated on the front cover of the answer book.
SECTION A
(b) The table below contains the number of wine bottles sold at two different
supermarkets on the last days from the previous month:
Supermarket A 55 52 102 96 59 55 60
Supermarket B 61 68 63 69 62 71 72 67 62
i. Find the mean and the median number of wine bottles sold for each
supermarket.
ii. Comment on the differences in the mean and median for the two
supermarkets that you found in part (i.). For this data set, which do
you think would give a better description for the number of wine bottles
sold: the mean or the median? Explain briefly.
iii. After making some enquiries, you find out that there was a party thrown
in a house on the street of supermarket A on the days with 102 and 96
wine bottles sold. Without doing any calculations, would you change your
answers about potential differences between the means and medians for the
two supermarkets? Give explanations for any statements that you make.
[8 marks]
(c) Suppose that X is a normally distributed random variable with mean 0 and
variance 1.
i. Find the probability that X + 4 is less than 4.
ii. Find the value of b so that the probability of X − b being less than zero is
0.975
[4 marks]
(d) You are told that a 95% confidence interval for a population proportion is
(0.3775, 0.6225). What was the sample proportion that led to this confidence
interval? Also, what was the size of the sample used? [5 marks]
UL12/0217 Page 2 of 6
D00
i. \sum_{i=1}^{5} x_i    ii. \sum_{i=3}^{5} 3x_i (y_i - 2)    iii. x_5^2 + \sum_{i=2}^{4} (x_i^2 + y_i)
[6 marks]
(f) Suppose there are two boxes; the first one contains three green and one red
balls, whereas the second contains two green and two red balls. First, a box is
chosen at random and then a ball is drawn randomly from that box.
i. What is the probability that the ball drawn is green?
ii. If the ball drawn was green, what is the probability that the first box was
chosen?
[5 marks]
x 0 1 2 3
pX (x) 0.2 0.3 0.1 0.4
i. Find the probability that X is an odd number.
ii. Find E(X), the expected value of X.
[4 marks]
(h) State whether the following are true or false and give a brief explanation. (Note
that no marks will be awarded for a simple true/false answer.)
i. In quota sampling we cannot draw statistical inference.
ii. The Spearman rank correlation coefficient is more useful than Pearson
correlation in data with outliers.
iii. If the constant in the regression equation is negative, the correlation will
also be negative.
iv. If the p-value for a test is larger than the significance level, we reject H0 .
v. In experimental studies one can use quota sampling to select the treatment
and control groups.
[10 marks]
SECTION B
2. (a) A social survey in the UK asked subjects, ‘Do you do your shopping online?’
with the possible answers being ‘Frequently’, ‘Rarely’ and ‘Never’. The table
below cross-classifies their responses with their gender.
Shop online
Gender Frequently Rarely Never Total
Male 52 (26%) 94 (47%) 54 (27%) 200 (100%)
Female 47 (39%) 52 (43%) 21 (18%) 120 (100%)
Total 99 (31%) 146 (46%) 75 (23%) 320 (100%)
i. Based on the data in the table, and without doing a significance test, how
would you describe the relationship between gender and tendency to shop
online?
ii. Calculate the χ2 statistic and use it to test for independence, using a 5%
significance level. What do you conclude?
[13 marks]
(b) i. You have been asked to design a nationwide survey in your country to find
out about internet usage among children less than 10 years old. Provide
a probability sampling scheme and a sampling frame that you would like
to use. Identify a potential source of selection bias that may occur and
discuss how this issue can be addressed.
ii. Describe what a longitudinal survey is. State two ways in which panel
surveys differ from longitudinal surveys.
[12 marks]
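For part (a) ii. of Question 2, the mechanics of the χ2 statistic can be sketched as below. This is an illustrative check rather than a model answer: the variable names are hypothetical, and 5.991 is the tabulated upper 5% point of the χ2 distribution with 2 degrees of freedom.

```python
# Observed counts from the table: rows Male, Female;
# columns Frequently, Rarely, Never.
observed = [
    [52, 94, 54],
    [47, 52, 21],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum of (O - E)^2 / E over all cells, where under independence
# E = row total * column total / grand total.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_5pct = 5.991  # chi-squared, 2 df, 5% significance level, from tables

print(f"chi-squared = {chi2:.3f} on {df} degrees of freedom")
print("reject independence at the 5% level:", chi2 > critical_5pct)
```

In a written answer the expected frequencies should be shown in full, alongside the hypotheses and the conclusion in words.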
3. A car insurance company would like to examine the relationship between driving
experience and insurance premium. For this reason, a random sample of ten drivers
is taken and the years of driving experience (x) as well as the monthly insurance
premium (y, in £) is recorded. The data are shown in the table below.
Driver #1 #2 #3 #4 #5 #6 #7 #8 #9 #10
Driving experience (x) 6 3 11 10 15 6 25 16 15 20
Insurance premium (y) 66 88 51 70 44 56 42 60 45 40
(a) i. Draw a scatter diagram of these data on the graph paper provided. Label
the diagram carefully.
ii. Calculate the sample correlation coefficient. Interpret your findings.
iii. Calculate the least squares line of y on x and draw the line on the scatter
diagram.
iv. Based on the regression equation in part (iii.), what will be the predicted
monthly insurance premium for a driver with 10 years of experience? Will
you trust this value? Justify your answer.
[13 marks]
(b) A company wants to check the quality of its customer service regarding phone
enquiries. For this reason, the manager wants to compare the call waiting
times during the years 2013 and 2012. Unfortunately, extensive records of
the company are not available, and he can only check a random sample of
phone calls within these two years. The available data, measured in minutes
of waiting time, are provided below for each year.
[12 marks]
4. (a) i. Carefully construct a box plot on the graph paper provided to display
the following average daily intakes of calories for 12 athletes, measured in
kcals:
1808 2200 2154 2004 2101 1957 3061 2500 2009 2147 2231 1936
ii. Based on the shape of the box plot you have drawn, describe the
distribution of the data.
iii. Name two other types of graphical displays that would be suitable to
represent the data. Briefly explain your choices.
[13 marks]
(b) A study was made to determine the amount of fuel economy obtained by using a
specific new type of tyre over a standard type. For this reason, 8 cars were fitted
with the new type of tyre and the fuel consumption (in km/l) was measured
after a test-drive. Afterwards, the same cars with the same drivers were fitted
with the standard type of tyre and the experiment was repeated to obtain the
following fuel consumption measurements.
Car #1 #2 #3 #4 #5 #6 #7 #8
Standard type tyres 4.6 6.5 7.4 5.5 5.3 5.2 6.6 6.7
New type tyres 4.1 6.2 7.1 5.4 5.5 5.1 6.1 6.3
i. Carry out an appropriate hypothesis test to determine whether the
fuel consumption is different between the two types of tyre. State the
test hypotheses, and specify your test statistic and its distribution under
the null hypothesis. Comment on your findings.
ii. State any assumptions you made in (i.).
iii. Give a 95% confidence interval for the difference in means.
iv. On the basis of the data alone, would you be concerned about fuel
consumption if you wanted to buy the new type of tyre? Provide an
explanation with your answer.
[12 marks]
END OF PAPER
ST104a Statistics 1
Examination Formula Sheet
Z-test of hypothesis for a single mean (σ known):
Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}
t-test of hypothesis for a single mean (σ unknown):
t = \frac{\bar{X} - \mu}{S / \sqrt{n}}
Z-test of hypothesis for a single proportion:
Z \cong \frac{p - \pi}{\sqrt{\pi(1 - \pi)/n}}
Z-test for the difference between two means (variances known):
Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}
t-test for the difference between two means (variances unknown):
t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2 (1/n_1 + 1/n_2)}}
Confidence interval endpoints for the difference between two means:
(\bar{x}_1 - \bar{x}_2) \pm t_{n_1 + n_2 - 2} \, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
Confidence interval endpoints for the difference in means in paired samples:
\bar{x}_d \pm t_{n-1} \frac{s_d}{\sqrt{n}}
Z-test for the difference between two proportions:
Z = \frac{(P_1 - P_2) - (\pi_1 - \pi_2)}{\sqrt{P(1 - P)(1/n_1 + 1/n_2)}}
a = \bar{y} - b\bar{x}
Dennis V. Lindley, William F. Scott, New Cambridge Statistical Tables, (1995) © Cambridge University Press, reproduced with permission.
ST104A ZA d0
Statistics 1
A list of formulae and extracts from statistical tables are provided after the final question
on this paper.
Graph paper is provided at the end of this question paper. If used, it must be detached
and fastened securely inside the answer book.
A calculator may be used when answering questions on this paper and it must comply
in all respects with the specification given with your Admission Notice. The make and
type of machine must be clearly stated on the front cover of the answer book.
SECTION A
(b) The table below contains the number of customers that visited two different
branches of a bank on the last days of the previous month:
Branch A 75 72 142 120 79 75 81
Branch B 81 88 83 89 82 91 92 87 82
i. Find the mean and the median number of customers for each branch.
ii. Comment on the differences in the mean and median for the two branches
that you found in part (i.). For this data set, which do you think would
give a better description for the number of customers: the mean or the
median? Explain briefly.
iii. After making some enquiries, you find out that the ATM next door to
branch A was not working on the days with 142 and 120 customers.
Without doing any calculations, would you change your answers about
potential differences between the means and between the medians of the
two branches? Give explanations for any statements that you make.
[8 marks]
(c) Suppose that X is a normally distributed random variable with mean 0 and
variance 1.
i. Find the probability that X + 3 is less than 3.
ii. Find the value of b so that the probability of X − b being less than zero is
0.95
[4 marks]
(d) You are told that a 90% confidence interval for a population proportion is
(0.4086, 0.5914). What was the sample proportion that led to this confidence
interval? Also, what was the size of the sample used? [5 marks]
i. \sum_{i=1}^{5} x_i    ii. \sum_{i=3}^{5} 3x_i (y_i - 2)    iii. x_5^2 + \sum_{i=2}^{4} (x_i^2 + y_i)
[6 marks]
(f) Suppose there are two boxes; the first one contains one green and three red
balls, whereas the second contains two green and two red balls. First, a box is
chosen at random and then a ball is drawn randomly from that box.
i. What is the probability that the ball drawn is green?
ii. If the ball drawn was green, what is the probability that the first box was
chosen?
[5 marks]
x 1 2 3 4
pX (x) 0.3 0.3 0.3 0.1
i. Find the probability that X is an even number.
ii. Find E(X), the expected value of X.
[4 marks]
(h) State whether the following are true or false and give a brief explanation. (Note
that no marks will be awarded for a simple true/false answer.)
i. In stratified random sampling the interviewer selects a certain number of
people according to some pre-specified strata.
ii. If two variables have correlation which is almost zero, we can conclude
that they are independent.
iii. If two variables have correlation which is close to one, we can conclude
that the variables are related.
iv. If the χ2 test statistic is larger than the 5% critical value, the p-value is
also larger than 0.05.
v. Cluster sampling can be used to reduce the cost of a survey.
[10 marks]
SECTION B
2. (a) A social survey in the UK asked subjects, ‘Do you buy organic products, despite
the fact they are usually more expensive?’ with the possible answers being
‘Yes’, ‘Sometimes’ and ‘No’. The table below cross-classifies their responses
with their place of residence (‘Rural’ or ‘Urban’ areas).
Buy organic products
Place of residence Yes Sometimes No Total
Rural area 35 (17%) 90 (45%) 75 (38%) 200 (100%)
Urban area 73 (21%) 163 (46%) 114 (33%) 350 (100%)
Total 108 (20%) 253 (46%) 189 (34%) 550 (100%)
i. Based on the data in the table, and without doing a significance test, how
would you describe the relationship between place of residence and buying
organic products?
ii. Calculate the χ2 statistic and use it to test for independence, using a 10%
significance level. What do you conclude?
[13 marks]
(b) i. You have been asked to design a nationwide survey in your country to find
out about internet usage among children less than 10 years old. Provide
a probability sampling scheme and a sampling frame that you would like
to use. Identify a potential source of selection bias that may occur and
discuss how this issue can be addressed.
ii. Describe what a longitudinal survey is. State an advantage and a
disadvantage when using such surveys.
[12 marks]
3. We are interested in studying the association between the price of flour and the
production of wheat in a particular area of the UK. The data shown in the table
below provide figures regarding the production of wheat in tonnes (x) as well as the
price of flour (y), in £ per kg, over the last 10 years.
Year #1 #2 #3 #4 #5 #6 #7 #8 #9 #10
Production of Wheat (x) 30 28 32 25 25 24 22 24 35 40
Price of Flour (y) 25 30 27 40 42 41 50 45 30 25
(a) i. Draw a scatter diagram of these data on the graph paper provided. Label
the diagram carefully.
ii. Calculate the sample correlation coefficient. Interpret your findings.
iii. Calculate the least squares line of y on x and draw the line on the scatter
diagram.
iv. Based on the regression equation in part (iii.), what will be the predicted
price of flour for a year with 45 tonnes production of wheat? Will you
trust this value? Justify your answer.
[13 marks]
(b) A company wants to check the quality of its customer service regarding phone
enquiries. For this reason the manager wants to compare the call waiting
times during the years 2013 and 2012. Unfortunately, extensive records of
the company are not available, and he can only check a random sample of
phone calls within these two years. The available data, measured in minutes
of waiting times, are provided below for each year.
[12 marks]
4. (a) i. Carefully construct a box plot on the graph paper provided to display
the following annual earnings for the salesmen of a company, measured in
£000s:
35 26 22 24 21 57 36 35 29 47 30 36
ii. Based on the shape of the box plot you have drawn, describe the
distribution of the data.
iii. Name two other types of graphical displays that would be suitable to
represent the data. Briefly explain your choices.
[13 marks]
(b) A study was made to determine the amount of fuel economy obtained by using a
specific new type of tyre over a standard type. For this reason, 8 cars were fitted
with the new type of tyre and the fuel consumption (in km/l) was measured
after a test-drive. Afterwards, the same cars with the same drivers were fitted
with the standard type tyres and the experiment was repeated to obtain the
following fuel consumption measurements.
Car #1 #2 #3 #4 #5 #6 #7 #8
Standard type tyres 4.6 6.5 7.4 5.5 5.3 5.2 6.6 6.7
New type tyres 5.1 6.2 7.3 5.4 5.5 5.1 6.1 7.3
i. Carry out an appropriate hypothesis test to determine whether the
fuel consumption is different between the two types of tyre. State the
test hypotheses, and specify your test statistic and its distribution under
the null hypothesis. Comment on your findings.
ii. State any assumptions you made in (i.).
iii. Give a 95% confidence interval for the difference in means.
iv. On the basis of the data alone, would you be concerned about fuel
consumption if you wanted to buy the new type of tyre? Provide an
explanation with your answer.
[12 marks]
END OF PAPER
Important note
This commentary reflects the examination and assessment arrangements for this course in the
academic year 2013–14. The format and structure of the examination may change in future years,
and any such changes will be publicised on the virtual learning environment (VLE).
Unless otherwise stated, all cross-references will be to the latest version of the subject guide (2014).
You should always attempt to use the most recent edition of any Essential reading textbook, even if
the commentary and/or online reading list and/or subject guide refers to an earlier edition. If
different editions of Essential reading are listed, please check the VLE for reading supplements – if
none are available, please use the contents list and index of the new edition to find the relevant
section.
General remarks
Learning outcomes
By the end of this unit and having completed the Essential reading and activities you should:
• be familiar with the key ideas of statistics that are accessible to a candidate with a
moderate mathematical competence
• be able to apply a variety of methods for explaining, summarising and presenting data and
interpreting results clearly using appropriate diagrams, titles and labels when required
• understand the ideas of randomness and variability, and the way in which these link to
probability theory to allow the systematic and logical collection of statistical techniques of
great practical importance in many applied areas
• have a grounding in probability theory and some grasp of the most common statistical
methods
• be able to use inference to test the significance of common measures such as means and
proportions and carry out chi-squared tests of contingency tables
• be able to carry out simple regression and correlation analysis and know when it is
appropriate to do so.
You have two hours to complete this paper, which is in two parts. The first part, Section A, is
compulsory; it covers several subquestions and accounts for 50 per cent of the total marks.
Section B contains three questions, each worth 25 per cent, from which you are asked to choose two.
Remember that each of the Section B questions is likely to cover more than one topic. In 2014, for
example, the first part of Question 2 asked for a chi-squared test and survey design problems
appeared in the second part. The first part of Question 3 was on regression and involved drawing a
diagram, while the second part was a hypothesis test comparing population means using the sample
data provided. Question 4 had a series of questions involving drawing diagrams, such as boxplots,
hypothesis testing, in particular paired t tests, and confidence intervals. This means that it is really
important that you make sure you have a reasonable idea of what topics are covered before you start
work on the paper! We suggest you divide your time as follows during the examination:
• Spend the first 10 minutes annotating the paper. Note the topics covered in each question
and subquestion.
• Allow yourself 45 minutes for Section A. Do not allow yourself to get stuck on any one
question, but do not just give up after two minutes!
• Once you have chosen your two Section B questions, give them about 25 minutes each.
• This leaves you with 15 minutes. Do not leave the examination hall at this point! Check
over any questions you may not have completely finished. Make sure you have labelled and
given a title to any tables or diagrams which were required and, if you did more than the
two questions required in Section B, decide which one to delete. Remember that only two of
your answers will be given credit in Section B and that you must choose which these are!
The Examiners are looking for very simple demonstrations from you. They want to be sure that you:
• have covered the syllabus as described and explained in the subject guide
• know the basic formulae given there and when and how to use them
You are not expected to write long essays where explanations or descriptions of sample design
are required, and note form answers are acceptable. However, clear and accurate language, both
mathematical and written, is expected and marked. The explanations below and in the specific
commentaries for the papers for each zone should make these requirements clear.
The most important thing you can do is answer the question set! This may sound very simple, but
these are some of the things that candidates did not do, though asked, in the 2014 examinations!
Remember:
• If you are asked to label a diagram (which is almost always the case!), please do so. Writing
‘Histogram’ or ‘Stem-and-leaf diagram’ in itself is insufficient. What do the data describe?
What are the units? What are the x and y axes?
• If you are specifically asked to carry out a hypothesis test, or a confidence interval, do so. It
is not acceptable to do one rather than the other! If you are asked to find a 5% critical
value, this is what will be marked.
• Do not waste time calculating things which are not required by the Examiners. If you are
asked to find the line of best fit, you will get no extra marks for calculating the correlation
coefficient as well. If you are asked to use the confidence interval you have just calculated to
comment on the results, carrying out an additional hypothesis test will not help your marks.
How should you use the specific comments on each question given in the
Examiners’ commentaries?
We hope that you find these useful. For each question and subquestion, they give:
• further guidance for each question on the points made in the last section
• the answers, or keys to the answers, which the Examiners were looking for
• the relevant detailed reference to Newbold et al. (seventh edition) and the subject guide
• where appropriate, suggested activities from the subject guide which should help you to
prepare, and similar questions from Newbold et al.
Any further references you might need are given in the part of the subject guide to which you are
referred for each answer.
Question spotting
Many candidates are disappointed to find that their examination performance is poorer
than they expected. This can be due to a number of different reasons and the Examiners’
commentaries suggest ways of addressing common problems and improving your performance.
We want to draw your attention to one particular failing – ‘question spotting’, that is,
confining your examination preparation to a few question topics which have come up in past
papers for the course. This can have very serious consequences.
We recognise that candidates may not cover all topics in the syllabus in the same depth, but
you need to be aware that Examiners are free to set questions on any aspect of the syllabus.
This means that you need to study enough of the syllabus to enable you to answer the required
number of examination questions.
The syllabus can be found in the ‘Course information sheet’ in the section of the VLE dedicated
to this course. You should read the syllabus very carefully and ensure that you cover sufficient
material in preparation for the examination.
Examiners will vary the topics and questions from year to year and may well set questions that
have not appeared in past papers – every topic on the syllabus is a legitimate examination
target. So although past papers can be helpful in revision, you cannot assume that topics or
specific questions that have come up in past examinations will occur again.
If you rely on a question spotting strategy, it is likely you will find yourself in
difficulties when you sit the examination paper. We strongly advise you not to
adopt this strategy.
Candidates should answer THREE of the following FOUR questions: QUESTION 1 of Section
A (50 marks) and TWO questions from Section B (25 marks each). Candidates are strongly
advised to divide their time accordingly.
Section A
Question 1
i. Each country is a category, so the possible values are one for each country. Hence, the
variable is categorical. Note also that countries do not have a natural ordering, so this
represents a nominal categorical variable.
ii. Speed is a variable which can be measured in miles per hour or kilometres per hour to
several decimal places. Hence it is a measurable variable.
iii. The Dow Jones index takes values to several decimal places. It is therefore regarded as a
measurable variable.
iv. The position of Manchester United can be either 1st, 2nd or any other position up to
20th. By definition, these positions (places) are ordered: 1st is the highest place and
20th is the lowest. Hence it is an ordinal categorical variable.
Weak candidates did not provide justifications for their choices, reported nominal or
categorical or measurable variables and sometimes answered ordinal when their justification
was pointing to a nominal variable. There were also phrases like ‘It is measurable because it
can be measured’ that were not awarded any marks.
(b) The table below contains the number of wine bottles sold at two different
supermarkets on the last days from the previous month:
Supermarket A 55 52 102 96 59 55 60
Supermarket B 61 68 63 69 62 71 72 67 62
i. Find the mean and the median number of wine bottles sold for each
supermarket.
ii. Comment on the differences in the mean and median for the two
supermarkets that you found in part (i.). For this data set, which do you
think would give a better description for the number of wine bottles sold:
the mean or the median? Explain briefly.
iii. After making some enquiries, you find out that there was a party thrown in a
house on the street of supermarket A on the days with 102 and 96 wine
bottles sold. Without doing any calculations, would you change your answers
about potential differences between the means and medians for the two
supermarkets? Give explanations for any statements that you make.
(8 marks)
i. In order to calculate the two means, you should sum the numbers corresponding to each
supermarket and then divide them by the number of observations in each row. Doing so
yields:
(55 + 52 + · · · + 60)/7 = 68.4, for supermarket A
and:
(61 + 68 + · · · + 62)/9 = 66.1, for supermarket B.
For the median, if we put the numbers in ascending order we get:
52 55 55 59 60 96 102, for supermarket A
and:
61 62 62 63 67 68 69 71 72, for supermarket B.
The median for supermarket A is given by taking the 4th number in the first of the rows
above, which is 59. The median for supermarket B is obtained from the 5th number in
the 2nd row above, which is 67.
One mark for each of the four numbers above was awarded.
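The four values can be checked with a few lines of Python. This is only a sketch using the standard library `statistics` module; the variable names are illustrative and are not part of the commentary.

```python
from statistics import mean, median

# Daily numbers of wine bottles sold, as given in the question.
supermarket_a = [55, 52, 102, 96, 59, 55, 60]
supermarket_b = [61, 68, 63, 69, 62, 71, 72, 67, 62]

# Mean rounded to one decimal place, and the median, for each supermarket.
summary = {
    "A": (round(mean(supermarket_a), 1), median(supermarket_a)),
    "B": (round(mean(supermarket_b), 1), median(supermarket_b)),
}

for name, (avg, med) in summary.items():
    print(f"Supermarket {name}: mean = {avg}, median = {med}")
```

Because both rows contain an odd number of observations, the median is simply the middle value of the sorted row, which is what `median` returns here.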
ii. It is first important to note that the mean of supermarket A is higher than that of
supermarket B. However, this does not necessarily indicate that the centre of the
distribution of supermarket A is larger than that of supermarket B. Supermarket A
contains two outliers (102 and 96) which result in a high mean. Apart from these
outliers, the numbers of wine bottles sold tend to be higher in supermarket B, so the
median gives a somewhat better indication of the ‘typical’ number of wine bottles sold
for each supermarket.
iii. After taking out these days we can argue that both the mean and median for
supermarket A would be smaller than those for supermarket B on a day when there are
no house parties nearby. This is because all the numbers of wine bottles sold in
supermarket A (on a day when there are no house parties nearby) are smaller than or equal
to any number of wine bottles sold in supermarket B.
(c) Suppose that X is a normally distributed random variable with mean 0 and
variance 1.
i. Find the probability that X + 4 is less than 4.
ii. Find the value of b so that the probability of X − b being less than zero is
0.975.
(4 marks)
i. We get:
\[ P(X + 4 < 4) = P\left(\frac{X + 4 - 4}{1} < \frac{4 - 4}{1}\right) = P(X < 0). \]
Due to symmetry we get:
P (X < 0) = 0.5.
In fact the above probability can be found directly using the symmetry property of the
normal distribution. Such direct answers, giving the correct value above and stating the
symmetry, were also accepted.
ii. We can write:
P (X − b < 0) = P (X < b) = 0.975,
so, from the standard normal table, b = 1.96.
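Both parts of (c) can be verified numerically. The sketch below assumes Python's standard library `statistics.NormalDist`; in the examination the same values come from the statistical tables.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Part i: P(X + 4 < 4) = P(X < 0), which is 0.5 by symmetry.
p_part_i = std_normal.cdf(0)

# Part ii: P(X - b < 0) = P(X < b) = 0.975, so b is the 97.5th percentile.
b = std_normal.inv_cdf(0.975)

print(p_part_i, round(b, 2))
```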
(d) You are told that a 95% confidence interval for a population proportion is
(0.3775, 0.6225). What was the sample proportion that led to this confidence
interval? Also, what was the size of the sample used?
(5 marks)
\[ 1.96 \times \frac{0.5}{\sqrt{n}} = 0.1225. \]
• Remember to round up the solution to the equation above. The correct sample size is
n = 64.
Some candidates forgot to round up. Remember that we are asked about a sample size.
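The calculation can be sketched in Python as follows. The sample proportion is taken as the midpoint of the interval and the half-width is set equal to 1.96 standard errors; the variable names are illustrative, and the small rounding step before `ceil` is an assumption added only to remove floating-point noise.

```python
from math import ceil

lower, upper = 0.3775, 0.6225   # 95% confidence interval from the question
z = 1.96                        # 97.5th percentile of N(0, 1), from tables

p_hat = (lower + upper) / 2        # sample proportion = midpoint of the interval
half_width = (upper - lower) / 2   # equals z * sqrt(p_hat * (1 - p_hat) / n)

# Solve z * sqrt(p_hat * (1 - p_hat) / n) = half_width for n.
n_exact = p_hat * (1 - p_hat) * (z / half_width) ** 2

# Round away floating-point noise, then round up to a whole sample size.
sample_size = ceil(round(n_exact, 6))

print(p_hat, sample_size)
```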
i. $\sum_{i=1}^{5} x_i$   ii. $\sum_{i=3}^{5} 3x_i(y_i - 2)$   iii. $x_5^2 + \sum_{i=2}^{4} (x_i^2 + y_i)$
(6 marks)
(f ) Suppose there are two boxes; the first one contains three green and one red
balls, whereas the second contains two green and two red balls. First, a box is
chosen at random and then a ball is drawn randomly from that box.
i. What is the probability that the ball drawn is green?
ii. If the ball drawn was green, what is the probability that the first box was
chosen?
(5 marks)
\[ P(G) = \frac{3}{4} \times \frac{1}{2} + \frac{1}{2} \times \frac{1}{2} = \frac{5}{8}. \]
Note: Some candidates reported the number 0.625 instead of 5/8. This is acceptable as
long as three decimal places are used.
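As a check on both parts of (f), the total probability formula and Bayes' formula can be evaluated with exact fractions. This is a sketch only; the variable names are illustrative.

```python
from fractions import Fraction

p_box = Fraction(1, 2)                # each box is equally likely to be chosen
p_green_given_box1 = Fraction(3, 4)   # three green, one red
p_green_given_box2 = Fraction(2, 4)   # two green, two red

# Part i: total probability of drawing a green ball.
p_green = p_box * p_green_given_box1 + p_box * p_green_given_box2

# Part ii: Bayes' formula for P(box 1 | green).
p_box1_given_green = p_box * p_green_given_box1 / p_green

print(p_green, p_box1_given_green)
```

Working with `Fraction` keeps the answers in the exact form 5/8 and 3/5 rather than as rounded decimals.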
x 0 1 2 3
pX (x) 0.2 0.3 0.1 0.4
i. Find the probability that X is an odd number.
ii. Find E(X), the expected value of X.
(4 marks)
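Both quantities follow directly from the probability table above. A minimal sketch, with the table stored in a dictionary whose name is illustrative:

```python
# Probability function of X, copied from the table in the question.
pmf = {0: 0.2, 1: 0.3, 2: 0.1, 3: 0.4}

# Part i: P(X is odd) = p(1) + p(3).
p_odd = sum(p for x, p in pmf.items() if x % 2 == 1)

# Part ii: E(X) = sum of x * p(x) over all possible values of x.
expected_value = sum(x * p for x, p in pmf.items())

print(round(p_odd, 2), round(expected_value, 2))
```

The results, 0.7 and 1.7, agree with the hand calculation 0.3 + 0.4 and 0 × 0.2 + 1 × 0.3 + 2 × 0.1 + 3 × 0.4.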
(h) State whether the following are true or false and give a brief explanation. (Note
that no marks will be awarded for a simple true/false answer.)
i. In quota sampling we cannot draw statistical inference.
ii. The Spearman rank correlation coefficient is more useful than Pearson
correlation in data with outliers.
iii. If the constant in the regression equation is negative, the correlation will also
be negative.
iv. If the p-value for a test is larger than the significance level, we reject H0 .
v. In experimental studies one can use quota sampling to select the treatment
and control groups.
(10 marks)
Section B
Question 2
(a) A social survey in the UK asked subjects, ‘Do you do your shopping online?’
with the possible answers being ‘Frequently’, ‘Rarely’ and ‘Never’. The table
below cross-classifies their responses with their gender.
Shop online
Gender Frequently Rarely Never Total
Male 52 (26%) 94 (47%) 54 (27%) 200 (100%)
Female 47 (39%) 52 (43%) 21 (18%) 120 (100%)
Total 99 (31%) 146 (46%) 75 (23%) 320 (100%)
i. Based on the data in the table, and without doing a significance test, how
would you describe the relationship between gender and tendency to shop
online?
ii. Calculate the χ2 statistic and use it to test for independence, using a 5%
significance level. What do you conclude?
(13 marks)
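The χ2 statistic for part (ii.) can be sketched in plain Python. Expected frequencies are computed as (row total × column total)/grand total; the 5% critical value of 5.991 for 2 degrees of freedom is taken from the statistical tables, and all variable names are illustrative rather than part of the commentary.

```python
# Observed frequencies: rows are Male, Female; columns are Frequently, Rarely, Never.
observed = [
    [52, 94, 54],
    [47, 52, 21],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum of (O - E)^2 / E over all cells, with E = row total * column total / grand total.
chi2 = 0.0
for row, row_total in zip(observed, row_totals):
    for obs, col_total in zip(row, col_totals):
        expected = row_total * col_total / grand_total
        chi2 += (obs - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(col_totals) - 1)
critical_5pct = 5.991   # chi-squared table value for 2 degrees of freedom

reject_independence = chi2 > critical_5pct
print(round(chi2, 3), degrees_of_freedom, reject_independence)
```

Under these assumptions the statistic is roughly 7.31, which exceeds 5.991, so independence would be rejected at the 5% significance level.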
(b) i. You have been asked to design a nationwide survey in your country to find
out about internet usage among children less than 10 years old. Provide a
probability sampling scheme and a sampling frame that you would like to
use. Identify a potential source of selection bias that may occur and discuss
how this issue can be addressed.
ii. Describe what a longitudinal survey is. State two ways in which panel
surveys differ from longitudinal surveys.
(12 marks)
Way to address it – 1 mark: reset the target population group to match what the sampling
frame is actually providing.
Part (ii.) was on longitudinal studies and involved more direct questions. One mark was
given for a description of a longitudinal study; i.e. a longitudinal survey is a survey where
the same individuals are resurveyed over time. Another mark was given for a relevant
example. In terms of ways panel surveys are different from longitudinal surveys, two marks
were given for each statement in the list below (for a maximum of four marks):
• they are more likely to be chosen by quota rather than random methods
• individuals are interviewed every 2 to 4 weeks (rather than every few years)
• individuals are unlikely to be panel members for longer than two years at a time.
Question 3
(a) A car insurance company would like to examine the relationship between driving
experience and insurance premium. For this reason, a random sample of ten
drivers is taken and the years of driving experience (x) as well as the monthly
insurance premium (y, in £) is recorded. The data are shown in the table below.
Driver #1 #2 #3 #4 #5 #6 #7 #8 #9 #10
Driving experience (x) 6 3 11 10 15 6 25 16 15 20
Insurance premium (y) 66 88 51 70 44 56 42 60 45 40
The summary statistics for these data are:
Sum of x data: 127 Sum of the squares of x data: 2033
Sum of y data: 562 Sum of the squares of y data: 33662
Sum of the products of x and y data: 6402
i. Draw a scatter diagram of these data on the graph paper provided. Label the
diagram carefully.
ii. Calculate the sample correlation coefficient. Interpret your findings.
iii. Calculate the least squares line of y on x and draw the line on the scatter
diagram.
iv. Based on the regression equation in part (iii.), what will be the predicted
monthly insurance premium for a driver with 10 years of experience? Will
you trust this value? Justify your answer.
(13 marks)
[Scatter diagram. Vertical axis: y, insurance premium in pounds (ticks from 40 to 80). Horizontal axis ticks from 5 to 25.]
ii. The summary statistics can be substituted into the formula for the correlation coefficient
(make sure you know which one it is!) to obtain the value −0.7872. An interpretation of
this value is the following: the data suggest that the higher the driving experience of a
certain driver, the lower the insurance premium. The fact that the value is very close to
−1, suggests that this is a strong (negative) linear association.
iii. The regression line can be written by the equation ŷ = a + bx or y = a + bx + ε. The
formula for b is:
\[ b = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} \]
and by substituting the summary statistics we get b = −1.7505.
The formula for a is a = ȳ − bx̄, so we get a = 78.4318.
Hence the regression line can be written as ŷ = 78.4318 − 1.7505x or
y = 78.4318 − 1.7505x + ε. It should also be plotted in the scatter diagram.
iv. The prediction will be ŷ = 78.4318 − 1.7505 × 10 = £60.93. Yes, we would trust this
value, since this point is inside the observed range of x, and therefore the prediction is
based on interpolation.
Many candidates did not give the measurement units here. These are essential in
answering such questions and a mark is deducted if they are not specified. It is also
important to provide the answer to at least two decimal places.
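The values quoted in parts (ii.) to (iv.) can be reproduced from the summary statistics alone. The sketch below uses the corrected sums of squares and products; the names `s_xx`, `s_yy` and `s_xy` are illustrative shorthand and do not appear in the commentary.

```python
from math import sqrt

n = 10
sum_x, sum_y = 127, 562
sum_x2, sum_y2, sum_xy = 2033, 33662, 6402

x_bar, y_bar = sum_x / n, sum_y / n

# Corrected sums of squares and products.
s_xy = sum_xy - n * x_bar * y_bar
s_xx = sum_x2 - n * x_bar ** 2
s_yy = sum_y2 - n * y_bar ** 2

r = s_xy / sqrt(s_xx * s_yy)   # sample correlation coefficient
b = s_xy / s_xx                # slope of the least squares line
a = y_bar - b * x_bar          # intercept of the least squares line

# Predicted monthly premium (in pounds) for 10 years of driving experience.
prediction = a + b * 10

print(round(r, 4), round(b, 4), round(a, 4), round(prediction, 2))
```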
(b) A company wants to check the quality of its customer service regarding phone
enquiries. For this reason, the manager wants to compare the call waiting times
during the years 2013 and 2012. Unfortunately, extensive records of the
company are not available, and he can only check a random sample of phone
calls within these two years. The available data, measured in minutes of waiting
time, are provided below for each year.
The test statistic value is 2.354 (2.394 if the pooled variance is used). The critical value
at the 5% significance level, assuming a normal approximation as the number of
observations is large, is ±1.96. Hence, we reject the null hypothesis suggesting evidence
for a difference between the two years. If we take a (smaller) significance level of 1%, the
critical value is ±2.576, so we do not reject H0 . We conclude that there is some but not
strong (i.e. moderate) evidence of a difference between the two years.
ii. The assumptions for (ii.) were:
• Assumption about whether $\sigma_1^2 = \sigma_2^2$.
• Assumption about whether n1 + n2 − 2 is ‘large’, hence t vs. z.
• Assumption about independent samples.
iii. This case corresponds to a one-sided test, therefore the hypotheses would be H0 : µ1 = µ2
vs. H1 : µ1 > µ2 . The test statistic value is the same for this case but the critical values
are now 1.645 for the 5% significance level and ≈ 2.33 for the 1% significance level. As
we now reject H0 at both levels we conclude that there is strong evidence (i.e. the result
is highly significant) that the mean waiting time in 2013 was greater than in 2012.
Question 4
(a) i. Carefully construct a box plot on the graph paper provided to display the
following average daily intakes of calories for 12 athletes, measured in kcals:
1808 2200 2154 2004 2101 1957 3061 2500 2009 2147 2231 1936
ii. Based on the shape of the box plot you have drawn, describe the distribution
of the data.
iii. Name two other types of graphical displays that would be suitable to
represent the data. Briefly explain your choices.
(13 marks)
i. The boxplot diagram the Examiners were expecting to see is shown below. Marks were
awarded for including the title, identifying the box and the whiskers and noting outliers,
at a reasonable accuracy.
In order to identify the box, the quartiles are needed which are 1992.25 (anything
between 1957 and 2009 is acceptable), 2124.00, 2207.75 (anything between 2200 and 2231
is also acceptable as long as it is consistent with Q1 ), hence giving an interquartile range
of 215.5 (or anything else consistent with the values of Q1 and Q3 ).
Hence the outlier limits are from 1669 to 2531.
The value of 3061 is therefore an outlier.
Note that no label of the x-axis is necessary and that the plot can be transposed.
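The quartiles and outlier limits quoted above can be reproduced as follows. This is a sketch that assumes the 'inclusive' interpolation rule of `statistics.quantiles`, which happens to match the values 1992.25 and 2207.75; other quartile conventions give slightly different but equally acceptable values, as the commentary notes.

```python
from statistics import quantiles

# Average daily calorie intakes (kcal) for the 12 athletes.
intakes = [1808, 2200, 2154, 2004, 2101, 1957,
           3061, 2500, 2009, 2147, 2231, 1936]

# Quartiles by linear interpolation between order statistics.
q1, q2, q3 = quantiles(intakes, n=4, method="inclusive")
iqr = q3 - q1

# Outlier limits: 1.5 interquartile ranges beyond the quartiles.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

outliers = [value for value in intakes
            if value < lower_limit or value > upper_limit]

print(q1, q2, q3, iqr, lower_limit, upper_limit, outliers)
```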
ii. Based on the shape of the boxplot above, we can see that the distribution of the data is
positively skewed.
iii. A histogram, stem-and-leaf diagram or a dot plot are other types of suitable graphical
displays. The reason is that the variable calorie intake is measurable and these graphs are
suitable for displaying the distribution of such variables.
(b) A study was made to determine the amount of fuel economy obtained by
using a specific new type of tyre over a standard type. For this reason, 8 cars
were fitted with the new type of tyre and the fuel consumption (in km/l) was
measured after a test-drive. Afterwards, the same cars with the same drivers
were fitted with the standard type of tyre and the experiment was repeated
to obtain the following fuel consumption measurements.
Car #1 #2 #3 #4 #5 #6 #7 #8
Standard type tyres 4.6 6.5 7.4 5.5 5.3 5.2 6.6 6.7
New type tyres 4.1 6.2 7.1 5.4 5.5 5.1 6.1 6.3
i. Carry out an appropriate hypothesis test to determine whether the fuel
consumption is different between the two types of tyre. State the test
hypotheses, and specify your test statistic and its distribution under the
null hypothesis. Comment on your findings.
ii. State any assumptions you made in (i.).
iii. Give a 95% confidence interval for the difference in means.
iv. On the basis of the data alone, would you be concerned about fuel
consumption if you wanted to buy the new type of tyre? Provide an
explanation with your answer.
(12 marks)
(b) Reading for this question
Look up the sections about hypothesis testing for testing differences in means in Chapter 8.
However, it is essential for this part to focus on the section regarding paired samples
(Section 8.16.4).
The next step is to calculate s_d = 0.23905 and x̄_d = 0.25, in order to obtain the test
statistic value (x̄_d − 0)/(s_d /√n) = 2.958.
We have a t distribution with 7 degrees of freedom, hence the critical value (for a
two-sided test at the 5% significance level) is 2.365.
Hence, we reject H0 at the 5% significance level. Testing at the 1% significance level
gives a critical value of 3.499, so we do not reject H0 at that level. We conclude that
there is moderate evidence of a difference between the two types of tyre.
ii. Assumptions are:
• Differences normally distributed [no marks for normally distributed fuel
consumption].
• Pairs of observations are independent [a weaker condition which suffices is that the
differences are independent, but this is unlikely if observations are not].
iii. This is a straightforward exercise for confidence intervals using the appropriate formula
from the formula sheet (make sure to be able to recognise it). The requested confidence
interval is (0.0501, 0.4499).
iv. There is some, but not strong, evidence in the data that the new type of tyre results in a
lower fuel consumption. This can be seen, for example, from the 95% confidence interval
whose endpoints are both positive.
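The paired-sample calculations in parts (i.) and (iii.) can be checked with the sketch below. Differences are taken as standard minus new, and the critical value 2.365 (t distribution, 7 degrees of freedom, two-sided 5%) is read from the statistical tables; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

standard = [4.6, 6.5, 7.4, 5.5, 5.3, 5.2, 6.6, 6.7]
new_type = [4.1, 6.2, 7.1, 5.4, 5.5, 5.1, 6.1, 6.3]

# Paired differences (standard minus new) for the same car and driver.
differences = [s - t for s, t in zip(standard, new_type)]
n = len(differences)

d_bar = mean(differences)        # mean difference
s_d = stdev(differences)         # sample standard deviation of the differences
standard_error = s_d / sqrt(n)

t_statistic = (d_bar - 0) / standard_error

t_critical = 2.365               # t table: 7 degrees of freedom, two-sided 5%
confidence_interval = (d_bar - t_critical * standard_error,
                       d_bar + t_critical * standard_error)

print(round(d_bar, 2), round(s_d, 5), round(t_statistic, 3),
      tuple(round(limit, 4) for limit in confidence_interval))
```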
Candidates should answer THREE of the following FOUR questions: QUESTION 1 of Section
A (50 marks) and TWO questions from Section B (25 marks each). Candidates are strongly
advised to divide their time accordingly.
Section A
Question 1
(b) The table below contains the number of customers that visited two different
branches of a bank on the last days of the previous month:
Branch A 75 72 142 120 79 75 81
Branch B 81 88 83 89 82 91 92 87 82
i. Find the mean and the median number of customers for each branch.
ii. Comment on the differences in the mean and median for the two branches
that you found in part (i.). For this data set, which do you think would give
a better description for the number of customers: the mean or the median?
Explain briefly.
iii. After making some enquiries, you find out that the ATM next door to
branch A was not working on the days with 142 and 120 customers. Without
doing any calculations, would you change your answers about potential
differences between the means and between the medians of the two
branches? Give explanations for any statements that you make.
(8 marks)
i. The two means are:
(75 + 72 + · · · + 81)/7 = 92.0, for branch A
and:
(81 + 88 + · · · + 82)/9 = 86.1, for branch B.
For the median, if we put the numbers in ascending order we get:
72 75 75 79 81 120 142, for branch A
and:
81 82 82 83 87 88 89 91 92, for branch B.
The median for branch A is given by taking the 4th number in the first of the rows
above, which is 79. The median for branch B is obtained from the 5th number in the 2nd
row above, which is 87.
One mark for each of the four numbers above was awarded.
ii. It is first important to note that the mean of branch A is higher than that of branch B.
However, this does not necessarily indicate that the centre of the distribution of branch
A is larger than that of branch B. Branch A contains two outliers (120 and 142) which
result in a high mean. Apart from these outliers, the numbers of customers tend to be
higher in branch B, so the median gives a somewhat better indication of the ‘typical’
number of customers for each branch.
iii. After taking out these days we can argue that both the mean and median for branch A
would be smaller than that for branch B on a day that the ATM next door to branch A
is working. This is because all the numbers of customers in branch A (on a day when the
ATM next door to branch A is working) are smaller than or equal to any number of customers
in branch B.
(c) Suppose that X is a normally distributed random variable with mean 0 and
variance 1.
i. Find the probability that X + 3 is less than 3.
ii. Find the value of b so that the probability of X − b being less than zero is
0.95.
(4 marks)
In fact the above probability can be found directly using the symmetry property of the
normal distribution. Such direct answers, giving the correct value above and stating the
symmetry, were also accepted.
(d) You are told that a 90% confidence interval for a population proportion is
(0.4086, 0.5914). What was the sample proportion that led to this confidence
interval? Also, what was the size of the sample used?
(5 marks)
i. $\sum_{i=1}^{5} x_i$   ii. $\sum_{i=3}^{5} 3x_i(y_i - 2)$   iii. $x_5^2 + \sum_{i=2}^{4} (x_i^2 + y_i)$
(6 marks)
i. $\sum_{i=1}^{5} x_i = 3 + (-2) + 1 + 0 - 2 = 0$.
ii. $\sum_{i=3}^{5} 3x_i(y_i - 2) = 3\sum_{i=3}^{5} x_i(y_i - 2) = 3(1 \times (-1 - 2) + 0 \times (2 - 2) + (-2) \times (0 - 2)) = 3 \times 1 = 3$.
iii. $x_5^2 + \sum_{i=2}^{4} (x_i^2 + y_i) = (-2)^2 + ((-2)^2 + 2) + (1^2 - 1) + (0^2 + 2) = 4 + 6 + 0 + 2 = 12$.
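The three sums can be checked in a few lines of Python. The x and y values below are inferred from the terms written out in the worked answers, since the data table itself is not reproduced here; y1 does not appear in any of the sums, so it is left as a placeholder.

```python
# Values inferred from the worked answers; index 0 holds the i = 1 term.
x = [3, -2, 1, 0, -2]
y = [None, 2, -1, 2, 0]   # y_1 is not recoverable and is not needed below

# i. Sum of x_i for i = 1, ..., 5.
sum_i = sum(x)

# ii. Sum of 3 * x_i * (y_i - 2) for i = 3, 4, 5.
sum_ii = sum(3 * x[i - 1] * (y[i - 1] - 2) for i in range(3, 6))

# iii. x_5 squared plus the sum of (x_i^2 + y_i) for i = 2, 3, 4.
sum_iii = x[4] ** 2 + sum(x[i - 1] ** 2 + y[i - 1] for i in range(2, 5))

print(sum_i, sum_ii, sum_iii)
```

Note how the summation limits translate into `range(3, 6)` and `range(2, 5)`: the upper limit of a Python range is exclusive, which is a common source of off-by-one slips that mirrors the candidates' errors with summation limits.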
(f ) Suppose there are two boxes; the first one contains one green and three red
balls, whereas the second contains two green and two red balls. First, a box is
chosen at random and then a ball is drawn randomly from that box.
i. What is the probability that the ball drawn is green?
ii. If the ball drawn was green, what is the probability that the first box was
chosen?
(5 marks)
\[ P(G) = \frac{1}{4} \times \frac{1}{2} + \frac{1}{2} \times \frac{1}{2} = \frac{3}{8}. \]
Note: Some candidates reported the number 0.375 instead of 3/8. This is acceptable as
long as three decimal places are used.
ii. This part can be found by using Bayes’ formula:
P (first box | G) = P (G | first box) × P (first box)/P (G) = (1/4 × 1/2)/(3/8) = 1/3.
(h) State whether the following are true or false and give a brief explanation. (Note
that no marks will be awarded for a simple true/false answer.)
i. In stratified random sampling the interviewer selects a certain number of
people according to some pre-specified strata.
ii. If two variables have correlation which is almost zero, we can conclude that
they are independent.
iii. If two variables have correlation which is close to one, we can conclude that
the variables are related.
iv. If the χ2 test statistic is larger than the 5% critical value, the p-value is also
larger than 0.05.
v. Cluster sampling can be used to reduce the cost of a survey.
(10 marks)
i. False, because in stratified sampling people would be chosen randomly, not by the
interviewer.
ii. False, because the two variables may have a non-linear relationship.
iii. True, because this means they would have a strong linear relationship.
iv. False, because if the χ2 value was above the 5% critical value, the test is significant
which means the p-value would be smaller than 0.05.
v. True, because cluster sampling can be cheaper than other forms of random sampling.
Section B
Question 2
(a) A social survey in the UK asked subjects, ‘Do you buy organic products,
despite the fact they are usually more expensive?’ with the possible answers
being ‘Yes’, ‘Sometimes’ and ‘No’. The table below cross-classifies their
responses with their place of residence (‘Rural’ or ‘Urban’ areas).
                        Buy organic products
Place of residence   Yes          Sometimes    No           Total
Rural area           35 (17%)     90 (45%)     75 (38%)     200 (100%)
Urban area           73 (21%)     163 (46%)    114 (33%)    350 (100%)
Total                108 (20%)    253 (46%)    189 (34%)    550 (100%)
i. Based on the data in the table, and without doing a significance test, how
would you describe the relationship between place of residence and buying
organic products?
ii. Calculate the χ² statistic and use it to test for independence, using a 10%
significance level. What do you conclude?
(13 marks)
Many candidates looked up the tables incorrectly and so failed to follow through their
earlier accurate work. A larger number did not expand on their results sufficiently.
Saying ‘we do not reject at the 5% significance level, but do reject at the 10% significance
level’ is insufficient. What does this mean? Is there an association or not? If there is one,
how strong is it? This needed to be answered if the full nine marks allocated for this
question were to be given. Many candidates lost marks by missing out follow-up like this.
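For candidates who want to check their arithmetic in Question 2(a)(ii.), the following is a minimal sketch of the χ² calculation for the organic products table. The function name `chi_squared` is illustrative, and the 10% critical value for 2 degrees of freedom (4.605) is taken from standard statistical tables rather than from this commentary.

```python
def chi_squared(observed):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected count under independence: row total * column total / n.
            e = row_totals[i] * col_totals[j] / grand_total
            statistic += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


# Rows: rural, urban. Columns: yes, sometimes, no.
observed = [[35, 90, 75], [73, 163, 114]]
statistic, df = chi_squared(observed)

critical_10pct = 4.605  # assumed value from chi-squared tables, 2 degrees of freedom
reject_h0 = statistic > critical_10pct
print(round(statistic, 3), df, reject_h0)
```

Whatever the numerical outcome, the comments above still apply: the decision must be followed by a statement about whether there is an association and, if so, how strong it is.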
(b) i. You have been asked to design a nationwide survey in your country to find
out about internet usage among children less than 10 years old. Provide a
probability sampling scheme and a sampling frame that you would like to
use. Identify a potential source of selection bias that may occur and discuss
how this issue can be addressed.
ii. Describe what is a longitudinal survey. State an advantage and a
disadvantage when using such surveys.
(12 marks)
Part (ii.) was on longitudinal studies and involved more direct questions. One mark was
given for a description of a longitudinal study; i.e. a longitudinal survey is a survey where
the same individuals are resurveyed over time. Another mark was given for a relevant
example. In terms of advantages of longitudinal surveys, two marks were given for any in
the list below or any other sensible argument:
• Being able to see how individuals change over time. For example, the kinds of people
who change the products they buy in response to price changes or advertising campaigns.
• Being able to see the characteristics of those who do not change with respect to an
attribute. For example, seeing the characteristics of those who are loyal to a brand.
(Note that a cross-sectional survey might show no overall change, but the individuals’
positions might have reversed.)
Marks were also given for disadvantages; see below for some examples.
• The response rate might tail off over time (drop out).
• To improve the response rate, the researcher may have an effect on the respondents
(conditioning).
Question 3
(a) We are interested in studying the association between the price of flour and the
production of wheat in a particular area of the UK. The data shown in the table
below provide figures regarding the production of wheat in tonnes (x) as well as
the price of flour (y), in £ per kg, over the last 10 years.
Year #1 #2 #3 #4 #5 #6 #7 #8 #9 #10
Production of Wheat (x) 30 28 32 25 25 24 22 24 35 40
Price of Flour (y) 25 30 27 40 42 41 50 45 30 25
The summary statistics for these data are:
Sum of x data: 285 Sum of the squares of x data: 8419
Sum of y data: 355 Sum of the squares of y data: 13349
Sum of the products of x and y data: 9718
i. Draw a scatter diagram of these data on the graph paper provided. Label the
diagram carefully.
ii. Calculate the sample correlation coefficient. Interpret your findings.
iii. Calculate the least squares line of y on x and draw the line on the scatter
diagram.
iv. Based on the regression equation in part (iii.), what will be the predicted
price of flour for a year with 45 tonnes production of wheat? Will you trust
this value? Justify your answer.
(13 marks)
[Scatter diagram: flour price, y (£ per kg), plotted against production of wheat, x (tonnes).]
ii. The summary statistics can be substituted into the formula for the correlation coefficient
(make sure you know which one it is!) to obtain the value −0.8492. An interpretation of
this value is the following: the data suggest that the higher the production of wheat, the
lower the price. The fact that the value is very close to −1, suggests that this is a strong
(negative) linear association.
iii. The regression line can be written as ŷ = a + bx or y = a + bx + ε. The
formula for b is:
$$b = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}$$
and by substituting the summary statistics we get b = −1.3474.
The formula for a is a = ȳ − bx̄, so we get a = 73.9005.
Hence the regression line can be written as ŷ = 73.9005 − 1.3474x or
y = 73.9005 − 1.3474x + ε. It should also be plotted on the scatter diagram.
iv. The prediction will be ŷ = 73.9005 − 1.3474 × 45 = 13.268, i.e. £13.268 per kg. However,
since this point is outside the observed range of x (22 to 40 tonnes), this prediction should
not be trusted too much as it is based on extrapolation.
Many candidates did not give the measurement units here. These are essential in
answering such questions and a mark is deducted if they are not specified. It is also
important to provide the answer to at least two decimal places.
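The values quoted in parts (ii.) to (iv.) can be reproduced from the summary statistics alone. The sketch below uses illustrative variable names; `s_xy`, `s_xx` and `s_yy` denote the corrected sums of products and squares.

```python
from math import sqrt

# Summary statistics given in the question.
n = 10
sum_x, sum_x2 = 285, 8419
sum_y, sum_y2 = 355, 13349
sum_xy = 9718

x_bar, y_bar = sum_x / n, sum_y / n
s_xy = sum_xy - n * x_bar * y_bar   # corrected sum of products
s_xx = sum_x2 - n * x_bar ** 2      # corrected sum of squares of x
s_yy = sum_y2 - n * y_bar ** 2      # corrected sum of squares of y

r = s_xy / sqrt(s_xx * s_yy)        # sample correlation coefficient
b = s_xy / s_xx                     # least squares slope
a = y_bar - b * x_bar               # least squares intercept
prediction = a + b * 45             # predicted price at 45 tonnes (extrapolation)

print(round(r, 4), round(b, 4), round(a, 4), round(prediction, 3))
```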
(b) A company wants to check the quality of its customer service regarding phone
enquiries. For this reason the manager wants to compare the call waiting times
during the years 2013 and 2012. Unfortunately, extensive records of the
company are not available, and he can only check a random sample of phone
calls within these two years. The available data, measured in minutes of waiting
times, are provided below for each year.
The test statistic value is 2.3225 (2.3624 if the pooled variance is used). The critical
value at the 5% significance level, assuming a normal approximation as the number of
observations is large, is ±1.96. Hence, we reject the null hypothesis suggesting evidence
for a difference between the two years. If we take a (smaller) significance level of 1%, the
critical value is ±2.576, so we do not reject H0 . We conclude that there is some but not
strong (i.e. moderate) evidence of a difference between the two years.
ii. The assumptions for (ii.) were:
• Assumption about whether $\sigma_1^2 = \sigma_2^2$.
• Assumption about whether $n_1 + n_2 - 2$ is ‘large’, hence t vs. z.
• Assumption about independent samples.
iii. This case corresponds to a one-sided test, therefore the hypotheses would be H0 : µ1 = µ2
vs. H1 : µ1 > µ2 . The test statistic value is the same for this case but the critical values
are now 1.645 for the 5% significance level and ≈ 2.33 for the 1% significance level. Using
the pooled-variance statistic (2.3624), we now reject H0 at both levels and conclude that there
is strong evidence (i.e. the result is highly significant) that the mean waiting time in 2013
was greater than in 2012. Note that the unpooled value 2.3225 falls marginally below 2.33, so
with that statistic the result is clearly significant at the 5% level but only borderline at the
1% level.
Question 4
(a) i. Carefully construct a box plot on the graph paper provided to display the
following annual earnings for the salesmen of a company, measured in £000s:
35 26 22 24 21 57 36 35 29 47 30 36
ii. Based on the shape of the box plot you have drawn, describe the distribution
of the data.
iii. Name two other types of graphical displays that would be suitable to
represent the data. Briefly explain your choices.
(13 marks)
To draw the box, the quartiles are needed. These are 25.5 (anything between 24
and 26 is acceptable), 32.5 and 36, giving an interquartile range of 10.5.
Hence the outlier limits are 9.75 and 51.75, and the value of 57 is therefore an outlier.
Note that no label of the x-axis is necessary and that the plot can be transposed.
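The quartiles and outlier limits quoted above can be checked with a short sketch. Quartile conventions differ (which is why a range of values for the lower quartile is accepted); the `method="inclusive"` option of `statistics.quantiles` is used here only because it happens to reproduce 25.5, and other conventions give slightly different values.

```python
import statistics

# Annual earnings of the salesmen, in £000s.
earnings = [35, 26, 22, 24, 21, 57, 36, 35, 29, 47, 30, 36]

# Lower quartile, median and upper quartile (one of several conventions).
q1, median, q3 = statistics.quantiles(earnings, n=4, method="inclusive")
iqr = q3 - q1

# Outlier limits: 1.5 interquartile ranges beyond the quartiles.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = [v for v in earnings if v < lower_limit or v > upper_limit]

print(q1, median, q3, iqr)
print(lower_limit, upper_limit, outliers)
```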
ii. Based on the shape of the boxplot above, we can see that the distribution of the data is
positively skewed.
iii. A histogram, stem-and-leaf diagram or a dot plot are other types of suitable graphical
displays. The reason is that the variable income is measurable and these graphs are
suitable for displaying the distribution of such variables.
(b) A study was made to determine the amount of fuel economy obtained by
using a specific new type of tyre over a standard type. For this reason, 8 cars
were fitted with the new type of tyre and the fuel consumption (in km/l) was
measured after a test-drive. Afterwards, the same cars with the same drivers
were fitted with the standard type tyres and the experiment was repeated to
obtain the following fuel consumption measurements.
Car #1 #2 #3 #4 #5 #6 #7 #8
Standard type tyres 4.6 6.5 7.4 5.5 5.3 5.2 6.6 6.7
New type tyres 5.1 6.2 7.3 5.4 5.5 5.1 6.1 7.3
i. Carry out an appropriate hypothesis test to determine whether the fuel
consumption is different between the two types of tyre. State the test
hypotheses, and specify your test statistic and its distribution under the
null hypothesis. Comment on your findings.
ii. State any assumptions you made in (i.).
iii. Give a 95% confidence interval for the difference in means.
iv. On the basis of the data alone, would you be concerned about fuel
consumption if you wanted to buy the new type of tyre? Provide an
explanation with your answer.
(12 marks)
(b) Reading for this question
Look up the sections about hypothesis testing for testing differences in means in Chapter 8.
However, it is essential for this part to focus on the section regarding paired samples
(Section 8.16.4).
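A minimal sketch of the paired-samples calculation is given below, for checking working only. It assumes that the differences are taken as standard minus new, and the critical value 2.365 (t distribution, 7 degrees of freedom, 2.5% in each tail) is read from statistical tables; neither choice is stated in this commentary.

```python
from math import sqrt
from statistics import mean, stdev

standard = [4.6, 6.5, 7.4, 5.5, 5.3, 5.2, 6.6, 6.7]
new = [5.1, 6.2, 7.3, 5.4, 5.5, 5.1, 6.1, 7.3]

# Paired design: work with the within-car differences (standard minus new).
differences = [s - t for s, t in zip(standard, new)]
n = len(differences)
d_bar = mean(differences)
s_d = stdev(differences)            # sample standard deviation (divisor n - 1)
standard_error = s_d / sqrt(n)

# Test statistic for H0: mean difference = 0; t with n - 1 = 7 df under H0.
t_statistic = d_bar / standard_error

t_critical = 2.365                  # assumed table value: t, 7 df, two-sided 5%
reject_h0 = abs(t_statistic) > t_critical
ci_95 = (d_bar - t_critical * standard_error,
         d_bar + t_critical * standard_error)

print(round(d_bar, 3), round(t_statistic, 3), reject_h0)
print(tuple(round(end, 3) for end in ci_95))
```

Under these assumptions the confidence interval contains zero, which is consistent with not rejecting the null hypothesis of no difference at the 5% significance level.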
ST104A ZA d0
Statistics 1
A list of formulae and extracts from statistical tables are provided after the final question
on this paper.
Graph paper is provided at the end of this question paper. If used, it must be detached
and fastened securely inside the answer book.
A calculator may be used when answering questions on this paper and it must comply
in all respects with the specification given with your Admission Notice. The make and
type of machine must be clearly stated on the front cover of the answer book.
SECTION A
(a) Classify each one of the following variables as either measurable (continuous) or
categorical. If a variable is categorical, further classify it as either nominal or ordinal.
Justify your answer. (Note that no marks will be awarded without a justification.)
[8 marks]
2, 6, x, 13, 9.
You are told that the value of the sample mean is 8.
[4 marks]
(c) For a certain type of laptop, the duration of a fully charged battery until it becomes
empty, X, is normally distributed with a mean of 6 hours and a standard deviation
of 2 hours.
i. What is the probability that such a battery will last at least 5 hours?
ii. What is the probability that such a battery will last between 6 and 8 hours?
[4 marks]
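One possible way to check answers to part (c) after attempting it by hand with the tables is sketched below using Python's `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Battery life X ~ N(6, 2^2), measured in hours.
battery_life = NormalDist(mu=6, sigma=2)

# i. P(X >= 5) = 1 - P(X < 5), i.e. P(Z >= -0.5).
p_at_least_5 = 1 - battery_life.cdf(5)

# ii. P(6 < X < 8) = P(X < 8) - P(X < 6), i.e. P(0 < Z < 1).
p_between_6_and_8 = battery_life.cdf(8) - battery_life.cdf(6)

print(round(p_at_least_5, 4), round(p_between_6_and_8, 4))
```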
i. $\sum_{i=1}^{3} 2x_i$   ii. $\sum_{i=3}^{5} 3x_i(y_i - 2)$   iii. $y_4^2 + \sum_{i=2}^{4} (3x_i + y_i^2)$.
[6 marks]
(e) The length of stay in a hospital is useful for planning purposes. Let X denote
the length of stay in days in a hospital after a minor operation. The probability
distribution of X is given below:
x 1 2 3 4
pX (x) 0.5 0.3 0.1 0.1
i. Find E(X), the expected length of stay in days in hospital after a minor
operation.
ii. A new policy in the hospital will add exactly one day to the length of stay
for this operation for every stay. Will the probability distribution of X change
after this new policy is put in place? If so, what will be the new expected
length of stay after this new policy is put in place?
[4 marks]
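The expectation in part (e) is a probability-weighted sum, and adding one day to every stay shifts each value of x by one while leaving the probabilities unchanged. A short sketch with illustrative names:

```python
# Probability distribution of the length of stay X (in days).
stay_distribution = {1: 0.5, 2: 0.3, 3: 0.1, 4: 0.1}

# E(X) is the sum of x * p(x) over all possible values x.
expected_stay = sum(x * p for x, p in stay_distribution.items())

# New policy: every stay is exactly one day longer, same probabilities.
shifted_distribution = {x + 1: p for x, p in stay_distribution.items()}
expected_shifted = sum(x * p for x, p in shifted_distribution.items())

print(round(expected_stay, 2), round(expected_shifted, 2))
```

The second value equals the first plus one, illustrating E(X + 1) = E(X) + 1.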
(f) The NBA basketball player LeBron James generally shoots his first 3-point shot in a
basketball game with a 30% success rate. If LeBron makes his first 3-point
shot, the success rate on his following 3-point shots goes up to 40%. If he misses it,
the success rate on his following 3-point shots drops to 20%.
i. What is the probability that LeBron James makes exactly one of his first two
3-point shots?
ii. If LeBron made exactly one of his first two 3-point shots, what is the probability
that the shot he made was the first one?
Note: A 3-point shot is when a player attempts to put the ball into the basket
from a wide distance.
[5 marks]
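A possible check for part (f) is sketched below. It assumes that the success rate quoted for 'following' shots applies to the second shot, and the variable names are illustrative.

```python
p_first = 0.30                  # P(first shot made)
p_second_given_made = 0.40      # P(second made | first made)
p_second_given_missed = 0.20    # P(second made | first missed)

# Exactly one success happens in two mutually exclusive ways.
p_made_then_missed = p_first * (1 - p_second_given_made)
p_missed_then_made = (1 - p_first) * p_second_given_missed
p_exactly_one = p_made_then_missed + p_missed_then_made

# Conditional probability that the successful shot was the first one.
p_first_given_exactly_one = p_made_then_missed / p_exactly_one

print(round(p_exactly_one, 4), round(p_first_given_exactly_one, 4))
```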
(g) It is known that the true mean mark in the course of ‘Statistics I’ at LSE is 64.5.
A random sample of 49 LSE athletes who took the course was taken where the
sample average was x̄ = 63.1 and the sample standard deviation was s = 5.6.
Perform a suitable hypothesis test to determine whether LSE athletes have a
different mean mark for the course ‘Statistics I’ than LSE students in general. State
your hypotheses, the test statistic and its distribution under the null hypothesis,
and your conclusion in the context of the problem.
[7 marks]
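For part (g), one way to check the test statistic is sketched below. It assumes a two-sided test and, because n = 49 is fairly large, uses the standard normal distribution as an approximation to the t distribution with 48 degrees of freedom; the names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 64.5                   # mean mark under H0
n, x_bar, s = 49, 63.1, 5.6   # sample size, sample mean, sample standard deviation

# Test statistic for H0: mu = 64.5 against H1: mu != 64.5.
z = (x_bar - mu_0) / (s / sqrt(n))

standard_normal = NormalDist()
p_value = 2 * standard_normal.cdf(-abs(z))       # two-sided p-value
critical_5pct = standard_normal.inv_cdf(0.975)   # about 1.96
critical_10pct = standard_normal.inv_cdf(0.95)   # about 1.645

print(round(z, 2), round(p_value, 4))
print(abs(z) > critical_5pct, abs(z) > critical_10pct)
```

Under these assumptions the result is significant at the 10% level but not at the 5% level, so the conclusion should be worded in terms of weak evidence of a difference.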
(h) State whether the following are true or false and give a brief explanation. (Note
that no marks will be awarded for a simple true/false answer.)
i. In a confidence interval for a population mean, an increase of the variance
will increase the width of the interval (assuming that everything else remains
constant).
[12 marks]
SECTION B
2. (a) A study looked into the amount of help students are receiving, and consisted
of 300 students from three schools. The students were classified into three
categories according to the type of help they receive. The data are shown
below.
Type of help
Private tuition Help from family No help Total
School 1 35 25 40 100
School 2 28 47 25 100
School 3 38 22 40 100
Total 101 94 105 300
i. Based on the data in the table, and without conducting a significance
test, compare the distributions of help received by students within schools.
Which type of help is most common in School 1, School 2 and School 3?
ii. Calculate the χ² statistic and use it to test for independence, using a 5%
significance level. What do you conclude?
[14 marks]
(b) i. Describe what selection bias is and when it may occur. Give an example.
ii. You have been asked to design a nationwide survey in your country to
find out about working conditions among employees in the postal offices.
Provide a probability sampling scheme and a sampling frame that you
would like to use. Identify a potential source of response bias that may
occur and discuss how this issue could be addressed.
[11 marks]
3. A study is made for a particular allergy medication in order to determine the length
of relief it provides Y (in hours) in relation to the dosage of medication X (in mg).
For this reason, ten patients were given different doses of the medication and were
asked to report back when the medication seemed to wear off.
Patient #1 #2 #3 #4 #5 #6 #7 #8 #9 #10
Dosage (x) 3 3.5 4 5 6 6.5 7 8 8.5 9
Relief Hours (y) 9.1 5.5 12.3 9.2 14.2 16.8 22.0 18.3 24.5 22.7
(a) i. Draw a scatter diagram of these data on the graph paper provided. Label
the diagram carefully.
ii. Calculate the sample correlation coefficient. Interpret your findings.
iii. Calculate the least squares line of y on x and draw the line on the scatter
diagram.
iv. Based on the regression equation in part (iii.), what will be the predicted
length of relief for a dosage of 11 mg? Will you trust this value? Justify
your answer.
[13 marks]
(b) A study focused on the perception of life satisfaction that may vary between
older and younger people. For this reason 12 adults over the age of 70 and 16
adults aged between 18 and 30 took a life satisfaction test that gave a score
for each one of them (high values of the score indicate higher life satisfaction).
Summaries of these scores are presented below.
[12 marks]
4. (a) A variety of a broad bean plant is studied and the number of beans per plant
is counted and listed below.
71 94 62 74 106
76 87 94 76 78
83 56 78 79 80
60 92 54 81 45
72 54 45 85 72
74 65 68 55 66
i. Carefully construct, draw and label a stem-and-leaf diagram of these data.
ii. Find the mean (given that the sum of the data is 2182), the median and
the modal stem.
iii. Comment on the data given the shape of the stem-and-leaf diagram and
the measures you have calculated.
iv. Name two other types of graphical displays that would be suitable to
represent the data.
[12 marks]
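For Question 4(a), a rough text version of a stem-and-leaf diagram (stems in tens, leaves in units) can be produced as below to check a hand-drawn diagram. It is only a sketch with illustrative names; in the examination the diagram must still be given a title and a statement of the stem and leaf units.

```python
from collections import defaultdict
from statistics import mean, median

beans = [71, 94, 62, 74, 106, 76, 87, 94, 76, 78,
         83, 56, 78, 79, 80, 60, 92, 54, 81, 45,
         72, 54, 45, 85, 72, 74, 65, 68, 55, 66]

# Group the sorted values by stem (tens) and record the leaves (units).
stems = defaultdict(list)
for value in sorted(beans):
    stems[value // 10].append(value % 10)

for stem in range(min(stems), max(stems) + 1):
    leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
    print(f"{stem:>2} | {leaves}")

# The modal stem is the stem with the most leaves.
modal_stem = max(stems, key=lambda s: len(stems[s]))
print(round(mean(beans), 2), median(beans), modal_stem)
```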
END OF PAPER
ST104a Statistics 1
Examination Formula Sheet
z test of hypothesis for a single mean (σ known):
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
t test of hypothesis for a single mean (σ unknown):
$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$
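z test of hypothesis for a single proportion:
$$Z \cong \frac{P - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}}$$
z test for the difference between two means (variances known):
$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$
t test for the difference between two means (variances unknown):
$$T = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{S_p^2(1/n_1 + 1/n_2)}}$$
Confidence interval endpoints for the difference between two means:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\, n_1+n_2-2} \cdot \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
Confidence interval endpoints for the difference in means in paired samples:
$$\bar{x}_d \pm t_{\alpha/2,\, n-1} \cdot \frac{s_d}{\sqrt{n}}$$
z test for the difference between two proportions:
$$Z = \frac{(P_1 - P_2) - (\pi_1 - \pi_2)}{\sqrt{P(1 - P)(1/n_1 + 1/n_2)}}$$
$$\sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
$$r = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right)}}$$
$$a = \bar{y} - b\bar{x}$$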
ST104A ZA d0
Statistics 1
A list of formulae and extracts from statistical tables are provided after the final question
on this paper.
Graph paper is provided at the end of this question paper. If used, it must be detached
and fastened securely inside the answer book.
A calculator may be used when answering questions on this paper and it must comply
in all respects with the specification given with your Admission Notice. The make and
type of machine must be clearly stated on the front cover of the answer book.
SECTION A
(a) Classify each one of the following variables as either measurable (continuous) or
categorical. If a variable is categorical, further classify it as either nominal or ordinal.
Justify your answer. (Note that no marks will be awarded without a justification.)
[8 marks]
3, 5, x, 12, 10
You are told that the value of the sample mean is 7.
[4 marks]
(c) For a certain type of laptop, the duration of a fully charged battery until it becomes
empty, X, is normally distributed with a mean of 5 hours and a standard deviation
of 1.5 hours.
i. What is the probability that such a battery will last at least 4 hours?
ii. What is the probability that such a battery will last between 5 and 7 hours?
[4 marks]
i. $\sum_{i=3}^{5} 3x_i$   ii. $\sum_{i=2}^{4} 2x_i(y_i - 3)$   iii. $y_3^2 + \sum_{i=1}^{3} (2x_i + y_i^2)$.
[6 marks]
(e) The length of stay in a hospital is useful for planning purposes. Let X denote
the length of stay in days in a hospital after a minor operation. The probability
distribution of X is given below:
x 1 2 3 4
pX (x) 0.4 0.3 0.2 0.1
i. Find E(X), the expected length of stay in days in hospital after a minor
operation.
ii. A new policy in the hospital will add exactly one day to the length of stay
for this operation for every stay. Will the probability distribution of X change
after this new policy is put in place? If so, what will be the new expected
length of stay after this new policy is put in place?
[4 marks]
(f) The NBA basketball player Kobe Bryant generally shoots his first 3-point shot in a
basketball game with a 40% success rate. If Kobe makes his first 3-point shot, the success
rate on his following 3-point shots goes up to 50%. If he misses it, the success rate
on his following 3-point shots drops to 20%.
i. What is the probability that Kobe Bryant makes exactly one of his first two
3-point shots?
ii. If Kobe made exactly one of his first two 3-point shots, what is the probability
that the shot he made was the first one?
Note: A 3-point shot is when a player attempts to put the ball into the basket
from a wide distance.
[5 marks]
(g) It is known that the true mean mark in the course of ‘Statistics I’ at LSE is 63.5.
A random sample of 49 LSE athletes who took this course was taken where the
sample average was x̄ = 62.2 and the sample standard deviation was s = 5.2.
Perform a suitable hypothesis test to determine whether LSE athletes have a
different mean mark for the course ‘Statistics I’ than LSE students in general. State
your hypotheses, the test statistic and its distribution under the null hypothesis,
and your conclusion in the context of the problem.
[7 marks]
(h) State whether the following are true or false and give a brief explanation. (Note
that no marks will be awarded for a simple true/false answer.)
i. Increasing the confidence level will increase the width of a confidence interval
for a population mean (assuming that everything else remains constant).
ii. Increasing the sample size will increase the width of a confidence interval for a
population mean (assuming that everything else remains constant).
Alternatively, since we divide by $\sqrt{n}$, the width will decrease if $n$ increases.
iii. In a χ² test, an increase in the significance level α from 5% to 10% will decrease
the probability of a Type I error.
iv. In a χ² test, if the p-value is smaller than the significance level, we conclude
that there is association between the two relevant variables.
v. In a sample survey assume that some respondents replied to all the questions
and some did not reply at all. The non-responses are called ‘item non response’.
vi. The regression of the variable Y on the variable X will always have the same
slope as the regression of the variable X on the variable Y .
[12 marks]
SECTION B
2. (a) A mental health study focused on 300 patients visiting three community mental
health centres. The patients were classified into three groups according to the
primary issue for which they were seen. The data are shown below.
Type of Problem
Social Adjustment Stress Related Other Total
Centre 1 45 28 27 100
Centre 2 28 44 28 100
Centre 3 46 29 25 100
Total 119 101 80 300
i. Based on the data in the table, and without conducting a significance test,
compare the distributions of problems within centres. Which problem is
most common in Centre 1, Centre 2 and Centre 3?
ii. Calculate the χ² statistic and use it to test for independence, using a 5%
significance level. What do you conclude?
[14 marks]
(b) i. Describe what response bias is and when it may occur. Give an example.
ii. You have been asked to design a nationwide survey in your country to find
out about job satisfaction among employees in the banking sector. Provide
a probability sampling scheme and a sampling frame that you would like
to use. Identify a potential source of response bias that may occur and
discuss how this issue could be addressed.
[11 marks]
3. A chain of package delivery stores is looking into the association between weekly
sales (in hundreds of $) in each store (y) and the number of customers who made
purchases in that week (x). For this reason, 10 stores were selected at random from
all the stores in the chain and the variables x and y were recorded. They appear in
the table below:
Store #1 #2 #3 #4 #5 #6 #7 #8 #9 #10
# of customers (x) 90 92 50 74 78 88 87 51 53 42
Sales (y) 11.2 11.1 6.8 9.2 9.4 10.1 9.4 7.7 8.2 6.1
(a) i. Draw a scatter diagram of these data on the graph paper provided. Label
the diagram carefully.
ii. Calculate the sample correlation coefficient. Interpret your findings.
iii. Calculate the least squares line of y on x and draw the line on the scatter
diagram.
iv. Based on the regression equation in part (iii.), what will be the predicted
weekly sales for a store where 70 customers made a purchase? Will you
trust this value? Justify your answer.
[13 marks]
(b) A study focused on the perception of life satisfaction that may vary between
older and younger people. For this reason 15 adults over the age of 70 and 13
adults aged between 18 and 30 took a life satisfaction test that gave a score
for each one of them (high values of the score indicate higher life satisfaction).
Summaries of these scores are presented below.
[12 marks]
4. (a) Thirty people were asked about the number of hours they exercise in a seven
day period and their answers were recorded and listed below.
2.0 4.0 4.5 5.0 5.5
6.0 6.5 6.5 7.0 7.0
7.5 7.5 8.0 8.0 8.5
8.5 8.5 9.0 9.0 10.0
10.5 10.5 11.0 11.5 12.0
13.0 14.0 17.0 18.0 21.0
i. Carefully construct, draw and label a histogram of these data on the graph
paper provided.
ii. Find the mean (given that the sum of the data is 277), the median and
the modal group.
iii. Comment on the data given the shape of the histogram and the measures
you have calculated.
iv. Name two other types of graphical displays that would be suitable to
represent the data.
[12 marks]
END OF PAPER
Important note
This commentary reflects the examination and assessment arrangements for this course in the
academic year 2014–15. The format and structure of the examination may change in future years,
and any such changes will be publicised on the virtual learning environment (VLE).
Unless otherwise stated, all cross-references will be to the latest version of the subject guide (2014).
You should always attempt to use the most recent edition of any Essential reading textbook, even if
the commentary and/or online reading list and/or subject guide refer to an earlier edition. If
different editions of Essential reading are listed, please check the VLE for reading supplements – if
none are available, please use the contents list and index of the new edition to find the relevant
section.
General remarks
Learning outcomes
By the end of this course and having completed the Essential reading and activities you should:
• be familiar with the key ideas of statistics that are accessible to a candidate with a
moderate mathematical competence
• be able to apply a variety of methods for explaining, summarising and presenting data and
interpreting results clearly using appropriate diagrams, titles and labels when required
• be able to summarise the ideas of randomness and variability, and the way in which these
link to probability theory to allow the systematic and logical collection of statistical
techniques of great practical importance in many applied areas
• have a grounding in probability theory and some grasp of the most common statistical
methods
• be able to perform inference to test the significance of common measures such as means and
proportions and conduct chi-squared tests of contingency tables
• be able to use simple regression and correlation analysis and know when it is appropriate to
do so.
You have two hours to complete this paper, which is in two parts. The first part, Section A, is
compulsory and covers several subquestions and accounts for 50 per cent of the total marks. Section
B contains three questions, each worth 25 per cent, from which you are asked to choose two.
Remember that each of the Section B questions is likely to cover more than one topic. In 2014, for
example, the first part of Question 2 asked for a chi-squared test and survey design problems
appeared in the second. The first part of Question 3 was on regression and involved drawing a
diagram, while the second part was a hypothesis test comparing population means using the sample
data given. Question 4 had a series of questions which involved drawing diagrams (such as box
plots), hypothesis testing (in particular paired t-tests) and confidence intervals. This means that it is
really important that you make sure you have a reasonable idea of what topics are covered before
you start work on the paper! We suggest you divide your time as follows during the examination:
• Spend the first 10 minutes annotating the paper. Note the topics covered in each question
and subquestion.
• Allow yourself 45 minutes for Section A. Don’t allow yourself to get stuck on any one
question, but don’t just give up after two minutes!
• Once you have chosen your two Section B questions, give them about 25 minutes each.
• This leaves you with 15 minutes. Do not leave the examination hall at this point! Check
over any questions you may not have completely finished. Make sure you have labelled and
given a title to any tables or diagrams that were required and, if you did more than the two
questions required in Section B, decide which one to delete. Remember that only two of
your answers will be given credit in Section B and that you must choose which these are.
The examiners are looking for very simple demonstrations from you. They want to be sure that you:
• have covered the syllabus as described and explained in the subject guide
• know the basic formulae given there and when and how to use them
You are not expected to write long essays where explanations or descriptions of sample design
are required, and note form answers are acceptable. However, clear and accurate language, both
mathematical and written, is expected and marked. The explanations below and in the specific
commentaries for the papers for each zone should make these requirements clear.
The most important thing you can do is answer the question set! This may sound very simple, but
these are some of the things that candidates did not do, though asked, in the 2014 examinations.
Remember:
• If you are asked to label a diagram (which is almost always the case), please do so. Writing
‘Histogram’ or ‘Stem-and-leaf diagram’ in itself is insufficient. What do the data describe?
What are the units? What are the x and y axes?
• If you are specifically asked to carry out a hypothesis test, or a confidence interval, do so. It
is not acceptable to do one rather than the other. If you are asked to find a 5% value, this is
what will be marked.
• Do not waste time calculating things which are not required by the examiners. If you are
asked to find the line of best fit, you will get no extra marks for calculating the correlation
coefficient as well. If you are asked to use the confidence interval you have just calculated to
comment on the results, carrying out an additional hypothesis test will not help your marks.
How should you use the specific comments on each question given in the
Commentaries?
We hope that you find these useful. For each question and subquestion, they give:
• further guidance for each question on the points made in the last section
• the answers, or keys to the answers, which the examiners were looking for
• the relevant detailed reference to P. Newbold, W.L. Carlson and B.M. Thorne Statistics for
business and economics. (London: Prentice–Hall, 2012) eighth edition [ISBN
9780273767060] and the subject guide
• where appropriate, suggested activities from the subject guide which should help you to
prepare, and similar questions from Newbold (2012).
Any further references you might need are given in the part of the subject guide to which you are
referred for each answer.
Important note
In 2015, ST104a Statistics 1 was examined by two replacement examination papers, sat on 28
May and 3 June. Commentaries for these papers are provided and hence references are to these two
dates rather than ‘Zone A’ and ‘Zone B’.
Many candidates are disappointed to find that their examination performance is poorer than they
expected. This may be due to a number of reasons. The Examiners’ commentaries suggest ways of
addressing common problems and improving your performance. One particular failing is ‘question
spotting’, that is, confining your examination preparation to a few questions and/or topics which
have come up in past papers for the course. This can have serious consequences.
We recognise that candidates may not cover all topics in the syllabus in the same depth, but you
need to be aware that examiners are free to set questions on any aspect of the syllabus. This
means that you need to study enough of the syllabus to enable you to answer the required number of
examination questions.
The syllabus can be found in the Course information sheet in the section of the VLE dedicated to
each course. You should read the syllabus carefully and ensure that you cover sufficient material in
preparation for the examination. Examiners will vary the topics and questions from year to year and
may well set questions that have not appeared in past papers. Examination papers may legitimately
include questions on any topic in the syllabus. So, although past papers can be helpful during your
revision, you cannot assume that topics or specific questions that have come up in past examinations
will occur again.
If you rely on a question-spotting strategy, it is likely you will find yourself in difficulties
when you sit the examination. We strongly advise you not to adopt this strategy.
Important note
This commentary reflects the examination and assessment arrangements for this course in the
academic year 2014–15. The format and structure of the examination may change in future years,
and any such changes will be publicised on the virtual learning environment (VLE). Note that in
what follows • corresponds to 1 mark unless stated otherwise.
Unless otherwise stated, all cross-references will be to the latest version of the subject guide (2014).
You should always attempt to use the most recent edition of any Essential reading textbook, even if
the commentary and/or online reading list and/or subject guide refer to an earlier edition. If
different editions of Essential reading are listed, please check the VLE for reading supplements – if
none are available, please use the contents list and index of the new edition to find the relevant
section.
Candidates should answer THREE of the following FOUR questions: QUESTION 1 of Section
A (50 marks) and TWO questions from Section B (25 marks each). Candidates are strongly
advised to divide their time accordingly.
Section A
Question 1
8, 2, 6, x, 5.
ii. • Method:
$$s^2 = \frac{(8 - 6)^2 + (2 - 6)^2 + (6 - 6)^2 + (9 - 6)^2 + (5 - 6)^2}{4}$$
• Correct value: 7.5.
Some candidates divided by 5 in the formula above. In such cases only one mark was
awarded for part (ii), provided that the correct value was obtained. The reason is that the
formula for the sample variance provided in the subject guide only suggests dividing by
n − 1, where n is the number of observations. In another error that occurred in some cases,
candidates subtracted the number x = 9 rather than the sample mean which is given to be 6.
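The two divisors can be compared with a short Python sketch. It assumes x = 9 has already been found from the given sample mean of 6, and it uses only the standard library `statistics` module:

```python
import statistics

# Data from the question with x = 9 substituted, so the sample mean is 6.
data = [8, 2, 6, 9, 5]

mean = statistics.mean(data)

# Sample variance: sum of squared deviations divided by n - 1.
s2 = statistics.variance(data)

# Dividing by n instead gives a smaller value (the error described above).
s2_divide_by_n = statistics.pvariance(data)

print(mean, s2, s2_divide_by_n)
```

Here `variance` divides by n − 1 and returns 7.5, whereas `pvariance` divides by n and returns 6.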
[6 marks]
(c) In a population 20% of men show early signs of losing their hair and 2% of them
carry a gene that is related to hair loss. It is also known that 80% of men who
carry the gene experience early hair loss.
i. What is the probability that a man carries the gene and experiences early
hair loss?
ii. What is the probability that a man carries the gene, given that he
experiences early hair loss?
[4 marks]
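One way to set up this calculation is sketched below in Python. It assumes that the 2% figure refers to all men in the population and that the 80% figure is the conditional probability of early hair loss given that the gene is carried; the variable names are illustrative only.

```python
# Probabilities given in the question.
p_hair_loss = 0.20             # P(H): early signs of hair loss
p_gene = 0.02                  # P(G): carries the gene
p_hair_loss_given_gene = 0.80  # P(H | G)

# i. Multiplication rule: P(G and H) = P(H | G) * P(G).
p_gene_and_hair_loss = p_hair_loss_given_gene * p_gene

# ii. Conditional probability: P(G | H) = P(G and H) / P(H).
p_gene_given_hair_loss = p_gene_and_hair_loss / p_hair_loss

print(round(p_gene_and_hair_loss, 3), round(p_gene_given_hair_loss, 3))
```

Under these assumptions the two probabilities are 0.016 and 0.08.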
(d) Classify each one of the following variables as either measurable (continuous) or
categorical. If a variable is categorical, further classify it as either nominal or
ordinal. Justify your answer. (Note that no marks will be awarded without a
justification.)
i. Classification of a university degree.
ii. Fuel consumption of a car.
iii. Eye colour.
iv. The cost of life insurance.
[8 marks]
(e) In the past, the mean telephone call time of customers to a computer helpline
has been 16.0 minutes. The computer company conducts a training scheme for
its telephone consultants with the intention of reducing this mean call time.
After training, a random sample of 20 calling times had a sample mean of 14.3
minutes and a sample standard deviation of 5.0 minutes. Carry out a hypothesis
test, at two suitable significance levels, to decide if the training scheme has been
successful. State your hypotheses, the test statistic and its distribution under
the null hypothesis, and your conclusion in the context of the problem.
[7 marks]
(f ) The amount of coffee dispensed into a coffee cup by a coffee machine follows a
normal distribution with mean 125 ml and standard deviation 8 ml.
i. Find the probability that one cup is filled above the level of 137 ml.
ii. What is the proportion of cups with coffee contents between 117 ml and 133
ml?
[4 marks]
∗ P (Z < a) = P (Z ≤ a) = Φ(a)
∗ P (Z > a) = P (Z ≥ a) = 1 − P (Z ≤ a) = 1 − P (Z < a) = 1 − Φ(a)
∗ P (a < Z < b) = P (a ≤ Z < b) = P (a < Z ≤ b) = P (a ≤ Z ≤ b) = Φ(b) − Φ(a).
The above is all you need to find the requested proportions:
i. • We can write:
$$P(X > 137) = P\left(\frac{X - 125}{8} > \frac{137 - 125}{8}\right) = P(Z > 1.5).$$
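In the examination these probabilities are read from the statistical tables. As a numerical check only, the same standardisation can be sketched with `NormalDist` from Python's standard library, which evaluates Φ directly:

```python
from statistics import NormalDist

# Coffee volume X ~ N(125, 8^2), as stated in the question.
coffee = NormalDist(mu=125, sigma=8)

# i. P(X > 137) = P(Z > 1.5) = 1 - Phi(1.5).
p_above_137 = 1 - coffee.cdf(137)

# ii. P(117 < X < 133) = P(-1 < Z < 1) = Phi(1) - Phi(-1).
p_between = coffee.cdf(133) - coffee.cdf(117)

print(round(p_above_137, 4), round(p_between, 4))
```

To four decimal places this gives 0.0668 for part i. and 0.6827 for part ii.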
(g) The variable X takes the values 1, 2, 3 and 5 according to the following
distribution
x 1 2 3 5
pX (x) 0.1 0.3 0.4 0.2
i. What is the probability that X is negative?
ii. Find E(X), the expected value of X.
iii. Find the probability that $X^2 > 8$.
[5 marks]
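The three parts of this question all reduce to sums over the probability function. A minimal Python sketch, with the table stored as a dictionary (a representation chosen here for illustration):

```python
# Probability function of X from the question.
pmf = {1: 0.1, 2: 0.3, 3: 0.4, 5: 0.2}

# i. X takes only positive values, so this sum runs over no outcomes.
p_negative = sum(p for x, p in pmf.items() if x < 0)

# ii. E(X) is the sum of x * p(x) over all values of x.
expected_value = sum(x * p for x, p in pmf.items())

# iii. X^2 > 8 holds only for x = 3 and x = 5.
p_square_above_8 = sum(p for x, p in pmf.items() if x ** 2 > 8)

print(p_negative, round(expected_value, 2), round(p_square_above_8, 2))
```

This gives a probability of 0 for part i., E(X) = 2.9 for part ii. and a probability of 0.6 for part iii.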
(h) State whether the following are true or false and give a brief explanation. (Note
that no marks will be awarded for a simple true/false answer.)
i. The chance that a normal random variable is less than one standard
deviation from its mean is 95%.
ii. Quota sampling is free of selection bias.
iii. Increasing the level of confidence will decrease the width of a confidence
interval for a population mean (assuming that everything else remains
constant).
Section B
Question 2
(a) The following data show the periods (in minutes) that a random sample of
students needed to complete a statistics assignment:
76 59 93 87 38
50 56 123 45 67
102 34 54 85 85
50 44 33 51 40
82 92 79 38 86
34 29 107 63 46
i. Carefully construct a stem-and-leaf diagram of these data.
ii. Find the median and the quartiles.
iii. Comment on the data given the shape of the stem-and-leaf diagram without
any further calculations.
iv. Name two other types of graphical displays that would be suitable to
represent the data.
[12 marks]
(b) A random sample of 512 unionised workers found that 38 had been made
redundant in the last twelve months. An independent random sample of 654
non-unionised workers found that 67 had been made redundant over the same
period.
i. Give a 95% confidence interval for the difference in the rates of redundancy
between unionised and non-unionised workers.
ii. Carry out a hypothesis test, at two suitable significance levels, to determine
whether unionised workers are less likely to be made redundant compared to
non-unionised workers. State the test hypotheses, and your test statistic and
its distribution under the null hypothesis. Comment on your findings.
iii. State any assumptions you made in (ii.).
[13 marks]
interval is straightforward given the formula sheet; make sure to be able to recognise the
relevant formula. First, the standard error needs to be calculated:
$$\text{s.e.}(p_1 - p_2) = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} = 0.0166.$$
Then, the lower and upper bounds can be found to be −0.0604 and 0.0044, respectively.
Finally, the above should be presented as an interval (−0.0604, 0.0044).
ii. As before, let π1 denote the proportion of unionised workers made redundant and π2 the
corresponding proportion for non-unionised workers. Also denote by p the overall
proportion of redundant workers. Regarding hypotheses, note that the wording ‘less
likely’ suggests a one-sided test: H0 : π1 = π2 vs. H1 : π1 < π2 . The next step is to
identify the test statistic, which is (p1 − p2 )/s.e.(p1 − p2 ) and follows a standard normal
distribution under H0 , where the standard error uses the pooled proportion p:
$$\text{s.e.}(p_1 - p_2) = \sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = 0.0169.$$
Based on the above, the value of the test statistic is −1.6576. The critical value at the
5% level is −1.645, hence we reject H0 at the 5% level. Testing at the 1% level gives a
critical value of −2.326, so we do not reject H0 at the 1% level. We therefore conclude that
there is moderate evidence that unionised workers are less likely to be made redundant.
iii. • Sample size is large enough to justify the normality assumption.
• Equal variances.
Some candidates stated assumptions in this part that were not made in part (ii). Marks
were not awarded in such cases.
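The arithmetic for parts i. and ii. can be sketched in Python as follows. The sketch carries full precision throughout, so the interval bounds and the test statistic can differ in the later decimal places from the figures quoted above, which appear to be based on rounded intermediate values; the conclusions are unaffected. The critical values 1.96, −1.645 and −2.326 are the usual standard normal table values.

```python
from math import sqrt

# Sample data from the question.
n1, r1 = 512, 38   # unionised workers: sample size, number made redundant
n2, r2 = 654, 67   # non-unionised workers
p1, p2 = r1 / n1, r2 / n2
diff = p1 - p2

# i. 95% confidence interval, using the unpooled standard error.
se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci = (diff - 1.96 * se_unpooled, diff + 1.96 * se_unpooled)

# ii. One-sided test of H0: pi1 = pi2 vs. H1: pi1 < pi2, using the
# pooled proportion p in the standard error.
p = (r1 + r2) / (n1 + n2)
se_pooled = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
z_stat = diff / se_pooled

reject_at_5pct = z_stat < -1.645
reject_at_1pct = z_stat < -2.326

print(round(se_unpooled, 4), round(se_pooled, 4), reject_at_5pct, reject_at_1pct)
```

The interval contains zero, while the one-sided test rejects at the 5% level but not at the 1% level.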
Question 3
(a) A survey was conducted to investigate the relationship between the frequency of
newspaper readership and readers’ educational background. The following table
shows the results of this survey:
Educational background
Graduate A-levels Less than A-levels Total
Low readership 19 32 49 100
Moderate readership 25 52 23 100
Frequent readership 46 40 14 100
Total 90 124 86 300
i. Based on the data in the table, and without conducting a significance test,
would you say there is an association between the frequency of newspaper
readership and reader’s educational background?
ii. Calculate the χ2 statistic and use it to test for independence, using two
appropriate significance levels. What do you conclude?
[14 marks]
i. There are some differences in the distributions within readership levels. More specifically,
graduates make up a much larger share of frequent readers than of low readers (46% vs. 19%),
whereas those with less than A-levels make up a much smaller share (14% vs. 49%).
Among moderate readers, the largest group is those with A-levels (52%).
Hence, there seems to be an association between readership levels and reader’s
educational background, although this needs to be investigated further. (Note: the
conclusion of the last sentence must be stated to get full marks).
ii. Set out the null hypothesis that there is no association between readership level and
educational background against the alternative – that there is an association. Be careful
to get these the correct way round!
H0 : No association between readership level and educational background vs.
H1 : Association between readership level and educational background.
Work out the expected values to obtain the table below:
30.00 41.33 28.67
30.00 41.33 28.67
30.00 41.33 28.67
The test statistic formula is:
$$\chi^2 = \sum_{i,j} \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}}$$
that gives a value of 41.350. This is a 3 × 3 contingency table so the degrees of freedom
are (3 − 1) × (3 − 1) = 4.
For α = 0.05, the critical value from the chi-squared distribution with 4 degrees of
freedom is 9.488, hence reject H0 .
Next, for α = 0.01, the critical value is 13.277, hence reject H0 again.
We conclude that there is strong evidence of an association between readership level and
educational background.
Many candidates looked up the tables incorrectly and so failed to follow through their
earlier accurate work. A larger number did not expand on their results sufficiently.
Saying ‘we do reject at the 5% level, but at 10%’ is insufficient. What does this mean? Is
there a connection or not? If there is one, how strong is it? This needed to be answered
if the full nine marks allocated for this question were to be earned. Many candidates lost
marks by missing out on follow-up like this.
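The expected values and the test statistic for this table can be reproduced with a few lines of Python. This is a sketch of the arithmetic only; the critical values are those quoted above for 4 degrees of freedom.

```python
# Observed counts: rows are readership levels (low, moderate, frequent),
# columns are educational background (graduate, A-levels, less than A-levels).
observed = [
    [19, 32, 49],
    [25, 52, 23],
    [46, 40, 14],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-squared statistic: sum over all cells of (O - E)^2 / E.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

reject_at_5pct = chi2 > 9.488
reject_at_1pct = chi2 > 13.277

print(round(chi2, 3), df, reject_at_5pct, reject_at_1pct)
```

Because every row total is 100, each row of expected values is the same, which matches the table of expected values given above.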
(b) i. Explain the difference between item non-response and unit non-response.
ii. State any three factors which could cause non-response.
iii. A travel agency offers customers a range of ways to make holiday bookings –
in store, online and through their call centres. To determine the level of
customer satisfaction, the company’s management has decided to use a
survey of all types of customers and has asked you to devise an appropriate
random sampling scheme. Explain in detail your recommendation, including
how you might address non-response.
[11 marks]
This question was on basic material on survey designs. Background reading is given in Chapters
9 and 10 of the subject guide which, along with the recommended reading, should be looked at
carefully. Candidates were expected to have studied and understood the main important
constituents of design in random sampling. It is also a good idea to try the activities in Chapter
9.
One of the main things to avoid in this part is to write essays without any structure. This
exercise asks for specific things and each one of them requires 1 or 2 lines. If you are unsure of
what these things are, do not write lengthy essays. Doing so earns no marks and wastes
valuable examination time. If you can identify what is being asked, keep in mind
that the answer should not be long. Note also that in some cases there is no unique answer
to the question.
The marking scheme and some model answers are given below:
i. • Item non-response occurs when a sampled member fails to respond to a question in the
questionnaire.
• Unit non-response occurs when no information is collected from a sample member.
ii. 3 marks: Any three of:
— Not-at-home.
— Refusals.
— Incapacity to respond.
— Not found
— Lost schedules.
iii. 6 marks: Possible ‘ingredients’ of an answer:
— Sampling frame to be the travel agency’s customer database.
— Propose stratified sampling since all types of customers are to be surveyed.
— Stratification factors could include booking method, gender, holiday type.
— Take a simple random sample from each stratum.
— Contact method: mail, phone or email (likely to have all details on database).
— Minimise non-response through suitable incentive, such as discount off next booking.
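To illustrate the idea of taking a simple random sample from each stratum, here is a minimal Python sketch. The customer database, the stratum sizes and the 10% sampling fraction are all hypothetical and are not part of the question; booking method is used as the only stratification factor.

```python
import random

# Hypothetical sampling frame: (customer id, booking method) pairs.
customers = (
    [(f"S{i}", "in store") for i in range(200)]
    + [(f"O{i}", "online") for i in range(500)]
    + [(f"C{i}", "call centre") for i in range(300)]
)

def stratified_sample(frame, fraction, seed=0):
    """Draw a simple random sample of the same fraction from each stratum."""
    rng = random.Random(seed)
    strata = {}
    for customer_id, method in frame:
        strata.setdefault(method, []).append(customer_id)
    return {
        method: rng.sample(ids, round(len(ids) * fraction))
        for method, ids in strata.items()
    }

sample = stratified_sample(customers, fraction=0.10)
sizes = {method: len(ids) for method, ids in sample.items()}
print(sizes)
```

Using the same fraction in every stratum keeps the sample proportional to the population, so each booking method is represented according to its share of customers.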
Question 4
(a) An area manager in a department store wants to study the relationship between
the number of workers on duty, x, and the value of merchandise lost to
shoplifters, y, in $. To do so, the manager assigned a different number of workers
for each of 10 weeks. The results were as follows:
Week #1 #2 #3 #4 #5 #6 #7 #8 #9 #10
x 9 11 12 13 15 18 16 14 12 10
y 420 350 360 300 225 200 230 280 315 410
The summary statistics for these data are:
Sum of x data: 130 Sum of the squares of x data: 1760
Sum of y data: 3090 Sum of the squares of y data: 1007750
Sum of the products of x and y data: 38305
i. Draw a scatter diagram of these data on the graph paper provided. Label the
diagram carefully.
ii. Calculate the sample correlation coefficient. Interpret your findings.
iii. Calculate the least squares line of y on x and draw the line on the scatter
diagram.
iv. Based on the regression equation in part (iii.), what will be the predicted loss
from shoplifting when there are 17 workers on duty? Will you trust this
value? Justify your answer.
[13 marks]
ii. The summary statistics can be substituted into the formula for the correlation (make sure
you know which one it is!) to obtain the value −0.9688. An interpretation of this value is
the following: The data suggest that the higher the number of workers, the lower the loss
from shoplifters. The fact that the value is very close to −1, suggests that this is a
strong linear negative association.
Many candidates did not mention all three words (strong, linear, negative). Note that all
of these words provide useful information on interpreting the association and are
therefore required to obtain full marks.
iii. The regression line can be written by the equation ŷ = a + bx or y = a + bx + ε. The
formula for b is:
$$b = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}$$
and by substituting the summary statistics we get b = −26.64.
The formula for a is a = ȳ − bx̄, so we get a = 655.36.
Hence the regression line can be written as ŷ = 655.36 − 26.64x or
y = 655.36 − 26.64x + ε. It should also be plotted in the scatter diagram.
Many candidates reported incorrectly the regression line as y = 655.36 − 26.64x. This
expression is false; one of the two above is required.
iv. The prediction will be ŷ = 655.36 − 26.64 × 17 = $202.48. Yes, we would trust this value,
since this point is inside the observed range of x, and therefore the prediction is based on
interpolation.
Many candidates did not give the measurement units here. These are essential in
answering such questions and a mark is deducted if they are not specified. It is also
important to provide the answer to two decimal places.
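The calculations in parts ii. to iv. can be checked from the summary statistics alone. The Python sketch below is one way to lay out the arithmetic; the variable names are illustrative, and the prediction uses the coefficients rounded to two decimal places, as in the commentary.

```python
from math import sqrt

# Summary statistics quoted in the question (n = 10 weeks).
n = 10
sum_x, sum_x2 = 130, 1760
sum_y, sum_y2 = 3090, 1007750
sum_xy = 38305

x_bar, y_bar = sum_x / n, sum_y / n

# Corrected sums of squares and products.
s_xy = sum_xy - n * x_bar * y_bar
s_xx = sum_x2 - n * x_bar ** 2
s_yy = sum_y2 - n * y_bar ** 2

r = s_xy / sqrt(s_xx * s_yy)   # sample correlation coefficient
b = s_xy / s_xx                # slope of the least squares line
a = y_bar - b * x_bar          # intercept

# Predicted loss (in $) with 17 workers on duty, using rounded coefficients.
predicted_loss = round(a, 2) + round(b, 2) * 17

print(round(r, 4), round(b, 2), round(a, 2), round(predicted_loss, 2))
```

With unrounded coefficients the prediction is about $202.43; the small difference from $202.48 comes only from rounding a and b first.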
(b) A study was conducted to determine the amount of hours spent on Facebook by
university and high school students. For this reason, a questionnaire was
administered to a random sample of 16 university and 14 high school students
and the hours per day spent on Facebook were recorded. Summaries of the data
are shown in the table below:
Sample size Sample mean Sample variance
University students 16 2.9 0.9
High school students 14 2.1 1.1
i. Use an appropriate hypothesis test to determine whether the mean hours per
day spent on Facebook were different between university and high school
students. Test at two suitable significance levels, stating clearly the
hypotheses, the test statistic and its distribution under the null hypothesis.
Comment on your findings.
ii. State clearly any assumptions you made in (i.).
iii. Adjust the procedure above to determine whether the mean hours spent per
day on Facebook for university students is higher than that of high school
students.
[12 marks]
iii. This case corresponds to a one-sided test, therefore the hypotheses would be
H0 : µA = µB vs. H1 : µA > µB . The test statistic is the same for this case but the
critical values are now 1.701 for 5% and ≈ 2.467 for 1%. As before we reject H0 at the
5% but not at the 1% level, and we conclude that there is moderate evidence (the result is
moderately significant) that university students spend more time on Facebook than high
school students.
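The test statistic itself is not restated in this part, so the sketch below recomputes it in Python. It assumes the pooled (equal variances) two-sample t-test, which is consistent with the 28 degrees of freedom behind the critical values quoted above; the labels A and B for the two groups follow the notation in the hypotheses.

```python
from math import sqrt

# Summary statistics from the question.
n_a, mean_a, var_a = 16, 2.9, 0.9   # A: university students
n_b, mean_b, var_b = 14, 2.1, 1.1   # B: high school students

# Pooled variance, assuming equal population variances.
df = n_a + n_b - 2
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df

# Test statistic; under H0 it follows a t distribution with df = 28.
se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_stat = (mean_a - mean_b) / se

# One-sided critical values for 28 degrees of freedom quoted in the commentary.
reject_at_5pct = t_stat > 1.701
reject_at_1pct = t_stat > 2.467

print(df, round(t_stat, 3), reject_at_5pct, reject_at_1pct)
```

The statistic is roughly 2.19, which lies between the two critical values, in line with the conclusion of moderate evidence.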
Important note
This commentary reflects the examination and assessment arrangements for this course in the
academic year 2014–15. The format and structure of the examination may change in future years,
and any such changes will be publicised on the virtual learning environment (VLE). Note that in
what follows • corresponds to 1 mark unless stated otherwise.
Unless otherwise stated, all cross-references will be to the latest version of the subject guide (2014).
You should always attempt to use the most recent edition of any Essential reading textbook, even if
the commentary and/or online reading list and/or subject guide refer to an earlier edition. If
different editions of Essential reading are listed, please check the VLE for reading supplements – if
none are available, please use the contents list and index of the new edition to find the relevant
section.
Candidates should answer THREE of the following FOUR questions: QUESTION 1 of Section
A (50 marks) and TWO questions from Section B (25 marks each). Candidates are strongly
advised to divide their time accordingly.
Section A
Question 1
(a) Classify each one of the following variables as either measurable (continuous) or
categorical. If a variable is categorical, further classify it as either nominal or
ordinal. Justify your answer. (Note that no marks will be awarded without a
justification.)
i. The manufacturer of a car.
ii. The amount of money in a bank account.
iii. The Gross Domestic Product (GDP) of a country.
iv. The rating of a hotel according to the number of stars it has.
[8 marks]
4, x, 8, 7, 2
ii. • Method:
$$s^2 = \frac{(4 - 5)^2 + (4 - 5)^2 + (8 - 5)^2 + (7 - 5)^2 + (2 - 5)^2}{4}.$$
• Correct value: 6.
Some candidates divided by 5 in the formula above. In such cases only one mark was
awarded for part (ii), provided that the correct value was obtained. The reason is that the
formula for the sample variance provided in the subject guide only suggests dividing by
n − 1, where n is the number of observations. In another error that occurred in some cases,
candidates subtracted the number x = 4 rather than the sample mean which is given to be 5.
(c) The salaries of the employees of a company are normally distributed with mean
£25,000 and a standard deviation of £10,000.
i. What is the proportion of employees with a salary of at least £20,000?
ii. What is the proportion of employees with salaries between £15,000 and
£35,000?
[4 marks]
[6 marks]
i. $\sum_{i=3}^{5} 2x_i = 2(5 - 1 + 2) = 12.$
ii. $\sum_{i=2}^{4} 3(y_i - 3) = 3\sum_{i=2}^{4} (y_i - 3) = 3((-4 - 3) + (5 - 3) + (-1 - 3)) = 3(-7 + 2 - 4) = -27.$
iii. $y_4^2 + \sum_{i=1}^{3} (2x_i + y_i^2) = (-1)^2 + (2 \times (-3) + 1^2) + (2 \times 5 + (-4)^2) + (2 \times 5 + 5^2) = 1 - 5 + 26 + 35 = 57.$
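The three sums can be verified with a short Python sketch. The x and y values are not listed in this extract; those used below are inferred from the terms that appear in the worked solutions, and dictionaries keyed from 1 are used so that the indices match the notation.

```python
# Values inferred from the worked solutions (1-based indices, as in the text).
x = {1: -3, 2: 5, 3: 5, 4: -1, 5: 2}
y = {1: 1, 2: -4, 3: 5, 4: -1}

# i. Sum of 2 * x_i for i = 3, 4, 5.
part_i = sum(2 * x[i] for i in range(3, 6))

# ii. Sum of 3 * (y_i - 3) for i = 2, 3, 4.
part_ii = sum(3 * (y[i] - 3) for i in range(2, 5))

# iii. y_4 squared plus the sum of (2 * x_i + y_i^2) for i = 1, 2, 3.
part_iii = y[4] ** 2 + sum(2 * x[i] + y[i] ** 2 for i in range(1, 4))

print(part_i, part_ii, part_iii)  # 12 -27 57
```

Note that `range(3, 6)` covers i = 3, 4, 5 because the upper limit of a Python range is excluded.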
(e) The variable X takes the values 2, 4, 6 and 8 according to the following
distribution
x 2 4 6 8
pX (x) 0.3 0.2 0.1 0.4
i. What is the probability that X is an odd number?
ii. Find E(X), the expected value of X.
iii. Find the probability that X/2 > 3.
[5 marks]
(g) It is stated in a consumer magazine that the average price of football shirts in
London is £19.00. A random sample is taken by obtaining a single football shirt
from each of 16 randomly chosen London retailers. The sample mean is £20.20
and the sample standard deviation is £2.40. Carry out a hypothesis test, at two
appropriate significance levels, to determine whether the price of football shirts
in London is more expensive than the price stated in the consumer magazine.
State your hypotheses, the test statistic and its distribution under the null
hypothesis, and your conclusion in the context of the problem.
[7 marks]
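As a sketch of the arithmetic behind this one-sample t-test, the Python lines below compute the test statistic for H0 : µ = 19 vs. H1 : µ > 19. The critical values for 15 degrees of freedom (1.753 at the 5% level and 2.602 at the 1% level) are taken from standard t tables and are an assumption of this sketch rather than figures quoted in the commentary.

```python
from math import sqrt

# Summary statistics from the question.
n = 16
sample_mean = 20.20
sample_sd = 2.40
mu_0 = 19.00  # price stated in the magazine (value under H0)

# One-sample t statistic; under H0 it follows a t distribution with n - 1 df.
standard_error = sample_sd / sqrt(n)
t_stat = (sample_mean - mu_0) / standard_error
df = n - 1

# Upper-tail critical values for 15 df, read from standard t tables.
critical_5pct = 1.753
critical_1pct = 2.602

reject_at_5pct = t_stat > critical_5pct
reject_at_1pct = t_stat > critical_1pct

print(round(t_stat, 2), df, reject_at_5pct, reject_at_1pct)
```

With these values the statistic is 2.00, so H0 is rejected at the 5% level but not at the 1% level.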
(h) State whether the following are true or false and give a brief explanation. (Note
that no marks will be awarded for a simple true/false answer.)
i. The chance that a normal random variable is less than two standard
deviations from its mean is 99%.
ii. The lower the regression coefficient in absolute value the weaker the
correlation.
iii. Increasing the sample size will increase the width of a confidence interval for
a population mean (assuming that everything else remains constant).
iv. When testing a hypothesis, we use a two tailed test if we want to test
whether the parameter is greater than what is stated in the null hypothesis.
v. A population list is needed in order to conduct quota sampling.
vi. The regression of the variable Y on the variable X will always have the same
slope as the regression of the variable X on the variable Y .
[12 marks]
Section B
Question 2
(a) Questionnaires were mailed to 300 households, in three different areas of a city,
to assess the level of local sporting facilities. The collected data are shown in
the table below.
Sporting Facilities Level
Very good Fairly good Poor Total
Area 1 44 30 26 100
Area 2 29 26 45 100
Area 3 45 28 27 100
Total 118 84 98 300
i. Based on the data in the table, and without conducting a significance test,
would you say there is an association between areas and level of local
sporting facilities?
ii. Calculate the χ2 statistic and use it to test for independence, using two
appropriate significance levels. What do you conclude?
[14 marks]
(b) i. Provide the definition of simple random sampling and cluster sampling
designs.
ii. Why might a researcher prefer cluster sampling rather than simple random
sampling?
iii. Name one other random sampling scheme, provide its definition and one of
its advantages.
[11 marks]
Question 3
(a) The following data show the recorded times (y) in seconds taken by 10
international athletes to run 100 metres together with the corresponding wind
speeds (x) at the time of running. A positive wind speed indicates the wind is
in the direction of running and therefore considered to be helpful whereas a
negative wind speed indicates the wind is against the runner.
Athlete #1 #2 #3 #4 #5 #6 #7 #8 #9 #10
x −2.45 −1.23 −0.78 −0.33 −0.37 0.34 0.53 1.17 2.35 2.91
y 10.52 10.47 10.41 10.25 10.54 10.09 10.30 9.99 9.92 9.87
i. Draw a scatter diagram of these data on the graph paper provided. Label the
diagram carefully.
ii. Calculate the sample correlation coefficient. Interpret your findings.
iii. Calculate the least squares line of y on x and draw the line on the scatter
diagram.
iv. Based on the regression equation in part (iii.), what will be the predicted
time for a runner for a wind speed of 1.5? Will you trust this value? Justify
your answer.
[13 marks]
i. Candidates are reminded that they are asked to draw and label the scatter diagram
which should include a title (‘Scatter diagram’ alone will not suffice) and labelled axes
which give their units in addition. Far too many candidates threw away marks by
neglecting these points and consequently were only given one mark out of the possible
four allocated for this part of the question. Another common way of losing marks was
failing to use the graph paper, which was provided, and required, in the question.
Candidates who drew on the ordinary paper in their booklet were not awarded marks for
this part of the question.
[Figure: scatter diagram of recorded time in seconds (vertical axis, 9.9 to 10.5) against wind speed (horizontal axis, −2 to 3).]
ii. The summary statistics can be substituted into the formula for the correlation (make sure
you know which one it is!) to obtain the value −0.9051. An interpretation of this value is
the following: The data suggest that the higher the wind speed, the lower the time to
run 100 metres. The fact that the value is very close to −1, suggests that this is a strong
linear negative association.
Many candidates did not mention all three words (strong, linear, negative). Note that all
of these words provide useful information on interpreting the association and are
therefore required to obtain full marks.
iii. The regression line can be written as ŷ = a + bx or y = a + bx + ε. The
formula for b is:
$$b = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}$$
and by substituting the summary statistics we get b = −0.1414.
The formula for a is a = ȳ − bx̄, so we get a = 10.2663.
Hence the regression line can be written as ŷ = 10.2663 − 0.1414x or
y = 10.2663 − 0.1414x + ε. It should also be plotted on the scatter diagram.
Many candidates incorrectly reported the regression line as y = 10.2663 − 0.1414x.
This expression is incorrect because it has neither the hat on y nor the error term ε;
one of the two forms above is required.
iv. The prediction will be ŷ = 10.2663 − 0.1414 × 1.5 = 10.05 seconds. We would trust this
value, since this point is inside the observed range of x, and therefore the prediction is
based on interpolation.
Many candidates did not give the measurement units here. These are essential in
answering such questions and a mark is deducted if they are not specified. It is also
important to give the answer to at least two decimal places.
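The calculations in parts (ii.) to (iv.) can be checked numerically. The following is a minimal sketch rather than the examiners' own working: it uses the data from the question and the summary-statistic formulae quoted above, and all variable names (s_xy, s_xx, s_yy and so on) are illustrative choices. Up to rounding, it should reproduce the values r = −0.9051, b = −0.1414, a = 10.2663 and the predicted time of 10.05 seconds.

```python
from math import sqrt

# Data from Question 3(a): wind speed (x) and recorded 100 m time in seconds (y).
x = [-2.45, -1.23, -0.78, -0.33, -0.37, 0.34, 0.53, 1.17, 2.35, 2.91]
y = [10.52, 10.47, 10.41, 10.25, 10.54, 10.09, 10.30, 9.99, 9.92, 9.87]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and products (illustrative names).
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
s_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
s_yy = sum(yi ** 2 for yi in y) - n * y_bar ** 2

r = s_xy / sqrt(s_xx * s_yy)   # sample correlation coefficient
b = s_xy / s_xx                # least squares slope
a = y_bar - b * x_bar          # least squares intercept

# Prediction at a wind speed of 1.5, which lies inside the observed range of x.
wind_speed = 1.5
predicted_time = a + b * wind_speed

print(f"r = {r:.4f}, b = {b:.4f}, a = {a:.4f}")
print(f"predicted time = {predicted_time:.2f} seconds")
```

Checking that 1.5 lies between min(x) and max(x) is a quick way to confirm that the prediction is an interpolation.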
If equal variances are assumed, the test statistic value is 2.1449 (the pooled variance is
35.77). If equal variances are not assumed the test statistic value is 2.1164.
Since the variances are unknown and the sample size is not large enough, the t40
distribution is being used. The critical value at the 5% level is −2.048, hence we reject
the null hypothesis. If we take a (smaller) α of 1%, the critical value is −2.763, so we do
not reject H0 . We conclude that there is moderate evidence of a difference in the mean
scores of managerial success between the two groups.
ii. The assumptions for (ii.) were that:
∗ Assumption about equal variances.
Question 4
(a) The following data show the length (in inches) of fish caught in one day in a
river:
10.1 10.4 10.5 10.9 11.1
11.2 11.2 11.5 11.7 11.9
12.1 12.1 12.2 12.2 12.3
12.4 12.5 12.6 12.8 12.9
13.2 13.4 13.5 13.6 13.7
14.3 14.5 14.8 15.2 15.5
i. Carefully construct, draw and label a histogram of these data on the graph
paper provided.
ii. Find the mean (given that the sum of the data is 376.3), the median and the
modal group.
iii. Comment on the data given the shape of the histogram and the measures
you have calculated.
iv. Name two other types of graphical displays that would be suitable to
represent the data.
[12 marks]
[Figure: histogram of the fish lengths; horizontal axis: lengths in inches (10 to 16); vertical axis: density (0.00 to 0.30).]
ii. • Mean: 12.543. Note: raw data should be used, not grouped data.
• Median: 12.35. Note: same as above.
• Modal group: 12–13 inches. Note: units are necessary.
iii. Based on the shape of the histogram, we can see that the distribution of the data is
positively skewed.
iv. A boxplot, stem-and-leaf diagram or a dot plot are other types of suitable graphical
displays. The reason is that the variable (length) is measurable and these graphs
are suitable for displaying the distribution of such variables.
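The summary measures in part (ii.) can be verified from the raw data. This is a minimal sketch using Python's standard library; the class boundaries (one-inch classes starting at each whole number) are an assumption made here to match the histogram axis, and the variable names are illustrative.

```python
from collections import Counter
from math import floor
from statistics import mean, median

# Raw data from Question 4(a): lengths of fish in inches.
lengths = [
    10.1, 10.4, 10.5, 10.9, 11.1,
    11.2, 11.2, 11.5, 11.7, 11.9,
    12.1, 12.1, 12.2, 12.2, 12.3,
    12.4, 12.5, 12.6, 12.8, 12.9,
    13.2, 13.4, 13.5, 13.6, 13.7,
    14.3, 14.5, 14.8, 15.2, 15.5,
]

# Mean and median are computed from the raw data, not from grouped data.
sample_mean = mean(lengths)
sample_median = median(lengths)

# Assumed classes: [10, 11), [11, 12), ..., [15, 16). No observation falls
# exactly on a boundary, so taking the floor assigns each value to its class.
class_counts = Counter(floor(value) for value in lengths)
modal_lower = max(class_counts, key=class_counts.get)

print(f"mean = {sample_mean:.3f} inches")
print(f"median = {sample_median:.2f} inches")
print(f"modal group = {modal_lower}-{modal_lower + 1} inches")
```

With a different choice of class width the modal group could change, which is why the classes used should always be stated alongside it.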
(b) In order to estimate the percentage of city households that have high speed
internet access, a random sample of 140 city households was taken. Of these, 70
had high speed internet access. A similar sample of 170 rural households was
also taken and it was found that 61 of them had high speed internet access. The
data are summarised in the table below.
i. Give a 95% confidence interval for the difference between the proportions of
high speed internet access in city and rural households.
ii. Carry out a hypothesis test, at two suitable significance levels, to determine
whether city households are more likely to have high speed internet access
compared to rural households. State the test hypotheses, and specify your
test statistic and its distribution under the null hypothesis. Comment on
your findings.
iii. State any assumptions you made in (ii.).
[13 marks]
Then, the lower and upper bounds can be found to be 0.0314 and 0.2510 respectively.
Finally, the above should be presented as an interval (0.0314, 0.2510).
ii. As before, let π1 denote the proportion of city households with high speed internet and
π2 the corresponding proportion of rural households. Also denote by p the overall
proportion of households with high speed internet. Regarding hypotheses, note that the
wording ‘more likely’ suggests a one-sided test: H0 : π1 = π2 vs. H1 : π1 > π2 .
The next step is to identify the test statistic, which is (p1 − p2 )/(s.e.(p1 − p2 )) and
follows a standard normal distribution under H0 , where
$$\text{s.e.}(p_1 - p_2) = \sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = 0.0564.$$
Based on the above, the value of the test statistic is 2.503. The critical value at the 5%
level is 1.645, hence we reject H0 at the 5% level. Testing at the 1% level gives a critical
value of 2.326. Therefore, we still reject H0 at the 1% level, concluding that city
households are more likely to have high speed internet than rural households.
iii. • Sample size is large enough to justify the normality assumption.
• Equal variances.
Some candidates stated assumptions in this part that were not made in part (ii). Marks
were not awarded in such cases.
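The interval in part (i.) and the test statistic in part (ii.) can be reproduced as follows. This is a minimal sketch, not the examiners' working: it assumes the usual unpooled standard error for the confidence interval and the pooled standard error for the test, and it uses NormalDist from Python's standard library in place of statistical tables. All variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Counts from Question 4(b): city and rural households with high speed internet.
n1, x1 = 140, 70   # city
n2, x2 = 170, 61   # rural

p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2

# Part (i.): 95% confidence interval, using the unpooled standard error.
se_ci = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_975 = NormalDist().inv_cdf(0.975)   # approximately 1.96
ci_lower = diff - z_975 * se_ci
ci_upper = diff + z_975 * se_ci

# Part (ii.): one-sided z test, using the pooled proportion under H0.
p_pooled = (x1 + x2) / (n1 + n2)
se_test = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z_stat = diff / se_test
p_value = 1 - NormalDist().cdf(z_stat)   # upper-tail p-value for H1: pi1 > pi2

print(f"95% CI: ({ci_lower:.4f}, {ci_upper:.4f})")
print(f"s.e. = {se_test:.4f}, z = {z_stat:.3f}, p-value = {p_value:.4f}")
```

Working with unrounded intermediate values gives a test statistic that differs from 2.503 only in the third decimal place; the conclusions at the 5% and 1% levels are unaffected.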
ST104A_ZA_2016_d0
Statistics 1
A list of formulae and extracts from statistical tables are provided after the final question
on this paper.
Graph paper is provided at the end of this question paper. If used, it must be detached
and fastened securely inside the answer book.
A calculator may be used when answering questions on this paper and it must comply
in all respects with the specification given with your Admission Notice. The make and
type of machine must be clearly stated on the front cover of the answer book.
SECTION A
1. (a) A random sample of the heights of buildings has a sample mean of 24.96 metres.
State the units of measurements for the summaries below and justify your
answers.
i. sample variance
ii. sample standard deviation.
[4 marks]
i. $\sum_{i=2}^{i=4} x_i^2$   ii. $\sum_{i=1}^{i=3} 2x_i y_i$   iii. $y_5^3 + \sum_{i=3}^{i=4} \frac{y_i^4}{x_i}$.
[6 marks]
(e) The random variable X takes the values 0, 1 and 4 according to the following
probability distribution:
x 0 1 4
pX (x) 0.2 k k
(g) A museum conducts a survey of its visitors in order to assess the popularity
of a device which is used to provide information on the museum exhibits. The
device will be withdrawn if fewer than 20% of all of the museum’s visitors make
use of it. Of a random sample of 100 visitors, 15 chose to use the device.
i. Carry out an appropriate hypothesis test at the 5% significance level to
see if the device should be withdrawn and state your conclusions.
ii. Calculate the p-value of the test.
[7 marks]
(h) State whether the following are true or false and give a brief explanation. (No
marks will be awarded for a simple true/false answer.)
i. The interquartile range of a sample is influenced by extreme values.
ii. A sampling distribution is the probability distribution of a population
parameter.
iii. A sample correlation coefficient close to 1 indicates a strong positive linear
relationship between two categorical variables.
iv. A p-value of 0.08 represents a highly significant hypothesis test result.
v. Rejection of a null hypothesis might indicate that a Type II error has been
committed.
vi. A quota sample is the non-random equivalent of a systematic random
sample.
[12 marks]
SECTION B
Answer two out of the three questions from this section (25 marks each).
                     Outcome
             Faulty   Non-faulty   Total
Machine 1       4         96        100
Machine 2       2         98        100
Machine 3      11         89        100
Machine 4      14         86        100
Total          31        369        400
i. Based on the data in the table, and without conducting any significance
test, would you say there is an association between the machine number
and the component being faulty?
ii. Calculate the χ2 statistic and use it to test for independence, using a
5% significance level. What do you conclude?
[14 marks]
(b) i. Describe how stratified random sampling is performed and explain how
it differs from quota sampling.
ii. A company producing handheld electronic devices (tablets, mobile
phones etc.) wants to understand how people of different ages rate
its products. For this reason, the company’s management has decided
to use a survey of its customers and has asked you to devise an
appropriate random sampling scheme. Outline the key components
of your sampling scheme.
[11 marks]
3. (a) The data below represent heights, measured in centimetres, of women from
an adult female population:
162 164 164 165 165
166 166 166 167 167
167 167 167 168 168
168 168 168 168 169
169 169 169 170 170
170 171 172 184 185
i. Carefully construct, draw and label a histogram of these data on the
graph paper provided.
ii. Find the median height among these women and the upper quartile.
What percentage of women were below 165 cm?
iii. Comment on the data given the shape of the histogram without doing
any further calculations.
iv. Name two other types of graphical displays that would be suitable to
represent the data.
[13 marks]
(b) A random sample of 9 people tried a specific diet that lasted 2 months
to lose weight. The weights of these people, measured in kilograms, were
measured both at the beginning and the end of the diet, and are shown in
the table below:
Weight before diet Weight after diet
75 73
76 72
90 92
92 93
89 89
63 61
65 62
80 76
90 84
i. Carry out an appropriate hypothesis test to determine whether the diet
is effective in helping people lose weight. State the test hypotheses, and
specify your test statistic and its distribution under the null hypothesis.
Comment on your findings.
ii. State any assumptions you made in i.
iii. Give a 90% confidence interval for the difference between the means of
the weights before and after the diet.
[12 marks]
4. The director of a local Tourism Authority would like to know whether a family’s
annual expenditure on recreation (y), measured in $000s, is related to their
annual income (x), also measured in $000s. In order to explore this potential
relationship, the variables x and y were recorded for 10 randomly selected
families that visited the area last year. The results were as follows:
Family #1 #2 #3 #4 #5 #6 #7 #8 #9 #10
x 41.2 50.1 52.0 62.0 44.5 37.7 73.5 37.5 56.7 65.2
y 2.4 2.7 2.8 8.0 3.1 2.1 12.1 2.0 3.9 8.9
The summary statistics for these data are:
Sum of x data: 520.4 Sum of the squares of x data: 28431.42
Sum of y data: 48 Sum of the squares of y data: 343.74
Sum of the products of x and y data: 2858.63
(a) i. Draw a scatter diagram of these data on the graph paper provided.
Label the diagram carefully.
ii. Calculate the sample correlation coefficient. Interpret your findings.
iii. Calculate the least squares line of y on x and draw the line on the
scatter diagram.
iv. Do you find the analyses in ii. and iii. appropriate? Justify your
answer and suggest any alternative ways to model the relationship
between x and y.
[13 marks]
(b) The fuel consumption of two different car models (A and B) was compared
in the following way. A random sample of 20 cars from model A and 35 cars
from model B were taken and the fuel consumption (in miles per gallon)
was measured for each car. The results are summarised in the table below.
END OF PAPER
ST104a Statistics 1
Examination Formula Sheet

z test of hypothesis for a single mean (σ known):
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

t test of hypothesis for a single mean (σ unknown):
$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$

z test of hypothesis for a single proportion:
$$Z \cong \frac{P - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}}$$

z test for the difference between two means (variances known):
$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$

t test for the difference between two means (variances unknown):
$$T = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{S_p^2(1/n_1 + 1/n_2)}}$$

Confidence interval endpoints for the difference between two means:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\, n_1+n_2-2} \cdot s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Confidence interval endpoints for the difference in means in paired samples:
$$\bar{x}_d \pm t_{\alpha/2,\, n-1} \cdot \frac{s_d}{\sqrt{n}}$$

z test for the difference between two proportions:
$$Z = \frac{(P_1 - P_2) - (\pi_1 - \pi_2)}{\sqrt{P(1 - P)(1/n_1 + 1/n_2)}}$$

$$\sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

$$r = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right)}}$$

$$a = \bar{y} - b\bar{x}$$
Dennis V. Lindley, William F. Scott, New Cambridge Statistical Tables, (1995) © Cambridge University Press, reproduced with permission.
ST104A_ZA_2016_d0
Statistics 1
A list of formulae and extracts from statistical tables are provided after the final question
on this paper.
Graph paper is provided at the end of this question paper. If used, it must be detached
and fastened securely inside the answer book.
A calculator may be used when answering questions on this paper and it must comply
in all respects with the specification given with your Admission Notice. The make and
type of machine must be clearly stated on the front cover of the answer book.
SECTION A
1. (a) A random sample of athletes’ times to run 200 metres has a sample mean of
24.96 seconds. State the units of measurements for the summaries below and
justify your answers.
i. sample variance
ii. sample standard deviation.
[4 marks]
i. $\sum_{i=2}^{i=4} x_i^2$   ii. $\sum_{i=1}^{i=3} 3x_i y_i$   iii. $y_3^3 + \sum_{i=4}^{i=5} \frac{y_i^4}{x_i}$.
[6 marks]
(e) The random variable X takes the values 0, 1 and 3 according to the following
probability distribution:
x 0 1 3
pX (x) 0.4 k k
(g) A museum conducts a survey of its visitors in order to assess the popularity
of a device which is used to provide information on the museum exhibits. The
device will be withdrawn if fewer than 25% of all of the museum’s visitors make
use of it. Of a random sample of 100 visitors, 20 chose to use the device.
i. Carry out an appropriate hypothesis test at the 5% significance level to
see if the device should be withdrawn and state your conclusions.
ii. Calculate the p-value of the test.
[7 marks]
(h) State whether the following are true or false and give a brief explanation. (No
marks will be awarded for a simple true/false answer.)
i. The range of a sample is influenced by extreme values.
ii. A sampling distribution is the probability distribution of a population
parameter.
iii. A sample correlation coefficient close to −1 indicates a strong negative
linear relationship between two categorical variables.
iv. A p-value of 0.007 represents a weakly significant hypothesis test result.
v. Failure to reject a null hypothesis might indicate that a Type I error has
been committed.
vi. A stratified random sample is the random equivalent of a convenience
sample.
[12 marks]
SECTION B
Answer two out of the three questions from this section (25 marks each).
(b) i. Describe how quota sampling is performed and explain how it differs
from stratified random sampling.
ii. A company producing handheld electronic devices (tablets, mobile
phones etc.) wants to understand how men and women rate its
products. For this reason, the company’s management has decided to
use a survey of its customers and has asked you to devise an
appropriate random sampling scheme. Outline the key components
of your sampling scheme.
[11 marks]
3. (a) A policeman recorded the speed of 30 cars on a road with a 30 miles per
hour speed limit. The recorded data are shown below:
25.6 25.7 25.7 25.8 25.8
26.2 26.9 27.5 27.7 27.8
27.9 27.9 28.3 28.4 28.5
28.8 28.9 28.9 29.0 29.1
29.2 29.3 29.5 29.7 29.8
30.1 30.1 30.2 36.2 36.9
i. Carefully construct, draw and label a histogram of these data on the
graph paper provided.
ii. Find the median speed among these cars and the upper quartile. What
percentage of drivers were exceeding the 30 miles per hour speed limit?
iii. Comment on the data given the shape of the histogram without doing
any further calculations.
iv. Name two other types of graphical displays that would be suitable to
represent the data.
[13 marks]
END OF PAPER
Important note
This commentary reflects the examination and assessment arrangements for this course in the
academic year 2015–16. The format and structure of the examination may change in future years,
and any such changes will be publicised on the virtual learning environment (VLE).
Unless otherwise stated, all cross-references will be to the latest version of the subject guide (2014).
You should always attempt to use the most recent edition of any Essential reading textbook, even if
the commentary and/or online reading list and/or subject guide refer to an earlier edition. If
different editions of Essential reading are listed, please check the VLE for reading supplements – if
none are available, please use the contents list and index of the new edition to find the relevant
section.
General remarks
Learning outcomes
At the end of the course and having completed the Essential reading and activities you should:
• be familiar with the key ideas of statistics that are accessible to a student with a moderate
mathematical competence
• be able to routinely apply a variety of methods for explaining, summarising and presenting
data and interpreting results clearly using appropriate diagrams, titles and labels when
required
• be able to summarise the ideas of randomness and variability, and the way in which these
link to probability theory to allow the systematic and logical collection of statistical
techniques of great practical importance in many applied areas
• have a grounding in probability theory and some grasp of the most common statistical
methods
• be able to perform inference to test the significance of common measures such as means and
proportions and conduct chi-square tests of contingency tables
• be able to use simple linear regression and correlation analysis and know when it is
appropriate to do so.
You have two hours to complete this paper, which is in two parts. The first part, Section A, is
compulsory; it covers several subquestions and accounts for 50 per cent of the total marks.
Section B contains three questions, each worth 25 per cent, from which you are asked to choose two.
Remember that each of the Section B questions is likely to cover more than one topic. In 2016, for
example, the first part of Question 2 asked for a chi-squared test and survey design problems
appeared in the second part. Question 3 had a series of questions involving drawing diagrams, such
as histograms, hypothesis testing, in particular paired sample t tests, and confidence intervals. The
first part of Question 4 was on linear regression and involved drawing a diagram, while the second
part was a hypothesis test comparing population means using the sample data given. This means
that it is really important that you make sure you have a reasonable idea of what topics are covered
before you start work on the paper! We suggest you divide your time as follows during the
examination.
• Spend the first 10 minutes annotating the paper. Note the topics covered in each question
and subquestion.
• Allow yourself 45 minutes for Section A. Do not allow yourself to get stuck on any one
question, but do not just give up after two minutes!
• Once you have chosen your two Section B questions, give them about 25 minutes each.
• This leaves you with 15 minutes. Do not leave the examination hall at this point! Check
over any questions you may not have completely finished. Make sure you have labelled and
given a title to any tables or diagrams which were required and, if you did more than the
two questions required in Section B, decide which one to delete. Remember that only two of
your answers will be given credit in Section B and that you must choose which these are!
The examiners are looking for very simple demonstrations from you. They want to be sure that you:
• have covered the syllabus as described and explained in the subject guide
• know the basic formulae given there and when and how to use them
• understand and answer the questions set.
You are not expected to write long essays where explanations or descriptions of sample design
are required, and note-form answers are acceptable. However, clear and accurate language, both
mathematical and written, is expected and marked. The explanations below and in the specific
Examiners’ commentaries for the papers for each zone should make these requirements clear.
The most important thing you can do is answer the question set! This may sound very simple, but
these are some of the things that candidates did not do, though asked, in the 2016 examinations!
Remember the following.
• If you are asked to label a diagram (which is almost always the case!), please do so. Writing
‘Histogram’ or ‘Stem-and-leaf diagram’ in itself is insufficient. What do the data describe?
What are the units? What are the x-axis and y-axis?
• If you are specifically asked to carry out a hypothesis test, or to calculate a confidence
interval, do so. It is not acceptable to do one rather than the other! If you are asked to
find a 5% critical value, this is what will be marked.
• Do not waste time calculating things which are not required by the examiners. If you are
asked to find the line of best fit, you will get no marks if you calculate the correlation
coefficient as well. If you are asked to use the confidence interval you have just calculated to
comment on the results, carrying out an additional hypothesis test will not gain you marks.
How should you use the specific comments on each question given in the
Examiners’ commentaries?
We hope that you find these useful. For each question and subquestion, they give:
• further guidance for each question on the points made in the last section
• the answers, or keys to the answers, which the examiners were looking for
• the relevant detailed reference to P. Newbold, W.L. Carlson and B.M. Thorne Statistics for
business and economics. (London: Prentice–Hall, 2012) eighth edition [ISBN
9780273767060] and the subject guide
• where appropriate, suggested activities from the subject guide which should help you to
prepare, and similar questions from Newbold (2012).
Any further references you might need are given in the part of the subject guide to which you are
referred for each answer.
It was noted recently that a small number of candidates appeared to be memorising answers from
previous years’ Examiners’ commentaries, for example plots, and produced the exact same image of
them without looking at the current year’s examination paper questions! Note that this is very easy
to spot. The Examiners’ commentaries should be used as a guide to practise on sample examination
questions and it is pointless to attempt to memorise them.
Many candidates are disappointed to find that their examination performance is poorer than they
expected. This may be due to a number of reasons. The Examiners’ commentaries suggest ways of
addressing common problems and improving your performance. One particular failing is ‘question
spotting’, that is, confining your examination preparation to a few questions and/or topics which
have come up in past papers for the course. This can have serious consequences.
We recognise that candidates may not cover all topics in the syllabus in the same depth, but you
need to be aware that the examiners are free to set questions on any aspect of the syllabus. This
means that you need to study enough of the syllabus to enable you to answer the required number of
examination questions.
The syllabus can be found in the Course information sheet in the section of the VLE dedicated to
each course. You should read the syllabus carefully and ensure that you cover sufficient material in
preparation for the examination. Examiners will vary the topics and questions from year to year and
may well set questions that have not appeared in past papers. Examination papers may legitimately
include questions on any topic in the syllabus. So, although past papers can be helpful during your
revision, you cannot assume that topics or specific questions that have come up in past examinations
will occur again.
If you rely on a question-spotting strategy, it is likely you will find yourself in difficulties
when you sit the examination. We strongly advise you not to adopt this strategy.
Important note
This commentary reflects the examination and assessment arrangements for this course in the
academic year 2015–16. The format and structure of the examination may change in future years,
and any such changes will be publicised on the virtual learning environment (VLE). Note that in
what follows the symbol • corresponds to 1 mark unless stated otherwise.
Unless otherwise stated, all cross-references will be to the latest version of the subject guide (2014).
You should always attempt to use the most recent edition of any Essential reading textbook, even if
the commentary and/or online reading list and/or subject guide refer to an earlier edition. If
different editions of Essential reading are listed, please check the VLE for reading supplements – if
none are available, please use the contents list and index of the new edition to find the relevant
section.
Candidates should answer THREE of the following FOUR questions: QUESTION 1 of Section
A (50 marks) and TWO questions from Section B (25 marks each).
Section A
Question 1
(a) A random sample of the heights of buildings has a sample mean of 24.96 metres.
State the units of measurements for the summaries below and justify your
answers.
i. sample variance
ii. sample standard deviation.
(4 marks)
(6 marks)
Sample examination questions are quite relevant. For the first part of the question it is
essential to check Section 6.9 of the subject guide.
Approaching the question
The first part just requires knowledge of the fact that if X is a normal random variable with
mean µ and variance σ², the sample mean from a sample of size n, X̄, is also a normal
random variable with mean µ and variance σ²/n. Direct application of this fact then yields
that:
\[ \bar{X} \sim N\left(138, \frac{(21)^2}{25}\right) = N(138, 17.64). \]
For the second part, the basic property of the normal random variable for this question is
that if X ∼ N(µ, σ²), then Z = (X − µ)/σ ∼ N(0, 1). Note also that:
* P (Z < a) = P (Z ≤ a) = Φ(a)
* P (Z > a) = P (Z ≥ a) = 1 − P (Z ≤ a) = 1 − P (Z < a) = 1 − Φ(a)
* P (a < Z < b) = P (a ≤ Z < b) = P (a < Z ≤ b) = P (a ≤ Z ≤ b) = Φ(b) − Φ(a).
The above is all you need to find the requested proportion. We can write:
\[
\begin{aligned}
P(\bar{X} < 128) &= P\left(Z < \frac{128 - 138}{\sqrt{17.64}}\right) \\
&= P(Z < -2.38) \\
&= 1 - \Phi(2.38) \\
&= 1 - 0.99134 \\
&= 0.00866.
\end{aligned}
\]
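The calculation above can be checked numerically. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the commentary. It evaluates the probability both with the unrounded z-value and with z rounded to two decimal places, as one would when reading statistical tables.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the worked solution: mean 138, standard deviation 21, n = 25.
mu, sigma, n = 138, 21, 25

# Variance of the sample mean: sigma^2 / n.
var_mean = sigma**2 / n
xbar_dist = NormalDist(mu, sqrt(var_mean))

# Probability using the unrounded z-value (-10 / 4.2).
p_exact = xbar_dist.cdf(128)

# Probability using z rounded to -2.38, as done with printed tables.
p_tables = NormalDist().cdf(-2.38)

print(round(var_mean, 2), round(p_exact, 5), round(p_tables, 5))
```

The two probabilities differ only slightly; the small gap comes from rounding z before looking up the table.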
iii. Each colour (black, white, red, etc.) is a category. Also, there is no natural ordering
between the colours, for example we cannot really say that ‘blue is higher than red’. This
is therefore a categorical nominal variable.
iv. Measurable because exchange rates are quoted to several decimal places, for example
US$1.45 to the £.
Weak candidates did not provide a justification for their choices, described measurable
variables as nominal or categorical, and sometimes answered ordinal when their justification
was pointing to a nominal variable. There were also phrases like ‘It is measurable because it
can be measured’ that were not awarded any marks.
(e) The random variable X takes the values 0, 1 and 4 according to the following
probability distribution:
x 0 1 4
pX (x) 0.2 k k
i. Determine the constant k.
ii. Find E(X), the expected value of X.
iii. Find Var(X), the variance of X.
(5 marks)
ii. We have:
\[ P(S \mid F) = \frac{P(F \mid S)\,P(S)}{P(F)} = \frac{0.025}{0.044} = \frac{25}{44} = 0.5682. \]
(g) A museum conducts a survey of its visitors in order to assess the popularity of a
device which is used to provide information on the museum exhibits. The
device will be withdrawn if fewer than 20% of all of the museum’s visitors make
use of it. Of a random sample of 100 visitors, 15 chose to use the device.
i. Carry out an appropriate hypothesis test at the 5% significance level to see if
the device should be withdrawn and state your conclusions.
ii. Calculate the p-value of the test.
(7 marks)
Note: The last three marks of the first part can also be awarded by correct use of the
p-value, see below.
• The p-value is higher than α = 0.05.
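As an illustration of how the p-value in this question could be obtained, here is a minimal sketch using the Python standard library. It assumes the hypotheses H0: π = 0.20 against H1: π < 0.20 and the usual normal approximation for a sample proportion; these assumptions and the variable names are not stated in the surviving text of the commentary.

```python
from math import sqrt
from statistics import NormalDist

pi0 = 0.20          # hypothesised proportion under H0 (assumed form of the test)
n, used = 100, 15   # sample size and number of visitors who used the device
p_hat = used / n

# Standard error of the sample proportion, evaluated under H0.
se = sqrt(pi0 * (1 - pi0) / n)
z = (p_hat - pi0) / se

# Lower-tailed p-value, since H1 is pi < 0.20.
p_value = NormalDist().cdf(z)

# 5% lower-tail critical value of the standard normal distribution.
critical = NormalDist().inv_cdf(0.05)
reject = z < critical

print(round(z, 2), round(p_value, 4), reject)
```

Under these assumptions the p-value exceeds 0.05, which agrees with the note above: there is insufficient evidence to conclude that fewer than 20% of visitors use the device.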
(h) State whether the following are true or false and give a brief explanation. (Note
that no marks will be awarded for a simple true/false answer.)
(12 marks)
i. False. The interquartile range of a sample is defined as the range of the central 50% of
the values in a dataset, so any extreme values would lie below the lower quartile and/or
above the upper quartile.
ii. False. A sampling distribution is the probability distribution of a sample statistic.
iii. False. A value of r close to 1 indicates a strong, positive linear relationship between two
measurable (continuous) variables.
iv. False. A p-value less than 0.01 represents a highly significant hypothesis test result;
0.08 is merely weakly significant.
v. False. Rejection of a true null hypothesis might indicate that a Type I error has been
committed.
vi. False. A quota sample is the non-random equivalent of a stratified random sample.
Section B
Answer two out of the three questions from this section (25 marks each).
Question 2
7.75 92.25
7.75 92.25
7.75 92.25
7.75 92.25
which gives a value of 13.53. This is a 4 × 2 contingency table, so the degrees of freedom
are (4 − 1) × (2 − 1) = 3.
For α = 0.05, the critical value is 7.815, hence we reject H0 .
We conclude that there is evidence of an association between machine number and the
component being faulty.
Many candidates looked up the tables incorrectly and so failed to follow through their
earlier accurate work.
(b) i. Describe how stratified random sampling is performed and explain how it
differs from quota sampling.
ii. A company producing handheld electronic devices (tablets, mobile phones
etc.) wants to understand how people of different ages rate its products. For
this reason, the company’s management has decided to use a survey of its
customers and has asked you to devise an appropriate random sampling
scheme. Outline the key components of your sampling scheme.
(11 marks)
Question 3
(a) The data below represent heights, measured in centimetres, of women from an
adult female population:
162 164 164 165 165
166 166 166 167 167
167 167 167 168 168
168 168 168 168 169
169 169 169 170 170
170 171 172 184 185
i. Carefully construct, draw and label a histogram of these data on the graph
paper provided.
ii. Find the median height among these women and the upper quartile. What
percentage of women were below 165 cm?
iii. Comment on the data given the shape of the histogram without doing any
further calculations.
iv. Name two other types of graphical displays that would be suitable to
represent the data.
(13 marks)
[Figure: Histogram of Heights. Vertical axis: frequency densities.]
ii. • Median: 168 centimetres. Note: Raw data should be used, not grouped data. Also,
make sure to mention the units to get the full marks.
• Upper quartile: 169 centimetres. Note: Same as above.
• Percentage: 3/30 = 10%. Note: As the question asks for a percentage, make sure to
report 10%, not just 3/30 or anything else.
iii. Based on the shape of the histogram, we can see that the distribution of the data is
positively skewed. Also two women, with heights of 184 cm and 185 cm, may be regarded
as outliers. Note: It is important to identify the specific outliers (184 cm and 185 cm)
not just write ‘there are two outliers’.
iv. A boxplot, stem-and-leaf diagram or dot plot are other types of suitable graphical
displays. The reason for that is that the variable height is measurable and these graphs
are suitable for displaying the distribution of such variables.
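The summary values in part ii. can be reproduced from the raw data. The sketch below uses the Python standard library; the "inclusive" quartile method is an assumption chosen because it matches the answer given here, and other quartile conventions can give slightly different values for other datasets.

```python
from statistics import median, quantiles

# Heights in centimetres, as listed in the question.
heights = [
    162, 164, 164, 165, 165,
    166, 166, 166, 167, 167,
    167, 167, 167, 168, 168,
    168, 168, 168, 168, 169,
    169, 169, 169, 170, 170,
    170, 171, 172, 184, 185,
]

med = median(heights)

# Quartiles by linear interpolation over the sorted raw data.
q1, q2, q3 = quantiles(heights, n=4, method="inclusive")

# Percentage of women strictly below 165 cm.
pct_below_165 = 100 * sum(h < 165 for h in heights) / len(heights)

print(med, q3, pct_below_165)
```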
(b) A random sample of 9 people tried a specific diet that lasted 2 months to lose
weight. The weights of these people, measured in kilograms, were measured
both at the beginning and the end of the diet, and are shown in the table below:
Weight before diet Weight after diet
75 73
76 72
90 92
92 93
89 89
63 61
65 62
80 76
90 84
i. Carry out an appropriate hypothesis test to determine whether the diet is
effective in helping people lose weight. State the test hypotheses, and specify
your test statistic and its distribution under the null hypothesis. Comment
on your findings.
ii. State any assumptions you made in i.
iii. Give a 90% confidence interval for the difference between the means of the
weights before and after the diet.
(12 marks)
−2 −4 2 1 0 −2 −3 −4 −6
The next step is to calculate sd = 2.598 and x̄d = −2.0, in order to obtain the value of
the test statistic:
\[ t = \frac{\bar{x}_d - 0}{s_d/\sqrt{n}} = -2.309. \]
We have a t distribution with 8 degrees of freedom, hence the critical value (for a
one-sided test) is −1.860. Note: This is clearly a t distribution; make sure not to use the
standard normal distribution.
Hence, we reject H0 at the 5% significance level. Testing at the 1% significance level
gives a critical value of t8, 0.99 = −2.896, so at that level we do not reject H0. We
conclude that there is moderate evidence that the diet is effective.
ii. • Differences are normally distributed.
• Pairs of observations are independent.
iii. This is a standard exercise for confidence intervals given the appropriate formula from
the formula sheet (make sure to be able to recognise it). The requested confidence
interval is (−3.610, −0.390).
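The paired-sample calculations for this question can be sketched as follows, using only the Python standard library. The differences are taken as after minus before, matching the signs used in the solution, and the critical value 1.860 is the tabulated value quoted in the commentary rather than something computed here. Variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Weights in kilograms from the question.
before = [75, 76, 90, 92, 89, 63, 65, 80, 90]
after = [73, 72, 92, 93, 89, 61, 62, 76, 84]

# Paired differences (after - before); negative values indicate weight loss.
d = [a - b for a, b in zip(after, before)]
n = len(d)

d_bar = mean(d)
s_d = stdev(d)            # sample standard deviation (divisor n - 1)
se = s_d / sqrt(n)
t_stat = d_bar / se

# Upper 5% point of the t distribution with 8 degrees of freedom (from tables).
t_crit = 1.860
reject_5pct = t_stat < -t_crit

# 90% confidence interval for the mean difference uses the same tabulated value.
ci_90 = (d_bar - t_crit * se, d_bar + t_crit * se)

print(d, round(s_d, 3), round(t_stat, 3), reject_5pct)
print(tuple(round(v, 3) for v in ci_90))
```

The interval endpoints agree with the commentary to within rounding of the intermediate values.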
Question 4
(a) The director of a local Tourism Authority would like to know whether a family’s
annual expenditure on recreation (y), measured in $000s, is related to their
annual income (x), also measured in $000s. In order to explore this potential
relationship, the variables x and y were recorded for 10 randomly selected
families that visited the area last year. The results were as follows:
Week #1 #2 #3 #4 #5 #6 #7 #8 #9 #10
x 41.2 50.1 52.0 62.0 44.5 37.7 73.5 37.5 56.7 65.2
y 2.4 2.7 2.8 8.0 3.1 2.1 12.1 2.0 3.9 8.9
The summary statistics for these data are:
Sum of x data: 520.4 Sum of the squares of x data: 28431.42
Sum of y data: 48 Sum of the squares of y data: 343.74
Sum of the products of x and y data: 2858.63
i. Draw a scatter diagram of these data on the graph paper provided. Label the
diagram carefully.
ii. Calculate the sample correlation coefficient. Interpret your findings.
iii. Calculate the least squares line of y on x and draw the line on the scatter
diagram.
iv. Do you find the analyses in ii. and iii. appropriate? Justify your answer and
suggest any alternative ways to model the relationship between x and y.
(13 marks)
[Figure: scatter diagram. Vertical axis: annual family recreation expenditure in $000s.]
(b) The fuel consumption of two different car models (A and B) was compared in
the following way. A random sample of 20 cars from model A and 35 cars from
model B were taken and the fuel consumption (in miles per gallon) was
measured for each car. The results are summarised in the table below.
Sample size Sample mean Sample standard deviation
Car Model A 20 30.9 6.11
Car Model B 35 27.1 6.41
H0 : µA = µB vs. H1 : µA > µB .
The test statistic formulae, depending on whether a pooled variance is used or not, are
provided in the formula sheet:
\[ \frac{\bar{x} - \bar{y}}{\sqrt{s_A^2/n_A + s_B^2/n_B}} \quad \text{or} \quad \frac{\bar{x} - \bar{y}}{\sqrt{s_p^2\,(1/n_A + 1/n_B)}}. \]
If equal variances are assumed, the test statistic value is 2.150 (the pooled variance is
39.74). If equal variances are not assumed the test statistic value is 2.179.
Since the variances are unknown and the sample size is not large enough, the t50
distribution is being used. The critical value at the 5% significance level is 1.676, hence
we reject the null hypothesis. If we take a (smaller) α of 1%, the critical value is 2.390,
so we do not reject H0 . We conclude that there is moderate evidence of a difference in
the mean fuel consumption between the car models.
ii. The assumptions for i. were the following.
• Assumption about equal variances.
• Assumption about whether nA + nB is ‘large’ so that the normality assumption is
satisfied.
• Assumption about independent samples.
Some candidates stated assumptions in this part that were not made in part i. Marks
were not awarded in such cases. Also some other candidates just copied the phrase
‘assumption about equal variances’ and naturally were not awarded any marks. One
should state whether the calculations were based on the assumption that the unknown
variances are equal or unequal.
iii. Based on the t50 distribution and using the correct formula from the formula sheet (make
sure to be able to recognise it) the requested 95% confidence interval is (0.251, 7.349).
Note: In the solution above, the t50 distribution was used but the use of the standard
normal distribution is also justified as the sample size is relatively large. Hence a solution
based on the standard normal distribution is also acceptable.
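The figures quoted in this solution can be reproduced from the summary statistics. The following sketch uses the Python standard library; the variable names are illustrative, and the value 2.0086 for the upper 2.5% point of the t50 distribution is an assumption, since the commentary does not state which tabulated value was used for the interval (printed tables usually give 2.009).

```python
from math import sqrt

# Summary statistics from the question.
n_a, mean_a, s_a = 20, 30.9, 6.11
n_b, mean_b, s_b = 35, 27.1, 6.41

diff = mean_a - mean_b

# Pooled variance and test statistic (equal variances assumed).
sp2 = ((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / (n_a + n_b - 2)
se_pooled = sqrt(sp2 * (1 / n_a + 1 / n_b))
t_pooled = diff / se_pooled

# Test statistic without assuming equal variances.
se_unpooled = sqrt(s_a**2 / n_a + s_b**2 / n_b)
t_unpooled = diff / se_unpooled

# Assumed upper 2.5% point of t with 50 degrees of freedom.
t_crit_95 = 2.0086
ci_95 = (diff - t_crit_95 * se_pooled, diff + t_crit_95 * se_pooled)

print(round(sp2, 2), round(t_pooled, 3), round(t_unpooled, 3))
print(tuple(round(v, 3) for v in ci_95))
```

Both test statistics exceed the 5% critical value of 1.676 quoted above, so the conclusion does not depend on which variance assumption is made.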
Important note
This commentary reflects the examination and assessment arrangements for this course in the
academic year 2015–16. The format and structure of the examination may change in future years,
and any such changes will be publicised on the virtual learning environment (VLE). Note that in
what follows the symbol • corresponds to 1 mark unless stated otherwise.
Unless otherwise stated, all cross-references will be to the latest version of the subject guide (2014).
You should always attempt to use the most recent edition of any Essential reading textbook, even if
the commentary and/or online reading list and/or subject guide refer to an earlier edition. If
different editions of Essential reading are listed, please check the VLE for reading supplements – if
none are available, please use the contents list and index of the new edition to find the relevant
section.
Candidates should answer THREE of the following FOUR questions: QUESTION 1 of Section
A (50 marks) and TWO questions from Section B (25 marks each).
Section A
Question 1
(a) A random sample of athletes’ times to run 200 metres has a sample mean of
24.96 seconds. State the units of measurements for the summaries below and
justify your answers.
i. sample variance
ii. sample standard deviation.
(4 marks)
(6 marks)
iv. Measurable because inflation rates are quoted to several decimal places, for example
1.50%.
Weak candidates did not provide a justification for their choices, described measurable
variables as nominal or categorical, and sometimes answered ordinal when their justification
was pointing to a nominal variable. There were also phrases like ‘It is measurable because it
can be measured’ that were not awarded any marks.
(e) The random variable X takes the values 0, 1 and 3 according to the following
probability distribution:
x 0 1 3
pX (x) 0.4 k k
i. Determine the constant k.
ii. Find E(X), the expected value of X.
iii. Find Var(X), the variance of X.
(5 marks)
An alternative method to find the variance is through the formula
\( \sum_i (x_i - \mu)^2\, p(x_i) \), where µ is found in part ii.
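To illustrate the two routes to the variance for the distribution in this question, here is a minimal Python sketch. It is worked directly from the probability table in the question (values 0, 1 and 3 with probabilities 0.4, k and k); the variable names are illustrative, and the results are computed here rather than quoted from the commentary.

```python
# Values and probabilities from the question; the probabilities must sum to 1.
values = [0, 1, 3]
p0 = 0.4
k = (1 - p0) / 2          # 0.4 + k + k = 1
probs = [p0, k, k]

# Expected value: sum of x * p(x).
mu = sum(x * p for x, p in zip(values, probs))

# Variance via E(X^2) - (E(X))^2.
var_shortcut = sum(x**2 * p for x, p in zip(values, probs)) - mu**2

# Variance via the alternative formula: sum of (x - mu)^2 * p(x).
var_direct = sum((x - mu) ** 2 * p for x, p in zip(values, probs))

print(round(k, 4), round(mu, 4), round(var_shortcut, 4), round(var_direct, 4))
```

Both formulas give the same variance, which is a useful check in the examination.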
ii. We have:
\[ P(S \mid F) = \frac{P(F \mid S)\,P(S)}{P(F)} = \frac{0.05}{0.077} = \frac{50}{77} = 0.6494 \approx 0.65. \]
(g) A museum conducts a survey of its visitors in order to assess the popularity of a
device which is used to provide information on the museum exhibits. The
device will be withdrawn if fewer than 25% of all of the museum’s visitors
make use of it. Of a random sample of 100 visitors, 20 chose to use the device.
i. Carry out an appropriate hypothesis test at the 5% significance level to see if
the device should be withdrawn and state your conclusions.
ii. Calculate the p-value of the test.
(7 marks)
Note: The last three marks of the first part can also be awarded by correct use of the
p-value, see below.
• The p-value is higher than α = 0.05.
(h) State whether the following are true or false and give a brief explanation. (Note
that no marks will be awarded for a simple true/false answer.)
(12 marks)
i. True. The range is defined as x(n) − x(1) , so any extreme values would be x(1) and/or
x(n) , hence influencing the range.
ii. False. A sampling distribution is the probability distribution of a sample statistic.
iii. False. A value of r close to −1 indicates a strong, negative linear relationship between
two measurable (continuous) variables.
iv. False. A p-value of 0.007 represents a highly significant hypothesis test result. Weakly
significant means a p-value between 0.05 and 0.10.
v. False. Failure to reject a null hypothesis might indicate that a Type II error has been
committed.
vi. False. A quota sample is the non-random equivalent of a stratified random sample.
Section B
Answer two out of the three questions from this section (25 marks each).
Question 2
(a) A sample consisting of 400 randomly selected students was classified in terms of
personality type (introvert or extrovert) and in terms of their favourite colour
(red, yellow, green or blue). Their responses are summarised in the table below:
Personality type
Introvert Extrovert Total
Red 32 68 100
Yellow 26 74 100
Green 21 79 100
Blue 46 54 100
Total 125 275 400
i. Based on the data in the table, and without conducting any significance test,
would you say there is an association between the student’s type of
personality and colour preference?
ii. Calculate the χ2 statistic and use it to test for independence, using a 5%
significance level. What do you conclude?
(14 marks)
31.25 68.75
31.25 68.75
31.25 68.75
31.25 68.75
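The expected frequencies listed above (31.25 and 68.75 in each row) and the resulting test statistic can be sketched in Python as follows. The code uses the observed table from the question and the standard formula, expected = row total × column total / grand total. The 5% critical value of 7.815 for 3 degrees of freedom is taken from statistical tables; the variable names are illustrative.

```python
# Observed counts from the question: (introvert, extrovert) for each colour.
observed = {
    "Red": (32, 68),
    "Yellow": (26, 74),
    "Green": (21, 79),
    "Blue": (46, 54),
}

row_totals = {colour: sum(counts) for colour, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]
grand_total = sum(row_totals.values())

# Expected counts under independence.
expected = {
    colour: tuple(row_totals[colour] * col_totals[j] / grand_total for j in range(2))
    for colour in observed
}

# Chi-squared statistic: sum over all cells of (O - E)^2 / E.
chi2 = sum(
    (observed[colour][j] - expected[colour][j]) ** 2 / expected[colour][j]
    for colour in observed
    for j in range(2)
)

df = (len(observed) - 1) * (2 - 1)   # (rows - 1) x (columns - 1)
critical = 7.815                     # tabulated 5% critical value for 3 df
reject = chi2 > critical

print(expected["Red"], round(chi2, 3), df, reject)
```

Under this sketch the statistic exceeds the critical value, which would lead to rejecting independence between personality type and colour preference at the 5% level.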
(b) i. Describe how quota sampling is performed and explain how it differs from
stratified random sampling.
ii. A company producing handheld electronic devices (tablets, mobile phones
etc.) wants to understand how men and women rate its products. For this
reason, the company’s management has decided to use a survey of its
customers and has asked you to devise an appropriate random sampling
scheme. Outline the key components of your sampling scheme.
(11 marks)
Question 3
(a) A policeman recorded the speed of 30 cars on a road with a 30 miles per hour
speed limit. The recorded data are shown below:
25.6 25.7 25.7 25.8 25.8
26.2 26.9 27.5 27.7 27.8
27.9 27.9 28.3 28.4 28.5
28.8 28.9 28.9 29.0 29.1
29.2 29.3 29.5 29.7 29.8
30.1 30.1 30.2 36.2 36.9
i. Carefully construct, draw and label a histogram of these data on the graph
paper provided.
ii. Find the median speed among these cars and the upper quartile. What
percentage of drivers were exceeding the 30 miles per hour speed limit?
iii. Comment on the data given the shape of the histogram without doing any
further calculations.
iv. Name two other types of graphical displays that would be suitable to
represent the data.
(13 marks)
[Figure: Histogram of Speeds. Vertical axis: frequency densities.]
ii. • Median: 28.65 miles per hour. Note: Raw data should be used, not grouped data.
Also, make sure to mention the units to get the full marks.
• Upper quartile: 29.45 miles per hour. Note: Same as above.
• Percentage: 5/30 = 16.67%. Note: As the question asks for a percentage, make sure
to report 16.67% (17% is also fine), not just 5/30 or anything else.
iii. Based on the shape of the histogram, we can see that the distribution of the data is
positively skewed. Also two cars, with speeds 36.2 and 36.9 miles per hour, may be
regarded as outliers. Note: It is important to identify the specific outliers (36.2 and 36.9
miles per hour) not just write ‘there are two outliers’.
iv. A boxplot, stem-and-leaf diagram or dot plot are other types of suitable graphical
displays. The reason for that is that the variable speed is measurable and these graphs
are suitable for displaying the distribution of such variables.
2 4 −2 −1 0 2 3 4 6
The next step is to calculate sd = 2.598 and x̄d = 2.0, in order to obtain the value of the
test statistic:
\[ t = \frac{\bar{x}_d - 0}{s_d/\sqrt{n}} = 2.309. \]
We have a t distribution with 8 degrees of freedom, hence the critical value (for a
one-sided test) is 1.860. Note: This is clearly a t distribution; make sure not to use the
standard normal distribution.
Hence, we reject H0 at the 5% significance level. Testing at the 1% significance level
gives a critical value of t8, 0.01 = 2.896, so at that level we do not reject H0. We
conclude that there is moderate evidence that the special training is effective.
ii. • Differences are normally distributed.
• Pairs of observations are independent.
iii. This is a standard exercise for confidence intervals given the appropriate formula from
the formula sheet (make sure to be able to recognise it). The requested confidence
interval is (0.390, 3.610).
Question 4
(a) An insurance company wants to relate the amount of fire damage (y) in major
residential fires to the distance between the residence and the nearest fire
station (x). For this reason, a study was conducted in a large suburb of a major
city based on a sample of 10 recent fires in this suburb. For each of these fires,
the variables x and y were recorded and are shown in the table below:
Fire #1 #2 #3 #4 #5 #6 #7 #8 #9 #10
x 3.4 1.8 4.6 2.3 3.1 5.5 0.7 3.0 2.6 4.3
y 2.6 1.8 5.9 2.3 2.8 8.6 1.4 2.3 2.0 5.7
The summary statistics for these data are:
Sum of x data: 31.3 Sum of the squares of x data: 115.85
Sum of y data: 35.4 Sum of the squares of y data: 175.64
Sum of the products of x and y data: 138.08
i. Draw a scatter diagram of these data on the graph paper provided. Label the
diagram carefully.
ii. Calculate the sample correlation coefficient. Interpret your findings.
iii. Calculate the least squares line of y on x and draw the line on the scatter
diagram.
iv. Do you find the analyses in ii. and iii. appropriate? Justify your answer and
suggest any alternative ways to model the relationship between x and y.
(13 marks)
[Figure: scatter diagram. Vertical axis: amount of fire damage.]
ii. The summary statistics can be substituted into the formula for the sample correlation
coefficient (make sure you know which one it is!) to obtain the value 0.9093. An
interpretation of this value is the following: the data suggest that the greater the
distance of the residence from the nearest fire station, the higher the amount of fire
damage. The fact that the value is very close to 1 suggests that this is a strong, positive
linear relationship.
Many candidates did not mention all three words (strong, positive, linear). Note that all
of these words provide useful information on interpreting the relationship and are
therefore required to obtain full marks.
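The substitution described in part ii. can be checked with a short sketch. It uses only the summary statistics quoted in the question; the variable names are illustrative choices, not notation from the formula sheet.

```python
import math

# Summary statistics quoted in the question (n = 10 fires).
n = 10
sum_x, sum_y = 31.3, 35.4
sum_x2, sum_y2 = 115.85, 175.64
sum_xy = 138.08

x_bar, y_bar = sum_x / n, sum_y / n

# Corrected sums of squares and products.
s_xy = sum_xy - n * x_bar * y_bar
s_xx = sum_x2 - n * x_bar ** 2
s_yy = sum_y2 - n * y_bar ** 2

# Sample correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)
print(f"r = {r:.4f}")
```

The printed value should agree with the 0.9093 quoted above to four decimal places.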
iii. The regression line can be written as $\hat{y} = a + bx$ or $y = a + bx + \varepsilon$. The
formula for $b$ is:
\[ b = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} \]
and by substituting the summary statistics we get $b = 1.526$.
The formula for $a$ is $a = \bar{y} - b\bar{x}$, so we get $a = -1.235$.
Hence the regression line can be written as $\hat{y} = -1.235 + 1.526x$ or
$y = -1.235 + 1.526x + \varepsilon$. It should also be plotted on the scatter diagram.
Many candidates incorrectly reported the regression line as $y = -1.235 + 1.526x$. This
expression is incorrect because it has neither the hat nor the error term; one of the two
expressions above is required.
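As a check on the arithmetic in part iii., the following sketch computes the least squares slope and intercept from the summary statistics given in the question. The variable names are illustrative.

```python
# Summary statistics quoted in the question (n = 10 fires).
n = 10
sum_x, sum_y = 31.3, 35.4
sum_x2 = 115.85
sum_xy = 138.08

x_bar, y_bar = sum_x / n, sum_y / n

# Slope: b = (sum of x*y - n*x_bar*y_bar) / (sum of x^2 - n*x_bar^2).
b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)

# Intercept: a = y_bar - b*x_bar.
a = y_bar - b * x_bar

print(f"fitted line: y_hat = {a:.3f} + {b:.3f} x")
```

To three decimal places this should reproduce the slope 1.526 and intercept −1.235 reported in the commentary.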
iv. In this case, one can note in the scatter diagram that the points seem to be ‘scattered’
around a non-linear curve rather than a straight line. Another, equivalent, way to note
this is the presence of two outliers. Hence a linear regression model does not seem to be
a good model for the relationship between the amount of fire damage and the distance
from the nearest fire station. Alternative approaches may involve Spearman’s rank
correlation coefficient or transformations of the data, for example the log-transformation.
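One of the alternatives mentioned in part iv., Spearman's rank correlation, can be sketched using the ten (x, y) pairs from the question. This is a minimal illustration rather than a required examination method: it assumes tied values receive the average of their ranks and computes the ordinary correlation coefficient of the ranks. The helper names `average_ranks` and `pearson` are hypothetical.

```python
import math

# Data from the table in Question 4(a).
x = [3.4, 1.8, 4.6, 2.3, 3.1, 5.5, 0.7, 3.0, 2.6, 4.3]
y = [2.6, 1.8, 5.9, 2.3, 2.8, 8.6, 1.4, 2.3, 2.0, 5.7]


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(u, v):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


rho = pearson(average_ranks(x), average_ranks(y))
print(f"Spearman rank correlation = {rho:.4f}")
```

Because ranks depend only on the ordering of the data, this measure is unaffected by the curvature seen in the scatter diagram, provided the relationship is monotonic.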
(b) The 55 university students on a certain course were randomly assigned to two
class groups of size 30 and 25 students respectively. At the end of the year, all
students took the examination and their marks are summarised in the table
below.
Sample size Sample mean Sample standard deviation
Class Group 1 30 75.33 7.61
Class Group 2 25 71.40 6.37
$H_0: \mu_A = \mu_B$ vs. $H_1: \mu_A > \mu_B$.
The test statistic formulae, depending on whether a pooled variance is used or not, are
provided in the formula sheet:
\[ \frac{\bar{x} - \bar{y}}{\sqrt{s_A^2/n_A + s_B^2/n_B}} \quad \text{or} \quad \frac{\bar{x} - \bar{y}}{\sqrt{s_p^2\,(1/n_1 + 1/n_2)}}. \]
If equal variances are assumed, the test statistic value is 2.0511 (the pooled variance is
50.06). If equal variances are not assumed the test statistic value is 2.0848.
Since the variances are unknown and the samples are not large, a t distribution is used.
The pooled test has nA + nB − 2 = 53 degrees of freedom, so the nearest tabulated row, t50,
is taken. The critical value at the 5% significance level is 1.676, hence we reject the
null hypothesis. If we take a (smaller) α of 1%, the critical value is 2.403, so we do not
reject H0. We conclude that there is moderate evidence of a difference
between the mean examination marks of the two class groups.
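The two test statistic values quoted above can be reproduced from the summary table with a short sketch. The variable names are illustrative, and the 5% critical value is the tabulated t50 figure quoted in the commentary rather than something computed here.

```python
import math

# Summary statistics from the table in Question 4(b).
n_a, mean_a, sd_a = 30, 75.33, 7.61   # Class Group 1
n_b, mean_b, sd_b = 25, 71.40, 6.37   # Class Group 2

diff = mean_a - mean_b

# Pooled variance, used when equal variances are assumed.
pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
t_pooled = diff / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Statistic without the equal-variance assumption.
t_unpooled = diff / math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)

critical_5pct = 1.676  # tabulated value quoted in the commentary

print(f"pooled variance = {pooled_var:.2f}")
print(f"t (pooled)      = {t_pooled:.4f}")
print(f"t (unpooled)    = {t_unpooled:.4f}")
print("reject H0 at 5%:", t_pooled > critical_5pct)
```

Either version of the statistic exceeds the 5% critical value, so the decision at that level does not depend on which variance assumption is made.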
ii. The assumptions for i. were the following.
• Assumption about equal variances.
• Assumption about whether nA + nB is ‘large’ so that the normality assumption is
satisfied.
• Assumption about independent samples.
Some candidates stated assumptions in this part that were not made in part i. Marks
were not awarded in such cases. Also some other candidates just copied the phrase
‘assumption about equal variances’ and naturally were not awarded any marks. One
should state whether the calculations were based on the assumption that the unknown
variances are equal or unequal.
iii. Based on the t50 distribution and using the correct formula from the formula sheet (make
sure to be able to recognise it) the requested 95% confidence interval is (0.082, 7.778).
Note: In the solution above, the t50 distribution was used but the use of the standard
normal distribution is also justified as the sample size is relatively large. Hence a solution
based on the standard normal distribution is also acceptable.
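A sketch of the interval calculation in part iii. follows. It assumes the pooled-variance form of the interval from the formula sheet and takes 2.009 as the tabulated 2.5% upper-tail value of t50; both are assumptions about how the quoted interval was obtained, so agreement is only expected up to rounding.

```python
import math

# Summary statistics from the table in Question 4(b).
n_a, mean_a, sd_a = 30, 75.33, 7.61
n_b, mean_b, sd_b = 25, 71.40, 6.37

# Assumption: equal variances, so the pooled variance is used.
pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Assumption: tabulated t50 value for a 95% two-sided interval.
t_crit = 2.009

diff = mean_a - mean_b
lower = diff - t_crit * std_error
upper = diff + t_crit * std_error

print(f"95% CI for the difference in means: ({lower:.3f}, {upper:.3f})")
```

The endpoints should fall within a few thousandths of the interval (0.082, 7.778) reported above; replacing `t_crit` with 1.96 gives the standard normal version mentioned in the note.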
~~ST104A_ZA_2016_d0
Statistics 1
A list of formulae and extracts from statistical tables are provided after the final question
on this paper.
Graph paper is provided at the end of this question paper. If used, it must be detached
and fastened securely inside the answer book.
A calculator may be used when answering questions on this paper and it must comply
in all respects with the specification given with your Admission Notice. The make and
type of machine must be clearly stated on the front cover of the answer book.
SECTION A
(6 marks)
(b) Classify each one of the following variables as either measurable (continuous)
or categorical. If a variable is categorical, further classify it as either nominal or
ordinal. Justify your answer (no marks will be awarded without a justification).
i. Time spent in the previous week browsing the internet.
ii. Highest level of education obtained, i.e. no education, school education,
bachelor’s degree, master’s degree, doctorate.
iii. Country of residence.
iv. The rate of change of the human population.
(8 marks)
(c) The weights of a large population of animals have a mean of 7.3 kg and a
standard deviation of 1.9 kg.
i. Assuming that the weights are normally distributed, what is the
probability that a random selection of 40 animals from that population
will have a mean weight between 7.0 kg and 7.4 kg?
ii. A researcher stated that the probability you calculated is approximately
correct even if the distribution of the weights is not normal. Do you agree?
Justify your answer (no marks will be awarded without a justification).
(5 marks)
(d) The random variable X takes only the values 3, 5, 8 and 10 according to the
following probability distribution:
x 3 5 8 10
pX (x) k k k 2k
i. Determine the constant k and hence write down the probability
distribution of X.
ii. Find E(X), the expected value of X.
iii. Find Var(X), the variance of X.
(6 marks)
(f) State whether the following are true or false and give a brief explanation (no
marks will be awarded for a simple true/false answer ).
i. The median of a random sample is influenced by extreme values.
ii. If A and B are independent events, then P (A | B) = P (A).
iii. If X ∼ N (5, 2), then P (X ≤ 5) < 0.5.
iv. A p-value of 0.3 represents a weakly significant hypothesis test result.
v. In stratified random sampling, elements within a stratum are
heterogeneous.
vi. A scatter diagram is used to display two categorical variables.
(12 marks)
(g) In a random sample of size n = 6 the mean of the data is 12 and the median
is 9. Another observation is then obtained and this takes the value of 5, i.e.
x7 = 5.
i. Calculate the mean of the seven observations.
ii. What can you conclude about the median of the seven observations?
(6 marks)
SECTION B
Answer two out of the three questions from this section (25 marks each).
2. (a) The data below represent the weights (in kg) of 30 athletes.
57 59 61 63 64
65 73 74 74 74
75 77 77 81 82
82 82 83 83 85
87 89 91 93 96
96 98 99 99 101
i. Carefully construct, draw and label a histogram of these data on the
graph paper provided.
ii. Find the mean and the modal group. You are given that the sum of
the data is 2420.
iii. Find the median and the lower quartile.
iv. Comment on the data, given the shape of the histogram and the
measures which you have calculated.
(13 marks)
(b) The student union of a large university gathered a random sample of 525
students to determine whether they are in favour of a new examination
timetable. The table below summarises the student responses.
In favour of
Subject area Sample size new examination timetable
Humanities 325 221
Science 200 120
Baby 1 2 3 4 5 6 7 8 9
x 36.0 39.7 38.0 41.4 38.7 35.7 40.3 37.3 42.4
y 2.0 3.7 2.7 3.7 2.9 2.6 3.5 2.7 3.8
Sum of the x values: 349.5 Sum of the squares of the x values: 13615.37
Sum of the y values: 27.6 Sum of the squares of the y values: 87.82
Sum of the products of the x and y values: 1082.6
(b) You have been asked to design a cluster random sample survey from the
employees of a certain large company to examine whether job satisfaction
of employees varies between different job types.
i. Discuss how you will choose your sampling frame. Also discuss the
limitations of your choice.
ii. Propose two relevant clusters. Justify your answers.
iii. Provide two actions to reduce response bias and explain why you think
they would be successful.
iv. Briefly discuss the statistical methodology you would use to analyse the
collected data.
(12 marks)
END OF PAPER
ST104a Statistics 1
Examination Formula Sheet
Z-test of hypothesis for a single mean (σ known):
\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]
t-test of hypothesis for a single mean (σ unknown):
\[ t = \frac{\bar{X} - \mu}{S/\sqrt{n}} \]
Z-test of hypothesis for a single proportion:
\[ Z \cong \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}} \]
Z-test for the difference between two means (variances known):
\[ Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \]
t-test for the difference between two means (variances unknown):
\[ t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]
Confidence interval endpoints for the difference between two means:
\[ (\bar{x}_1 - \bar{x}_2) \pm t_{n_1+n_2-2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]
Confidence interval endpoints for the difference in means in paired samples:
\[ \bar{x}_d \pm t_{n-1}\,\frac{s_d}{\sqrt{n}} \]
Z-test for the difference between two proportions:
\[ Z = \frac{(P_1 - P_2) - (\pi_1 - \pi_2)}{\sqrt{P(1-P)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]
\[ a = \bar{y} - b\bar{x} \]
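To show how one of these formulae is applied, here is a minimal sketch of the Z statistic for the difference between two proportions. It uses the counts from the Question 2(b) table of this paper purely as illustrative input, and it assumes a null hypothesis of equal population proportions, so that π1 − π2 = 0 and P is the pooled sample proportion. The function name `two_proportion_z` is hypothetical.

```python
import math


def two_proportion_z(successes_1, n_1, successes_2, n_2):
    """Z statistic for H0: pi_1 = pi_2, using the pooled sample proportion."""
    p_1 = successes_1 / n_1
    p_2 = successes_2 / n_2
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    return (p_1 - p_2) / std_error


# Humanities: 221 in favour out of 325; Science: 120 in favour out of 200.
z = two_proportion_z(221, 325, 120, 200)
print(f"z = {z:.3f}")
```

The resulting value would then be compared with the appropriate standard normal critical value for the hypotheses being tested.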
Dennis V. Lindley, William F. Scott, New Cambridge Statistical Tables, (1995) © Cambridge University Press, reproduced with permission.
~~ST104A_ZB_2016_d0
Statistics 1
A list of formulae and extracts from statistical tables are provided after the final question
on this paper.
Graph paper is provided at the end of this question paper. If used, it must be detached
and fastened securely inside the answer book.
A calculator may be used when answering questions on this paper and it must comply
in all respects with the specification given with your Admission Notice. The make and
type of machine must be clearly stated on the front cover of the answer book.
SECTION A
(6 marks)
(b) Classify each one of the following variables as either measurable (continuous)
or categorical. If a variable is categorical, further classify it as either nominal or
ordinal. Justify your answer (no marks will be awarded without a justification).
i. Gross domestic product (GDP) of a country.
ii. Community type, i.e. rural, small town, large town, small city, large city.
iii. Discipline studied as the degree major.
iv. Volume of water in a bottle.
(8 marks)
(c) The weights of a large population of animals have a mean of 8.9 kg and a
standard deviation of 2.1 kg.
i. Assuming that the weights are normally distributed, what is the
probability that a random selection of 50 animals from that population
will have a mean weight between 8.6 kg and 9.1 kg?
ii. A researcher stated that the probability you calculated is approximately
correct even if the distribution of the weights is not normal. Do you agree?
Justify your answer (no marks will be awarded without a justification).
(5 marks)
(d) The random variable X takes only the values 2, 6, 7 and 9 according to the
following probability distribution:
x 2 6 7 9
pX (x) k k k 2k
i. Determine the constant k and hence write down the probability
distribution of X.
ii. Find E(X), the expected value of X.
iii. Find Var(X), the variance of X.
(6 marks)
(f) State whether the following are true or false and give a brief explanation (no
marks will be awarded for a simple true/false answer ).
i. The median of a random sample is not influenced by extreme values.
ii. If A and B are independent events, then P (A | B) < P (A).
iii. If X ∼ N (7, 4), then P (X ≥ 7) > 0.5.
iv. A p-value of 0.03 represents an insignificant hypothesis test result.
v. In cluster random sampling, elements within a cluster are
homogeneous.
vi. A contingency table is used to display two measurable variables.
(12 marks)
(g) In a random sample of size n = 6 the mean of the data is 15 and the median
is 11. Another observation is then obtained and this takes the value of 8, i.e.
x7 = 8.
i. Calculate the mean of the seven observations.
ii. What can you conclude about the median of the seven observations?
(6 marks)
SECTION B
Answer two out of the three questions from this section (25 marks each).
2. (a) The data below contain measurements of the low-density lipoproteins, also
known as the ‘bad’ cholesterol, in the blood of 30 patients. Data are
measured in milligrams per deciliters (mg/dL).
95 96 96 98 99
99 101 101 102 102
103 104 104 107 107
111 112 113 113 114
115 117 121 123 124
127 129 131 135 143
i. Carefully construct, draw and label a histogram of these data on the
graph paper provided.
ii. Find the mean and the modal group. You are given that the sum of
the data is 3342.
iii. Find the median and the upper quartile.
iv. Comment on the data, given the shape of the histogram and the
measures which you have calculated.
(13 marks)
3. (a) The table below contains information from 9 students taking a course in
Statistics. Students were asked how many hours they spent revising the
material before the examination (x values, in hours) and what their
examination mark was (y values, in %).
Student 1 2 3 4 5 6 7 8 9
x 1.8 2.6 2.8 3.4 3.6 4.2 4.8 5.2 5.4
y 54 64 60 62 68 70 76 73 76
(b) You have been asked to design a stratified random sample survey from the
employees of a certain large company to examine whether job satisfaction
of employees varies between different job types.
i. Discuss how you will choose your sampling frame. Also discuss the
limitations of your choice.
ii. Propose two relevant stratification factors. Justify your answers.
iii. Provide two actions to reduce response bias and explain why you think
they would be successful.
iv. Briefly discuss the statistical methodology you would use to analyse the
collected data.
(12 marks)
END OF PAPER
ST104a Statistics 1
Examination Formula Sheet
Z-test of hypothesis for a single mean (σ known):
\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]
t-test of hypothesis for a single mean (σ unknown):
\[ t = \frac{\bar{X} - \mu}{S/\sqrt{n}} \]
Z-test of hypothesis for a single proportion:
\[ Z \cong \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}} \]
Z-test for the difference between two means (variances known):
\[ Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \]
t-test for the difference between two means (variances unknown):
\[ t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]
Confidence interval endpoints for the difference between two means:
\[ (\bar{x}_1 - \bar{x}_2) \pm t_{n_1+n_2-2}\; s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]
Confidence interval endpoints for the difference in means in paired samples:
\[ \bar{x}_d \pm t_{n-1}\,\frac{s_d}{\sqrt{n}} \]
Z-test for the difference between two proportions:
\[ Z = \frac{(P_1 - P_2) - (\pi_1 - \pi_2)}{\sqrt{P(1-P)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]
\[ a = \bar{y} - b\bar{x} \]
Dennis V. Lindley, William F. Scott, New Cambridge Statistical Tables, (1995) © Cambridge University Press, reproduced with permission.
Important note
This commentary reflects the examination and assessment arrangements for this course in the
academic year 2016–17. The format and structure of the examination may change in future years,
and any such changes will be publicised on the virtual learning environment (VLE).
Unless otherwise stated, all cross-references will be to the latest version of the subject guide (2014).
You should always attempt to use the most recent edition of any Essential reading textbook, even if
the commentary and/or online reading list and/or subject guide refer to an earlier edition. If
different editions of Essential reading are listed, please check the VLE for reading supplements – if
none are available, please use the contents list and index of the new edition to find the relevant
section.
General remarks
Learning outcomes
At the end of the half course and having completed the Essential reading and activities you should:
• be familiar with the key ideas of statistics that are accessible to a candidate with a
moderate mathematical competence
• be able to routinely apply a variety of methods for explaining, summarising and presenting
data and interpreting results clearly using appropriate diagrams, titles and labels when
required
• be able to summarise the ideas of randomness and variability, and the way in which these
link to probability theory to allow the systematic and logical collection of statistical
techniques of great practical importance in many applied areas
• have a grounding in probability theory and some grasp of the most common statistical
methods
• be able to perform inference to test the significance of common measures such as means and
proportions and conduct chi-squared tests of contingency tables
• be able to use simple linear regression and correlation analysis and know when it is
appropriate to do so.
You have two hours to complete this paper, which is in two parts. The first part, Section A, is
compulsory; it comprises several subquestions and accounts for 50 per cent of the total marks.
Section B contains three questions, each worth 25 per cent, from which you are asked to choose two.
Remember that each of the Section B questions is likely to cover more than one topic. In 2017, for
example, the first part of Question 2 asked for data presentation and descriptive statistics, while
hypothesis testing and confidence intervals (for proportions) appeared in the second part. Question
3 began with correlation and linear regression, followed by further hypothesis testing (of means).
The first part of Question 4 required a chi-squared test of association, while the second part covered
survey design questions. This means that it is really important that you make sure you have a
reasonable idea of what topics are covered before you start work on the paper! We suggest you
divide your time as follows during the examination:
• Spend the first 10 minutes annotating the paper. Note the topics covered in each question
and subquestion.
• Allow yourself 45 minutes for Section A. Do not allow yourself to get stuck on any one
question, but do not just give up after two minutes!
• Once you have chosen your two Section B questions, give them about 25 minutes each.
• This leaves you with 15 minutes. Do not leave the examination hall at this point! Check
over any questions you may not have completely finished. Make sure you have labelled and
given a title to any tables or diagrams which were required and, if you did more than the
two questions required in Section B, decide which one to delete. Remember that only two of
your answers will be given credit in Section B and that you must choose which these are!
The examiners are looking for very simple demonstrations from you. They want to be sure that you:
• have covered the syllabus as described and explained in the subject guide
• know the basic formulae given there and when and how to use them
• understand and answer the questions set.
You are not expected to write long essays where explanations or descriptions of sampling design
are required, and note-form answers are acceptable. However, clear and accurate language, both
mathematical and written, is expected and marked. The explanations below and in the specific
commentaries for the papers for each zone should make these requirements clear.
The most important thing you can do is answer the question set! This may sound very simple, but
these are some of the things that candidates did not do, though asked, in the 2017 examinations!
Remember the following.
• If you are asked to label a diagram (which is almost always the case!), please do so. Writing
‘Histogram’ or ‘Stem-and-leaf diagram’ in itself is insufficient. What do the data describe?
What are the units? What are the x-axis and y-axis?
• If you are specifically asked to perform a hypothesis test, or calculate a confidence interval,
do so. It is not acceptable to do one rather than the other! If you are asked to use a 5%
significance level, this is what will be marked.
• Do not waste time calculating things which are not required by the examiners. If you are
asked to find the line of best fit, you will gain no additional marks for also calculating
the correlation coefficient. If you are asked to use the confidence interval you have just
calculated to comment on the results, carrying out an additional hypothesis test will not
gain you marks.
How should you use the specific comments on each question given in the
Examiners’ commentaries?
We hope that you find these useful. For each question and subquestion, they give:
• further guidance for each question on the points made in the last section
• the answers, or keys to the answers, which the examiners were looking for
• the relevant detailed reference to Newbold, P., W.L. Carlson and B.M. Thorne Statistics for
business and economics. (London: Prentice–Hall, 2012) eighth edition [ISBN
9780273767060] and the subject guide
• where appropriate, suggested activities from the subject guide which should help you to
prepare, and similar questions from Newbold et al. (2012).
Any further references you might need are given in the part of the subject guide to which you are
referred for each answer.
It was noted recently that a small number of candidates appeared to be memorising answers from
previous years’ Examiners’ commentaries, for example plots, and produced the exact same image of
them without looking at the current year’s examination paper questions! Note that this is very easy
to spot. The Examiners’ commentaries should be used as a guide to practise on sample examination
questions and it is pointless to attempt to memorise them.
Many candidates are disappointed to find that their examination performance is poorer than they
expected. This may be due to a number of reasons, but one particular failing is ‘question
spotting’, that is, confining your examination preparation to a few questions and/or topics which
have come up in past papers for the course. This can have serious consequences.
We recognise that candidates might not cover all topics in the syllabus in the same depth, but you
need to be aware that examiners are free to set questions on any aspect of the syllabus. This
means that you need to study enough of the syllabus to enable you to answer the required number of
examination questions.
The syllabus can be found in the Course information sheet available on the VLE. You should read
the syllabus carefully and ensure that you cover sufficient material in preparation for the
examination. Examiners will vary the topics and questions from year to year and may well set
questions that have not appeared in past papers. Examination papers may legitimately include
questions on any topic in the syllabus. So, although past papers can be helpful during your revision,
you cannot assume that topics or specific questions that have come up in past examinations will
occur again.
If you rely on a question-spotting strategy, it is likely you will find yourself in difficulties
when you sit the examination. We strongly advise you not to adopt this strategy.
ST104a Statistics 1
Important note
This commentary reflects the examination and assessment arrangements for this course in the
academic year 2016–17. The format and structure of the examination may change in future years,
and any such changes will be publicised on the virtual learning environment (VLE). Note that in
what follows the symbol • corresponds to 1 mark unless stated otherwise.
Unless otherwise stated, all cross-references will be to the latest version of the subject guide (2014).
You should always attempt to use the most recent edition of any Essential reading textbook, even if
the commentary and/or online reading list and/or subject guide refer to an earlier edition. If
different editions of Essential reading are listed, please check the VLE for reading supplements – if
none are available, please use the contents list and index of the new edition to find the relevant
section.
Candidates should answer THREE of the following FOUR questions: QUESTION 1 of Section
A (50 marks) and TWO questions from Section B (25 marks each). Candidates are strongly
advised to divide their time accordingly.
Section A
Question 1
(6 marks)
ii. We have:
\[ \sum_{i=4}^{5} \sqrt{y_i z_i} = \sqrt{12 \times 3} + \sqrt{7 \times 7} = 6 + 7 = 13. \]
iii. We have:
\[ z_5^2 + \sum_{i=1}^{3} \frac{1}{y_i} = 7^2 + \left( -1 - \frac{1}{4} + \frac{1}{2} \right) = 48.25. \]
(b) Classify each one of the following variables as either measurable (continuous) or
categorical. If a variable is categorical, further classify it as either nominal or
ordinal. Justify your answer (no marks will be awarded without a justification).
i. Time spent in the previous week browsing the internet.
ii. Highest level of education obtained, i.e. no education, school education,
bachelor’s degree, master’s degree, doctorate.
iii. Country of residence.
iv. The rate of change of the human population.
(8 marks)
(c) The weights of a large population of animals have a mean of 7.3 kg and a
standard deviation of 1.9 kg.
i. Assuming that the weights are normally distributed, what is the probability
that a random selection of 40 animals from that population will have a mean
weight between 7.0 kg and 7.4 kg?
ii. A researcher stated that the probability you calculated is approximately
correct even if the distribution of the weights is not normal. Do you agree?
Justify your answer (no marks will be awarded without a justification).
(5 marks)
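The calculation asked for in part i. can be sketched numerically. This is a minimal illustration, assuming (as the question states) normally distributed weights, so that the sample mean follows N(µ, σ²/n); the variable names are chosen here for illustration and are not part of the commentary.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 7.3, 1.9, 40          # population mean, population s.d., sample size
standard_error = sigma / sqrt(n)     # s.d. of the sample mean

# Sampling distribution of the sample mean: N(mu, sigma^2 / n)
sample_mean_dist = NormalDist(mu, standard_error)

probability = sample_mean_dist.cdf(7.4) - sample_mean_dist.cdf(7.0)
print(f"standard error = {standard_error:.4f}")
print(f"P(7.0 < sample mean < 7.4) = {probability:.4f}")
```

The result is roughly 0.47. Statistical tables give a slightly different final digit because the z-values are rounded to two decimal places before lookup.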
(d) The random variable X takes only the values 3, 5, 8 and 10 according to the
following probability distribution:
x 3 5 8 10
pX (x) k k k 2k
i. Determine the constant k and hence write down the probability distribution
of X.
ii. Find E(X), the expected value of X.
iii. Find Var(X), the variance of X.
(6 marks)
iii. We have:
\[ E(X^2) = \sum_i x_i^2 \, p(x_i) = 3^2 \times 0.2 + 5^2 \times 0.2 + 8^2 \times 0.2 + 10^2 \times 0.4 = 59.6 \]
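The steps for question (d) can be checked with exact fractions. This is a sketch under the question's stated distribution (probabilities k, k, k, 2k on the values 3, 5, 8, 10); the list names are illustrative only.

```python
from fractions import Fraction

values = [3, 5, 8, 10]
weights = [1, 1, 1, 2]               # probabilities are k, k, k, 2k

k = Fraction(1, sum(weights))        # probabilities must sum to 1, so 5k = 1
probs = [w * k for w in weights]

expected = sum(x * p for x, p in zip(values, probs))
expected_sq = sum(x**2 * p for x, p in zip(values, probs))
variance = expected_sq - expected**2  # Var(X) = E(X^2) - (E(X))^2

print("k      =", float(k))
print("E(X)   =", float(expected))
print("E(X^2) =", float(expected_sq))
print("Var(X) =", float(variance))
```

Working in fractions avoids rounding, so E(X²) reproduces the value 59.6 shown above exactly.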
(f) State whether the following are true or false and give a brief explanation (no
marks will be awarded for a simple true/false answer).
i. The median of a random sample is influenced by extreme values.
ii. If A and B are independent events, then P (A | B) = P (A).
iii. If X ∼ N (5, 2), then P (X ≤ 5) < 0.5.
iv. A p-value of 0.3 represents a weakly significant hypothesis test result.
v. In stratified random sampling, elements within a stratum are heterogeneous.
vi. A scatter diagram is used to display two categorical variables.
(12 marks)
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)\,P(B)}{P(B)} = P(A). \]
iii. False. A normal distribution is symmetric about its mean, hence P (X ≤ 5) = 0.5.
iv. False. A p-value of 0.3 represents an insignificant hypothesis test result. An alternative
justification could be that a weakly significant hypothesis test result means a p-value
between 0.05 and 0.10.
v. False. Stratified random sampling works better if the elements within a stratum are
homogeneous.
vi. False. A scatter diagram is used to display two measurable variables. An alternative
justification is that a contingency table is used to display two categorical variables.
(g) In a random sample of size n = 6 the mean of the data is 12 and the median is 9.
Another observation is then obtained and this takes the value of 5, i.e. x7 = 5.
i. Calculate the mean of the seven observations.
ii. What can you conclude about the median of the seven observations?
(6 marks)
\[ \bar{x} = \frac{\sum_{i=1}^{7} x_i}{n} = \frac{72 + 5}{7} = 11. \]
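The mean update can be reproduced in a few lines. For the median, only a bound is available from the summary values: the new observation (5) lies below the old median (9), so the median of the seven observations cannot exceed 9. The six-value sample below is hypothetical, invented purely to illustrate this; it is not data from the examination paper.

```python
from statistics import mean, median

n, old_mean, new_value = 6, 12, 5
new_mean = (n * old_mean + new_value) / (n + 1)   # (72 + 5) / 7
print("mean of the seven observations:", new_mean)

# Hypothetical sample, invented for illustration: n = 6, mean 12, median 9.
sample = [2, 6, 8, 10, 20, 26]
assert mean(sample) == 12 and median(sample) == 9

extended = sample + [new_value]
print("median of the seven observations:", median(extended))
```

With this particular sample the median falls to 8; other samples with the same summary values can give any median up to 9.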
Section B
Answer two out of the three questions from this section (25 marks each).
Question 2
(a) The data below represent the weights (in kg) of 30 athletes.
57 59 61 63 64
65 73 74 74 74
75 77 77 81 82
82 82 83 83 85
87 89 91 93 96
96 98 99 99 101
i. Carefully construct, draw and label a histogram of these data on the graph
paper provided.
ii. Find the mean and the modal group. You are given that the sum of the data
is 2420.
iii. Find the median and the lower quartile.
iv. Comment on the data, given the shape of the histogram and the measures
which you have calculated.
(13 marks)
[Figure: histogram of the weights of the 30 athletes; vertical axis: frequency density; horizontal axis: 50 to 110 (kg).]
ii. • Mean = 2420/30 = 80.67 kg. Note: The raw data should be used, not grouped data.
Also make sure to mention the units to get the full marks.
• Modal group: [80, 90) kg. Note: Same as above.
iii. • Median = 82 kg.
• Correct position of Q1 (between 7th and 8th inclusive).
• Q1 ≈ 73.5 kg.
iv. The distribution of the data appears to be negatively/left-skewed. This is also supported
by the fact that the mean is less than the median.
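The figures quoted in parts ii. to iv. can be checked against the raw data. The sketch below assumes classes of width 10 kg starting at 50 (consistent with the modal group [80, 90) reported above) and takes Q1 as the midpoint of the 7th and 8th ordered values; other quartile conventions give slightly different answers.

```python
from collections import Counter
from statistics import mean, median

weights = [57, 59, 61, 63, 64, 65, 73, 74, 74, 74,
           75, 77, 77, 81, 82, 82, 82, 83, 83, 85,
           87, 89, 91, 93, 96, 96, 98, 99, 99, 101]

width = 10                                    # assumed class width, in kg
counts = Counter((w // width) * width for w in weights)

for lower in sorted(counts):
    density = counts[lower] / width           # frequency density = frequency / width
    print(f"[{lower}, {lower + width}) kg: frequency {counts[lower]}, density {density:.1f}")

modal_lower = max(counts, key=counts.get)
print(f"modal group: [{modal_lower}, {modal_lower + width}) kg")
print(f"mean = {mean(weights):.2f} kg, median = {median(weights)} kg")

# Q1 taken as the midpoint of the 7th and 8th ordered values, as in the commentary.
ordered = sorted(weights)
q1 = (ordered[6] + ordered[7]) / 2
print(f"Q1 = {q1} kg")
```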
(b) The student union of a large university gathered a random sample of 525
students to determine whether they are in favour of a new examination
timetable. The table below summarises the student responses.
Subject area    Sample size    In favour of new examination timetable
Humanities      325            221
Science         200            120
i. Do the student responses indicate a difference between students in
humanities and science degrees in whether they are in favour of the new
Using the standard normal distribution, which is justified by the large sample sizes
according to the central limit theorem, the critical values at the 5% significance level are
±1.96. Since 1.866 < 1.96 we do not reject H0 at the 5% significance level.
Therefore, we choose a second (larger) significance level, say 10%, which gives critical
values of ±1.645, in which case we reject H0 since 1.645 < 1.866.
Hence we conclude that there is weak evidence of a difference between the population
proportions of students in humanities and science in favour of the new examination
timetable.
ii. This is a standard exercise for confidence intervals given the appropriate formula from
the formula sheet (make sure to be able to recognise it). The requested confidence
interval is (−0.013, 0.173), using a z-value of 2.17 from Table 4 of the New Cambridge
Statistical Tables.
Note that in order to get the confidence interval above the following formula for
s.e.(p1 − p2 ) is required:
\[ \text{s.e.}(p_1 - p_2) = \sqrt{\frac{0.68 \times 0.32}{325} + \frac{0.6 \times 0.4}{200}} = 0.043. \]
In this case it makes no difference in the confidence interval calculation, but it could give
different answers in other questions of this type.
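The distinction between the two standard errors can be sketched as follows. The pooled form for the test is an assumption inferred from the quoted test statistic (1.866), which it reproduces; the z-value 2.17 is taken directly from the commentary. Variable names are illustrative.

```python
from math import sqrt

x1, n1 = 221, 325        # humanities: number in favour, sample size
x2, n2 = 120, 200        # science: number in favour, sample size

p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2

# Hypothesis test: pooled proportion, since H0 states the proportions are equal.
pooled = (x1 + x2) / (n1 + n2)
se_pooled = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z_stat = diff / se_pooled

# Confidence interval: separate sample proportions, no pooling.
se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_value = 2.17           # z-value quoted in the commentary
lower, upper = diff - z_value * se_unpooled, diff + z_value * se_unpooled

print(f"z = {z_stat:.3f}")
print(f"s.e. (pooled) = {se_pooled:.4f}, s.e. (unpooled) = {se_unpooled:.4f}")
print(f"interval = ({lower:.3f}, {upper:.3f})")
```

Both standard errors round to 0.043 here, which is why the choice makes no practical difference in this question. The printed interval can differ from the commentary's in the third decimal place because the commentary uses the rounded standard error.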
Question 3
(a) It is assumed that there is an association between the gestational age at birth,
i.e. the number of weeks the mother was pregnant when she gave birth (x) and
the birth weight of the baby (y, in kg). An experiment was conducted on 9
randomly-selected babies and the data are summarised in the table below.
Baby 1 2 3 4 5 6 7 8 9
x 36.0 39.7 38.0 41.4 38.7 35.7 40.3 37.3 42.4
y 2.0 3.7 2.7 3.7 2.9 2.6 3.5 2.7 3.8
The summary statistics for these data are:
Sum of the x values: 349.5 Sum of the squares of the x values: 13615.37
Sum of the y values: 27.6 Sum of the squares of the y values: 87.82
Sum of the products of the x and y values: 1082.6
i. Draw a scatter diagram of these data on the graph paper provided. Carefully
label the diagram.
ii. Calculate the sample correlation coefficient. Interpret its value.
iii. Calculate and report the least squares line of y on x. Draw the line on the
scatter diagram.
iv. Based on the regression model above, what baby birth weight would you
expect from a mother who gave birth when she was 38 weeks pregnant?
Would you trust this value? Justify your answer.
(13 marks)
[Figure: scatter diagram; vertical axis: baby birth weight (in kg); horizontal axis: 36 to 42.]
Candidates are reminded that they are asked to draw and label the scatter diagram
which should include a title (‘Scatter diagram’ alone will not suffice) and labelled axes
which give their units in addition. Far too many candidates threw away marks by
neglecting these points and consequently were only given one mark out of the possible
four allocated for this part of the question. Another common way of losing marks was
failing to use the graph paper which was provided, and required, in the question.
Candidates who drew on the ordinary paper in their answer booklet were not awarded
marks for this part of the question.
ii. The summary statistics can be substituted into the formula for the sample correlation
coefficient (make sure you know which one it is!) to obtain the value 0.92. An
interpretation of this value is the following: The data suggest that the higher the
gestational age, the higher the birth weight. The fact that the value is very close to 1,
suggests that this is a strong, positive linear association.
Many candidates did not mention all three words (strong, positive, linear). Note that all
of these words provide useful information for interpreting the association and are,
therefore, required to obtain full marks.
iii. The regression line can be written as ŷ = a + bx or y = a + bx + ε. The
formula for b is:
\[ b = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} \]
and by substituting in the summary statistics we get b = 0.250.
The formula for a is a = ȳ − bx̄, so we get a = −6.660.
Hence the regression line can be written as:
\[ \hat{y} = -6.660 + 0.250x. \]
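The substitutions in parts ii. and iii. can be reproduced from the summary statistics given in the question. This is a sketch only; the names s_xy, s_xx and s_yy are notation introduced here for the corrected sums, not symbols from the formula sheet.

```python
from math import sqrt

n = 9
sum_x, sum_x2 = 349.5, 13615.37
sum_y, sum_y2 = 27.6, 87.82
sum_xy = 1082.6

x_bar, y_bar = sum_x / n, sum_y / n

s_xy = sum_xy - n * x_bar * y_bar     # corrected sum of products
s_xx = sum_x2 - n * x_bar**2          # corrected sum of squares of x
s_yy = sum_y2 - n * y_bar**2          # corrected sum of squares of y

r = s_xy / sqrt(s_xx * s_yy)          # sample correlation coefficient
b = s_xy / s_xx                       # slope
a = y_bar - b * x_bar                 # intercept

print(f"r = {r:.2f}")
print(f"fitted line: y-hat = {a:.3f} + {b:.3f} x")
print(f"fitted value at x = 38: {a + b * 38:.2f} kg")
```

The value x = 38 lies inside the observed range of gestational ages (35.7 to 42.4 weeks), so the fitted value is an interpolation rather than an extrapolation.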
H0 : µ1 = µ2 vs. H1 : µ1 ≠ µ2 .
The test statistic formulae, depending on whether or not a pooled variance is used, are
provided in the formula sheet, hence use:
\[ \frac{\bar{x} - \bar{y}}{\sqrt{s_p^2 (1/n_1 + 1/n_2)}} \quad \text{or} \quad \frac{\bar{x} - \bar{y}}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}. \]
If equal variances are assumed, the test statistic value is −2.084. If equal variances are
not assumed the test statistic value is −2.082.
The variances are unknown but the sample sizes are large, so the standard normal
distribution can be used due to the central limit theorem. The $t_{60}$ distribution is also
acceptable. The critical values at the 5% significance level are ±1.96, hence we reject the
null hypothesis. If we take a (smaller) α and test at the 1% significance level, the critical
values are ±2.576, so we do not reject H0 .
We conclude that there is moderate evidence of a difference between the population mean
battery lifetimes of the two brands.
ii. The assumptions for i. concerned:
• an assumption about equal variances
• an assumption about whether n1 + n2 is ‘large’ so that the normality assumption is
satisfied
• an assumption about independent samples.
Some candidates stated assumptions in this part that were not made in part i. Marks
were not awarded in such cases. Also, some other candidates just copied the phrase
‘assumption about equal variances’ and naturally were not awarded any marks. One
should state whether the calculations were based on the assumption that unknown
variances are equal or unequal.
iii. Given the different wording in this part, ‘mean battery lifetime of brand 1 is shorter than
that of brand 2’, a one-tailed test is required. The hypotheses now become:
H0 : µ1 = µ2 vs. H1 : µ1 < µ2 .
The critical values, still based on the standard normal distribution, now become −1.645
for the 5% significance level and −2.326 for the 1% significance level. We reject H0 for
α = 0.05, but not α = 0.01, hence we conclude that there is moderate evidence that the
brand 1 batteries have a shorter mean battery lifetime.
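The two-level testing procedure used in parts i. and iii. can be sketched as a small helper. The function name and the label "none" are invented for illustration; the weak/moderate/strong wording follows the usage in this commentary, and the critical values come from the standard normal distribution.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal distribution

def evidence(z_stat, alternative="two-sided"):
    """Classify the strength of evidence against H0 at the 10%, 5% and 1% levels."""
    def rejects(alpha):
        if alternative == "two-sided":
            return abs(z_stat) > Z.inv_cdf(1 - alpha / 2)
        if alternative == "less":
            return z_stat < Z.inv_cdf(alpha)
        if alternative == "greater":
            return z_stat > Z.inv_cdf(1 - alpha)
        raise ValueError(f"unknown alternative: {alternative}")

    if rejects(0.01):
        return "strong"
    if rejects(0.05):
        return "moderate"
    if rejects(0.10):
        return "weak"
    return "none"

z_stat = -2.084    # pooled-variance test statistic quoted above
print("two-sided:", evidence(z_stat))
print("one-sided (mu1 < mu2):", evidence(z_stat, "less"))
```

Both calls classify the evidence as moderate, in line with the conclusions stated for parts i. and iii.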
Question 4
(a) A sample consisting of 100 randomly-selected adults in the USA was classified
in terms of their political affiliation (Democrat or Republican) and opinion on a
tax reform bill (in favour, indifferent or opposed). The data are summarised in
the table below.
i. Based on the data in the table, and without conducting any significance test,
would you say there is an association between the political affiliation and
opinion on the tax reform bill?
ii. Calculate the χ2 statistic and use it to test for independence of political
affiliation and opinion on the tax reform bill. What do you conclude?
(13 marks)
H0 : No association between political affiliation and opinion on the tax reform bill
vs.
H1 : Association between political affiliation and opinion on the tax reform bill.
(b) You have been asked to design a cluster random sample survey from the
employees of a certain large company to examine whether job satisfaction of
employees varies between different job types.
i. Discuss how you will choose your sampling frame. Also discuss the
limitations of your choice.
Important note
This commentary reflects the examination and assessment arrangements for this course in the
academic year 2016–17. The format and structure of the examination may change in future years,
and any such changes will be publicised on the virtual learning environment (VLE). Note that in
what follows the symbol • corresponds to 1 mark unless stated otherwise.
Unless otherwise stated, all cross-references will be to the latest version of the subject guide (2014).
You should always attempt to use the most recent edition of any Essential reading textbook, even if
the commentary and/or online reading list and/or subject guide refer to an earlier edition. If
different editions of Essential reading are listed, please check the VLE for reading supplements – if
none are available, please use the contents list and index of the new edition to find the relevant
section.
Candidates should answer THREE of the following FOUR questions: QUESTION 1 of Section
A (50 marks) and TWO questions from Section B (25 marks each). Candidates are strongly
advised to divide their time accordingly.
Section A
Question 1
(6 marks)
ii. We have:
\[ \sum_{i=4}^{5} \sqrt{y_i z_i} = \sqrt{16 \times 4} + \sqrt{10 \times 10} = 8 + 10 = 18. \]
iii. We have:
\[ z_4^2 + \sum_{i=1}^{3} \frac{1}{y_i} = 4^2 + \left( -\frac{1}{2} - \frac{1}{5} + 1 \right) = 16.3. \]
(b) Classify each one of the following variables as either measurable (continuous) or
categorical. If a variable is categorical, further classify it as either nominal or
ordinal. Justify your answer (no marks will be awarded without a justification).
i. Gross domestic product (GDP) of a country.
ii. Community type, i.e. rural, small town, large town, small city, large city.
iii. Discipline studied as the degree major.
iv. Volume of water in a bottle.
(8 marks)
i. Measurable because the amount can be measured, for example, in trillions of pounds to
several decimal places such as £2.65 trillion.
ii. Each community type corresponds to a category. Moreover, the categories are in a
ranked order in terms of population or size, for example a large city has more residents
than a small city. Therefore, it is a categorical ordinal variable.
iii. Each discipline (Philosophy, Mathematics, Geography etc.) is a category. Also, there is
no natural ordering between the disciplines, for example we cannot really say that
‘Philosophy is higher than Geography’. Therefore, this is a categorical nominal variable.
iv. Measurable because volume can be measured to several decimal places, for example 502
ml.
Weak candidates did not justify their choices, classified measurable variables as nominal or
ordinal, or answered ‘ordinal’ when their justification pointed to a nominal variable.
Circular phrases such as ‘It is measurable because it can be measured’ were not awarded
any marks.
(c) The weights of a large population of animals have a mean of 8.9 kg and a
standard deviation of 2.1 kg.
i. Assuming that the weights are normally distributed, what is the probability
that a random selection of 50 animals from that population will have a mean
weight between 8.6 kg and 9.1 kg?
ii. A researcher stated that the probability you calculated is approximately
correct even if the distribution of the weights is not normal. Do you agree?
Justify your answer (no marks will be awarded without a justification).
(5 marks)
(d) The random variable X takes only the values 2, 6, 7 and 9 according to the
following probability distribution:
x 2 6 7 9
pX (x) k k k 2k
i. Determine the constant k and hence write down the probability distribution
of X.
ii. Find E(X), the expected value of X.
iii. Find Var(X), the variance of X.
(6 marks)
iii. We have:
\[ E(X^2) = \sum_i x_i^2 \, p(x_i) = 2^2 \times 0.2 + 6^2 \times 0.2 + 7^2 \times 0.2 + 9^2 \times 0.4 = 50.2 \]
(f) State whether the following are true or false and give a brief explanation (no
marks will be awarded for a simple true/false answer).
i. The median of a random sample is not influenced by extreme values.
ii. If A and B are independent events, then P (A | B) < P (A).
iii. If X ∼ N (7, 4), then P (X ≥ 7) > 0.5.
iv. A p-value of 0.03 represents an insignificant hypothesis test result.
v. In cluster random sampling, elements within a cluster are homogeneous.
vi. A contingency table is used to display two measurable variables.
(12 marks)
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)\,P(B)}{P(B)} = P(A). \]
iii. False. A normal distribution is symmetric about its mean, hence P (X ≥ 7) = 0.5.
iv. False. A p-value of 0.03 represents a moderately significant hypothesis test result. An
alternative justification could be that an insignificant hypothesis test result means a
p-value larger than 0.10.
v. False. Cluster random sampling works better if the elements within a cluster are
heterogeneous.
vi. False. A contingency table is used to display two categorical variables. An alternative
justification is that a scatter diagram is used to display two measurable variables.
(g) In a random sample of size n = 6 the mean of the data is 15 and the median is
11. Another observation is then obtained and this takes the value of 8, i.e.
x7 = 8.
i. Calculate the mean of the seven observations.
ii. What can you conclude about the median of the seven observations?
(6 marks)
Section B
Answer two out of the three questions from this section (25 marks each).
Question 2
(a) The data below contain measurements of the low-density lipoproteins, also
known as the ‘bad’ cholesterol, in the blood of 30 patients. Data are measured
in milligrams per decilitre (mg/dL).
95 96 96 98 99
99 101 101 102 102
103 104 104 107 107
111 112 113 113 114
115 117 121 123 124
127 129 131 135 143
i. Carefully construct, draw and label a histogram of these data on the graph
paper provided.
ii. Find the mean and the modal group. You are given that the sum of the data
is 3342.
iii. Find the median and the upper quartile.
iv. Comment on the data, given the shape of the histogram and the measures
which you have calculated.
(13 marks)
[Figure: histogram of the cholesterol data; vertical axis: frequency density.]
ii. • Mean = 3342/30 = 111.4 mg/dL. Note: The raw data should be used, not grouped
data. Also make sure to mention the units to get the full marks.
• Modal group: [100, 110) mg/dL. Note: Same as above.
iii. • Median = 109 mg/dL.
• Correct position of Q3 (between 22nd and 23rd inclusive).
• Q3 ≈ 119 mg/dL.
iv. The distribution of the data appears to be positively/right-skewed. This is also
supported by the fact that the mean is greater than the median.
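The median, the upper quartile and the mean-versus-median comparison can be checked against the raw data. The sketch takes Q3 as the midpoint of the 22nd and 23rd ordered values, as in the commentary; the comparison of mean and median is only a rough indicator of skewness, not a formal measure.

```python
from statistics import mean, median

ldl = [95, 96, 96, 98, 99, 99, 101, 101, 102, 102,
       103, 104, 104, 107, 107, 111, 112, 113, 113, 114,
       115, 117, 121, 123, 124, 127, 129, 131, 135, 143]

ordered = sorted(ldl)

# Q3 taken as the midpoint of the 22nd and 23rd ordered values, as in the commentary.
q3 = (ordered[21] + ordered[22]) / 2

data_mean, data_median = mean(ordered), median(ordered)
if data_mean > data_median:
    shape = "positively (right) skewed"
elif data_mean < data_median:
    shape = "negatively (left) skewed"
else:
    shape = "roughly symmetric"

print(f"mean = {data_mean} mg/dL, median = {data_median} mg/dL, Q3 = {q3} mg/dL")
print("mean versus median suggests:", shape)
```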
Question 3
(a) The table below contains information from 9 students taking a course in
Statistics. Students were asked how many hours they spent revising the
material before the examination (x values, in hours) and what their
examination mark was (y values, in %).
Student 1 2 3 4 5 6 7 8 9
x 1.8 2.6 2.8 3.4 3.6 4.2 4.8 5.2 5.4
y 54 64 60 62 68 70 76 73 76
Sum of the x values: 33.8 Sum of the squares of the x values: 139.24
Sum of the y values: 603 Sum of the squares of the y values: 40861
Sum of the products of the x and y values: 2336
i. Draw a scatter diagram of these data on the graph paper provided. Carefully
label the diagram.
ii. Calculate the sample correlation coefficient. Interpret its value.
iii. Calculate and report the least squares line of y on x. Draw the line on the
scatter diagram.
iv. Based on the regression model above, what examination mark would you
expect from a student who studied 8 hours? Would you trust this value?
Justify your answer.
(13 marks)
[Figure: scatter diagram; vertical axis: examination mark (in %).]
Candidates are reminded that they are asked to draw and label the scatter diagram
which should include a title (‘Scatter diagram’ alone will not suffice) and labelled axes
which give their units in addition. Far too many candidates threw away marks by
neglecting these points and consequently were only given one mark out of the possible
four allocated for this part of the question. Another common way of losing marks was
failing to use the graph paper which was provided, and required, in the question.
Candidates who drew on the ordinary paper in their answer booklet were not awarded
marks for this part of the question.
ii. The summary statistics can be substituted into the formula for the sample correlation
coefficient (make sure you know which one it is!) to obtain the value 0.95. An
interpretation of this value is the following: The data suggest that the higher the hours
spent revising, the higher the examination mark. The fact that the value is very close to
1, suggests that this is a strong, positive linear association.
Many candidates did not mention all three words (strong, positive, linear). Note that all
of these words provide useful information for interpreting the association and are,
therefore, required to obtain full marks.
iii. The regression line can be written as ŷ = a + bx or y = a + bx + ε. The
formula for b is:
\[ b = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} \]
and by substituting in the summary statistics we get b = 5.804.
The formula for a is a = ȳ − bx̄, so we get a = 45.203.
Hence the regression line can be written as:
\[ \hat{y} = 45.203 + 5.804x. \]
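Part iv. asks for a fitted value at 8 hours and whether it can be trusted. The sketch below uses the intercept and slope reported in the commentary and checks whether the new x value lies inside the observed range; the function name predict is illustrative.

```python
hours = [1.8, 2.6, 2.8, 3.4, 3.6, 4.2, 4.8, 5.2, 5.4]

a, b = 45.203, 5.804          # intercept and slope reported in the commentary

def predict(x):
    """Fitted examination mark (in %) for x hours of revision."""
    return a + b * x

x_new = 8
inside_range = min(hours) <= x_new <= max(hours)

print(f"fitted mark at {x_new} hours: {predict(x_new):.1f}%")
print(f"observed x range: {min(hours)} to {max(hours)} hours")
print("interpolation" if inside_range else "extrapolation: treat the fitted value with caution")
```

Since 8 hours is well beyond the largest observed value (5.4 hours), the fitted mark is an extrapolation, and there is no evidence in the data that the linear relationship continues to hold there.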
(12 marks)
H0 : µ1 = µ2 vs. H1 : µ1 ≠ µ2 .
The test statistic formulae, depending on whether or not a pooled variance is used, are
provided in the formula sheet, hence use:
\[ \frac{\bar{x} - \bar{y}}{\sqrt{s_p^2 (1/n_1 + 1/n_2)}} \quad \text{or} \quad \frac{\bar{x} - \bar{y}}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}. \]
If equal variances are assumed, the test statistic value is 2.426. If equal variances are not
assumed the test statistic value is 2.465.
The variances are unknown but the sample sizes are large, so the standard normal
distribution can be used due to the central limit theorem. The $t_{60}$ distribution is also
acceptable. The critical values at the 5% significance level are ±1.96, hence we reject the
null hypothesis. If we take a (smaller) α and test at the 1% significance level, the critical
values are ±2.576, so we do not reject H0 .
We conclude that there is moderate evidence of a difference between the population mean
battery lifetimes of the two brands.
ii. The assumptions for i. concerned:
• an assumption about equal variances
• an assumption about whether n1 + n2 is ‘large’ so that the normality assumption is
satisfied
• an assumption about independent samples.
Some candidates stated assumptions in this part that were not made in part i. Marks
were not awarded in such cases. Also, some other candidates just copied the phrase
‘assumption about equal variances’ and naturally were not awarded any marks. One
should state whether the calculations were based on the assumption that unknown
variances are equal or unequal.
iii. Given the different wording in this part, ‘mean battery lifetime of brand 1 is longer than
that of brand 2’, a one-tailed test is required. The hypotheses now become:
H0 : µ1 = µ2 vs. H1 : µ1 > µ2 .
The critical values, still based on the standard normal distribution, now become 1.645 for
the 5% significance level and 2.326 for the 1% significance level. We reject H0 for
α = 0.05 and α = 0.01, hence we conclude that there is strong evidence that the brand 1
batteries have a longer mean battery lifetime.
Question 4
i. Based on the data in the table, and without conducting any significance test,
would you say there is an association between the student’s origin and
satisfaction with university life?
ii. Calculate the χ2 statistic and use it to test for independence of student’s
origin and satisfaction with university life. What do you conclude?
(13 marks)
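H0 : No association between student’s origin and satisfaction with university life
vs.
H1 : Association between student’s origin and satisfaction with university life.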
(b) You have been asked to design a stratified random sample survey from the
employees of a certain large company to examine whether job satisfaction of
employees varies between different job types.
i. Discuss how you will choose your sampling frame. Also discuss the
limitations of your choice.
ii. Propose two relevant stratification factors. Justify your answers.
iii. Provide two actions to reduce response bias and explain why you think they
would be successful.
iv. Briefly discuss the statistical methodology you would use to analyse the
collected data.
(12 marks)
Statistics 1
A list of formulae and extracts from statistical tables are provided after the final question
on this paper.
Graph paper is provided at the end of this question paper. If used, it must be detached
and fastened securely inside the answer book.
A calculator may be used when answering questions on this paper and it must comply
in all respects with the specification given with your Admission Notice. The make and
type of machine must be clearly stated on the front cover of the answer book.
SECTION A
i. $\sum_{i=3}^{5} x_i^2$   ii. $\sum_{i=1}^{2} \frac{1}{x_i y_i}$   iii. $y_4^3 + \sum_{i=4}^{5} \frac{y_i^2}{x_i}$.
(6 marks)
(c) State whether the following are true or false and give a brief explanation. (No
marks will be awarded for a plain true/false answer.)
i. If A and B are independent events, then P (A ∩ B) = P (A)/P (B).
ii. If X ∼ N (3, 4), then P (X ≤ 3) = 0.5.
iii. A p-value can be negative.
iv. A Type I error is the failure to reject a true null hypothesis.
v. Item non-response occurs when no information is collected from a sample
member.
(10 marks)
(e) The random variable X takes the values −1, 1 and 2 according to the following
probability distribution:
x        −1     1     2
pX (x)   0.20   k     4k
i. Determine the constant k and, hence, write out the probability distribution
of X.
ii. Find E(X) (the expected value of X).
iii. Find Var(X) (the variance of X).
(6 marks)
(f) The scores on a verbal reasoning test are normally distributed with a population
mean of µ = 100 and a population standard deviation of σ = 10.
i. What is the probability that a randomly chosen person scores at least 105?
ii. A simple random sample of size n = 20 is selected. What is the probability
that the sample mean will be between 97 and 104? (You may use the
nearest values provided in the statistical tables.)
(7 marks)
(g) You are told that a 99% confidence interval for a single population proportion
is (0.3676, 0.5324).
i. What was the sample proportion that led to this confidence interval?
ii. What was the size of the sample used?
(6 marks)
SECTION B
Answer two out of the three questions from this section (25 marks each).
(b) You work for a market research company and your manager has asked
you to carry out a random sample survey for a mobile phone company to
identify whether a recently launched mobile phone is attractive to people
over 40 years old. Limited time and money resources are available at your
disposal. You are being asked to prepare a brief summary containing the
items below.
i. Choose an appropriate probability sampling scheme. Provide a brief
justification for your answer.
ii. Describe the sampling frame and the method of contact you will use.
Briefly explain the reasons for your choices.
iii. Provide an example in which response bias may occur. State an action
that you would take to address this issue.
iv. State the main research question of the survey. Identify the variables
associated with this question.
(12 marks)
4. (a) A sales department monitors the distribution of orders by their value (in £s).
The data below are the values of 30 recent orders:
76 59 93 87 38
50 56 123 45 67
102 34 54 85 85
50 44 33 51 40
82 92 79 38 86
34 29 107 63 46
i. Carefully construct, draw and label a histogram of these data on the graph
paper provided.
ii. Find the mean, the median, the interquartile range and the modal group
on the histogram.
iii. Comment on the data, given the shape of the histogram and the
measures which you have calculated.
(13 marks)
END OF PAPER
ST104a Statistics 1
Examination Formula Sheet
z test of hypothesis for a single mean (σ known):
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
t test of hypothesis for a single mean (σ unknown):
$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$
z test of hypothesis for a single proportion:
$$Z \cong \frac{P - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n}}$$
z test for the difference between two means (variances known):
$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$
t test for the difference between two means (variances unknown):
$$T = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{S_p^2 (1/n_1 + 1/n_2)}}$$
Confidence interval endpoints for the difference between two means:
$$\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\, n_1+n_2-2} \times \sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
Confidence interval endpoints for the difference in means in paired samples:
$$\bar{x}_d \pm t_{\alpha/2,\, n-1} \times \frac{s_d}{\sqrt{n}}$$
z test for the difference between two proportions:
$$Z = \frac{P_1 - P_2 - (\pi_1 - \pi_2)}{\sqrt{P (1 - P)(1/n_1 + 1/n_2)}}$$
$$a = \bar{y} - b\bar{x}$$
Dennis V. Lindley, William F. Scott, New Cambridge Statistical Tables, (1995) © Cambridge University Press, reproduced with permission.
~~ST104A_ZA_2016_d0
Statistics 1
A list of formulae and extracts from statistical tables are provided after the final question
on this paper.
Graph paper is provided at the end of this question paper. If used, it must be detached
and fastened securely inside the answer book.
A calculator may be used when answering questions on this paper and it must comply
in all respects with the specification given with your Admission Notice. The make and
type of machine must be clearly stated on the front cover of the answer book.
SECTION A
i. $\sum_{i=3}^{5} x_i^2$   ii. $\sum_{i=1}^{2} \frac{1}{x_i y_i}$   iii. $y_4^3 + \sum_{i=4}^{5} \frac{y_i^2}{x_i}$.
(6 marks)
(c) State whether the following are true or false and give a brief explanation. (No
marks will be awarded for a plain true/false answer.)
i. If A and B are mutually exclusive events, then P (A ∪ B) = 0.
ii. If X ∼ N (8, 9), then P (X 8) = 0.5.
iii. A p-value can be greater than 1.
iv. A Type II error is to reject a false null hypothesis.
v. Unit non-response occurs when a sampled member fails to respond to a
question in the questionnaire.
(10 marks)
(e) The random variable X takes the values −1, 1 and 3 according to the following
probability distribution:
x        −1     1     3
pX (x)   0.10   k     5k
i. Determine the constant k and, hence, write out the probability distribution
of X.
ii. Find E(X) (the expected value of X).
iii. Find Var(X) (the variance of X).
(6 marks)
(f) The scores on a verbal reasoning test are normally distributed with a population
mean of µ = 100 and a population standard deviation of σ = 12.
i. What is the probability that a randomly chosen person scores at least 118?
ii. A simple random sample of size n = 24 is selected. What is the probability
that the sample mean will be between 96 and 103? (You may use the
nearest values provided in the statistical tables.)
(7 marks)
(g) You are told that a 90% confidence interval for a single population proportion
is (0.3853, 0.5147).
i. What was the sample proportion that led to this confidence interval?
ii. What was the size of the sample used?
(6 marks)
SECTION B
Answer two out of the three questions from this section (25 marks each).
2. (a) A survey was conducted in order to examine whether the final grade of
students taking a class is associated with their attendance of a revision
session a few days before the examination. The data, consisting of
students’ final grades and revision session attendance, are summarised in
the table below.
                                   Final      Final      Final
                                   Grade A    Grade B    Grade C
Attended revision session          56         34         28
Did not attend revision session    44         46         42
i. Based on the data in the table, and without conducting any
significance test, would you say there is an association between final
grade and attending revision? Provide a brief justification for your
answer.
ii. Calculate the χ2 statistic for the hypothesis of independence between
final grade and attending revision, and test that hypothesis. What do
you conclude?
(13 marks)
(b) You work for a market research company and your manager has asked you
to carry out a random sample survey for a laptop company to identify
whether a new laptop model is attractive to females. The main concern
is to produce results of high accuracy. You are being asked to prepare a
brief summary containing the items below.
i. Choose an appropriate probability sampling scheme. Provide a brief
justification for your answer.
ii. Describe the sampling frame and the method of contact you will use.
Briefly explain the reasons for your choices.
iii. Provide an example in which selection bias may occur. State an action
that you would take to address this issue.
iv. State the main research question of the survey. Identify the variables
associated with this question.
(12 marks)
3. (a) A study was conducted to determine whether the yield of olive oil is associated
with the average temperature of the area. The data in the table below provide
the average kilograms of olive oil per tree (y) and the average temperature
(x), measured in degrees Celsius. The data correspond to areas taken for 12
different countries.
(b) A survey was conducted in order to compare the average delivery times (in
minutes) between two pizza companies operating in the same area. A random
sample was drawn consisting of various pizza orders from both companies and
the delivery times were recorded. The data are summarised in the following
table:
4. (a) A large company is checking the salaries of its employees regularly to get an
idea of their distribution. The data below are the salaries (in $000s per year
before tax) of 30 employees.
39 40 44 47 32
37 25 71 56 33
64 63 42 43 34
25 28 35 24 45
35 22 53 55 36
46 46 27 27 38
i. Carefully construct, draw and label a histogram of these data on the graph
paper provided.
ii. Find the mean, the median, the interquartile range and the modal group
on the histogram.
iii. Comment on the data, given the shape of the histogram and the measures
which you have calculated.
(13 marks)
END OF PAPER
ST104a Statistics 1
Examination Formula Sheet
z test of hypothesis for a single mean (σ known):
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
t test of hypothesis for a single mean (σ unknown):
$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$
z test of hypothesis for a single proportion:
$$Z \cong \frac{P - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n}}$$
z test for the difference between two means (variances known):
$$Z = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$
t test for the difference between two means (variances unknown):
$$T = \frac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{S_p^2 (1/n_1 + 1/n_2)}}$$
Confidence interval endpoints for the difference between two means:
$$\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\, n_1+n_2-2} \times \sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
Confidence interval endpoints for the difference in means in paired samples:
$$\bar{x}_d \pm t_{\alpha/2,\, n-1} \times \frac{s_d}{\sqrt{n}}$$
z test for the difference between two proportions:
$$Z = \frac{P_1 - P_2 - (\pi_1 - \pi_2)}{\sqrt{P (1 - P)(1/n_1 + 1/n_2)}}$$
$$a = \bar{y} - b\bar{x}$$
Important note
This commentary reflects the examination and assessment arrangements for this course in the
academic year 2017–18. The format and structure of the examination may change in future years,
and any such changes will be publicised on the virtual learning environment (VLE).
Unless otherwise stated, all cross-references will be to the latest version of the subject guide (2014).
You should always attempt to use the most recent edition of any Essential reading textbook, even if
the commentary and/or online reading list and/or subject guide refer to an earlier edition. If
different editions of Essential reading are listed, please check the VLE for reading supplements – if
none are available, please use the contents list and index of the new edition to find the relevant
section.
General remarks
Learning outcomes
At the end of the half course and having completed the Essential reading and activities you should:
• be familiar with the key ideas of statistics that are accessible to a candidate with a
moderate mathematical competence
• be able to routinely apply a variety of methods for explaining, summarising and presenting
data and interpreting results clearly using appropriate diagrams, titles and labels when
required
• be able to summarise the ideas of randomness and variability, and the way in which these
link to probability theory to allow the systematic and logical collection of statistical
techniques of great practical importance in many applied areas
• have a grounding in probability theory and some grasp of the most common statistical
methods
• be able to perform inference to test the significance of common measures such as means and
proportions and conduct chi-squared tests of contingency tables
• be able to use simple linear regression and correlation analysis and know when it is
appropriate to do so.
You have two hours to complete this paper, which is in two parts. The first part, Section A, is
compulsory; it covers several subquestions and accounts for 50 per cent of the total marks.
Section B contains three questions, each worth 25 per cent, from which you are asked to choose two.
Remember that each of the Section B questions is likely to cover more than one topic. In 2018, for
example, the first part of Question 2 asked for a chi-squared test of association and survey design
problems appeared in the second part. Question 3 began with correlation and linear regression,
followed by hypothesis testing of means and confidence interval construction. Question 4 began with
data presentation and descriptive statistics, while hypothesis testing for proportions appeared in the
second part. This means that it is really important that you make sure you have a reasonable idea of
what topics are covered before you start work on the paper! We suggest you divide your time as
follows during the examination:
• Spend the first 10 minutes annotating the paper. Note the topics covered in each question
and subquestion.
• Allow yourself 45 minutes for Section A. Do not allow yourself to get stuck on any one
question, but do not just give up after two minutes!
• Once you have chosen your two Section B questions, give them about 25 minutes each.
• This leaves you with 15 minutes. Do not leave the examination hall at this point! Check
over any questions you may not have completely finished. Make sure you have labelled and
given a title to any tables or diagrams which were required and, if you did more than the
two questions required in Section B, decide which one to delete. Remember that only two of
your answers will be given credit in Section B and that you must choose which these are!
The examiners are looking for very simple demonstrations from you. They want to be sure that you:
• have covered the syllabus as described and explained in the subject guide
• know the basic formulae given there and when and how to use them
• understand and answer the questions set.
You are not expected to write long essays where explanations or descriptions of sampling design
are required, and note-form answers are acceptable. However, clear and accurate language, both
mathematical and written, is expected and marked. The explanations below and in the specific
commentaries for the papers for each zone should make these requirements clear.
The most important thing you can do is answer the question set! This may sound very simple, but
these are some of the things that candidates did not do, though asked, in the 2018 examinations!
Remember the following.
• If you are asked to label a diagram (which is almost always the case!), please do so. Writing
‘Histogram’ or ‘Stem-and-leaf diagram’ in itself is insufficient. What do the data describe?
What are the units? What are the x-axis and y-axis?
• If you are specifically asked to perform a hypothesis test, or calculate a confidence interval,
do so. It is not acceptable to do one rather than the other! If you are asked to use a 5%
significance level, this is what will be marked.
• Do not waste time calculating things which are not required by the examiners. If you are
asked to find the line of best fit, you will get no marks if you calculate the correlation
coefficient as well. If you are asked to use the confidence interval you have just calculated to
comment on the results, carrying out an additional hypothesis test will not gain you marks.
• When performing calculations try to use as many decimal places as possible in intermediate
steps to reach the most accurate solution. It is advised to have at least two decimal places
in general and at least three decimal places when calculating probabilities.
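To illustrate why intermediate rounding matters, the sketch below borrows the figures from Question 1(f) of the 2018 paper (µ = 100, σ = 10, n = 20) and compares the probability that the sample mean lies between 97 and 104 when the z-values are kept at full precision with the result when they are rounded to one decimal place. The function and variable names are illustrative assumptions, not part of the commentary.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100, 10, 20
standard_error = sigma / sqrt(n)  # standard error of the sample mean
std_normal = NormalDist()


def prob_between(z_low: float, z_high: float) -> float:
    """P(z_low < Z < z_high) for a standard normal Z."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)


z_low = (97 - mu) / standard_error
z_high = (104 - mu) / standard_error

# Full precision versus z-values rounded to one decimal place.
precise = prob_between(z_low, z_high)
rough = prob_between(round(z_low, 1), round(z_high, 1))
print(f"precise: {precise:.4f}, rounded early: {rough:.4f}")
```

The two results differ in the third decimal place, which is the level of accuracy advised for probabilities.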
How should you use the specific comments on each question given in the
Examiners’ commentaries?
We hope that you find these useful. For each question and subquestion, they give:
• further guidance for each question on the points made in the last section
• the answers, or keys to the answers, which the examiners were looking for
• the relevant detailed reference to Newbold, P., W.L. Carlson and B.M. Thorne Statistics for
business and economics. (London: Prentice–Hall, 2012) eighth edition [ISBN
9780273767060] and the subject guide
• where appropriate, suggested activities from the subject guide which should help you to
prepare, and similar questions from Newbold et al. (2012).
Any further references you might need are given in the part of the subject guide to which you are
referred for each answer.
It was noted recently that a small number of candidates appeared to be memorising answers from
previous years’ Examiners’ commentaries, for example plots, and produced the exact same image of
them without looking at the current year’s examination paper questions! Note that this is very easy
to spot. The Examiners’ commentaries should be used as a guide to practise on sample examination
questions and it is pointless to attempt to memorise them.
Many candidates are disappointed to find that their examination performance is poorer than they
expected. This may be due to a number of reasons, but one particular failing is ‘question
spotting’, that is, confining your examination preparation to a few questions and/or topics which
have come up in past papers for the course. This can have serious consequences.
We recognise that candidates might not cover all topics in the syllabus in the same depth, but you
need to be aware that examiners are free to set questions on any aspect of the syllabus. This
means that you need to study enough of the syllabus to enable you to answer the required number of
examination questions.
The syllabus can be found in the Course information sheet available on the VLE. You should read
the syllabus carefully and ensure that you cover sufficient material in preparation for the
examination. Examiners will vary the topics and questions from year to year and may well set
questions that have not appeared in past papers. Examination papers may legitimately include
questions on any topic in the syllabus. So, although past papers can be helpful during your revision,
you cannot assume that topics or specific questions that have come up in past examinations will
occur again.
If you rely on a question-spotting strategy, it is likely you will find yourself in difficulties
when you sit the examination. We strongly advise you not to adopt this strategy.
Candidates should answer THREE of the following FOUR questions: QUESTION 1 of Section
A (50 marks) and TWO questions from Section B (25 marks each). Candidates are strongly
advised to divide their time accordingly.
Section A
Question 1
(a) Suppose that x1 = −0.5, x2 = 2.5, x3 = −2.8, x4 = 0.4, x5 = 6.1, and y1 = −0.5,
y2 = 4.0, y3 = 4.6, y4 = −2.0, y5 = 0. Calculate the following quantities:
i. $\sum_{i=3}^{5} x_i^2$   ii. $\sum_{i=1}^{2} \frac{1}{x_i y_i}$   iii. $y_4^3 + \sum_{i=4}^{5} \frac{y_i^2}{x_i}$.
(6 marks)
i. We have:
$$\sum_{i=3}^{5} x_i^2 = (-2.8)^2 + (0.4)^2 + (6.1)^2 = 7.84 + 0.16 + 37.21 = 45.21.$$
ii. We have:
$$\sum_{i=1}^{2} \frac{1}{x_i y_i} = \frac{1}{(-0.5) \times (-0.5)} + \frac{1}{2.5 \times 4.0} = 4 + 0.1 = 4.1.$$
iii. We have:
$$y_4^3 + \sum_{i=4}^{5} \frac{y_i^2}{x_i} = (-2.0)^3 + \frac{(-2.0)^2}{0.4} + \frac{0^2}{6.1} = -8 + 10 = 2.$$
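The three sums can be verified with a short script. This is a minimal sketch in Python, using the x and y values given in the question; the variable names are illustrative, and the `i - 1` offsets convert the question's 1-based subscripts to Python's 0-based list indices.

```python
x = [-0.5, 2.5, -2.8, 0.4, 6.1]
y = [-0.5, 4.0, 4.6, -2.0, 0.0]

# Part i: sum of x_i^2 for i = 3, 4, 5.
part_i = sum(x[i - 1] ** 2 for i in range(3, 6))

# Part ii: sum of 1 / (x_i * y_i) for i = 1, 2.
part_ii = sum(1 / (x[i - 1] * y[i - 1]) for i in range(1, 3))

# Part iii: y_4^3 plus the sum of y_i^2 / x_i for i = 4, 5.
part_iii = y[3] ** 3 + sum(y[i - 1] ** 2 / x[i - 1] for i in range(4, 6))

print(round(part_i, 2), round(part_ii, 2), round(part_iii, 2))
```

A common slip in this type of question is to start or stop the summation at the wrong index, so it helps to write out the range explicitly as above.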
(b) Classify each one of the following variables as either measurable (continuous) or
categorical. If a variable is categorical, further classify it as nominal or ordinal.
Justify your answer. (No marks will be awarded without a justification.)
i. Gross domestic product (GDP) of a country.
ii. Five possible responses to a customer satisfaction survey ranging from ‘very
satisfied’ to ‘very dissatisfied’.
iii. A person’s name.
(6 marks)
(c) State whether the following are true or false and give a brief explanation. (No
marks will be awarded for a plain true/false answer.)
i. If A and B are independent events, then P (A ∩ B) = P (A)/P (B).
ii. If X ∼ N (3, 4), then P (X ≤ 3) = 0.5.
iii. A p-value can be negative.
(e) The random variable X takes the values −1, 1 and 2 according to the following
probability distribution:
x −1 1 2
pX (x) 0.20 k 4k
i. Determine the constant k and, hence, write down the probability distribution
of X.
ii. Find E(X) (the expected value of X).
iii. Find Var(X) (the variance of X).
(6 marks)
iii. We have:
$$E(X^2) = \sum_i x_i^2 \, p(x_i) = (-1)^2 \times 0.20 + 1^2 \times 0.16 + 2^2 \times 0.64 = 2.92$$
hence:
$$\text{Var}(X) = 2.92 - (1.24)^2 = 1.3824.$$
An alternative method to find the variance is through the formula $\sum_i (x_i - \mu)^2 \, p(x_i)$,
where $\mu = E(X)$ was found in part ii.
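The arithmetic in parts i. to iii. can be checked numerically. The following is a minimal Python sketch rather than part of the original commentary; the variable names are illustrative assumptions, and exact fractions are used so that no rounding error creeps in. It evaluates the variance with both the shortcut formula and the alternative formula mentioned above.

```python
from fractions import Fraction

# Values of X; the probabilities are 0.20, k and 4k and must sum to 1.
values = [-1, 1, 2]
fixed_prob = Fraction(20, 100)   # P(X = -1) = 0.20
k_multiples = [1, 4]             # P(X = 1) = k, P(X = 2) = 4k

# Part i: 0.20 + k + 4k = 1, so k = (1 - 0.20) / 5.
k = (1 - fixed_prob) / sum(k_multiples)
probs = [fixed_prob] + [m * k for m in k_multiples]

# Part ii: E(X) is the sum of x * p(x).
mu = sum(x * p for x, p in zip(values, probs))

# Part iii, shortcut formula: Var(X) = E(X^2) - (E(X))^2.
ex2 = sum(x**2 * p for x, p in zip(values, probs))
var_shortcut = ex2 - mu**2

# Part iii, alternative formula: Var(X) = sum of (x - mu)^2 * p(x).
var_direct = sum((x - mu) ** 2 * p for x, p in zip(values, probs))

print(float(k), float(mu), float(ex2), float(var_shortcut), float(var_direct))
```

Both variance formulas return the same value, which is a useful self-check in the examination if time allows.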
(g) You are told that a 99% confidence interval for a single population proportion is
(0.3676, 0.5324).
i. What was the sample proportion that led to this confidence interval?
ii. What was the size of the sample used?
(6 marks)
i. The sample proportion, p, must be in the centre of the interval (0.3676, 0.5324). Adding
the two endpoints and dividing by 2 gives p = (0.3676 + 0.5324)/2 = 0.45.
ii. The (estimated) standard error when estimating a single proportion is:
$$\sqrt{\frac{p(1-p)}{n}} = \frac{\sqrt{0.45 \times 0.55}}{\sqrt{n}} = \frac{0.4975}{\sqrt{n}}.$$
Since this is a 100(1 − α)% = 99% confidence interval, then α = 0.01, so the confidence
coefficient is $z_{\alpha/2} = z_{0.005} = 2.576$. Therefore, to determine n we need to solve:
$$2.576 \times \frac{0.4975}{\sqrt{n}} = 0.5324 - 0.45 = 0.0824.$$
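The equation for n can be solved by rearranging for the square root of n and squaring. The sketch below is an illustration only: the function name `sample_size_from_ci` is an assumption, and it presumes the interval has the usual form p ± z × s.e. with the z-value taken from tables.

```python
def sample_size_from_ci(lower, upper, z):
    """Recover the sample proportion and sample size from a CI for a proportion.

    Assumes the interval is p +/- z * sqrt(p * (1 - p) / n).
    """
    p = (lower + upper) / 2        # centre of the interval
    margin = (upper - lower) / 2   # half-width, equal to z * standard error
    n = (z / margin) ** 2 * p * (1 - p)
    return p, n

p, n = sample_size_from_ci(0.3676, 0.5324, z=2.576)
print(round(p, 2), round(n))
```

With the endpoints above this gives a sample size of about 242, rounded to the nearest whole number.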
Section B
Answer two out of the three questions from this section (25 marks each).
Question 2
(a) An experiment was conducted to examine whether age, in particular being over
30 or not, has any effect on preferences for a digital or an analogue watch.
Specifically, 129 randomly-selected people were asked what watch they prefer
and their responses are summarised in the table below:
i. Based on the data in the table, and without conducting any significance test,
would you say there is an association between age and watch preference?
Provide a brief justification for your answer.
ii. Calculate the χ² statistic for the hypothesis of independence between age
and watch preference, and test that hypothesis. What do you conclude?
(13 marks)
which gives a test statistic value of 24.146. This is a 2 × 3 contingency table so the
degrees of freedom are (2 − 1) × (3 − 1) = 2.
For α = 0.05, the critical value is 5.991, hence reject H0 .
For α = 0.01, the critical value is 9.210, hence reject H0 .
We conclude that there is strong evidence of an association between age and watch
preference.
Many candidates looked up the statistical tables incorrectly and so failed to follow
through their earlier accurate work.
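The table look-up for 2 degrees of freedom can be cross-checked without tables, because the chi-squared distribution with 2 degrees of freedom has upper-tail probability exp(−x/2). The sketch below relies on that fact; the function names are illustrative assumptions, and it only applies when the degrees of freedom equal 2.

```python
import math

def chi2_df2_critical_value(alpha):
    """Upper-tail critical value of the chi-squared distribution with 2 df."""
    return -2 * math.log(alpha)

def chi2_df2_p_value(statistic):
    """Upper-tail p-value for a chi-squared statistic with 2 df."""
    return math.exp(-statistic / 2)

statistic = 24.146  # test statistic value reported in the commentary
for alpha in (0.05, 0.01):
    critical = chi2_df2_critical_value(alpha)
    decision = "reject H0" if statistic > critical else "do not reject H0"
    print(f"alpha = {alpha}: critical value = {critical:.3f}, {decision}")
print(f"p-value = {chi2_df2_p_value(statistic):.2e}")
```

The computed critical values agree with the tabulated 5.991 and 9.210, and the very small p-value is consistent with the conclusion of strong evidence against independence.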
(b) You work for a market research company and your manager has asked you to
carry out a random sample survey for a mobile phone company to identify
whether a recently launched mobile phone is attractive to people over 40 years
old. Limited time and money resources are available at your disposal. You are
being asked to prepare a brief summary containing the items below.
i. Choose an appropriate probability sampling scheme. Provide a brief
justification for your answer.
ii. Describe the sampling frame and the method of contact you will use. Briefly
explain the reasons for your choices.
iii. Provide an example in which response bias may occur. State an action that
you would take to address this issue.
iv. State the main research question of the survey. Identify the variables
associated with this question.
(12 marks)
is a waste of your valuable examination time. If you can identify what is being asked, keep
in mind that the answer should not be long.
Note also that in some cases there is no single right answer to the question. Some suggested
answers are given below.
i. Cluster sampling is appropriate here due to the cost issue (the subject guide emphasises
its use for reasons of economy). Also, multistage sampling is an option. Although the
question mentions limited time, discussion of quota sampling (for speed) gained no
marks due to the question stressing a ‘probability sampling scheme’.
ii. The question requires:
∗ a description of a sampling frame
∗ a justification of its choice
∗ mentioning a (sensible) contact method
∗ stating an advantage of the contact method mentioned above.
A suggested answer is given below.
The sampling frame could be an email list from the records of the mobile phone
company, which should be easy to obtain. Assuming that the new type is a smartphone,
the method of contact can be via email. Alternatively, if the new type is not a
smartphone, some basic questions can be sent by text. Method of contact by post would
be too expensive compared with the other two methods.
iii. The question requires an example of response bias and an action suggested to address
this issue.
Those least comfortable with the phone are unlikely to reply, so the questionnaire should
be designed appropriately. Also, busy people may not want to spend time on such
market research. A reward of, say, ten free text messages could be offered as an incentive
so that the cost remains low.
iv. A suggested answer for the question is ‘Are attitudes to the new phone more favourable
to younger owners?’.
In terms of variables one could mention ‘age’ and ‘measure of favourableness toward the
new phone’.
Question 3
(a) An area manager in a department store wants to study the relationship between
the number of workers on duty, x, and the value of merchandise lost to
shoplifters, y, in $. To do so, the manager assigned a different number of
workers for each of 10 weeks. The results were as follows:
Week #1 #2 #3 #4 #5 #6 #7 #8 #9 #10
x 9 11 12 13 15 18 16 14 12 10
y 420 350 360 300 225 200 230 280 315 410
The summary statistics for these data are:
Sum of x data: 130 Sum of the squares of x data: 1760
Sum of y data: 3090 Sum of the squares of y data: 1007750
Sum of the products of x and y data: 38305
i. Draw a scatter diagram of these data on the graph paper provided. Carefully
label the diagram.
ii. Calculate the sample correlation coefficient. Interpret its value.
iii. Calculate and report the least squares line of y on x. Draw the line on the
scatter diagram.
iv. Based on the regression model above, what will be the predicted loss from
shoplifting when there are 17 workers on duty? Would you trust this value?
Justify your answer.
(13 marks)
[Scatter diagram of the value of merchandise lost to shoplifters, y (in $), against the number of workers on duty, x.]
ii. The summary statistics can be substituted into the formula for the sample correlation
coefficient (make sure you know which one it is!) to obtain the value r = −0.9688. An
interpretation of this value is the following – the data suggest that the higher the number
of workers, the lower the loss from shoplifters. The fact that the value is very close to −1
suggests that this is a strong, negative, linear association.
Many candidates did not mention all three words (strong, negative, linear). Note that all
of these words provide useful information on interpreting the association and are,
therefore, required to obtain full marks.
iii. The regression line can be written by the equation $\hat{y} = a + bx$ or $y = a + bx + \varepsilon$. The
formula for b is:
$$b = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}$$
and by substituting the summary statistics we get b = −26.64.
The formula for a is $a = \bar{y} - b\bar{x}$, and we get a = 655.36.
Hence the regression line can be written as:
$$\hat{y} = 655.36 - 26.64x.$$
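The values of r, b and a quoted for this question follow directly from the summary statistics given in the question. The sketch below reproduces them; the variable names and the `predict` helper are illustrative assumptions, not notation from the subject guide.

```python
import math

# Summary statistics given in the question (n = 10 weeks).
n = 10
sum_x, sum_x2 = 130, 1760
sum_y, sum_y2 = 3090, 1007750
sum_xy = 38305

x_bar, y_bar = sum_x / n, sum_y / n

# Corrected sums of squares and products.
s_xy = sum_xy - n * x_bar * y_bar
s_xx = sum_x2 - n * x_bar**2
s_yy = sum_y2 - n * y_bar**2

r = s_xy / math.sqrt(s_xx * s_yy)   # sample correlation coefficient
b = s_xy / s_xx                     # least squares slope
a = y_bar - b * x_bar               # least squares intercept

def predict(x):
    """Predicted loss (in $) when x workers are on duty."""
    return a + b * x

print(f"r = {r:.4f}, b = {b:.2f}, a = {a:.2f}, prediction at x = 17: {predict(17):.2f}")
```

For part iv., the same line gives a predicted loss of roughly $202 for 17 workers. Since 17 lies within the observed range of x (9 to 18), this is interpolation rather than extrapolation.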
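$$H_0: \mu_1 = \mu_2 \quad \text{vs.} \quad H_1: \mu_1 \neq \mu_2.$$
The test statistic formulae, depending on whether or not a pooled variance is used, are
provided on the formula sheet:
$$\frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 (1/n_1 + 1/n_2)}} \quad \text{or} \quad \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}.$$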
If equal variances are assumed, the test statistic value is 2.1449. If equal variances are
not assumed, the test statistic value is 2.1164.
The variances are unknown but the sample size is large enough, so the standard normal
distribution can be used due to the central limit theorem. The $t_{40}$ distribution is also
correct and will be used in what follows.
The critical values at the 5% significance level are ±2.021, hence we reject the null
hypothesis. If we take a (smaller) α of 1%, the critical values are ±2.704, so we do not
reject H0 . We conclude that there is moderate evidence of a difference between the two
tutoring groups.
ii. The assumptions for part i. relate to the following.
• Assumption about equal variances.
• Assumption about whether nA + nB is ‘large’ so that the normality assumption is
satisfied.
• Assumption about independent samples.
• Assumption about normality.
Some candidates stated assumptions in this part that were not made in part i. Marks
were not awarded in such cases. Also, some other candidates just memorised the phrase
‘assumption about equal variances’ and, naturally, were not awarded any marks. One
should state whether the calculations were based on the assumption that the unknown
variances are equal or unequal.
iii. It is important to identify the correct formula for this confidence interval and substitute
correctly the elements required. Assuming a t distribution with 21 degrees of freedom,
the correct t-value is 1.721. The interval can be worked out as:
$$65.33 \pm 1.721 \times \frac{6.61}{\sqrt{22}}.$$
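The endpoints of this interval can be evaluated as follows. This is a small sketch under the assumption, read off from the formula above, that 65.33 is the sample mean, 6.61 the sample standard deviation and 22 the sample size; the function name `t_interval` is illustrative.

```python
import math

def t_interval(mean, sd, n, t_value):
    """Confidence interval of the form mean +/- t_value * sd / sqrt(n)."""
    margin = t_value * sd / math.sqrt(n)
    return mean - margin, mean + margin

lower, upper = t_interval(mean=65.33, sd=6.61, n=22, t_value=1.721)
print(f"({lower:.2f}, {upper:.2f})")
```

This gives an interval of approximately (62.90, 67.76).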
Question 4
(a) A sales department monitors the distribution of orders by their value (in £s).
The data below are the values of 30 recent orders:
76 59 93 87 38
50 56 123 45 67
102 34 54 85 85
50 44 33 51 40
82 92 79 38 86
34 29 107 63 46
i. Carefully construct, draw and label a histogram of these data on the graph
paper provided.
ii. Find the mean, the median, the interquartile range and the modal group on
the histogram.
iii. Comment on the data, given the shape of the histogram and the measures
which you have calculated.
(13 marks)
[Histogram of the 30 order values (in £), with frequency density on the vertical axis.]
ii. ∗ Mean: £64.27. Note: Make sure to mention the units to get the full marks.
∗ Median: £57.50. Note: The raw data should be used.
∗ Modal group: between £40 and £59. Note: between £41 and £60 would also be
acceptable.
∗ Correct values of quartiles. Q1 = £44.25 and Q3 = £85.00. Note: Any reasonable
method for quartile calculations would be acceptable.
∗ Interquartile range: £85 − £44.25 = £40.75.
iii. The distribution of the data appears to be slightly positively/right-skewed. This is also
supported by the fact that the mean is larger than the median.
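The summary measures quoted in part ii. can be reproduced from the raw data. The sketch below uses Python's standard `statistics` module; the choice of `method="inclusive"` for the quartiles is an assumption that happens to match the values quoted, and, as the commentary notes, other reasonable quartile conventions are acceptable.

```python
import statistics

# Values of the 30 orders (in pounds), as listed in the question.
orders = [
    76, 59, 93, 87, 38,
    50, 56, 123, 45, 67,
    102, 34, 54, 85, 85,
    50, 44, 33, 51, 40,
    82, 92, 79, 38, 86,
    34, 29, 107, 63, 46,
]

mean = statistics.mean(orders)
median = statistics.median(orders)
# "inclusive" interpolates linearly between order statistics.
q1, _, q3 = statistics.quantiles(orders, n=4, method="inclusive")
iqr = q3 - q1

print(f"mean = {mean:.2f}, median = {median}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

The mean exceeding the median (64.27 against 57.50) is the numerical counterpart of the right skew seen in the histogram.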
which follows a standard normal distribution, approximately, due to the central limit
theorem.
∗ Calculation of standard error:
$$\text{s.e.}(\pi_T - \pi_P) = \sqrt{\frac{41}{70} \times \frac{29}{70} \times \left(\frac{1}{40} + \frac{1}{30}\right)} = 0.119.$$
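The standard error can be checked with a few lines of code. The sketch below assumes, from the structure of the formula, that 41/70 is the pooled sample proportion and that 40 and 30 are the two sample sizes; the function name is illustrative.

```python
import math

def pooled_standard_error(pooled_successes, n_1, n_2):
    """Standard error of a difference in proportions using a pooled proportion."""
    pooled = pooled_successes / (n_1 + n_2)
    return math.sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))

se = pooled_standard_error(41, 40, 30)
print(f"s.e. = {se:.3f}")
```

The test statistic is then the difference between the two sample proportions divided by this standard error.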
Important note
This commentary reflects the examination and assessment arrangements for this course in the
academic year 2017–18. The format and structure of the examination may change in future years,
and any such changes will be publicised on the virtual learning environment (VLE).
Unless otherwise stated, all cross-references will be to the latest version of the subject guide (2014).
You should always attempt to use the most recent edition of any Essential reading textbook, even if
the commentary and/or online reading list and/or subject guide refer to an earlier edition. If
different editions of Essential reading are listed, please check the VLE for reading supplements – if
none are available, please use the contents list and index of the new edition to find the relevant
section.
Candidates should answer THREE of the following FOUR questions: QUESTION 1 of Section
A (50 marks) and TWO questions from Section B (25 marks each). Candidates are strongly
advised to divide their time accordingly.
Section A
Question 1
(a) Suppose that x1 = −0.2, x2 = 2.5, x3 = −3.7, x4 = 0.8, x5 = 7.4, and y1 = −0.2,
y2 = 8.0, y3 = 3.9, y4 = −2.0, y5 = 0. Calculate the following quantities:
i. $\sum_{i=3}^{5} x_i^2$    ii. $\sum_{i=1}^{2} \frac{1}{x_i y_i}$    iii. $y_4^3 + \sum_{i=4}^{5} \frac{y_i^2}{x_i}$.
(6 marks)
i. We have:
$$\sum_{i=3}^{5} x_i^2 = (-3.7)^2 + (0.8)^2 + (7.4)^2 = 13.69 + 0.64 + 54.76 = 69.09.$$
ii. We have:
$$\sum_{i=1}^{2} \frac{1}{x_i y_i} = \frac{1}{(-0.2) \times (-0.2)} + \frac{1}{2.5 \times 8.0} = 25 + 0.05 = 25.05.$$
iii. We have:
$$y_4^3 + \sum_{i=4}^{5} \frac{y_i^2}{x_i} = (-2.0)^3 + \frac{(-2.0)^2}{0.8} + \frac{0^2}{7.4} = -8 + 5 = -3.$$
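Summation questions of this kind mainly test careful reading of the index limits. The following sketch, offered as a check rather than as examination technique, evaluates the three quantities directly; the lists are padded with an unused first entry purely so that `x[i]` corresponds to $x_i$.

```python
# Values from the question; index 0 is unused so that x[i] matches x_i.
x = [None, -0.2, 2.5, -3.7, 0.8, 7.4]
y = [None, -0.2, 8.0, 3.9, -2.0, 0.0]

# i. sum of x_i^2 for i = 3, 4, 5
part_i = sum(x[i] ** 2 for i in range(3, 6))

# ii. sum of 1 / (x_i * y_i) for i = 1, 2
part_ii = sum(1 / (x[i] * y[i]) for i in range(1, 3))

# iii. y_4^3 plus the sum of y_i^2 / x_i for i = 4, 5
part_iii = y[4] ** 3 + sum(y[i] ** 2 / x[i] for i in range(4, 6))

print(round(part_i, 2), round(part_ii, 2), round(part_iii, 2))
```

Note that `range(3, 6)` covers i = 3, 4, 5, mirroring the lower and upper limits of the first summation.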
(b) Classify each one of the following variables as either measurable (continuous) or
categorical. If a variable is categorical, further classify it as nominal or ordinal.
Justify your answer. (No marks will be awarded without a justification.)
i. A person’s nationality.
ii. The unemployment rate of a country.
iii. Responses to a customer opinion survey ranging from ‘strongly agree’ to
‘strongly disagree’.
(6 marks)
(c) State whether the following are true or false and give a brief explanation. (No
marks will be awarded for a plain true/false answer.)
i. If A and B are mutually exclusive events, then P (A ∪ B) = 0.
ii. If X ∼ N (8, 9), then P (X ≥ 8) = 0.5.
iii. A p-value can be greater than 1.
(e) The random variable X takes the values −1, 1 and 3 according to the following
probability distribution:
x          −1      1      3
p_X(x)    0.10     k     5k
i. Determine the constant k and, hence, write down the probability distribution
of X.
ii. Find E(X) (the expected value of X).
iii. Find Var(X) (the variance of X).
(6 marks)
iii. We have:
$$E(X^2) = \sum_i x_i^2 \, p(x_i) = (-1)^2 \times 0.10 + 1^2 \times 0.15 + 3^2 \times 0.75 = 7$$
hence:
$$\text{Var}(X) = 7 - (2.3)^2 = 1.71.$$
An alternative method to find the variance is through the formula $\sum_i (x_i - \mu)^2 \, p(x_i)$,
where $\mu = E(X)$ was found in part ii.
(g) You are told that a 90% confidence interval for a single population proportion is
(0.3853, 0.5147).
i. What was the sample proportion that led to this confidence interval?
ii. What was the size of the sample used?
(6 marks)
i. The sample proportion, p, must be in the centre of the interval (0.3853, 0.5147). Adding
the two endpoints and dividing by 2 gives p = (0.3853 + 0.5147)/2 = 0.45.
ii. The (estimated) standard error when estimating a single proportion is:
$$\sqrt{\frac{p(1-p)}{n}} = \frac{\sqrt{0.45 \times 0.55}}{\sqrt{n}} = \frac{0.4975}{\sqrt{n}}.$$
Since this is a 100(1 − α)% = 90% confidence interval, then α = 0.1, so the confidence
coefficient is $z_{\alpha/2} = z_{0.05} = 1.645$. Therefore, to determine n we need to solve:
$$1.645 \times \frac{0.4975}{\sqrt{n}} = 0.5147 - 0.45 = 0.0647.$$
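The remaining algebra is a rearrangement for the square root of n followed by squaring. A worked version, using only the figures already stated and rounding at each step, is sketched below:

```latex
\begin{align*}
1.645 \times \frac{0.4975}{\sqrt{n}} = 0.0647
  &\;\Longrightarrow\; \sqrt{n} = \frac{1.645 \times 0.4975}{0.0647} \approx 12.65 \\
  &\;\Longrightarrow\; n \approx 12.65^2 \approx 160.
\end{align*}
```

Since n must be a whole number, the answer is reported as a sample size of about 160.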
Section B
Answer two out of the three questions from this section (25 marks each).
Question 2
(a) A survey was conducted in order to examine whether the final grade of students
taking a class is associated with their attendance of a revision session a few days
before the examination. The data, consisting of students’ final grades and
revision session attendance, are summarised in the table below.
                                   Final     Final     Final
                                   Grade A   Grade B   Grade C
Attended revision session            56        34        28
Did not attend revision session      44        46        42
i. Based on the data in the table, and without conducting any significance test,
would you say there is an association between final grade and attending
revision? Provide a brief justification for your answer.
ii. Calculate the χ² statistic for the hypothesis of independence between final
grade and attending revision, and test that hypothesis. What do you
conclude?
(13 marks)
which gives a test statistic value of 5.273. This is a 2 × 3 contingency table so the degrees
of freedom are (2 − 1) × (3 − 1) = 2.
For α = 0.05, the critical value is 5.991, hence do not reject H0 .
For α = 0.1, the critical value is 4.605, hence reject H0 .
We conclude that there is weak evidence of an association between attending revision
and final grade.
Many candidates looked up the statistical tables incorrectly and so failed to follow
through their earlier accurate work.
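The test statistic of 5.273 can be reproduced from the observed frequencies in the question. The sketch below computes the expected frequencies under independence (row total × column total / grand total) and sums the contributions (O − E)²/E over all six cells; the variable names are illustrative.

```python
# Observed frequencies from the question.
observed = [
    [56, 34, 28],   # attended revision session: grades A, B, C
    [44, 46, 42],   # did not attend revision session: grades A, B, C
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(f"chi-squared = {chi2:.3f} on {df} degrees of freedom")
```

The statistic lies between the 10% critical value (4.605) and the 5% critical value (5.991), which is why the evidence is described as weak.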
(b) You work for a market research company and your manager has asked you to
carry out a random sample survey for a laptop company to identify whether a
new laptop model is attractive to females. The main concern is to produce
results of high accuracy. You are being asked to prepare a brief summary
containing the items below.
i. Choose an appropriate probability sampling scheme. Provide a brief
justification for your answer.
ii. Describe the sampling frame and the method of contact you will use. Briefly
explain the reasons for your choices.
iii. Provide an example in which selection bias may occur. State an action that
you would take to address this issue.
iv. State the main research question of the survey. Identify the variables
associated with this question.
(12 marks)
response. If you are unsure of what these things are, do not write lengthy answers. This
is a waste of your valuable examination time. If you can identify what is being asked, keep
in mind that the answer should not be long.
Note also that in some cases there is no single right answer to the question. Some suggested
answers are given below.
i. We are asked for accuracy and random (probability) sampling, so a reasonable option
is the use of stratified random sampling, which is known to produce results of high
accuracy. An example of a sampling scheme could be ‘a stratified sample of those
customers who bought this laptop recently’.
ii. The question requires:
∗ a description of a sampling frame
∗ a justification of its choice
∗ mentioning a (sensible) contact method
∗ stating an advantage of the contact method mentioned above.
A suggested answer is given below.
Use a list provided by retailers to identify those who bought this laptop model recently.
The list could include the postal address, telephone or email. Stratification can be made
by area of country or by gender of buyer. Finally, an explanation as to which you would
prefer – for example, email is fast if all have it but there may be a lot of non-response.
iii. The question requires an example of selection bias and an action suggested to address
this issue.
For example, retailers’ records may be incomplete. Offer incentives to make sure they
keep accurate records.
iv. A suggested answer for the question is ‘How does preference for the laptop model
compare for men and women?’.
In terms of variables one could mention ‘gender’ and ‘buying preference’.
Question 3
(a) A study was conducted to determine whether the yield of olive oil is associated
with the average temperature of the area. The data in the table below provide
the average kilograms of olive oil per tree (y) and the average temperature (x),
measured in degrees Celsius. The data correspond to areas taken for 12
different countries.
[Scatter diagram of average olive oil yield per tree, y (in kg), against average temperature, x (in degrees Celsius).]
ii. The summary statistics can be substituted into the formula for the sample correlation
coefficient (make sure you know which one it is!) to obtain the value r = 0.8049. An
interpretation of this value is the following – the data suggest that the higher the average
temperature, the higher the olive oil yield. The fact that the value is very close to 1
suggests that this is a strong, positive, linear association.
Many candidates did not mention all three words (strong, positive, linear). Note that all
of these words provide useful information on interpreting the association and are,
therefore, required to obtain full marks.
iii. The regression line can be written by the equation $\hat{y} = a + bx$ or $y = a + bx + \varepsilon$. The
formula for b is:
$$b = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}$$
and by substituting the summary statistics we get b = 2.744.
The formula for a is $a = \bar{y} - b\bar{x}$, and we get a = −2.641.
Hence the regression line can be written as:
$$\hat{y} = -2.641 + 2.744x.$$
(b) A survey was conducted in order to compare the average delivery times (in
minutes) between two pizza companies operating in the same area. A random
sample was drawn consisting of various pizza orders from both companies and
the delivery times were recorded. The data are summarised in the following
table:
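$$H_0: \mu_1 = \mu_2 \quad \text{vs.} \quad H_1: \mu_1 \neq \mu_2.$$
The test statistic formulae, depending on whether or not a pooled variance is used, are
provided on the formula sheet:
$$\frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 (1/n_1 + 1/n_2)}} \quad \text{or} \quad \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}.$$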
If equal variances are assumed, the test statistic value is 3.8180. If equal variances are
not assumed, the test statistic value is 4.1639.
The variances are unknown but the sample size is large enough, so the standard normal
distribution can be used due to the central limit theorem. The $t_{60}$ distribution is also
correct and will be used in what follows.
The critical values at the 5% significance level are ±2.000, hence we reject the null
hypothesis. If we take a (smaller) α of 1%, the critical values are ±2.660, so we still reject
H0 . We conclude that there is strong evidence of a difference between the two companies.
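The two versions of the test statistic differ only in how the standard error is built. The sketch below implements both; the function name is an assumption, the pooled variance is taken to be the usual weighted average of the two sample variances, and the numbers passed in are hypothetical values chosen for illustration, not the data from the examination paper.

```python
import math

def two_sample_t(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Return (pooled, unpooled) test statistics for H0: mu_1 = mu_2."""
    diff = mean_1 - mean_2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / (n_1 + n_2 - 2)
    t_pooled = diff / math.sqrt(pooled_var * (1 / n_1 + 1 / n_2))
    t_unpooled = diff / math.sqrt(sd_1**2 / n_1 + sd_2**2 / n_2)
    return t_pooled, t_unpooled

# Hypothetical summary statistics, NOT taken from the examination paper.
t_pooled, t_unpooled = two_sample_t(10.0, 2.0, 25, 9.0, 3.0, 16)
print(f"pooled: {t_pooled:.4f}, unpooled: {t_unpooled:.4f}")
```

Whichever version is used, the answer to part ii. should state the matching assumption about the variances.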
ii. The assumptions for part i. relate to the following.
• Assumption about equal variances.
• Assumption about whether nA + nB is ‘large’ so that the normality assumption is
satisfied.
• Assumption about independent samples.
• Assumption about normality.
Some candidates stated assumptions in this part that were not made in part i. Marks
were not awarded in such cases. Also, some other candidates just memorised the phrase
‘assumption about equal variances’ and, naturally, were not awarded any marks. One
should state whether the calculations were based on the assumption that the unknown
variances are equal or unequal.
iii. It is important to identify the correct formula for this confidence interval and substitute
correctly the elements required. Assuming a t distribution with 28 degrees of freedom,
the correct t-value is 2.467. The interval can be worked out as:
$$27.5 \pm 2.467 \times \frac{1.1}{\sqrt{29}}.$$
Question 4
(a) A large company is checking the salaries of its employees regularly to get an
idea of their distribution. The data below are the salaries (in $000s per year
before tax) of 30 employees.
39 40 44 47 32
37 25 71 56 33
64 63 42 43 34
25 28 35 24 45
35 22 53 55 36
46 46 27 27 38
i. Carefully construct, draw and label a histogram of these data on the graph
paper provided.
ii. Find the mean, the median, the interquartile range and the modal group on
the histogram.
iii. Comment on the data, given the shape of the histogram and the measures
which you have calculated.
(13 marks)
[Histogram of the 30 salaries (in $000s), with frequency density on the vertical axis and salaries from 20 to 80 on the horizontal axis.]
ii. ∗ Mean: $40,400. Note: Make sure to mention the units to get the full marks.
∗ Median: $38,500. Note: The raw data should be used.
∗ Modal group: between $30,000 and $39,000. Note: between $31,000 and $40,000
would also be acceptable.
∗ Correct values of quartiles. Q1 = $32,250 and Q3 = $46,000. Note: Any reasonable
method for quartile calculations would be acceptable.
∗ Interquartile range: $46,000 − $32,250 = $13,750.
iii. The distribution of the data appears to be slightly positively/right-skewed. This is also
supported by the fact that the mean is larger than the median.
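The class frequencies and frequency densities behind the histogram can be tabulated from the raw data. The sketch below assumes classes of width 10 starting at 20 (that is, 20 to under 30, 30 to under 40, and so on), a choice consistent with the modal group quoted above; other sensible class boundaries are possible.

```python
# Salaries (in $000s per year before tax), as listed in the question.
salaries = [
    39, 40, 44, 47, 32,
    37, 25, 71, 56, 33,
    64, 63, 42, 43, 34,
    25, 28, 35, 24, 45,
    35, 22, 53, 55, 36,
    46, 46, 27, 27, 38,
]

class_width = 10
lower_bounds = range(20, 80, class_width)   # assumed classes [20, 30), ..., [70, 80)

frequencies = {
    lower: sum(lower <= s < lower + class_width for s in salaries)
    for lower in lower_bounds
}
# Frequency density = frequency / class width.
densities = {lower: count / class_width for lower, count in frequencies.items()}
modal_class = max(frequencies, key=frequencies.get)

for lower in lower_bounds:
    print(f"[{lower}, {lower + class_width}): "
          f"frequency = {frequencies[lower]}, density = {densities[lower]}")
```

The long tail of a few salaries above 50 is what produces the slight right skew noted in the comment on the data.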
which follows a standard normal distribution, approximately, due to the central limit
theorem.
∗ Calculation of standard error:
$$\text{s.e.}(\pi_T - \pi_P) = \sqrt{\frac{44}{70} \times \frac{26}{70} \times \left(\frac{1}{40} + \frac{1}{30}\right)} = 0.117.$$