
3k kertaus stat B [216 marks]
1a. [1 mark]
Anita is concerned that the construction of a new factory will have an adverse effect on the fish in
a nearby lake. Before construction begins she catches fish at random, records their weight and
returns them to the lake. After the construction is finished she collects a second, random sample
of weights of fish from the lake. Her data is shown in the table.

Anita decides to use a t-test, at the 5% significance level, to determine if the mean weight of the
fish changed after construction of the factory.
State an assumption that Anita is making, in order to use a t-test.

Markscheme
EITHER
The weights of the fish are distributed normally.          A1
OR
The variance of the two groups of fish is equal.          A1
[1 mark]
1b. [1 mark]
State the hypotheses for this t-test.

Markscheme
$H_0: \mu_B = \mu_A$ and $H_1: \mu_B \neq \mu_A$          A1

where $\mu_B$ and $\mu_A$ represent the mean weights of the fish before and after construction.


[1 mark]
1c. [3 marks]
Find the p-value for this t-test.

Markscheme
df = 14,  t = 0.861         (M1)
p-value = 0.403         A2
[3 marks]
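The p-value would normally come straight from a calculator's two-sample t-test. As a rough cross-check, the sketch below recovers it from the rounded values t = 0.861 and df = 14 by integrating the t-density numerically. The helper names `t_pdf` and `t_upper_tail` are illustrative and not part of the markscheme, and because t is rounded the result only agrees with 0.403 to about three decimal places.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > |t|): Simpson's rule on [0, |t|], then symmetry of the density."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Two-tailed test, because H1 is "the mean weight changed"
p_two_tailed = 2 * t_upper_tail(0.861, 14)
print(f"p-value is approximately {p_two_tailed:.3f}")
```

For a one-tailed test, `t_upper_tail` is used without doubling.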
1d. [2 marks]
State the conclusion of this test, in context, giving a reason.

Markscheme
Since 0.403 > 0.05               R1
Do not reject H0.
There is insufficient evidence, at the 5% level, of a change in weight.           A1
[2 marks]
2a. [5 marks]
In an effort to study the level of intelligence of students entering college, a psychologist collected
data from 4000 students who were given a standard test. The predictive norms for this particular
test were computed from a very large population of scores having a normal distribution with
mean 100 and standard deviation of 10. The psychologist wishes to determine whether the 4000
test scores he obtained also came from a normal distribution with mean 100 and standard
deviation 10. He prepared the following table (expected frequencies are rounded to the nearest
integer):

 
Copy and complete the table, showing how you arrived at your answers.

Markscheme
To calculate expected frequencies, we multiply 4000 by the probability of each cell:

\[ P(80.5 \le X \le 90.5) = P\left(\frac{80.5 - 100}{10} \le Z \le \frac{90.5 - 100}{10}\right) \]       (M1)
\[ = P(-1.95 \le Z \le -0.95) = 0.1711 - 0.0256 = 0.1455 \]
Therefore, the expected frequency $= 4000 \times 0.1455$       (M1)
          $\approx 582$       (A1)
Similarly: $P(90.5 \le X \le 100.5) = 0.5199 - 0.1711 = 0.3488$
    Frequency $= 4000 \times 0.3488 \approx 1396$ (using unrounded probabilities)       (A1)
And $P(100.5 \le X \le 110.5) = 0.8531 - 0.5199 = 0.3332$
    Frequency $= 4000 \times 0.3332 \approx 1333$       (A1)
[5 marks]
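The three expected frequencies can be reproduced without tables. This is a minimal sketch using Python's standard `statistics.NormalDist`; the variable names are illustrative, and only the three class boundaries worked in the markscheme are used.

```python
from statistics import NormalDist

norm = NormalDist(mu=100, sigma=10)   # distribution under the null hypothesis
n = 4000                              # number of students tested

# Class boundaries for the three missing cells of the table
boundaries = [(80.5, 90.5), (90.5, 100.5), (100.5, 110.5)]

# Expected frequency = n * P(lower <= X <= upper), rounded to the nearest integer
expected = [round(n * (norm.cdf(hi) - norm.cdf(lo))) for lo, hi in boundaries]
print(expected)  # [582, 1396, 1333]
```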
2b. [6 marks]
Test the hypothesis at the 5% level of significance.

Markscheme
To test the goodness of fit of the normal distribution, we use the $\chi^2$ distribution. Since the last cell
has an expected frequency less than 5, it is combined with the cell preceding it. There are
therefore 7 – 1 = 6 degrees of freedom.          (C1)

\[ \chi^2 = \frac{(20-6)^2}{6} + \frac{(90-96)^2}{96} + \frac{(575-582)^2}{582} + \frac{(1282-1396)^2}{1396} + \frac{(1450-1333)^2}{1333} + \frac{(499-507)^2}{507} + \frac{(84-80)^2}{80} \]          (M1)
= 53.03          (A1)
H0: Distribution is Normal with μ=100 and σ =10.
H1: Distribution is not Normal with μ=100 and σ =10.            (M1)

$\chi^2_{(0.95,\,6)} = 12.59$

Since $\chi^2 = 53.0 > \chi^2_{\text{critical}} = 12.59$, we reject H0          (A1)

Or use of p-value
Therefore, we have enough evidence to suggest that the normal distribution with mean 100 and
standard deviation 10 does not fit the data well.          (R1) 
Note: If a candidate has not combined the last 2 cells, award (C0)(M1)(A0)(M1)(A1)(R1) (or as
appropriate).
[6 marks]
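The test statistic is a sum of (observed − expected)² / expected over the seven cells that remain after combining. A short sketch, with the observed and expected counts copied from the markscheme's sum; the function name `chi_squared` is illustrative.

```python
# Observed and expected frequencies after combining the last two cells
observed = [20, 90, 575, 1282, 1450, 499, 84]
expected = [6, 96, 582, 1396, 1333, 507, 80]

def chi_squared(obs, exp):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

stat = chi_squared(observed, expected)
print(round(stat, 2))  # 53.03
```

Most of the statistic comes from the first cell, where 20 scores were observed against an expected 6.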
3. [9 marks]
Six coins are tossed simultaneously 320 times, with the following results.

At the 5% level of significance, test the hypothesis that all the coins are fair.

Markscheme
Let H0 be the hypothesis that all coins are fair,      (C1)
and let H1 be the hypothesis that not all coins are fair.     (C1)
Let T be the number of tails obtained when the six coins are tossed; under H0, T is binomially distributed, T ~ B(6, 0.5).               (M1)

        (A3)
Notes:  Award (A2) if one entry on the third row is incorrect. Award (A1) if two entries on the
third row are incorrect. Award (A0) if three or more entries on the third row are incorrect.

\[ \chi^2_{\text{calc}} = \frac{(5-5)^2}{5} + \frac{(40-30)^2}{30} + \frac{(86-75)^2}{75} + \frac{(89-100)^2}{100} + \frac{(67-75)^2}{75} + \frac{(29-30)^2}{30} + \frac{(4-5)^2}{5} \]
= 7.24           (A1)

Also $\chi^2_{0.05,\,6} = 12.592$          (A1)

Since 7.24 < 12.592, H0 cannot be rejected.         (R1)


[9 marks]
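The expected frequencies in the missing table follow from B(6, 0.5) scaled by the 320 tosses. The sketch below rebuilds them and the statistic. The observed counts are taken from the numerators of the markscheme's sum, and the assumption is that they are listed in order of 0 to 6 tails.

```python
from math import comb

tosses = 320
# Observed counts for 0, 1, ..., 6 tails (read off the markscheme's chi-squared sum)
observed = [5, 40, 86, 89, 67, 29, 4]

# Under H0 each coin is fair, so P(T = k) = C(6, k) / 2^6
expected = [tosses * comb(6, k) / 2 ** 6 for k in range(7)]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(expected)        # [5.0, 30.0, 75.0, 100.0, 75.0, 30.0, 5.0]
print(round(chi2, 2))  # 7.24
```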
4a. [2 marks]
Sue sometimes goes out for lunch. If she goes out for lunch on a particular day then the
probability that she will go out for lunch on the following day is 0.4. If she does not go out for
lunch on a particular day then the probability she will go out for lunch on the following day is 0.3.
Write down the transition matrix for this Markov chain.

Markscheme

\[ \begin{pmatrix} 0.4 & 0.3 \\ 0.6 & 0.7 \end{pmatrix} \]      M1A1

[2 marks]
4b. [2 marks]
Given that she went out for lunch on a particular Sunday, find the probability that she went
out for lunch on the following Tuesday.

Markscheme

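\[ \begin{pmatrix} 0.4 & 0.3 \\ 0.6 & 0.7 \end{pmatrix}^2 \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.34 \\ 0.66 \end{pmatrix} \]     M1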

So probability is 0.34       A1


[2 marks]
4c. [3 marks]
Find the steady state probability vector for this Markov chain.

Markscheme

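\[ \begin{pmatrix} 0.4 & 0.3 \\ 0.6 & 0.7 \end{pmatrix} \begin{pmatrix} p \\ 1-p \end{pmatrix} = \begin{pmatrix} p \\ 1-p \end{pmatrix} \Rightarrow 0.4p + 0.3(1-p) = p \Rightarrow p = \frac{1}{3} \]     M1A1

So vector is $\begin{pmatrix} 1/3 \\ 2/3 \end{pmatrix}$         A1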

[or by investigating high powers of the transition matrix]


[3 marks]
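Both the two-step probability and the steady state can be checked by repeatedly applying the transition matrix to a state vector, which is the "high powers" approach mentioned in the markscheme. A minimal sketch in plain Python; the helper `mat_vec` is an illustrative name, and the matrix columns are taken to be the current state (out for lunch, not out for lunch).

```python
def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a column vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

# Column 1: she went out today; column 2: she did not
M = [[0.4, 0.3],
     [0.6, 0.7]]

# Sunday (went out) -> Monday -> Tuesday is two transitions
tuesday = [1.0, 0.0]
for _ in range(2):
    tuesday = mat_vec(M, tuesday)

# Many transitions approximate the steady state
steady = [1.0, 0.0]
for _ in range(100):
    steady = mat_vec(M, steady)

print(round(tuesday[0], 2))             # 0.34
print([round(s, 4) for s in steady])    # [0.3333, 0.6667]
```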
5a. [4 marks]
A $2 \times 2$ transition matrix for a Markov chain will have the form $M = \begin{pmatrix} a & 1-b \\ 1-a & b \end{pmatrix}$, $0 < a < 1$, $0 < b < 1$.

Show that  λ=1 is always an eigenvalue for M and find the other eigenvalue in terms of a and b .

Markscheme

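\[ \begin{vmatrix} a-\lambda & 1-b \\ 1-a & b-\lambda \end{vmatrix} = 0 \Rightarrow (a-\lambda)(b-\lambda) - (1-b)(1-a) = 0 \]        M1A1

\[ \Rightarrow \lambda^2 - (a+b)\lambda + a + b - 1 = 0 \Rightarrow (\lambda - 1)(\lambda + (1 - a - b)) = 0 \]         A1

\[ \Rightarrow \lambda = 1 \text{ or } \lambda = a + b - 1 \]         AGA1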


[4 marks]
5b. [5 marks]
Find the steady state probability vector for M in terms of a and b .

Markscheme

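\[ \begin{pmatrix} a & 1-b \\ 1-a & b \end{pmatrix} \begin{pmatrix} p \\ 1-p \end{pmatrix} = \begin{pmatrix} p \\ 1-p \end{pmatrix} \Rightarrow ap + 1 - b - p + bp = p \]        M1A1

\[ \Rightarrow 1 - b = (2 - a - b)p \Rightarrow p = \frac{1-b}{2-a-b} \]          M1

So vector is $\begin{pmatrix} \dfrac{1-b}{2-a-b} \\[2ex] \dfrac{1-a}{2-a-b} \end{pmatrix}$           A1A1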

[5 marks]
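A numerical spot check of both results for one choice of a and b. It is not a proof, only a sanity check of the algebra; the function names are illustrative. The values a = 0.4 and b = 0.7 are chosen because they reproduce the matrix from question 4, so the steady state should again be (1/3, 2/3).

```python
import math

def steady_state(a, b):
    """Steady state vector from the markscheme's closed form."""
    denom = 2 - a - b
    return [(1 - b) / denom, (1 - a) / denom]

def eigenvalues(a, b):
    """Eigenvalues of [[a, 1-b], [1-a, b]] from its trace and determinant."""
    trace = a + b
    det = a * b - (1 - a) * (1 - b)
    root = math.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

a, b = 0.4, 0.7
M = [[a, 1 - b], [1 - a, b]]
v = steady_state(a, b)

# M v should equal v if v really is the steady state
Mv = [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]
lam1, lam2 = eigenvalues(a, b)

print([round(x, 4) for x in v])          # [0.3333, 0.6667]
print(round(lam1, 6), round(lam2, 6))    # 1.0 0.1
```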
6a. [1 mark]
A company sends a group of employees on a training course. Afterwards, they survey these
employees to gather data on the effectiveness of the training. In order to test the reliability of the
survey, they design two sets of similar questions, which are given to the employees one week
apart.
State the name of this test for reliability.

Markscheme
Parallel Forms         A1
[1 mark]
6b. [1 mark]
State a possible disadvantage of using this test for reliability.

Markscheme
EITHER
The two sets of questions might not be of equal difficulty         R1
OR
It is time consuming to create two sets of questions           R1        
[1 mark]
6c. [2 marks]
The questions in the survey were grouped in different sections. The mean scores of the
employees on the first section of each survey are given in the table.

Calculate Pearson’s product moment correlation coefficient for this data.

Markscheme
r =0.958           A2
[2 marks]
6d. [2 marks]
Hence determine, with a reason, if the survey is reliable.

Markscheme
Since the value of r is close to +1,           R1
The survey is reliable.           A1
[2 marks]
7a. [1 mark]
As part of the selection process for an engineering course at a particular university, applicants
are given an exam in mathematics. This year the university has produced a new exam and they
want to test if it is a valid indicator of future performance, before giving it to applicants. They
randomly select 8 students in their first year of the engineering course and give them the exam.
They compare the exam scores with their results in the engineering course.
State the name of this test for validity.

Markscheme
criterion-related          A1
[1 mark]
7b. [2 marks]
The results of the 8 students are shown in the table.

Calculate Pearson’s product moment correlation coefficient for this data.

Markscheme
r =0.414          A2
[2 marks]
7c. [2 marks]
Hence determine, with a reason, if the new exam is a valid indicator of future performance.

Markscheme
Since the value of r  is low (closer to 0 than +1),         R1
The new exam is not a valid indicator of future performance.         A1
[2 marks]
8a. [1 mark]
Saloni wants to find a model for the temperature of a bottle of water after she removes it from
the fridge. She uses a temperature probe to record the temperature of the water, every 5 minutes.

After graphing the data, Saloni believes a suitable model will be


$T = 28 - a b^{t}$, where $a, b \in \mathbb{R}^{+}$.
Explain why 28−T can be modeled by an exponential function.

Markscheme
Rearranging the model gives $28 - T = a b^{t}$        A1
So 28−T can be modeled by an exponential function.        AG
[1 mark]
8b. [3 marks]
Find the equation of the least squares exponential regression curve for 28−T .

Markscheme

       (A1)
$28 - T = 22.7(0.925)^{t}$         M1A1

[3 marks]
8c. [1 mark]
Write down the coefficient of determination, $R^2$.

Markscheme
$R^2 = 0.974$         A1
[1 mark]
8d. [1 mark]
Interpret what the value of $R^2$ tells you about the model.

Markscheme
Since the value of $R^2$ is close to +1, the model is a good fit for the data.         R1
[1 mark]
8e. [2 marks]
Hence predict the temperature of the water after 3 minutes.
Markscheme
$T = 28 - (22.69\ldots)(0.9250\ldots)^{3} = 10.0$         M1A1
[2 marks]
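The prediction is a direct substitution into the fitted model. The sketch below uses the rounded coefficients 22.7 and 0.925 from part 8b, so the result differs slightly from one computed with unrounded calculator values; the function name `temperature` is illustrative.

```python
# Rounded regression coefficients from part 8b: 28 - T = a * b**t
a, b = 22.7, 0.925

def temperature(t):
    """Predicted temperature t minutes after the bottle leaves the fridge."""
    return 28 - a * b ** t

print(round(temperature(3), 1))  # 10.0
```

As t grows, `b ** t` tends to 0, so the model levels off at 28.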
9a. [1 mark]
Eggs at a farm are sold in boxes of six. Each egg is either brown or white. The owner believes that
the number of brown eggs in a box can be modelled by a binomial distribution. He examines 100
boxes and obtains the following data.

Calculate the mean number of brown eggs in a box.

Markscheme
Note: Candidates may obtain slightly different numerical answers depending on the calculator and
approach used. Use discretion in marking.
Mean $= \frac{1 \times 29 + \ldots + 6 \times 1}{100} = 1.98$        (A1)
[1 mark]
9b. [1 mark]
Hence estimate p, the probability that a randomly chosen egg is brown.
Markscheme
Note: Candidates may obtain slightly different numerical answers depending on the calculator and
approach used. Use discretion in marking.
$\hat{p} = \frac{1.98}{6} = 0.33$       (A1)
[1 mark]
9c. [8 marks]
By calculating an appropriate $\chi^2$ statistic, test, at the 5% significance level, whether or not the
binomial distribution gives a good fit to these data.

Markscheme
Note: Candidates may obtain slightly different numerical answers depending on the calculator and
approach used. Use discretion in marking.
The calculated values are
$f_o$          $f_e$          $(f_o - f_e)^2$
10            9.046         0.910
29          26.732          5.14       (M1)
31          32.917         3.675      (A1)
18          21.617       13.083      (A1)
12            9.688         5.345      (A1)
Note: Award (M1) for the attempt to calculate expected values, (A1) for correct expected values,
(A1) for correct $(f_o - f_e)^2$ values, (A1) for combining cells.

\[ \chi^2 = \frac{0.910}{9.046} + \ldots + \frac{5.345}{9.688} = 1.56 \]       (A1)
OR
$\chi^2 = 1.56$       (G5)
Degrees of freedom = 3; Critical value = 7.815
(or p-value = 0.668 (or 0.669))      (A1)(A1)
We conclude that the binomial distribution does provide a good fit.      (R1)
[8 marks]
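The expected column comes from B(6, 0.33) scaled by the 100 boxes, with the cells for 4, 5 and 6 brown eggs combined. A sketch that rebuilds the table and the statistic; the observed counts are the f_o column of the markscheme, and the variable names are illustrative.

```python
from math import comb

p, eggs_per_box, boxes = 0.33, 6, 100
# Observed boxes with 0, 1, 2, 3 and "4 or more" brown eggs
observed = [10, 29, 31, 18, 12]

# Binomial probabilities for 0..3 brown eggs, then the combined tail
probs = [comb(eggs_per_box, k) * p ** k * (1 - p) ** (eggs_per_box - k) for k in range(4)]
probs.append(1 - sum(probs))

expected = [boxes * q for q in probs]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print([round(e, 3) for e in expected])  # [9.046, 26.732, 32.917, 21.617, 9.688]
print(round(chi2, 2))                   # 1.56
```

With 5 cells and one estimated parameter, the degrees of freedom are 5 − 1 − 1 = 3, as in the markscheme.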
10a. [2 marks]
A zoologist believes that the number of eggs laid in the Spring by female birds of a certain breed
follows a Poisson law. She observes 100 birds during this period and she produces the following
table.

Calculate the mean number of eggs laid by these birds.

Markscheme
Mean $= \frac{1 \times 19 + 2 \times 34 + \ldots + 5 \times 4}{100}$        (M1)
$= 2.16$         A1  N2
[2 marks]
10b. [2 marks]
The zoologist wishes to determine whether or not a Poisson law provides a suitable model.
Write down appropriate hypotheses.

Markscheme
H0 : Poisson law provides a suitable model          A1
H1 : Poisson law does not provide a suitable model          A1
[2 marks]
10c. [14 marks]
Carry out a test at the 1% significance level, and state your conclusion.
Markscheme
The expected frequencies are

    
A1A1A1A1A1A1
Note: Accept expected frequencies rounded to a minimum of three significant figures.

\[ \chi^2 = \frac{(10 - 11.533)^2}{11.533} + \ldots + \frac{(4 - 6.824)^2}{6.824} \]           (M1)(A2)
$= 5.35$   (accept 5.33 and 5.34)       A2
v=4    (6 cells − 2 restrictions)          A1
Note: If candidates have combined rows allow FT on their value of v.
Critical value  $\chi^2 = 13.277$
Because 5.35 < 13.277, the Poisson law does provide a suitable model.          R1  N0
[14 marks]
11. [11 marks]
The number of cars passing a certain point in a road was recorded during 80 equal time intervals
and summarized in the table below.

Carry out a $\chi^2$ goodness of fit test at the 5% significance level to decide if the above data can be
modelled by a Poisson distribution.
Markscheme
H0 : The data can be modeled by a Poisson distribution.
H1 : The data cannot be modeled by a Poisson distribution.
$\sum f = 80$, $\frac{\sum fx}{\sum f} = \frac{0 \times 4 + 1 \times 18 + 2 \times 19 + \ldots + 5 \times 8}{80} = \frac{200}{80} = 2.5$         A1
Theoretical frequencies are

$f(0) = 80 e^{-2.5} = 6.5668$        (M1)(A1)
$f(1) = \frac{2.5}{1} \times 6.5668 = 16.4170$        A1
$f(2) = \frac{2.5}{2} \times 16.4170 = 20.5212$
$f(3) = \frac{2.5}{3} \times 20.5212 = 17.1010$
$f(4) = \frac{2.5}{4} \times 17.1010 = 10.6882$         A1
Note:    Award A1 for $f(2)$, $f(3)$, $f(4)$.
$f(5 \text{ or more}) = 80 - (6.5668 + 16.4170 + 20.5212 + 17.1010 + 10.6882)$        A1
          $= 8.7058$

\[ \chi^2 = \frac{(4 - 6.5668)^2}{6.5668} + \frac{(18 - 16.4170)^2}{16.4170} + \frac{(19 - 20.5212)^2}{20.5212} + \frac{(20 - 17.1010)^2}{17.1010} + \frac{(11 - 10.6882)^2}{10.6882} + \frac{(8 - 8.7058)^2}{8.7058} \]
      $= 1.83$  (accept 1.82)        (M1)(A1)
       $v = 4$   (six frequencies and two restrictions)        (A1)
       $\chi^2(4) = 9.488$  at the 5% level.        A1
       Since 1.83 < 9.488 we accept H0 and conclude that the distribution can be modeled by a
Poisson distribution.        R1    N0
[11 marks]
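The whole calculation can be traced in a few lines. The sketch below follows the markscheme's method: estimate the mean from the data, build the Poisson frequencies with the recurrence f(k) = f(k−1) × mean / k, and take the last cell as the remainder. The observed counts are read from the markscheme's sum; variable names are illustrative.

```python
from math import exp

# Observed intervals with 0, 1, 2, 3, 4 and "5 or more" cars
observed = [4, 18, 19, 20, 11, 8]
n = sum(observed)

# Sample mean, treating the last cell as exactly 5 (as the markscheme does)
mean = sum(k * f for k, f in enumerate(observed)) / n

# Poisson expected frequencies via the recurrence f(k) = f(k-1) * mean / k
expected = [n * exp(-mean)]
for k in range(1, 5):
    expected.append(expected[-1] * mean / k)
expected.append(n - sum(expected))  # "5 or more" takes whatever is left

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(mean)                             # 2.5
print([round(e, 4) for e in expected])
print(round(chi2, 2))                   # 1.83
```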
12a. [2 marks]
The number of telephone calls received by a helpline over 80 one-minute periods are
summarized in the table below.

Find the exact value of the mean of this distribution.

Markscheme
Mean $\lambda = \frac{9 \times 0 + 12 \times 1 + 22 \times 2 + 10 \times 3 + 11 \times 4 + 8 \times 5 + 8 \times 6}{80}$         (M1)

             $= 2.725 \left(= \frac{109}{40}\right)$        A1

Note: Do not accept 2.73.


[2 marks]
12b. [12 marks]
Test, at the 5% level of significance, whether or not the data can be modelled by a Poisson
distribution.

Markscheme
H0: the data can be modelled by a Poisson distribution            A1
H1: the data cannot be modelled by a Poisson distribution            A1

        A3
Note:  Award A2 for one error, A1 for two errors, A0 for three or more errors.
Combining last two columns                (M1)
Note:  Allow FT from not combining the last two columns and / or getting 2.98 for the last
expected frequency.
EITHER
\[ \chi^2 = \frac{9^2}{5.244} + \frac{12^2}{14.289} + \frac{22^2}{19.469} + \frac{10^2}{17.684} + \frac{11^2}{12.047} + \frac{16^2}{11.267} - 80 \]         (M1)(A1)
               = 8.804  (accept 8.8)            A1
$v = 6 - 2 = 4$,   $\chi^2_{5\%} = 9.488$             A1A1

Hence 8.804 is not significant since 8.804 < 9.488 and we accept H0            R1
OR
p-value = 0.0662    (accept 0.066) which is not significant since              A5
0.0662 > 0.05 and we accept H0          R1  N0
[12 marks]
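The markscheme uses the shortcut form of the statistic, the sum of O²/E minus N, which equals the usual sum of (O − E)²/E whenever the observed and expected totals are both N. The sketch below checks that the two forms agree for these data and recovers the p-value. For 4 degrees of freedom the upper tail of the chi-squared distribution has the closed form e^(−x/2)(1 + x/2), which is the only distributional fact used; variable names are illustrative.

```python
from math import exp

# Frequencies after combining the last two columns (values from the markscheme)
observed = [9, 12, 22, 10, 11, 16]
expected = [5.244, 14.289, 19.469, 17.684, 12.047, 11.267]
n = sum(observed)

chi2_definition = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
chi2_shortcut = sum(o * o / e for o, e in zip(observed, expected)) - n

# Upper-tail probability of chi-squared with 4 degrees of freedom
half = chi2_shortcut / 2
p_value = exp(-half) * (1 + half)

print(round(chi2_shortcut, 3))  # 8.804
print(round(p_value, 4))        # 0.0662
```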
13a. [3 marks]
The heights, x metres, of the 241 new entrants to a men’s college were measured and the
following statistics calculated.
$\sum x = 412.11$, $\sum x^2 = 705.5721$
Calculate unbiased estimates of the population mean and the population variance.

Markscheme
$\bar{x} = \frac{412.11}{241} = 1.71$            A1
$s^2 = \frac{705.5721}{240} - \frac{412.11^2}{240 \times 241} = 0.0036$        M1A1
[3 marks]
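Both estimates follow directly from the summary statistics. A minimal sketch; the variance line is the markscheme's expression rearranged as (Σx² − (Σx)²/n) / (n − 1), and the variable names are illustrative.

```python
n = 241
sum_x = 412.11
sum_x2 = 705.5721

# Unbiased estimate of the population mean
mean = sum_x / n

# Unbiased estimate of the population variance (divisor n - 1)
variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)

print(round(mean, 2))      # 1.71
print(round(variance, 4))  # 0.0036
```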
13b. [1 mark]
The Head of Mathematics decided to use a $\chi^2$ test to determine whether or not these heights
could be modelled by a normal distribution. He therefore divided the data into classes as follows.

State suitable hypotheses.

Markscheme
H0: Data can be modelled by a normal distribution
H1: Data cannot be modelled by a normal distribution           A1
[1 mark]
13c. [11 marks]
Calculate the value of the $\chi^2$ statistic and state your conclusion using a 10% level of significance.

Markscheme
The expected frequencies are

   
A1A1A1A1A1A1
\[ \chi^2 = \frac{5^2}{8.04} + \frac{34^2}{30.19} + \ldots + \frac{12^2}{16.10} - 241 = 3.30 \text{ (or } 3.29\text{)} \]        M1A1
Degrees of freedom = 3       A1
Critical value = 6.251 or p-value = 0.35       A1
The data can be modelled by a normal distribution.       R1
[11 marks]
14a. [1 mark]
A pharmaceutical company has developed a new drug to decrease cholesterol. The final stage of
testing the new drug is to compare it to their current drug. They have 150 volunteers, all recently
diagnosed with high cholesterol, from which they want to select a sample of size 18. They require
as close as possible 20% of the sample to be below the age of 30, 30% to be between the ages of
30 and 50 and 50% to be over the age of 50.
State the name for this type of sampling technique.

Markscheme
stratified sampling        A1
[1 mark]
14b. [3 marks]
Calculate the number of volunteers in the sample under the age of 30.

Markscheme
0.2 ×18=3.6       M1A1
so 4 volunteers need to be chosen       A1
[3 marks]
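The same rounding can be applied to all three age groups to confirm that the allocations still add up to 18. A small sketch; the dictionary keys are illustrative labels for the strata described in the question.

```python
sample_size = 18
# Required share of the sample in each age group
proportions = {"under 30": 0.2, "30 to 50": 0.3, "over 50": 0.5}

# Round each share of the sample to the nearest whole volunteer
allocation = {group: round(share * sample_size) for group, share in proportions.items()}

print(allocation)                # {'under 30': 4, '30 to 50': 5, 'over 50': 9}
print(sum(allocation.values()))  # 18
```

Rounding each stratum separately does not always preserve the total; here it happens to.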
14c. [1 mark]
Half of the 18 volunteers are given the current drug and half are given the new drug. After six
months each volunteer has their cholesterol level measured and the decrease during the six
months is shown in the table.

Calculate the mean decrease in cholesterol for


The new drug.

Markscheme
34.8 mg/dL      A1
[1 mark]
14d. [1 mark]
The current drug.

Markscheme
24.7 mg/dL      A1
[1 mark]
14e. [1 mark]
The company uses a t-test, at the 1% significance level, to determine if the new drug is more
effective at decreasing cholesterol.
State an assumption that the company is making, in order to use a t-test.

Markscheme
EITHER
The decreases in cholesterol are distributed normally    A1
OR
The variance of the two groups of volunteers is equal.    A1
[1 mark]
14f. [1 mark]
State the hypotheses for this t-test.

Markscheme
$H_0: \mu_N = \mu_C$ and $H_1: \mu_N > \mu_C$          A1

where $\mu_N$ and $\mu_C$ represent the mean decreases in cholesterol for the new and current drug
[1 mark]
14g. [3 marks]
Find the p-value for this t-test.

Markscheme
df = 16, t = 2.77        (M1)
p-value = 0.00683        A2
[3 marks]
14h. [2 marks]
State the conclusion of this test, in context, giving a reason.

Markscheme
Since 0.00683 < 0.01        R1
Reject H0. There is evidence, at the 1% level, that the new drug is more effective.       A1
[2 marks]
15a. [12 marks]
Jim writes a computer program to generate 500 values of a variable Z. He obtains the following
table from his results.
Use a chi-squared goodness of fit test to investigate whether or not, at the 5 % level of
significance, the N(0, 1) distribution can be used to model these results.

Markscheme

          (A1)(A1)
(A1)(A1)(A1)(A1)

\[ \chi^2 = \frac{(16 - 11.35)^2}{11.35} + \ldots \]           (M1)
= 7.94          A1
Degrees of freedom = 5          A1
Critical value = 11.07          A1
Or use of p-value
We conclude that the data fit the N(0, 1) distribution.          R1
at the 5% level of significance          A1
[12 marks]
15b. [2 marks]
In this situation, state briefly what is meant by
a Type I error.

Markscheme
A Type I error is concluding that the data do not fit N(0, 1) when in fact they do.         R2
[2 marks]
15c. [2 marks]
a Type II error.

Markscheme
A Type II error is concluding that the data fit N(0, 1) when in fact they do not.       R2
[2 marks]
16a. [3 marks]
The curve  y=f ( x ) is shown in the graph, for 0 ≤ x ≤ 10.
The curve  y=f ( x ) passes through the following points.

It is required to find the area bounded by the curve, the x-axis, the y-axis and the line x = 10.
Use the trapezoidal rule to find an estimate for the area.

Markscheme
Area $= \frac{2}{2}\left(2 + 2(4.5 + 4.2 + 3.3 + 4.5) + 8\right)$        M1A1
Area = 43        A1
[3 marks]
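The trapezoidal estimate can be checked by summing the individual trapezoids. The table of points is missing from this copy, so the sketch assumes x-values 0, 2, ..., 10 (a strip width of 2, as the factor 2/2 in the markscheme implies) and takes the y-values in the order they appear in the markscheme's sum.

```python
# Assumed coordinates: strip width 2 on [0, 10]; y-values read off the markscheme
x = [0, 2, 4, 6, 8, 10]
y = [2, 4.5, 4.2, 3.3, 4.5, 8]

# Each strip contributes width * (average of its two heights)
area = sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))

print(round(area, 1))  # 43.0
```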
16b. [3 marks]
One possible model for the curve  y=f ( x ) is a cubic function.
Use all the coordinates in the table to find the equation of the least squares cubic regression
curve.
Markscheme
$y = 0.0389x^3 - 0.534x^2 + 2.06x + 2.06$       M1A2
[3 marks]
16c. [1 mark]
Write down the coefficient of determination.

Markscheme
$R^2 = 0.991$     A1
[1 mark]
16d. [1 mark]
Write down an expression for the area enclosed by the cubic regression curve, the x-axis, the
y-axis and the line x = 10.

Markscheme
Area $= \int_0^{10} y \, dx$      A1

[1 mark]
16e. [2 marks]
Find the value of this area.

Markscheme
42.5     A2
[2 marks]
17a. [1 mark]
The hens on a farm lay either white or brown eggs. The eggs are put into boxes of six. The farmer
claims that the number of brown eggs in a box can be modelled by the binomial distribution, B(6,
p). By inspecting the contents of 150 boxes of eggs she obtains the following data.
Show that this data leads to an estimated value of p=0.4.

Markscheme
from the sample, the probability of a brown egg is
$\frac{0 \times 7 + 1 \times 32 + \ldots}{6 \times 150} = \frac{360}{900} = 0.4$       A1
$p = 0.4$       AG
[1 mark]
17b. [11 marks]
Stating null and alternative hypotheses, carry out an appropriate test at the 5 % level to decide
whether the farmer’s claim can be justified.

Markscheme
if the data can be modelled by a binomial distribution with p=0.4, the expected frequencies of
boxes are given in the table

          A3
Notes: Deduct one mark for each error or omission.
Accept any rounding to at least one decimal place.
null hypothesis: the distribution is binomial          A1
alternative hypothesis: the distribution is not binomial          A1
for a chi-squared test the last two columns should be combined           R1

\[ \chi^2_{\text{calc}} = \frac{(7 - 7)^2}{7} + \frac{(32 - 28)^2}{28} + \ldots = 6.05 \]  (Accept 6.06)          (M1)A1
degrees of freedom = 4          A1
critical value = 9.488           A1
Or use of p-value
we conclude that the farmer’s claim can be justified          R1
[11 marks]
18a. [6 marks]
Scientists have developed a type of corn whose protein quality may help chickens gain weight
faster than the present type used. To test this new type, 20 one-day-old chicks were fed a ration
that contained the new corn while another control group of 20 chicks was fed the ordinary corn.
The data below gives the weight gains in grams, for each group after three weeks.

The scientists wish to investigate the claim that Group B gain weight faster than Group A. Test
this claim at the 5% level of significance, noting which hypothesis test you are using. You may
assume that the weight gain for each group is normally distributed, with the same variance, and
independent from each other.

Markscheme
This is a t-test of the difference of two means. Our assumptions are that the two populations are
approximately normal, samples are random, and they are independent from each other.         
(R1)
H0: μ1 − μ2 = 0
H1: μ1 − μ2 < 0          (A1)                
t = −2.460,          (A1)
degrees of freedom = 38          (A1)
Since t = −2.460 is less than the critical value t = −1.686, we reject H0.          (A1)
Hence group B grows faster.          (R1)
[6 marks]
18b. [10 marks]
The data from the two samples above are combined to form a single set of data. The following
frequency table gives the observed frequencies for the combined sample. The data has been
divided into five intervals.

Test, at the 5% level, whether the combined data can be considered to be a sample from a normal
population with a mean of 380.

Markscheme
This is a $\chi^2$ goodness-of-fit test.
To finish the table, the frequencies of the respective cells have to be calculated. Since the
standard deviation is not given, it has to be estimated using the data itself. s = 49.59, eg the third
expected frequency is 40 × 0.308 = 12.32, since P(350.5 < W < 390.5) = 0.3078...
The table of observed and expected frequencies is:

      (M1)(A2)
Since the first expected frequency is 3.22, we combine the two cells, so that the first two rows
become one row, that is,

      (M1)
Number of degrees of freedom is 4 – 1 – 1 = 2           (C1)          
H0: The distribution is normal with mean 380
H1: The distribution is not normal with mean 380         (A1)
The test statistic is
\[ \chi^2_{\text{calc}} = \sum \frac{(f_e - f_o)^2}{f_e} = \frac{(11 - 11.04)^2}{11.04} + \frac{(8 - 12.32)^2}{12.32} + \frac{(15 - 10.48)^2}{10.48} + \frac{(6 - 6.17)^2}{6.17} \]

= 3.469          (A1)
With 2 degrees of freedom, the critical number is $\chi^2 = 5.99$           (A2)
So, we do not have enough evidence to reject the null hypothesis. Therefore, there is no evidence
to say that the distribution is not normal with mean 380.           (R1)
[10 marks]
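For exactly 2 degrees of freedom the chi-squared upper tail is e^(−x/2), so both the critical value and the p-value have closed forms. The sketch below uses that fact to confirm the tabulated 5.99. It applies only to 2 degrees of freedom and is offered as a cross-check, not as the method the markscheme expects.

```python
from math import exp, log

alpha = 0.05   # significance level
chi2 = 3.469   # test statistic from the markscheme

# With 2 degrees of freedom, P(X > x) = exp(-x / 2)
critical = -2 * log(alpha)
p_value = exp(-chi2 / 2)

print(round(critical, 2))  # 5.99
print(round(p_value, 3))   # 0.176
```

Since the p-value exceeds 0.05, the decision matches the comparison with the critical value.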
19a. [2 marks]
In a reforested area of pine trees, heights of trees planted in a specific year seem to follow a
normal distribution. A sample of 100 such trees is selected to test the validity of this hypothesis.
The results of measuring tree heights, to the nearest centimetre, are recorded in the first two
columns of the table below.
Describe what is meant by
a goodness of fit test (a complete explanation required);

Markscheme
A goodness of fit test is a statistical test of the hypothesis that a set of observed counts of k cells
of a certain large population is consistent with a set of theoretical counts.                (R1)
The test statistic has a $\chi^2$ distribution with $k - n$ degrees of freedom. One degree of freedom is lost
for every parameter that has to be estimated from the sample.            (R1)
[2 marks]
19b. [1 mark]
the level of significance of a hypothesis test.

Markscheme
The level of significance of a hypothesis test is the maximal probability that we reject a true null
hypothesis.      (R1)
[1 mark]
19c. [4 marks]
Find the mean and standard deviation of the sample data in the table above. Show how you
arrived at your answers.
Markscheme
We use the class midpoints in the calculation of the mean and standard deviation.
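\[ \bar{x} = \frac{\sum x_i f(x_i)}{\sum f(x_i)} = \frac{30 \times 6 + 60 \times 11 + 90 \times 15 + \ldots}{100} = \frac{13350}{100} \]                 (M1)

= 133.5                (A1)

\[ s = \sqrt{\frac{\sum x_i^2 f(x_i)}{\sum f(x_i)} - \bar{x}^2} = \sqrt{\frac{900 \times 6 + 3600 \times 11 + \ldots}{100} - 133.5^2} \]                 (M1)

= 56.345  (= 56.3 to 3 sf)                (A1)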


[4 marks]
19d. [4 marks]
Most of the expected frequencies have been calculated in the third column. (Frequencies have
been rounded to the nearest integer, and frequencies in the first and last classes have been
extended to include the rest of the data beyond 15 and 225.) Find the values of a, b and c and
show how you arrived at your answers.

Markscheme
Every frequency is the product of the number of observations and the probability of a number in
each class. Since by hypothesis we have a normal distribution, the probabilities can be read from
a normal table with mean 133.5 and standard deviation 56.345                 (M1)
E1 = 100 × P(45 ≤ x  ≤ 75) ≈ 9          so a  = 9              (A1)
E2 = 100 × P(135 ≤ x  ≤ 165) ≈ 20    so b = 20              (A1)
E3 = 100 × P(195 ≤ x  ≤ 225) ≈ 9      so c  = 9              (A1)
[4 marks]
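The three missing expected frequencies can be reproduced with the standard library instead of a normal table. A sketch using the class boundaries quoted in the markscheme and the sample estimates 133.5 and 56.345; the dictionary keys simply mirror the labels a, b and c.

```python
from statistics import NormalDist

# Normal model with the mean and standard deviation estimated in part (c)
heights = NormalDist(mu=133.5, sigma=56.345)
trees = 100

# Class boundaries for the three missing cells
classes = {"a": (45, 75), "b": (135, 165), "c": (195, 225)}

expected = {
    name: round(trees * (heights.cdf(hi) - heights.cdf(lo)))
    for name, (lo, hi) in classes.items()
}
print(expected)  # {'a': 9, 'b': 20, 'c': 9}
```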
19e. [3 marks]
In order to test for the goodness of fit, the test statistic was calculated to be 1.0847. Show how
this was done.

Markscheme
The test statistic is a $\chi^2$ variable. Hence                 (M1)
\[ \chi^2 = \sum \frac{(f_e - f_o)^2}{f_e} = \frac{(6 - 6)^2}{6} + \frac{(9 - 11)^2}{9} + \ldots + \frac{(5 - 6)^2}{5} \]                  (M1)

= 1.0847              (A1)
[3 marks]
19f. [5 marks]
State your hypotheses, critical number, decision rule and conclusion (using a 5% level of
significance).

Markscheme
H0: The distribution of tree heights is normally distributed
H1: The distribution is not normal            (M1)
Since the mean and standard deviation were estimated from the sample, the number of degrees
of freedom is 8 – 1 – 2 = 5            (A1)
The critical number is $\chi^2_{5,\,0.05} = 11.0705$

If $\chi^2 > 11.0705$ we reject H0            (A1)

Since $\chi^2 = 1.0847 < 11.0705$, we fail to reject H0            (R1)


Conclusion: we do not have enough evidence to claim that the distribution of tree heights is not
normal            (R1) 
[5 marks]

Printed for TAMPEREEN LYSEON LUKIO


© International Baccalaureate Organization 2022
International Baccalaureate® - Baccalauréat International® - Bachillerato Internacional®
