Caie A2 Level Further Maths 9231 Further Statistics 2 v1

ZNOTES.ORG
UPDATED TO 2020-22 SYLLABUS
CAIE A2 LEVEL
FURTHER MATHS
(9231)
SUMMARIZED NOTES ON THE FURTHER STATISTICS 2 SYLLABUS
1. Continuous Random Variables

1.1. Probability Density Functions (PDF)
- A function whose area under its graph represents probability; used for continuous random variables
- Represented by $f(x)$

Conditions:
- Total area always equals 1:
$$\int_c^d f(x)\,dx = 1$$
- Probabilities cannot be negative, so the graph cannot dip below the x-axis: $f(x) \ge 0$
- The probability that $X$ lies between $a$ and $b$ is the area from $a$ to $b$:
$$P(a < X < b) = \int_a^b f(x)\,dx$$
- Outside the given interval $f(x) = 0$; show this on a sketch
- $P(X = b)$ always equals 0 as there is no area

Notes:
- $P(X < b) = P(X \le b)$ as no extra area is added
- The mode of a PDF is its maximum (stationary point)

Example:
Given that:
$$f(x) = \begin{cases} kx(6 - x) & 2 < x < 5 \\ 0 & \text{otherwise} \end{cases}$$
1. Find the value of $k$
2. Find the mode, $m$
3. Find $P(X < m)$

Solution:
Part (i):
Total area must equal 1, hence
$$\int_2^5 kx(6 - x)\,dx = \left[3kx^2 - \frac{kx^3}{3}\right]_2^5 = 1$$
$$75k - \frac{125}{3}k - 12k + \frac{8}{3}k = 24k = 1$$
$$\therefore k = \frac{1}{24}$$
Part (ii):
The mode is the value with the greatest probability density, hence we are looking for the maximum point of the PDF
$$\frac{d}{dx}\left[kx(6 - x)\right] = 6k - 2kx$$
Finding the maximum point, hence the stationary point:
$$6k - 2kx = 0$$
$$x = \frac{6\left(\frac{1}{24}\right)}{2\left(\frac{1}{24}\right)} = 3$$
$$\therefore \text{mode} = 3$$
Part (iii):
$P(X < m)$ can be interpreted as $P(-\infty < X < m)$
$$\int_{-\infty}^{3} kx(6 - x)\,dx = \int_2^3 kx(6 - x)\,dx = \left[3kx^2 - \frac{kx^3}{3}\right]_2^3$$
$$= \frac{1}{24}\left(3(3^2) - \frac{3^3}{3} - 3(2^2) + \frac{2^3}{3}\right) = \frac{13}{36}$$

1.2. Cumulative Distribution Function (CDF)
- Gives the probability that the value is less than $x$: $P(X < x)$ or $P(X \le x)$
- Represented by $F(x)$
- It is the integral of $f(x)$:
$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$
- Median: the value of $x$ for which $F(x) = 0.5$ (apply the same idea to quartiles and percentiles)

Notes:
- Since it is always impossible to have a value of $X$ smaller than $-\infty$ or larger than $\infty$:
$$F(-\infty) = 0 \qquad F(\infty) = 1$$
- As $x$ increases, $F(x)$ either increases or remains constant, but never decreases.
- $F(x)$ is a continuous function even if $f(x)$ is discontinuous.

Useful Relations:
$$P(c < X < d) = F(d) - F(c)$$
$$P(X > a) = 1 - F(a)$$

Example:
Given that:
$$f(x) = \begin{cases} k & 0 < x < 1 \\ 4k & 1 < x < 3 \\ 0 & \text{otherwise} \end{cases}$$
1. Find the value of $k$
2. Find $F(x)$
3. Find the difference between the median and the fifth percentile of $X$

Solution:
Part (i):
Total area must equal 1, hence
$$\int_0^1 k\,dx + \int_1^3 4k\,dx = [kx]_0^1 + [4kx]_1^3 = 1$$
$$(k - 0) + (12k - 4k) = 9k = 1$$
$$\therefore k = \frac{1}{9}$$
Part (ii):
Integrate each case separately, from the lower end of its interval to $x$.
For the first interval, $0 \le x \le 1$:
$$F(x) = \int_0^x \frac{1}{9}\,dt = \left[\frac{1}{9}t\right]_0^x = \frac{1}{9}x$$
For the next interval, $1 \le x \le 3$, we must split the probability as
$$F(x) = P(X \le x) = P(X \le 1) + P(1 \le X \le x)$$
and $P(X \le 1) = F(1) = \frac{1}{9}$
$$\therefore F(x) = \frac{1}{9} + \int_1^x 4 \times \frac{1}{9}\,dt = \frac{1}{9} + \left[\frac{4}{9}t\right]_1^x = \frac{4}{9}x - \frac{3}{9}$$
Writing in correct notation and fixing the intervals (adding equal signs to the inequalities):
$$F(x) = \begin{cases} 0 & x \le 0 \\ \frac{1}{9}x & 0 \le x \le 1 \\ \frac{4}{9}x - \frac{3}{9} & 1 \le x \le 3 \\ 1 & x \ge 3 \end{cases}$$
Part (iii):
To find the median, you must check in which interval it lies. Do this by substituting the maximum value of $x$ in the first case:
$$\frac{1}{9} \times 1 = \frac{1}{9} < \frac{1}{2}$$
This means the median does not lie in this interval, so
$$\frac{4}{9}x - \frac{3}{9} = 0.5$$
$$x = \frac{15}{8}$$
The fifth percentile lies in the first interval as $\frac{1}{20} < \frac{1}{9}$, so
$$\frac{1}{9}x = \frac{1}{20}$$
$$x = \frac{9}{20}$$
Find the difference:
$$\frac{15}{8} - \frac{9}{20} = \frac{57}{40}$$

1.3. Expectation and Variance
To calculate expectation:
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$
In general:
$$E(g(X)) = \int_{\forall x} g(x) f(x)\,dx$$
To calculate variance:
- First calculate $E(X)$ as above
- Then calculate $E(X^2)$ by
$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$$
- Substitute information and calculate using
$$\mathrm{Var}(X) = E(X^2) - (E(X))^2$$

1.4. Obtaining f(x) from F(x)
As $F$ is obtained by integrating $f$, $f$ can be obtained by differentiating $F$
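The relationship between $F$ and $f$ can be checked numerically. The sketch below assumes the piecewise CDF from the worked example in 1.2 (with $k = \frac{1}{9}$) and recovers the PDF values with a central difference; the function names `cdf` and `pdf_from_cdf` are illustrative and not part of the syllabus.

```python
from fractions import Fraction


def cdf(x):
    """Piecewise CDF from the worked example in 1.2 (k = 1/9)."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x <= 1:
        return x / 9
    if x <= 3:
        return Fraction(4, 9) * x - Fraction(3, 9)
    return Fraction(1)


def pdf_from_cdf(x, h=Fraction(1, 1000)):
    """Estimate f(x) = F'(x) with a central difference of width 2h.

    The estimate is exact here because each piece of F is linear;
    for a curved F it would only be an approximation.
    """
    x = Fraction(x)
    return (cdf(x + h) - cdf(x - h)) / (2 * h)


print(pdf_from_cdf(Fraction(1, 2)))  # a point inside 0 < x < 1
print(pdf_from_cdf(2))               # a point inside 1 < x < 3
print(pdf_from_cdf(5))               # a point outside the given intervals
```

The three printed values should match the original PDF: $k$, $4k$ and 0. Exact fractions are used so that no rounding noise hides the agreement.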
Example:
The random variable $X$ has CDF given by
$$F(x) = \begin{cases} 0 & x \le 1 \\ \frac{(x - 1)^3}{8} & 1 \le x \le 3 \\ 1 & x \ge 3 \end{cases}$$
Find the form of the PDF of $X$
Solution:
$F(x)$ is unchanging for $x < 1$ and for $x > 3$, therefore $f(x)$ is equal to 0 there. Hence, we must differentiate in the interval $1 < x < 3$
$$f(x) = F'(x)$$
$$f(x) = \frac{d}{dx}\left(\frac{(x - 1)^3}{8}\right) = \frac{3}{8}(x - 1)^2$$
Hence:
$$f(x) = \begin{cases} \frac{3}{8}(x - 1)^2 & 1 < x < 3 \\ 0 & \text{otherwise} \end{cases}$$

1.5. Distribution of a Function of a Random Variable
We can deduce the distribution of a simple function of $X$, either increasing or decreasing, with this procedure:
$$f_X \to F_X \to F_Y \to f_Y$$
Example:
The random variable $X$ has pdf $f_X(x)$ given by
$$f_X(x) = \begin{cases} 1 & 2 < x < 3 \\ 0 & \text{otherwise} \end{cases}$$
The random variable $Y$ is given by $Y = 2X + 3$. Determine the pdf and cdf of $Y$.
Solution:
The first step is to find $F_X(x)$:
$$F_X(x) = P(X \le x) = \begin{cases} 0 & x \le 2 \\ x - 2 & 2 \le x \le 3 \\ 1 & x \ge 3 \end{cases}$$
Find the range for $Y$:
$$(2 \times 2) + 3 \le y \le (3 \times 2) + 3$$
$$7 \le y \le 9$$
Convert the CDF from $X$ to $Y$ using the relationship given:
$$F_Y(y) = P(Y \le y) = P(2X + 3 \le y) = P\left(X \le \tfrac{1}{2}(y - 3)\right)$$
Now substitute $\frac{1}{2}(y - 3)$ for $x$ in the CDF:
$$\left(\tfrac{1}{2}(y - 3)\right) - 2 \implies \tfrac{1}{2}(y - 7)$$
Expressing the CDF of $Y$ with the ranges worked out:
$$F_Y(y) = P(Y \le y) = \begin{cases} 0 & y \le 7 \\ \frac{1}{2}(y - 7) & 7 \le y \le 9 \\ 1 & y \ge 9 \end{cases}$$
Differentiate the CDF to find the PDF:
$$f_Y(y) = \begin{cases} \frac{1}{2} & 7 \le y \le 9 \\ 0 & \text{otherwise} \end{cases}$$
This method can be used for both increasing and decreasing functions, as well as functions with powers (e.g. $W = X^2$)

1.6. Confidence intervals for the difference in means
Conf. interval for the difference in means for small samples is given as:
$$(\bar{x} - \bar{y}) \pm t_{\frac{\alpha}{2},\,n_x + n_y - 2} \times s_p\sqrt{\frac{1}{n_x} + \frac{1}{n_y}}$$
- X and Y are independent populations
- The population variance of the two populations is the same but may be unknown
- n is small (n < 30)
Conf. interval for the difference in means for large n is given as:
$$(\bar{x} - \bar{y}) \pm z_{\frac{\alpha}{2}} \times \sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}}$$
- X and Y are independent populations
- The population variance of the two populations is the same but may be unknown
- n is large
Conf. interval for the difference in means for matched pairs is given as:
$$\bar{d} \pm t_{\frac{\alpha}{2},\,n - 1} \times \frac{s_d}{\sqrt{n}}$$

Question 3 9231/04/SP/20:
Employees at a particular company have been working seven hours each day, from 9 am to 4 pm. To try to reduce absence, the company decides to introduce ‘flexi-time’ and allow employees to work their seven hours each day at any time between 7 am and 9 pm.
For a random sample of 10 employees, the numbers of hours of absence in the year before and the year after the introduction of flexi-time are given in the following table.
Employee   A    B    C    D    E    F    G    H    I    J
Before     42   35   96   74   20   5    78   45   146  0
After      34   32   100  72   31   2    61   35   140  0

Test, at the 10% significance level, whether the population mean number of hours of absence has decreased following the introduction of flexi-time, stating any assumption that you make.
Solution:
Assume:
Population differences are normally distributed
Hypotheses:
$$H_0: \mu_d = 0$$
$$H_1: \mu_d < 0$$
Take their differences (After − Before):

Employee    A    B    C    D    E    F    G    H    I    J
Difference  -8   -3   4    -2   11   -3   -17  -10  -6   0

Gather information:
$$n = 10 \qquad \sum d = -34 \qquad \sum d^2 = 648$$
$$\bar{d} = \frac{-34}{10} = -3.4$$
$$s_d^2 = \frac{1}{9}\left(648 - \frac{34^2}{10}\right) = \frac{2662}{45}$$
Test statistic:
$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} = \frac{-3.4}{\sqrt{\frac{2662}{45}} \Big/ \sqrt{10}} \approx -1.39792$$
Critical value:
- One-tailed test
- 10% significance level
Which means $t_{0.9,\,9} = 1.383$, so the critical value for this lower-tailed test is $-1.383$.
$$1.39792 > 1.383$$
Since the magnitude of the test statistic exceeds the critical value, reject $H_0$: there is sufficient evidence that hours of absence have decreased.

2. Inference using normal and t-distributions

2.1. t-distribution
- t-distributions supply the critical values for hypothesis tests with small sample sizes.
- Given n data points, the unbiased estimator of the variance, $s^2$, is calculated using:
$$s^2 = \frac{1}{n - 1}\sum (x - \bar{x})^2$$
Or
$$s^2 = \frac{1}{n - 1}\left(\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right) = \frac{1}{n - 1}\left(\sum x^2 - n\bar{x}^2\right)$$
- For an underlying population that is normally distributed, with unknown variance, $H_0$ and $H_1$ will be:
$$H_0: \mu = k$$
$$H_1: \mu > k, \quad \mu < k \quad \text{or} \quad \mu \ne k$$
- Test statistic $= \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$, where $\bar{x}$ is the sample mean and $s$ is the unbiased estimator of the standard deviation.
- The critical value for a significance level is:
One-tailed: $t_{\alpha,\,n - 1}$
Two-tailed: $t_{\frac{\alpha}{2},\,n - 1}$

Example:
A random sample of 12 workers from a mobile phone assembly line is selected from a large number of workers to assemble a phone at their normal working speed. The times taken, in minutes, to complete these tasks are recorded below:
43.2, 41.6, 49.3, 48.2, 44.2, 40.6, 39.7, 43.4, 44.9, 45.1, 46.2, 43.2
Assuming that this sample comes from an underlying normal population, investigate the claim that the population mean is 45 minutes. Use a 5% significance level.
Solution:
- Assumed underlying normal distribution
- Unknown variance
- Small sample size
- We test whether the population mean is different from 45, which means that it is a two-tailed test.
- Significance level is 5%
$$H_0: \mu = 45$$
$$H_1: \mu \ne 45$$
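The summary statistics and the test statistic for this example can be cross-checked with a few lines of code. This is a minimal sketch using only the formulas from 2.1; the variable names are illustrative, and the critical value 2.201 is simply the tabulated $t_{0.975,\,11}$ quoted in this example, not something the code derives.

```python
import math

# Assembly times in minutes, copied from the example above.
times = [43.2, 41.6, 49.3, 48.2, 44.2, 40.6,
         39.7, 43.4, 44.9, 45.1, 46.2, 43.2]
mu0 = 45  # population mean under H0

n = len(times)
sum_x = sum(times)
sum_x2 = sum(x * x for x in times)

mean = sum_x / n
# Unbiased estimator of the variance: (sum x^2 - (sum x)^2 / n) / (n - 1)
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s = math.sqrt(s2)

t_stat = (mean - mu0) / (s / math.sqrt(n))

t_crit = 2.201  # assumed: t_{0.975, 11} read from statistical tables
reject_h0 = abs(t_stat) > t_crit  # two-tailed: compare magnitudes

print(f"sum x = {sum_x:.1f}, sum x^2 = {sum_x2:.2f}")
print(f"mean = {mean:.3f}, s = {s:.3f}, t = {t_stat:.3f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Working with unrounded intermediate values is why the statistic comes out near −1.05 rather than the slightly different figure obtained from the rounded mean and standard deviation.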
We will have to calculate $\sum x$ and $\sum x^2$ manually:
$$\sum x = 529.6$$
$$\sum x^2 = 23462.88$$
$$s = \sqrt{\frac{1}{12 - 1}\left(23462.88 - \frac{529.6^2}{12}\right)} \approx 2.86$$
$$\bar{x} = \frac{\sum x}{n} = \frac{529.6}{12} \approx 44.13$$
Test statistic:
$$\frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{44.13 - 45}{2.86 / \sqrt{12}} \approx -1.05$$
Ignore the negative sign; we will only take 1.05.
Check the critical value for this test, which is $t_{0.975,\,11}$ and can be found in the statistical tables.
$$t_{0.975,\,11} = 2.201$$
Since 1.05 < 2.201, the test statistic is not in the critical region and hence we do not reject $H_0$. There is insufficient evidence to claim that the population mean is not 45 minutes.
Note: if we were to keep the negative value, −1.05, then the critical value would be negative too, −2.201, and we would conclude by saying −1.05 > −2.201. The magnitude is what is important here: if the magnitude of our test statistic is NOT higher than the magnitude of the critical value, then we do not reject $H_0$. Otherwise, reject $H_0$.

2.2. Difference in means: two-sample t-test
The pooled estimate of the population variance, $s_p^2$, can be found from:
$$s_p^2 = \frac{\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2}{n_x + n_y - 2}$$
Or
$$s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}$$
Note: The second formula is more commonly used.
Difference in means: two-sample t-test
- Underlying distributions are normal
- Populations are independent
- Population variance of the two populations is the same (but may be unknown)
The test statistic is:
$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{s_p^2\left(\frac{1}{n_x} + \frac{1}{n_y}\right)}}$$
And the critical value will be $t_{n_x + n_y - 2}$

Example:
A shopkeeper believes that playing music in his shop encourages customers to spend more money. To test this belief, he records how much money is collected for a ten-day period while music is playing, then for an eight-day period without music. The sales, in thousands of dollars, are summarized as follows.

With music:     $\sum x = 960.1$    $\sum x^2 = 92274.44$
Without music:  $\sum y = 748.2$    $\sum y^2 = 70041.16$

Assuming these data are randomly sampled from normal distributions with the same variance, test the shopkeeper’s claim, using a 5% significance level.
Solution:
- n is small for both populations
- Both have an underlying normal distribution
- The variances are equal but unknown
Let X be the sales with music, and Y be the sales without music.
If the sales “with music” are greater than “without”, then $\mu_x > \mu_y$. Rewrite this as:
$$H_0: \mu_x - \mu_y = 0$$
$$H_1: \mu_x - \mu_y > 0$$
This is a one-tailed test.
Test statistic is:
$$\frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{s_p^2\left(\frac{1}{n_x} + \frac{1}{n_y}\right)}}$$
Next, we need to find what $s_p^2$ is:
$$(n_x - 1)s_x^2 = \sum x^2 - \frac{\left(\sum x\right)^2}{n_x} = 92274.44 - \frac{960.1^2}{10} \approx 95.239$$
$$(n_y - 1)s_y^2 = \sum y^2 - \frac{\left(\sum y\right)^2}{n_y} = 70041.16 - \frac{748.2^2}{8} \approx 65.755$$
Using the formula for pooled variance:
$$s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2} = \frac{95.239 + 65.755}{10 + 8 - 2}$$
$$\approx 10.062125$$
Finding their means:
$$\bar{x} = \frac{960.1}{10} = 96.01$$
$$\bar{y} = \frac{748.2}{8} = 93.525$$
Test statistic:
$$\frac{(96.01 - 93.525) - (0)}{\sqrt{10.062125\left(\frac{1}{10} + \frac{1}{8}\right)}} \approx 1.65154$$
Critical value is $t_{0.95,\,16} = 1.746$
We used $t_{\alpha,\,n_x + n_y - 2}$ here.
Since 1.65154 < 1.746, the test statistic is not in the critical region, so we do not reject $H_0$.
There is insufficient evidence to suggest that playing music increases the sales in the shop.

2.3. Difference in means: paired sample t-tests
Assume:
- Differences are normally distributed
- Populations are independent
- Population variance of the two populations is the same (but may be unknown)
The test statistic is
$$\frac{\bar{d} - \mu}{s_d / \sqrt{n}}$$
Critical value is $t_{n - 1}$

2.4. Difference in means: Normal distribution
Assume:
- Underlying distributions are normal
- Large sample sizes
- Populations are independent
- Population variance of the two populations is the same
The test statistic is:
$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}}$$
Critical value:

Confidence level   90%     95%     98%     99%
z-value            1.645   1.960   2.326   2.576

2.5. Confidence interval for a mean for small sample
If $\bar{x}$ is the mean of a random sample of size n from a normal distribution with population mean $\mu$, a $100(1 - \alpha)\%$ confidence interval for $\mu$ is given by:
$$\bar{x} \pm t_{\frac{\alpha}{2},\,n - 1}\frac{s}{\sqrt{n}}$$
The interval is written as:
$$\left(\bar{x} - t_{\frac{\alpha}{2},\,n - 1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\frac{\alpha}{2},\,n - 1}\frac{s}{\sqrt{n}}\right)$$
Example:
A random sample of people queueing for a train ticket are asked how long they have been waiting in the queue before buying their ticket. Their replies, in minutes, are 12, 17, 21, 9, 14, 19.
Assuming a normal distribution, calculate a 90% confidence interval for the mean stated waiting time.
Solution:
Since we have a small sample, and the population variance is unknown, we must consider a t-distribution.
Calculate the unbiased estimators of the mean and variance:
$$\sum x = 92 \qquad \sum x^2 = 1512$$
$$\bar{x} = \frac{\sum x}{n} = \frac{92}{6} \approx 15.333$$
$$s^2 = \frac{1}{n - 1}\left(\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right) = \frac{1512 - \frac{92^2}{6}}{5} \approx 20.267$$
We are creating a 90% conf. interval, so $100(1 - \alpha) = 90$, giving $\alpha = 0.1$
$$\therefore 1 - \frac{\alpha}{2} = 0.95$$
The conf. interval required is:
$$\bar{x} \pm t_{0.95,\,5}\frac{s}{\sqrt{n}}$$
From tables, $t_{0.95,\,5} = 2.015$
So, the conf. interval is:
$$15.33 \pm 2.015\sqrt{\frac{20.267}{6}}$$
And the 90% conf. interval is (11.627, 19.033)

3. Chi-Squared Tests

3.1. χ2 Test
- Used to test whether a particular type of distribution is appropriate for the data given
- The test statistic involves squares, so we are only interested in upper-tail critical values
- The $\chi^2$ test can only be used to compare two lists of frequencies: the observed frequencies and the expected frequencies calculated from the hypothesis.
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ and $E_i$ are the observed and expected frequencies
- Another way to calculate $\chi^2$ is:
$$\chi^2 = \sum\left(\frac{O_i^2}{E_i}\right) - N$$
- When calculating, set up a table as follows

Variable   Probability   O_i   E_i   (O_i − E_i)^2 / E_i
⋮          ⋮             ⋮     ⋮     ⋮
Total      1             N     N     …

- If the expected frequency for a class is less than 5, then you must group this class with the next class (or two …)
- Hypotheses when testing:
$H_0$: the … distribution is a suitable model
$H_1$: the … distribution is not a suitable model

3.2. Comparing the χ2 Value
- Once you have calculated the $\chi^2$ value of the data given, you must then compare it to the critical values of the $\chi^2$ distribution
- To test 5 classes at a 5% significance level, find the critical value of the $\chi^2$ distribution at 95% with 4 degrees of freedom
- If the distribution fits, the calculated value should be less than the critical value, accepting $H_0$

3.3. Goodness of Fit to Prescribed Distribution Type
- This is the case where the null hypothesis states that the data has a ‘particular named distribution’ but does not specify all the parameters of the distribution
- You must then calculate the parameter in order to carry out the test, e.g.
  Normal: mean and estimated sample variance
  Poisson: mean
  Binomial: probability of success
- For k parameters calculated from the observed data, you must subtract k from the degrees of freedom v
- Hence, with m different outcomes,
$$v = m - 1 - k$$
Or in words:
Deg. of freedom = No. of $E_i$ − 1 − Parameters estimated

Question 2 9231/04/SP/20
Each of 200 identically biased dice is thrown repeatedly until an even number is obtained. The number of throws needed is recorded and the results are summarized in the following table.

No. of throws   1     2    3    4   5   6   ≥7
Frequency       126   43   22   3   5   1   0

Carry out a goodness of fit test, at the 5% significance level, to test whether Geo(0.6) is a satisfactory model for the data.
Solution:
Hypotheses:
$H_0$: Geo(0.6) is a suitable model
$H_1$: Geo(0.6) is not a suitable model
Calculate expected values using $200 \times (0.6)(0.4)^{r - 1}$

r     1     2    3      4      5       6        ≥7
O_i   126   43   22     3      5       1        0
E_i   120   48   19.2   7.68   3.072   1.2288   0.8192

Combine the last 3 cells since we want to ensure that all $E_i \ge 5$.

r     1     2    3      4      ≥5
O_i   126   43   22     3      6
E_i   120   48   19.2   7.68   5.12

Calculate the test statistic, $\chi^2$:
$$\chi^2 = \sum\left(\frac{O_i^2}{E_i}\right) - N$$
$$= \frac{126^2}{120} + \frac{43^2}{48} + \frac{22^2}{19.2} + \frac{3^2}{7.68} + \frac{6^2}{5.12} - 200$$
$$= \frac{4063}{960} \approx 4.2323$$
Find the critical value:
Degrees of freedom = No. of cells − 1 − parameters estimated = 5 − 1 = 4
5% significance level → $\chi^2_4(0.95)$
From tables:
$$\chi^2_4(0.95) = 9.488$$
Compare the test statistic with the crit. value:
$$4.2323 < 9.488$$
Since Test Stat. < Crit. Value, ∴ Accept $H_0$; Geo(0.6) is a suitable model.
Note: You can use
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
to calculate the test statistic, but the method shown is more convenient to use.

3.4. Contingency Table
- This is a table which contains the frequencies for two or more variables.
- You may then assess whether the variables are associated or independent.
- Hypotheses when testing:
$H_0$: the variables are independent
$H_1$: the variables are associated
For example:

        A      B      C      Total
X                            ∑R1
Y                            ∑R2
Z                            ∑R3
Total   ∑C1    ∑C2    ∑C3    T

- The expected frequency of each cell is calculated by
$$\frac{\text{row total} \times \text{column total}}{\text{grand total}}$$
Or in math symbols
$$E_{ij} = \frac{\sum R_i \times \sum C_j}{T}$$
- List each variable and set up a table as before
- The number of degrees of freedom for an r by c table is
$$v = (r - 1)(c - 1)$$

Question 10 9231/02/QP/12:
Random samples of employees are taken from two companies, A and B. Each employee is asked which of three types of coffee (cappuccino, latte, and ground) they prefer. The results are shown in the following table.

            Cappuccino   Latte   Ground
Company A   60           52      32
Company B   35           40      31

a) Test, at the 5% significance level, whether coffee preferences of employees are independent of their company.
Larger random samples, consisting of N times as many employees from each company, are taken. In each company, the proportions of employees preferring the three types of coffee remain unchanged.
b) Find the least possible value of N that would lead to the conclusion, at the 1% significance level, that coffee preferences of employees are not independent of their company.
Solution:
a) Calculate the totals for each row and column, then the expected frequencies using
$$E_{ij} = \frac{\sum R_i \times \sum C_j}{T}$$

Expected    Cappuccino     Latte          Ground
Company A   (144×95)/250   (144×92)/250   (144×63)/250
Company B   (106×95)/250   (106×92)/250   (106×63)/250

Simplifying the values in the table:

Expected    Cappuccino   Latte    Ground
Company A   54.72        52.992   36.288
Company B   40.28        39.008   26.712

Hypotheses:
$H_0$: There is no association between the company and coffee preference
$H_1$: There is association between the company and coffee preference
Calculating the test statistic:

O_ij   E_ij     (O_ij − E_ij)^2 / E_ij
60     54.72    0.509474
52     52.992   0.01857
32     36.288   0.506695
35     40.28    0.692115
40     39.008   0.025227
31     26.712   0.68834

Summing the last column gives $\chi^2 \approx 2.440$. With $v = (2 - 1)(3 - 1) = 2$, the critical value from tables is $\chi^2_2(0.95) = 5.991$. Since 2.440 < 5.991, do not reject $H_0$: there is no evidence of association between company and coffee preference.
b) Multiplying every frequency by N multiplies the test statistic by N, so we need $2.440N > \chi^2_2(0.99) = 9.210$, i.e. $N > 3.77$.
Hence, $N_{min} = 4$

4. Non-Parametric Tests

4.1. Single-sample sign test
- Given n data points, a single-sample sign test is created using $X \sim \mathrm{Bin}(n, 0.5)$
- The test statistic can be the number of + signs, that is, the number of data points greater than the median.
- We can calculate the probability that X is above this test statistic, below this test statistic, or either in the case of a two-tailed test.
- This can be expressed as $P(X \le ts \mid X \sim \mathrm{Bin}(n, 0.5))$ or $P(X \ge ts \mid X \sim \mathrm{Bin}(n, 0.5))$, where ts stands for test statistic.
- In this chapter only, you reject $H_0$ when the test statistic is less than the critical value.

Example:
It is believed that the following dataset comes from a population with median 135
150 130 125 140 170
140 190 180 175 165
160 130 140 140 145

Perform a single-sample sign test, at the 5% significance level, to test this claim.
Solution:
Hypotheses:
$H_0$: The population median is 135
$H_1$: The population median is not 135
This is a two-tailed test

Value   Sign    Value   Sign
150     +       140     +
140     +       140     +
160     +       175     +
130     -       140     +
190     +       170     +
130     -       165     +
125     -       145     +
180     +

Here, the test statistic is 12, as there are 12 values above the stated median.
Since 12 is greater than $\frac{n}{2}$, which is 7.5, we need consider only the top tail.
Consider $X \sim \mathrm{Bin}(15, 0.5)$:
$$P(X \ge 12) = \binom{15}{12}(0.5)^{15} + \binom{15}{13}(0.5)^{15} + \binom{15}{14}(0.5)^{15} + \binom{15}{15}(0.5)^{15}$$
$$P(X \ge 12) \approx 0.017578$$
Since 0.017578 < 0.025, the test statistic of 12 is in the critical region and, therefore, we reject $H_0$.
There is sufficient evidence to suggest the population median is not 135.

We can approximate the single-sample sign test with the normal distribution
- n has to be large (n > 10)
- Let S = min(No. of + signs, No. of − signs)
- $E(S) = \frac{n}{2}$
- $\mathrm{Var}(S) = \frac{n}{4}$
- Then $S \sim N\left(\frac{n}{2}, \frac{n}{4}\right)$
- Applying continuity correction, our z-value is:
$$z = \frac{S - \mu + 0.5}{\sigma}$$

4.2. Wilcoxon signed-rank test
A Wilcoxon signed-rank test can be performed when:
- The underlying data are symmetric
- The underlying data are continuous
- The data are independent
Where:
- P is the sum of the ranks corresponding to the positive differences from the stated median.
- N is the sum of the ranks corresponding to the negative differences from the stated median.
- T = min(P, N) is the test statistic
If the test statistic is below the critical value, we reject $H_0$. This is because the closer our test statistic is to 0, the more extreme the data; that is, the more likely the data are to be above or below the stated population median, which is why our test statistic needs to be below the critical value from the tables.
Example:
The weights (in kg) of ten randomly selected Spanish mackerel are recorded:
1.6 1.1 2.1 2.4 2.2 2.9 2.6 2.3 2.7 1.9
Test, at the 5% significance level, whether the median weight is greater than 1.8 kg.
Solution:
Hypotheses:
$H_0$: The population median weight of Spanish mackerel is 1.8 kg.
$H_1$: The population median weight of Spanish mackerel is greater than 1.8 kg.
Note: To perform this test, we first need to rank the magnitude of the differences of each data point from the stated population median. Ignoring signs, start with the smallest difference and give this rank 1; the next smallest difference is given rank 2, and so on.

Weight, W_i   W_i − Median   P    N
1.6           -0.2                2
1.1           -0.7                7
2.1           0.3            3
2.4           0.6            6
2.2           0.4            4
2.9           1.1            10
2.6           0.8            8
2.3           0.5            5
2.7           0.9            9
1.9           0.1            1
Sums:                        46   9

So, the test statistic here is T = min(P, N) = 9
We look up the critical value in the statistical tables:
- One-tailed test
- 5% significance level
- n = 10, giving a critical value of 10
Since 9 < 10 (test statistic < critical value), there is sufficient evidence to reject $H_0$. ∴ There is sufficient evidence to suggest that the population median is greater than 1.8 kg.
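The ranking procedure in this example can be sketched in code. The helper below is hypothetical (the name `signed_rank_sums` is an assumption), it assumes there are no tied magnitudes and no zero differences needing special treatment, and the critical value 10 is the tabulated value quoted in the example. Treating equality with the critical value as a rejection is also an assumption about how the tables are defined; it makes no difference here.

```python
def signed_rank_sums(data, hypothesised_median):
    """Return (P, N): rank sums of the positive and negative differences.

    Assumes no ties among the magnitudes of the differences.
    """
    # Round to remove floating-point noise from one-decimal-place data.
    diffs = [round(x - hypothesised_median, 10) for x in data]
    diffs = [d for d in diffs if d != 0]  # zero differences carry no sign
    ordered = sorted(diffs, key=abs)      # rank 1 = smallest magnitude
    p_sum = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
    n_sum = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)
    return p_sum, n_sum


weights = [1.6, 1.1, 2.1, 2.4, 2.2, 2.9, 2.6, 2.3, 2.7, 1.9]
P, N = signed_rank_sums(weights, 1.8)
T = min(P, N)

critical_value = 10  # assumed: one-tailed, 5%, n = 10, from tables
reject_h0 = T <= critical_value

print(f"P = {P}, N = {N}, T = {T}")
print("reject H0" if reject_h0 else "do not reject H0")
```

A quick sanity check on any such table is that P + N must equal $\frac{n(n+1)}{2}$, which is 55 for n = 10.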
Note: Once again, in this case we require the test statistic to be lower than the crit. value to reject $H_0$.
- Given the statistic T = min(P, N)
$$E(T) = \frac{n(n + 1)}{4}$$
$$\mathrm{Var}(T) = \frac{n(n + 1)(2n + 1)}{24}$$
- For large n:
$$T \sim N\left(\frac{n(n + 1)}{4}, \frac{n(n + 1)(2n + 1)}{24}\right)$$
- Allowing for an approximate z-test to be done using:
$$z = \frac{T - \mu + 0.5}{\sigma}$$
- After approximating to the normal distribution, you reject $H_0$ if the magnitude of the test statistic is bigger than the crit. value, exactly like in the previous chapters.

Example:
Managers at a busy international airport are studying the times taken by arriving passengers to pass immigration, collect their luggage, then pass through customs. It is known that in the past this was 50 minutes. Some changes have been made to the queueing system in the hope of reducing this time. A random sample of 45 arriving passengers is taken and the rank sums calculated as P = 55, N = 410. Using a suitable approximation, test, at the 1% significance level, whether the median waiting time has reduced.
Solution:
Hypotheses:
$H_0$: Median waiting time is 50 minutes
$H_1$: Median waiting time is lower than 50 minutes.
This is a one-tailed test
We will use the normal dist. to approximate our data because our n is large
Gather information:
$$n = 45$$
$$E(T) = \frac{1}{4}n(n + 1) = \frac{1035}{2}$$
$$\mathrm{Var}(T) = \frac{1}{24}n(n + 1)(2n + 1) = \frac{31395}{4}$$
$$T = \min(55, 410) = 55$$
Standardizing our T value:
$$z = \frac{T - \mu + 0.5}{\sigma} = \frac{55.5 - \frac{1035}{2}}{\sqrt{\frac{31395}{4}}} \approx -5.21485$$
Crit. value from the normal dist. table: $z_{0.99} = 2.326$, so the critical value for this lower-tailed test is $-2.326$
$$\therefore 5.21485 > 2.326$$
Since the magnitude of our test statistic > crit. value, we reject $H_0$. We can conclude that the median waiting time is lower than 50 minutes.
Note: as shown, once we have used the normal dist. to approximate the signed-rank test, we go back to the old rule where we reject $H_0$ when Test stat. > Crit. val.

4.3. Paired-sample sign test
- We can extend the idea of the sign test to work with paired-sample data by looking for a positive or negative difference. Nevertheless, the principles behind the sign test remain the same
- We still use the Binom. Dist. to calculate our test statistic

Example:
Data are collected on the time, in seconds, it takes nine children to tie up their left shoelace and their right shoelace.

Child   Left (s)   Right (s)
A       42         45
B       38         36
C       51         52
D       42         39
E       31         35
F       48         49
G       61         62
H       38         39
I       44         45

Test, at the 10% level of significance, whether there is a difference in the time it takes for the children to tie each shoelace.
Solution:
Hypotheses:
$H_0$: There is no difference in the time taken to tie their left and right shoelaces.
$H_1$: There is a difference in the time taken to tie their left and right shoelaces.
Let’s set $L_i - R_i$ as the difference.

Child   Left (s)   Right (s)   Sign
A       42         45          -
B       38         36          +
C       51         52          -
D       42         39          +
E       31         35          -
F       48         49          -
G       61         62          -
H       38         39          -
I       44         45          -

We will let the number of + signs be the test statistic
The test statistic is 2
Use: $X \sim \mathrm{Bin}(9, 0.5)$
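The lower-tail probability needed for this sign test can also be evaluated directly. The sketch below is only an illustration: `binom_cdf` is a hypothetical helper built on `math.comb` from the standard library, and the comparison with 0.05 assumes the 10% two-tailed test described in the example, with half the significance level in each tail.

```python
from math import comb


def binom_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Bin(n, p), summed term by term."""
    return sum(comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(k + 1))


n_pairs = 9     # children, so nine signed differences
plus_signs = 2  # test statistic: number of + signs

p_lower = binom_cdf(plus_signs, n_pairs)

# Two-tailed test at the 10% level: compare one tail with 5%.
reject_h0 = p_lower < 0.05

print(f"P(X <= {plus_signs}) = {p_lower:.6f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

With p = 0.5 every term shares the factor $(0.5)^9$, so the sum is just $(1 + 9 + 36)/512$.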
$$P(X \le 2) = \binom{9}{0}(0.5)^9 + \binom{9}{1}(0.5)^9 + \binom{9}{2}(0.5)^9$$
$$P(X \le 2) \approx 0.089844$$
Because this is a two-tailed test at the 10% level, we compare this probability with 0.05.
Since 0.089844 > 0.05, the test statistic of 2 is not in the critical region. Therefore, there is insufficient evidence to reject $H_0$. There is insufficient evidence to say there is a difference in the times taken for children to tie their left and right shoelaces.
Note: Once again here, we only reject $H_0$ if our test statistic is lower than the critical value.

4.4. Wilcoxon matched-pairs signed-rank test
A Wilcoxon matched-pairs signed-rank test can be performed when:
- The underlying data are symmetric
- The underlying data are continuous
Where:
- P is the sum of the ranks corresponding to the positive differences between the matched pairs
- N is the sum of the ranks corresponding to the negative differences between matched pairs
- T = min(P, N) is the test statistic
Given the statistic T = min(P, N):
$$E(T) = \frac{n(n + 1)}{4}$$
$$\mathrm{Var}(T) = \frac{n(n + 1)(2n + 1)}{24}$$
For large n:
$$T \sim N\left(\frac{n(n + 1)}{4}, \frac{n(n + 1)(2n + 1)}{24}\right)$$
Allowing for an approximate z-test to be done using:
$$z = \frac{T - \mu + 0.5}{\sigma}$$
Example:
The following table shows the systolic blood pressure (mmHg) of a random sample of eight students before and after a six-week training period.

Student   1     2     3     4     5     6     7     8
B.T.      130   170   125   170   130   130   145   160
A.T.      120   163   120   135   143   136   144   120

B.T. = Before Training
A.T. = After Training
a) Stating clearly your hypotheses, test, using the Wilcoxon signed-rank test, whether or not there is evidence that the training has reduced blood pressure. Use a 5% level of significance.
At a later date, a random sample of 30 students undertake a six-week training period. Analysis of their results using the Wilcoxon signed-rank test gives T = 132.
b) Stating clearly your hypotheses and using a 5% level of significance, test whether or not there is evidence that the training has reduced blood pressure.
Solution:
a)
Hypotheses:
$H_0$: Population median blood pressure is unchanged
$H_1$: Population median blood pressure has decreased
Construct the signed-rank table:

Student     1     2    3    4     5    6   7    8
A_i − B_i   -10   -7   -5   -35   13   6   -1   -40
P                                 6    3
N           5     4    2    7              1    8

$$P = 9$$
$$N = 27$$
$$T = \min(9, 27) = 9$$
Crit. value:
- 5% significance level
- One-tailed test
From the tables:
Crit. Value = 5
Since 9 > 5, do not reject $H_0$. There is no evidence to suggest that the median blood pressure has decreased.
b)
Hypotheses:
$H_0$: Population median blood pressure is unchanged
$H_1$: Population median blood pressure has decreased
We can use the normal dist. to approximate since n is large
Gather information:
$$n = 30$$
$$E(T) = \frac{1}{4}n(n + 1) = 232.5$$
$$\mathrm{Var}(T) = \frac{1}{24}n(n + 1)(2n + 1) = 2363.75$$
$$z = \frac{T - \mu + 0.5}{\sigma} = \frac{132.5 - 232.5}{\sqrt{2363.75}} \approx -2.06$$
Critical value:
5% significance level & one-tailed test.
$z_{0.95} = 1.645$, so the critical value for this lower-tailed test is $-1.645$
Since 2.06 > 1.645, reject $H_0$. There is evidence that the population median blood pressure has decreased.

4.5. Wilcoxon rank-sum test:
A Wilcoxon rank-sum test can be performed when the two samples are independent, where:
- The two samples have sizes m and n, where m ≤ n
- $R_m$ is the sum of the ranks of the items in the sample of size m
- The test statistic is:
$$W = \min\left(R_m,\ m(n + m + 1) - R_m\right)$$

Example:
Researchers are investigating the effect of vitamin B12 on the size of the brain. A sample of males aged between 25 & 40 years is selected. Nine of them are known to have low B12 levels and seven are known to have high B12 levels. After a brain scan, the ratio of brain volume to skull capacity is recorded.

Low B12 Levels   High B12 Levels
0.795            0.786
0.798            0.789
0.802            0.792
0.805            0.796
0.806            0.799
0.807            0.800
0.808            0.803
0.810
0.812

Carry out a Wilcoxon rank-sum test, at the 5% significance level, to see whether the level of vitamin B12 affects the size of the brain.
Solution:
Hypotheses:
$H_0$: Level of B12 has no effect on brain size
$H_1$: Level of B12 has an effect on brain size
First, rank the whole dataset. Note which group each value comes from. Then add up the ranks for each category. Use the value of the sum for the group with the smaller sample size.

Value   Low B12   High B12
0.812   1
0.810   2
0.808   3
0.807   4
0.806   5
0.805   6
0.803             7
0.802   8
0.800             9
0.799             10
0.798   11
0.796             12
0.795   13
0.792             14
0.789             15
0.786             16
Sum     53        83

Calculate the test statistic:
$R_m = 83$ since this is the rank sum from the smaller-sized sample.
$$m(n + m + 1) - R_m = 7(9 + 7 + 1) - 83 = 36$$
The test statistic is the minimum of 83 & 36, which is W = 36
Since 36 < 40, there is sufficient evidence to reject $H_0$. There is evidence to suggest that the level of vitamin B12 affects brain size.
Note: The following table shows what we would get if we ranked them the other way around.

Value   Low B12   High B12
0.812   16
0.810   15
0.808   14
0.807   13
0.806   12
0.805   11
0.803             10
0.802   9
0.800             8
0.799             7
0.798   6
0.796             5
0.795   4
0.792             3
0.789             2
0.786             1
Sum     100       36

Given the test statistic W, then:
$$E(W) = \frac{1}{2}m(n + m + 1)$$
$$\mathrm{Var}(W) = \frac{1}{12}mn(n + m + 1)$$
For large n and m, it is possible to approximate W as a normal dist.
$$W \sim N\left(\frac{1}{2}m(n + m + 1), \frac{1}{12}mn(n + m + 1)\right)$$
Allowing for an approximate z-test with:
$$z = \frac{W - \mu + 0.5}{\sigma}$$

5. Probability Generating Functions (PGF)
The uses of PGFs:
PGFs give an elegant and efficient way of finding expected values and variances.
A PGF gives a concise form for a probability distribution and allows much greater analysis.
By recognizing the expansions of functions, PGFs enable us to describe the structure of infinite discrete distributions, such as the Poisson distribution and Geometric distribution.
We can therefore use PGFs to find probabilities when the value of the discrete random variable is very large indeed.

5.2. The PGF
The PGF can be written as a single summation:

$G_X(t) = \sum_{x_i} t^{x_i} P(X = x_i)$

Notice that the expression for $G_X(t)$ is the same as that for the expectation $E(t^X)$, and so:

$G_X(t) = \sum_{x_i} t^{x_i} P(X = x_i) = E(t^X)$

The variable $t$ is called a dummy variable in this case and has no significance itself, but it does have an important role in finding the expectation of $X$ and its higher moments.

Example:
Consider the probability distribution:

x           0     1     2     3     4     5     6
P(X = x)    0.1   0.2   0.3   0.15  0.1   0.1   0.05

Solution:
Apply the general form of the PGF:

$G_X(t) = \sum_{x_i} t^{x_i} P(X = x_i)$

$G_X(t) = 0.1t^0 + 0.2t^1 + 0.3t^2 + 0.15t^3 + 0.1t^4 + 0.1t^5 + 0.05t^6$

Standard PGFs (where $q = 1 - p$):

Probability Distribution    P(X = r)                                G_X(t)
Bin(n, p)                   $\binom{n}{r} p^r q^{n-r}$              $(q + pt)^n$
Po(λ)                       $\dfrac{e^{-\lambda}\lambda^r}{r!}$     $e^{\lambda(t-1)}$
Geo(p)                      $q^{r-1} p$                             $\dfrac{pt}{1 - qt}$
Uniform dist(n)             $\dfrac{1}{n}$                          $\dfrac{t(1 - t^n)}{n(1 - t)}$

The probabilities in a PGF can be found using:

$P(X = r) = \dfrac{G_X^{(r)}(0)}{r!}$

where $G_X^{(r)}(0)$ is the $r$th derivative of $G_X(t)$ evaluated at $t = 0$.

5.3. E(X) and Var(X) using PGFs

$G_X(1) = 1$

$E(X) = G_X'(1)$

$\operatorname{Var}(X) = G_X''(1) + G_X'(1) - \left(G_X'(1)\right)^2$

5.4. The PGF of a linear transformation

$G_{aX+b}(t) = t^b\, G_X(t^a)$

For independent $X$, $Y$ and $Z$:

$G_{aX+bY+cZ+d}(t) = t^d\, G_X(t^a) \times G_Y(t^b) \times G_Z(t^c)$

In general, for independent $X_1, \ldots, X_n$:

$G_{X_1 + \ldots + X_n}(t) = G_{X_1}(t) \times \ldots \times G_{X_n}(t)$

This is called the Convolution Theorem.

Question 6: 9231/04/SP/20:
Aisha has a bag containing 3 red balls and 3 white balls. She selects a ball at random, notes its colour and returns it to the bag; the same process is repeated twice more. The number of red balls selected by Aisha is denoted by X.
a) Find the probability generating function $G_X(t)$ of X.
Basant also has a bag containing 3 red balls and 3 white balls. He selects three balls at random, without replacement, from his bag. The number of red balls selected by Basant is denoted by Y.
b) Find the probability generating function $G_Y(t)$ of Y.
The random variable Z is the total number of red balls selected by Aisha and Basant.
c) Find the probability generating function of Z, expressing your answer as a polynomial.
d) Use the probability generating function of Z to find E(Z) and Var(Z).

Solution:
a) It is best to make a probability distribution table first, then form the PGF.

x           0      1      2      3
P(X = x)    1/8    3/8    3/8    1/8

$\therefore G_X(t) = \tfrac{1}{8} + \tfrac{3}{8}t + \tfrac{3}{8}t^2 + \tfrac{1}{8}t^3$

b) Same as part (a) but this time without replacement.

y           0       1       2       3
P(Y = y)    1/20    9/20    9/20    1/20

$\therefore G_Y(t) = \tfrac{1}{20} + \tfrac{9}{20}t + \tfrac{9}{20}t^2 + \tfrac{1}{20}t^3$

c) Using the Convolution Theorem:

$G_Z(t) = G_{X+Y}(t) = G_X(t) \times G_Y(t) = \left(\tfrac{1}{8} + \tfrac{3}{8}t + \tfrac{3}{8}t^2 + \tfrac{1}{8}t^3\right) \times \left(\tfrac{1}{20} + \tfrac{9}{20}t + \tfrac{9}{20}t^2 + \tfrac{1}{20}t^3\right)$
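The product in part (c) can be checked by treating each PGF as a list of coefficients and convolving the two lists. The following is a minimal sketch, not part of the original solution: the helper name `pgf_product` is hypothetical, and the coefficients are taken from the probability tables in parts (a) and (b), with P(Y = 3) = 1/20.

```python
from fractions import Fraction

def pgf_product(g, h):
    """Multiply two PGFs given as coefficient lists (index = power of t)."""
    out = [Fraction(0)] * (len(g) + len(h) - 1)
    for i, a in enumerate(g):
        for j, b in enumerate(h):
            out[i + j] += a * b  # t^i * t^j contributes to t^(i+j)
    return out

# Coefficients of G_X(t) and G_Y(t) from the tables in parts (a) and (b)
g_x = [Fraction(k, 8) for k in (1, 3, 3, 1)]
g_y = [Fraction(k, 20) for k in (1, 9, 9, 1)]

g_z = pgf_product(g_x, g_y)

# Numerators over the common denominator 160
numerators = [int(c * 160) for c in g_z]
print(numerators)
print(sum(g_z))  # a valid PGF satisfies G(1) = 1
```

Exact fractions are used rather than floats so that the coefficients can be compared directly with the polynomial written over the denominator 160.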
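Multiplying out:

$G_X(t) \times \left(\tfrac{1}{20} + \tfrac{9}{20}t + \tfrac{9}{20}t^2 + \tfrac{1}{20}t^3\right) = \tfrac{1}{160}\left(1 + 12t + 39t^2 + 56t^3 + 39t^4 + 12t^5 + t^6\right)$

$\therefore G_Z(t) = \tfrac{1}{160}\left(1 + 12t + 39t^2 + 56t^3 + 39t^4 + 12t^5 + t^6\right)$

d) Recall that:

$E(Z) = G_Z'(1)$

$\operatorname{Var}(Z) = G_Z''(1) + G_Z'(1) - \left(G_Z'(1)\right)^2$

$G_Z'(t) = \tfrac{1}{160}\left(12 + 78t + 168t^2 + 156t^3 + 60t^4 + 6t^5\right)$

Let $t = 1$:

$G_Z'(1) = \tfrac{1}{160}(12 + 78 + 168 + 156 + 60 + 6) = \tfrac{480}{160}$

$\therefore E(Z) = 3$

$G_Z''(t) = \tfrac{1}{160}\left(78 + 336t + 468t^2 + 240t^3 + 30t^4\right)$

Let $t = 1$:

$G_Z''(1) = \tfrac{1152}{160} = \tfrac{36}{5}$

$\operatorname{Var}(Z) = \tfrac{36}{5} + 3 - (3)^2 = \tfrac{6}{5}$

$\therefore \operatorname{Var}(Z) = \tfrac{6}{5}$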
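The values of E(Z) and Var(Z) can be cross-checked from the coefficients of $G_Z(t)$, since $G'(1) = \sum r\,P(Z = r)$ and $G''(1) = \sum r(r-1)\,P(Z = r)$. The sketch below assumes the polynomial found in part (c); the function name `pgf_mean_var` is hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def pgf_mean_var(coeffs):
    """Return (E, Var) from PGF coefficients, where coeffs[r] = P(X = r)."""
    g1 = sum(r * p for r, p in enumerate(coeffs))            # G'(1)
    g2 = sum(r * (r - 1) * p for r, p in enumerate(coeffs))  # G''(1)
    return g1, g2 + g1 - g1 ** 2

# Coefficients of G_Z(t) = (1 + 12t + 39t^2 + 56t^3 + 39t^4 + 12t^5 + t^6) / 160
g_z = [Fraction(k, 160) for k in (1, 12, 39, 56, 39, 12, 1)]

mean_z, var_z = pgf_mean_var(g_z)
print(mean_z, var_z)  # 3 6/5
```

The same function should work for any finite distribution on 0, 1, 2, ... whose probabilities are listed in order.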

Copyright 2022 by ZNotes

These notes have been created by Devandhira Wijaya Wangsa for the 2020-22 syllabus
This website and its content is copyright of ZNotes Foundation - © ZNotes Foundation 2022. All rights reserved.
The document contains images and excerpts of text from educational resources available on the internet and
printed books. If you are the owner of such media, text or visual, utilized in this document and do not accept its
usage then we urge you to contact us and we would immediately replace said media.
No part of this document may be copied or re-uploaded to another website without the express, written
permission of the copyright owner. Under no conditions may this document be distributed under the name of
false author(s) or sold for financial gain; the document is solely meant for educational purposes and it is to remain
a property available to all at no cost. It is currently freely available from the website www.znotes.org
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
