CS1A, April19 To April22

INSTITUTE AND FACULTY OF ACTUARIES
EXAMINATION
3 April 2019 (am)
Subject CS1A – Actuarial Statistics

Core Principles
Time allowed: Three hours and fifteen minutes
INSTRUCTIONS TO THE CANDIDATE
1. Enter all the candidate and examination details as requested on the front of your
answer booklet.
2. You must not start writing your answers in the booklet until instructed to do so by the
supervisor.
3. Mark allocations are shown in brackets.
4. Attempt all questions, begin your answer to each question on a new page.
5. Candidates should show calculations where this is appropriate.
Graph paper is NOT required for this paper.
AT THE END OF THE EXAMINATION
Hand in BOTH your answer booklet, with any additional sheets firmly attached, and this
question paper.
In addition to this paper you should have available the 2002 edition of the Formulae
and Tables and your own electronic calculator from the approved list.
CS1A A2019 © Institute and Faculty of Actuaries

1 The amount of money customers spend in a single trip to the supermarket is modelled
using an exponential distribution with mean €15.
(i) Calculate the probability that a randomly selected customer spends more than
€20. [2]
(ii) Calculate the probability that a randomly selected customer spends more than
€20, given that it is known that she spends more than €15. [3]
[Total 5]
2 Consider an estimator of a parameter θ, denoted as $\hat{\theta}$.
(i) State the definition of the bias of $\hat{\theta}$ by giving an appropriate formula. [1]
(ii) State the definition of the mean square error of $\hat{\theta}$, denoted as
$\mathrm{MSE}(\hat{\theta})$, by giving an appropriate formula. [1]
(iii) Derive an expression for $\mathrm{MSE}(\hat{\theta})$ in terms of the variance and the
bias of $\hat{\theta}$. [3]
[Total 5]
3 The number of claims on a certain type of policy follows a Poisson distribution with
claim rate λ per year. For a group of 200 independent policies of this type, the total
number of claims during the last calendar year was 82.
Determine an approximate 95% confidence interval for the true annual claim rate for
this type of policy based on last year’s claims. [4]
4 Alice and Bob are playing a game of dice. Two fair six-sided dice are rolled.
Consider the following events:
A = ‘sum of two dice equals 3’

B = ‘sum of two dice equals 7’
C = ‘at least one of the dice shows a 1’.
(i) Show that P(C) = 11/36. [1]
(ii) Calculate P(A|C). [2]
(iii) Calculate P(B|C). [2]
(iv) Determine whether A and C are independent. [1]
(v) Determine whether B and C are independent. [1]

[Total 7]
5 (i) State the central limit theorem for independent identically distributed random
variables $X_1, X_2, \ldots, X_n$ with finite mean µ and finite (non-zero) variance $\sigma^2$.
[2]
(ii) Show that if the random variable B has the binomial distribution with
parameters (n, p), then $\frac{B - np}{\sqrt{np(1 - p)}}$ approximately follows a standard
normal distribution for large n, using the central limit theorem. [4]
Two players have played a large number of independent games. In a sample of 100 of
these games, one player has won 57 games and the other player has won 43.
(iii) Derive a 95% confidence interval for the probability p that the first player wins
a given game, using the normal approximation in part (ii). [4]
[Total 10]
6 Let X and Y be two continuous random variables.
(i) State the definition of independence of the random variables X and Y in terms
of their joint probability density function. [2]
The joint probability density function of X and Y is given by:
$$f_{X,Y}(x, y) = \begin{cases} 8xy & \text{for } 0 < x < y < 1, \\ 0 & \text{otherwise.} \end{cases}$$
(ii) (a) Determine the marginal density functions of X and Y. [2]
(b) State whether or not X and Y are independent based on your answer in
part (ii)(a). [1]
(iii) Derive the conditional expectation E[X | Y = y]. [3]

[Total 8]

7 Consider a random sample $X_1, \ldots, X_9$ of size 9 from a Normal distribution with
expectation 3 and variance 4, that is, $X_i \sim N(3, 4)$ for i = 1, …, 9, and a sample
$Y_1, \ldots, Y_{18}$ of size 18 from a N(4, 10) distribution. Assume that the two samples are
independent. Let $\bar{X}$ and $\bar{Y}$ denote the means of the two samples and let $S_X^2$
and $S_Y^2$ be the sample variances.
(i) Calculate the probability for the event that $\bar{X} > 4$. [2]
(ii) Derive the probability for the event that $\bar{X} > \bar{Y}$. [3]
(iii) Calculate the probability for the event that $S_X^2 > 4$. [2]
(iv) Show that the probability for the event $S_X^2 > S_Y^2$ is approximately 0.05. [3]
(v) Explain whether the exact probability in part (iv) is greater or less than 0.05.
[2]
[Total 12]
8 The Poisson distribution with mean and variance µ has the following density function:
$$f(y) = \frac{\mu^{y} e^{-\mu}}{y!}$$
(i) (a) Show that this probability function can be written in the standard form
of the exponential family of distributions, stating the natural and scale
parameters, θ and ϕ, and the associated functions of these parameters.
[4]
(b) Verify the mean and variance of the Poisson distribution, using the
expression from part (a) together with the properties of the exponential
family of distributions. [2]
A wildlife researcher is investigating the national population of a particular species

of bear. The researcher believes that the number of bears detected over one year, at
each of i = 1, 2, …, 30 observation points across the country, may follow a Poisson
distribution with parameter µi . She also believes that µi depends on the density of
trees, ti , at the observation point i. The tree density t is measured in ‘trees per square
kilometre’.
The researcher specifies the following linear predictor, where α and β are parameters
to be estimated:
η(µi) = α + βti
The researcher then runs a computer model that fits a generalised linear model (using
the Poisson canonical link function) to bear-count and tree density data collected from
the 30 observation points.
Parameters:
Estimate Standard Error
Intercept, α −1.54520 0.29190
Tree density, β 0.42408 0.09352
(ii) (a) Explain, using the model output shown above, whether the variable
‘tree density’ is significant or not. [3]
(b) Estimate, using the fitted model, the expected number of bears that
would be detected over a one-year period in a woodland area with a
tree density of 12. [3]
[Total 12]

9 Consider a sample of 1,000 motor insurance policies. We assume that the annual total
claim amounts per policy are independent and identically distributed. We denote by X
the number of policies with a total amount of over £5,000 claimed in a calendar year,
and assume that X has a Binomial distribution, X ~ Bin(p, 1,000), with expectation
E[X] = 1,000p.
An analyst wishes to estimate the unknown proportion p of claims with amount

greater than £5,000 per year.
(i) Derive the maximum likelihood estimator for p. [3]
Suppose now that the analyst has some prior knowledge about p and assumes a Beta
prior distribution with density function
$$f(p) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\alpha - 1} (1 - p)^{\beta - 1}.$$
(ii) Derive the density of the Bayesian posterior distribution of p in terms of n, X,
α and β. [2]
(iii) State the type of the posterior distribution of p with its parameters. [2]
(iv) Comment on the relationship between the prior distribution and the posterior
distribution of p in this context. [2]
Assume that 50 policies out of 1,000 policies in an actual sample have a total claim
amount of over £5,000.
(v) Estimate p using the MLE in (i). [1]
(vi) Estimate p using the Bayesian estimator under quadratic loss, based on the
posterior distribution derived in parts (ii) and (iii). Assume that the parameters
of the prior distribution are α = 2 and β = 2. [3]
(vii) Comment on the difference between the values estimated in (v) and (vi). [1]
(viii) State the Bayesian estimator from part (vi) in the form of a credibility interval,
determining the credibility factor. [3]
[Total 17]
10 A professional body wishes to analyse the performance of its students on a particular
two-part examination. It records the scores obtained by a sample of 12 students on the
first part of the exam, and the scores obtained by the same students on the second part
of the exam. The results are as follows:
Student                        A   B   C   D   E   F   G   H   I   J   K   L
First-part exam score x (%)    82  49  73  60  61  77  65  85  91  53  59  73
Second-part exam score y (%)   76  58  75  66  70  71  76  92  87  59  63  71
$\sum x = 828$, $\sum y = 864$, $\sum x^2 = 59,054$, $\sum y^2 = 63,362$, $\sum (x - \bar{x})(y - \bar{y}) = 1,334$

(i) Calculate the fitted linear regression equation of y on x. [3]
(ii) Assuming the full Normal model:
(a) Calculate an estimate for the error variance $\sigma^2$. [2]
(b) Determine the 90% confidence interval for $\sigma^2$. [2]
(iii) Test whether the data are positively correlated, by considering the slope
parameter. [4]
(iv) Calculate a 95% confidence interval for the mean second-part exam score
corresponding to an individual first-part exam score of 57. [3]
(v) Test whether these data could come from a population with a correlation
coefficient equal to 0.75. [4]
(vi) Calculate the proportion of variation explained by the model. [1]
(vii) Comment on the fit of the model, using your answer in part (vi). [1]
[Total 20]
END OF PAPER
EXAMINERS’ REPORT
April 2019 Examinations
Subject CS1 – Actuarial Statistics Core Principles

(Part A)
Introduction
The Examiners’ Report is written by the Chief Examiner with the aim of helping candidates,
both those who are sitting the examination for the first time and using past papers as a
revision aid and also those who have previously failed the subject.
The Examiners are charged by Council with examining the published syllabus. The
Examiners have access to the Core Reading, which is designed to interpret the syllabus, and
will generally base questions around it but are not required to examine the content of Core
Reading specifically or exclusively.
For numerical questions the Examiners’ preferred approach to the solution is reproduced in
this report; other valid approaches are given appropriate credit. For essay-style questions,
particularly the open-ended questions in the later subjects, the report may contain more points
than the Examiners will expect from a solution that scores full marks.
The report is written based on the legislative and regulatory context pertaining to the date that
the examination was set. Candidates should take into account the possibility that
circumstances may have changed if using these reports for revision.
Mike Hammer
Chair of the Board of Examiners
July 2019
Subject CS1 (Actuarial Statistics Core Principles) part A– April 2019 – Examiner’s report
A. General comments on the aims of this subject and how it is marked
1. The aim of the Actuarial Statistics 1 subject is to provide a grounding in

mathematical and statistical techniques that are of particular relevance to actuarial
work.
2. Some of the questions in the examination paper admit alternative solutions to
those presented in this report, or different ways in which the provided answer can
be determined. All mathematically correct and valid alternative solutions or
answers received credit as appropriate.
3. Rounding errors were not penalised, but candidates lost marks where excessive
rounding led to significantly different answers.
4. In cases where the same error was carried forward to later parts of the answer,
candidates were given full credit for the later parts.
5. In questions where comments were required, valid comments that were different
from those provided in the solutions also received full credit where appropriate.
B. Comments on student performance in this diet of the examination.
1. Performance was satisfactory, with most candidates demonstrating good
understanding and application of core topics in actuarial statistics.
2. Answers requiring the derivation of statistical properties contained a considerable
number of errors (e.g. Question 2). Candidates are encouraged to revise the
corresponding parts of the Core Reading and to practise using provided definitions
to derive important statistical properties.
3. The calculation of probabilities of certain events is fundamental for the
understanding and use of actuarial statistics. Candidates are advised to practise
this topic (e.g. Question 4), under scenarios of varying complexity.
4. Attention is also drawn to providing full and mathematically precise definitions or
statistical statements (e.g. Questions 5, 6).
C. Pass Mark
The combined pass mark for CS1 in this exam diet was 58.

Solutions Subject CS1 – A
Q1
(i) $E(X) = \frac{1}{\lambda} = 15$ so $\lambda = \frac{1}{15} = 0.06666\ldots$ [1]
So $P(X > 20) = 1 - F(20) = 1 - (1 - \exp(-0.06666 \times 20)) = 0.26360$ [1]
(ii) $P(X > 20 \mid X > 15) = \frac{P(X > 20 \,\cap\, X > 15)}{P(X > 15)}$ [1]
$= \frac{P(X > 20)}{P(X > 15)} = P[X > 5]$ (using memoryless property) [1]
Numerator as calculated above for Part (i), denominator is:
$P(X > 15) = 1 - F(15) = 1 - (1 - \exp(-0.06666 \times 15)) = \exp(-1)$
So $P(X > 20 \mid X > 15) = 0.26360 / 0.36788 = 0.71653$ [1]
Alternatively, using property of exponential distribution:
$P(X > 20 \mid X > 15) = P(X > 20 - 15) = 0.71653$
The question was answered generally well by most candidates. Some candidates were
unable to recall and apply the memoryless property of the exponential distribution in part
(ii).
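The Q1 arithmetic can be checked with a short script. This is only an illustrative sketch: the names `MEAN_SPEND`, `RATE` and `survival` are assumptions introduced here and do not appear in the report.

```python
import math

MEAN_SPEND = 15.0          # mean of the exponential distribution, in euros
RATE = 1.0 / MEAN_SPEND    # lambda = 1 / mean

def survival(x: float, rate: float = RATE) -> float:
    """Return P(X > x) for an exponential distribution with the given rate."""
    return math.exp(-rate * x)

# Part (i): P(X > 20)
p_more_than_20 = survival(20)

# Part (ii): P(X > 20 | X > 15), first from the definition of conditional
# probability, then via the memoryless shortcut P(X > 5)
p_conditional = survival(20) / survival(15)
p_memoryless = survival(20 - 15)

print(round(p_more_than_20, 5), round(p_conditional, 5), round(p_memoryless, 5))
```

Both routes for part (ii) should agree, which is the memoryless property in numerical form.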
Q2
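(i) $\mathrm{bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$ [1]
(ii) $\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]$ [1]
(iii) $\mathrm{MSE}(\hat{\theta}) = E\left[\left(\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta\right)^2\right]$ [1]
$= E\left[\left(\hat{\theta} - E(\hat{\theta})\right)^2\right] + 2E\left[\hat{\theta} - E(\hat{\theta})\right]\left(E(\hat{\theta}) - \theta\right) + \left(E(\hat{\theta}) - \theta\right)^2$ [1]
$= V(\hat{\theta}) + \mathrm{bias}^2(\hat{\theta})$. [1]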
Parts (i) and (ii) require standard definitions and were answered well by most candidates.
Part (iii) was answered poorly. A number of candidates repeated the answer in parts (ii)
and (iii), failing to properly derive the required expression.
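The decomposition in part (iii) can be illustrated numerically. The sketch below is not part of the report: the true value `THETA`, the deliberately biased estimator and the number of replications are all hypothetical choices made for illustration. For any finite set of estimates, the empirical MSE equals the population-style variance plus the squared empirical bias, up to floating-point error.

```python
import random
import statistics

random.seed(2019)
THETA = 2.0  # hypothetical true parameter value

# Hypothetical biased estimator: mean of 5 draws from N(THETA, 1), shifted by 0.3
estimates = []
for _ in range(10_000):
    sample = [random.gauss(THETA, 1.0) for _ in range(5)]
    estimates.append(statistics.fmean(sample) + 0.3)

# Empirical analogues of MSE, variance and bias over the simulated estimates
mse = statistics.fmean([(e - THETA) ** 2 for e in estimates])
variance = statistics.pvariance(estimates)   # divides by the number of estimates
bias = statistics.fmean(estimates) - THETA

print(mse, variance + bias ** 2)
```

The two printed numbers should coincide, mirroring MSE = variance + bias².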

Q3
The mean number of claims per policy is $\bar{x} = \frac{82}{200} = 0.41$ [1]
Using the normal approximation to the Poisson distribution,
the approximate 95% CI [1]
for λ is $\bar{x} \pm 1.96\sqrt{\frac{\bar{x}}{n}}$ which gives
$0.41 \pm 1.96\sqrt{\frac{0.41}{200}} = 0.41 \pm 0.0887$, i.e. (0.321, 0.499). [2]
Candidates performed strongly on this question, with most applying correctly the normal
approximation to the Poisson distribution. A common error was to use the incorrect mean,
i.e. 82 rather than 82/200. Answers working with the alternative statistic
$\frac{\sum X_i - n\lambda}{\sqrt{n\lambda}}$ were given full credit when used correctly.
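A minimal sketch of the Q3 interval follows, using the 1.96 quantile quoted in the solution. The variable names are illustrative assumptions.

```python
import math

claims = 82      # total claims observed last year
policies = 200   # number of independent policies

# Point estimate of the annual claim rate per policy
rate_hat = claims / policies

# Normal approximation: the variance of the sample mean of Poisson counts
# is lambda / n, estimated here by rate_hat / policies
half_width = 1.96 * math.sqrt(rate_hat / policies)
ci = (rate_hat - half_width, rate_hat + half_width)

print(round(rate_hat, 2), round(half_width, 4), tuple(round(v, 3) for v in ci))
```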
Q4
The initial step is to define the sample space:
Sample space = $\Omega = \{(1,1), (1,2), (1,3), \ldots, (6,6)\} = \{(i, j) \mid i, j = 1, 2, 3, 4, 5, 6\}$
Each outcome is equally likely with probability 1/36.
$A = \{(1,2), (2,1)\}$
$B = \{(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)\}$
$C = \{(1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1), (3,1), (4,1), (5,1), (6,1)\}$
(i) P(C) = 11/36 [1]
(ii) $P(A \mid C) = \frac{P(A \cap C)}{P(C)}$ [½]
$P(A \cap C) = 2/36$ [1]
$P(A \mid C) = \frac{2/36}{11/36} = 2/11$ [½]
(iii) $P(B \mid C) = \frac{P(B \cap C)}{P(C)} = \frac{2/36}{11/36} = 2/11$ [2]
(iv) P(A) = 2/36 ≠ P(A|C), so they are not independent. [1]
(v) P(B) = 6/36 ≠ P(B|C), so they are not independent. [1]
The question was generally well answered. Candidates who took a methodical approach in
setting out the sample space scored well on parts (i), (ii) and (iii). Most candidates were
able to demonstrate correctly the lack of independence for parts (iv) and (v). Common
errors occurred in parts (ii) and (iii), where P(C) or P(B and C) etc. were calculated
incorrectly.
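The methodical approach of writing out the sample space can be mirrored by brute-force enumeration. The following is a sketch with illustrative helper names (`prob`, `in_a`, `in_b`, `in_c`), using exact fractions so that the independence checks are not affected by rounding.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of two fair six-sided dice
outcomes = list(product(range(1, 7), repeat=2))

def prob(event) -> Fraction:
    """Exact probability of an event given as a predicate on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def in_a(o): return sum(o) == 3   # A: sum of two dice equals 3
def in_b(o): return sum(o) == 7   # B: sum of two dice equals 7
def in_c(o): return 1 in o        # C: at least one die shows a 1

p_c = prob(in_c)
p_a_and_c = prob(lambda o: in_a(o) and in_c(o))
p_b_and_c = prob(lambda o: in_b(o) and in_c(o))
p_a_given_c = p_a_and_c / p_c
p_b_given_c = p_b_and_c / p_c

# Independence holds only if P(X and C) = P(X) P(C)
a_c_independent = p_a_and_c == prob(in_a) * p_c
b_c_independent = p_b_and_c == prob(in_b) * p_c

print(p_c, p_a_given_c, p_b_given_c, a_c_independent, b_c_independent)
```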

Q5
(i) If $X_1, X_2, \ldots, X_n$ is a sequence of independent, identically distributed random variables
with finite mean µ and finite (non-zero) variance $\sigma^2$, then the distribution of
$$\frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
[1]
approaches the standard normal distribution, N(0,1), [½]
as n tends to infinity. [½]
(ii) If $X_1, X_2, \ldots, X_n$ are independent and each follows a Bernoulli(p) distribution
with mean p and variance p(1-p), then $B = \sum_{i=1}^{n} X_i$ follows a Binomial(n,p)
distribution.
[2]
The CLT from part (i) can also be expressed as follows: the distribution of
$$\frac{\sum X_i - n\mu}{\sqrt{n\sigma^2}}$$
approaches the standard normal distribution, N(0,1), as n tends to infinity. [1]
Therefore, we have that $\frac{B - np}{\sqrt{np(1 - p)}}$ has the standard normal distribution for large n.
[1]
(iii) We can use $\hat{p} = 0.57$ in the variance (denominator). [1]
So the CI is given as $\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$, i.e. [1]
$0.57 \pm 1.96\sqrt{\frac{0.57(1 - 0.57)}{100}} = (0.473, 0.667)$ [2]
Parts (i) and (ii) were reasonably well attempted. In part (i) full credit was given for
providing the answer in terms of $\frac{\sum X_i - n\mu}{\sqrt{n\sigma^2}}$ or equivalent.
A number of candidates were
not precise enough in their statement of the central limit theorem, for example, missing
out the requirement for large sample size. Part (iii) was well answered, although a
number of arithmetic slips were made in the calculation of the confidence interval.
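Since arithmetic slips were common in part (iii), a short check may help. This is a sketch only: the variable names are illustrative, and the quantile is taken from the standard library's `statistics.NormalDist` rather than rounded to 1.96 as in the solution.

```python
import math
from statistics import NormalDist

wins, games = 57, 100
p_hat = wins / games

# Upper 2.5% point of the standard normal distribution (about 1.96)
z = NormalDist().inv_cdf(0.975)

# Standard error with p_hat plugged into the binomial variance
std_err = math.sqrt(p_hat * (1 - p_hat) / games)
ci = (p_hat - z * std_err, p_hat + z * std_err)

print(tuple(round(v, 3) for v in ci))
```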
Q6
(i)
The random variables X and Y are independent if, and only if, the joint pdf is the
product of the two marginal pdfs for all (x,y) in the range of the variables, i.e.
$f_{X,Y}(x, y) = f_X(x) \times f_Y(y)$ for all (x, y) in the range. [2]
(ii)
(a) $f_X(x) = \int_x^1 8xy \, dy = 8x\left[y^2/2\right]_x^1 = \frac{8x(1 - x^2)}{2} = 4x(1 - x^2)$, $0 < x < 1$
[1]
$f_Y(y) = \int_0^y 8xy \, dx = 8y\left[x^2/2\right]_0^y = 4y^3$, $0 < y < 1$
[1]
(b) Here, $f_{X,Y}(x, y) \ne f_X(x) f_Y(y)$ so X and Y are not independent.
[1]
(iii)
$E(X \mid Y = y) = \int_0^y x \, \frac{f_{X,Y}(x, y)}{f_Y(y)} \, dx = \int_0^y x \, \frac{8xy}{4y^3} \, dx$
[2]
$= \frac{2}{3y^2}\left[x^3\right]_0^y = \frac{2y}{3}$, $0 < y < 1$
[1]
In part (i) many candidates failed to give the full definition, for example omitting that
the property must hold for all (x, y). Part (ii) was well answered by almost all candidates.
Part (iii) was answered successfully by only the strongest candidates, with many
candidates getting confused with the integral limits or integrating with respect to y instead
of x.
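The integration limits in Q6 can be sanity-checked numerically. The sketch below uses a simple midpoint rule written for this illustration (the helper `integrate` and the test points 0.5 and 0.6 are assumptions, not part of the report) and compares the results with the closed forms $4x(1 - x^2)$, $4y^3$ and $2y/3$.

```python
def joint_pdf(x: float, y: float) -> float:
    """Joint density 8xy on the triangle 0 < x < y < 1, zero elsewhere."""
    return 8 * x * y if 0 < x < y < 1 else 0.0

def integrate(f, a: float, b: float, steps: int = 2000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

def marginal_x(x: float) -> float:
    # Integrate over y from x to 1, the range where the density is non-zero
    return integrate(lambda y: joint_pdf(x, y), x, 1.0)

def marginal_y(y: float) -> float:
    # Integrate over x from 0 to y
    return integrate(lambda x: joint_pdf(x, y), 0.0, y)

def cond_mean_x(y: float) -> float:
    # E[X | Y = y]: integrate x * f(x, y) over x, then divide by f_Y(y)
    return integrate(lambda x: x * joint_pdf(x, y), 0.0, y) / marginal_y(y)

print(marginal_x(0.5), marginal_y(0.5), cond_mean_x(0.6))
```

Integrating over the wrong variable or the wrong limits, the error noted in the commentary, would show up immediately as a mismatch with the closed-form values.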
Q7
(i) Variance is known, so $\bar{X} \sim N\left(3, \frac{4}{9}\right)$ [1]
$P[\bar{X} > 4] = 1 - P\left[\frac{\bar{X} - 3}{2/3} < \frac{4 - 3}{2/3}\right] = 1 - P[Z < 1.5]$
$= 1 - 0.93319 = 0.06681$
[1]
(ii)
$\bar{X}$ and $\bar{Y}$ are independent and both are normally distributed. So, $\bar{Y} - \bar{X}$ is normally
distributed,
[1]
$\bar{Y} - \bar{X} \sim N\left(4 - 3, \frac{4}{9} + \frac{10}{18}\right) = N(1, 1)$ [1]
$P[\bar{X} > \bar{Y}] = P[\bar{Y} - \bar{X} < 0] = P[Z < -1] = 1 - P[Z \le 1]$
$= 1 - 0.84134 = 0.15866$ [1]
(iii)
$P[S_X^2 > 4] = 1 - P[S_X^2 \le 4]$ [½]
$= 1 - P\left[\frac{8 \times S_X^2}{4} \le 8\right]$ [½]
$= 1 - P[\chi_8^2 \le 8]$ [½]
$= 1 - 0.5665 = 0.4335$ [½]
(iv)
$S_X^2$ and $S_Y^2$ are independent, and therefore, $\frac{S_X^2 / 4}{S_Y^2 / 10} \sim F_{8,17}$ [1]
$P[S_X^2 > S_Y^2] = P\left[\frac{S_X^2}{S_Y^2} > 1\right] = P\left[\frac{S_X^2 / 4}{S_Y^2 / 10} > \frac{10}{4}\right] = P\left(F_{8,17} > 2.5\right)$ [1]
Checking the 5% probabilities for the $F_{8,17}$ distribution in the “Formulae and Tables”
we find that
$P\left(F_{8,17} > 2.5\right) \approx P\left(F_{8,17} > 2.548\right) = 0.05$ [1]
(v)
Actually, we have $P\left(F_{8,17} > 2.5\right) > P\left(F_{8,17} > 2.548\right)$. [1]
So, the probability we are looking for is slightly greater than 5%. [1]
The question was well attempted. In parts (ii) and (iv), reference to independence is
required for full marks.
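Parts (i) to (iii) can be reproduced without tables. This is a sketch under stated assumptions: the helper `chi2_sf_even` is introduced here and relies on the closed-form upper tail of a chi-square distribution with an even number of degrees of freedom. The F probability in part (iv) is not covered, because the Python standard library has no F distribution.

```python
import math
from statistics import NormalDist

# (i) The sample mean of 9 observations from N(3, 4) is N(3, 4/9)
x_bar = NormalDist(mu=3, sigma=math.sqrt(4 / 9))
p_i = 1 - x_bar.cdf(4)

# (ii) Y_bar - X_bar is N(4 - 3, 4/9 + 10/18) = N(1, 1) by independence
diff = NormalDist(mu=4 - 3, sigma=math.sqrt(4 / 9 + 10 / 18))
p_ii = diff.cdf(0)

def chi2_sf_even(x: float, df: int) -> float:
    """P(chi-square with even df > x) = exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# (iii) 8 * S_X^2 / 4 follows a chi-square distribution with 8 degrees of freedom
p_iii = chi2_sf_even(8, 8)

print(round(p_i, 5), round(p_ii, 5), round(p_iii, 4))
```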
Q8
(i) (a)
$$f(y) = \frac{\mu^{y} e^{-\mu}}{y!}$$
Taking logs of both sides:
$\log f(y) = y \log \mu - \mu - \log y!$
[1]
Then taking exponentials:
$f(y) = \exp(y \log \mu - \mu - \log y!)$ [½]
Comparing this to the generalised form of the exponential family of distributions:
$$f(y; \theta, \varphi) = \exp\left[\frac{y\theta - b(\theta)}{a(\varphi)} + c(y, \varphi)\right]$$
We see that:
$\theta = \log \mu$ [½]
$b(\theta) = \mu = e^{\theta}$ [½]
$\varphi = 1$ [½]
$a(\varphi) = 1$ [½]
$c(y, \varphi) = -\log y!$ [½]
(i) (b)
Using the properties of the exponential family of distributions:
$E(Y) = b'(\theta) = \frac{d}{d\theta} e^{\theta} = e^{\theta} = \mu$
[1]
$\mathrm{Var}(Y) = a(\varphi)\, b''(\theta) = 1 \times \frac{d^2}{d\theta^2} e^{\theta} = e^{\theta} = \mu$ [1]
(ii) (a) Using the model output, we can see that:
β > 2 × standard error (β)
i.e. 0.42408 > 2 × 0.09352 = 0.18704 [1]
This is a two-tailed test for significance at the 5% significance level, with z-value 1.96
(approximated by 2) – which is based on the null hypothesis of $\beta \sim N(0, 0.09352^2)$.
Since β > 2 × standard error (β), we can conclude that the parameter β for the
variable ‘tree density’ is significant in the model.
[2]
(ii) (b) Using the Poisson canonical link function, we have:
$\eta = \log \mu = \alpha + \beta t$ [1]
So for t = 12, α = −1.54520, and β = 0.42408:
$\log \mu = -1.54520 + (0.42408 \times 12) = 3.54376$ [1]
So the expected number of bears that would be detected is:
$\mu = e^{3.54376} = 34.6$ bears ≈ 35 bears [1]
Part (i) was very well answered. Answers in part (ii) were problematic, with many
candidates failing to apply their knowledge to correctly interpret the given model output.

Common errors included applying a 1-sided test in part (ii)(a) and not using the
canonical link function correctly in part (ii)(b).
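Both steps of part (ii) can be sketched in a few lines, using only the estimates shown in the model output. The function name `expected_count` is an illustrative assumption.

```python
import math

# Estimates and standard error taken from the model output in the question
alpha_hat = -1.54520
beta_hat = 0.42408
se_beta = 0.09352

# (ii)(a) Two-sided significance check: compare |estimate / s.e.| with 1.96
z_stat = beta_hat / se_beta
significant = abs(z_stat) > 1.96

def expected_count(tree_density: float) -> float:
    """Invert the canonical log link: mu = exp(alpha + beta * t)."""
    return math.exp(alpha_hat + beta_hat * tree_density)

# (ii)(b) Expected number of bears at a tree density of 12
mu_12 = expected_count(12)

print(round(z_stat, 2), significant, round(mu_12, 1))
```

Forgetting the exponential, i.e. reporting the linear predictor 3.54376 as the expected count, is the kind of link-function error mentioned in the commentary.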
Q9
(i) $l(p) = \text{constant} + x \log p + (n - x) \log(1 - p)$ [1]
$l'(p) = \frac{x}{p} - \frac{n - x}{1 - p} = \frac{x(1 - p) - (n - x)p}{(1 - p)p} = 0$ [1]
$0 = x(1 - p) - (n - x)p = x - np$
$\hat{p} = \frac{x}{n}$ [1]
(ii) $\pi(p) \propto L(p) f(p)$ [1]
$\propto p^{x}(1 - p)^{n - x}\, p^{\alpha - 1}(1 - p)^{\beta - 1} = p^{x + \alpha - 1}(1 - p)^{n - x + \beta - 1}$
$= p^{x + \alpha - 1}(1 - p)^{999 - x + \beta}$ [1]
(iii) The posterior distribution is a beta distribution [1]
with parameters $x + \alpha$ and $n - x + \beta$. [1]
(iv) The prior distribution and the posterior distribution are of the same type. [1]
The beta distribution is the conjugate prior for the binomial distribution. [1]
(v) $\hat{p} = \frac{50}{1000} = 0.05$ [1]
(vi) Under quadratic loss, the Bayesian estimator is the expectation of the posterior
distribution. [1]
In our case, this is $\hat{p} = \frac{x + \alpha}{x + \alpha + n - x + \beta} = \frac{x + \alpha}{n + \alpha + \beta}$ [1]
And for the given parameters we obtain $\hat{p} = \frac{52}{1004} = 0.0518$ [1]
(vii) The two estimates are almost identical meaning that the impact of the prior
distribution is very limited and the Bayesian estimator is mainly determined by the
data. [1]
(viii) We can write the posterior mean in credibility form as
$\hat{p} = \frac{x + \alpha}{n + \alpha + \beta} = \frac{n}{n + \alpha + \beta} \times \frac{x}{n} + \frac{\alpha + \beta}{n + \alpha + \beta} \times \frac{\alpha}{\alpha + \beta}$ [1]
$\hat{p} = Z E[X] + (1 - Z) E[p]$ [1]
with credibility factor $Z = \frac{n}{n + \alpha + \beta}$ [1]
Alternatively, the numerical answer can be given as
$\frac{X + 2}{1004} = \frac{1000}{1004} \times \frac{X}{1000} + \frac{4}{1004} \times \frac{2}{4}$, where $Z = \frac{1000}{1004}$.
The question was very well answered with many candidates achieving high marks across
the different parts. In part (vii) valid comments different from those presented here, were
also given full credit. There were some numerical slips in parts (v) and (vi).
Note that the wording "credibility interval" that is used in the part (viii) is not accurate
and should have been "credibility estimate". The surrounding wording in the same part of
the question is such that the possibility of misunderstanding is minimised. The examiners
did not find evidence of this ambiguity having a negative impact on candidates’
performance.
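The numerical parts (v), (vi) and (viii) can be verified with exact fractions. In this sketch the variable names are illustrative, and the data weight is applied to the sample proportion x/n, as in the expanded credibility formula of part (viii).

```python
from fractions import Fraction

n, x = 1000, 50        # policies in the sample, policies with claims over 5,000
alpha, beta = 2, 2     # parameters of the Beta prior

# (v) Maximum likelihood estimate
mle = Fraction(x, n)

# (iii) and (vi) Posterior Beta(x + alpha, n - x + beta) and its mean,
# which is the Bayesian estimate under quadratic loss
post_alpha, post_beta = x + alpha, n - x + beta
post_mean = Fraction(post_alpha, post_alpha + post_beta)

# (viii) Credibility form: Z * (data estimate) + (1 - Z) * (prior mean)
Z = Fraction(n, n + alpha + beta)
prior_mean = Fraction(alpha, alpha + beta)
credibility_estimate = Z * mle + (1 - Z) * prior_mean

print(float(mle), round(float(post_mean), 4), Z, credibility_estimate == post_mean)
```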
Q10
(i)
Start by calculating the sums of squares:
$S_{xx} = 59,054 - \frac{828^2}{12} = 1,922$ [½]
$S_{xy} = 1,334$
$\hat{\beta} = \frac{S_{xy}}{S_{xx}} = \frac{1,334}{1,922} = 0.69407$
[1]
$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = 72 - 0.69407 \times 69 = 24.10917$ [1]
Hence, the fitted regression equation of y on x is: $\hat{y} = 24.10917 + 0.69407x$. [½]
(ii) (a) $S_{yy} = 63,362 - \frac{864^2}{12} = 1,154$, [½]
so:
$\hat{\sigma}^2 = \frac{1}{n - 2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right)$ [½]
$= \frac{1}{10}\left(1,154 - \frac{1,334^2}{1,922}\right)$ [½]
$= 22.81124$ [½]
(ii) (b) Now $\frac{10\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{10}$, which gives a confidence interval for $\sigma^2$ of: [1]
$\left(\frac{10 \times 22.81124}{18.31}, \frac{10 \times 22.81124}{3.94}\right) = (12.46, 57.90)$ [1]
(iii) We test $H_0: \beta = 0$ vs $H_1: \beta > 0$. [1]
Now $\frac{\hat{\beta} - \beta}{\sqrt{\hat{\sigma}^2 / S_{xx}}} \sim t_{10}$. The observed value here is:
$\frac{0.69407 - 0}{\sqrt{22.81124 / 1922}} = 6.371$
[1]
This is a significant result which exceeds the 0.5% critical value of 3.169. [1]
So there is sufficient evidence at the 0.5% level to reject $H_0$, and conclude that
β > 0 (i.e. that the data is positively correlated). [1]
(iv) The variance of the distribution of the second-part exam score corresponding to a
first-part exam score of 57 is:
$\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)\hat{\sigma}^2 = \left(\frac{1}{12} + \frac{(57 - 69)^2}{1922}\right) \times 22.81124 = 3.610$
[1]
The predicted value is 24.10917 + 0.69407 × 57 = 63.67116.
[1]
Using the $t_{10}$ distribution, the 95% confidence interval is:
$63.67116 \pm 2.228 \times \sqrt{3.610} = (59.44, 67.90)$
[1]
(v) We are testing $H_0: \rho = 0.75$ vs $H_1: \rho \ne 0.75$
If $H_0$ is true, then the test statistic $Z_r$ has a $N\left(z_\rho, \frac{1}{9}\right)$ distribution, where:
$z_\rho = \frac{1}{2} \log \frac{1 + 0.75}{1 - 0.75} = 0.9729551$.
[½]
Pearson's correlation coefficient can be calculated as
$r = \frac{1334}{\sqrt{1922 \times 1154}} = 0.89573$
[1]
The observed value of this statistic is $z_r = \frac{1}{2} \log \frac{1 + 0.89573}{1 - 0.89573} = 1.45018$,
[½]
which corresponds to a value of $\frac{1.45018 - 0.9729551}{\sqrt{1/9}} = 1.431435$
[1]
on the N(0 , 1) distribution.
This is less than 1.96, the upper 2.5% point of the standard normal distribution.
So there is insufficient evidence at the 5% level to reject $H_0$,
i.e. the data do not provide enough evidence to conclude that the correlation
parameter is different from 0.75.
[1]
(vi) The proportion of variability explained by the model is given by $R^2$:
$R^2 = r^2 = \left(\frac{1,334}{\sqrt{1922 \times 1154}}\right)^2 = 0.802329 = 80.2\%$
[1]
(vii) 80.2% of the variation is explained by the model, which indicates that
the fit is very good.
[1]
The question was generally well answered. In part (iii) the test needs to involve the slope
parameter for full marks (rather than, say, the correlation coefficient). Part (iv) asks for
the “mean” response – a common error here was to provide the individual response.
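The Q10 calculations can be reproduced from the raw scores in the question. The sketch below makes some assumptions: variable names are illustrative, the t quantile 2.228 is copied from the solution rather than computed, and the slope is carried unrounded, so the intercept comes out as 24.10926 rather than the 24.10917 obtained with the rounded slope.

```python
import math

x = [82, 49, 73, 60, 61, 77, 65, 85, 91, 53, 59, 73]   # first-part scores (%)
y = [76, 58, 75, 66, 70, 71, 76, 92, 87, 59, 63, 71]   # second-part scores (%)
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Corrected sums of squares and cross-products
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# (i) Least squares estimates; (ii)(a) error variance estimate
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
sigma2_hat = (s_yy - s_xy ** 2 / s_xx) / (n - 2)

# (iii) t statistic for H0: slope = 0, on n - 2 = 10 degrees of freedom
t_stat = slope / math.sqrt(sigma2_hat / s_xx)

# (iv) 95% CI for the mean response at x0 = 57 (t quantile from the solution)
x0, t_crit = 57, 2.228
fitted = intercept + slope * x0
var_mean = (1 / n + (x0 - x_bar) ** 2 / s_xx) * sigma2_hat
mean_ci = (fitted - t_crit * math.sqrt(var_mean), fitted + t_crit * math.sqrt(var_mean))

# (v) Fisher transformation test of H0: rho = 0.75; (vi) R squared
r = s_xy / math.sqrt(s_xx * s_yy)
fisher_stat = (math.atanh(r) - math.atanh(0.75)) * math.sqrt(n - 3)
r_squared = r ** 2

print(round(slope, 5), round(sigma2_hat, 5), round(t_stat, 3), round(r_squared, 3))
```

Here `math.atanh(r)` equals ½ log((1 + r)/(1 − r)), the transformation used in part (v).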
END OF EXAMINERS’ REPORT

EXAMINATION
17 September 2019 (am)

Core Principles
CS1A S2019 © Institute and Faculty of Actuaries

1 A survey showed that 40% of investors invest in at least two companies in order to
diversify their risk.
Calculate an approximate probability that more than 100 investors have invested in at
least two companies in a random sample of 300 investors. [3]
2 Let $X_1, X_2, \ldots, X_n$ be a random sample consisting of independent random variables
with mean µ and variance $\sigma^2$. Consider the sample mean:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
(i) Derive the expected value of $\bar{X}$. [1]
(ii) Derive the variance of $\bar{X}$. [2]
(iii) Comment on the variance of variable $\bar{X}$ as compared to the variance of $X_i$. [1]
An actuary is interested in exploring the difference in the size of claim losses from
two insurance portfolios, and can take samples of claims from these portfolios.
(iv) Explain how the answer to part (iii) can affect the precision of the actuary’s
comparison. [2]
[Total 6]
3 The table below shows the annual aggregate claim statistics for three risks over four
years. The annual aggregate claim for risk i, in year j, is denoted by Xij.
Risk i    $\bar{X}_i = \frac{1}{4}\sum_{j=1}^{4} X_{ij}$    $s_i^2 = \frac{1}{3}\sum_{j=1}^{4} (X_{ij} - \bar{X}_i)^2$
1         2,109                                            3,959,980
2         6,152                                            7,543,626
3         3,016                                            3,151,286
(i) Calculate the value of the credibility factor for Empirical Bayes Model 1. [4]
(ii) Comment on how each of the following features of the data affects the value of
the credibility factor calculated in part (i):
(a) the number of years of data

(b) the variance of the claim amounts.
[2]
[Total 6]
4 X and Y are discrete random variables with joint distribution as follows:
          X = 0   X = 1   X = 3
Y = –1    0.08    0.03    0.00
Y = 0     0.03    0.12    0.20
Y = 3     0.11    0.11    0.06
Y = 4.5   0.04    0.20    0.02
(i) Calculate:
(a) E(Y | X = 1)
(b) Var(X | Y = 3).
[5]
(ii) Calculate the probability functions of the marginal distributions for X and Y.
[2]
(iii) Determine whether X and Y are independent. [2]

[Total 9]

5 An insurance portfolio has a set of n policies (i = 1, 2, …, n), for which the company
has recorded the number of claims per month, Yij, for m months ( j = 1, 2, …, m). It
is assumed that the number of claims for each policy, for each month, are independent
Poisson random variables with E[Yij] = µij. These random variables are modelled
using a simple generalised linear model, with log(µij) = βi, for (i = 1, 2, …, n).
(i) Derive the maximum likelihood estimator of βi. [4]
(ii) Show that the deviance for this model is:
$$D = 2\sum_{i=1}^{n}\sum_{j=1}^{m}\left[ y_{ij} \log\frac{y_{ij}}{\bar{y}_i} - (y_{ij} - \bar{y}_i) \right]$$
where $\bar{y}_i$ is the average number of claims per month for policy i:
$$\bar{y}_i = \sum_{j=1}^{m} \frac{y_{ij}}{m}$$
[4]
The company has data for each month over a three-year period. For one policy, the
average number of claims per month was 18.95. In the most recent month for this
policy, there were seven claims.
(iii) Determine the part of the total deviance that comes from this single
observation. [2]
[Total 10]
6 An actuary is asked to check a linear regression calculation performed by a trainee.
The trainee reports a least squares slope parameter estimate of $\hat{b}$ = 13.7 and a sample
correlation coefficient r = –0.89.
(i) Justify why this suggests that the trainee has made an error. [2]
In a different simple linear regression model, a histogram of the residuals is shown
below.
[Figure: histogram of the residuals. Horizontal axis: Residuals, from –0.4 to 0.8; vertical axis: Frequency, from 0 to 200.]
(ii) Comment on the validity of the assumptions of the linear model. [2]
The following pairs of data are available:
x   0      1      2      3       4       5       6       7       8       9
y   −1.35  −4.96  −9.20  −13.15  −16.70  −21.23  −25.14  −28.44  −33.68  −37.39
for which
$\bar{y} = -19.124$, $\sum_{i=1}^{10} (y_i - \bar{y})^2 = 1,329.523$, $\sum_{i=1}^{10} (x_i - \bar{x})^2 = 82.5$,
$\sum_{i=1}^{10} (x_i - \bar{x})(y_i - \bar{y}) = -331.05$
A linear model of the form y = a + bx + e is fitted to the data, where the error terms
(e) independently follow a $N(0, s^2)$ distribution, and where a, b and $s^2$ are unknown
parameters.
(iii) Determine the fitted line of the regression model. [3]
(iv) Calculate a 95% confidence interval for the predicted mean response if x = 11.
[5]
(v) Comment on the width of a 95% confidence interval for the predicted mean
response if x = 3.5, as compared to the width of the interval in part (iv),
without calculating the new interval. [2]
[Total 14]
7 An actuary has designed a new product to insure luxury apartments. If there is
a claim, her insurance company pays a fixed sum of £1 million per claim. The
probability of a claim on a policy in a given year is θ and the probability of more than
one claim on a policy in any given year is zero. The actuary’s prior beliefs about θ are
given by a Beta distribution with parameters a = 3 and b = 5.
In the first year, the company insured 300 apartments and in the second year it insured
300 + x apartments, where x is an integer. In year 1 the total amount of claims was
£39 million, while in year 2 it was £60 million.
(i) Show that the posterior distribution of θ is Beta with parameters 102 and
506 + x. [7]
(ii) Derive the Bayesian estimate of θ in terms of x, under quadratic loss. [2]
(iii) Derive the Bayesian estimate of θ in terms of x, under all-or-nothing loss. [4]
(iv) Justify that, in this case, the Bayesian estimate of θ cannot be the same under
quadratic and all-or-nothing loss. [2]
[Total 15]
8 A city is experiencing a high crime rate, particularly burglaries. A sample of 100
streets is taken, and in each of the sampled streets, a sample of six similar houses
is taken. The table below shows the number of sampled houses, x, which have had
burglaries during the last six months, and the corresponding frequency, f, in terms of
number of streets.
Number of houses burgled, x 0 1 2 3 4 5 6
Number of streets, f 39 36 19 4 1 1 0
It is assumed that the number of sampled houses per street that have been burgled
during the last six months follows a Binomial distribution, i.e. X ~ Bin(6, p).
(i) State any assumptions needed to justify the use of a Binomial distribution
for X. [2]
(ii) Show that the maximum likelihood estimate of p, the probability that a sample
house has been burgled during the last six months, is p̂ = 0.1583. [5]
(iii) Determine the fitted values of the Binomial model using the estimate of p from
part (ii). [2]
(iv) Comment on the fit of the model in part (iii), without doing any formal tests.
[1]
An insurance company works on the basis that the probability of a house being
burgled over a six-month period is 0.13.
(v) Perform a test to investigate whether the Binomial model with this value of p
provides a good fit for the data. [7]
[Total 17]

9 An actuary wants to model a particular type of claim size and has been advised to use
a Gamma distribution with probability distribution function:
\( f(x; \alpha, \theta) = \frac{x^{\alpha - 1}}{\Gamma(\alpha)\,\theta^{\alpha}}\, e^{-x/\theta}, \quad 0 < x < \infty,\ \alpha > 0,\ \theta > 0. \)
(i) Show, using moment generating functions, that:
(a) E(X ) = αθ
(b) E(X 2) = α(α + 1)θ2
(c) E(X 3) = α(α + 1)(α + 2)θ3.
[3]
The shape parameter alpha is assumed to be α = 4.
(ii) (a) Determine the variance of the claim size distribution in terms of θ.
(b) Calculate the coefficient of skewness of the claim size distribution, which is defined as:
\( \frac{E\left[(X - E(X))^3\right]}{\left\{E\left[(X - E(X))^2\right]\right\}^{1.5}}. \)
[4]
Let X1, X2, …, Xn be a random sample of n claim sizes for such claims.
(iii) Show that the maximum likelihood estimator (MLE) of θ is given by:
\( \hat{\theta} = \frac{\bar{X}}{4}. \) [3]
(iv) Show that \( \hat{\theta} \) is an unbiased estimator of θ. [1]
A sample of n = 100 claim sizes yields ∑ xi = 796.2 and ∑ xi2 = 8,189.4.

(v) Calculate the MLE of θ. [1]
(vi) (a) Calculate the sample variance.

(b) Compare the result in part (vi)(a) with the variance of the distribution
evaluated at \( \hat{\theta} \).
[2]
The sample coefficient of skewness is given as 1.12.
(vii) Comment on its comparison with the coefficient of skewness of the

distribution, calculated in part (ii)(b). [1]
(viii) Calculate an appropriate 95% confidence interval for θ by using an

approximate 95% confidence interval for the mean of the distribution of the
claim size. [3]
(ix) (a) Determine the variance of the distribution of θ at both lower and upper
limits of the confidence interval calculated in part (viii).
(b) Comment on the result in part (ix)(a) with reference to your answer in
part (vi)(a) above.
[2]
[Total 20]
END OF PAPER
CS1A S2019–9
EXAMINERS’ REPORT
September 2019 Examinations
Subject CS1 – Actuarial Statistics Core Principles

(Part A)
Introduction
Mike Hammer
September 2019
© Institute and Faculty of Actuaries

Subject CS1 (Actuarial Statistics Core Principles) part A– September 2019 – Examiner’s report
1. The aim of the Actuarial Statistics subject is to provide a grounding in mathematical and
statistical techniques that are of particular relevance to actuarial work.
2. Some of the questions in the examination paper admit alternative solutions from these
presented in this report, or different ways in which the provided answer can be determined.
All mathematically correct and valid alternative solutions or answers received credit as
appropriate.
3. Rounding errors were not penalised, but candidates lost marks where excessive rounding
led to significantly different answers.
4. In cases where the same error was carried forward to later parts of the answer, candidates
were given full credit for the later parts.
5. In questions where comments were required, valid comments that were different from
those provided in the solutions also received full credit where appropriate.
B. Comments on student performance in this diet of the examination.
1. Performance was satisfactory in general, but varied considerably among candidates. Well
prepared candidates were able to score highly.
2. This is a relatively new subject under the recently introduced curriculum, and combines a
number of topics from previous CT subjects (CT3 and CT6). A number of candidates
appeared to be inadequately prepared, in terms of not having covered sufficiently the
entire breadth of the subject.
S2019
C. Pass Mark
The combined pass mark for CS1 in this exam diet was 55.
Solutions Subject CS1 – A
Q1
If X is the number of people who have at least two investments, X follows a Binomial(300, 0.4)
distribution and:
\( E[X] = 300 \times 0.4 = 120 \) and \( V[X] = 300 \times 0.4 \times 0.6 = 72. \) [1]
Then, using continuity correction, [½]
\( P(X > 100) = P(X \ge 100.5) = 1 - \Phi\left(\frac{100.5 - 120}{\sqrt{72}}\right) = 1 - \Phi(-2.298) = \Phi(2.298) = 0.989 \) [1.5]
[Total 3]
The question was answered well by most candidates. Attention should be given to applying
the continuity correction properly.
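The Q1 calculation above can be checked numerically. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the report.

```python
from math import sqrt
from statistics import NormalDist

# Binomial(300, 0.4) as stated in the Q1 solution.
n, p = 300, 0.4
mean = n * p                  # 120
variance = n * p * (1 - p)    # 72

# Continuity correction: P(X > 100) is approximated by P(N > 100.5).
z = (100.5 - mean) / sqrt(variance)
prob = 1 - NormalDist().cdf(z)

print(round(z, 3), round(prob, 3))
```

Replacing 100.5 by 100 in this sketch shows how much the answer moves when the continuity correction is omitted.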
Q2
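(i) \( E(\bar{X}) = E\left(\frac{\sum_{i=1}^{n} X_i}{n}\right) = \frac{\sum_{i=1}^{n} E(X_i)}{n} = \frac{n\mu}{n} = \mu \) [1]
(ii) \( V(\bar{X}) = V\left(\frac{\sum_{i=1}^{n} X_i}{n}\right) = \frac{\sum_{i=1}^{n} V(X_i)}{n^2} \) because of independence [1]
\( = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} \) [1]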
(iii) The variance of the sample mean is smaller compared to the variance of individual
variables. [1]
(iv) Individual values are less precise than the average of a sample. [1]
Larger sample leads to smaller variance. [1]
[Total 6]
Parts (i)-(iii) were answered very well. In part (ii), independence must be
mentioned for a fully justified derivation. Part (iv) was not well answered, with
many answers being vague.
Q3
(i) \( E[s^2(\theta)] \) is estimated by the average of the sample variances, therefore:
\( \frac{3,959,980 + 7,543,626 + 3,151,286}{3} = 4,884,964 \) [½]
The sample mean of the \( \bar{X}_i \)'s is:
\( \bar{X} = \frac{2,109 + 6,152 + 3,016}{3} = 3,759 \) [½]
And the sample variance of the \( \bar{X}_i \)'s is:
\( \frac{1}{3 - 1}\sum_{i=1}^{3}(\bar{X}_i - \bar{X})^2 = 0.5 \times \left((2,109 - 3,759)^2 + (6,152 - 3,759)^2 + (3,016 - 3,759)^2\right) = 4,500,499 \) [1]
So \( Var[m(\theta)] \) is estimated by:
\( \frac{1}{2}\sum_{i=1}^{3}(\bar{X}_i - \bar{X})^2 - \frac{1}{4}E[s^2(\theta)] = 4,500,499 - \frac{1}{4} \times 4,884,964 = 3,279,258 \) [1]
The credibility factor,
\( Z = \frac{n}{n + \frac{E[s^2(\theta)]}{Var[m(\theta)]}} \)
is then estimated by:
\( Z = \frac{4}{4 + \frac{4,884,964}{3,279,258}} = 0.72864 \) [1]
(ii) Z is an increasing function of n, the number of years of past data. If we have more
than 4 years of past data, the credibility factor will increase. [1]
Z is a decreasing function of \( E[s^2(\theta)] \). If \( E[s^2(\theta)] \) increases, e.g. if the variance of the
claim amounts from one or more of the risks were to increase, then the value of the
credibility factor will fall. [1]
[Total 6]
Answers in part (i) were satisfactory, with a small number of calculation or arithmetic
errors. A common error in part (ii) was trying to justify that credibility increases as
variance increases.
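The credibility factor in the Q3 solution above can be reproduced as follows. This is a minimal sketch: the per-risk means and variances are the figures quoted in the solution, and the variable names are illustrative.

```python
# Per-risk sample means and sample variances, as quoted in the Q3 solution.
risk_means = [2109, 6152, 3016]
risk_variances = [3_959_980, 7_543_626, 3_151_286]
n_years = 4  # years of past data per risk (n in the formula for Z)

n_risks = len(risk_means)

# E[s^2(theta)]: average of the within-risk sample variances.
e_s2 = sum(risk_variances) / n_risks

# Sample variance of the risk means about the overall mean.
overall_mean = sum(risk_means) / n_risks
var_of_means = sum((m - overall_mean) ** 2 for m in risk_means) / (n_risks - 1)

# Var[m(theta)]: variance of the means less E[s^2(theta)] / n.
var_m = var_of_means - e_s2 / n_years

# Credibility factor Z = n / (n + E[s^2(theta)] / Var[m(theta)]).
z = n_years / (n_years + e_s2 / var_m)

print(e_s2, var_m, round(z, 5))
```

Increasing `n_years`, or lowering the entries of `risk_variances`, raises `z`, which is the behaviour described in part (ii).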
Q4
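(i) (a) \( E(Y | X = 1) = \sum_y y\, P(Y = y | X = 1) \) [½]
\( = \sum_y y\, \frac{P(Y = y, X = 1)}{P(X = 1)} \) [½]
\( = \left(-1 \times \frac{0.03}{0.46}\right) + \left(3 \times \frac{0.11}{0.46}\right) + \left(4.5 \times \frac{0.2}{0.46}\right) = 2.6087 \) [1]
(b) \( Var(X | Y = 3) = E(X^2 | Y = 3) - (E(X | Y = 3))^2 \) [1]
\( = \left(1 \times \frac{0.11}{0.28}\right) + \left(9 \times \frac{0.06}{0.28}\right) - \left(\left(1 \times \frac{0.11}{0.28}\right) + \left(3 \times \frac{0.06}{0.28}\right)\right)^2 \) [1]
\( = 2.3214 - (1.0357)^2 = 1.2487 \) [1]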
(ii) Summing columns gives:
\( P(X = 0) = 0.26,\ P(X = 1) = 0.46,\ P(X = 3) = 0.28 \) [1]
Summing rows gives:
\( P(Y = -1) = 0.11,\ P(Y = 0) = 0.35,\ P(Y = 3) = 0.28,\ P(Y = 4.5) = 0.26 \) [1]
(iii) X and Y are independent only if \( P(X = x, Y = y) = P(X = x)\,P(Y = y) \) for every pair (x, y).
It is enough to show that this does not hold for one pair, for example: [1]
\( P(X = 0, Y = -1) = 0.08 \ne P(X = 0) \times P(Y = -1) = 0.26 \times 0.11 = 0.0286 \)
Correct conclusion that X and Y are NOT independent. [1]

[Total 9]
The question was reasonably well attempted. A common error in part (i) was not applying
the expectation correctly for a conditional probability, e.g. by missing the division
element.
Q5
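(i) \( L(\mu_{ij}; y_{ij}) = \prod_{i=1}^{n}\prod_{j=1}^{m} \frac{\mu_{ij}^{y_{ij}}\, e^{-\mu_{ij}}}{y_{ij}!} \)
\( l(\mu_{ij}; y_{ij}) = \log L(\mu_{ij}; y_{ij}) = \sum_{i=1}^{n}\sum_{j=1}^{m} \left\{ y_{ij}\log(\mu_{ij}) - \mu_{ij} - \log(y_{ij}!) \right\} \)
\( = \sum_{i=1}^{n}\sum_{j=1}^{m} \left\{ y_{ij}\beta_i - e^{\beta_i} - \log(y_{ij}!) \right\} \) [2]
\( \frac{d}{d\beta_i}\, l(\mu_{ij}; y_{ij}) = \sum_{j=1}^{m} y_{ij} - m e^{\beta_i} \) [1]
And,
\( \frac{d}{d\beta_i}\, l(\mu_{ij}; y_{ij}) = 0 \Rightarrow e^{\hat{\beta}_i} = \sum_{j=1}^{m} \frac{y_{ij}}{m} \Rightarrow \hat{\beta}_i = \log(\bar{y}_i) \) [1]
where:
\( \bar{y}_i = \sum_{j=1}^{m} \frac{y_{ij}}{m} \)
(ii) For the deviance we have:
\( l_s = \sum_{i=1}^{n}\sum_{j=1}^{m} \left\{ y_{ij}\log(y_{ij}) - y_{ij} - \log(y_{ij}!) \right\} \) [1]
\( l_c = \sum_{i=1}^{n}\sum_{j=1}^{m} \left\{ y_{ij}\log(\bar{y}_i) - \bar{y}_i - \log(y_{ij}!) \right\} \) [1]
\( D = 2(l_s - l_c) \)
\( = 2\left[ \sum_{i=1}^{n}\sum_{j=1}^{m} \left\{ y_{ij}\log(y_{ij}) - y_{ij} - \log(y_{ij}!) \right\} - \sum_{i=1}^{n}\sum_{j=1}^{m} \left\{ y_{ij}\log(\bar{y}_i) - \bar{y}_i - \log(y_{ij}!) \right\} \right] \)
\( = 2 \sum_{i=1}^{n}\sum_{j=1}^{m} \left\{ y_{ij}\log\frac{y_{ij}}{\bar{y}_i} - (y_{ij} - \bar{y}_i) \right\} \)
where:
\( \bar{y}_i = \sum_{j=1}^{m} \frac{y_{ij}}{m} \)
[2]
(iii) In this case we have:
\( y_{ij} = 7,\ \bar{y}_i = 18.95 \) [1]
\( D_{ij} = 2\left\{ 7\log\left(\frac{7}{18.95}\right) - (7 - 18.95) \right\} = 9.957 \) [1]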
[Total 10]
Answers to this question were weak in general. The question concerns the MLE and
deviance of a simplified Poisson GLM. Part (iii) requires a calculation by inserting
numerical values in a given expression.
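Part (iii) of the Q5 solution above is a single substitution into the deviance expression. A minimal sketch of that substitution follows; the function name is hypothetical, and the formula as coded assumes a strictly positive observation.

```python
import math


def poisson_deviance_term(y, fitted):
    """Contribution 2*{y*log(y/fitted) - (y - fitted)} of one observation.

    Assumes y > 0; the y*log(y/fitted) term is taken as 0 when y = 0,
    which this sketch does not handle.
    """
    return 2.0 * (y * math.log(y / fitted) - (y - fitted))


# Values from part (iii): observation 7, fitted group mean 18.95.
d_ij = poisson_deviance_term(7, 18.95)
print(round(d_ij, 3))
```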
Q6
(i) The regression slope suggests a positive relationship between the two variables, while
the correlation coefficient shows a strong negative relationship. [2]
(ii) The histogram suggests a non-symmetric distribution for the residuals [1]
Non-symmetric about zero. [1]
(iii) \( \hat{\beta} = \frac{S_{xy}}{S_{xx}} = \frac{-331.05}{82.5} = -4.013 \) [1]
\( \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = -19.124 + 4.013 \times \frac{45}{10} = -1.067 \) to 4 s.f. [1]
Line given as: \( \hat{y} = -1.067 - 4.013x \) [1]
(iv) Predicted value is: \( \hat{y} = -1.067 - 4.013 \times 11 = -45.207 \) (using unrounded coefficients) [½]
\( \hat{\sigma}^2 = \frac{S_{yy} - \frac{S_{xy}^2}{S_{xx}}}{n - 2} = \frac{1329.523 - \frac{(-331.05)^2}{82.5}}{8} = 0.1387045 \) [1]
\( V(\hat{y}) = \left(\frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{S_{xx}}\right) \times \hat{\sigma}^2 = \left(\frac{1}{10} + \frac{(11 - \frac{45}{10})^2}{82.5}\right) \times 0.1387045 = 0.08490399 \) [1.5]
And \( t_{8, 0.025} = 2.306 \) [½]
95% CI for mean \( \hat{y} \) is given by: \( -45.207 \pm 2.306 \times (0.08490399)^{1/2} \)
i.e. (−45.879, −44.535). [1.5]
(v) The width of the interval is only affected by \( V(\hat{y}) \), which depends on the new x value
through the term \( (x_{new} - \bar{x})^2 \). This term will now be smaller as the new \( x_{new} = 3.5 \)
value is closer to \( \bar{x} \) than \( x = 11 \). Therefore the interval will be narrower.
[2]
[Total 14]
The question was reasonably well answered by most candidates. In part (i) many
candidates provided a reasonable algebraic argument using known formulae. In part (iv)
a common issue was using a normal or chi-squared pivotal quantity. In part (v) most
candidates identified correctly the impact on the interval. However, note that an
appropriate explanation of why the interval is narrower is required here.
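The Q6 interval for the mean response can be reproduced from the summary statistics quoted in the solution above. This is a sketch only: the t quantile 2.306 is hard-coded from the tables rather than computed, and all variable names are illustrative.

```python
import math

# Summary statistics quoted in the Q6 solution.
n = 10
s_xx, s_yy, s_xy = 82.5, 1329.523, -331.05
x_bar, y_bar = 45 / 10, -19.124
T_8_975 = 2.306  # upper 2.5% point of t with 8 df, taken from the tables


def mean_response_interval(x_new):
    """Return (fitted value, lower, upper) of the 95% CI for the mean response."""
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    fitted = alpha + beta * x_new
    sigma2 = (s_yy - s_xy ** 2 / s_xx) / (n - 2)
    variance = (1 / n + (x_new - x_bar) ** 2 / s_xx) * sigma2
    half_width = T_8_975 * math.sqrt(variance)
    return fitted, fitted - half_width, fitted + half_width


fitted_11, lower_11, upper_11 = mean_response_interval(11)
_, lower_35, upper_35 = mean_response_interval(3.5)
print(round(fitted_11, 3), round(lower_11, 3), round(upper_11, 3))
```

Calling the function at 3.5 instead of 11 gives a narrower interval, which is the point made in part (v).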
Q7
(i) Based on data from two years (y), the likelihood is:
\( f(y|\theta) \propto \theta^{39}(1 - \theta)^{261}\, \theta^{60}(1 - \theta)^{240 + x} = \theta^{99}(1 - \theta)^{501 + x} \) [2]
Prior for θ is: \( f(\theta) \propto \theta^{2}(1 - \theta)^{4} \) [1]
So the posterior density is given by:
\( f(\theta|y) \propto f(y|\theta) \times f(\theta) = \theta^{101}(1 - \theta)^{505 + x} \) [2]
which is the density of a Beta(102, 506 + x) distribution. [2]
(ii) The Bayesian estimate under quadratic loss is the posterior mean, so
\( \hat{\theta} = E(\theta|y) = \frac{102}{102 + 506 + x} = \frac{102}{608 + x} \) [2]
(iii) The Bayesian estimate under all-or-nothing loss is the posterior mode,
so we now need to maximise the posterior density:
\( \frac{d}{d\theta} f(\theta|y) \propto 101\,\theta^{100}(1 - \theta)^{505 + x} - \theta^{101}(505 + x)(1 - \theta)^{504 + x} \)
\( = \theta^{100}(1 - \theta)^{504 + x}\left[101(1 - \theta) - (505 + x)\theta\right] \) [2]
\( \frac{d}{d\theta} f(\theta|y) = 0 \Rightarrow 101(1 - \tilde{\theta}) - (505 + x)\tilde{\theta} = 0 \Rightarrow \tilde{\theta} = \frac{101}{606 + x} \) [2]
(iv) In this case, for \( \hat{\theta} = \tilde{\theta} \) we need:
\( \frac{102}{608 + x} = \frac{101}{606 + x} \Rightarrow 102x + 61812 - 101x - 61408 = 0 \Rightarrow x = -404 \) [1]
This means that the number of apartments in year 2 would be 300 – 404 = – 104
which is not possible. [1]
[Total 15]
The question was not well answered overall. Parts (i) and (ii) were reasonably well
attempted. In part (iii) some candidates worked with the logarithm of the posterior
density, which is a valid alternative way to answer the question. The justification in part
(iv) was poor in many cases.
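The Beta updating in the Q7 solution above can be sketched with exact fractions. The function names below are hypothetical; the figures (prior Beta(3, 5), 39 and 60 claims, 300 and 300 + x policies) are those given in the question.

```python
from fractions import Fraction

PRIOR_A, PRIOR_B = 3, 5  # Beta prior parameters from the question


def posterior_params(x):
    """Beta posterior parameters after two years of data, for a given x."""
    claims = 39 + 60               # one claim of 1m per claiming policy
    policies = 300 + (300 + x)
    return PRIOR_A + claims, PRIOR_B + policies - claims


def posterior_mean(x):
    """Bayesian estimate under quadratic loss."""
    a, b = posterior_params(x)
    return Fraction(a, a + b)


def posterior_mode(x):
    """Bayesian estimate under all-or-nothing loss."""
    a, b = posterior_params(x)
    return Fraction(a - 1, a + b - 2)


print(posterior_params(0), posterior_mean(0), posterior_mode(0))
# The two estimates coincide only at the infeasible value x = -404.
print(posterior_mean(-404) == posterior_mode(-404))
```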
Q8
(i) Each house must have the same probability of being burgled. [1]
Whether a house is burgled or not is independent of other houses being burgled.

[1]
(ii) \( L(p) = [P(X = 0)]^{39}\,[P(X = 1)]^{36}\,[P(X = 2)]^{19}\,[P(X = 3)]^{4}\,[P(X = 4)]\,[P(X = 5)] \) [1]
Using a Bin(6, p) distribution to calculate the probabilities:
\( L(p) = c \times [(1 - p)^6]^{39}\,[p(1 - p)^5]^{36}\,[p^2(1 - p)^4]^{19}\,[p^3(1 - p)^3]^{4}\,[p^4(1 - p)^2]\,[p^5(1 - p)] \)
\( = c \times p^{95}(1 - p)^{505} \) [1]
\( \log L(p) = \log c + 95\log p + 505\log(1 - p) \)
\( \frac{\partial}{\partial p}\log L(p) = \frac{95}{p} - \frac{505}{1 - p} \) [1]
Setting the derivative equal to zero to obtain the maximum:
\( \frac{95}{\hat{p}} - \frac{505}{1 - \hat{p}} = 0 \Rightarrow 95(1 - \hat{p}) - 505\hat{p} = 0 \)
\( \hat{p} = \frac{95}{95 + 505} = 0.158333 \) [1]
Checking it’s a maximum:
\( \frac{\partial^2}{\partial p^2}\log L(p) = -\frac{95}{p^2} - \frac{505}{(1 - p)^2} < 0 \Rightarrow \) maximum [1]
(iii) Using the estimate \( \hat{p} \) and \( P(X = x) = \binom{6}{x}\hat{p}^x(1 - \hat{p})^{6 - x} \), we get the fitted frequencies
35.55, 40.13, 18.87, 4.73, 0.67, 0.05, 0.00 for x = 0, 1, …, 6. [2]
(iv) These are fairly similar to the observed frequencies – implying that it is a good fit. [1]
(v) Using p = 0.13 and \( P(X = x) = \binom{6}{x}\, 0.13^x (0.87)^{6 - x} \) we get
x          0      1      2      3     4     5     6
Observed   39     36     19     4     1     1     0
Expected   43.36  38.88  14.52  2.89  0.32  0.02  0.00
[2]
Since the expected frequencies are less than 5 for 3, 4, 5 and 6 houses burgled, we need to
combine these columns:
x          0      1      2      3+
Observed   39     36     19     6
Expected   43.36  38.88  14.52  3.23
[1]
Calculating the statistic:
\( \chi^2 = \frac{(39 - 43.36)^2}{43.36} + \dots + \frac{(6 - 3.23)^2}{3.23} = 4.409516 \) [1]
There are now 4 groups so the number of degrees of freedom is 4 – 1 = 3. No further
reduction is made for p, as this was given rather than estimated. [1]
Carry out a one-sided test. The observed value of the test statistic is less than the 5% critical
value of 7.815. [1]
So there is insufficient evidence to reject \( H_0 \) at the 5% level. Therefore it is reasonable to
conclude that the model is a good fit. [1]
[Total 17]
The question was generally not well answered. In part (i) many candidates failed to give
the standard assumptions required for using a binomial distribution. Answers to part (ii)
were generally satisfactory. A number of candidates did not attempt parts (iii) and (iv),
while many candidates that attempted them failed to derive the frequencies correctly. Part
(v) concerns a standard goodness of fit chi-squared test, which many candidates failed to
apply correctly. Note that in part (v), an alternative valid answer can be provided by
combining the 2 and 3+ groups. This gives a different value for the statistic, but the same
conclusion.
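The Q8 calculations above (maximum likelihood estimate, fitted frequencies and the chi-squared statistic) can be sketched in a few lines. Names are illustrative. Note that this sketch keeps the expected frequencies unrounded, so the statistic differs slightly from the 4.41 quoted in the solution, which uses frequencies rounded to two decimal places; the conclusion is unchanged.

```python
from math import comb

observed = [39, 36, 19, 4, 1, 1, 0]   # streets with x = 0..6 burgled houses
n_houses = 6
n_streets = sum(observed)

# MLE of p: total burgled houses over total sampled houses.
p_hat = sum(x * f for x, f in enumerate(observed)) / (n_houses * n_streets)


def binomial_expected(p):
    """Expected number of streets for each x under Bin(6, p)."""
    return [n_streets * comb(n_houses, x) * p ** x * (1 - p) ** (n_houses - x)
            for x in range(n_houses + 1)]


fitted = binomial_expected(p_hat)       # part (iii)
expected = binomial_expected(0.13)      # part (v)

# Pool the cells x >= 3, whose expected frequencies are below 5.
obs_pooled = observed[:3] + [sum(observed[3:])]
exp_pooled = expected[:3] + [sum(expected[3:])]
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))

CRITICAL_5PCT_3DF = 7.815  # from the tables, chi-squared with 3 df
print(round(p_hat, 6), [round(e, 2) for e in fitted], round(chi2, 3))
print("reject H0" if chi2 > CRITICAL_5PCT_3DF else "do not reject H0")
```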
Q9
(i) \( M_X(t) = (1 - \theta t)^{-\alpha} \)
(a) \( M_X'(t) = \alpha\theta(1 - \theta t)^{-\alpha - 1} \), hence, \( E[X] = M_X'(0) = \alpha\theta \) [1]
(b) \( M_X''(t) = \alpha(\alpha + 1)\theta^2(1 - \theta t)^{-\alpha - 2} \), therefore, \( E[X^2] = M_X''(0) = \alpha(\alpha + 1)\theta^2 \) [1]
(c) \( M_X'''(t) = \alpha(\alpha + 1)(\alpha + 2)\theta^3(1 - \theta t)^{-\alpha - 3} \)
implies \( E[X^3] = M_X'''(0) = \alpha(\alpha + 1)(\alpha + 2)\theta^3 \). [1]
(ii) (a) With α = 4, we have:
\( E[X] = 4\theta,\ E[X^2] = 20\theta^2,\ E[X^3] = 120\theta^3 \)
Hence:
\( \sigma^2 = E[X^2] - (E[X])^2 = 20\theta^2 - (4\theta)^2 = 4\theta^2 \)
[2]
(b)
\( \mu_3 = E[X^3] - 3\mu E[X^2] + 2\mu^3 = 120\theta^3 - 3(4\theta)(20\theta^2) + 2(4\theta)^3 = 8\theta^3 \)
Coefficient of skewness \( = \frac{\mu_3}{\sigma^3} = \frac{8\theta^3}{(2\theta)^3} = 1 \) [2]
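(iii)
\( L(\theta) = \prod_{i=1}^{n} \frac{x_i^3}{6\theta^4}\, e^{-x_i/\theta} = \frac{\prod_{i=1}^{n} x_i^3}{6^n\theta^{4n}}\, e^{-\sum_{i=1}^{n} x_i/\theta} \)
\( \log L(\theta) = \log\left(\prod_{i=1}^{n} x_i^3\right) - n\log(6) - 4n\log(\theta) - \frac{\sum_{i=1}^{n} x_i}{\theta} \) [1]
\( \frac{\partial}{\partial\theta}\log L(\theta) = \frac{-4n}{\theta} + \frac{\sum_{i=1}^{n} x_i}{\theta^2} = 0 \) [1]
Solving this equation leads to:
\( \hat{\theta} = \frac{\sum_{i=1}^{n} X_i}{4n} = \frac{\bar{X}}{4} \) [1]
and this is a maximum since:
\( \left.\frac{\partial^2}{\partial\theta^2}\log L(\theta)\right|_{\theta = \hat{\theta}} < 0 \)
(iv) \( E[\hat{\theta}] = \frac{1}{4}E[\bar{X}] = \frac{1}{4}E[X] = \frac{1}{4}\,4\theta = \theta \), hence, \( \hat{\theta} \) is unbiased. [1]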
(v) \( \bar{X} = \frac{796.2}{100} = 7.962 \), implies \( \hat{\theta} = \frac{7.962}{4} = 1.9905 \) [1]
(vi) (a) \( s^2 = \frac{1}{99}\left(8189.4 - \frac{796.2^2}{100}\right) = 18.69 \) [1]
(b) \( \sigma^2 = 4\theta^2 \) and \( 4\hat{\theta}^2 = 15.848 \); \( s^2 \) is a bit larger than the variance at \( \hat{\theta} \). [1]
(vii) Sample coefficient 1.12 is close to the distribution value 1. [1]
(viii) Approximate 95% CI for μ is \( \bar{x} \pm 1.96\sqrt{\frac{s^2}{n}} \) [1]
Since μ = 4θ, we obtain an approximate 95% CI for θ:
\( \frac{1}{4}\left(\bar{x} \pm 1.96\sqrt{\frac{s^2}{n}}\right) \) [1]
We obtain:
\( \frac{1}{4}\left(7.962 \pm 1.96\sqrt{\frac{18.69}{100}}\right) \) i.e. (1.779, 2.202) [1]
(ix) (a)
The lower limit of the variance is \( 4 \times 1.779^2 = 12.66 \) and the upper limit is \( 4 \times 2.202^2 = 19.40 \). [1]
(b)
The value \( s^2 \) falls within these values, confirming that \( s^2 \) is close to \( 4\hat{\theta}^2 \). [1]
[Total 20]
Performance on this question was mixed. Part (i) was generally well answered – some
candidates attempted to derive the MGF which is not required here. In part (ii) there were
several algebraic errors. Parts (iii) and (iv) were well attempted, while answers in parts
(v)-(vii) were generally weak for those candidates that attempted them. There were some
reasonable attempts in part (viii). Many candidates failed to scale the CI down by a
quarter, while another common error was not using the sample standard deviation. Note
that a valid alternative answer can be given in part (viii) with the use of asymptotic
normality and the Cramér-Rao lower bound for the variance. Part (ix) was poorly
answered.
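Parts (v) to (ix) of the Q9 solution above follow directly from the two sample sums. A minimal sketch is given below; the variable names are illustrative and the shape parameter α = 4 is the value assumed in the question.

```python
import math

# Sample summaries from the question.
n = 100
sum_x, sum_x2 = 796.2, 8189.4
ALPHA = 4  # assumed shape parameter of the Gamma claim size distribution

# (v) MLE of theta: sample mean divided by alpha.
x_bar = sum_x / n
theta_hat = x_bar / ALPHA

# (vi) Sample variance versus the model variance alpha * theta^2 at theta_hat.
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
model_var = ALPHA * theta_hat ** 2

# (viii) Approximate 95% CI for the mean, scaled down by alpha to give a CI for theta.
half_width = 1.96 * math.sqrt(s2 / n)
theta_ci = ((x_bar - half_width) / ALPHA, (x_bar + half_width) / ALPHA)

# (ix) Model variance evaluated at each end of the CI for theta.
var_bounds = tuple(ALPHA * t ** 2 for t in theta_ci)

print(round(theta_hat, 4), round(s2, 2), round(model_var, 3))
print([round(t, 3) for t in theta_ci], [round(v, 2) for v in var_bounds])
```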
S2019
EXAMINATION

Core Principles
If you encounter any issues during the examination please contact the Examination Team
on T. 0044 (0) 1865 268 873

1 Let X1 , X2 ,…, X81 be independent and identically distributed continuous random
variables, each with expected value \( \mu = E(X_i) = 5 \), and variance \( \sigma^2 = V(X_i) = 4 \).
(i) Determine the sampling distribution of the statistic \( T = \sum_{i=1}^{81} X_i \). [2]
(ii) Calculate the probability P(T > 369) using your answer to part (i). [2]
[Total 4]
2 A pair of fair six-sided dice is rolled once.
(i) Identify which one of the following options gives the probability that the sum
of the two dice is seven:
A1 1/36
A2 1/6
A3 1/12
A4 1/3
[2]
(ii) Identify which one of the following options gives the probability that at least
one dice shows three:
A1 25/36
A2 1/36
A3 11/36
A4 5/36
[2]
(iii) Identify which one of the following options gives the probability that at least
one dice shows an odd number:
A1 1/4
A2 3/4
A3 1/2
A4 1/12
[2]
CS1A S2020–2
The random variables representing the numbers on the first and second dice are
denoted by X and Y respectively.
(iv) (a) Identify which one of the following options gives the correct
expression of E (X + Y | X = 4), that is the conditional expectation of the
sum of the two dice given that X = 4:
A1 E(Y)
A2 E(X) + E(Y)
A3 4E(X) + E(Y)
A4 4 + E(Y)
[1]
(b) State a necessary assumption for deriving the answer in part (iv)(a).
[1]
(c) Determine the value of E(X + Y | X = 4), using your answer to part
(iv)(a). [2]
[Total 10]
3 The following data are available on three television factories that produce all the
televisions used in a country.
Factory   % of total production   Probability of defect (Def)
A         0.35                    0.020
B         0.40                    0.015
C         0.25                    0.010
A television is selected at random and found to have a defect (Def).
(i) Identify which one of the following expressions gives the required expression
to correctly calculate the probability that the selected television was made in
factory B.
A1 [P(made in B | Def) × P(Def)] / [P(made in A | Def)P(Def) + P(made in B | Def)P(Def) + P(made in C | Def)P(Def)]
A2 [P(Def | made in B) × P(made in B)] / [P(Def | made in A)P(made in A) + P(Def | made in B)P(made in B) + P(Def | made in C)P(made in C)]
A3 [P(Def | made in B) + P(made in B)] / {[P(Def | made in A) + P(made in A)] × [P(Def | made in B) + P(made in B)] × [P(Def | made in C) + P(made in C)]}
A4 P(Def | made in B) / [P(Def | made in A) + P(Def | made in B) + P(Def | made in C)]
[2]
(ii) Calculate, by using your answer to part (i), the probability that the selected
television was produced by Manufacturer B. [2]
[Total 4]
CS1A S2020–3
4 A random variable Y has probability density function
\( f(y) = a e^{-5y}, \quad y > b, \)
where a, b are positive constants.
The moment generating function of Y is denoted by MY (t).
(i) Write down the bounds of the integration required to calculate MY (t). [1]
(ii) Identify which one of the following options gives the correct expression for
MY (t). [2]
A1 \( a\,\frac{e^{-(1 - 5t)b}}{1 - 5t} \)
A2 \( \frac{a}{b}\,\frac{e^{-(1 - 5t)b}}{1 - 5t} \)
A3 \( \frac{a}{b}\,\frac{e^{-(5 - t)b}}{5 - t} \)
A4 \( a\,\frac{e^{-(5 - t)b}}{5 - t} \)
(iii) Write down the condition on t for MY (t) to be finite. [1]
(iv) Determine an expression giving the constant a in terms of b, using your

answer for MY (t) from part (ii). [3]
[Total 7]
CS1A S2020–4
5 Consider a regression model in which the response variable Yi is linked to the
explanatory variable Xi by the following equation:
Yi = a + bXi + ei, i = 1, ..., n
assuming that the error terms \( e_i \) are independent and Normally distributed with
expectation 0 and variance \( \sigma^2 \). In a sample of size n = 10, the following statistics have
been observed:
\( \sum_{i=1}^{n} x_i = 141, \quad \sum_{i=1}^{n} y_i = 127, \)
\( \sum_{i=1}^{n} x_i^2 = 2,014, \quad \sum_{i=1}^{n} y_i^2 = 1,629, \quad \sum_{i=1}^{n} x_i y_i = 1,810. \)
(i) Calculate values for Sxx , Syy , and Sxy . [3]
(ii) Write down, using your answers to part (i), the value of Pearson’s correlation
coefficient between the variables Xi and Yi. [1]
(iii) Calculate estimates of the parameters a and b in the regression model. [2]
[Total 6]
CS1A S2020–5
6 (i) State the three components of a Generalised Linear Model (GLM). [3]
In a mortality model, the number of deaths Dx at age x is modelled with a GLM. Dx is

assumed to have a Poisson distribution with expectation mx = exp(a + bx) for each age
x, such that Dx ~ Poisson(exp(a + bx)).
(ii) State the specific form of each of the three components of the GLM for the
above mortality model. [3]
(iii) Identify which one of the following expressions gives the correct likelihood
function as a function of the unknown parameters a and b based on the
observed number of deaths for all ages 20 to 80 given by d20 ,…, d80 , assuming
that the numbers of deaths at different ages are independent.
80 80
1 – e a + bx a + bx d .
A1 L a, b = P[Dx = dx ] = e e x
dx !
x = 20 x = 20
80 80
a + bx .
A2 L a, b = P[Dx = dx ] = ee e a + bx dx
x = 20 x = 20
80 80
1 – e a – bx a – bx d .
A3 L a, b = P[Dx = dx ] = e e x
dx !
x = 20 x = 20
80 80
1 e a + bx .
A4 L a, b = P[Dx = dx ] = e–
a+ bx dx
e
dx !
x = 20 x = 20
[2]
An analyst is reviewing the mortality model and is considering deaths only for ages
between 40 to 43 inclusive.
The analyst collects data for deaths and estimates the parameters for 𝑎 and 𝑏 as
follows:
d40 = 2 d41 = 3 d42 = 1 d43 = 0
a = 0.01512 b = –0.00686
(iv) Identify, using your answer to part (iii), which one of the following options
gives the correct value of the likelihood function, based on the analyst’s data
and parameter estimates.
A1 0.00222
A2 4.05473
A3 0.0008
A4 4.32729
[2]
[Total 10]
CS1A S2020–6
7 The probability density function of a Normal distribution is given as follows:
\( f(x; m, s^2) = \frac{1}{s\sqrt{2\pi}} \exp\left(-\frac{1}{2s^2}(x - m)^2\right) \)
with \( -\infty < x < \infty,\ -\infty < m < \infty,\ s > 0. \)
(i) Identify which one of the following options gives the correct expression for
the exponential family of the density f. [2]
1 xm – m2 ⁄2 x2
A1 exp – – ln s
√2π s2 2s2
xm – m2 ⁄2 x2 ln (2πs2 )
A2 exp – 2–
s2 2s 2
x 2m – x m 2 ⁄2 ln( 2πs2 )
A3 exp – –
2s2 s2 2
1 x2 ln( 2πs2 )
A4 exp xm – m2 ⁄2 – –
s2 2 2
(ii) Identify which one of the following options gives the natural parameter θ, the
scale parameter ϕ, and the relevant functions b θ , a ϕ and c(x, ϕ) of the
exponential family for this distribution, using your answer to part (i).
s2 1
A1 θ = m, ϕ = s2 , b θ = m2 , a ϕ =
2
, c x, ϕ = –
2
x2 + ln( 2πs2 )
s2 s2 1 x2
A2 θ = m, ϕ = , b θ = m2 , a ϕ = , c x, ϕ = –
2 2 2 ss
+ ln( 2πs2 )
s2 1 ln( 2πs2 )
A3 θ = s2 , ϕ = m, b θ = m2 , a ϕ = , c x, ϕ = –
2 2
x2 +
2
m2 1 x2
A4 θ = m, ϕ = s2 , b θ =
2
, a ϕ = s2 , c x, ϕ = –
2 s2
+ ln( 2πs2 )
[3]
An analyst found that the mean and standard deviation of this distribution are
\( E(X) = m \) and \( SD(X) = s^2 \). In your answer you may denote θ by theta and ϕ by phi.
(iii) Justify, using the properties of the exponential family, whether or not the
analyst is right about the mean and standard deviation of this distribution. [3]
(iv) Contrast a numerical variable and a factor covariate in the context of a

generalised linear model. [2]
[Total 10]
CS1A S2020–7
8 A statistician has recorded the number of advertising telephone calls that their office
received over 2 years. The statistician has recorded data Xij, which is the number of
calls received in the ith quarter of the jth year (where i = 1, 2, 3, 4 and j = 1, 2):
       X_i1   X_i2   X̅_i   ∑_j (X_ij – X̅_i)²
i=1    43     29     36     98
i=2    38     42     40     8
i=3    22     18     20     8
i=4    68     56     62     72
(i) Calculate values for:
(a) E[m(θ)]
(b) E[s²(θ)]
(c) Var[m(θ)].
[4]
(ii) Calculate an estimate for X13, the number of advertising telephone calls that the
statistician’s office expects to receive in the first quarter of year 3, using your
answers to part (i) and the assumptions of the Empirical Bayes Credibility
Theory Model 1 (EBCT Model 1). [2]
(iii) (a) State two key assumptions underlying the EBCT Model 1.
(b) Explain what these assumptions mean for the data Xij above.
[4]
[Total 10]
CS1A S2020–8
9 For an empirical investigation into the amount of rent paid by tenants in a town, data
on income X and rent Y have been collected. Data for a total of 300 tenants of one-
bedroom flats have been recorded. Assume that X and Y are both Normally distributed
with expectations \( \mu_X \) and \( \mu_Y \), and variances \( \sigma_X^2 \) and \( \sigma_Y^2 \). \( S_X \) and \( S_Y \) are the sample
standard deviations for random samples of X and Y, respectively.
The random variable \( Z_X \) is defined as
\( Z_X = 299\,\frac{S_X^2}{\sigma_X^2}. \)
(i) State the distribution of ZX and all of its parameters. [2]
(ii) Write down the expectation and variance of ZX . [2]
(iii) Explain why the distribution of ZX is approximately Normal. [2]
(iv) Calculate values of an approximate 2.5% quantile and 97.5% quantile of the
distribution of ZX using your answers to parts (ii) and (iii). [3]
In the collected sample, the mean income is $1,838 with a realised sample standard
deviation of $211, the mean rent is $608 with a realised sample standard deviation of
$275 and \( \sum x_i y_i = 348 \times 10^6 \).
(v) Calculate a 95% confidence interval for the mean income. [2]
(vi) Calculate a 95% confidence interval for the mean rent. [2]
(vii) Calculate an approximate 95% confidence interval for the variance of income
using your answer to part (iv). [2]
(viii) Identify which one of the following options gives the correct form of the
equation for the simple linear regression model of rent on income, including
any assumptions required for statistical inference. [2]
A1 yi = a + bxi
A2 yi = a + bxi + zi with E(zi) = 0
A3 yi = a + bxi + zi with zi ~ χ2 , 299 df
A4 yi = a + bxi + zi with zi ~ N(0, σ2 )
(ix) Calculate estimates of the slope and the intercept of the model in part (viii)
based on the above data for the 300 tenants. [4]
[Total 21]
CS1A S2020–9
10 It is thought that house prices in certain areas are correlated with the quality of
schools in the same areas. A study has been carried out in ten regions where average
house prices and school quality indices ranging from 1 (very poor) to 10 (excellent)
have been recorded:
Region i                     1    2    3    4    5    6    7    8    9    10
School index xi              9    5    7    6    4    9    7    8    5    6
House prices yi (£1,000s)    210  185  190  190  170  195  180  195  160  150
\( \sum x_i y_i = 12,240; \quad \sum x_i^2 = 462; \quad \sum y_i^2 = 335,975. \)
(i) State what is meant by response and explanatory variables in a linear

regression. [1]
A plot of the data is given below.
(ii) Comment on the relationship between school quality index and house price,
using the plot. [2]
Pearson’s correlation coefficient between the data is given as r = 0.7.
(iii) A statistical test is performed, using Fisher’s transformation, to determine

whether Pearson’s population correlation coefficient is significantly different
from zero, i.e. for
H0 : ρ = 0 vs H1 : ρ ≠ 0.
(a) Identify which one of the following options gives the correct value of
the test statistic for this test:
A1 2.295
A2 6.071
A3 2.743
A4 4.009
[2]
CS1A S2020–10
(b) Write down the conclusion of the test at the 5% level of significance,
including the relevant critical value(s) from the Actuarial Formulae
and Tables. [3]
The linear regression line, of house prices (y) on school index (x), is given as
y = 133.8 + 7.386x.
(iv) A t test is performed to determine if the slope parameter is significantly

different from 0.
(a) Identify which one of the following options gives the correct values of
the sums Sxx , Syy , Sxy for the house prices (y) and school index (x) data:
A1 Sxx = 32.8; Syy = 2,415.4; Sxy = 235

A2 Sxx = 20.5; Syy = 3,131.2; Sxy = 182
A3 Sxx = 26.4; Syy = 2,912.5; Sxy = 195
A4 Sxx = 35.2; Syy = 2,817.4; Sxy = 247
[2]
(b) Calculate the value of the test statistic. [2]
(c) Write down the distribution of the test statistic, if the null hypothesis of
the test is correct. [1]
(d) Write down the conclusion of the test at the 5% level of significance,
including the relevant critical value(s) from the Actuarial Formulae
and Tables. [3]
(v) Comment on the results in parts (iii)(b) and (iv)(d). [2]

[Total 18]
END OF PAPER
CS1A S2020–11
EXAMINERS’ REPORT
September 2020
Subject CS1 Paper A – Actuarial Statistics

Core Principles
Introduction
Mike Hammer
December 2020
© Institute and Faculty of Actuaries

Subject CS1 Paper A (Actuarial Statistics – Core Principles) – September 2020 – Examiners’ report
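1. The aim of the Actuarial Statistics subject is to provide a grounding in mathematical and
statistical techniques that are of particular relevance to actuarial work.
2. Some of the questions in the examination paper admit alternative solutions from these
presented in this report, or different ways in which the provided answer can be
determined. All mathematically correct and valid alternative solutions or answers
received credit as appropriate.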
3. Rounding errors were not penalised, but candidates lost marks where excessive rounding
led to significantly different answers.
4. In cases where the same error was carried forward to later parts of the answer, candidates
were given appropriate credit for the later parts.
6. The paper included a number of multiple choice questions, where showing working was
not required as part of the answer.
In all multiple choice questions, the details provided in the answers below (e.g.
calculations) are for information. Candidates were not required to show working.
7. In all numerical questions that were not multiple-choice, full credit was given for correct
answers that also included appropriate workings.
8. Standard keyboard typing was accepted for mathematical notation.
B. Comments on candidates’ performance in this diet of the examination.
1. Performance was very satisfactory in general, with most candidates showing very good
understanding of the topics in this subject. Well prepared candidates were able to score
highly.
2. A smaller number of candidates appeared to be inadequately prepared, in terms of not

having covered sufficiently the entire breadth of the subject.
3. Topics that were not particularly well answered in this paper include moment generating
functions (Q4), GLMs (Q6) and non-standard CIs (Q9(iv), (vii)).
4. Questions that required higher order skills and comments were generally not well
answered (e.g. Q8(iii)(b), Q9(iii), Q10(v)).
5. Questions corresponding to parts of the syllabus that had not been recently examined
were generally poorly answered (e.g. Q4). This highlights the need for candidates to
cover the whole syllabus when they revise for the exam and not only rely on themes
appearing in past papers.
CS1A S2020 ©Institute and Faculty of Actuaries

C. Pass Mark
The Pass Mark for this exam was 60.

1189 presented themselves and 823 passed.

Solutions for Subject CS1 Paper A September 2020
Q1
(i)
From the Central Limit Theorem, approximately
\( T \sim N(81 \times 5,\ 81 \times 4) \), i.e. \( N(405, 18^2) \), or \( N(405, 324) \) [2]
(ii)
Standardising, we get:
\( P(T > 369) = P\left(\frac{T - 405}{18} > \frac{369 - 405}{18}\right) \) [1]
\( \approx P(Z > -2) \) [½]
\( = 0.97725 \) [½]
using tables.
[Total 4]
Generally very well answered. In part (ii) some candidates applied a continuity correction,
which was not needed.
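The Q1 answer above can be confirmed without tables. The sketch below uses the Normal approximation from part (i) via the Python standard library; names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# 81 iid variables with mean 5 and variance 4, as in the question.
n, mu, var = 81, 5, 4

# Central Limit Theorem: T is approximately N(n * mu, n * var).
t_dist = NormalDist(mu=n * mu, sigma=sqrt(n * var))

# P(T > 369); no continuity correction, since the X_i are continuous.
prob = 1 - t_dist.cdf(369)
print(t_dist.mean, t_dist.stdev, round(prob, 5))
```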
Q2
(i) Ans: A2 [2]
\( P[X + Y = 7] = P[X = 1, Y = 6] + P[X = 2, Y = 5] + P[X = 3, Y = 4] + P[X = 4, Y = 3] + P[X = 5, Y = 2] + P[X = 6, Y = 1] \)
\( = P[X = 1]P[Y = 6] + \dots + P[X = 6]P[Y = 1] = 6 \times \frac{1}{36} = \frac{1}{6} \)
(ii) Ans: A3 [2]
\( P[X = 3, Y = 3] + P[X = 3, Y \ne 3] + P[X \ne 3, Y = 3] = 1 - P[X \ne 3, Y \ne 3] = 1 - \frac{5}{6} \times \frac{5}{6} = \frac{11}{36} \)
(iii) Ans: A2 [2]
\( 1 - P[X \in \{2, 4, 6\}]\,P[Y \in \{2, 4, 6\}] = 1 - \frac{1}{2} \times \frac{1}{2} = \frac{3}{4} \)
(iv) (a) Ans: A4 [1]
\( E[X + Y | X = 4] = E[X | X = 4] + E[Y | X = 4] = 4 + E[Y] \)
(b) We assume that X and Y are independent (pair of fair dice). [1]
(c) \( E[X + Y | X = 4] = 4 + \frac{1}{6}(1 + 2 + \dots + 6) = 4 + \frac{21}{6} = 7.5 \) [2]
[Total 10]
The question was very well answered by candidates.
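Each of the Q2 answers above can be verified by enumerating the 36 equally likely outcomes. This is a small sketch with illustrative names, assuming two independent fair dice as in the solution.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (x, y) for two independent fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Exact probability of an event given as a predicate on (x, y)."""
    return Fraction(sum(1 for x, y in outcomes if event(x, y)), len(outcomes))


p_sum_seven = prob(lambda x, y: x + y == 7)                     # part (i)
p_a_three = prob(lambda x, y: x == 3 or y == 3)                 # part (ii)
p_an_odd = prob(lambda x, y: x % 2 == 1 or y % 2 == 1)          # part (iii)

# Part (iv): average of X + Y over the outcomes with X = 4.
given_x4 = [(x, y) for x, y in outcomes if x == 4]
cond_exp = Fraction(sum(x + y for x, y in given_x4), len(given_x4))

print(p_sum_seven, p_a_three, p_an_odd, float(cond_exp))
```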
Q3
(i) Ans: A2 [2]
The required probability is: P(TV made in Factory B | defective)
Using Bayes’ theorem:
= [P(defective | made in factory B) × P(made in factory B)] / [P(defective | factory A)P(factory A) + P(defective | factory B)P(factory B) + P(defective | factory C)P(factory C)]
(ii) P(TV made in Factory B | defective)
\( = \frac{0.015 \times 0.4}{0.02 \times 0.35 + 0.015 \times 0.4 + 0.01 \times 0.25} = 0.38710 \)
[2]
[Total 4]
Part (i) was well answered. In part (ii), a number of candidates despite identifying the correct
answer in (i), went on to calculate incorrect probabilities. In some cases this was due to
misinterpreting the probabilities in the table.
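The Bayes' theorem calculation in the Q3 solution above can be laid out as follows. The dictionary names are illustrative; the figures are those in the question's table, read as production shares and conditional defect probabilities.

```python
# Share of total production and P(defect | factory), from the question's table.
production_share = {"A": 0.35, "B": 0.40, "C": 0.25}
defect_given_factory = {"A": 0.020, "B": 0.015, "C": 0.010}

# Total probability of a defect (denominator of Bayes' theorem).
p_defect = sum(production_share[f] * defect_given_factory[f]
               for f in production_share)

# Posterior probability of each factory given a defective television.
posterior = {f: production_share[f] * defect_given_factory[f] / p_defect
             for f in production_share}

print(round(p_defect, 4), round(posterior["B"], 5))
```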
Q4
(i) Integrate from b to plus infinity. [1]
(ii) Ans: A4 [2]
Moment generating function of Y is:
M_Y(t) = E(e^(tY)) = ∫_b^∞ e^(ty) a e^(−5y) dy
= a ∫_b^∞ e^(−(5−t)y) dy
= a [−e^(−(5−t)y)/(5 − t)]_b^∞ = a e^(−(5−t)b)/(5 − t)
(iii) t < 5 [1]
(iv) Evaluating the function at t = 0 gives 1, i.e. M_Y(0) = a e^(−5b)/5 = 1. [1]
We obtain a = 5e^(5b) [2]
[Total 7]
This question was not answered well in general, particularly parts (iii) and (iv). This is a
type of question that is not examined very often. Candidates are advised to cover the whole
syllabus when they revise for the exam and not only rely on themes appearing in past
papers.
Q5
(i) S_xx = 2014 − 141^2/10 = 25.9 [1]
S_yy = 1629 − 127^2/10 = 16.1 [1]
S_xy = 1810 − (141 × 127)/10 = 19.3 [1]
(ii) r = 19.3/√(25.9 × 16.1) = 0.9451364 [1]
(iii) b̂ = 19.3/25.9 = 0.745 [1]
â = 127/10 − 0.745 × 141/10 = 2.193 [1]
[Total 6]
This is a typical regression/correlation question and was answered very well.
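The summary-statistic calculations above can be reproduced with the following sketch. It assumes only the totals that appear in the solution (n = 10, ∑x = 141, ∑y = 127, ∑x² = 2014, ∑y² = 1629, ∑xy = 1810); the variable names are hypothetical. Note that the intercept is computed with the unrounded slope, which is what gives 2.193.

```python
from math import sqrt

# Totals appearing in the solution above
n = 10
sum_x, sum_y = 141, 127
sum_xx, sum_yy, sum_xy = 2014, 1629, 1810

# Corrected sums of squares and products
Sxx = sum_xx - sum_x**2 / n
Syy = sum_yy - sum_y**2 / n
Sxy = sum_xy - sum_x * sum_y / n

r = Sxy / sqrt(Sxx * Syy)             # sample correlation coefficient
b_hat = Sxy / Sxx                     # fitted slope
a_hat = sum_y / n - b_hat * sum_x / n # fitted intercept

print(round(Sxx, 1), round(Syy, 1), round(Sxy, 1))
print(round(r, 7), round(b_hat, 3), round(a_hat, 3))
```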

Q6
(i) A distribution of the response variable Y. [1]
A “linear predictor” η [1]
A “link function” g [1]
(ii) The distribution of the response D_x is a Poisson distribution. [1]
The linear predictor η_x = a + bx. [1]
The link function is the logarithm since log(E[D_x]) = η_x. [1]

(iii) Ans: A1 [2]
(iv) Ans: A3 [2]
[Total 10]
Parts (i) and (ii) were well answered, whereas many candidates gave wrong answers in parts
(iii) and (iv). These concerned a direct application of likelihood estimation in a less typical
scenario, as compared to the setting usually appearing in estimation questions.
Q7
(i) Ans: A2 [2]
(ii) Ans: A4 [3]
(iii) The expectation of X is correct.

This is obtained by taking the derivative of b(theta). [1]
The standard deviation is not correct. In fact it is the variance that is s squared. [1]
It is obtained by taking the second derivative of b(theta) and multiplying by
a(phi). [1]
(iv) A factor takes a categorical value and for a factor with k levels, there are generally
k parameters. [1]
For a numerical variable, the value is included as such in the linear predictor and
there is a single parameter in the model for each numerical variable. [1]
[Total 10]
Parts (i) and (ii) were well answered. Part (iii) was overall answered well, with a common
problem of failing to identify that the standard deviation is incorrect, or not making any
comments. Part (iv) was poorly answered, often with no mention regarding parameters and
levels.
Q8
(i) In this case, n = 2 and N = 4. Therefore the estimates are:

(a) E[m(θ)] = X̄ = (1/4) ∑_{i=1}^{4} X̄_i = (1/4)(36 + 40 + 20 + 62) = 39.5 [1]
(b) E[s^2(θ)] = (1/4) ∑_{i=1}^{4} (1/1) ∑_{j=1}^{2} (X_ij − X̄_i)^2 = (98 + 8 + 8 + 72)/4 = 46.5 [1]
(c) Var[m(θ)] = (1/3) ∑_{i=1}^{4} (X̄_i − X̄)^2 − (1/2) [(1/4) ∑_{i=1}^{4} (1/1) ∑_{j=1}^{2} (X_ij − X̄_i)^2]
= (1/3)[(36 – 39.5)^2 + (40 – 39.5)^2 + (20 – 39.5)^2 + (62 – 39.5)^2] − (1/2)(46.5)
= 299 2/3 − 23 1/4
= 276.42 [2]
(ii) The credibility factor is:

Z = 2 / (2 + 46.5/276.42) = 0.92241 [1]
And the estimate of X13 is (0.92241 x 36) + (1 – 0.92241) x 39.5 = 36.272 [1]
(iii)(a) Assumption 1
The distribution of each X_ij (i = 1, 2, 3, 4 and j = 1, 2) depends on the value of a
parameter θ_i, whose value is fixed, unknown, and the same for each value of j. [1]
Assumption 2
Given θ_i (i = 1, 2, 3, 4), X_ij (j = 1, 2) are independent and identically distributed. [1]
(iii)(b) For the given data, the assumptions can be interpreted as saying:
- The number of calls received follows a distribution with a parameter that varies
according to the time of year, but that is constant between years. [2]
[Total 10]
Parts (i) and (ii) were well answered, except for (i)(c) where the calculation of the
variance was often incorrect. Part (iii)(b) was poorly answered, with the interpretation in
the context of the question scenario being handled poorly. Note that alternative
assumptions (as in the Core Reading) were given credit as appropriate.
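The credibility calculation in parts (i) and (ii) can be sketched as follows. The raw observations are not reproduced in this report, so the sketch starts from the group means and within-group sums of squares quoted in the solution; the variable names are hypothetical.

```python
# Quantities quoted in the solution above (N = 4 groups, n = 2 years each)
group_means = [36, 40, 20, 62]
within_ss = [98, 8, 8, 72]   # sum over j of (X_ij - mean_i)^2 for each group
n, N = 2, 4

overall_mean = sum(group_means) / N                    # estimate of E[m(theta)]
e_s2 = sum(ss / (n - 1) for ss in within_ss) / N       # estimate of E[s^2(theta)]
var_m = (sum((m - overall_mean) ** 2 for m in group_means) / (N - 1)
         - e_s2 / n)                                   # estimate of Var[m(theta)]

Z = n / (n + e_s2 / var_m)                             # credibility factor
estimate_group1 = Z * group_means[0] + (1 - Z) * overall_mean

print(overall_mean, e_s2, round(var_m, 2), round(Z, 5), round(estimate_group1, 3))
```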
Q9
(i) Z_X has a chi-squared distribution [1]
with n − 1 = 299 degrees of freedom [1]
(ii) E[Z_X] = 299 [1]
and Var(Z_X) = 598 [1]
(iii) A chi-squared distribution with 299 degrees of freedom is the distribution of the
sum of the squares of 299 independent standard normal random variables. [1]
It follows from the CLT that a chi-squared distribution with a large number of
degrees of freedom can be approximated with a normal distribution. [1]

(iv) P[Z_X ≤ q] = P[(Z_X − 299)/√598 ≤ (q − 299)/√598] = P[Z ≤ (q − 299)/√598]
(q_97.5 − 299)/√598 = 1.96 and q_97.5 = 299 + 1.96 × √598 = 346.93 [1½]
(q_2.5 − 299)/√598 = −1.96 and q_2.5 = 299 − 1.96 × √598 = 251.07 [1½]
(v) 95% confidence interval (using normal approximation of t-distribution):
Income: [1838 − 1.96 × 211/√300, 1838 + 1.96 × 211/√300] [1]
= [1814.12, 1861.88] [1]
(vi) 95% confidence interval (using normal approximation of t-distribution):
Rent: [608 − 1.96 × 275/√300, 608 + 1.96 × 275/√300] [1]
= [576.88, 639.12] [1]
(vii) [299 × 211^2/χ²_0.975, 299 × 211^2/χ²_0.025] ≈ [299 × 211^2/346.93, 299 × 211^2/251.07] [1]
= [38370.22, 53020.19] [1]
(viii) Ans: A4 [2]
(ix) S_xy = ∑ x_i y_i − (∑ x_i)(∑ y_i)/n = 348 × 10^6 − 1838 × 300 × 608
= 12,748,800 [1]
S_xx = 299 × 211^2 = 13,311,779 [1]
b̂ = S_xy/S_xx = 12,748,800/13,311,779 = 0.9577082 [1]
â = ȳ − b̂ x̄ = 608 − 0.9577082 × 1838 = −1152.268 [1]
[Total 21]
Parts (i) and (ii) were well answered. In part (iii) the reasoning was often inadequate. Parts
(iv) and (vii) were poorly answered or unattempted, with many candidates failing to
calculate the quantiles required. Parts (viii) and (ix) were reasonably well answered.
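Since parts (iv) and (vii) were often left unattempted, the sketch below illustrates how the approximate quantiles feed into the confidence interval for the variance. It uses only figures from the solution (n = 300, sample mean 1838, sample standard deviation 211 for income); the variable names are hypothetical, and the normal approximation to the chi-squared distribution is the one described in part (iii).

```python
from math import sqrt

n = 300
df = n - 1                       # 299 degrees of freedom
z = 1.96                         # upper 2.5% point of N(0, 1)

# Normal approximation: chi-squared(df) is roughly N(df, 2 * df)
q_upper = df + z * sqrt(2 * df)  # approximate 97.5% quantile
q_lower = df - z * sqrt(2 * df)  # approximate 2.5% quantile

# 95% confidence interval for the income variance (sample sd = 211)
s_income = 211
var_ci = (df * s_income**2 / q_upper, df * s_income**2 / q_lower)

# 95% confidence interval for mean income (sample mean = 1838)
half_width = z * s_income / sqrt(n)
mean_ci = (1838 - half_width, 1838 + half_width)

print(round(q_lower, 2), round(q_upper, 2))
print([round(v, 2) for v in var_ci], [round(m, 2) for m in mean_ci])
```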
Q10
(i) In bivariate data, the response variable is a random variable whose value is
influenced by the explanatory variable. [1]
(ii) There is an increasing and relatively linear relationship. [1]

However the trend and linearity are not very clear around values x = 5, 6. [1]

(iii) (a) Ans: A1 [2]
W = (1/2) log((1 + r)/(1 − r)) is normally distributed with mean (1/2) log((1 + ρ)/(1 − ρ)) and standard deviation
1/√(n − 3). W = 0.8673 and W ~ N(0, 1/7).
Test statistic = 0.867/(1/7)^0.5 = 2.295.
(b) This is a two-sided test with the 2.5% critical values being -1.96 and 1.96 [2]
So we reject 𝐻𝐻0 at 5% significance level and conclude that Pearson’s correlation
coefficient is significantly different from zero. [1]
[Alternatively, use p-value = 0.022 for same conclusion.]
(iv) (a) Ans: A3 [2]

S_xx = 462 − 66^2/10 = 26.4
S_yy = 335975 − 1825^2/10 = 2912.5
S_xy = 12240 − (66 × 1825)/10 = 195
(b)
σ̂^2 = (1/8)(2912.5 − 195^2/26.4) = 184.02
s.e.(β̂) = (σ̂^2/S_xx)^(1/2) = (184.02/26.4)^(1/2) = 2.64
β̂ = 195/26.4 = 7.386
Test statistic = 7.386/2.64 = 2.80 [2]
(c)
The test statistic follows a t-distribution with 8 df under the null hypothesis. [1]
(d)
This is a two-sided test with the 2.5% critical values being -2.306 and 2.306. [2]
We have evidence at the 5% significance level to reject the null hypothesis that
β = 0. [1]
(v) The two tests are actually similar, so it is not surprising that they lead to the
same conclusion that there is a linear relationship between house prices and school
indices. [2]
[Total 18]

There were no particular issues with part (i). In part (ii), many candidates failed to make
any comment regarding the unclear trend in part of the data. A common error in parts (iii)
and (iv) was to not use a two-sided test. Part (v) was poorly answered, often with no
mention of the two tests being similar.
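The slope test in part (iv) can be sketched as below, starting from the corrected sums computed in the solution. The variable names are hypothetical, and the critical value 2.306 is the t_8 figure quoted in part (iv)(d) rather than something computed here.

```python
from math import sqrt

n = 10
Sxx, Syy, Sxy = 26.4, 2912.5, 195.0   # corrected sums from part (iv)(a)

beta_hat = Sxy / Sxx                              # fitted slope
sigma2_hat = (Syy - Sxy**2 / Sxx) / (n - 2)       # residual variance estimate
se_beta = sqrt(sigma2_hat / Sxx)                  # standard error of the slope
t_stat = beta_hat / se_beta                       # test statistic for H0: beta = 0

t_crit = 2.306                                    # two-sided 5% critical value, t_8
reject_h0 = abs(t_stat) > t_crit

print(round(beta_hat, 3), round(sigma2_hat, 2), round(se_beta, 2),
      round(t_stat, 2), reject_h0)
```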

EXAMINATION
15 April 2021 (am)
Subject CS1 – Actuarial Statistics

Core Principles
Paper A
In addition to this paper you should have available the 2002 edition of the
Formulae and Tables and your own electronic calculator.
If you encounter any issues during the examination please contact the Assessment Team on
T. 0044 (0) 1865 268 873.

1 A random variable, X, is modelled using a gamma distribution with parameters α = 50
and λ = 0.25.
(i) Calculate an approximate value for P(X > 270) using the chi-square
distribution. [2]
(ii) Calculate an approximate value for P(X > 270) using the central limit theorem.
[4]
(iii) Comment on the difference between your answers to parts (i) and (ii). [2]
[Total 8]
2 Consider two random variables, X and Y. The conditional expectation and conditional
variance of Y given X are denoted by the two random variables U and V, respectively;
that is, U = E[Y|X] and V = Var[Y|X].
Assume that Y is Normally distributed with expectation 5 and variance 4. Also assume
that the expectation of V is 2.
(i) Calculate the expected value of U. [1]
(ii) Calculate the variance of U. [2]

[Total 3]
3 Consider two random variables, X and Y, with a uniform distribution on the interval
[0,1]; that is, X ∼ U(0,1) and Y ∼ U(0,1). Assume that X and Y are independent.
(i) Identify which one of the following options describes the moment generating
function of X:
A (1/t)(e^(–t) – 1) for t ≠ 0
B (1/t)(e^t – 1) for t ≠ 0
C (1/t)(1 – e^(–t)) for t ≠ 0
D (1/t)(1 – e^t) for t ≠ 0
[2]
(ii) Derive the value of the moment generating function MX (t) of X at t = 0. [1]
An analyst argues that the sum of X and Y must have a uniform distribution on the
interval [0,2] since both X and Y are uniformly distributed on [0,1].
(iii) Derive the moment generating function for the random variable Z with a
U(0,2) distribution. [2]
(iv) Comment on the analyst’s argument by determining if the random variable

Z = X + Y has a uniform distribution on [0,2] using moment generating
functions. [3]
[Total 8]
CS1A A2021–2
4 Consider a random sample of size n = 25 from a Normal distribution with mean 10,
variance 4 and sample variance S^2.
(i) Write down the sampling distribution of S^2. [2]
(ii) Calculate, using your answer in part (i), the expected value of S^2. [1]
(iii) Calculate, using your answer in part (i), the variance of S^2. [1]
[Total 4]
5 The joint probability density function of random variables X and Y is:
f(x, y) = ke^(–(x + 2y)) for x > 0, y > 0, and f(x, y) = 0 otherwise.
[Hint: You may find it helpful to define the functions g_X(x) = e^(–x) and g_Y(y) = e^(–2y),
using this notation in your answers.]
(i) Demonstrate that X and Y are independent. [1]
(ii) Verify that k = 2. [3]
(iii) Demonstrate that f_Y(y), the marginal density function of Y, is:
2e^(–2y) for y > 0.
[1]
(iv) Demonstrate that the conditional density function f(y|Y > 3) is:
f(y|Y > 3) = 2e^(6 – 2y) for y > 3.
[Hint: Consider P(Y ≤ y|Y > 3).] [3]
(v) Identify which one of the following expressions is equal to the conditional
expectation E[Y|Y > 3]:
A ∫_0^∞ te^(–2t) dt + ∫_0^∞ 3e^(–2t) dy
B ∫_0^∞ te^(–2t) dt + ∫_0^∞ 6e^(–2t) dy
C ∫_0^∞ 2te^(–2t) dt + ∫_0^∞ 3e^(–2t) dy
D ∫_0^∞ 2te^(–2t) dt + ∫_0^∞ 6e^(–2t) dy
[1]
(vi) Determine the value of the conditional expectation E[Y|Y > 3]. [2]
(vii) Identify which one of the following options is the conditional expectation
E[Y^2|Y > 3]:
A 12.5
B 13.5
C 14.5
D 15.5.
[2]
(viii) Determine the conditional variance Var[Y|Y > 3]. [1]

[Total 14]
6 A tutor believes that the number of exams passed by students sitting three different
exams follows a binomial distribution with parameters n = 3 and p. A random sample
of 120 students showed the following results:
Number of exams passed 0 1 2 3

Number of students 40 60 15 5
(i) (a) Identify which one of the following corresponds to the log likelihood
function of p given the observed data:
A log L ∝ 255 log(1 – p) + 105 log p
B log L ∝ 115 log(1 – p) + 80 log p
C log L ∝ 265 log(1 – p) + 115 log p
D log L ∝ 175 log(1 – p) + 85 log p
[2]
(b) Show, using your answer to part (i)(a), that the maximum likelihood
estimate for p is p̂ = 0.2917. You are not required to check that it is a
maximum. [3]
(ii) Perform a goodness of fit test for the binomial model Bin(3, p) at a
significance level of 5%. [8]
[Total 13]
7 A telecommunications company has performed a small empirical study comparing
phone usage in rural and urban areas, collecting data from a total of 35 people who
use their phones independently. The average number of hours that each person spent
using their phone during a week is denoted by Y.
In the following table, Ȳ denotes the sample mean of Y in rural and urban areas, and
S_Y denotes the sample standard deviation; that is, S_Y^2 = (1/(n – 1)) ∑_{i=1}^{n} (Y_i – Ȳ)^2.

               Sample size n    Ȳ      S_Y
Rural areas         15          3.7    2.1
Urban areas         20          4.4    1.9
A statistical test is to be performed, at the 5% significance level, to determine whether

the null hypothesis that mean phone usage in rural areas is the same as mean phone
usage in urban areas, i.e. for:
H0 : phone usage is equal versus H1 : phone usage is not equal.
(i) State a suitable distribution for the test statistic with its parameter(s). [1]
(ii) Justify any assumption(s) required to perform this test. [2]
(iii) Identify which one of the following options gives the correct value of the test
statistic for this test:
A –1.031
B –0.519
C –3.019
D –1.455.
[2]
(iv) Write down the conclusion of the test including the relevant critical value(s)
from the Actuarial Formulae and Tables. [3]
(v) Determine a 95% confidence interval for the mean phone usage (hours per
week) for rural areas, stating any assumption(s) you make. [4]
[Total 12]
8 An initial investigation into climate change has been conducted using climate change
data from the past 50 years, collected by the International Meteorological Society. For
each year, t, the number of consecutive days, d, of extreme weather was recorded. The
total number of days in any year is 365 and extreme weather is defined as a rainless
day with temperatures in excess of 28 degrees Celsius.
An Actuary has performed a preliminary statistical analysis on the data. Below is a

scatter plot of the Actuary’s findings:
[Scatter plot "Extreme weather over the past 50 years": Days d (vertical axis, 0 to 350) against Year t (horizontal axis, 0 to 50)]
The Actuary also fitted a least squares regression line for extreme weather days on
year, giving:
d = 147.39 – 5.82601t,
and calculated the coefficient of determination for this regression line as:
R^2 = 91.5%.
(i) Comment on the plot and the Actuary’s analysis. [2]
A separate analysis, on the same data, is undertaken independently by a statistician.

Below are the key summaries of their analysis:
∑t = 1,275   ∑t^2 = 42,925   ∑d = 8,502   ∑d^2 = 1,911,378   ∑td = 282,724
(ii) Verify that the equation of the statistician’s least squares fitted regression line
of extreme weather days on year is given by:
d = 8.59592 + 6.33114t.
[3]
(iii) (a) Determine the standard error of the estimated slope coefficient in
part (ii).
(b) Test the null hypothesis of ‘no linear relationship’ at the 1%

confidence level, using the equation in part (ii).
(c) Determine a 99% confidence interval for the underlying slope

coefficient for the linear model, using the equation in part (ii).
[7]
Further climate change data are collected from an alternative independent data source,
also covering the past 50 years. These data were analysed and resulted in an estimated
slope coefficient of:
β = 5.21456 with standard error 1.98276
(iv) (a) Test the ‘no linear relationship’ hypothesis at the 1% confidence level
based on the further climate change data. [2]
(b) Determine a 99% confidence interval for the underlying slope

coefficient β based on the alternative climate change data. [2]
(v) Comment on whether or not the underlying slope coefficients, for the
statistician’s data in part (ii) and the independent data in part (iv), can be
regarded as being equal. [3]
(vi) Discuss why the results of the tests in parts (iii)(b) and (iv)(a) seem to
contradict the conclusion in part (v). [4]
[Total 23]
9 The number of claims received by a motor insurance company on any given day
follows a Poisson distribution with mean u. Prior beliefs about u are expressed
through a gamma distribution with parameters a and b. Over a period of n days the
observed number of claims received per day are x1 , x2 , …, xn .
(i) Identify which one of the following is the posterior density of u:
A f(u|x) ∝ u^(b + ∑x_i – 1) e^(–(a + n)u)
B f(u|x) ∝ u^(a + ∑x_i) e^(–(b + n + 1)u)
C f(u|x) ∝ u^(a + ∑x_i – 1) e^(–(b + n)u)
D f(u|x) ∝ u^(b + ∑x_i + 1) e^(–(a + n – 1)u)
[3]
(ii) Write down the posterior density of the parameter u and specify its
parameters. [2]
(iii) (a) Determine the Bayesian estimate of u under quadratic loss. [2]
(b) Write down the Bayesian estimate of u under quadratic loss as a

credibility estimate and state the credibility factor. [2]
Suppose that a = 9, b = 3 and that the company receives 320 claims in total during a
6-day period.
(iv) Calculate the Bayesian estimate of u under quadratic loss. [2]
(v) Calculate the variance of the posterior distribution of u. [2]
An industry expert suggests that prior beliefs about u are better expressed through a
gamma distribution with parameters a = 18 and b = 6.
(vi) Explain how these prior beliefs would affect the variance of the posterior
distribution of 𝑢, without explicitly calculating the variance of the posterior
distribution. [2]
[Total 15]
END OF PAPER
EXAMINERS’ REPORT
April 2021

Core Principles
Paper A
Introduction
Paul Nicholas
July 2021
© Institute and Faculty of Actuaries

CS1A - Actuarial Statistics - Core Principles - April 2021 - Examiners’ report
1. The aim of the Actuarial Statistics subject is to provide a grounding in mathematical

and statistical techniques that are of particular relevance to actuarial work.
3. Rounding errors were not penalised, but candidates lost marks where excessive
rounding led to significantly different answers.
4. In cases where the same error was carried forward to later parts of the answer,
candidates were given appropriate credit for the later parts.
6. The paper included a number of multiple choice questions, where showing working
was not required as part of the answer.
7. In all multiple choice questions, the details provided in the answers below (e.g.
calculations) are for information. Candidates were not required to show working.
8. In all numerical questions that were not multiple-choice, full credit was given for
correct answers that also included appropriate workings.
9. Standard keyboard typing was accepted for mathematical notation.
B. Comments on candidates’ performance in this diet of the examination.
1. Performance was satisfactory in general, with many candidates showing very good
understanding of the topics in this subject. Well prepared candidates were able to
score highly.
2. A smaller number of candidates appeared to be inadequately prepared, in terms of not

3. Questions that required higher order skills and comments were generally not well
answered (e.g. Q1(iii)(b), Q8(v),(vi)).
4. Questions corresponding to parts of the syllabus that are not frequently examined
were generally poorly answered (e.g. Q5). This highlights the need for candidates to
cover the whole syllabus when they revise for the exam and not only rely on themes
appearing in past papers.
5. There was a typing error in Q5(v) of the paper, where the correct answer should be
shown as ∫_0^∞ 2te^(−2t) dt + ∫_0^∞ 6e^(−2t) dt. This is also related to the answers in parts (vii)
and (viii) of the question. The error was taken into account when marking the
question, with the Examiners applying flexibility in awarding credit where
appropriate. The pass mark for this exam was adjusted accordingly, to reflect the
marks that affected candidates might not have had the opportunity to score. The
Examiners did not find any evidence that the error had any further impact on
performance on the remainder of the paper.
C. Pass Mark
The Pass Mark for this exam was 56.

1,482 candidates presented themselves and 779 passed.
Solutions for Subject CS1 Paper A April 2021
Q1
(i)
X ~ Gamma(50, 0.25), and using X ~ Gamma(α, λ) ⇒ 2λX ~ χ²_{2α}: [1]
P(X > 270) = P(2λX > 2λ × 270) = P(0.5X > 135) = P(χ²_100 > 135) ≈ 0.01 [1]
(using the Actuarial Tables for chi-square probabilities)
(ii)
The mean and variance of the given gamma distribution are
E(X) = α/λ = 50/0.25 = 200, var(X) = α/λ^2 = 50/0.25^2 = 800 [1]
Using the normal approximation for large α, X ~ Gamma(50, 0.25) can be approximated as
X ~ N(200, 800): [1]
P(X > 270) = P(Z > (270 − 200)/√800) = P(Z > 2.4749) = 1 – P(Z < 2.4749) = 0.0066642
Alternatively, use tables to interpolate, giving 0.00667
[2]
(iii)
The gamma distribution converges to the normal distribution as α → ∞. [1]
But for α = 50, the gamma distribution exhibits positive skew, [½]
and gives a higher tail probability than the symmetric normal distribution [½]
[Total 8]
Generally well answered. In part (i) some candidates did not calculate the probability
using the chi-square distribution, as the question asked. In (iii) a number of
candidates did not provide any comments.
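To illustrate the comment in part (iii), the sketch below compares the normal approximation with the exact gamma tail probability. The exact value is obtained from a standard identity that holds for integer α (an Erlang distribution): P(X > x) equals the probability that a Poisson(λx) variable is at most α − 1. This identity is used here as a substitute for the chi-square tables and is not the method in the solution; the variable names are hypothetical.

```python
from math import erf, exp, sqrt

alpha, lam, x = 50, 0.25, 270.0

# Normal approximation, as in part (ii)
mean, var = alpha / lam, alpha / lam**2
z = (x - mean) / sqrt(var)
p_normal = 0.5 * (1.0 - erf(z / sqrt(2.0)))

# Exact tail for integer alpha:
# P(X > x) = sum_{k=0}^{alpha-1} exp(-lam*x) (lam*x)^k / k!
m = lam * x
term = exp(-m)          # k = 0 term
p_exact = 0.0
for k in range(alpha):
    p_exact += term
    term *= m / (k + 1) # move from the k-th to the (k+1)-th Poisson term

print(round(p_normal, 7), round(p_exact, 4))
```

The exact tail is the larger of the two, consistent with the positive skew of the gamma distribution noted above.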
Q2
(i) E[U] = E[E[Y|X]] = E[Y] = 5 [1]
(ii) Var(U) = Var(E[Y|X]) = Var(Y) − E[Var(Y|X)] = 4 − 2 = 2 [2]

[Total 3]

Generally well answered. There were a few slips in the derivation, resulting in
incorrect answers.
Q3
(i)
Answer B
M_X(t) = E[e^(tX)] = ∫_0^1 e^(tx) dx
M_X(t) = (1/t)(e^t − 1) for t ≠ 0 [2]
(ii)
M_X(0) = E[exp(0 × X)] = 1 [1]
(iii)
For a U(0,2) distributed RV Z, we have:
M_Z(t) = E[e^(tZ)] [½]
= (1/2) ∫_0^2 e^(tz) dz = (1/(2t))(e^(2t) − 1) [1½]
(iv)
Since X and Y are independent, [1]
the MGF of X + Y is given by the product of the MGFs:
M_{X+Y}(t) = E[e^(tX)] E[e^(tY)] = (1/t^2)(e^t − 1)^2. [1]
So, M_{X+Y}(t) ≠ M_{U(0,2)}(t), and therefore X + Y does not have a U(0,2) distribution
[1]
[Total 8]
Part (i) was well answered. In part (ii), a common error was to state that the MGF is
undefined. Common errors in part (iv) involved not stating that X and Y are
independent, incorrectly deriving MGF(X+Y) and not summarising a response to the
assertion.
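A quick numerical comparison makes the conclusion of part (iv) concrete. The sketch below evaluates both moment generating functions at a single point, t = 1 (an arbitrary choice); the function name `mgf_uniform` is hypothetical.

```python
from math import exp

def mgf_uniform(t, upper):
    """MGF of a U(0, upper) random variable, valid for t != 0."""
    return (exp(upper * t) - 1.0) / (upper * t)

t = 1.0
mgf_sum = mgf_uniform(t, 1.0) ** 2   # MGF of X + Y for independent U(0,1) variables
mgf_u02 = mgf_uniform(t, 2.0)        # MGF of a U(0,2) variable

print(round(mgf_sum, 4), round(mgf_u02, 4))

# Near t = 0 the MGF tends to 1, consistent with part (ii)
print(round(mgf_uniform(1e-6, 1.0), 6))
```

Because the two functions differ at t = 1, they cannot be the same function, so X + Y is not U(0,2).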
Q4
(i)
The sampling distribution of S^2 is: (n − 1)S^2/σ^2 ~ χ²_{n−1} with n = 25 and σ^2 = 4
Therefore the sampling distribution of S^2 is: ((25 − 1)/4) S^2 = 6S^2 ~ χ²_24 [2]
(ii)
So E[6S^2] = 24
And: E[S^2] = 4 [1]
(iii)
var[6S^2] = 48
So var[S^2] = 48/36 = 4/3 [1]
[Total 4]

The question was well answered. In part (i) a number of candidates failed to specify
the sampling distribution, as the question asked.
Q5
(i)
f(x, y) = ke^(−(x+2y)) = ke^(−x) e^(−2y), x > 0, y > 0
[Or, f(x,y) = k g_X(x) g_Y(y).] [½]
The density function is expressed as a product of a function of x and a function of y. Therefore, the joint
probability function is a product of the two marginal probability functions for all (x, y) in the
range of the variables, hence X and Y are independent [½]
(ii)
The integral over the domain is
∫_0^∞ ∫_0^∞ f(x, y) dx dy = k ∫_0^∞ ∫_0^∞ e^(−(x+2y)) dx dy = k ∫_0^∞ e^(−x) dx ∫_0^∞ e^(−2y) dy
Or, the integral of f(x,y) is k times the integral of g_X times the integral of g_Y
∫_0^∞ e^(−x) dx = [−e^(−x)]_0^∞ = 1, that is, the integral of g_X is one [1]
∫_0^∞ e^(−2y) dy = [−(1/2)e^(−2y)]_0^∞ = 1/2, that is, the integral of g_Y is 0.5 [1]
The integral of f(x,y) is 1 only for k=2 since, [1]
∫_0^∞ ∫_0^∞ f(x, y) dx dy = k × 1 × 1/2 = 1, hence k = 2
(iii)
The marginal density is
f_Y(y) = 2 ∫_0^∞ e^(−x) e^(−2y) dx = 2e^(−2y) ∫_0^∞ e^(−x) dx = 2e^(−2y) [1]
(iv)
The conditional probability P(Y ≤ y|Y > 3) is
P(Y ≤ y|Y > 3) = P(Y ≤ y, Y > 3)/P(Y > 3)
= P(3 < Y ≤ y)/P(Y > 3)
= (F_Y(y) − F_Y(3))/P(Y > 3), y > 3.
[1]
Therefore,
f(y|Y > 3) = f_Y(y)/P(Y > 3) = 2e^(−2y)/e^(−6) = 2e^(6−2y), y > 3, [1]
since
P(Y > 3) = ∫_3^∞ f_Y(y) dy = [−e^(−2y)]_3^∞ = e^(−6). [1]
(v)
Answer D [1]
The conditional expectation is given as
E[Y|Y > 3] = ∫_3^∞ y f(y|Y > 3) dy = ∫_3^∞ 2ye^(6−2y) dy
By taking t = y − 3,
E[Y|Y > 3] = ∫_0^∞ 2(t + 3)e^(−2t) dt = ∫_0^∞ 2te^(−2t) dt + ∫_0^∞ 6e^(−2t) dt
(vi)
∫_0^∞ 2te^(−2t) dt = [e^(−2t)(−2t − 1)/2]_0^∞ = (0) − (−1/2) = 1/2 [1]
∫_0^∞ 6e^(−2t) dt = 3 [½]
E[Y|Y > 3] = 3.5 [½]
(vii)
Answer A [2]
E[Y^2|Y > 3] = ∫_3^∞ y^2 f(y|Y > 3) dy = ∫_3^∞ 2y^2 e^(6−2y) dy
Similar to (v),
∫_0^∞ 2(t + 3)^2 e^(−2t) dt = ∫_0^∞ 2t^2 e^(−2t) dt + ∫_0^∞ 12te^(−2t) dt + ∫_0^∞ 18e^(−2t) dt
E[Y^2|Y > 3] = 0.5 + 6 × 0.5 + 9 = 12.5
The first integral is the moment of order 2 for the exponential distribution with parameter 2
(viii)
The variance of Y given Y > 3
Var[Y|Y > 3] = E[Y^2|Y > 3] − (E[Y|Y > 3])^2 = 12.5 − 3.5^2 = 0.25 [1]
[Total 14]
There were mixed answers in this question. This type of question has not appeared
frequently in the presented form and many candidates found it challenging. Parts (i) -
(iii) were well answered, while in part (iv) the justification for the conditional
probability required was often missed. Parts (v), (vii) and (viii) were not well answered.
These parts were potentially affected by the typing error in part (v). Part (vi) was
poorly answered.
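As a numerical cross-check of parts (vi) to (viii), the conditional moments can be approximated by integrating against the conditional density f(y|Y > 3) = 2e^(6−2y). The sketch below uses a simple midpoint rule with the range truncated at y = 33, which is an assumption made for convenience (the omitted tail is negligible); the helper `integrate` is a hypothetical name.

```python
from math import exp

def integrate(f, lo, hi, steps=200_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

def cond_density(y):
    """Conditional density f(y | Y > 3) = 2 exp(6 - 2y), for y > 3."""
    return 2.0 * exp(6.0 - 2.0 * y)

upper = 33.0  # truncation point; the density beyond this is negligible
m1 = integrate(lambda y: y * cond_density(y), 3.0, upper)      # E[Y | Y > 3]
m2 = integrate(lambda y: y * y * cond_density(y), 3.0, upper)  # E[Y^2 | Y > 3]
cond_var = m2 - m1**2                                          # Var[Y | Y > 3]

print(round(m1, 3), round(m2, 3), round(cond_var, 3))
```

The variance of 0.25 matches that of an exponential distribution with parameter 2, as expected from the memoryless property.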

Q6
(i)(a)
Answer A
The likelihood function is:

L = [(1 – p)^3]^40 × [3p(1 – p)^2]^60 × [3p^2(1 – p)]^15 × [p^3]^5
∝ (1 – p)^120 p^60 (1 – p)^120 p^30 (1 – p)^15 p^15
= (1 – p)^255 p^105
Taking logs:
log L ∝ 255 log (1 – p) + 105 log p [2]
(b)
Using the answer from (i)(a), differentiate with respect to p:
d log L/dp = −255/(1 − p) + 105/p [1]
Setting this equal to zero gives:
255p̂ = 105(1 − p̂)
360p̂ = 105 [1]
p̂ = 105/360 = 0.2917 [1]
(ii)
Specify the hypotheses using a χ² goodness of fit test:
H0: the probabilities follow a binomial bin(3, p) distribution
H1: the probabilities do not follow a binomial bin(3, p) distribution [1]
Using the MLE estimate for p above (0.29166):

P(X = 0) = (1 – p)^3 = 0.35540
P(X = 1) = 3p(1 – p)^2 = 0.43902
P(X = 2) = 3p^2(1 – p) = 0.18077
P(X = 3) = p^3 = 0.024812 [1]
Therefore we get the following:
Number of exam passes     0                1                2                3
Observed no. of passes    40               60               15               5
Expected no. of passes    0.35540 x 120    0.43902 x 120    0.18077 x 120    0.024812 x 120
                          = 42.648         = 52.682         = 21.693         = 2.9774
[2]

Combining last two columns, as expected no. of students with 3 exam passes < 5:
Number of exam passes     0         1         2 and 3
Observed no. of passes    40        60        20
Expected no. of passes    42.648    52.682    24.670
[1]
So: degrees of freedom = 3 – 1 – 1 = 1 [1]
The test statistic is:
∑ (O − E)^2/E = (40 − 42.648)^2/42.648 + (60 − 52.682)^2/52.682 + (20 − 24.670)^2/24.670 = 2.0649
[1]
The test statistic is less than the 5% χ²_1 critical value of 3.841, so there is insufficient
evidence at the 5% level to reject H0. Therefore there is no evidence to conclude that the
model is not a good fit [1]
[Total 13]
Part (i) was well answered. Part (ii) was reasonably well answered, but with a
number of common errors, including: incorrect hypotheses stated, incorrect expected
numbers calculated, no attempt at combining final 2 cells, incorrect degrees of
freedom and a number of candidates not clearly showing their working.
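The goodness of fit calculation in part (ii), including the step of combining the final two cells that many candidates omitted, can be sketched as follows. The variable names are hypothetical, and the critical value 3.841 is the tabulated figure quoted in the solution rather than something computed here.

```python
from math import comb

observed = [40, 60, 15, 5]            # students passing 0, 1, 2, 3 exams
n_students = sum(observed)            # 120

# MLE of p: total passes divided by total exams sat
p_hat = sum(k * o for k, o in enumerate(observed)) / (3 * n_students)

# Expected counts under Bin(3, p_hat)
probs = [comb(3, k) * p_hat**k * (1 - p_hat) ** (3 - k) for k in range(4)]
expected = [n_students * p for p in probs]

# Combine the last two cells because the expected count for 3 passes is below 5
obs_merged = observed[:2] + [observed[2] + observed[3]]
exp_merged = expected[:2] + [expected[2] + expected[3]]

chi2_stat = sum((o - e) ** 2 / e for o, e in zip(obs_merged, exp_merged))
dof = len(obs_merged) - 1 - 1         # cells - 1 - one estimated parameter
critical_value = 3.841                # 5% point of chi-squared with 1 df (from tables)

print(round(p_hat, 4), [round(e, 3) for e in exp_merged],
      round(chi2_stat, 4), dof, chi2_stat < critical_value)
```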
Q7
(i)
t distribution would be suitable, with 33 df. [1]
(ii)
Assumed that the variances (rural and urban) are equal [1]
Equal variances seem to be justified given the S_Y values for rural and urban areas are similar
given the small sample sizes [1]
Assumption of Normality [½]
[Marks available 2½, maximum 2]
(iii)
Answer A
Test statistic t = (Ȳ_rural − Ȳ_urban)/√(S_P^2 (1/n_rural + 1/n_urban)) ~ t_(n_rural + n_urban − 2)
under the null hypothesis that phone usage is equal.
S_P^2 = (14 S_rural^2 + 19 S_urban^2)/33 = (1/33)(14 × 2.1^2 + 19 × 1.9^2) = 3.949
t = (3.7 − 4.4)/√(3.95 × (1/15 + 1/20)) = −1.031 [2]
(iv)
We are applying a two-sided test [1]
Critical values for t_33 are not in the tables, but at the 2.5% level they are between 2.032 (t_34)
and 2.037 (t_32) [1]
Since the test statistic lies between the table values,
i.e. −2.032 < −1.031 < 2.037,
[Or, as t_33;2.5% is between 2.032 and 2.037, we have
t_33;97.5% < −1.031 < t_33;2.5%]
we conclude that the null hypothesis of equal phone usage cannot be rejected
[1]
(v)
Assume Y_rural ~ N(μ, σ^2). [1]
Critical values for t_14 at the 2.5% level are: −2.145 and +2.145 [1]
Confidence interval = 3.7 ± t_14,0.025 × 2.1/√15
= [3.7 − 2.145 × 0.542 , 3.7 + 2.145 × 0.542] [1]
= [2.537, 4.863] [1]
[Total 12]
Parts (i)-(iii) were generally well answered – common errors here included the
justification of equal variances often being omitted. In part (iv), a number of
candidates did not clearly refer to the critical values required, while in (v) the
assumption of Normality was often missed.
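The pooled two-sample statistic in part (iii) and the interval in part (v) can be reproduced with the sketch below. It assumes equal variances and Normality, as stated in part (ii); the variable names are hypothetical and the t_14 critical value is the tabulated figure quoted in the solution.

```python
from math import sqrt

# Summary statistics from the question
n_r, mean_r, s_r = 15, 3.7, 2.1   # rural
n_u, mean_u, s_u = 20, 4.4, 1.9   # urban

# Pooled variance and two-sample t statistic
df = n_r + n_u - 2
sp2 = ((n_r - 1) * s_r**2 + (n_u - 1) * s_u**2) / df
t_stat = (mean_r - mean_u) / sqrt(sp2 * (1 / n_r + 1 / n_u))

# 95% confidence interval for the rural mean, using the t_14 table value
t14 = 2.145
half_width = t14 * s_r / sqrt(n_r)
ci_rural = (mean_r - half_width, mean_r + half_width)

print(df, round(sp2, 3), round(t_stat, 3), [round(c, 3) for c in ci_rural])
```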
Q8
(i)
There appears to be a number of possible outliers, [½]
(i.e. circa 0 or circa 365 days; these should be rechecked as they may be due to an error in the data or
analysis.)
The plot exhibits a strong positive linear relationship between days and year [1]
The R^2 percentage looks too high when compared to the scatterplot and the several outliers [½]
The α value looks too high; we would expect it to be lower than 100 days, looking at the scatterplot
[½]
The β value sign looks to be the wrong way around, i.e. it should be positive [½]
The number of days is bounded in the interval [0,366]. If the intention is to project into
future years, it may have been better to fit a model that respects this restriction, e.g. do a
logistic transformation on the number of days first (although the relationship may no longer
be linear) [1]
[Marks available 4, maximum 2]
(ii)
The required values are:

s_tt = ∑t^2 − n t̄^2
= 42,925 – 50 * (1,275 / 50)^2 = 10,412.50 [½]
s_td = ∑td − n t̄ d̄
= 282,724 – 50 * (1,275 / 50) * (8,502 / 50) = 65,923.00 [½]
Therefore:
β̂ = s_td/s_tt
= 65,923.00 / 10,412.50 = 6.331 [1]
α̂ = d̄ − β̂ t̄
= 8,502 / 50 – 6.331 * (1,275 / 50) = 8.596 [1]
Hence the regression line equation as given in the question
(iii)(a)
s_dd = ∑d^2 − n d̄^2
= 1,911,378 – 50 * (8,502 / 50)^2 = 465,697.92 [1]
σ̂^2 = (1/(n − 2)) (s_dd − s_td^2/s_tt)
= (1 / 48) * (465,697.92 – 65,923^2 / 10,412.50) = 1,006.878 [1]
So the standard error of β̂ is:
√(σ̂^2/s_tt)
= sqrt(1,006.878 / 10,412.50) = 0.311 [1]
(iii)(b)
The test is as follows:
H0: β = 0 vs H1: β ≠ 0 [½]
Under the null hypothesis, the corresponding test statistic is:
(β̂ − 0)/√(σ̂^2/s_tt) = 6.331 / 0.311 = 20.360 [½]
The 1% critical values from the t_48 distribution are circa ±2.678 (using t_50 for simplicity)
[½]
Or, interpolate to find critical values ±2.6832
Since 20.35998 > 2.678 there is strong evidence to reject H0 at the 1% level,
i.e. there is sufficient evidence to suggest that there is a strong linear relationship [½]
(iii)(c)
Using the same standard error and percentage point in (iii)(b), the confidence interval is
found by:
β̂ ± 2.678 × √(σ̂^2/s_tt)
= 6.331 ± 2.678 × 0.311 [1]
= (5.498 , 7.164) [1]
(iv)(a)
The test is as follows:
H0: β = 0 vs H1: β ≠ 0 [½]
Under the null hypothesis, the corresponding test statistic is:
β̂/s.e.(β̂) = 5.215 / 1.983 = 2.630 [½]
The 1% critical values from the t_48 distribution are circa ±2.678 (using t_50 for simplicity)
Since 2.62995 < 2.678 we have no evidence to reject H0 at the 1% level [½]
We conclude that there is insufficient evidence of a linear relationship [½]
(iv)(b)
Using the same standard error and percentage point in (iv)(a), the confidence interval is:
5.215 ± 2.678 × 1.983 [1]
= (−0.095 , 10.524) [1]
(v)
The two confidence intervals overlap, with one being a subset of the other [1]
This suggests that we cannot confidently conclude that the underlying slope coefficients are
different [1]
However, the large standard error leads to a wide confidence interval, meaning we lack
evidence in the conclusion to the above bullet points [1]
Alternative comments:
For deciding if the two underlying slope parameters are equal, a formal test would be
required for the difference between the two parameters [1]
where the variance of the difference should also be taken into account properly [1]
(vi)
The test conclusions in (iii)(b) and (iv)(a) appear to disagree [1]
The test statistic in (iii)(b) lies well over the critical value whereas the test statistic in (iv)(a)
lies just under the critical value [1]
So this suggests that the slope coefficients may be different for the two sets of climate change
data [1]
Recording of past data, method of collection, errors in collection / the data etc from the
alternative sources, treatment of outliers, differences in definition (e.g. location used) of
extreme weather, may lead to the apparent differences observed [1]
Alternative comments:
We reject this hypothesis of the slope parameter being significantly different from 0 in part
(iii)(b) but not in part (iv)(a) [1]

From these results alone it appears that the two parameters are therefore different, which
seems to contradict the conclusion in part (v) [1]
However, for deciding if the two underlying slope parameters are equal, a formal test would
be required for the difference between the two parameters, where the variance of the
difference should also be taken into account properly [1]
[Total 23]
Part (i) required more analysis and judgement from candidates, compared to the
usual comments required for this question type. Many candidates made generic
comments regarding the plot, with very little challenge or comment regarding the
statistics given in the question. Parts (ii)-(iv) were generally answered well, with the
only issue being numerical errors. Parts (v)-(vi) were poorly answered.
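The slope tests and intervals in parts (iii) and (iv) follow one recipe, which can be sketched in Python as below. This is an illustrative sketch only: the function name `slope_test_and_ci` is hypothetical, and the inputs are the rounded estimates, standard errors and the $t_{50}$ percentage point quoted in the solution, so the last digit can differ slightly from the figures above.

```python
def slope_test_and_ci(beta_hat, se, t_crit):
    """Return (t statistic, reject H0: beta = 0?, confidence interval) for a slope."""
    t_stat = beta_hat / se
    half_width = t_crit * se
    interval = (beta_hat - half_width, beta_hat + half_width)
    return t_stat, abs(t_stat) > t_crit, interval


# Rounded figures quoted in the solution; 2.678 is the t50 point used there.
first = slope_test_and_ci(6.331, 0.311, 2.678)   # part (iii)
second = slope_test_and_ci(5.215, 1.983, 2.678)  # part (iv)

print(first)
print(second)
```

The first data set leads to rejection of H0 and the second does not, while the second interval contains the first, which is the tension discussed in parts (v) and (vi).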
Q9
(i)
Answer C [3]
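The likelihood is
$f(x \mid \mu) = \prod_{i=1}^{n} \frac{\mu^{x_i} e^{-\mu}}{x_i!}$ and the prior for μ is $f(\mu) \propto \mu^{a-1} e^{-b\mu}$
So the posterior density is given by
$f(\mu \mid x) \propto f(x \mid \mu) \times f(\mu) \propto \prod_{i=1}^{n} \frac{\mu^{x_i} e^{-\mu}}{x_i!} \times \mu^{a-1} e^{-b\mu} \propto \mu^{a + \sum x_i - 1} e^{-(b+n)\mu}$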
(ii)
$\mu \mid x$ follows a gamma($a + \sum x_i$, $b + n$) distribution [2]
(iii)(a)
The Bayesian estimate of μ under quadratic loss is the posterior mean and so:
$\hat{\mu} = E(\mu \mid x) = \frac{a + \sum x_i}{b + n}$. [2]
(b)
This can be written as:
$\hat{\mu} = \frac{a}{b} \cdot \frac{b}{b+n} + \frac{\sum x_i}{n} \cdot \frac{n}{b+n}$ [1]
$= (1 - Z)\frac{a}{b} + Z\bar{x}$,
where $Z = \frac{n}{b+n}$ is the credibility factor [1]
(iv)
$\hat{\mu} = E(\mu \mid x) = \frac{a + \sum x_i}{b + n} = \frac{9 + 320}{3 + 6} = \frac{329}{9} = 36.56$ [2]
(v) $V(\mu \mid x) = \frac{a + \sum x_i}{(b+n)^2} = \frac{329}{9^2} = 4.06$ [2]

(vi)
The prior variance of μ has changed from 9/9 = 1 to 18/36 = 0.5, with the data (and hence the
likelihood) being unchanged [1]
This means that the posterior variance will also be reduced (but not necessarily halved) [1]
[Total 15]
Answered very well by most candidates. A common error in part (ii) was specifying an
incorrect Gamma distribution.
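The conjugate update in parts (i) to (v) can be checked numerically with a short Python sketch. The function names are hypothetical, and the inputs a = 9, b = 3, n = 6 and a claim total of 320 are read off the arithmetic in parts (iv) and (vi) rather than from the question itself.

```python
def gamma_poisson_posterior(a, b, n, total):
    """Posterior gamma (shape, rate) for a Poisson mean with a gamma(a, b) prior."""
    return a + total, b + n


def credibility_estimate(a, b, n, total):
    """Posterior mean written as (1 - Z) * prior mean + Z * sample mean."""
    z = n / (b + n)
    return z, (1 - z) * (a / b) + z * (total / n)


# Inputs inferred from the worked figures: (9 + 320) / (3 + 6).
shape, rate = gamma_poisson_posterior(a=9, b=3, n=6, total=320)
post_mean = shape / rate        # Bayesian estimate under quadratic loss
post_var = shape / rate ** 2    # posterior variance
z, cred = credibility_estimate(a=9, b=3, n=6, total=320)

print(round(post_mean, 2), round(post_var, 2), round(z, 4), round(cred, 2))
```

The credibility form and the direct posterior mean agree, as part (iii)(b) requires.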
[Paper Total 100]

EXAMINATION

Core Principles
Paper A
Time allowed: Three hours and twenty minutes

1 A random sample of size 15 is taken from a Normal distribution with mean 19 and
variance 2.
(i) Write down the sampling distribution of $S^2$. [2]
(ii) Explain why your answer in part (i) is valid for this random sample. [1]
[Total 3]
2 A statistician wants to simulate values from certain distributions and has available
only a random number generator that yields independent samples from a N(0,1)
distribution.
(i) Describe an algorithm to simulate random numbers from a t distribution with
1 degree of freedom. [3]
(ii) Describe an algorithm to simulate random numbers from a gamma distribution
with parameters 3/2 and 1/2. [3]
(iii) Describe an algorithm to simulate random numbers from an F distribution
with (1,1) degrees of freedom. [2]
[Total 8]
3 The random variable X follows a distribution with mean
$E(X) = \frac{b}{a-1}$
and variance
$\mathrm{Var}(X) = \frac{ab^2}{(a-1)^2 (a-2)}$
where a = 4 and b = 6 are the parameters of the distribution.
Y is a random variable such that
$E(Y \mid X = x) = 3x + 6$
and
$\mathrm{Var}(Y \mid X = x) = x^2 + 4$
Calculate the unconditional standard deviation of Y. [6]
CS1A S2021–2
4 The number of pizzas ordered in a restaurant each day follows a Poisson distribution
with unknown mean m. The prior distribution for m follows a gamma distribution
with mean 35 and standard deviation 5. The restaurant receives 135 pizza orders over
7 days.
(i) Write down an expression of the prior probability density function for m
leaving out any coefficient of proportionality. [3]
(ii) Identify which one of the following expressions gives the correct posterior
probability density function for m.
A $f_{\text{posterior}}(m) \propto m^{135} e^{-7m}$
B $f_{\text{posterior}}(m) \propto m^{183} e^{-7.7m}$
C $f_{\text{posterior}}(m) \propto m^{184} e^{-8.4m}$
D $f_{\text{posterior}}(m) \propto m^{183} e^{-8.4m}$

[3]
(iii) Calculate a point estimate for the number of pizzas ordered each day, using
Bayesian estimation under all-or-nothing loss. [4]
(iv) Calculate a point estimate for the number of pizzas ordered each day, using
Bayesian estimation under squared-error loss. [2]
[Total 12]
5 The probability that a claim is made on a car insurance policy in a particular year is
0.06. The policies are assumed to be independent of one another. 500 of these policies
are selected at random.
(i) Calculate the probability that no more than 40 of these policies will result in a
claim during the year, stating any approximations you make. [5]
Past data from the insurer indicate that the standard deviation of claim amounts is
£75. The insurer wishes to construct a 95% confidence interval for the mean claim
amount, with an interval width of £10.
(ii) Calculate the sample size needed to achieve this level of accuracy for a 95%
confidence interval. [4]
[Total 9]
6 Consider independent observations y1 , y2 , …, yn of a random variable Y with
probability density function
$f(y) = 2cy \exp(-cy^2)$, y > 0,
where c > 0 is an unknown parameter. Let F(y) denote the cumulative distribution
function (CDF) of Y.
(i) Identify which one of the following expressions gives the inverse of the CDF
of Y:
A $y = -\frac{1}{c}\log(1 - F(y))$
B $y = 1 - \left(-\frac{1}{c}\log(1 - F(y))\right)$
C $y = \left(-\frac{1}{c}\log(1 - F(y))\right)^{1/2}$
D $y = 1 - \left(-\frac{1}{c}\log(1 - F(y))\right)^{1/2}$
[2]
(ii) Determine how values of this random variable can be generated using the
inverse transform method. [2]
A gamma prior distribution is assumed for 𝑐 with parameters a and b.
(iii) Identify which one of the following expressions is correct for the posterior
density of parameter c:
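A $p(c \mid y) \propto c^{n+a-1} \exp\{-(ab + \sum_{i=1}^{n} y_i^2)c\}$
B $p(c \mid y) \propto c^{n+a-1} \exp\{-(b + \sum_{i=1}^{n} y_i^2)c\}$
C $p(c \mid y) \propto c^{n+a} \exp\{-(ab + \sum_{i=1}^{n} y_i^2)c\}$
D $p(c \mid y) \propto c^{n+a} \exp\{-(b + \sum_{i=1}^{n} y_i^2)c\}$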

[2]
(iv) Determine the posterior distribution of parameter c with all relevant
parameters. [2]
[Total 8]
7 Let $X_i$, i = 1, 2, …, n be independent random variables, each following an exponential
distribution with parameter b. We consider the random variable $Y = \sum_{i=1}^{n} X_i$.
(i) Justify why $M_Y(t)$, the moment generating function (MGF) of variable Y, is
given by
$M_Y(t) = \left(1 - \frac{t}{b}\right)^{-n}$ [2]
Let Z be a random variable such that the MGF of Z is $M_Z(t) = \sqrt{M_Y(t)}$.
(ii) Determine the value of b for which Z follows a chi-square distribution,
specifying the degrees of freedom of the chi-square distribution. [3]
[Total 5]
8 The number of hospital admissions for respiratory conditions in a big city was
recorded over 150 days. The level of the concentration of a certain pollutant was also
recorded (‘low’, ‘medium’, ‘high’), together with the mean temperature (in degrees
Celsius) on the day. Part of the data is shown below.
Day    Temperature    Pollutant concentration    Hospital admissions
       X1             X2                         (Y)
1      10             Low                        26
2      8              Low                        37
.      .              .                          .
50     12             Low                        32
51     7              Medium                     31
.      .              .                          .
120    3              Medium                     28
121    5              High                       35
.      .              .                          .
150    6              High                       31
A generalised linear model is to be fitted to investigate the dependence of the number
of hospital admissions on mean temperature and pollutant concentration.
(i) Write down a suitable model for the number of hospital admissions. [3]
(ii) Justify the inclusion of the terms that you have used in the linear predictor in
part (i). [2]
A statistician fitted a GLM, and obtained the following summary:
Coefficients:
Estimate Std. error z value Pr(>|z|)
(Intercept) –0.372 0.053 –6.916 4.66e-12 ***
X1 0.090 0.015 5.676 1.38e-08 ***
X2 Medium –0.100 0.080 –1.244 0.213570
X2 High 0.298 0.082 3.614 0.000301 ***
X1 : X2 Medium 0.036 0.023 1.551 0.120933
X1 : X2 High –0.076 0.028 –2.705 0.006825 **
Suppose that, on a different day, the pollutant concentration is High and the mean
temperature is 19 degrees Celsius.
(iii) Write down the linear function of the parameters the statistician should use in
constructing a predictor of the number of hospital admissions on that day. [1]
(iv) Explain why estimates for X2 Low and X1 : X2 Low are not shown in the
summary of the results above. [1]
(v) Comment on the impact of the pollutant concentration on the number of
hospital admissions, based on the summary of results above. [2]
[Total 9]
9 An actuarial analyst working in an investment bank believes that a firm’s first year
percentage return (y) depends on its revenues (x). The table below provides a
summary of x, y and the natural logarithmic revenue (z) for 110 firms.
                 Mean       Median     Sample standard deviation    Minimum    Maximum
y                0.106      –0.130     0.824                        –0.938     4.333
x (£ million)    134.487    39.971     261.881                      0.099      1455.761
z = log(x)       3.686      3.688      1.698                        –2.316     7.283
The analyst determined that the correlation between y and x is −0.0175 and that the
linear regression line of the return on the revenue is
y = a + bx.
(i) (a) Identify which one of the following options gives the correct values of
the coefficient estimates a and b:
A a = 0.113 and b = –5.506 × 10^–5
B a = –5.506 × 10^–5 and b = 0.113
C a = 748.1227 and b = –5.562
D a = –5.562 and b = 748.1227
(b) Calculate the fitted return for a firm with revenue 95.55.
[3]
The analyst estimated the regression using the logarithm revenues (z) and y as
y = 0.438 – 0.090z
(ii) (a) Calculate the fitted return for the firm with revenue 95.55 (£ million)
using the regression model with the logarithmic revenues.
(b) Comment on the result in parts (ii)(a) and (i)(b).
(c) Calculate the value of the sum Szy .

[3]
(iii) Perform a statistical test at the 10% significance level to determine if the
logarithmic revenues significantly affect the percentage returns. [5]
The analyst speculated that, other things being equal, firms with greater revenues will
be more stable and thus enjoy a larger return. They considered the null hypothesis of
no relation between z and y.
(iv) Perform a statistical test at the 10% significance level to determine whether
the analyst’s speculation is correct. Your answer should include the
hypotheses of the test. [3]
(v) Calculate Pearson’s correlation coefficient between z and y. [1]
A client is considering investing in a firm that has z = 2.
(vi) (a) Calculate the client’s predicted first year percentage return.
(b) Calculate an approximate 95% confidence interval corresponding to

the predicted percentage return in part (vi)(a).
[4]
A firm in the data has logarithmic revenue z = 1.76 and the highest first year
percentage return y = 4.333.
(vii) (a) Calculate the residual for this observation.
(b) Comment on the observed data for this firm using part (vii)(a).
[3]
[Total 22]
10 Total yearly aggregate claims in a particular company are modelled as a random
variable X, where X is assumed to follow a Normal distribution with unknown mean µ
and variance σ² = 12,000². Aggregate claims from the last 5 years are as follows:
146,000 142,000 153,000 127,000 132,000
An analyst wishes to estimate the unknown parameter µ.
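(i) Identify which one of the following gives the correct expression of the
derivative of the log-likelihood function:
A $\frac{dl(\mu)}{d\mu} = -\sum_{i=1}^{n}(x_i - \mu)$
B $\frac{dl(\mu)}{d\mu} = \sum_{i=1}^{n}(x_i - \mu)$
C $\frac{dl(\mu)}{d\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu)$
D $\frac{dl(\mu)}{d\mu} = -\frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu)$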
[2]
(ii) Calculate the maximum likelihood estimate for µ, using your answer to
part (i). [1]
(iii) Calculate a 95% confidence interval for µ. [4]
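The analyst assumes a Normal prior distribution for μ with density function
$f(\mu) \propto \exp\left(-\frac{(\mu - \mu_0)^2}{2\sigma_0^2}\right)$, $\mu_0 > 0$ and $\sigma_0 > 0$.
For such a prior, the analyst derives the posterior distribution for μ as
$p(\mu \mid x) \propto \exp\left\{-\frac{1}{2}(n\tau + \tau_0)\left(\mu - \frac{n\tau\bar{x} + \tau_0\mu_0}{n\tau + \tau_0}\right)^2\right\}$
where $\tau = \frac{1}{\sigma^2}$ and $\tau_0 = \frac{1}{\sigma_0^2}$.
Prior information about µ suggests that $\mu_0 = 150{,}000$ and $\sigma_0^2 = 10{,}204.08^2$.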
(iv) Write down the distribution corresponding to the density $p(\mu \mid x)$ above, with
all its parameter values. [2]
(v) Comment on the relationship between the prior distribution and the posterior
distribution of μ. [1]
(vi) Calculate the value of the Bayesian credibility estimate for μ under quadratic
loss. [2]
(vii) Calculate an approximate 95% Bayesian interval for μ, based on its posterior
distribution. [2]
(viii) Comment on the intervals estimated in parts (iii) and (vii). [1]
Another analyst assumes a Uniform prior distribution for μ with mean $\mu_0 = 150{,}000$
and variance $\sigma_0^2 = 10{,}204.08^2$.
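(ix) Identify which one of the following gives the correct expression of the
posterior distribution for μ:
A $p(\mu \mid x) \propto \frac{\mu - \mu_0}{\sigma_0^2} \exp\left\{-\frac{n}{2\sigma^2}(\mu - \bar{x})^2\right\}$
B $p(\mu \mid x) \propto \exp\left\{-\frac{n}{2\sigma^2}(\mu - \bar{x})^2\right\}$
C $p(\mu \mid x) \propto \exp\left\{-\frac{1}{2}\left(\frac{n}{\sigma^2} + \frac{1}{\sigma_0^2}\right)\left(\mu - \frac{\frac{n\bar{x}}{\sigma^2} + \frac{\mu_0}{\sigma_0^2}}{\frac{n}{\sigma^2} + \frac{1}{\sigma_0^2}}\right)^2\right\}$
D $p(\mu \mid x) \propto (\mu - \mu_0)^2 \exp\left\{-\frac{n}{2\sigma^2}(\mu - \bar{x})^2\right\}$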
[3]
[Total 18]
END OF PAPER
EXAMINERS’ REPORT
September 2021
CS1 – Actuarial Statistics

Core Principles
Paper A
Introduction
Sarah Hutchinson
December 2021

CS1A - Actuarial Statistics - Core Principles - September 2021 - Examiners’ report
The aim of the Actuarial Statistics subject is to provide a grounding in mathematical and
Some of the questions in the examination paper accept alternative solutions from those
Rounding errors were not penalised. However, candidates may have lost marks where
excessive rounding led to significantly different answers.
In cases where the same error was carried forward to later parts of the answer, candidates
were given appropriate credit for the later parts.
In questions where comments were required, valid comments that were different from
The paper included a number of multiple choice questions, where showing working was
not required as part of the answer.
In all multiple choice questions, the details provided in the answers below (e.g.
calculations) are for information.
In all numerical questions that were not multiple-choice, full credit was given for correct
answers that also included appropriate workings.
Standard keyboard typing was accepted for mathematical notation.
B. Comments on candidate performance in this diet of the examination.
Performance was satisfactory in general, with many candidates showing good

understanding of the topics in this subject. Well prepared candidates were able to score
highly.
A smaller number of candidates appeared to be inadequately prepared, in terms of not

Questions corresponding to parts of the syllabus that are not frequently examined
were generally poorly answered (e.g. Question 2, parts of Question 8). This highlights the
need for candidates to cover the whole syllabus when they revise for the exam and not
only rely on themes appearing in past papers.
C. Pass Mark
The Pass Mark for this exam was 58

1372 presented themselves and 578 passed.

Solutions for Subject CS1A – September 2021
Q1
(i)
$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$, [½]
$n = 15$, $\sigma^2 = 2$, so $7S^2 \sim \chi^2_{14}$. [1½]
(ii)
The underlying sample is from the Normal distribution, hence the chi-squared
distributional assumption for the sample variance holds true [1]
[Total 3]
Generally well answered.

(i) Common errors included candidates using the wrong expression.
(ii) A number of candidates did not answer this part. Those who answered, did so well.
Q2
(i)
$t_1 = \frac{Z}{\sqrt{Y}}$, where Z ~ N(0,1) and $Y \sim \chi^2_1$ are independent [1]
Simulate $Z_1$, $Z_2$ from N(0,1) independently, [½]
then $Z_2^2 \sim \chi^2_1$. [½]
So $\frac{Z_1}{\sqrt{Z_2^2}} = \frac{Z_1}{|Z_2|} \sim t_1$ [1]
(ii)
Simulate iid $Z_1, Z_2, Z_3 \sim N(0,1)$, so that $Z_1^2 + Z_2^2 + Z_3^2 \sim \chi^2_3$. [2]
This is the same as a Gamma(3/2, 1/2) distribution [1]
(iii)
Simulate iid $Z_1, Z_2 \sim N(0,1)$, so that $Z_1^2, Z_2^2 \sim \chi^2_1$ independently [1]
Then $\frac{Z_1^2}{Z_2^2} \sim F_{1,1}$ [1]
[Total 8]
This question was not well answered, with many candidates not attempting it.
In many cases candidates attempted to provide answers using incorrect (or not sufficiently
explained) references to the inverse CDF method. Notice that the inverse CDF method is
not directly applicable here.
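The three constructions can be sketched in Python using only a N(0,1) generator, here `random.Random.gauss` from the standard library. The function names, the seed and the sample size are illustrative assumptions and are not part of the marking scheme.

```python
import random


def sim_t1(rng):
    """t with 1 df: Z1 / sqrt(Z2^2) for independent standard normals."""
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return z1 / abs(z2)


def sim_gamma_three_halves(rng):
    """Gamma(3/2, 1/2), i.e. chi-square with 3 df: sum of three squared normals."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(3))


def sim_f11(rng):
    """F with (1, 1) df: ratio of two independent chi-square(1) variables."""
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return z1 ** 2 / z2 ** 2


rng = random.Random(2021)  # arbitrary seed, for reproducibility only
gamma_draws = [sim_gamma_three_halves(rng) for _ in range(20000)]
gamma_mean = sum(gamma_draws) / len(gamma_draws)  # should be close to 3
print(gamma_mean)
```

As a rough sanity check, the sample mean of the gamma draws should be near (3/2)/(1/2) = 3; the t and F draws have no finite mean, so they are better checked through quantiles.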
Q3
The mean and variance of the distribution are given by
$E[X] = \frac{b}{a-1} = \frac{6}{4-1} = 2$ [½]
$\mathrm{Var}[X] = \frac{ab^2}{(a-1)^2(a-2)} = \frac{4(6)^2}{(4-1)^2(4-2)} = 8$ [½]
$\mathrm{Var}(Y) = \mathrm{Var}[E(Y \mid X)] + E[\mathrm{Var}(Y \mid X)]$, so
$\mathrm{Var}(Y) = \mathrm{Var}[3X + 6] + E[X^2 + 4]$ [1]
$= 9\,\mathrm{Var}[X] + E[X^2] + 4$ [1]
Also $E[X^2] = \mathrm{Var}[X] + (E[X])^2 = 8 + 2^2 = 12$ [1]
So, $\mathrm{Var}(Y) = 9(8) + 12 + 4 = 88$ [1]
The standard deviation is $\sqrt{88} = 9.381$ [1]
Generally answered very well.

Common issues involved not providing the standard deviation and calculation errors.
Also, some candidates did not provide sufficient intermediate steps and this may have
impacted partial credit given.
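The law of total variance used in this solution can be reproduced with a few lines of Python. This is a sketch only; the function name `unconditional_variance` and its argument names are hypothetical.

```python
import math


def unconditional_variance(mean_x, var_x, slope, var_const):
    """Var(Y) when E(Y|X) = slope*X + constant and Var(Y|X) = X^2 + var_const."""
    var_of_cond_mean = slope ** 2 * var_x               # Var[E(Y|X)]
    mean_of_cond_var = (var_x + mean_x ** 2) + var_const  # E[X^2] + var_const
    return var_of_cond_mean + mean_of_cond_var


a, b = 4, 6
mean_x = b / (a - 1)
var_x = a * b ** 2 / ((a - 1) ** 2 * (a - 2))
var_y = unconditional_variance(mean_x, var_x, slope=3, var_const=4)
sd_y = math.sqrt(var_y)
print(var_y, round(sd_y, 3))
```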
Q4
(i)
A gamma distribution with mean 35 and standard deviation 5 has the following parameters:
$\frac{\alpha}{\lambda} = 35$ and $\frac{\alpha}{\lambda^2} = 25$
So: $\alpha = 25\lambda^2$ and $\alpha = 35\lambda$
Solving these equations gives: α = 49 and λ = 1.4 [1]
So the prior distribution of m is Gamma(49, 1.4) [1]
The prior PDF of m is therefore: $f_{\text{prior}}(m) \propto m^{48} e^{-1.4m}$ [1]
(ii)
Answer: D [3]
Likelihood function $L \propto e^{-7m} m^{135}$
The posterior PDF of m is given by:
$f_{\text{posterior}}(m) \propto f_{\text{prior}}(m) \times \text{Likelihood function}$
So $f_{\text{posterior}}(m) \propto m^{48} e^{-1.4m} \times e^{-7m} m^{135} = m^{183} e^{-8.4m}$
(iii)
Under all-or-nothing loss, the Bayesian estimate is given by the mode of this
Gamma(184, 8.4) distribution, which can be obtained by finding the value of m that
maximises the PDF [1]
Finding the maximum:
$\frac{d}{dm}\log f_{\text{posterior}}(m) = \frac{d}{dm}(183 \log m - 8.4m) = \frac{183}{m} - 8.4$ [1]
Setting this equal to zero gives $m = \frac{183}{8.4} = 21.786$ [2]
(iv)
Correctly identify the mean of the gamma posterior distribution as: [1]
$\frac{\alpha}{\lambda} = \frac{184}{8.4} = 21.905$ [1]
[Total 12]

The question was well answered.

Please note that in all multiple choice questions, the details provided in the answers (e.g.
calculations) are for information. Candidates were not required to show working.
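For reference, the prior calibration and the two Bayesian estimates in this question can be reproduced with the following Python sketch. The helper name `gamma_from_moments` is hypothetical; the figures 35, 5, 135 and 7 come from the question.

```python
def gamma_from_moments(mean, sd):
    """Gamma (shape, rate) matching a given mean and standard deviation."""
    rate = mean / sd ** 2
    shape = mean * rate
    return shape, rate


prior_shape, prior_rate = gamma_from_moments(35, 5)

# Poisson likelihood: 135 orders observed over 7 days.
post_shape = prior_shape + 135
post_rate = prior_rate + 7

post_mode = (post_shape - 1) / post_rate  # all-or-nothing loss
post_mean = post_shape / post_rate        # squared-error loss
print(round(post_mode, 3), round(post_mean, 3))
```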
Q5
(i)
X = number of policies where a claim is made, so
X ~ Bin(500, 0.06) [1]
Use the Normal approximation: X ≈ N(30, 28.2), [2]
as n is sufficiently large
P(X ≤ 40) ≈ P(X < 40.5) using continuity correction [1]
$= \Phi\left(\frac{40.5 - 30}{\sqrt{28.2}}\right) = \Phi\left(\frac{10.5}{\sqrt{28.2}}\right) = \Phi(1.97726) = 0.97599$ [1]
(ii)
The 95% confidence interval for the mean claim amount is: $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$ [1]
$1.96\frac{\sigma}{\sqrt{n}} = 5$ for a total confidence width of £10 [1]
Solving for n, using σ = £75, gives n = 864.36, i.e. a sample size of 865 [2]
[Total 9]
Generally answered well.

A common issue in part (i) was not using or applying a continuity correction correctly.
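Both parts can be verified with a short Python sketch. The standard normal CDF is built from `math.erf`; the helper name `std_normal_cdf` and the variable names are illustrative assumptions.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Part (i): Normal approximation to Bin(500, 0.06) with continuity correction.
n, p = 500, 0.06
mean, var = n * p, n * p * (1 - p)
prob = std_normal_cdf((40.5 - mean) / math.sqrt(var))

# Part (ii): smallest n giving a 95% interval of half-width 5 when sigma = 75.
sigma, half_width, z975 = 75.0, 5.0, 1.96
n_required = math.ceil((z975 * sigma / half_width) ** 2)
print(round(prob, 5), n_required)
```

Rounding the sample size up, rather than to the nearest integer, keeps the interval width at or below the £10 target.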
Q6
(i)
Answer: C [2]
First derive the cdf of Y as
$F(y) = \int_0^y 2ct \exp(-ct^2)\,dt = \left[-\exp(-ct^2)\right]_0^y$
$= 1 - \exp(-cy^2)$, y > 0.
So, $y = F^{-1}(u) = \left(-\frac{1}{c}\log(1 - u)\right)^{1/2}$
(ii)
To generate values of Y:
1. Generate a random variate u from U(0, 1) [½]
2. Return $y = \left(-\frac{1}{c}\log(1 - u)\right)^{1/2}$ [1½]
(iii)
Answer: B [2]
$p(c \mid y) \propto \pi(c) L(c; y) \propto c^{a-1} e^{-cb} (2c)^n \prod_{i} y_i e^{-c y_i^2}$
$\propto c^{n+a-1} \exp\{-(b + \sum_{i=1}^{n} y_i^2)c\}$
(iv)
This is the density of a gamma distribution [1]
with parameters $n + a$ and $b + \sum_{i=1}^{n} y_i^2$ [1]
[Total 8]
There were mixed answers here, with a number of candidates not attempting parts of the
question.
In part (iv) some candidates failed to identify the parameters of the gamma distribution
correctly.
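The inverse transform steps in part (ii) and the posterior parameters in part (iv) can be sketched in Python as follows. The function names, the seed and the example values of a, b and c are illustrative assumptions only.

```python
import math
import random


def cdf(y, c):
    """F(y) = 1 - exp(-c * y^2) for y > 0."""
    return 1.0 - math.exp(-c * y * y)


def inverse_cdf(u, c):
    """y = (-(1/c) * log(1 - u))^(1/2), the inverse of F."""
    return math.sqrt(-math.log(1.0 - u) / c)


def simulate(n, c, seed=0):
    """Inverse transform method: push U(0, 1) draws through the inverse CDF."""
    rng = random.Random(seed)
    return [inverse_cdf(rng.random(), c) for _ in range(n)]


def posterior_params(a, b, ys):
    """Gamma posterior (shape, rate) for c under a gamma(a, b) prior."""
    return len(ys) + a, b + sum(y * y for y in ys)


sample = simulate(5, c=2.0)
print(sample)
print(posterior_params(a=2.0, b=3.0, ys=sample))
```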
Q7
(i)
Since the $X_i$ are independent, we have that $Y = \sum_{i=1}^{n} X_i$ follows a gamma distribution with
parameters n and b [1]
So the MGF is given by $M_Y(t) = \left(1 - \frac{t}{b}\right)^{-n}$ [1]
(ii)
$M_Z(t) = \sqrt{M_Y(t)} = \left(1 - \frac{t}{b}\right)^{-n/2}$ [½]
The MGF of a chi-square distribution with n degrees of freedom
is $(1 - 2t)^{-n/2}$ [½]
So $M_Z(t)$ is the MGF of a chi-square distribution with n degrees of freedom [1]
and b = 0.5 [1]
[Total 5]
There were mixed answers in this question, often with unclear justification.
In part (i) reference to independence of the variables is required to fully justify the answer
and obtain full marks.
Q8
(i)
Y follows a Poisson distribution [1]
$\log(\mu) = \alpha_i + \beta_i X_1$; where i = 1, 2, 3 for low, medium and high pollutant respectively [1]
μ = E(Y) [1]
Alternative forms for the linear predictor:
The linear predictor above can also be written as:
$\log(\mu) = \beta_0 + \beta_1 X_1 + \beta_{2,i} + \beta_{3,i} X_1$; where i = 2, 3 for medium and high pollutant
Or, a model without the interaction term can be given
$\log(\mu) = \beta_0 + \beta_1 X_1 + \beta_{2,i}$; where i = 2, 3 for medium and high pollutant
(ii)
$\alpha_i$, i = 1, 2, 3 are the coefficients of the main effect for pollutant concentration [1]
We may also need the interaction term $\beta_i X_1$ if the effect of temperature on number of
hospitalisations is different for each level of pollutant concentration [1]
Under alternative forms for the linear predictor in (i):
$\beta_0$ is the intercept
$\beta_1$ is the coefficient for the main effect for temperature
$\beta_{2,i}$ are the coefficients of the main effect for pollutant concentration, where i = 2, 3 for
medium and high pollutant
$\beta_{3,i}$ are the coefficients of the effect of temperature on number of hospitalisations,
where i = 2, 3 for medium and high pollutant
(iii)
log(μ) = −0.372 + 0.09 × 19 + 0.298 − 0.076 × 19 [1]
(iv)
These are not listed as X2 Low is used as the reference category [1]
or, equivalently, their effect is included in the intercept estimate
(v)
Medium concentration has no significant effect, as compared to low concentration, [1]
while high concentration has a significant increasing effect for the number of hospital
admissions [1]
Alternative comments include:

The sign of X1 : X2 High suggests that temperature becomes less important when pollutant
concentration is High (but 0.09 − 0.076 is still positive)
[Total 9]
Again, the quality of answers given here was mixed.

In part (i) there was no mention of the distribution in many cases.
Parts (iii), (iv) were well answered for candidates that attempted them.
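The linear predictor in part (iii) can be evaluated with a small Python sketch. The dictionary keys and the function name are hypothetical labels for the fitted coefficients in the question's summary table, and the exponentiation assumes the log link used in the solution to part (i).

```python
import math

# Fitted coefficients from the summary output in the question.
coef = {
    "intercept": -0.372,
    "X1": 0.090,
    "X2High": 0.298,
    "X1:X2High": -0.076,
}


def linear_predictor_high(temp):
    """log(mu) for a day with High pollutant concentration and mean temperature temp."""
    return (coef["intercept"] + coef["X1"] * temp
            + coef["X2High"] + coef["X1:X2High"] * temp)


eta = linear_predictor_high(19)
mu = math.exp(eta)  # predicted mean under the log link
print(round(eta, 3), round(mu, 3))
```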
Q9
(i)(a)
Answer: A [2]
n = 110
$S_{xx} = \mathrm{Var}(x)(n-1) = 261.881^2 \times 109 = 7475401$
$S_{yy} = \mathrm{Var}(y)(n-1) = 0.824^2 \times 109 = 74.008$
$\hat{b} = r\sqrt{\frac{S_{yy}}{S_{xx}}} = -0.0175 \times \frac{0.824}{261.881} = -5.506 \times 10^{-5}$
$\hat{a} = \bar{y} - \hat{b}\bar{x} = 0.106 + 5.506 \times 10^{-5} \times 134.487 = 0.113$
(b)
The fitted return for a firm with x = 95.55 is
$y^* = 0.113 - 5.506 \times 10^{-5} \times 95.55 = 0.108$ [1]

(ii)(a)
Using the logarithmic regression,
$y^* = 0.438 - 0.090 \times \log(95.55) = 0.028$ [1]
(b)
The return estimated with the log revenue is different from the return in part (i)(b) as
expected [1]
(c)
$S_{zz} = \mathrm{Var}(z)(n-1) = 1.698^2 \times 109 = 314.269$ [½]
$S_{zy} = \hat{\beta} S_{zz} = -0.09 \times 314.269 = -28.284$ [½]
(iii)
H0: β = 0 vs H1: β ≠ 0 [½]
$\hat{\sigma}^2 = \frac{1}{108}\left(74.008 - \frac{28.284^2}{314.269}\right) = 0.662$ [1]
$\text{s.e.}(\hat{\beta}) = (\hat{\sigma}^2 / S_{zz})^{1/2} = (0.662 / 314.269)^{1/2} = 0.046$ [½]
Test statistic = −0.09 / 0.046 = −1.956 [1]
The test statistic follows a t-distribution with 108 df under the null hypothesis [½]
This is a two-sided test at the 10% level, so the critical value is the lower 5% point, −1.658 for 120 df
(−1.661 using linear interpolation and −1.659 using R) [½]
We have evidence at the 10% significance level to reject the null hypothesis that
β = 0 and we conclude that the logarithmic revenues affect returns [1]
(iv)
H0: β = 0 vs H1: β > 0 [1]
From (iii), the test statistic is −1.956
The test statistic follows a t-distribution with 108 df under the null hypothesis
This is a one-sided test with the 10% critical value being approximately 1.289 for
120 df (1.290 using linear interpolation and 1.289 using R) [1]
We do not have evidence to reject H0 at the 10% significance level. Firms with greater
revenues do not necessarily enjoy a larger return [1]
(v)
$r = \frac{S_{zy}}{(S_{zz} S_{yy})^{1/2}} = \frac{-28.284}{(314.269 \times 74.008)^{1/2}} = -0.185$ [1]
(vi)(a)
For z = 2, the estimated percentage return is
$y^* = 0.438 - 0.09 \times 2 = 0.258$ [1]
(b)
$\text{se}(y^*) = \left\{\left(1 + \frac{1}{110} + \frac{(2 - 3.686)^2}{314.269}\right) 0.662\right\}^{1/2} = 0.821$ [1½]
Confidence interval: 0.258 ± 1.98 × 0.821, i.e. (−1.367 , 1.883), [1½]
if approximating the percentage points of $t_{108}$ by those of $t_{120}$.
(vii)(a)
The expected return is $y^* = 0.438 - 0.09 \times 1.76 = 0.28$. [1]
The residual is $\hat{e} = 4.333 - 0.28 = 4.053$. [1]
(b)
The residual is way above 0 and from the table the percentage return is 3 times
the median [1]
Alternative:
This observation seems to be an outlier. Or, the residual appears large given the size
of the sample SD of the y data
[Total 22]
Generally well answered.

Some common issues included:
(ii)(b) Attention to detail was required here, often candidates made inconsistent
comments.
(ii)(c), (iii): Calculation errors.
(iv) Using a two-tailed test was a common error here.
(v) A number of candidates showed lack of understanding where incorrect values for Szy,
Szz led to an obviously incorrect Pearson’s correlation coefficient, i.e. r < -1 or r > 1.
In parts (iii), (iv), (vi) full credit was given for using alternative critical point values,
including values resulting from linear interpolation, extracted from R, or the appropriate
critical points of the standard normal distribution with justification – i.e. high df.
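The summary-statistic calculations in parts (ii)(c) to (vi) can be chained together in Python as sketched below. Variable and function names are illustrative assumptions. Because the sketch carries full precision rather than the rounded intermediate values (0.662, 0.046) used in the solution, the test statistic comes out at about −1.96 instead of −1.956.

```python
import math

n = 110
sd_y, sd_z, mean_z = 0.824, 1.698, 3.686  # from the summary table
b_hat = -0.090                            # fitted slope on z = log(x)

s_yy = sd_y ** 2 * (n - 1)
s_zz = sd_z ** 2 * (n - 1)
s_zy = b_hat * s_zz                       # since b_hat = S_zy / S_zz

sigma2 = (s_yy - s_zy ** 2 / s_zz) / (n - 2)  # residual variance estimate
se_b = math.sqrt(sigma2 / s_zz)
t_stat = b_hat / se_b
r = s_zy / math.sqrt(s_zz * s_yy)         # Pearson correlation of z and y


def prediction_se(z0):
    """Standard error for predicting an individual response at z = z0."""
    return math.sqrt((1 + 1 / n + (z0 - mean_z) ** 2 / s_zz) * sigma2)


print(round(s_zy, 3), round(sigma2, 3), round(t_stat, 3), round(r, 3))
print(round(prediction_se(2.0), 3))
```

A correlation computed this way is guaranteed to lie between −1 and 1, which gives a quick check against the error noted for part (v).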
Q10
(i)
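Answer: C [2]
$l(\mu) = \log\left(\prod_{i=1}^{n} f(x_i \mid \mu)\right) = \sum_{i=1}^{n} \log f(x_i \mid \mu)$
$= -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$
Therefore
$\frac{dl(\mu)}{d\mu} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu)$
(ii)
From part (i):
$\frac{dl(\mu)}{d\mu} = 0 \Rightarrow \frac{1}{\sigma^2}\sum_{i=1}^{n}(x_i - \mu) = 0$
Therefore,
$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}$
$\hat{\mu} = 140{,}000$ [1]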
(iii)
Given that $\hat{\mu}$ is the sample mean,
$\hat{\mu} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ [1½]
Confidence interval:
$\hat{\mu} \pm z_{0.025}\sqrt{\sigma^2 / n}$ [1]
$140{,}000 - 1.96\frac{12{,}000}{\sqrt{5}} \le \mu \le 140{,}000 + 1.96\frac{12{,}000}{\sqrt{5}}$ [1]
95% CI: (129,481.54 , 150,518.46) [½]
(iv)
The posterior distribution is a normal distribution with mean: [1]
$\frac{n\tau\bar{x} + \tau_0\mu_0}{n\tau + \tau_0} = 142166.7$ [½]
and variance:
$1/(n\tau + \tau_0) = 4749.77^2 = 22{,}560{,}315$ [½]
Hence,
$\mu \mid x \sim N(142166.7,\ 4749.77^2)$
(v)
The prior and the posterior distribution are of the same type [½]
The normal distribution is the conjugate prior for the mean of a normal distribution [½]
(vi)
The Bayesian credibility estimate for μ under quadratic loss is the expectation of the posterior
distribution: [1]
$\tilde{\mu} = 142166.67$ [1]
(vii)
$\mu \mid x \sim N(142166.67,\ 4749.77^2)$, therefore the Bayesian interval is
$(142{,}166.67 - 1.96 \times 4749.77,\ 142{,}166.67 + 1.96 \times 4749.77)$ [1½]
i.e. (132857.1, 151476.2). [½]

(viii)
The Bayesian interval is different (narrower) than the CI of the MLE [½]
The prior belief has impacted on the estimation of the posterior [½]
(ix)
Answer: B [3]
Given that the prior density of the uniform distribution f(μ) does not
depend on μ, we have:
$p(\mu \mid x) \propto L(\mu) f(\mu)$
$\propto \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right\}$
$\propto \exp\left\{-\frac{1}{2\sigma^2}\left(\sum_{i=1}^{n}(x_i - \bar{x})^2 + 2\sum_{i=1}^{n}(x_i - \bar{x})(\bar{x} - \mu) + n(\bar{x} - \mu)^2\right)\right\}$
$\propto \exp\left\{-\frac{n}{2\sigma^2}(\bar{x} - \mu)^2\right\}$
since
$\sum_{i=1}^{n}(x_i - \bar{x})(\bar{x} - \mu) = (\bar{x} - \mu)\sum_{i=1}^{n}(x_i - \bar{x}) = 0$
and
$\sum_{i=1}^{n}(x_i - \bar{x})^2$ does not depend on μ.
[Total 18]
Very well answered.

Comments in part (viii) varied, with many candidates failing to mention the impact of the
prior distribution.
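The frequentist and Bayesian intervals in parts (iii) and (vii) can be compared side by side with the Python sketch below. Variable names are illustrative assumptions; the data and prior values are those given in the question, and small differences in the last digit against the printed solution are due to rounding.

```python
import math

claims = [146_000, 142_000, 153_000, 127_000, 132_000]
sigma = 12_000.0                        # known standard deviation of X
mu0, sigma0 = 150_000.0, 10_204.08      # prior mean and standard deviation

n = len(claims)
x_bar = sum(claims) / n                 # MLE of mu

# Part (iii): 95% confidence interval based on the sample mean.
half = 1.96 * sigma / math.sqrt(n)
ci = (x_bar - half, x_bar + half)

# Parts (iv)-(vii): Normal posterior via precisions tau = 1/sigma^2, tau0 = 1/sigma0^2.
tau, tau0 = 1 / sigma ** 2, 1 / sigma0 ** 2
post_mean = (n * tau * x_bar + tau0 * mu0) / (n * tau + tau0)
post_sd = math.sqrt(1 / (n * tau + tau0))
bayes = (post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd)

print(ci)
print(post_mean, post_sd, bayes)
```

The Bayesian interval is narrower and shifted towards the prior mean of 150,000, which is the point examiners were looking for in part (viii).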
[Paper Total 100]

EXAMINATION
21 April 2022 (am)

Core Principles
Paper A
Time allowed: Three hours and twenty minutes

1 The number of emails, X, to be replied to in a day by an employee of the customer
service centre of an insurance company is modelled as a Poisson random variable
with mean 25. The time (in minutes), Y, that the employee takes to reply to x emails is
modelled as a random variable with conditional mean and variance given by:
$E(Y \mid X = x) = 3x + 11$, $\mathrm{Var}(Y \mid X = x) = x + 9$.
Calculate the unconditional variance of the time, Y, that the employee takes to reply to
emails in a day. [3]
2 The number of claims arriving at an insurance company is assumed to follow a
Poisson process {N(t), t ≥ 0} with rate m = 2 per year.
(i) State the distribution of the random variable N(1). [1]
(ii) Calculate the probability of more than two claims arriving in year 2 given that
five claims arrived in year 1. [2]
(iii) Calculate the probability of more than two claims arriving in year 2 given that
no claims arrived in year 1. [1]
(iv) Compare the results in parts (ii) and (iii). [1]
(v) Identify the distribution of the time of the nth claim, justifying your answer.
[2]
(vi) Calculate a random value from the exponential distribution with
parameter m = 2 using a realised value of 0.201 from a U(0,1) distribution
and the inverse transform method. [2]
[Total 9]
CS1A A2022–2
3 Let X and Y be two continuous random variables jointly distributed with probability
density function:
$f_{XY}(x, y) = 6e^{-(2x + 3y)}$ for x, y ≥ 0, and 0 otherwise.
(i) Identify which one of the following options gives the correct expression for
the marginal density function fX (x):
A $f_X(x) = 2e^{2x}$ for x ≥ 0, and 0 otherwise.
B $f_X(x) = e^{-2x}$ for x ≥ 0, and 0 otherwise.
C $f_X(x) = 2e^{-x}$ for x ≥ 0, and 0 otherwise.
D $f_X(x) = 2e^{-2x}$ for x ≥ 0, and 0 otherwise.
[1]
(ii) Identify which one of the following options gives the correct expression for
the marginal density function $f_Y(y)$:
A $f_Y(y) = 3e^{3y}$ for y ≥ 0, and 0 otherwise.
B $f_Y(y) = e^{3y}$ for y ≥ 0, and 0 otherwise.
C $f_Y(y) = 3e^{-3y}$ for y ≥ 0, and 0 otherwise.
D $f_Y(y) = e^{-3y}$ for y ≥ 0, and 0 otherwise.
[1]
(iii) Comment on whether X and Y are independent, by using your results in parts
(i) and (ii). [1]
(iv) Calculate the conditional expectation E(Y | X > 2). [2]
(v) Identify which one of the following options gives the correct expression for
P(X > Y):
A 1/5
B 3/5
C 1/3
D 1/2
[2]
[Total 7]
4 (i) Describe what is meant by each of the following:
(a) A random sample
(b) A statistic.
[3]
A new political party is interested in the level of support it would have among the
voters in a particular country. The random variable X is defined as:
X = 1 if the voter would support the party, and X = 0 otherwise.
A random sample of 50 voters are presented with a simple summary of the party’s
policies and asked if they would support this new party. The random sample is
represented by X1 , X2 , …, X50 .
(ii) (a) Identify a suitable population together with a possible parameter of
interest.
(b) Determine, using your answer to part (ii)(a), the sampling distribution
of the statistic:
$Y = \sum_{i=1}^{50} X_i$
[4]
[Total 7]
5 Let $X_1, X_2, …, X_n$ be independent identically distributed random variables following a
Poisson(m) distribution. Suppose that, rather than observing the random variables
precisely, only the events $X_i = 0$ or $X_i > 0$ are observed for i = 1, 2, …, n.
Let $Y_i$ be a random variable with:
$Y_i = 0$ if $X_i = 0$, and $Y_i = 1$ if $X_i > 0$,
for i = 1, 2, …, n.
(i) Explain why the distribution of $Y_i$ is a Bernoulli(p) distribution with
parameter $p = 1 - e^{-m}$. [1]
(ii) Identify which one of the following expressions gives the correct likelihood
function based on observations $y_1, …, y_n$ in terms of $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ and the
unknown parameter m.
A $L(m) = (1 + e^{-m})^{n\bar{y}} (e^{m})^{n - n\bar{y}}$
B $L(m) = (1 - e^{m})^{n\bar{y}} (e^{-m})^{n - n\bar{y}}$
C $L(m) = (1 - e^{-m})^{n\bar{y}} (e^{-m})^{n - n\bar{y}}$
D $L(m) = (1 - e^{-m})^{n\bar{y}} (e^{-m})^{n + n\bar{y}}$
[2]
(iii) Derive an expression for the Maximum Likelihood Estimate (MLE) $\hat{m}$ of m in
terms of $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$. [4]
(iv) State the condition that $\hat{m}$ and L(m) must satisfy for $\hat{m}$ to maximise the
likelihood function. [1]
[Total 8]
6 The sizes of claims on a certain type of motor insurance policy are modelled as a
random variable X with Probability Density Function (PDF)
$f(x; \alpha, \beta) = \frac{\alpha\beta^{\alpha}}{x^{\alpha+1}}$, x ≥ β, α, β > 0.
(i) Identify which one of the following expressions gives the correct log
likelihood function in terms of a random sample (x1 , x2 , …, xn ) and the
unknown parameters α and β:
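A $l(\alpha, \beta) = n\log\alpha + n\alpha\log\beta + (\alpha + 1)\sum_{i=1}^{n}\log x_i$
B $l(\alpha, \beta) = \log\alpha + n\alpha\log\beta - (\alpha + 1)\sum_{i=1}^{n}\log x_i$
C $l(\alpha, \beta) = n\log\alpha + n\log\beta - (\alpha + 1)\sum_{i=1}^{n}\log x_i$
D $l(\alpha, \beta) = n\log\alpha + n\alpha\log\beta - (\alpha + 1)\sum_{i=1}^{n}\log x_i$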
[2]
(ii) Derive the MLE $\hat{\alpha}$ of parameter α as a function of parameter β, for a random
sample $(x_1, x_2, …, x_n)$. [2]
(iii) Comment on the behaviour of the PDF of X when β increases. [1]
(iv) Determine the MLE $\hat{\beta}$ of parameter β based on your comment in part (iii). [2]
The values (in $) of a sample of 10 claims are given in the table below:
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
10,000 12,000 8,000 16,000 20,000 19,000 17,000 22,000 18,000 5,000
(v) Calculate the mean and standard deviation of the natural logarithm of the
sample. [2]
(vi) Calculate the MLEs $\hat{\alpha}$ and $\hat{\beta}$ based on the sample. [2]

[Total 11]
7 The probability density function of a gamma distribution is parameterised as follows:
$f(x) = \frac{(\mu/\sigma^2)^{\mu^2/\sigma^2}}{\Gamma(\mu^2/\sigma^2)}\, x^{\mu^2/\sigma^2 - 1} e^{-x\mu/\sigma^2}$, x ≥ 0, μ, σ > 0.
This density can be expressed in the form of the exponential family, as follows:
$\theta = -\frac{1}{\mu}$, $b(\theta) = -\log(-\theta)$, $\phi = \frac{\mu^2}{\sigma^2}$, $a(\phi) = \frac{1}{\phi}$,
$c(x, \phi) = (\phi - 1)\log x - \log\Gamma(\phi) + \phi\log\phi$,
where the exponential family notation is the same as that in the Actuarial Formulae
and Tables book.
(i) Justify that $\mu$ and $\sigma^2$ are the mean and the variance of the distribution,
respectively, using the properties of the exponential family. [3]
An actuary is modelling the relationship between claim size and the time spent
processing the claim, called operational time (opt). A statistician suggests using a
model with the claim size being the response variable following the gamma
distribution given above.
(ii) Comment on why a gamma distribution may be more suitable than the Normal
distribution for the claim sizes. [2]
The actuary decided to fit a generalised linear model (GLM) with a gamma family
and obtained the following estimates:
Parameters:
            Estimate   Standard error
Intercept    7.51621          0.03310
opt          0.06084          0.00296
(iii) Explain, using the model output shown above, whether the variable ‘opt’ is
significant or not. [2]
Another statistician has suggested that an alternative model needs to take into account
a legal representation variable, which shows whether or not an insured person has
legal representation.
(iv) Explain the difference between the variables ‘opt’ and ‘legal representation’ in
a statistical sense in the context of a GLM. [2]
The actuary now has to choose between the following two models for the claim size:
Model 1: Only opt is used as a covariate.

Model 2: Both opt and legal representation are used as covariates.
An analysis of variance (ANOVA) was carried out to assess the significance of the
two covariates: opt and legal representation (denoted by lr). The results obtained are
given below, where claim size is denoted by cs:
Model 1: cs = 7.52 + 0.06 × opt
Model 2: cs = 3.6 + 0.04 × opt + 2.32 × lr

          Resid. df   Resid. dev   Df   Deviance   Pr(>Chi)
Model 1          45       39.987
Model 2          44       15.869    1     24.118   0.000286
(v) Determine which model provides the better fit to the data. [2]
[Total 11]
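The two pieces of model output in this question can be checked with simple arithmetic. The sketch below is illustrative only: it forms the ratio of the estimate to its standard error for opt and recomputes the drop in residual deviance between the two models. The threshold of 2 is a common rule of thumb assumed here, not something stated in the question, and the variable names are hypothetical.

```python
# Figures copied from the model output quoted in the question.
opt_estimate = 0.06084
opt_std_error = 0.00296

# Ratio of the estimate to its standard error (a Wald-type statistic).
ratio_opt = opt_estimate / opt_std_error

# Assumed rule of thumb: an absolute ratio above roughly 2 suggests the
# coefficient differs from zero at about the 5% level.
opt_looks_significant = abs(ratio_opt) > 2

# Residual deviances from the ANOVA table.
resid_dev_model1 = 39.987
resid_dev_model2 = 15.869

# Reduction in deviance from adding lr, for the loss of 1 degree of freedom.
deviance_drop = resid_dev_model1 - resid_dev_model2

print(ratio_opt, opt_looks_significant, deviance_drop)
```

The recomputed deviance drop matches the Deviance column of the table, and the table's Pr(>Chi) entry is the p-value attached to that drop.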
8 The time, T, until the next lorry arrives at a customs checkpoint at the border of a
country is modelled with an exponential distribution, that is, $T \sim \text{Exp}(\lambda)$, where $\lambda$ is
an unknown parameter. Time is measured in minutes.
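(i) Identify which one of the following expressions gives the correct likelihood
function $L(\lambda)$ for the parameter $\lambda$, based on a sample of observed times until
the next lorry arrives, $t_i$, $i = 1, \ldots, n$:
A $L(\lambda \mid T) = \lambda^{n} \exp\left(-\lambda \sum t_i\right)$
B $L(\lambda \mid T) = \lambda^{n - 1} \exp\left(-\lambda \sum t_i\right)$
C $L(\lambda \mid T) = \lambda^{n + 1} \exp\left(-\lambda \sum t_i\right)$
D $L(\lambda \mid T) = \lambda \exp\left(-\lambda \sum t_i\right)$
[1]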
An analyst uses Bayesian inference to obtain an estimate for λ. They choose a gamma
distribution with parameters α and β as the prior distribution for λ.
(ii) Verify that the posterior distribution of the parameter $\lambda$ is a gamma
distribution with parameters $\alpha + n$ and $\beta + \sum t_i$. [4]
Assume that a total of 20 lorries have arrived at the checkpoint.
(iii) Determine the Bayesian estimator for λ, in terms of the parameters α and β,
under quadratic loss based on this sample. [2]
(iv) Explain how to determine the Bayesian estimator for λ under all-or-nothing
loss based on this sample. [3]
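(v) Identify which one of the following options gives the correct Bayesian
estimator for $\lambda$ under all-or-nothing loss based on the sample given:
A $\hat{\lambda} = \frac{\alpha}{\beta + 60}$
B $\hat{\lambda} = \frac{\alpha + 19}{\beta + 60}$
C $\hat{\lambda} = \frac{\alpha + 18}{\beta + 60}$
D $\hat{\lambda} = \frac{\alpha + 20}{\beta + 60}$
[2]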
(vi) Comment on the difference between the two estimators in parts (iii) and (v).
[1]
[Total 13]
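The gamma posterior stated in part (ii) makes both Bayesian estimators easy to compute. The sketch below is an illustration under stated assumptions: it uses the standard facts that the posterior mean is the estimator under quadratic loss and the posterior mode is the estimator under all-or-nothing loss, it takes the total observed time to be 60 minutes (inferred from the denominators in part (v), not stated explicitly in the text above), and the prior values `alpha = 2`, `beta = 1` are hypothetical.

```python
from fractions import Fraction


def posterior_params(alpha, beta, n, total_time):
    """Gamma(alpha, beta) prior with n exponential observations -> gamma posterior."""
    return alpha + n, beta + total_time


def posterior_mean(shape, rate):
    """Mean of a gamma(shape, rate) distribution: estimator under quadratic loss."""
    return Fraction(shape) / Fraction(rate)


def posterior_mode(shape, rate):
    """Mode of a gamma(shape, rate) distribution: estimator under all-or-nothing loss."""
    if shape <= 1:
        raise ValueError("the mode formula (shape - 1) / rate needs shape > 1")
    return Fraction(shape - 1) / Fraction(rate)


# Hypothetical prior parameters; n = 20 lorries, assumed total time 60 minutes.
alpha, beta = 2, 1
shape, rate = posterior_params(alpha, beta, n=20, total_time=60)

print(posterior_mean(shape, rate))  # (alpha + 20) / (beta + 60)
print(posterior_mode(shape, rate))  # (alpha + 19) / (beta + 60)
```

Exact fractions are used so the results can be compared directly with the algebraic forms in the question.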
9 Consider the linear regression model in which the response variable $Y_i$ is linked to the
explanatory variable $X_i$ by the following equation:
$$Y_i = \alpha + \beta X_i + e_i, \quad i = 1, \ldots, n,$$
where $e_i$ are the error terms and data $(x_i, y_i)$, $i = 1, \ldots, n$, are available.
(i) Comment on whether or not the linear regression model as presented above
can be used to make inferences on parameters α and β. [3]
The coefficient of determination for this model is given by $R^2 = \frac{S_{xy}^2}{S_{xx} S_{yy}}$.
(ii) Verify that $R^2$ gives the proportion of the total variability of Y ‘explained’ by
the linear regression model. [3]
Consider the multiple linear regression model where the response variable $Y_i$ is
related to explanatory variables $X_1, X_2, \ldots, X_k$ by:
$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_k X_{ki} + e_i, \quad i = 1, \ldots, n,$$
where $e_i$ are the error terms and relevant data are available.
(iii) Suggest three ways for assessing the fit of the multiple linear regression model
to a set of data. [3]
A forward selection process is used for selecting explanatory variables in the multiple
linear regression model.
(iv) Explain whether the coefficient of determination, $R^2$, can be used as a criterion
for selecting variables when applying this process. [3]
A multiple linear regression model with four explanatory variables $(X_1, X_2, X_3, X_4)$ is
fitted to a set of data, and a forward selection process is used for selecting the optimal
set of explanatory variables.
Some output of this process is shown in the following table:
Model          $R^2$     Adjusted $R^2$
X1             0.7322    0.7167
X1 X4          0.8018    0.7712
X1 X4 X3       0.8253    0.7805
X1 X4 X3 X2    0.8259    0.7684
(v) Determine the optimal set of explanatory variables using this output. [2]
[Total 14]
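The forward selection output in part (v) can be read off programmatically. The sketch below is illustrative and assumes that the selection criterion is the largest adjusted $R^2$, which is one common choice and not stated in the question. The list name `fits` is hypothetical.

```python
# (variables in the model, R^2, adjusted R^2), copied from the table in the question.
fits = [
    (("X1",), 0.7322, 0.7167),
    (("X1", "X4"), 0.8018, 0.7712),
    (("X1", "X4", "X3"), 0.8253, 0.7805),
    (("X1", "X4", "X3", "X2"), 0.8259, 0.7684),
]

# Plain R^2 never decreases as variables are added, so on its own it would
# always favour the largest model.
r2_values = [r2 for _, r2, _ in fits]
r2_non_decreasing = all(a <= b for a, b in zip(r2_values, r2_values[1:]))

# Assumed criterion: pick the model with the largest adjusted R^2.
best_vars, best_r2, best_adj_r2 = max(fits, key=lambda row: row[2])

print(r2_non_decreasing, best_vars, best_adj_r2)
```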
10 A random sample of the records of a certain hospital yielded the following
information on the length of hospital stay in days ($l_i$) and the annual family income
($a_i$, rounded to the nearest £500) of 15 discharged patients. An analyst believes that
the relationship between these two variables is linear. The graph below depicts the
scatter plot of the annual family income against the length of stay and the simple
linear regression line fitted by the analyst.
Summary statistics for these data are given below:
$\sum a_i = 82,500$, $\sum a_i^2 = 523,750,000$, $\sum a_i l_i = 510,500$, $\sum l_i = 107$, $\sum l_i^2 = 871$.
(i) Comment on the relationship between the two variables. [2]
(ii) Determine the equation of the simple regression line. [3]
(iii) Perform an ANOVA test to determine whether the slope of the regression line
is significantly different from zero. [4]
(iv) Calculate Pearson’s correlation coefficient between the annual family income
and the length of hospital stay. [1]
(v) Perform a statistical test to determine whether Pearson’s correlation
coefficient for the corresponding population is significantly different
from −0.8. [5]
(vi) Identify which one of the following options gives an approximate 95%
confidence interval for Pearson’s correlation coefficient for the corresponding
population:
A (–2.027, –0.896)
B (–0.966, –0.714)
C (–0.989, –0.683)
D (–0.908, –0.794)
[2]
[Total 17]
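The correlation calculations in parts (iv) and (vi) follow directly from the summary statistics. The sketch below is illustrative, not an official solution: it computes the corrected sums of squares and products, Pearson's correlation coefficient, and an approximate 95% interval using the Fisher transformation, assuming the usual approximation that $\tanh^{-1}(r)$ is Normal with variance $1/(n - 3)$. Variable names are hypothetical.

```python
import math

# Summary statistics copied from the question (a = income, l = length of stay).
n = 15
sum_a, sum_a2 = 82_500, 523_750_000
sum_l, sum_l2 = 107, 871
sum_al = 510_500

# Corrected sums of squares and products.
s_aa = sum_a2 - sum_a ** 2 / n
s_ll = sum_l2 - sum_l ** 2 / n
s_al = sum_al - sum_a * sum_l / n

# Pearson's sample correlation coefficient.
r = s_al / math.sqrt(s_aa * s_ll)

# Fisher transformation: atanh(r) is approximately Normal with
# standard deviation 1 / sqrt(n - 3) (assumed approximation).
z = math.atanh(r)
half_width = 1.96 / math.sqrt(n - 3)

# Transform the interval endpoints back to the correlation scale.
ci = (math.tanh(z - half_width), math.tanh(z + half_width))

print(r, ci)
```

Because the interval is built on the transformed scale and mapped back with `tanh`, both endpoints necessarily lie between -1 and 1.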
END OF PAPER