CS1A Workbook For Sept 2020 Exams Sankhyiki

www.sankhyiki.in
+91-9711150002

INDEX
1. Random Variable
2. Probability Distribution
3. Generating Functions
4. Joint Distributions
5. Revision Assignment - 1
6. Central Limit Theorem
7. Point Estimation
8. Confidence Interval and Hypothesis Testing
9. Correlation and Regression
10. Revision Assignment - 2
11. Sampling
12. Random Number Simulation
13. Bayesian Statistics and Credibility Theory
14. GLM
15. EBCT
16. Tables
Satya Niketan | North Campus | Mumbai | Kolkata | Jaipur | Siliguri


ASSIGNMENT – 1
RANDOM VARIABLE
Question 1. For each of the following, determine whether the given values can serve
as the probability distribution of a random variable with the given range:
a) $f(x) = \frac{x-2}{5}$ for $x = 1,2,3,4,5$;
b) $f(x) = \frac{x^2}{30}$ for $x = 1,2,3,4$;
!
c) 𝑓 𝑥 = ! for 𝑥 = 1,2,3,4,5;
!
d) 𝑓 𝑥 = ! for 𝑥 = 1,2,3,4;
Question 2. Verify that $f(x) = \frac{2x}{k(k+1)}$ for $x = 1,2,3,\dots,k$ can serve as the probability
distribution of a random variable with the given range.
Question 3. For each of the following, determine 𝑐 so that the function can serve as the
probability distribution of a random variable with the given range:
a) 𝑓 𝑥 = 𝑐𝑥 for 𝑥 = 1,2,3,4,5;
b) $f(x) = c \cdot \frac{5}{x}$ for $x = 1,2,3,4,5$;
c) $f(x) = c \left(\frac{1}{4}\right)^x$ for $x = 1,2,3,\dots$
d) $f(x) = c x^2$ for $x = 1,2,3,\dots,k$
Question 4. For each of the following, determine whether the given values can serve
as the values of the distribution function of a random variable with the range x = 1, 2, 3,
and 4:
a) 𝐹 1 = 0.3, 𝐹 2 = 0.5, 𝐹 3 = 0.8 and 𝐹 4 = 1.2;
b) 𝐹 1 = 0.5, 𝐹 2 = 0.4, 𝐹 3 = 0.7 and 𝐹 4 = 1.0;
c) 𝐹 1 = 0.25, 𝐹 2 = 0.61, 𝐹 3 = 0.83 and 𝐹 4 = 1.0;
Question 5. If X has the distribution function
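$$F(x) = \begin{cases}
0 & \text{for } x < 1 \\
\frac{1}{3} & \text{for } 1 \le x < 4 \\
\frac{1}{2} & \text{for } 4 \le x < 6 \\
\frac{5}{6} & \text{for } 6 \le x < 10 \\
1 & \text{for } x \ge 10
\end{cases}$$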


Find PDF and hence find
(a) 𝑃(2 < 𝑋 ≤ 6) (b) 𝑃(𝑋 = 4) (c) 𝑃(𝑋 ≥ 10)
(d) 𝑃(𝑋 < 4) (e) 𝑃(𝑋 > 4) (f) 𝑃(𝑋 ≥ 4)
Question 6. If X has the distribution function

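$$F(x) = \begin{cases}
0 & \text{for } x < -1 \\
\frac{1}{4} & \text{for } -1 \le x < 1 \\
\frac{1}{2} & \text{for } 1 \le x < 3 \\
\frac{3}{4} & \text{for } 3 \le x < 5 \\
1 & \text{for } x \ge 5
\end{cases}$$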
Find PDF and find
(a) 𝑃(𝑋 ≤ 3) (b) 𝑃(𝑋 = 3) (c) 𝑃(𝑋 < 3)
(d) 𝑃(𝑋 ≥ 1) (e) 𝑃(−0.4 < 𝑋 < 4) (f) 𝑃(𝑋 = 5)
(g) 𝑃(3 < 𝑋 < 5) (h) 𝑃(3 ≤ 𝑋 < 5) (i) 𝑃(3 ≤ 𝑋 ≤ 5)
Question 7. Given that the discrete random variable X has the distribution function
!
; 𝑥 = 1,2,3
𝑓 𝑥 = ! Find 𝐹(𝑥).
0 elsewhere
Question 8. A random variable X has the following probability function:

x      0   1    2    3    4    5     6
f(x)   k   3k   5k   7k   9k   11k   13k
(i) Find 𝑘, (ii) Find 𝑃 𝑋 ≥ 5 , 𝑃 3 < 𝑋 ≤ 6 , 𝑃(𝑋 < 4).

Question 9. A random variable X has the following probability distribution:
x      0   1   2    3    4    5       6        7
f(x)   0   k   2k   2k   3k   $k^2$   $2k^2$   $7k^2 + k$
i) Find k ii) Evaluate $P(X < 6)$, $P(X \ge 6)$ and $P(0 < X < 5)$

iii) Determine the distribution function of X.
Question 10. If $P(x) = \begin{cases} \frac{x}{15} & x = 1,2,3,4,5 \\ 0 & \text{otherwise} \end{cases}$
Find (i) $P(X = 1 \text{ or } 2)$ (ii) $P\left(\frac{1}{2} < X < \frac{5}{2} \mid X > 1\right)$.


Question 11. Find the distribution function of the random variable that has the probability
distribution
$f(x) = \frac{x}{15}$; $x = 1, 2, 3, 4, 5$
Question 12. Let X be a continuous random variable with p.d.f.:
$$f(x) = \begin{cases}
ax & 0 < x < 1 \\
a & 1 \le x \le 2 \\
-ax + 3a & 2 \le x \le 3 \\
0 & \text{elsewhere}
\end{cases}$$
(i) Determine constant 𝑎 (ii) 𝑃(𝑋 ≤ 1.5).
Question 13. If X has the probability density function
$$f(x) = \begin{cases} k e^{-3x} & x > 0 \\ 0 & \text{otherwise} \end{cases}$$
Find k and 𝑃(0.5 ≤ 𝑋 ≤ 1).
Question 14. $f(x) = e^{-x}$; $x > 0$. Find $P(X > 1)$.
Question 15. Find a probability density function for the random variable whose distribution
function is given by
$$F(x) = \begin{cases} 0 & \text{for } x \le 0 \\ x & \text{for } 0 < x < 1 \\ 1 & \text{for } x \ge 1 \end{cases}$$
Question 16. The distribution function of the random variable X is given by
$$F(x) = \begin{cases} 1 - (1 + x)e^{-x} & \text{for } x > 0 \\ 0 & \text{for } x \le 0 \end{cases}$$
Find i) 𝑃(𝑋 ≤ 2) ii) 𝑃(1 < 𝑋 < 3) iii) 𝑃(𝑋 > 4)
Question 17. The probability density of the random variable Y is given by
$$f(y) = \begin{cases} \frac{1}{8}(y + 1) & \text{for } 2 < y < 4 \\ 0 & \text{elsewhere} \end{cases}$$
Find 𝑃(𝑌 < 3.2) and 𝑃(2.9 < 𝑌 < 3.2).


Question 18. The p.d.f. of the random variable X is given by
$$f(x) = \begin{cases} \frac{c}{\sqrt{x}} & \text{for } 0 < x < 4 \\ 0 & \text{elsewhere} \end{cases}$$
Find
a) The value of $c$; b) $P\left(X < \frac{1}{4}\right)$ and $P(X > 1)$
Question 19. The density function of the random variable X is given by
$$g(x) = \begin{cases} 6x(1 - x) & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$
Find $P\left(X < \frac{1}{4}\right)$ and $P\left(X > \frac{1}{2}\right)$.
Question 20. (a) Show that $f(x) = 3x^2$ for $0 < x < 1$ represents a density function.
(b) Calculate the probability that (0.1 < 𝑋 < 0.5).
Question 21. The probability density of the continuous random variable X is given by
$$f(x) = \begin{cases} \frac{1}{5} & \text{for } 2 < x < 7 \\ 0 & \text{elsewhere} \end{cases}$$
Find 𝑃(3 < 𝑋 < 5).
Question 22. Find the distribution function of the random variable X whose
probability density is given by
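$$f(x) = \begin{cases} \frac{1}{3} & \text{for } 0 < x < 1 \\ \frac{1}{3} & \text{for } 2 < x < 4 \\ 0 & \text{elsewhere} \end{cases}$$
Question 23. Find the distribution function of the random variable X whose
probability density is given by
$$f(x) = \begin{cases} x & \text{for } 0 < x < 1 \\ 2 - x & \text{for } 1 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases}$$
Question 24. Find the distribution function of the random variable X whose
probability density is given by
$$f(x) = \begin{cases} \frac{x}{2} & \text{for } 0 < x \le 1 \\ \frac{1}{2} & \text{for } 1 < x \le 2 \\ \frac{3 - x}{2} & \text{for } 2 < x < 3 \\ 0 & \text{elsewhere} \end{cases}$$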


Question 25. The distribution function of the random variable Y is given by
$$F(y) = \begin{cases} 1 - \frac{9}{y^2} & \text{for } y > 3 \\ 0 & \text{elsewhere} \end{cases}$$
Find 𝑃 𝑌 ≤ 5 , P Y ≤ 2 and 𝑃(𝑌 > 8)
Question 26. A random variable X, which can be used in certain circumstances as a
model for claim sizes, has cumulative distribution function
$$F(x) = \begin{cases} 0, & x < 0 \\ 1 - \left(\frac{2}{2 + x}\right)^3, & x > 0 \end{cases}$$
Calculate the value of the conditional probability $P(X > 3 \mid X > 1)$.
Question 27. The probability density of the random variable Z is given by
$$f(z) = \begin{cases} k z e^{-z^2} & \text{for } z > 0 \\ 0 & \text{for } z \le 0 \end{cases}$$
Find k.
Question 28. If the probability density of X is given by
$$f(x) = \begin{cases} 2x^{-3} & \text{for } x > 1 \\ 0 & \text{elsewhere} \end{cases}$$
Check whether its mean and its variance exist.
Question 29. A random variable X has the following probability distribution
X -2 -1 0 1 2
P(X) 1/6 p 1/4 p 1/6
(i) Find the value of p.

(ii) Calculate $E(X + 2)$, $E(2X^2 + 3X + 5)$
Question 30. If X is the number of points rolled with a balanced die, find the expected
value of $g(X) = 2X^2 + 1$.
Question 31. What is the expectation of the sum of points on 2 unbiased dice?
Question 32. A lot of 12 television sets includes 2 with white cords. If three of the sets
are chosen at random for shipment to a hotel, how many sets with white
cords can the shipper expect to send to the hotel?


Question 33. Let X be a random variable with the following probability function
𝑋 -3 6 9
P (𝑋 = 𝑥) 1/6 1/2 1/3
Find E(X) and $E(X^2)$ and evaluate $E[(2X + 1)^2]$.
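Question 34. If the probability density of X is given by
$$f(x) = \begin{cases} 2(1 - x) & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$
(i) Show that $E(X^r) = \frac{2}{(r+1)(r+2)}$
(ii) And use the result to evaluate $E[(2X + 1)^2]$.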
Question 35. Find the expected value of the random variable Y whose probability
density is given by
$$f(y) = \begin{cases} \frac{1}{2}(y + 1) & \text{for } -1 < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$
Question 36. A continuous random variable X has the p.d.f.
$$f(x) = \begin{cases} a(1 + x^2) & \text{for } 2 \le x \le 5 \\ 0 & \text{otherwise} \end{cases}$$
(i) Find 𝑎 (ii) Find E(X).
Question 37. Certain coded measurements of the pitch diameter of threads of a fitting
have the probability density
$$f(x) = \begin{cases} \frac{4}{\pi(1 + x^2)} & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$
Find the expected value of this random variable.
Question 38. An insurance company's monthly claims are modeled by a continuous
random variable X whose probability density function is proportional to
$(1 + x)^{-4}$, $0 < x < 1$.
(i) Determine the company's expected monthly claims.
(ii) Determine the variance of the monthly claims.


Question 39. If X has the probability density
$$f(x) = \begin{cases} e^{-x} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
Find the expected value of $g(X) = e^{3X/4}$.
Question 40. $f(x) = k(1 + x^2)^{-1}$, $x > 0$. Find the value of k for which f(x) will be the pdf
of a continuous random variable X. Find F(x).
Question 41. Let X be a random variable denoting the hours of life in an electric light
bulb. Suppose X is distributed with density function
$f(x) = \frac{1}{1000} e^{-x/1000}$ for $x > 0$. Find the expected lifetime of such a bulb.
Question 42. $f(x) = \frac{1}{2}(x + 1)$; $-1 < x < 1$. Find the variance of X.
Question 43. $f(x) = \lambda e^{-\lambda x}$, $0 < x < \infty$. Find the variance of X.
Question 44. The probability density function of a random variable X is given by
$$f(x) = \begin{cases} kx(1 - ax^2), & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$
where k and 𝑎 are positive constants.
(i) Determine the value of k in terms of 𝑎.

(ii) For the case 𝑎 = 1, determine the mean of X.
Question 45. A claim size distribution is modeled using a simple distribution with
density of the form
$$f(x) = \begin{cases} k(100 - x), & 0 \le x \le 100 \\ 0, & \text{otherwise} \end{cases}$$
(i) Verify that 𝑘 = 0.0002.

(ii) Determine the mean of this claim size distribution.
(iii) Calculate the probability that an individual claim size is greater
than 50.
(iv) Calculate the probability that an individual claim size is less than
60 given that it is greater than 50.


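Question 46. If the probability density of X is given by
$$f(x) = \begin{cases} 6x(1 - x) & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$
Find the probability density function of $Y = X^3$.
Question 47. If the probability density of X is given by
$$f(x) = \begin{cases} 2x e^{-x^2} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
and $Y = X^2$, find
a) The probability density function of Y.
b) The distribution function of Y.
Question 48. If the probability density of X is given by
$$f(x) = \begin{cases} \frac{x}{2} & \text{for } 0 < x < 2 \\ 0 & \text{elsewhere} \end{cases}$$
Find the probability density function of $Y = X^3$.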
Question 49. Let X be a continuous random variable with p.d.f.
$$f(x) = \begin{cases} \frac{x}{12}, & 1 < x < 5 \\ 0, & \text{otherwise} \end{cases}$$
Find the p.d.f. of $Y = 2X - 3$.
Question 50. If X is a continuous random variable with pdf
$$f(x) = \begin{cases} \frac{e^{x}}{2} & \text{if } x \le 0 \\ \frac{e^{-x}}{2} & \text{if } x > 0 \end{cases}$$
Find $E[|X|]$.
Question 51. The probability density for damage claims X paid by the automobile
insurance company on collision insurance is given below.
$$f(x) = \begin{cases} \frac{2a}{\pi} \cdot \frac{1}{a^2 + x^2} & \text{for } |x| \le a \\ 0 & \text{otherwise} \end{cases}$$
Obtain the mean and variance of X.

Question 52. A claim size distribution is modeled using a simple distribution with
density of the form
$$f(x) = \begin{cases} k(50 - x), & 0 \le x \le 50 \\ 0, & \text{otherwise} \end{cases}$$
(i) Find k.
(ii) Determine the mean of this claim size distribution.
(iii) Calculate the probability that an individual claim size is greater
than 25.
(iv) Calculate the probability that an individual claim size is less than
30 given that it is greater than 25.
Question 53. A random sample of size n is taken from a distribution with probability
density function:
$f(x) = \frac{\alpha}{(1 + x)^{\alpha + 1}}$, $0 < x < \infty$
where α is a parameter such that α > 0.

Show by evaluating the appropriate integral that, in the case α > 1, the
mean of this distribution is given by $\frac{1}{\alpha - 1}$.
Question 54. The random variable X has probability density function
f(x) = k(1-x)(1+x), 0<x<1
where k is a positive constant.
(i) Show that k = 1.5
(ii) Calculate the probability P(X > 0.25).
Question 55. A continuous random variable X has the cumulative distribution function
$F_X(x)$ given by
$$F_X(x) = \begin{cases} 0, & x < 0 \\ \frac{x^3}{8}, & 0 \le x \le 2 \\ 1, & x > 2 \end{cases}$$
(i) Determine the probability density function of X.
(ii) Calculate $P(0.5 < X < 1)$
Let $Y = \sqrt{X}$
(iii) Determine the cumulative distribution function and the probability
density function of Y.
(iv) Calculate the expected values of X and Y.

Question 56. Let X be a discrete random variable with the following probability
distribution:
X 0 1 2 3
P(X = x) 0.4 0.3 0.2 0.1
Calculate the variance of Y, where Y = 2X + 10.

ANSWERS
Ans.1. (a) No, (b) Yes, (c) No (d) Yes

Ans.2. Yes
Ans.3. (a) $c = \frac{1}{15}$ (b) $c = \frac{12}{137}$ (c) $c = 3$ (d) $c = \frac{6}{k(k+1)(2k+1)}$
Ans.4. (a) No (b) No (c) Yes

Ans.5. (a) 3/6 (b) 1/6 (c) 1/6 (d) 1/3 (e) 3/6 (f) 2/3
Ans.6. (a) 3/4 (b) 1/4 (c) 1/2 (d) 3/4 (e) 1/2 (f) 1/4 (g) 0 (h) 1/4
(i) ½
0 𝑥 < 1
!
1 ≤ 𝑥 < 2
Ans.7. 𝐹 𝑥 = !!
!
2 ≤ 𝑥 < 3
1 𝑥 ≥ 3
Ans.8. 1/49, 24/49, 33/49, 16/49
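The values in Ans.8 can be checked with exact fractions. The sketch below is only a sanity check, not the workbook's own method; the names `weights`, `pmf` and `prob` are illustrative.

```python
from fractions import Fraction

# Question 8: f(x) is k, 3k, 5k, ..., 13k for x = 0, ..., 6, i.e. (2x + 1) * k.
weights = {x: 2 * x + 1 for x in range(7)}

# The probabilities must sum to 1, so k = 1 / (sum of the weights).
k = Fraction(1, sum(weights.values()))
pmf = {x: w * k for x, w in weights.items()}


def prob(event):
    """Return P(event(X)) by summing the pmf over the x values satisfying event."""
    return sum(p for x, p in pmf.items() if event(x))


p_ge_5 = prob(lambda x: x >= 5)
p_3_to_6 = prob(lambda x: 3 < x <= 6)
p_lt_4 = prob(lambda x: x < 4)
print(k, p_ge_5, p_3_to_6, p_lt_4)
```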
Ans.9. (i) k=1/10 (ii) 81/100, 19/100 & 4/5
(iii)
x          0   1      2      3      4     5        6        7
P(X ≤ x)   0   1/10   3/10   5/10   4/5   81/100   83/100   1
Ans.10. (i) 1/5 (ii) 1/7 [Hint (ii) P(1< X< 2 𝑋 > 1)
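$$\text{Ans.11. } F(x) = \begin{cases}
0 & x < 1 \\
\frac{1}{15} & 1 \le x < 2 \\
\frac{3}{15} & 2 \le x < 3 \\
\frac{6}{15} & 3 \le x < 4 \\
\frac{10}{15} & 4 \le x < 5 \\
1 & x \ge 5
\end{cases}$$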
Ans.12. (i) $a = \frac{1}{2}$ (ii) $\frac{1}{2}$
Ans.13. 𝑘 = 3 and 𝑃 0.5 ≤ 𝑋 ≤ 1 = 0.173

Ans.14. $e^{-1}$
$$\text{Ans.15. } f(x) = \begin{cases} 0 & \text{for } x < 0 \\ 1 & \text{for } 0 < x < 1 \\ 0 & \text{for } x > 1 \end{cases}$$

Ans.16. (i) $1 - 3e^{-2}$ (ii) $\frac{2}{e}(1 - 2e^{-2})$ (iii) $5e^{-4}$
Ans.17. 0.54, 0.15187

Ans.18. (a) 1/4 (b) 0.25, 0.5
Ans.19. 0.15625, 0.5
Ans.20. 0.124
Ans.21. 2/5
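$$\text{Ans.22. } F(x) = \begin{cases}
0 & x < 0 \\
\frac{x}{3} & 0 \le x < 1 \\
\frac{1}{3} & 1 \le x < 2 \\
\frac{x - 1}{3} & 2 \le x < 4 \\
1 & x \ge 4
\end{cases}$$
$$\text{Ans.23. } F(x) = \begin{cases}
0 & x < 0 \\
\frac{x^2}{2} & 0 \le x < 1 \\
2x - \frac{x^2}{2} - 1 & 1 \le x < 2 \\
1 & x \ge 2
\end{cases}$$
$$\text{Ans.24. } F(x) = \begin{cases}
0 & x < 0 \\
\frac{x^2}{4} & 0 \le x < 1 \\
\frac{x}{2} - \frac{1}{4} & 1 \le x < 2 \\
\frac{6x - x^2 - 5}{4} & 2 \le x < 3 \\
1 & x \ge 3
\end{cases}$$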
Ans.25. 16/25, 0, 9/64
Ans.26. $\left(\frac{3}{5}\right)^3$
Ans.27. 𝑘 = 2
Ans.28. E(X) = 2 V(X) = ∞, which is not a finite number and hence it does not exist
Ans.29. (i) p=5/24, (ii) 2, 17/2
Ans.30. 94/3
Ans.31. 𝐸 𝑋 = 7
Ans.32. 1/2
Ans.33. E(X) = 11/2, $E(X^2) = 93/2$, $E[(2X + 1)^2] = 209$
Ans.34. 3

Ans.35. 1/12
Ans.36. (i) 1/42 (ii) 31/8
Ans.37. $\frac{2}{\pi} \log 2$
Ans.38. (i) 2/7 (ii) 3/49

Ans.39. 4
Ans.40. $\frac{2}{\pi}$, $\frac{2}{\pi} \tan^{-1} x$
Ans.41. 1000 hrs.

Ans.42. 2/9
Ans.43. $1/\lambda^2$
Ans.44. (i) $k = \frac{4}{2 - a}$ (ii) 8/15
Ans.45. (ii) 33.33 (iii) 1/4 (iv) 9/25
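The figures in Ans.45 follow from closed-form integrals of the triangular density $k(100 - x)$. A minimal sketch of that arithmetic, using exact fractions, is given below; the helper name `survival` and the constant `UPPER` are illustrative and not part of the original solution.

```python
from fractions import Fraction

UPPER = 100  # claim sizes lie in [0, 100]

# The integral of (100 - x) over [0, 100] is 100**2 / 2, so k is its reciprocal.
k = Fraction(2, UPPER**2)


def survival(x):
    """P(X > x) for 0 <= x <= 100: integral of k(100 - t) dt from x to 100."""
    return k * Fraction((UPPER - x) ** 2, 2)


# E[X] = integral of x * k(100 - x) over [0, 100] = k * 100**3 / 6.
mean = k * Fraction(UPPER**3, 6)

p_gt_50 = survival(50)
# P(X < 60 | X > 50) = (P(X > 50) - P(X > 60)) / P(X > 50)
p_lt_60_given_gt_50 = (survival(50) - survival(60)) / survival(50)
print(float(k), float(mean), p_gt_50, p_lt_60_given_gt_50)
```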

Ans.46. $f(y) = 2(y^{-1/3} - 1)$; $0 < y < 1$
Ans.47. (a) $f_Y(y) = e^{-y}$; $y > 0$ (b) $F(y) = 1 - e^{-y}$, $y > 0$
Ans.48. $\frac{1}{6} y^{-1/3}$; $0 < y < 8$
Ans.49. $\frac{y + 3}{48}$; $-1 < y < 7$
Ans.50. E [|X|] = 1
Ans.51. E(X) = 0, $V(X) = \frac{a^2}{\pi}(4 - \pi)$
Ans.52. (i) k = 1/1250 (ii) 16.67 (iii) 0.25 (iv) 0.36

Ans.54. (ii) 0.63281
Ans.55. (i) $f(x) = \frac{3}{8} x^2$ for $0 \le x \le 2$ (ii) 0.109375 (iii) $f_Y(y) = \frac{3y^5}{4}$
(iv) 1.2122
Ans.56. 4
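Ans.56 relies on the rule Var(aX + b) = a² Var(X). A short check with exact fractions is sketched below; the variable names are illustrative.

```python
from fractions import Fraction

# Question 56: pmf of X, written as exact fractions.
pmf = {0: Fraction(4, 10), 1: Fraction(3, 10), 2: Fraction(2, 10), 3: Fraction(1, 10)}

mean_x = sum(x * p for x, p in pmf.items())
var_x = sum((x - mean_x) ** 2 * p for x, p in pmf.items())

# Var(aX + b) = a**2 * Var(X); the additive constant 10 does not change the variance.
a, b = 2, 10
var_y = a**2 * var_x
print(mean_x, var_x, var_y)
```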

ASSIGNMENT – 2
PROBABILITY DISTRIBUTION
Question 1. Let X ~ B(n, p) with n = 25 and p = 0.2. Find P[X < μ − 2σ].
Question 2. Let X ~ B(n, p). If E(X) = 5 and Var(X) = 4, find n and p.
Question 3. Let X ~ P(λ) be such that P(X = 0) = P(X = 1). Find E[X].
Question 4. Let X ~ P(λ) be such that P(X = 0) = 0.5. Find E[X].
Question 5. Name a distribution for which
(1) mean ≥ Var(X) (2) mean = Var(X) (3) mean ≤ Var(X)
Question 6. If 1% Gillette blades are defective, what is the probability that a carton of
50 Gillette blades has at least 2 defective blades?
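One way to check Question 6 numerically is to compute the exact binomial probability and compare it with the Poisson approximation (λ = np = 0.5). This is a sketch only; the helper `binom_pmf` is an illustrative name.

```python
from math import comb, exp

n, p = 50, 0.01  # carton size and defect probability from Question 6


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# P(X >= 2) = 1 - P(X = 0) - P(X = 1)
p_exact = 1 - binom_pmf(0, n, p) - binom_pmf(1, n, p)

# Poisson approximation with lambda = n * p
lam = n * p
p_poisson = 1 - exp(-lam) * (1 + lam)
print(round(p_exact, 4), round(p_poisson, 4))
```

Both values come out close to 0.09, so the approximation is adequate here.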
Question 7. The average no. of calls arriving at a telephone exchange is 30 per hour.
What is the probability that
i. No calls arrive in a 3 min period.

ii. More than 5 calls arrive in a 5 min period.
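A numerical sketch for Question 7, assuming calls arrive as a Poisson process (the question does not say so explicitly, but it sits in the Poisson part of this assignment). The helper `poisson_pmf` is an illustrative name.

```python
from math import exp, factorial

RATE_PER_MIN = 30 / 60  # 30 calls per hour


def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


# (i) No calls in 3 minutes: lambda = 0.5 * 3 = 1.5
p_no_calls = poisson_pmf(0, RATE_PER_MIN * 3)

# (ii) More than 5 calls in 5 minutes: lambda = 0.5 * 5 = 2.5
lam_5 = RATE_PER_MIN * 5
p_more_than_5 = 1 - sum(poisson_pmf(k, lam_5) for k in range(6))
print(round(p_no_calls, 4), round(p_more_than_5, 4))
```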
Question 8. Suppose that flaws in plywood occur at an average of one flaw per 50
sq. ft. What is the probability that a 4 × 8 ft. sheet will have
(i) no flaws (ii) at most one flaw?
Question 9. An insurance company finds that .005 of the population die from a certain
kind of accident each year. What is the probability that the company must
pay off on three or more of 10,000 insured risks against such accidents in
a given year?
Question 10. Assume that the number of fatal car accidents in a certain state obeys a
Poisson distribution with an average of one per day. What is the
probability of more than 10 such accidents in a week?
Question 11. A die is cast until a 6 appears. What is the probability that it must be cast
more than five times?
Question 12. A marksman is required to shoot at a target until he scores 5 bull’s-eyes.
Let X be the number of shots required. The probability that he hits the
bull’s-eye on any trial is 0.3. What is the probability that he requires 8 shots?

Question 13. Find the probability of getting five heads and seven tails in 12 flips of a
balanced coin.
Question 14. Find the probability that seven of 10 persons will recover from a tropical
disease if we can assume independence and the probability is 0.80 that
any one of them will recover from the disease.
!
Question 15. If X has the discrete uniform distribution 𝑓 𝑥 = ! for 𝑥 = 1,2, … , 𝑘, show
!!! ! ! !!
that (a) Its mean is 𝜇 = !
(b) its variance is 𝜎 ! = !"
Question 16. If the probability is 0.40 that a child exposed to a certain contagious
disease will catch it, what is the probability that the tenth child exposed to
the disease will be the third to catch it?
Question 17. If the probability is 0.75 that an applicant for a driver’s license will pass
the road test on any given try, what is the probability that an applicant
will finally pass the test on the fourth try?
Question 18. As part of an air-pollution survey, an inspector decides to examine the
exhaust of six of a company’s 24 trucks. If four of the company’s trucks
emit excessive amounts of pollutants, what is the probability that none of
them will be included in the inspector’s sample?
Question 19. Among the 120 applicants for a job, only 80 are actually qualified. If five of
the applicants are randomly selected for an in-depth interview, find the
probability that only two of the five will be qualified for the job by using
a. The formula for the hypergeometric distribution;

b. The formula for the binomial distribution with θ = 80/120 as an
approximation.
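The two formulas in Question 19 can be compared directly. The sketch below uses only `math.comb`; the variable names (`N`, `K`, `n`, `x`) are illustrative labels for population size, qualified applicants, sample size and target count.

```python
from math import comb

N, K = 120, 80  # applicants in total, and the number who are qualified
n, x = 5, 2     # interview sample size, and the number qualified in the sample

# (a) Hypergeometric: choose x of the K qualified and n - x of the N - K unqualified.
p_hyper = comb(K, x) * comb(N - K, n - x) / comb(N, n)

# (b) Binomial approximation with theta = K / N (sampling treated as with replacement).
theta = K / N
p_binom = comb(n, x) * theta**x * (1 - theta) ** (n - x)
print(round(p_hyper, 4), round(p_binom, 4))
```

The two results agree to about two decimal places, which is expected because the sample (5) is small relative to the population (120).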
Question 20. If 2% of the books bound at a certain bindery have defective binding, use
the Poisson approximation to the binomial distribution to determine the
probability that five of 400 books bound by this bindery will have
defective bindings.
Question 21. Records show that the probability is 0.00005 that a car will have a flat tire
while crossing a certain bridge. Use the Poisson distribution to
approximate the binomial probabilities that, among 10,000 cars crossing
this bridge,
a. exactly two will have a flat tire;
b. at most two will have a flat tire.
Question 22. The average number of trucks arriving on any one day at a truck depot in
a certain city is known to be 12. What is the probability that on a given
day fewer than nine trucks will arrive at the depot?
Question 23. A certain kind of sheet metal has, on the average, five defects per 10
square feet. If we assume a Poisson distribution, what is the probability
that a 15- square foot sheet of the metal will have at least six defects?
Question 24. Derive the formulas for the mean and the variance of the Poisson
distribution by first evaluating E(X) and E [𝑋(𝑋 − 1)].
Question 25. If the probability is 0.75 that a person will believe a rumor about the
transgressions of a certain politician, find the probabilities that
a. The eighth person to hear the rumour will be the fifth to believe
it;
b. The fifteenth person to hear the rumour will be the tenth to
believe it.
Question 26. If the probabilities of having a male or female child are both 0.50, find the
probabilities that
a. a family’s fourth child is their first son;
b. a family’s seventh child is their second daughter;
c. a family’s tenth child is their fourth or fifth son.
Question 27. When taping a television commercial, the probability is 0.30 that a certain
actor will get his lines straight on any one take. What is the probability that he
will get his lines straight for the first time on the sixth take?
Question 28. Records show that the probability is 0.0012 that a person will get food
poisoning spending a day at a certain state fair. Use the Poisson
approximation to the binomial distribution to find the probability that
among 1,000 persons attending the fair at most two will get food
poisoning.
Question 29. Among the 16 applicants for a job 10 have college degrees. If three of the
applicants are randomly chosen for interviews, what are the probabilities
that

a. None has a college degree; b. One has a college degree;
c. Two have college degrees; d. All three have college degrees?
Question 30. Find the probabilities that the value of a random variable will exceed 4 if it
has a gamma distribution with
a. α = 2 and λ = 3; b. α = 3 and λ = 4.
Question 31. Find the probabilities that random variable having the standard normal
distribution will take on a value
a. Less than 1.72; b. less than -0.88;

c. between 1.30 and 1.75 d. Between -0.25 and 0.45
Question 32. Suppose that the time in days between service calls on an office-copying
machine follows an exponential distribution with mean 50 days.
i. What is the probability that the time until the machine again
requires service exceeds 60 days?
ii. Find the probability that the time until the machine again requires
service is longer than 50 + 2𝜎, where 𝜎 is the standard deviation of
the distribution.
Question 33. (a) Assume that 40% of the policyholders in a certain metropolitan area
have type A blood. If the distribution of the blood donors among the
policy holders entering a blood bank on any given day is considered
random,
i. Find the distribution of X, the number of donors entering a blood

bank on a given day, until the first type A donor is encountered.
ii. Also find the mean and standard deviation of X.
(b) Suppose that the amount of time a customer spends at a cash counter
in a certain office has an exponential distribution with a mean of six
minutes. Find
i. The probability that a randomly selected customer will spend

more than 12 minutes.

ii. The conditional probability that the customer will spend more
than 12 minutes in the cash counter given that the customer has
been there for more than six minutes.
iii. The probability that the customer spends longer than (µ+2𝜎)
minutes where µ and 𝜎 are the mean and standard deviation of the
exponential distribution.
Question 34. The time (in minutes) between telephone calls at an Insurance claims
office has the following exponential distribution:
!
!
𝑓 𝑥 = ! 𝑒 ! ! 0≤𝑥≤∞
i. What is the mean time interval between consecutive telephone

calls?
ii. What is probability of having 6 or more minutes without a
telephone call?
iii. What is the probability of receiving a telephone call between 6 and
9 minutes just after the receipt of a call?
Question 35. The average time a subscriber spends reading ‘THE HINDU’ is 49
minutes. Assume that the standard deviation is 16 minutes and that the
times are normally distributed.
i. What is the probability that a subscriber will spend at least 1 hour
reading the paper?
ii. What is the probability that a subscriber will spend no more than 30
minutes reading the paper?
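Question 35 reduces to two normal tail probabilities. A minimal sketch using `statistics.NormalDist` from the Python standard library is shown below; in the exam these would be read from the tables instead, so small rounding differences are to be expected.

```python
from statistics import NormalDist

# Reading time in minutes, modelled as N(49, 16^2) as stated in Question 35.
reading_time = NormalDist(mu=49, sigma=16)

# (i) P(X >= 60): one hour or more
p_at_least_60 = 1 - reading_time.cdf(60)

# (ii) P(X <= 30): no more than 30 minutes
p_at_most_30 = reading_time.cdf(30)
print(round(p_at_least_60, 4), round(p_at_most_30, 4))
```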
Question 36. Phone calls arrive at the rate of 48 per hour at the reception desk for an
insurance company. Find
i. The probability of receiving exactly 10 calls in 15 minutes.

ii. The probability of receiving three calls in a 5 minutes interval of
time.
Question 37. 40% of business travelers carry either a cell phone or a laptop. In a sample
of 15 business travelers,
i. What is the probability that three have a cell phone or laptop?

ii. What is the probability that at least three of the travellers have a
cell phone or a laptop?
iii. What is the probability that 12 of the travellers have neither a cell
phone nor a laptop?
Question 38. At a certain large restaurant in a city it takes an average 10 minutes to

receive the order after placing. If the service time is exponentially
distributed, find the probability that the customer waiting time is
i. more than 10 minutes

ii. 3 minutes or less
Question 39. To recruit actuarial professionals, an insurance company is
conducting an entrance examination. The test scores for the examination are
normally distributed with a mean of 450 and a standard deviation of 100.
i. Suppose someone receives a score of 630. What percentage of
people taking the test score better?
ii. If the insurance company will not recruit anyone scoring below 420, what
percentage of the persons taking the test would be acceptable to the
company?
Question 40. An insurance company found that only 0.01% of the population is
involved in a certain type of accident each year. If its 1000 policyholders
can be regarded as randomly selected from the population, what is the
probability that not more than two of its clients are involved in such
accidents?
Question 41. In a certain metropolitan city the daily consumption of electric power (in
Million Kilowatt Hour (MKH)) may be regarded as a random variable
having Gamma distribution with parameter (3, 1/2). If the power plant
has a daily capacity of 12 MKH, what is the probability that this power
supply will be inadequate on any given day?
Question 42. On average, 8 calls per hour are received at a telephone board.
Assuming that the number of calls received at the board in a given length
of time follows a Poisson process, find the probability that
i. 6 calls are received in 2 hours.
ii. At least 2 calls are received in the next 20 minutes.
Question 43. Consumer demand for milk X, in a metropolitan area, is known to follow
a Gamma distribution with p.d.f.
$f(x) = \frac{\lambda^{\alpha} e^{-\lambda x} x^{\alpha - 1}}{\Gamma(\alpha)}$
It is given that the average demand is ‘a’ liters and the modal demand is
‘b’ liters (𝑏 < 𝑎).
a) Compute the mode in terms of 𝛼 and 𝜆.
b) Given $E(X) = \frac{\alpha}{\lambda}$, what is the variance in terms of a and b?
Question 44. The random variable 𝑌 = Log 𝑋 has N (10, 4) distribution. Find
a) The p.d.f. of X
b) Mean and variance of X
c) 𝑃(𝑋 ≤ 1000)
Question 45. A very crude model for the distribution of claim size, X, in a particular
situation represents X as a discrete random variable, which takes the
values £5,000, £10,000, and £20,000 with probabilities 0.4, 0.5, and 0.1
respectively.
Calculate the probability that of five randomly selected claims, three are
for £5,000 each and the other two are for larger amounts.
Question 46. A multiple-choice test consists of 8 questions and 3 options to each

question (of which only one is correct). If a student answers each question
by rolling a balanced die and checking the first answer if he gets 1 or 2, the
second answer if he gets 3 or 4 and the third answer if he gets 5 or 6, what
is the probability that he will get exactly 4 correct answers?
Question 47. An automobile safety engineer claims that 1 in 10 automobile accidents is
due to driver fatigue. What is the probability that at least 3 of 5
automobile accidents are due to driver fatigue?
Question 48. If 40% of the mice used in an experiment will become very aggressive
within 1 minute after having been administered an experimental drug,

find the probability that exactly six of the 15 mice that have been
administered the drug will become very aggressive within 1 minute?
Question 49. In a certain city, incompatibility is given as the legal reason in 70% of all
divorce cases. Find the probability that 5 of the next 6 divorce cases filed in
this city will claim incompatibility as the reason.
Question 50. A social scientist claims that only 50% of all high school seniors capable of
doing college work actually go to college. Assuming that this claim is true,
find the probabilities that among 18 high school seniors capable of doing
college work
i. Exactly 10 will go to college;

ii. At least 10 will go to college;
iii. At most 8 will go to college;
Question 51. (a) To reduce the standard deviation of the binomial distribution by half,
what change must be made in the number of trials?
(b) If n is multiplied by the factor k in the binomial distribution having
the parameters n and p, what statement can be made about the standard
deviation of the resulting distribution?
Question 52. A and B play a game in which their chances of winning are in the ratio 3:2.
Find A’s chance of winning at least 3 games out of 5 games played.
Question 53. A coffee connoisseur claims that he can distinguish between a cup of
instant coffee and a cup of percolator coffee 75% of the time. It is agreed
that his claim will be accepted if he correctly identifies at least 5 of the 6
cups. Find his chances of having the claim (i) accepted, (ii) rejected, when
he does have the ability he claims.
Question 54. An irregular six-faced die is thrown and the expectation that in 10 throws
it will give five even numbers is twice the expectation that it will give four
even numbers. How many times in 10,000 sets of 10 throws each, would
you expect it to give no even number?
Question 55. A department in a works has 10 machines, which may need adjustment
from time to time during the day. Three of these machines are old; each
having a probability of 1/11 of needing adjustment during the day, and 7

are new, having corresponding probabilities of 1/21. Assuming that no
machine needs adjustment twice on the same day, determine the
probabilities that on a particular day
(i) Just 2 old and no new machines need adjustment.
(ii) If just 2 machines need adjustment, they are of the same type.
Question 56. The probability of a man hitting a target is 0.25;
(i) If he fires 7 times what is the probability of his hitting the target at
least twice?
(ii) How many times must he fire so that the probability of his hitting
the target at least once is greater than 2/3?
Question 57. In a precision bombing attack there is a 50% chance that any one bomb
will strike the target. Two direct hits are required to destroy the target
completely. How many bombs must be dropped to give a 99% chance or
better of completely destroying the target? [Hint: Probability that out of n
bombs, at least two strike the target, is greater than 0.99]
Question 58. In a binomial distribution consisting of 5 independent trials, probabilities

of 1 and 2 successes are 0.4096 and 0.2048 respectively. Find the parameter
“p” of the distribution.
Question 59. With the usual notations, find p for a binomial variate X, if n = 6 and
9P(X=4) = P(X=2).
Question 60. The mean and variance of binomial distribution are 4 and 4/3
respectively. Find P(X>=1).
Question 61. A manufacturer of cotter pins knows that 5% of his product is defective. If
he sells cotter pins in boxes of 100 and guarantees that not more than 10
pins will be defective, what is the approximate probability that a box will
fail to meet the guaranteed quality?
Question 62. A car hire firm has two cars, which it hires out day by day. The number of
demands for a car on each day is distributed as a Poisson distribution with
mean 1.5. Calculate the proportion of days on which (i) neither car is used,
and (ii) the proportion of days on which some demand is refused.
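For Question 62, "neither car is used" means zero demands and "some demand is refused" means more than two demands in a day. A short numerical sketch (the helper `poisson_pmf` is an illustrative name):

```python
from math import exp, factorial

LAM = 1.5  # mean number of demands per day
CARS = 2   # cars available for hire


def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


# (i) Neither car is used: no demand at all.
p_neither_used = poisson_pmf(0, LAM)

# (ii) Some demand is refused: demand exceeds the number of cars.
p_refused = 1 - sum(poisson_pmf(k, LAM) for k in range(CARS + 1))
print(round(p_neither_used, 4), round(p_refused, 4))
```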

Question 63. An insurance company insures 4,000 people against loss of both eyes in a
car accident. Based on previous data, the rates were computed on the
assumption that on average 10 persons in 1,00,000 will have a car
accident each year that results in this type of injury. What is the probability
that more than 3 of the insured will collect on their policy in a given year?
Question 64. A manufacturer, who produces medicine bottles, finds that 0.1% of the
bottles are defective. The bottles are packed in boxes containing 500
bottles. A drug manufacturer buys 100 boxes from the producer of bottles.
Using Poisson distribution, find how many boxes will contain; (i) no
defective and (ii) at least two defectives.
Question 65. Six coins are tossed 6,400 times. Using the Poisson distribution, find the
approximate probability of getting 6 heads r times. [Hint: $p = (0.5)^6$ and n = 6,400]
Question 66. In a book of 520 pages, 390 typographical errors occur. Assuming Poisson
law for the number of errors per page, find the probability that a random
sample of 5 pages will contain no error.
Question 67. Suppose that the number of telephone calls coming into a telephone
exchange between 10 a.m. and 11 a.m. say, X1 is a random variable with
Poisson distribution with parameter 2. Similarly the number of calls
arriving between 11 a.m. and 12 p.m., say, X2 has a Poisson distribution
with parameter 6. If X1 and X2 are independent, what is the probability
that more than 5 calls come in between 10 a.m. and 12 p.m.?
Question 68. If X is a Poisson variate such that P(X=2) = 9P(X=4) + 90P(X=6).
Find (i) 𝜆, (ii) the mean of X, (iii) the coefficient of skewness.
Question 69. If X and Y are independent Poisson variates with 𝜆, 1 and 2 respectively,
find the probability that X+Y = k.
Question 70. If X is uniformly distributed with mean 1 and variance 4/3, find P(X<0).
Question 71. Subway trains on a certain line run every half hour between mid-night
and six in the morning. What is the probability that a man entering the
station at a random time during this period will have to wait at least
twenty minutes? [Hint : U(0,30) distribution]

Question 72. At Yamuna expressway, the number of cars exceeding the speed limit by
more than 100km/hr is a random variable having Poisson distribution
with 𝜆 = 8.4 for 30 minutes. What is the probability of a waiting time of
less than 5 minutes between cars exceeding the speed limit by more than
100km/hr?
Question 73. Show that if a random variable has a uniform density with the
parameters a and b, the probability that it takes a value less than a+p(b-a) is
equal to p.
Question 74. If a random variable X has a uniform density with the parameters a and b,
find its distribution function.
Question 75. Show that if a random variable has an exponential distribution with mean
λ, the probability that it will take on a value less than −λ ln(1−p) is equal to p.
Question 76. Suppose that the amount of cosmic radiation to which a person is exposed
when flying by jet across the United States is a random variable having a
normal distribution with mean of 4.35mrem and a standard deviation of
0.59mrem. What is the probability that a person will be exposed to more
than 5.20mrem of cosmic radiation on such a flight?
Question 77. X is a normal variate with mean 30 and S.D. 5. Find the probabilities that
(i) 26<X<40 (ii) X>45 (iii) |X-30| > 5
Question 78. The mean yield for one-acre plot is 662 kgs with a s.d. 32 kgs. Assuming
normal distribution, how many one-acre plots in a batch of 1,000 plots
would you expect to have yield (i) over 700kgs, (ii) below 650 kgs.
Question 79. The local authorities in a certain city install 10,000 electric lamps in the
streets of the city. If these lamps have an average life of 1,000 burning
hours with a standard deviation of 200 hours, assuming the normality,
what number of lamps might be expected to fail
(i) in the first 800 hours (ii) between 800 and 1,200 hours
Question 80. Claim amounts are modeled as an exponential random variable with
mean £1,000.
(i) Calculate the probability that one such claim amount is greater than
£5,000
(ii) Calculate the probability that a claim amount is greater than £5,000
given that it is greater than £1,000.
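For an exponential claim size with mean m, the survival function is P(X > x) = exp(−x/m), so both parts of Question 80 reduce to one-line evaluations. A minimal sketch (the helper name `exp_survival` is illustrative and not part of the workbook):

```python
import math

def exp_survival(x, mean):
    """P(X > x) for an exponential random variable with the given mean."""
    return math.exp(-x / mean)

mean_claim = 1000.0

# (i) Unconditional probability that a claim exceeds 5,000.
p_over_5000 = exp_survival(5000, mean_claim)

# (ii) Conditional probability P(X > 5000 | X > 1000)
#      = P(X > 5000) / P(X > 1000), since {X > 5000} implies {X > 1000}.
p_cond = p_over_5000 / exp_survival(1000, mean_claim)

print(round(p_over_5000, 4), round(p_cond, 4))
```

By the memoryless property, the conditional probability equals P(X > 4000), which is why part (ii) comes out as exp(−4).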
Question 81. The ratio of the standard deviation to the mean of a random variable is
called the coefficient of variation.
For each of the following distributions, decide whether increasing the
mean of the random variable increases, decreases or has no effect on the
value of the coefficient of variation:
(a) Poisson with mean λ (b) Exponential with mean µ
(c) Chi-square with ν degrees of freedom
Question 82. Claim sizes are normally distributed about a mean µ = £6,000 and with
standard deviation σ = £1,000. Calculate the probability that a claim is for
more than £7,500, given that it is for more than £6,000.
Question 83. It is assumed that claims arising on an industrial policy can be modeled as
a Poisson process at a rate of 0.5 per year.
(i) Determine the probability that no claims arise in a single year.
(ii) Determine the probability that, in three consecutive years, there is one
or more claims in one of the years and no claims in each of the other two
years.
(iii) Suppose a claim has just occurred. Determine the probability that more
than two years will elapse before the next claim occurs.
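The three parts of Question 83 can be checked numerically. The sketch below assumes that the annual claim count is Poisson with mean 0.5, that different years are independent, and that "one or more claims in one of the years" means exactly one of the three years has claims; the variable names are illustrative.

```python
import math

rate = 0.5  # claims per year

# Probability of no claims in a single year.
p_none = math.exp(-rate)

# Exactly one of three years has at least one claim, the other two have none.
# There are 3 choices for the year with claims.
p_one_year_with_claims = 3 * (1 - p_none) * p_none ** 2

# Waiting time to the next claim is exponential with the same rate,
# so P(T > 2) = exp(-rate * 2).
p_wait_over_two_years = math.exp(-rate * 2)

print(round(p_none, 5), round(p_one_year_with_claims, 5),
      round(p_wait_over_two_years, 5))
```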
Question 84. Claim sizes in a certain insurance situation are modeled by a normal
distribution with mean µ = £30,000 and standard deviation σ = £4,000. The
insurer defines a claim to be a large claim if the claim size exceeds £35,000.
(i) Calculate the probabilities that the size of a claim exceeds:
(a) £ 35,000 and (b) £ 36,000
(ii) Calculate the probability that the size of a large claim (as defined
by the insurer) exceeds £ 36000.
(iii) Calculate the probability that a random sample of 5 claims includes
2 which exceed £ 35,000 and 3 which are less than £ 35,000.
Question 85. The probability that a component in a rocket motor will fail when the
motor is fired is 0.02. To achieve a greater reliability several similar
components are to be fitted in parallel; the motor will then fail only if all
the individual components fail simultaneously. Determine the minimum
number of components required to ensure that the probability the motor
fails is less than one in a billion (i.e. less than
10^-9), assuming that components fail independently.
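With n independent components in parallel, the motor fails only if all n fail, so the failure probability is 0.02^n. A short search for the smallest n that brings this below 10^-9 (a sketch; the variable names are illustrative):

```python
p_fail = 0.02       # failure probability of a single component
target = 1e-9       # required upper bound on the motor failure probability

n = 1
# Increase n until the probability that all n components fail drops below target.
while p_fail ** n >= target:
    n += 1

print(n, p_fail ** n)
```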
Question 86. Suppose that in a group of insurance policies (which are independent as
regards occurrence of claims), 20% of the policies have incurred claims
during the last year. An auditor is examining the policies in the group one
by one in random order until two policies with claims are found.
(i) Determine the probability that exactly five policies have to be examined
until two policies with claims are found.
(ii) Find the expected number of policies that have to be examined, until
two policies with claims are found.
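Question 86 is a negative binomial (type 1) set-up: the number of policies examined until the k-th claim is found. Assuming each examined policy independently has a claim with probability 0.2, a minimal check is:

```python
from math import comb

p = 0.2   # probability that an examined policy has a claim
k = 2     # number of policies with claims required

def prob_kth_success_on_trial(x, k, p):
    """P(the k-th success occurs exactly on trial x)."""
    return comb(x - 1, k - 1) * p ** k * (1 - p) ** (x - k)

p_exactly_five = prob_kth_success_on_trial(5, k, p)
expected_examined = k / p   # mean of the negative binomial (trial-count form)

print(round(p_exactly_five, 5), expected_examined)
```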
Question 87. If X ~ Gamma(10,10) and P( X > L) = 0.01 , determine the value of L .
Question 88. If log X has a N(μ, σ^2) distribution, we say that X has a logN(μ, σ^2)
distribution. If Y ~ logN(10, 4), calculate P(Y>200,000).
Question 89. An insurance company’s records suggest that experienced drivers (those
aged over 21) submit claims at a rate of 0.1 per year, and inexperienced
drivers (those 21 years old or younger) submit claims at a rate of 0.15 per
year. A driver can submit more than one claim a year. The company has
40 experienced and 20 inexperienced drivers insured with it.
The number of claims for each driver can be modeled by a Poisson

distribution, and claims are independent of each other. Calculate the
probability the company will receive three or fewer claims in a year.
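Because independent Poisson counts add, the total number of claims in Question 89 is Poisson with mean equal to the sum of the individual rates. A sketch of the calculation (the helper `poisson_cdf` is illustrative):

```python
import math

def poisson_cdf(k, mean):
    """P(N <= k) for N ~ Poisson(mean), summed term by term."""
    return sum(math.exp(-mean) * mean ** j / math.factorial(j)
               for j in range(k + 1))

# 40 experienced drivers at 0.1 per year, 20 inexperienced at 0.15 per year.
total_rate = 40 * 0.1 + 20 * 0.15

p_three_or_fewer = poisson_cdf(3, total_rate)
print(round(total_rate, 2), round(p_three_or_fewer, 5))
```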
Question 90. Calculate P(X < 8) if:

(i) X is the number of claims reported in a year by 20 policyholders.
Each policyholder makes claims at the rate of 0.2 per year
independently of the other policyholders.
(ii) X is the number of claims examined up to and including the fourth
claim that exceeds £20,000. The probability that any claim received
exceeds £20,000 is 0.3 independently of any other claim.
(iii) X is the number of deaths amongst a group of 500 policyholders.
Each policyholder has a 0.01 probability of dying independently of
any other policyholder.
(iv) X is the number of phone calls made before an agent makes the first
sale. The probability that any phone call leads to a sale is 0.01
independently of any other call.
Question 91. Suppose that the distribution of a physical coefficient, X , can be modeled
using a uniform distribution on (0,1) . A researcher is interested in the
distribution of Y, an adjusted form of the reciprocal of the coefficient,
where Y = 1/X − 1.
(i) Show that the probability density function of Y is given by:
f_Y(y) = 1/(1 + y)^2 , y > 0
(ii) Show that mean of Y does not exist.
Question 92. Let X and Y be independent random variables. Let V and W be the random
variables defined by V = max{X, Y} and W = min{X, Y}, i.e., V is the larger,
and W is the smaller, of the observations of X and Y.
Let F_X, F_Y, F_V and F_W denote the distribution functions of X, Y, V and W
respectively.
(i) Show that F_V(t) = F_X(t) F_Y(t)
(ii) Show that F_W(t) = F_X(t) + F_Y(t) − F_X(t) F_Y(t)
(iii) The random variable X has an exponential distribution with
parameter 4 and, independently, Y has an exponential distribution
with parameter 4. Obtain the distribution function of minimum of X
and Y and state its mean.
Question 93. A coin has two sides, “heads” and “tails”. Such a coin with P(heads) = p is
tossed repeatedly until it lands “heads” for the first time. Let X be the
number of tosses required.
Suppose the process is repeated independently a total of n times,

producing values of the variables X1, X2, … , Xn , where each Xi has the
same distribution as X.
Let Y = min(X1, X2, … , Xn), so Y is the smallest number of tosses required

to produce a “heads” in the n repetitions of the experiment.
(i) Explain why, for each i = 1, 2, …, n, P(X_i ≥ x) is given by
P(X_i ≥ x) = (1 – p)^(x−1) , x = 1, 2, … .
(ii) (a) Find an expression for P(Y ≥ y).
(b) Hence deduce the probability function of Y.

Question 94. A measure of skewness is defined as: ψ = (mean − mode) / standard deviation.
Find the value of ψ for a gamma distribution with parameters α = 2.5 and
𝜆= 0.4.
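For Question 94, taking ψ to be (mean − mode) / standard deviation and using the standard gamma results mean = α/λ, mode = (α − 1)/λ (valid for α > 1) and variance = α/λ^2, the value can be sketched as:

```python
import math

alpha, lam = 2.5, 0.4

mean = alpha / lam                  # 6.25
mode = (alpha - 1) / lam            # 3.75, valid because alpha > 1
std_dev = math.sqrt(alpha) / lam    # square root of alpha / lam^2

psi = (mean - mode) / std_dev
print(round(psi, 3))
```

Algebraically ψ simplifies to 1/√α for any gamma distribution with α > 1, so the result does not depend on λ.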
Question 95. A secretary is given 100 computer passwords and only one, which is
correct, opens a file. Since the secretary has no information on the correct
password, she tries to open using one of the passwords. She randomly
chooses one and discards it if incorrect until she finds the correct one.
(i) Calculate the probability that she obtains the correct password in
the third attempt.
A security system has been set up so that if three incorrect passwords are
tried before the correct one, the computer file is locked and access to it is
denied.
(ii) Calculate the probability that the secretary will gain access to the
file.
The secretary selects a password tries it and if it does not work, puts it
back with the other passwords before randomly selecting a new password.
(iii) Calculate the probability that the correct password is found on the
tenth attempt.
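The three parts of Question 95 contrast sampling without replacement (parts (i) and (ii)) with sampling with replacement (part (iii)). A sketch using exact fractions, under the reading that access is gained if the correct password is found within the first three attempts:

```python
from fractions import Fraction

n = 100  # number of passwords, exactly one of which is correct

# (i) Without replacement: wrong, wrong, then correct on the third attempt.
p_third = Fraction(n - 1, n) * Fraction(n - 2, n - 1) * Fraction(1, n - 2)

# (ii) Access is gained if the correct password is among the first three tried.
#      Each position is equally likely to hold the correct password.
p_access = Fraction(3, n)

# (iii) With replacement: nine failures followed by a success (geometric).
p_tenth_with_replacement = (1 - 1 / n) ** 9 * (1 / n)

print(p_third, p_access, round(p_tenth_with_replacement, 6))
```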
Question 96. Let X_1, X_2, …, X_n be iid random variables from exponential distribution with
parameter λ. Find the pdf of V = max(X_1, X_2, …, X_n).
Question 97. Derive an iterative/ recursive formula for the probability function of the
Poisson distribution.
[Hint : Result of the form P(X = x) = k(x, λ) P(X = x − 1)]
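The ratio of successive Poisson probabilities is P(X = x) / P(X = x − 1) = λ/x, which gives a recursion starting from P(X = 0) = e^(−λ). The sketch below builds the probabilities this way and compares them with the direct formula; λ = 2 is an arbitrary illustrative value.

```python
import math

def poisson_pmf_recursive(lam, x_max):
    """Return [P(X=0), ..., P(X=x_max)] using P(X=x) = (lam / x) * P(X=x-1)."""
    probs = [math.exp(-lam)]
    for x in range(1, x_max + 1):
        probs.append(probs[-1] * lam / x)
    return probs

lam = 2.0
recursive = poisson_pmf_recursive(lam, 10)
direct = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(11)]

# Largest absolute difference between the two computations.
max_gap = max(abs(a - b) for a, b in zip(recursive, direct))
print(max_gap)
```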
Question 98. If U denotes a continuous random variable that is uniformly distributed

over the range (-1, 1) and V denotes a discrete random variable that is
equally likely to take any of the values {-1,- 0.5 ,0, 0.5 ,1}, calculate the
variance of U and V . Comment on your answers.
Question 99. A random variable has a lognormal distribution with mean 10 and
variance 4. Calculate the probability that the variable will take a value
between 7.5 and 12.5.
Question 100. The random variable N has a Poisson distribution with parameter 𝜆 and
P(N = 1| N ≥ 1) = 0.4 . Calculate the value of 𝜆 to 2 decimal places.
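For Question 100, P(N = 1 | N ≥ 1) = λe^(−λ) / (1 − e^(−λ)) = λ / (e^λ − 1), which has no closed-form inverse. A simple bisection sketch (the bracket [0.1, 5] is an assumption chosen so that the function crosses 0.4 inside it):

```python
import math

def cond_prob_one(lam):
    """P(N = 1 | N >= 1) for N ~ Poisson(lam)."""
    return lam / (math.exp(lam) - 1)

target = 0.4
lo, hi = 0.1, 5.0   # cond_prob_one is decreasing: above target at lo, below at hi

for _ in range(60):
    mid = (lo + hi) / 2
    if cond_prob_one(mid) > target:
        lo = mid    # probability still too high, so lambda must be larger
    else:
        hi = mid

lam_hat = (lo + hi) / 2
print(round(lam_hat, 2))
```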
Question 101. Let X follow an Exp(λ) distribution.
(i) Determine the probability density function of the random variable
Y, where Y = X^2
(ii) Show that Y has a Weibull distribution, stating clearly the
parameters of the distribution.
Question 102. Obtain the recursive relation for the Binomial distribution (n,p) of the
form P(X = x) = g(x, n, p) P(X = x − 1); x = 1,2,3, … , n; 0 < p < 1 where
g(x, n, p) is a general function of x, n and p.
Question 103. A sports scientist is building a statistical model to describe the number of
attempts a high jump athlete will have to make until she succeeds in
clearing a certain height for the first time during an indoor sports event.
For this model the scientist considers a geometric distribution with
probability of success p. The cumulative distribution function of the
geometric distribution is given as
F_X(x) = 1 − (1 − p)^x , x = 1, 2, 3, …
(i) (a) State the assumptions that the scientist needs to make for
considering this distribution.
(b) Comment on the validity of the assumptions in part (i)(a).
The athlete has tried n jumps without success.

(ii) (a) Determine the probability that the athlete will require more
than x additional jumps to succeed in clearing the height.
(b) Comment on what the answer in part (ii)(a) means for the
athlete.

ANSWERS
!"
Ans.1. (0.8)
!
Ans.2. 𝑛 = 25, 𝑃 = !
Ans.3. E(X) = 1
Ans.4. E(X) = Log2
Ans.5. E(X) = 1
Ans.5. (1) B (n, p) (2) P (𝜆) (3) Neg. Binomial
Ans.6. 0.0902
Ans.7. (i) 0.2231 (ii) 0.04202
Ans.8. (i) e^(−0.64) (ii) e^(−0.64) (1 + 0.64)
Ans.9. 𝑃(𝑋 ≥ 3) = 1
Ans.10. 0.09852
Ans.11. 0.401878
Ans.12. 0.02917
Ans.13. 0.1934
Ans.14. 0.20
Ans.16. 0.0645
Ans.17. 0.0117
Ans.18. 0.2880
Ans.19. (a) 0.164 (b) 0.165
Ans.20. 0.093
Ans.21. (a) 0.0758 (b) 0.9856
Ans.22. 0.1550
Ans.23. 0.7586 (Hint 𝜆 = 7.5)
Ans.25. (i) 0.1298 (ii) 0.11009

Ans.26. (i) 0.0625 (ii) 0.046875 (iii) 0.20507
Ans.27. 0.050421
Ans.28. 0.87949
Ans.29. (i) 1/28 (ii)15/56 (iii) 27/56 (iv) 3/14
Ans.30.(a) 0.0001 (b) 0
Ans.31.(a) 0.95728 (b) 0.18943 (c) 0.05674 (d) 0.27235
Ans.32. (i) 0.3012 (ii) 0.0498
Ans.33. (a) (i) P(X=x) = (0.6)!!! (0.4) (ii) E(X) = 1.5, 𝜎 = 3.75
(b) (i) 0.1353 (ii) 0.3679 (iii) E(X) = 6 V(X) = 36 P(X > 18) = 0.049
Ans.34. (i) 3 min (ii) 0.1353 (iii) 0.0855
Ans.35. (i) 0.2451 (ii) 0.1170
Ans.36. (i) 0.1048 (ii) 0.1953
Ans.37. (i) 0.0634 (ii) 0.9729 (iii) 0.0634
Ans.38. (i) 0.3679 (ii) 0.2592
Ans.39. (i) 0.0359 (ii) 0.6179
Ans.40. 0.9998
Ans.41. 0.062
Ans.42. (i) 0.0026 (ii) 0.7452
!!!
Ans.43. (a) Mode = !
(b) V(X) = 𝑎(𝑎 − 𝑏)
! ! ! !"# !!!" !
Ans.44. (a) ! !! !
. exp !
( ! ) ; 𝑥 > 0, (b) E(X) = 162.754, V(X) = 53.598 𝑒 !"
(c) 0.0611
Ans.45. 0.2304
Ans.46. 0.1707
Ans.47. 0.0086
Ans.48. 0.2066
Ans.49. 0.3025
Ans.50. (a) 0.1669 (b) 0.4073 (c) 0.4073
Ans.51. (a) New number of trials are one-fourth of original number of trials
(b) New s.d. is square root of k times the original s.d.
Ans.52. 0.68
Ans.53. (i) 0.534 (ii) 0.466
Ans.54. Approx 1
Ans.55. (i) 0.016 (ii) 0.044
Ans.56. (i) 0.5550 (ii) n = 4
Ans.57. n = 11
Ans.58. p = 0.2
Ans.59. p = 0.025
Ans.60. 0.99863
Ans.61. 0.9863
Ans.62. (i) 0.2231 (ii) 0.19126
Ans.63. 0.0008
Ans.64. (i) 60.65 (ii) 9.025
Ans.65. P(X = r) = e^(−100) 100^r / r!
Ans.66. 0.02352
Ans.67. 0.08088
Ans.68. (i) 𝜆=1 (ii) Mean = 1 (iii) Coeff of Skewness = 1
Ans.69. P(X + Y = k) = e^(−(λ_1+λ_2)) (λ_1+λ_2)^k / k!
Ans.70. 0.25
Ans.71. 1/3
Ans.72. 0.75
Ans.74. F(x) = (x − a)/(b − a)
Ans.76. 0.0749
Ans.77. (i) 0.7653 (ii) 0.00135 (iii) 0.3174
Ans.78. (i) 117 (ii) 352
Ans.79. (i) 1,587 (ii) 6,826
Ans.80. (i) 0.0067 (ii) 0.0183
Ans.81. (a) Coefficient of variation = 1/√λ (COV decreases as mean λ increases)
(b) Coefficient of variation = 1 (No effect)
(c) Coefficient of variation = √(2/ν) (COV decreases as mean ν increases)
Ans.82. 0.13362
Ans.83. (i) 0.60653 (ii) 0.43425 (iii) 0.36788
Ans.84. (i) (a) 0.10565 (b) 0.06681
(ii) 0.6324
(iii) 0.07985
Ans.85. n=6
Ans.86. (i) 0.08192 (ii) 10
Ans.87. (ii) 400k2(1 – P)2 + 400kP(1 – P) + 400P (iii) 0.842
Ans.87. L = 1.8785
Ans.88. 0.1350
Ans.89. 0.08177
Ans.90. (i) 0.94887 (ii) 0.12604 (iii) 0.86663 (iv) 0.07726
Ans.92. (iii) CDF = 1 − e^(−8t) and Mean = 1/8
Ans.93. (ii) (a) ((1−p)^n)^(y−1) (b) The probability in part (a) implies that Y has the same
distribution as X but with 1 − (1−p)^n in place of p
Ans.94. 0.632
Ans.95 (i) 0.01 (ii) 0.03 (iii) 0.009135
Ans.96. f_V(v) = nλ e^(−λv) (1 − e^(−λv))^(n−1)
Ans.97. P(X = 0) = e^(−λ); P(X = x) = (λ/x) P(X = x − 1) for x = 1,2,3, …
Ans.98. Var(U)= 1/3 Var(V)=1/2
The variance is a measure of the spread of values. Both distributions take values
in the range from -1 to +1 and are centred around zero. However, the variance of
V is greater than the variance of U because there is a greater probability of
obtaining the extreme values -1 and +1.
Ans.99. 0.802
Ans.100. 𝜆 ≈ 1.62
Ans.101. (i) f_Y(y) = (1/2) λ y^(−1/2) e^(−λ y^(1/2))
(ii) c = λ and γ = 1/2
Ans.102.
Ans.103. (i ) (a) Needs to assume that each time the athlete tries she independently has
the same probability p of passing the height, i.e. that attempts here are iid.
(b) Given that the attempts are at the same event and on the same day, it is
reasonable to assume that conditions are the same (independence) and that
probability of success does not change.
(ii) (a) (1 − p)^x
(b) The lack of success on the first n jumps is irrelevant – under this model the
chances of success are not any better because there have been n attempts already.

ASSIGNMENT – 3
GENERATING FUNCTIONS
Question 1. A continuous random variable X has the following p.d.f.
f(x) = k x e^(−x/2) ; k is a constant, x > 0
a) Find the value of k for 𝑓(𝑥) to be a valid probability density

function.
b) Find the cumulant generating function of X.
c) Using the cumulant generating function or otherwise, find the
mean and variance of X.
Question 2. Let X follow the Poisson distribution with parameter λ=2. Obtain the
MGF of X and 2X.
Question 3. An unbiased coin is tossed twice. If X denotes the number of heads that
appear, find the MGF of X.
Question 4. Derive the cumulant-generating function of a Gamma distribution with

parameters α and λ. Hence find the mean and variance of the distribution.
Question 5. A random variable X has the following p.d.f.

f(x) = (1/(2a)) e^(−|x−a|/a) ; a > 0, −∞ < x < ∞
0; otherwise
Find the cumulant generating function and find mean and variance.
Question 6. A random variable X has exponential distribution with pdf :

f(x) = e^(−x) ; x > 0.
i. Obtain the m.g.f. of Y = 1 − e^(−X)

ii. What is the distribution of Y? Comment.
Question 7. The size of a claim, X, which arises under a certain type of insurance
contract, is to be modeled using a gamma random variable with
!
parameter ! and a, (both> 0) such that the moment generating function of
X is given by
M(t) = (1 − θt)^(−α) , t < 1/θ
By using the cumulant generating function of X, or otherwise, show that
the coefficient of skewness of the distribution of X is given
by 2/√α.
Question 8. Show that the MGF for a Binomial (n, p) distribution is
M_X(t) = (1 − p + p e^t)^n.
Question 9. Let X_1 and X_2 be independent Poisson random variables with respective
means μ_1 and μ_2. Assuming the moment generating function of a Poisson
random variable, determine the moment generating function of X_1 + X_2
and hence state the distribution of X_1 + X_2.
Question 10. Let X be a random variable which has a Poisson distribution with
parameter µ.
i. Write down the cumulant generating function K_X(t).
ii. By differentiation of K_X(t), show that the mean and variance of X
are both equal to µ.
Question 11. Let X be a random variable with moment generating function.

!
!!! !
𝑀! 𝑡 = !
− ∞ < 𝑡 < ∞. Calculate the variance of X.
Question 12. Suppose that X is a continuous random variable uniformly distributed on

(0, 1).
i. Show that the moment generating function of Y = −log X is
M_Y(t) = 1/(1 − t) for t < 1
ii. Hence state the distribution of Y.
Question 13. The cumulant generating function of a random variable X is given by:
C_X(t) = log M_X(t) = 2{(1 − t)^(−10) − 1}
where M_X(t) is the moment generating function.
Determine the mean and variance of the distribution of X.
!
Question 14. The random variable X has an exponential dist. with mean !. It is found
that 𝑀! −𝑏 ! = 0.2. Find b.
Question 15. Consider a negative binomial variable X with probability function given
by:
P(X = x) = C(k+x−1, x) p^k q^x , x = 0,1,2, … where 0 < p < 1 and q = 1 – p
(here C(n, r) denotes the binomial coefficient)
(i) Show that the moment generating function is given by:
M(t) = [p / (1 − q e^t)]^k for q e^t < 1
(ii) Determine E(X) and E(X^2) by expanding M(t) as a power series as far as
the term in t^2, and hence verify that the mean and variance of X are
given by kq/p and kq/p^2 respectively.
Question 16. Claim sizes in a certain insurance situation are modelled by a distribution
with moment generating function M(t) given by: M(t) = (1 − 10t)^(−2)
Show that E[X^2] = 600 and find the value of E[X^3].
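Reading the MGF in Question 16 as (1 − 10t)^(−2), it is the MGF of a gamma distribution with shape α = 2 and scale θ = 10, whose raw moments are E[X^k] = θ^k · α(α + 1)…(α + k − 1). A sketch of that shortcut (the function name is illustrative; the intended exam route is to differentiate or expand M(t)):

```python
def gamma_raw_moment(k, shape, scale):
    """E[X^k] for a gamma variable with MGF (1 - scale*t)^(-shape)."""
    rising = 1
    for j in range(k):
        rising *= shape + j      # shape * (shape+1) * ... * (shape+k-1)
    return scale ** k * rising

shape, scale = 2, 10
second_moment = gamma_raw_moment(2, shape, scale)
third_moment = gamma_raw_moment(3, shape, scale)
print(second_moment, third_moment)
```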
Question 17. Consider the discrete random variable X with probability function:
f(x) = 4 / 5^(x+1) , x = 0,1,2, …
(i) Show that the moment generating function of the distribution of X
is given by M_X(t) = 4(5 − e^t)^(−1) for e^t < 5
(ii) Determine E[X] using the moment generating function given in
part (i).
Question 18. The claim amount X in units of £1,000 for a certain type of industrial
policy is modeled as a gamma variable with parameters α = 3 and λ = 1/4.
(i) Use moment generating functions to show that (1/2)X ~ χ^2 with 6 degrees of freedom.
(ii) Hence use tables to find the probability that a claim amount
exceeds £20,000.
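Part (ii) of Question 18 can be checked without tables. Assuming X/2 has a chi-square distribution with 6 degrees of freedom, P(X > 20) = P(χ^2_6 > 10), and for even degrees of freedom 2k the upper tail has the closed form e^(−x/2) Σ_{j<k} (x/2)^j / j!. A sketch (the helper name is illustrative):

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with even df exceeds x), via the Poisson-sum identity."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even degrees of freedom")
    k = df // 2
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))

# X is in units of 1,000, so a claim over 20,000 means X > 20, i.e. X/2 > 10.
p_claim_over_20000 = chi2_upper_tail_even_df(10.0, 6)
print(round(p_claim_over_20000, 4))
```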
Question 19. Let X be a random variable with moment generating function M_X(t) and
cumulant generating function C_X(t) and let Y = aX + b where a and b are
constants. Let Y have moment generating function M_Y(t) and cumulant
generating function C_Y(t).
(i) Show that C_Y(t) = bt + C_X(at).
(ii) Find the coefficient of skewness of Y in the case that M_X(t) = (1 – t)^(−2)
and Y = 3X + 2 (you may use the fact that C_Y'''(0) = E[(Y − μ_Y)^3]).
Question 20. Let X have a normal distribution with mean μ and standard deviation σ,
and let the i-th cumulant of the distribution of X be denoted by κ_i. Assuming
the moment generating function of X, determine the values of
κ_2, κ_3 and κ_4.
Question 21. (i) Determine the moment generating function of the two parameter
exponential random variable X, defined by the probability density
function: f(x) = λ e^(−λ(x−α)) , x ≥ α where λ, α > 0.
(ii) Hence, or otherwise, determine the mean and variance of the

random variable X.
Question 22. Use a series expansion to find E[X], E[X^2] and E[X^3] of a random
variable, X, with MGF given by M_X(t) = (1 − t/5)^(−1), t < 5.
Question 23. Show that the MGF of normal distribution is given by M_X(t) = e^(μt + σ^2 t^2 / 2).
Question 24. Use MGFs to show that if X has a Gamma (α, λ) distribution, then 2λX has
a χ^2 distribution with 2α degrees of freedom. Hence, if X is Gamma (20, 0.4), estimate P(X>75).
Question 25. Let X be a random variable with probability density function

f(x) = (1/2) e^(−|x|) ; −∞ < x < ∞
(i) Show that the moment generating function of X is given by
M_X(t) = (1 − t^2)^(−1) for |t| < 1.
(ii) Hence find the mean and the variance of X using the moment
generating function in part (i).

ANSWERS
Ans.1. (i) k = 1/4 (ii) C_X(t) = −2 log(1 − 2t); t < 1/2 (iii) Mean = 4 and Var = 8
Ans.2. M_X(t) = e^(2(e^t − 1)) M_2X(t) = e^(2(e^(2t) − 1))
Ans.3. M_X(t) = (1/4) [1 + e^t]^2
Ans.4. K_X(t) = −α log(1 − t/λ) K_X'(0) = α/λ, K_X''(0) = α/λ^2
Ans.5. K_X(t) = at − log(1 − a^2 t^2) Mean = a Var = 2a^2
Ans.6. (i) (e^t − 1)/t (ii) Y ~ U(0,1)
Ans.9. X_1 + X_2 ~ P(μ_1 + μ_2)
Ans.10. (i) K_X(t) = µ(e^t − 1) (ii) Mean = µ (iii) Var = µ
Ans.11. 2
Ans.13. E(X) = 20 V(X) = 220
Ans.14. b = 4 Ans.16. 24,000 Ans.17. 0.25
Ans.18. (ii) 0.1247 Ans.20. κ_2 = σ^2, κ_3 = κ_4 = 0

Ans.21. (i) M_X(t) = (λ/(λ − t)) e^(αt) (ii) E(X) = 1/λ + α and V(X) = 1/λ^2
Ans.22. E(X) = 1/5 E(X^2) = 2/25 E(X^3) = 6/125
Ans.24. Probability is just less than 0.025
Ans.25. (ii) Mean = 0 and Var = 2

ASSIGNMENT – 4
JOINT DISTRIBUTION
Question 1. Determine the value of k for which the function given by
f(x, y) = kxy for x = 1,2,3 ; y = 1,2,3
can serve as a joint probability distribution.
Question 2. Given the values of the joint probability distribution of X and Y shown in the
table
              X
          -1      1
     -1   1/8    1/2
Y     0    0     1/4
      1   1/8     0
Find (a) the marginal distribution of X;
(b) the marginal distribution of Y;
(c) The conditional distribution of X given 𝑌 = −1;
Question 3. The joint distribution of two random variables U and V is as follows:
              V
          5     10    15
     1    a     3a     0
U    2   11a    2a    8a
     3    a      a    3a
Calculate (i) the value of a (ii) E(2U + V) (iii) E[U | V = 5]
Question 4. If the joint probability distribution of X and Y is given by

                 X
           -1     0     1    Total
     -1   1/6   1/3   1/6    2/3
Y     0    0     0     0      0
      1   1/6    0    1/6    1/3
  Total   1/3   1/3   1/3
Show that their covariance is zero even though the two random variables are not
independent.
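The claim in Question 4 can be verified directly from the joint table. The sketch below assumes the table is read with X across the columns and Y down the rows, and uses exact fractions; the variable names are illustrative.

```python
from fractions import Fraction as F

# Joint pmf keyed by (x, y), read from the table in Question 4.
joint = {
    (-1, -1): F(1, 6), (0, -1): F(1, 3), (1, -1): F(1, 6),
    (-1, 0): F(0),     (0, 0): F(0),     (1, 0): F(0),
    (-1, 1): F(1, 6),  (0, 1): F(0),     (1, 1): F(1, 6),
}

ex = sum(x * p for (x, y), p in joint.items())
ey = sum(y * p for (x, y), p in joint.items())
exy = sum(x * y * p for (x, y), p in joint.items())
cov = exy - ex * ey

# Marginal distributions, used to test the product rule for independence.
px = {x: sum(p for (a, b), p in joint.items() if a == x) for x in (-1, 0, 1)}
py = {y: sum(p for (a, b), p in joint.items() if b == y) for y in (-1, 0, 1)}
independent = all(joint[(x, y)] == px[x] * py[y] for (x, y) in joint)

print(cov, independent)
```

Independence fails because, for example, P(X = 0, Y = 1) = 0 while P(X = 0) P(Y = 1) = 1/9.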
Question 5. The COV(X,Y) = 5. What is the COV(2X-3, 3Y+5)?
Question 6. If the joint probability density of X and Y is given by
f(x, y) = x + y for 0 < x < 1, 0 < y < 1
          0 elsewhere
Find the joint distribution function of these two random variables.
Question 7. Given the joint probability density

f(x, y) = (2/3)(x + 2y) for 0 < x < 1, 0 < y < 1
          0 elsewhere
Find the marginal densities of X and Y.
Question 8. Given the joint probability density
f(x, y) = 4xy for 0 < x < 1, 0 < y < 1
          0 elsewhere
Find the marginal densities of X and Y and the conditional density of X
given 𝑌 = 𝑦.

Question 9. If the probability density of X is given by
f(x) = x/2 for 0 < x ≤ 1
       1/2 for 1 < x ≤ 2
       (3 − x)/2 for 2 < x < 3
       0 elsewhere
Find the expected value of g(X) = X^2 − 5X + 3.
Question 10. Find the expected value of the random variable X whose probability density is
given by
𝑥 for 0 < 𝑥 < 1
𝑓 𝑥 = 2 − 𝑥 for 1 ≤ 𝑥 < 2
0 elsewhere
Question 11. Let (X, Y) have the joint density
f(x, y) = x e^(−x(1+y)) for x ≥ 0, y ≥ 0
          0 elsewhere
Show that (i) E(Y) does not exist.
(ii) E(Y | X = x) = 1/x , x > 0
Question 12. Test whether 𝑋 and 𝑌 are independent?
𝑓 𝑥, 𝑦 = 4𝑥𝑦; 0 < 𝑥 < 1, 0 < 𝑦 < 1
= 0; otherwise
Question 13. Let 𝑓 𝑥, 𝑦 = 𝑘 𝑥 + 𝑦 0 < 𝑥 < 1 ; 0 < 𝑦 < 1

! !
Find (i) the value of k. (ii) 𝑃 0 < 𝑋 < ! , 0 < 𝑌 < !
(iii) 𝑓! ! (𝑦 𝑥) (iv) 𝐸 𝑋 , 𝐸 𝑌 & 𝐸(𝑋 + 𝑌)
Question 14. Let 𝑓 𝑥, 𝑦 = 𝑒 !(!!!) 𝑥 > 0, 𝑦 > 0
Find (a) 𝑃[𝑋 > 1] (b) 𝑃[1 < 𝑋 + 𝑌 < 2]

!
(c) 𝑃[𝑋 < 𝑌 𝑋 < 2𝑌] (d) 𝑚 such that 𝑃 𝑋 + 𝑌 < 𝑚 = !
Question 15. Find the joint probability density of the two random variable X and Y whose joint
distribution function is given by
F(x, y) = (1 − e^(−x))(1 − e^(−y)) for x > 0 and y > 0
          0 elsewhere
Question 16. If the joint probability density of X and Y is given by
f(x, y) = 24xy for 0 < x < 1, 0 < y < 1, x + y < 1
          0 elsewhere
!
Find 𝑃 𝑋 + 𝑌 < ! .

Question 17. If the joint probability density of X and Y is given by
f(x, y) = 2 for x > 0, y > 0, x + y < 1
          0 elsewhere
! !
Find 𝑃 𝑋 ≤ ! , 𝑌 ≤ ! ;
Question 18. The joint density function of X and Y is given by:
f(x, y) = 2 for x > 0, y > 0, x + y < 1
          0 otherwise
i. Show that E(X) = 1/3
ii. Find Cov (𝑥, 𝑦).
Question 19. Let f(x, y) = 3(x + y) , 0 < x + y < 1, 0 < x < 1, 0 < y < 1
Find (a) f_X(x) (b) P[X + Y < 0.5] (c) E[Y | X = x] (d) cov(X, Y)
Question 20. Let f(x, y) = (1/2) xy , 0 < y < x, 0 < x < 2.
Find the marginal distribution of X & Y. Are X & Y independent?
Question 21. Suppose X and Y have joint density by

f(x, y) = (1 + xy)/4 ; |x| < 1, |y| < 1
Examine whether x and y are independent random variables.
Question 22. Let the two dimensional random variable(X, Y) have the following joint density
function;
f_X,Y(x, y) = (1/8)(6 − x − y) ; 0 < x < 2, 2 < y < 4
            = 0 elsewhere
(a) Find 𝐸(𝑌/𝑋 = 𝑥) (b) 𝐸(𝑋𝑌/𝑋 = 𝑥)
(c) show that 𝐸 𝑌 = 𝐸[𝐸(𝑌/𝑋)].
Question 23. The p.d.f. of (X, Y) is
𝑓 𝑥, 𝑦 = 8𝑥𝑦, 0 < 𝑥 < 𝑦 < 1
= 0 otherwise
Find (i) 𝐸(𝑌/𝑋 = 𝑥) (ii) 𝐸(𝑋𝑌/𝑋 = 𝑥) (iii) Var (𝑌/𝑋 = 𝑥)
Question 24. Let X and Y have joint probability density function
f(x, y) = 21 x^2 y^3 for 0 < x < y < 1
          0 otherwise
Find (i) 𝐸(𝑋/𝑌 = 𝑦) (ii) 𝑉(𝑋/𝑌 = 𝑦)
Question 25. Let X and Y be two random variables each taking the three values -1, 0 and 1 and having the
joint probability distribution
Y/X    -1     0     1
-1      0    0.1   0.1
 0     0.2   0.2   0.2
 1      0    0.1   0.1
Find (i) the corr. (X, Y) (ii) E(Y/X = −1) and V(Y/X = −1)
Question 26. The random variables (X, Y) have the following joint probability mass function
          X
y      1      2      3
2    1/12    1/6   1/12
3    1/6      0    1/6
4     0      1/3    0
i. Show that X and Y are dependent.
ii. Give a probability table of random variables U and V that have the same
marginals as X and Y but are independent.
Question 27. The joint pmf of X and Y is given below.
          X
Y      0      1      2
0     1/6    1/3   1/12
1     2/9    1/6     0
2    1/36     0      0
Find (a) Mean values of X and Y (b) Find the covariance between X and Y
(c) Conditional distribution of X given 𝑌 = 1 (d) correlation between X and Y.
Question 28. Let the joint p.d.f. of (X, Y) be
f(x, y) = (2/3)(x + 2y) ; 0 < x, y < 1
!
Find (a) Conditional mean of X given 𝑌 = !
!
(b) Conditional variance of X given 𝑌 = !.
Question 29. The joint cdf of(X, Y) is
𝐹!,! 𝑥, 𝑦 = 1 − 𝑒 !!! − 𝑒 !!! + 𝑒 !(!!!!!) ; 𝑥, 𝑦 > 0

= 0 elsewhere
Find (a) joint pdf of (X, Y) (b) Marginal pdf of X and Y
(c) 𝑃 𝑋 ≤ 1 ∩ 𝑌 ≤ 1 (d) 𝑃 1 < 𝑋 < 3 ∩ 1 < 𝑌 < 2
Question 30. The joint probability distribution of the amounts X and Y of two commodities
supplied to a market has the probability density function (pdf)
𝑓 𝑥, 𝑦 = 𝑘𝑥𝑦 4𝑥 + 9𝑦 𝑒 ! !!! ; 𝑥, 𝑦 > 0, k is a constant.
a. Determine the constant k.

b. Find the conditional pdf of X given 𝑌 = 𝑦 and that of Y given 𝑋 = 𝑥.
c. Find the conditional variance of X given 𝑌 = 𝑦.
Question 31. Suppose (X, Y) have joint pdf
f(x, y) = 2 exp[−(x + y)]; 0 ≤ y ≤ x < ∞
= 0 elsewhere
Derive the conditional expectation 𝐸(𝑋/𝑌 = 𝑦).
Question 32. Suppose that the joint pdf of X_1 and X_2 is given by
f(x_1, x_2) = x_1^2 + (x_1 x_2)/3 ; 0 ≤ x_1 ≤ 1, 0 ≤ x_2 ≤ 2
            = 0; otherwise
Find (a) the conditional density of 𝑋! given 𝑋! = 𝑥! (b) 𝐸(𝑋! 𝑋! = 𝑥! )
(c) Verify that 𝐸 𝐸 𝑋! 𝑋! = 𝑥! = 𝐸(𝑋! )
Question 33. The joint pdf of (X, Y) is given by

𝑓 𝑥, 𝑦 = 𝑥 exp[ − 𝑥(1 + 𝑦)]; 𝑥 > 0, 𝑦 > 0
a. Find the marginal pdf’s of X and Y
b. Show that E[Y] does not exist.
c. Calculate E[Y/X] and comment in light of (b).
Question 34. Let X_1, X_2, …, X_n be iid exponential random variables with mean 1/λ
a. Using mgf, find the distribution of Y = X_1 + X_2 + ⋯ + X_n.

b. Using cgf of Y, find the mean and variance of Y.
Question 35. Let Z be a random variable with mean 0 and variance 1, and let X be a random
variable independent of Z with mean 5 and variance 4. Let 𝑌 = 𝑋 − 𝑍.Calculate
the correlation coefficient between X and Y.
Question 36. Consider three random variables X, Y and Z with the same variance σ^2 = 4.
Suppose that X is independent of both Y and Z but Y and Z are correlated, with
correlation coefficient ρ_YZ = 0.5.
i. Calculate the covariance between X and U, where 𝑈 = 𝑌 + 𝑍.

ii. Calculate the covariance between Z and V, where 𝑉 = 3𝑋 − 2𝑌.
iii. Calculate the variance of W, where 𝑊 = 3𝑋 − 2𝑌 + 𝑍.
Question 37. Suppose that the joint probability distribution of two, random variables X and Y
is given by the following table:
              Y
          2     4     6
     1   0.2   0.0   0.2
X    2   0.0   0.2   0.0
     3   0.2   0.0   0.2
(i) Show that X and Y are uncorrelated, but are not independent.
(ii) Leaving the probabilities in the first and third rows of the table the same,
change the entries in the second row so that X and Y are independent.
Question 38. Let X and Y be random variables which each takes values 1 and 2 only.
Calculate E[X|Y = 2], given that E[X] = 6/5, E[X|Y = 1] = 7/6, and P(Y=1)=3/5.
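Question 38 rests on the law of total expectation, E[X] = E[X|Y = 1] P(Y = 1) + E[X|Y = 2] P(Y = 2), solved for the unknown conditional mean. A sketch with exact fractions (variable names are illustrative):

```python
from fractions import Fraction as F

e_x = F(6, 5)            # E[X]
e_x_given_y1 = F(7, 6)   # E[X | Y = 1]
p_y1 = F(3, 5)           # P(Y = 1)
p_y2 = 1 - p_y1          # Y takes only the values 1 and 2

# Rearranged law of total expectation.
e_x_given_y2 = (e_x - e_x_given_y1 * p_y1) / p_y2
print(e_x_given_y2)
```

The result lies between 1 and 2, as it must for a variable taking only those two values.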
Question 39. The continuous random variables X and Y have a bivariate probability density
function
f(x,y) = 2 0 < x + y < 1, x > 0, y > 0
The conditional distribution of X given Y = y is a uniform distribution with
probability density function:
f(x | y) = 1/(1 − y) 0 < x < 1 − y
and the marginal distribution of Y is a beta distribution, with probability density

function:
f(y) = 2(1-y) 0<y<1
(i) Show that the conditional expectation of X given Y = y is
E(X | Y = y) = (1 − y)/2
and obtain the conditional variance of X given Y = y.
(ii) Verify in this case that var(X) = var(E(X|Y)) + E(var(X|Y)).
Question 40. (i) Show that for continuous random variables X and Y:
E(Y) = E[E(Y|X)]
(ii) Suppose that a random variable X has a standard normal distribution, and the
conditional distribution of a Poisson random variable Y given the value of X =
x has expectation g(x) = x^2 + 1.
Determine E(Y) and var(Y).
Question 41. Consider two random variables X and Y with joint probability density function
(PDF):
f(x, y) = (4/3)(1 − xy) 0 < x < 1, 0 < y < 1
The marginal PDF of X is given by:
f(x) = (2/3)(2 − x) 0 < x < 1
with a corresponding marginal PDF for Y by symmetry. (You are not asked to
verify these marginal densities.)
(i) Show that the conditional PDF of Y given X = x is given by:
f(y | x) = 2(1 − xy)/(2 − x) 0 < y < 1
(ii) (a) Determine the conditional expectation E(Y|X = x) as a function of x and

hence determine E(Y)
(b) Verify your answer in part (a) by determining E(Y) directly from the
marginal PDF of Y.
Question 42. (i) Let Y be the sum of two independent random variables X1 and X2 that is:
Y = X1 + X2
Show that the moment generating function (MGF) of Y is the product of the
MGFs of X1 and X2.
(ii) Let X1 and X2 be independent gamma random variables with parameters
(α1, λ) and (α2, λ), respectively.
Use MGFs to show that Y = X1 + X2 is also a gamma random variable and
specify its parameters.
Question 43. The number of claims, X, arising on each policy in a certain portfolio depends on
another random variable Y. X is considered to follow a Poisson distribution with
mean Y. The variable Y itself is assumed to have a gamma distribution with
parameters (a,b).
Find expressions for the unconditional moments E(X) and E(X^2) using
appropriate conditional moments.
Question 44. Let the random variables (X,Y) have the joint probability density function:
f_X,Y(x, y) = exp{−(x + y)} x > 0, y > 0
(i) Derive the marginal probability density functions of X and Y and hence
determine (giving reasons) whether or not the two variables are independent.
(ii) Derive the joint cumulative distribution function FX,Y(x, y).
Question 45. The table below shows a bivariate probability distribution for two discrete
random variables X and Y:
       X=0    X=1    X=2
Y=1    0.15   0.20   0.25
Y=2    0.05   0.15   0.20
Find the value of E(X|Y = 2).
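A conditional expectation read from a joint table can be checked numerically. The following is a minimal sketch rather than a model solution; the dictionary layout and the helper name `conditional_mean_x` are illustrative assumptions, not part of the workbook.

```python
from fractions import Fraction as F

# Joint probabilities P(X = x, Y = y) from the table in Question 45,
# keyed as (x, y). Exact fractions avoid rounding noise.
joint = {
    (0, 1): F(15, 100), (1, 1): F(20, 100), (2, 1): F(25, 100),
    (0, 2): F(5, 100),  (1, 2): F(15, 100), (2, 2): F(20, 100),
}


def conditional_mean_x(joint, y):
    """E(X | Y = y) = sum_x x * P(X = x, Y = y) / P(Y = y)."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    weighted = sum(x * p for (x, yy), p in joint.items() if yy == y)
    return weighted / p_y


e_x_given_y2 = conditional_mean_x(joint, 2)
print(float(e_x_given_y2))
```

The same helper can be called with `y = 1` to obtain the other conditional mean.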
Question 46. Consider two random variables X and Y, for which the variances satisfy
V[X] = 5V[Y] and the covariance cov[X,Y] satisfies cov[X,Y] = V[Y].
Let S = X + Y and D = X – Y
(i) Show that the covariance between S and D satisfies
cov[S,D] = 4V[Y].
(ii) Calculate the correlation coefficient between S and D.
Question 47. The random variables X and Y have a joint distribution with density
function
$f_{XY}(x, y) = 3x$ for 0 < y < x < 1, and $f_{XY}(x, y) = 0$ otherwise.
(i) Determine the marginal densities of X and Y.
(ii) State, with reasons, whether X and Y are independent.
(iii) Determine E(X) and E(Y)
Question 48. X and Y are discrete random variables with joint distribution given below.
         Y = −1   Y = 0   Y = 1
X = 1    0        1/4     0
X = 0    1/4      1/4     1/4
(i) Determine the conditional expectation E[Y|X = 1].
(ii) Determine the conditional expectation E[X|Y = y] for each value of y.
(iii) Determine the expected value of X based on your conditional expectation
results from part (ii).
Question 49. Consider two random variables X and Y with E(X) = 2, V(X) = 4, E(Y) = -3,
V(Y) = 1 and Cov(X,Y) = 1.6. Calculate
(a) the expected value of 5X + 20Y.
(b) the correlation coefficient between X and Y.
(c) the expected value of the product XY.
(d) the variance of X − Y.
Question 50. Claim amounts arising under a particular type of insurance policy are modelled as
having a normal distribution with standard deviation £35. They are also assumed
to be independent from each other. Calculate the probability that two randomly
selected claims differ by more than £100.
Question 51. The random variables X and Y are related as follows: X conditional on Y = y has
a $N(2y, y^2)$ distribution. Y has a N(200, 100) distribution.
Derive the unconditional variance of X, V[X].
Question 52. Consider the random variable X taking the value X = 1 if a randomly selected
person is a smoker, or X = 0 otherwise. The random variable Y describes the
amount of physical exercise per week for this randomly selected person. It can
take the values 0 (less than one hour of exercise per week), 1 (one to two hours)
and 2 (more than two hours of exercise per week). The random variable
$R = (3 - Y)^2 (X + 1)$ is used as a risk index for a particular heart disease.
The joint distribution of X and Y is given by the joint probability function in the
following table.
           Y
X      0      1      2
0      0.2    0.3    0.25
1      0.1    0.1    0.05
(i) Calculate the probability that a randomly selected person does more than
two hours of exercise per week.
(ii) Decide whether X and Y are independent or not and justify your answer.
(iii) Derive the probability function of R.
(iv) Calculate the expectation of R.
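The probability function of a transformed variable such as R can be built by pushing each cell of the joint table through the transformation and accumulating probabilities. The sketch below assumes the risk index reads R = (3 − Y)²(X + 1); the names `risk_index` and `pmf_r` are illustrative only.

```python
from collections import defaultdict
from fractions import Fraction as F

# Joint probabilities P(X = x, Y = y) from the table in Question 52, keyed (x, y).
joint = {
    (0, 0): F(20, 100), (0, 1): F(30, 100), (0, 2): F(25, 100),
    (1, 0): F(10, 100), (1, 1): F(10, 100), (1, 2): F(5, 100),
}


def risk_index(x, y):
    """Assumed form of the risk index: R = (3 - Y)^2 * (X + 1)."""
    return (3 - y) ** 2 * (x + 1)


# Accumulate probability mass on each attainable value of R.
pmf_r = defaultdict(F)  # Fraction() is zero
for (x, y), p in joint.items():
    pmf_r[risk_index(x, y)] += p

expected_r = sum(r * p for r, p in pmf_r.items())

for r in sorted(pmf_r):
    print(r, float(pmf_r[r]))
print("E(R) =", float(expected_r))
```

Because every (x, y) cell here maps to a different value of R, the probability function of R is a relabelling of the joint table; the accumulation step matters only when two cells share a value.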
Question 53. The random variable X has a Poisson distribution with mean Y, where Y itself is
considered to be a random variable. The distribution of Y is lognormal with
parameters $\mu$ and $\sigma^2$.
Derive the unconditional mean E[X] and variance var[X] using appropriate
conditional moments. (You may use any standard results without proof, including
results from the book of Formulae and Tables.)
Question 54. Consider two random variables X and Y with E[X] = 2, V[X] = 4, E[Y] = −3,
V[Y] = 1, and Cov[X, Y] = 1.6.
Calculate:
(a) the expected value of 5X + 20Y.
(b) the correlation coefficient between X and Y.
(c) the expected value of the product XY.
(d) the variance of X − Y.
Question 55. Consider two random variables X and Y.

(i) Write down the precise mathematical definition for the correlation
coefficient ρ(X,Y) between X and Y.
Assume now that Y = aX + b where a < 0 and −∞ < b < ∞.
(ii) Determine the value of the correlation coefficient 𝜌(X, Y).
Question 56. The joint density function of X and Y is given by
$f(x, y) = \sqrt{\frac{y}{2\pi}}\, e^{-x^2 y/2} \cdot y e^{-y}$, −∞ < x < ∞, y > 0
a) State the probability density functions and hence identify the statistical
distributions of
i) Y ii) X conditional on Y = y.
b) Compute E(Y) and Var(Y).
c) Compute E(X|Y = y) and Var(X|Y = y).
d) Hence, compute E(X) and Var(X).
Question 57. Let X and Y be identically distributed and uncorrelated random variables such
that the moment generating function of the random variable Z = X + Y is
$M_Z(t) = 0.09e^{-2t} + 0.24e^{-t} + 0.34 + 0.24e^{t} + 0.09e^{2t}$, −∞ < t < ∞

(i) Compute E(Z) and Var(Z).
(ii) Using part (i), show that the marginal distribution of X (or Y) is
Value          -1     0      1
Probability    0.3    0.4    0.3
(iii) Hence construct a table showing the joint distribution of X and Y.

(iv) Are X and Y independent? Justify your answer.
Question 58. Let X denote the time taken by a worker to complete a specified work in a factory.
For a given worker, the distribution of X is modelled as an exponential
distribution with unknown mean U that varies across the work force. U is treated
as a Uniform random variable over (a, b), i.e.
$X \mid U \sim Exp(1/U)$ and $U \sim U(a, b)$.
Find the mean and variance of the marginal distribution of X.

ANSWERS
Ans.1. 1/36
Ans.3. (i) 1/30, (ii) 412/30, (iii) 2
Ans.5. 30
! !
Ans.7. ! (𝑥 + 1), !
(1 + 4𝑦)
Ans.8. 2x, 2y, 2x
Ans.9. -11/6
Ans.10. 1
Ans.12. Yes
!!!
Ans.13. (ii) 0.04637, (iii) ! , (iv) 7/12, 7/12, 14/12
!!
!
Ans.14. (a) 𝑒 !! (b) 0.32975 (c) ¾
Ans.15. 𝑒 !(!!!) 𝑥 > 𝑦 > 0
Ans.16. 0.0625
Ans.17. 1/2
Ans.19. (a) 3(1 − 𝑥 ! )/2 (b) 0.125 (d) 55/64
Ans.20. No
Ans.21. not independent

(!"!!"!) (!"!!"!) !"
Ans.22. (a) !(!!!!)
(b) 𝑥 !(!!!!)
(c) !" = E[Y] = E[Y 𝑋]
! ! !!!!! ! !" ! !!!!! !

Ans.23. (i) 𝐸 !
= 𝑥 = ! !!!
(ii) 𝐸 !
= 𝑥 = !𝑥 !!!
!!! ! ! !!!!! !
(iii) !
−!( !!! !
)!
! ! !
Ans.24. (i) 𝐸 !
= ! 𝑦 (ii) 𝑉 !
= 𝑦 = (3/80) 𝑦 !
Ans.25. (i) Corr. (XY) = 0 (ii) 𝐸(𝑌/𝑋 = −1) = 0 (iii) 𝑉(𝑌/𝑋 = −1) = 0
! ! ! !!
Ans.27. (a) 𝐸 𝑋 = ! , 𝐸 𝑌 = ! (b) Cov(XY) = − !" (d) 𝜌 = !

!
Ans.28. (i) 𝐸 𝑋 = ! (ii) Var = 13/162
Ans.29. (a) $f_{X,Y}(x, y) = 6e^{-(2x + 3y)}$; x, y > 0 (b) $f_X(x) = 2e^{-2x}$, x > 0; $f_Y(y) = 3e^{-3y}$, y > 0
! !"!!"! ! !"!!"! !"!!"! !
Ans.30. (a) K = 1/26 (b) 𝐸 !
=𝑦 = (!!!!)
(c) 𝑉 !
=𝑦 = !!!!
− !!!!
)
Ans.31. 𝐸 𝑋 𝑌 = 𝑦] = (𝑦 + 1)
!!! !!!
! 0 < 𝑥! ≤ 1, 0 ≤ 𝑥! < 2
Ans.32. (a) 𝑓!! !! (𝑥! 𝑥! ) = !
!!! !!
0 elsewhere
!! !!
(b) 𝐸 𝑋! 𝑋! = 𝑥! = !!! !! (c) E (𝑋! ) = 10/9 = E [E [𝑋! 𝑋! = 𝑥! ]]
!
! !
Ans.33. (a) 𝑓 𝑥 = 𝑒 !! (c) ! 𝑓 𝑦 = (!!!)!
!
Ans.34. (b) 𝐸 𝑌 = ! , 𝑣 𝑦 = 𝜋/𝜆!
Ans.35. 0.894
Ans.36. (i) 0 (ii) -4 (iii) 48
Ans.37. P(X=1&Y=2) = 0.1, P(X=2&Y=4) = 0 and P(X=3&Y=6) = 0.1
Ans.38. E[X|Y=2] = 5/4
Ans.40. E(Y) = 2 and Var(Y) = 4

Ans.41. $E(Y \mid X = x) = \frac{3 - 2x}{3(2 - x)}$, $E(Y) = \frac{4}{9}$
Ans.43. $E[X] = \frac{a}{b}$, $E[X^2] = \frac{a(a + b + 1)}{b^2}$
Ans.44. (i) $f_X(x) = e^{-x}$ and $f_Y(y) = e^{-y}$ (X and Y are independent)
(ii) $F_{X,Y}(x, y) = (1 - e^{-x})(1 - e^{-y})$
Ans.45. E(X|Y=2) = 1.375
Ans.46. (ii) corr(S,D) = 0.70711

Ans.47. (i) $f_X(x) = 3x^2$ for 0 < x < 1, $f_Y(y) = \frac{3}{2}(1 - y^2)$ for 0 < y < 1
(ii) Not independent because $f_X(x) f_Y(y) \neq f_{XY}(x, y)$
(iii) E(X) = 0.75 and E(Y) = 3/8
Ans.48. (i) 0 (ii) E(X|Y = -1) = 0 E(X|Y=0) = 0.5 E(X|Y=1) = 0 (iii) 0.25
Ans.49. (a) -50 (b) 0.8 (c) E(XY) = -4.4 (d) V(X-Y) = 1.8
Ans.50. 0.043
Ans.51. 40,500
Ans.52. (i) 0.3
(ii) Not independent
(iii) R 1 2 4 8 9 18
P(R=r) 0.25 0.05 0.3 0.1 0.2 0.1
(iv) 5.95
Ans.53. $E[X] = \exp(\mu + \frac{1}{2}\sigma^2)$, $V[X] = \exp(\mu + \frac{1}{2}\sigma^2) + \exp(2\mu + \sigma^2)(e^{\sigma^2} - 1)$
Ans.54. 𝐹! 𝑧 − 𝐹! 𝑧 − 1
Ans.54. (a) -50 (b) 0.8 (c) -4.4 (d) 1.8

Ans.55. (i) $\rho(X, Y) = \frac{Cov(X, Y)}{\sqrt{V(X)\,V(Y)}}$ (ii) -1
Ans.56. (a) (i) Y ~ Gamma(2, 1) (ii) X | Y = y ~ Normal(0, $y^{-1}$)
(b) E(Y) = V(Y) = 2 (c) E(X|Y=y) = 0 and V(X|Y=y) = $y^{-1}$ (d) E(X) = 0, V(X) = 1
Ans.57. (i) E(Z) =0 and V(Z) = 1.2
(ii) (iii) Yes

Ans.58. $E(X) = \frac{a + b}{2}$, $V(X) = \frac{5(a^2 + b^2) + 2ab}{12}$

REVISION ASSIGNMENT – 1
Question 1. An actuarial student has said that the following three distributions are the same:
(i) the chi square distribution with 2 degrees of freedom
(ii) the exponential distribution with mean ½
(iii) the gamma distribution with 𝛼 = 1 and 𝜆 = ½ .
State with reasons whether the student is correct.
Question 2. The number of telephone calls per hour on a working day received at an insurance
office follows a Poisson distribution with mean 2.5.
(i) Calculate the probability that more than 7 telephone calls are received on a
working day between 9am and 11am.
(ii) Calculate the probability that, if the office opens at 8am, there are no
telephone calls received until after 9am.
Question 3. The random variable X has an exponential distribution with parameter 𝜆. Use the
moment generating function to determine an expression for $E[X^4]$.
Question 4. The random variable X has a beta distribution with parameters 𝛼 = 1 and 𝛽 =4.
(i) State the value of E(X).
(ii) Determine the median of X.
(iii) Hence comment on the shape of this distribution.
Question 5. A large life office has 1,000 policyholders, each of whom has a probability of
0.01 of dying during the next year (independently of all other policyholders).
(i) Derive a recursive relationship for the binomial distribution of the form:
𝑃 𝑋 = 𝑥 = 𝑘𝑔 𝑥 𝑃(𝑋 = 𝑥 − 1)
where k is a constant and g(x) is a function of x
(ii) Calculate the probabilities of the following events:

(a) there will be no deaths during the year
(b) there will be more than two deaths during the year
(c) there will be exactly twenty deaths during the year.
Question 6. On a portfolio of insurance policies, the claim size, Y is assumed to depend on the
age of the policyholder, X . Suppose that the conditional mean and variance of Y
are:
$E(Y \mid X = x) = 2x + 400$, $V(Y \mid X = x) = \frac{x^2}{2}$
The distribution of X over the portfolio is assumed to be normal with mean 50 and
standard deviation 14.
Calculate the unconditional mean and standard deviation of Y.
Question 7. (i) For a pair of jointly distributed random variables X and Y , derive the
result: V(X + Y) = V(X) + V(Y) + 2cov(X, Y)
(ii) The random variables X and Y are jointly distributed with standard
deviations of 5 and 7 respectively and corr(X ,Y) = -3/7 . Calculate the
standard deviation of 3X - 2Y + 5.
Question 8. (i) The random variables X and Y have a discrete joint distribution with joint
probability function:
$P(X = x, Y = y) = c(x + 2y)$ for x = 0, 1, 2 and y = 0, 1, 2, and 0 otherwise,
where c is an appropriate constant.
Determine the conditional distribution of X given Y = y for each value of y.
(ii) It is subsequently discovered that the random variables, X and Y, are in
fact continuous over the ranges 0 < x < 2 and 0 < y < 2 with the probability
density function being the same as the probability function.
Determine the conditional distribution of X given Y = y.
Question 9. (i) For a lognormal distribution with mean m and standard deviation s, give
an expression for 𝜇, the mean of the underlying normal distribution.
(ii) Claim amounts for a particular type of medical negligence are lognormally
distributed with mean 15,000 and standard deviation 8,000. Calculate the
probability that the next claim exceeds 20,000.
(iii) An actuary is examining the number of large claims received by her
company. To do this she counts the number of claims arriving until she
receives one that exceeds 20,000. Calculate the mean number of claims
that she will count (not including the 20,000 claim).
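The three parts of Question 9 chain together and can be checked numerically. The sketch below uses the standard lognormal moment relations σ² = ln(1 + s²/m²) and µ = ln m − σ²/2. It also treats the count in part (iii) as the number of failures before the first success in independent trials, which has mean (1 − p)/p; that independence is an assumption made here. The function name `lognormal_params` is illustrative.

```python
import math
from statistics import NormalDist


def lognormal_params(mean, sd):
    """Return (mu, sigma^2) of the underlying normal from the lognormal mean and sd."""
    sigma2 = math.log(1 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma2 / 2
    return mu, sigma2


mu, sigma2 = lognormal_params(15_000, 8_000)

# P(claim > 20,000) = P(ln(claim) > ln 20,000) under N(mu, sigma^2).
p_exceed = 1 - NormalDist(mu, math.sqrt(sigma2)).cdf(math.log(20_000))

# Mean number of claims counted before the first one exceeding 20,000,
# assuming independent claims (a geometric-type count of failures).
mean_before = (1 - p_exceed) / p_exceed

print(round(mu, 4), round(sigma2, 4), round(p_exceed, 4), round(mean_before, 2))
```

Small differences from a tables-based answer are expected because the tables round the standard normal argument.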
Question 10. X and Y are discrete random variables. The only possible combinations of
these two variables have the following probabilities:
           X
Y      0      1      2
0      1/2    0      1/16
1      0      1/8    0
2      1/4    1/16   0
(i) Show that X and Y are:
(a) not independent (b) not uncorrelated.
(ii) State the circumstances under which the result
E(X) = E[E(X|Y)] holds.
(iii) Calculate:
(a) E(X + Y|X = 1)
(b) E(X|Y = 2)
(c) var(X|Y = 2).
(iv) Determine the values of the random variable $E[Y^2 \mid X]$ and hence
calculate $E[E(Y^2 \mid X)]$.
(v) Calculate $E[Y^2]$ and comment on your answer.
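Parts (iv) and (v) illustrate that averaging a conditional expectation over the conditioning variable recovers the unconditional expectation. A minimal sketch follows, assuming the columns of the table are the values of X and the rows are the values of Y (the reading consistent with the printed answers); the helper names are illustrative.

```python
from fractions import Fraction as F

# Joint probabilities keyed as (x, y). Assumption: table columns are X, rows are Y.
joint = {
    (0, 0): F(1, 2), (1, 0): F(0),     (2, 0): F(1, 16),
    (0, 1): F(0),    (1, 1): F(1, 8),  (2, 1): F(0),
    (0, 2): F(1, 4), (1, 2): F(1, 16), (2, 2): F(0),
}


def marginal_x(x):
    """P(X = x)."""
    return sum(p for (xx, _), p in joint.items() if xx == x)


def cond_mean_y2_given_x(x):
    """E[Y^2 | X = x]; requires P(X = x) > 0."""
    return sum(y * y * p for (xx, y), p in joint.items() if xx == x) / marginal_x(x)


# E[E(Y^2 | X)]: weight each conditional mean by P(X = x).
tower = sum(marginal_x(x) * cond_mean_y2_given_x(x) for x in (0, 1, 2))

# E[Y^2] computed directly from the joint table.
direct = sum(y * y * p for (_, y), p in joint.items())

print(tower, direct)
```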

ANSWERS
Ans.1. By definition, the $\chi^2_2$ distribution is the same as the Gamma(1,1/2) distribution. The
exponential distribution with mean 1/2 has parameter 2. This is a Gamma(1,2)
distribution, and so is not equivalent to the other two. Therefore the student is wrong.
Ans.2. (i) 0.13337 (ii) 0.08208
Ans.3. $24/\lambda^4$
Ans.4. (i)1/5 (ii) 0.159 (iii) Since the mean is to the right of the median this
suggests that the distribution is positively skewed.
Ans.5. (i) $k = \frac{p}{1 - p}$, $g(x) = \frac{n - x + 1}{x}$
(ii) (a) 0.0000432 (b) 0.997321 (c) 0.00179
Ans.6. E(Y) = 500 and SD(Y) = 46.17
Ans.7. (ii) 24.5

Ans.8. (i) $P(X = x \mid Y = 0) = \frac{x}{3}$, $P(X = x \mid Y = 1) = \frac{x + 2}{9}$,
$P(X = x \mid Y = 2) = \frac{x + 4}{15}$
(ii) $f(x \mid y) = \frac{x + 2y}{2(1 + 2y)}$
Ans.9. (i) $\mu = 0.5 \ln\left(\frac{m^4}{m^2 + s^2}\right)$
(ii) 0.20469 (iii) 3.9
Ans.10. (ii) Always (iii)(a) 7/3 (b) 1/5 (c) 4/25 (iv) 11/8 (v) 11/8

ASSIGNMENT – 5
CENTRAL LIMIT THEOREM
Question 1. State the central limit theorem for independent identically distributed random
variables.
In a large population the distribution of a variable has mean 167 and standard
deviation 27 units. If a random sample of size 36 is chosen, find the approximate
probability that the sample mean lies between 163 and 171 units.
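Under the central limit theorem the sample mean is approximately N(µ, σ²/n), so the probability in Question 1 reduces to a difference of two normal CDF values. The sketch below is one way to check the arithmetic; the function name `sample_mean_prob` is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist


def sample_mean_prob(mu, sigma, n, lo, hi):
    """Approximate P(lo < sample mean < hi) using the CLT: mean ~ N(mu, sigma^2 / n)."""
    approx = NormalDist(mu, sigma / sqrt(n))
    return approx.cdf(hi) - approx.cdf(lo)


p = sample_mean_prob(mu=167, sigma=27, n=36, lo=163, hi=171)
print(round(p, 4))
```

The same function applies to any of the later sample-mean questions by changing the arguments.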
Question 2. (a) What is the central limit theorem?

(b) Over the years, it has been observed that, of all the undergraduate students in
Mathematics who take a society’s examination, only 57% pass. Suppose that this
year 950 undergraduate students in Mathematics are taking the examination. What is
the probability that
i. 565 or more pass?
ii. between 535 and 575 pass?
Question 3. If the probability is 0.20 that a certain bank will refuse a loan application, using
normal approximation (to three decimal places), find the probability that the bank will
refuse at most 40 out of 225 loan applications.
Question 4. A fair die is tossed 180 times. Determine the probability that the face 6 will
appear
i. Between 29 and 39 times inclusive

ii. Less than 22 times.
Question 5. A random variable X has the following p.d.f.
$f(x) = \frac{3}{2}x^2$; −1 < x < 1
Use the central limit theorem to evaluate $P[0.03 \le \bar{X} \le 0.15]$,
where $\bar{X}$ denotes the mean of a random sample of size 15.
Question 6. A random sample of size 100 is taken from an infinite population having mean 76
and variance 256. What is the probability that the sample mean $\bar{X}$ will be between 75
and 78?
Question 7. Let $X \sim N(\mu, \sigma^2)$; then the distribution of $\bar{X}$ is
(i) $N(\mu^2, \sigma^2/n^2)$ (ii) $N(\mu, \sigma^2/n)$
(iii) $N(\mu, \sigma^2/n^2)$ (iv) $N(\mu, \sigma)$
Question 8. The number of claims arising in a period of one month from a group of policies
can be modeled by a Poisson distribution with mean 24. Determine the probability
that fewer than 20 claims arise in a particular month.
Question 9. A magazine claims that 25% of its readers are students. A random sample of 200
readers is taken and is found to contain 42 students.
Calculate the probability of obtaining 42 or fewer student readers, assuming that
the magazine’s claim is correct.
Question 10. Suppose that the sums assured under policies of a certain type are modeled by a
distribution with mean £8,000 and standard deviation £3,000. Consider a group of
100 independent policies of this type.
Calculate the approximate probability that the total sum assured under this group of
policies exceeds £845,000.
Question 11. The probability that a claim is made on a certain type of policy in a particular year
is 0.04. Five hundred policies are selected at random.
Use a suitable normal approximation to calculate the probability that no more than
30 of these will result in a claim during the year.
Question 12. Consider a random sample of size 16 taken from a normal distribution with mean
µ = 25 and variance $\sigma^2 = 4$. Let the sample mean be denoted $\bar{X}$.
State the distribution of $\bar{X}$ and hence calculate the probability that $\bar{X}$ assumes a
value greater than 26.
Question 13. The movement of a stock price is modeled as follows:

In each time period, the stock goes up 1 with probability 0.35, stays the same with
probability 0.35, or goes down 1 with probability 0.30.
The change in the stock price after 500 time periods is being considered.
i. Assuming that changes in successive time periods are independent,
explain why the normal distribution can be used as an approximate model.
ii. Calculate an approximate value for the probability that, after 500 time
periods, the stock will be up by more than 20 from where it started.
Question 14. The occurrence of claims in a group of 2000 policies is modeled such that the
probability of a claim arising in the next year is 0.015 independently for each
policy. Each policy can give rise to a maximum of one claim.
Calculate an approximate value for the probability that more than 40 claims arise
from this group of policies in the next year.
Question 15. In order to simulate an observation of a normal random variable it is suggested
that $S = \sum_{i=1}^{n} X_i$ is used, where $X_1, \ldots, X_n$ is a random sample from a continuous
uniform distribution on the interval $\left(-\frac{1}{2}, \frac{1}{2}\right)$.
i. Determine the approximate distribution of S.
ii. Determine the value of n which should be used if S is required to
represent a standard normal random variable.
Question 16. Let $X_1, X_2, \ldots, X_{100}$ be independent random variables, each having a gamma (4, 1)
distribution (and hence with mean 4 and variance 4). Calculate an approximate
value for the probability that the sum of the variables assumes a value, which
exceeds 425.
Question 17. Claim amounts on a certain type of policy are modeled as following a gamma
distribution with parameters 𝛼=120 and 𝜆=1.2.
Calculate an approximate value for the probability that an individual claim
amount exceeds 120, giving a reason for the approach you use.
Question 18. In a certain large population 45% of people have blood group A. A random
sample of 300 individuals is chosen from this population. Calculate an
approximate value for the probability that more than 115 of the sample have
blood group A.
Question 19. In a large portfolio 65% of the policies have been in force for more than five
years. An investigation considers a random sample of 500 policies from the
portfolio. Calculate an approximate value for the probability that fewer than 300
of the policies in the sample have been in force for more than five years.
Question 20. It is known that 24% of the customers in a bank holding a current account also
have another type of account with the bank. Calculate the approximate value for
the probability that fewer than 50 customers in a random sample of 250 customers
with a current account also have another type of account.
Question 21. For a certain class of policies issued by a large insurance company it is believed
that the probability of each policy giving rise to any claims is 0.5, independently
of all other policies. A random sample of 250 such policies is selected.
Determine approximately the probability that at least 139 of the policies in the
sample will each give rise to a claim.
Question 22. Consider ten independent random variables X1,… , X10 which are identically
distributed with an exponential distribution with expectation 4.
(i) Specify the approximate distribution of $X = \sum_{i=1}^{10} X_i$, including all
parameters, using the central limit theorem.
(ii) Calculate the approximate value of the probability P[X < 40] using the result
in part (i).
(iii) Calculate the exact probability P[X < 40].
(iv) Comment on the answers in parts (ii) and (iii).
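Parts (ii) and (iii) can be compared side by side. The sketch below uses the fact that a sum of n independent exponentials with a common rate has an Erlang (integer-shape gamma) distribution, whose CDF is a finite Poisson sum. The helper name `erlang_cdf` is an illustrative assumption.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

n, mean = 10, 4.0      # ten exponentials, each with expectation 4
rate = 1 / mean        # exponential rate parameter

# CLT approximation: the sum is roughly N(n * mean, n * mean^2).
clt = NormalDist(n * mean, sqrt(n * mean ** 2))
p_clt = clt.cdf(40)


def erlang_cdf(x, shape, rate):
    """Exact CDF of a gamma distribution with integer shape (Erlang)."""
    lam = rate * x
    return 1 - sum(exp(-lam) * lam ** k / factorial(k) for k in range(shape))


p_exact = erlang_cdf(40, n, rate)
print(round(p_clt, 4), round(p_exact, 4))
```

The normal approximation returns exactly one half at the mean, while the exact gamma probability is somewhat larger because the gamma distribution is positively skewed.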
Question 23. A computer routine selects one of the integers 1, 2, 3, 4, 5 at random and replicates
the process a total of 100 times. Let S denote the sum of the 100 numbers
selected. Calculate the approximate probability that S assumes a value between
280 and 320 inclusive.
Question 24. For a certain class of business, claim amounts are independent of one another and
are distributed about a mean of µ = £4,000 and with standard deviation σ = £500.
Calculate an approximate value for the probability that the sum of 100 such claim
amounts is less than £407,500.
Question 25. A certain type of claim amount (in units of £1,000) is modelled as an exponential
random variable with parameter λ = 1.25. An analyst is interested in S, the total of
10 such independent claim amounts. In particular he wishes to calculate the
probability that S exceeds £10,000.
(i) (a) Show, using moment generating functions, that:
(1) S has a gamma distribution, and
(2) 2.5S has a $\chi^2$ distribution with 20 degrees of freedom.
(b) Use tables to calculate the required probability.
(ii) (a) Specify an approximate normal distribution for S by applying the
central limit theorem, and use this to calculate an approximate
value for the required probability.
(b) Comment briefly on the use of this approximation and on the
result.
Question 26. A random sample of size n = 36 has sample standard deviation s = 7. Calculate,
approximately, the probability that the mean of this sample is greater than 44.5
when the mean of the population is µ = 42.
Question 27. Let X1, X2, X3, X4 and X5 be independent random variables, such that Xi ~ gamma
with parameters i and λ for i = 1, 2, 3, 4, 5. Let $S = 2\lambda \sum_{i=1}^{5} X_i$.
(i) Derive the mean and variance of S using standard results for the mean and
variance of linear combinations of random variables.
(ii) Show that S has a chi-square distribution using moment generating
functions and state the degrees of freedom of this distribution.
(iii) Verify the values found in part (i) using the results of part (ii).
Question 28. An insurance company experiences claims at a constant rate of 150 per year. Find
the approximate probability that the company receives more than 90 claims in a
period of six months.
Question 29. A woodcutter has to cut 100 fence posts of a standard length and he has a metal
bar of the required length to act as the standard. The woodcutter decides to vary
his procedure from post to post − he cuts the first post using the metal standard,
then uses this post as his standard for the cut of the next post. He continues in a
similar manner, each time using the most recently cut post as the standard for the
next cut.
Each time the woodcutter cuts a post there is an error in the length cut relative to
the standard being employed for that cut − you should assume that the errors are
independent observations of a random variable with mean 0 and standard
deviation 3mm.
Calculate, approximately, the probability that the length of the final post differs
from the length of the original metal standard by more than 15mm.
Question 30. Suppose that the time T, measured in days, until the next claim arises under a
portfolio of non-life insurance policies, follows an exponential distribution with
mean 2.
(i) Find the probability that no claim is made in the next one-day period.
(ii) The median of a random variable is defined as the value for which the
cumulative distribution function of the variable is equal to 0.5.
Find the median time until the next claim arises.
(iii) Now let T1, T2, …, T30 be the times (in days) until the next claim arises
under each one of 30 similar portfolios of non-life insurance policies, and
assume that each Ti, i = 1,…,30, follows an exponential distribution with
mean 2, independently of all others.
Calculate, approximately, the probability that the total of all 30 times
which elapse until a claim arises on each of the portfolios exceeds 45 days.
Question 31. A university runs a 3-year B.Sc degree course in Statistics. The course is divided
over 6 semesters each consisting of 5 credit papers over the three year period.
Each credit paper is assessed on a maximum possible 100 marks and is recorded
as integers.
At the end of the course, the university ranks the students based on a measure
called “grade point average” which is the average of marks obtained over all
credit papers examined over 3 years.
Assume that in each subject, the instructor makes an error of quantum k in
awarding marks with probability $\frac{1}{20|k|}$, where k = ±1, ±2, ±3, ±4, ±5. Assume
that these errors occur independently.
(a) Show that the probability of no error is 463/600.
(b) State the approximate distribution of quantum of error in a given student’s
final grade point average using the Central limit theorem.
(c) Hence show that there is only 17.7% chance that his final grade point
average is accurate to within ±0.05.
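The three parts of Question 31 can be verified with exact fractions. The sketch below assumes the error probability is 1/(20|k|), which is the reading consistent with the stated value 463/600 in part (a), and that the grade point average is taken over 6 × 5 = 30 papers; variable names are illustrative.

```python
from fractions import Fraction as F
from math import sqrt
from statistics import NormalDist

# Assumed error distribution: P(error = k) = 1 / (20 |k|) for k = ±1, ..., ±5.
pmf = {k: F(1, 20 * abs(k)) for k in range(-5, 6) if k != 0}

# (a) Probability of no error is whatever mass is left over.
p_no_error = 1 - sum(pmf.values())

# Variance of a single paper's error (the mean is zero by symmetry).
var_single = sum(k * k * p for k, p in pmf.items())

# (b) The GPA error is the average of 30 independent paper errors.
papers = 6 * 5
var_gpa = var_single / papers

# (c) CLT approximation to P(|GPA error| <= 0.05).
p_within = 2 * NormalDist(0, sqrt(var_gpa)).cdf(0.05) - 1

print(p_no_error, var_single, var_gpa, round(p_within, 4))
```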
Question 32. It is known from past experience that the daily tip amount, a waiter in a restaurant
gets, is a random variable with mean Rs.100 and standard deviation Rs.10.
(i) Assuming that the number of tips is sufficiently large, calculate the
number of tips required to ensure with at least 0.95 probability that the
average daily tip would exceed Rs.98.
(ii) Given that the waiter gets 64 tips on a particular day and the tip amounts
are independent, what is the probability that the total tips amount is
greater than Rs.6500 on the given day?

ANSWERS
Ans.1. 0.62594
Ans.2. 0.065907, 0.637622
Ans.3. 0.22663
Ans.4. 0.58919, 0.044565
Ans.5. E(X) = 0, V(X) = 3/5, 0.21375
Ans.6. 0.62836
Ans.7. (ii)
Ans.8. 0.179168
Ans.9. 0.11034
Ans.10. 0.06681
Ans.11. 0.99172
Ans.12. 0.02275
Ans.13. P [𝑌 > 20] = 0.59871
Ans.14. P (𝑋 > 40) = 0.0267
Ans.15. (i) S~N (0, n/12) (ii) S~N (0, 1) if n=12
Ans.16. P (𝑆 > 425) = 0.106
Ans.17. P (𝑋 > 120) = 0.014228
Ans.18. P (𝑋 > 115) = 0.98986
Ans.19. P (𝑋 < 300) = 0.0084
Ans.20. P (𝑋 < 50) = 0.0599
Ans.21. P (𝑋 ≥ 139) = 0.04385
Ans.22. (i) 𝑋~𝑁(40, 160) (ii) X is symmetric so P[X < 40] = 0.5 (iii) 0.5421
(iv) Although the sample size here is small, the CLT gives an answer which is close to
the exact probability.
Ans.23. 0.853
Ans.24. 0.933
Ans.25. (i)(b) 0.2014 (ii)(a) 𝑆~𝑁(8, 6.4) and 0.214
(b) n is not particularly large for the use of the CLT, but the approximation is still
quite close to the true probability.
Ans.26. 0.016
Ans.27. (i) E(S) = 30 V(S) = 60
Ans.28. 0.037
Ans.29. 0.617
Ans.30. (i) 0.6065 (ii) 1.386 (iii) 0.915
Ans.31. (b) N(0, 0.05)
Ans.32. (i) 67.65≈68 (ii) 0.10565

ASSIGNMENT – 6
POINT ESTIMATION
Question 1. It is known that a random sample of 12, 11.2, 13.5, 12.3, 13.8, and 11.9 comes
from a population having the following p.d.f.
$f(x; \theta) = \frac{\theta}{x^{\theta + 1}}$ for x > 1, θ > 1, and 0 otherwise.
Find the
i. Maximum likelihood estimator of θ
ii. Moment estimator of θ
Question 2. A random variable X has probability density function given by
$f(x) = m(\theta - x)$, 0 ≤ x ≤ θ
=0 otherwise
i. Determine the constant m.

ii. Derive the mean and variance of X in terms of θ.
iii. Write down the likelihood function of θ based on single observation.
iv. Find the MLE and the method of moments estimator for θ based on a
single observation
v. Are they unbiased?
Question 3. Let $x_1, x_2, \ldots, x_n$ be a random sample of size n from the exponential distribution
with probability density function
$f(x) = \frac{1}{\lambda} e^{-x/\lambda}$, x > 0, λ > 0
i. Show that the MLE of λ is given by
$\hat{\lambda} = \frac{\sum x_i}{n}$
ii. Show that $\hat{\lambda}$ is an unbiased estimator of λ.
iii. Obtain the Cramer-Rao lower bound for the variance of the unbiased
estimators of λ.
iv. Show that the variance of $\hat{\lambda}$ attains the Cramer-Rao lower bound in (iii).
Question 4. Twenty electronic tubes were put to test and the test continued till all of them
failed. The failure times (in hours) were recorded.
9.9 35.5 57.9 94.6 141.4 154.4 163.3 226.7 244.3 337.2
391.8 417.2 444.6 461.2 497.1 582.6 606.8 616.3 672 784.7
Total hours: 6939.5
The failure times are assumed to be independent and follow an exponential
distribution with density $f(x; \lambda) = \lambda e^{-\lambda x}$; x > 0
i. Determine the maximum likelihood estimate (MLE) $\hat{\lambda}$ of λ.
ii. Obtain the large sample variance of $\hat{\lambda}$.
iii. Supposing the test was terminated at 600 hours, what would be the MLE $\hat{\lambda}$
of λ?
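For an exponential model with rate λ, the MLE from a complete sample is n divided by the total observed time. If the test stops at a fixed time, each surviving tube contributes its survival probability to the likelihood, and the MLE becomes the number of observed failures divided by the total time on test. The sketch below assumes this fixed-time (Type I) censoring reading of part (iii), and uses the estimate λ̂²/n for the large-sample variance; variable names are illustrative.

```python
# Failure times (hours) from Question 4.
times = [
    9.9, 35.5, 57.9, 94.6, 141.4, 154.4, 163.3, 226.7, 244.3, 337.2,
    391.8, 417.2, 444.6, 461.2, 497.1, 582.6, 606.8, 616.3, 672, 784.7,
]
n = len(times)

# (i) Complete-sample MLE of the rate: n / sum of failure times.
mle_full = n / sum(times)

# (ii) Estimated large-sample variance, lambda^2 / n evaluated at the MLE.
large_sample_var = mle_full ** 2 / n

# (iii) Assumed Type I censoring at 600 hours: failures after the cutoff are
# only known to exceed it, so each contributes 600 hours of exposure.
cutoff = 600
observed = [t for t in times if t <= cutoff]
n_censored = n - len(observed)
mle_censored = len(observed) / (sum(observed) + n_censored * cutoff)

print(round(mle_full, 6), round(large_sample_var, 9), round(mle_censored, 6))
```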
Question 5. A random sample of size n is taken from a distribution with pdf
$f(x) = \frac{2x}{\theta^2}$; 0 < x < θ
= 0; otherwise
a. Write down the likelihood function and hence by drawing the rough
sketch of the likelihood function, obtain the maximum likelihood
estimator (MLE) for θ.
b. Examine if the MLE is unbiased.
Question 6. (a) Let X be a continuous random variable having pdf
$f(x, \theta) = \frac{\theta^m x^{m-1} e^{-\theta x}}{(m - 1)!}$; x ≥ 0, θ > 0
where m is a known integer ≥ 2. Show that $\frac{m - 1}{X}$ is an unbiased estimator of θ.
(b) Let $X_1, X_2, \ldots, X_n$ be a random sample from $f_\theta(x)$ where
$f_\theta(x) = (1 + \theta) x^{\theta}$; 0 < x < 1
= 0 otherwise
Obtain the MLE of θ.
(c) Let $X_1, X_2, \ldots, X_n$ be a random sample from $f_\theta(x)$ where
$f_\theta(x) = \theta e^{-\theta x}$; x > 0
= 0; otherwise
Obtain the Cramer-Rao Lower Bound (CRLB) for the variance of the unbiased
estimator of θ, assuming that the regularity conditions are satisfied.
Question 7. Let $X_1, X_2, \ldots, X_n$ be a random sample from the following density function
$f(x; \theta) = \frac{kx}{\theta^2}$; 0 < x < θ, θ > 0
a) Find k such that the above is a valid density function.
b) Find the MLE of θ for the given sample.
Question 8. When Ramesh was appointed as the laboratory assistant on 1st Jan 2008 to
observe the lifetime of mice, there were 10 mice in the laboratory. His assignment
was to observe the lifetime of the mice till 100 weeks and then estimate
the expected remaining lifetime (in weeks) of mice as at 1st Jan 2008. 7 mice died
within the 100 weeks period at the following times (in weeks)
23, 27, 39, 52, 68, 89, 95
and 3 mice were alive at the end of the 100th week. Assume that the future lifetime
(as at 1st Jan 2008) follows Exp(λ) with density function
$f(x; \lambda) = \lambda e^{-\lambda x}$; λ > 0, x > 0
a) Write down the likelihood function for the sample of 10 life times that
Ramesh observed.
b) Compute the MLE of λ based on this likelihood.
c) What is the asymptotic variance of the MLE?
Ganesh (Ramesh’s boss) is the laboratory in-charge. He knows that the
experiment actually started 25 weeks before 1st Jan 2008 with 15 mice. 5
mice had died before 1st Jan 2008 (Ramesh didn’t know about it). Even
Ganesh did not have the exact weeks in which these 5 mice died. He only
knows that they had died before 1st Jan 2008. Based on this new
information Ramesh wanted to correct the likelihood function.
d) What is the correct likelihood function for the life time (starting 25 weeks
before 1st Jan 2008) of 15 mice?
!!
Question 9. Suppose that 𝜃 is an unbiased estimator of a parameter θ and has variance !".
Derive an expression for the mean square error of 𝑘𝜃, where k is a constant, and
determine the value of k for which the mean square error is a minimum.
Question 10. For the estimation of a binomial probability p = P(success), a series of n
independent trials are performed and X represents the number of successes
observed.
i. Write down the likelihood function L(p) and show that the maximum
likelihood estimator (MLE) of p is $\hat{p} = \frac{X}{n}$.
ii. (a) Determine the Cramer-Rao lower bound for the estimation of p.
(b) Show that the variance of the MLE is equal to the Cramer-Rao lower
bound.
(c) Write down an approximate sampling distribution for $\hat{p}$ valid for large
n.
Question 11. The number of claims, X, which arise in a year on each policy of a particular class
is to be modelled as a Poisson random variable with mean λ. Let
$X = (X_1, X_2, \ldots, X_n)$ be a random sample of size n from the distribution of X, and
let $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$. Suppose that it is required to estimate λ, the mean number of
claims on a policy.
i. Show that $\hat{\lambda}$, the maximum likelihood estimator of λ, is given by $\hat{\lambda} = \bar{X}$.
ii. Derive the Cramer-Rao lower bound (CRLB) for the variance of unbiased
estimators of λ.
iii. (a) Show that $\hat{\lambda}$ is unbiased for λ and that it attains the CRLB.
(b) Explain clearly why, in the case that n is large, the distribution of $\hat{\lambda}$
can be approximated by $\hat{\lambda} \sim N\left(\lambda, \frac{\lambda}{n}\right)$.
Question 12. Claims in a portfolio are believed to arise as an Exp (λ) distribution. There is a
retention limit of 1,000 in force, and claims in excess of 1,000 are paid by the
reinsurer. The insurer, wishing to estimate λ, observes a random sample of 100
claims, and finds that the average amount of the 90 claims that do not exceed
1,000 is 82.9. There are 10 claims that do exceed the retention limit. Find the
MLE for λ.
Question 13. A random variable X has density function
f x = 2×15!!! cx !" (15 + x !" )!!!! , x > 0

!""
A sample of 200 values of X gave the statistic !!!
log(15 + x!!" ) = 600.What
is the maximum likelihood estimate of c?
Question 14. A random sample $x_1, x_2, \ldots, x_{20}$ is taken from a distribution having the density
function: $f(x) = \frac{k}{2} x^{-1/2} e^{-k x^{1/2}}$, $x > 0$
For the sample 𝑥! = 247,360 𝑎𝑛𝑑 𝑥! !/! = 102.778
Determine the:
a. Maximum likelihood estimate of k.

b. Method of moments estimate for k.
Question 15. The discrete random variable X has the following probability function:
P(X = x) = 0.2 + αx x = -2, -1, 0, 1, 2.
(i) State the possible values that α can take.
(ii) Given a random sample x1 , x2 , ..., xn from this distribution, determine the
method of moments estimate of α and show that this can result in
inadmissible estimates (i.e. estimates outside the range of possible values
of α).
Question 16. Let X be a random variable with cumulative distribution function
$F_X(x) = P(X < x) = 1 - \exp(-x^2/\theta)$, $x > 0$
and let $(X_1, X_2, \ldots, X_n)$ be a random sample from X.
(i) Show that $Y = X^2$ has an exponential distribution.
(ii) (a) Show that the MLE of θ is given by $\hat{\theta} = \frac{1}{n}\sum_{i=1}^{n} X_i^2$.
(b) Show that $\hat{\theta}$ is an unbiased estimator of θ which attains the
Cramer-Rao lower bound on variance.
(c) Using moment generating functions, show that $\frac{2n}{\theta}\hat{\theta} \sim \chi^2_{2n}$.
(iii) The above distribution of X is to be used as a model for claim amounts in a
particular situation. A random sample of 50 such claim amounts (in
appropriate units) gives $\sum X_i^2 = 485.7518$.
(d) Calculate the MLE of θ.
(e) Using the result of (ii)(c) above, calculate an exact 95% confidence
interval for θ.
Question 17. Claim amounts of a certain type are modelled using a normal distribution with an
unknown mean and a known standard deviation σ = £20.
For a random sample of 20 claim amounts all that is known is that 5 of them are
greater than £200.
(i) Let 𝜃 be the probability that a claim amount is greater than £200. Write
down the maximum likelihood estimate of 𝜃.
(ii) Determine 𝜃 in terms of µ and hence calculate the maximum likelihood
estimate of µ.
Question 18. It has been decided to model a claim amount distribution using a gamma
distribution with parameters α = 4 and λ (unknown), that is, with density
$f(x; \lambda) = \frac{1}{6}\lambda^4 x^3 e^{-\lambda x}$, $0 < x < \infty$
A random sample of n claim amounts, X1, X2, ……,Xn is selected and it is
required to estimate the parameter λ.
(i) Determine the method of moments estimator of λ.
(ii) Show that the MLE of λ is same as the method of moments estimator.
Question 19. The number of incomplete insurance proposals Y, in a batch of x proposals, is to
be modelled as a Poisson random variable with mean λx, where λ is unknown.
Data are available from n independent batches of proposals as follows: batch
number i contains $x_i$ proposals of which $y_i$ are incomplete, i = 1, 2, ..., n.
The least squares estimator of λ is the value for which $\sum_{i=1}^{n} (Y_i - E[Y_i])^2$ is minimised.
(i) Show that the least squares estimator of λ is given by
$\tilde{\lambda} = \frac{\sum x_i Y_i}{\sum x_i^2}$
(ii) Determine $\hat{\lambda}$, the maximum likelihood estimator of λ.
(iii) Determine whether neither, one, or both of $\tilde{\lambda}$ and $\hat{\lambda}$ provide unbiased
estimators of λ.
Question 20. The percentage return on an investment of a particular type over a period of one
year is to be modelled as a normal random variable X with mean µ and variance 1.
A potential investor is interested in the chance that the return on such an
investment will exceed 9%.
A random sample of ten such returns have values
7.3, 8.9, 8.3, 6.2, 9.8, 7.7, 9.4 , 7.9, 9.1, and 7.4 .
Calculate the maximum likelihood estimate of θ = P(X > 9).
Question 21. In a quality control test, a random sample of 100 items is selected, of which 90 are
found to be satisfactory. The value of p, the probability of an item being
satisfactory, is unknown.
Write down the probability of observing 90 satisfactory items out of 100 as a
function of p, and thus derive the maximum likelihood estimate of p.
Question 22. A simple model for the movement of a stock price is such that, independently in
each time period, the stock either:
goes up with probability $(\frac{1}{4} - \theta)$; stays the same with probability $(\frac{5}{8} + 2\theta)$;
goes down with probability $(\frac{1}{8} - \theta)$.
(i) Determine the range of admissible values of the parameter 𝜃.
(ii) (a) Calculate the probability that the stock goes down in one time
period, in the case θ =0.1.
(b) Calculate the probability that the stock stays the same for two
consecutive time periods, in the case θ = 0.
(c) Calculate the probability that, in four time periods, the stock goes
up twice and stays the same twice, in the case θ = -0.2.
(iii) Data are collected for 80 consecutive time periods and yield the following
observed frequencies:
change in stock:       up    same   down
no. of time periods:   24    35     21
(a) Write down an expression for L(θ), the likelihood of these data,
and show that $\frac{d}{d\theta}\log L(\theta) = 0$ reduces to the equation
$5120\theta^2 - 468\theta - 95 = 0$
(b) Explain why one of the roots of this quadratic yields the maximum
likelihood estimate of 𝜃 and hence determine this estimate.
Question 23. A random sample of size n is taken from an exponential distribution with
parameter λ, that is, with pdf
$f(x) = \lambda e^{-\lambda x}$, $0 < x < \infty$
(i) Determine the MLE of λ.
Claim sizes for certain policies are modelled using an exponential distribution
with parameter λ. A random sample of such claims results in the value of the
MLE of λ as $\hat{\lambda} = 0.00124$.
A large claim is defined as one greater than £4,000 and the claims manager is
particularly interested in p, the probability that a claim is a large claim.
(ii) Determine $\hat{p}$, the MLE of p, explaining why it is the MLE.
Question 24. The size of claims (in units of £1,000) arising from a portfolio of house contents
insurance policies can be modelled using a random variable X with probability
density function (pdf) given by:
$f_X(x) = \frac{a c^a}{x^{a+1}}$, $x \ge c$
where a > 0 and c > 0 are the parameters of the distribution.
(i) Show that the expected value of X is $E[X] = \frac{ac}{a-1}$, for a > 1.
(ii) Verify that the cumulative distribution function of X is given by
$F_X(x) = 1 - (c/x)^a$, $x \ge c$
Suppose that for the distribution of claim sizes X, it is known that c = 2.5, but a is
unknown and needs to be estimated given a random sample x1, x2, ..., xn.
(iii) Show that the MLE of a is given by:
$\hat{a} = \frac{n}{\sum_{i=1}^{n} \log(x_i/2.5)}$
(iv) Derive the asymptotic variance of the MLE $\hat{a}$, and hence determine its
approximate asymptotic distribution.
Consider a sample of 30 observations from this distribution, for which
$\sum_{i=1}^{30} \log x_i = 32.9$
(v) Calculate the MLE $\hat{a}$ in this case, together with an approximate 95%
confidence interval for a.
In the current year, claim sizes are assumed to follow the distribution of X with
a =6, c = 2.5. Inflation for the following year is expected to be 5%.
(vi) Calculate the probability that the size of a claim arising from this portfolio
in the following year will exceed £4,000.
Question 25. A life insurance company runs a statistical analysis of mortality rates. The
company considers a population of 100,000 individuals. It assumes that the
number of deaths X during one year has a Poisson distribution with expectation
E[X] = µ. Over four years the company has observed the following realisations of
X (number of deaths).
Year 1 2 3 4
Number of deaths (per 100,000 lives) 1,140 1,200 1,170 1,190
The maximum likelihood estimator for the parameter µ of the Poisson distribution
is given by $\bar{X}$.
(i) Obtain the maximum likelihood estimate of the parameter µ using these
data.
A statistician suggests using a Poisson distribution for the number of deaths per
year in each group, where the parameter μ depends on the middle age in that
group. Under the suggested model the number of deaths in the group with middle
age ti is given by Xi ~ Poisson(μi) with μi = wti, where ti is the middle age of the
group that the individual belongs to at the time of death.
(ii) Derive a maximum likelihood estimator for the parameter w and estimate
the value of w from the data in the above table if
$\sum x_i = 1179$ and $\sum t_i = 160$.
Question 26. The number of claims made by each policyholder in a certain class of business is
modeled as having a Poisson distribution with mean λ.
(i) Derive an expression for the probability, p, that a policyholder in this class
has made at least one claim.
The claims records of 20 randomly chosen policyholders were examined and the
number of policyholders that made at least one claim in a year, X, was recorded.
(ii) (a) State the distribution of the random variable X and its parameters.
(b) Derive an expression for the maximum likelihood estimator of the
probability p given in (i) using your answer in (ii)(a).
(iii) Show that, in the case X = 5, the maximum likelihood estimate (MLE) of p
is $\hat{p} = 0.25$ and hence calculate the MLE of λ.
It is now found that of the five policyholders who had made at least one claim
there were four who had made exactly one claim and one who had made two
claims.
(iv) Calculate the MLE of λ given this additional information.
Question 27. An experiment has three possible outcomes (A, B, C) and a model states that the
probabilities of these outcomes are θ, $\theta^2$, and $1 - \theta - \theta^2$ respectively, for some
suitable value of θ > 0.
Let $n_A$, $n_B$, and $n_C$ be the number of occurrences of outcomes A, B, and C
respectively in n (= $n_A + n_B + n_C$) repetitions of the experiment. Let log L(θ)
represent the loglikelihood function, and let $U(\theta) = \frac{d \log L(\theta)}{d\theta}$.
(i) (a) Show that
$U(\theta) = \frac{n_A + 2n_B}{\theta} - \frac{n_C(1 + 2\theta)}{1 - \theta - \theta^2}$
(b) Hence find a quadratic equation whose solution gives the
maximum likelihood estimate of θ.
(ii) (a) Find an expression for $\frac{dU(\theta)}{d\theta}$.
(b) Hence show that
$E\left[-\frac{dU(\theta)}{d\theta}\right] = \frac{n(1 + 4\theta - \theta^2)}{\theta(1 - \theta - \theta^2)}$
The results of 100 repetitions of the experiment show that outcome A occurred 51
times, outcome B occurred 16 times, and outcome C occurred 33 times.
(iii) (a) Show that the maximum likelihood estimate of θ is $\hat{\theta} = 0.4525$.
(b) Calculate an estimate of the asymptotic standard error of $\hat{\theta}$.
(c) Find an approximate 95% confidence interval for θ.
Question 28. A random sample of size n is taken from a gamma distribution with parameters
α = 8 and λ = 1/θ. The sample mean is $\bar{X}$ and θ is to be estimated.
(i) Determine the method of moments estimator (MME) of θ.
(ii) Find the bias of the MME determined in part (i).
(iii) (a) Determine the mean square error of the MME of θ.
(b) Comment on the efficiency of the MME of θ based on your answer
in part (iii)(a).
Question 29. A regulator wishes to inspect a sample of an insurer’s claims. The insurer
estimates that 10% of policies have had one claim in the last year and no policies
had more than one claim. All policies are assumed to be independent.
(i) Determine the number of policies that the regulator would expect to
examine before finding 5 claims.
On inspecting the sample claims, the regulator finds that actual payments
exceeded initial estimates by the following amounts:
£35 £120 £48 £200 £76
(ii) Find the mean and variance of these extra amounts.
(iii) It is assumed that these amounts follow a gamma distribution with
parameters α and λ. Estimate these parameters using the method of
moments.
Question 30. The random variable X has a distribution with probability density function given
by
$f(x) = \frac{2x}{\theta^2}$ for $0 \le x \le \theta$, and $f(x) = 0$ otherwise,
where θ is the parameter of the distribution.
(i) Derive expressions in terms of θ for the expected value and the variance of
X.
Suppose that X1, X2, ..., Xn is a random sample, with mean $\bar{X}$, from the distribution
of X.
(ii) Show that the estimator $\hat{\theta} = \frac{3\bar{X}}{2}$ is an unbiased estimator of θ.
Question 31. An actuary is considering statistical models for the observed number of claims, X,
which occur in a year on a certain class of non-life policies. The actuary only
considers policies on which claims do actually arise. Among the considered
models is a model for which
$P(X = x) = \frac{-1}{\log(1 - \theta)} \cdot \frac{\theta^x}{x}$, x = 1, 2, 3, ...
where θ is a parameter such that 0 < θ < 1.
Suppose that the actuary has available a random sample X1, X2, ..., Xn with
sample mean $\bar{X}$.
(i) Show that the method of moments estimator (MME), $\tilde{\theta}$, satisfies the
equation $\bar{X}(1 - \tilde{\theta}) \log(1 - \tilde{\theta}) + \tilde{\theta} = 0$.
(ii) (a) Show that the log likelihood of the data is given by
$l(\theta) \propto -n \log\{-\log(1 - \theta)\} + \sum_{i=1}^{n} x_i \log(\theta)$
(b) Hence verify that the maximum likelihood estimator (MLE) of θ is
the same as the MME.
(iii) Suggest two ways in which the MLE of θ can be computed when a
particular data set is given.
Question 32. In order to estimate a certain probability of success, a single observation is taken
from the binomial random variable X ~ Bin(20, p).
(i) Write down an expression for the mean square error of the maximum
likelihood estimator $\hat{p} = \frac{X}{20}$ and evaluate this mean square error at p = 0.5.
(ii) Determine an expression for the mean square error of the estimator
$\tilde{p} = \frac{X + 1}{21}$
and evaluate this mean square error at p = 0.5.
(iii) Comment briefly on the comparison of $\hat{p}$ and $\tilde{p}$ as estimators in the case p = 0.5.
Question 33. Let X1, X2, ..., Xn be a random sample from a distribution with parameter θ and density
function:
$f(x) = \frac{2x}{\theta^2}$, $0 \le x \le \theta$
Suppose that $x = (x_1, x_2, \ldots, x_n)$ is a realization of X1, X2, ..., Xn.
(i) (a) Derive the likelihood function L(θ; x) and produce a rough sketch of its
graph.
(b) Use the graph produced in part (i)(a) to explain why the maximum
likelihood estimate of θ is given by x(n) = max{x1, x2, ..., xn}.
Let X(n) = max{X1, X2, ..., Xn} be the estimator of θ, that is the random variable
corresponding to x(n).
(ii) (a) Show that the cumulative distribution function of the estimator X(n) is
given by:
$F_{X_{(n)}}(x) = \left(\frac{x}{\theta}\right)^{2n}$ for $0 \le x \le \theta$
(b) Hence, derive the probability density function of the estimator X(n).
(c) Determine the expected value E(X(n)) and the variance V(X(n)).
(d) Show that the estimator $\frac{2n+1}{2n} X_{(n)}$ is an unbiased estimator of θ.
(iii) (a) Derive the mean square error of the estimator given in part (ii)(d).
(b) Comment on the consistency of this estimator.
Question 34. Consider the following discrete distribution with an unknown parameter p for the
distribution of the number of policies with 0, 1, 2, or more than 2 claims per year in a
portfolio of n independent policies.
number of claims    0     1     2       more than 2
probability         2p    p     0.25p   1 - 3.25p
We denote by X0 the number of policies with no claims, by X1 the number of policies
with one claim and by X2 the number of policies with two claims per year. The
random variable X = X0 + X1 + X2 is then the number of policies with at most two
claims.
(i) Derive an expression for the maximum likelihood estimator $\hat{p}$ of parameter p
in terms of X and n.
(ii) Show that the estimator obtained in part (i) is unbiased.
The following frequencies are observed in a portfolio of n = 200 policies during the
year 2012:
number of claims    0     1     2       more than 2
observed frequency  123   58    13      6
A statistician proposes that the parameter p can be estimated by $\tilde{p}$ = 58/200 = 0.29 since
p is the probability that a randomly chosen policy leads to one claim per year.
(iii) Estimate the parameter p using the estimator derived in part (i).
(iv) Explain why your answer to part (iii) is different from the proposed estimated
value of 0.29.
An alternative model is proposed where the probability function has the form
number of claims    0     1     2       more than 2
probability         p     2p    0.25p   1 - 3.25p
(v) Explain how the maximum likelihood estimator suggested in part (i) needs to be
adapted to estimate the parameter p in this new model.
(vi) Suggest a suitable test to use to make a decision about which of the two models
should be used based on empirical data.
Question 35. Let X1, X2, …, X6 be a random sample from a population following a Gamma(2,1)
distribution. Consider the following two estimators of the mean of this distribution:
! !
𝜃! = 𝑋 and 𝜃! = !" 𝑋! + 𝑋! + 𝑋! + !" 𝑋! + 𝑋! + 𝑋!
where $\bar{X}$ is the mean of the sample.
(i) Determine the sampling distribution of $\bar{X}$ using moment generating
functions.
(ii) Derive the bias of each estimator $\hat{\theta}_1$ and $\hat{\theta}_2$.
(iii) Derive the mean square error of each estimator $\hat{\theta}_1$ and $\hat{\theta}_2$.
(iv) Compare the efficiency of the two estimators $\hat{\theta}_1$ and $\hat{\theta}_2$.
Question 36. Let (X1, X2, ..., Xn) be a random sample from a uniform distribution on the interval
(−θ, θ), where θ is an unknown positive number.
A particular sample of size 5 gives values 0.87, -0.43, 0.12, -0.92, and 0.58.
(i) Draw a rough graph of the likelihood function L(𝜃) against 𝜃 for this sample.
(ii) State the value of the maximum likelihood estimate of 𝜃.
Question 37. Ayush and Shriya play a game as follows:
• Ayush selects a positive integer θ∈{1, 2, 3 … } at random and writes it

down on a piece of paper
• Shriya has two chances to ask Ayush what the number θ is.
• Each time Ayush tosses a biased coin secretly, and reports to Shriya the
number θ - 1 if a head comes up and the number θ + 1 if a tail comes up.
• Shriya has to guess the true value of θ based on the two numbers reported
by Ayush after two independent tosses of the coin.
It is known to Shriya that the coin has a probability 1/3 of showing a head. Let X
and Y be the two numbers which Ayush reported.
Shriya considers the following two estimators of θ:
$T = \frac{1}{2}(X + Y) - \frac{1}{3}$
$S = \frac{1}{2}(X + Y)$ if $X \ne Y$, and $S = X - \frac{1}{3}$ if $X = Y$
a) Show that:
$P(S = \theta) = \frac{4}{9}$, $P(S = \theta - \frac{4}{3}) = \frac{1}{9}$, $P(S = \theta + \frac{2}{3}) = \frac{4}{9}$
b) Calculate the bias of S as an estimator of θ. Is S unbiased?
c) Calculate the mean squared error of S as an estimator of θ.
It can be shown that T is an unbiased estimator of θ with variance $\frac{4}{9}$.
d) Which of the estimators T or S should Shriya use then if she wants to minimize
the error of her estimation?
Question 38. A certain type of insurance policy has a claim rate of λ per year and the cover
ceases and the policy expires after the first claim. Accordingly the duration of a
policy is modelled by an exponential distribution with density function
$\lambda e^{-\lambda x}$, $x > 0$.
A company has data on (m + n) policies which have expired and which may be
assumed to be independent. Of these, m policies had duration less than 5 years
and n policies had duration greater than or equal to 5 years.
(i) An investigator makes note of the actual durations, x1, ..., xn, of the latter
group of n policies, but ignores the former group without even noting the
value of m.
(a) Explain why the xi's come from a truncated exponential
distribution with density function $f(x) = k\lambda e^{-\lambda x}$, $x > 5$,
and show that $k = e^{5\lambda}$.
(b) Write down the likelihood for the data from the point of view of
this investigator and hence show that the maximum likelihood
estimate (MLE) of λ is given by $\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i - 5n}$
(c) The data yield the values: n = 10 and $\sum x_i = 71$. Calculate this
investigator's MLE of λ.
(ii) A second investigator ignores the actual policy durations and simply notes
the values of m and n.
(a) Write down the likelihood for this information and hence show that
the resulting MLE of λ is given by $\hat{\lambda} = \frac{1}{5}\log\left(\frac{m+n}{n}\right)$
(b) The same data as in part (i) yield the values: m = 120 and n = 10.
Calculate this investigator's MLE of λ.
(iii) The two investigators decide to pool their data, and so have the
information that there are m policies with duration less than 5 years, and n
policies with actual durations x1, ..., xn.
(a) Explain why the likelihood for this joint information is given by
$L(\lambda) = (1 - e^{-5\lambda})^m \prod_{i=1}^{n} \lambda e^{-\lambda x_i}$
and determine an equation, the solution of which will lead to the MLE
of λ.
(b) Given that this leads to an MLE of λ equal to 0.508, comment on the
comparison of the three MLEs.
Question 39. Suppose that $X_1, X_2, \ldots, X_n$ are independent and identically distributed Poisson(λ)
random variables.
(a) Find the maximum likelihood estimator of λ.
(b) Suppose that rather than observing the random variables precisely, only the
events "$X_i = 0$" or "$X_i > 0$" for i = 1, 2, ..., n are observed. Find the maximum
likelihood estimator of λ under the new observation scheme.
Question 40. (i) Define the following terms:

• Estimator
• Unbiasedness
• Mean Square Error
• Consistency
(ii) Mention any two methods of estimation.
(iii) Trace the steps for finding the MLE in straightforward two parameter cases.
(iv) Explain briefly the importance of the CRLB, in terms of efficiency and
confidence intervals.
ANSWERS
Ans.1. (a) 𝜃 = 0.3970 (b) 𝜃 = 1.087
!(!!!)
Ans.2. (i) 𝑀 = 2/𝜃 ! (ii)𝐸 𝑋 = 𝜃/3, 𝑉 𝑋 = 𝜃 ! /18 (iii) 𝐿 𝜃 = !!
𝜃 ≥ 𝑥
(iv) 𝜃 = 2𝑋 (MLE) 𝜃 = 3𝑋 (MOM) (v) MLE → Biased, MOM → unbiased
!! !!
Ans.3. (i) 𝜆 = 𝑋 (ii) 𝐸 𝜆 = 𝜆 (iii) CRLB = !
(iv) 𝑉(𝜆) = !
! !!
Ans.4. (i) 𝜆 = ! = 0.0029 (ii) CRLB = ! (iii) 𝜆 = 0.0024
!
Ans.5. (i) 𝐿 𝜃 = 2! /𝜃 !! !!! 𝑥! (ii) 𝐸(𝜃) < 𝜃 𝜃 is biased
!!! !! !!
Ans.6. (a) 𝐸 !
= 𝜃, (b) 𝜃 = !"# !!
−1 (c) CRLB is !
Ans.7. (a) 𝑘 = 2 (b) 𝜃 = max (𝑋! )
Ans.8. (b) 𝜆 = 0.0101 (c) CRLB = 𝜆! /7 = 0.000015
Ans.9. K = 10/11
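Ans.10. (i) $\hat{p} = X/n$ (ii) (a) CRLB = $p(1-p)/n$ (b) Var($\hat{p}$) = CRLB (c) $\hat{p} \sim N(p, p(1-p)/n)$
Ans.11. (i) $\hat{\lambda} = \bar{X}$ (ii) CRLB = $\lambda/n$ (iii) (a) E($\hat{\lambda}$) = λ, V($\hat{\lambda}$) = $\lambda/n$ (b) $\hat{\lambda} \sim N(\lambda, \lambda/n)$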
Ans.12. 𝜆 = 0.005154
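The figure in Ans.12 can be checked numerically. The sketch below assumes the usual censored-sample likelihood for Question 12: each of the 90 fully observed claims contributes the exponential density, and each of the 10 claims above the retention contributes only the survival probability at 1,000. The variable names are illustrative.

```python
# Figures taken from Question 12.
n_uncensored = 90        # claims observed in full (at or below the retention)
mean_uncensored = 82.9   # average size of those 90 claims
n_censored = 10          # claims known only to exceed the retention
retention = 1000.0

# log L = n_uncensored*log(lam) - lam*(sum of observed claims) - lam*retention*n_censored
# Setting d(log L)/d(lam) = 0 gives lam_hat = n_uncensored / total_exposure.
total_exposure = n_uncensored * mean_uncensored + n_censored * retention
lam_hat = n_uncensored / total_exposure

print(round(lam_hat, 6))
```

Note that only the uncensored claims appear in the numerator, while every claim (capped at the retention) contributes to the denominator.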
Ans.13. 𝑐 = 3.425
Ans.14. (a) 𝑘 = 0.1946 (b) 𝑘 = 0.3957
Ans.15. (i) −0.1 ≤ 𝛼 ≤ 0.1

(ii) The method of moments estimate is $\bar{X}/10$. As $\bar{X}$ can take any value between -2 and
+2, the method of moments estimate can take any value between -0.2 and +0.2.
Thus it can be outside the range (-0.1, 0.1).
Ans.16. (iii) (a) 9.718 (b) 95% CI for 𝜃 = (7.50, 13.1)
Ans.17. (i) 𝜃 = 0.25 (ii) 𝜇 = 186.52
Ans.18. (i) $\hat{\lambda} = 4/\bar{X}$
Ans.19. (ii) $\hat{\lambda} = \frac{\sum Y_i}{\sum x_i}$
(iii) Both are unbiased
Ans.20. $\hat{\theta} = 1 - \Phi(9 - \bar{x}) = 1 - \Phi(0.8) = 0.2119$
Ans.21. P(90 satisfactory items out of 100) = $\binom{100}{90} p^{90} (1 - p)^{10}$ and $\hat{p} = 0.9$
Ans.22. (i) $-\frac{5}{16} \le \theta \le \frac{1}{8}$ (ii)(a) 0.025 (b) 0.391 (c) 0.062
(iii) (a) $L(\theta) = (\frac{1}{4} - \theta)^{24} (\frac{5}{8} + 2\theta)^{35} (\frac{1}{8} - \theta)^{21}$
(b) 0.189 is inadmissible and -0.0980 is admissible, therefore the MLE is $\hat{\theta} = -0.0980$
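The root selection in Ans.22 (iii)(b) can be illustrated with a short script. It is a sketch that assumes the three probabilities are 1/4 − θ, 5/8 + 2θ and 1/8 − θ (the values implied by the printed answers); the helper name `admissible` is hypothetical.

```python
import math

# Quadratic from Question 22 (iii)(a): 5120*theta^2 - 468*theta - 95 = 0
a, b, c = 5120.0, -468.0, -95.0
disc = b * b - 4 * a * c
roots = [(-b + sign * math.sqrt(disc)) / (2 * a) for sign in (1, -1)]

def admissible(theta):
    """True if all three model probabilities lie in [0, 1] (assumed model)."""
    probs = (0.25 - theta, 0.625 + 2 * theta, 0.125 - theta)
    return all(0.0 <= p <= 1.0 for p in probs)

# Keep only the root(s) that give a valid probability distribution.
theta_candidates = [r for r in roots if admissible(r)]
print([round(r, 4) for r in roots], [round(r, 4) for r in theta_candidates])
```

Only one root survives the admissibility check, which is why it can be taken as the maximum likelihood estimate.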
Ans.23. (i) $\hat{\lambda} = 1/\bar{X}$ (ii) $\hat{p} = 0.0070$
Ans.24. (iv) $\hat{a} \sim N(a, a^2/n)$ approximately (v) (3.560, 7.528) (vi) 0.0799
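The numbers in Ans.24 (v) and (vi) can be reproduced as follows. This is a minimal sketch using the MLE formula from part (iii), the asymptotic variance a²/n with the estimate plugged in, and the assumption that 5% inflation simply scales every claim by 1.05.

```python
import math

# Part (v): data from Question 24
n = 30
sum_log_x = 32.9
c = 2.5

a_hat = n / (sum_log_x - n * math.log(c))   # n / sum(log(x_i / c))
se = a_hat / math.sqrt(n)                   # sqrt(a^2 / n) evaluated at a_hat
ci = (a_hat - 1.96 * se, a_hat + 1.96 * se)

# Part (vi): a = 6, c = 2.5, claims inflated by 5%; threshold is 4 (units of 1,000)
a, inflation, threshold = 6, 1.05, 4.0
# P(inflation * X > threshold) = (c * inflation / threshold)^a for a Pareto claim
p_exceed = (c * inflation / threshold) ** a

print(round(a_hat, 3), tuple(round(v, 3) for v in ci), round(p_exceed, 4))
```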
Ans.25. (i) 1175 (ii) $\hat{w} = 7.36875$
Ans.26. (i) $p = 1 - e^{-\lambda}$ (ii)(a) $X \sim Bin(20, p)$ (b) $\hat{p} = X/20$ (iii) $\hat{\lambda} = 0.288$ (iv) $\hat{\lambda} = 0.3$
Ans.27. (i)(b) $(n_A + 2n_B + 2n_C)\theta^2 + (n_A + 2n_B + n_C)\theta - (n_A + 2n_B) = 0$
(ii)(a) $\frac{dU(\theta)}{d\theta} = -\frac{n_A + 2n_B}{\theta^2} - \frac{n_C(3 + 2\theta + 2\theta^2)}{(1 - \theta - \theta^2)^2}$
(iii)(b) 0.0244 (c) (0.405, 0.500)
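The values in Ans.27 (iii) follow from the quadratic in (i)(b) and the expected information in (ii)(b). A sketch of the arithmetic, with illustrative variable names:

```python
import math

n_a, n_b, n_c = 51, 16, 33
n = n_a + n_b + n_c

# Quadratic from (i)(b): qa*theta^2 + qb*theta + qc = 0
qa = n_a + 2 * n_b + 2 * n_c
qb = n_a + 2 * n_b + n_c
qc = -(n_a + 2 * n_b)
theta_hat = (-qb + math.sqrt(qb * qb - 4 * qa * qc)) / (2 * qa)  # positive root

# Expected information from (ii)(b), evaluated at theta_hat
info = n * (1 + 4 * theta_hat - theta_hat ** 2) / (
    theta_hat * (1 - theta_hat - theta_hat ** 2)
)
se = 1 / math.sqrt(info)                      # asymptotic standard error
ci = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)

print(round(theta_hat, 4), round(se, 4), tuple(round(v, 3) for v in ci))
```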
Ans.28. (i) MME = $\bar{X}/8$ (ii) Bias = 0 (iii) (a) MSE($\bar{X}/8$) = $\theta^2/(8n)$
(iii)(b) The MME gets more efficient (its MSE gets smaller) as the sample size increases
Ans.29. (i) Mean = 5/0.1 = 50 (ii) Mean = 95.8, Variance = 4454.2 (iii) α=2.06 and
λ=0.0215
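The method of moments figures in Ans.29 can be checked with the standard library. The sketch assumes, as the printed variance of 4454.2 implies, that the sample variance with divisor n − 1 is used.

```python
import statistics

extras = [35, 120, 48, 200, 76]          # amounts from Question 29

mean = statistics.mean(extras)
var = statistics.variance(extras)        # sample variance (divisor n - 1)

# Gamma(alpha, lam): mean = alpha/lam, variance = alpha/lam^2
lam_hat = mean / var
alpha_hat = mean * lam_hat               # equivalently mean^2 / var

print(mean, round(var, 1), round(alpha_hat, 2), round(lam_hat, 4))
```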
Ans.30. (i) Mean = $2\theta/3$, Variance = $\theta^2/18$
Ans.31. (iii) The equation above needs to be solved numerically. Alternatively, the likelihood
(or log-likelihood) function can be plotted and the maximum can be identified
from the graph.
Ans.32. (i) MSE = $\frac{p(1-p)}{20}$ = 0.0125 (ii) MSE = $\frac{20p(1-p)}{21^2} + \frac{(1-p)^2}{21^2}$ = 0.0119
(iii) Even though $\hat{p}$ is the MLE and is unbiased, $\tilde{p}$ is a more efficient estimator (for p =
0.5), having a smaller mean square error.
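The comparison in Ans.32 can be made concrete by coding the two mean square errors. This sketch assumes the second estimator is (X + 1)/21, which is the form implied by the printed MSE; the function names are illustrative.

```python
def mse_mle(p, n=20):
    """MSE of X/n: unbiased, so the MSE equals the variance p(1-p)/n."""
    return p * (1 - p) / n

def mse_alt(p, n=20):
    """MSE of (X+1)/(n+1): variance plus squared bias."""
    variance = n * p * (1 - p) / (n + 1) ** 2
    bias = (1 - p) / (n + 1)          # E[(X+1)/(n+1)] - p
    return variance + bias ** 2

print(round(mse_mle(0.5), 4), round(mse_alt(0.5), 4))
```

Evaluating both functions over a grid of p values shows where each estimator is preferable; the biased estimator does not win for every p.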
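Ans.33. (i) (a) $L = \frac{2^n x_1 x_2 \cdots x_n}{\theta^{2n}}$ (b) $\hat{\theta} = x_{(n)} = \max\{x_1, x_2, \ldots, x_n\}$
(ii) (b) $f_{X_{(n)}}(x) = \frac{2n x^{2n-1}}{\theta^{2n}}$, $0 \le x \le \theta$ (c) $E[X_{(n)}] = \frac{2n\theta}{2n+1}$, $V[X_{(n)}] = \frac{n\theta^2}{(n+1)(2n+1)^2}$
(iii) (a) MSE = $\frac{\theta^2}{4n(n+1)}$ (b) We have MSE → 0 as n → ∞, therefore the estimator is
consistent.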
Ans.34. (i) $\hat{p} = \frac{X}{3.25n}$ (iii) 0.2985
(iv) The MLE in part (iii) takes the structure of the entire probability function into
account while the estimator 58/200 only considers the number of policies with
one claim.
(v) No change required, since the MLE $\hat{p}$ turns out to depend only on the total
number of policies with fewer than three claims.
(vi) $\chi^2$ test
Ans.35. (i) Gamma(12, 6) (ii) Bias($\hat{\theta}_1$) = 0 and Bias($\hat{\theta}_2$) = 0
(iii) MSE($\hat{\theta}_1$) = 0.333 and MSE($\hat{\theta}_2$) = 0.547
(iv) $\hat{\theta}_1$ has smaller MSE, therefore is more efficient than $\hat{\theta}_2$.
Ans.36. $L(\theta) = \left(\frac{1}{2\theta}\right)^n$ for θ ≥ max |xi|. So, as θ increases from zero, L(θ) is zero until θ reaches the largest
observation in absolute value, i.e. max |xi|, i = 1, 2, ..., n. For the data given, this value
is 0.92.
Ans.37. (b) Bias = 4/27 (c) 32/81
(d) S has a lower mean square error than T, so Shriya should use the estimator S for
guessing the value of θ, in spite of it being a biased estimator unlike T.
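The bias and mean square error quoted in Ans.37 can be verified by enumerating the four possible pairs of coin tosses. The sketch works with the offsets of the reported numbers from θ and assumes T = (X + Y)/2 − 1/3 and S = (X + Y)/2 when X ≠ Y, S = X − 1/3 when X = Y; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

p_head = Fraction(1, 3)
# Offset of a reported number from theta: -1 on a head, +1 on a tail
offsets = {-1: p_head, +1: 1 - p_head}

def s_error(dx, dy):
    """Error S - theta for reported offsets dx, dy."""
    return Fraction(dx + dy, 2) if dx != dy else dx - Fraction(1, 3)

def t_error(dx, dy):
    """Error T - theta for reported offsets dx, dy."""
    return Fraction(dx + dy, 2) - Fraction(1, 3)

def bias_and_mse(error):
    """Exact bias and MSE by summing over the four toss outcomes."""
    pairs = list(product(offsets, repeat=2))
    bias = sum(offsets[dx] * offsets[dy] * error(dx, dy) for dx, dy in pairs)
    mse = sum(offsets[dx] * offsets[dy] * error(dx, dy) ** 2 for dx, dy in pairs)
    return bias, mse

print(bias_and_mse(s_error), bias_and_mse(t_error))
```

Using exact fractions avoids rounding, so the comparison of 32/81 against 4/9 = 36/81 is immediate.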
Ans. 38. (i)(a) The xi's are known to be such that xi > 5, and therefore have a density which is a scaled
form of $\lambda e^{-\lambda x}$ for x > 5.
The scaling constant k is such that $\int_5^{\infty} k\lambda e^{-\lambda x}\,dx = 1$
(c) 0.476
(ii) (b) 0.513
(iii) (b) All three are re-assuringly close. The pooled estimate is between the first two (as
expected, but it is closer to 0.513).
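The three estimates compared in Ans.38 can be computed side by side. The first two use the closed forms from parts (i)(b) and (ii)(a); the pooled estimate has no closed form, so the sketch below solves the score equation by bisection (one possible numerical method, chosen here for simplicity).

```python
import math

m, n, sum_x, cutoff = 120, 10, 71.0, 5.0

lam_truncated = n / (sum_x - cutoff * n)        # investigator 1
lam_counts = math.log((m + n) / n) / cutoff     # investigator 2

def pooled_score(lam):
    """d/d(lam) of m*log(1 - exp(-5*lam)) + n*log(lam) - lam*sum_x."""
    e = math.exp(-cutoff * lam)
    return m * cutoff * e / (1 - e) + n / lam - sum_x

# The score is decreasing in lam, positive at 0.1 and negative at 2.0.
lo, hi = 0.1, 2.0
for _ in range(100):
    mid = (lo + hi) / 2
    if pooled_score(mid) > 0:
        lo = mid
    else:
        hi = mid
lam_pooled = (lo + hi) / 2

print(round(lam_truncated, 3), round(lam_counts, 3), round(lam_pooled, 3))
```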
Ans.39. (i) $\hat{\lambda} = \bar{X}$
(ii) $\hat{\lambda} = -\log\left(\frac{n-m}{n}\right)$ where m is the number of observations greater than 0
Ans.40. (i) An estimator is a rule, as a function of the random sample, often expressed as a
formula that tells how to calculate the value of an estimate.
• If we have a random sample $X = (X_1, X_2, \ldots, X_n)$ from a distribution with an
unknown parameter θ and g(X) is an estimator of θ, it seems desirable that
E[g(X)] = θ. This is the property of unbiasedness.
• The MSE of an estimator g(X) for θ is defined by MSE[g(X)] = E[(g(X) − θ)²] = Var[g(X)] + (Bias[g(X)])²
• It is also desirable that an estimator gets better as the sample size increases i.e. it
is desirable that MSE → 0 as n → ∞. This property is known as consistency.
(ii) Method of moments and MLE
(iii) Steps for finding the maximum likelihood estimator in two parameter cases are:
• Write down the likelihood function, L
• Find log L and simplify the resulting expression
• Partially differentiate log L with respect to each parameter to be estimated
• Set the derivatives equal to zero
• Solve these equations simultaneously.
• Check the condition that the Hessian matrix i.e. the matrix of second
derivatives, is negative definite.
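The steps above can be sketched for a normal sample in which both the mean and the variance are unknown. The data are the ten returns from Question 20, borrowed purely for illustration (in that question the variance is actually known), and the function names are hypothetical.

```python
import math

# Illustrative data only: treated here as N(mu, var) with BOTH parameters unknown.
data = [7.3, 8.9, 8.3, 6.2, 9.8, 7.7, 9.4, 7.9, 9.1, 7.4]
n = len(data)

def log_lik(mu, var):
    """Log-likelihood of N(mu, var) for the sample (steps 1 and 2)."""
    ss = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * var) - ss / (2 * var)

def score(mu, var):
    """Partial derivatives of log L with respect to mu and var (step 3)."""
    ss = sum((x - mu) ** 2 for x in data)
    d_mu = sum(x - mu for x in data) / var
    d_var = -n / (2 * var) + ss / (2 * var ** 2)
    return d_mu, d_var

# Steps 4 and 5: setting both derivatives to zero and solving simultaneously
mu_hat = sum(data) / n
var_hat = sum((x - mu_hat) ** 2 for x in data) / n

# Step 6: at the solution the Hessian is diagonal with both entries negative,
# so it is negative definite and the stationary point is a maximum.
hessian_diag = (-n / var_hat, -n / (2 * var_hat ** 2))

print(round(mu_hat, 3), round(var_hat, 3), score(mu_hat, var_hat))
```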
(iv) The CRLB provides a lower bound for the variance of an unbiased estimator as a
function of the true parameter value. It can be used to compare the efficiency of
different estimators.
It also provides an approximate value for the variance of the MLE of a parameter
when the sample size is large. Hence, it may be used to obtain approximate
confidence intervals.
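As an illustration of the last point, the sketch below builds an approximate 95% confidence interval for a Poisson mean from the CRLB, using the four yearly death counts of Question 25 as example data. It assumes the large-sample normal approximation and plugs the estimate into the bound µ/n; with only four observations it is a rough illustration rather than a recommended analysis.

```python
import math

deaths = [1140, 1200, 1170, 1190]   # data from Question 25
n = len(deaths)

mu_hat = sum(deaths) / n            # MLE of the Poisson mean
crlb = mu_hat / n                   # CRLB mu/n with mu replaced by its estimate
se = math.sqrt(crlb)
ci = (mu_hat - 1.96 * se, mu_hat + 1.96 * se)

print(mu_hat, tuple(round(v, 1) for v in ci))
```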
ASSIGNMENT - 7
CONFIDENCE INTERVAL AND HYPOTHESIS TESTING
Question 1. A market analyst is investigating the experience of two insurance companies A
and B with regard to a particular class of insurance business (which is written by
both companies). The analyst has access to a limited amount of information: a
random sample of 10 recent claim amounts from company A and a random
sample of 8 recent claim amounts from B. The data summaries are given below:
Company A: n = 10, $\sum x = 322.8$, $\sum x^2 = 11772.90$
Company B: n = 8, $\sum x = 195.7$, $\sum x^2 = 5308.67$
a) The analyst assumes that the variance of the amount is the same for both
the companies. Perform a suitable test to justify this assumption.
b) Calculate a 95% two sided confidence interval for the difference between
the mean claim amounts for all such business for the two companies.
c) Can it be concluded that the mean claim amount for all such business is
the same for two companies? Use a 5% level of significance.
Question 2. Consider the following testing problem. A random variable X has the following
distribution under $H_0$ and $H_1$.
X:             17     210
$H_0$: $P_0$:  0.3    0.7
$H_1$: $P_1$:  0.8    0.2
$H_0$ is rejected with probability 'a' when X = 17 and with probability 'b' when X =
210. Find 'a' and 'b' such that P(Type I error) = 1.6/3 and P(Type II error) =
0.4/3.
Question 3. (a) A tax preparation firm is interested in comparing the quality of work at two of
its regional offices. Out of 250 tax returns from office A, 35 have errors, whereas
out of 300 returns filed in office B, 27 have errors.
Test the hypothesis that the proportions of erroneous returns are equal at the 1% level.
(b) In a random sample of 100 articles taken from a large batch of articles, 8 are
found to be defective. Obtain a 95% confidence interval for the true proportion of
defectives in the batch.
Question 4. An insurance company has clients for its automobile policies in two regions A
and B. The company believes that the average claim amounts in both the areas are
not equal. A sample of 8 automobiles for which claims have been made is
selected at random from each of the two areas. The claim amounts (in 000’s) are
as follows.
Automobile no.:   1    2    3    4    5    6    7    8
Area A (x):       49   53   51   52   47   50   52   53
Area B (y):       52   55   52   53   50   54   54   53
$\sum x = 407$, $\sum y = 423$, $\sum x^2 = 20732$, $\sum y^2 = 22383$
i. Plot the data and comment on the normality assumption.
ii. Carry out a test to decide whether there is evidence of difference between
the variability of the claim amounts within the two areas.
iii. Perform a t-test to investigate the company’s belief that the mean claim
amounts for area B and for area A are not the same.
iv. Calculate a two sided 95% confidence interval for the difference between
the mean claim amounts.
Question 5. An educationist is investigating the intelligence level of school students. A sample
of 400 of such students gives the results as below.
Intelligence level
Gender   Intelligent   Dull   Total
Boys     75            82     157
Girls    102           141    243
Total    177           223    400
i. Carry out a test to find whether there is any association between
intelligence level and gender.
ii. The educationist again sub divides his sample of 400 students according to
their place of living and it was found that 200 are from rural and 200 are
from urban areas. The results for the rural category of students are as
follows.
Intelligence level
Gender   Intelligent   Dull   Total
Boys     49            29     78
Girls    31            91     122
Total    80            120    200
Carry out separate tests for association between intelligence level and gender
according to place of domicile. Comment briefly on your results.
Question 6. A simple random sample of 50 items resulted in a sample mean of 32 and a
sample standard deviation of 6. Obtain a 95% confidence interval for the population
mean.
Question 7. A municipal corporation asks a random sample of 800 people whether they are for
or against a ban on smoking in public places. Each person was classified by
smoking habit and educational level. The results of the sample survey are detailed
in the following table.
              For ban                 Against ban
              Illiterate   Literate   Illiterate   Literate
Smokers       177          83         123          67
Non-smokers   73           137        77           63
Investigate whether or not there is an association between smoking habit and
opinion on the ban on smoking among
(a) literates (b) illiterates (c) all people
Comment on your results.
Question 8. The following figures give the price (Rs) of a certain commodity in a sample of
10 shops each selected at random from city A and city B.
City A: 7.41 7.77 7.44 7.40 7.38 7.93 7.58 8.28 7.23 7.52
City B: 7.08 7.49 7.42 7.04 6.90 7.22 7.68 7.74 7.28 7.43
City A (x): $\sum x = 75.94$, $\sum x^2 = 577.582$
City B (y): $\sum y = 73.28$, $\sum y^2 = 537.676$
i. Construct a dot plot and comment on the normality of the distribution and
the relative variabilities of the two sets of data.
ii. Formally test for the equality of variance of the prices between the two
cities at 5% level.
iii. Assuming equality of variance construct 95% confidence interval for the
presumed common standard deviation of the prices.
iv. Test at the 5% level that the mean price of the commodity in city A is
greater than Rs. 7.40.
Question 9. A social worker interested in reducing obesity among children visited a village
and recorded the weight (in Kg) of 20 children.
The following are the weights of these 20 children:
65 62 70 62 64 72 55 50 60 60
70 64 55 63 64 70 56 54 50 69
$\sum x = 1235$, $\sum x^2 = 77117$
a) Plot the data using a plot and comment on whether measurements follow
normal distribution.
b) Calculate 95% confidence interval for the mean weight of obese children
in the village.
During the visit, the social worker advised the children to have routine physical
exercise and prescribed diets to reduce the weights. After two months the social
worker revisited the village and recorded the weights of the same children. The
data of recorded weights are given below in the same order of the children.
62 60 68 62 60 68 60 50 58 61
68 64 56 62 60 68 55 51 55 70
c) Carry out an appropriate t-test to investigate whether the advice given to
the children is effective in reducing the weights.
Question 10. A bird watcher sitting in a park has spotted a number of birds belonging to six
categories. The exact classification is given below.
Category 1 2 3 4 5 6
Frequency 6 7 13 14 9 5
Test at 5% level of significance whether or not the data are compatible with the
assumption that the park is visited by the same proportion of birds belonging to
these six categories in the population.
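A minimal Python sketch of the goodness-of-fit calculation this question calls for, assuming the null hypothesis of equal proportions (so each category has the same expected frequency). The critical value is the tabulated upper 5% point of the chi-square distribution with 5 degrees of freedom; the variable names are illustrative.

```python
# Observed bird counts for the six categories (from the table above).
observed = [6, 7, 13, 14, 9, 5]

total = sum(observed)
expected = total / len(observed)  # equal proportions under H0

# Pearson chi-square statistic: sum of (O - E)^2 / E.
chi_sq = sum((o - expected) ** 2 / expected for o in observed)

# Upper 5% point of chi-square with 6 - 1 = 5 degrees of freedom (from tables).
CRITICAL_5PCT_5DF = 11.07

reject_h0 = chi_sq > CRITICAL_5PCT_5DF
print(f"chi-square = {chi_sq:.3f}, reject H0: {reject_h0}")
```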
Question 11. In the surgical treatment of duodenal ulcers there are three different operations
corresponding to the removal of various amounts of the stomach. The three
operations are denoted A, B and C with A being the least traumatic and C the
most traumatic.
It is known that these operations have an undesirable side-effect for some
patients. In cases where the side-effect is present, it can be classified as being of
“slight degree” or of “moderate degree”.
The data in the following table relate to a group of 417 patients and specify the
operation received and the degree of the side-effects suffered.
             Existence/degree of side-effects
Operation    None    Slight    Moderate    Total
A              63       26         7          96
B             126       63        25         214
C              51       40        16         107
Total         240      129        48         417
a) Perform a χ² test on this table to investigate independence between level
of operation and degree of side-effects.
b) Also examine whether the operation has any significance on the presence
of side-effects.
Question 12. The following table gives the length of time required to assemble a device using
a standard procedure and a new procedure. Two groups of nine employees were
selected randomly, one group using the new procedure and the other following
the standard procedure.
Length of time (in minutes)
Standard Procedure (X₁)   32  37  35  28  41  44  35  31  34
New Procedure (X₂)        35  31  29  25  34  40  27  32  31
It is given that Σ(X₁ᵢ − X̄₁)² = 195.5556 and Σ(X₂ᵢ − X̄₂)² = 160.2222,
where each sum runs over i = 1, …, 9.
a) Do the data present sufficient evidence to indicate that the mean time to
assemble the device under standard procedure is less than the mean time
under new procedure?
b) Obtain the 95% confidence interval for the difference in mean.
c) Test for the equality of the variance of the two procedures.
Question 13. Let X be a random variable following an exponential distribution with density
f(x | µ) = µe^(−µx) if 0 < x < ∞, µ > 0
f(x | µ) = 0 otherwise
For testing H₀: µ = 20 against H₁: µ = 30, a single value is observed from this
distribution. If this value is less than 28, H₀ is accepted; otherwise it is rejected. Find
the probabilities of Type I error and Type II error.
Question 14. Two thousand individuals were chosen at random by a researcher and cross
classified according to gender and color blindness as given below:
Description    Male    Female
Normal          904      998
Color Blind      91        7
a) Apply an appropriate test to conclude that there is overwhelming evidence
against the hypothesis that there is no association between gender and color
blindness.
b) A genetic model states that the human population is split in the
proportions illustrated in the following table, where q (0 < q < 1) is a
parameter related to the distribution of color blindness.
Description    Male       Female
Normal         (1−q)/2    (1−q²)/2
Color Blind    q/2        q²/2
Using the data in (a):
i. Write down the likelihood function for the above model.
ii. Determine the maximum likelihood estimate of q.
Question 15. Eight pairs of slow learners with similar reading capabilities are identified in a
third grade class. One member of each pair is randomly assigned to the standard
teaching method, while the other is assigned to a new teaching method. The
scores are as given below.
Pair 1 2 3 4 5 6 7 8
New Method 77 74 82 73 87 69 66 80
Old Method 72 68 76 68 84 68 64 76
a) Test for the difference between mean scores for the two methods.
b) Test for the equality of variance for these two methods.
c) Obtain a 95% confidence interval for the difference in means.
Question 16. A random sample of 200 stomach cancer patients yielded 92 having blood type
A, 20 having blood type B, 4 having blood type AB and 84 having blood type O.
Are these data significant enough, at the 5% level of significance, to enable us to
reject the null hypothesis that the blood type distribution of stomach cancer
sufferers is the same as that of the general population?
Question 17. A software company has developed a new software package to help the system
analyst working in insurance industries to reduce the time required to design,
develop and implement an information system. To evaluate the benefits of this
new software the insurance company has selected 24 system analysts, out of
which 12 of them were instructed to produce the information system using current
technologies and the rest of them were trained and then were asked to produce the
information system using new software. The data set is as given below.
Time required to complete the information system using:
Current technology (x₁)  300 280 344 385 372 360 288 321 376 290 301 283
New software (x₂)        276 222 310 338 200 302 317 260 320 312 334 265
i. State the hypotheses and test whether the new software package provides a
shorter mean project completion time than the current technology. Use
L.S = 0.05.
ii. Test the hypothesis that the variances of the project completion times are
equal. Use L.S = 0.05.
iii. Obtain a 90% confidence interval for the difference between the means of
the two populations.
Question 18. A training manager of an insurance company wishes to see if there has been any
change in the ability of his trainees after they have been on a course. The trainees
take an aptitude test before they start the course and an equivalent one after they
have completed it. The scores are recorded below:
Scores before training  42 35 37 46 53 38 44 40 43
Scores after training   47 28 26 54 42 17 44 31 44
a) Has any change taken place? Test your claim at 5% level.
b) Obtain a 95% confidence interval for the mean change in ability of trainees.
c) Compute Pearson’s correlation coefficient between the scores before and
after training and test its significance using t-test at 5% level of
significance.
Question 19. The diameter of steel rods manufactured on two different machines A and B is
studied. Two random samples of sizes n₁ = 12 and n₂ = 15 are selected and the
sample means and sample standard deviations respectively are
x̄₁ = 24.6, s₁ = 0.85; x̄₂ = 22.1, s₂ = 0.98
Assume that the diameters of the rods follow N(µ₁, σ₁²) and N(µ₂, σ₂²) respectively.
a) Test for the equality of the variances.
b) Test the equality of means of the diameters of the rods manufactured by
the two machines assuming σ₁² = σ₂².
c) Construct a 95% confidence interval for µ₁ − µ₂ assuming σ₁² = σ₂².
d) Construct a 95% confidence interval for σ₁²/σ₂².
Question 20. A textile fiber manufacturer is investigating a new drapery yarn. The
company claims that the thread elongation of this yarn follows a normal distribution
with mean 12 kg and sd 0.5 kg. The company wishes to test the hypothesis
H₀: µ = 12 against H₁: µ < 12 using a random sample of 4 specimens.
a) What is the probability of type I error if the critical region used is x̄ < 11.5 kg?
b) Find the power for the case in (a) when the true mean elongation is 11.25 kg.
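The two probabilities in this question can be sketched in Python with the standard library, assuming the sample mean of the 4 specimens is normal with standard error 0.5/√4. The variable names are illustrative, not part of the question.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 12.0, 0.5, 4   # hypothesised mean, known sd, sample size
cutoff = 11.5                  # reject H0 when the sample mean is below this
se = sigma / sqrt(n)           # standard error of the sample mean

# Type I error: probability of landing in the critical region when mu = 12.
alpha = NormalDist(mu0, se).cdf(cutoff)

# Power: probability of landing in the critical region when mu = 11.25.
power = NormalDist(11.25, se).cdf(cutoff)

print(f"P(type I error) = {alpha:.4f}, power = {power:.4f}")
```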
Question 21. Claims on a certain type of policy are such that the claim amounts are
approximately normally distributed.
(i) A sample of 101 such claim amounts (in £) yields a sample mean of £416
and sample standard deviation of £72. For this type of policy:
(a) Obtain a 95% confidence interval for the mean of the claim
amounts.
(b) Obtain a 95% confidence interval for the standard deviation of the
claim amounts.
The company makes various alterations to its policy conditions and thinks that
these changes may result in a change in the mean, but not the standard deviation,
of the claim amounts. It wants to take a random sample of claims in order to
estimate the new mean amount with a 95% confidence interval equal to sample
mean ± £10.
(ii) Determine how large a sample must be taken, using the following as an
estimate of the standard deviation:
(a) The sample standard deviation from part (i).

(b) The upper limit of the confidence interval for the standard
deviation from part (i)(b).
(iii) Comment briefly on your two answers in (ii)(a) and (ii)(b).
Question 22. A random sample of 60 adult men who live in Leeds includes 21 who have visited
Majorca. An independent random sample of 70 adult women who live in Leeds
includes 28 who have visited Majorca.
Calculate a 98% confidence interval for the proportion of adults who live in Leeds
who have visited Majorca.
Question 23. Consider a random sample X1, …, Xn from a Poisson distribution with expectation
E[Xi] = λ. An estimator λ̂ for the parameter λ is given by the observed mean of the
sample, that is: λ̂ = (1/n) ΣXi, with the sum taken over i = 1, …, n.
(i) Derive formulae for the expected value and variance of λ̂ in terms of λ and n.
Assume in parts (ii) to (v) that the true parameter value is λ = 0.25.
(ii) Calculate the exact probability that 0.2 ≤ λ̂ ≤ 0.3 if the sample size is
n = 10.
(iii) Calculate the approximate probability that 0.2 ≤ λ̂ ≤ 0.3 if the sample size
is n = 10 using the following:
(a) the normal approximation to ΣXi with continuity correction.
(b) the normal approximation to ΣXi without continuity correction.
(iv) Comment on the difference in your answers in parts (ii) and (iii).
(v) Calculate the minimal required sample size n for which the probability
that 0.2 ≤ λ̂ ≤ 0.3 is at least 0.95, using the normal approximation without
continuity correction.
Suppose a random sample of size n = 400 gives the estimate λ̂ = 0.27.
(vi) Calculate a 95% confidence interval for λ.
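For parts (ii) and (iii), a short Python sketch may help to compare the exact and approximate probabilities. It assumes that the sum of the 10 observations is Poisson with mean 10 × 0.25 = 2.5, so that the event 0.2 ≤ λ̂ ≤ 0.3 is the same as the sum being 2 or 3. The helper poisson_pmf is an illustrative name.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

lam, n = 0.25, 10
mean_total = n * lam  # the sum of the X_i is Poisson with this mean

def poisson_pmf(k, mu):
    """P(N = k) for N ~ Poisson(mu)."""
    return exp(-mu) * mu ** k / factorial(k)

# Exact: 0.2 <= lambda_hat <= 0.3 means the sum equals 2 or 3.
exact = sum(poisson_pmf(k, mean_total) for k in (2, 3))

# Normal approximation to the sum: mean 2.5, variance 2.5.
approx = NormalDist(mean_total, sqrt(mean_total))
with_cc = approx.cdf(3.5) - approx.cdf(1.5)      # continuity correction
without_cc = approx.cdf(3.0) - approx.cdf(2.0)   # no continuity correction

print(f"exact={exact:.4f}, with cc={with_cc:.4f}, without cc={without_cc:.4f}")
```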
Question 24. In a recent study of attitudes to a proposed new piece of consumer legislation
(“proposal X”) independent random samples of 200 men and 200 women were
asked to state simply whether they were “for” (in favour of) , or “against”, the
proposal. The resulting frequencies, as reported by the consultants who carried
out the survey, are given in the following table:
            Men    Women
For         138     130
Against      62      70
(i) Carry out a formal chi-squared test to investigate whether or not an
association exists between gender and attitude to proposal X.
Note: in this and any later such tests in this question you should state the P-value
of the data and your conclusion clearly.
At a subsequent meeting to discuss these and other results, the consultants
revealed that they had in fact stratified the survey, sampling 100 men and 100
women in England and 100 men and 100 women in Wales. The resulting
frequencies were as follows:
               England           Wales
             Men    Women     Men    Women
For           82      66       56      64
Against       18      34       44      36
A chi-squared test to investigate whether or not an association exists between
gender and attitude to proposal X in England gives χ² = 6.653, while an
equivalent test for Wales gives χ² = 1.333.
(ii) (a) Find the P-value for each of the chi-squared tests mentioned above
and state your conclusions regarding possible association between
gender and attitude to proposal X in England and in Wales.
(b) Discuss the results of the survey for England and Wales separately
and together, quoting relevant percentages to support your
comments.
(iii) A different survey of 200 people conducted in each of England, Wales,
and Scotland gave the following percentages in favour of another
proposal:
                          England   Wales   Scotland
% in favour of proposal     62%      53%      58%
A chi-squared test of association between country and attitude to the proposal
gives χ² = 3.332 on 2 degrees of freedom, with P-value 0.189.
Suppose a second survey of the same size is conducted in the three countries and
results in the same percentages in favour of the proposal as in the first survey.
The results of the two surveys are now combined, giving a survey based on the
attitudes of 1,200 people.
(a) State (or find) the results of a second chi-squared test for an association
between country and attitude to the proposal, based on the overall survey
of 1,200 people.
(b) Comment briefly on the results.
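P-values for the chi-squared statistics quoted in this question can be obtained without tables, because the chi-squared distribution has closed-form tail probabilities for 1 and 2 degrees of freedom. The sketch below uses only the Python standard library; the function names chi2_sf_1df and chi2_sf_2df are illustrative.

```python
from math import erfc, exp, sqrt

def chi2_sf_1df(x):
    """P(chi-square with 1 df > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    return erfc(sqrt(x / 2))

def chi2_sf_2df(x):
    """P(chi-square with 2 df > x) = exp(-x / 2)."""
    return exp(-x / 2)

p_england = chi2_sf_1df(6.653)          # 2 x 2 table, so 1 degree of freedom
p_wales = chi2_sf_1df(1.333)            # 2 x 2 table, so 1 degree of freedom
p_three_countries = chi2_sf_2df(3.332)  # reproduces the quoted 0.189

print(f"England: {p_england:.4f}, Wales: {p_wales:.4f}, "
      f"three countries: {p_three_countries:.3f}")
```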
Question 25. In a random sample of 200 people taken from a large population of adults, 70
people intend to vote for party A at the next election.
(i) Calculate an approximate equal-tailed 95% confidence interval for θ, the
true proportion of this population who intend to vote for party A at the
next election.
(ii) Give a brief interpretation of the interval calculated in part (i).
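Several questions in this assignment ask for an approximate confidence interval for a proportion. A minimal sketch, assuming the usual normal approximation p̂ ± z·√(p̂(1 − p̂)/n), is given below for the figures in this question; the function name proportion_ci is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, level=0.95):
    """Equal-tailed normal-approximation interval for a binomial proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # e.g. about 1.96 for 95%
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

lower, upper = proportion_ci(70, 200)
print(f"95% CI for theta: ({lower:.4f}, {upper:.4f})")
```

The same function could be reused with a different level argument, for example level=0.90 or level=0.99, for the other proportion questions.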
Question 26. A random sample of 200 email messages was selected from all messages
delivered through an Internet provider company. Each message is monitored for
the presence of computer viruses. It is assumed that each message contains a virus
with the same probability p, independently from all other messages.
Let Yi, i = 1, …, 200 be indicator random variables taking the value 1 if message i
contains a virus, and 0 otherwise. Also, let Y denote the total number of messages
in this sample found to contain viruses, i.e. Y = ΣYi, with the sum taken over i = 1, …, 200.
(i) Derive expressions for the expected value and the variance of Y in terms
of the parameter p, using the indicator variables Y1, Y2, …, Y200.
(ii) Explain why the approximate distribution of Y is N(200p, 200p(1−p)),
using the indicator variables Y1, Y2, …, Y200.
It is found that 38 email messages in this sample contained viruses.
(iii) Calculate an equal-tailed 90% confidence interval for the probability p
using the approximate normal distribution from part (ii).
Question 27. Consider a random sample X1, …, Xk of size k = 400. Statistician A wants to use a
χ²-test to test the hypothesis that the distribution of Xi is a binomial distribution
with parameters n = 2 and unknown p based on the following observed
frequencies of outcomes of Xi:
Possible realisation of Xi    0     1     2
Frequency                    90   220    90
(i) Estimate the parameter p using the method of moments.
(ii) Test the hypothesis that Xi has a binomial distribution at the 0.05
significance level using the data in the above table and the estimate of p
obtained in part (i).
Statistician B assumes that the data are from a binomial distribution and wants to
test the hypothesis that the true parameter is p0 = 0.5.
(iii) Explain whether there is any evidence against this hypothesis by using the
estimate of p in part (i) and without performing any further calculations.
Statistician C wants to test the hypothesis that the random variables Xi have a
binomial distribution with known parameters n= 2 and p = 0.5.
(iv) Write down the null hypothesis and the alternative hypothesis for the test
in this situation.
(v) Carry out the test at the significance level of 0.05 stating your decision.
(vi) Explain briefly the relationship between the test decisions in parts (ii), (iii)
and (v), and in particular whether there is any contradiction.
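The calculations behind parts (i), (ii) and (v) can be sketched in Python as follows. The sketch assumes the usual rule that one degree of freedom is lost for each estimated parameter, and it takes the 5% critical values (3.841 for 1 df, 5.991 for 2 df) from standard chi-squared tables.

```python
from math import comb

counts = {0: 90, 1: 220, 2: 90}   # observed frequencies of X_i
k = sum(counts.values())          # sample size (400)
trials = 2                        # binomial parameter n

# Method of moments: set the sample mean equal to n * p.
sample_mean = sum(x * f for x, f in counts.items()) / k
p_hat = sample_mean / trials

# Expected frequencies under Binomial(2, p_hat).
expected = {
    x: k * comb(trials, x) * p_hat ** x * (1 - p_hat) ** (trials - x)
    for x in counts
}
chi_sq = sum((counts[x] - expected[x]) ** 2 / expected[x] for x in counts)

# Same statistic, different degrees of freedom:
# p estimated from the data -> 3 - 1 - 1 = 1 df; p specified in advance -> 2 df.
reject_with_estimated_p = chi_sq > 3.841
reject_with_known_p = chi_sq > 5.991

print(p_hat, chi_sq, reject_with_estimated_p, reject_with_known_p)
```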
Question 28. Analyst A collects a random sample of 30 claims from a large insurance portfolio
and calculates a 95% confidence interval for the mean of the claim sizes in this
portfolio. She then collects a different sample of 100 claims from the same
portfolio and calculates a new 95% confidence interval for the mean claim size.
(i) Explain how the widths of the two confidence intervals will differ.
Analyst B obtains a 95% confidence interval for the mean claim size of this
portfolio based on a different sample of 30 claims. She subsequently realises that
one of the claims in the sample has an extremely large value and can be
considered as an outlier. She decides to replace this claim with a new randomly
selected one, whose size is not an outlier, and obtains a new 95% confidence
interval.
(ii) Explain how the two confidence intervals will differ in the case of Analyst
B.
Question 29. In order to compare the effectiveness of two new vaccines, A and B, for a
childhood disease, 11 infants were immunised with vaccine A and 9 infants were
immunized with vaccine B. One month after immunisation the concentration of
the disease antibodies in the blood of each infant was recorded in appropriate
units. The sample mean and variance for each group is given below.
Vaccine A: nA = 11, x̄A = 4.05, sA² = 0.692
Vaccine B: nB = 9, x̄B = 4.36, sB² = 0.813
It is assumed that the distributions of the antibody concentration levels after
immunisation with vaccine A and vaccine B are N(µA, σA²) and N(µB, σB²)
respectively. You may assume that the samples are independent.
(i) State the distribution of the pivotal quantity (SA²/σA²) / (SB²/σB²).
(ii) Calculate an equal-tailed 95% confidence interval for the ratio σA²/σB² using the
pivotal quantity in part (i). (You are not required to show the derivation of
the interval.)
We now assume that σA² = σB² = σ². Under this assumption, you are given that
the distribution of 18Sp²/σ² is χ² with 18 degrees of freedom, where Sp² is the
pooled variance of the two samples and is independent from x̄A and x̄B.
(iii) Explain why, under the above result, the sampling distribution of
[(x̄A − x̄B) − (µA − µB)] / [Sp √(1/nA + 1/nB)]
is t with 18 degrees of freedom.
(iv) Calculate an equal-tailed 95% confidence interval for µA − µB using the
sampling distribution in part (iii). (You are not required to show the
derivation of the interval.)
(v) Comment on your results with regard to differences between vaccine A
and vaccine B.
Question 30. An insurer has collected data about the body mass index of 200 males between the
age of 18 and 40. The results are shown in the following table.
Body mass index       < 18.5   18.5–25   25–30   > 30
Observed frequency       6       114       62      18
A statistician suggests the following model for the distribution of the body mass
index with an unknown parameter p.
Body mass index       < 18.5   18.5–25   25–30   > 30
Relative frequency       p       20p      10p    1−31p
(i) Estimate the parameter p using the method of maximum likelihood.
(ii) Perform a statistical test to decide whether the suggested distribution is
appropriate for the observed data. You should state the null hypothesis for
the test and your decision.
To improve the description of the distribution of the body mass index, it is
suggested that the marital status of the males in this study is also recorded. The
results are shown in the following table.
Marital status            Body mass index              Total
                 < 18.5   18.5–25   25–30   > 30
Single              5        98       43      12        158
Married             1        16       19       6         42
Total               6       114       62      18        200
A life office has considered a sample of 10,000 men aged between 18 and 40 of
which 50% are married and the other 50% are single.
(iii) Estimate the proportion of men with a body mass index of more than 30 in
this sample, based on the data in the above table.
(iv) Determine whether the body mass index is independent of the marital
status or not, using an appropriate statistical test. You should state the null
hypothesis for the test, calculate the value of the test statistic and the
approximate p-value and state your decision.
Question 31. Bank robberies in various countries are assumed to occur according to Poisson
processes with rates that vary from year to year. It was reported that the number
of robberies in a particular country in a specific year was 123. The number of
robberies in a different country in the same year was 111. It can be assumed that
each robbery is an independent event and that robberies occur independently in
the two countries.
Determine an approximate 90% confidence interval for the difference between the
true yearly robbery rates in the two countries.
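One way to approach this interval is sketched below in Python. It assumes a normal approximation to each Poisson count and estimates the variance of the difference by the sum of the two observed counts (the variance of a Poisson variable equals its mean, and the two counts are independent).

```python
from math import sqrt
from statistics import NormalDist

count_a, count_b = 123, 111   # observed yearly robberies in the two countries

diff = count_a - count_b      # point estimate of the difference in rates
se = sqrt(count_a + count_b)  # estimated standard error of the difference
z = NormalDist().inv_cdf(0.95)  # upper 5% point, for a 90% two-sided interval

lower, upper = diff - z * se, diff + z * se
print(f"90% CI for the difference: ({lower:.2f}, {upper:.2f})")
```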
Question 32. A survey is undertaken to investigate the proportion p of an adult population that
support a certain government policy. A random sample of 100 adults is taken and
contains 30 who support the policy.
(i) Calculate an approximate 95% confidence interval for p.

(ii) Comment on the validity of the interval obtained in part (i).
A different sample of 1,000 adults is taken and it contains 300 who support the
policy.
(iii) Explain how the width of a 95% confidence interval for p in this case will
compare to the width of the interval in part (i), without performing any
calculations.
Question 33. A behavioral scientist is observing a troop of monkeys and is investigating
whether social status affects the amount of food that an individual takes. The
monkeys are divided into two groups of different social rank and the scientist
counts the number of bananas each individual takes. Each monkey can take a
maximum of 7 bananas.
Social rank            A     B
Number of monkeys      6    11
Total bananas taken   33    37
(i) It is first suggested that the number of bananas taken by each individual of
each group follows the same binomial distribution with common
parameter p and n=7.
(a) Use the method of moments to estimate the parameter p.
(b) The scientist is unsure whether a common parameter is appropriate
and wishes to compare pA and pB, the probability that a banana is
taken by an individual in groups A and B respectively.
Test the hypothesis that pA = pB.
(ii) A statistician suggests an alternative model. The number of bananas taken
by an individual still follows a binomial distribution with n=7, but for
group A the parameter is 2θ and for group B the parameter is θ, where
θ < 0.5.
(a) Show that the log likelihood for θ is given by:
33ln (2θ) + 9ln (1− 2θ) + 37ln (θ) + 40ln (1− θ) + constant
(b) Hence calculate the maximum likelihood estimate of θ.
(iii) (a) Compare the fit of the two suggested models in parts (i) (with
common parameter p) and (ii) by considering the expected number
of bananas taken in groups A and B under the two models. You are
not required to perform a formal test.
(b) Comment on the above comparison in relation to your answer in
part (i)(b).
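The maximum likelihood estimate in part (ii)(b) can be checked numerically. The sketch below differentiates the log likelihood given in part (ii)(a) and finds the root of the resulting score function by bisection on (0, 0.5); the helper names score and bisect_decreasing are illustrative, and bisection is just one convenient root-finding choice.

```python
def score(theta):
    """Derivative of 33 ln(2θ) + 9 ln(1 − 2θ) + 37 ln(θ) + 40 ln(1 − θ)."""
    return 33 / theta - 18 / (1 - 2 * theta) + 37 / theta - 40 / (1 - theta)

def bisect_decreasing(f, lo, hi, tol=1e-12):
    """Root of a decreasing function f with f(lo) > 0 > f(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# The log likelihood is concave, so the score is decreasing on (0, 0.5).
theta_hat = bisect_decreasing(score, 0.01, 0.49)
print(f"theta_hat = {theta_hat:.4f}")
```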
Question 34. A researcher obtains samples of 25 items from normally distributed measurements
from each of two factories. The sample variances are 2.86 and 9.21 respectively.
(i) Perform a test to determine if the true variances are the same.
(ii) For each factory calculate central 95% confidence intervals for the true
variances of the measurements.
(iii) Comment on how your answers in parts (i) and (ii) relate to each other.
Question 35. In an opinion poll, a sample of 100 people from a large town were asked which
candidate they would vote for in a forthcoming national election with the
following results:
Candidate A B C
Supporters 32 47 21
(i) Determine the approximate probability that candidate B will get more than
50% of the vote.
A second opinion poll of 150 people was conducted in a different town with the
following results:
Candidate A B C
Supporters 57 56 37
(ii) Use an appropriate test to decide whether the two towns have significantly
different voting intentions.
Question 36. In a medical study conducted to test the suggestion that daily exercise has the
effect of lowering blood pressure, a sample of eight patients with high blood
pressure was selected. Their blood pressure was measured initially and then again
a month later after they had participated in an exercise programme. The results are
shown in the table below:
Patient 1 2 3 4 5 6 7 8
Before 155 152 146 153 146 160 139 148
After 145 147 123 137 141 142 140 138
(i) Explain why a standard two-sample t-test would not be appropriate in this
investigation to test the suggestion that daily exercise has the effect of
lowering blood pressure.
(ii) Perform a suitable t-test for this medical study. You should clearly state
the null and alternative hypotheses.
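A paired t-test of the kind this question points to can be sketched in Python as follows. The sketch assumes a one-sided alternative (mean reduction greater than zero) and takes the upper 5% point of the t distribution with 7 degrees of freedom, 1.895, from standard tables.

```python
from math import sqrt
from statistics import mean, stdev

before = [155, 152, 146, 153, 146, 160, 139, 148]
after = [145, 147, 123, 137, 141, 142, 140, 138]

# Work with the within-patient differences (before minus after).
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)

# t statistic for H0: mean difference = 0; stdev uses the n - 1 divisor.
t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))

T_CRIT_5PCT_7DF = 1.895  # from tables
reject_h0 = t_stat > T_CRIT_5PCT_7DF
print(f"t = {t_stat:.3f} on {n - 1} df, reject H0: {reject_h0}")
```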
Question 37. An insurance company experiences claims from 290 insurance policies in a year
on a portfolio of 900 policies. Only one claim can be made on a policy in a year.
The company assumes that all policies are independent of each other.
Determine a 90% confidence interval for the proportion of policies on which a
claim is made in a year.
Question 38. A random sample of 30 observations is drawn from a normal distribution with
unknown variance.
(i) Write down an expression for the distribution of S, the population standard
deviation.
The sample standard deviation, s, is 7.5.
(ii) Calculate a 95% confidence interval for the population standard deviation.
Question 39. An insurance company has calculated premiums assuming that the average claim
size per claim for a certain class of insurance policies does not exceed £20,000
per annum. An actuary analyses 25 such claims that have been randomly selected.
She finds that the average claim size in the sample is £21,000 and the sample
standard deviation is £2,500. Assume that the size of a single claim is normally
distributed with unknown expectation α and variance σ².
(i) Calculate a 95% confidence interval for α based on the sample of
25 claims.
(ii) Perform a test for the null hypothesis that the expected claim size is not
greater than £20,000 at a 5% significance level.
(iii) Discuss whether your answers to parts (i) and (ii) are consistent.
(iv) Calculate the largest expected claim size, α₀, for which the hypothesis
α ≤ α₀ can be rejected at a 5% significance level based on the sample of 25
claims.
The insurer is also concerned about the number of claims made each year. It is
found that the average number of claims per policy was 0.5 during the year 2011.
When the analysis was repeated in 2012 it was found that the average number of
claims per policy had increased to 0.6. These averages were calculated on the
basis of random samples of 100 policies in each of the two years. Assume that the
number of claims per policy per year has a Poisson distribution with unknown
expectation λ and is independent from the number of claims in any other year or
for any other policy.
(v) Perform a test at the 5% significance level for the null hypothesis that λ = 0.6
during the year 2011.
(vi) Perform a test to decide whether the average number of claims has
increased from 2011 to 2012.
Question 40. The distribution of claim size under a certain class of policy is modeled as a
normal random variable, and previous years' records indicate that the standard
deviation is £120.
(i) Calculate the width of a 95% confidence interval for the mean claim size
if a sample of size 100 is available.
(ii) Determine the minimum sample size required to ensure that a 95%
confidence interval for the mean claim size is of width at most £10.
(iii) Comment briefly on the comparison of the confidence intervals in (i) and
(ii) with respect to widths and sample sizes used.
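The relationship between interval width and sample size in this question can be sketched in Python. With known standard deviation σ, the width of a 95% interval for the mean is 2·z·σ/√n, so the smallest n giving a width of at most w is the ceiling of (2·z·σ/w)². The function name ci_width is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

sigma = 120.0
z = NormalDist().inv_cdf(0.975)  # about 1.96

def ci_width(n):
    """Width of the 95% confidence interval for the mean with known sigma."""
    return 2 * z * sigma / sqrt(n)

width_100 = ci_width(100)

# Smallest n with width at most 10: solve 2 * z * sigma / sqrt(n) <= 10.
target_width = 10.0
n_min = ceil((2 * z * sigma / target_width) ** 2)

print(f"width with n = 100: {width_100:.2f}; minimum n: {n_min}")
```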
Question 41. A researcher wishes to investigate whether a coin is balanced or not, that is if
P(heads) = 0.5. She throws the coin four times and decides to accept the
hypothesis H0: P(heads) = 0.5 in a test against the alternative H1: P(heads) ≠ 0.5,
if the number of times that the coin lands “heads” is 1, 2, or 3.
(i) Calculate the probability of the type I error of this test.
(ii) Calculate the probability of the type II error of this test, if the true
probability that the coin lands “heads” is 0.7.
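Both error probabilities follow from the binomial distribution of the number of heads in four throws. A minimal Python sketch, with binom_pmf as an illustrative helper name:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n = 4
accept_region = (1, 2, 3)  # numbers of heads for which H0 is accepted

# Type I error: H0 is rejected (0 or 4 heads) although P(heads) = 0.5.
type_1 = 1 - sum(binom_pmf(k, n, 0.5) for k in accept_region)

# Type II error: H0 is accepted although P(heads) = 0.7.
type_2 = sum(binom_pmf(k, n, 0.7) for k in accept_region)

print(f"P(type I) = {type_1:.4f}, P(type II) = {type_2:.4f}")
```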
Question 42. Pressure readings are taken regularly from a meter. It transpires that, in a random
sample of 100 such readings, 45 are less than 1, 35 are between 1 and 2, and 20
are between 2 and 3.
Perform a χ² goodness of fit test of the model that states that the readings are
independent observations of a random variable that is uniformly distributed on
(0, 3).
Question 43. In a survey conducted by a mail order company a random sample of 200
customers yielded 172 who indicated that they were highly satisfied with the
delivery time of their orders.
Calculate an approximate 95% confidence interval for the proportion of the
company’s customers who are highly satisfied with delivery times.
Question 44. Let X1, …, Xn denote a large random sample from a distribution with unknown
population mean µ and known standard deviation 3. The null hypothesis H0: µ = 1 is
to be tested against the alternative hypothesis H1: µ > 1, using a test based on the
sample mean with a critical region of the form X̄ > k, for a constant k.
It is required that the probability of rejecting H0 when µ = 0.8 should be
approximately 0.05, and the probability of not rejecting H0 when µ = 1.2 should
be approximately 0.1.
(i) Show that the test requires
1 − Φ(√n (k − 0.8)/3) = 0.05 and Φ(√n (k − 1.2)/3) = 0.1
where Φ is the standard normal distribution function.
(ii) The values for the sample size n and the critical value k which satisfy the
requirements of part (i) are n = 482 and k = 1.025 (you are not asked to
verify these values).
Calculate the approximate level of significance of the test, and comment
on the value.
Question 45. (i) A random variable Y has a Poisson distribution with parameter θ but there is
a restriction that zero counts cannot occur. The distribution of Y in this
case is referred to as the zero-truncated Poisson distribution.
(a) Show that the probability function of Y is given by
P(y) = θ^y e^(−θ) / [y! (1 − e^(−θ))], y = 1, 2, 3, …
(b) Show that E[Y] = θ/(1 − e^(−θ)).
(ii) (a) Let y1, …, yn denote a random sample from the zero-truncated
Poisson distribution.
Show that the maximum likelihood estimate of θ may be determined
by the solution to the following equation:
ȳ − θ − θe^(−θ) / (1 − e^(−θ)) = 0
and deduce that the maximum likelihood estimate is the same as
the method of moments estimate.
(b) Obtain an expression for the Cramer-Rao lower bound (CRlb) for
the variance of an unbiased estimator of θ.
(iii) The following table gives the numbers of occupants in 2,423 cars observed
on a road junction during a certain time period on a weekday morning.
Number of occupants     1      2     3    4    5   6
Frequency of cars     1,486   694   195   37   10   1
The above data were modelled by a zero-truncated Poisson distribution as
given in (i).
The maximum likelihood estimate of θ is θ̂ = 0.8925 and the Cramer-Rao
lower bound on variance at θ = 0.8925 is 5.711574 × 10⁻⁴ (you do not need
to verify these results.)
(a) Obtain the expected frequencies for the fitted model, and use a χ²
goodness-of-fit test to show that the model is appropriate for the
data.
(b) Calculate an approximate 95% confidence interval for 𝜃 and hence
calculate a 95% confidence interval for the mean of the zero-
truncated Poisson distribution.
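Although the question does not ask for it, the quoted estimate θ̂ = 0.8925 can be checked numerically. The sketch below assumes the likelihood equation from part (ii)(a) in the form ȳ − θ − θe^(−θ)/(1 − e^(−θ)) = 0 and solves it by bisection; the function name mle_equation and the bracketing interval are illustrative choices.

```python
from math import exp

occupants = [1, 2, 3, 4, 5, 6]
cars = [1486, 694, 195, 37, 10, 1]

n = sum(cars)
y_bar = sum(y * f for y, f in zip(occupants, cars)) / n  # sample mean

def mle_equation(theta):
    """Left-hand side of the likelihood equation; decreasing in theta."""
    return y_bar - theta - theta * exp(-theta) / (1 - exp(-theta))

# Bisection: the left-hand side is positive at 0.01 and negative at 5.
lo, hi = 0.01, 5.0
while hi - lo > 1e-10:
    mid = (lo + hi) / 2
    if mle_equation(mid) > 0:
        lo = mid
    else:
        hi = mid
theta_hat = (lo + hi) / 2

print(f"n = {n}, sample mean = {y_bar:.4f}, theta_hat = {theta_hat:.4f}")
```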
Question 46. Calculate a 99% confidence interval for the percentage of claims for household
accidental damage, which are fully settled within six months of being submitted,
given that in a random sample of 100 submitted claims of this type, exactly 83
were fully settled within six months of being submitted.
Question 47. A random sample of 500 claim amounts resulted in a mean of £237 and a standard
deviation of £137.
Calculate an approximate 95% confidence interval for the true underlying mean
claim amount for such claims, explaining why the normal distribution can be
used.
Question 48. Let X₁ and X₂ constitute a random sample of size 2 from a N(θ, 1) population.
For testing H₀: θ = 0 vs H₁: θ > 0, we have two competing tests:
Test 1: Reject H₀ if X₁ > 0.95        Test 2: Reject H₀ if X₁ + X₂ > C
(a) Find the value of C so that Test 2 has the same P(Type I error) as that of
Test 1.
(b) Compute the P(Type II error) of each test given the value of θ = θ₁ > 0.
(c) Comment on your results as obtained in part (b).
Question 49. An insurance company has a portfolio of 10,000 policies. Based on past data the
company estimates that the probability of a claim on any one policy in a year is
0.003. It assumes no policy will generate more than one claim in a year.
(i) Determine the approximate probability of more than 40 claims from the
portfolio of 10,000 policies in a year.
(ii) Determine an approximate equal-tailed interval into which the number of
claims per year will fall with probability 0.95.
In practice 42 claims were received in a particular year. A Director of the
company complains about the range of estimates in part (ii) being wrong.
(iii) Comment on the Director’s complaint.
Question 50. It is desired to test H0: θ = 1 against H1: θ = 0.5 based on a random sample of
size 1 from an exponential distribution with density given by
$f(x) = \theta e^{-\theta x}$; x, θ > 0
The critical region is given as x > −log α; 0 < α < 1
Show that
(i) P(Type I error) = α (ii) Power of the test = √α
Question 51. (i) In the context of hypothesis testing, define a statistical test, null hypothesis
and alternate hypothesis.
(ii) List the steps involved in hypothesis testing.
(iii) An experimenter has prepared a drug dosage level that she claims will
induce sleep for 80% of people suffering from insomnia. After examining
the dosage, we feel that her claims regarding the effectiveness of the
dosage are inflated. In an attempt to disprove her claim, we administer her
prescribed dosage to 20 insomniacs and we observe X, the number for
whom the drug dose induces sleep. Assume that the rejection region
{x ≤ k} is used
(a) Find the value of k so that P (Type I error), α, is approximately at
1% level
(b) For the rejection region in part (a), find P (Type II error), β, when
the true proportion of insomniacs for whom the dose induces sleep is 1/2.
Question 52. A seismologist collected data on one of the islands of Japan. The island was struck by
one earthquake in the past year. She is interested in µ, the average number of
earthquakes per annum.
(i) Obtain an exact 95% confidence interval for µ using the one-year data.
She also collected past data and found that the island was struck by 36 earthquakes in
the past 36 years.
(ii) Find an approximate 95% confidence interval for µ.
(iii) Compare and comment on the confidence intervals obtained in parts (i)
and (ii) above.
ANSWERS
Ans.1. (a) T.S = 2.02, do not reject H0 (b) C.I = (-3.06, 18.70) (c) Yes, as the C.I includes zero
Ans.2. a = 1 b = 1/3
Ans.3. (a) T.S = 1.85, do not reject H0 (b) C.I = (0.03, 0.13)
Ans.4. (ii) F = 1.83, Accept H0 (iii) T.S = −2.17, Reject H0 (iv) C.I = (−3.98, −0.02)
Ans.5. (i) T.S = 1.3, Accept H0 (ii) T.S = 27.25, reject H0 (Rural) T.S = 12.71, Reject H0 (Urban)
Ans.6. C.I = (30.295, 33.705)
Ans.7. (i) T.S = 6.37, Reject H0 (ii) T.S. = 4.32, Reject H0 (iii) T.S = 0.402, do not reject H0
Ans.8. (ii) F = 1.31, do not reject H0 (iii) C.I = (0.0499, 0.1912) (iv) T.S = 1.95, reject H0
Ans.9. (b) C.I = (58.61, 64.89) (c) T.S = 1.47, do not reject H0
Ans.10. T.S = 7.778, do not reject H0
Ans.11. (a) T.S = 7.65, do not reject H0 (b) T.S = 7.01, reject H0
Ans.12. (a) T.S = 1.65, accept H0 (b) C.I = (-1.05, 8.37) (c) F = 1.22, accept H0
Ans.13. P (Type I error) = 0 P (Type II error) = 1
Ans.14. (a) T.S = 76.598, reject H0 (b) 𝑞 = 0.089
Ans.15. (a) T.S = 1.198, accept H0 (b) T.S = 1.1667, accept H0 (c) C.I = (-3.16, 11.16)
Ans.16. T.S = 118.72, reject H0
Ans.17. (i) T.S = 2.16, reject H0 (ii) F = 1.21, accept H0 (iii) C.I = (7.5261, 66.4739)
Ans.18. (a) T.S = 1.63, accept H0 (b) C.I = (-2.088, 12.088)
(c) r = 0.6796 T.S = 2.451, reject H0
Ans.19. (a) F = 1.329, accept H0 (b) T.S = 1.70874, reject H0
(c) C.I = (-3.24, -1.76) or C.I = (1.76, 3.24) (d) C.I = (0.2411, 2.581)
Ans.20. (a) 0.0228 (b) 0.8413
Ans.21.(i)(a) (402.0, 430.0) (b) (63.2, 83.6)
(ii)(a) n = 199.15, so n ≥ 200 (b) n = 268.30, so n ≥ 269
(iii) Assuming a larger value of s results in a larger standard error, so a larger sample
size is required to achieve the same width of confidence interval.
Ans.22. (0.278, 0.476)
Ans.23. (i) E[λ̂] = λ and V[λ̂] = λ/n (ii) 0.47028 (iii) (a) 0.4713 (b) 0.2510
(iv) When compared to the exact probability in (ii) the results in (iii) (a) and (b) show that
the continuity correction reduces the approximation error significantly for this small
sample size.
(v) 𝑛 ≈ 384 (vi) [0.21908, 0.32092]
Ans.24.(i) P-value = P(χ²₁ > 0.724) = 0.395
No evidence against H0 - we conclude that no association exists between gender and
attitude to proposal X.
(ii)(a) For England: P-value = P(χ²₁ > 6.653) = 0.01
Evidence against H0 – we reject it at the 1% level of testing and conclude that an
association exists between gender and attitude to proposal X in England.
For Wales: P-value = P(χ²₁ > 1.333) = 0.248
No evidence against H0 – we conclude that there is no association between gender and
attitude to proposal X in Wales.
(b) England: there is evidence of an association – 82% of men and only 66% of women
support proposal X – these proportions are significantly different.
Wales: there is no evidence of an association – 56% of men and 64% of women support
proposal X – these proportions are not significantly different.
The effects are in different directions and cancel out to some extent when the data are
combined: now there is no evidence of an association – overall 69% of men and 65% of
women support proposal X – these proportions are not significantly different.
The combined data give a misleading message – they hide the effect of the factor
“country” and fail to reveal that there is an association in England.
(iii)(a) The χ² value doubles to 6.664
P-value = P(χ²₂ > 6.664) = 0.0357
Conclusion: reject “no association” at the 3.6% level of testing and conclude that an
association does exist.
(b) Comment: having more data with the same proportions provides strong enough
evidence to justify claiming that an association exists.
Ans.25. (i) (0.284, 0.416) (ii) If we take a large number of samples from this population, we
expect 95% of the resulting CIs to include the true value of θ.
Ans.26. (i) E(Y) = 200p V(Y) = 200p(1-p)
(ii) Again, with the Yi being iid Bernoulli(p) random variables and n being sufficiently
large, the central limit theorem implies that Y approximately follows a normal
distribution.
(iii) (0.144, 0.236)
Ans.27.(i) p̂ = 0.5 (ii) TS = 4 : Reject H0
(iii) Since the estimated value is 0.5, any reasonable test will not reject that value, since
the value 0.5 will always be in the acceptance region of the test. In other words, 0.5 will
always be in any confidence interval around the estimate 0.5.
(iv) We now have: H0 : Xi ~ Bin(2, 0.5) and
H1 : Xi does not follow Bin(2, 0.5) (emphasis on both Bin, p = 0.5 )
(v) TS = 4, do not reject H0 (degree of freedom is 2)
(vi) The result in part (ii) states that a binomial distribution does not fit the data
well and is rejected. However, in part (iii) we found that, under the assumption of a
binomial distribution, p0 = 0.5 cannot be rejected. A specific binomial distribution with
parameter p = 0.5 is not rejected in part (v) for the same data. The reason is that the
additional degree of freedom in part (v) allows for a larger value of the test-statistic under
the null.
Ans.28.(i) With the larger sample of 100 claims the standard error of the sample mean will be
smaller, giving a narrower confidence interval.
(ii) The replacement of the extreme value will give a smaller sample mean, which means
that the interval will be shifted to the left. The variance of the sample will also be smaller,
which will again give a narrower interval.
Ans.29.(i) This is an F distribution with 10, 8 degrees of freedom.
(ii) (0.198, 3.281)
(iii) As the two samples are independent we have that
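$V(\bar{X}_1 - \bar{X}_2) = \frac{V(X_1)}{11} + \frac{V(X_2)}{9} = \sigma^2\left(\frac{1}{11} + \frac{1}{9}\right)$
Normality of the data then gives that
$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma^2\left(\frac{1}{11} + \frac{1}{9}\right)}} \sim N(0, 1)$
We are also given that $Y \sim \chi^2_{18}$, and Z and Y being independent we can use that $\frac{Z}{\sqrt{Y/18}} \sim t_{18}$.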
(iv) (– 1.126, 0.506). (v) The interval includes the value 0, suggesting that there is no
difference in the mean effectiveness of the two vaccines.
Ans.30.(i) 𝑝 = 0.02935
(ii) Test-statistic is 0.286915 from a Chi-square distribution with 2 d.f.

The test statistic has a very small value, and there is no evidence against the null.
(iii) P[BMI > 30] = P[BMI > 30|single]P[single]+ P[BMI > 30|married]P[married]
!" !
= !"# ×0.5 + !" ×0.5 = 0.1094
(iv) TS = 8.528399 on 3 degrees of freedom. P-Value = 0.0384

Therefore, we reject H0 at 5% level, but not at the 1% level.
Ans.31. (-13.162, 37.162)
Ans.32. (i) (0.21, 0.39) (ii) Sample size is large, so normal approximation is valid.
(iii) With larger sample size the standard error will be smaller, and therefore the interval
will be narrower.
Ans.33. (i) (a) 𝑝 = 0.588 (b) T.S. = 3.23, Reject H0: pA = pB
(ii) (b) 𝜃 = 0.412

(iii) (a) Model in (ii) seems to provide a better fit as expected values are closer to
observed
(b) In part (i)(b) we rejected pA = pB which suggests a model with a common value of
p would not be appropriate. The comparison above suggests that an improved
model can be used.
Ans.34. (i) T.S. = 3.22, reject H0 at 1% significance level.
(ii) CI 1 = (1.74, 5.54) CI 2 = (5.61, 17.83)
(iii) Confidence intervals don’t overlap, i.e. they agree with the result in (i) that the variances are
different.
Ans.35. (i) 0.274 (ii) T.S. = 2.315, Accept H0
Ans.36. (i) The two samples are from the same patients, so they are clearly not independent.
(ii) T.S. = 4.785, we have strong evidence against H0 (P-value < 0.5%), and conclude that
daily exercise has the effect of lowering blood pressure.
Ans.37. (0.296, 0.348)
Ans.38. (i) $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$ (ii) (5.97, 10.08)
Ans.39. (i) (19.968, 22.032) (ii) T.S. = 2, we reject null hypothesis at 5%.
(iii) The confidence interval in part (i) corresponds to a two-sided test. We found in
part (i) that 20 is contained in the confidence interval, and we can therefore not
reject the null hypothesis H0 : α = 20 at a 5% significance level. However, the one-
sided test rejects H0 : α ≤ 20 since only positive differences X̄ − α0 are considered.
Answers are consistent.
(iv) 20.1445 (v) T.S. = −0.129, Accept H0 (vi) T.S. = 0.9535, Accept H0.
Ans.40. (i) Width = 47.04 (ii) 554
(iii) The confidence interval in (ii) is narrower; to achieve this we require a much
larger sample size.
Ans.41. (i) 0.125 (ii) 0.7518
Ans.42. T.S. = 9.50, P-Value < 1%, reject H0.
Ans.43. (0.812, 0.908)
Ans.44. 43%
Ans.45. (ii) (b) $\frac{\theta(1 - e^{-\theta})^2}{n(1 - e^{-\theta} - \theta e^{-\theta})}$
(iii)(a)
Y          1        2       3       4      5     ≥6
Expected   1500.48  669.59  199.2   44.45  7.93  1.35
Observed   1486     694     195     37     10    1
T.S. = 2.68 on 3 df, cannot reject H0.

(b) (0.846, 0.939) and (1.48, 1.54)
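As a check on the Question 45 figures, the following Python sketch recomputes the χ² statistic, the Cramér-Rao lower bound and both intervals from the observed and expected frequencies. It assumes that the cells Y = 5 and Y ≥ 6 are pooled (which is what leaves 3 degrees of freedom), takes 1.96 as the normal quantile, and uses illustrative variable names that do not come from the workbook.

```python
from math import exp, sqrt

observed = [1486, 694, 195, 37, 10, 1]                  # Y = 1, ..., 5, >= 6
expected = [1500.48, 669.59, 199.2, 44.45, 7.93, 1.35]

# Assumption: pool the last two cells so no expected frequency is below 5.
obs_cells = observed[:4] + [sum(observed[4:])]
exp_cells = expected[:4] + [sum(expected[4:])]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(obs_cells, exp_cells))
df = len(obs_cells) - 1 - 1                             # one parameter estimated

theta_hat, n = 0.8925, sum(observed)
q = exp(-theta_hat)
# Cramer-Rao lower bound for the zero-truncated Poisson parameter
crlb = theta_hat * (1 - q) ** 2 / (n * (1 - q - theta_hat * q))

half_width = 1.96 * sqrt(crlb)
theta_ci = (theta_hat - half_width, theta_hat + half_width)
# The mean theta / (1 - exp(-theta)) is increasing in theta, so map the end points.
mean_ci = tuple(t / (1 - exp(-t)) for t in theta_ci)

print(round(chi_sq, 2), df)
print([round(v, 3) for v in theta_ci], [round(v, 2) for v in mean_ci])
```

The statistic comes out within rounding of the 2.68 quoted above; small differences are to be expected because the expected frequencies are themselves rounded.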
Ans.46. (73.3%, 92.7%)
Ans.47. n = 500 is very large, so the Central Limit Theorem justifies normality. (225, 249)
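Questions 46 and 47 both use a large-sample normal interval of the form estimate ± z × standard error. A minimal Python sketch of that calculation follows; the helper name normal_ci is illustrative, and the normal quantile is taken from the standard library rather than from the tables.

```python
from math import sqrt
from statistics import NormalDist


def normal_ci(estimate, std_error, level):
    """Two-sided large-sample interval: estimate +/- z * std_error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


# Question 46: 83 of 100 claims settled, 99% interval for the proportion
p_hat = 83 / 100
prop_ci = normal_ci(p_hat, sqrt(p_hat * (1 - p_hat) / 100), 0.99)

# Question 47: n = 500, sample mean 237, sample standard deviation 137
mean_ci = normal_ci(237, 137 / sqrt(500), 0.95)

print([round(100 * v, 1) for v in prop_ci])   # limits as percentages
print([round(v) for v in mean_ci])            # limits in pounds
```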
Ans.48. (a) C = 1.3435 (b) P(Z ≤ 0.95 − θ1√2) (c) Test 2 is a more powerful test than Test 1.
Ans.49. (i) 0.027 (ii) (19.28, 40.72)

(iii) The probability that the result in any year will lie in the interval is 0.95 so there is a
5% probability that the company will see a result outside that range.
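The Question 49 figures can be reproduced with a normal approximation to the Binomial(10,000, 0.003) claim count. The sketch below assumes a continuity correction at 40.5 for part (i), which is one common convention; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 10_000, 0.003
mean = n * p                       # expected number of claims
sd = sqrt(n * p * (1 - p))         # binomial standard deviation
approx = NormalDist(mean, sd)

# (i) P(X > 40) = P(X >= 41), approximated with a continuity correction at 40.5
prob_more_than_40 = 1 - approx.cdf(40.5)

# (ii) equal-tailed interval with probability 0.95
z = NormalDist().inv_cdf(0.975)
interval = (mean - z * sd, mean + z * sd)

print(round(prob_more_than_40, 3))
print(tuple(round(v, 2) for v in interval))
```

An observed count of 42 lies just outside this interval, which is the kind of outcome expected in roughly one year out of twenty.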
Ans.51. (i) Statistical test: Statistical (hypothesis) testing begins with an assumption, called a
hypothesis, that we make about a population parameter. A hypothesis is a
statement about something; a hypothesis test is where we collect a representative sample
and examine it to see if our hypothesis holds true.
Null Hypothesis: A hypothesis of no difference is called the null hypothesis and is usually
denoted by H0. The null hypothesis is the hypothesis which is tested for possible rejection
under the assumption that it is true. It is a very useful tool in tests of significance. The null
hypothesis can sometimes be regarded as representing the current state of knowledge or
belief about the value of the parameter being tested, the “status quo” hypothesis.
Alternative Hypothesis: Any hypothesis which is complementary to the null
hypothesis is called an alternative hypothesis, usually denoted by H1.
(ii) Steps involved in hypothesis testing are:
• Specify the hypothesis to be tested
• Select a suitable statistical model
• Design and carry out an experiment or study
• Calculate a test statistic
• Calculate the probability value
• Determine the conclusion of the test
(iii) (a) k = 11 (b) 0.2517
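The values in Question 51 (iii) can be checked directly from the binomial distribution. The sketch below assumes "approximately at 1% level" means choosing the k whose Type I error probability is closest to 0.01; the helper binom_cdf is illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))


n = 20
# (a) rejection region {x <= k}: pick k with P(X <= k | p = 0.8) closest to 1%
k = min(range(n + 1), key=lambda c: abs(binom_cdf(c, n, 0.8) - 0.01))
alpha = binom_cdf(k, n, 0.8)

# (b) Type II error: failing to reject (X > k) when the true proportion is 0.5
beta = 1 - binom_cdf(k, n, 0.5)

print(k, round(alpha, 4), round(beta, 4))
```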
Ans.52. (i) (0.0253, 5.5725) (ii) (0.6733, 1.3267)
(iii) The large sample gives a much narrower confidence interval.
With a large sample we can predict the value of µ (average earthquakes p.a.) with
greater certainty.
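For Question 52, the exact interval can be found without χ² tables by solving the two Poisson tail equations numerically. The sketch below is one way to do this, assuming the usual equal-tailed construction; the bisection helper and variable names are illustrative. The upper limit it returns (about 5.57) differs slightly from the tabulated value quoted above, presumably through rounding in the tables.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu)."""
    return sum(exp(-mu) * mu**x / factorial(x) for x in range(k + 1))


def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f changes sign exactly once there."""
    f_lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


x = 1  # one earthquake observed in one year
# (i) exact limits: P(X >= x | lower) = 0.025 and P(X <= x | upper) = 0.025
lower = bisect(lambda mu: (1 - poisson_cdf(x - 1, mu)) - 0.025, 1e-9, 20)
upper = bisect(lambda mu: poisson_cdf(x, mu) - 0.025, 1e-9, 20)

# (ii) normal approximation: 36 earthquakes in 36 years
rate = 36 / 36
half_width = NormalDist().inv_cdf(0.975) * sqrt(rate / 36)
approx_ci = (rate - half_width, rate + half_width)

print(round(lower, 4), round(upper, 2))
print(tuple(round(v, 4) for v in approx_ci))
```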
ASSIGNMENT – 8
CORRELATION AND REGRESSION
Question 1. The following data relate x, the moisture of wet mix of a certain product and y,
the density of the finished product.
x 12 11 10 9 8
y 4 3 2 0 1
a) Draw a scatter plot and comment.
b) Calculate the coefficient of correlation.
c) Assuming a linear relationship of the form Y = β0 + β1 X + error term,
obtain the least squares estimates of β0 and β1.
d) Find the estimate of the error variance.
e) Obtain the 95% confidence interval for β1, stating the assumption on the
error term.
f) In testing H0: β1 = 0 against H1: β1 ≠ 0, state with reason whether or not H0 can
be rejected at the 5% level of significance.
Question 2. A study was made on the effect of temperature on the yield of a chemical process.
The following data (in coded form) were collected.
x -5 -4 -3 -2 -1 0 1 2 3 4 5
y 1 5 4 7 10 8 9 13 14 13 8
Linear regression model y = β0 + β1 x + e is fitted for the above data.
a) Describe the parameters and variables in the model.
b) Obtain the least squares estimates of β0 and β1 and hence the prediction
equation.
c) Construct an ANOVA table and test H0: β1 = 0 at the 5% level of significance.
d) What are the 95% confidence limits for β1?
e) What are the confidence limits for the true mean value of y when 𝑥 = 3 at
95% confidence level?
f) Are there any indications that a better model can be tried?
Question 3. The following are measurements of air velocity (cm/sec) and evaporation coefficient
(mm²/sec) of burning fuel droplets in an impulse engine.
Air Velocity: x 20 60 100 140 180 220 260 300 340 380
Evaporation coeff.: y 0.18 0.37 0.35 0.78 0.56 0.75 1.18 1.36 1.17 1.65
For the above data, a regression model y = α + βx + e is to be fitted.
a) State the assumptions of this model.
b) Obtain the least squares estimates of α and β.
c) Compute an unbiased estimate of the error variance.
d) Test the hypothesis β = 0 against β ≠ 0 at the 5% level.
e) Obtain a 95% confidence interval for the evaporation coefficient when the air velocity
is 190 cm/sec.
Question 4. The following data refers to the number of claims (X) received by a motor
insurance company in a week and the number of settlements (Y) of these claims
in the following week during 10 randomly selected weeks in a year.
X: 100 110 120 130 140 150 160 170 180 190
Y: 45 51 54 61 66 70 74 78 85 89
A regression model Y = α + βX + ε, ε ~ N(0, σ²) is to be fitted to the above data.
a) Display the data in a scatter diagram and comment on the selection of a
linear model for regression.
b) Compute X̄, Ȳ, Sxx, Syy. Hence, find the estimates of α and β.
c) Obtain the estimate of σ².
d) Test the hypothesis β = 0 against β ≠ 0.
e) Obtain a 95% confidence interval for β.
f) Let the population correlation coefficient between X and Y be ρ. Compute
the sample correlation coefficient and test whether ρ = 0 against ρ ≠ 0.
Question 5. As humidity influences evaporation, the solvent balance of water-reducible paints
during spray-out is affected by humidity. A study is conducted to examine the
relation between humidity (X) and the extent of solvent evaporation (Y). The
following data summary is obtained:
n = 25    Σx = 1314.90    Σy = 235.70
Σx² = 76,308.53    Σy² = 2286.07    Σxy = 11,824.44
a) Find the estimated correlation coefficient between X and Y and test the
hypothesis H0: ρ = 0 against H1: ρ ≠ 0, ρ being the population correlation
coefficient between X and Y.
b) Stating the assumptions, fit a regression line of the model
Yi = β0 + β1 Xi + ei for the above data.
c) Obtain the unbiased estimator of σ².
d) Test the hypothesis H0: β1 = 0 against H1: β1 ≠ 0.
e) Obtain a 99% confidence interval for β1.
f) Obtain the coefficient of determination R².
Question 6. The following data shows the number of hours that ten administrative officers
worked and number of files disposed by them in a certain LIC office.
Hours Worked (x) 4 9 10 14 4 7 12 22 1 17
No. of files disposed (y) 31 58 65 73 37 44 60 91 21 84
A linear regression model Y = α + βx has been fitted to the above data.
i. Find the equation of the least squares line that approximates the regression
of the number of files disposed on the number of hours worked.
ii. Estimate the average number of files disposed by an officer who worked
14 hours.
Assuming the usual normal linear regression model:
iii. Test the null hypothesis β = 3 against β > 3 at the 0.01 level of significance.
iv. Calculate a two sided 95% confidence interval for β.
Question 7. In a correlation analysis based on a random sample of 10 values from a bivariate
Normal distribution, a test of H0: ρ = 0 against H1: ρ > 0 results in a probability value
of 0.025. Calculate the value of the sample correlation coefficient.
Question 8. A survey was conducted to investigate whether people tend to marry partners of
about the same age. This question was addressed to 12 married couples and their
ages were given in the following table.
Couple No. 1 2 3 4 5 6 7 8 9 10 11 12
Husband's age(x) 30 29 36 72 37 36 51 48 37 50 51 36
Wife's age(y) 27 20 34 67 35 37 50 46 36 42 46 35
a) Draw the scatter plot and comment on it.
b) Find the correlation coefficient and interpret it.
c) If ρ represents the population correlation coefficient between the ages of
partners, test the significance of ρ at 5% level.
Question 9. It is thought that a suitable model for a plumber’s charge when called out for a job
is a linear one based on a fixed call-out charge and an hourly rate.
A random sample of 10 of his invoices gave the following results:
Duration of Job (hours) x 0.5 1 1.5 2 2.5 3 3.5 4.5 5 5.5
Cost of Job (Rs) y 40 55 45 65 80 75 95 100 120 130
Σx = 29    Σx² = 110.5    Σy = 805    Σy² = 73,225    Σxy = 2,795
a) Plot these data and comment on the suitability of the proposed model.
b) Calculate the least squares estimates of the plumber’s call-out charge, and
the plumber’s hourly rate charge.
c) Determine a 90% confidence interval for the plumber’s hourly rate charge.
d) Compute Pearson correlation co-efficient and test for its significance.
Question 10. The following table shows the student population (in thousands) and quarterly
sales (in thousand Rupees) data for 10 Armand’s Pizza Parlors.
Restaurant (i)   Student Population (xi)   Quarterly Sales (yi)
1                2                         58
2                6                         105
3                8                         88
4                8                         118
5                12                        117
6                16                        137
7                20                        157
8                20                        169
9                22                        149
10               26                        202
The simple linear regression model y = β0 + β1 x + e has been fitted for these data:
i. Obtain the estimated regression equation for these data.
ii. Obtain the estimate of σ².
iii. Obtain a 99% confidence interval for β1.
Question 11. The following data shows the student population and quarterly sales data for ten
food restaurants. The manager believes that quarterly sales for these restaurants
(y) are related positively to the size of the student population (x).
Restaurant i   Student Population (in ’000) xi   Quarterly Sales (in Rs. ’000) yi
1 2 58
2 6 105
3 8 88
4 8 118
5 12 117
6 16 137
7 20 157
8 20 169
9 22 149
10 26 202
A simple linear regression model yi = β0 + β1 xi + ei is fitted.
i. Develop a scatter plot for these data.
ii. Develop the estimated regression equation.
iii. Test the hypothesis H0: β1 = 0 against H1: β1 ≠ 0 at the 0.01 level of significance.
iv. Compute the 99% confidence interval for β1.
Question 12. A study was done to find if the students who are good in high school, carry on
doing well also in the college. The high school grade (X) and college grade (Y) of
15 randomly chosen students are given below.
No. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
X 2 3.4 3.7 1.5 3.3 0.3 0.4 2 2 2.1 2.1 1.3 1.5 3.1 2.1
Y 2 2.6 3.8 1.1 3 0.1 1.4 1.5 1.4 4 1.5 1.3 1.9 3.1 1.9
a) Draw a scatter plot of the above data with high school grades on X-axis and
college grade on Y-axis.
b) Estimate the slope and intercept parameters of the linear regression. Also
calculate the sample correlation coefficient between the high school grades
and the college grades.
c) Can it be concluded that, in general, the expected performance in college is
the same as that in high school?
d) Estimate the college grade of a student who has a grade of 2.1 in high
school. Comment on the college grade of student number 10 in the above
data.
Question 13. A training manager wishes to see if there has been any alteration in the ability of
his trainees after they have been on a course. The trainees take an aptitude test
before they start the course and an equivalent one after they have completed it.
The scores are given below.
Before Training (X) 42 35 37 46 53 38 44 40 43
After Training (Y) 47 28 26 54 42 17 44 31 44
a) Compute Pearson’s correlation coefficient between the scores before
and after training and test the significance of the calculated correlation
coefficient.
b) Assuming that the populations are independent, test whether there is any
significant difference between the variances of the scores before and after
training.
c) Obtain a 90% confidence interval for the ratio of population variances of
scores before and after training.
d) In view of the conclusions in (a), examine whether training has increased
their aptitude (at the 5% significance level).
e) Obtain 95% confidence interval for the difference in the population mean
scores before and after training.
Question 14. In a study of the relation between the amount of information available and use of
buses in eight comparable test cities, bus route maps were given to residents of
the cities at the beginning of the test period. The increase in average daily bus use
during the test period was recorded. The numbers of maps and the increase in bus
use are given in the table below (both in thousands).
Number of maps(x) 80 220 140 120 180 100 200 160
Increase in bus use(y) 0.6 6.7 5.3 4 6.55 2.15 6.6 5.75
For these data:
Σx = 1,200, Σx² = 196,800, Σy = 37.65, Σy² = 213.4875, Σxy = 6,378
i. Construct a scatter plot of the data and comment on the relationship
between the increase in bus use and the number of maps distributed.
ii. The equation of the fitted linear regression is given by ŷ = −1.816 +
0.04348x. Perform an appropriate statistical test to assess the hypothesis
that the slope in this fitted model suggests no relationship between the
increase in bus use and the number of maps distributed. Any assumption
made should be clearly stated.
iii. The fitted responses and the residuals from the linear regression model
fitted in part (ii) are given below:
Fitted values (ŷ) 1.66 7.75 4.27 3.4 6.01 2.53 6.88 5.14
Residuals (ê) -1.06 -1.05 1.03 0.6 0.54 -0.38 -0.28 0.61
Plot the residuals against the values of the fitted response and comment on
the adequacy of the model.
iv. A new city is added to the study, and 250,000 maps are distributed to its
citizens. Calculate the prediction of the increase in bus use in this city
according to the model fitted in part (ii) and comment on the validity of
this prediction.
Question 15. Consider a situation in which the data consist of two responses at each of five
values of an explanatory variable (𝑥 = 1, 2, 3, 4, 5), so we have a data set with ten
responses (y), as in the following table:
X 1 1 2 2 3 3 4 4 5 5
Y 12 19 18 35 19 44 32 53 44 65
For these data Σx = 30, Σy = 341, Σx² = 110, Σy² = 14,345, Σxy = 1,211
i. You are asked to carry out a linear regression analysis using these data.
a) Draw a plot of the data to show the relationship between the responses
and explanatory values.
b) Calculate the total, regression and residual sums of squares for a least
squares linear regression analysis of y on x, and hence calculate the
value of R! , the coefficient of determination.
c) Determine the equation of the fitted regression line.
d) Calculate a 95% confidence interval for the slope of the underlying
regression line.
ii. A colleague suggests that it will be simpler and will produce the same results
if we use the following reduced data, in which the two responses at each x
value are replaced by their mean:
x 1 2 3 4 5
y 15.5 26.5 31.5 42.5 54.5
The details of the regression analysis for these data are given in the box below.
Regression equation: 𝑦 = 5.90 + 9.40𝑥
Coef Stdev t-ratio p-val
Intercept 5.900 2.233 2.64 0.078
x 9.400 0.673 13.96 0.001
𝑠 = 2.129 R-sq=98.5%
Analysis of Variance:
Source df SS MS F p-val
Regression 1 883.6 883.6 194.91 0.001
Error 3 13.6 4.53
Total 4 897.2
Discuss the similarities and the differences between the two approaches and their
results, in particular addressing the claim by the colleague that the two analyses
will produce “the same results”.
Question 16. As part of an investigation into health service funding a working party was
concerned with the issue of whether mortality rates could be used to predict
sickness rates. Data on standardised mortality rates and standardised sickness
rates were collected for a sample of 10 regions and are shown in the table below:
Region Mortality rate m (per 10,000) Sickness rate s (per 1,000)

1 125.2 206.8
2 119.3 213.8
3 125.3 197.2
4 111.7 200.6
5 117.3 189.1
6 100.7 183.6
7 108.8 181.2
8 102.0 168.2
9 104.7 165.2
10 121.1 228.5
Data summaries:
Σm = 1136.1, Σm² = 129,853.03, Σs = 1934.2, Σs² = 377,700.62, Σms = 221,022.58
(i) Calculate the correlation coefficient between the mortality rates and the
sickness rates and determine the probability-value for testing whether
the underlying correlation coefficient is zero against the alternative that it
is positive.
(ii) Noting the issue under investigation, draw an appropriate scatterplot for
these data and comment on the relationship between the two rates.
(iii) Determine the fitted linear regression of sickness rate on mortality rate and
test whether the underlying slope coefficient can be considered to be as
large as 2.
(iv) For a region with mortality rate 115.0, estimate the expected sickness rate
and calculate 95% confidence limits for this expected rate.
Question 17. A chain of LUMA ice cream parlors is floated in college campuses across the
metropolitan cities. A sample of 10 ice cream parlors across different colleges is
selected. The daily revenue Y (in INR) and the student population X in each of the
colleges are recorded as follows.
X 200 600 800 800 1200 1600 2000 2000 2200 2600
Y 6000 10000 9000 12000 11500 14000 16000 17000 15000 20000
The proprietor of the parlor chain has contacted a statistician, provided the above
information, and asked her to predict daily revenue given the student population
in a college. The statistician has decided to fit a linear regression model with
student population as independent variable.
(i) Fit a linear regression model.
(ii) Find the 99% confidence interval for the slope parameter.
(iii) Establish statistically, based on the regression model (i) above, whether
there is any relationship between the daily revenue and the college student
size by proposing and testing an appropriate hypothesis
(iv) Find the 95% confidence interval for the mean daily revenue when the
population of the college is 1,000.
(v) For what size of the college student population, the 95% confidence
interval for the mean daily revenue is the shortest?
(vi) Find 95% confidence interval for the predicted daily revenue for an
individual parlor when the student population of one particular college is
1,000.
(vii) Interpret the results of (iv) and (vi)
Question 18. Auditors are often required to compare the audited (or current) value of an inventory
item with the book (or listed) value. If a company is keeping its inventory and books
up to date, there should be a strong linear relationship between the audited and book
values: An Accountant intends to fit a linear regression model. He sampled ten
inventory items and obtained the audited and book values shown in the following
table.
Book - X 10 12 9 27 47 112 36 241 59 167
Audited - Y 9 14 7 29 45 109 40 238 60 170
X̄ = 72.00; Ȳ = 72.10; Sxx = 54,714.00; Syy = 53,832.90 and Sxy = 54,243.00
(i) Fit a Linear regression model: Y = α + β X + e for the above data.
(ii) Obtain a 95% confidence interval for β, the slope parameter.
(iii) If the book value x = 100, find a 95% confidence interval for the predicted
mean audited value μ = E[Y | X = x].
(iv) Find the book value x for which the 95% confidence interval for ŷ, the
predicted individual audited value, has minimum length.
(v) Calculate coefficient of determination and interpret.
Question 19. A statistician has a series of bivariate data {(x1,y1), (x2,y2), … (xn,yn)} and
wishes to perform a linear regression on these data.
(i) State the equation that must be minimized to give the least squares
estimates of the regression coefficients.
(ii) Derive the least squares estimate of the slope coefficient from the equation
in part (i).
For a sample of 44 fish, the age (days) and length (millimeters) of each fish are
measured. Denote age by X and length by Y. The following summary data are available:
Σxi = 3660    Σxi² = 389684    Σyi = 136727
Σyi² = 500813951    Σxi yi = 13609918
(iii) Determine the coefficients for a linear regression of Y on X.
(iv) Calculate the sample correlation coefficient between x and y.

ANSWERS
Ans.1. (b) r = 0.9 (c) ŷ = −7 + 0.9x (d) S.E. = 0.6333 (e) C.I. = (0.0992, 1.7007)
(f) C.I. does not include zero, reject H0
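The Question 1 figures follow from the usual sums of squares. The sketch below recomputes them in plain Python; the critical value 3.182 is the upper 2.5% point of the t distribution with 3 degrees of freedom taken from tables, and the variable names are illustrative.

```python
from math import sqrt

x = [12, 11, 10, 9, 8]   # moisture of wet mix
y = [4, 3, 2, 0, 1]      # density of finished product
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = s_xy / sqrt(s_xx * s_yy)                        # sample correlation
slope = s_xy / s_xx                                 # least squares slope
intercept = y_bar - slope * x_bar                   # least squares intercept
sigma2_hat = (s_yy - s_xy**2 / s_xx) / (n - 2)      # error variance estimate

t_crit = 3.182                                      # t(3), upper 2.5% point, from tables
half_width = t_crit * sqrt(sigma2_hat / s_xx)
slope_ci = (slope - half_width, slope + half_width)

print(round(r, 3), round(intercept, 3), round(slope, 3), round(sigma2_hat, 4))
print(tuple(round(v, 4) for v in slope_ci))
```

Because the interval for the slope excludes zero, the two-sided test at the 5% level reaches the same conclusion as part (f).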
Ans.2. (b) ŷ = 8.3636 + 0.9818x (c) F = 16.36, reject H0 (d) CI = (0.4319, 1.5317)
(e) CI = (8.912, 13.707) (f) R² = 0.6444 Model is OK.
Ans.3. (b) ŷ = 0.0692 + 0.00383x (c) SE = 0.0253 (d) TS = 8.75, reject H0
(e) CI = (0.408, 1.1854)
Ans.4. (b) X̄ = 145, Ȳ = 67.3, Sxx = 8250, Syy = 1932.1 β̂ = 0.483 α̂ = −2.739
(c) σ̂² = 0.903 (d) TS = 46.167, reject H0 (e) CI = (0.4589, 0.5071)
(f) T.S = 46.17, reject H0
Ans.5. (a) T.S = 1.714, reject (b) β̂1 = −0.08 β̂0 = 13.64 (c) σ̂² = 0.79 (d) T.S = 7.64, reject H0
(e) CI = (−0.109, −0.051) (f) R² = 71.74%
Ans.6. (i) ŷ = 21.69 + 3.471x (ii) ŷ = 70.284 (iii) T.S. = 1.731, cannot reject H0
(iv) C.I = (2.84,4.10)
Ans.7. 𝑟 = 0.632
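For Question 7, the test statistic t = r√(n − 2)/√(1 − r²) has a t distribution with n − 2 degrees of freedom under H0, so a one-sided probability value of 0.025 pins t at the upper 2.5% point. A short sketch of the inversion, assuming the tabulated value 2.306 for 8 degrees of freedom:

```python
from math import sqrt

n = 10
t_crit = 2.306   # t(8), upper 2.5% point, from tables

# Solve t = r * sqrt(n - 2) / sqrt(1 - r^2) for r
r = t_crit / sqrt(t_crit**2 + n - 2)

# Substitute back as a check that r reproduces the critical value
t_back = r * sqrt(n - 2) / sqrt(1 - r**2)

print(round(r, 3), round(t_back, 3))
```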
Ans.8. (b) r = 0.969 (c) T.S. = 12.403, reject H0
Ans.9. (b) α̂ = 29.915 β̂ = 17.443 (c) C.I = (14.91, 19.97) (d) r = 0.977 T.S. = 12.9, reject H0
Ans.10. (i) ŷ = 60 + 5x (ii) σ̂² = 191.25 (iii) CI = (3.053, 6.947)
Ans.11. (ii) ŷ = 60 + 5x (iii) TS = 8.62, reject H0 (iv) C.I. = (3.053, 6.947)
Ans.12. (b) α̂ = 0.346 β̂ = 0.825 r = 0.778 (d) ŷ = 2.0785
Ans.13. (a) TS = 2.453, reject H0 (b) TS = 4.87, reject H0
(c) CI = (0.0596, 0.7059) or (1.416, 16.757)
(d) T.S. = −1.629, accept H0 (e) C.I = (-12.077, 2.077)
Ans.14. (ii) Reject H0 (iv) ŷ = −1.816 + 0.04348 × 250
Ans.15. (i) (b) R² = 65.0% (c) ŷ = 5.9 + 9.4x (d) C.I. = (3.78, 15.02)
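The comparison asked for in Question 15 (ii) can be seen numerically by running the same least squares calculation on the full data and on the averaged data. The sketch below uses an illustrative helper, regression_summary, which is not part of the workbook. Both data sets give the same fitted line, but the sums of squares and R² differ because averaging removes the within-x variation.

```python
def regression_summary(x, y):
    """Least squares fit of y on x with the sums-of-squares breakdown."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    ss_reg = s_xy**2 / s_xx
    return {
        "intercept": y_bar - slope * x_bar,
        "slope": slope,
        "ss_total": s_yy,
        "ss_reg": ss_reg,
        "ss_res": s_yy - ss_reg,
        "r_squared": ss_reg / s_yy,
    }


full = regression_summary([1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
                          [12, 19, 18, 35, 19, 44, 32, 53, 44, 65])
means = regression_summary([1, 2, 3, 4, 5],
                           [15.5, 26.5, 31.5, 42.5, 54.5])

for label, fit in (("full data", full), ("cell means", means)):
    print(label, {key: round(value, 3) for key, value in fit.items()})
```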
Ans.16. (i) r = 0.764 and p-value = 0.005
(ii) There seems to be an increasing linear relationship such that mortality could be used to
predict sickness.
(iii) α̂ = 7.426 and β̂ = 1.6371, T.S. = −0.74, cannot reject H0
(iv) μ̂ = 195.69 and CI = (185.60, 205.78)
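When only data summaries are given, as in Question 16, the same quantities come from the corrected sums of squares. The sketch below works from the Question 16 summaries; 2.306 is the tabulated upper 2.5% point of t with 8 degrees of freedom, and the variable names are illustrative. Small differences from the answer key in the last decimal place can arise because the key appears to use rounded coefficients.

```python
from math import sqrt

n = 10
sum_m, sum_m2 = 1136.1, 129853.03      # mortality rate summaries
sum_s, sum_s2 = 1934.2, 377700.62      # sickness rate summaries
sum_ms = 221022.58

s_mm = sum_m2 - sum_m**2 / n
s_ss = sum_s2 - sum_s**2 / n
s_ms = sum_ms - sum_m * sum_s / n

r = s_ms / sqrt(s_mm * s_ss)
slope = s_ms / s_mm
intercept = sum_s / n - slope * sum_m / n
sigma2_hat = (s_ss - s_ms**2 / s_mm) / (n - 2)

# (iii) test statistic for H0: slope = 2
t_stat = (slope - 2) / sqrt(sigma2_hat / s_mm)

# (iv) mean response at mortality rate 115.0 with 95% limits
m0, t_crit = 115.0, 2.306
mu_hat = intercept + slope * m0
se_mu = sqrt(sigma2_hat * (1 / n + (m0 - sum_m / n) ** 2 / s_mm))
mu_ci = (mu_hat - t_crit * se_mu, mu_hat + t_crit * se_mu)

print(round(r, 3), round(intercept, 3), round(slope, 4), round(t_stat, 2))
print(round(mu_hat, 2), tuple(round(v, 2) for v in mu_ci))
```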
Ans.17. (i) ŷ = 6025.35 + 5.0176x (ii) (3.2078, 6.8275) (iii) H0: β = 0 vs. H1: β ≠ 0
Because 0, the hypothesized value of β, is not included in the confidence interval
calculated in the above part, we can reject H0 and conclude that a significant statistical
relationship exists between the student size and daily revenue.
(iv) (9,981.57, 12,104.35)
(v) The standard error of the mean daily revenue estimate, Se(μ̂), is smallest at
x0 = x̄, since the (x0 − x̄)² term in the standard error is non-negative and equals zero at x0 = x̄.
For any confidence interval, the shortest length is attained at minimum Se(μ̂). Hence for
a college with 1,400 students, the 95% confidence interval for the mean daily revenue
will be the shortest.
(vi) (7,893.97, 14,191.94)
(vii) CI for the daily revenue of a particular college with 1,000 students is wider than CI
for mean daily revenue with 1,000 students.
The difference reflects the fact that we are able to estimate mean value of y more
precisely than individual value of y.
The resulting interval for an individual response is wider than the corresponding interval
for the mean response because the uncertainty associated with individual estimator is
more than the relatively more stable mean response.
Ans.18. (i) 𝑦 = 0.7198 + 0.9914𝑥 (ii) (0.9651, 1.0177) (iii) (97.781, 101.937)
(iv) $se(\hat{y})$ will be minimum at $x_0 = \bar{x} = 72$
(v) Coefficient of determination is used to measure the goodness of fit of a linear
regression model. A value of 99.89% means the model is a good fit.
Ans.19. (i) $\sum_i \left(y_i - (\alpha + \beta x_i)\right)^2$ (ii) $\hat{\beta} = S_{xy}/S_{xx}$ (iii) 𝛼 = 924.68 𝛽 = 26.241
(iv) 𝑟 = 0.879

REVISION ASSIGNMENT – 2
Question 1. The waist measurements (in cm) of six male patients before and after undergoing
a medically controlled diet are as follows:
          Patient 1   Patient 2   Patient 3   Patient 4   Patient 5   Patient 6
Before    106         98          110         100         105         96
After     98          97          82          89          80          90
Calculate a 90% confidence interval for the reduction in waist measurement
following the diet.
Question 2. A random sample of size 2n is taken from a geometric distribution for which:
$P(X = x) = pq^{x-1}, \quad x = 1, 2, 3, \dots$
Give an expression for the likelihood that the sample contains equal numbers of
odd and even values of X.
Question 3. A random sample of 16 observations ($x_1, x_2, \dots, x_{16}$) from a normal distribution
gives:
$\sum_{i=1}^{16} x_i = 128$, $\sum_{i=1}^{16} x_i^2 = 1168$
Calculate a 90% confidence interval for the population standard deviation.

Question 4. The following sample was taken from a normal distribution with mean 𝜇 and
variance 20: 56, 32, 49, 57, 44
(i) Calculate a symmetrical 95% confidence interval for 𝜇.
(ii) Repeat part (i) for the situation where the population variance is unknown.
Question 5. Two children play an ‘incy-wincy’ spider game. They take it in turns to roll two
dice each and move their spiders up their drainpipes as follows:
Score Movement
2, 3 or 4 Down 1
5, 6 or 7 Stay same
8, 9 or 10 Up 1
11 or 12 Up 2
(i) Using a normal approximation, calculate the probability that after 15 turns
a child’s spider will have moved up more than 8 squares from the start.
(ii) Comment briefly on the suitability of this approximation.
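As a rough numerical check of part (i), the sketch below builds the per-turn movement distribution from the two-dice totals in the table and applies a normal approximation to the total after 15 turns. The continuity correction at 8.5 and all variable names are assumptions of this sketch, not part of the question.

```python
from fractions import Fraction
from itertools import product
from statistics import NormalDist

# Movement per turn as a function of the total on two dice (table above).
def movement(total):
    if total <= 4:
        return -1
    if total <= 7:
        return 0
    if total <= 10:
        return 1
    return 2

# Exact per-turn mean and variance over the 36 equally likely outcomes.
moves = [movement(a + b) for a, b in product(range(1, 7), repeat=2)]
mean_turn = Fraction(sum(moves), 36)
var_turn = Fraction(sum(m * m for m in moves), 36) - mean_turn ** 2

turns = 15
approx = NormalDist(float(turns * mean_turn), float(turns * var_turn) ** 0.5)

# "More than 8 squares" means 9 or more, so the correction point is 8.5.
prob_more_than_8 = 1 - approx.cdf(8.5)
print(round(prob_more_than_8, 4))
```

The per-turn mean and variance come out as 1/3 and 13/18, so the total after 15 turns is approximated by a normal distribution with mean 5 and variance 65/6.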
Question 6. A sample of 50 independent and identically distributed observations from an
Exp(𝜆) distribution gave:
Range 0≤x<1 1≤x<2 x≥2
Frequency 30 15 5
(i) Show that the log-likelihood can be expressed as:
$\ln L(\lambda) = \text{const} - 25\lambda + 45\ln(1 - e^{-\lambda})$
explaining clearly why the constant has arisen.
(ii) Hence calculate the maximum likelihood estimate of 𝜆.
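One way to check part (ii) numerically is to write the grouped-data log-likelihood directly and solve the score equation by bisection. This is only a sketch: the function names and the bisection bracket are assumptions, and the multinomial coefficient (which does not depend on λ) is left out of the log-likelihood.

```python
import math

def log_likelihood(lam):
    # Grouped Exp(lam) data: 30 values in [0,1), 15 in [1,2), 5 in [2,inf).
    p1 = 1 - math.exp(-lam)
    p2 = math.exp(-lam) - math.exp(-2 * lam)
    p3 = math.exp(-2 * lam)
    return 30 * math.log(p1) + 15 * math.log(p2) + 5 * math.log(p3)

def score(lam):
    # Derivative of -25*lam + 45*ln(1 - exp(-lam)) with respect to lam.
    return -25 + 45 * math.exp(-lam) / (1 - math.exp(-lam))

# Bisection on the score, which is strictly decreasing in lam.
lo, hi = 0.01, 10.0
for _ in range(100):
    mid = (lo + hi) / 2
    if score(mid) > 0:
        lo = mid
    else:
        hi = mid
lam_hat = (lo + hi) / 2
print(round(lam_hat, 3))
```

Setting the score to zero also gives the closed form $e^{-\lambda} = 25/70$, so the numerical root can be compared with $\ln 2.8$.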
Question 7. A random variable X has probability density function:
$f(x) = 2e^{-2(x-\theta)}$, $x \ge \theta$, where the value of 𝜃 is unknown.
Five observations of X are: 1.90, 2.97, 1.88, 2.94 and 1.56.
(i) Derive a formula for the maximum likelihood estimator of 𝜃 and obtain
the maximum likelihood estimate for this sample.
(ii) Show that $E(X) = \theta + \frac{1}{2}$ and hence calculate the method of moments
estimate of 𝜃.
(iii) Comment briefly on your results.
Question 8. A large life office has n policyholders, each with a probability of 0.01 of dying
during the next year (independently of all other policyholders).
Calculate the approximate probability that there will be between 9 and 16 (both
inclusive) deaths during the year, when:
(i) n = 400 (ii) n = 3,000
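The two cases can be checked with standard approximations to the binomial. The sketch below assumes a Poisson approximation for n = 400 and a normal approximation with a continuity correction for n = 3,000; that choice of approximations and the helper name `poisson_cdf` are assumptions of this sketch.

```python
import math
from statistics import NormalDist

p = 0.01

def poisson_cdf(k, mu):
    # P(X <= k) for X ~ Poisson(mu), summed term by term.
    return sum(math.exp(-mu) * mu ** x / math.factorial(x) for x in range(k + 1))

# (i) n = 400: Poisson(np = 4) approximation, P(9 <= X <= 16).
mu = 400 * p
prob_400 = poisson_cdf(16, mu) - poisson_cdf(8, mu)

# (ii) n = 3,000: N(np, np(1 - p)) approximation with a continuity correction.
n = 3000
approx = NormalDist(n * p, math.sqrt(n * p * (1 - p)))
prob_3000 = approx.cdf(16.5) - approx.cdf(8.5)

print(round(prob_400, 5), round(prob_3000, 5))
```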
Question 9. The gamma distribution, with parameters 𝛼 and 𝜆, has moment generating
function: $M_X(t) = \left(1 - \frac{t}{\lambda}\right)^{-\alpha}$
(i) Show, using moment generating functions, that the sum of two
independent gamma random variables, each with second parameter 𝜆, is
also a gamma random variable and state its parameters.
(ii) A random sample X1 ,⋯, Xn is taken from a Gamma(𝛼,𝜆) distribution.
Derive the moment generating function of $2\lambda \sum_{i=1}^{n} X_i$, and hence show that it
has a $\chi^2_{2n\alpha}$ distribution.
(iii) Suppose that $\bar{X}$ is the mean of a random sample of size 5 taken from a
Gamma(2,0.1) distribution. Use the result from part (ii) to calculate the
probability that $\bar{X}$ exceeds 40.
Question 10. The number of claims per annum from a certain type of medical insurance policy
sold to policyholders over the age of 60 is believed to follow a Poi(𝜆) distribution,
where the parameter 𝜆 is unknown. A sample of 10 policies gave rise to the
following numbers of claims: 0, 1, 0, 0, 3, 0, 1, 0, 2, 2
(i) Use a normal approximation to calculate an approximate 99% confidence
interval for the Poisson parameter 𝜆.
(ii) Comment on the accuracy of the interval obtained in part (i).
(iii) Write down the equations that you would use to obtain the confidence
interval for 𝜆 using an accurate method.
Question 11. A random sample of 10 pet insurance claims had an average size of £680. It is
believed that claim amounts are exponentially distributed.
(i) Using the fact that if X1,⋯, Xn are exponentially distributed with
parameter 𝜆, then $2n\lambda\bar{X}$ has a $\chi^2_{2n}$ distribution, where $\bar{X}$ is the mean of
X1 ,⋯, Xn , calculate an exact 90% confidence interval for the mean pet
insurance claim size.
(ii) Write down the likelihood function in terms of the mean 𝜇 of the
exponential distribution and hence show that the maximum likelihood
estimator of 𝜇 is $\bar{X}$.
(iii) (a) Show that the Cramér-Rao lower bound for estimators of the mean
of the exponential distribution is given by $\mu^2/n$.
(b) Hence, calculate the estimated asymptotic standard error of the
mean, $\bar{X}$.
(iv) (a) Use your results from (iii) and the asymptotic properties of
estimators to calculate an approximate 90% confidence interval for
the mean claim size.
(b) Comment on the confidence intervals produced in (i) and (iv)(a).
Question 12. It is desired to test the value of the parameter p for a random variable that has a
binomial distribution. In order to test the null hypothesis H0 : p = 0.4 against the
alternative hypothesis H1 : p = 0.6 , the following test is devised:
The number of successes, X, in a sample of size 50 is determined. If X≥25 , then
H0 is rejected. Calculate the approximate size of this test.
Question 13. Following archaeological excavations at a site in Egypt, ten samples of wood
were carbon-dated and their ages x (years) estimated as:
4,900 4,750 4,820 4,710 4,760
4,570 4,300 4,680 4,800 4,670
$\sum x = 46{,}960$, $\sum x^2 = 220{,}772{,}800$
(i) Calculate a 95% confidence interval for the true mean age of the wood
found at this site.
(ii) Present these data values graphically and comment on the validity of the
confidence interval calculated in part (i).
(iii) Ideally the archaeologist would like the 95% confidence interval for the
true mean age, calculated in (i) above, to have a width of no more than
200 years. Calculate the minimum sample size needed.
(iv) At a second site, eight samples of wood gave the following results:
$\sum y = 36{,}000$, $\sum y^2 = 162{,}280{,}000$
Calculate a 95% confidence interval for the difference between the mean
ages of the wood found at the two sites.
(v) Obtain a 90% confidence interval for the ratio of the underlying variances
in the ages of the two samples of wood. Hence comment on the validity of
the confidence interval given in part (iv).
Question 14. A research chemist thinks he has discovered a new desiccant, which is more
efficient at extracting moisture from chemicals than the existing one. In order to
test the claim, equal amounts of a homogeneously mixed compound are put into
each of sixteen desiccators. These are divided into two batches of eight, labelled
A and B, and in each batch the desiccators are numbered 1 to 8. Into each
desiccator is also put a standard amount of the respective desiccant under test.
Batch A contains the existing desiccant whilst the new desiccant is placed in
Batch B. The desiccators are sealed for 24 hours and then the increase in weight
in grams of each of the sixteen samples of desiccant is measured. The results are:
Sample number 1 2 3 4 5 6 7 8
Existing desiccant (A) 4.59 5.05 4.49 5.33 4.66 4.98 5.67 5.23
New desiccant (B) 4.75 5.03 4.66 5.56 4.90 4.88 5.80 5.33
$\sum A = 40$, $\sum A^2 = 201.1574$, $\sum B = 40.91$, $\sum B^2 = 210.3659$
(i) (a) (1) Draw a plot of the data and comment briefly.
(2) Perform a test to verify that the variances arising from the
use of each desiccant are not significantly different and
comment briefly in relation to your plot of the data.
(b) Use a t test to investigate the claim that the new desiccant extracts
more moisture than the existing one.
(ii) It was subsequently discovered that eight different compounds had been
used in the above test. The ith pair of desiccators A and B had contained
equal weights of compound i, 𝑖 = 1,2,⋯ ,8. Perform a new analysis with
the same aim, as in part (i)(b) above, again using a t test.
(iii) Comment on any difference found between the analyses, and the cause.
Question 15. A study was carried out into the effects of smoking on life expectancy. The
average number (x) of cigarettes smoked per day from age 50 by 11 individuals
was calculated and the number (y) of years from age 50 until their deaths was
recorded. The results were as follows:
X   0     1.1   17.3  10.6  25.1  5.2   11.8  40    15.6  13.8  3.6
Y   42.3  30.7  26.3  36.8  8.9   25.1  10.8  10    25.2  17.2  29.1
(i) Calculate Pearson’s correlation coefficient. Comment on the value
obtained.
(ii) Calculate Spearman’s rank correlation coefficient and comment on the
value obtained.
(iii) State a general advantage of using Spearman’s rank correlation
coefficient.
(iv) Carry out a test to determine if Spearman’s rank correlation coefficient is
significantly different from zero assuming it is appropriate to use a normal
approximation and comment on this assumption.
(v) Calculate the Kendall’s rank correlation coefficient.
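The rank-based coefficients in parts (ii), (iv) and (v) can be checked with a short script. This sketch assumes there are no tied values (true for this data set), which is what makes the simple $1 - 6\sum d^2 / (n(n^2-1))$ form of Spearman's coefficient valid; the helper `ranks` and the variable names are illustrative only.

```python
from itertools import combinations

x = [0, 1.1, 17.3, 10.6, 25.1, 5.2, 11.8, 40, 15.6, 13.8, 3.6]
y = [42.3, 30.7, 26.3, 36.8, 8.9, 25.1, 10.8, 10, 25.2, 17.2, 29.1]
n = len(x)

def ranks(values):
    # Rank 1 = smallest value; only valid when there are no ties.
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

rx, ry = ranks(x), ranks(y)

# Spearman: 1 - 6 * sum(d^2) / (n (n^2 - 1)).
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
spearman = 1 - 6 * d2 / (n * (n * n - 1))

# Normal-approximation test statistic: r_s * sqrt(n - 1) is roughly N(0, 1) under H0.
ts = spearman * (n - 1) ** 0.5

# Kendall: (concordant pairs - discordant pairs) / total number of pairs.
s = 0
for i, j in combinations(range(n), 2):
    s += 1 if (x[i] - x[j]) * (y[i] - y[j]) > 0 else -1
kendall = s / (n * (n - 1) / 2)

print(round(spearman, 4), round(ts, 3), round(kendall, 4))
```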
Question 16. It is thought that a plumber charges £22 per hour plus an administrative charge of
£15 per callout.
A sample of eight invoices was obtained corresponding to jobs with durations of 1
hour, 2 hours, ..., 8 hours. For each invoice the total cost of the job was noted
with the following results:
Time x(hours): 1 2 3 4 5 6 7 8
Cost y (£): 40 50 81 89 122 128 151 179
The following model is used to represent the data:
$Y_i = a + bx_i + e_i$
where $Y_i$ ($i = 1, 2, \dots, 8$) are the costs, $x_i$ ($i = 1, 2, \dots, 8$) are the fixed times and
$e_i$ ($i = 1, 2, \dots, 8$) are independent errors with a $N(0, \sigma^2)$ distribution.
(i) (a) Derive formulae for the least squares estimators of a and b .
(b) Explain how your answer to part (i)(a) would have differed if you
had been asked to calculate the maximum likelihood estimators
and justify your answer.
(ii) Calculate the regression coefficients 𝑎 and 𝑏.

(iii) Carry out a test to establish whether or not the slope in the model agrees
with the suggested £22 per hour.
(iv) Calculate a 90% confidence interval for the:
(a) average cost of a job lasting 4 hours
(b) cost of an individual job lasting 6 hours.
(v) Comment on relative widths of the two intervals calculated in part (iv).
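The least squares quantities for parts (ii) and (iii) can be reproduced from the eight invoices. The sketch below assumes the usual definitions of $S_{xx}$, $S_{xy}$ and $S_{yy}$ and an unbiased residual variance with n − 2 degrees of freedom; the variable names are illustrative.

```python
x = list(range(1, 9))
y = [40, 50, 81, 89, 122, 128, 151, 179]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates of the slope and intercept.
b_hat = s_xy / s_xx
a_hat = y_bar - b_hat * x_bar

# Residual variance estimate and the t statistic for H0: b = 22.
sigma2_hat = (s_yy - s_xy ** 2 / s_xx) / (n - 2)
t_stat = (b_hat - 22) / (sigma2_hat / s_xx) ** 0.5

print(round(a_hat, 3), round(b_hat, 3), round(t_stat, 3))
```

The statistic would be compared with a t distribution on n − 2 = 6 degrees of freedom.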
ANSWERS
Ans.1. (4.22,22.11)
Ans.2. $\binom{2n}{n} \frac{q^n}{(1+q)^{2n}}$
Ans.3. (2.4,4.45)
Ans.4. (i) (43.68,51.52) (ii) (34.92,60.28)
Ans.5. (i) 0.14381 (ii) The Central Limit Theorem requires n to be large. Fifteen turns is not
large, therefore this will be a poor approximation.
Ans.6. (ii) 𝜆 = 1.030
Ans.7. (i) $\hat{\theta} = \min X_i$. From this sample, the maximum likelihood estimate of 𝜃 is 1.56
(ii) 1.75
(iii) One of the observed values was less than the method of moments estimate of 𝜃 . So
the method of moments gives an estimate of 𝜃 that is not ‘possible’ in this case. This
contrasts with the situation for maximum likelihood estimators, which, provided they
exist, must, by definition, give feasible estimates.
Ans.8. (i) 0.02136 (ii) 0.00659
Ans.9. (ii) $(1 - 2t)^{-n\alpha}$ (iii) 0.005
Ans.10. (i) (0.127,1.673)

(ii) The approximation is not brilliant as:
- the Poisson parameter is small (the approximation is better for large values)
- the sample size of 10 is small (the approximation is better for large samples)
- an estimate for 𝜆 is used in the variance.
(iii) $\sum_{x=0}^{9} \frac{e^{-10\lambda}(10\lambda)^x}{x!} = 0.005$ to obtain the upper limit
$\sum_{x=0}^{8} \frac{e^{-10\lambda}(10\lambda)^x}{x!} = 0.995$ to obtain the lower limit
Ans.11. (i) (433, 1250) (iii)(b) 215

(iv)(a) (447.3,1417) (b) The exact confidence interval of (433, 1250) from part (i) is
very different from the approximate confidence intervals of (326, 1034) or (447, 1417)
from part (iv)(a). The normal approximation used in part (iv)(a) requires a large sample.
We have a sample of only 10 values. Hence the approximation is not very good.
Ans.12. 0.09697
Ans.13. (i) (4,577, 4,815)
(ii)
Given that our data set is small, the confidence interval in part (i) requires that the ages
are normally distributed. The plot seems to show that 4,300 is very different to the other
values and so it may be an outlier. In which case the underlying distribution is not
normal, and our confidence interval is not valid. However, more data is needed for us to
be sure.
(iii) A sample size of at least 14 is required
(iv) (13.2, 379) (v) (0.188,2.27) Since this confidence interval contains 1, the
assumption of equal variances used in the confidence interval in part (iv) looks reasonable.
Ans.14. (i)(a)(1)
The plots suggest that the new desiccant may extract more water. The spread of values is
similar for each desiccant.
(2) TS = 1.004 : insufficient evidence to reject H0
(b) TS = 0.559 : insufficient evidence to reject H0
(ii) TS = 2.7083 : sufficient evidence to reject H0
(iii) The paired test shows that there was a significant difference between the two
desiccants, whereas the two-sample test does not indicate any significant difference. The
small but significant difference between the two desiccants is masked in the two-sample
test because the test statistic for the two-sample test is calculated using the pooled
variance (which is 0.1657) rather than the sample variance of the differenced data (which
is 0.01411). A smaller variance leads to a larger test statistic, which means we are more
likely to reject the null hypothesis. In other words, the increased power of the paired test
enables a significant difference to be identified.
Ans.15. (i) - 0.7286 (ii) - 0.7636

(iii) Since Spearman’s rank correlation coefficient only considers ranks rather than the
actual values, the value of the coefficient is less affected by outliers in the data than
Pearson’s correlation coefficient. Hence the Spearman’s rank correlation coefficient is
more robust.
(iv) TS = - 2.415 : sufficient evidence to reject H0
(v) 𝜏 = −0.5636
Ans.16. (i) (a) $\hat{a} = \bar{y} - \hat{b}\bar{x}$, $\hat{b} = S_{xy}/S_{xx}$ (b) The answer would not have differed at
all. For a normal distribution, maximum likelihood and least squares obtain the same
estimates.
(ii) 𝑏 = 19.667 𝑎 = 16.5 (iii) TS = - 2.355 : insufficient evidence to reject H0
(iv) (a) (90.7,99.7) (b) (121,148)
(v) The confidence interval for the individual job is wider (width 27) than the confidence
interval for the average cost (width 9). So there is greater uncertainty over an individual result
than an average one.
ASSIGNMENT - 9
SAMPLING AND STATISTICAL INFERENCE
Question 1. A random sample of n observations is taken from a normal distribution with mean
µ and variance $\sigma^2$. The sample variance is an observation of a random variable $S^2$.
Derive from first principles $E(S^2)$ and $\text{Var}(S^2)$.
Question 2. Calculate the probability that, for a random sample of 5 values taken from a
$N(100, 25^2)$ population (i) $\bar{X}$ will be between 80 and 120, (ii) S will exceed 41.7
(iii) both conditions (i) and (ii) will hold?
Question 3. State the distribution of $\frac{\bar{X} - 100}{S/\sqrt{5}}$ for a random sample of 5 values taken from a
$N(100, \sigma^2)$ population. What is the probability that this quantity will exceed
1.533?
Question 4. Independent random samples of size n1 and n2 are taken from the normal
populations $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ respectively.
(i) Write down the sampling distributions of $\bar{X}_1$ and $\bar{X}_2$ and hence determine
the sampling distribution of $\bar{X}_1 - \bar{X}_2$, the difference between the sample
means.
(ii) Now assume that $\sigma_1^2 = \sigma_2^2 = \sigma^2$.
(a) Express the sampling distribution of $\bar{X}_1 - \bar{X}_2$ in standard normal
form.
(b) State the sampling distribution of $\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{\sigma^2}$.
(c) Using the N(0,1) distribution from (a) and the $\chi^2$ distribution from
(b), apply the definition of the t distribution to find the sampling
distribution of $\bar{X}_1 - \bar{X}_2$ when $\sigma^2$ is unknown.
Question 5. Determine:
(i) P(F9,10 > 3.779) (ii) P(F12,14 < 3.8)
(iii) P(F11,8 < 0.3392) (iv) the value of p such that P(F14,6 < p) = 0.01.
Question 6. (i) Determine: (a) P(F3,9 < 3.863) (b) P(F10,10 < 0.269)
(ii) Determine the value of p such that:
(a) P(F24,30 > p) = 0.10 (b) P(F18,9 > p) = 99%
Question 7. For random samples of size 10 and 25 from two normal populations with equal
variances, use F tables to determine the values of α and β such that
$P\left(S_1^2/S_2^2 > \alpha\right) = 0.05$ and $P\left(S_1^2/S_2^2 < \beta\right) = 0.05$, where $S_1^2$ is the sample variance from
the sample of size 10, and $S_2^2$ is the other sample variance.
Question 8. What is the probability that the sample variance of a sample of 10 values from a
normal distribution will be more than 6 times the sample variance of a sample of
5 values from an independent normal distribution with the same variance?
Question 9. A random sample of 10 observations is drawn from a normal distribution with
mean µ and standard deviation 15. Independently, a random sample of 25
observations is drawn from a normal distribution with mean µ and standard
deviation 12. Let $\bar{X}$ and $\bar{Y}$ denote the respective sample means.
Evaluate $P(\bar{X} - \bar{Y} > 3)$.
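Because both sample means are normal with the same mean µ, their difference is normal with mean 0 and variance 15²/10 + 12²/25. A minimal sketch of the calculation, using Python's standard-library `NormalDist` (variable names are illustrative):

```python
from statistics import NormalDist

# Variance of the difference of two independent sample means.
var_diff = 15 ** 2 / 10 + 12 ** 2 / 25

# The unknown common mean cancels, so the difference is centred at 0.
diff = NormalDist(0, var_diff ** 0.5)
prob = 1 - diff.cdf(3)
print(round(prob, 4))
```

The result may differ from a tables-based answer in the fourth decimal place, since tables round the z value before lookup.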
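Question 10. Let $X_1, X_2, \dots, X_5$ be independent N(0, 1) random variables and let
$\bar{X} = \frac{1}{5}\sum_{i=1}^{5} X_i$ and $S^2 = \frac{1}{4}\sum_{i=1}^{5}(X_i - \bar{X})^2$.
Calculate $P\left(\bar{X} > 0 \text{ and } \sum_{i=1}^{5}(X_i - \bar{X})^2 < 9.488\right)$.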
Question 11. Consider a random sample of size 21 taken from a normal distribution with mean
𝜇 = 25 and variance $\sigma^2 = 4$. Let the sample variance be denoted $S^2$. State the
distribution of the statistic $5S^2$ and hence find the variance of the statistic $S^2$.
Question 12. A random sample of size 10 is taken from a normal distribution with mean 𝜇 = 20
and variance $\sigma^2 = 1$.
Find the probability that the sample variance exceeds 1, that is, find $P(S^2 > 1)$.
Question 13. Let X1, X2, …, Xn be a random sample of size n from a population with mean
𝜇 and variance $\sigma^2$.
Let the sample mean be $\bar{X}$ and the sample variance be $S^2 = \frac{1}{n-1}\left\{\sum X_i^2 - n\bar{X}^2\right\}$.
You may assume $E[\bar{X}] = \mu$ and $V[\bar{X}] = \frac{\sigma^2}{n}$. Show that $E[S^2] = \sigma^2$.
Question 14. Consider a random sample, X1,… , Xn, from a normal $N(\mu, \sigma^2)$ distribution, with
sample mean $\bar{X}$ and sample variance $S^2$.
(i) Define carefully what it means to say that X1,…, Xn is a random sample
from a normal distribution.
(ii) State what is known about the distributions of $\bar{X}$ and $S^2$ in this case,
including the dependencies between the two statistics.
(iii) Define the t-distribution and explain its relationship with $\bar{X}$ and $S^2$.
Question 15. Consider a random sample consisting of the random variables X1, X2,..., Xn with
mean µ and variance $\sigma^2$. The variables are independent of each other.
(i) Show that the sample variance, $S^2$, is an unbiased estimator of the true
variance $\sigma^2$.
Now consider in addition that the random sample comes from a normal
distribution in which case it is known that $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$.
(ii) (a) Derive the variance of $S^2$ in terms of σ and n.
(b) Comment on the quality of the estimator $S^2$ with respect to the
sample size n.
Question 16. Regarding the small sample inference concerning the equality of means of two
Normal populations, the basic assumption to be made is
(i) The samples are independent and Normal
(ii) The population variances are equal
(iii) The samples were randomly and independently selected from Normal
populations and variances of the populations are same.
(iv) The samples are drawn from Normal populations and population variances
are not same.
Question 17. A random sample 𝑋! , 𝑋! , … 𝑋!" is drawn from a normal distribution with mean 1
and variance $\sigma^2$. Let $\bar{X}$ and $S^2$ denote the sample mean and variance respectively.
Find the approximate value of $P[(\bar{X} - 1) > S]$ by referring to statistical tables.
Question 18. The following are the summary measures of birth weights (in grams) of babies in
a city
Gender    Mean    Standard Deviation
Male      3000    300
Female    3500    400
Assuming that the birth weights are independently normally distributed for the
two genders, calculate the probability that at birth a boy outweighs a girl.
ANSWERS
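Ans.1. $E(S^2) = \sigma^2$ and $\text{var}(S^2) = \frac{2\sigma^4}{n-1}$
Ans.2. (i) 0.926 (ii) 0.0253 (iii) 0.023
Ans.3. $\frac{\bar{X} - 100}{S/\sqrt{5}} \sim t_4$ and probability = 0.1
Ans.4. (i) $\bar{X}_1 \sim N(\mu_1, \sigma_1^2/n_1)$, $\bar{X}_2 \sim N(\mu_2, \sigma_2^2/n_2)$. $\bar{X}_1 - \bar{X}_2$ is the difference between two independent normal
variables and so is itself normal, with mean $\mu_1 - \mu_2$ and variance $\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$.
(ii) (a) $\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim N(0,1)$ (b) $\chi^2_{n_1+n_2-2}$ (c) $t_{n_1+n_2-2}$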
Ans.5. (i) 0.025 (ii) 0.99 (iii) 0.05 (iv) p = 0.2244

Ans.6. (i)(a) 0.95 (b) 0.025 (ii)(a) p = 1.638 (b) p = 0.278
Ans.7. 𝛼 = 2.3 and 𝛽 = 0.345
Ans.8. Approx 5%
Ans.9. 0.28638
Ans.10. $P(\bar{X} > 0) \times P(\chi^2_4 < 9.488) = 0.5 \times 0.95 = 0.475$
Ans.11. $5S^2 \sim \chi^2_{20}$ and $V[S^2] = 1.6$
Ans.12. 0.437
Ans.14. (i) The random variables X1,… , Xn, are independent and identically distributed with
$X_i \sim N(\mu, \sigma^2)$
(ii) $\bar{X}$ and $S^2$ are independent
(iii) $t_k \equiv N(0,1)/\sqrt{\chi^2_k/k}$, where N(0,1) and $\chi^2_k$ are independent; we get $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$
Ans.15. (ii) (a) $V[S^2] = \frac{2\sigma^4}{n-1}$
(b) Estimator gets better (more accurate) as n increases, as its variance reduces.
(MSE also gets smaller)
Ans.16. (iii)
Ans.17. From actuarial table page 163 the probability is between 0.001 and 0.0005.
Ans.18. 0.15866

ASSIGNMENT - 10
RANDOM NUMBER SIMULATION
Question 1. Generate 3 random variates from an exponential distribution with mean 0.5, using
the 3 random numbers 15/59, 55/59 and 42/59.
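A minimal sketch of the inverse transform method for this question: for an exponential distribution with rate λ, setting u = F(x) = 1 − e^{−λx} and solving gives x = −ln(1 − u)/λ. The function name `exp_variate` is illustrative, and using 1 − u rather than u inside the logarithm is a convention assumed here.

```python
import math

def exp_variate(u, rate):
    # Inverse transform: solve u = 1 - exp(-rate * x) for x.
    return -math.log(1 - u) / rate

rate = 1 / 0.5  # a mean of 0.5 corresponds to rate 2
random_numbers = [15 / 59, 55 / 59, 42 / 59]
variates = [exp_variate(u, rate) for u in random_numbers]
print([round(v, 4) for v in variates])
```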
Question 2. Describe how you would generate random variates from a Binomial (3, 0.6)
distribution X, using a sequence of random numbers $\{u_i\}$.
Question 3. Using the first 5 numbers in the first column of random U (0, 1) numbers given in
the Tables on page 190, generate 5 values from a Poisson (10) distribution.
Question 4. Simulate three random values from an Exp (0.1) distribution using the random
values 0.113, 0.608 and 0.003 from U (0, 1).
Question 5. Generate three random values from a U (-1, 4) using the following random values
from U (0, 1): 0.07 0.628 0.461
Question 6. Simulate two random values from a Poi (2) distribution using the random values
0.721 and 0.128 from U (0, 1).
Question 7. Generate three random values from a Bin (4, 0.6) using the following random
values from U (0,1): 0.588 0.222 0.906
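For a discrete distribution, the inverse transform method returns the smallest value k whose cumulative probability is at least u. The sketch below applies this to Bin(4, 0.6); the function name `binomial_variate` is illustrative, and the same loop works for any discrete distribution if the probability term is swapped.

```python
from math import comb

def binomial_variate(u, n, p):
    # Return the smallest k with F(k) >= u, where F is the Bin(n, p) CDF.
    cumulative = 0.0
    for k in range(n + 1):
        cumulative += comb(n, k) * p ** k * (1 - p) ** (n - k)
        if u <= cumulative:
            return k
    return n  # guards against floating-point rounding when u is close to 1

values = [binomial_variate(u, 4, 0.6) for u in (0.588, 0.222, 0.906)]
print(values)
```

The cumulative probabilities for Bin(4, 0.6) are 0.0256, 0.1792, 0.5248, 0.8704 and 1, which is what each random number is compared against.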
Question 8. A model used for claim amounts (X, in units of £10,000) in certain circumstances
has the following probability density function, f(x), and cumulative distribution
function, F(x) :
$f(x) = \frac{5(10)^5}{(10 + x)^6}$, $x > 0$; $F(x) = 1 - \left(\frac{10}{10 + x}\right)^5$
You are given the information that the distribution of X has mean 2.5 units
(£25,000) and standard deviation 3.23 units (£32,300).
(i) Describe briefly the nature of a model for claim sizes for which the
standard deviation can be greater than the mean.
(ii) (a) Show that we can obtain a simulated observation of X by
calculating:
$x = 10\left[(1 - r)^{-0.2} - 1\right]$
where r is an observation of a random variable which is uniformly
distributed on (0,1).
(b) Explain why we can just as well use the formula:
$x = 10\left(r^{-0.2} - 1\right)$ to obtain a simulated observation of X.
(c) Calculate the missing values for the simulated claim amounts in
the table below (which has been obtained using the method in (ii)(b)
above):
r Claim (£)
0.7423 6,141
0.0291 102,872
0.2770 29,272
0.5895 11,148
0.1131 54,635
0.9897 207
0.6875 7,782
0.8525 3,243
0.0016 ?
0.5154 ?
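The table entries can be reproduced with the formula from part (ii)(b). The sketch below assumes a Pareto model with shape 5 and scale 10 (the values implied by the exponent −0.2 and the factor 10 in that formula) and converts from units of £10,000 to pounds; the function name `pareto_claim` is illustrative.

```python
def pareto_claim(r, alpha=5, scale=10):
    # Inverse transform with r standing in for 1 - r; result in units of 10,000.
    return scale * (r ** (-1 / alpha) - 1)

# First row of the table as a check, then the two missing rows.
for r in (0.7423, 0.0016, 0.5154):
    print(r, round(10_000 * pareto_claim(r)))
```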
Question 9. Consider the following simple model for the number of claims, N, which occur in
a year on a policy:
n          0     1     2     3
P(N = n)   0.55  0.25  0.15  0.05
(a) Explain how you would simulate an observation of N using a number r, an
observation of a random variable which is uniformly distributed on (0,1).
(b) Illustrate your method described in (a) by simulating three observations of
N using the following random numbers between 0 and 1:
0.6221, 0.1472, 0.9862
Question 10. One variable of interest, T, in the description of a physical process can be
modelled as T = XY where X and Y are random variables such that
X ~ N(200, 100) and Y depends on X in such a way that Y|X = x ~ N(x,1).
Simulate two observations of T, using the following pairs of random numbers
(observations of a uniform (0,1) random variable), explaining your method and
calculations clearly:
Random numbers
0.5714, 0.8238
0.3192, 0.6844
Question 11. The random variable X has probability density function:
$f(x) = \frac{2}{x^3}$, $x > 1$ and cumulative distribution function: $F(x) = 1 - \frac{1}{x^2}$
Use the following uniform (0,1) random numbers:

0.5719, 0.8612, 0.3028 to simulate three observations of X, explaining your
method and calculations clearly.
Question 12. (i) Use the following uniform (0,1) random numbers:
0.9236, 0.2578
and a suitable table of probabilities to simulate two observations of the
random variable: X where X ~ N(200, 100).
(ii) Use the following uniform (0,1) random numbers:
0.3287, 0.9142
to simulate two observations of the random variable Y, where Y has an
exponential distribution with mean 100.
Question 13. Let X be a discrete random variable with the following probability distribution:
X: 0 1 2 3
P(X = x): 0.4 0.3 0.2 0.1
(i) Simulate three observations of X using the following three random

numbers from a uniform distribution on (0,1) (you should explain your
method briefly and clearly).
Random numbers: 0.4936, 0.7269, 0.1652
Let X be a random variable with cumulative distribution function:
$F_X(x) = \frac{1}{1 - e^{-1}}\left(1 - e^{-x^2}\right)$, $0 < x < 1$
($F_X(x) = 0$ for x ≤ 0 and $F_X(x) = 1$ for x ≥ 1)
(ii) Derive an expression for a simulated value of X using a random number u
from a uniform distribution on (0,1) and hence simulate an observation of
X using the random number u = 0.8149.
Question 14. Consider a random variable U that has a uniform distribution on [0,1] and let F be
the cumulative distribution function of the standard normal distribution.
Show that the random variable $X = F^{-1}(U)$ has a standard normal distribution.
ANSWERS
Ans.1. 0.1467, 1.3456, 0.6222
Ans.3. 11, 5, 5, 11 and 7
Ans.4. 1.20, 9.36 and 0.03
Ans.5. -0.65, 2.14 and 1.305
Ans.6. 3 and 0
Ans.7. 3, 2 and 4
Ans.8. (i) X takes positive values only so to have such a relatively high standard deviation the
distribution must be positively skewed with sizeable probability associated with high
values (i.e. the model embraces high claim sizes; the density has a long or heavy tail).
(ii) (b) $R \sim U(0,1) \Rightarrow 1 - R \sim U(0,1)$, so (1 − r) is also a random number from (0, 1), so we
can use r in place of 1 − r
(ii) (c) 262,390 and 14,175
Ans.9. (b) 1, 0, 3
Ans.10. 40911, 38236
Ans.11. 1.5284, 2.6842, 1.1976
Ans.12. (i) 214.3, 193.5, (ii) 39.85, 245.57
Ans.13. (i) (1,2,0), (ii) X = 0.851
ASSIGNMENT – 11
BAYESIAN STATISTICS AND CREDIBILITY THEORY
1. The number of e-mail messages received each day by an actuarial student has a
Poisson distribution with mean λ, where from past experience, the prior
distribution of λ is exponential with mean µ.
(i) The student has data x1,..., xn, where xi is the number of messages arriving
on day i, i = 1,2,…..n.
(a) Derive the posterior distribution of λ.
(b) Show that the Bayesian estimate of λ under quadratic loss can be
written in the form of a credibility estimate, and state the credibility
factor.
(c) If µ = 50 and the student receives a total of 550 messages over 10
days, calculate the Bayesian estimate of λ under quadratic loss.
(ii) 60% of messages require an answering time (in minutes) that is
exponentially distributed with mean 1, and the remaining messages
require an answering time that has a Pareto distribution with mean 2 and
variance 12.
Determine the probability that a randomly chosen message requires more
than M minutes answering time. [UK April 2002]
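For part (i), an exponential prior with mean µ is a Gamma(1, 1/µ) distribution, so with Poisson data the posterior is Gamma(1 + Σx, n + 1/µ) and its mean is the estimate under quadratic loss. The sketch below is a numerical check of part (i)(c) under that standard conjugate result; the function names are illustrative.

```python
def posterior_mean(total, n, prior_mean):
    # Posterior is Gamma(shape = 1 + total, rate = n + 1/prior_mean).
    shape = 1 + total
    rate = n + 1 / prior_mean
    return shape / rate

def credibility_factor(n, prior_mean):
    # Z such that posterior mean = Z * sample mean + (1 - Z) * prior mean.
    return n / (n + 1 / prior_mean)

mu, n, total = 50, 10, 550
z = credibility_factor(n, mu)
estimate = posterior_mean(total, n, mu)

# The credibility form reproduces the posterior mean.
blend = z * (total / n) + (1 - z) * mu
print(round(estimate, 2), round(z, 4))
```

With ten days of data the credibility factor is very close to 1, so the estimate sits almost on the sample mean of 55.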
2. Claims on a portfolio of insurance policies are exponentially distributed with

mean 1/λ, where previous experience with similar portfolios suggests that the
prior distribution of λ is gamma with mean 1 and variance ½. Twenty claims are
observed with average value 1.2.
Determine the posterior distribution of λ. [UK Sept 2002]
3. The lengths of time taken to deal with each of n reports are independent
exponentially distributed random variables with mean 1/λ.
Show that the gamma distribution is the conjugate prior for this exponential
distribution. [UK April 2003]
4. In a portfolio of property insurance policies let θ denote the proportion of

policies on which claims are made in the year. The value of θ is unknown and is
assumed to have a Beta prior distribution with parameters α and β. A claims
analyst estimates that the mean and standard deviation of θ are 0.20 and 0.25
respectively.
From a random sample of 50 policies, a claim is made on 24% of them during the
year.
(i) Determine the values of the parameters α and β, of the prior distribution.
(ii) Determine the posterior distribution and hence the posterior mean of θ.
(iii) For the general case where x is the number of claims arising from sample
size n and µ is the mean of the Beta prior distribution, show that the
posterior mean of θ can be expressed as:
Z·(x/n) + (1 − Z)·µ
and express Z as a function of α, β and n.
(iv) (a) Calculate the value of Z for the situation in part (ii) and explain
what it represents.
(b) Without performing any further calculations, explain how you
would expect the value of Z to change if:
(1) The analyst now believes the standard deviation, σ, of the
prior distribution to be 0.50.
(2) The sample size, n, was 400.
(c) State the limiting value of Z as σ and n increase and explain what
this means. [UK Sept 2003]
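Parts (i) to (iv)(a) can be checked numerically by matching the Beta prior to the stated mean and standard deviation and then applying the standard Beta-binomial update. This is a sketch under those assumptions; the helper name `beta_from_moments` is illustrative, and 24% of 50 policies is taken as 12 claims.

```python
def beta_from_moments(mean, sd):
    # For Beta(a, b): mean = a / (a + b), var = mean * (1 - mean) / (a + b + 1).
    total = mean * (1 - mean) / sd ** 2 - 1  # this is a + b
    return mean * total, (1 - mean) * total

prior_mean = 0.20
alpha, beta = beta_from_moments(prior_mean, 0.25)

n, x = 50, 12  # 24% of 50 policies had a claim
post_alpha, post_beta = alpha + x, beta + n - x
post_mean = post_alpha / (post_alpha + post_beta)

# Credibility factor: weight given to the observed proportion x / n.
z = n / (alpha + beta + n)
print(round(alpha, 3), round(beta, 3), round(post_mean, 4), round(z, 4))
```

A larger prior standard deviation makes α + β smaller and a larger n makes the ratio closer to 1, which is the behaviour part (iv)(b) asks about.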
5. A risk consists of 5 policies. On each policy in one month there is exactly one
claim with probability θ and there is negligible probability of more than one
claim in one month. The prior distribution for θ is uniform on (0,1). There are a
total of 10 claims on this risk over a 12-month period.
(i) Derive the posterior distribution for θ.
(ii) Determine the Bayesian estimate of θ under:
(a) quadratic loss
(b) all or nothing loss. [UK April 2004]
6. The number of claims from one group of drivers in a year has a Poisson
distribution with mean λ and the number of claims from a second group of
drivers has a Poisson distribution with mean 2λ. In one year, there are n1 claims
from group 1 and n2 claims from group 2.
(i) Derive the maximum likelihood estimator, λ̂ of λ.

(ii) Suppose that past experience shows that λ has an exponential distribution
with mean 1/v.
(a) Derive the posterior distribution of λ.
(b) Show that the Bayesian estimate of λ under quadratic loss may be
written in the form of a credibility estimate combining the prior
mean of λ with the maximum likelihood estimate λ̂ in (i). State the
credibility factor. [UK Sept 2004]
7. An insurer wishes to estimate the expected number of claims, λ on a particular

type of policy. Prior beliefs about λ are represented by a gamma distribution
with density function:
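$f(\lambda) = \frac{\beta^\alpha}{\Gamma(\alpha)} \lambda^{\alpha-1} e^{-\beta\lambda}$, $\lambda > 0$
For an estimate, d, of λ the loss function is defined as:
$L(\lambda, d) = (\lambda - d)^2 + d^2$
Show that the expected loss is given by:
$E[L(\lambda, d)] = \frac{\alpha(\alpha + 1)}{\beta^2} - \frac{2d\alpha}{\beta} + 2d^2$
and hence determine the optimal estimate for λ under the Bayes rule.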
[UK April 2005]
8. (i) Explain what a conjugate prior distribution is.
(ii) The random variables X1,X2,...,Xn are independent and have density
function: $f(x) = \lambda e^{-\lambda x}$, (x > 0)
Show that the conjugate prior distribution for λ is a gamma distribution.
(iii) (a) The density function of λ is:
$f(\lambda) = \frac{s^\alpha}{\Gamma(\alpha)} \lambda^{\alpha-1} e^{-s\lambda}$ (λ > 0)
Show that E(1/λ) = s/(α − 1)
(b) Hence if X1,X2,....Xn is an independent random sample from an
exponential distribution with parameter λ, show that the posterior
mean of 1/λ can be expressed as a weighted average of the prior
mean of 1/λ and the sample average.
(iv) An insurer is considering introducing a new policy to provide insurance
against the failure of toasters within the first five years of purchase. Alan
and Beatrice are underwriters working for the insurer. Based on his
experience of similar products, Alan believes that toasters last three years
on average. Beatrice believes that six years is the average lifetime. Both are
adamant and are prepared to express their uncertainties about the average
lifetime, in terms of standard deviations of six months and one year
respectively. They decide to resolve their differences by testing a sample
of toasters large enough to ensure the difference in their posterior
expectations for the average lifetime will be less than one year.
Calculate how many toasters they should test, assuming the exponential
distribution is a good model for toaster lifetimes.
You may use the fact that if λ ~ Γ(α, s) then:
$\mathrm{var}(1/\lambda) = \frac{[E(1/\lambda)]^2}{\alpha - 2}$ [UK April 2005]
9. The total amounts claimed each year from a portfolio of insurance policies over n
years were x1, x2, …, xn. The insurer believes that annual claims have a normal
distribution with mean θ and variance $\sigma_1^2$, where θ is unknown. The prior
distribution of θ is assumed to be normal with mean µ and variance $\sigma_2^2$.
(i) Derive the posterior distribution of θ.
(ii) Using the answer in (i), write down the Bayesian point estimate of θ
under quadratic loss.
(iii) Show that the answer in (ii) can be expressed in the form of a credibility
estimate and derive the credibility factor.
The claims experience over five years for two companies was as follows:
Year                  1     2     3     4     5
Company A Amount    421   417   438   456   463
Company B Amount    343   335   356   366   380
(iv) Determine the Bayes credibility estimate of the premiums the insurer
should charge for each company based on the modelling assumptions of
part (i), a profit loading of 25% and the following parameters:
                Company A   Company B
µ               400         300
$\sigma_1^2$    500         350
$\sigma_2^2$    800         600
(v) Comment on the effect on the result of increasing $\sigma_1^2$ and $\sigma_2^2$.
[UK Sept 2005]
10. An insurer has for 2 years insured a number of domestic animals against
veterinary costs. In year 1 there were n1 policies and in year 2 there were n2
policies. The number of claims per policy per year follows a Poisson distribution
with unknown parameter θ.
Individual claim amounts were a constant c in year 1 and a constant c(1 + r) in
year 2. The average total claim amount per policy was $y_1$ in year 1 and $y_2$ in year
2. Prior beliefs about θ follow a gamma distribution with mean α/λ and variance
$\alpha/\lambda^2$. In year 3 there are n3 policies, and individual claim amounts are $c(1 + r)^2$. Let
Y3 be the random variable denoting average total claim amounts per policy in
year 3.
(i) State the distribution of the number of claims on the whole portfolio over
the 2 year period.
(ii) Derive the posterior distribution of θ given $y_1$ and $y_2$.
(iii) Show that the posterior expectation of Y3 given $y_1$, $y_2$ can be written in
the form of a credibility estimate:
$Z \times k + (1 - Z) \times \frac{\alpha}{\lambda} \times c(1 + r)^2$
specifying expressions for k and Z.
(iv) Describe k in words and comment on the impact the values of n1, n2 have
on Z. [UK April 2006]
11. (i) Let p be an unknown parameter, and let f(p|x) denote the probability
density of the posterior distribution of p given information x. Show that
under all-or-nothing loss the Bayes estimate of p is the mode of f(p|x).
(ii) Now suppose p is the proportion of the population carrying a particular
genetic condition. Prior beliefs about p have a U(0,1) distribution. A
sample of size N is taken from the population revealing that m individuals
have the genetic condition.
(a) Suggest why the U(0,1) distribution has been chosen as the prior,
and derive the posterior distribution of p.
(b) Calculate the Bayes estimate of p under all-or-nothing loss.
[UK Sept 2006]
12. The number, X, of claims on a given insurance policy over one year has
probability distribution given by
$P(X = k) = \theta^k (1 - \theta)$, k = 0, 1, 2, …
where θ is an unknown parameter with 0 < θ < 1.
Independent observations x1, ..., xn are available for the number of claims in the
previous n years. Prior beliefs about θ are described by a distribution with
density:
$f(\theta) \propto \theta^{\alpha-1} (1 - \theta)^{\alpha-1}$
for some constant α > 0.
(i) (a) Derive the maximum likelihood estimate, $\hat{\theta}$, of θ given the data
x1, ..., xn.
(b) Derive the posterior distribution of θ given the data x1, ..., xn.
(c) Derive the Bayesian estimate of θ under quadratic loss and show
that it takes the form of a credibility estimate
$Z\hat{\theta} + (1 - Z)\mu$
where µ is a quantity you should specify from the prior distribution
of θ.
(d) Explain what happens to Z as the number of years of observed data
increases.
(ii) (a) Determine the variance of the prior distribution of θ.
(b) Explain the implication for the quality of prior information of
increasing the value of α. Give an interpretation of the prior
distribution in the special case α = 1.
(iii) Calculate the Bayesian estimate of θ under quadratic loss if n = 3, x1 = 3,
x2 = 3, x3 = 5 and
(a) α = 5
(b) α = 2.
Comment on your results in the light of (ii) above. [UK April 2007]
13. The number of claims, N, in a year on a portfolio of insurance policies has a
Poisson distribution with parameter λ. Claims are either large (with probability
p) or small (with probability 1 − p) independently of one another.
Suppose we observe r large claims. Show that the conditional distribution of
N − r | r is Poisson and find its mean. [UK Sept 2007]
14. A claim amount distribution is normal with unknown mean µ and known
standard deviation £50. Based on past experience a suitable prior distribution for µ
is normal with mean £300 and standard deviation £20.
(i) Calculate the prior probability that µ, the mean of the claim amount
distribution, is less than £270.
(ii) A random sample of 10 current claims has a mean of £270.
(a) Determine the posterior distribution of µ.
(b) Calculate the posterior probability that µ is less than £270 and
comment on your answer. [UK April 2008]
15. Claim amounts on a portfolio of insurance policies have an unknown mean µ.
Prior beliefs about µ are described by a distribution with mean $\mu_0$ and variance
$\sigma_0^2$. Data are collected from n claims with mean claim amount $\bar{x}$ and variance $s^2$.
A credibility estimate of µ is to be made, of the form:
$Z\bar{x} + (1 - Z)\mu_0$
Suggestions for the choice of Z are:
!!!! !!!! !!!
A !!!! !! !
B !!!! !!
C !!!!!
Explain whether each suggestion is an appropriate choice for Z. [UK Sept 2008]
16. An insurance company provides warranties for a certain electrical gadget. At the
start of 2006 there were 4,500 gadgets under warranty, each of which has a
probability q of suffering complete failure in 2006 (independently between
gadgets). The prior distribution of q is beta with mean 0.015 and standard
deviation 0.005. Given that 58 gadgets suffer a complete failure in 2006,
determine the posterior distribution of q. [UK Sept 2008]
17. An insurer’s portfolio consists of three independent policies. Each policy can give
rise to at most one claim per month, which occurs with probability θ
independently from month to month. The prior distribution of θ is beta with
parameters α = 2 and β = 4. A total of 9 claims are observed on this portfolio over
a 12 month period.
(i) Derive the posterior distribution of θ.
(ii) Derive the Bayesian estimate of θ under all or nothing loss.
[UK April 2009]
18. A certain proportion p of electrical gadgets produced by a factory is defective.
Prior beliefs about p are represented by a Beta distribution with parameters α
and β. A sample of n gadgets is inspected, and k are found to be defective.
(i) Explain what is meant by a conjugate prior distribution.
(ii) Derive the posterior distribution for beliefs about p.
(iii) Show that if X ~ Beta(α, β) with α > 1 then $E(1/X) = \frac{\alpha + \beta - 1}{\alpha - 1}$.
(iv) It is required to make an estimate d of p. The loss function is given by
$L(d, p) = \frac{(d - p)^2}{p}$
Determine the Bayes estimate d* of p.
(v) Determine a parameter Z such that d* can be written as:
$d^* = Z \times \frac{k}{n} + (1 - Z) \times \frac{1}{\mu}$, where µ is the prior expectation of 1/p.
(vi) Under quadratic loss, the Bayes estimate would have been $\frac{\alpha + k}{\alpha + \beta + n}$.
Comment on the difference in the two Bayes’ estimates in the specific case
where α = β = 3, k = 2 and n = 10. [UK Sept 2009]
19. A coin is biased so that the probability of throwing a head is an unknown

constant p. It is known that p must be either 0.4 or 0.75. Prior beliefs about p are
given by the distribution:
P(p = 0.4) = 0.6 P(p = 0.75) = 0.4
The coin is tossed 6 times and 4 heads are observed. Find the posterior
distribution of p. [UK April 2010]
20. An office worker receives a random number of emails each day. The number of
emails per day follows a Poisson distribution with unknown mean µ. Prior
beliefs about µ are specified by a gamma distribution with mean 50 and standard
deviation 15. The worker receives a total of 630 emails over a period of ten days.
Calculate the Bayesian estimate of µ under all or nothing loss. [UK Sept 2010]
21. Let y1,..., yn be samples from a uniform distribution on the interval [0, θ] where
θ > 0 is an unknown constant. Prior beliefs about θ are given by a distribution
with density
$f(\theta) = \begin{cases} \alpha\beta^{\alpha}\theta^{-(1+\alpha)} & \theta > \beta \\ 0 & \text{otherwise} \end{cases}$
where α and β are positive constants.
(i) Show that the posterior distribution of θ given y1 is of the same form as
the prior distribution, specifying the parameters involved.
(ii) Write down the posterior distribution of θ given y1,..., yn [UK April 2011]
22. An accountant is using a psychic octopus to predict the outcome of tosses of a

fair coin. He claims that the octopus has a probability p > 0.5 of successfully
predicting the outcome of any given coin toss. His actuarial colleague is
extremely sceptical and summarises his prior beliefs about p as follows: there is
an 80% chance that p = 0.5 and a 20% chance that p is uniformly distributed on
the interval [0.5,1]. The octopus successfully predicts the results of 7 out of 8 coin
tosses.
Calculate the posterior probability that p = 0.5. [UK Sept 2011]
23. The total claim amount per annum on a particular insurance policy follows a
normal distribution with unknown mean θ and variance $200^2$. Prior beliefs about
θ are described by a normal distribution with mean 600 and variance $50^2$. Claim
amounts x1, x2, …, xn are observed over n years.
(i) State the posterior distribution of θ.
(ii) Show that the mean of the posterior distribution of θ can be written in the
form of a credibility estimate.
Now suppose that n = 5 and that total claims over the five years were 3,400.
(iii) Calculate the posterior probability that θ is greater than 600. [UK April 2012]
24. A proportion p of packets of a rather dull breakfast cereal contain an exciting toy
(independently from packet to packet). An actuary has been persuaded by his
children to begin buying packets of this cereal. His prior beliefs about p before
opening any packets are given by a uniform distribution on the interval [0,1]. It
turns out the first toy is found in the n1th packet of cereal.
(i) Specify the posterior distribution of p after the first toy is found.
A further toy was found after opening another n2 packets, another toy after
opening another n3 packets and so on until the fifth toy was found after opening
a grand total of n1 + n2 + n3 + n4 + n5 packets.
(ii) Specify the posterior distribution of p after the fifth toy is found.
(iii) Show the Bayes’ estimate of p under quadratic loss is not the same as the
maximum likelihood estimate and comment on this result.
[UK April 2012]
25. The number of claims arising in a year from a group of policies follows a Poisson
distribution with mean µ. The prior distribution for µ is gamma with parameters
α=10 and λ = 2. Given that 8 claims arose in the last year, determine the posterior
distribution for µ.
26. The number of claims per month are independent Poisson random variables with
mean λ, and the prior distribution for λ is exponential with mean 0.2.
(i) Determine the posterior distribution for λ given the observed values
x1, ……., xn of the number of claims in each of n months.
(ii) Determine the Bayesian estimator of λ.
(a) under quadratic loss
(b) under “all-or-nothing” loss
(iii) If n = 5 and $\sum_{i=1}^{5} x_i = 1$, calculate to 2 significant figures the Bayesian
estimate of λ under absolute error loss.
27. The number of claims registered per week has a Poisson distribution for which
the mean λ, is either 1 or 2. The prior distribution for λ is given by:
P(λ=1) = 0.4 P(λ=2) = 0.6
Given that three claims are registered in a particular week, calculate
the Bayesian estimate of λ under squared error loss, and under zero-one loss.
28. The number of claims arising each month from a general insurance portfolio has
a Poisson distribution, with unknown Poisson parameter λ. Claims are
monitored over a period of 50 months, and an average of 210 claims per month is
observed.
(i) It is suggested, based on knowledge gained from similar portfolios,
that a suitable prior distribution for λ has mean 250 and variance 45.
Using the conjugate prior distribution, determine the posterior
distribution of λ and the Bayesian estimate of λ under quadratic loss.
(ii) An alternative suggestion for estimating λ is to use the number of
claims occurring on a single day, which is assumed to have a Poisson
distribution with mean λ/30. It is suggested that the following prior
distribution for λ should be used:
P(λ = 230) = 0.2, P(λ = 250) = 0.5 and P(λ = 270) = 0.3
If 7 claims were recorded on the most recent day for which data are
available, determine the posterior distribution for λ, and hence find the
Bayesian estimate of λ under quadratic loss.
29. An actuary has a tendency to be late for work. If he gets up late then he arrives at
work X minutes late where X is exponentially distributed with mean 15. If he
gets up on time then he arrives at work Y minutes late where Y is uniformly
distributed on [0, 25]. The office manager believes that the actuary gets up late
one third of the time.
Calculate the posterior probability that the actuary did in fact get up late given
that he arrives more than 20 minutes late at work. [UK April 2013]
30. An insurance company has a portfolio of n policies. The probability of a claim in

a given year on each policy is p independently from policy to policy, and the
possibility of more than one claim can be ignored. Prior beliefs about p are
specified by a Beta distribution with parameters α and β. In one year the
insurance company has a total of k claims on the portfolio.
Calculate the posterior estimate of p under all or nothing loss and show that it
can be written in the form of a credibility estimate.
[You may use without proof the fact that the mode of a Beta distribution with
parameters α and β is (α − 1)/(α + β − 2)] [UK Sept 2013]
31. The heights of adult males in a certain population are Normally
distributed with unknown mean µ and standard deviation σ = 15. Prior
beliefs about µ are described by a Normal distribution with mean 187 and
standard deviation 10.
(i) Calculate the prior probability that µ is greater than 180.
A sample of 80 men is taken and the mean height is found to be 182.

(ii) Calculate the posterior probability that µ is greater than 180.
[UK April 2014]
32. Let θ denote the proportion of insurance policies in a certain portfolio on which a
claim is made. Prior beliefs about θ are described by a Beta distribution with
parameters α and β.
Underwriters are able to estimate the mean µ and variance $\sigma^2$ of θ.
(i) Express α and β in terms of µ and σ.
A random sample of n policies is taken and it is observed that claims have arisen
on d of them.
(ii) (a) Determine the posterior distribution of θ.
(b) Show that the mean of the posterior distribution can be written in
the form of a credibility estimate.
(iii) Show that the credibility factor increases as σ increases. [UK April 2014]
33. A direct insurance sales office of a motor insurance company receives a random
number of enquiry calls every day. The number of calls each day follows a
Poisson distribution with unknown mean β.
Prior beliefs about β are specified by a Gamma distribution with mean of 200 and
standard deviation of 50. The sales team has received 240 calls daily on average
recently. Calculate the Bayesian estimate of β under Quadratic loss.
[India May 2014]
34. There is a group of m independent ‘mediclaim’ policies, which have been on the
books of an insurer for a long time. Under each policy, at most one claim is
possible in any month as per the contract. The probability of a claim in a month
for each policy is p (0 < p < 1). The total monthly numbers of claims from the
group of m policies are x1, x2, …, xn in the past n months. The prior
distribution of p is given by the density function $f(p) \propto \{p(1-p)\}^a$ where a > −1.
(i) Derive the posterior distribution of p given x1, x2, ….., xn.
(ii) Derive the maximum likelihood estimate of p.
(iii) Derive the Bayesian estimate of p under quadratic loss and show it takes
the form of a credibility estimate $Z\hat{p} + (1 - Z)k$, where $\hat{p}$ is the maximum
likelihood estimate from (ii) and k is a scalar (which you should specify) in
terms of the prior distribution of p.
(iv) Explain what happens to Z when n increases gradually.
(v) Calculate the Bayesian estimates of p and Z if m = 100, n = 12 and
x1+ x2 + ….. + x12 = 15 when a = 0 and a = 3.
(vi) Considering the prior variance, comment on effect on Z of increasing a
and also relate this effect to the quality of prior information of p in each
case. [India Sept 2013]
35. Let p be an unknown parameter and let f(p|x) be the probability density of the
posterior distribution of p given information x.
(i) Show that under all-or-nothing loss the Bayes estimate of p is the mode of
f(p|x).
John is setting up an insurance company to insure luxury yachts. In year 1 he will
insure 100 yachts and in year 2 he will insure 100 + g yachts where g is an integer.
If there is a claim the insurance company pays a fixed sum of $1m per claim.
The probability of a claim on a policy in a given year is p. You may assume that
the probability of more than one claim on a policy in any given year is zero. Prior
beliefs about p are described by a Beta distribution with parameters α = 2 and
β = 8.
In year 1 total claims are $13m and in year 2 they are $20m.
(ii) Derive the posterior distribution of p in terms of g.
(iii) Show that it is not possible in this case for the Bayes estimate of p to be the
same under quadratic loss and all-or-nothing loss. [UK April 2015]
36. A small island is holding a vote on independence. Two recent survey results are
shown below:
Poll   Sample size   Support for independence
A      10            5
B      20            11
You should assume that the samples are independent.
A politician is using a suitable uniform distribution as the prior distribution in
order to estimate the proportion θ in favour of independence.
(i) Calculate an estimate of θ under the quadratic loss function.
A rival politician decides to use instead a beta distribution as the prior, with
parameters α and β, where α = β.
(ii) Determine the new estimate of θ under the “all-or-nothing” loss function
in terms of α. [UK Sept 2015]
37. Claims X each year from a portfolio of insurance policies are normally distributed
with mean θ and variance $\tau^2$. Prior information is that θ is normally distributed
with known mean µ and known variance $\sigma^2$.
Aggregate claims over the last n years have been xi for i = 1 to n, and you should
assume that these are independent.
(ii) Write down the Bayesian estimate of θ under quadratic loss.
(iii) Show that the estimate in your answer to part (ii) can be expressed in the
form of a credibility estimate, including statement of the credibility factor
Z. [UK Sept 2015]
38. A child playing a game believes that a six sided die is unfair, and that he has a
probability p > 1/6 of predicting the outcome of any given throw. His mother is
less sure, and her prior beliefs about p are as follows:
• a 1/3 chance that p = 2/6 and
• a 2/3 chance that p = 1/6
The child accurately predicts the results of 4 out of 10 dice throws. Calculate the
posterior probability that p = 1/6. [UK April 2016]
39. A breathalyser used by the police in a certain town incorrectly gives a positive
reading for drivers who are not over the legal limit one time in 20 and an
incorrect negative reading for drivers who are over the limit one time in 5. If one
driver in 10 is actually over the limit on a particular day, what is the probability
that a driver who fails the breathalyser test is in fact over the legal limit (which
will be checked using a blood test at the police station)?
40. A single observation, x, is drawn from a distribution with the probability density
function:
$f(x \mid \theta) = 1/\theta$, 0 < x < θ
The prior distribution of θ is given by: $g(\theta) = \theta e^{-\theta}$, θ > 0
Derive an expression in terms of x for the Bayes estimator of θ with respect to the
absolute error loss function.
41. The number of claims in a week that arises from a certain group of insurance
policies has a Poi(µ) distribution. In the last 2 weeks, the numbers of claims
incurred were 7 and 11, respectively.
(i) Derive the posterior distribution given that the prior distribution for µ is:
(a) gamma with parameters α = 7 and λ = 0.5
(b) uniform on the integers 8, 10 and 12.
(ii) Hence for each case in part (i) obtain the Bayesian estimate of µ using a:
(a) quadratic loss function
(b) absolute loss function
(c) zero-one loss function.
42. At the end of last year, a new laptop manufacturer approached an insurance
company for providing insurance for laptops sold during the first year of its
operation.
Based on existing insurance contracts in its portfolio, the insurer estimated the
probability of failure over the coming year for each laptop to be “p”. After a year
the insurer observes that 92 laptops suffer from complete failure during the year
of insurance out of 9000 laptops insured.
Assuming that the prior distribution of p is beta with mean 0.013 and standard
deviation 0.004 find out the posterior distribution of p. [India April 2016]
43. The number of claims per policy per year follows a Poisson distribution with
unknown parameter µ. Prior beliefs are that µ follows a Gamma distribution with
parameters α and λ. The number of policies sold, individual claim amounts and
average total claim amount per policy are summarized in the table given below.
Item                                     Year 1   Year 2   Year 3   Year 4
No. of policies                          100      130      180      250
Individual claim amounts                 1,500    1,700    2,000    2,500
Average total claim amount per policy    a1       a2       a3       A4
(i) Using the posterior distribution of µ, derive E(A4 | a1, a2, a3) in a credibility
estimate form as given below:
$2500 \times \frac{\alpha}{\lambda} \times (1 - Z) + b \times Z$, specifying b and Z.
(ii) Comment on Z. [India Nov 2015]
44. Profit Insurance Company sold 3500 policies under a particular category of
business last year. The policies are assumed to be independent and at most one
claim can be made on any policy. The probability of making a claim q is the same
for all policies.
The total number of claims in the previous year was found to be p.
The prior distribution of q is Beta (α, β)
(i) Find maximum likelihood estimate of q and posterior distribution of q
given the past year data.
(ii) Find the Bayesian estimate of q under the quadratic loss function.
If p = 500, α = 1 and β = 4, find the value of the Bayesian
estimate.
(iii) Can the Bayesian estimate of q be written in the form of a credibility
estimate? If yes, express the same in the form of a credibility estimate and
compute the credibility factor for the above values mentioned in part (ii).
[India May 2015]
ANSWERS
!
1. (i) (a) Gamma(1+ 𝑋! , 𝑛 + !)
!
(b) Z = !
!!
!
(c) 𝜆 = 54.99
!
(ii) (a) P(T>M) = 𝑒 !! ×0.6 + (!!!)! ×0.4
2. Gamma(22,26)
4. (i) α = 0.312 and β = 1.248
(ii) θ|X ~ Beta(12.312, 39.248) and Mean = 0.23879
(iv) (a) $Z = \frac{n}{n + \alpha + \beta} = \frac{50}{50 + 0.312 + 1.248} = 0.9697$
(b) (1) Z should increase (2) Z should increase
(c) The limiting value of Z = 1. This would mean that we would just use
the sample data and ignore the prior.
5. (i) Beta(11,51) (ii) (a) 0.177 (b) 0.167
6. (i) $\hat{\lambda} = \frac{n_1 + n_2}{3}$
(ii) (a) Gamma(n1 + n2 + 1, 3 + v) (b) $Z = \frac{3}{3 + v}$
7. $d = \frac{\alpha}{2\beta}$
8. (iv) n > 74
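The bound in 8(iv) can be checked with a short numerical sketch. It assumes, as in the question, that the failure rate λ has a Gamma(α, s) prior with E(1/λ) = s/(α − 1) and var(1/λ) = [E(1/λ)]²/(α − 2), so that after n observed lifetimes with total T the posterior is Gamma(α + n, s + T). The function names are illustrative only and are not part of the original solution.

```python
def gamma_prior(mean_life, sd_life):
    """Gamma(alpha, s) prior on the rate matching a prior mean and sd of 1/lambda.

    Uses E(1/lambda) = s/(alpha - 1) and var(1/lambda) = E(1/lambda)^2 / (alpha - 2).
    """
    alpha = (mean_life / sd_life) ** 2 + 2
    s = mean_life * (alpha - 1)
    return alpha, s


def posterior_mean_life(alpha, s, n, total_life):
    """Posterior mean of 1/lambda after n exponential lifetimes summing to total_life."""
    return (s + total_life) / (alpha + n - 1)


alan = gamma_prior(3.0, 0.5)      # prior mean 3 years, sd 6 months
beatrice = gamma_prior(6.0, 1.0)  # prior mean 6 years, sd 1 year

# Both priors turn out to have the same alpha, so the gap between the two
# posterior means does not depend on the observed total lifetime; 0.0 is used
# here purely as a placeholder.
n = 0
while posterior_mean_life(*beatrice, n, 0.0) - posterior_mean_life(*alan, n, 0.0) >= 1:
    n += 1

print(n)  # smallest sample size giving a gap strictly below one year
```

Under these assumptions both priors have α = 38, the gap is 111/(37 + n), and the loop stops at n = 75, in line with n > 74.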
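9. (i) $\theta \mid x \sim N\left( \frac{n\bar{x}/\sigma_1^2 + \mu/\sigma_2^2}{n/\sigma_1^2 + 1/\sigma_2^2}, \; \frac{1}{n/\sigma_1^2 + 1/\sigma_2^2} \right)$
(ii) $\hat{\theta} = \frac{n\bar{x}/\sigma_1^2 + \mu/\sigma_2^2}{n/\sigma_1^2 + 1/\sigma_2^2}$
(iii) $Z = \frac{n/\sigma_1^2}{n/\sigma_1^2 + 1/\sigma_2^2}$
(iv) Company A: 543.33; Company B: 437.69
(v) As $\sigma_1^2$ increases, Z decreases and as $\sigma_2^2$ increases, Z increases.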
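A minimal sketch of the premium calculation in 9(iv), using the normal/normal credibility factor Z = (n/σ₁²)/(n/σ₁² + 1/σ₂²). It assumes that the 25% profit loading is applied by multiplying the credibility estimate by 1.25; the function and variable names are illustrative.

```python
def normal_credibility(claims, prior_mean, var_data, var_prior):
    """Return (posterior mean of theta, credibility factor Z) for the normal/normal model."""
    n = len(claims)
    sample_mean = sum(claims) / n
    z = (n / var_data) / (n / var_data + 1 / var_prior)
    return z * sample_mean + (1 - z) * prior_mean, z


claims_a = [421, 417, 438, 456, 463]
claims_b = [343, 335, 356, 366, 380]

estimate_a, z_a = normal_credibility(claims_a, prior_mean=400, var_data=500, var_prior=800)
estimate_b, z_b = normal_credibility(claims_b, prior_mean=300, var_data=350, var_prior=600)

# Assumed interpretation of the profit loading: premium = 1.25 x credibility estimate.
premium_a = 1.25 * estimate_a
premium_b = 1.25 * estimate_b
print(round(premium_a, 2), round(premium_b, 2))
```

With these inputs the sketch gives about 543.33 for Company A and 437.69 for Company B.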
10. (i) Poi((n1 + n2)θ) (ii) $\mathrm{Gamma}\left(\alpha + \frac{n_1 y_1}{c} + \frac{n_2 y_2}{c(1+r)}, \; \lambda + n_1 + n_2\right)$
(iii) $Z = \frac{n_1 + n_2}{\lambda + n_1 + n_2}$ and $k = \frac{n_1 y_1 (1+r)^2 + n_2 y_2 (1+r)}{n_1 + n_2}$
(iv) k is effectively a weighted average of the inflation adjusted average claim
amounts for the previous 2 years, weighted by the number of policies in force. As
the number of policies in force increases, Z becomes closer to 1, and so more
weight is placed on the actual experience, and less on the prior expectations.
11. (ii) (a) Using U(0, 1) as the prior for p suggests that no prior information or
beliefs about p have been formed — it is equally likely to lie anywhere in the
range [0, 1]. So posterior beliefs about p have a Beta distribution with parameters
m + 1 and N − m + 1.
(b) p = m/N
12. (i) (a) $\hat{\theta} = \frac{x_1 + \cdots + x_n}{n + x_1 + \cdots + x_n}$
(b) The posterior distribution is given by a beta distribution with parameters
α + x1 + … + xn and α + n.
(c) $\theta^* = \frac{\alpha + \sum x_i}{2\alpha + \sum x_i + n}$, $Z = \frac{\sum x_i + n}{2\alpha + \sum x_i + n}$ and
µ = 0.5 is the prior mean of θ.
(d) As n increases, Z tends towards 1, and the Bayes estimate approaches the
maximum likelihood estimate, as more credibility is put on the data, and less
on the prior estimate.
(ii)(a) $\mathrm{Var} = \frac{1}{4(2\alpha + 1)}$
(b) Higher values of α result in a lower variance and hence imply greater
certainty over the prior value of θ. In the special case where α =1 the prior
distribution is Uniform on [0,1] implying that we have no particular reason to
believe that any prior value of θ is more or less likely than any other.
(iii) (a) 𝜃 ∗ = 0.6667 (b) 𝜃 ∗ = 0.7222
The first set of parameters has greater certainty attached to the prior estimate (i.e.
a higher value of α ), and therefore the posterior estimate is closer to the mean of
the prior distribution (which is 0.5) than in the second case.
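The figures in 12(iii) can be reproduced with exact fractions. This is a sketch under the model stated in the question (P(X = k) = θ^k(1 − θ) with a Beta(α, α) prior, assuming integer α); the function name is illustrative.

```python
from fractions import Fraction


def geometric_beta_estimate(xs, alpha):
    """Bayes estimate of theta under quadratic loss, with its credibility factor.

    The posterior is Beta(alpha + sum(xs), alpha + n), so the estimate is its mean.
    """
    n, total = len(xs), sum(xs)
    estimate = Fraction(alpha + total, 2 * alpha + total + n)
    z = Fraction(total + n, 2 * alpha + total + n)
    mle = Fraction(total, total + n)
    # Credibility form: Z * MLE + (1 - Z) * prior mean, with prior mean 1/2.
    assert estimate == z * mle + (1 - z) * Fraction(1, 2)
    return estimate, z


data = [3, 3, 5]
for alpha in (5, 2):
    estimate, z = geometric_beta_estimate(data, alpha)
    print(alpha, estimate, float(estimate), z)
```

For α = 5 the estimate is 2/3 with Z = 7/12, and for α = 2 it is 13/18 with Z = 7/9, so the larger α pulls the estimate further towards the prior mean of 0.5.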
14. (i) 0.06681 (ii) (a) $N(281.54, 12.40^2)$ (b) 0.1759

15. A This is an appropriate choice – the larger the value of n, the closer Z is to 1 and
the more weight is placed on the data. Furthermore, high values of the variance
of the prior (indicating uncertainty in the prior) lead to higher values of Z and so
more weight on the sample data. Finally, high variance in the sample reduces the
value of Z and places more reliance on the prior.
B This is not appropriate – the value is a constant independent of the size of the
sample, whereas we would expect more weight on the sample the larger the
sample.
C This is not appropriate – the greater the value of n, the lower the value of Z
and the less weight is put on the sample. This is the reverse of what we would
expect.
16. Beta (66.85, 5023.15)

17. (i) Beta (11, 31) (ii) 0.25
18. (i) A distribution is a conjugate prior for an unknown parameter if when used as
a prior distribution for that parameter it leads to a posterior
distribution which is from the same family.
(ii) Beta (𝛼 + 𝑘, 𝛽 + 𝑛 − 𝑘)
(iv) $d^* = \frac{\alpha + k - 1}{\alpha + \beta + n - 1}$
(v) $Z = \frac{n}{\alpha + \beta + n - 1}$
(vi) Using the given loss function the estimate is 0.2667 and using quadratic loss,
we have 0.3125. The mean of the prior is 0.5 and the observed sample mean is 0.2.
The loss function in (iv) penalises mis-estimates particularly when the true value
of p is lower. This means that the estimate in (iv) is lower than would result from
straight quadratic loss.
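The two estimates compared above for α = β = 3, k = 2 and n = 10 can be checked with the sketch below. It assumes the Beta(α + k, β + n − k) posterior and the result E(1/X) = (a + b − 1)/(a − 1) for X ~ Beta(a, b); minimising the expected value of (d − p)²/p then gives d* = 1/E(1/p | data). The function name is illustrative.

```python
from fractions import Fraction


def beta_binomial_estimates(alpha, beta, k, n):
    """Bayes estimates of p under the loss (d - p)^2 / p and under quadratic loss."""
    a, b = alpha + k, beta + n - k            # posterior Beta(a, b)
    d_star = Fraction(a - 1, a + b - 1)       # 1 / E(1/p | data)
    d_quadratic = Fraction(a, a + b)          # posterior mean
    return d_star, d_quadratic


d_star, d_quadratic = beta_binomial_estimates(alpha=3, beta=3, k=2, n=10)
print(d_star, float(d_star))            # 4/15
print(d_quadratic, float(d_quadratic))  # 5/16
```

The weighted loss gives 4/15 (about 0.2667), below the quadratic-loss value of 5/16 = 0.3125.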
19. The posterior distribution of p is given by P(p = 0.4) = 0.411436 and P(p = 0.75) = 0.588564
20. 62.62
21. (ii) The posterior distribution has the same form with parameters α + n and
max(β, y1, …, yn).
22. 0.364557
23. (iii) 0.669
24. (i) Beta (2, n1) (ii) Beta (6, n1+n2+…+n5 - 4)
(iii) Under squared error loss the Bayes estimate is given by the mean of the
posterior distribution which in this case is $\hat{p} = \frac{6}{n_1 + \cdots + n_5 + 2}$
and the maximum likelihood estimate is $\hat{p} = \frac{5}{n_1 + \cdots + n_5}$.
So the two estimates are not the same. This is perhaps a little surprising given
that we started with an uninformative prior, but arises because the estimates are
calculated in two different ways – i.e. one maximises the likelihood and the other
minimises the expected squared error. If we wanted the two to be the same we
should use an “all-or-nothing” loss function.
25. Gamma (18, 3)
26. (i) $\mathrm{Gamma}(1 + \sum x_i, \; n + 5)$ (ii) (a) $\frac{1 + \sum x_i}{n + 5}$ (b) $\frac{\sum x_i}{n + 5}$ (iii) 0.17
27. 1.815 under squared error loss; 2 under zero-one loss
28. (i) Gamma (11888.89, 55.5556) and Bayesian estimate is 214
(ii)
λ              230       250       270
P(λ | X = 7)   0.22145   0.50954   0.26901
Bayesian estimate is 250.951

(iii) We would expect both procedures to reduce the prior estimates of 250 and
252, respectively, downwards towards 210. However, the extent to which they do
this differs substantially. In the first case the new estimate is almost equal to 210,
i.e. the sample data have been very important in determining the new estimate. In
the second case the new estimate is still very close to 252, i.e. the sample data have
been given very little weighting in the procedure. This seems sensible to the
extent that in the first case the sample data are based on a period of 50 months,
whereas in the second case the sample data have been based only on a single
day. So the sample data in the first case are probably more reliable. The first
procedure is preferable, other things being equal (e.g. cost of obtaining the data,
etc).
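Both procedures in question 28 can be reproduced numerically. The sketch below assumes a gamma prior matched to the stated mean and variance for part (i), and for part (ii) a Poisson likelihood with mean λ/30 for the single day's count; the function names are illustrative.

```python
import math


def gamma_posterior_mean(prior_mean, prior_var, n_periods, total_claims):
    """Posterior mean of a Poisson rate under a gamma prior with the given mean and variance."""
    rate = prior_mean / prior_var
    shape = prior_mean * rate
    return (shape + total_claims) / (rate + n_periods)


def discrete_posterior(prior, observed, exposure):
    """Posterior over a finite set of Poisson rates after one count with mean lam * exposure."""
    # The factorial in the Poisson probability is common to all terms and cancels.
    weights = {lam: p * math.exp(-lam * exposure) * (lam * exposure) ** observed
               for lam, p in prior.items()}
    total = sum(weights.values())
    return {lam: w / total for lam, w in weights.items()}


gamma_estimate = gamma_posterior_mean(250, 45, n_periods=50, total_claims=50 * 210)

prior = {230: 0.2, 250: 0.5, 270: 0.3}
posterior = discrete_posterior(prior, observed=7, exposure=1 / 30)
discrete_estimate = sum(lam * p for lam, p in posterior.items())

print(round(gamma_estimate, 3))
print({lam: round(p, 5) for lam, p in posterior.items()}, round(discrete_estimate, 3))
```

The first estimate moves almost all the way from 250 to the observed 210, while the second barely moves from the prior mean of 252, which is the contrast described in the commentary.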
29. 0.3972
30. $\hat{p} = \frac{\alpha + k - 1}{\alpha + \beta + n - 2}$ and $Z = \frac{n}{\alpha + \beta + n - 2}$
31. (i) 0.75804 (ii) 0.90180
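The two probabilities in answer 31 can be checked with the standard library's `statistics.NormalDist`. This is a sketch of the usual normal/normal update with known data standard deviation; the helper name is illustrative.

```python
from statistics import NormalDist


def normal_posterior(prior_mean, prior_sd, data_sd, n, sample_mean):
    """Posterior of a normal mean with known data sd and a normal prior."""
    precision = n / data_sd**2 + 1 / prior_sd**2
    mean = (n * sample_mean / data_sd**2 + prior_mean / prior_sd**2) / precision
    return NormalDist(mean, precision**-0.5)


prior = NormalDist(187, 10)
posterior = normal_posterior(187, 10, data_sd=15, n=80, sample_mean=182)

prior_prob = 1 - prior.cdf(180)
posterior_prob = 1 - posterior.cdf(180)
print(round(prior_prob, 5), round(posterior_prob, 4))
```

The results agree with 0.75804 and 0.90180 to about four decimal places; small differences can arise from rounding when standard normal tables are used.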

32. (i) $\alpha = \frac{\mu^2(1 - \mu) - \mu\sigma^2}{\sigma^2}$, $\beta = \frac{(\mu(1 - \mu) - \sigma^2)(1 - \mu)}{\sigma^2}$
(ii)(a) Beta(α + d, β + n − d)
33. 237.03
34. (i) $\mathrm{Beta}(\sum x_i + b, \; mn + b - \sum x_i)$, where b = a + 1 (ii) $\hat{p} = \frac{\sum x_i}{mn}$
(iii) Bayesian estimate = $\frac{\sum x_i + b}{2b + mn}$, k = 0.5 and $Z = \frac{mn}{2b + mn}$
(iv) When n increases, Z increases and for very large values of n, for a given b, Z
tends to 1. It means for a given b, as the size of past observations increases, more
and more weight is assigned to the M.L.E. of p and less weight is assigned to prior
estimates of p.
(v) When a = 0, b = 1 and so p* = 16/1202 = 8/601, Z = 1200/1202.
When a = 3, b = 4 and so p* = 19/1208 ≈ 0.0157, Z = 1200/1208.
(vi) When a = 0, b = 1 and so prior variance = 1/(2 × 2 × 3) = 1/12.
When a = 3, b = 4 and so prior variance = (4 × 4)/(8 × 8 × 9) = 1/36.
So, as a increases, the prior variance of p decreases. Although the prior mean remains
0.5, with a higher value of a we are more confident that p is close to 0.5.
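The values for a = 0 and a = 3 in answer 34 can be verified with exact fractions. The sketch assumes the prior {p(1 − p)}^a is a Beta(b, b) density with b = a + 1 and that the data are mn Bernoulli trials with the stated total number of claims; the function name is illustrative.

```python
from fractions import Fraction


def mediclaim_credibility(a, m, n, total_claims):
    """Bayes estimate of p, credibility factor Z and prior variance for a Beta(b, b) prior."""
    b = a + 1
    trials = m * n
    estimate = Fraction(total_claims + b, trials + 2 * b)   # posterior mean
    z = Fraction(trials, trials + 2 * b)
    prior_variance = Fraction(1, 4 * (2 * b + 1))
    return estimate, z, prior_variance


for a in (0, 3):
    estimate, z, prior_variance = mediclaim_credibility(a, m=100, n=12, total_claims=15)
    print(a, estimate, float(estimate), z, prior_variance)
```

For a = 0 this gives 8/601 with Z = 600/601 and prior variance 1/12; for a = 3 it gives 19/1208 (about 0.0157) with Z = 150/151 and prior variance 1/36.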
35. (ii) Beta(35, 175+g)
36. (i) 0.53125 (ii) $\frac{15 + \alpha}{28 + 2\alpha}$
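37. (i) $\theta \mid x \sim N\left( \frac{n\bar{x}/\tau^2 + \mu/\sigma^2}{n/\tau^2 + 1/\sigma^2}, \; \frac{1}{n/\tau^2 + 1/\sigma^2} \right)$
(ii) $\hat{\theta} = \frac{n\bar{x}/\tau^2 + \mu/\sigma^2}{n/\tau^2 + 1/\sigma^2}$ (iii) $Z = \frac{n/\tau^2}{n/\tau^2 + 1/\sigma^2}$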
38. 0.323
39. 0.64
40. The Bayes estimator of θ with respect to absolute error loss is x + log 2
41. (i) (a) Gamma (25, 2.5) (b)
µ           8       10       12
Post prob   0.398   0.4047   0.1973
(ii) (a) 10 and 9.599 (b) 9.866 and 10 (c) 9.6 and 10
42. Beta(102.41, 9698.52)
43. (ii) If the number of policies (i.e. the past experience) increases then Z becomes
closer to 1 and hence more weight is placed on actual experience and less on the
prior expectations.
44. (i) MLE, $\hat{q} = \frac{p}{3500}$ and posterior distribution: Beta(p + α, β − p + 3500)
(ii) 0.1429
(iii) $Z \frac{p}{3500} + (1 - Z) \frac{\alpha}{\alpha + \beta}$ where $Z = \frac{3500}{\alpha + \beta + 3500} = 0.998573$
ASSIGNMENT – 12
GENERALISED LINEAR MODELS (G.L.M.)
1. Let Yij be the number of accidents on a particular motorway in the jth quarter of
year i, i = 1, 2, 3, j = 1,…..4. Suppose that Yij has a Poisson distribution with
mean µij.
(i) (a) Derive the log-likelihood function as a function of µij and determine
the maximum likelihood estimate of µij.
(b) If log µij = µ, determine the maximum likelihood estimate of µ.
(c) Define the scaled deviance and derive an expression for the scaled
deviance for the model in (i)(b).
(ii) Three models are shown below.
                                    Deviance    Degrees of freedom
Model 1    log µij = µ              266.35      11
Model 2    log µij = αi             202.19      9
Model 3    log µij = αi + βj        10.68       6
(a) Interpret each of these models.
(b) Determine which model you would recommend, giving your
reasons.
(iii) It is found that the model log µij = α×i + βj provides a reasonable fit to the
data, with the estimate of α given as 0.34. Interpret this model.
[UK April 2004]
2. The preparation times for coffee in a high-street coffee shop have density
$f(y) = \frac{4}{\mu^{2}}\, y\, e^{-2y/\mu}$
(i) Show that this can be written in exponential family form, and determine
the natural parameter.
(ii) Interpret the two models:
Model I : $\frac{1}{\mu_i} = \alpha_i$ for $i = 1, 2, 3$
Model II : $\frac{1}{\mu_i} = \alpha$ for $i = 1$, and $\frac{1}{\mu_i} = \alpha + \beta$ for $i = 2, 3$
where i = 1, 2, 3 corresponds to filter coffee, cappuccino and espresso
respectively. [UK Sept 2004]
3. Y1, Y2, …, Yn are independent claims, which are assumed to be exponentially
distributed, with E(Yi) = µi.
(i) Show that the canonical link function is the inverse link function.
(ii) It is decided that the canonical link should not be used, but that the mean
claim sizes should be modeled as follows:
$\log \mu_i = \alpha$ for $i = 1, 2, \ldots, m$, and $\log \mu_i = \beta$ for $i = m+1, m+2, \ldots, n$
(a) Show that the log-likelihood can be written as:
$-\left[ m\alpha + (n-m)\beta + e^{-\alpha} \sum_{i=1}^{m} y_i + e^{-\beta} \sum_{i=m+1}^{n} y_i \right]$
(b) Derive the maximum likelihood estimators of α and β.
(c) Show that the scaled deviance for this model is
$2\left[ \sum_{i=1}^{m} \log\left( \frac{\frac{1}{m}\sum_{k=1}^{m} y_k}{y_i} \right) + \sum_{i=m+1}^{n} \log\left( \frac{\frac{1}{n-m}\sum_{k=m+1}^{n} y_k}{y_i} \right) \right]$
(iii) For a particular data set, m = 20, n = 44,
$\frac{1}{20}\sum_{i=1}^{20} y_i = 14.2$ and $\frac{1}{24}\sum_{i=21}^{44} y_i = 18.7$.
Calculate the deviance residual for y1 = 7. [UK April 2005]
4. Y1, Y2, …, Yn are independent random variables, and Yi ~ Poisson with mean µi.
The fitted values for a particular model are denoted by $\hat{\mu}_i$. Derive the form of
the scaled deviance. [UK Sept 2005]
5. An insurance company has a set of n risks (i = 1, 2, …, n) for which it has
recorded the number of claims per month, Yij, for m months (j = 1, 2, …, m).
It is assumed that the number of claims for each risk, for each month, are
independent Poisson random variables with E[Yij] = µij.
These random variables are modelled using a generalized linear model, with
log µij = βi (i = 1, 2, …, n)
(i) Derive the maximum likelihood estimator of βi.
(ii) Show that the deviance for this model is:
$2 \sum_{i=1}^{n} \sum_{j=1}^{m} \left[ y_{ij} \log\left( \frac{y_{ij}}{\bar{y}_i} \right) - (y_{ij} - \bar{y}_i) \right]$ where $\bar{y}_i = \frac{1}{m} \sum_{j=1}^{m} y_{ij}$
(iii) A company has data for each month over a 2 year period. For one risk, the
average number of claims per month was 17.45. In the most recent month
of this risk, there were 9 claims. Calculate the contribution that this
observation makes to the deviance. [UK April 2006]
6. The random variable W has a binomial distribution such that:
$P(W = w) = \binom{n}{w} \mu^{w} (1-\mu)^{n-w}$, $w = 0, 1, 2, \ldots, n$
Let $Y = \frac{W}{n}$.
(i) Write down the expression for $P(Y = y)$, for $y = 0, \frac{1}{n}, \frac{2}{n}, \ldots, 1$.
(ii) Express the distribution of Y as a member of the exponential family and
identify the natural parameter and the dispersion parameter.
(iii) Derive an expression for the variance function.
(iv) For a set of n independent observations of Y, derive an expression for the
scaled deviance. [UK Sept 2006]
7. (i) The gamma distribution with mean µ and variance µ²/α has density
function:
$f(y) = \frac{\alpha^{\alpha}}{\mu^{\alpha}\,\Gamma(\alpha)}\, y^{\alpha-1} e^{-\alpha y/\mu}$, $y > 0$
(a) Show that this may be written in the form of an exponential family.
(b) Use the properties of exponential families to confirm that the mean and
variance of the distribution are µ and µ²/α.
(ii) Explain the difference between a continuous covariate and a factor.
(iii) A company is analyzing its claims data on a portfolio of motor
policies and uses a gamma distribution to model the claim severities.
The company uses three rating factors:
• Policyholder age (as a continuous variable)
• Policyholder gender
• Vehicle rating group (as a factor)
(a) Write down the form of the linear predictor when all rating factors
are included as main effects.
(b) State how the linear predictor changes if an interaction between
policyholder age and gender is included. [UK April 2007]
8. y1, y2, …, yn are independent, identically distributed observations with
probability function given by $f(y_i \mid \mu) = \frac{e^{-\mu} \mu^{y_i}}{y_i!}$
(i) Show that the log-likelihood may be written as
$\theta \sum_{i=1}^{n} y_i - n\, b(\theta) + \text{terms not depending on } \theta$
and identify the natural parameter, θ, and the function b(θ).
(ii) The fitted value for observation yi is denoted by $\hat{y}_i$.
(a) Write down the Pearson residual for yi, in terms of yi and $\hat{y}_i$.
(b) Explain why Pearson residuals are usually not suitable for model
checking for the Poisson distribution.
(iii) Show that the conjugate prior density function for θ is proportional to
$\exp\left( \alpha\theta - \beta e^{\theta} \right)$, and derive the posterior distribution for this prior.
(iv) Use the identity $E\left[ \frac{\partial \log f}{\partial \theta} \right] = 0$ for any density function f to show that
$E[b(\theta)] = \frac{\alpha}{\beta}$ and $E[b(\theta) \mid y_1, y_2, \ldots, y_n] = \frac{\alpha + \sum_{i=1}^{n} y_i}{\beta + n}$, and comment on these
results. [UK Sept 2007]
9. Y1, Y2, …, Yn are independent observations from a normal distribution with
E[Yi] = µi and var[Yi] = σ².
(i) Write the density of Yi in the form of an exponential family of
distributions.
(ii) Identify the natural parameter and derive the variance function.
(iii) Show that the Pearson residual is the same as the deviance residual.
[UK April 2008]
10. (i) Express the probability density function of the gamma distribution in the
form of a member of the exponential family of distributions. Specify the
natural and scale parameters.
(ii) State the corresponding canonical link function for generalized linear
modeling if the response variable has a gamma distribution.
[UK April 2009]
11. A portfolio consists of k independent travel insurance policies. Each policy
covers the policyholder’s trips over one year. For policy i, the number of claims
in the jth month of the covered year, Yij, is assumed to have a distribution given
by $P(Y_{ij} = y) = \theta_{ij} (1 - \theta_{ij})^{y}$ for y = 0, 1, 2, …
where $\theta_{ij}$ are unknown constants between 0 and 1.
(i) Write down the likelihood function and obtain the maximum likelihood
estimate for the parameters θij.
(ii) Show that P(Yij = y) can be written in exponential family form and suggest
its natural parameter.
(iii) Suppose that θij depends on the temperature xj recorded in the jth month.
Explain why it is not appropriate to set θij = α + βxj. Suggest another
relationship between θij and α + βxj that might be used. [UK Sept 2009]
12. An insurance company is modeling claim numbers on its portfolio of motor
insurance policies using a Poisson distribution, whose mean depends on the age
and gender of the policyholder.
(i) Suggest a link function for fitting a generalized linear model for the mean
of the Poisson distribution.
(ii) Specify the corresponding linear predictor used for modeling the age and
gender dependence as:
(a) Age + gender
(b) Age + gender + age x gender [UK April 2010]
13. The probability density function of a gamma distribution is given in the
following parameterized form:
$f(x) = \frac{\alpha^{\alpha}}{\mu^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-\alpha x/\mu}$ for x > 0
(i) Express this density in the form of a member of the exponential family,
specifying all the parameters.
(ii) Hence show that the mean and variance of the distribution are given by µ
and µ²/α respectively. [UK Sept 2010]
14. Suppose that Y is a random variable belonging to a special subset of the
exponential family where the density function of Y has the form
$f(y; \theta, \phi) = \exp\left[ \frac{y\theta - b(\theta)}{\phi} + c(y, \phi) \right]$
for some constants $\theta$ and $\phi$ and functions b and c.
(i) Show that the moment generating function of Y is given by:
$M_Y(t) = \exp\left[ \frac{b(\theta + t\phi) - b(\theta)}{\phi} \right]$
{Hint: Note that the function $f(y; \theta + \phi t, \phi)$ is the density of another
random variable of the same family and hence $\int_{-\infty}^{\infty} f(y; \theta + \phi t, \phi)\, dy = 1$}
(ii) Show that E(Y) = b’(θ) and Var(Y) = φb’’(θ) using the result in (i).
(iii) Verify that the result in (i) holds if Y has a Poisson distribution.
[UK April 2011]
15. An insurance company covers pedigree cats against the costs of medical
treatment. The cost of claims from a policy in a year is assumed to have a
normal distribution with mean µ (which varies from policy to policy) and
known variance 25². It is assumed that $\mu = \alpha + \beta x$, where α and β are fixed
constants and x is the age of the cat. You are given the following data for the
pairs (yi, xi) for i = 1, 2, …, 50, where yi is the cost of claims last year for the ith
policy and xi is the age of the corresponding cat.
$\sum_{i=1}^{50} x_i = 637$, $\sum_{i=1}^{50} y_i = 5492$, $\sum_{i=1}^{50} y_i x_i = 74532$, $\sum_{i=1}^{50} x_i^{2} = 8312$
Calculate the maximum likelihood estimates of α and β. [UK Oct 2011]
16. (i) Define what it means for a random variable to belong to an exponential
family.
(ii) Show that if a random variable has the exponential distribution it belongs
to an exponential family. [UK April 2012]
17. The numbers of claims on three different classes of insurance policies over the
last four years are given in the table below:
Year 1 Year 2 Year 3 Year 4 Total
Class 1 1 4 5 0 10
Class 2 1 6 4 6 17
Class 3 5 6 4 6 24
The number of claims in a given year from a particular class is assumed to
follow a Poisson distribution.
(i) Determine the maximum likelihood estimate of the Poisson parameter for
each class of policy based on the data above.
(ii) Perform a test on the scaled deviance to check whether there is evidence
that the classes of policy have different mean claim rates and state your
conclusion. [UK April 2012]
18. A discrete probability distribution is defined by:
$f(y; \mu) = \binom{n}{ny} \mu^{ny} (1-\mu)^{n-ny}$, $y = 0, \frac{1}{n}, \frac{2}{n}, \ldots, 1$
where µ is a parameter between 0 and 1.
(i) Explain why this distribution belongs to an exponential family.
(ii) State the three main components that need to be taken into account when
constructing a generalized linear model.
(iii) Suggest a natural choice of link function if the response variable followed
the distribution defined above
(iv) Suggest a natural choice of link function if instead the response variable
followed a lognormal distribution. [UK Sept 2012]
19. An insurance company believes that individual claim amounts from house
insurance policies follow a gamma distribution with density function given
by:
$f(y) = \frac{\alpha^{\alpha}}{\mu^{\alpha}\,\Gamma(\alpha)}\, y^{\alpha-1} e^{-\alpha y/\mu}$ for $y > 0$
where α and µ are positive parameters.
(i) Show that the gamma distribution can be written in exponential family
form, giving the natural parameter and the canonical link function.
The insurance company has data for claim amounts from previous claims. It
believes that the claim amount is primarily influenced by two variables.
• xi the type of geographical area in which the house is situated. This can
take one of 4 values.
• yi the category of the age of the house where the three categories are 0-
29 years, 30-59 years and 60 years+.
It wishes to model claim amounts using this data and the generalized linear
model from part (i) with its canonical link function. The insurance company is
investigating models, which take into account these variables, and has the
following table of values.
Model    Choice of predictor    Scaled deviance
A        1                      900
B        Age                    789
C        Age + location         544
D        Age * location         541
(ii) Explain, by analyzing the scaled deviances, which model the insurance
company should use. [UK April 2013]
20. The number of claims per month Y arising on a certain portfolio of insurance
policies is to be modeled using a modified geometric distribution with
probability density given by:
$p(y \mid \alpha) = \frac{\alpha^{y-1}}{(1+\alpha)^{y}}$, $y = 1, 2, 3, \ldots$
where α is an unknown positive parameter. The most recent four months have
resulted in claim numbers of 8, 6, 10 and 9.
(i) Derive the maximum likelihood estimate of α.
(ii) Show that Y belongs to an exponential family of distributions and suggest
its natural parameter. [UK Sept 2013]
21. For a certain portfolio of insurance policies the number of claims on the ith
policy in the jth year of cover is denoted by Yij. The distribution of Yij is given by:
$P(Y_{ij} = y) = \theta_{ij} (1 - \theta_{ij})^{y}$, $y = 0, 1, 2, \ldots$
where $0 \le \theta_{ij} \le 1$ are unknown parameters with i = 1, 2, …, k and j = 1, 2, …, l.
(i) Derive the maximum likelihood estimate of θij given the single observed
data point yij.
(ii) Write P(Yij = y) in exponential family form and specify the parameters.
(iii) Describe the different characteristics of Pearson and deviance residuals.
[UK April 2014]
22. Annual numbers of claims on three different types of insurance policy follow a
Poisson distribution with parameter µi for i = 1, 2, 3. Data for the last four years
is given in the table below.
Type Year 1 Year 2 Year 3 Year 4 Total
1 5 5 0 1 11
2 2 5 4 5 16
3 5 6 4 5 20
(i) Derive the maximum likelihood estimate of µ1 and calculate the
corresponding estimates of µ2 and µ3.
(ii) Test the hypothesis that µ1, µ2 and µ3 are equal using the scaled deviance.
[UK April 2015]
23. (i) Explain what is meant by a saturated model.
(ii) State the definition of the scaled deviance in a fitting under generalised
linear modelling.
(iii) (a) Define both Pearson and deviance residuals.
(b) Explain how these two types of residuals are generally different.
(c) State in which case they are the same. [UK Sept 2015]
24. (i) State the general expression of the exponential families of distributions
and use this to derive the relevant expressions for the mean and the
variance of these distributions.
(ii) Extend the result in (i) to obtain an expression for the third central
moment.
(iii) Show that the following density function belongs to the exponential
family of distributions:
$f(y) = \frac{\alpha^{\alpha}}{\mu^{\alpha}\,\Gamma(\alpha)}\, y^{\alpha-1} e^{-\alpha y/\mu}$ for $y > 0$
(iv) Using the results in (i) and (ii) obtain the second and third central
moments for this distribution. [UK April 2016]
25. A small insurer wishes to model its claim costs for motor insurance using a
simple generalised linear model based on the three factors:
The insurer is considering three possible models for the linear predictor:
Model 1: YO + FS + TC
Model 2: YO + FS + YO.FS + TC
Model 3: YO* FS *TC
(i) Write each of these models in parameterised form, stating how many non-
zero parameter values are present in each model.
(ii) Explain why Model 1 might not be appropriate and why the insurer may
wish to avoid using Model 3.
(iii) The student fitting the models has said “We are assuming a normal error
structure and we are using the canonical link function.” Explain what this
means.
(iv) The table below shows the student’s calculated values of the deviance for
these three models and the constant model.
Complete the table by filling in the missing entries in the degrees of
freedom column and carry out the calculations necessary to determine
which model would be the most appropriate.
26. The following study was carried out into the mortality of leukaemia sufferers. A
white blood cell count was taken from each of 17 patients and their survival
times were recorded.
Suppose that Yi represents the survival time (in weeks) of the ith patient and xi
represent the logarithm (to the base 10) of the ith patient’s initial white blood
cell count (i = 1,2,…,17 ).
The response variables Yi are assumed to be exponentially distributed. A
possible specification for E(Yi ) is E(Yi ) = exp(α + βxi ) . This will ensure that
E(Yi ) is nonnegative for all values of xi .
(i) Write down the natural link function associated with the linear predictor
ηi = α + βxi .
(ii) Use this link function and linear predictor to derive the equations that
must be solved in order to obtain the maximum likelihood estimates of α
and β.
(iii) Given that the maximum likelihood estimate of α derived from the
experimental data is $\hat{\alpha} = 8.477$ and se($\hat{\alpha}$) = 1.655, obtain an approximate
95% confidence interval for α and interpret this result.
(iv) The following two models are now to be compared:
Model 1: E(Yi ) = α
Model 2: E(Yi ) = α + βxi
The deviance for Model 1 is found to be 26.282 and the deviance for Model
2 is 19.457. Test the null hypothesis that β = 0 against the alternative
hypothesis that β≠0 stating your conclusion clearly.
27. An analyst at a general insurance company is examining claims data on a
portfolio of home insurance policies in a particular region. An exponential
distribution models the claim amounts and the following rating factors are
used:
SA sum assured, x (as a continuous variable)
PT property type, Ti (as a factor with i = 1, 2, …, 10)
NB number of bedrooms, Bj (as a factor with j = 1, 2, …, 6)
The table below shows 5 models considered by the analyst and their scaled
deviances for the data set.
Model          Parameterised form of the linear predictor    Number of parameters    Scaled deviance
SA             $\alpha + \beta x$                            2                       238.4
SA+PT                                                                                206.7
SA+PT+SA.PT                                                                          178.3
SA*PT+NB                                                                             166.2
SA*PT*NB       $\alpha_{ij} + \beta_{ij} x$                  120                     58.9
(i) Complete the table.
(ii) Determine the model the analyst should choose on the basis of scaled
deviance.
(iii) Describe the further information the analyst should consider before
making her recommendation about an appropriate choice of model.

ANSWERS
1. (i) (a) $\ln L(\mu_{ij}) = \sum_i \sum_j \left( -\mu_{ij} + y_{ij} \ln \mu_{ij} \right) + \text{constant}$ and $\hat{\mu}_{ij} = y_{ij}$
(b) $\hat{\mu} = \ln\left( \frac{\sum_i \sum_j y_{ij}}{12} \right)$
(c) scaled deviance $= 2 \sum_i \sum_j \left[ y_{ij} \ln y_{ij} - y_{ij} \ln\left( \frac{\sum_i \sum_j y_{ij}}{12} \right) \right]$
(ii)(a) In Model 1, the mean number of accidents is constant, ie it does not
depend on the season or the year. In Model 2, the mean number of
accidents depends on which year you are in. In Model 3, the mean
number of accidents depends on both the year and which quarter
of the year it is.
(b) Between Model 1 and Model 2, the deviance drops by 64.16 while the
degrees of freedom fall by only 2, so Model 2 is better
than Model 1. Between Model 2 and Model 3, the deviance drops by
191.51 while the degrees of freedom fall by only 3, so Model
3 is better than Model 2.
Overall Model 3 is best on this basis.
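One common way to formalise "the deviance drops a lot for few degrees of freedom" is to compare each drop with the upper 5% point of a chi-square distribution on the same number of degrees of freedom. The sketch below is an illustration under that assumption; the critical values are hard-coded from standard statistical tables and the function name is hypothetical.

```python
# Deviances and residual degrees of freedom quoted in Question 1(ii).
models = [
    ("Model 1", 266.35, 11),
    ("Model 2", 202.19, 9),
    ("Model 3", 10.68, 6),
]

# Upper 5% points of the chi-square distribution (from standard tables).
CHI2_UPPER_5PCT = {2: 5.991, 3: 7.815}


def compare_nested(simpler, richer):
    """Return (drop in deviance, drop in degrees of freedom, significant at 5%?)."""
    _, dev_s, df_s = simpler
    _, dev_r, df_r = richer
    drop = round(dev_s - dev_r, 2)
    df_drop = df_s - df_r
    return drop, df_drop, drop > CHI2_UPPER_5PCT[df_drop]


for simpler, richer in zip(models, models[1:]):
    drop, df_drop, significant = compare_nested(simpler, richer)
    print(f"{simpler[0]} -> {richer[0]}: drop {drop} on {df_drop} df, "
          f"significant at 5%: {significant}")
```

Both drops far exceed their critical values, which is consistent with preferring Model 3.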
(iii) The number of accidents on the motorway has a Poisson distribution. The
log of the mean of this distribution has a linear trend from year to year, so
that the annual increase in the log is the same from one year to the next.
However, within each year there is a seasonal variation effect which is
unrestricted by the model.
2. (i) $\theta = -\frac{1}{\mu}$, $b(\theta) = \ln\mu = \ln\left( -\frac{1}{\theta} \right)$,
$\phi = 2$, $a(\phi) = \frac{1}{\phi}$, $c(y, \phi) = \ln y + \ln 4$
(ii) Interpretation
Model 1. This model indicates that the mean preparation time for filter coffee is
constant. It is also a fixed value for the other two types of coffee but all three
means are different from each other.
Model 2. This model indicates that there is a constant mean preparation time for
each type of coffee, but that the mean preparation times for cappuccino and
espresso are the same, whereas the mean preparation time for filter is (possibly)
different.
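3. (i) The natural parameter θ is equal to −1/µ, so the canonical link function is
$g(\mu) = \frac{1}{\mu}$, the inverse function. The minus sign can be absorbed into the linear
predictor, and so is not needed as part of the link function.
(ii)(b) $\hat{\alpha} = \log\left( \frac{1}{m} \sum_{i=1}^{m} y_i \right)$ and $\hat{\beta} = \log\left( \frac{1}{n-m} \sum_{i=m+1}^{n} y_i \right)$
(iii) The contribution to the scaled deviance from y1 is:
$2\left[ -\ln y_1 - 1 + \hat{\alpha} + \frac{y_1}{e^{\hat{\alpha}}} \right] = 2\left[ -\ln 7 - 1 + \ln 14.2 + \frac{7}{14.2} \right] = 0.4006$
So the deviance residual is $-\sqrt{0.4006} = -0.633$ (negative because y1 = 7 is below the fitted mean 14.2).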
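The figures in part (iii) can be reproduced numerically. This is a minimal sketch using the exponential deviance contribution from the answer above; the function names are illustrative only.

```python
import math


def exp_deviance_contribution(y, mu_hat):
    """Contribution of one exponential observation to the scaled deviance."""
    return 2.0 * (-math.log(y) - 1.0 + math.log(mu_hat) + y / mu_hat)


def deviance_residual(y, mu_hat):
    """Signed square root of the deviance contribution (sign of y - mu_hat)."""
    d = max(exp_deviance_contribution(y, mu_hat), 0.0)  # guard against rounding below 0
    return math.copysign(math.sqrt(d), y - mu_hat)


contribution = exp_deviance_contribution(7.0, 14.2)
residual = deviance_residual(7.0, 14.2)
print(round(contribution, 4), round(residual, 3))
```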

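4. The scaled deviance is $2 \sum_i \left[ y_i \ln\left( \frac{y_i}{\hat{\mu}_i} \right) - y_i + \hat{\mu}_i \right]$
5. (i) $\hat{\beta}_i = \log \bar{y}_i$ (iii) $\bar{y}_i = 17.45$ and $y_{ij} = 9$
The contribution to the deviance is:
$2\left[ 9 \log\frac{9}{17.45} - (9 - 17.45) \right] = 4.982$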
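As a quick numerical check of the contribution quoted for Question 5(iii), the Poisson deviance term can be evaluated directly. The helper name below is illustrative, and the sketch assumes y > 0 so that the logarithm is defined.

```python
import math


def poisson_deviance_contribution(y, fitted_mean):
    """Contribution of one Poisson observation (y > 0) to the deviance."""
    return 2.0 * (y * math.log(y / fitted_mean) - (y - fitted_mean))


contribution_q5 = poisson_deviance_contribution(9, 17.45)
print(round(contribution_q5, 3))
```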
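6. (i) If w = ny then:
$P(Y = y) = P(W = ny) = \binom{n}{ny} \mu^{ny} (1-\mu)^{n-ny}$
(ii) The natural parameter is $\theta = \ln\frac{\mu}{1-\mu}$ and the dispersion parameter is $\phi = n$, with
$a(\phi) = \frac{1}{n} = \frac{1}{\phi}$, $b(\theta) = -\ln(1-\mu) = \ln\left( 1 + e^{\theta} \right)$,
$c(y, \phi) = \ln\binom{n}{ny} = \ln\binom{\phi}{y\phi}$
(iii) $b''(\theta) = \mu(1-\mu)$
(iv) Scaled deviance $= 2n \sum_{i=1}^{n} \left[ y_i \ln\frac{y_i}{\hat{\mu}_i} + (1 - y_i) \ln\frac{1 - y_i}{1 - \hat{\mu}_i} \right]$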
7. (i)(a) $\theta = -\frac{1}{\mu} \Rightarrow b(\theta) = \log\mu = -\log(-\theta)$
(ii) A continuous covariate is a quantity whose real numerical value appears in
the linear predictor. For example, age of policyholder. In this case, individual
policyholder’s ages will appear in the corresponding linear predictor, and will
affect the expected claim rate.
A factor is a variable which does not have a numerical value, or whose
value does not enter directly into the linear predictor. For example, if we wish to
analyse by sex, the linear predictor will contain numerical parameters αm and αf,
corresponding to the male effect and the female effect on the mean claim rate.
However, the sex itself does not appear as a numerical variable in the linear
predictor formula, so sex is here being treated as a factor.
(iii) (a) $\eta = \text{age} + \text{gender} + \text{vehicle group} = \alpha + \beta x + \gamma_i + \delta_j = \alpha_i + \beta x + \delta_j$
So αi is a parameter related to the policyholder’s sex i, β is a measure of the extra
level of risk per additional year of age, x is the age of the policyholder, and δj is
a parameter related to vehicle group j.
(iii) (b) If there is interaction between age and gender, then the increase in risk
for each year of age is different for males and females. Instead of one β
parameter, we now need two: a βMale and a βFemale. So the linear predictor
now becomes:
$\eta = \text{age} + \text{gender} + \text{vehicle group} + \text{age.gender}$
$= \alpha + \beta x + \gamma_i + \delta_j + (\alpha + \beta x)\gamma_i$
$= \alpha_i + \beta_i x + \delta_j$
where $\alpha_i = \alpha + \gamma_i + \alpha\gamma_i$ and $\beta_i = \beta + \beta\gamma_i$.
Here i is either male or female, and j = 1, 2, …, n, where n is the number of
vehicle groups.
8. (i) $\log L(y \mid \mu) = \sum_{i=1}^{n} y_i \log\mu - \sum_{i=1}^{n} \mu - \sum_{i=1}^{n} \log y_i! = \theta \sum_{i=1}^{n} y_i - n\, b(\theta) + c$
where $\theta = \log\mu$, $b(\theta) = \mu = e^{\theta}$ and $c = -\sum_{i=1}^{n} \log y_i!$ does not involve θ.
(ii) (a) The Pearson residual is defined as $\frac{y_i - \hat{\mu}}{\sqrt{\mathrm{var}(\hat{\mu})}}$, where $\mathrm{var}(\hat{\mu})$ is the variance
of Yi with fitted mean $\hat{\mu}$. Using $\hat{y}_i$ as the fitted value of µ, the Pearson
residual becomes $\frac{y_i - \hat{y}_i}{\sqrt{\hat{y}_i}}$.
(ii) (b) The distribution of Pearson residuals is often skewed for non-normal
data (eg for Poisson data). This makes the interpretation of residual plots
and the assessment of goodness of fit by eye very difficult. In contrast,
deviance residuals are more likely to be symmetrically distributed and to
have approximately normal distributions.
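(iii) Let $f(\theta) \propto \exp\left( \alpha\theta - \beta e^{\theta} \right)$ represent the prior density function for θ.
posterior ∝ prior × likelihood
⇔ $f(\theta \mid y) \propto f(\theta) \times L(y \mid \theta)$
Now, $L(y \mid \theta) \propto \exp\left[ \theta \sum_{i=1}^{n} y_i - n\, b(\theta) \right]$ from part (i).
Remembering from part (i) that $b(\theta) = e^{\theta}$, we combine the above
expressions to give:
$f(\theta \mid y) \propto \exp\left( \alpha\theta - \beta e^{\theta} \right) \exp\left[ \theta \sum_{i=1}^{n} y_i - n\, b(\theta) \right] = \exp\left[ \left( \alpha + \sum_{i=1}^{n} y_i \right)\theta - (\beta + n) e^{\theta} \right]$
We note that the posterior density function $f(\theta \mid y)$ is of the same form as
the prior density function f(θ), but with $\alpha + \sum_{i=1}^{n} y_i$ and $\beta + n$ rather than α
and β. Therefore, the prior distribution of θ is a conjugate prior.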
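(iv) We note that the posterior expectation of b(θ) can be expressed as a
weighted average of the prior expectation of b(θ) and the sample mean:
$E[b(\theta) \mid y_1, y_2, \ldots, y_n] = \frac{\alpha + \sum_{i=1}^{n} y_i}{\beta + n}$
$= \frac{\beta}{\beta + n} \cdot \frac{\alpha}{\beta} + \frac{n}{\beta + n} \cdot \frac{1}{n} \sum_{i=1}^{n} y_i$
$= \frac{\beta}{\beta + n}\, E[b(\theta)] + \frac{n}{\beta + n}\, \bar{y}$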
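9. (i) $\theta_i = \mu_i$, $b(\theta_i) = \frac{1}{2}\mu_i^{2} = \frac{1}{2}\theta_i^{2}$, $\phi = \sigma$, $a(\phi) = \sigma^{2} = \phi^{2}$,
$c(y_i, \phi) = -\frac{y_i^{2}}{2\phi^{2}} - \ln\phi - \frac{1}{2}\ln 2\pi$
(ii) The natural parameter θi is simply µi.
From part (i), we have $b(\theta_i) = \frac{1}{2}\theta_i^{2}$. Hence $b'(\theta_i) = \theta_i$ and
$V(\mu_i) = b''(\theta_i) = 1$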
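10. (i) $\theta = -\frac{1}{\mu} \Rightarrow b(\theta) = \log\mu = -\log(-\theta)$
$\phi = \alpha \Rightarrow a(\phi) = \frac{1}{\phi}$
$c(y, \phi) = (\phi - 1)\log y + \phi\log\phi - \log\Gamma(\phi)$
(ii) The canonical link function for the gamma distribution is 1/µ.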
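11. (i) $\ln L = \ln\theta_{1,1} + \ln\theta_{1,2} + \cdots + \ln\theta_{k,12} + y_{1,1}\ln(1 - \theta_{1,1}) + \cdots + y_{k,12}\ln(1 - \theta_{k,12})$
i.e. $\hat{\theta}_{ij} = \frac{1}{1 + y_{ij}}$
(ii) $P(Y_{ij} = y) = \exp\left\{ \ln\left[ \theta_{ij}(1 - \theta_{ij})^{y} \right] \right\} = \exp\left[ y\ln(1 - \theta_{ij}) - (-\ln\theta_{ij}) \right]$
$a(\phi) = 1$, $\theta = \ln(1 - \theta_{ij})$, $b(\theta) = -\ln\theta_{ij}$, $c(y, \phi) = 0$
(iii) From the information in the question, θij needs to lie between 0 and 1, whereas
$\alpha + \beta x_j$ can take any real value. This means that it would be inappropriate to set θij equal to
$\alpha + \beta x_j$. A more appropriate relationship is:
$\ln\left( \frac{\theta_{ij}}{1 - \theta_{ij}} \right) = \alpha + \beta x_j$.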
12. (i) $g(\mu) = \ln\mu$
(ii)(a) $\eta = \alpha_i + \beta x$
(ii)(b) $\eta = \alpha_i + \beta_i x$
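13. (i) $f(x) = \exp\left[ \frac{-\frac{x}{\mu} - \log\mu}{1/\alpha} + \alpha\log\alpha - \log\Gamma(\alpha) + (\alpha - 1)\log x \right]$
$\theta = -\frac{1}{\mu} \Rightarrow b(\theta) = \log\mu = -\log(-\theta)$
$\phi = \alpha \Rightarrow a(\phi) = \frac{1}{\phi}$, $c(y, \phi) = (\phi - 1)\log y + \phi\log\phi - \log\Gamma(\phi)$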
15. $\hat{\beta} = 23.21188$ and $\hat{\alpha} = -185.8794$
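With normal errors and known variance, maximising the likelihood is equivalent to least squares, so the estimates can be recovered from the summary statistics in the question. The sketch below assumes exactly that equivalence; the variable names are illustrative.

```python
# Summary statistics from Question 15 (n = 50 policies).
n = 50
sum_x, sum_y, sum_xy, sum_xx = 637, 5492, 74532, 8312

# Corrected sums of products and squares.
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_xx - sum_x ** 2 / n

# Least-squares (equivalently maximum likelihood) estimates of slope and intercept.
beta_hat = s_xy / s_xx
alpha_hat = sum_y / n - beta_hat * sum_x / n

print(round(beta_hat, 5), round(alpha_hat, 4))
```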
16. (i) A random variable Y belongs to an exponential family if we can write its PDF
in the following format:
$f_Y(y; \theta, \phi) = \exp\left[ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right]$
where θ is the natural parameter, φ is the scale parameter and a, b and c are
functions.
(ii) $f_Y(y) = \exp\left[ \frac{y\left( -\frac{1}{\mu} \right) - \ln\mu}{1} + 0 \right]$. We have:
$\theta = -\frac{1}{\mu}$, $b(\theta) = -\ln(-\theta)$, $a(\phi) = 1$ and $c(y, \phi) = 0$
18. (ii) Three main components of a GLM:
1. A distribution for the response variable Yi.
2. A linear predictor η, which is a function of the covariates.
3. A link function between the mean of the response variable and the linear predictor.
(iii) $g(\mu) = \ln\frac{\mu}{1 - \mu}$
(iv) We apply a log transform to the response (to change it to a normal
distribution) and then use the link function for the normal
distribution (the identity, g(µ) = µ). So, the natural choice of link function is the log
function, ln µ.
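19. (i) $\theta = -\frac{1}{\mu}$, $b(\theta) = \ln\mu = \ln\left( -\frac{1}{\theta} \right) = \ln\left[ (-\theta)^{-1} \right] = -\ln(-\theta)$
$\phi = \alpha$, $a(\phi) = \frac{1}{\alpha} = \frac{1}{\phi}$ and
$c(y, \phi) = \alpha\ln\alpha - \ln\Gamma(\alpha) + (\alpha - 1)\ln y = \phi\ln\phi - \ln\Gamma(\phi) + (\phi - 1)\ln y$
The natural parameter is $\theta = -\frac{1}{\mu}$. The canonical link function is $g(\mu) = 1/\mu$.
(ii) We would choose Model C, “Age + location”.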

20. (i) $\hat{\alpha} = 7.25$ (ii) $\theta = \log\frac{\alpha}{1 + \alpha}$, $\phi = 1$, $a(\phi) = \phi$, $b(\theta) = \log\alpha$, $c(y, \phi) = 0$
21. (i) $\hat{\theta}_{ij} = \frac{1}{1 + y_{ij}}$ (ii) θ = log(1 − θij) is the natural parameter,
b(θ) = −log θij = −log[1 − $e^{\theta}$], ϕ = 1, a(ϕ) = 1, c(y, ϕ) = 0
(iii) The Pearson residuals are often skewed for non-normal data, which makes
the interpretation of residual plots difficult. Deviance residuals are usually more
likely to be symmetrically distributed and are preferred for actuarial
applications.
22. (i) $\hat{\mu}_1 = 2.75$, $\hat{\mu}_2 = 4$, $\hat{\mu}_3 = 5$
(ii) Assuming µ = µ1 = µ2 = µ3 we have $\hat{\mu} = \frac{11 + 16 + 20}{12} = 3.916667$
The difference in scaled deviance is given by
Δ = 2(log L1 + log L2 + log L3 − log L)
$= 2\left( -4\hat{\mu}_1 + 11\log\hat{\mu}_1 - 4\hat{\mu}_2 + 16\log\hat{\mu}_2 - 4\hat{\mu}_3 + 20\log\hat{\mu}_3 + 12\hat{\mu} - 47\log\hat{\mu} \right)$
[The logarithms of factorials cancel]
= 2(−47 + 11 log 2.75 + 16 log 4 + 20 log 5 + 47 − 47 log 3.916667)
= 2.6615
Under H0: µ1 = µ2 = µ3 we have that Δ comes from a $\chi^{2}$ distribution with 3 − 1 = 2
degrees of freedom.
The upper 5% point of the $\chi^{2}_{2}$ distribution is 5.991. The observed value is below
this and so there is no evidence to suggest that the underlying parameters are
different for each risk.
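The test statistic above can be reproduced from the claim totals alone, since the terms in the fitted means cancel (−47 + 47). The following is a small sketch of that calculation; the dictionary layout and names are illustrative, and the critical value 5.991 is the one quoted in the answer.

```python
import math

# Claim totals by policy type over 4 years (Question 22).
totals = {1: 11, 2: 16, 3: 20}
YEARS = 4

# Separate and pooled maximum likelihood estimates of the Poisson mean.
mu_hat = {policy_type: total / YEARS for policy_type, total in totals.items()}
grand_total = sum(totals.values())
mu_pooled = grand_total / (YEARS * len(totals))

# Difference in scaled deviance; the linear terms in the fitted means cancel.
delta = 2.0 * (
    sum(total * math.log(mu_hat[policy_type]) for policy_type, total in totals.items())
    - grand_total * math.log(mu_pooled)
)

CHI2_2_UPPER_5PCT = 5.991  # upper 5% point of chi-square with 2 df
reject_h0 = delta > CHI2_2_UPPER_5PCT
print(round(delta, 4), reject_h0)
```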
23. (i) The saturated model is one where the number of parameters is the same as the
number of data points, i.e. the fitted values are the same as the observed data.
(ii) The scaled deviance is twice the difference between the log-likelihood of
the saturated model and that of the model under consideration.
(iii) (a) Pearson residuals are $\frac{y - \hat{\mu}}{\sqrt{\mathrm{var}(\hat{\mu})}}$, where $\hat{\mu}$ is the fitted response estimator.
The deviance residuals are $\mathrm{sign}(y - \hat{\mu})\, d_i$, where $d_i^{2}$ is the contribution of the i-th observation to
the total deviance, i.e. $\sum_i d_i^{2}$ is the scaled deviance.
(b) The Pearson residuals tend to be skewed for non-normal data, while the
deviance residuals tend to be symmetric and hence the normal assumption is
more appropriate.
For that reason the latter are preferred in actuarial applications.
24. (i) A random variable Y belongs to an exponential family if we can write its PDF
in the following format:
$f_Y(y; \theta, \phi) = \exp\left[ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right]$
where θ is the natural parameter, φ is the scale parameter and a, b and c are
functions.
Mean = $b'(\theta)$, Var = $b''(\theta)\, a(\phi)$
(ii) Skewness = $b'''(\theta)\, a(\phi)^{2}$
(iv) Var = $\frac{\mu^{2}}{\alpha}$ and Skewness = $\frac{2\mu^{3}}{\alpha^{2}}$
25. (i) Model 1 : $\alpha_i + \beta_j + \gamma_k$ - 4 parameters
Model 2 : $\alpha_{ij} + \gamma_k$ - 5 parameters
Model 3 : $\alpha_{ijk}$ - 8 parameters
(ii) Model 1 does not allow for the possibility that there may be interactions
(correlations) between some of the factors. For example, it may be the case that
young drivers tend to drive fast cars and to live in towns.
With Model 3, which is a saturated model, it would be possible to fit the average
values for each group exactly, ie there are no degrees of freedom left. This defeats
the purpose of applying a statistical model, as it would not “smooth” out any
anomalous results.
(iii) Normal error structure means that the randomness present in the observed
values in each category (eg young/fast/town) is assumed to follow a normal
distribution.
The link function is the function applied to the linear estimator to obtain the
predicted values. Associated with each type of error structure is a “canonical” or
“natural” link function. In the case of a normal error structure, the canonical link
function is the identity function g(µ) = µ.
(iv) Model 2 would be the most appropriate in this case.
26. (i) $g(\mu_i) = \ln\mu_i$
(ii)
(iii) (5.233, 11.721). Since this confidence interval does not contain zero we are
95% confident that the parameter is non-zero and should be kept.
(iv) T.S. = 26.282 − 19.457 = 6.825, which exceeds 3.841, the upper 5% point of $\chi^{2}_{1}$. So we
reject H0 and conclude that Model 2 significantly reduces the
scaled deviance (ie it is a significantly better fit to the data), so survival time is
dependent on initial white blood cell count.
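Both numerical results for Question 26 follow from one-line calculations, sketched below. The normal quantile 1.96 and the chi-square critical value 3.841 are standard table values; the variable names are illustrative.

```python
# Part (iii): approximate 95% confidence interval, estimate +/- 1.96 standard errors.
alpha_hat, se_alpha = 8.477, 1.655
Z_975 = 1.96  # upper 2.5% point of the standard normal distribution
ci_lower = alpha_hat - Z_975 * se_alpha
ci_upper = alpha_hat + Z_975 * se_alpha

# Part (iv): the drop in deviance between nested models is compared with
# the upper 5% point of chi-square on 1 degree of freedom (one extra parameter).
deviance_model_1, deviance_model_2 = 26.282, 19.457
test_statistic = deviance_model_1 - deviance_model_2
CHI2_1_UPPER_5PCT = 3.841
reject_beta_zero = test_statistic > CHI2_1_UPPER_5PCT

print(round(ci_lower, 3), round(ci_upper, 3), round(test_statistic, 3), reject_beta_zero)
```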
27. (i)
Model          Parameterised form of the linear predictor    Number of parameters    Scaled deviance
SA             $\alpha + \beta x$                            2                       238.4
SA+PT          $\alpha_i + \beta x$                          11                      206.7
SA+PT+SA.PT    $\alpha_i + \beta_i x$                        20                      178.3
SA*PT+NB       $\alpha_i + \beta_i x + B_j$                  25                      166.2
SA*PT*NB       $\alpha_{ij} + \beta_{ij} x$                  120                     58.9
(ii) The analyst should choose the SA*PT+NB model.
(iii) The analyst should also check:
- that the SA*PT+NB model is a significant improvement when the order is
different, eg add the NB factor before the PT factor
- other models involving these rating factors, eg SA*NB+PT
- the residuals of the proposed model (to ensure that it is a good fit to the data)
- the significance of the parameters of the proposed model (to ensure that all the
estimated parameters are significantly different from zero).
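The parameter counts and the choice in part (ii) can be reproduced as follows. This is a sketch under stated assumptions: each richer model is tested against the previous one by comparing the drop in scaled deviance with the upper 5% chi-square point for the number of extra parameters, and the critical values are hard-coded from standard tables.

```python
PT_LEVELS, NB_LEVELS = 10, 6  # property types and bedroom categories

# (model, number of parameters, scaled deviance)
models = [
    ("SA", 2, 238.4),                                      # one intercept, one slope
    ("SA+PT", PT_LEVELS + 1, 206.7),                       # intercept per type, common slope
    ("SA+PT+SA.PT", 2 * PT_LEVELS, 178.3),                 # intercept and slope per type
    ("SA*PT+NB", 2 * PT_LEVELS + NB_LEVELS - 1, 166.2),    # plus 5 bedroom effects
    ("SA*PT*NB", 2 * PT_LEVELS * NB_LEVELS, 58.9),         # intercept and slope per cell
]

# Upper 5% points of chi-square for the degrees of freedom needed here.
CHI2_UPPER_5PCT = {9: 16.92, 5: 11.07, 95: 118.75}

chosen = models[0][0]
for (_, p_simple, dev_simple), (name, p_rich, dev_rich) in zip(models, models[1:]):
    drop = dev_simple - dev_rich
    extra = p_rich - p_simple
    if drop > CHI2_UPPER_5PCT[extra]:
        chosen = name  # the richer model is a significant improvement
    else:
        break          # stop at the first non-significant improvement

print([p for _, p, _ in models], chosen)
```

The last step, a drop of 107.3 for 95 extra parameters, is not significant, which is why the sketch stops at SA*PT+NB.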

ASSIGNMENT – 13
EMPIRICAL BAYES CREDIBILITY THEORY
1. An insurance company has insured a fleet of cars for the last four years. For year
j (j = 1, …, 4), let Yj and Pj be the total amount claimed and the number of cars in the
fleet respectively. Let Xj = Yj/Pj be the average amount claimed per car in year j.
Assume that the distribution of Xj depends on a risk parameter θ and that the
conditions of Empirical Bayes Credibility Theory Model 2 are satisfied.
Let: m(θ) = E(Xj|θ), s²(θ) = Pj V(Xj|θ), m = E[m(θ)], c = V[m(θ)] > 0
(i) (a) Derive E(Xj).
(b) Derive E(XjXk), for j ≠ k.
(c) Determine whether Xj and Xk are independent (j ≠ k).
(ii) The company has insured ten similar fleets over the last four years. Using
the data from these years, m, E[s²(θ)] and V[m(θ)] are estimated to be 62.8,
106.32 and 5.8 respectively.
Calculate next year’s credibility premium for a fleet of cars with claims
over the last four years given below, if the fleet will have 16 cars next year.
Year                      1        2        3        4
Total amount claimed      1,000    1,200    1,500    1,400
Number of cars            15       16       18       15
Explain how and why the credibility factor would be affected if the
estimate of V[m(θ)] increases, and comment on the effect on the credibility
premium. [UK Sept 2002]
2. The number of claims on a particular risk in a fixed time period has a Poisson
distribution with mean λ. There were x1 and x2 claims during the first two time
periods.
Suppose that λ has prior density $f(\lambda) = 2e^{-2\lambda}$ (λ > 0). Determine the Bayesian
estimate under quadratic loss for the expected number of claims during the third
time period, and show that it is of the form of a credibility estimate.
3. The total amount claimed for a particular risk in a portfolio is observed for each
of 5 consecutive years.
(i) From past knowledge of similar portfolios, an insurer believes that the
claims are normally distributed with mean θ and variance 25, and that the
prior distribution of θ is normal with mean 125 and variance 36.
(a) Derive the Bayesian estimate for θ under quadratic loss, and show
that it can be written in the form of a credibility estimate combining
the mean observed claim size for this risk with the prior mean for θ.
(b) State the credibility factor, and calculate the credibility premium if
the mean claim size over the 5 years is 122.
(c) Comment on how the credibility factor and the credibility estimate
change if the variance of 25 is increased.
(ii) A second insurer does not believe that this is an appropriate prior
distribution for risks in this portfolio, and decides to use Empirical Bayes
Credibility, Model 1, where the credibility premium combines the mean
for the particular risk with the estimated value of E(m(θ)). Data from 3
risks in this portfolio over 5 years are available. Let Xij be the claim for risk
i in year j. The table shows various summary statistics for the observed
data.
          $\bar{X}_i$    $\sum_{j=1}^{5} (X_{ij} - \bar{X}_i)^2$
Risk 1    122            2,848
Risk 2    164            1,628
Risk 3    106            1,887
(a) Calculate the estimated credibility factor, and calculate the
credibility premium for risk 1.
(b) Compare your answer to that obtained in (i)(b). [UK April 2004]
4. An insurance company has to estimate the risk premium for the coming year for
a certain risk.
(i) Describe how the credibility approach to calculating the risk premium
differs from a conventional approach.
(ii) State the advantages and disadvantages of using Bayesian estimation and
empirical Bayes credibility theory estimation.
(iii) State the differences between the assumptions in empirical Bayes
credibility theory Model 1 and Model 2, and state why Model 2 is more
likely to be useful in practice. [UK Sept 2004]
5. An actuary has, for three years, recorded the volume of unsolicited advertising
that he receives. He believes that the number of items that he receives follows a
Poisson distribution with a mean which varies according to which quarter of the
year it is. He has recorded Yij the number of items received in the ith quarter of
the jth year (i = 1, 2, 3, 4 and j = 1, 2, 3). The actuary wishes to estimate the number
of items that he will receive in the 1st quarter of year 4. He has recorded the
following data:
         Yi1   Yi2   Yi3   $\bar{Y}_i = \frac{1}{3}\sum_j Y_{ij}$   $\sum_j (Y_{ij} - \bar{Y}_i)^2$
i = 1     98   117   124   113                                      362
i = 2     82   102    95    93                                      206
i = 3     75    83    88    82                                       86
i = 4    132   152   148   144                                      224
(i) Estimate Y1,4 the number of items that the actuary expects to receive in the
first quarter of year 4 using the assumptions of EBCT Model 1.
The actuary believes that, in fact, the volume of items has been increasing at the
rate of 10% per annum.
(ii) Suggest how the approach in (i) can be adjusted to produce a revised
estimate taking this growth into account.
(iii) Calculate the maximum likelihood estimate of Y1,4 (based on the quarter 1
data already observed and the 10% pa increase described above).
(iv) Compare the assumptions underlying the approach in (i) and (ii) with
those underlying the approach in (iii). [UK April 2010]
6. The table below shows aggregate annual claim statistics for three risks over a
period of seven years. Annual aggregate claims for risk i in year j are denoted by
Xij.
Risk, i   $\bar{X}_i = \frac{1}{7}\sum_{j=1}^{7} X_{ij}$   $S_i^2 = \frac{1}{6}\sum_{j=1}^{7} (X_{ij} - \bar{X}_i)^2$
i = 1     127.9                                            335.1
i = 2      88.9                                             65.1
i = 3     149.7                                             33.9
(i) Calculate the credibility premium of each risk under the assumptions of
EBCT Model 1.
(ii) Explain why the credibility factor is relatively high in this case.
[UK Sept 2010]
7. An insurance company has collected data for the number of claims arising from
certain risks over the last 10 years. The number of claims in the jth year from the
ith risk is denoted by Xij for i = 1, 2, ..., n and j = 1, 2, ..., 10.
The distribution of Xij for j = 1, 2, ..., 10 depends on an unknown parameter θi and,
given θi, the Xij are independent identically distributed random variables.
(i) Give a brief interpretation of E[s²(θ)] and V[m(θ)] under the assumptions
of Empirical Bayes Credibility Theory Model 1.
(ii) Explain how the value of the credibility factor Z depends on E[s²(θ)] and
V[m(θ)]. [UK April 2011]
8. Five years ago, an insurance company began to issue insurance policies covering
medical expenses for dogs. The insurance company classifies dogs into three risk
categories: large pedigree (category 1), small pedigree (category 2) and non-
pedigree (category 3). The number of claims nij in the ith category in the jth year is
assumed to have a Poisson distribution with unknown parameter θi. Data on the
number of claims in each category over the last 5 years is set out as follows:
                     Year
               1    2    3    4    5   $\sum_{j=1}^{5} n_{ij}$   $\sum_{j=1}^{5} n_{ij}^2$
Category 1    30   43   49   58   60   240                       12,114
Category 2    37   49   58   52   64   260                       13,934
Category 3    26   31   18   37   32   144                        4,354
Prior beliefs about θ1 are given by a gamma distribution with mean 50 and
variance 25.
(i) Find the Bayes estimate of θ1 under quadratic loss.
(ii) Calculate the expected claims for year 6 of each category under the
assumptions of Empirical Bayes Credibility Theory Model 1.
(iii) Explain the main differences between the approaches in (i) and that
in (ii).
(iv) Explain why the assumption of a Poisson distribution with a
constant parameter may not be appropriate and describe how each
approach might be generalised. [UK Sept 2011]
9. An insurer classifies the buildings it insures into one of three types. For Type 1
buildings, the number of claims per building per year follows a Poisson
distribution with parameter λ. Data are available for the last five years as follows:
Year                                   1     2     3     4     5
Number of type 1 buildings covered    89   112   153   178   165
Number of claims                      15    23    29    41    50
(i) Determine the maximum likelihood estimate of λ based on the data above.
The insurer also has data for the other two types of building for all five years.
Define
Pij = the number of buildings insured in the jth year from type i and
Yij = the corresponding number of claims.
The five years of data can be summarised as follows:
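Type (i)   $\bar{P}_i$   $\bar{X}_i$   $\sum_{j=1}^{5} P_{ij}\left(\frac{Y_{ij}}{P_{ij}} - \bar{X}_i\right)^2$   $\sum_{j=1}^{5} P_{ij}\left(\frac{Y_{ij}}{P_{ij}} - \bar{X}\right)^2$
Type 1     697           0.226686      1.527016                                                                  2.502737
Type 2     295           0.237288      0.96605                                                                   1.178133
Type 3     515           0.330097      4.53253                                                                   6.775614
Here $\bar{P}_i = \sum_{j=1}^{5} P_{ij}$ and $\bar{X}_i = \sum_{j=1}^{5} Y_{ij} / \bar{P}_i$.
$\bar{X} = \sum_{i=1}^{3}\sum_{j=1}^{5} Y_{ij} / \bar{P} = 0.264101$ where $\bar{P} = \sum_{i=1}^{3} \bar{P}_i$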
There are 191 buildings of Type 1 to be insured in year six.
(ii) Estimate the number of claims from Type 1 buildings in year six using
Empirical Bayes Credibility Theory model 2.
(iii) Explain the main differences between the approaches in parts (i) and (ii).
[UK Sept 2012]
10. An insurance company has a portfolio of building insurance policies. The
company classifies buildings into three types and believes that the number of
claims on buildings of each type follows a Poisson distribution with parameters
as shown:
Type Parameter
1 λ
2 2λ
3 5λ
where λ is an unknown positive constant.
Actual claim numbers over the last five years have been as follows. Here Xij
represents the number of claims from the ith type in the jth year:
(i) Derive the maximum likelihood estimate of λ.

(ii) Estimate the average number of claims per year for each type of building
using EBCT Model 1.
(iii) Comment on the results of parts (i) and (ii).
(iv) Explain the main weakness of the model in part (ii). [UK April 2013]
11. For three years an insurance company has insured buildings in three different
towns against the risk of fire damage. Aggregate claims in the jth year from the
ith town are denoted by Xij for i = 1, 2, 3 and j = 1, 2, 3. The data is given in the
table below.
Town i          Year j
             1       2       3
1        8,130   9,210   8,870
2        7,420   6,980   8,130
3        9,070   8,550   7,730
Calculate the expected claims from each town for the next year using the
assumptions of Empirical Bayes Credibility Theory model 1. [UK Sept 2014]
12. An insurance company has for five years insured three different types of risk.
The number of policies in the jth year for the ith type of risk is denoted by Pij for i
= 1, 2, 3 and j = 1, 2, 3, 4, 5. The average claim size per policy over all five years
for the ith type of risk is denoted by Xi. The values of Pij and Xi are tabulated
below.
                        Number of policies                  Mean claim size
Risk type i   Year 1  Year 2  Year 3  Year 4  Year 5        $\bar{X}_i$
1                 17      23      21      29      35        850
2                 42      51      60      55      37        720
3                 43      31      62      98     107        900
The insurance company will be insuring 30 policies of type 1 next year and has
calculated the aggregate expected claims to be 25,200 using the assumptions of
Empirical Bayes Credibility Theory Model 2.
Calculate the expected annual claims next year for risks 2 and 3 assuming the
number of policies will be 40 and 110 respectively. [UK April 2015]
13. A shipping insurance company has insured ships for six years, and classifies the
ships it insures into three types.
Let:
Pij be the number of ships insured in the jth year from type i,
Yij be the corresponding number of claims.
The six years of data are summarised as follows:
There are 100 ships of Type 3 to be insured in year seven.

(i) Estimate the number of claims from Type 3 ships in year seven using
empirical Bayes credibility theory (EBCT) Model 2.
The insurance company’s actuary is considering using EBCT Model 1 instead.
(ii) Explain an advantage and a disadvantage of using EBCT Model 1 rather
than EBCT Model 2. [UK Sept 2015]
ANSWERS
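1. (i) (a) E(Xj) = E[E(Xj|θ)] = E[m(θ)] = m.
(b) E(XjXk) = E[E(XjXk|θ)] = E[E(Xj|θ)E(Xk|θ)] = E[m(θ)²] for j ≠ k, since Xj and Xk are conditionally independent given θ.
(c) E(XjXk) = E[m(θ)²] = c + m² ≠ m² = E(Xj)E(Xk) for j ≠ k, because c > 0, so Xj and Xk are not independent.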
(ii) 1214.84
If the estimate of V(m(𝜃)) increases, then the estimate of Z increases and
relatively more weight is put on the data from this particular fleet.
This happens because an increase in V(m(θ)) means an increase in the variability
between fleets and so less emphasis on collateral information.
If Z increases, then Z × 79.69 + (1 − Z) × 62.8 also increases. The credibility
premium moves closer to X̄ = 79.69 and, since this is greater than the estimated value of
m, this implies an increase in the premium.
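The figure 1214.84 in (ii) can be reproduced with a short sketch. It assumes the usual EBCT Model 2 credibility factor Z = ΣPj / (ΣPj + E[s²(θ)]/V[m(θ)]) and takes the three estimates quoted in the question as given; the variable names are illustrative only.

```python
# EBCT Model 2 premium for question 1(ii), using the estimates supplied
# in the question rather than re-estimating them from the ten fleets.
claims = [1000, 1200, 1500, 1400]   # total amount claimed, years 1 to 4
cars = [15, 16, 18, 15]             # number of cars (risk volume), years 1 to 4

m_hat = 62.8        # estimate of m = E[m(theta)]
es2_hat = 106.32    # estimate of E[s^2(theta)]
vm_hat = 5.8        # estimate of V[m(theta)]
cars_next_year = 16

total_volume = sum(cars)                    # 64 car-years
x_bar = sum(claims) / total_volume          # average claim per car-year
z = total_volume / (total_volume + es2_hat / vm_hat)
premium_per_car = z * x_bar + (1 - z) * m_hat
premium = cars_next_year * premium_per_car

print(round(z, 4), round(x_bar, 2), round(premium, 2))
```

Raising `vm_hat` in this sketch increases `z`, which illustrates the comment above about the weight placed on the fleet's own data.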
2. $\frac{1}{2}\bar{x} + \frac{1}{2}\cdot\frac{1}{2}$, where $\bar{x} = (x_1 + x_2)/2$
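A sketch of the derivation behind this answer, assuming the prior density in the question reads f(λ) = 2e^{-2λ}, i.e. a Gamma(1, 2) prior:

```latex
\begin{aligned}
f(\lambda \mid x_1, x_2) &\propto e^{-2\lambda}\lambda^{x_1 + x_2} \times 2e^{-2\lambda}
  \propto \lambda^{x_1 + x_2} e^{-4\lambda}, \\
\lambda \mid x_1, x_2 &\sim \text{Gamma}(x_1 + x_2 + 1,\ 4), \\
E(\lambda \mid x_1, x_2) &= \frac{x_1 + x_2 + 1}{4}
  = \tfrac{1}{2}\,\bar{x} + \tfrac{1}{2}\cdot\tfrac{1}{2},
  \qquad \bar{x} = \frac{x_1 + x_2}{2}.
\end{aligned}
```

Under quadratic loss the Bayesian estimate is the posterior mean, so this is a credibility estimate with Z = 1/2 and prior mean 1/2.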
3. (i) (a) This is of the form Zx̄ + (1 − Z)125, where 125 is the
prior mean for θ.
(c) If the variance of 25 is increased, then the value of Z would decrease, and the
credibility estimate would move closer to the prior mean. This makes sense, since
increasing this variance means that the claim amounts within each risk are more
variable, and so we should put relatively less weight on past data.
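The key does not list a value for (i)(b). As a sketch, assuming the standard normal/normal result Z = n/(n + σ²/τ²) with n = 5, σ² = 25 and τ² = 36:

```latex
Z = \frac{n}{n + \sigma^2/\tau^2} = \frac{5}{5 + 25/36} = \frac{36}{41} \approx 0.878,
\qquad
Z\bar{x} + (1 - Z)\,125 \approx 0.878 \times 122 + 0.122 \times 125 \approx 122.37 .
```

The same formula shows the effect described in (c): a larger σ² increases σ²/τ² and so lowers Z.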
(ii) (a) Z = 0.8818 and credibility premium = 123.03
(b) This is similar to the value obtained in (i), so the assumptions made in the
prior appear not to be inappropriate.
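The values in (ii)(a) can be checked with the sketch below. It assumes the usual EBCT Model 1 estimators (within-risk variance averaged over risks, and the variance of the risk means reduced by E[s²(θ)]/n); the variable names are illustrative only.

```python
# EBCT Model 1 from the summary statistics in question 3(ii).
n_years = 5
risk_means = [122, 164, 106]        # mean claim for risks 1 to 3
within_ss = [2848, 1628, 1887]      # sum over years of squared deviations from each risk mean

n_risks = len(risk_means)
overall_mean = sum(risk_means) / n_risks                  # estimate of E[m(theta)]
es2 = sum(within_ss) / (n_risks * (n_years - 1))          # estimate of E[s^2(theta)]
var_of_means = sum((x - overall_mean) ** 2 for x in risk_means) / (n_risks - 1)
vm = var_of_means - es2 / n_years                         # estimate of V[m(theta)]

z = n_years / (n_years + es2 / vm)
premium_risk1 = z * risk_means[0] + (1 - z) * overall_mean

print(round(z, 4), premium_risk1)
```

The premium from this sketch agrees with the quoted 123.03 up to rounding of intermediate values.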
4. (i) The conventional approach uses only data from the risk itself. The credibility approach
combines this with information from other sources using a credibility premium of the form
ZX̄ + (1 − Z)μ.
(ii) Bayes
Advantage: Not an approximation.
Disadvantage: Have to assume full distribution is correct.
EBCT
Advantage: Can be used when distribution is not known.
Disadvantage: An approximation. May not take account of tail of distribution
(iii)
5. (i) 112.75
(ii) The average number of pieces of mail is assumed to be growing each year.
We need to adjust the data to take account of this. Two approaches are:
• Convert the data into “Year 4” values by increasing by 10% p.a. and then
applying the methodology above; OR
• Recognise the lower volume of data in earlier years, by applying a risk volume
to each year and using EBCT model 2. If the risk volume for year 4 is 1, then the
risk volume for year 3 is 1/1.1 and year 2 is 1/1.21 etc.
(iii) 136.32
(iv) The main difference is that the maximum likelihood estimate approach
considers the data for Q1 in isolation, whereas the EBCT approach assumes that
data from other quarters come from a related distribution and so can tell us
something about Q1.
Specifically, the EBCT approach assumes that the mean volume of unsolicited
mail for each quarter is itself a sample from a common distribution. Hence whilst
each quarter has a different mean, they provide some information about the
population from which the mean is drawn.
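The maximum likelihood figure in (iii) can be reproduced as follows. This is a sketch that assumes the quarter-1 count in year j is Poisson with mean μ × 1.1^(j−1); μ and the variable names are introduced here for illustration and do not appear in the question.

```python
# Maximum likelihood estimate for question 5(iii) under 10% p.a. growth.
q1_counts = [98, 117, 124]     # Y_11, Y_12, Y_13
growth = 1.10

# Relative size of the Poisson mean in years 1, 2, 3: 1, 1.1, 1.21
growth_factors = [growth ** j for j in range(len(q1_counts))]

# Setting the derivative of the Poisson log-likelihood to zero gives
# mu_hat = (sum of counts) / (sum of growth factors).
mu_hat = sum(q1_counts) / sum(growth_factors)

# Roll the year-1 mean forward three years to year 4.
year4_estimate = mu_hat * growth ** len(q1_counts)

print(round(year4_estimate, 2))
```

Only the quarter-1 row is used, which is the contrast with the EBCT approach drawn in (iv).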
6. (i) 127.8, 89.6 and 149.1

(ii) The data show that the variation within risks is relatively low (the Si² are low,
especially for the 2nd and 3rd risks) but there seems to be quite a high variation
between the average claims on the risks.
With the Si² being low, this variation cannot be explained just by variability
in the claims, and must be due to variability in the underlying parameter.
This means that we can put relatively little weight on the information provided
by the data set as a whole, and must put more on the data from the individual
risks, leading to a relatively high credibility factor.
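The size of the credibility factor can be seen from the arithmetic, sketched here with the usual EBCT Model 1 estimators (N = 3 risks, n = 7 years):

```latex
\begin{aligned}
\widehat{E[m(\theta)]} &= \bar{X} = \tfrac{1}{3}(127.9 + 88.9 + 149.7) \approx 122.17, \\
\widehat{E[s^2(\theta)]} &= \tfrac{1}{3}(335.1 + 65.1 + 33.9) = 144.7, \\
\widehat{V[m(\theta)]} &= \tfrac{1}{2}\sum_{i=1}^{3}(\bar{X}_i - \bar{X})^2 - \tfrac{1}{7}\times 144.7
  \approx 948.81 - 20.67 = 928.14, \\
Z &= \frac{7}{7 + 144.7/928.14} \approx 0.978 .
\end{aligned}
```

With Z ≈ 0.978, each premium Z X̄i + (1 − Z) X̄ stays very close to the risk's own mean, giving 127.8, 89.6 and 149.1.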
7. (i) E(s²(θ)) represents the average variability of claim numbers from year to year
for a single risk.
V(m(θ)) represents the variability of the average claim numbers for different risks,
i.e. the variability of the means from risk to risk.
(ii) We can see that it is the relative values of E(s²(θ)) and V(m(θ)) that matter. In
particular, if E(s²(θ)) is high relative to V(m(θ)), this means that there is more
variability from year to year than from risk to risk. More credibility can be placed
on the data from other risks leading to a lower value of Z.
On the other hand, if V(m(θ)) is relatively higher this means there is greater
variation from risk to risk, so that we can place less reliance on the data as a
whole leading to a higher value of Z.
8. (i) 48.57 (ii) 47.33, 50.81 and 30.66

(iii) The main differences are that:
• The approach under (i) makes use of prior information about the distribution
of θ1 whereas the approach in (ii) does not.
• The approach under (i) uses only the information from the first category to
produce a posterior estimate, whereas the approach under (ii) assumes that
information from the other categories can give some information about category
1.
• The approach under (i) makes precise distributional assumptions about the
number of claims (i.e. that they are Poisson distributed) whereas the approach
under (ii) makes no such assumptions.
(iv) The insurance policies were newly introduced 5 years ago, and it is therefore
likely that the volume of policies written has increased (or at least not been
constant) over time. The assumption that the number of claims has a Poisson
distribution with a fixed mean is therefore unlikely to be accurate, as one would
expect the mean number of claims to be proportional to the number of policies.
Let Pij be the number of policies in force for risk i in year j. Then the models can
be amended as follows:
The approach in (i) can be taken assuming that the mean number of claims
in the Poisson distribution is Pijθi.
The approach in (ii) can be generalised by using EBCT Model 2 which explicitly
incorporates an adjustment for the volume of risk.
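The Bayes estimate of 48.57 in part (i) follows from the gamma/Poisson conjugate update. The sketch below assumes the gamma prior is parameterised by shape α and rate β (mean α/β, variance α/β²); the variable names are illustrative only.

```python
# Gamma prior with Poisson claim counts for category 1 (question 8(i)).
prior_mean, prior_var = 50.0, 25.0
claims_cat1 = [30, 43, 49, 58, 60]

beta = prior_mean / prior_var     # rate  = mean / variance = 2
alpha = prior_mean * beta         # shape = mean * rate     = 100

# Posterior is Gamma(alpha + total claims, beta + number of years).
post_alpha = alpha + sum(claims_cat1)
post_beta = beta + len(claims_cat1)
bayes_estimate = post_alpha / post_beta    # posterior mean, optimal under quadratic loss

# The same value written as a credibility estimate Z * sample mean + (1 - Z) * prior mean.
z = len(claims_cat1) / post_beta
sample_mean = sum(claims_cat1) / len(claims_cat1)
credibility_form = z * sample_mean + (1 - z) * prior_mean

print(round(bayes_estimate, 2), round(z, 4))
```

The credibility factor here, 5/7, comes entirely from the prior and the number of years, whereas the EBCT factor in (ii) is estimated from the spread of the data across categories.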
9. (i) λ̂ = 0.226686 (ii) 45.16

(iii) The main differences are:
• The approach in (i) uses only the data from type 1 policies; the approach in
(ii) uses a weighted average of the data from type 1 policies and the overall data.
• The approach in (i) makes a precise distributional assumption about claims (i.e.
that they are Poisson distributed). This assumption is not used in approach (ii).
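The estimate in (ii) can be reproduced from the summary table with the sketch below. It assumes the usual EBCT Model 2 estimators, including the quantity P* = (1/(Nn − 1)) Σ P̄i(1 − P̄i/P̄); the variable names are illustrative only.

```python
# EBCT Model 2 from the summary statistics in question 9.
n_years = 5
p_bar = [697, 295, 515]                      # total buildings insured, by type
x_bar_i = [0.226686, 0.237288, 0.330097]     # claims per building, by type
ss_within = [1.527016, 0.96605, 4.53253]     # sum_j P_ij (X_ij - X_bar_i)^2
ss_total = [2.502737, 1.178133, 6.775614]    # sum_j P_ij (X_ij - X_bar)^2
x_bar = 0.264101                             # overall claims per building
buildings_next_year = 191

n_risks = len(p_bar)
p_total = sum(p_bar)
cells = n_risks * n_years - 1                # Nn - 1

es2 = sum(ss_within) / (n_risks * (n_years - 1))          # estimate of E[s^2(theta)]
p_star = sum(p * (1 - p / p_total) for p in p_bar) / cells
vm = (sum(ss_total) / cells - es2) / p_star               # estimate of V[m(theta)]

z1 = p_bar[0] / (p_bar[0] + es2 / vm)
claims_per_building = z1 * x_bar_i[0] + (1 - z1) * x_bar
expected_claims = buildings_next_year * claims_per_building

print(round(z1, 4), round(expected_claims, 2))
```

Replacing `z1` by 1 in this sketch recovers the maximum likelihood approach of part (i), which ignores the other two building types.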
10. (i) λ̂ = 20.1 (ii) 17.3, 40.9 and 102.6

(iii) The corresponding estimates based on our computed λ̂ are 20.1, 40.2 and
100.5.
The estimates are remarkably similar. The biggest difference is for type 1
buildings, where the maximum likelihood estimate gives a lower weight to the
data from that risk, but the credibility estimate gives greater weight.
(iv) The main limitation is that the model in (ii) does not take account of the
volume of buildings covered, which will probably vary from year to year.
11. (i) 8587.2, 7724 and 8385.5
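These three figures can be reproduced directly from the raw data. The sketch below assumes the usual EBCT Model 1 estimators and uses Python's `statistics` module for the sample means and variances; the variable names are illustrative only.

```python
from statistics import mean, variance

# Aggregate claims for towns 1 to 3 over years 1 to 3 (question 11).
claims = [
    [8130, 9210, 8870],
    [7420, 6980, 8130],
    [9070, 8550, 7730],
]
n_years = len(claims[0])

town_means = [mean(row) for row in claims]
overall_mean = mean(town_means)                        # estimate of E[m(theta)]
es2 = mean([variance(row) for row in claims])          # estimate of E[s^2(theta)]
vm = variance(town_means) - es2 / n_years              # estimate of V[m(theta)]

z = n_years / (n_years + es2 / vm)
premiums = [z * m + (1 - z) * overall_mean for m in town_means]

print(round(z, 4), [round(p, 1) for p in premiums])
```

With only three years of data and sizeable year-to-year variation, Z comes out near 0.70, so each town's premium is pulled noticeably towards the overall mean.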
12. 30,200 and 97,028
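One way to reach these figures is to back out the ratio E[s²(θ)]/V[m(θ)] from the known type 1 premium and reuse it for the other types. The sketch assumes the EBCT Model 2 form Zi = P̄i / (P̄i + k) with a common k; `k` and `expected_claims` are names introduced here for illustration.

```python
# Question 12: recover k = E[s^2(theta)] / V[m(theta)] from the type 1 premium.
policies = [
    [17, 23, 21, 29, 35],
    [42, 51, 60, 55, 37],
    [43, 31, 62, 98, 107],
]
mean_claim = [850, 720, 900]        # mean claim size per policy, by risk type
premium_type1 = 25200 / 30          # known credibility premium per policy for type 1

volumes = [sum(row) for row in policies]     # total policies by type
overall_mean = sum(p * x for p, x in zip(volumes, mean_claim)) / sum(volumes)

# Solve premium_type1 = Z1 * mean_claim[0] + (1 - Z1) * overall_mean for Z1.
z1 = (premium_type1 - overall_mean) / (mean_claim[0] - overall_mean)
# Z1 = P1 / (P1 + k)  =>  k = P1 * (1 - Z1) / Z1.
k = volumes[0] * (1 - z1) / z1


def expected_claims(risk_index, n_policies):
    """Aggregate expected claims next year for the given risk type (0-based index)."""
    z = volumes[risk_index] / (volumes[risk_index] + k)
    per_policy = z * mean_claim[risk_index] + (1 - z) * overall_mean
    return n_policies * per_policy


print(round(expected_claims(1, 40)), round(expected_claims(2, 110)))
```

As a consistency check, `expected_claims(0, 30)` returns the 25,200 given in the question.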
13. (i) 35.60

(ii) Disadvantages of Model 1
Does not make use of the risk volumes
Requires more assumptions about the data
Advantages of Model 1
Requires less information (does not take account of risk volumes)
EBCT Model 1 is likely to be computationally more straightforward

