CS1A Workbook For Sept 2020 Exams Sankhyiki
INDEX
1. Random Variable
2. Probability Distribution
3. Generating Functions
4. Joint Distributions
7. Point Estimation
11. Sampling
14. GLM
15. EBCT
16. Tables
ASSIGNMENT – 1
RANDOM VARIABLE
Question 1. For each of the following, determine whether the given values can serve
as the probability distribution of a random variable with the given range:
!!!
a) 𝑓 𝑥 = !
for 𝑥 = 1,2,3,4,5;
!!
b) 𝑓 𝑥 = !" for 𝑥 = 1,2,3,4;
!
c) 𝑓 𝑥 = ! for 𝑥 = 1,2,3,4,5;
!
d) 𝑓 𝑥 = ! for 𝑥 = 1,2,3,4;
Question 2. Verify that $f(x) = \frac{2x}{k(k+1)}$ for $x = 1, 2, 3, \ldots, k$ can serve as the probability
distribution of a random variable with the given range.
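A quick numerical check of Question 2 can be sketched as follows. It assumes the function is $f(x) = \frac{2x}{k(k+1)}$ (the formula is damaged in the source, so this reading is an assumption), and the helper names `pmf` and `is_valid_pmf` are illustrative only. Exact fractions avoid rounding issues; this is a check for particular k, not a proof.

```python
from fractions import Fraction

def pmf(x: int, k: int) -> Fraction:
    """Assumed form of the function in Question 2: f(x) = 2x / (k(k+1))."""
    return Fraction(2 * x, k * (k + 1))

def is_valid_pmf(k: int) -> bool:
    """True if all values are non-negative and they sum to exactly 1."""
    values = [pmf(x, k) for x in range(1, k + 1)]
    return all(v >= 0 for v in values) and sum(values) == 1

if __name__ == "__main__":
    for k in (1, 2, 5, 10, 100):
        print(k, is_valid_pmf(k))
```

The general argument uses $\sum_{x=1}^{k} x = k(k+1)/2$, which makes the sum equal to 1 for every k.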
Question 3. For each of the following, determine 𝑐 so that the function can serve as the
probability distribution of a random variable with the given range:
a) 𝑓 𝑥 = 𝑐𝑥 for 𝑥 = 1,2,3,4,5;
𝟓
b) 𝑓 𝑥 = 𝑐 𝒙
for 𝑥 = 1,2,3,4,5;
! !
c) 𝑓 𝑥 = 𝑐 !
for 𝑥 = 1,2,3, ….
!
d) 𝑓 𝑥 = 𝑐𝑥 for 𝑥 = 1,2,3, … , 𝑘
Question 4. For each of the following, determine whether the given values can serve
as the values of the distribution function of a random variable with the range x = 1, 2, 3,
and 4:
a) $F(1) = 0.3$, $F(2) = 0.5$, $F(3) = 0.8$ and $F(4) = 1.2$;
b) $F(1) = 0.5$, $F(2) = 0.4$, $F(3) = 0.7$ and $F(4) = 1.0$;
c) $F(1) = 0.25$, $F(2) = 0.61$, $F(3) = 0.83$ and $F(4) = 1.0$.
Question 5. If X has the distribution function
0 for 𝑥 < 1
!
!
for 1 ≤ 𝑥 < 4
!
F(x) = !
for 4 ≤ 𝑥 < 6
!
!
for 6 ≤ 𝑥 < 10
1 for 𝑥 ≥ 10
Question 7. Given that the discrete random variable X has the distribution function
!
; 𝑥 = 1,2,3
𝑓 𝑥 = ! Find 𝐹(𝑥).
0 elsewhere
𝑘𝑒 !!! ; 𝑥 > 0
𝑓 𝑥 =
0 ; otherwise
Question 15: Find a probability density function for the random variable whose distribution
function is given by
$$F(x) = \begin{cases} 0 & \text{for } x \le 0 \\ x & \text{for } 0 < x < 1 \\ 1 & \text{for } x \ge 1 \end{cases}$$
$$F(x) = \begin{cases} 1 - (1 + x)e^{-x} & \text{for } x > 0 \\ 0 & \text{for } x \le 0 \end{cases}$$
Question 21. The probability density of the continuous random variable X is given by
$$f(x) = \begin{cases} \frac{1}{5} & \text{for } 2 < x < 7 \\ 0 & \text{elsewhere} \end{cases}$$
Find $P(3 < X < 5)$.
Question 22. Find the distribution function of the random variable X whose
probability density is given by
!
!
for 0 < 𝑥 < 1
𝑓 𝑥 = ! for 2 < 𝑥 < 4
!
0 elsewhere
Question 23. Find the distribution function of the random variable X whose
probability density is given by
$$f(z) = \begin{cases} kze^{-z^2} & \text{for } z > 0 \\ 0 & \text{for } z \le 0 \end{cases}$$
Find k.
Question 28. If the probability density of X is given by
$$f(x) = \begin{cases} 2x^{-3} & \text{for } x > 1 \\ 0 & \text{elsewhere} \end{cases}$$
check whether its mean and variance exist.
Question 29. A random variable X has the following probability distribution
X -2 -1 0 1 2
P(X) 1/6 p 1/4 p 1/6
Question 30. If X is the number of points rolled with a balanced die, find the expected
value of $g(X) = 2X^2 + 1$.
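The expectation in Question 30 can be checked with exact arithmetic. The sketch below assumes $g(X) = 2X^2 + 1$ (the exponent is damaged in the source; 2 is the reading that agrees with Ans.30), and `expected_value` is an illustrative helper name.

```python
from fractions import Fraction

def expected_value(g, outcomes):
    """E[g(X)] for X uniform on the given finite set of outcomes."""
    p = Fraction(1, len(outcomes))
    return sum(g(x) * p for x in outcomes)

faces = range(1, 7)  # a balanced six-sided die
e_g = expected_value(lambda x: 2 * x**2 + 1, faces)

if __name__ == "__main__":
    print(e_g)
```

The same result follows from linearity: $E(2X^2 + 1) = 2E(X^2) + 1$ with $E(X^2) = 91/6$.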
Question 31. What is the expectation of the sum of points on 2 unbiased dice?
Question 32. A lot of 12 television sets includes 2 with white cords. If three of the sets
are chosen at random for shipment to a hotel, how many sets with white
cords can the shipper expect to send to the hotel?
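Question 32 describes sampling without replacement, so one way to check it is through the hypergeometric distribution. This is a minimal sketch; `hypergeom_pmf` is an illustrative helper, not something the workbook defines.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(x marked items in a sample of n drawn without replacement
    from N items of which K are marked)."""
    return Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))

N, K, n = 12, 2, 3  # 12 sets, 2 with white cords, 3 shipped
pmf = {x: hypergeom_pmf(x, N, K, n) for x in range(0, min(K, n) + 1)}
expected = sum(x * p for x, p in pmf.items())

if __name__ == "__main__":
    print(pmf, expected)
```

The shortcut $E(X) = nK/N = 3 \cdot 2/12$ gives the same value, in agreement with Ans.32.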
Question 35. Find the expected value of the random variable Y whose probability
density is given by
$$f(y) = \begin{cases} \frac{1}{2}(y + 1) & \text{for } -1 < y < 1 \\ 0 & \text{elsewhere} \end{cases}$$
Question 37. Certain coded measurements of the pitch diameter of threads of a fitting
have the probability density
$$f(x) = \begin{cases} \dfrac{4}{\pi(1 + x^2)} & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere} \end{cases}$$
Find the expected value of this random variable.
$$f(x) = \begin{cases} e^{-x} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
Question 40. 𝑓 𝑥 = 𝑘(1 + 𝑥 ! )!! , 𝑥 > 0. Find the value of k for which f(x) will be the pdf
of a continuous random variable X. Find F(x).
Question 41. Let X be a random variable denoting the hours of life in an electric light
bulb. Suppose X is distributed with density function
!
𝑓 𝑥 = !""" 𝑒 !!/!""" for 𝑥 > 0. Find the expected life time of such a bulb.
Question 42. $f(x) = \frac{1}{2}(x + 1)$; $-1 < x < 1$. Find the variance of X.
𝑘𝑥 1 − 𝑎𝑥 ! , 0 ≤ 𝑥 ≤ 1
𝑓 𝑥 =
0, otherwise
Question 45. A claim size distribution is modeled using a simple distribution with
density of the form
$$f(x) = \begin{cases} k(100 - x), & 0 \le x \le 100 \\ 0, & \text{otherwise} \end{cases}$$
$$f(x) = \begin{cases} 2xe^{-x^2} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases}$$
and $Y = X^2$, find
a) The probability density function of Y.
b) The distribution function of Y.
!!
!
if x ≤ 0
𝑓 𝑥 = !!!
Find 𝐸[ 𝑋 ].
!
if x > 0
Question 51. The probability density for damage claims X paid by the Automobile
insurance company on collision insurance is given below.
!! !
𝑓(𝑥) = ! (!! !! ! ) for 𝑥 ≤ 𝑎
=0 otherwise
Obtain the mean and variance of X.
Satya Niketan | North Campus | Mumbai | Kolkata | Jaipur | Siliguri
www.sankhyiki.in
+91-9711150002
Question 52. A claim size distribution is modeled using a simple distribution with
density of the form
$$f(x) = \begin{cases} k(50 - x), & 0 \le x \le 50 \\ 0, & \text{otherwise} \end{cases}$$
(i) Find k.
(ii) Determine the mean of this claim size distribution.
(iii) Calculate the probability that an individual claim size is greater
than 25.
(iv) Calculate the probability that an individual claim size is less than
30 given that it is greater than 25.
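The four parts of Question 52 reduce to integrals of a polynomial, so they can be checked exactly. The sketch below assumes the density is $k(50 - x)$ on $[0, 50]$ as stated; `integral_of_density_shape` and `survival` are illustrative helper names.

```python
from fractions import Fraction

UPPER = 50  # upper limit of the claim size, from the question

def integral_of_density_shape(lo, hi, power=0):
    """Exact integral of x**power * (50 - x) over [lo, hi]."""
    def antiderivative(x):
        x = Fraction(x)
        return UPPER * x ** (power + 1) / (power + 1) - x ** (power + 2) / (power + 2)
    return antiderivative(hi) - antiderivative(lo)

# (i) k makes the density integrate to 1
k = 1 / integral_of_density_shape(0, UPPER)
# (ii) mean = integral of x * f(x)
mean = k * integral_of_density_shape(0, UPPER, power=1)

def survival(t):
    """P(X > t) for 0 <= t <= 50."""
    return k * integral_of_density_shape(t, UPPER)

# (iii) P(X > 25)
p_gt_25 = survival(25)
# (iv) P(X < 30 | X > 25) = P(25 < X < 30) / P(X > 25)
p_lt_30_given_gt_25 = (survival(25) - survival(30)) / survival(25)

if __name__ == "__main__":
    print(k, mean, p_gt_25, p_lt_30_given_gt_25)
```

Using `Fraction` keeps every intermediate value exact, which makes the results easy to compare with a hand calculation.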
Question 53. A random sample of size n is taken from a distribution with probability
density function:
!
𝑓 𝑥 = (!!!)!!! , 0 < 𝑥 < ∞
Question 56. Let X be a discrete random variable with the following probability
distribution:
X 0 1 2 3
P(X = x) 0.4 0.3 0.2 0.1
ANSWERS
Ans.10. (i) 1/5 (ii) 1/7 [Hint: (ii) $P(1 < X < 2 \mid X > 1)$]
0 𝑥 < 1
!
!"
1 ≤ 𝑥 < 2
!
Ans.11. 𝐹(𝑥) = 2 ≤ 𝑥 < 3
!"
!
!"
3 ≤ 𝑥 < 4
!"
!"
4 ≤ 𝑥 < 5
! !
Ans.12. (i) 𝑎 = ! (ii) !
!
Ans.16 (i) 1 − 3𝑒 !! (ii) !
(1 − 2𝑒 !! ) (iii) 5𝑒 !!
Ans.27. 𝑘 = 2
Ans.28. E(X) = 2; V(X) = ∞, which is not finite, so the variance does not exist
Ans.29. (i) p=5/24, (ii) 2, 17/2
Ans.30. 94/3
Ans.31. 𝐸 𝑋 = 7
Ans.32. 1/2
Ans.33. $E(X) = 11/2$, $E(X^2) = 93/2$, $E[(2X+1)^2] = 209$
Ans.34. 3
Ans.35. 1/12
Ans.36. (i) 1/42 (ii) 31/8
Ans.37. $\frac{2}{\pi} \log 2$
Ans.50. E [|X|] = 1
!!
Ans.51. E(X) = 0 V(X) = !
(4 − 𝜋)
ASSIGNMENT – 2
PROBABILITY DISTRIBUTION
Question 1. Let $X \sim B(n, p)$ with $n = 25$ and $p = 0.2$. Find $P[X < \mu - 2\sigma]$.
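A sketch of the calculation for Question 1, reading the parameters as n = 25 and p = 0.2. The event $X < \mu - 2\sigma$ is tested without taking a square root so that the comparison stays exact; `binom_pmf` and `below_two_sigma` are illustrative names.

```python
from fractions import Fraction
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability P(X = x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 25, Fraction(1, 5)
mu = n * p             # mean np
var = n * p * (1 - p)  # variance np(1-p)

def below_two_sigma(x):
    # x < mu - 2*sigma  <=>  mu - x > 0 and (mu - x)^2 > 4*var
    return mu - x > 0 and (mu - x) ** 2 > 4 * var

prob = sum(binom_pmf(x, n, p) for x in range(n + 1) if below_two_sigma(x))

if __name__ == "__main__":
    print(mu, var, float(prob))
```

Here $\mu = 5$ and $\sigma = 2$, so the event is $X < 1$, that is, $X = 0$.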
Question 6. If 1% Gillette blades are defective, what is the probability that a carton of
50 Gillette blades has at least 2 defective blades?
Question 7. The average no. of calls arriving at a telephone exchange is 30 per hour.
What is the probability that
Question 8. Suppose that flaws in plywood occur at an average of one flaw per 50
sq. ft. What is the probability that a 4×8 ft. sheet will have
(i) no flaws (ii) at most one flaw?
Question 9. An insurance company finds that 0.005 of the population die from a certain
kind of accident each year. What is the probability that the company must
pay off on three or more of 10,000 insured risks against such accidents in
a given year?
Question 10. Assume that the number of fatal car accidents in a certain state obeys a
Poisson distribution with an average of one per day. What is the
probability of more than 10 such accidents in a week?
Question 11. A die is cast until a 6 appears. What is the probability that it must be cast
more than five times?
Question 12. A marksman is required to shoot at a target until he scores 5 bull's-eyes; let X be
the number of shots required. The probability that he hits the bull's-eye on any trial is 0.3.
What is the probability that he requires 8 shots?
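Question 12 is a negative binomial setting: the fifth success must fall on the eighth trial. A minimal sketch, with `neg_binom_pmf` as an illustrative helper name:

```python
from math import comb

def neg_binom_pmf(n_trials, r, p):
    """P(the r-th success occurs exactly on trial n_trials)."""
    # r-1 successes among the first n_trials-1 trials, then a success.
    return comb(n_trials - 1, r - 1) * p**r * (1 - p) ** (n_trials - r)

prob = neg_binom_pmf(8, 5, 0.3)

if __name__ == "__main__":
    print(round(prob, 5))
```

The same function applies to any "k-th success on the n-th trial" question in this assignment, for example `neg_binom_pmf(10, 3, 0.4)` for Question 16.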
Question 13. Find the probability of getting five heads and seven tails in 12 flips of a
balanced coin.
Question 14. Find the probability that seven of 10 persons will recover from a tropical
disease if we can assume independence and the probability is 0.80 that
any one of them will recover from the disease.
Question 15. If X has the discrete uniform distribution $f(x) = \frac{1}{k}$ for $x = 1, 2, \ldots, k$, show
that (a) its mean is $\mu = \frac{k+1}{2}$
(b) its variance is $\sigma^2 = \frac{k^2 - 1}{12}$
Question 16. If the probability is 0.40 that a child exposed to a certain contagious
disease will catch it, what is the probability that the tenth child exposed to
the disease will be the third to catch it?
Question 17. If the probability is 0.75 that an applicant for a driver’s license will pass
the road test on any given try, what is the probability that an applicant
will finally pass the test on the fourth try?
Question 19. Among the 120 applicants for a job, only 80 are actually qualified. If five of
the applicants are randomly selected for an in-depth interview, find the
probability that only two of the five will be qualified for the job by using
Question 20. If 2% of the books bound at a certain bindery have defective binding, use
the Poisson approximation to the binomial distribution to determine the
probability that five of 400 books bound by this bindery will have
defective bindings.
Question 21. Records show that the probability is 0.00005 that a car will have a flat tire
while crossing a certain bridge. Use the Poisson distribution to
approximate the binomial probabilities that, among 10,000 cars crossing
this bridge,
a. exactly two will have a flat tire;
b. at most two will have a flat tire;
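The Poisson approximation in Question 21 uses $\lambda = np$. A short sketch of both parts, with `poisson_pmf` as an illustrative helper:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Poisson probability P(X = x) with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

lam = 10_000 * 0.00005  # lambda = n * p
p_exactly_two = poisson_pmf(2, lam)
p_at_most_two = sum(poisson_pmf(x, lam) for x in range(3))

if __name__ == "__main__":
    print(round(p_exactly_two, 4), round(p_at_most_two, 4))
```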
Question 22. The average number of trucks arriving on any one day at a truck depot in
a certain city is known to be 12. What is the probability that on a given
day fewer than nine trucks will arrive at the depot?
Question 23. A certain kind of sheet metal has, on the average, five defects per 10
square feet. If we assume a Poisson distribution, what is the probability
that a 15- square foot sheet of the metal will have at least six defects?
Question 24. Derive the formulas for the mean and the variance of the Poisson
distribution by first evaluating E(X) and E [𝑋(𝑋 − 1)].
Question 25. If the probability is 0.75 that a person will believe a rumor about the
transgressions of a certain politician, find the probabilities that
a. The eighth person to hear the rumour will be the fifth to believe
it;
b. The fifteenth person to hear the rumour will be the tenth to
believe it.
Question 26. If the probabilities of having a male or female child are both 0.50, find the
probabilities that
Question 27. When taping a television commercial, the probability is 0.30 that a certain
actor will get his lines straight on any one take. What is the probability that he
will get his lines straight for the first time on the sixth take?
Question 28. Records show that the probability is 0.0012 that a person will get food
poisoning spending a day at a certain state fair. Use the Poisson
approximation to the binomial distribution to find the probability that
among 1,000 persons attending the fair at most two will get food
poisoning.
Question 29. Among the 16 applicants for a job 10 have college degrees. If three of the
applicants are randomly chosen for interviews, what are the probabilities
that
a. None has a college degree; b. One has a college degree;
c. Two have college degrees; d. All three have college degrees.
Question 30. Find the probabilities that the value of a random variable will exceed 4 if it
has a gamma distribution with
a. $\alpha = 2$ and $\lambda = 3$; b. $\alpha = 3$ and $\lambda = 4$.
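For a gamma distribution with integer shape, the tail probability has a closed form, $P(X > x) = e^{-\lambda x} \sum_{j=0}^{\alpha-1} (\lambda x)^j / j!$. The sketch below assumes that λ in Question 30 is a rate parameter, which is the reading consistent with Ans.30; `gamma_survival_integer_shape` is an illustrative name.

```python
from math import exp, factorial

def gamma_survival_integer_shape(x, alpha, lam):
    """P(X > x) for a gamma distribution with positive integer shape alpha
    and rate lam (assumed parameterisation)."""
    return exp(-lam * x) * sum((lam * x) ** j / factorial(j) for j in range(alpha))

p_a = gamma_survival_integer_shape(4, alpha=2, lam=3)
p_b = gamma_survival_integer_shape(4, alpha=3, lam=4)

if __name__ == "__main__":
    print(p_a, p_b)
```

Both probabilities are very small, which is why the answer key reports them rounded to four decimal places.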
Question 31. Find the probabilities that random variable having the standard normal
distribution will take on a value
Question 32. Suppose that the time in days between service calls on an office-copying
machine follows an exponential distribution with mean 50 days.
i. What is the probability that the time until the machine again
requires service exceeds 60 days?
ii. Find the probability that the time until the machine again requires
service is longer than 50 + 2σ, where σ is the standard deviation of
the distribution.
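Both parts of Question 32 use the exponential survival function $P(T > t) = e^{-t/\text{mean}}$, together with the fact that the standard deviation of an exponential distribution equals its mean. A minimal sketch (`exp_survival` is an illustrative name):

```python
from math import exp

def exp_survival(t, mean):
    """P(T > t) for an exponential distribution with the given mean."""
    return exp(-t / mean)

mean = 50
sigma = mean  # for an exponential distribution, sd equals the mean
p_i = exp_survival(60, mean)
p_ii = exp_survival(mean + 2 * sigma, mean)

if __name__ == "__main__":
    print(round(p_i, 4), round(p_ii, 4))
```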
Question 33. (a) Assume that 40% of the policyholders in a certain metropolitan area
have type A blood. If the distribution of the blood donors among the
policy holders entering a blood bank on any given day is considered
random,
(b) Suppose that the amount of time a customer spends at a cash counter
in a certain office has an exponential distribution with a mean of six
minutes. Find
ii. The conditional probability that the customer will spend more
than 12 minutes in the cash counter given that the customer has
been there for more than six minutes.
iii. The probability that the customer spends longer than (µ+2𝜎)
minutes where µ and 𝜎 are the mean and standard deviation of the
exponential distribution.
Question 34. The time (in minutes) between telephone calls at an Insurance claims
office has the following exponential distribution:
$$f(x) = \frac{1}{3} e^{-x/3}, \quad 0 \le x < \infty$$
Question 35. The average time a subscriber spends reading ‘THE HINDU’ is 49
minutes. Assume that the standard deviation is 16 minutes and that the
times are normally distributed.
Question 36. Phone calls arrive at the rate of 48 per hour at the reception desk for an
insurance company. Find
Question 37. 40% of business travelers carry either a cell phone or a laptop. In a sample
of 15 business travelers,
ii. What is the probability that at least three of the travellers have a
cell phone or a laptop?
iii. What is the probability that 12 of the travellers have neither a cell
phone nor a laptop?
Question 40. An insurance company found that only 0.01% of the population is
involved in a certain type of accident each year. If its 1000 policyholders
can be regarded as randomly selected from the population, what is the
probability that not more than two of its clients are involved in such
accidents?
Question 41. In a certain metropolitan city the daily consumption of electric power (in
Million Kilowatt Hour (MKH)) may be regarded as a random variable
having Gamma distribution with parameter (3, 1/2). If the power plant
has a daily capacity of 12 MKH, what is the probability that this power
supply will be inadequate on any given day?
Question 42. On average, 8 calls per hour are received at a telephone board.
Assuming that the number of calls received at the board in a given length
of time follows a Poisson process, find the probability that
ii. At least 2 calls in the next 20 minutes.
Question 43. Consumer demand for milk X, in a metropolitan area, is known to follow
a Gamma distribution with p.d.f.
!! ! !!" ! !!!
𝑓 𝑥 = !"
It is given that the average demand is ‘a’ liters and the modal demand is
‘b’ liters (𝑏 < 𝑎).
Question 44. The random variable 𝑌 = Log 𝑋 has N (10, 4) distribution. Find
a) The p.d.f. of X
b) Mean and variance of X
c) 𝑃(𝑋 ≤ 1000)
Question 45. A very crude model for the distribution of claim size, X, in a particular
situation represents X as a discrete random variable, which takes the
values £5,000, £10,000, and £20,000 with probabilities 0.4, 0.5, and 0.1
respectively.
Calculate the probability that of five randomly selected claims, three are
for £5,000 each and the other two are for larger amounts.
Question 48. If 40% of the mice used in an experiment will become very aggressive
within 1 minute after having been administered an experimental drug,
find the probability that exactly six of the 15 mice that have been
administered the drug will become very aggressive within 1 minute?
Question 49. In a certain city, incompatibility is given as the legal reason in 70% of all
divorce cases. Find the probability that 5 of the next 6 divorce cases filed in
this city will claim incompatibility as the reason.
Question 50. A social scientist claims that only 50% of all high school seniors capable of
doing college work actually go to college. Assuming that this claim is true,
find the probabilities that among 18 high school seniors capable of doing
college work
Question 51. (a) To reduce the standard deviation of the binomial distribution by half,
what change must be made in the number of trials?
Question 52. A and B play a game in which their chances of winning are in the ratio 3:2.
Find A’s chance of winning at least 3 games out of 5 games played.
Question 53. A coffee connoisseur claims that he can distinguish between a cup of
instant coffee and a cup of percolator coffee 75% of the time. It is agreed
that his claim will be accepted if he correctly identifies at least 5 of the 6
cups. Find his chances of having the claim (i) accepted, (ii) rejected, when
he does have the ability he claims.
Question 54. An irregular six-faced die is thrown and the expectation that in 10 throws
it will give five even numbers is twice the expectation that it will give four
even numbers. How many times in 10,000 sets of 10 throws each, would
you expect it to give no even number?
Question 55. A department in a works has 10 machines, which may need adjustment
from time to time during the day. Three of these machines are old; each
having a probability of 1/11 of needing adjustment during the day, and 7
are new, having corresponding probabilities of 1/21. Assuming that no
machine needs adjustment twice on the same day, determine the
probabilities that on a particular day
(ii) If just 2 machines need adjustment, they are of the same type.
(i) If he fires 7 times what is the probability of his hitting the target at
least twice?
(ii) How many times must he fire so that the probability of his hitting
the target at least once is greater than 2/3?
Question 57. In a precision bombing attack there is a 50% chance that any one bomb
will strike the target. Two direct hits are required to destroy the target
completely. How many bombs must be dropped to give a 99% chance or
better of completely destroying the target? [Hint: Probability that out of n
bombs, at least two strike the target, is greater than 0.99]
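The hint in Question 57 can be turned into a direct search for the smallest n. This sketch treats the number of hits as binomial with p = 0.5, as the question states; the function names are illustrative.

```python
def p_at_least_two_hits(n, p=0.5):
    """P(at least two hits among n independent bombs)."""
    return 1 - (1 - p) ** n - n * p * (1 - p) ** (n - 1)

def min_bombs(target=0.99, p=0.5):
    """Smallest n for which P(at least two hits) reaches the target."""
    n = 2
    while p_at_least_two_hits(n, p) < target:
        n += 1
    return n

if __name__ == "__main__":
    print(min_bombs())
```

For p = 0.5 the condition simplifies to $(n + 1)/2^n \le 0.01$, which can also be checked by hand for successive n.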
Question 59. With the usual notations, find p for a binomial variate X, if n = 6 and
9P(X=4) = P(X=2).
Question 60. The mean and variance of binomial distribution are 4 and 4/3
respectively. Find $P(X \ge 1)$.
Question 61. A manufacturer of cotter pins knows that 5% of his product is defective. If
he sells cotter pins in boxes of 100 and guarantees that not more than 10
pins will be defective, what is the approximate probability that a box will
fail to meet the guaranteed quality?
Question 62. A car hire firm has two cars, which it hires out day by day. The number of
demands for a car on each day is distributed as a Poisson distribution with
mean 1.5. Calculate the proportion of days on which (i) neither car is used,
and (ii) the proportion of days on which some demand is refused.
Question 63. An insurance company insures 4,000 people against loss of both eyes in a
car accident. Based on previous data, the rates were computed on the
assumption that on the average 10 persons in 1,00,000 will have car
accident each year that result in this type of injury. What is the probability
that more than 3 of the injured will collect on their policy in a given year?
Question 64. A manufacturer, who produces medicine bottles, finds that 0.1% of the
bottles are defective. The bottles are packed in boxes containing 500
bottles. A drug manufacturer buys 100 boxes from the producer of bottles.
Using Poisson distribution, find how many boxes will contain; (i) no
defective and (ii) at least two defectives.
Question 65. Six coins are tossed 6,400 times. Using the Poisson distribution, find the
approximate probability of getting 6 heads r times. [Hint: $p = 0.5^6$ and n = 6,400]
Question 66. In a book of 520 pages, 390 typographical errors occur. Assuming Poisson
law for the number of errors per page, find the probability that a random
sample of 5 pages will contain no error.
Question 67. Suppose that the number of telephone calls coming into a telephone
exchange between 10 a.m. and 11 a.m. say, X1 is a random variable with
Poisson distribution with parameter 2. Similarly the number of calls
arriving between 11 a.m. and 12 p.m., say, X2 has a Poisson distribution
with parameter 6. If X1 and X2 are independent, what is the probability
that more than 5 calls come in between 10 a.m. and 12 p.m.?
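Question 67 relies on the fact that the sum of independent Poisson variables is Poisson with the summed parameter. A short sketch of the resulting tail probability (`poisson_cdf` is an illustrative helper):

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson distribution with mean lam."""
    return sum(exp(-lam) * lam**x / factorial(x) for x in range(k + 1))

lam_total = 2 + 6  # X1 + X2 is Poisson with parameter 2 + 6
p_more_than_5 = 1 - poisson_cdf(5, lam_total)

if __name__ == "__main__":
    print(round(p_more_than_5, 4))
```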
Question 69. If X and Y are independent Poisson variates with parameters $\lambda_1$ and $\lambda_2$ respectively,
find the probability that X + Y = k.
Question 70. If X is uniformly distributed with mean 1 and variance 4/3, find P(X<0).
Question 71. Subway trains on a certain line run every half hour between mid-night
and six in the morning. What is the probability that a man entering the
station at a random time during this period will have to wait at least
twenty minutes? [Hint : U(0,30) distribution]
Question 72. At Yamuna expressway, the number of cars exceeding the speed limit by
more than 100km/hr is a random variable having Poisson distribution
with 𝜆 = 8.4 for 30 minutes. What is the probability of a waiting time of
less than 5 minutes between cars exceeding the speed limit by more than
100km/hr?
Question 73. Show that if a random variable has a uniform density with the
parameters a and b, the probability that it takes a value less than $a + p(b - a)$ is
equal to p.
Question 74. If a random variable X has a uniform density with the parameters a and b,
find its distribution function.
Question 75. Show that if a random variable has an exponential distribution with mean
λ, the probability that it will take on a value less than $-\lambda \ln(1 - p)$ is equal to p.
Question 76. Suppose that the amount of cosmic radiation to which a person is exposed
when flying by jet across the United States is a random variable having a
normal distribution with mean of 4.35mrem and a standard deviation of
0.59mrem. What is the probability that a person will be exposed to more
than 5.20mrem of cosmic radiation on such a flight?
Question 77. X is a normal variate with mean 30 and S.D. 5. Find the probabilities that
(i) 26<X<40 (ii) X>45 (iii) |X-30| > 5
Question 78. The mean yield for one-acre plot is 662 kgs with a s.d. 32 kgs. Assuming
normal distribution, how many one-acre plots in a batch of 1,000 plots
would you expect to have yield (i) over 700kgs, (ii) below 650 kgs.
Question 79. The local authorities in a certain city install 10,000 electric lamps in the
streets of the city. If these lamps have an average life of 1,000 burning
hours with a standard deviation of 200 hours, assuming the normality,
what number of lamps might be expected to fail
(i) in the first 800 hours (ii) between 800 and 1,200 hours
Question 80. Claim amounts are modeled as an exponential random variable with
mean £1,000.
(i) Calculate the probability that one such claim amount is greater than
£5,000
(ii) Calculate the probability that a claim amount is greater than £5,000
given that it is greater than £1,000.
Question 81. The ratio of the standard deviation to the mean of a random variable is
called the coefficient of variation.
For each of the following distributions, decide whether increasing the
mean of the random variable increases, decreases or has no effect on the
value of the coefficient of variation:
(a) Poisson with mean λ (b) Exponential with mean µ
(c) Chi-square with ν degrees of freedom
Question 82. Claim sizes are normally distributed about a mean µ = £6,000 and with
standard deviation σ = £1,000. Calculate the probability that a claim is for
more than £7,500, given that it is for more than £6,000.
Question 83. It is assumed that claims arising on an industrial policy can be modeled as
a Poisson process at a rate of 0.5 per year.
(i) Determine the probability that no claims arise in a single year.
(ii) Determine the probability that, in three consecutive years, there is one
or more claims in one of the years and no claims in each of the other two
years.
(iii) Suppose a claim has just occurred. Determine the probability that more
than two years will elapse before the next claim occurs.
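The three parts of Question 83 can be sketched directly from the Poisson process assumptions: yearly counts are independent Poisson(0.5), and waiting times between claims are exponential with the same rate. The second part is read here as "exactly one of the three years has claims"; variable names are illustrative.

```python
from math import exp

rate = 0.5  # claims per year

# No claims in a single year: P(N = 0) for Poisson(rate)
p_no_claims = exp(-rate)

# Exactly one of three independent years has at least one claim
p_one_year_only = 3 * (1 - p_no_claims) * p_no_claims**2

# Waiting time to the next claim is exponential with the same rate
p_wait_gt_2 = exp(-rate * 2)

if __name__ == "__main__":
    print(round(p_no_claims, 5), round(p_one_year_only, 5), round(p_wait_gt_2, 5))
```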
Question 84. Claim sizes in a certain insurance situation are modeled by a normal
distribution with mean µ = £30,000 and standard deviation σ = £4,000. The
insurer defines a claim to be a large claim if the claim size exceeds £35,000.
(i) Calculate the probabilities that the size of a claim exceeds:
(a) £ 35,000 and (b) £ 36,000
(ii) Calculate the probability that the size of a large claim (as defined
by the insurer) exceeds £ 36000.
(iii) Calculate the probability that a random sample of 5 claims includes
2 which exceed £ 35,000 and 3 which are less than £ 35,000.
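Question 84 can be checked with the standard library's `statistics.NormalDist` instead of tables. The sketch below treats the five claims in part (iii) as independent, which is an assumption implied by "random sample"; variable names are illustrative.

```python
from math import comb
from statistics import NormalDist

claims = NormalDist(mu=30_000, sigma=4_000)

# (i) tail probabilities
p35 = 1 - claims.cdf(35_000)
p36 = 1 - claims.cdf(36_000)

# (ii) P(X > 36,000 | X > 35,000): a large claim exceeds 36,000
p_large_exceeds_36 = p36 / p35

# (iii) exactly 2 of 5 independent claims exceed 35,000 (binomial)
p_two_of_five = comb(5, 2) * p35**2 * (1 - p35) ** 3

if __name__ == "__main__":
    print(round(p35, 5), round(p36, 5),
          round(p_large_exceeds_36, 4), round(p_two_of_five, 5))
```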
Question 85. The probability that a component in a rocket motor will fail when the
motor is fired is 0.02. To achieve a greater reliability several similar
components are to be fitted in parallel; the motor will then fail only if all
the individual components fail simultaneously. Determine the minimum
number of components required to ensure that the probability the motor
fails is less than one in a billion (that is, less than
$10^{-9}$), assuming that components fail independently.
Question 86. Suppose that in a group of insurance policies (which are independent as
regards occurrence of claims), 20% of the policies have incurred claims
during the last year. An auditor is examining the policies in the group one
by one in random order until two policies with claims are found.
(i) Determine the probability that exactly five policies have to be examined
until two policies with claims are found.
(ii) Find the expected number of policies that have to be examined, until
two policies with claims are found.
Question 88. If log X has a $N(\mu, \sigma^2)$ distribution, we say that X has a $\log N(\mu, \sigma^2)$
distribution. If $Y \sim \log N(10, 4)$, calculate P(Y > 200,000).
Question 89. An insurance company’s records suggest that experienced drivers (those
aged over 21) submit claims at a rate of 0.1 per year, and inexperienced
drivers (those 21 years old or younger) submit claims at a rate of 0.15 per
year. A driver can submit more than one claim a year. The company has
40 experienced and 20 inexperienced drivers insured with it.
(iv) X is the number of phone calls made before an agent makes the first
sale. The probability that any phone call leads to a sale is 0.01
independently of any other call.
Question 91. Suppose that the distribution of a physical coefficient, X , can be modeled
using a uniform distribution on (0,1) . A researcher is interested in the
distribution of Y, an adjusted form of the reciprocal of the coefficient,
where $Y = \frac{1}{X} - 1$.
Question 92. Let X and Y be independent random variables. Let V and W be the random
variables defined by V = max{X, Y} and W = min{X, Y}, i.e., V is the larger,
and W is the smaller, of the observations of X and Y.
Question 93. A coin has two sides, “heads” and “tails”. Such a coin with P(heads) = p is
tossed repeatedly until it lands “heads” for the first time. Let X be the
number of tosses required.
(i) Explain why, for each i = 1, 2, …, n, $P(X_i \ge x)$ is given by
$P(X_i \ge x) = (1 - p)^{x-1}$, x = 1, 2, … .
Question 95. A secretary is given 100 computer passwords and only one, which is
correct, opens a file. Since the secretary has no information on the correct
password, she tries to open using one of the passwords. She randomly
chooses one and discards it if incorrect until she finds the correct one.
(i) Calculate the probability that she obtains the correct password in
the third attempt.
A security system has been set up so that if three incorrect passwords are
tried before the correct one, the computer file is locked and access to it is
denied.
(ii) Calculate the probability that the secretary will gain access to the
file.
The secretary selects a password tries it and if it does not work, puts it
back with the other passwords before randomly selecting a new password.
(iii) Calculate the probability that the correct password is found on the
tenth attempt.
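The two sampling schemes in Question 95 can be compared with exact fractions. In this sketch, part (ii) is read as "access is gained if the correct password is found within the first three attempts", which is an interpretation of the lock-out rule; the function names are illustrative.

```python
from fractions import Fraction

N_PASSWORDS = 100  # from the question: 100 passwords, exactly one correct

def p_first_success_without_replacement(k, n=N_PASSWORDS):
    """P(correct password found on attempt k) when wrong ones are discarded."""
    prob = Fraction(1)
    remaining = n
    for _ in range(k - 1):
        prob *= Fraction(remaining - 1, remaining)  # a wrong password is drawn
        remaining -= 1
    return prob * Fraction(1, remaining)

def p_first_success_with_replacement(k, n=N_PASSWORDS):
    """P(correct password found on attempt k) when passwords are put back."""
    p = Fraction(1, n)
    return (1 - p) ** (k - 1) * p

p_third = p_first_success_without_replacement(3)
p_access = sum(p_first_success_without_replacement(k) for k in range(1, 4))
p_tenth = p_first_success_with_replacement(10)

if __name__ == "__main__":
    print(p_third, p_access, float(p_tenth))
```

Without replacement every attempt position is equally likely to hold the correct password; with replacement the attempt number follows a geometric distribution.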
Question 96. Let $X_1, X_2, \ldots, X_n$ be iid random variables from an exponential distribution with
parameter λ. Find the pdf of $V = \max(X_1, X_2, \ldots, X_n)$.
Question 97. Derive an iterative/ recursive formula for the probability function of the
Poisson distribution.
[Hint: Result of the form $P(X = x) = k(x, \lambda)\,P(X = x - 1)$]
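Dividing consecutive Poisson probabilities gives $P(X = x)/P(X = x - 1) = \lambda/x$, so one candidate for $k(x, \lambda)$ in the hint is $\lambda/x$ with starting value $P(X = 0) = e^{-\lambda}$. The sketch below compares that recursion with the direct formula; the function names are illustrative.

```python
from math import exp, factorial, isclose

def poisson_pmf_recursive(x_max, lam):
    """Probabilities P(X = 0), ..., P(X = x_max) built with
    P(X = x) = (lam / x) * P(X = x - 1), starting from P(X = 0) = exp(-lam)."""
    probs = [exp(-lam)]
    for x in range(1, x_max + 1):
        probs.append(probs[-1] * lam / x)
    return probs

def poisson_pmf_direct(x, lam):
    """Direct formula exp(-lam) * lam**x / x!."""
    return exp(-lam) * lam**x / factorial(x)

if __name__ == "__main__":
    lam = 2.5
    recursive = poisson_pmf_recursive(10, lam)
    print(all(isclose(recursive[x], poisson_pmf_direct(x, lam)) for x in range(11)))
```

The recursion avoids computing large factorials and powers separately, which is one reason it is used in practice.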
Question 99. A random variable has a lognormal distribution with mean 10 and
variance 4. Calculate the probability that the variable will take a value
between 7.5 and 12.5.
Question 100. The random variable N has a Poisson distribution with parameter 𝜆 and
P(N = 1| N ≥ 1) = 0.4 . Calculate the value of 𝜆 to 2 decimal places.
Question 102. Obtain the recursive relation for the Binomial distribution (n, p) of the
form $P(X = x) = g(x, n, p)\,P(X = x - 1)$; $x = 1, 2, 3, \ldots, n$; $0 < p < 1$, where
$g(x, n, p)$ is a general function of x, n and p.
Question 103. A sports scientist is building a statistical model to describe the number of
attempts a high jump athlete will have to make until she succeeds in
clearing a certain height for the first time during an indoor sports event.
For this model the scientist considers a geometric distribution with
probability of success p. The cumulative distribution function of the
geometric distribution is given as
$F_X(x) = 1 - (1 - p)^x$, x = 1, 2, 3, …
(i) (a) State the assumptions that the scientist needs to make for
considering this distribution.
(b) Comment on the validity of the assumptions in part (i)(a).
ANSWERS
Ans.1. $(0.8)^{25}$
!
Ans.2. 𝑛 = 25, 𝑃 = !
Ans.3. E(X) = 1
Ans.4. E(X) = Log2
Ans.5. E(X) = 1
Ans.5. (1) B (n, p) (2) P (𝜆) (3) Neg. Binomial
Ans.6. 0.0902
Ans.7. (i) 0.2231 (ii) 0.04202
Ans.8. (i) $e^{-0.64}$ (ii) $e^{-0.64}(1 + 0.64)$
Ans.9. 𝑃(𝑋 ≥ 3) = 1
Ans.10. 0.09852
Ans.11. 0.401878
Ans.12. 0.02917
Ans.13. 0.1934
Ans.14. 0.20
Ans.16. 0.0645
Ans.17. 0.0117
Ans.18. 0.2880
Ans.19. (a) 0.164 (b) 0.165
Ans.20. 0.093
Ans.21. (a) 0.0758 (b) 0.9856
Ans.22. 0.1550
Ans.23. 0.7586 (Hint 𝜆 = 7.5)
Ans.29. (i) 1/28 (ii)15/56 (iii) 27/56 (iv) 3/14
Ans.30.(a) 0.0001 (b) 0
Ans.31.(a) 0.95728 (b) 0.18943 (c) 0.05674 (d) 0.27235
Ans.32. (i) 0.3012 (ii) 0.0498
Ans.33. (a) (i) P(X=x) = (0.6)!!! (0.4) (ii) E(X) = 1.5, 𝜎 = 3.75
(b) (i) 0.1353 (ii) 0.3679 (iii) E(X) = 6 V(X) = 36 P(X > 18) = 0.049
Ans.34. (i) 3 min (ii) 0.1353 (iii) 0.0855
Ans.35. (i) 0.2451 (ii) 0.1170
Ans.36. (i) 0.1048 (ii) 0.1953
Ans.37. (i) 0.0634 (ii) 0.9729 (iii) 0.0634
Ans.38. (i) 0.3679 (ii) 0.2592
Ans.39. (i) 0.0359 (ii) 0.6179
Ans.40. 0.9998
Ans.41. 0.062
Ans.42. (i) 0.0026 (ii) 0.7452
!!!
Ans.43. (a) Mode = !
(b) V(X) = 𝑎(𝑎 − 𝑏)
! ! ! !"# !!!" !
Ans.44. (a) ! !! !
. exp !
( ! ) ; 𝑥 > 0, (b) E(X) = 162.754, V(X) = 53.598 𝑒 !"
(c) 0.0611
Ans.45. 0.2304
Ans.46. 0.1707
Ans.47. 0.0086
Ans.48. 0.2066
Ans.49. 0.3025
Ans.50. (a) 0.1669 (b) 0.4073 (c) 0.4073
Ans.51. (a) New number of trials are one-fourth of original number of trials
(b) New s.d. is square root of k times the original s.d.
Ans.52. 0.68
Ans.53. (i) 0.534 (ii) 0.466
Ans.54. Approx 1
Ans.55. (i) 0.016 (ii) 0.044
Ans.56. (i) 0.5550 (ii) n = 4
Ans.57. n = 11
Ans.58. p = 0.2
Ans.59. p = 0.025
Ans.60. 0.99863
Ans.61. 0.9863
Ans.62. (i) 0.2231 (ii) 0.19126
Ans.63. 0.0008
Ans.64. (i) 60.65 (ii) 9.025
! !!"" !""!
Ans.65. 𝑃 𝑋 = 𝑟 = !!
Ans.66. 0.02352
Ans.67. 0.08088
Ans.68. (i) 𝜆=1 (ii) Mean = 1 (iii) Coeff of Skewness = 1
Ans.70. 0.25
Ans.71. 1/3
Ans.72. 0.75
!!!
Ans.74. 𝐹 𝑥 = !!!
Ans.76. 0.0749
Ans.77. (i) 0.7653 (ii) 0.00135 (iii) 0.3174
Ans.78. (i) 117 (ii) 352
Ans.79. (i) 1,587 (ii) 6,826
Ans.80. (i) 0.0067 (ii) 0.0183
Ans.81. (i) Coefficient of variation = $1/\sqrt{\lambda}$ (COV decreases as mean $\lambda$ increases)
(ii) Coefficient of variation = 1 (No effect)
!
(iii) Coefficient of variation = !
(COV decreases as mean n increases)
Ans.82. 0.13362
Ans.83. (i) 0.60653 (ii) 0.43425 (iii) 0.36788
Ans.84. (i) (a) 0.10565 (b) 0.06681
(ii) 0.6324
(iii) 0.07985
Ans.85. n=6
Ans.86. (i) 0.08192 (ii) 10
Ans.87. (ii) 400k2(1 – P)2 + 400kP(1 – P) + 400P (iii) 0.842
Ans.87. L = 1.8785
Ans.88. 0.1350
Ans.89. 0.08177
Ans.90. (i) 0.94887 (ii) 0.12604 (iii) 0.86663 (iv) 0.07726
!
Ans.92. (iii) CDF = 1 − 𝑒 !!! and Mean = !
Ans.93. (ii) (a) $((1-p)^n)^{y-1}$ (b) The probability in part (a) implies that Y has the same
distribution as X but with $1 - (1-p)^n$ in place of p
Ans.94. 0.632
Ans.95 (i) 0.01 (ii) 0.03 (iii) 0.009135
Ans.96. $f_V(v) = n\lambda e^{-\lambda v}(1 - e^{-\lambda v})^{n-1}$
Ans.97. $P(X = 0) = e^{-\lambda}$, $P(X = x) = \frac{\lambda}{x} P(X = x - 1)$ for x = 1, 2, 3, …
Ans.98. Var(U)= 1/3 Var(V)=1/2
The variance is a measure of the spread of values. Both distributions take values
in the range from -1 to +1 and are centred around zero. However, the variance of
V is greater than the variance of U because there is a greater probability of
obtaining the extreme values -1 and +1.
Ans.99. 0.802
Ans.100. 𝜆 ≈ 1.62
! !
! ! !
Ans.101. (i) 𝑓! 𝑦 = ! 𝜆𝑦 !! 𝑒 !!! !
(ii) 𝑐 = 𝜆 𝑎𝑛𝑑 𝛾 = !
Ans.102.
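A short derivation sketch for Question 102, using only the binomial probability function. The ratio form below is one way to write $g(x, n, p)$; the starting value $P(X = 0) = (1-p)^n$ is needed to run the recursion.

```latex
\begin{align*}
\frac{P(X = x)}{P(X = x-1)}
  &= \frac{\binom{n}{x} p^{x} (1-p)^{n-x}}{\binom{n}{x-1} p^{x-1} (1-p)^{n-x+1}}
   = \frac{n - x + 1}{x} \cdot \frac{p}{1-p}, \\
P(X = x) &= \frac{(n - x + 1)\,p}{x\,(1-p)} \, P(X = x-1), \qquad x = 1, 2, \ldots, n .
\end{align*}
```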
Ans.103. (i ) (a) Needs to assume that each time the athlete tries she independently has
the same probability p of passing the height, i.e. that attempts here are iid.
(b) Given that the attempts are at the same event and on the same day, it is
reasonable to assume that conditions are the same (independence) and that
probability of success does not change.
(ii) (a) $(1 - p)^n$
(b) The lack of success on the first n jumps is irrelevant – under this model the
chances of success are not any better because there have been n attempts already.
ASSIGNMENT – 3
GENERATING FUNCTIONS
Question 2. Let X follow the Poisson distribution with parameter λ=2. Obtain the
MGF of X and 2X.
Question 3. An unbiased coin is tossed twice. If X denotes the number of heads that
appear, find the MGF of X.
0; 𝑜𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
Find the cumulant generating function and find mean and variance.
Question 7. The size of a claim, X, which arises under a certain type of insurance
contract, is to be modeled using a gamma random variable with
!
parameter ! and a, (both> 0) such that the moment generating function of
X is given by
Question 10. Let X be a random variable which has a Poisson distribution with
parameter µ.
Question 13. The cumulant generating function of a random variable X is given by:
where $M_X(t)$ is the moment generating function.
Determine the mean and variance of the distribution of X.
!
Question 14. The random variable X has an exponential dist. with mean !. It is found
that 𝑀! −𝑏 ! = 0.2. Find b.
Question 15. Consider a negative binomial variable X with probability function given
by:
$P(X = x) = \binom{k + x - 1}{x} p^k q^x$, x = 0, 1, 2, … where 0 < p < 1 and q = 1 − p
(i) Show that the moment generating function is given by:
$M(t) = \left(\frac{p}{1 - qe^t}\right)^k$ for $qe^t < 1$
(ii) Determine $E(X)$ and $E(X^2)$ by expanding M(t) as a power series as far as
the term in $t^2$, and hence verify that the mean and variance of X are
given by $\frac{kq}{p}$ and $\frac{kq}{p^2}$ respectively.
Question 16. Claim sizes in a certain insurance situation are modelled by a distribution
with moment generating function M(t) given by: 𝑀 𝑡 = (1 − 10𝑡)!!
Question 17. Consider the discrete random variable X with probability function:
!
𝑓 𝑥 = !!!! , 𝑥 = 0,1,2, …
Question 18. The claim amount X in units of £1,000 for a certain type of industrial
policy is modeled as a gamma variable with parameters α = 3 and λ = 1/4.
(i) Use moment generating functions to show that $\frac{1}{2}X \sim \chi^2_6$.
(ii) Hence use tables to find the probability that a claim amount
exceeds £20,000.
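As a numerical cross-check on the table lookup in part (ii), the sketch below evaluates the chi-square tail directly. It assumes the gamma parameters α = 3 and λ = 1/4, so that X/2 has a chi-square distribution with 6 degrees of freedom, and it uses the standard identity that a chi-square tail with even degrees of freedom equals a finite Poisson sum. The function name `chi2_sf_even_df` is illustrative only.

```python
import math

def chi2_sf_even_df(x, df):
    """P(chi-square with even df > x), via the Poisson-sum identity
    P(chi2_{2k} > x) = exp(-x/2) * sum_{j<k} (x/2)^j / j!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 0.0
    for j in range(df // 2):
        if j > 0:
            term *= half / j  # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total

# X is in units of 1,000, so "exceeds 20,000" means X > 20,
# which is the same event as X/2 > 10 with X/2 ~ chi-square(6).
prob_q18 = chi2_sf_even_df(20 / 2, 6)
print(round(prob_q18, 4))
```

The same helper applies to any gamma tail whose shape parameter is a whole number, since 2λX then has an even number of degrees of freedom.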
Question 19. Let X be a random variable with moment generating function $M_X(t)$ and
cumulant generating function $C_X(t)$, and let Y = aX + b where a and b are
constants. Let Y have moment generating function $M_Y(t)$ and cumulant
generating function $C_Y(t)$.
(ii) Find the coefficient of skewness of Y in the case that $M_X(t) = (1 - t)^{-2}$
and Y = 3X + 2 (you may use the fact that $C_Y'''(0) = E[(Y - \mu_Y)^3]$).
Question 20. Let X have a normal distribution with mean μ and standard deviation σ, and
let the $i$-th cumulant of the distribution of X be denoted by $k_i$. Assuming
the moment generating function of X, determine the values of
𝑘! , 𝑘! 𝑎𝑛𝑑 𝑘! .
Question 21. (i) Determine the moment generating function of the two parameter
exponential random variable X, defined by the probability density
function: $f(x) = \lambda e^{-\lambda(x - \alpha)}$, $x \ge \alpha$, where $\lambda, \alpha > 0$.
Question 23. Show that the MGF of the normal distribution is given by $M_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$.
Question 24. Use MGFs to show that if X has a Gamma($\alpha$, $\lambda$) distribution, then $2\lambda X$ has
a $\chi^2_{2\alpha}$ distribution. Hence, if X is Gamma(20, 0.4), estimate P(X>75).
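The MGF argument for Question 24 can be sketched as follows, assuming the usual parameterisation in which Gamma($\alpha$, $\lambda$) has MGF $(1 - t/\lambda)^{-\alpha}$ and $\chi^2_\nu$ has MGF $(1 - 2t)^{-\nu/2}$.

```latex
\begin{align*}
M_X(t) &= \left(1 - \frac{t}{\lambda}\right)^{-\alpha}, \qquad t < \lambda, \\
M_{2\lambda X}(t) &= E\!\left[e^{t \cdot 2\lambda X}\right] = M_X(2\lambda t)
   = (1 - 2t)^{-\alpha} = (1 - 2t)^{-2\alpha/2}, \qquad t < \tfrac{1}{2}, \\
X \sim \text{Gamma}(20, 0.4) &\;\Rightarrow\; 0.8X \sim \chi^2_{40}, \qquad
P(X > 75) = P(\chi^2_{40} > 60).
\end{align*}
```

The final probability is then read from chi-square tables with 40 degrees of freedom.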
(ii) Hence find the mean and the variance of X using the moment
generating function in part (i).
ANSWERS
Ans.1. (i) k = 4 (ii) $C_X(t) = -2\log(1 - 2t)$; $t < \frac{1}{2}$ (iii) Mean = 4 and Var = 8
Ans.2. $M_X(t) = e^{2(e^t - 1)}$, $M_{2X}(t) = e^{2(e^{2t} - 1)}$
Ans.3. $M_X(t) = \frac{1}{4}[1 + e^t]^2$
Ans.4. $K_X(t) = -\alpha \log\left(1 - \frac{t}{\lambda}\right)$, $K_X'(0) = \alpha/\lambda$, $K_X''(0) = \alpha/\lambda^2$
! ! !!
Ans.6. (i) !
(ii) Y~𝑈(0,1)
Ans.9. $X_1 + X_2 \sim P(\mu_1 + \mu_2)$
Ans.11. 2
ASSIGNMENT – 4
JOINT DISTRIBUTION
Question 1. Determine the value of k for which the function given by
Question 2. Given the values of the joint probability distribution of X and Y shown in the
table
              X = -1    X = 1
    Y = -1     1/8       1/2
    Y =  0      0        1/4
    Y =  1     1/8        0

              V = 5    V = 10    V = 15
    U = 1       a        3a        0
    U = 2      11a       2a        8a
    U = 3       a        a         3a
Question 4. If the joint probability distribution of X and Y is given by
              X = -1    X = 0    X = 1    Total
    Y = -1     1/6       1/3      1/6      2/3
    Y =  0      0         0        0        0
    Y =  1     1/6        0       1/6      1/3
    Total      1/3       1/3      1/3
Show that their covariance is zero even though the two random variables are not
independent.
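The claim in Question 4 can be checked mechanically. The sketch below stores the joint probabilities keyed by (x, y), with X taking the values -1, 0, 1 across the table and Y down the side, and uses exact fractions so that the zero covariance is not a rounding artefact. The helper name `expect` is illustrative.

```python
from fractions import Fraction as F

# Joint pmf from Question 4, keyed by (x, y)
pmf = {
    (-1, -1): F(1, 6), (0, -1): F(1, 3), (1, -1): F(1, 6),
    (-1, 0): F(0),     (0, 0): F(0),     (1, 0): F(0),
    (-1, 1): F(1, 6),  (0, 1): F(0),     (1, 1): F(1, 6),
}

def expect(g):
    """E[g(X, Y)] under the joint pmf above."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())

cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Marginals, then test whether every cell factorises
px = {x: sum(p for (a, _), p in pmf.items() if a == x) for x in (-1, 0, 1)}
py = {y: sum(p for (_, b), p in pmf.items() if b == y) for y in (-1, 0, 1)}
independent = all(pmf[(x, y)] == px[x] * py[y] for (x, y) in pmf)

print(cov, independent)
```

A single cell where the joint probability differs from the product of the marginals, such as (0, -1), is enough to rule out independence.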
Find the expected value of 𝑔 𝑋 = 𝑋 ! − 5𝑋 + 3.
Question 10. Find the expected value of the random variable X whose probability density is
given by
$f(x) = \begin{cases} x & \text{for } 0 < x < 1 \\ 2 - x & \text{for } 1 \le x < 2 \\ 0 & \text{elsewhere} \end{cases}$
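For a piecewise density such as the one in Question 10, the expectation is the sum of one integral per piece. A worked sketch:

```latex
\begin{align*}
E(X) &= \int_0^1 x \cdot x \, dx + \int_1^2 x (2 - x) \, dx
      = \left[\frac{x^3}{3}\right]_0^1 + \left[x^2 - \frac{x^3}{3}\right]_1^2 \\
     &= \frac{1}{3} + \left(4 - \frac{8}{3}\right) - \left(1 - \frac{1}{3}\right)
      = \frac{1}{3} + \frac{4}{3} - \frac{2}{3} = 1 .
\end{align*}
```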
Question 11. Let (X, Y) have the joint density
$f(x, y) = \begin{cases} x e^{-x(1+y)} & x \ge 0,\ y \ge 0 \\ 0 & \text{elsewhere} \end{cases}$
Show that (i) $E(Y)$ does not exist.
(ii) $E(Y \mid X = x) = \frac{1}{x}$, x > 0
= 0; otherwise
Question 15. Find the joint probability density of the two random variable X and Y whose joint
distribution function is given by
!
Find 𝑃 𝑋 + 𝑌 < ! .
Find (a) $f_X(x)$ (b) $P[X + Y < 0.5]$ (c) $E[Y \mid X = x]$ (d) cov(X, Y)
Question 20. Let $f(x, y) = \frac{1}{2}xy$, 0 < y < x, 0 < x < 2.
Question 22. Let the two dimensional random variable (X, Y) have the following joint density
function:
$f_{X,Y}(x, y) = \frac{1}{8}(6 - x - y)$; 0 < x < 2, 2 < y < 4
= 0 elsewhere
(a) Find E(Y/X = x) (b) E(XY/X = x)
(c) Show that E(Y) = E[E(Y/X)].
Question 23. The p.d.f. of (X, Y) is
= 0 otherwise
Find (i) 𝐸(𝑌/𝑋 = 𝑥) (ii) 𝐸(𝑋𝑌/𝑋 = 𝑥) (iii) Var (𝑌/𝑋 = 𝑥)
Question 24. Let X and Y have joint probability density function
$f(x, y) = \begin{cases} 21x^2y^3 & 0 < x < y < 1 \\ 0 & \text{otherwise} \end{cases}$
Find (i) 𝐸(𝑋/𝑌 = 𝑦) (ii) 𝑉(𝑋/𝑌 = 𝑦)
Question 25. Let X and Y be two random variables each taking the three values -1, 0 and 1 and having the
joint probability distribution
              X = -1    X = 0    X = 1
    Y = -1      0        0.1      0.1
    Y =  0     0.2       0.2      0.2
    Y =  1      0        0.1      0.1
Find (i) the corr. (X, Y) (ii) 𝐸(𝑌/𝑋 = −1) and 𝑉(𝑌/𝑋 = −1)
Question 26. The random variables (X, Y) have the following joint probability mass function
              X
    Y       1       2       3
    2      1/12    1/6     1/12
    3      1/6      0      1/6
    4       0      1/3      0

              X
    Y       0       1       2
    0      1/6     1/3     1/12
    1      2/9     1/6      0
    2      1/36     0       0
Find (a) Mean values of X and Y (b) Find the covariance between X and Y
Question 28. Let the joint p.d.f. of (X, Y) be
$f(x, y) = \frac{2}{3}(x + 2y)$; 0 < x, y < 1
Find (a) Conditional mean of X given $Y = \frac{1}{2}$
(b) Conditional variance of X given $Y = \frac{1}{2}$.
Question 30. The joint probability distribution of the amounts X and Y of two commodities
supplied to a market has the probability density function (pdf)
$f(x, y) = 2\exp[-(x + y)]$; $0 \le y \le x < \infty$
= 0 elsewhere
Derive the conditional expectation 𝐸(𝑋/𝑌 = 𝑦).
Question 32. Suppose that the joint pdf of $X_1$ and $X_2$ is given by
$f(x_1, x_2) = x_1^2 + \frac{x_1 x_2}{3}$; $0 \le x_1 \le 1$, $0 \le x_2 \le 2$
= 0; otherwise
a. Find the marginal pdf’s of X and Y
b. Show that E[Y] does not exist.
c. Calculate E[Y/X] and comment in light of (b).
Question 34. Let $X_1, X_2, \ldots, X_n$ be iid exponential random variables with mean 1/λ
Question 35. Let Z be a random variable with mean 0 and variance 1, and let X be a random
variable independent of Z with mean 5 and variance 4. Let Y = X − Z. Calculate
the correlation coefficient between X and Y.
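A worked sketch for Question 35, using only the independence of X and Z and the bilinearity of covariance:

```latex
\begin{align*}
\operatorname{cov}(X, Y) &= \operatorname{cov}(X, X - Z) = \operatorname{var}(X) - \operatorname{cov}(X, Z) = 4 - 0 = 4, \\
\operatorname{var}(Y) &= \operatorname{var}(X) + \operatorname{var}(Z) = 4 + 1 = 5, \\
\operatorname{corr}(X, Y) &= \frac{4}{\sqrt{4}\,\sqrt{5}} = \frac{2}{\sqrt{5}} \approx 0.894 .
\end{align*}
```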
Question 36. Consider three random variables X, Y and Z with the same variance $\sigma^2 = 4$.
Suppose that X is independent of both Y and Z but Y and Z are correlated, with
correlation coefficient $\rho_{YZ} = 0.5$.
Question 37. Suppose that the joint probability distribution of two, random variables X and Y
is given by the following table:
              Y = 2    Y = 4    Y = 6
    X = 1      0.2      0.0      0.2
    X = 2      0.0      0.2      0.0
    X = 3      0.2      0.0      0.2
(i) Show that X and Y are uncorrelated, but are not independent.
(ii) Leaving the probabilities in the first and third rows of the table the same,
change the entries in the second row so that X and Y are independent.
Question 38. Let X and Y be random variables which each takes values 1 and 2 only.
Calculate E[X|Y = 2], given that E[X] = 6/5, E[X|Y = 1] = 7/6, and P(Y=1)=3/5.
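Question 38 is an application of the law of total expectation, E[X] = P(Y=1) E[X|Y=1] + P(Y=2) E[X|Y=2], solved for the unknown conditional mean. A minimal sketch with exact fractions (the variable names are illustrative):

```python
from fractions import Fraction as F

e_x = F(6, 5)            # E[X]
e_x_given_y1 = F(7, 6)   # E[X | Y = 1]
p_y1 = F(3, 5)           # P(Y = 1)
p_y2 = 1 - p_y1          # Y takes only the values 1 and 2

# Rearranged law of total expectation
e_x_given_y2 = (e_x - p_y1 * e_x_given_y1) / p_y2
print(e_x_given_y2)  # 5/4
```

The result lies between 1 and 2, as it must for a variable taking only those two values.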
Question 39. The continuous random variables X and Y have a bivariate probability density
function
f(x,y) = 2 0 < x + y < 1, x > 0, y > 0
The conditional distribution of X given Y = y is a uniform distribution with
probability density function:
$f(x \mid y) = \frac{1}{1 - y}$, 0 < x < 1 − y
Question 40. (i) Show that for continuous random variables X and Y:
E(Y) = E[E(Y|X)]
(ii) Suppose that a random variable X has a standard normal distribution, and the
conditional distribution of a Poisson random variable Y given the value of X =
x has expectation g(x) = 𝑥 ! + 1.
Determine E(Y) and var(Y).
Question 41. Consider two random variables X and Y with joint probability density function
(PDF):
$f(x, y) = \frac{4}{3}(1 - xy)$, 0 < x < 1, 0 < y < 1
The marginal PDF of X is given by:
$f(x) = \frac{2}{3}(2 - x)$, 0 < x < 1
with a corresponding marginal PDF for Y by symmetry. (You are not asked to
verify these marginal densities.)
(i) Show that the conditional PDF of Y given X = x is given by:
$f(y \mid x) = 2\,\frac{1 - xy}{2 - x}$, 0 < y < 1
Question 42. (i) Let Y be the sum of two independent random variables X1 and X2 that is:
Y = X1 + X2
Show that the moment generating function (MGF) of Y is the product of the
MGFs of X1 and X2.
(ii) Let X1 and X2 be independent gamma random variables with parameters
(α1, λ) and (α2, λ), respectively.
Use MGFs to show that Y = X1 + X2 is also a gamma random variable and
specify its parameters.
Question 43.
The number of claims, X, arising on each policy in a certain portfolio depends on
another random variable Y. X is considered to follow a Poisson distribution with
mean Y. The variable Y itself is assumed to have a gamma distribution with
parameters (a,b).
Find expressions for the unconditional moments $E(X)$ and $E(X^2)$ using
appropriate conditional moments.
Question 44.
Let the random variables (X,Y) have the joint probability density function:
$f_{X,Y}(x, y) = \exp(-(x + y))$, x > 0, y > 0
(i) Derive the marginal probability density functions of X and Y and hence
determine (giving reasons) whether or not the two variables are independent.
(ii) Derive the joint cumulative distribution function FX,Y(x, y).
Question 45. The table below shows a bivariate probability distribution for two discrete
random variables X and Y:
X=0 X=1 X=2
Y=1 0.15 0.20 0.25
Y=2 0.05 0.15 0.20
Find the value of E(X|Y = 2).
Question 46.
Consider two random variables X and Y, for which the variances satisfy
V[X] = 5V[Y] and the covariance cov[X,Y] satisfies cov[X,Y] = V[Y].
Let S = X + Y and D = X – Y
(i) Show that the covariance between S and D satisfies
cov[S,D] = 4V[Y].
(ii) Calculate the correlation coefficient between S and D.
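A sketch of the algebra for Question 46, writing every quantity as a multiple of V[Y] and assuming V[Y] > 0:

```latex
\begin{align*}
\operatorname{cov}[S, D] &= \operatorname{cov}[X + Y,\, X - Y] = V[X] - V[Y] = 5V[Y] - V[Y] = 4V[Y], \\
V[S] &= V[X] + V[Y] + 2\operatorname{cov}[X, Y] = 8V[Y], \qquad
V[D] = V[X] + V[Y] - 2\operatorname{cov}[X, Y] = 4V[Y], \\
\operatorname{corr}[S, D] &= \frac{4V[Y]}{\sqrt{8V[Y] \cdot 4V[Y]}} = \frac{4}{\sqrt{32}} = \frac{1}{\sqrt{2}} \approx 0.707 .
\end{align*}
```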
Question 47. The random variables X and Y have a joint distribution with density
function
$f_{XY}(x, y) = \begin{cases} 3x, & 0 < y < x < 1 \\ 0, & \text{otherwise} \end{cases}$
(i) Determine the marginal densities of X and Y.
(ii) State, with reasons, whether X and Y are independent.
(iii) Determine E(X) and E(Y)
Question 48. X and Y are discrete random variables with joint distribution given below.
Y = −1 Y=0 Y=1
X =1 0 1/4 0
Question 49. Consider two random variables X and Y with E(X) = 2, V(X) = 4, E(Y) = -3,
V(Y) = 1 and Cov(X,Y) = 1.6. Calculate
Question 50. Claim amounts arising under a particular type of insurance policy are modelled as
having a normal distribution with standard deviation £35. They are also assumed
to be independent from each other. Calculate the probability that two randomly
selected claims differ by more than £100.
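For Question 50, the difference of two independent normal claims is itself normal with mean 0 and variance 2 × 35², whatever the common mean is. The sketch below evaluates the two-sided tail using the error function from the standard library; `std_normal_cdf` is a helper defined here, not a library call.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF expressed through math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

sigma = 35.0
sd_diff = math.sqrt(2.0) * sigma   # s.d. of X1 - X2 for independent claims
z = 100.0 / sd_diff
prob_differ = 2.0 * (1.0 - std_normal_cdf(z))  # P(|X1 - X2| > 100)
print(round(prob_differ, 3))
```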
Question 51. The random variables X and Y are related as follows: X conditional on Y = y has
a $N(2y, y^2)$ distribution. Y has a N(200, 100) distribution.
Question 52. Consider the random variable X taking the value X = 1 if a randomly selected
person is a smoker, or X = 0 otherwise. The random variable Y describes the
amount of physical exercise per week for this randomly selected person. It can
take the values 0 (less than one hour of exercise per week), 1 (one to two hours)
and 2 (more than two hours of exercise per week). The random variable
$R = (3 - Y)^2 (X + 1)$ is used as a risk index for a particular heart disease.
The joint distribution of X and Y is given by the joint probability function in the
following table.
Y
X 0 1 2
(i) Calculate the probability that a randomly selected person does more than
two hours of exercise per week.
(ii) Decide whether X and Y are independent or not and justify your answer.
Question 53. The random variable X has a Poisson distribution with mean Y, where Y itself is
considered to be a random variable. The distribution of Y is lognormal with
parameters µ and $\sigma^2$.
Derive the unconditional mean E[X] and variance var[X] using appropriate
conditional moments. (You may use any standard results without proof, including
results from the book of Formulae and Tables.)
Question 54. Consider two random variables X and Y with E[X] = 2, V[X] = 4, E[Y] = −3,
V[Y] = 1, and Cov[X, Y] = 1.6.
Calculate:
(a) the expected value of 5X + 20Y.
(b) the correlation coefficient between X and Y.
(c) the expected value of the product XY.
(d) the variance of X − Y.
Question 56. The joint density function of X and Y is given by
!
! !!! /!
𝑓 𝑥, 𝑦 = 𝑦𝑒 !! − ∞ < 𝑥 < ∞, 𝑦 > 0
!!
!
a) State the probability density functions and hence identify the statistical
distributions of
i) Y ii) X conditional on Y = y.
b) Compute E(Y) and Var(Y).
c) Compute E(X|Y = y) and Var(X|Y = y).
d) Hence, compute E(X) and Var(X).
Question 57. Let X and Y be identically distributed and uncorrelated random variables such
that the moment generating function of the random variable Z = X+Y is
Value -1 0 1
Question 58. Let X denote the time taken by a worker to complete a specified work in a factory.
For a given worker, the distribution of X is modelled as an exponential
distribution with unknown mean U that varies across the work force. U is treated
as a Uniform random variable over (a, b), i.e.
$X \mid U \sim Exp\left(\frac{1}{U}\right)$ and $U \sim U(a, b)$.
Find the mean and variance of the marginal distribution of X.
ANSWERS
Ans.1. 1/36
Ans.5. 30
! !
Ans.7. ! (𝑥 + 1), !
(1 + 4𝑦)
Ans.9. -11/6
Ans.10. 1
Ans.12. Yes
!!!
Ans.13. (ii) 0.04637, (iii) ! , (iv) 7/12, 7/12, 14/12
!!
!
Ans.16. 0.0625
Ans.17. 1/2
Ans.20. No
!!! ! ! !!!!! !
(iii) !
−!( !!! !
)!
Ans.24. (i) $E(X/Y = y) = \frac{3}{4}y$ (ii) $V(X/Y = y) = (3/80)y^2$
Ans.25. (i) Corr. (XY) = 0 (ii) 𝐸(𝑌/𝑋 = −1) = 0 (iii) 𝑉(𝑌/𝑋 = −1) = 0
! ! ! !!
Ans.27. (a) 𝐸 𝑋 = ! , 𝐸 𝑌 = ! (b) Cov(XY) = − !" (d) 𝜌 = !
Ans.28. (i) $E(X) = \frac{5}{9}$ (ii) Var = 13/162
Ans.29. (a) $f_{X,Y} = 6e^{-(2x+3y)}$; x, y > 0 (b) $f_X(x) = 2e^{-2x}$, x > 0; $f_Y(y) = 3e^{-3y}$, y > 0
! !"!!"! ! !"!!"! !"!!"! !
Ans.30. (a) K = 1/26 (b) 𝐸 !
=𝑦 = (!!!!)
(c) 𝑉 !
=𝑦 = !!!!
− !!!!
)
Ans.31. $E[X \mid Y = y] = y + 1$
!!! !!!
! 0 < 𝑥! ≤ 1, 0 ≤ 𝑥! < 2
Ans.32. (a) 𝑓!! !! (𝑥! 𝑥! ) = !
!!! !!
0 elsewhere
!! !!
(b) 𝐸 𝑋! 𝑋! = 𝑥! = !!! !! (c) E (𝑋! ) = 10/9 = E [E [𝑋! 𝑋! = 𝑥! ]]
Ans.33. (a) $f(x) = e^{-x}$ (c) $f(y) = \frac{1}{(1+y)^2}$
!
Ans.34. (b) 𝐸 𝑌 = ! , 𝑣 𝑦 = 𝜋/𝜆!
Ans.35. 0.894
! !"!! ! !!
Ans.43. E[X] = ! E[𝑋 ! ] = !!
(ii) $F_{X,Y}(x, y) = (1 - e^{-x})(1 - e^{-y})$
(ii) Not independent because $f_X(x) f_Y(y) \ne f_{XY}(x, y)$
(iii) E(X) = 0.75 and E(Y) = $\frac{3}{8}$
Ans.49. (a) -50 (b) 0.8 (c) E(XY) = -4.4 (d) V(X-Y) = 1.8
Ans.50. 0.043
Ans.51. 40,500
(iii) R 1 2 4 8 9 18
(iv) 5.95
Ans.53. $E[X] = \exp(\mu + \frac{1}{2}\sigma^2)$, $V[X] = \exp(\mu + \frac{1}{2}\sigma^2) + \exp(2\mu + \sigma^2)(e^{\sigma^2} - 1)$
Ans.54. 𝐹! 𝑧 − 𝐹! 𝑧 − 1
REVISION ASSIGNMENT – 1
Question 1. An actuarial student has said that the following three distributions are the same:
Question 2. The number of telephone calls per hour on a working day received at an insurance
office follows a Poisson distribution with mean 2.5.
(i) Calculate the probability that more than 7 telephone calls are received on a
working day between 9am and 11am.
(ii) Calculate the probability that, if the office opens at 8am, there are no
telephone calls received until after 9am.
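Both parts of Question 2 reduce to Poisson probabilities once the rate is scaled to the length of the interval: two hours gives mean 5, one hour gives mean 2.5. A minimal sketch using only the standard library (the helper names `poisson_pmf` and `poisson_cdf` are illustrative):

```python
import math

def poisson_pmf(k, mean):
    """P(N = k) for N ~ Poisson(mean)."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def poisson_cdf(k, mean):
    """P(N <= k) for N ~ Poisson(mean)."""
    return sum(poisson_pmf(j, mean) for j in range(k + 1))

rate_per_hour = 2.5

# (i) 9am to 11am is two hours, so the count is Poisson(5)
p_more_than_7 = 1.0 - poisson_cdf(7, 2 * rate_per_hour)

# (ii) no calls between 8am and 9am: P(N = 0) with mean 2.5
p_no_calls_first_hour = poisson_pmf(0, rate_per_hour)

print(round(p_more_than_7, 5), round(p_no_calls_first_hour, 5))
```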
Question 3. The random variable X has an exponential distribution with parameter 𝜆. Use the
moment generating function to determine an expression for $E[X^4]$.
Question 4. The random variable X has a beta distribution with parameters 𝛼 = 1 and 𝛽 =4.
Question 5. A large life office has 1,000 policyholders, each of whom has a probability of
0.01 of dying during the next year (independently of all other policyholders).
(i) Derive a recursive relationship for the binomial distribution of the form:
$P(X = x) = k\,g(x)\,P(X = x - 1)$
where k is a constant and g(x) is a function of x
Question 6. On a portfolio of insurance policies, the claim size, Y is assumed to depend on the
age of the policyholder, X . Suppose that the conditional mean and variance of Y
are:
!!
𝐸 𝑌 𝑋 = 𝑥 = 2𝑥 + 400 𝑉 𝑌 𝑋 = 𝑥 = !
The distribution of X over the portfolio is assumed to be normal with mean 50 and
standard deviation 14.
Calculate the unconditional mean and standard deviation of Y.
Question 7. (i) For a pair of jointly distributed random variables X and Y , derive the
result: 𝑉 𝑋 + 𝑌 = 𝑉 𝑋 + 𝑉 𝑌 + 2𝑐𝑜𝑣(𝑋, Y)
(ii) The random variables X and Y are jointly distributed with standard
deviations of 5 and 7 respectively and corr(X ,Y) = -3/7 . Calculate the
standard deviation of 3X - 2Y + 5.
Question 8. (i) The random variables X and Y have a discrete joint distribution with joint
probability function:
$P(X = x, Y = y) = \begin{cases} c(x + 2y) & x = 0, 1, 2 \text{ and } y = 0, 1, 2 \\ 0 & \text{otherwise} \end{cases}$
where c is an appropriate constant.
Determine the conditional distribution of X given Y=y for each value of y
Question 9. (i) For a lognormal distribution with mean m and standard deviation s, give
an expression for 𝜇, the mean of the underlying normal distribution.
(ii) Claim amounts for a particular type of medical negligence are lognormally
distributed with mean 15,000 and standard deviation 8,000. Calculate the
probability that the next claim exceeds 20,000.
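The two parts of Question 9 fit together as in the sketch below: the lognormal mean m and standard deviation s are first converted to the parameters μ and σ² of the underlying normal, and the tail probability then follows from the standard normal CDF. The conversion uses the standard moment relations σ² = ln(1 + s²/m²) and μ = ln m − σ²/2; `std_normal_cdf` is a helper defined here.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF expressed through math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

m, s = 15000.0, 8000.0                    # lognormal mean and s.d.
sigma2 = math.log(1.0 + (s / m) ** 2)     # variance of the underlying normal
mu = math.log(m) - 0.5 * sigma2           # mean of the underlying normal

z = (math.log(20000.0) - mu) / math.sqrt(sigma2)
p_exceeds = 1.0 - std_normal_cdf(z)       # P(claim > 20,000)
print(round(mu, 4), round(p_exceeds, 4))
```

A table-based answer may differ in the fourth or fifth decimal place because z is usually rounded before the lookup.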
Question 10. X and Y are discrete random variables. The only possible combinations of
these two variables have the following probabilities:
              X = 0    X = 1    X = 2
    Y = 0      1/2       0       1/16
    Y = 1       0       1/8       0
    Y = 2      1/4      1/16      0
(iii) Calculate:
(a) E(X + Y|X = 1)
(b) E(X|Y = 2)
(c) var(X|Y = 2).
(iv) Determine the values of the random variable $E[Y^2 \mid X]$ and hence
calculate $E[E(Y^2 \mid X)]$.
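The conditional quantities in parts (iii) and (iv) of Question 10 can be checked with exact fractions. The sketch below assumes the table is read with X across the top and Y down the side, so each key is (x, y); the helper name `cond_expect` is illustrative.

```python
from fractions import Fraction as F

# Joint pmf keyed by (x, y): columns are X = 0, 1, 2 and rows are Y = 0, 1, 2
pmf = {
    (0, 0): F(1, 2), (1, 0): F(0),     (2, 0): F(1, 16),
    (0, 1): F(0),    (1, 1): F(1, 8),  (2, 1): F(0),
    (0, 2): F(1, 4), (1, 2): F(1, 16), (2, 2): F(0),
}

def cond_expect(g, event):
    """E[g(X, Y) | event(X, Y)] under the joint pmf above."""
    mass = sum(p for (x, y), p in pmf.items() if event(x, y))
    total = sum(p * g(x, y) for (x, y), p in pmf.items() if event(x, y))
    return total / mass

a = cond_expect(lambda x, y: x + y, lambda x, y: x == 1)          # E(X + Y | X = 1)
b = cond_expect(lambda x, y: x, lambda x, y: y == 2)              # E(X | Y = 2)
c = cond_expect(lambda x, y: x * x, lambda x, y: y == 2) - b**2   # var(X | Y = 2)

# E[E(Y^2 | X)] equals E[Y^2] by the law of total expectation
e_y2 = sum(p * y * y for (x, y), p in pmf.items())

print(a, b, c, e_y2)  # 7/3 1/5 4/25 11/8
```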
ANSWERS
Ans.1. By definition, the $\chi^2_2$ distribution is the same as the Gamma(1,1/2) distribution. The
exponential distribution with mean 1/2 has parameter 2. This is a Gamma(1,2)
distribution, and so is not equivalent to the other two. Therefore the student is wrong.
Ans.3. $24/\lambda^4$
Ans.4. (i)1/5 (ii) 0.159 (iii) Since the mean is to the right of the median this
suggests that the distribution is positively skewed.
Ans.5. (i) $k = \frac{p}{1-p}$, $g(x) = \frac{n - x + 1}{x}$
(ii) (a) 0.0000432 (b) 0.997321 (c) 0.00179
Ans.9. (i) $\mu = 0.5 \ln\left(\frac{m^4}{m^2 + s^2}\right)$
(ii) 0.20469 (iii) 3.9
Ans.10. (ii) Always (iii)(a) 7/3 (b) 1/5 (c) 4/25 (iv) 11/8 (v) 11/8
ASSIGNMENT – 5
Question 1. State the central limit theorem for independent identically distributed random
variables.
In a large population the distribution of a variable has mean 167 and standard
deviation 27 units. If a random sample of size 36 is chosen, find the approximate
probability that the sample mean lies between 163 and 171 units.
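The second part of Question 1 applies the central limit theorem directly: the sample mean is approximately normal with mean 167 and standard error 27/√36 = 4.5. A minimal sketch of the calculation, with `std_normal_cdf` as a helper defined here:

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF expressed through math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, n = 167.0, 27.0, 36
std_err = sigma / math.sqrt(n)   # standard error of the sample mean

# P(163 < sample mean < 171) under the normal approximation
prob_between = (std_normal_cdf((171 - mu) / std_err)
                - std_normal_cdf((163 - mu) / std_err))
print(round(prob_between, 3))
```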
Question 3. If the probability is 0.20 that a certain bank will refuse a loan application, using
normal approximation (to three decimal), find the probability that the bank will
refuse at most 40 out of 225 loan applications.
Question 4. A fair die is tossed 180 times. Determine the probability that the face 6 will
appear
Question 6. A random sample of size 100 is taken from an infinite population having mean 76
and variance 256. What is the probability that the sample mean $\bar{X}$ will be between 75
and 78?
Question 7. Let X ≡ N (µ, 𝜎 ! ), then the distribution 𝑋 is
(i) N(𝜇! , 𝜎 ! /𝑛! ) (ii) N(µ, 𝜎 ! /𝑛)
(iii) N(µ, 𝜎 ! /𝑛! ) (iv) N(µ, 𝜎)
Question 8. The number of claims arising in a period of one month from a group of policies
can be modeled by a Poisson distribution with mean 24. Determine the probability
that fewer than 20 claims arise in a particular month.
Question 9. A magazine claims that 25% of its readers are students. A random sample of 200
readers is taken and is found to contain 42 students.
Question 10. Suppose that the sums assured under policies of a certain type are modeled by a
distribution with mean £8,000 and standard deviation £3,000. Consider a group of
100 independent policies of this type.
Calculate the approximate probability that the total sum assured under this group of
policies exceeds £845,000.
Question 11. The probability that a claim is made on a certain type of policy in a particular year
is 0.04. Five hundred policies are selected at random.
Use a suitable normal approximation to calculate the probability that no more than
30 of these will result in a claim during the year.
Question 12. Consider a random sample of size 16 taken from a normal distribution with mean
µ=25 and variance $\sigma^2 = 4$. Let the sample mean be denoted $\bar{X}$.
State the distribution of $\bar{X}$ and hence calculate the probability that $\bar{X}$ assumes a
value greater than 26.
Question 14. The occurrence of claims in a group of 2000 policies is modeled such that the
probability of a claim arising in the next year is 0.015 independently for each
policy. Each policy can give rise to a maximum of one claim.
Calculate an approximate value for the probability that more than 40 claims arise
from group of policies in the next year.
Question 16. Let $X_1, X_2, \ldots, X_{100}$ be independent random variables, each having a gamma (4, 1)
distribution (and hence with mean 4 and variance 4). Calculate an approximate
value for the probability that the sum of the variables assumes a value, which
exceeds 425.
Question 17. Claim amounts on a certain type of policy are modeled as following a gamma
distribution with parameters 𝛼=120 and 𝜆=1.2.
Question 18. In a certain large population 45% of people have blood group A. A random
sample of 300 individuals is chosen from this population. Calculate an
approximate value for the probability that more than 115 of the sample have
blood group A.
Question 19. In a large portfolio 65% of the policies have been in force for more than five
years. An investigation considers a random sample of 500 policies from the
portfolio. Calculate an approximate value for the probability that fewer than 300
of the policies in the sample have been in force for more than five years.
Question 20. It is known that 24% of the customers in a bank holding a current account also
have another type of account with the bank. Calculate the approximate value for
the probability that fewer than 50 customers in a random sample of 250 customers
with a current account also have another type of account.
Question 21. For a certain class of policies issued by a large insurance company it is believed
that the probability of each policy giving rise to any claims is 0.5, independently
of all other policies. A random sample of 250 such policies is selected.
Determine approximately the probability that at least 139 of the policies in the
sample will each give rise to any claims.
Question 22. Consider ten independent random variables $X_1, \ldots, X_{10}$ which are identically
distributed with an exponential distribution with expectation 4.
(i) Specify the approximate distribution of $X = \sum_{i=1}^{10} X_i$, including all
parameters, using the central limit theorem.
(ii) Calculate the approximate value of the probability P[X < 40] using the result
in part (i).
(iii) Calculate the exact probability P[X < 40].
(iv) Comment on the answers in parts (ii) and (iii).
Question 23. A computer routine selects one of the integers 1, 2, 3, 4, 5 at random and replicates
the process a total of 100 times. Let S denote the sum of the 100 numbers
selected. Calculate the approximate probability that S assumes a value between
280 and 320 inclusive.
Question 24. For a certain class of business, claim amounts are independent of one another and
are distributed about a mean of µ = £4,000 and with standard deviation σ = £500.
Calculate an approximate value for the probability that the sum of 100 such claim
amounts is less than £407,500.
Question 25. A certain type of claim amount (in units of £1,000) is modelled as an exponential
random variable with parameter λ = 1.25. An analyst is interested in S, the total of
10 such independent claim amounts. In particular he wishes to calculate the
probability that S exceeds £10,000.
(i) (a) Show, using moment generating functions, that:
(1) S has a gamma distribution, and
(2) 2.5S has a $\chi^2$ distribution with 20 degrees of freedom.
(b) Use tables to calculate the required probability.
Question 26. A random sample of size n = 36 has sample standard deviation s = 7. Calculate,
approximately, the probability that the mean of this sample is greater than 44.5
when the mean of the population is µ = 42.
Question 27. Let X1, X2, X3, X4 and X5 be independent random variables, such that $X_i \sim$ gamma
with parameters i and λ for i = 1, 2, 3, 4, 5. Let $S = 2\lambda \sum_{i=1}^{5} X_i$.
(i) Derive the mean and variance of S using standard results for the mean and
variance of linear combinations of random variables.
(ii) Show that S has a chi-square distribution using moment generating
functions and state the degrees of freedom of this distribution.
(iii) Verify the values found in part (i) using the results of part (ii).
Question 28. An insurance company experiences claims at a constant rate of 150 per year. Find
the approximate probability that the company receives more than 90 claims in a
period of six months.
Question 29. A woodcutter has to cut 100 fence posts of a standard length and he has a metal
bar of the required length to act as the standard. The woodcutter decides to vary
his procedure from post to post − he cuts the first post using the metal standard,
then uses this post as his standard for the cut of the next post. He continues in a
similar manner, each time using the most recently cut post as the standard for the
next cut.
Each time the woodcutter cuts a post there is an error in the length cut relative to
the standard being employed for that cut − you should assume that the errors are
independent observations of a random variable with mean 0 and standard
deviation 3mm.
Calculate, approximately, the probability that the length of the final post differs
from the length of the original metal standard by more than 15mm.
Question 30. Suppose that the time T, measured in days, until the next claim arises under a
portfolio of non-life insurance policies, follows an exponential distribution with
mean 2.
(i) Find the probability that no claim is made in the next one-day period.
(ii) The median of a random variable is defined as the value for which the
cumulative distribution function of the variable is equal to 0.5.
Find the median time until the next claim arises.
(iii) Now let T1, T2, …, T30 be the times (in days) until the next claim arises
under each one of 30 similar portfolios of non-life insurance policies, and
assume that each Ti, i = 1,…,30, follows an exponential distribution with
mean 2, independently of all others.
Calculate, approximately, the probability that the total of all 30 times
which elapse until a claim arises on each of the portfolios exceeds 45 days.
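A short sketch of the three calculations in Question 30, assuming the Central Limit Theorem is used for part (iii) without any further correction. The names below are illustrative and not part of the workbook.

```python
from math import exp, log, sqrt
from statistics import NormalDist

mean_T = 2.0
rate = 1 / mean_T  # exponential rate parameter

# (i) P(no claim in the next day) = P(T > 1)
p_no_claim_one_day = exp(-rate * 1)

# (ii) the median solves 1 - exp(-rate * t) = 0.5
median_T = log(2) / rate

# (iii) the total of 30 independent times has mean 30*2 and variance 30*2^2;
# approximate it by a normal distribution (CLT)
n = 30
total = NormalDist(n * mean_T, sqrt(n * mean_T ** 2))
p_total_exceeds_45 = 1 - total.cdf(45)

print(round(p_no_claim_one_day, 4), round(median_T, 4), round(p_total_exceeds_45, 4))
```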
Satya Niketan | North Campus | Mumbai | Kolkata | Jaipur | Siliguri
Page 65
www.sankhyiki.in
+91-9711150002
Question 31. A university runs a 3-year B.Sc degree course in Statistics. The course is divided
over 6 semesters each consisting of 5 credit papers over the three year period.
Each credit paper is assessed on a maximum possible 100 marks and is recorded
as integers.
At the end of the course, the university ranks the students based on a measure
called “grade point average” which is the average of marks obtained over all
credit papers examined over 3 years.
Assume that in each subject, the instructor makes an error of quantum k in
awarding marks with probability $\frac{1}{20|k|}$, where k = ±1, ±2, ±3, ±4, ±5. Assume
that these errors occur independently.
(a) Show that the probability of no error is 463/600.
(b) State the approximate distribution of the quantum of error in a given student’s
final grade point average using the Central Limit Theorem.
(c) Hence show that there is only a 17.7% chance that the student’s final grade point
average is accurate to within ±0.05.
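The figures quoted in Question 31 can be checked with a small sketch. It assumes the error probability is 1/(20|k|), the reading that reproduces 463/600, and that the grade point average is taken over 6 × 5 = 30 papers. Variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt
from statistics import NormalDist

# Assumed error distribution: P(error = k) = 1 / (20|k|) for k = ±1, ..., ±5
errors = [k for k in range(-5, 6) if k != 0]
prob = {k: Fraction(1, 20 * abs(k)) for k in errors}

# (a) probability of no error in a single paper
p_no_error = 1 - sum(prob.values())

# The mean error is 0 by symmetry, so the variance is E[k^2]
var_single = sum(k * k * p for k, p in prob.items())

# (b) CLT: the average error over 30 papers is roughly N(0, var_single / 30)
papers = 6 * 5
gpa_error = NormalDist(0, sqrt(float(var_single) / papers))

# (c) probability that the average error lies within ±0.05
p_within = gpa_error.cdf(0.05) - gpa_error.cdf(-0.05)
print(p_no_error, var_single, round(p_within, 3))
```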
Question 32. It is known from past experience that the daily tip amount that a waiter in a restaurant
gets is a random variable with mean Rs.100 and standard deviation Rs.10.
(i) Assuming that the number of tips is sufficiently large, calculate the
number of tips required to ensure with at least 0.95 probability that the
average daily tip would exceed Rs.98.
(ii) Given that the waiter gets 64 tips on a particular day and that the tip amounts
are independent, what is the probability that the total tip amount is
greater than Rs.6500 on the given day?
ANSWERS
Ans.1. 0.62594
Ans.2. 0.065907, 0.637622
Ans.3. 0.22663
Ans.4. 0.58919, 0.044565
!
Ans.5. E(X) = 0, V(X) = !, 0.21375
Ans.6. 0.62836
Ans.7. (ii)
Ans.8. 0.179168
Ans.9. 0.11034
Ans.10. 0.06681
Ans.11. 0.99172
Ans.12. 0.02275
Ans.13. P [𝑌 > 20] = 0.59871
Ans.14. P (𝑋 > 40) = 0.0267
Ans.15. (i) S~N (0, n/12) (ii) S~N (0, 1) if n=12
Ans.16. P (𝑆 > 425) = 0.106
Ans.17. P (𝑋 > 120) = 0.014228
Ans.18. P (𝑋 > 115) = 0.98986
Ans.19. P (𝑋 < 300) = 0.0084
Ans.20. P (𝑋 < 50) = 0.0599
Ans.21. P (𝑋 ≥ 139) = 0.04385
Ans.22. (i) 𝑋~𝑁(40, 160) (ii) X is symmetric so P[X < 40] = 0.5 (iii) 0.5421
(iv) Although the sample size here is small, the CLT gives an answer which is close to
the exact probability.
Ans.23. 0.853
Ans.24. 0.933
Ans.25. (i)(b) 0.2014 (ii)(a) 𝑆~𝑁(8, 6.4) and 0.214
(b) n is not particularly large for the use of the CLT, but the approximation is still
quite close to the true probability.
Ans.26. 0.016
Ans.27. (i) E(S) = 30 V(S) = 60
Ans.28. 0.037
Ans.29. 0.617
ASSIGNMENT – 6
POINT ESTIMATION
Question 1. It is known that a random sample of 12, 11.2, 13.5, 12.3, 13.8, and 11.9 comes
from a population having the following p.d.f.
$f(x; \theta) = \dfrac{\theta}{x^{\theta+1}}$ for $x > 1$, $\theta > 1$, and $0$ otherwise. Find
$f(x) = m(\theta - x)$ for $0 \le x \le \theta$
$f(x) = 0$ otherwise
Question 4. Twenty electronic tubes were put to test and the test continued till all of them
failed. The failure times (in hours) were recorded.
9.9    35.5   57.9   94.6   141.4
154.4  163.3  226.7  244.3  337.2
391.8  417.2  444.6  461.2  497.1
582.6  606.8  616.3  672    784.7
Total hours: 6939.5
a. Write down the likelihood function and hence by drawing the rough
sketch of the likelihood function, obtain the maximum likelihood
estimator (MLE) for θ.
b. Examine if the MLE is unbiased.
$f(x, \theta) = \dfrac{\theta^{m} x^{m-1} e^{-\theta x}}{(m-1)!}$; $x \ge 0$, $\theta > 0$
where m is a known integer ≥ 2. Show that $\dfrac{m-1}{X}$
is an unbiased estimator of θ.
(b) Let $X_1, X_2, \ldots, X_n$ be a random sample from $f_\theta(x)$ where
$f_\theta(x) = (1 + \theta)x^{\theta}$; $0 < x < 1$
$f_\theta(x) = 0$ otherwise
Obtain the MLE of θ.
(c) Let $X_1, X_2, \ldots, X_n$ be a random sample from $f_\theta(x)$ where
$f_\theta(x) = \theta e^{-\theta x}$; $x > 0$
$f_\theta(x) = 0$; otherwise
Obtain the Cramer-Rao Lower Bound (CRLB) for the variance of an unbiased
estimator of θ, assuming that the regularity conditions are satisfied.
Question 7. Let $X_1, X_2, \ldots, X_n$ be a random sample from the following density function
$f(x; \theta) = \dfrac{2x}{\theta^{2}}$; $0 < x < \theta$, $\theta > 0$
Question 8. When Ramesh was appointed as the laboratory assistant on 1st Jan 2008 to
observe the lifetime of mice, there were 10 mice in the laboratory. His assignment
was to observe the lifetimes of the mice for 100 weeks and then estimate
the expected remaining lifetime (in weeks) of mice as at 1st Jan 2008. 7 mice died
within the 100-week period at the following times (in weeks)
and 3 mice were alive at the end of the 100th week. Assuming that the future lifetime
(as at 1st Jan 2008) follows Exp (λ) with density function
a) Write down the likelihood function for the sample of 10 life times that
Ramesh observed.
b) Compute the MLE of λ based on this likelihood.
c) What is the asymptotic variance of the MLE?
Ganesh (Ramesh’s boss) is the laboratory in-charge. He knows that the
experiment actually started 25 weeks before 1st Jan 2008 with 15 mice. 5
mice had died before 1st Jan 2008 (Ramesh didn’t know about it). Even
Ganesh did not have the exact weeks in which these 5 mice died. He only
knows that they had died before 1st Jan 2008. Based on this new
information Ramesh wanted to correct the likelihood function.
d) What is the correct likelihood function for the lifetime (starting 25 weeks
before 1st Jan 2008) of 15 mice?
Question 9. Suppose that $\hat{\theta}$ is an unbiased estimator of a parameter θ and has variance $\theta^{2}/10$.
Derive an expression for the mean square error of $k\hat{\theta}$, where k is a constant, and
determine the value of k for which the mean square error is a minimum.
i. Write down the likelihood function L(p) and show that the maximum
!
likelihood estimator (MLE) of p is 𝑝 = !
ii. (a) Determine the Cramer-Rao lower bound for the estimation of p.
(b) Show that the variance of the MLE is equal to the Cramer-Rao lower
bound.
(c) Write down an approximate sampling distribution for $\hat{p}$ valid for large
n.
Question 11. The number of claims, X, which arise in a year on each policy of a particular class
is to be modelled as a Poisson random variable with mean λ. Let
$X = (X_1, X_2, \ldots, X_n)$ be a random sample of size n from the distribution of X, and
let $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$. Suppose that it is required to estimate λ, the mean number of
claims on a policy.
i. Show that $\hat{\lambda}$, the maximum likelihood estimator of λ, is given by $\hat{\lambda} = \bar{X}$.
ii. Derive the Cramer-Rao lower bound (CRLB) for the variance of unbiased
estimators of λ.
iii. (a) Show that $\hat{\lambda}$ is unbiased for λ and that it attains the CRLB.
(b) Explain clearly why, in the case that n is large, the distribution of $\hat{\lambda}$
can be approximated by $\hat{\lambda} \sim N\left(\lambda, \frac{\lambda}{n}\right)$.
Question 12. Claims in a portfolio are believed to follow an Exp (λ) distribution. There is a
retention limit of 1,000 in force, and claims in excess of 1,000 are paid by the
reinsurer. The insurer, wishing to estimate λ, observes a random sample of 100
claims, and finds that the average amount of the 90 claims that do not exceed
1,000 is 82.9. There are 10 claims that do exceed the retention limit. Find the
MLE for λ.
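A sketch of the censored-data calculation in Question 12, assuming the usual likelihood in which each claim below the retention contributes the density λe^{-λx} and each claim above it contributes the survival probability e^{-1000λ}. The variable names are illustrative.

```python
n_uncensored = 90        # claims observed in full
mean_uncensored = 82.9   # their average amount
n_censored = 10          # claims known only to exceed the retention
retention = 1000

# log L = n_uncensored * log(lam) - lam * (sum of observed claims + n_censored * retention)
# Setting the derivative to zero gives lam_hat = n_uncensored / total_exposure.
total_exposure = n_uncensored * mean_uncensored + n_censored * retention
lam_hat = n_uncensored / total_exposure
print(round(lam_hat, 6))
```

Under this assumed likelihood the value matches the tabulated answer of 0.005154.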
Question 14. A random sample 𝑥! , 𝑥! , … , 𝑥!" is taken from a distribution having the density
! !/!
function: 𝑓 𝑥 = ! 𝑥 !!/! 𝑒 !!! ,𝑥 > 0
Determine the:
Question 15. The discrete random variable X has the following probability function:
P(X = x) = 0.2 + αx x = -2, -1, 0, 1, 2.
(i) State the possible values that α can take.
(ii) Given a random sample x1 , x2 , ..., xn from this distribution, determine the
method of moments estimate of α and show that this can result in
inadmissible estimates (i.e. estimates outside the range of possible values
of α).
Question 17. Claim amounts of a certain type are modelled using a normal distribution with an
unknown mean and a known standard deviation σ = £20.
For a random sample of 20 claim amounts all that is known is that 5 of them are
greater than £200.
(i) Let 𝜃 be the probability that a claim amount is greater than £200. Write
down the maximum likelihood estimate of 𝜃.
(ii) Determine 𝜃 in terms of µ and hence calculate the maximum likelihood
estimate of µ.
Question 18. It has been decided to model a claim amount distribution using a gamma
distribution with parameters α = 4 and λ (unknown), that is, with density
$f(x; \lambda) = \frac{1}{6}\lambda^{4} x^{3} e^{-\lambda x}$; $0 < x < \infty$
A random sample of n claim amounts, X1, X2, …, Xn, is selected and it is
required to estimate the parameter λ.
(i) Determine the method of moments estimator of λ.
(ii) Show that the MLE of λ is the same as the method of moments estimator.
Question 20. The percentage return on an investment of a particular type over a period of one
year is to be modelled as a normal random variable X with mean µ and variance 1.
A potential investor is interested in the chance that the return on such an
investment will exceed 9%.
A random sample of ten such returns have values
7.3, 8.9, 8.3, 6.2, 9.8, 7.7, 9.4 , 7.9, 9.1, and 7.4 .
Calculate the maximum likelihood estimate of θ = P(X > 9).
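A minimal sketch for Question 20, relying on the invariance property of maximum likelihood: the MLE of µ is the sample mean (the variance is known to be 1), and the MLE of θ = P(X > 9) is obtained by plugging that estimate into the normal tail probability. Names are illustrative.

```python
from statistics import NormalDist, mean

returns = [7.3, 8.9, 8.3, 6.2, 9.8, 7.7, 9.4, 7.9, 9.1, 7.4]

# MLE of the mean of a normal distribution with known variance 1
mu_hat = mean(returns)

# Invariance: MLE of theta = P(X > 9) under N(mu_hat, 1)
theta_hat = 1 - NormalDist(mu_hat, 1).cdf(9)
print(round(mu_hat, 2), round(theta_hat, 4))
```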
Question 21. In a quality control test, a random sample of 100 items is selected, of which 90 are
found to be satisfactory. The value of p, the probability of an item being
satisfactory, is unknown.
Write down the probability of observing 90 satisfactory items out of 100 as a
function of p, and thus derive the maximum likelihood estimate of p.
Question 22. A simple model for the movement of a stock price is such that, independently in
each time period, the stock either:
goes up with probability $(\frac{1}{4} - \theta)$; stays the same with probability $(\frac{5}{8} + 2\theta)$;
goes down with probability $(\frac{1}{8} - \theta)$.
(i) Determine the range of admissible values of the parameter θ.
(ii) (a) Calculate the probability that the stock goes down in one time
period, in the case θ = 0.1.
(b) Calculate the probability that the stock stays the same for two
consecutive time periods, in the case θ = 0.
(c) Calculate the probability that, in four time periods, the stock goes
up twice and stays the same twice, in the case θ = −0.2.
(iii) Data are collected for 80 consecutive time periods and yield the following
observed frequencies
change in stock       up   same   down
no. of time periods   24   35     21
(a) Write down an expression for L(θ), the likelihood of these data,
and show that $\frac{d}{d\theta}\log L(\theta) = 0$ reduces to the equation
$5120\theta^{2} - 468\theta - 95 = 0$
(b) Explain why one of the roots of this quadratic yields the maximum
likelihood estimate of θ and hence determine this estimate.
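The root selection in part (iii)(b) can be sketched as follows. The check assumes the three probabilities are 1/4 − θ, 5/8 + 2θ and 1/8 − θ, which is the reading consistent with the quadratic above; the helper `is_admissible` is hypothetical.

```python
from math import sqrt

# Quadratic from part (iii)(a): 5120*theta^2 - 468*theta - 95 = 0
a, b, c = 5120, -468, -95
disc = b * b - 4 * a * c
roots = [(-b - sqrt(disc)) / (2 * a), (-b + sqrt(disc)) / (2 * a)]

def is_admissible(theta):
    """True if all three assumed probabilities lie in [0, 1]."""
    probs = (1 / 4 - theta, 5 / 8 + 2 * theta, 1 / 8 - theta)
    return all(0 <= p <= 1 for p in probs)

# Only an admissible root can be the maximum likelihood estimate
theta_hat = [r for r in roots if is_admissible(r)]
print([round(r, 4) for r in roots], [round(t, 4) for t in theta_hat])
```

Under this reading the larger root makes the "down" probability negative, leaving the negative root as the estimate.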
Question 23. A random sample of size n is taken from an exponential distribution with
parameter λ, that is, with pdf
$f(x) = \lambda e^{-\lambda x}$, $0 < x < \infty$
(i) Determine the MLE of λ.
Claim sizes for certain policies are modelled using an exponential distribution
with parameter λ. A random sample of such claims results in the value of the
MLE of λ as $\hat{\lambda} = 0.00124$.
A large claim is defined as one greater than £4,000 and the claims manager is
particularly interested in p, the probability that a claim is a large claim.
(ii) Determine $\hat{p}$, the MLE of p, explaining why it is the MLE.
Question 24. The size of claims (in units of £1,000) arising from a portfolio of house contents
insurance policies can be modelled using a random variable X with probability
density function (pdf) given by:
$f_X(x) = \dfrac{a c^{a}}{x^{a+1}}$, $x \ge c$
where $a > 0$ and $c > 0$ are the parameters of the distribution.
(i) Show that the expected value of X is $E[X] = \dfrac{ac}{a-1}$, for a > 1.
(ii) Verify that the cumulative distribution function of X is given by
$F_X(x) = 1 - (c/x)^{a}$, $x \ge c$
Suppose that for the distribution of claim sizes X, it is known that c = 2.5, but a is
unknown and needs to be estimated given a random sample x1, x2, …, xn.
(iii) Show that the MLE of a is given by:
$\hat{a} = \dfrac{n}{\sum_{i=1}^{n} \log\left(\frac{x_i}{2.5}\right)}$
(iv) Derive the asymptotic variance of the MLE $\hat{a}$, and hence determine its
approximate asymptotic distribution.
Consider a sample of 30 observations from this distribution, for which
$\sum_{i=1}^{30} \log x_i = 32.9$
(v) Calculate the MLE $\hat{a}$ in this case, together with an approximate 95%
confidence interval for a.
In the current year, claim sizes are assumed to follow the distribution of X with
a = 6, c = 2.5. Inflation for the following year is expected to be 5%.
(vi) Calculate the probability that the size of a claim arising from this portfolio
in the following year will exceed £4,000.
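A numerical sketch for parts (v) and (vi) of Question 24. It assumes the MLE is n / Σ log(x_i / c), that the asymptotic variance is a²/n (from the second derivative of the log-likelihood, −n/a²), and that inflation simply scales every claim by 1.05. Names are illustrative.

```python
from math import log, sqrt

n, c = 30, 2.5
sum_log_x = 32.9

# (v) MLE of a: n / sum(log(x_i / c)) = n / (sum(log x_i) - n * log c)
a_hat = n / (sum_log_x - n * log(c))

# Assumed asymptotic variance a^2 / n, with a_hat plugged in
se_a = a_hat / sqrt(n)
ci = (a_hat - 1.96 * se_a, a_hat + 1.96 * se_a)

# (vi) next year's claims are 1.05 * X, i.e. a Pareto of the same form with c scaled by 1.05
a, inflation = 6, 1.05
p_exceeds_4 = (c * inflation / 4) ** a

print(round(a_hat, 3), tuple(round(v, 2) for v in ci), round(p_exceeds_4, 4))
```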
Question 25. A life insurance company runs a statistical analysis of mortality rates. The
company considers a population of 100,000 individuals. It assumes that the
number of deaths X during one year has a Poisson distribution with expectation
E[X] = µ. Over four years the company has observed the following realisations of
X (number of deaths).
Year                                   1      2      3      4
Number of deaths (per 100,000 lives)   1,140  1,200  1,170  1,190
The maximum likelihood estimator for the parameter µ of the Poisson distribution
is given by $\bar{X}$.
(i) Obtain the maximum likelihood estimate of the parameter µ using these
data.
A statistician suggests using a Poisson distribution for the number of deaths per
year in each group, where the parameter μ depends on the middle age in that
group. Under the suggested model the number of deaths in the group with middle
age ti is given by Xi ~ Poisson(μi) with μi = wti, where ti is the middle age of the
group that the individual belongs to at the time of death.
(ii) Derive a maximum likelihood estimator for the parameter w and estimate
the value of w from the data in the above table if
𝑥! = 1179 𝑎𝑛𝑑 𝑡! = 160.
Question 26. The number of claims made by each policyholder in a certain class of business is
modeled as having a Poisson distribution with mean λ.
(i) Derive an expression for the probability, p, that a policyholder in this class
has made at least one claim.
The claims records of 20 randomly chosen policyholders were examined and the
number of policyholders that made at least one claim in a year, X, was recorded.
(ii) (a) State the distribution of the random variable X and its parameters.
(b) Derive an expression for the maximum likelihood estimator of the
probability p given in (i) using your answer in (ii)(a).
(iii) Show that, in the case X = 5, the maximum likelihood estimate (MLE) of p
is $\hat{p} = 0.25$ and hence calculate the MLE of λ.
It is now found that of the five policyholders who had made at least one claim
there were four who had made exactly one claim and one who had made two
claims.
(iv) Calculate the MLE of λ given this additional information.
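A short sketch of the two estimates of λ in Question 26. It assumes that the 15 policyholders with no recorded claim each made exactly zero claims, so that part (iv) uses the full Poisson sample. Names are illustrative.

```python
from math import log

n, x = 20, 5

# (iii) MLE of p from X ~ Bin(20, p), then invariance with p = 1 - exp(-lam)
p_hat = x / n
lam_hat_grouped = -log(1 - p_hat)

# (iv) full claim counts: 15 zeros, four ones and one two;
# the Poisson MLE is the sample mean
claims = [0] * 15 + [1] * 4 + [2]
lam_hat_full = sum(claims) / len(claims)

print(p_hat, round(lam_hat_grouped, 4), lam_hat_full)
```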
Question 27. An experiment has three possible outcomes (A, B, C) and a model states that the
probabilities of these outcomes are θ, θ², and 1 − θ − θ² respectively, for some
suitable value of θ > 0.
Let nA, nB, and nC be the number of occurrences of outcomes A, B, and C
respectively in n (= nA + nB + nC) repetitions of the experiment. Let log L(θ)
represent the log-likelihood function, and let $U(\theta) = \dfrac{d \log L(\theta)}{d\theta}$.
(i) (a) Show that
$U(\theta) = \dfrac{n_A + 2n_B}{\theta} - \dfrac{n_C(1 + 2\theta)}{1 - \theta - \theta^{2}}$
(b) Hence find a quadratic equation whose solution gives the
maximum likelihood estimate of θ.
(ii) (a) Find an expression for $\dfrac{dU(\theta)}{d\theta}$.
(b) Hence show that
$E\left[-\dfrac{dU(\theta)}{d\theta}\right] = \dfrac{n(1 + 4\theta - \theta^{2})}{\theta(1 - \theta - \theta^{2})}$
The results of 100 repetitions of the experiment show that outcome A occurred 51
times, outcome B occurred 16 times, and outcome C occurred 33 times.
(iii) (a) Show that the maximum likelihood estimate of θ is $\hat{\theta} = 0.4525$.
(b) Calculate an estimate of the asymptotic standard error of $\hat{\theta}$.
(c) Find an approximate 95% confidence interval for θ.
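Part (iii) can be checked numerically. The sketch assumes outcome probabilities θ, θ² and 1 − θ − θ², so that setting the score to zero gives (nA + 2nB + 2nC)θ² + (nA + 2nB + nC)θ − (nA + 2nB) = 0, and it assumes an expected information of n(1 + 4θ − θ²) / (θ(1 − θ − θ²)). Names are illustrative.

```python
from math import sqrt

n_a, n_b, n_c = 51, 16, 33
n = n_a + n_b + n_c

# Quadratic obtained from U(theta) = 0 under the assumed model
qa = n_a + 2 * n_b + 2 * n_c
qb = n_a + 2 * n_b + n_c
qc = -(n_a + 2 * n_b)
theta_hat = (-qb + sqrt(qb * qb - 4 * qa * qc)) / (2 * qa)  # positive root

# Assumed expected information, evaluated at theta_hat
info = n * (1 + 4 * theta_hat - theta_hat ** 2) / (
    theta_hat * (1 - theta_hat - theta_hat ** 2)
)
se = 1 / sqrt(info)
ci = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)

print(round(theta_hat, 4), round(se, 4), tuple(round(v, 3) for v in ci))
```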
Question 28. A random sample of size n is taken from a gamma distribution with parameters
α = 8 and λ = 1/θ. The sample mean is $\bar{X}$ and θ is to be estimated.
(i) Determine the method of moments estimator (MME) of θ.
(ii) Find the bias of the MME determined in part (i).
(iii) (a) Determine the mean square error of the MME of θ.
(b) Comment on the efficiency of the MME of θ based on your answer
in part (iii)(a).
Question 29. A regulator wishes to inspect a sample of an insurer’s claims. The insurer
estimates that 10% of policies have had one claim in the last year and no policies
had more than one claim. All policies are assumed to be independent.
(i) Determine the number of policies that the regulator would expect to
examine before finding 5 claims.
On inspecting the sample claims, the regulator finds that actual payments
exceeded initial estimates by the following amounts:
£35 £120 £48 £200 £76
(ii) Find the mean and variance of these extra amounts.
(iii) It is assumed that these amounts follow a gamma distribution with
parameters α and λ. Estimate these parameters using the method of
moments.
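A sketch of the calculations in Question 29, assuming the sample variance uses the divisor n − 1 (the convention that reproduces the tabulated answer) and matching the gamma mean α/λ and variance α/λ² to the sample values. Names are illustrative.

```python
from statistics import mean, variance

# (i) expected number of policies examined to find 5 claims,
# each policy having a claim with probability 0.1
expected_policies = 5 / 0.1

extra = [35, 120, 48, 200, 76]

# (ii) sample mean and sample variance (divisor n - 1)
m = mean(extra)
v = variance(extra)

# (iii) method of moments: alpha / lam = m and alpha / lam^2 = v
lam_hat = m / v
alpha_hat = m * lam_hat

print(expected_policies, m, round(v, 1), round(alpha_hat, 2), round(lam_hat, 4))
```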
Question 30. The random variable X has a distribution with probability density function given
by
$f(x) = \dfrac{2x}{\theta^{2}}$; $0 \le x \le \theta$
$f(x) = 0$ otherwise
where θ is the parameter of the distribution.
(i) Derive expressions in terms of θ for the expected value and the variance of
X.
Suppose that X1, X2, ..., Xn is a random sample, with mean $\bar{X}$, from the distribution
of X.
(ii) Show that the estimator $\hat{\theta} = \dfrac{3\bar{X}}{2}$ is an unbiased estimator of θ.
Question 31. An actuary is considering statistical models for the observed number of claims, X,
which occur in a year on a certain class of non-life policies. The actuary only
considers policies on which claims do actually arise. Among the considered
models is a model for which
$P(X = x) = \dfrac{-1}{\log(1-\theta)}\,\dfrac{\theta^{x}}{x}$, $x = 1, 2, 3, \ldots$
where θ is a parameter such that 0 < θ < 1.
Suppose that the actuary has available a random sample X1, X2, …, Xn with
sample mean $\bar{X}$.
(i) Show that the method of moments estimator (MME), $\tilde{\theta}$, satisfies the
equation $\bar{X}(1 - \tilde{\theta})\log(1 - \tilde{\theta}) + \tilde{\theta} = 0$.
(ii) (a) Show that the log likelihood of the data is given by
$l(\theta) \propto -n\log\{-\log(1 - \theta)\} + \sum_{i=1}^{n} x_i \log(\theta)$
(b) Hence verify that the maximum likelihood estimator (MLE) of θ is
the same as the MME.
(iii) Suggest two ways in which the MLE of θ can be computed when a
particular data set is given.
Question 32. In order to estimate a certain probability of success, a single observation is taken
from the binomial random variable X ~ Bin(20, p).
(i) Write down an expression for the mean square error of the maximum
likelihood estimator $\hat{p} = \dfrac{X}{20}$ and evaluate this mean square error at p = 0.5.
(ii) Determine an expression for the mean square error of the estimator
$\tilde{p} = \dfrac{X+1}{21}$
and evaluate this mean square error at p = 0.5.
(iii) Comment briefly on the comparison of $\hat{p}$ and $\tilde{p}$ as estimators in the case p = 0.5.
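The comparison in Question 32 can be sketched numerically. The sketch assumes the alternative estimator is (X + 1)/21, the reading that reproduces the tabulated mean square error of 0.0119 at p = 0.5; the function names are hypothetical.

```python
def mse_mle(p, n=20):
    """MSE of X/n for X ~ Bin(n, p): unbiased, so MSE equals the variance."""
    return p * (1 - p) / n

def mse_alt(p, n=20):
    """MSE of the assumed alternative (X + 1)/(n + 1): variance plus squared bias."""
    var = n * p * (1 - p) / (n + 1) ** 2
    bias = (n * p + 1) / (n + 1) - p
    return var + bias ** 2

print(round(mse_mle(0.5), 4), round(mse_alt(0.5), 4))
```

At p = 0.5 the biased estimator has the smaller mean square error, which is the point of part (iii).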
Question 33. Let X1, X2, …, Xn be a random sample from a distribution with parameter θ and density
function:
$f(x) = \dfrac{2x}{\theta^{2}}$; $0 \le x \le \theta$
$f(x) = 0$; otherwise
Suppose that $x = (x_1, x_2, \ldots, x_n)$ is a realization of X1, X2, …, Xn.
(i) (a) Derive the likelihood function $L(\theta; x)$ and produce a rough sketch of its
graph.
(b) Use the graph produced in part (i)(a) to explain why the maximum
likelihood estimate of θ is given by x(n) = max{x1, x2, …, xn}.
Let X(n) = max{X1, X2, …, Xn} be the estimator of θ, that is, the random variable
corresponding to x(n).
(ii) (a) Show that the cumulative distribution function of the estimator X(n) is
given by:
$F_{X_{(n)}}(x) = \left(\dfrac{x}{\theta}\right)^{2n}$ for $0 \le x \le \theta$
(b) Hence, derive the probability density function of the estimator X(n).
(c) Determine the expected value E(X(n)) and the variance V(X(n)).
(d) Show that the estimator $\dfrac{2n+1}{2n} X_{(n)}$ is an unbiased estimator of θ.
(iii) (a) Derive the mean square error of the estimator given in part (ii)(d).
(b) Comment on the consistency of this estimator.
Question 34. Consider the following discrete distribution with an unknown parameter p for the
distribution of the number of policies with 0, 1, 2, or more than 2 claims per year in a
portfolio of n independent policies.
The following frequencies are observed in a portfolio of n = 200 policies during the
year 2012:
number of claims 0 1 2 more than 2
observed frequency 123 58 13 6
A statistician proposes that the parameter p can be estimated by $\hat{p} = 58/200 = 0.29$ since
p is the probability that a randomly chosen policy leads to one claim per year.
(iii) Estimate the parameter p using the estimator derived in part (i).
(iv) Explain why your answer to part (iii) is different from the proposed estimated
value of 0.29.
An alternative model is proposed where the probability function has the form
number of claims 0 1 2 more than 2
probability p 2p 0.25p 1− 3.25p
(v) Explain how the maximum likelihood estimator suggested in part (i) needs to be
adapted to estimate the parameter p in this new model.
(vi) Suggest a suitable test to use to make a decision about which of the two models
should be used based on empirical data.
Question 35. Let X1, X2, …, X6 be a random sample from a population following a Gamma(2,1)
distribution. Consider the following two estimators of the mean of this distribution:
! !
𝜃! = 𝑋 and 𝜃! = !" 𝑋! + 𝑋! + 𝑋! + !" 𝑋! + 𝑋! + 𝑋!
where $\bar{X}$ is the mean of the sample.
(i) Determine the sampling distribution of $\bar{X}$ using moment generating
functions.
(ii) Derive the bias of each estimator $\hat{\theta}_1$ and $\hat{\theta}_2$.
Question 36. Let (X1, X2, . . . , Xn) be a random sample from a uniform distribution on the interval
(−θ, θ), where θ is an unknown positive number.
A particular sample of size 5 gives values 0.87, −0.43, 0.12, −0.92, and 0.58.
(i) Draw a rough graph of the likelihood function L(θ) against θ for this sample.
(ii) State the value of the maximum likelihood estimate of θ.
a) Show that:
! ! ! ! !
𝑃 𝑆=𝜃 =! 𝑃 𝑆 =𝜃−! =! 𝑃 𝑆 =𝜃+! =!
b) Calculate the bias of S as an estimator of θ. Is S unbiased?
c) Calculate the mean squared error of S as an estimator of θ.
!
It can be shown that T is an unbiased estimator of θ with variance !.
d) Which of the estimators T or S should Shriya use then if she wants to minimize
the error of her estimation?
Question 38. A certain type of insurance policy has a claim rate of λ per year and the cover
ceases and the policy expires after the first claim. Accordingly the duration of a
policy is modelled by an exponential distribution with density function
$\lambda e^{-\lambda x}$; $x > 0$.
A company has data on (m + n) policies which have expired and which may be
assumed to be independent. Of these, m policies had duration less than 5 years
and n policies had duration greater than or equal to 5 years.
(i) An investigator makes note of the actual durations, x1, …, xn, of the latter
group of n policies, but ignores the former group without even noting the
value of m.
(a) Explain why the xi’s come from a truncated exponential
distribution with density function $f(x) = k\lambda e^{-\lambda x}$, $x > 5$,
and show that $k = e^{5\lambda}$.
(b) Write down the likelihood for the data from the point of view of
this investigator and hence show that the maximum likelihood
estimate (MLE) of λ is given by $\hat{\lambda} = \dfrac{n}{\sum_{i=1}^{n} x_i - 5n}$
(c) The data yield the values: n = 10 and $\sum x_i = 71$. Calculate this
investigator’s MLE of λ.
(ii) A second investigator ignores the actual policy durations and simply notes
the values of m and n.
(a) Write down the likelihood for this information and hence show that
the resulting MLE of λ is given by $\hat{\lambda} = \dfrac{1}{5}\log\left(\dfrac{m+n}{n}\right)$
(b) The same data as in part (i) yield the values: m = 120 and n = 10.
Calculate this investigator’s MLE of λ.
(iii) The two investigators decide to pool their data, and so have the
information that there are m policies with duration less than 5 years, and n
policies with actual durations x1, ... , xn.
(a) Explain why the likelihood for this joint information is given by
$L(\lambda) = (1 - e^{-5\lambda})^{m} \prod_{i=1}^{n} \lambda e^{-\lambda x_i}$
and determine an equation, the solution of which will lead to the MLE
of λ.
(b) Given that this leads to an MLE of λ equal to 0.508, comment on the
comparison of the three MLE’s.
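The three estimates in Question 38 can be compared with a short sketch. It assumes the closed forms n / (Σx_i − 5n) and (1/5) log((m + n)/n) for the first two investigators, and solves the pooled score equation 5m e^{−5λ}/(1 − e^{−5λ}) + n/λ − Σx_i = 0 by bisection. The bracket [0.4, 0.6] and the function name are illustrative choices.

```python
from math import exp, log

n, sum_x, m = 10, 71.0, 120

# (i)(c) durations only, truncated at 5 years
lam_durations = n / (sum_x - 5 * n)

# (ii)(b) counts only
lam_counts = log((m + n) / n) / 5

def pooled_score(lam):
    """Derivative of the pooled log-likelihood with respect to lam."""
    q = exp(-5 * lam)
    return 5 * m * q / (1 - q) + n / lam - sum_x

# (iii) the score is decreasing in lam, so bisection on a bracketing interval works
lo, hi = 0.4, 0.6
for _ in range(60):
    mid = (lo + hi) / 2
    if pooled_score(mid) > 0:
        lo = mid
    else:
        hi = mid
lam_pooled = (lo + hi) / 2

print(round(lam_durations, 3), round(lam_counts, 3), round(lam_pooled, 3))
```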
Question 39. Suppose that $X_1, X_2, \ldots, X_n$ are independent and identically distributed Poisson(λ)
random variables.
(a) Find the maximum likelihood estimator of λ.
(b) Suppose that rather than observing the random variables precisely, only the
events "$X_i = 0$" or "$X_i > 0$" for i = 1, 2, …, n are observed. Find the maximum
likelihood estimator of λ under the new observation scheme.
ANSWERS
Ans.1. (a) $\hat{\theta} = 0.3970$ (b) $\hat{\theta} = 1.087$
Ans.2. (i) $m = 2/\theta^{2}$ (ii) $E(X) = \theta/3$, $V(X) = \theta^{2}/18$ (iii) $L(\theta) = \dfrac{2(\theta - x)}{\theta^{2}}$, $\theta \ge x$
!! !!
Ans.3. (i) 𝜆 = 𝑋 (ii) 𝐸 𝜆 = 𝜆 (iii) CRLB = !
(iv) 𝑉(𝜆) = !
! !!
Ans.4. (i) 𝜆 = ! = 0.0029 (ii) CRLB = ! (iii) 𝜆 = 0.0024
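Ans.5. (i) $L(\theta) = \dfrac{2^{n}}{\theta^{2n}} \prod_{i=1}^{n} x_i$ (ii) $E(\hat{\theta}) < \theta$, so $\hat{\theta}$ is biased
Ans.6. (a) $E\left[\dfrac{m-1}{X}\right] = \theta$ (b) $\hat{\theta} = \dfrac{-n}{\sum \log x_i} - 1$ (c) CRLB is $\dfrac{\theta^{2}}{n}$
Ans.9. k = 10/11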
! ! !!! !(!!!)
Ans.10. (i) 𝑝 = ! (ii) (a) CRLB = !
(b) Var(𝑝) = CRLB (c) 𝑝~𝑁 𝑝, !
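Ans.11. (i) $\hat{\lambda} = \bar{X}$ (ii) CRLB = $\dfrac{\lambda}{n}$ (iii) (a) $E(\hat{\lambda}) = \lambda$, $V(\hat{\lambda}) = \dfrac{\lambda}{n}$ (b) $\hat{\lambda} \sim N\left(\lambda, \dfrac{\lambda}{n}\right)$
Ans.12. $\hat{\lambda} = 0.005154$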
Ans.13. 𝑐 = 3.425
Ans.18. (i) $\hat{\lambda} = \dfrac{4}{\bar{X}}$
!!
Ans.19.
i 𝜆 = !!
(ii) Both are unbiased
Ans.20. (i) $-\frac{5}{16} \le \theta \le \frac{1}{8}$ (ii)(a) 0.025 (b) 0.391 (c) 0.062 (iii)(b) 0.0980
Ans.21. P(90 satisfactory items out of 100) = $\binom{100}{90} p^{90} (1 - p)^{10}$ and $\hat{p} = 0.9$
Ans.22. (i) $-\frac{5}{16} \le \theta \le \frac{1}{8}$ (ii)(a) 0.025 (b) 0.391 (c) 0.062
(iii) (a) $L(\theta) = \left(\frac{1}{4} - \theta\right)^{24} \left(\frac{5}{8} + 2\theta\right)^{35} \left(\frac{1}{8} - \theta\right)^{21}$
(b) 0.189 is inadmissible and −0.0980 is admissible, therefore the MLE is $\hat{\theta} = -0.0980$
Ans.23. (i) $\hat{\lambda} = \dfrac{1}{\bar{X}}$ (ii) $\hat{p} = 0.0070$
Ans.28. (i) MME = $\dfrac{\bar{X}}{8}$ (ii) Bias = 0 (iii) (a) MSE$\left(\dfrac{\bar{X}}{8}\right) = \dfrac{\theta^{2}}{8n}$
(iii)(b) MME gets more efficient (MSE gets smaller) as sample size increases
Ans.29. (i) Mean = 5/0.1 = 50 (ii) Mean = 95.8, Variance = 4454.2 (iii) α = 2.06 and
λ = 0.0215
Ans.30. (i) Mean = $\dfrac{2\theta}{3}$, Variance = $\dfrac{\theta^{2}}{18}$
Ans.31. (iii) The equation above needs to be solved numerically. Alternatively, the likelihood
(or log-likelihood) function can be plotted and the maximum can be identified
from the graph.
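Ans.32. (i) MSE = $\dfrac{p(1-p)}{20}$ = 0.0125 (ii) MSE = $\dfrac{20p(1-p)}{21^{2}} + \dfrac{(1-p)^{2}}{21^{2}}$ = 0.0119
(iii) Even though $\hat{p}$ is the MLE and is unbiased, $\tilde{p}$ is a more efficient estimator (for p =
0.5), having a smaller mean square error.
Ans.33. (i) (a) $L = \dfrac{2^{n} x_1 x_2 \cdots x_n}{\theta^{2n}}$
(b) $\hat{\theta} = x_{(n)} = \max\{x_1, x_2, \ldots, x_n\}$
(ii) (b) $f_{X_{(n)}}(x) = \dfrac{2n x^{2n-1}}{\theta^{2n}}$, $0 \le x \le \theta$
(c) $E[X_{(n)}] = \dfrac{2n\theta}{2n+1}$, $V[X_{(n)}] = \dfrac{n\theta^{2}}{(n+1)(2n+1)^{2}}$
(iii) (a) MSE = $\dfrac{\theta^{2}}{4n(n+1)}$ (b) We have MSE → 0 as n → ∞, therefore the estimator is
consistent.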
!
Ans.34. (i) 𝑝 = !.!"! (iii)0.2985
(iv) The MLE in part (iii) takes the structure of the entire probability function into
account while the estimator 58/200 only considers the number of policies with
one claim.
(v) No change required, since the MLE $\hat{p}$ turns out to depend only on the total
number of policies with fewer than three claims.
(vi) χ² test
Ans.35. (i) Gamma(12, 6) (ii) Bias($\hat{\theta}_1$) = 0 and Bias($\hat{\theta}_2$) = 0
(iii) MSE($\hat{\theta}_1$) = 0.333 and MSE($\hat{\theta}_2$) = 0.547
(iv) $\hat{\theta}_1$ has smaller MSE, therefore is more efficient than $\hat{\theta}_2$.
Ans.36. $L = \left(\dfrac{1}{2\theta}\right)^{n}$. So, as θ increases from zero, L(θ) is zero until it reaches the largest
observation in absolute value, i.e. max |xi|, i = 1, 2, …, n. For the data given, this value
is 0.92.
Ans.37. (b) Bias = 4/27 (c) 32/81
(d) S has a lower mean square error than T, so Shriya should use the estimator S for
guessing the value of θ, in spite of it being a biased estimator unlike T.
Ans.38. (i)(a) The xi’s are known to be such that xi > 5, and therefore have a density which is a scaled
form of $\lambda e^{-\lambda x}$ for x > 5.
The scaling constant k is such that $\int_{5}^{\infty} k\lambda e^{-\lambda x}\,dx = 1$
(c) 0.476
(ii) (b) 0.513
(iii) (b) All three are reassuringly close. The pooled estimate is between the first two (as
expected, but it is closer to 0.513).
Ans.39. (i) $\hat{\lambda} = \bar{X}$
(ii) $\hat{\lambda} = -\log\left(\dfrac{n-m}{n}\right)$ where m is the number of observations greater than 0
Ans.40. (i) An estimator is a rule, as a function of the random sample, often expressed as a
formula that tells how to calculate the value of an estimate.
• If we have a random sample $X = (X_1, X_2, \ldots, X_n)$ from a distribution with an
unknown parameter θ and $g(X)$ is an estimator of θ, it seems desirable that
$E[g(X)] = \theta$. This is the property of unbiasedness.
• The MSE of an estimator $g(X)$ for θ is defined by $\mathrm{MSE}[g(X)] = \mathrm{Var} + \mathrm{Bias}^{2}$
• It is also desirable that an estimator gets better as the sample size increases i.e. it
is desirable that MSE → 0 as n → ∞. This property is known as consistency.
(ii) Method of moments and MLE
(iii) Steps for finding the maximum likelihood estimator in two parameter cases are:
• Write down the likelihood function, L
• Find log L and simplify the resulting expression
• Partially differentiate log L with respect to each parameter to be estimated
• Set the derivatives equal to zero
• Solve these equations simultaneously.
• Check the condition that the Hessian matrix i.e. the matrix of second
derivatives, is negative definite.
(iv) The CRLB provides a lower bound for the variance of an unbiased estimator as a
function of the true parameter value. It can be used to compare the efficiency of
different estimators.
It also provides an approximate value for the variance of the MLE of a parameter
when the sample size is large. Hence, it may be used to obtain approximate
confidence intervals.
ASSIGNMENT - 7
Question 2. Consider the following testing problem. A random variable X has the following
distribution under $H_0$ and $H_1$.
X:             17    210
$H_0$: $P_0$:  0.3   0.7
$H_1$: $P_1$:  0.8   0.2
$H_0$ is rejected with probability ‘a’ when X = 17 and with probability ‘b’ when X =
210. Find ‘a’ and ‘b’ such that P (Type I error) = 1.6/3 and P (Type II error) =
0.4/3.
Question 3. (a) A tax preparation firm is interested in comparing the quality of work at two of
its regional offices. Out of 250 tax returns from office A, 35 have errors, whereas
out of 300 returns filed in office B, 27 have errors.
Test the hypothesis that the proportions of erroneous returns are equal at the 1% level.
(b) In a random sample of 100 articles taken from a large batch of articles, 8 are
found to be defective. Obtain a 95% confidence interval for the true proportion of
defectives in the batch.
Question 4. An insurance company has clients for its automobile policies in two regions A
and B. The company believes that the average claim amounts in both the areas are
not equal. A sample of 8 automobiles for which claims have been made is
selected at random from each of the two areas. The claim amounts (in 000’s) are
as follows.
Automobile No.:   1   2   3   4   5   6   7   8
Area A (x):      49  53  51  52  47  50  52  53
Area B (y):      52  55  52  53  50  54  54  53
Intelligent Dull
Boys 75 82 157
Girls 102 141 243
Total 177 223 400
Gender
Boys 49 29 78
Girls 31 91 122
Total 80 120 200
Carry out separate test for association between intelligence level and gender
according to the place of their domiciles. Comment briefly on your result.
Question 7. A municipal corporation asks a random sample of 800 people whether they are for
or against a ban on smoking in public places. Each person was classified by
smoking habit and educational level. The results of the sample survey are detailed
in the following table.
Question 8. The following figures give the price (Rs) of a certain commodity in a sample of
10 shops each selected at random from city A and city B.
City A: 7.41 7.77 7.44 7.40 7.38 7.93 7.58 8.28 7.23 7.52
City B: 7.08 7.49 7.42 7.04 6.90 7.22 7.68 7.74 7.28 7.43
i. Construct a dot plot and comment on the normality of the distribution and
the relative variabilities of the two sets of data.
ii. Formally test for the equality of variance of the prices between the two
cities at 5% level.
iii. Assuming equality of variance construct 95% confidence interval for the
presumed common standard deviation of the prices.
iv. Test at the 5% level that the mean price of the commodity in city A is
greater than Rs. 7.40.
Question 9. A social worker interested in reducing obesity among children visited a village
and recorded the weight (in Kg) of 20 children.
The following are the weights of these 20 children
65 62 70 62 64 72 55 50 60 60
70 64 55 63 64 70 56 54 50 69
$\sum x = 1235$, $\sum x^2 = 77117$
a) Plot the data using a plot and comment on whether measurements follow
normal distribution.
b) Calculate 95% confidence interval for the mean weight of obese children
in the village.
During the visit, the social worker advised the children to have routine physical
exercise and prescribed diets to reduce the weights. After two months the social
worker revisited the village and recorded the weights of the same children. The
data of recorded weights are given below in the same order of the children.
62 60 68 62 60 68 60 50 58 61
68 64 56 62 60 68 55 51 55 70
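For part (b) of Question 9, the interval can be computed directly from the two summary totals given for the first visit. This is a sketch assuming a t interval with 19 degrees of freedom; the critical value 2.093 is taken from standard t tables rather than computed.

```python
from math import sqrt

n = 20
sum_x, sum_x2 = 1235, 77117          # totals given in the question

mean = sum_x / n
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)   # sample variance
se = sqrt(s2 / n)                          # standard error of the mean

t_crit = 2.093  # upper 2.5% point of t with 19 d.f. (tabulated value)
ci = (mean - t_crit * se, mean + t_crit * se)
```

Rounded to two decimals this matches the interval quoted under Ans.9 (b).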
Question 10. A bird watcher sitting in a park has spotted a number of birds belonging to six
categories. The exact classification is given below.
Category 1 2 3 4 5 6
Frequency 6 7 13 14 9 5
Test at 5% level of significance whether or not the data are compatible with the
assumption that the park is visited by the same proportion of birds belonging to
these six categories in the population.
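A sketch of the goodness-of-fit statistic for Question 10, assuming the null hypothesis that all six categories are equally likely (expected frequency 54/6 = 9 each). The critical value is the tabulated upper 5% point of the chi-squared distribution with 5 degrees of freedom.

```python
observed = [6, 7, 13, 14, 9, 5]

total = sum(observed)
expected = total / len(observed)     # equal proportions under H0

chi2 = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1               # no parameters estimated

chi2_crit = 11.07  # upper 5% point of chi-squared with 5 d.f. (tabulated)
reject_h0 = chi2 > chi2_crit
```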
Question 11. In the surgical treatment of duodenal ulcers there are three different operations
corresponding to the removal of various amounts of the stomach. The three
operations are denoted A, B and C with A being the least traumatic and C the
most traumatic.
The data in the following table relate to a group of 417 patients and specify the
operation received and the degree of the side effects suffered.
Question 12. The following table gives the length of time required to assemble the device using
standard procedure and new procedure. Two groups of nine employees were
selected randomly, one group using the new procedure and the other following
standard procedure
It is given that $\sum_{i=1}^{9} (X_{1i} - \bar{X}_1)^2 = 195.5556$ and $\sum_{i=1}^{9} (X_{2i} - \bar{X}_2)^2 = 160.2222$.
a) Do the data present sufficient evidence to indicate that the mean time to
assemble the device under standard procedure is less than the mean time
under new procedure?
b) Obtain the 95% confidence interval for the difference in mean.
c) Test for the equality of the variance of the two procedures.
Question 13. Let X be a random variable following exponential distribution with density
Question 14. Two thousand individuals were chosen at random by a researcher and cross
classified according to gender and color blindness as given below:
Question 15. Eight pairs of slow learners with similar reading capabilities are identified in a
third grade class. One member of each pair is randomly assigned to the standard
teaching method, while the other is assigned to a new teaching method. The
scores are as given below.
Pair:         1   2   3   4   5   6   7   8
New Method:  77  74  82  73  87  69  66  80
Old Method:  72  68  76  68  84  68  64  76
a) Test for the difference between mean scores for the two methods.
b) Test for the equality of variance for these two methods.
c) Obtain the 95% confidence interval for the difference in means.
Question 16. A random sample of 200 stomach cancer patients yielded 92 having blood type
A, 20 having blood type B, 4 having blood type AB and 84 having blood type O.
Question 17. A software company has developed a new software package to help the system
analyst working in insurance industries to reduce the time required to design,
develop and implement an information system. To evaluate the benefits of this
new software the insurance company has selected 24 system analysts, out of
which 12 of them were instructed to produce the information system using current
technologies and the rest of them were trained and then were asked to produce the
information system using new software. The data set is as given below.
Current technology (x1): 300 280 344 385 372 360 288 321 376 290 301 283
New Software (x2):       276 222 310 338 200 302 317 260 320 312 334 265
i. Stating the hypotheses, test whether the new software package provides a
shorter mean project completion time than the current technology. Use
L.S = 0.05.
ii. Test the hypothesis that the variances of the project completion times are
equal. Use L.S = 0.05.
iii. Obtain 90% confidence interval for the difference between the means of
two populations.
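The three parts of Question 17 can be sketched as below, assuming independent normal samples and, for parts i and iii, a common variance (pooled two-sample t procedure). The value 1.717 is the tabulated upper 5% point of t with 22 degrees of freedom; it serves both the one-sided 5% test and the 90% interval.

```python
from math import sqrt
from statistics import mean, variance

current = [300, 280, 344, 385, 372, 360, 288, 321, 376, 290, 301, 283]
new = [276, 222, 310, 338, 200, 302, 317, 260, 320, 312, 334, 265]
n1, n2 = len(current), len(new)

diff = mean(current) - mean(new)
s1_sq, s2_sq = variance(current), variance(new)   # sample variances

# Part ii: variance-ratio statistic (larger variance on top).
f_stat = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)

# Part i: pooled two-sample t statistic for H0: mu1 = mu2 vs H1: mu1 > mu2.
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
se = sqrt(sp_sq * (1 / n1 + 1 / n2))
t_stat = diff / se

t_crit = 1.717  # upper 5% point of t with 22 d.f. (tabulated value)
reject_h0 = t_stat > t_crit

# Part iii: 90% confidence interval for mu1 - mu2.
ci_90 = (diff - t_crit * se, diff + t_crit * se)
```

Up to rounding of the tabulated critical value, these figures agree with Ans.17.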
Question 18. A training manager of an insurance company wishes to see if there has been any
change in the ability of his trainees after they have been on a course. The trainees
take an aptitude test before they start the course and an equivalent one after they
have completed it. The scores are recorded below:
a) Has any change taken place? Test your claim at 5% level.
b) Obtain 95% confidence interval for the mean change in ability of trainees.
c) Compute Pearson’s correlation coefficient between the scores before and
after training and test its significance using t-test at 5% level of
significance.
Question 19. The diameter of steel rods manufactured on two different machines A and B is
studied. Two random samples of sizes n1 = 12 and n2 = 15 are selected and the
sample means and sample standard deviations respectively are
Assuming that the diameters of the rods follow N(μ1, σ1²) and N(μ2, σ2²).
Question 20. A textile fiber manufacturer is investigating a new drapery yarn. The
company claims that the thread elongation of this yarn follows a normal distribution
with mean 12 kg and sd 0.5 kg. The company wishes to test the hypothesis
H0: µ = 12 against H1: µ < 12 using a random sample of 4 specimens.
a) What is the probability of type I error if the critical region used is x̄ < 11.5 kg?
b) Find the power for the case in (a) when the true mean elongation is 11.25 kg.
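Both parts of Question 20 reduce to normal probabilities for the sample mean, which has standard deviation 0.5/√4 = 0.25. A minimal sketch using the standard library:

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 0.5, 4
cutoff = 11.5                  # reject H0 when the sample mean is below this
se = sigma / sqrt(n)           # standard deviation of the sample mean

# a) P(Type I error): reject H0 although mu = 12.
alpha = NormalDist(mu=12, sigma=se).cdf(cutoff)

# b) Power: probability of rejecting H0 when mu = 11.25.
power = NormalDist(mu=11.25, sigma=se).cdf(cutoff)
```

Here `alpha` is Φ(−2) and `power` is Φ(1), since the cutoff lies two standard errors below 12 and one standard error above 11.25.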
Question 21. Claims on a certain type of policy are such that the claim amounts are
approximately normally distributed.
(i) A sample of 101 such claim amounts (in £) yields a sample mean of £416
and sample standard deviation of £72. For this type of policy:
(a) Obtain a 95% confidence interval for the mean of the claim
amounts.
(b) Obtain a 95% confidence interval for the standard deviation of the
claim amounts.
The company makes various alterations to its policy conditions and thinks that
these changes may result in a change in the mean, but not the standard deviation,
of the claim amounts. It wants to take a random sample of claims in order to
estimate the new mean amount with a 95% confidence interval equal to sample
mean ± £10.
(ii) Determine how large a sample must be taken, using the following as an
estimate of the standard deviation:
Question 22. A random sample of 60 adult men who live in Leeds includes 21 who have visited
Majorca. An independent random sample of 70 adult women who live in Leeds
includes 28 who have visited Majorca.
Calculate a 98% confidence interval for the proportion of adults who live in Leeds
who have visited Majorca.
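One way to approach Question 22 is to pool the two samples and use the normal approximation for a single proportion. This pooling is an assumption of the sketch (it treats men and women as equally represented among adults in Leeds), but it reproduces the interval quoted under Ans.22.

```python
from math import sqrt
from statistics import NormalDist

visited = 21 + 28          # men and women who have visited Majorca
n = 60 + 70                # combined sample size

p_hat = visited / n
z = NormalDist().inv_cdf(0.99)     # 98% two-sided interval
half_width = z * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - half_width, p_hat + half_width)
```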
Question 23. Consider a random sample X1, …, Xn from a Poisson distribution with expectation
E[Xi] = λ. An estimator $\hat{\lambda}$ for the parameter λ is given by the observed mean of the
sample, that is: $\hat{\lambda} = \frac{1}{n} \sum_{i=1}^{n} X_i$
(i) Derive formulae for the expected value and variance of $\hat{\lambda}$ in terms of λ and n.
Assume in parts (ii) to (v) that the true parameter value is λ = 0.25.
(ii) Calculate the exact probability that 0.2 ≤ $\hat{\lambda}$ ≤ 0.3 if the sample size is n
= 10.
(iii) Calculate the approximate probability that 0.2 ≤ $\hat{\lambda}$ ≤ 0.3 if the sample size
is n = 10 using the following:
(a) the normal approximation to $\sum_{i=1}^{n} X_i$ with continuity correction.
(b) the normal approximation to $\sum_{i=1}^{n} X_i$ without continuity correction.
(iv) Comment on the difference in your answers in parts (ii) and (iii).
(v) Calculate the minimal required sample size n for which the probability
that 0.2 ≤ $\hat{\lambda}$ ≤ 0.3 is at least 0.95, using the normal approximation without
continuity correction.
Suppose a random sample of size n = 400 gives the estimate $\hat{\lambda}$ = 0.27.
(vi) Calculate a 95% confidence interval for λ.
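Parts (ii), (iii) and (vi) of Question 23 can be checked numerically. The sketch below uses the fact that the sum of n independent Poisson(λ) variables is Poisson(nλ), so that 0.2 ≤ sample mean ≤ 0.3 with n = 10 is the event 2 ≤ sum ≤ 3. The helper `pois_pmf` is an illustrative name.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

def pois_pmf(k, m):
    """P(S = k) for S ~ Poisson(m)."""
    return exp(-m) * m ** k / factorial(k)

lam, n = 0.25, 10
mu = n * lam                         # the sum is Poisson(2.5)

# (ii) exact probability that the sum is 2 or 3
exact = sum(pois_pmf(k, mu) for k in (2, 3))

# (iii) normal approximation N(2.5, 2.5) to the sum
approx = NormalDist(mu, sqrt(mu))
with_cc = approx.cdf(3.5) - approx.cdf(1.5)      # continuity correction
without_cc = approx.cdf(3) - approx.cdf(2)

# (vi) approximate 95% interval from lambda-hat = 0.27, n = 400
lam_hat, m = 0.27, 400
half_width = NormalDist().inv_cdf(0.975) * sqrt(lam_hat / m)
ci = (lam_hat - half_width, lam_hat + half_width)
```

The approximate probabilities differ in the third decimal place from those quoted under Ans.23, presumably because the quoted figures round z to two decimals before using tables.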
Question 24. In a recent study of attitudes to a proposed new piece of consumer legislation
(“proposal X”) independent random samples of 200 men and 200 women were
asked to state simply whether they were “for” (in favour of) , or “against”, the
proposal. The resulting frequencies, as reported by the consultants who carried
out the survey, are given in the following table:
Men Women
For 138 130
Against 62 70
(i) Carry out a formal chi-squared test to investigate whether or not an
association exists between gender and attitude to proposal X.
Note: in this and any later such tests in this question you should state the P-value
of the data and your conclusion clearly.
(b) Discuss the results of the survey for England and Wales separately
and together, quoting relevant percentages to support your
comments.
(iii) A different survey of 200 people conducted in each of England, Wales,
and Scotland gave the following percentages in favour of another
proposal:
Suppose a second survey of the same size is conducted in the three countries and
results in the same percentages in favour of the proposal as in the first survey.
The results of the two surveys are now combined, giving a survey based on the
attitudes of 1,200 people.
(a) State (or find) the results of a second chi-squared test for an association
between country and attitude to the proposal, based on the overall survey
of 1,200 people.
(b) Comment briefly on the results.
Question 25. In a random sample of 200 people taken from a large population of adults, 70
people intend to vote for party A at the next election.
Question 26. A random sample of 200 email messages was selected from all messages
delivered through an Internet provider company. Each message is monitored for
the presence of computer viruses. It is assumed that each message contains a virus
with the same probability p, independently from all other messages.
Let Yi, i = 1, …, 200 be indicator random variables taking the value 1 if message i
contains a virus, and 0 otherwise. Also, let Y denote the total number of messages
in this sample found to contain viruses, i.e. $Y = \sum_{i=1}^{200} Y_i$
(i) Derive expressions for the expected value and the variance of Y in terms
of the parameter p, using the indicator variables Y1, Y2,… , Y200.
Question 27. Consider a random sample X1,…, Xk of size k = 400 . Statistician A wants to use a
χ² test to test the hypothesis that the distribution of Xi is a binomial distribution
with parameters n = 2 and unknown p based on the following observed
frequencies of outcomes of Xi :
Possible realisation of Xi 0 1 2
Frequency 90 220 90
(i) Estimate the parameter p using the method of moments.
(ii) Test the hypothesis that Xi has a binomial distribution at the 0.05
significance level using the data in the above table and the estimate of p
obtained in part (i).
Statistician B assumes that the data are from a binomial distribution and wants to
test the hypothesis that the true parameter is p0 = 0.5.
(iii) Explain whether there is any evidence against this hypothesis by using the
estimate of p in part (i) and without performing any further calculations.
Statistician C wants to test the hypothesis that the random variables Xi have a
binomial distribution with known parameters n= 2 and p = 0.5.
(iv) Write down the null hypothesis and the alternative hypothesis for the test
in this situation.
(v) Carry out the test at the significance level of 0.05 stating your decision.
(vi) Explain briefly the relationship between the test decisions in parts (ii), (iii)
and (v), and in particular whether there is any contradiction.
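The calculations behind parts (i), (ii) and (v) of Question 27 can be sketched as follows. The test statistic is the same in parts (ii) and (v); only the degrees of freedom differ, because one parameter is estimated in part (ii) and none in part (v). The critical values are tabulated upper 5% points of the chi-squared distribution.

```python
from math import comb

observed = {0: 90, 1: 220, 2: 90}     # frequencies from the question
k = sum(observed.values())            # sample size, 400
n_trials = 2

# (i) Method of moments: sample mean = n_trials * p.
sample_mean = sum(x * f for x, f in observed.items()) / k
p_hat = sample_mean / n_trials

# Expected frequencies under Bin(2, p_hat).
expected = {
    x: k * comb(n_trials, x) * p_hat ** x * (1 - p_hat) ** (n_trials - x)
    for x in observed
}
chi2 = sum((observed[x] - expected[x]) ** 2 / expected[x] for x in observed)

crit_df1 = 3.841   # part (ii): 3 cells - 1 - 1 estimated parameter = 1 d.f.
crit_df2 = 5.991   # part (v): 3 cells - 1 = 2 d.f., p fixed at 0.5
reject_part_ii = chi2 > crit_df1
reject_part_v = chi2 > crit_df2
```

Because the estimate in part (i) is exactly 0.5, the expected frequencies are the same in both tests.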
Question 28. Analyst A collects a random sample of 30 claims from a large insurance portfolio
and calculates a 95% confidence interval for the mean of the claim sizes in this
portfolio. She then collects a different sample of 100 claims from the same
portfolio and calculates a new 95% confidence interval for the mean claim size.
(i) Explain how the widths of the two confidence intervals will differ.
Analyst B obtains a 95% confidence interval for the mean claim size of this
portfolio based on a different sample of 30 claims. She subsequently realises that
one of the claims in the sample has an extremely large value and can be
considered as an outlier. She decides to replace this claim with a new randomly
selected one, whose size is not an outlier, and obtains a new 95% confidence
interval.
(ii) Explain how the two confidence intervals will differ in the case of Analyst
B.
Question 29. In order to compare the effectiveness of two new vaccines, A and B, for a
childhood disease, 11 infants were immunised with vaccine A and 9 infants were
immunised with vaccine B. One month after immunisation the concentration of
the disease antibodies in the blood of each infant was recorded in appropriate
units. The sample mean and variance for each group is given below.
(ii) Calculate an equal-tailed 95% confidence interval for the ratio $\sigma_A^2 / \sigma_B^2$ using the
pivotal quantity in part (i). (You are not required to show the derivation of
the interval.)
We now assume that $\sigma_A^2 = \sigma_B^2 = \sigma^2$. Under this assumption, you are given that
the distribution of $18 s_p^2 / \sigma^2$ is $\chi^2_{18}$, where $s_p^2$ is the pooled variance of the two samples
and is independent of $\bar{x}_A$ and $\bar{x}_B$.
(iii) Explain why, under the above result, the sampling distribution of
$\dfrac{\bar{x}_A - \bar{x}_B - (\mu_A - \mu_B)}{s_p \sqrt{\frac{1}{11} + \frac{1}{9}}}$ is $t_{18}$.
(iv) Calculate an equal-tailed 95% confidence interval for µA − µB using the
sampling distribution in part (iii). (You are not required to show the
derivation of the interval.)
(v) Comment on your results with regard to differences between vaccine A
and vaccine B.
Question 30. An insurer has collected data about the body mass index of 200 males between the
age of 18 and 40. The results are shown in the following table.
A statistician suggests the following model for the distribution of the body mass
index with an unknown parameter p.
Body mass index < 18.5 18.5–25 25−30 >30
Relative frequency p 20p 10p 1−31p
A life office has considered a sample of 10,000 men aged between 18 and 40 of
which 50% are married and the other 50% are single.
(iii) Estimate the proportion of men with a body mass index of more than 30 in
this sample, based on the data in the above table.
(iv) Determine whether the body mass index is independent of the marital
status or not, using an appropriate statistical test. You should state the null
hypothesis for the test, calculate the value of the test statistic and the
approximate p-value and state your decision.
Question 31. Bank robberies in various countries are assumed to occur according to Poisson
processes with rates that vary from year to year. It was reported that the number
of robberies in a particular country in a specific year was 123. The number of
robberies in a different country in the same year was 111. It can be assumed that
each robbery is an independent event and that robberies occur independently in
the two countries.
Determine an approximate 90% confidence interval for the difference between the
true yearly robbery rates in the two countries.
Question 32. A survey is undertaken to investigate the proportion p of an adult population that
support a certain government policy. A random sample of 100 adults is taken and
contains 30 who support the policy.
A different sample of 1,000 adults is taken and it contains 300 who support the
policy.
(iii) Explain how the width of a 95% confidence interval for p in this case will
compare to the width of the interval in part (i), without performing any
calculations.
(i) It is first suggested that the number of bananas taken by each individual of
each group follows the same binomial distribution with common
parameter p and n=7.
(a) Use the method of moments to estimate the parameter p.
(b) The scientist is unsure whether a common parameter is appropriate
and wishes to compare pA and pB, the probability that a banana is
taken by an individual in groups A and B respectively.
Test the hypothesis that pA = pB.
(ii) A statistician suggests an alternative model. The number of bananas taken
by an individual still follows a binomial distribution with n=7, but for
group A the parameter is 2θ and for group B the parameter is θ, where
θ < 0.5.
(a) Show that the log likelihood for θ is given by:
33ln (2θ) + 9ln (1− 2θ) + 37ln (θ) + 40ln (1− θ) + constant
(b) Hence calculate the maximum likelihood estimate of θ.
(iii) (a) Compare the fit of the two suggested models in parts (i) (with
common parameter p) and (ii) by considering the expected number
of bananas taken in groups A and B under the two models. You are
not required to perform a formal test.
(b) Comment on the above comparison in relation to your answer in
part (i)(b).
Question 34. A researcher obtains samples of 25 items from normally distributed measurements
from each of two factories. The sample variances are 2.86 and 9.21 respectively.
(i) Perform a test to determine if the true variances are the same.
(ii) For each factory calculate central 95% confidence intervals for the true
variances of the measurements.
(iii) Comment on how your answers in parts (i) and (ii) relate to each other.
Question 35. In an opinion poll, a sample of 100 people from a large town were asked which
candidate they would vote for in a forthcoming national election with the
following results:
Candidate A B C
Supporters 32 47 21
(i) Determine the approximate probability that candidate B will get more than
50% of the vote.
A second opinion poll of 150 people was conducted in a different town with the
following results:
Candidate A B C
Supporters 57 56 37
(ii) Use an appropriate test to decide whether the two towns have significantly
different voting intentions.
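For part (ii) of Question 35, a chi-squared test on the 2 × 3 table of towns by candidates is one appropriate choice. The sketch below computes expected counts from the row and column totals; the critical value is the tabulated upper 5% point with (2 − 1)(3 − 1) = 2 degrees of freedom, and the 5% level is an assumption since the question does not state one.

```python
town_1 = [32, 47, 21]      # supporters of A, B, C in the first poll
town_2 = [57, 56, 37]      # supporters of A, B, C in the second poll
table = [town_1, town_2]

row_tot = [sum(row) for row in table]
col_tot = [sum(col) for col in zip(*table)]
grand = sum(row_tot)

chi2 = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        exp_ij = row_tot[i] * col_tot[j] / grand   # expected under H0
        chi2 += (obs - exp_ij) ** 2 / exp_ij

df = (len(table) - 1) * (len(town_1) - 1)
chi2_crit = 5.991          # upper 5% point of chi-squared, 2 d.f. (tabulated)
reject_h0 = chi2 > chi2_crit
```

The statistic agrees with the value quoted under Ans.35 (ii).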
Question 36. In a medical study conducted to test the suggestion that daily exercise has the
effect of lowering blood pressure, a sample of eight patients with high blood
pressure was selected. Their blood pressure was measured initially and then again
a month later after they had participated in an exercise programme. The results are
shown in the table below:
Patient 1 2 3 4 5 6 7 8
Before 155 152 146 153 146 160 139 148
After 145 147 123 137 141 142 140 138
(i) Explain why a standard two-sample t-test would not be appropriate in this
investigation to test the suggestion that daily exercise has the effect of
lowering blood pressure.
(ii) Perform a suitable t-test for this medical study. You should clearly state
the null and alternative hypotheses.
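Because each patient is measured twice, a paired t-test on the differences is the natural choice for part (ii). The sketch below tests H0: mean reduction = 0 against H1: mean reduction > 0. The 5% level is an assumption (the question does not state one), and 1.895 is the tabulated upper 5% point of t with 7 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

before = [155, 152, 146, 153, 146, 160, 139, 148]
after = [145, 147, 123, 137, 141, 142, 140, 138]

# Reduction in blood pressure for each patient.
d = [b - a for b, a in zip(before, after)]
n = len(d)

t_stat = mean(d) / (stdev(d) / sqrt(n))

t_crit = 1.895  # upper 5% point of t with 7 d.f. (tabulated value)
reject_h0 = t_stat > t_crit
```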
Question 37. An insurance company experiences claims from 290 insurance policies in a year
on a portfolio of 900 policies. Only one claim can be made on a policy in a year.
The company assumes that all policies are independent of each other.
Determine a 90% confidence interval for the proportion of policies on which a
claim is made in a year.
Question 38. A random sample of 30 observations is drawn from a normal distribution with
unknown variance.
(i) Write down an expression for the distribution of S, the population standard
deviation.
The sample standard deviation, s, is 7.5.
(ii) Calculate a 95% confidence interval for the population standard deviation.
Question 39. An insurance company has calculated premiums assuming that the average claim
size per claim for a certain class of insurance policies does not exceed £20,000
per annum. An actuary analyses 25 such claims that have been randomly selected.
She finds that the average claim size in the sample is £21,000 and the sample
standard deviation is £2,500. Assume that the size of a single claim is normally
distributed with unknown expectation α and variance σ².
(ii) Perform a test for the null hypothesis that the expected claim size is not
greater than £20,000 at a 5% significance level.
(iii) Discuss whether your answers to parts (i) and (ii) are consistent.
(iv) Calculate the largest expected claim size, α0, for which the hypothesis
α ≤ α0 can be rejected at a 5% significance level based on the sample of 25
claims.
The insurer is also concerned about the number of claims made each year. It is
found that the average number of claims per policy was 0.5 during the year 2011.
When the analysis was repeated in 2012 it was found that the average number of
claims per policy had increased to 0.6. These averages were calculated on the
basis of random samples of 100 policies in each of the two years. Assume that the
number of claims per policy per year has a Poisson distribution with unknown
expectation λ and is independent from the number of claims in any other year or
for any other policy.
(v) Perform a test at 5% significance level for the null hypothesis that λ = 0.6
during the year 2011.
(vi) Perform a test to decide whether the average number of claims has
increased from 2011 to 2012.
Question 40. The distribution of claim size under a certain class of policy is modelled as a
normal random variable, and previous years' records indicate that the standard
deviation is £120.
(i) Calculate the width of a 95% confidence interval for the mean claim size
if a sample of size 100 is available.
(ii) Determine the minimum sample size required to ensure that a 95%
confidence interval for the mean claim size is of width at most £10.
(iii) Comment briefly on the comparison of the confidence intervals in (i) and
(ii) with respect to widths and sample sizes used.
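With a known standard deviation, the width of a 95% interval for the mean is 2 × z × σ/√n, so parts (i) and (ii) of Question 40 follow from one formula. A minimal sketch:

```python
from math import ceil, sqrt
from statistics import NormalDist

sigma = 120
z = NormalDist().inv_cdf(0.975)      # about 1.96

# (i) width of the interval with n = 100
width_100 = 2 * z * sigma / sqrt(100)

# (ii) smallest n with width at most 10: solve 2*z*sigma/sqrt(n) <= 10
max_width = 10
n_min = ceil((2 * z * sigma / max_width) ** 2)
```

Cutting the width by a factor of about 4.7 requires roughly 4.7² ≈ 22 times as many observations, since the width is proportional to 1/√n.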
Question 41. A researcher wishes to investigate whether a coin is balanced or not, that is, whether
P(heads) = 0.5. She throws the coin four times and decides to accept the
hypothesis H0 : P(heads) = 0.5 in a test against the alternative H1 : P(heads) ≠ 0.5,
if the number of times that the coin lands “heads” is 1, 2, or 3.
(i) Calculate the probability of the type I error of this test.
(ii) Calculate the probability of the type II error of this test, if the true
probability that the coin lands “heads” is 0.7.
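Both error probabilities in Question 41 are binomial probabilities for the number of heads in four throws. A short sketch, with `binom_pmf` as an illustrative helper:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n = 4
accept = (1, 2, 3)     # H0 is accepted for these numbers of heads

# (i) Type I error: reject H0 (0 or 4 heads) although p = 0.5.
type_1 = 1 - sum(binom_pmf(k, n, 0.5) for k in accept)

# (ii) Type II error: accept H0 although p = 0.7.
type_2 = sum(binom_pmf(k, n, 0.7) for k in accept)
```

With only four throws the test has little power: `type_2` is far larger than `type_1`.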
Question 42. Pressure readings are taken regularly from a meter. It transpires that, in a random
sample of 100 such readings, 45 are less than 1, 35 are between 1 and 2, and 20
are between 2 and 3.
Perform a χ² goodness of fit test of the model that states that the readings are
independent observations of a random variable that is uniformly distributed on
(0, 3).
Question 43. In a survey conducted by a mail order company a random sample of 200
customers yielded 172 who indicated that they were highly satisfied with the
delivery time of their orders.
Calculate an approximate 95% confidence interval for the proportion of the
company’s customers who are highly satisfied with delivery times.
Question 44. Let X1, …, Xn denote a large random sample from a distribution with unknown
population mean and known standard deviation 3. The null hypothesis H0: 𝜇 = 1 is
to be tested against the alternative hypothesis H1: 𝜇 > 1, using a test based on the
sample mean with a critical region of the form X̄ > k, for a constant k.
It is required that the probability of rejecting H0 when 𝜇 = 0.8 should be
approximately 0.05, and the probability of not rejecting H0 when 𝜇 = 1.2 should
be approximately 0.1.
(i) Show that the test requires
where Φ is the standard normal distribution function.
(ii) The values for the sample size n and the critical value k which satisfy the
requirements of part (i) are n = 482 and k = 1.025 (you are not asked to
verify these values).
Calculate the approximate level of significance of the test, and comment
on the value.
Question 45. (i) A random variable Y has a Poisson distribution with parameter θ but there is
a restriction that zero counts cannot occur. The distribution of Y in this
case is referred to as the zero-truncated Poisson distribution.
(a) Show that the probability function of Y is given by
$P(y) = \dfrac{\theta^{y} e^{-\theta}}{y!\,(1 - e^{-\theta})}, \quad y = 1, 2, 3, \ldots$
Show that the maximum likelihood estimate of θ may be determined
by the solution to the following equation:
$\bar{y} - \theta - \dfrac{\theta e^{-\theta}}{1 - e^{-\theta}} = 0$
(a) Find the value of C so that Test 2 has the same 𝑃(𝑇𝑦𝑝𝑒 𝐼 𝑒𝑟𝑟𝑜𝑟) as that of
Test 1.
(b) Compute the P(Type II error) of each test given the value of 𝜃 = 𝜃! > 0.
(c) Comment on your results as obtained in part (b).
Question 49. An insurance company has a portfolio of 10,000 policies. Based on past data the
company estimates that the probability of a claim on any one policy in a year is
0.003. It assumes no policy will generate more than one claim in a year.
(i) Determine the approximate probability of more than 40 claims from the
portfolio of 10,000 policies in a year.
(ii) Determine an approximate equal-tailed interval into which the number of
claims per year will fall with probability 0.95.
In practice 42 claims were received in a particular year. A Director of the
company complains about the range of estimates in part (ii) being wrong.
(iii) Comment on the Director’s complaint.
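Parts (i) and (ii) of Question 49 can be sketched with a normal approximation to the Bin(10,000, 0.003) claim count. Using a continuity correction in part (i) is a choice made for this sketch rather than something the question specifies.

```python
from math import sqrt
from statistics import NormalDist

n, p = 10_000, 0.003
mean = n * p                        # expected number of claims
sd = sqrt(n * p * (1 - p))
approx = NormalDist(mean, sd)

# (i) P(more than 40 claims), with continuity correction
p_more_than_40 = 1 - approx.cdf(40.5)

# (ii) equal-tailed interval containing the claim count with probability 0.95
z = NormalDist().inv_cdf(0.975)
interval = (mean - z * sd, mean + z * sd)

# The 42 claims observed in the question lie just above the upper limit.
observed_outside = not (interval[0] <= 42 <= interval[1])
```

An observation outside a 95% interval is expected in about one year in twenty, which is relevant to the comment asked for in part (iii).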
Question 51. (i) In the context of hypothesis testing, define a statistical test, null hypothesis
and alternate hypothesis.
(ii) List the steps involved in hypothesis testing.
(iii) An experimenter has prepared a drug dosage level that she claims will
induce sleep for 80% of people suffering from insomnia. After examining
the dosage, we feel that her claims regarding the effectiveness of the
dosage are inflated. In an attempt to disprove her claim, we administer her
prescribed dosage to 20 insomniacs and we observe X, the number for
whom the drug dose induces sleep. Assume that the rejection region
{x ≤ k} is used
(a) Find the value of k so that P (Type I error), α, is approximately at
1% level
(b) For the rejection region in part (iii), find P (Type II error), β, when
the proportion of people suffering from insomnia is 1/2.
Question 52. A seismologist collected data on one of the islands of Japan. The island was struck by
one earthquake in the past year. She is interested in µ, the average number of
earthquakes per annum.
(i) Obtain an exact 95% confidence interval for µ using one-year data.
She also collected past data and found that the island was struck by 36 earthquakes in
the past 36 years.
(ii) Find an approximate 95% confidence interval for µ.
(iii) Compare and comment on the confidence intervals obtained in parts (i)
and (ii) above.
ANSWERS
Ans.1. (a) T.S = 2.02, do not reject H0 (b) C.I = (-3.06, 18.70) (c) Yes, as the C.I includes zero
Ans.2. a = 1, b = 1/3
Ans.3. (a) T.S = 1.85, do not reject H0 (b) C.I = (0.03, 0.13)
Ans.4. (ii) F = 1.83, accept H0 (iii) T.S = −2.17, reject H0 (iv) C.I = (−3.98, −0.02)
Ans.5. (i) T.S = 1.3, accept H0 (ii) T.S = 27.25, reject H0 (Rural); T.S = 12.71, reject H0 (Urban)
Ans.7. (i) T.S = 6.37, reject H0 (ii) T.S = 4.32, reject H0 (iii) T.S = 0.402, do not reject H0
Ans.8. (ii) F = 1.31, do not reject H0 (iii) C.I = (0.0499, 0.1912) (iv) T.S = 1.95, reject H0
Ans.9. (b) C.I = (58.61, 64.89) (c) T.S = 1.47, do not reject H0
Ans.11. (a) T.S = 7.65, do not reject H0 (b) T.S = 7.01, reject H0
Ans.12. (a) T.S = 1.65, accept H0 (b) C.I = (-1.05, 8.37) (c) F = 1.22, accept H0
Ans.15. (a) T.S = 1.198, accept H0 (b) T.S = 1.1667, accept H0 (c) C.I = (-3.16, 11.16)
Ans.17. (i) T.S = 2.16, reject H0 (ii) F = 1.21, accept H0 (iii) C.I = (7.5261, 66.4739)
(c) C.I = (-3.24, -1.76) or C.I = (1.76, 3.24) (d) C.I = (0.2411, 2.581)
(ii)(a) n = 199.15, so n ≥ 200 (b) n = 268.30, so n ≥ 269
(iii) Assuming a larger value of s results in a larger standard error, so a larger sample
size is required to achieve the same width of confidence interval.
Ans.22. (0.278, 0.476)
Ans.23. (i) $E[\hat{\lambda}] = \lambda$ and $V[\hat{\lambda}] = \lambda / n$ (ii) 0.47028 (iii) (a) 0.4713 (b) 0.2510
(iv) When compared to the exact probability in (ii) the results in (iii) (a) and (b) show that
the continuity correction reduces the approximation error significantly for this small
sample size.
(v) 𝑛 ≈ 384 (vi) [0.21908, 0.32092]
(b) Comment: having more data with the same proportions provides strong enough
evidence to justify claiming that an association exists.
Ans.25. (i) (0.284, 0.416) (ii) If we take a large number of samples from this population, we
expect 95% of the resulting CIs to include the true value of θ.
Ans.26 (i) E(Y) = 200p V(Y) = 200p(1-p)
(ii) Again, with Yi being iid Bernoulli(p) random variables and n being sufficiently
large, the central limit theorem implies that Y approximately follows a normal
distribution.
(iii) (0.144, 0.236)
Ans.27. (i) $\hat{p} = 0.5$ (ii) TS = 4: Reject H0
(iii) Since the estimated value is 0.5, any reasonable test will not reject that value, since
the value 0.5 will always be in the acceptance region of the test. In other words, 0.5 will
always be in any confidence interval around the estimate 0.5.
(iv) We now have: H0 : Xi ~ Bin(2, 0.5) and
H1 : Xi does not follow Bin(2, 0.5) (emphasis on both Bin and p = 0.5)
(v) TS = 4, do not reject H0 (degree of freedom is 2)
(vi) The result in part (ii) states that a binomial distribution does not fit the data
well and is rejected. However, in part (iii) we found that, under the assumption of a
binomial distribution, p0 = 0.5 cannot be rejected. A specific binomial distribution with
parameter p = 0.5 is not rejected in part (v) for the same data. The reason is that the
additional degree of freedom in part (v) allows for a larger value of the test-statistic under
the null.
Ans.28.(i) With the larger sample of 100 claims the standard error of the sample mean will be
smaller, giving a narrower confidence interval.
(ii) The replacement of the extreme value will give a smaller sample mean, which means
that the interval will be shifted to the left. The variance of the sample will also be smaller,
which will again give a narrower interval.
Ans.29.(i) This is an F distribution with 10, 8 degrees of freedom.
(ii) (0.198, 3.281)
(iii) As the two samples are independent we have that
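$V(\bar{X}_1 - \bar{X}_2) = \frac{V(X_1)}{11} + \frac{V(X_2)}{9} = \sigma^2\left(\frac{1}{11} + \frac{1}{9}\right)$
We are also given that $Y \sim \chi^2_{18}$, and with Z and Y being independent we can use that $\frac{Z}{\sqrt{Y/18}} \sim t_{18}$.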
(iv) (– 1.126, 0.506). (v) The interval includes the value 0, suggesting that there is no
difference in the mean effectiveness of the two vaccines.
Ans.30.(i) 𝑝 = 0.02935
different.
Ans.35. (i) 0.274 (ii) T.S. = 2.315, Accept H0
Ans.36. (i) The two samples are from the same patients, so they are clearly not independent.
(ii) T.S. = 4.785, we have strong evidence against H0 (P-value < 0.5%), and conclude that
daily exercise has the effect of lowering blood pressure.
Ans.37. (0.296, 0.348)
Ans.38. (i) $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$ (ii) (5.97, 10.08)
Ans.39. (i) (19.968, 22.032) (ii) T.S. = 2, we reject null hypothesis at 5%.
(iii) The confidence interval in part (i) corresponds to a two-sided test. We found in
part (i) that 20 is contained in the confidence interval, and we can therefore not
reject the null hypothesis H0 : α=20 at a 5% significance level. However, the one-sided
test rejects H0 : α ≤ 20 since only positive differences $\bar{X} - \alpha_0$ are considered.
Answers are consistent.
(iv) 20.1445 (v) T.S. = -.129, Accept H0 (vi) T.S. = 0.9535, Accept H0.
Ans.40. (i) Width = 47.04 (ii) 554
(iii) The confidence interval in (ii) is narrower; to achieve this we require a much
larger sample size.
Ans.41. (i) 0.125 (ii) 0.7518
Ans.42. T.S. = 9.50, P-Value < 1%, reject H0.
Ans.43. (0.812, 0.908)
Ans.44. 43%
!(!!! !! )!
Ans.45. (ii) (b) !(!!! !! !!! !! )
(iii)(a) Y 1 2 3 4 5 ≥6
Ans.46. (73.3%, 92.7%)
Ans.47. n = 500 is very large, so the Central Limit Theorem justifies normality. (225, 249)
Ans.48. (a) 𝐶 = 1.3435 (b) 𝑃 𝑍 ≤ 0.95 − 𝜃! 2 (c) Test 2 is a more powerful test than Test 1.
ASSIGNMENT – 8
Question 1. The following data relate x, the moisture of wet mix of a certain product and y,
the density of the finished product.
x 12 11 10 9 8
y 4 3 2 0 1
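As a rough illustration of how a least squares line and a sample correlation coefficient can be obtained from data such as those in Question 1, the following Python sketch uses only the standard library. The function name `fit_line` is an assumption made for this example and does not come from the workbook.

```python
import math

def fit_line(x, y):
    """Return (intercept, slope, r) for the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sums of squares and products
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r

# Data from Question 1 (moisture x, density y)
x = [12, 11, 10, 9, 8]
y = [4, 3, 2, 0, 1]
intercept, slope, r = fit_line(x, y)
print(round(intercept, 4), round(slope, 4), round(r, 4))
```

With these data the slope and correlation both come out at 0.9 and the intercept at -7, which agrees with the values quoted in Ans.1.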
Question 2. A study was made on the effect of temperature on the yield of a chemical process.
The following data (in coded form) were collected.
x -5 -4 -3 -2 -1 0 1 2 3 4 5
y 1 5 4 7 10 8 9 13 14 13 8
Question 3. The following are measurements of air velocity (cm/sec) and evaporation coefficient
(mm²/sec) of burning fuel droplets in an impulse engine.
Air Velocity: x 20 60 100 140 180 220 260 300 340 380
Evaporation coeff.: y 0.18 0.37 0.35 0.78 0.56 0.75 1.18 1.36 1.17 1.65
For the above data, a regression model 𝑦 = 𝛼 + 𝛽𝑥 + 𝑒 is to be fitted
Question 4. The following data refers to the number of claims (X) received by a motor
insurance company in a week and the number of settlements (Y) of these claims
in the following week during 10 randomly selected weeks in a year.
X: 100 110 120 130 140 150 160 170 180 190
Y: 45 51 54 61 66 70 74 78 85 89
n = 25, Σx = 1314.90, Σy = 235.70
a) Find the estimated correlation coefficient between X and Y and test the
hypothesis H0: ρ = 0 against H1: ρ ≠ 0, ρ being the population correlation
coefficient between X and Y.
b) Stating the assumptions, fit a regression line of the model
Yi = β0 + β1 Xi + ei for the above data.
c) Obtain the unbiased estimator of σ².
d) Test the hypothesis H0: β1 = 0 against H1: β1 ≠ 0.
e) Obtain a 99% confidence interval for β1.
f) Obtain the coefficient of determination R².
Question 6. The following data shows the number of hours that ten administrative officers
worked and number of files disposed by them in a certain LIC office.
Hours Worked (x) 4 9 10 14 4 7 12 22 1 17
No. of file Disposed(y) 31 58 65 73 37 44 60 91 21 84
A linear regression model 𝑌 = 𝛼 + 𝛽𝑥 has been fitted to the above data.
i. Find the equation of the least squares line that approximates the regression
of the number of files disposed on the number of hours worked.
ii. Estimate the average number of files disposed by an officer who worked
14 hours.
iii. Assuming the usual normal linear regression model, test the null hypothesis
β = 3 against β > 3 at the 0.01 level of significance.
iv. Calculate a two sided 95% confidence interval for β.
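The steps in Question 6 (fit the line, predict at 14 hours, test the slope, build an interval) can be sketched in Python as follows. This is only an illustrative outline under the usual normal linear model; the critical values are assumed to be read from t tables with 8 degrees of freedom and are hard-coded rather than computed.

```python
import math

x = [4, 9, 10, 14, 4, 7, 12, 22, 1, 17]        # hours worked
y = [31, 58, 65, 73, 37, 44, 60, 91, 21, 84]   # files disposed
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

beta_hat = sxy / sxx
alpha_hat = y_bar - beta_hat * x_bar
pred_14 = alpha_hat + beta_hat * 14            # estimated mean response at x = 14

# Error variance estimate with n - 2 degrees of freedom
sigma2_hat = (syy - sxy ** 2 / sxx) / (n - 2)
se_beta = math.sqrt(sigma2_hat / sxx)

# One-sided test of H0: beta = 3 against H1: beta > 3
t_stat = (beta_hat - 3) / se_beta
T_8_ONE_SIDED_1PCT = 2.896                     # assumed table value t_{8, 0.01}
reject_h0 = t_stat > T_8_ONE_SIDED_1PCT

# Two-sided 95% confidence interval for beta
T_8_TWO_SIDED_5PCT = 2.306                     # assumed table value t_{8, 0.025}
ci_beta = (beta_hat - T_8_TWO_SIDED_5PCT * se_beta,
           beta_hat + T_8_TWO_SIDED_5PCT * se_beta)

print(round(alpha_hat, 2), round(beta_hat, 3), round(pred_14, 2))
print(round(t_stat, 2), reject_h0, tuple(round(v, 3) for v in ci_beta))
```

The test statistic is compared with a one-sided critical value because the alternative is β > 3; a two-sided critical value is used only for the interval in part iv.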
Question 8. A survey was conducted to investigate whether people tend to marry partners of
about the same age. This question was addressed to 12 married couples and their
ages were given in the following table.
Couple No. 1 2 3 4 5 6 7 8 9 10 11 12
Husband's age(x) 30 29 36 72 37 36 51 48 37 50 51 36
Wife's age(y) 27 20 34 67 35 37 50 46 36 42 46 35
a) Draw the scatter plot and comment on it.
b) Find the correlation coefficient and interpret it.
c) If ρ represents the population correlation coefficient between the ages of
partners, test the significance of ρ at 5% level.
Question 9. It is thought that a suitable model for a plumber’s charge when called out for a job
is a linear one based on a fixed call-out charge and an hourly rate.
A random sample of 10 of his invoices gave the following results:
Duration of Job (hours) x 0.5 1 1.5 2 2.5 3 3.5 4.5 5 5.5
Cost of Job (Rs) y 40 55 45 65 80 75 95 100 120 130
Σx = 29, Σx² = 110.5, Σy = 805, Σy² = 73,225, Σxy = 2,795
a) Plot these data and comment on the suitability of the proposed model.
b) Calculate the least squares estimates of the plumber’s call-out charge, and
the plumber’s hourly rate charge.
c) Determine a 90% confidence interval for the plumber’s hourly rate charge.
d) Compute Pearson correlation co-efficient and test for its significance.
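When only summary totals are available, as in Question 9, the same quantities can be computed without the raw data. The sketch below is a hedged illustration: the variable names are made up for this example, and the critical value 1.860 is assumed to be the upper 5% point of the t distribution with 8 degrees of freedom taken from tables.

```python
import math

# Summary totals for the plumber's invoices (Question 9)
n = 10
sum_x, sum_x2 = 29.0, 110.5
sum_y, sum_y2 = 805.0, 73225.0
sum_xy = 2795.0

# Corrected sums of squares and products
sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

beta_hat = sxy / sxx                              # estimated hourly rate
alpha_hat = sum_y / n - beta_hat * sum_x / n      # estimated call-out charge

# 90% confidence interval for the hourly rate
sigma2_hat = (syy - sxy ** 2 / sxx) / (n - 2)
se_beta = math.sqrt(sigma2_hat / sxx)
T_8_UPPER_5PCT = 1.860                            # assumed table value t_{8, 0.05}
ci_beta = (beta_hat - T_8_UPPER_5PCT * se_beta,
           beta_hat + T_8_UPPER_5PCT * se_beta)

# Pearson correlation and its t statistic for H0: rho = 0
r = sxy / math.sqrt(sxx * syy)
t_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(round(alpha_hat, 3), round(beta_hat, 3))
print(tuple(round(v, 2) for v in ci_beta), round(r, 3), round(t_r, 2))
```

The test statistic for the correlation is computed from the unrounded r, so it can differ slightly in the last digit from a hand calculation that rounds r first.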
Question 10. The following table shows the student population (in thousands) and quarterly
sales (in thousand Rupees) data for 10 Armand's Pizza Parlors.
The simple linear regression model y = β0 + β1 x + e has been fitted for these data:
Question 11. The following data shows the student population and quarterly sales data for ten
food restaurants. The manager believes that quarterly sales for these restaurants
(y) are related positively to the size of the student population (x).
Restaurant i    Student Population (in '000) xi    Quarterly Sales (in Rs. '000) yi
1               2                                  58
2               6                                  105
3               8                                  88
4               8                                  118
5               12                                 117
6               16                                 137
7               20                                 157
8               20                                 169
9               22                                 149
10              26                                 202
A simple linear regression model yi = β0 + β1 xi + ei is fitted.
Question 12. A study was done to find if the students who are good in high school, carry on
doing well also in the college. The high school grade (X) and college grade (Y) of
15 randomly chosen students are given below.
No. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
X 2 3.4 3.7 1.5 3.3 0.3 0.4 2 2 2.1 2.1 1.3 1.5 3.1 2.1
Y 2 2.6 3.8 1.1 3 0.1 1.4 1.5 1.4 4 1.5 1.3 1.9 3.1 1.9
a) Draw a scatter plot of the above data with high school grades on X-axis and
college grade on Y-axis.
b) Estimate the slope and intercept parameters of the linear regression. Also
calculate the sample correlation coefficient between the high school grades
and the college grades.
c) Can it be concluded that, in general, the expected performance in college is
the same as that in high school?
d) Estimate the college grade of a student who has a grade of 2.1 in high
school. Comment on the college grade of student number 10 in the above
data.
Question 13. A training manager wishes to see if there has been any alteration in the ability of
his trainees after they have been on a course. The trainees take an aptitude test
before they start the course and an equivalent one after they have completed it.
The scores are given below.
Question 14. In a study of the relation between the amount of information available and use of
buses in eight comparable test cities, bus route maps were given to residents of
the cities at the beginning of the test period. The increase in average daily bus use
during the test period was recorded. The numbers of maps and the increase in bus
use are given in the table below (both in thousands).
Number of maps(x) 80 220 140 120 180 100 200 160
Increase in bus use(y) 0.6 6.7 5.3 4 6.55 2.15 6.6 5.75
For these data:
Σx = 1,200, Σx² = 196,800, Σy = 37.65, Σy² = 213.4875, Σxy = 6,378
Fitted values (ŷ) 1.66 7.75 4.27 3.4 6.01 2.53 6.88 5.14
Residuals (ê) -1.06 -1.05 1.03 0.6 0.54 -0.38 -0.28 0.61
Plot the residuals against the values of the fitted response and comment on
the adequacy of the model.
iv. A new city is added to the study, and 250,000 maps are distributed to its
citizens. Calculate the prediction of the increase in bus use in this city
according to the model fitted in part (ii) and comment on the validity of
this prediction.
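The fitted values, residuals and the prediction for the new city in Question 14 can be reproduced with a short Python sketch. It assumes that the model referred to in part (ii) is the simple linear regression of y on x, which is consistent with the tabulated fitted values; the flag `is_extrapolation` is a hypothetical helper added here to highlight the validity concern in part iv.

```python
# Summary totals given in Question 14 (both variables in thousands)
n = 8
sum_x, sum_x2 = 1200.0, 196800.0
sum_y, sum_xy = 37.65, 6378.0

sxx = sum_x2 - sum_x ** 2 / n
sxy = sum_xy - sum_x * sum_y / n
beta_hat = sxy / sxx
alpha_hat = sum_y / n - beta_hat * sum_x / n

maps = [80, 220, 140, 120, 180, 100, 200, 160]
bus_use = [0.6, 6.7, 5.3, 4, 6.55, 2.15, 6.6, 5.75]

fitted = [alpha_hat + beta_hat * x for x in maps]
residuals = [y - f for y, f in zip(bus_use, fitted)]

# Prediction for a city receiving 250 (thousand) maps
x_new = 250
prediction = alpha_hat + beta_hat * x_new
# The new x lies outside the observed range, so the prediction is an extrapolation
is_extrapolation = x_new < min(maps) or x_new > max(maps)

print([round(f, 2) for f in fitted])
print([round(e, 2) for e in residuals])
print(round(prediction, 2), is_extrapolation)
```

Because 250 is beyond the largest observed x value of 220, the linear relationship is being assumed to hold outside the range of the data, which is the main caveat on the prediction.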
Question 15. Consider a situation in which the data consist of two responses at each of five
values of an explanatory variable (𝑥 = 1, 2, 3, 4, 5), so we have a data set with ten
responses (y), as in the following table:
X 1 1 2 2 3 3 4 4 5 5
Y 12 19 18 35 19 44 32 53 44 65
i. You are asked to carry out a linear regression analysis using these data.
a) Draw a plot of the data to show the relationship between the responses
and explanatory values.
b) Calculate the total, regression and residual sums of squares for a least
squares linear regression analysis of y on x, and hence calculate the
value of R², the coefficient of determination.
c) Determine the equation of the fitted regression line.
d) Calculate a 95% confidence interval for the slope of the underlying
regression line.
ii. A colleague suggests that it will be simpler and will produce the same results
if we use the following reduced data, in which the two responses at each x
value are replaced by their mean:
x 1 2 3 4 5
y 15.5 26.5 31.5 42.5 54.5
The details of the regression analysis for these data are given in the box below.
Regression equation: 𝑦 = 5.90 + 9.40𝑥
Coef Stdev t-ratio p-val
𝑠 = 2.129 R-sq=98.5%
Analysis of Variance:
Source df SS MS F p-val
Regression 1 883.6 883.6 194.91 0.001
Error 3 13.6 4.53
Total 4 897.2
Discuss the similarities and the differences between the two approaches and their
results, in particular addressing the claim by the colleague that the two analyses
will produce “the same results”.
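To see concretely what changes when each pair of responses in Question 15 is replaced by its mean, both analyses can be run side by side. The following Python sketch is illustrative only; `regression_summary` is a hypothetical helper written for this comparison.

```python
def regression_summary(x, y):
    """Least squares fit of y on x with sums of squares and R-squared."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_reg = sxy ** 2 / sxx
    return {
        "slope": sxy / sxx,
        "intercept": y_bar - (sxy / sxx) * x_bar,
        "ss_total": syy,
        "ss_reg": ss_reg,
        "ss_res": syy - ss_reg,
        "df_res": n - 2,
        "r_squared": ss_reg / syy,
    }

# Full data: two responses at each x value
full = regression_summary(
    [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    [12, 19, 18, 35, 19, 44, 32, 53, 44, 65],
)
# Reduced data: each pair of responses replaced by its mean
reduced = regression_summary(
    [1, 2, 3, 4, 5],
    [15.5, 26.5, 31.5, 42.5, 54.5],
)

for name, fit in (("full", full), ("reduced", reduced)):
    print(name, {k: round(v, 3) for k, v in fit.items()})
```

Both fits give the same line, but averaging removes the within-pair variation, so the residual sum of squares, its degrees of freedom and R² differ considerably between the two analyses.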
Question 16. As part of an investigation into health service funding a working party was
concerned with the issue of whether mortality rates could be used to predict
sickness rates. Data on standardised mortality rates and standardised sickness
rates were collected for a sample of 10 regions and are shown in the table below:
(ii) Noting the issue under investigation, draw an appropriate scatterplot for
these data and comment on the relationship between the two rates.
(iii) Determine the fitted linear regression of sickness rate on mortality rate and
test whether the underlying slope coefficient can be considered to be as
large as 2.
(iv) For a region with mortality rate 115.0, estimate the expected sickness rate
and calculate 95% confidence limits for this expected rate.
Question 17. A chain of LUMA ice cream parlors is floated in college campuses across the
metropolitan cities. A sample of 10 ice cream parlors across different colleges is
selected. The daily revenue Y (in INR) and the student population X in each of the
colleges are recorded as follows.
X 200 600 800 800 1200 1600 2000 2000 2200 2600
Y 6000 10000 9000 12000 11500 14000 16000 17000 15000 20000
The proprietor of the parlor chain has contacted a statistician, provided the above
information, and asked her to predict daily revenue given the student population
in a college. The statistician has decided to fit a linear regression model with
student population as independent variable.
(i) Fit a linear regression model.
(ii) Find the 99% confidence interval for the slope parameter.
(iii) Establish statistically, based on the regression model (i) above, whether
there is any relationship between the daily revenue and the college student
size by proposing and testing an appropriate hypothesis
(iv) Find the 95% confidence interval for the mean daily revenue when the
population of the college is 1,000.
(v) For what size of college student population is the 95% confidence
interval for the mean daily revenue the shortest?
(vi) Find 95% confidence interval for the predicted daily revenue for an
individual parlor when the student population of one particular college is
1,000.
(vii) Interpret the results of (iv) and (vi)
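The difference between the interval for the mean response in part (iv) and the interval for an individual response in part (vi) comes down to one extra term under the square root. A minimal Python sketch for Question 17 follows; the critical value 2.306 is assumed to be the upper 2.5% point of the t distribution with 8 degrees of freedom from tables.

```python
import math

x = [200, 600, 800, 800, 1200, 1600, 2000, 2000, 2200, 2600]    # student population
y = [6000, 10000, 9000, 12000, 11500, 14000, 16000, 17000, 15000, 20000]  # daily revenue
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

beta_hat = sxy / sxx
alpha_hat = y_bar - beta_hat * x_bar
sigma_hat = math.sqrt((syy - sxy ** 2 / sxx) / (n - 2))

T_8_025 = 2.306    # assumed table value t_{8, 0.025}
x0 = 1000
y0_hat = alpha_hat + beta_hat * x0

# Standard error for the mean response at x0
se_mean = sigma_hat * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
# Standard error for an individual response at x0 (extra "1 +" term)
se_indiv = sigma_hat * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)

ci_mean = (y0_hat - T_8_025 * se_mean, y0_hat + T_8_025 * se_mean)
pi_indiv = (y0_hat - T_8_025 * se_indiv, y0_hat + T_8_025 * se_indiv)

print(round(alpha_hat, 2), round(beta_hat, 4))
print(tuple(round(v, 2) for v in ci_mean))
print(tuple(round(v, 2) for v in pi_indiv))
```

Setting `x0` equal to `x_bar` makes the `(x0 - x_bar) ** 2 / sxx` term vanish, which is why the interval for the mean is shortest at the mean student population.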
Question 18. Auditors are often required to compare the audited (or current) value of an inventory
item with the book (or listed) value. If a company is keeping its inventory and books
up to date, there should be a strong linear relationship between the audited and book
values. An accountant intends to fit a linear regression model. He sampled ten
inventory items and obtained the audited and book values shown in the following
table.
(i) Fit a Linear regression model: Y = α + β X + e for the above data.
(ii) Obtain a 95% confidence interval for β, the slope parameter.
(iii) If the book value x = 100, find a 95% confidence interval for the predicted
mean audited value μ = E[Y | X = x].
(iv) Find the book value x for which the 95% confidence interval for 𝑦, the
predicted individual audited value, has minimum length.
(v) Calculate coefficient of determination and interpret.
Question 19. A statistician has a series of bivariate data {(x1,y1), (x2,y2), … (xn,yn)} and
wishes to perform a linear regression on these data.
(i) State the equation that must be minimized to give the least squares
estimates of the regression coefficients.
(ii) Derive the least squares estimate of the slope coefficient from the equation
in part (i).
For a sample of 44 fish, the age (days) and length (millimeters) of each fish are
measured. Denote age by X and length by Y. The following summary data are
ANSWERS
Ans.1. (b) 𝑟 = 0.9 (c) 𝑦 = −7 + 0.9𝑥 (d) S.E. = 0.6333 (e) C.I. = (0.0992, 1.7007)
Ans.2. (b) ŷ = 8.3636 + 0.9818x (c) F = 16.36, reject H0 (d) CI = (0.4319, 1.5317)
Ans.4. (b) X̄ = 145, Ȳ = 67.3, Sxx = 8250, Syy = 1932.1, β̂ = 0.483, α̂ = −2.739
Ans.5. (a) T.S = 1.714, reject (b) β̂1 = −0.08, β̂0 = 13.64 (c) σ̂² = 0.79 (d) T.S = 7.64, reject H0
Ans.6. (i) ŷ = 21.69 + 3.471x (ii) ŷ = 70.284 (iii) T.S. = 1.731, cannot reject H0
Ans.7. r = 0.632
Ans.9. (b) α̂ = 29.915, β̂ = 17.443 (c) C.I = (14.91, 19.97) (d) r = 0.977, T.S. = 12.9, reject H0
(d) T.S. = −1.629, accept H0 (e) C.I = (-12.077, 2.077)
Ans.15. (i) (b) R² = 65.0% (c) ŷ = 5.9 + 9.4x (d) C.I. = (3.78, 15.02)
Ans.16. (ii) There seems to be an increasing linear relationship such that mortality could be used to
predict sickness.
(iii) 𝛼 = 7.426 𝑎𝑛𝑑 𝛽 = 1.6371, T.S. = -0.74, cannot reject H0
(iv) 𝜇! = 195.69 and CI = (185.60, 205.78)
Ans.17. (i) ŷ = 6025.35 + 5.0176x (ii) (3.2078, 6.8275) (iii) H0: β = 0 vs. H1: β ≠ 0
Because 0, the hypothesized value of β, is not included in the confidence interval
calculated in the part above, we can reject H0 and conclude that a statistically significant
relationship exists between the student size and daily revenue.
(iv) (9,981.57, 12,104.35)
(v) The standard error of the mean daily revenue estimate, Se(μ̂), will be minimum at
x0 = x̄, since the term (x0 − x̄)² is non-negative and equals zero at x0 = x̄.
For any confidence interval, the shortest length is attained at minimum Se(μ̂). Hence for
a college with 1,400 students, the 95% confidence interval for the mean daily revenue
will be the shortest.
(vi) (7,893.97, 14,191.94)
(vii) CI for the daily revenue of a particular college with 1,000 students is wider than CI
for mean daily revenue with 1,000 students.
The difference reflects the fact that we are able to estimate mean value of y more
precisely than individual value of y.
The resulting interval for an individual response is wider than the corresponding interval
for the mean response because the uncertainty associated with individual estimator is
more than the relatively more stable mean response.
Ans.18. (i) ŷ = 0.7198 + 0.9914x (ii) (0.9651, 1.0177) (iii) (97.781, 101.937)
(iv) Se(ŷ) will be minimum at X0 = X̄ = 72
(v) Coefficient of determination is used to measure the goodness of fit of a linear
regression model. A value of 99.89% means the model is a good fit.
Ans.19. (i) $\sum_i \left(y_i - (\alpha + \beta x_i)\right)^2$ (ii) $\hat{\beta} = S_{xy}/S_{xx}$ (iii) α̂ = 924.68, β̂ = 26.241
(iv) r = 0.879
REVISION ASSIGNMENT – 2
Question 1. The waist measurements (in cm) of six male patients before and after undergoing
a medically controlled diet are as follows:
Patient 1 Patient 2 Patient 3 Patient 4 Patient 5 Patient 6
Before 106 98 110 100 105 96
After 98 97 82 89 80 90
Calculate a 90% confidence interval for the reduction in waist measurement
following the diet.
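Since the before and after measurements in Question 1 come from the same patients, the interval is naturally based on the paired differences. The Python sketch below illustrates that approach; the critical value 2.015 is assumed to be the upper 5% point of the t distribution with 5 degrees of freedom from tables.

```python
import math

before = [106, 98, 110, 100, 105, 96]
after = [98, 97, 82, 89, 80, 90]

# Work with the per-patient reductions (paired design)
d = [b - a for b, a in zip(before, after)]
n = len(d)
d_bar = sum(d) / n
s_d = math.sqrt(sum((di - d_bar) ** 2 for di in d) / (n - 1))

T_5_05 = 2.015    # assumed table value t_{5, 0.05} for a 90% two-sided interval
half_width = T_5_05 * s_d / math.sqrt(n)
ci = (d_bar - half_width, d_bar + half_width)

print(round(d_bar, 3), round(s_d, 3), tuple(round(v, 2) for v in ci))
```

The resulting interval agrees with the one quoted in Ans.1 of this revision assignment.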
Question 2. A random sample of size 2n is taken from a geometric distribution for which:
$P(X = x) = p\,q^{x-1}, \quad x = 1, 2, 3, \ldots$
Give an expression for the likelihood that the sample contains equal numbers of
odd and even values of X.
Question 3. A random sample of 16 observations ($x_1, x_2, \ldots, x_{16}$) from a normal distribution
gives: $\sum_{i=1}^{16} x_i = 128$, $\sum_{i=1}^{16} x_i^2 = 1168$
Score Movement
2, 3 or 4 Down 1
5, 6 or 7 Stay same
8, 9 or 10 Up 1
11 or 12 Up 2
(i) Using a normal approximation, calculate the probability that after 15 turns
a child’s spider will have moved up more than 8 squares from the start.
(ii) Comment briefly on the suitability of this approximation.
Question 6. A sample of 50 independent and identically distributed observations from an
Exp(𝜆) distribution gave:
Range 0≤x<1 1≤x<2 x≥2
Frequency 30 15 5
(i) Show that the log-likelihood can be expressed as:
$\ln L(\lambda) = \text{const} - 25\lambda + 45\ln(1 - e^{-\lambda})$
explaining clearly why the constant has arisen.
(ii) Hence calculate the maximum likelihood estimate of 𝜆.
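As a cross-check on the calculus in Question 6, the log-likelihood can also be maximised numerically. The sketch below is one possible approach, not the workbook's method: it drops the constant, uses a simple ternary search (valid because the function is concave in λ), and compares the result with the closed form obtained by setting the derivative to zero, which gives 45e^{-λ} = 25(1 − e^{-λ}).

```python
import math

def log_likelihood(lam):
    """Log-likelihood for the grouped exponential data, up to an additive constant."""
    return -25 * lam + 45 * math.log(1 - math.exp(-lam))

# Closed form from d/d(lambda) = 0:  e^{-lambda} = 25 / 70
closed_form = math.log(70 / 25)

# Ternary search for the maximum of a concave function on (0.01, 10)
lo, hi = 0.01, 10.0
for _ in range(200):
    m1 = lo + (hi - lo) / 3
    m2 = hi - (hi - lo) / 3
    if log_likelihood(m1) < log_likelihood(m2):
        lo = m1
    else:
        hi = m2
numeric = (lo + hi) / 2

print(round(closed_form, 4), round(numeric, 4))
```

Both routes should agree to several decimal places; a disagreement would suggest an algebra slip in the derivative.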
Question 7. A random variable X has probability density function:
$f(x) = 2e^{-2(x-\theta)}, \; x \ge \theta$, where the value of θ is unknown.
Five observations of X are: 1.90, 2.97, 1.88, 2.94 and 1.56.
(i) Derive a formula for the maximum likelihood estimator of 𝜃 and obtain
the maximum likelihood estimate for this sample.
(ii) Show that $E[X] = \theta + \frac{1}{2}$ and hence calculate the method of moments
estimate of θ.
(iii) Comment briefly on your results.
Question 8. A large life office has n policyholders, each with a probability of 0.01 of dying
during the next year (independently of all other policyholders).
Calculate the approximate probability that there will be between 9 and 16 (both
inclusive) deaths during the year, when:
(i) n = 400 (ii) n = 3,000
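To judge how good an approximation is in Question 8, it can be compared with the exact binomial probability. The following Python sketch assumes a Poisson approximation for n = 400 (mean 4) and a normal approximation with continuity correction for n = 3,000 (mean 30); which approximation is intended in each case is an assumption of this example, and the helper names are hypothetical.

```python
import math

def binom_prob(n, p, lo, hi):
    """Exact P(lo <= X <= hi) for X ~ Bin(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(lo, hi + 1))

def poisson_prob(mu, lo, hi):
    """P(lo <= X <= hi) for X ~ Poisson(mu)."""
    return sum(math.exp(-mu) * mu ** k / math.factorial(k)
               for k in range(lo, hi + 1))

def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_prob(mu, var, lo, hi):
    """Normal approximation to P(lo <= X <= hi) with continuity correction."""
    sd = math.sqrt(var)
    return normal_cdf((hi + 0.5 - mu) / sd) - normal_cdf((lo - 0.5 - mu) / sd)

p = 0.01
exact_400 = binom_prob(400, p, 9, 16)
approx_400 = poisson_prob(400 * p, 9, 16)

exact_3000 = binom_prob(3000, p, 9, 16)
approx_3000 = normal_prob(3000 * p, 3000 * p * (1 - p), 9, 16)

print(exact_400, approx_400)
print(exact_3000, approx_3000)
```

In the second case the interval 9 to 16 lies far in the lower tail of the distribution, where a normal approximation tends to be less accurate in relative terms.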
Question 9. The gamma distribution, with parameters α and λ, has moment generating
function: $M_X(t) = \left(1 - \frac{t}{\lambda}\right)^{-\alpha}$
(i) Show, using moment generating functions, that the sum of two
independent gamma random variables, each with second parameter 𝜆, is
also a gamma random variable and state its parameters.
(ii) A random sample X1 ,⋯, Xn is taken from a Gamma(𝛼,𝜆) distribution.
Derive the moment generating function of $2\lambda \sum X_i$, and hence show that it
has a $\chi^2_{2n\alpha}$ distribution.
(iii) Suppose that X̄ is the mean of a random sample of size 5 taken from a
Gamma(2,0.1) distribution. Use the result from part (ii) to calculate the
probability that X̄ exceeds 40.
Question 10. The number of claims per annum from a certain type of medical insurance policy
sold to policyholders over the age of 60 is believed to follow a Poi(𝜆) distribution,
where the parameter 𝜆 is unknown. A sample of 10 policies gave rise to the
following numbers of claims: 0, 1, 0, 0, 3, 0, 1, 0, 2, 2
(i) Use a normal approximation to calculate an approximate 99% confidence
interval for the Poisson parameter 𝜆.
(ii) Comment on the accuracy of the interval obtained in part (i).
(iii) Write down the equations that you would use to obtain the confidence
interval for 𝜆 using an accurate method.
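For part (i) of Question 10, the normal approximation treats the sample mean as approximately N(λ, λ/n) and plugs the estimate into the variance. A short Python sketch of that calculation follows; the value 2.5758 is assumed to be the upper 0.5% point of the standard normal distribution from tables.

```python
import math

claims = [0, 1, 0, 0, 3, 0, 1, 0, 2, 2]
n = len(claims)

lam_hat = sum(claims) / n                 # maximum likelihood estimate of lambda
se = math.sqrt(lam_hat / n)               # estimated standard error of the sample mean

Z_995 = 2.5758                            # assumed table value for a 99% interval
ci = (lam_hat - Z_995 * se, lam_hat + Z_995 * se)

print(lam_hat, tuple(round(v, 3) for v in ci))
```

With only ten policies and a total of nine claims, the lower limit sits close to zero, which is relevant to the comment asked for in part (ii).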
Question 11. A random sample of 10 pet insurance claims had an average size of £680. It is
believed that claim amounts are exponentially distributed.
(i) Using the fact that if X1,⋯, Xn are exponentially distributed with
parameter λ, then $2n\lambda\bar{X}$ has a $\chi^2_{2n}$ distribution, where X̄ is the mean of
X1 ,⋯, Xn , calculate an exact 90% confidence interval for the mean pet
insurance claim size.
(ii) Write down the likelihood function in terms of the mean 𝜇 of the
exponential distribution and hence show that the maximum likelihood
estimator of μ is X̄.
(iii) (a) Show that the Cramér-Rao lower bound for estimators of the mean
of the exponential distribution is given by $\mu^2/n$.
(b) Hence, calculate the estimated asymptotic standard error of the
mean, X̄.
(iv) (a) Use your results from (iii) and the asymptotic properties of
estimators to calculate an approximate 90% confidence interval for
the mean claim size.
(b) Comment on the confidence intervals produced in (i) and (iv)(a).
Question 12. It is desired to test the value of the parameter p for a random variable that has a
binomial distribution. In order to test the null hypothesis H0 : p = 0.4 against the
alternative hypothesis H1 : p = 0.6 , the following test is devised:
The number of successes, X, in a sample of size 50 is determined. If X≥25 , then
H0 is rejected. Calculate the approximate size of this test.
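The size of the test in Question 12 is the probability of rejecting H0 when p = 0.4, that is P(X ≥ 25) for X ~ Bin(50, 0.4). The sketch below computes a normal approximation with a continuity correction (assumed here to be what "approximate" refers to) and, for comparison, the exact binomial tail.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p0 = 50, 0.4
mean = n * p0
sd = math.sqrt(n * p0 * (1 - p0))

# Normal approximation: P(X >= 25) is approximated by P(N > 24.5)
size_normal = 1 - normal_cdf((24.5 - mean) / sd)

# Exact binomial tail probability for comparison
size_exact = sum(math.comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
                 for k in range(25, n + 1))

print(round(size_normal, 5), round(size_exact, 5))
```

The approximate value agrees with the figure quoted in Ans.12, and the exact tail probability is close to it.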
Question 13. Following archaeological excavations at a site in Egypt, ten samples of wood
were carbon-dated and their ages x (years) estimated as:
4,900 4,750 4,820 4,710 4,760
4,570 4,300 4,680 4,800 4,670
Σx = 46,960, Σx² = 220,772,800
(i) Calculate a 95% confidence interval for the true mean age of the wood
found at this site.
(ii) Present these data values graphically and comment on the validity of the
confidence interval calculated in part (i).
(iii) Ideally the archaeologist would like the 95% confidence interval for the
true mean age, calculated in (i) above, to have a width of no more than
200 years. Calculate the minimum sample size needed.
(iv) At a second site, eight samples of wood gave the following results:
Σy = 36,000, Σy² = 162,280,000
Calculate a 95% confidence interval for the difference between the mean
ages of the wood found at the two sites.
(v) Obtain a 90% confidence interval for the ratio of the underlying variances
in the ages of the two samples of wood. Hence comment on the validity of
the confidence interval given in part (iv).
Question 14. A research chemist thinks he has discovered a new desiccant, which is more
efficient at extracting moisture from chemicals than the existing one. In order to
test the claim, equal amounts of a homogeneously mixed compound are put into
each of sixteen desiccators. These are divided into two batches of eight, labelled
A and B, and in each batch the desiccators are numbered 1 to 8. Into each
desiccator is also put a standard amount of the respective desiccant under test.
Batch A contains the existing desiccant whilst the new desiccant is placed in
Batch B. The desiccators are sealed for 24 hours and then the increase in weight
in grams of each of the sixteen samples of desiccant is measured. The results are:
Sample number 1 2 3 4 5 6 7 8
Existing desiccant (A) 4.59 5.05 4.49 5.33 4.66 4.98 5.67 5.23
New desiccant (B) 4.75 5.03 4.66 5.56 4.90 4.88 5.80 5.33
ΣA = 40, ΣA² = 201.1574, ΣB = 40.91, ΣB² = 210.3659
(i) (a) (1) Draw a plot of the data and comment briefly.
(2) Perform a test to verify that the variances arising from the
use of each desiccant are not significantly different and
comment briefly in relation to your plot of the data.
(b) Use a t test to investigate the claim that the new desiccant extracts
more moisture than the existing one.
(ii) It was subsequently discovered that eight different compounds had been
used in the above test. The ith pair of desiccators A and B had contained
equal weights of compound i, 𝑖 = 1,2,⋯ ,8. Perform a new analysis with
the same aim, as in part (i)(b) above, again using a t test.
(iii) Comment on any difference found between the analyses, and the cause.
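The contrast between parts (i)(b) and (ii) of Question 14 can be made concrete by computing both test statistics from the same data. The following Python sketch is an illustration only; the helper functions `mean` and `var` are written out here rather than taken from a library.

```python
import math

a = [4.59, 5.05, 4.49, 5.33, 4.66, 4.98, 5.67, 5.23]   # existing desiccant
b = [4.75, 5.03, 4.66, 5.56, 4.90, 4.88, 5.80, 5.33]   # new desiccant
n = len(a)

def mean(v):
    return sum(v) / len(v)

def var(v):
    """Unbiased sample variance (divisor n - 1)."""
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)

# Two-sample t statistic with pooled variance (treats the batches as independent)
pooled_var = ((n - 1) * var(a) + (n - 1) * var(b)) / (2 * n - 2)
t_two_sample = (mean(b) - mean(a)) / math.sqrt(pooled_var * (1 / n + 1 / n))

# Paired t statistic based on within-pair differences
d = [bi - ai for ai, bi in zip(a, b)]
var_d = var(d)
t_paired = mean(d) / math.sqrt(var_d / n)

# Variance ratio used to check the equal-variance assumption
f_ratio = var(b) / var(a)

print(round(f_ratio, 3), round(pooled_var, 4), round(t_two_sample, 3))
print(round(var_d, 5), round(t_paired, 4))
```

The numerator is the same in both statistics; only the variance in the denominator changes, because pairing removes the compound-to-compound variation.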
Question 15. A study was carried out into the effects of smoking on life expectancy. The
average number (x) of cigarettes smoked per day from age 50 by 11 individuals
was calculated and the number (y) of years from age 50 until their deaths was
recorded. The results were as follows:
ANSWERS
Ans.1. (4.22,22.11)
Ans.2. $\binom{2n}{n}\frac{q^n}{(1+q)^{2n}}$
Ans.3. (2.4,4.45)
Ans.5. (i) 0.14381 (ii) The Central Limit Theorem requires n to be large. Fifteen turns is not
large, therefore this will be a poor approximation.
Ans.7. (i) θ̂ = min Xi. From this sample, the maximum likelihood estimate of θ is 1.56
(ii) 1.75
(iii) One of the observed values was less than the method of moments estimate of 𝜃 . So
the method of moments gives an estimate of 𝜃 that is not ‘possible’ in this case. This
contrasts with the situation for maximum likelihood estimators, which, provided they
exist, must, by definition, give feasible estimates.
Ans.12. 0.09697
Ans.13. (i) (4,577, 4,815)
(ii) Given that our data set is small, the confidence interval in part (i) requires that the ages
are normally distributed. The plot seems to show that 4,300 is very different from the other
values and so it may be an outlier. If so, the underlying distribution is not
normal, and our confidence interval is not valid. However, more data are needed for us to
be sure.
(iii) A sample size of at least 14 is required
(iv) (13.2, 379) (v) (0.188,2.27) Since this confidence interval contains 1, the
assumption of equal variances used in the confidence interval in part (iv) looks reasonable.
Ans.14. (i)(a)(1) The plots suggest that the new desiccant may extract more water. The spread of values is
similar for each desiccant.
(2) TS = 1.004 : insufficient evidence to reject H0
(b) TS = 0.559 : insufficient evidence to reject H0
(ii) TS = 2.7083 : sufficient evidence to reject H0
(iii) The paired test shows that there was a significant difference between the two
desiccants, whereas the two-sample test does not indicate any significant difference. The
small but significant difference between the two desiccants is masked in the two-sample
test because the test statistic for the two-sample test is calculated using the pooled
variance (which is 0.1657) rather than the sample variance of the differenced data (which
is 0.01411). A smaller variance leads to a larger test statistic, which means we are more
likely to reject the null hypothesis. In other words, the increased power of the paired test
enables a significant difference to be identified.
Pearson’s correlation coefficient. Hence the Spearman’s rank correlation coefficient is
more robust.
(iv) TS = - 2.415 : sufficient evidence to reject H0
(v) τ = −0.5636
Ans.16. (i) (a) $\hat{a} = \bar{y} - \hat{b}\bar{x}$, $\hat{b} = S_{xy}/S_{xx}$ (b) The answer would not have differed at
all. For a normal distribution, maximum likelihood and least squares obtain the same
estimates.
(ii) 𝑏 = 19.667 𝑎 = 16.5 (iii) TS = - 2.355 : insufficient evidence to reject H0
(iv) (a) (90.7,99.7) (b) (121,148)
(v) The confidence interval for the individual job is wider (≈27) than the confidence
interval for the average cost (≈9). So there is greater uncertainty over an individual result
than an average one.
ASSIGNMENT - 9
SAMPLING AND STATISTICAL INFERENCE
Question 1. A random sample of n observations is taken from a normal distribution with mean
µ and variance σ². The sample variance is an observation of a random variable S².
Derive from first principles E(S²) and Var(S²).
Question 2. Calculate the probability that, for a random sample of 5 values taken from a
N(100, 25²) population (i) X̄ will be between 80 and 120, (ii) S will exceed 41.7
(iii) both conditions (i) and (ii) will hold?
Question 3. State the distribution of $\frac{\bar{X} - 100}{S/\sqrt{5}}$ for a random sample of 5 values taken from a
N(100, σ²) population. What is the probability that this quantity will exceed
1.533?
Question 4. Independent random samples of size n1 and n2 are taken from the normal
populations $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ respectively.
(i) Write down the sampling distributions of $\bar{X}_1$ and $\bar{X}_2$ and hence determine
the sampling distribution of $\bar{X}_1 - \bar{X}_2$, the difference between the sample
means.
(ii) Now assume that $\sigma_1^2 = \sigma_2^2 = \sigma^2$
Question 5. Determine:
(i) P(F9,10 > 3.779) (ii) P(F12,14 < 3.8)
(iii) P(F11,8 < 0.3392) (iv) the value of p such that P(F14,6 < p) = 0.01.
Question 6. (i) Determine: (a) P(F3,9 < 3.863) (b) P(F10,10 < 0.269)
(ii) Determine the value of p such that:
(a) P(F24,30 > p) = 0.10 (b) P(F18,9 > p) = 99%
Question 7. For random samples of size 10 and 25 from two normal populations with equal
variances, use F tables to determine the values of α and β such that
$P\left(\frac{S_1^2}{S_2^2} > \alpha\right) = 0.05$ and $P\left(\frac{S_1^2}{S_2^2} < \beta\right) = 0.05$, where $S_1^2$ is the sample variance from
the sample of size 10, and $S_2^2$ is the other sample variance.
Question 8. What is the probability that the sample variance of a sample of 10 values from a
normal distribution will be more than 6 times the sample variance of a sample of
5 values from an independent normal distribution with the same variance?
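As a rough numerical check on Question 8, the ratio of the two sample variances can be simulated directly. This is only a sketch: it assumes both populations are N(0, 1) (the common variance cancels in the ratio), and the function names, seed and number of trials are illustrative choices rather than anything given in the question.

```python
import random

def sample_variance(values):
    """Unbiased sample variance (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def prob_ratio_exceeds(k, n1=10, n2=5, trials=50_000, seed=1):
    """Estimate P(S1^2 > k * S2^2) for independent normal samples
    of sizes n1 and n2 with a common variance (taken as 1 here)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s1 = sample_variance([rng.gauss(0, 1) for _ in range(n1)])
        s2 = sample_variance([rng.gauss(0, 1) for _ in range(n2)])
        if s1 > k * s2:
            hits += 1
    return hits / trials

estimate = prob_ratio_exceeds(6)
print(round(estimate, 3))
```

The estimate should land near 0.05, since 6 is close to the tabulated upper 5% point of the F distribution with 9 and 4 degrees of freedom.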
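Question 10. Let $X_1, X_2, \ldots, X_5$ be independent N(0, 1) random variables and let
$\bar{X} = \frac{1}{5}\sum_{i=1}^{5} X_i$ and $S^2 = \frac{1}{4}\sum_{i=1}^{5}(X_i - \bar{X})^2$.
Calculate $P\left(\bar{X} > 0 \text{ and } \sum_{i=1}^{5}(X_i - \bar{X})^2 < 9.488\right)$.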
Question 11. Consider a random sample of size 21 taken from a normal distribution with mean
µ = 25 and variance σ² = 4. Let the sample variance be denoted S². State the
distribution of the statistic 5S² and hence find the variance of the statistic S².
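A sketch of the calculation Question 11 points towards, assuming the standard results that (n − 1)S²/σ² has a chi-square distribution with n − 1 degrees of freedom and that a chi-square variable with k degrees of freedom has variance 2k:

```latex
\frac{(n-1)S^2}{\sigma^2} = \frac{20\,S^2}{4} = 5S^2 \sim \chi^2_{20},
\qquad
\operatorname{Var}(5S^2) = 2 \times 20 = 40
\;\Longrightarrow\;
\operatorname{Var}(S^2) = \frac{40}{25} = 1.6 .
```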
Question 12. A random sample of size 10 is taken from a normal distribution with mean µ = 20
and variance σ² = 1.
Find the probability that the sample variance exceeds 1, that is find P(S² > 1).
Question 13. Let X1, X2, …, Xn be a random sample of size n from a population with mean
µ and variance σ².
Let the sample mean be X̄ and the sample variance be $S^2 = \frac{1}{n-1}\left\{\sum X_i^2 - n\bar{X}^2\right\}$.
You may assume $E[\bar{X}] = \mu$ and $V[\bar{X}] = \frac{\sigma^2}{n}$. Show that $E[S^2] = \sigma^2$.
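One possible route for Question 13, sketched under the stated assumptions on the mean and variance of the sample mean, and using the identity E[W²] = Var(W) + (E[W])² for each term:

```latex
\begin{aligned}
E[X_i^2] &= \operatorname{Var}(X_i) + (E[X_i])^2 = \sigma^2 + \mu^2, \\
E[\bar{X}^2] &= \operatorname{Var}(\bar{X}) + (E[\bar{X}])^2 = \frac{\sigma^2}{n} + \mu^2, \\
E[S^2] &= \frac{1}{n-1}\left\{ n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right) \right\}
        = \frac{(n-1)\sigma^2}{n-1} = \sigma^2 .
\end{aligned}
```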
Question 14. Consider a random sample, X1, …, Xn, from a normal N(µ, σ²) distribution, with
sample mean X̄ and sample variance S².
(i) Define carefully what it means to say that X1, …, Xn is a random sample
from a normal distribution.
(ii) State what is known about the distributions of X̄ and S² in this case,
including the dependencies between the two statistics.
(iii) Define the t-distribution and explain its relationship with X̄ and S².
Question 15. Consider a random sample consisting of the random variables X1, X2, ..., Xn with
mean µ and variance σ². The variables are independent of each other.
(i) Show that the sample variance, S², is an unbiased estimator of the true
variance σ².
Now consider in addition that the random sample comes from a normal
distribution in which case it is known that $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$.
Question 16. Regarding the small sample inference concerning the equality of means of two
Normal populations, the basic assumption to be made is
Question 17. A random sample 𝑋! , 𝑋! , … 𝑋!" is drawn from a normal distribution with mean 1
and variance σ². Let X̄ and S² denote the sample mean and variance respectively.
Find the approximate value of P[X̄ − 1 > S] by referring to statistical tables.
Question 18. The following are the summary measures of birth weights (in grams) of babies in
a city
Assuming that the birth weights are independently normally distributed for the
two genders, calculate the probability that at birth a boy outweighs a girl.
ANSWERS
Ans.1. E(S²) = σ² and Var(S²) = $\frac{2\sigma^4}{n-1}$
Ans.2. (i) 0.926 (ii) 0.0253 (iii) 0.023
Ans.3. $\frac{\bar{X} - 100}{S/\sqrt{5}} \sim t_4$ and probability = 0.1
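Ans.4. (i) $\bar{X}_1 \sim N\left(\mu_1, \frac{\sigma_1^2}{n_1}\right)$, $\bar{X}_2 \sim N\left(\mu_2, \frac{\sigma_2^2}{n_2}\right)$. $\bar{X}_1 - \bar{X}_2$ is the difference between two independent normal
variables and so is itself normal, with mean $\mu_1 - \mu_2$ and variance $\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$.
(ii) (a) $\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim N(0,1)$ (b) $\chi^2_{n_1+n_2-2}$ (c) $t_{n_1+n_2-2}$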
(b) Estimator gets better (more accurate) as n increases, as its variance reduces.
(MSE also gets smaller)
Ans.16. (iii)
Ans.17. From actuarial table page 163 the probability is between 0.001 and 0.0005.
Ans.18. 0.15866
ASSIGNMENT - 10
RANDOM NUMBER SIMULATION
Question 1. Generate 3 random variates from an exponential distribution with mean 0.5, using
the 3 random numbers 15/59, 55/59 and 42/59.
Question 2. Describe how you would generate random variates from a Binomial (3, 0.6)
distribution X, using a sequence of random numbers {u_i}.
Question 3. Using the first 5 numbers in the first column of random U (0, 1) numbers given in
the Tables on page 190, generate 5 values from a Poisson (10) distribution.
Question 4. Simulate three random values from an Exp (0.1) distribution using the random
values 0.113, 0.608 and 0.003 from U (0, 1).
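The inverse transform method asked for in Question 4 can be sketched as follows. The helper name `exp_inverse_transform` is illustrative, and the sketch assumes Exp(0.1) means rate 0.1 (mean 10) and that each uniform value u is mapped through the inverse of F(x) = 1 − exp(−rate·x).

```python
import math

def exp_inverse_transform(u, rate):
    """Invert F(x) = 1 - exp(-rate * x) at the uniform value u."""
    return -math.log(1.0 - u) / rate

uniforms = [0.113, 0.608, 0.003]
values = [exp_inverse_transform(u, rate=0.1) for u in uniforms]
print([round(v, 2) for v in values])
```

Using 1 − u rather than u inside the logarithm follows the direct inversion of F; either choice gives a valid exponential variate, but the two conventions produce different numbers from the same u.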
Question 5. Generate three random values from a U (-1, 4) using the following random values
from U (0, 1): 0.07 0.628 0.461
Question 6. Simulate two random values from a Poi (2) distribution using the random values
0.721 and 0.128 from U (0, 1).
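For a discrete distribution such as the Poisson in Question 6, the inverse transform method walks up the cumulative distribution until it first exceeds the uniform value. The sketch below assumes that convention (return the smallest k with P(N ≤ k) > u); the function name is hypothetical.

```python
import math

def poisson_inverse_transform(u, mean):
    """Return the smallest k with P(N <= k) > u for N ~ Poisson(mean)."""
    k = 0
    pmf = math.exp(-mean)   # P(N = 0)
    cdf = pmf
    while u >= cdf:
        k += 1
        pmf *= mean / k     # P(N = k) obtained from P(N = k - 1)
        cdf += pmf
    return k

simulated = [poisson_inverse_transform(u, 2) for u in (0.721, 0.128)]
print(simulated)
```

The recursion P(N = k) = P(N = k − 1) × mean / k avoids recomputing factorials at each step.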
Question 7. Generate three random values from a Bin (4, 0.6) using the following random
values from U (0,1): 0.588 0.222 0.906
Question 8. A model used for claim amounts (X, in units of £10,000) in certain circumstances
has the following probability density function, f(x), and cumulative distribution
function, F(x):
$f(x) = \frac{5(10)^5}{(10+x)^6}$, $x > 0$ ; $F(x) = 1 - \left(\frac{10}{10+x}\right)^5$
You are given the information that the distribution of X has mean 2.5 units
(£25,000) and standard deviation 3.23 units (£32,300).
(i) Describe briefly the nature of a model for claim sizes for which the
standard deviation can be greater than the mean.
(ii) (a) Show that we can obtain a simulated observation of X by
calculating:
$x = 10\left[(1 - r)^{-0.2} - 1\right]$
where r is an observation of a random variable which is uniformly
distributed on (0,1).
(b) Explain why we can just as well use the formula:
$x = 10\left(r^{-0.2} - 1\right)$ to obtain a simulated observation of X.
(c) Calculate the missing values for the simulated claim amounts in
the table below (which has been obtained using the method in (ii)(b)
above):
r Claim (£)
0.7423 6,141
0.0291 102,872
0.2770 29,272
0.5895 11,148
0.1131 54,635
0.9897 207
0.6875 7,782
0.8525 3,243
0.0016 ?
0.5154 ?
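The table entries can be reproduced with a few lines of code. This sketch assumes the formula in (ii)(b) reads x = 10(r^(−0.2) − 1), i.e. a Pareto form with shape 5 and scale 10, and that claims are reported in pounds (x times 10,000); the function and parameter names are illustrative.

```python
def simulated_claim(r, alpha=5.0, lam=10.0, unit=10_000):
    """Claim in pounds from x = lam * (r**(-1/alpha) - 1), the form in (ii)(b)."""
    x = lam * (r ** (-1.0 / alpha) - 1.0)
    return unit * x

# Two tabulated rows as a check, then the two rows marked "?".
for r in (0.7423, 0.0291, 0.0016, 0.5154):
    print(r, round(simulated_claim(r)))
```

Small r values map to very large claims, which is the heavy right tail discussed in part (i).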
Question 9. Consider the following simple model for the number of claims, N, which occur in
a year on a policy:
n 0 1 2 3
P (N=n) 0.55 0.25 0.15 0.05
(a) Explain how you would simulate an observation of N using a number r, an
observation of a random variable which is uniformly distributed on (0,1).
(b) Illustrate your method described in (a) by simulating three observations of
N using the following random numbers between 0 and 1:
0.6221, 0.1472, 0.9862
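One way to carry out the method in Question 9 is a table lookup on the cumulative probabilities. The sketch below assumes the convention that N takes the first value whose cumulative probability exceeds r; the names `values`, `probs` and `simulate_n` are illustrative.

```python
from itertools import accumulate

values = [0, 1, 2, 3]
probs = [0.55, 0.25, 0.15, 0.05]
cdf = list(accumulate(probs))   # cumulative probabilities 0.55, 0.80, 0.95, 1.00

def simulate_n(r):
    """Return the first n whose cumulative probability exceeds r."""
    for n, upper in zip(values, cdf):
        if r < upper:
            return n
    return values[-1]  # guards against rounding in the last cumulative sum

observations = [simulate_n(r) for r in (0.6221, 0.1472, 0.9862)]
print(observations)
```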
Question 10. One variable of interest, T, in the description of a physical process can be
modelled as T = XY where X and Y are random variables such that
X ~ N(200, 100) and Y depends on X in such a way that Y|X = x ~ N(x,1).
Simulate two observations of T, using the following pairs of random numbers
(observations of a uniform (0,1) random variable), explaining your method and
calculations clearly:
Random numbers
0.5714, 0.8238
0.3192, 0.6844
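A sketch of the two-stage simulation in Question 10, assuming the second parameter of each normal distribution is a variance (so X has standard deviation 10) and that the first number of each pair drives X and the second drives Y given X. The inverse normal CDF comes from Python's `statistics.NormalDist`; table-based answers may differ slightly in the last digit.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

def simulate_t(u1, u2):
    """Simulate T = X * Y with X ~ N(200, 100) and Y | X = x ~ N(x, 1)."""
    x = 200 + 10 * std_normal.inv_cdf(u1)   # sd of X is sqrt(100) = 10
    y = x + 1 * std_normal.inv_cdf(u2)      # conditional sd of Y is 1
    return x * y

results = [simulate_t(u1, u2) for u1, u2 in [(0.5714, 0.8238), (0.3192, 0.6844)]]
print([round(t) for t in results])
```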
Question 11. The random variable X has probability density function:
! !
𝑓 𝑥 = ! ! , 𝑥 > 1 and cumulative distribution function:𝐹 𝑥 = 1 − ! !
ANSWERS
Ans.1. 0.1467, 1.3456, 0.6222
Ans.3. 11, 5, 5, 11 and 7
Ans.4. 1.20, 9.36 and 0.03
Ans.5. -0.65, 2.14 and 1.305
Ans.6. 3 and 0
Ans.7. 3, 2 and 4
Ans.8. (i) X takes positive values only so to have such a relatively high standard deviation the
distribution must be positively skewed with sizeable probability associated with high
values (i.e. the model embraces high claim sizes; the density has a long or heavy tail).
(ii) (b) R ~ U(0,1) ⟹ 1 − R ~ U(0,1), so (1 − r) is also a random number from (0, 1), so we
can use 1 − r in place of r
(ii) (c) 262390, 14175
Ans.9. (b) 1, 0, 3
Ans.10. 40911, 38236
Ans.11. 1.5284, 2.6842, 1.1976
Ans.12. (i) 214.3, 193.5, (ii) 39.85, 245.57
Ans.13. (i) (1,2,0), (ii) X = 0.851
ASSIGNMENT – 11
BAYESIAN STATISTICS AND CREDIBILITY THEORY
1. The number of e-mail messages received each day by an actuarial student has a
Poisson distribution with mean λ, where from past experience, the prior
distribution of λ is exponential with mean µ.
(i) The student has data x1,..., xn, where xi is the number of messages arriving
on day i, i = 1,2,…..n.
(a) Derive the posterior distribution of λ.
(b) Show that the Bayesian estimate of λ under quadratic loss can be
written in the form of a credibility estimate, and state the credibility
factor.
(c) If µ = 50 and the student receives a total of 550 messages over 10
days, calculate the Bayesian estimate of λ under quadratic loss.
(ii) 60% of messages require an answering time (in minutes) that is
exponentially distributed with mean 1, and the remaining messages
require an answering time that has a Pareto distribution with mean 2 and
variance 12.
Determine the probability that a randomly chosen message requires more
than M minutes answering time. [UK April 2002]
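For part (i)(c), the Poisson/gamma conjugate update can be checked numerically. This sketch assumes the exponential prior with mean µ is treated as Gamma(1, 1/µ) in the shape/rate parameterisation, and that the credibility factor takes the form Z = n / (n + 1/µ); the function name is illustrative.

```python
def poisson_gamma_posterior(alpha, lam, total, n):
    """Gamma(alpha, lam) prior plus n Poisson counts summing to total
    gives a Gamma(alpha + total, lam + n) posterior."""
    return alpha + total, lam + n

mu = 50                      # prior mean; exponential = Gamma(1, 1/mu)
n, total = 10, 550
alpha_post, lam_post = poisson_gamma_posterior(1, 1 / mu, total, n)

posterior_mean = alpha_post / lam_post            # Bayes estimate under quadratic loss
z = n / (n + 1 / mu)                              # credibility factor
credibility_form = z * (total / n) + (1 - z) * mu

print(round(posterior_mean, 2), round(z, 4))
```

The posterior mean and the credibility form agree, which is the identity part (i)(b) asks for.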
3. The lengths of time taken to deal with each of n reports are independent
exponentially distributed random variables with mean 1/λ.
Show that the gamma distribution is the conjugate prior for this exponential
distribution. [UK April 2003]
analyst estimates that the mean and standard deviation of θ are 0.20 and 0.25
respectively.
From a random sample of 50 policies, a claim is made on 24% of them during the
year.
(i) Determine the values of the parameters α and β, of the prior distribution.
(ii) Determine the posterior distribution and hence the posterior mean of θ.
(iii) For the general case where x is the number of claims arising from sample
size n and µ is the mean of the Beta prior distribution, show that the
posterior mean of θ can be expressed as:
Z(x/n) + (1 − Z)µ
and express Z as a function of α, β and n.
(iv) (a) Calculate the value of Z for the situation in part (ii) and explain
what it represents.
(b) Without performing any further calculations, explain how you
would expect the value of Z to change if:
(1) The analyst now believes the standard deviation, σ, of the
prior distribution to be 0.50.
(2) The sample size, n, was 400.
(c) State the limiting value of Z as σ and n increase and explain what
this means. [UK Sept 2003]
5. A risk consists of 5 policies. On each policy in one month there is exactly one
claim with probability θ and there is negligible probability of more than one
claim in one month. The prior distribution for θ is uniform on (0,1). There are a
total of 10 claims on this risk over a 12-month period.
(i) Derive the posterior distribution for θ.
(ii) Determine the Bayesian estimate of θ under:
(a) quadratic loss
(b) all or nothing loss. [UK April 2004]
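A minimal sketch of the beta/binomial update behind Question 5, assuming the 5 policies over 12 months are treated as 60 independent policy-months, each with claim probability θ, and that the U(0,1) prior is written as Beta(1, 1). The function name is hypothetical.

```python
from fractions import Fraction

def beta_binomial_posterior(a, b, successes, trials):
    """Beta(a, b) prior plus binomial data gives Beta(a + successes, b + failures)."""
    return a + successes, b + (trials - successes)

trials = 5 * 12          # policy-months: 5 policies observed for 12 months
claims = 10
a_post, b_post = beta_binomial_posterior(1, 1, claims, trials)  # U(0,1) = Beta(1, 1)

mean = Fraction(a_post, a_post + b_post)            # estimate under quadratic loss
mode = Fraction(a_post - 1, a_post + b_post - 2)    # estimate under all-or-nothing loss
print(a_post, b_post, float(mean), float(mode))
```

With a uniform prior the posterior mode coincides with the observed claim frequency, while the posterior mean is pulled slightly towards 0.5.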
6. The number of claims from one group of drivers in a year has a Poisson
distribution with mean λ and the number of claims from a second group of
drivers has a Poisson distribution with mean 2λ. In one year, there are n1 claims
from group 1 and n2 claims from group 2.
(iv) An insurer is considering introducing a new policy to provide insurance
against the failure of toasters within the first five years of purchase. Alan
and Beatrice are underwriters working for the insurer. Based on his
experience of similar products, Alan believes that toasters last three years
on average. Beatrice believes that six years is the average lifetime. Both are
adamant and are prepared to express their uncertainties about the average
lifetime, in terms of standard deviations of six months and one year
respectively. They decide to resolve their differences by testing a sample
of toasters large enough to ensure the difference in their posterior
expectations for the average lifetime will be less than one year.
Calculate how many toasters they should test, assuming the exponential
distribution is a good model for toaster lifetimes.
You may use the fact that if λ ~ Γ(α, s) then:
$\operatorname{var}(1/\lambda) = [E(1/\lambda)]^2 \times \frac{1}{\alpha - 2}$ [UK April 2005]
9. The total amounts claimed each year from a portfolio of insurance policies over n
years were x1, x2, …, xn. The insurer believes that annual claims have a normal
distribution with mean θ and variance σ₁², where θ is unknown. The prior
distribution of θ is assumed to be normal with mean µ and variance σ₂².
(i) Derive the posterior distribution of θ.
(ii) Using the answer in (i), write down the Bayesian point estimate of θ
under quadratic loss.
(iii) Show that the answer in (ii) can be expressed in the form of a credibility
estimate and derive the credibility factor.
The claims experience over five years for two companies was as follows:
Year 1 2 3 4 5
Company A Amount 421 417 438 456 463
Company B Amount 343 335 356 366 380
(iv) Determine the Bayes credibility estimate of the premiums the insurer
should charge for each company based on the modelling assumptions of
part (i), a profit loading of 25% and the following parameters:
Company A Company B
µ 400 300
σ₁² 500 350
σ₂² 800 600
Y3 be the random variable denoting average total claim amounts per policy in
year 3.
(i) State the distribution of the number of claims on the whole portfolio over
the 2 year period.
(ii) Derive the posterior distribution of θ given y₁ and y₂.
(iii) Show that the posterior expectation of Y3 given y₁, y₂ can be written in
the form of a credibility estimate:
$Z \times k + (1 - Z) \times \frac{\alpha}{\lambda} \times c(1 + r)^2$
specifying expressions for k and Z .
(iv) Describe k in words and comment on the impact the values of n1 , n2, have
on Z. [UK April 2006]
11. (i) Let p be an unknown parameter, and let f(p|x) denote the probability
density of the posterior distribution of p given information 𝑥 . Show that
under all-or-nothing loss the Bayes estimate of p is the mode of f(p|x).
(ii) Now suppose p is the proportion of the population carrying a particular
genetic condition. Prior beliefs about p have a U(0,1) distribution. A
sample of size N is taken from the population revealing that m individuals
have the genetic condition.
(a) Suggest why the U(0,1) distribution has been chosen as the prior,
and derive the posterior distribution of p.
(b) Calculate the Bayes estimate of p under all-or-nothing loss.
[UK Sept 2006]
12. The number, X of claims on a given insurance policy over one year has
probability distribution given by
$P(X = k) = \theta^k (1 - \theta)$, k = 0, 1, 2, …
where θ is an unknown parameter with 0 < θ < 1.
Independent observations x1, ..., xn are available for the number of claims in the
previous n years. Prior beliefs about θ are described by a distribution with
density:
$f(\theta) \propto \theta^{\alpha - 1} (1 - \theta)^{\alpha - 1}$
for some constant α > 0.
(i) (a) Derive the maximum likelihood estimate, θ̂, of θ given the data
x1, ..., xn.
(b) Derive the posterior distribution of θ given the data x1, ..., xn.
(c) Derive the Bayesian estimate of θ under quadratic loss and show
that it takes the form of a credibility estimate
$Z\hat{\theta} + (1 - Z)\mu$
where µ is a quantity you should specify from the prior distribution
of θ.
(d) Explain what happens to Z as the number of years of observed data
increases.
(ii) (a) Determine the variance of the prior distribution of θ.
(b) Explain the implication for the quality of prior information of
increasing the value of α. Give an interpretation of the prior
distribution in the special case α = 1.
(iii) Calculate the Bayesian estimate of 𝜃 under quadratic loss if n = 3, x1 = 3,
x2= 3, x3= 5 and
(a) α=5
(b) α = 2.
Comment on your results in the light of (ii) above. [UK April 2007]
13. The number of claims, N, in a year on a portfolio of insurance policies has a
Poisson distribution with parameter λ. Claims are either large (with probability
p) or small (with probability 1 − p) independently of one another.
Suppose we observe r large claims. Show that the conditional distribution of
N − r | r is Poisson and find its mean. [UK Sept 2007]
14. A claim amount distribution is normal with unknown mean µ and known
standard deviation £50. Based on past experience a suitable prior distribution for µ
is normal with mean £300 and standard deviation £20.
(i) Calculate the prior probability that µ, the mean of the claim amount
distribution, is less than £270.
(ii) A random sample of 10 current claims has a mean of £270.
(a) Determine the posterior distribution of µ.
(b) Calculate the posterior probability that µ is less than £270 and
comment on your answer. [UK April 2008]
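The normal/normal update in Question 14 can be sketched numerically as below. It assumes the usual precision-weighted result for a normal mean with known data variance; the function name `normal_posterior` and its argument names are illustrative.

```python
from statistics import NormalDist

def normal_posterior(prior_mean, prior_sd, data_sd, n, sample_mean):
    """Posterior of a normal mean with known data_sd and a normal prior."""
    precision = n / data_sd**2 + 1 / prior_sd**2
    mean = (n * sample_mean / data_sd**2 + prior_mean / prior_sd**2) / precision
    return NormalDist(mean, (1 / precision) ** 0.5)

prior = NormalDist(300, 20)
posterior = normal_posterior(300, 20, 50, n=10, sample_mean=270)

print(round(prior.cdf(270), 4))                              # prior P(mu < 270)
print(round(posterior.mean, 2), round(posterior.stdev, 2))   # posterior mean and sd
print(round(posterior.cdf(270), 4))                          # posterior P(mu < 270)
```

The posterior mean lies between the prior mean and the sample mean, so the posterior probability of µ falling below 270 rises relative to the prior but stays well under one half.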
Explain whether each suggestion is an appropriate choice for Z. [UK Sept 2008]
16. An insurance company provides warranties for a certain electrical gadget. At the
start of 2006 there were 4,500 gadgets under warranty, each of which has a
probability q of suffering complete failure in 2006 (independently between
gadgets). The prior distribution of q is beta with mean 0.015 and standard
deviation 0.005. Given that 58 gadgets suffer a complete failure in 2006,
determine the posterior distribution of q. [UK Sept 2008]
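Question 16 needs the beta parameters recovered from a stated mean and standard deviation before the conjugate update. A sketch, assuming the standard Beta(α, β) moments (mean α/(α+β), variance mean(1 − mean)/(α+β+1)); the helper name `beta_from_moments` is hypothetical.

```python
def beta_from_moments(mean, sd):
    """Solve for (alpha, beta) of a Beta distribution with the given mean and sd."""
    total = mean * (1 - mean) / sd**2 - 1      # this is alpha + beta
    return mean * total, (1 - mean) * total

alpha, beta = beta_from_moments(0.015, 0.005)
failures, gadgets = 58, 4500
alpha_post = alpha + failures                  # add observed failures
beta_post = beta + (gadgets - failures)        # add observed survivals
print(round(alpha, 2), round(beta, 2), round(alpha_post, 2), round(beta_post, 2))
```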
17. An insurer’s portfolio consists of three independent policies. Each policy can give
rise to at most one claim per month, which occurs with probability θ
20. An office worker receives a random number of emails each day. The number of
emails per day follows a Poisson distribution with unknown mean µ. Prior
beliefs about µ are specified by a gamma distribution with mean 50 and standard
deviation 15. The worker receives a total of 630 emails over a period of ten days.
Calculate the Bayesian estimate of µ under all or nothing loss. [UK Sept 2010]
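A sketch of the steps in Question 20, assuming the gamma prior is parameterised by shape α and rate λ (mean α/λ, variance α/λ²) and that the all-or-nothing estimate is the posterior mode (α − 1)/λ. The helper name is illustrative.

```python
def gamma_from_moments(mean, sd):
    """Shape alpha and rate lam of a Gamma distribution with the given mean and sd."""
    lam = mean / sd**2
    return mean * lam, lam

alpha, lam = gamma_from_moments(50, 15)
alpha_post, lam_post = alpha + 630, lam + 10     # add total emails and number of days
posterior_mode = (alpha_post - 1) / lam_post     # all-or-nothing estimate
print(round(alpha_post, 3), round(lam_post, 4), round(posterior_mode, 2))
```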
21. Let y1,..., yn be samples from a uniform distribution on the interval [0, θ] where
θ > 0 is an unknown constant. Prior beliefs about θ are given by a distribution
with density
$f(\theta) = \begin{cases} \alpha\beta^{\alpha}\theta^{-(1+\alpha)} & \theta > \beta \\ 0 & \text{otherwise} \end{cases}$
where α and β are positive constants.
(i) Show that the posterior distribution of θ given y1 is of the same form as
the prior distribution, specifying the parameters involved.
(ii) Write down the posterior distribution of θ given y1,..., yn [UK April 2011]
23. The total claim amount per annum on a particular insurance policy follows a
normal distribution with unknown mean θ and variance 200². Prior beliefs about
θ are described by a normal distribution with mean 600 and variance 50². Claim
amounts x1, x2, …, xn are observed over n years.
(i) State the posterior distribution of θ.
(ii) Show that the mean of the posterior distribution of θ can be written in the
form of a credibility estimate.
Now suppose that n = 5 and that total claims over the five years were 3,400.
(iii) Calculate the posterior probability that θ is greater than 600. [UK April 2012]
24. A proportion p of packets of a rather dull breakfast cereal contain an exciting toy
(independently from packet to packet). An actuary has been persuaded by his
children to begin buying packets of this cereal. His prior beliefs about p before
opening any packets are given by a uniform distribution on the interval [0,1]. It
turns out the first toy is found in the n1th packet of cereal.
(i) Specify the posterior distribution of p after the first toy is found.
A further toy was found after opening another n2 packets, another toy after
opening another n3 packets and so on until the fifth toy was found after opening
a grand total of n1 + n2 + n3 + n4 + n5 packets.
(ii) Specify the posterior distribution of p after the fifth toy is found.
(iii) Show the Bayes’ estimate of p under quadratic loss is not the same as the
maximum likelihood estimate and comment on this result.
[UK April 2012]
25. The number of claims arising in a year from a group of policies follows a Poisson
distribution with mean µ. The prior distribution for µ is gamma with parameters
α=10 and λ = 2. Given that 8 claims arose in the last year, determine the posterior
distribution for µ.
26. The number of claims per month are independent Poisson random variables with
mean λ, and the prior distribution for λ is exponential with mean 0.2.
(i) Determine the posterior distribution for λ given the observed values
x1, ……., xn of the number of claims in each of n months.
(ii) Determine the Bayesian estimator of λ.
(a) under quadratic loss
(b) under “all-or-nothing” loss
(iii) If n = 5 and $\sum_{i=1}^{5} x_i = 1$, calculate to 2 significant figures the Bayesian
estimate of λ under absolute error loss.
27. The number of claims registered per week has a Poisson distribution for which
the mean λ, is either 1 or 2. The prior distribution for λ is given by:
P(λ=1) = 0.4 P(λ=2) = 0.6
Given that three claims are registered in a particular week, calculate
the Bayesian estimate of λ under squared error loss, and under zero-one loss.
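With a two-point prior, as in Question 27, the posterior is just the prior reweighted by the Poisson likelihood. The sketch below assumes squared error loss gives the posterior mean and zero-one loss gives the posterior mode; variable names are illustrative.

```python
import math

prior = {1: 0.4, 2: 0.6}
x = 3  # observed claims in the week

def poisson_pmf(k, lam):
    """P(N = k) for N ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

weights = {lam: p * poisson_pmf(x, lam) for lam, p in prior.items()}
total = sum(weights.values())
posterior = {lam: w / total for lam, w in weights.items()}

mean_estimate = sum(lam * p for lam, p in posterior.items())   # squared error loss
mode_estimate = max(posterior, key=posterior.get)              # zero-one loss
print(posterior, round(mean_estimate, 4), mode_estimate)
```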
28. The number of claims arising each month from a general insurance portfolio has
a Poisson distribution, with unknown Poisson parameter λ. Claims are
monitored over a period of 50 months, and an average of 210 claims per month is
observed.
(i) It is suggested, based on knowledge gained from similar portfolios,
that a suitable prior distribution for λ has mean 250 and variance 45.
Using the conjugate prior distribution, determine the posterior
distribution of λ and the Bayesian estimate of λ under quadratic loss.
(ii) An alternative suggestion for estimating λ is to use the number of
claims occurring on a single day, which is assumed to have a Poisson
distribution with mean λ/30. It is suggested that the following prior
distribution for λ should be used:
P(λ = 230) = 0.2, P(λ = 250) = 0.5 and P(λ = 270) = 0.3
If 7 claims were recorded on the most recent day for which data are
available, determine the posterior distribution for λ, and hence find the
Bayesian estimate of λ under quadratic loss.
29. An actuary has a tendency to be late for work. If he gets up late then he arrives at
work X minutes late where X is exponentially distributed with mean 15. If he
gets up on time then he arrives at work Y minutes late where Y is uniformly
distributed on [0, 25]. The office manager believes that the actuary gets up late
one third of the time.
Calculate the posterior probability that the actuary did in fact get up late given
that he arrives more than 20 minutes late at work. [UK April 2013]
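Question 29 is a direct application of Bayes' theorem with two hypotheses. A sketch, assuming "exponentially distributed with mean 15" means P(X > t) = exp(−t/15), and that the uniform lateness gives P(Y > 20) = 5/25:

```python
import math

p_late = 1 / 3                               # manager's prior that he got up late
p_over_20_given_late = math.exp(-20 / 15)    # X ~ exponential with mean 15
p_over_20_given_on_time = (25 - 20) / 25     # Y ~ U[0, 25]

numerator = p_late * p_over_20_given_late
evidence = numerator + (1 - p_late) * p_over_20_given_on_time
posterior_late = numerator / evidence
print(round(posterior_late, 4))
```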
31. The heights of adult males in a certain population are Normally
distributed with unknown mean µ and standard deviation σ = 15. Prior
beliefs about µ are described by a Normal distribution with mean 187 and
standard deviation 10.
32. Let θ denote the proportion of insurance policies in a certain portfolio on which a
claim is made. Prior beliefs about θ are described by a Beta distribution with
parameters α and β.
Underwriters are able to estimate the mean µ and variance σ² of θ.
(i) Express α and β in terms of µ and σ.
A random sample of n policies is taken and it is observed that claims have arisen
on d of them.
(ii) (a) Determine the posterior distribution of θ.
(b) Show that the mean of the posterior distribution can be written in
form of a credibility estimate.
(iii) Show that the credibility factor increases as σ increases. [UK April 2014]
33. A direct insurance sales office of a motor insurance company receives a random
number of enquiry calls every day. The number of calls each day follows a
Poisson distribution with unknown mean β.
Prior beliefs about β are specified by a Gamma distribution with mean of 200 and
standard deviation of 50. The sales team has received 240 calls daily on average
recently. Calculate the Bayesian estimate of β under Quadratic loss.
[India May 2014]
34. There is a group of m independent ‘mediclaim’ policies, which have been on the books
of an insurer for a long time. Under each policy, at most one claim is
possible in any month as per the contract. The probability of a claim in a month
for each policy is p (0 < p < 1). The total monthly numbers of claims from the
group of m policies are x1, x2, …, xn in the past n months. The prior
distribution of p is given by the density function f(p) ∝ {p(1 − p)}^a where a > −1.
(i) Derive the posterior distribution of p given x1, x2, ….., xn.
(ii) Derive the maximum likelihood estimate of p.
(iii) Derive the Bayesian estimate of p under quadratic loss and show it takes
the form of a credibility estimate Zp̂ + (1 − Z)k, where k is a scalar (which
you should specify) in terms of the prior distribution of p.
(iv) Explain what happens to Z when n increases gradually.
(v) Calculate the Bayesian estimates of p and Z if m = 100, n = 12 and
x1+ x2 + ….. + x12 = 15 when a = 0 and a = 3.
(vi) Considering the prior variance, comment on effect on Z of increasing a
and also relate this effect to the quality of prior information of p in each
case. [India Sept 2013]
35. Let p be an unknown parameter and let f(p|x) be the probability density of the
posterior distribution of p given information x.
(i) Show that under all-or-nothing loss the Bayes estimate of p is the mode of
f(p|x).
John is setting up an insurance company to insure luxury yachts. In year 1 he will
insure 100 yachts and in year 2 he will insure 100 + g yachts where g is an integer.
If there is a claim the insurance company pays a fixed sum of $1m per claim.
The probability of a claim on a policy in a given year is p. You may assume that
the probability of more than one claim on a policy in any given year is zero. Prior
beliefs about p are described by a Beta distribution with parameters α = 2 and
β = 8.
In year 1 total claims are $13m and in year 2 they are $20m.
(ii) Derive the posterior distribution of p in terms of g.
(iii) Show that it is not possible in this case for the Bayes estimate of p to be the
same under quadratic loss and all-or-nothing loss. [UK April 2015]
36. A small island is holding a vote on independence. Two recent survey results are
shown below:
Poll   Sample size   Support for independence
A      10            5
B      20            11
You should assume that the samples are independent.
A politician is using a suitable uniform distribution as the prior distribution in
order to estimate the proportion θ in favour of independence.
(i) Calculate an estimate of θ under the quadratic loss function.
A rival politician decides to use instead a beta distribution as the prior, with
parameters α and β, where α = β.
(ii) Determine the new estimate of θ under the “all-or-nothing” loss function
in terms of α. [UK Sept 2015]
37. Claims X each year from a portfolio of insurance policies are normally distributed
with mean θ and variance τ². Prior information is that θ is normally distributed
with known mean µ and known variance σ².
Aggregate claims over the last n years have been xi for i = 1 to n, and you should
assume that these are independent.
(i) Derive the posterior distribution of θ.
(ii) Write down the Bayesian estimate of θ under quadratic loss.
(iii) Show that the estimate in your answer to part (ii) can be expressed in the
form of a credibility estimate, including statement of the credibility factor
Z. [UK Sept 2015]
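An outline of the derivation Question 37 asks for, written as a sketch under the assumptions stated in the question (normal likelihood with known variance τ², normal prior with mean µ and variance σ², and x̄ denoting the sample mean of the n observations):

```latex
\begin{aligned}
f(\theta \mid x_1,\dots,x_n)
  &\propto \exp\!\left\{-\frac{(\theta-\mu)^2}{2\sigma^2}\right\}
     \prod_{i=1}^{n}\exp\!\left\{-\frac{(x_i-\theta)^2}{2\tau^2}\right\} \\
  &\propto \exp\!\left\{-\frac{1}{2}\left(\frac{1}{\sigma^2}+\frac{n}{\tau^2}\right)\theta^2
     + \left(\frac{\mu}{\sigma^2}+\frac{n\bar{x}}{\tau^2}\right)\theta\right\}, \\[4pt]
\theta \mid x &\sim N\!\left(
     \frac{\mu/\sigma^2 + n\bar{x}/\tau^2}{1/\sigma^2 + n/\tau^2},\;
     \frac{1}{1/\sigma^2 + n/\tau^2}\right), \\[4pt]
E[\theta \mid x] &= Z\,\bar{x} + (1-Z)\,\mu,
\qquad Z = \frac{n/\tau^2}{n/\tau^2 + 1/\sigma^2} = \frac{n}{n + \tau^2/\sigma^2}.
\end{aligned}
```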
38. A child playing a game believes that a six sided die is unfair, and that he has a
probability p > 1/6 of predicting the outcome of any given throw. His mother is
less sure, and her prior beliefs about p are as follows:
• a 1/3 chance that p = 2/6 and
• a 2/3 chance that p = 1/6
The child accurately predicts the results of 4 out of 10 dice throws. Calculate the
posterior probability that p = 1/6. [UK April 2016]
39. A breathalyser used by the police in a certain town incorrectly gives a positive
reading for drivers who are not over the legal limit one time in 20 and an
incorrect negative reading for drivers who are over the limit one time in 5. If one
driver in 10 is actually over the limit on a particular day, what is the probability
that a driver who fails the breathalyser test is in fact over the legal limit (which
will be checked using a blood test at the police station)?
40. A single observation, x, is drawn from a distribution with the probability density
function:
f(x|θ) = 1/θ, 0 < x < θ
The prior distribution of θ is given by: g(θ) = θ exp(−θ), θ > 0
Derive an expression in terms of x for the Bayes estimator of θ with respect to the
absolute error loss function.
41. The number of claims in a week that arises from a certain group of insurance
policies has a Poi(µ) distribution. In the last 2 weeks, the numbers of claims
incurred were 7 and 11, respectively.
(i) Derive the posterior distribution given that the prior distribution for µ is:
(a) gamma with parameters α = 7 and λ = 0.5
(b) uniform on the integers 8, 10 and 12.
(ii) Hence for each case in part (i) obtain the Bayesian estimate of µ using a:
(a) quadratic loss function
(b) absolute loss function
(c) zero-one loss function.
42. At the end of last year, a new laptop manufacturer approached an insurance
company for providing insurance for laptops sold during the first year of its
operation.
Based on existing insurance contracts in its portfolio, the insurer estimated the
probability of failure over the coming year for each laptop to be “p”. After a year
the insurer observes that 92 laptops suffer from complete failure during the year
of insurance out of 9000 laptops insured.
Assuming that the prior distribution of p is beta with mean 0.013 and standard
deviation 0.004 find out the posterior distribution of p. [India April 2016]
43. The number of claims per policy per year follows Poisson distribution with
unknown parameter µ. Prior beliefs that µ follows Gamma distribution with
parameters α and λ. Number of policies sold, individual claim amounts and
average total claim amount per policy is summarized in the table given below.
(i) Using posterior distribution of µ, derive E(A4 |a1, a2, a3) in a credibility
estimate form as given below:
$2500 \times \frac{\alpha}{\lambda} \times (1 - Z) + b \times Z$, specifying b and Z.
(ii) Comment on Z. [India Nov 2015]
44. Profit Insurance Company sold 3500 policies under a particular category of
business last year. The policies are assumed to be independent and at most one
claim can be made on any policy. The probability of making a claim q is the same
for all policies.
The total number of claims in the previous year was found to be p.
The prior distribution of q is Beta (α, β)
(i) Find maximum likelihood estimate of q and posterior distribution of q
given the past year data.
(ii) Find the Bayesian estimate of the posterior distribution under quadratic
loss function. If p = 500, α =1 and β = 4, find the value of the Bayesian
estimate.
(iii) Can the Bayesian estimate of q be written in the form of a credibility
estimate? If yes, express the same in the form of a credibility estimate and
compute the credibility factor for the above values mentioned in part (ii).
[India May 2015]
ANSWERS
!
1. (i) (a) Gamma(1+ 𝑋! , 𝑛 + !)
!
(b) Z = !
!!
!
(c) 𝜆 = 54.99
!
(ii) (a) P(T>M) = 𝑒 !! ×0.6 + (!!!)! ×0.4
2. Gamma(22,26)
!! !!! !
6. (i) λ̂ = !
(ii)(a) Gamma(n1+n2+1, 3+v) (b) Z=
!!!
!
7. d= !!
8. (iv) n > 74
!! ! !! !
! ! ! !
!! ! !! ! !!
! !!
9. (i) 𝜃|𝑋~( ! ! ,
! ! ) (ii) 𝜃 = ! !
! ! !! ! ! !
!! ! !! !! !! !!
! !!
!
!!!
(iii) Z= ! ! (iv) 437.69
!
!!
! !!
!
(v) As $\sigma_1^2$ increases, Z decreases, and as $\sigma_2^2$ increases, Z increases.
!! !! ! !
! !
10. (i) Poi((n1+n2)𝜃) (ii) Gamma(𝛼 + !
+ !(!!!) , 𝜆 + 𝑛! + 𝑛! )
! !!! (!!!)! !! !! !(!!!)!! !!
(iii) 𝑍 = !!!! and 𝑘 =
! !!! !! !!!
(iv) k is effectively a weighted average of the inflation adjusted average claim
amounts for the previous 2 years, weighted by the number of policies in force. As
the number of policies in force increases, Z becomes closer to 1, and so more
weight is placed on the actual experience, and less on the prior expectations.
11. (ii) (a) Using U(0, 1) as the prior for p suggests that no prior information or
beliefs about p have been formed — it is equally likely to lie anywhere in the
range [0, 1]. So posterior beliefs about p have a Beta distribution with parameters
m + 1 and N − m + 1.
(b) p = m/N
! !!! !⋯!!
12. (i) (a) 𝜃 = !!!!
! !!! !⋯!!
(b) The posterior distribution is given by a beta distribution with parameters
α + x1 +…+ xn and α + n.
!! !! !! !!
(c) 𝜃 ∗ = !!! !! !!
𝑍 = !!! !! !!
𝜇 = 0.5 is the prior mean of 𝜃.
(d) As n increases, Z tends towards 1, and the Bayes estimate approaches the
maximum likelihood estimate, as more credibility is put on the data, and less
on the prior estimate.
(ii)(a) Var = $\frac{1}{4(2\alpha + 1)}$
(b) Higher values of α result in a lower variance and hence imply greater
certainty over the prior value of θ. In the special case where α =1 the prior
distribution is Uniform on [0,1] implying that we have no particular reason to
believe that any prior value of θ is more or less likely than any other.
(iii) (a) 𝜃 ∗ = 0.6667 (b) 𝜃 ∗ = 0.7222
The first set of parameters has greater certainty attached to the prior estimate (i.e.
a higher value of α ), and therefore the posterior estimate is closer to the mean of
the prior distribution (which is 0.5) than in the second case.
18. (i) A distribution is a conjugate prior for an unknown parameter if when used as
a prior distribution for that parameter it leads to a posterior
distribution which is from the same family.
(ii) Beta (𝛼 + 𝑘, 𝛽 + 𝑛 − 𝑘)
!!!!!
(iii) 𝑑 ∗ = !!!!!!!
!
(iv) 𝑍 = !!!!!!!
(v) Using the given loss function the estimate is 0.26666 and using Bayesian loss,
we have 0.3125. The mean of the prior is 0.5 and the observed sample mean is 0.2.
The loss function in (iv) penalises mis-estimates particularly when the true value
of p is lower. This means that the estimate in (iv) is lower than would result from
straight quadratic loss.
19. The posterior distribution of p is given by P(p = 0.4) = 0.411436 and P(p = 0.75) = 0.588564
20. 62.62
21. (ii) The posterior distribution has the same form with parameters α + n and
max(β, y1, …, yn).
22. 0.364557
23. (iii) 0.669
24. (i) Beta (2, n1) (ii) Beta (6, n1+n2+…+n5 - 4)
(iii) Under squared error loss the Bayes estimate is given by the mean of the
posterior distribution, which in this case is $\hat{p} = \frac{6}{n_1 + \dots + n_5 + 2}$,
and the maximum likelihood estimate is $\hat{p} = \frac{5}{n_1 + \dots + n_5}$.
So the two estimates are not the same. This is perhaps a little surprising given
that we started with an uninformative prior, but arises because the estimates are
calculated in two different ways – i.e. one maximises the likelihood and the other
minimises the expected squared error. If we wanted the two to be the same we
should use an “all-or-nothing” loss function.
25. Gamma (18, 3)
!! !! !
26. (i) Gamma (1+ 𝑥! , 𝑛 + 5) (ii) (a) !!!
(b) !!!! (iii) 0.17
27. 1.815
28. (i) Gamma (11888.89, 55.5556) and Bayesian estimate is 214
(ii)
29. 0.3972
!!!!! !
30. 𝑝 = !!!!!!! and 𝑍 = !!!!!!!
33. 237.03
!!
34. (i) Beta ( 𝑥! + 𝑏, 𝑚𝑛 + 𝑏 − 𝑥! ) (ii) 𝑝 = !"
!! ! !"
(iii) Bayesian estimate = !!!!"! , k = 0.5 and 𝑍 = !!!!"
(iv) When n increases, Z increases and for very large values of n, for a given b, Z
tends to 1. It means for a given b, as the size of past observations increases, more
and more weight is assigned to M.L.E of p and lesser weight is assigned to prior
estimates of p.
(v) When a = 0, b = 1, so p* = 16/1202 = 8/601 and Z = 1200/1202.
When a = 3, b = 4, so p* = 19/1208 ≈ 0.0157 and Z = 1200/1208.
(vi) When a = 0, b = 1, so the prior variance = 1/(2 × 2 × 3) = 1/12.
When a = 3, b = 4, so the prior variance = (4 × 4)/(8 × 8 × 9) = 1/36.
So, as a increases, the prior variance of p decreases. Although the prior mean remains
0.5, with a higher value of a we are more confident that p is close to 0.5.
!"!!
36. (i) 0.53125 (ii) !"!!!
37. (i)
!
(ii) (iii) 𝑍 = !!
!! !
!
38. 0.323
39. 0.64
40. The Bayes estimator of θ with respect to absolute error loss is x + log 2
41. (i) (a) Gamma (25, 2.5) (b)
µ 8 10 12
Post prob 0.398 0.4047 0.1973
(ii) (a) 10 and 9.599 (b) 9.866 and 10 (c) 9.6 and 10
42. Beta(102.41, 9698.52)
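As a rough check of answer 42, the sketch below matches a beta prior to the stated mean and standard deviation and then applies the usual beta-binomial update. The helper name `beta_from_moments` is hypothetical, and the result agrees with the quoted parameters only up to rounding.

```python
def beta_from_moments(mean, sd):
    """Return (a, b) of the Beta(a, b) distribution with the given mean and sd."""
    # For Beta(a, b): mean = a/(a+b) and var = mean*(1-mean)/(a+b+1).
    total = mean * (1 - mean) / sd**2 - 1  # this is a + b
    return mean * total, (1 - mean) * total

# Prior from question 42: mean 0.013, standard deviation 0.004.
prior_a, prior_b = beta_from_moments(0.013, 0.004)

# Data from question 42: 92 failures out of 9000 insured laptops.
failures, insured = 92, 9000

# Conjugate update: add failures to a and non-failures to b.
post_a = prior_a + failures
post_b = prior_b + (insured - failures)

print(post_a, post_b)
```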
43. (ii) If the number of policies (i.e. the past experience) increases then Z becomes
closer to 1 and hence more weight is placed on actual experience and less on the
prior expectations.
44. (i) MLE: $\hat{q} = \frac{p}{3500}$; posterior distribution: Beta(p + α, β − p + 3500)
(ii) 0.1429
(iii) $Z \times \frac{p}{3500} + (1 - Z) \times \frac{\alpha}{\alpha + \beta}$, where $Z = \frac{3500}{\alpha + \beta + 3500} = 0.998573$
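The figures in answer 44 can be reproduced with a few lines of arithmetic. This is a minimal sketch using the values from part (ii) of the question (p = 500, α = 1, β = 4, 3500 policies); the variable names are illustrative only.

```python
n_policies, n_claims = 3500, 500   # data from question 44
alpha, beta = 1, 4                 # prior Beta(alpha, beta)

# Posterior is Beta(p + alpha, beta - p + 3500).
post_a = alpha + n_claims
post_b = beta + n_policies - n_claims

# Under quadratic loss the Bayesian estimate is the posterior mean.
bayes_estimate = post_a / (post_a + post_b)

# Credibility form: Z * MLE + (1 - Z) * prior mean.
mle = n_claims / n_policies
prior_mean = alpha / (alpha + beta)
Z = n_policies / (alpha + beta + n_policies)
credibility_estimate = Z * mle + (1 - Z) * prior_mean

print(bayes_estimate, Z, credibility_estimate)
```

The two estimates coincide, which is what part (iii) asks the reader to show algebraically.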
ASSIGNMENT – 12
GENERALISED LINEAR MODELS (G.L.M.)
1. Let Yij be the number of accidents on a particular motorway in the jth quarter of
year i, i = 1, 2, 3, j = 1,…..4. Suppose that Yij has a Poisson distribution with
mean µij.
(i) (a) Derive the log-likelihood function as a function of µij and determine
the maximum likelihood estimate of µij.
(b) If log µij = µ, determine the maximum likelihood estimate of µ.
(c) Define the scaled deviance and derive an expression for the scaled
deviance for the model in (i)(b).
(ii) Three models are shown below
(iii) It is found that the model log µij = α×i + βj provides a reasonable fit to the
data, with the estimate of α given as 0.34. Interpret this model.
[UK April 2004]
2. The preparation times for coffee in a high-street coffee shop have density
$f(y) = \frac{4}{\mu^2}\, y\, e^{-2y/\mu}$
(i) Show that this can be written in exponential family form, and determine
the natural parameter.
(ii) Interpret the two models:
Model I: $\frac{1}{\mu_i} = \alpha_i$, i = 1, 2, 3
Model II: $\frac{1}{\mu_i} = \alpha$ for i = 1, and $\frac{1}{\mu_i} = \alpha + \beta$ for i = 2, 3
where i = 1, 2, 3 corresponds to filter coffee, cappuccino and espresso
respectively. [UK Sept 2004]
3. Y1, Y2,…… Yn are independent claims, which are assumed to be exponentially
distributed, with E(Yi) = µi
(i) Show that the canonical link function is the inverse link function.
(ii) It is decided that the canonical link should not be used, but that the mean
claim sizes should be modeled as follows:
$\log \mu_i = \begin{cases} \alpha & i = 1, 2, \dots, m \\ \beta & i = m+1, m+2, \dots, n \end{cases}$
(a) Show that the log-likelihood can be written as:
$-\left\{ m\alpha + (n-m)\beta + e^{-\alpha} \sum_{i=1}^{m} y_i + e^{-\beta} \sum_{i=m+1}^{n} y_i \right\}$
(b) Derive the maximum likelihood estimators of α and β.
(c) Show that the scaled deviance for this model is
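$2\left\{ \sum_{i=1}^{m} \log\left( \frac{\frac{1}{m}\sum_{k=1}^{m} y_k}{y_i} \right) + \sum_{i=m+1}^{n} \log\left( \frac{\frac{1}{n-m}\sum_{k=m+1}^{n} y_k}{y_i} \right) \right\}$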
(iii) A company has data for each month over a 2 year period. For one risk, the
average number of claims per month was 17.45. In the most recent month
of this risk, there were 9 claims. Calculate the contribution that this
observation makes to the deviance. [UK April 2006]
7. (i) The gamma distribution with mean µ and variance µ²/α has density
function:
$f(y) = \frac{\alpha^{\alpha}}{\mu^{\alpha}\,\Gamma(\alpha)}\, y^{\alpha-1} e^{-\alpha y/\mu}$, y > 0
(a) Show that this may be written in the form of an exponential family.
(b) Use the properties of exponential families to confirm that the mean and
variance of the distribution are µ and µ²/α.
(ii) Explain the difference between a continuous covariate and a factor.
(iii) A company is analyzing its claims data on a portfolio of motor
policies and uses a gamma distribution to model the claim severities.
The company uses three rating factors:
• Policyholder age (as a continuous variable)
• Policyholder gender
• Vehicle rating group (as a factor)
(a) Write down the form of the linear predictor when all rating factors
are included as main effects.
(b) State how the linear predictor changes if an interaction between
policyholder age and gender is included. [UK April 2007]
(ii) The fitted value for observation yi is denoted by $\hat{y}_i$.
(a) Write down the Pearson residual for yi, in terms of yi and $\hat{y}_i$.
(b) Explain why Pearson residuals are usually not suitable for model
checking for the Poisson distribution.
(iii) Show that the conjugate prior density function for θ is proportional to
$\exp(\alpha\theta - \beta e^{\theta})$, and derive the posterior distribution for this prior.
(iv) Use the identity $E\left[\frac{d \log f}{d\theta}\right] = 0$ for any density function f to show that
$E[b'(\theta)] = \frac{\alpha}{\beta}$ and $E[b'(\theta) \mid y_1, y_2, \dots, y_n] = \frac{\alpha + \sum_{i=1}^{n} y_i}{\beta + n}$, and comment on these
results. [UK Sept 2007]
10. (i) Express the probability density function of the gamma distribution in the
form of a member of the exponential family of distributions. Specify the
natural and scale parameters.
(ii) State the corresponding canonical link function for generalized linear
modeling if the response variable has a gamma distribution.
[UK April 2009]
(i) Write down the likelihood function and obtain the maximum likelihood
estimate for the parameters θij.
(ii) Show that P(Yij = y) can be written in exponential family form and suggest
its natural parameter.
(iii) Suppose that θij depends on the temperature xj recorded in the jth month.
Explain why it is not appropriate to set θij = α + βxj. Suggest another
relationship between θij and α + βxj that might be used. [UK Sept 2009]
15. An insurance company covers pedigree cats against the costs of medical
treatment. The cost of claims from a policy in a year is assumed to have a
normal distribution with mean µ (which varies from policy to policy) and
known variance 25². It is assumed that µ = α + βx, where α and β are fixed
constants and x is the age of the cat. You are given the following data for the
pairs (yi, xi) for i = 1, 2, …, 50, where yi is the cost of claims last year for the ith
policy and xi is the age of the corresponding cat.
!" !" !" !"
16. (i) Define what it means for a random variable to belong to an exponential
family.
(ii) Show that if a random variable has the exponential distribution it belongs
to an exponential family. [UK April 2012]
17. The numbers of claims on three different classes of insurance policies over the
last four years are given in the table below:
Year 1 Year 2 Year 3 Year 4 Total
Class 1 1 4 5 0 10
Class 2 1 6 4 6 17
Class 3 5 6 4 6 24
(i) Explain why this distribution belongs to an exponential family.
(ii) State the three main components that need to be taken into account when
constructing a generalized linear model.
(iii) Suggest a natural choice of link function if the response variable followed
the distribution defined above
(iv) Suggest a natural choice of link function if instead the response variable
followed a lognormal distribution. [UK Sept 2012]
19. An insurance company believes that individual claim amounts from house
insurance policies follow a gamma distribution with density function given
by:
$f(y) = \frac{\alpha^{\alpha}}{\mu^{\alpha}\,\Gamma(\alpha)}\, y^{\alpha-1} e^{-\alpha y/\mu}$ for y > 0
where α and µ are positive parameters.
(i) Show that the gamma distribution can be written in exponential family
form, giving the natural parameter and the canonical link function.
The insurance company has data for claim amounts from previous claims. It
believes that the claim amount is primarily influenced by two variables.
• xi the type of geographical area in which the house is situated. This can
take one of 4 values.
• yi the category of the age of the house where the three categories are
0-29 years, 30-59 years and 60 years+.
It wishes to model claim amounts using this data and the generalized linear
model from part (i) with its canonical link function. The insurance company is
investigating models, which take into account these variables, and has the
following table of values.
A 1 900
B Age 789
(ii) Explain, by analyzing the scaled deviances, which model the insurance
company should use. [UK April 2013]
20. The number of claims per month Y arising on a certain portfolio of insurance
policies is to be modeled using a modified geometric distribution with
probability function given by:
$p(y \mid \alpha) = \frac{\alpha^{y-1}}{(1+\alpha)^{y}}$, y = 1, 2, 3, …
where α is an unknown positive parameter. The most recent four months have
resulted in claim numbers of 8, 6, 10 and 9.
(i) Derive the maximum likelihood estimate of α.
(ii) Show that Y belongs to an exponential family of distributions and suggest
its natural parameter. [UK Sept 2013]
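For part (i) of question 20, a small numerical sketch may help. It assumes the probability function is α^(y−1)/(1+α)^y, so the log-likelihood is (Σy − n) log α − Σy log(1 + α); setting its derivative to zero gives the closed form ȳ − 1, which the code compares against nearby values. The function name `log_lik` is illustrative.

```python
import math

claims = [8, 6, 10, 9]  # claim numbers from question 20

def log_lik(alpha, ys):
    """Log-likelihood for p(y | alpha) = alpha**(y - 1) / (1 + alpha)**y."""
    return sum((y - 1) * math.log(alpha) - y * math.log(1 + alpha) for y in ys)

# Stationary point of the log-likelihood: alpha_hat = mean(y) - 1.
alpha_hat = sum(claims) / len(claims) - 1

# Sanity check: the log-likelihood is lower on either side of alpha_hat.
lower = log_lik(alpha_hat - 0.25, claims)
upper = log_lik(alpha_hat + 0.25, claims)
at_hat = log_lik(alpha_hat, claims)

print(alpha_hat, at_hat > lower and at_hat > upper)
```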
21. For a certain portfolio of insurance policies the number of claims on the ith
policy in the jth year of cover is denoted by Yij. The distribution of Yij is given by:
$P(Y_{ij} = y) = \theta_{ij}(1 - \theta_{ij})^{y}$, y = 0, 1, 2, …
(i) Derive the maximum likelihood estimate of θij given the single observed
data point yij.
(ii) Write P(Yij = y) in exponential family form and specify the parameters.
(iii) Describe the different characteristics of Pearson and deviance residuals.
[UK April 2014]
22. Annual numbers of claims on three different types of insurance policy follow a
Poisson distribution with parameter µi for i = 1, 2, 3. Data for the last four years
is given in the table below.
Type Year 1 Year 2 Year 3 Year 4 Total
1 5 5 0 1 11
2 2 5 4 5 16
3 5 6 4 5 20
(i) Derive the maximum likelihood estimate of µ1 and calculate the
corresponding estimates of µ2 and µ3.
(ii) Test the hypothesis that µ1, µ2 and µ3 are equal using the scaled deviance.
[UK April 2015]
23. (i) Explain what is meant by a saturated model.
(ii) State the definition of the scaled deviance in a fitting under generalised
linear modelling.
(iii) (a) Define both Pearson and deviance residuals.
(b) Explain how these two types of residuals are generally different.
(c) State in which case they are the same. [UK Sept 2015]
24. (i) State the general expression of the exponential families of distributions
and use this to derive the relevant expressions for the mean and the
variance of these distributions.
(ii) Extend the result in (i) to obtain an expression for the third central
moment.
(iii) Show that the following density function belongs to the exponential
family of distributions:
$f(y) = \frac{\alpha^{\alpha}}{\mu^{\alpha}\,\Gamma(\alpha)}\, y^{\alpha-1} e^{-\alpha y/\mu}$ for y > 0
(iv) Using the results in (i) and (ii) obtain the second and third central
moments for this distribution. [UK April 2016]
25. A small insurer wishes to model its claim costs for motor insurance using a
simple generalised linear model based on the three factors:
The insurer is considering three possible models for the linear predictor:
Model 1: YO + FS + TC
Model 2: YO + FS + YO.FS + TC
Model 3: YO*FS*TC
(i) Write each of these models in parameterised form, stating how many non-
zero parameter values are present in each model.
(ii) Explain why Model 1 might not be appropriate and why the insurer may
wish to avoid using Model 3.
(iii) The student fitting the models has said “We are assuming a normal error
structure and we are using the canonical link function.” Explain what this
means.
(iv) The table below shows the student’s calculated values of the deviance for
these three models and the constant model.
26. The following study was carried out into the mortality of leukaemia sufferers. A
white blood cell count was taken from each of 17 patients and their survival
times were recorded.
Suppose that Yi represents the survival time (in weeks) of the ith patient and xi
represents the logarithm (to the base 10) of the ith patient’s initial white blood
cell count (i = 1, 2, …, 17).
The response variables Yi are assumed to be exponentially distributed. A
possible specification for E(Yi ) is E(Yi ) = exp(α + βxi ) . This will ensure that
E(Yi ) is nonnegative for all values of xi .
(i) Write down the natural link function associated with the linear predictor
ηi = α + βxi .
(ii) Use this link function and linear predictor to derive the equations that
must be solved in order to obtain the maximum likelihood estimates of α
and β.
(iii) Given that the maximum likelihood estimate of α derived from the
experimental data is $\hat{\alpha} = 8.477$ and se($\hat{\alpha}$) = 1.655, obtain an approximate
95% confidence interval for α and interpret this result.
(iv) The following two models are now to be compared:
Model 1: E(Yi ) = α
Model 2: E(Yi ) = α + βxi
The deviance for Model 1 is found to be 26.282 and the deviance for Model
2 is 19.457. Test the null hypothesis that β = 0 against the alternative
hypothesis that β≠0 stating your conclusion clearly.
SA+PT 206.7
SA+PT+SA.PT 178.3
SA*PT+NB 166.2
ANSWERS
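1. (i) (a) $\ln L(\mu_{ij}) = \sum_i \sum_j \left( -\mu_{ij} + y_{ij} \ln \mu_{ij} \right) + \text{constant}$ and $\hat{\mu}_{ij} = y_{ij}$
(b) $\hat{\mu} = \ln\left( \frac{\sum_i \sum_j y_{ij}}{12} \right)$
(c) scaled deviance $= 2 \sum_i \sum_j \left[ y_{ij} \ln y_{ij} - y_{ij} \ln\left( \frac{\sum_i \sum_j y_{ij}}{12} \right) \right]$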
(b) Between model 1 and model 2, the deviance drops by 64.16 but the
difference in the degrees of freedom is only 2, so model 2 is better
than model 1. Between model 2 and model 3, the deviance drops by
191.51 but the difference in the degrees of freedom is only 3, so model
3 is better than model 2.
(iii) The number of accidents on the motorway has a Poisson distribution. The
log of the mean of this distribution has a linear trend from year to year, so
that the annual increase in the log is the same from one year to the next.
However, within each year there is a seasonal variation effect which is
unrestricted by the model.
2. (i) $\theta = -\frac{1}{\mu}$, $b(\theta) = \ln \mu = \ln\left(-\frac{1}{\theta}\right)$,
$\phi = 2$, $a(\phi) = \frac{1}{\phi}$, $c(y, \phi) = \ln y + \ln 4$
(ii) Interpretation
Model 1. This model indicates that the mean preparation time for filter coffee is
constant. It is also a fixed value for the other two types of coffee but all three
means are different from each other.
Model 2. This model indicates that there is a constant mean preparation time for
each type of coffee, but that the mean preparation times for cappuccino and
espresso are the same, whereas the mean preparation time for filter is (possibly)
different.
3. (i) We see that the natural parameter θ is equal to −1/µ. So the canonical link function is
$g(\mu) = \frac{1}{\mu}$, the inverse function. The minus sign can be absorbed into the linear
predictor, and so is not needed as part of the link function.
(ii)(b) $\hat{\alpha} = \log\left( \frac{1}{m} \sum_{i=1}^{m} y_i \right)$ and $\hat{\beta} = \log\left( \frac{1}{n-m} \sum_{i=m+1}^{n} y_i \right)$
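$2\left( -\ln y_i - 1 + \hat{\alpha} + \frac{y_i}{e^{\hat{\alpha}}} \right) = 2\left( -\ln 7 - 1 + \ln 14.2 + \frac{7}{14.2} \right) = 0.4006$
$2\left\{ 9 \log\frac{9}{17.45} - (9 - 17.45) \right\} = 4.982$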
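The two deviance contributions above can be checked numerically. The sketch below uses the standard single-observation deviance formulas for the exponential and Poisson distributions; the function names are hypothetical and the inputs (y = 7 with fitted mean 14.2, y = 9 with fitted mean 17.45) are the figures quoted in the answers.

```python
import math

def exponential_deviance(y, mu):
    """Contribution of one observation to the deviance of an exponential model."""
    return 2 * (-math.log(y / mu) + (y - mu) / mu)

def poisson_deviance(y, mu):
    """Contribution of one observation to the deviance of a Poisson model."""
    return 2 * (y * math.log(y / mu) - (y - mu))

exp_contribution = exponential_deviance(7, 14.2)
poisson_contribution = poisson_deviance(9, 17.45)

print(round(exp_contribution, 4), round(poisson_contribution, 3))
```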
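6. (i) If w = ny then:
$P(Y = y) = P(W = ny) = \binom{n}{ny} \mu^{ny} (1-\mu)^{n-ny}$
(ii) The natural parameter is $\theta = \ln\frac{\mu}{1-\mu}$ and the dispersion parameter is $\phi = n$, with
$a(\phi) = \frac{1}{n} = \frac{1}{\phi}$, $b(\theta) = -\ln(1-\mu) = \ln(1 + e^{\theta})$,
$c(y, \phi) = \ln\binom{n}{ny} = \ln\binom{\phi}{y\phi}$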
! ! !!!! !!!!
(iv). Scaled deviance = 2𝑛 !!! 𝑙𝑛 !! !!!!
+ 𝑙𝑛 !!!!
!
7. (i)(a) $\theta = -\frac{1}{\mu} \Rightarrow b(\theta) = \log \mu = -\log(-\theta)$
policyholder’s ages will appear in the corresponding linear predictor, and will
affect the expected claim rate.
$\eta = \alpha_i + \beta x + \delta_j$
(iii) (b) If there is interaction between age and gender, then the increase in risk
for each year of age is different for males and females. Instead of one β
parameter, we now need two –a βMale and a βFemale . So the linear predictor
now becomes:
$\eta = \alpha_i + \beta_i x + \delta_j$
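(ii) (a) The Pearson residual is defined as $\frac{y_i - \hat{\mu}_i}{\sqrt{\mathrm{var}(\hat{\mu}_i)}}$, where $\mathrm{var}(\hat{\mu}_i)$ is the variance
of Yi with fitted mean $\hat{\mu}_i$. Using $\hat{y}_i$ as the fitted value of µ, the Pearson
residual becomes $\frac{y_i - \hat{y}_i}{\sqrt{\hat{y}_i}}$.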
(ii) (b) The distribution of Pearson residuals is often skewed for non-normal
data (e.g. for Poisson data). This makes the interpretation of residual plots
and the assessment of goodness of fit by eye very difficult. In contrast,
deviance residuals are more likely to be symmetrically distributed and to
have approximately normal distributions.
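(iii) Let $f(\theta) \propto \exp(\alpha\theta - \beta e^{\theta})$ represent the prior density function for θ. Then
$f(\theta \mid y) \propto f(\theta) \times L(y \mid \theta)$
Now, $L(y \mid \theta) \propto \exp\left( \theta \sum_{i=1}^{n} y_i - n b(\theta) \right)$ from part (i), so
$f(\theta \mid y) \propto \exp(\alpha\theta - \beta e^{\theta}) \exp\left( \theta \sum_{i=1}^{n} y_i - n b(\theta) \right)$
$= \exp\left\{ \left( \alpha + \sum_{i=1}^{n} y_i \right)\theta - (\beta + n) e^{\theta} \right\}$
$E[b'(\theta) \mid y_1, y_2, \dots, y_n] = \frac{\alpha + \sum_{i=1}^{n} y_i}{\beta + n}$
$= \frac{\beta}{\beta + n} \cdot \frac{\alpha}{\beta} + \frac{n}{\beta + n} \cdot \frac{1}{n} \sum_{i=1}^{n} y_i$
$= \frac{\beta}{\beta + n} \cdot E[b'(\theta)] + \frac{n}{\beta + n}\, \bar{y}$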
9. (i) $\theta_i = \mu_i$, $b(\theta_i) = \frac{1}{2}\mu_i^2 = \frac{1}{2}\theta_i^2$, $\phi = \sigma$, $a(\phi) = \sigma^2 = \phi^2$,
$c(y_i, \phi) = -\frac{y_i^2}{2\phi^2} - \ln\phi - \frac{1}{2}\ln 2\pi$
$V(\mu_i) = b''(\theta_i) = 1$
10. (i) $\theta = -\frac{1}{\mu} \Rightarrow b(\theta) = \log \mu = -\log(-\theta)$
$\phi = \alpha \Rightarrow a(\phi) = \frac{1}{\phi}$
(ii). The canonical link function for the gamma distribution is 1/µ.
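11. (i) $\ln L = \sum_i \sum_j \left[ \ln \theta_{ij} + y_{ij} \ln(1 - \theta_{ij}) \right]$,
which is maximised at $\hat{\theta}_{ij} = \frac{1}{1 + y_{ij}}$
(ii) $P(Y_{ij} = y) = \exp\left\{ \ln\left[ \theta_{ij}(1 - \theta_{ij})^{y} \right] \right\} = \exp\left\{ y \ln(1 - \theta_{ij}) - (-\ln \theta_{ij}) \right\}$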
(iii) From the information in the question, θij needs to lie between 0 and 1, which
α + βxj need not do. This means that it would be inappropriate to set θij equal to
α + βxj. A more appropriate relationship is:
$\ln\left( \frac{\theta_{ij}}{1 - \theta_{ij}} \right) = \alpha + \beta x_j$.
(ii)(a) $\eta = \alpha_i + \beta x$
(ii)(b) $\eta = \alpha_i + \beta_i x$
13. (i) $f(x) = \exp\left\{ \frac{-\frac{x}{\mu} - \log \mu}{1/\alpha} + \alpha \log \alpha - \log \Gamma(\alpha) + (\alpha - 1) \log x \right\}$
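$\theta = -\frac{1}{\mu} \Rightarrow b(\theta) = \log \mu = -\log(-\theta)$
$\phi = \alpha \Rightarrow a(\phi) = \frac{1}{\phi}$, $c(y, \phi) = (\phi - 1)\log y + \phi \log \phi - \log \Gamma(\phi)$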
16. (i) A random variable Y belongs to an exponential family if we can write its PDF
in the following format:
$f_Y(y; \theta, \phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\}$
where θ is the natural parameter, φ is the scale parameter and a, b and c are
functions.
(ii) $f_Y(y) = \exp\left\{ \frac{y\left(-\frac{1}{\mu}\right) - \ln \mu}{1} + 0 \right\}$. We have:
$\theta = -\frac{1}{\mu}$, $b(\theta) = -\ln(-\theta)$, $a(\phi) = 1$ and $c(y, \phi) = 0$
3. A link function between the response variable and the linear predictor.
(iii) $g(\mu) = \ln\frac{\mu}{1-\mu}$
$\phi = \alpha$, $a(\phi) = \frac{1}{\alpha} = \frac{1}{\phi}$ and
The natural parameter is $\theta = -\frac{1}{\mu}$. The canonical link function is $g(\mu) = 1/\mu$.
21. (i) $\hat{\theta}_{ij} = \frac{1}{1 + y_{ij}}$ (ii) θ = log(1 − θij) is the natural parameter,
b(θ) = −log θij = −log[1 − $e^{\theta}$], ϕ = 1, a(ϕ) = 1, c(y, ϕ) = 0
(iii) The Pearson residuals are often skewed for non normal data which makes
the interpretation of residual plots difficult. Deviance residuals are usually more
likely to be symmetrically distributed and are preferred for actuarial
applications.
(ii) Assuming µ = µ1 = µ2 = µ3 we have $\hat{\mu} = \frac{11 + 16 + 20}{12} = 3.916667$
The difference in scaled deviance is given by
Δ = 2(log L1 + log L2 + log L3 − log L)
$= 2(-4\hat{\mu}_1 + 11 \log \hat{\mu}_1 - 4\hat{\mu}_2 + 16 \log \hat{\mu}_2 - 4\hat{\mu}_3 + 20 \log \hat{\mu}_3 + 12\hat{\mu} - 47 \log \hat{\mu})$
[The logarithms of factorials cancel]
= 2(−47 + 11 log 2.75 + 16 log 4 + 20 log 5 + 47 − 47 log 3.91667)
= 2.6615
Under H0: µ1 = µ2 = µ3 we have that Δ comes from a $\chi^2$ distribution with 3 − 1 = 2
degrees of freedom.
The upper 5% point of the $\chi^2_2$ distribution is 5.991. The observed value is below
this and so there is no evidence to suggest that the underlying parameters are
different for each risk.
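The scaled-deviance calculation above can be reproduced directly from the row totals in question 22. This is a minimal sketch; the helper `poisson_loglik` drops the factorial terms, which cancel in the difference, and the 5.991 critical value is the one quoted in the answer.

```python
import math

totals = [11, 16, 20]   # total claims per policy type over 4 years
years = 4

def poisson_loglik(total, mu, n):
    """Poisson log-likelihood of n observations summing to total, without factorial terms."""
    return -n * mu + total * math.log(mu)

# Separate means (alternative model) and a common mean (null hypothesis).
mu_separate = [t / years for t in totals]
mu_common = sum(totals) / (years * len(totals))

loglik_separate = sum(poisson_loglik(t, m, years) for t, m in zip(totals, mu_separate))
loglik_common = poisson_loglik(sum(totals), mu_common, years * len(totals))

delta = 2 * (loglik_separate - loglik_common)
critical_value = 5.991  # upper 5% point of chi-squared with 2 degrees of freedom
reject_h0 = delta > critical_value

print(round(delta, 4), reject_h0)
```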
23. (i) The saturated model is one where the number of parameters is the same as the
number of data points, i.e. the fitted values are the same as the observed data.
(ii) The scaled deviance is twice the difference between the log-likelihood of
the saturated model and that of the model under consideration.
(iii) (a) Pearson residuals are $\frac{y - \hat{\mu}}{\sqrt{\mathrm{var}(\hat{\mu})}}$, where $\hat{\mu}$ is the fitted response estimator.
The deviance residuals are sign(y − $\hat{\mu}$) $d_i$, where $d_i^2$ is the contribution of the i-th
observation to the scaled deviance, i.e. $\sum d_i^2$ is the scaled deviance.
(b) The Pearson residuals tend to be skewed in non normal data while the
deviance residuals tend to be symmetric and hence the normal assumption is
more appropriate.
For that reason the latter is preferred in actuarial applications.
24. (i) A random variable Y belongs to an exponential family if we can write its PDF
in the following format:
$f_Y(y; \theta, \phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\}$
where θ is the natural parameter, φ is the scale parameter and a, b and c are
functions.
(iv) Var = $\frac{\mu^2}{\alpha}$ and Skewness = $\frac{2\mu^3}{\alpha^2}$
(ii) Model 1 does not allow for the possibility that there may be interactions
(correlations) between some of the factors. For example, it may be the case that
young drivers tend to drive fast cars and to live in towns.
With Model 3, which is a saturated model, it would be possible to fit the average
values for each group exactly ie there are no degrees of freedom left. This defeats
the purpose of applying a statistical model, as it would not “smooth” out any
anomalous results.
(iii) Normal error structure means that the randomness present in the observed
values in each category (eg young/fast/town) is assumed to follow a normal
distribution.
The link function is the function applied to the linear estimator to obtain the
predicted values. Associated with each type of error structure is a “canonical” or
“natural” link function. In the case of a normal error structure, the canonical link
function is the identity function g(µ) = µ.
(iv)
Model 2 would be the most appropriate in this case.
(ii)
(iii) (5.233, 11.721) Since this confidence interval does not contain zero we are
95% confident that the parameter is non-zero and should be kept.
(iv) T.S. = 6.826, reject H0 and conclude that Model 2 significantly reduces the
scaled deviance (ie it is significantly better fit to the data ) so survival time is
dependent on initial white blood cell count.
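The interval and test statistic in parts (iii) and (iv) follow from simple arithmetic, sketched below. The 1.96 and 3.841 values are the standard normal 97.5% point and the upper 5% point of the chi-squared distribution with 1 degree of freedom; using the rounded deviances quoted in the question gives 6.825, which agrees with the quoted test statistic up to rounding.

```python
alpha_hat, se_alpha = 8.477, 1.655   # estimate and standard error from question 26

# Approximate 95% confidence interval: estimate +/- 1.96 standard errors.
z_975 = 1.96
ci = (alpha_hat - z_975 * se_alpha, alpha_hat + z_975 * se_alpha)

# Deviance comparison: Model 2 adds one parameter (beta) to Model 1.
deviance_model_1, deviance_model_2 = 26.282, 19.457
test_statistic = deviance_model_1 - deviance_model_2
chi2_1_upper_5pct = 3.841
reject_h0 = test_statistic > chi2_1_upper_5pct

print(ci, test_statistic, reject_h0)
```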
27. (i)
SA $\alpha + \beta x$ 2 238.4
SA+PT $\alpha_i + \beta x$ 11 206.7
SA+PT+SA.PT $\alpha_i + \beta_i x$ 20 178.3
SA*PT+NB $\alpha_i + \beta_i x + B_j$ 25 166.2
ASSIGNMENT – 13
EMPIRICAL BAYES CREDIBILITY THEORY
1. An insurance company has insured a fleet of cars for the last four years. For year
j (j =1,...4), let Yj and Pj be the total amount claimed and the number of cars in the
fleet respectively. Let Xj = Yj/Pj be the average amount claimed per car in year j.
Assume that the distribution of Xj depends on a risk parameter θ and that the
conditions of Empirical Bayes Credibility Theory Model 2 are satisfied.
Let: m(θ) = E(Xj|θ), s²(θ) = Pj V(Xj|θ), m = E[m(θ)], c = V[m(θ)] > 0
(i) (a) Derive E(Xj).
(b) Derive E(XjXk) , for j ≠ k
(c) Determine whether Xj, and Xk are independent ( j ≠ k ).
(ii) The company has insured ten similar fleets over the last four years. Using
the data from these years, m, E[s²(θ)] and V[m(θ)] are estimated to be 62.8,
106.32 and 5.8 respectively.
Calculate next year’s credibility premium for a fleet of cars with claims
over the last four years given below, if the fleet will have 16 cars next year.
Year
1 2 3 4
Total amount claimed 1,000 1,200 1,500 1,400
Number of cars 15 16 18 15
Explain how and why the credibility factor would be affected if the
estimate of V[m(θ)] increases, and comment on the effect on the credibility
premium. [UK Sept 2002]
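One way to organise the calculation in part (ii) is sketched below. It assumes the standard EBCT Model 2 credibility factor Z = ΣPj / (ΣPj + E[s²(θ)]/V[m(θ)]) applied to the volume-weighted mean claim per car, and takes the quoted estimates (62.8, 106.32 and 5.8) at face value; the variable names are illustrative.

```python
claims = [1000, 1200, 1500, 1400]   # total amount claimed in years 1 to 4
cars = [15, 16, 18, 15]             # number of cars in years 1 to 4
m, e_s2, v_m = 62.8, 106.32, 5.8    # estimates of m, E[s^2(theta)], V[m(theta)]
cars_next_year = 16

# Volume-weighted average claim per car for this fleet.
total_cars = sum(cars)
x_bar = sum(claims) / total_cars

# Credibility factor under EBCT Model 2.
Z = total_cars / (total_cars + e_s2 / v_m)

# Credibility premium per car, scaled by next year's fleet size.
premium_per_car = Z * x_bar + (1 - Z) * m
premium = cars_next_year * premium_per_car

print(Z, premium_per_car, premium)
```

A larger estimate of V[m(θ)] shrinks the ratio E[s²(θ)]/V[m(θ)] and so pushes Z towards 1, which is the effect the final part of the question asks about.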
2. The number of claims on a particular risk in a fixed time period has a Poisson
distribution with mean λ. There were x1 and x2 claims during the first two time
periods.
Suppose that λ has prior density $f(\lambda) = 2e^{-2\lambda}$ (λ > 0). Determine the Bayesian
estimate under quadratic loss for the expected number of claims during the third
time period, and show that it is of the form of a credibility estimate.
3. The total amount claimed for a particular risk in a portfolio is observed for each
of 5 consecutive years.
(i) From past knowledge of similar portfolios, an insurer believes that the
claims are normally distributed with mean θ and variance 25, and that the
prior distribution of θ is normal with mean 125 and variance 36.
(a) Derive the Bayesian estimate for θ under quadratic loss, and show
that it can be written in the form of a credibility estimate combining
the mean observed claim size for this risk with the prior mean for θ.
(b) State the credibility factor, and calculate the credibility premium if
the mean claim size over the 5 years is 122.
(c) Comment on how the credibility factor and the credibility estimate
change if the variance of 25 is increased.
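For part (i)(b), the numbers can be checked with the standard normal/normal result, in which the posterior mean is Z × (sample mean) + (1 − Z) × (prior mean) with Z = n / (n + σ²/τ²). The sketch below plugs in the figures from the question; the variable names are illustrative.

```python
n_years = 5
claim_variance = 25      # variance of claims given theta
prior_mean = 125         # prior mean of theta
prior_variance = 36      # prior variance of theta
observed_mean = 122      # mean claim size over the 5 years

# Credibility factor for the normal/normal model.
Z = n_years / (n_years + claim_variance / prior_variance)

# Credibility premium: weighted average of observed mean and prior mean.
premium = Z * observed_mean + (1 - Z) * prior_mean

print(Z, premium)
```

Increasing the claim variance raises the ratio σ²/τ², which lowers Z and moves the premium towards the prior mean of 125.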
(ii) A second insurer does not believe that this is an appropriate prior
distribution for risks in this portfolio, and decides to use Empirical Bayes
Credibility, Model 1, where the credibility premium combines the mean
for the particular risk with the estimated value of E(m(θ)). Data from 3
risks in this portfolio over 5 years are available. Let Xij be the claim for risk
i in year j. The table shows various summary statistics for the observed
data.
Columns of the table: risk i, $\bar{X}_i$ and $\sum_{j=1}^{5} (X_{ij} - \bar{X}_i)^2$.
4. An insurance company has to estimate the risk premium for the coming year for
a certain risk.
(i) Describe how the credibility approach to calculating the risk premium
differs from a conventional approach.
(ii) State the advantages and disadvantages of using Bayesian estimation and
empirical Bayes credibility theory estimation.
(iii) State the differences between the assumptions in empirical Bayes
credibility theory Model 1 and Model 2, and state why Model 2 is more
likely to be useful in practice. [UK Sept 2004]
5. An actuary has, for three years, recorded the volume of unsolicited advertising
that he receives. He believes that the number of items that he receives follows a
Poisson distribution with a mean which varies according to which quarter of the
year it is. He has recorded Yij the number of items received in the ith quarter of
the jth year (i = 1, 2, 3, 4 and j = 1, 2, 3). The actuary wishes to estimate the number
of items that he will receive in the 1st quarter of year 4. He has recorded the
following data:
(i) Estimate Y1,4 the number of items that the actuary expects to receive in the
first quarter of year 4 using the assumptions of EBCT Model 1.
The actuary believes that, in fact, the volume of items has been increasing at the
rate of 10% per annum.
(ii) Suggest how the approach in (i) can be adjusted to produce a revised
estimate taking this growth into account.
(iii) Calculate the maximum likelihood estimate of Y1,4 (based on the quarter 1
data already observed and the 10% pa increase described above).
(iv) Compare the assumptions underlying the approach in (i) and (ii) with
those underlying the approach in (iii). [UK April 2010]
6. The table below shows aggregate annual claim statistics for three risks over a
period of seven years. Annual aggregate claims for risk i in year j are denoted by
Xij.
Risk i    $\bar{X}_i = \frac{1}{7} \sum_{j=1}^{7} X_{ij}$    $S_i^2 = \frac{1}{6} \sum_{j=1}^{7} (X_{ij} - \bar{X}_i)^2$
i = 1     127.9     335.1
i = 2     88.9      65.1
i = 3     149.7     33.9
(i) Calculate the credibility premium of each risk under the assumptions of
EBCT Model 1.
(ii) Explain why the credibility factor is relatively high in this case.
[UK Sept 2010]
7. An insurance company has collected data for the number of claims arising from
certain risks over the last 10 years. The number of claims in the jth year from the
ith risk is denoted by Xij for i = 1, 2, ..., n and j = 1, 2, ..., 10.
The distribution of Xij for j = 1, 2, ..., 10 depends on an unknown parameter θi and,
given θi, the Xij are independent identically distributed random variables.
(i) Give a brief interpretation of E[s²(θ)] and V[m(θ)] under the assumptions
of Empirical Bayes Credibility Theory Model 1.
(ii) Explain how the value of the credibility factor Z depends on E[s²(θ)] and
V[m(θ)]. [UK April 2011]
8. Five years ago, an insurance company began to issue insurance policies covering
medical expenses for dogs. The insurance company classifies dogs into three risk
categories: large pedigree (category 1), small pedigree (category 2) and non-
pedigree (category 3). The number of claims nij in the ith category in the jth year is
assumed to have a Poisson distribution with unknown parameter θi. Data on the
number of claims in each category over the last 5 years is set out as follows:
Year    1    2    3    4    5    $\sum_{j=1}^{5} n_{ij}$    $\sum_{j=1}^{5} n_{ij}^2$
Prior beliefs about θ1 are given by a gamma distribution with mean 50 and
variance 25.
(i) Find the Bayes estimate of θ1 under quadratic loss.
(ii) Calculate the expected claims for year 6 of each category under the
assumptions of Empirical Bayes Credibility Theory Model 1.
(iii) Explain the main differences between the approaches in (i) and that
in (ii).
(iv) Explain why the assumption of a Poisson distribution with a
constant parameter may not be appropriate and describe how each
approach might be generalised. [UK Sept 2011]
9. An insurer classifies the buildings it insures into one of three types. For Type 1
buildings, the number of claims per building per year follows a Poisson
distribution with parameter λ. Data are available for the last five years as follows:
Year 1 2 3 4 5
Number of type 1 buildings covered 89 112 153 178 165
Number of claims 15 23 29 41 50
(i) Determine the maximum likelihood estimate of λ based on the data above.
The insurer also has data for the other two types of building for all five years.
Define
Pij = the number of buildings insured in the jth year from type i and
Yij = the corresponding number of claims.
The five years of data can be summarised as follows:
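Type (i)    $\bar{P}_i = \sum_{j=1}^{5} P_{ij}$    $\bar{X}_i = \sum_{j=1}^{5} Y_{ij} / \bar{P}_i$    $\sum_{j=1}^{5} P_{ij} (Y_{ij}/P_{ij} - \bar{X}_i)^2$    $\sum_{j=1}^{5} P_{ij} (Y_{ij}/P_{ij} - \bar{X})^2$
Type 1      697    0.226686    1.527016    2.502737
Type 2      295    0.237288    0.96605     1.178133
Type 3      515    0.330097    4.53253     6.775614
$\bar{X} = \sum_{i=1}^{3} \sum_{j=1}^{5} Y_{ij} / \bar{P} = 0.264101$, where $\bar{P} = \sum_{i=1}^{3} \bar{P}_i$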
(ii) Estimate the number of claims from Type 1 buildings in year six using
Empirical Bayes Credibility Theory model 2.
(iii) Explain the main differences between the approaches in parts (i) and (ii).
[UK Sept 2012]
as shown:
Type Parameter
1 λ
2 2λ
3 5λ
where λ is an unknown positive constant.
Actual claim numbers over the last five years have been as follows. Here Xij
represents the number of claims from the ith type in the jth year:
11. For three years an insurance company has insured buildings in three different
towns against the risk of fire damage. Aggregate claims in the jth year from the
ith town are denoted by Xij for i = 1, 2, 3 and j = 1, 2, 3. The data is given in the
table below.
Town i Year j
1 2 3
1 8,130 9,210 8,870
2 7,420 6,980 8,130
3 9,070 8,550 7,730
Calculate the expected claims from each town for the next year using the
assumptions of Empirical Bayes Credibility Theory model 1. [UK Sept 2014]
12. An insurance company has for five years insured three different types of risk.
The number of policies in the jth year for the ith type of risk is denoted by Pij for i
= 1, 2, 3 and j = 1, 2, 3, 4, 5. The average claim size per policy over all five years
for the ith type of risk is denoted by Xi. The values of Pij and Xi are tabulated
below.
Number of policies Mean claim size
Risk Type i Year 1 Year 2 Year 3 Year 4 Year 5 𝑋!
1 17 23 21 29 35 850
2 42 51 60 55 37 720
3 43 31 62 98 107 900
The insurance company will be insuring 30 policies of type 1 next year and has
calculated the aggregate expected claims to be 25,200 using the assumptions of
Empirical Bayes Credibility Theory Model 2.
Calculate the expected annual claims next year for risks 2 and 3 assuming the
number of policies will be 40 and 110 respectively. [UK April 2015]
13. A shipping insurance company has insured ships for six years, and classifies the
ships it insures into three types.
Let:
Pij be the number of ships insured in the jth year from type i,
Yij be the corresponding number of claims.
The six years of data are summarised as follows:
ANSWERS
1.
(i) (a) E(Xj) = E(E(Xj|𝜃)) = E(m(𝜃)) = m.
(b) E(XjXk) = E(E(XjXk|𝜃)) = E(m(𝜃)²) for j ≠ k
(c) E(XjXk) ≠ E(Xj)E(Xk) for j≠k so Xj and Xk are not independent.
(ii) 1214.84
If the estimate of V(m(𝜃)) increases, then the estimate of Z increases and
relatively more weight is put on the data from this particular fleet.
This happens because an increase in V(m(𝜃)) means an increase in the variability
between fleets, and so less emphasis is placed on collateral information.
If Z increases, then Z × 79.69 + (1 − Z) × 62.8 also increases. The credibility
premium moves closer to X̄ and, since this is greater than the estimated value of
m, this implies an increase in the premium.
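The effect of Z on the premium can be checked numerically. The short Python sketch below uses only the two figures quoted in this answer (79.69 for the fleet's own mean and 62.8 for the estimated value of m); the function name is illustrative and not part of the workbook.

```python
def credibility_premium(z, risk_mean, collective_mean):
    """Return Z * risk_mean + (1 - Z) * collective_mean."""
    if not 0.0 <= z <= 1.0:
        raise ValueError("Z must lie in [0, 1]")
    return z * risk_mean + (1.0 - z) * collective_mean


# Figures quoted in the answer: fleet mean 79.69, estimated m 62.8.
# Evaluate the premium for Z = 0.0, 0.1, ..., 1.0.
premiums = [credibility_premium(z / 10, 79.69, 62.8) for z in range(11)]

for z, p in zip(range(11), premiums):
    print(f"Z = {z / 10:.1f}  premium = {p:.2f}")
```

Because the fleet mean exceeds the estimated m, each step up in Z moves the premium from 62.8 towards 79.69.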
! !!
2. (i) (a) ! 𝑥 + ! !
3. (i) (a) This is of the form Z x̄ + (1 − Z) × 125, where 125 is the
prior mean for 𝜃.
(c) If the variance of 25 is increased, then the value of Z would decrease, and the
credibility estimate would move closer to the prior mean. This makes sense, since
increasing this variance means that the claim amounts within each risk are more
variable, and so we should put relatively less weight on past data.
(ii) (a) Z = 0.8818 and credibility premium = 123.03
(b) This is similar to the value obtained in (i), so the assumptions made in the
prior appear not to be inappropriate.
4. (i) The conventional approach uses data from the risk itself only. The credibility
approach combines this with information from other sources, using a credibility
premium of the form Z X̄ + (1 − Z)𝜇.
(ii) Bayes
Advantage: Not an approximation.
Disadvantage: Have to assume full distribution is correct.
EBCT
Advantage: Can be used when distribution is not known.
Disadvantage: An approximation. May not take account of the tail of the distribution.
(iii)
5. (i) 112.75
(ii) The average number of pieces of mail is assumed to be growing each year.
We need to adjust the data to take account of this. Two approaches are:
• Convert the data into “Year 4” values by increasing by 10% p.a. and then
applying the methodology above; OR
• Recognise the lower volume of data in earlier years, by applying a risk volume
to each year and using EBCT model 2. If the risk volume for year 4 is 1, then the
risk volume for year 3 is 1/1.1 and year 2 is 1/1.21 etc.
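The two adjustments described above can be sketched in Python. The quarter-1 counts used here are hypothetical placeholders (the recorded data table is not reproduced in this workbook), and the function names are illustrative only; the 10% growth rate is the one stated in the question.

```python
GROWTH = 0.10  # 10% per annum, as stated in the question


def to_year4_values(counts_by_year, target_year=4, growth=GROWTH):
    """Approach 1: scale each year's count up to target-year terms."""
    return {year: count * (1 + growth) ** (target_year - year)
            for year, count in counts_by_year.items()}


def risk_volumes(years, target_year=4, growth=GROWTH):
    """Approach 2: risk volume relative to the target year (volume 1 in year 4)."""
    return {year: (1 + growth) ** (year - target_year) for year in years}


# Hypothetical quarter-1 counts for years 1 to 3 (NOT the workbook's data).
example_counts = {1: 100, 2: 110, 3: 121}

adjusted = to_year4_values(example_counts)   # inputs for EBCT Model 1
volumes = risk_volumes(example_counts)       # P_ij-style volumes for EBCT Model 2

print(adjusted)
print(volumes)
```

The volumes for years 3 and 2 come out as 1/1.1 and 1/1.21, matching the figures given in the second bullet.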
(iii) 136.32
(iv) The main difference is that the maximum likelihood estimate approach
considers the data for Q1 in isolation, whereas the EBCT approach assumes that
data from other quarters come from a related distribution and so can tell us
something about Q1.
Specifically, the EBCT approach assumes that the mean volume of unsolicited
mail for each quarter is itself a sample from a common distribution. Hence whilst
each quarter has a different mean, they provide some information about the
population from which the mean is drawn.
This means that we can put relatively little weight on the information provided
by the data set as a whole, and must put more on the data from the individual
risks, leading to a relatively high credibility factor.
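As a rough check on this remark, the standard EBCT Model 1 estimators can be applied to the summary statistics tabulated in Question 6 (means 127.9, 88.9, 149.7 and sample variances 335.1, 65.1, 33.9 over 7 years). The Python sketch below is one possible implementation, not the workbook's own solution; the function name and the choice to set Z to 0 when the variance estimate is not positive are assumptions.

```python
def ebct_model1(risk_means, risk_variances, n_years):
    """EBCT Model 1 from per-risk means and unbiased sample variances."""
    n_risks = len(risk_means)
    overall_mean = sum(risk_means) / n_risks        # estimate of E[m(theta)]
    e_s2 = sum(risk_variances) / n_risks            # estimate of E[s^2(theta)]
    between = sum((x - overall_mean) ** 2 for x in risk_means) / (n_risks - 1)
    v_m = between - e_s2 / n_years                  # estimate of V[m(theta)]
    # Assumption: if the variance estimate is not positive, use Z = 0.
    z = n_years / (n_years + e_s2 / v_m) if v_m > 0 else 0.0
    premiums = [z * x + (1 - z) * overall_mean for x in risk_means]
    return overall_mean, e_s2, v_m, z, premiums


# Summary statistics from the table in Question 6.
means = [127.9, 88.9, 149.7]
variances = [335.1, 65.1, 33.9]

overall_mean, e_s2, v_m, z, premiums = ebct_model1(means, variances, n_years=7)
print(f"E[m] = {overall_mean:.2f}, E[s^2] = {e_s2:.2f}, V[m] = {v_m:.2f}, Z = {z:.4f}")
print([round(p, 2) for p in premiums])
```

With these inputs the estimate of V[m(θ)] is several times larger than the estimate of E[s²(θ)], so Z comes out close to 0.98 and each credibility premium stays near that risk's own mean.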
7. (i) E(s²(θ)) represents the average variability of claim numbers from year to year
for a single risk.
V(m(θ)) represents the variability of the average claim numbers for different risks
i.e. the variability of the means from risk to risk.
(ii) We can see that it is the relative values of E(s²(θ)) and V(m(θ)) that matter. In
particular, if E(s²(θ)) is high relative to V(m(θ)), this means that there is more
variability from year to year than from risk to risk. More credibility can be placed
on the data from other risks leading to a lower value of Z.
On the other hand, if V(m(θ)) is relatively higher this means there is greater
variation from risk to risk, so that we can place less reliance on the data as a
whole leading to a higher value of Z.
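The dependence described above can be read directly from the usual EBCT Model 1 credibility factor, written here for n years of data per risk (n = 10 in this question). The formula is the standard one and is stated as a reminder rather than quoted from the workbook:

```latex
Z = \frac{n}{n + \dfrac{E[s^2(\theta)]}{V[m(\theta)]}}, \qquad
\frac{E[s^2(\theta)]}{V[m(\theta)]} \to \infty \;\Rightarrow\; Z \to 0, \qquad
\frac{E[s^2(\theta)]}{V[m(\theta)]} \to 0 \;\Rightarrow\; Z \to 1 .
```

Only the ratio of the two quantities enters, so scaling both by the same factor leaves Z unchanged.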
The approach in (ii) can be generalised by using EBCT Model 2 which explicitly
incorporates an adjustment for the volume of risk.