ACST356 Section 4 Complete Notes
Section 4
Loss Models - IV
Complete Notes
OBJECTIVES
The objectives for Sections 1 to 5 are given in Section 1.
OPTIONAL READING
The optional reading for Sections 1 to 5 is given in Section 1.
OVERVIEW OF SECTION 4
We began the topic of estimation in Section 3 by considering methods of deriving
point estimates where we have a set of individual sample data. In that Section we
covered the maximum likelihood method, and we begin Section 4 by looking at two
alternatives for this scenario, method of moments and method of percentiles. We
then move on to Bayesian estimation, which uses not just sample data but also prior
beliefs. Methods that can be applied when we have grouped rather than individual data
are then considered. We finish estimation by considering the standard errors /
confidence intervals for our estimates.
An important feature of any model is that it fits the data closely. We conclude Section
4 by looking at two tests that assess how well a statistical model fits a set of data.
METHOD OF MOMENTS
Suppose there are r parameters to be estimated (eg. one parameter for the Poisson, two
for the Pareto, three for the Burr). Each of the first r sample moments is equated
to its theoretical value and the resulting equations solved for the r unknown parameters.
For example, for the Pareto(α, λ) distribution:

E[X] = λ/(α − 1)

E[X²] = 2λ²/((α − 1)(α − 2))

and the sample moments are m_j = (1/n) Σ_{i=1}^n x_i^j.
The two equations which, when solved, give the method of moments estimators α̃ and λ̃ are:

m1 = λ/(α − 1)
m2 = 2λ²/((α − 1)(α − 2))

which can be solved simultaneously to give the method of moments estimators

α̃ = 2(m2 − m1²)/(m2 − 2m1²)
λ̃ = m1 m2/(m2 − 2m1²)
(You don't need to remember these formulae; they are shown just to illustrate the method for a particular distribution – the Pareto.)
Method of moments estimators are often easier to calculate than maximum likelihood
estimators. However maximum likelihood estimators usually have the better statistical
properties.
Methods of moments estimators may give you a useful starting value for the parameters to
use in solving for the MLEs.
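As a quick illustration, the Pareto formulae above can be wrapped in a small function (a Python sketch; the function name is ours):

```python
def pareto_mom(m1, m2):
    """Method of moments estimates for the Pareto(alpha, lambda)
    distribution, where m1 and m2 are the first two sample moments
    m_j = (1/n) * sum(x_i ** j)."""
    denom = m2 - 2 * m1 ** 2
    alpha = 2 * (m2 - m1 ** 2) / denom
    lam = m1 * m2 / denom
    return alpha, lam
```

For a Pareto(3, 2), E[X] = 1 and E[X²] = 4, and the function recovers α = 3 and λ = 2 from those two moments.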
METHOD OF PERCENTILES
The method of percentile matching can be useful for estimating parameters where the
underlying distribution has an algebraic distribution function (eg. Weibull, Burr).
The method simply involves equating selected sample percentiles to the theoretical
distribution function at those points. It is like the method of moments, but instead of
equating sample and theoretical moments, we equate sample and theoretical percentiles.
Like the method of moments, the number of percentile points we use depends on the
number of parameters we need. If we only need to estimate one parameter, it is probably
best to equate the sample and theoretical medians (50th percentiles). If we need to estimate
two parameters, it might make sense to equate the sample and theoretical 25th and 75th
percentiles.
Lecture Exercise 1
Recalculate the estimates of α and λ for the Pareto distribution using the data in Lecture Exercise 10 of Section 3, now using percentile matching at the points x = 1 and x = 4.
The data is given again below:
0.06 0.11 0.17 0.43 0.75 0.87 0.99 1.06 1.08 1.18
1.27 1.54 1.72 2.11 3.32 3.78 4.97 5.04 9.82 14.72
So far, we have assumed that our estimates will be based solely on the sample data – we
have ignored all other sources of data. But sometimes, we may have other information
about the likely value of the parameter, which should be taken into account. To do this,
we use a Bayesian approach.
BAYES’ THEOREM
Say the event F can only occur if one of r mutually exclusive events E1, E2, …, Er occurs. Then

P(E_j | F) = P(F | E_j) P(E_j) / P(F),  j = 1, 2, …, r
(This is also similar to the expression used at the start of the derivation of the conditional
mean and variance formulae in Section 1.)
BAYESIAN ESTIMATION
We now want to use Bayes' theorem to estimate a parameter. Consider a random variable X which has a distribution depending on a parameter θ.
Previously (ie. for maximum likelihood, method of moments, and method of percentiles estimation) θ has been considered as an unknown constant. We calculate an estimate of θ using the sample data only.
Before we even collect any data we have a belief about the range of possible values of θ, which we summarise by the prior distribution. Bayesian estimation then combines this prior information with sample data, so that our final estimate of θ is a combination of our prior belief and the sample data.
COPYRIGHT MACQUARIE UNIVERSITY ACST3056 Section 4 Complete Notes, Page 3 of 29
The Bayesian estimate is denoted θ̂_B.
Let us have the prior belief that θ, the parameter to be estimated, has a distribution with p.d.f. g(θ). g(θ) is referred to as the prior density function.
By Bayes' theorem, the posterior density of θ given the sample is

g(θ | x1,…,xn) = h(x1,…,xn | θ) g(θ) / h(x1,…,xn) = L(θ) g(θ) / h(x1,…,xn) ∝ L(θ) g(θ)

where h(x1,…,xn) is the marginal density of the sample, which does not involve θ.
So now we have a posterior distribution for our unknown parameter which incorporates
both our sample data and our prior information.
CONJUGATE PRIOR
If the prior distribution leads to a posterior distribution of the same type as the prior
distribution, then the prior is called the conjugate prior of that particular likelihood
function. This often makes the calculation of the Bayesian estimator easier.
1. Poisson – Gamma
See below for a description of this scenario, and Lecture Exercise 2 for an example.
2. Normal – Normal
Suppose that we are estimating the mean μ of a Normal distribution, where we already know the variance is σ², and the prior distribution of μ is N(μ0, σ0²). The posterior distribution of μ is then N(μ1, σ1²), where

μ1 = (n x̄ σ0² + μ0 σ²) / (n σ0² + σ²)

(x̄ is the sample mean) and

σ1² = σ² σ0² / (n σ0² + σ²)
Both of these special cases will be very useful to us in the future, when we do credibility
theory, for estimating risk premiums.
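The Normal–Normal updating formulae can be sketched as follows (Python; the argument names are ours – sigma2 is the known sampling variance and sigma0_2 the prior variance):

```python
def normal_posterior(xbar, n, sigma2, mu0, sigma0_2):
    """Posterior N(mu1, sigma1_2) for the mean of a Normal sample with
    known variance sigma2, given a N(mu0, sigma0_2) prior for the mean."""
    denom = n * sigma0_2 + sigma2
    mu1 = (n * xbar * sigma0_2 + mu0 * sigma2) / denom
    sigma1_2 = sigma2 * sigma0_2 / denom
    return mu1, sigma1_2
```

With n = 0 the posterior is just the prior, and as n grows μ1 is pulled towards x̄.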
LOSS FUNCTIONS
We now have a posterior distribution for the parameter, but in order to derive a point
estimate we need to choose some central value of this distribution.
We can define a loss function as a measure of the seriousness of the difference between the true value of θ and the point estimate θ̂_B. The loss function should take a value of zero when the estimate is equal to the true value, and some positive value otherwise (which may increase as we move further away from the true value).
We choose a point estimate to minimise a certain posterior expected loss function (ie.
minimise the expected difference between our estimate and the true parameter).
If the loss function is given by (θ − θ̂_B)², then we wish to minimise E[(θ − θ̂_B)²] with respect to θ̂_B. In this case the solution for θ̂_B is the mean of the posterior distribution. This can be shown as follows, with all expectations being taken with respect to the posterior distribution of θ:

E[(θ − θ̂_B)²] = E[((θ − E(θ)) + (E(θ) − θ̂_B))²]
             = Var(θ) + (E(θ) − θ̂_B)²

(the cross term vanishes because E(θ) − θ̂_B is a constant), and this is minimised when θ̂_B = E(θ).
If the loss function is given by |θ − θ̂_B|, the solution for θ̂_B is the median of the posterior distribution.
If the loss function is 0 if θ̂_B = θ and 1 if θ̂_B ≠ θ, the solution for θ̂_B is the mode of the posterior distribution.
We will find the Bayesian estimator (with squared error loss) for the parameter µ of a Poisson distribution, where the prior of µ has a gamma distribution with parameters α and β.

g(µ) = β^α µ^(α−1) e^(−βµ) / Γ(α),  µ > 0

L(µ) = ∏_{i=1}^n e^(−µ) µ^(x_i) / x_i! = e^(−nµ) µ^(Σ x_i) / ∏_{i=1}^n x_i!

g(µ | x1,…,xn) ∝ g(µ) L(µ)
             ∝ e^(−nµ) µ^(Σ x_i) · µ^(α−1) e^(−βµ)   (ignoring terms not involving µ)
             = µ^(Σ x_i + α − 1) e^(−(n+β)µ)

which is the kernel of a gamma distribution with parameters Σ_{i=1}^n X_i + α and n + β. With squared error loss, the Bayesian estimator is the mean of this posterior distribution:

µ̂_B = (Σ X_i + α) / (n + β) = [n/(n + β)] · X̄ + [β/(n + β)] · (α/β)

which is a linear combination of µ̂ = X̄ and the prior mean α/β.
As n → ∞ (ie. as our sample becomes larger, and thus our sample mean X̄ will be a better estimate of the parameter µ), the weight n/(n + β) given to the sample data tends to 1.
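The weighted-average form of the Poisson–Gamma estimator can be sketched as (Python; the function name is ours):

```python
def poisson_gamma_posterior_mean(xs, alpha, beta):
    """Bayesian estimate (squared error loss) of a Poisson mean with a
    gamma(alpha, beta) prior, written as a weighted average of the
    sample mean and the prior mean alpha/beta."""
    n = len(xs)
    xbar = sum(xs) / n
    z = n / (n + beta)                       # weight on the sample mean
    return z * xbar + (1 - z) * (alpha / beta)
```

For a sample of 5 observations with mean 1.6 and an α = β = 1 prior, this returns 1.5, matching Lecture Exercise 2(i).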
Lecture Exercise 2
Calculate the Bayesian estimate (with squared error loss) for the parameter µ of a Poisson
distribution. A sample of 5 observations has given a mean of 1.6. The prior of µ has a
gamma distribution with (i) mean = 1, variance = 1 (ii) mean = 1, variance = 0.25. State a
reasonableness check on your answer.
GROUPED DATA
Previously we have considered how to derive a point estimate when we have a sample
where we have each individual sample value. We will now consider methods for the
situation where we don’t have each individual sample value but we know that a certain
number of observations lie within a certain range. It is also possible to deal with situations
where we have some individual and some grouped observations, eg. we may know that there are six observations above 10 and that the observations less than 10 are 1, 3, 5, 6, 7, 7, 7, 8, 9, 9, 9.
The likelihood function again represents the likelihood of observing our particular set of
sample observations.
However, whereas for individual data we used f(x_i; θ) for the likelihood of an observation of x_i, for grouped data we use

∫_a^b f(x; θ) dx

for the likelihood of an observation between a and b. In terms of the distribution function,

∫_{c_{i−1}}^{c_i} f(x; θ) dx = F(c_i; θ) − F(c_{i−1}; θ)

The p.d.f. f(x; θ) which we wish to fit to the loss distribution is defined for 0 < x < ∞. F(x; θ) represents the distribution function.

The frequencies are f1, f2, …, fk (with Σ_{i=1}^k f_i = n) in the classes (c0, c1], (c1, c2], …, (c_{k−1}, c_k].

L(θ) = ∏_{i=1}^k [ ∫_{c_{i−1}}^{c_i} f(x; θ) dx ]^{f_i} = ∏_{i=1}^k [ F(c_i; θ) − F(c_{i−1}; θ) ]^{f_i}
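The grouped-data log-likelihood above can be sketched generically (Python; `cdf` stands for any fitted distribution function F(·; θ), and the function name is ours):

```python
import math

def grouped_loglik(cdf, boundaries, freqs):
    """Grouped-data log-likelihood:
    ln L = sum_i f_i * ln( F(c_i) - F(c_{i-1}) ).
    boundaries has one more element than freqs."""
    total = 0.0
    for i, f in enumerate(freqs):
        p = cdf(boundaries[i + 1]) - cdf(boundaries[i])
        total += f * math.log(p)
    return total
```

To estimate θ, this function would be maximised over the parameter(s) inside `cdf` with a numerical optimiser.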
i 1
Lecture Exercise 3
Write an expression for the likelihood function based on the following data if a Poisson with parameter µ is the model being fitted. Hence, use Excel to find the maximum likelihood estimate of µ.
MINIMUM CHI-SQUARE ESTIMATION
The χ² statistic is calculated using the observed number in each cell, f_i, and the expected number in the corresponding cell, which is n[F(c_i) − F(c_{i−1})]:

χ² = Σ_{i=1}^k ( f_i − n[F(c_i) − F(c_{i−1})] )² / ( n[F(c_i) − F(c_{i−1})] )

The minimum chi-square estimate is the parameter value that minimises this statistic.
This expression will be non-linear in the unknown parameters and numerical methods will
be required to minimise the expression.
Lecture Exercise 4
Write an expression for the chi-square statistic based on the data in Lecture Exercise 3 if a Poisson with parameter µ is the model being fitted. Hence, use Excel to find the minimum chi-square estimate of µ.
STANDARD ERRORS
If the estimator has a simple form, the variance may be able to be calculated directly.
Lecture Exercise 5
Find the standard error of ̂ , the maximum likelihood estimator of the Poisson
distribution from Section 3 Lecture Exercise 8.
If the variance cannot be calculated directly, we may in some cases be able to estimate it.
For maximum likelihood estimators, we can use the result that if θ̂ is the maximum likelihood estimator of a parameter θ based on a sample X, then as n → ∞, θ̂ is asymptotically normally distributed with mean θ (ie. the estimator is unbiased) and variance

1 / ( n E[ −d²/dθ² ln f(X; θ) ] ) = 1 / ( E[ −d²/dθ² ln L(θ; X) ] )
A similar expression exists for the case where more than one parameter is being estimated,
but this is no longer part of the syllabus for this unit.
Lecture Exercise 6
Apply the asymptotic theory of this section to find the approximate variance of the
maximum likelihood estimator of the Poisson distribution of Section 3 Lecture Exercise 8.
(As shown in Lecture Exercise 5, the variance of this estimator may be calculated exactly.
We are using the approximation formula here just as practice in applying it.)
INTERVAL ESTIMATION
A point estimator is a random variable which is distributed around the true value of the
parameter. A point estimate can be used to obtain a confidence interval for this true value.
Remember that the outcome of Bayesian estimation is a posterior distribution – not just a
single point estimate. Since we have a whole distribution, we can easily find an interval
so that 95% of observations on that distribution fall within the interval. This interval
would then represent the confidence interval for the parameter.
Choose u and v so that the posterior probability that θ lies in the interval (u, v) is, say, 0.95. The interval (u, v) then provides a Bayesian interval estimate with posterior probability 0.95.
We could choose u and v so that there was an equal probability in each tail, or
alternatively so that we obtained the shortest interval.
Lecture Exercise 7
A random sample of n observations is taken from the distribution with p.d.f.

f(x; θ) = (2 θ^(3/2) / Γ(3/2)) x² exp(−θ x²),  x > 0

Find the maximum likelihood estimator of θ and an approximate 95% confidence interval for θ.
TESTING FIT
After we have chosen a model (one of our statistical distributions from Sections 1-3) and
fitted its parameters (using one of the methods in Sections 3-4), it makes sense that we
would want to test whether the final model seems to fit our data well enough.
We will consider two tests of fit.
CHI-SQUARE TEST
Suppose there are k classes with observed frequencies f1, …, fk (with Σ_{i=1}^k f_i = n).
The expected frequencies are n p_i, where p_i is the probability under H0 that the random variable will take a value in that interval.

χ² = Σ_cells (observed − expected)² / expected = Σ_{i=1}^k (f_i − n p_i)² / (n p_i)
If the expected frequency in any cell is less than say 5, this cell should be combined with
one or more cells until the condition is satisfied.
If the parameters of the distribution are estimated from the data, the number of degrees of
freedom will be reduced by one for each parameter estimated.
Naturally, fitting our parameters using the minimum chi-square method will minimise the
probability of rejecting the model based on the chi-square test, relative to other methods
of estimation.
Lecture Exercise 8
Use a chi-square test at the 5% level of significance to test whether a Poisson distribution is a good fit to the following data:

x_i   f_i
0     89
1     143
2     94
3     42
4     20
5     8
6     3
7     1
8     0
KOLMOGOROV-SMIRNOV TEST
Let F_n(x) be the empirical distribution function, ie. the distribution function based on the sample data, and let F0(x) be the distribution function of the fitted model. Define

D_n = max_x | F_n(x) − F0(x) |

D_n measures the maximum discrepancy between the sample distribution function and the theoretical distribution. Thus a large value of D_n indicates that the model is a poor fit to the data.
Approximate critical values (which are a function of n, the sample size) are given on page 124 of Klugman et al. (2005) and will be provided in tests or examinations if required.
Klugman et al. (2005, p. 124) state that "the major problem with the Kolmogorov-Smirnov test is its lack of power ... for small samples it is unlikely that a model will be rejected".
Unlike the chi-square test, which may be applied when the model is either discrete or
continuous, the Kolmogorov-Smirnov test is intended for use for continuous models only.
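A minimal sketch of the D_n calculation (Python; `cdf` is the fitted model F0, and the function name is ours). Because the empirical distribution function is a step function, the maximum discrepancy occurs at a data point, so it suffices to compare (i−1)/n and i/n with F0 at each order statistic:

```python
def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov statistic D_n = max_x |F_n(x) - F0(x)|,
    evaluated just before and at each order statistic."""
    xs = sorted(sample)
    n = len(xs)
    dn = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        dn = max(dn, abs(i / n - f0), abs((i - 1) / n - f0))
    return dn
```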
Lecture Exercise 9
A Pareto(α, λ = 1) distribution is being fitted to the following loss amounts (in $'000) using maximum likelihood estimation. Test the fit of this model using the Kolmogorov-Smirnov test at 5% significance.

Klugman gives the critical value as 1.36/√n, although this assumes that n > 15. Use this critical value here as an illustration of the method, but please be aware that it is not appropriate for such a small sample.
Section 4
Loss Models –IV
Solutions to Lecture Exercises
Lecture Exercise 1
F(x) = 1 − (λ/(λ + x))^α

Equating the theoretical and sample distribution functions at the points x = 1 and x = 4 gives:

1 − (λ/(λ + 1))^α = 7/20
1 − (λ/(λ + 4))^α = 16/20

Rearranging gives

α ln(λ/(λ + 1)) = ln(13/20)
α ln(λ/(λ + 4)) = ln(4/20)

Dividing gives

ln(λ/(λ + 1)) / ln(λ/(λ + 4)) = ln(13/20) / ln(4/20)

ie. ln(λ/(λ + 1)) − [ln(13/20)/ln(4/20)] ln(λ/(λ + 4)) = 0

which can be solved numerically for λ̃, with α̃ then obtained from α̃ = ln(13/20)/ln(λ̃/(λ̃ + 1)).
These estimates are very different to the maximum likelihood estimates, which may be
due to the small sample size or may indicate that the Pareto is a poor fit to the data.
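The final equation can be solved numerically, eg. by bisection (a Python sketch; the bracket [1, 30] is an assumption, justified by the sign change of g on it). It gives λ̃ just under 20 and α̃ ≈ 8.8:

```python
import math

k = math.log(13 / 20) / math.log(4 / 20)   # ratio of the two log equations

def g(lam):
    # the equation from the solution above, to be driven to zero
    return math.log(lam / (lam + 1)) - k * math.log(lam / (lam + 4))

lo, hi = 1.0, 30.0                          # g changes sign on this bracket
for _ in range(100):
    mid = (lo + hi) / 2
    if g(lo) * g(mid) <= 0:
        hi = mid
    else:
        lo = mid
lam_hat = (lo + hi) / 2
alpha_hat = math.log(13 / 20) / math.log(lam_hat / (lam_hat + 1))
```

The fitted distribution then reproduces the two matched percentiles, F(1) = 7/20 and F(4) = 16/20, exactly.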
Lecture Exercise 2
n = 5
X̄ = 1.6
For the gamma prior: mean = α/β, variance = α/β²

(i) mean = 1, variance = 1, so α = 1, β = 1

µ̂_B = [5/(5 + 1)] × 1.6 + [1/(5 + 1)] × 1 = (5/6)(1.6) + (1/6)(1) = 1.5

(ii) mean = 1, variance = 0.25, so α = 4, β = 4

µ̂_B = [5/(5 + 4)] × 1.6 + [4/(5 + 4)] × 1 = (5/9)(1.6) + (4/9)(1) = 1.333
Reasonableness Check:
In both cases the mean of the prior distribution is 1, so our best estimate of the parameter
in the absence of sample evidence is the same in (i) and (ii). However, in case (ii) we have
much more confidence in this prior estimate than in case (i), as represented by the lower
variance of the prior distribution. Thus in case (ii) we will place more emphasis on the
prior distribution and less on the direct sample evidence as compared with case (i), ie. we
will need more sample evidence to alter our prior belief. This is the case, as 1.333 is closer
to the prior mean than 1.5. A second reasonableness check is that both answers should be
between 1 (the prior mean) and 1.6 (the sample mean).
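The solution can be checked with a small helper that converts the prior mean and variance to the gamma parameters (Python sketch; the function name is ours):

```python
def bayes_poisson_estimate(n, xbar, prior_mean, prior_var):
    """Convert a gamma prior's mean/variance to (alpha, beta) and return
    the posterior mean (n*xbar + alpha) / (n + beta)."""
    beta = prior_mean / prior_var     # gamma: mean = a/b, variance = a/b^2
    alpha = prior_mean * beta
    return (n * xbar + alpha) / (n + beta)
```

Case (i) gives 1.5 and case (ii) gives 1.333, as above.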
Lecture Exercise 3

L(µ) = (e^(−µ))^8777 · (µ e^(−µ))^1143 · (1 − e^(−µ) − µ e^(−µ))^80

ln L = −8777µ + 1143(ln µ − µ) + 80 ln(1 − e^(−µ) − µ e^(−µ))

Using the Excel Solver to maximise this expression gives µ̂ = 0.130656.
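The notes use the Excel Solver; the same maximisation can be sketched in Python with a simple ternary search, using the cell counts (8777, 1143, 80 out of 10,000) from the solution and assuming the maximum lies in (0.01, 1):

```python
import math

def loglik(mu):
    # ln L for cells 0, 1 and 2+ claims, as in the solution above
    return (-8777 * mu
            + 1143 * (math.log(mu) - mu)
            + 80 * math.log(1 - math.exp(-mu) - mu * math.exp(-mu)))

lo, hi = 0.01, 1.0
for _ in range(200):                      # ternary search for the maximum
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if loglik(m1) < loglik(m2):
        lo = m1
    else:
        hi = m2
mu_hat = (lo + hi) / 2
```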
Lecture Exercise 4
χ² = (8777 − 10000 e^(−µ))² / (10000 e^(−µ))
   + (1143 − 10000 µ e^(−µ))² / (10000 µ e^(−µ))
   + (80 − 10000(1 − e^(−µ) − µ e^(−µ)))² / (10000(1 − e^(−µ) − µ e^(−µ)))

Minimising this expression using the Excel Solver gives µ̃ = 0.130660. This is very close to the maximum likelihood estimate.
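Similarly, the minimum chi-square estimate can be found by minimising the statistic above with the same kind of search (a Python sketch, again assuming the minimum lies in (0.01, 1)):

```python
import math

def chi2(mu):
    # chi-square statistic for the three cells, as in the solution above
    e = math.exp(-mu)
    expected = [10000 * e, 10000 * mu * e, 10000 * (1 - e - mu * e)]
    observed = [8777, 1143, 80]
    return sum((o - x) ** 2 / x for o, x in zip(observed, expected))

lo, hi = 0.01, 1.0
for _ in range(200):                      # ternary search for the minimum
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if chi2(m1) > chi2(m2):
        lo = m1
    else:
        hi = m2
mu_tilde = (lo + hi) / 2
```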
Lecture Exercise 5
Var(µ̂) = Var(X̄)
       = Var( (1/n) Σ_{i=1}^n X_i )
       = (1/n²) · n Var(X_i)   (assuming that the X_i are independent)
       = Var(X)/n
       = µ/n   since Var(X) = µ for the Poisson distribution

so the standard error of µ̂ is √(µ/n), estimated by √(µ̂/n).
Lecture Exercise 6
f(x; µ) = e^(−µ) µ^x / x!

ln f(x; µ) = −µ + x ln µ − ln x!

d/dµ ln f(x; µ) = x/µ − 1

d²/dµ² ln f(x; µ) = −x/µ²

so E[ −d²/dµ² ln f(X; µ) ] = E(X)/µ² = 1/µ, and the approximate variance is 1/(n × 1/µ) = µ/n, which agrees with the exact result in Lecture Exercise 5.
Lecture Exercise 7
L(θ) = ∏_{i=1}^n (2 θ^(3/2)/Γ(3/2)) x_i² e^(−θ x_i²)
     = (2/Γ(3/2))^n θ^(3n/2) (∏ x_i²) e^(−θ Σ x_i²)

ln L = n ln(2/Γ(3/2)) + (3n/2) ln θ + 2 Σ ln x_i − θ Σ x_i²

d/dθ ln L = 3n/(2θ) − Σ x_i²

So, setting this equal to zero,

θ̂ = 3n / (2 Σ X_i²)

For the approximate variance of θ̂:

ln f(x; θ) = ln(2/Γ(3/2)) + (3/2) ln θ + 2 ln x − θ x²

d/dθ ln f(x; θ) = 3/(2θ) − x²

d²/dθ² ln f(x; θ) = −3/(2θ²)

So Var(θ̂) ≈ 1 / ( n · 3/(2θ̂²) ) = 2θ̂² / (3n)

An approximate 95% confidence interval for θ is θ̂ ± t_{n−1} √(2θ̂²/(3n)), where t_{n−1} is the upper 2.5% point from a t-distribution with n − 1 degrees of freedom.
Lecture Exercise 8

µ̂ = x̄ = (0×89 + 1×143 + … + 7×1) / (89 + 143 + … + 1) = 1.505

x_i   f_i   Pr(X = x_i)   Expected frequency
0     89    0.2220        88.8
1     143   0.3341        133.6
2     94    0.2514        100.6
3     42    0.1261        50.4
4     20    0.0475        19.0
5     8     0.0143        5.7
6     3     0.0036        1.4
7     1     0.0008        0.3
8     0     0.0001        0.1

Combining the cells for x ≥ 5, so that no expected frequency is less than 5, gives an observed frequency of 12 against an expected frequency of 7.5. Then

χ² = (89 − 88.8)²/88.8 + (143 − 133.6)²/133.6 + … + (12 − 7.5)²/7.5 = 5.25

With 6 cells after combining, and one parameter estimated from the data, the number of degrees of freedom is 6 − 1 − 1 = 4, and

χ²_{4, 0.05} = 9.488

Hence, we do not have enough evidence to reject the Poisson distribution at the 5% level.
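The χ² calculation can be reproduced in Python. Using unrounded Poisson probabilities the statistic comes out slightly above the 5.25 obtained from the rounded table values, with the same conclusion:

```python
import math

freqs = [89, 143, 94, 42, 20, 8, 3, 1, 0]           # observed f_i for x = 0..8
n = sum(freqs)
mu = sum(x * f for x, f in enumerate(freqs)) / n     # 1.505

probs = [math.exp(-mu) * mu ** x / math.factorial(x) for x in range(5)]
probs.append(1 - sum(probs))                         # pool x >= 5 (expected < 5)
obs = freqs[:5] + [sum(freqs[5:])]
exp = [n * p for p in probs]
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
```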
Lecture Exercise 9
f(x) = α (1 + x)^(−α−1)

F(x) = ∫_0^x α (1 + r)^(−α−1) dr = [ −(1 + r)^(−α) ]_0^x = 1 − (1 + x)^(−α)

L(α) = ∏_{i=1}^n α (1 + x_i)^(−α−1) = α^n ∏_{i=1}^n (1 + x_i)^(−α−1)

ln L = n ln α − (α + 1) Σ_{i=1}^n ln(1 + x_i)

d/dα ln L = n/α − Σ_{i=1}^n ln(1 + x_i)

α̂ = n / Σ_{i=1}^n ln(1 + x_i) = 8 / 7.223044373 = 1.107566227
[Figure omitted: a graph showing the empirical distribution function, which is a step function, together with F(x), the distribution function of the Pareto distribution with α̂ = 1.107566227, which is continuous.]
The maximum absolute difference between the empirical and fitted distribution functions is D_n = 0.2577.
The critical value is 1.36/√8 = 0.4808.
Since 0.2577 < 0.4808, there is insufficient evidence to reject the null hypothesis that the
model is a good fit to the data.
Section 4
Loss Models - IV
Exercises

1.
A motor insurance portfolio produces claim incidence data for 100,000 policies over one
year. The table below shows the observed number of policyholders making 0, 1, 2, 3, 4,
5, 6 or more claims in a year.
(a) Using the method of moments, estimate the parameter of a Poisson distribution to fit
the above data and then estimate the expected number of policies giving rise to the
different numbers of claims.
(b) Using the method of moments, estimate the two parameters of a negative binomial
distribution to fit the above data.
(c) Without doing any further calculations, explain why you would expect a negative
binomial distribution to fit the above data better than a Poisson distribution.
(d) Using a negative binomial distribution and the two parameters calculated in (b)
above, estimate the number of policies giving rise to the different numbers of
claims.
2.
The random variable X has the following density function:

f(x) = θ x exp(−θ x²/2),  x > 0

where θ is an unknown parameter (θ > 0). Below is a sample of thirty values of X.
0.753 1.847 0.793 1.420 0.183
0.873 0.325 1.268 0.597 0.819
0.636 0.833 0.957 0.378 1.049
1.149 0.915 0.715 1.097 1.644
1.207 1.517 0.427 1.338 0.263
0.676 0.515 0.472 1.001 0.557
(For this set of data, Σ_{i=1}^{30} x_i = 26.224 and Σ_{i=1}^{30} x_i² = 28.049.)
(a) (i) Find the maximum likelihood estimate of θ.
(b) Assume that the prior distribution of θ is gamma with parameters α = 12 and β = 21.
    (i) Find the posterior distribution of θ.
    (ii) Calculate the Bayesian estimate of θ with respect to quadratic loss.
(c) Discuss the difference between the estimates of θ in parts (a)(i) and (b)(ii).
3.
The following set of data represents all losses arising from a portfolio of marine insurance. The data have been recorded to the nearest $1,000 and are expressed in millions. You are planning to fit a Weibull distribution to the data.
1.860 2.714 2.780 3.128 3.149 3.274 3.424 3.890
3.950 3.989 4.791 4.976 4.994 5.047 5.091 5.130
5.139 5.172 5.194 5.196 5.334 5.372 5.599 5.655
5.735 5.780 5.799 5.920 5.953 5.954 6.297 7.190
7.240 7.689 7.911 8.377 8.680 8.745 9.237 9.639
For this set of data, n = 40, Σ_{i=1}^{40} x_i = 221.0 and Σ_{i=1}^{40} x_i² = 1,357.0.
If the random variable X has a Weibull distribution with parameters c and γ, then

f(x) = c γ x^(γ−1) exp(−c x^γ)
F(x) = 1 − exp(−c x^γ),  c > 0, γ > 0, x > 0
(a) Find the two expressions which must be equated to zero to solve for the maximum likelihood estimates ĉ and γ̂ of c and γ.
(b) Write an expression for E[X^n], where n is any positive integer, and show that

E[X^n] = Γ(1 + n/γ) / c^(n/γ)

Hence, state the two equations to be solved to give the method of moments estimates of c and γ.
(c) The equations in (a) and (b) cannot easily be solved to produce estimates of c and γ. By equating the theoretical and empirical distribution functions at the points x = 4 and x = 6, show that the percentile matching estimates of c and γ are 0.001330 and 3.87835 respectively.
5.
Test the fit of the Weibull distribution in Section 4 Exercise 3 (above), using the percentile matching estimates found in (c). Use the Kolmogorov-Smirnov test at 5% significance. (The critical value is 1.36/√n.)
Section 4
Loss Models - IV
Solutions to Exercises
1. (a)
Sample mean: x̄ = 0.13345
The population mean is λ, where λ is the parameter of the Poisson distribution, so

λ̂ = x̄ = 0.13345

The fitted probabilities and expected numbers of policies are:

Pr(X = 0) = e^(−λ̂) = 0.87507 → 87,507
Pr(X = 1) = λ̂ e^(−λ̂) = 0.11678 → 11,678
Pr(X = 2) = λ̂² e^(−λ̂)/2! = 0.00779 → 779
Pr(X = 3) = λ̂³ e^(−λ̂)/3! = 0.00035 → 35
Pr(X = 4) = λ̂⁴ e^(−λ̂)/4! = 0.00001 → 1
Pr(X = 5) = λ̂⁵ e^(−λ̂)/5! = 0.00000 → 0
Pr(X ≥ 6) = 0.00000 → 0
Total: 100,000
(b)
Sample mean = 0.13345
Sample variance = 0.14304

For the negative binomial distribution:
Population mean = k(1 − p)/p
Population variance = k(1 − p)/p²

Dividing the mean by the variance gives

p = 0.13345/0.14304 = 0.93295

and then k(1 − p)/p = 0.13345 gives

k = 1.85686
(c)
The mean and variance of a Poisson distribution are always equal (both equal to λ). The mean and variance of the negative binomial distribution are not equal; the variance is greater than the mean by a factor of 1/p. Since in our sample the variance is greater than the mean, the negative binomial distribution will give a better fit to the data.
(d)
eg. Pr(X = 2) = [Γ(k + 2)/(Γ(k) 2!)] p^k (1 − p)²
            = [(2.85686 × 1.85686)/2] × 0.93295^1.85686 × (1 − 0.93295)²
            = 0.01048
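The negative binomial probability can be checked using the gamma-function form of the binomial coefficient, which allows non-integer k (a Python sketch):

```python
import math

def nbinom_pmf(x, k, p):
    """Negative binomial pmf Pr(X = x) = C(k + x - 1, x) p^k (1-p)^x,
    with the binomial coefficient written via the gamma function."""
    coef = math.gamma(k + x) / (math.gamma(k) * math.factorial(x))
    return coef * p ** k * (1 - p) ** x

k, p = 1.85686, 0.93295   # method of moments estimates from part (b)
```

nbinom_pmf(2, k, p) reproduces the 0.01048 above, ie. about 1,048 of the 100,000 policies.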
2. (a) (i)

L(θ) = θ x1 e^(−θx1²/2) · θ x2 e^(−θx2²/2) · … · θ xn e^(−θxn²/2)
     = θ^n (∏_{i=1}^n x_i) e^(−(θ/2) Σ_{i=1}^n x_i²)

ln L = n ln θ + Σ ln x_i − (θ/2) Σ x_i²

d/dθ ln L = n/θ − ½ Σ x_i²

θ̂ = 2n / Σ x_i² = (2 × 30)/28.049 = 2.139
(b)(i)
The posterior distribution is proportional to L(θ) g(θ):

θ^n e^(−(θ/2)Σx_i²) · θ^(α−1) e^(−βθ) ∝ θ^(n+α−1) e^(−(β + ½Σx_i²)θ)

which is gamma with parameters n + α = 42 and β + ½Σx_i² = 35.0245.

(ii)
Bayesian estimate of θ = mean of the gamma(42, 35.0245) distribution
                      = 42/35.0245
                      = 1.199
(c)
The Bayesian estimate of 1.199 does not lie within the approximate 95% confidence interval for θ given by the maximum likelihood estimate, ie.

2.139 ± t29 × 0.391 = 2.139 ± 2.045 × 0.391 = (1.339, 2.939)

The prior distribution of θ is gamma with parameters 12 and 21, ie. it has mean 0.571 and standard deviation 0.165. The small standard deviation indicates a strong prior belief, ie. a high level of confidence in the prior estimate of 0.571.
If is actually in the region expressed by this strong prior belief then the Bayesian
estimate is better in the sense that it has a smaller mean square error. If the belief is
incorrect then the maximum likelihood estimate is better.
Note that for large n, θ̂ and θ̂_B are essentially the same:

θ̂ = 2n/Σ x_i²  and  θ̂_B = (n + α)/(β + ½Σ x_i²) ≈ n/(½Σ x_i²) = 2n/Σ x_i²
3. (a)

L = ∏_{i=1}^n c γ x_i^(γ−1) e^(−c x_i^γ) = c^n γ^n (∏ x_i^(γ−1)) e^(−c Σ x_i^γ)

ln L = n ln c + n ln γ + (γ − 1) Σ ln x_i − c Σ x_i^γ

d/dc ln L = n/c − Σ x_i^γ

d/dγ ln L = n/γ + Σ ln x_i − c Σ x_i^γ ln x_i

To find the MLEs, set both of these expressions equal to zero and solve for ĉ and γ̂.
(b)

E[X^n] = ∫_0^∞ x^n · c γ x^(γ−1) e^(−c x^γ) dx

Let y = x^γ, so dy = γ x^(γ−1) dx and x^n = y^(n/γ). Then

E[X^n] = ∫_0^∞ c y^(n/γ) e^(−cy) dy = Γ(1 + n/γ) / c^(n/γ)

since ∫_0^∞ [c^(1+n/γ)/Γ(1 + n/γ)] y^(n/γ) e^(−cy) dy represents the total area under a gamma(1 + n/γ, c) density, which is 1.
The equations to be solved for the method of moments estimates of c and γ are:

m1 = 221.0/40 = Γ(1 + 1/γ) / c^(1/γ)
m2 = 1,357.0/40 = Γ(1 + 2/γ) / c^(2/γ)
(c)
Equating the theoretical and empirical distribution functions at x = 4 and x = 6:

1 − e^(−c·4^γ) = 0.25, so e^(−c·4^γ) = 0.75
1 − e^(−c·6^γ) = 0.75, so e^(−c·6^γ) = 0.25

Taking logs:

c·4^γ = −ln 0.75
c·6^γ = −ln 0.25

Dividing:

(6/4)^γ = ln 0.25 / ln 0.75 = 4.81884167

γ = ln 4.81884167 / ln 1.5 = 3.87835

and then c = −ln 0.75 / 4^3.87835 = 0.001330.
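The percentile matching estimates follow directly from the two closed-form lines above (a Python sketch):

```python
import math

# gamma from (6/4)^gamma = ln(0.25)/ln(0.75); c from c * 4^gamma = -ln(0.75)
gamma_hat = math.log(math.log(0.25) / math.log(0.75)) / math.log(1.5)
c_hat = -math.log(0.75) / 4 ** gamma_hat
```

By construction, the fitted Weibull distribution function passes through F(4) = 0.25 and F(6) = 0.75.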
5.
D_n = 0.21370

The critical value is 1.36/√40 = 0.215035.

Since 0.21370 < 0.215035, we do not reject the null hypothesis that the Weibull distribution with c = 0.00133, γ = 3.87835 is a good fit to the data.
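Since the loss data and the fitted parameters are all given, D_n can be reproduced (a Python sketch using the marine insurance data from Exercise 3):

```python
import math

data = [1.860, 2.714, 2.780, 3.128, 3.149, 3.274, 3.424, 3.890,
        3.950, 3.989, 4.791, 4.976, 4.994, 5.047, 5.091, 5.130,
        5.139, 5.172, 5.194, 5.196, 5.334, 5.372, 5.599, 5.655,
        5.735, 5.780, 5.799, 5.920, 5.953, 5.954, 6.297, 7.190,
        7.240, 7.689, 7.911, 8.377, 8.680, 8.745, 9.237, 9.639]

c, g = 0.001330, 3.87835
F = lambda x: 1 - math.exp(-c * x ** g)   # fitted Weibull cdf

# D_n: compare (i-1)/n and i/n with F at each order statistic
n = len(data)
dn = max(max(abs(i / n - F(x)), abs((i - 1) / n - F(x)))
         for i, x in enumerate(sorted(data), start=1))
```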