ACST356 Section 4 Complete Notes
Section 4
Loss Models - IV
Complete Notes
OBJECTIVES
The objectives for Sections 1 to 5 are given in Section 1.
OPTIONAL READING
The optional reading for Sections 1 to 5 is given in Section 1.
OVERVIEW OF SECTION 4
We began the topic of estimation in Section 3 by considering methods of deriving
point estimates where we have a set of individual sample data. In that Section we
covered the maximum likelihood method, and we begin Section 4 by looking at two
alternatives for this scenario, method of moments and method of percentiles. We
then move on to Bayesian estimation, which uses not just sample data but also prior
beliefs. Methods that can be applied when we have grouped rather than individual data
are then considered. We finish estimation by considering the standard errors /
confidence intervals for our estimates.
An important feature of any model is that it fits the data closely. We conclude Section
4 by looking at two tests that assess how well a statistical model fits a set of data.
METHOD OF MOMENTS
Suppose there are r parameters to be estimated (eg. one parameter for the Poisson, two
for the Pareto, three for the Burr). Each of the first r sample moments is equated
to its theoretical value and the resulting equations solved for the r unknown parameters.
For example, for the Pareto(α, λ) distribution:

E[X] = λ/(α − 1)

E[X²] = 2λ²/((α − 1)(α − 2))

and the sample moments are m_j = (1/n) Σ_{i=1}^n x_i^j.
The two equations which, when solved, give the method of moments estimators α̃ and λ̃ are:

m1 = λ/(α − 1)
m2 = 2λ²/((α − 1)(α − 2))

which can be solved simultaneously to give the method of moments estimators

α̃ = 2(m2 − m1²)/(m2 − 2m1²)
λ̃ = m1 m2/(m2 − 2m1²)
(You don't need to remember these formulae; they are shown just to illustrate the method for a particular distribution – the Pareto.)
Method of moments estimators are often easier to calculate than maximum likelihood
estimators. However maximum likelihood estimators usually have the better statistical
properties.
Methods of moments estimators may give you a useful starting value for the parameters to
use in solving for the MLEs.
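As a quick illustration, the Pareto formulae above can be wrapped in a small function (a Python sketch; the function name is ours):

```python
def pareto_mom(m1, m2):
    """Method of moments estimates for the Pareto(alpha, lambda)
    distribution, where m1 and m2 are the first two sample moments
    m_j = (1/n) * sum(x_i ** j)."""
    denom = m2 - 2 * m1 ** 2
    alpha = 2 * (m2 - m1 ** 2) / denom
    lam = m1 * m2 / denom
    return alpha, lam
```

For a Pareto(3, 2), E[X] = 1 and E[X²] = 4, and the function recovers α = 3 and λ = 2 from those two moments.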
METHOD OF PERCENTILES
The method of percentile matching can be useful for estimating parameters where the
underlying distribution has an algebraic distribution function (eg. Weibull, Burr).
The method simply involves equating selected sample percentiles to the theoretical
distribution function at those points. It is like the method of moments, but instead of
equating sample and theoretical moments, we equate sample and theoretical percentiles.
Like the method of moments, the number of percentile points we use depends on the
number of parameters we need. If we only need to estimate one parameter, it is probably
best to equate the sample and theoretical medians (50th percentiles). If we need to estimate
two parameters, it might make sense to equate the sample and theoretical 25th and 75th
percentiles.
Lecture Exercise 1
Recalculate the estimates of α and λ for the Pareto distribution using the data in Lecture Exercise 10 of Section 3, now using percentile matching at the points x = 1 and x = 4.
The data is given again below:
0.06 0.11 0.17 0.43 0.75 0.87 0.99 1.06 1.08 1.18
1.27 1.54 1.72 2.11 3.32 3.78 4.97 5.04 9.82 14.72
So far, we have assumed that our estimates will be based solely on the sample data – we
have ignored all other sources of data. But sometimes, we may have other information
about the likely value of the parameter, which should be taken into account. To do this,
we use a Bayesian approach.
BAYES’ THEOREM
Say the event F can only occur if one of r mutually exclusive events E1, E2, …, Er occurs. Then

P(E_j | F) = P(F | E_j) P(E_j) / P(F),  j = 1, 2, …, r
(This is also similar to the expression used at the start of the derivation of the conditional
mean and variance formulae in Section 1.)
BAYESIAN ESTIMATION
We now want to use Bayes' theorem to estimate a parameter. Consider a random variable X which has a distribution depending on a parameter θ.
Previously (ie. for maximum likelihood, method of moments, and method of percentiles estimation) θ has been considered as an unknown constant. We calculate an estimate of θ using the sample data only.
Before we even collect any data we have a belief about the range of possible values of θ, which we summarise by the prior distribution. Bayesian estimation then combines this prior information with sample data, so that our final estimate of θ is a combination of our prior belief and the sample data.
COPYRIGHT MACQUARIE UNIVERSITY ACST3056 Section 4 Complete Notes, Page 3 of 29
The Bayesian estimate is denoted θ̂_B.
Let us have the prior belief that θ, the parameter to be estimated, has a distribution with p.d.f. g(θ). g(θ) is referred to as the prior density function.
By Bayes' theorem, the posterior density of θ given the sample is

g(θ | x1,…,xn) = h(x1,…,xn | θ) g(θ) / h(x1,…,xn) = L(θ) g(θ) / h(x1,…,xn) ∝ L(θ) g(θ)

where h(x1,…,xn) is the marginal density of the sample, which does not involve θ.
So now we have a posterior distribution for our unknown parameter which incorporates
both our sample data and our prior information.
CONJUGATE PRIOR
If the prior distribution leads to a posterior distribution of the same type as the prior
distribution, then the prior is called the conjugate prior of that particular likelihood
function. This often makes the calculation of the Bayesian estimator easier.
1. Poisson – Gamma
See below for a description of this scenario, and Lecture Exercise 2 for an example.
2. Normal – Normal
Suppose that we are estimating the mean μ of a Normal distribution, where we already know the variance is σ², and the prior distribution of μ is N(μ0, σ0²). The posterior distribution of μ is then N(μ1, σ1²), where

μ1 = (n x̄ σ0² + μ0 σ²) / (n σ0² + σ²)

(x̄ is the sample mean) and

σ1² = σ² σ0² / (n σ0² + σ²)
Both of these special cases will be very useful to us in the future, when we do credibility
theory, for estimating risk premiums.
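The Normal–Normal updating formulae can be sketched as follows (Python; the argument names are ours – sigma2 is the known sampling variance and sigma0_2 the prior variance):

```python
def normal_posterior(xbar, n, sigma2, mu0, sigma0_2):
    """Posterior N(mu1, sigma1_2) for the mean of a Normal sample with
    known variance sigma2, given a N(mu0, sigma0_2) prior for the mean."""
    denom = n * sigma0_2 + sigma2
    mu1 = (n * xbar * sigma0_2 + mu0 * sigma2) / denom
    sigma1_2 = sigma2 * sigma0_2 / denom
    return mu1, sigma1_2
```

With n = 0 the posterior is just the prior, and as n grows μ1 is pulled towards x̄.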
LOSS FUNCTIONS
We now have a posterior distribution for the parameter, but in order to derive a point
estimate we need to choose some central value of this distribution.
We can define a loss function as a measure of the seriousness of the difference between the true value of θ and the point estimate θ̂_B. The loss function should take a value of zero when the estimate is equal to the true value, and some positive value otherwise (which may increase as we move further away from the true value).
We choose a point estimate to minimise a certain posterior expected loss function (ie.
minimise the expected difference between our estimate and the true parameter).
If the loss function is given by (θ − θ̂_B)², then we wish to minimise E[(θ − θ̂_B)²] with respect to θ̂_B. In this case the solution for θ̂_B is the mean of the posterior distribution. This can be shown as follows, with all expectations being taken with respect to the posterior distribution of θ:

E[(θ − θ̂_B)²] = E[((θ − E(θ)) + (E(θ) − θ̂_B))²]
             = Var(θ) + (E(θ) − θ̂_B)²

(the cross term vanishes because E(θ) − θ̂_B is a constant), and this is minimised when θ̂_B = E(θ).
If the loss function is given by |θ − θ̂_B|, the solution for θ̂_B is the median of the posterior distribution.
If the loss function is 0 if θ̂_B = θ and 1 if θ̂_B ≠ θ, the solution for θ̂_B is the mode of the posterior distribution.
We will find the Bayesian estimator (with squared error loss) for the parameter µ of a Poisson distribution, where the prior of µ has a gamma distribution with parameters α and β.

g(µ) = β^α µ^(α−1) e^(−βµ) / Γ(α),  µ > 0

L(µ) = ∏_{i=1}^n e^(−µ) µ^(x_i) / x_i! = e^(−nµ) µ^(Σ x_i) / ∏_{i=1}^n x_i!

g(µ | x1,…,xn) ∝ g(µ) L(µ)
             ∝ e^(−nµ) µ^(Σ x_i) · µ^(α−1) e^(−βµ)   (ignoring terms not involving µ)
             = µ^(Σ x_i + α − 1) e^(−(n+β)µ)

which is the kernel of a gamma distribution with parameters Σ_{i=1}^n X_i + α and n + β. With squared error loss, the Bayesian estimator is the mean of this posterior distribution:

µ̂_B = (Σ X_i + α) / (n + β) = [n/(n + β)] · X̄ + [β/(n + β)] · (α/β)

which is a linear combination of µ̂ = X̄ and the prior mean α/β.
As n → ∞ (ie. as our sample becomes larger, and thus our sample mean X̄ will be a better estimate of the parameter µ), the weight n/(n + β) given to the sample data tends to 1.
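The weighted-average form of the Poisson–Gamma estimator can be sketched as (Python; the function name is ours):

```python
def poisson_gamma_posterior_mean(xs, alpha, beta):
    """Bayesian estimate (squared error loss) of a Poisson mean with a
    gamma(alpha, beta) prior, written as a weighted average of the
    sample mean and the prior mean alpha/beta."""
    n = len(xs)
    xbar = sum(xs) / n
    z = n / (n + beta)                       # weight on the sample mean
    return z * xbar + (1 - z) * (alpha / beta)
```

For a sample of 5 observations with mean 1.6 and an α = β = 1 prior, this returns 1.5, matching Lecture Exercise 2(i).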
Lecture Exercise 2
Calculate the Bayesian estimate (with squared error loss) for the parameter µ of a Poisson
distribution. A sample of 5 observations has given a mean of 1.6. The prior of µ has a
gamma distribution with (i) mean = 1, variance = 1 (ii) mean = 1, variance = 0.25. State a
reasonableness check on your answer.
GROUPED DATA
Previously we have considered how to derive a point estimate when we have a sample
where we have each individual sample value. We will now consider methods for the
situation where we don’t have each individual sample value but we know that a certain
number of observations lie within a certain range. It is also possible to deal with situations
where we have some individual and some grouped observations, eg. we may know that there are six observations above 10 and that the observations less than 10 are 1, 3, 5, 6, 7, 7, 7, 8, 9, 9, 9.
The likelihood function again represents the likelihood of observing our particular set of
sample observations.
However, whereas for individual data we used f(x_i; θ) for the likelihood of an observation of x_i, for grouped data we use

∫_a^b f(x; θ) dx

for the likelihood of an observation between a and b. In terms of the distribution function,

∫_{c_{i−1}}^{c_i} f(x; θ) dx = F(c_i; θ) − F(c_{i−1}; θ)

The p.d.f. f(x; θ) which we wish to fit to the loss distribution is defined for 0 < x < ∞. F(x; θ) represents the distribution function.

The frequencies are f1, f2, …, fk (with Σ_{i=1}^k f_i = n) in the classes (c0, c1], (c1, c2], …, (c_{k−1}, c_k].

L(θ) = ∏_{i=1}^k [ ∫_{c_{i−1}}^{c_i} f(x; θ) dx ]^{f_i} = ∏_{i=1}^k [ F(c_i; θ) − F(c_{i−1}; θ) ]^{f_i}
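The grouped-data log-likelihood above can be sketched generically (Python; `cdf` stands for any fitted distribution function F(·; θ), and the function name is ours):

```python
import math

def grouped_loglik(cdf, boundaries, freqs):
    """Grouped-data log-likelihood:
    ln L = sum_i f_i * ln( F(c_i) - F(c_{i-1}) ).
    boundaries has one more element than freqs."""
    total = 0.0
    for i, f in enumerate(freqs):
        p = cdf(boundaries[i + 1]) - cdf(boundaries[i])
        total += f * math.log(p)
    return total
```

To estimate θ, this function would be maximised over the parameter(s) inside `cdf` with a numerical optimiser.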
i 1
Lecture Exercise 3
Write an expression for the likelihood function based on the following data if a Poisson with parameter µ is the model being fitted. Hence, use Excel to find the maximum likelihood estimate of µ.
MINIMUM CHI-SQUARE ESTIMATION
The χ² statistic is calculated using the observed number in each cell, f_i, and the expected number in the corresponding cell, which is n[F(c_i) − F(c_{i−1})]:

χ² = Σ_{i=1}^k ( f_i − n[F(c_i) − F(c_{i−1})] )² / ( n[F(c_i) − F(c_{i−1})] )

The minimum chi-square estimate is the parameter value that minimises this statistic.
This expression will be non-linear in the unknown parameters and numerical methods will
be required to minimise the expression.
Lecture Exercise 4
Write an expression for the chi-square statistic based on the data in Lecture Exercise 3 if a Poisson with parameter µ is the model being fitted. Hence, use Excel to find the minimum chi-square estimate of µ.
STANDARD ERRORS
If the estimator has a simple form, the variance may be able to be calculated directly.
Lecture Exercise 5
Find the standard error of ̂ , the maximum likelihood estimator of the Poisson
distribution from Section 3 Lecture Exercise 8.
If the variance cannot be calculated directly, we may in some cases be able to estimate it.
For maximum likelihood estimators, we can use the result that if θ̂ is the maximum likelihood estimator of a parameter θ based on a sample X, then as n → ∞, θ̂ is asymptotically normally distributed with mean θ (ie. the estimator is unbiased) and variance

1 / ( n E[ −d²/dθ² ln f(X; θ) ] ) = 1 / ( E[ −d²/dθ² ln L(θ; X) ] )
A similar expression exists for the case where more than one parameter is being estimated,
but this is no longer part of the syllabus for this unit.
Lecture Exercise 6
Apply the asymptotic theory of this section to find the approximate variance of the
maximum likelihood estimator of the Poisson distribution of Section 3 Lecture Exercise 8.
(As shown in Lecture Exercise 5, the variance of this estimator may be calculated exactly.
We are using the approximation formula here just as practice in applying it.)
INTERVAL ESTIMATION
A point estimator is a random variable which is distributed around the true value of the
parameter. A point estimate can be used to obtain a confidence interval for this true value.
Remember that the outcome of Bayesian estimation is a posterior distribution – not just a
single point estimate. Since we have a whole distribution, we can easily find an interval
so that 95% of observations on that distribution fall within the interval. This interval
would then represent the confidence interval for the parameter.
Choose u and v so that the posterior probability that θ lies in the interval (u, v) is, say, 0.95. The interval (u, v) then provides a Bayesian interval estimate with posterior probability 0.95.
We could choose u and v so that there was an equal probability in each tail, or
alternatively so that we obtained the shortest interval.
Lecture Exercise 7
A random sample of n observations is taken from the distribution with p.d.f.

f(x; θ) = (2 θ^(3/2) / Γ(3/2)) x² exp(−θ x²),  x > 0

Find the maximum likelihood estimator of θ and an approximate 95% confidence interval for θ.
TESTING FIT
After we have chosen a model (one of our statistical distributions from Sections 1-3) and
fitted its parameters (using one of the methods in Sections 3-4), it makes sense that we
would want to test whether the final model seems to fit our data well enough.
We will consider two tests of fit.
CHI-SQUARE TEST
Suppose there are k classes with observed frequencies f1, …, fk (with Σ_{i=1}^k f_i = n).
The expected frequencies are n p_i, where p_i is the probability under H0 that the random variable will take a value in that interval.

χ² = Σ_cells (observed − expected)² / expected = Σ_{i=1}^k (f_i − n p_i)² / (n p_i)
If the expected frequency in any cell is less than say 5, this cell should be combined with
one or more cells until the condition is satisfied.
If the parameters of the distribution are estimated from the data, the number of degrees of
freedom will be reduced by one for each parameter estimated.
Naturally, fitting our parameters using the minimum chi-square method will minimise the
probability of rejecting the model based on the chi-square test, relative to other methods
of estimation.
Lecture Exercise 8
Use a chi-square test at the 5% level of significance to test whether a Poisson distribution is a good fit to the following data:

x_i   f_i
0     89
1     143
2     94
3     42
4     20
5     8
6     3
7     1
8     0
KOLMOGOROV-SMIRNOV TEST
Let F_n(x) be the empirical distribution function, ie. the distribution function based on the sample data, and let F0(x) be the distribution function of the fitted model. Define

D_n = max_x | F_n(x) − F0(x) |

D_n measures the maximum discrepancy between the sample distribution function and the theoretical distribution. Thus a large value of D_n indicates that the model is a poor fit to the data.
Approximate critical values (which are a function of n, the sample size) are given on page 124 of Klugman et al. (2005) and will be provided in tests or examinations if required.
Klugman et al. (2005, p. 124) state that "the major problem with the Kolmogorov-Smirnov test is its lack of power ... for small samples it is unlikely that a model will be rejected".
Unlike the chi-square test, which may be applied when the model is either discrete or
continuous, the Kolmogorov-Smirnov test is intended for use for continuous models only.
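A minimal sketch of the D_n calculation (Python; `cdf` is the fitted model F0, and the function name is ours). Because the empirical distribution function is a step function, the maximum discrepancy occurs at a data point, so it suffices to compare (i−1)/n and i/n with F0 at each order statistic:

```python
def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov statistic D_n = max_x |F_n(x) - F0(x)|,
    evaluated just before and at each order statistic."""
    xs = sorted(sample)
    n = len(xs)
    dn = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        dn = max(dn, abs(i / n - f0), abs((i - 1) / n - f0))
    return dn
```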
Lecture Exercise 9
A Pareto(α, λ = 1) distribution is being fitted to the following loss amounts (in $'000) using maximum likelihood estimation. Test the fit of this model using the Kolmogorov-Smirnov test at 5% significance.

Klugman gives the critical value as 1.36/√n, although this assumes that n > 15. Use this critical value here as an illustration of the method, but please be aware that it is not appropriate for such a small sample.
Section 4
Loss Models –IV
Solutions to Lecture Exercises
Lecture Exercise 1
F(x) = 1 − (λ/(λ + x))^α

Equating the theoretical and sample distribution functions at the points x = 1 and x = 4 gives:

1 − (λ/(λ + 1))^α = 7/20
1 − (λ/(λ + 4))^α = 16/20

Rearranging gives

α ln(λ/(λ + 1)) = ln(13/20)
α ln(λ/(λ + 4)) = ln(4/20)

Dividing gives

ln(λ/(λ + 1)) / ln(λ/(λ + 4)) = ln(13/20) / ln(4/20)

ie. ln(λ/(λ + 1)) − [ln(13/20)/ln(4/20)] ln(λ/(λ + 4)) = 0

which can be solved numerically for λ̃, with α̃ then obtained from α̃ = ln(13/20)/ln(λ̃/(λ̃ + 1)).
These estimates are very different to the maximum likelihood estimates, which may be
due to the small sample size or may indicate that the Pareto is a poor fit to the data.
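The final equation can be solved numerically, eg. by bisection (a Python sketch; the bracket [1, 30] is an assumption, justified by the sign change of g on it). It gives λ̃ just under 20 and α̃ ≈ 8.8:

```python
import math

k = math.log(13 / 20) / math.log(4 / 20)   # ratio of the two log equations

def g(lam):
    # the equation from the solution above, to be driven to zero
    return math.log(lam / (lam + 1)) - k * math.log(lam / (lam + 4))

lo, hi = 1.0, 30.0                          # g changes sign on this bracket
for _ in range(100):
    mid = (lo + hi) / 2
    if g(lo) * g(mid) <= 0:
        hi = mid
    else:
        lo = mid
lam_hat = (lo + hi) / 2
alpha_hat = math.log(13 / 20) / math.log(lam_hat / (lam_hat + 1))
```

The fitted distribution then reproduces the two matched percentiles, F(1) = 7/20 and F(4) = 16/20, exactly.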
Lecture Exercise 2
n = 5
X̄ = 1.6
For the gamma prior: mean = α/β, variance = α/β²

(i) mean = 1, variance = 1, so α = 1, β = 1

µ̂_B = [5/(5 + 1)] × 1.6 + [1/(5 + 1)] × 1 = (5/6)(1.6) + (1/6)(1) = 1.5

(ii) mean = 1, variance = 0.25, so α = 4, β = 4

µ̂_B = [5/(5 + 4)] × 1.6 + [4/(5 + 4)] × 1 = (5/9)(1.6) + (4/9)(1) = 1.333
Reasonableness Check:
In both cases the mean of the prior distribution is 1, so our best estimate of the parameter
in the absence of sample evidence is the same in (i) and (ii). However, in case (ii) we have
much more confidence in this prior estimate than in case (i), as represented by the lower
variance of the prior distribution. Thus in case (ii) we will place more emphasis on the
prior distribution and less on the direct sample evidence as compared with case (i), ie. we
will need more sample evidence to alter our prior belief. This is the case, as 1.333 is closer
to the prior mean than 1.5. A second reasonableness check is that both answers should be
between 1 (the prior mean) and 1.6 (the sample mean).
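The solution can be checked with a small helper that converts the prior mean and variance to the gamma parameters (Python sketch; the function name is ours):

```python
def bayes_poisson_estimate(n, xbar, prior_mean, prior_var):
    """Convert a gamma prior's mean/variance to (alpha, beta) and return
    the posterior mean (n*xbar + alpha) / (n + beta)."""
    beta = prior_mean / prior_var     # gamma: mean = a/b, variance = a/b^2
    alpha = prior_mean * beta
    return (n * xbar + alpha) / (n + beta)
```

Case (i) gives 1.5 and case (ii) gives 1.333, as above.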
Lecture Exercise 3

L(µ) = (e^(−µ))^8777 · (µ e^(−µ))^1143 · (1 − e^(−µ) − µ e^(−µ))^80

ln L = −8777µ + 1143(ln µ − µ) + 80 ln(1 − e^(−µ) − µ e^(−µ))

Using the Excel Solver to maximise this expression gives µ̂ = 0.130656.
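The notes use the Excel Solver; the same maximisation can be sketched in Python with a simple ternary search, using the cell counts (8777, 1143, 80 out of 10,000) from the solution and assuming the maximum lies in (0.01, 1):

```python
import math

def loglik(mu):
    # ln L for cells 0, 1 and 2+ claims, as in the solution above
    return (-8777 * mu
            + 1143 * (math.log(mu) - mu)
            + 80 * math.log(1 - math.exp(-mu) - mu * math.exp(-mu)))

lo, hi = 0.01, 1.0
for _ in range(200):                      # ternary search for the maximum
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if loglik(m1) < loglik(m2):
        lo = m1
    else:
        hi = m2
mu_hat = (lo + hi) / 2
```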
Lecture Exercise 4
χ² = (8777 − 10000 e^(−µ))² / (10000 e^(−µ))
   + (1143 − 10000 µ e^(−µ))² / (10000 µ e^(−µ))
   + (80 − 10000(1 − e^(−µ) − µ e^(−µ)))² / (10000(1 − e^(−µ) − µ e^(−µ)))

Minimising this expression using the Excel Solver gives µ̃ = 0.130660. This is very close to the maximum likelihood estimate.
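Similarly, the minimum chi-square estimate can be found by minimising the statistic above with the same kind of search (a Python sketch, again assuming the minimum lies in (0.01, 1)):

```python
import math

def chi2(mu):
    # chi-square statistic for the three cells, as in the solution above
    e = math.exp(-mu)
    expected = [10000 * e, 10000 * mu * e, 10000 * (1 - e - mu * e)]
    observed = [8777, 1143, 80]
    return sum((o - x) ** 2 / x for o, x in zip(observed, expected))

lo, hi = 0.01, 1.0
for _ in range(200):                      # ternary search for the minimum
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if chi2(m1) > chi2(m2):
        lo = m1
    else:
        hi = m2
mu_tilde = (lo + hi) / 2
```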
Lecture Exercise 5
Var(µ̂) = Var(X̄)
       = Var( (1/n) Σ_{i=1}^n X_i )
       = (1/n²) · n Var(X_i)   (assuming that the X_i are independent)
       = Var(X)/n
       = µ/n   since Var(X) = µ for the Poisson distribution

so the standard error of µ̂ is √(µ/n), estimated by √(µ̂/n).
Lecture Exercise 6
f(x; µ) = e^(−µ) µ^x / x!

ln f(x; µ) = −µ + x ln µ − ln x!

d/dµ ln f(x; µ) = x/µ − 1

d²/dµ² ln f(x; µ) = −x/µ²

so E[ −d²/dµ² ln f(X; µ) ] = E(X)/µ² = 1/µ, and the approximate variance is 1/(n × 1/µ) = µ/n, which agrees with the exact result in Lecture Exercise 5.
Lecture Exercise 7
L(θ) = ∏_{i=1}^n (2 θ^(3/2)/Γ(3/2)) x_i² e^(−θ x_i²)
     = (2/Γ(3/2))^n θ^(3n/2) (∏ x_i²) e^(−θ Σ x_i²)

ln L = n ln(2/Γ(3/2)) + (3n/2) ln θ + 2 Σ ln x_i − θ Σ x_i²

d/dθ ln L = 3n/(2θ) − Σ x_i²

So, setting this equal to zero,

θ̂ = 3n / (2 Σ X_i²)

For the approximate variance of θ̂:

ln f(x; θ) = ln(2/Γ(3/2)) + (3/2) ln θ + 2 ln x − θ x²

d/dθ ln f(x; θ) = 3/(2θ) − x²

d²/dθ² ln f(x; θ) = −3/(2θ²)

So Var(θ̂) ≈ 1 / ( n · 3/(2θ̂²) ) = 2θ̂² / (3n)

An approximate 95% confidence interval for θ is θ̂ ± t_{n−1} √(2θ̂²/(3n)), where t_{n−1} is the upper 2.5% point from a t-distribution with n − 1 degrees of freedom.
Lecture Exercise 8

µ̂ = x̄ = (0×89 + 1×143 + … + 7×1) / (89 + 143 + … + 1) = 1.505

x_i   f_i   Pr(X = x_i)   Expected frequency
0     89    0.2220        88.8
1     143   0.3341        133.6
2     94    0.2514        100.6
3     42    0.1261        50.4
4     20    0.0475        19.0
5     8     0.0143        5.7
6     3     0.0036        1.4
7     1     0.0008        0.3
8     0     0.0001        0.1

Combining the cells for x ≥ 5, so that no expected frequency is less than 5, gives an observed frequency of 12 against an expected frequency of 7.5. Then

χ² = (89 − 88.8)²/88.8 + (143 − 133.6)²/133.6 + … + (12 − 7.5)²/7.5 = 5.25

With 6 cells after combining, and one parameter estimated from the data, the number of degrees of freedom is 6 − 1 − 1 = 4, and

χ²_{4, 0.05} = 9.488

Hence, we do not have enough evidence to reject the Poisson distribution at the 5% level.
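The χ² calculation can be reproduced in Python. Using unrounded Poisson probabilities the statistic comes out slightly above the 5.25 obtained from the rounded table values, with the same conclusion:

```python
import math

freqs = [89, 143, 94, 42, 20, 8, 3, 1, 0]           # observed f_i for x = 0..8
n = sum(freqs)
mu = sum(x * f for x, f in enumerate(freqs)) / n     # 1.505

probs = [math.exp(-mu) * mu ** x / math.factorial(x) for x in range(5)]
probs.append(1 - sum(probs))                         # pool x >= 5 (expected < 5)
obs = freqs[:5] + [sum(freqs[5:])]
exp = [n * p for p in probs]
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
```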
Lecture Exercise 9
f(x) = α (1 + x)^(−α−1)

F(x) = ∫_0^x α (1 + r)^(−α−1) dr = [ −(1 + r)^(−α) ]_0^x = 1 − (1 + x)^(−α)

L(α) = ∏_{i=1}^n α (1 + x_i)^(−α−1) = α^n ∏_{i=1}^n (1 + x_i)^(−α−1)

ln L = n ln α − (α + 1) Σ_{i=1}^n ln(1 + x_i)

d/dα ln L = n/α − Σ_{i=1}^n ln(1 + x_i)

α̂ = n / Σ_{i=1}^n ln(1 + x_i) = 8 / 7.223044373 = 1.107566227
[Figure omitted: a graph showing the empirical distribution function, which is a step function, together with F(x), the distribution function of the Pareto distribution with α̂ = 1.107566227, which is continuous.]
The maximum absolute difference between the empirical and fitted distribution functions is D_n = 0.2577.
The critical value is 1.36/√8 = 0.4808.
Since 0.2577 < 0.4808, there is insufficient evidence to reject the null hypothesis that the
model is a good fit to the data.
Section 4
Loss Models - IV
Exercises

1.
A motor insurance portfolio produces claim incidence data for 100,000 policies over one
year. The table below shows the observed number of policyholders making 0, 1, 2, 3, 4,
5, 6 or more claims in a year.
(a) Using the method of moments, estimate the parameter of a Poisson distribution to fit
the above data and then estimate the expected number of policies giving rise to the
different numbers of claims.
(b) Using the method of moments, estimate the two parameters of a negative binomial
distribution to fit the above data.
(c) Without doing any further calculations, explain why you would expect a negative
binomial distribution to fit the above data better than a Poisson distribution.
(d) Using a negative binomial distribution and the two parameters calculated in (b)
above, estimate the number of policies giving rise to the different numbers of
claims.
2.
The random variable X has the following density function:

f(x) = θ x exp(−θ x²/2),  x > 0

where θ is an unknown parameter (θ > 0). Below is a sample of thirty values of X.
0.753 1.847 0.793 1.420 0.183
0.873 0.325 1.268 0.597 0.819
0.636 0.833 0.957 0.378 1.049
1.149 0.915 0.715 1.097 1.644
1.207 1.517 0.427 1.338 0.263
0.676 0.515 0.472 1.001 0.557
(For this set of data, Σ_{i=1}^{30} x_i = 26.224 and Σ_{i=1}^{30} x_i² = 28.049.)
(a) (i) Find the maximum likelihood estimate of θ.
(b) Assume that the prior distribution of θ is gamma with parameters α = 12 and β = 21.
    (i) Find the posterior distribution of θ.
    (ii) Calculate the Bayesian estimate of θ with respect to quadratic loss.
(c) Discuss the difference between the estimates of θ in parts (a)(i) and (b)(ii).
3.
The following set of data represents all losses arising from a portfolio of marine insurance. The data have been recorded to the nearest $1,000 and are expressed in millions. You are planning to fit a Weibull distribution to the data.
1.860 2.714 2.780 3.128 3.149 3.274 3.424 3.890
3.950 3.989 4.791 4.976 4.994 5.047 5.091 5.130
5.139 5.172 5.194 5.196 5.334 5.372 5.599 5.655
5.735 5.780 5.799 5.920 5.953 5.954 6.297 7.190
7.240 7.689 7.911 8.377 8.680 8.745 9.237 9.639
For this set of data, n = 40, Σ_{i=1}^{40} x_i = 221.0 and Σ_{i=1}^{40} x_i² = 1,357.0.
If the random variable X has a Weibull distribution with parameters c and γ, then

f(x) = c γ x^(γ−1) exp(−c x^γ)
F(x) = 1 − exp(−c x^γ),  c > 0, γ > 0, x > 0
(a) Find the two expressions which must be equated to zero to solve for the maximum likelihood estimates ĉ and γ̂ of c and γ.
(b) Write an expression for E[X^n], where n is any positive integer, and show that

E[X^n] = Γ(1 + n/γ) / c^(n/γ)

Hence, state the two equations to be solved to give the method of moments estimates of c and γ.
(c) The equations in (a) and (b) cannot easily be solved to produce estimates of c and γ. By equating the theoretical and empirical distribution functions at the points x = 4 and x = 6, show that the percentile matching estimates of c and γ are 0.001330 and 3.87835 respectively.
5.
Test the fit of the Weibull distribution in Section 4 Exercise 3 (above), using the percentile matching estimates found in (c). Use the Kolmogorov-Smirnov test at 5% significance. (The critical value is 1.36/√n.)
Section 4
Loss Models - IV
Solutions to Exercises
1. (a)
Sample mean: x̄ = 0.13345
The population mean is λ, where λ is the parameter of the Poisson distribution, so

λ̂ = x̄ = 0.13345

The fitted probabilities and expected numbers of policies are:

Pr(X = 0) = e^(−λ̂) = 0.87507 → 87,507
Pr(X = 1) = λ̂ e^(−λ̂) = 0.11678 → 11,678
Pr(X = 2) = λ̂² e^(−λ̂)/2! = 0.00779 → 779
Pr(X = 3) = λ̂³ e^(−λ̂)/3! = 0.00035 → 35
Pr(X = 4) = λ̂⁴ e^(−λ̂)/4! = 0.00001 → 1
Pr(X = 5) = λ̂⁵ e^(−λ̂)/5! = 0.00000 → 0
Pr(X ≥ 6) = 0.00000 → 0
Total: 100,000
(b)
Sample mean = 0.13345
Sample variance = 0.14304

For the negative binomial distribution:
Population mean = k(1 − p)/p
Population variance = k(1 − p)/p²

Dividing the mean by the variance gives

p = 0.13345/0.14304 = 0.93295

and then k(1 − p)/p = 0.13345 gives

k = 1.85686
(c)
The mean and variance of a Poisson distribution are always equal (both equal to λ). The mean and variance of the negative binomial distribution are not equal; the variance is greater than the mean by a factor of 1/p. Since in our sample the variance is greater than the mean, the negative binomial distribution will give a better fit to the data.
(d)
eg. Pr(X = 2) = [Γ(k + 2)/(Γ(k) 2!)] p^k (1 − p)²
            = [(2.85686 × 1.85686)/2] × 0.93295^1.85686 × (1 − 0.93295)²
            = 0.01048
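The negative binomial probability can be checked using the gamma-function form of the binomial coefficient, which allows non-integer k (a Python sketch):

```python
import math

def nbinom_pmf(x, k, p):
    """Negative binomial pmf Pr(X = x) = C(k + x - 1, x) p^k (1-p)^x,
    with the binomial coefficient written via the gamma function."""
    coef = math.gamma(k + x) / (math.gamma(k) * math.factorial(x))
    return coef * p ** k * (1 - p) ** x

k, p = 1.85686, 0.93295   # method of moments estimates from part (b)
```

nbinom_pmf(2, k, p) reproduces the 0.01048 above, ie. about 1,048 of the 100,000 policies.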
2. (a) (i)

L(θ) = θ x1 e^(−θx1²/2) · θ x2 e^(−θx2²/2) · … · θ xn e^(−θxn²/2)
     = θ^n (∏_{i=1}^n x_i) e^(−(θ/2) Σ_{i=1}^n x_i²)

ln L = n ln θ + Σ ln x_i − (θ/2) Σ x_i²

d/dθ ln L = n/θ − ½ Σ x_i²

θ̂ = 2n / Σ x_i² = (2 × 30)/28.049 = 2.139
(b)(i)
The posterior distribution is proportional to L(θ) g(θ):

θ^n e^(−(θ/2)Σx_i²) · θ^(α−1) e^(−βθ) ∝ θ^(n+α−1) e^(−(β + ½Σx_i²)θ)

which is gamma with parameters n + α = 42 and β + ½Σx_i² = 35.0245.

(ii)
Bayesian estimate of θ = mean of the gamma(42, 35.0245) distribution
                      = 42/35.0245
                      = 1.199
(c)
The Bayesian estimate of 1.199 does not lie within the approximate 95% confidence interval for θ given by the maximum likelihood estimate, ie.

2.139 ± t29 × 0.391 = 2.139 ± 2.045 × 0.391 = (1.339, 2.939)

The prior distribution of θ is gamma with parameters 12 and 21, ie. it has mean 0.571 and standard deviation 0.165. The small standard deviation indicates a strong prior belief, ie. a high level of confidence in the prior estimate of 0.571.
If is actually in the region expressed by this strong prior belief then the Bayesian
estimate is better in the sense that it has a smaller mean square error. If the belief is
incorrect then the maximum likelihood estimate is better.
Note that for large n, θ̂ and θ̂_B are essentially the same:

θ̂ = 2n/Σ x_i²  and  θ̂_B = (n + α)/(β + ½Σ x_i²) ≈ n/(½Σ x_i²) = 2n/Σ x_i²
3. (a)

L = ∏_{i=1}^n c γ x_i^(γ−1) e^(−c x_i^γ) = c^n γ^n (∏ x_i^(γ−1)) e^(−c Σ x_i^γ)

ln L = n ln c + n ln γ + (γ − 1) Σ ln x_i − c Σ x_i^γ

d/dc ln L = n/c − Σ x_i^γ

d/dγ ln L = n/γ + Σ ln x_i − c Σ x_i^γ ln x_i

To find the MLEs, set both of these expressions equal to zero and solve for ĉ and γ̂.
(b)

E[X^n] = ∫_0^∞ x^n · c γ x^(γ−1) e^(−c x^γ) dx

Let y = x^γ, so dy = γ x^(γ−1) dx and x^n = y^(n/γ). Then

E[X^n] = ∫_0^∞ c y^(n/γ) e^(−cy) dy = Γ(1 + n/γ) / c^(n/γ)

since ∫_0^∞ [c^(1+n/γ)/Γ(1 + n/γ)] y^(n/γ) e^(−cy) dy represents the total area under a gamma(1 + n/γ, c) density, which is 1.
The equations to be solved for the method of moments estimates of c and γ are:

m1 = 221.0/40 = Γ(1 + 1/γ) / c^(1/γ)
m2 = 1,357.0/40 = Γ(1 + 2/γ) / c^(2/γ)
(c)
Equating the theoretical and empirical distribution functions at x = 4 and x = 6:

1 − e^(−c·4^γ) = 0.25, so e^(−c·4^γ) = 0.75
1 − e^(−c·6^γ) = 0.75, so e^(−c·6^γ) = 0.25

Taking logs:

c·4^γ = −ln 0.75
c·6^γ = −ln 0.25

Dividing:

(6/4)^γ = ln 0.25 / ln 0.75 = 4.81884167

γ = ln 4.81884167 / ln 1.5 = 3.87835

and then c = −ln 0.75 / 4^3.87835 = 0.001330.
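The percentile matching estimates follow directly from the two closed-form lines above (a Python sketch):

```python
import math

# gamma from (6/4)^gamma = ln(0.25)/ln(0.75); c from c * 4^gamma = -ln(0.75)
gamma_hat = math.log(math.log(0.25) / math.log(0.75)) / math.log(1.5)
c_hat = -math.log(0.75) / 4 ** gamma_hat
```

By construction, the fitted Weibull distribution function passes through F(4) = 0.25 and F(6) = 0.75.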
5.
D_n = 0.21370

The critical value is 1.36/√40 = 0.215035.

Since 0.21370 < 0.215035, we do not reject the null hypothesis that the Weibull distribution with c = 0.00133, γ = 3.87835 is a good fit to the data.
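Since the loss data and the fitted parameters are all given, D_n can be reproduced (a Python sketch using the marine insurance data from Exercise 3):

```python
import math

data = [1.860, 2.714, 2.780, 3.128, 3.149, 3.274, 3.424, 3.890,
        3.950, 3.989, 4.791, 4.976, 4.994, 5.047, 5.091, 5.130,
        5.139, 5.172, 5.194, 5.196, 5.334, 5.372, 5.599, 5.655,
        5.735, 5.780, 5.799, 5.920, 5.953, 5.954, 6.297, 7.190,
        7.240, 7.689, 7.911, 8.377, 8.680, 8.745, 9.237, 9.639]

c, g = 0.001330, 3.87835
F = lambda x: 1 - math.exp(-c * x ** g)   # fitted Weibull cdf

# D_n: compare (i-1)/n and i/n with F at each order statistic
n = len(data)
dn = max(max(abs(i / n - F(x)), abs((i - 1) / n - F(x)))
         for i, x in enumerate(sorted(data), start=1))
```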