4 Statistical Inference with Mortality Data
4.1 Introduction
In statistics, it is routine to model some observable quantity as a random variable $X$, to obtain a sample $x_1, x_2, \ldots, x_n$ of observed values of $X$ and then to draw conclusions about the distribution of $X$. This is statistical inference.
If the random variable in question is $T_x$, obtaining a sample of observed values means fixing a population of persons aged $x$ (homogeneous in respect of any important characteristics) and observing them until they die. If the age $x$ is relatively low, say below age 65, it would take anything from 50 to 100 years to complete the necessary observations. This is clearly impractical.
The available observations (data) almost always take the following form: in respect of the $i$th individual, we know the age $x_i$ at which they came under observation, how long they were observed, and whether observation ended in death or in some other way.
The fact that the $i$th individual is observed from age $x_i > 0$ rather than from birth means that observation of that lifetime is left-truncated. This observation cannot contribute anything to inference about human mortality at ages below $x_i$ (equivalently, about the distribution function $F_0(t)$ for $t \le x_i$).
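To make this explicit (a short sketch in the chapter's notation, writing $f_0$ for the density of $T_0$ and $S_0 = 1 - F_0$ for its survival function): an individual entering observation at age $x_i$ can inform us only about the distribution of $T_0$ conditional on survival to age $x_i$, whose density is

$$f_{T_0 \mid T_0 > x_i}(t) = \frac{f_0(t)}{S_0(x_i)}, \qquad t > x_i,$$

and which carries no information about $F_0(t)$ for $t \le x_i$.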
If observation of the $i$th individual ends at age $x_i + t_i$ when that individual is still alive, then observation of that lifetime is right-censored. We know that death must happen at some age greater than $x_i + t_i$, but we are unable to observe it.
Left-truncation and right-censoring are the key features of the mortality data that actuaries must analyse. (Left-truncation is a bigger issue for actuaries than for other users of survival models such as medical statisticians.) They matter because they imply that any probabilistic model capable of generating the data we actually observe cannot just be the family of random future lifetimes $\{T_x\}_{x \ge 0}$. The model must be expanded to allow for:
• the mechanism by which individuals enter observation (left-truncation); and
• the mechanism by which observation may end other than by death (right-censoring).
We may call this “the full model”. Although we may be interested only in that part of the full model that describes mortality, namely the distributions of the family $\{T_x\}_{x \ge 0}$ (or, given the consistency condition, the distribution of $T_0$), we cannot escape the fact that we may not sample values of $T_x$ at will, but only values of $T_x$ in the presence of left-truncation and right-censoring. Strictly, inference should proceed on the basis of an expanded probabilistic model capable of generating all the features of what we can actually observe. Figure 4.1 illustrates how left-truncation and right-censoring limit us to observing the density function of $T_0$ over a restricted age range, for a given individual who enters observation at age 64 and is last observed to be alive at age 89.
Intuitively, the task of inference about the lifetimes alone may be simplified if those parts of the full model accounting for left-truncation and right-censoring are, in some way to be made precise later, “independent” of the lifetimes $\{T_x\}_{x \ge 0}$. Then we may be able to devise methods of estimating the distributions of the family $\{T_x\}_{x \ge 0}$ without the need to estimate any properties of the left-truncation or right-censoring mechanisms. We next consider when it may be reasonable to assume such “independence”, first in the case of right-censoring.
[Figure 4.1 near here: two panels plotting probability density against age at death, ages 0–120.]

Figure 4.1 A stylised example of the full probability density function of a random lifetime $T_0$ from birth (top) and that part of the density function observable in respect of an individual observed from age 64 (left-truncated) to age 89 (right-censored) (bottom).
4.2 Right-Censoring
At its simplest, we have a sample of $n$ newborn people (so we can ignore left-truncation), and in respect of the $i$th person we observe either $T_0^i$ (their lifetime) or $C^i$ (the age at which observation is censored). Exactly how we should handle the censoring depends on how it comes about. Some of the commonest assumptions are these (they are not all mutually exclusive):
• Type I censoring. If the censoring times are known in advance then the mechanism is called Type I censoring. An example might be a medical study in which subjects are followed up for at most five years; in actuarial studies, the analogue is an investigation that ends on a fixed calendar date.
Censoring of whatever kind is usually part of the observational plan. As with any study that will generate data for statistical analysis, it is best to consider what probabilistic models and estimation methods are appropriate before devising an observational plan and collecting the data. In actuarial studies this rarely happens, and data are usually obtained from files maintained for business rather than for statistical reasons.
4.3 Left-Truncation
In simple settings, survival data have a natural and well-defined starting point.
For human lifetimes, birth is the natural starting point. For a medical trial,
the administration of the treatment being tested is the natural starting point.
Data are left-truncated if observation begins some time after the natural starting point. For example, a person buying a retirement annuity at age 65 may
enter observation at that age, so is not observed from birth. The lifetime of a
participant in a medical trial, who is not observed until one month after the
treatment was administered, is left-truncated at duration one month. Note that
in these examples, the age or duration at which left-truncation takes place may
be assumed to be known and non-random.
Left-truncation has two consequences, one concerning the validity of statistical inference, the other concerning methodology:
• By definition, a person who does not survive until they might have come
under observation is never observed. If there is a material difference between
people who survive to reach observation and those who do not, and this is
not allowed for, bias may be introduced.
• If lifetime data are left-truncated, we must then choose methods of inference
that can handle such data. All the methods discussed in this book can do so,
unless specifically mentioned otherwise.
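To illustrate what handling left-truncation involves (a minimal sketch, assuming a constant hazard $\mu$ purely for simplicity; the records and numbers below are hypothetical, not the book's data): each individual's likelihood contribution conditions on survival to their entry age, and under a constant hazard that conditioning cancels neatly.

```python
import math

def loglik_constant_hazard(mu, records):
    """Log-likelihood for left-truncated, right-censored observations under
    a constant hazard mu. Each record is (x_i, t_i, d_i): entry age, time
    observed, death indicator. Conditioning on survival to the entry age
    x_i cancels, leaving d_i*log(mu) - mu*t_i per individual."""
    return sum(d * math.log(mu) - mu * t for _x, t, d in records)

# Hypothetical observations: (entry age, time observed, death indicator).
records = [(60.5, 2.1, 0), (61.0, 0.7, 1), (60.0, 3.0, 0), (62.2, 1.4, 1)]

# The maximiser is total deaths over total time observed: a mortality ratio.
mu_hat = sum(d for *_, d in records) / sum(t for _, t, _ in records)
print(f"mu_hat = {mu_hat:.3f}, loglik = {loglik_constant_hazard(mu_hat, records):.3f}")
```

The resulting estimate, total deaths divided by total time observed, is exactly the kind of mortality ratio studied later in this chapter.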
Suppose we are investigating the mortality experience of a portfolio of term-
insurance policies. Consider two persons with term-insurance policies, both
age 30. One took out their policy at age 20, the other at age 29, so observation
of their lifetimes was left-truncated at ages 20 and 29, respectively. For the
purpose of inference, can we assume that at age 30 we may treat their future
lifetimes as independent, identically distributed random variables? Any actuary would say not identically distributed, because at the point of left-truncation such persons would just have undergone medical underwriting. The investigation would most likely lead to the adoption of a select life table.
In the example above, an event in a person's life history prior to entry into observation influences the distribution of their future lifetime. In this case, the event is the medical underwriting undergone when the policy was taken out.
[Figure 4.2 near here: a Lexis diagram, age (60–64) against calendar time (1 Jan 2005 to 1 Jan 2009), showing lifelines (i) and (ii).]
Figure 4.2 A Lexis diagram illustrating: (i) the lifetime of individual $i$ entering observation at age $x_i = 60.67$ years on 1 May 2005 and leaving observation by censoring at exact age 63.67 years on 1 May 2008; and (ii) the lifetime of individual $j$ entering observation at age $x_j = 60.25$ years on 1 July 2006 and leaving observation by death at exact age 62.5 years on 1 October 2008.
Definition 4.2 Define the random variable $T_i$ to be the length of time for which we observe the $i$th individual after age $x_i$.
We regard the observed data $(d_i, t_i)$ as being sampled from the distribution of the bivariate random variable $(D_i, T_i)$. This is consistent with the notational convention that random variables are denoted by upper-case letters, and sample values by the corresponding lower-case letters.
It is important to realise that $D_i$ and $T_i$ are not independent; they are the components of a bivariate random variable $(D_i, T_i)$. Their joint distribution depends on the maximum length of time for which we may observe the $i$th individual's lifetime. For example, suppose we collect data in respect of the calendar years 2008–2011. Suppose the $i$th individual enters observation at age $x_i$ on 29 October 2010. Then $T_i$ has a maximum value of 1 year and 63 days (1.173 years), and will attain it if and only if the $i$th individual is alive and under observation on 31 December 2011. In that case, also, $d_i = 0$. Suppose for the moment that no other form of right-censoring is present. Then the only other possible observation arises if the $i$th individual dies some random time $T_i < 1.173$ years after 29 October 2010. In that case, $t_i < 1.173$ years and $d_i = 1$. Since $t_i < 1.173$ if and only if $d_i = 1$, it is clear that $D_i$ and $T_i$ are dependent. Motivated by this example, we make the following additional definition.
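As a quick check of the date arithmetic in this example (a minimal sketch; the dates are those given above):

```python
from datetime import date

entry = date(2010, 10, 29)       # i-th individual enters observation
window_end = date(2011, 12, 31)  # data collected up to the end of 2011

max_days = (window_end - entry).days
print(max_days)                  # 428 days = 1 year and 63 days
print(round(max_days / 365, 3))  # 1.173 years, the maximum value of T_i
```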
$$\hat{r}_x = \frac{d_x}{E^c_x}. \qquad (4.4)$$
The notation $d_x$ for deaths is self-explanatory. The notation $E$ for the denominator stands for “exposed-to-risk”, since this is the conventional actuarial name for the denominator in a “raw” mortality ratio. The superscript $c$ further identifies this quantity as a “central exposed-to-risk”, which is the conventional actuarial name for exposure based on “time lived” rather than “number of lives” (Section 1.4). The choice of $r$ is to remind us that we are estimating a mortality ratio, and the “hat” adorning $r$ on the left-hand side of this equation is the usual way of indicating that $\hat{r}_x$ is a statistical estimate of some quantity based on data.
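The mechanics of equation (4.4) are easily illustrated (a sketch with hypothetical individual records; the numbers are invented for illustration): total the deaths and the times lived between ages $x$ and $x+1$, then divide.

```python
# Hypothetical records for individuals observed between ages x and x+1:
# (time lived in years, died? 1 : 0). Purely illustrative numbers.
records = [(1.00, 0), (0.42, 1), (1.00, 0), (0.77, 0), (0.31, 1), (1.00, 0)]

d_x = sum(d for _, d in records)    # observed deaths, d_x
E_c_x = sum(t for t, _ in records)  # central exposed-to-risk, E^c_x

r_hat_x = d_x / E_c_x               # crude mortality ratio, equation (4.4)
print(f"d_x = {d_x}, E^c_x = {E_c_x:.2f}, r_hat = {r_hat_x:.3f}")
```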
This leads to the following question. The quantity $\hat{r}_x$ is a statistical estimate of something, but what is that something? We shall see in Section 4.8 that under some reasonable assumptions $\hat{r}_x$ may be taken to be an estimate of the hazard rate at some (unspecified) age between $x$ and $x+1$. In actuarial practice, we usually call such a “raw” estimate a crude hazard rate. We might therefore rewrite equation (4.4) as:
$$\hat{r}_x = \hat{\mu}_{x+s} = \frac{d_x}{E^c_x}, \qquad (4.5)$$
but since $s$ is undetermined this would not be helpful. In some texts, this mortality ratio is denoted by $\hat{\mu}_x$, with the proviso that it does not actually estimate $\mu_x$, but we reject this as possibly confusing. If the hazard rate is not changing too rapidly between ages $x$ and $x+1$, which is usually true except at very high ages, it is reasonable to assume $s \approx 1/2$, and that $\hat{r}_x$ can reasonably be taken to estimate $\mu_{x+1/2}$.
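The intuition behind this, sketched here ahead of the fuller treatment in Section 4.8: writing $Y(x+u)$ for the number of individuals alive and under observation at age $x+u$, so that $E^c_x = \int_0^1 Y(x+u)\,du$, the expected number of deaths is an exposure-weighted average of the hazard over the year of age:

$$\mathrm{E}[D_x] = \int_0^1 \mu_{x+u}\,\mathrm{E}[Y(x+u)]\,du \approx \mu_{x+s} \int_0^1 \mathrm{E}[Y(x+u)]\,du = \mu_{x+s}\,\mathrm{E}[E^c_x]$$

for some $0 < s < 1$, by the mean value theorem for integrals; if $\mu$ is roughly linear over the year of age, $s$ is close to $1/2$.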
Because these crude estimates at successive ages are each based on finite samples, we may expect their sequence to be irregular. Experience suggests, however, that the hazard rate displays a strong and consistent pattern over the range of human ages. We may suppose (or imagine) that there is a “true” underlying hazard rate, which is a smooth function of age, and that the irregularity of the sequence of crude estimates is just a consequence of random sampling.
Figure 4.4 shows crude hazard rates as in equation (3.1) for single years of age, based on data for ages 65 and over in the Case Study. Data for males and females have been combined. The points on the graphs are values of $\log \hat{r}_x$, which we take to be estimates of $\log \mu_{x+1/2}$. The logarithmic scale shows clearly a linear pattern with increasing age, suggesting that an exponential function might be a candidate model for the hazard rate. On the left, the crude rates are calculated using the data from 2012 only, and on the right using the data from 2007–2012. Comparing the two, we see that the larger amount of data results in estimates $\log \hat{r}_x$ that lie much closer to a straight line.

[Figure 4.4 near here: two panels plotting log(mortality hazard) against age, ages 60–100.]

Figure 4.4 Logarithm of the crude mortality hazards for single years of age, for the Case Study, males and females combined. The left panel shows the experience data for 2012 only, showing a roughly linear pattern by age on a log scale. The right panel shows the experience data for 2007–2012, showing how a greater volume of data brings out a clearer linear pattern.
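The linear pattern on the log scale suggests a Gompertz-type hazard $\mu_x = e^{a+bx}$. A minimal sketch of checking this with synthetic data (the parameter values and noise level are assumptions, not the Case Study's): jitter a true log-linear hazard and recover the slope and intercept by least squares.

```python
import random

random.seed(1)

# Synthetic "true" Gompertz hazard mu_x = exp(a + b*x); a and b are
# assumed values chosen only for illustration.
a, b = -12.0, 0.11

ages = range(65, 96)
xs = [x + 0.5 for x in ages]  # r_hat_x is treated as estimating mu_{x+1/2}
# Crude log-rates: the true log-hazard plus noise mimicking finite samples.
log_crude = [a + b * x + random.gauss(0, 0.08) for x in xs]

# Ordinary least squares fit of log mu_{x+1/2} = a + b*(x + 1/2).
n = len(xs)
xbar, ybar = sum(xs) / n, sum(log_crude) / n
b_hat = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, log_crude))
         / sum((x - xbar) ** 2 for x in xs))
a_hat = ybar - b_hat * xbar
print(f"fitted a = {a_hat:.2f}, b = {b_hat:.3f}")  # should be near (-12, 0.11)
```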
Then we may attempt to bring together the consequences of all our assumptions:
• From our idea that the “true” hazard rate ought to be a smooth function of
age, we may propose a smooth function that we think has the right shape.
• From the probabilistic model for the observations, we calculate the sampling
distributions of the crude estimates of the hazard rates.
Example The Poisson model. An alternative approach does not impose any limits on the form of the data, other than those in Section 4.6. This is therefore an approach based on “time lived”. We assume instead that the hazard rate is constant between ages $x$ and $x+1$, and for brevity in this example we denote it by $\mu$. We then proceed by analogy with the theory of Poisson processes.
Consider a Poisson process with constant parameter $\mu$. If it is observed at time $t$ for a short interval of time of length $dt$, the probability that it is observed to jump in that time interval is $\mu\,dt + o(dt)$ (see Cox and Miller, 1987, pp. 153–154). This is exactly analogous to equation (3.16). However, a Poisson process can jump any positive integer number of times, while an individual can die just once. So our observation of the $i$th individual takes the form of observing a Poisson process with parameter $\mu$ between ages $x_i$ and $x_i + b_i$, with the stipulation that if death or right-censoring occurs between these ages, observation of the Poisson process ceases at once.
This form of observation does not conform to any of the standard univariate
random variables, so it does not have a name in the literature. Its natural setting
is in the theory of counting processes, which is now the usual foundation of
survival analysis (see Andersen et al., 1993, and Chapters 15 and 17).
What distribution does the aggregated data $D_x$ then have? $D_x$ is no longer the sum of identically distributed random variables. However, if the number of individuals $n$ is reasonably large, and $\mu$ is small, we can argue that $D_x$ should be, approximately, the number of jumps we would see if we were to observe a Poisson process with parameter $\mu$ for total time $E^c_x$. Thus, $D_x$ is distributed approximately as a Poisson($E^c_x \mu$) random variable.
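A small simulation makes the approximation plausible (a sketch under assumed values of $n$ and $\mu$, not taken from the book): simulate $n$ individuals with constant hazard $\mu$ over one year of age, right-censor survivors at the year end, and compare the mean and variance of the death count with the Poisson benchmark $E^c_x \mu$.

```python
import random

random.seed(42)

n, mu = 2_000, 0.05  # assumed cohort size and constant hazard over [x, x+1)
num_sims = 500

death_counts = []
for _ in range(num_sims):
    deaths, exposure = 0, 0.0
    for _ in range(n):
        t = random.expovariate(mu)  # lifetime beyond age x under hazard mu
        if t < 1.0:                 # dies within the year of age
            deaths += 1
            exposure += t
        else:                       # survives the year: right-censored
            exposure += 1.0
    death_counts.append(deaths)

mean_d = sum(death_counts) / num_sims
var_d = sum((d - mean_d) ** 2 for d in death_counts) / num_sims
# Under the Poisson approximation, E[D_x] and Var[D_x] are both close to
# E^c_x * mu (about 97.5 here); the simulated variance comes out slightly
# smaller, reflecting the fact that D_x <= n.
print(f"mean D_x = {mean_d:.1f}, variance of D_x = {var_d:.1f}")
```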
It is obvious that this can be only an approximation, since a Poisson random variable can take any non-negative integer value, whereas $D_x \le n$. When the expected number of deaths is very small, it cannot be relied upon; see Table 1.2 and the ensuing discussion for an example. In most cases, however, this approximation is very good indeed, so this Poisson model is also often taken as the starting point in the actuarial analysis of mortality data. As in the case of the binomial model, this overlooks the fact that the data are generated at the level of the individual.
In this book we will make much use of the Poisson model and models closely related to it, because of the form of their likelihood functions. Here, we just note that if we assume the Poisson model to hold, then $\mathrm{E}[D_x] = \mathrm{Var}[D_x] = E^c_x \mu$, and the mortality ratio $\hat{r}_x = d_x / E^c_x$ is an estimate of the parameter $\mu$, and we usually assume that $\mu$ is in fact $\mu_{x+1/2}$.
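To see why the likelihood is so tractable, here is the standard computation in sketch form (likelihoods are developed properly in later chapters): treating $E^c_x$ as fixed and $D_x$ as Poisson($E^c_x \mu$), the log-likelihood is

$$\ell(\mu) = d_x \log(E^c_x \mu) - E^c_x \mu + \text{constant},$$

and setting $\ell'(\mu) = d_x/\mu - E^c_x = 0$ gives $\hat{\mu} = d_x / E^c_x = \hat{r}_x$, the mortality ratio of equation (4.4).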
• If the population aged $x$ is known at the start and at the end of a calendar year, say $P_x$ and $P'_x$ respectively, then we may use the approximation:

$$E^c_x \approx \frac{P_x + P'_x}{2} \qquad (4.7)$$

in respect of that calendar year. The CMI until recently collected “scheduled” census data of this form from UK life offices and aggregated the results over four-year periods to estimate $\hat{r}_x$.
• If there is a national census, often every ten years, and reliable registration
data for births, deaths and migration, then the population in years after a
census can be estimated by applying the “movements” to the census totals.
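A minimal sketch of equation (4.7) in use (the population and death counts below are hypothetical):

```python
# Hypothetical census counts: number of lives aged x last birthday in force
# at the start of each calendar year. Purely illustrative numbers.
population = {2019: 1520, 2020: 1498, 2021: 1465}
deaths = {2019: 31, 2020: 36}  # deaths aged x in each calendar year

for year in sorted(deaths):
    # Equation (4.7): average the start- and end-of-year populations.
    exposure = (population[year] + population[year + 1]) / 2
    r_hat = deaths[year] / exposure
    print(f"{year}: E^c_x ~= {exposure:.0f}, crude rate = {r_hat:.4f}")
```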
We do not pursue this census-based approach further in this book, although it is still important for the reasons given at the start of Section 4.6.
Our main focus is on building models based on complete lifetime data. We
regard the fact that we were able to set out all the definitions needed for that in
one short section as indicating the relative simplicity of that approach.