Professional Documents
Culture Documents
Survival Models: and Their Estimation
Survival Models: and Their Estimation
Third Edition
6.1 INTRODUCTION
Assume that the basic data regarding an observation period, dates of birth
and membership into the group under observation, and dates of death have
been analyzed to produce the ordered pair (xri , xsi ) for person i’s sched-
uled contribution to (x, x1], as illustrated in Figure 5.6. Recall that si œ 1 is
possible but si œ 0 is not, whereas, ri œ 0 is possible but ri œ 1 is not.
For person i, the probability of death while under observation in the
single-decrement environment is the conditional probability of death before
age xsi , given alive at age xri , which is si ri qxr i , and this is also the ex-
pected number of deaths from this sample of size one. This is easily
seen, since the number of deaths is one (with probability si ri qxri ), or
zero (with probability si ri pxri ). Thus the expected number is
1 † si ri q x ri 0 † s i r i p x r i œ s i r i q x r i . (6.1)
114 Incomplete Data Samples: Moment Procedures
If nx is the total number of persons who contribute to (x, x1], then the total
number of expected deaths is !
nx
si ri q x ri . (For convenience we will use n
iœ1
for nx in our summations.) When equated to the actual number of observed
deaths, we obtain the moment equation
where Dx is the random variable for deaths in (x, x1], and dx is the observed
number.
To solve (6.2) for our estimate of qx , we will use the approximation
iœ1
E [Dx ] œ nx † qx œ dx , (6.6)
estimator of the conditional mortality probability qx , where the model for the
likelihood is a simple binomial model. Thus the number of persons in the
sample, nx , can be thought of as a number of binomial trials. The standard
characteristics of a binomial model are assumed to apply. Thus each trial is
considered to be independent, and the probability of death on a single trial
(qx ) is assumed constant for all trials. In such a situation, the sample pro-
portion of deaths, which is given by (6.7), is a natural estimator for this
parameter qx .
If si œ 1 for all, but ri 0 for some of the nx persons who contribute
to (x, x1], then we have si ri qxri œ 1ri qxri , and (6.2) becomes
which is the same as (3.77). Substituting (6.9) into (6.8), we obtain the result
dx
! (1ri )
q^ x œ n , (6.10)
iœ1
E [Dx ] œ " si qx œ dx .
n
(6.11)
iœ1
si q x œ si † qx , (6.12)
which is the same as (3.54). When (6.12) is substituted into (6.11) we obtain
the result
q^ x œ ndx ,
! si
(6.13)
iœ1
EXAMPLE 6.1 From the sample of five persons whose basic data are
given below, estimate q30 , using an observation period of calendar year 1997.
Person Date of Birth Date of Death
1 July 1, 1967 —
2 April 1, 1967 —
3 January 1, 1967 October 1, 1997
4 July 1, 1966 —
5 April 1, 1966 April 22, 1997
- - - - qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq
ñ ñ | ñ ñ ñ | ----
30.25 30.50 30.75
30 31
1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ñ
2 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ñ
3~~~~~~~~~~~~~~~~~~~~~~~~Œ~~~~~~~ ñ
4~~~~~~~~~~~~~~~~ñ
5 ~~~~~~~ñ - - - Œ
FIGURE 6.1
The ordered pairs (ri , si ) are (0, .50), (0, .75), (0, 1), (.50, 1), and (.75, 1), for
i œ 1,2,3,4,5, respectively. Thus the denominator of Estimator (6.5) is
.50 .75 .1.00 .50 .25 œ 3.00. The numerator of (6.5) is dx œ 1, be-
cause person 5 died after reaching age 31. Thus q^30 œ 13 .
6.2.3 Exposure
The concept of exposure was defined in Section 3.3.5 in the context of a life
table model. At this time, and at several other places throughout the text, we
will make use of this concept in the context of a study sample.
The ordered pair (ri , si ) for person i represents the interval of age,
within (x, x1], over which person i is potentially under observation. We say
that person i is exposed to the risk of death from age xri to age xsi .
Numerically, si ri is the length of time (in years) that person i is exposed, so
it also represents the amount of exposure, measured in units of life-years,
Moment Estimation in a Single-Decrement Environment 117
iœ1
The exposure interpretation of the denominators of Estimators (6.7),
(6.10), and (6.13) is the same as that for Estimator (6.5).
It is important to recognize that the denominator of the moment
estimator represents the scheduled exposure of the sample. When a sample
member dies while under observation, not all of the scheduled exposure is
realized. Henceforth in this text we will refer to the realized exposure as
exact exposure, to distinguish it from scheduled exposure. Note that exact
exposure cannot exceed scheduled exposure.
EXAMPLE 6.2 Calculate the exact exposure, for the interval (30,31], of
the sample described in Example 6.1.
SOLUTION Only person 3 has exact exposure less than scheduled expo-
sure, and it is less by .25 years. Thus the total exact exposure of the sample
is 2.75.
^x
m
q^ x œ 1 ^x , (6.14)
1 2 †m
which uses the linear distribution assumption for jxs . Alternatively, if we as-
sume an exponential jxs (the constant force of mortality assumption), then
^ œ m
this constant force, ., is the same as mx , so that . ^ x , and thus
6.2.4 Grouping
Consider Special Case B in which the nx persons who contribute to (x, x1]
enter this interval at various ages, but all are scheduled to exit the interval at
age x1. That is, all are scheduled to survive (x, x1]; none are scheduled to
be study enders at an age zi such that x zi x1. It may reasonably be
assumed that many of these nx persons enter the interval at exact age x (i.e.,
have ri œ 0); let us say that bx is the subset of nx which has ri œ 0. (bx is
chosen to stand for beginner, since these persons are exposed from the
beginning of (x, x1].)
The remainder of the sample, say kx œ nx bx , all have ri such that
0 ri 1. Let us assume that the average value of these ri ’s is r, 0 r 1.
(Alternatively, it might be possible that the study was designed such that kx
persons all had ri œ r, and no averaging is needed.) This situation is
illustrated in Figure 6.2.
bx Æ ~~~~~~~~~~~~~~~~~~~~ñ
kx Æ ~~~~~~~~~ñ
- - - qqqqqqqqqqqqqqq
| | | ---
x xr x1
FIGURE 6.2
q^x œ b (1d
x
r) † kx . (6.17)
x
nx cx Æ ~~~~~~~~~~~~~~~~~~~~~~~ñ
cx Æ ~~~~~~~~~~~~~ñ
- - - qqqqqqqqqqqqqqqqqqqq
| | | ---
x xs x1
FIGURE 6.3
Under the linear assumption, s qx œ s † qx , (6.18) is easily solved for the esti-
mator
q^x œ n (1dx s) † c . (6.19)
x x
In this section we will explore the bias and the variance of the moment
estimators derived thus far in the chapter. It is assumed that the reader is fam-
iliar with these two properties of estimators. Formal definitions of them are
given in Appendix A.
A characteristic of the moment estimators for qx which we will
derive is that they are always unbiased under the assumption, or other ap-
proximation, used to derive them. We will demonstrate this for the general
form of the moment estimator given by (6.5).
To investigate bias, we need to compare the expected value of the es-
timator with the true value of the parameter, qx , which it seeks to estimate.
We are using q^x to represent the realized value of the estimator random
variable when the random variable for number of deaths, Dx , has realized
value dx . Thus we use Q^ for the estimator random variable, so that
x
120 Incomplete Data Samples: Moment Procedures
^ œ Dx
! (si ri )
Qx n (6.5a)
iœ1
! si ri q x ri qx † ! (si ri )
n n
^ ] œ
! (si ri ) ! (si ri )
iœ1 iœ1
E [Qx n œ n œ qx . (6.20)
iœ1 iœ1
This shows that Estimator (6.5) is unbiased under the approximation used to
derive it. It is biased to the extent that (6.3) deviates from the mortality distri-
bution to which the sample is actually subject. This last point is illustrated in
the following example.
iœ1
n
D 1ri qxri
^ œ
mator Q n
Dx ^ ] œ n E[ D x ] œ
. Then E[Q iœ1
n . If we assume
x x
D (1ri ) D (1ri ) D (1ri )
iœ1 iœ1 iœ1
^ ] œ q , so Estimator (6.10) is unbiased
that 1ri qxri œ (1ri ) † qx , then E[Qx x
under this assumption. However, as shown in Chapter 3, this assumption
implies that the force of mortality decreases over (x, x1). Thus if the force
is constant or increasing, as is true over most ages other than juvenile ages,
then 1ri qxri (1ri ) † qx . In this case we have E[Dx ] ! (1ri ) † qx , and
n
iœ1
n
D (1ri )†qx
^ ]
thus E[Q iœ1
n œ qx , which shows that, for most ages, Estimator
x
D (1ri )
iœ1
Dx œ " Di ,
n
(6.21)
iœ1
Var (Dx ) œ " Var (Di ) œ " si ri qxri (1 si ri qxri ),
n n
(6.22)
iœ1 iœ1
so that
! si ri qxri (1 si ri qxri )
n
^ ln , {r ,s }) œ iœ1
” !(si ri )•
Var (Qx x i i 2 . (6.23)
n
iœ1
The notation Var (Q^ ln , {r ,s }) is used to remind us that the variance is con-
x x i i
ditional on the size of the sample, nx , and the set of times at which persons
enter and are scheduled to exit (x, x1]. Expression (6.23) could then be sim-
plified by using approximation (6.3).
Alternatively, the variance of (6.5) can be approximated by noting
n
^
that Qx is approximately a binomial proportion with sample size D (si ri ).
iœ1
Thus the approximate variance is
^ ln , {r ,s }) ¸ px qx
! (si ri )
Var (Qx x i i n . (6.24)
iœ1
Expressions for the variances of Estimators (6.7), (6.10) and (6.13),
for Special Cases A, B, and C, respectively, are analogous to Expressions
(6.23) and (6.24), and their derivations are pursued in the exercises.
EXAMPLE 6.4 Find an exact expression for the variance of the Special
Case C estimator given by (6.19).
SOLUTION Let Dxw be the random variable for deaths out of the cx
group, and Dxww be the random variable for deaths out of the (nx cx ) group.
122 Incomplete Data Samples: Moment Procedures
Note that Dx œ Dxw Dxww , that Dxw and Dxww are both binomial random
variables, and that they are independent. From these properties we find that
Var (Dx ) œ Var (Dxw ) Var (Dxww ) œ cx † s qx (1s qx ) (nx cx ) † qx (1qx ), so
^ ln , c ) œ cx †s qx (1s qx ) (nx cx )†qx (1qx ) .
that Var (Q
x x x [n (1s)†c ]2
x x
We saw in Section 6.2.4 that the expected deaths out of the cx , all scheduled
to exit at age xs, was given by cx † s qx . The fractional-interval probability
required an assumption (such as the uniform) in order to solve the resulting
moment equation. As an alternative to making such a distribution assump-
tion, we now consider a different approach.
Assume that the cx scheduled enders are subject to the same proba-
bility of death in (x, x1] as are the remaining nx cx , namely qx . Now con-
sider all of the deaths out of cx in (x, x1]; let us call this dxc . Some of these
deaths occur before exit from the study, and are thus observed, whereas
others occur after exit, and are not observed. Let dxw denote the number of
observed deaths out of cx , and let ! represent the proportion of dxc that are
observed. Thus dxw œ ! † dxc .
Note well that dxc is not known, so ! is not calculated. Rather, ! is
an input parameter, presumably derived from other data, or merely assumed.
The assumption of ! takes the place of a distribution assumption.
Of the cx group, cx † qx are expected to die in (x, x1], and ! † cx † qx
is the expected number of observable deaths. (nx cx ) † qx is the expected
observable deaths out of nx cx , so total expected observable deaths, equated
to the actual observed deaths, gives the moment equation
Note that there are no observed enders in (x, x 1] from the nx cx group.
Moment Estimation in a Single-Decrement Environment 123
b È b2 4! † nx † dx
q^x œ , (6.28)
2! † nx
where b œ nx (1!) † ex ! † dx . The negative radical is used in (6.28)
because the positive radical would result in q^x 1.
It turns out that Estimator (6.28) is also a maximum likelihood
estimator, which we will show in Chapter 7 (see Exercise 7-9). We also defer
to Chapter 7 the derivation of the variance of this estimator (see Exercise 7-
22).
One advantage of Estimator (6.28) over Estimator (6.19) is that
(6.28) does not require knowing the value of cx , but rather depends on the
number of actually observed enders, ex . Also, it has some of the desirable
properties of an MLE. Although it does not require a distribution assump-
tion, as does (6.19), it does require an assumed value for !.
This estimator was derived by Schwartz and Lazar [66].
and
where Wx is the random variable for withdrawals in (x, x1], and wx denotes
the observed number of withdrawals in the sample.
A simple way to solve Equations (6.29a) and (6.29b) for estimates of
the double-decrement probabilities q(xd) and q(xw) would be to use approxima-
tions analogous to (6.3), namely
( d) ( d)
si ri qxri ¸ (si ri ) † qx (6.30a)
and
(w)
si r i q x ri ¸ (si ri ) † q(xw) , (6.30b)
producing
Moment Estimation in a Double-Decrement Environment 125
dx
! (si ri )
(d)
q^x œ n (6.31a)
iœ1
and
wx
! (si ri )
(w)
q^x œ n . (6.31b)
iœ1
If the random events, death and withdrawal, are each assumed to have uni-
form distributions over (x, x1) in the single-decrement context, we know
w (d) (ur) † qwx (d)
from Section 3.5 that ur qxr œ , 0 Ÿ r u Ÿ 1, and also that
1 r†qwx (d)
qwx (d) qwx (d)
.(xd) u œ , so that (1ur qwx
( d) ( d)
r ).xu œ . Then Equation
1 u†qwx (d) 1 r†qwx (d)
(5.14a) becomes
126 Incomplete Data Samples: Moment Procedures
œ (
s ( d) (w)
qxw 1 u † qxw
( d) † (w) du
r 1 r † qxw 1 r † qxw
( ’1 u † qxw “ du
( d) s
qxw
’1 r † qxw “ ’1 r † qxw “
(w)
œ ( d) (w)
r
’(sr) 12 (s 2 r 2 ) † qwx “
(d) (w)
qxw
’1 r † qxw “ ’1 r † qwx “
œ (d) (w)
. (6.32a)
’(sr) 12 (s 2 r 2 ) † qwx “
(w) ( d)
qxw
’1 r † qxw “’1 r † qxw “
(w)
s r q x r œ ( d) (w)
. (6.32b)
Substitution of (6.32a) and (6.32b) into the basic moment equations (6.29a)
and (6.29b), respectively, results in
and
q^x œ d
( d) x
nx (6.31c)
and
q^x œ w
(w) x
nx , (6.31d)
Moment Estimation in a Double-Decrement Environment 127
and
In Chapter 7 we will see that Estimators (6.34a) and (6.34b) are maximum
likelihood estimators.
Alternatively, we could simplify Equations (6.33a) and (6.33b) under
the Special Case A values of ri œ 0 and si œ 1 and solve the resulting
equations directly for our estimates of qwx (d) and qwx (w) given by Equations
(6.34a) and (6.34b). This alternative approach is pursued in the exercises.
EXAMPLE 6.6 100 persons enter the estimation interval (20,21] at exact
age 20. None are scheduled to exit before exact age 21, but one death and
two random withdrawals are observed before age 21. Assuming a uniform
distribution of both random events, estimate qw20(d) .
œ .(d) (
s
(d)
.(w) )
e(ur)(. du
r
(w) ’1 e “.
( d)
. (sr)(.(d) .(w) )
œ ( d) (6.35a)
. .
(w) ’1 e “.
(w) .(w) (sr)(.(d) .(w) )
s r q x r œ (d) (6.35b)
. .
Substitution of (6.35a) and (6.35b) into the basic moment Equations (6.29a)
and (6.29b), respectively, results in
Again, the necessity of a numerical solution of these equations for . ^ (d) and
^ (w) is obvious.
.
If Special Case A prevails, with ri œ 0 and si œ 1 for all i, then we
can easily obtain our estimates of qwx(d) and qwx(w) by substituting the estimates
of q(xd) and q(xw) , given by Equations (6.31c) and (6.31d) for q(xd) and q(xw)
themselves in Equations (5.28a) and (5.28b), producing,
œ 1 ’ nx dnxxwx “
w (d) dx Î(dx wx )
q^ x (6.37a)
and
œ 1 ’ nx dnxxwx “
w (w) wx Î(dx wx )
q^ x . (6.37b)
In Chapter 7 we will see that Estimator (6.37a) and (6.37b) are maximum
likelihood estimators.
Moment Estimation in a Double-Decrement Environment 129
EXAMPLE 6.7 Use the data of Example 6.6 to estimate qw20(d) , assuming
an exponential distribution for both death and withdrawal.
nx dx wx
SOLUTION Using Equation (6.37a), we have the values nx œ .97
dx w (d)
and dx wx œ 13 . Then q^20 œ 1 (.97)1Î3 œ .0101017.
Hoem deals with this dilemma by simply assuming that no one who
died in (x, x1] had a ti value that was less than si . In other words, all deaths
contribute (si ri ) to the exposure. Then the estimate of qwx (d) is produced as
the ratio of number of deaths to the exposure.
An example will make this procedure clear. The important point is
that all persons have a known value of si at entry into (x, x1], regardless of
the outcome for person i (survival, death or withdrawal). Each person’s value
of ti becomes known only if that person actually withdraws. Since ti can
never become known for a person who dies, si values are used for all deaths.
EXAMPLE 6.8 The basic data on a sample of four persons has been
analyzed to produce the following ordered pairs (ri , si ) with respect to the
estimation interval (x, x1]. In addition, it is known that one person died and
one person withdrew while under observation, at the times shown. Estimate
qwx (d) .
Time of Time of
Person (ri , si ) Death Withdrawal
1 (0, .40) — —
2 (0, 1) .70 —
3 (.30, 1) — —
4 (.50, .80) — .75
Of the various approaches to the estimation of qx over the interval (x, x1]
described in this text, the actuarial approach was the first to be developed,
dating from the middle of the nineteenth century. There are two significant
implications of this early origin:
It will be easy to see the reflection of these two observations in the features
of the actuarial method. The method was developed primarily in an intuitive
manner, in the absence of a guiding statistical theory, and a key theoretical
concession was made in the interest of simpler record-keeping and calcula-
tion. The traditional actuarial approach remained in use well into the second
half of the twentieth century, even after the availability of both high-speed
data processing capabilities and modern statistical theory had made the pro-
cess nearly obsolete. Over the past ten years, however, we have seen mount-
ing evidence to suggest that this pioneering method of survival model
estimation should become, and is becoming, of historical significance only.
Therefore the presentation of the method here is in the nature of a historical
summary. The reader interested in a fuller history can pursue this through
the many references given here.
We will present the actuarial method assuming a double-decrement
environment. The simplification that results if the environment is single-dec-
rement will be easily seen.
EXAMPLE 6.9 Three persons in a study sample all have the ordered pair
(ri , si ) = (0, .75). Person 1 remains in the sample to age x.75, Person 2
withdraws at age x.50, and Person 3 dies at age x.50. How much expo-
sure does each person contribute (a) under the actuarial approach; (b) under
Hoem’s moment approach?
SOLUTION (a) Person 1 contributes .75, from age x to age x.75; Person
2 contributes .50, from age x to age x.50, where exposure ceases due to
withdrawal; Person 3 contributes 1.00, from age x to age x1. (b) Person 1
and Person 2 again contribute .75 and .50, respectively; Person 3 contributes
only to scheduled exit age, which is .75, under the moment approach.
Cantelli [18] in 1914, and Balducci [5] a few years later. (Cantelli’s faulty
argument is not presented here; the interested reader will find it in [18].)
Cantelli’s argument, although faulty, appeared to have been widely
accepted, at least in North America. Subsequent papers by Wolfenden [75],
Beers [9] and Marshall [54], and textbooks by Gershenson [32] and Batten
[8], in effect repeated his argument. But the preponderance of modern
statistical thought indicates that Cantelli’s approach, although correct for
Special Cases A and B, is seriously flawed in the cases involving enders
within the estimation interval. (See, for example, Hoem [36] and Seal [68].)
In a single-decrement environment the possibility of withdrawals
does not exist. Then observed enders contribute exposure to their observed
ending ages, and observed deaths contribute exposure to age x1, regardless
of a possibly earlier scheduled ending age.