University of California, Berkeley, California 94720 Received 13 January 1977
ABSTRACT
Presented in this paper is a modification of the general illness-death model for the
study of chronic diseases. Coronary heart diseases (CHD) are used as an example. The
model contains two illness (transient) states, $S_1$, the “healthy” state, and $S_2$, the state
of having CHD, and two death states, $R_1$, death from other causes, and $R_2$, death from
CHD. Transitions between states are governed by intensity functions. The study
population consists of two groups of people: a group of $n_1$ people who are initially
healthy (in $S_1$), and a group of $n_2$ people who are affected with CHD (in $S_2$) at the time
of entering the study. The $n_1 + n_2$ people are subject to various lengths of time of
observation. The time of first symptom of CHD and the time of death are treated as
random variables. The likelihood function of these random variables has been derived,
and maximum likelihood estimators of the intensity functions have been obtained.
Formulas for the variances and covariances of the ML estimators have been obtained by
using the information matrix. Application of the model to an empirical data set is also
discussed.
I. INTRODUCTION
Statistical analysis of medical follow-up data is usually confined to
survival and death of each member of a study population; health conditions
of survivors and causes of death are not considered. However, when a study
is designed to investigate a particular disease [or a group of diseases, such as
coronary heart diseases (CHD)], either as a cause of death or for its
incidence and prevalence, distinction should be made between survivors
who are affected with the disease and those who are not, and between
deaths from CHD and deaths from other causes. Statistical methods have
been developed for dealing with such problems for large samples when the
*The research reported herein was performed pursuant to a grant (#003-P-20-2) from
the Department of Health, Education and Welfare, Washington, D.C. The opinions and
conclusions expressed herein are solely those of the authors, and should not be construed
as representing the opinions or policy of any agency of the United States government.
are discussed. The model is then applied to data from the Japanese
American Health Research Program, an epidemiological study of coronary
heart disease in the San Francisco Bay Area of Northern California.
$Q_{12}(0, T_{1j})$
= Pr{he will die from CHD in $(0, T_{1j})$}, satisfying the equation
$$P_{11}(0,T_{1j}) + P_{12}(0,T_{1j}) + Q_{11.1}(0,T_{1j}) + Q_{11.2}(0,T_{1j}) + Q_{12}(0,T_{1j}) = 1. \tag{1}$$
The subscript after the dot in $Q_{11.1}$ and $Q_{11.2}$ refers to the state $S_1$ or $S_2$ from
which one enters state $R_1$. These transition probabilities form the basis for
the analysis of morbidity and mortality data when the sample size is large
and the time of illness or death is unknown. When the time of occurrence of
an illness or death is an observable random variable, the corresponding
probability density functions and likelihood function need to be developed.
For this purpose, we introduce for each individual a vector of indicators
for $j = 1, \ldots, n_1$, corresponding to the five possible end results listed above, so
that
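$$\varepsilon_{11j} = \begin{cases} 1 & \text{if } S_1 \to S_1 \text{ is realized in } (0, T_{1j}), \\ 0 & \text{otherwise,} \end{cases}$$
$$\varepsilon_{12j} = \begin{cases} 1 & \text{if } S_1 \to S_2 \text{ is realized in } (0, T_{1j}), \\ 0 & \text{otherwise,} \end{cases}$$
$$\delta_{11.1j} = \begin{cases} 1 & \text{if } S_1 \to R_1 \text{ is realized in } (0, T_{1j}), \\ 0 & \text{otherwise,} \end{cases}$$
$$\delta_{11.2j} = \begin{cases} 1 & \text{if } S_1 \to S_2 \to R_1 \text{ is realized in } (0, T_{1j}), \\ 0 & \text{otherwise,} \end{cases}$$
$$\delta_{12j} = \begin{cases} 1 & \text{if } S_1 \to S_2 \to R_2 \text{ is realized in } (0, T_{1j}), \\ 0 & \text{otherwise,} \end{cases}$$
with
$$\varepsilon_{11j} + \varepsilon_{12j} + \delta_{11.1j} + \delta_{11.2j} + \delta_{12j} = 1. \tag{2}$$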
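Here the symbol $\varepsilon$ is used to denote living, and $\delta$ death. The expectations of
these indicators are the corresponding probabilities, namely
$$E[\varepsilon_{11j}] = P_{11}(0, T_{1j}), \tag{3}$$
$$E[\varepsilon_{12j}] = P_{12}(0, T_{1j}), \tag{4}$$
$$E[\delta_{11.1j}] = Q_{11.1}(0, T_{1j}), \tag{5}$$
$$E[\delta_{11.2j}] = Q_{11.2}(0, T_{1j}), \tag{6}$$
$$E[\delta_{12j}] = Q_{12}(0, T_{1j}). \tag{7}$$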
TRANSITION-PROBABILITY MODEL 329
For each individual in the group who is initially affected with coronary
heart disease (in $S_2$), there are three possible end results:
(1) He may remain ill and alive at $T_{2j}$ ($S_2 \to S_2$) with a probability
$P_{22}(0, T_{2j})$,
(2) He will die from other causes ($S_2 \to R_1$), for which the probability is
$Q_{21}(0, T_{2j})$, and
(3) He will die from CHD ($S_2 \to R_2$) with a probability $Q_{22}(0, T_{2j})$.
The corresponding indicators are $\varepsilon_{22j}$, $\delta_{21j}$, $\delta_{22j}$, satisfying the equation
$$\varepsilon_{22j} + \delta_{21j} + \delta_{22j} = 1, \tag{8}$$
with
$$E[\varepsilon_{22j}] = P_{22}(0, T_{2j}), \tag{9}$$
and
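$$\sum_{j=1}^{n_1} \begin{pmatrix} \varepsilon_{11j} \\ \varepsilon_{12j} \\ \delta_{11.1j} \\ \delta_{11.2j} \\ \delta_{12j} \end{pmatrix} = \begin{pmatrix} X_{11} \\ X_{12} \\ Y_{11.1} \\ Y_{11.2} \\ Y_{12} \end{pmatrix} \tag{12}$$
for the first group, and
$$\sum_{j=1}^{n_2} \begin{pmatrix} \varepsilon_{22j} \\ \delta_{21j} \\ \delta_{22j} \end{pmatrix} = \begin{pmatrix} X_{22} \\ Y_{21} \\ Y_{22} \end{pmatrix} \tag{13}$$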
for the second group. When the observation period is the same for all the $n_1$
individuals, the random vector on the right-hand side of (12) has the
ordinary multinomial distribution with the probabilities given in (1). A
similar statement may be made for the vector $(X_{22}, Y_{21}, Y_{22})'$ in Eq. (13).
The above notation is summarized in Table 1.
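The bookkeeping behind Eq. (12) can be illustrated with a minimal sketch: each initially healthy individual contributes one 0/1 indicator vector, and the counts are the column sums. The outcome labels, function names, and sample data below are hypothetical and serve only as an illustration.

```python
# Hypothetical labels for the five end results of an initially healthy person.
OUTCOMES = ("S1->S1", "S1->S2", "S1->R1", "S1->S2->R1", "S1->S2->R2")
# Count names, in the same order as the outcomes above.
COUNT_NAMES = ("X11", "X12", "Y11.1", "Y11.2", "Y12")


def indicator_vector(outcome):
    """Return the 0/1 indicator vector for one individual's end result."""
    if outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    return tuple(int(outcome == label) for label in OUTCOMES)


def tally(outcomes):
    """Sum the indicator vectors over individuals, as in Eq. (12)."""
    totals = [0] * len(OUTCOMES)
    for outcome in outcomes:
        for position, value in enumerate(indicator_vector(outcome)):
            totals[position] += value
    return dict(zip(COUNT_NAMES, totals))


# Made-up end results for five individuals (illustration only).
sample = ["S1->S1", "S1->S1", "S1->S2", "S1->R1", "S1->S2->R2"]
counts = tally(sample)
```

Because exactly one indicator equals 1 for each individual, the counts always add up to the group size.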
330 SUSAN T. SACKS AND CHIN LONG CHIANG
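TABLE 1
State at   Transition in interval   State at        Transition              Indicator             Number of
time 0     $(0, T_{ij})$            time $T_{ij}$   probability                                   individuals
$S_1$      $S_1 \to S_1$            $S_1$           $P_{11}(0, T_{1j})$     $\varepsilon_{11j}$   $X_{11}$
$S_1$      $S_1 \to S_2$            $S_2$           $P_{12}(0, T_{1j})$     $\varepsilon_{12j}$   $X_{12}$
$S_1$      $S_1 \to R_1$            $R_1$           $Q_{11.1}(0, T_{1j})$   $\delta_{11.1j}$      $Y_{11.1}$
$S_1$      $S_1 \to S_2 \to R_1$    $R_1$           $Q_{11.2}(0, T_{1j})$   $\delta_{11.2j}$      $Y_{11.2}$
$S_1$      $S_1 \to S_2 \to R_2$    $R_2$           $Q_{12}(0, T_{1j})$     $\delta_{12j}$        $Y_{12}$
Total                                               1                       1                     $n_1$
$S_2$      $S_2 \to S_2$            $S_2$           $P_{22}(0, T_{2j})$     $\varepsilon_{22j}$   $X_{22}$
$S_2$      $S_2 \to R_1$            $R_1$           $Q_{21}(0, T_{2j})$     $\delta_{21j}$        $Y_{21}$
$S_2$      $S_2 \to R_2$            $R_2$           $Q_{22}(0, T_{2j})$     $\delta_{22j}$        $Y_{22}$
Total                                               1                       1                     $n_2$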
The unknown parameters underlying the present model are the morbidity
intensity function $\nu_{12}(t)$, and the mortality intensity functions $\mu_{11}(t)$,
$\mu_{21}(t)$, and $\mu_{22}(t)$; each represents the instantaneous rate of transition
between the respective states at time $t$. Formally, they are defined as follows:
The intensities $\mu_{21}(t)$ and $\mu_{22}(t)$ are similarly defined. For simplicity of
formulas, we let
and
(21)
(22)
is the probability that an individual will not develop CHD in $(0, T_{1j})$, the
assertion is proven by Eqs. (23) and (24). Similarly, it can be shown that
$$P_{22}(0, T_{2j}) + Q_{21}(0, T_{2j}) + Q_{22}(0, T_{2j}) = 1.$$
Although the formulas (18) through (22) hold whatever form the integrable
functions may take, functional forms must be specified if the model is to
be applied to practical problems. When the intensity functions are independent
of time, so that
$$\nu_{12}(t) = \nu_{12}, \quad \mu_{11}(t) = \mu_{11}, \quad \mu_{21}(t) = \mu_{21}, \quad \mu_{22}(t) = \mu_{22}, \tag{25}$$
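with
$$\nu_{11} = -(\nu_{12} + \mu_{11}) \quad \text{and} \quad \nu_{22} = -(\mu_{21} + \mu_{22}), \tag{26}$$
the formulas for the transition probabilities in (18) through (22) assume simpler
forms. Substituting (25) and (26) in (18) through (22) and integrating the
resulting expressions yields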
(21a)
(22a)
and
The intensity functions $\nu_{12}$, $\mu_{11}$, $\mu_{21}$, and $\mu_{22}$ are, in effect, (instantaneous)
incidence rates. These rates are time independent under the assumption
(25). How realistic this assumption is depends upon the problem under
study. When this assumption appears to be too strong, the study period To
should be subdivided in order to use this model.
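As a rough numerical illustration of the constant-intensity case, the sketch below evaluates closed-form transition probabilities for both groups. The closed forms are the standard ones for an illness-death process without recovery (exponential waiting times with rates $-\nu_{11}$ and $-\nu_{22}$); they are an assumption here and may be arranged differently from the paper's own expressions. The function name and the parameter values are hypothetical.

```python
import math


def transition_probabilities(nu12, mu11, mu21, mu22, T):
    """Constant-intensity transition probabilities over the interval (0, T).

    Assumed closed forms (not copied from the paper); requires nu11 != nu22.
    """
    nu11 = -(nu12 + mu11)   # total exit intensity from S1, with negative sign
    nu22 = -(mu21 + mu22)   # total exit intensity from S2, with negative sign
    if math.isclose(nu11, nu22):
        raise ValueError("these closed forms assume nu11 != nu22")
    e1 = math.exp(nu11 * T)
    e2 = math.exp(nu22 * T)
    c = nu12 / (nu11 - nu22)
    # Integral over (0, T) of (exp(nu11*t) - exp(nu22*t)) dt.
    bracket = (e1 - 1.0) / nu11 - (e2 - 1.0) / nu22
    group1 = {
        "P11": e1,                            # stays healthy
        "P12": c * (e1 - e2),                 # develops CHD, alive at T
        "Q11.1": mu11 / nu11 * (e1 - 1.0),    # dies of other causes from S1
        "Q11.2": c * mu21 * bracket,          # CHD, then death from other causes
        "Q12": c * mu22 * bracket,            # CHD, then death from CHD
    }
    group2 = {
        "P22": e2,
        "Q21": mu21 / nu22 * (e2 - 1.0),
        "Q22": mu22 / nu22 * (e2 - 1.0),
    }
    return group1, group2


# Hypothetical annual intensities and a five-year observation period.
g1, g2 = transition_probabilities(nu12=0.01, mu11=0.005, mu21=0.02, mu22=0.05, T=5.0)
```

Under these assumptions the five probabilities for an initially healthy person, and the three for an initially affected person, each add up to one, which is a convenient check on any implementation.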
This model will describe illness and death processes more accurately
when a study population is homogeneous with respect to race, sex, age, and
possibly other demographic variables, as well as known risk factors such as
cholesterol level and blood pressure. In the case of non-homogeneous
populations, specific functions may be formulated such as those of Cox [8].
That is, however, beyond the scope of this paper.
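$$\times \left(e^{\nu_{11}\tau_j}\,\nu_{12}\,e^{\nu_{22}(t_{1j}-\tau_j)}\,\mu_{21}\right)^{\delta'_{11.2j}} \left(e^{\nu_{11}\tau_j}\,\nu_{12}\,e^{\nu_{22}(t_{1j}-\tau_j)}\,\mu_{22}\right)^{\delta'_{12j}}. \tag{30}$$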
Each factor to the right of a multiplication sign represents the probability
function for a sequence of transitions listed in Table 1. For justification of
these factors, reference may be made to [6, Chapter 4].
The likelihood function for the group of $n_2$ individuals initially affected
with the disease is somewhat simpler. Denote the time at death by $t_{2j}$; we
have
(31)
j=l
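To make the second-group likelihood concrete, here is a minimal sketch that assumes constant intensities, so that an individual alive at $T_{2j}$ contributes $e^{\nu_{22}T_{2j}}$ and a death at $t_{2j}$ contributes $e^{\nu_{22}t_{2j}}\mu_{21}$ or $e^{\nu_{22}t_{2j}}\mu_{22}$. This factorization is an assumption made for illustration, as are the record layout, the function names, and the data. Maximizing this group's likelihood on its own gives the familiar "deaths divided by time lived" rates; the estimators in the paper combine both groups.

```python
import math


def group2_loglik(mu21, mu22, records):
    """Log-likelihood for initially affected individuals (assumed form).

    records: list of (outcome, time), where outcome is "alive", "R1" or "R2"
    and time is the observation period (alive) or the time at death.
    """
    nu22 = -(mu21 + mu22)
    loglik = 0.0
    for outcome, time in records:
        loglik += nu22 * time              # survival in S2 up to `time`
        if outcome == "R1":
            loglik += math.log(mu21)       # death from other causes
        elif outcome == "R2":
            loglik += math.log(mu22)       # death from CHD
        elif outcome != "alive":
            raise ValueError(f"unknown outcome: {outcome!r}")
    return loglik


def group2_mle(records):
    """Closed-form maximizer of group2_loglik: deaths per unit time lived."""
    total_time = sum(time for _, time in records)
    y21 = sum(1 for outcome, _ in records if outcome == "R1")
    y22 = sum(1 for outcome, _ in records if outcome == "R2")
    return y21 / total_time, y22 / total_time


# Made-up follow-up records (years), for illustration only.
records = [("alive", 5.0), ("R1", 2.0), ("R2", 3.0), ("R2", 1.5), ("alive", 4.0)]
mu21_hat, mu22_hat = group2_mle(records)
```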
The likelihood function for the entire group is, of course, the product
$$L = L_1 L_2. \tag{32}$$
In the likelihood function (32) there are four unknown parameters: $\nu_{12}$, $\mu_{11}$,
$\mu_{21}$, and $\mu_{22}$. The intensity functions $\nu_{11}$ and $\nu_{22}$ can be determined from
$$\nu_{11} = -(\nu_{12} + \mu_{11}) \quad \text{and} \quad \nu_{22} = -(\mu_{21} + \mu_{22}). \tag{26}$$
The values of the parameters which maximize the likelihood function in (32)
can be derived by taking the derivatives of the logarithm of $L$. Formally, we
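$$\frac{\partial}{\partial \nu_{12}} \ln L = 0, \tag{33}$$
$$-\sum_{j=1}^{n_1}\left[\varepsilon_{11j}T_{1j} + \varepsilon_{12j}\tau_j + \delta_{11.1j}t_{1j} + (\delta'_{11.2j} + \delta'_{12j})\tau_j\right] + \frac{1}{\nu_{12}}\left(X_{12} + Y_{11.2} + Y_{12} + Y'_{11.2} + Y'_{12}\right) + \frac{Y_{11.2} + Y_{12}}{\nu_{11} - \nu_{22}} - \sum_{j=1}^{n_1}\frac{(\delta_{11.2j} + \delta_{12j})\,t_{1j}\,e^{\nu_{11}t_{1j}}}{e^{\nu_{11}t_{1j}} - e^{\nu_{22}t_{1j}}} = 0, \tag{33a}$$
$$-\left[\sum_{j=1}^{n_1}\left\{\varepsilon_{12j}(T_{1j} - \tau_j) + (\delta'_{11.2j} + \delta'_{12j})(t_{1j} - \tau_j)\right\} + \sum_{j=1}^{n_2}\left(\varepsilon_{22j}T_{2j} + \delta_{21j}t_{2j} + \delta_{22j}t_{2j}\right)\right] + \frac{1}{\mu_{21}}\left(Y'_{11.2} + Y_{11.2} + Y_{21}\right) - \frac{1}{\nu_{11} - \nu_{22}}\left(Y_{11.2} + Y_{12}\right) + \sum_{j=1}^{n_1}\frac{(\delta_{11.2j} + \delta_{12j})\,t_{1j}\,e^{\nu_{22}t_{1j}}}{e^{\nu_{11}t_{1j}} - e^{\nu_{22}t_{1j}}} = 0, \tag{35a}$$
$$-\left[\sum_{j=1}^{n_1}\left\{\varepsilon_{12j}(T_{1j} - \tau_j) + (\delta'_{11.2j} + \delta'_{12j})(t_{1j} - \tau_j)\right\} + \sum_{j=1}^{n_2}\left(\varepsilon_{22j}T_{2j} + \delta_{21j}t_{2j} + \delta_{22j}t_{2j}\right)\right] + \frac{1}{\mu_{22}}\left(Y'_{12} + Y_{12} + Y_{22}\right) - \frac{1}{\nu_{11} - \nu_{22}}\left(Y_{11.2} + Y_{12}\right) + \sum_{j=1}^{n_1}\frac{(\delta_{11.2j} + \delta_{12j})\,t_{1j}\,e^{\nu_{22}t_{1j}}}{e^{\nu_{11}t_{1j}} - e^{\nu_{22}t_{1j}}} = 0, \tag{36a}$$
$$(Y'_{11.2} + Y_{11.2} + Y_{21})\,\mu_{22} - (Y'_{12} + Y_{12} + Y_{22})\,\mu_{21} = 0, \tag{38}$$
$$-\sum_{j=1}^{n_2}\left[\varepsilon_{22j}T_{2j} + (\delta_{21j} + \delta_{22j})t_{2j}\right]$$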
$$\mu_{21} = k\mu_{11}, \tag{40}$$
where $k$, which represents the relative risk of dying from other causes for an
individual with CHD as compared with one without CHD, may be determined
from other sources. Substituting (37) in (39) and using (40), we
obtain the estimator
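$$\hat\mu_{11} = \frac{kY_{11.1} + Y_{11.2} + Y'_{11.2} + Y_{21}}{k\left[\sum_{j=1}^{n_1}\left\{(\varepsilon_{11j} + \varepsilon_{12j})T_{1j} + (\delta_{11.1j} + \delta_{11.2j} + \delta_{12j} + \delta'_{11.2j} + \delta'_{12j})t_{1j}\right\} + \sum_{j=1}^{n_2}\left\{\varepsilon_{22j}T_{2j} + (\delta_{21j} + \delta_{22j})t_{2j}\right\}\right]}, \tag{41}$$
which is equivalent to the ratio of the number dying from other causes ($R_1$)
to the length of time lived by the $n$ individuals, dependent on the value of $k$.
From (40) we have
$$\hat\mu_{21} = k\hat\mu_{11}, \tag{42}$$
$$\hat\nu_{12} = \frac{X_{12} + Y_{11.2} + Y_{12} + Y'_{11.2} + Y'_{12}}{Y_{11.1}}\,\hat\mu_{11}, \tag{43}$$
and
$$\hat\mu_{22} = \frac{Y_{12} + Y_{22} + Y'_{12}}{Y'_{11.2} + Y_{11.2} + Y_{21}}\,k\hat\mu_{11}. \tag{44}$$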
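The estimators can be sketched in code under explicit assumptions. The block below assumes that, with $\mu_{21} = k\mu_{11}$, the estimate of $\mu_{11}$ is the $k$-weighted number of deaths from other causes divided by $k$ times the total time lived, and that the remaining estimates follow from count ratios implied by the likelihood equations. These forms are inferred for illustration rather than quoted from the paper; the counts, the total time, and the value of $k$ are made up.

```python
def estimate_intensities(counts, total_time, k):
    """Point estimates of (nu12, mu11, mu21, mu22) under assumed formulas.

    counts: dict with keys X12, Y11.1, Y11.2, Y12, Y'11.2, Y'12, Y21, Y22
    (primed counts: deaths among initially healthy people with known onset).
    total_time: total time lived under observation by both groups.
    k: assumed relative risk of death from other causes, CHD vs. no CHD.
    """
    other_cause_deaths_ill = counts["Y11.2"] + counts["Y'11.2"] + counts["Y21"]
    chd_deaths = counts["Y12"] + counts["Y'12"] + counts["Y22"]
    chd_onsets = (counts["X12"] + counts["Y11.2"] + counts["Y12"]
                  + counts["Y'11.2"] + counts["Y'12"])
    mu11 = (k * counts["Y11.1"] + other_cause_deaths_ill) / (k * total_time)
    mu21 = k * mu11
    nu12 = chd_onsets / counts["Y11.1"] * mu11
    mu22 = chd_deaths / other_cause_deaths_ill * mu21
    return {"nu12": nu12, "mu11": mu11, "mu21": mu21, "mu22": mu22}


# Made-up counts and person-years, for illustration only.
counts = {"X12": 40, "Y11.1": 50, "Y11.2": 2, "Y'11.2": 3,
          "Y12": 6, "Y'12": 9, "Y21": 5, "Y22": 12}
est = estimate_intensities(counts, total_time=20000.0, k=2.0)
```

A larger assumed $k$ attributes more of the deaths from other causes to the CHD state, so the estimate of $\mu_{11}$ falls as $k$ rises, which is why the results are tabulated for several values of the relative risk.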
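$$E(X_{11}) = \sum_{j=1}^{n_1}\exp\{\nu_{11}T_{1j}\}, \tag{E}$$
(F)
$$E(\delta_{11.2j}t_{1j}) = \frac{\nu_{12}\mu_{21}}{\nu_{11} - \nu_{22}}\left[\frac{1}{\nu_{11}^2}\left[\exp\{\nu_{11}T_{1j}\}(\nu_{11}T_{1j} - 1) + 1\right] - \frac{1}{\nu_{22}^2}\left[\exp\{\nu_{22}T_{1j}\}(\nu_{22}T_{1j} - 1) + 1\right]\right], \tag{G}$$
$$E(\delta_{12j}t_{1j}) = \frac{\nu_{12}\mu_{22}}{\nu_{11} - \nu_{22}}\left[\frac{1}{\nu_{11}^2}\left[\exp\{\nu_{11}T_{1j}\}(\nu_{11}T_{1j} - 1) + 1\right] - \frac{1}{\nu_{22}^2}\left[\exp\{\nu_{22}T_{1j}\}(\nu_{22}T_{1j} - 1) + 1\right]\right], \tag{H}$$
$$\delta_{11.2j}\,t_{1j}^2\exp\{(\nu_{11} + \nu_{22})t_{1j}\}$$
$$E(Y_{11.2} + Y_{12}) = -\frac{\nu_{22}}{\mu_{21}}E(Y_{11.2}), \tag{M}$$
$$E(Y_{21}) = \sum_{j=1}^{n_2}\frac{\mu_{21}}{\nu_{22}}\left(\exp\{\nu_{22}T_{2j}\} - 1\right), \tag{N}$$
$$E(\delta_{11.1j}t_{1j}) = E[\delta_{11.1j}E(t_{1j} \mid \delta_{11.1j})] = \frac{\mu_{11}}{\nu_{11}^2}\left[\exp\{\nu_{11}T_{1j}\}(\nu_{11}T_{1j} - 1) + 1\right], \tag{O}$$
(P)
$$E(\delta'_{11.2j}) = E(\delta_{11.2j}), \tag{Q}$$
$$E(\delta'_{12j}) = E(\delta_{12j}). \tag{R}$$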
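Expectations of this kind can be checked numerically. The sketch below assumes that the time of death from other causes of an initially healthy person has density $\mu_{11}e^{\nu_{11}t}$ on $(0, T_{1j})$, so that $E(\delta_{11.1j}t_{1j}) = \int_0^{T_{1j}} t\,\mu_{11}e^{\nu_{11}t}\,dt$, and compares the resulting closed form with Simpson's rule. The function names and parameter values are hypothetical.

```python
import math


def expected_death_time_term(mu11, nu11, T):
    """Closed form of the integral of t * mu11 * exp(nu11 * t) over (0, T)."""
    return mu11 / nu11 ** 2 * (math.exp(nu11 * T) * (nu11 * T - 1.0) + 1.0)


def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


# Hypothetical intensities: nu11 = -(nu12 + mu11) with nu12 = 0.01, mu11 = 0.005.
mu11, nu11, T = 0.005, -0.015, 5.0
exact = expected_death_time_term(mu11, nu11, T)
numeric = simpson(lambda t: t * mu11 * math.exp(nu11 * t), 0.0, T)
```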
The information matrix is defined in this case as Eq. (45), opposite, where
(46)
(47)
+ EE(6,,) (48)
(v,,V2$);jh(6,.2)-Z]~E(
-E(slnL)=‘h[$-(&+
(49)
The expectations of the “mixed” derivatives are all equal, except for sign.
Namely,
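with
$$-E\left(\frac{\partial^2}{\partial\nu_{12}\,\partial\mu_{11}}\ln L\right) = n_1\left[\frac{\nu_{22}}{(\nu_{11} - \nu_{22})^2\mu_{21}}E(\bar\delta_{11.2}) - I\right], \tag{51}$$
where
$$I = \frac{\nu_{12}\nu_{22}}{n_1(\nu_{11} - \nu_{22})}\sum_{j=1}^{n_1}\int_0^{T_{1j}} t_{1j}^2\left[e^{-\nu_{22}t_{1j}} - e^{-\nu_{11}t_{1j}}\right]^{-1}dt_{1j}, \tag{52}$$
and $\bar\varepsilon_{12}$, $\bar\delta_{11.1}$, $\bar\delta_{11.2}$, and $\bar\delta_{21}$ are the respective sample means. The inverse of
the information matrix is the covariance matrix of the estimators $\hat\nu_{12}$, $\hat\mu_{11}$,
$\hat\mu_{21}$, and $\hat\mu_{22}$. Simple computations yield the approximate formulas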
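$$\mathrm{Var}(\hat\mu_{21}) = \frac{ABD + ABE + ADE + BDE}{ABCD + EABC + EABD + EACD + EBCD}, \tag{55}$$
$$\mathrm{Var}(\hat\mu_{22}) = \frac{ABC + ABE + ACE + BCE}{ABCD + EABC + EABD + EACD + EBCD}, \tag{56}$$
$$\mathrm{Cov}(\hat\nu_{12}, \hat\mu_{11}) = \frac{-CDE}{ABCD + EABC + EABD + EACD + EBCD}, \tag{57}$$
$$\mathrm{Cov}(\hat\nu_{12}, \hat\mu_{21}) = \frac{BDE}{ABCD + EABC + EABD + EACD + EBCD}, \tag{58}$$
$$\mathrm{Cov}(\hat\nu_{12}, \hat\mu_{22}) = \frac{BCE}{ABCD + EABC + EABD + EACD + EBCD}, \tag{59}$$
$$\mathrm{Cov}(\hat\mu_{11}, \hat\mu_{21}) = \frac{ADE}{ABCD + EABC + EABD + EACD + EBCD}, \tag{60}$$
$$\mathrm{Cov}(\hat\mu_{11}, \hat\mu_{22}) = \frac{ACE}{ABCD + EABC + EABD + EACD + EBCD}, \tag{61}$$
$$\mathrm{Cov}(\hat\mu_{21}, \hat\mu_{22}) = \frac{-ABE}{ABCD + EABC + EABD + EACD + EBCD}, \tag{62}$$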
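where
$$B = \frac{n_1}{\mu_{11}^2}E(\bar\delta_{11.1}), \tag{64}$$
$$C = \frac{n_1}{\mu_{21}^2}\left[2E(\bar\delta_{11.2}) + \theta E(\bar\delta_{21})\right], \tag{65}$$
$$D = \frac{n_1}{\mu_{22}^2}\left[2E(\bar\delta_{12}) + \theta E(\bar\delta_{22})\right], \tag{66}$$
$$E = n_1\left[\frac{\nu_{22}}{(\nu_{11} - \nu_{22})^2\mu_{21}}E(\bar\delta_{11.2}) - I\right], \tag{67}$$
where $\theta$ stands for the ratio $n_2/n_1$. Thus, as $n_1 \to \infty$, the variances in (53)
through (56) all vanish, and $\hat\nu_{12}$, $\hat\mu_{11}$, $\hat\mu_{21}$, and $\hat\mu_{22}$ are consistent estimators of
the corresponding parameters.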
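The pattern of the variance and covariance formulas is what one obtains by inverting a matrix of the form $\mathrm{diag}(A, B, C, D) + E\,uu'$ with $u = (1, 1, -1, -1)'$, the parameters being ordered as $(\nu_{12}, \mu_{11}, \mu_{21}, \mu_{22})$. That structure is an assumption here, inferred from the statement that the mixed derivatives are equal except for sign. The sketch below inverts such a matrix with the Sherman-Morrison identity so that the closed forms can be checked numerically; the values of $A$ through $E$ are made up.

```python
def information_matrix(A, B, C, D, E):
    """Assumed structure: diag(A, B, C, D) + E * u u', with u = (1, 1, -1, -1)."""
    diag = (A, B, C, D)
    u = (1, 1, -1, -1)
    return [[(diag[i] if i == j else 0.0) + E * u[i] * u[j] for j in range(4)]
            for i in range(4)]


def covariance_matrix(A, B, C, D, E):
    """Inverse of the matrix above via the Sherman-Morrison identity."""
    diag = (A, B, C, D)
    u = (1, 1, -1, -1)
    scale = 1.0 + E * sum(1.0 / a for a in diag)
    return [[(1.0 / diag[i] if i == j else 0.0)
             - E * u[i] * u[j] / (diag[i] * diag[j] * scale)
             for j in range(4)]
            for i in range(4)]


# Hypothetical values of A, B, C, D, E (illustration only).
A, B, C, D, E = 400.0, 900.0, 250.0, 600.0, 50.0
info = information_matrix(A, B, C, D, E)
cov = covariance_matrix(A, B, C, D, E)
# Common denominator of the closed-form variances and covariances.
denom = A * B * C * D + E * (A * B * C + A * B * D + A * C * D + B * C * D)
```

With this ordering, `cov[2][2]` corresponds to the variance of the estimator of $\mu_{21}$ and `cov[0][1]` to the covariance between the estimators of $\nu_{12}$ and $\mu_{11}$.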
questions have been raised at the same time, and the applicability of the
model will be enhanced once these questions have been answered. The
question of just what is an appropriate value of k for the JAHRP and other
follow-up studies of CHD such as Framingham is very important, and once
it is answered, one’s attention could be directed to the relevant tables in this
section. Perhaps subsets of the data could be used to get an estimate of k, or
perhaps data from the National Center for Health Statistics might be
available for this task. The accuracy of the assumption that the parameters
are independent of time should be investigated and subsets of the long-term
incidence data from JAHRP, Framingham, and other studies might provide
an answer. The “fit” of the model to other data sets is another important
issue to be investigated. It is the authors’ hope that future investigators will
find these questions of sufficient importance and relevance to investigate
and to solve.
TABLE 2
JAHRP Study Population: Case II
$S_1$      $S_1 \to S_1$            $S_1$      $X_{11} = 3471$
           $S_1 \to S_2$            $S_2$      $X_{12} = 42$
           $S_1 \to R_1$            $R_1$      $Y_{11.1} = 5.5$
           $S_1 \to S_2 \to R_1$    $R_1$      $Y_{11.2} = 0$
           $S_1 \to S_2 \to R_2$    $R_2$      $Y_{12} = 17$
TABLE 3
Relative Risk = 0.5
Parameter   Estimator   Variance of estimator^a   Coefficient of variation   Transition probability
TABLE 4
Relative Risk = 0.7
Parameter   Estimator   Variance of estimator^a   Coefficient of variation   Transition probability
TABLE 5
Relative Risk = 1.0
Parameter   Estimator   Variance of estimator^a   Coefficient of variation   Transition probability
TABLE 6
Relative Risk = 1.4
Parameter   Estimator   Variance of estimator^a   Coefficient of variation   Transition probability
TABLE 7
Relative Risk = 2.0
Parameter   Estimator   Variance of estimator^a   Coefficient of variation   Transition probability
TABLE 8
Relative Risk = 3.0
Parameter   Estimator   Variance of estimator^a   Coefficient of variation   Transition probability
REFERENCES