
A Transition-Probability Model
for the Study of Chronic Diseases
SUSAN T. SACKS AND CHIN LONG CHIANG*

University of California, Berkeley, California 94720
Received 13 January 1977
ABSTRACT
Presented in this paper is a modification of the general illness-death model for the
study of chronic diseases. Coronary heart diseases (CHD) are used as an example. The
model contains two illness (transient) states, $S_1$, the “healthy” state, and $S_2$, the state of
having CHD, and two death (absorbing) states, $R_1$, death from other causes, and $R_2$,
death from CHD. Transitions between states are governed by intensity functions. The
study population consists of two groups of people: a group of $n_1$ people who are initially
healthy (in $S_1$), and a group of $n_2$ people who are affected with CHD (in $S_2$) at the time
of entering the study. The $n_1 + n_2$ people are observed for various lengths of time. The
time of first symptom of CHD and the time of death are treated as random variables. The
likelihood function of these random variables has been derived and maximum-likelihood
estimators of the intensity functions have been obtained. Formulas for the variances and
covariances of the ML estimators have been obtained by using the information matrix.
Application of the model to an empirical data set is also discussed.
I. INTRODUCTION
Statistical analysis of medical follow-up data is usually confined to
survival and death of each member of a study population; health conditions
of survivors and causes of death are not considered. However, when a study
is designed to investigate a particular disease [or a group of diseases, such as
coronary heart diseases (CHD)], either as a cause of death or for its
incidence and prevalence, distinction should be made between survivors
who are affected with the disease and those who are not, and between
deaths from CHD and death from other causes. Statistical methods have
been developed for dealing with such problems for large samples when the
information is in the form of frequency distributions. The theory of competing
risks, for example, was developed for the analysis of mortality data by
causes of death; illness-death processes can be used to determine transition
probabilities from one health state to another and other statistical quantities
relevant to epidemiological studies of diseases.

*The research reported herein was performed pursuant to a grant (#003-P-20-2) from
the Department of Health, Education and Welfare, Washington, D.C. The opinions and
conclusions expressed herein are solely those of the authors, and should not be construed
as representing the opinions or policy of any agency of the United States government.
MATHEMATICAL BIOSCIENCES 34, 325-346 (1977)
© Elsevier North-Holland, Inc., 1977

The study of survivorship analysis has a long and varied history. A
common statistical problem in studies of survivorship is the measurement of
the rate at which depletion of a population takes place. The classical
method for estimating the survivorship function is the life table. In 1933,
Frost [14] used the ratio of number dying to the “person-years” of exposure
as an estimate of the probability of dying, which Berkson and Gage [2]
called the actuarial method. Cutler and Ederer [9] recounted this method
and emphasized the advantage gained by including survival information on
cases which entered a study too late to have had the opportunity to survive
the full interval.
The classical life table approach has been extended to the study of
multiple decrement functions which is a problem of competing risks. The
concept of competing risks originated in a controversy over the value of
smallpox vaccination. Daniel Bernoulli, D’Alembert, and Laplace all made
theoretical contributions to the problem. Fix and Neyman [13] were the first
to study the problem from a modern statistical viewpoint, and they in-
troduced the concept of the net and crude probabilities. Cornfield [7] gave a
lucid discussion of the rationale behind competing-risks analysis. Berkson
and Elveback [1] and Chiang [4, 5] used the life-table methodology for the
analysis of medical follow-up data. Kaplan and Meier [19] introduced the
product-limit estimates for the survivorship probabilities. Other significant
work includes that of Kimball [20], Densen [10], Dorn [11], Harris, Meier
and Tukey [17], and Boag [3]. Survival analysis with the consideration of
covariates seems to have begun with the publications of Feigl and Zelen [12]
and Zippin and Armitage [23]. Since the celebrated paper by D. R. Cox [8]
many papers have appeared. They include Gehan and Siddiqui [15], Turn-
bull, Brown and Hu [22], Mantel and Byar [21], and Greenberg, Bayard and
Byar [16].
The purpose of this paper is to present a modification of the general
illness-death model for the study of chronic diseases. For easy reference,
coronary heart disease (CHD) will be used as an example. The model
contains two illness states and two death states, transitions from one state to
another being governed by intensity functions. Thus the main difference
between the model in this paper and those appearing in the early publica-
tions is the consideration of more than one transient (living) state. Estima-
tors of these intensity functions are obtained by the method of maximum
likelihood. Additionally, the corresponding asymptotic variances and co-
variances are estimated, and optimum statistical properties of the estimates
are discussed. The model is then applied to data from the Japanese
American Health Research Program, an epidemiological study of coronary
heart disease in the San Francisco Bay Area of Northern California.
II. DESCRIPTION OF THE MODEL

In this model there are two transient states, $S_1$ and $S_2$, and two
absorbing states, $R_1$ and $R_2$. $S_1$ stands for the “healthy” state; $S_2$, the
“illness” state; $R_1$, the state of death from other causes; $R_2$, death from
CHD. An individual is said to be in state $S_1$ if he is free from CHD, or in $S_2$
if he is affected with the disease. A person enters absorbing state $R_1$ if he
dies from other causes, and enters $R_2$ if he dies from coronary heart disease.
We shall assume that the disease is irreversible, so that the transition from
$S_2$ to $S_1$ is impossible. Also, since a person cannot die from coronary heart
disease without first having developed the disease, the transition from $S_1$ to
$R_2$ is not allowed.
Typically, a study population (that is, the sample) initially consists of two
groups: $n_1$ individuals who are initially free from the disease (i.e., in $S_1$), and $n_2$
individuals who are initially affected with the disease (i.e., in $S_2$), with
$n_1 + n_2 = n$, the total sample size. Individuals in the study are observed for
varying lengths of time, denoted by $T_{ij}$, where $j = 1, \ldots, n_i$ (denoting the
individual), and $i = 1, 2$ (denoting the health state). Each $T_{ij}$ is an arbitrary
but fixed positive number. Transitions taking place during the observation
period may be summarized as follows: an individual who is initially in $S_1$
(1) may remain free from the disease during the entire period $(0, T_{1j})$
($S_1 \to S_1$);
(2) may have developed CHD, with first symptom at $\tau_j$, for $\tau_j < T_{1j}$, but is
still alive at $T_{1j}$ ($S_1 \to S_2$);
(3) may have died at $t_{1j}$ from other causes without ever having developed
CHD, for $t_{1j} < T_{1j}$ ($S_1 \to R_1$);
(4) may have died at $t_{1j}$ from other causes after having developed CHD,
for $t_{1j} \le T_{1j}$ ($S_1 \to S_2 \to R_1$); or
(5) may have died at $t_{1j}$ from CHD after developing the disease, for
$t_{1j} < T_{1j}$ ($S_1 \to S_2 \to R_2$).
Corresponding to each of the end results at $T_{1j}$, there is a probability.

For an individual initially in $S_1$ at time $t = 0$, the transition probabilities are:
$P_{11}(0, T_{1j})$ = Pr{he will be free from the disease at $T_{1j}$},
$P_{12}(0, T_{1j})$ = Pr{he will have developed CHD but is alive at $T_{1j}$},
$Q_{11.1}(0, T_{1j})$ = Pr{he will die from other causes in $(0, T_{1j})$ without having developed CHD},
$Q_{11.2}(0, T_{1j})$ = Pr{he will die from other causes in $(0, T_{1j})$ after having developed CHD},
$Q_{12}(0, T_{1j})$ = Pr{he will die from CHD in $(0, T_{1j})$},
satisfying the equation
$$P_{11}(0, T_{1j}) + P_{12}(0, T_{1j}) + Q_{11.1}(0, T_{1j}) + Q_{11.2}(0, T_{1j}) + Q_{12}(0, T_{1j}) = 1. \tag{1}$$
The subscript after the dot in $Q_{11.1}$ and $Q_{11.2}$ refers to the state $S_1$ or $S_2$ from
which one enters state $R_1$. These transition probabilities form the basis for
the analysis of morbidity and mortality data when the sample size is large
and the time of illness or death is unknown. When the time of occurrence of
an illness or death is an observable random variable, the corresponding
probability density functions and likelihood function need to be developed.
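For this purpose, we introduce for each individual a vector of indicators
$$[\,\varepsilon_{11j} \quad \varepsilon_{12j} \quad \delta_{11.1j} \quad \delta_{11.2j} \quad \delta_{12j}\,]'$$
for $j = 1, \ldots, n_1$, corresponding to the five possible end results listed above, so
that
$\varepsilon_{11j} = 1$ if $S_1 \to S_1$ is realized in $(0, T_{1j})$,
$\quad\ = 0$ otherwise,
$\varepsilon_{12j} = 1$ if $S_1 \to S_2$ is realized in $(0, T_{1j})$,
$\quad\ = 0$ otherwise,
$\delta_{11.1j} = 1$ if $S_1 \to R_1$ is realized in $(0, T_{1j})$,
$\quad\ = 0$ otherwise,
$\delta_{11.2j} = 1$ if $S_1 \to S_2 \to R_1$ is realized in $(0, T_{1j})$,
$\quad\ = 0$ otherwise,
$\delta_{12j} = 1$ if $S_1 \to S_2 \to R_2$ is realized in $(0, T_{1j})$,
$\quad\ = 0$ otherwise,
satisfying the obvious relationship
$$\varepsilon_{11j} + \varepsilon_{12j} + \delta_{11.1j} + \delta_{11.2j} + \delta_{12j} = 1. \tag{2}$$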
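Here the symbol $\varepsilon$ is used to denote living, and $\delta$ death. The expectations of
these indicators are the corresponding probabilities, namely
$$E[\varepsilon_{11j}] = P_{11}(0, T_{1j}), \tag{3}$$
$$E[\varepsilon_{12j}] = P_{12}(0, T_{1j}), \tag{4}$$
$$E[\delta_{11.1j}] = Q_{11.1}(0, T_{1j}), \tag{5}$$
$$E[\delta_{11.2j}] = Q_{11.2}(0, T_{1j}), \tag{6}$$
$$E[\delta_{12j}] = Q_{12}(0, T_{1j}). \tag{7}$$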
For each individual in the group who is initially affected with coronary
heart disease (in $S_2$), there are three possible end results:
(1) He may remain ill and alive at $T_{2j}$ ($S_2 \to S_2$) with a probability
$P_{22}(0, T_{2j})$,
(2) He may die from other causes ($S_2 \to R_1$), for which the probability is
$Q_{21}(0, T_{2j})$, and
(3) He may die from CHD ($S_2 \to R_2$) with a probability $Q_{22}(0, T_{2j})$.
The corresponding indicators are $\varepsilon_{22j}$, $\delta_{21j}$, $\delta_{22j}$, satisfying the equation
$$\varepsilon_{22j} + \delta_{21j} + \delta_{22j} = 1 \tag{8}$$
and their relationships with the transition probabilities are
$$E[\varepsilon_{22j}] = P_{22}(0, T_{2j}), \tag{9}$$
$$E[\delta_{21j}] = Q_{21}(0, T_{2j}), \tag{10}$$
and
$$E[\delta_{22j}] = Q_{22}(0, T_{2j}). \tag{11}$$
The number of individuals in the sample associated with each transition
is the sum of the corresponding indicator taken over each of the two groups.
Denoting the number of survivors by $X$ and the number of deaths by $Y$, we
have the vector sum
$$\sum_{j=1}^{n_1} \begin{bmatrix} \varepsilon_{11j} \\ \varepsilon_{12j} \\ \delta_{11.1j} \\ \delta_{11.2j} \\ \delta_{12j} \end{bmatrix} = \begin{bmatrix} X_{11} \\ X_{12} \\ Y_{11.1} \\ Y_{11.2} \\ Y_{12} \end{bmatrix} \tag{12}$$
for the first group, and
$$\sum_{j=1}^{n_2} \begin{bmatrix} \varepsilon_{22j} \\ \delta_{21j} \\ \delta_{22j} \end{bmatrix} = \begin{bmatrix} X_{22} \\ Y_{21} \\ Y_{22} \end{bmatrix} \tag{13}$$
for the second group. When the observation period is the same for all the $n_1$
individuals, the random vector on the right-hand side of (12) has the
ordinary multinomial distribution with the probabilities given in (1). A
similar statement may be made for the vector $(X_{22}, Y_{21}, Y_{22})'$ in Eq. (13).
The above notation is summarized in Table 1.
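TABLE 1

| State at time 0 | Transition in interval $(0, T_{ij})$ | State at time $T_{ij}$ | Transition probability | Indicator | Number of individuals |
|---|---|---|---|---|---|
| $S_1$ | $S_1 \to S_1$ | $S_1$ | $P_{11}(0, T_{1j})$ | $\varepsilon_{11j}$ | $X_{11}$ |
| | $S_1 \to S_2$ | $S_2$ | $P_{12}(0, T_{1j})$ | $\varepsilon_{12j}$ | $X_{12}$ |
| | $S_1 \to R_1$ | $R_1$ | $Q_{11.1}(0, T_{1j})$ | $\delta_{11.1j}$ | $Y_{11.1}$ |
| | $S_1 \to S_2 \to R_1$ | $R_1$ | $Q_{11.2}(0, T_{1j})$ | $\delta_{11.2j}$ | $Y_{11.2}$ |
| | $S_1 \to S_2 \to R_2$ | $R_2$ | $Q_{12}(0, T_{1j})$ | $\delta_{12j}$ | $Y_{12}$ |
| | Total | | 1 | 1 | $n_1$ |
| $S_2$ | $S_2 \to S_2$ | $S_2$ | $P_{22}(0, T_{2j})$ | $\varepsilon_{22j}$ | $X_{22}$ |
| | $S_2 \to R_1$ | $R_1$ | $Q_{21}(0, T_{2j})$ | $\delta_{21j}$ | $Y_{21}$ |
| | $S_2 \to R_2$ | $R_2$ | $Q_{22}(0, T_{2j})$ | $\delta_{22j}$ | $Y_{22}$ |
| | Total | | 1 | 1 | $n_2$ |
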
The unknown parameters underlying the present model are the morbidity
intensity function $\nu_{12}(t)$, and the mortality intensity functions $\mu_{11}(t)$,
$\mu_{21}(t)$, and $\mu_{22}(t)$; each represents the instantaneous rate of transition between
the respective states at time $t$. Formally, they are defined as follows:
$$\nu_{12}(t)\Delta + o(\Delta) = \Pr\{\text{an individual in } S_1 \text{ at time } t \text{ will be in } S_2 \text{ in interval } (t, t + \Delta)\}, \tag{14}$$
$$\mu_{11}(t)\Delta + o(\Delta) = \Pr\{\text{an individual in } S_1 \text{ at time } t \text{ will be in } R_1 \text{ in interval } (t, t + \Delta)\}. \tag{15}$$
The intensities $\mu_{21}(t)$ and $\mu_{22}(t)$ are similarly defined. For simplicity of
formulas, we let
$$\nu_{11}(t) = -[\nu_{12}(t) + \mu_{11}(t)] \tag{16}$$
and
$$\nu_{22}(t) = -[\mu_{21}(t) + \mu_{22}(t)]. \tag{17}$$
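Relationships between the intensity functions and transition probabilities
for the general illness-death process are given in [6]. In the present case, the
relationships are
$$P_{11}(0, T_{1j}) = \exp\left\{\int_0^{T_{1j}} \nu_{11}(\tau)\,d\tau\right\}, \tag{18}$$
$$P_{12}(0, T_{1j}) = \int_0^{T_{1j}} \exp\left\{\int_0^{t} \nu_{11}(\tau)\,d\tau\right\} \nu_{12}(t) \exp\left\{\int_t^{T_{1j}} \nu_{22}(\tau)\,d\tau\right\} dt, \tag{19}$$
$$Q_{11.1}(0, T_{1j}) = \int_0^{T_{1j}} \exp\left\{\int_0^{t} \nu_{11}(\tau)\,d\tau\right\} \mu_{11}(t)\,dt, \tag{20}$$
$$Q_{11.2}(0, T_{1j}) = \int_0^{T_{1j}} \exp\left\{\int_0^{t} \nu_{11}(\tau)\,d\tau\right\} \nu_{12}(t) \int_t^{T_{1j}} \exp\left\{\int_t^{s} \nu_{22}(\tau)\,d\tau\right\} \mu_{21}(s)\,ds\,dt, \tag{21}$$
$$Q_{12}(0, T_{1j}) = \int_0^{T_{1j}} \exp\left\{\int_0^{t} \nu_{11}(\tau)\,d\tau\right\} \nu_{12}(t) \int_t^{T_{1j}} \exp\left\{\int_t^{s} \nu_{22}(\tau)\,d\tau\right\} \mu_{22}(s)\,ds\,dt. \tag{22}$$
It is easy to verify that the quantities on the right-hand side of equations
(18) to (22) add to unity. Since the sum of the right-hand side quantities of
equations (19), (21), and (22),
$$P_{12}(0, T_{1j}) + Q_{11.2}(0, T_{1j}) + Q_{12}(0, T_{1j}) = \int_0^{T_{1j}} \exp\left\{\int_0^{t} \nu_{11}(\tau)\,d\tau\right\} \nu_{12}(t)\,dt, \tag{23}$$
is the probability that an individual who is initially in $S_1$ at time 0 will
develop CHD during the interval $(0, T_{1j})$, and the sum of (18) and (20),
$$P_{11}(0, T_{1j}) + Q_{11.1}(0, T_{1j}) = 1 - \int_0^{T_{1j}} \exp\left\{\int_0^{t} \nu_{11}(\tau)\,d\tau\right\} \nu_{12}(t)\,dt, \tag{24}$$
is the probability that an individual will not develop CHD in $(0, T_{1j})$, the
assertion is proven by Eqs. (23) and (24). Similarly, it can be shown that
$$P_{22}(0, T_{2j}) + Q_{21}(0, T_{2j}) + Q_{22}(0, T_{2j}) = 1.$$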
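Although the formulas (18) through (22) hold whatever form the integrable
functions may take, functional forms must be specified if the model is to
be applied to practical problems. When the intensity functions are independent
of time, so that
$$\nu_{12}(t) = \nu_{12}, \quad \mu_{11}(t) = \mu_{11}, \quad \mu_{21}(t) = \mu_{21}, \quad \mu_{22}(t) = \mu_{22}, \tag{25}$$
with
$$\nu_{11} = -[\nu_{12} + \mu_{11}] \quad \text{and} \quad \nu_{22} = -[\mu_{21} + \mu_{22}], \tag{26}$$
the formulas for the transition probabilities in (18) through (22) assume simpler
forms. Substituting (25) and (26) in (18) through (22) and integrating the
resulting expressions yields
$$P_{11}(0, T_{1j}) = e^{\nu_{11} T_{1j}}, \tag{18a}$$
$$P_{12}(0, T_{1j}) = \frac{\nu_{12}}{\nu_{11} - \nu_{22}} \left[e^{\nu_{11} T_{1j}} - e^{\nu_{22} T_{1j}}\right], \tag{19a}$$
$$Q_{11.1}(0, T_{1j}) = \frac{\mu_{11}}{\nu_{11}} \left[e^{\nu_{11} T_{1j}} - 1\right], \tag{20a}$$
$$Q_{11.2}(0, T_{1j}) = \frac{\nu_{12}\mu_{21}}{\nu_{11} - \nu_{22}} \left[\frac{1}{\nu_{11}}\left(e^{\nu_{11} T_{1j}} - 1\right) - \frac{1}{\nu_{22}}\left(e^{\nu_{22} T_{1j}} - 1\right)\right], \tag{21a}$$
$$Q_{12}(0, T_{1j}) = \frac{\nu_{12}\mu_{22}}{\nu_{11} - \nu_{22}} \left[\frac{1}{\nu_{11}}\left(e^{\nu_{11} T_{1j}} - 1\right) - \frac{1}{\nu_{22}}\left(e^{\nu_{22} T_{1j}} - 1\right)\right]. \tag{22a}$$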
For an individual who is initially in $S_2$, the probabilities are
$$P_{22}(0, T_{2j}) = e^{\nu_{22} T_{2j}}, \tag{27}$$
$$Q_{21}(0, T_{2j}) = \frac{\mu_{21}}{\nu_{22}}\left(e^{\nu_{22} T_{2j}} - 1\right), \tag{28}$$
and
$$Q_{22}(0, T_{2j}) = \frac{\mu_{22}}{\nu_{22}}\left(e^{\nu_{22} T_{2j}} - 1\right). \tag{29}$$
The intensity functions $\nu_{12}$, $\mu_{11}$, $\mu_{21}$, and $\mu_{22}$ are, in effect, (instantaneous)
incidence rates. These rates are time independent under the assumption
(25). How realistic this assumption is depends upon the problem under
study. When this assumption appears to be too strong, the study period $T_{ij}$
should be subdivided in order to use this model.
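The constant-intensity formulas can be evaluated directly. The following is a minimal sketch, not part of the original paper: the function name `transition_probabilities` and the dictionary keys are illustrative choices, and the inputs are the estimates reported in Table 3 (relative risk 0.5) with a four-year observation period. It computes the eight transition probabilities and the two sums that should equal unity by Eq. (1) and by the corresponding relation for $S_2$.

```python
import math


def transition_probabilities(nu12, mu11, mu21, mu22, T):
    """Transition probabilities over (0, T) for time-independent intensities."""
    nu11 = -(nu12 + mu11)  # Eq. (26)
    nu22 = -(mu21 + mu22)  # Eq. (26)
    if math.isclose(nu11, nu22):
        raise ValueError("the closed forms below assume nu11 != nu22")
    e1 = math.exp(nu11 * T)
    e2 = math.exp(nu22 * T)
    c = nu12 / (nu11 - nu22)
    # Integral of (e^{nu11 t} - e^{nu22 t}) over (0, T)
    bracket = (e1 - 1.0) / nu11 - (e2 - 1.0) / nu22
    return {
        "P11": e1,                              # healthy throughout
        "P12": c * (e1 - e2),                   # developed CHD, alive at T
        "Q11.1": mu11 / nu11 * (e1 - 1.0),      # died of other causes, no CHD
        "Q11.2": c * mu21 * bracket,            # died of other causes after CHD
        "Q12": c * mu22 * bracket,              # died of CHD
        "P22": e2,                              # initially ill, alive at T
        "Q21": mu21 / nu22 * (e2 - 1.0),        # initially ill, died of other causes
        "Q22": mu22 / nu22 * (e2 - 1.0),        # initially ill, died of CHD
    }


# Estimates reported in Table 3 (relative risk k = 0.5), T = 4 years.
probs = transition_probabilities(0.00474, 0.00441, 0.00221, 0.00971, 4.0)
group1_total = sum(probs[key] for key in ("P11", "P12", "Q11.1", "Q11.2", "Q12"))
group2_total = probs["P22"] + probs["Q21"] + probs["Q22"]

for name, value in probs.items():
    print(f"{name:6s} {value:.5f}")
print("group 1 total:", group1_total)
print("group 2 total:", group2_total)
```

The printed values can be compared with the transition-probability column of Table 3; small differences in the last digit are to be expected because the tabulated estimates are rounded.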
This model will describe illness and death processes more accurately
when a study population is homogeneous with respect to race, sex, age, and
possibly other demographic variables, as well as known risk factors such as
cholesterol level and blood pressure. In the case of non-homogeneous
populations, specific functions may be formulated such as those of Cox [8].
That is, however, beyond the scope of this paper.
III. LIKELIHOOD FUNCTION AND MAXIMUM-LIKELIHOOD

ESTIMATORS
The likelihood function appropriate for the problem depends on the
information available in a particular study regarding the time of contracting
the disease and the time of death. In most epidemiological investigations of
chronic diseases, however, the time at death and cause of death are usually
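recorded. The time of contracting the disease, $\tau_j$, is known for those who are
still alive at $T_{1j}$, but is not always recorded for those who have died before
$T_{1j}$. For distinction, we use the indicators $\delta_{11.2j}$ and $\delta_{12j}$ for those deaths
where the time of contracting the disease is unknown, and $\delta'_{11.2j}$ and $\delta'_{12j}$ for
those deaths where the time of contracting the disease is known. In the analysis
of such data, the time of contracting the disease ($\tau_j$) and the time of death
($t_{ij}$) are the basic random variables. Taking into account the different
sequences of transitions and causes of death and the knowledge of $\tau_j$, we
have the likelihood function for the random variables associated with the
group of $n_1$ individuals,
$$\begin{aligned} L_1 = \prod_{j=1}^{n_1} & \left(e^{\nu_{11} T_{1j}}\right)^{\varepsilon_{11j}} \left(e^{\nu_{11}\tau_j}\nu_{12}e^{\nu_{22}(T_{1j}-\tau_j)}\right)^{\varepsilon_{12j}} \left(e^{\nu_{11} t_{1j}}\mu_{11}\right)^{\delta_{11.1j}} \\ & \times \left(\frac{\nu_{12}\mu_{21}}{\nu_{11}-\nu_{22}}\left[e^{\nu_{11} t_{1j}} - e^{\nu_{22} t_{1j}}\right]\right)^{\delta_{11.2j}} \left(\frac{\nu_{12}\mu_{22}}{\nu_{11}-\nu_{22}}\left[e^{\nu_{11} t_{1j}} - e^{\nu_{22} t_{1j}}\right]\right)^{\delta_{12j}} \\ & \times \left(e^{\nu_{11}\tau_j}\nu_{12}e^{\nu_{22}(t_{1j}-\tau_j)}\mu_{21}\right)^{\delta'_{11.2j}} \left(e^{\nu_{11}\tau_j}\nu_{12}e^{\nu_{22}(t_{1j}-\tau_j)}\mu_{22}\right)^{\delta'_{12j}}. \end{aligned} \tag{30}$$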
Each factor in (30) represents the probability
function for a sequence of transitions listed in Table 1. For justification of
these factors, reference may be made to [6, Chapter 4].
The likelihood function for the group of $n_2$ individuals initially affected
with the disease is somewhat simpler. Denoting the time at death by $t_{2j}$, we
have
$$L_2 = \prod_{j=1}^{n_2} \left(e^{\nu_{22} T_{2j}}\right)^{\varepsilon_{22j}} \left(e^{\nu_{22} t_{2j}}\mu_{21}\right)^{\delta_{21j}} \left(e^{\nu_{22} t_{2j}}\mu_{22}\right)^{\delta_{22j}}. \tag{31}$$
The likelihood function for the entire group is, of course, the product
$$L = L_1 \cdot L_2. \tag{32}$$
In the likelihood function (32) there are four unknown parameters: $\nu_{12}$, $\mu_{11}$,
$\mu_{21}$, and $\mu_{22}$. The intensity functions $\nu_{11}$ and $\nu_{22}$ can be determined from
$$\nu_{11} = -(\nu_{12} + \mu_{11}) \quad \text{and} \quad \nu_{22} = -(\mu_{21} + \mu_{22}). \tag{26}$$
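The logarithm of the likelihood (32) is straightforward to evaluate numerically. The sketch below is an illustration under stated assumptions rather than the authors' procedure: the record layout (tuples of an outcome label, a time, and an onset time) and the outcome labels are hypothetical conventions introduced here, and deaths with unknown onset time are handled by the closed-form factor $\nu_{12}\mu(e^{\nu_{11}t} - e^{\nu_{22}t})/(\nu_{11}-\nu_{22})$, which is the known-onset density integrated over the onset time.

```python
import math


def log_likelihood(group1, group2, nu12, mu11, mu21, mu22):
    """Log of L = L1 * L2 for time-independent intensities.

    group1: (outcome, time, tau) records for people initially in S1, where
            time is T (survivors) or the time of death, and tau is the time
            of onset of CHD (None when not applicable or not recorded).
    group2: (outcome, time) records for people initially in S2.
    The outcome labels are hypothetical names for the rows of Table 1.
    """
    nu11 = -(nu12 + mu11)
    nu22 = -(mu21 + mu22)
    total = 0.0
    for outcome, time, tau in group1:
        if outcome == "S1->S1":        # free of CHD for the whole period
            total += nu11 * time
        elif outcome == "S1->S2":      # onset at tau, alive at the end
            total += nu11 * tau + math.log(nu12) + nu22 * (time - tau)
        elif outcome == "S1->R1":      # death from other causes, no CHD
            total += nu11 * time + math.log(mu11)
        elif outcome in ("S1->S2->R1", "S1->S2->R2"):
            mu = mu21 if outcome == "S1->S2->R1" else mu22
            if tau is None:            # onset time unknown
                mixed = (math.exp(nu11 * time) - math.exp(nu22 * time)) / (nu11 - nu22)
                total += math.log(nu12 * mu * mixed)
            else:                      # onset time known
                total += nu11 * tau + math.log(nu12) + nu22 * (time - tau) + math.log(mu)
        else:
            raise ValueError(f"unknown outcome: {outcome}")
    for outcome, time in group2:
        total += nu22 * time
        if outcome == "S2->R1":
            total += math.log(mu21)
        elif outcome == "S2->R2":
            total += math.log(mu22)
        elif outcome != "S2->S2":
            raise ValueError(f"unknown outcome: {outcome}")
    return total


# Hypothetical records, for illustration only.
demo_group1 = [("S1->S1", 4.0, None), ("S1->S2", 4.0, 1.5), ("S1->S2->R2", 3.0, None)]
demo_group2 = [("S2->S2", 4.0), ("S2->R1", 2.5)]
demo_value = log_likelihood(demo_group1, demo_group2, 0.004, 0.004, 0.002, 0.010)
print(demo_value)
```

A function of this kind could be handed to a general-purpose optimizer as a numerical check on the closed-form estimators, although the constraint (40) would then have to be imposed separately.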
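The values of the parameters which maximize the likelihood function in (32)
can be derived by taking the derivatives of the logarithm of $L$. Formally, we
establish the following equations:
$$\frac{\partial}{\partial \nu_{12}} \ln L = 0, \tag{33}$$
$$\frac{\partial}{\partial \mu_{11}} \ln L = 0, \tag{34}$$
$$\frac{\partial}{\partial \mu_{21}} \ln L = 0, \tag{35}$$
$$\frac{\partial}{\partial \mu_{22}} \ln L = 0. \tag{36}$$
When the differentiation is carried out and the resulting expressions simplified,
the above equations reduce, respectively, to
$$\begin{aligned} & -\sum_{j=1}^{n_1} \left[\varepsilon_{11j} T_{1j} + \varepsilon_{12j}\tau_j + \delta_{11.1j} t_{1j} + (\delta'_{11.2j} + \delta'_{12j})\tau_j\right] + \frac{1}{\nu_{12}}\left(X_{12} + Y_{11.2} + Y_{12} + Y'_{11.2} + Y'_{12}\right) \\ & \quad + \frac{Y_{11.2} + Y_{12}}{\nu_{11} - \nu_{22}} - \sum_{j=1}^{n_1} \frac{(\delta_{11.2j} + \delta_{12j})\, t_{1j}\, e^{\nu_{11} t_{1j}}}{e^{\nu_{11} t_{1j}} - e^{\nu_{22} t_{1j}}} = 0, \end{aligned} \tag{33a}$$
$$\begin{aligned} & -\sum_{j=1}^{n_1} \left[\varepsilon_{11j} T_{1j} + \varepsilon_{12j}\tau_j + \delta_{11.1j} t_{1j} + (\delta'_{11.2j} + \delta'_{12j})\tau_j\right] + \frac{1}{\mu_{11}} Y_{11.1} \\ & \quad + \frac{Y_{11.2} + Y_{12}}{\nu_{11} - \nu_{22}} - \sum_{j=1}^{n_1} \frac{(\delta_{11.2j} + \delta_{12j})\, t_{1j}\, e^{\nu_{11} t_{1j}}}{e^{\nu_{11} t_{1j}} - e^{\nu_{22} t_{1j}}} = 0, \end{aligned} \tag{34a}$$
$$\begin{aligned} & -\left[\sum_{j=1}^{n_1} \left\{\varepsilon_{12j}(T_{1j} - \tau_j) + (\delta'_{11.2j} + \delta'_{12j})(t_{1j} - \tau_j)\right\} + \sum_{j=1}^{n_2} \left(\varepsilon_{22j} T_{2j} + \delta_{21j} t_{2j} + \delta_{22j} t_{2j}\right)\right] \\ & \quad + \frac{1}{\mu_{21}}\left(Y'_{11.2} + Y_{11.2} + Y_{21}\right) - \frac{1}{\nu_{11} - \nu_{22}}\left(Y_{11.2} + Y_{12}\right) + \sum_{j=1}^{n_1} \frac{(\delta_{11.2j} + \delta_{12j})\, t_{1j}\, e^{\nu_{22} t_{1j}}}{e^{\nu_{11} t_{1j}} - e^{\nu_{22} t_{1j}}} = 0, \end{aligned} \tag{35a}$$
$$\begin{aligned} & -\left[\sum_{j=1}^{n_1} \left\{\varepsilon_{12j}(T_{1j} - \tau_j) + (\delta'_{11.2j} + \delta'_{12j})(t_{1j} - \tau_j)\right\} + \sum_{j=1}^{n_2} \left(\varepsilon_{22j} T_{2j} + \delta_{21j} t_{2j} + \delta_{22j} t_{2j}\right)\right] \\ & \quad + \frac{1}{\mu_{22}}\left(Y'_{12} + Y_{12} + Y_{22}\right) - \frac{1}{\nu_{11} - \nu_{22}}\left(Y_{11.2} + Y_{12}\right) + \sum_{j=1}^{n_1} \frac{(\delta_{11.2j} + \delta_{12j})\, t_{1j}\, e^{\nu_{22} t_{1j}}}{e^{\nu_{11} t_{1j}} - e^{\nu_{22} t_{1j}}} = 0, \end{aligned} \tag{36a}$$
where $\sum_j \delta'_{11.2j} = Y'_{11.2}$ and $\sum_j \delta'_{12j} = Y'_{12}$.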

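To solve Eqs. (33a) through (36a) simultaneously for the unknown
parameters, we subtract (34a) from (33a) to obtain
$$\left(X_{12} + Y_{11.2} + Y_{12} + Y'_{11.2} + Y'_{12}\right)\mu_{11} - Y_{11.1}\nu_{12} = 0. \tag{37}$$
Similarly, subtracting (36a) from (35a) gives
$$\left(Y'_{11.2} + Y_{11.2} + Y_{21}\right)\mu_{22} - \left(Y'_{12} + Y_{12} + Y_{22}\right)\mu_{21} = 0; \tag{38}$$
addition of (33a) and (35a) gives
$$\begin{aligned} & -\sum_{j=1}^{n_1} \left[(\varepsilon_{11j} + \varepsilon_{12j}) T_{1j} + (\delta_{11.1j} + \delta_{11.2j} + \delta_{12j} + \delta'_{11.2j} + \delta'_{12j})\, t_{1j}\right] - \sum_{j=1}^{n_2} \left[\varepsilon_{22j} T_{2j} + (\delta_{21j} + \delta_{22j})\, t_{2j}\right] \\ & \quad + \frac{1}{\nu_{12}}\left(X_{12} + Y_{11.2} + Y_{12} + Y'_{11.2} + Y'_{12}\right) + \frac{1}{\mu_{21}}\left(Y_{11.2} + Y_{21} + Y'_{11.2}\right) = 0. \end{aligned} \tag{39}$$
While the above subtraction and addition operations have considerably
simplified the original equations (33a) to (36a), they have also eliminated
one equation in the process. In the three equations (37), (38), and (39), we
still have the same four parameters that we had originally. Another equation
is needed.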
Recall from the formula (15) that $\mu_{11}\Delta + o(\Delta)$ is the probability that an
individual free from the disease under study will die from the other causes
in $(t, t + \Delta)$, whereas $\mu_{21}\Delta + o(\Delta)$ is the probability that an individual affected
with the disease will die from other causes. Thus, these are the probabilities
of dying from the same cause (the other causes) for individuals in different
health conditions. It is, therefore, reasonable to assume a definite relationship
between $\mu_{21}$ and $\mu_{11}$. Specifically, we let
$$\mu_{21} = k\mu_{11}, \tag{40}$$
where $k$, which represents the relative risk of dying from other causes for an
individual with CHD as compared with one without CHD, may be determined
from other sources. Substituting (37) in (39) and using (40), we
obtain the estimator
$$\hat\mu_{11} = \frac{kY_{11.1} + Y_{11.2} + Y_{21} + Y'_{11.2}}{k\left[\displaystyle\sum_{j=1}^{n_1} \left\{(\varepsilon_{11j} + \varepsilon_{12j}) T_{1j} + (\delta_{11.1j} + \delta_{11.2j} + \delta_{12j} + \delta'_{11.2j} + \delta'_{12j})\, t_{1j}\right\} + \sum_{j=1}^{n_2} \left\{\varepsilon_{22j} T_{2j} + (\delta_{21j} + \delta_{22j})\, t_{2j}\right\}\right]}, \tag{41}$$
which is equivalent to the ratio of the number dying from other causes ($R_1$)
to the length of time lived by the $n$ individuals, dependent on the value of $k$.
From (40) we have
$$\hat\mu_{21} = k\hat\mu_{11}. \tag{42}$$
Now use (37) and (38), respectively, to obtain the estimators
$$\hat\nu_{12} = \frac{X_{12} + Y_{11.2} + Y_{12} + Y'_{11.2} + Y'_{12}}{Y_{11.1}}\, \hat\mu_{11} \tag{43}$$
and
$$\hat\mu_{22} = \frac{Y_{12} + Y_{22} + Y'_{12}}{Y_{11.2} + Y_{21} + Y'_{11.2}}\, k\hat\mu_{11}. \tag{44}$$
The estimators given in (41) through (44) are maximum-likelihood estimators,
and as such, they are Fisher consistent. That is, when the random
variables in (41) through (44) are replaced by the corresponding expected
values, we recover the parameters $\nu_{12}$, $\mu_{11}$, $\mu_{21}$, and $\mu_{22}$. This property is
easily verified by direct computation.
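The Fisher-consistency property can be checked numerically. The sketch below makes several simplifying assumptions that are not in the original: a common observation period $T$ for everyone, all onset times of the deceased treated as unrecorded (so the primed counts are zero), and hypothetical round parameter values. The function names and dictionary keys are illustrative. Expected counts and expected total time lived are computed from the constant-intensity model and then passed through the estimators (41) through (44).

```python
import math


def ml_estimates(counts, total_time, k):
    """Estimators (41)-(44); total_time is the time lived by all n individuals."""
    other_cause_after_chd = counts["Y11.2"] + counts["Y'11.2"] + counts["Y21"]
    chd_deaths = counts["Y12"] + counts["Y'12"] + counts["Y22"]
    onsets = (counts["X12"] + counts["Y11.2"] + counts["Y12"]
              + counts["Y'11.2"] + counts["Y'12"])
    mu11 = (k * counts["Y11.1"] + other_cause_after_chd) / (k * total_time)  # (41)
    mu21 = k * mu11                                                          # (42)
    nu12 = onsets / counts["Y11.1"] * mu11                                   # (43)
    mu22 = chd_deaths / other_cause_after_chd * mu21                         # (44)
    return {"nu12": nu12, "mu11": mu11, "mu21": mu21, "mu22": mu22}


def expected_data(nu12, mu11, mu21, mu22, n1, n2, T):
    """Expected counts and expected total time lived under constant intensities."""
    nu11 = -(nu12 + mu11)
    nu22 = -(mu21 + mu22)
    e1, e2 = math.exp(nu11 * T), math.exp(nu22 * T)
    c = nu12 / (nu11 - nu22)
    years_healthy = (e1 - 1.0) / nu11                           # per person starting in S1
    years_ill = c * ((e1 - 1.0) / nu11 - (e2 - 1.0) / nu22)     # per person starting in S1
    years_ill_group2 = (e2 - 1.0) / nu22                        # per person starting in S2
    counts = {
        "X12": n1 * c * (e1 - e2),
        "Y11.1": n1 * mu11 * years_healthy,
        "Y11.2": n1 * mu21 * years_ill,
        "Y12": n1 * mu22 * years_ill,
        "Y21": n2 * mu21 * years_ill_group2,
        "Y22": n2 * mu22 * years_ill_group2,
        "Y'11.2": 0.0,   # assumption: no deaths with recorded onset time
        "Y'12": 0.0,
    }
    total_time = n1 * (years_healthy + years_ill) + n2 * years_ill_group2
    return counts, total_time


# Hypothetical parameter values; group sizes equal the totals of Table 2.
true_values = {"nu12": 0.004, "mu11": 0.004, "mu21": 0.002, "mu22": 0.010}
counts, total_time = expected_data(n1=3585, n2=129, T=4.0, **true_values)
recovered = ml_estimates(counts, total_time, k=true_values["mu21"] / true_values["mu11"])
print(recovered)
```

With expected values substituted for the random variables, the four estimators return the parameter values that generated them, up to floating-point rounding.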
IV. VARIANCES AND COVARIANCES OF THE ESTIMATORS

The maximum-likelihood estimators in (41) through (44) are ratios of
random variables. Therefore, exact formulas for their variances and covariances
cannot be derived. Approximate formulas, however, may be obtained
by using Fisher’s information matrix [12a]. It should be mentioned that in
this approach, the variance of an estimator is derived from the likelihood
function (or more precisely, the expectations of the second derivatives of
$\ln L$), and is not directly related to the estimator itself. Therefore, we shall
consider $\nu_{12}$, $\mu_{11}$, $\mu_{21}$, and $\mu_{22}$ as independent parameters without taking into
account the constraint (40), which was introduced merely for obtaining
explicit formulas for the estimators. Many expected values are necessary
for computing the information matrix as defined below. These expectations
are
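$$E(X_{11}) = \sum_{j=1}^{n_1} \exp\{\nu_{11} T_{1j}\}, \tag{A}$$
$$E(X_{12}) = \sum_{j=1}^{n_1} \frac{\nu_{12}}{\nu_{11} - \nu_{22}} \left(\exp\{\nu_{11} T_{1j}\} - \exp\{\nu_{22} T_{1j}\}\right), \tag{B}$$
$$E(Y_{11.1}) = \sum_{j=1}^{n_1} \frac{\mu_{11}}{\nu_{11}} \left(\exp\{\nu_{11} T_{1j}\} - 1\right), \tag{C}$$
[Expressions (D) through (F) are not legible in the source.]
$$E(\delta_{11.2j}\, t_{1j}) = \frac{\nu_{12}\mu_{21}}{\nu_{11} - \nu_{22}} \left[\frac{1}{\nu_{11}^2}\left[\exp\{\nu_{11} T_{1j}\}(\nu_{11} T_{1j} - 1) + 1\right] - \frac{1}{\nu_{22}^2}\left[\exp\{\nu_{22} T_{1j}\}(\nu_{22} T_{1j} - 1) + 1\right]\right], \tag{G}$$
$$E(\delta_{12j}\, t_{1j}) = \frac{\nu_{12}\mu_{22}}{\nu_{11} - \nu_{22}} \left[\frac{1}{\nu_{11}^2}\left[\exp\{\nu_{11} T_{1j}\}(\nu_{11} T_{1j} - 1) + 1\right] - \frac{1}{\nu_{22}^2}\left[\exp\{\nu_{22} T_{1j}\}(\nu_{22} T_{1j} - 1) + 1\right]\right], \tag{H}$$
$$E\left[\frac{\delta_{11.2j}\, t_{1j}^2 \exp\{(\nu_{11} + \nu_{22}) t_{1j}\}}{\left(\exp\{\nu_{11} t_{1j}\} - \exp\{\nu_{22} t_{1j}\}\right)^2}\right] = \frac{\nu_{12}\mu_{21}}{\nu_{11} - \nu_{22}} \int_0^{T_{1j}} t_{1j}^2 \left(\exp\{-\nu_{22} t_{1j}\} - \exp\{-\nu_{11} t_{1j}\}\right)^{-1} dt_{1j}, \tag{I}$$
$$E\left[\frac{\delta_{12j}\, t_{1j}^2 \exp\{(\nu_{11} + \nu_{22}) t_{1j}\}}{\left(\exp\{\nu_{11} t_{1j}\} - \exp\{\nu_{22} t_{1j}\}\right)^2}\right] = \frac{\nu_{12}\mu_{22}}{\nu_{11} - \nu_{22}} \int_0^{T_{1j}} t_{1j}^2 \left(\exp\{-\nu_{22} t_{1j}\} - \exp\{-\nu_{11} t_{1j}\}\right)^{-1} dt_{1j}, \tag{J}$$
$$E(Y_{11.2} + Y_{12}) = -\frac{\nu_{22}}{\mu_{21}} E(Y_{11.2}), \tag{K}$$
$$E(X_{22}) = \sum_{j=1}^{n_2} \exp\{\nu_{22} T_{2j}\}, \tag{L}$$
$$E(Y_{21}) = \sum_{j=1}^{n_2} \frac{\mu_{21}}{\nu_{22}} \left(\exp\{\nu_{22} T_{2j}\} - 1\right), \tag{M}$$
$$E(Y_{22}) = \sum_{j=1}^{n_2} \frac{\mu_{22}}{\nu_{22}} \left(\exp\{\nu_{22} T_{2j}\} - 1\right), \tag{N}$$
$$E(\delta_{21j}\, t_{2j}) = E\left[\delta_{21j} E(t_{2j} \mid \delta_{21j})\right] = \frac{\mu_{21}}{\nu_{22}^2} \left[\exp\{\nu_{22} T_{2j}\}(\nu_{22} T_{2j} - 1) + 1\right], \tag{O}$$
[Expression (P) is not legible in the source.]
$$E(\delta'_{11.2j}) = E(\delta_{11.2j}), \tag{Q}$$
$$E(\delta'_{12j}) = E(\delta_{12j}). \tag{R}$$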
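The information matrix is defined in this case as
$$-E \begin{bmatrix} \dfrac{\partial^2 \ln L}{\partial \nu_{12}^2} & \dfrac{\partial^2 \ln L}{\partial \nu_{12}\,\partial \mu_{11}} & \dfrac{\partial^2 \ln L}{\partial \nu_{12}\,\partial \mu_{21}} & \dfrac{\partial^2 \ln L}{\partial \nu_{12}\,\partial \mu_{22}} \\ \dfrac{\partial^2 \ln L}{\partial \mu_{11}\,\partial \nu_{12}} & \dfrac{\partial^2 \ln L}{\partial \mu_{11}^2} & \dfrac{\partial^2 \ln L}{\partial \mu_{11}\,\partial \mu_{21}} & \dfrac{\partial^2 \ln L}{\partial \mu_{11}\,\partial \mu_{22}} \\ \dfrac{\partial^2 \ln L}{\partial \mu_{21}\,\partial \nu_{12}} & \dfrac{\partial^2 \ln L}{\partial \mu_{21}\,\partial \mu_{11}} & \dfrac{\partial^2 \ln L}{\partial \mu_{21}^2} & \dfrac{\partial^2 \ln L}{\partial \mu_{21}\,\partial \mu_{22}} \\ \dfrac{\partial^2 \ln L}{\partial \mu_{22}\,\partial \nu_{12}} & \dfrac{\partial^2 \ln L}{\partial \mu_{22}\,\partial \mu_{11}} & \dfrac{\partial^2 \ln L}{\partial \mu_{22}\,\partial \mu_{21}} & \dfrac{\partial^2 \ln L}{\partial \mu_{22}^2} \end{bmatrix}, \tag{45}$$
where
$$-E\left(\frac{\partial^2}{\partial \nu_{12}^2} \ln L\right) = A + E, \tag{46}$$
$$-E\left(\frac{\partial^2}{\partial \mu_{11}^2} \ln L\right) = B + E, \tag{47}$$
$$-E\left(\frac{\partial^2}{\partial \mu_{21}^2} \ln L\right) = C + E, \tag{48}$$
$$-E\left(\frac{\partial^2}{\partial \mu_{22}^2} \ln L\right) = D + E, \tag{49}$$
with $A$, $B$, $C$, $D$, and $E$ as given in (63) through (67).
The expectations of the “mixed” derivatives are all equal, except for sign.
Namely,
$$-E\left(\frac{\partial^2 \ln L}{\partial \nu_{12}\,\partial \mu_{11}}\right) = -E\left(\frac{\partial^2 \ln L}{\partial \mu_{21}\,\partial \mu_{22}}\right) = E\left(\frac{\partial^2 \ln L}{\partial \nu_{12}\,\partial \mu_{21}}\right) = E\left(\frac{\partial^2 \ln L}{\partial \nu_{12}\,\partial \mu_{22}}\right) = E\left(\frac{\partial^2 \ln L}{\partial \mu_{11}\,\partial \mu_{21}}\right) = E\left(\frac{\partial^2 \ln L}{\partial \mu_{11}\,\partial \mu_{22}}\right), \tag{50}$$
with
$$-E\left(\frac{\partial^2}{\partial \nu_{12}\,\partial \mu_{11}} \ln L\right) = n_1\left[\frac{\nu_{22}}{(\nu_{11} - \nu_{22})^2 \mu_{21}} E(\bar\delta_{11.2}) - I\right], \tag{51}$$
where
$$I = \frac{\nu_{12}\nu_{22}}{n_1(\nu_{11} - \nu_{22})} \sum_{j=1}^{n_1} \int_0^{T_{1j}} t_{1j}^2 \left[e^{-\nu_{22} t_{1j}} - e^{-\nu_{11} t_{1j}}\right]^{-1} dt_{1j}, \tag{52}$$
and barred symbols such as $\bar\varepsilon_{12}$, $\bar\delta_{11.1}$, $\bar\delta_{11.2}$, and $\bar\delta_{21}$ are the respective sample means. The inverse of
the information matrix is the covariance matrix of the estimators $\hat\nu_{12}$, $\hat\mu_{11}$,
$\hat\mu_{21}$, and $\hat\mu_{22}$. Simple computations yield the approximate formulas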
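$$\mathrm{Var}(\hat\nu_{12}) = \frac{BCD + EBD + EBC + ECD}{ABCD + EABC + EABD + EACD + EBCD}, \tag{53}$$
$$\mathrm{Var}(\hat\mu_{11}) = \frac{ACD + ACE + ADE + CDE}{ABCD + EABC + EABD + EACD + EBCD}, \tag{54}$$
$$\mathrm{Var}(\hat\mu_{21}) = \frac{ABD + ABE + ADE + BDE}{ABCD + EABC + EABD + EACD + EBCD}, \tag{55}$$
$$\mathrm{Var}(\hat\mu_{22}) = \frac{ABC + ABE + ACE + BCE}{ABCD + EABC + EABD + EACD + EBCD}, \tag{56}$$
$$\mathrm{Cov}(\hat\nu_{12}, \hat\mu_{11}) = \frac{-CDE}{ABCD + EABC + EABD + EACD + EBCD}, \tag{57}$$
$$\mathrm{Cov}(\hat\nu_{12}, \hat\mu_{21}) = \frac{BDE}{ABCD + EABC + EABD + EACD + EBCD}, \tag{58}$$
$$\mathrm{Cov}(\hat\nu_{12}, \hat\mu_{22}) = \frac{BCE}{ABCD + EABC + EABD + EACD + EBCD}, \tag{59}$$
$$\mathrm{Cov}(\hat\mu_{11}, \hat\mu_{21}) = \frac{ADE}{ABCD + EABC + EABD + EACD + EBCD}, \tag{60}$$
$$\mathrm{Cov}(\hat\mu_{11}, \hat\mu_{22}) = \frac{ACE}{ABCD + EABC + EABD + EACD + EBCD}, \tag{61}$$
$$\mathrm{Cov}(\hat\mu_{21}, \hat\mu_{22}) = \frac{-ABE}{ABCD + EABC + EABD + EACD + EBCD}, \tag{62}$$
where
$$A = \frac{n_1}{\nu_{12}^2}\left[E(\bar\varepsilon_{12}) - \frac{2\nu_{22}}{\mu_{21}} E(\bar\delta_{11.2})\right], \tag{63}$$
$$B = \frac{n_1}{\mu_{11}^2} E(\bar\delta_{11.1}), \tag{64}$$
$$C = \frac{n_1}{\mu_{21}^2}\left[2E(\bar\delta_{11.2}) + \theta E(\bar\delta_{21})\right], \tag{65}$$
$$D = \frac{n_1}{\mu_{22}^2}\left[2E(\bar\delta_{12}) + \theta E(\bar\delta_{22})\right], \tag{66}$$
$$E = n_1\left[\frac{\nu_{22}}{(\nu_{11} - \nu_{22})^2 \mu_{21}} E(\bar\delta_{11.2}) - I\right], \tag{67}$$
where $\theta$ stands for the ratio $n_2/n_1$. Thus, as $n_1 \to \infty$, the variances in (53)
through (56) all vanish, and $\hat\nu_{12}$, $\hat\mu_{11}$, $\hat\mu_{21}$, and $\hat\mu_{22}$ are consistent estimators of
the corresponding parameters.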
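The closed-form variances and covariances can be checked against a direct matrix inversion. The sketch below assumes that the information matrix has diagonal entries $A+E$, $B+E$, $C+E$, $D+E$ and off-diagonal entries $\pm E$ with the sign pattern implied by (57) through (62); this reading reproduces formulas (53) through (62) but is an interpretation, not a quotation of the paper. The function names and the numerical values of $A$ through $E$ are hypothetical.

```python
SIGNS = (1.0, 1.0, -1.0, -1.0)  # order: nu12, mu11, mu21, mu22


def information_matrix(A, B, C, D, E):
    """Assumed structure: diag(A, B, C, D) + E * s s', with s = SIGNS."""
    diagonal = (A, B, C, D)
    return [[(diagonal[i] if i == j else 0.0) + E * SIGNS[i] * SIGNS[j]
             for j in range(4)] for i in range(4)]


def covariance_matrix(A, B, C, D, E):
    """Closed-form inverse following the pattern of formulas (53)-(62)."""
    diagonal = (A, B, C, D)
    det = A * B * C * D + E * (A * B * C + A * B * D + A * C * D + B * C * D)
    cov = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        others = [diagonal[m] for m in range(4) if m != i]
        p, q, r = others
        # Variance: product of the other three terms plus E times their pairwise products
        cov[i][i] = (p * q * r + E * (p * q + p * r + q * r)) / det
        for j in range(4):
            if i != j:
                rest = [diagonal[m] for m in range(4) if m not in (i, j)]
                cov[i][j] = -SIGNS[i] * SIGNS[j] * E * rest[0] * rest[1] / det
    return cov


def matmul(X, Y):
    return [[sum(X[i][m] * Y[m][j] for m in range(4)) for j in range(4)]
            for i in range(4)]


# Hypothetical magnitudes, for illustration only.
A, B, C, D, E = 2.0e6, 3.0e6, 1.0e5, 4.0e5, 5.0e4
cov = covariance_matrix(A, B, C, D, E)
product = matmul(information_matrix(A, B, C, D, E), cov)
for row in product:
    print([round(value, 10) for value in row])
```

The product of the assumed information matrix and the closed-form covariance matrix should print as the identity matrix. When $E = 0$ the matrix is diagonal, so every covariance vanishes and the variances reduce to $1/A$, $1/B$, $1/C$, and $1/D$.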
V. APPLICATION OF THE MODEL

The Japanese American Health Research Program (JAHRP) is one part
of a tripartite prospective investigation into coronary heart disease, the
other two parts of the study being carried out in Hawaii and Japan. A
comprehensive description of all three study cohorts can be found in an
article in the Journal of Chronic Diseases [18].
In California a total population register of Japanese persons living in the
eight-county San Francisco Bay Area was developed, and in 1969 all men
between the ages of 30 and 69 were invited to participate in a screening
examination at the Kaiser Permanente Clinics in Oakland and San Fran-
cisco. In addition to a complete physical examination, the subjects filled out
extensive questionnaires on medical history, diet, and cultural background.
Since the time of the initial examination a continuous collection of informa-
tion on mortality in the cohort has been maintained. In addition to the
mortality surveillance, mail surveillance of morbidity was also carried out
on the second and fourth anniversary of the subject’s initial examination. If
there was any indication on the surveillance form that the subject might
have experienced an incidence of coronary heart disease, a follow-up letter
was sent to the subject’s doctor and/or hospital requesting additional
information on the incident. Thus a conclusive medical decision could be
made according to previously established criteria as to whether the subject
had developed clinical coronary heart disease.
The four-year surveillance file from the Japanese American Health
Research Program to which the proposed transition probability model was
applied contained the following information on each of 3809 subjects:
(1) JAHRP identification number
(2) Date of initial examination
(3) At-risk code
    1 = initial ECG tracing reported as any one of Minnesota codes 1-1-1
        to 1-3-6, 6-4, 7-1 (prevalent)
    0 = initial ECG tracing reported as other than that described above
    9 = no initial ECG
(4) CHD event, non-fatal
    0 = at-risk code equals 0 and no incidence
    1 = not applicable (at-risk code equals 1)
    2 = definite CHD incident
    3 = possible CHD incident
    4 = at-risk code equals 9
(5) Date of CHD event
    (blank if not applicable)
(6) Mortality
    0 = alive at last follow-up
    1 = died, non-incident
    2 = died, definite CHD incident
    3 = died, possible CHD incident
    4 = at-risk code equals 9
(7) Date of death
    (blank if not applicable)
(8) Cause of death
    ICDA Code, 8th Revision.
Recall from Eqs. (41) through (44) that the maximum-likelihood estimates of $\nu_{12}$, $\mu_{11}$,
$\mu_{21}$, $\mu_{22}$, $\nu_{11}$, and $\nu_{22}$ are dependent on the value of $k$, the relative risk of
dying from “other causes” for an individual with CHD, as compared to an
individual without CHD.
Also recall from the introductory paragraph of Section IV that the
formulas for the variances and covariances of the estimators were derived
on the basis of the asymptotic properties of maximum-likelihood estimation
and are independent of the constraint of Eq. (40) (i.e., $\mu_{21} = k\mu_{11}$) which was
imposed on the parameters themselves. However, it is still necessary to use
the estimators of the parameters in order to obtain numerical values for the
variances and covariances in the data analysis. Obviously then, the value of
k affects the variances and covariances as well as the parameters. Ideally,
the value of k should be determined from the mortality data in the general
population or from a large-scale study population. Since we have so far
failed to unearth such information, and in order to study the effect of the
value of k on our model, it was decided to compute the estimates of the
parameters and their variances and covariances, as well as the transition
probabilities for several values of relative risk ranging from 0.5 to 3.0 (see
Tables 3 to 8).
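The variance entries for $\hat\nu_{11}$ and $\hat\nu_{22}$ in these tables can be related to the other rows. Because $\nu_{11} = -(\nu_{12} + \mu_{11})$ and $\nu_{22} = -(\mu_{21} + \mu_{22})$, the variance of each is the sum of the two component variances plus twice their covariance. The short sketch below uses the values reported in Table 3, whose footnote states that the covariance estimates are zero, and takes the coefficient of variation to be the standard deviation divided by the estimate; that definition is an assumption, since the text does not state it explicitly. Variable names are illustrative.

```python
import math

# Variances reported in Table 3 (relative risk k = 0.5).
var_nu12, var_mu11 = 3.4313e-7, 3.0504e-7
var_mu21, var_mu22 = 3.1786e-6, 9.8380e-6
covariance = 0.0  # Table 3 footnote: all covariance estimates = 0

# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y); the overall sign change does not matter.
var_nu11 = var_nu12 + var_mu11 + 2.0 * covariance
var_nu22 = var_mu21 + var_mu22 + 2.0 * covariance

# Assumed definition: coefficient of variation = standard deviation / estimate.
cv_nu11 = math.sqrt(var_nu11) / -0.00915
cv_nu22 = math.sqrt(var_nu22) / -0.01192

print(f"Var(nu11) = {var_nu11:.4e}, CV = {cv_nu11:.5f}")
print(f"Var(nu22) = {var_nu22:.4e}, CV = {cv_nu22:.5f}")
```

The sums agree with the tabulated variances of $\hat\nu_{11}$ and $\hat\nu_{22}$, and the coefficients of variation agree with the table to about three decimal places, the remaining difference being attributable to rounding of the tabulated estimates.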
The application of the proposed model to “real” data from an epidemio-
logical study of CHD has resulted in some interesting observations concern-
ing the transition probabilities, and it is these probabilities which would be
of the most interest to the cardiovascular research worker. However, certain
questions have been raised at the same time, and the applicability of the
model will be enhanced once these questions have been answered. The
question of just what is an appropriate value of k for the JAHRP and other
follow-up studies of CHD such as Framingham is very important, and once
it is answered, one’s attention could be directed to the relevant tables in this
section. Perhaps subsets of the data could be used to get an estimate of k, or
perhaps data from the National Center for Health Statistics might be
available for this task. The accuracy of the assumption that the parameters
are independent of time should be investigated and subsets of the long-term
incidence data from JAHRP, Framingham, and other studies might provide
an answer. The “fit” of the model to other data sets is another important
issue to be investigated. It is the authors’ hope that future investigators will
find these questions of sufficient importance and relevance to investigate
and to solve.
VI. SELECTED TABLES
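TABLE 2
JAHRP Study Population: Case II

| State at time 0 | Transition in interval (0, 4) | State at time 4 yr | Number of individuals |
|---|---|---|---|
| $S_1$ | $S_1 \to S_1$ | $S_1$ | $X_{11} = 3471$ |
| | $S_1 \to S_2$ | $S_2$ | $X_{12} = 42$ |
| | $S_1 \to R_1$ | $R_1$ | $Y_{11.1} = 55$ |
| | $S_1 \to S_2 \to R_1$ | $R_1$ | $Y_{11.2} = 0$ |
| | $S_1 \to S_2 \to R_2$ | $R_2$ | $Y_{12} = 17$ |
| $S_2$ | $S_2 \to S_2$ | $S_2$ | $X_{22} = 119$ |
| | $S_2 \to R_1$ | $R_1$ | $Y_{21} = 5$ |
| | $S_2 \to R_2$ | $R_2$ | $Y_{22} = 5$ |
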
TABLE 3
Relative Risk = 0.5

| Parameter | Estimator | Variance of estimator^a | Coefficient of variation | Transition probability |
|---|---|---|---|---|
| $\nu_{12}$ | 0.00474 | 3.4313 × 10^-7 | 0.12371 | $P_{11}(0,4) = .9641$ |
| $\mu_{11}$ | 0.00441 | 3.0504 × 10^-7 | 0.12512 | $P_{12}(0,4) = .0182$ |
| $\mu_{21}$ | 0.00221 | 3.1786 × 10^-6 | 0.80779 | $Q_{11.1}(0,4) = .0173$ |
| $\mu_{22}$ | 0.00971 | 9.8380 × 10^-6 | 0.32299 | $Q_{11.2}(0,4) = .00008$ |
| $\nu_{11}$ | -0.00915 | 6.4817 × 10^-7 | -0.08799 | $Q_{12}(0,4) = .0004$ |
| $\nu_{22}$ | -0.01192 | 1.3017 × 10^-5 | -0.30272 | $P_{22}(0,4) = .9534$ |
| | | | | $Q_{21}(0,4) = .0086$ |
| | | | | $Q_{22}(0,4) = .0379$ |

^a All covariance estimates = 0.
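TABLE 4
Relative Risk = 0.7

| Parameter | Estimator | Variance of estimator^a | Coefficient of variation | Transition probability |
|---|---|---|---|---|
| $\nu_{12}$ | 0.00454 | 3.3521 × 10^-7 | 0.12789 | $P_{11}(0,4) = .9656$ |
| $\mu_{11}$ | 0.00422 | 2.9293 × 10^-7 | 0.12825 | $P_{12}(0,4) = .0172$ |
| $\mu_{21}$ | 0.00295 | 4.5588 × 10^-6 | 0.72277 | $Q_{11.1}(0,4) = .0166$ |
| $\mu_{22}$ | 0.01300 | 1.7897 × 10^-5 | 0.32547 | $Q_{11.2}(0,4) = .0001$ |
| $\nu_{11}$ | -0.00875 | 6.2814 × 10^-7 | -0.09061 | $Q_{12}(0,4) = .0005$ |
| $\nu_{22}$ | -0.01595 | 2.2455 × 10^-5 | -0.29706 | $P_{22}(0,4) = .9382$ |
| | | | | $Q_{21}(0,4) = .0114$ |
| | | | | $Q_{22}(0,4) = .0504$ |

^a All covariance estimates = 0.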
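TABLE 5
Relative Risk = 1.0

| Parameter | Estimator | Variance of estimator^a | Coefficient of variation | Transition probability |
|---|---|---|---|---|
| $\nu_{12}$ | 0.00437 | 3.3208 × 10^-7 | 0.13184 | $P_{11}(0,4) = .9668$ |
| $\mu_{11}$ | 0.00407 | 2.8299 × 10^-7 | 0.13056 | $P_{12}(0,4) = .0164$ |
| $\mu_{21}$ | 0.00407 | 6.4913 × 10^-6 | 0.62529 | $Q_{11.1}(0,4) = .0160$ |
| $\mu_{22}$ | 0.01793 | 2.6843 × 10^-5 | 0.28899 | $Q_{11.2}(0,4) = .00014$ |
| $\nu_{11}$ | -0.00845 | 6.1507 × 10^-7 | -0.09286 | $Q_{12}(0,4) = .0006$ |
| $\nu_{22}$ | -0.02200 | 3.3334 × 10^-5 | -0.26240 | $P_{22}(0,4) = .9158$ |
| | | | | $Q_{21}(0,4) = .0156$ |
| | | | | $Q_{22}(0,4) = .0686$ |
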
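TABLE 6
Relative Risk = 1.4

| Parameter | Estimator | Variance of estimator^a | Coefficient of variation | Transition probability |
|---|---|---|---|---|
| $\nu_{12}$ | 0.00427 | 3.3528 × 10^-7 | 0.13570 | $P_{11}(0,4) = .9676$ |
| $\mu_{11}$ | 0.00398 | 2.7624 × 10^-7 | 0.13214 | $P_{12}(0,4) = .0158$ |
| $\mu_{21}$ | 0.00557 | 9.0910 × 10^-6 | 0.54145 | $Q_{11.1}(0,4) = .0156$ |
| $\mu_{22}$ | 0.02450 | 3.8274 × 10^-5 | 0.25250 | $Q_{11.2}(0,4) = .00018$ |
| $\nu_{11}$ | -0.00824 | 6.1152 × 10^-7 | -0.09485 | $Q_{12}(0,4) = .0008$ |
| $\nu_{22}$ | -0.03007 | 4.7365 × 10^-5 | -0.22887 | $P_{22}(0,4) = .8867$ |
| | | | | $Q_{21}(0,4) = .0210$ |
| | | | | $Q_{22}(0,4) = .0923$ |

^a All covariance estimates = 0.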

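TABLE 7
Relative Risk = 2.0

| Parameter | Estimator | Variance of estimator^a | Coefficient of variation | Transition probability |
|---|---|---|---|---|
| $\nu_{12}$ | 0.00419 | 3.4644 × 10^-7 | 0.14052 | $P_{11}(0,4) = .9681$ |
| $\mu_{11}$ | 0.00390 | 2.7116 × 10^-7 | 0.13335 | $P_{12}(0,4) = .0152$ |
| $\mu_{21}$ | 0.00781 | 1.3115 × 10^-5 | 0.46372 | $Q_{11.1}(0,4) = .0154$ |
| $\mu_{22}$ | 0.03436 | 5.5735 × 10^-5 | 0.21725 | $Q_{11.2}(0,4) = .0002$ |
| $\nu_{11}$ | -0.00809 | 6.1760 × 10^-7 | -0.09710 | $Q_{12}(0,4) = .0011$ |
| $\nu_{22}$ | -0.04217 | 6.8851 × 10^-5 | -0.19676 | $P_{22}(0,4) = .8448$ |
| | | | | $Q_{21}(0,4) = .0288$ |
| | | | | $Q_{22}(0,4) = .1265$ |

^a All covariance estimates = 0.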
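TABLE 8
Relative Risk = 3.0

| Parameter | Estimator | Variance of estimator^a | Coefficient of variation | Transition probability |
|---|---|---|---|---|
| $\nu_{12}$ | 0.00413 | 3.7272 × 10^-7 | 0.14789 | $P_{11}(0,4) = .9686$ |
| $\mu_{11}$ | 0.00385 | 2.6719 × 10^-7 | 0.13432 | $P_{12}(0,4) = .0144$ |
| $\mu_{21}$ | 0.01154 | 2.0198 × 10^-5 | 0.38929 | $Q_{11.1}(0,4) = .0152$ |
| $\mu_{22}$ | 0.05080 | 8.6356 × 10^-5 | 0.18294 | $Q_{11.2}(0,4) = .0004$ |
| $\nu_{11}$ | -0.00798 | 6.3991 × 10^-7 | -0.10029 | $Q_{12}(0,4) = .0015$ |
| $\nu_{22}$ | -0.06234 | 1.0655 × 10^-4 | -0.16558 | $P_{22}(0,4) = .7793$ |
| | | | | $Q_{21}(0,4) = .0409$ |
| | | | | $Q_{22}(0,4) = .1798$ |
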
REFERENCES
J. Berkson and L. Elveback, Competing exponential risks, with particular reference to

the study of smoking and lung cancer, J. A. S. A. 55 (291) 415-428 (1960).
J. Berkson and R. P. Gage, Calculation of survival rates for cancer, Proc. Stuff Meet.
Mayo C/in. 25, 27&286 (1950).
J. W. Boag, Maximum likelihood estimates of the proportion of patients cured by
cancer therapy, J. R. Star. Sot. B 11, 15-44 (1949).
C. L. Chiang, A stochastic study of the life table and its applications. III. The
follow-up study with the consideration of competing risks, Biometrics 17 (I), 57-78
(1961).
C. L. Chiang, On the probability of death from specific causes in the presence of
competing risks, in Proceedings of the Fourrh Berkeley Symposium on Mathematical
Sfatistics and Probabiliry, (J. Neyman, Ed.), Univ. of Calif. Press, 1961, Vol. IV, pp.
168-180.
6 C. L. Chiang, Introduction to Stochastic Processes in Biostatistics, Wiley, New York,

1968.
7 J. Cornfield, The estimation of the probability of developing a disease in the presence
of competing risks, A.J.P.H. 47, 601-607 (1957).
8 D. R. Cox, Regression models and life tables, J.R. Stat. Sec. B 34, 187-220 (1972).
9 S. J. Cutler and F. Ederer, Maximum utilization of the life table method in analyzing
survival, J. Chron. Dis. 8 (6) 699-712 (1958).
10 P. M. Densen, Long-time follow-up in morbididty studies: The definition of the
group to be followed, Hum. Biol. 22 (4), 233-237 (1958).
II H. F. Dorn, Methods of analysis for follow-up studies, Hum. Biol. 22 (4) 238-248
(1950).
12 P. Feigl and M. Zelen, Estimation of exponential survival probabilities with concom-
itant information, Biometrics 21, 826-838 (1965).
12a R. A. Fisher, Theory of statistical estimation, Proc. Camb. Phil. Soc. 22, 700-725
(1925).
13 E. Fix and J. Neyman, A simple stochastic model of recovery, relapse, death, and loss
of patients, Hum. Biol. 23, 205-241 (1951).
14 W. H. Frost, Risk of persons in familial contact with pulmonary tuberculosis,
A.J.P.H. 23, 426-432 (1933).
15 E. A. Gehan and M. M. Siddiqui, Simple regression methods for survival time
studies, J.A.S.A. 68, 848-856 (1973).
16 R. A. Greenberg, S. Bayard, and D. Byar, Selecting concomitant variables using a
likelihood ratio step-down procedure and a method of testing goodness of fit in an
exponential survival model, Biometrics 30, 601-608 (1974).
17 T. E. Harris, P. Meier, and J. W. Tukey, Timing of the distribution of events between
observations. A contribution to the theory of follow-up studies, Hum. Biol. 22,
249-270 (1950).
18 A. Kagan, B. R. Harris, W. Winkelstein, Jr., et al., Epidemiologic studies of coronary
heart disease and stroke in Japanese men living in Japan, Hawaii and California:
Demographic, physical, dietary and biochemical characteristics, J. Chron. Dis. 27,
345-364 (1974).
19 E. L. Kaplan and P. Meier, Nonparametric estimation from incomplete observations,
J.A.S.A. 53, 457-481 (1958).
20 A. W. Kimball, Disease incidence estimation in populations subject to multiple
causes of death, Bull. Inst. Int. Stat. 36 (3), 193-204 (1958).
21 N. Mantel and D. P. Byar, Evaluation of response-time data involving transient
states: An illustration using heart-transplant data, J.A.S.A. 69 (345), 81-86 (1974).
22 B. W. Turnbull, B. W. Brown, Jr., and M. Hu, Survivorship analysis of heart
transplant data, J.A.S.A. 69 (345), 74-80 (1974).
23 C. Zippin and P. Armitage, Use of concomitant variables and incomplete survival
information in the estimation of an exponential survival parameter, Biometrics 22,
665-672 (1966).

MATHEMATICAL BIOSCIENCES 34, 325-346 (1977)
