Statistical Aspects of Clinical Trials PDF
STATISTICAL ASPECTS
OF THE DESIGN
AND ANALYSIS OF
CLINICAL TRIALS
Brian S. Everitt
Institute of Psychiatry, London
Andrew Pickles
University of Manchester
Distributed by
World Scientific Publishing Co. Pte. Ltd.
P O Box 128, Farrer Road, Singapore 912805
USA office: Suite 1B, 1060 Main Street, River Edge, NJ 07661
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
For photocopying of material in this volume, please pay a copying fee through the Copyright
Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to
photocopy is not required from the publisher.
ISBN 1-86094-153-2
CONTENTS
PREFACE V
viii Design and Analysis of Clinical Trials
7.6.1
Missing Data in a Longitudinal
Clinical Trial with an
Ordinal Response 203
7.7 SUMMARY 206
CHAPTER 8 . Survival Analysis 208
8.1 INTRODUCTION 208
8.2 THE SURVIVOR FUNCTION AND
THE HAZARD FUNCTION 209
8.2.1 Survivor Function 209
8.2.2 The Hazard Function 219
8.3 REGRESSION MODELS FOR
SURVIVAL DATA 220
8.4 ACCELERATED FAILURE TIME
MODELS 222
8.5 PROPORTIONAL HAZARDS MODEL 225
8.5.1 Proportional Hazards 225
8.5.2 The Semi-parametric
Proportional Hazard
or Cox Model 226
8.5.3 Cox Model Example 229
8.5.4 Checking the Specification
of a Cox Model 232
8.5.5 Goodness-of-fit Tests 240
8.5.6 Influence 241
8.6 TIME-VARYING COVARIATES 242
8.6.1 Modelling with Time-Varying
Covariates 242
8.6.2 The Problem of Internal or
Endogenous Covariates 243
8.6.3 Treatment Waiting Times 244
8.7 STRATIFICATION, MATCHING
AND CLUSTER SAMPLING 245
An Introduction to
Clinical Trials
1.1. INTRODUCTION
enrolled, treated and followed over the same time period. The groups
may be established through randomisation or some other method of
assignment. The outcome measure may be the result of a laboratory
test, a quality of life assessment, a rating of some characteristic or,
in some cases, the death of a patient.
As a consequence of this somewhat restricted definition, compar-
ative studies involving animals, or studies that are carried out in vitro
using biological substances from man do not qualify as clinical trials.
The definition also rules out detailed consideration of investigations
involving historical controls.
On the 20th May 1747, I took twelve patients in the scurvy, on board
the Salisbury at sea. Their cases were as similar as I could have them. They
all in general had putrid gums, the spots and lassitude, with weakness of
their knees. They lay together in one place, being a proper apartment for
the sick in the fore-hold; and had one diet in common to all, viz. water-
gruel sweetened with sugar in the morning; fresh mutton broth often times
for dinner; at other times puddings, boiled biscuit with sugar etc. And for
supper, barley and raisins, rice and currants, sago and wine, or the like.
Two of these were ordered each a quart of cider a day. Two others took
twenty-five gutts of elixir vitriol three times a day, upon an empty stomach;
using a gargle strongly acidulated with it for their mouths. Two others took
two spoonfuls of vinegar three times a day, upon an empty stomach: having
their gruels and their other food well acidulated with it, as also the gargle
for their mouths. Two of the worst patients, with the tendons in the ham
rigid (a symptom none of the rest had) were put under a course of sea-water.
Of this they drank half a pint every day, and sometimes more or less as
it operated, by way of a gentle physic. Two others had each two oranges
and one lemon given them every day. These they eat with greediness,
at different times, upon an empty stomach. They continued but six days
under this course, having consumed the quantity that could be spared. The
two remaining patients, took the bigness of a nutmeg three times a day of
an electuary recommended by a hospital-surgeon, made of garlic, mustard-seed,
rad. raphan. balsam of Peru, and gum myrrh; using for common drink
barley water well acidulated with tamarinds; by a decoction of which, with
the addition of cremor tartar, they were greatly purged three or four times
during the course. The consequence was, that the most sudden and visible
good effects were perceived from the use of the oranges and lemons; one of
those who had taken them, being at the end of six days fit for duty. The
spots were not indeed at that time quite off his body, nor his gums sound;
but without any other medicine, than a gargle of elixir vitriol, he became
quite healthy before we came into Plymouth, which was on the 16th June.
The other was the best recovered of any in his condition; and being now
deemed pretty well, was appointed nurse to the rest of the sick.
Since World War II, the clinical trial has evolved into a standard
procedure in the evaluation of new drugs. Its features include the use
of a control group of patients that do not receive the experimental
treatment, the random allocation of patients to the experimental or
control group, and the use of blind or masked assessment so that
neither the researchers nor the patients know which patients are in
either group at the time the study is conducted. The clinical trial
nicely illustrates the desire of modern democratic society to justify its
medical choices on the basis of the objectivity inherent in statistical
and quantitative data.
This book will be largely concerned with phase III trials. In order
to accumulate enough patients in a time short enough to make a
trial viable, many such trials will involve recruiting patients at more
than a single centre (for example, a clinic, a hospital, etc.); they
will be multicentre trials. The principal advantage of carrying out
a multicentre trial is that patient accrual is much quicker so that
the trial can be made larger and the planned number of patients can
be achieved more quickly. The end-result should be that a multicen-
tre trial reaches more reliable conclusions at a faster rate, so that
overall progress in the treatment of a given disease is enhanced.
Recommendations on the appropriate number of centres vary;
on the one hand, the rate of patient acquisition may be completely inadequate
when dealing with a small number of centres, but with a large
number (20 or more) potential practical problems (see Table 1.2)
may quickly outweigh benefits. There may also be other problems involving
the analysis of multicentre trials. It is likely, for example, that
the true treatment effect will not be identical at each centre. Consequently
there may be some degree of treatment-by-centre interaction
and various methods have been suggested for dealing with this possi-
bility. Details are available in Jones et al. (1998), Gould (1998) and
Senn (1998).
was cut into and the incised area was ‘cupped’ to suck out an additional
eight ounces of blood. After this, the drugging began. An emetic and
purgative were administered, and soon after a second purgative. This was
followed by an enema containing antimony, sacred bitters, rock salt, mallow
leaves, violets, beetroot, camomile flowers, fennel seed, linseed, cinnamon,
cardamom seed, saphron, cochineal, and aloes. The enema was repeated in
two hours and a purgative given. The king’s head was shaved and a blister
raised on his scalp. A sneezing powder of hellebore root was administered
and also a powder of cowslip flowers ‘to strengthen his brain.’ The cathar-
tics were repeated at frequent intervals and interspersed with a soothing
drink composed of barley water, liquorice, and sweet almond. Likewise
white wine, absinthe, and anise were given, as also were extracts of thistle
leaves, mint, rue, and angelica. For external treatment a plaster of Bur-
gundy pitch and pigeon dung was applied to the king’s feet. The bleeding
and purging continued, and to the medicaments were added melon seeds,
manna, slippery elm, black cherry water, an extract of flowers of lime, lily
of the valley, peony, lavender, and dissolved pearls. Later came gentian
root, nutmeg, quinine and cloves. The king’s condition did not improve,
indeed it grew worse, and in the emergency forty drops of extract of human
skull were administered to allay convulsions. A rallying dose of Raleigh’s
antidote was forced down the king’s throat; this antidote contained an enor-
mous number of herbs and animal extracts. Finally bezoar stone was given.
“Then”, said Scarburgh, “Alas! after an ill-fated night his serene majesty’s
strength seemed exhausted to such a degree that the whole assembly of
physicians lost all hope and became despondent; still so as not to appear to
fail in doing their duty in any detail, they brought into play the most active
cordial.” As a sort of grand summary to this pharmaceutical debauch, a
mixture of Raleigh’s antidote, pearl julep, and ammonia was forced down
the throat of the dying king.
conduct of a trial. And there are in addition some areas of trial con-
duct where the statistician needs to take particular responsibility for
ensuring that both the proposed and actual conduct of the trial are
appropriate.
A central ethical issue often identified with clinical trials is that
of randomisation. Randomised controlled trials are now widely used
in medical research. Two recent examples from the many trials un-
dertaken each year include:
being recruited for a trial, having been made aware of the randomi-
sation component, might be troubled by the possibility of receiving
an ‘inferior’ treatment.
Few clinicians would argue against the need for the voluntary
consent of subjects being asked to take part in a trial, but the amount
of information given in obtaining such consent might be a matter for
less agreement. Most clinicians would accept that the subject must
be allowed to know about the randomisation aspect of the trial, but
how many would want to go as far as Berry (1993) in advising the
subject along the following lines?
Coronary artery surgery.                           1   7  16   5
Anticoagulants for acute myocardial infarction.    1   9   5   1
Surgery for oesophageal varices.                   0   8   4   1
Fluorouracil (5-FU) for colon cancer.              0   5   2   0
BCG immunotherapy for melanoma.                    2   2   4   0
Diethylstilbestrol for habitual abortion.          0   3   5   0
1.6. SUMMARY
The controlled clinical trial has become one of the most important
tools in medical research and investigators planning to undertake
Treatment Allocation,
the Size of Trials
and Reporting Results
2.1. INTRODUCTION
The design and organisation of a clinical trial generally involves a
considerable number of issues. These range from whether it is ap-
propriate to mount the trial at all, to selection of an appropriate
outcome measure. Some of these issues will be of more concern to
statisticians than others. Three of these: 1) allocating subjects to
treatment groups; 2) deciding the size of the trial, i.e., how many
subjects should be recruited; and 3) how results should be reported,
will be discussed in this chapter.
under test? Despite the most extensive pre-clinical studies, the first human
allocation of a powerful treatment is largely a blind gamble and it is perhaps
not surprising that so much has been written on the most appropriate
fashion to allocate treatments in a trial.
10 2:8 1:9
20 6:14 4:16
50 18:32 16:34
100 40:60 37:63
200 86:114 82:118
500 228:272 221:279
1000 469:531 459:541
(Taken with permission from Pocock, 1983.)
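The figures in this table can be checked with a short calculation. The sketch below is only an illustration and rests on an assumption about the column meanings, which are not stated in the surviving text: that for each total number of patients the two columns give the most unequal split that simple (coin-tossing) randomisation reaches or exceeds with two-sided probability of at least 0.05 and at least 0.01 respectively. The function name `most_extreme_split` is hypothetical.

```python
from math import comb


def most_extreme_split(n, prob):
    """Most unequal split k:(n - k) of n patients whose two-sided probability
    (a split this unequal or worse, in either direction) is at least `prob`
    when each patient is allocated by an independent fair coin toss."""
    total = 2 ** n
    cumulative = 0
    for k in range(0, n // 2 + 1):
        cumulative += comb(n, k)              # number of sequences with <= k on one arm
        two_sided = min(1.0, 2 * cumulative / total)
        if two_sided >= prob:
            return k, n - k
    return n // 2, n - n // 2


for n in (10, 20):
    print(n, most_extreme_split(n, 0.05), most_extreme_split(n, 0.01))
```

Under that assumption the first two rows of the table (2:8 and 1:9 for 10 patients, 6:14 and 4:16 for 20 patients) are reproduced exactly.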
b/2, where b is the size of the block. This can be important for two
reasons. First, if enrollment in a trial takes place slowly over a period
of months or even years, the type of patient recruited for the study
may change during the entry period (temporal changes in severity of
illness, for example, are not uncommon), and blocking will produce
more comparable groups. A second advantage of blocking is that
if the trial should be terminated before enrollment is completed because
of the results of some form of interim analysis (see Chapter 3),
balance will exist in terms of the number of subjects randomised to
each group.
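A minimal sketch of blocked randomisation for two treatments may help to make the b/2 bound concrete. It assumes equal allocation within each block and a randomly permuted order of treatments inside the block; the function names `permuted_block_allocation` and `max_imbalance` are illustrative only and do not come from the text.

```python
import random


def permuted_block_allocation(n_patients, block_size=4, seed=0):
    """Allocate patients to arms A and B in randomly permuted blocks.
    Each block contains block_size / 2 patients on each arm."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for two equal arms")
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_patients:
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)                    # random order within the block
        allocation.extend(block)
    return allocation[:n_patients]


def max_imbalance(allocation):
    """Largest absolute difference between the numbers on A and B
    at any point during recruitment."""
    diff = worst = 0
    for arm in allocation:
        diff += 1 if arm == "A" else -1
        worst = max(worst, abs(diff))
    return worst


alloc = permuted_block_allocation(50, block_size=6, seed=3)
print("".join(alloc))
print("maximum imbalance:", max_imbalance(alloc))
```

Whatever seed is used, the maximum imbalance cannot exceed half the block size, and the arms are exactly balanced whenever recruitment stops at the end of a block.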
Factor                        Level             A     B
Performance status            Ambulatory        30    31
                              Non-ambulatory    10     9
Age                           < 50              18    17
                              ≥ 50              22    23
Disease-free interval         < 2 years         31    32
                              ≥ 2 years          9     8
Dominant metastatic lesion    Visceral          19    21
                              Osseous            8     7
                              Soft tissue       13    12
(Taken with permission from Pocock, 1983.)
sum for A = 30 + 18 + 9 + 19 = 76
sum for B = 31 + 17 + 8 + 21 = 77
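The two sums can be reproduced from the table with a few lines of code. The sketch below assumes that the next patient is ambulatory, aged under 50, with a disease-free interval of two years or more and a visceral dominant lesion, which is the combination of levels implied by the terms in the sums. The final allocation rule (give the patient to the arm with the smaller sum) is likewise an assumption, stated here in its simplest deterministic form, and the names `minimisation_sums` and `preferred_arm` are hypothetical.

```python
# Numbers already allocated to arms (A, B) at each factor level, from the table above.
counts = {
    "Performance status": {"Ambulatory": (30, 31), "Non-ambulatory": (10, 9)},
    "Age": {"< 50": (18, 17), ">= 50": (22, 23)},
    "Disease-free interval": {"< 2 years": (31, 32), ">= 2 years": (9, 8)},
    "Dominant metastatic lesion": {
        "Visceral": (19, 21), "Osseous": (8, 7), "Soft tissue": (13, 12),
    },
}

# Assumed characteristics of the next patient to be entered.
new_patient = {
    "Performance status": "Ambulatory",
    "Age": "< 50",
    "Disease-free interval": ">= 2 years",
    "Dominant metastatic lesion": "Visceral",
}


def minimisation_sums(counts, patient):
    """Sum, over the patient's factor levels, the numbers already on A and on B."""
    sum_a = sum(counts[factor][level][0] for factor, level in patient.items())
    sum_b = sum(counts[factor][level][1] for factor, level in patient.items())
    return sum_a, sum_b


def preferred_arm(sum_a, sum_b):
    """Assumed rule: favour the arm with the smaller sum; no preference on a tie."""
    if sum_a < sum_b:
        return "A"
    if sum_b < sum_a:
        return "B"
    return "either"


sum_a, sum_b = minimisation_sums(counts, new_patient)
print(sum_a, sum_b, preferred_arm(sum_a, sum_b))
```

In practice the favoured arm is often chosen with a probability somewhat less than one rather than with certainty, so that the next allocation is not fully predictable.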
Fig. 2.1. Reduction in power of a trial as the proportion on the new treat-
ment is increased (taken with permission from Pocock, 1983).
But, in the recent past at least, such advice has often not been
heeded and the literature is dotted with accounts of inconsequential
trials. Freiman et al. (1978), for example, reviewed 71 ‘negative’
randomised clinical trials, i.e., trials in which the observed differences
between the proposed and control treatments were not large enough
to satisfy a specified ‘significance’ level (the risk of a type I error), and
the results were declared to be ‘not statistically significant’. Analysis
The fact is that trials with truly modest treatment effects will achieve
statistical significance only if random variation conveniently exaggerated
these effects. The chances of publication and reader interest are much
greater if the results of the trial are statistically significant. Hence the cur-
rent obsession with significance testing combined with the inadequate size
of many trials means that publications on clinical trials for many treat-
ments are likely to be biased towards an exaggeration of therapeutic effect,
even if the trials are unbiased in all other respects.
2.5. SUMMARY
In this chapter three aspects of the design of clinical trials have been
considered: the allocation of patients to treatments, the size of the
trial, and how to report results. The well documented chaos that can
result from non-randomised trials has led to a general acceptance of
randomisation as an essential component of the vast majority of tri-
als. Several methods of randomisation have been considered in this
chapter, but there are others that have not been mentioned, in partic-
ular cluster randomisation in which natural groupings of individuals,
for example, general practices, become the units of randomisation,
and play the winner rules, in which the randomisation is biased to
allow a higher proportion of future allocations to the treatment with
(currently) better observed outcome. The former are described in
Donner and Klar (1994) and are increasingly important in, for example,
community intervention trials. The latter presumably arise from
the desire amongst some clinicians for more ‘ethically acceptable’
allocation procedures, but there have been very few clinical trials
that have actually used such data-dependent procedures. Pocock
Monitoring Trial
Progress: Outcome
Measures, Compliance,
Dropouts and
Interim Analyses
3.1. INTRODUCTION
The basic elements of the plan for any trial will be set long before the
first patient is enrolled, and will, eventually, be translated into the
study protocol. Here the primary objectives of the trial will be detailed
in terms of type of patients to be studied, class of treatments
to be evaluated and primary outcome measures. In addition, the
document will specify the number of patients to be recruited (usually
from the results of a sample size calculation), required length
of patient follow-up, patient entry and exclusion criteria, method
of randomisation, details of pre-randomisation procedures and other
general organisational structures.
Putting the study plan into execution begins with patient recruit-
ment, a period which is crucial in the life of a trial, although the
details need not delay us here, since the various possible problems
and pitfalls are well documented in, for example, Meinert (1986).
The most elegant design of a clinical study will not overcome the dam-
age caused by unreliable or imprecise measurement. The requirement that
one’s data be of high quality is at least as important a component of a
proper study design as the requirement for randomisation, double blinding,
controlling where necessary for prognostic factors and so on. Larger sample
sizes than otherwise necessary, biased estimates, and even biased samples
are some of the untoward consequences of unreliable measurements that
can be demonstrated.
The claim is often made that in published drug trials more than
90% of patients have been satisfactorily compliant with the protocol-specified
dosing regimen. But Urquhart and DeKlerk (1998) suggest
that these claims, based as they usually are on counts of returned
dosing forms, which patients can easily manipulate, are exaggerated,
and that data from the more reliable methods for measuring compliance
mentioned above contradict them.
Fig. 3.3. Monotone data pattern caused by patients dropping out of trial.
Here ‘missingness’ depends only on the observed data with the dis-
tribution of future values for a subject who drops out at time t being
the same as the distribution of the future values of a subject who re-
mains in at time t, if they have the same covariates and the same past
history of outcome up to and including time t. An example of a set of
data in which the dropouts violate the MCAR assumption but may
be MAR is shown in Fig. 3.4, taken from Curran et al. (1998). The
diagram shows mean physical functioning scores by time of dropout;
higher scores represent a higher level of functioning. We can see that
patients with a lower physical functioning tended to drop out of the
study earlier than patients with a higher physical functioning score.
Consequently, the probability of dropout depended on the previous
functioning score and hence the dropout was not MCAR.
Finally, in the case of non-ignorable or informative dropout, the
dropout process, $P(D = k \mid X, Y_1, \ldots, Y_T)$, depends on the unobserved
values of the outcome variable. That is, dropout is said to be non-ignorable
when the probability of dropout depends on the unrecorded
values of the outcome variable that would have been observed had
Fig. 3.4. Physical functioning score by time of dropout (taken with permis-
sion from Curran et al., 1998).
The Tuskegee Syphilis Study was initiated in the USA in 1932 and
continued into the early 1970s. The study involved enrollment and
follow-up of 400 untreated latent syphilitic black males (and 200 un-
infected controls) in order to trace the course of the disease. In
recent years, the trial has come under severe criticism because of
the fact that the syphilitics remained untreated even when penicillin,
an accepted form of treatment for the disease, became available. Ma-
jor ethical questions arise if investigators elect to continue a medical
experiment beyond the point at which the evidence in favour of an
effective treatment is unequivocal.
Clearly then, it is ethically desirable to terminate a clinical trial
earlier than originally planned if one therapy is clearly shown to be
superior to the alternatives under test. (This may apply even
if it is a different concurrent study which reports such a result.)
But as mentioned in the Introduction to this chapter, in most clin-
ical trials patients are entered one at a time and their responses
to treatment observed sequentially. Assessing these accumulating
data for evidence of a treatment difference large and convincing
enough to terminate the trial is rarely straightforward. Indeed the
decision to stop accrual to a clinical trial early is often difficult and
multifaceted. The procedure most widely adopted is a planned series
of interim analyses to be done at a limited number of pre-specified
points during the course of the trial. Because the data are examined
after groups of observations rather than after each observation, the
name group sequential is often used. A number of such methods have
been proposed, some of which will be discussed in more detail later
in this section. The aim of all the different approaches, however, is to
overcome the potential problems that can arise from repeated tests
of significance or multiple testing.
The problem of taking ‘multiple looks’ at the accumulating data
in a clinical trial has been addressed by many authors including
Anscombe (1954), Armitage et al. (1969), McPherson (1982), Pocock
(1982) and O’Brien and Fleming (1979). The problem is that, if on
each ‘look’ the investigator follows conventional rules for interpreting
the resulting p-value, then inappropriate rejection of the null hypoth-
esis of no treatment difference will occur too often. In other words,
repeatedly testing interim data can inflate false positive rates if not
handled appropriately. Armitage et al. (1969) give the actual signif-
icance levels corresponding to various numbers of interim analyses
for a normally distributed test statistic; these values are shown in
Table 3.2. So, for example, if five interim analyses are performed,
the chance of at least one showing a treatment difference at the 5%
level, when the null hypothesis is true, is 0.14. As Cornfield (1976)
comments:
Just as the Sphinx winks if you look at it too long, so, if you perform
enough significance tests, you are sure to find significance even when none
exists.
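The inflation of the false positive rate under repeated testing is easy to demonstrate by simulation. The sketch below is an illustration under stated assumptions rather than a reproduction of the calculations of Armitage et al.: it assumes equally sized groups of observations, a standard normal test statistic computed on the accumulated data at each look, and a nominal two-sided 5% critical value of 1.96 at every look. The function name `overall_type1_error` is hypothetical.

```python
import math
import random


def overall_type1_error(n_looks, n_sims=20000, z_crit=1.96, seed=1):
    """Estimate the chance that at least one of n_looks interim tests is
    nominally significant when the null hypothesis is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        partial_sum = 0.0
        for look in range(1, n_looks + 1):
            # Each new group contributes an independent N(0, 1) increment.
            partial_sum += rng.gauss(0.0, 1.0)
            z = partial_sum / math.sqrt(look)   # standardised statistic at this look
            if abs(z) > z_crit:
                rejections += 1
                break                           # trial "stops" at first significant look
    return rejections / n_sims


for looks in (1, 2, 5, 10):
    print(looks, round(overall_type1_error(looks), 3))
```

With a single look the estimate is close to the nominal 0.05; with five looks it rises to roughly 0.14, in line with the value quoted from Table 3.2.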
Fig. 3.5. Results from a trial comparing mortality in clofibrate and placebo
treated patients (taken with permission from Friedman, Furberg and
DeMets, 1985).
Fig. 3.6. Pocock, O'Brien and Fleming and Peto et al. boundaries for
interim analyses (taken with permission from Lan and DeMets, 1983).
where $X_{ilA}$ and $X_{ilB}$ denote the responses of the $l$th patient in the $i$th
group on treatment A and B, respectively.
The statistic $S_i$ is a partial sum of normal random variables, $Y_i$, of the
form:
2 ( C 1 + C2)2K2(A-1)
n=2
BOUNDARIES
Look    Boundary to reject
        H0       H1
1       4.46    -1.69
2       3.15    -0.06
3       2.57     0.63
4       2.22     1.47
5       1.99     1.99
Fig. 3.7. O'Brien-Fleming boundary obtained from East software for blood pressure example.
type of interim analysis was used to stop the trial early, but where the
reported treatment effect estimate and its p-value were not adjusted
for the sequential design but instead calculated as if the trial had been
of fixed size (see, for example, Moertel et al., 1990). Souhami (1994)
suggests that stopping early because an effect is undoubtedly present
may result in a serious loss of precision in estimation, and lead to
imprecise claims of benefit or detriment. Methods that attempt to
overcome such problems are described in Whitehead (1986), Rosner
and Tsiatis (1989), Jennison and Turnbull (1989) and Pinheiro and
DeMets (1997).
It was the Greenburg Report, finalised in 1967 but not published
until 1988, that established the rationale for interim analyses of ac-
cumulating data. In addition, however, it emphasised the need for
independent data monitoring committees to review interim data and
take into consideration the multiple factors that are usually involved
before early termination of a clinical trial can be justified. Such fac-
tors include baseline comparability, treatment compliance, outcome
ascertainment, benefit to risk ratio, and public impact. This type
of committee is now regarded as an almost essential component of
a properly conducted clinical trial and helps to ensure that interim
analyses, by whatever method, do not become overly prescriptive.
• philosophical awkwardness,
• how to draw inferences if the stopping rule is not followed,
• how to estimate treatment effects at the end of a group sequential
trial.
Suppose that a clinician Dr. C comes to statistician Dr. S for some ad-
vice. Dr. C has conducted a clinical trial of a new treatment for AIDS. He
has treated and assessed 200 patients, and analysis of the results suggests
a benefit from the new treatment that is statistically significant (p = 0.02).
Dr. S, who is also a frequentist, asks how many times Dr. C plans to
analyse the results as the trial progresses. We now consider two possible
responses.
(1) Dr. C may respond that this is the first and only analysis that has
and will be done. He has waited until the data are complete before
analysing them. Dr. S is then prepared to endorse the analysis and
p-value.
(2) Dr. C may, instead, respond that he intends to include 1000 patients
in this trial, and that this is the first of five analyses that are planned.
Moreover, the statistician who helped to design the trial had advocated
an O’Brien and Fleming boundary. Dr. S then advises Dr. C that the
results are not yet statistically significant and that the p-value of 0.02
should not be taken at face value, being one of a series of tests that are
planned.
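The second response can be made concrete with a small calculation. The sketch below assumes the classical O'Brien and Fleming form of boundary for five equally spaced looks, in which the critical value at look k is C√(K/k), and takes the constant C = 2.040 for K = 5 and an overall two-sided 5% level from standard published tables. Neither the constant nor the assumption of equal spacing appears in the text, and a boundary produced by software for a particular trial may differ slightly.

```python
import math
from statistics import NormalDist

K = 5                 # planned number of analyses
C = 2.040             # tabulated O'Brien-Fleming constant for K = 5, alpha = 0.05 (assumed)

# Critical value for |z| at each look: very demanding early, near 1.96 at the end.
boundaries = [C * math.sqrt(K / k) for k in range(1, K + 1)]

# Dr. C's interim result: two-sided p = 0.02 at the first look.
p_value = 0.02
z_observed = NormalDist().inv_cdf(1 - p_value / 2)

for look, bound in enumerate(boundaries, start=1):
    print(f"look {look}: reject H0 if |z| > {bound:.3f}")
print(f"observed z at look 1 = {z_observed:.3f}",
      "-> stop" if z_observed > boundaries[0] else "-> continue")
```

A two-sided p-value of 0.02 corresponds to a standardised statistic of about 2.33, well short of the first-look boundary of about 4.56, which is why Dr. S advises that the result is not yet statistically significant.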
3.5. SUMMARY
Once a trial begins, a patient’s progress needs to be monitored closely.
Much effort needs to be put into determining whether or not the patient
is complying with the intended treatment and trying, wherever
possible, to ensure that the patient is observed on all the occasions
specified in the trial protocol. The analysis of longitudinal data from
4.1. INTRODUCTION
Senn (1997) makes the point that in its simplest form a clinical trial
consists of a head to head comparison of a single treatment and a
control in order to answer a single well defined question. The analy-
sis of such a trial might then consist of applying a single significance
test or constructing a single confidence interval for the treatment
difference. In practice, of course, matters tend to be a little more
complex and most clinical trials generate a large amount of data;
for example, apart from the observations of the chosen outcome
variable(s) at different points in time, details of side effects, mea-
surements of laboratory safety variables, demographic and clinical
covariates will often also be collected. As a consequence, the anal-
yses needed may rapidly increase in complexity. In this chapter we
shall restrict attention to methods suitable for trials in which one
or more outcome variables are recorded on a single occasion, usually
the end of the trial. Later analysis chapters will deal with the more
Fig. 4.2. Boxplots for placebo and active treatment groups in double-blind
trial of an oral mouthwash.
type of plot for the data shown in Table 4.1; these data arise from
a double blind trial in which an oral mouthwash was compared with
a placebo mouthwash. Fifteen subjects were randomly allocated to
each mouthwash and at the end of 15 weeks an average plaque score
was obtained for each subject. This was calculated by allocating a
score of 0, 1, 2 or 3 to each tooth and averaging over all teeth present
in the mouth. The scores of 0, 1, 2 and 3 for each tooth corresponded,
respectively to:
• 0 - no plaque,
• 1 - a film of plaque visible only by disclosing,
• 2 - a moderate accumulation of deposit visible to the naked eye,
• 3 - an abundance of soft matter.
The histogram and boxplot for the placebo group show clear evi-
dence of an outlier, and the corresponding plots for the active group
Placebo Active
Mean 0.5348 0.2367
SD 0.2907 0.3432
n 15 15
t = 2.567, d.f. = 28, p-value = 0.0159,
95% confidence interval for treatment
difference (0.060, 0.536).
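The test statistic and confidence interval quoted above can be recovered from the summary statistics alone. The following sketch assumes the usual two-sample t-test with a pooled variance estimate, and takes the two-sided 5% critical value of Student's t with 28 degrees of freedom, 2.048, from standard tables rather than computing it. The function name `pooled_t_from_summary` is illustrative.

```python
import math


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """Two-sample t statistic and confidence interval from group summaries,
    assuming a common variance in the two groups."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    t_stat = diff / se_diff
    ci = (diff - t_crit * se_diff, diff + t_crit * se_diff)
    return t_stat, df, ci


# Placebo versus active mouthwash, values from the table above.
t_stat, df, ci = pooled_t_from_summary(0.5348, 0.2907, 15,
                                       0.2367, 0.3432, 15,
                                       t_crit=2.048)  # t(0.975; 28) from tables
print(f"t = {t_stat:.3f}, d.f. = {df}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The output agrees with the tabulated t = 2.567 on 28 degrees of freedom and the interval (0.060, 0.536).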
Myocardial Infarction
Group Yes No Total
Placebo 189 10,845 11,034
Aspirin 104 10,933 11,037
$$\sqrt{\frac{(0.0171)(0.9829)}{11034} + \frac{(0.0094)(0.9906)}{11037}} = 0.0015$$

$$\mathrm{SE}[\log(\hat{\theta})] = \sqrt{\frac{1}{189} + \frac{1}{10933} + \frac{1}{10845} + \frac{1}{104}} = 0.123$$
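The two standard errors above follow directly from the counts in the aspirin table, and a short calculation shows how they are used. The sketch assumes large-sample normal approximations throughout and an odds ratio defined as the odds of infarction on placebo relative to aspirin; the variable names are illustrative.

```python
import math

# Myocardial infarction (yes, no) by group, from the table above.
placebo_yes, placebo_no = 189, 10845
aspirin_yes, aspirin_no = 104, 10933
n_placebo = placebo_yes + placebo_no      # 11,034
n_aspirin = aspirin_yes + aspirin_no      # 11,037

# Difference in proportions and its standard error.
p1 = placebo_yes / n_placebo
p2 = aspirin_yes / n_aspirin
se_diff = math.sqrt(p1 * (1 - p1) / n_placebo + p2 * (1 - p2) / n_aspirin)

# Odds ratio and the standard error of its logarithm.
odds_ratio = (placebo_yes * aspirin_no) / (aspirin_yes * placebo_no)
se_log_or = math.sqrt(1 / placebo_yes + 1 / aspirin_no
                      + 1 / placebo_no + 1 / aspirin_yes)

# Approximate 95% confidence interval, formed on the log scale.
lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"difference = {p1 - p2:.4f} (SE {se_diff:.4f})")
print(f"odds ratio = {odds_ratio:.2f}, SE[log OR] = {se_log_or:.3f}, "
      f"95% CI = ({lower:.2f}, {upper:.2f})")
```

The interval for the odds ratio is built on the log scale and then exponentiated because the sampling distribution of the log odds ratio is much closer to normal than that of the odds ratio itself.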
Method 1 Method 2
Patient ID Age Pain Score Patient ID Age Pain Score
1 27 36 11 44 20
2 32 45 12 26 35
3 23 56 13 20 47
4 28 34 14 27 30
5 30 30 15 48 29
6 35 40 16 21 45
7 22 57 17 32 31
8 27 38 18 22 49
9 25 45 19 22 25
10 24 49 20 20 39
Fig. 4.3. Scatterplot of age of patient vs. pain score after wisdom tooth
extraction with patients labelled by treatment group.
The analysis of covariance model assumes that pain on discharge and age
are linearly related and that the slope of the regression line is the same for
each extraction method. Specifically the model has the form:
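As an illustration of a common-slope model of this kind, the sketch below applies it to the age and pain score data tabulated above. It is not the authors' own analysis: it assumes the standard least-squares solution, in which the common slope is estimated by pooling the within-group sums of squares and products, and the adjusted difference between methods is the raw difference in mean pain score corrected for the difference in mean age. All function and variable names are hypothetical.

```python
# (age, pain score) pairs for patients 1-10 (Method 1) and 11-20 (Method 2).
method1 = [(27, 36), (32, 45), (23, 56), (28, 34), (30, 30),
           (35, 40), (22, 57), (27, 38), (25, 45), (24, 49)]
method2 = [(44, 20), (26, 35), (20, 47), (27, 30), (48, 29),
           (21, 45), (32, 31), (22, 49), (22, 25), (20, 39)]


def group_summaries(data):
    """Mean age, mean pain, and within-group Sxx and Sxy."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    return mean_x, mean_y, sxx, sxy


x1, y1, sxx1, sxy1 = group_summaries(method1)
x2, y2, sxx2, sxy2 = group_summaries(method2)

# Common slope: pooled within-group regression of pain score on age.
common_slope = (sxy1 + sxy2) / (sxx1 + sxx2)

# Method difference before and after adjusting for age.
unadjusted_diff = y1 - y2
adjusted_diff = unadjusted_diff - common_slope * (x1 - x2)

print(f"common slope      = {common_slope:.3f}")
print(f"unadjusted (1 - 2) = {unadjusted_diff:.2f}")
print(f"adjusted   (1 - 2) = {adjusted_diff:.2f}")
```

Because the two groups have similar mean ages, the adjustment changes the estimated difference only slightly; its main effect in such a case is to reduce the residual variance against which the difference is judged.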
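$$U(\beta) = \sum_{i=1}^{n} \left(\frac{\partial \mu_i}{\partial \beta}\right)' v_i^{-1} \{Y_i - \mu_i(\beta)\} = 0$$

$$\sum_{i=1}^{n} \left(\frac{\partial \mu_i}{\partial \beta}\right)' v_i^{-1} \left(\frac{\partial \mu_i}{\partial \beta}\right)$$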
Basic Analyses of Clinical Trials . 79
$E(Y) = \mu$ where $\mu = X\beta$
and the response variables are normal with constant variance $\sigma^2$.
where $\eta_i = g(\mu_i)$ with $g(\cdot)$ being known as the link function. Link
functions are monotone increasing, and hence invertible; the inverse
link $f = g^{-1}$ is an equivalent and often a more convenient function for
relating $\mu$ to the covariates.
• So for a binary variable, $g$ would be the logit function with $\eta_i =
\log\{\mu_i/(1 - \mu_i)\}$ leading to logistic regression. The inverse link is $\mu_i = e^{\eta_i}/(1 + e^{\eta_i})$.
This guarantees that $\mu_i$ is in the interval [0,1], which is appropriate
since $\mu_i$ is an expected proportion or probability in this case.
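The relationship between the logit link and its inverse can be checked numerically. This is a small illustrative sketch; the function names `logit` and `inv_logit` are conventional choices rather than anything defined in the text, and the inverse is written in a form that avoids overflow for large negative values of the linear predictor.

```python
import math


def logit(mu):
    """Link function g: maps a probability in (0, 1) to the whole real line."""
    return math.log(mu / (1.0 - mu))


def inv_logit(eta):
    """Inverse link f = g^{-1}: maps any linear predictor back into [0, 1]."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)            # safe: eta < 0, so exp(eta) cannot overflow
    return e / (1.0 + e)


for eta in (-50.0, -2.0, 0.0, 2.0, 50.0):
    print(f"eta = {eta:6.1f}  ->  mu = {inv_logit(eta):.6f}")
```

However extreme the linear predictor, the fitted mean stays inside the unit interval, which is the property that makes the logit a natural link for a binary response.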
Variance
function    $1$    $\mu$    $\mu(1 - \mu)$    $\mu^2$    $\mu^3$
[Figure: distribution of 12 month polyp count]
Table 4.10. Data from Polyposis CTE Clinical Trial (taken with per-
mission from Piantadosi, 1998).
1 0 7 6 3.6 3.4 . . . 1 17 1
2 0 77 67 71 63 3.8 2.8 3.0 2.8 . 1 20 0
3 1 7 4 4 2 4 5.0 2.6 1.2 0.8 1.0 1 16 1
4 0 5 5 16 28 26 3.4 3.6 4.0 2.8 2.1 1 18 0
5 1 23 16 8 17 16 3.0 1.9 1.0 1.0 1.2 1 22 1
6 0 35 31 65 61 40 4.2 3.1 5.6 4.6 4.1 1 13 0
7 0 11 6 1 1 14 2.2 . 0.4 0.2 3.3 1 23 1
8 1 12 7 20 7 16 2.0 2.6 2.2 2.2 3.0 1 34 0
9 1 7 7 11 15 11 4.2 5.0 5.0 3.7 2.5 1 50 0
10 1 318 347 405 448 434 4.8 3.9 5.6 4.4 4.4 1 19 0
11 1 160 142 41 25 26 5.5 4.5 2.0 1.3 3.5 1 17 1
12 0 8 1 2 3 7 1.7 0.4 0.6 0.2 0.8 1 23 1
13 1 20 16 37 28 45 2.5 2.3 2.7 3.2 3.0 1 22 0
14 1 11 20 13 10 32 2.3 2.8 3.7 4.3 2.7 1 30 0
15 1 24 26 55 40 80 2.4 2.2 2.5 2.7 2.7 1 27 0
16 1 34 27 29 33 34 3.0 2.3 2.9 2.5 4.2 1 23 1
17 0 54 45 22 46 38 4.0 4.5 4.2 3.6 2.9 1 22 0
18 1 16 10 1.8 1.0 ' . . 1 13 1
21 1 30 30 40 50 57 3.2 2.7 3.6 4.4 3.7 0 34 0
22 0 10 6 3 3 7 3.0 3.0 0.6 1.1 1.1 0 23 1
23 0 20 5 1 1 1 4.0 1.1 0.6 0.4 0.4 0 22 1
24 1 12 8 3 4 8 2.8 1.1 0.1 0.4 1.0 0 42 1
[Figure panels: Ordinary regression of count; Ordinary regression of log-count; Poisson GLM with log-link; Overdispersed Poisson GLM with log-link; Inverse Gaussian GLM with log-link. Horizontal axis of each panel: Empirical P[i] = i/(N+1).]
[Figure: Treatment delta-beta's for polyposis, plotted against patient id.]
increases with the mean cubed. For these data the gamma model,
with variance increasing with the mean squared, would seem most
appropriate with all the observations having similar and low influ-
ence. For the Poisson model, the expected variance of the points
with the highest mean (those with the highest baselines) is too small,
with these points being attributed greater influence. For the inverse-
Gaussian model, the expected variance of those with the lowest
means (the lowest baselines) is too small, and correspondingly these
points are found to have greater influence.
An alternative way of tackling the problem of non-normal conditional
errors, using the bootstrap, is considered in Section 4.5.
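The mean-variance relationships compared above can be made concrete with a short Python sketch. It tabulates how much larger the assumed variance is at a high mean than at a low mean under each family; the dictionary and function names are illustrative, and the two means (10 and 100) are arbitrary values chosen for the example rather than quantities from the polyp data.

```python
# Variance functions V(mu) for some common GLM families.
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,                # constant variance
    "poisson": lambda mu: mu,                # variance proportional to the mean
    "gamma": lambda mu: mu ** 2,             # variance proportional to mean squared
    "inverse_gaussian": lambda mu: mu ** 3,  # variance proportional to mean cubed
}

def relative_variance(family, low_mean, high_mean):
    """Ratio of the assumed variance at high_mean to that at low_mean."""
    v = VARIANCE_FUNCTIONS[family]
    return v(high_mean) / v(low_mean)

for family in VARIANCE_FUNCTIONS:
    print(family, relative_variance(family, 10.0, 100.0))
```

A family that assumes too little variance at large means gives observations with large means too much weight, which is one way to read the influence pattern described for the Poisson model.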
Fig. 4.7. Cumulative distribution functions over the ordered response
categories for the two treatments of lymphocytic lymphoma.
$$P = 1 - (1 - \alpha)^m$$
m     P
1     0.05
2     0.0975
3     0.14263
4     0.18549
5     0.22622
10    0.40126
20    0.64151
50    0.92306
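The tabulated probabilities can be checked with a few lines of Python. The sketch below assumes m independent tests, each carried out at significance level 0.05, which is the setting implied by the first row of the table; the function name `familywise_error` is illustrative.

```python
def familywise_error(alpha, m):
    """Probability of at least one false positive in m independent tests,
    each carried out at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** m

for m in (1, 2, 3, 4, 5, 10, 20, 50):
    print(m, familywise_error(0.05, m))
```

With 20 independent tests the chance of at least one spuriously significant result already exceeds one half, which is the motivation for adjusting significance levels when several endpoints are tested.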
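where $n_1$ and $n_2$ are the number of observations in the two groups and
$n = n_1 + n_2$; $\bar{x}_1$ and $\bar{x}_2$ are the sample mean vectors in each group and
$S$ is the estimate of the assumed common covariance matrix given
by:
$$S_1 = \begin{pmatrix} 1.001 & 0.4903 \\ 0.4903 & 0.5321 \end{pmatrix}, \quad
S_2 = \begin{pmatrix} 1.1298 & 1.1956 \\ 1.1956 & 1.9975 \end{pmatrix}, \quad
S = \begin{pmatrix} 1.0162 & 0.8222 \\ 0.8222 & 1.2217 \end{pmatrix}$$
$$\bar{x}_1' = [3.7526, 3.1100], \quad \bar{x}_2' = [2.1726, 1.1833]$$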
not the same as the log of the estimated cost difference. The anal-
ysis of transformed costs simply does not answer the question of
interest. Instead, an approach that analyses the untransformed
cost using some appropriate GLM should be used, for example,
using a model with gamma or inverse-gaussian distributed errors
and log-link function. In such a model, the use of the log-link
function means that treatment and other effects are estimated in
terms of a multiplicative effect, implying an ability to report find-
ings in terms of a percentage reduction or increase in mean costs.
Although slightly less familiar to accountants than reporting an
absolute cost difference (which can anyway still be derived), the
structure of the model recognizes that costs cannot fall below
zero. An alternative approach is to persist in the use of OLS
linear model methods that give consistent estimates regardless
of the distribution of costs, but to recognize their non-normality
by the use of a method such as bootstrap resampling to estimate
the standard errors. A third approach that may be more helpful
when the cost distribution shows signs of bimodality, is to fo-
cus attention more directly on the factors, including treatment,
that distinguish the high cost from the standard cost individuals,
using logistic regression on a binary split of the cost data.
(3) Almost always patient costs are derived by the application of unit
costs to data on the number of units used by each patient. The
units might include days and nights on an in-patient ward, num-
ber of outpatient assessments, units of blood products infused
and so on. At least two potentially important consequences fol-
low. Firstly, if there is variation in the costs of the same units
between patients it is unlikely that this variation occurs at the
patient level, but more often at the level of the medical centre,
supplier or some such. Thus, from the point of view of costs,
patients may fall within a much more complex sampling design,
perhaps with nesting within centre, or within a crossed design
of suppliers of different products or services. A failure to recog-
nize such clustering may give a misleading impression as to the
Estimated Costs
Group Accomod’n Followup Treatment Total
1. cbt 4953 630 5583 11166
2. cbt
3. cbt 5589 12372 18261 36522
4. cbt 6084 2177 8261 16522
5. cbt 3120 2387 5507 11014
6. cbt
7. cbt
8. cbt 4680 1044 5724 11448
9. cbt
10. cbt
11. cbt 5499 21599 27098 54196
12. cbt 6084 1517 7601 15202
13. cbt 6162 11460 17622 35244
14. cbt 7917 352 8269 16538
15. cbt
16. cbt 8814 270 9084 18168
17. cbt
18. cbt 6162 438 6600 13200
19. cbt 7917 1067 8984 17968
20. cbt 7137 15329 22466 44932
21. cbt
22. cbt 3120 3811 6931 13862
23. cbt 4953 109 5062 10124
24. cbt 7683 1670 9353 18706
25. cbt
26. cbt 3212
27. cbt
28. cbt 8814 208 9022 18044
29. cbt
30. control
31. control
32. control 6162 6645 12807 25614
33. control 6084 852 6936 13872
34. control
35. control
36. control 6786 6015 12801 25602
37. control 11349 4047 15396 30792
38. control
39. control 5889 2016 7905 15810
40. control 3120 1742 4862 9724
41. control 6786 4734 11520 23040
42. control 4953 1026 5979 11958
43. control
44. control 5889 12323 18212 36424
45. control
46. control
47. control 6786 1600 8386 16772
48. control
49. control
50. control 6162 738 6900 13800
51. control 7917 2334 10251 20502
52. control 7059 25383 32442 64884
53. control
54. control
55. control 6084 21364 27448 54896
56. control
57. control
58. control
59. control 7137 440 7577 15154
60. control
4.6. SUMMARY
The primary data for analysis in many clinical trials will consist of
the observations of one or more outcome variables taken at the end of
the trial. In many cases the analysis might consist of the construction
of a simple confidence interval appropriate for the type of variable at
hand. When covariates other than treatment group are of interest,
then it will usually be necessary to consider some form of model for
the data. Many investigators use Gaussian-based regression for
all but binary variables, where simple logistic regression is usually
applied. But it will frequently be advantageous to consider more
carefully the nature of the outcome variable and try to find a more
suitable GLM.
One important element in this choice of model is the scale upon
which it is desired to estimate effects. For economic data, the scale
is clear and one for which a transformation of the response variable
would not be helpful. Appropriate choice of GLM or the use of meth-
ods such as bootstrap that are based on the empirical distribution
were shown to be more suitable. Unlike most clinical measures, the
construction of economic measures draws on a considerable amount
of information external to the trial. Trial protocols should where
possible bring the construction of these measures within their scope.
The common practice of using unit costs may also generate complex
patterns of correlated measurement error not easily dealt with at the
analysis stage.
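The bootstrap approach to untransformed costs mentioned above can be sketched in Python as follows. This is a minimal illustration only: the cost figures are hypothetical right-skewed values, not the trial data, and the function name, the number of resamples and the seed are all assumptions made for the example.

```python
import random
import statistics

def bootstrap_se_mean_difference(group_a, group_b, n_resamples=2000, seed=1):
    """Bootstrap standard error of the difference in mean cost between two
    groups, resampling patients with replacement within each group."""
    rng = random.Random(seed)
    differences = []
    for _ in range(n_resamples):
        resample_a = [rng.choice(group_a) for _ in group_a]
        resample_b = [rng.choice(group_b) for _ in group_b]
        differences.append(statistics.mean(resample_a) - statistics.mean(resample_b))
    return statistics.stdev(differences)

# Hypothetical right-skewed costs for two arms (illustrative values only).
costs_a = [11000, 16500, 11400, 36500, 54200, 15200, 18100, 13200]
costs_b = [25600, 13900, 30800, 15800, 9700, 23000, 12000, 64900]

observed_difference = statistics.mean(costs_a) - statistics.mean(costs_b)
print(observed_difference, bootstrap_se_mean_difference(costs_a, costs_b))
```

The estimate itself remains the simple difference in mean costs, so the result is still on the scale of interest; only the standard error is obtained from the empirical distribution.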
When multiple endpoints are available a variety of methods might
be applied. If the number of outcomes is small, say fewer than five, then
adjusting significance levels using the Bonferroni correction or some
less conservative adjustment may suffice. Alternatively, simultane-
ous inference using Hotelling’s test may be required. With a large
number of outcome measures the problems become more difficult.
In many cases having many outcome measures may be indicative of
a poorly thought out study. Pocock (1996) points out the merit in
drawing up pre-defined strategies for statistical analysis and report-
ing of trials with appropriate predeclaration of priorities, since there
is a clear need to safeguard against manipulative post hoc emphases
and distortions in conclusions. On the other hand, a protocol that
is too prescriptive may lead to inflexibility and a suppression of both
clinical and statistical ‘insight’.
CHAPTER 5
Simple Approaches to the Analysis of Longitudinal Data from Clinical Trials
5.1. INTRODUCTION
Medical treatments rarely result in a one time final result for a pa-
tient; generally, they require clinicians to follow the evolution of a
patient’s health over a period of time. Consequently, in the ma-
jority of clinical trials the primary outcome variables are observed
on several occasions post randomisation and often also prior to ran-
domisation. Such longitudinal data can be analysed in a variety of
ways. In the last decade, many powerful new methods have been
developed which allow a variety of potentially useful models to be
applied, many of which can be employed when the data are unbal-
anced for whatever reasons. (As we shall see in the next two chapters,
however, fitting such models to unbalanced data requires consider-
able care if misleading inferences are to be avoided.) Some of the
newly developed techniques are also able to deal with non-normal
outcome variables, e.g., binary variables indicating the presence or
absence of some characteristic. This methodology will be considered
in detail in Chapters 6 and 7.
Simple Approaches to the Analysis of.. . 109
Fig. 5.1. Individual response profiles by treatment group for the oestrogen
patch data.
Fig. 5.2. Individual response profiles for oestrogen patch data after stan-
dardisation.
Fig. 5.3. Mean response profiles for active and placebo groups in the
oestrogen patch trial.
Fig. 5.4. Box-plots for data from the oestrogen patch data.
Fig. 5.5. Scatterplot matrix for the data from the oestrogen patch trial.
Fig. 5.6. Scatterplot of 11290 weights on 2 337 Nepali children. Bold dots
indicate a smoothing spline with 22 equivalent degrees of freedom as an
estimate of the mean weight at each age. The repeated observations for 100
children are connected: 50 chosen at random and 50 with extreme mean
weights for their age (taken with permission from Zeger and Katz, 1998).
response over time (see Section 5.3.1 for more details). In this way,
the essentially multivariate nature of the repeated observations is re-
duced to a univariate one. This approach has been in use for many
years, and is described in Oldham (1962), Yates (1982) and Matthews
et al. (1990). Various aspects of response feature analysis will be con-
sidered in this section including how to incorporate covariates, how
to deal with unbalanced data, and the implication for the method of
having missing values. We begin, however, with perhaps the most
important consideration, namely the choice of summary measure.
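To make the idea of a summary measure concrete, the Python sketch below reduces one subject's repeated observations to a single number in two common ways: the mean of the available values, and the least-squares slope against time. Missing visits are represented by `None`; the function names and the example profile are illustrative assumptions, not taken from the text.

```python
def mean_response(values):
    """Mean of a subject's available (non-missing) observations."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed)

def least_squares_slope(times, values):
    """Slope of the least-squares line through a subject's available points."""
    pairs = [(t, v) for t, v in zip(times, values) if v is not None]
    n = len(pairs)
    t_bar = sum(t for t, _ in pairs) / n
    v_bar = sum(v for _, v in pairs) / n
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in pairs)
    sxx = sum((t - t_bar) ** 2 for t, _ in pairs)
    return sxy / sxx

# One hypothetical subject measured at four visits, with visit 3 missing.
visits = [1, 2, 3, 4]
profile = [2.0, 4.0, None, 8.0]
print(mean_response(profile), least_squares_slope(visits, profile))
```

Each subject then contributes one value per summary measure, and the groups can be compared with a univariate method such as a t-test.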
Fig. 5.7. Observations and fitted curve for one subject’s data in study of
parathyroid hormone (taken with permission from Gornbein et al., 1992).
(The observation and fitted curve for one subject are shown in
Fig. 5.7.)
Each subject’s original data are now summarised by the four parameters
$\beta_0$, $\beta_1$, $\beta_2$ and $\beta_3$. Normal volunteers can be compared to
diseased subjects by comparing these four parameters, rather than
by comparing mean profiles. (Since there are four parameters, a
multivariate test such as Hotelling’s $T^2$ might be needed.) As these
parameters have a physical meaning, this sort of comparison is gen-
erally easier to interpret than one involving mean profiles.
possibilities when the average response over time is the chosen sum-
mary measure:
We will consider a randomised parallel groups clinical trial for which baseline
and final values are available for each member of the control and treatment
groups.
Measurements on members of the treatment group are represented by $X_{ti}$
(baselines) and $Y_{ti}$ (outcomes), $i = 1, \ldots, n_t$. Similarly, measurements in the
control group are represented by $X_{ci}$ and $Y_{ci}$, $i = 1, \ldots, n_c$.
Suppose that the covariance matrix, $\Sigma_{X,Y}$, in the two groups is identical and
given by:
$$\Sigma_{X,Y} = \begin{pmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix}$$
When POST is the method of analysis, the effect of treatment is estimated as
$\hat{\tau}_{raw} = \bar{Y}_t - \bar{Y}_c$ with variance $\mathrm{var}(\hat{\tau}_{raw}) = q\sigma_y^2$, where $q = \frac{1}{n_t} + \frac{1}{n_c}$.
When CHANGE is used, the effect of treatment is estimated as $\hat{\tau}_{change} =
(\bar{Y}_t - \bar{X}_t) - (\bar{Y}_c - \bar{X}_c) = (\bar{Y}_t - \bar{Y}_c) - (\bar{X}_t - \bar{X}_c)$ with variance
$\mathrm{var}(\hat{\tau}_{change}) = q(\sigma_x^2 + \sigma_y^2 - 2\rho\sigma_x\sigma_y)$.
A more general estimator of the treatment effect is given by $\hat{\tau}_\beta = (\bar{Y}_t - \bar{Y}_c) -
\beta(\bar{X}_t - \bar{X}_c)$, which has variance $\mathrm{var}(\hat{\tau}_\beta) = q[(\beta\sigma_x - \rho\sigma_y)^2 + (1 - \rho^2)\sigma_y^2]$.
ANCOVA corresponds to choosing a suitable value for $\beta$, namely $\beta = \rho\sigma_y/\sigma_x$.
The raw-outcomes and change-score estimators are special forms of the general
estimator with $\beta = 0$ and $\beta = 1$.
The three methods POST, CHANGE and ANCOVA can now be seen merely
as ways of adjusting the observed difference at outcome using the baseline
difference.
POST relies on the fact that in a randomised trial, in the absence of a treatment
effect, the expected value of the mean difference at outcome is zero.
Hence the factor by which the observed outcome needs correcting, in order to
judge the treatment effect, is also zero.
CHANGE corresponds to the assumption that the difference at outcome, in
the absence of a treatment effect, is expected to be the difference in the
means of the baselines. This assumption is, in fact, false (see, for example,
Senn, 1994).
ANCOVA allows for a more general system of predicting what the outcome
difference would have been in the absence of any treatment effect as a function
of the mean difference at baseline.
(The above is an abbreviated version of Senn, 1998, from where a more detailed
account is available.)
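The variance of the general estimator, q[(βσx − ρσy)² + (1 − ρ²)σy²] with q = 1/nt + 1/nc, can be evaluated numerically to compare the three methods. The Python sketch below is illustrative only: the function name and the parameter values (unit standard deviations, correlation 0.6, 20 subjects per group) are assumptions chosen for the example.

```python
def treatment_effect_variance(beta, sigma_x, sigma_y, rho, n_t, n_c):
    """Variance of (Ybar_t - Ybar_c) - beta * (Xbar_t - Xbar_c)."""
    q = 1.0 / n_t + 1.0 / n_c
    return q * ((beta * sigma_x - rho * sigma_y) ** 2
                + (1.0 - rho ** 2) * sigma_y ** 2)

sigma_x, sigma_y, rho, n_t, n_c = 1.0, 1.0, 0.6, 20, 20

post = treatment_effect_variance(0.0, sigma_x, sigma_y, rho, n_t, n_c)
change = treatment_effect_variance(1.0, sigma_x, sigma_y, rho, n_t, n_c)
ancova = treatment_effect_variance(rho * sigma_y / sigma_x,
                                   sigma_x, sigma_y, rho, n_t, n_c)
print(post, change, ancova)
```

Since the squared term vanishes only at β = ρσy/σx, the ANCOVA choice can never give a larger variance than POST or CHANGE under these assumptions, and CHANGE improves on POST only when the baseline-outcome correlation is large enough.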
[Figure: curves for ANCOVA, post scores and change scores plotted against sample size. Legible panel legends: Npost=6, Std. diff.=0.5, Rho=0.3, Alpha=0.05; and Npre=2, Npost=6, Std. diff.=0.5, Rho=0.9, Alpha=0.05.]
Covariate Estimate SE P
Treatment 0.377 0.101 < 0.001
Baseline 0.580 0.103 < 0.001
Age/100 -0.402 0.386 0.3
Sex 0.039 0.133 0.8
Centre 0.205 0.107 0.06
5.5. SUMMARY
The methods described in this chapter provide for the exploration
and simple analysis of longitudinal data collected in the course of
a clinical trial. The graphical methods can provide insights into
both potentially interesting patterns of response over time and the
structure of any treatment differences. In addition, they can indi-
cate possible outlying observations that may need special attention.
The response feature approach to analysis has the distinct advantage
that it is straightforward, can be tailored to consider aspects of the
data thought to be particularly relevant, and produces results which
are relatively simple to understand. The method can accommodate
data containing missing values without difficulty, although it might
be misleading if the observations are anything other than missing
completely at random.
CHAPTER 6
Multivariate Normal Regression Models for Longitudinal Data from Clinical Trials
6.1. INTRODUCTION
where $y_i$, $X_i$ and $\epsilon_i$ are obtained by deleting rows from $y_i^f$, $X_i^f$ and
$\epsilon_i^f$, respectively.
Columns of each design matrix correspond to terms in the model
under consideration. The first column of each $X_i$, for example, is
a one-vector corresponding to the intercept term; other columns
may correspond to dummy variables for grouping (between-subjects,
usually treatment) factors, and others to fixed and/or time varying
covariates.
Maximum likelihood estimates of $\beta$ and $\theta$ can be found by maximizing
the log-likelihood $L(\theta, \beta)$ given by:
depend heavily upon using the correct model for the data. Con-
sequently, examining residuals and other diagnostics for detecting
model misspecification after fitting the proposed model becomes of
even greater importance than usual.
A standard maximum likelihood approach as described above is
known to produce biased estimators of the covariance parameters. In
many cases this problem is unlikely to be severe, but where it is of
concern the method of restricted maximum likelihood or REML es-
timation, introduced originally by Patterson and Thompson (1971),
might be used. The details of REML estimation are given in Diggle
et al. (1994). Often the two estimation methods will give similar
results, but where they do differ substantially, those obtained from
REML are generally preferred.
$$y_i^f = \begin{pmatrix} y_{i1} \\ y_{i2} \end{pmatrix} \quad \text{and} \quad X_i^f = \begin{pmatrix} X_{i1} \\ X_{i2} \end{pmatrix}$$
with $y_{i2}$ representing the missing values for this subject. The terms
$\hat{\Sigma}_{12}$, $\hat{\Sigma}_{21}$ and $\hat{\Sigma}_{11}$ are appropriate submatrices of $\Sigma(\hat{\theta})$. The estimated
missing values depend on the covariance structure assumed
shows the imputed missing values for the oestrogen patch data under
each of the covariance structures considered above.
Table 6.5. Imputed Missing Values for the Oestrogen Patch Data
(under three different models for the covariance structure of the
repeated observations).
Subject No. of Group Model
Missing Values Indep. CS Unstruc.
3 4 placebo 11.6 12.9 12.5
10.3 11.7 11.2
9.2 10.6 10.1
8.4 9.8 9.2
8 4 placebo 17.3 22.4 22.9
16.0 21.2 20.7
15.0 20.1 19.5
14.1 19.2 17.3
11 5 placebo 16.9 14.4 14.7
15.4 13.0 13.7
14.1 11.8 12.8
13.1 10.7 12.0
12.2 9.8 11.8
13 4 placebo 16.4 15.6 18.8
15.1 14.4 17.2
14.0 13.4 16.8
13.2 12.5 16.0
14 4 placebo 14.0 9.6 11.9
12.7 8.4 11.1
11.6 7.3 10.9
10.8 6.4 11.2
Multivariate Normal Regression Models for . . . 157
$$r_i = y_i - X_i\hat{\beta}, \qquad (6.10)$$
[Fig. 6.2(a)-(c): chi-squared plots of Mahalanobis distances of residuals against chi-square quantiles; the legible panel title reads "Chi-squared plot of Mahalanobis distances of residuals from unstructured model".]
clearly departs from the expected y = x line but both the
compound symmetry and unstructured models lead to acceptable
plots.
We assume that the recovery scores for the kth subject at the jth
time point and the ith drug dose, $y_{ijk}$, are given by:
$$y_{ijk} = (\alpha_i + a_{ik}) + (\beta_i + b_{ik}) t_j + \epsilon_{ijk}$$
where $a_{ik}$ and $b_{ik}$ are random intercept and slope for subject k having
dose i, and $\epsilon_{ijk}$ are random error terms. The $t_j$ are the times at which
the recovery scores are measured. The random intercept and slope
terms are assumed to have a multivariate normal distribution:
Likewise, the random error terms are assumed to have the following
distribution:
(6.15)
$$Z = \begin{pmatrix} 1 & 0 \\ 1 & 5 \\ 1 & 15 \\ 1 & 30 \end{pmatrix} \qquad (6.16)$$
Under this model, the expected values of the recovery scores are:
$$E(y_{ijk}) = \alpha_i + \beta_i t_j \qquad (6.17)$$
and the covariance matrix of $y_{ik}$ has the following random effects
structure:
$$\Sigma = Z \Psi Z' + \sigma^2 I \qquad (6.18)$$
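The random effects covariance structure of Eq. (6.18) can be computed directly for a random intercept and slope model. In the Python sketch below the design matrix has a column of ones and a column of measurement times; the 2 x 2 covariance matrix of the random effects (called `psi` here) and the residual variance are hypothetical values chosen for illustration, and the function name is an assumption.

```python
def random_effects_covariance(times, psi, sigma2):
    """Covariance matrix Z * psi * Z' + sigma2 * I for a random intercept
    and slope model, where row j of Z is (1, times[j])."""
    z = [[1.0, float(t)] for t in times]
    n = len(z)
    cov = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            total = 0.0
            for a in range(2):
                for b in range(2):
                    total += z[r][a] * psi[a][b] * z[c][b]
            # The residual variance contributes only to the diagonal.
            cov[r][c] = total + (sigma2 if r == c else 0.0)
    return cov

# Hypothetical values: intercept variance 1.0, slope variance 0.01,
# intercept-slope covariance 0.02, residual variance 0.5.
psi = [[1.0, 0.02], [0.02, 0.01]]
for row in random_effects_covariance([0, 5, 15, 30], psi, 0.5):
    print([round(value, 2) for value in row])
```

Unlike compound symmetry, this structure lets both the variances and the covariances change with time, because the random slope contributes terms that grow with the measurement times.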
One of the most common types of analysis for longitudinal data, par-
ticularly in psychiatry and psychology, remains either univariate or
multivariate analysis of variance. The ANOVA approach is usually
formulated in terms of a mixed-effects model which includes a ran-
dom subject effect, this allowing the repeated observations to have
the compound symmetry covariance structure. Such a model for the
The analyses carried out on the oestrogen patch data in Section 6.4
are only strictly valid if the missing observations caused by the
$$Y_k = \begin{cases} Y_k^*, & k = 1, 2, \ldots, D - 1 \\ 0, & k \ge D \end{cases}$$
$$\Pr(Y_k = 0 \mid H_k, Y_{k-1} = 0) = 1$$
Dropout parameters:
Maximised likelihood:
[l]- 1291.848
Mean Parameters:
Fig. 6.5. Means for completers and cohorts of subjects dropping out at
particular times for oestrogen patch data.
A 3 20 17 14 8 62
B 2 13 34 11 2 62
6.9. SUMMARY
For longitudinal data where the response variable has a normal distri-
bution, the regression model described in Section 6.4 can be applied
Fig. 6.6. Individual profiles for all patients and for mixture modelling-
derived 'types'.
7.1. INTRODUCTION
Models for Non-Normal Longitudinal Data from Clinical Dials 179
visits. Two treatment groups were involved and in addition two fur-
ther covariates, sex and age, were available together with a binary
indicator for centre. One possible logistic marginal model is given
by:
$\mathrm{Corr}(y_{ij}, y_{ik}) = 0$
Baseline 0
0000 25 13.3 14.3 21.9
0001 2 4.3 4.1 3.1
0010 4 4.6 4.6 3.5
0011 0 2.3 3.0 1.4
0100 2 5.0 5.6 4.0
0101 0 2.5 2.9 1.6
0110 2 2.7 3.7 1.8
0111 4 2.1 4.1 1.7
1000 3 5.4 4.2 4.5
1001 0 2.7 1.7 1.8
1010 1 2.9 2.0 2.0
1011 2 2.2 1.8 1.9
1100 2 3.2 2.2 2.2
$$U(\beta, \alpha) = \sum_{i=1}^{n} \left( \frac{\partial \mu_i}{\partial \beta} \right)' V_i^{-1} \left( y_i - \mu_i(\beta) \right) = 0$$
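Table 7. (Continued)
$$R = \begin{pmatrix} 1.00 & & & \\ 0.33(0.18) & 1.00 & & \\ 0.33(0.18) & 0.33(0.18) & 1.00 & \\ 0.33(0.18) & 0.33(0.18) & 0.33(0.18) & 1.00 \end{pmatrix}$$
(b) AR-1
(c) Unstructured
$$R = \begin{pmatrix} 1.00 & & & \\ 0.32 & 1.00 & & \\ 0.20 & 0.42 & 1.00 & \\ 0.29 & 0.34 & 0.38 & 1.00 \end{pmatrix}$$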
measure of the response was also taken and, in addition, the age of
each patient was recorded. Figure 7.1 is a plot of the log of the total
number of seizures (plus one) during the trial against the log of the
baseline seizure count. Except for subject 58, with an unusually low
number of seizures during the trial, the relationship looks plausibly
linear on this log scale. The plot also suggests that subject 49 is
likely to have high influence.
The difference in mean log-total for placebo and treatment
groups (3.21 - s.e. 0.16 and 2.80 - s.e. 0.19) of 0.41 was not
significantly different from zero using a simple t-test. Of course,
controlling for baseline differences improves efficiency. However, A
ID Y1 Y2 Y3 Y4 Treatment Baseline Age
1 5 3 3 3 0 11 31
2 3 5 3 3 0 11 30
3 2 4 0 5 0 6 25
4 4 4 1 4 0 8 36
5 7 18 9 21 0 66 22
6 5 2 8 7 0 27 29
7 6 4 0 2 0 12 31
8 40 20 23 12 0 52 42
9 5 6 6 5 0 23 37
10 14 13 6 0 0 10 28
11 26 12 6 22 0 52 36
12 12 6 8 4 0 33 24
13 4 4 6 2 0 18 23
14 7 9 12 14 0 42 36
15 16 24 10 9 0 87 26
16 11 0 0 5 0 50 26
17 0 0 3 3 0 18 28
18 37 29 28 29 0 111 31
19 3 5 2 5 0 18 32
20 3 0 6 7 0 20 21
21 3 4 3 4 0 12 29
22 3 4 3 4 0 9 21
23 2 3 3 5 0 17 32
24 8 12 2 8 0 28 25
25 18 24 76 25 0 55 30
26 2 1 2 1 0 9 40
27 3 1 4 2 0 10 19
28 13 15 13 12 0 47 22
29 11 14 9 8 1 76 18
30 8 7 9 4 1 38 32
Fig. 7.1. Plot of log of total number of seizures (plus one) against log of
the baseline seizure count for epilepsy data in Table 7.6.
with a log link function and Poisson errors to the total seizure count
gave a Pearson model chi-square of 614.79 for 55 degrees of free-
dom, clearly indicating some form of overdispersion. As we shall see,
this has implications for the specification of any GEE model that
we might fit as part of a longitudinal analysis of the four separate
trial counts.
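The degree of overdispersion implied by these figures can be gauged from the ratio of the Pearson chi-square to its degrees of freedom, which would be close to one if the Poisson variance assumption held. The Python sketch below shows the calculation; the function name and the small observed and fitted vectors used to exercise it are illustrative, while 614.79 and 55 are the values quoted above.

```python
def pearson_dispersion(observed, fitted, n_parameters):
    """Pearson chi-square for a Poisson model divided by its residual
    degrees of freedom."""
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    degrees_of_freedom = len(observed) - n_parameters
    return chi_square / degrees_of_freedom

# Toy check: three counts, common fitted mean of 4, one fitted parameter.
print(pearson_dispersion([2, 4, 6], [4.0, 4.0, 4.0], 1))

# Ratio implied by the reported Pearson chi-square and degrees of freedom.
print(round(614.79 / 55, 2))  # 11.18
```

A ratio of about 11 means the counts vary roughly eleven times more than a Poisson model allows, so standard errors that ignore this would be too small by a factor of about the square root of 11.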
Table 7.7 shows the results for treatment effect and for the trend
over visit from models with a log link, Poisson errors and additional
covariates for log-baseline and age which, for the sake of clarity, have
not been reported in the table. For illustrative purposes, the fitted
model included simple main effects and a common pattern of overdis-
persion, though Thall and Vail (1990) find some evidence in favour of
a more complicated structure. We show results from six models:
Unlike the previous example, the results from the basic exchange-
able model (2) are quite different from those of the independence
working model with robust standard errors (1). However, this is not
because the exchangeable correlation matrix is grossly inappropriate
(though it is not especially good). Rather, as the results from
model (3) show, it is because model (2) has not allowed for overdis-
persion. It is clear that at least as much thought must be given to
the possible presence of overdispersion as to the structure of the cor-
relation matrix. The use of the robust standard errors will cope with
failures in either, and as we see from model (4), their use with the
overdispersed exchangeable model does increase the standard error
for the treatment effect beyond the value of the classical standard
error from the overdispersed model. The AR-1 Model (5) and the
unstructured or free correlation Model (6) give quite similar results.
Inspection of the table of estimated correlations from Model (6) explains
why. Unlike in the respiratory disease example, the estimated
pattern is suggestive of an autoregressive rather than an exchange-
able correlation matrix.
The results from all these GEE models differ strikingly from the
results based on applying regression to the log-transformed counts.
This very much reflects the variation in the relative influence that
is attached to a small number of datapoints according to the partic-
ular mean-variance relationship assumed, a circumstance previously
illustrated using the polyp count data in Chapter 4. The analyses
R=(.:- 1.00
0.39 0.57 1.00
)
0.24 0.30 0.41 1.00
$$g(\mu^c_{ij}) = x_{ij}'\beta^* + d_{ij}'\tau_i,$$
$$v^c_{ij} = \phi\, v(\mu^c_{ij})$$
This likelihood can be averaged over some distribution for $\tau$ and then
maximised over subjects to obtain the required regression coefficient
estimates. A variety of distributional forms for $\tau$ might be assumed.
Table 7.9 shows the results in which the random effect $\tau$ has been
assumed to be normally distributed. The integral calculation was
approximated by the use of %point Gaussian quadrature.
The significance of the effects as estimated by this random effects
model and by the GEE models of Table 7.7 is generally similar.
However, as expected from the discussion above, the estimated
coefficients are substantially larger. Thus, while the estimated ef-
fect of treatment of a randomly sampled individual, given the set
where now $\mu^c_{ij}$ and $v^c_{ij}$ are the expectation and variance of $y_{ij}$
conditional on all $y_{i,j-k}$ for $k \ge 1$:
An example of a simple transition model for the respiratory status
data is:
Yt = 0 100 27 127
Column % 79.4 26.5
Yt = 1 26 75 101
Column % 20.6 73.5
Total 126 102 228
Active
Yt = 0 49 20 69
Column % 62.8 14.5
Yt = 1 29 118 147
Column % 37.2 85.5
Total 78 138 216
neither large nor at all significant, whereas the main effect of the
treatment (and prior history) is both.
The use of robust standard errors in the previous table could be
thought of as applying an IWM marginal modelling strategy to a
transition model. Equally, we could have approached the analysis by
adding the lagged response to the covariate list of a previous marginal
model. We did just that to the model of Eq. (7.1), and Table 7.12
gives the estimates for a subset of the estimated effects. Comparison
with Table 7.4 shows the estimated effect of treatment to be slightly
reduced, and the lagged respiratory status as a significant predictor,
its effect being in addition to that for the baseline response.
We have tabulated the expected frequencies for this model in
Table 7.2 for comparison with the corresponding simple marginal
model without lagged effects and the random effects model. In this
instance, as a method of modelling the joint response distribution,
there is little that has been gained by the addition of the lagged
response, particularly in comparison with the random effects model.
This is perhaps not surprising given the evidence from previous anal-
yses that an exchangeable model fits the data better than an autore-
gressive one, and that the chosen random effects model corresponds
to an assumption of an exchangeable correlation, while the transition
model corresponds rather more to the autoregressive pattern.
when the data are not MCAR, has been discussed by Kenward et
al. (1994). Robins and Rotnitzky (1995) have modified the GEE
method, allowing it to be valid under the weaker assumption that
the data are missing at random. The modification in part involves
weighting the usual GEE by the inverse of the estimated probabilities
of ‘missingness’ for each subject. These probabilities can be derived
from a logistic (or probit) model of response in which observed out-
comes and/or other observed measures are covariates.
Table 7.13 presents longitudinal data from a clinical trial in which the
response was measured on an ordinal scale of severity (0 = good, 3 =
poor) in relation to recovery from sprain injury. The principal inter-
est here lies in the impact of treatment on speeding-up the process
of recovery, i.e., a time by treatment interaction.
One relatively straightforward method of analysis is to use a pro-
portional odds model (see Chapter 4) and to tackle the repeated
measures aspect of the problem using the marginal modelling ap-
proach, most simply by assuming an independence working model
(or the essentially equivalent methods of survey research). The only
terms in the model that we will examine are those for treatment, a
linear trend for occasion and their interaction. Any correlation over
time is not being formally modelled.
However, before considering the results, an inspection of
Table 7.13 indicates that by the end of the trial, 9 out of the 30
participants had dropped out. We compare three approaches to tackling
this problem of missing data. The first method uses all the
available observations and requires that we assume that the missing
observations are missing completely at random. The second method
uses ‘last observation carried forward’ to ‘fill-in’ missing observations.
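The mechanics of last observation carried forward are simple enough to show in a few lines of Python. In this sketch a missing observation is represented by `None`; the function name and the example sequence of ordinal scores are illustrative assumptions.

```python
def locf(values):
    """Fill each missing value (None) with the most recent observed value.
    Leading missing values, which have nothing to carry forward, stay None."""
    filled = []
    last_observed = None
    for value in values:
        if value is not None:
            last_observed = value
        filled.append(last_observed)
    return filled

# A hypothetical subject who drops out after the second occasion.
print(locf([3, 2, None, None]))  # [3, 2, 2, 2]
```

The filled-in values freeze the subject at the last recorded severity, which is why the method can distort estimates of change over time when recovery would in fact have continued.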
included in the analysis were: all time 1, all time 2, all available
time 3 where each was weighted by a simple weight, and all available
time 4 observations that also had time 3 observations and where each
was weighted by the product of simple weights at time 3 and time
4. In the main analysis account is taken of the fact that these non-
response weights are not frequency weights by the use of the robust
covariance estimator.
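The construction of the non-response weights described above, in which a later observation is weighted by the product of the simple weights at each earlier stage, can be sketched in Python as follows. The function name is illustrative and the probabilities are hypothetical; in practice they would be estimated, for example from a logistic model of response.

```python
def cumulative_inverse_probability_weights(prob_observed_by_time):
    """Weights for a subject's observations at successive times, where each
    entry of prob_observed_by_time is the estimated probability of still
    being observed at that time given observation at the previous time.
    The weight at each time is the product of the inverse probabilities
    up to and including that time."""
    weights = []
    running_weight = 1.0
    for probability in prob_observed_by_time:
        running_weight /= probability
        weights.append(running_weight)
    return weights

# Hypothetical subject: certain to be seen at times 1 and 2, probability
# 0.8 of being seen at time 3, and 0.5 at time 4 given seen at time 3.
print(cumulative_inverse_probability_weights([1.0, 1.0, 0.8, 0.5]))
```

Observations from subjects who resemble those who dropped out therefore count for more, standing in for the missing observations under the missing at random assumption.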
Results of the three analyses are compared in Table 7.14. The
fitted models all indicate a significant and large negative main effect
for time, reflecting a marked reduction in severity over time in the
comparison group. The negative estimate for the interaction with
7.7. SUMMARY
This chapter has outlined and illustrated the three main ap-
proaches to the analysis of non-normal longitudinal data: the
marginal approach; the random effects or mixed modelling approach;
and the transition model approach. Though less unified than the
methods available for normal data, these methods provide powerful
and flexible tools to analyse what, until relatively recently, have been
seen as almost intractable data. However, as with any such tool, care
in their use is required.
The random effects and transition models approaches, being
based on probability models, can be estimated by maximum likeli-
hood. This gives them a capability with respect to dealing with data
Survival Analysis
8.1. INTRODUCTION
In many clinical trials, the main response variable is often the time
to the occurrence of a particular event. In a cancer study, for exam-
ple, surgery, radiation and chemotherapy might be compared with
respect to the time from randomisation and the start of therapy until
death. In this case the event of interest is the death of a patient, but
in other situations it might be the end of a period spent in remission
from a disease, relief from symptoms, or the recurrence of a partic-
ular condition. Such data are generally referred to by the generic
term survival data even when the endpoint or event being studied
is not death but something else. Questions of interest for such data
involve comparisons of survival times for different treatment groups
and the identification of prognostic factors useful for predicting sur-
vival times. Since these do not differ from the questions usually
posed about other response variables used in clinical trials, it might
be asked why survival times merit any special consideration, in par-
ticular a separate chapter? There are a number of reasons. The first
is that survival data are generally not symmetrically distributed -
they will often be positively skewed, so assuming a normal distri-
bution will not be reasonable. A more important reason for giving
survival data special treatment is the frequent occurrence in such
data of censored observations. These arise because at the comple-
tion of the study, some patients may not have reached the endpoint
$$S(t) = \int_t^{\infty} f(u)\,du = 1 - F(t)$$
[Figures: curves for lambda = 0.5, 1.0 and 2.0 plotted against Time (0 to 10).]
$$\hat{S}(t) = \prod_{j \mid t_{(j)} \le t} \left(1 - \frac{d_j}{r_j}\right) \qquad (8.9)$$
(8.10)
[Figure: estimated survivor functions, one labelled Placebo, plotted against Time in months.]
Fig. 8.3. Kaplan-Meier estimated survivor functions for the two treatment
groups in the prostate cancer trial.
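The product-limit calculation behind such curves can be sketched in a few lines. This is a minimal illustration only, assuming the usual form of the estimator in which the survivor function is multiplied by (1 - d_j / r_j) at each distinct event time, with d_j events and r_j patients at risk. The function name and the data are hypothetical and are not taken from the prostate cancer trial.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time for each patient
    events : 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)   # r_j: under observation just before t
        survival *= 1.0 - deaths[t] / at_risk       # multiply by (1 - d_j / r_j)
        curve.append((t, survival))
    return curve

# Hypothetical data: 6 patients, two of them censored (event = 0).
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(times, events)
```

Censored patients never trigger a step in the curve, but they do count in the risk set up to their censoring time, which is how partial follow-up still contributes information.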
Survival time 2.3
            T1          T2          Total
  Dead      1 (0.55)    0 (0.45)    1
  Alive     4           4           8
  Total     5           4           9

Survival time 3.8
            T1          T2          Total
  Dead      0 (0.5)     1 (0.5)     1
  Alive     4           3           7
  Total     4           4           8

Survival time 18.7
            T1          T2          Total
  Dead      0 (0.33)    1 (0.67)    1
  Alive     1           1           2
  Total     1           2           3

Expected numbers of deaths are shown in parentheses;
'Alive' means alive and at risk.
Observed number of deaths for T1 = 3;
Expected number of deaths for T1 = 0.5 + 0.55 + 0.5 + 0.5 + 0.5 + 0.33 = 2.89.
Observed number of deaths for T2 = 3;
Expected number of deaths for T2 = 0.5 + 0.45 + 0.5 + 0.5 + 0.5 + 0.67 = 3.11.
$$X^2 = \frac{(3 - 2.89)^2}{2.89} + \frac{(3 - 3.11)^2}{3.11} = 0.008$$
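The arithmetic of this log-rank calculation can be checked with a short sketch. It assumes the standard rule that, at each death time, the expected number of deaths in a group is the number of deaths multiplied by that group's share of the patients at risk; this matches the values in parentheses in the tables (for example 5/9 for T1 at time 2.3). The function names are illustrative only.

```python
def expected_deaths(deaths, at_risk_group, at_risk_total):
    """Expected deaths in one group at a single death time, under no treatment difference."""
    return deaths * at_risk_group / at_risk_total

def logrank_chi2(observed, expected):
    """Sum of (O - E)^2 / E over the groups."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Table for survival time 2.3: one death, 5 at risk on T1 and 4 on T2.
e_t1 = expected_deaths(1, 5, 9)
e_t2 = expected_deaths(1, 4, 9)

# Observed and expected totals as reported in the text.
chi2 = logrank_chi2([3, 3], [2.89, 3.11])
```

The two expected values at any one death time always sum to the number of deaths at that time, which is a quick check on hand calculations.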
$$H(t) = \int_0^t h(u)\,du \qquad (8.15)$$
The regression models that have been developed for survival data
are essentially of two types. The first models the hazard function
in patient groups compared to a baseline population by means of
a multiplicative model, that is to say, additive on the log-hazard
scale. The multiplicative factor is assumed to be constant over time,
in which case the model forces the hazards in the different patient
groups to be proportional, thus yielding a proportional hazards
regression model. Following Peto (1976), estimates of the hazard ratio
in the two sample case (A and B) can be obtained directly from
the Mantel-Haenszel log-rank test statistic and its variance (see
Section 8.2.1). Alternatively, following Mantel and Haenszel (1959), it
may be estimated as:
(8.17)
(8.19)
(8.21)
i=l
Weibull
Log-normal
(8.24)
• Assume first that there are no tied survival times and that $t_1 < t_2 < \ldots < t_k$
represent the k distinct times to the event of interest among n
individual times.
• The conditional probability that an individual with covariate vector $x_i$
responds at time $t_i$, given that a single response occurs at time $t_i$, and
given the risk set $R_i$ (indices of individuals at risk just prior to $t_i$), is the
ratio of the hazards:
$$\frac{\exp(\beta' x_i)}{\sum_{j \in R_i} \exp(\beta' x_j)}$$
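The ratio of hazards above is the building block of the partial likelihood, and summing its logarithm over the observed events gives the quantity that is maximised. The sketch below is a minimal illustration for a single covariate, assuming no tied event times; the function name and the data are hypothetical.

```python
import math

def cox_partial_loglik(beta, times, events, covariates):
    """Log partial likelihood for one covariate, assuming no tied event times."""
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 0:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone still under observation just before t_i.
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        denom = sum(math.exp(beta * covariates[j]) for j in risk)
        loglik += beta * covariates[i] - math.log(denom)
    return loglik

# Hypothetical data: x = 1 for the new treatment, 0 for control.
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]
x = [1, 0, 1, 0]
ll_null = cox_partial_loglik(0.0, times, events, x)
```

With beta set to zero every patient in a risk set is equally likely to be the one who fails, so each event contributes minus the log of the risk set size.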
and Prentice (1973). In the particular case where there are no tied
survival times the estimated baseline hazard function at time $t_{(j)}$ is
given by:
$$\hat{h}_0(t_{(j)}) = 1 - \hat{\xi}_j \qquad (8.25)$$
(8.26)
where $x_{(j)}$ is the vector of explanatory variables for the individual
who experiences the event of interest at time $t_{(j)}$. (Since the baseline
hazard function is the hazard function for an individual having zero
values for all the explanatory variables, it is often helpful to redefine
variables by subtracting average values over all individuals in the
sample.)
The baseline survivor function can now be estimated from:
$$\hat{S}_0(t) = \prod_{j=1}^{k} \hat{\xi}_j \qquad (8.27)$$
$$\hat{S}_i(t) = [\hat{S}_0(t)]^{\exp(\hat{\beta}' x_i)} \qquad (8.28)$$
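The step from a baseline survivor function to the survivor function of a particular patient is a simple power transformation, as the following sketch shows. The baseline values and the coefficient are hypothetical numbers chosen for illustration, not estimates from any trial in this chapter.

```python
import math

def patient_survivor(baseline_survival, beta, x):
    """Raise each baseline survivor value to the power exp(beta'x)."""
    linear_predictor = sum(b * v for b, v in zip(beta, x))
    power = math.exp(linear_predictor)
    return [s ** power for s in baseline_survival]

# Hypothetical baseline survivor estimates at three event times.
s0 = [0.9, 0.7, 0.5]
beta = [math.log(2.0)]            # hazard ratio of 2 for a unit increase in x
s_treated = patient_survivor(s0, beta, [1.0])
s_baseline = patient_survivor(s0, beta, [0.0])
```

A patient with all covariates at zero recovers the baseline curve, which is why centring covariates at their means makes the baseline easier to interpret.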
(8.29)
Fig. 8.5. Estimated survivor functions for the two treatment groups in the
lung cancer trial when other covariates are fixed at their mean values.
To an extent, the particular tests and checks that one might use
of the modelling assumptions and the extensions of the model that
might be considered, will depend upon which of these possible causes
of nonproportionality are suspected.
As in the checking of other models, residuals play a key role.
A number of different residuals have been proposed for the Cox
model.
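• Cox-Snell residual; this is defined for the ith individual as:
• Schoenfeld or partial score residual; for the jth event:
$$\hat{r}_j = x_j - \frac{\sum_{i \in R_j} x_i \exp(\hat{\beta}' x_i)}{\sum_{i \in R_j} \exp(\hat{\beta}' x_i)} \qquad (8.35)$$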
These compare the x vector of the subject who fails with its expected
value among all those subjects at risk. For each regressor, one (partial)
residual is obtained for each event, a feature that, as we show below,
is convenient for checking for homogeneity of effect over the
course of a trial.
Using once again the lung cancer trial data, Fig. 8.6 shows the
martingale residuals from the fitted Cox's regression model plotted
[Figure: 'Martingale residuals and log-performance index'; residuals plotted against log-performance index.]
Fig. 8.6. Martingale residuals from the Cox's regression model fitted to
the survival times from the lung cancer trial plotted against the continuous
covariate.
[Figure: deviance residuals plotted against Observation number.]
Fig. 8.7. Index plot of deviance residuals from Cox's regression model fitted
to the survival times from the lung cancer trial.
[Figure: 'Cumulative hazard by performance stratum against time'; strata perf0, perf1 and perf2 plotted against time (0 to 1000).]
Fig. 8.8. Cumulative hazard functions for strata defined by the Karnofsky
Performance Index from the lung cancer trial.
[Figure: 'Scatter and smooth of score residuals for log-performance'; residuals plotted against time (0 to 1000).]
Fig. 8.9. Schoenfeld (1982) or partial score residuals and regression smooth
plotted against log-Karnofsky Performance Index from the lung cancer trial.
(8.37)
For the MH estimator, the weights $w_j$ are given by $n_{Aj} n_{Bj} / n_j$, and
for the Prentice estimator, by $\frac{n_{Aj} n_{Bj}}{n_j} \prod_{k=1}^{j} \frac{n_k - d_k + 1}{n_k + 1}$.
8.5.6. Influence
In the standard design of trial, the construction of the partial
likelihood means that as survival time increases, so the risk set be-
comes smaller. Thus individuals with the longest survival time con-
tribute not only through their numerous appearances in the risk sets
prior to their failure/censoring, but also to those risk sets near the
end of the trial in which few subjects are being compared. As a
result, individuals with the longest survival are most likely to be
influential points.
One simple approach to ensure that estimates are robust to the
possible presence of such individuals, is to artificially censor all ob-
servations at some earlier point in the trial. This can be considered
an extreme form of a weighted partial likelihood in which likelihood
contributions beyond the artificial censoring time are given a weight
of 0. More systematically, Sasieni (1993) suggests that the contri-
bution to the partial likelihood of each risk set be weighted by a
quantity proportional to the total number of individuals still at risk.
The Kaplan-Meier survivor function suggests itself as one such pos-
sible quantity. The inclusion of such weights requires the use of the
‘robust’ estimator of the parameter covariance matrix.
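The simpler of the two ideas above, artificial censoring at an earlier time point, is easy to apply to the data before any model is fitted. The sketch below is illustrative only; the function name, the cutoff and the data are hypothetical.

```python
def artificially_censor(times, events, cutoff):
    """Censor every observation still under follow-up at `cutoff`."""
    new_times, new_events = [], []
    for t, e in zip(times, events):
        if t > cutoff:
            new_times.append(cutoff)   # follow-up truncated at the cutoff
            new_events.append(0)       # any later event is no longer counted
        else:
            new_times.append(t)
            new_events.append(e)
    return new_times, new_events

# Hypothetical survival times in days; 1 = event observed, 0 = censored.
times = [30, 120, 400, 950]
events = [1, 0, 1, 1]
cens_times, cens_events = artificially_censor(times, events, cutoff=365)
```

Refitting the model on the censored data and comparing the estimates with those from the full data gives a simple check on how much the longest survivors drive the results.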
We illustrate in Table 8.6 the use of Kaplan-Meier weighting for
the lung cancer trial data. In this instance, the weights make very
(1) It can be applied to groups who either a priori may have quite
different survival patterns, e.g., men and women, or
post-hoc to groups defined by a variable that has failed a test of
proportionality.
(2) It can be applied so as to obtain distinct estimates of survivor,
hazard or integrated hazard functions for the groups in ques-
tion that are adjusted for the remaining covariates but which are
otherwise unconstrained. Plots of the group specific functions
can be used to check proportionality. This has particular appeal
where the grouping variable is treatment as it provides a sim-
ple graphical illustration of possible variation in treatment effect
over time.
(3) The grouping variable need not be a fixed grouping factor but
may be time varying, e.g., a variable defining the state of the
disease. Strata can also be defined to distinguish between the
different durations, e.g., first event, second event and so on in a
recurrent event process.
8.7.2. Matching
Where patients have been matched, the simplest approach to analyse
the resulting data is to specify each matched group as a separate stratum.
For a simple two-treatment trial, the resulting partial likelihood
corresponds to conditional logistic regression with contributions only
from strata in which the first of the time ranked survival times in a
pair is a failure time (i.e., pairs in which either the shortest time or
both times in a pair are censored make no contribution).
(8.39)
where $h_0(t)$ and $H_0(t)$ are the baseline hazard and integrated hazard,
respectively, and the Poisson rate parameter $\mu_{ij}$ is specified by a
log-linear model of the form:
Schoenfeld (1994) present a method that uses the time to the interme-
diate state as a covariate for subsequent survival. Their simulations
suggest that although improvements in precision of the estimates of
main interest are possible, losses of efficiency are also possible where
the intermediate state is not strongly prognostic. The use of weighted
estimating equations provides one of a number of other approaches
to this problem (Robins and Rotnitzky, 1992).
[Figure: Z value plotted against Date, June 1978 to June 1982.]
Fig. 8.10. O'Brien and Fleming boundary for log-rank test in a randomised
double blind trial.
also shown as the trial progressed. On the 5th interim analysis, the
log-rank test approached, but did not exceed, the critical value. On
the 6th interim analysis, the log-rank statistic was 2.82 and exceeded
the critical value of 2.23. This resulted in the trial being stopped almost
a year earlier than planned.
(8.41)
where n ( t ) is the size of the risk set at time t. The ARR can be
made more concrete by multiplying by the size of the treatment arm
of the trial to give a number of deaths avoided (NDA = ARR × n),
with a variance given by $n^2$ times var(ARR). It is sometimes suggested
that the estimated number of patients in the population who
might benefit from the treatment be used in this calculation to give
an estimate of the therapeutic impact. Care is required here, in
that the size may be rather poorly known and it is in general un-
likely that the characteristics of this wider group of patients will be
like those enrolled in the trial.
An increasingly common measure is the number needed to treat
(NNT), the inverse of the absolute risk reduction (NNT = 1/ARR).
Like the ARR, this must be reported together with the length of
follow-up that is being assumed. A confidence interval can be ob-
tained by inverting the corresponding endpoints of the interval for
the ARR.
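The relationship between the ARR, the NNT and their confidence limits can be illustrated with a short sketch. The numbers are hypothetical, and the function assumes that the whole ARR interval lies above zero, since an interval that includes zero does not invert to a single finite range.

```python
def nnt_with_interval(arr, arr_lower, arr_upper):
    """Return (NNT, lower, upper) from an ARR and its confidence limits.

    Assumes the whole ARR interval is positive; if it includes zero the
    reciprocal limits do not form a single finite interval.
    """
    if arr_lower <= 0:
        raise ValueError("ARR interval must exclude zero to invert it")
    # Taking reciprocals reverses the order of the limits.
    return 1.0 / arr, 1.0 / arr_upper, 1.0 / arr_lower

# Hypothetical result: ARR of 0.10 at two years, 95% interval (0.05, 0.20).
nnt, nnt_lower, nnt_upper = nnt_with_interval(0.10, 0.05, 0.20)

# Number of deaths avoided in a treatment arm of n patients.
n = 200
nda = 0.10 * n
```

Because the reciprocal reverses the ordering, the upper ARR limit gives the lower NNT limit and vice versa.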
Epidemiologists find the risk ratio RR a natural scale. Given by
$(1 - S_A(t))/(1 - S_B(t))$, it allows effects to be described in terms of
a halving (or some other fraction) of the risk. Note that it, too,
requires specifying the relevant follow-up period. An estimate of its
variance can be obtained using Fieller’s theorem (see Marubini and
Valsecchi, 1995). A relative risk reduction or difference, 1 - RR(t),
is also sometimes used.
8.13. SUMMARY
Survival analysis is the study of the distribution of times to some
terminating event (death, relapse etc.). A distinguishing feature of
Bayesian Methods
9.1. INTRODUCTION
Until relatively recently, Bayesian statistics were little more than an
intellectual curiosity; rich in conceptual insight but of little practical
value when it came to actual data analysis. All this has changed
in the most dramatic fashion, with Bayesian methods and applications
now forming an area of the most intense activity. In many
cases, Bayesian approaches lead to the same or similar conclusions as
those using the routine procedures of the frequentist statistician. But
differences do occur and there are proponents on each side with, for
example, Berry (1993) presenting the case for greater use of Bayesian
methods in clinical trials and Whitehead (1993) presenting the case
for the ongoing dominance of the frequentist approach, at least in
the context of definitive phase III trials. Most statisticians involved
in trials are typically rather pragmatic, for the most part using the
familiar and quick to apply frequentist methods, but nonetheless also
applying Bayesian methods where these are convenient or have some
conceptual advantage and are acceptable to the intended consumer.
Much recent work proposing Bayesian methods in clinical trials has
been concerned as much with establishing acceptability as with
developing the methods.
Traditional frequentist analysis essentially treats each trial or
experiment as if it were entirely novel, and each trial is usually
considered as being individually potentially decisive. Scientific
progress may occur outside the narrow focus of this trial, but the
numerical procedures themselves are not formulated to reflect the
process of progressive learning nor one in which the process itself in-
volves costs and potential benefits. By contrast, the focus of Bayesian
methods is one of progressive refinement of opinion as data from tri-
als and other sources accumulate. This is illustrated in Fig. 9.1.
Knowledge prior to a trial is synthesised and formally represented
as a distribution over the parameter space of the problem. The trial
is undertaken. The data from the trial is combined with the prior
distribution to form a posterior distribution over the same parameter
space, one which is hopefully more concentrated than the prior.
[Fig. 9.1: schematic in which a Prior Distribution for the Treatment Difference and the Data are combined through Bayes' Theorem to give a Posterior Distribution for the Treatment Difference, leading to Decisions.]
$$P(\theta \mid D) = \frac{P(\theta)\,P(D \mid \theta)}{\int P(\theta)\,P(D \mid \theta)\,d\theta} \qquad (9.2)$$
$$E[f(\theta) \mid D] = \frac{\int f(\theta)\,P(\theta)\,P(D \mid \theta)\,d\theta}{\int P(\theta)\,P(D \mid \theta)\,d\theta} \qquad (9.3)$$
$$P(\theta) = \theta^{\alpha-1} (1-\theta)^{\beta-1} \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \qquad (9.4)$$
This has mean $\alpha/(\alpha+\beta)$ and variance $\alpha\beta/((\alpha+\beta)^2(\alpha+\beta+1))$.
Figure 9.2 shows four beta distributions. Symmetrical unimodal distributions
are obtained for $\alpha = \beta > 1$, narrowing as their value
increases, becoming asymmetrical when $\alpha$ does not equal $\beta$. Also
notice that although $\alpha = \beta = 1$ is uniform over $\theta$, calculation of
the variance shows that larger variances are obtained from bimodal
shapes in which $\alpha$ and $\beta$ tend to zero.
In a sequence of n trials we observe r successes, giving a likelihood
$P(n, r \mid \theta)$ proportional to $\theta^r (1 - \theta)^{n-r}$. Equation 9.2 shows
that the posterior distribution is obtained by multiplying the prior
distribution by this likelihood and standardising. The selection of a
conjugate prior makes this straightforward, giving a posterior that is
again a beta distribution, with parameters $\alpha + r$ and $\beta + n - r$.
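A small sketch may help to make the conjugate calculation concrete. It assumes the standard result that a beta prior with parameters alpha and beta, combined with r successes in n binary outcomes, gives a beta posterior with parameters alpha + r and beta + n - r. The function names and the data are hypothetical.

```python
def beta_binomial_update(alpha, beta, successes, trials):
    """Posterior beta parameters after `successes` in `trials` binary outcomes."""
    return alpha + successes, beta + (trials - successes)

def beta_mean_variance(alpha, beta):
    """Mean and variance of a beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1))
    return mean, variance

# Hypothetical example: uniform prior, then 7 responders among 20 patients.
prior = (1.0, 1.0)
posterior = beta_binomial_update(*prior, successes=7, trials=20)
prior_mean, prior_var = beta_mean_variance(*prior)
post_mean, post_var = beta_mean_variance(*posterior)
```

In this example the posterior variance is much smaller than the prior variance, reflecting the concentration of opinion that the data bring.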
monitor the ranks of the effects of each treatment from each MCMC
sample, and to obtain an estimate of their distribution, confidence
intervals and so on.
A second example, one that we illustrate later in this chapter, is
the estimation of the Number Needed to Treat (NNT) effect measure
together with its confidence interval. A variety of other effect scales
might be contemplated and the discussion in Chapter 5 relating to
the summary statistic approach is relevant here, both for pointing to
aspects of the trial that may be of interest, but also in warning of
the need to have agreed a single decision criterion prior to the study,
whenever that is appropriate.
(9.7)
Twice the logarithm of $B_{10}$ is on the same scale as the deviance and
the likelihood ratio test statistic. The following scale is often useful
for interpreting values of $B_{10}$:
interval $[t, t + dt)$. As $dt \to 0$ and assuming a proportional hazards
model with time constant covariates and time constant random effect
for each pair, the intensity takes the form:
the confidence interval for the treatment effect with this range of
equivalence. Where the confidence interval falls entirely above $S_L$,
the study should be stopped and the new treatment adopted as routine.
Where the interval falls entirely below $S_U$, the study should be
stopped and the standard treatment continued as routine. If neither
applies, then the trial should continue.
The values of $S_L$ and $S_U$ were obtained by interview with a range
of clinicians involved in the treatment of the disease in question.
Clinicians responded to questions of the sort, "if the real benefit
was x%, would you use the new treatment as your routine?"
Such questions were asked a number of times, changing the value
of x up and down the scale, to identify the levels at which the
clinician would definitely use ($S_U$) and not use ($S_L$) the new treatment.
Such questioning yields results as illustrated in Fig. 9.4, with
both the location and the width of the range of equivalence varying
from clinician to clinician. The extent of overlap between the prior
distribution and the range of equivalence provides the ethical basis
for randomisation. Freedman and Spiegelhalter found considerable
9.8. MONITORING
9.8.1. The Prior as a Handicap
Subsequent real data arising out of the trial with which to update the
prior are summarised by corresponding values $\mu_d$ and $n_d$. Combining
the prior and data together using Bayes' theorem gives a posterior
mean $\mu_p$, given by the simple weighted sum $(\mu_0 \times n_0 + \mu_d \times n_d)/(n_0 + n_d)$,
with variance $\sigma_p^2$ given by $4/(n_0 + n_d)$. Undertaken early in the
trial with little real data, the combined results will largely reflect
the prior. Thus only the most extreme data could result in confidence
intervals that did not include one or the other of $S_L$ and $S_U$,
with correspondingly little chance of very early stopping of the trial.
Undertaken late in the trial, the weight of data will dominate the
influence of the prior, generating results that are similar to those
of a traditional analysis. A typical prior thus acts as a handicap,
restraining the study in the early phases from premature conclusions.
Grossman et al. (1994) provide a more structured approach for
how the handicap can be used for interim analyses within a group
sequential design. They consider a trial involving T sequential blocks,
each of n/T patients, with analyses following the completion of each
block. In addition, a handicapping prior sample of size $f \times n$, the
size of f reflecting the extent of handicap desired, is included. Thus
at the tth analysis, the sample size is $fn + nt/T$. If each block of
patients has mean response $Y_t$, and the standard deviation of the
response is $\sigma$, then the stopping rule test statistic is
$Z_t = (Y_1 + Y_2 + \ldots + Y_t)\sqrt{n/(tT)}/\sigma$. Without adjustment, the trial would stop if the
test statistic exceeded the usual critical value $Z_\alpha$, 1.96 in the case of
$\alpha = 0.05$. Grossman et al. (1994) show how the inclusion of the
handicap sample in a trial with t analyses increases the critical value
of the test statistic by the factor $\sqrt{(t + fT)/t}$. Freedman, Spiegelhalter
and Parmar (1994) compare this 'unified approach' to trial
variance 4/(22.7 + 40) = 0.064 (standard deviation 0.252). For the
enthusiasts, the probability of the hazard ratio being greater than,
say, 1.5 is $1 - \Phi((\log(1.5) - 0.930)/0.252) = 1 - 0.019 = 0.981$. They
would be happy for the trial to be stopped early.
However, combining these data with the prior of the sceptics gives a
posterior distribution with mean $\mu_p = (0 \times 22.7 + 1.065 \times 40)/(22.7 + 40) = 0.679$
and variance 4/(22.7 + 40) = 0.064. This posterior
distribution assigns probability $1 - \Phi((0.405 - 0.679)/0.252) = 1 - 0.139 = 0.861$
to the hazard ratio being greater than 1.5. This
might not be sufficient to persuade the sceptic that the trial should
be stopped.
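The sceptic's calculation can be reproduced with a few lines of code. The sketch below uses only the figures quoted in the text (prior mean 0 with weight 22.7, data mean 1.065 with weight 40, and a variance of 4 divided by the total weight); the function names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def posterior_log_hazard_ratio(prior_mean, n0, data_mean, nd):
    """Posterior mean and standard deviation on the log hazard ratio scale."""
    mean = (prior_mean * n0 + data_mean * nd) / (n0 + nd)
    sd = math.sqrt(4.0 / (n0 + nd))
    return mean, sd

# Sceptical prior from the text: mean 0 with n0 = 22.7; data: 1.065 with nd = 40.
mean, sd = posterior_log_hazard_ratio(0.0, 22.7, 1.065, 40)
prob_hr_above_1_5 = 1.0 - normal_cdf((math.log(1.5) - mean) / sd)
```

Increasing the prior weight n0 pulls the posterior mean further towards zero, which is the handicapping effect described in Section 9.8.1.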
(9.14)
/ ~ ( 8 7d>h(olvi>do (9.16)
where
(9.18)
where
9.11. SUMMARY
Although still unfamiliar to some statisticians and regulators, the
use of Bayesian methods in the design and analysis of clinical trials
is becoming increasingly common. We have done little more than
sketch some of the key ideas and procedures in Bayesian estimation
and analysis. With faster computation and improvements in the-
ory and associated algorithms, virtually all statisticians involved in
trials are likely to find themselves using Bayesian technology. To do
so often does not require an acceptance of the use of subjective prob-
ability. However, progress is being made in formalising procedures
that do use subjective probability. When combined with explicit gain
functions into a formal decision framework, the approach is clearly
well suited to guiding the research strategy of a group of investigators
sharing a common interest. However, we are yet to see whether these
will gain longer term acceptance in the more public arena of definitive
trials. Bayesian methods are now also much used for meta-analysis,
the subject of the final chapter.
CHAPTER 10
Meta-Analysis
10.1. INTRODUCTION
Chalmers and Lau conclude, “It seems obvious that a discipline which
requires that all available data be revealed and included in an analysis
has an advantage over one that has traditionally not presented anal-
yses of all the data on which conclusions are based.”
So meta-analysis is likely to have an objectivity that is inevitably
lacking in literature reviews and can also achieve greater precision
and generalisability of findings than any single study. Consequently,
it is not surprising that the technique has become one of the great-
est growth areas in medical research. Some examples of its use are
given in Chalmers (1987); Louis, Fienberg and Mosteller (1985), and
DerSimonian and Laird (1986). There remain, however, sceptics who
feel that the conclusions drawn from meta-analysis often go beyond
what the techniques and the data justify. Some of their concerns
are echoed in the last part of the definition with which this chapter
began, and in the following quotation from Oakes (1993);
[Figure: flow diagram with boxes labelled 'References', 'Non-RCT rejects', 'Rejects', 'Master file', 'Quality scored with full paper', 'Form C for blinded results with full paper' and 'Form A for blinded data extraction'.]
question (e.g., the same drug, disease and setting for studies following
a common protocol) or a more generic problem (e.g., a broad class of
treatments for a range of conditions in a variety of settings). Pocock
suggests that the broader the meta-analysis, the more difficulty there
is in interpreting the combined evidence as regards future policy;
consequently, the broader the meta-analysis the more it needs to be
interpreted qualitatively rather than quantitatively.
The reliability of a meta-analysis will depend on the quality of
the data in the included studies. In meta-analyses of clinical tri-
als, for example, adherence to recognised criteria of acceptability
(blinding, randomisation, analysis by intention to treat, etc.) should
be looked for, and any relaxation of such standards requires careful
consideration as to whether the consequent precision of more data
is counter-productive, given the increased potential for loss of cred-
ibility. Determining quality would be helped if the results from so
many trials were not so poorly reported. In the future, this may
be improved by the Consolidated Standards of Reporting Trials
(CONSORT) statement (Begg et al., 1996). The core contribution of
the CONSORT statement consists of a flow diagram (see Fig. 10.2)
and a checklist (see Table 10.1). The flow diagram enables reviewers
and readers to quickly grasp how many eligible participants were
randomly assigned to each arm of the trial. Such information is frequently
difficult or impossible to ascertain from trial reports as they
are currently presented. The checklist identifies 21 items that should
be incorporated in the title, abstract, introduction, methods, results
or conclusion of every randomised clinical trial.
Ensuring that a meta-analysis is truly representative can be
problematic. It has long been known that journal articles are not
a representative sample of work addressed to any particular area of
research (see, for example, Sterling, 1959, Greenwald, 1975 and Smith,
1980). Significant research findings, in particular, are more likely to
find their way into journals than non-significant results. An infor-
mal method of assessing the effect of publication bias is the so-called
funnel plot, in which effect size from a study is plotted against the
[Fig. 10.2: CONSORT flow diagram. Each arm reports: Followed up (n = ...), with timing of primary and secondary outcomes; Withdrawn (n = ...), split into intervention ineffective, lost to follow-up and other; Completed trial (n = ...).]
[Figure: two scatter panels, (a) and (b), each with a vertical axis running from 0 to 160.]
Fig. 10.3. Funnel plot.
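[Figure: odds ratios with confidence intervals on a log scale from 0.01 to 100 for the trials listed below; the first entry survives only as its year, 1980.]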
2 Swain 1981
3 Pam 1982
4 Rurgeens 1982
5 MacLeOd 1983
6 Jensen 1984
7 Kernohan 1984
8 Goudie 1984
9 Freilas 1985
10 Swain 1986
11 O'Brien 1986
12 Krels 1987
13 Brearley 1987
14 Moreto 1987
15 Laine 1987
16 Panes 1987
17 Chung 1987
18 Balanzo 1988
19 Fellenon 1989
20 Angerinas 1989
21 Rutgeens 1989
22 Chozzini 1989
23 Lame 1989
24 Acabvxhi 1990
25 Oxner 1992
Overall
Favours Treatment Favours Control
Fig. 10.4. Confidence intervals for odds ratios from a series of randomised
trials of endoscopic treatment of bleeding peptic ulcer (from Sacks et al.,
1990).
$$SE(\bar{Y}) = \left[\sum_{i=1}^{N} W_i\right]^{-1/2}$$
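A fixed effects calculation of this kind can be sketched as follows. The sketch assumes the usual inverse-variance form, in which each weight W_i is the reciprocal of the variance of the ith effect size and the pooled estimate is the weighted mean of the Y_i; the data are hypothetical and the function name is illustrative.

```python
import math

def fixed_effects(effects, variances):
    """Inverse-variance weighted mean, its standard error and a 95% interval."""
    weights = [1.0 / v for v in variances]          # W_i, assumed to be 1 / variance
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    se = total ** -0.5                               # SE = (sum of W_i)^(-1/2)
    return pooled, se, (pooled - 1.96 * se, pooled + 1.96 * se)

# Hypothetical log odds ratios and their variances from three trials.
effects = [0.10, 0.30, 0.20]
variances = [0.04, 0.01, 0.02]
pooled, se, interval = fixed_effects(effects, variances)
```

When the effect sizes are log odds ratios, the pooled estimate and the interval limits are exponentiated to report results on the odds ratio scale.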
• Under the assumption that the studies are a random sample from a larger
population of studies, there is a mean population effect size, say $\phi$, about
which the study-specific effect sizes vary. Thus, even if each study's
results were based on sample sizes so large that the standard errors of the
Ys were zero, there would still be study-to-study variation because each
study would have its own underlying effect size (i.e., its own parameter
value).
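• Let D denote the variance of the studies' effect sizes (a quantity yet to
be determined) and let Q denote the following statistic for measuring
study-to-study variation in effect size:
$$Q = \sum_{i=1}^{N} W_i (Y_i - \bar{Y})^2$$
$$D = 0 \quad \text{if } Q < N - 1$$
$$D = [Q - (N - 1)]/U \quad \text{if } Q \ge N - 1$$
$$U = (N - 1)\left[\bar{W} - \frac{S_W^2}{N \bar{W}}\right]$$
with $\bar{W}$ and $S_W^2$ being the mean and variance of the Ws.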
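• An approximate $100(1 - \alpha)\%$ confidence interval for $\phi$ is:
$$\bar{Y}^* \pm z_{\alpha/2} \left[\sum_{i=1}^{N} W_i^*\right]^{-1/2}$$
where
$$\bar{Y}^* = \frac{\sum_{i=1}^{N} W_i^* Y_i}{\sum_{i=1}^{N} W_i^*}, \qquad W_i^* = \left[D + \frac{1}{W_i}\right]^{-1}$$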
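The random effects steps listed above can be sketched in code. This is a hedged illustration, assuming the DerSimonian and Laird form of the method: weights W_i equal to reciprocal variances, D set to zero when Q is below N - 1 and to (Q - (N - 1))/U otherwise, and adjusted weights of 1/(D + 1/W_i). The data and the function name are hypothetical.

```python
import math

def random_effects(effects, variances):
    """DerSimonian-Laird style pooled estimate; returns (pooled, se, D, Q)."""
    n = len(effects)
    w = [1.0 / v for v in variances]                      # fixed effects weights W_i
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, effects))
    w_mean = sum(w) / n
    s2_w = sum((wi - w_mean) ** 2 for wi in w) / (n - 1)  # sample variance of the W_i
    u = (n - 1) * (w_mean - s2_w / (n * w_mean))
    d = max(0.0, (q - (n - 1)) / u)                       # D = 0 when Q < N - 1
    w_star = [1.0 / (d + 1.0 / wi) for wi in w]           # random effects weights
    pooled = sum(ws * yi for ws, yi in zip(w_star, effects)) / sum(w_star)
    se = sum(w_star) ** -0.5
    return pooled, se, d, q

# Hypothetical log odds ratios and variances from four trials.
effects = [0.05, 0.40, -0.10, 0.30]
variances = [0.01, 0.02, 0.015, 0.01]
pooled, se, d, q = random_effects(effects, variances)
```

Whenever D is positive the adjusted weights are smaller and more nearly equal than the fixed effects weights, so the standard error is larger and small studies gain relative influence.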
$$Q = \sum_{i=1}^{N} W_i (Y_i - \bar{Y})^2$$
random effects model for the studies is the appropriate one. This
research question implicitly assumes that there is a population of
studies from which those analysed in the meta-analysis were sam-
pled, and anticipates future studies being conducted or previously
unknown studies being uncovered. In contrast, when the research
questions concern whether treatment has produced an effect, on
average, in the set of studies being analysed, then the fixed effects
model is the one to use. Meier (1987) and Peto (1987) present further
arguments in favour of one or other of the models.
Oakes (1993) also considers the methodological arguments that
can arise in the application of meta-analysis, but in addition, ad-
dresses the wider question of whether the statistical combination of
results from different trials is legitimate at all. He makes the point
that the principal feature of statistical inference is the argument from
sample to population. For the argument to have pretensions to strict
Table 10.3. Results from Nine Trials Comparing SMFP and NaF
Toothpastes in Prevention of Caries Development.
death after a myocardial infarction (the data are taken from Fleiss,
1993). In this case, a suitable measure of effect size is the logarithm of
the odds ratio (see Chapter 4). The meta-analysis results are given
in Table 10.5. The fixed effects model indicates a positive effect
for aspirin, but using the random effects approach, the confidence
interval for the overall odds ratio expands to include the value one,
indicating no effect.
= 0.104
and a 95% confidence interval for the logarithm of the population odds ratio
is (0.039,0.169).
With respect to the odds ratio itself, the point estimate is 1.109 and the
confidence interval is (1.040,1.184).
$\bar{W} = 129.85$
$S_W^2 = 56363.38$
$U = 345.02$
$D = 0.008$
$Q = 8.76$
$\bar{Y}^* = 0.1158$
The 95% confidence interval for the logarithm of the population odds ratio is
(-0.0025, 0.2341).
With respect to the odds ratio itself, the point estimate is 1.123 and the
confidence interval is (0.997,1.263).
(10.1)
$$\theta_i \sim N(\phi, \tau^2), \quad i = 1, \ldots, N \qquad (10.2)$$
where $\theta_i$ is the true but unknown effect in the ith study and $\phi$ is
the unknown population effect; it is the latter quantity which is of
most interest, since it represents the pooled effect indicated by the
studies. Finally, $\tau^2$ is the population variance, or the between-study
variability, and is also of interest since it is a measure of how variable
the effect is within a population. In the case where $\tau^2 = 0$, a fixed
effect model is obtained.
In a Bayesian setting, prior distributions would be needed for
the unknown parameters of the model. A normal prior is typically
specified for $\phi$. Although a prior for $\tau$ could be specified, for example
specifying the precision, $1/\tau^2$, as being gamma distributed (see
Section 9.3 for such a specification for the variance of a random effect
for frailty), such a specification is not necessary. With a prior for $\tau$,
$\pi(\tau)$, the posterior distribution for $\tau$ given the observed effect sizes,
$f(\tau \mid Y)$, is proportional to:
where $S_i = \tau^2 + (\sigma_i^2/n_i)$.
Assuming a uniform distribution for $\pi(\tau)$, the posterior distribution
can then be approximated by a discrete distribution calculated
over, say, 100 equally spaced points that straddle its mode and with
a range sufficient to include all the values in which the density is at
least 1% of that at the mode (Hedges, 1998). The simple sum of the
100-point densities provides the normalising constant.
With $f(\tau \mid Y)$ approximated by this discrete distribution, a posterior
density for $\phi \mid Y$ is easily calculated as a finite mixture of the
conditional densities $g(\phi \mid Y, \tau)$. This conditional density is Normal
with mean:
(10.4)
and variance
a 2 ( r )= c-kj’ (10.5)
that the particular choice of prior for $\tau$ may have a substantial
impact. It is therefore important to examine the sensitivity of findings
to different choices of prior.
Kass and Wasserman (1996) provide some useful background for
the choice of prior distributions. For $\tau = 0$, the Bayesian estimates
of the study specific effects are all equal to the common fixed effect
estimate. When $\tau$ is very large in proportion to the within study
sampling variance, then the Bayesian estimates approach their observed
values. For intermediate values of $\tau$, the estimates lie between their
observed values and the overall common mean; they are 'shrunken'
towards the mean, the studies with the more uncertain effect sizes
(usually the smaller studies) being more shrunken. Therefore priors
for $\tau$ that are more concentrated on zero will generate greater
shrinkage than more diffuse priors.
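The shrinkage behaviour can be made concrete with a small Python sketch. It assumes the usual normal-normal result that, for a given between-study standard deviation, each study-specific estimate is a weighted average of the observed effect and the precision-weighted common mean, with the weight on the common mean equal to the within-study variance divided by the total variance. The function name `shrunken_estimates` and the data are hypothetical.

```python
def shrunken_estimates(y, se2, tau):
    """Study-specific estimates for a given tau (assumed normal-normal form)."""
    s = [tau * tau + v for v in se2]              # total variance of each observed effect
    w = [1.0 / si for si in s]
    eta_hat = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    estimates = []
    for yi, v, si in zip(y, se2, s):
        b = v / si                                # shrinkage weight towards the common mean
        estimates.append(b * eta_hat + (1.0 - b) * yi)
    return estimates


# Two hypothetical studies: the second has the larger within-study variance.
y, se2 = [0.0, 1.0], [0.1, 1.0]
at_zero = shrunken_estimates(y, se2, 0.0)         # both equal the fixed effect estimate
at_one = shrunken_estimates(y, se2, 1.0)          # partial shrinkage
at_large = shrunken_estimates(y, se2, 1e6)        # essentially the observed values
```

In this example the shrinkage weight at `tau = 1.0` is larger for the second study, which has the larger within-study variance, so it is pulled proportionately further towards the common mean.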
However, Bayesian methods are relatively easily modified to
consider more general random effects specifications, e.g., to examine the
implications of using study effect size distributions with heavier tails,
such as Student’s t or gamma distributions (e.g., Smith et al., 1995).
Analyses with an informative prior for the effect size are also a little
more complex.
The analyses above have assumed that the contributing studies and
their corresponding effect sizes $Y_i$ are exchangeable. No study-level
covariates have been considered, and the analyses are conditional on the
$\theta_i$ and the $Y_i$ being independent. Non-independence may arise through a
clustering of contributing trials by centre, by an overlap of patients
across trials, or sometimes by the presence of single studies that
provide more than one estimate of effect. In the more complex cases,
the question arises as to whether a more effective analysis might be
achieved by combining data at the level of the individual subject,
i.e., data pooling, rather than at the level of the trial. Senn (1998)
[Figure 10.5: plot of confidence intervals for odds ratios; axis labels "Favours Treatment" and "Favours Control".]
Fig. 10.5. Confidence intervals for odds ratios from a series of randomised
trials of endoscopic treatment of bleeding peptic ulcer presented in
cumulative form (from Chalmers and Lau, 1993).
10.7. SUMMARY
Software
Fairclough, D.L., Peterson, H.F. and Chang, V. (1998). Why are miss-
ing quality of life data a problem in clinical trials of cancer therapy?
Statist. Med. 17: 667-678.
Falissard, B. and Lellouch, J. (1992). A new procedure for group sequential
analysis in clinical trials, Biometrics 48: 373-388.
Farewell, V.T. (1982). The use of mixture models for the analysis of survival
data with long-term survivors. Biometrics 38: 1041-46.
Farewell, V., Fleming, T.R. and Harrington, D.P. (1991). Counting Pro-
cesses and Survival Analysis, John Wiley and Sons, New York.
Feinstein, A.R. (1991). Intention-to-treat policy for analyzing randomized
trials: Statistical distortions and neglected clinical challenges, Chapter 28.
In Patient Compliance in Medical Practice and Clinical Trials
(eds. J.A. Cramer and B. Spilker), Raven Press Ltd, New York.
Finkelstein, D.M. and Schoenfeld, D.A. (1994). Analysing survival in the
presence of an auxiliary variable, Statist. Med. 13: 1747-1754.
Finney, D.J. (1990). Repeated Measurements: what is measured and what
repeats? Statist. Med. 9: 639-644.
Firth, D. (1987). On the efficiency of quasi-likelihood estimation, Bio-
metrika 74: 233-246.
Fisher, R.A. and MacKenzie, W.A. (1923). Studies in crop variation: II.
The manurial response of different potato varieties, J. Agri. Sci. 13:
311-320.
Fitzmaurice, G.M., Laird, N. and Rotnitzky, A. (1993). Regression models
for discrete longitudinal responses, Statist. Sci. 8: 284-309.
Fleiss, J.L. (1986). The Design and Analysis of Clinical Experiments,
Wiley, New York.
Fleiss, J.L. (1993). The statistical basis of meta-analysis, Statist. Meth.
Med. Res. 2: 121-145.
Follman, D. (1995). Multivariate tests for multiple endpoints in clinical
trials. Statist. Med. 14: 1163-1175.
Fox, W., Sutherland, I. and Daniels, M. (1954). A five-year assessment of
patients in a controlled trial of streptomycin in pulmonary tuberculosis,
Quart. J. Med. 23: 347.
Freedman, L.S. (1982). Tables of the number of patients required in clinical
trials using the logrank test, Statist. Med. 1: 121-130.
Lee, Y.J. (1983). Quick and simple approximations of sample sizes for
comparing two independent binomial distributions: Different-sample
size case, Biometrics 40: 239-242.
Lee, Y.J., Ellenberg, J.H., Hirtz, D.G. and Nelson, K.B. (1991). Analysis
of clinical trials by treatment actually received: is it really an option?
Statist. Med. 10: 1595-1605.
Levine, R.J. (1981). Ethics and Regulations of Clinical Research, Urban
and Schwartzenberg, Baltimore.
Liang, K.-Y. and Zeger, S.L. (1986). Longitudinal data analysis using
generalized linear models, Biometrika 73: 13-22.
Liang, K.-Y., Zeger, S.L. and Qaqish, B. (1992). Multivariate regression
analyses for categorical data (with discussion), J. Roy. Statist. Soc.
B54: 3-40.
Lin, D. and Wei, L.J. (1989). The robust inference for the Cox proportional
hazards model, J. Am. Statist. Assoc. 84: 1074-1078.
Lind, J. (1753). A Treatise of the Scurvy, reprinted in Lind’s Treatise on
Scurvy (eds. C.P. Stewart and D. Guthrie), Edinburgh University
Press, Edinburgh, (1953). Sands, Murray, Cochran, Edinburgh.
Lindley, D.V. (1998). Decision analysis and bio-equivalence trials, Statist.
Sci. 13: 136-141.
Lindsey, J.K. (1993). Models for Repeated Measurements, Oxford Science
Publications, Oxford.
Lindsey, J.K. and Lambert, P. (1998). On the appropriateness of marginal
models for repeated measurements in clinical trials, Statist. Med. 28:
447-469.
Lipsitz, S.R. and Fitzmaurice, G.M. (1996). Estimating equations for
measures of association between repeated binary responses, Biometrics 52:
903-912.
Lipsitz, S.R., Laird, M. and Harrington, D.P. (1991). Generalized estimat-
ing equations for correlated binary data: using the odds ratio as a
measure of association, Biometrika 78: 153-160.
Lipsitz, S.R., Fitzmaurice, G.M., Orav, E.J. and Laird, N.M. (1994). Per-
formance of generalized estimating equations in practical situations,
Biometrics 50: 270-278.
Little, R.J.A. (1995). Modelling the dropout mechanism in repeated mea-
sures studies, J. Am. Statist. Assoc. 90: 1112-1121.
Little, R.J.A. and Rubin, D.B. (1987). Statistical Analysis with Missing
Data, Wiley, New York.
Louis, T.A., Fienberg, H.V. and Mosteller, F. (1985). Findings for public
health for meta-analysis, Ann. Rev. Public Health 6: 1-20.
Lubsen, J. and Pocock, S.J. (1994). Factorial trials in cardiology: Pros and
cons, Eur. Heart J. 15: 585-588.
Machin, D. (1994). Discussions of The What, Why and How of Bayesian
Clinical Trials Monitoring, Statist. Med. 13: 1385-1390.
Malmberg, K. (1997). Prospective randomized study of intensive insulin
treatment on long term survival after acute myocardial infarction in
patients with diabetes mellitus, Br. Med. J. 314: 1512-1515.
Mantel, N. and Haenszel, W. (1959). Statistical aspects of the analysis of
data from retrospective studies of disease, J. Natl. Cancer Inst. 22:
719-748.
Marubini, E. and Valsecchi, M.G. (1995). Analysing Survival Data from
Clinical Trials and Observational Studies, Wiley, New York.
Matthews, J.N.S. (1993). A refinement to the analysis of serial data using
summary measures, Statist. Med. 12: 27-37.
Matthews, J.N.S. (1994). Discussion of Diggle and Kenward, Appl. Statist.
43: 49-93.
Matthews, J.N.S., Altman, D.G., Campbell, M.J. and Royston, P. (1989).
Analysis of serial measurements in medical research, Br. Med. J. 300:
230-235.
Mauritsen, R.H. (1984). Logistic regression with random effects, Unpub-
lished PhD thesis, Department of Biostatistics, University of Wash-
ington.
McCullagh, P. (1983). Quasilikelihood functions, Ann. Statist. 11: 59-67.
McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models, 2nd
ed., Chapman and Hall, London.
McHugh, R.B. and Lee, C.T. (1984). Confidence estimation and the size
of a clinical trial, Contr. Clin. Trials 5: 157-164.
McPherson, K. (1982). On choosing the number of interim analyses in
clinical trials, Statist. Med. 1: 25-36.
Medical Research Council. (1931). Clinical trials of new remedies (anno-
tations), Lancet 2: 304.
Meinert, C.L. (1986). Clinical Trials - Design, Conduct and Analysis,
Oxford University Press, New York.
Pocock, S.J. (1992). When to stop a clinical trial, Br. Med. J. 305: 235-
244.
Pocock, S.J. (1996). Clinical Trials: A statistician’s perspective. In Ad-
vances in Biometry (eds. P. Armitage and H.A. David), Wiley, Chich-
ester.
Pocock, S.J. and Abdalla, M. (1998). The hope and the hazards of using
compliance data in randomized controlled trials, Statist. Med. 17:
249-250.
Pocock, S.J., Geller, N.L. and Tsiatis, A.A. (1987). The analysis of multiple
end-points in clinical trials, Biometrics 43: 487-498.
Pregibon, D. (1981). Logistic regression diagnostics, Ann. Statist. 9: 705-
724.
Prentice, R.L. (1973). Exponential survivals with censoring and explanatory
variables, Biometrika 60: 279-288.
Prentice, R.L. (1988). Correlated binary regression with covariates specific
to each binary observation, Biometrics 44: 1033-1048.
Prentice, R.L. (1989). Surrogate endpoints in clinical trials: Definitions and
operations criteria, Statist. Med. 8: 431-440.
Prentice, R.L. and Zhao, L.P. (1991). Estimating equations for parame-
ters in means and covariances of multivariate discrete and continuous
responses, Biometrics 47: 825-883.
Qaqish, B.F. and Liang, K.-Y. (1992). Marginal models for correlated
binary responses with multiple classes and multiple levels of nesting,
Biometrics 48: 939-950.
Rabe-Hesketh, S. and Everitt, B.S. (1999). The effect of misspecifying the
covariance structure when analysing longitudinal data. In preparation.
Reid, N. and Crepeau, H. (1985). Influence functions for proportional
hazards regression model. Biometrika 72: 1-9.
Ridout, M. (1991). Testing for random dropouts in repeated measurement
data, Biometrics 47: 1617-1621.
Roberts, G. (1996). Markov chain concepts related to sampling algorithms. In
Markov Chain Monte Carlo in Practice (eds. W.R. Gilks, S. Richardson
and D.J. Spiegelhalter), Chapman and Hall, London, pp. 45-58.
Robins, J.M. (1998). Correction for non-compliance in equivalence trials,
Statist. Med. 17: 269-302.
Robins, J.M. and Rotnitsky, A. (1992). Recovery of information and ad-
justment for dependent censoring using surrogate markers. In Aids
Spiegelhalter, D.J., Thomas, A., Best, N. and Gilks, W. (1996). BUGS 0.5
Examples. Vol. 1. MRC Biostatistics Unit, Cambridge, England.
Spiegelhalter, D.J., Freedman, L.S. and Parmar, M.K.B. (1994). Bayesian
approaches to randomized trials, J. Roy. Statist. Soc. Ser. A 157:
357-416.
Stanley, K.E. (1980). Prognostic factors for survival in patients with
inoperable lung cancer. J. Nat. Cancer Inst. 65: 25-32.
Sterling, T.D. (1959). Publication decisions and their possible effects in
inferences drawn from tests of significance - or vice-versa, J. Am.
Statist. Assoc. 54: 30-34.
Storer, B.E. and Crowley, J. (1985). A diagnostic for Cox regression and
general conditional likelihoods. J. Am. Statist. Assoc. 80: 139-147.
Sutton, H.G. (1865). Cases of rheumatic fever, Guys Hospital Report, 11,
pp. 392-428.
Tango, T. (1998). A mixture model to classify individual profiles of
repeated measurements. In Data Science, Classification and Related
Methods (eds. C. Hayashi, N. Ohsumi, K. Yajima, Y. Tanaka, H.H.
Bock and Y. Baba), Springer, Tokyo, pp. 247-254.
Thall, P.F. and Vail, S.C. (1990). Some covariance models for longitudinal
count data with overdispersion, Biometrics 46: 657-671.
Therneau, T.M., Grambsch, P.M. and Fleming, T.R. (1990). Martingale
hazard regression models and the analysis of censored survival data,
Biometrika 77: 147-160.
Thompson, S.G. and Pocock, S.J. (1991). Can meta-analysis be trusted?
Lancet 338: 1127-1130.
Thompson, S.G. (1993). Controversies in meta-analysis: the case of the
trials of serum cholesterol reduction, Statist. Meth. Med. Res. 2:
173-192.
Troxel, A.B., Fairclough, D.L., Curran, D. and Hahn, E.A. (1998).
Statistical analysis of quality of life with missing data in cancer clinical
trials, Statist. Med. 17: 653-666.
Tsiatis, A.A. (1981). The asymptotic joint distribution of the efficient
scores test for the proportional hazards model calculated over time,
Biometrika 68: 311-315.
United States Congress. (1962). Drug amendments of 1962, Public Law
87-781, S1522, Washington.