
Contemporary Clinical Trials 109 (2021) 106538


Statistical considerations of phase 3 umbrella trials allowing adding one treatment arm mid-trial

Yixin Ren *, Xiaoyun Li, Cong Chen
Biostatistics and Research Decision Sciences, Merck & Co., Inc., Kenilworth, NJ 07033, USA

ARTICLE INFO

Keywords: Master protocols; Umbrella trials; Optimal allocation ratio; Familywise error rate; Concurrent control; Non-concurrent control; Rolling arms

ABSTRACT

Master protocols, in particular umbrella trials and platform trials, when evaluating multiple experimental treatments with a common control, could save patient resources, increase trial efficiency, and reduce drug development cost. Compared to phase 3 platform trials that allow an unlimited number of experimental arms to be added, it is more practical for individual companies to evaluate two experimental arms with a common control in an umbrella trial and allow the second experimental arm to be added at a later time. There has been limited research on this type of trial in terms of statistical properties and guidance. In this article, we present statistical considerations of a phase 3 three-arm umbrella design including Type I error control and power, as well as the optimal allocation ratio. We intend to not only complement the existing literature, but more importantly to provide practical guidance to pave the way for its implementation by individual companies.

1. Introduction

An umbrella trial is a type of master protocol in which multiple therapies are evaluated in the same trial, while a platform trial adds a perpetual feature to an umbrella trial. When designed and implemented properly, umbrella and platform trials could increase the efficiency of drug development ([22]; FDA guidance). Compared to traditional clinical trials that evaluate one experimental treatment at a time, an umbrella or platform trial with a common control has the potential to reduce the total number of patients allocated to the control arm, making it more appealing to patients and drug developers. It could also gain operational efficiency when a new experimental arm is added using the existing trial infrastructure. Although there has been a lot of excitement about the concept of master protocols, very few Phase 3 umbrella trials have been conducted, especially by individual companies. The few phase 3 platform trials that have been conducted were run by third-party contract research organizations and co-sponsored by multiple pharmaceutical companies [24]. There are no individual company-led phase 3 platform trials so far. Individual companies usually have only a limited number of investigational regimens moving into Phase 3 within a certain timeframe. The perpetual feature of a platform trial may not be beneficial in individual company-led studies. It is more common that a pharmaceutical company has a few (two or three) investigational treatment regimens of interest within a similar time period. A Phase 3 umbrella study starting with only one experimental arm and its control arm, with the potential to add a second experimental arm mid-trial, would still have the benefit of saving patient resources and reducing enrollment competition. Despite the efficiencies of being able to add a treatment arm to an ongoing clinical trial, there is no clear methodological guidance on this topic [6]. In this article, we provide detailed statistical considerations and guidance, such as type I error, power, and the optimal randomization ratio, for this three-arm umbrella phase 3 study design.

Several key statistical considerations arise in a phase 3 umbrella/platform trial with common control [6]. The first one is the need for family-wise error rate (FWER) control. There have been long debates on whether FWER should be controlled at the study level. If two experimental arms are evaluated in the same trial for efficiency purposes and the objectives are to compare each experimental arm to the control arm without comparing between experimental arms, it can be considered similar to conducting two separate trials, and therefore FWER control is not necessary [9]. Bennett and Mander [4] only evaluated the scenario where multiplicity adjustment is required. In this article, we evaluate both cases, with or without FWER control. The second key statistical consideration is whether to include non-concurrent control patients in the analysis. We consider three different ways of utilizing the control patients in the analysis. At the same time, the optimal allocation ratio when adding a new treatment arm is often of interest, which we will

* Corresponding author.
E-mail address: yixin.ren@merck.com (Y. Ren).

https://doi.org/10.1016/j.cct.2021.106538
Received 24 May 2021; Received in revised form 2 August 2021; Accepted 6 August 2021
Available online 9 August 2021
1551-7144/© 2021 Elsevier Inc. All rights reserved.
Y. Ren et al. Contemporary Clinical Trials 109 (2021) 106538

evaluate in this article.

The article is organized as follows. We lay out the general setup of the 3-arm umbrella trial design in Section 2 and evaluate the type I error in various scenarios in Section 3. In Section 4 we evaluate the power and optimal allocation ratio. Section 5 provides summaries and discussions. Although the setup in this article is for a 3-arm umbrella trial, it can be generalized to an umbrella trial with four or more arms.

2. General setup

Consider a three-arm umbrella study with one common control and up to two experimental arms. The control arm will continue enrollment if there is at least one experimental arm open for enrollment. When both experimental arms are open for enrollment, the control arm will be shared in randomization with both experimental arms. We assume the study starts with one experimental arm and one control arm with equal randomization ratio for the initial optimal power in a two-arm setting and has the feature of adding one experimental arm mid-trial. When the second experimental arm is added, the randomization ratio between the control arm and each experimental arm would be $r$ (see Fig. 1). When enrollment of the first experimental arm is complete, the randomization ratio will change back to 1:1 for optimal power for the remaining experimental and control arms. We denote the total sample size of the control arm as $n_{0A}$, and the sample size of the concurrent control corresponding to experimental arm 1 as $n_{0C1}$ and to experimental arm 2 as $n_{0C2}$ (see Fig. 1). We consider a continuous outcome as our primary endpoint in the remainder of the article, though the evaluations can be easily extended to binary endpoints. Let $X_{0j}$ be the outcome of patient $j$ in the control arm, where $j = 1, \ldots, n_{0A}$, and let $X_{ik}$ denote the outcome of patient $k$ in experimental arm $i$, where $k = 1, \ldots, n_i$ and $i = 1, 2$. We assume all $X$'s have equal variance $\sigma^2$. Let $\bar{X}_i$ be the sample mean for experimental treatment group $i$, for $i = 1, 2$; and let $\bar{X}_{0C1}$, $\bar{X}_{0C2}$ and $\bar{X}_{0A}$ be the sample means for the concurrent randomized control for experimental arms 1 and 2, and for all control, respectively. Let $p_i$ be the proportion of the overlap enrollment for arm $i$, i.e., $p_i = n^*/n_i$, where $n^*$ is the number of patients in each experimental arm during the overlap. When $p_1 = p_2 = 0$, it corresponds to two independent two-arm trials; and $p_1 = p_2 = 1$ corresponds to a conventional three-arm trial with common control. Fig. 1 illustrates the key parameters in the diagram.

Fig. 1. Randomized Phase 3 Umbrella Trial Design.

We assume the total sample size of the study is fixed at $N$ such that, in a traditional three-arm trial setting with equal randomization, each experimental arm has $1 - \beta$ power to detect the standardized treatment effect $\Delta$ with individual type I error controlled at $\alpha$ (one-sided), i.e.,

$$N = \frac{6(Z_\alpha + Z_\beta)^2}{\Delta^2}.$$

Here $\alpha$ and $\beta$ are the type I error and type II error for each experimental arm comparison to the control arm. Then, for a traditional three-arm trial, the sample size for each arm is $N/3$, and for two independent two-arm trials, the sample size for each arm is $N/4$. Based on the illustration in Fig. 1, $n_{0A}$ can be written as $(n_1 + n_2 - 2n^*) + rn^*$. Combining with $p_i = n^*/n_i$ and $N = n_{0A} + n_1 + n_2$, we can get

$$n^* = \frac{N}{\frac{2}{p_1} + \frac{2}{p_2} - 2 + r}.$$

The sample size ratio between the concurrent control and experimental arm $i$ is denoted as

$$R_{Ci} = \frac{n_{0Ci}}{n_i} = \frac{n_i - p_i n_i + r p_i n_i}{n_i} = 1 - p_i + r p_i;$$

and the sample size ratio between all control patients and experimental arm $i$ is denoted as

$$R_{Ai} = \frac{n_{0A}}{n_i} = \frac{2(n_i - p_i n_i) + r p_i n_i}{n_i} = 2 - 2p_i + r p_i.$$

For simplicity, we assume the sample sizes of the two experimental arms are the same in the rest of the article, which is common in the same-indication setting as the target treatment effects for both experimental arms are likely to be the same. When $n_1 = n_2 = n$, then $n_{0C1} = n_{0C2} = n_{0C}$, and $p_1 = p_2 = p$. The total control sample size $n_{0A}$ can be written as $2(n - n^*) + rn^*$. Since $N = n_{0A} + 2n$, we can get $n = \frac{N}{4 - 2p + rp}$. Thus, the sample size ratio between the concurrent control and each experimental arm is $R_C = 1 - p + rp$; and the sample size ratio between all control patients and each experimental arm is $R_A = 2 - 2p + rp$.

There is no universal way to define the study power in a multi-arm or umbrella trial setting. Bennett and Mander [4] defined the overall study power slightly differently, as the probability of detecting both treatments that are better than the control. One may also be interested in evaluating the probability of correctly detecting the effective treatment when there is only one active treatment. This would be a topic of evaluating multiple hypotheses. In this article, in order to benchmark and compare with the traditional two independent two-arm trials, we define the overall study power to be the probability of detecting at least one effective treatment when both experimental arms are active. For simplicity, we assume the target treatment effect is the same in both experimental arms.

When one experimental arm is added mid-trial with shared control, the most conservative approach is to only include the concurrent control in the analysis with the corresponding experimental arm. Non-concurrent control is part of the control arm in the same study with the same inclusion/exclusion criteria. In situations where the medical landscape is very stable and there is little concern about a treatment effect shift over time, and patients were enrolled in a relatively short period, or in the rare disease setting where patient resource is scarce, one may consider including all control arm patients in the analysis [12] to increase the estimate precision and power. In this article, we evaluate three different scenarios of the control data use: 1) only include concurrent randomized control for comparison with each experimental arm; 2) the first enrolled experimental arm (Arm 1) uses only concurrent control while the later added experimental arm (Arm 2) utilizes all available control data; 3) both experimental arms use all control (both concurrent and non-concurrent) data. Scenarios 2 and 3 could be considered in the trial analysis when there is no significant difference in the outcome data in the control arm before and after adding the new experimental arm. Examples of including both concurrent and non-concurrent control data in the analysis include I-SPY2 [3] and the 2NN study [19].

3. Type I error evaluation

There have been long debates on the need for multiplicity adjustment in multi-arm studies and master protocols with common control in the past several decades ([9,11,16; 7; 5,15,18]). In general, when multiple experimental arms are combined into one trial with a common control to improve operational efficiency, there is no need to control the family-wise error rate (FWER). However, if multiple experimental arms are for the same or related claims, multiplicity adjustment would be required so that the FWER would not exceed the overall type I error rate. For example, when different formulations (sequential administration of multiple single drug entities vs. co-formulation) or routes (oral vs. intravenous) of the same treatment are studied, and the development of one option is slightly behind the other; or both the investigational drug and its add-on to the standard of care (SOC) are being considered and the goal is to only select the best regimen for regulatory approval, FWER should be controlled within the type I error $\alpha$. FWER is defined in the traditional way, i.e., the probability of declaring at least one positive treatment arm when both experimental arms are inactive. In this section, we evaluate the FWER, as well as the probability of declaring both experimental arms positive when neither is active under the case where 1) FWER control is not required (Section 3.1), and 2) FWER control is
required (Section 3.2).

3.1. Without FWER control

In this sub-section, we evaluate the situation where FWER does not need to be controlled at $\alpha$. Each experimental arm to control arm comparison has its full level of type I error $\alpha$. We evaluate each of the scenarios of control data use respectively.

Scenario 1 (only concurrent randomized control data used for comparison): We consider only using the concurrent randomized control data when conducting the analysis between each experimental arm and the control arm. This is a more conservative approach that maintains true randomization between the control arm and each experimental arm and is commonly used.

The standardized test statistic comparing each experimental arm $i$, $i = 1, 2$, to its concurrent control can be written as:

$$Z_{Ci} = \frac{\bar{X}_i - \bar{X}_{0Ci}}{\sigma\sqrt{\frac{1}{n_{0C}} + \frac{1}{n}}}, \quad i = 1, 2,$$

where $(Z_{C1}, Z_{C2})'$ follows a standardized bivariate normal distribution $BVN(0, \Sigma_C)$ under the null hypothesis, with mean 0 and variance of 1, and the correlation between $Z_{C1}$ and $Z_{C2}$ is $\rho_C = \frac{rp}{R_C(1 + R_C)}$. Here

$$\Sigma_C = \begin{bmatrix} 1 & \rho_C \\ \rho_C & 1 \end{bmatrix}.$$

The FWER can be written as:

$$FWER_C = 1 - \int_{-\infty}^{Z_{1-\alpha}} \int_{-\infty}^{Z_{1-\alpha}} \Phi_C\big((z_{C1}, z_{C2})', r, p\big)\, dZ_{C1}\, dZ_{C2},$$

where $\Phi_C((z_{C1}, z_{C2})', r, p)$ is the density function of the bivariate normal distribution with mean 0 and variance covariance matrix $\Sigma_C$, and $Z_{1-\alpha}$ is the percentile of a standard normal which satisfies $P(X < Z_{1-\alpha}) = 1 - \alpha$.

Since both experimental arms share a portion of the control arm patients, when the control arm patients performed worse than expected due to randomness, the probability of declaring both experimental arms positive under null may be higher, compared to that in two independent two-arm studies ([11]; Collignon et al. 2019). Therefore, in addition to FWER, we would also like to investigate the probability of making two errors (i.e., declaring both experimental arms positive when neither is active) in this trial design. The probability of making two errors under the null hypothesis is:

$$P(\text{both tested positive} \mid \text{both inactive}) = \int_{Z_{1-\alpha}}^{\infty} \int_{Z_{1-\alpha}}^{\infty} \Phi_C\big((z_{C1}, z_{C2})', r, p\big)\, dZ_{C1}\, dZ_{C2} \tag{1}$$

Scenario 2 (hybrid): In this scenario, analysis of the initial experimental arm (Arm 1) would only include all its corresponding concurrent control patients (i.e., all the originally planned control patients for Arm 1 in the original design), while the later added experimental arm (Arm 2) would include all control patients in the comparison, including control patients enrolled prior to the enrollment start of Arm 2. This is a more practical scenario in reality, especially if Arm 1 and Arm 2 are two experimental arms of their own interests (for example, two experimental therapies with different mechanisms of action), and the only reason for including them in the same clinical trial is to gain operational efficiency and reduce the total number needed in the control arm by sharing part of the control arm patients. Since the Arm 1 data with its corresponding concurrent control sample size is large enough per the statistical analysis plan to detect the difference between Arm 1 and the control arm, this analysis strategy would allow the sponsor to analyze Arm 1 with control in a timely fashion. Since more control arm patients are included in the analysis for the later added experimental arm (Arm 2), intuitively, power for the experimental Arm 2 would increase. $Z_{H1}$ is the same as $Z_{C1}$. If all control data are used for Arm 2, the test statistic for Arm 2 would be:

$$Z_{H2} = \frac{\bar{X}_2 - \bar{X}_{0A}}{\sigma\sqrt{\frac{1}{n_{0A}} + \frac{1}{n}}},$$

where $(Z_{H1}, Z_{H2})' \sim BVN(0, \Sigma_H)$, and $\Sigma_H$ is a 2 × 2 matrix with diagonal entries equal to 1 and off-diagonal entries equal to $\rho_H$, with

$$\rho_H = \sqrt{\frac{R_C R_A}{(1 + R_C)(1 + R_A)}} \cdot \frac{rp}{R_C R_A}.$$

The FWER is

$$FWER_H = 1 - \int_{-\infty}^{Z_{1-\alpha}} \int_{-\infty}^{Z_{1-\alpha}} \Phi_H\big((z_{H1}, z_{H2})', r, p\big)\, dZ_{H1}\, dZ_{H2},$$

where $\Phi_H((z_{H1}, z_{H2})', r, p)$ is the density function of the bivariate normal distribution with mean 0 and variance covariance matrix $\Sigma_H$. The probability of making two errors under null hypothesis is:

$$P(\text{both tested positive} \mid \text{both inactive}) = \int_{Z_{1-\alpha}}^{\infty} \int_{Z_{1-\alpha}}^{\infty} \Phi_H\big((z_{H1}, z_{H2})', r, p\big)\, dZ_{H1}\, dZ_{H2} \tag{2}$$

Scenario 3 (all-control): In this scenario, we consider using data from both concurrent and non-concurrent control for the analyses of both experimental arms. The corresponding test statistic for comparing experimental treatment $i$, $i = 1, 2$, to control is given by

$$Z_{Ai} = \frac{\bar{X}_i - \bar{X}_{0A}}{\sigma\sqrt{\frac{1}{n_{0A}} + \frac{1}{n}}},$$

and $(Z_{A1}, Z_{A2})' \sim BVN(0, \Sigma_A)$, where 0 is the zero vector of length 2, $\Sigma_A$ is a 2 × 2 matrix with diagonal entries equal to 1 and off-diagonal entries equal to $\rho_A$, where $\rho_A = \frac{rp}{R_A(1 + R_A)}$. The FWER can then be written as:

$$FWER_A = 1 - \int_{-\infty}^{Z_{1-\alpha}} \int_{-\infty}^{Z_{1-\alpha}} \Phi_A\big((z_{A1}, z_{A2})', r, p\big)\, dZ_{A1}\, dZ_{A2},$$

where $\Phi_A((z_{A1}, z_{A2})', r, p)$ is the density function of the bivariate normal distribution with mean 0 and variance covariance matrix $\Sigma_A$. The probability of making two errors under null hypothesis is

$$P(\text{both tested positive} \mid \text{both inactive}) = \int_{Z_{1-\alpha}}^{\infty} \int_{Z_{1-\alpha}}^{\infty} \Phi_A\big((z_{A1}, z_{A2})', r, p\big)\, dZ_{A1}\, dZ_{A2} \tag{3}$$

Fig. 2. Correlation between two test statistics under three scenarios.
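
The quantities for the concurrent-control scenario (Scenario 1) can be evaluated numerically. The following is a minimal sketch, not the authors' code: the function names (`bvn_cdf`, `design`, `rho_concurrent`, `error_rates`) are hypothetical, and the bivariate normal probability is approximated with Simpson's rule using only the Python standard library. It covers the per-arm sample size and ratios from Section 2, the correlation $\rho_C$, the FWER, and the probability of two errors in Eq. (1). The other scenarios could reuse `error_rates` by passing their own correlation.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def bvn_cdf(h, k, rho, steps=2000):
    """P(Z1 <= h, Z2 <= k) for a standard bivariate normal with |rho| < 1.

    Conditions on Z1 = z and integrates phi(z) * Phi((k - rho*z) / sqrt(1 - rho^2))
    over [-8, h] with Simpson's rule (steps must be even).
    """
    lower = -8.0  # probability mass below -8 is negligible
    if h <= lower:
        return 0.0
    scale = sqrt(1.0 - rho * rho)
    width = (h - lower) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lower + i * width
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * STD.pdf(z) * STD.cdf((k - rho * z) / scale)
    return total * width / 3.0


def design(N, p, r):
    """Per-arm size n and the ratios R_C, R_A, assuming equal experimental arm sizes."""
    n = N / (4 - 2 * p + r * p)
    return n, 1 - p + r * p, 2 - 2 * p + r * p


def rho_concurrent(p, r):
    """Correlation between Z_C1 and Z_C2 when only concurrent control is used."""
    _, R_C, _ = design(1.0, p, r)
    return r * p / (R_C * (1 + R_C))


def error_rates(rho, alpha=0.025):
    """Return (FWER, P(both positive)) under the global null, each test at level alpha."""
    z = STD.inv_cdf(1 - alpha)        # Z_{1-alpha}
    fwer = 1 - bvn_cdf(z, z, rho)     # at least one false positive
    both = bvn_cdf(-z, -z, rho)       # P(Z1 > z, Z2 > z), by symmetry
    return fwer, both


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        rho = rho_concurrent(p, r=1.0)
        fwer, both = error_rates(rho)
        print(f"p={p:.1f}  rho_C={rho:.3f}  FWER={fwer:.4f}  P(both)={both:.5f}")
```

With $p = 0$ the correlation is zero and the two quantities reduce to $1 - (1 - \alpha)^2$ and $\alpha^2$, the values for two independent two-arm trials.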

Fig. 2 shows the correlations $\rho_C$, $\rho_H$, and $\rho_A$ over the overlap $p$ for $r$ = 0.5, 1, 1.5, and 2 under three scenarios. It is clear that correlation increases as overlap increases; and as more control data is used, correlation decreases.

Assume each experimental arm is tested at $\alpha$ = 2.5% (one-sided), which is typical for phase 3 studies. Fig. 3 shows the FWER of these three scenarios of control data use. The black line represents the FWER of two independent two-arm trials, which is a fixed value of $1 - (1 - \alpha)^2$; and the four dotted lines are for $r$ = 0.5, 1, 1.5, and 2, respectively, for the proposed study design. The FWER is always smaller in the proposed umbrella study compared to the traditional setting of two independent two-arm trials due to the positive correlation between the two test statistics from shared control.

As the overlap parameter $p$ increases, positive correlation increases and the FWER would decrease. When more control data is used, FWER becomes similar for different values of randomization ratio $r$. For more practical values of $r$ (i.e., from 1 to 2), when overlap is minimal or moderate (say <0.6), FWER is similar for different values of randomization ratio $r$. When there is substantial overlap (say >0.6), FWER decreases as $r$ decreases. The patterns of FWER are similar in all three scenarios on the use of control. The more non-concurrent control data included in the analysis, the smaller the correlation between the two test statistics. When two separate two-arm trials are conducted, the correlation between two test statistics is zero. As a result, FWER is the largest in the two independent two-arm trial case, followed by the all control scenario, followed by the hybrid scenario, and is the smallest in the concurrent control only scenario, assuming the same randomization ratio $r$ and an overlap parameter $p$ between 0 and 1. When $p$ is 1, the FWER in all three control scenarios is the same. When $p$ is 0, the FWER in the non-concurrent control scenario is the same as that in the two independent two-arm trials case.

Fig. 3. Probabilities of family-wise error rate under three scenarios.

We summarize and compare the probability of making two errors under null hypothesis together with the scenarios where FWER control is required in Section 3.2.

3.2. With FWER control

When the two experimental arms are for the same or related claims, multiplicity adjustment of the two experimental arms comparison is needed and the FWER must be maintained at $\alpha$ (one-sided). In this section, we evaluate the situation where FWER is controlled at $\alpha$ (one-sided). We have FWER as:

$$1 - \int_{-\infty}^{-c_m} \int_{-\infty}^{-c_m} \Phi_m\big((z_{m1}, z_{m2})', r, p\big)\, dZ_{m1}\, dZ_{m2} = \alpha \tag{4}$$

where $m = C, H, A$ represents concurrent, hybrid, and all control, respectively. The critical value $c_m$ can be calculated using Eq. (4). By replacing the critical value $Z_\alpha$ with $c_m$, $m = C, H, A$, in Eqs. (1), (2), and (3), the probabilities of making two errors with multiplicity adjusted can also be calculated.

Fig. 4 shows the probability of making two errors under the null hypothesis. The first row displays values of $r$ = 0.5, 1, 1.5, and 2 under all three control scenarios without FWER control (Section 3.1). The second row displays the same scenarios with FWER control. The solid line in each plot is the probability of making two errors under null with two separate two-arm trials. Those probabilities are the same when $p = 0$ without adjusting for multiplicity. Although the absolute values of the probability of making two errors are small in all scenarios, the probability increases as $p$ increases; and it is larger when FWER control is not required, compared to that when FWER control is required and in two separate two-arm trials, each with its own control. When adjusting for multiplicity to control FWER, the probability of making two errors under the null hypothesis is smaller compared to the case of two independent two-arm studies, each with its own control, when overlap is small. The probability of making two errors under global null is slightly larger than that when conducting two separate two-arm trials when overlap is large. This indicates that multiplicity adjustment has largely kept the probability of making two errors in check. Moreover, the probability of making two errors under global null decreases as more control arm data are used, i.e., it is the largest under the concurrent control scenario and the smallest under the all control scenario.

Fig. 4. Probabilities of making two errors under three scenarios without and with adjusting multiplicity.

4. Power and optimal allocation

4.1. Overall power without FWER control

As introduced in the earlier section, we define the overall power as the probability of at least one positive experimental arm when both experimental arms are active (assuming the same target treatment effect $\Delta$).

Scenario 1: When both treatment arms are compared with the concurrent randomized control, without controlling the FWER, the overall study power $= 1 - P(Z_{C1}^* < Z_{\beta 1}^*, Z_{C2}^* < Z_{\beta 2}^*)$, where $Z_{Ci}^* = Z_{Ci} - \frac{\Delta}{\sqrt{\frac{1}{n_{0C}} + \frac{1}{n}}}$, $Z_{\beta i}^* = Z_\beta^*$ for $i = 1, 2$ and satisfies

$$Z_\alpha + Z_\beta^* = \sqrt{\frac{6R_C}{(4 - 2p + rp)(1 + R_C)}}\,\big(Z_\alpha + Z_\beta\big) \tag{5}$$

More explicitly, the overall study power can be written as
$$\text{Overall Power} = 1 - \int_{-\infty}^{Z_\beta^*} \int_{-\infty}^{Z_\beta^*} \Phi_C\big((z_{C1}^*, z_{C2}^*)', r, p\big)\, dZ_{C1}^*\, dZ_{C2}^* \tag{6}$$

Scenario 2: Under the hybrid case, the overall power $= 1 - P(Z_{H1}^* < Z_{\beta 1}^*, Z_{H2}^* < Z_{\beta 2}^*)$, where $Z_{H1}^*$ is the same as $Z_{C1}^*$, and $Z_{H2}^*$ is centralized by $\frac{\Delta}{\sqrt{\frac{1}{n_{0A}} + \frac{1}{n}}}$. $Z_{\beta 1}^* = Z_\beta^*$ and $Z_{\beta 2}^*$ satisfies

$$Z_\alpha + Z_{\beta 2}^* = \sqrt{\frac{6R_A}{(4 - 2p + rp)(1 + R_A)}}\,\big(Z_\alpha + Z_\beta\big) \tag{7}$$

With these two definitions, the overall power is

$$\text{Overall Power} = 1 - \int_{-\infty}^{Z_\beta^*} \int_{-\infty}^{Z_{\beta 2}^*} \Phi_H\big((z_{H1}^*, z_{H2}^*)', r, p\big)\, dZ_{H1}^*\, dZ_{H2}^* \tag{8}$$

Scenario 3: When all control data are used for both treatment arms, the overall power $= 1 - P(Z_{A1}^* < Z_{\beta 2}^*, Z_{A2}^* < Z_{\beta 2}^*)$, where $Z_{A1}^*$ and $Z_{A2}^*$ are centralized in the same way as $Z_{H2}^*$. More explicitly,

$$\text{Overall Power} = 1 - \int_{-\infty}^{Z_{\beta 2}^*} \int_{-\infty}^{Z_{\beta 2}^*} \Phi_A\big((z_{A1}^*, z_{A2}^*)', r, p\big)\, dZ_{A1}^*\, dZ_{A2}^* \tag{9}$$

4.2. Overall power with FWER control

The overall power calculation under multiplicity adjustment is similar to that without multiplicity adjustment by replacing $Z_\alpha$ with $c_m$ for $m = C, H$, and $A$ in Eqs. (5) and (7) for obtaining $Z_\beta^*$ and $Z_{\beta 2}^*$ for the concurrent, hybrid, and all control case, respectively.

The first row of Fig. 5 shows the overall study power when $\beta$ = 20% and $r$ = 0.5, 1, 1.5, and 2 under all three control data use scenarios without multiplicity adjustment. The second row shows the same parameters for scenarios with multiplicity adjustment. The overall power is similar when $p$ is small, and the divergence becomes conspicuous when overlap is larger than 0.6. For $r$ = 0.5, since the randomization ratio is less than half for the control arm compared to the two separate 2-arm trials, the performance is not as good for all scenarios. The umbrella design is more powerful when $r$ is in a practical range (i.e., from 1 to 2) and there is no need to control FWER. However, the design is less powerful if controlling FWER at the $\alpha$ level except for the all control case when $p$ is small. Overall, the conservatism of multiplicity adjustment makes the design unworthy to consider in practice.

4.3. Optimal allocation ratio r for overlapping enrollment period

Assuming the portion of overlap $p$ is given, the optimal allocation for the overlapping enrollment period is obtained by maximizing the overall study power for a fixed total sample size $N$. A straightforward approach is to minimize the sum of the variances of $\bar{X}_i - \bar{X}_0$, without considering their correlation or multiplicity adjustment, subject to the constraint of a fixed total sample size. The minimizer can be calculated directly for the concurrent and all control cases. This approach is not applicable for the hybrid case because of the complexity and asymmetry in its structure of test statistics. The optimal allocation is $r^* = 1 + \frac{\sqrt{3 - p} - 1}{p}$ for $p \in (0, 1]$ under the concurrent control case. For example, when $p = 0.8$, the optimal allocation ratio $r^* = 1.6$. When $p = 0$, the second term of $r^*$ should be discarded, and the optimal $r$ is 1 for separate two-arm trials. When $p = 1$, the optimal allocation is $\sqrt{2}$, which is the same as Dunnett's suggestion under a 3-arm trial with no delayed entry. A more sophisticated approach is to incorporate the correlation between test statistics and multiplicity adjustment in the optimal allocation ratio calculation. There is no closed form derivation of the optimal allocation ratio if we also consider the correlation between test statistics and the multiplicity adjustment, although we can evaluate the optimal allocation ratio via simulations.

We evaluate the optimal allocation ratio via the closed form derivation $r^* = 1 + \frac{\sqrt{3 - p} - 1}{p}$ as well as via simulations. Results from both approaches show that the optimal allocation ratio of the overlap period $r$ is robust to the choices of $N$ and $\beta$. Fig. 6 shows the overall power when $\beta$ = 30%, 20%, and 10% under the concurrent control case without and with multiplicity adjustment, respectively. The first row represents the results without multiplicity adjustment; and the second row shows the results where FWER is controlled at $\alpha$. The pattern is the same regardless of the choices of $\beta$, $p$, and whether multiplicity is adjusted. When $p$ is relatively large, the optimal $r$ for both with and without FWER control is around 2. For example, when $p = 0.8$, the optimal $r$ is around 1.85. When $p$ is small, say $p = 0.2$, the curves of the overall power are fairly flat, which indicates different randomization ratios do not impact the overall power much for small $p$ values. Choosing $r = 1$ for simplicity will not introduce a large power loss. Though the figure shows $r$ values from 0 to over 5 to give a complete picture of the pattern of $r$ for different $p$ values, in practice, an $r$ value less than 1 has a much lower overall power, and an $r$ value over 2 would randomize more patients to the control arm than to the experimental arms compared to two separate two-arm studies. Thus, neither case is ideal.
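
The closed-form allocation ratio and the overall power for the concurrent-control case can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: all function names are hypothetical, $Z_q$ is taken to be the lower-tail $q$-quantile of the standard normal (so $Z_\alpha$ and $Z_\beta$ are negative), the bivariate normal probability is approximated with Simpson's rule, and under FWER control only the $Z_\alpha$ on the left-hand side of Eq. (5) is replaced by $c_C$, since the right-hand side comes from the fixed total sample size $N$.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def bvn_cdf(h, k, rho, steps=400):
    """P(Z1 <= h, Z2 <= k), standard bivariate normal, |rho| < 1 (Simpson's rule)."""
    lower = -8.0
    if h <= lower:
        return 0.0
    scale = sqrt(1.0 - rho * rho)
    width = (h - lower) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lower + i * width
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * STD.pdf(z) * STD.cdf((k - rho * z) / scale)
    return total * width / 3.0


def efficiency(p, r):
    """6*R_C / ((4 - 2p + rp)(1 + R_C)), the factor under the square root in Eq. (5)."""
    R_C = 1 - p + r * p
    return 6 * R_C / ((4 - 2 * p + r * p) * (1 + R_C))


def r_closed_form(p):
    """Variance-minimizing allocation ratio for concurrent control, 0 < p <= 1."""
    return 1 + (sqrt(3 - p) - 1) / p


def critical_value(rho, alpha=0.025, tol=1e-8):
    """Solve Eq. (4) for c by bisection; c is negative, like Z_alpha."""
    lo, hi = STD.inv_cdf(1 - alpha), STD.inv_cdf(1 - alpha / 2)  # unadjusted, Bonferroni
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 1 - bvn_cdf(mid, mid, rho) > alpha:
            lo = mid   # FWER still too high: need a stricter cutoff
        else:
            hi = mid
    return -(lo + hi) / 2


def overall_power(p, r, alpha=0.025, beta=0.2, fwer_control=False):
    """P(at least one arm positive | both active), concurrent control, Eqs. (5)-(6)."""
    R_C = 1 - p + r * p
    rho = r * p / (R_C * (1 + R_C))
    z_a, z_b = STD.inv_cdf(alpha), STD.inv_cdf(beta)
    crit = critical_value(rho, alpha) if fwer_control else z_a
    z_b_star = sqrt(efficiency(p, r)) * (z_a + z_b) - crit
    return 1 - bvn_cdf(z_b_star, z_b_star, rho)


def best_r(p, fwer_control=False):
    """Grid search over r in [0.5, 3.0] for the ratio maximizing overall power."""
    grid = [0.5 + 0.05 * i for i in range(51)]
    return max(grid, key=lambda r: overall_power(p, r, fwer_control=fwer_control))
```

Calling `r_closed_form(0.8)` reproduces the closed-form value of about 1.6 quoted above, while `best_r(0.8)` searches for the power-maximizing ratio, which also accounts for the correlation between the two test statistics and therefore need not coincide with the closed-form value.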

Fig. 5. Overall power at β = 20% under three scenarios without and with adjusting multiplicity.

5. Discussion

The three-arm umbrella trial with an opportunity to add a second experimental arm mid-trial has the benefit of saving patient resources by potentially allocating fewer patients to the control arm and reducing the total sample size. The study is also more appealing in that it reduces enrollment competition compared to conducting two separate two-arm trials. Statistical properties, such as type I error, overall power, and optimal allocation, have been explored with or without multiplicity adjustment. The work can be extended to situations where a third or further arms are allowed to be added mid-trial; the basic concepts will not change much, but the derivation will be more complicated. A simple extension would be to assume the same sample sizes and overlaps for all arms. However, allowing more arms to be added will bring more complexity to trial design, which may not be practical for an individual company.

Whether or not to control the FWER at the trial level depends on the relationship between the experimental arms. The main argument for not adjusting for multiplicity is that each experimental arm is for its own claim. However, if the experimental arms are for related claims, for example, two experimental arms representing two different routes of
administration of the same drug, multiplicity adjustment is required. Both approaches were considered in this article.

Fig. 6. Overall power comparison at β = 30%, 20%, and 10% under concurrent case without and with adjusting multiplicity.

In the case where FWER control is not required, the FWER is smaller in the 3-arm umbrella trial than that of the two separate two-arm trials. The probability of making two false positive decisions under global null is larger than that of the two separate two-arm studies though the absolute value is small. If FWER is required to be controlled at α, the probability of making two errors under global null is similar to that with two separate two-arm studies. The overall power increases as the enrollment of the second treatment arm starts earlier, i.e., with larger p, when only concurrent control data are used in the analysis, under the assumption of fixed total sample size. The trend is reversed when non-concurrent control is included in the analysis. Moreover, adjusting for multiplicity and controlling FWER at the α level makes the design more conservative and generally unworthy in terms of power.

Assuming a fixed total sample size, the optimal allocation for the overlapping enrollment period is determined by maximizing the probability of detecting a treatment effect in at least one of the two experimental arms. Apparently, the optimal allocation is robust to the total sample size and the type II error with or without multiplicity adjustment. When r is in a practical range, i.e., from 1 to 2, it is important to emphasize that adding a treatment arm midway through a two-arm trial is better than running two separate trials in terms of patient resource, time, and logistics.

Throughout this article, the standardized test statistics are assumed to follow a normal distribution with a known population variance. When the primary endpoint is a time-to-event endpoint, modifications of the derivations are needed. For example, instead of evaluating the sample mean, the log hazard ratio follows a normal distribution. If we still assume the total sample size N is given, the overall study power will be

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests.

References

[1] B.M. Alexander, S. Ba, M.S. Berger, et al., Adaptive global innovative learning environment for glioblastoma: GBM AGILE, Clin. Cancer Res. 24 (4) (2019) 737–743.
[2] X. Bai, Q. Deng, D. Liu, Multiplicity issues for platform trials with a shared control arm, J. Biopharm. Stat. (2020), https://doi.org/10.1080/10543406.2020.1821703.
[3] A.D. Barker, et al., I-SPY 2: an adaptive breast cancer trial design in the setting of neoadjuvant chemotherapy, Clin. Pharmacol. & Ther. 86 (1) (2009) 97–100.
[4] M. Bennett, A.P. Mander, Designs for adding a treatment arm to an ongoing clinical trial, Trials (2020), https://doi.org/10.1186/s13063-020-4073-1.
[5] F. Bretz, F. Koenig, Commentary on Parker and Weir, Clinical Trials 17 (5) (2020) 567–569.
[6] D.R. Cohen, S. Todd, W.M. Gregory, et al., Adding a treatment arm to an ongoing clinical trial: a review of methodology and practice, Trials 16 (1) (2015) 1–9.
[7] O. Collignon, C. Gartner, A.B. Haidich, et al., Current statistical considerations and regulatory perspectives on the planning of confirmatory basket, umbrella, and platform trials, Clinical Pharmacology & Therapeutics 107 (5) (2020) 1059–1067.
[9] B. Freidlin, E.L. Korn, R. Gray, et al., Multi-arm clinical trials of new agents: some design considerations, Statistics in Clin. Cancer Res. (2008), https://doi.org/10.1158/1078-0432.CCR-08-0328.
[10] M.J. Grayling, J.M. Wason, A.P. Mander, An optimized multi-arm multi-stage clinical trial design for unknown variance, Contemporary Clin. Trials 67 (2018) 116–120.
[11] D.R. Howard, J.M. Brown, S. Todd, et al., Recommendations on multiple testing adjustment in multi-arm trials with a shared control group, Stat. Methods Med. Res. 27 (5) (2018) 1513–1530.
[12] K.M. Lee, J. Wason, Including non-concurrent control patients in the analysis of
driven by the number of events rather than the total sample size of pa­ platform trials: is it worth it? BMC Med. Res. Methodol. 20 (1) (2020) 1–2.
tients. On the other hand, if all control case is considered, number of [15] R.A. Parker, C.J. Weir, Non-adjustment for multiple testing in multi-arm trials of
events in Arm 1 would be more than that in the concurrent control only distinct treatments: rationale and justification, Clinical Trials 17 (5) (2020)
562–566.
case due to longer follow up even with the same sample size. Further [16] M.A. Proschan, D.A. Follmann, Multiple comparisons with control in a single
research will be done considering time-to-event endpoints. Moreover, experiment versus separate experiments: why do we feel differently? Am. Stat. 49
the population variance is unknown. Even though the normality (2) (1995) 144–149.
[18] N. Stallard, S. Todd, D. Parashar, et al., On the need to adjust for multiplicity in
assumption is adequate especially in a confirmatory trial with a large
confirmatory clinical trials with master protocols, Ann. Oncol. 30 (4) (2019) 506.
sample size, inaccurate approximation of the population variance will
affect the operating characteristics of the trial. While one may adapt the
methods discussed by Grayling et al. to handle unknow variance [10],
the general conclusion is not expected to change much.


[19] F. Van Leth, et al., Comparison of first-line antiretroviral therapy with regimens including nevirapine, efavirenz, or both drugs, plus stavudine and lamivudine: a randomised open-label trial, the 2NN study, Lancet 363 (9417) (2004) 1253–1263.
[22] J. Woodcock, L.M. LaVange, Master protocols to study multiple therapies, multiple diseases, or both, N. Engl. J. Med. 377 (1) (2017) 62–70.
[24] M. Elias Laurin Meyer, P. Mesenbrink, C. Dunger-Baldauf, H.J. Fulle, E. Glimm, Y. Li, M. Posch, F. Konig, The Evolution of Master Protocol Clinical Trials Designs: A Systematic Literature Review, Clinical Therapeutics 42 (7) (2020) 1330–1360, https://doi.org/10.1016/j.clinthera.2020.05.010.
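
Supplementary numerical sketch. The concluding discussion states that, without multiplicity adjustment, a trial with a shared control has a smaller FWER under the global null than two separate two-arm trials, but a larger probability of two false positives. The Python sketch below illustrates this qualitative pattern only; it is not the article's derivation. It assumes two one-sided tests at level alpha whose standardized statistics are bivariate normal with a correlation rho supplied by the user. The value rho = 0.5 is the textbook case of equal arm sizes with a fully shared control; the article's design with a late-added arm would generally yield a different value. The function names and the Simpson integration grid are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def prob_neither_rejects(alpha: float, rho: float, steps: int = 2000) -> float:
    """P(Z1 < c, Z2 < c) for standard normal Z1, Z2 with correlation rho.

    Uses the one-factor form Z_i = sqrt(rho) * X + sqrt(1 - rho) * E_i and
    integrates over X with Simpson's rule (steps must be even).
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    c = STD_NORMAL.inv_cdf(1.0 - alpha)  # one-sided critical value
    lo, hi = -8.0, 8.0                   # normal tails beyond are negligible
    h = (hi - lo) / steps

    def integrand(x: float) -> float:
        inner = STD_NORMAL.cdf((c - sqrt(rho) * x) / sqrt(1.0 - rho))
        return STD_NORMAL.pdf(x) * inner * inner

    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    return total * h / 3.0


def error_rates(alpha: float, rho: float) -> tuple[float, float]:
    """Return (P(at least one false positive), P(two false positives))."""
    neither = prob_neither_rejects(alpha, rho)
    fwer = 1.0 - neither
    both = 2.0 * alpha - 1.0 + neither  # inclusion-exclusion
    return fwer, both


if __name__ == "__main__":
    alpha = 0.025
    print("separate trials (rho = 0):  ", error_rates(alpha, 0.0))
    print("shared control (rho = 0.5): ", error_rates(alpha, 0.5))
```

With rho = 0 the sketch reproduces the independent-trials values 1 - (1 - alpha)^2 and alpha^2; any positive rho lowers the first quantity and raises the second, which matches the direction described in the discussion.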
