Sample Size Calculators For Planning Stepped-Wedge Cluster Randomized Trials - A Review and Comparison
Education Corner
Abstract
Recent years have seen a surge of interest in stepped-wedge cluster randomized trials
(SW-CRTs). SW-CRTs include several design variations and methodology is rapidly
developing. Accordingly, a variety of power and sample size calculation software for
SW-CRTs has been developed. However, each calculator may support only a selected set
of design features and may not be appropriate for all scenarios. Currently, there is no
resource to assist researchers in selecting the most appropriate calculator for planning
their trials. In this paper, we review and classify 18 existing calculators that can be implemented
in major platforms, such as R, SAS, Stata, Microsoft Excel, PASS and nQuery.
After reviewing the main sample size considerations for SW-CRTs, we summarize the
features supported by the available calculators, including the types of designs, outcomes,
correlation structures and treatment effects; whether incomplete designs,
cluster-size variation or secular trends are accommodated; and the analytical approach
used. We then discuss in more detail four main calculators and identify their strengths
and limitations. We illustrate how to use these four calculators to compute power for two
real SW-CRTs with a continuous and binary outcome and compare the results. We show
that the choice of calculator can make a substantial difference in the calculated power
and explain these differences. Finally, we make recommendations for implementing
sample size or power calculations using the available calculators. An R Shiny app is available
for users to select the calculator that meets their requirements
(https://douyang.shinyapps.io/swcrtcalculator/).
Key words: Stepped-wedge design, power, sample size, software, correlation structures
© The Author(s) 2022; all rights reserved. Published by Oxford University Press on behalf of the International Epidemiological Association
International Journal of Epidemiology, 2022, Vol. 51, No. 6 2001
Key Messages
• Stepped-wedge cluster randomized trials (SW-CRTs) are an attractive alternative to parallel-arm cluster randomized
trials in some scenarios and have been increasing in popularity. A variety of calculators that can help researchers
calculate power and sample size for SW-CRTs have been developed.
• Available calculators differ in the features they can support, including the types of designs, outcomes, correlation
structures and treatment effects that can be specified; ability to accommodate incomplete designs, cluster-size
variation or secular trends; and the type of analytical approach used; available calculators can also differ substantially
in the calculated power in some scenarios.
• We summarize and compare the features of 18 available calculators to calculate power and sample size for
SW-CRTs and describe a Shiny app to help researchers identify the calculators that support the desired features for a
particular planned trial.
developed by Grantham et al.12 that includes an R Shiny web app to determine power for some scenarios, assuming individual measurement times are evenly spaced within each interval. In practice, cross-sectional designs are often considered to be approximations of continuous recruitment short exposure designs and, depending on the assumed correlation structure, sample size calculators based on discrete sampling may yield identical calculations.
Incomplete designs
Standard SW-CRTs collect data in each cluster period.
one or more transition periods.14 Figure 1c illustrates a design in which each cluster has the same number of control and intervention periods, but not all clusters enter and exit at the same time. This type of design is called a staircase design.15
Correlation structures and statistical models for SW-CRTs
In a SW-CRT, the intervention effect is partially confounded with the time effect (i.e. secular trend) by design; therefore, the analytical model must always account for
Figure 1 (a) Stepped-wedge design with five sequences and six periods (steps). (b) Stepped-wedge design with transition periods. (c) ‘Staircase’
stepped-wedge design.
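The three layouts named in the Figure 1 caption can be written down as design matrices. The sketch below is illustrative only: the helper names are hypothetical, the exact cell layout of Figure 1b and 1c is not reproduced in the text and is assumed here, and a transition period is assumed to be a cluster period immediately after crossover in which no data are collected (shown as None). Rows are sequences, columns are periods, 0 is control and 1 is intervention.

```python
def standard_sw(n_sequences, n_periods=None):
    """Standard stepped wedge: sequence s (1-based) crosses over at period s + 1."""
    n_periods = n_periods or n_sequences + 1
    return [[1 if period > s else 0 for period in range(1, n_periods + 1)]
            for s in range(1, n_sequences + 1)]


def with_transition(design, n_transition=1):
    """Blank out (None) the first n_transition intervention periods of each sequence."""
    result = []
    for row in design:
        new_row = list(row)
        if 1 in new_row:
            start = new_row.index(1)
            for j in range(start, min(start + n_transition, len(new_row))):
                new_row[j] = None
        result.append(new_row)
    return result


def staircase(design, k=1):
    """Keep only k control and k intervention periods around each crossover."""
    result = []
    for row in design:
        first = row.index(1)  # first intervention period (0-based)
        result.append([x if first - k <= j < first + k else None
                       for j, x in enumerate(row)])
    return result


# Five sequences and six periods, as in the Figure 1a caption.
design_a = standard_sw(5)
# One extra period so that every sequence still has an observed intervention period (assumption).
design_b = with_transition(standard_sw(5, n_periods=7))
design_c = staircase(design_a, k=1)

for name, design in [("standard", design_a), ("transition", design_b), ("staircase", design_c)]:
    print(name)
    for row in design:
        print(" ".join("." if x is None else str(x) for x in row))
```

Empty cells are kept as None rather than dropped so that the matrix retains its sequence-by-period shape, which is how an incomplete design would typically be passed to a calculator that accepts a user-specified design matrix.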
between-cluster variance and the sum of the between-cluster and within-cluster variances.16 In multi-period CRTs, such as SW-CRTs, correlation structures may be more complex as outcomes are measured in different periods. Nevertheless, the initial statistical model for SW-CRTs developed by Hussey and Hughes was a random intercepts model that assumed a single exchangeable correlation, i.e. the correlation between any two individuals from the same cluster is the same regardless of their distance apart in time.2
Building upon earlier work on multi-period parallel-arm CRTs,17 Hooper et al.18 and separately Girling and
ICC) and is not identical to the IAC referred to above in the context of a mixed-effects model. A comprehensive overview of statistical models available for SW-CRTs can be found in Li et al.24
In addition to the above correlation structures that are based on time (period), correlation structures can also be allowed to depend on exposure. This is referred to as cluster-treatment effect heterogeneity and is accomplished by including a random cluster by treatment effect in the analytical model.25,26
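Because calculators differ in whether they expect ICCs or variance components, it can help to convert between the two. The following is a minimal sketch for the single exchangeable correlation of the random intercepts model, where the ICC is the between-cluster variance divided by the sum of the between-cluster and within-cluster variances. The function names and the numerical values are hypothetical.

```python
def icc_from_variances(between_var, within_var):
    """ICC = between-cluster variance / (between-cluster + within-cluster variance)."""
    return between_var / (between_var + within_var)


def variances_from_icc(icc, total_var):
    """Split a total outcome variance into (between, within) components for a given ICC."""
    return icc * total_var, (1.0 - icc) * total_var


# Hypothetical values: total variance 25 (SD of 5) and an ICC of 0.05.
between, within = variances_from_icc(0.05, 25.0)
print(between, within, icc_from_variances(between, within))
```

Under this model the same ICC applies to any two individuals in a cluster, whichever periods they are measured in; the block and nested exchangeable structures relax exactly that assumption.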
Different ways to specify treatment effects
non-continuous outcomes theoretically depends on the assumed treatment effect as well as on the size of any time effects, whereas under the normal approximation it depends on neither. Several recent methods have shifted away from normal approximation approaches towards methods based on maximum-likelihood estimation (MLE)28 of the generalized linear mixed-model (GLMM) or marginal models using GEE.23,29 These methods allow users to specify odds ratios and risk ratios directly in their sample size calculation procedures. Moreover, they allow the variance to depend on the assumed treatment effect and thus require the effect of secular trends to be incorporated in the power calculation. Although the analytical model should always include time effects to account for potential secular trends, power calculations for continuous outcomes are not affected by the magnitude of the secular trends.
Table 1 summarizes these and other features that researchers need to consider when designing a SW-CRT.
Summary of available sample size and power calculators
To date, 18 power and sample size calculators for SW-CRTs have been developed for implementation in R, online via an R Shiny app, or in SAS, Stata, Excel, PASS31 and nQuery.32 A standalone calculator maintained by the National Institutes of Health has also recently become available.33 A summary of the functionality of the calculators with respect to the major features described in Table 1 is presented in Supplementary File 1, Tables S1–S6 (available as Supplementary data at IJE online) organized by software platform. Each calculator is also classified based on how correlation structures need to be specified (i.e. in
Table 1 Common features that users need to consider or specify when designing a stepped-wedge cluster randomized trial
ICC, intra-cluster correlation coefficient; CAC, Cluster Autocorrelation Coefficient; SW-CRT, stepped-wedge cluster randomized trial; CRT, cluster random-
ized trial.
terms of coefficient of variation (CV) of outcomes, ICCs or variance components of random effects) and whether the calculator allows for unequal cluster-period sizes, unequal allocation of clusters to sequences and whether the user can specify their own design matrix. Note that any calculator that can support a user-specified design matrix can typically also be used for other multiple-period designs (such as repeated parallel-arm or multiple-period cluster crossovers) as the sample size methodology underlying all these designs is similar. We also indicate whether cluster-treatment effect heterogeneity, continuous time effects (as opposed to only categorical), subclusters (i.e. more than
Important features and strengths
SHINYCRT automatically displays power or sample size curves. Users can either provide sample sizes per cluster period to calculate the power or specify a target power to obtain the required sample sizes. In addition to the specified ‘base’ ICC values, users can conduct a sensitivity analysis by specifying the upper and lower bounds of the wp-ICC. The plot will automatically include results for lower and higher CAC values (80% and 120% of the specified CAC) as a sensitivity analysis.
Users can upload their own design matrices to accommodate unequal numbers of clusters per sequence. Uploading a design matrix also activates additional features including an option for accommodating cluster-treatment heterogeneity. The design matrix may contain empty cells to allow for incomplete designs or cells with fractions to accommodate delayed or linear time-on-treatment effects (although the design matrix will be displayed as integers). This calculator can also accommodate unequal cluster sizes by specifying a CV for cluster-size variation. Instead of a z-test, users can request a t-test with degrees of freedom equal to the number of cluster periods minus the number of periods minus one, although the t-test is not available for exponential decay or when uploading a user-specified design matrix. An attractive
Overview
• A calculator running in R, RShiny and SAS that calculates power and sample size for SW-CRTs using GEE or GLMM with MLE.34
• Type of design and outcome: cross-sectional and closed cohort designs with continuous and binary outcomes under identity, log and logit link functions.
• Correlation structures: block or nested exchangeable correlation structures (including exchangeable as a special case); wp-ICC and bp-ICC must be specified for cross-sectional designs (specified as equal to obtain exchangeable correlation); for cohort designs, wi-ICC must also be specified.
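The relation between the wp-ICC, bp-ICC and CAC inputs mentioned above can be illustrated with a small sketch for a cross-sectional design. It uses the standard definition of the CAC as the ratio of the bp-ICC to the wp-ICC, so that setting the CAC to 1 (wp-ICC equal to bp-ICC) recovers the exchangeable structure. The function names, the numerical values and the capping of the CAC at 1 are assumptions made for illustration and are not taken from any calculator's implementation.

```python
def nested_exchangeable_corr(n_periods, n_per_period, wp_icc, cac):
    """Correlation matrix for one cluster in a cross-sectional design.

    Individuals are ordered period by period. Two different individuals have
    correlation wp_icc if measured in the same period and
    bp_icc = cac * wp_icc if measured in different periods.
    """
    bp_icc = cac * wp_icc
    size = n_periods * n_per_period
    corr = [[0.0] * size for _ in range(size)]
    for a in range(size):
        for b in range(size):
            if a == b:
                corr[a][b] = 1.0
            elif a // n_per_period == b // n_per_period:
                corr[a][b] = wp_icc
            else:
                corr[a][b] = bp_icc
    return corr


def cac_sensitivity(cac, lower=0.8, upper=1.2):
    """CAC values at 80%, 100% and 120% of the specified CAC, capped at 1 (assumption)."""
    return [min(1.0, factor * cac) for factor in (lower, 1.0, upper)]


# Hypothetical inputs: wp-ICC of 0.05 and CAC of 0.8, two periods of two individuals each.
for cac in cac_sensitivity(0.8):
    matrix = nested_exchangeable_corr(2, 2, wp_icc=0.05, cac=cac)
    print(round(cac, 2), [round(x, 3) for x in matrix[0]])
```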
Important features and strengths
The most significant feature of this calculator is the availability of GEE and MLE to obtain more accurate sample size estimates for binary outcomes. When the research requires relative measures, the implementation of log and logit link functions allows users to specify relative treatment effects directly. When assuming no secular trend, users can specify the target difference by either (i) specifying the expected response in the control condition at the beginning of the study and the expected response in the intervention condition at the end or (ii) replacing the expected response in the intervention condition at the end
Overview
• A calculator running in R and RShiny app that calculates power and sample size for SW-CRTs under a linear mixed-effects model.35
• Type of design and outcome: cross-sectional designs with continuous and binary outcomes under an identity link function. For non-continuous outcomes, a new method that includes log and logit link functions and that can provide more accurate estimates than MLE even with a small number of clusters is being planned as a future addition.36
• Correlation structure: exchangeable and nested exchangeable correlation structures. Users can specify the wp-ICC and CAC or standard deviations for random intercepts, random treatment effects across clusters and random cluster by time effects; the correlation between the random intercepts and random treatment effects can also be specified (most other calculators assume independence).
CRTFASTGEEPWR by Zhang and Preisser
Overview
• An SAS macro that calculates power for SW-CRT based on GEE analysis of a marginal model.37,38
• Type of design and outcome: cross-sectional or closed cohort designs with continuous, binary and count outcomes and identity, log and logit link functions.
• Correlation structure: nested or block exchangeable (including exchangeable as a special case) and exponential or proportional decay. Under proportional decay, the calculator requires the bp-ICC and wi-ICC to decay at the same rate.
Important features and strengths
Users can specify either time-averaged or ‘incremental’ intervention effects, the latter using fractional treatment values for linear time-on-treatment effects. Additionally, time effects can be specified as either categorical or linear (continuous). Both intervention and time effects are specified as
accommodate special features, e.g. designs with subclusters40; a calculator is also available to determine allocation-based power.41,42 The choice of calculators that can deal with open cohort designs is limited10,43; however, a reviewer pointed out that by multiplying IAC by one minus the churn rate (the rate at which participants leave the trial), calculators that support closed cohort designs can also be used for open cohort designs. This observation will hold for continuous outcomes under a constant individual-level correlation. Conversely, calculators for open cohort designs can also support closed cohort and cross-sectional designs by setting the churn rate to 0 or 1.
school girls aged 10–16 years in Melbourne, Victoria, Australia.44 The trial was designed with three sequences and four periods with two clusters per sequence (a total of six clusters) and 10 participants per cluster, i.e. a total sample size of 60 participants. There were no transition periods. The primary outcome was continuous: a measure of self-esteem on a scale from 0 to 40. The sample size parameter values including ICC estimates under the three different correlation structures from the CLOUDbank data set, which is a repository of ICC values under different correlation structures, are summarized in Table 2.46 Due to the small number of clusters, we illustrate power with both a z-test and a
contradict an earlier statement that the magnitude of time effects is irrelevant for continuous outcomes but was included to illustrate how SWDPWR handles time effects compared with other packages.)
The power when using the three main calculators under these assumptions is presented in Table 3. Not surprisingly, the power did not differ much whether GLMM or GEE was used. Whereas power was substantially different under different correlation structures, power was similar across the different calculators when using the normal distribution. Under the no-time-effect scenario however, SWDPWR yields substantially higher power due to fixed
Table 3 Power calculated under the main software packages for the ‘Girls on the Go’ trial
DF = 19^g   DF = 4
Exchangeable correlation 0.751 N/A N/A 0.501 0.501
Block exchangeable correlation 0.671 N/A N/A 0.418 0.418
Proportional decay N/A N/A N/A 0.401 0.401
IAC, Individual Autocorrelation Coefficient; ICC, intra-cluster correlation coefficient; DF, degrees of freedom.
a: SWCRTDESIGN does not support cohort designs.
b: Linear mixed-effects model used.
c: Generalized estimating equations (GEE) and mixed-effects model produced the same results.
d: GEE method for marginal model.
e: By default, coefficients for time are excluded from the underlying model when no time effect is specified.
f: SHINYCRT and CRTFASTGEEPWR used different DF in t-test.
g: This is a manual correction for the DF used in SHINYCRT based on its documentation; the actual implementation is under development.
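The DF of 19 in Table 3 follows from the rule given for the SHINYCRT t-test (number of cluster periods minus the number of periods minus one) applied to the ‘Girls on the Go’ design of six clusters and four periods. A minimal sketch, with a hypothetical function name and assuming a complete design in which every cluster is observed in every period:

```python
def t_test_df(n_clusters, n_periods):
    """DF = number of cluster periods - number of periods - 1 (complete design assumed)."""
    cluster_periods = n_clusters * n_periods
    return cluster_periods - n_periods - 1


# 'Girls on the Go': three sequences with two clusters each, four periods.
print(t_test_df(n_clusters=6, n_periods=4))  # 24 - 4 - 1 = 19
```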
gave substantially different results to the other packages, as explained earlier. When the time effects were not null, the power from SWDPWR and CRTFASTGEEPWR was similar but substantially different from the power under the normal approximation in SHINYCRT and SWCRTDESIGN (since the normal approximation does not rely on the magnitude of time effects). The differences tend to increase when the time effects increase.
The logit link can also be specified for the mean model in SWDPWR and CRTFASTGEEPWR (see Supplementary File 2, Section 3, available as Supplementary data at IJE online). Both calculators require users to specify the effect size on the log scale and the ICC of the binary outcomes on the proportions scale. Whereas SWDPWR requires the time effect to be specified as proportions, CRTFASTGEEPWR requires all marginal mean model parameters (i.e. period and intervention effects) to be specified on the link function scale.
Summary and recommendations
There are several key considerations when designing a SW-CRT and choosing a power and sample size calculator. For a continuous or non-continuous outcome when the normal approximation is appropriate (i.e. where there is a relatively large number of clusters and/or the anticipated proportion is not too close to 0 or 1) and when anticipated secular trends are small, SHINYCRT is potentially the best choice for most users. It is free to access and provides an intuitive interface. It can accommodate continuous, binary and count outcomes as well as cross-sectional and closed cohort designs under the three types of correlation structures. It allows for incomplete designs and cluster-treatment heterogeneity. Furthermore, it automatically implements a sensitivity analysis for the assumed correlation parameters. However, for non-continuous outcomes in which relative effects are of interest or when extreme baseline rates/proportions or large time effects are expected, CRTFASTGEEPWR seems to be the better choice. This SAS macro allows for cross-sectional and closed cohort designs, incorporates all correlation structures considered herein and can accommodate incomplete designs, fractional treatment effects and both categorical and linear time effects. However, non-SAS users or users who do not want to use GEE may use SWDPWR instead as long as they have a complete design with a continuous or binary outcome (exchangeable correlation only) and do not require decaying correlation structures or delayed treatment effects. When using SWDPWR, users should incorporate non-zero time effects (even for continuous outcomes), otherwise the calculation will be based on a model that excludes the time effect coefficients and this could yield spuriously high power.
Table 4 Sample size calculation parameter values used for the Expedited Partner Therapy trial (binary outcome)
Sample size parameter                          Assumed value
Number of sequences                            4
Number of periods                              5
Number of clusters                             22 (6, 6, 6, 4 for each sequence)
Cluster size per period                        305
Baseline rate                                  0.076
Mean (absolute) difference (treatment effect)  0.014
Type of design                                 Cross-sectional
Exchangeable correlation
Although users may select the calculator most convenient to them facilitated by our Shiny app, a good rule of thumb when not all requirements can be met is to prioritize those factors that have the most impact on power in their scenario. As our examples illustrate, an important consideration is the assumed correlation structure: assuming an overly simple correlation structure (e.g. exchangeable) when a more complex correlation structure (e.g. block exchangeable or exponential decay) is appropriate can lead to power being overestimated.49 Ideally, ICC values should be informed by analysing existing data (e.g. historical or routinely collected data) under a similar set-
Table 5 Power calculated under the main software packages for the Expedited Partner Therapy trial
No time effect^c   Small time effect (0.007)   Large time effect (0.014)   No time effect   Small time effect (0.007)   Large time effect (0.014)
Exchangeable correlation 0.806 0.995 0.819 0.837 0.803 0.803 0.819 0.837
Nested exchangeable correlation 0.543 0.904 0.560 0.580 0.540 0.542 0.560 0.580
Exponential decay 0.570 N/A N/A N/A N/A 0.568 0.586 0.607
a: Normal approximation using mixed-effects model. Magnitude of time effects cannot be accommodated.
b: Generalized estimating equations (GEE) was used for calculation. With cluster-period size of >150, only GEE can be used in SWDPWR.
c: By default, coefficients for time are excluded from the underlying model when no time effect is specified.
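To make the normal-approximation calculation concrete, the sketch below applies the closed-form variance of Hussey and Hughes for a random intercepts (exchangeable) model with categorical period effects to the design in Table 4. It is not the implementation of any of the calculators reviewed here. The ICC of 0.01 is a hypothetical value (the ICC used for Table 5 is not shown in this excerpt), the total variance is approximated by the binomial variance at the baseline rate, and the function names are illustrative, so the printed power is not expected to reproduce Table 5.

```python
from math import sqrt
from statistics import NormalDist


def stepped_wedge(clusters_per_sequence, n_periods):
    """One row per cluster; sequence s (1-based) switches to intervention in period s + 1."""
    design = []
    for s, n_clusters in enumerate(clusters_per_sequence, start=1):
        for _ in range(n_clusters):
            design.append([1 if period > s else 0 for period in range(1, n_periods + 1)])
    return design


def hussey_hughes_variance(design, n_per_cell, total_var, icc):
    """Variance of the treatment effect estimator under an exchangeable correlation."""
    n_clusters, n_periods = len(design), len(design[0])
    tau2 = icc * total_var                        # between-cluster variance
    sigma2 = (1.0 - icc) * total_var / n_per_cell  # variance of a cluster-period mean
    u = sum(sum(row) for row in design)
    w = sum(sum(col) ** 2 for col in zip(*design))
    v = sum(sum(row) ** 2 for row in design)
    numerator = n_clusters * sigma2 * (sigma2 + n_periods * tau2)
    denominator = ((n_clusters * u - w) * sigma2
                   + (u ** 2 + n_clusters * n_periods * u
                      - n_periods * w - n_clusters * v) * tau2)
    return numerator / denominator


def power_z(effect, variance, alpha=0.05):
    """Two-sided z-test power, ignoring the negligible opposite-tail probability."""
    z = NormalDist()
    return z.cdf(abs(effect) / sqrt(variance) - z.inv_cdf(1.0 - alpha / 2.0))


design = stepped_wedge([6, 6, 6, 4], n_periods=5)   # Table 4: 22 clusters, 5 periods
baseline = 0.076
variance = hussey_hughes_variance(design, n_per_cell=305,
                                  total_var=baseline * (1.0 - baseline),
                                  icc=0.01)          # ICC is an assumed value
print(round(power_z(0.014, variance), 3))
```

The magnitude of the period effects does not appear anywhere in the formula, which is why calculators based on the normal approximation return the same power whatever time effect is assumed.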
23. Li F, Turner EL, Preisser JS. Sample size determination for GEE analyses of stepped wedge cluster randomized trials. Biometrics 2018;74:1450–58.
24. Li F, Hughes JP, Hemming K et al. Mixed-effects models for the design and analysis of stepped wedge cluster randomized trials: an overview. Stat Methods Med Res 2021;30:612–39.
25. Hughes JP, Granston TS, Heagerty PJ. Current issues in the design and analysis of stepped wedge trials. Contemp Clin Trials 2015;45:55–60.
26. Hemming K, Taljaard M, Forbes A. Modeling clustering and treatment effect heterogeneity in parallel and stepped-wedge cluster randomized trials. Stat Med 2018;37:883–98.
27. Hemming K, Kasza J, Hooper R et al. A tutorial on sample size calculation for multiple-period cluster randomized parallel,
38. Zhang Y, Preisser JS, Li F, Turner EL, Rathouz PJ. %CRTFASTGEEPWR: a SAS macro for power of the generalized estimating equations of multi-period cluster randomized trials with application to stepped wedge designs. arXiv preprint, arXiv:2205.14532, 28 May 2022, preprint: not peer reviewed.
39. Stata Packages - John Gallis. https://sites.duke.edu/johngallis/stata-packages/ (29 November 2021, date last accessed).
40. Davis-Plourde K, Taljaard M, Li F. Sample size considerations for stepped wedge designs with subclusters. Biometrics 2021; doi:10.1111/biom.13596.
41. Ouyang Y, Xu L, Karim ME et al. CRTpowerdist: An R package to calculate attained power and construct the power distribution for cross-sectional stepped-wedge and parallel cluster random-