

International Journal of Epidemiology, 2022, 2000–2013


https://doi.org/10.1093/ije/dyac123
Advance Access Publication Date: 9 June 2022
International Epidemiological Association
Education Corner


Sample size calculators for planning stepped-wedge cluster randomized trials: a review and comparison

Downloaded from https://academic.oup.com/ije/article/51/6/2000/6605012 by guest on 22 January 2024


Yongdong Ouyang,1,2* Fan Li,3,4 John S Preisser5 and Monica Taljaard1,2
1Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, ON, Canada, 2School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada, 3Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA, 4Center for Methods in Implementation and Prevention Science, Yale School of Public Health, New Haven, CT, USA and 5Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
*Corresponding author. Clinical Epidemiology Program, Ottawa Hospital Research Institute, 1053 Carling Ave, Ottawa,
ON K1Y 4E9, Canada. E-mail: youyang@ohri.ca
Received 31 December 2021; Editorial decision 5 May 2022; Accepted 17 May 2022

Abstract
Recent years have seen a surge of interest in stepped-wedge cluster randomized trials (SW-CRTs). SW-CRTs include several design variations and methodology is rapidly developing. Accordingly, a variety of power and sample size calculation software for SW-CRTs has been developed. However, each calculator may support only a selected set of design features and may not be appropriate for all scenarios. Currently, there is no resource to assist researchers in selecting the most appropriate calculator for planning their trials. In this paper, we review and classify 18 existing calculators that can be implemented in major platforms, such as R, SAS, Stata, Microsoft Excel, PASS and nQuery. After reviewing the main sample size considerations for SW-CRTs, we summarize the features supported by the available calculators, including the types of designs, outcomes, correlation structures and treatment effects; whether incomplete designs, cluster-size variation or secular trends are accommodated; and the analytical approach used. We then discuss in more detail four main calculators and identify their strengths and limitations. We illustrate how to use these four calculators to compute power for two real SW-CRTs with a continuous and binary outcome and compare the results. We show that the choice of calculator can make a substantial difference in the calculated power and explain these differences. Finally, we make recommendations for implementing sample size or power calculations using the available calculators. An R Shiny app is available for users to select the calculator that meets their requirements (https://douyang.shinyapps.io/swcrtcalculator/).

Key words: Stepped-wedge design, power, sample size, software, correlation structures

© The Author(s) 2022; all rights reserved. Published by Oxford University Press on behalf of the International Epidemiological Association
International Journal of Epidemiology, 2022, Vol. 51, No. 6 2001

Key Messages

• Stepped-wedge cluster randomized trials (SW-CRTs) are an attractive alternative to parallel-arm cluster randomized trials in some scenarios and have been increasing in popularity. A variety of calculators that can help researchers calculate power and sample size for SW-CRTs have been developed.
• Available calculators differ in the features they can support, including the types of designs, outcomes, correlation structures and treatment effects that can be specified; ability to accommodate incomplete designs, cluster-size variation or secular trends; and the type of analytical approach used; available calculators can also differ substantially in the calculated power in some scenarios.
• We summarize and compare the features of 18 available calculators to calculate power and sample size for SW-CRTs and describe a Shiny app to help researchers identify the calculators that support the desired features for a particular planned trial.
• We advise users on how to choose the most appropriate calculator based on their requirements and what to prioritize when no calculator can accommodate all their needs.

Introduction

In recent years, the stepped-wedge cluster randomized trial (SW-CRT) design has been increasing in popularity.1–4 In SW-CRTs (Figure 1a), each cluster begins in the control condition. At different time points, clusters are randomized to cross over to the intervention condition until by the end of the trial, all clusters have received the intervention. In a complete SW-CRT design, measurements are taken from each cluster in each period. The SW-CRT can help facilitate recruitment when stakeholders perceive there are benefits to being exposed to the intervention at some point during the trial.5,6 The design is also attractive when the goal is to implement the intervention in all clusters but there are inadequate resources to do so in all clusters simultaneously.5,6 Despite their potential advantages, the use of stepped-wedge (SW) designs must be appropriately justified and researchers must be aware of the potential risks of bias.7,8

Methods for planning SW-CRTs are rapidly evolving and a variety of sample size calculation methods and tools are now available. Applied statisticians are faced with a bewildering array of choices and there is no centralized discussion of the functionalities of different calculators. We aim to provide a comprehensive overview of the trial design features that can be accommodated by available power and sample size calculators for SW-CRTs, illustrate how to use the main calculators and provide a guide to allow users to select the most appropriate calculators for their studies.

The rest of the article is organized as follows. We start with an overview of SW-CRTs, including major features to consider in sample size calculation. We then provide a detailed review of four main calculators including their strengths and limitations. Using two example trials, we then illustrate how to use these calculators to obtain power for each of the trials and compare the results across the four calculators. We conclude by summarizing our findings and making recommendations for selecting the most appropriate calculator in practice.

Main factors to consider when planning a SW-CRT

Types of SW designs based on sampling structure

There are three main types of SW-CRTs, depending on the sampling structure.9 In a cross-sectional design, outcomes are measured on different participants over time so each participant contributes outcomes to either the control or intervention condition but not both. In the closed cohort design, the same participants are measured repeatedly over time so each participant contributes outcomes under both the control and intervention conditions. The open cohort is a variation on the closed cohort in which participants may join or leave the cohort during the trial and thus at least some participants contribute outcomes under both control and intervention conditions.10 The majority of work on sample size calculation for SW-CRTs has focused on cross-sectional or closed cohort designs; open cohort designs have received relatively little development except for the work by Kasza et al.10 Recent work has started to consider cross-sectional designs arising from the common scenario in which participants are recruited continuously in time (but exposed for only a short period) as a special class of design requiring different correlation structures, called ‘continuous-time decay’, and different analytical approaches.9,11 Further work is required but some methodology for continuous-time decay has already been developed by Grantham et al.12 that includes an R Shiny web app to determine power for some scenarios, assuming individual measurement times are evenly spaced within each interval. In practice, cross-sectional designs are often considered to be approximations of continuous recruitment short exposure designs and, depending on the assumed correlation structure, sample size calculators based on discrete sampling may yield identical calculations.

Incomplete designs

Standard SW-CRTs collect data in each cluster period. Incomplete SW-CRTs are designs that do not collect data in all cluster periods.13,14 Figure 1b illustrates a design in which no data are collected during the period immediately prior to the intervention. These ‘transition periods’ allow extra time to implement the intervention or for the intervention to start exerting its effect. Each sequence can have one or more transition periods.14 Figure 1c illustrates a design in which each cluster has the same number of control and intervention periods, but not all clusters enter and exit at the same time. This type of design is called a staircase design.15

Correlation structures and statistical models for SW-CRTs

In a SW-CRT, the intervention effect is partially confounded with the time effect (i.e. secular trend) by design; therefore, the analytical model must always account for time to obtain unbiased inferences. It is further well known that CRTs must account for the correlation among multiple responses from different individuals within the same cluster. In a post-test-only parallel-arm CRT, the degree of clustering is usually measured using a single intra-cluster correlation coefficient (ICC), defined as the ratio of the

Figure 1 (a) Stepped-wedge design with five sequences and six periods (steps). (b) Stepped-wedge design with transition periods. (c) ‘Staircase’
stepped-wedge design.
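The three layouts in Figure 1 can be written as design matrices with one row per sequence and one column per period. The following Python sketch is illustrative only: the function names are hypothetical, `None` is an assumed marker for a cluster-period in which no data are collected, the transition period is taken to be the single period immediately before crossover (as described in the text), and the staircase keeps exactly one control and one intervention period per sequence.

```python
from typing import List, Optional

Matrix = List[List[Optional[int]]]


def stepped_wedge(n_sequences: int) -> Matrix:
    """Complete design: sequence s (0-based) is in control (0) for periods
    0..s and in intervention (1) from period s + 1 onwards."""
    n_periods = n_sequences + 1
    return [[1 if j > s else 0 for j in range(n_periods)]
            for s in range(n_sequences)]


def with_transition(design: Matrix) -> Matrix:
    """Incomplete design: blank out the period immediately before crossover."""
    out = [row[:] for row in design]
    for row in out:
        first = row.index(1)          # first intervention period
        if first > 0:
            row[first - 1] = None     # no data collected in the transition period
    return out


def staircase(design: Matrix) -> Matrix:
    """Staircase design: keep only the last control period and the first
    intervention period of each sequence."""
    out = []
    for row in design:
        first = row.index(1)
        out.append([x if first - 1 <= j <= first else None
                    for j, x in enumerate(row)])
    return out


if __name__ == "__main__":
    base = stepped_wedge(5)           # five sequences, six periods (Figure 1a)
    for name, d in [("complete", base),
                    ("transition", with_transition(base)),
                    ("staircase", staircase(base))]:
        print(name)
        for row in d:
            print(" ".join("." if x is None else str(x) for x in row))
```

Calculators that accept a user-defined design matrix typically expect a layout of this kind, although the exact convention for empty cells differs between tools and should be checked in each calculator's documentation.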

between-cluster variance and the sum of the between-cluster and within-cluster variances.16 In multi-period CRTs, such as SW-CRTs, correlation structures may be more complex as outcomes are measured in different periods. Nevertheless, the initial statistical model for SW-CRTs developed by Hussey and Hughes was a random intercepts model that assumed a single exchangeable correlation, i.e. the correlation between any two individuals from the same cluster is the same regardless of their distance apart in time.2

Building upon earlier work on multi-period parallel-arm CRTs,17 Hooper et al.18 and separately Girling and Hemming19 introduced a conditional (i.e. mixed-effects) model (referred to as the Hooper/Girling model) that allows the within-period ICC (wp-ICC) (i.e. the correlation between two individuals from the same cluster and same period) to differ from the between-period ICC (bp-ICC) (i.e. the correlation between two individuals from the same cluster but different periods). We would typically expect the bp-ICC to be less than the wp-ICC. The ratio of the bp-ICC and wp-ICC is called the Cluster Autocorrelation Coefficient (CAC), which is typically <1.18,20 Li21 referred to this as a nested exchangeable correlation structure. In a cohort design, an additional correlation is needed to account for repeated measures on the same individual over time. In the Hooper/Girling model, this correlation was introduced through the addition of a patient-level random effect and quantified as an Individual Autocorrelation Coefficient (IAC), which implies a constant correlation in the repeated measures on the same individual over time.18 Li21 referred to this as a block exchangeable correlation structure.

Intuitively, within a cluster, the responses from individuals between two consecutive periods are usually more correlated than those separated by a longer interval. Therefore, assuming a constant bp-ICC may not be realistic. Kasza et al.22 proposed an extension to the Hooper/Girling model that allows the bp-ICC to decay exponentially. In this case, the CAC refers to the rate of exponential decay per period. In the case of a cohort design, individual-level correlation is also required and there are two possibilities: either a constant individual-level correlation over time (specified through an IAC) or allowing individual-level correlations to decay exponentially (referred to as the proportional decay structure).21

As an alternative to mixed-effects models, Li et al.23 proposed a marginal model based on generalized estimating equations (GEE) and corresponding power and sample size calculation methodology allowing for both cross-sectional and closed cohort SW-CRT designs. In a cohort setting, it is worth noting that in a marginal model, the correlation in repeated measures on the same individual is modelled directly through the within-individual ICC (wi-ICC) and is not identical to the IAC referred to above in the context of a mixed-effects model. A comprehensive overview of statistical models available for SW-CRTs can be found in Li et al.24

In addition to the above correlation structures that are based on time (period), correlation structures can also be allowed to depend on exposure. This is referred to as cluster-treatment effect heterogeneity and is accomplished by including a random cluster by treatment effect in the analytical model.25,26

Different ways to specify treatment effects

The most common way to model a treatment effect in a SW-CRT is as a time-averaged effect (step/immediate change),2 which implies an immediate effect that is sustained. With this specification, the average response across the intervention periods is compared with that across the control periods (Figure 2a). In some situations, the intervention will not be fully implemented in the period immediately after crossing over or may have a delayed effect. This can be handled by specifying a fractional treatment indicator.2,25 For example, using prior knowledge, one may specify that the treatment will only be 50% effective during the first intervention period and code the treatment indicator as 0.5 in that period. When the treatment effect is anticipated to be gradual across multiple periods, one can assign fractional values to consecutive periods (assuming the fractions are known) (Figure 2b). More generally, rather than estimating a single treatment effect, one may allow the treatment effect to vary in an unrestricted way according to the number of periods of exposure to the intervention. This is referred to as general time-on-treatment effects24,25; it requires that an adequate number of clusters are available given the desired number of treatment effects. A more parsimonious version is to assume a constant (linear) change with increasing duration of exposure (called a linear time-on-treatment effect).24 Linear time-on-treatment can also be viewed as a special case of fractional treatment indicators but where the fractions are equally spaced and determined by the maximum exposure time (Figure 2c).

Analytical approximation for non-continuous outcomes

For non-continuous (binary and count) outcomes, a common approach to sample size calculation is to use the normal approximation.2,27 However, this approach has several issues. First, it requires the specification of an absolute difference even if relative measures are of interest. Second, the variance of the estimated treatment effect for


Figure 2 (a) Design matrix for a stepped-wedge design with time-averaged treatment effect. (b) Design matrix for a stepped-wedge design with
delayed/gradual treatment effect and known fractions. (c) Design matrix for a stepped-wedge design with linear time-on-treatment effects (treatment
increases linearly with duration of exposure).
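The treatment codings shown in Figure 2 can all be derived from a 0/1 design matrix by counting periods of exposure. The sketch below is a minimal illustration in Python, not the coding used by any particular calculator: the function names are hypothetical, the design is assumed to be complete (no empty cells), and the fractions supplied for the delayed effect are assumed to be known in advance, with full effect (1.0) once they run out.

```python
from typing import List


def exposure_time(design: List[List[int]]) -> List[List[int]]:
    """Number of periods of exposure in each cell (0 while in control,
    1 in the first intervention period, 2 in the second, and so on)."""
    out = []
    for row in design:
        exposure = 0
        coded = []
        for x in row:
            exposure = exposure + 1 if x == 1 else 0
            coded.append(exposure)
        out.append(coded)
    return out


def fractional(design: List[List[int]], fractions: List[float]) -> List[List[float]]:
    """Delayed/gradual effect: fractions[k] is the assumed share of the full
    effect in the (k + 1)-th intervention period; full effect afterwards."""
    coded = []
    for row in exposure_time(design):
        coded.append([0.0 if e == 0
                      else fractions[e - 1] if e <= len(fractions)
                      else 1.0
                      for e in row])
    return coded


def linear_time_on_treatment(design: List[List[int]]) -> List[List[float]]:
    """Linear time-on-treatment: equally spaced fractions scaled by the
    maximum exposure time observed in the design."""
    expo = exposure_time(design)
    max_exposure = max(max(row) for row in expo)
    return [[e / max_exposure for e in row] for row in expo]


if __name__ == "__main__":
    design = [[0, 1, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 0, 1]]
    print(fractional(design, [0.5]))         # 50% effective in the first period
    print(linear_time_on_treatment(design))  # fractions 1/3, 2/3, 1
```

A general time-on-treatment specification would instead use the exposure-time matrix itself as a categorical variable, with one treatment effect per level.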

non-continuous outcomes theoretically depends on the assumed treatment effect as well as on the size of any time effects, whereas under the normal approximation it depends on neither. Several recent methods have shifted away from normal approximation approaches towards methods based on maximum-likelihood estimation (MLE)28 of the generalized linear mixed-model (GLMM) or marginal models using GEE.23,29 These methods allow users to specify odds ratios and risk ratios directly in their sample size calculation procedures. Moreover, they allow the variance to depend on the assumed treatment effect and thus require the effect of secular trends to be incorporated in the power calculation. Although the analytical model should always include time effects to account for potential secular trends, power calculations for continuous outcomes are not affected by the magnitude of the secular trends. Table 1 summarizes these and other features that researchers need to consider when designing a SW-CRT.

Summary of available sample size and power calculators

To date, 18 power and sample size calculators for SW-CRTs have been developed for implementation in R, online via an R Shiny app, or in SAS, Stata, Excel, PASS31 and nQuery.32 A standalone calculator maintained by the National Institutes of Health has also recently become available.33 A summary of the functionality of the calculators with respect to the major features described in Table 1 is presented in Supplementary File 1, Tables S1–S6 (available as Supplementary data at IJE online) organized by software platform. Each calculator is also classified based on how correlation structures need to be specified (i.e. in

Table 1 Common features that users need to consider or specify when designing a stepped-wedge cluster randomized trial

Variable: Definition or explanation

Type of primary outcome: Continuous, binary, count
Details of the design
  Number of sequences: Number of rows in design matrix
  Number of periods: Determines number of columns in design matrix (when time is modelled as a categorical variable)
  Cluster size per period: Average number of participants per cluster per period
Type of design
  Cross-sectional: Different individuals in each period
  Closed cohort: The same individuals repeatedly measured over time
  Open cohort: Mixture of different or same individuals
Correlation structures
  Cross-sectional designs
    Exchangeable: Assume a constant within-cluster correlation over time
    Nested exchangeable: Allow the between-period ICC to be different from the within-period ICC (often specified using the CAC, which is the ratio of the between-period to the within-period ICC)
    Exponential decay: Specify a within-period ICC and a CAC measuring the exponential rate of decay in the between-period ICC per period
  Cohort designs
    Exchangeable: As for cross-sectional design; in addition, assume a constant within-individual correlation over time
    Block exchangeable: As for cross-sectional design; in addition, assume a constant within-individual correlation over time
    Proportional decay: As for cross-sectional design; in addition, specify an exponential decay in the within-individual correlation over time
Type of treatment effect
  Average (immediate) effect: Intervention leads to step change (immediate change)
  Delayed or gradual effect: Allows gradual treatment effect or partial exposure to the intervention
  Linear time-on-treatment effect: Allows the treatment to increase linearly with duration of exposure
  General time-on-treatment effect: Allows the treatment effect to vary in an unrestricted way, according to the number of periods of exposure to the intervention
Link function (for binary outcomes): Identity (Risk Difference), Log (Relative Risk), Logit (Odds Ratio)
Equal or Unequal allocation: Same or different number of clusters per sequence; calculators that allow users to upload a user-defined design matrix may allow users to explore power implications of different allocations
Varying or fixed cluster sizes: Allowing cluster (or cluster-period) sizes to vary across clusters can affect power. Often specified in terms of a coefficient of cluster-size variation
Incomplete design: Data are not collected from some clusters in some periods either to allow for a transition period or to ease the data collection burden (see Figure 1). Calculators that allow users to upload or define their own design matrix provide more flexibility over specifying incomplete designs
Analytic model: Mixed-effects regression or marginal model (generalized estimating equations)
Different methods to specify correlation: Different calculators may require different inputs to measure correlation. The most common approach is to specify anticipated intra-cluster correlation coefficients. An alternative is to specify the anticipated variance or standard deviation of the random-effects distributions. Finally, the coefficient of outcome variation can sometimes be specified
Cluster-treatment heterogeneity: Users may allow treatment effects to vary randomly across clusters (which can reduce power)
Multilevel clustering: More than two levels of clustering during each time period, e.g. patients may be clustered within providers who are clustered within primary care practices
Hybrid design: The design contains a mixture of SW-CRT and parallel CRT (see Girling et al.19 for details)
Continuous or categorical time effects: Users may model secular trends using fixed categorical or fixed continuous effects. Whether linear or categorical time effects are incorporated in complete stepped-wedge designs has no influence on continuous outcomes30
Magnitude of time effect: Users may specify the expected secular trend, which, in the case of a non-continuous outcome, can affect power

ICC, intra-cluster correlation coefficient; CAC, Cluster Autocorrelation Coefficient; SW-CRT, stepped-wedge cluster randomized trial; CRT, cluster randomized trial.
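To make the role of the correlation inputs in Table 1 concrete, the following Python sketch shows one way in which a wp-ICC and CAC could be turned into a power estimate for a continuous outcome in a complete cross-sectional design. It is a sketch under stated assumptions and not the implementation of any calculator reviewed here: all function names are hypothetical, cluster-period sizes are assumed equal, time is modelled with categorical period effects, the treatment effect variance is obtained by generalized least squares on cluster-period means, and power uses a two-sided z-test that ignores the negligible opposite tail.

```python
from statistics import NormalDist


def period_mean_cov(n_periods, m, sigma2, wp_icc, cac=1.0, decay=False):
    """Covariance of the period means of one cluster (m individuals per period).
    cac=1 gives exchangeable; cac<1 gives nested exchangeable; decay=True
    lets the between-period ICC fall by a factor cac per period apart."""
    cov = []
    for j in range(n_periods):
        row = []
        for k in range(n_periods):
            if j == k:
                row.append(sigma2 * (wp_icc + (1.0 - wp_icc) / m))
            else:
                ratio = cac ** abs(j - k) if decay else cac
                row.append(sigma2 * wp_icc * ratio)
        cov.append(row)
    return cov


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def treatment_variance(design, clusters_per_seq, cov):
    """GLS variance of the treatment effect; design rows are sequences and
    may hold 0/1 or fractional treatment values, but no empty cells."""
    t = len(design[0])
    p = t + 1                                  # period effects + treatment
    info = [[0.0] * p for _ in range(p)]
    for row in design:
        # Z has one row per period: period indicators, then the treatment value.
        z = [[1.0 if c == j else 0.0 for c in range(t)] + [float(row[j])]
             for j in range(t)]
        vinv_z = [solve(cov, [z[j][c] for j in range(t)]) for c in range(p)]
        for a in range(p):
            for c in range(p):
                info[a][c] += clusters_per_seq * sum(
                    z[j][a] * vinv_z[c][j] for j in range(t))
    unit = [0.0] * t + [1.0]
    return solve(info, unit)[-1]               # last diagonal entry of info^-1


def power(effect, variance, alpha=0.05):
    """Approximate two-sided power of a z-test for the treatment effect."""
    nd = NormalDist()
    return nd.cdf(abs(effect) / variance ** 0.5 - nd.inv_cdf(1.0 - alpha / 2.0))


if __name__ == "__main__":
    design = [[0, 1, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
    for label, kwargs in [("exchangeable", {"cac": 1.0}),
                          ("nested exchangeable", {"cac": 0.8}),
                          ("exponential decay", {"cac": 0.8, "decay": True})]:
        cov = period_mean_cov(4, m=20, sigma2=1.0, wp_icc=0.05, **kwargs)
        var = treatment_variance(design, clusters_per_seq=4, cov=cov)
        print(label, round(power(0.3, var), 3))
```

Running the script with the same wp-ICC but different correlation structures illustrates why calculators that support different structures can return different power for what appears to be the same trial.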

terms of coefficient of variation (CV) of outcomes, ICCs or variance components of random effects) and whether the calculator allows for unequal cluster-period sizes, unequal allocation of clusters to sequences and whether the user can specify their own design matrix. Note that any calculator that can support a user-specified design matrix can typically also be used for other multiple-period designs (such as repeated parallel-arm or multiple-period cluster crossovers) as the sample size methodology underlying all these designs is similar. We also indicate whether cluster-treatment effect heterogeneity, continuous time effects (as opposed to only categorical), subclusters (i.e. more than two levels of clusters within each period) and hybrid designs (i.e. a mixture of parallel and SW-CRTs) can be accommodated.19 A Shiny app is available (https://douyang.shinyapps.io/swcrtcalculator/) in which users can obtain a list of calculators that accommodate their selected design features. Among available calculators, we now review in more detail four main packages that offer many of the available features.

SHINYCRT by Hemming et al.

Overview

• A menu-based web app (RShiny app) that calculates power and sample size for SW-CRTs under linear mixed-effects models.27
• Type of design and outcome: cross-sectional and closed cohort designs with continuous, binary and count outcomes using the normal approximation method.
• Correlation structures: exchangeable (users need to specify a common wp-ICC); block or nested exchangeable (specify the wp-ICC and CAC as well as an IAC for cohort designs); exponential decay (specify the wp-ICC and CAC for the decay per period, as well as a constant IAC for cohort designs). Exponential decay is referred to as ‘discrete time decay’ in this app.

Important features and strengths

SHINYCRT automatically displays power or sample size curves. Users can either provide sample sizes per cluster period to calculate the power or specify a target power to obtain the required sample sizes. In addition to the specified ‘base’ ICC values, users can conduct a sensitivity analysis by specifying the upper and lower bounds of the wp-ICC. The plot will automatically include results for lower and higher CAC values (80% and 120% of the specified CAC) as a sensitivity analysis.

Users can upload their own design matrices to accommodate unequal numbers of clusters per sequence. Uploading a design matrix also activates additional features including an option for accommodating cluster-treatment heterogeneity. The design matrix may contain empty cells to allow for incomplete designs or cells with fractions to accommodate delayed or linear time-on-treatment effects (although the design matrix will be displayed as integers). This calculator can also accommodate unequal cluster sizes by specifying a CV for cluster-size variation. Instead of a z-test, users can request a t-test with degrees of freedom equal to the number of cluster periods minus the number of periods minus one, although the t-test is not available for exponential decay or when uploading a user-specified design matrix. An attractive feature of this app is that in addition to SW-CRT, menu options include other multiple-period designs, e.g. parallel-arm and multiple-period cluster crossover designs.

Limitations

The SHINYCRT web app has multiple features and allows substantial flexibility in specifying a design, but also has some limitations. First, methods for binary and count outcomes use a normal approximation; thus, the magnitude of time effects on power cannot be accommodated. Second, only identity link functions are allowed; thus, when the research requires log or logit links, the sample size calculation procedure may not be compatible with the analytical approach. Third, in the case of closed cohort designs, the app only allows constant individual-level correlation, even though proportional decay may be more appropriate. Fourth, the t-distribution approximation was still under development at the time of submission. Finally, the RShiny app platform limits the number of hours the app can run on the server. When the quota has been reached in a particular month, users will have to find an alternative way to access the calculator.

SWDPWR by Chen et al.

Overview

• A calculator running in R, RShiny and SAS that calculates power and sample size for SW-CRTs using GEE or GLMM with MLE.34
• Type of design and outcome: cross-sectional and closed cohort designs with continuous and binary outcomes under identity, log and logit link functions.
• Correlation structures: block or nested exchangeable correlation structures (including exchangeable as a special case); wp-ICC and bp-ICC must be specified for cross-sectional designs (specified as equal to obtain exchangeable correlation); for cohort designs, wi-ICC must also be specified.

Important features and strengths

The most significant feature of this calculator is the availability of GEE and MLE to obtain more accurate sample size estimates for binary outcomes. When the research requires relative measures, the implementation of log and logit link functions allows users to specify relative treatment effects directly. When assuming no secular trend, users can specify the target difference by either (i) specifying the expected response in the control condition at the beginning of the study and the expected response in the intervention condition at the end or (ii) replacing the expected response in the intervention condition at the end of the study with the effect size. To accommodate a secular trend, users can specify the expected response in the control condition at the beginning and end of the study (i.e. assuming the intervention has not been introduced) in addition to the target difference. Users can upload a design matrix and accommodate hybrid designs and designs with different numbers of clusters per sequence.

Limitations

The SWDPWR calculator has some limitations. First, it does not allow for exponential or proportional decay and cannot accommodate count outcomes. Second, with the conditional model, a maximum cluster-period size of 150 is allowed for binary outcomes and the calculator is further limited to cross-sectional designs with exchangeable correlation. If larger cluster-period sizes, cohort designs or more complex correlation structures are desired, GEE must be used instead. Third, when specifying no time effects, i.e. the same expected response in the control condition at the beginning and end, the time variable will be removed from the underlying model used for calculation, which is incompatible with recommendations to always include time effects in the analytical model. Fourth, there is currently no option for choosing a t-test when the number of clusters is small. Finally, incomplete designs (including transition periods), cluster-treatment heterogeneity, unequal cluster sizes and delayed or time-on-treatment effects are not available.

SWCRTDESIGN by Hughes et al.

Overview

• A calculator running in R and RShiny app that calculates power and sample size for SW-CRTs under a linear mixed-effects model.35
• Type of design and outcome: cross-sectional designs with continuous and binary outcomes under an identity link function. For non-continuous outcomes, a new method that includes log and logit link functions and that can provide more accurate estimates than MLE even with a small number of clusters is being planned as a future addition.36
• Correlation structure: exchangeable and nested exchangeable correlation structures. Users can specify the wp-ICC and CAC or standard deviations for random intercepts, random treatment effects across clusters and random cluster by time effects; the correlation between random intercepts and random treatment effects can also be specified (most other calculators assume independence).

Important features and strengths

The most important contribution of this package is to allow the intervention effect to vary across clusters by including a random treatment effect.25 To activate this feature, users must input standard deviations of the anticipated random effects (rather than ICCs). In the accompanying Shiny app, delayed or linear time-on-treatment effects can be accommodated by allowing fractional treatment indicators; however, general time-on-treatment effects are not available. Designs must be specified via the function ‘swDsn’. Users can specify the number of clusters changing from control to intervention at each step. An incomplete design can be constructed by specifying a cluster-period size of 0 in select periods.

Limitations

SWCRTDESIGN has several limitations. First, calculations for binary outcomes are only available using the normal approximation and for count outcomes only via simulation. Second, it cannot be used for cohort designs. Third, exponential or proportional decay cannot be accommodated. Fourth, there is no option for choosing a t-distribution when the number of clusters is small. Fifth, compared with other packages, specifying incomplete designs is less straightforward: a complete design matrix must be specified using ‘swDsn’ followed by specification of cluster-period sizes. Finally, delayed treatment effects are only available in the Shiny app version so far, but simulation-based power calculation can be performed by modifying the code provided in Voldal et al.35

CRTFASTGEEPWR by Zhang and Preisser

Overview

• An SAS macro that calculates power for SW-CRT based on GEE analysis of a marginal model.37,38
• Type of design and outcome: cross-sectional or closed cohort designs with continuous, binary and count outcomes and identity, log and logit link functions.

• Correlation structure: nested or block exchangeable (including exchangeable as a special case) and exponential or proportional decay. Under proportional decay, the calculator requires the bp-ICC and wi-ICC to decay at the same rate.

Important features and strengths
Users can specify either time-averaged or ‘incremental’ intervention effects, the latter using fractional treatment values for linear time-on-treatment effects. Additionally, time effects can be specified as either categorical or linear (continuous). Both intervention and time effects are specified as regression parameters on the scale of the link function. The model with categorical time effects implies an expected proportion/rate under the control condition for each time period, whereas specification of linear time effects requires the user to specify both a fixed intercept and slope that determine marginal means for the outcome under the control condition.
Users are required to enter the number of clusters per sequence, which may vary. A required design pattern matrix for sequence-periods uses 0 for control periods, 1 for intervention periods and 2 for empty cells (for incomplete designs). Users may determine power for the normal approximation or from a t-distribution with choice of degrees of freedom (the number of clusters minus two or the number of clusters minus the number of parameters in the model). Unequal cluster-period sizes can be accommodated; however, users must enter the cluster-period sizes rather than a single CV.

Limitations
There are few limitations to this calculator. First, cluster-treatment effect heterogeneity cannot be accommodated. Second, for binary and count outcomes, the magnitude of secular trends is relevant but the calculator only considers these parameters on the link function scale, so the user will need to use the link function and its inverse to convert between proportions or rates and link-scale parameters in case of a non-identity link.23

Other calculators
The above four calculators may cover most scenarios of interest; features of other available calculators are summarized in the tables and available in our app and only briefly highlighted here. For Stata users, POWERSWGEE39 might offer the most flexibility and has similar functionality to CRTFASTGEEPWR. Notably, when specifying the proportional decay structure, it allows bp-ICC and individual-level correlation to decay at different rates. However, POWERSWGEE currently cannot accommodate incomplete designs. A few calculators in our review accommodate special features, e.g. designs with subclusters40; a calculator is also available to determine allocation-based power.41,42 The choice of calculators that can deal with open cohort designs is limited10,43; however, a reviewer pointed out that by multiplying IAC by one minus the churn rate (the rate at which participants leave the trial), calculators that support closed cohort designs can also be used for open cohort designs. This observation will hold for continuous outcomes under a constant individual-level correlation. Conversely, calculators for open cohort designs can also support closed cohort and cross-sectional designs by setting the churn rate to 0 or 1.
During peer review, a new standalone calculator (SWGRT33) became available and was added to our manuscript. SWGRT can support continuous and binary outcomes, all three types of designs and available correlation structures, and is accompanied by excellent documentation. A notable feature of this calculator is that it can incorporate the effect of covariate adjustments in the design. It also allows an ICC obtained from a block exchangeable model to be converted to a discrete time decay ICC.10 However, it is based on the normal approximation for binary outcomes and cannot deal with incomplete designs, unequal allocation and user-specified design matrices. Finally, it only allows the calculation of the required number of clusters per sequence for a pre-specified treatment effect and a minimum of two clusters per sequence is required.

Examples
In this section, we illustrate power calculations using each of the four packages for two published trials: Girls on the Go!44 and the Washington State Expedited Partner Therapy (EPT) trial.45 We show how the sample size parameters should be specified in each package and compare the obtained power values across the four packages. We make comparisons under exchangeable, nested or block exchangeable and exponential or proportional decay structures. We were able to obtain realistic parameter values under these correlation structures using real data from both trials. The code used to implement the calculations is presented in Sections 1 and 2 of Supplementary File 2 (available as Supplementary data at IJE online) or RPubs (https://rpubs.com/derek6561/swcrtcalculator).

‘Girls on the Go!’ trial
‘Girls on the Go!’ is a closed cohort SW-CRT designed to determine whether a 10-week programme that incorporates interactive and practical learning would help to increase self-esteem, self-confidence and body image among school girls aged 10–16 years in Melbourne, Victoria, Australia.44 The trial was designed with three sequences and four periods with two clusters per sequence (a total of six clusters) and 10 participants per cluster, i.e. a total sample size of 60 participants. There were no transition periods. The primary outcome was continuous: a measure of self-esteem on a scale from 0 to 40. The sample size parameter values including ICC estimates under the three different correlation structures from the CLOUDbank data set, which is a repository of ICC values under different correlation structures, are summarized in Table 2.46 Due to the small number of clusters, we illustrate power with both a z-test and a t-test. SWCRTDESIGN was excluded from this example as it cannot be used for cohort designs. Power calculations when using SWDPWR and CRTFASTGEEPWR are shown both with and without time effects. (This does not contradict an earlier statement that the magnitude of time effects is irrelevant for continuous outcomes but was included to illustrate how SWDPWR handles time effects compared with other packages.)

Table 2 Sample size calculation parameter values used for the ‘Girls on the Go’ trial (continuous outcome)

Sample size parameter                | Assumed value
Number of sequences                  | 3
Number of periods                    | 4
Number of clusters (equal allocation) | 6
Cluster size per period              | 10
Standard deviation                   | 1
Mean difference (treatment effect)   | 0.39
Type of design                       | Closed cohort
Exchangeable correlation             |
  Within-period ICC                  | 0.005
  IAC or within-individual ICC^a     | 0.65
Block exchangeable correlation       |
  Within-period ICC                  | 0.01
  CAC                                | 0.223
  IAC or within-individual ICC^a     | 0.65
Proportional decay                   |
  Within-period ICC                  | 0.025
  CAC (per period)                   | 0.689
  IAC or within-individual ICC^a     | 0.65
Varying cluster sizes                | No
Transition periods                   | No
Type of model                        | Linear mixed-effects model or GEE
Type of treatment effect             | Step change (time-averaged effect)
Cluster-treatment heterogeneity      | No
Type of time effect                  | Fixed categorical
Magnitude of time effects            | Not applicable^b
Significance level                   | Two-sided, 5%

ICC, intra-cluster correlation coefficient; IAC, Individual Autocorrelation Coefficient; CAC, Cluster Autocorrelation Coefficient; GEE, generalized estimating equations.
^a SHINYCRT requires IAC; SWDPWR and CRTFASTGEEPWR require within-individual ICC. We used the same value for both parameters.
^b Assumed 0.1 in SWDPWR where specification of time effects is necessary to avoid inflating power; changing the value of assumed time effects will not change power as the outcome is continuous.

The power when using the three main calculators under these assumptions is presented in Table 3. Not surprisingly, the power did not differ much whether GLMM or GEE was used. Whereas power was substantially different under different correlation structures, power was similar across the different calculators when using the normal distribution. Under the no-time-effect scenario, however, SWDPWR yields substantially higher power due to fixed time effect parameters being excluded from the model, leaving fewer fixed parameters to estimate. Therefore, to approximate the results of CRTFASTGEEPWR assuming a null secular trend (where the analytical model includes time effects), users must specify a small secular trend in SWDPWR. As expected, the magnitude of time effects in the linear model is irrelevant to CRTFASTGEEPWR. Note that power from CRTFASTGEEPWR is based on an exponentially decaying wi-ICC, whereas SHINYCRT assumes a constant IAC; nevertheless, the power obtained was comparable. When using the t-distribution, calculated power in SHINYCRT and CRTFASTGEEPWR was substantially different (75% vs 50% under exchangeable correlation) because different degrees of freedom are implemented (19 in SHINYCRT vs 4 in CRTFASTGEEPWR, i.e. the number of clusters minus two).

Community EPT trial
The Washington State EPT trial is a cross-sectional SW-CRT designed to evaluate the impact of promoting the use of EPT on the decrease of Chlamydia test positivity among women.45 The trial was designed with four sequences and five periods. Each sequence has 6, 6, 6 and 4 clusters, respectively (a total of 22 clusters). The average cluster-period size was 305 participants, i.e. a total sample size of 33 550 participants. There were no transition periods. The primary outcome was binary: prevalence of Chlamydia positivity. The sample size parameter values including ICC estimates under the three different correlation structures, from reanalysis of the real EPT trial data,47,48 are summarized in Table 4. Since the magnitude of any assumed time effects will impact the power for binary outcomes, we assessed power assuming (i) no time effect, (ii) a small time effect equal to half the size of the treatment effect and (iii) a moderate time effect equal to the treatment effect.
The power when using the four main calculators is presented in Table 5. The power was dramatically different under different assumed correlation structures. The power did not appear to differ based on whether GLMM or GEE was used, but when assuming no time effects, SWDPWR

Table 3 Power calculated under the main software packages for the ‘Girls on the Go’ trial

Correlation structure          | SHINYCRT^b | SWDPWR^c, no time effect^e | SWDPWR^c, with time effect (0.1) | CRTFASTGEEPWR^d, no time effect | CRTFASTGEEPWR^d, with time effect (0.1)

Power using z-test^a
Exchangeable correlation       | 0.796 | 0.997 | 0.794 | 0.794 | 0.794
Block exchangeable correlation | 0.720 | 0.991 | 0.724 | 0.724 | 0.724
Proportional decay             | 0.713 | N/A   | N/A   | 0.708 | 0.708
  (SHINYCRT: exponential decay with fixed IAC; CRTFASTGEEPWR: the same exponential decay in between-period and within-individual ICC)

Power using t-test^f
Degrees of freedom             | DF = 19^g | N/A | N/A | DF = 4 | DF = 4
Exchangeable correlation       | 0.751 | N/A   | N/A   | 0.501 | 0.501
Block exchangeable correlation | 0.671 | N/A   | N/A   | 0.418 | 0.418
Proportional decay             | N/A   | N/A   | N/A   | 0.401 | 0.401

IAC, Individual Autocorrelation Coefficient; ICC, intra-cluster correlation coefficient; DF, degrees of freedom.
^a SWCRTDESIGN does not support cohort designs.
^b Linear mixed-effects model used.
^c Generalized estimating equations (GEE) and mixed-effects model produced the same results.
^d GEE method for marginal model.
^e By default, coefficients for time are excluded from the underlying model when no time effect is specified.
^f SHINYCRT and CRTFASTGEEPWR used different DF in t-test.
^g This is a manual correction for the DF used in SHINYCRT based on its documentation; the actual implementation is under development.

gave substantially different results from the other packages, as explained earlier. When the time effects were not null, the power from SWDPWR and CRTFASTGEEPWR was similar but substantially different from the power under the normal approximation in SHINYCRT and SWCRTDESIGN (since the normal approximation does not rely on the magnitude of time effects). The differences tend to increase when the time effects increase.
The logit link can also be specified for the mean model in SWDPWR and CRTFASTGEEPWR (see Supplementary File 2, Section 3, available as Supplementary data at IJE online). Both calculators require users to specify the effect size on the log scale and the ICC of the binary outcomes on the proportions scale. Whereas SWDPWR requires the time effect to be specified as proportions, CRTFASTGEEPWR requires all marginal mean model parameters (i.e. period and intervention effects) to be specified on the link function scale.

Summary and recommendations
There are several key considerations when designing a SW-CRT and choosing a power and sample size calculator. For a continuous or non-continuous outcome when the normal approximation is appropriate (i.e. where there is a relatively large number of clusters and/or the anticipated proportion is not too close to 0 or 1) and when anticipated secular trends are small, SHINYCRT is potentially the best choice for most users. It is free to access and provides an intuitive interface. It can accommodate continuous, binary and count outcomes as well as cross-sectional and closed cohort designs under the three types of correlation structures. It allows for incomplete designs and cluster-treatment heterogeneity. Furthermore, it automatically implements a sensitivity analysis for the assumed correlation parameters. However, for non-continuous outcomes in which relative effects are of interest or when extreme baseline rates/proportions or large time effects are expected, CRTFASTGEEPWR seems to be the better choice. This SAS macro allows for cross-sectional and closed cohort designs, incorporates all correlation structures considered herein and can accommodate incomplete designs, fractional treatment effects and both categorical and linear time effects. However, non-SAS users or users who do not want to use GEE may use SWDPWR instead as long as they have a complete design with a continuous or binary outcome (exchangeable correlation only) and do not require decaying correlation structures or delayed treatment effects. When using SWDPWR, users should incorporate non-zero time effects (even for continuous outcomes), otherwise the calculation will be based on a model that excludes the time effect coefficients and this could yield spuriously high power.
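Because CRTFASTGEEPWR expects period and intervention effects on the link function scale, planning values stated as proportions have to be converted first. The following Python sketch illustrates that conversion for a logit link using the EPT values from Table 4. It is not taken from any of the reviewed calculators: the function and variable names are hypothetical, the intervention is assumed to lower the proportion by the stated absolute difference, and the secular decrease is assumed to be spread linearly over the periods on the proportion scale.

```python
import math


def logit(p):
    """Log-odds of a proportion p in (0, 1)."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse logit: map a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


# Planning values from Table 4 (EPT trial). The direction of each change
# is an assumption made for this illustration.
baseline = 0.076        # control proportion in the first period
treatment_diff = 0.014  # absolute reduction attributed to the intervention
time_diff = 0.007       # absolute secular decrease from first to last period
n_periods = 5

# Intercept and intervention effect (a log odds ratio) on the logit scale.
intercept = logit(baseline)
treatment_log_or = logit(baseline - treatment_diff) - intercept

# Hypothetical choice: linear secular decrease on the proportion scale,
# expressed as logit-scale shifts relative to the first period.
control_props = [baseline - time_diff * j / (n_periods - 1)
                 for j in range(n_periods)]
period_effects = [logit(p) - intercept for p in control_props]

print(f"intercept (logit scale): {intercept:.4f}")
print(f"treatment effect (log odds ratio): {treatment_log_or:.4f}")
print("period effects:", [round(b, 4) for b in period_effects])
```

Applying `expit` to the intercept plus a period effect recovers the assumed control proportion for that period, which is a convenient check that the link-scale inputs describe the intended scenario.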

Table 4 Sample size calculation parameter values used for the Expedited Partner Therapy trial (binary outcome)

Sample size parameter                       | Assumed value
Number of sequences                         | 4
Number of periods                           | 5
Number of clusters                          | 22 (6, 6, 6, 4 for each sequence)
Cluster size per period                     | 305
Baseline rate                               | 0.076
Mean (absolute) difference (treatment effect) | 0.014
Type of design                              | Cross-sectional
Exchangeable correlation                    |
  Within-period ICC                         | 0.007
Nested exchangeable correlation             |
  Within-period ICC                         | 0.007
  CAC                                       | 0.50
Exponential decay                           |
  Within-period ICC                         | 0.007
  CAC (per period)                          | 0.70
Varying cluster sizes                       | No
Transition periods                          | No
Type of model                               | Linear mixed-effects regression or GEE
Type of treatment effect                    | Step change (time-averaged effect)
Cluster-treatment heterogeneity             | No
Type of time effect                         | Fixed categorical
Magnitude of time effect                    | None, small (0.007) or large (0.014). A time effect of, say, 0.007 means that the outcome is expected to decrease, even in the absence of any intervention, from 0.076 in the first period to 0.069 in the last period
Significance level                          | Two-sided, 5%

ICC, intra-cluster correlation coefficient; CAC, Cluster Autocorrelation Coefficient; GEE, generalized estimating equations.

Although users may select the calculator most convenient to them, facilitated by our Shiny app, a good rule of thumb when not all requirements can be met is to prioritize those factors that have the most impact on power in their scenario. As our examples illustrate, an important consideration is the assumed correlation structure: assuming an overly simple correlation structure (e.g. exchangeable) when a more complex correlation structure (e.g. block exchangeable or exponential decay) is appropriate can lead to power being overestimated.49 Ideally, ICC values should be informed by analysing existing data (e.g. historical or routinely collected data) under a similar setting over a similar study duration. A practical approach is to select the best-fitting correlation structure based on information criteria, i.e. AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion).50 Regardless of the availability of prior data, it is always good practice to check the sensitivity of the calculations by assuming a range of plausible ICC values. When there are no prior data to inform the choice of correlation structure, we would recommend using a nested or block exchangeable correlation for binary outcomes and exponential or proportional decay for continuous outcomes. For non-continuous outcomes, the type of approximation used is also critical: users must consider whether strong time effects are anticipated and choose a calculator that can accommodate such effects. As shown in our example, with binary outcomes, the normal approximation works poorly if the baseline proportion is away from 0.5 and the time effect is not null. A similar observation was made by Zhou et al.28 Finally, for continuous outcomes with a small number of clusters, using the t-test is generally advised, but users must be aware that different degrees-of-freedom options are implemented in different calculators and the best choice of degrees of freedom remains unclear.
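The influence of the degrees-of-freedom choice can be illustrated with a small Python sketch. It assumes one common approximation, in which the normal critical value is simply replaced by a t quantile and power is read from a central t-distribution; whether each calculator uses exactly this rule is an assumption, and all function names are hypothetical. The ratio of effect to standard error is backed out from the z-test power reported in Table 3, and the t-distribution is evaluated numerically so that only the Python standard library is needed.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF via composite Simpson integration of the density from 0 to x."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile by bisection on [0, 200]; intended for p > 0.5."""
    lo, hi = 0.0, 200.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_power_from_z_power(z_power, df, alpha=0.05):
    """Approximate t-test power implied by a given two-sided z-test power."""
    std_normal = NormalDist()
    # effect / standard error implied by the z-test power
    ratio = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(z_power)
    return t_cdf(ratio - t_quantile(1 - alpha / 2, df), df)


# z-test powers under exchangeable correlation, taken from Table 3
print(f"DF = 4:  {t_power_from_z_power(0.794, 4):.3f}")
print(f"DF = 19: {t_power_from_z_power(0.796, 19):.3f}")
```

Under these assumptions the two results should fall close to the exchangeable-correlation t-test entries in Table 3, which suggests that the gap between the calculators is driven largely by the degrees of freedom rather than by the variance calculation.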

Table 5 Power calculated under the main software packages for the Expedited Partner Therapy trial

Correlation structure           | SHINYCRT^a | SWDPWR^b, no time effect^c | SWDPWR^b, small time effect (0.007) | SWDPWR^b, large time effect (0.014) | SWCRTDESIGN^a | CRTFASTGEEPWR^b, no time effect | CRTFASTGEEPWR^b, small time effect (0.007) | CRTFASTGEEPWR^b, large time effect (0.014)
Exchangeable correlation        | 0.806 | 0.995 | 0.819 | 0.837 | 0.803 | 0.803 | 0.819 | 0.837
Nested exchangeable correlation | 0.543 | 0.904 | 0.560 | 0.580 | 0.540 | 0.542 | 0.560 | 0.580
Exponential decay               | 0.570 | N/A   | N/A   | N/A   | N/A   | 0.568 | 0.586 | 0.607

^a Normal approximation using mixed-effects model. Magnitude of time effects cannot be accommodated.
^b Generalized estimating equations (GEE) was used for calculation. With cluster-period size of >150, only GEE can be used in SWDPWR.
^c By default, coefficients for time are excluded from the underlying model when no time effect is specified.
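To make the normal-approximation entries more concrete, the Python sketch below builds the stepped-wedge design matrix for the EPT trial and evaluates the closed-form variance of Hussey and Hughes (reference 2) for a cross-sectional design with exchangeable correlation, then repeats the calculation over a range of ICC values. It is an illustration rather than the implementation used by any reviewed calculator: the function names are hypothetical, and the total variance is assumed to be the average of the binomial variances under the control and intervention proportions.

```python
import math
from statistics import NormalDist


def stepped_wedge_design(clusters_per_sequence, n_periods):
    """One 0/1 row per cluster; sequence s (0-based) is treated from period s + 1."""
    rows = []
    for s, n_clusters in enumerate(clusters_per_sequence):
        row = [1 if j > s else 0 for j in range(n_periods)]
        rows.extend([row] * n_clusters)
    return rows


def hussey_hughes_power(design, n_per_cell, effect, total_var, icc, alpha=0.05):
    """Two-sided z-test power for a cross-sectional design, exchangeable correlation."""
    n_clusters, n_periods = len(design), len(design[0])
    tau2 = icc * total_var                        # between-cluster variance
    sigma2 = (1 - icc) * total_var / n_per_cell   # variance of a cluster-period mean
    u = sum(sum(row) for row in design)
    w = sum(sum(row[j] for row in design) ** 2 for j in range(n_periods))
    v = sum(sum(row) ** 2 for row in design)
    numerator = n_clusters * sigma2 * (sigma2 + n_periods * tau2)
    denominator = ((n_clusters * u - w) * sigma2
                   + (u ** 2 + n_clusters * n_periods * u
                      - n_periods * w - n_clusters * v) * tau2)
    se = math.sqrt(numerator / denominator)
    std_normal = NormalDist()
    return std_normal.cdf(abs(effect) / se - std_normal.inv_cdf(1 - alpha / 2))


# EPT planning values from Table 4; the variance choice is an assumption.
p_control, p_treated = 0.076, 0.076 - 0.014
total_var = 0.5 * (p_control * (1 - p_control) + p_treated * (1 - p_treated))
design = stepped_wedge_design([6, 6, 6, 4], n_periods=5)

# Sensitivity of power to the assumed within-period ICC
for icc in (0.003, 0.007, 0.015):
    power = hussey_hughes_power(design, 305, 0.014, total_var, icc)
    print(f"ICC = {icc:.3f}: power = {power:.3f}")
```

With the ICC of 0.007 from Table 4, this sketch should give a value close to the normal-approximation entries under exchangeable correlation in Table 5. The loop is a simple way to carry out the recommended check across a range of plausible ICC values.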

Ethics approval
Not applicable.

Data availability
There are no new data associated with this article.

Supplementary data
Supplementary data are available at IJE online.

Author contributions
Y.O. led the writing of the manuscript, developed the idea with M.T., extracted the software information, created tables and figures, and developed the Shiny app. M.T. conceived of the ideas, helped with data extraction and co-led the writing. F.L. participated in the development of the idea and helped with data extraction. J.S.P. critically reviewed the methods and ideas. All authors contributed to writing, drafting and editing the manuscript.

Funding
M.T. and F.L. are supported by the National Institute on Aging (NIA) of the National Institutes of Health (NIH) under Award Number U54AG063546, which funds NIA Imbedded Pragmatic Alzheimer’s Disease and AD-Related Dementias Clinical Trials Collaboratory (NIA IMPACT Collaboratory). Y.O. is funded by the Ontario SPOR SUPPORT Unit, which is supported by the Canadian Institutes of Health Research and the Province of Ontario. J.S.P. and F.L. are supported through Patient-Centered Outcomes Research Institute® (PCORI® Award ME-2019C1-16196). The statements presented in this article are solely the responsibility of the authors and do not necessarily represent the views of the NIH nor PCORI®, its Board of Governors or Methodology Committee.

Acknowledgements
We thank Ying Zhang (zying@live.unc.edu) who provided instructions on how to specify the parameters in CRTFASTGEEPWR.

Conflict of interest
Y.O. is the author of the CRTpowerdist calculator; M.T. is a co-author of the SHINYCRT calculator; F.L. is co-author of SWDPWR and CRTFASTGEEPWR; J.S.P. is author of CRTFASTGEEPWR. J.S.P. has received a stipend for service as a merit reviewer from PCORI®. J.S.P. did not serve on the Merit Review panel that reviewed this project.

References
1. Brown CA, Lilford RJ. The stepped wedge trial design: a systematic review. BMC Med Res Methodol 2006;6:54.
2. Hussey MA, Hughes JP. Design and analysis of stepped wedge cluster randomized trials. Contemp Clin Trials 2007;28:182–91.
3. Mdege ND, Man M-S, Taylor Nee Brown CA et al. Systematic review of stepped wedge cluster randomized trials shows that design is particularly used to evaluate interventions during routine implementation. J Clin Epidemiol 2011;64:936–48.
4. Hemming K, Haines TP, Chilton PJ et al. The stepped wedge cluster randomised trial: rationale, design, analysis, and reporting. BMJ 2015;350:h391.
5. Hargreaves JR, Copas AJ, Beard E et al. Five questions to consider before conducting a stepped wedge trial. Trials 2015;16:350.
6. Prost A, Binik A, Abubakar I et al. Logistic, ethical, and political dimensions of stepped wedge trials: critical review and case studies. Trials 2015;16:351.
7. Hemming K, Taljaard M. Reflection on modern methods: when is a stepped-wedge cluster randomized trial a good study design choice? Int J Epidemiol 2020;49:1043–52.
8. Hemming K, Taljaard M, Grimshaw J. Introducing the new CONSORT extension for stepped-wedge cluster randomised trials. Trials 2019;20:68.
9. Copas AJ, Lewis JJ, Thompson JA et al. Designing a stepped wedge trial: three main designs, carry-over effects and randomisation approaches. Trials 2015;16:352.
10. Kasza J, Hooper R, Copas A et al. Sample size and power calculations for open cohort longitudinal cluster randomized trials. Stat Med 2020;36:1871–83.
11. Hooper R, Copas A. Stepped wedge trials with continuous recruitment require new ways of thinking. J Clin Epidemiol 2019;116:161–66.
12. Grantham KL, Kasza J, Heritier S et al. Accounting for a decaying correlation structure in cluster randomized trials with continuous recruitment. Stat Med 2019;38:1918–34.
13. Kasza J, Forbes AB. Information content of cluster–period cells in stepped wedge trials. Biometrics 2019;75:144–52.
14. Hemming K, Lilford R, Girling AJ. Stepped-wedge cluster randomised controlled trials: a generic framework including parallel and multiple-level designs. Stat Med 2015;34:181–96.
15. Hooper R, Kasza J, Forbes A. The hunt for efficient, incomplete designs for stepped wedge trials with continuous recruitment and continuous outcome measures. BMC Med Res Methodol 2020;20:279.
16. Donner A, Koval JJ. Design considerations in the estimation of intraclass correlation. Ann Hum Genet 1982;46:271–77.
17. Murray DM, Hannan PJ, Wolfinger RD et al. Analysis of data from group-randomized trials with repeat observations on the same groups. Statist Med 1998;17:1581–600.
18. Hooper R, Teerenstra S, de Hoop E et al. Sample size calculation for stepped wedge and other longitudinal cluster randomised trials. Stat Med 2016;35:4718–28.
19. Girling AJ, Hemming K. Statistical efficiency and optimal design for stepped cluster studies under linear mixed effects models. Stat Med 2016;35:2149–66.
20. Martin J, Girling A, Nirantharakumar K et al. Intra-cluster and inter-period correlation coefficients for cross-sectional cluster randomised controlled trials for type-2 diabetes in UK primary care. Trials 2016;17:402.
21. Li F. Design and analysis considerations for cohort stepped wedge cluster randomized trials with a decay correlation structure. Stat Med 2020;39:438–55.
22. Kasza J, Hemming K, Hooper R et al. Impact of non-uniform correlation structure on sample size and power in multiple-period cluster randomised trials. Stat Methods Med Res 2019;28:703–16.

23. Li F, Turner EL, Preisser JS. Sample size determination for GEE analyses of stepped wedge cluster randomized trials. Biometrics 2018;74:1450–58.
24. Li F, Hughes JP, Hemming K et al. Mixed-effects models for the design and analysis of stepped wedge cluster randomized trials: an overview. Stat Methods Med Res 2021;30:612–39.
25. Hughes JP, Granston TS, Heagerty PJ. Current issues in the design and analysis of stepped wedge trials. Contemp Clin Trials 2015;45:55–60.
26. Hemming K, Taljaard M, Forbes A. Modeling clustering and treatment effect heterogeneity in parallel and stepped-wedge cluster randomized trials. Stat Med 2018;37:883–98.
27. Hemming K, Kasza J, Hooper R et al. A tutorial on sample size calculation for multiple-period cluster randomized parallel, cross-over and stepped-wedge trials using the Shiny CRT Calculator. Int J Epidemiol 2020;49:979–95.
28. Zhou X, Liao X, Kunz LM et al. A maximum likelihood approach to power calculations for stepped wedge designs of binary outcomes. Biostatistics 2020;21:102–21.
29. Preisser JS, Young ML, Zaccaro DJ et al. An integrated population-averaged approach to the design, analysis and sample size determination of cluster-unit trials. Stat Med 2003;22:1235–54.
30. Grantham KL, Forbes AB, Heritier S et al. Time parameterizations in cluster randomized trial planning. The American Statistician 2020;74:184–89.
31. NCSS LLC. PASS 2021 Power Analysis and Sample Size Software. Kaysville, UT, USA. https://www.ncss.com/software/pass/ (20 December 2021, date last accessed).
32. Statsols. nQuery Sample Size and Power Calculation. Cork: Statistical Solutions Ltd. https://www.statsols.com (20 December 2021, date last accessed).
33. SWGRT Calculator—Research Methods Resources: National Institutes of Health. https://researchmethodsresources.nih.gov/ (17 March 2022, date last accessed).
34. Chen J, Zhou X, Li F et al. swdpwr: A SAS macro and an R package for power calculations in stepped wedge cluster randomized trials. Comput Methods Programs Biomed 2022;213:106522.
35. Voldal EC, Hakhu NR, Xia F et al. swCRTdesign: an R Package for stepped wedge trial design and analysis. Comput Methods Programs Biomed 2020;196:105514.
36. Xia F, Hughes JP, Voldal EC et al. Power and sample size calculation for stepped-wedge designs with discrete outcomes. Trials 2021;22:598.
37. Preisser JS, John S. Preisser—Software. CRTFASTGEEPWR. http://www.bios.unc.edu/~preisser/personal/crtfastgeepwr/ (18 December 2021, date last accessed).
38. Zhang Y, Preisser JS, Li F, Turner EL, Rathouz PJ. %CRTFASTGEEPWR: a SAS macro for power of the generalized estimating equations of multi-period cluster randomized trials with application to stepped wedge designs. arXiv preprint, arXiv:2205.14532, 28 May 2022, preprint: not peer reviewed.
39. Stata Packages—John Gallis. https://sites.duke.edu/johngallis/stata-packages/ (29 November 2021, date last accessed).
40. Davis-Plourde K, Taljaard M, Li F. Sample size considerations for stepped wedge designs with subclusters. Biometrics 2021; doi:10.1111/biom.13596.
41. Ouyang Y, Xu L, Karim ME et al. CRTpowerdist: An R package to calculate attained power and construct the power distribution for cross-sectional stepped-wedge and parallel cluster randomized trials. Comput Methods Programs Biomed 2021;208:106255.
42. Ouyang Y, Karim ME, Gustafson P et al. Explaining the variation in the attained power of a stepped-wedge trial with unequal cluster sizes. BMC Med Res Methodol 2020;20:166.
43. Mildenberger P, Marini F. SteppedPower: Power Calculation for Stepped Wedge Designs. 2021. https://CRAN.R-project.org/package=SteppedPower (22 September 2021, date last accessed).
44. Tirlea L, Truby H, Haines TP. Pragmatic, randomized controlled trials of the Girls on the Go! Program to improve self-esteem in girls. Am J Health Promot 2016;30:231–41.
45. Golden MR, Kerani RP, Stenger M et al. Uptake and population-level impact of expedited partner therapy (EPT) on Chlamydia trachomatis and Neisseria gonorrhoeae: the Washington State Community-Level Randomized Trial of EPT. PLoS Med 2015;12:e1001777.
46. Korevaar E, Kasza J, Taljaard M et al. Intra-cluster correlations from the CLustered OUtcome Dataset bank to inform the design of longitudinal cluster trials. Clin Trials 2021;18:529–40.
47. Tian Z, Preisser JS, Esserman D et al. Impact of unequal cluster sizes for GEE analyses of stepped wedge cluster randomized trials with binary outcomes. Biom J 2022;64:419–39.
48. Li F, Yu H, Rathouz PJ et al. Marginal modeling of cluster-period means and intraclass correlations in stepped wedge designs with binary outcomes. Biostatistics 2021; doi:10.1093/biostatistics/kxaa056.
49. Kasza J, Forbes AB. Inference for the treatment effect in multiple-period cluster randomised trials when random effect correlation structure is misspecified. Stat Methods Med Res 2019;28:3112–22.
50. Rezaei-Darzi E, Kasza J, Forbes A et al. Use of information criteria for selecting a correlation structure for longitudinal cluster randomised trials. Clin Trials (In press).
