
Article in Multivariate Behavioral Research · January 2022
DOI: 10.1080/00273171.2022.2038056


Running head: CONDUCTING POWER ANALYSIS WITH HYBRIDPOWER

Conducting Bayesian-classical hybrid power analysis with R package hybridpower

Joonsuk Park
Independent Researcher
and
Jolynn Pek
The Ohio State University

Joonsuk Park, Independent Scholar; Jolynn Pek, Department of Psychology, The Ohio State University

Author note: Correspondence concerning this article and bug reports about hybridpower
should all be addressed to the first author, Joonsuk Park (vergilius85@gmail.com). Parts of this
work have been presented at the Annual Meeting of the Society for Multivariate Experimental
Psychology in Albuquerque, New Mexico by Joonsuk Park.

Accepted for publication at Multivariate Behavioral Research on January 13, 2022.



Abstract

There are several approaches to incorporating uncertainty in power analysis. We review these approaches and highlight the Bayesian-classical hybrid approach that has been implemented in the R package hybridpower. Calculating Bayesian-classical hybrid power circumvents the problem of local optimality, in which calculated power is valid if and only if the specified inputs are perfectly correct. hybridpower can compute classical and Bayesian-classical hybrid power for popular testing procedures including the t-test, correlation, simple linear regression, one-way ANOVA (with equal or unequal variances), and the sign test. Using several examples, we demonstrate features of hybridpower and illustrate how to elicit subjective priors, how to determine sample size from the Bayesian-classical approach, and how this approach is distinct from related methods. hybridpower can conduct power analysis for the classical approach and, more importantly, the novel Bayesian-classical hybrid approach that returns more realistic calculations by taking into account the local optimality that the classical approach ignores. For users unfamiliar with R, we provide a limited number of RShiny applications based on hybridpower to promote the accessibility of this novel approach to power analysis. We end with a discussion of future developments in hybridpower.



Conducting Bayesian-classical hybrid power analysis with R package hybridpower


Power analysis is central to research design and sample size determination (Cohen, 1988). For
any frequentist hypothesis testing procedure (e.g., independent samples t-test), statistical power
(1 − β), Type I error rate (α), sample size (N), true effect size (θ), and nuisance parameters are
functions of one another in a power analysis. The classical perspective to power analysis rests on
varying inputs of these functions to obtain N or power. Sample size for a study can be determined
by selecting a desired level of power (e.g., [1 − β] = .8), specifying α, and selecting values of the
effect size and nuisance parameters. Alternatively, power can be calculated from specifying N, α,
and values of the effect size and nuisance parameters.
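This four-way trade-off can be illustrated with base R's power.t.test() (a generic sketch, not part of hybridpower): fixing any three of power, α, N, and d solves for the fourth.

```r
# Fix power, alpha, and d; solve for the per-group sample size
power.t.test(power = .8, sig.level = .05, delta = .5, sd = 1,
             type = "two.sample")$n      # ~ 63.8 per group

# Fix n, alpha, and d; solve for power
power.t.test(n = 64, sig.level = .05, delta = .5, sd = 1,
             type = "two.sample")$power  # ~ .80
```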
Lipsey (1990) emphasized that selecting an appropriate effect size is the most challenging
aspect of power analysis. There are two effect size frameworks for frequentist power analysis
(Anderson, Kelley, & Maxwell, 2017; Du & Wang, 2016). The first framework emphasizes use
of a minimum effect size of interest (Jaeschke, Singer, & Guyatt, 1989; Kraemer & Blasey, 2015)
or the minimum clinically worthwhile difference (Spiegelhalter & Freedman, 1986; Djimeu &
Houndolo, 2016; Maynard & Dong, 2013; Bloom, 1995) that is often used in applied settings
(e.g., public policy). In this framework, effect sizes have meaningful scale whereby researchers
can theoretically determine a size of the effect worth detecting (e.g., 20% reduction in alcohol
use). Because the minimum effect size of interest is a fixed threshold, this approach need not con-
sider uncertainty about the unknown population effect size. The second framework, typically
applied in basic science settings in psychology, focuses on an effect size that is expected. In ap-
plied research, this framework could be adopted to find any difference if it exists (Spiegelhalter
& Freedman, 1986). Often, expected effect sizes used in power analysis to plan a new study are
informed by or estimated from data and come with sampling variability. Thus, recent develop-
ments more realistically take into account the sampling variability of estimates entered as in-
puts in power analysis (e.g., see Anderson et al., 2017; McShane & Böckenholt, 2015; McShane,
Böckenholt, & Hansen, 2020; Perugini, Gallucci, & Costantini, 2014).
Instead of employing an estimated effect size, θ̂, Pek and Park (2019) introduce a Bayesian-
classical hybrid approach to power analysis where a best guess effect size is elicited from subjec-
tive expert opinion to constrain uncertainty about the unknown effect size (see also Chen, Fraser,
& Cuddeback, 2018; Spiegelhalter & Freedman, 1986; Weiss, 1997; Wang & Gelfand, 2002).
The hybrid approach blends the Bayesian concept of quantifying uncertainty in beliefs about θ
with frequentist significance testing procedures, resulting in a hybrid framework that is neither
purely frequentist or Bayesian. This method was developed to formalize how researchers ap-
proach power analysis when there is little information about θ. Without knowing the true value
of θ, researchers often use different values of θ to calculate different values of power for some
N to address the problem of local optimality (i.e., the calculations are accurate when the inputs
are exactly correct). By representing subjective uncertainty about θ from a set of plausible values
with a distribution, π(θ), researchers can more easily make design decisions from a distribution
of power values calculated from the hybrid approach compared to a set of power values calculated
from the classical approach that does not formally communicate uncertainty.
The purpose of this paper is to illustrate the advantage of conducting Bayesian-classical hy-
brid power analysis compared to related approaches using the R package hybridpower.1 Some
functionalities of hybridpower can also be executed via RShiny, which allows users unfamil-
iar with R to make use of the functions in hybridpower through a web application.2 The pack-
age currently supports popular null hypothesis significance testing (NHST) procedures including
t-tests, linear models (i.e., ANOVA and simple linear regression), correlations, and categorical
data analysis (e.g., chi-square test of independence). The package also provides new function-
alities that have not been available in traditional power analysis tools, including power analysis
procedures for non-parametric tests such as the sign test and Monte Carlo power analysis for designs that do not have closed form solutions (e.g., Welch’s ANOVA).

1 The R package hybridpower can be downloaded from GitHub at https://github.com/JoonsukPark/RHybrid. Every code snippet in this article can be found in the file examples.R under the R folder at the package root; the file is fully reproducible and also provides example code for other statistical tests that are not described in this article.

2 There are currently four RShiny applications that implement the Bayesian-classical hybrid approach for the independent samples t-test, the sign test, the fixed-effects one-way ANOVA, and the repeated measures one-way ANOVA. See https://joonsukpark.shinyapps.io/ttest/, https://joonsukpark.shinyapps.io/signtest/, https://joonsukpark.shinyapps.io/feanova/, and https://joonsukpark.shinyapps.io/rmanova/, respectively. By default, the t-test and the sign test RShiny applications will reproduce the examples described here.

In the following sections,
we first review the classical frequentist, Bayesian, and the Bayesian-classical hybrid approach to
power analysis to highlight their conceptual distinction. Next, we describe the usage and structure
of hybridpower, demonstrating features of the package using the independent samples t-test as
an example. Then, we illustrate how to conduct power analysis from the classical and Bayesian-
classical hybrid approaches for the sign test and the one-way ANOVA with unequal variances
(i.e., Welch’s ANOVA). The sign test example demonstrates the capacity of the package to com-
pute power for a non-parametric test, and Welch’s one-way ANOVA highlights the Monte Carlo
(MC) method to power analysis. In these examples, we emphasize how to elicit subjective priors,
how to determine sample size from the distribution of power values that quantify subjective un-
certainty about θ, and how the Bayesian-classical hybrid approach is distinct from related meth-
ods. To conclude, we outline future extensions of hybridpower.
Different Perspectives to Power Analysis
Frequentist Methods
A frequentist NHST approach to power analysis quantifies the long-run performance of a pro-
cedure in rejecting the null hypothesis assuming the alternative is true. The unknown effect size,
θ, is fixed and cannot have its own distribution, i.e., π(θ) cannot be specified. However, when an
estimated effect size, θ̂, is used to inform of the unknown population effect size, θ, this estimate
has sampling variability (quantified by its standard error, S.E.θ̂ ). Frequentist methods that take
into account sampling error incorporate the sampling distribution of θ̂ by invoking a hierarchical
structure (e.g., Anderson et al., 2017; Taylor & Muller, 1995; McShane & Böckenholt, 2015), and
tend to recommend using the expectation (mean) of the resulting distribution of power (and sam-
ple size) values as the target value of interest. Alternatively, Perugini et al. (2014) recommend us-
ing a conservative lower bound of the 60% CI (i.e., .2 quantile) to guard against overly optimistic
values of power or calculated sample size.
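As a rough numerical sketch of the safeguard idea (the estimate d̂ = .5 and standard error .15 below are assumed for illustration, not taken from the text), the point estimate is replaced by the lower bound of its 60% CI before a standard power calculation:

```r
# Assumed inputs for illustration: an estimated effect size and its SE
d_hat <- .5
se_d  <- .15

# Safeguard effect size: lower bound of the 60% CI (.2 quantile)
d_safe <- d_hat - qnorm(.80) * se_d   # ~ .37

# Classical power at the safeguarded (smaller) effect size
power.t.test(n = 60, delta = d_safe, sd = 1, type = "two.sample")$power
```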
Bayesian Methods
Bayesian approaches to power analysis tend to quantify the long-run performance of a Bayesian
hypothesis testing procedure (e.g., using the Bayes Factor [BF]; Kass & Raftery, 1995). For ex-
ample, in BF design analysis (Schönbrodt & Wagenmakers, 2018; Stefan, Gronau, Schönbrodt,
& Wagenmakers, 2019) researchers quantify the probability of obtaining conclusive evidence for
a given sample size using a BF that directly compares the null against the alternative hypothesis.
Bayesian hypothesis testing procedures are distinct from frequentist counterparts in that they in-
volve an analysis prior that quantifies researchers’ best guess of the unknown population param-
eter θ, which is applied in data analysis. The outcome of Bayesian data analysis is the posterior
distribution, which can be used to quantify the strength of evidence for θ given the data. The anal-
ysis prior is usually highly uninformative to assuage the skeptical audience (Kruschke, 2013; p.
593) such that the posterior distribution reflects the data more so than the prior. In fixed sample
BF design analysis, power analysis involves obtaining the distribution or expected value of BF for
some N, θ, and an uninformative analysis prior. Although the BF is most similar to NHST (e.g.,
see Jeon & De Boeck, 2017), power analysis has also been conceived for Bayesian estimation, in
which inference is made using the posterior distribution (e.g., see Kruschke, 2015, Chapter 13).
Bayesian-Classical Hybrid Methods
Unlike Bayesian approaches that quantify the long-run performance of a purely Bayesian test-
ing procedure, Bayesian-classical hybrid approaches quantify the performance of a frequentist
procedure with the aid of a Bayesian design prior. In a hybrid approach, π(θ) denotes an infor-
mative design prior instead of an analysis prior, which reflects the researcher’s subjective confi-
dence in the value of unknown θ before observing the data (Beavers & Stamey, 2012; O’Hagan
& Stevens, 2001; Wang & Gelfand, 2002; cf. Du & Wang, 2016). In contrast, an analysis prior is
used after the data are collected to test specific hypotheses. The uncertainty represented in π(θ) is
epistemic and due to a lack of knowledge about the value of θ. With full information on the popu-
lation or complete knowledge, π(θ) will converge to a point mass and reproduce calculations from
the classical approach. By incorporating a design prior in classical power analysis, the Bayesian-
classical hybrid approach formally expresses subjective uncertainty about not knowing the true
value of θ, resulting in a distribution of power values. Compare this with the aforementioned fre-
quentist methods in which the distribution of power estimates reflect uncertainty due to sampling
variability. Combining a statistical decision rule, frequentist or Bayesian, with prior information
does not violate the principle of Bayesian inference, but is a special case of the standard Bayesian
approach to pre-posterior analysis (Spiegelhalter & Freedman, 1986).
The Bayesian-classical hybrid approach to power analysis incorporates at least two sources of
uncertainty: (a) sampling variability that is expressed under the classical approach with the sam-
pling distribution of the test statistic, and (b) subjective epistemic uncertainty of not knowing the
true value of θ that is manifest in a distribution of power values. Note that (a) has typically been
considered in power analysis whereas (b) has not, which is the problem we address via the use
of π(θ). Due to the lack of closed-form solutions, however, Bayesian-classical power is usually
calculated by sampling effect sizes from π(θ), computing power for each sampled value of θ, and
collecting the power estimates to construct a distribution of power. With such a distribution, the
required sample size can be determined from the mean (expected) power value (called assurance
in the Bayesian literature; Chen et al., 2018; O’Hagan & Stevens, 2001; O’Hagan, Stevens, &
Campbell, 2005) or other summary quantities such as the median (.5 quantile), quartiles, or spe-
cific quantiles of the power distribution (e.g., .2 or .8 quantiles). Following Du and Wang (2016),
we also recommend the use of assurance level to inform the study design. An assurance level
is related to a lower bound of a confidence interval (e.g., see Perugini et al., 2014) or a quantile
value. Specifically, the assurance level in the context of power analysis is defined as the prob-
ability (area under the curve that represents the distribution of power) of obtaining power of at
least some lower bound value. For example, suppose a researcher is interested in a lower bound
of .7 power, then the area of the distribution of power above .7 is the assurance level. If the assur-
ance level is 65%, then there is 65% probability that power values will be greater than or equal
to .7 power.3 In terms of interpretation, the probability carries a subjective meaning because the
distribution of power has been constructed with respect to the design prior. Visualizing the distri-
bution of power to examine its shape and dispersion also informs on the uncertainty about power.
3 For clarity, we denote a power value as a number below 1 and an area under the curve as a percentage.
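The Monte Carlo recipe just described can be sketched in a few lines of base R, using power.t.test() in place of hybridpower's internal routines (the normal design prior and n = 60 per group below are illustrative choices):

```r
set.seed(1)
n_prior <- 1000
theta <- rnorm(n_prior, mean = .5, sd = .44)  # draws from the design prior pi(theta)

# Classical power at each sampled effect size (n = 60 per group);
# abs() because a two-sided t-test has equal power for +d and -d
pw <- sapply(theta, function(d)
  power.t.test(n = 60, delta = abs(d), sd = 1, type = "two.sample")$power)

mean(pw)                       # assurance: expected power over the prior
mean(pw >= .7)                 # assurance level for a lower bound of .7
quantile(pw, c(.25, .5, .75))  # quartiles of the power distribution
```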

When researchers consider the variability in power distributions while determining sample size of
a study, they naturally take into account sampling variability and uncertainty about the value of θ
(Pek & Park, 2019).
When to use the hybrid Bayesian-classical approach. The classical approach to power anal-
ysis for frequentist procedures (Cohen, 1988) produces a single power or sample size value that
does not realistically reflect uncertainty in research. Stated differently, the classical approach
suffers from local optimality in that power calculations are accurate only when specified inputs
such as the effect size are perfectly correct. Because effect sizes are unknown, classically calcu-
lated power tends to be suboptimal. Ignoring uncertainty may be justified by using a minimum
effect size of interest, in which strong arguments support the choice of the minimum value of θ.
If an effect size estimate is used as input for power analysis, then methods that incorporate uncer-
tainty due to sampling such as safeguard power (Perugini et al., 2014), the power calibrated effect
size method (McShane & Böckenholt, 2015), or the bias- and uncertainty-corrected approach
(Anderson et al., 2017) should be employed. These approaches quantify uncertainty due to sam-
pling variability of the estimate, θ̂.
When frequentist procedures are planned for data analysis but little is known about the effect
size θ, we recommend the Bayesian-classical hybrid approach. This approach uses a Bayesian
design prior to take into account a lack of knowledge or uncertainty about the value of θ and is
distinct from the aforementioned frequentist methods because the uncertainty in the distribution
of power values (and determined N) reflects subjective uncertainty absent of data. By formalizing
uncertainty with a design prior, the Bayesian-classical approach addresses the limitation of local
optimality inherent in the classical approach by generating a distribution of plausible power val-
ues associated with the distribution of plausible effect sizes. There is an additional practical ben-
efit of using the Bayesian-classical hybrid approach; by employing a design prior, recommended
sample sizes from large effect sizes are blended with those from small effect sizes, which tends to
yield a smaller required sample size than simply using a minimum effect size or picking a single
value from a set of calculations from a series of plausible θ values. With the Bayesian-classical
approach to power analysis, the researcher can thus design a study that can hedge against uncer-
tainty about the unknown value of the effect size.
The Bayesian-classical hybrid method is distinct from a Bayesian approach (e.g., BF design
analysis). The Bayesian approach assumes that the researcher will engage in a Bayesian hypoth-
esis testing procedure whereas the Bayesian-classical hybrid method is mostly frequentist in that
it uses Bayesian design priors merely as a tool to take into account subjective uncertainty about
the size of the unknown effect. When data have been collected, subsequent analyses following
from the Bayesian-classical power analysis are frequentist in which analysis priors are unneces-
sary. In contrast, Bayesian approaches require specification of an analysis prior for data analysis.
Thus, the Bayesian-classical method is for frequentist testing whereas a Bayesian approach is
used when BFs or the posterior distribution (which require analysis priors) are computed for in-
ference.
Computing Power with hybridpower
Structure and Usage of the Package
hybridpower can compute classical power as well as Bayesian-classical hybrid power for the
same testing procedure. The power analysis procedures within hybridpower, their acceptable
priors for the Bayesian-classical hybrid approach, and their required inputs are summarized in
Tables 1 and 2, respectively. The package can also summarize and visualize distributions of power
estimates from the Bayesian-classical hybrid approach that we describe below.
There are two steps to calculate power with procedures in hybridpower. First (Step 1), a
store of inputs for a power analysis procedure needs to be created.4 For example, a store of in-
puts can be specified with α = .03, N = 25 and d = .8 for hp_ttest, the procedure for computing
power for a t-test in hybridpower. For the same type of test, the necessary inputs are different
between the classical and the Bayesian-classical approaches, although they can be put in the same
store (illustrated in a later example). Second (Step 2), a power calculation routine is called and the
power values are generated.5

4 Such a store of inputs is called an instance in hybridpower, and in computer science in general.

5 Such a routine is called a method in hybridpower. Methods are almost identical to functions except that methods are always attached to another variable with $ in R.

For example, suppose that there is a store of input for a power calculation, x, that contains input values common to the classical and the hybrid approaches (see Table
1) as well as inputs specific to each approach (see Table 2). The user can then compute either clas-
sical power by running x$classical_power(), or Bayesian-classical hybrid power by running
x$hybrid_power(). In the sections to follow, we demonstrate three different power calculation
processes available in hybridpower for the simple independent samples t-test, namely classical,
Bayesian-classical hybrid, and both.
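The two-step pattern can be mimicked in plain base R (a hypothetical illustration of the store-plus-method idea, not hybridpower's actual implementation):

```r
# Inputs live in a list (the "store"); routines attached with $ act on
# them (the "methods")
make_store <- function(n, d) {
  store <- list(n = n, d = d)
  store$classical_power <- function()
    power.t.test(n = store$n, delta = store$d, sd = 1)$power
  store
}

x <- make_store(n = 60, d = .5)  # Step 1: create the store of inputs
x$classical_power()              # Step 2: call a routine attached with $
```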
Classical power. Consider the simplest frequentist example of calculating power absent of
uncertainty about the effect size for an independent samples t-test. In hybridpower, all three
types of t-tests (i.e., one-sample, paired samples, and independent samples) can be computed
with a single store of inputs. Let such a store be called x_classical. Suppose that a visual cog-
nitive researcher is interested in whether objects grouped together versus objects grouped apart
would create differences in short-term memory in the form of recall errors in a new paradigm (cf.
Corbett, 2017). In particular, she expects to find a difference of .5 in the Cohen’s d scale between
the two groups (objects close together vs. objects spaced apart) and creates a store with the fol-
lowing code:

x_classical <- hp_ttest$new(
  ns = seq(10, 90, 10),
  design = 'two.sample',
  d = .5
)

In the code snippet above, new() is the function to call when creating a new store for power anal-
ysis. Note that the function is attached to the power analysis procedure itself, hp_ttest. In terms
of arguments, ns is a list of sample sizes in each group starting from N = 10 and ending with N =
90 in increments of 10.6 Next, design is the type of experimental design, one.sample, paired,
or two.sample (for the independent samples design). Finally, d is the input effect size in the scale of Cohen’s d. hybridpower only takes standardized effect sizes (i.e., d) as input for classical power analysis. Although not set explicitly, the default alternative hypothesis is two.sided and the default Type I error rate is .05 (see Table 2). These defaults can be overridden with explicit specifications (e.g., alpha = .005, alt = ’one.sided’).

6 The current implementation only supports balanced designs, and we expect future implementations to include imbalanced designs.

Finally, classical power is computed (Step 2) by running the following command:

x_classical$classical_power()

The respective power values for each sample size between 10 and 90 are:

[1] .1838375 .3377084 .4778410 .5981316 .6968888 .7752644 .8358218
[8] .8816023 .9155872

These values closely match the power values calculated using the pwr package (Champely,
2018) or values from power tables in Cohen (1988). In general, classical power() can be used
to compute classical power for other procedures listed in Table 1. These power values will also be
consistent with the minimum effect size approach if the minimum effect size of interest is indeed
d = .5. As a reasonable design, the researcher could plan to collect N = 60 in each group, or a
total N of 120, for 77.5% power to detect d = .5.
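As a cross-check, base R's power.t.test() reproduces these classical values (assuming equivalent inputs), e.g. for N = 60 per group:

```r
# Classical power for d = .5 at n = 60 per group, two-sided alpha = .05
power.t.test(n = 60, delta = .5, sd = 1, sig.level = .05,
             type = "two.sample")$power   # ~ .775
```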
Bayesian-classical hybrid power. Unlike the classical approach, the Bayesian-classical hy-
brid approach to power requires the researcher to pick a design prior for the unknown effect size.
Because the hybrid approach computes power via Monte Carlo simulations, this approach re-
quires additional inputs such as number of draws from the design prior (n_prior), and the func-
tional form of the prior distribution π(·) (prior) and its parameters. A list of supported prior dis-
tributions is provided in Table 1. The process of creating a prior distribution based on subject
matter experts’ judgments is called prior elicitation. There are several prior elicitation meth-
ods that can be readily implemented (e.g., Zondervan-Zwijnenburg, Peeters, Depaoli, & Van de
Schoot, 2017; see also O’Hagan et al., 2006). For example, one can use the web-based tool called
MATCH, which is accessible at http://optics.eee.nottingham.ac.uk/match/uncertainty
.php (Morris, Oakley, & Crowe, 2014). The website supports a number of different prior elicita-
tion methods and we showcase the quartile method (Raiffa, 1968). Readers are strongly encour-
aged to explore other methods.
In the quartile method, the researcher decides on a median value for the effect of interest.
Next, the researcher chooses values for the first and third quartiles (Q1 and Q3) of the prior dis-
tribution (i.e., .25 and .75 quantiles) to quantify the extent of uncertainty about the median value.
Parameter values of π(θ) would then be calculated by the online application for a specified fam-
ily of distributions (e.g., normal) that best matches the three elicited numbers. Suppose that the
vision researcher chooses the median in Cohen’s d scale to be .5, Q1 = .2 and Q3 = .8, respec-
tively. These values imply that she is 50% confident that the unknown effect would lie between
.2 ≤ d ≤ .8. By setting the lower and upper limits of the prior to −3 and 3 (arbitrary boundary val-
ues of a z-distribution underlying Cohen’s d), the normal prior would have µd = .5 and σd = .44.7
These µd and σd values are associated with the normal (prior) distribution that most closely has
.2, .5 and .8 as its .25, .5, and .75 quantiles.
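The mapping from the three elicited quartiles to normal prior parameters can be approximated directly in R (a simplified stand-in for the fit MATCH performs; its exact output may differ slightly):

```r
# Elicited summaries on the Cohen's d scale
med <- .5; q1 <- .2; q3 <- .8

prior_mu    <- med                           # for a normal, median = mean
prior_sigma <- (q3 - q1) / (2 * qnorm(.75))  # normal IQR = 2 * 0.674 * sigma

round(c(prior_mu, prior_sigma), 2)  # .50 and .44
```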
Extending the classical power analysis to the Bayesian-classical hybrid approach in our exam-
ple, the vision researcher specifies the mean of the normal prior distribution as .5, and the stan-
dard deviation of the prior distribution as .44. The researcher continues to believe that d = .5 is
the expected (mean) value of the effect size, but now is 95% confident that d will fall between
.5 ± 2(.44). In hybridpower, prior_mu and prior_sigma stand for the prior mean and prior standard
deviation. It is unnecessary to specify d here because it stands for a fixed effect size, which is only
relevant to the classical approach to power analysis.8

Similar to computing classical power, two steps are required to compute Bayesian-classical
hybrid power. First, the researcher defines input values in a store called x hybrid with the follow-
ing code (Step 1):
7 The prior can be specified to follow other distributions, and we provide a more detailed discussion on choosing
the form of the prior distribution in the example on the sign test.
8 Unlike the classical approach, where the effect size can be specified only in the scale of d within hybridpower, the Bayesian-classical hybrid approach can also take unstandardized effect sizes when the standard deviation of the outcome, sigma, is specified. By default, sigma (the standard deviation of the population) is set to 1 such that the scale of the mean is in Cohen’s d. Note that prior_sigma is distinct from sigma; prior_sigma pertains to uncertainty about the unknown mean whereas sigma pertains to the expected variation in the actual data. Thus, they represent different types of uncertainties.

x_hybrid <- hp_ttest$new(
  ns = seq(10, 90, 10),
  design = 'two.sample',
  n_prior = 1000,
  prior = 'normal',
  prior_mu = .5,
  prior_sigma = .44
)

Bayesian-classical hybrid power is calculated with the next line of code (Step 2):

set.seed(1234)
x_hybrid$hybrid_power()

Executing these lines of code creates a distribution of power for each sample size and shows the
first six rows of the output; set.seed(1234) facilitates reproducibility although results might
differ slightly depending on the operating system of the computer. Calculated power values are
stored in x_hybrid as a data frame, which can be accessed by running x_hybrid$output. This
data frame has two columns, n and power, containing the sample sizes and the power values, re-
spectively. In this example, the power distribution for each sample size N is made up of 1,000 val-
ues as declared in n_prior = 1000. Thus, the data frame will have 9 levels of N, as determined
by ns = seq(10, 90, 10), each with 1,000 estimates of power, resulting in a data frame with
9,000 rows in total.
The distributions of power can be summarized by their respective means. By calling assurance()
attached to the store of inputs, the mean power values with respect to the prior of the effect size
for each N will be returned. That is, one can execute the following line to calculate assurances:

x_hybrid$assurance()

The assurances associated with N = 10, 20, 30, 40, 50, 60, 70, 80, and 90 are .257, .407, .498, .560,
.604, .638, .665, .686 and .704, respectively. If the researcher sought a mean power of .7, she
would plan a study with N = 90 per group, or a total of N = 180. Note that this assurance value is
much lower than the corresponding power for N = 90 calculated in the classical approach, .916,
which is due to the uncertainty about the effect size. Also, because a distribution of power values
tends to be skewed, we recommend examining quantiles of this distribution. Quantiles can be ob-
tained by specifying values in quantiles while creating the store of inputs. By default, the quan-
tiles are set to the five-number summary (Tukey, 1977), namely the minimum (0th percentile),
the first quartile, the median, the third quartile, and the maximum or 100th percentile. These de-
faults can be overridden by explicitly specifying the values in the quantiles argument when cre-
ating the store. For example, the default quantiles = c(0,.25,.5,.75,1) can be modified to
quantiles = c(.1,.3,.5,.7,.9). To calculate the quantiles, power_quantiles() is called
with the following code:

x_hybrid$power_quantiles()

Executing this line produces a data frame of the quantiles for the sample sizes. Note that
hybrid_power() has to be executed beforehand because power_quantiles() utilizes the power
values located in the output variable store. The default quantiles (i.e., five number summary) for
each sample size in this example are:

# A tibble: 9 x 6
# Groups: n [9]
  n     `0`    `.25`  `.5`  `.75` `1`
  <fct> <dbl>  <dbl>  <dbl> <dbl> <dbl>
1 10    .0250  .0790  .182  .375  .981
2 20    .0250  .122   .335  .666  1.00
3 30    .0250  .165   .474  .840  1.00
4 40    .0250  .207   .594  .928  1.00
5 50    .0250  .248   .693  .970  1.00
6 60    .0250  .289   .771  .988  1
7 70    .0250  .330   .832  .995  1
8 80    .0250  .369   .878  .998  1
9 90    .0250  .407   .913  .999  1

Based on these outputs, the vision researcher may choose N = 60 for each group or a total of
N = 120 with 50% confidence that power will lie in the range [.289, .988] with a median power
of .771. If the researcher wishes to obtain exactly .8 median power, she can make use of interpolation to obtain a rough calculation for sample size. For example, N = 60 and N = 70 give the median power values closest to .8. Using the two associated median power values and assuming a linear relationship between median power and N, she can calculate that a one-unit increment of N is associated with a (.832 − .771)/10 = .0061 increase in median power. Thus, a median power of approximately .8 would be expected with a sample size of N ≈ 64. Alternatively, the researcher can use a finer grid for N using hybridpower. For example, she can set the sample sizes from N = 61 to N = 69 with an increment of 1 and compute distributions of power to get a more precise estimate. Here, N = 65 is associated with a median power of .8.
Instead of focusing on the median value of power for designing a study, executing x_hybrid$summary() will summarize the distributions of power with the assurance, the standard deviation, and the five-number summary:

# A tibble: 9 x 8
n mean sd min Q1 median Q3 max
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 10 0.257 0.224 0.0250 0.0790 0.182 0.375 0.981
2 20 0.407 0.312 0.0250 0.122 0.335 0.666 1.00
3 30 0.498 0.344 0.0250 0.165 0.474 0.840 1.00
4 40 0.560 0.356 0.0250 0.207 0.594 0.928 1.00
5 50 0.604 0.361 0.0250 0.248 0.693 0.970 1.00
6 60 0.638 0.361 0.0250 0.289 0.771 0.988 1
7 70 0.665 0.360 0.0250 0.330 0.832 0.995 1
8 80 0.686 0.358 0.0250 0.369 0.878 0.998 1
9 90 0.704 0.355 0.0250 0.407 0.913 0.999 1

To visualize power distributions, boxplot() can be called to generate box plots for each N; it utilizes the power values stored in output. Thus, hybrid_power() also needs to be run before calling boxplot(). The code below generates the box plots in Figure 1.

x_hybrid$boxplot()

In Figure 1, the edges of the boxes communicate Q1 and Q3, the bar in the center of each box the median, and the diamond in the box the mean power (assurance). By convention, the whiskers extend to 1.5 × IQR, where IQR is the interquartile range (Q3 − Q1). Outliers beyond the whiskers are represented as dots.
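The computation underlying these summaries can also be sketched outside of hybridpower. The Python sketch below is an illustration rather than the package's implementation; it assumes that classical power is obtained from the noncentral t distribution with the upper α/2 rejection region (an assumption consistent with the values reported above, e.g., .916 at N = 90 and a minimum power of .025). It draws effect sizes from the N(.5, .44) design prior and summarizes the resulting distribution of power:

```python
import numpy as np
from scipy import stats

def ttest_power(d, n, alpha=.05):
    """Classical power of a two-sample t-test with n per group at effect size d,
    using the noncentral t distribution (upper alpha/2 rejection region)."""
    df = 2 * n - 2
    ncp = d * np.sqrt(n / 2)                  # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return stats.nct.sf(t_crit, df, ncp)

rng = np.random.default_rng(1234)
d_draws = rng.normal(.5, .44, size=10_000)    # draws from the N(.5, .44) design prior
power_draws = ttest_power(d_draws, 90)        # power distribution at n = 90 per group

print(round(float(ttest_power(.5, 90)), 3))   # classical power at d = .5: about .916
print(round(float(power_draws.mean()), 3))    # assurance (mean power): about .70
print(np.round(np.quantile(power_draws, [.25, .5, .75]), 2))  # quartiles of power
```

Averaging the power values over the prior draws yields the assurance, and quantiles of the same draws yield the five-number summary reported by x_hybrid$summary().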
Because Bayesian-classical hybrid power calculations require random draws to be made from the prior π(θ), calculated values will vary slightly between runs unless a particular seed is specified with set.seed(). In our example, set.seed(1234) will reproduce the reported values. The R Shiny application, which can be accessed at https://joonsukpark.shinyapps.io/ttest/, can reproduce these results when “Yes” is selected for “Reproduce the example in the article?”
Beyond assurance and quantiles as summary statistics of the distribution of power, Du and Wang (2016) forwarded the concept of assurance level. Recall that assurance levels are quantified by the area under the distribution of power (i.e., a Bayesian probability with respect to the design prior) above a lower bound value of power. Suppose that 50% of the distribution of power lies above the lower bound power value of .65; the assurance level for a power value of .65 is then said to be 50%. To obtain assurance levels, the lower bound values of power are specified by setting the variable assurance_level_props in the store of inputs. Calling assurance_level() after executing hybrid_power() will return assurance levels. We illustrate assurance levels below, where we compute both classical and Bayesian-classical hybrid power with a single store of inputs.

Classical and Bayesian-classical power. The following code is an example of using a single
store of inputs to compute both classical and Bayesian-classical power:

x_both <- hp_ttest$new(
  ns = seq(10, 90, 10),
  d = .5,
  design = 'two.sample',
  n_prior = 1000,
  prior = 'normal',
  prior_mu = .5,
  prior_sigma = .44,
  assurance_level_props = c(.5, .8),
  quantiles = c(0, .2, .5, .7, 1)
)

Using the store x_both, classical and Bayesian-classical hybrid power values can be obtained by calling classical_power() and hybrid_power(), respectively (Step 2). The functions implemented to summarize the distribution of power values pertain only to Bayesian-classical hybrid power, and can be used by running the following commands:

set.seed(1234)
x_both$hybrid_power()
x_both$assurance()
x_both$power_quantiles()
x_both$boxplot()
x_both$assurance_level()

The last line above will provide the assurance levels associated with the specified lower bound power values of .5 and .8, with output:

# A tibble: 9 x 3
# Groups: n [9]
n `.5` `.8`
<fct> <dbl> <dbl>
1 10 .155 .029
2 20 .374 .167
3 30 .482 .284
4 40 .559 .375
5 50 .604 .434
6 60 .636 .482
7 70 .67 .523
8 80 .69 .558
9 90 .706 .577

For N = 10, only 15.5% and 2.9% of the simulated power values (assurance levels) are at or above
.5 and .8 power, respectively (first row of the output above). For N = 50, however, about 60.4%
and 43.4% of the simulated power values are greater than or equal to .5 and .8 power, respectively
(fifth row of the output above). If the vision researcher plans to ensure that at least 50% of power
values will be greater than .8, she would choose N = 70 per group or a total of N = 140 for her
design. At the expense of more resources at N = 90 per group (total N = 180), the assurance level
is 57.7% for a lower bound power value of .8.
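Operationally, an assurance level is simply the proportion of simulated power values at or above the chosen lower bound. A minimal Python sketch (with hypothetical power draws for illustration only):

```python
import numpy as np

def assurance_level(power_draws, lower_bound):
    """Proportion of simulated power values at or above the lower bound."""
    return float(np.mean(np.asarray(power_draws) >= lower_bound))

# Hypothetical simulated power values, for illustration only
sim_power = [.12, .35, .52, .64, .71, .80, .86, .90, .95, .99]
print(assurance_level(sim_power, .5))  # 0.8: 80% of draws reach .5 power
print(assurance_level(sim_power, .8))  # 0.5: 50% of draws reach .8 power
```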
Next, we highlight other procedures implemented in hybridpower. The example on the sign
test illustrates how to conduct a Bayesian-classical power analysis for a non-parametric test while
providing another example of how to elicit a design prior. The final example on Welch’s one-
way ANOVA illustrates the Monte Carlo (MC) approach to power analysis and the capacity of
the package to utilize parallel processing when closed-form solutions are unavailable.
Other Examples
Sign test. The sign test is a non-parametric procedure that tests whether the population success probability in a Bernoulli or binomial experiment is equal to some hypothesized value. A public health researcher plans to conduct a study examining which brand of vaccine (Pfizer vs. Moderna) is preferred among patients seeking to get vaccinated. His null hypothesis H0 : p = .5 indicates no preference for one vaccine over the other. Because the null proportion is specified with the variable p_0 in a store of inputs, hp_sign, he needs to specify p_0 = .5. Additionally, the researcher expects that patients prefer Moderna over Pfizer 30% of the time. This alternative hypothesis can be specified by setting p_1 = .3 in the store of inputs, hp_sign.
The sign test can also be used to test whether the population median (or some other quantile of the population distribution) is different from a specified value. The median is evaluated with the sign test when observations above the median (.5 quantile) are coded as ‘positive’ events, and observations below the median are coded as ‘negative’ events. In this context, the quantile hypothesized under the null is specified with p_0 and the alternative is specified with p_1. For example, the public health researcher is interested in the median time it takes to administer a vaccine to children aged 12 to 17. From observations in the city, it took 24 minutes to administer a vaccine to children in this age group. The researcher expects it to take longer to administer a vaccine in a rural neighborhood and tests H0 : m = 24 (H0 : p = .5) against the alternative H1 : m > 24 (H1 : p > .5). Here, times less than or equal to 24 minutes would be coded “no,” whereas times longer than 24 minutes would be coded “yes.”
The sign test can also be applied to paired data, where the proportion of differences between each pair is tested against some population value. For example, let a positive event be a positive difference score on an outcome between pre- and post-treatment. Then, a negative event is a negative difference score between pre- and post-treatment. Thus, the sign test can test whether the probability of a positive event in the population is different from a particular probability under the null (p_0). In any case, the sign test is considered non-parametric because it does not make explicit distributional assumptions about the population.
Classical power. Under the classical perspective to power analysis, the necessary inputs are N, α, sidedness of the test (i.e., one- or two-sided), p_0, and p_1. Note that p_0 is a common input to both the classical and the hybrid procedures, whereas p_1 is a fixed parameter value under the alternative and is thus only relevant for classical power analysis. By default, p_0 = .5 in hybridpower. The public health researcher who wants to examine preference for the Pfizer over the Moderna vaccine would set up a store of inputs (Step 1) as follows:

x_classical <- hp_sign$new(
  ns = seq(50, 150, 10),
  p_0 = .5,
  p_1 = .3
)

The null hypothesis is set to a 50% probability, implying no difference, whereas the researcher expects only 30% of the population to prefer Moderna over Pfizer under the alternative hypothesis. When the Type I error rate and the form of the alternative are left unspecified, the defaults are alpha = .05 and alt = 'two.sided' (see Table 2). Classical power is then calculated (Step 2) by executing the following code

x_classical$classical_power()

which returns power values for sample sizes of N = 50, 60, 70, 80, 90, 100, 110, 120, 130, 140 and
150 as follows:

[1] .8283261 .8929041 .9348586 .9612269 .9773521 .9869883
[7] .9926336 .9958842 .9977276 .9987588 .9993287

If the public health researcher is targeting a design with about .9 power, he would plan to recruit roughly N = 60 patients. Or he could use interpolation as described above, which gives N = 62 as the minimum sample size.
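These values can be reproduced with the standard normal-approximation formula for the power of a two-sided test of a single proportion; the Python sketch below assumes this is the approximation used here, as it matches the reported values:

```python
from statistics import NormalDist

def sign_test_power(n, p_0, p_1, alpha=.05):
    """Normal-approximation power of a two-sided sign test of H0: p = p_0."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    numer = abs(p_0 - p_1) * n ** 0.5 - z_crit * (p_0 * (1 - p_0)) ** 0.5
    return z.cdf(numer / (p_1 * (1 - p_1)) ** 0.5)

for n in (50, 60, 90):
    print(n, round(sign_test_power(n, .5, .3), 4))
# 50 -> 0.8283, 60 -> 0.8929, 90 -> 0.9774, matching the output above
```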
Bayesian-classical power. Under the Bayesian-classical perspective, the prior for p_1 requires specification. As with the t-test, the null hypothesis value for the sign test (p_0) is treated as fixed. There are three possible prior distributions to choose from for p_1: beta, uniform, and truncnorm (see Table 1). These priors constrain the range of p_1 to between 0 and 1 because p_1, as a probability, is by definition bounded by 0 and 1.
Ideally, the choice of the design prior should depend on the nature of the subjective uncertainty about p_1 that the researcher has in mind. For example, if the public health researcher employs a uniform prior, he believes that all the values within the range of the uniform distribution (i.e., between prior_lower and prior_upper, which are stored as inputs) are equally likely. That is, the preference for Moderna over Pfizer is equally likely for, say, .2, .3, and .4 if the uniform distribution is Uniform(.15, .45). The uniform distribution could also represent the researcher's lack of prior knowledge about the form of the uncertainty about the parameter. That is, the Uniform(.15, .45) design prior formalizes the belief that the researcher does not have any information to favor any value within [.15, .45] over others as the true proportion. On the other hand, if the researcher believes that certain effect size values are more likely than others, a Beta or a truncated normal prior distribution would be more appropriate. The Beta distribution is bounded between 0 and 1, and different specifications of its two shape parameters (prior_a and prior_b) will change the form of the design prior. Compared to the uniform distribution, the Beta distribution is usually more peaked; and unlike the normal or t-distribution, the Beta distribution is bounded in [0, 1]. Users who prefer fat-tailed prior distributions could also consider the truncated normal distribution. Although similar to a normal distribution, the truncated normal limits the range of p_1 to a support constrained within the bounds specified by prior_lower and prior_upper.
Figure 2 presents prior distributions from the uniform, Beta, and truncated normal families that the public health researcher might consider in a study on vaccine preference. The Uniform(.15, .45) distribution, represented by the horizontal solid lines, implies that the public health researcher believes that the probability of interest lies between .15 and .45 with equal probability, and there is a strong belief that the probability will not be lower than .15 or higher than .45. The Beta(3,5) prior distribution, represented by the dark grey shaded area, implies that there is nonzero probability for the effect size value to be anywhere from 0 to 1; the mean of this distribution is .375 (the mean of a Beta(a, b) distribution is a/(a + b), which for Beta(3,5) equals 3/(3 + 5) = .375), and the IQR is [.25, .49]. Finally, the truncated normal distribution with µp = .3, σp = .15, and bounds of 0 and 1, represented by the light grey distribution, implies that the researcher believes that the probability is most likely .3, with decreasing strength of belief moving away from this value. The tail of this truncated normal distribution is truncated at 0 but goes all the way out
to 1. Unlike our first t-test example, there currently is no off-the-shelf tool for determining param-
eter values of these design priors. Researchers are advised to visualize competing distributions
with different parameters to ensure that the shape of the design prior matches subjective intuition
about plausible parameter values (see also Depaoli & Van de Schoot, 2017).
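One way to carry out such a comparison numerically is to compute summaries of each candidate prior before plotting. The Python sketch below is illustrative (it uses scipy's parameterizations, where truncnorm takes standardized bounds) and reproduces the summaries quoted above:

```python
from scipy import stats

# Candidate design priors for the preference probability
uniform_prior = stats.uniform(loc=.15, scale=.30)      # Uniform(.15, .45)
beta_prior = stats.beta(3, 5)                          # Beta(3, 5)
a, b = (0 - .3) / .15, (1 - .3) / .15                  # bounds 0 and 1, standardized
truncnorm_prior = stats.truncnorm(a, b, loc=.3, scale=.15)

for name, prior in [('uniform', uniform_prior),
                    ('beta', beta_prior),
                    ('truncnorm', truncnorm_prior)]:
    q1, q3 = prior.ppf([.25, .75])
    print(f'{name}: mean {prior.mean():.3f}, IQR [{q1:.2f}, {q3:.2f}]')
# The beta line shows mean 0.375 and IQR roughly [.25, .49], as in the text
```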
The public health researcher uses the following code to create a uniform prior (Step 1) for
computing Bayesian-classical hybrid power of a sign test:

x_hybrid <- hp_sign$new(
  ns = seq(50, 150, 10),
  p_0 = .5,
  prior = 'uniform',
  prior_lower = .15,
  prior_upper = .45,
  n_prior = 5000,
  assurance_level_props = c(.5, .8)
)

The uniform prior is specified with prior = 'uniform', and the limits are specified by prior_lower = .15 and prior_upper = .45, respectively. Finally, n_prior = 5000 determines the number of MC draws from the prior distribution used in calculating Bayesian-classical hybrid power. Executing the next two lines of code below will compute power (Step 2) under the Bayesian-classical hybrid approach and calculate assurance for each N, respectively.

set.seed(1234)

x_hybrid$hybrid_power()
x_hybrid$assurance()

To reproduce our results, set.seed(1234) needs to be run before creating the store of inputs and calling the functions to compute power, as shown above. The assurance or mean power values for N = 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, and 150 are .702, .739, .769, .792, .811, .828, .841, .854, .864, .873, and .882, respectively; the calculated power distributions by N can be visualized as box plots with x_hybrid$boxplot() (see Figure 3). Although not shown here, specific quantiles as well as assurance levels can also be obtained from the computed power distributions by calling power_quantiles() and assurance_level(), respectively. If the public health researcher intends to have a mean power of .8 under the uncertainty represented by the Uniform(.15, .45) design prior, he would choose to collect roughly N = 90 participants. Alternatively, he could be more conservative and base his design on assurance levels, choosing to collect N = 130 to ensure that, with at least 75% subjective probability, the power value is greater than or equal to .8.
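Under the same normal-approximation formula sketched earlier for classical sign-test power, the hybrid computation for this example amounts to averaging that power over draws from the Uniform(.15, .45) design prior. A Python sketch (illustrative, not the package's code):

```python
import numpy as np
from scipy import stats

def sign_test_power(n, p_0, p_1, alpha=.05):
    """Normal-approximation power of a two-sided sign test; vectorized in p_1."""
    z_crit = stats.norm.ppf(1 - alpha / 2)
    numer = np.abs(p_0 - p_1) * np.sqrt(n) - z_crit * np.sqrt(p_0 * (1 - p_0))
    return stats.norm.cdf(numer / np.sqrt(p_1 * (1 - p_1)))

rng = np.random.default_rng(1234)
p_1_draws = rng.uniform(.15, .45, size=20_000)  # Uniform(.15, .45) design prior
power_draws = sign_test_power(50, .5, p_1_draws)
print(round(float(power_draws.mean()), 2))      # assurance at N = 50: about .70
```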
Welch’s one-way ANOVA. In this final example, we highlight the use of MC simulations to
compute classical and Bayesian-classical hybrid power. The MC approach is useful when closed-
form power formulas are unavailable. Because the MC approach is computationally intensive,
this approach is not implemented in RShiny but only in the R package hybridpower. Within
hybridpower, the MC approach is implemented for the following procedures without closed
form solutions to compute power: Welch’s t-test, Welch’s one-way ANOVA, and Fisher’s exact
test. Below, we provide example code for Welch’s one-way ANOVA (i.e., one-way ANOVA with
unequal variances for three groups). With unequal variances across the groups, the degrees of
freedom require adjustment, resulting in a procedure that lacks a closed form solution for power
computation. The MC method thus simulates a sufficiently large number of hypothetical data sets,
conducts a significance test on each data set, and aggregates the results to yield a simulated esti-
mate of power.
Depending on the nature of the target procedure, hybridpower automatically determines when to engage the MC approach to empirically estimate power. The variable n_MC within a store of inputs specifies the number of draws from the population used to generate simulated data. Note that n_MC specifies the number of draws from the population, whereas n_prior specifies the number of draws from the design prior, π(θ). By default, n_MC = 1000 to ensure reasonable accuracy; the user should increase n_MC to obtain more precise power estimates (e.g., n_MC = 5000).
Classical power. Welch's one-way ANOVA relaxes the assumption of equal group variances in the usual one-way ANOVA procedure. Suppose that a team of clinical developmental psychologists was interested in the math performance (measured from 0 to 4 points) of three groups of children: controls, autistic, and impulsive. Because there is much more performance variability in autistic and impulsive children, they plan to employ a Welch test. The team can specify different population standard deviations within a store of inputs created from hp_oneway_anova with the variable sigma. When sigma has a single element or multiple elements with the same value, hybridpower assumes that all the treatment conditions have equal variances and will compute power using a closed-form solution. However, when sigma contains multiple elements with different values, hybridpower defaults to computing power for Welch's ANOVA by engaging the MC routine. Parallel processing is used by default, where multiple cores on the user's machine are used to conduct the MC simulation. The user can, however, turn off parallel processing by specifying parallel = FALSE.
hybridpower computes power for Welch’s one-way ANOVA by (a) simulating data for each
treatment condition, (b) conducting Welch’s omnibus F-test of equal means for each data set, and
finally (c) returning the proportion of simulated tests where the null hypothesis of equal means is
rejected. Specifically, H0 : µ1 = µ2 = · · · = µK, where k = 1, . . . , K indexes each group for a total of K groups. Note that the F-test is inherently one-sided, which means that alt = 'one.sided' by default for this procedure. The following code creates a store of inputs required to compute
classical power for Welch’s one-way ANOVA across three treatment groups (Step 1):

x_classical_welch <- hp_oneway_anova$new(
  ns = seq(40, 120, 20),
  mu = c(2.2, 2.5, 2.0),
  sigma = c(1.0, 1.1, 1.2),
  design = 'fe',
  n_MC = 1000,
  seed = 1234
)

The line ns = seq(40, 120, 20) specifies each group's sample size to be n = 40, 60, 80, 100, and 120. The population means of the three groups under the alternative are specified in the line mu = c(2.2, 2.5, 2.0) for the controls, autistic, and impulsive groups, respectively. The population standard deviations of these groups are 1.0, 1.1, and 1.2, respectively, as specified in sigma = c(1.0, 1.1, 1.2). The next line, design = 'fe', indicates that the design is a fixed-effects (i.e., between-subjects) one-way ANOVA. hybridpower also has the capability of computing power for a repeated measures (i.e., within-subjects) one-way ANOVA (see Table 3 and the online vignettes for example code). The last line, seed = 1234, is required to reproduce these results.
To calculate classical power values, the following code should be executed (Step 2):

x_classical_welch$classical_power()

Power values for group sizes with N = 40, 60, 80, 100, and 120 (i.e., total N = 120, 180, 240, 300,
and 360) are .380, .553, .709, .801 and .878, respectively. Based on this classical power analysis,
if the research team aims for a power value of approximately .8, they would plan to recruit N =
100 children from each group, resulting in a total N = 300.
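The MC logic behind these values can be sketched independently of the package. The Python sketch below is an illustration, not hybridpower's implementation; it simulates normal data for the three groups, applies Welch's F test (Welch's statistic with adjusted degrees of freedom) to each simulated data set, and returns the rejection proportion:

```python
import numpy as np
from scipy import stats

def welch_anova_p(groups):
    """P-value of Welch's F test for equality of means across groups."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])
    w = n / v                                     # precision weights
    grand = np.sum(w * m) / np.sum(w)
    a = np.sum(w * (m - grand) ** 2) / (k - 1)    # between-group term
    c = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    b = 1 + 2 * (k - 2) / (k ** 2 - 1) * c        # denominator correction
    df2 = (k ** 2 - 1) / (3 * c)                  # adjusted degrees of freedom
    return stats.f.sf(a / b, k - 1, df2)

def mc_power(n, mu, sigma, n_mc=2000, alpha=.05, seed=1234):
    """Proportion of simulated data sets in which Welch's test rejects H0."""
    rng = np.random.default_rng(seed)
    rejections = sum(
        welch_anova_p([rng.normal(m, s, n) for m, s in zip(mu, sigma)]) < alpha
        for _ in range(n_mc)
    )
    return rejections / n_mc

est = mc_power(100, mu=(2.2, 2.5, 2.0), sigma=(1.0, 1.1, 1.2))
print(round(est, 2))  # close to the reported .801 for n = 100 per group
```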
Bayesian-classical power. The computation of Bayesian-classical power using the MC method is much more computationally intensive than the classical approach because values are first drawn from the design prior, and samples are then drawn from a population defined by each effect size drawn from the prior distribution. The execution time to compute power is a function of the product of the two numbers of draws, n_MC and n_prior. For example, if n_MC = 1000 and n_prior = 1000, power will be calculated from 1,000 × 1,000 = 1,000,000 simulated data sets. Pek and Park (2019) report that setting n_prior = 1000 and n_MC = 10000 returned results similar to n_prior = 10000 and n_MC = 10000. To determine whether the numbers of draws from the prior and the population are reasonable, sensitivity analyses can be conducted to compare the quantiles of power distributions based on different combinations of n_MC and n_prior.
To compute Bayesian-classical hybrid power for Welch's one-way ANOVA, priors for each group mean have to be specified. The clinical developmental team can use the code below to define a store of inputs in hp_oneway_anova (Step 1). Here, they choose normal priors for the group means (cf. the example on the t-test) because meta-analytic work suggests that math ability scores within different groups of children tend to follow a normal distribution. With alternative and proper justifications, other types of priors (e.g., uniform) can be specified.

x_hybrid_welch <- hp_oneway_anova$new(
  ns = seq(40, 120, 20),
  prior_mu = c(2.2, 2.5, 2.0),
  prior_sigma = c(.1, .2, .15),
  sigma = c(1.0, 1.1, 1.2),
  design = 'fe',
  prior = 'normal',
  n_prior = 1000,
  n_MC = 1000,
  seed = 1234
)

The code above that creates a store of inputs for hp_oneway_anova is similar to the code for classical power analysis except for several lines. In place of mu, the Bayesian-classical approach is invoked by specifying prior_mu and prior_sigma while declaring prior = 'normal'. Recall that prior_sigma is a formalization of the epistemic uncertainty about the unknown group means, which is different from the within-group standard deviation, sigma. The research team specified these priors to formalize their belief that the most likely (subjective) a priori values of the unknown group means on math performance scores are 2.2, 2.5, and 2.0 for the control, autistic, and impulsive groups, respectively. The prior standard deviations, quantifying the extent of uncertainty about these group mean values, are .1, .2, and .15, respectively, as denoted by prior_sigma = c(.1, .2, .15). The researchers have twice the uncertainty (in standard deviation units) about the mean math ability score of the autistic children relative to the normal controls, and their uncertainty about the mean math ability score of the impulsive children lies between the normal and autistic groups. With a normal prior, the web-based tool MATCH can be employed to elicit parameters for the three normal priors.
The researchers can execute the following two lines of the code (Step 2) to compute Bayesian-
classical hybrid power and return assurances of the power distributions:

x_hybrid_welch$hybrid_power()
x_hybrid_welch$assurance()

With n_prior = 1000 and n_MC = 1000, the time taken to calculate Bayesian-classical hybrid power on a system with an Apple M1 processor and 8GB of memory was about 24 minutes, using 7 of the 8 cores. For each group's sample size N = 40, 60, 80, 100, and 120, the calculated assurances were .46, .60, .68, .74, and .76, respectively. Using a target mean power value of .7, the research team would then plan to recruit N = 100 per group or a total of N = 300 children. Boxplots of the calculated power distributions by group N can be visualized by calling boxplot() with the code x_hybrid_welch$boxplot() (see Figure 4). Additionally, quantiles of each distribution of power values and assurance levels can be returned by calling power_quantiles() and assurance_level(), respectively. If the research team designs the study using assurance levels, they would choose to collect N = 120 in each group (total N = 360) to ensure that at least 60% of plausible power values are at or above a lower bound of .8 power.
Discussion

Power analysis continues to be important in designing studies, and we demonstrate how to determine sample sizes from the application of the Bayesian-classical hybrid approach (Pek & Park, 2019) by introducing hybridpower. The Bayesian-classical approach is unique in that it formalizes subjective uncertainty about the unknown effect size in the context of limited information to calculate power for frequentist procedures. From the Bayesian-classical method, researchers can determine N with assurance, quantile values (cf. safeguard power), and assurance levels. These different approaches vary in their levels of tolerance (higher to lower) toward (subjective) epistemic uncertainty in calculated power. hybridpower allows users to compute classical and Bayesian-classical power for several popular procedures, including the sign test, the test of equality of proportions, the chi-square test of goodness of fit, the t-test, and one-way ANOVA with equal or unequal variances. Additionally, hybridpower can compute power using Monte Carlo methodology such that power estimates need not rely on closed-form formulas, allowing for the computation of power for procedures such as Fisher's exact test and Welch's ANOVA.
Additional features of hybridpower are described in vignettes, which include example code for other procedures and are available at the first author's Github page. As indicated in Tables 1 and 2, hybridpower can compute classical and Bayesian-classical hybrid power for a wide variety of procedures. Although we have provided the option to compute Bayesian-classical hybrid power via RShiny web applications for researchers who are unfamiliar with R (see footnote 2), this online utility has limitations. The RShiny applications can only compute closed-form power for the Bayesian-classical hybrid approach because the Monte Carlo approach is too computationally intensive to run online. They also do not support the full range of testing procedures included in hybridpower. Hence, we recommend that researchers use hybridpower within the R environment to compute Bayesian-classical hybrid power with full functional support.
Future Directions
To reflect uncertainty encountered in research, uncertainty should be taken into account in power analysis. One such approach is the Bayesian-classical approach to power analysis implemented in hybridpower. hybridpower is an ongoing project, and we plan to expand power computations to include popular procedures used in the social and behavioral sciences such as multiple regression, simple mediation, factorial ANOVA, and multivariate procedures. Beyond expanding the capabilities of hybridpower to include more procedures, we also plan to implement the MC approach for some procedures with closed-form solutions. The advantage of the MC approach is the ability to more accurately estimate power for procedures that rely on large sample theory (e.g., t-test, one-way ANOVA, correlation) when N is small. Additionally, the MC approach can obtain power estimates for unequal cell sizes (e.g., one-way ANOVA and Welch's one-way ANOVA). We also plan to implement an additional method to return Bayesian-classical hybrid power distributions averaged over several alternative procedures, as described in Pek and Park (2019). This concept of obtaining average power over tests is motivated by the use of competing models when analyzing collected data, because models are incomplete and mere approximations of reality (e.g., see Box, 1976; MacCallum, 2003; Rodgers, 2010; Tukey, 1969). Implementing power averaged over procedures has the potential to facilitate the planning of studies with the a priori intent of fitting competing models and conducting their respective tests on the collected data. It is envisioned that further development and application of hybridpower can enhance research design by formalizing the incorporation of uncertainty due to limited knowledge about the unknown effect size θ, sampling variability, and eventually model approximation. It is our hope that hybridpower facilitates the application of Bayesian-classical hybrid power analysis in research.

Open Practice Statement


The source code and the pre-packaged versions of hybridpower are publicly available at the first
author’s Github repository, https://github.com/JoonsukPark/RHybrid.
Table 1: List of Classes and Possible Choices of Priors

Procedure        Test type          normal   uniform   beta   truncnorm   dirichlet
hp_ttest         t-test               ✓         ✓
hp_oneway_anova  One-way ANOVA        ✓         ✓
hp_cor           Correlation          ✓         ✓                  ✓
hp_slr           Regression                     ✓        ✓         ✓
hp_prop          Proportion tests               ✓        ✓         ✓
hp_sign          Sign test                      ✓        ✓         ✓
hp_chisq         Chi-square test                ✓        ✓         ✓            ✓
Prior parameters                    mu,      lower,    a,     mu, sigma,     a
                                    sigma    upper     b      lower, upper

Note. truncnorm = truncated normal distribution. Prior parameter names take the prefix prior_ when specified as inputs; for example, to set the mean of a normal design prior to 0.5, the statement is prior_mu = 0.5.

Table 2: Common inputs to new() for every procedure

Input                  Type       Description                              Default
parallel               boolean    Use parallel computing                   FALSE
ns                     vector     Sample sizes                             c() (a null vector)
n_MC                   integer    Number of MC draws to compute power      100
alpha                  numeric    Level of significance                    .05
alt                    character  Alternative hypothesis                   'two.sided'
quantiles              vector     Quantile values                          c(0, .25, .5, .75, 1)
assurance_level_props  vector     Thresholds for assurance levels          none
prior                  character  Functional form of prior distribution    none
n_prior                integer    Number of draws from the prior           100
prior_mu               numeric    Prior mean                               none
prior_sigma            numeric    Prior standard deviation                 none
prior_lower            numeric    Prior lower bound                        none
prior_upper            numeric    Prior upper bound                        none
prior_a                numeric    Prior alpha                              none
prior_b                numeric    Prior beta                               none

Note. MC = Monte Carlo. parallel may not work in the Windows OS environment.
Table 3: List of test-specific inputs for classes in hybridpower.

Class            Test type                 ES for classical              Additional inputs
hp_ttest         t-test                    Cohen's d (d)                 design (one.sample, two.sample or paired); within-group standard deviation (sigma)
hp_oneway_anova  One-way ANOVA             Group means (mu) or f² (f2)   design, fixed-effect (fe) or repeated measures (rm); within-group standard deviation (sigma), not necessary if f² is provided; correlation among repeated measures (rho); sphericity (epsilon)
hp_cor           Bivariate correlation     Pearson correlation (rho)     None
hp_slr           Simple linear regression  R-squared (r2)                None
hp_prop          Proportion test           Proportions (p_1)             proportions under H0 (p_0); design (one.sample, two.sample or paired)
hp_sign          Sign test (one sample)    Proportion (p_1)              population proportion under H0 (p_0)
hp_chisq         Chi-square test           Proportions (p_1)             proportions under H0 (p_0)
Note. In the case of the t-test, unequal variances are assumed only when sigma contains two distinct values; otherwise, equal variances are assumed.
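The test-specific inputs of Table 3 are supplied alongside the common inputs of Table 2. As an illustration, the Note's rule for unequal variances might be invoked as follows; this is a sketch assuming the input names shown in the tables, not a verbatim excerpt from the package.

```r
# Sketch: per the Note to Table 3, two distinct values in sigma request
# a t-test with unequal variances assumed.
x <- hp_ttest$new(
  ns          = seq(20, 100, by = 20),  # candidate sample sizes per group
  design      = 'two.sample',
  sigma       = c(1.0, 1.3),   # two distinct values -> unequal variances
  prior       = 'normal',      # normal design prior on Cohen's d
  prior_mu    = .5,
  prior_sigma = .1,
  n_prior     = 1000
)
```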

Figure 1: Box plots for distributions of power values for the independent samples t-test computed from the Bayesian-classical hybrid approach. The design prior is a normal distribution with mean .5 and standard deviation .44. Estimates are obtained with 1,000 draws from the prior per sample size N. The diamonds in the box plots indicate the mean, or assurance, of the distribution of power values.
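The design prior behind Figure 1 could be specified along the following lines; the sample-size grid is hypothetical and the argument names are assumed from Tables 2 and 3.

```r
# Sketch of the Figure 1 design prior: normal with mean .5 and SD .44,
# with 1,000 prior draws per sample size. Verify names against the
# package documentation.
x <- hp_ttest$new(
  ns          = seq(20, 200, by = 20),  # hypothetical grid of sample sizes
  design      = 'two.sample',
  prior       = 'normal',
  prior_mu    = .50,
  prior_sigma = .44,
  n_prior     = 1000
)
```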

Figure 2: Plausible prior distributions for a public health researcher interested in the preference of the Moderna over the Pfizer vaccine, plotted against the probability of preferring Moderna over Pfizer. The uniform prior, Uniform(.15, .45), is represented by the black lines bounding a rectangle; the Beta(3, 5) prior is represented by the dark grey distribution; and the truncated normal prior, Truncated normal(.3, .15), is represented by the light grey distribution.

Figure 3: Box plots for distributions of power values for the sign test computed from the Bayesian-classical hybrid approach. The design prior is a uniform distribution with support p1 ∈ [.15, .45]. Estimates are obtained with Monte Carlo simulation of 5,000 draws from the design prior for each sample size. The diamonds indicate the mean power or assurance.
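A sketch of the Figure 3 setup follows. The sample sizes are hypothetical, the null proportion p_0 = .5 is the conventional sign-test null and is an assumption here, and the argument names follow Tables 2 and 3.

```r
# Sketch of the Figure 3 design prior: Uniform(.15, .45) on the
# preference proportion p_1, with 5,000 prior draws per sample size.
x <- hp_sign$new(
  ns          = seq(50, 300, by = 50),  # hypothetical sample sizes
  p_0         = .5,                     # assumed sign-test null proportion
  prior       = 'uniform',
  prior_lower = .15,
  prior_upper = .45,
  n_prior     = 5000
)
```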

Figure 4: Box plots for distributions of Bayesian-classical hybrid power values for Welch's one-way ANOVA. The design prior is specified as three normal distributions with prior means of 2.2, 2.5 and 2.0 and prior standard deviations of .1, .2 and .15, respectively. The unequal group standard deviations are 1.0, 1.1 and 1.2, respectively. For each sample size N per group, estimates are obtained from 1,000 draws from the prior, with 1,000 simulated data sets per draw. The diamonds indicate the mean power or assurance.
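The Figure 4 setup could be sketched as below; vector-valued prior arguments and the exact names are assumptions based on Tables 2 and 3 and should be checked against the package documentation.

```r
# Sketch of the Figure 4 setup: Welch's one-way ANOVA with normal design
# priors on the three group means and unequal group SDs.
x <- hp_oneway_anova$new(
  ns          = seq(20, 100, by = 20),  # hypothetical per-group sizes
  design      = 'fe',                   # fixed-effect one-way design
  sigma       = c(1.0, 1.1, 1.2),       # unequal group SDs -> Welch's test
  prior       = 'normal',
  prior_mu    = c(2.2, 2.5, 2.0),       # prior means of the group means
  prior_sigma = c(.10, .20, .15),       # prior standard deviations
  n_prior     = 1000,
  n_MC        = 1000                    # data sets simulated per prior draw
)
```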

References

Anderson, S. F., Kelley, K., & Maxwell, S. E. (2017). Sample-size planning for more accurate
statistical power: A method adjusting sample effect sizes for publication bias and uncer-
tainty. Psychological Science, 28(11), 1547–1562. doi: 10.1177/0956797617723724
Beavers, D. P., & Stamey, J. D. (2012). Bayesian sample size determination for binary regres-
sion with a misclassified covariate and no gold standard. Computational Statistics & Data
Analysis, 56(8), 2574–2582. doi: 10.1016/j.csda.2012.02.014
Bloom, H. S. (1995). Minimum detectable effects: A simple way to report the statistical power of
experimental designs. Evaluation Review, 19(5), 547–556.
Box, G. E. (1976). Science and statistics. Journal of the American Statistical Association,
71(356), 791–799.
Champely, S. (2018). pwr: Basic functions for power analysis. [Computer software]. Retrieved
from https://CRAN.R-project.org/package=pwr
Chen, D.-G., Fraser, M. W., & Cuddeback, G. S. (2018). Assurance in intervention research:
A Bayesian perspective on statistical power. Journal of the Society for Social Work and
Research, 9(1), 159–173. doi: 10.1086/696239
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. New York, NY: Aca-
demic Press.
Corbett, J. E. (2017). The whole warps the sum of its parts: Gestalt-defined-group mean size
biases memory for individual objects. Psychological Science, 28(1), 12–22. doi: 10.1177/
0956797616671524
Depaoli, S., & Van de Schoot, R. (2017). Improving transparency and replication in Bayesian
statistics: The WAMBS-checklist. Psychological Methods, 22(2), 240–261.
Djimeu, E. W., & Houndolo, D.-G. (2016). Power calculation for causal inference in social sci-
ence: Sample size and minimum detectable effect determination. Journal of Development
Effectiveness, 8(4), 508–527.

Du, H., & Wang, L. (2016). A Bayesian power analysis procedure considering uncertainty in
effect size estimates from a meta-analysis. Multivariate Behavioral Research, 51(5), 589–
605. doi: 10.1080/00273171.2016.1191324
Jaeschke, R., Singer, J., & Guyatt, G. H. (1989). Measurement of health status: Ascertaining the
minimal clinically important difference. Controlled Clinical Trials, 10(4), 407–415. doi:
10.1016/0197-2456(89)90005-6
Jeon, M., & De Boeck, P. (2017). Decision qualities of Bayes factor and p value-based hypothesis
testing. Psychological Methods, 22(2), 340–360. doi: 10.1037/met0000140
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Associa-
tion, 90(430), 773–795. doi: 10.2307/2291091
Kraemer, H. C., & Blasey, C. (2015). How many subjects?: Statistical power analysis in research
(2nd ed.). Sage Publications.
Kruschke, J. K. (2013). Bayesian estimation supersedes the t test. Journal of Experimental
Psychology: General, 142(2), 573–603. doi: 10.1037/a0029146
Kruschke, J. K. (2015). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan (2nd ed.).
San Diego, CA: Academic Press.
Lipsey, M. W. (1990). Design sensitivity: Statistical power for experimental research. Newbury
Park, CA: Sage.
MacCallum, R. C. (2003). 2001 presidential address: Working with imperfect models. Multivari-
ate Behavioral Research, 38(1), 113–139. doi: 10.1207/S15327906MBR3801 5
Maynard, R., & Dong, N. (2013). Powerup!: A tool for calculating minimum detectable effect
sizes and minimum required sample sizes for experimental and quasi-experimental design
studies. Journal of Research on Educational Effectiveness, 6(1), 24–67.
McShane, B. B., & Böckenholt, U. (2015). Planning sample sizes when effect sizes are uncertain:
The power-calibrated effect size approach. Psychological Methods, 21(1), 47–60. doi:
10.1037/met0000036
McShane, B. B., Böckenholt, U., & Hansen, K. T. (2020). Average power: A cautionary note.
Advances in Methods and Practices in Psychological Science. doi: 10.1177/2515245920902370
Morris, D. E., Oakley, J. E., & Crowe, J. A. (2014). A web-based tool for eliciting probability
distributions from experts. Environmental Modelling & Software, 52, 1–4.
O’Hagan, A., Buck, C. E., Daneshkhah, A., Eiser, J. R., Garthwaite, P. H., Jenkinson, D. J., . . .
Rakow, T. (2006). Uncertain judgements: Eliciting experts’ probabilities. Chichester, UK: Wiley.
O’Hagan, A., & Stevens, J. W. (2001). Bayesian assessment of sample size for clinical tri-
als of cost-effectiveness. Medical Decision Making, 21(3), 219–230. doi: 10.1177/
0272989x0102100307
O’Hagan, A., Stevens, J. W., & Campbell, M. J. (2005). Assurance in clinical trial design. Phar-
maceutical Statistics, 4(3), 187–201. doi: 10.1002/pst.175
Pek, J., & Park, J. (2019). Complexities in power analysis: Quantifying uncertainties with a
Bayesian-classical hybrid approach. Psychological Methods, 24(5), 590–605. doi: 10.1037/
met0000208
Perugini, M., Gallucci, M., & Costantini, G. (2014). Safeguard power as a protection against
imprecise power estimates. Perspectives on Psychological Science, 9(3), 319–332. doi:
10.1177/1745691614528519
Raiffa, H. (1968). Decision analysis: Introductory lectures on choices under uncertainty.
Addison-Wesley.
Rodgers, J. L. (2010). The epistemology of mathematical and statistical modeling: A quiet
methodological revolution. American Psychologist, 65(1), 1–12. doi: 10.1037/a0018326
Schönbrodt, F. D., & Wagenmakers, E.-J. (2018). Bayes factor design analysis: Planning for
compelling evidence. Psychonomic Bulletin & Review, 25(1), 128–142.
Spiegelhalter, D. J., & Freedman, L. (1986). A predictive approach to selecting the size of a
clinical trial, based on subjective clinical opinion. Statistics in Medicine, 5(1), 1–13. doi:
10.1007/978-3-642-83419-6 24
Stefan, A. M., Gronau, Q. F., Schönbrodt, F. D., & Wagenmakers, E.-J. (2019). A tutorial on
Bayes factor design analysis using an informed prior. Behavior Research Methods, 51(3),
1042–1058.
Taylor, D. J., & Muller, K. E. (1995). Computing confidence bounds for power and sample size of
the general linear univariate model. The American Statistician, 49(1), 43–47. doi: 10.2307/
2684810
Tukey, J. W. (1969). Analyzing data: Sanctification or detective work? American Psychologist,
24(2), 83–91. doi: 10.1037/h0027108
Tukey, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
Wang, F., & Gelfand, A. E. (2002). A simulation-based approach to Bayesian sample size de-
termination for performance under a given model and for separating models. Statistical
Science, 17(2), 193–208. doi: 10.1214/ss/1030550861
Weiss, R. (1997). Bayesian sample size calculations for hypothesis testing. Journal of the Royal
Statistical Society, Series D, 46(2), 185–191. doi: 10.1111/1467-9884.00075
Zondervan-Zwijnenburg, M., Peeters, M., Depaoli, S., & Van de Schoot, R. (2017). Where do
priors come from? Applying guidelines to construct informative priors in small sample re-
search. Research in Human Development, 14(4), 305–320. doi: 10.1080/15427609.2017.1370966
