Speed, H.D., & Andersen, M.B. (2000). What exercise and sport scientists don't understand. Journal of Science and Medicine in Sport 3 (1): 84-92.
The power of research designs in studies published in the Australian Journal of Science and Medicine in Sport (AJSMS; now the Journal of Science and Medicine in Sport) for the years 1996 and 1997 was analysed for the ability to detect small, medium, and large effects according to Cohen's (1988) conventions. Also examined were the reporting and interpreting of effect sizes and the control for experiment-wise (EW) Type I error rates. From the two years of articles, 29 studies were analysed, and power was computed on 108 different tests of significance. The median power of the studies to detect small, medium, and large effects was .14, .65 and .97, respectively. These results suggest that exercise and sport science research, at least as represented in AJSMS, is probably underpowered and may be limited in detecting small effects, has a better, but still underpowered, chance of detecting medium effects, and has adequate power principally for detecting large effects. The reporting of effect sizes was rare, and adequate interpretation of them was even rarer. The mean EW Type I error rate for all studies was .49. The analyses conducted suggest that much research in exercise science may have substantial Type I and Type II errors. An appeal is made for exercise scientists to conduct power analyses, control for EW error, exercise caution in the interpretation of nonsignificant results, and examine, report, and interpret effect sizes rather than solely rely on p values to determine whether significant changes occurred or significant relationships exist.
Introduction
We have taken the title of this paper from Meehl's (1986) article - now a classic - on the misunderstandings of statistical inference in the social sciences. Our main contention is that much research design in exercise science, and the statistical inferences that accompany those designs, are seriously limited. Over 20 years ago, Jones and Brewer (1972) and Christensen and Christensen (1977) power analysed exercise science articles in the Research Quarterly for their ability to detect small, medium, and large effects. Jones and Brewer found that the average power of studies to detect small, medium and large effects was .13, .50 and .78, respectively. Christensen and Christensen's (1977) median numbers for the same effects were .08, .32 and .69. What do these numbers mean? Essentially, they mean that exercise science research, as reported in the Research Quarterly, is virtually incapable of detecting small effects, has a coin toss (or worse) chance of detecting a medium size effect, and is probably only adequate at detecting large effects.
What Exercise and Sport Scientists Don't Understand
                              Effect Size
Statistical Test           Small   Medium   Large
Chi-square (w)              .10     .30      .50
t test (d)                  .20     .50      .80
Correlation (r)             .10     .30      .50
ANOVA/ANCOVA (f)            .10     .25      .40
Multiple Regression (f2)    .02     .15      .35
Note: Source - Cohen (1988).
Table 1: Effect size conventions for the statistical tests analysed in the Australian Journal of Science and Medicine in Sport, 1996-1997.
For the purpose of this study, we set standard conditions for which to compute the power of each test. The experiment served as the unit of analysis. Different experiments within the same articles were treated as separate units if they used different samples of participants or comprised different degrees of freedom. For example, where separate ANOVAs were conducted on each of six dependent measures for the same sample, the analyses were treated as one in the determination of power. Nonparametric tests were treated like their parametric equivalents; all factors, fixed and random, were treated as fixed effects. Where two groups had unequal ns, the harmonic mean was calculated to determine power. Where three or more cell sizes were unequal, the power analyses were conducted using the average group or cell size (Cohen, 1988). The effect of these conditions on calculations of the power of a test is usually an overestimation of power (Cohen, 1965; Koele, 1982). In cases where details about the design were lacking or unclear, the determination of power was in favour of a higher estimate (e.g., equal cell or group sample sizes were assumed where details were uncertain). Thus, we were liberal in the estimation of power, and our results may paint a better picture of power in AJSMS than actually exists. For repeated measures designs, the power of statistical tests (i.e. t tests and ANOVAs) was determined for effect sizes multiplied by √2, to compensate for an overestimation of error variance by conventional power tables (see Cohen, 1988, pp. 46-47).
For small, medium and large effects, we chose Cohen's (1988) conventions for the behavioural sciences. For example, a Cohen's d of .20 is a small effect and represents a between group difference of approximately 8 percentile points in standardised terms. Note, however, that this small effect in the behavioural sciences may actually represent a medium or even large effect in exercise physiology or biomechanics. For example, an 8 percentile point change in some blood chemistry parameter (e.g. haematocrit) would be dramatic. Thus, once again we have probably erred on the side of painting a better picture in reported exercise science research than actually exists.
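Two of the adjustments described above are easy to make concrete. The following is an illustrative sketch only (the function names and sample sizes are ours, not drawn from the article): the harmonic mean of two unequal group sizes, and the √2 multiplier applied to repeated measures effect sizes before consulting conventional power tables.

```python
import math

def harmonic_mean_n(n1: int, n2: int) -> float:
    """Harmonic mean of two unequal group sizes, used as the
    common n when entering power tables (Cohen, 1988)."""
    return 2 * n1 * n2 / (n1 + n2)

def repeated_measures_d(d: float) -> float:
    """Multiply a repeated measures effect size by sqrt(2) to offset
    the overestimation of error variance in conventional power
    tables (Cohen, 1988, pp. 46-47)."""
    return d * math.sqrt(2)

# Hypothetical example: groups of 10 and 30 participants.
print(harmonic_mean_n(10, 30))        # 15.0, below the arithmetic mean of 20
print(repeated_measures_d(0.50))      # about 0.71
```

Note that the harmonic mean is pulled toward the smaller group, which is why unequal ns cost power relative to a balanced design of the same total size.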
For each article where EW error was not controlled for, the number of statistical tests performed was counted. Post hoc analyses that contained EW error control were not included. The EW error rate, alpha = 1 - (.95)^N, where N is the number of tests, was calculated for each article. The mean and range of actual alpha levels are presented in the Results section. Presentations and interpretations of effect sizes in all articles, where they appeared, were noted.
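The EW formula is a one-liner; as an illustrative sketch (the function name is ours), with N values that recur later in the paper:

```python
def experimentwise_alpha(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one Type I error across n_tests
    independent tests, each conducted at the per-test alpha."""
    return 1 - (1 - alpha) ** n_tests

# With 13 unadjusted tests the EW error rate is about .49;
# with 66 tests it is a near certainty.
print(round(experimentwise_alpha(13), 2))  # 0.49
print(round(experimentwise_alpha(66), 2))  # 0.97
```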
Results
Power
The analyses presented here are based on 108 significance tests from 29 articles, and include t tests, ANOVAs, ANCOVAs, correlations, multiple regressions, and
chi-square tests. Of all the articles in the two volumes, no authors determined a
priori the power of their research. Only two studies mentioned power, and one was
a review article.
The distribution of statistical power of the 108 tests analysed for the ability to detect small, medium, and large effects is summarised in Table 2. Because a small number of experiments had (relatively) high power for small effect sizes (i.e., the distribution of power had a large positive skew) or very low power for large effect sizes (substantial negative skew), we considered the medians rather than the means to represent the central tendency of power. The median power for small, medium and large effects was .14, .65 and .97, respectively. Given Cohen's (1988) suggestion for a minimum for adequate power (i.e., .80 or higher), no studies had adequate power to detect small effects, only 38% of studies had the suggested adequate power to detect medium effects, and approximately 75% of studies had .80 power or better to detect large effects.
                            Effect Size
               Small            Medium           Large
Power       Freq.  Cum.%     Freq.  Cum.%     Freq.  Cum.%
.95-.99                       18     100       57     100
.90-.94                        8      83       11      47
.80-.89                       15      76       13      37
.70-.79                       12      62        7      25
.60-.69       1     100        2      51        3      19
.50-.59       4      99       15      49        8      16
.40-.49       1      95       11      35        2       8
.30-.39       6      94       10      25        6       6
.20-.29      19      89        9      16        1       1
.10-.19      45      71        8       7
.05-.09      32      30

Median Power   .14            .65              .97
Mean Power     .17            .63              .85
Table 2: The power to detect small, medium and large effects of studies published in the 1996-1997 volumes of the Australian Journal of Science and Medicine in Sport, N = 108.
Effect Size
Out of all the studies, effect sizes were reported four times, and only two of those four times were they interpreted.
Experiment-Wise Error
Only two of the 29 studies controlled for EW Type I error. For the rest of the studies the EW error rate could be, in many cases, only approximately determined because of poor reporting of the statistical tests conducted and the unclear nature of many post hoc comparisons. Thus, the results reported here are only approximate. The median number of statistical tests performed without alpha adjustment for the 29 articles was 12, with a range from 1 to 66. With a single statistical test performed, the alpha level is, of course, .05. With 13 unadjusted tests, the alpha level rises to .49, and with 66 tests the alpha level is .97. Thus, for approximately half the studies, the probability of making a Type I error was greater than .50, and for a few studies, the probability of such an error was a near certainty.
Discussion
In Christensen and Christensen's (1977) study, the median power of significance tests was .08, .32, and .69 for small, medium, and large effect sizes, respectively. More than two decades later, the situation has not improved much for the detection of small effects. There appears to be improvement over the 1977 study in power for detecting medium and large effects, but one must consider that our power estimates are most likely overestimates. Nonetheless, the power of exercise and sport science research still seems substandard. We chose the study by Christensen and Christensen for a comparison because the nature of the articles published in Research Quarterly is closer to AJSMS articles than any other power studies previously conducted. Thus, the conclusions on power in studies in Research Quarterly and AJSMS are only suggestive and need to be viewed as a best available comparison.
The state of reporting power and effect sizes in exercise science research in AJSMS appears extremely limited. Only a few articles reported a class of major statistics (i.e., effect sizes) that tells as much (actually more) about the results as p values, and half of those did not attempt to explain what the effect sizes might mean. Given the history of the arguments about power and effect sizes in exercise science (e.g., Christensen & Christensen, 1977; Thomas & French, 1986; Thomas, Salazar, & Landers, 1991), it is obvious that many contributors to AJSMS do not understand the basic statistical concepts of Type I and II errors, power, and the importance of effect sizes. In the world of research, they are in good company. These "understanding" problems are omnipresent in the behavioural and biomedical sciences. These misunderstandings are further complicated by a misinterpretation of the logic of hypothesis testing and the assumptions underlying statistical tests. This lack of understanding can lead to flawed inferences not warranted by the data and statistical results.
Calculating power after a study is completed is not always, at first glance, a simple, straightforward process, and in all cases, an attempt to estimate power before the study would be preferable. Calculating effect sizes, however, is relatively easy. Once effect sizes are calculated, the researcher can use the sample size and the effect size and consult power tables for an estimation of the power of the study just completed. The formulas for common effect sizes are contained in Appendix A.
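That post hoc step can be sketched in code. The sketch below is ours, not the authors': it substitutes a normal approximation to the two-tailed, two-sample t test for the published power tables, and the effect size and sample size are invented for illustration. The approximation is adequate for moderate to large n.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def approx_power_two_sample(d: float, n_per_group: int) -> float:
    """Approximate power of a two-tailed, two-sample t test at
    alpha = .05, given effect size d and n participants per group,
    treating the test statistic as normal."""
    z_crit = 1.959964                      # two-tailed critical value
    ncp = d * math.sqrt(n_per_group / 2)   # noncentrality parameter
    return normal_cdf(ncp - z_crit)

# A medium effect (d = .50) with 64 per group lands close to the
# conventional .80 benchmark, matching Cohen's tables.
print(round(approx_power_two_sample(0.50, 64), 2))
```

This also makes the paper's central point tangible: dropping to n = 20 per group with the same medium effect cuts power to roughly a coin toss.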
The results of the power analyses for AJSMS are disappointing, but not unexpected. What do these results say about research in exercise and sport? It appears that the likelihood of making Type II errors for medium effects in exercise science research, as represented by AJSMS (38% of studies have less than .80 power), is substantial, and for small effects, it is almost a certainty. Why does low
References
Christensen, J. E., & Christensen, C. E. (1977). Statistical power analysis of health, physical education, and recreation research. Research Quarterly 48: 204-208.
Cohen, J. (1962). The statistical power of abnormal-social psychological research. Journal of Abnormal and Social Psychology 65: 145-153.
Cohen, J. (1965). Some statistical issues in psychological research. In B. B. Wolman (Ed.), Handbook of Clinical Psychology (pp. 95-121). New York, McGraw-Hill.
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Hillsdale, NJ., Erlbaum.
Cohen, J. (1990). Things I have learned (so far). American Psychologist 45: 1304-1312.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist 49: 997-1003.
Dickersin, K. (1990). The existence of publication bias and risk factors for its occurrence. Journal of the American Medical Association 263: 1385-1389.
Easterbrook, P. J., Berlin, J. A., Gopalan, R., & Matthews, D. R. (1991). Publication bias in clinical research. Lancet 337: 867-872.
Franks, B. D., & Huck, S. W. (1986). Why does everyone use the .05 significance level? Research Quarterly for Exercise and Sport 57: 245-249.
Harris, R. J. (1997). Significance tests have their place. Psychological Science 8: 8-11.
Hunter, J. E. (1997). Needed: A ban on the significance test. Psychological Science 8: 3-7.
Jones, B. J., & Brewer, J. K. (1972). An analysis of the power of statistical tests reported in The Research Quarterly. The Research Quarterly 43: 23-30.
Kerr, G., & Goss, J. (1996). The effects of a stress management program on injuries and stress levels. Journal of Applied Sport Psychology 8: 109-117.
Koele, P. (1982). Calculating power in analysis of variance. Psychological Bulletin 92: 513-516.
Kroll, R. M., & Chase, L. J. (1975). Communication disorders: A power analytic assessment of recent research. Journal of Communication Disorders 8: 237-247.
Laubach, W. J., Brewer, B. W., Van Raalte, J. L., & Petitpas, A. J. (1996). Attributions for recovery and adherence to sport injury rehabilitation. The Australian Journal of Science and Medicine in Sport 28: 30-34.
Meehl, P. E. (1986). What social scientists don't understand. In D. W. Fiske & R. A. Shweder (Eds.), Metatheory in Social Science: Pluralisms and Subjectivities (pp. 315-338). Chicago, University of Chicago Press.
Payne, V. G., & Morrow, J. R., Jr. (1993). Exercise and VO2max in children: A meta-analysis. Research Quarterly for Exercise and Sport 64: 305-313.
Reed, J. F., III, & Slaichert, W. (1981). Statistical proof in inconclusive "negative" trials. Archives of Internal Medicine 141: 1307-1310.
Rosnow, R. L., & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological science. American Psychologist 44: 1276-1284.
Rozeboom, W. W. (1960). The fallacy of the null hypothesis significance test. Psychological Bulletin 57: 416-428.
Schmidt, G. J. (1995). Muscular endurance and flexibility components of the Singapore National Physical Fitness Award. The Australian Journal of Science and Medicine in Sport 27: 88-94.
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin 105: 309-316.
Speed, H. D., & Andersen, M. B. (1997). Powerless in the face of small effects: Power in sport psychology research [Abstract]. Journal of Applied Sport Psychology 9 (Suppl.): S158.
Sterling, T. D. (1959). Publication decision and the possible effects on inferences drawn from tests of significance - or vice versa. Journal of the American Statistical Association 54: 30-34.
Sterling, T. D., Rosenbaum, W. L., & Weinkam, J. J. (1995). Publication decisions revisited: The effect of the outcome of statistical tests on the decision to publish and vice versa. The American Statistician 49: 108-112.
Thomas, J. R., & French, K. E. (1985). Gender differences across age in motor performance: A meta-analysis. Psychological Bulletin 98: 260-282.
Thomas, J. R., & French, K. E. (1986). The use of meta-analysis in exercise and sport: A tutorial. Research Quarterly for Exercise and Sport 57: 196-204.
Thomas, J. R., Salazar, W., & Landers, D. M. (1991). What is missing in p < .05? Effect size. Research Quarterly for Exercise and Sport 62: 344-351.
Tran, Z. V., Weltman, A., Glass, G. V., & Mood, D. P. (1983). The effects of exercise on blood lipids and lipoproteins: A meta-analysis of studies. Medicine and Science in Sports and Exercise 15: 393-402.
Author Note
Harriet D. Speed and Mark B. Andersen, Centre for Rehabilitation, Exercise and Sport Science and the School of Human Movement, Recreation and Performance, Victoria University, Melbourne, Australia.
We would like to thank Sam Zelazo and I. J. Reilly for their helpful comments on earlier drafts of this paper. This paper received a "High Commendation" from Sports Medicine Australia at their annual conference in Adelaide in 1998.
Correspondence concerning this article should be addressed to: Harriet Speed, School of Human Movement, Recreation and Performance, Victoria University, PO Box 14428, MCMC Melbourne, Victoria 8001, Australia. Electronic mail may be sent via Internet to: [Harriet.Speed@vu.edu.au].
Appendix A
Formulas for Calculating Effect Sizes for Common Statistical Tests
t test for dependent means: Mean of the difference scores divided by the standard deviation of the difference scores:
d = M_D / S_D
t test for independent means: Mean of Group 1 minus the mean of Group 2 divided by the pooled standard deviation of Groups 1 and 2:
d = (M_1 - M_2) / S_p
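The two formulas above translate directly into code. The sketch below is ours (the function names and sample data are invented for illustration); it uses the standard pooled-variance definition of S_p.

```python
import math
from statistics import mean, stdev

def cohens_d_dependent(diff_scores):
    """d = M_D / S_D: mean of the difference scores divided by
    their standard deviation (paired/dependent means)."""
    return mean(diff_scores) / stdev(diff_scores)

def cohens_d_independent(group1, group2):
    """d = (M_1 - M_2) / S_p, with S_p the pooled standard
    deviation of the two independent groups."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = stdev(group1) ** 2, stdev(group2) ** 2
    s_p = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / s_p

# Invented example: two groups separated by 2 units with equal spread.
print(round(cohens_d_independent([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]), 2))  # -1.26
```

With the effect size in hand and the per-group n, the researcher can then enter a power table (or the approximation sketched earlier) to estimate the power of the study just completed.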