
What Exercise and Sport Scientists Don't Understand

Harriet D Speed & Mark B Andersen


Victoria University, Melbourne, Victoria, Australia.

Speed, H.D., & Andersen, M.B. (2000). What exercise and sport scientists don't understand.
Journal of Science and Medicine in Sport 3 (1): 84-92.

The power of research designs in studies published in the Australian Journal of Science
and Medicine in Sport (AJSMS; now the Journal of Science and Medicine in Sport) for the
years 1996 and 1997 was analysed for the ability to detect small, medium, and large
effects according to Cohen's (1988) conventions. Also examined were the reporting and
interpreting of effect sizes and the control of experiment-wise (EW) Type I error rates.
From the two years of articles, 29 studies were analysed, and power was computed on
108 different tests of significance. The median power of the studies to detect small,
medium, and large effects was .14, .65, and .97, respectively. These results suggest that
exercise and sport science research, at least as represented in AJSMS, is probably
underpowered and may be limited in detecting small effects, has a better, but still
underpowered, chance of detecting medium effects, and has adequate power principally
for detecting large effects. The reporting of effect sizes was rare, and adequate
interpretation of them was even rarer. The mean EW Type I error rate for all studies was
.49. The analyses conducted suggest that much research in exercise science may have
substantial Type I and Type II errors. An appeal is made for exercise scientists to conduct
power analyses, control for EW error, exercise caution in the interpretation of
nonsignificant results, and examine, report, and interpret effect sizes rather than rely
solely on p values to determine whether significant changes occurred or significant
relationships exist.

Introduction
We have taken the title of this paper from Meehl's (1986) article - now a classic -
on the misunderstandings of statistical inference in the social sciences. Our main
contention is that much research design in exercise science, and the statistical
inferences that accompany those designs, are seriously limited. Over 20 years
ago, Jones and Brewer (1972) and Christensen and Christensen (1977) power
analysed exercise science articles in the Research Quarterly for their ability to
detect small, medium, and large effects. Jones and Brewer found that the average
power of studies to detect small, medium, and large effects was .13, .50, and .78,
respectively. Christensen and Christensen's (1977) median numbers for the same
effects were .08, .32, and .69. What do these numbers mean? Essentially, they
mean that exercise science research, as reported in the Research Quarterly, is
virtually incapable of detecting small effects, has a coin toss (or worse) chance of
detecting a medium size effect, and is probably only adequate at detecting large
effects. Is detecting small effects in exercise and sport science important? In
researching elite athletes, some intervention, training regimen, or nutritional
supplement that improves times or distances by half a percent (a small effect)
could translate into quite significant results. Those significant "practical"
improvements, however, would not, in much of the current research, reach
"statistical" significance. What are the costs of a Type II error in sport and exercise
research? In the case of a small, statistically nonsignificant, but practically
important result, the cost may be that an effective treatment is deemed ineffective,
and athletes will not benefit from it in the future (and an error is entered into the
literature).
The strong and worrisome message on power from the 1970s does not appear to
have been heeded. Exercise and sport scientists, however, are in good, albeit
powerless, company. Power studies similar to Christensen and Christensen's
(1977) have been conducted in medicine (Reed & Slaichert, 1981), psychology
(Cohen, 1962; Sedlmeier & Gigerenzer, 1989), education (Brewer, 1972),
communication (Chase & Tucker, 1975), speech pathology (Kroll & Chase, 1975),
and sport psychology (Speed & Andersen, 1997). All of these studies have reported
poor power for detecting small and medium effects. A major feature of low power
is its ubiquity. Meta-analyses in the sport sciences have reported a large range of
effect sizes. For example, Thomas and French (1985) reported effect sizes between
.09 and 1.98 in their meta-analysis of studies on gender differences across age in
a range of motor performance tasks. Payne and Morrow (1993) reported effect
sizes in the range .01 to 3.52 in studies of exercise and VO2max in children, and
Tran, Weltman, Glass and Mood (1983) reported effect sizes between .01 and .89
in studies on the effects of exercise on blood lipids and lipoproteins. Thus, effect
sizes in sport science research vary from trivially small to extremely large.
Previous power studies in the social and biomedical sciences, along with meta-
analyses in exercise science, suggest that research in these fields may contain
substantial Type II errors because many research designs examined are not
powerful enough to detect medium or small effects. This powerless state of affairs
has several ramifications. The first major problem is in the common pre-test for
differences. It is usual in exercise science to check statistically whether the control
and the experimental group are equivalent on some measures. This "equivalence"
is sometimes tested by a simple t test for independent means on the variables in
question. With small ns, however (e.g., 7 in each group), the size of the
difference between groups, in order to be significantly different, would have to be
large, or the variability within groups quite small. "Not significantly different" is
not a guarantee of equivalence. Thus, it is probable that many "equal" groups are
nothing of the sort. Such a conclusion is an error, an erroneous assumption
based on a misunderstanding of the principles of hypothesis testing (see also
Cohen, 1994). The other major issue is that saying there are "no differences" when
there actually are (Type II error) places questionable conclusions and inferences
into the exercise and sport science canon. Finally, many studies in exercise
science never make it to publication, for a variety of reasons. One of those reasons
may be that some lack sufficient power to obtain significant results. A related
problem is the publication biases inherent in the reporting of research. Nonsignificant
results rarely get published; thus, we see a restricted range of scientific results.
For further discussion of publication biases in scientific research, see Sterling
(1959), Dickersin (1990), Easterbrook, Berlin, Gopalan and Matthews (1991), and
Sterling, Rosenbaum and Weinkam (1995).


The problem of power stems directly from the problem of the need for p to be
less than .05. Andersen and Stoové (1998), along with many others (e.g., Chase &
Chase, 1976; Cohen, 1962; Hunter, 1997; Meehl, 1986; Rosnow & Rosenthal,
1989; Rozeboom, 1960; Sedlmeier & Gigerenzer, 1989), have argued that p < .05,
in many cases, has retarded the advance of the social sciences. Thomas and his
colleagues (e.g., Thomas, Salazar, & Landers, 1991) have repeatedly argued that
what is missing from exercise science research studies is the reporting of effect
sizes (along with p values). As Cohen (1990) stated:

    the primary product of research is one or more measures of effect size, not
    p values... Effect size measures include mean differences (raw or
    standardized), correlations and squared correlations of all kinds, odds ratios,
    kappas--whatever conveys the magnitude of the phenomenon of interest
    appropriate to the research context. (p. 1310)

Thomas' and Cohen's suggestions have largely gone unheeded. Occasionally
effect sizes are reported in exercise science research, but more often than not, they
are not interpreted or discussed.
Low power and the poor reporting of effect sizes are only two of several statistical
inference problems in exercise and sport science research. Many exercise science
journals publish articles where multiple statistical tests were run with no
adjustment for experiment-wise (EW) Type I error rates. An extreme example of
this problem is Schmidt (1995). He conducted 72 statistical tests, without alpha
adjustment, making his probability of a Type I error 1 - (.95)^72 = .98. Running
multiple tests with no alpha adjustment is common, and thus, studies in exercise
and sport science may also contain Type I errors.
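
The arithmetic behind that figure is easy to verify. Below is a minimal sketch, in Python (our illustration, not part of any study reviewed), of the experiment-wise error calculation and of how a simple Bonferroni adjustment would hold the rate near the nominal level:

```python
def experimentwise_alpha(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one Type I error across n_tests
    independent tests, each conducted at the per-test alpha level."""
    return 1 - (1 - alpha) ** n_tests

# Schmidt's (1995) 72 unadjusted tests: 1 - (.95)^72
print(round(experimentwise_alpha(72), 2))              # 0.98

# A Bonferroni adjustment (alpha / n_tests per test) would hold the
# experiment-wise rate near the nominal .05, at a cost in per-test power.
print(round(experimentwise_alpha(72, 0.05 / 72), 3))   # ~0.049
```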
Low power leads to Type II errors, but what is even more worrisome is when
questionable, and in many cases probably false, judgements of "no difference" are
translated into recommendations for training or diet or supplementation.
Misunderstandings of what statistics are revealing can lead to bad, or possibly
even dangerous, advice. "Not significant" does not equal "no difference." In many
cases it means "not enough power."
Our questions were "What is the current state of affairs in exercise science, as
reported in AJSMS, in terms of power and the reporting and the interpretation of
effect sizes?" and "What are the range and average of EW Type I error rates in
AJSMS?"
Method
Procedures
Volumes 28 and 29 of AJSMS served as the database for this study. Articles that
contained inferential statistics were analysed for their power to detect small,
medium, and large effects at an alpha level of .05 using Cohen's (1988)
conventions (see Table 1). We recorded how often effect sizes were reported and
whether they were interpreted. We also calculated EW Type I error rates for each
study containing inferential statistics. These eight issues of AJSMS contained a
total of 40 articles, 31 of which included at least one significance test. The
remaining nine articles were review papers or descriptive studies. Of those 31
articles, 2 were omitted from power analyses because they included statistical
tests for which no power tables are currently available (e.g., structural equation
modelling, factor analysis).

                                   Effect Size
Statistical Test             Small    Medium    Large
Chi-square (w)               .10      .30       .50
t test (d)                   .20      .50       .80
Correlation (r)              .10      .30       .50
ANOVA/ANCOVA (f)             .10      .25       .40
Multiple Regression (f²)     .02      .15       .35
Note: Source - Cohen (1988).

Table 1: Effect size conventions for the statistical tests analysed in the Australian Journal of Science and Medicine
in Sport, 1996-1997.
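
To illustrate how these conventions feed into power calculations, here is a brief sketch using the statsmodels Python package (our choice of tool; any power tables or software would do) for an independent-means t test with hypothetical groups of 15:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of a two-group design (n = 15 per group, alpha = .05) at
# Cohen's (1988) small, medium, and large values of d.
for label, d in [("small", 0.20), ("medium", 0.50), ("large", 0.80)]:
    power = analysis.power(effect_size=d, nobs1=15, alpha=0.05,
                           ratio=1.0, alternative="two-sided")
    print(f"{label:6s} d = {d:.2f}  power = {power:.2f}")

# Sample size per group needed for .80 power at a medium effect.
n = analysis.solve_power(effect_size=0.50, power=0.80, alpha=0.05)
print(f"n per group for .80 power at d = .50: {n:.0f}")  # ~64
```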

For the purpose of this study, we set standard conditions under which to compute
the power of each test. The experiment served as the unit of analysis. Different
experiments within the same articles were treated as separate units if they used
different samples of participants or comprised different degrees of freedom. For
example, where separate ANOVAs were conducted on each of six dependent
measures for the same sample, the analyses were treated as one in the
determination of power. Nonparametric tests were treated like their parametric
equivalents; all factors, fixed and random, were treated as fixed effects. Where two
groups had unequal ns, the harmonic mean was calculated to determine power.
Where three or more cell sizes were unequal, the power analyses were conducted
using the average group or cell size (Cohen, 1988). The effect of these conditions
on calculations of the power of a test is usually an overestimation of power
(Cohen, 1965; Koele, 1982). In cases where details about the design were lacking
or unclear, the determination of power was in favour of a higher estimate (e.g.,
equal cell or group sample sizes were assumed where details were uncertain).
Thus, we were liberal in the estimation of power, and our results may paint a
better picture of power in AJSMS than actually exists. For repeated measures
designs, the power of statistical tests (i.e., t tests and ANOVAs) was determined
for effect sizes multiplied by √2, to compensate for an overestimation of error
variance by conventional power tables (see Cohen, 1988, pp. 46-47). For small,
medium, and large effects, we chose Cohen's (1988) conventions for the
behavioural sciences. For example, a Cohen's d of .20 is a small effect and
represents a between-group difference of approximately 8 percentile points in
standardised terms. Note, however, that this small effect in the behavioural
sciences may actually represent a medium or even large effect in exercise
physiology or biomechanics. For example, an 8 percentile point change in some
blood chemistry parameter (e.g., haematocrit) would be dramatic. Thus, once
again we have probably erred on the side of painting a better picture of reported
exercise science research than actually exists.
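
As a concrete illustration of two of the conditions just described, the following sketch (Python; all numbers hypothetical) computes the harmonic mean for two unequal groups and applies the √2 multiplier to a repeated measures effect size:

```python
import math

def harmonic_mean_n(n1: int, n2: int) -> float:
    """Effective per-group n used in power tables when two groups
    have unequal sizes: 2 * n1 * n2 / (n1 + n2)."""
    return 2 * n1 * n2 / (n1 + n2)

# Groups of 8 and 14 behave, for power purposes, like ~10.2 per group.
print(round(harmonic_mean_n(8, 14), 1))

def repeated_measures_effect(d: float) -> float:
    """Effect size entered into conventional power tables for a
    repeated measures test: d multiplied by sqrt(2)
    (see Cohen, 1988, pp. 46-47)."""
    return d * math.sqrt(2)

# A medium within-subject effect of d = .50 is entered as ~0.71.
print(round(repeated_measures_effect(0.50), 2))
```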
For each article where EW error was not controlled, the number of statistical
tests performed was counted. Post hoc analyses that contained EW error control
were not included. The formula 1 - (.95)^N = alpha was calculated for each article.
The mean and range of actual alpha levels are presented in the Results section.
Presentations and interpretations of effect sizes in all articles, where they
appeared, were noted.

Results
Power
The analyses presented here are based on 108 significance tests from 29 articles,
and include t tests, ANOVAs, ANCOVAs, correlations, multiple regressions, and
chi-square tests. Of all the articles in the two volumes, no authors determined a
priori the power of their research. Only two studies mentioned power, and one was
a review article.
The distribution of statistical power of the 108 tests analysed for the ability to
detect small, medium, and large effects is summarised in Table 2. Because a small
number of experiments had (relatively) high power for small effect sizes (i.e., the
distribution of power had a large positive skew) or very low power for large effect
sizes (substantial negative skew), we considered the medians rather than the
means to represent the central tendency of power. The median power for small,
medium, and large effects was .14, .65, and .97, respectively. Given Cohen's (1988)
suggestion for a minimum for adequate power (i.e., .80 or higher), no studies had
adequate power to detect small effects, only 38% of studies had the suggested
adequate power to detect medium effects, and approximately 75% of studies had
.80 power or better to detect large effects.

                                Effect size
                   Small            Medium            Large
Power           Freq.  Cum.%     Freq.  Cum.%     Freq.  Cum.%
.95-.99                           18     100       57     100
.90-.94                            8      83       11      47
.80-.89                           15      76       13      37
.70-.79                           12      62        7      25
.60-.69           1     100        2      51        3      19
.50-.59           4      99       15      49        8      16
.40-.49           1      95       11      35        2       8
.30-.39           6      94       10      25        6       6
.20-.29          19      89        9      16        1       1
.10-.19          45      71        8       7
.05-.09          32      30
Median Power    .14              .65               .97
Mean Power      .17              .63               .85

Table 2: The power to detect small, medium, and large effects of studies published in the 1996-1997 volumes of
the Australian Journal of Science and Medicine in Sport, N = 108.

Effect Size
Out of all the studies, effect sizes were reported four times, and only two of those
four times were they interpreted.
Experiment-Wise Error
Only two of the 29 studies controlled for EW Type I error. For the rest of the
studies, the EW error rate could be, in many cases, only approximately determined
because of poor reporting of the statistical tests conducted and the unclear nature
of many post hoc comparisons. Thus, the results reported here are only
approximate. The median number of statistical tests performed without alpha
adjustment for the 29 articles was 12, with a range from 1 to 66. With a single
statistical test performed, the alpha level is, of course, .05. With 13 unadjusted
tests, the alpha level rises to .49, and with 66 tests the alpha level is .97. Thus,
for approximately half the studies, the probability of making a Type I error was
greater than .50, and for a few studies, the probability of such an error was a near
certainty.
Discussion
In Christensen and Christensen's (1977) study, the median power of significance
tests was .08, .32, and .69 for small, medium, and large effect sizes, respectively.
More than two decades later, the situation has not improved much for the
detection of small effects. There appears to be improvement over the 1977 study
in power for detecting medium and large effects, but one must consider that our
power estimates are most likely overestimates. Nonetheless, the power of exercise
and sport science research still seems substandard. We chose the study by
Christensen and Christensen for a comparison because the nature of the articles
published in Research Quarterly is closer to AJSMS articles than that of any other
power studies previously conducted. Thus, the conclusions on power in studies
in Research Quarterly and AJSMS are only suggestive and need to be viewed as a
best available comparison.
The state of reporting power and effect sizes in exercise science research in
AJSMS appears extremely limited. Only a few articles reported a class of major
statistics (i.e., effect sizes) that tells as much (actually more) about the results as
p values, and half of those did not attempt to explain what the effect sizes might
mean. Given the history of the arguments about power and effect sizes in exercise
science (e.g., Christensen & Christensen, 1977; Thomas & French, 1986;
Thomas, Salazar, & Landers, 1991), it is obvious that many contributors to
AJSMS do not understand the basic statistical concepts of Type I and II errors,
power, and the importance of effect sizes. In the world of research, they are in
good company. These "understanding" problems are omnipresent in the
behavioural and biomedical sciences. These misunderstandings are further
complicated by misinterpretation of the logic of hypothesis testing and the
assumptions underlying statistical tests. This lack of understanding can lead to
flawed inferences not warranted by the data and statistical results.
Calculating power after a study is completed is not always, at first glance, a
simple, straightforward process, and in all cases, an attempt to estimate power
before the study would be preferable. Calculating effect sizes, however, is
relatively easy. Once effect sizes are calculated, the researcher can use the sample
size and the effect size and consult power tables for an estimation of the power of
the study just completed. The formulas for common effect sizes are contained in
Appendix A.
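
For example, a researcher who has just run an independent-means t test can compute d from the group summaries (Appendix A) and then read the achieved power from tables, or from software, as in this sketch (Python with the statsmodels package assumed; all numbers hypothetical):

```python
import math
from statsmodels.stats.power import TTestIndPower

# Hypothetical summaries from a completed two-group study.
m1, s1, n1 = 34.2, 6.1, 12
m2, s2, n2 = 30.9, 6.8, 12

# Pooled standard deviation and Cohen's d (Appendix A formulas).
sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = (m1 - m2) / sp
print(f"d = {d:.2f}")          # ~0.51, a medium effect

# Achieved power of the study just completed.
power = TTestIndPower().power(effect_size=d, nobs1=n1, alpha=0.05,
                              ratio=n2 / n1)
print(f"power = {power:.2f}")  # ~0.22: nonsignificance was the likely outcome
```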
The results of the power analyses for AJSMS are disappointing, but not
unexpected. What do these results say about research in exercise and sport? It
appears that the likelihood of making Type II errors for medium effects in exercise
science research, as represented by AJSMS (only 38% of studies have .80 power
or better), is substantial, and for small effects, it is almost a certainty. Why does low
power plague exercise and sport science research? Is it primarily a matter of
tradition and inertia? Why should we expect exercise and sport science research
to be any different from other behavioural and biomedical sciences?
So what should be done? Our message would not be to abandon significance
testing and p < .05, as some would suggest (e.g., Hunter, 1997). There are plenty
of good reasons to report p values (cf. Abelson, 1997; Franks & Huck, 1986;
Harris, 1997), and what we would suggest to reviewers and editors of exercise and
sport science journals is that, along with reporting p values, they require
researchers to report and interpret effect sizes (cf. Andersen & Stoové, 1998;
Thomas, Salazar, & Landers, 1991). The tripartite reporting of significance levels,
effect sizes, and power (especially in the case of nonsignificant results) will give
readers of journal articles a much more comprehensive picture of the results and
a broader base of information for researchers (and journal subscribers) to
interpret exercise and sport science findings. It is highly likely that many past
studies in sport and exercise research have an abundance of Type I and Type II
errors. For example, in the Laubach, Brewer, Van Raalte and Petitpas (1996)
article, a psychological variable (causal dimension) was not a statistically
significant predictor of attendance at rehabilitation sessions, but the R² was .24!
In the behavioural sciences, most researchers would be ecstatic to account for 24%
of the variance in a behavioural measure. The authors discussed possible reasons
there was no significance but did not consider that .24 is in the very large range
of effect sizes and that low power might have been a problem. Rarely did any
researcher in the two volumes examined make a conservative interpretation of a
nonsignificant result (e.g., "we did not detect a difference, but we may not have
had sensitive enough measures, strong enough treatments, or enough statistical
power to do so"). Many researchers interpreted nonsignificance as "no difference".
Once again, as stated above, a "no difference" judgement is a misunderstanding
of the logic of hypothesis testing, leading to possibly inaccurate conclusions.
The good news from this study, however, is that probably many past studies in
sport and exercise science research could benefit from replication with much more
powerful designs. This situation could be a boon for graduate students looking for
thesis topics.
The bad news is that the issue of power still receives little attention, and even
though effect sizes are increasingly reported, few authors have interpreted what
the effect sizes might indicate (e.g., Kerr & Goss, 1996). For example, Laubach et
al. (1996) reported large to very large effect sizes, but they never discussed what
those large effects might indicate. We hope, in the future, researchers will provide
the numbers for the p values, the effect sizes, and the power of their studies (along
with controlling for EW error) and offer a discussion of what it all means.
References
Abelson, R. P. (1997). On the surprising longevity of flogged horses: Why there is a case for the
significance test. Psychological Science 8: 12-15.
Andersen, M. B., & Stoové, M. A. (1998). The sanctity of p < .05 obfuscates good stuff: A comment
on Kerr and Goss. Journal of Applied Sport Psychology 10: 168-173.
Brewer, J. K. (1972). On the power of statistical tests in the American Educational Research
Journal. American Educational Research Journal 9: 391-401.
Chase, L. J., & Chase, R. B. (1976). A statistical power analysis of applied psychological research.
Journal of Applied Psychology 61: 234-237.
Chase, L. J., & Tucker, R. T. (1975). A power-analytic examination of contemporary
communication research. Speech Monographs 42: 29-41.
Christensen, J. E., & Christensen, C. E. (1977). Statistical power analysis of health, physical
education, and recreation research. Research Quarterly 48: 204-208.
Cohen, J. (1962). The statistical power of abnormal-social psychological research. Journal of
Abnormal and Social Psychology 65: 145-153.
Cohen, J. (1965). Some statistical issues in psychological research. In B. B. Wolman (Ed.),
Handbook of Clinical Psychology (pp. 95-121). New York, McGraw-Hill.
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Hillsdale,
NJ, Erlbaum.
Cohen, J. (1990). Things I have learned (so far). American Psychologist 45: 1304-1312.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist 49: 997-1003.
Dickersin, K. (1990). The existence of publication bias and risk factors for its occurrence. Journal
of the American Medical Association 263: 1385-1389.
Easterbrook, P. J., Berlin, J. A., Gopalan, R., & Matthews, D. R. (1991). Publication bias in clinical
research. Lancet 337: 867-872.
Franks, B. D., & Huck, S. W. (1986). Why does everyone use the .05 significance level? Research
Quarterly for Exercise and Sport 57: 245-249.
Harris, R. J. (1997). Significance tests have their place. Psychological Science 8: 8-11.
Hunter, J. E. (1997). Needed: A ban on the significance test. Psychological Science 8: 3-7.
Jones, B. J., & Brewer, J. K. (1972). An analysis of the power of statistical tests reported in The
Research Quarterly. The Research Quarterly 43: 23-30.
Kerr, G., & Goss, J. (1996). The effects of a stress management program on injuries and stress
levels. Journal of Applied Sport Psychology 8: 109-117.
Koele, P. (1982). Calculating power in analysis of variance. Psychological Bulletin 92: 513-516.
Kroll, R. M., & Chase, L. J. (1975). Communication disorders: A power analytic assessment of
recent research. Journal of Communication Disorders 8: 237-247.
Laubach, W. J., Brewer, B. W., Van Raalte, J. L., & Petitpas, A. J. (1996). Attributions for recovery
and adherence to sport injury rehabilitation. The Australian Journal of Science and
Medicine in Sport 28: 30-34.
Meehl, P. E. (1986). What social scientists don't understand. In D. W. Fiske & R. A. Shweder
(Eds.), Metatheory in Social Science: Pluralisms and Subjectivities (pp. 315-338).
Chicago, University of Chicago Press.
Payne, V. G., & Morrow, J. R., Jr. (1993). Exercise and VO2max in children: A meta-analysis.
Research Quarterly for Exercise and Sport 64: 305-313.
Reed, J. F., III, & Slaichert, W. (1981). Statistical proof in inconclusive "negative" trials. Archives
of Internal Medicine 141: 1307-1310.
Rosnow, R. L., & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge
in psychological science. American Psychologist 44: 1276-1284.
Rozeboom, W. W. (1960). The fallacy of the null hypothesis significance test. Psychological
Bulletin 57: 416-428.
Schmidt, G. J. (1995). Muscular endurance and flexibility components of the Singapore National
Physical Fitness Award. The Australian Journal of Science and Medicine in Sport 27: 88-94.
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power
of studies? Psychological Bulletin 105: 309-316.
Speed, H. D., & Andersen, M. B. (1997). Powerless in the face of small effects: Power in sport
psychology research [Abstract]. Journal of Applied Sport Psychology 9 (Suppl.): S158.
Sterling, T. D. (1959). Publication decisions and the possible effects on inferences drawn from tests
of significance - or vice versa. Journal of the American Statistical Association 54: 30-34.
Sterling, T. D., Rosenbaum, W. L., & Weinkam, J. J. (1995). Publication decisions revisited: The effect
of the outcome of statistical tests on the decision to publish and vice versa. The American
Statistician 49: 108-112.
Thomas, J. R., & French, K. E. (1985). Gender differences across age in motor performance: A
meta-analysis. Psychological Bulletin 98: 260-282.
Thomas, J. R., & French, K. E. (1986). The use of meta-analysis in exercise and sport: A tutorial.
Research Quarterly for Exercise and Sport 57: 196-204.
Thomas, J. R., Salazar, W., & Landers, D. M. (1991). What is missing in p < .05? Effect size.
Research Quarterly for Exercise and Sport 62: 344-351.
Tran, Z. V., Weltman, A., Glass, G. V., & Mood, D. P. (1983). The effects of exercise on blood lipids
and lipoproteins: A meta-analysis of studies. Medicine and Science in Sports and Exercise
15: 393-402.


Author Note
Harriet D. Speed and Mark B. Andersen, Centre for Rehabilitation, Exercise and
Sport Science and the School of Human Movement, Recreation and Performance,
Victoria University, Melbourne, Australia.
We would like to thank Sam Zelazo and I. J. Reilly for their helpful comments
on earlier drafts of this paper. This paper received a "High Commendation" from
Sports Medicine Australia at their annual conference in Adelaide in 1998.
Correspondence concerning this article should be addressed to: Harriet Speed,
School of Human Movement, Recreation and Performance, Victoria University, PO
Box 14428, MCMC Melbourne, Victoria 8001, Australia. Electronic mail may be
sent via Internet to: Harriet.Speed@vu.edu.au.
Appendix A
Formulas for Calculating Effect Sizes for Common Statistical Tests

t test for dependent means: Mean of the difference scores divided by the standard
deviation of the difference scores:

d = M_D / S_D

t test for independent means: Mean of Group 1 minus the mean of Group 2
divided by the pooled standard deviation of Groups 1 and 2:

d = (M_1 - M_2) / S_p

One-way analysis of variance (ANOVA): Amount of variance accounted for by
group membership, sum of squares between divided by sum of squares total:

R² = SS_B / SS_T, or Cohen's f = √F / √n

where all cells in the ANOVA have equal ns.

Two-way ANOVA: Amount of variance accounted for by the main effects (R = rows,
C = columns) and the interaction (I), equivalent to eta squared (η²):

R²_R = SS_R / (SS_T - SS_C - SS_I)
R²_C = SS_C / (SS_T - SS_R - SS_I)
R²_I = SS_I / (SS_T - SS_R - SS_C)

Chi-square 2 x 2 contingency table, the phi coefficient: The square root of χ²
divided by the number of cases in the entire sample:

φ = √(χ² / N)

Chi-square contingency tables larger than 2 x 2, Cramer's phi coefficient: The
square root of χ² divided by N times the degrees of freedom of the smaller
dimension:

Cramer's φ′ = √(χ² / (N × df_s))
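
For readers who prefer code to hand calculation, the appendix formulas translate directly; a short sketch in Python (our addition, with variable names mirroring the notation above):

```python
import math

def d_dependent(m_d: float, s_d: float) -> float:
    """t test for dependent means: d = M_D / S_D."""
    return m_d / s_d

def d_independent(m1: float, m2: float, s_p: float) -> float:
    """t test for independent means: d = (M_1 - M_2) / S_p."""
    return (m1 - m2) / s_p

def r_squared_oneway(ss_b: float, ss_t: float) -> float:
    """One-way ANOVA: R^2 = SS_B / SS_T."""
    return ss_b / ss_t

def cohens_f(F: float, n: float) -> float:
    """Cohen's f from the F statistic, equal cell ns: f = sqrt(F) / sqrt(n)."""
    return math.sqrt(F / n)

def phi(chi2: float, n: int) -> float:
    """Phi for a 2 x 2 contingency table: sqrt(chi^2 / N)."""
    return math.sqrt(chi2 / n)

def cramers_phi(chi2: float, n: int, df_s: int) -> float:
    """Cramer's phi for larger tables: sqrt(chi^2 / (N * df_s)),
    where df_s is the df of the smaller table dimension."""
    return math.sqrt(chi2 / (n * df_s))
```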

