Saville 1990
Author(s): D. J. Saville
Source: The American Statistician, Vol. 44, No. 2 (May, 1990), pp. 174-180
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2684163 .
Accessed: 17/06/2014 13:51
A practicing statistician looks at the multiple comparison controversy and related issues through the eyes of the users. The concept of consistency is introduced and discussed in relation to five of the more common multiple comparison procedures. All of the procedures are found to be inconsistent except the simplest procedure, the unrestricted least significant difference (LSD) procedure (or multiple t test). For this and other reasons the unrestricted LSD procedure is recommended for general use, with the proviso that it should be viewed as a hypothesis generator rather than as a method for simultaneous hypothesis generation and testing. The implications for Scheffé's test for general contrasts are also discussed, and a new recommendation is made.

KEY WORDS: Comparisonwise error rate; Duncan's multiple range test; Experimentwise error rate; Power; Teaching of statistics; Tukey's honest significant difference procedure; Waller-Duncan k-ratio test.

1. INTRODUCTION

Procedures such as the least significant difference (LSD) test, Duncan's multiple range test, and Waller and Duncan's k-ratio LSD test are used by applied researchers for the analysis of data from experiments in which the treatments have no easily definable structure. These and similar procedures, referred to generally as multiple comparison procedures, have been the subject of controversy for over half a century.

Many statisticians dislike the idea of simultaneously testing a multitude of unplanned and interrelated hypotheses, and some question the usefulness of multiple comparison procedures (e.g., Nelder 1971; Plackett 1971; Preece 1982). In spite of this, multiple comparison procedures continue to be widely used and sometimes misused by researchers. Their misuse in experiments in which the treatments possess a factorial structure, or are quantitative in nature, has been highlighted by many excellent papers in applied science journals (e.g., Chew 1980; Cousens 1988; Little 1978; Perry 1986; Petersen 1977). These writers have rightly pointed out the greater appropriateness of the methods of orthogonal contrasts and regression analysis for the analysis of data from such experiments.

In most research work, the objectives are sufficiently well defined that the questions of interest can be answered using appropriate contrasts. In a minority of cases, such as pesticide screening trials or cultivar evaluation trials, a least significant difference is useful as a yardstick with which to assess the strength of evidence for any particular pairwise difference. In these cases I recommend usage of the simplest multiple comparison procedure, the multiple t test, or unrestricted LSD procedure.

The purpose of this article is to outline some of the reasoning behind this recommendation. Particular attention is paid to the "inconsistency" of the alternatives to the unrestricted LSD procedure. This refers to the fact that a given procedure can return a verdict of not significant for a given difference in one experiment, but return a verdict of 1% significant for the same difference in a second experiment, with no change in the standard error of the difference or the number of error degrees of freedom.

The format of the article is to define the unrestricted LSD procedure (Sec. 2), set the scene by discussing its practical usage (Sec. 3), and discuss the common objections to its use (Sec. 4). The notion of inconsistency is then introduced and discussed, with examples of the inconsistency of Fisher's restricted LSD procedure, Tukey's honest significant difference (HSD) procedure, and Waller and Duncan's k-ratio LSD procedure (Sec. 5). In a discussion (Sec. 6), three main points are covered. First, the advantages of the unrestricted LSD procedure are summarized. Second, the distinction between hypothesis generation and testing is highlighted, leading to the suggestion that multiple comparison procedures should be treated as hypothesis generators rather than simultaneous generators and testers. Third, the implications for the analysis of general, unplanned contrasts are spelled out, leading to a recommendation of the usual F or t test instead of the inconsistent Scheffé test. To summarize, the "practical solution" is given in Section 7.

*D. J. Saville is Biometrician, Ministry of Agriculture and Fisheries, P.O. Box 24, Lincoln, Canterbury, New Zealand. The author is grateful to F. Jackson Hills for stimulating the work and to Peter Heffernan, Harold Henderson, Mike Ryan, Chris Dyson, and Karen Baird for constructive criticism. Samuel G. Carmer is also thanked for his early encouragement and for assistance with the preparation of this article, including the computation of the equivalent significance levels in Section 5. The ideas in the article were presented at the thirteenth International Biometric Conference in Seattle in 1986.
Treatment                                                         Wheat yield (tonnes/hectare)
1. Control                                                        5.2
2. Coded chemical, half-normal rate (.5N)                         7.7
3. Coded chemical, normal rate (N)                                8.8
4. Coded chemical, twice-normal rate (2N)                         8.9
5. Standard chemical, normal rate (N)                             8.3
LSD(5%)                                                           .9

Contrasts                                                         Significance
Control versus treated (treatment 1 versus treatments 2-5)        ** (a)
Linear trend within coded chemical (treatments 2-4)               * (b)
Coded (N) versus standard (N) (treatment 3 versus treatment 5)    Not significant

NOTE: Treatment means, LSD(5%), and the significance of the contrasts of interest are presented for an herbicide experiment conducted using a randomized complete block design with four replicates.
(a) 1% significance.
(b) 5% significance.
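
The contrast verdicts in the table can be roughly cross-checked from the quoted LSD(5%) alone. The sketch below is illustrative only and rests on several assumptions that are not stated in the table: that the error degrees of freedom are (5 - 1)(4 - 1) = 12 (the usual value for a randomized complete block design with five treatments and four replicates), that the two-sided t critical values for 12 degrees of freedom are 2.179 (5%) and 3.055 (1%) as given in standard tables, and that the linear trend uses coefficients (-1, 0, 1), which treats the rates .5N, N, and 2N as equally spaced on a log scale. Because the LSD is rounded to one decimal place, the t statistics are approximate.

```python
import math

# Treatment means from the table (tonnes/hectare), treatments 1-5.
means = [5.2, 7.7, 8.8, 8.9, 8.3]
lsd_5 = 0.9  # LSD(5%) as quoted in the table (rounded)

# ASSUMPTIONS (not given in the table): 12 error df, and tabulated
# two-sided t critical values for 12 df.
T_CRIT_5 = 2.179
T_CRIT_1 = 3.055

# LSD(5%) = t(5%) * SED, so the SED can be backed out of the LSD.
sed = lsd_5 / T_CRIT_5


def contrast_t(coeffs):
    """t statistic for a contrast of treatment means.

    Var(contrast) = sum(c^2) * s^2 / r and SED^2 = 2 * s^2 / r,
    hence SE(contrast) = SED * sqrt(sum(c^2) / 2).
    """
    estimate = sum(c * m for c, m in zip(coeffs, means))
    se = sed * math.sqrt(sum(c * c for c in coeffs) / 2)
    return estimate / se


def verdict(t):
    """Map a t statistic to the significance labels used in the table."""
    if abs(t) >= T_CRIT_1:
        return "1%"
    if abs(t) >= T_CRIT_5:
        return "5%"
    return "ns"


# Hypothetical coefficient choices for the three contrasts of interest.
contrasts = {
    "control vs treated": (-4, 1, 1, 1, 1),
    "linear trend (coded)": (0, -1, 0, 1, 0),  # assumes log-spaced rates
    "coded N vs standard N": (0, 0, 1, 0, -1),
}

results = {name: verdict(contrast_t(c)) for name, c in contrasts.items()}

for name, label in results.items():
    print(f"{name}: {label}")
```

Under these assumptions the three verdicts agree with those shown in the table.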
in common to both differences and 0 for pairs of differences with no treatment in common. The second answer is that for an experiment involving k treatments, the proportion of correlated pairs of pairwise differences is 4/(k + 1), which decreases rapidly as k increases. Hence, when there are 7 treatments, half of the pairs of pairwise differences are correlated, but when there are 39 treatments only 10% of the pairs are correlated.

Does the unrestricted LSD procedure allow more correlated comparisons than the alternative procedures? The answer is no! All procedures are equal in this respect. This is because the problem of correlation can only be solved at the design stage, by prespecifying orthogonal contrasts corresponding to questions of interest to the researcher. At the analysis stage, the best that the data analyst can do is to bear in mind the fact that each comparison is correlated to certain other comparisons, so Type I errors tend to occur in bunches, being more frequent than usual in some experiments but less frequent than usual in other experiments.

in general there is more opportunity for Type II errors to occur than Type I errors. This means that from a practical viewpoint it is desirable to put more weight on minimizing Type II errors than Type I errors. Since most of the alternative procedures have a higher Type II error rate than the unrestricted LSD procedure (Carmer and Swanson 1971, 1973), this points to the unrestricted LSD procedure as the procedure of choice.

The preoccupation with Type I errors among theoretical statisticians presumably arises from the comparative mathematical simplicity of the null case $\mu_1 = \mu_2 = \cdots = \mu_k$, which has led to a predominance of work on this case at the expense of equally important alternative hypotheses.

The third objection sometimes raised is that the unrestricted LSD procedure is wrongly based on a comparisonwise Type I error rate and should instead be based on an experimentwise Type I error rate. The problem here is that holding the experimentwise Type I error rate constant, say at 5%, causes a rapid increase in the probability of Type II
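
The proportion 4/(k + 1) quoted in this section for correlated pairs of pairwise differences can be checked by brute-force enumeration. The sketch below is only an illustration: the function name is hypothetical, and it treats two pairwise differences as correlated exactly when they share a treatment, which is how the text describes them.

```python
from fractions import Fraction
from itertools import combinations


def correlated_proportion(k):
    """Proportion of pairs of pairwise differences that share a treatment.

    Requires k >= 3, since with fewer treatments there is at most one
    pairwise difference and therefore no pair of differences to examine.
    """
    if k < 3:
        raise ValueError("need at least 3 treatments")
    differences = list(combinations(range(k), 2))   # all pairwise differences
    pairs = list(combinations(differences, 2))      # all pairs of differences
    shared = sum(1 for d1, d2 in pairs if set(d1) & set(d2))
    return Fraction(shared, len(pairs))


for k in (7, 39):
    print(k, correlated_proportion(k), Fraction(4, k + 1))
```

The enumeration agrees with the closed form because there are k(k - 1)(k - 2)/2 pairs of differences sharing a treatment out of k(k - 1)(k - 2)(k + 1)/8 pairs in total.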
[Figure panels not recoverable from the extraction. Surviving labels: researcher names John, Mary, Skip, and Dave; LSD values 3.07, 3.01 (5%), 3.87, and ∞; axis ticks at 20 and 25; treatment labels A and B.]

Figure 2. Significance of the Difference Between Populations Receiving Treatments A and B in Three Studies Statistically Analyzed Using the Restricted LSD Procedure. ns is not significant, a single asterisk denotes 5% significance, and a double asterisk denotes 1% significance.

Figure 3. Significance of the Difference Between Populations Receiving Treatments A and B in Three Studies Statistically Analyzed Using Waller and Duncan's k-Ratio LSD Procedure (and their tabular values). A k ratio of 100 was used to derive LSD(5%), which is shown as a horizontal bar. ns is not significant, a single asterisk denotes 5% significance, and a double asterisk denotes 1% significance.
5.4 Waller and Duncan's Procedure

In Waller and Duncan's k-ratio LSD procedure (Waller and Duncan 1969) the overall F value is used in calculating the LSD. If the overall F value is large the calculated LSD is smaller than if the F value is small. A k-ratio of 100 approximates a 5%-level test and a k-ratio of 500 approximates a 1%-level test. In general, critical values must be obtained from tables (Waller and Duncan 1969). It is informative, however, to note that for large experiments, with ≥15 treatments and ≥30 degrees of freedom for error, the LSD(5%) = LSD(k = 100) can be approximated by $1.72 \times \sqrt{F/(F - 1)} \times \mathrm{SED}$ (Duncan 1965). In this expression, as $F \to \infty$ the LSD(5%) approaches 1.72 × SED, and as $F \to 1$ the LSD(5%) approaches ∞.

In Figure 3 another three colleagues (Harry, John, and Skip) have each conducted experiments with 7 treatments and 20 replicates, laid out in a completely randomized fashion. Each researcher included a common pair of treatments, A and B, and each observed a difference of 3.6 between treatments A and B. The SEM was 1.00, so the SED was 1.414, and the number of error degrees of freedom was 133 in all cases.

In this hypothetical example the statistician advises his or her colleagues to use Waller and Duncan's procedure as a method of analysis, using a k ratio of 100 to approximate a 5%-level test and a k ratio of 500 to approximate a 1%-level test. As shown in Figure 3, the same inconsistency arises as with the other two procedures. The explanation here is that the critical value depends on the overall F value; small F values mean large critical values, and large F values mean small critical values (in Fig. 3 the F values are 25.25, 3.01, and 1.40).

This procedure is regarded by the reviewers mentioned in the last section as one of the better procedures. In agricultural research it has received some acceptance as a successor to Duncan's multiple range test.

5.5 Duncan's Multiple Range Test

Duncan's multiple range test (Duncan 1955) arises from two modifications to Tukey's HSD procedure. These are (a) to use the critical value for the distribution of the range for the number of treatments, p, in a reduced group, not the total number of treatments, and (b) to vary the protection level with the number of treatments so that it is consistent with the structured case in which (p - 1) orthogonal contrasts can be specified. Critical values are tabulated in many statistical textbooks; these values inflate quite slowly as the group size (p) increases, so in practice the procedure often yields results similar to those obtained from the unrestricted LSD procedure.

With Duncan's multiple range test it is more difficult to construct examples as extreme as those shown in Figures 1-3; these do exist, but only for relatively large experiments. This indicates that Duncan's multiple range test is less inconsistent than the three preceding procedures.

5.6 Unrestricted LSD Procedure

In each of Figures 1-3 the unrestricted LSD procedure is entirely consistent in the verdict it returns as to whether the populations receiving treatments A and B have different means. In fact, the definition of the word consistent means that the unrestricted LSD is always a consistent procedure and is the only consistent procedure.

5.7 Comparison of Procedures

In Figures 1-3 the decision returned by a particular procedure has been shown to vary from experiment to experiment, with no change in the observed difference, the SED, or the number of error degrees of freedom. Similar examples can be constructed for all procedures except the unrestricted LSD procedure. Some procedures, however, are more inconsistent than others.

Tukey's procedure is the most inconsistent of the alternative procedures mentioned in this article. With this procedure the significance can vary from not significant to .1% significant in studies of only modest size. In Figure 1, the quoted HSD(5%) values are equivalent to the following LSD values: HSD(Alison) = LSD(5%); HSD(Sue) = LSD(1.1%); HSD(Graeme) = LSD(.3%). In other words, each researcher has in effect used the unrestricted LSD procedure but has also allowed the number of treatments to select the significance level. Tukey's procedure, however, is at least predictable in its inconsistency, since the HSD depends simply on the number of treatments included in the
Characteristics           Unrestricted LSD procedure   Alternative procedures
Consistency?              Consistent                   Inconsistent
Simplicity?               Simple                       More complex
Flexibility?              Flexible                     Less flexible
Type I error rate?        Constant                     Variable
Power?                    Maximum power                Lower power
Required sample size?     Easy to calculate            Hard to calculate
Easy to check?            Easily checked               Harder to check
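
The dependence of Waller and Duncan's LSD on the overall F value (Section 5.4) can be illustrated with the large-experiment approximation 1.72 × sqrt(F/(F - 1)) × SED quoted there. The sketch below is not the tabular procedure: the function name is hypothetical, returning infinity for F ≤ 1 is an assumption that extends the stated limit as F approaches 1, and the approximation is stated only for experiments with at least 15 treatments and 30 error degrees of freedom. The F values and SED from the Figure 3 example are used purely as convenient inputs, so the outputs should not be expected to match the tabular LSD values shown in that figure.

```python
import math


def waller_duncan_lsd_approx(f_value, sed, multiplier=1.72):
    """Large-experiment approximation to the k = 100 Waller-Duncan LSD.

    ASSUMPTION: for F <= 1 the function returns infinity, extending the
    limit quoted in the text (the LSD grows without bound as F -> 1).
    """
    if f_value <= 1.0:
        return math.inf
    return multiplier * math.sqrt(f_value / (f_value - 1.0)) * sed


sed = 1.414          # SED from the Figure 3 example
difference = 3.6     # observed difference between treatments A and B

for f_value in (25.25, 3.01, 1.40):
    lsd = waller_duncan_lsd_approx(f_value, sed)
    flag = "exceeds" if difference > lsd else "does not exceed"
    print(f"F = {f_value:5.2f}: approx LSD = {lsd:.2f}; 3.6 {flag} it")
```

The same observed difference, SED, and error degrees of freedom produce different yardsticks as F changes, which is the inconsistency the text describes. The unrestricted LSD, by contrast, depends only on the SED and the error degrees of freedom.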