Bautista Et Al 1997 A Cluster-Based Approach To Means Separation

A Cluster-Based Approach to Means Separation
Author(s): Maria G. Bautista, David W. Smith and Robert L. Steiner

Source: Journal of Agricultural, Biological, and Environmental Statistics, Vol. 2, No. 2 (Jun.,
1997), pp. 179-197
Published by: International Biometric Society
Stable URL: http://www.jstor.org/stable/1400402 .
A new procedure for grouping treatments following the determination of differences
among treatments is proposed. The procedure differs from most others in that distinct
groups are created. A simulation shows that the new procedure compares quite favorably
with other widely-used means-separation procedures.
Key Words: Means separation; Multiple range test; Post hoc test.
A plant breeder is frequently faced with analyzing a particular variable such as
mean yields for a group of cultivars. Usually this follows an F test from a standard
ANOVA which indicates that there are significant differences among cultivars. Many
breeders then employ one of the large variety of available range tests to aid in digesting
the experimental results. Most of the tests construct groups which may have substantial
overlaps.
An example of one such analysis was cited by Steel and Torrie (1980), beginning on
page 140 with the data and continuing on page 176 with Duncan's multiple comparison
procedure. The original experiment measured nitrogen content of red clover plants
inoculated with six different rhizobium treatments. Duncan's procedure produced three groups.
Group 1 contained four of the treatments and Group 2 contained three of the treatments.
The disconcerting part of this analysis was that two of the treatments in Group 1 were
also two of the treatments in Group 2. In the same way, the third treatment in Group 2
was also one of the two treatments in Group 3.
This situation presents a problem since it gives rise to ambiguous interpretations.
The proposed procedure eliminates that problem, and simulation results indicate that the
procedure is quite good when compared with several others.
In the statistics literature, there are many types of multiple range or multiple
comparison techniques as well as other approaches that are neither multiple range tests nor
multiple comparisons. Examples include Fisher's Least Significant Difference (Fisher 1935),
Tukey's Honestly Significant Difference (Tukey 1953), Student-Newman-Keuls (Student 1927;
Newman 1939; Keuls 1952), Waller and Duncan (Waller and Duncan 1969),
Duncan's New Multiple Range Test (Duncan 1955), Scheffé's method (Scheffé 1959),
Gabriel's method (Gabriel 1978), Hochberg's GT2 method (Hochberg 1974), Studentized
Maximum Modulus (Stoline 1978; Stoline and Ury 1979), Sidak T test (Sidak
1967), Tukey's Gap Test (Tukey 1949), Welsch's step-up procedure (Welsch 1977), and
a cluster-based procedure (Scott and Knott 1974). Gupta (1965) has provided results
addressing ranking and selection. Chew (1976) provided a nice review.

Maria G. Bautista is Agricultural Statistician, Colorado Agricultural Statistics Service, 645 Parfet St.,
Lakewood, CO 80215. David W. Smith is Associate Professor and Robert L. Steiner is Assistant Professor,
University Statistics Center, Box 3CQ, New Mexico State University, Las Cruces, NM 88003 (E-mail:
estatiO3@nmsuvm1.nmsu.edu and estatu28@nmsuvm1.nmsu.edu, respectively).
© 1997 American Statistical Association and the International Biometric Society

Table 1. Comparison of Critical Values for Four Procedures

Test   Critical value                         Parameters
LSD    $q(2, \alpha, \nu)\, s_{\bar{x}}$
HSD    $q(k, \alpha, \nu)\, s_{\bar{x}}$        $k$ = number of treatment means
SNK    $q(p, \alpha, \nu)\, s_{\bar{x}}$        $p = 2, 3, \ldots, k$
DUN    $q(p, \alpha_p, \nu)\, s_{\bar{x}}$      $p = 2, 3, \ldots, k$; $\alpha_p = 1 - (1 - \alpha)^{p-1}$
1. REVIEW OF PROCEDURES CONSIDERED

Four multiple range or multiple comparison procedures were used in this study for
comparison with the proposed procedure. These include Fisher's protected least significant
difference (LSD), Tukey's honestly significant difference (HSD), Student-Newman-Keuls
(SNK), and Duncan's New Multiple Range Test (DUN). Each of the four employs
the upper percentage points of the studentized range distribution, $q(p, \alpha, \nu)$ (Federer 1955;
Harter 1960). The critical studentized range value is represented by $q$ with appropriate
parameters needed for each test. The value of $p$ is the number of sequential, estimated
treatment means within the span of the means being tested. The value of $\alpha$ is the level of
significance. The value $\nu$ is the number of degrees of freedom associated with $s_{\bar{x}}$. These
multiple range or multiple comparison procedures differ in the values of these parameters
(refer to Table 1). The studentized range is only one of the terms needed to calculate
the critical value for a particular multiple range or multiple comparison procedure. The
other is the estimated standard deviation of a sample mean, $s_{\bar{x}}$.
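As a small illustration of how the parameters in Table 1 differ, the sketch below computes the protection level that Duncan's test uses for a span of $p$ means, $\alpha_p = 1 - (1 - \alpha)^{p-1}$. It is written in Python rather than SAS, and the function names are hypothetical, chosen only for this example. The studentized range quantile itself is not computed here.

```python
def duncan_protection_level(p: int, alpha: float = 0.05) -> float:
    """Significance level used by Duncan's test for a span of p means.

    Implements alpha_p = 1 - (1 - alpha)**(p - 1) from Table 1.
    """
    if p < 2:
        raise ValueError("a span must contain at least two means")
    return 1.0 - (1.0 - alpha) ** (p - 1)


def levels_by_span(k: int, alpha: float = 0.05) -> dict:
    """Protection levels for every span p = 2, ..., k of k treatment means."""
    return {p: duncan_protection_level(p, alpha) for p in range(2, k + 1)}


if __name__ == "__main__":
    for p, level in levels_by_span(6).items():
        print(p, round(level, 4))
```

For $p = 2$ the level equals $\alpha$, so DUN, SNK, and LSD share the same critical value for adjacent means; the level then grows with the span, which is why DUN is less conservative than SNK for wider spans.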
2. DESCRIPTION OF THE PROPOSED PROCEDURE

In 1949, Tukey developed a procedure called the Gap Straggler Method. Tukey had
the notion to separate the treatment means into distinct groups. "We wish to separate the
varieties into distinguishable groups, as often as we can without too frequently separating
varieties which should stay together" (Tukey 1949, p. 100).
The proposed procedure adopts the same approach by forming distinct groups with
the treatment means that are similar staying together. There are several refinements which
are made possible by modern computer power.
The procedure should produce groups of treatments where:
1. There should always be differences among groups.
2. There should be no differences among treatment means within groups.
3. The number of groups of treatment means should be as small as possible consistent
with (1) and (2).

The following procedure addresses these requirements:
1. Perform the F test on a set of sample means.
2. If there exists at least one difference among the treatment means, then do a cluster
analysis of the treatment means.
3. Group the two closest treatment means (according to the cluster analysis) into the
same group and put each of the remaining k - 2 treatments into a group by itself.
(Note that at the first stage there will be k - 1 groups, where k - 2 of the groups
contain exactly one mean and one group contains exactly 2 means.)
4. Construct a nested analysis of variance from the original ANOVA where the
treatment source is now partitioned into "groups" and "treatments within groups".
5. If there are differences among the means of the groups and no differences among
the treatments within groups, move up to the next level of the cluster tree and
repeat steps 3 and 4 with the grouping defined at that level.
6. If there are no differences among the means of the groups, or if there are differences
among the treatments within groups, then stop and use the grouping from the
previous level of the cluster tree.
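The six steps can be sketched in code for a balanced one-way layout. The following is a minimal Python illustration, not the authors' SAS macro, and it rests on several assumptions: the cluster tree is built by repeatedly merging the two adjacent groups whose combined means are closest; the nested ANOVA is computed from the treatment means, the common sample size `n`, and the error mean square `mse` with `dfe` degrees of freedom; and the F tail probability is evaluated with a small pure-Python incomplete beta routine so that no external library is needed. All function names are hypothetical. In the usage lines, the error mean square of about 11.79 on 24 degrees of freedom is an assumed value that is not stated in the text; with it, the sketch reproduces the Trt(Group) OSL entries of Table 2.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def f_sf(f, df1, df2):
    """Upper-tail probability P(F > f) for an F(df1, df2) variate."""
    a, b, x = df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f)
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def nested_osl(groups, means, n, mse, dfe):
    """Return (between-groups OSL, treatments-within-groups OSL) for step 4."""
    t = sum(len(g) for g in groups)
    grand = sum(means[name] for g in groups for name in g) / t
    ss_b = ss_w = 0.0
    for g in groups:
        center = sum(means[name] for name in g) / len(g)
        ss_b += n * len(g) * (center - grand) ** 2
        ss_w += n * sum((means[name] - center) ** 2 for name in g)
    df_b, df_w = len(groups) - 1, t - len(groups)
    osl_b = f_sf(ss_b / df_b / mse, df_b, dfe) if df_b else 1.0
    osl_w = f_sf(ss_w / df_w / mse, df_w, dfe) if df_w else 1.0
    return osl_b, osl_w


def separate_means(means, n, mse, dfe, alpha=0.05):
    """Merge closest adjacent groups until step 6 says to stop."""
    order = sorted(means, key=means.get, reverse=True)
    groups = [[name] for name in order]
    if nested_osl(groups, means, n, mse, dfe)[0] >= alpha:
        return [order]  # overall F not significant: nothing to separate
    while len(groups) > 1:
        centers = [sum(means[x] for x in g) / len(g) for g in groups]
        i = min(range(len(groups) - 1), key=lambda j: centers[j] - centers[j + 1])
        trial = groups[:i] + [groups[i] + groups[i + 1]] + groups[i + 2:]
        osl_b, osl_w = nested_osl(trial, means, n, mse, dfe)
        if osl_b >= alpha or osl_w < alpha:
            break  # step 6: keep the grouping from the previous level
        groups = trial
    return groups


# Treatment means from Table 2; n = 5 per treatment; mse and dfe are assumed.
means = {"3DOk1": 28.82, "3DOk5": 23.98, "3DOk7": 19.92,
         "COMP": 18.70, "3DOk4": 14.64, "3DOk13": 13.26}
groups = separate_means(means, n=5, mse=11.79, dfe=24)
print(groups)
```

Because the same data are tested at every level, the `alpha` threshold here acts as a stopping index rather than as a true significance level, in line with the practical considerations discussed in the next section.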
3. SOME PRACTICAL CONSIDERATIONS

First, since the same set of data is used repeatedly, the observed significance levels
(OSL) should be viewed as indices. The behaviors of these indices as the number of
groups decreases are of interest.
Second, in our experience analyzing about 40 real biological and agricultural exper-
iments, the OSL for between groups usually remains low. The OSL for within groups
begins at a high value, and, for the first several groupings, may increase slightly. Usually
this is followed by a number of groupings which are stable at a high value. Then over the
space of several grouping steps, when quite dissimilar groups are being joined together,
the OSL usually falls quite rapidly to a low value.
Third, in determining the number of groups, nothing is more helpful than to visit with
the scientist. This person usually knows whether more or fewer groups are appropriate
and, between the two of you, some helpful understanding of the scientific problem may
be gained.
4. EXAMPLE
On page 140 of Steel and Torrie (1980), there is an example having six treatments.
Implementation of the proposed procedure is illustrated in Table 2. First, there is a
difference among treatments at Stage 0. The two closest estimated means are 3DOk7 and
COMP, with 19.31 for a combined mean. For Stage 1 these two are grouped. Note that
there is a difference among groups (p < .0001), but that there is no difference between
3DOk7 and COMP (p = .5794). One should now proceed to Stage 2.
Using Stage 1 to obtain Stage 2, the two closest estimated means are 3DOk4 and
3DOk13. Therefore, for Stage 2, 3DOk4 and 3DOk13 are grouped, giving 4 groups.
Again, there is a difference among groups (p < .0001) but there are no differences
within the groups (p = .7015).
The process continues until Stage 4 is reached and it is discovered that there is a
difference within the group of treatments comprised of COMP, 3DOk4, 3DOk5, 3DOk7,
and 3DOk13 (p = .0004). Therefore, the grouping selected is defined by Stage 3, which
yields a group consisting of 3DOk1 alone; a second group containing 3DOk5, 3DOk7,
and COMP; and a third group of 3DOk4 and 3DOk13.

Table 2. Implementation of the Proposed Procedure for an Example

Estimated means and grouping

                 Stage 0,   Stage 1,   Stage 2,   Stage 3,   Stage 4,
Treatment        6 groups   5 groups   4 groups   3 groups   2 groups
3DOk1            28.82      28.82      28.82      28.82      28.82
3DOk5            23.98      23.98      23.98      20.867     18.1
3DOk7            19.92      19.31      19.31      -          -
COMP             18.70      -          -          -          -
3DOk4            14.64      14.64      13.95      13.95      13.95
3DOk13           13.26      13.26      -          -          -

Index values and degrees of freedom

Group df         5          4          3          2          1
Group OSL        .0001      <.0001     <.0001     <.0001     <.0001
Trt(Group) df    -          1          2          3          4
Trt(Group) OSL   -          .5794      .7015      .1034      .0004
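The combined means shown at each stage of Table 2 can be traced with a few lines of Python. This sketch assumes that, at every level, the two adjacent groups with the closest combined means are merged and that the combined mean is the simple average of the member treatment means (which holds for equal sample sizes). The function name is hypothetical, and no F tests are performed here; only the cluster sequence is shown.

```python
def merge_trace(means):
    """List the group means at every level of the cluster tree."""
    groups = [[m] for m in sorted(means, reverse=True)]
    trace = [[round(sum(g) / len(g), 3) for g in groups]]
    while len(groups) > 1:
        centers = [sum(g) / len(g) for g in groups]
        # Index of the adjacent pair with the smallest gap between group means.
        i = min(range(len(centers) - 1), key=lambda j: centers[j] - centers[j + 1])
        groups[i:i + 2] = [groups[i] + groups[i + 1]]
        trace.append([round(sum(g) / len(g), 3) for g in groups])
    return trace


# Stage 0 means from Table 2.
trace = merge_trace([28.82, 23.98, 19.92, 18.70, 14.64, 13.26])
for stage, centers in enumerate(trace):
    print("Stage", stage, centers)
```

Stages 1 through 4 of the trace give the combined means 19.31, 13.95, 20.867, and 18.1 that appear in Table 2; the final level, in which all six treatments form one group, is never reached by the procedure because Stage 4 already fails the within-groups check.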
5. SIMULATION
A simulation study using SAS was conducted to compare the performance of the new
method (NEW) to the existing methods: LSD, HSD, SNK, and DUN. The study was
composed of two main sets of simulations, the first based on five treatments and
the second based on nine treatments. Each main set is composed of three simulations
(n = 3, n = 6, and n = 12). Both main sets assume that the
standard error of the mean is equal to 1.
The simulations evaluate the accuracy of the five means-separation techniques
by tracking the average number of Type I, Type II, and total errors committed. A Type I
error is made when a method fails to group a pair of identical means, and a Type II error
occurs when a method incorrectly groups a pair of dissimilar means. This error counting
is possible since the true treatment means are known in advance. Tables 3 through 8
summarize the results for the main simulations.
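The pairwise error counting described above can be written down directly. The Python sketch below is an illustration under stated assumptions: a grouping is represented as a list of sets of treatment indices (sets may overlap, as they can for LSD, HSD, SNK, and DUN), and two treatments count as grouped when at least one set contains both. The function name is hypothetical.

```python
from itertools import combinations


def count_errors(true_means, groups):
    """Return (Type I, Type II) error counts over all pairs of treatments.

    Type I: a pair with identical true means that is not grouped together.
    Type II: a pair with different true means that is grouped together.
    """
    type1 = type2 = 0
    for i, j in combinations(range(len(true_means)), 2):
        together = any(i in g and j in g for g in groups)
        equal = true_means[i] == true_means[j]
        if equal and not together:
            type1 += 1
        elif not equal and together:
            type2 += 1
    return type1, type2


if __name__ == "__main__":
    true_means = [17, 20, 20, 20, 23]
    print(count_errors(true_means, [{0}, {1, 2, 3}, {4}]))  # correct grouping
    print(count_errors(true_means, [{0, 1}, {2, 3}, {4}]))  # a flawed grouping
```

With five treatments there are 10 pairs, so the total number of errors for one experiment lies between 0 and 10; with nine treatments there are 36 pairs.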
To illustrate, in Table 3 there are various sets of actual population means which
were used to test the performance of the NEW method. The simulation generated 10,000
"experimental results" with significant difference for each of the different sets of true
treatment means; that is, each "experimental result" had an F value indicating a difference
among the treatment means ($\alpha = .05$). Table 3 shows the average total number of errors
or incorrect decisions for each of the five procedures. If the F value was not significant,
the "experimental result" was discarded and another generated. The performance of the
multiple range or multiple comparison procedures is based on the average number of
errors per significant "experimental result."
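The discard-and-regenerate design can be sketched as a rejection loop. The Python code below is only an illustration, not the SAS program used in the study, and it makes assumptions the text does not spell out: sample means are drawn as normal variates around the true means with standard error 1, the error mean square is drawn as a scaled chi-square variate, and the default critical value 2.76 is the tabled upper 5% point of F(4, 25), which is appropriate only for five treatments with n = 6. The function name is hypothetical.

```python
import random


def simulate_significant(true_means, n=6, se_mean=1.0, f_crit=2.76,
                         reps=100, seed=1):
    """Generate `reps` experiments whose overall F exceeds `f_crit`.

    Each experiment is returned as (sample means, error mean square, F).
    f_crit must match the design: 2.76 is the 5% point of F(4, 25).
    """
    rng = random.Random(seed)
    t = len(true_means)
    dfe = t * (n - 1)
    sigma2 = n * se_mean ** 2  # per-observation variance implied by se_mean
    kept = []
    while len(kept) < reps:
        ybar = [rng.gauss(mu, se_mean) for mu in true_means]
        # MSE ~ sigma2 * chi-square(dfe) / dfe; chi-square(k) = Gamma(k/2, 2).
        mse = sigma2 * rng.gammavariate(dfe / 2.0, 2.0) / dfe
        grand = sum(ybar) / t
        mst = n * sum((y - grand) ** 2 for y in ybar) / (t - 1)
        f = mst / mse
        if f > f_crit:  # non-significant results are discarded
            kept.append((ybar, mse, f))
    return kept


if __name__ == "__main__":
    results = simulate_significant([17, 20, 20, 20, 23], reps=5)
    for ybar, mse, f in results:
        print([round(y, 2) for y in ybar], round(mse, 2), round(f, 2))
```

Conditioning on a significant F explains the behavior for Set 1 discussed below: when all true means are equal, every retained experiment is one in which the overall test has already made a Type I error.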
Generally, for Tables 3, 4, and 5, the NEW procedure produced a slightly higher
Type I error rate and a substantially lower Type II error rate. For total errors, the
NEW procedure is generally better, with two exceptions that must be noted. First, when
all means are actually equal (Set 1), the NEW procedure is considerably worse than
the other four procedures. Generally, most experiments are conducted under the notion
that there will be differences. In this case, each of 10,000 cases analyzed had already
suffered a Type I error! Second, when the middle three means are equal, the first mean
is three standard deviations below the middle three, and the fifth mean is three standard
deviations above the middle three (Set 12), the NEW procedure performs slightly worse
than LSD in Tables 3 and 4.
Because overlapping groupings are not permitted with the NEW procedure, it should
come as no great surprise that the Set 1 simulation would produce a substantial error
rate. The second case (Set 12) is somewhat similar to Set 1 in that existing differences
are quite marked (three standard deviations), but there are three means in the middle
which are all equal.
For Tables 6, 7, and 8, the Type I error rate is generally much higher for the NEW
method. This is especially true for sets 1 through 4, which contain large numbers of
equal means in the middle of the set of population means. For sets 5 through 9, the
NEW method still displays a larger average number of Type I errors, but the disparity is
smaller. In contrast, the NEW method produces much smaller average Type II error for
sets 1 through 9. Total error for the NEW method is generally smaller for sets 4 through
6 and much smaller for sets 7 through 9. In summary, the NEW method performs well
when there are true differences throughout the mean set or when the similarities are
concentrated at the extremes of the sorted mean set. Two additional sets of simulations
accompany the first main set in order to address questions concerning seriousness of
errors and heterogeneous variances.
With regard to the former question, a simulation, identical to the one summarized
in Table 4, was run to compare the performance of the five methods under varying
Type II weights. The following example illustrates the weighting scheme. Suppose that
$\mu_1 < \mu_2 = \mu_3 = \mu_4 < \mu_5$. If mean 1 is incorrectly grouped with means 2, 3, or 4,
the Type II error is given a weight of 1. If mean 1 is incorrectly grouped with mean 5,
the Type II error is assigned a weight of 2. Similarly, if means 2, 3, or 4 are incorrectly
grouped with mean 5, the error is given a weight of 1. Table 10 summarizes the average
Type II error under this weighting scheme.
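One way to read the weighting example is that a wrongly grouped pair is weighted by the number of distinct true-mean levels that separate its two members. The Python sketch below implements that reading; it is an assumption that generalizes the three-level example in the text, not a rule stated by the authors, and the function name is hypothetical. Groupings are lists of sets of treatment indices, and a pair counts as grouped when some set contains both members.

```python
from itertools import combinations


def weighted_type2(true_means, groups):
    """Sum of Type II weights over all wrongly grouped pairs.

    Assumed weight: difference in rank between the two true means, where
    ranks are taken over the distinct true-mean values.
    """
    levels = sorted(set(true_means))
    rank = {value: r for r, value in enumerate(levels)}
    total = 0
    for i, j in combinations(range(len(true_means)), 2):
        if any(i in g and j in g for g in groups):
            # Pairs with equal true means get weight 0 and add nothing.
            total += abs(rank[true_means[i]] - rank[true_means[j]])
    return total


if __name__ == "__main__":
    true_means = [19, 20, 20, 20, 21]  # mu1 < mu2 = mu3 = mu4 < mu5
    print(weighted_type2(true_means, [{0, 1, 2, 3, 4}]))
    print(weighted_type2(true_means, [{0, 1}, {2, 3, 4}]))
```

Under this reading, grouping mean 1 with mean 5 costs 2, while every other wrong pairing in the example costs 1, matching the weights given in the text.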

[Tables 3 through 9 are rotated full-page tables. Their captions, designs, and sets of true
treatment means are recoverable and are listed here; the individual error averages are not
recoverable from this copy.]

Table 3. T = 5, $\sigma^2_{\bar{x}}$ = 1, n = 3, dfe = 10
Table 4. T = 5, $\sigma^2_{\bar{x}}$ = 1, n = 6, dfe = 25
Table 5. T = 5, $\sigma^2_{\bar{x}}$ = 1, n = 12, dfe = 55
Table 6. T = 9, $\sigma^2_{\bar{x}}$ = 1, n = 3, dfe = 18
Table 7. T = 9, $\sigma^2_{\bar{x}}$ = 1, n = 6, dfe = 45
Table 8. T = 9, $\sigma^2_{\bar{x}}$ = 1, n = 12, dfe = 99
Table 9. T = 5, $\sigma^2_{\bar{x}_1}$ = .6, $\sigma^2_{\bar{x}_2}$ = .8, $\sigma^2_{\bar{x}_3}$ = 1.0, $\sigma^2_{\bar{x}_4}$ = 1.2, $\sigma^2_{\bar{x}_5}$ = 1.4, n = 6, dfe = 25

Each table reports, for every set of true treatment means, the average total number of errors,
the average number of Type I errors, and the average number of Type II errors for LSD, HSD,
SNK, DUN, and NEW.

True treatment means, Tables 3, 4, 5, and 9 (T = 5): Set 1 has all five means equal to 20.00;
Sets 2 through 16 are the sets listed in Table 10.

True treatment means, Tables 6, 7, and 8 (T = 9):

Set   $\mu_1$  $\mu_2$  $\mu_3$  $\mu_4$  $\mu_5$  $\mu_6$  $\mu_7$  $\mu_8$  $\mu_9$
1     20       20       20       20       20       20       20       20       20
2     19       20       20       20       20       20       20       20       21
3     18       20       20       20       20       20       20       20       22
4     18       18       20       20       20       20       20       22       22
5     16       16       16       16       20       24       24       24       24
6     16       16       16       20       20       20       24       24       24
7     16       16       18       20       20       20       22       24       24
8     18       18       19       19       20       21       21       22       22
9     16       17       18       19       20       21       22       23       24

Table 10. T = 5, $\sigma^2_{\bar{x}}$ = 1, n = 6, dfe = 25

      True treatment means                            Average number of weighted Type II errors
Set   $\mu_1$   $\mu_2$   $\mu_3$   $\mu_4$   $\mu_5$     LSD    HSD    SNK    DUN    NEW
1     -         -         -         -         -           -      -      -      -      -
2     19.75     20.00     20.00     20.00     20.25       5.2    7.0    6.4    5.4    3.9
3     19.50     20.00     20.00     20.00     20.50       4.9    6.7    6.2    5.1    3.7
4     19.00     20.00     20.00     20.00     21.00       4.2    6.2    5.6    4.5    3.2
5     19.00     19.50     20.00     20.50     21.00       11.1   16.1   14.6   11.8   8.3
6     19.00     19.00     20.00     21.00     21.00       6.0    9.3    8.4    6.5    4.3
7     18.00     20.00     20.00     20.00     22.00       3.4    5.3    4.5    3.6    2.9
8     18.00     19.50     20.00     20.50     22.00       9.8    14.3   12.6   10.4   8.1
9     18.00     19.00     20.00     21.00     22.00       9.0    13.9   12.2   9.6    7.1
10    18.00     18.50     20.00     21.50     22.00       8.3    13.5   11.6   8.9    6.3
11    18.00     18.00     20.00     22.00     22.00       3.9    7.2    5.9    4.3    2.8
12    17.00     20.00     20.00     20.00     23.00       2.5    4.5    3.5    2.7    2.4
13    17.00     19.00     20.00     21.00     23.00       7.5    11.8   9.8    8.0    6.8
14    17.00     18.00     20.00     22.00     23.00       5.9    0.9    7.8    6.3    5.2
15    17.00     17.00     20.00     23.00     23.00       2.0    0.0    2.7    2.2    1.9
16    20.00     20.00     20.50     22.00     24.00       6.7    10.4   8.8    7.1    5.5
In Table 10 there are smaller average Type II errors for the NEW method in all of
the eight sets. Note that the same ordering of methods occurs with a set for Tables 4 and
10. In other words, the ranking of the methods does not change with the weighting. The
only effect of the scheme is to increase the number of Type II errors in relation to the
Table 4 results.
A final set of simulations examines the effect of heterogeneous variances on the
means separation techniques. The standard errors of the means are .6, .8, 1.0, 1.2,
and 1.4 for groups 1, 2, 3, 4, and 5, respectively. The results from these simulations
are summarized in Table 9. A comparison of results to Table 4 shows no real effect from
this departure from homogeneity.
6. COMPUTATION
The NEW procedure can be implemented in the Statistical Analysis System (SAS
Institute 1990) with the macro included in the appendix. This macro should be invoked
after PROC ANOVA or PROC GLM indicates that there is a detectable difference be-
tween the treatment means. An example is given below in order to illustrate the use of
the code. In this example the macro is called after the dataset inputdat is created.
This particular dataset has a response variable denoted as NITRO and a classification
variable called CULTURES.
Execution of the macro requires six parameters: the dataset, the response variable,
the classification variable, alpha, error sum of squares, and error degrees of freedom.
1. The dataset must contain a response and a classification variable as a minimum
but can have other variables also.
2. In the present form, the user can analyze from 3 to 35 treatment groups with the
macro (this upper limit can easily be increased).
3. The error sum of squares and the error degrees of freedom from the ANOVA
table can be selected by including 0, 0 as the last two parameters (this choice can
be thought of as the default values). If the user wants other values for SSE and
dfe, then these may be entered directly into the macro for ss and dff, respectively.
4. The macro should be used when there is an equal number of observations per
treatment group. Directions for executing the procedure when this condition is
violated are included after the balanced example.
5. If the macro is invoked with a dataset that is unbalanced or is lacking an overall
significant F, an error message is displayed.
The output begins with a display of the group means, the original groups, and
the designated groups. This table (Table 11) allows the user to see the correspondence
between the groups in the dataset and the groups assigned by the macro. In the example,
3DOk1 is assigned group 1, 3DOk13 is assigned group 2, 3DOk4 is assigned group 3,
3DOk5 is assigned group 4, 3DOk7 is assigned group 5, and COMP is assigned group 6.
The new group names allow the macro to compare and display a large number of means.
After the means are shown, the macro displays the individual iterations, each
including the iteration number, the within group F test, the between group F test, and
the current grouping of means. The final grouping of means is given in the matrix
SIMILAR. The following example illustrates the interpretation of this matrix. Cultures
2 and 3 are not different, and hence form a larger group. Cultures 4, 5, and 6 are also
not different, forming another group. Culture 1 is different from all the other cultures.
SIMILAR
2 3
4 5 6
The above result corresponds to the following conclusion using the familiar line notation:
1 23 456.
The researcher can also use the NEW procedure on unbalanced data. After finding
a significant overall F, the means are sorted, and the closest two means are located. The
observations from these two groups are combined to form a larger group. A new variable,
denoted as G1, is added to the original dataset indicating the new grouping. Proc GLM
is then run with the following model statement: model resp = G1 group(G1);
The within group F test and between group F test are evaluated using the ANOVA
table. Continue the process of forming new groups (and adding new group variables to
the dataset) until the stopping conditions are satisfied.
As with most statistical procedures, the goal of the analysis is important. If the
researcher is interested in nonoverlapping groups, then the NEW procedure could prove
to be quite useful. On the other hand, if the treatment means were 25, 30, 35, 40, 45,
and 50, then the NEW procedure might not be so useful.

Table 11. Implementation of Macro in SAS

data inputdat;
input resp group $ @@;
cards;
19.4 3DOk1 32.6 3DOk1 27.0 3DOk1 32.1 3DOk1 33.0 3DOk1
17.7 3DOk5 24.8 3DOk5 27.9 3DOk5 25.2 3DOk5 24.3 3DOk5
17.3 COMP 19.4 COMP 19.1 COMP 16.9 COMP 20.8 COMP
;
proc anova data=inputdat;
class group;
model resp=group;
run;
%meta(inputdat, resp, group, 0.05, 0, 0)
7. CONCLUSION
The NEW procedure unambiguously defines the structure among treatment means
by gathering like means into groups. There are no treatment means designated to more
than one group. Treatment means within a group are considered to be homogeneous.
The NEW method performs favorably within the present set of simulations against well-
known multiple range or multiple comparison procedures.
APPENDIX: SAS MACRO

%macro meta( ds , respvar , classvar , alpha , ss , dff );
proc sort data=&ds ; by &classvar;
proc anova data = &ds outstat = aovout(keep = ss df f prob)
noprint;
class &classvar;
model &respvar = &classvar;
proc means data = &ds noprint;

output out= meanout(keep=mean &classvar n ) mean = mean n = n;
var &respvar; by &classvar;
proc print data = meanout;
/* S T A R T I M L */
proc iml;
/* S O R T */
START SORTIT(X, sec );
HOLD1=J(1, ncol(x),.);
HOLD2=J(1,ncol(sec),.);
DO J = 1 TO Nrow(X)-1;
DO I = 1 TO Nrow(X) -1;
IF X(|i|) > x(|i+1|) then do;
hold1 = x(|i, |);
x(|i, |) = x(|i+1, |);
x(|i+1, |) = hold1;
hold2 = sec(|i, |);
sec(|i, |) = sec(|i+1, |);
sec(|i+1, |) = hold2;
end;
end;
end;
finish sortit;
/* M E A N S */
start meanit( x, ave1, z );
/* average x within each run of equal labels in z; */
/* group means are returned in ave1, labels in z   */
MARKER=1;
goo = 1;
ave1 = j(nrow(x), 1,
ave2 = j(nrow(z), 1,
do i = 2 to nrow(x);
if z(|i|) ^= z(|i-1|) then do;
zum = 0;
do j = goo to i-1;
zum = zum + x(|j|);
end;
ave1(|MARKER|) = zum/(i-goo);
ave2(|MARKER|) = z(|GOO|);
goo = i;
MARKER = MARKER+1;
end;
end;
if z(|nrow(x)|) ^= z(|nrow(x)-1|) then do;
ave1(|MARKER|) = x(|nrow(x)|);
ave2(|MARKER|) = z(|NROW(X)|);
end;
else do;
zum = 0;
do j = goo to NROW(X);
zum = zum + x(|j|);
end;
ave1(|marker|) = zum/(NROW(X)-goo+1);
ave2(|marker|) = z(|NROW(X)|);
END;
DO WHILE ( AVE1[ NROW(AVE1) ] =
IF AVE1(| NROW(AVE1) |) THEN AVE1 = AVE1[ 1:NROW(AVE1)-1 ];
END;
ave2 = ave2[ 1:nrow(ave1) ];
z = compress(ave2);
finish;

/* V A R I A N C E */
start varit( x, Z, coll, AVE );
/* for each run of equal labels in column coll of Z:   */
/* column 1 = variance of x, column 2 = label length   */
MARKER=1;
goo = 1;
ave = j(nrow(x)-1, 2, 0 );
do i = 2 to nrow(x);
if Z(|i,COLL|) ^= Z(|i-1,COLL|) then do;
zum = 0;
zum2 = 0;
do j = goo to i-1;
zum = zum + x(|j|);
zum2 = zum2 + x(|j|)*x(|j|);
end;
if i - goo = 1 then ave(|marker,1|) = 0; else
ave(|MARKER,1|) = (zum2 - zum*zum/(i-goo))/(i-goo-1);
ave(|MARKER,2|) = LENGTH(Z(|GOO,COLL|));
goo = i;
MARKER = MARKER+1;
end;
end;
if Z(|nrow(x),COLL|) ^= Z(|nrow(x)-1,COLL|) then do;
ave(|MARKER,1|) = 0;
ave(|MARKER,2|) = LENGTH(Z(|NROW(X),COLL|));
end;
else do;
zum = 0;
zum2 = 0;
do j = goo to NROW(X);
zum = zum + x(|j|);
zum2 = zum2 + x(|j|)*x(|j|);
end;
if nrow(x)-goo+1 = 1 then ave(|marker,1|) = 0; else
ave(|marker,1|) = (zum2 - zum*zum/(NROW(X)-goo+1))/(nrow(x)-goo);
ave(|marker,2|) = LENGTH(Z(|NROW(X),COLL|));
END;
DO WHILE ( AVE[ NROW(AVE), 2 ] = 0 );
IF AVE(| NROW(AVE), 2 |) = 0 THEN AVE = AVE[ 1:NROW(AVE)-1, 1:2 ];
END;
finish;
/* M E T A S */
START metas( X , outdat );
/* find the closest pair of adjacent sorted means and */
/* give both members the concatenated group label     */
diff = j(1,1,4000000);
mindiff = diff;
place = diff;
upper = nrow(x);
do i = 2 to upper;
diff = x(|i|) - x(|i-1|);
if diff < mindiff then do;
mindiff = diff;
place = i;
end;
end;
loc = j(1,1,'
loc = outdat(|place|);
met = compress( outdat(|place|) + outdat(|place-1|) );
meta=outdat +
do i = place-1 to place;
meta(|i|) = met;
end;
OUTDAT = compress(outdat || meta);
FINISH;
/* M E R G E */
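start merg( x, y );
/* append to x a column that maps the labels in its */
/* last column to the merged labels held in y       */
dumm = j( nrow(x), 1,'
do i = 1 to nrow(x);
do j = 1 to nrow(y);
if x(|i, ncol(x)|) = y(|j, 1|)
then dumm(|i|) = y(|j,2|);
end;
end;
x = x || compress(dumm);
finish;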
/* S T A R T M A I N */
start main;
use aovout; read all into aov;
use meanout; read all var{mean n} into x;

use meanout; read all var{&classvar} into ORIG_grP;
KEEPIT = J ( NROW(X) 1
chars={''''''''''''''''''''''''''''''g,
keepit=chars[ 1:nrow(x) ];
/* c h e c k f o r e q u al n a n d s i g n i f f*/
if aov[2,4] < &alpha & x[ , 2]/x[1,2] = j( nrow(x),1,1)

& &ss >= 0 & &dff >= 0 & NROW(X) < 36
then do ; * O K T O R U N M E T A G R O U P S;
n = x[1,2];

x=x[ , 1];
dftrt=aov[2,1];
sstrt=aov[2,2];
if &ss=0 & &dff = 0 then do;

sserr = aov[1,2];
dferr = aov[1,1];
end;
else do;
sserr = &ss;
dferr = &dff;
end;
NEWGROUP= KEEPIT;
print 'GROUP MEANS ORIGINAL GROUPS ASSIGNED GROUPS';
VVVV={ 'MEANS' };
print x[COLNAME=VVVV] ' ORIG_grP' ' NEWGROUP N;
/* b e g i n t h e m e t a p r o c e s s*/
CALL SORTIT(x,keepit );
XORIG=X;
CALL METAS(x, keepit);

z=keepit[ ,2];
* p r o d u c e m a t r i x o f m e t a g r o u p s;
indx=1;
do while ( indx <= nrow(keepit) - 3 );
z= keepit[ , ncol(keepit) ];
call sortit(xorig,z);
call meanit(xorig,x,z);
call sortit(x,z);
call metas(x,z);
call merg(keepit,z);
z= z[ , 2];
indx=indx+1;
end; * o f w h i l e;
indx = 1;
do until ( sigbet = 1 | sigwin = 1 | indx = NCOL(KEEPIT) );
sigbet=0; sigwin = 0;
dfwin=0;
sswin=0;
indx = indx+1;
DUMIT = J( NROW(XORIG) 1 ,
CALL SORTIT( XORIG, DUMIT );

call varit(XORIG, keepit, indx , vari );
do i = 1 to nrow(vari );
dfwin = dfwin + vari(|i,2|) - 1;
sswin = sswin + vari(|i,1|)*(vari(|i,2|) - 1);
end;
sswin = n*sswin;
fbet = (( sstrt - sswin )/(dftrt-dfwin ))/(SSERR/dfERR);
fwin = ( sswin/dfwin )/(SSERR/dfERR);
OSLBET= 1-PROBF( FBET, DFTRT-DFWIN , DFERR , 0 );
OSLWIN= 1-PROBF( FWIN, DFWIN , DFERR, 0 );
if OSLbet > &ALPHA then sigbet = 1;

if OSLwin < &ALPHA then sigwin = 1;
DFBET=DFTRT-DFWIN;
SSBET=SSTRT-SSWIN;
PRINT '-------------------------------------------------';
IT=INDX-1;
PRINT 'I T E R A T I O N' IT;
PRINT ' ';
PRINT 'F TEST FOR WITHIN SUM OF SQUARES';
PRINT DFWIN SSWIN OSLWIN;
PRINT ' ';
PRINT 'F TEST FOR BETWEEN SUM OF SQUARES';
PRINT DFBET SSBET OSLBET;
PRINT ' ';
PRINT DFERR SSERR SSTRT;
PRINT ' ';
PRINT ' ';
SIMILAR =UNIQUE( KEEPIT[ ,INDX]);
SIMILAR=SIMILAR';
PRINT 'MEAN GROUPS';
PRINT SIMILAR;
PRINT '-----------------------------------------------';
end; * o f r e p e a t ;
if indx = NCOL(KEEPIT) then do;

if sigbet=0 & sigwin = 0 then indx=NCOL(KEEPIT);
else indx = NCOL(KEEPIT)-1;
end;
else indx = indx-1;
PRINT '------------------------------------------------';
PRINT ' ';
PRINT ' ';
SIMILAR = UNIQUE(KEEPIT[ ,INDX]);
SIMILAR = SIMILAR';
PRINT ' ';
PRINT 'F I N A L G R O U P I N G O F M E A N S';
PRINT ' ';
PRINT SIMILAR;
PRINT '------------------------------------------------';

end;
else print 'Sample sizes not equal or overall F not significant';
* C O N D I T I O N S N O T S A T I S F I E D;
finish; run main;
%mend; * E N D O F M A C R O;
data inputdat;
input resp group $ @@;
cards;
enter your response and group values here.
proc anova data = inputdat;

class group;
model resp = group;
%meta(inputdat, resp , group , 0.05 , 0, 0)
[Received August 1994. Revised April 1997.]
REFERENCES
Chew, V. (1976), "ComparingTreatmentMeans: A Compendium,"HortScience, 11, 348-357.
Duncan, D. B. (1955), "MultipleRange and Multiple F Tests,"Biometrics, 11, 1-42.
Federer,W. T. (1955), ExperimentalDesign Theoryand Application,New York:Macmillan.
Fisher, R. A. (1951), The Design of Experiments(6th ed.), London: Oliver and Boyd.
Gabriel, K. R. (1978), "A Simple Method of Multiple Comparisons of Means", Joumnalof the American
StatisticalAssociation, 73, 724-729.
Gupta,S. S. (1965), "On Some MultipleDecision (Selection and Ranking)Rules," Technometrics,7, 225-245.
Harter,H. L. (1960), "Tablesof Range and StudentizedRange,"Annals of MathematicalStatistics, 31, 1122-
1147.
Hochberg, Y. (1974),"Some Generalizationsof the T-Method in SimultaneousInferences,"Journal of Multi-
variate Analysis, 4, 224-234.
Keuls, M. (1952),"TheUse of the 'StudentizedRange' in ConnectionWith an Analysis of Variance,"Euphytica,
1, 112-122.
Newman, D. (1939), "The Distributionof Range in Samples From A Normal Population,Expressedin Terms
of an IndependentEstimate of StandardDeviation,"Biometrika,31, 20-30.
Version6 (4th ed.), North Carolina:
SAS Institute(1990), SAS/STATUser's Guide Volume2, GLM-VARCOMP,
Author.
Scheffe, H. (1959), TheAnalysis of Variance(1st ed.), New York:Wiley.
Scott, A. J., and Knott, M. (1974),"A Cluster Analysis Method for Grouping Means in The Analysis of
Variance,"Biometrics, 30, 507-512.
Sidak, Z. (1967), "RectangularConfidence Regions for the Means of MultivariateNormal Distributions,"
Journal of the AmericanStatisticalAssociation, 62, 626-633.
Steel, R., and Torrie,J. (1980), Principles and Proceduresof StatisticsA BiometricalApproach(2nd ed.), San
Francisco:McGraw-Hill.

Stoline, M. R. (1978), "Tablesof the StudentizedAugmentedRange and Applicationsto Problemsof Multiple

Comparisons,"Journal of the AmericanStatisticalAssociation, 73, 656-660.
Stoline, M. R., and Ury, H. K. (1979), "Tables of the StudentizedMaximum Modulus Distributionand an
Applicationto Multiple ComparisonAmong Means," Technometrics,21, 87-93.
Student (1927), "Errorsof Routine Analysis," Biometrika, 19, 151-164.
Tukey, J. W. (1949), "ComparingIndividualMeans in the Analysis of Variance,"Biometrics, 5, 99-114.
(1994), " The Problem of Multiple Comparisons,"in The Collected Worksof John Tukey,ed. H. I.
Braun,New York:Chapmanand Hall.
Waller,R. A., and Duncan, D. B. (1969), "A Bayes Rule for the Systematic Multiple ComparisonsProblem,"
Journal of the AmericanStatisticalAssociation, 64, 1484-1503.
Welsch, R. E. (1977), " Stepwise Multiple ComparisonProcedures",Journal of the AmericanStatisticalAsso-
ciation, 72, 566-575.

