Ten common misuses of statistics in agronomic research and reporting¹

Larry A. Nelson and John O. Rawlings²

¹ Contribution from the Inst. of Statistics, North Carolina State Univ., Raleigh, NC 27650.
² Professors of statistics, North Carolina State Univ., Raleigh, NC 27650.

ABSTRACT

Ten common misuses of statistics in agronomic research and reporting are discussed. Some of these are a result of changes in statistical philosophy over the years to which biologists in general, and agronomists in particular, have not responded in terms of their data analytic and interpretational techniques. Others have been created by an overdependence on computers without careful study of the basic data patterns or without careful consideration of the calculations which the computer is performing. The importance of planning experiments properly with a view toward subsequent analysis is emphasized. Careful, well-controlled experimental technique is also recommended. Proper planning usually assures that logical comparisons can be made without resorting to the use of mechanical procedures such as multiple comparisons. The matter of misuse of multiple comparison procedures such as Duncan's Multiple Range Test is also discussed. It is pointed out that in cases where logical structure doesn't exist in the treatments (a rare event), the use of multiple comparison procedures is valid.

Additional index words: Statistics, Research, Accuracy.

MODERN applied statistics has been utilized extensively in agronomic research in the United States and elsewhere for the past 40 years. During this period, some of our concepts have changed. For example, there now is much less emphasis on hypothesis testing and far more emphasis on estimation of the magnitudes of differences and other effects. No longer are we expecting an experiment to provide the last word based upon the result of some mechanical process such as a hypothesis test at a stated level of significance. Now we are looking for indications of effects and we rely on the data to provide us estimates of the magnitudes of these effects. There has also been some revision in our thinking about some concepts as a result of extensive application of statistical techniques to biological problems. For example, formerly we recommended the use of large plots in field experiments because the variance of large plots is small. Now we recommend the use of small plots with a compensating increase in number of replications to use the available resources.

The ready availability of efficient electronic computers has had both positive and negative effects. On the positive side, we can process data very efficiently and accurately at low cost. There are analyses which can be run which would have been impossible before the advent of modern computing techniques. On the other hand, ready access to statistical packages by researchers with limited statistical background has increased the misuse of statistical procedures.

Today there is abundant evidence of the misuse of statistics by research agronomists. A current issue of any one of the plant science journals will provide ample cases in point. Agronomy Journal is attempting to improve the situation but authors are asking for more assistance in deciding which statistical techniques they should use and how they should be used.

The purpose of this paper is to point out and discuss 10 common misuses of statistics in agronomic applications. Some of these are more a lack of use of statistics rather than misuses.

[Fig. 1. Example of an aerial spraying experiment for plant disease control in a South American country. Layout: three long strips, one each for Chemical A, Chemical B, and Chemical C, each subdivided into Rep I, Rep II, and Rep III.]

MISUSES IN PLANNING EXPERIMENTS

Misuse 1--Failing to Involve Statistical Considerations at the Planning Stage of the Experiment

Statistical considerations should be involved at the conceptual stage of the experiment. This perhaps is secondary only to the need for a good researchable idea. The reason for involving statistical principles early is to insure that one obtains quality data which bear upon the problem being studied. There are sampling considerations with respect to the populations to which the results of the experiment will be extrapolated; the populations of environmental conditions, of experimental material, and of treatments. There are experimental design considerations such as which experimental design should be used, what treatments should be studied, and how many replications are needed. There are a number of practical considerations relating to the orientation, size, and shape of plots and blocks. Involvement of a statistician
NELSON & RAWLINGS: MISUSES OF STATISTICS IN AGRONOMY 101

at an early stage in the planning of an experiment, particularly if the researcher is not well trained in statistics, is helpful in focusing on the specific questions to be answered and the relevant statistical methods for estimation and/or hypothesis testing. He also can provide assistance in choosing treatments in such a way that the treatment comparisons can be made efficiently at the data analysis stage.

Misuse 2--Using an Improper Experimental Design or Misusing a Proper Design

The most popular design in agronomic research is the randomized complete block design. It is a simple design and is reasonably efficient in most cases if the blocking has been constructed appropriately. The second most popular design is some version of the split-plot design. There are many situations in which a split-plot design is used where clearly a randomized complete block design with factorial arrangement of treatments would be preferred. In cases where there are only a few levels of the whole-plot factor or there are few replications, the whole-plot error in the split-plot analysis of variance will be estimated with low precision and there will not be a good test on the whole-plot factor. A factorial arrangement of treatments within the framework of a randomized complete block design will provide equal (and adequate) precision for all effects, i.e., both sets of main effects and interactions.

There are many instances in which a split-plot design arises out of an inappropriate handling of what was intended to be a randomized complete block design. At some point in the conduct of the experiment, certain subsets of the treatments are handled in groups such as in data collection, exposing to treatment factors, harvesting by maturity, etc. Such nonrandom handling of the experimental units may introduce positive inter-plot correlations among the units handled as groups, generating a "split-plot" design. Failure to recognize this can lead to inappropriate analyses and erroneous experimental error variance estimates.

A proper experimental design can be destroyed by failing to recognize what constitutes the experimental unit. For example, an individual may try to provide replication by subdividing larger plots to which treatments have been applied (Fig. 1). In an aerial spraying experiment for disease control, three chemicals were applied in long strips and then the strips were subdivided to provide what the researcher considered were replications. Figure 1 is not a randomized complete block design which the researcher had intended to use. Each strip is one experimental unit and the subdivisions are samples, not replicates. A randomized complete block design (Fig. 2) was then provided the researcher as an appropriate alternative. In this case there are nine experimental units in three blocks of three each. Randomization was carried out as required in the randomized complete block design and the aerial spraying was performed according to this revised plan. The danger that chemical effect estimates would be confounded with land conditions was no longer a problem and proper replication provided an estimate of experimental error.

[Fig. 2. An appropriate alternative procedure which the statistician provided the researcher (randomized complete block design). Layout: Block I, Block II, and Block III, each containing one plot of Chemical A, Chemical B, and Chemical C in random order.]

Another example of misuse of a proper design is the placement of the wrong factor at the whole-plot level in a split-plot design. Recently, one of the authors encountered a greenhouse experiment involving a study of the response of 12 different soils to lime. There were two lime levels, none and 2 t ha⁻¹. The objective was to see if soils differed in their response to lime. A split-plot design was used with lime being the whole-plot factor which was arranged in randomized blocks. Soil was the sub-plot factor. Although the overall test of the lime x soil interaction was precise, comparisons of the two lime levels within a soil were not. A much better arrangement to achieve the stated goals would be to randomize the soils to the whole-plots and place the lime factor at the sub-plot level. Comparisons of two lime levels within a soil would then be quite precise because the sub-plot error which is based upon a relatively large number of degrees of freedom would be used for testing.

Misuse 3--Failing to Use Proper Randomization Procedures

Randomization is used to insure that we will have unbiased estimates of treatment effects and experimental error. Failure to use proper randomization techniques could cause certain treatments to be favored or hampered due to the position in which they are placed in the experimental area and cause differences in degrees of precision for different comparisons. In our extensive consulting with plant science researchers during the past 20 years, we have encountered numerous examples in which the researcher did not take the need for randomization seriously and consequently compromised the quality of the experiment and the results therefrom. Methods of proper randomization are discussed extensively in statistical methods texts. There are also computing routines available for this purpose in some of the statistical computing packages.

It is common for randomization to be considered relevant only at the time of assignment of the treatment to the experimental units. It is important that the researcher take care during all phases of the research that spurious correlations are not introduced among experimental units receiving the same treatments or any other correlations not accounted for by the design (see misuse 2). Such might occur, for example, if potted plants from different replicates are grouped for easy administration of a daily nutrient supplement.
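The randomization of a randomized complete block design, such as the nine-unit aerial spraying plan of Fig. 2, can be sketched in a few lines of code. What follows is only an illustrative sketch and not a routine from any particular statistical package: the function name randomize_rcbd, the seed value, and the site labels are hypothetical, and Python's standard random module is assumed as the source of random permutations.

```python
import random


def randomize_rcbd(treatments, n_blocks, rng):
    """Return a list of blocks; each block is an independent random
    ordering of all treatments (one plot per treatment per block)."""
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)  # copy so the caller's list is untouched
        rng.shuffle(order)        # uniform random permutation within the block
        layout.append(order)
    return layout


# Hypothetical seed, recorded only so that the field plan can be reproduced.
rng = random.Random(1983)

chemicals = ["A", "B", "C"]
layout = randomize_rcbd(chemicals, n_blocks=3, rng=rng)
for number, order in enumerate(layout, start=1):
    print(f"Block {number}: {' '.join(order)}")

# Each experiment in a series gets its own independent draw
# (the site labels are hypothetical).
plans = {site: randomize_rcbd(chemicals, n_blocks=3, rng=rng)
         for site in ("site 1", "site 2")}
```

Shuffling separately within each block is what restricts the randomization to the block structure; shuffling all nine units together would instead give a completely randomized design.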
JOURNAL OF AGRONOMIC EDUCATION, VOL. 12, 1983

When experiments are conducted in series (over sites and/or years) it is necessary to provide different randomizations for each experiment. This will reduce biases which might result from two adjacent treatments interacting and will tend to equalize the precision of all comparisons.

There are also experiments in which the entire experimental process is a chain of individual steps. For example, plants might be grown according to an experimental design in the greenhouse and then transferred to the field and used in a second stage experiment under field environmental conditions. Or, more likely, plants may be grown in a haphazard arrangement in the greenhouse for part of their growth cycle and then put into a designed experiment at a specified stage of growth. Proper randomization at that stage would avoid biasing treatment effects and experimental error (arising from environmental variation during the earlier stage of growth) but a more efficient experimental design would incorporate provisions for error control at all stages of plant growth.

Misuse 4--Using an Improper Size of an Experiment

It is important to use an appropriate number of replications in an experiment. Under-replication could result in very imprecise estimates, whereas over-replication can be costly. Agronomists probably err on the side of under-replication more often than on the other side. One still sees self-contained field experiments where the researcher is attempting to achieve adequate precision with only two replications. There are very few field situations where this number of replications would be adequate.

Table 2.1 in Cochran and Cox (1957) provides a useful guide to the determination of the proper number of replications if the approximate variability of the experimental material (coefficient of variation) is known and the researcher is willing to estimate differences between means of a given percent. It is also necessary to assume a probability level for the test (α) and a probability of rejecting a false null hypothesis.

Equally important in agronomic research is for the replication to adequately sample the reference population of environmental conditions. This is usually accomplished by growing the test over several years at several locations within the geographical area of interest. It is very unlikely that a test at one site in 1 year will provide a reliable inference to any except the most restricted reference population of environments.

Misuse 5--Using Improper Experimental Technique

The precision of data from an experiment depends to a large degree upon the experimental technique used. Because statistics deal with variability and methods for dealing with it, experimental technique does fall within the realm of a statistician's concern. In fact, statisticians have perhaps made one of their more important contributions to agronomic research in asking questions of the researcher about his experimental technique with a view to its improvement.

We find that many agronomists do not know the meaning of effective blocking. In many cases, they are blocking just to provide replication, not error control. Others attempt to run experiments before they have adequately become familiar with the experimental process (perhaps through small pilot studies) and so they do not use the best technique. Some lack care in controlling variation. We say that their experimental technique is out-of-control. Others do not oversee the experimental process adequately or they do not take note of unusual events which took place at the experimental site. Consequently, when these unusual events have generated "outliers" in the experimental data, there is no basis for rejection of the extreme datum points from the data set.

Overall precision can be increased by using a uniform experimental technique throughout the series of experiments. Some ways of standardizing technique are to: 1) write out procedures for conducting various phases of the experiments and a time schedule for their execution; 2) make all personnel dealing with the treatments, plots and data aware of the various sources of error and the need for good technique; 3) apply the treatments uniformly; 4) exercise sufficient control over external influences so that every treatment produces its effect under controlled, comparable conditions; 5) devise suitable unbiased measures of the effects of treatments; and 6) prevent gross errors.

There are many aspects to technique such as choice of proper plot size and shape, choice of dosages in quantitative controlled variable experiments and proper timing of operations. It is very important from a statistical point of view that individuals who lay out the experiments are trained in the subject matter discipline as well as in field plot technique. Otherwise it is impossible to provide credentials to imprecise data once they have been collected under dubious sets of circumstances.

MISUSES IN ANALYSIS AND INTERPRETATION OF DATA

Misuse 6--Using Inappropriate Error Terms for Testing or for Calculating Standard Errors

Use of an inappropriate error term for testing or for providing standard errors has been a problem for many years but has increased recently with the common use of statistical computing packages which use a default error term. By this is meant that all terms not included in the linear model spelled out in the instructions to the computer are pooled into an error term which the computer uses for tests and estimates of precision. In a very large proportion of the cases, the tests of significance automatically provided by computer packages are incorrect. Each user of a computer package is responsible for the correctness of his or her analysis.

The analysis of variance of data from a randomized complete block design in which each plot has been sampled (Table 1) has a sampling error in addition to the

usual experimental error. The appropriate error term for testing treatments is experimental error, not sampling error. The F of 2 based upon forming the ratio of Treatment MS to Error MS with 9 and 27 degrees of freedom is not significant at the 0.05 level. If one placed only Blocks and Treatments in the linear model used in fitting by the computer, the pooled error would be [(27 x 2000) + (200 x 400)]/227 = 590. Using this inappropriate pooled error, the computer would use a denominator of 590 rather than 2000 in the F ratio and the denominator degrees of freedom would be 227 rather than the correct value of 27. The resulting inference would be incorrect, i.e., treatments would now be significant at the 0.05 level.

Table 1. An analysis of variance of data from a randomized complete block experiment (four blocks and 10 treatments) in which the entire plot was not harvested, but instead six samples were taken within each plot and analyzed for level of the response variable being studied.

Source            df     MS    F
Blocks             3   9000
Treatments         9   4000    2 NS
Error             27   2000
Sampling error   200    400

Or if the model is specified so that "error" is separated from "sampling error," some computer packages will use the sampling error in testing the treatments category resulting in a large upward bias in the Treatments F ratio.

There are other cases where the researcher desires to use the appropriate error term but it is difficult due to the nature of the design constraints. A common example is the design of growth chamber experiments. It is difficult to provide enough chambers for replication of the temperature-humidity conditions. The chambers are often shared by a number of researchers and also it is difficult to change the temperature-moisture settings for a given chamber. Replication of the factor-combinations within the chamber is achieved more readily but the error term resulting from this second type of replication is not appropriate for testing temperature-humidity treatments. The answer to this problem usually is to provide multiple runs of the experiment using the same temperature settings within a chamber from run to run but with new randomizations of the factor-combinations within a chamber for the various runs. The run then serves as a block in a randomized complete block whole-plot arrangement of temperatures and the plots within the chambers within a run are considered as sub-plots.

In short, the researcher needs to be sure that the appropriate error term is being used whether the analysis of variance is being conducted by a desk calculator under his personal supervision or whether by an electronic computer. The computer is in no position to determine the appropriate error term with which to test so one should not blindly assume that significance tests or error estimates obtained from the computer were obtained using the proper error. The procedure of writing out expectations of mean squares which is described in statistics methods texts is useful in guiding one to the appropriate error term for testing, especially in situations where there are several plot sizes or levels of error within the same experiment.

Misuse 7--Failing to Study Patterns in Data

With modern computers, there is a tendency to routinely process data through standard data analysis routines without careful study of the patterns of variation in the data. As a result, researchers are much less familiar with the behavior of their experimental data than when analyses were done by hand and it is easy for badly behaved data to escape detection. The presence of a single outlier can grossly inflate variances without being detected if, for example, only routine analyses of variance are run. Heterogeneous variances, inadequacies of the model, and model assumptions will seldom be detected without a careful study of the data. If the data set is too large for a careful study by hand, various computer programs for editing, residual analysis, tests of normality, etc., are available for assisting a complete analysis of the data. If causes for the outliers can be identified, it is possible to replace them by missing plot estimates. In some cases, entire sections of the data (or even entire data sets) are rendered invalid by effects of uncontrolled factors or by improper experimental design or layout. It is important that such cases be recognized and dealt with appropriately.

Detection of portions of the data where the variance differs from that of other parts of the data may be accomplished by comparing the ranges among the replications for each treatment or by analysis of residuals. There are several courses of action once it has been determined that the errors in the data set are heterogeneous. In some cases, a transformation such as the log-transformation may be used. Another approach is to partition the data into sets which have homogeneous variance and conduct separate analyses of variance for each set.

A careful study of the data patterns will also help to determine if the a priori biological model is adequate or if the patterns show that some other model would be more appropriate.

Misuse 8--Depending Excessively on One Class of Statistical Analyses

In agronomic research, the analysis of variance has almost become the universal method of analysis. This is the statistical procedure emphasized in all basic statistical methods courses and it is familiar to most agronomists. The analysis of variance is a powerful technique for understanding variational patterns in the data and deserves to be a primary tool. However, excessive dependence on the analysis of variance, or any other single statistical method of analysis, is a handicap to the researcher. There are two problems associated with this dependence. First, the particular statistical analysis simply may be inappropriate for the problem either because basic assumptions required for the analysis are not satisfied or because other more powerful methods may be available. Secondly, other methods of analysis may be more revealing of the basic structure of the data. For example, a principal component analysis or other multivariate methods can be very informative of the correlational structure among several variables. The researcher should always be critical of the standard method of analysis and seek statistical assistance if he suspects his method is inadequate in any sense.

MISUSES IN REPORTING EXPERIMENTAL RESULTS

Misuse 9--Misapplying Multiple Comparison Procedures Such as Duncan's New Multiple Range Test

Perhaps the most frequently occurring of the misuses in agronomic reporting is misuse of multiple comparison procedures. There has been a recent tendency to overuse and misuse mechanical comparison procedures such as Duncan's New Multiple Range Test in interpreting and reporting research results. The plant science journals have faced this problem for a number of years. Unfortunately, the misuse of these procedures is recorded permanently in some of the papers of these journals.

In approaching the problem of drawing inferences about treatment effects, it is important to review the plans and the major goals set forth by the researcher before the experiment was initiated. It is also important to review the experimental and treatment designs and an account of what has taken place at the research site throughout the period of the experiment. The treatments are then studied in detail to see what comparisons are logical from the structure of the treatments. As an example, consider the following set of five treatments: the 2 x 2 factorial set of treatments consisting of varieties A and B with and without fertilization plus a check treatment consisting of the standard variety and the standard fertilization. The comparisons which would be logical for these five treatments are:

    Ck vs. other four treatments,
    Var A vs. Var B,
    Fertilizer vs. No fertilizer, and
    (Var A vs. Var B) x (Fertilizer vs. No fertilizer).

One should consult a statistics methods text for the methodology to construct contrasts and to calculate the sum of squares associated with each comparison. It should be pointed out that multiple comparison procedures should not be used for comparing the five means because there are specific comparisons suggested by the factorial structure of the data. We want the power of the tests to be focused on these particular comparisons.

For situations in which the experiment involves a quantitative controlled factor (e.g., an N-rate experiment involving treatments of 0, 50, 100, 150, and 200 kg/ha N) the appropriate statistical analysis is to fit a response curve. This may be reported in a graph showing the response relationship together with the equation and measures of precision. The preceding approach may be extended to more than one fertilizer variable; a response surface in two or more variables would be fitted to the yield data. Adoption of a response curve (or surface) to represent the behavior of the data completes the tests of significance of effects of changes in the independent variables. It would be inappropriate, for example, to ask, "Is the change in Y from X = 10 to X = 20 statistically significant?" Or, "How much does X have to be increased (above X = 10) before a statistically significant increase in Y is achieved?" Questions of this type reflect a general over-emphasis on hypothesis testing at the expense of estimation. The statistical test of significance is an aid for deciding whether changes in Y are real or are an artifact of the random "noise" in the experiment. Adoption of the response curve, presumably using appropriate tests of significance, implies that we have rejected random noise as the explanation for the changes in Y. Having adopted the response curve, the above questions should be rephrased as estimation questions; e.g., "What is the estimated change in Y as X is changed from 10 to 20, and what is the standard error of the estimated change?" or "How much does X have to change to produce a (biologically) meaningful increase in Y of S units, and what is the standard error of this estimated change in X?"

It is difficult to lay down hard and fast rules for interpreting data from factorial experiments. Each experiment will present a different interpretational pattern. Usually if interaction(s) are negligible it is possible to use the techniques described above on the main effect means. If interaction is sizeable, the above techniques are used for one factor and each level of the other factor(s).

Where treatments have structure, the inclusion of a brief analysis of variance table in the RESULTS section of a report or paper will often be quite useful to the reader (Table 2). The check mean was different from the mean of all of the other treatments. Also the Var A mean was different from the Var B mean. One can then report the check mean, the average of the other four means and the Var A and Var B means. This gives the information which exists in this set of data.

Table 2. Example of an analysis of variance table which might be included in the RESULTS section

Source            df     MS     F
Blocks             3   1900
Treatments         4   1450    4.8*
  Ck vs others     1   2225    7.5*
  Variety (V)      1   1500    5.0*
  N-rate (N)       1   1250    4.2
  V × N            1    825    2.8
Error             12    300
* Significant at the 0.05 level.

The one situation in which the use of multiple comparisons is justified is where there does not appear to be a logical treatment structure. A mechanical comparison of all pairs of means using a criterion such as the

Waller-Duncan k-Ratio Procedure (Steel and Torrie, 1980) seems appropriate. However, it should be pointed out that in most experiments there is at least some structure in the treatments.

Misuse 10--Failing to Report in the MATERIALS AND METHODS Section of the Research Report the Experimental Design and Statistical Procedures Used

One of the common failures in reporting of research results is to inadequately describe in the MATERIALS AND METHODS section of the research report or paper the experimental design and statistical procedures used. This information is very essential to the reader's proper interpretation of the results reported in the paper. A detailed account of what procedures were used will also allay any fears on the part of the reviewer and the editor of the journal and the reader that improper design or statistical procedures were used.

An example of a statement which would be used to describe the experimental design and statistical procedures is as follows: The experiment was conducted according to a split-plot design with a randomized complete block arrangement of the whole-plot factor (Varieties). There were four blocks. The sub-plot factor was N fertilizer rate (0, 50, 100, 150 kg/ha of N supplied as anhydrous ammonia). Because there was a significant Variety x N-rate interaction, separate quadratic N-rate response curves were fitted for each of the varieties. Economic optimal rates were computed according to the methods of Heady et al. (1955) assuming a cost/price ratio of 0.0244 (N fertilizer in kg, corn yield in quintals).

CONCLUSIONS

Misuses of statistics in agronomic applications are far more prevalent than most agronomists realize. The statisticians' views of statistical concepts have changed considerably since the early days of statistical application (e.g., when Duncan's New Multiple Range Test was in vogue). Agronomists apparently are not fully aware of these changes. Widespread use of computers, together with the ready availability of "user friendly" software packages have also resulted in a number of misuses of statistics.

Agronomists should recognize their need for statistical assistance in planning experiments and in analyzing and interpreting experimental data. The responsibility for improving the statistical practice in agronomic research rests jointly with the agronomist, the statistician, the agronomic journal reviewers and editors. Such improvement should result in well-planned agronomic experiments which focus upon the problem(s) being researched, the results from which are clear-cut and the conclusions from which are scientifically valid and relevant to the underlying problem.
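As a supplementary check on the error-term arithmetic discussed under Misuse 6, the three F ratios that can arise from Table 1 may be recomputed directly. This is only a sketch of the arithmetic: the mean squares and degrees of freedom are copied from Table 1, while the variable names are hypothetical and the code is not the output of any statistical package.

```python
# Mean squares and degrees of freedom copied from Table 1.
ms_treatments, df_treatments = 4000, 9
ms_error, df_error = 2000, 27          # experimental error (plots)
ms_sampling, df_sampling = 400, 200    # sampling error (samples within plots)

# Correct test: Treatments MS against the experimental error MS.
f_correct = ms_treatments / ms_error

# Default pooling when only Blocks and Treatments are in the model:
# the two error sums of squares (df x MS) are added and divided by
# the combined degrees of freedom.
df_pooled = df_error + df_sampling
ms_pooled = (df_error * ms_error + df_sampling * ms_sampling) / df_pooled
f_pooled = ms_treatments / ms_pooled

# Second mistake: testing Treatments against sampling error alone.
f_sampling = ms_treatments / ms_sampling

print(f"Experimental error: F = {f_correct:.2f} on {df_treatments} and {df_error} df")
print(f"Pooled error:       F = {f_pooled:.2f} on {df_treatments} and {df_pooled} df")
print(f"Sampling error:     F = {f_sampling:.2f} on {df_treatments} and {df_sampling} df")
```

The pooled mean square rounds to the 590 quoted in the text, and both inappropriate denominators give F ratios several times larger than the correct value of 2, which is how a nonsignificant treatment effect comes to be reported as significant.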
