Box 1976


Science and Statistics
Author(s): George E. P. Box
Source: Journal of the American Statistical Association, Vol. 71, No. 356 (Dec., 1976), pp. 791-799
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2286841
Accessed: 11/03/2013 11:54

This content downloaded on Mon, 11 Mar 2013 11:54:42 AM All use subject to JSTOR Terms and Conditions

Science and Statistics
GEORGE E. P. BOX*

Aspects of scientific method are discussed: in particular, its representation as a motivated iteration in which, in succession, practice confronts theory, and theory, practice. Rapid progress requires sufficient flexibility to profit from such confrontations, and the ability to devise parsimonious but effective models, to worry selectively about model inadequacies and to employ mathematics skillfully but appropriately. The development of statistical methods at Rothamsted Experimental Station by Sir Ronald Fisher is used to illustrate these themes.

* George E.P. Box is R.A. Fisher Professor of Statistics, University of Wisconsin, Madison, WI 53706. Research was supported by the United States Army under Grant DAHC04-76-G-0010. This is the written version of the R.A. Fisher Memorial Lecture presented at the joint statistical meetings of the American Statistical Association and Biometric Society given at St. Louis in 1974. The author gratefully acknowledges the assistance of his wife Joan who generously shared the results of her research on her father's life and made available the manuscript of her biography of Fisher.

© Journal of the American Statistical Association, December 1976, Volume 71, Number 356, Applications Section

1. INTRODUCTION

In 1952, when presenting R.A. Fisher for the Honorary degree of Doctor of Science at the University of Chicago, W. Allen Wallis described him in these words.

He has made contributions to many areas of science; among them are agronomy, anthropology, astronomy, bacteriology, botany, economics, forestry, meteorology, psychology, public health, and, above all, genetics, in which he is recognized as one of the leaders. Out of this varied scientific research and his skill in mathematics, he has evolved systematic principles for the interpretation of empirical data; and he has founded a science of experimental design. On the foundations he has laid down, there has been erected a structure of statistical techniques that are used whenever men attempt to learn about nature from experiment and observation.

Fisher was introduced by the title which he himself would have chosen, not as a statistician but as a scientist, and this was certainly just, since more than half of his published papers were on subjects other than statistics and mathematics. My theme then will be first to show the part that his being a good scientist played in his astonishing ingenuity, originality, inventiveness, and productivity as a statistician, and second to consider what message that has for us now.

2. ASPECTS OF SCIENTIFIC METHOD

A heritage of thought about the process of scientific learning comes to us from such classical writers as Aristotle, Galen, Grosseteste, William of Occam, and Bacon who have emphasized aspects of good science and have warned of pitfalls.

2.1 Iteration Between Theory and Practice

One important idea is that science is a means whereby learning is achieved, not by mere theoretical speculation on the one hand, nor by the undirected accumulation of practical facts on the other, but rather by a motivated iteration between theory and practice such as is illustrated in Figure A(1).

Matters of fact can lead to a tentative theory. Deductions from this tentative theory may be found to be discrepant with certain known or specially acquired facts. These discrepancies can then induce a modified, or in some cases a different, theory. Deductions made from the modified theory now may or may not be in conflict with fact, and so on. In reality this main iteration is accompanied by many simultaneous subiterations (see, e.g., [1, 2]).

[Figure A. The Advancement of Learning. Panel A(1): An Iteration Between Theory and Practice (labels: PRACTICE, DATA, FACTS; HYPOTHESES, MODEL, CONJECTURE, THEORY, IDEA; DEDUCTION; INDUCTION). Panel A(2): A Feedback Loop (labels: HYPOTHESIS H_j; DEDUCTION; CONSEQUENCES OF H_j; ERROR SIGNAL; INDUCTION; H_{j+1} REPLACES H_j).]

2.2 Flexibility

On this view efficient scientific iteration evidently requires unhampered feedback. The iterative scheme is shown as a feedback loop in Figure A(2). In any feedback loop it is, of course, the error signal (for example, the discrepancy between what tentative theory suggests should be so and what practice says is so) that can produce learning. The good scientist must have the flexibility and courage to seek out, recognize, and exploit such errors, especially his own. In particular, using Bacon's


analogy, he must not be like Pygmalion and fall in love with his model.

2.3 Parsimony

Since all models are wrong the scientist cannot obtain a "correct" one by excessive elaboration. On the contrary, following William of Occam he should seek an economical description of natural phenomena. Just as the ability to devise simple but evocative models is the signature of the great scientist so overelaboration and overparameterization is often the mark of mediocrity.

2.4 Worrying Selectively

Since all models are wrong the scientist must be alert to what is importantly wrong. It is inappropriate to be concerned about mice when there are tigers abroad.

2.5 Role of Mathematics in Science

Pure mathematics is concerned with propositions like "given that A is true, does B necessarily follow?" Since the statement is a conditional one, it has nothing whatsoever to do with the truth of A nor of the consequences B in relation to real life. The pure mathematician, acting in that capacity, need not, and perhaps should not, have any contact with practical matters at all.

In applying mathematics to subjects such as physics or statistics we make tentative assumptions about the real world which we know are false but which we believe may be useful nonetheless. The physicist knows that particles have mass and yet certain results, approximating what really happens, may be derived from the assumption that they do not. Equally, the statistician knows, for example, that in nature there never was a normal distribution, there never was a straight line, yet with normal and linear assumptions, known to be false, he can often derive results which match, to a useful approximation, those found in the real world.

It follows that, although rigorous derivation of logical consequences is of great importance to statistics, such derivations are necessarily encapsulated in the knowledge that premise, and hence consequence, do not describe natural truth. It follows that we cannot know that any statistical technique we develop is useful unless we use it. Major advances in science and in the science of statistics in particular, usually occur, therefore, as the result of the theory-practice iteration.

The researcher hoping to break new ground in the theory of experimental design should involve himself in the design of actual experiments. The investigator who hopes to revolutionize decision theory should observe and take part in the making of important decisions. An appropriately chosen environment can suggest to such an investigator new theories or models worthy to be entertained. Mathematics artfully employed¹ can then enable him to derive the logical consequences of his tentative hypotheses and his strategically selected environment will allow him to compare these consequences with practical reality. In this way he can begin an iteration that can eventually achieve his goal. An alternative is to redefine such words as experimental design and decision so that mathematical solutions which do not necessarily have any relevance to reality may be declared optimal.

¹ The researcher's purely mathematical ingenuity is likely to be exercised more, not less, by the fact of his dealing with genuine problems.

3. FISHER - A SCIENTIST

With these ideas in mind let us see how Fisher qualifies as a scientist, using for illustration some of the events occurring during his stay at Rothamsted Experimental Station.

3.1 Rothamsted

In 1919, Fisher had rejected the security and prestige of working under Karl Pearson in the most distinguished statistical laboratory in Britain and at that time certainly in the world. Instead, he took up a temporary job as the sole statistician in a small agricultural research station in the country. He was then already 29 years old and he later said that he was aware that he had failed at both the jobs (teacher and actuary) that he had so far attempted.

Sir John Russell, then Director of Rothamsted, later recalled [17, p. 326]

... when I first saw him in 1919 he was out of a job. Before deciding anything I wrote to his tutor at Caius college ... about his mathematical ability. The answer was that he could have been a first class mathematician had he "stuck to the ropes" but he would not. That looked like the type of man we wanted.... I had only £200 and suggested he should stay as long as he thought that should suffice.... He reported to me weekly at tea at my house.... It took me a very short time to realize that he was more than a man of great ability, he was in fact a genius.

At the end of a year, Fisher, who had a wife and child, had used up twice the £200, but by that time he had been given a permanent post.

3.2 Weighing the Baby

For the theory-practice iteration to work, the scientist must be, as it were, mentally ambidextrous; fascinated equally on the one hand by possible meanings, theories, and tentative models to be induced from data and the practical reality of the real world, and on the other with the factual implications deducible from tentative theories, models and hypotheses.

Fisher had great interest in practical matters. For example, he begins the real business of his book Statistical Methods for Research Workers in Chapter 2, by discussing different ways of plotting data. His first example is introduced as follows [12, p. 25]: "Figure 1 represents the growth of a baby weighed to the nearest ounce at equal intervals from birth." He does not say that this is any particular baby. Recently I was fortunate to see the Fisher family records in which in Fisher's own hand are recorded the weight from birth of every one of his nine children, weighed by himself, with the results carefully graphed. Comparison shows that the child is his second son, Harry Leonard, who was born in 1923 shortly before the first edition of the book was written. The next leg of the scientific iteration is hinted at as he goes on to discuss how best to plot the data so as to make "a rough examination of the agreement of observation with any (proposed) law of increase."

3.3 Find the Lady

The extraordinary extent to which Fisher's actual everyday experience was grist to the mill of his inductive mind is further illustrated in the famous opening lines of Chapter II of Fisher's book The Design of Experiments [11, p. 11]: "A lady declares that by tasting a cup of tea ... she can discriminate whether the milk or the tea infusion was first added to the cup. We will consider the problem of designing an experiment by means of which this assertion can be tested." Fisher proceeds to use this example to explain and illustrate the basic principles of good statistical design.

There was, of course, a real lady. This incident happened many years before the book was written and just after Fisher came to Rothamsted. The lady was Dr. Muriel Bristol, the algologist, and she had declined the cup of tea that Fisher had offered her because he had added the tea first. Fisher declared it made no difference. To which she replied "Of course it did." Her future husband, William Roach, who was close at hand said "Let's test her," they did, and according to him she made nearly every choice correctly. In this she behaved similarly to the lady in the book who got one wrong.

3.4 From Soil Bacteria to Nonlinear Design

The tea urn was a great catalyst to iteration. There, each afternoon, Fisher conversed with members of the staff and with visitors and became involved in their scientific problems, often with dramatic consequences. One scientist who came to Rothamsted about the same time as Fisher and became his intimate friend was the bacteriologist, Gerard Thornton. It was he who first interested Fisher in improving the time consuming dilution methods for making bacterial counts. This resulted in Fisher's pioneering work on nonlinear design in 1922 mentioned by Cochran [4].

3.5 From Cotton to Extreme Values

One of the early visitors to Rothamsted was L.H.C. Tippett from the Cotton Research Institute. A matter of great practical concern to him was the strength of cotton yarn. Since the breaking strength of a piece of cotton is the strength of the weakest link, he was faced with what we should now call the extreme value problem. Tippett had first studied with Karl Pearson and had earlier approximated the distribution using the method of moments. In cooperation with Fisher the problem was tackled rather differently. The authors note [14, p. 180] that, "the limiting distribution must be such that the extreme member of a sample of n from such a distribution has itself a similar distribution." This simple but remarkable insight leads to a functional equation which yields as its solution the basic limiting forms. From these forms almost all subsequent work on the subject springs. The theory has applications in such different fields as the design of dams and the reliability of components. Like so many of Fisher's brain children this is now regarded as a distinct field of study.

I will use for further illustration work that Fisher did at Rothamsted between 1919 and 1927 which began with regression analysis and ended with a complete and elegant theory of experimental design which is still the basis for most statistically planned experiments. This work was published in a series of papers having the general title "Studies in Crop Variation" and numbered I, II, III,² IV, and VI [7, 13, 8, 5, 6].

² This paper [8] was presented to the Royal Society without the general title but was mysteriously labelled III and had clearly been originally intended for this series.

3.6 From Dung to Orthogonal Polynomials and Residual Analysis

By 1919 13 plots on Broadbalk wheat field had received thirteen different manurial treatments uniformly for 67 years. In "Studies in Crop Variation I" [7], Fisher begins by presenting a workmanlike discussion, which lasts for twelve pages, of the responses to the thirteen different manures revealed by his analysis of the Broadbalk data. In particular, he concludes that there is really nothing like plain dung. It gives a high yield with no significant diminution of its effect over the years. He then quite suddenly shifts from manure to mathematics revealing where his analysis has come from. In the next few pages he introduces orthogonal polynomials, presents formulas for their calculation from equispaced data, obtains the distributional properties of the coefficients, and shows how their significance may be judged. Without calling it that he presents the appropriate analysis of variance which he has used in fitting fifth degree polynomials to the annual yields. Most interesting of all, he discusses the properties of the residuals $y - \hat{y}$ from a fitted polynomial of any degree r, allowing us to see him in the guise of what some people now call a data analyst.

Data analysis, a subiteration in the process of investigation, is illustrated here.

[Figure: data analysis as a subiteration. Labels: TENTATIVE MODEL, INFERENCE, TENTATIVE ANALYSIS, CRITICISM.]

³ The apt christening of statistical criticism is due to Cuthbert Daniel.

In the inferential stage, the analyst acts as a sponsor for the model. Conditional on the assumption of its truth he selects the best statistical procedures for analysis of the data. Having completed the analysis, however, he must switch his role from sponsor to critic.³ Conditional now on the contrary assumption that the model may be



seriously faulty in one or more suspected or unsuspected ways he applies appropriate diagnostic checks, involving various kinds of residual analysis.

In order to conduct his analysis of the residuals from the fitted polynomials, Fisher obtained

i. the average value of $V(y - \hat{y})$ as $(1 - (r + 1)/n)\sigma^2$,
ii. the variances of the residuals $y_j - \hat{y}_j$ for the 67 individual observations,
iii. the identity $\sigma^2 = V(y_j) = V(y_j - \hat{y}_j) + V(\hat{y}_j)$, $j = 1, 2, \ldots, n$,
iv. an approximate formula for the autocorrelations of residuals from a fitted polynomial of any degree.

The average value of $V(\hat{y}_j)$ from (i) and (iii) is $\sigma^2 (r + 1)/n$. Thus, Fisher says if we want to have a small variance for $\hat{y}$ we should keep r small, a demonstration of the value of parsimony, helping to justify his use of polynomials of only fifth degree. Fisher plots the variances $V(y_j - \hat{y}_j)$ for the individual residuals against their time order j. Because of relation (iii) the graph looked at upside down is also a plot of $V(\hat{y}_j)$. Using this he notes the deceptive reduction of $V(y_j - \hat{y}_j)$ at the extremities of the scale and the corresponding increase of $V(\hat{y}_j)$ and says [7, p. 123] "it is a weakness of the polynomial form that the extreme terms should be so much affected."

Finally, mentioning that overfitting and underfitting are both to be avoided he uses the matching of theoretical and empirical autocorrelations of residuals to check when a polynomial of sufficiently high degree has been fitted. In particular, he compares theoretical and observed autocorrelations of residuals from polynomials of degree zero and five to show the inadequacy of the former and the satisfactory fit of the latter. This application of serial correlation of residuals to the awkward problem of deciding at what point adequacy of fit has been achieved has great freshness and interest 55 years later.

3.7 Weeds and the Education Acts

Fisher was perplexed by the shapes of his fitted yield graphs. These showed a pattern of significant slow changes common to all the 13 Broadbalk plots. In particular, there was a common tendency for low yields roughly in the period 1870-1880. This common pattern was not due to weather; a similar analysis he conducted for successive yields of experimental wheat at Woburn, and for wheat averages for the whole of Hertfordshire, and for barley and grass from experimental plots at Rothamsted, failed to show it. He speculates [7, p. 129], "Of all the organic factors which influence the yield of wheat it is probable that weeds alone change sufficiently slowly to explain the changes at Broadbalk."

He goes on to describe, as only a dedicated gardener could, all the various weeds that were found there. He notes that old records show that, in 1853, 211 man-days and 714 boy-days were spent in weeding the field. In particular, the boys probably held in check by hand weeding the slender foxtail grass Alopecurus agrestis. But he says [7, p. 131] "it may be remembered that the Education Acts of 1876 and 1880 made attendance at school compulsory." We are left to speculate whether the low wheat yields occurred after that time because the hands of the little boys who pulled the foxtail grass were now covered with ink and not with earth.

3.8 From Rainfall and Wheat Yield to Distributed Lags

In 1924, in the third paper of the series [8], he used the Broadbalk data to demonstrate the influence of rainfall on the wheat yield. At the beginning of the paper he seemed to fear that he might be expected to account for the effects not only of rainfall but also for such other variables as maximum and minimum temperature, dew point, and hours of bright sunshine. But he points out that allowances for the effect of each of these on the final harvested yield would need to be included at least for each month separately. And he says if so many regressors are included a very high proportion of the total variation can seem to be accounted for by chance alone. In case some dissident reader might doubt it, he thereupon outlines the derivation of the distribution of the multiple correlation coefficient in one paragraph flat using n-dimensional geometry and on the next page produces a short table of tail areas for R. He then goes on to discuss the misleading effects of selection in what would now be called step-wise regression.

Fisher's data were as follows:

i. for each of the 13 Broadbalk wheat plots he had the harvested yields for each of 60 years,⁴ and
ii. for each of these 60 years he had daily rainfall records and for convenience he aggregated these for each year into 61 six-day periods (6 × 61 = 366) beginning immediately after the harvest.

⁴ Five years 1890, 1891, 1905, 1906, and 1915 were omitted because the plots in these years had special treatment.

In a remarkable demonstration of parsimonious modeling he first suggests that the yield of wheat in the jth year, $w_j$ say, might be represented by

$w_j = c + \sum_{t=1}^{61} a_t r_{jt}, \quad j = 1, 2, \ldots, 60.$ (3.1)

In this model the coefficient $a_t$ provides the average effect on eventual harvested yield of one inch of rain in the tth time period. In modern parlance (3.1) might be called a "transfer function" model expressing the "memory" of the system. Economists later called it a "distributed lag" model but they seem to have been unaware of Fisher's prior work or of his ingenious way of proceeding using orthogonal polynomials.

As it stands (3.1) is highly nonparsimonious. Fisher decided, therefore, to represent the rainfall data $r_{jt}$ by orthogonal polynomials of fifth degree. He now notes that the coefficients $a_t$ should also follow a smooth curve which might be represented in the same way. Thus,

$a_t = \alpha_0 T_{0t} + \alpha_1 T_{1t} + \cdots + \alpha_5 T_{5t}$

$r_{jt} = \rho_{0j} T_{0t} + \rho_{1j} T_{1t} + \cdots + \rho_{5j} T_{5t}$

But if the orthogonal functions $T_{it}$ are chosen so that $\sum_t T_{it}^2 = 1$; then, after summing, (3.1) may be written

$w_j = c + \alpha_0 \rho_{0j} + \alpha_1 \rho_{1j} + \cdots + \alpha_5 \rho_{5j}$
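The reduction of (3.1) from 61 lag weights to six polynomial coefficients can be sketched numerically. The Python below is only an illustration under stated assumptions, not a reconstruction of Fisher's hand computation: it assumes NumPy is available, builds the orthonormal polynomials by a QR decomposition rather than by Fisher's formulas for equispaced data, and uses synthetic rainfall and yields. The function names, the gamma rainfall model, and every numerical value are hypothetical.

```python
import numpy as np


def orthonormal_poly_basis(n_points, degree):
    """Rows are polynomials of degree 0..degree, orthonormal over n_points equispaced values."""
    t = np.linspace(-1.0, 1.0, n_points)
    vander = np.vander(t, degree + 1, increasing=True)  # columns 1, t, t^2, ...
    q, _ = np.linalg.qr(vander)  # orthonormalizes the columns in order of degree
    return q.T  # shape (degree + 1, n_points); each row has sum of squares 1


def fit_distributed_lag(w, r, degree=5):
    """Regress yields w on the polynomial coefficients rho of each year's rainfall."""
    T = orthonormal_poly_basis(r.shape[1], degree)
    rho = r @ T.T  # rho[j, i] = sum_t T[i, t] * r[j, t]
    X = np.column_stack([np.ones(len(w)), rho])
    coef, *_ = np.linalg.lstsq(X, w, rcond=None)
    c_hat, alpha_hat = coef[0], coef[1:]
    a_hat = alpha_hat @ T  # smooth lag weights a_t, one per six-day period
    return c_hat, alpha_hat, a_hat


# Synthetic illustration (assumed values): 60 years, 61 six-day periods.
rng = np.random.default_rng(0)
n_years, n_periods = 60, 61
T = orthonormal_poly_basis(n_periods, 5)
alpha_true = np.array([-0.5, 0.3, 0.2, -0.1, 0.05, 0.02])
a_true = alpha_true @ T
r = rng.gamma(shape=2.0, scale=0.2, size=(n_years, n_periods))
w = 30.0 + r @ a_true + rng.normal(scale=0.1, size=n_years)

c_hat, alpha_hat, a_hat = fit_distributed_lag(w, r)
```

The point of the design is the one made in the text: the regression has seven unknowns (the constant and six coefficients) instead of 62, yet it still returns a full curve of 61 lag weights.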

The α's which determine the lagged weights in the transfer function can thus be obtained by regressing the $w_j$ onto the estimated ρ's.

Having carried through the necessary heavy calculations and graphed his results Fisher conducts a very extensive discussion and comparison of the polynomial distributed lag curves for the differently manured plots from which, in particular, he adduces the predominant effect of rain in reducing soil nitrates. One feels that his love of parsimony was certainly not lessened by the fact that the computations were performed by hand by himself and his assistant. Indeed, much can still be learnt from his discussion about economical processes of calculation and appropriate checks [8, p. 111-3].

3.9 From Fertilizer and Potatoes to the Analysis of Variance

About this time Fisher was getting rather tired of analyzing old records; he later described it as "raking over the muck heap." In "Studies in Crop Variation II," jointly authored with his assistant, Miss W.A. MacKenzie, and subtitled "The Manurial Response of Different Potato Varieties," [13] he tried his hand at analyzing some experimental data from Rothamsted. The authors remark that it would be convenient if (contrary to some expert opinion) different varieties of plants did not react differently to fertilizers, or as we should say now, if there were no interaction between variety and fertilizer.

An experiment had recently been run by Thomas Eden, a crop ecologist at Rothamsted, in which each of twelve varieties of potatoes were tested with six different combinations of manure. This experiment was analyzed as if it were a thrice replicated and randomized 12 × 6 factorial. (It wasn't, but we return to that later.) From the analysis of variance which is presented, the answer to the question, "Is there significant interaction between varieties and manures?" appears to be No!

There are some remarkable things about this paper, however:

i. The analysis of variance, hinted at earlier, appears here for the first time in its completeness. It arrives quite suddenly and unannounced in the middle of the paper after the discussion of agricultural questions. It is, of course, not even mentioned in the title.
ii. After the algebraic identity between the total sum of squares and the within and between treatments sum of squares has been written down, the statement is made [13, p. 315] "If all the plots were undifferentiated, as if the numbers had been mixed up and written down in random order, the average value of each of the two parts is proportional to the number of degrees of freedom in the variation of which it is compared." Thus, at the very beginning, randomization, an important flag under which Fisher will sail, is firmly nailed to the mast.
iii. The analysis is wrong, because in fact the trial was actually run as what is now called a split plot design. Feedback in the form of the appropriate correction came quickly in the first edition of Statistical Methods in 1925 (see [12, p. 238]). Using part of the same data, Fisher there gives the correct analysis and points out that it is essential to use separate estimates of error variance (for between and within plot comparisons) and shows that one is indeed significantly larger than the other.
iv. In this very first paper on the analysis of variance, Fisher demonstrated the flexibility of his thought by questioning the linear model (which almost everybody else has ever since accepted as representing received truth). The authors say [13, p. 316], "the above test is only given as an illustration of the method; the summation formula for combining the effects of variety and manurial treatment is evidently quite unsuitable for the purpose. No one would expect to obtain from a low yielding variety the same actual increase in yield which a high yielding variety would give ... a far more natural assumption is that the yield should be the product of two factors one depending on the variety and one on the manure." With the possibility of transformation so much a part of Fisher's everyday thought, we might now expect him to proceed along that route but in fact he derives the appropriate nonlinear analysis, devising methods which have been rediscovered only recently [18].

3.10 Mice, Tigers, and Randomization

A man in daily muddy contact with field experiments could not be expected to have much faith in any direct assumption of independently distributed normal errors. While the supposition of marginal normality for the errors might be regarded as innocuous, the idea that errors from adjacent plots of land could be treated as independent would be obviously absurd and dangerous. This was one important reason for Fisher's insistence (i) on the physical act of randomization as a necessary condition for the validity of any experiment and (ii) that given that randomization had been carried out inferences should be made from the appropriate randomization distribution; to which, however, standard normal theory often provided an adequate approximation.

To guarantee the exact validity of the usual null tests made with the standard linear model it is not, of course, necessary that the density function of the error vector e be normal, it is only necessary that it be spherically symmetric,⁵ i.e., that the density function be of the form f(e'e). The fact that standard normal theory often provides an adequate approximation to that given by randomization theory is not because the density for randomized errors is necessarily approximated by that of independent normal deviates. It is rather because, in the appropriate vector space, the symmetry induced by randomization is approximated by spherical symmetry.

⁵ Obviously, this must be true for any criterion which is a homogeneous function of the data of degree zero.

Fisher showed some irritation with later workers who saw only a rich source of purely mathematical development in his work. In particular, workers on what has come to be called "distribution-free" tests have often failed to emphasize and sometimes perhaps even to realize the limitations imposed by the necessary assumption of symmetry of the joint error distribution. The


validity of this assumption could, of course, only be guaranteed by randomization. Otherwise, the derived procedures, far from being distribution free, would be almost as restrictive as those derived on the assumption of normal independent errors. It is true that long usage has seemed to sanctify the proposition that density functions are of the form $p(y) = \prod_i f(y_i)$ or at least that $p(y) = S(y)$, where $S$ is some symmetric function of the elements of $y$. These propositions have come to be treated almost as natural laws or at least as rules of the game that no sportsman would question.^6 In fact, of course, experiments where errors cannot be expected to be independent are very common.

These points are not new but if we are to appreciate Fisher's point of view they need to be brought together and illustrated together. For this latter purpose the results of a simple sampling experiment are shown in the table. Two samples of 10 observations from identical populations of the forms indicated were taken and subjected to a t-test (t) and a Mann-Whitney test (MW). The sampling was repeated 1,000 times and the number of results significant at the 5 percent point was recorded. Ideally, this number should be 50 (that is, 5 percent of the total) but it has a standard deviation of about 7 because of sampling errors. More accurate results may be obtained by taking larger samples or by analytical procedures; however, since there is no practical difference between a significance level of say 4 percent and 6 percent, the present investigation suffices for illustration. Autocorrelation between adjacent values was introduced by generating observations from a moving average model of the form $y_t = u_t - \theta u_{t-1}$. In this model the $u_t$ were independently and identically distributed about zero in the forms indicated in the table. Values of $\theta$ were chosen so that $\rho_1$, the first serial correlation, had values of -0.4 and +0.4.

The frequencies shown under NR are those obtained for a nonrandomized test. The frequencies under R are those obtained when the observations were randomly allocated to the two groups.

As is to be expected the significance level of the t-test is affected remarkably little by the drastic changes made in the marginal parent distribution, changes for which the distribution-free test provides insurance. Unfortunately, of course, both tests are equally impaired by error dependence unless randomization is introduced, when they do about equally well. The point is, of course, that it is the act of randomization that is of major importance here, not the introduction of the distribution-free test function.

Frequency in 1,000 Trials of Significance at the 5 Percent Level Using the t-Test (t) and the Mann-Whitney Test (MW) with No Randomization (NR) and Randomization (R)

                                             Parent distribution
                                      Rectangular     Normal     Chi-square(a)
                         rho_1  Test    NR     R     NR     R      NR     R
Independent observations   0.0    t     56    60     54    43      47    59
                                 MW     43    58     45    41      43    44
Autocorrelated successive -0.4    t      5    48      3    55       1    63
observations                     MW      5    43      1    49       2    56
                          +0.4    t    125    59    105    58     114    54
                                 MW    110    46     96    53     101    43

(a) The parent chi-square distribution has four degrees of freedom and is thus highly skewed.

^6 Except in the study of time series.

3.11 From Muck Raking to Group Theory

Eden's potato data served to illustrate the method of analysis of variance but Fisher appears to have had no hand in planning that experiment. The design is not randomized nor blocked and its very deficiencies call for appropriate remedies. When Fisher's friend Gosset saw the paper, he wrote to Fisher [15, Letter No. 29], "The experiment seems to me to be quite badly planned, you should give them a hand in that ....." Fisher later notes Gosset's "suggesting that I should start designing experiments" [15, summary of Letter No. 29]. This he proceeded to do. The iterative process including the design aspect is sketched in Figure B.

[Figure B. Data Analysis and Data Getting in the Process of Scientific Investigation.* Diagram lost in extraction; legible labels include HYPOTHESIS, DEDUCTION, CONSEQUENCES, and ERROR SIGNAL.]

* The experimental design is here shown as a movable window looking onto the true state of nature. Its positioning at each stage is motivated by current beliefs, hopes, and fears.

Between 1919 and 1928 an iterative sequence occurred that went through three main stages, each leading logically to the next via interaction of theory and practice. The analysis of existing records led to the analysis of experimental trials which then led to the design of experimental trials.

There were different but interactive aspects to this development. We can see (i) sequential evolution of the

new methods in response to unfolding realizations of need, (ii) the persuading of practitioners to try the new techniques, and (iii) the changing role of the statistician implied by the development.
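The sampling experiment summarized in the table above can be re-run in outline. What follows is a minimal sketch, not the original procedure, and it rests on several assumptions that the text does not state: the "nonrandomized" allocation is taken to mean that the first 10 of 20 successive observations form one group and the last 10 the other; the Mann-Whitney test uses a normal approximation with continuity correction; the scales of the parent distributions and the function names (`theta_for_rho1`, `count_significant`) are illustrative choices. For the model $y_t = u_t - \theta u_{t-1}$ the first serial correlation is $\rho_1 = -\theta / (1 + \theta^2)$, so $\rho_1 = \pm 0.4$ corresponds to $\theta = \mp 0.5$. The quoted standard deviation of about 7 is the binomial value $\sqrt{1000 \times 0.05 \times 0.95} \approx 6.9$.

```python
import math
import random
import statistics

T_CRIT_18 = 2.101  # two-sided 5 percent point of Student's t, 18 degrees of freedom
Z_CRIT = 1.96      # two-sided 5 percent point of the standard normal

# Zero-mean parent distributions for the u_t (the scales are arbitrary choices).
PARENTS = {
    "rectangular": lambda rng: rng.uniform(-0.5, 0.5),
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "chi-square": lambda rng: rng.gammavariate(2.0, 2.0) - 4.0,  # 4 df, centered
}


def theta_for_rho1(rho1):
    """Invertible theta satisfying rho1 = -theta / (1 + theta**2), for |rho1| < 0.5."""
    if rho1 == 0:
        return 0.0
    return (-1.0 + math.sqrt(1.0 - 4.0 * rho1 ** 2)) / (2.0 * rho1)


def t_significant(a, b):
    """Pooled two-sample t-test at the 5 percent level (critical value assumes 10 + 10)."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * statistics.variance(a)
              + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return abs(t) > T_CRIT_18


def mw_significant(a, b):
    """Mann-Whitney test at the 5 percent level via a continuity-corrected normal approximation."""
    n1, n2 = len(a), len(b)
    u = sum(1 for x in a for y in b if x > y)  # ties have probability zero here
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (abs(u - n1 * n2 / 2) - 0.5) / sd > Z_CRIT


def count_significant(parent, rho1, randomize, trials=1000, seed=0):
    """Count significant results in `trials` repetitions for one cell pair of the table."""
    rng = random.Random(seed)
    theta = theta_for_rho1(rho1)
    draw = PARENTS[parent]
    hits = {"t": 0, "MW": 0}
    for _ in range(trials):
        u = [draw(rng) for _ in range(21)]
        y = [u[i] - theta * u[i - 1] for i in range(1, 21)]  # 20 successive values
        if randomize:
            rng.shuffle(y)  # random allocation to the two groups
        a, b = y[:10], y[10:]  # assumed NR allocation: first 10 versus last 10
        hits["t"] += t_significant(a, b)
        hits["MW"] += mw_significant(a, b)
    return hits


if __name__ == "__main__":
    for rho1 in (0.0, -0.4, 0.4):
        for parent in PARENTS:
            nr = count_significant(parent, rho1, randomize=False)
            r = count_significant(parent, rho1, randomize=True)
            print(f"rho1={rho1:+.1f} {parent:<12} NR={nr} R={r}")
```

Counts from such a run will differ from the published table because of sampling error and the assumptions above, but the qualitative pattern (too few rejections for negative autocorrelation and too many for positive autocorrelation without randomization, roughly 50 with it) is what the sketch is meant to exhibit.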


3.12 Evolution of the New Methods

Fisher's attempts to analyze experimental data quickly led him to the essential principles of experimental design. The need for randomization to achieve validity; for replication to provide a valid estimate of error; for blocking extraneous sources of disturbance to achieve accuracy. Blocking in two directions simultaneously (by randomized Latin squares) was particularly appealing. Fisher would have been brought to see the enormous advantages of the unorthodox factorial arrangements as an economical way to assess the effects of variables in combination by, for example, his early attempts to impart meaning to the differences associated with the 13 differently manured Broadbalk plots to which fertilizers had been applied in a highly nonbalanced manner. However, while the efficiency of factorial designs could be increased by packing in more factors, larger factorial designs required bigger blocks and hence produced greater inhomogeneity in the experimental material, giving larger experimental errors. The answer which quickly followed was confounding.

3.13 Persuading Practitioners

The blessings of feedback were only available if scientists would try out his designs but, not surprisingly, Fisher did not at first have an easy job selling his revolutionary ideas at Rothamsted. Indeed, the first design run to his specification (in 1924) was not done at Rothamsted at all. It was a randomized Latin Square design run at Bagshot for the Forestry Commission who had asked for and acted on his advice. But between 1924 and 1929, as described in "Studies in Crop Variation IV and VI" [5, 6], there is a rapid development of ideas which were quickly put into practice. It is clear that Eden had become a convinced disciple during this period and it is refreshing, but alas unfamiliar, to see publication of new designs simultaneous with data obtained from their successful use. By the end of this period data were being collected from designs of great accuracy and beauty which included all of Fisher's ideas.

In spite of all this in 1926 the Director of Rothamsted, Sir John Russell, wrote a paper [16] in the Journal of the Ministry of Agriculture about agricultural experimentation which almost totally ignored the ideas of his protégé. However, in the next issue [9] in a paper notable for its brevity and clarity, Fisher outlined his philosophy on the subject, setting his boss to rights and anyone else who would listen.

3.14 A New Heritage for Statisticians

The original concept that the research station needed a statistician was revolutionary, but certainly the role initially envisaged in 1919 for the statistician was a passive and possibly even a temporary one. Russell wondered if anything more could be extracted from the existing records.

Fisher's work gradually made clear that the statistician's job did not begin when all the work was over; it began long before it was started. The statistician was not a curator of dusty relics. His responsibility to the scientific team was that of the architect with the crucial job of ensuring that the investigational structure of a brand new experiment was sound and economical. The latter role is much more fun than the former. He himself relished it and we should thank him for bequeathing it to us. It calls for abilities of a high order. It requires among other things the wit to comprehend complicated scientific problems, the patience to listen, the penetration to ask the right questions, and the wisdom to see what is, and what is not, important. Finally, it requires from the statistician the courage to wager his reputation each time an experiment is run. For the time must come when all the data are in and conclusions must be drawn; at this stage oversights in the design, if they exist, will become embarrassingly evident.

4. PERILS OF THE OPEN LOOP

We have seen some examples of the extraordinary progress made in our science over a brief ten-year period as a result of feedback between theory and practice. Feedback requires a closed loop. By contrast, when for any reason the loop is open, progress stops. Such stagnation can occur with the (normally iterative) cycle stuck either in the practice mode or in the theory mode.

4.1 Cookbookery and Mathematistry

The maladies which result may be called cookbookery and mathematistry. The symptoms of the former are a tendency to force all problems into the molds of one or two routine techniques, insufficient thought being given to the real objectives of the investigation or to the relevance of the assumptions implied by the imposed methods. Concerning the latter, Fisher's apparently bivalent attitude towards mathematicians has often been remarked and has been the cause of perplexity and annoyance. He himself was an artist in the use of mathematics and emphasized the importance of mathematical training for statisticians: the more mathematics known the greater the potential to be a good statistician. Why then did he sometimes seem to refer so slightingly to mathematicians? The answer I think is that his real target was "mathematistry." It is to make the distinction that the word is introduced here.

Mathematistry is characterized by development of theory for theory's sake, which since it seldom touches down with practice, has a tendency to redefine the problem rather than solve it. Typically, there has once been a statistical problem with scientific relevance but this has long since been lost sight of. Fisher felt strongly about


this last point, particularly when he himself had produced the originally useful idea. I have cited already the development of distribution-free tests which, he felt, misused ideas initiated in Chapter III of his book Design of Experiments [11, p. 48]. Another annoyance was the generalization to what he felt was absurdity of his applications of group theory and combinatorial mathematics to experimental design.

The penalty for scientific irrelevance is, of course, that the statistician's work is ignored by the scientific community. But this does not come to the notice of a statistician who has no contact with that community. It is sometimes alleged that there is no actual harm in mathematistry. A group of people can be kept quite happy, playing with a problem that may once have had relevance and proposing solutions never to be exposed to the dangerous test of usefulness. They enjoy reading papers to each other at meetings and they are usually quite inoffensive. But we must surely regret that valuable talents are wasted at a period in history when they could be put to good use.

Furthermore, there is unhappy evidence that mathematistry is not harmless. In such areas as sociology, psychology, education, and even, I sadly say, engineering, investigators who are not themselves statisticians sometimes take mathematistry seriously. Overawed by what they do not understand, they mistakenly distrust their own common sense and adopt inappropriate procedures devised by mathematicians with no scientific experience.

An even more serious consequence of mathematistry concerns the training of statisticians. We have recently been passing through a period where nothing very much was expected of the statistician. A great deal of research money was available and one had the curious situation where the highest objective of the teacher of statistics was to produce a student who would be another teacher of statistics. It was thus possible for successive generations of teachers to be produced with no practical knowledge of the subject whatever. Although statistics departments in universities are now commonplace there continues to be a severe shortage of statisticians competent to deal with real problems. But such are needed.

4.2 Meeting the Challenge

As long ago as 1950, Fisher, delivering the Eddington Memorial Lecture at Cambridge, said [10, p. 22]

For the future, so far as we can see it, it appears to be unquestionable that the activity of the human race will provide the major factor in the environment of almost every evolving organism. Whether they act consciously or unconsciously human initiative and human choice have become the major channels of creative activity on this planet. Inadequately prepared we unquestionably are for the new responsibilities, which with the rapid extension of human control over the productive resources of the world have been, as it were, suddenly thrust upon us.

One by one, the various crises which the world faces become more obvious and the need for hard facts on which to take sensible action becomes inescapable. The demand for competent statisticians who can tease out the facts by analyzing data, planning investigations, and developing the necessary new theory and techniques will, therefore, continue to increase.

4.3 Training of Statisticians

Competent statisticians will be front line troops in our war for survival, but how do we get them? I think there is now a wide readiness to agree that what we want are neither mere theorem provers nor mere users of a cookbook. A proper balance of theory and practice is needed and, most important, statisticians must learn how to be good scientists; a talent which has to be acquired by experience and example. To quote Fisher once more, in 1952, in a letter concerning a proposed Statistics Center to be set up in Scotland he said: "I have no hesitation in advising that such a centre as you have under discussion should plan to integrate teaching closely with project work in which practical experience can be gained by those who are capable of learning from it; in contradistinction to the ruinous process of segregating the keener minds into a completely sterile atmosphere" [3]. It is encouraging that at more and more statistical centers such advice is now being taken seriously.

5. CONCLUSION

We may ask of Fisher

Was he an applied statistician?
Was he a mathematical statistician?
Was he a data analyst?
Was he a designer of investigations?

It is surely because he was all of these that he was much more than the sum of the parts. He provides an example we can seek to follow.

[Received May 1976.]

REFERENCES

[1] Box, G.E.P. and Tiao, G.C., Bayesian Inference in Statistical Analysis, Reading, Mass.: Addison-Wesley Publishing Co., 1973.
[2] Box, G.E.P. and Youle, P.V., "The Exploration and Exploitation of Response Surfaces: An Example of the Link Between the Fitted Surface and the Basic Mechanism of the System," Biometrics, 11, No. 3 (1955), 287-323.
[3] Box, Joan Fisher, Fisher, The Life of a Scientist, New York: John Wiley & Sons, Inc. In press.
[4] Cochran, W.G., "Experiments for Nonlinear Functions," Journal of the American Statistical Association, 68, No. 344 (1973), 771-81.
[5] Eden, T. and Fisher, R.A., "Studies in Crop Variation IV. The Experimental Determination of the Value of Top Dressings with Cereals," Journal of Agricultural Science, 17 (1927), 548-62.
[6] Eden, T. and Fisher, R.A., "Studies in Crop Variation VI. Experiments on the Response of the Potato to Potash and Nitrogen," Journal of Agricultural Science, 19 (1929), 201-13.
[7] Fisher, R.A., "Studies in Crop Variation I. An Examination of the Yield of Dressed Grain from Broadbalk," Journal of Agricultural Science, 11 (1921), 107-35.

[8] Fisher, R.A., "Studies in Crop Variation III. The Influence of Rainfall on the Yield of Wheat at Rothamsted," Philosophical Transactions of the Royal Society of London, B, No. 213 (1924), 89-142.
[9] Fisher, R.A., "The Arrangement of Field Experiments," Journal of the Ministry of Agriculture, 33 (1926), 503-13.
[10] Fisher, R.A., "Creative Aspects of Natural Law," The Eddington Memorial Lecture, Cambridge, Eng.: Cambridge University Press, 1950.
[11] Fisher, R.A., The Design of Experiments, (8th Ed.), Edinburgh: Oliver & Boyd, Ltd., 1966.
[12] Fisher, R.A., Statistical Methods for Research Workers, (14th Ed.), Edinburgh: Oliver & Boyd, Ltd., 1970.
[13] Fisher, R.A. and MacKenzie, W.A., "Studies in Crop Variation II. The Manurial Response of Different Potato Varieties," Journal of Agricultural Science, 13 (1923), 311-320.
[14] Fisher, R.A. and Tippett, L.H.C., "Limiting Forms of the Frequency Distribution of the Largest and Smallest Member of a Sample," Proceedings of the Cambridge Philosophical Society, 24 (1928), 180-90.
[15] Gosset, W.S., Letters from W.S. Gosset to R.A. Fisher, 1915-1936, with summaries by R.A. Fisher and a foreword by L. McMullen, (2nd Ed.), Privately circulated, 1970.
[16] Russell, E. John, "Field Experiments: How They Are Made and What They Are," Journal of the Ministry of Agriculture, 32 (1926), 989-1001.
[17] Russell, E. John, A History of Agricultural Research in Great Britain, London: Allyn and Unwin, Ltd., 1966.
[18] Wold, H., "Nonlinear Estimation by Iterative Least Squares Procedures," in F.N. David, ed., Research Papers in Statistics, Festschrift for J. Neyman, New York: John Wiley & Sons, Inc., 1966, 411-44.
