Source: Area, Vol. 10, No. 5 (1978), pp. 393-398. Published by: Blackwell Publishing on behalf of The Royal Geographical Society (with the Institute of British Geographers). Stable URL: http://www.jstor.org/stable/20001404. Accessed: 08/05/2012 11:36
The probable consequences of violating the normality assumption in parametric statistical analysis
Raymond Hubbard, Department of Economics, University of Nebraska- Lincoln
Summary. Confronted with non-normally distributed data, many geographers prefer to adopt nonparametric methods when analysing the results of their research. The present paper argues that, provided the departures from normality are not severe, conventional parametric statistical models may still frequently be utilized.
A number of articles appearing in recent editions of this journal have been specifically concerned with the need to transform raw geographical data into a form approximating the normal distribution prior to statistical manipulations (Clark, 1973; Pringle, 1976; Roff, 1977). The rationale underlying this approach is, of course, predicated on the need to satisfy the assumptions of standard parametric methods, among them that the data be normally distributed. Against the adoption of such conventions it has been argued that current nonparametric tests typically do not possess the power, versatility, and extensions to multivariate situations which characterize their parametric counterparts (Labovitz, 1970), so that the tendency to embrace nonparametric techniques is frequently undesirable and often unwarranted (Nunnally, 1967). Generally, therefore, geographers prefer to employ parametric statistics even when their data do not fulfil the necessary conditions (Pringle, 1976). In view of this preference it becomes a matter of vital importance to ascertain the extent to which the familiar parametric models are 'robust', at least with respect to the assumption of normality.
It is the purpose of this note to summarize some of the likely consequences of violating the normality assumption.
the sample mean, such as skewness and kurtosis. It should be noted that these methods by no means exhaust those available for determining the normality of a distribution; nevertheless, '[they are] sometimes the only convenient test for agreement with the normal distribution'.
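Such moment-based checks can be sketched in a few lines of modern code (a hedged illustration only; the sample data and all numbers are invented for the example, not drawn from the paper):

```python
import random
import statistics

def skewness(xs):
    """Third standardized moment; roughly zero for a normal sample."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

def excess_kurtosis(xs):
    """Fourth standardized moment minus 3; roughly zero for a normal sample."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 4 for x in xs) / (len(xs) * s ** 4) - 3

random.seed(1)
normal_sample = [random.gauss(0, 1) for _ in range(5000)]
skewed_sample = [random.expovariate(1.0) for _ in range(5000)]

# An exponential sample is markedly skewed (population skewness 2,
# excess kurtosis 6), whereas the normal sample gives values near zero.
print(round(skewness(normal_sample), 2), round(skewness(skewed_sample), 2))
```

Values far from zero on either measure are the signal, in this approach, that the normality assumption deserves closer scrutiny.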
Comments of this nature illustrate the dilemma faced by the researcher. In many situations it will be possible for the individual to employ appropriate tests of, and corrections for, deviations from normality, which allows the scholar an element of discretion. Guided partly by convention, partly by his own experience and judgment, and partly by the character of the particular problem at hand, the researcher himself must decide whether the distortion of normality is 'significant'. Thus, a major element adding to the controversy surrounding the 'normality issue' undoubtedly emanates from the lack of undisputed research guidelines (Clark, 1973). The remainder of this paper consequently focuses attention upon the probable outcomes of violating the normality assumption when employing certain common parametric models.

Violating the normality assumption in regression and correlation
It is important to understand that in the classical linear regression model the
assumption of normality applies only to the conditional distributions of the endogenous or dependent variable, and to the stochastic disturbance term. The
model may be formally stated as follows:
Yi = β0 + β1Xi1 + β2Xi2 + … + βkXik + Ui
where for each observation on the Xis (exogenous or independent variables), the disturbance term Ui is a (conditionally) normally distributed random variable. Other properties of the disturbance term (such as non-autoregression and homoscedasticity) are adequately discussed by Poole and O'Farrell (1971).
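The model as stated can be illustrated with a short simulation (an illustrative sketch only; the coefficient values, regressor ranges, and sample size are arbitrary assumptions, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
beta = np.array([2.0, 0.5, -1.5])  # beta0, beta1, beta2 (arbitrary values)

# Exogenous variables, treated as fixed in repeated samples.
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])

# Conditionally normal disturbance term Ui.
U = rng.normal(0.0, 1.0, n)
Y = X @ beta + U

# Ordinary least squares estimates of the beta coefficients.
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.round(beta_hat, 2))
```

With a moderate sample the estimates fall close to the coefficients used to generate the data, as the classical theory predicts.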
Because the error term is assumed to be normally distributed, it follows directly that the conditional distributions of the Yis are normal as well. The Xis, meanwhile, are regarded as fixed or non-stochastic elements.¹ This implies that, while the Xis may obviously attain different numerical values, they are nevertheless considered as constants when calculating their mathematical expectation.
1973). To the individual working with small sample sizes, however, this rationalization may be of little consolation. Nevertheless, certain research has revealed that even for sample sizes of between ten and twenty observations, distributions may still approximate the normal curve (Koutsoyiannis, 1973). Finally, if the researcher suspects that the relationships the model purports to represent have been distorted by mis-specification, he should examine the distribution of the residuals, that is, the surrogate values for the unobservable Uis. This may help in suggesting possible transformations of the dependent variable, or the need to include additional variables while deleting others.
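The residual-inspection procedure just described can be sketched as follows (a modern illustration only; the multiplicative data-generating process and all numerical settings are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 3, 300)
# A multiplicative process, mis-specified if fitted as a linear model.
y = np.exp(1.0 + 0.8 * x + rng.normal(0.0, 0.3, 300))

def ols_residuals(dep, regressor):
    """Residuals from a simple OLS fit of dep on a constant and regressor."""
    X = np.column_stack([np.ones_like(regressor), regressor])
    b, *_ = np.linalg.lstsq(X, dep, rcond=None)
    return dep - X @ b

def skew(r):
    """Third standardized moment of the residual vector."""
    return float(np.mean(((r - r.mean()) / r.std()) ** 3))

# Residuals from the raw fit are right-skewed, hinting at a transformation;
# after taking logs of the dependent variable they are roughly symmetric.
s_raw = skew(ols_residuals(y, x))
s_log = skew(ols_residuals(np.log(y), x))
print(round(s_raw, 2), round(s_log, 2))
```

The skewed residuals from the raw fit are exactly the kind of symptom that suggests a transformation of the dependent variable.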
Suppose, for the sake of expository purposes, that the normality assumption has indeed been violated. What, one may inquire, are the potential repercussions likely to entail? Fortunately, it transpires that the relaxation of the normality assumption does only minimal damage to the properties of the ordinary least squares (OLS) estimators of β0 and the βks. As Maddala (1977) observes, the small sample properties of the OLS estimators still retain their BLUE (best linear unbiased estimator) characteristics, for these are independent of the form of the probability distribution evidenced by the Uis. That is, they are unbiased and continue to possess the minimum variance among the class of linear unbiased
estimators. However, following Kmenta (1971) and Zeckhauser and Thompson (1970), it should be noted that without specification of the distributional form of the Uis these estimators are no longer efficient, owing to the fact that the Cramér-Rao lower bound of their variances cannot be determined. Furthermore, they are no longer maximum likelihood estimators, since legitimate employment of the likelihood function rests critically upon the normality assumption. Considering now the large sample (asymptotic) properties of these estimators,
it can be demonstrated that they are both consistent and asymptotically unbiased, for when many observations are employed one can invoke the Central Limit Theorem to illustrate that the sampling distributions of β0 and the βks approach normality as the sample size tends toward infinity.² Consequently, the asymptotic properties of the least squares estimates are equivalent to maximum likelihood estimates; that is, they display the same mean and variance. It is therefore apparent that even when the assumption of normality is violated, the least squares estimates preserve most of their desirable properties. While it is
worth reiterating that the assumption of normality is unnecessary for obtaining estimates of the coefficients, it is a requirement when conducting significance tests and establishing confidence intervals (Maddala, 1977). Yet even under these circumstances, provided that the disturbance term does not depart drastically from the normal curve, the usual t and F tests of statistical inference may be
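The retention of unbiasedness under a markedly skewed disturbance term can be checked by a simple Monte Carlo experiment (an illustrative sketch; the centred exponential disturbances and all numerical settings are the present writer's assumptions, not taken from the sources cited):

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps, true_slope = 30, 2000, 1.5
x = rng.uniform(0, 10, n)            # regressors held fixed across samples
X = np.column_stack([np.ones(n), x])

slopes = np.empty(reps)
for r in range(reps):
    # Centred exponential disturbances: mean zero but markedly skewed.
    u = rng.exponential(1.0, n) - 1.0
    y = 0.5 + true_slope * x + u
    slopes[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]

# The average estimate sits close to the true slope: unbiasedness survives
# the violation of normality, as the text argues.
print(round(float(slopes.mean()), 3))
```

Even with only thirty observations per sample, the sampling distribution of the slope is centred on the true value and is already close to symmetric, in line with the Central Limit Theorem argument above.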
considerable amount of related research has also been undertaken by, among others, sociologists and psychologists. With respect to the point-biserial correlation, for example, Labovitz (1967) employed simulation methods to demonstrate the robustness of this coefficient to markedly-skewed distributional forms. In subsequent research the same author (Labovitz, 1970) showed that even when numbers were randomly and non-randomly assigned to rank-ordered data (subject to an order-preserving monotonic transformation) to produce a number of logarithmic, exponential, and higher-order curves, the Pearson product-moment correlation coefficient should still be utilized as a superior measure of
affect the interpretation of a principal components solution. Similarly, Roff's (1977) investigation of 44 variables from the British 1971 census (in which approximately one-half were non-normally distributed) concluded that transformations did not significantly alter the correlation matrix or principal components result. Again, Pringle's (1976) analysis of 64 census variables for County Durham demonstrated that, while transformations are not universally appropriate in that they do not always produce a normal distribution in the data set, they typically attenuate the incidence of non-normality.
have made more explicit the particular distributional form (normal, t, F) to which variables were assigned.³ Similarly, the evidence provided by Nefzger and Drasgow (1957) substantiating the notion that normality is a needless assumption
fully understand the assumptions and implications of the different mathematical models which constitute the basis of these approaches; namely, the bivariate normal distribution, the linear regression, and the randomization models.
Experimental studies exist which demonstrate that the Pearson r is not a particularly robust statistic (Kowalski, 1972; Norris and Hjelm, 1961), and the use of alternative correlation coefficients should perhaps be encouraged (Carroll, 1961; Tinkler, 1972). Finally, within a geographical context, the findings of Clark (1973) that even small departures from normality significantly affect principal components structures and scores are contrary to those of both Moser and Scott (1961) and Roff (1977).
tests can usually still be applied with confidence. Empirical support for this viewpoint is plentiful with respect to both two-tailed t tests (Baker et al., 1966; Boneau, 1960; Rider, 1929) and F tests (Cochran, 1947; Lindquist, 1953; Pearson, 1931). Confronted with information of this nature, Gaito (1959, p. 116) has commented that 'the mathematical and empirical data indicate that tests of homogeneity of means by analysis of variance (and two-tailed t tests) are
and/or select a lower probability level for significance testing, for example, 0·025 instead of 0·05, 0·005 instead of 0·01, and so on (Gaito, 1959).
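The behaviour of the two-tailed t test under a markedly non-normal parent population can be examined empirically along these lines (a sketch in the spirit of Boneau's (1960) simulations; the exponential population and all numerical settings are illustrative assumptions of the present writer):

```python
import math
import random
import statistics

def t_stat(sample, mu0):
    """One-sample t statistic for H0: population mean equals mu0."""
    n = len(sample)
    return (statistics.fmean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))

random.seed(5)
reps, n, crit = 4000, 25, 2.064      # two-tailed 5% critical value, 24 d.f.
rejections = 0
for _ in range(reps):
    # Exponential population with true mean 1: normality clearly violated.
    sample = [random.expovariate(1.0) for _ in range(n)]
    if abs(t_stat(sample, 1.0)) > crit:
        rejections += 1

# The empirical rejection rate remains of the same order as the nominal 0.05.
print(rejections / reps)
```

That the empirical size does not collapse entirely, while drifting somewhat from the nominal level, is precisely the situation in which Gaito's advice to adopt a more conservative significance level applies.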
Conclusions
The aim of this paper has been to indicate that even in those instances where the researcher's data fail to satisfy the normality assumption commonly demanded of parametric statistical models, the continued use of such models will generally not result in severely erroneous inferences. As usual, however, the individual should be judicious in his approach, and endeavour to satisfy the normality requirement in so far as this is possible.

Notes
1. When the Xis are assumed to be stochastic (referred to by Poole and O'Farrell (1971) as the random X model), the critical factor is whether or not they are independent of the error term. When they are not independent, ordinary least squares estimates are biased.
2. It is instructive to note that in many practical research situations the benefits of asymptotic properties accrue with relatively small sample sizes, for example, 100 observations or less.
3. For further communications concerning Labovitz's findings see Soc. Forces (June 1968) and Am. Sociol. Rev. (June 1971).
4. See, for example, the three comments in Am. Psychol. (Sept. 1958) on the Nefzger and Drasgow article.
5. A further assumption requires that the variance associated with the error term of different treatment populations be homoscedastic. But, in so far as non-normality and heterogeneity of variance tend to covary (Bartlett, 1947), it is convenient to treat them simultaneously throughout the remainder of this paper.

References
Anderson, N. H. (1961) 'Scales and statistics: parametric and nonparametric', Psychol. Bull. 58, 305-16
Baker, B. O., Hardyck, C. D. and Petrinovich, L. F. (1966) 'Weak measurement versus strong statistics: an empirical critique of S. S. Stevens' proscriptions on statistics', Educ. psychol. Measur. 26, 291-309
Bartlett, M. S. (1947) 'The use of transformations', Biometrics 3, 39-52
Binder, A. (1959) 'Considerations of the place of assumptions in correlational analysis', Am. Psychol. 14, 504-10
Bliss, C. I. (1967) Statistics in biology, vol. 1 (New York)
Boneau, C. A. (1960) 'The effects of violations of assumptions underlying the t test', Psychol. Bull. 57, 49-64
Borgatta, E. F. (1968) 'My student, the purist: a lament', Sociol. Q. 9, 29-34
Carroll, J. B. (1961) 'The nature of the data, or how to choose a correlation coefficient', Psychometrika 26, 347-72
Clark, D. (1973) 'Normality, transformation and the principal components solution', Area 5, 110-13
Cochran, W. G. (1947) 'Some consequences when the assumptions for the analysis of variance are not satisfied', Biometrics 3, 22-38
Cole, J. P. and King, C. A. M. (1968) Quantitative geography (New York)
Gaito, J. (1959) 'Nonparametric methods in psychological research', Psychol. Rep. 5, 115-25
Gregory, S. (1963) Statistical methods and the geographer (London)
Haggett, P., Cliff, A. D. and Frey, A. (1977) Locational methods (London)
Hammond, R. and McCullagh, P. S. (1974) Quantitative techniques in geography (Oxford)
Hays, W. L. (1963) Statistics for psychologists (New York)
Kendall, M. G. and Stuart, A. (1967) The advanced theory of statistics, vol. 3 (London)
King, L. J. (1969) Statistical analysis in geography (Englewood Cliffs, N.J.)
Kmenta, J. (1971) Elements of econometrics (New York)
Koutsoyiannis, A. (1973) Theory of econometrics (New York)
Kowalski, C. J. (1972) 'On the effects of non-normality on the distribution of the sample product-moment correlation coefficient', Appl. Statist. 21, 1-12
Labovitz, S. (1967) 'Some observations on measurement and statistics', Soc. Forces 46, 151-60
Labovitz, S. (1970) 'The assignment of numbers to rank order categories', Am. Sociol. Rev. 35, 515-24
Lindquist, E. F. (1953) Design and analysis of experiments in psychology and education (New York)
Maddala, G. S. (1977) Econometrics (New York)
Mayer, L. S. (1970) 'Comment on "the assignment of numbers to rank order categories"', Am. Sociol. Rev. 35, 916-17
Moser, C. A. and Scott, W. (1961) British towns (Edinburgh)
Murphy, J. L. (1973) Introductory econometrics (Homewood, Ill.)
Nefzger, M. D. and Drasgow, J. (1957) 'The needless assumption of normality in Pearson's r', Am. Psychol. 12, 623-5
Norris, R. C. and Hjelm, H. F. (1961) 'Non-normality and product moment correlation', J. Exp. Educ. 29, 261-70
Nunnally, J. C. (1967) Psychometric theory (New York)
Pearson, E. S. (1931) 'The analysis of variance in cases of non-normal variation', Biometrika 23, 114-33
Poole, M. A. and O'Farrell, P. N. (1971) 'The assumptions of the linear regression model', Trans. Inst. Br. Geogr. 52, 145-58
Pringle, D. (1976) 'Normality, transformations, and grid square data', Area 8, 42-5
Rider, P. R. (1929) 'On the distribution of the ratio of mean to standard deviation in small samples from non-normal populations', Biometrika 21, 124-43
Roff, A. (1977) 'The importance of being normal', Area 9, 195-8
Snedecor, G. W. and Cochran, W. G. (1967) Statistical methods (Ames, Iowa)
Tinkler, K. J. (1972) 'The physical interpretation of eigenfunctions of dichotomous matrices', Trans. Inst. Br. Geogr. 55, 17-46
Yeates, M. H. (1968) An introduction to quantitative analysis in economic geography (New York)
Zeckhauser, R. and Thompson, M. (1970) 'Linear regression with non-normal error terms', Rev. Econ. Stat. 52, 280-6