The Probable Consequences of Violating the Normality Assumption in Parametric Statistical Analysis Author(s): Raymond Hubbard

Raymond Hubbard, Department of Economics, University of Nebraska-Lincoln
Summary. Confronted with non-normally distributed data, many geographers prefer to adopt nonparametric methods when analyzing the results of their research. The present paper argues that, provided the departures from normality are not severe, conventional parametric statistical models may still be frequently utilized.

A number of articles appearing in recent editions of this journal have been specifically concerned with the need to transform raw geographical data into a form approximating the normal distribution prior to statistical manipulations (Clark, 1973; Pringle, 1976; Roff, 1977). The rationale underlying this approach is, of course, predicated on the need to satisfy the assumptions of standard parametric statistical models. One of these assumptions requires that scores be normally distributed, and it has been argued that in the absence of such conditions distribution-free or nonparametric statistical procedures should be adopted in favour of parametric methods. However, it is also recognized that current nonparametric tests typically do not possess the power, versatility, and extensions to multivariate situations which characterize their parametric counterparts (Labovitz, 1970), so that the tendency to embrace nonparametric techniques is frequently undesirable and often unwarranted (Nunnally, 1967). Generally, therefore, geographers prefer to employ parametric statistics even when their data do not fulfil the necessary conditions (Pringle, 1976). In view of this preference it becomes a matter of vital importance to ascertain the extent to which the familiar parametric models are 'robust', at least with respect to the assumption of normality.
It is the purpose of this note to summarize some of the likely consequences of violating the assumption of normality when utilizing parametric statistical methods and to extend and amplify some of the views on this subject expressed in previous contributions to Area. The need for a paper of this nature is revealed by the fact that commonly employed textbooks dealing with quantitative aspects of geographical analysis (for example, Cole and King, 1968; Gregory, 1963; Hammond and McCullagh, 1974; King, 1969; Yeates, 1968) do not devote much attention to this important issue.

Testing for departures from normality

The techniques generally employed to determine whether a distribution approximates normality are varied. One can, for example, work with graphical methods such as probit, rankit, and fractile diagrams (Bliss, 1967), or with a chi-square goodness of fit test as outlined in many standard statistics texts (Hays, 1963). Alternatively, as Snedecor and Cochran (1967) demonstrate, the investigator may choose to ascertain the degree to which the normality assumption is violated by utilizing the information contained in higher moments about the sample mean such as skewness and kurtosis. It should be noted that these methods by no means exhaust those available for determining the normality of a distribution. In many applied contexts, however, researchers tend to emphasize the role of chi-square and log likelihood approaches. Yet Bliss (1967, p. 140) points out that these are '. . . indicative but hardly a critical criterion, although [they are] sometimes the only convenient test for agreement with the normal distribution'.
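
The moment-based check mentioned above can be illustrated with a short sketch. It is not taken from Snedecor and Cochran (1967); it simply computes the third and fourth central moments about the sample mean and standardizes them. The function name `sample_moments` and the two small data sets are hypothetical, chosen only for illustration, and the simple divisor-n form of the moments is an assumption (other texts apply small-sample corrections).

```python
def sample_moments(values):
    """Return (skewness, excess kurtosis) from central moments about the sample mean.

    Assumption: plain divisor-n moments, with no small-sample bias correction.
    """
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5          # 0 for a symmetric distribution
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for the normal curve
    return skewness, excess_kurtosis


if __name__ == "__main__":
    symmetric = [1, 2, 3, 4, 5]        # hypothetical, symmetric about 3
    right_skewed = [1, 1, 2, 3, 10]    # hypothetical, long right tail
    print("symmetric:   ", sample_moments(symmetric))
    print("right-skewed:", sample_moments(right_skewed))
```

Values of both quantities near zero are consistent with normality; how far from zero counts as 'significant' is exactly the matter of judgment discussed in the text.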
Comments of this nature illustrate the dilemma faced by the researcher. In many situations it will be possible for the individual to employ an appropriate transformation such that the resultant distribution conforms to the normal curve (Haggett, et al., 1977). Yet in other instances this state of affairs may be only partially realized at best. In short, the absence of unequivocally acceptable tests of, and corrections for, deviations from normality allows the scholar an element of discretion. Guided partly by convention, partly by his own experience and judgment, and partly by the character of the particular problem at hand, the researcher himself must decide whether the distortion of normality is 'significant'. Thus, a major element adding to the controversy surrounding the 'normality issue' undoubtedly emanates from the lack of undisputed research guidelines (Clark, 1973). The remainder of this paper consequently focuses attention upon the probable outcomes of violating the normality assumption when employing certain common parametric models.

Violating the normality assumption in regression and correlation

It is important to understand that in the classical linear regression model the assumption of normality applies only to the conditional distributions of the endogenous or dependent variable, and to the stochastic disturbance term. The model may be formally stated as follows:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik} + U_i$$

where for each observation on the $X_i$s (exogenous or independent variables), the disturbance term $U_i$ is a (conditionally) normally distributed random variable. Other properties of the disturbance term (such as non-autoregression and homoscedasticity) are adequately discussed by Poole and O'Farrell (1971).
Because the error term is assumed to be normally distributed, it follows directly that observations on the dependent variable must also be normally distributed, since these are themselves linear combinations of the errors. But the $X_i$s are not necessarily normally distributed variates, and it is as well to dispense with this somewhat common fallacy at the outset. Indeed, it is postulated in the above classical model that the $X_i$s are not even random variables, but instead are regarded as fixed or non-stochastic elements.¹ This implies that, while the $X_i$s may obviously attain different numerical values, they are nevertheless considered as constants when calculating their mathematical expectation.

The statistician or econometrician typically rationalizes the normality assumption by positing that the stochastic disturbance term incorporates a large number of individually unimportant random effects, all of which influence the behaviour of the endogenous variable $Y_i$ in only a minor fashion. The researcher then appeals to the implications of the Central Limit Theorem to justify the fact that such a process would induce a normal distribution in the error term. In addition, it is worth emphasizing that the Central Limit Theorem only applies in those instances where the random effects are mutually independent (Murphy, 1973). To the individual working with small sample sizes, however, this rationalization may be of little consolation. Nevertheless, research has revealed that even for sample sizes of between ten and twenty observations, certain distributions may still approximate the normal curve (Koutsoyiannis, 1973). Finally, if the researcher suspects that the relationships the model purports to represent have been distorted by mis-specification, he should examine the distribution of the residuals, that is, the surrogate values for the unobservable $U_i$s. This may help in suggesting possible transformations of the dependent variable, or the need to include additional variables while deleting others.
Suppose, for the sake of expository purposes, that the normality assumption has indeed been violated. What, one may inquire, are the potential repercussions likely to entail? Fortunately, it transpires that the relaxation of the normality assumption does only minimal damage to the properties of the ordinary least squares (OLS) estimators of $\beta_0$ and the $\beta_k$. As Maddala (1977) observes, the small sample properties of the OLS estimators still retain their BLUE (best linear unbiased estimator) characteristics, for these are independent of the form of the probability distribution evidenced by the $U_i$s. That is, they are unbiased and continue to possess the minimum variance among the class of linear unbiased estimators. However, following Kmenta (1971) and Zeckhauser and Thompson (1970), it should be noted that without specification of the distributional form of the $U_i$s these estimators are no longer efficient owing to the fact that the Cramer-Rao lower bound of their variances cannot be determined. Furthermore, they are no longer maximum likelihood estimators since legitimate employment of the likelihood function rests critically upon the normality assumption.

Considering now the large sample (asymptotic) properties of these estimators, it can be demonstrated that they are both consistent and asymptotically unbiased, for when many observations are employed one can invoke the Central Limit Theorem to illustrate that the sampling distributions of $\hat{\beta}_0$ and the $\hat{\beta}_k$s approach normality as the sample size tends toward infinity.² Consequently, the asymptotic properties of the least squares estimates are equivalent to maximum likelihood estimates; that is, they display the same mean and variance. It is therefore apparent that even when the assumption of normality is violated, the least squares estimates preserve most of their desirable properties. While it is worth reiterating that the assumption of normality is unnecessary for obtaining estimates of the coefficients, it is a requirement when conducting significance tests and establishing confidence intervals (Maddala, 1977). Yet even under these circumstances, provided that the disturbance term does not depart drastically from the normal curve, the usual t and F tests of statistical inference may be safely utilized to yield reasonably accurate approximations (Kmenta, 1971).
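
The unbiasedness of the OLS estimators under non-normal disturbances can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not a procedure drawn from any of the authors cited: the model has a single fixed regressor, the disturbances are exponential variates shifted to have mean zero (hence strongly skewed), and the parameter values, sample size, number of replications, seed, and function names are all hypothetical.

```python
import random


def ols_fit(x, y):
    """Ordinary least squares (intercept, slope) for a single regressor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_slope_under_skewed_errors(beta0=2.0, beta1=0.5, n=20, reps=2000, seed=1):
    """Average OLS slope over repeated samples with skewed, mean-zero errors."""
    rng = random.Random(seed)
    x = [float(i) for i in range(n)]  # fixed, non-stochastic regressor values
    slopes = []
    for _ in range(reps):
        # Exponential(1) draws minus 1: mean zero, variance one, heavily right-skewed.
        u = [rng.expovariate(1.0) - 1.0 for _ in range(n)]
        y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
        slopes.append(ols_fit(x, y)[1])
    return sum(slopes) / reps


if __name__ == "__main__":
    print("true slope: 0.5, mean estimated slope:", mean_slope_under_skewed_errors())
```

The average of the estimated slopes should lie close to the true value even though the disturbances are far from normal, which is the sense in which unbiasedness does not depend on the distributional form of the errors. The sketch says nothing about the accuracy of t and F tests, which is a separate question.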

Lest it be imagined that econometricians exercise a monopoly on research concerning violations of the normality premise, it should be mentioned that a considerable amount of related research has also been undertaken by, among others, sociologists and psychologists. With respect to the point-biserial correlation, for example, Labovitz (1967) employed simulation methods to demonstrate the robustness of this coefficient to markedly-skewed distributional forms. In subsequent research the same author (Labovitz, 1970) showed that even when numbers were randomly and non-randomly assigned to rank-ordered data (subject to an order preserving monotonic transformation) to produce a number of logarithmic, exponential, and higher-order curves, the Pearson product moment correlation coefficient should still be utilized as a superior measure of association to ordinal statistics. Again, other studies recommending the use of the Pearson r when the normality assumption is not dramatically violated are readily available (Borgatta, 1968; Nefzger and Drasgow, 1957; Nunnally, 1967).
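
The idea of assigning numbers to rank-ordered data through order preserving monotonic transformations can be sketched as follows. This is only an illustration in the spirit of the exercise described above, not a reproduction of Labovitz's (1970) design: the ten ranks, the two transformations (squaring and taking logarithms), and the helper `pearson_r` are assumptions made for the example.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical rank-ordered scores and two order preserving rescorings of them.
ranks = list(range(1, 11))
squared = [r ** 2 for r in ranks]        # higher-order curve
logged = [math.log(r) for r in ranks]    # logarithmic curve

if __name__ == "__main__":
    print("r(ranks, squared):", pearson_r(ranks, squared))
    print("r(ranks, logged): ", pearson_r(ranks, logged))
```

In this small case the Pearson r between the original ranks and either rescoring remains high, which suggests why the choice among monotonic scoring systems may matter less than is sometimes supposed. More extreme transformations, such as steep exponential curves, would lower the coefficient.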

In geography also, there is some evidence attesting to the robustness of the normality assumption in empirical work. Thus, for example, in the analysis of a large array of socio-economic and demographic variables, Moser and Scott (1961) discovered that a log transformation of raw data did not materially affect the interpretation of a principal components solution. Similarly, Roff's (1977) investigation of 44 variables from the British 1971 census (in which approximately one-half were non-normally distributed) concluded that transformations did not significantly alter the correlation matrix or principal components result. Again, Pringle's (1976) analysis of 64 census variables for County Durham demonstrated that, while transformations are not universally appropriate in that they easily produce a normal distribution in the data set, they typically attenuate the incidence of non-normality to the extent that nonparametric statistical models should not be adopted unnecessarily.

It would be negligent, however, to leave the reader with the impression that one can dismiss the applicability of nonparametric methods, for almost all of the studies just mentioned have been contested to some degree. Mayer (1970), for example, has criticized Labovitz's research by arguing that the latter should have made more explicit the particular distributional form (normal, t, F) to which variables were assigned.³ Similarly, the evidence provided by Nefzger and Drasgow (1957) substantiating the notion that normality is a needless assumption when computing the Pearson r has not gone unchallenged.⁴ Binder (1959), in particular, has noted that much of the confusion which exists among social scientists when employing correlation methods can be attributed to a failure to fully understand the assumptions and implications of the different mathematical models which constitute the basis of these approaches; namely, the bivariate normal distribution, the linear regression, and the randomization models. Experimental studies exist which demonstrate that the Pearson r is not a particularly robust statistic (Kowalski, 1972; Norris and Hjelm, 1961), and the use of alternative correlation coefficients should perhaps be encouraged (Carroll, 1961; Tinkler, 1972). Finally, within a geographical context, the findings of Clark (1973) that even small departures from normality significantly affect principal components structures and scores are contrary to those of both Moser and Scott (1961) and Roff (1977).

Additional comments concerning inferential statistics

It has already been noted that provided the departures from normality are not excessive, the formulation of confidence intervals and significance tests for regression and correlation coefficients will not on average be unduly affected in an adverse fashion. As may be anticipated, further evidence can be easily adduced indicating similar findings with respect to the generalized use of inferential statistics. This is welcome information to the researcher who wishes to extend the findings for his sample to the relevant underlying population.

In both the F and t tests it is assumed that the error term is distributed normally.⁵ Nevertheless, available evidence would seem to indicate that these tests are virtually immune to violations concerning this premise (Boneau, 1960). Even when observations have been drawn from logarithmic, logistic, exponential, double exponential, J-shaped, and rectangular populations, the F and t tests can usually still be applied with confidence. Empirical research supporting this viewpoint is plentiful with respect to both two-tailed t tests (Baker, et al., 1966; Boneau, 1960; Rider, 1929) and F tests (Cochran, 1947; Lindquist, 1953; Pearson, 1931). Confronted with information of this nature, Gaito (1959, p. 116) has commented that 'the mathematical and empirical data indicate that tests of homogeneity of means by analysis of variance (and two-tailed t tests) are relatively insensitive to both deviations from normality and from homogeneity of variance'.

At this point a number of caveats are appropriate. First, counter-examples may be cited indicating that tests of the homogeneity of several variances are indeed sensitive to departures from normality (Haggett, et al., 1977; Kendall and Stuart, 1967). Secondly, as Anderson (1961) points out, the lack of normality will almost certainly make its presence felt if one-tailed t tests are employed, and experiments involve grossly disparate sample sizes. Yet even under these circumstances the researcher may be able to exercise a degree of control over the sample sizes utilized in a study. Thirdly, in the event that deviations from normality are marked, the individual can either apply suitable transformations to his variables and/or select a lower probability level for significance testing, for example, 0.025 instead of 0.05, 0.005 instead of 0.01, and so on (Gaito, 1959).
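
The claim that the two-tailed t test is relatively insensitive to non-normality can be examined with a simple sampling experiment. The sketch below is an assumption-laden illustration rather than a replication of Boneau (1960) or any other study cited: it draws two samples of 15 from the same population, applies the pooled-variance t test at the nominal 0.05 level, and records how often the (true) null hypothesis is rejected. The sample size, number of replications, seed, and function names are hypothetical; the critical value 2.048 is the tabulated two-tailed 5 per cent point of Student's t with 28 degrees of freedom.

```python
import math
import random

N_PER_GROUP = 15     # equal sample sizes, giving 15 + 15 - 2 = 28 degrees of freedom
T_CRIT_DF28 = 2.048  # tabulated two-tailed 0.05 critical value of t for 28 df


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((v - mean_a) ** 2 for v in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))


def draw_normal(rng):
    return rng.gauss(0.0, 1.0)


def draw_exponential(rng):
    return rng.expovariate(1.0)


def draw_uniform(rng):  # the 'rectangular' population
    return rng.uniform(0.0, 1.0)


def rejection_rate(draw, reps=4000, seed=1):
    """Share of replications in which a true null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = [draw(rng) for _ in range(N_PER_GROUP)]
        b = [draw(rng) for _ in range(N_PER_GROUP)]
        if abs(pooled_t(a, b)) > T_CRIT_DF28:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    for name, draw in [("normal", draw_normal),
                       ("exponential", draw_exponential),
                       ("rectangular", draw_uniform)]:
        print(name, rejection_rate(draw))
```

If the test is robust, the rejection rates for the exponential and rectangular populations should stay near the nominal 0.05 obtained for the normal case. The experiment uses equal sample sizes and a two-tailed test; the caveats above concerning one-tailed tests and grossly disparate sample sizes are not addressed by it.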

The aim of this paper has been to indicate that even in those instances where the researcher's data fail to satisfy the normality assumption commonly demanded of parametric statistical models, the alternative course of action should not necessarily be the immediate adoption of some nonparametric technique. This should not be construed to imply that the latter are dispensable or redundant, for several of the studies cited in this paper clearly suggest that the dust has not yet completely settled on the 'normality issue'. Nevertheless, it is maintained that because of the remarkable robustness displayed by parametric procedures, their continued use in the face of reasonable violations of assumptions will generally not result in severely erroneous inferences. As usual, however, the individual should be judicious in his approach, and endeavour to satisfy the normality requirement in so far as this is possible.

Notes
1. When the $X_i$s are assumed to be stochastic (referred to by Poole and O'Farrell (1971) as the random X model), the critical factor is whether or not they are independent of the error term. When they are not independent, ordinary least squares estimates are biased.
2. It is instructive to note that in many practical research situations the benefits of asymptotic properties accrue with relatively small sample sizes, for example, 100 observations or less.
3. For further communications concerning Labovitz's findings see Soc. Forces (June 1968) and Am. Sociol. Rev. (June 1971).
4. See, for example, the three comments in Am. Psychol. (Sept. 1958) on the Nefzger and Drasgow article.
5. A further assumption requires that the variance associated with the error term of different treatment populations be homoscedastic. But, in so far as non-normality and heterogeneity of variance tend to covary (Bartlett, 1947), it is convenient to treat them simultaneously throughout the remainder of this paper.

References
Anderson, N. H. (1961) 'Scales and statistics: parametric and nonparametric', Psychol. Bull. 58, 305-16




