Copyright © 1974 by The Society for Psychophysiological Research
Vol. 11, No. 6

Methodology

On Stepwise Discriminant Analyses Applied to Physiologic Data

JOHN M. LACHIN¹ AND JOSEPH SCHACHTER

The Biostatistics Center of the Department of Statistics, The George Washington University, and The Research Center in Child Psychiatry of the Pittsburgh Child Guidance Center, and Department of Psychiatry, University of Pittsburgh School of Medicine

ABSTRACT

Recently widespread interest has arisen in the use of stepwise discriminant function techniques with evoked physiologic data. The purpose of this presentation is to discuss in non-mathematical terms how such analyses can easily be misused and misinterpreted, pointing out specific pitfalls to be avoided. The various statistical tests of hypotheses used with stepwise discriminant functions, predominantly in the BMD07M computer program, are described and shown to be biased toward yielding significant results even though there may be no real differences between groups. As an empirical demonstration, a series of physiologic data (EKG and EEG) was collected under stimulus and no stimulus conditions. Stepwise analyses consistently produced significant differences under no stimulus conditions whereas nonstepwise procedures did not. Under stimulus conditions, stepwise analyses consistently exaggerated the extent of differences detected by nonstepwise analyses.

DESCRIPTORS: Stepwise procedures, Discriminant analyses, Law of the iterated logarithm, Neonatal physiology.

This report is based on an earlier paper which was presented to the annual meeting of the Society for Psychophysiological Research held during 1972 in Boston. This project was supported by funds from the National Institute of Mental Health (MH 19677) and from the R. K. Mellon, S. M. Scaife, M. Falk Medical Fund, Grant and Buhl Foundations. The authors wish to acknowledge the data analysis assistance of Mr. Frank Wimberly and especially thank the referee for many helpful suggestions.

¹Address requests for reprints to: Dr. John M. Lachin, The Biostatistics Center, 7979 Old Georgetown Rd., Bethesda, Md. 20014.

Discriminant analyses (DAs) have recently been discussed, described, and advocated as a useful statistical method for the analysis of evoked physiologic data (Donchin, 1966, 1969a, 1969b; Donchin, Callaway, & Jones, 1970; Williams, Lachin, & Schachter, 1971). The purpose of this paper is to discuss in non-mathematical terms some issues in the interpretation of stepwise discriminant analyses, pointing out specific pitfalls to be avoided, and to illustrate how misleading "significant" results can easily but falsely be obtained. Excellent discussions of the proper application of discriminant analyses have already been presented (Donchin, 1966, 1969a, 1969b); our main purpose, therefore, is to clarify those conditions under which this technique can and cannot be usefully applied. In doing so, we hope to strengthen rather than detract from the applicability of discriminant analyses to physiologic data.

Let us begin by examining the basic elements of a stepwise DA, and in particular, the program BMD07M of the BMD series of computer programs (Dixon, 1968). Although the theory of discriminant functions and related topics is much broader than that encompassed in this program, this one program is widely considered the single main resource available on discriminant analyses. This is so because the applicability of certain statistical tools often depends more on the availability of a program to perform the computations than on the statistical sophistication of the researcher or his statistical consultant.

When one examines this program carefully it is seen that the computations are divided into two sections: statistical tests of hypotheses, and the actual problem of discriminating between the given populations using linear discriminant functions of the set of observations.² As R. A. Fisher (1936) suspected when he developed the underlying theory, the proportion correctly classified by this procedure depends to a large extent on the mean differences between the populations on the given variables. Although definite proof of this relationship under certain assumptions was not derived until much later (Anderson, 1951), for many years this relationship was logically accepted. It is partly for this reason that the BMD07M and other such programs include tests of related hypotheses along with the development and application of the discriminant functions. Thus, it is not surprising that applications of discriminant analyses to physiologic data have often been interpreted primarily in terms of tests of differences between populations. In the following, therefore, we shall concern ourselves primarily with the application and interpretation of these tests.

²In the following, a detailed description of discriminant functions will not be presented since it is not essential to our argument. For the general theory the reader is referred to Anderson, 1958, Chapter 6, or Rao, 1952, Chapter 7 and following. For the specific approach used in the BMD07M see Rao, 1952, pp. 260-266, 294-316, and 365-374.

Statistical Tests and Discriminant Analysis

Assume that each sample element (e.g., each subject) is measured on p variables which are distributed as multivariate normal within each of k disjoint populations with a common covariance matrix. A test of the basic hypothesis of the equality of the distributions across populations then reduces to a test of the equality of mean vectors, since the covariance matrices are assumed equal. In the BMD07M, the test usually takes the form of a one-way k group multivariate analysis of variance (MANOVA) using the p observations as the dependent variables (Rao, 1952; Anderson, 1958). Since all p variables are included, this shall be referred to as a full model test of equality across groups.
Similarly, a full model or nonstepwise DA would then develop the discriminant functions based on the complete set of variables. Respectively these are analogous to a full model test of regression and a nonstepwise regression analysis under the general linear model.

A stepwise discriminant analysis, however, goes on to determine which subset of these initial p variables is quasi-optimal according to some criterion. Although a stepwise procedure will not necessarily find the absolute optimal subset, it will usually determine a satisfactory subset. At the heart of any stepwise procedure is some statistic which allows one to measure the relative contribution of each variable under consideration to the final result.³ Although other criteria could have been chosen, the BMD07M uses F statistics derived from a series of one-way analyses of covariance. These are a separate F to remove for each of the variables already selected and a separate F to enter for each of the remaining variables not yet selected. These tests measure the relative contribution of each variable to the overall differences between the groups, and they are analogous, therefore, to a test of significance of each coefficient in the general linear regression model. Exact tests of discriminant coefficients, however, are not used because their standard errors are tractable only in the case of two populations (cf. Bartlett, 1947). The overall F in a stepwise DA then tests the hypothesis that the vectors of the means of just the selected variables are equal across the populations. This overall F, therefore, is a reduced model test of equality across groups and is analogous to a reduced model test for regression in a stepwise regression analysis.

³See Draper and Smith, 1966; Dixon, 1968; or Lachin, 1973 for a more detailed description of stepwise procedures in general, and as applied respectively to regression analysis, discriminant functions, and simple Bayes decision rules.
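The selection rule described above can be sketched for the first step only. Before any variable has been entered there are no covariates, so the F to enter for each variable reduces to the ordinary one-way analysis of variance F. The sketch below computes that F for every candidate and enters the variable with the largest value if it exceeds a threshold. The function names, the toy data, and the default threshold of 1.0 are illustrative assumptions; this is not the BMD07M code, and the later steps (which require analyses of covariance on the variables already entered) are not shown.

```python
from statistics import fmean


def one_way_f(groups):
    """One-way ANOVA F statistic; `groups` is a list of samples, one per group."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [fmean(g) for g in groups]
    grand_mean = fmean([x for g in groups for x in g])
    # Between-groups and within-groups sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def first_step(data, f_to_enter=1.0):
    """Return (name of variable entered or None, dict of all F-to-enter values).

    `data` maps a variable name to its list of per-group samples.
    """
    f_values = {name: one_way_f(groups) for name, groups in data.items()}
    best = max(f_values, key=f_values.get)
    if f_values[best] < f_to_enter:
        return None, f_values
    return best, f_values


# Hypothetical toy data: three variables, two groups of three observations each.
TOY_DATA = {
    "x1": [[1, 2, 3], [2, 3, 4]],
    "x2": [[1, 2, 3], [1, 2, 3]],
    "x3": [[1, 2, 3], [5, 6, 7]],
}

if __name__ == "__main__":
    entered, f_values = first_step(TOY_DATA)
    print(entered, f_values)
```

Because the variable entered is the one with the largest F among all candidates, any test later computed on the selected variables alone has already profited from that selection.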
Finally, this program goes on to perform multiple comparisons between each pair of group means. Note that a stepwise DA does not include a full model overall test unless by chance all of the p variables are selected.

Although all of the above statistics are intended to complement the development, application, and interpretation of the discriminant functions, the reduced model statistics corresponding to the test of the overall hypothesis of equality across groups and the test of multiple comparisons are worthless. The reason for this is conceptually related to Khintchine's law of the iterated logarithm (cf. Feller, 1957), which has been shown to be applicable to the class of fixed sample size, as opposed to sequential, tests (Robbins, 1952; Anscombe, 1954). With respect to Student's t or the standard F test, if the experimenter successively increases the size of his sample from each of two populations and computes the test statistic after each successive block of subjects is obtained, then the probability that the null hypothesis will be rejected approaches unity as the sample size increases. In other words, if one drew two samples successively from the same population, i.e., such that there is no real difference between the two groups, then eventually a run of sample elements would occur with certainty such that the null hypothesis would be rejected when performing the test after each successive block of elements was obtained. In effect, therefore, the significance or alpha level of the test would be unity rather than the pre-defined value (.05 or .01, etc.).
The reason for this is that in any fixed sample size test such as the t or F, the sample size is not a random variable but some a priori specified value. If the sample size is to be treated as a random variable along with the sample means and variance, then a sequential test of the hypothesis would be appropriate (cf. Wald, 1947; Cornfield, 1966).

Analogously, in the reduced model overall F of a stepwise DA a new random variable has also been added, the number of variables selected. Clearly, given sufficient random variation between the groups on a few of a relatively large pool of variables, one will almost always reject the null hypothesis of no group differences if a stepwise analysis is used, even if there are no real group differences. In effect, therefore, if the primary source of interpretation is the reduced model overall F, then the results are worthless since the test statistic does not account for the added random variable, the number of variables selected, and since the test procedure is not sequential with respect to this added random variable.

To illustrate the severity of this problem, let us assume that the initial p variables were statistically independent and p separate univariate tests (F's to enter) were conducted with significance level $\alpha$. Each test then fails to reject a true null hypothesis with probability $1 - \alpha$, so the probability of having at least one significant difference among the p variables, say $\alpha_T$, can easily be shown to be $\alpha_T = 1 - (1 - \alpha)^p$. This quantity, therefore, is the probability that a stepwise procedure applied to such data would yield a seemingly significant difference after just the first step. For $\alpha = .05$ and only 10 initial independent variables, $\alpha_T$ is an amazingly high .40. In this over-simplified situation, therefore, a stepwise DA would lead to a false rejection of a true null hypothesis in at least 40% of such cases after just the first step.
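The first-step calculation above is easy to check numerically. The following is a minimal sketch: `alpha_total` evaluates the closed form 1 − (1 − α)^p, and `simulate_first_step` estimates the same probability by Monte Carlo for two groups drawn from one population. As a simplifying assumption, a two-sided z test with known unit variance stands in for the F to enter, so that no F tables are needed; the function names, group size, trial count, and seed are all illustrative choices.

```python
import random
from statistics import NormalDist, fmean


def alpha_total(alpha, p):
    """Probability that at least one of p independent level-alpha tests rejects a true null."""
    return 1.0 - (1.0 - alpha) ** p


def simulate_first_step(p=10, n=20, alpha=0.05, trials=2000, seed=1):
    """Monte Carlo estimate of the first-step false rejection rate.

    Each trial draws p independent variables for two groups of size n from the
    SAME standard normal population and counts the trial as a false rejection
    if any single variable is 'significant' by a two-sided z test.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    std_err = (2.0 / n) ** 0.5  # s.e. of a difference of two means, unit variance
    false_rejections = 0
    for _ in range(trials):
        for _ in range(p):
            group_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
            group_b = [rng.gauss(0.0, 1.0) for _ in range(n)]
            z = (fmean(group_a) - fmean(group_b)) / std_err
            if abs(z) > z_crit:
                false_rejections += 1
                break  # one significant variable is enough to enter at step one
    return false_rejections / trials


if __name__ == "__main__":
    print(round(alpha_total(0.05, 10), 3))  # closed form, about 0.401
    print(simulate_first_step())            # simulated rate, close to the closed form
```

Raising `p` in either function shows how quickly the nominal .05 level is lost as the pool of candidate variables grows.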
Application to Empirical Data

As an empirical demonstration of the meaninglessness of tests of significance in stepwise DAs, both stepwise and nonstepwise DAs were applied to heart period (HP) and electroencephalographic (EEG) data collected from each of 3 neonates. For each of 2 Ss (S 5, test session 1 (5-1), and S 6, session 1 (6-1)), 400 epochs or time slices of resting HP and EEG data were collected over a 3.5 hr period of testing during which no stimuli of any kind were presented. For both Ss, therefore, a series of control or random data was collected. During 2 separate test sessions of the third S (S 10, test sessions 1 (10-1) and 2 (10-2)) similar sets of HP and EEG data were obtained. During each of these latter sessions, however, one of four intensities of a punctate auditory stimulus was presented on a random basis during each of the 400 epochs in the testing period. For this S, therefore, 2 sessions of stimulus associated data were collected. Each of the 4 test sessions was then treated as a separate experiment.

Experimental Procedure

Each newborn was tested on the second post-natal day (test session 1, 36±12 hrs old), or on the third post-natal day (test session 2, 60±12 hrs old), or on both days. The infant was fed, swaddled in an air splint at 7 cm H2O air pressure, and placed supine with side-to-side head movements restricted by a padded device. Beckman electrodes were applied for recording EKG, horizontal and vertical eye movements, and submental and biceps EMG. Electrodes for recording EEG were placed on the interaural line to the right and left of the midline, a distance equal to that of the inner canthus of the eye from the midline. Each of these was referred to a midline electrode, 2 cm anterior to the interaural line, for bipolar recording. The EEG was digitized on-line with a PDP-12 computer every 5 msec from 50 msec prior to stimulus presentation to 1205 msec following stimulus presentation.
Simultaneously, an R-peak detector (Kerr, Tobin, Milkman, Djoleto, Khachaturian, Williams, Schachter, & Lachin, 1971) was employed to record the EKG (R-R intervals) starting 10 sec prior to and ending 15 sec after each stimulus presentation. Again note that during the no-stimulus sessions (5-1 and 6-1) the data was collected as though one of the four stimuli had been presented in each epoch.

Stimuli (presented to S 10 only) were 0.83 msec pulses (single wave, base frequency 1200 Hz), which were presented synchronously with an R-wave of the EKG. Stimuli of peak intensities of 60, 70, 88, and 110 dB were presented against a background pink noise of 55 dB. The four intensities of clicks were randomized within blocks of 4 epochs throughout the 3.5 hr testing period; a total of 100 presentations were made at each intensity. Interstimulus intervals were varied from 25 to 55 sec according to a predetermined pseudo-randomized format.

Within each test session or experiment, 4 groups of HP and EEG data were then constructed, denoted as control (60 dB), stim-1 (70 dB), stim-2 (88 dB), and stim-3 (110 dB), each consisting of 100 epochs of data. For the stimulus experiments (10-1 and 10-2), the epochs were grouped according to which of the four stimuli had been presented during each epoch. For the no-stimulus experiments (5-1 and 6-1) the epochs were similarly grouped as though one of the four stimuli had been presented during each epoch. The HP and EEG data were then analyzed across the 4 groups of epochs within each experiment separately using the stepwise and the nonstepwise DA procedures to determine whether these groups could be distinguished on the basis of the HP or EEG data.

Dependent Variables

For analyses based on HP data, a 6-element vector of observations was constructed from each epoch of HP data. In each epoch the R peak at which the stimulus was (or would have been) presented was identified. The differences in msec between the HP immediately preceding "stimulus" presentation and each of 6 post-stimulus HPs were calculated, utilizing the first 5 plus the 22nd post-stimulus HPs. Epochs were scored identically, regardless of whether a stimulus had actually been presented. The 6-element HP vector was thus obtained for each of the 100 epochs in each of the 4 groups of epochs within each experiment.

For analyses based on EEG data, an 18-element vector of observations was constructed from the post-stimulus voltages in each epoch. The differences between the average prestimulus voltage (of 5 points from 45 to 25 msec prior to stimulus presentation) and the voltage at each of 18 post-stimulus times were calculated. These 18 times were selected on the basis of reviewing the average EEG response curves of Ss used in previous studies to determine times corresponding to peaks and troughs in the evoked potential. They ranged from 40 msec to 1010 msec following stimulus presentation.

Analysis

These multivariate HP and EEG observations were then used as the basis for separate discriminant analyses within each test session or experiment. Since each analysis was to be performed "within subjects," however, it was first necessary to determine that the successive observations were indeed serially independent. Within each session, therefore, circular auto-regression coefficients (Anderson, 1971) were calculated separately for each of the 6 HP and each of the 18 EEG vector elements by treating the 400 epochs within that session as one time series. Of the 96 coefficients so calculated (4 sessions × 24 vector elements), only 1 was significant at the .01 level. Of these, 70 were on the order of |r| < .05, and thus the assumption of independence appears to have been satisfied.

These multivariate HP and EEG observations were then used as the basis for separate 4-way stepwise and nonstepwise DAs within each test session. In stepwise analyses, F values of 1.0 were used for addition and deletion. In each analysis, the overall F test of equality across groups was then examined to determine whether a significant difference had been detected, and the results are presented in Table 1.

Of the stepwise analyses (reduced model F's) in the no-stimulus experiments (sessions 5-1 and 6-1), the HP data for one experiment (6-1) yielded a so-called significant difference at the .05 level, whereas the stepwise analyses of EEG were highly significant in both experiments. The fact remains, however, that this is all control data and any differences between the four groups are due to purely random or spurious effects. In three of these four cases, therefore, the null hypothesis was falsely rejected under the stepwise analysis. On the other hand, nonstepwise analyses (full-model F's) of the same two experiments failed to yield any significant differences. As an illustration of the degree of random variation which existed without any specific stimuli, the average cardiac (HP) and resting potential (EEG) curves for S 6 are presented in Figs. 1 and 2 respectively.

In the stimulus experiments (sessions 10-1 and 10-2), both stepwise and nonstepwise analyses produced significant results. In all cases, however, the stepwise analyses yielded F statistics with much smaller probability levels than their nonstepwise counterparts (Table 1).

TABLE 1. Comparison of overall test results for stepwise and nonstepwise discriminant analyses. [Table body illegible in this copy. Rows: stepwise and nonstepwise analyses of the HP and EEG data for the no-stimulus sessions 5-1 and 6-1 and the stimulus sessions 10-1 and 10-2.]

Fig. 1. Average cardiac (HP) curve for S 6. [No stimuli administered; subject 6, session 1. Vertical axis: average heart duration in msec (4 msec/unit); horizontal axis: time in seconds. Curves: control, stim-1, stim-2, stim-3.]

Fig. 2. Average resting potential (EEG) curve for S 6. [No stimuli administered; subject 6, session 1. Vertical axis: average evoked EEG; horizontal axis: time in milliseconds, 0 to 1000. Curves: control, stim-1, stim-2, stim-3.]

Discussion

The above results clearly indicate, therefore, that in a stepwise analysis, the level of tests of significance (.05 in this case) simply was not maintained. Thus, even with relatively large sample sizes (N = 100 per group) and relatively small numbers of variables (6 and 18), sufficient random variation existed to lead to spuriously "significant" results when stepwise procedures were applied. Given no real group differences, it should be obvious, therefore, that such spurious "significant" differences can easily be obtained when the source of inference is the reduced model overall F test based on a set of variables selected by a stepwise procedure. When the null hypothesis is true, this bias toward yielding significant results increases as the number of variables increases, and decreases asymptotically as the sample size increases. Further, when real differences do exist, the reduced model overall F will exaggerate the extent of such differences.

It is recommended, therefore, that a full model test of significance, preferably a simple one-way MANOVA, should always be performed prior to applying a stepwise discriminant analysis.
Until satisfactory methods are developed for testing hypotheses in conjunction with stepwise analyses, a full-model test is the only appropriate procedure. It should be noted, however, that the a priori selection of dependent variables is equally appropriate. For example, if the data is in the form of evoked responses, then the researcher could select data points or intervals along the curve which he feels a priori are most representative of the degree of responsivity.

Performing a full model test on a large number of physiologic variables, however, may present an additional problem in that often one or more of the variables may be a near linear function of the others, in which case one of the matrix determinants used in computing the test statistic will approach zero. To correct for this, one could first run the stepwise procedure with an F for addition and an F for deletion of 0.0, and a tolerance level on the order of .05 rather than the usual .0001. If any variable is not selected, it will then be due solely to a near linear dependence, and not its lack of contribution to discrimination. The overall F test could then be computed using the m ≤ p variables which pass the tolerance test, and this would be a valid full-model test.

In any case, once it is demonstrated that the populations do indeed differ on the given variables, it is then appropriate to determine which subset of the initial variables maximizes one's ability to distinguish between the given populations. In doing so, the stepwise discriminant analysis is indeed a most powerful tool.

In summary, therefore, one must not confuse the problem of inference (MANOVA) with that of discrimination (DA). Although one follows logically from the other, a reduced model approach to inference is simply not appropriate since the operating characteristics of a reduced model test are never known.
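The tolerance screen mentioned above for near linear dependence can be illustrated in the simplest case of one candidate variable and one variable already entered. The sketch assumes the common definition of tolerance as 1 − R², where R² comes from regressing the candidate on the entered variables; with a single entered variable this is 1 − r², which is also the determinant of the 2 × 2 correlation matrix. Whether BMD07M computes exactly this quantity is an assumption here, and the function names and data are hypothetical.

```python
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def tolerance(candidate, entered):
    """Assumed tolerance: 1 - r^2 between the candidate and the one entered variable."""
    return 1.0 - pearson_r(candidate, entered) ** 2


def passes_tolerance(candidate, entered, level):
    """True if the candidate is not too nearly a linear function of the entered variable."""
    return tolerance(candidate, entered) >= level


# Hypothetical data: NEAR_COPY is almost exactly 2 * ENTERED; UNRELATED is not.
ENTERED = [1, 2, 3, 4, 5]
NEAR_COPY = [2.1, 3.9, 6.0, 8.1, 9.9]
UNRELATED = [2, 5, 1, 4, 3]

if __name__ == "__main__":
    for name, var in (("near copy", NEAR_COPY), ("unrelated", UNRELATED)):
        print(name, tolerance(var, ENTERED),
              passes_tolerance(var, ENTERED, 0.0001),
              passes_tolerance(var, ENTERED, 0.05))
```

In this toy case the near copy passes a tolerance level of .0001 but is screened out at .05, while the unrelated variable passes both, which is the behavior the stricter level is meant to produce.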
Finally, it should be noted that the above results and recommendations are applicable to stepwise analyses in general. The reduced model test of multiple regression is equally likely to supply spuriously significant results, and recently work has been initiated toward developing more appropriate statistical tests for use in stepwise multiple regression analyses (Pope & Webster, 1972). Hopefully this work will eventually lead to satisfactory reduced model statistical tests for not only stepwise regression analysis, but stepwise DAs as well.

REFERENCES

Anderson, T. W. Classification by multivariate analysis. Psychometrika, 1951, 16, 31-50.
Anderson, T. W. An introduction to multivariate statistical analysis. New York: Wiley, 1958.
Anderson, T. W. The statistical analysis of time series. New York: Wiley, 1971.
Anscombe, F. J. Fixed-sample-size analysis of sequential observations. Biometrics, 1954, 10, 89-100.
Bartlett, M. S. Multivariate analysis. Journal of the Royal Statistical Society, 1947, 9 (Suppl.), 176-197.
Cornfield, J. A Bayesian test of some classical hypotheses, with applications to sequential clinical trials. Journal of the American Statistical Association, 1966, 61, 577-594.
Dixon, W. J. BMD computer programs. Los Angeles: University of California Press, 1968.
Donchin, E. A multivariate approach to the analysis of average evoked potentials. IEEE Transactions on Biomedical Engineering, 1966, BME-13, 131-139.
Donchin, E. Data analysis techniques in average evoked potential research. In E. Donchin & D. B. Lindsley (Eds.), Average evoked potentials: Methods, results, and evaluations. Washington, D.C.: NASA SP-191, Govt. Printing Office, 1969. Pp. 199-217. (a)
Donchin, E. Discriminant analysis in average evoked response studies: The study of single trial data. Electroencephalography & Clinical Neurophysiology, 1969, 27, 311-318. (b)
Donchin, E., Callaway, E., & Jones, R. Auditory evoked potential variability in schizophrenia. II. The application of discriminant analysis. Electroencephalography & Clinical Neurophysiology, 1970, 29, 429-440.
Draper, N., & Smith, H. Applied regression analysis. New York: Wiley, 1966.
Feller, W. An introduction to probability theory and its applications. Vol. 1. New York: Wiley, 1957.
Fisher, R. A. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 1936, 7, 179-188.
Kerr, J., Tobin, M., Milkman, N., Djoleto, Khachaturian, Williams, T., Schachter, J., & Lachin, J. A PDP-12 system for the on-line acquisition of heart rate data. Maynard, Mass.: Digital Equipment, 1971.
Lachin, J. M. On a stepwise procedure for two population Bayes decision rules using discrete variables. Biometrics, 1973, 29, 551-564.
Pope, P. T., & Webster, J. T. The use of an F-statistic in stepwise regression procedures. Technometrics, 1972, 14, 327-340.
Rao, C. R. Advanced statistical methods in biometric research. New York: Wiley, 1952.
Robbins, H. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 1952, 58, 527-535.
Wald, A. Sequential analysis. New York: Wiley, 1947.
Williams, T., Lachin, J. M., & Schachter, J. Multiphasic heart rate response to auditory clicks in neonates: Sources of variance in response to repetitive stimuli. Psychophysiology, 1971, 8, 227. (Abstract)
