
The Superiority of Simple Alternatives to Regression for Social Science Predictions

Author(s): Jason Dana and Robyn M. Dawes


Source: Journal of Educational and Behavioral Statistics, Vol. 29, No. 3 (Autumn, 2004), pp.
317-331
Published by: American Educational Research Association and American Statistical Association
Stable URL: http://www.jstor.org/stable/3701356
Accessed: 16/01/2015 00:29


This content downloaded from 130.207.93.124 on Fri, 16 Jan 2015 00:29:30 AM


All use subject to JSTOR Terms and Conditions
Journal of Educational and Behavioral Statistics
Fall 2004, Vol. 29, No. 3, pp. 317-331

The Superiority of Simple Alternatives
to Regression for Social Science Predictions

Jason Dana
Robyn M. Dawes
Carnegie Mellon University

Some simple, nonoptimized coefficients (e.g., correlation weights, equal weights) were pitted against regression in extensive prediction competitions. After drawing calibration samples from large supersets of real and synthetic data, the researchers observed which set of sample-derived coefficients made the best predictions when applied back to the superset. When adjusted R from the calibration sample was < .6, correlation weights were typically superior to regression coefficients, even if the sample contained 100 observations per predictor; unit weights were likewise superior to all methods if adjusted R was < .4. Correlation weights were generally the best method. It was concluded that regression is rarely useful for prediction in most social science contexts.

Keywords: forecasting, improper linear models, prediction

As a folk practice, the use of simple combination functions such as unit weights or correlation weights as alternatives to regression coefficients has existed for some time. Although these alternatives yield inferior in-sample predictions, various authors (e.g., Dawes & Corrigan, 1974; Goldberg, 1972) have pointed out that they may be more robust. Specifically, when regression coefficients are applied to a cross-validation sample, the new multiple correlation can be much smaller. This cross-validated loss of efficiency can be pronounced when the calibration sample is small or when prediction error is sizeable, situations not uncommon to research in the social sciences. Correlation and equal weighting schemes are less prone to such losses (a priori chosen equal weights have no expected loss), often resulting in predictions superior to regression in cross-validation samples. Nevertheless, regression coefficients remain ubiquitous, even in applications where their performance may be inferior.

In this article, we pit some simple, "homespun" sets of coefficients against regression to see which make more robust forecasts under varying population characteristics and calibration sample sizes. We employ real datasets that are publicly available as well as several simulated datasets. Results from the former ensure that data have not been invented to favor a particular method, while results from the latter preclude such technical objections as model misspecification or violations of statistical assumptions. In both cases, we begin with large datasets and then randomly draw calibration samples from which the various coefficients are calculated. The coefficients are then applied to the superset in order to determine which produce the largest correlation between actual and predicted values. Based on our results, we make prescriptions about when it is wise to use regression coefficients for prediction, which is almost never in typical social science contexts.

The authors thank Rob Kass, Scott Moser, and John Patty for conversations helpful to this article.
A variety of research and forecasting contexts requires a choice of coefficients that will validate well in future cases. Consider the following examples.

In personality or aptitude measurement, scores are often weighted by their regression coefficients as obtained in some validation sample, or else variables in a battery may be chosen because they yield the largest multiple correlation in the sample. More generally, composite variables are often created through sophisticated statistical techniques rather than equally combining scores the researcher considers important a priori. Once chosen, the weights and/or items in the battery are often not subject to change. Thus, the need for a robust choice of weights is supported.

Research has repeatedly demonstrated the superiority of statistical prediction rules over informal "clinical" judgment in forecasts involving human outcomes (see Grove & Meehl, 1996). These rules are typically simple, linear combinations of a few important variables. Many authors (e.g., Swets, Dawes, & Monahan, 2000) argue strongly for the use of statistical rules in such important predictions as college admissions or parole decisions. To make important decisions with real human costs, a set of weights (not simply variables) that predicts optimally in future cases is crucial.

Several authors have pointed out shortcomings of regression coefficients, particularly when used to predict. Before proceeding, we should distinguish our main point from those that have already been thoroughly addressed. First, the issue of superior validation goes beyond the notion of shrinkage. A shrinkage estimate is the expectation of the population multiple R, which is smaller than the sample value because of the problem of overfitting available data. This estimate, however, assumes no constraint on the values of the population coefficients. The shrunken R prophecy can be much larger than the R that would obtain using "these here weights" derived from the sample. Because we never have true weights, it is these here weights that we need to worry about if we are at all concerned with prescribing a prediction equation for future use. Although least squares coefficients are unbiased, they can be inefficient. Guion (1965) gives a striking example of the difference between a shrunken and a cross-validated R:

Consider, for example, the sad story of McCarty and Fitzpatrick (1956). Using the Wherry-Doolittle technique they selected a battery and estimated a shrunken R of .92. When they cross-validated on a second sample, however, they found the correlation to be -.21! (p. 166)

The relative forecasting efficiency of alternatives to regression coefficients that we explore here has received much less attention, although work by Gigerenzer and his colleagues (Gigerenzer & Todd, 1999) has resurrected the issue by demonstrating that even single-variable prediction strategies may outperform regression coefficients in cross-validation samples.

Second, we address forecasting efficiency independent of the widely discussed problems of selecting variables through regression analysis, such as through stepwise regression (see Armstrong, 1985, for a discussion). Selection problems point out the shortcomings of post hoc analysis and capitalizing on chance, but do not account for the inefficiency we document. All of our public data were preselected, while the simulation data are created from a fully specified linear model. However, we demonstrate that optimal sample coefficients may perform relatively poorly on new data.

Methods

Notation

Without loss of generality, we discuss data in standard-score form that have been codified such that each predictor correlates positively with the criterion. The following notation is used: the sample correlation matrix among predictors is S; in the population it is Σ. The sample vector of correlations between the predictors and the criterion is r and has m (number of predictors) elements, while the population analog is ν. Sample-derived least squares coefficients are a vector b whose elements are b coefficients, the population coefficients are β coefficients, and w is any vector of coefficients. Finally, the sample multiple correlation is R, while the population multiple correlation is ρ, and the population correlation between the criterion and the predicted values resulting from sample-derived coefficients is "validated R."

Alternative Coefficients

In addition to least squares, we consider the following¹:

1. Correlation Weights. Coefficients are each predictor's zero-order correlation with the criterion. Previous authors (Goldberg, 1972; Marks, 1966) have found some support for favorable cross-validation of correlation weights over regression coefficients.

2. Unit Weights. Coefficients are either 1 or -1 on each predictor. Wainer (1976) showed that the loss of predictable variation of a random variable using equal weights rather than ordinary least squares is theoretically quite small (see corrections by Wainer, 1978, and Grove, 2002), and Grove shows that this loss is smaller when predictors are correlated. Unit weights can be chosen post hoc according to observed correlations or a priori according to the researcher's theory. As the former method rarely outperforms correlation weights, and the latter is less redundant, we investigate only the latter method. Note that this choice entails the possibility of weighting a predictor -1, even if all sample-obtained correlations with the criterion are positive. It is also possible in some samples that properly calibrated a priori unit weights will yield negative values of R.
3. Take the Best Weights. Gigerenzer and Todd (1999) describe one-variable decision rules that may cross-validate better than regression, unit, and correlation weights. Take the best involves an ordered search of variables according to their perceived validity, with a rule to stop when a variable is found that adequately discriminates. To our knowledge, no formal criterion has been given for invoking the stopping rule. Thus, we approximate by choosing the predictor most correlated with the criterion in the sample and weighting the others 0. If this strategy is not precisely take the best, it does yield the best single-variable prediction as assessed by the available sample, and hence yields an upper bound on take the best. The inclusion of take the best does not represent a test of whether it is a fast and frugal heuristic. Rather, we approximate it to investigate how general is the situation where one variable outpredicts the coefficients considered here.

Our comparisons involve additive linear models; we justify this focus by noting that little evidence exists to show that forecasts can be improved by using complex or nonlinear models (Armstrong, 1985).
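To make the candidate schemes concrete, here is a minimal Python sketch (our own illustration, not the authors' code) for the two-predictor case; the closed-form least-squares inverse shown is specific to m = 2, and the input values are invented:

```python
# Illustrative sketch: the four coefficient sets compared in the article,
# for standardized data with m = 2 predictors. S is the predictor
# correlation matrix, r the predictor-criterion correlations.

def coefficient_sets(s12, r1, r2):
    """Return the candidate weight vectors for two standardized predictors.

    s12    -- correlation between the two predictors
    r1, r2 -- zero-order correlations of each predictor with the criterion
    """
    det = 1.0 - s12 ** 2
    # Least squares: b = S^{-1} r (closed form for m = 2)
    b = ((r1 - s12 * r2) / det, (r2 - s12 * r1) / det)
    # Correlation weights: each predictor's zero-order correlation
    corr = (r1, r2)
    # Unit weights: +1 on every (positively coded) predictor
    unit = (1.0, 1.0)
    # "Take the best" approximation: weight only the predictor most
    # correlated with the criterion in the sample
    best = (1.0, 0.0) if r1 >= r2 else (0.0, 1.0)
    return {"regression": b, "correlation": corr,
            "unit": unit, "take_the_best": best}

# Hypothetical sample values:
sets = coefficient_sets(s12=0.3, r1=0.5, r2=0.4)
```

The point of the sketch is only that the alternative schemes need no matrix inversion at all; only the least-squares line depends on S.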

Sampling and Validation Procedures

A modification of the traditional cross-validation procedure was used. Rather than creating two samples, we began with large datasets that we assume yield population parameters. We then randomly drew smaller samples from these datasets. From each dataset, we sampled 300 times with replacement after each sample, including 50 each of sizes 5m, 10m, 15m, 20m, 30m, and 50m, where m is the number of predictors. We then applied the procedures described above in each of the samples to obtain the sample-derived coefficients. To determine the directions of the unit weights, the researchers used the majority judgment of a small convenience sample of colleagues (all directions were correct). All coefficients were then applied to the "population" superset (including the sample) to obtain values of validated R, which is computationally simplified by using the formula:

validated R = w'ν / √(w'Σw)

We chose to validate on the superset rather than in another sample of equal size because validation in the latter case depends, in part, on the idiosyncrasies of the new sample. By using the largest set of data possible for validation, we eliminate this secondary source of error. We find this strategy sensible because the researcher should hope to maximize "true" forecasting accuracy. Because this strategy does not remove the sample from the superset, we note that this is favorable to regression coefficients.
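This validation step can be sketched in a few lines (our own illustration; the two-predictor population values below are invented, not drawn from the article's datasets):

```python
import math

# Sketch of the validated-R computation. For standardized data,
# validated R = w'v / sqrt(w'Sigma w): the population correlation between
# the criterion and the composite formed with sample-derived weights w.

def validated_r(w, v, sigma):
    """w: weight vector; v: population predictor-criterion correlations;
    sigma: population predictor correlation matrix (list of lists)."""
    num = sum(wi * vi for wi, vi in zip(w, v))
    quad = sum(w[i] * sigma[i][j] * w[j]
               for i in range(len(w)) for j in range(len(w)))
    return num / math.sqrt(quad)

# Unit weights on two predictors correlating .5 and .4 with the criterion,
# with predictor intercorrelation .3 (hypothetical values):
sigma = [[1.0, 0.3], [0.3, 1.0]]
r_unit = validated_r([1.0, 1.0], [0.5, 0.4], sigma)
```

Because the denominator rescales any weight vector to a unit-variance composite, validated R depends only on the direction of w, which is why the crude alternative schemes can compete with least squares at all.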
Although we report our results in terms of validated R, the qualitative conclusions about the relative performance of the coefficients would remain unchanged if we adopted the metric of minimizing out-of-sample squared errors. For the alternative coefficients considered here, we could simply rescale each by a factor a, given by:

a = w'r / (w'Sw)

If one set of coefficients produces a larger validated R than another, then it will produce smaller squared errors when scaled in this fashion.
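The rescaling is a one-liner; the sketch below (illustrative values, not from the article) computes the factor for unit weights:

```python
# Sketch of the rescaling step: scaling a weight vector w by
# a = w'r / (w'Sw) gives the least-squares composite *along the direction
# of w*, so validated-R comparisons carry over to squared-error comparisons.

def rescale_factor(w, r, S):
    """w: weights; r: predictor-criterion correlations;
    S: predictor correlation matrix (list of lists)."""
    num = sum(wi * ri for wi, ri in zip(w, r))
    den = sum(w[i] * S[i][j] * w[j]
              for i in range(len(w)) for j in range(len(w)))
    return num / den

# Hypothetical two-predictor sample:
S = [[1.0, 0.3], [0.3, 1.0]]
a = rescale_factor([1.0, 1.0], [0.5, 0.4], S)
```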

Descriptions of Public Datasets

Statistical properties of the public datasets are summarized in Table 1. Descriptions of the datasets follow (for scaled variables, the positive direction is given in parentheses).

TABLE 1
Characteristics of the Real Datasets

Dataset   N      k    ρ     ν vector                                      x̄ᵢxⱼᵃ
Abalone   4,177  7    .73   .63 .58 .56 .56 .54 .50 .42                   .89
NFL       3,057  10   .54   .46 .43 .37 .34 .33 .27 .21 .07 .05 .05      .21
ABC       955    5    .35   .32 .20 .06 .04 .02                           .08
NES       1,910  6    .35   .26 .17 .15 .15 .13 .12                       .11
WLS       6,385  5    .20   .13 .11 .10 .10 .10                           .15

ᵃ x̄ᵢxⱼ is the mean strength of correlation among the predictors.

The Abalone dataset includes measurements of abalone used in an original study (Warwick, Sellers, Talbot, Cawthorn, & Ford, 1994). The criterion was the age of abalone predicted from seven measurements: shell weight, diameter, height, length, whole weight, viscera weight, and shucked weight.

The NFL dataset includes all National Football League game outcomes from 1981 to 1995, excluding strike years (Carroll, Gershman, Neft, Thorn, & Silverman, 1999). The criterion, final score difference (home team minus visiting team), was predicted from differences in 10 team statistics: points per game, points allowed per game, passing rating, interceptions thrown, total yards of offense, total yards allowed, percentage of opponents' plays ending in a sack, opponents' average punt return, opponents' average kickoff return, and percentage of plays penalized.

The ABC dataset includes a subset of results from a random polling of households by ABC (ABC News, 2002). The criterion was the answer to the question "How confident are you that Osama Bin Laden will be captured or killed?" on a 4-point Likert scale. Selected predictors were two poll questions, "Do you regularly display an American flag?" answered yes/no (yes) and "How proud are you to be an American?" on a 4-point Likert scale (proud), and three demographic variables (sex (male), age, and level of education).

The NES dataset includes a subset of results of a telephone poll during the 1988 presidential primary elections (Miller, 1999). Favorability rating of the Republican party on a 0-100 scale was predicted from six poll questions. The first question asked whether the nation's economy was better or worse than the year before (better). The next four were statements for which respondents indicated agreement on a Likert scale: "If people were treated more equally in this country, we would have many fewer problems" (disagree); "Changes in lifestyle, like men and women living together without being married, are signs of increasing moral decay" (agree); "We have gone too far in pushing equal rights in this country" (agree); "We should be more tolerant of people who choose to live according to their own moral standards, even if they are very different from our own" (disagree). The final question asked whether the individual was financially better or worse off than the year before (better).

The WLS dataset includes a subset of results from the Wisconsin Longitudinal Survey (1993). The criterion was a measure of occupational prestige in 1992. Predictors were self-rated physical health compared to others of same age and gender (healthier), scores on personality scales measuring depression (less depressed), extraversion (more extraverted), neuroticism (less neurotic), and number of children.

Results for Public Data

The resulting mean validated R for each of the methods in each of the respective datasets is presented in Figure 1 as a function of sample size. To reiterate our point that the relative performance of regression is a separate issue from model selection, we also used least squares from the submodel with the best Mallows's Cp statistic (Mallows, 1973). In this way, it is less likely that extraneous predictors are considered. The least squares results from using just the best Cp models are also included in Figure 1.

[Figure 1 omitted: five panels (Abalone, NFL, ABC, NES, WLS) plotting mean validated R against sample sizes 5m through 50m for regression, best Cp, correlation weights, unit weights, and take the best.]

FIGURE 1. Mean validated R as a function of sample size for each public dataset.

Regularities in the data emerged:

1. The requisite sample size for regression coefficients to be superior depends on prediction error in the model. That is, it depends on the value of ρ in the population from which one is sampling,² a situation helped very little by using Cp analysis. One can see clearly in Figure 1 that validated R as a function of sample size is steeper for regression than it is for correlation weights or take the best, and is, of course, flat for unit weights. To understand the effects of sample size and ρ on estimation better, we examined the 2 × 2 factorial design with the difference in validated R between regression and unit weights as the dependent variable. We found strong effects for both ρ (ω² = .543) and level of sample size (ω² = .18) but not for the interaction (ω² = .004). The much stronger effect of ρ noted here suggests that perhaps too much attention is given to sample size relative to prediction error when choosing coefficients. The insensitivity of unit weights, and to a lesser degree correlation weights, to the increasing efficiency of estimation as the strength of the linear relationship increases is dramatically reflected in the results. In the datasets where ρ was smallest, NES and WLS, regression coefficients were not superior even when calibration samples were sized 50m. In the abalone dataset, where ρ was largest, regression coefficients were always superior. These results speak unfavorably to the use of regression coefficients in most social science applications, where the NFL dataset would represent the unusually high end of predictability. Even in that dataset, regression coefficients were not superior in samples smaller than 30m. Furthermore, any superiority in that data was modest indeed; the validated R resulting from regression coefficients was never as much as .03 larger than that obtained by using correlation weights from any one sample.

2. The small loss of predictable variation caused by using equal weights was empirically supported. Although equal weights were not the most efficient of the alternative methods, they were practically as good. The decrement in explainable variation on validation caused by using unit weights as opposed to the best coefficients rarely exceeded 2% or 3%, except in the abalone dataset. In Wainer's (1976) words, estimating coefficients apparently "don't make no nevermind," or at least it doesn't for most social science data.

3. The success of single variable prediction depends strongly on the characteristics of the data. Gigerenzer and Todd (1999) suggest that single variable predictors are often more robust than such other alternative methods as correlation weights. Our results generally do not support this conclusion. In the NFL and NES datasets, take the best was never the best method, and in the NFL data, only one variable could have predicted better than one of the 300 sets of correlation weights. Not surprisingly, take the best seems most effective when one predictor has a validity approaching ρ; that is, when only one predictor is truly meaningful.
Simulation Procedures

To test the sensitivity of these results to various parameters as well as to possible violations of statistical assumptions, a large simulation was run using synthetic data. Datasets of 5,000 cases were created using a method similar to that first described by Wherry et al. (1965). A j (factors) × m + 1 (variables) matrix of factor loadings was defined for each dataset, which was then premultiplied by a 5,000 (cases) × j (factors) matrix of normal deviates representing latent scores. The resulting matrix was then added to a second matrix whose columns were normal deviates scaled by the "uniqueness" values,

u = √(1 - h²),

(where h² is the sum of squared factor loadings) of the corresponding variables. This method allows for the construction of data with specific intercorrelations. Thus, we defined a criterion variable in linear fashion on a set of predictors.

For each matrix of factor loadings, j was set equal to m. This scheme allows for perfect colinearity among predictors, complete independence, and all cases in between. The m loadings of the criterion variable were the square roots of a draw from a Dirichlet distribution (essentially an m-dimensional extension of the beta distribution). To vary the level of colinearity with one parameter, we set each diagonal element of the m × m submatrix of predictor loadings to the same amount d in [0, 1]. Each of the remaining m - 1 column elements was set to

√((1 - d²)/(m - 1)).


Finally, to introduce error into the model, each of these elements was scaled by a factor t + x, where t was a fixed value chosen from [0, .5], and x was a random value from the uniform interval (0, .5). The expected value of ρ is increasing in t. As d increases, the predictors approach independence.

We employed seven levels of t, eight levels of d, and five levels of m (3, 4, 5, 8, and 10) and created five datasets per combination for a total of 1,400 datasets. Thus, the models sampled broad ranges of m, ρ, and multicolinearity. We included additional sample sizes of 75m and 100m, because of the possibility that in data where ρ was small, regression coefficients would still be outperformed at 50m. Because the data were defined with positive manifold, unit weights were always a vector of 1s.
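A small-scale sketch of this generation scheme follows (our own reimplementation under stated assumptions; the dimensions and loading values are illustrative, not those used in the simulation):

```python
import math
import random

# Sketch of the factor-loading construction: latent factor scores times a
# loadings matrix, plus noise scaled by the uniqueness u = sqrt(1 - h^2),
# where h^2 is each variable's sum of squared loadings.

random.seed(1)

def make_dataset(loadings, n):
    """loadings: one row of j factor loadings per variable (m + 1 rows);
    returns n cases of m + 1 correlated standard-normal variables."""
    j = len(loadings[0])
    data = []
    for _ in range(n):
        f = [random.gauss(0, 1) for _ in range(j)]      # latent scores
        row = []
        for load in loadings:
            h2 = sum(l * l for l in load)               # communality
            u = math.sqrt(max(0.0, 1.0 - h2))           # uniqueness
            row.append(sum(l * fk for l, fk in zip(load, f))
                       + u * random.gauss(0, 1))
        data.append(row)
    return data

# Two predictors plus a criterion on j = 2 factors (hypothetical loadings):
data = make_dataset([[0.8, 0.2], [0.2, 0.8], [0.5, 0.5]], n=200)
```

Shared loadings induce the desired intercorrelations, while the uniqueness term keeps each variable's variance at 1.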
We also included in the simulation analysis a more principled option, ridge regression coefficients, which are computed by adding an amount k to each diagonal element in S. Although we often cannot specify precise priors on regression coefficients for social science data, simple assumptions can be made about the limited ability to predict the criterion. A Bayesian motivation for ridge regression is to start with a prior belief that b weights will be distributed about a mean of zero. As the ridge coefficient vector is shorter in Euclidean distance than b, we can say it "reins in" coefficients toward zero. We employed a data-dependent estimate of k given by Brown (1993) and motivated in the Bayesian manner described above, which in our case can be reduced to:

k = (m - 2)σ̂² / (b'Sb),

where σ̂² is the unbiased estimate of residual variance.
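Computing the ridge constant is straightforward; in the sketch below only the formula k = (m - 2)σ̂²/(b'Sb) is from the text, and all input values are hypothetical:

```python
# Sketch of the data-dependent ridge constant. Given k, the ridge
# coefficients are then obtained by solving (S + k*I) w = r, i.e., by
# adding k to each diagonal element of S before inverting.

def ridge_k(m, sigma2, b, S):
    """m: number of predictors; sigma2: unbiased residual-variance
    estimate; b: least-squares weights; S: sample correlation matrix."""
    bSb = sum(b[i] * S[i][j] * b[j]
              for i in range(m) for j in range(m))
    return (m - 2) * sigma2 / bSb

# Hypothetical three-predictor sample:
S = [[1.0, 0.2, 0.1],
     [0.2, 1.0, 0.3],
     [0.1, 0.3, 1.0]]
k = ridge_k(m=3, sigma2=0.75, b=[0.4, 0.3, 0.2], S=S)
```

Note that k grows with the residual variance and shrinks as b'Sb (roughly, the fitted signal) grows, which matches the Bayesian motivation: the noisier the fit, the harder the coefficients are reined in toward zero.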

Results for Synthetic Data

The strong dependence between the performance of regression coefficients and ρ was confirmed. When ρ was small, a situation not uncommon in social science research, their relative performance was surprisingly poor. Even with "respectable" sample sizes of 100m, b coefficients produced a smaller mean validated R than correlation weights when ρ < .6. When ρ < .4, unit weights were superior to all other methods, even with sample sizes of 100m.³ If the samples were drawn from datasets in which ρ was above .9, however, least squares coefficients were superior when samples were of size 5m. An analysis of the effect sizes for sample size (ω² = .462) and ρ (ω² = .456) on regression coefficients' superiority revealed the importance of ρ relative to sample size to be more modest than found in the public data; the results here suggest they are about equally important in determining whether regression coefficients will outperform simple unit weights.

An orderly pattern was noted: unit weights typically produced the largest values of validated R at the smallest sample sizes. As sizes increased, a "crossover" point occurred at which correlation weights became superior and remained superior thereafter. Finally, when sample sizes were sufficiently large, a second crossover occurred after which least squares and ridge coefficients were superior. The performance of ridge and least squares coefficients was similar, although ridge coefficients weakly dominated the latter in that their mean validated R's were at least as large and often larger at all sample sizes. At what point, if at all, these crossover points occurred depended on the value of ρ in the dataset.

The researcher can estimate ρ, but does not know its value. Here, we report the results conditional on adjusted R using the Wherry-Lord formula (R̂) so that the researcher does not need to rely on an intuitive estimate.⁴ Figure 2 depicts the mean validated R for each set of coefficients as a function of sample size at different values of R̂ for R̂ > .3. Take the best is excluded because it was dominated by at least one other method at each level of R̂. Perhaps surprisingly, the qualitative results were sensitive to neither m nor colinearity among predictors in the data; cross-tabulating the result in Figure 2 with either variable leads to the same conclusions. The fact that these results hold across levels of m suggests that the crossover sample sizes are better expressed as ratios of n to m than as n.

The reader may question how these results depend upon whether sample coefficients were significantly different from equality. A statistical rejection of this omnibus assumption is increasingly likely in the size of the sample in question and decreasingly likely in the residual sums of squares. Such an analysis is thus redundant; where the assumption b₁ = b₂ = ... = bₘ is rejected, regression coefficients are likely to perform much better, but the researcher also knows this from R and n, two values requisite to conducting the test.
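The conditioning statistic is easy to compute. The sketch below uses the common Wherry adjustment; the article's exact "Wherry-Lord" variant may use a slightly different divisor, so treat the formula here as an assumption, and the input values as invented:

```python
import math

# Sketch of the adjusted-R statistic used to condition the results:
# adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - m - 1)  (common Wherry form).

def adjusted_r(R, n, m):
    """Shrink the sample multiple correlation R for n cases, m predictors;
    returns 0.0 when the adjusted R^2 goes negative."""
    adj2 = 1.0 - (1.0 - R ** 2) * (n - 1) / (n - m - 1)
    return math.sqrt(adj2) if adj2 > 0 else 0.0

# A sample R of .5 with 50 cases and 5 predictors shrinks noticeably;
# with 5,000 cases it barely moves:
r_small = adjusted_r(R=0.5, n=50, m=5)
r_large = adjusted_r(R=0.5, n=5000, m=5)
```

Because this estimate needs only R, n, and m, the decision rules below require no information beyond what every regression run already reports.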

[Figure 2 omitted: mean validated R plotted against sample sizes 5m through 100m for regression, ridge regression, correlation weights, and unit weights, at several levels of prediction error.]

FIGURE 2. Mean validated R's as a function of sample size for synthetic datasets, grouped by level of prediction error.

Discussion and Conclusion

A large psychological literature in the tradition of the lens model has attempted to capture the judgment skills/policies of decision makers by regressing their predictions on the cues available to them (see Hammond, 1996). Hammond suggests an intuitive "quasirationality" that humans possess, noting that when multiple fallible indicators are available, a tradeoff between robustness and precision often exists. The forecasting success of the somewhat coarse weighting schemes we consider here might tempt one to interpret our results as supporting quasirationality.

We warn against this interpretation. It is unlikely that people somehow intuitively equal-weight and even far less likely that they correlation-weight. Grove and Meehl (1996), for example, note that simple tallies of diagnostic indicators have outperformed expert judges, even when the indicators are the vague impressions of the judges themselves. It is precisely because human judges do not behave like equal-weighters that we must even make a point about the effectiveness of such a policy. Consider first that one must mentally transform inputs to a common scale, for example, percentile, before adding them can be profitable, which can be difficult given that the distributions of the inputs can be disparate.⁵ Even worse, in order for people to have adapted such a skill, it must have made superior "point predictions" in ecological contexts. Unit and correlation weight predictions, however, may not pass through any data at all; to make precise predictions, weights (or data combinations) must be scaled to best position the intercept. The answer to this problem, given in the sampling and validation procedures section, is sufficiently difficult to derive as to preclude that people do this mentally.
The alternative prediction methods presented in this article quite often defeated ordinary least squares in our competition. To make the results more digestible, it may be helpful to state a simple rule for when regression coefficients should be used for prediction purposes. A great deal of speculation has surrounded the issue of minimum sample size requirements for conducting multiple regression analyses. Numerous rules of thumb abound, often of mysterious origin (listed in Green, 1991). The considerable variability of these rules is probably attributable in part to the ambiguity of the question; sample size prescriptions have been made for a variety of purposes. Green (1991) made suggestions based on power analyses for the hypotheses one wishes to test, for example, submitting with reservation that N > 50 + 8m is desirable if one wants to test the null hypothesis R = 0. Nunnally and Bernstein (1994) made recommendations based on the determination of the unbiased estimate of the population R²:

If there are only 2 or 3 independent variables and no preselection is made among them, 100 or more subjects will provide a multiple correlation with little bias. In that case, if the number of independent variables is 9 or 10, it will be necessary to have from 300 to 400 subjects to prevent substantial bias. (p. 189)

More recently, rules of 40m for a stable cross-validity coefficient and 100m for estimating the population regression equation have been suggested (Osborne, 2000).

We throw our own hat into the ring, arguing from a rather practical perspective. Regression coefficients should not be used for predictions unless error is likely to be extremely small by social science standards or sample sizes will be larger than 100 observations to predictors. In other words, regression coefficients should almost never be used for social science predictions. Simple alternatives will usually yield better predictions. Schmidt (1971) made a similar point about the cross-validated superiority of unit weights:

Since many studies employing regression weights reported in the literature are characterized by sample sizes below the critical values presented in Table 3, it is concluded that many psychologists in applied areas are routinely penalizing themselves by their adherence to [least squares regression]. (p. 710)

His results typically required sample sizes close to 20m for least squares to be superior to unit weights. Our more conservative estimates regarding least squares result in part from the consideration of correlation weight alternatives, which are often superior to unit weights, and in part because we condition our recommendations on ρ.
If the problemis to choose a sample size a priorito obtaineffective regression
coefficients, the researchershould assume a value of p based on his or her theory.
For example, when predicting extratest behaviors from personality measures, ρ > .6 is rare. The results in Figure 2 hold for ρ as well.
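The competing weighting schemes can be sketched for the two-predictor case. This is a toy illustration in standardized form (the function and example data are ours, not the authors' code), and the closed-form betas below hold only for two predictors:

```python
import math

def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def weights(x1, x2, y):
    """Standardized weights for two predictors under three schemes:
    regression (beta) weights from the two-predictor closed form,
    correlation weights (the zero-order validities), and unit
    weights (each validity's sign only)."""
    r1y, r2y, r12 = corr(x1, y), corr(x2, y), corr(x1, x2)
    denom = 1 - r12 ** 2
    return {
        "regression": ((r1y - r12 * r2y) / denom, (r2y - r12 * r1y) / denom),
        "correlation": (r1y, r2y),
        "unit": (math.copysign(1.0, r1y), math.copysign(1.0, r2y)),
    }
```

With correlated predictors and a small sample, the beta weights can even flip sign relative to the validities (the suppression situation discussed in Note 1), whereas correlation and unit weights cannot.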
Numerous pleas to investigators to consider alternatives to regression coefficients seem to have fallen on many sets of deaf ears. We have tried to bolster the argument in two ways. First, we employed sampling from several real social science datasets, which we consider to be generalizable to an audience concerned with practical issues, as well as employing a large number of cleaner synthetic datasets. Second, on practical and empirical grounds we tried to establish simple rules for the minimum number of cases required when using regression coefficients for prediction. These rules, which are based on actual analysis rather than "guesstimates," require no more information than is available when regression is run on sample data. We hope that these efforts will help improve prediction in the social sciences.

Notes
1. One of us (RMD) had the idea of retaining only those off-diagonal elements of S that could change the sign of a regression coefficient (i.e., those pertinent to a suppression). This strategy, which we call vanishing covariances, was also examined. However, because our data are not characterized by any strong suppressions, we do not include its results. It usually (but not always) reduced to correlation weighting and, thus, usually yielded the same accuracy (but sometimes worse).
2. One simple intuition behind the increased efficiency of regression as error decreases is to imagine the scatterplot of the linear combinations (Ŷ) on the criterion. If all of the points are near the best-fit line, then no matter what group of them is sampled, the fit line can hardly change.
3. Unit weights were relatively better in the simulated data, in part, because all predictors were actually important. In the public data, the authors' judgment selected variables that may or may not have been important, a damaging situation when all coefficients are set equal.
4. Of course, the validation of weights depends on ρ. Because small R values from larger samples are indicative that ρ is actually small, the functions in Figure 2 are decreasing at the lowest R values, which would not happen if we presented the functions conditional on ρ.
5. Imagine doing this, for example, when evaluating an applicant to graduate school based on GPA and GRE scores. How many among us mentally transform these scores so that they can be appropriately added to learn some optimal cut value?
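The transformation Note 5 alludes to is just z-scoring each predictor within the applicant pool before adding; a minimal sketch with hypothetical applicant numbers (names and values are ours):

```python
from statistics import mean, stdev

def unit_weight_composites(applicants):
    """Unit-weighted composite: z-score each predictor within the
    applicant pool, then add. The z-scoring is the mental
    transformation Note 5 suggests raters rarely perform."""
    gpas = [a["gpa"] for a in applicants]
    gres = [a["gre"] for a in applicants]
    z = lambda x, xs: (x - mean(xs)) / stdev(xs)
    return [z(a["gpa"], gpas) + z(a["gre"], gres) for a in applicants]

# Hypothetical pool: raw GRE points dwarf raw GPA points, but the
# z-scored composite puts the two scales on an equal footing.
pool = [{"gpa": 3.9, "gre": 160},
        {"gpa": 3.2, "gre": 168},
        {"gpa": 3.5, "gre": 155}]
print(unit_weight_composites(pool))
```

Without the standardization, a one-point GRE difference would swamp any plausible GPA difference, which is exactly why raw-score addition is a poor stand-in for unit weighting.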

References
ABC News/The Washington Post. (2002). ABC News/Washington Post Six Months After September 11th Poll, March 2002 [Computer file]. ICPSR version. Horsham, PA: Taylor Nelson Sofres Intersearch [producer], 2002. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor].
Armstrong, J. S. (1985). Long-range forecasting: From crystal ball to computer (2nd ed.). New York: Wiley Interscience.


Brown, P. J. (1993). Measurement, regression, and calibration. London: Oxford University Press.
Carroll, B., Gershman, M., Neft, D., Thorn, J., & Silverman, M. (Eds.). (1999). Total football II: The official encyclopedia of the National Football League (2nd ed.). New York: HarperCollins.
Dawes, R. M., & Corrigan, B. (1974). Linear models in decision making. Psychological Bulletin, 81, 95-106.
Gigerenzer, G., & Todd, P. M. (1999). Simple heuristics that make us smart. London: Oxford University Press.
Goldberg, L. R. (1972). Parameters of personality inventory construction and utilization: A comparison of prediction strategies and tactics. Multivariate Behavioral Research Monograph, 7, No. 72-2.
Green, S. B. (1991). How many subjects does it take to do a regression analysis? Multivariate Behavioral Research, 26, 499-510.
Grove, W. M. (2002). Correction and extension of Wainer's "Estimating coefficients in linear models: It don't make no nevermind." Manuscript submitted for publication.
Grove, W. M., & Meehl, P. E. (1996). Comparative efficiency of informal (subjective, impressionistic) and formal (mechanical, algorithmic) prediction procedures: The clinical-statistical controversy. Psychology, Public Policy, and Law, 2, 293-323.
Guion, R. M. (1965). Personnel testing. New York: McGraw-Hill.
Hammond, K. R. (1996). Human judgment and social policy. New York: Oxford University Press.
Mallows, C. L. (1973). Some comments on Cp. Technometrics, 15, 661-675.
Marks, M. R. (1966, September). Two kinds of regression weights which are better than betas in crossed samples. Paper presented at the meeting of the American Psychological Association, New York.
Miller, W. E., & National Election Studies. (1999). National Election Studies, 1988 Super Tuesday Study [dataset]. Ann Arbor, MI: University of Michigan, Center for Political Studies [producer and distributor].
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.
Osborne, J. W. (2000). Prediction in multiple regression. Practical Assessment, Research & Evaluation, 7(2). [Available online: http://ericae.net/pare/getvn.asp?v=7&n=2.]
Schmidt, F. L. (1971). The relative efficiency of regression and simple unit predictor weights in applied differential psychology. Educational and Psychological Measurement, 31, 699-714.
Swets, J. A., Dawes, R. M., & Monahan, J. (2000). Psychological science can improve diagnostic decisions. Psychological Sciences in the Public Interest, 1(Suppl. 1).
Wainer, H. (1976). Estimating coefficients in linear models: It don't make no nevermind. Psychological Bulletin, 83, 213-217.
Wainer, H. (1978). On the sensitivity of regression and regressors. Psychological Bulletin, 85, 267-273.
Warwick, J. N., Sellers, T. L., Talbot, S. R., Cawthorn, A. J., & Ford, W. B. (1994). The population biology of abalone (Haliotis species) in Tasmania. I. Blacklip abalone (H. rubra) from the north coast and islands of Bass Strait. Sea Fisheries Division, Tech. Rept. 48, ISSN 1034-3288. [Available online: http://www.cs.toronto.edu/~delve/data/datasets.html.]
Wherry, R. J., Sr., Naylor, J. C., Wherry, R. J., Jr., & Fallis, R. F. (1965). Generating multiple samples of multivariate data with arbitrary population parameters. Psychometrika, 30, 303-313.

Wisconsin Longitudinal Study (WLS) [graduates]. (1992/93). [Machine-readable data file]. Hauser, R. M., Sewell, W. H., Hauser, T. S., Logan, J. A., Ryff, C., Caspi, A., & MacDonald, M. M. [principal investigator(s)]. Madison, WI: University of Wisconsin-Madison, Data and Program Library Service [distributor]. Available at http://dpls.dacc.wisc.edu/WLS/SB6281.htm.

Authors
JASON DANA is a doctoral student, Department of Social and Decision Sciences, 208 Porter Hall, Carnegie Mellon University, Pittsburgh, PA 15213; jdd@andrew.cmu.edu. His interests are in behavioral economics, social preferences, the use of clinical vs. actuarial judgment, and applications to ethics.
ROBYN M. DAWES is the Charles J. Queenan, Jr. University Professor of Psychology, Department of Social and Decision Sciences, 208 Porter Hall, Carnegie Mellon University, Pittsburgh, PA 15213; rdlb@andrew.cmu.edu.

Manuscript received April 29, 2003
Revision received September 5, 2003
Accepted October 15, 2003
