American Statistical Association is collaborating with JSTOR to digitize, preserve and extend access to Journal
of Business & Economic Statistics.
© 1995 American Statistical Association
Journal of Business & Economic Statistics, July 1995, Vol. 13, No. 3

Comparing Predictive Accuracy

Francis X. DIEBOLD
Department of Economics, University of Pennsylvania, Philadelphia, PA 19104-6297, and National Bureau of Economic Research, Cambridge, MA 02138

Roberto S. MARIANO
Department of Economics, University of Pennsylvania, Philadelphia, PA 19104-6297
Prediction is of fundamental importance in all of the sciences, including economics. Forecast accuracy is of obvious importance to users of forecasts because forecasts are used to guide decisions. Forecast accuracy is also of obvious importance to producers of forecasts, whose reputations (and fortunes) rise and fall with forecast accuracy. Comparisons of forecast accuracy are also of importance to economists more generally who are interested in discriminating among competing economic hypotheses (models). Predictive performance and model adequacy are inextricably linked: predictive failure implies model inadequacy.

Given the obvious desirability of a formal statistical procedure for forecast-accuracy comparisons, one is struck by the casual manner in which such comparisons are typically carried out. The literature contains literally thousands of forecast-accuracy comparisons; almost without exception, point estimates of forecast accuracy are examined, with no attempt to assess their sampling uncertainty. On reflection, the reason for the casual approach is clear: Correlation of forecast errors across space and time, as well as several additional complications, makes formal comparison of forecast accuracy difficult. Dhrymes et al. (1972) and Howrey, Klein, and McCarthy (1974), for example, offered pessimistic assessments of the possibilities for formal testing.

In this article we propose widely applicable tests of the null hypothesis of no difference in the accuracy of two competing forecasts. Our approach is similar in spirit to that of Vuong (1989) in the sense that we propose methods for measuring and assessing the significance of divergences between models and data. Our approach, however, is based directly on predictive performance, and we entertain a wide class of accuracy measures that users can tailor to particular decision-making situations. This is important because, as is well known, realistic economic loss functions frequently do not conform to stylized textbook favorites like mean squared prediction error (MSPE). [For example, Leitch and Tanner (1991) and Chinn and Meese (1991) stressed direction of change, Cumby and Modest (1987) stressed market and country timing, McCulloch and Rossi (1990) and West, Edison, and Cho (1993) stressed utility-based criteria, and Clements and Hendry (1993) proposed a new accuracy measure, the generalized forecast-error second moment.] Moreover, we allow for forecast errors that are potentially non-Gaussian, nonzero mean, serially correlated, and contemporaneously correlated.

We proceed by detailing our test procedures in Section 1. Then, in Section 2, we review the small extant literature to provide necessary background for the finite-sample evaluation of our tests in Section 3. In Section 4 we provide an illustrative application, and in Section 5 we offer conclusions and directions for future research.

1. TESTING EQUALITY OF FORECAST ACCURACY

Consider two forecasts, $\{\hat{y}_{it}\}_{t=1}^{T}$ and $\{\hat{y}_{jt}\}_{t=1}^{T}$, of the time series $\{y_t\}_{t=1}^{T}$. Let the associated forecast errors be $\{e_{it}\}_{t=1}^{T}$ and $\{e_{jt}\}_{t=1}^{T}$. We wish to assess the expected loss associated with each of the forecasts (or its negative, accuracy). Of great importance, and almost always ignored, is the fact that the economic loss associated with a forecast may be poorly assessed by the usual statistical metrics. That is, forecasts are used to guide decisions, and the loss associated with a forecast error of a particular sign and size is induced directly by the nature of the decision problem at hand. When one considers the variety of decisions undertaken by economic agents guided by forecasts (e.g., risk-hedging decisions, inventory-stocking decisions, policy decisions, advertising-expenditure decisions, public-utility rate-setting decisions, etc.), it is clear that the loss associated with a particular forecast error is in general an asymmetric function of the error and, even if symmetric, certainly need not conform to stylized textbook examples like MSPE.
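The claim that different loss functions can rank the same pair of forecasts differently is easy to check numerically. The following is a minimal Python sketch, not the authors' code: the function names are hypothetical, the lin-lin slopes a = 3 and b = 1 are illustrative assumptions, and forecast errors are assumed to be defined as e_t = y_t - ŷ_t, so a positive error is an under-prediction.

```python
# Hypothetical helper names; a sketch of per-period losses, not the authors' code.
from typing import Callable, List, Sequence

Loss = Callable[[float], float]


def squared_loss(e: float) -> float:
    """Quadratic loss; its sample mean is the MSPE."""
    return e * e


def absolute_loss(e: float) -> float:
    """Absolute loss: symmetric, but not quadratic."""
    return abs(e)


def linlin_loss(e: float, a: float = 3.0, b: float = 1.0) -> float:
    """Asymmetric "lin-lin" loss with hypothetical slopes a and b.

    Assumes e = y - yhat: e > 0 (under-prediction) costs a per unit,
    e < 0 (over-prediction) costs b per unit.
    """
    return a * e if e > 0 else -b * e


def loss_differential(e_i: Sequence[float], e_j: Sequence[float], g: Loss) -> List[float]:
    """Per-period loss differential g(e_it) - g(e_jt), t = 1..T."""
    if len(e_i) != len(e_j):
        raise ValueError("forecast-error series must have the same length")
    return [g(a) - g(b) for a, b in zip(e_i, e_j)]
```

With e_i = [2.0, -2.0] and e_j = [-2.5, -2.5], squared loss gives per-period differentials [-2.25, -2.25], so forecast i is more accurate, whereas the lin-lin loss gives [3.5, -0.5], whose mean of 1.5 favors forecast j.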
Thus, we allow the time-$t$ loss associated with a forecast (say $i$) to be an arbitrary function of the realization and prediction, $g(y_t, \hat{y}_{it})$. In many applications, the loss function will be a direct function of the forecast error; that is, $g(y_t, \hat{y}_{it}) = g(e_{it})$. To economize on notation, we write $g(e_{it})$ from this point on, recognizing that certain loss functions (like direction-of-change) do not collapse to $g(e_{it})$ form, in which case the full $g(y_t, \hat{y}_{it})$ form would be used. The null hypothesis of equal forecast accuracy for two forecasts is $E[g(e_{it})] = E[g(e_{jt})]$, or $E[d_t] = 0$, where $d_t \equiv [g(e_{it}) - g(e_{jt})]$ is the loss differential. Thus, the "equal accuracy" null hypothesis is equivalent to the null hypothesis that the population mean of the loss-differential series is 0.

1.1 An Asymptotic Test

Consider a sample path $\{d_t\}_{t=1}^{T}$ of a loss-differential series. If the loss-differential series is covariance stationary and short memory, then standard results may be used to deduce the asymptotic distribution of the sample mean loss differential. We have

$$\sqrt{T}\,(\bar{d} - \mu) \xrightarrow{d} N\big(0,\; 2\pi f_d(0)\big),$$

where

$$\bar{d} = \frac{1}{T}\sum_{t=1}^{T}\big[g(e_{it}) - g(e_{jt})\big]$$

is the sample mean loss differential,

$$f_d(0) = \frac{1}{2\pi}\sum_{\tau=-\infty}^{\infty}\gamma_d(\tau)$$

is the spectral density of the loss differential at frequency 0, $\gamma_d(\tau) = E[(d_t - \mu)(d_{t-\tau} - \mu)]$ is the autocovariance of the loss differential at displacement $\tau$, and $\mu$ is the population mean loss differential. The formula for $f_d(0)$ shows that the correction for serial correlation can be substantial, even if the loss differential is only weakly serially correlated, due to cumulation of the autocovariance terms.

Because in large samples the sample mean loss differential $\bar{d}$ is approximately normally distributed with mean $\mu$ and variance $2\pi f_d(0)/T$, the obvious large-sample $N(0, 1)$ statistic for testing the null hypothesis of equal forecast accuracy is

$$S_1 = \frac{\bar{d}}{\sqrt{2\pi \hat{f}_d(0)/T}},$$

where $\hat{f}_d(0)$ is a consistent estimate of $f_d(0)$.

Following standard practice, we obtain a consistent estimate of $2\pi f_d(0)$ by taking a weighted sum of the available sample autocovariances,

$$2\pi \hat{f}_d(0) = \sum_{\tau=-(T-1)}^{T-1} \mathbf{1}\!\left(\frac{\tau}{S(T)}\right)\hat{\gamma}_d(\tau),$$

where

$$\hat{\gamma}_d(\tau) = \frac{1}{T}\sum_{t=|\tau|+1}^{T}(d_t - \bar{d})(d_{t-|\tau|} - \bar{d}),$$

$\mathbf{1}(\tau/S(T))$ is the lag window, and $S(T)$ is the truncation lag.

To motivate a choice of lag window and truncation lag that we have often found useful in practice, recall the familiar result that optimal $k$-step-ahead forecast errors are at most $(k - 1)$-dependent. In practical applications, of course, $(k - 1)$-dependence may be violated for a variety of reasons. Nevertheless, it seems reasonable to take $(k - 1)$-dependence as a benchmark for a $k$-step-ahead forecast error (and the assumption may be readily assessed empirically). This suggests the attractiveness of the uniform, or rectangular, lag window, defined by

$$\mathbf{1}\!\left(\frac{\tau}{S(T)}\right) = \begin{cases} 1 & \text{for } \left|\dfrac{\tau}{S(T)}\right| \le 1 \\[4pt] 0 & \text{otherwise.} \end{cases}$$

$(k - 1)$-dependence implies that only $(k - 1)$ sample autocovariances need be used in the estimation of $f_d(0)$ because all the others are 0, so $S(T) = (k - 1)$. This is legitimate (i.e., the estimator is consistent) under $(k - 1)$-dependence so long as a uniform window is used, because the uniform window assigns unit weight to all included autocovariances.

Because the Dirichlet spectral window associated with the rectangular lag window dips below 0 at certain locations, the resulting estimator of the spectral density function is not guaranteed to be positive semidefinite. The large positive weight near the origin associated with the Dirichlet kernel, however, makes it unlikely to obtain a negative estimate of $f_d(0)$. In applications, in the rare event that a negative estimate arises, we treat it as 0 and automatically reject the null hypothesis of equal forecast accuracy. If it is viewed as particularly important to impose nonnegativity of the estimated spectral density, it may be enforced by using a Bartlett lag window, with corresponding nonnegative Fejér spectral window, as in the work of Newey and West (1987), at the cost of having to increase the truncation lag "appropriately" with sample size. Other lag windows and truncation lag selection procedures are of course possible as well. Andrews (1991), for example, suggested using a quadratic spectral lag window, together with a "plug-in" automatic bandwidth selection procedure.

1.2 Exact Finite-Sample Tests

Sometimes only a few forecast-error observations are available in practice. One approach in such situations is to bootstrap our asymptotic test statistic, as done by Mark (1995). Ashley's (1994) work is also very much in that spirit. Little is known about the first-order asymptotic validity of the bootstrap in this situation, however, let alone higher-order asymptotics or actual finite-sample performance. Therefore, it is useful to have available exact finite-sample tests of predictive accuracy, to complement the asymptotic test presented previously. Two powerful such tests are based on the observed loss differentials (the sign test) or their ranks (Wilcoxon's signed-rank test). [These tests are standard, so our discussion is terse. See, for example, Lehmann (1975) for details.]

1.2.1 The Sign Test. The null hypothesis is a zero-median loss differential: $\operatorname{med}(g(e_{it}) - g(e_{jt})) = 0$. Note that the null of a zero-median loss differential is not the same
as the null of zero difference between median losses; that is, $\operatorname{med}(g(e_{it}) - g(e_{jt})) \neq \operatorname{med}(g(e_{it})) - \operatorname{med}(g(e_{jt}))$. For that reason, the null differs slightly in spirit from that associated with our earlier discussed asymptotic test statistic $S_1$, but it nevertheless has an intuitive and meaningful interpretation, namely, that $P(g(e_{it}) > g(e_{jt})) = P(g(e_{it}) < g(e_{jt}))$.

If, however, the loss differential is symmetrically distributed, then the null hypothesis of a zero-median loss differential corresponds precisely to the earlier null because in that case the median and mean are equal. Symmetry of the loss differential will obtain, for example, if the distributions of $g(e_{it})$ and $g(e_{jt})$ are the same up to a location shift. Symmetry is ultimately an empirical matter and may be assessed using standard procedures. We have found roughly symmetric loss-differential series to be quite common in practice.

The construction and intuition of a test statistic are straightforward. Assuming that the loss-differential series is iid (and we shall relax that assumption shortly), the number of positive loss-differential observations in a sample of size $T$ has the binomial distribution with parameters $T$ and $1/2$ under the null hypothesis. The test statistic is therefore simply

$$S_2 = \sum_{t=1}^{T} I_+(d_t),$$

where

$$I_+(d_t) = \begin{cases} 1 & \text{if } d_t > 0 \\ 0 & \text{otherwise.} \end{cases}$$

Significance may be assessed using a table of the cumulative binomial distribution. In large samples, the studentized version of the sign-test statistic is standard normal:

$$S_{2a} = \frac{S_2 - .5T}{\sqrt{.25T}} \overset{a}{\sim} N(0, 1).$$

1.2.2 Wilcoxon's Signed-Rank Test. A related distribution-free procedure that requires symmetry of the loss differential (but can be more powerful than the sign test in that case) is Wilcoxon's signed-rank test. We again assume for the moment that the loss-differential series is iid. The test statistic is

$$S_3 = \sum_{t=1}^{T} I_+(d_t)\,\operatorname{rank}(|d_t|).$$

[...]

the loss function need not be quadratic and need not even be symmetric or continuous.

Second, a variety of realistic features of forecast errors are readily accommodated. The forecast errors can be nonzero-mean, non-Gaussian, and contemporaneously correlated. Allowance for contemporaneous correlation, in particular, is important because the forecasts being compared are forecasts of the same economic time series and because the information sets of forecasters are largely overlapping, so that forecast errors tend to be strongly contemporaneously correlated.

Moreover, the asymptotic test statistic $S_1$ can of course handle a serially correlated loss differential. This is potentially important because, as discussed earlier, even optimal forecast errors are serially correlated in general. Serial correlation presents more of a problem for the exact finite-sample test statistics $S_2$ and $S_3$ and their asymptotic counterparts $S_{2a}$ and $S_{3a}$, because the elements of the set of all possible rearrangements of the sample loss-differential series are not equally likely when the data are serially correlated, which violates the assumptions on which such randomization tests are based. Nevertheless, serial correlation may be handled via Bonferroni bounds, as suggested in a different context by Campbell and Ghysels (1995). Under the assumption that the forecast errors and hence the loss differential are $(k - 1)$-dependent, each of the following $k$ sets of loss differentials will be free of serial correlation: $\{d_{ij,1}, d_{ij,1+k}, d_{ij,1+2k}, \ldots\}$, $\{d_{ij,2}, d_{ij,2+k}, d_{ij,2+2k}, \ldots\}$, $\ldots$, $\{d_{ij,k}, d_{ij,2k}, d_{ij,3k}, \ldots\}$. Thus, a test with size bounded by $\alpha$ can be obtained by performing $k$ tests, each of size $\alpha/k$, on each of the $k$ loss-differential sequences and rejecting the null hypothesis if the null is rejected for any of the $k$ samples. Finally, it is interesting to note that, in multistep forecast comparisons, forecast-error serial correlation may be a "common feature," in the terminology of Engle and Kozicki (1993), because it is induced largely by the fact that the forecast horizon is longer than the interval at which the data are sampled, and it may therefore not be present in loss differentials even if present in the forecast errors themselves. This possibility can of course be checked empirically.

2. EXTANT TESTS
Table 3. Empirical Size Under Quadratic Loss, Test Statistic MR

                        Gaussian                  Fat-tailed
   T    ρ      θ = .0   θ = .5   θ = .9    θ = .0   θ = .5   θ = .9
   8   .0        9.67    19.33    22.45     16.16    25.26    27.62
   8   .5        9.50    19.00    22.07     14.81    24.50    26.99
   8   .9        9.66    19.51    22.85     11.23    21.28    24.14
  16   .0        9.62    13.92    14.72     19.94    22.56    23.06
  16   .5       10.02    13.88    14.96     17.70    21.04    21.26
  16   .9       10.04    13.82    14.94     11.76    15.68    16.70
  32   .0        9.96    10.98    11.12     22.78    22.86    21.72
  32   .5        9.68    11.46    11.66     19.78    20.32    20.14
  32   .9        9.86    11.62    11.96     12.42    13.54    13.46
  64   .0       10.32    11.02    11.04     24.50    22.60    21.58
  64   .5        9.84    10.56    10.64     21.44    19.48    18.84
  64   .9        9.58    10.58    10.34     13.38    13.38    13.20
 128   .0        9.78    10.54    10.44     25.86    22.90    21.54
 128   .5       10.02    11.04    11.18     22.76    20.26    19.44
 128   .9       10.76    11.28    11.38     13.44    13.52    12.92
 256   .0       10.04     9.90     9.58     27.16    23.74    22.70
 256   .5       10.32     9.92     9.82     24.00    20.50    19.18
 256   .9        9.92    10.16    10.34     13.38    12.70    12.24
 512   .0        9.94    10.48    10.56     26.92    23.40    21.78
 512   .5        9.52    10.56    10.48     23.56    20.52    19.36
 512   .9        9.80     9.82     9.88     13.96    12.98    12.74

NOTE: Entries are empirical sizes in percent. T is sample size, ρ is the contemporaneous correlation between the innovations underlying the forecast errors, and θ is the coefficient of the MA(1) forecast error. All tests are at the 10% level. At least 5,000 Monte Carlo replications are performed.
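The experimental design in the note to Table 3 can be reproduced in outline. The sketch below draws one pair of MA(1) forecast-error series under the equal-accuracy null for quadratic loss. It is a minimal illustration under stated assumptions, not the authors' simulation code: innovations are unit-variance Gaussian, a single presample innovation starts each MA(1), and the fat-tailed case is an assumed bivariate Student-t built from a common chi-square scale with a user-chosen df (the fat-tailed distribution actually used is not given in this copy). The function name is hypothetical.

```python
# Hypothetical helper name; a sketch of the Table 3 design, not the authors' code.
import math
import random
from typing import List, Optional, Tuple


def simulate_ma1_error_pair(
    T: int,
    theta: float,
    rho: float,
    df: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[float]]:
    """Simulate e_it = u_it + theta * u_i,t-1 and e_jt = u_jt + theta * u_j,t-1.

    (u_it, u_jt) are serially independent with contemporaneous correlation rho.
    Both series share theta and the innovation scale, so E[e_it**2] == E[e_jt**2]:
    equal accuracy holds under quadratic loss.
    """
    if rng is None:
        rng = random.Random()
    u_i: List[float] = []
    u_j: List[float] = []
    for _ in range(T + 1):  # one presample draw feeds the first MA(1) lag
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        a = z1
        b = rho * z1 + math.sqrt(1.0 - rho * rho) * z2  # corr(a, b) = rho
        if df is not None:
            # Assumed fat-tailed variant: dividing both draws by a common
            # sqrt(chi2_df / df) gives a bivariate Student-t pair.
            s = math.sqrt(df / rng.gammavariate(df / 2.0, 2.0))
            a, b = a * s, b * s
        u_i.append(a)
        u_j.append(b)
    e_i = [u_i[t] + theta * u_i[t - 1] for t in range(1, T + 1)]
    e_j = [u_j[t] + theta * u_j[t - 1] for t in range(1, T + 1)]
    return e_i, e_j
```

Passing the pair through quadratic loss, d = [a * a - b * b for a, b in zip(e_i, e_j)], gives a 1-dependent loss differential (k = 2). A size experiment repeats this draw many times and records how often a test rejects at the 10% level.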
[Table 4. Empirical Size Under Quadratic Loss, Test Statistic $S_1$ (Gaussian and Fat-tailed panels; columns T, ρ, θ = .0, .5, .9)]

[Table 5. Empirical Size Under Quadratic Loss, Test Statistics $S_2$ and $S_{2a}$ (Gaussian and Fat-tailed panels; columns T, ρ, θ = .0, .5, .9)]

[Table 6. Empirical Size Under Quadratic Loss, Test Statistics $S_3$ and $S_{3a}$ (Gaussian and Fat-tailed panels; columns T, ρ, θ = .0, .5, .9)]
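Tables 4 to 6 report the size of $S_1$, of $S_2$ and $S_{2a}$, and of $S_3$ and $S_{3a}$. The statistics themselves, together with the Bonferroni treatment of $(k - 1)$-dependence described in Section 1, can be sketched in plain Python as below. This is a hedged sketch, not the authors' implementation, and the function names are hypothetical. Assumptions: $S_1$ uses the rectangular lag window with $S(T) = k - 1$ and returns an infinite value when the variance estimate is nonpositive (mirroring the automatic-rejection rule); the exact sign-test p-value doubles the smaller binomial tail; and $S_3$ uses average ranks for ties, with the usual untied normal approximation (mean $T(T+1)/4$, variance $T(T+1)(2T+1)/24$) for $S_{3a}$.

```python
# Hypothetical helper names; a sketch of S1, S2/S2a, S3/S3a and the Bonferroni
# subsample rule, not the authors' code.
import math
from typing import List, Sequence, Tuple


def dm_s1(d: Sequence[float], k: int = 1) -> float:
    """Asymptotic statistic S1 for a loss differential from k-step-ahead forecasts.

    2*pi*f_d(0) is estimated with the rectangular lag window and truncation
    lag S(T) = k - 1: gamma(0) + 2 * sum_{tau=1}^{k-1} gamma(tau).
    """
    T = len(d)
    dbar = sum(d) / T

    def gamma(tau: int) -> float:
        # sample autocovariance at displacement tau, divisor T
        return sum((d[t] - dbar) * (d[t - tau] - dbar) for t in range(tau, T)) / T

    lrv = gamma(0) + 2.0 * sum(gamma(tau) for tau in range(1, k))
    if lrv <= 0.0:
        # nonpositive estimate: treated as 0, so equal accuracy is rejected
        return math.copysign(math.inf, dbar)
    return dbar / math.sqrt(lrv / T)


def sign_test(d: Sequence[float]) -> Tuple[int, float, float]:
    """Return (S2, S2a, exact two-sided binomial p-value); assumes iid d_t."""
    T = len(d)
    s2 = sum(1 for x in d if x > 0)  # I+(d_t) = 1 only for strictly positive d_t
    s2a = (s2 - 0.5 * T) / math.sqrt(0.25 * T)
    lower = sum(math.comb(T, m) for m in range(0, s2 + 1)) / 2 ** T
    upper = sum(math.comb(T, m) for m in range(s2, T + 1)) / 2 ** T
    return s2, s2a, min(1.0, 2.0 * min(lower, upper))


def signed_rank_test(d: Sequence[float]) -> Tuple[float, float]:
    """Return (S3, S3a); assumes iid, symmetric d_t. Ties get average ranks,
    but the S3a variance is not tie-corrected (a simplification)."""
    T = len(d)
    order = sorted(range(T), key=lambda t: abs(d[t]))
    ranks: List[float] = [0.0] * T
    i = 0
    while i < T:  # assign 1-based ranks of |d_t|, averaging over tied values
        j = i
        while j + 1 < T and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2.0 + 1.0
        i = j + 1
    s3 = sum(r for r, x in zip(ranks, d) if x > 0)
    mean = T * (T + 1) / 4.0
    var = T * (T + 1) * (2 * T + 1) / 24.0
    return s3, (s3 - mean) / math.sqrt(var)


def bonferroni_sign_test(d: Sequence[float], k: int, alpha: float = 0.10) -> bool:
    """For (k-1)-dependent d_t: sign-test each subsample {d_r, d_{r+k}, ...},
    r = 1..k, at level alpha / k, and reject if any subsample rejects."""
    return any(sign_test(d[r::k])[2] <= alpha / k for r in range(k))
```

For example, with d = [1.0, 2.0, 3.0, -1.0], sign_test(d) returns (3, 1.0, 0.625), and signed_rank_test(d) gives S3 = 8.5, because the two values with |d_t| = 1 share rank 1.5. For a two-step-ahead comparison, dm_s1(d, k=2) uses the sample autocovariances at displacements 0 and 1 only.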
4. AN EMPIRICAL EXAMPLE

[Figure 2. Actual and Predicted Exchange-Rate Changes (horizontal axis: Time, 77 to 91). The solid line is the actual exchange-rate change. The short dashed line is the predicted change from the random-walk model, and the long dashed line is the predicted change implied by the forward rate.]

[Figure 4. Loss Differential Autocorrelations (horizontal axis: Displacement). The first eight sample autocorrelations are graphed, together with Bartlett's approximate 95% confidence interval.]
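The diagnostic in Figure 4, sample autocorrelations of the loss differential plotted against an approximate 95% band, is also one way to carry out the empirical check of $(k - 1)$-dependence suggested in Section 1. A minimal sketch follows. It assumes the white-noise form of Bartlett's approximation, a band of ±1.96/√T, which may differ from the exact construction behind the figure; the function names are hypothetical.

```python
# Hypothetical helper names; a sketch of a Figure 4 style diagnostic.
import math
from typing import List, Sequence


def sample_autocorrelations(d: Sequence[float], max_lag: int = 8) -> List[float]:
    """Sample autocorrelations of d_t at displacements 1..max_lag (divisor T)."""
    T = len(d)
    dbar = sum(d) / T
    dev = [x - dbar for x in d]
    gamma0 = sum(v * v for v in dev) / T
    if gamma0 == 0.0:
        raise ValueError("constant series: autocorrelations are undefined")
    return [
        sum(dev[t] * dev[t - tau] for t in range(tau, T)) / T / gamma0
        for tau in range(1, max_lag + 1)
    ]


def bartlett_band(T: int, z: float = 1.96) -> float:
    """Half-width of the approximate 95% band, assuming the white-noise form of
    Bartlett's formula (each autocorrelation has variance of roughly 1/T)."""
    return z / math.sqrt(T)


def lags_outside_band(d: Sequence[float], max_lag: int = 8) -> List[int]:
    """Displacements whose sample autocorrelation lies outside the band."""
    band = bartlett_band(len(d))
    acf = sample_autocorrelations(d, max_lag)
    return [tau for tau, r in enumerate(acf, start=1) if abs(r) > band]
```

For a loss differential from k-step-ahead forecasts, displacements beyond k - 1 that fall outside the band cast doubt on the (k - 1)-dependence benchmark and hence on the truncation lag S(T) = k - 1.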
Brockwell, P. J., and Davis, R. A. (1992), Time Series: Theory and Methods (2nd ed.), New York: Springer-Verlag.
Campbell, B., and Ghysels, E. (1995), "Is the Outcome of the Federal Budget Process Unbiased and Efficient? A Nonparametric Assessment," Review of Economics and Statistics, 77, 17-31.
Chinn, M., and Meese, R. A. (1991), "Banking on Currency Forecasts: Is Change in Money Predictable?" unpublished manuscript, University of California, Berkeley, Graduate School of Business.
Chong, Y. Y., and Hendry, D. F. (1986), "Econometric Evaluation of Linear Macroeconomic Models," Review of Economic Studies, 53, 671-690.
Christiano, L., and Eichenbaum, M. (1990), "Unit Roots in Real GNP: Do We Know, and Do We Care?" Carnegie-Rochester Conference Series on Public Policy, 32, 7-61.
Christoffersen, P., and Diebold, F. X. (1994), "Optimal Prediction Under Asymmetric Loss," Technical Working Paper 167, National Bureau of Economic Research, Cambridge, MA.
Clemen, R. T. (1989), "Combining Forecasts: A Review and Annotated Bibliography" (with discussion), International Journal of Forecasting, 5, 559-583.
Clements, M. P., and Hendry, D. F. (1993), "On the Limitations of Comparing Mean Square Forecast Errors" (with discussion), Journal of Forecasting, 12, 617-676.
Cumby, R. E., and Modest, D. M. (1987), "Testing for Market Timing Ability: A Framework for Forecast Evaluation," Journal of Financial Economics, 19, 169-189.
Diebold, F. X., and Rudebusch, G. D. (1991), "Forecasting Output With the Composite Leading Index: An Ex Ante Analysis," Journal of the American Statistical Association, 86, 603-610.
Dhrymes, P. J., Howrey, E. P., Hymans, S. H., Kmenta, J., Leamer, E. E., Quandt, R. E., Ramsey, J. B., Shapiro, H. T., and Zarnowitz, V. (1972), "Criteria for Evaluation of Econometric Models," Annals of Economic and Social Measurement, 1, 291-324.
Engel, C. (1994), "Can the Markov Switching Model Forecast Exchange Rates?" Journal of International Economics, 36, 151-165.
Engle, R. F., and Kozicki, S. (1993), "Testing for Common Features," Journal of Business & Economic Statistics, 11, 369-395.
Fair, R. C., and Shiller, R. J. (1990), "Comparing Information in Forecasts From Econometric Models," American Economic Review, 80, 375-389.
Granger, C. W. J. (1969), "Prediction With a Generalized Cost of Error Function," Operational Research Quarterly, 20, 199-207.
Granger, C. W. J., and Newbold, P. (1977), Forecasting Economic Time Series, Orlando, FL: Academic Press.
Hamilton, J. D. (1989), "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle," Econometrica, 57, 357-384.
Hannan, E. J. (1970), Multiple Time Series, New York: John Wiley.
Hogg, R. V., and Craig, A. T. (1978), Introduction to Mathematical Statistics (4th ed.), New York: MacMillan.
Howrey, E. P., Klein, L. R., and McCarthy, M. D. (1974), "Notes on Testing the Predictive Performance of Econometric Models," International Economic Review, 15, 366-383.
Kendall, M., and Stuart, A. (1979), The Advanced Theory of Statistics (Vol. 2, 4th ed.), New York: Oxford University Press.
Lehmann, E. L. (1975), Nonparametrics: Statistical Methods Based on Ranks, San Francisco: Holden-Day.
Leitch, G., and Tanner, J. E. (1991), "Econometric Forecast Evaluation: Profits Versus the Conventional Error Measures," American Economic Review, 81, 580-590.
Mariano, R. S., and Brown, B. W. (1983), "Prediction-Based Test for Misspecification in Nonlinear Simultaneous Systems," in Studies in Econometrics, Time Series and Multivariate Statistics, Essays in Honor of T. W. Anderson, eds. T. Amemiya, S. Karlin, and L. Goodman, New York: Academic Press, pp. 131-151.
Mark, N. (1995), "Exchange Rates and Fundamentals: Evidence on Long-Horizon Predictability," American Economic Review, 85, 201-218.
McCulloch, R., and Rossi, P. E. (1990), "Posterior, Predictive, and Utility-Based Approaches to Testing the Arbitrage Pricing Theory," Journal of Financial Economics, 28, 7-38.
Meese, R. A., and Rogoff, K. (1988), "Was It Real? The Exchange Rate-Interest Differential Relation Over the Modern Floating-Rate Period," Journal of Finance, 43, 933-948.
Mizrach, B. (1991), "Forecast Comparison in L2," unpublished manuscript, Wharton School, University of Pennsylvania, Dept. of Finance.
Morgan, W. A. (1939-1940), "A Test for Significance of the Difference Between the Two Variances in a Sample From a Normal Bivariate Population," Biometrika, 31, 13-19.
Newey, W., and West, K. (1987), "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix," Econometrica, 55, 703-708.
Priestley, M. B. (1981), Spectral Analysis and Time Series, New York: Academic Press.
Rudebusch, G. D. (1993), "The Uncertain Trend in U.S. Real GNP," American Economic Review, 83, 264-272.
Stock, J. H., and Watson, M. W. (1989), "Interpreting the Evidence on Money-Income Causality," Journal of Econometrics, 40, 161-181.
Toda, H. Y., and Phillips, P. C. B. (1993), "Vector Autoregression and Causality," Econometrica, 61, 1367-1393.
Vuong, Q. H. (1989), "Likelihood Ratio Tests for Model Selection and Non-nested Hypotheses," Econometrica, 57, 307-334.
Weiss, A. A. (1991), "Multi-step Estimation and Forecasting in Dynamic Models," Journal of Econometrics, 48, 135-149.
Weiss, A. A. (1994), "Estimating Time Series Models Using the Relevant Cost Function," unpublished manuscript, University of Southern California, Dept. of Economics.
Weiss, A. A., and Andersen, A. P. (1984), "Estimating Forecasting Models Using the Relevant Forecast Evaluation Criterion," Journal of the Royal Statistical Society, Ser. A, 137, 484-487.
West, K. D. (1994), "Asymptotic Inference About Predictive Ability," SSRI Working Paper 9417, University of Wisconsin-Madison, Dept. of Economics.
West, K. D., Edison, H. J., and Cho, D. (1993), "A Utility-Based Comparison of Some Models of Exchange Rate Volatility," Journal of International Economics, 35, 23-46.