The Politics of Forecast Bias: Forecaster Effect
This paper examines the impact of forecasters, horizons, revenue categories, and
forecast timing on forecast bias and accuracy in relation to decision making. The
significant findings are as follows. For the most part, forecasters tend to report
forecasts that are similar rather than competitive. Forecast bias (underforecasting)
increases over longer horizons; consequently, claims of structural budget deficit are
suspect, as an assertion of structural deficit requires that a reliable forecast of
revenue show continuous shortfall compared with a reliable forecast of expenditures.
There is an overforecasting bias in property tax, possibly reflecting demand for
services. There is an underforecasting bias in two revenue categories, all other
taxes and federal categorical grants, resulting in a net underforecasting bias
for the city’s total revenue. There appears to be a period effect (forecasts made
in June are substantially biased), but this effect requires further study. The study
suggests further examination of the bias associated with revenue categories, time
within the budget cycle, and forecast horizon.
INTRODUCTION
As long ago as Government Budgeting (Burkhead 1956), it has been observed that
government forecasters often underestimate revenue. In fact, New York City consistently
underforecasts revenue: forecasts in the earlier years were sometimes roughly accurate,
but almost never in excess of actual revenues. Table 1 also shows that in the past
decade, underforecasting has been pervasive across revenue sources. Treating New York
City as a forecasting case study, this paper examines forecast details to discover
characteristics that predict underforecasting bias.
There have been many studies substantiating the notion of revenue underforecasting
by subnational governments in the United States. Voorhees (2006) provides an extensive
Daniel W. Williams is an Associate Professor, Baruch College, New York, NY 10010. He can be reached at
Daniel.Williams@Baruch.CUNY.EDU. I would like to thank Fred Thompson, Phil Joyce, Jonathan Justice
and Thad Calabrese for comments on prior versions of this paper.
review of the recent literature, reporting that underestimation bias varies substantially from
study to study. Rubin (1987) reports that poor jurisdictions tend to overestimate revenue.
In one study comparing the US Office of Management and Budget and Congressional
Budget Office, it was determined that competing forecasters exhibit similar biases rather
than providing competing independent forecasts (Krause and Douglas 2006).
Choate and Thompson (1988, 1990), Paleologou (2005), and Rodgers and Joyce (1996)
focus on the political character of forecasting. Choate and Thompson argue that it is the
principal (the political decision maker), not the agent (the technical forecaster), who selects
the underforecast. They argue that forecast bias is not strictly a consequence of the widely
believed budget officers’ risk aversion, and suggest that underforecasting plays a role in tax
policy. Paleologou shows that in the United Kingdom forecasting bias is associated with
the party in power. Rodgers and Joyce suggest there are complex factors, some of which
are rational in a political context, and call for further work to explain variation in
forecast errors.
This paper examines revenue forecasting differently than past studies. First, rather than
looking at one, or even two, revenue forecasters, this paper examines five separate forecast-
ers of New York City revenue over a five-year period. By examining a larger number of
forecasters, it is possible to evaluate whether the institutional motivation and presence of
competition affects revenue forecasting accuracy and the direction of bias.
These questions relate directly to the reliability and use of forecasts and the need for
forecasts, as well as addressing significant gaps that exist in the current research. In the
subsequent sections, each question is discussed in more detail.
This paper is significant because revenue forecasting is a principal controlling force in
budget making in subnational governments in the United States. Elected officials of all sorts
are loath to be seen to increase taxes or any visible revenue devices. Forecasting provides
estimates for the continuing effect of already established revenue devices for the budget
year and future years. Elected officials may constrain expenditures within these estimates
1. New York City Office of Management and Budget (NYCOMB), which reports to a
deputy mayor and ultimately to the mayor. This is the charter mandated determinative
forecast.
2. The Financial Control Board (FCB), which was created after the financial crisis of
the 1970s. The explicit purpose of the FCB is to prevent the city from overspending
its revenue.
3. The New York City Comptroller (NYCC), who holds one of three citywide elective
positions and is considered a competitor of the mayor. In the 2009 election and
other earlier elections the comptroller was among the mayoral candidates, sometimes
opposed to the current mayor.
4. The Independent Budget Office (IBO), which was created in a significant charter
revision in 1989. This charter change resulted in shifting budget power to the city
1. The author served on a technical panel for a consensus prison population forecast between three bodies
when employed with a state government.
During recent years each of these entities has made some information about its
forecast available to the public via the internet.2 These publicly available data
provide the opportunity to examine the relative forecast practice of the five forecasters.
Should there be a forecaster effect? By forecaster effect, we mean a statistically
significant, observable difference in systematic error associated with the source
of the forecast (the entity that produces the forecast). There are three reasons why there
might be such a difference:
1. Among the five forecasters, two, NYCOMB and FCB, are revenue conservers (Bland
2007); if there is a bias for underforecasting, they should exhibit this bias the most.
Two, IBO and NYCC, represent demand for services (IBO as a proxy for city council
and NYCC as a political competitor to the mayor), so they should reflect less
underforecasting bias; and one, DCNYC, is indeterminate. This reason is the most
explicitly political: revenue conservers such as NYCOMB and the FCB may be seeking to
suppress expenditures in order to build surpluses as a hedge against future uncertainty
or to hold down future taxes, as argued by Choate and Thompson (1988, 1990).
Representatives of demand, such as IBO and NYCC, should prefer to find revenue so that
expenditure programs can be funded (Bretschneider, Straussman, and Mullins 1988).
2. It is not an effective use of public resources to conduct five forecasts that get the
same biased answer. So, forecasters should be expected to find different answers. The
absence of a forecaster effect would raise the suspicion that the public is spending
too much money on forecasting. The existence of rational bias is well documented
in forecasting literature (Batchelor 2007; Butler and Lang 1991; Laster, Bennett, and
Geoum 1999). The specific rational bias suggested here, coming to different results,
is sometimes included in this literature, although it is poorly understood. The reason
offered here—the continued appearance of usefulness—is no more speculative than
those offered in the existing literature.
There are also reasons why there might be no such difference:
1. As argued by Krause and Douglas (2006), forecasters may seek the same answer rather
than differences. This result is a safe strategy because the future is uncertain, leaving
opportunity for variance between forecast and actual. When the same variance can be
attributed to one’s “competitor,” it appears one is still doing a reasonably good job.
Herding behavior has been documented in other uses of forecasting as well (Clement
and Tse 2005; Olsen 1996).
2. To the degree that it is documented on their various websites and reports, these entities
use the same basic approach to forecasting, mostly econometric modeling. To some
degree they are simply second guessing NYCOMB’s parameter choices. NYCOMB
produces a set of input economic factor forecasts, which are not necessarily reforecast
by the other entities. In addition, for “minor” categories various forecasters simply
accept the NYCOMB forecast as their own.3 Considering the similarity of method and
high interdependence, it is unlikely that the forecasts will be substantially different.
3. Some data deficiencies that will be discussed in a subsequent section led to imputing
some values. Thus, even where there may be differences, the data may be insufficient
to find these differences.
3. No interviews were conducted with the various forecast entities. These entities were contacted for
data files that were not available on their websites. In the ensuing conversation some anecdotal information
was volunteered. This information should not be treated as systematically collected. Despite the contacts, no
additional data were provided beyond that which could be found on the websites.
4. All other city revenue includes federal and state unrestricted grants, other categorical grants, interfund
agreements, disallowance of categorical grants (a negative entry), and adjustment for intercity revenue (a
negative entry, which is a revenue line in miscellaneous category). It is apparent that the standard “forecast”
for disallowance of categorical grants is set at $15 million every year. By combining the two larger categories,
intercity revenue is netted out of the entire forecast and actuals, thus reducing irrelevant variability.
Hypothesis 2 = There is a revenue effect that is negative for all city source categories
except property (which is uncertain) and uncertain for federal and state
categorical grants.
Hypothesis 4 = There is a month of origin effect that is negative (underforecast) for May,
uncertain for June, and positive for July (increase over earlier periods).
Hypothesis 5 = The second forecast for the same fiscal years is predicted to be higher.
Data
The data used in this analysis are collected from reports posted on the websites as identified
in footnote 2. The reports require considerable preanalysis before they can be examined
with respect to the central question of this study. NYCOMB and IBO report their forecasts.
However, these forecasts are divided into two parts. The larger part of these forecasts is
in revenue categories, such as property tax, sales tax, etc. A smaller part is in proposals
that are expected to pass at the city council session at the time of the forecast. Forecasts
of these proposals must be recategorized to fit the revenue categories used in this study
before comparison with actual revenue outcomes.5 After these adjustments and in order to
be parallel with the other forecasters, the NYCOMB forecast is summarized in five groups:
property tax, all other taxes, miscellaneous and other city revenue, federal categorical
grants, and state categorical grants. A further difficulty with the IBO forecast is that
in the first year of these data, 2003, IBO aggregates federal and state categorical
grants. To impute values, the NYCOMB value for federal categorical grants is assigned to
IBO, and the difference between the aggregate value and the NYCOMB value is treated as
the IBO value for state categorical grants. This procedure may have the effect of
understating the difference between IBO and NYCOMB in federal categorical grants while
overstating it in state categorical grants.
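That imputation can be sketched as follows; the dollar figures are invented for illustration and are not taken from the data:

```python
# Sketch of the 2003 imputation described above; figures are hypothetical.
ibo_combined_grants = 9_800.0   # IBO's aggregated federal + state categorical grants
nycomb_federal = 4_100.0        # NYCOMB's federal categorical grants forecast

ibo_federal = nycomb_federal                      # NYCOMB value imputed to IBO
ibo_state = ibo_combined_grants - nycomb_federal  # remainder treated as IBO's state value
print(ibo_federal, ibo_state)  # 4100.0 5700.0
```

Because the federal value is copied from NYCOMB, any IBO-versus-NYCOMB difference in the combined figure is pushed entirely into the state category, which is exactly the distortion the text warns about.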
The NYCC, FCB, and DCNYC report “risks,” which is to say how much they think
NYCOMB’s forecast is incorrect in particular lines. These risks are converted back to
forecasts by adding them to or subtracting them from the appropriate NYCOMB forecast.6
These forecasters also do not necessarily report all the categories of this analysis. Particularly, the
5. There is a report made by NYCOMB that assigns proposal forecasts to revenue categories; however,
efforts to obtain this report were unsuccessful. Most proposals are easily categorized; however, there may be
some small error due to this lack of access.
6. Despite the term “risk,” these are alternate competing forecasts. The “risk” is the difference between
the NYCOMB forecast and the competing forecast.
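The risk-to-forecast conversion can be sketched as below. The dollar figures and the sign convention (a positive risk meaning the competitor expects more revenue than NYCOMB) are assumptions for illustration, not data from the paper:

```python
# Hypothetical illustration: recovering a competing forecast from reported
# "risks" against the NYCOMB baseline (figures invented, $ millions).
nycomb_forecast = {"property tax": 13200.0, "all other taxes": 21500.0}
nycc_risk = {"property tax": 150.0, "all other taxes": -300.0}

# Categories with no reported risk default to the NYCOMB forecast.
nycc_forecast = {cat: value + nycc_risk.get(cat, 0.0)
                 for cat, value in nycomb_forecast.items()}
print(nycc_forecast["property tax"])  # 13350.0
```

The default-to-NYCOMB behavior also mirrors the observation in footnote 3 that, for minor categories, some forecasters simply adopt the NYCOMB forecast as their own.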
Model
The hypotheses are evaluated using the model:
$$\frac{F - A}{(F + A)/2} \times 100 = \sum_{i=1}^{k} \beta_i x_i + \sum_{j=1}^{l} \beta_{j+k} m_j + \varepsilon$$
where F is forecast, A is actual, the variables x_i are dummies for conditions specified
in the hypotheses (except that the horizon variable is a semicontinuous variable labeled
Out Year and coded 0 through 3 for the budget year and the three out years reported with
the forecasts), and the m_j are three-way or four-way interaction variables included as
controls, plus controls for forecast and expenditure years.8 The expression,
$$\frac{F - A}{(F + A)/2} \times 100$$
is the symmetrical percent error (SPE), a component of symmetrical mean absolute percent
error (SMAPE), which is commonly used by forecasters (Armstrong 2006; Dekker, van
Donselaar, and Ouwehand 2004; Lawrence, O’Connor, and Edmundson 2000; Makridakis
and Hibon 2000). Use of SPE simultaneously solves four problems. As with differencing,
7. The dependent variable observations for this study are symmetrical percent errors and, for some
purposes, percent errors. For each actual expenditure within a category, there are many forecasts resulting in
roughly 20 observed symmetrical percent errors: There are at minimum five forecasters times four forecasts
(beginning with the third out year and some years later as the budget year). When a forecaster reports more
than one forecast in the same cycle, there can be more.
8. The forecast year is the fiscal year in which the forecast is made, the forecast origin year. The expenditure
year is the fiscal year in which the expenditures will occur, the forecast target year. A forecast made in FY 2003
will have target years of FY 2004 through FY 2007.
Results
The mean value of SPE is −0.127 (−12.7 percent) and the standard deviation is 13.4 percent.12
Regression estimations are shown in Table 2. The models are estimated using ordinary least
squares (OLS) with robust standard errors.
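As a minimal sketch of this estimation strategy, OLS coefficients with HC1 heteroskedasticity-robust standard errors can be computed directly. The data, coefficient values, and the two regressors below are synthetic stand-ins, not the paper's variable set:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
out_year = rng.integers(0, 4, n).astype(float)   # horizon coded 0..3
ibo = rng.integers(0, 2, n).astype(float)        # hypothetical forecaster dummy
# Synthetic SPE (%) with an assumed horizon effect of -4 points per out year.
y = -4.0 * out_year + 1.5 * ibo + rng.normal(0.0, 5.0, n)

X = np.column_stack([np.ones(n), out_year, ibo])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # OLS coefficients
resid = y - X @ beta

# HC1 robust covariance: (X'X)^-1 X' diag(e^2) X (X'X)^-1 * n/(n-k)
XtX_inv = np.linalg.inv(X.T @ X)
meat = X.T @ (X * resid[:, None] ** 2)
cov = XtX_inv @ meat @ XtX_inv * n / (n - X.shape[1])
se = np.sqrt(np.diag(cov))
print(beta, se)
```

Robust standard errors matter here because forecast errors for the same target year are unlikely to have constant variance across horizons and categories.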
Models 1 through 3 test various hypotheses. Adjusted R-squared ranges from 50.7 to
68.0 percent, showing that, in fact, these variables provide a reasonably good explanation
of the variation in SPE. To test the robustness, Model 4 examines the effect of a reduced
form model excluding interaction variables. The adjusted R-squared declines by about
10 percent, but remains significant at the 1 percent level. The coefficients do not change
more than one or two percentage points.
9. Small values of SPE, between ±10 percent, are roughly equivalent to percentages; large values under-
state positive percentages and overstate negative ones, but not severely until SPE exceeds ±25 percent.
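Footnote 9's claim can be checked numerically. The sketch below assumes percent error is computed on an actual-revenue base; that base is an assumption, not the paper's stated definition:

```python
def spe(f, a):
    """Symmetrical percent error: (F - A) / ((F + A) / 2) * 100."""
    return (f - a) / ((f + a) / 2) * 100

def pe(f, a):
    """Conventional percent error on an actual base (assumed definition)."""
    return (f - a) / a * 100

# Small errors: SPE roughly tracks the percentage.
print(round(pe(110.0, 100.0), 2), round(spe(110.0, 100.0), 2))  # 10.0 9.52
# Large errors: SPE understates positive and overstates negative percentages.
print(round(pe(130.0, 100.0), 2), round(spe(130.0, 100.0), 2))  # 30.0 26.09
print(round(pe(70.0, 100.0), 2), round(spe(70.0, 100.0), 2))    # -30.0 -35.29
```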
10. As a technical note, by excluding the constant, the model allows one block of indicators to be included
without excluding the last indicator to avoid singularity. For this block, the coefficient measures the distance
from zero. For the other blocks, the coefficient measures the distance from the excluded indicator. In effect,
the constant is allocated among the indicators of the fully included block.
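Footnote 10's device can be illustrated with a toy regression. The five-group indicator block below is a synthetic stand-in for the forecaster dummies, with invented group means:

```python
import numpy as np

# With no constant in the model, a full block of five indicators can be
# included without dropping one; each coefficient then measures that group's
# distance from zero rather than from an excluded reference group.
rng = np.random.default_rng(1)
n = 250
group = rng.integers(0, 5, n)                 # which of five hypothetical forecasters
D = np.eye(5)[group]                          # full one-hot block, no column dropped
true_means = np.array([-1.0, 0.5, 0.0, -0.5, 1.0])
y = D @ true_means + rng.normal(0.0, 1.0, n)

beta, *_ = np.linalg.lstsq(D, y, rcond=None)  # beta[g] estimates group g's mean
print(np.round(beta, 1))
```

Including a constant alongside all five columns would make the design matrix singular, which is why the standard alternative is to drop one indicator and read the others as contrasts against it, as Table 3 does.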
11. The first expenditure year and last forecast year are the excluded indicators. In addition, the model
fitting excluded the last forecast target year. Further explanation is provided in footnote 8.
12. Because this value is unweighted, it is not the mean bias.
Table 2. Regression Estimations

                              Model 1           Model 2           Model 3           Model 4
                           Coef.  Std. err.  Coef.  Std. err.  Coef.  Std. err.  Coef.  Std. err.
Out Year                  −0.039  0.004b    −0.039  0.004b    −0.040  0.004b    −0.039  0.004b
Second                    −0.017  0.016     −0.009  0.039     −0.012  0.022     −0.016  0.017
NYCC                       0.003  0.018                                          0.009  0.023
IBO                        0.016  0.024                                          0.024  0.029
DCNYC                     −0.011  0.016                                         −0.003  0.021
FCB                       −0.013  0.009c                                        −0.007  0.016
NYCOMB                     0.000  0.024                                          0.009  0.029
Property                                     0.133  0.027a                       0.146  0.008a
All other taxes                             −0.112  0.029b                      −0.122  0.011b
Miscellaneous                               −0.009  0.028                       −0.011  0.011
Federal categorical grants                  −0.038  0.027c                      −0.039  0.009b
State categorical grants                     0.002  0.027
May                                                           −0.009  0.010    −0.011  0.022
June                                                          −0.017  0.020    −0.011  0.015
July                                                          −0.002  0.035
R-squared                  0.723             0.543             0.701             0.588
Adjusted R-squared         0.680             0.507             0.623             0.575
F (df)                     67.9 (84,540)     54.8 (46,578)     25.0 (157,593)   129.8 (20,604)
p-value (F)                0.000             0.000             0.000             0.000

Note: Control (interaction) variables not shown. All models controlled for expenditure year and budget year fixed
effects. Model 1: parameters estimated for forecaster effect. Model 2: parameters estimated for revenue category effect.
Miscellaneous and other city revenue and federal categorical grants have an unexpected sign. Model 3: parameters
estimated for forecast month effect. Model 4 is reduced, using only forecast origin and forecast target controls.
Significance: a 1%, two tail; b 1%, one tail; c 10%, one tail.
Table 3. Sequential Omission of Forecaster Dummies (Models 1a–1e)

                 Model 1a          Model 1b          Model 1c          Model 1d          Model 1e
              Coef.  Std. err.  Coef.  Std. err.  Coef.  Std. err.  Coef.  Std. err.  Coef.  Std. err.
Out Year     −0.039  0.004b    −0.039  0.004b    −0.039  0.004b    −0.039  0.004b    −0.039  0.004b
Second       −0.017  0.016     −0.017  0.016     −0.017  0.016     −0.017  0.016     −0.017  0.016
NYCC          0.002  0.012      0.016  0.018     −0.014  0.014      0.014  0.010     (base)
IBO           0.016  0.012c     0.030  0.024      0.014  0.014     (base)             0.028  0.014a
DCNYC        −0.012  0.013      0.002  0.017     (base)            −0.028  0.014a    −0.014  0.010
FCB          −0.014  0.024     (base)            −0.030  0.024     −0.016  0.018     −0.002  0.017
NYCOMB       (base)             0.014  0.024     −0.016  0.012c    −0.002  0.012      0.012  0.013
R-squared     0.723             0.723             0.723             0.723             0.723
Adj. R-sq.    0.680             0.680             0.680             0.680             0.680
F (df)        67.9 (84,540)     67.9 (84,540)     67.9 (84,540)     67.9 (84,540)     67.9 (84,540)
p-value (F)   0.000             0.000             0.000             0.000             0.000

Note: Control (interaction) variables not shown. All models controlled for expenditure year and budget year fixed
effects. Models 1a through 1e sequentially include one forecaster, marked "(base)," in the base model to test for
difference with the others. Significance: a 10%, two tail; b 1%, one tail; c 10%, one tail.
distinguished from no bias at the intercept of the regression model (i.e., in the budget year).
The answer is that only the FCB can be distinguished from zero.
We are also interested in whether the forecasters can be distinguished from each other.
For that question we turn to Table 3, which shows the effect when each of the forecaster
dummies is omitted from the model sequentially. This exclusion puts the excluded forecaster
into the base model and tests for the hypothesis that this forecaster is significantly different
from the other forecasters. Where IBO or NYCC is compared with NYCOMB or FCB,
directional hypotheses are appropriate (IBO or NYCC should have positive coefficients,
NYCOMB or FCB should have negative coefficients), otherwise nondirectional hypotheses
are required. IBO forecast bias is statistically different with a positive coefficient from
NYCOMB, p = 0.1, and from DCNYC, p = 0.1.
Hypothesis 2 is that there is a revenue category effect that is uncertain for property tax,
negative (underforecasting) for other New York City own-source revenue, and uncertain for
federal and state categorical grants. Model 2 shows underforecasting (negative coefficients)
for all other taxes, miscellaneous and other city revenue, and federal categorical grants. It
shows overforecasting (positive coefficients) for property tax and state categorical grants. Property tax,
all other taxes, and federal categorical grants are statistically significant. An important
potential limitation is that because of the way some values were imputed, it is likely that
the variance in property tax is understated, so the level of significance may be biased; no
known data source is available to examine this concern.
This paper has examined the impact of forecasters, horizons, revenue categories, and forecast
timing in relation to decision making on forecast bias or accuracy. For the most part
forecasters tend to report forecasts that agree rather than compete. This finding suggests
that five separate forecasts are excessive. However, it may be unwise to reduce the forecasts
to one. The evidence here is not sufficient to show that the convergence is toward NYCOMB
rather than toward some other value that is less biased than NYCOMB would be without
REFERENCES
Armstrong, J. Scott. 2006. “Findings from Evidence-Based Forecasting: Methods for Reducing Forecast Error.” International Journal
of Forecasting 22 (3): 583–598. doi:10.1016/j.ijforecast.2006.04.006.
APPENDIX ON VARIABLES
Dependent Variables
SPE_RCD = symmetrical percent error as defined in the main text. The sublabel RCD
reflects combining miscellaneous and other revenue to exclude negative revenue.
PE_RCD = percent error as defined in the main text.
Independent Variables
Out Year = years past the first year of the forecast coded 0 for the upcoming budget year
and 1, 2, and 3 for the three subsequent years.
Second = coded 1 if the record is for the second forecast made by the forecaster during the
same cycle, otherwise 0.
Forecaster
NYCC = coded 1 if the forecaster is the New York City Comptroller, otherwise 0.
IBO = coded 1 if the forecaster is the Independent Budget Office, otherwise 0.
DCNYC = coded 1 if the forecaster is the State Deputy Comptroller for New York City,
otherwise 0.
FCB = coded 1 if the forecaster is the Financial Control Board, otherwise 0.
NYCOMB = coded 1 if the forecaster is the New York City Office of Management and
Budget, otherwise 0.
Revenue Category
Month
FCY03 through FCY07 = forecast origin year 2003 through 2007, individually, coded 1 for
the respective year, otherwise 0.
BY04 through BY09 = budget year (forecast target year) 2004 through 2009, coded 1 for
the respective year, otherwise 0.
Interaction Variables