Forecasting US Population Totals With The Box-Jenkins Approach
North-Holland
Abstract: The use of the Box-Jenkins approach for forecasting the population of the United States up
to the year 2080 is discussed. It is shown that the Box-Jenkins approach is equivalent to a simple trend
model when making long-range predictions for the United States. An investigation of forecasting
accuracy indicates that the Box-Jenkins method produces population forecasts that are at least as
reliable as those done with more traditional demographic methods.
Keywords: Population forecasting, Time series analysis, Difference equations, Forecasting accuracy
... second differences of the original values is recommended.

Ideally, population forecasts should be based on social theories linking social, cultural, psychological and economic factors to demographic behavior [see Keilman (1990, p. 50)]. Therefore it is correct to question the use of time series procedures like the Box-Jenkins approach for population forecasts. In contrast to the component method, the Box-Jenkins approach does not even provide a population forecast classified according to age and sex. In my opinion, there are several reasons that justify the use of time series methods with population forecasts, whereby the Box-Jenkins approach is generally better than other time series methods [see, for example, Newbold and Granger (1974)].

(1) Time series methods may perform more accurately than, or at least as accurately as, more sophisticated methods. This has been found in a number of studies [see, for example, Voss and Palit (1981), Stoto (1983), Mahmoud (1984)].

(2) The accuracy of time series methods should always be regarded as a baseline against which the forecasts made by more complex models (e.g. the component method) can be assessed. As long as these complex models do not surpass simple time series models in accuracy, they should be improved and modified. In this sense, time series prognoses serve as a measure of the accuracy of forecasts. The long-term goal of the demographer should be to predict the population more accurately than a forecaster without demographic knowledge, since the demographer's entire expert knowledge goes into the production of prognoses.

(3) The results of time series forecasts serve to test the plausibility of the assumptions of forecasts that have been produced with the component method [see, for example, Pflaumer (1988a, 1988b)]. The time series forecast projects the past population development into the future. If considerable deviations occur, one should consider whether the assumptions about fertility, mortality and migration, whose extrapolation is often based on time series methods as well, have been realistically chosen.

(4) The uncertainty of population forecasts can be described in the case of the Box-Jenkins method with a confidence interval. If one compares the alternative projections of the component method with the width of the confidence interval, one can then more easily decide whether the uncertainty connected with the future population development was estimated too high or too low.

2. Data and model

The Box-Jenkins model has gained great popularity since the publication of Box and Jenkins's book in 1970. This method depends on the class of autoregressive (AR), integrated (I), moving average (MA) models. Non-stationary data must be differenced until stationarity is achieved. An ARIMA model must then be formulated for the differenced data by looking at the autocorrelation and the partial autocorrelation functions. After estimating the parameters of the model, diagnostic checks should be made. If the model is appropriate, forecasts are produced; if not, alternative models should be considered.

The annual population size of the United States from 1900 to 1988 serves as the database. When using the Box-Jenkins procedure, it has to be taken into account that the population size is only determined every 10 years by a census. The data for the intervening years are estimates based on registered births, deaths and migrants. The efficiency of the Box-Jenkins projection therefore depends largely on the accuracy of the annual population estimates.

3. Analysis of the population size

This section demonstrates the application of the Box-Jenkins technique using annual US population figures $P_t$ from 1900 to 1988 (Exhibit 1). The series had to be differenced until it appeared to be stationary (see Exhibits 2 and 3). The examination of the correlograms of various differenced series indicated the necessity of differencing twice, since the correlogram of the first differenced series did not die out fast enough.

Exhibit 4 shows the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the second differences $\Delta^2 P_t$. The PACF is characterized by two negative spikes at lags 1 and 2 with the values $r_1 = -0.18$ and $r_2 = -0.306$. The ACF is characterized by a
Exhibit 3. US population (second differences) from 1900 to 1988 (in 100 000)
dampened sine wave. The behavior of both correlograms is consistent with an ARIMA(2,2,0) process for the population figures $P_t$. Having chosen an ARIMA model to fit to the data, the next step is to estimate the parameters from the data. Applying least squares yields the following fitted equation:

The empirical results show a significant influence of the regressors $\Delta^2 P_{t-1}$ and $\Delta^2 P_{t-2}$. The asymptotic t-values are shown in parentheses below the estimators.

Since $\Delta^2 P_t$ can be satisfactorily described by an ARMA(2,0) process, it is easy to show, when considering that
Exhibit 4
Autocorrelation and partial autocorrelation functions of $\Delta^2 P_t$
Lag 1 2 3 4 5 6
Lag 7 8 9 10 11 12
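The identification and estimation steps described in this section (difference twice, inspect ACF and PACF, fit an AR(2) to the second differences by least squares) can be sketched as follows. This is a minimal illustration, not the author's computation: the census series is not reproduced in this text, so the code runs on a synthetic stand-in series, and the generating coefficients (0.0125, -0.2, -0.25), the noise scale and the helper names `acf`, `pacf` and `fit_ar2` are all assumptions made for the example.

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelations r_1..r_nlags of a 1-D series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, nlags + 1)])

def pacf(x, nlags):
    """Sample partial autocorrelations from Yule-Walker systems of order 1..nlags."""
    r = np.concatenate(([1.0], acf(x, nlags)))
    out = []
    for k in range(1, nlags + 1):
        # Toeplitz matrix of autocorrelations for an AR(k) fit
        R = np.array([[r[abs(i - j)] for j in range(k)] for i in range(k)])
        phi = np.linalg.solve(R, r[1:k + 1])
        out.append(phi[-1])  # last coefficient = partial autocorrelation at lag k
    return np.array(out)

def fit_ar2(d2):
    """Least-squares fit of d2_t = c + a1*d2_{t-1} + a2*d2_{t-2}; returns (c, a1, a2)."""
    d2 = np.asarray(d2, dtype=float)
    y = d2[2:]
    X = np.column_stack([np.ones(len(y)), d2[1:-1], d2[:-2]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Synthetic stand-in for 89 annual population values (NOT the census data).
rng = np.random.default_rng(0)
n = 89
d2 = np.zeros(n)
for t in range(2, n):
    # hypothetical AR(2) dynamics for the second differences
    d2[t] = 0.0125 - 0.2 * d2[t - 1] - 0.25 * d2[t - 2] + rng.normal(scale=0.3)
p = 76.0 + np.cumsum(1.3 + np.cumsum(d2))  # integrate twice to get a level series

second_diff = np.diff(p, n=2)              # difference twice, as in Section 3
print("ACF  lags 1-6:", np.round(acf(second_diff, 6), 3))
print("PACF lags 1-6:", np.round(pacf(second_diff, 6), 3))
print("AR(2) estimates (c, a1, a2):", np.round(fit_ar2(second_diff), 4))
```

With real data one would replace `p` by the annual population figures and compare the printed correlogram with Exhibit 4 before settling on the AR order.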
Exhibit 6
Projections of the US population (in millions).
Year
Census Bureau projections (1989)
  Middle    250    294    300    292
  High      252    335    414    501
Projections of Ahlburg and Vaupel (1990)
  Middle    252    329    402    487
    (1% mortality progress, TFR = 2, 1-2 million immigrants)
  High      252    385    553    811
    (2% mortality progress, large fertility cycles, 1-2 million immigrants)
334 P. Pflaumer / Forecasting US population totals
This equation has repeated real roots and conjugate complex roots. Because of the existence of repeated roots, the general solution is

$$P_t = C_1(-0.1015 + 0.4946i)^t + C_2(-0.1015 - 0.4946i)^t + (C_3 + tC_4)1^t + zt^2, \qquad z = 0.004286.$$

Given the population (in millions) $P_0 = 239.3$ (1985), $P_1 = 241.6$ (1986), $P_2 = 243.9$ (1987) and $P_3 = 246.1$ (1988), the following linear system of equations

$$
\begin{pmatrix}
(-0.1015 + 0.4946i)^0 & (-0.1015 - 0.4946i)^0 & 1 & 0 \\
(-0.1015 + 0.4946i)^1 & (-0.1015 - 0.4946i)^1 & 1 & 1 \\
(-0.1015 + 0.4946i)^2 & (-0.1015 - 0.4946i)^2 & 1 & 2 \\
(-0.1015 + 0.4946i)^3 & (-0.1015 - 0.4946i)^3 & 1 & 3
\end{pmatrix}
\begin{pmatrix} C_1 \\ C_2 \\ C_3 \\ C_4 \end{pmatrix}
=
\begin{pmatrix} 239.3 \\ 241.6 - z \\ 243.9 - 4z \\ 246.1 - 9z \end{pmatrix}
$$

has to be solved in order to obtain the definite solution

$$P_t = \ldots + 239.419 + 2.219t + 0.004286t^2, \qquad \text{for } t = 0, 1, 2, \ldots$$

Transforming it into its trigonometric form leads to

$$P_t = e^{-0.6837t}(-0.140\cos 1.77t - 0.087\sin 1.77t) + 239.419 + 2.219t + 0.004286t^2.$$

4. Analysis of the logarithms of the population size

In accordance with a recommendation by Kashyap and Rao (1976), a Box-Jenkins model has been selected with the logarithms of population size. These authors find that the optimal model for US population data 1900-1971 is a first-order autoregressive model in the first difference of the natural logarithms of total population. Exhibit 7 shows the autocorrelations and the partial autocorrelations of the first logarithmic differences $\Delta \ln P_t = \ln P_t - \ln P_{t-1}$ between 1900 and 1988. The logarithmic differences are to be interpreted as growth rates.
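As a numerical cross-check on the closed-form ARIMA(2,2,0) solution of Section 3, the four-equation system for the constants can be solved directly. The sketch below takes the complex roots, the coefficient 0.004286 and the four rounded population values as printed; moving the quadratic term to the right-hand side is my reading of the system, and the function name `forecast` is introduced only for illustration. Because the inputs are rounded to one decimal, small deviations from the printed constants 239.419 and 2.219 are to be expected.

```python
import numpy as np

# Values as printed in Section 3
r1 = complex(-0.1015, 0.4946)              # complex root
r2 = r1.conjugate()                        # its conjugate
z = 0.004286                               # coefficient of t^2
P = np.array([239.3, 241.6, 243.9, 246.1]) # 1985..1988, in millions (rounded)
t = np.arange(4)

# System: C1*r1^t + C2*r2^t + C3 + C4*t = P_t - z*t^2  for t = 0, 1, 2, 3
A = np.column_stack([r1 ** t, r2 ** t, np.ones(4), t]).astype(complex)
C = np.linalg.solve(A, P - z * t ** 2)

def forecast(h):
    """Population (millions) h years after 1985 under the closed-form solution."""
    val = C[0] * r1 ** h + C[1] * r2 ** h + C[2] + C[3] * h + z * h ** 2
    return val.real  # imaginary parts cancel because C1 and C2 are conjugates

print("C3, C4:", round(C[2].real, 3), round(C[3].real, 3))
print("modulus and angle of the root:", abs(r1), np.angle(r1))
print("projection for 2080:", round(forecast(2080 - 1985), 1))
```

The modulus of the complex root is about 0.5, so the oscillating terms vanish within a few years and the forecast function is dominated by the quadratic trend, which is the sense in which the model behaves like a simple trend model at long horizons.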
Exhibit 7
Autocorrelation and partial autocorrelation functions of $\Delta \ln P_t$.
Lag                        1       2       3       4       5       6
Autocorrelation            0.781   0.633   0.598   0.54    0.416   0.363
Partial autocorrelation    0.781   0.059   0.223   0       -0.141  0.065
Lag                        7       8       9       10      11      12
Autocorrelation            0.337   0.204   0.122   0.066   0.028   -0.062
Partial autocorrelation    0.012   -0.230  0.017   -0.104  0.041   -0.106
If one neglects the slight negative trend which still remains in the growth rates (see Exhibit 8), then the autocorrelation function and the partial autocorrelation function lead to the assumption that the development of the logarithmic differences (growth rates) can be described, as in the case of Kashyap and Rao, with an AR(1) model. The estimated model is

$$\Delta \ln P_t = \underset{(3.25)}{0.0029} + \underset{(12.42)}{0.782}\,\Delta \ln P_{t-1}.$$

The model can be rearranged as a second-order difference equation, since $\Delta \ln P_t = \ln P_t - \ln P_{t-1}$. It follows that

$$\ln P_t - 1.782 \ln P_{t-1} + 0.782 \ln P_{t-2} = 0.0029.$$

The characteristic equation is

$$\lambda^2 - 1.782\lambda + 0.782 = 0.$$

The solutions are

$$\lambda_1 = 1, \qquad \lambda_2 = 0.782.$$

The general solution of the difference equation is

$$\ln P_t = C_1 1^t + C_2(0.782)^t + zt.$$

If one takes the initial conditions $P_0 = 243.9$ (1987) and $P_1 = 246.1$ (1988) into consideration, one then obtains as the definite solution

$$\ln P_t = 5.4771 + 0.0197(0.782)^t + 0.0133t$$

or

$$P_t = 239.15\, e^{0.0197(0.782)^t + 0.0133t}.$$

In the long term, the AR(1) model of the first logarithmic differences is identical to an exponential growth model with a constant growth rate ($r = 0.0133$). The results of population projections up to the year 2080 with this model are given in Exhibit 9.

Assuming a constant growth rate of about 1.33% annually, the population in the year 2080 would rise to about 830 million. If one compares the results of the ARIMA(1,1,0) model of the logarithmic differences with the results of the ARIMA(2,2,0) model of the initial values, then
Exhibit 8. US population (first logarithmic differences) from 1900 to 1988
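The AR(1) model for the logarithmic differences plotted in Exhibit 8 leads to a closed-form projection that can be checked numerically. The sketch below uses only the two printed estimates (0.0029 and 0.782) and the printed initial values for 1987 and 1988; the variable and function names are my own, and the constants it derives agree with the printed ones only up to rounding.

```python
import math

c, phi = 0.0029, 0.782       # estimated AR(1): dlnP_t = c + phi * dlnP_{t-1}

# Characteristic equation of the level form: lambda^2 - (1 + phi)*lambda + phi = 0
disc = (1 + phi) ** 2 - 4 * phi
roots = sorted([((1 + phi) - math.sqrt(disc)) / 2,
                ((1 + phi) + math.sqrt(disc)) / 2])

# Particular solution z*t: the long-run annual growth rate
z = c / (1 - phi)

# General solution ln P_t = C1 + C2*phi^t + z*t with P_0 (1987) and P_1 (1988)
ln_p0, ln_p1 = math.log(243.9), math.log(246.1)
C2 = (ln_p1 - ln_p0 - z) / (phi - 1)
C1 = ln_p0 - C2

def log_model(t):
    """Population (millions) t years after 1987 under the AR(1) log-difference model."""
    return math.exp(C1 + C2 * phi ** t + z * t)

print("roots:", roots)
print("long-run growth rate:", round(z, 4))
print("C1, C2:", round(C1, 4), round(C2, 4))
print("projection for 2080:", round(log_model(2080 - 1987)))
```

Since the term in `phi ** t` decays geometrically, the projection for distant years is governed almost entirely by the constant growth rate `z`, which is why the model reduces to exponential growth in the long term.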
one sees that the ARIMA(1,1,0) model leads to significantly higher population forecasts. If one suspects a future decrease in the population growth rate, then the ARIMA(2,2,0) model or the quadratic time series model is better suited to modelling the population development, since this model implies a decrease in the growth rate. The same effect could be achieved with the second logarithmic differences of the population totals.

5. Forecasting accuracy

The question here is whether the Box-Jenkins approach is capable of predicting future popula-

where $r$ is the actual annual growth rate and $\hat{r}$ is the forecasted annual growth rate $t$ periods in the future. The time horizon of the ex post forecasting ranged from 1 year to 60 years. The fitting periods included the periods 1900-1930, 1900-1940, ..., 1900-1970, 1900-1980.

We first look at some short-term forecasts. Exhibit 10 presents the $e_t$ for projections made in the jump-off years 1930-1980. If the error $e_t$ is positive, then the actual population has been overestimated. For example, $e_t = 0.0020$ means that the estimated annual growth rate was roughly 0.2 percentage points higher than the actual rate. A negative error should be interpreted correspondingly. From Exhibit 10 it becomes apparent that the
Exhibit 10
Errors $e_t$ of short-term population forecasts with the Box-Jenkins model: (a) ARIMA(2,2,0) for $P_t$; (b) ARIMA(1,1,0) for $\ln P_t$.
Years ahead    Jump-off year
               1980    1970    1960    1950    1940    1930
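The error measure tabulated in Exhibit 10 compares forecasted and actual annual growth rates. Its defining formula is not legible in this text, so the sketch below rests on two assumptions: that the error is the forecasted rate minus the actual rate (which matches the stated sign convention), and that growth rates are continuous annual rates measured from the jump-off population. The function names and the example figures are hypothetical.

```python
import math

def annual_growth_rate(p_start, p_end, years):
    """Continuous average annual growth rate between two population totals."""
    return math.log(p_end / p_start) / years

def growth_rate_error(p_jump_off, p_forecast, p_actual, years):
    """Forecasted minus actual annual growth rate (assumed definition of e_t)."""
    r_hat = annual_growth_rate(p_jump_off, p_forecast, years)
    r = annual_growth_rate(p_jump_off, p_actual, years)
    return r_hat - r

# Hypothetical example: jump-off population 200 million, a 10-year-ahead
# forecast of 224 million, and an actual outcome of 220 million.
e = growth_rate_error(200.0, 224.0, 220.0, 10)
print("error in annual growth rate:", round(e, 4))
print("population overestimated" if e > 0 else "population not overestimated")
```

Expressing errors as growth-rate differences rather than as differences in totals makes forecasts with different horizons and jump-off years comparable, which is presumably why the exhibit reports them in this form.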
Exhibit 11
Population projections for the United States in 1970 (actual population: 204.9 million).
In addition to the forecast comparison, the Box-Jenkins approach serves to investigate the plausibility of the assumptions in the component method. Furthermore, the uncertainty of population forecasts is taken into account by specifying a confidence interval. Time series methods, especially the Box-Jenkins method, therefore should not replace the component method in population forecasting, but supplement it.

Acknowledgment

I wish to thank two anonymous referees for helpful and valuable comments.

References

Ahlburg, D.A. and J.W. Vaupel, 1990, "Alternative projections of the U.S. population", Demography, 27, 639-652.
Box, G.E.P. and G.M. Jenkins, 1970, Time Series Analysis - Forecasting and Control (Holden-Day, San Francisco).
El-Attar, S., 1988, "Population forecasting: An application of the Box-Jenkins technique", American Statistical Association, Proceedings of the Social Statistics Section, 30?-310.
Gandolfo, G., 1971, Mathematical Methods and Models in Economic Dynamics (North-Holland, Amsterdam-London).
Goldberg, S., 1958, Introduction to Difference Equations (Addison-Wesley, New York).
Kashyap, R.L. and A.R. Rao, 1976, Dynamic Stochastic Models from Empirical Data (Academic Press, New York, San Francisco, London).
Keilman, N.W., 1990, Uncertainty in National Population Forecasting: Issues, Backgrounds, Analyses, Recommendations (Swets & Zeitlinger, Amsterdam).
Land, K.C., 1986, "Methods for national population forecasts: A review", Journal of the American Statistical Association, 81, 888-901.
Land, K.C. and D. Cantor, 1983, "ARIMA models of seasonal variation in U.S. birth and death rates", Demography, 20, 541-568.
Lee, R.D., 1974, "Forecasting births in post-transition populations: Stochastic renewal with serially correlated fertility", Journal of the American Statistical Association, 69, 607-617.
Mahmoud, E., 1984, "Accuracy in forecasting: A survey", Journal of Forecasting, 3, 139-159.
McDonald, J., 1979, "A time series approach to forecasting Australian live births", Demography, 16, 575-601.
Newbold, P. and C.W.J. Granger, 1974, "Experience with forecasting univariate time series and the combination of forecasts", Journal of the Royal Statistical Society, 137, 131-165.
Pflaumer, P., 1988a, "Confidence intervals for population projections based on Monte Carlo methods", International Journal of Forecasting, 4, 135-142.
Pflaumer, P., 1988b, Methoden der Bevölkerungsvorausschätzung unter besonderer Berücksichtigung der Unsicherheit (Duncker & Humblot, Berlin).
Pflaumer, P., 1988c, "The accuracy of U.N. population projections", American Statistical Association, Proceedings of the Social Statistics Section, 299-304.
Saboia, J.L.M., 1974, "Modeling and forecasting population time series", Demography, 11, 483-492.
Stoto, M., 1983, "The accuracy of population projections", Journal of the American Statistical Association, 78, 13-20.
U.S. Bureau of the Census, 1989, Projections of the Population of the United States by Age, Sex and Race: 1988 to 2080, Current Population Reports, Series P-25, No. 1018 (US Government Printing Office, Washington, D.C.).
Voss, P.R. and C.D. Palit, 1981, "Forecasting state population using ARIMA time series techniques", Technical Series 70-6, University of Wisconsin, Madison, WI.

Biography: Peter PFLAUMER has been teaching statistics and demography at the University of Dortmund since 1974. From 1981 to 1983 he was a visiting fellow at the Center for Population Studies of Harvard University in Cambridge. In 1989 he was appointed affiliated project director in the long-term research program 'Internationalization of the Economy' at the University of Constance. Here he is working on a study of the demographic effects of guest worker migration.