Automatic Forecasting SnapStat
7/3/2009
Summary
The Automatic Forecasting SnapStat creates a one-page summary of forecasts generated for a
time series. Like the Automatic Forecasting procedure, this SnapStat tries a collection of
forecasting models and selects the one that gives the best fit according to a specified criterion.
Unlike that procedure, however, the SnapStat output is preformatted to fit on a single page.
Sample Data:
The file baseball.sgd contains the leading batting average in U.S. Major League Baseball for
each year between 1901 and 2004. Batting averages represent the proportion of times that a
player gets a hit out of all at-bats that result in either a hit or an out. The table below shows a
partial list of the data from that file. The batting averages are expressed as the number of points
out of 1000, such that a player batting 333 would have gotten a hit one-third of the time.
Time indices: time, date or other index associated with each observation. Each value in this
column must be unique and arranged in ascending order.
Sampling Interval: If time indices are not provided, this defines the interval between
successive observations. For example, the baseball data were collected once every year,
beginning in 1901.
Seasonality: the length of seasonality s, if any. The data are seasonal if there is a pattern that
repeats at a fixed period. For example, monthly data typically have a seasonality of s = 12.
Hourly data that repeat every day have a seasonality of s = 24. If no entry is made, the data are
assumed to be nonseasonal (s = 1).
Trading Days Adjustment: a numeric variable with n observations used to normalize the
original observations, such as the number of working days in a month. The observations in
the Data column will be divided by these values before being plotted or analyzed. There
Number of Forecasts: number of periods following the end of the data for which forecasts
are desired.
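The requirements on the time indices and the trading days adjustment can be sketched in a few lines of Python. This is only an illustration of the rules stated above, not the SnapStat's own code: the function names are hypothetical, and the monthly sales figures and working-day counts are made-up values chosen for the example.

```python
from typing import Sequence


def check_time_indices(indices: Sequence[float]) -> None:
    """Raise ValueError unless each index is strictly greater than the one before it.

    Strictly ascending order implies that every value is unique.
    """
    for prev, curr in zip(indices, indices[1:]):
        if curr <= prev:
            raise ValueError(f"index {curr!r} does not follow {prev!r} in ascending order")


def adjust_for_trading_days(data: Sequence[float], days: Sequence[float]) -> list:
    """Divide each observation by its adjustment value (one value per observation)."""
    if len(data) != len(days):
        raise ValueError("the adjustment variable needs one value per observation")
    return [y / d for y, d in zip(data, days)]


# Yearly indices for the baseball data: 1901 through 2004.
years = list(range(1901, 2005))
check_time_indices(years)

# Made-up monthly totals and working-day counts, for illustration only.
monthly_sales = [2100.0, 1900.0, 2300.0]
working_days = [21, 19, 23]
adjusted = adjust_for_trading_days(monthly_sales, working_days)
```

After the adjustment, each value is a per-working-day figure, so months with different numbers of working days become comparable.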
Output
The output from the SnapStat consists of a single page of graphs and numerical statistics.
[Figure: SnapStat output page for the leading batting average data. Recoverable panel content: a
time sequence plot of "Leading average" showing the forecast and 95.0% limits (horizontal axis
1900 to 2020); a plot against lag (0 to 25); a plot of Residual against year (1900 to 2020); a plot
against frequency (0 to 0.5); and a plot with Residual on the horizontal axis.]
Statistics legible on the page: ME = -1.077, MPE = -0.48%
Forecast table:
Period    Forecast    Lower 95% Limit    Upper 95% Limit
2005      366.743     330.772            402.715
2006      365.715     327.642            403.788
2007      365.591     326.667            404.515
2008      365.58      325.925            405.235
2009      365.579     325.216            405.943
2010      365.579     324.519            406.64
Summary Statistics: table of summary statistics calculated from the one-period ahead
forecast errors (error made in forecasting the value at time t given all data through time t-1).
The statistics include the root mean squared error (RMSE), the mean absolute percentage
error (MAPE), and the mean absolute error (MAE), all of which measure the variability of
the one-period ahead forecast errors. Small values are preferred. The mean error (ME) and
mean percentage error (MPE) measure bias and should be close to zero.
Forecasts: table of forecasted values and probability limits. The forecasts are made given all
available data. The probability limits are calculated at the level specified on the Forecasting
tab of the Preferences dialog box, accessible via the Edit menu.
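As a rough illustration of how the summary statistics relate to the one-period-ahead forecast errors, the sketch below computes all five from a list of observations and their forecasts. Several points are assumptions rather than statements about the SnapStat: the error is taken as observed minus forecast, every average divides by the number of errors (the program may use a different divisor for the RMSE), and the function name and sample values are made up for the example.

```python
import math


def forecast_error_summary(observed, forecasts):
    """Summarize one-period-ahead forecast errors.

    Assumes error = observed - forecast and plain averages over all errors.
    """
    errors = [y - f for y, f in zip(observed, forecasts)]
    pct_errors = [100.0 * e / y for e, y in zip(errors, observed)]
    n = len(errors)
    return {
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),  # spread of the errors
        "MAE": sum(abs(e) for e in errors) / n,             # spread, in data units
        "MAPE": sum(abs(p) for p in pct_errors) / n,        # spread, in percent
        "ME": sum(errors) / n,                              # bias, in data units
        "MPE": sum(pct_errors) / n,                         # bias, in percent
    }


# Made-up values, for illustration only.
observed = [350.0, 360.0, 340.0, 370.0]
forecasts = [355.0, 350.0, 345.0, 360.0]
summary = forecast_error_summary(observed, forecasts)
```

Because positive and negative errors cancel in ME and MPE but not in RMSE, MAE, or MAPE, a model can have a bias near zero and still show considerable variability.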
1. The observed data Yt, shown as point symbols, including any replacements for missing
values.
2. The one-step ahead forecasts Ft(1), displayed as a solid line through the points. These are
created using the fitted model, forecasting each time period t+1 using only the
information available at time t. The one-ahead forecast errors et are observable as the
vertical distance between the observations and the solid line.
3. Forecasts for future values Fn(k) made at time t = n, the last time at which observed data
is available. These are shown by the extension of the solid forecast line beyond the last
observation.
4. Probability limits for the forecasts at the 100(1-α)% confidence level, calculated
assuming that the noise in the system follows a normal distribution.
For mathematical details regarding the calculations, see the documentation for the Forecasting
procedure.
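The normality assumption behind the probability limits can be illustrated with a small sketch. It assumes the limits are symmetric, of the form forecast ± z × (standard error), where z is the standard normal quantile for the chosen level. That form is an assumption made for illustration; the Forecasting documentation gives the actual calculation. The function names are hypothetical, and the numbers are the 2005 and 2010 rows of the forecast table in the sample output.

```python
from statistics import NormalDist


def z_value(level):
    """Two-sided standard normal quantile for a confidence level such as 0.95."""
    alpha = 1.0 - level
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def normal_limits(forecast, std_error, level=0.95):
    """Symmetric limits: forecast minus/plus z times the standard error."""
    half_width = z_value(level) * std_error
    return forecast - half_width, forecast + half_width


def implied_std_error(lower, upper, level=0.95):
    """Back out the standard error implied by a pair of symmetric limits."""
    return (upper - lower) / (2.0 * z_value(level))


# 2005 row of the forecast table: forecast 366.743, limits 330.772 and 402.715.
se_2005 = implied_std_error(330.772, 402.715)
lower_2005, upper_2005 = normal_limits(366.743, se_2005)

# 2010 row: limits 324.519 and 406.64.
se_2010 = implied_std_error(324.519, 406.64)
```

Under this assumption the implied standard error grows from 2005 to 2010, which matches the widening of the limits in the table as the forecast horizon increases.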
Analysis Options
Display: if desired, the plot may be limited to the specified number of most recent
observations.
SnapStat Defaults
The defaults used by the Automatic Forecasting SnapStat are set on the Forecasting tab of the
Preferences dialog box under the Edit menu:
Models Included: specify the models that should be fit to the data. These are the models
from which the “best” model will be selected. Descriptions of each of the models are given
in the Forecasting documentation. For several of the models, additional options are provided:
Random walk model – check include constant to consider a model containing a constant
as well as one without a constant.
Moving average model – select the maximum span to consider. Models will be fit for each
span from 2 through the number indicated.
ARIMA AR Terms – specify the maximum order p of the autoregressive terms in the
model.
ARIMA MA Terms – specify the maximum order q of the moving average terms in the
model. You may elect instead to consider only models for which q = p – 1.
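To make the first two options concrete, the sketch below generates one-step-ahead forecasts for a random walk (with and without a constant) and for simple moving averages of each candidate span. It relies on textbook definitions that are assumed here rather than taken from the Forecasting documentation: the random walk forecast is the previous value plus an optional constant estimated as the mean first difference, and the moving average forecast is the mean of the preceding values within the span. The function names and the short series are made up for the example.

```python
def random_walk_forecasts(y, include_constant=False):
    """One-step-ahead forecasts for y[1:], each based on the preceding value."""
    drift = 0.0
    if include_constant:
        # Mean of the first differences, which telescopes to this expression.
        drift = (y[-1] - y[0]) / (len(y) - 1)
    return [prev + drift for prev in y[:-1]]


def moving_average_forecasts(y, span):
    """One-step-ahead forecasts for y[span:], each the mean of the preceding span values."""
    return [sum(y[t - span:t]) / span for t in range(span, len(y))]


def candidate_spans(max_span):
    """Spans 2 through the selected maximum, inclusive."""
    return list(range(2, max_span + 1))


# Made-up series, for illustration only.
series = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
rw_plain = random_walk_forecasts(series)
rw_const = random_walk_forecasts(series, include_constant=True)
ma_by_span = {m: moving_average_forecasts(series, m) for m in candidate_spans(4)}
```

Each candidate produces its own set of one-step-ahead errors, which is what allows the candidates to be compared by a single criterion.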
The procedure fits each of the models indicated and selects the model that gives the smallest
value of the selected criterion. There are three criteria to choose from:
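Akaike Information Criterion
The Akaike Information Criterion (AIC) is calculated from
$$ \mathrm{AIC} = 2\ln(\mathrm{RMSE}) + \frac{2c}{n} \tag{1} $$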
where RMSE is the root mean squared error during the estimation period, c is the number of
estimated coefficients in the fitted model, and n is the sample size used to fit the model. Notice
that the AIC is a function of the variance of the model residuals, penalized by the number of
estimated parameters. In general, the model will be selected that minimizes the mean squared
error without using too many coefficients (relative to the amount of data available).
Hannan-Quinn Criterion
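The Hannan-Quinn Criterion (HQC) is calculated from
$$ \mathrm{HQC} = 2\ln(\mathrm{RMSE}) + \frac{2p\,\ln(\ln n)}{n} \tag{2} $$
where p is the number of estimated parameters (the quantity denoted c in Equation 1).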
This criterion uses a different penalty for the number of estimated parameters.
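Schwarz-Bayesian Information Criterion
The Schwarz-Bayesian Information Criterion (SBIC) is calculated from
$$ \mathrm{SBIC} = 2\ln(\mathrm{RMSE}) + \frac{p\,\ln n}{n} \tag{3} $$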
Again, the penalty for the number of estimated parameters differs from that of the other two criteria.
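The three criteria in Equations 1 to 3 differ only in the penalty term, which the following sketch makes explicit. The formulas follow the equations as given; the function names, the two candidate models, and their RMSE values are hypothetical, chosen only to show how the selection works. The sample size n = 104 corresponds to the 104 yearly observations in the baseball data.

```python
import math


def aic(rmse, n_params, n):
    """Equation 1: penalty of 2 per estimated parameter, divided by n."""
    return 2.0 * math.log(rmse) + 2.0 * n_params / n


def hqc(rmse, n_params, n):
    """Equation 2: penalty of 2*ln(ln(n)) per estimated parameter, divided by n."""
    return 2.0 * math.log(rmse) + 2.0 * n_params * math.log(math.log(n)) / n


def sbic(rmse, n_params, n):
    """Equation 3: penalty of ln(n) per estimated parameter, divided by n."""
    return 2.0 * math.log(rmse) + n_params * math.log(n) / n


def select_model(candidates, criterion, n):
    """Return the name of the candidate with the smallest criterion value."""
    return min(candidates, key=lambda name: criterion(*candidates[name], n))


n = 104
# Hypothetical candidates: name -> (RMSE, number of estimated parameters).
candidates = {"simple": (18.4, 1), "complex": (18.3, 3)}
best = {f.__name__: select_model(candidates, f, n) for f in (aic, hqc, sbic)}

# Per-parameter penalties (before dividing by n) at this sample size.
penalties = {"AIC": 2.0, "HQC": 2.0 * math.log(math.log(n)), "SBIC": math.log(n)}
```

At n = 104 the per-parameter penalty is smallest for the AIC and largest for the SBIC, so the SBIC leans hardest toward models with few parameters. In this made-up comparison, the small RMSE improvement of the larger model does not offset its two extra parameters under any of the three criteria.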