Automatic Forecasting SnapStat

STATGRAPHICS – Rev. 7/3/2009
Summary
The Automatic Forecasting SnapStat creates a one-page summary of forecasts generated for a
time series. Like the Automatic Forecasting procedure, this SnapStat tries a collection of
forecasting models and selects the one that gives the best fit according to a specified criterion.
Unlike that procedure, however, the SnapStat output is preformatted to fit on a single page.
Sample StatFolio: autocastsnapstat.sgp
Sample Data:
The file baseball.sgd contains the leading batting average in U.S. Major League Baseball for
each year between 1901 and 2004. Batting averages represent the proportion of times that a
player gets a hit out of all at-bats that result in either a hit or an out. The table below shows a
partial list of the data from that file. The batting averages are expressed as the number of points
out of 1000, such that a player batting 333 would have gotten a hit one-third of the time.
Year    Leading average
1901    422
1902    376
1903    355
1904    381
1905    377
1906    358
1907    350
1908    354
1909    377
1910    385
…       …
2004    372
Forecasts are desired for the next several years.
© 2009 by StatPoint Technologies, Inc.

Data Input
The data input dialog box requests the name of the column containing the time series data and
information about how it was sampled:
• Data: numeric column containing n equally spaced numeric observations.
• Time indices: time, date or other index associated with each observation. Each value in this
column must be unique and arranged in ascending order.
• Sampling Interval: if time indices are not provided, this defines the interval between
successive observations. For example, the baseball data were collected once every year,
beginning in 1901.
• Seasonality: the length of seasonality s, if any. The data are seasonal if there is a pattern that
repeats at a fixed period. For example, monthly data typically have a seasonality of s = 12.
Hourly data that repeat every day have a seasonality of s = 24. If no entry is made, the data are
assumed to be nonseasonal (s = 1).
• Trading Days Adjustment: a numeric variable with n observations used to normalize the
original observations, such as the number of working days in a month. The observations in
the Data column will be divided by these values before being plotted or analyzed. There
must be enough entries in this column to cover both the observed data and the number of
periods for which forecasts are requested.
• Select: subset selection.
• Number of Forecasts: number of periods following the end of the data for which forecasts
are desired.
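
The input rules above can be illustrated with a short Python sketch. It is not part of STATGRAPHICS: the function name prepare_series and the sample values are hypothetical, and the checks simply mirror the requirements stated for the Time indices and Trading Days Adjustment fields.

```python
def prepare_series(data, time_indices=None, trading_days=None, n_forecasts=0):
    """Check the inputs and apply the trading-days adjustment (hypothetical helper)."""
    n = len(data)
    if time_indices is not None:
        if len(time_indices) != n:
            raise ValueError("one time index is needed per observation")
        # Unique values in ascending order means strictly increasing.
        if any(b <= a for a, b in zip(time_indices, time_indices[1:])):
            raise ValueError("time indices must be unique and ascending")
    if trading_days is None:
        return list(data)
    # The adjustment column must cover the data plus the forecast periods.
    if len(trading_days) < n + n_forecasts:
        raise ValueError("trading days must cover the data and the forecast periods")
    # Each observation is divided by its trading-days value.
    return [y / d for y, d in zip(data, trading_days)]


# Made-up monthly totals and working-day counts, with 2 forecast periods requested.
monthly_totals = [2100.0, 1900.0, 2300.0]
working_days = [21, 19, 23, 22, 20]
adjusted = prepare_series(monthly_totals, trading_days=working_days, n_forecasts=2)
# adjusted == [100.0, 100.0, 100.0]
```

In this sketch the two extra entries in working_days correspond to the forecast periods, which is why the column is longer than the data.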

Output
The output from the SnapStat consists of a single page of graphs and numerical statistics.
SnapStat: Automatic Forecasting

Data variable: Leading average
Model: ARIMA(0,1,1)

RMSE = 17.7    MAE = 13.96    MAPE = 3.81%
ME = -1.077    MPE = -0.48%

Period    Forecast    Lower 95% Limit    Upper 95% Limit
2005      366.743     330.772            402.715
2006      365.715     327.642            403.788
2007      365.591     326.667            404.515
2008      365.58      325.925            405.235
2009      365.579     325.216            405.943
2010      365.579     324.519            406.64

[Figure: one-page SnapStat output with five plots. Time Series Plot (top right): Leading average by year, showing actual values, forecasts, and 95.0% limits. Residual Autocorrelations (center left): autocorrelations versus lag. Residual Plot (center right): residuals by year. Residual Periodogram (bottom left): ordinate versus frequency. Normal Probability Plot (bottom right): percentage versus residual.]


Model Statistics and Forecasts (top left)
The top left section of the output summarizes the selected forecasting model, which in this case
is an ARIMA(0,1,1) model. Included are:
• Summary Statistics: table of summary statistics calculated from the one-period ahead
forecast errors (error made in forecasting the value at time t given all data through time t-1).
The statistics include the root mean squared error (RMSE), the mean absolute percentage
error (MAPE), and the mean absolute error (MAE), all of which measure the variability of
the one-period ahead forecast errors. Small values are preferred. The mean error (ME) and
mean percentage error (MPE) measure bias and should be close to zero.
• Forecasts: table of forecasted values and probability limits. The forecasts are made given all
available data. The probability limits are calculated at the level specified on the Forecasting
tab of the Preferences dialog box, accessible via the Edit menu.
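
As an illustration of the summary statistics, the following Python sketch computes them from a set of one-period ahead forecasts. It rests on two assumptions that the text above does not state: each error is taken as the actual value minus the forecast, and every average divides by the number of errors (the program may adjust the RMSE divisor for the number of estimated coefficients). The function name and the forecast values are hypothetical.

```python
import math


def forecast_error_stats(actual, one_step_forecasts):
    """Summary statistics of one-period ahead forecast errors (illustrative)."""
    # Assumed sign convention: error = actual - forecast.
    errors = [y - f for y, f in zip(actual, one_step_forecasts)]
    n = len(errors)
    # Percentage errors are relative to the actual values.
    pct_errors = [100.0 * e / y for e, y in zip(errors, actual)]
    return {
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": sum(abs(p) for p in pct_errors) / n,
        "ME": sum(errors) / n,
        "MPE": sum(pct_errors) / n,
    }


# Actual values for 1902-1905 from the sample data; the forecasts are made up.
actual = [376, 355, 381, 377]
made_up_forecasts = [380, 360, 375, 377]
stats = forecast_error_stats(actual, made_up_forecasts)
```

RMSE, MAE and MAPE can only be zero or positive, so they describe the size of the errors, while ME and MPE keep the sign and therefore reveal systematic over- or under-forecasting.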
Time Sequence Plot (top right)

The plot shows:
1. The observed data Y_t, shown as point symbols, including any replacements for missing
values.
2. The one-step-ahead forecasts F_t(1), displayed as a solid line through the points. These are
created using the fitted model, forecasting each time period t+1 using only the
information available at time t. The one-step-ahead forecast errors e_t are observable as the
vertical distance between the observations and the solid line.
3. Forecasts for future values F_n(k) made at time t = n, the last time at which observed data
is available. These are shown by the extension of the solid forecast line beyond the last
observation.
4. Probability limits for the forecasts at the 100(1-α)% confidence level, calculated
assuming that the noise in the system follows a normal distribution.
For mathematical details regarding the calculations, see the documentation for the Forecasting
procedure.
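
The role of the normality assumption in item 4 can be sketched in Python as follows. This is only an outline under stated assumptions: the standard error of each forecast is taken as a given input (its calculation depends on the fitted model and is covered in the Forecasting documentation), a normal quantile is used, and the function name and the standard error value are hypothetical.

```python
from statistics import NormalDist


def probability_limits(forecast, std_error, confidence=0.95):
    """Two-sided limits: forecast +/- z * std_error, assuming normal noise."""
    alpha = 1.0 - confidence
    # Upper alpha/2 quantile of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return forecast - z * std_error, forecast + z * std_error


# The forecast is taken from the sample output; the standard error of 18.0
# is a made-up value used only for illustration.
lower, upper = probability_limits(366.743, 18.0)
```

The limits are symmetric about the forecast, and they widen as the confidence level or the standard error increases.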
Residual Autocorrelations (center left)

The residual autocorrelations measure the correlations amongst the residuals from the fitted
forecasting model. If the model has captured all of the dynamic structure in the data, then the
residuals should be random (white noise). In such a case, all of the estimates should be within the
probability limits, as in the above plot.
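
A minimal Python sketch of the residual autocorrelation check is shown next. The estimator is the usual sample autocorrelation; the limits of ±1.96/√n are a common large-sample approximation for white noise and are an assumption here, since the program may use a different standard error. Both function names are hypothetical.

```python
import math


def residual_autocorrelations(residuals, max_lag):
    """Sample autocorrelations of the residuals at lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]


def white_noise_limit(n, z=1.96):
    """Approximate limit for a white-noise autocorrelation (assumed form)."""
    return z / math.sqrt(n)


# A strictly alternating series is far from white noise: its lag-1
# autocorrelation is strongly negative.
acf = residual_autocorrelations([1, -1, 1, -1, 1, -1], max_lag=2)
```

Estimates that fall outside the limits at one or more lags suggest structure that the forecasting model has not captured.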

Residual Plot (center right)
This plot shows the residuals in sequential order. It can be helpful in finding outliers or identifying
trends that the forecasting model has missed. Ideally, the residuals should behave like a random
set of observations from a normal distribution.
Residual Periodogram (bottom left)

The residual periodogram can be used to identify cyclical components that have not been
captured by the forecasting model. The periodogram plots the power remaining at each of the
Fourier frequencies. If the residuals are random, there should be approximately equal power at all
frequencies, which is why a random time series is often called “white noise”. Any large spikes
could indicate a cycle at a fixed frequency that, if modeled, might improve the forecasts.
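
The idea behind the periodogram can be sketched in Python as below. The sketch evaluates the Fourier frequencies i/n for i = 1, ..., n/2 and uses one common textbook scaling of the ordinate, (n/2)(a² + b²); the scaling used by the program is not given here and may differ. The function name is hypothetical.

```python
import math


def periodogram(residuals):
    """Return (frequency, ordinate) pairs at the Fourier frequencies i/n."""
    n = len(residuals)
    mean = sum(residuals) / n
    x = [r - mean for r in residuals]
    result = []
    for i in range(1, n // 2 + 1):
        f = i / n
        # Cosine and sine coefficients at frequency f.
        a = (2.0 / n) * sum(x[t] * math.cos(2 * math.pi * f * t) for t in range(n))
        b = (2.0 / n) * sum(x[t] * math.sin(2 * math.pi * f * t) for t in range(n))
        # Assumed scaling of the ordinate: (n / 2) * (a^2 + b^2).
        result.append((f, (n / 2.0) * (a * a + b * b)))
    return result


# A pure cycle of period 4 puts all of its power at frequency 1/4.
cycle = [1, 0, -1, 0, 1, 0, -1, 0]
pgram = periodogram(cycle)
peak_frequency = max(pgram, key=lambda pair: pair[1])[0]
# peak_frequency == 0.25
```

For residuals that behave like white noise, no single frequency would dominate in this way.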
Residual Normal Probability Plot (bottom right)

The normal probability plot is used to determine whether the residuals left behind by the
forecasting model follow a normal distribution. If so, they should fall approximately along the
reference line. A plot such as that displayed above, which shows some curvature in the tails,
indicates that the data have some positive skewness. In such cases, it may be
helpful to transform the data using Analysis Options.
Analysis Options
• Display: if desired, the plot may be limited to the specified number of most recent
observations.
• Transformation: the transformation to be applied to the data, if any. If Box-Cox is selected,
the program will automatically determine an appropriate power transformation to normalize
the data, after adding the specified Addend to each data value. Note: the Box-Cox option can
be very time-consuming if many models are being compared, since the program will fit every
model at each iteration of the Box-Cox optimization algorithm.
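
For reference, the standard form of the Box-Cox power transformation can be sketched in Python as follows. This shows only the transformation for a given power, after adding the Addend; the search for the best power is not shown, and the program may use a rescaled variant of the formula. The function and parameter names are hypothetical.

```python
import math


def box_cox(values, power, addend=0.0):
    """Standard Box-Cox transform of (value + addend) for a given power."""
    shifted = [v + addend for v in values]
    if any(s <= 0 for s in shifted):
        raise ValueError("values plus addend must be positive")
    if power == 0:
        # The limiting case of the power transformation is the natural log.
        return [math.log(s) for s in shifted]
    return [(s ** power - 1.0) / power for s in shifted]


# A power of 0.5 corresponds to a square root transformation.
transformed = box_cox([1.0, 4.0], power=0.5)
# transformed == [0.0, 2.0]
```

Powers below 1 compress large values more than small ones, which is why such transformations can reduce positive skewness.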

SnapStat Defaults
The defaults used by the Automatic Forecasting SnapStat are set on the Forecasting tab of the
Preferences dialog box under the Edit menu:
• Models Included: specify the models that should be fit to the data. These are the models
from which the “best” model will be selected. Descriptions of each of the models are given
in the Forecasting documentation. For several of the models, additional options are provided:
    Random walk model – check include constant to consider a model containing a constant
    as well as one without a constant.
    Moving average model – select the maximum span to consider. Models will be fit of
    spans 2 through the number indicated.
    ARIMA AR Terms – specify the maximum order p of the autoregressive terms in the
    model.
    ARIMA MA Terms – specify the maximum order q of the moving average terms in the
    model. You may elect instead to consider only models for which q = p – 1.
    ARIMA Differencing – specify the maximum order of differencing d. Select Include
    constant to consider models that include a constant term when differencing is performed.
• Information Criterion: the criterion used to select the best model.
• Forecast Limits: percentage used for the forecast probability limits.
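
To show how the maximum-order settings translate into a set of candidate models, here is a Python sketch. It is one plausible reading of the options rather than the program's actual enumeration: it assumes that every combination of orders from 0 up to each maximum is tried, and that the q = p – 1 option replaces the MA limit and applies only when p is at least 1. Both function names are hypothetical.

```python
from itertools import product


def moving_average_spans(max_span):
    """Spans 2 through the maximum span, as described for the moving average model."""
    return list(range(2, max_span + 1))


def candidate_arima_orders(max_p, max_q, max_d, q_equals_p_minus_1=False):
    """List (p, d, q) orders up to the given maxima (assumed enumeration)."""
    orders = []
    for p, d in product(range(max_p + 1), range(max_d + 1)):
        if q_equals_p_minus_1:
            # Assumption: the restriction only yields a model when p >= 1.
            q_values = [p - 1] if p >= 1 else []
        else:
            q_values = range(max_q + 1)
        for q in q_values:
            orders.append((p, d, q))
    return orders


all_orders = candidate_arima_orders(max_p=1, max_q=1, max_d=1)
restricted = candidate_arima_orders(max_p=2, max_q=1, max_d=1, q_equals_p_minus_1=True)
```

Under these assumptions the restriction shrinks the candidate set considerably, which reduces the number of models that must be fit.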
The procedure fits each of the models indicated and selects the model that gives the smallest
value of the selected criterion. There are three criteria to choose from:
Akaike Information Criterion

The Akaike Information Criterion (AIC) is calculated from
\[ \mathrm{AIC} = 2\ln(\mathrm{RMSE}) + \frac{2c}{n} \qquad (1) \]
where RMSE is the root mean squared error during the estimation period, c is the number of
estimated coefficients in the fitted model, and n is the sample size used to fit the model. Notice
that the AIC is a function of the variance of the model residuals, penalized by the number of
estimated parameters. In general, the model will be selected that minimizes the mean squared
error without using too many coefficients (relative to the amount of data available).
Hannan-Quinn Criterion
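The Hannan-Quinn Criterion (HQC) is calculated from
\[ \mathrm{HQC} = 2\ln(\mathrm{RMSE}) + \frac{2p\ln(\ln(n))}{n} \qquad (2) \]
where p is the number of estimated coefficients (denoted c in equation (1)). This criterion uses
a different penalty for the number of estimated parameters.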
Schwarz-Bayesian Information Criterion

The Schwarz-Bayesian Information Criterion (SBIC) is calculated from
\[ \mathrm{SBIC} = 2\ln(\mathrm{RMSE}) + \frac{p\ln(n)}{n} \qquad (3) \]
Again, the penalty for the number of estimated parameters is different than for the other criteria.
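
The three criteria in equations (1) through (3) and the selection rule can be sketched in Python as follows. The sketch assumes that p in equations (2) and (3) is the same count of estimated coefficients as c in equation (1). The function names, model labels and fit values are hypothetical and serve only to show the comparison.

```python
import math


def aic(rmse, c, n):
    """Equation (1): 2 ln(RMSE) + 2c/n."""
    return 2.0 * math.log(rmse) + 2.0 * c / n


def hqc(rmse, c, n):
    """Equation (2): 2 ln(RMSE) + 2c ln(ln(n))/n."""
    return 2.0 * math.log(rmse) + 2.0 * c * math.log(math.log(n)) / n


def sbic(rmse, c, n):
    """Equation (3): 2 ln(RMSE) + c ln(n)/n."""
    return 2.0 * math.log(rmse) + c * math.log(n) / n


def select_best(fits, criterion=aic):
    """Return the name of the fit with the smallest criterion value."""
    return min(fits, key=lambda name: criterion(*fits[name]))


# Made-up fits: name -> (RMSE, number of estimated coefficients, sample size).
fits = {
    "model A": (17.7, 1, 103),
    "model B": (17.6, 4, 103),
}
best = select_best(fits, criterion=aic)
# best == "model A"
```

In this made-up comparison, model B has a slightly smaller RMSE, but its three extra coefficients add more to the penalty than the better fit saves, so model A is selected.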
