
TIME SERIES

J. REEVES WESLEY
An Introduction

• A time series is a set of observations obtained by measuring a single variable regularly over a period of time.

• E.g., a series of inventory data, a product's market share (weekly market share figures taken over a few years), or a series of total sales figures.

• What each of these examples has in common: some variable was observed at regular, known intervals over a certain length of time.

• The form of the data for a typical time series is a single sequence or list of observations representing measurements taken at regular intervals.

• Why TIME SERIES? To forecast future values of the series.

• A model of the series that explains the past values may also predict whether and by how much the next few values will increase or decrease.
• A univariate time series is a sequence of
measurements of the same variable collected
over time.  Most often, the measurements are
made at regular time intervals.

• One difference from standard linear regression is that the data are not necessarily independent and not necessarily identically distributed.
• An analyst for a national broadband provider is
required to produce forecasts of user subscriptions
in order to predict utilization of bandwidth.
Forecasts are needed for each of the 85 local
markets that make up the national subscriber base.
Monthly historical data is collected.
• It is always a good idea to have a feel for the nature
of your data before building a model.
• Does the data exhibit seasonal variations?
– Although the Expert Modeler will automatically find the
best seasonal or non-seasonal model for each series, you
can often obtain faster results by limiting the search to
non-seasonal models when seasonality is not present in
your data.
• Select Total Number of Subscribers and move it into
the Variables list.
• Select Date and move it into the Time Axis Labels
box.
• Click OK.
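These steps produce a sequence chart in SPSS. As a rough analog, here is a minimal Python sketch using pandas and matplotlib, assuming the subscriber data sit in a hypothetical broadband.csv with a DATE column and a Total column (names are illustrative, not the tutorial's):

```python
# Hypothetical file and column names; the real tutorial data ship with SPSS.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("broadband.csv", parse_dates=["DATE"], index_col="DATE")
df["Total"].plot(title="Total Number of Subscribers")  # the date index labels the time axis
plt.show()
```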
• The series exhibits a very smooth upward
trend with no hint of seasonal variations.
There might be individual series with
seasonality, but it appears that seasonality is
not a prominent feature of the data in general.
• Verify that Expert Modeler is selected in the
Method drop-down list. The Expert Modeler will
automatically find the best-fitting model for each of
the dependent variable series.
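The Expert Modeler's search is internal to SPSS. As a loose illustration of the idea only, the sketch below scores a small grid of non-seasonal ARIMA orders by BIC and keeps the best one, assuming y is a monthly pandas Series:

```python
# Not the Expert Modeler itself: a brute-force ARIMA(p,d,q) search scored by BIC.
import itertools
from statsmodels.tsa.arima.model import ARIMA

best_bic, best_order = float("inf"), None
for p, d, q in itertools.product(range(3), range(2), range(3)):
    try:
        res = ARIMA(y, order=(p, d, q)).fit()
    except Exception:
        continue  # skip orders that fail to estimate
    if res.bic < best_bic:
        best_bic, best_order = res.bic, (p, d, q)
print("selected order:", best_order)
```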

• The set of cases used to estimate the model is referred to as the estimation period.
• If you are forecasting beyond the last case,
you will need to extend the forecast period.

• The data does not exhibit any seasonality, so there is no need to consider seasonal models. This reduces the space of models searched by the Expert Modeler and can significantly reduce computing time.
• You can set the estimation period by selecting
Based on time or case range in the Select
Cases dialog box.

• Deselect Expert Modeler considers seasonal models in the Model Type group. Why?
OPTIONS Tab
• Select First case after end of estimation period
through a specified date in the Forecast Period
group.
• In the Date grid, enter 2004 for the year and 3
for the month.
– The dataset contains data from January 1999
through December 2003. With the current
settings, the forecast period will be January 2004
through March 2004.
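In statsmodels terms, forecasting past the last case simply means asking the fitted model for extra steps. A minimal sketch, assuming y is a monthly Series ending December 2003 and an ARIMA(0,1,1) order chosen only for illustration:

```python
from statsmodels.tsa.arima.model import ARIMA

res = ARIMA(y, order=(0, 1, 1)).fit()   # estimation period: Jan 1999 - Dec 2003
fc = res.get_forecast(steps=3)          # forecast period: Jan 2004 - Mar 2004
print(fc.predicted_mean)                # point forecasts
print(fc.conf_int())                    # confidence limits, if needed
```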
SAVE Tab
• Click the Save tab.
• Select (check) the entry for Predicted Values in
the Save column, and leave the default value
Predicted as the Variable Name Prefix.
• The model predictions are saved as new
variables in the active dataset, using the prefix
Predicted for the variable names.
STATISTICS Tab
• Click the Statistics tab.
• Select Display forecasts.
• This option produces a table of forecasted values for each dependent variable series and provides another option—other than saving the predictions as new variables—for obtaining these values.
– The default selection of Goodness of fit (in the Statistics for
Comparing Models group) produces a table with fit statistics—
such as R-squared, mean absolute percentage error, and
normalized BIC—calculated across all of the models. It provides
a concise summary of how well the models fit the data.
PLOTS Tab
• Deselect Series in the Plots for Individual Models
group. Why?

• We are more interested in saving the forecasts as new variables than in generating plots of the forecasts.
• Select Mean absolute percentage error and Maximum absolute percentage error in the Plots for Comparing Models group.
– Absolute percentage error is a measure of how much a
dependent series varies from its model-predicted
level. By examining the mean and maximum across all
models, you can get an indication of the uncertainty in
your predictions.
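Both statistics are simple to compute by hand. A hedged sketch, assuming actual and predicted are dicts of pandas Series keyed by market name (a hypothetical structure, not the SPSS dataset layout):

```python
import numpy as np
import matplotlib.pyplot as plt

mape, maxape = [], []
for market in actual:
    ape = 100 * np.abs((actual[market] - predicted[market]) / actual[market])
    mape.append(ape.mean())    # mean absolute percentage error for this model
    maxape.append(ape.max())   # worst single-period error for this model

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.hist(mape); ax1.set_title("MAPE across models")
ax2.hist(maxape); ax2.set_title("MaxAPE across models")
plt.show()
```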
Interpretation
• This histogram displays the mean absolute percentage error
(MAPE) across all models. It shows that all models display a
mean uncertainty of roughly 1%.

• This histogram displays the maximum absolute percentage error (MaxAPE) across all models and is useful for imagining a worst-case scenario for your forecasts. It shows that the largest percentage error for each model falls in the range of 1 to 5%.
Do these values represent an acceptable amount of uncertainty?
This is a situation in which your business sense comes into play
because acceptable risk will change from problem to problem.
• Three new cases, containing the forecasts for
January 2004 through March 2004, have been
added to the dataset, along with automatically
generated date labels. Each of the new
variables contains the model predictions for
the estimation period (January 1999 through
December 2003), allowing you to see how well
the model fits the known values.
• You have now seen two approaches for
obtaining the forecasted values: saving the
forecasts as new variables in the active
dataset and creating a forecast table. With
either approach, you will have a number of
options available for exporting your forecasts.
Summary
• Use the Time Series Modeler to create models for your time series data and to produce initial forecasts based on available data.
Bulk Reforecasting by Applying Saved
Models
• Reuse these models to extend your forecasts as more
current data becomes available.

• In this scenario, you are an analyst for a national broadband provider who is required to produce monthly forecasts of user subscriptions for each of 85 local markets. You have already used the Expert Modeler to create models and to forecast three months into the future. Your data warehouse has been refreshed with actual data for the original forecast period, so you would like to use that data to extend the forecast horizon by another three months.
• Analyze > Forecasting > Apply Models...
• Click Browse, then navigate to and select
Timeseriesoutput.xml
• Select Reestimate from data.
• Select First case after end of estimation period
through a specified date in the Forecast Period
group.
• In the Date grid, enter 2004 for the year and 6 for
the month.
– The dataset contains data from January 1999 through
March 2004. With the current settings, the forecast
period will be April 2004 through June 2004.
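Reestimate from data keeps the structure of the saved model but refreshes its parameter estimates with the new observations. A rough statsmodels analog, assuming res is the previously fitted results object and new_obs holds the newly observed January-March 2004 values (both names are assumptions):

```python
# Same model specification; parameters re-estimated on the extended series.
updated = res.append(new_obs, refit=True)
print(updated.get_forecast(steps=3).predicted_mean)  # forecasts for Apr-Jun 2004
```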
SAVE Tab
• Select (check) the entry for Predicted Values in
the Save column and leave the default value
Predicted as the Variable Name Prefix.
PLOTS Tab
• Deselect Series in the Plots for Individual
Models group.

• Click OK.
MODEL FIT STATISTICS
• It provides a concise summary of how well the models,
with re-estimated parameters, fit the data.
• For instance, 95% of the models have a value of
MaxAPE (maximum absolute percentage error) that is
less than 3.676.
• Absolute percentage error is a measure of how much a
dependent series varies from its model-predicted level
and provides an indication of the uncertainty in your
predictions. The mean absolute percentage error varies
from a minimum of 0.669% to a maximum of 1.026%
across all models. The maximum absolute percentage
error varies from 1.742% to 4.373% across all models.
• So the mean uncertainty in each model’s
predictions is about 1% and the maximum
uncertainty is around 2.5% (the mean value of
MaxAPE), with a worst-case scenario of about 4%.

• Whether these values represent an acceptable amount of uncertainty depends on the degree of risk you are willing to accept.
• The Data Editor shows the new variables containing
the model predictions. Although only two are
shown here, there are 85 new variables, one for
each of the 85 dependent series. The variable
names consist of the default prefix Predicted,
followed by the name of the associated dependent
variable (for example, Market_1), followed by a
model identifier (for example, Model_1).
• Three new cases, containing the forecasts for April
2004 through June 2004, have been added to the
dataset, along with automatically generated date
labels.
Using the Expert Modeler to Determine
Significant Predictors
A catalog company, interested in developing a
forecasting model, has collected data on monthly
sales of men’s clothing along with several series
that might be used to explain some of the variation
in sales. Possible predictors include the number of
catalogs mailed, the number of pages in the catalog,
the number of phone lines open for ordering, the
amount spent on print advertising, and the number
of customer service representatives. Are any of
these predictors useful for forecasting?
• You will use the Expert Modeler with all of the
candidate predictors to find the best model. Since
the Expert Modeler only selects those predictors
that have a statistically significant relationship with
the dependent series, you will know which
predictors are useful, and you will have a model for
forecasting with them.
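SPSS performs this selection automatically. As an illustrative alternative, one can fit a regression-with-ARIMA-errors model in statsmodels and read off the predictor p-values; a sketch assuming y holds monthly sales, X is a DataFrame with the five candidate series, and the order is chosen only for the example:

```python
from statsmodels.tsa.statespace.sarimax import SARIMAX

model = SARIMAX(y, exog=X, order=(0, 1, 1), seasonal_order=(0, 1, 1, 12))
res = model.fit(disp=False)
pvals = res.pvalues[X.columns]   # one p-value per candidate predictor
print(pvals[pvals < 0.05])       # the statistically significant predictors
```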
Seasonality vs. Non-Seasonality
• Select Sales of Men’s Clothing and move it into
the Variables list.
• Select Date and move it into the Time Axis
Labels box.
• Click OK.
• The series exhibits numerous peaks, many of which
appear to be equally spaced, as well as a clear
upward trend.
• The equally spaced peaks suggest the presence of a periodic component in the time series.
• There are also peaks that do not appear to be part
of the seasonal pattern and which represent
significant deviations from the neighboring data
points.
• These points may be outliers, which can and should be addressed by the Expert Modeler.
• Select Parameter estimates.
– This option produces a table displaying all of the
parameters, including the significant predictors, for the
model chosen by the Expert Modeler.
• Deselect Forecasts.
– In the current example, we are only interested in
determining the significant predictors and building a
model. We will not be doing any forecasting.
• Select Fit values.
• This option displays the predicted values in the
period used to estimate the model. This period is
referred to as the estimation period, and it
includes all cases in the active dataset for this
example. These values provide an indication of how
well the model fits the observed values, so they are
referred to as fit values. The resulting plot will
consist of both the observed values and the fit
values.
• The model description table contains an entry for
each estimated model and includes both a model
identifier and the model type.

• In the current example, the dependent variable is Sales of Men's Clothing and the system-assigned name is Model_1.
• The Time Series Modeler supports both exponential
smoothing and ARIMA models. Exponential
smoothing model types are listed by their
commonly used names such as Holt and Winters’
Additive. ARIMA model types are listed using the
standard notation of ARIMA(p,d,q)(P,D,Q), where p
is the order of autoregression, d is the order of
differencing (or integration), and q is the order of
moving-average, and (P,D,Q) are their seasonal
counterparts.
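That notation maps directly onto statsmodels' SARIMAX arguments; a minimal sketch (the seasonal period s, here 12 for monthly data, must be written explicitly in Python even though SPSS leaves it implicit):

```python
from statsmodels.tsa.statespace.sarimax import SARIMAX

# ARIMA(0,1,1)(0,1,1): p=0, d=1, q=1 and seasonal P=0, D=1, Q=1 with s=12
model = SARIMAX(y, order=(0, 1, 1), seasonal_order=(0, 1, 1, 12))
res = model.fit(disp=False)
```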
• The Expert Modeler has determined that sales of
men’s clothing is best described by a seasonal
ARIMA model with one order of differencing. The
seasonal nature of the model accounts for the
seasonal peaks that we saw in the series plot, and
the single order of differencing reflects the upward
trend that was evident in the data.
Model Statistics Table
• The model statistics table provides summary
information and goodness-of-fit statistics for
each estimated model.

• Two of the five candidate predictors were used for prediction.


Model fitness
• The Time Series Modeler offers a number of different goodness-of-fit statistics; we opted only for the stationary R-squared value. This statistic provides an estimate of the proportion of the total variation in the series that is explained by the model and is preferable to ordinary R-squared when there is a trend or seasonal pattern, as is the case here. Larger values of stationary R-squared (up to a maximum value of 1) indicate better fit.
• The Ljung-Box statistic, also known as the modified Box-Pierce statistic, provides an indication of whether the model is correctly specified.
• A significance value less than 0.05 implies that there is
structure in the observed series which is not
accounted for by the model.
• The value of 0.984 shown here is not significant, so we
can be confident that the model is correctly specified.
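The same diagnostic is available outside SPSS. A hedged sketch using statsmodels, assuming res is the fitted model's results object:

```python
from statsmodels.stats.diagnostic import acorr_ljungbox

# Ljung-Box test on the residuals; a p-value above 0.05 gives no evidence of
# structure left unexplained by the model.
print(acorr_ljungbox(res.resid, lags=[18]))
```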
• The Expert Modeler detected nine points that were
considered to be outliers.
• Each of these points has been modeled appropriately,
so there is no need for you to remove them from the
series.
ARIMA Model Parameters Table
• The ARIMA model parameters table displays values
for all of the parameters in the model, with an entry
for each estimated model labeled by the model
identifier.

• We already know from the model statistics table that there are two significant predictors. The model parameters table shows us that they are the Number of Catalogs Mailed and the Number of Phone Lines Open for Ordering.
SUMMARY
• You have built a model, identified significant predictors, and saved the resulting model to an external file.

• You are now in a position to use the Apply Time Series Models procedure to experiment with alternative scenarios for the predictor series and see how the alternatives affect the sales forecasts.
Experimenting with Predictors by
Applying Saved Models
• You have sales data for men's clothing from January 1989 through December 1998, along with several series that are thought to be potentially useful as predictors of future sales. The Expert Modeler has
determined that only two of the five candidate predictors are
significant: the number of catalogs mailed and the number of
phone lines open for ordering. When planning your sales
strategy for the next year, you have limited resources to print
catalogs and keep phone lines open for ordering. Your budget for
the first three months of 1999 allows for either 2000 additional
catalogs or 5 additional phone lines over your initial projections.
Which choice will generate more sales revenue for this three-
month period?
• When you’re creating forecasts for dependent
series with predictors, each predictor series
needs to be extended through the forecast
period. Unless you know precisely what the
future values of the predictors will be, you’ll
need to estimate them. You can then modify
the estimates to test different predictor
scenarios. The initial projections are easily
created by using the Expert Modeler.
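With a regression-with-ARIMA-errors model, testing the two budget options amounts to forecasting with two different versions of the predictor series. A sketch under assumed names: res is the fitted model, X_future holds the baseline projections for January-March 1999, and the exogenous columns are called catalogs and phone_lines (all illustrative):

```python
# Scenario 1: 2000 additional catalogs mailed, over the initial projections.
extra_catalogs = X_future.copy()
extra_catalogs["catalogs"] += 2000

# Scenario 2: 5 additional phone lines open for ordering.
extra_lines = X_future.copy()
extra_lines["phone_lines"] += 5

f1 = res.get_forecast(steps=3, exog=extra_catalogs).predicted_mean
f2 = res.get_forecast(steps=3, exog=extra_lines).predicted_mean
print("catalogs scenario total sales:", f1.sum())
print("phone lines scenario total sales:", f2.sum())
```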
Seasonal Decomposition
Removing Seasonality from Sales Data

• A catalog company is interested in modeling the upward trend of sales of its men's clothing line on a set of predictor variables (such as the number of catalogs mailed and the number of phone lines open for ordering). To this end, the company collected monthly sales of men's clothing for a 10-year period. This information is collected in catalog.sav.
• Select Sales of Men’s Clothing and move it into
the Variables list.
• Select Date and move it into the Time Axis
Labels list.
• Click OK.
• The series exhibits a number of peaks, but they do not
appear to be equally spaced. This output suggests that
if the series has a periodic component, it also has
fluctuations that are not periodic—the typical case for real time series. Aside from the small-scale
fluctuations, the significant peaks appear to be
separated by more than a few months. Given the
seasonal nature of sales, with typical highs during the
December holiday season, the time series probably has
an annual periodicity. Also notice that the seasonal
variations appear to grow with the upward series
trend, suggesting that the seasonal variations may be
proportional to the level of the series, which implies a
multiplicative model rather than an additive model.
• Examining the autocorrelations and partial
autocorrelations of a time series provides a more
quantitative conclusion about the underlying
periodicity.

• From the menus choose:
– Analyze > Forecasting > Autocorrelations...
• Select Sales of Men’s Clothing and move it into the
Variables list.
• Click OK.
• The autocorrelation function shows a
significant peak at a lag of 1 with a long
exponential tail—a typical pattern for time
series. The significant peak at a lag of 12
suggests the presence of an annual seasonal
component in the data. Examination of the
partial autocorrelation function will allow a
more definitive conclusion.
• The significant peak at a lag of 12 in the partial
autocorrelation function confirms the
presence of an annual seasonal component in
the data.
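A rough Python analog of the Autocorrelations procedure, assuming sales is the monthly men's clothing series; the spike at lag 12 is what signals the annual seasonal component:

```python
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, (ax1, ax2) = plt.subplots(2, 1)
plot_acf(sales, lags=24, ax=ax1)    # significant spike at lag 12 -> annual seasonality
plot_pacf(sales, lags=24, ax=ax2)   # partial autocorrelations confirm it
plt.show()
```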

• To set an annual periodicity, from the menus choose:
– Data > Define Dates...
• Select Years, months in the Cases Are list.
• Enter 1989 for the year and 1 for the month.
• Click OK.
– This sets the periodicity to 12 and creates a set of
date variables that are designed to work with
Forecasting procedures
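The pandas equivalent of Define Dates is simply to attach a monthly index starting in January 1989, so later steps know the periodicity is 12; a minimal sketch, assuming sales is the series:

```python
import pandas as pd

# Monthly periods from 1989-01 onward, one per observation in the series.
sales.index = pd.period_range(start="1989-01", periods=len(sales), freq="M")
```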
SEASONAL DECOMPOSITION
• To run the Seasonal Decomposition procedure:
• From the menus choose:
• Analyze > Forecasting > Seasonal Decomposition…
• Right-click anywhere in the source variable list and from the context menu select Display Variable Names.
• Select men and move it into the Variables list.
• Select Multiplicative in the Model Type group.
• Click OK.
Understanding the Output
• The Seasonal Decomposition procedure creates four new variables for
each of the original variables analyzed by the procedure.
• SAF. Seasonal adjustment factors, representing seasonal variation. For
the multiplicative model, the value 1 represents the absence of seasonal
variation; for the additive model, the value 0 represents the absence of
seasonal variation.
• SAS. Seasonally adjusted series, representing the original series with
seasonal variations removed. Working with a seasonally adjusted series,
for example, allows a trend component to be isolated and analyzed
independent of any seasonal component.
• STC. Smoothed trend-cycle component, which is a smoothed version of
the seasonally adjusted series that shows both trend and cyclic
components.
• ERR. The residual component of the series for a particular observation.
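A hedged statsmodels sketch of the same multiplicative decomposition; the components correspond only roughly to the SPSS variables (seasonal to SAF, trend to STC, resid to ERR), and the seasonally adjusted series (SAS) is the observed series divided by the seasonal factors:

```python
from statsmodels.tsa.seasonal import seasonal_decompose

decomp = seasonal_decompose(sales, model="multiplicative", period=12)
saf = decomp.seasonal          # seasonal factors (1 = no seasonal variation)
stc = decomp.trend             # smoothed trend-cycle (NaN at the series edges)
err = decomp.resid             # residual component
sas = sales / saf              # seasonally adjusted series
```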
• To plot the seasonally adjusted series:
• Open the Sequence Charts dialog box.
• Click Reset to clear any previous selections.
• Right-click anywhere in the source variable list, and
from the context menu select Display Variable
Names.
• Select SAS_1 and move it into the Variables list.
• Click OK.
• Using the Seasonal Decomposition procedure, you have removed the seasonal component of a periodic time series to produce a series that is more suitable for trend analysis. Examination of the autocorrelations and partial autocorrelations of the time series was useful in determining the underlying periodicity—in this case, annual.
