SAS Introduction To Time Series Forecasting-Libre


Quick Review of How to Use SAS to Analyze Time Series Data


1. Get to know SAS
How to Start SAS?

If you use a computer in this laboratory, start SAS from the Desktop or from Start/Programs.

You can use SAS in the laboratory of the university Computer Center, or on the university server if you have permission.

You can obtain a temporary SAS license by contacting our computer assistant.

Five main windows

Program Editor -- Edits SAS programs.

Log -- Records the messages of the SAS session, which are very helpful for debugging programs.

Output -- Displays the output from SAS procedures.

Explorer -- Manages SAS data sets and creates new libraries.

Results -- Shows a tree-like summary of your Output window.

Several important toolbar shortcuts

Open a new Program Editor window

Open a SAS program that was written before

Save your program as an external file

Create a new library

Open the Explorer window to manage SAS data sets

Submit the whole program, or only a few selected lines, to the SAS System

2. How to use SAS


Two important concepts

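SAS library -- A folder in which SAS data sets are stored. You can create a new library with the LIBNAME statement or with the toolbar shortcut.

SAS data set -- Either temporary (stored in the WORK library) or permanent (stored in a library that you assign).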

Structure of a SAS program

DATA step -- Processes SAS data sets, or converts raw data into a SAS data set that the SAS System can recognize and that PROC steps can analyze.
=====================================
DATA dataset name;
INPUT variable <format>;
CARDS;
data lines
;
=====================================
A dataset name can contain up to 32 characters (letters, digits 0-9, and underscore _) and must begin with a letter or an underscore. Very old SAS releases limited data set names to 8 characters; library names (librefs) are still limited to 8 characters.

PROC step -- Analyzes a SAS data set and outputs the results.

=====================================
PROC procedure name DATA= dataset name;
RUN;
=====================================

The procedure name is the name of a SAS procedure, such as PRINT, PLOT, GPLOT, or INSIGHT.
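For example, the two PROC steps below analyze SASHELP.AIR, the monthly airline passenger data set that ships with SAS and is used later in this handout. This is a minimal sketch; the options chosen (OBS=12 and the list of statistics) are only illustrations.

```sas
/* List the first 12 observations (one year of monthly data) */
proc print data=sashelp.air(obs=12);
run;

/* Summary statistics of the passenger counts in variable AIR */
proc means data=sashelp.air n mean std min max;
  var air;
run;
```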

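3. Convert raw data into a SAS data set


Create a new library

Example: library name lib1, physical path D:\example.

Using a SAS program:

libname lib1 'D:\example';

Using the shortcut: enter Lib1 as the Library Name and D:\example as the Physical Path in the new-library dialog.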

SAS data set name

library_name.dataset_name

For example, lib1.blood means that the data set blood is saved in the library lib1.

The library_name can be sashelp, sasuser, maps, work, or a library you create such as lib1. The dataset_name is up to you, for example blood.

When library_name is work, work.dataset_name is a temporary SAS data set, which is deleted automatically when you close SAS. In this case the prefix work can be omitted: blood and work.blood refer to the same data set.
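The difference between permanent and temporary data sets can be sketched as follows. The sketch assumes that the folder D:\example already exists; the data set name air is only an illustration, and SASHELP.AIR is the sample data set supplied with SAS.

```sas
/* Assign the library reference lib1 to an existing folder (assumed) */
libname lib1 'D:\example';

/* Permanent: stored in D:\example and kept after SAS is closed */
data lib1.air;
  set sashelp.air;
run;

/* Temporary: stored in WORK and deleted when SAS is closed;  */
/* the names air and work.air refer to the same data set      */
data air;
  set lib1.air;
run;

proc print data=work.air(obs=5);
run;
```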

Three ways to read data in a DATA step

The raw data are small, so they can be typed into the program after CARDS:

DATA dataset name;
INPUT variable <format>;
CARDS;
data lines
;
RUN;
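A small worked instance of the CARDS method is shown below. The data set name blood and its three variables (id, type, weight) are hypothetical values made up for illustration; the $ after type tells INPUT that type is a character variable.

```sas
/* Hypothetical data: patient ID, blood type (character), weight in kg */
data blood;
  input id type $ weight;
  cards;
1 A 62.5
2 O 70.1
3 B 55.0
;
run;

proc print data=blood;
run;
```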

The data are stored in an external file:


DATA dataset name;
INFILE 'physical path';
INPUT variable <format>;
RUN;
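The sketch below makes the INFILE method self-contained by first writing a small text file with a DATA _NULL_ step (FILE and PUT statements) and then reading it back. The file name D:\example\blood.txt and its contents are assumptions. For a comma-separated file with a header row, INFILE options such as DLM=',' DSD FIRSTOBS=2 can be added.

```sas
/* Step 1 (only for this sketch): write a small blank-separated text file */
data _null_;
  file 'D:\example\blood.txt';
  put '1 A 62.5';
  put '2 O 70.1';
  put '3 B 55.0';
run;

/* Step 2: read the external file into a SAS data set */
data blood2;
  infile 'D:\example\blood.txt';
  input id type $ weight;
run;
```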

The data are already in a SAS data set:
DATA dataset name;
SET existing dataset name;
RUN;
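As an illustration of the SET method, the step below builds a new temporary data set from SASHELP.AIR, keeping only the years 1955 to 1960 and adding the log of the passenger counts. The data set name air5560 is hypothetical.

```sas
data air5560;
  set sashelp.air;
  where year(date) >= 1955;   /* keep 1955-1960 only (72 monthly values) */
  lair = log(air);            /* new variable: log passenger count       */
run;
```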

4. Using SAS without programming


SAS/INSIGHT

How to start SAS/INSIGHT?


o Submit PROC INSIGHT DATA=dataset name; RUN;
o Or select Solutions --- Analysis --- Interactive Data Analysis

SAS/INSIGHT can draw several types of graphs, such as line plots, scatter plots, three-dimensional rotating plots, and scatter plot matrices.

It can also carry out simple statistical analyses.

5. How to use SAS in time series analysis


Time Series Forecasting System (without programming)

Solutions --- Analysis --- Time Series Forecasting System

Using SAS procedures

The ARIMA and AUTOREG procedures.

6. Some commonly used options in the ARIMA procedure


Syntax
PROC ARIMA options;
BY variables;
IDENTIFY VAR=variable options;
ESTIMATE options;

OUTLIER options;
FORECAST options;
RUN;
QUIT;
BY

A BY statement can be used in the ARIMA procedure to process a data set in groups of
observations defined by the BY variables. Note that all IDENTIFY, ESTIMATE, and FORECAST
statements specified are applied to all BY groups.

IDENTIFY

ALPHA= significance-level: The ALPHA= option specifies the significance level for tests in the
IDENTIFY statement. The default is 0.05.
ESACF: computes the extended sample autocorrelation function and uses these estimates to
tentatively identify the autoregressive and moving average orders of mixed models.
The ESACF option generates two tables. The first table displays extended sample
autocorrelation estimates, and the second table displays probability values that can be used to
test the significance of these estimates. The P= (pmin: pmax) and Q= (qmin: qmax) options
determine the size of the table.
NLAG= number: indicates the number of lags to consider in computing the autocorrelations and
cross-correlations.
STATIONARITY=(ADF= AR orders DLAG= s) or STATIONARITY=(DICKEY= AR orders DLAG= s):
performs augmented Dickey-Fuller tests. If the DLAG=s option is specified with s greater than
one, seasonal Dickey-Fuller tests are performed. The maximum allowable value of s is 12. The
default value of s is one.
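For instance, on the monthly SASHELP.AIR series one could request ordinary and seasonal (s = 12) Dickey-Fuller tests as sketched below. The AR orders (1,2) are an arbitrary choice for illustration, and running the tests on the log scale is an assumption borrowed from the airline example later in this handout.

```sas
data lair;
  set sashelp.air;
  lair = log(air);
run;

proc arima data=lair;
  /* Augmented Dickey-Fuller tests with AR orders 1 and 2 */
  identify var=lair stationarity=(adf=(1,2));
  run;
  /* Seasonal Dickey-Fuller tests for monthly data (DLAG=12) */
  identify var=lair stationarity=(adf=(1,2) dlag=12);
  run;
quit;
```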
VAR= variable ( d1, d2, ..., dk ) : names the variable containing the time series to analyze. The
VAR= option is required. A list of differencing lags can be placed in parentheses after the
variable name to request that the series be differenced at these lags. For example, VAR=X(1)
takes the first differences of X. VAR=X(1,1) requests that X be differenced twice, both times with
lag 1, producing a second difference series, which is $(X_t-X_{t-1})-(X_{t-1}-X_{t-2})=X_t-2X_{t-1}+X_{t-2}$.
VAR=X(2) differences X once at lag two ($X_t-X_{t-2}$). If differencing is specified, it is the
differenced series that is processed by any subsequent ESTIMATE statement.
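In backshift notation ($BX_t = X_{t-1}$), the differencing requests above correspond to the following filters; this is a standard restatement, not SAS output.

$$
\begin{aligned}
\text{VAR=X(1):}\quad & (1-B)X_t = X_t - X_{t-1}\\
\text{VAR=X(1,1):}\quad & (1-B)^2 X_t = X_t - 2X_{t-1} + X_{t-2}\\
\text{VAR=X(2):}\quad & (1-B^2)X_t = X_t - X_{t-2}
\end{aligned}
$$

Each difference at lag $d$ loses the first $d$ observations, so VAR=X(1,1) leaves $n-2$ values and VAR=X(2) leaves $n-2$ values.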

ESTIMATE

METHOD=ML | ULS | CLS: specifies the estimation method to use. METHOD=ML specifies the
maximum likelihood method. METHOD=ULS specifies the unconditional least-squares method.
METHOD=CLS specifies the conditional least-squares method. METHOD=CLS is the default.
P= order: specifies the autoregressive part of the model. By default, no autoregressive
parameters are fit. P=(l1, l2, ..., lk) defines a model with autoregressive parameters at the
specified lags. P= order is equivalent to P=(1, 2, ..., order). A concatenation of parenthesized lists
specifies a factored model. For example, P=(1,2,5)(6,12) specifies the autoregressive model

$$(1-\phi_{1,1}B-\phi_{1,2}B^{2}-\phi_{1,3}B^{5})(1-\phi_{2,1}B^{6}-\phi_{2,2}B^{12})$$

where $B$ is the backshift operator ($BX_t = X_{t-1}$).
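Q= order: specifies the moving-average part of the model. By default, no moving-average parameters are fit. Q=(l1, l2, ..., lk) and factored forms such as Q=(1)(12) work in the same way as for P=.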
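Putting P=, Q= and the differencing from IDENTIFY together, the model that ESTIMATE fits can be written as below, where $W_t$ is the (possibly differenced) series and $\mu$ is the constant that NOINT removes. Note the minus signs in the moving-average polynomial; this is the sign convention PROC ARIMA uses when it reports MA estimates.

$$\phi(B)\,(W_t-\mu)=\theta(B)\,a_t,\qquad \phi(B)=1-\phi_1B-\cdots-\phi_pB^p,\qquad \theta(B)=1-\theta_1B-\cdots-\theta_qB^q$$

Under this convention, the simulated MA(2) series later in this handout, z = a + 0.2*a1 + 0.5*a2, should give MA estimates near $\theta_1=-0.2$ and $\theta_2=-0.5$, up to sampling error.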


NOCONSTANT/NOINT: suppresses the fitting of a constant (or intercept) parameter in the
model; that is, the mean parameter μ is omitted.
PLOT: plots the residual autocorrelation functions. The sample autocorrelation, the sample
inverse autocorrelation, and the sample partial autocorrelation functions of the model residuals
are plotted.

FORECAST

ALPHA= n: sets the size of the forecast confidence limits. The ALPHA= value must be between 0
and 1. When you specify ALPHA=α, the upper and lower confidence limits have a 1-α confidence
level. The default is ALPHA=.05, which produces 95% confidence intervals. ALPHA values are
rounded to the nearest hundredth.
ID= variable: names a variable in the input data set that identifies the time periods associated
with the observations.

INTERVAL= interval | n: specifies the time interval between observations (for example, INTERVAL=MONTH).

LEAD= n: specifies the number of multistep forecast values to compute.

OUT= SAS-data-set: writes the forecast (and other values) to an output data set.
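A sketch combining these FORECAST options on the log airline series is shown below; LEAD=12 and ALPHA=0.10 (90% limits) are arbitrary choices, and the output data set name work.fc is hypothetical. By default, the OUT= data set holds the one-step-ahead fits for all input observations followed by the LEAD= forecasts, so here it should contain 144 + 12 = 156 rows.

```sas
data lair;
  set sashelp.air;
  lair = log(air);
run;

proc arima data=lair;
  identify var=lair(1,12) noprint;
  estimate q=(1)(12) method=ml noprint;
  /* 12 monthly forecasts with 90% limits, labelled by DATE */
  forecast lead=12 alpha=0.10 id=date interval=month out=work.fc;
  run;
quit;
```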

Fitting the ARIMA Model to a Simulated Time Series

0. Simulate AR(2) time series data

The model: Z(t) = 0.5*Z(t-1) + 0.4*Z(t-2) + a(t), where a(t) is Gaussian white noise with variance 1.
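A quick check that this AR(2) model is stationary: the roots of $\phi(z)=1-0.5z-0.4z^2$ must lie outside the unit circle. Solving $0.4z^2+0.5z-1=0$ gives

$$z=\frac{-0.5\pm\sqrt{0.25+1.6}}{0.8}\approx 1.075 \ \text{ or } \ -2.325.$$

Both roots exceed 1 in absolute value, so the process is stationary, but the root near 1.075 is close to the unit circle ($\phi_1+\phi_2=0.9$). This may help explain why the identification step below also turns up differenced ARIMA(p,1,q) candidates.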


The SAS program:
/* Create a new library */
libname ts 'D:/TimeSeries';
/* Simulate an AR(2) process */
data ts.ar;
z1=0; z2=0;
do t = -50 to 200;
a = rannor( 32565 );
z = z1*0.5 + z2*0.4 + a;
if t > 0 then output;
z2=z1; z1=z;
end;
keep z t;
run;

Simulate an MA(2):
/* Simulate an MA(2) process */
data ts.ma;
a1=0; a2=0;
do t = -50 to 200;
a = rannor( 32565 );
z = a + a1*0.2+a2*0.5;
if t > 0 then output;
a2=a1; a1=a;
end;
keep z t;
run;

Simulate an ARMA(1,1):
/* Simulate an ARMA(1,1) process */
data ts.arma;
z1=0; a1=0;
do t = -50 to 200;
a = rannor( 32565 );
z = z1*0.5 + a + a1*0.3;
if t > 0 then output;
a1=a; z1=z;
end;
keep z t;
run;
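RANNOR is an older random-number function; the SAS documentation recommends CALL STREAMINIT with RAND('NORMAL') instead. A minimal sketch of the same AR(2) simulation with that interface follows. The data set name work.ar_rand is hypothetical, and its values will differ from ts.ar because the generator differs. The first 51 values (t = -50, ..., 0) are discarded as burn-in so that the zero start-up values have little influence on the 200 retained observations.

```sas
data work.ar_rand;
  call streaminit(32565);        /* fix the seed for reproducibility */
  z1 = 0; z2 = 0;                /* start-up values                  */
  do t = -50 to 200;
    a = rand('normal');          /* standard normal white noise      */
    z = 0.5*z1 + 0.4*z2 + a;     /* AR(2) recursion                  */
    if t > 0 then output;        /* discard the 51 burn-in values    */
    z2 = z1; z1 = z;
  end;
  keep z t;
run;
```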

1. Draw the time plot

The SAS program:


/* Draw the time plot */
symbol i=join v=none;
proc gplot data=ts.ar;
plot z*t;
run;
quit;
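If SAS/GRAPH is not available, a similar time plot can be drawn with PROC SGPLOT (ODS Graphics). This sketch assumes the data set ts.ar from step 0 exists.

```sas
proc sgplot data=ts.ar;
  series x=t y=z;          /* connect the points in time order */
  xaxis label='t';
  yaxis label='z';
run;
```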

The result:
Figure: Simulated AR(2) Time Series

2. Identify some suitable models

The SAS program:


/* Identify some suitable models with minimum requirement */
proc arima data=ts.ar;
identify alpha=0.05 var=z nlag=20;
run;
/* Use EACF to identify the orders of ARMA models */
identify alpha=0.05 var=z nlag=20 esacf p=(0:6) q=(0:8);
run;
/* Use Dickey-Fuller unit root tests to check the stationarity */
identify alpha=0.05 var=z nlag=20 stationarity=(dickey=(1, 2, 4));
run;
/* Take differencing on the data and analyze again */
identify alpha=0.05 var=z(1) nlag=20 stationarity=(dickey=5);
run;
quit;
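For reference, the sample autocorrelation that IDENTIFY reports at lag $k$ is the standard estimate below (up to NLAG=20 here).

$$r_k=\frac{\sum_{t=1}^{n-k}(Z_t-\bar Z)(Z_{t+k}-\bar Z)}{\sum_{t=1}^{n}(Z_t-\bar Z)^2},\qquad k=1,\dots,20.$$

As a rough rule of thumb, for white noise $r_k$ is approximately normal with variance $1/n$, so with $n=200$ values beyond $\pm 2/\sqrt{200}\approx\pm0.14$ stand out; the bands SAS draws use standard-error approximations and may differ somewhat.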

The summary of the output:

The detailed output without differencing:

Figures: Series Correlation Panel, and the augmented Dickey-Fuller unit root tests (annotated to show the different values of k, the 3 different tests, and the 3 deterministic trends)

The detailed output after first differencing:


Figure: Series Correlation Panel

We may reach three possible models:


ARIMA(3,0,0); ARIMA(0,1,1); and ARIMA(2,1,0).
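Written out in backshift notation (constants omitted for the differenced models), the three tentative models are:

$$\begin{aligned}
\text{ARIMA}(3,0,0):\quad&(1-\phi_1B-\phi_2B^2-\phi_3B^3)(Z_t-\mu)=a_t\\
\text{ARIMA}(0,1,1):\quad&(1-B)Z_t=(1-\theta_1B)\,a_t\\
\text{ARIMA}(2,1,0):\quad&(1-\phi_1B-\phi_2B^2)(1-B)Z_t=a_t
\end{aligned}$$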

3. Estimate the models


Candidate models: AR(3), ARMA(3,1) with the AR coefficient at lag 2 suppressed, and ARIMA(2,1,0)
without intercept.
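The subset ARMA(3,1) candidate is requested in the program with P=(1,3) Q=1; in the notation of the ESTIMATE section it is

$$(1-\phi_1B-\phi_3B^3)(Z_t-\mu)=(1-\theta_1B)\,a_t,$$

so only the lag-1 and lag-3 AR coefficients are estimated and $\phi_2$ is fixed at zero.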
The SAS program:
/* Identify some suitable models with minimum requirement */
proc arima data=ts.ar;
identify alpha=0.05 var=z nlag=20;
run;
/* Use EACF to identify the orders of ARMA models */
identify alpha=0.05 var=z nlag=20 esacf p=(0:6) q=(0:8);
run;
/* Use Dickey-Fuller unit root tests to check the stationarity */
identify alpha=0.05 var=z nlag=20 stationarity=(dickey=(1, 2, 4));
run;
/* Take differencing on the data and analyze again */
identify alpha=0.05 var=z(1) nlag=20 stationarity=(dickey=5);
run;
/* Use CLS method to estimate the AR(3) model */
identify var=z;
run;
estimate method=cls p=3 plot;
run;
/* Use ULS method to estimate the ARMA(3,1) model */
/* with the AR coefficient at lag 2 suppressed */
estimate method=uls p=(1,3) q=1 plot;
run;
/* Use ML method to estimate the ARIMA(2,1,0) model without intercept */
identify var=z(1);
run;
estimate method=ml p=2 noint plot;
run;
quit;

The summary of the output:

The estimated AR(3) model:

The important outputs for the fitted AR(3) model:

Figure: PROC ARIMA output for the fitted AR(3) model, annotated to show the estimated parameters, the mean, the intercept, the variance and standard deviation of the white noise, and the p-values of the significance tests

Outputs for ARMA(3,1) with AR coefficient at lag 2 suppressed:

Outputs for ARIMA(2,1,0) without intercept:

4. Diagnostic checking for the fitted ARIMA(2,1,0)


The SAS program:
/* Diagnostic checking for the fitted ARIMA(2,1,0) */
proc arima data=ts.ar;
identify var=z(1);
run;
estimate method=ml p=2 noint plot;
run;
forecast out=ts.dc lead=0 id=t;
run;
quit;
/* Draw the time plot */
symbol i=join v=none;
proc gplot data=ts.dc;
plot residual*t;
run;
quit;
/* Perform the normality test */
proc univariate data=ts.dc normal plot;
var residual;
run;

The summary of the output:

The time plot:

A normality test:

Distribution plot and Q-Q plot for normality:

Sample autocorrelation function (ACF) and sample partial ACF of the residuals:

Ljung-Box test:
Figure: autocorrelation check of residuals, annotated to show the test statistic, the degrees of freedom, and the p-values
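The statistic in this table is the Ljung-Box (modified portmanteau) statistic computed from the residual autocorrelations $r_k$ of the $n$ residuals:

$$Q_m=n(n+2)\sum_{k=1}^{m}\frac{r_k^2}{n-k}\ \approx\ \chi^2_{m-(p+q)}\quad\text{if the model is adequate.}$$

PROC ARIMA typically reports it for m = 6, 12, 18, 24. For the ARIMA(2,1,0) fit, p + q = 2, so the degrees of freedom are 4, 10, 16 and 22; small p-values indicate remaining autocorrelation in the residuals.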

Analysis of over-parameterized models:


o The SAS program:
/* Analysis of over-parameterized models */
proc arima data=ts.ar;
identify var=z(1) nlag=20;
run;
estimate method=ml p=2 noint plot;
run;
estimate method=ml p=(1,2)(6) noint plot;
run;
estimate method=ml p=2 q=(6) noint plot;
run;
quit;
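In backshift notation, the two over-parameterized models fitted above add a lag-6 term to the ARIMA(2,1,0) model, once as a multiplicative AR factor and once as an MA term:

$$\begin{aligned}
\text{p=(1,2)(6):}\quad&(1-\phi_1B-\phi_2B^2)(1-\Phi_1B^6)(1-B)Z_t=a_t\\
\text{p=2 q=(6):}\quad&(1-\phi_1B-\phi_2B^2)(1-B)Z_t=(1-\theta_6B^6)\,a_t
\end{aligned}$$

The usual reading is that if the extra coefficient ($\Phi_1$ or $\theta_6$) is significant, or the original estimates change substantially, the smaller model is judged inadequate.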

o The first over-parameterized model based on the sample partial ACF:

o The second over-parameterized model based on the sample ACF:

o Three fitted models:

o Conclusion: the fitted ARIMA(2,1,0) model is not adequate!

5. Do forecasting with the fitted ARIMA(2,1,0) model


The SAS program:
/* Do forecasting by using the fitted ARIMA(2,1,0) model */
proc arima data=ts.ar;
identify var=z(1) nlag=20;
run;
estimate method=ml p=2 noint plot;
run;
forecast out=ts.out lead=50 id=t;
run;
quit;
/* Draw the time plot */
symbol1 i=join v=none c=black;
symbol2 i=join v=none c=red;
symbol3 i=join v=none c=blue l=2;
proc gplot data=ts.out;
plot z*t=1 forecast*t=2 l95*t=3 u95*t=3/overlay;
run;
quit;
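An ODS Graphics alternative to the GPLOT overlay, assuming the ts.out data set created above, draws the 95% limits as a shaded band:

```sas
proc sgplot data=ts.out;
  /* shaded 95% forecast limits */
  band x=t lower=l95 upper=u95 / transparency=0.5 legendlabel='95% limits';
  series x=t y=forecast / legendlabel='Forecast';
  series x=t y=z / legendlabel='Observed';
run;
```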

The results:

Fitting the Seasonal ARIMA Model to the Airline Passenger Data
0. The data

The airline passenger data set records the number of passengers traveling by air per month from
January 1949 to December 1960.
It is given as Series G in Box and Jenkins (1976), and has been used in time series analysis
literature as a standard example of a non-stationary seasonal time series.

1. Draw the time plot

The SAS program:


/* Create a new library */
libname ts 'D:/TimeSeries';
/* Draw the time plot */
symbol i=join v=none;
proc gplot data=sashelp.air;
plot air*date;
run;
quit;

The time plot:

Take the log transformation and draw the time plot again.


/* Take log transformation*/
data ts.lair;
set sashelp.air;
lair=log(air);
run;
/* Draw the time plot */
symbol i=join v=none;
proc gplot data=ts.lair;
plot lair*date;
run;
quit;

The time plot:

2. Identify some suitable models

The SAS program:

/* Identify some suitable models */
proc arima data=ts.lair;
identify alpha=0.05 var=lair;
run;
/* Take differencing since the sample ACF decays slowly */
identify alpha=0.05 var=lair(1);
run;
/* Take seasonal differencing since the sample ACF still decays slowly
   at the seasonal lags 12, 24, 36, ... */
identify alpha=0.05 var=lair(1,12);
run;
quit;
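With $y_t=\log(\text{AIR}_t)$, VAR=lair(1,12) analyzes the series

$$W_t=(1-B)(1-B^{12})\,y_t=y_t-y_{t-1}-y_{t-12}+y_{t-13},$$

which is defined from the 14th month on, so 144 - 13 = 131 values remain for identification and estimation.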

The sample ACF of original sequence:

The sample ACF of the sequence after common differencing:

The sample ACF of the sequence after both common differencing and seasonal differencing:

3. Estimate the seasonal ARIMA(0,1,1)X(0,1,1)12 model

The SAS program:


proc arima data=ts.lair;
identify alpha=0.05 var=lair(1,12);
run;
/* Estimate the ARIMA(0,1,1)X(0,1,1)12 model to the data */
estimate method=ml q=(1)(12) plot;
run;
quit;
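ESTIMATE Q=(1)(12) without NOINT fits the multiplicative seasonal model below, written with PROC ARIMA's minus-sign MA convention:

$$(1-B)(1-B^{12})\,y_t=\mu+(1-\theta_1B)(1-\Theta_1B^{12})\,a_t,$$

that is, $W_t=\mu+a_t-\theta_1a_{t-1}-\Theta_1a_{t-12}+\theta_1\Theta_1a_{t-13}$.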

The estimated model:

4. Diagnostic checking of the fitted seasonal ARIMA(0,1,1)X(0,1,1)12 model

The SAS program:


proc arima data=ts.lair;
identify alpha=0.05 var=lair(1,12);
run;
/* Estimate the ARIMA(0,1,1)X(0,1,1)12 model to the data */
estimate method=ml q=(1)(12) plot;
run;
/* Diagnostic checking by overfitting the AR part */
estimate method=ml p=(9) q=(1)(12) plot;
run;
/* Diagnostic checking by overfitting the MA part */
estimate method=ml q=(1)(12)(23) plot;
run;
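/* FORECAST uses the most recently estimated model, so re-fit the */
/* ARIMA(0,1,1)X(0,1,1)12 model before exporting its residuals    */
estimate method=ml q=(1)(12) noprint;
run;
forecast out=ts.out lead=0 id=date;
run;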
quit;
/* Draw the time plot */
symbol i=join v=none;
proc gplot data=ts.out;
plot residual*date;
run;
quit;
/* Perform the normality test */
proc univariate data=ts.out normal plot;
var residual;
run;

The sample ACF of residuals:

The sample PACF of residuals:

Ljung-Box test:

Diagnostic checking by overfitting the AR part and the MA part:

Figure: output of the three fits side by side, annotated to compare the estimated coefficients and the model criteria
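The model criteria compared here are the AIC and SBC printed by ESTIMATE. With $L$ the likelihood, $k$ the number of free parameters and $n$ the number of residuals, PROC ARIMA computes

$$\text{AIC}=-2\ln L+2k,\qquad \text{SBC}=-2\ln L+k\ln n.$$

Smaller values are preferred; SBC penalizes extra parameters more heavily than AIC whenever $\ln n>2$, that is, for $n\ge 8$.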

The time plot of the residuals:

Normality tests:

Distribution plot and Q-Q plot for normality:

5. Do forecasting with the fitted seasonal ARIMA(0,1,1)X(0,1,1)12 model

The SAS program:

/* Do forecasting with the fitted seasonal ARIMA(0,1,1)X(0,1,1)12 model */
proc arima data=ts.lair;
identify alpha=0.05 var=lair(1,12);
run;
estimate method=ml q=(1)(12) plot;
run;
forecast out=ts.out lead=24 id=date interval=month;
run;
quit;
/* Draw the time plot */
symbol1 i=join v=none c=black;
symbol2 i=join v=none c=red;
symbol3 i=join v=none c=blue l=2;
proc gplot data=ts.out;
plot lair*date=1 forecast*date=2 l95*date=3 u95*date=3/overlay;
run;
quit;
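Because the model is fitted to lair = log(air), the forecasts in ts.out are on the log scale. One common way to return to passenger counts, sketched below, is to exponentiate: exp(forecast) estimates the median, exp(forecast + std**2/2) the mean under a lognormal assumption, and the 95% limits transform directly. The output name ts.out_air and the new variable names are hypothetical; FORECAST, STD, L95 and U95 are the variables PROC ARIMA writes to the OUT= data set.

```sas
data ts.out_air;
  set ts.out;
  air_median = exp(forecast);              /* median forecast (counts)     */
  air_mean   = exp(forecast + std**2/2);   /* lognormal mean forecast      */
  air_l95    = exp(l95);                   /* limits transform directly    */
  air_u95    = exp(u95);
run;
```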


The result:
