Time Series Analysis: Example: Stationary ARIMA


4/19/2011

Time Series Analysis


ARIMA Modeling

Example: stationary ARIMA


Seasonally differenced using a seasonal harmonic model

Checking residuals for stationarity


> adf.test(flu.shgls$residuals, alternative = "stationary")
Augmented Dickey-Fuller Test
data:  flu.shgls$residuals
Dickey-Fuller = -4.8395, Lag order = 5, p-value = 0.01
alternative hypothesis: stationary


Choosing a TS model

Model fitting
> flu.ar<-arima(flu.shgls$residuals, order=c(2,0,0))
> flu.ar$aic
[1] -349.0261
> flu.ma<-arima(flu.shgls$residuals, order=c(0,0,1))
> flu.ma$aic
[1] -354.4418
> flu.arma<-arima(flu.shgls$resid, order=c(1,0,1))
> flu.arma$aic
[1] -352.8236
> flu.arma2<-arima(flu.shgls$resid, order=c(0,0,2))
> flu.arma2$aic
[1] -352.9634


Cool code for finding the best ARMA


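best.order<-c(0,0,0)
best.aic<-Inf
# try every ARMA(i, j) with i and j between 0 and 2
for (i in 0:2) for (j in 0:2) {
  fit.aic<-AIC(arima(resid(flu.shgls), order=c(i,0,j)))
  # keep the order with the lowest AIC seen so far and refit that model
  if (fit.aic < best.aic) {
    best.order <- c(i,0,j)
    best.arma <- arima(resid(flu.shgls), order=best.order)
    best.aic <-fit.aic
  }
}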
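The same search can also keep every AIC value, which makes it easier to see how close the runners-up are. The sketch below is only an illustration: it uses a simulated MA(1) series as a stand-in for resid(flu.shgls), and the names x and aic.table are assumptions, not objects from the slides.

```r
# Simulated stand-in for resid(flu.shgls): an MA(1) series (assumed data).
set.seed(1)
x <- arima.sim(model = list(ma = 0.7), n = 200)

# One cell per (p, q) pair; rows are the AR order, columns the MA order.
aic.table <- matrix(NA_real_, nrow = 3, ncol = 3,
                    dimnames = list(paste0("p=", 0:2), paste0("q=", 0:2)))
for (p in 0:2) for (q in 0:2) {
  # tryCatch leaves the cell as NA if a fit fails instead of stopping the loop.
  aic.table[p + 1, q + 1] <- tryCatch(
    AIC(arima(x, order = c(p, 0, q))),
    error = function(e) NA_real_)
}

# Row/column position of the smallest AIC, converted back to (p, 0, q).
best <- which(aic.table == min(aic.table, na.rm = TRUE), arr.ind = TRUE)[1, ]
best.order <- c(best[["row"]] - 1, 0, best[["col"]] - 1)
print(round(aic.table, 2))
print(best.order)
```

Candidates whose AIC is within a point or two of the minimum are usually worth a second look before settling on one model.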

Results
> best.arma
Series: resid(flu.shgls)
ARIMA(0,0,1) with non-zero mean
Call: arima(x = resid(flu.shgls), order = best.order)
Coefficients:
         ma1  intercept
      0.7003     0.0001
s.e.  0.0696     0.0091
sigma^2 estimated as 0.003797:  log likelihood = 180.22
AIC = -354.44   AICc = -354.25   BIC = -345.79

Results are exactly the same as the flu.ma model (order 0,0,1; AIC = -354.44) we ran by hand!


Is the model adequate?



> Box.test(best.arma$resid, lag = 10, type = "Ljung", fitdf=2)
Box-Ljung test
data:  best.arma$resid
X-squared = 6.2012, df = 8, p-value = 0.6247
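As a rough illustration of the same adequacy check on data anyone can reproduce, the sketch below fits an MA(1) to a simulated series and runs the Ljung-Box test on its residuals. The series x is an assumed stand-in. A common convention sets fitdf to the number of estimated AR and MA coefficients (p + q), which the sketch reads from the fitted object; for an MA(1) that gives 1, so treat the choice of fitdf as something to check against your own model.

```r
set.seed(2)
x <- arima.sim(model = list(ma = 0.7), n = 200)   # assumed stand-in data
fit <- arima(x, order = c(0, 0, 1))

# fit$arma[1:4] holds p, q, P, Q: the counts of AR, MA, seasonal AR and
# seasonal MA coefficients (the mean is not counted).
n.arma <- sum(fit$arma[1:4])

# The test's degrees of freedom are lag - fitdf.
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = n.arma)
print(lb)
# A large p-value means no evidence of leftover autocorrelation in the residuals.
```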

Forecasting

The predict() function can be used to forecast future values from a fitted regression model and from a fitted ARMA model

Sum the two to give a forecast for the overall series

> new.T<-time(ts(start=1979, end=c(1981, 12), fr=12))
> TIME<-(new.T-mean(time(flu)))/sd(time(flu))
> SIN<-COS<-matrix(nr=length(new.T), nc=6)
> for (i in 1:6) {
    COS[,i] <-cos(2*pi*i*time(new.T))
    SIN[,i] <-sin(2*pi*i*time(new.T)) }
> SIN<-SIN[,-(3:6)]
> COS<-COS[,-(5:6)]
> new.data<-data.frame(TIME=as.vector(TIME), SIN=SIN, COS=COS)
> predict.gls<-predict(flu.shgls, new.data)
> predict.arma<-predict(best.arma, n.ahead=36)
> flu.pred<-ts((predict.gls+predict.arma$pred), st=1979, freq=12)
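The "regression forecast plus ARMA forecast" idea can be tried end to end on simulated data. This is a minimal sketch under stated assumptions: the data are invented (linear trend, one annual sine term, AR(1) noise), and lm() is swapped in for the gls fit used in the slides, so it shows the mechanics rather than the flu model itself.

```r
set.seed(3)
n <- 120
t.obs <- 1:n
# Assumed data: linear trend + annual sine wave + AR(1) noise.
noise <- arima.sim(model = list(ar = 0.6), n = n, sd = 0.5)
y <- 10 + 0.05 * t.obs + 2 * sin(2 * pi * t.obs / 12) + as.numeric(noise)
dat <- data.frame(y = y, t = t.obs, s = sin(2 * pi * t.obs / 12))

reg <- lm(y ~ t + s, data = dat)                 # deterministic part
# The regression already has an intercept, so the ARMA part is fit without a mean.
arma <- arima(residuals(reg), order = c(1, 0, 0), include.mean = FALSE)

h <- 24
t.new <- (n + 1):(n + h)
new.dat <- data.frame(t = t.new, s = sin(2 * pi * t.new / 12))

pred.reg <- predict(reg, newdata = new.dat)      # trend + season
pred.arma <- predict(arma, n.ahead = h)$pred     # residual dynamics
y.pred <- as.numeric(pred.reg) + as.numeric(pred.arma)   # overall forecast
```

The ARMA contribution fades toward zero as the horizon grows, so long-range forecasts are driven almost entirely by the regression part.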


Forecasting
> ts.plot(cbind(flu, flu.pred),lty=1:2, col=1:2)

What are my predicted values?


> flu.pred
           Jan       Feb       Mar       Apr       May       Jun       Jul
1979 0.3924846 0.3864422 0.2830972 0.2245747 0.1580188 0.1349402 0.1576668
1980 0.4278026 0.3758563 0.2725114 0.2139888 0.1474329 0.1243543 0.1470809
1981 0.4172167 0.3652704 0.2619255 0.2034029 0.1368470 0.1137684 0.1364950
           Aug       Sep       Oct       Nov       Dec
1979 0.1497663 0.1524658 0.1684255 0.1899788 0.3101741
1980 0.1391804 0.1418799 0.1578396 0.1793929 0.2995882
1981 0.1285945 0.1312940 0.1472537 0.1688070 0.2890023


What is an ARIMA model?

A type of ARMA model that can be used with some kinds of non-stationary data

Useful for series with stochastic trends

First-order (simple) differencing

Series with deterministic trends should be differenced first, then an ARMA model applied

The I in ARIMA stands for "integrated", which basically means you're differencing

Integrated at order d (i.e., the dth difference)
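To make "integrated of order d" concrete, the sketch below builds a random walk with drift (an assumed example of a stochastic trend, not data from the slides), differences it once, and compares the lag-1 autocorrelation before and after. It also shows that diffinv() reverses diff(), which is the "integration" the name refers to.

```r
set.seed(4)
# Random walk with drift: a series with a stochastic trend (assumed data).
x <- ts(cumsum(0.1 + rnorm(300)))

dx <- diff(x)                        # first difference, d = 1

# Lag-1 autocorrelation: acf()$acf[1] is lag 0, so lag 1 is element 2.
r.x <- acf(x, plot = FALSE)$acf[2]
r.dx <- acf(dx, plot = FALSE)$acf[2]
print(c(level = r.x, differenced = r.dx))

# diffinv() undoes diff() given the starting value: "integrating" back.
x.back <- diffinv(dx, xi = x[1])
```

The undifferenced series has a lag-1 autocorrelation close to 1, while the differenced series behaves like white noise, which is the pattern that suggests d = 1.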

Example: non-stationary TS


Logged series

ACF/PACF for a non-stationary TS


One more test


> adf.test(lngas, alternative="stationary")
Augmented Dickey-Fuller Test
data:  lngas
Dickey-Fuller = -1.2118, Lag order = 5, p-value = 0.9021
alternative hypothesis: stationary

R code for choosing the best ARIMA model


> get.best.arima <- function(x.ts, maxord = c(1,1,1))
{
  best.aic <- 1e8
  n <- length(x.ts)
  for (p in 0:maxord[1]) for(d in 0:maxord[2]) for(q in 0:maxord[3])
  {
    fit <- arima(x.ts, order = c(p,d,q))
    # penalized criterion: (log(n) + 1) per coefficient instead of AIC's 2
    fit.aic <- -2 * fit$loglik + (log(n) + 1) * length(fit$coef)
    if (fit.aic < best.aic)
    {
      best.aic <- fit.aic
      best.fit <- fit
      best.model <- c(p,d,q)
    }
  }
  # returns the best criterion value, the fitted model, and its order
  list(best.aic, best.fit, best.model)
}
> get.best.arima(lngas, maxord=c(2,2,2))


Results
Series: x.ts
ARIMA(0,1,1)
Call: arima(x = x.ts, order = c(p, d, q))
Coefficients:
         ma1
      0.5243
s.e.  0.0646
sigma^2 estimated as 0.001701:  log likelihood = 316.53
AIC = -629.06   AICc = -628.99   BIC = -622.68

BUT, the basic ARIMA output isn't stored anywhere, so we can't do anything with it!
Plus, where's our intercept term?!?
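One way around the "not stored anywhere" problem is simply to assign the function's return value and pull out the pieces by name. The sketch below is an assumption-laden variant, not the slides' function: get.best.arima2 is a hypothetical name, it ranks models with plain AIC() rather than the penalty used above, it skips orders that fail to fit, and it runs on simulated ARIMA(0,1,1) data.

```r
# Hypothetical variant of the search that returns a named list.
get.best.arima2 <- function(x.ts, maxord = c(1, 1, 1)) {
  best <- list(aic = Inf, fit = NULL, order = NULL)
  for (p in 0:maxord[1]) for (d in 0:maxord[2]) for (q in 0:maxord[3]) {
    # Skip orders where arima() stops with an error.
    fit <- tryCatch(arima(x.ts, order = c(p, d, q)), error = function(e) NULL)
    if (!is.null(fit) && AIC(fit) < best$aic)
      best <- list(aic = AIC(fit), fit = fit, order = c(p, d, q))
  }
  best
}

set.seed(5)
# Assumed data: cumulative sum of an MA(1) series, i.e. ARIMA(0,1,1).
x <- ts(cumsum(arima.sim(model = list(ma = 0.5), n = 200)))

best <- get.best.arima2(x, maxord = c(2, 2, 2))   # assignment keeps the result
print(best$order)
fc <- predict(best$fit, n.ahead = 12)             # the stored fit is reusable
```

Because the fitted object is kept in best$fit, it can be passed on to predict(), tsdiag(), or residuals() like any other arima() result.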

A caution about R

When fitting ARIMA models with R, an intercept term is NOT included in the model if there is any differencing

We need to force R to include one, using an extra argument in the arima() function

xreg=t or xreg=1:length(lngas)

R will automatically estimate the intercept (the coefficient on this time regressor) for us and label it 1:length(lngas)

http://www.stat.pitt.edu/stoffer/tsa3/Rissues.htm
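The effect of the xreg trick can be checked on simulated data. In this sketch (assumed data: an ARIMA(0,1,1) series with a drift of 0.05 per step; all object names are illustrative), the model with a time regressor should give essentially the same estimates as an MA(1) with a mean fitted to the differenced series, while the plain differenced model has no such term at all.

```r
set.seed(6)
n <- 200
# Assumed data: differences are MA(1) noise around a drift of 0.05.
e <- arima.sim(model = list(ma = 0.5), n = n - 1, sd = 0.1)
x <- ts(cumsum(c(0, 0.05 + e)))

fit.nodrift <- arima(x, order = c(0, 1, 1))             # no intercept/drift term
fit.drift <- arima(x, order = c(0, 1, 1), xreg = 1:n)   # drift via time regressor
fit.diff <- arima(diff(x), order = c(0, 0, 1))          # mean of the differences

# Compare by position: column 1 is ma1, column 2 is the drift/intercept.
# Column names come from the first fit; the second fit calls it "intercept".
print(round(rbind(drift = coef(fit.drift), diff = coef(fit.diff)), 4))
print(coef(fit.nodrift))
```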


Correct results
> gas.arima<-arima(lngas, ord=c(0,1,1), xreg=1:length(gas))
Series: lngas
ARIMA(0,1,1)
Call: arima(x = lngas, order = c(0, 1, 1), xreg = 1:length(gas))
Coefficients:
         ma1  1:length(gas)
      0.5181         0.0068
s.e.  0.0654         0.0046
sigma^2 estimated as 0.001681:  log likelihood = 317.59
AIC = -629.18   AICc = -629.04   BIC = -619.61

Compare the results


> gas.arima1<-arima(lngas, ord=c(0,1,1))
Coefficients:
         ma1
      0.5243
s.e.  0.0646
AIC = -629.06   AICc = -628.99   BIC = -622.68
> gas.arima2<-arima(diff(lngas), ord=c(0,0,1))
Coefficients:
         ma1  intercept
      0.5181     0.0068
s.e.  0.0654     0.0046
AIC = -629.18   AICc = -629.04   BIC = -619.61
> gas.arima<-arima(lngas, ord=c(0,1,1), xreg=1:length(gas))
Coefficients:
         ma1  1:length(gas)
      0.5181         0.0068
s.e.  0.0654         0.0046
AIC = -629.18   AICc = -629.04   BIC = -619.61


Is the model adequate?



> tsdiag(gas.arima)

Forecasting
> plot(gas, xlim=c(1973,1991))
> nobs=length(gas)
> gas.pred <- predict(gas.arima, n.ahead=36,
    newxreg=(nobs+1):(nobs+36))
> lines(exp(gas.pred$pred), col="red")
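The two details that matter in this forecast are that newxreg continues the time index used in xreg, and that predictions made on the log scale are back-transformed with exp(). The sketch below walks through both on an invented series (a random walk with drift on the log scale); the object names are illustrative and the 1.96 multiplier assumes roughly normal forecast errors.

```r
set.seed(7)
n <- 150
# Assumed data: a positive monthly series whose log is a random walk with drift.
log.y <- ts(cumsum(c(3, 0.01 + rnorm(n - 1, sd = 0.05))),
            start = c(2000, 1), frequency = 12)

fit <- arima(log.y, order = c(0, 1, 1), xreg = 1:n)

h <- 36
# newxreg must continue the same index: n + 1, n + 2, ..., n + h.
fc <- predict(fit, n.ahead = h, newxreg = (n + 1):(n + h))

# Back-transform the point forecast and an approximate 95% band.
y.hat <- exp(fc$pred)
y.lo <- exp(fc$pred - 1.96 * fc$se)
y.hi <- exp(fc$pred + 1.96 * fc$se)
```

Exponentiating the log-scale forecast gives a value closer to the median than the mean of the original series, which is usually acceptable for plotting but worth remembering when reporting numbers.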


Results

Example: nonstationary series


ACF/PACF for a nonstationary series

Choose the best ARIMA model


> get.best.arima(newGTS, maxord=c(2,2,2))
[[3]]
[1] 2 1 1
> gts.arima<-arima(newGTS, ord=c(2,1,1), xreg=1:length(newGTS))
Series: newGTS
ARIMA(2,1,1)
Coefficients:
         ar1     ar2      ma1  1:length(newGTS)
      0.5114  0.2923  -1.0000            0.0014
s.e.  0.0464  0.0467   0.0062            0.0002
sigma^2 estimated as 0.007272:  log likelihood = 447.61
AIC = -885.22   AICc = -885.08   BIC = -864.89


Check the residuals

> tsdiag(gts.arima)

Forecast

> plot(newGTS,xlim=c(1970,2010))
> nobs=length(newGTS)
> gts.pred<-predict(gts.arima, n.ahead=60,
    newxreg=(nobs+1):(nobs+60))
> lines(gts.pred$pred,col="red")


Why use ARIMA vs. a linear model?

Detecting the trend with an ARIMA model is implicit

Can't calculate the exact slope of the trend line

If understanding the dynamics of the trend is important to your research question, you may want to use a linear model instead

ARIMA models are also better for stochastic time series

Additional non-stationary ARIMA models

Seasonal ARIMA (aka SARIMA)

An ARIMA model with an additional seasonal parameter

The seasonal part of an ARIMA model has the same structure as the non-seasonal part:

Can have an SAR factor, an SMA factor, and/or an order of differencing

These factors operate across multiples of lag s (e.g., the number of periods in a season)

ARIMA(p, d, q)x(P, D, Q)

P=number of seasonal autoregressive terms (SAR)
D=number of seasonal differences
Q=number of seasonal moving average terms (SMA)
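In R's arima(), the (p, d, q) part goes in order and the (P, D, Q) part goes in the seasonal list together with the period s. As a small illustration on data that ships with R (the AirPassengers series, which is not used in these slides), the sketch below fits an ARIMA(0,1,1)x(0,1,1) model with s = 12 to the logged series.

```r
# AirPassengers is a built-in monthly series, so the seasonal period is 12.
y <- log(AirPassengers)

# order    = c(p, d, q) = (0, 1, 1): one regular difference, one MA term
# seasonal = c(P, D, Q) = (0, 1, 1): one seasonal difference, one SMA term
fit <- arima(y, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
print(fit)

# "ma1" is the non-seasonal MA coefficient, "sma1" the seasonal one.
print(names(coef(fit)))
```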


Rules for model fitting

The first step is to determine whether or not a seasonal difference is needed, in addition to or perhaps instead of a non-seasonal difference:

Look at residual time series plots/ACF/PACF plots for all possible combinations of 0 or 1 non-seasonal difference and 0 or 1 seasonal difference

Don't use more than ONE seasonal difference or more than TWO total differences (seasonal + non-seasonal)

If the autocorrelation at the seasonal period is positive, consider adding an SAR term to the model
If the autocorrelation at the seasonal period is negative, consider adding an SMA term to the model

Do not mix SAR and SMA terms in the same model, and avoid using more than one of either kind
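The sign rule for the seasonal lag can be read straight off the sample ACF. The sketch below is a minimal illustration on the built-in AirPassengers series (an assumed example, not the flu data): it applies one non-seasonal and one seasonal difference, extracts the autocorrelation at lag 12, and reports which term the rule points to.

```r
y <- log(AirPassengers)            # built-in monthly series (assumed example)
s <- 12                            # periods per season

# One non-seasonal and one seasonal difference: two differences in total.
dy <- diff(diff(y, lag = s))

# acf()$acf[1] is lag 0, so the autocorrelation at lag s is element s + 1.
r <- acf(dy, lag.max = 2 * s, plot = FALSE)$acf
r.seasonal <- r[s + 1]

suggestion <- if (r.seasonal > 0) "consider an SAR term" else "consider an SMA term"
print(round(r.seasonal, 3))
print(suggestion)
```

The rule is a starting point only; the candidate model still needs the usual residual checks.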

Example: seasonal time series


ACF shows a strong seasonal component

Seasonal ARIMA
> flu.s<-arima(flu, order=c(0,0,0), seas=list(order=c(0,1,0), 12))
> acf(flu.s$resid)
> pacf(flu.s$resid)

MA signature?


Seasonal ARIMA with an MA term


> flu.s<-arima(flu, order=c(0,0,1), seas=list(order=c(0,1,0), 12))

Remember the rules

If the autocorrelation at the seasonal period is positive, consider adding an SAR term to the model

If the autocorrelation at the seasonal period is negative, consider adding an SMA term to the model


Seasonal ARIMA with MA and SMA terms


> flu.s<-arima(flu, order=c(0,0,1), seas=list(order=c(0,1,1), 12))

Results
> flu.s
Call:
arima(x = flu, order = c(0, 0, 1), seasonal = list(order = c(0, 1, 1), 12))
Coefficients:
         ma1     sma1
      0.7294  -0.5825
s.e.  0.0678   0.1015
sigma^2 estimated as 0.0047: log likelihood=148.41, aic=-290.82


Prediction
> predict<-predict(flu.s, n.ahead=36)
> predict
$pred
           Jan       Feb       Mar       Apr       May       Jun       Jul
1979 0.4237531 0.4242599 0.3431599 0.2553681 0.2130979 0.1911113 0.1971864
1980 0.4350645 0.4242599 0.3431599 0.2553681 0.2130979 0.1911113 0.1971864
1981 0.4350645 0.4242599 0.3431599 0.2553681 0.2130979 0.1911113 0.1971864

> ts.plot(cbind(flu, predict$pred), lty=1:2, col=1:2)
> lines(predict$pred+2*predict$se, col="red", lty=3)
> lines(predict$pred-2*predict$se ,col="red", lty=3)
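The forecast table shows the same values repeating from one year to the next after the first few steps. That is expected for this model form: with one seasonal difference and MA terms reaching back at most 13 months, every forecast 14 or more steps ahead equals the forecast 12 steps before it. The sketch below checks this on the built-in ldeaths series (an assumed stand-in for flu) and also computes the plus or minus 2 standard error band used in the plot.

```r
# ldeaths: built-in monthly UK lung-disease deaths, 1974-1979 (assumed example).
fit <- arima(ldeaths, order = c(0, 0, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

fc <- predict(fit, n.ahead = 36)

# Approximate 95% band: point forecast plus or minus 2 standard errors.
upper <- fc$pred + 2 * fc$se
lower <- fc$pred - 2 * fc$se

# Forecasts at horizons 14..36 should match those at horizons 2..24.
repeats <- isTRUE(all.equal(as.numeric(fc$pred[14:36]),
                            as.numeric(fc$pred[2:24]), tolerance = 1e-6))
print(repeats)
```

The point forecasts settle into a fixed seasonal pattern, but the standard errors do not shrink, so the band still reflects growing uncertainty at longer horizons.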

Prediction


Why use a SARIMA vs. a linear model?

SARIMA models deal with seasonality in a more implicit manner

Can't easily see in the ARIMA output how the average December, say, differs from the average July

If you want to isolate the seasonal pattern (e.g., know the difference between December and July), then a linear model may be better

Also, SARIMA models are really better if the seasonal pattern is both strong and stable over time

What have we learned?


What will mean CO2 levels be in Jan 2015?
399.25 PPM
What will the increase in CO2 levels be over the next 10 years?
Dec 2010 = 387.27
Dec 2020 = 411.66
24.39 PPM


What have we learned?


Have we seen an epidemic of flu recently?
Jan 1976
Was there a year of particularly low flu rates?
Dec 1969

What have we learned?

Elements of exploratory time series analysis

Time series plots and classical decomposition


Autocovariances and autocorrelations
Stationarity and differencing

Models of time series

Linear models
Moving averages (MA) and autoregressive (AR) processes
Specification/identification of ARMA/ARIMA models

SARIMA models

Estimation/prediction

For linear models and ARIMA models and combination of the two
