TSA Assignment


Time Series Analysis Assignment - MD2106

Goda Venkata Adithya Tarun

2022-11-19

As I am from NB-Stream with even roll no. (MD2106), I have chosen Dataset-4 for the analysis.

Plotting the Data

The visitors dataset gives the monthly number of short-term overseas visitors to Australia from May 1985 to
April 2005 (in thousands). As the data is indexed by time, it is time series data. The plot of the data against
the corresponding time points is given below:

[Figure: visitors (in thousands) plotted against Time, 1985-2005]

## [1] "The length of visitors data is 240"

From the above plot, we can observe that the data has an approximately linear increasing trend. Also, the same
pattern repeats every 12 months, which indicates a seasonal component of period 12. We use a multiplicative
time series model for the analysis, in which the data is the product of the trend, seasonal and remainder
components.
These components can be seen in the following decomposition plot.

[Figure: Decomposition of multiplicative time series. Panels: data, trend, seasonal, remainder; x-axis: Time, 1985-2005]

The trend component can be approximated by a line with positive slope, hence it is approximately linearly
increasing.
The seasonal component is evident from the above plot, where the same pattern repeats every year.
The remainder is the random noise component, which we attempt to model as a stationary time series in
this analysis.

## [1] "The length of remainder data is 228"

Since the decompose function uses a centred moving average to estimate the trend, the random component is NA
for the first and last 6 data points. Hence the length is reduced by 12, from 240 to 228. The following is a
plot of the noise component. From the plot below, we can see that the noise component is spread fairly
randomly about the line y=1, as expected for the remainder of a multiplicative decomposition. Hence we do not
have to apply any transformation to it before modelling the noise component as a stationary time series.
[Figure: noise (remainder) component plotted against Time (index 0-200)]
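The loss of 12 observations can be reproduced with a small sketch of a classical multiplicative decomposition. It is written in plain Python rather than with the R function used for the report, and the `series` below is a hypothetical stand-in (linear trend times a sinusoidal seasonal factor), not the visitors data. The function names are illustrative only.

```python
import math

def centered_ma(x, period=12):
    """Centered 2 x period moving average for an even period.

    Returns a list as long as x, with None where the window does not fit
    (the first and last period // 2 positions)."""
    half = period // 2
    out = [None] * len(x)
    for t in range(half, len(x) - half):
        window = x[t - half : t + half + 1]  # period + 1 values
        # Half weight on the two end points so the window covers one full period.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        out[t] = total / period
    return out

# Hypothetical stand-in series: 240 monthly values, trend x seasonal factor.
n = 240
series = [(100 + 1.5 * t) * (1 + 0.2 * math.sin(2 * math.pi * t / 12)) for t in range(n)]

trend = centered_ma(series, period=12)
detrended = [x / m if m is not None else None for x, m in zip(series, trend)]

# Seasonal factor per month: average the detrended ratios, then rescale to mean 1.
seasonal = []
for month in range(12):
    vals = [detrended[t] for t in range(month, n, 12) if detrended[t] is not None]
    seasonal.append(sum(vals) / len(vals))
mean_factor = sum(seasonal) / 12
seasonal = [s / mean_factor for s in seasonal]

# Remainder = data / (trend * seasonal); undefined where the trend is undefined.
remainder = [d / seasonal[t % 12] if d is not None else None
             for t, d in enumerate(detrended)]

print(len(series), sum(v is not None for v in remainder))
```

Because the moving-average window needs 6 points on either side, the remainder is undefined at both ends, and the usable values fluctuate around 1 rather than 0.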
ACF & PACF plots of random component:

The Auto-Correlation Function and Partial Auto-Correlation Function of the random component at different
lags are plotted below:

[Figure: ACF (lags 0-20) and Partial ACF (lags 1-20) of Series noise]

Recall that for a pure AR(p) process the PACF cuts off after lag p, and for a pure MA(q) process the ACF cuts
off after lag q; for a mixed ARMA(p,q) process both tail off. The PACF plot starts to die out at lag 6, so
p=6 is a plausible choice. Also, the ACF becomes very small just after lag 0, but takes larger values again
at lags 6, 12, etc. This could be due to randomness. It is therefore possible that q=0 or q=6. Once we employ
a model selection criterion, the values of p and q will become clearer.
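To make the ACF and PACF reasoning concrete, here is a minimal sketch that computes the sample ACF and derives the sample PACF from it with the Durbin-Levinson recursion. The input is a simulated AR(1) series with coefficient 0.7, an assumption chosen purely for illustration. It is not the noise series analysed here, and the helper names are hypothetical.

```python
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 0..max_lag (divisor n at every lag)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]

def sample_pacf(x, max_lag):
    """Sample partial autocorrelations at lags 1..max_lag (Durbin-Levinson)."""
    rho = sample_acf(x, max_lag)
    pacf, phi_prev = [], []
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            phi = [phi_kk]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
            # Update the order-k coefficients from the order-(k-1) ones.
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi
    return pacf

# Simulated AR(1) series (illustrative assumption, not the report's data).
random.seed(0)
x = [0.0]
for _ in range(1999):
    x.append(0.7 * x[-1] + random.gauss(0.0, 1.0))

acf_vals = sample_acf(x, 10)
pacf_vals = sample_pacf(x, 10)
print([round(v, 2) for v in pacf_vals])
```

For a pure AR(1) series the PACF should be clearly non-zero at lag 1 only, while the ACF decays gradually. This is the cut-off pattern one looks for when choosing p.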

Fitting ARMA model
We first fit ARMA(6,0) and ARMA(6,6) models, find the coefficients and residuals, and compare them.
Later, we will use a model selection criterion.

Fitting ARMA(6,0) model:

We have fitted ARMA(6,0) model to the noise data and the coefficients are as follows:

##           ar1           ar2           ar3           ar4           ar5
##  0.2778957564 -0.0006217447 -0.0911784279 -0.2087844970 -0.0875571713
##           ar6     intercept
## -0.1001512016  1.0002905645

The results of the tests of the hypothesis that the residuals are iid noise, together with the ACF, PACF,
residual and Normal Q-Q plots, are shown below:

## Null hypothesis: Residuals are iid noise.
## Test              Distribution               Statistic  p-value
## Ljung-Box Q       Q ~ chisq(20)                  15.63   0.7393
## McLeod-Li Q       Q ~ chisq(20)                   13.9   0.8357
## Turning points T  (T-150.7)/6.3 ~ N(0,1)           156   0.4003
## Diff signs S      (S-113.5)/4.4 ~ N(0,1)           114   0.9089
## Rank P            (P-12939)/575.7 ~ N(0,1)       13158   0.7036

[Figure: diagnostic plots for the ARMA(6,0) residuals. Panels: ACF, PACF, Residuals against Time, Normal Q-Q Plot]

## [1] "The AIC of the ARMA(6,0) model is -797.318221783803"

## [1] "The BIC of the ARMA(6,0) model is -769.883456752167"

From the tests above, we see that none of the 5 tests rejects the hypothesis that the residuals are iid
noise at the 5% level, since every p-value exceeds 0.05.
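The centring and scaling constants in the last three rows of the table above follow from the standard formulas for the turning point, difference-sign and rank tests with n = 228 observations. The sketch below recomputes them and the two-sided normal p-values from the statistics reported for the ARMA(6,0) residuals. The function names are illustrative, not part of any package.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def randomness_tests(n, turning_points, diff_signs, rank):
    """Return {test: (mean, sd, p-value)} under the iid null hypothesis."""
    tests = {
        "turning points": (turning_points,
                           2.0 * (n - 2) / 3.0,
                           math.sqrt((16.0 * n - 29.0) / 90.0)),
        "diff signs": (diff_signs,
                       (n - 1) / 2.0,
                       math.sqrt((n + 1) / 12.0)),
        "rank": (rank,
                 n * (n - 1) / 4.0,
                 math.sqrt(n * (n - 1) * (2.0 * n + 5.0) / 72.0)),
    }
    return {name: (mu, sd, two_sided_p((stat - mu) / sd))
            for name, (stat, mu, sd) in tests.items()}

# Statistics copied from the ARMA(6,0) table above; n is the remainder length.
result = randomness_tests(n=228, turning_points=156, diff_signs=114, rank=13158)
for name, (mu, sd, p) in result.items():
    print(f"{name}: mean={mu:.1f} sd={sd:.1f} p={p:.4f}")
```

The recomputed means, standard deviations and p-values agree with the table to the precision shown there, which confirms that each of these three tests fails to reject at the 5% level.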

Fitting ARMA(6,6) model:

We have fitted ARMA(6,6) model to the noise data and the coefficients are as follows:

##        ar1        ar2        ar3        ar4        ar5        ar6        ma1
##  2.2649267 -1.8787810 -0.1627580  1.9103177 -1.7324315  0.5637246 -2.0790408
##        ma2        ma3        ma4        ma5        ma6  intercept
##  1.4387755  0.3902858 -1.8989237  1.5037343 -0.3548188  1.0005435

The results of the tests of the hypothesis that the residuals are iid noise, together with the ACF, PACF,
residual and Normal Q-Q plots, are shown below:

## Null hypothesis: Residuals are iid noise.
## Test              Distribution               Statistic  p-value
## Ljung-Box Q       Q ~ chisq(20)                   5.29   0.9996
## McLeod-Li Q       Q ~ chisq(20)                  12.82   0.8851
## Turning points T  (T-150.7)/6.3 ~ N(0,1)           150   0.9163
## Diff signs S      (S-113.5)/4.4 ~ N(0,1)           114   0.9089
## Rank P            (P-12939)/575.7 ~ N(0,1)       13289   0.5432

[Figure: diagnostic plots for the ARMA(6,6) residuals. Panels: ACF, PACF, Residuals against Time, Normal Q-Q Plot]

## [1] "The AIC of the ARMA(6,6) model is -809.021962932916"

## [1] "The BIC of the ARMA(6,6) model is -761.011124127554"

From the tests above, we see that none of the 5 tests rejects the hypothesis that the residuals are iid
noise at the 5% level.
The two criteria disagree, bearing in mind that a lower value is better under either criterion: ARMA(6,6) has
the lower AIC (-809.02 against -797.32), while ARMA(6,0) has the lower BIC (-769.88 against -761.01). Next,
we select the best model based on each information criterion.
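The comparison can be checked numerically. Since AIC = 2k - 2 log L and BIC = k ln(n) - 2 log L, the difference BIC - AIC = k (ln n - 2) reveals how many parameters k each fit was charged for. The sketch below assumes n = 228 (the length of the remainder series) and uses the AIC and BIC values printed above, labelling the second fit ARMA(6,6) after its section heading.

```python
import math

n_obs = 228  # assumed sample size: length of the remainder series

# (AIC, BIC) copied from the two fits reported above.
fits = {
    "ARMA(6,0)": (-797.318221783803, -769.883456752167),
    "ARMA(6,6)": (-809.021962932916, -761.011124127554),
}

def implied_parameters(aic, bic, n):
    """Solve BIC - AIC = k * (ln n - 2) for the parameter count k."""
    return (bic - aic) / (math.log(n) - 2.0)

def log_likelihood(aic, k):
    """Invert AIC = 2k - 2 * loglik."""
    return k - aic / 2.0

summary = {}
for name, (aic, bic) in fits.items():
    k = implied_parameters(aic, bic, n_obs)
    summary[name] = {"k": k, "loglik": log_likelihood(aic, round(k))}
    print(f"{name}: k = {k:.3f}, loglik = {summary[name]['loglik']:.2f}")

# Lower is better for both criteria.
best_by_aic = min(fits, key=lambda m: fits[m][0])
best_by_bic = min(fits, key=lambda m: fits[m][1])
print("AIC prefers", best_by_aic, "; BIC prefers", best_by_bic)
```

The implied counts come out at 8 and 14, which is consistent with 6 and 12 ARMA coefficients plus an intercept and the innovation variance. BIC penalises the six extra MA terms more heavily than AIC, which is why the two criteria point in different directions here.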

Best model based on AIC:

Here, we select (p,q) such that the AIC of the fitted ARMA(p,q) model is lowest.

##         ar1         ar2         ar3         ar4         ma1   intercept
##  1.16435987 -0.27027157 -0.14481423 -0.01397736 -0.95902764  1.00047431

The fitted model has four AR coefficients and one MA coefficient, so the model selected based on AIC is
ARMA(4,1).

Best model based on BIC:

Here, we select (p,q) such that the BIC of the fitted ARMA(p,q) model is lowest.

##         ar1         ar2         ar3         ar4         ma1   intercept
##  1.16435987 -0.27027157 -0.14481423 -0.01397736 -0.95902764  1.00047431

The model selected based on BIC is also ARMA(4,1).

Hence, for the model ARMA(4,1), the results of the tests of the hypothesis that the residuals are iid noise,
together with the ACF, PACF, residual and Normal Q-Q plots, are shown below:

## Null hypothesis: Residuals are iid noise.
## Test              Distribution               Statistic  p-value
## Ljung-Box Q       Q ~ chisq(20)                   7.82    0.993
## McLeod-Li Q       Q ~ chisq(20)                  18.77   0.5369
## Turning points T  (T-150.7)/6.3 ~ N(0,1)           152   0.8335
## Diff signs S      (S-113.5)/4.4 ~ N(0,1)           112   0.7313
## Rank P            (P-12939)/575.7 ~ N(0,1)       13122   0.7506

[Figure: diagnostic plots for the ARMA(4,1) residuals. Panels: ACF, PACF, Residuals against Time, Normal Q-Q Plot]

## [1] "The AIC of the ARMA(4,1) model is -811.479590103413"

## [1] "The BIC of the ARMA(4,1) model is -787.474170700732"

Note that both the AIC and BIC are lower for this model than for the previous models, and that none of the 5
tests rejects the hypothesis that the residuals are iid noise.

Fitting SARIMA model

In this section, we model the visitors data directly with a SARIMA model, without first removing the trend
and seasonal components. The following is the fitted SARIMA model:

## Series: visitors
## ARIMA(1,0,1)(0,1,2)[12] with drift
##
## Coefficients:
##          ar1      ma1     sma1    sma2   drift
##       0.8968  -0.3187  -0.7110  0.1461  1.4820
## s.e.  0.0379   0.0804   0.0753  0.0723  0.2667
##
## sigma^2 = 279.9: log likelihood = -966.83
## AIC=1945.66 AICc=1946.04 BIC=1966.24

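The fitted model is SARIMA(1,0,1)(0,1,2)[12] with drift, i.e.,

$$(1 - B^{12})(1 - \phi_1 B)(X_t - ct) = (1 + \Theta_1 B^{12} + \Theta_2 B^{24})(1 + \theta_1 B) Z_t$$

where $\{X_t\}$ denotes the visitors data indexed by time, $B$ is the backshift operator, $c$ is the drift
coefficient and $\{Z_t\}$ is a WhiteNoise$(0, \sigma^2)$ process. The moving-average polynomials are written
with plus signs so that the estimated coefficients and $\sigma^2$ can be read directly from the output above.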
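The role of the seasonal differencing operator (1 - B^12) in the model above can be illustrated on a toy series. The sketch below builds a hypothetical series consisting of a linear trend with slope c per time step plus an exactly repeating 12-month pattern, and applies the lag-12 difference. The slope of 1.5 and the sinusoidal pattern are illustrative assumptions, not estimates from the visitors data.

```python
import math

def seasonal_difference(x, lag=12):
    """Apply (1 - B^lag): subtract the value observed `lag` steps earlier."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]

# Hypothetical series: slope c per month plus a fixed monthly pattern.
c = 1.5
pattern = [10.0 * math.sin(2 * math.pi * m / 12) for m in range(12)]
series = [c * t + pattern[t % 12] for t in range(60)]

diffed = seasonal_difference(series, lag=12)
print(len(diffed), round(min(diffed), 6), round(max(diffed), 6))
```

The repeating pattern cancels completely and the linear trend collapses to the constant 12c, so one seasonal difference is enough to remove both a stable seasonal pattern and a linear trend. If the drift is read as a slope per month, which is an assumption about the fitting routine's convention, this is why a drift term can accompany a single seasonal difference.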
To test the hypothesis that the residuals follow iid noise, the Portmanteau tests are used once again and the
results are as follows:

## Null hypothesis: Residuals are iid noise.
## Test              Distribution               Statistic  p-value
## Ljung-Box Q       Q ~ chisq(20)                  14.67   0.7949
## McLeod-Li Q       Q ~ chisq(20)                 104.61   0 *
## Turning points T  (T-158.7)/6.5 ~ N(0,1)           160   0.8377
## Diff signs S      (S-119.5)/4.5 ~ N(0,1)           114   0.2197
## Rank P            (P-14340)/621.6 ~ N(0,1)       14009   0.5944

[Figure: diagnostic plots for the SARIMA residuals. Panels: ACF, PACF, Residuals against Time (1985-2005), Normal Q-Q Plot]

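We see that 4 out of 5 tests do not reject the hypothesis that the residuals are iid noise. The exception is
the McLeod-Li test (p-value reported as 0), which is based on the squared residuals and therefore points to
remaining dependence in their magnitude, such as non-constant variance.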
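The Ljung-Box and McLeod-Li rows of the table above use the same statistic, Q = n(n+2) Σ ρ_k² / (n-k) summed over lags 1 to h. Ljung-Box applies it to the residuals and McLeod-Li to their squares. The sketch below shows how the two can disagree, using a hypothetical series with random signs but a magnitude that switches between calm and volatile blocks. This is an illustrative construction, not the SARIMA residuals.

```python
import random

def acf(x, max_lag):
    """Sample autocorrelations at lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
            for k in range(max_lag + 1)]

def ljung_box(x, h=20):
    """Ljung-Box Q statistic over lags 1..h."""
    n = len(x)
    rho = acf(x, h)
    return n * (n + 2) * sum(rho[k] ** 2 / (n - k) for k in range(1, h + 1))

def mcleod_li(x, h=20):
    """McLeod-Li statistic: Ljung-Box applied to the squared series."""
    return ljung_box([v * v for v in x], h)

# Hypothetical series: uncorrelated signs, magnitude alternating in blocks of 20.
random.seed(0)
x = []
for t in range(240):
    scale = 3.0 if (t // 20) % 2 else 0.3
    x.append(random.choice([-1.0, 1.0]) * scale)

q_lb = ljung_box(x)
q_ml = mcleod_li(x)
print(round(q_lb, 1), round(q_ml, 1))
```

In this construction the levels are uncorrelated, so the Ljung-Box statistic stays moderate, while the squares follow a block pattern and the McLeod-Li statistic becomes very large. A significant McLeod-Li result alongside a non-significant Ljung-Box result is the same qualitative pattern as in the SARIMA table.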

Conclusion

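• We have de-trended and de-seasonalized the given time series data using a multiplicative decomposition.
• The remainder (or the random noise) component is modelled as ARMA(4,1). None of the five tests rejects the
hypothesis that its residuals are iid noise.
• The original data has been modelled as SARIMA(1,0,1)(0,1,2)[12] with drift. Four of the five tests do not
reject the hypothesis that its residuals are iid noise; the McLeod-Li test rejects it.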
