Project Gas Ravneet PDF

Contents

Project Introduction
Exploratory Analysis of the data
    Data Loading
    Data Summary and Visualization
Decomposed Time Series Analysis
Data Preparation
Check for stationary time series
    Dickey–Fuller test
    ACF and PACF Test
Model Building
    Auto Arima
    Arima
    Final Model

Project Introduction:

Business Scenario:
Download the forecast package in R. The package contains methods and tools for
displaying and analyzing univariate time series forecasts, including exponential
smoothing via state space models and automatic ARIMA modelling.

Explore the gas (Australian monthly gas production) dataset in the forecast
package to do the following:

• Read the data as a time series object in R. Plot the data.
• What do you observe? Which components of the time series are present in this dataset?
• What is the periodicity of the dataset?
• Is the time series stationary? Inspect visually as well as conduct an ADF test. Write down the null
and alternate hypotheses for the stationarity test. De-seasonalise the series if seasonality is present.
• Develop an ARIMA model to forecast the next 12 periods. Use both manual and auto.arima (show &
explain all the steps).
• Report the accuracy of the model.

Exploratory Analysis of the data:

Data Loading
The gas dataset ships with the forecast package, so loading the package makes the data available.

library(forecast)

## store gas data into a dataset

data<- gas

## check the class of the dataset to ensure it is a time series data set
class(data)
[1] "ts"

## Find the start and end of the series, frequency and cycle
> ## start of the series
> start(data)
[1] 1956 1
>
> ## end of the series
> end(data)
[1] 1995 8
>
> ## frequency of the series
> frequency(data)
[1] 12
> ## cycle of the series
> cycle(data)
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1956   1   2   3   4   5   6   7   8   9  10  11  12
1957   1   2   3   4   5   6   7   8   9  10  11  12
1958   1   2   3   4   5   6   7   8   9  10  11  12
1959   1   2   3   4   5   6   7   8   9  10  11  12
1960   1   2   3   4   5   6   7   8   9  10  11  12
1961   1   2   3   4   5   6   7   8   9  10  11  12
1962   1   2   3   4   5   6   7   8   9  10  11  12
1963   1   2   3   4   5   6   7   8   9  10  11  12
1964   1   2   3   4   5   6   7   8   9  10  11  12
1965   1   2   3   4   5   6   7   8   9  10  11  12
1966   1   2   3   4   5   6   7   8   9  10  11  12
1967   1   2   3   4   5   6   7   8   9  10  11  12
1968   1   2   3   4   5   6   7   8   9  10  11  12
1969   1   2   3   4   5   6   7   8   9  10  11  12
1970   1   2   3   4   5   6   7   8   9  10  11  12
1971   1   2   3   4   5   6   7   8   9  10  11  12
1972   1   2   3   4   5   6   7   8   9  10  11  12
1973   1   2   3   4   5   6   7   8   9  10  11  12
1974   1   2   3   4   5   6   7   8   9  10  11  12
1975   1   2   3   4   5   6   7   8   9  10  11  12
1976   1   2   3   4   5   6   7   8   9  10  11  12
1977   1   2   3   4   5   6   7   8   9  10  11  12
1978   1   2   3   4   5   6   7   8   9  10  11  12
1979   1   2   3   4   5   6   7   8   9  10  11  12
1980   1   2   3   4   5   6   7   8   9  10  11  12
1981   1   2   3   4   5   6   7   8   9  10  11  12
1982   1   2   3   4   5   6   7   8   9  10  11  12
1983   1   2   3   4   5   6   7   8   9  10  11  12
1984   1   2   3   4   5   6   7   8   9  10  11  12
1985   1   2   3   4   5   6   7   8   9  10  11  12
1986   1   2   3   4   5   6   7   8   9  10  11  12
1987   1   2   3   4   5   6   7   8   9  10  11  12
1988   1   2   3   4   5   6   7   8   9  10  11  12
1989   1   2   3   4   5   6   7   8   9  10  11  12
1990   1   2   3   4   5   6   7   8   9  10  11  12
1991   1   2   3   4   5   6   7   8   9  10  11  12
1992   1   2   3   4   5   6   7   8   9  10  11  12
1993   1   2   3   4   5   6   7   8   9  10  11  12
1994   1   2   3   4   5   6   7   8   9  10  11  12
1995   1   2   3   4   5   6   7   8

The frequency of the data is 12, which means that it is a monthly series.
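
As a quick sanity check on these attributes, the sketch below rebuilds the same monthly calendar on a placeholder series and confirms the end point and frequency reported above. The values 1..476 are illustrative, not the gas figures; 476 is the number of months from January 1956 to August 1995.

```r
# Placeholder monthly series: 39 full years (1956-1994) plus 8 months of 1995
n_months <- 39 * 12 + 8                      # 476 observations
x <- ts(seq_len(n_months), start = c(1956, 1), frequency = 12)

start(x)            # 1956 1
end(x)              # 1995 8
frequency(x)        # 12
cycle(x)[n_months]  # 8, i.e. the last observation falls in August
```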

Data Summary and Visualization


> ## Summary of the data
> summary(data)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1646    2675   16788   21415   38629   66600

> ## Plot of data


> plot.ts(data, main = "Monthly Gas Production in Australia", xlab = "Time",
ylab = "Gas Production")
Starting around 1970, gas production began to increase, and by 1980 it had become highly variable
from month to month. The aim of our model is to help forecast and understand how Australian gas
production will continue to change in the years following 1995.
> ## Quarter and Year Level
> data.qtr <- aggregate(data, nfrequency=4)
> data.yr <- aggregate(data, nfrequency=1)
> ## Plot
> plot.ts(data.qtr, main = "Quarterly Gas Production in Australia", xlab = "Time", ylab = "Gas Production")
> plot.ts(data.yr, main = "Yearly Gas Production in Australia", xlab = "Time", ylab = "Gas Production")
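
For reference, aggregate() on a ts object sums the observations within each new period by default (FUN = sum) and drops a trailing incomplete period without warning. The toy series below is hypothetical and only illustrates that behaviour. For gas, which ends in August 1995, it means the yearly series stops at 1994.

```r
# Hypothetical monthly series: 20 observations starting January 2000
x <- ts(1:20, start = c(2000, 1), frequency = 12)

# Quarterly totals: groups of 3 months; the last 2 months (an incomplete
# quarter) are dropped
x_qtr <- aggregate(x, nfrequency = 4)
x_qtr                       # 6 15 24 33 42 51

# Yearly totals: only the one complete year (2000) is kept
x_yr <- aggregate(x, nfrequency = 1)
x_yr                        # 78

# Use FUN = mean for the average monthly value per period instead of totals
x_qtr_mean <- aggregate(x, nfrequency = 4, FUN = mean)
```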

> ## Seasonality Plot of data


> seasonplot(data, year.labels = TRUE, year.labels.left=TRUE,
+ pch=19, main = "Monthly Gas Production in Australia", xlab = "Time", ylab = "Gas Production")
Decomposed Time Series Analysis
Decomposing the observed series into its component parts (trend, seasonality, and random variation)
shows that gas production is trending upward with a clear seasonal pattern that repeats every 12
months. We used this insight when constructing our seasonal ARIMA model.
> ## Decomposed Time Series
> gasdecomp<-ts(data, frequency=12, start=c(1956,1))
> TScomp <- decompose(gasdecomp)
> plot(TScomp)
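
decompose() uses an additive model by default, in which observed = trend + seasonal + random. A minimal sketch on a synthetic series checks that identity and shows that the estimated seasonal figure is centred on zero. The trend slope and seasonal pattern below are made-up values, not estimates from the gas data.

```r
# Synthetic monthly series: linear trend plus a fixed 12-month pattern
pattern <- c(-3, -2, -1, 0, 2, 4, 5, 3, 1, -2, -3, -4)   # sums to 0
x <- ts(100 + 0.5 * (1:72) + rep(pattern, times = 6),
        start = c(2000, 1), frequency = 12)

dc <- decompose(x)            # type = "additive" is the default

# The three components add back up to the observed series wherever the
# moving-average trend is defined (it is NA for the first and last 6 months)
rebuilt <- dc$trend + dc$seasonal + dc$random
ok <- !is.na(rebuilt)
all.equal(as.numeric(rebuilt[ok]), as.numeric(x[ok]))   # TRUE

# One seasonal effect per month, centred so that they sum to (about) zero
length(dc$figure)             # 12
sum(dc$figure)
```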
Data Preparation
Before modelling, extract the series from 1970 onwards, as this is the period in which
gas production started increasing and showing a trend.

> ## new series from 1970 onwards


> data_new <- ts(data, start=c(1970,1),end=c(1995,8), frequency=12)
>
> ### Divide data into test and train
> DataATrain <- window(data_new, start=c(1970,1), end=c(1993,12), frequency=12)
> DataATest <- window(data_new, start=c(1994,1), frequency=12)
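
One point worth checking in this step: calling ts() on an existing series with new start and end values relabels the first observations with the new dates rather than selecting observations by date, whereas window() subsets by date. The toy series below is hypothetical and only illustrates the difference. If the observations from 1970 to 1995 are what is intended, window(data, start = c(1970, 1)) would be the safer call.

```r
# Hypothetical monthly series: values 1..24 for Jan 2000 - Dec 2001
x <- ts(1:24, start = c(2000, 1), frequency = 12)

# ts() keeps the FIRST 12 values and simply stamps them with 2001 dates
relabelled <- ts(x, start = c(2001, 1), end = c(2001, 12), frequency = 12)
as.numeric(relabelled)   # 1 2 ... 12  (these are really the 2000 values)

# window() selects the observations whose dates fall in 2001
subset_2001 <- window(x, start = c(2001, 1), end = c(2001, 12))
as.numeric(subset_2001)  # 13 14 ... 24

# A date-based train/test split in the same style
train <- window(x, end = c(2001, 6))
test  <- window(x, start = c(2001, 7))
length(train); length(test)   # 18 and 6
```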

Check for stationary time series

Dickey–Fuller test
Null Hypothesis (H0): the time series has a unit root, i.e. it is non-stationary and has some
time-dependent structure.
Alternate Hypothesis (H1): the time series does not have a unit root, i.e. it is stationary and
has no time-dependent structure.

p-value > 0.05: Fail to reject the null hypothesis (H0); the data has a unit root and is
non-stationary.
p-value <= 0.05: Reject the null hypothesis (H0); the data does not have a unit root and
is stationary.
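
The decision rule above can be written as a small helper. The function name adf_decision and the 0.05 default are illustrative choices; the two p-values passed in are the ones printed by the ADF tests in this report (0.99 for the raw series, 0.01 for the differenced one).

```r
# Hypothetical helper: map an ADF p-value to the decision described above
adf_decision <- function(p_value, alpha = 0.05) {
  if (p_value > alpha) {
    "retain H0: unit root, series treated as non-stationary"
  } else {
    "reject H0: no unit root, series treated as stationary"
  }
}

adf_decision(0.99)   # raw series         -> retain H0
adf_decision(0.01)   # differenced series -> reject H0
```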

> ## dickey-fuller test


> library(tseries)
> adf.test(data_new)
Augmented Dickey-Fuller Test

data: data_new
Dickey-Fuller = 0.73972, Lag order = 6, p-value = 0.99
alternative hypothesis: stationary

Warning message:
In adf.test(data_new) : p-value greater than printed p-value

With a p-value of 0.99 (greater than 0.05), the null hypothesis is retained: the gas series is
non-stationary, i.e. the average gas production changes through time.
Because an ARIMA model requires a stationary series, apply a first-difference transformation
and check from the plot whether the differenced series looks stationary.
> diff_data_new <- diff(data_new)
> plot(diff_data_new)
From the plot, the differenced series looks stationary. Perform the Dickey–Fuller test on the
differenced series to confirm this.

> adf.test(diff_data_new)

Augmented Dickey-Fuller Test


data: diff_data_new
Dickey-Fuller = -15.575, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary

Warning message:
In adf.test(diff_data_new) : p-value smaller than printed p-value

With a p-value of 0.01 (less than 0.05), the ADF test rejects the null hypothesis.

The differenced series appears to be stationary in mean and variance, as the level
of the series stays roughly constant over time, and the variance of the series appears
roughly constant over time.
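
Because the task also asks for de-seasonalising, it may help to see how a seasonal difference (lag 12) differs from the first difference used above. The sketch below uses a synthetic series with made-up trend and seasonal values, not the gas data: a lag-12 difference removes a fixed monthly pattern, and a further first difference removes what is left of a linear trend. This is the combination of d = 1 and D = 1 in a seasonal ARIMA model.

```r
pattern <- c(-3, -2, -1, 0, 2, 4, 5, 3, 1, -2, -3, -4)
x <- ts(100 + 0.5 * (1:72) + rep(pattern, times = 6),
        start = c(2000, 1), frequency = 12)

# Seasonal difference: x[t] - x[t-12]; the monthly pattern cancels and
# the linear trend leaves a constant 12 * 0.5 = 6
sdiff <- diff(x, lag = 12)
range(sdiff)                 # 6 6

# A further first difference removes that constant
both <- diff(sdiff)
range(both)                  # 0 0

# The first difference alone still contains the seasonal pattern
fdiff <- diff(x)
```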

ACF and PACF Test:

Plot the ACF and PACF of the differenced series to check for remaining autocorrelation.


> acf(diff_data_new)
> pacf(diff_data_new)

The above plots show significant lags, i.e. the differenced series is still autocorrelated.
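
For context, the dashed lines drawn by acf() and pacf() are approximate 95% bounds at plus or minus qnorm(0.975)/sqrt(n), and a lag is usually called significant when its bar crosses them. A small sketch, assuming n = 307 (the 308 monthly values from January 1970 to August 1995, less one lost to differencing); the helper significant_lags and the toy series are hypothetical.

```r
n <- 308 - 1                      # length of diff_data_new
bound <- qnorm(0.975) / sqrt(n)   # approximate 95% significance bound
round(bound, 3)                   # 0.112

# Hypothetical helper: which lags of a series exceed the bound?
significant_lags <- function(x, lag.max = 24) {
  a <- acf(x, lag.max = lag.max, plot = FALSE)
  r <- as.numeric(a$acf)[-1]      # drop lag 0, which is always 1
  which(abs(r) > qnorm(0.975) / sqrt(a$n.used))
}

# Example on a strongly periodic toy series: lag 12 is flagged
toy <- ts(rep(c(1, 3, 2, 5, 4, 6, 8, 7, 9, 12, 10, 11), times = 10),
          frequency = 12)
12 %in% significant_lags(toy)     # TRUE
```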

Model Building

Auto Arima:
Autoregressive Integrated Moving Average (ARIMA) models include an explicit statistical
model for the irregular component of a time series, that allows for non-zero
autocorrelations in the irregular component. ARIMA models are defined for stationary time
series.

As the series is seasonal, seasonal=TRUE is set in the auto.arima function.

> ## ARIMA Model


> TSdata.arima.fit.train <- auto.arima(DataATrain, seasonal=TRUE)
> TSdata.arima.fit.train
Series: DataATrain
ARIMA(1,1,1)(0,1,1)[12]

Coefficients:
         ar1      ma1     sma1
      0.5489  -0.8076  -0.4130
s.e.  0.1061   0.0698   0.0581

sigma^2 estimated as 259078: log likelihood=-2103.9
AIC=4215.79 AICc=4215.94 BIC=4230.26
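
Written out with the backshift operator B (where B y_t = y_{t-1}), and using R's sign convention for moving-average terms, the selected ARIMA(1,1,1)(0,1,1)[12] model corresponds to the equation below. Here y_t is monthly gas production and ε_t is white noise; the coefficient values are the estimates printed above.

```latex
(1 - \phi_1 B)(1 - B)(1 - B^{12})\, y_t
  = (1 + \theta_1 B)(1 + \Theta_1 B^{12})\, \varepsilon_t,
\qquad \phi_1 = 0.5489,\quad \theta_1 = -0.8076,\quad \Theta_1 = -0.4130
```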

Arima:
> fit <- Arima(DataATrain, c(1, 1, 1),seasonal = list(order = c(0, 1, 1), period = 12))
>
> fit
Series: DataATrain
ARIMA(1,1,1)(0,1,1)[12]

Coefficients:
         ar1      ma1     sma1
      0.5489  -0.8076  -0.4130
s.e.  0.1061   0.0698   0.0581

sigma^2 estimated as 259078: log likelihood=-2103.9


AIC=4215.79 AICc=4215.94 BIC=4230.26

> ## main is a plot() argument; lines() only overlays the fitted values
> plot(fit$x,col="blue",main="Production: Actual vs Forecast")
> lines(fit$fitted,col="red")
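
To report accuracy on the hold-out period, the fitted model's forecasts can be compared with DataATest. The helper names rmse and mape below are illustrative, and the numbers are made-up values used only to show the arithmetic. With the objects from this report, forecast(fit, h = length(DataATest)) followed by accuracy() from the forecast package reports the same measures for the training and test sets.

```r
# Hypothetical helpers for two common accuracy measures
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
mape <- function(actual, predicted) mean(abs((actual - predicted) / actual)) * 100

# Made-up numbers, for illustration only
actual    <- c(100, 200, 400)
predicted <- c(110, 190, 400)

rmse(actual, predicted)   # sqrt(200 / 3), about 8.16
mape(actual, predicted)   # 5 (percent)

# With the objects from this report (requires the forecast package):
# fc <- forecast(fit, h = length(DataATest))
# accuracy(fc, DataATest)   # rows: Training set, Test set
```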

Final Model:
In time series, model creation is a two-step process.
In the first step, the data is divided into train and test samples. The model is fitted on the
train dataset and validated on the test sample.
Once the model is finalized, the final model is refitted on the complete dataset. The output
of this model is used for a real forecast, i.e. a forecast for an unknown period.
> ## Forecasting using the entire data set
> Final_model <- auto.arima(data_new, seasonal=TRUE)
> Final_model
Series: data_new
ARIMA(1,1,0)(0,1,2)[12]

Coefficients:
          ar1     sma1    sma2
      -0.3260  -0.4403  0.1176
s.e.   0.0564   0.0716  0.0705

sigma^2 estimated as 401079: log likelihood=-2321.35


AIC=4650.69 AICc=4650.83 BIC=4665.44

> Box.test(Final_model$residuals, type = c("Ljung-Box"))

Box-Ljung test

data: Final_model$residuals
X-squared = 2.9134e-05, df = 1, p-value = 0.9957
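
Box.test() defaults to lag = 1, which is why the output above shows df = 1. A residual check over more lags, with the degrees of freedom reduced by the number of estimated coefficients via fitdf, is often more informative. The sketch below uses a simulated AR(1) series and base R's arima() so that it runs on its own. The lag of 24 is an assumption; for Final_model the analogous call would use fitdf = 3 (ar1, sma1, sma2).

```r
set.seed(1)                                  # simulated data, for illustration only
x <- arima.sim(model = list(ar = 0.5), n = 200)
fit_sim <- arima(x, order = c(1, 0, 0))

# Ljung-Box test on the residuals over 24 lags; one AR coefficient was
# estimated, so fitdf = 1 and the test has 24 - 1 = 23 degrees of freedom
lb <- Box.test(residuals(fit_sim), lag = 24, type = "Ljung-Box", fitdf = 1)
lb$parameter                                 # df = 23
lb$p.value                                   # a large value would indicate no leftover autocorrelation

# Analogous call for the model in this report:
# Box.test(residuals(Final_model), lag = 24, type = "Ljung-Box", fitdf = 3)
```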

For 12 months forecast:


> ## 12 months forecast
> Final_forecast <- forecast(Final_model, h=12)
> plot(Final_forecast)
The ARIMA model selected on the entire dataset, ARIMA(1,1,0)(0,1,2)[12], differs from the one
selected on the training set, ARIMA(1,1,1)(0,1,1)[12], which suggests that the seasonality and
trend in the later periods differ from those in the earlier ones. Hence, rather than using the
data from 1970, a refined model should be built using the period from 1980 onwards.
