Project Gas Ravneet PDF
Business Scenario:
Install the forecast package in R. The package provides methods and tools for
displaying and analyzing univariate time series forecasts, including exponential
smoothing via state space models and automatic ARIMA modelling.
Data Loading
The gas dataset ships with the forecast package, so loading the package makes the data available.
library(forecast)
data <- gas
## check the class of the dataset to ensure it is a time series data set
class(data)
[1] "ts"
## Find the start and end of the series, frequency and cycle
> ## start of the series
> start(data)
[1] 1956 1
>
> ## end of the series
> end(data)
[1] 1995 8
>
> ## frequency of the series
> frequency(data)
[1] 12
> ## cycle of the series
> cycle(data)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1956 1 2 3 4 5 6 7 8 9 10 11 12
1957 1 2 3 4 5 6 7 8 9 10 11 12
1958 1 2 3 4 5 6 7 8 9 10 11 12
1959 1 2 3 4 5 6 7 8 9 10 11 12
1960 1 2 3 4 5 6 7 8 9 10 11 12
1961 1 2 3 4 5 6 7 8 9 10 11 12
1962 1 2 3 4 5 6 7 8 9 10 11 12
1963 1 2 3 4 5 6 7 8 9 10 11 12
1964 1 2 3 4 5 6 7 8 9 10 11 12
1965 1 2 3 4 5 6 7 8 9 10 11 12
1966 1 2 3 4 5 6 7 8 9 10 11 12
1967 1 2 3 4 5 6 7 8 9 10 11 12
1968 1 2 3 4 5 6 7 8 9 10 11 12
1969 1 2 3 4 5 6 7 8 9 10 11 12
1970 1 2 3 4 5 6 7 8 9 10 11 12
1971 1 2 3 4 5 6 7 8 9 10 11 12
1972 1 2 3 4 5 6 7 8 9 10 11 12
1973 1 2 3 4 5 6 7 8 9 10 11 12
1974 1 2 3 4 5 6 7 8 9 10 11 12
1975 1 2 3 4 5 6 7 8 9 10 11 12
1976 1 2 3 4 5 6 7 8 9 10 11 12
1977 1 2 3 4 5 6 7 8 9 10 11 12
1978 1 2 3 4 5 6 7 8 9 10 11 12
1979 1 2 3 4 5 6 7 8 9 10 11 12
1980 1 2 3 4 5 6 7 8 9 10 11 12
1981 1 2 3 4 5 6 7 8 9 10 11 12
1982 1 2 3 4 5 6 7 8 9 10 11 12
1983 1 2 3 4 5 6 7 8 9 10 11 12
1984 1 2 3 4 5 6 7 8 9 10 11 12
1985 1 2 3 4 5 6 7 8 9 10 11 12
1986 1 2 3 4 5 6 7 8 9 10 11 12
1987 1 2 3 4 5 6 7 8 9 10 11 12
1988 1 2 3 4 5 6 7 8 9 10 11 12
1989 1 2 3 4 5 6 7 8 9 10 11 12
1990 1 2 3 4 5 6 7 8 9 10 11 12
1991 1 2 3 4 5 6 7 8 9 10 11 12
1992 1 2 3 4 5 6 7 8 9 10 11 12
1993 1 2 3 4 5 6 7 8 9 10 11 12
1994 1 2 3 4 5 6 7 8 9 10 11 12
1995 1 2 3 4 5 6 7 8
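The start, end and frequency reported above determine the length of the series. As a quick consistency check, the sketch below rebuilds the same monthly time index on placeholder values and counts the observations. The object `demo` and its values are illustrative only; they are not the real gas figures.

```r
# 39 complete years (1956-1994) plus January-August 1995
n_obs <- (1995 - 1956) * 12 + 8

# Placeholder values on the same monthly index as the gas series
demo <- ts(seq_len(n_obs), start = c(1956, 1), frequency = 12)

start(demo)      # first period: 1956, month 1
end(demo)        # last period: 1995, month 8
frequency(demo)  # 12 observations per year
length(demo)     # 476 observations
```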
Dickey–Fuller test
Null Hypothesis (H0): the time series has a unit root, i.e. it is non-stationary and has
some time-dependent structure.
Alternate Hypothesis (H1): the time series does not have a unit root, i.e. it is stationary
and has no time-dependent structure.
p-value > 0.05: Retain the null hypothesis (H0); the data has a unit root and is
non-stationary.
p-value <= 0.05: Reject the null hypothesis (H0); the data does not have a unit root and
is stationary.
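The decision rule above can be written as a small function. This is a minimal sketch: `adf_decision` is a hypothetical helper introduced here for illustration, and the 0.05 threshold is the one stated in the text.

```r
# Hypothetical helper: map an ADF p-value to the decision rule above
adf_decision <- function(p_value, alpha = 0.05) {
  if (p_value > alpha) {
    "retain H0: unit root, non-stationary"
  } else {
    "reject H0: no unit root, stationary"
  }
}

adf_decision(0.99)  # "retain H0: unit root, non-stationary"
adf_decision(0.01)  # "reject H0: no unit root, stationary"
```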
> adf.test(data_new)
data: data_new
Dickey-Fuller = 0.73972, Lag order = 6, p-value = 0.99
alternative hypothesis: stationary
Warning message:
In adf.test(data_new) : p-value greater than printed p-value
The null hypothesis is retained, so the gas data is non-stationary: the average gas
production changes over time.
ARIMA modelling requires a stationary series, so apply a difference transformation
and check from the plot whether the differenced series is stationary.
> diff_data_new <- diff(data_new)
> plot(diff_data_new)
From the plot, the series looks stationary. Run the Dickey-Fuller test on the differenced
series to confirm.
> adf.test(diff_data_new)
Warning message:
In adf.test(diff_data_new) : p-value smaller than printed p-value
The warning indicates that the p-value is below the smallest value the test reports (0.01),
so the null hypothesis is rejected. The differenced series appears to be stationary in mean
and variance: its level stays roughly constant over time, and so does its variance.
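To see what the difference transformation used above does, here is a minimal sketch on an illustrative series with a pure linear trend (not the gas data). `diff` subtracts the previous observation from each value, so a linear trend collapses to a constant; the `lag = 12` variant is the seasonal difference that the seasonal ARIMA models below rely on.

```r
# Illustrative series: linear trend of +2 per month (not the gas data)
trend <- ts(10 + 2 * (1:24), start = c(2000, 1), frequency = 12)

d1 <- diff(trend)             # first difference: y[t] - y[t-1]
all(d1 == 2)                  # TRUE: the trend is reduced to a constant
length(d1)                    # 23: one observation is lost

d12 <- diff(trend, lag = 12)  # seasonal difference: y[t] - y[t-12]
length(d12)                   # 12: twelve observations are lost
```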
Model Building
Auto Arima:
Autoregressive Integrated Moving Average (ARIMA) models include an explicit statistical
model for the irregular component of a time series, which allows for non-zero
autocorrelations in the irregular component. ARIMA models are defined for stationary time
series.
Because the series is seasonal, seasonal is set to TRUE in the auto.arima function.
Coefficients:
         ar1      ma1     sma1
      0.5489  -0.8076  -0.4130
s.e.  0.1061   0.0698   0.0581
sigma^2 estimated as 259078: log likelihood=-2103.9
AIC=4215.79 AICc=4215.94 BIC=4230.26
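The information criteria printed above can be checked by hand from the log likelihood. The sketch below uses the standard definitions, counting four estimated parameters (the three coefficients plus the innovation variance). The effective sample size `n_eff = 275` is an assumption: the source does not state it, and it is used here only because it reproduces the printed AICc and BIC.

```r
loglik <- -2103.9  # log likelihood as printed (rounded)
k      <- 4        # ar1, ma1, sma1 and the innovation variance sigma^2
n_eff  <- 275      # ASSUMPTION: not given in the source; back-solved from the printed BIC

aic  <- -2 * loglik + 2 * k
aicc <- aic + 2 * k * (k + 1) / (n_eff - k - 1)
bic  <- -2 * loglik + k * log(n_eff)

round(c(AIC = aic, AICc = aicc, BIC = bic), 2)
```

Because the log likelihood is only printed to one decimal place, the results agree with the reported values to within about 0.01 rather than exactly.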
Arima:
> fit <- Arima(DataATrain, c(1, 1, 1), seasonal = list(order = c(0, 1, 1), period = 12))
>
> fit
Series: DataATrain
ARIMA(1,1,1)(0,1,1)[12]
Coefficients:
         ar1      ma1     sma1
      0.5489  -0.8076  -0.4130
s.e.  0.1061   0.0698   0.0581
> plot(fit$x, col = "blue", main = "Production: Actual vs Forecast")
> lines(fitted(fit), col = "red")
Final Model:
In time series forecasting, model creation is a two-step process.
In the first step, the data is divided into train and test samples. The model is built on the
train dataset and validated on the test sample.
Once the model is finalized, it is refit on the complete dataset. The output
of this final model is used for the real forecast, i.e. the forecast for an unknown period.
> ## Forecasting using the entire data set
> Final_model <- auto.arima(data_new, seasonal=TRUE)
> Final_model
Series: data_new
ARIMA(1,1,0)(0,1,2)[12]
Coefficients:
          ar1     sma1    sma2
      -0.3260  -0.4403  0.1176
s.e.   0.0564   0.0716  0.0705
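The two-step process described above can be sketched end to end with base R only. This is an illustration under stated assumptions, not the analysis from this report: it uses the built-in `AirPassengers` series instead of the gas data, the base `arima` and `predict` functions instead of the forecast package, a fixed seasonal specification instead of an automatically selected one, and an arbitrary two-year hold-out.

```r
y <- log(AirPassengers)  # built-in monthly series, 1949-1960, on the log scale

# Step 1: hold out the last two years and validate on them
train <- window(y, end = c(1958, 12))
test  <- window(y, start = c(1959, 1))
spec  <- list(order = c(0, 1, 1), period = 12)  # illustrative seasonal part

fit_train <- arima(train, order = c(0, 1, 1), seasonal = spec)
pred <- predict(fit_train, n.ahead = length(test))$pred

# Out-of-sample mean absolute percentage error on the original scale
mape <- mean(abs((exp(test) - exp(pred)) / exp(test))) * 100
mape

# Step 2: refit the chosen specification on the full series and forecast
fit_full <- arima(y, order = c(0, 1, 1), seasonal = spec)
future   <- exp(predict(fit_full, n.ahead = 12)$pred)
start(future)  # first period after the data ends
```

Validating on held-out data before refitting keeps the accuracy estimate honest, while the final refit lets the forecast use the most recent observations.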
Box-Ljung test
data: Final_model$residuals
X-squared = 2.9134e-05, df = 1, p-value = 0.9957
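The reported p-value of 0.9957 is far above 0.05, so the test gives no evidence of lag-1 autocorrelation in the residuals of the final model. Output of this form is produced by `Box.test` with `type = "Ljung-Box"`; the sketch below runs it on simulated white noise, which stands in for the model residuals and is illustrative only.

```r
set.seed(1)
res <- rnorm(200)  # simulated white-noise "residuals" (illustrative only)

bt <- Box.test(res, lag = 1, type = "Ljung-Box")
bt$parameter       # df = 1, matching a lag-1 test
bt$p.value         # a large value gives no evidence of autocorrelation
```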