TimeSeries Report
For this assignment, sales data for two types of wine sold in the 20th century are to
be analysed. Both series come from the same company but cover different wines. As an
analyst at ABC Estate Wines, you are tasked with analysing and forecasting Wine Sales
in the 20th century.
Solution:
Reading the data: Sparkling
Top 5
Sparkling
YearMonth
1980-01-01 1686
1980-02-01 1591
1980-03-01 2304
1980-04-01 1712
1980-05-01 1471
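The loading step can be sketched as follows. The report does not show the actual file name or path, so an inline CSV with the first few rows stands in for it here; the key points are `parse_dates` and setting `YearMonth` as the index so the data gets a DatetimeIndex.

```python
import io
import pandas as pd

# Hypothetical inline CSV standing in for the real data file (path not shown
# in the report); only the first five rows are reproduced.
csv = io.StringIO(
    "YearMonth,Sparkling\n"
    "1980-01,1686\n1980-02,1591\n1980-03,2304\n1980-04,1712\n1980-05,1471\n"
)
# parse_dates converts the YearMonth strings to timestamps; index_col makes
# the result a time-indexed frame, as in the report's df.info() output.
df = pd.read_csv(csv, parse_dates=["YearMonth"], index_col="YearMonth")
print(df.head())
```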
Basic Info
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 187 entries, 1980-01-01 to 1995-07-01
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Sparkling 187 non-null int64
dtypes: int64(1)
memory usage: 2.9 KB
Plotting Time series
Sparkling
count 187.0
mean 2402.0
std 1295.0
min 1070.0
25% 1605.0
50% 1874.0
75% 2549.0
max 7242.0
Boxplots
Monthly Plot
There is a clear distinction in 'Sparkling' sales across the different months of the year. The
highest values are consistently recorded from October through December across the
years.
Grid Plot
Red represents the Train data and blue represents the Test data.
4) Building various models and comparing Accuracy Metrics
Model 1: Linear Regression
Metrics
Test RMSE
RegressionOnTime 1374.550202
NaiveModel 1496.444629
SimpleAverageModel 1368.746717
For the moving average model, we are going to calculate rolling means (moving averages) over
different window sizes. The best window can be determined by the maximum accuracy (that is, the
minimum error) on the test set. For the moving average, we compute the rolling mean over the entire data.
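The trailing-moving-average step can be sketched as below; the series values here are illustrative toy numbers, not the report's actual data.

```python
import pandas as pd

# Toy stand-in for the sales series (illustrative values only).
sales = pd.Series([1686.0, 1591, 2304, 1712, 1471, 2963, 2012, 1897])

for k in (2, 4, 6):
    # Trailing moving average over a window of k observations.
    trailing = sales.rolling(window=k).mean()
    # The first k-1 entries are NaN because the window is not yet full;
    # the final rolling mean is what gets carried forward as a forecast.
    print(k, trailing.iloc[-1])
```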
Test RMSE
RegressionOnTime 1374.550202
NaiveModel 1496.444629
SimpleAverageModel 1368.746717
2pointTrailingMovingAverage 811.178937
4pointTrailingMovingAverage 1184.213295
6pointTrailingMovingAverage 1337.200524
9pointTrailingMovingAverage 1422.653281
Before we go on to build the various Exponential Smoothing models, let us plot all the models
and compare the Time Series plots
Test RMSE
RegressionOnTime 1374.550202
NaiveModel 1496.444629
SimpleAverageModel 1368.746717
2pointTrailingMovingAverage 811.178937
4pointTrailingMovingAverage 1184.213295
6pointTrailingMovingAverage 1337.200524
9pointTrailingMovingAverage 1422.653281
Alpha=0.995,SimpleExponentialSmoothing 1362.348469
The higher the alpha value, the more weight is given to the recent observations; that is,
the model assumes what happened recently is likely to happen again.
We will run a loop with different alpha values to understand which value works best for alpha on
the test set.
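The alpha loop can be sketched as below. The report fits statsmodels' `SimpleExponentialSmoothing`; this pandas-only sketch uses `ewm(alpha=..., adjust=False)`, which applies the same exponential-smoothing recursion, on toy values rather than the actual series.

```python
import numpy as np
import pandas as pd

# Toy series (illustrative only); the report's loop compares RMSE on the
# held-out test set instead of these in-sample one-step-ahead forecasts.
train = pd.Series([100.0, 120, 90, 110, 130, 95, 105, 125])

best_alpha, best_rmse = None, np.inf
for alpha in np.arange(0.1, 1.0, 0.1):
    # adjust=False gives the simple exponential smoothing recursion:
    # s_t = alpha*x_t + (1-alpha)*s_{t-1}
    smoothed = train.ewm(alpha=alpha, adjust=False).mean()
    forecast = smoothed.shift(1)                  # one-step-ahead forecast
    rmse = np.sqrt(((train - forecast) ** 2).mean())
    if rmse < best_rmse:
        best_alpha, best_rmse = round(float(alpha), 1), rmse
print(best_alpha, round(best_rmse, 2))
```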
Check for stationarity of the whole Time Series data.
We see that at the 5% significance level the Time Series is non-stationary: the p-value is high,
so we are unable to reject the null hypothesis of non-stationarity.
Let us take a difference of order 1 and check whether the Time Series is stationary or not.
We see that at 𝛼 = 0.05 the Time Series is indeed stationary: the p-value is lower than 0.05,
so we can reject the null hypothesis that the series is non-stationary.
Differencing of order 1 therefore makes the time series stationary.
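The report's p-values come from the ADF test (statsmodels' `adfuller`); the differencing step itself can be sketched with pandas alone. A contrived trending series makes the effect obvious:

```python
import pandas as pd

# A series with a linear trend is non-stationary in the mean; taking a
# first difference removes that trend (the report then re-runs the ADF test
# on the differenced series).
s = pd.Series(range(12), dtype=float) * 3.0 + 5.0   # toy trending series
diff1 = s.diff().dropna()                           # difference of order 1
print(diff1.unique())   # the differenced series is a constant 3.0
```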
Training Data is till the end of 1990. Test Data is from the beginning of 1991 to the last time
stamp provided.
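The split described above can be sketched as below; a dummy monthly frame with the report's date range stands in for the actual data, and the observation counts (132 train, 55 test) match the 187 entries shown earlier.

```python
import pandas as pd

# Monthly index matching the report's range: 1980-01 through 1995-07.
idx = pd.date_range("1980-01-01", "1995-07-01", freq="MS")
df = pd.DataFrame({"Sparkling": range(len(idx))}, index=idx)  # dummy values

train = df.loc[:"1990-12-01"]   # training data: up to the end of 1990
test = df.loc["1991-01-01":]    # test data: 1991 to the last timestamp
print(len(train), len(test))    # 132 and 55 observations
```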
Training data (first five rows):
YearMonth
1980-01-01 1686
1980-02-01 1591
1980-03-01 2304
1980-04-01 1712
1980-05-01 1471
Training data (last five rows):
YearMonth
1990-08-01 1605
1990-09-01 2424
1990-10-01 3116
1990-11-01 4286
1990-12-01 6047
Test data (first five rows):
YearMonth
1991-01-01 1902
1991-02-01 2049
1991-03-01 1874
1991-04-01 1279
1991-05-01 1432
Test data (last five rows):
YearMonth
1995-03-01 1897
1995-04-01 1862
1995-05-01 1670
1995-06-01 1688
1995-07-01 2031
p-value: 0.6697444263523344
We see that the series is not stationary at 𝛼 = 0.05.
p-value: 0.0
We see that after taking a difference of order 1 the series has become stationary at 𝛼 = 0.05.
Automated version of an ARIMA model: Based on lowest Akaike Information
Criteria (AIC)
SARIMAX Results summary (auto ARIMA): sample 01-01-1980 to 12-01-1990;
Heteroskedasticity (H): 2.43, Skew: 0.61
RMSE
ARIMA(2,1,1) 2249.496264
The Auto-Regressive parameter 'p' of an ARIMA model comes from the significant lag
after which the PACF plot cuts off to 0. The Moving-Average parameter 'q' comes from the
significant lag after which the ACF plot cuts off to 0. Looking at the above
plots, both the PACF and ACF cut off at lag 1, so we choose (p, d, q) = (0, 1, 1).
                               SARIMAX Results
==============================================================================
Dep. Variable:              Sparkling   No. Observations:                  132
Model:                 ARIMA(0, 1, 1)   Log Likelihood               -1129.530
Date:                Sun, 25 Dec 2022   AIC                           2263.060
Time:                        23:17:34   BIC                           2268.810
Sample:                    01-01-1980   HQIC                          2265.397
                         - 12-01-1990
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ma.L1         -0.3419      0.063     -5.397      0.000      -0.466      -0.218
sigma2      1.805e+06   2.13e+05      8.463      0.000    1.39e+06    2.22e+06
==============================================================================
Ljung-Box (L1) (Q):           1.09   Jarque-Bera (JB):         33.41
Prob(Q):                      0.30   Prob(JB):                  0.00
Heteroskedasticity (H):       2.45   Skew:                     -0.89
Prob(H) (two-sided):          0.00   Kurtosis:                  4.71
==============================================================================
We get a comparatively simpler model by looking at the ACF and the PACF plots.
Note: when both the AR(p) and MA(q) orders are 0, we have to convert
the input variable to a 'float64' type, otherwise Python might throw an error when we try to
forecast.
RMSE
ARIMA(2,1,1) 2249.496264
ARIMA(0,1,1) 3141.313590
SARIMAX Results summary:
Model: SARIMAX(0, 1, 2)x(2, 0, 2, 12); No. Observations: 132; Log Likelihood: -771.561; Covariance Type: opg
Selected coefficients: ma.S.L24 = -1.5969 (std err 2.674, z -0.597, P>|z| 0.550, CI [-6.837, 3.644]);
sigma2 = 2.905e+04 (std err 9.49e+04, z 0.306, P>|z| 0.760, CI [-1.57e+05, 2.15e+05])
Heteroskedasticity (H): 1.45; Skew: 0.49
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
RMSE
ARIMA(2,1,1) 2249.496264
ARIMA(0,1,1) 3141.313590
SARIMA(0,1,2)(2,0,2,12) 526.539279
We see a substantial reduction in the RMSE value once the seasonal parameters are included as
well.
Setting the seasonality as 24 for the second iteration of the auto SARIMA model.
SARIMAX Results summary:
Model: SARIMAX(1, 1, 2)x(2, 0, 2, 24); Log Likelihood: -604.141; Covariance Type: opg
Heteroskedasticity (H): 1.06; Skew: 0.06
RMSE
ARIMA(2,1,1) 2249.496264
ARIMA(0,1,1) 3141.313590
SARIMA(0,1,2)(2,0,2,12) 526.539279
SARIMA(1,1,2)(2,0,2,24) 670.549644
We see that the RMSE value has not reduced further when the seasonality parameter
was changed to 24.
SARIMAX Results summary (model refitted on the full data):
Dep. Variable: Sparkling; No. Observations: 187; Model: SARIMAX(0, 1, 2)x(2, 0, 2, 12);
Log Likelihood: -1173.399; sample ends 07-01-1995; Covariance Type: opg
Heteroskedasticity (H): 0.94; Skew: 0.60
Predictions based on the models built and the RMSE scores from the various
models:
- The plots show a significant increase in sales up to 1988, followed by a downward slope and a recovery towards 1995.
- There is a significant increase in sales and a positive trend towards a more productive year in 1995.
- To increase sales, we need to identify the trend seen during 1989 and try to implement similar features in the wine.
- To increase sales, we could find the right marketing technique and build similar ways of connecting with customers as in 1989.
- Find improved ways of reaching customers via current and trending channels.
- Aim to improve the availability of the wine in stores early, from October onwards, to get consumers used to the brand and ease of access.
Solution:
Reading the data: Rose
Top 5
Rose
YearMonth
1980-01-01 112.0
1980-02-01 118.0
1980-03-01 129.0
1980-04-01 99.0
1980-05-01 116.0
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 187 entries, 1980-01-01 to 1995-07-01
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Rose 185 non-null float64
dtypes: float64(1)
memory usage: 2.9 KB
Rose
count 185.0
mean 90.0
std 39.0
min 28.0
25% 63.0
50% 86.0
75% 112.0
max 267.0
The basic measures of descriptive statistics tell us how the sales have varied across the years. But
these measures average over the whole data without taking the
time component into account.
Yearly Boxplot
Monthly Plot
There is a clear distinction in 'Rose' sales across the different months of the year. The
highest values are consistently recorded in December across the years.
This plot shows us the behaviour of the Time Series (Rose in this case) across various
months. The red line is the median value.
Red indicates the Train set and blue indicates the Test set.
For this linear regression, we regress the 'Rose' variable against the order of
occurrence, which requires modifying the training data before fitting.
Now that the training and test data have been modified, let us go ahead and use Linear Regression to
build the model on the training data and evaluate it on the test data.
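The "regression on time" idea can be sketched as below. The report uses a linear-regression fit (likely scikit-learn); this sketch uses `numpy.polyfit` for the same straight-line fit, and all the series values are illustrative toy numbers.

```python
import numpy as np

# Sketch of regression on time: the target is regressed on 1, 2, 3, ...
# (the order of occurrence). Toy values, not the report's 'Rose' series.
train_y = np.array([112.0, 118, 129, 99, 116, 120, 108, 131])
train_t = np.arange(1, len(train_y) + 1)         # time order 1, 2, 3, ...
slope, intercept = np.polyfit(train_t, train_y, deg=1)

# Test time indices continue the sequence where the training data ended.
test_t = np.arange(len(train_y) + 1, len(train_y) + 4)
test_y = np.array([70.0, 83, 65])                # illustrative test values
pred = intercept + slope * test_t
rmse = np.sqrt(np.mean((pred - test_y) ** 2))    # Test RMSE
print(round(rmse, 2))
```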
Accuracy Metrics
                          RMSE         Test RMSE
RegressionOnTime          NaN          51.004214
SARIMA(0,1,2)(2,0,2,12)   526.539279   NaN
SARIMA(1,1,2)(2,0,2,24)   670.549644   NaN
For this simple average method, we will forecast using the average of the training
values.
                          RMSE         Test RMSE
SARIMA(0,1,2)(2,0,2,12)   526.539279   NaN
SARIMA(1,1,2)(2,0,2,24)   670.549644   NaN
For the moving average model, we are going to calculate rolling means (moving
averages) over different window sizes; the best window is the one with the minimum
error on the test set. For the moving average, we compute the rolling mean over
the entire data.
                              RMSE   Test RMSE
2pointTrailingMovingAverage   NaN    12.546929
4pointTrailingMovingAverage   NaN    17.148954
6pointTrailingMovingAverage   NaN    18.039241
9pointTrailingMovingAverage   NaN    18.422860
Before we go on to build the various Exponential Smoothing models, let us plot all the
models and compare the Time Series plots.
Method 5: Simple Exponential Smoothing
Alpha=0.995,SimpleExponentialSmoothing NaN 33.098471
Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing NaN NaN
We see that at the 5% significance level the Time Series is non-stationary: the p-value is high,
so we are unable to reject the null hypothesis of non-stationarity.
Let us take a difference of order 1 and check whether the Time Series is stationary or not.
0.0
We see that at 𝛼 = 0.05 the Time Series is indeed stationary: the p-value is lower than 0.05,
so we can reject the null hypothesis that the series is non-stationary.
Differencing of order 1 therefore makes the time series stationary.
Training Data is till the end of 1990. Test Data is from the beginning of 1991 to the last
time stamp provided.
Training data (first five rows):
YearMonth
1980-01-01 112.0
1980-02-01 118.0
1980-03-01 129.0
1980-04-01 99.0
1980-05-01 116.0
Training data (last five rows):
YearMonth
1990-08-01 70.0
1990-09-01 83.0
1990-10-01 65.0
1990-11-01 110.0
1990-12-01 132.0
Test data (first five rows):
YearMonth
1991-01-01 54.0
1991-02-01 55.0
1991-03-01 66.0
1991-04-01 65.0
1991-05-01 60.0
Test data (last five rows):
YearMonth
1995-03-01 45.0
1995-04-01 52.0
1995-05-01 28.0
1995-06-01 40.0
1995-07-01 62.0
Check for stationarity of the Training Data Time Series.
p-value: 0.0
We see that after taking a difference of order 1 the series has become stationary at 𝛼 = 0.05.
                               SARIMAX Results
==============================================================================
Date:                Sun, 25 Dec 2022   AIC                           1281.871
Time:                        23:37:18   BIC                           1296.247
Sample:                    01-01-1980   HQIC                          1287.712
                         - 12-01-1990
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1         -0.4540      0.469     -0.969      0.333      -1.372       0.464
ar.L2          0.0001      0.170      0.001      0.999      -0.334       0.334
ma.L1         -0.2541      0.459     -0.554      0.580      -1.154       0.646
ma.L2         -0.5984      0.430     -1.390      0.164      -1.442       0.245
==============================================================================
RMSE
ARIMA(2,1,1) 43.201893
The Auto-Regressive parameter 'p' of an ARIMA model comes from the significant lag
after which the PACF plot cuts off to 0. The Moving-Average parameter 'q' comes from the
significant lag after which the ACF plot cuts off to 0. Looking at the above
plots, both the PACF and ACF cut off at lag 1, so our (p, d, q) values are (1, 1, 1).
                               SARIMAX Results
==============================================================================
Dep. Variable:                   Rose   No. Observations:                  132
Model:                 ARIMA(1, 1, 1)   Log Likelihood                -637.287
Date:                Sun, 25 Dec 2022   AIC                           1280.574
Time:                        23:38:42   BIC                           1289.200
Sample:                    01-01-1980   HQIC                          1284.079
                         - 12-01-1990
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.1814      0.076      2.396      0.017       0.033       0.330
ma.L1         -0.9192      0.053    -17.362      0.000      -1.023      -0.815
sigma2       972.5964     88.768     10.957      0.000     798.614    1146.579
==============================================================================
Ljung-Box (L1) (Q):           0.00   Jarque-Bera (JB):         41.59
Prob(Q):                      0.98   Prob(JB):                  0.00
Heteroskedasticity (H):       0.35   Skew:                      0.87
Prob(H) (two-sided):          0.00   Kurtosis:                  5.14
==============================================================================
We get a comparatively simpler model by looking at the ACF and the PACF plots.
Note: when both the AR(p) and MA(q) orders are 0, we have to convert
the input variable to a 'float64' type, otherwise Python might throw an error when we try to
forecast.
Predict on the Test Set & Evaluation
RMSE
ARIMA(2,1,1) 43.201893
ARIMA(1,1,1) 40.207902
We see that there is a difference in the RMSE values of the two models, but remember that
the second model is a much simpler one.
Let us look at the ACF plot once more to understand the seasonal parameters PDQ for the
SARIMA model.
We see that there can be a seasonality of 12 as well as 24. We will run our auto SARIMA
models by setting seasonality both as 12 and 24.
SARIMAX Results summary:
No. Observations: 132; Covariance Type: opg
Selected coefficients: ma.S.L12 = 0.0767 (std err 0.133, z 0.577, P>|z| 0.564, CI [-0.184, 0.337]);
ma.S.L24 = -0.0726 (std err 0.146, z -0.498, P>|z| 0.618, CI [-0.358, 0.213]);
sigma2 = 251.3137 (std err 4.77e+04, z 0.005, P>|z| 0.996, CI [-9.32e+04, 9.37e+04])
Heteroskedasticity (H): 0.88; Skew: 0.37
RMSE
ARIMA(2,1,1) 43.201893
ARIMA(1,1,1) 40.207902
SARIMA(0,1,2)(2,0,2,12) 30.038199
We see a substantial reduction in the RMSE value once the seasonal parameters are included as
well.
Setting the seasonality as 24 for the second iteration of the auto SARIMA model.
SARIMAX Results summary:
Sample ends 12-01-1990; Covariance Type: opg
sigma2 = 177.0562 (std err 0.001, z 1.72e+05, P>|z| 0.000, CI [177.054, 177.058])
Heteroskedasticity (H): 0.67; Skew: 0.56
RMSE
ARIMA(2,1,1) 43.201893
ARIMA(1,1,1) 40.207902
SARIMA(0,1,2)(2,0,2,12) 30.038199
SARIMA(1,1,2)(2,0,2,24) 23.122807
SARIMAX Results summary (model refitted on the full data):
Model: SARIMAX(0, 1, 2)x(2, 0, 2, 12); Log Likelihood: -659.342; sample ends 07-01-1995; Covariance Type: opg
sigma2 = 230.2318 (std err 23.741, z 9.698, P>|z| 0.000, CI [183.701, 276.763])
Heteroskedasticity (H): 0.59; Skew: 0.23
Training RMSE: 28.248216985592833
Please note the above RMSE is on the training-data predictions only. You cannot calculate
RMSE for the forecast, as the actual values (which have not occurred yet) are not available to be
compared with the forecasted values.
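The point above follows directly from the RMSE formula, which needs an actual value for every prediction. A minimal sketch with illustrative toy numbers:

```python
import numpy as np

# RMSE compares predictions against observed actuals, which is why it can
# only be computed on a period whose actuals exist (train or test), never on
# a future forecast. The numbers below are illustrative, not the report's.
actual = np.array([1902.0, 2049, 1874])
predicted = np.array([1800.0, 2100, 1900])
rmse = np.sqrt(np.mean((actual - predicted) ** 2))
print(round(rmse, 2))   # 67.53
```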
Predictions based on the models built and the RMSE scores from the various
models: