Download as pdf or txt
Download as pdf or txt
You are on page 1of 196

Problem:

For this particular assignment, the data of different types


of wine sales in the 20th century is to be analysed. Both
of these data are from the same company but of different
wines. As an analyst in the ABC Estate Wines, you are
tasked to analyse and forecast Wine Sales in the 20th
century.

Dataset - Rose.csv
In [1]: import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 13, 6

1. Read the data as an appropriate Time Series data and plot the
data.

In [2]: df = pd.read_csv("Rose.csv")

In [3]: df.head()

Out[3]:
YearMonth Rose

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
YearMonth Rose

0 1980-01 112.0

1 1980-02 118.0

2 1980-03 129.0

3 1980-04 99.0

4 1980-05 116.0

In [4]: df.tail()

Out[4]:
YearMonth Rose

182 1995-03 45.0

183 1995-04 52.0

184 1995-05 28.0

185 1995-06 40.0

186 1995-07 62.0

In [5]: df.plot();
plt.grid()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [6]: date = pd.date_range(start='01/01/1980', end='08/01/1995', freq='M')
date

Out[6]: DatetimeIndex(['1980-01-31', '1980-02-29', '1980-03-31', '1980-04-30',


'1980-05-31', '1980-06-30', '1980-07-31', '1980-08-31',
'1980-09-30', '1980-10-31',
...
'1994-10-31', '1994-11-30', '1994-12-31', '1995-01-31',
'1995-02-28', '1995-03-31', '1995-04-30', '1995-05-31',
'1995-06-30', '1995-07-31'],
dtype='datetime64[ns]', length=187, freq='M')

In [7]: df['Time_Stamp'] = pd.DataFrame(date,columns=['Month'])

In [8]: df.head()

Out[8]:
YearMonth Rose Time_Stamp

0 1980-01 112.0 1980-01-31

1 1980-02 118.0 1980-02-29

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
YearMonth Rose Time_Stamp

2 1980-03 129.0 1980-03-31

3 1980-04 99.0 1980-04-30

4 1980-05 116.0 1980-05-31

In [9]: df['Time_Stamp']=pd.to_datetime(df['Time_Stamp'])
df= df.set_index('Time_Stamp')
df.drop(['YearMonth'],axis=1, inplace = True)
df.head()

Out[9]:
Rose

Time_Stamp

1980-01-31 112.0

1980-02-29 118.0

1980-03-31 129.0

1980-04-30 99.0

1980-05-31 116.0

In [10]: df.plot();
plt.grid()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
2. Perform appropriate Exploratory Data Analysis to understand the
data and also perform decomposition.

In [11]: df.describe(include='all').T

Out[11]:
count mean std min 25% 50% 75% max

Rose 185.0 90.394595 39.175344 28.0 63.0 86.0 112.0 267.0

In [12]: df.isnull().sum()

Out[12]: Rose 2
dtype: int64

In [13]: df.shape

Out[13]: (187, 1)

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [14]: df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 187 entries, 1980-01-31 to 1995-07-31
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Rose 185 non-null float64
dtypes: float64(1)
memory usage: 2.9 KB

In [15]: _, ax = plt.subplots(figsize=(30,10)) #yearly plot


sns.boxplot(x = df.index.year,y = df.values[:,0],ax=ax)
plt.xlabel('Yearly Rose Wine Sale');
plt.ylabel('Year');
plt.grid();

In [16]: _, ax = plt.subplots(figsize=(22,8))
sns.boxplot(x = df.index.month_name(),y = df.values[:,0],ax=ax)
plt.xlabel('Monthly Rose Wine Sale');
plt.ylabel('Months');
plt.grid();

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [17]: from statsmodels.graphics.tsaplots import month_plot

fig, ax = plt.subplots(figsize=(22,8))

month_plot(df,ylabel='Rose Wine Production',ax=ax)


plt.grid();

In [18]: monthly_sales_across_years = pd.pivot_table(df, values = 'Rose', column


s = df.index.month_name(), index = df.index.year)
monthly_sales_across_years

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Out[18]: Time_Stamp April August December February January July June March May Novembe

Time_Stamp

1980 99.0 129.0 267.0 118.0 112.0 118.0 168.0 129.0 116.0 150.0

1981 97.0 214.0 226.0 129.0 126.0 222.0 127.0 124.0 102.0 154.0

1982 97.0 117.0 169.0 77.0 89.0 117.0 121.0 82.0 127.0 134.0

1983 85.0 124.0 164.0 108.0 75.0 109.0 108.0 115.0 101.0 135.0

1984 87.0 142.0 159.0 85.0 88.0 87.0 87.0 112.0 91.0 139.0

1985 93.0 103.0 129.0 82.0 61.0 87.0 75.0 124.0 108.0 123.0

1986 71.0 118.0 141.0 65.0 57.0 110.0 67.0 67.0 76.0 107.0

1987 86.0 73.0 157.0 65.0 58.0 87.0 74.0 70.0 93.0 96.0

1988 66.0 77.0 135.0 115.0 63.0 79.0 83.0 70.0 67.0 100.0

1989 74.0 74.0 137.0 60.0 71.0 86.0 91.0 89.0 73.0 109.0

1990 77.0 70.0 132.0 69.0 43.0 78.0 76.0 73.0 69.0 110.0

1991 65.0 55.0 106.0 55.0 54.0 96.0 65.0 66.0 60.0 74.0

1992 53.0 52.0 91.0 47.0 34.0 67.0 55.0 56.0 53.0 58.0

1993 45.0 54.0 77.0 40.0 33.0 57.0 55.0 46.0 41.0 48.0

1994 48.0 NaN 84.0 35.0 30.0 NaN 45.0 42.0 44.0 63.0

1995 52.0 NaN NaN 39.0 30.0 62.0 40.0 45.0 28.0 NaN

In [19]: monthly_sales_across_years.plot(figsize=(20,10))
plt.grid()
plt.legend(loc='best');

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [20]: df_yearly_sum = df.resample('A').sum()
df_yearly_sum.head()

Out[20]:
Rose

Time_Stamp

1980-12-31 1758.0

1981-12-31 1780.0

1982-12-31 1348.0

1983-12-31 1324.0

1984-12-31 1280.0

In [21]: df_yearly_sum.plot();
plt.grid()
plt.xlabel('Sum of the Observations of each year');

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [22]: df_yearly_mean = df.resample('Y').mean()
df_yearly_mean.head()

Out[22]:
Rose

Time_Stamp

1980-12-31 146.500000

1981-12-31 148.333333

1982-12-31 112.333333

1983-12-31 110.333333

1984-12-31 106.666667

In [23]: df_yearly_mean.plot();
plt.grid()
plt.xlabel('Mean of the Observations of each year');

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [24]: df_quarterly_sum = df.resample('Q').sum()
df_quarterly_sum.head()

Out[24]:
Rose

Time_Stamp

1980-03-31 359.0

1980-06-30 383.0

1980-09-30 452.0

1980-12-31 564.0

1981-03-31 379.0

In [25]: df_quarterly_sum.plot();
plt.grid()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [26]: df_quarterly_mean = df.resample('Q').mean()
df_quarterly_mean.head()

Out[26]:
Rose

Time_Stamp

1980-03-31 119.666667

1980-06-30 127.666667

1980-09-30 150.666667

1980-12-31 188.000000

1981-03-31 126.333333

In [27]: df_quarterly_mean.plot();
plt.grid()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [28]: df_daily_sum = df.resample('D').sum()
df_daily_sum

Out[28]:
Rose

Time_Stamp

1980-01-31 112.0

1980-02-01 0.0

1980-02-02 0.0

1980-02-03 0.0

1980-02-04 0.0

... ...

1995-07-27 0.0

1995-07-28 0.0

1995-07-29 0.0

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Rose

Time_Stamp

1995-07-30 0.0

1995-07-31 62.0

5661 rows × 1 columns

In [29]: df_daily_sum.plot()
plt.grid();

In [30]: df_decade_sum = df.resample('10Y').sum()


df_decade_sum

Out[30]:
Rose

Time_Stamp

1980-12-31 1758.0

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Rose

Time_Stamp

1990-12-31 12094.0

2000-12-31 2871.0

In [31]: df_decade_sum.plot();
plt.grid()

In [32]: # statistics
from statsmodels.distributions.empirical_distribution import ECDF
cdf = ECDF(df['Rose'])
plt.plot(cdf.x, cdf.y, label = "statmodels");
plt.grid()
plt.xlabel('Rose Wine Sales');

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [33]: # group by date and get average RetailSales, and precent change
average = df.groupby(df.index)["Rose"].mean()
pct_change = df.groupby(df.index)["Rose"].sum().pct_change()

fig, (axis1,axis2) = plt.subplots(2,1,sharex=True,figsize=(13,6))

# plot average RetailSales over time(year-month)


ax1 = average.plot(legend=True,ax=axis1,marker='o',title="Average Rose
Wine Sales",grid=True)
ax1.set_xticks(range(len(average)))
ax1.set_xticklabels(average.index.tolist())
# plot precent change for RetailSales over time(year-month)
ax2 = pct_change.plot(legend=True,ax=axis2,marker='o',colormap="summer"
,title="Rose Wine Sales Percent Change",grid=True)

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [34]: df['1994']

Out[34]:
Rose

Time_Stamp

1994-01-31 30.0

1994-02-28 35.0

1994-03-31 42.0

1994-04-30 48.0

1994-05-31 44.0

1994-06-30 45.0

1994-07-31 NaN

1994-08-31 NaN

1994-09-30 46.0

1994-10-31 51.0

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Rose

Time_Stamp

1994-11-30 63.0

1994-12-31 84.0

In [35]: df.interpolate(methods='spline',order=3,inplace=True)

In [36]: df['1994']

Out[36]:
Rose

Time_Stamp

1994-01-31 30.000000

1994-02-28 35.000000

1994-03-31 42.000000

1994-04-30 48.000000

1994-05-31 44.000000

1994-06-30 45.000000

1994-07-31 45.333333

1994-08-31 45.666667

1994-09-30 46.000000

1994-10-31 51.000000

1994-11-30 63.000000

1994-12-31 84.000000

In [37]: df.isna().sum()

Out[37]: Rose 0
dtype: int64

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [38]: from statsmodels.tsa.seasonal import seasonal_decompose

In [39]: decomposition = seasonal_decompose(df['Rose'],model='additive')


decomposition.plot();

In [40]: trend = decomposition.trend


seasonality = decomposition.seasonal
residual = decomposition.resid

print('Trend','\n',trend.head(12),'\n')
print('Seasonality','\n',seasonality.head(12),'\n')
print('Residual','\n',residual.head(12),'\n')

Trend
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
1980-07-31 147.083333
1980-08-31 148.125000
1980-09-30 148.375000
1980-10-31 148.083333
1980-11-30 147.416667
1980-12-31 145.125000
Name: trend, dtype: float64

Seasonality
Time_Stamp
1980-01-31 -27.908647
1980-02-29 -17.435632
1980-03-31 -9.285830
1980-04-30 -15.098330
1980-05-31 -10.196544
1980-06-30 -7.678687
1980-07-31 4.896908
1980-08-31 5.499686
1980-09-30 2.774686
1980-10-31 1.871908
1980-11-30 16.846908
1980-12-31 55.713575
Name: seasonal, dtype: float64

Residual
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 -33.980241
1980-08-31 -24.624686
1980-09-30 53.850314
1980-10-31 -2.955241
1980-11-30 -14.263575
1980-12-31 66.161425
Name: resid, dtype: float64

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [41]: deaseasonalized_ts = trend + residual
deaseasonalized_ts.head(12)

Out[41]: Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 113.103092
1980-08-31 123.500314
1980-09-30 202.225314
1980-10-31 145.128092
1980-11-30 133.153092
1980-12-31 211.286425
dtype: float64

In [42]: df.plot()
deaseasonalized_ts.plot()
plt.legend(["Original Time Series", "Time Series without Seasonality Co
mponent"]);

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [43]: decomposition = seasonal_decompose(df['Rose'],model='multiplicative')
decomposition.plot();

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [44]: trend = decomposition.trend
seasonality = decomposition.seasonal
residual = decomposition.resid

print('Trend','\n',trend.head(12),'\n')
print('Seasonality','\n',seasonality.head(12),'\n')
print('Residual','\n',residual.head(12),'\n')

Trend
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 147.083333
1980-08-31 148.125000
1980-09-30 148.375000
1980-10-31 148.083333
1980-11-30 147.416667
1980-12-31 145.125000
Name: trend, dtype: float64

Seasonality
Time_Stamp
1980-01-31 0.670111
1980-02-29 0.806163
1980-03-31 0.901164
1980-04-30 0.854024
1980-05-31 0.889415
1980-06-30 0.923985
1980-07-31 1.058038
1980-08-31 1.035881
1980-09-30 1.017648
1980-10-31 1.022573
1980-11-30 1.192349
1980-12-31 1.628646
Name: seasonal, dtype: float64

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Residual
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 0.758258
1980-08-31 0.840720
1980-09-30 1.357674
1980-10-31 0.970771
1980-11-30 0.853378
1980-12-31 1.129646
Name: resid, dtype: float64

In [45]: deaseasonalized_ts = trend + residual


deaseasonalized_ts.head(12)

Out[45]: Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 147.841592
1980-08-31 148.965720
1980-09-30 149.732674
1980-10-31 149.054104
1980-11-30 148.270045
1980-12-31 146.254646
dtype: float64

In [46]: df.plot()
deaseasonalized_ts.plot()
plt.legend(["Original Time Series", "Time Series without Seasonality Co
mponent"]);

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
3. Split the data into training and test. The test data should start in
1991.

In [47]: train=df[df.index.year< 1991]


test=df[df.index.year>=1991]

In [48]: print(train.shape)

(132, 1)

In [49]: print(test.shape)

(55, 1)

In [50]: print('First few rows of Training Data','\n',train.head(),'\n')


print('Last few rows of Training Data','\n',train.tail(),'\n')

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
print('First few rows of Test Data','\n',test.head(),'\n')
print('Last few rows of Test Data','\n',test.tail(),'\n')

First few rows of Training Data


Rose
Time_Stamp
1980-01-31 112.0
1980-02-29 118.0
1980-03-31 129.0
1980-04-30 99.0
1980-05-31 116.0

Last few rows of Training Data


Rose
Time_Stamp
1990-08-31 70.0
1990-09-30 83.0
1990-10-31 65.0
1990-11-30 110.0
1990-12-31 132.0

First few rows of Test Data


Rose
Time_Stamp
1991-01-31 54.0
1991-02-28 55.0
1991-03-31 66.0
1991-04-30 65.0
1991-05-31 60.0

Last few rows of Test Data


Rose
Time_Stamp
1995-03-31 45.0
1995-04-30 52.0
1995-05-31 28.0
1995-06-30 40.0
1995-07-31 62.0

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [51]: train['Rose'].plot(figsize=(13,5), fontsize=14)
test['Rose'].plot(figsize=(13,5), fontsize=14)
plt.grid()
plt.legend(['Training Data','Test Data'])
plt.show()

4. Build various exponential smoothing models on the training data


and evaluate the model using RMSE on the test data. Other models
such as regression,naïve forecast models and simple average
models. should also be built on the training data and check the
performance on the test data using RMSE.

In [52]: train_time = [i+1 for i in range(len(train))]


test_time = [i+43 for i in range(len(test))]
print('Training Time instance','\n',train_time)
print('Test Time instance','\n',test_time)

Training Time instance


[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2
0, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55,
56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73,
74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91,
92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107,
108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 1
22, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132]
Test Time instance
[43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 6
0, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
96, 97]

Model 1: Linear Regression

In [53]: LinearRegression_train = train.copy()


LinearRegression_test = test.copy()

In [54]: LinearRegression_train['time'] = train_time


LinearRegression_test['time'] = test_time

print('First few rows of Training Data','\n',LinearRegression_train.hea


d(),'\n')
print('Last few rows of Training Data','\n',LinearRegression_train.tail
(),'\n')
print('First few rows of Test Data','\n',LinearRegression_test.head(),'
\n')
print('Last few rows of Test Data','\n',LinearRegression_test.tail(),'
\n')

First few rows of Training Data


Rose time
Time_Stamp
1980-01-31 112.0 1
1980-02-29 118.0 2
1980-03-31 129.0 3
1980-04-30 99.0 4
1980-05-31 116.0 5

Last few rows of Training Data


Rose time
Time Stamp
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Time_Stamp
1990-08-31 70.0 128
1990-09-30 83.0 129
1990-10-31 65.0 130

1990-11-30 110.0 131


1990-12-31 132.0 132

First few rows of Test Data


Rose time
Time_Stamp
1991-01-31 54.0 43
1991-02-28 55.0 44
1991-03-31 66.0 45
1991-04-30 65.0 46
1991-05-31 60.0 47

Last few rows of Test Data


Rose time
Time_Stamp
1995-03-31 45.0 93
1995-04-30 52.0 94
1995-05-31 28.0 95
1995-06-30 40.0 96
1995-07-31 62.0 97

In [55]: from sklearn.linear_model import LinearRegression

In [56]: lr = LinearRegression()

In [57]: lr.fit(LinearRegression_train[['time']],LinearRegression_train['Rose'].
values)

Out[57]: LinearRegression()

In [58]: test_predictions_model1 = lr.predict(LinearRegression_test[['ti


me']])
LinearRegression_test['RegOnTime'] = test_predictions_model1

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
plt.figure(figsize=(13,6))
plt.plot( train['Rose'], label='Train')
plt.plot(test['Rose'], label='Test')
plt.plot(LinearRegression_test['RegOnTime'], label='Regression On Time_
Test Data')
plt.legend(loc='best')
plt.grid();

In [59]: from sklearn import metrics

In [60]: ## Test Data - RMSE

rmse_model1_test = metrics.mean_squared_error(test['Rose'],test_predict
ions_model1,squared=False)
print("For RegressionOnTime forecast on the Test Data, RMSE is %3.3f"
%(rmse_model1_test))

For RegressionOnTime forecast on the Test Data, RMSE is 51.433

In [61]: resultsDf = pd.DataFrame({'Test RMSE': [rmse_model1_test]},index=['Regr

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
essionOnTime'])
resultsDf

Out[61]:
Test RMSE

RegressionOnTime 51.433312

Model 2 : Naive Approach

In [62]: NaiveModel_train = train.copy()


NaiveModel_test = test.copy()

In [63]: NaiveModel_test['naive'] = np.asarray(train['Rose'])[len(np.asarray(tra


in['Rose']))-1]
NaiveModel_test['naive'].head()

Out[63]: Time_Stamp
1991-01-31 132.0
1991-02-28 132.0
1991-03-31 132.0
1991-04-30 132.0
1991-05-31 132.0
Name: naive, dtype: float64

In [64]: plt.plot(NaiveModel_train['Rose'], label='Train')


plt.plot(test['Rose'], label='Test')
plt.plot(NaiveModel_test['naive'], label='Naive Forecast on Test Data')
plt.legend(loc='best')
plt.title("Rose Wine Naive Forecast")
plt.grid();

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [65]: ## Test Data - RMSE

rmse_model2_test = metrics.mean_squared_error(test['Rose'],NaiveModel_t
est['naive'],squared=False)
print("For Naive forecast on the Test Data, RMSE is %3.3f" %(rmse_mode
l2_test))

For Naive forecast on the Test Data, RMSE is 79.719

In [66]: resultsDf_2 = pd.DataFrame({'Test RMSE': [rmse_model2_test]},index=['Na


iveModel'])

resultsDf = pd.concat([resultsDf, resultsDf_2])


resultsDf

Out[66]:
Test RMSE

RegressionOnTime 51.433312

NaiveModel 79.718773

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Model 3: Simple Average

In [67]: SimpleAverage_train = train.copy()


SimpleAverage_test = test.copy()

In [68]: SimpleAverage_test['mean_forecast'] = train['Rose'].mean()


SimpleAverage_test.head()

Out[68]:
Rose mean_forecast

Time_Stamp

1991-01-31 54.0 104.939394

1991-02-28 55.0 104.939394

1991-03-31 66.0 104.939394

1991-04-30 65.0 104.939394

1991-05-31 60.0 104.939394

In [69]: plt.plot(SimpleAverage_train['Rose'], label='Train')


plt.plot(SimpleAverage_test['Rose'], label='Test')
plt.plot(SimpleAverage_test['mean_forecast'], label='Simple Average on
Test Data')
plt.legend(loc='best')
plt.title("Rose Wine Simple Average Forecast")
plt.grid();

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [70]: ## Test Data - RMSE

rmse_model3_test = metrics.mean_squared_error(test['Rose'],SimpleAverag
e_test['mean_forecast'],squared=False)
print("For Simple Average forecast on the Test Data, RMSE is %3.3f" %(
rmse_model3_test))

For Simple Average forecast on the Test Data, RMSE is 53.461

In [71]: resultsDf_3 = pd.DataFrame({'Test RMSE': [rmse_model3_test]},index=['Si


mpleAverageModel'])

resultsDf = pd.concat([resultsDf, resultsDf_3])


resultsDf

Out[71]:
Test RMSE

RegressionOnTime 51.433312

NaiveModel 79.718773

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE

SimpleAverageModel 53.460570

Model 4: Moving Average

In [72]: MovingAverage = df.copy()


MovingAverage.head()

Out[72]:
Rose

Time_Stamp

1980-01-31 112.0

1980-02-29 118.0

1980-03-31 129.0

1980-04-30 99.0

1980-05-31 116.0

In [73]: MovingAverage['Trailing_2'] = MovingAverage['Rose'].rolling(2).mean()


MovingAverage['Trailing_4'] = MovingAverage['Rose'].rolling(4).mean()
MovingAverage['Trailing_6'] = MovingAverage['Rose'].rolling(6).mean()
MovingAverage['Trailing_9'] = MovingAverage['Rose'].rolling(9).mean()

MovingAverage.head()

Out[73]:
Rose Trailing_2 Trailing_4 Trailing_6 Trailing_9

Time_Stamp

1980-01-31 112.0 NaN NaN NaN NaN

1980-02-29 118.0 115.0 NaN NaN NaN

1980-03-31 129.0 123.5 NaN NaN NaN

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Rose Trailing_2 Trailing_4 Trailing_6 Trailing_9

Time_Stamp

1980-04-30 99.0 114.0 114.5 NaN NaN

1980-05-31 116.0 107.5 115.5 NaN NaN

In [74]: # Plotting on the whole data

plt.plot(MovingAverage['Rose'], label='Train')
plt.plot(MovingAverage['Trailing_2'], label='2 Point Moving Average')
plt.plot(MovingAverage['Trailing_4'], label='4 Point Moving Average')
plt.plot(MovingAverage['Trailing_6'],label = '6 Point Moving Average')
plt.plot(MovingAverage['Trailing_9'],label = '9 Point Moving Average')
plt.title("Rose Wine Moving Average")
plt.legend(loc = 'best')
plt.grid();

In [75]: #Creating train and test set


trailing_MovingAverage_train=MovingAverage[df.index.year< 1991]
trailing_MovingAverage_test=MovingAverage[df.index.year>=1991]

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [76]: ## Plotting on both the Training and Test data

plt.plot(trailing_MovingAverage_train['Rose'], label='Train')
plt.plot(trailing_MovingAverage_test['Rose'], label='Test')

plt.plot(trailing_MovingAverage_train['Trailing_2'], label='2 Point Tra


iling Moving Average on Training Set')
plt.plot(trailing_MovingAverage_train['Trailing_4'], label='4 Point Tra
iling Moving Average on Training Set')
plt.plot(trailing_MovingAverage_train['Trailing_6'],label = '6 Point Tr
ailing Moving Average on Training Set')
plt.plot(trailing_MovingAverage_train['Trailing_9'],label = '9 Point Tr
ailing Moving Average on Training Set')

plt.plot(trailing_MovingAverage_test['Trailing_2'], label='2 Point Trai


ling Moving Average on Test Set')
plt.plot(trailing_MovingAverage_test['Trailing_4'], label='4 Point Trai
ling Moving Average on Test Set')
plt.plot(trailing_MovingAverage_test['Trailing_6'],label = '6 Point Tra
iling Moving Average on Test Set')
plt.plot(trailing_MovingAverage_test['Trailing_9'],label = '9 Point Tra
iling Moving Average on Test Set')
plt.legend(loc = 'best')
plt.title("Rose Wine Moving Average")
plt.grid();

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [77]: ## Test Data - RMSE --> 2 point Trailing MA

rmse_model4_test2 = metrics.mean_squared_error(test['Rose'],trailing_Mo
vingAverage_test['Trailing_2'],squared=False)
print("For 2 point Moving Average Model forecast on the Training Data,
RMSE is %3.3f" %(rmse_model4_test2))

## Test Data - RMSE --> 4 point Trailing MA

rmse_model4_test4 = metrics.mean_squared_error(test['Rose'],trailing_Mo
vingAverage_test['Trailing_4'],squared=False)
print("For 4 point Moving Average Model forecast on the Training Data,
RMSE is %3.3f" %(rmse_model4_test4))

## Test Data - RMSE --> 6 point Trailing MA

rmse_model4_test6 = metrics.mean_squared_error(test['Rose'],trailing_Mo
vingAverage_test['Trailing_6'],squared=False)
print("For 6 point Moving Average Model forecast on the Training Data,
RMSE is %3.3f" %(rmse_model4_test6))

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
## Test Data - RMSE --> 9 point Trailing MA

rmse_model4_test9 = metrics.mean_squared_error(test['Rose'],trailing_Mo
vingAverage_test['Trailing_9'],squared=False)
print("For 9 point Moving Average Model forecast on the Training Data,
RMSE is %3.3f " %(rmse_model4_test9))

For 2 point Moving Average Model forecast on the Training Data, RMSE i
s 11.529
For 4 point Moving Average Model forecast on the Training Data, RMSE i
s 14.451
For 6 point Moving Average Model forecast on the Training Data, RMSE i
s 14.566
For 9 point Moving Average Model forecast on the Training Data, RMSE i
s 14.728

In [78]: resultsDf_4 = pd.DataFrame({'Test RMSE': [rmse_model4_test2,rmse_model4


_test4
,rmse_model4_test6,rmse_model
4_test9]}
,index=['2pointTrailingMovingAverage','4poin
tTrailingMovingAverage'
,'6pointTrailingMovingAverage','9poi
ntTrailingMovingAverage'])

resultsDf = pd.concat([resultsDf, resultsDf_4])


resultsDf

Out[78]:
Test RMSE

RegressionOnTime 51.433312

NaiveModel 79.718773

SimpleAverageModel 53.460570

2pointTrailingMovingAverage 11.529278

4pointTrailingMovingAverage 14.451403

6pointTrailingMovingAverage 14.566327

9 i tT ili M i A 14 727630
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
9pointTrailingMovingAverage 14.727630
Test RMSE

Model 5: Simple Exponential Smoothing

In [79]: from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothin


g, Holt
import warnings
warnings.filterwarnings("ignore")

In [80]: SES_train = train.copy()


SES_test = test.copy()

In [81]: model_SES = SimpleExpSmoothing(SES_train['Rose'])

In [82]: model_SES_autofit = model_SES.fit(optimized=True)

In [83]: model_SES_autofit.params

Out[83]: {'smoothing_level': 0.09874997205700975,


'smoothing_slope': nan,
'smoothing_seasonal': nan,
'damping_slope': nan,
'initial_level': 134.3867859443936,
'initial_slope': nan,
'initial_seasons': array([], dtype=float64),
'use_boxcox': False,
'lamda': None,
'remove_bias': False}

In [84]: SES_test['predict'] = model_SES_autofit.forecast(steps=len(test))


SES_test.head()

Out[84]:
Rose predict

Time_Stamp

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Rose predict

Time_Stamp

1991-01-31 54.0 87.105001

1991-02-28 55.0 87.105001

1991-03-31 66.0 87.105001

1991-04-30 65.0 87.105001

1991-05-31 60.0 87.105001

In [85]: ## Plotting on both the Training and Test data

plt.plot(SES_train['Rose'], label='Train')
plt.plot(SES_test['Rose'], label='Test')

plt.plot(SES_test['predict'], label='Alpha =0.098 Simple Exponential Sm


oothing predictions on Test Set')

plt.legend(loc='best')
plt.grid()
plt.title('Alpha =0.098 Predictions for Rose Wine');

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [86]: ## Test Data

rmse_model5_test_1 = metrics.mean_squared_error(SES_test['Rose'],SES_te
st['predict'],squared=False)
print("For Alpha =0.098 Simple Exponential Smoothing Model forecast on
the Test Data, RMSE is %3.3f" %(rmse_model5_test_1))

For Alpha =0.098 Simple Exponential Smoothing Model forecast on the Tes
t Data, RMSE is 36.796

In [87]: resultsDf_5 = pd.DataFrame({'Test RMSE': [rmse_model5_test_1]},index=[


'Alpha=0.098,SimpleExponentialSmoothing'])

resultsDf = pd.concat([resultsDf, resultsDf_5])


resultsDf

Out[87]:
Test RMSE

RegressionOnTime 51.433312

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE

NaiveModel 79.718773

SimpleAverageModel 53.460570

2pointTrailingMovingAverage 11.529278

4pointTrailingMovingAverage 14.451403

6pointTrailingMovingAverage 14.566327

9pointTrailingMovingAverage 14.727630

Alpha=0.098,SimpleExponentialSmoothing 36.796244

In [88]: ## First we will define an empty dataframe to store our values from the
loop

resultsDf_6 = pd.DataFrame({'Alpha Values':[],'Train RMSE':[],'Test RMS


E': []})
resultsDf_6

Out[88]:
Alpha Values Train RMSE Test RMSE

In [89]: for i in np.arange(0.3,1,0.1):


model_SES_alpha_i = model_SES.fit(smoothing_level=i,optimized=False
,use_brute=True)
SES_train['predict',i] = model_SES_alpha_i.fittedvalues
SES_test['predict',i] = model_SES_alpha_i.forecast(steps=len(test))

rmse_model5_train_i = metrics.mean_squared_error(SES_train['Rose'],
SES_train['predict',i],squared=False)

rmse_model5_test_i = metrics.mean_squared_error(SES_test['Rose'],SE
S_test['predict',i],squared=False)

resultsDf_6 = resultsDf_6.append({'Alpha Values':i,'Train RMSE':rms


e_model5_train_i

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
,'Test RMSE':rmse_model5_test_i},
ignore_index=True)

In [90]: resultsDf_6.sort_values(by=['Test RMSE'],ascending=True)

Out[90]:
Alpha Values Train RMSE Test RMSE

0 0.3 32.470164 47.504821

1 0.4 33.035130 53.767406

2 0.5 33.682839 59.641786

3 0.6 34.441171 64.971288

4 0.7 35.323261 69.698162

5 0.8 36.334596 73.773992

6 0.9 37.482782 77.139276

In [91]: ## Plotting on both the Training and Test data

plt.plot(SES_train['Rose'], label='Train')
plt.plot(SES_test['Rose'], label='Test')

plt.plot(SES_test['predict'], label='Alpha =1 Simple Exponential Smooth


ing predictions on Test Set')

plt.plot(SES_test['predict', 0.3], label='Alpha =0.3 Simple Exponential


Smoothing predictions on Test Set')
plt.title("Alpha= 0.3 Simple Exponential Smoothing Rose Wine")

plt.legend(loc='best')
plt.grid();

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [92]: resultsDf_6_1 = pd.DataFrame({'Test RMSE': [resultsDf_6.sort_values(by=
['Test RMSE'],ascending=True).values[0][2]]}
,index=['Alpha=0.3,SimpleExponentialSmoothin
g'])

resultsDf = pd.concat([resultsDf, resultsDf_6_1])


resultsDf

Out[92]:
Test RMSE

RegressionOnTime 51.433312

NaiveModel 79.718773

SimpleAverageModel 53.460570

2pointTrailingMovingAverage 11.529278

4pointTrailingMovingAverage 14.451403

6pointTrailingMovingAverage 14.566327

9pointTrailingMovingAverage 14.727630

Alpha=0.098,SimpleExponentialSmoothing 36.796244

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE

Alpha=0.3,SimpleExponentialSmoothing 47.504821

Model 6: Double Exponential Smoothing - Holt's Model

In [93]: DES_train = train.copy()


DES_test = test.copy()

In [94]: model_DES = Holt(DES_train['Rose'])

In [95]: ## First we will define an empty dataframe to store our values from the
loop

resultsDf_7 = pd.DataFrame({'Alpha Values':[],'Beta Values':[],'Train R


MSE':[],'Test RMSE': []})
resultsDf_7

Out[95]:
Alpha Values Beta Values Train RMSE Test RMSE

In [96]: for i in np.arange(0.3,1.1,0.1):


for j in np.arange(0.1,1.1,0.1):
model_DES_alpha_i_j = model_DES.fit(smoothing_level=i,smoothing
_slope=j,optimized=False,use_brute=True)
DES_train['predict',i,j] = model_DES_alpha_i_j.fittedvalues
DES_test['predict',i,j] = model_DES_alpha_i_j.forecast(steps=le
n(test))

rmse_model6_train = metrics.mean_squared_error(DES_train['Rose'
],DES_train['predict',i,j],squared=False)

rmse_model6_test = metrics.mean_squared_error(DES_test['Rose'],
DES_test['predict',i,j],squared=False)

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
resultsDf_7 = resultsDf_7.append({'Alpha Values':i,'Beta Value
s':j,'Train RMSE':rmse_model6_train
,'Test RMSE':rmse_model6_test
}, ignore_index=True)

In [97]: resultsDf_7

Out[97]:
Alpha Values Beta Values Train RMSE Test RMSE

0 0.3 0.1 33.611269 98.653317

1 0.3 0.2 34.645117 177.140327

2 0.3 0.3 35.944983 265.567594

3 0.3 0.4 37.393239 358.750942

4 0.3 0.5 38.888325 451.810230

... ... ... ... ...

75 1.0 0.6 51.831610 801.680218

76 1.0 0.7 54.497039 841.892573

77 1.0 0.8 57.365879 853.965537

78 1.0 0.9 60.474309 834.710935

79 1.0 1.0 63.873454 780.079579

80 rows × 4 columns

In [98]: resultsDf_7.sort_values(by=['Test RMSE']).head()

Out[98]:
Alpha Values Beta Values Train RMSE Test RMSE

0 0.3 0.1 33.611269 98.653317

10 0.4 0.1 34.255060 128.978579

20 0.5 0.1 34.957515 155.358815

1 0.3 0.2 34.645117 177.140327

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Alpha Values Beta Values Train RMSE Test RMSE

30 0.6 0.1 35.781643 178.004967

In [99]: ## Plotting on both the Training and Test data

plt.plot(DES_train['Rose'], label='Train')
plt.plot(DES_test['Rose'], label='Test')

plt.plot(DES_test['predict', 0.3, 0.1], label='Alpha=0.3,Beta=0.1,Doubl


eExponentialSmoothing predictions on Test Set')
plt.title("Alpha= 0.3, Beta= 0.1 Double Exponential Smoothing Rose Win
e")

plt.legend(loc='best')
plt.grid();

In [100]: resultsDf_7_1 = pd.DataFrame({'Test RMSE': [resultsDf_7.sort_values(by=


['Test RMSE']).values[0][3]]}
,index=['Alpha=0.3,Beta=0.1,DoubleExponentia

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
lSmoothing'])

resultsDf = pd.concat([resultsDf, resultsDf_7_1])


resultsDf

Out[100]:
Test RMSE

RegressionOnTime 51.433312

NaiveModel 79.718773

SimpleAverageModel 53.460570

2pointTrailingMovingAverage 11.529278

4pointTrailingMovingAverage 14.451403

6pointTrailingMovingAverage 14.566327

9pointTrailingMovingAverage 14.727630

Alpha=0.098,SimpleExponentialSmoothing 36.796244

Alpha=0.3,SimpleExponentialSmoothing 47.504821

Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing 98.653317

Model 7: Triple Exponential Smoothing - Holt Winter's Model

In [101]: TES_train = train.copy()


TES_test = test.copy()

In [102]: model_TES = ExponentialSmoothing(TES_train['Rose'],trend='additive',sea


sonal='multiplicative',freq='M')

In [103]: model_TES_autofit = model_TES.fit()

In [104]: model_TES_autofit.params

Out[104]: {'smoothing_level': 0.10609625889897807,

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
'smoothing_slope': 0.04843854924060428,
'smoothing_seasonal': 0.0,
'damping_slope': nan,
'initial_level': 76.65565156766287,
'initial_slope': 0.0,
'initial_seasons': array([1.47550285, 1.65927163, 1.80572657, 1.588888
44, 1.77822725,
1.92604397, 2.11649487, 2.25135227, 2.11690616, 2.08112861,
2.40927315, 3.30448176]),
'use_boxcox': False,
'lamda': None,
'remove_bias': False}

In [105]: ## Prediction on the test data

TES_test['auto_predict'] = model_TES_autofit.forecast(steps=len(test))
TES_test.head()

Out[105]:
Rose auto_predict

Time_Stamp

1991-01-31 54.0 56.674338

1991-02-28 55.0 63.471275

1991-03-31 66.0 68.788790

1991-04-30 65.0 60.277828

1991-05-31 60.0 67.180379

In [106]: ## Plotting on both the Training and Test using autofit

plt.plot(TES_train['Rose'], label='Train')
plt.plot(TES_test['Rose'], label='Test')

plt.plot(TES_test['auto_predict'], label='Alpha=0.106,Beta=0.048,Gamma=
0.0,TripleExponentialSmoothing predictions on Test Set')

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
plt.title("Alpha= 0.106, Beta=0.048, Gamma=0.0 Triple Exponential Smoot
hing Rose Wine")

plt.legend(loc='best')
plt.grid();

In [107]: ## Test Data

rmse_model6_test_1 = metrics.mean_squared_error(TES_test['Rose'],TES_te
st['auto_predict'],squared=False)
print("For Alpha=0.106,Beta=0.048,Gamma=0.0, Triple Exponential Smoothi
ng Model forecast on the Test Data, RMSE is %3.3f" %(rmse_model6_test_
1))

For Alpha=0.106,Beta=0.048,Gamma=0.0, Triple Exponential Smoothing Mode


l forecast on the Test Data, RMSE is 17.369

In [108]: resultsDf_8_1 = pd.DataFrame({'Test RMSE': [rmse_model6_test_1]}


,index=['Alpha=0.106,Beta=0.048,Gamma=0.0,Tr
ipleExponentialSmoothing'])

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
resultsDf = pd.concat([resultsDf, resultsDf_8_1])
resultsDf

Out[108]:
Test RMSE

RegressionOnTime 51.433312

NaiveModel 79.718773

SimpleAverageModel 53.460570

2pointTrailingMovingAverage 11.529278

4pointTrailingMovingAverage 14.451403

6pointTrailingMovingAverage 14.566327

9pointTrailingMovingAverage 14.727630

Alpha=0.098,SimpleExponentialSmoothing 36.796244

Alpha=0.3,SimpleExponentialSmoothing 47.504821

Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing 98.653317

Alpha=0.106,Beta=0.048,Gamma=0.0,TripleExponentialSmoothing 17.369489

In [109]: ## First we will define an empty dataframe to store our values from the
loop

resultsDf_8_2 = pd.DataFrame({'Alpha Values':[],'Beta Values':[],'Gamma


Values':[],'Train RMSE':[],'Test RMSE': []})
resultsDf_8_2

Out[109]:
Alpha Values Beta Values Gamma Values Train RMSE Test RMSE

In [110]: for i in np.arange(0.3,1.1,0.1):


for j in np.arange(0.3,1.1,0.1):
for k in np.arange(0.3,1.1,0.1):
model_TES_alpha_i_j_k = model_TES.fit(smoothing_level=i,smo
othing_slope=j,smoothing_seasonal=k,optimized=False,use_brute=True)
TES_train['predict',i,j,k] = model_TES_alpha_i_j_k.fittedva

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
lues
TES_test['predict',i,j,k] = model_TES_alpha_i_j_k.forecast(
steps=len(test))

rmse_model8_train = metrics.mean_squared_error(TES_train['R
ose'],TES_train['predict',i,j,k],squared=False)

rmse_model8_test = metrics.mean_squared_error(TES_test['Ros
e'],TES_test['predict',i,j,k],squared=False)

resultsDf_8_2 = resultsDf_8_2.append({'Alpha Values':i,'Bet


a Values':j,'Gamma Values':k,
'Train RMSE':rmse_mod
el8_train,'Test RMSE':rmse_model8_test}
, ignore_index=True)

In [111]: resultsDf_8_2.sort_values(by=['Test RMSE']).head()

Out[111]:
Alpha Values Beta Values Gamma Values Train RMSE Test RMSE

8 0.3 0.4 0.3 28.111886 10.945435

1 0.3 0.3 0.4 27.399095 11.201633

69 0.4 0.3 0.8 32.601491 12.615607

16 0.3 0.5 0.3 29.087520 14.414604

131 0.5 0.3 0.6 32.144773 16.720720

In [112]: ## Plotting on both the Training and Test data using brute force alpha,
beta and gamma determination

plt.plot(TES_train['Rose'], label='Train')
plt.plot(TES_test['Rose'], label='Test')

#The value of alpha and beta is taken like that by python


plt.plot(TES_test['predict', 0.3, 0.4, 0.3], label='Alpha=0.3,Beta=0.4,
Gamma=0.3,TripleExponentialSmoothing predictions on Test Set')

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
plt.title("Alpha= 0.3, Beta=0.4, Gamma=0.3 Triple Exponential Smoothing
Rose Wine")

plt.legend(loc='best')
plt.grid();

In [113]: resultsDf_8_3 = pd.DataFrame({'Test RMSE': [resultsDf_8_2.sort_values(b


y=['Test RMSE']).values[0][4]]}
,index=['Alpha=0.3,Beta=0.4,Gamma=0.3,Triple
ExponentialSmoothing'])

resultsDf = pd.concat([resultsDf, resultsDf_8_3])


resultsDf

Out[113]:
Test RMSE

RegressionOnTime 51.433312

NaiveModel 79.718773

SimpleAverageModel 53.460570

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE
2pointTrailingMovingAverage 11.529278

4pointTrailingMovingAverage 14.451403

6pointTrailingMovingAverage 14.566327

9pointTrailingMovingAverage 14.727630

Alpha=0.098,SimpleExponentialSmoothing 36.796244

Alpha=0.3,SimpleExponentialSmoothing 47.504821

Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing 98.653317

Alpha=0.106,Beta=0.048,Gamma=0.0,TripleExponentialSmoothing 17.369489

Alpha=0.3,Beta=0.4,Gamma=0.3,TripleExponentialSmoothing 10.945435

5. Check for the stationarity of the data on which the model is being
built on using appropriate statistical tests and also mention the
hypothesis for the statistical test. If the data is found to be non-
stationary, take appropriate steps to make it stationary. Check the
new data for stationarity and comment. Note: Stationarity should be
checked at alpha = 0.05.

In [114]: ## Test for stationarity of the series - Dicky Fuller test

from statsmodels.tsa.stattools import adfuller


def test_stationarity(timeseries):

#Determing rolling statistics


rolmean = timeseries.rolling(window=7).mean() #determining the roll
ing mean
rolstd = timeseries.rolling(window=7).std() #determining the roll
ing standard deviation

#Plot rolling statistics:


orig = plt.plot(timeseries, color='blue',label='Original')
mean = plt.plot(rolmean, color='red', label='Rolling Mean')

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
std = plt.plot(rolstd, color='black', label = 'Rolling Std')
plt.legend(loc='best')
plt.title('Rolling Mean & Standard Deviation For Rose Wine Producti
on')
plt.show(block=False)

#Perform Dickey-Fuller test:


print ('Results of Dickey-Fuller Test:')
dftest = adfuller(timeseries, autolag='AIC')
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value'
,'#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
dfoutput['Critical Value (%s)'%key] = value
print (dfoutput,'\n')

In [115]: test_stationarity(df['Rose'])

Results of Dickey-Fuller Test:


Test Statistic -1.876699
p-value 0.343101
#Lags Used 13.000000

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Number of Observations Used 173.000000
Critical Value (1%) -3.468726
Critical Value (5%) -2.878396
Critical Value (10%) -2.575756
dtype: float64

In [116]: test_stationarity(df['Rose'].diff().dropna())

Results of Dickey-Fuller Test:


Test Statistic -8.044392e+00
p-value 1.810895e-12
#Lags Used 1.200000e+01
Number of Observations Used 1.730000e+02
Critical Value (1%) -3.468726e+00
Critical Value (5%) -2.878396e+00
Critical Value (10%) -2.575756e+00
dtype: float64

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
6. Build an automated version of the ARIMA/SARIMA model in which
the parameters are selected using the lowest Akaike Information
Criteria (AIC) on the training data and evaluate this model on the test
data using RMSE.

In [117]: from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

In [118]: plot_acf(df['Rose'],lags=50)
plot_acf(df['Rose'].diff().dropna(),lags=50,title='Rose Wine Difference
d Data Autocorrelation')
plt.show()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [119]: plot_pacf(df['Rose'],lags=50)
plot_pacf(df['Rose'].diff().dropna(),lags=50,title='Rose Wine Differenc
ed Data Partial Autocorrelation')
plt.show()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [120]: test_stationarity(train['Rose'])

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Results of Dickey-Fuller Test:
Test Statistic -2.164250
p-value 0.219476
#Lags Used 13.000000
Number of Observations Used 118.000000
Critical Value (1%) -3.487022
Critical Value (5%) -2.886363
Critical Value (10%) -2.580009
dtype: float64

In [121]: test_stationarity(train['Rose'].diff().dropna())

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Results of Dickey-Fuller Test:
Test Statistic -6.592372e+00
p-value 7.061944e-09
#Lags Used 1.200000e+01
Number of Observations Used 1.180000e+02
Critical Value (1%) -3.487022e+00
Critical Value (5%) -2.886363e+00
Critical Value (10%) -2.580009e+00
dtype: float64

Automated ARIMA Model

In [122]: import itertools


p = q = range(0, 3)
d= range(1,2)
pdq = list(itertools.product(p, d, q))
print('Some parameter combinations for the Model...')
for i in range(1,len(pdq)):
print('Model: {}'.format(pdq[i]))

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Some parameter combinations for the Model...
Model: (0, 1, 1)
Model: (0, 1, 2)
Model: (1, 1, 0)
Model: (1, 1, 1)
Model: (1, 1, 2)
Model: (2, 1, 0)
Model: (2, 1, 1)
Model: (2, 1, 2)

In [123]: # Creating an empty Dataframe with column names only


ARIMA_AIC = pd.DataFrame(columns=['param', 'AIC'])
ARIMA_AIC

Out[123]:
param AIC

In [124]: from statsmodels.tsa.arima_model import ARIMA

for param in pdq:


ARIMA_model = ARIMA(train['Rose'].values,order=param).fit()
print('ARIMA{} - AIC:{}'.format(param,ARIMA_model.aic))
ARIMA_AIC = ARIMA_AIC.append({'param':param, 'AIC': ARIMA_model.aic
}, ignore_index=True)

ARIMA(0, 1, 0) - AIC:1335.1526583086775
ARIMA(0, 1, 1) - AIC:1280.7261830464054
ARIMA(0, 1, 2) - AIC:1276.8353768178954
ARIMA(1, 1, 0) - AIC:1319.3483105801852
ARIMA(1, 1, 1) - AIC:1277.775753909814
ARIMA(1, 1, 2) - AIC:1277.3592237914563
ARIMA(2, 1, 0) - AIC:1300.6092611743886
ARIMA(2, 1, 1) - AIC:1279.0456894093122
ARIMA(2, 1, 2) - AIC:1279.298693936773

In [125]: ## Sort the above AIC values in the ascending order to get the paramete
rs for the minimum AIC value

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
ARIMA_AIC.sort_values(by='AIC',ascending=True)

Out[125]:
param AIC

2 (0, 1, 2) 1276.835377

5 (1, 1, 2) 1277.359224

4 (1, 1, 1) 1277.775754

7 (2, 1, 1) 1279.045689

8 (2, 1, 2) 1279.298694

1 (0, 1, 1) 1280.726183

6 (2, 1, 0) 1300.609261

3 (1, 1, 0) 1319.348311

0 (0, 1, 0) 1335.152658

In [126]: auto_ARIMA = ARIMA(train['Rose'], order=(0,1,2),freq='M')

results_auto_ARIMA = auto_ARIMA.fit()

print(results_auto_ARIMA.summary())

ARIMA Model Results


=======================================================================
=======
Dep. Variable: D.Rose No. Observations:
131
Model: ARIMA(0, 1, 2) Log Likelihood -
634.418
Method: css-mle S.D. of innovations
30.167
Date: Sun, 20 Jun 2021 AIC 1
276.835
Time: 23:02:56 BIC 1
288.336
Sample: 02-29-1980 HQIC 1

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
281.509
- 12-31-1990
=======================================================================
=========
coef std err z P>|z| [0.025
0.975]
-----------------------------------------------------------------------
---------
const -0.4885 0.085 -5.742 0.000 -0.655
-0.322
ma.L1.D.Rose -0.7601 0.101 -7.499 0.000 -0.959
-0.561
ma.L2.D.Rose -0.2398 0.095 -2.518 0.012 -0.427
-0.053
Roots
=======================================================================
======
Real Imaginary Modulus Fre
quency
-----------------------------------------------------------------------
------
MA.1 1.0001 +0.0000j 1.0001
0.0000
MA.2 -4.1695 +0.0000j 4.1695
0.5000
-----------------------------------------------------------------------
------

In [127]: predicted_auto_ARIMA = results_auto_ARIMA.forecast(steps=len(test))

In [128]: from sklearn.metrics import mean_squared_error


rmse = mean_squared_error(test['Rose'],predicted_auto_ARIMA[0],squared=
False)
print(rmse)

15.618912366977964

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [129]: resultsDf_9 = pd.DataFrame({'Test RMSE': [rmse]}
,index=['ARIMA(0,1,2)'])

resultsDf_9

Out[129]:
Test RMSE

ARIMA(0,1,2) 15.618912

In [130]: resultsDf_9 = pd.DataFrame({'Test RMSE': [rmse]}


,index=['ARIMA(0,1,2)'])

resultsDf = pd.concat([resultsDf, resultsDf_9])


resultsDf

Out[130]:
Test RMSE

RegressionOnTime 51.433312

NaiveModel 79.718773

SimpleAverageModel 53.460570

2pointTrailingMovingAverage 11.529278

4pointTrailingMovingAverage 14.451403

6pointTrailingMovingAverage 14.566327

9pointTrailingMovingAverage 14.727630

Alpha=0.098,SimpleExponentialSmoothing 36.796244

Alpha=0.3,SimpleExponentialSmoothing 47.504821

Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing 98.653317

Alpha=0.106,Beta=0.048,Gamma=0.0,TripleExponentialSmoothing 17.369489

Alpha=0.3,Beta=0.4,Gamma=0.3,TripleExponentialSmoothing 10.945435

ARIMA(0,1,2) 15.618912

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Automated SARIMA Model

In [131]: plot_acf(df['Rose'].diff().dropna(),lags=50,title='Differenced Data Aut


ocorrelation for Rose Wine Production')
plt.show()

Setting the seasonality as 6 for the first iteration of the auto SARIMA model.

We see that there can be a seasonality of 6 as well as 12. We will run our auto SARIMA models
by setting seasonality both as 6 and 12.

Setting the seasonality as 6 for the first iteration of the auto SARIMA model.

In [132]: import itertools


p = q = range(0, 3)
d= range(1,2)
D = range(0,1)

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
pdq = list(itertools.product(p, d, q))
model_pdq = [(x[0], x[1], x[2], 6) for x in list(itertools.product(p, D
, q))]
print('Examples of some parameter combinations for Model...')
for i in range(1,len(pdq)):
print('Model: {}{}'.format(pdq[i], model_pdq[i]))

Examples of some parameter combinations for Model...


Model: (0, 1, 1)(0, 0, 1, 6)
Model: (0, 1, 2)(0, 0, 2, 6)
Model: (1, 1, 0)(1, 0, 0, 6)
Model: (1, 1, 1)(1, 0, 1, 6)
Model: (1, 1, 2)(1, 0, 2, 6)
Model: (2, 1, 0)(2, 0, 0, 6)
Model: (2, 1, 1)(2, 0, 1, 6)
Model: (2, 1, 2)(2, 0, 2, 6)

In [133]: SARIMA_AIC = pd.DataFrame(columns=['param','seasonal', 'AIC'])


SARIMA_AIC

Out[133]:
param seasonal AIC

In [134]: import statsmodels.api as sm

for param in pdq:


for param_seasonal in model_pdq:
SARIMA_model = sm.tsa.statespace.SARIMAX(train['Rose'].values,
order=param,
seasonal_order=param_season
al,
enforce_stationarity=False,
enforce_invertibility=False
)

results_SARIMA = SARIMA_model.fit(maxiter=1000)
print('SARIMA{}x{} - AIC:{}'.format(param, param_seasonal, resu
lts_SARIMA.aic))

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
SARIMA_AIC = SARIMA_AIC.append({'param':param,'seasonal':param_
seasonal ,'AIC': results_SARIMA.aic}, ignore_index=True)

SARIMA(0, 1, 0)x(0, 0, 0, 6) - AIC:1323.9657875279158


SARIMA(0, 1, 0)x(0, 0, 1, 6) - AIC:1264.4996261104898
SARIMA(0, 1, 0)x(0, 0, 2, 6) - AIC:1144.7077471821074
SARIMA(0, 1, 0)x(1, 0, 0, 6) - AIC:1274.7897737087983
SARIMA(0, 1, 0)x(1, 0, 1, 6) - AIC:1241.7870945124614
SARIMA(0, 1, 0)x(1, 0, 2, 6) - AIC:1146.309326670239
SARIMA(0, 1, 0)x(2, 0, 0, 6) - AIC:1137.9167236212038
SARIMA(0, 1, 0)x(2, 0, 1, 6) - AIC:1137.453362940508
SARIMA(0, 1, 0)x(2, 0, 2, 6) - AIC:1117.0224424595006
SARIMA(0, 1, 1)x(0, 0, 0, 6) - AIC:1263.536909726376
SARIMA(0, 1, 1)x(0, 0, 1, 6) - AIC:1201.383254801385
SARIMA(0, 1, 1)x(0, 0, 2, 6) - AIC:1097.1908217817183
SARIMA(0, 1, 1)x(1, 0, 0, 6) - AIC:1222.4354735757836
SARIMA(0, 1, 1)x(1, 0, 1, 6) - AIC:1160.4386253543662
SARIMA(0, 1, 1)x(1, 0, 2, 6) - AIC:1085.0930138762844
SARIMA(0, 1, 1)x(2, 0, 0, 6) - AIC:1095.7490380128581
SARIMA(0, 1, 1)x(2, 0, 1, 6) - AIC:1097.6455026282051
SARIMA(0, 1, 1)x(2, 0, 2, 6) - AIC:1053.0044075906678
SARIMA(0, 1, 2)x(0, 0, 0, 6) - AIC:1251.6675430829725
SARIMA(0, 1, 2)x(0, 0, 1, 6) - AIC:1192.0017194588286
SARIMA(0, 1, 2)x(0, 0, 2, 6) - AIC:1081.8324069308578
SARIMA(0, 1, 2)x(1, 0, 0, 6) - AIC:1222.0132244585552
SARIMA(0, 1, 2)x(1, 0, 1, 6) - AIC:1153.8519348143127
SARIMA(0, 1, 2)x(1, 0, 2, 6) - AIC:1061.4359815067078
SARIMA(0, 1, 2)x(2, 0, 0, 6) - AIC:1089.0244978427286
SARIMA(0, 1, 2)x(2, 0, 1, 6) - AIC:1090.226508419863
SARIMA(0, 1, 2)x(2, 0, 2, 6) - AIC:1043.6002608608978
SARIMA(1, 1, 0)x(0, 0, 0, 6) - AIC:1308.161871082466
SARIMA(1, 1, 0)x(0, 0, 1, 6) - AIC:1249.8763226674776
SARIMA(1, 1, 0)x(0, 0, 2, 6) - AIC:1135.549810610796
SARIMA(1, 1, 0)x(1, 0, 0, 6) - AIC:1250.6246888205665
SARIMA(1, 1, 0)x(1, 0, 1, 6) - AIC:1230.6009595740952
SARIMA(1, 1, 0)x(1, 0, 2, 6) - AIC:1133.8029697276697
SARIMA(1, 1, 0)x(2, 0, 0, 6) - AIC:1123.2830148973985
SARIMA(1, 1, 0)x(2, 0, 1, 6) - AIC:1120.9425391966383
SARIMA(1, 1, 0)x(2, 0, 2, 6) - AIC:1105.909264565081
SARIMA(1, 1, 1)x(0, 0, 0, 6) - AIC:1262.1840064428402
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
SARIMA(1, 1, 1)x(0, 0, 0, 6) AIC:1262.1840064428402
SARIMA(1, 1, 1)x(0, 0, 1, 6) - AIC:1201.5037144792482
SARIMA(1, 1, 1)x(0, 0, 2, 6) - AIC:1093.6044318878214
SARIMA(1, 1, 1)x(1, 0, 0, 6) - AIC:1213.6233142837375
SARIMA(1, 1, 1)x(1, 0, 1, 6) - AIC:1162.4240004669416
SARIMA(1, 1, 1)x(1, 0, 2, 6) - AIC:1083.2585835667671
SARIMA(1, 1, 1)x(2, 0, 0, 6) - AIC:1083.9006911564184
SARIMA(1, 1, 1)x(2, 0, 1, 6) - AIC:1083.1711266498435
SARIMA(1, 1, 1)x(2, 0, 2, 6) - AIC:1052.7784687435267
SARIMA(1, 1, 2)x(0, 0, 0, 6) - AIC:1251.9495040731845
SARIMA(1, 1, 2)x(0, 0, 1, 6) - AIC:1193.2804055763945
SARIMA(1, 1, 2)x(0, 0, 2, 6) - AIC:1083.8066266793235
SARIMA(1, 1, 2)x(1, 0, 0, 6) - AIC:1213.2183953173044
SARIMA(1, 1, 2)x(1, 0, 1, 6) - AIC:1155.4829112286116
SARIMA(1, 1, 2)x(1, 0, 2, 6) - AIC:1061.3428439327972
SARIMA(1, 1, 2)x(2, 0, 0, 6) - AIC:1081.9393759564061
SARIMA(1, 1, 2)x(2, 0, 1, 6) - AIC:1083.4056423060708
SARIMA(1, 1, 2)x(2, 0, 2, 6) - AIC:1041.6558173267808
SARIMA(2, 1, 0)x(0, 0, 0, 6) - AIC:1280.2537561535767
SARIMA(2, 1, 0)x(0, 0, 1, 6) - AIC:1231.9630734580444
SARIMA(2, 1, 0)x(0, 0, 2, 6) - AIC:1128.9876565093011
SARIMA(2, 1, 0)x(1, 0, 0, 6) - AIC:1219.0664589804435
SARIMA(2, 1, 0)x(1, 0, 1, 6) - AIC:1186.6130717644878
SARIMA(2, 1, 0)x(1, 0, 2, 6) - AIC:1111.6702481022785
SARIMA(2, 1, 0)x(2, 0, 0, 6) - AIC:1099.0398509003082
SARIMA(2, 1, 0)x(2, 0, 1, 6) - AIC:1093.0537127923583
SARIMA(2, 1, 0)x(2, 0, 2, 6) - AIC:1078.6114741686215
SARIMA(2, 1, 1)x(0, 0, 0, 6) - AIC:1263.2315232809276
SARIMA(2, 1, 1)x(0, 0, 1, 6) - AIC:1201.4126986427814
SARIMA(2, 1, 1)x(0, 0, 2, 6) - AIC:1092.4754616851496
SARIMA(2, 1, 1)x(1, 0, 0, 6) - AIC:1199.8335858392797
SARIMA(2, 1, 1)x(1, 0, 1, 6) - AIC:1161.5686920204394
SARIMA(2, 1, 1)x(1, 0, 2, 6) - AIC:1079.8188703829414
SARIMA(2, 1, 1)x(2, 0, 0, 6) - AIC:1071.6995915165269
SARIMA(2, 1, 1)x(2, 0, 1, 6) - AIC:1068.4781624473278
SARIMA(2, 1, 1)x(2, 0, 2, 6) - AIC:1051.6734609013886
SARIMA(2, 1, 2)x(0, 0, 0, 6) - AIC:1253.9102116555646
SARIMA(2, 1, 2)x(0, 0, 1, 6) - AIC:1185.7695607122846
SARIMA(2, 1, 2)x(0, 0, 2, 6) - AIC:1082.558108053046
SARIMA(2, 1, 2)x(1, 0, 0, 6) - AIC:1200.4217492658531
SARIMA(2, 1, 2)x(1, 0, 1, 6) - AIC:1148.0237769524824
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
SARIMA(2, 1, 2)x(1, 0, 1, 6) AIC:1148.0237769524824
SARIMA(2, 1, 2)x(1, 0, 2, 6) - AIC:1063.110321547121
SARIMA(2, 1, 2)x(2, 0, 0, 6) - AIC:1073.6961455819637
SARIMA(2, 1, 2)x(2, 0, 1, 6) - AIC:1070.0771818695766
SARIMA(2, 1, 2)x(2, 0, 2, 6) - AIC:1045.2869001173763

In [135]: SARIMA_AIC.sort_values(by=['AIC']).head()

Out[135]:
param seasonal AIC

53 (1, 1, 2) (2, 0, 2, 6) 1041.655817

26 (0, 1, 2) (2, 0, 2, 6) 1043.600261

80 (2, 1, 2) (2, 0, 2, 6) 1045.286900

71 (2, 1, 1) (2, 0, 2, 6) 1051.673461

44 (1, 1, 1) (2, 0, 2, 6) 1052.778469

In [136]: import statsmodels.api as sm

auto_SARIMA_6 = sm.tsa.statespace.SARIMAX(train['Rose'].values,
order=(1, 1, 2),
seasonal_order=(2, 0, 2, 6),
enforce_stationarity=False,
enforce_invertibility=False)
results_auto_SARIMA_6 = auto_SARIMA_6.fit(maxiter=1000)
print(results_auto_SARIMA_6.summary())

SARIMAX Results
=======================================================================
==================
Dep. Variable: y No. Observations:
132
Model: SARIMAX(1, 1, 2)x(2, 0, 2, 6) Log Likelihood
-512.828
Date: Sun, 20 Jun 2021 AIC
1041.656
Time: 23:03:40 BIC
1063.685

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Sample: 0 HQIC
1050.598
- 132
Covariance Type: opg
=======================================================================
=======
coef std err z P>|z| [0.025
0.975]
-----------------------------------------------------------------------
-------
ar.L1 -0.5939 0.152 -3.899 0.000 -0.892
-0.295
ma.L1 -0.1954 841.392 -0.000 1.000 -1649.294 1
648.903
ma.L2 -0.8046 676.954 -0.001 0.999 -1327.610 1
326.000
ar.S.L6 -0.0625 0.035 -1.764 0.078 -0.132
0.007
ar.S.L12 0.8451 0.039 21.885 0.000 0.769
0.921
ma.S.L6 0.2226 988.816 0.000 1.000 -1937.820 1
938.266
ma.S.L12 -0.7774 768.775 -0.001 0.999 -1507.549 1
505.994
sigma2 335.2016 4.16e+05 0.001 0.999 -8.14e+05
8.15e+05
=======================================================================
============
Ljung-Box (Q): 15.89 Jarque-Bera (JB):
56.68
Prob(Q): 1.00 Prob(JB):
0.00
Heteroskedasticity (H): 0.47 Skew:
0.52
Prob(H) (two-sided): 0.02 Kurtosis:
6.26
=======================================================================
============

Warnings:

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
[1] Covariance matrix calculated using the outer product of gradients
(complex-step).

In [137]: results_auto_SARIMA_6.plot_diagnostics()
plt.show()

In [138]: predicted_auto_SARIMA_6 = results_auto_SARIMA_6.get_forecast(steps=len(


test))

In [139]: predicted_auto_SARIMA_6.summary_frame(alpha=0.05).head()

Out[139]:
y mean mean_se mean_ci_lower mean_ci_upper

0 62.840510 18.848142 25.898830 99.782190

1 67.629884 19.300034 29.802513 105.457256

2 74.746090 19.412600 36.698092 112.794087

3 71.324853 19.475547 33.153482 109.496224

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
y mean mean_se mean_ci_lower mean_ci_upper

4 76.017352 19.483825 37.829756 114.204948

In [140]: rmse = mean_squared_error(test['Rose'],predicted_auto_SARIMA_6.predicte


d_mean,squared=False)
print(rmse)

26.133554443168514

In [141]: temp_resultsDf = pd.DataFrame({'Test RMSE': [rmse]}


,index=['SARIMA(1,1,2)(2,0,2,6)'])

resultsDf = pd.concat([resultsDf,temp_resultsDf])

resultsDf

Out[141]:
Test RMSE

RegressionOnTime 51.433312

NaiveModel 79.718773

SimpleAverageModel 53.460570

2pointTrailingMovingAverage 11.529278

4pointTrailingMovingAverage 14.451403

6pointTrailingMovingAverage 14.566327

9pointTrailingMovingAverage 14.727630

Alpha=0.098,SimpleExponentialSmoothing 36.796244

Alpha=0.3,SimpleExponentialSmoothing 47.504821

Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing 98.653317

Alpha=0.106,Beta=0.048,Gamma=0.0,TripleExponentialSmoothing 17.369489

Alpha=0.3,Beta=0.4,Gamma=0.3,TripleExponentialSmoothing 10.945435

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE

ARIMA(0,1,2) 15.618912

SARIMA(1,1,2)(2,0,2,6) 26.133554

Setting the seasonality as 12 for the iteration of the auto SARIMA model.

In [142]: import itertools


p = q = range(0, 3)
d= range(1,2)
D = range(0,1)
pdq = list(itertools.product(p, d, q))
model_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p,
D, q))]
print('Examples of some parameter combinations for Model...')
for i in range(1,len(pdq)):
print('Model: {}{}'.format(pdq[i], model_pdq[i]))

Examples of some parameter combinations for Model...


Model: (0, 1, 1)(0, 0, 1, 12)
Model: (0, 1, 2)(0, 0, 2, 12)
Model: (1, 1, 0)(1, 0, 0, 12)
Model: (1, 1, 1)(1, 0, 1, 12)
Model: (1, 1, 2)(1, 0, 2, 12)
Model: (2, 1, 0)(2, 0, 0, 12)
Model: (2, 1, 1)(2, 0, 1, 12)
Model: (2, 1, 2)(2, 0, 2, 12)

In [143]: SARIMA_AIC = pd.DataFrame(columns=['param','seasonal', 'AIC'])


SARIMA_AIC

Out[143]:
param seasonal AIC

In [144]: import statsmodels.api as sm

for param in pdq:

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
for param_seasonal in model_pdq:
SARIMA_model = sm.tsa.statespace.SARIMAX(train['Rose'].values,
order=param,
seasonal_order=param_season
al,
enforce_stationarity=False,
enforce_invertibility=False
)

results_SARIMA = SARIMA_model.fit(maxiter=1000)
print('SARIMA{}x{} - AIC:{}'.format(param, param_seasonal, resu
lts_SARIMA.aic))
SARIMA_AIC = SARIMA_AIC.append({'param':param,'seasonal':param_
seasonal ,'AIC': results_SARIMA.aic}, ignore_index=True)

SARIMA(0, 1, 0)x(0, 0, 0, 12) - AIC:1323.9657875279158


SARIMA(0, 1, 0)x(0, 0, 1, 12) - AIC:1145.423082720736
SARIMA(0, 1, 0)x(0, 0, 2, 12) - AIC:976.4375296380883
SARIMA(0, 1, 0)x(1, 0, 0, 12) - AIC:1139.921738995602
SARIMA(0, 1, 0)x(1, 0, 1, 12) - AIC:1116.020786968748
SARIMA(0, 1, 0)x(1, 0, 2, 12) - AIC:969.691364004579
SARIMA(0, 1, 0)x(2, 0, 0, 12) - AIC:960.8812220353041
SARIMA(0, 1, 0)x(2, 0, 1, 12) - AIC:962.8794541633738
SARIMA(0, 1, 0)x(2, 0, 2, 12) - AIC:955.5735409343464
SARIMA(0, 1, 1)x(0, 0, 0, 12) - AIC:1263.536909726376
SARIMA(0, 1, 1)x(0, 0, 1, 12) - AIC:1098.5554825916975
SARIMA(0, 1, 1)x(0, 0, 2, 12) - AIC:923.6314049709464
SARIMA(0, 1, 1)x(1, 0, 0, 12) - AIC:1095.7936324695297
SARIMA(0, 1, 1)x(1, 0, 1, 12) - AIC:1054.7434331283894
SARIMA(0, 1, 1)x(1, 0, 2, 12) - AIC:918.857348260166
SARIMA(0, 1, 1)x(2, 0, 0, 12) - AIC:914.5982866189928
SARIMA(0, 1, 1)x(2, 0, 1, 12) - AIC:915.3332430440274
SARIMA(0, 1, 1)x(2, 0, 2, 12) - AIC:901.1988255682908
SARIMA(0, 1, 2)x(0, 0, 0, 12) - AIC:1251.6675430829725
SARIMA(0, 1, 2)x(0, 0, 1, 12) - AIC:1083.4866975348887
SARIMA(0, 1, 2)x(0, 0, 2, 12) - AIC:913.493848661737
SARIMA(0, 1, 2)x(1, 0, 0, 12) - AIC:1088.8332843917171
SARIMA(0, 1, 2)x(1, 0, 1, 12) - AIC:1045.5400933749636
SARIMA(0, 1, 2)x(1, 0, 2, 12) - AIC:904.8310906189827
SARIMA(0, 1, 2)x(2, 0, 0, 12) - AIC:913.0105912076175
SARIMA(0 1 2) (2 0 1 12) AIC 914 1707544194443
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
SARIMA(0, 1, 2)x(2, 0, 1, 12) - AIC:914.1707544194443
SARIMA(0, 1, 2)x(2, 0, 2, 12) - AIC:887.9375086839932
SARIMA(1, 1, 0)x(0, 0, 0, 12) - AIC:1308.161871082466
SARIMA(1, 1, 0)x(0, 0, 1, 12) - AIC:1135.2955447611132
SARIMA(1, 1, 0)x(0, 0, 2, 12) - AIC:963.9405391280565
SARIMA(1, 1, 0)x(1, 0, 0, 12) - AIC:1124.8860786766481
SARIMA(1, 1, 0)x(1, 0, 1, 12) - AIC:1105.4080055027964
SARIMA(1, 1, 0)x(1, 0, 2, 12) - AIC:958.5001972955552
SARIMA(1, 1, 0)x(2, 0, 0, 12) - AIC:939.0984778859015
SARIMA(1, 1, 0)x(2, 0, 1, 12) - AIC:940.9087133728217
SARIMA(1, 1, 0)x(2, 0, 2, 12) - AIC:942.2973104080617
SARIMA(1, 1, 1)x(0, 0, 0, 12) - AIC:1262.1840064428402
SARIMA(1, 1, 1)x(0, 0, 1, 12) - AIC:1094.3172708191385
SARIMA(1, 1, 1)x(0, 0, 2, 12) - AIC:923.0862223896353
SARIMA(1, 1, 1)x(1, 0, 0, 12) - AIC:1083.3937962968434
SARIMA(1, 1, 1)x(1, 0, 1, 12) - AIC:1054.7180546776658
SARIMA(1, 1, 1)x(1, 0, 2, 12) - AIC:916.3549420363358
SARIMA(1, 1, 1)x(2, 0, 0, 12) - AIC:905.9249060678128
SARIMA(1, 1, 1)x(2, 0, 1, 12) - AIC:907.2972867451367
SARIMA(1, 1, 1)x(2, 0, 2, 12) - AIC:900.6725796178105
SARIMA(1, 1, 2)x(0, 0, 0, 12) - AIC:1251.9495040731845
SARIMA(1, 1, 2)x(0, 0, 1, 12) - AIC:1085.4861928032612
SARIMA(1, 1, 2)x(0, 0, 2, 12) - AIC:915.4938405402323
SARIMA(1, 1, 2)x(1, 0, 0, 12) - AIC:1090.7768413244564
SARIMA(1, 1, 2)x(1, 0, 1, 12) - AIC:1042.6183156840766
SARIMA(1, 1, 2)x(1, 0, 2, 12) - AIC:906.7318502489444
SARIMA(1, 1, 2)x(2, 0, 0, 12) - AIC:906.1690198933999
SARIMA(1, 1, 2)x(2, 0, 1, 12) - AIC:907.4597828144317
SARIMA(1, 1, 2)x(2, 0, 2, 12) - AIC:905.1600388612114
SARIMA(2, 1, 0)x(0, 0, 0, 12) - AIC:1280.2537561535767
SARIMA(2, 1, 0)x(0, 0, 1, 12) - AIC:1128.77736963314
SARIMA(2, 1, 0)x(0, 0, 2, 12) - AIC:958.0793208597413
SARIMA(2, 1, 0)x(1, 0, 0, 12) - AIC:1099.5086021572488
SARIMA(2, 1, 0)x(1, 0, 1, 12) - AIC:1076.7863199029105
SARIMA(2, 1, 0)x(1, 0, 2, 12) - AIC:951.1988165852144
SARIMA(2, 1, 0)x(2, 0, 0, 12) - AIC:924.6004792567292
SARIMA(2, 1, 0)x(2, 0, 1, 12) - AIC:925.975780186385
SARIMA(2, 1, 0)x(2, 0, 2, 12) - AIC:927.8380693436321
SARIMA(2, 1, 1)x(0, 0, 0, 12) - AIC:1263.2315232809276
SARIMA(2, 1, 1)x(0, 0, 1, 12) - AIC:1094.2093491882445

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
SARIMA(2, 1, 1)x(0, 0, 2, 12) - AIC:922.9408472105727
SARIMA(2, 1, 1)x(1, 0, 0, 12) - AIC:1071.424960152966
SARIMA(2, 1, 1)x(1, 0, 1, 12) - AIC:1052.9244468735646
SARIMA(2, 1, 1)x(1, 0, 2, 12) - AIC:916.2424915017058
SARIMA(2, 1, 1)x(2, 0, 0, 12) - AIC:896.5181606626418
SARIMA(2, 1, 1)x(2, 0, 1, 12) - AIC:897.6399567764543
SARIMA(2, 1, 1)x(2, 0, 2, 12) - AIC:899.4835866487186
SARIMA(2, 1, 2)x(0, 0, 0, 12) - AIC:1253.9102116555646
SARIMA(2, 1, 2)x(0, 0, 1, 12) - AIC:1085.964368136149
SARIMA(2, 1, 2)x(0, 0, 2, 12) - AIC:916.3259134019722
SARIMA(2, 1, 2)x(1, 0, 0, 12) - AIC:1073.291271116631
SARIMA(2, 1, 2)x(1, 0, 1, 12) - AIC:1044.190935848489
SARIMA(2, 1, 2)x(1, 0, 2, 12) - AIC:907.6661538095441
SARIMA(2, 1, 2)x(2, 0, 0, 12) - AIC:897.3464978203967
SARIMA(2, 1, 2)x(2, 0, 1, 12) - AIC:898.3782488307854
SARIMA(2, 1, 2)x(2, 0, 2, 12) - AIC:890.6688480021876

In [145]: SARIMA_AIC.sort_values(by=['AIC']).head()

Out[145]:
param seasonal AIC

26 (0, 1, 2) (2, 0, 2, 12) 887.937509

80 (2, 1, 2) (2, 0, 2, 12) 890.668848

69 (2, 1, 1) (2, 0, 0, 12) 896.518161

78 (2, 1, 2) (2, 0, 0, 12) 897.346498

70 (2, 1, 1) (2, 0, 1, 12) 897.639957

In [146]: import statsmodels.api as sm

auto_SARIMA_12 = sm.tsa.statespace.SARIMAX(train['Rose'].values,
order=(0, 1, 2),
seasonal_order=(2, 0, 2, 12),
enforce_stationarity=False,
enforce_invertibility=False)
results_auto_SARIMA_12 = auto_SARIMA_12.fit(maxiter=1000)
print(results_auto_SARIMA_12.summary())

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
SARIMAX Results
=======================================================================
===================
Dep. Variable: y No. Observations:
132
Model: SARIMAX(0, 1, 2)x(2, 0, 2, 12) Log Likelihood
-436.969
Date: Sun, 20 Jun 2021 AIC
887.938
Time: 23:05:18 BIC
906.448
Sample: 0 HQIC
895.437
- 132
Covariance Type: opg
=======================================================================
=======
coef std err z P>|z| [0.025
0.975]
-----------------------------------------------------------------------
-------
ma.L1 -0.8428 174.448 -0.005 0.996 -342.754
341.069
ma.L2 -0.1572 27.457 -0.006 0.995 -53.971
53.657
ar.S.L12 0.3467 0.079 4.374 0.000 0.191
0.502
ar.S.L24 0.3023 0.076 3.996 0.000 0.154
0.451
ma.S.L12 0.0767 0.133 0.577 0.564 -0.184
0.337
ma.S.L24 -0.0726 0.146 -0.498 0.618 -0.358
0.213
sigma2 251.3159 4.38e+04 0.006 0.995 -8.57e+04
8.62e+04
=======================================================================
============
Ljung-Box (Q): 24.56 Jarque-Bera (JB):
2.33

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Prob(Q): 0.97 Prob(JB):
0.31
Heteroskedasticity (H): 0.88 Skew:
0.37
Prob(H) (two-sided): 0.70 Kurtosis:
3.03
=======================================================================
============

Warnings:
[1] Covariance matrix calculated using the outer product of gradients
(complex-step).

In [147]: results_auto_SARIMA_12.plot_diagnostics()
plt.show()

In [148]: predicted_auto_SARIMA_12 = results_auto_SARIMA_12.get_forecast(steps=le


n(test))

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [149]: predicted_auto_SARIMA_12.summary_frame(alpha=0.05).head()

Out[149]:
y mean mean_se mean_ci_lower mean_ci_upper

0 62.867710 15.928466 31.648491 94.086929

1 70.541191 16.147539 38.892596 102.189786

2 77.357273 16.147537 45.708683 109.005863

3 76.209639 16.147537 44.561049 107.858230

4 72.748088 16.147537 41.099498 104.396678

In [150]: rmse = mean_squared_error(test['Rose'],predicted_auto_SARIMA_12.predict


ed_mean,squared=False)
print(rmse)

26.929368240084877

In [151]: temp_resultsDf = pd.DataFrame({'Test RMSE': [rmse]}


,index=['SARIMA(0,1,2)(2,0,2,12)'])

resultsDf = pd.concat([resultsDf,temp_resultsDf])

resultsDf

Out[151]:
Test RMSE

RegressionOnTime 51.433312

NaiveModel 79.718773

SimpleAverageModel 53.460570

2pointTrailingMovingAverage 11.529278

4pointTrailingMovingAverage 14.451403

6pointTrailingMovingAverage 14.566327

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE

9pointTrailingMovingAverage 14.727630

Alpha=0.098,SimpleExponentialSmoothing 36.796244

Alpha=0.3,SimpleExponentialSmoothing 47.504821

Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing 98.653317

Alpha=0.106,Beta=0.048,Gamma=0.0,TripleExponentialSmoothing 17.369489

Alpha=0.3,Beta=0.4,Gamma=0.3,TripleExponentialSmoothing 10.945435

ARIMA(0,1,2) 15.618912

SARIMA(1,1,2)(2,0,2,6) 26.133554

SARIMA(0,1,2)(2,0,2,12) 26.929368

7. Build ARIMA/SARIMA models based on the cut-off points of ACF


and PACF on the training data and evaluate this model on the test
data using RMSE.

Manual ARIMA

In [152]: plot_acf(df['Rose'].diff().dropna(),lags=50,title='Differenced Data Aut


ocorrelation for Rose Wine Production')
plot_pacf(df['Rose'].diff().dropna(),lags=50,title='Differenced Data Pa
rtial Autocorrelation for Rose Wine Production')
plt.show()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [153]: manual_ARIMA = ARIMA(train['Rose'].astype('float64'), order=(1,1,1),fre
q='M')

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
results_manual_ARIMA = manual_ARIMA.fit()

print(results_manual_ARIMA.summary())

ARIMA Model Results


=======================================================================
=======
Dep. Variable: D.Rose No. Observations:
131
Model: ARIMA(1, 1, 1) Log Likelihood -
634.888
Method: css-mle S.D. of innovations
30.280
Date: Sun, 20 Jun 2021 AIC 1
277.776
Time: 23:05:20 BIC 1
289.277
Sample: 02-29-1980 HQIC 1
282.449
- 12-31-1990
=======================================================================
=========
coef std err z P>|z| [0.025
0.975]
-----------------------------------------------------------------------
---------
const -0.4871 0.086 -5.656 0.000 -0.656
-0.318
ar.L1.D.Rose 0.2006 0.087 2.293 0.022 0.029
0.372
ma.L1.D.Rose -0.9999 0.035 -28.645 0.000 -1.068
-0.931
Roots
=======================================================================
======
Real Imaginary Modulus Fre
quency
-----------------------------------------------------------------------
------
AR.1 4.9857 +0.0000j 4.9857

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
0.0000
MA.1 1.0001 +0.0000j 1.0001
0.0000
-----------------------------------------------------------------------
------

In [154]: predicted_manual_ARIMA = results_manual_ARIMA.forecast(steps=len(test))

In [155]: rmse = mean_squared_error(test['Rose'],predicted_manual_ARIMA[0],square


d=False)
print(rmse)

15.734258659597247

In [156]: temp_resultsDf = pd.DataFrame({'Test RMSE': [rmse]}


,index=['ARIMA(1,1,1)'])

resultsDf = pd.concat([resultsDf,temp_resultsDf])

resultsDf

Out[156]:
Test RMSE

RegressionOnTime 51.433312

NaiveModel 79.718773

SimpleAverageModel 53.460570

2pointTrailingMovingAverage 11.529278

4pointTrailingMovingAverage 14.451403

6pointTrailingMovingAverage 14.566327

9pointTrailingMovingAverage 14.727630

Alpha=0.098,SimpleExponentialSmoothing 36.796244

Alpha=0.3,SimpleExponentialSmoothing 47.504821

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE

Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing 98.653317

Alpha=0.106,Beta=0.048,Gamma=0.0,TripleExponentialSmoothing 17.369489

Alpha=0.3,Beta=0.4,Gamma=0.3,TripleExponentialSmoothing 10.945435

ARIMA(0,1,2) 15.618912

SARIMA(1,1,2)(2,0,2,6) 26.133554

SARIMA(0,1,2)(2,0,2,12) 26.929368

ARIMA(1,1,1) 15.734259

Manual SARIMA

In [157]: plot_acf(df['Rose'].diff().dropna(),lags=50,title='Differenced Data Aut


ocorrelation for Rose Wine Production')
plot_pacf(df['Rose'].diff().dropna(),lags=50,title='Differenced Data Pa
tial Autocorrelation For Rose Wine Production')
plt.show()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [158]: df.plot()
plt.grid();

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [159]: (df['Rose'].diff(6)).plot()
plt.grid();

In [160]: (df['Rose'].diff(6)).diff().plot()
plt.grid();

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [161]: test_stationarity((train['Rose'].diff(6).dropna()).diff(1).dropna())

Results of Dickey-Fuller Test:


Test Statistic -6.882869e+00

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
p-value 1.418693e-09
#Lags Used 1.300000e+01
Number of Observations Used 1.110000e+02
Critical Value (1%) -3.490683e+00
Critical Value (5%) -2.887952e+00
Critical Value (10%) -2.580857e+00
dtype: float64

In [162]: plot_acf((df['Rose'].diff(6).dropna()).diff(1).dropna(),lags=30)
plot_pacf((df['Rose'].diff(6).dropna()).diff(1).dropna(),lags=30);

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [163]: import statsmodels.api as sm

manual_SARIMA_6 = sm.tsa.statespace.SARIMAX(train['Rose'].values,
order=(1, 1, 1),
seasonal_order=(1, 1, 1, 6),
enforce_stationarity=False,
enforce_invertibility=False)
results_manual_SARIMA_6 = manual_SARIMA_6.fit(maxiter=1000)
print(results_manual_SARIMA_6.summary())

SARIMAX Results
=======================================================================
==================
Dep. Variable: y No. Observations:
132
Model: SARIMAX(1, 1, 1)x(1, 1, 1, 6) Log Likelihood
-545.056
Date: Sun, 20 Jun 2021 AIC
1100.112
Time: 23:05:23 BIC
1113.923

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Sample: 0 HQIC
1105.719
- 132
Covariance Type: opg
=======================================================================
=======
coef std err z P>|z| [0.025
0.975]
-----------------------------------------------------------------------
-------
ar.L1 0.1764 0.070 2.516 0.012 0.039
0.314
ma.L1 -1.0452 0.059 -17.608 0.000 -1.161
-0.929
ar.S.L6 -0.6660 0.053 -12.552 0.000 -0.770
-0.562
ma.S.L6 -0.4840 0.093 -5.213 0.000 -0.666
-0.302
sigma2 579.5488 67.298 8.612 0.000 447.648
711.450
=======================================================================
============
Ljung-Box (Q): 34.58 Jarque-Bera (JB):
143.27
Prob(Q): 0.71 Prob(JB):
0.00
Heteroskedasticity (H): 0.22 Skew:
0.18
Prob(H) (two-sided): 0.00 Kurtosis:
8.41
=======================================================================
============

Warnings:
[1] Covariance matrix calculated using the outer product of gradients
(complex-step).

In [164]: results_manual_SARIMA_6.plot_diagnostics()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
plt.show()

In [165]: predicted_manual_SARIMA_6 = results_manual_SARIMA_6.get_forecast(steps=


len(test))

In [166]: predicted_manual_SARIMA_6.summary_frame(alpha=0.05).head()

Out[166]:
y mean mean_se mean_ci_lower mean_ci_upper

0 53.820556 25.161040 4.505823 103.135288

1 65.533301 25.760765 15.043128 116.023473

2 74.100741 25.843170 23.449059 124.752424

3 77.920921 25.883870 27.189469 128.652374

4 71.659662 25.918694 20.859955 122.459369

In [167]: rmse = mean_squared_error(test['Rose'],predicted_manual_SARIMA_6.predic


ted_mean,squared=False)

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
print(rmse)

20.96410963443785

In [168]: temp_resultsDf = pd.DataFrame({'Test RMSE': [rmse]}


,index=['SARIMA(1,1,1)(1,1,1,6)'])

resultsDf = pd.concat([resultsDf,temp_resultsDf])

resultsDf

Out[168]:
Test RMSE

RegressionOnTime 51.433312

NaiveModel 79.718773

SimpleAverageModel 53.460570

2pointTrailingMovingAverage 11.529278

4pointTrailingMovingAverage 14.451403

6pointTrailingMovingAverage 14.566327

9pointTrailingMovingAverage 14.727630

Alpha=0.098,SimpleExponentialSmoothing 36.796244

Alpha=0.3,SimpleExponentialSmoothing 47.504821

Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing 98.653317

Alpha=0.106,Beta=0.048,Gamma=0.0,TripleExponentialSmoothing 17.369489

Alpha=0.3,Beta=0.4,Gamma=0.3,TripleExponentialSmoothing 10.945435

ARIMA(0,1,2) 15.618912

SARIMA(1,1,2)(2,0,2,6) 26.133554

SARIMA(0,1,2)(2,0,2,12) 26.929368

ARIMA(1,1,1) 15.734259

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE

SARIMA(1,1,1)(1,1,1,6) 20.964110

8. Build a table (create a data frame) with all the models built along
with their corresponding parameters and the respective RMSE values
on the test data.

In [169]: temp_resultsDf = pd.DataFrame({'Test RMSE': [rmse]}


,index=['SARIMA(1,1,2)(2,0,2,6)'])

resultsDf = pd.concat([resultsDf,temp_resultsDf])

resultsDf

Out[169]:
Test RMSE

RegressionOnTime 51.433312

NaiveModel 79.718773

SimpleAverageModel 53.460570

2pointTrailingMovingAverage 11.529278

4pointTrailingMovingAverage 14.451403

6pointTrailingMovingAverage 14.566327

9pointTrailingMovingAverage 14.727630

Alpha=0.098,SimpleExponentialSmoothing 36.796244

Alpha=0.3,SimpleExponentialSmoothing 47.504821

Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing 98.653317

Alpha=0.106,Beta=0.048,Gamma=0.0,TripleExponentialSmoothing 17.369489

Alpha=0.3,Beta=0.4,Gamma=0.3,TripleExponentialSmoothing 10.945435

ARIMA(0,1,2) 15.618912

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE

SARIMA(1,1,2)(2,0,2,6) 26.133554

SARIMA(0,1,2)(2,0,2,12) 26.929368

ARIMA(1,1,1) 15.734259

SARIMA(1,1,1)(1,1,1,6) 20.964110

SARIMA(1,1,2)(2,0,2,6) 20.964110

In [170]: print('Sorted by RMSE values on the Test Data for Rose Wine sale:','\n'
,)
resultsDf.sort_values(by=['Test RMSE'])

Sorted by RMSE values on the Test Data for Rose Wine sale:

Out[170]:
Test RMSE

Alpha=0.3,Beta=0.4,Gamma=0.3,TripleExponentialSmoothing 10.945435

2pointTrailingMovingAverage 11.529278

4pointTrailingMovingAverage 14.451403

6pointTrailingMovingAverage 14.566327

9pointTrailingMovingAverage 14.727630

ARIMA(0,1,2) 15.618912

ARIMA(1,1,1) 15.734259

Alpha=0.106,Beta=0.048,Gamma=0.0,TripleExponentialSmoothing 17.369489

SARIMA(1,1,2)(2,0,2,6) 20.964110

SARIMA(1,1,1)(1,1,1,6) 20.964110

SARIMA(1,1,2)(2,0,2,6) 26.133554

SARIMA(0,1,2)(2,0,2,12) 26.929368

Alpha=0.098,SimpleExponentialSmoothing 36.796244

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE

Alpha=0.3,SimpleExponentialSmoothing 47.504821

RegressionOnTime 51.433312

SimpleAverageModel 53.460570

NaiveModel 79.718773

Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing 98.653317

9. Based on the model-building exercise, build the most optimum


model(s) on the complete data and predict 12 months into the future
with appropriate confidence intervals/bands.

Building the most optimum model on the Full Data.

In [171]: ## Plotting on both the Training and Test data

plt.plot(train['Rose'], label='Train')
plt.plot(test['Rose'], label='Test')

plt.plot(TES_test['predict', 0.3, 0.4, 0.3], label='Alpha=0.3,Beta=0.4,


Gamma=0.3,TripleExponentialSmoothing predictions on Test Set')

plt.legend(loc='best')
plt.grid();
plt.title('Plot of Exponential Smoothing Acutal Values for Rose Wine Pr
oductions');

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [172]: fullmodel = ExponentialSmoothing(df,
trend='additive',
seasonal='multiplicative').fit(smooth
ing_level=0.3,
smooth
ing_slope=0.4,
smooth
ing_seasonal=0.3)

In [173]: RMSE_fullmodel = metrics.mean_squared_error(df['Rose'],fullmodel.fitted


values,squared=False)

print('RMSE:',RMSE_fullmodel)

RMSE: 24.266535961104537

In [174]: # Getting the predictions for the same number of times stamps that are
present in the test data
prediction = fullmodel.forecast(steps=len(test))

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [175]: df.plot()
prediction.plot();

In [176]: #In the below code, we have calculated the upper and lower confidence b
ands at 95% confidence level
#The percentile function under numpy lets us calculate these and adding
and subtracting from the predictions
#gives us the necessary confidence bands for the predictions
pred_df = pd.DataFrame({'lower_CI':prediction - 1.96*np.std(fullmodel.r
esid,ddof=1),
'prediction':prediction,
'upper_ci': prediction + 1.96*np.std(fullmode
l.resid,ddof=1)})
pred_df.head()

Out[176]:
lower_CI prediction upper_ci

1995-08-31 -7.559277 40.074648 87.708574

1995-09-30 -8.388196 39.245730 86.879656

1995 10 31 6 321556 41 312369 88 946295


Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
1995-10-31 -6.321556 41.312369 88.946295
lower_CI prediction upper_ci

1995-11-30 0.433543 48.067469 95.701394

1995-12-31 21.101528 68.735454 116.369379

In [177]: # plot the forecast along with the confidence band

axis = df.plot(label='Actual', figsize=(13,6))


pred_df['prediction'].plot(ax=axis, label='Forecast', alpha=0.5)
axis.fill_between(pred_df.index, pred_df['lower_CI'], pred_df['upper_c
i'], color='k', alpha=.15)
axis.set_xlabel('Year-Months')
axis.set_ylabel('Rose')
plt.legend(loc='best')
plt.grid()
plt.show()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
END

1. Read the data as an appropriate Time Series data and plot the
data.

In [178]: df1 = pd.read_csv("Sparkling.csv")

In [179]: df1.head()

Out[179]:
YearMonth Sparkling

0 1980-01 1686

1 1980-02 1591

2 1980-03 2304

3 1980-04 1712

4 1980-05 1471

In [180]: df1.tail()

Out[180]:
YearMonth Sparkling

182 1995-03 1897

183 1995-04 1862

184 1995-05 1670

185 1995-06 1688

186 1995-07 2031

In [181]: df1.plot();
plt.grid()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [182]: date1 = pd.date_range(start='01/01/1980', end='08/01/1995', freq='M')
date1

Out[182]: DatetimeIndex(['1980-01-31', '1980-02-29', '1980-03-31', '1980-04-30',


'1980-05-31', '1980-06-30', '1980-07-31', '1980-08-31',
'1980-09-30', '1980-10-31',
...
'1994-10-31', '1994-11-30', '1994-12-31', '1995-01-31',
'1995-02-28', '1995-03-31', '1995-04-30', '1995-05-31',
'1995-06-30', '1995-07-31'],
dtype='datetime64[ns]', length=187, freq='M')

In [183]: df1['Time_Stamp'] = pd.DataFrame(date1,columns=['Month'])


df1.head()

Out[183]:
YearMonth Sparkling Time_Stamp

0 1980-01 1686 1980-01-31

1 1980-02 1591 1980-02-29

2 1980-03 2304 1980-03-31

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
YearMonth Sparkling Time_Stamp

3 1980-04 1712 1980-04-30

4 1980-05 1471 1980-05-31

In [184]: df1['Time_Stamp']=pd.to_datetime(df1['Time_Stamp'])
df1= df1.set_index('Time_Stamp')
df1.drop(['YearMonth'],axis=1, inplace = True)
df1.head()

Out[184]:
Sparkling

Time_Stamp

1980-01-31 1686

1980-02-29 1591

1980-03-31 2304

1980-04-30 1712

1980-05-31 1471

In [185]: df1.plot();
plt.grid()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
2. Perform appropriate Exploratory Data Analysis to understand the
data and also perform decomposition.

In [186]: df1.describe(include='all').T

Out[186]:
count mean std min 25% 50% 75% max

Sparkling 187.0 2402.417112 1295.11154 1070.0 1605.0 1874.0 2549.0 7242.0

In [187]: df1.isnull().sum()

Out[187]: Sparkling 0
dtype: int64

In [188]: df1.shape

Out[188]: (187, 1)

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [189]: _, ax = plt.subplots(figsize=(30,10)) #yearly plot
sns.boxplot(x = df1.index.year,y = df1.values[:,0],ax=ax)
plt.xlabel('Yearly Sparkling Wine Sale');
plt.ylabel('Years');
plt.grid();

In [190]: _, ax = plt.subplots(figsize=(22,8))
sns.boxplot(x = df1.index.month_name(),y = df1.values[:,0],ax=ax)
plt.xlabel('Monthly Sparkling Wine Sale');
plt.ylabel('Monthls');
plt.grid();

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [191]: from statsmodels.graphics.tsaplots import month_plot

month_plot(df1,ylabel='Sparkling Wine Production')


plt.grid();

In [192]: monthly_sales_across_years = pd.pivot_table(df1, values = 'Sparkling',


columns = df1.index.month_name(), index = df1.index.year)
monthly_sales_across_years

Out[192]:
Time_Stamp April August December February January July June March May Nove

Time_Stamp

1980 1712.0 2453.0 5179.0 1591.0 1686.0 1966.0 1377.0 2304.0 1471.0 4

1981 1976.0 2472.0 4551.0 1523.0 1530.0 1781.0 1480.0 1633.0 1170.0 3

1982 1790.0 1897.0 4524.0 1329.0 1510.0 1954.0 1449.0 1518.0 1537.0 3

1983 1375.0 2298.0 4923.0 1638.0 1609.0 1600.0 1245.0 2030.0 1320.0 3

1984 1789.0 3159.0 5274.0 1435.0 1609.0 1597.0 1404.0 2061.0 1567.0 4

1985 1589.0 2512.0 5434.0 1682.0 1771.0 1645.0 1379.0 1846.0 1896.0 4

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Time_Stamp April August December February January July June March May Nove

Time_Stamp

1986 1605.0 3318.0 5891.0 1523.0 1606.0 2584.0 1403.0 1577.0 1765.0 3

1987 1935.0 1930.0 7242.0 1442.0 1389.0 1847.0 1250.0 1548.0 1518.0 4

1988 2336.0 1645.0 6757.0 1779.0 1853.0 2230.0 1661.0 2108.0 1728.0 4

1989 1650.0 1968.0 6694.0 1394.0 1757.0 1971.0 1406.0 1982.0 1654.0 4

1990 1628.0 1605.0 6047.0 1321.0 1720.0 1899.0 1457.0 1859.0 1615.0 4

1991 1279.0 1857.0 6153.0 2049.0 1902.0 2214.0 1540.0 1874.0 1432.0 3

1992 1997.0 1773.0 6119.0 1667.0 1577.0 2076.0 1625.0 1993.0 1783.0 4

1993 2121.0 2795.0 6410.0 1564.0 1494.0 2048.0 1515.0 1898.0 1831.0 4

1994 1725.0 1495.0 5999.0 1968.0 1197.0 2031.0 1693.0 1720.0 1674.0 3

1995 1862.0 NaN NaN 1402.0 1070.0 2031.0 1688.0 1897.0 1670.0

In [193]: monthly_sales_across_years.plot()
plt.grid()
plt.legend(loc='best');

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [194]: df_yearly_sum1 = df1.resample('A').sum()
df_yearly_sum1.head()

Out[194]:
Sparkling

Time_Stamp

1980-12-31 28406

1981-12-31 26227

1982-12-31 25321

1983-12-31 26180

1984-12-31 28431

In [195]: df_yearly_sum1.plot();
plt.grid()
plt.xlabel('Sum of the Observations of each year');

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [196]: df_yearly_mean1 = df1.resample('Y').mean()
df_yearly_mean1.head()

Out[196]:
Sparkling

Time_Stamp

1980-12-31 2367.166667

1981-12-31 2185.583333

1982-12-31 2110.083333

1983-12-31 2181.666667

1984-12-31 2369.250000

In [197]: df_yearly_mean1.plot();
plt.grid()
plt.xlabel('Mean of the Observations of each year');

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [198]: df_quarterly_sum1 = df1.resample('Q').sum()
df_quarterly_sum1.head()

Out[198]:
Sparkling

Time_Stamp

1980-03-31 5581

1980-06-30 4560

1980-09-30 6403

1980-12-31 11862

1981-03-31 4686

In [199]: df_quarterly_sum1.plot();
plt.grid()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [200]: df_quarterly_mean1 = df1.resample('Q').mean()
df_quarterly_mean1.head()

Out[200]:
Sparkling

Time_Stamp

1980-03-31 1860.333333

1980-06-30 1520.000000

1980-09-30 2134.333333

1980-12-31 3954.000000

1981-03-31 1562.000000

In [201]: df_quarterly_mean1.plot();
plt.grid()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [202]: df_daily_sum1 = df1.resample('D').sum()
df_daily_sum1

Out[202]:
Sparkling

Time_Stamp

1980-01-31 1686

1980-02-01 0

1980-02-02 0

1980-02-03 0

1980-02-04 0

... ...

1995-07-27 0

1995-07-28 0

1995-07-29 0

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Sparkling

Time_Stamp

1995-07-30 0

1995-07-31 2031

5661 rows × 1 columns

In [203]: df_daily_sum1.plot()
plt.grid();

In [204]: df_decade_sum1 = df1.resample('10Y').sum()


df_decade_sum1

Out[204]:
Sparkling

Time_Stamp

1980-12-31 28406

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Sparkling

Time_Stamp

1990-12-31 288893

2000-12-31 131953

In [205]: df_decade_sum1.plot();
plt.grid()

In [206]: # statistics
from statsmodels.distributions.empirical_distribution import ECDF

cdf = ECDF(df1['Sparkling'])
plt.plot(cdf.x, cdf.y, label = "statmodels");
plt.grid()
plt.xlabel('Sparkling Wine Sales');

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [207]: # group by date and get average RetailSales, and precent change
average = df1.groupby(df1.index)["Sparkling"].mean()
pct_change = df1.groupby(df1.index)["Sparkling"].sum().pct_change()

fig, (axis1,axis2) = plt.subplots(2,1,sharex=True)

# plot average RetailSales over time(year-month)


ax1 = average.plot(legend=True,ax=axis1,marker='o',title="Average Spark
ling Wine Sales",grid=True)
ax1.set_xticks(range(len(average)))
ax1.set_xticklabels(average.index.tolist())
# plot precent change for RetailSales over time(year-month)
ax2 = pct_change.plot(legend=True,ax=axis2,marker='o',colormap="summer"
,title="Sparkling Wine Sales Percent Change",grid=True)

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [208]: from statsmodels.tsa.seasonal import seasonal_decompose

In [209]: decomposition1 = seasonal_decompose(df1['Sparkling'],model='additive')


decomposition1.plot();

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [210]: trend = decomposition1.trend
seasonality = decomposition1.seasonal
residual = decomposition1.resid

print('Trend','\n',trend.head(12),'\n')
print('Seasonality','\n',seasonality.head(12),'\n')
print('Residual','\n',residual.head(12),'\n')

Trend
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 2360.666667
1980-08-31 2351.333333
1980-09-30 2320.541667
1980-10-31 2303.583333
1980-11-30 2302.041667
1980-12-31 2293.791667
Name: trend, dtype: float64

Seasonality
Time_Stamp
1980-01-31 -854.260599
1980-02-29 -830.350678
1980-03-31 -592.356630
1980-04-30 -658.490559
1980-05-31 -824.416154
1980-06-30 -967.434011
1980-07-31 -465.502265
1980-08-31 -214.332821
1980-09-30 -254.677265
1980-10-31 599.769957
1980-11-30 1675.067179
1980-12-31 3386.983846
Name: seasonal, dtype: float64

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Residual
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 70.835599
1980-08-31 315.999487
1980-09-30 -81.864401
1980-10-31 -307.353290
1980-11-30 109.891154
1980-12-31 -501.775513
Name: resid, dtype: float64

In [211]: decomposition2 = seasonal_decompose(df1['Sparkling'],model='multiplicat


ive')
decomposition2.plot();

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [212]: trend = decomposition2.trend
seasonality = decomposition2.seasonal
residual = decomposition2.resid

print('Trend','\n',trend.head(12),'\n')
print('Seasonality','\n',seasonality.head(12),'\n')
print('Residual','\n',residual.head(12),'\n')

Trend
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 2360.666667
1980-08-31 2351.333333
1980-09-30 2320.541667
1980-10-31 2303.583333
1980-11-30 2302.041667
1980-12-31 2293.791667
Name: trend, dtype: float64

Seasonality
Time_Stamp
1980-01-31 0.649843
1980-02-29 0.659214
1980-03-31 0.757440
1980-04-30 0.730351
1980-05-31 0.660609
1980-06-30 0.603468
1980-07-31 0.809164
1980-08-31 0.918822
1980-09-30 0.894367
1980-10-31 1.241789
1980-11-30 1.690158
1980-12-31 2.384776
Name: seasonal, dtype: float64

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Residual
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 1.029230
1980-08-31 1.135407
1980-09-30 0.955954
1980-10-31 0.907513
1980-11-30 1.050423
1980-12-31 0.946770
Name: resid, dtype: float64

3. Split the data into training and test. The test data should start in
1991.

In [213]: train=df1[df1.index.year< 1991]


test=df1[df1.index.year>=1991]

In [214]: print(train.shape)

(132, 1)

In [215]: print(test.shape)

(55, 1)

In [216]: print('First few rows of Training Data','\n',train.head(),'\n')


print('Last few rows of Training Data','\n',train.tail(),'\n')
print('First few rows of Test Data','\n',test.head(),'\n')
print('Last few rows of Test Data','\n',test.tail(),'\n')

First few rows of Training Data

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Sparkling
Time_Stamp
1980-01-31 1686
1980-02-29 1591
1980-03-31 2304
1980-04-30 1712
1980-05-31 1471

Last few rows of Training Data


Sparkling
Time_Stamp
1990-08-31 1605
1990-09-30 2424
1990-10-31 3116
1990-11-30 4286
1990-12-31 6047

First few rows of Test Data


Sparkling
Time_Stamp
1991-01-31 1902
1991-02-28 2049
1991-03-31 1874
1991-04-30 1279
1991-05-31 1432

Last few rows of Test Data


Sparkling
Time_Stamp
1995-03-31 1897
1995-04-30 1862
1995-05-31 1670
1995-06-30 1688
1995-07-31 2031

In [217]: train['Sparkling'].plot(figsize=(13,6), fontsize=14)


test['Sparkling'].plot(figsize=(13,6), fontsize=14)
plt.grid()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
plt.legend(['Training Data','Test Data'])
plt.show()

4. Build various exponential smoothing models on the training data


and evaluate the model using RMSE on the test data. Other models
such as regression,naïve forecast models and simple average
models. should also be built on the training data and check the
performance on the test data using RMSE.

Model 1: Linear Regression

In [218]: train_time = [i+1 for i in range(len(train))]


test_time = [i+43 for i in range(len(test))]
print('Training Time instance','\n',train_time)
print('Test Time instance','\n',test_time)

Training Time instance


[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2
0, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55,
56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73,
74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91,
92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107,
108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 1
22, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132]
Test Time instance
[43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 6
0, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
96, 97]

In [219]: LinearRegression_train = train.copy()


LinearRegression_test = test.copy()

In [220]: LinearRegression_train['time'] = train_time


LinearRegression_test['time'] = test_time

print('First few rows of Training Data','\n',LinearRegression_train.hea


d(),'\n')
print('Last few rows of Training Data','\n',LinearRegression_train.tail
(),'\n')
print('First few rows of Test Data','\n',LinearRegression_test.head(),'
\n')
print('Last few rows of Test Data','\n',LinearRegression_test.tail(),'
\n')

First few rows of Training Data


Sparkling time
Time_Stamp
1980-01-31 1686 1
1980-02-29 1591 2
1980-03-31 2304 3
1980-04-30 1712 4
1980-05-31 1471 5

Last few rows of Training Data


Sparkling time
Time_Stamp
1990 08 31 1605 128
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
1990-08-31 1605 128
1990-09-30 2424 129
1990-10-31 3116 130
1990-11-30 4286 131
1990-12-31 6047 132

First few rows of Test Data


Sparkling time
Time_Stamp
1991-01-31 1902 43
1991-02-28 2049 44
1991-03-31 1874 45
1991-04-30 1279 46
1991-05-31 1432 47

Last few rows of Test Data


Sparkling time
Time_Stamp
1995-03-31 1897 93
1995-04-30 1862 94
1995-05-31 1670 95
1995-06-30 1688 96
1995-07-31 2031 97

In [221]: from sklearn.linear_model import LinearRegression

In [222]: lr.fit(LinearRegression_train[['time']],LinearRegression_train['Sparkli
ng'].values)

Out[222]: LinearRegression()

In [223]: test_predictions_model1 = lr.predict(LinearRegression_test[['ti


me']])
LinearRegression_test['RegOnTime'] = test_predictions_model1

plt.figure(figsize=(13,6))
plt.plot( train['Sparkling'], label='Train')

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
plt.plot(test['Sparkling'], label='Test')
plt.title("Linear Regression plot for Sparkling Wine")

plt.plot(LinearRegression_test['RegOnTime'], label='Regression On Time_


Test Data')
plt.legend(loc='best')
plt.grid();

In [224]: from sklearn import metrics

In [225]: ## Test Data - RMSE

rmse_model1_test = metrics.mean_squared_error(test['Sparkling'],test_pr
edictions_model1,squared=False)
print("For RegressionOnTime forecast on the Test Data, RMSE is %3.3f"
%(rmse_model1_test))

For RegressionOnTime forecast on the Test Data, RMSE is 1275.867

In [226]: resultsDf = pd.DataFrame({'Test RMSE': [rmse_model1_test]},index=['Regr

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
essionOnTime'])
resultsDf

Out[226]:
Test RMSE

RegressionOnTime 1275.867052

Model 2: Naive Approach

In [227]: NaiveModel_train = train.copy()


NaiveModel_test = test.copy()

In [228]: NaiveModel_test['naive'] = np.asarray(train['Sparkling'])[len(np.asarra


y(train['Sparkling']))-1]
NaiveModel_test['naive'].head()

Out[228]: Time_Stamp
1991-01-31 6047
1991-02-28 6047
1991-03-31 6047
1991-04-30 6047
1991-05-31 6047
Name: naive, dtype: int64

In [229]: plt.figure(figsize=(13,6))
plt.plot(NaiveModel_train['Sparkling'], label='Train')
plt.plot(test['Sparkling'], label='Test')
plt.plot(NaiveModel_test['naive'], label='Naive Forecast on Test Data')
plt.legend(loc='best')
plt.title("Naive Forecast for Sparkling Wine")
plt.grid();

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [230]: ## Test Data - RMSE

rmse_model2_test = metrics.mean_squared_error(test['Sparkling'],NaiveMo
del_test['naive'],squared=False)
print("For Naive forecast on the Test Data, RMSE is %3.3f" %(rmse_mode
l2_test))

For Naive forecast on the Test Data, RMSE is 3864.279

In [231]: resultsDf_2 = pd.DataFrame({'Test RMSE': [rmse_model2_test]},index=['Na


iveModel'])

resultsDf = pd.concat([resultsDf, resultsDf_2])


resultsDf

Out[231]:
Test RMSE

RegressionOnTime 1275.867052

NaiveModel 3864.279352

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Model 3 : Simple Average

In [232]: SimpleAverage_train = train.copy()


SimpleAverage_test = test.copy()

In [233]: SimpleAverage_test['mean_forecast'] = train['Sparkling'].mean()


SimpleAverage_test.head()

Out[233]:
Sparkling mean_forecast

Time_Stamp

1991-01-31 1902 2403.780303

1991-02-28 2049 2403.780303

1991-03-31 1874 2403.780303

1991-04-30 1279 2403.780303

1991-05-31 1432 2403.780303

In [234]: plt.figure(figsize=(13,6))
plt.plot(SimpleAverage_train['Sparkling'], label='Train')
plt.plot(SimpleAverage_test['Sparkling'], label='Test')
plt.plot(SimpleAverage_test['mean_forecast'], label='Simple Average on
Test Data')
plt.legend(loc='best')
plt.title("Simple Average Forecast for Sparkling Wine")
plt.grid();

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [235]: ## Test Data - RMSE

rmse_model3_test = metrics.mean_squared_error(test['Sparkling'],SimpleA
verage_test['mean_forecast'],squared=False)
print("For Simple Average forecast on the Test Data, RMSE is %3.3f" %(
rmse_model3_test))

For Simple Average forecast on the Test Data, RMSE is 1275.082

In [236]: resultsDf_3 = pd.DataFrame({'Test RMSE': [rmse_model3_test]},index=['Si


mpleAverageModel'])

resultsDf = pd.concat([resultsDf, resultsDf_3])


resultsDf

Out[236]:
Test RMSE

RegressionOnTime 1275.867052

NaiveModel 3864.279352

SimpleAverageModel 1275.081804

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
p g

Model 4: Moving Average

In [237]: MovingAverage = df1.copy()


MovingAverage.head()

Out[237]:
Sparkling

Time_Stamp

1980-01-31 1686

1980-02-29 1591

1980-03-31 2304

1980-04-30 1712

1980-05-31 1471

In [238]: MovingAverage['Trailing_2'] = MovingAverage['Sparkling'].rolling(2).mea


n()
MovingAverage['Trailing_4'] = MovingAverage['Sparkling'].rolling(4).mea
n()
MovingAverage['Trailing_6'] = MovingAverage['Sparkling'].rolling(6).mea
n()
MovingAverage['Trailing_9'] = MovingAverage['Sparkling'].rolling(9).mea
n()

MovingAverage.head()

Out[238]:
Sparkling Trailing_2 Trailing_4 Trailing_6 Trailing_9

Time_Stamp

1980-01-31 1686 NaN NaN NaN NaN

1980-02-29 1591 1638.5 NaN NaN NaN

1980-03-31 2304 1947.5 NaN NaN NaN

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
1980-04-30 1712
Sparkling 2008.0
Trailing_2 1823.25
Trailing_4 NaN
Trailing_6 NaN
Trailing_9
1980-05-31
Time_Stamp 1471 1591.5 1769.50 NaN NaN

In [239]: ## Plotting on the whole data

plt.figure(figsize=(13,6))
plt.plot(MovingAverage['Sparkling'], label='Train')
plt.plot(MovingAverage['Trailing_2'], label='2 Point Moving Average')
plt.plot(MovingAverage['Trailing_4'], label='4 Point Moving Average')
plt.plot(MovingAverage['Trailing_6'],label = '6 Point Moving Average')
plt.plot(MovingAverage['Trailing_9'],label = '9 Point Moving Average')
plt.title("Moving Average for Sparkling Wine")

plt.legend(loc = 'best')
plt.grid();

In [240]: trailing_MovingAverage_train=MovingAverage[df.index.year< 1991]


trailing_MovingAverage_test=MovingAverage[df.index.year>=1991]

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [241]: ## Plotting on both the Training and Test data

plt.figure(figsize=(13,6))
plt.plot(trailing_MovingAverage_train['Sparkling'], label='Train')
plt.plot(trailing_MovingAverage_test['Sparkling'], label='Test')

plt.plot(trailing_MovingAverage_train['Trailing_2'], label='2 Point Tra


iling Moving Average on Training Set')
plt.plot(trailing_MovingAverage_train['Trailing_4'], label='4 Point Tra
iling Moving Average on Training Set')
plt.plot(trailing_MovingAverage_train['Trailing_6'],label = '6 Point Tr
ailing Moving Average on Training Set')
plt.plot(trailing_MovingAverage_train['Trailing_9'],label = '9 Point Tr
ailing Moving Average on Training Set')

plt.plot(trailing_MovingAverage_test['Trailing_2'], label='2 Point Trai


ling Moving Average on Test Set')
plt.plot(trailing_MovingAverage_test['Trailing_4'], label='4 Point Trai
ling Moving Average on Test Set')
plt.plot(trailing_MovingAverage_test['Trailing_6'],label = '6 Point Tra
iling Moving Average on Test Set')
plt.plot(trailing_MovingAverage_test['Trailing_9'],label = '9 Point Tra
iling Moving Average on Test Set')
plt.legend(loc = 'best')
plt.title("Moving Average for Sparkling Wine")

plt.grid();

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [242]: ## Test Data - RMSE --> 2 point Trailing MA

rmse_model4_test_2 = metrics.mean_squared_error(test['Sparkling'],trail
ing_MovingAverage_test['Trailing_2'],squared=False)
print("For 2 point Moving Average Model forecast on the Training Data,
RMSE is %3.3f" %(rmse_model4_test_2))

## Test Data - RMSE --> 4 point Trailing MA

rmse_model4_test_4 = metrics.mean_squared_error(test['Sparkling'],trail
ing_MovingAverage_test['Trailing_4'],squared=False)
print("For 4 point Moving Average Model forecast on the Training Data,
RMSE is %3.3f" %(rmse_model4_test_4))

## Test Data - RMSE --> 6 point Trailing MA

rmse_model4_test_6 = metrics.mean_squared_error(test['Sparkling'],trail
ing_MovingAverage_test['Trailing_6'],squared=False)
print("For 6 point Moving Average Model forecast on the Training Data,
RMSE is %3.3f" %(rmse_model4_test_6))

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
## Test Data - RMSE --> 9 point Trailing MA

rmse_model4_test_9 = metrics.mean_squared_error(test['Sparkling'],trail
ing_MovingAverage_test['Trailing_9'],squared=False)
print("For 9 point Moving Average Model forecast on the Training Data,
RMSE is %3.3f " %(rmse_model4_test_9))

For 2 point Moving Average Model forecast on the Training Data, RMSE i
s 813.401
For 4 point Moving Average Model forecast on the Training Data, RMSE i
s 1156.590
For 6 point Moving Average Model forecast on the Training Data, RMSE i
s 1283.927
For 9 point Moving Average Model forecast on the Training Data, RMSE i
s 1346.278

In [243]: resultsDf_4 = pd.DataFrame({'Test RMSE': [rmse_model4_test_2,rmse_model


4_test_4
,rmse_model4_test_6,rmse_mode
l4_test_9]}
,index=['2pointTrailingMovingAverage','4poin
tTrailingMovingAverage'
,'6pointTrailingMovingAverage','9poi
ntTrailingMovingAverage'])

resultsDf = pd.concat([resultsDf, resultsDf_4])


resultsDf

Out[243]:
Test RMSE

RegressionOnTime 1275.867052

NaiveModel 3864.279352

SimpleAverageModel 1275.081804

2pointTrailingMovingAverage 813.400684

4pointTrailingMovingAverage 1156.589694

6pointTrailingMovingAverage 1283.927428

9 i tT ili M i A 1346 278315


Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
9pointTrailingMovingAverage 1346.278315
Test RMSE

In [244]: ## Plotting on both Training and Test data

plt.plot(train['Sparkling'], label='Train')
plt.plot(test['Sparkling'], label='Test')

plt.plot(LinearRegression_test['RegOnTime'], label='Regression On Time_


Test Data')

plt.plot(NaiveModel_test['naive'], label='Naive Forecast on Test Data')

plt.plot(SimpleAverage_test['mean_forecast'], label='Simple Average on


Test Data')

plt.plot(trailing_MovingAverage_test['Trailing_2'], label='2 Point Trai


ling Moving Average on Training Set')

plt.legend(loc='best')
plt.title("Model Comparison Plots for Sparkling Wine")
plt.grid();

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Model 5: Simple Exponential Smoothing

In [245]: from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothin


g, Holt

In [246]: SES_train = train.copy()


SES_test = test.copy()

In [247]: model_SES = SimpleExpSmoothing(SES_train['Sparkling'])

In [248]: model_SES_autofit = model_SES.fit(optimized=True)

In [249]: model_SES_autofit.params

Out[249]: {'smoothing_level': 0.0,


'smoothing_slope': nan,
'smoothing_seasonal': nan,
'damping_slope': nan,
'initial_level': 2403.770045419478,
'initial_slope': nan,
'initial_seasons': array([], dtype=float64),
'use_boxcox': False,
'lamda': None,
'remove_bias': False}

In [250]: SES_test['predict'] = model_SES_autofit.forecast(steps=len(test))


SES_test.head()

Out[250]:
Sparkling predict

Time_Stamp

1991-01-31 1902 2403.770045

1991-02-28 2049 2403.770045

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Sparkling predict

Time_Stamp

1991-03-31 1874 2403.770045

1991-04-30 1279 2403.770045

1991-05-31 1432 2403.770045

In [251]: ## Plotting on both the Training and Test data

plt.plot(SES_train['Sparkling'], label='Train')
plt.plot(SES_test['Sparkling'], label='Test')

plt.plot(SES_test['predict'], label='Alpha =0.0 Simple Exponential Smoo


thing predictions on Test Set')

plt.legend(loc='best')
plt.grid()
plt.title('Alpha =0.0 Predictions');

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [252]: ## Test Data

rmse_model5_test_1 = metrics.mean_squared_error(SES_test['Sparkling'],S
ES_test['predict'],squared=False)
print("For Alpha =0.0 Simple Exponential Smoothing Model forecast on th
e Test Data, RMSE is %3.3f" %(rmse_model5_test_1))

For Alpha =0.0 Simple Exponential Smoothing Model forecast on the Test
Data, RMSE is 1275.082

In [253]: resultsDf_5 = pd.DataFrame({'Test RMSE': [rmse_model5_test_1]},index=[


'Alpha=0.0,SimpleExponentialSmoothing'])

resultsDf = pd.concat([resultsDf, resultsDf_5])


resultsDf

Out[253]:
Test RMSE

RegressionOnTime 1275.867052

NaiveModel 3864.279352

SimpleAverageModel 1275.081804

2pointTrailingMovingAverage 813.400684

4pointTrailingMovingAverage 1156.589694

6pointTrailingMovingAverage 1283.927428

9pointTrailingMovingAverage 1346.278315

Alpha=0.0,SimpleExponentialSmoothing 1275.081766

In [254]: ## First we will define an empty dataframe to store our values from the
loop

resultsDf_6 = pd.DataFrame({'Alpha Values':[],'Train RMSE':[],'Test RMS


E': []})
resultsDf_6

Out[254]:
Alpha Values Train RMSE Test RMSE
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Alpha Values Train RMSE Test RMSE
Alpha Values Train RMSE Test RMSE

In [255]: for i in np.arange(0.3,1,0.1):


model_SES_alpha_i = model_SES.fit(smoothing_level=i,optimized=False
,use_brute=True)
SES_train['predict',i] = model_SES_alpha_i.fittedvalues
SES_test['predict',i] = model_SES_alpha_i.forecast(steps=len(test))

rmse_model5_train_i = metrics.mean_squared_error(SES_train['Sparkli
ng'],SES_train['predict',i],squared=False)

rmse_model5_test_i = metrics.mean_squared_error(SES_test['Sparklin
g'],SES_test['predict',i],squared=False)

resultsDf_6 = resultsDf_6.append({'Alpha Values':i,'Train RMSE':rms


e_model5_train_i
,'Test RMSE':rmse_model5_test_i},
ignore_index=True)

In [256]: resultsDf_6.sort_values(by=['Test RMSE']).head()

Out[256]:
Alpha Values Train RMSE Test RMSE

0 0.3 1359.511747 1935.507132

1 0.4 1352.588879 2311.919615

2 0.5 1344.004369 2666.351413

3 0.6 1338.805381 2979.204388

4 0.7 1338.844308 3249.944092

In [257]: ## Plotting on both the Training and Test data

plt.plot(SES_train['Sparkling'], label='Train')
plt.plot(SES_test['Sparkling'], label='Test')

plt.plot(SES_test['predict'], label='Alpha =1 Simple Exponential Smooth


ing predictions on Test Set')

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
plt.plot(SES_test['predict', 0.3], label='Alpha =0.3 Simple Exponential
Smoothing predictions on Test Set')

plt.legend(loc='best')
plt.grid();

In [258]: resultsDf_6_1 = pd.DataFrame({'Test RMSE': [resultsDf_6.sort_values(by=


['Test RMSE'],ascending=True).values[0][2]]}
,index=['Alpha=0.3,SimpleExponentialSmoothin
g'])

resultsDf = pd.concat([resultsDf, resultsDf_6_1])


resultsDf

Out[258]:
Test RMSE

RegressionOnTime 1275.867052

NaiveModel 3864.279352

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE

SimpleAverageModel 1275.081804

2pointTrailingMovingAverage 813.400684

4pointTrailingMovingAverage 1156.589694

6pointTrailingMovingAverage 1283.927428

9pointTrailingMovingAverage 1346.278315

Alpha=0.0,SimpleExponentialSmoothing 1275.081766

Alpha=0.3,SimpleExponentialSmoothing 1935.507132

Model 6: Double Exponential Smoothing - Holt's Model

In [259]: DES_train = train.copy()


DES_test = test.copy()

In [260]: model_DES = Holt(DES_train['Sparkling'])

In [261]: ## First we will define an empty dataframe to store our values from the
loop

resultsDf_7 = pd.DataFrame({'Alpha Values':[],'Beta Values':[],'Train R


MSE':[],'Test RMSE': []})
resultsDf_7

Out[261]:
Alpha Values Beta Values Train RMSE Test RMSE

In [262]: for i in np.arange(0.3,1.1,0.1):


for j in np.arange(0.3,1.1,0.1):
model_DES_alpha_i_j = model_DES.fit(smoothing_level=i,smoothing
_slope=j,optimized=False,use_brute=True)
DES_train['predict',i,j] = model_DES_alpha_i_j.fittedvalues
DES_test['predict',i,j] = model_DES_alpha_i_j.forecast(steps=le

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
n(test))

rmse_model6_train = metrics.mean_squared_error(DES_train['Spark
ling'],DES_train['predict',i,j],squared=False)

rmse_model6_test = metrics.mean_squared_error(DES_test['Sparkli
ng'],DES_test['predict',i,j],squared=False)

resultsDf_7 = resultsDf_7.append({'Alpha Values':i,'Beta Value


s':j,'Train RMSE':rmse_model6_train
,'Test RMSE':rmse_model6_test
}, ignore_index=True)

In [263]: resultsDf_7

Out[263]:
Alpha Values Beta Values Train RMSE Test RMSE

0 0.3 0.3 1592.292788 18259.110704

1 0.3 0.4 1682.573828 26069.841401

2 0.3 0.5 1771.710791 34401.512440

3 0.3 0.6 1848.576510 42162.748095

4 0.3 0.7 1899.949006 47832.397419

... ... ... ... ...

59 1.0 0.6 1753.402326 49327.087977

60 1.0 0.7 1825.187155 52655.765663

61 1.0 0.8 1902.013709 55442.273880

62 1.0 0.9 1985.368445 57823.177011

63 1.0 1.0 2077.672157 59877.076519

64 rows × 4 columns

In [264]: resultsDf_7.sort_values(by=['Test RMSE']).head()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Out[264]: Alpha Values Beta Values Train RMSE Test RMSE

0 0.3 0.3 1592.292788 18259.110704

8 0.4 0.3 1569.338606 23878.496940

1 0.3 0.4 1682.573828 26069.841401

16 0.5 0.3 1530.575845 27095.532414

24 0.6 0.3 1506.449870 29070.722592

In [265]: ## Plotting on both the Training and Test data

plt.plot(DES_train['Sparkling'], label='Train')
plt.plot(DES_test['Sparkling'], label='Test')

plt.plot(DES_test['predict', 0.3, 0.3], label='Alpha=0.3,Beta=0.3,Doubl


eExponentialSmoothing predictions on Test Set')

plt.legend(loc='best')
plt.grid();

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [266]: resultsDf_7_1 = pd.DataFrame({'Test RMSE': [resultsDf_7.sort_values(by=
['Test RMSE']).values[0][3]]}
,index=['Alpha=0.3,Beta=0.3,DoubleExponentia
lSmoothing'])

resultsDf = pd.concat([resultsDf, resultsDf_7_1])


resultsDf

Out[266]:
Test RMSE

RegressionOnTime 1275.867052

NaiveModel 3864.279352

SimpleAverageModel 1275.081804

2pointTrailingMovingAverage 813.400684

4pointTrailingMovingAverage 1156.589694

6pointTrailingMovingAverage 1283.927428

9pointTrailingMovingAverage 1346.278315

Alpha=0.0,SimpleExponentialSmoothing 1275.081766

Alpha=0.3,SimpleExponentialSmoothing 1935.507132

Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 18259.110704

Model 7: Triple Exponential Smoothing - Holt Winter's Model

In [267]: TES_train = train.copy()


TES_test = test.copy()

In [268]: model_TES = ExponentialSmoothing(TES_train['Sparkling'],trend='additiv


e',seasonal='multiplicative',freq='M')

In [269]: model_TES_autofit = model_TES.fit()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [270]: model_TES_autofit.params

Out[270]: {'smoothing_level': 0.15422095573741845,


'smoothing_slope': 1.3071916052588054e-21,
'smoothing_seasonal': 0.3713310279757222,
'damping_slope': nan,
'initial_level': 1639.9993541529163,
'initial_slope': 4.847700780681414,
'initial_seasons': array([1.00842157, 0.96898339, 1.24179259, 1.132059
82, 0.93982621,
0.93811361, 1.22459277, 1.54431221, 1.27337186, 1.63199233,
2.48297346, 3.11866318]),
'use_boxcox': False,
'lamda': None,
'remove_bias': False}

In [271]: ## Prediction on the test data

TES_test['auto_predict'] = model_TES_autofit.forecast(steps=len(test))
TES_test.head()

Out[271]:
Sparkling auto_predict

Time_Stamp

1991-01-31 1902 1602.184489

1991-02-28 2049 1373.877337

1991-03-31 1874 1807.430667

1991-04-30 1279 1704.563503

1991-05-31 1432 1602.368982

In [272]: ## Plotting on both the Training and Test using autofit

plt.plot(TES_train['Sparkling'], label='Train')
plt.plot(TES_test['Sparkling'], label='Test')

plt.plot(TES_test['auto_predict'], label='Alpha=0.154,Beta=1.307,Gamma=

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
0.371,TripleExponentialSmoothing predictions on Test Set')

plt.legend(loc='best')
plt.grid();

In [273]: ## Test Data

rmse_model6_test_1 = metrics.mean_squared_error(TES_test['Sparkling'],T
ES_test['auto_predict'],squared=False)
print("For Alpha=0.154,Beta=1.307,Gamma=0.371, Triple Exponential Smoot
hing Model forecast on the Test Data, RMSE is %3.3f" %(rmse_model6_tes
t_1))

For Alpha=0.154,Beta=1.307,Gamma=0.371, Triple Exponential Smoothing Mo


del forecast on the Test Data, RMSE is 383.156

In [274]: resultsDf_8_1 = pd.DataFrame({'Test RMSE': [rmse_model6_test_1]}


,index=['Alpha=0.154,Beta=1.307,Gamma=0.371,
TripleExponentialSmoothing'])

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
resultsDf = pd.concat([resultsDf, resultsDf_8_1])
resultsDf

Out[274]:
Test RMSE

RegressionOnTime 1275.867052

NaiveModel 3864.279352

SimpleAverageModel 1275.081804

2pointTrailingMovingAverage 813.400684

4pointTrailingMovingAverage 1156.589694

6pointTrailingMovingAverage 1283.927428

9pointTrailingMovingAverage 1346.278315

Alpha=0.0,SimpleExponentialSmoothing 1275.081766

Alpha=0.3,SimpleExponentialSmoothing 1935.507132

Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 18259.110704

Alpha=0.154,Beta=1.307,Gamma=0.371,TripleExponentialSmoothing 383.155684

In [275]: ## First we will define an empty dataframe to store our values from the
loop

resultsDf_8_2 = pd.DataFrame({'Alpha Values':[],'Beta Values':[],'Gamma


Values':[],'Train RMSE':[],'Test RMSE': []})
resultsDf_8_2

Out[275]:
Alpha Values Beta Values Gamma Values Train RMSE Test RMSE

In [276]: for i in np.arange(0.3,1.1,0.1):


for j in np.arange(0.3,1.1,0.1):
for k in np.arange(0.3,1.1,0.1):
model_TES_alpha_i_j_k = model_TES.fit(smoothing_level=i,smo
othing_slope=j,smoothing_seasonal=k,optimized=False,use_brute=True)
TES_train['predict',i,j,k] = model_TES_alpha_i_j_k.fittedva

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
lues
TES_test['predict',i,j,k] = model_TES_alpha_i_j_k.forecast(
steps=len(test))

rmse_model8_train = metrics.mean_squared_error(TES_train['S
parkling'],TES_train['predict',i,j,k],squared=False)

rmse_model8_test = metrics.mean_squared_error(TES_test['Spa
rkling'],TES_test['predict',i,j,k],squared=False)

resultsDf_8_2 = resultsDf_8_2.append({'Alpha Values':i,'Bet


a Values':j,'Gamma Values':k,
'Train RMSE':rmse_mod
el8_train,'Test RMSE':rmse_model8_test}
, ignore_index=True)

In [277]: resultsDf_8_2.sort_values(by=['Test RMSE']).head()

Out[277]:
Alpha Values Beta Values Gamma Values Train RMSE Test RMSE

0 0.3 0.3 0.3 404.513320 392.786198

8 0.3 0.4 0.3 424.828055 410.854547

65 0.4 0.3 0.4 435.553595 421.409170

296 0.7 0.8 0.3 700.317756 518.188752

130 0.5 0.3 0.5 498.239915 542.175497

In [278]: ## Plotting on both the Training and Test data using brute force alpha,
beta and gamma determination

plt.plot(TES_train['Sparkling'], label='Train')
plt.plot(TES_test['Sparkling'], label='Test')

#The value of alpha and beta is taken like that by python


plt.plot(TES_test['predict', 0.3, 0.3, 0.3], label='Alpha=0.3,Beta=0.3,
Gamma=0.3,TripleExponentialSmoothing predictions on Test Set')

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
plt.legend(loc='best')
plt.grid();

In [279]: resultsDf_8_3 = pd.DataFrame({'Test RMSE': [resultsDf_8_2.sort_values(b


y=['Test RMSE']).values[0][4]]}
,index=['Alpha=0.3,Beta=0.3,Gamma=0.3,Triple
ExponentialSmoothing'])

resultsDf = pd.concat([resultsDf, resultsDf_8_3])


resultsDf

Out[279]:
Test RMSE

RegressionOnTime 1275.867052

NaiveModel 3864.279352

SimpleAverageModel 1275.081804

2pointTrailingMovingAverage 813.400684

4pointTrailingMovingAverage 1156.589694

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE
6pointTrailingMovingAverage 1283.927428

9pointTrailingMovingAverage 1346.278315

Alpha=0.0,SimpleExponentialSmoothing 1275.081766

Alpha=0.3,SimpleExponentialSmoothing 1935.507132

Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 18259.110704

Alpha=0.154,Beta=1.307,Gamma=0.371,TripleExponentialSmoothing 383.155684

Alpha=0.3,Beta=0.3,Gamma=0.3,TripleExponentialSmoothing 392.786198

5. Check for the stationarity of the data on which the model is being
built on using appropriate statistical tests and also mention the
hypothesis for the statistical test. If the data is found to be non-
stationary, take appropriate steps to make it stationary. Check the
new data for stationarity and comment. Note: Stationarity should be
checked at alpha = 0.05.

In [280]: ## Test for stationarity of the series - Dicky Fuller test

from statsmodels.tsa.stattools import adfuller


def test_stationarity(timeseries):

#Determing rolling statistics


rolmean = timeseries.rolling(window=7).mean() #determining the roll
ing mean
rolstd = timeseries.rolling(window=7).std() #determining the roll
ing standard deviation

#Plot rolling statistics:


orig = plt.plot(timeseries, color='blue',label='Original')
mean = plt.plot(rolmean, color='red', label='Rolling Mean')
std = plt.plot(rolstd, color='black', label = 'Rolling Std')
plt.legend(loc='best')
plt.title('Rolling Mean & Standard Deviation')

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
plt.show(block=False)

#Perform Dickey-Fuller test:


print ('Results of Dickey-Fuller Test:')
dftest = adfuller(timeseries, autolag='AIC')
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value'
,'#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
dfoutput['Critical Value (%s)'%key] = value
print (dfoutput,'\n')

In [281]: test_stationarity(df1['Sparkling'])

Results of Dickey-Fuller Test:


Test Statistic -1.360497
p-value 0.601061
#Lags Used 11.000000
Number of Observations Used 175.000000
Critical Value (1%) -3.468280
Critical Value (5%) -2.878202
Critical Value (10%) -2.575653

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
dtype: float64

In [282]: test_stationarity(df1['Sparkling'].diff().dropna())

Results of Dickey-Fuller Test:


Test Statistic -45.050301
p-value 0.000000
#Lags Used 10.000000
Number of Observations Used 175.000000
Critical Value (1%) -3.468280
Critical Value (5%) -2.878202
Critical Value (10%) -2.575653
dtype: float64

6. Build an automated version of the ARIMA/SARIMA


model in which the parameters are selected using the
lowest Akaike Information Criteria (AIC) on the training

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
data and evaluate this model on the test data using
RMSE.
In [283]: from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

In [284]: plot_acf(df1['Sparkling'],lags=50)
plot_acf(df1['Sparkling'].diff().dropna(),lags=50,title='Sparkling Diff
erenced Data Autocorrelation')
plt.show()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [285]: plot_pacf(df1['Sparkling'],lags=50)
plot_pacf(df1['Sparkling'].diff().dropna(),lags=50,title='Sparkling Dif
ferenced Data Partial Autocorrelation')
plt.show()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [286]: test_stationarity(train['Sparkling'])

Results of Dickey-Fuller Test:


Test Statistic -1.208926
p-value 0.669744
#Lags Used 12.000000
Number of Observations Used 119.000000
Critical Value (1%) -3.486535
Critical Value (5%) -2.886151

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Critical Value (10%) -2.579896
dtype: float64

In [287]: test_stationarity(train['Sparkling'].diff().dropna())

Results of Dickey-Fuller Test:


Test Statistic -8.005007e+00
p-value 2.280104e-12
#Lags Used 1.100000e+01
Number of Observations Used 1.190000e+02
Critical Value (1%) -3.486535e+00
Critical Value (5%) -2.886151e+00
Critical Value (10%) -2.579896e+00
dtype: float64

In [288]: ## The following loop helps us in getting a combination of different pa


rameters of p and q in the range of 0 and 2
## We have kept the value of d as 1 as we need to take a difference of
the series to make it stationary.

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
import itertools
p = q = range(0, 3)
d= range(1,2)
pdq = list(itertools.product(p, d, q))
print('Some parameter combinations for the Model...')
for i in range(1,len(pdq)):
print('Model: {}'.format(pdq[i]))

Some parameter combinations for the Model...


Model: (0, 1, 1)
Model: (0, 1, 2)
Model: (1, 1, 0)
Model: (1, 1, 1)
Model: (1, 1, 2)
Model: (2, 1, 0)
Model: (2, 1, 1)
Model: (2, 1, 2)

In [289]: # Creating an empty Dataframe with column names only


ARIMA_AIC = pd.DataFrame(columns=['param', 'AIC'])
ARIMA_AIC

Out[289]:
param AIC

In [290]: from statsmodels.tsa.arima_model import ARIMA

for param in pdq:


ARIMA_model = ARIMA(train['Sparkling'].values,order=param).fit()
print('ARIMA{} - AIC:{}'.format(param,ARIMA_model.aic))
ARIMA_AIC = ARIMA_AIC.append({'param':param, 'AIC': ARIMA_model.aic
}, ignore_index=True)

ARIMA(0, 1, 0) - AIC:2269.582796371201
ARIMA(0, 1, 1) - AIC:2264.906436778214
ARIMA(0, 1, 2) - AIC:2232.7830976841487
ARIMA(1, 1, 0) - AIC:2268.528060607777
ARIMA(1, 1, 1) - AIC:2235.0139453507686

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
ARIMA(1, 1, 2) - AIC:2233.5976471191107
ARIMA(2, 1, 0) - AIC:2262.035600076631
ARIMA(2, 1, 1) - AIC:2232.3604898787803
ARIMA(2, 1, 2) - AIC:2210.6166920985434

In [291]: ## Sort the above AIC values in the ascending order to get the paramete
rs for the minimum AIC value

ARIMA_AIC.sort_values(by='AIC',ascending=True)

Out[291]:
param AIC

8 (2, 1, 2) 2210.616692

7 (2, 1, 1) 2232.360490

2 (0, 1, 2) 2232.783098

5 (1, 1, 2) 2233.597647

4 (1, 1, 1) 2235.013945

6 (2, 1, 0) 2262.035600

1 (0, 1, 1) 2264.906437

3 (1, 1, 0) 2268.528061

0 (0, 1, 0) 2269.582796

In [292]: auto_ARIMA = ARIMA(train['Sparkling'], order=(2,1,2),freq='M')

results_auto_ARIMA = auto_ARIMA.fit()

print(results_auto_ARIMA.summary())

ARIMA Model Results


=======================================================================
=======
Dep. Variable: D.Sparkling No. Observations:
131
Model: ARIMA(2, 1, 2) Log Likelihood -1

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
099.308
Method: css-mle S.D. of innovations 1
012.075
Date: Sun, 20 Jun 2021 AIC 2
210.617
Time: 23:06:12 BIC 2
227.868
Sample: 02-29-1980 HQIC 2
217.627
- 12-31-1990
=======================================================================
==============
coef std err z P>|z| [0.0
25 0.975]
-----------------------------------------------------------------------
--------------
const 5.5858 0.516 10.823 0.000 4.5
74 6.597
ar.L1.D.Sparkling 1.2697 0.074 17.044 0.000 1.1
24 1.416
ar.L2.D.Sparkling -0.5600 0.074 -7.615 0.000 -0.7
04 -0.416
ma.L1.D.Sparkling -1.9991 0.042 -47.161 0.000 -2.0
82 -1.916
ma.L2.D.Sparkling 0.9991 0.042 23.584 0.000 0.9
16 1.082
Roots
=======================================================================
======
Real Imaginary Modulus Fre
quency
-----------------------------------------------------------------------
------
AR.1 1.1337 -0.7074j 1.3363 -
0.0888
AR.2 1.1337 +0.7074j 1.3363
0.0888
MA.1 1.0005 -0.0005j 1.0005 -
0.0001
MA.2 1.0005 +0.0005j 1.0005

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
0.0001
-----------------------------------------------------------------------
------

In [293]: predicted_auto_ARIMA = results_auto_ARIMA.forecast(steps=len(test))

In [294]: from sklearn.metrics import mean_squared_error


rmse = mean_squared_error(test['Sparkling'],predicted_auto_ARIMA[0],squ
ared=False)
print(rmse)

1374.9769475915175

In [295]: resultsDf_9 = pd.DataFrame({'Test RMSE': [rmse]}


,index=['ARIMA(0,1,2)'])

resultsDf = pd.concat([resultsDf, resultsDf_9])


resultsDf

Out[295]:
Test RMSE

RegressionOnTime 1275.867052

NaiveModel 3864.279352

SimpleAverageModel 1275.081804

2pointTrailingMovingAverage 813.400684

4pointTrailingMovingAverage 1156.589694

6pointTrailingMovingAverage 1283.927428

9pointTrailingMovingAverage 1346.278315

Alpha=0.0,SimpleExponentialSmoothing 1275.081766

Alpha=0.3,SimpleExponentialSmoothing 1935.507132

Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 18259.110704

Alpha=0.154,Beta=1.307,Gamma=0.371,TripleExponentialSmoothing 383.155684

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE

Alpha=0.3,Beta=0.3,Gamma=0.3,TripleExponentialSmoothing 392.786198

ARIMA(0,1,2) 1374.976948

SARIMA

In [296]: plot_acf(df1['Sparkling'].diff().dropna(),lags=50,title='Differenced Da
ta Autocorrelation')
plt.show()

Setting the seasonality as 6 for the first iteration of the auto SARIMA model.

In [297]: import itertools


p = q = range(0, 3)
d= range(1,2)
D = range(0,1)
pdq = list(itertools.product(p, d, q))

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
model_pdq = [(x[0], x[1], x[2], 6) for x in list(itertools.product(p, D
, q))]
print('Examples of some parameter combinations for Model...')
for i in range(1,len(pdq)):
print('Model: {}{}'.format(pdq[i], model_pdq[i]))

Examples of some parameter combinations for Model...


Model: (0, 1, 1)(0, 0, 1, 6)
Model: (0, 1, 2)(0, 0, 2, 6)
Model: (1, 1, 0)(1, 0, 0, 6)
Model: (1, 1, 1)(1, 0, 1, 6)
Model: (1, 1, 2)(1, 0, 2, 6)
Model: (2, 1, 0)(2, 0, 0, 6)
Model: (2, 1, 1)(2, 0, 1, 6)
Model: (2, 1, 2)(2, 0, 2, 6)

In [298]: SARIMA_AIC = pd.DataFrame(columns=['param','seasonal', 'AIC'])


SARIMA_AIC

Out[298]:
param seasonal AIC

In [299]: import statsmodels.api as sm

for param in pdq:


for param_seasonal in model_pdq:
SARIMA_model = sm.tsa.statespace.SARIMAX(train['Sparkling'].val
ues,
order=param,
seasonal_order=param_season
al,
enforce_stationarity=False,
enforce_invertibility=False
)

results_SARIMA = SARIMA_model.fit(maxiter=1000)
print('SARIMA{}x{} - AIC:{}'.format(param, param_seasonal, resu
lts_SARIMA.aic))

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
SARIMA_AIC = SARIMA_AIC.append({'param':param,'seasonal':param_
seasonal ,'AIC': results_SARIMA.aic}, ignore_index=True)

SARIMA(0, 1, 0)x(0, 0, 0, 6) - AIC:2251.3597196862966


SARIMA(0, 1, 0)x(0, 0, 1, 6) - AIC:2152.3780760934646
SARIMA(0, 1, 0)x(0, 0, 2, 6) - AIC:1955.635553691462
SARIMA(0, 1, 0)x(1, 0, 0, 6) - AIC:2164.4097581959904
SARIMA(0, 1, 0)x(1, 0, 1, 6) - AIC:2079.559988015178
SARIMA(0, 1, 0)x(1, 0, 2, 6) - AIC:1926.936011141801
SARIMA(0, 1, 0)x(2, 0, 0, 6) - AIC:1839.401298687227
SARIMA(0, 1, 0)x(2, 0, 1, 6) - AIC:1841.1993618161407
SARIMA(0, 1, 0)x(2, 0, 2, 6) - AIC:1810.9177811077682
SARIMA(0, 1, 1)x(0, 0, 0, 6) - AIC:2230.1629078505994
SARIMA(0, 1, 1)x(0, 0, 1, 6) - AIC:2130.565285908976
SARIMA(0, 1, 1)x(0, 0, 2, 6) - AIC:1918.1741580885578
SARIMA(0, 1, 1)x(1, 0, 0, 6) - AIC:2139.573319180911
SARIMA(0, 1, 1)x(1, 0, 1, 6) - AIC:2006.5174298478878
SARIMA(0, 1, 1)x(1, 0, 2, 6) - AIC:1855.7093275268933
SARIMA(0, 1, 1)x(2, 0, 0, 6) - AIC:1798.7885104786772
SARIMA(0, 1, 1)x(2, 0, 1, 6) - AIC:1800.772626397826
SARIMA(0, 1, 1)x(2, 0, 2, 6) - AIC:1741.7036713090783
SARIMA(0, 1, 2)x(0, 0, 0, 6) - AIC:2187.441010220092
SARIMA(0, 1, 2)x(0, 0, 1, 6) - AIC:2084.4254745419194
SARIMA(0, 1, 2)x(0, 0, 2, 6) - AIC:1886.117140891461
SARIMA(0, 1, 2)x(1, 0, 0, 6) - AIC:2129.739572913447
SARIMA(0, 1, 2)x(1, 0, 1, 6) - AIC:1988.4111271264148
SARIMA(0, 1, 2)x(1, 0, 2, 6) - AIC:1839.6938085591055
SARIMA(0, 1, 2)x(2, 0, 0, 6) - AIC:1791.6537078775532
SARIMA(0, 1, 2)x(2, 0, 1, 6) - AIC:1793.6190985238984
SARIMA(0, 1, 2)x(2, 0, 2, 6) - AIC:1727.8879855309042
SARIMA(1, 1, 0)x(0, 0, 0, 6) - AIC:2250.3181267386713
SARIMA(1, 1, 0)x(0, 0, 1, 6) - AIC:2151.0782683263546
SARIMA(1, 1, 0)x(0, 0, 2, 6) - AIC:1953.365224545567
SARIMA(1, 1, 0)x(1, 0, 0, 6) - AIC:2146.1836648745016
SARIMA(1, 1, 0)x(1, 0, 1, 6) - AIC:2098.1824774568586
SARIMA(1, 1, 0)x(1, 0, 2, 6) - AIC:1917.5889469234053
SARIMA(1, 1, 0)x(2, 0, 0, 6) - AIC:1813.2423977806711
SARIMA(1, 1, 0)x(2, 0, 1, 6) - AIC:1814.830160311021
SARIMA(1, 1, 0)x(2, 0, 2, 6) - AIC:1791.3715261766997
SARIMA(1, 1, 1)x(0, 0, 0, 6) - AIC:2204.934049198176
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
SARIMA(1, 1, 1)x(0, 0, 0, 6) AIC:2204.934049198176
SARIMA(1, 1, 1)x(0, 0, 1, 6) - AIC:2103.2471232163575
SARIMA(1, 1, 1)x(0, 0, 2, 6) - AIC:1906.395141260643
SARIMA(1, 1, 1)x(1, 0, 0, 6) - AIC:2109.667121270094
SARIMA(1, 1, 1)x(1, 0, 1, 6) - AIC:2005.6125660382927
SARIMA(1, 1, 1)x(1, 0, 2, 6) - AIC:1856.072780455429
SARIMA(1, 1, 1)x(2, 0, 0, 6) - AIC:1776.94176703765
SARIMA(1, 1, 1)x(2, 0, 1, 6) - AIC:1778.822255728223
SARIMA(1, 1, 1)x(2, 0, 2, 6) - AIC:1777.2232751647634
SARIMA(1, 1, 2)x(0, 0, 0, 6) - AIC:2188.463345041383
SARIMA(1, 1, 2)x(0, 0, 1, 6) - AIC:2089.1320927065735
SARIMA(1, 1, 2)x(0, 0, 2, 6) - AIC:1908.3550140777963
SARIMA(1, 1, 2)x(1, 0, 0, 6) - AIC:2108.564551028778
SARIMA(1, 1, 2)x(1, 0, 1, 6) - AIC:1989.9755159145898
SARIMA(1, 1, 2)x(1, 0, 2, 6) - AIC:1839.3494618363907
SARIMA(1, 1, 2)x(2, 0, 0, 6) - AIC:1773.4229389216596
SARIMA(1, 1, 2)x(2, 0, 1, 6) - AIC:1775.2482456129092
SARIMA(1, 1, 2)x(2, 0, 2, 6) - AIC:1727.6786974488084
SARIMA(2, 1, 0)x(0, 0, 0, 6) - AIC:2227.302761872421
SARIMA(2, 1, 0)x(0, 0, 1, 6) - AIC:2145.357699167862
SARIMA(2, 1, 0)x(0, 0, 2, 6) - AIC:1945.156142600047
SARIMA(2, 1, 0)x(1, 0, 0, 6) - AIC:2124.9071786719132
SARIMA(2, 1, 0)x(1, 0, 1, 6) - AIC:2054.170076930759
SARIMA(2, 1, 0)x(1, 0, 2, 6) - AIC:1915.6336928719586
SARIMA(2, 1, 0)x(2, 0, 0, 6) - AIC:1782.7357821169276
SARIMA(2, 1, 0)x(2, 0, 1, 6) - AIC:1782.3598160572335
SARIMA(2, 1, 0)x(2, 0, 2, 6) - AIC:1760.3426707271706
SARIMA(2, 1, 1)x(0, 0, 0, 6) - AIC:2199.8586131872835
SARIMA(2, 1, 1)x(0, 0, 1, 6) - AIC:2103.0859053283684
SARIMA(2, 1, 1)x(0, 0, 2, 6) - AIC:1903.0344579425016
SARIMA(2, 1, 1)x(1, 0, 0, 6) - AIC:2088.133636377472
SARIMA(2, 1, 1)x(1, 0, 1, 6) - AIC:1997.3692879679495
SARIMA(2, 1, 1)x(1, 0, 2, 6) - AIC:1852.7863807256508
SARIMA(2, 1, 1)x(2, 0, 0, 6) - AIC:1794.8112224461129
SARIMA(2, 1, 1)x(2, 0, 1, 6) - AIC:1763.2674869873797
SARIMA(2, 1, 1)x(2, 0, 2, 6) - AIC:1744.0407502338978
SARIMA(2, 1, 2)x(0, 0, 0, 6) - AIC:2176.8698050945236
SARIMA(2, 1, 2)x(0, 0, 1, 6) - AIC:2078.885409868417
SARIMA(2, 1, 2)x(0, 0, 2, 6) - AIC:1882.5052122587012
SARIMA(2, 1, 2)x(1, 0, 0, 6) - AIC:2074.0236540188266
SARIMA(2, 1, 2)x(1, 0, 1, 6) - AIC:1955.6059004541967
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
SARIMA(2, 1, 2)x(1, 0, 1, 6) AIC:1955.6059004541967
SARIMA(2, 1, 2)x(1, 0, 2, 6) - AIC:1836.9065979271265
SARIMA(2, 1, 2)x(2, 0, 0, 6) - AIC:1758.9610729847134
SARIMA(2, 1, 2)x(2, 0, 1, 6) - AIC:1765.202888909307
SARIMA(2, 1, 2)x(2, 0, 2, 6) - AIC:1782.339988397053

In [300]: SARIMA_AIC.sort_values(by=['AIC']).head()

Out[300]:
param seasonal AIC

53 (1, 1, 2) (2, 0, 2, 6) 1727.678697

26 (0, 1, 2) (2, 0, 2, 6) 1727.887986

17 (0, 1, 1) (2, 0, 2, 6) 1741.703671

71 (2, 1, 1) (2, 0, 2, 6) 1744.040750

78 (2, 1, 2) (2, 0, 0, 6) 1758.961073

In [301]: import statsmodels.api as sm

auto_SARIMA_6 = sm.tsa.statespace.SARIMAX(train['Sparkling'].values,
order=(1, 1, 2),
seasonal_order=(2, 0, 2, 6),
enforce_stationarity=False,
enforce_invertibility=False)
results_auto_SARIMA_6 = auto_SARIMA_6.fit(maxiter=1000)
print(results_auto_SARIMA_6.summary())

SARIMAX Results
=======================================================================
==================
Dep. Variable: y No. Observations:
132
Model: SARIMAX(1, 1, 2)x(2, 0, 2, 6) Log Likelihood
-855.839
Date: Sun, 20 Jun 2021 AIC
1727.679
Time: 23:07:48 BIC
1749.707

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Sample: 0 HQIC
1736.621
- 132
Covariance Type: opg
=======================================================================
=======
coef std err z P>|z| [0.025
0.975]
-----------------------------------------------------------------------
-------
ar.L1 -0.6449 0.286 -2.256 0.024 -1.205
-0.085
ma.L1 -0.1069 0.250 -0.428 0.669 -0.596
0.383
ma.L2 -0.7005 0.202 -3.469 0.001 -1.096
-0.305
ar.S.L6 -0.0044 0.027 -0.164 0.870 -0.057
0.049
ar.S.L12 1.0361 0.018 56.098 0.000 1.000
1.072
ma.S.L6 0.0674 0.152 0.443 0.658 -0.231
0.366
ma.S.L12 -0.6123 0.093 -6.591 0.000 -0.794
-0.430
sigma2 1.448e+05 1.71e+04 8.466 0.000 1.11e+05
1.78e+05
=======================================================================
============
Ljung-Box (Q): 28.93 Jarque-Bera (JB):
25.24
Prob(Q): 0.90 Prob(JB):
0.00
Heteroskedasticity (H): 2.63 Skew:
0.47
Prob(H) (two-sided): 0.00 Kurtosis:
5.09
=======================================================================
============

Warnings:

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
[1] Covariance matrix calculated using the outer product of gradients
(complex-step).

In [302]: results_auto_SARIMA_6.plot_diagnostics()
plt.show()

In [303]: predicted_auto_SARIMA_6 = results_auto_SARIMA_6.get_forecast(steps=len(


test))

In [304]: predicted_auto_SARIMA_6.summary_frame(alpha=0.05).head()

Out[304]:
y mean mean_se mean_ci_lower mean_ci_upper

0 1330.327397 380.564870 584.433959 2076.220835

1 1177.302125 392.117087 408.766757 1945.837493

2 1625.833239 392.311737 856.916363 2394.750115

3 1546.291624 397.712592 766.789267 2325.793981

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
y mean mean_se mean_ci_lower mean_ci_upper

4 1308.860522 398.932958 526.966292 2090.754753

In [305]: rmse = mean_squared_error(test['Sparkling'],predicted_auto_SARIMA_6.pre


dicted_mean,squared=False)
print(rmse)

626.880153949547

In [306]: temp_resultsDf = pd.DataFrame({'Test RMSE': [rmse]}


,index=['SARIMA(1,1,2)(2,0,2,6)'])
resultsDf = pd.concat([resultsDf, temp_resultsDf])
resultsDf

Out[306]:
Test RMSE

RegressionOnTime 1275.867052

NaiveModel 3864.279352

SimpleAverageModel 1275.081804

2pointTrailingMovingAverage 813.400684

4pointTrailingMovingAverage 1156.589694

6pointTrailingMovingAverage 1283.927428

9pointTrailingMovingAverage 1346.278315

Alpha=0.0,SimpleExponentialSmoothing 1275.081766

Alpha=0.3,SimpleExponentialSmoothing 1935.507132

Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 18259.110704

Alpha=0.154,Beta=1.307,Gamma=0.371,TripleExponentialSmoothing 383.155684

Alpha=0.3,Beta=0.3,Gamma=0.3,TripleExponentialSmoothing 392.786198

ARIMA(0,1,2) 1374.976948

SARIMA(1,1,2)(2,0,2,6) 626.880154

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Setting the seasonality as 12 for the second iteration of the auto SARIMA model.

In [307]: import itertools


p = q = range(0, 3)
d= range(1,2)
D = range(0,1)
pdq = list(itertools.product(p, d, q))
model_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p,
D, q))]
print('Examples of some parameter combinations for Model...')
for i in range(1,len(pdq)):
print('Model: {}{}'.format(pdq[i], model_pdq[i]))

Examples of some parameter combinations for Model...


Model: (0, 1, 1)(0, 0, 1, 12)
Model: (0, 1, 2)(0, 0, 2, 12)
Model: (1, 1, 0)(1, 0, 0, 12)
Model: (1, 1, 1)(1, 0, 1, 12)
Model: (1, 1, 2)(1, 0, 2, 12)
Model: (2, 1, 0)(2, 0, 0, 12)
Model: (2, 1, 1)(2, 0, 1, 12)
Model: (2, 1, 2)(2, 0, 2, 12)

In [308]: SARIMA_AIC = pd.DataFrame(columns=['param','seasonal', 'AIC'])


SARIMA_AIC

Out[308]:
param seasonal AIC

In [309]: import statsmodels.api as sm

for param in pdq:


for param_seasonal in model_pdq:
SARIMA_model = sm.tsa.statespace.SARIMAX(train['Sparkling'].val
ues,
order=param,
seasonal_order=param_season

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
al,
enforce_stationarity=False,
enforce_invertibility=False
)

results_SARIMA = SARIMA_model.fit(maxiter=1000)
print('SARIMA{}x{} - AIC:{}'.format(param, param_seasonal, resu
lts_SARIMA.aic))
SARIMA_AIC = SARIMA_AIC.append({'param':param,'seasonal':param_
seasonal ,'AIC': results_SARIMA.aic}, ignore_index=True)

SARIMA(0, 1, 0)x(0, 0, 0, 12) - AIC:2251.3597196862966


SARIMA(0, 1, 0)x(0, 0, 1, 12) - AIC:1956.261461684334
SARIMA(0, 1, 0)x(0, 0, 2, 12) - AIC:1723.153364023482
SARIMA(0, 1, 0)x(1, 0, 0, 12) - AIC:1837.436602245668
SARIMA(0, 1, 0)x(1, 0, 1, 12) - AIC:1806.990530137327
SARIMA(0, 1, 0)x(1, 0, 2, 12) - AIC:1633.2108735940185
SARIMA(0, 1, 0)x(2, 0, 0, 12) - AIC:1648.3776153470858
SARIMA(0, 1, 0)x(2, 0, 1, 12) - AIC:1647.2054160640976
SARIMA(0, 1, 0)x(2, 0, 2, 12) - AIC:1630.9898054509874
SARIMA(0, 1, 1)x(0, 0, 0, 12) - AIC:2230.1629078505994
SARIMA(0, 1, 1)x(0, 0, 1, 12) - AIC:1923.7688649611964
SARIMA(0, 1, 1)x(0, 0, 2, 12) - AIC:1692.708957238344
SARIMA(0, 1, 1)x(1, 0, 0, 12) - AIC:1797.17958817917
SARIMA(0, 1, 1)x(1, 0, 1, 12) - AIC:1738.0973022178505
SARIMA(0, 1, 1)x(1, 0, 2, 12) - AIC:1570.1319652608092
SARIMA(0, 1, 1)x(2, 0, 0, 12) - AIC:1605.6751954493398
SARIMA(0, 1, 1)x(2, 0, 1, 12) - AIC:1599.2245087130186
SARIMA(0, 1, 1)x(2, 0, 2, 12) - AIC:1570.3683739995645
SARIMA(0, 1, 2)x(0, 0, 0, 12) - AIC:2187.441010220092
SARIMA(0, 1, 2)x(0, 0, 1, 12) - AIC:1887.2897618827865
SARIMA(0, 1, 2)x(0, 0, 2, 12) - AIC:1657.6640310415592
SARIMA(0, 1, 2)x(1, 0, 0, 12) - AIC:1790.0326332203936
SARIMA(0, 1, 2)x(1, 0, 1, 12) - AIC:1724.166738337171
SARIMA(0, 1, 2)x(1, 0, 2, 12) - AIC:1557.1603191173776
SARIMA(0, 1, 2)x(2, 0, 0, 12) - AIC:1603.9654774483727
SARIMA(0, 1, 2)x(2, 0, 1, 12) - AIC:1600.5438801521245
SARIMA(0, 1, 2)x(2, 0, 2, 12) - AIC:1557.1215626799462
SARIMA(1, 1, 0)x(0, 0, 0, 12) - AIC:2250.3181267386713
SARIMA(1, 1, 0)x(0, 0, 1, 12) - AIC:1954.3938339929728
SARIMA(1 1 0) (0 0 2 12) AIC 1721 2688476313904
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
SARIMA(1, 1, 0)x(0, 0, 2, 12) - AIC:1721.2688476313904
SARIMA(1, 1, 0)x(1, 0, 0, 12) - AIC:1811.2440279327332
SARIMA(1, 1, 0)x(1, 0, 1, 12) - AIC:1788.5343592980894
SARIMA(1, 1, 0)x(1, 0, 2, 12) - AIC:1616.4894412533986

SARIMA(1, 1, 0)x(2, 0, 0, 12) - AIC:1621.635508043298


SARIMA(1, 1, 0)x(2, 0, 1, 12) - AIC:1617.1356135615595
SARIMA(1, 1, 0)x(2, 0, 2, 12) - AIC:1616.541210075358
SARIMA(1, 1, 1)x(0, 0, 0, 12) - AIC:2204.934049198176
SARIMA(1, 1, 1)x(0, 0, 1, 12) - AIC:1907.3558977806388
SARIMA(1, 1, 1)x(0, 0, 2, 12) - AIC:1678.098135279112
SARIMA(1, 1, 1)x(1, 0, 0, 12) - AIC:1775.142446655006
SARIMA(1, 1, 1)x(1, 0, 1, 12) - AIC:1739.5403526721561
SARIMA(1, 1, 1)x(1, 0, 2, 12) - AIC:1571.3248864655484
SARIMA(1, 1, 1)x(2, 0, 0, 12) - AIC:1590.6161607076579
SARIMA(1, 1, 1)x(2, 0, 1, 12) - AIC:1586.3142236394447
SARIMA(1, 1, 1)x(2, 0, 2, 12) - AIC:1571.8069968891837
SARIMA(1, 1, 2)x(0, 0, 0, 12) - AIC:2188.463345041383
SARIMA(1, 1, 2)x(0, 0, 1, 12) - AIC:1889.7708318442067
SARIMA(1, 1, 2)x(0, 0, 2, 12) - AIC:1659.6291422585368
SARIMA(1, 1, 2)x(1, 0, 0, 12) - AIC:1771.825980269072
SARIMA(1, 1, 2)x(1, 0, 1, 12) - AIC:1723.9871816320992
SARIMA(1, 1, 2)x(1, 0, 2, 12) - AIC:1555.5842544351642
SARIMA(1, 1, 2)x(2, 0, 0, 12) - AIC:1588.421693151794
SARIMA(1, 1, 2)x(2, 0, 1, 12) - AIC:1585.5152985346901
SARIMA(1, 1, 2)x(2, 0, 2, 12) - AIC:1556.0802592562013
SARIMA(2, 1, 0)x(0, 0, 0, 12) - AIC:2227.302761872421
SARIMA(2, 1, 0)x(0, 0, 1, 12) - AIC:1946.4383435279221
SARIMA(2, 1, 0)x(0, 0, 2, 12) - AIC:1711.4123039734561
SARIMA(2, 1, 0)x(1, 0, 0, 12) - AIC:1780.764606593656
SARIMA(2, 1, 0)x(1, 0, 1, 12) - AIC:1756.9357346857084
SARIMA(2, 1, 0)x(1, 0, 2, 12) - AIC:1600.970220445633
SARIMA(2, 1, 0)x(2, 0, 0, 12) - AIC:1592.2403465602624
SARIMA(2, 1, 0)x(2, 0, 1, 12) - AIC:1587.6344986172633
SARIMA(2, 1, 0)x(2, 0, 2, 12) - AIC:1585.9191734973074
SARIMA(2, 1, 1)x(0, 0, 0, 12) - AIC:2199.8586131872835
SARIMA(2, 1, 1)x(0, 0, 1, 12) - AIC:1905.0209496747561
SARIMA(2, 1, 1)x(0, 0, 2, 12) - AIC:1675.4234079628686
SARIMA(2, 1, 1)x(1, 0, 0, 12) - AIC:1759.7569277616458
SARIMA(2, 1, 1)x(1, 0, 1, 12) - AIC:1740.0911247357424
SARIMA(2 1 1)x(1 0 2 12) - AIC:1571 9888282581596
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
SARIMA(2, 1, 1)x(1, 0, 2, 12) - AIC:1571.9888282581596
SARIMA(2, 1, 1)x(2, 0, 0, 12) - AIC:1577.1235065031422
SARIMA(2, 1, 1)x(2, 0, 1, 12) - AIC:1573.1596420398046
SARIMA(2, 1, 1)x(2, 0, 2, 12) - AIC:1571.9780296421281
SARIMA(2, 1, 2)x(0, 0, 0, 12) - AIC:2176.8698050945236
SARIMA(2, 1, 2)x(0, 0, 1, 12) - AIC:1891.5463663420826
SARIMA(2, 1, 2)x(0, 0, 2, 12) - AIC:1664.2974982219203
SARIMA(2, 1, 2)x(1, 0, 0, 12) - AIC:1757.214093119035
SARIMA(2, 1, 2)x(1, 0, 1, 12) - AIC:1725.6086059339216
SARIMA(2, 1, 2)x(1, 0, 2, 12) - AIC:1557.439140240327
SARIMA(2, 1, 2)x(2, 0, 0, 12) - AIC:1576.0455911149202
SARIMA(2, 1, 2)x(2, 0, 1, 12) - AIC:1573.5476000272845
SARIMA(2, 1, 2)x(2, 0, 2, 12) - AIC:1557.6893214191882

In [310]: SARIMA_AIC.sort_values(by=['AIC']).head()

Out[310]:
param seasonal AIC

50 (1, 1, 2) (1, 0, 2, 12) 1555.584254

53 (1, 1, 2) (2, 0, 2, 12) 1556.080259

26 (0, 1, 2) (2, 0, 2, 12) 1557.121563

23 (0, 1, 2) (1, 0, 2, 12) 1557.160319

77 (2, 1, 2) (1, 0, 2, 12) 1557.439140

In [311]: import statsmodels.api as sm

auto_SARIMA_12 = sm.tsa.statespace.SARIMAX(train['Sparkling'].values,
order=(1, 1, 2),
seasonal_order=(1, 0, 2, 12),
enforce_stationarity=False,
enforce_invertibility=False)
results_auto_SARIMA_12 = auto_SARIMA_12.fit(maxiter=1000)
print(results_auto_SARIMA_12.summary())

SARIMAX Results
=======================================================================
===================

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Dep. Variable: y No. Observations:
132
Model: SARIMAX(1, 1, 2)x(1, 0, 2, 12) Log Likelihood
-770.792
Date: Sun, 20 Jun 2021 AIC
1555.584
Time: 23:10:10 BIC
1574.095
Sample: 0 HQIC
1563.083
- 132
Covariance Type: opg
=======================================================================
=======
coef std err z P>|z| [0.025
0.975]
-----------------------------------------------------------------------
-------
ar.L1 -0.6282 0.255 -2.462 0.014 -1.128
-0.128
ma.L1 -0.1041 0.225 -0.463 0.643 -0.545
0.337
ma.L2 -0.7276 0.154 -4.731 0.000 -1.029
-0.426
ar.S.L12 1.0439 0.014 72.799 0.000 1.016
1.072
ma.S.L12 -0.5550 0.098 -5.661 0.000 -0.747
-0.363
ma.S.L24 -0.1354 0.120 -1.133 0.257 -0.370
0.099
sigma2 1.506e+05 2.04e+04 7.398 0.000 1.11e+05
1.91e+05
=======================================================================
============
Ljung-Box (Q): 23.02 Jarque-Bera (JB):
11.73
Prob(Q): 0.99 Prob(JB):
0.00
Heteroskedasticity (H): 1.47 Skew:
0.36

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Prob(H) (two-sided): 0.26 Kurtosis:
4.48
=======================================================================
============

Warnings:
[1] Covariance matrix calculated using the outer product of gradients
(complex-step).

In [312]: results_auto_SARIMA_12.plot_diagnostics()
plt.show()

In [313]: predicted_auto_SARIMA_12 = results_auto_SARIMA_12.get_forecast(steps=le


n(test))

In [314]: predicted_auto_SARIMA_12.summary_frame(alpha=0.05).head()

Out[314]:
y mean mean_se mean_ci_lower mean_ci_upper

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
y mean mean_se mean_ci_lower mean_ci_upper

0 1327.548225 388.400943 566.296365 2088.800086

1 1315.192780 402.062309 527.165135 2103.220425

2 1621.707620 402.055918 833.692501 2409.722738

3 1598.950391 407.292759 800.671253 2397.229530

4 1392.811018 408.022686 593.101248 2192.520788

In [315]: rmse = mean_squared_error(test['Sparkling'],predicted_auto_SARIMA_12.pr


edicted_mean,squared=False)
print(rmse)

528.4527303319784

In [316]: temp_resultsDf = pd.DataFrame({'Test RMSE': [rmse]}


,index=['SARIMA(1,1,2)(1,0,2,12)'])

resultsDf = pd.concat([resultsDf,temp_resultsDf])

resultsDf

Out[316]:
Test RMSE

RegressionOnTime 1275.867052

NaiveModel 3864.279352

SimpleAverageModel 1275.081804

2pointTrailingMovingAverage 813.400684

4pointTrailingMovingAverage 1156.589694

6pointTrailingMovingAverage 1283.927428

9pointTrailingMovingAverage 1346.278315

Alpha=0.0,SimpleExponentialSmoothing 1275.081766

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE

Alpha=0.3,SimpleExponentialSmoothing 1935.507132

Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 18259.110704

Alpha=0.154,Beta=1.307,Gamma=0.371,TripleExponentialSmoothing 383.155684

Alpha=0.3,Beta=0.3,Gamma=0.3,TripleExponentialSmoothing 392.786198

ARIMA(0,1,2) 1374.976948

SARIMA(1,1,2)(2,0,2,6) 626.880154

SARIMA(1,1,2)(1,0,2,12) 528.452730

7. Build ARIMA/SARIMA models based on the cut-off


points of ACF and PACF on the training data and
evaluate this model on the test data using RMSE.

Manual SARIMA

In [317]: plot_acf(df1['Sparkling'].diff().dropna(),lags=50,title='Differenced Da
ta Autocorrelation')
plot_pacf(df1['Sparkling'].diff().dropna(),lags=50,title='Differenced D
ata Partial Autocorrelation')
plt.show()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [318]: manual_ARIMA = ARIMA(train['Sparkling'].astype('float64'), order=(1,1,1
),freq='M')

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
results_manual_ARIMA = manual_ARIMA.fit()

print(results_manual_ARIMA.summary())

ARIMA Model Results


=======================================================================
=======
Dep. Variable: D.Sparkling No. Observations:
131
Model: ARIMA(1, 1, 1) Log Likelihood -1
113.507
Method: css-mle S.D. of innovations 1
171.377
Date: Sun, 20 Jun 2021 AIC 2
235.014
Time: 23:10:12 BIC 2
246.515
Sample: 02-29-1980 HQIC 2
239.687
- 12-31-1990
=======================================================================
==============
coef std err z P>|z| [0.0
25 0.975]
-----------------------------------------------------------------------
--------------
const 6.7493 4.616 1.462 0.144 -2.2
99 15.797
ar.L1.D.Sparkling 0.4289 0.082 5.221 0.000 0.2
68 0.590
ma.L1.D.Sparkling -1.0000 0.019 -51.962 0.000 -1.0
38 -0.962
Roots
=======================================================================
======
Real Imaginary Modulus Fre
quency
-----------------------------------------------------------------------
------

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
AR.1 2.3313 +0.0000j 2.3313
0.0000
MA.1 1.0000 +0.0000j 1.0000
0.0000
-----------------------------------------------------------------------
------

In [319]: predicted_manual_ARIMA = results_manual_ARIMA.forecast(steps=len(test))

In [320]: rmse = mean_squared_error(test['Sparkling'],predicted_manual_ARIMA[0],s


quared=False)
print(rmse)

1461.6785026377725

In [321]: temp_resultsDf = pd.DataFrame({'Test RMSE': [rmse]}


,index=['ARIMA(1,1,1)'])

resultsDf = pd.concat([resultsDf,temp_resultsDf])

resultsDf

Out[321]:
Test RMSE

RegressionOnTime 1275.867052

NaiveModel 3864.279352

SimpleAverageModel 1275.081804

2pointTrailingMovingAverage 813.400684

4pointTrailingMovingAverage 1156.589694

6pointTrailingMovingAverage 1283.927428

9pointTrailingMovingAverage 1346.278315

Alpha=0.0,SimpleExponentialSmoothing 1275.081766

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE

Alpha=0.3,SimpleExponentialSmoothing 1935.507132

Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 18259.110704

Alpha=0.154,Beta=1.307,Gamma=0.371,TripleExponentialSmoothing 383.155684

Alpha=0.3,Beta=0.3,Gamma=0.3,TripleExponentialSmoothing 392.786198

ARIMA(0,1,2) 1374.976948

SARIMA(1,1,2)(2,0,2,6) 626.880154

SARIMA(1,1,2)(1,0,2,12) 528.452730

ARIMA(1,1,1) 1461.678503

Manual SARIMA

In [322]: plot_acf(df1['Sparkling'].diff().dropna(),lags=50,title='Sparkling Diff


erenced Data Autocorrelation')
plot_pacf(df1['Sparkling'].diff().dropna(),lags=50,title='Sparkling Dif
ferenced Data Patial Autocorrelation')
plt.show()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [323]: df.plot()
plt.grid();

In [324]: (df1['Sparkling'].diff(6)).plot()
plt.grid();

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [325]: (df1['Sparkling'].diff(6)).diff().plot()
plt.grid();

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [326]: test_stationarity((train['Sparkling'].diff(6).dropna()).diff(1).dropna
())

Results of Dickey-Fuller Test:


Test Statistic -7.017242e+00
p-value 6.683657e-10
#Lags Used 1.300000e+01
Number of Observations Used 1.110000e+02
Critical Value (1%) -3.490683e+00
Critical Value (5%) -2.887952e+00
Critical Value (10%) -2.580857e+00
dtype: float64

In [327]: plot_acf((df1['Sparkling'].diff(6).dropna()).diff(1).dropna(),lags=30)
plot_pacf((df1['Sparkling'].diff(6).dropna()).diff(1).dropna(),lags=30
);

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [328]: import statsmodels.api as sm

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
manual_SARIMA_6 = sm.tsa.statespace.SARIMAX(train['Sparkling'].values,
order=(0,1,2),
seasonal_order=(1, 1, 2, 6),
enforce_stationarity=False,
enforce_invertibility=False)
results_manual_SARIMA_6 = manual_SARIMA_6.fit(maxiter=1000)
print(results_manual_SARIMA_6.summary())

SARIMAX Results
=======================================================================
==================
Dep. Variable: y No. Observations:
132
Model: SARIMAX(0, 1, 2)x(1, 1, 2, 6) Log Likelihood
-814.410
Date: Sun, 20 Jun 2021 AIC
1640.821
Time: 23:10:17 BIC
1657.023
Sample: 0 HQIC
1647.393
- 132
Covariance Type: opg
=======================================================================
=======
coef std err z P>|z| [0.025
0.975]
-----------------------------------------------------------------------
-------
ma.L1 -0.7621 0.107 -7.150 0.000 -0.971
-0.553
ma.L2 -0.1433 0.113 -1.270 0.204 -0.364
0.078
ar.S.L6 -1.0186 0.009 -119.432 0.000 -1.035
-1.002
ma.S.L6 0.5540 0.143 3.875 0.000 0.274
0.834
ma.S.L12 -0.8700 0.174 -4.988 0.000 -1.212
-0.528
sigma2 9.961e+04 2.41e+04 4.129 0.000 5.23e+04

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
1.47e+05
=======================================================================
============
Ljung-Box (Q): 28.53 Jarque-Bera (JB):
34.60
Prob(Q): 0.91 Prob(JB):
0.00
Heteroskedasticity (H): 1.83 Skew:
0.61
Prob(H) (two-sided): 0.07 Kurtosis:
5.46
=======================================================================
============

Warnings:
[1] Covariance matrix calculated using the outer product of gradients
(complex-step).

In [329]: results_manual_SARIMA_6.plot_diagnostics()
plt.show()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [330]: predicted_manual_SARIMA_6 = results_manual_SARIMA_6.get_forecast(steps=
len(test))

In [331]: predicted_manual_SARIMA_6.summary_frame(alpha=0.05).head()

Out[331]:
y mean mean_se mean_ci_lower mean_ci_upper

0 1453.675438 394.525918 680.418848 2226.932029

1 1176.610635 405.533762 381.779068 1971.442203

2 1764.541710 407.243079 966.359942 2562.723477

3 1554.880467 408.947104 753.358872 2356.402062

4 1354.487019 410.644070 549.639430 2159.334607

In [332]: rmse = mean_squared_error(test['Sparkling'],predicted_manual_SARIMA_6.p


redicted_mean,squared=False)
print(rmse)

558.4383293762605

In [333]: temp_resultsDf = pd.DataFrame({'Test RMSE': [rmse]}


,index=['SARIMA(0,1,0)(1,1,3,6)'])

resultsDf = pd.concat([resultsDf,temp_resultsDf])

resultsDf

Out[333]:
Test RMSE

RegressionOnTime 1275.867052

NaiveModel 3864.279352

SimpleAverageModel 1275.081804

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE

2pointTrailingMovingAverage 813.400684

4pointTrailingMovingAverage 1156.589694

6pointTrailingMovingAverage 1283.927428

9pointTrailingMovingAverage 1346.278315

Alpha=0.0,SimpleExponentialSmoothing 1275.081766

Alpha=0.3,SimpleExponentialSmoothing 1935.507132

Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 18259.110704

Alpha=0.154,Beta=1.307,Gamma=0.371,TripleExponentialSmoothing 383.155684

Alpha=0.3,Beta=0.3,Gamma=0.3,TripleExponentialSmoothing 392.786198

ARIMA(0,1,2) 1374.976948

SARIMA(1,1,2)(2,0,2,6) 626.880154

SARIMA(1,1,2)(1,0,2,12) 528.452730

ARIMA(1,1,1) 1461.678503

SARIMA(0,1,0)(1,1,3,6) 558.438329

8.Build a table with all the models built along with their
corresponding parameters and the respective RMSE
values on the test data.
In [334]: resultsDf

Out[334]:
Test RMSE

RegressionOnTime 1275.867052

NaiveModel 3864.279352

SimpleAverageModel 1275.081804

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE

2pointTrailingMovingAverage 813.400684

4pointTrailingMovingAverage 1156.589694

6pointTrailingMovingAverage 1283.927428

9pointTrailingMovingAverage 1346.278315

Alpha=0.0,SimpleExponentialSmoothing 1275.081766

Alpha=0.3,SimpleExponentialSmoothing 1935.507132

Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 18259.110704

Alpha=0.154,Beta=1.307,Gamma=0.371,TripleExponentialSmoothing 383.155684

Alpha=0.3,Beta=0.3,Gamma=0.3,TripleExponentialSmoothing 392.786198

ARIMA(0,1,2) 1374.976948

SARIMA(1,1,2)(2,0,2,6) 626.880154

SARIMA(1,1,2)(1,0,2,12) 528.452730

ARIMA(1,1,1) 1461.678503

SARIMA(0,1,0)(1,1,3,6) 558.438329

In [335]: print('Sorted by RMSE values on the Test Data:','\n',)


resultsDf.sort_values(by=['Test RMSE'])

Sorted by RMSE values on the Test Data:

Out[335]:
Test RMSE

Alpha=0.154,Beta=1.307,Gamma=0.371,TripleExponentialSmoothing 383.155684

Alpha=0.3,Beta=0.3,Gamma=0.3,TripleExponentialSmoothing 392.786198

SARIMA(1,1,2)(1,0,2,12) 528.452730

SARIMA(0,1,0)(1,1,3,6) 558.438329

SARIMA(1,1,2)(2,0,2,6) 626.880154

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE

2pointTrailingMovingAverage 813.400684

4pointTrailingMovingAverage 1156.589694

Alpha=0.0,SimpleExponentialSmoothing 1275.081766

SimpleAverageModel 1275.081804

RegressionOnTime 1275.867052

6pointTrailingMovingAverage 1283.927428

9pointTrailingMovingAverage 1346.278315

ARIMA(0,1,2) 1374.976948

ARIMA(1,1,1) 1461.678503

Alpha=0.3,SimpleExponentialSmoothing 1935.507132

NaiveModel 3864.279352

Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 18259.110704

9.Based on the model-building exercise, build the most


optimum model(s) on the complete data and predict 12
months into the future with appropriate confidence
intervals/bands.
In [336]: ## Plotting on both the Training and Test using autofit

plt.plot(TES_train['Sparkling'], label='Train')
plt.plot(TES_test['Sparkling'], label='Test')

plt.plot(TES_test['auto_predict'], label='Alpha=0.154,Beta=1.307,Gamma=
0.371,TripleExponentialSmoothing predictions on Test Set')

plt.legend(loc='best')
plt.grid();

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
plt.title('Plot of Exponential Smoothing Predictions and the Acutal Val
ues');

In [337]: fullmodel = ExponentialSmoothing(df1,


trend='additive',
seasonal='multiplicative').fit(smooth
ing_level=0.15422095573741845,
smooth
ing_slope=1.3071916052588054e-21,
smooth
ing_seasonal=0.3713310279757222)

In [338]: RMSE_fullmodel = metrics.mean_squared_error(df1['Sparkling'],fullmodel.


fittedvalues,squared=False)

print('RMSE:',RMSE_fullmodel)

RMSE: 353.91247239957886

In [339]: # Getting the predictions for the same number of times stamps that are

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
present in the test data
prediction= fullmodel.forecast(steps=len(test))

In [340]: df1.plot()
prediction.plot();

In [341]: #In the below code, we have calculated the upper and lower confidence b
ands at 95% confidence level
#The percentile function under numpy lets us calculate these and adding
and subtracting from the predictions
#gives us the necessary confidence bands for the predictions
pred_df = pd.DataFrame({'lower_CI':prediction - 1.96*np.std(fullmodel.r
esid,ddof=1),
'prediction':prediction,
'upper_ci': prediction + 1.96*np.std(fullmode
l.resid,ddof=1)})
pred_df.head()

Out[341]:
lower_CI prediction upper_ci

1995-08-31 1189.662677 1885.063397 2580.464118

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
1995-09-30 lower_CI
1739.199777 prediction
2434.600498 upper_ci
3130.001218

1995-10-31 2503.481875 3198.882596 3894.283316

1995-11-30 3173.025555 3868.426275 4563.826996

1995-12-31 5340.512111 6035.912831 6731.313552

In [342]: # plot the forecast along with the confidence band

axis = df1.plot(label='Actual', figsize=(13,6))


pred_df['prediction'].plot(ax=axis, label='Forecast', alpha=0.5)
axis.fill_between(pred_df.index, pred_df['lower_CI'], pred_df['upper_c
i'], color='k', alpha=.15)
axis.set_xlabel('Year-Months')
axis.set_ylabel('Sales')
plt.legend(loc='best')
plt.grid()
plt.show()

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
END

Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD

You might also like