SANDYA VB TIME SERIES FORECASTING PROJECT - HTML PDF

Problem:
For this particular assignment, the data of different types

of wine sales in the 20th century is to be analysed. Both
of these data are from the same company but of different
wines. As an analyst in the ABC Estate Wines, you are
tasked to analyse and forecast Wine Sales in the 20th
century.
Dataset - Rose.csv
In [1]: import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 13, 6
1. Read the data as an appropriate Time Series data and plot the
data.
In [2]: df = pd.read_csv("Rose.csv")
In [3]: df.head()
Out[3]:
YearMonth Rose
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
YearMonth Rose
0 1980-01 112.0
1 1980-02 118.0
2 1980-03 129.0
3 1980-04 99.0
4 1980-05 116.0
In [4]: df.tail()
Out[4]:
YearMonth Rose
182 1995-03 45.0
183 1995-04 52.0
184 1995-05 28.0
185 1995-06 40.0
186 1995-07 62.0
In [5]: df.plot();
plt.grid()
In [6]: date = pd.date_range(start='01/01/1980', end='08/01/1995', freq='M')
date
Out[6]: DatetimeIndex(['1980-01-31', '1980-02-29', '1980-03-31', '1980-04-30',

'1980-05-31', '1980-06-30', '1980-07-31', '1980-08-31',
'1980-09-30', '1980-10-31',
...
'1994-10-31', '1994-11-30', '1994-12-31', '1995-01-31',
'1995-02-28', '1995-03-31', '1995-04-30', '1995-05-31',
'1995-06-30', '1995-07-31'],
dtype='datetime64[ns]', length=187, freq='M')
In [7]: df['Time_Stamp'] = pd.DataFrame(date,columns=['Month'])
In [8]: df.head()
Out[8]:
YearMonth Rose Time_Stamp
0 1980-01 112.0 1980-01-31
1 1980-02 118.0 1980-02-29
YearMonth Rose Time_Stamp
2 1980-03 129.0 1980-03-31
3 1980-04 99.0 1980-04-30
4 1980-05 116.0 1980-05-31
In [9]: df['Time_Stamp']=pd.to_datetime(df['Time_Stamp'])
df= df.set_index('Time_Stamp')
df.drop(['YearMonth'],axis=1, inplace = True)
df.head()
Out[9]:
Rose
Time_Stamp
1980-01-31 112.0
1980-02-29 118.0
1980-03-31 129.0
1980-04-30 99.0
1980-05-31 116.0
In [10]: df.plot();
plt.grid()
2. Perform appropriate Exploratory Data Analysis to understand the
data and also perform decomposition.
In [11]: df.describe(include='all').T
Out[11]:
count mean std min 25% 50% 75% max
Rose 185.0 90.394595 39.175344 28.0 63.0 86.0 112.0 267.0
In [12]: df.isnull().sum()
Out[12]: Rose 2
dtype: int64
In [13]: df.shape
Out[13]: (187, 1)
In [14]: df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 187 entries, 1980-01-31 to 1995-07-31
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Rose 185 non-null float64
dtypes: float64(1)
memory usage: 2.9 KB
In [15]: _, ax = plt.subplots(figsize=(30,10)) #yearly plot

sns.boxplot(x = df.index.year,y = df.values[:,0],ax=ax)
plt.xlabel('Yearly Rose Wine Sale');
plt.ylabel('Year');
plt.grid();
In [16]: _, ax = plt.subplots(figsize=(22,8))
sns.boxplot(x = df.index.month_name(),y = df.values[:,0],ax=ax)
plt.xlabel('Monthly Rose Wine Sale');
plt.ylabel('Months');
plt.grid();
In [17]: from statsmodels.graphics.tsaplots import month_plot
fig, ax = plt.subplots(figsize=(22,8))
month_plot(df,ylabel='Rose Wine Production',ax=ax)

plt.grid();
In [18]: monthly_sales_across_years = pd.pivot_table(df, values = 'Rose', column

s = df.index.month_name(), index = df.index.year)
monthly_sales_across_years
Out[18]: Time_Stamp April August December February January July June March May Novembe
Time_Stamp
1980 99.0 129.0 267.0 118.0 112.0 118.0 168.0 129.0 116.0 150.0
1981 97.0 214.0 226.0 129.0 126.0 222.0 127.0 124.0 102.0 154.0
1982 97.0 117.0 169.0 77.0 89.0 117.0 121.0 82.0 127.0 134.0
1983 85.0 124.0 164.0 108.0 75.0 109.0 108.0 115.0 101.0 135.0
1984 87.0 142.0 159.0 85.0 88.0 87.0 87.0 112.0 91.0 139.0
1985 93.0 103.0 129.0 82.0 61.0 87.0 75.0 124.0 108.0 123.0
1986 71.0 118.0 141.0 65.0 57.0 110.0 67.0 67.0 76.0 107.0
1987 86.0 73.0 157.0 65.0 58.0 87.0 74.0 70.0 93.0 96.0
1988 66.0 77.0 135.0 115.0 63.0 79.0 83.0 70.0 67.0 100.0
1989 74.0 74.0 137.0 60.0 71.0 86.0 91.0 89.0 73.0 109.0
1990 77.0 70.0 132.0 69.0 43.0 78.0 76.0 73.0 69.0 110.0
1991 65.0 55.0 106.0 55.0 54.0 96.0 65.0 66.0 60.0 74.0
1992 53.0 52.0 91.0 47.0 34.0 67.0 55.0 56.0 53.0 58.0
1993 45.0 54.0 77.0 40.0 33.0 57.0 55.0 46.0 41.0 48.0
1994 48.0 NaN 84.0 35.0 30.0 NaN 45.0 42.0 44.0 63.0
1995 52.0 NaN NaN 39.0 30.0 62.0 40.0 45.0 28.0 NaN
In [19]: monthly_sales_across_years.plot(figsize=(20,10))
plt.grid()
plt.legend(loc='best');
In [20]: df_yearly_sum = df.resample('A').sum()
df_yearly_sum.head()
Out[20]:
Rose
Time_Stamp
1980-12-31 1758.0
1981-12-31 1780.0
1982-12-31 1348.0
1983-12-31 1324.0
1984-12-31 1280.0
In [21]: df_yearly_sum.plot();
plt.grid()
plt.xlabel('Sum of the Observations of each year');
In [22]: df_yearly_mean = df.resample('Y').mean()
df_yearly_mean.head()
Out[22]:
Rose
Time_Stamp
1980-12-31 146.500000
1981-12-31 148.333333
1982-12-31 112.333333
1983-12-31 110.333333
1984-12-31 106.666667
In [23]: df_yearly_mean.plot();
plt.grid()
plt.xlabel('Mean of the Observations of each year');
In [24]: df_quarterly_sum = df.resample('Q').sum()
df_quarterly_sum.head()
Out[24]:
Rose
Time_Stamp
1980-03-31 359.0
1980-06-30 383.0
1980-09-30 452.0
1980-12-31 564.0
1981-03-31 379.0
In [25]: df_quarterly_sum.plot();
plt.grid()
In [26]: df_quarterly_mean = df.resample('Q').mean()
df_quarterly_mean.head()
Out[26]:
Rose
Time_Stamp
1980-03-31 119.666667
1980-06-30 127.666667
1980-09-30 150.666667
1980-12-31 188.000000
1981-03-31 126.333333
In [27]: df_quarterly_mean.plot();
plt.grid()
In [28]: df_daily_sum = df.resample('D').sum()
df_daily_sum
Out[28]:
Rose
Time_Stamp
1980-01-31 112.0
1980-02-01 0.0
1980-02-02 0.0
1980-02-03 0.0
1980-02-04 0.0
... ...
1995-07-27 0.0
1995-07-28 0.0
1995-07-29 0.0
Rose
Time_Stamp
1995-07-30 0.0
1995-07-31 62.0
5661 rows × 1 columns
In [29]: df_daily_sum.plot()
plt.grid();
In [30]: df_decade_sum = df.resample('10Y').sum()

df_decade_sum
Out[30]:
Rose
Time_Stamp
1980-12-31 1758.0
Rose
Time_Stamp
1990-12-31 12094.0
2000-12-31 2871.0
In [31]: df_decade_sum.plot();
plt.grid()
In [32]: # statistics
from statsmodels.distributions.empirical_distribution import ECDF
cdf = ECDF(df['Rose'])
plt.plot(cdf.x, cdf.y, label = "statmodels");
plt.grid()
plt.xlabel('Rose Wine Sales');
In [33]: # group by date and get average RetailSales, and precent change
average = df.groupby(df.index)["Rose"].mean()
pct_change = df.groupby(df.index)["Rose"].sum().pct_change()
fig, (axis1,axis2) = plt.subplots(2,1,sharex=True,figsize=(13,6))
# plot average RetailSales over time(year-month)

ax1 = average.plot(legend=True,ax=axis1,marker='o',title="Average Rose
Wine Sales",grid=True)
ax1.set_xticks(range(len(average)))
ax1.set_xticklabels(average.index.tolist())
# plot precent change for RetailSales over time(year-month)
ax2 = pct_change.plot(legend=True,ax=axis2,marker='o',colormap="summer"
,title="Rose Wine Sales Percent Change",grid=True)
In [34]: df['1994']
Out[34]:
Rose
Time_Stamp
1994-01-31 30.0
1994-02-28 35.0
1994-03-31 42.0
1994-04-30 48.0
1994-05-31 44.0
1994-06-30 45.0
1994-07-31 NaN
1994-08-31 NaN
1994-09-30 46.0
1994-10-31 51.0
Rose
Time_Stamp
1994-11-30 63.0
1994-12-31 84.0
In [35]: df.interpolate(methods='spline',order=3,inplace=True)
In [36]: df['1994']
Out[36]:
Rose
Time_Stamp
1994-01-31 30.000000
1994-02-28 35.000000
1994-03-31 42.000000
1994-04-30 48.000000
1994-05-31 44.000000
1994-06-30 45.000000
1994-07-31 45.333333
1994-08-31 45.666667
1994-09-30 46.000000
1994-10-31 51.000000
1994-11-30 63.000000
1994-12-31 84.000000
In [37]: df.isna().sum()
Out[37]: Rose 0
dtype: int64
In [38]: from statsmodels.tsa.seasonal import seasonal_decompose
In [39]: decomposition = seasonal_decompose(df['Rose'],model='additive')

decomposition.plot();
In [40]: trend = decomposition.trend

seasonality = decomposition.seasonal
residual = decomposition.resid
print('Trend','\n',trend.head(12),'\n')
print('Seasonality','\n',seasonality.head(12),'\n')
print('Residual','\n',residual.head(12),'\n')
Trend
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 147.083333
1980-08-31 148.125000
1980-09-30 148.375000
1980-10-31 148.083333
1980-11-30 147.416667
1980-12-31 145.125000
Name: trend, dtype: float64
Seasonality
Time_Stamp
1980-01-31 -27.908647
1980-02-29 -17.435632
1980-03-31 -9.285830
1980-04-30 -15.098330
1980-05-31 -10.196544
1980-06-30 -7.678687
1980-07-31 4.896908
1980-08-31 5.499686
1980-09-30 2.774686
1980-10-31 1.871908
1980-11-30 16.846908
1980-12-31 55.713575
Name: seasonal, dtype: float64
Residual
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 -33.980241
1980-08-31 -24.624686
1980-09-30 53.850314
1980-10-31 -2.955241
1980-11-30 -14.263575
1980-12-31 66.161425
Name: resid, dtype: float64
In [41]: deaseasonalized_ts = trend + residual
deaseasonalized_ts.head(12)
Out[41]: Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 113.103092
1980-08-31 123.500314
1980-09-30 202.225314
1980-10-31 145.128092
1980-11-30 133.153092
1980-12-31 211.286425
dtype: float64
In [42]: df.plot()
deaseasonalized_ts.plot()
plt.legend(["Original Time Series", "Time Series without Seasonality Co
mponent"]);
In [43]: decomposition = seasonal_decompose(df['Rose'],model='multiplicative')
decomposition.plot();
In [44]: trend = decomposition.trend
seasonality = decomposition.seasonal
residual = decomposition.resid
Trend
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 147.083333
1980-08-31 148.125000
1980-09-30 148.375000
1980-10-31 148.083333
1980-11-30 147.416667
1980-12-31 145.125000
Seasonality
Time_Stamp
1980-01-31 0.670111
1980-02-29 0.806163
1980-03-31 0.901164
1980-04-30 0.854024
1980-05-31 0.889415
1980-06-30 0.923985
1980-07-31 1.058038
1980-08-31 1.035881
1980-09-30 1.017648
1980-10-31 1.022573
1980-11-30 1.192349
1980-12-31 1.628646
Residual
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 0.758258
1980-08-31 0.840720
1980-09-30 1.357674
1980-10-31 0.970771
1980-11-30 0.853378
1980-12-31 1.129646
In [45]: deaseasonalized_ts = trend + residual

deaseasonalized_ts.head(12)
Out[45]: Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 147.841592
1980-08-31 148.965720
1980-09-30 149.732674
1980-10-31 149.054104
1980-11-30 148.270045
1980-12-31 146.254646
dtype: float64
In [46]: df.plot()
deaseasonalized_ts.plot()
plt.legend(["Original Time Series", "Time Series without Seasonality Co
mponent"]);
3. Split the data into training and test. The test data should start in
1991.
In [47]: train=df[df.index.year< 1991]

test=df[df.index.year>=1991]
In [48]: print(train.shape)
(132, 1)
In [49]: print(test.shape)
(55, 1)
In [50]: print('First few rows of Training Data','\n',train.head(),'\n')

print('Last few rows of Training Data','\n',train.tail(),'\n')
print('First few rows of Test Data','\n',test.head(),'\n')
print('Last few rows of Test Data','\n',test.tail(),'\n')
First few rows of Training Data

Rose
Time_Stamp
1980-01-31 112.0
1980-02-29 118.0
1980-03-31 129.0
1980-04-30 99.0
1980-05-31 116.0
Last few rows of Training Data

Rose
Time_Stamp
1990-08-31 70.0
1990-09-30 83.0
1990-10-31 65.0
1990-11-30 110.0
1990-12-31 132.0
First few rows of Test Data

Rose
Time_Stamp
1991-01-31 54.0
1991-02-28 55.0
1991-03-31 66.0
1991-04-30 65.0
1991-05-31 60.0
Last few rows of Test Data

Rose
Time_Stamp
1995-03-31 45.0
1995-04-30 52.0
1995-05-31 28.0
1995-06-30 40.0
1995-07-31 62.0
In [51]: train['Rose'].plot(figsize=(13,5), fontsize=14)
test['Rose'].plot(figsize=(13,5), fontsize=14)
plt.grid()
plt.legend(['Training Data','Test Data'])
plt.show()
4. Build various exponential smoothing models on the training data

and evaluate the model using RMSE on the test data. Other models
such as regression,naïve forecast models and simple average
models. should also be built on the training data and check the
performance on the test data using RMSE.
In [52]: train_time = [i+1 for i in range(len(train))]

test_time = [i+43 for i in range(len(test))]
print('Training Time instance','\n',train_time)
print('Test Time instance','\n',test_time)
Training Time instance

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2
0, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55,
56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73,
74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91
74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91,
92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107,
108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 1
22, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132]
Test Time instance
[43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 6
0, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
96, 97]
Model 1: Linear Regression
In [53]: LinearRegression_train = train.copy()

LinearRegression_test = test.copy()
In [54]: LinearRegression_train['time'] = train_time

LinearRegression_test['time'] = test_time
print('First few rows of Training Data','\n',LinearRegression_train.hea

d(),'\n')
print('Last few rows of Training Data','\n',LinearRegression_train.tail
(),'\n')
print('First few rows of Test Data','\n',LinearRegression_test.head(),'
\n')
print('Last few rows of Test Data','\n',LinearRegression_test.tail(),'
\n')

Rose time
Time_Stamp
1980-01-31 112.0 1
1980-02-29 118.0 2
1980-03-31 129.0 3
1980-04-30 99.0 4
1980-05-31 116.0 5

Rose time
Time Stamp
Time_Stamp
1990-08-31 70.0 128
1990-09-30 83.0 129
1990-10-31 65.0 130
1990-11-30 110.0 131

1990-12-31 132.0 132

Rose time
Time_Stamp
1991-01-31 54.0 43
1991-02-28 55.0 44
1991-03-31 66.0 45
1991-04-30 65.0 46
1991-05-31 60.0 47

Rose time
Time_Stamp
1995-03-31 45.0 93
1995-04-30 52.0 94
1995-05-31 28.0 95
1995-06-30 40.0 96
1995-07-31 62.0 97
In [55]: from sklearn.linear_model import LinearRegression
In [56]: lr = LinearRegression()
In [57]: lr.fit(LinearRegression_train[['time']],LinearRegression_train['Rose'].
values)
Out[57]: LinearRegression()
In [58]: test_predictions_model1 = lr.predict(LinearRegression_test[['ti

me']])
LinearRegression_test['RegOnTime'] = test_predictions_model1
plt.figure(figsize=(13,6))
plt.plot( train['Rose'], label='Train')
plt.plot(test['Rose'], label='Test')
plt.plot(LinearRegression_test['RegOnTime'], label='Regression On Time_
Test Data')
plt.legend(loc='best')
plt.grid();
In [59]: from sklearn import metrics
In [60]: ## Test Data - RMSE
rmse_model1_test = metrics.mean_squared_error(test['Rose'],test_predict
ions_model1,squared=False)
print("For RegressionOnTime forecast on the Test Data, RMSE is %3.3f"
%(rmse_model1_test))
For RegressionOnTime forecast on the Test Data, RMSE is 51.433
In [61]: resultsDf = pd.DataFrame({'Test RMSE': [rmse_model1_test]},index=['Regr
essionOnTime'])
resultsDf
Out[61]:
Test RMSE
RegressionOnTime 51.433312
Model 2 : Naive Approach
In [62]: NaiveModel_train = train.copy()

NaiveModel_test = test.copy()
In [63]: NaiveModel_test['naive'] = np.asarray(train['Rose'])[len(np.asarray(tra

in['Rose']))-1]
NaiveModel_test['naive'].head()
Out[63]: Time_Stamp
1991-01-31 132.0
1991-02-28 132.0
1991-03-31 132.0
1991-04-30 132.0
1991-05-31 132.0
Name: naive, dtype: float64
In [64]: plt.plot(NaiveModel_train['Rose'], label='Train')

plt.plot(NaiveModel_test['naive'], label='Naive Forecast on Test Data')
plt.title("Rose Wine Naive Forecast")
plt.grid();
rmse_model2_test = metrics.mean_squared_error(test['Rose'],NaiveModel_t
est['naive'],squared=False)
print("For Naive forecast on the Test Data, RMSE is %3.3f" %(rmse_mode
l2_test))
For Naive forecast on the Test Data, RMSE is 79.719
In [66]: resultsDf_2 = pd.DataFrame({'Test RMSE': [rmse_model2_test]},index=['Na

iveModel'])
resultsDf = pd.concat([resultsDf, resultsDf_2])

resultsDf
Out[66]:
Test RMSE
NaiveModel 79.718773
Model 3: Simple Average
In [67]: SimpleAverage_train = train.copy()

SimpleAverage_test = test.copy()
In [68]: SimpleAverage_test['mean_forecast'] = train['Rose'].mean()

SimpleAverage_test.head()
Out[68]:
Rose mean_forecast
Time_Stamp
1991-01-31 54.0 104.939394
1991-02-28 55.0 104.939394
1991-03-31 66.0 104.939394
1991-04-30 65.0 104.939394
1991-05-31 60.0 104.939394
In [69]: plt.plot(SimpleAverage_train['Rose'], label='Train')

plt.plot(SimpleAverage_test['Rose'], label='Test')
plt.plot(SimpleAverage_test['mean_forecast'], label='Simple Average on
Test Data')
plt.title("Rose Wine Simple Average Forecast")
plt.grid();
rmse_model3_test = metrics.mean_squared_error(test['Rose'],SimpleAverag
e_test['mean_forecast'],squared=False)
print("For Simple Average forecast on the Test Data, RMSE is %3.3f" %(
rmse_model3_test))
For Simple Average forecast on the Test Data, RMSE is 53.461
In [71]: resultsDf_3 = pd.DataFrame({'Test RMSE': [rmse_model3_test]},index=['Si

mpleAverageModel'])

resultsDf
Out[71]:
Test RMSE
Test RMSE
SimpleAverageModel 53.460570
Model 4: Moving Average
In [72]: MovingAverage = df.copy()

MovingAverage.head()
Out[72]:
Rose
Time_Stamp
1980-01-31 112.0
1980-02-29 118.0
1980-03-31 129.0
1980-04-30 99.0
1980-05-31 116.0
In [73]: MovingAverage['Trailing_2'] = MovingAverage['Rose'].rolling(2).mean()

MovingAverage['Trailing_4'] = MovingAverage['Rose'].rolling(4).mean()
Out[73]:
Rose Trailing_2 Trailing_4 Trailing_6 Trailing_9
Time_Stamp
1980-01-31 112.0 NaN NaN NaN NaN
1980-02-29 118.0 115.0 NaN NaN NaN
1980-03-31 129.0 123.5 NaN NaN NaN
Rose Trailing_2 Trailing_4 Trailing_6 Trailing_9
Time_Stamp
1980-04-30 99.0 114.0 114.5 NaN NaN
1980-05-31 116.0 107.5 115.5 NaN NaN
In [74]: # Plotting on the whole data
plt.plot(MovingAverage['Rose'], label='Train')
plt.plot(MovingAverage['Trailing_2'], label='2 Point Moving Average')
plt.plot(MovingAverage['Trailing_6'],label = '6 Point Moving Average')
plt.title("Rose Wine Moving Average")
plt.legend(loc = 'best')
plt.grid();
In [75]: #Creating train and test set

trailing_MovingAverage_train=MovingAverage[df.index.year< 1991]
trailing_MovingAverage_test=MovingAverage[df.index.year>=1991]
In [76]: ## Plotting on both the Training and Test data
plt.plot(trailing_MovingAverage_train['Rose'], label='Train')
plt.plot(trailing_MovingAverage_test['Rose'], label='Test')
plt.plot(trailing_MovingAverage_train['Trailing_2'], label='2 Point Tra

iling Moving Average on Training Set')
plt.plot(trailing_MovingAverage_train['Trailing_6'],label = '6 Point Tr
ailing Moving Average on Training Set')
plt.plot(trailing_MovingAverage_test['Trailing_2'], label='2 Point Trai

ling Moving Average on Test Set')
plt.plot(trailing_MovingAverage_test['Trailing_6'],label = '6 Point Tra
iling Moving Average on Test Set')
plt.title("Rose Wine Moving Average")
plt.grid();
In [77]: ## Test Data - RMSE --> 2 point Trailing MA
rmse_model4_test2 = metrics.mean_squared_error(test['Rose'],trailing_Mo
vingAverage_test['Trailing_2'],squared=False)
print("For 2 point Moving Average Model forecast on the Training Data,
RMSE is %3.3f" %(rmse_model4_test2))
## Test Data - RMSE --> 4 point Trailing MA
RMSE is %3.3f " %(rmse_model4_test9))
For 2 point Moving Average Model forecast on the Training Data, RMSE i
s 11.529
s 14.451
s 14.566
s 14.728
In [78]: resultsDf_4 = pd.DataFrame({'Test RMSE': [rmse_model4_test2,rmse_model4

_test4
,rmse_model4_test6,rmse_model
4_test9]}
,index=['2pointTrailingMovingAverage','4poin
tTrailingMovingAverage'
,'6pointTrailingMovingAverage','9poi
ntTrailingMovingAverage'])

resultsDf
Out[78]:
Test RMSE
2pointTrailingMovingAverage 11.529278
9 i tT ili M i A 14 727630
Test RMSE
Model 5: Simple Exponential Smoothing
In [79]: from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothin

g, Holt
import warnings
warnings.filterwarnings("ignore")
In [80]: SES_train = train.copy()

SES_test = test.copy()
In [81]: model_SES = SimpleExpSmoothing(SES_train['Rose'])
In [82]: model_SES_autofit = model_SES.fit(optimized=True)
In [83]: model_SES_autofit.params
Out[83]: {'smoothing_level': 0.09874997205700975,

'smoothing_slope': nan,
'smoothing_seasonal': nan,
'damping_slope': nan,
'initial_level': 134.3867859443936,
'initial_slope': nan,
'initial_seasons': array([], dtype=float64),
'use_boxcox': False,
'lamda': None,
'remove_bias': False}
In [84]: SES_test['predict'] = model_SES_autofit.forecast(steps=len(test))

SES_test.head()
Out[84]:
Rose predict
Time_Stamp
Rose predict
Time_Stamp
1991-01-31 54.0 87.105001
1991-02-28 55.0 87.105001
1991-03-31 66.0 87.105001
1991-04-30 65.0 87.105001
1991-05-31 60.0 87.105001
plt.plot(SES_train['Rose'], label='Train')
plt.plot(SES_test['Rose'], label='Test')
plt.plot(SES_test['predict'], label='Alpha =0.098 Simple Exponential Sm

oothing predictions on Test Set')
plt.grid()
plt.title('Alpha =0.098 Predictions for Rose Wine');
In [86]: ## Test Data
rmse_model5_test_1 = metrics.mean_squared_error(SES_test['Rose'],SES_te
st['predict'],squared=False)
print("For Alpha =0.098 Simple Exponential Smoothing Model forecast on
the Test Data, RMSE is %3.3f" %(rmse_model5_test_1))
For Alpha =0.098 Simple Exponential Smoothing Model forecast on the Tes
t Data, RMSE is 36.796
In [87]: resultsDf_5 = pd.DataFrame({'Test RMSE': [rmse_model5_test_1]},index=[

'Alpha=0.098,SimpleExponentialSmoothing'])

resultsDf
Out[87]:
Test RMSE
Test RMSE
Alpha=0.098,SimpleExponentialSmoothing 36.796244
In [88]: ## First we will define an empty dataframe to store our values from the
loop
resultsDf_6 = pd.DataFrame({'Alpha Values':[],'Train RMSE':[],'Test RMS

E': []})
resultsDf_6
Out[88]:
Alpha Values Train RMSE Test RMSE
In [89]: for i in np.arange(0.3,1,0.1):

model_SES_alpha_i = model_SES.fit(smoothing_level=i,optimized=False
,use_brute=True)
SES_train['predict',i] = model_SES_alpha_i.fittedvalues
SES_test['predict',i] = model_SES_alpha_i.forecast(steps=len(test))
rmse_model5_train_i = metrics.mean_squared_error(SES_train['Rose'],
SES_train['predict',i],squared=False)
rmse_model5_test_i = metrics.mean_squared_error(SES_test['Rose'],SE
S_test['predict',i],squared=False)
resultsDf_6 = resultsDf_6.append({'Alpha Values':i,'Train RMSE':rms

e_model5_train_i
,'Test RMSE':rmse_model5_test_i},
ignore_index=True)
In [90]: resultsDf_6.sort_values(by=['Test RMSE'],ascending=True)
Out[90]:
0 0.3 32.470164 47.504821
1 0.4 33.035130 53.767406
2 0.5 33.682839 59.641786
3 0.6 34.441171 64.971288
4 0.7 35.323261 69.698162
5 0.8 36.334596 73.773992
6 0.9 37.482782 77.139276
plt.plot(SES_train['Rose'], label='Train')
plt.plot(SES_test['Rose'], label='Test')
plt.plot(SES_test['predict'], label='Alpha =1 Simple Exponential Smooth

ing predictions on Test Set')
plt.plot(SES_test['predict', 0.3], label='Alpha =0.3 Simple Exponential

Smoothing predictions on Test Set')
plt.title("Alpha= 0.3 Simple Exponential Smoothing Rose Wine")
plt.grid();
In [92]: resultsDf_6_1 = pd.DataFrame({'Test RMSE': [resultsDf_6.sort_values(by=
['Test RMSE'],ascending=True).values[0][2]]}
,index=['Alpha=0.3,SimpleExponentialSmoothin
g'])
resultsDf = pd.concat([resultsDf, resultsDf_6_1])

resultsDf
Out[92]:
Test RMSE
Test RMSE
Model 6: Double Exponential Smoothing - Holt's Model
In [93]: DES_train = train.copy()

DES_test = test.copy()
In [94]: model_DES = Holt(DES_train['Rose'])
loop
resultsDf_7 = pd.DataFrame({'Alpha Values':[],'Beta Values':[],'Train R

MSE':[],'Test RMSE': []})
resultsDf_7
Out[95]:
Alpha Values Beta Values Train RMSE Test RMSE
In [96]: for i in np.arange(0.3,1.1,0.1):

for j in np.arange(0.1,1.1,0.1):
model_DES_alpha_i_j = model_DES.fit(smoothing_level=i,smoothing
_slope=j,optimized=False,use_brute=True)
DES_train['predict',i,j] = model_DES_alpha_i_j.fittedvalues
DES_test['predict',i,j] = model_DES_alpha_i_j.forecast(steps=le
n(test))
rmse_model6_train = metrics.mean_squared_error(DES_train['Rose'
],DES_train['predict',i,j],squared=False)
rmse_model6_test = metrics.mean_squared_error(DES_test['Rose'],
DES_test['predict',i,j],squared=False)
resultsDf_7 = resultsDf_7.append({'Alpha Values':i,'Beta Value
s':j,'Train RMSE':rmse_model6_train
,'Test RMSE':rmse_model6_test
}, ignore_index=True)
In [97]: resultsDf_7
Out[97]:
0 0.3 0.1 33.611269 98.653317
1 0.3 0.2 34.645117 177.140327
2 0.3 0.3 35.944983 265.567594
3 0.3 0.4 37.393239 358.750942
4 0.3 0.5 38.888325 451.810230
... ... ... ... ...
75 1.0 0.6 51.831610 801.680218
76 1.0 0.7 54.497039 841.892573
77 1.0 0.8 57.365879 853.965537
78 1.0 0.9 60.474309 834.710935
79 1.0 1.0 63.873454 780.079579
In [98]: resultsDf_7.sort_values(by=['Test RMSE']).head()
Out[98]:
0 0.3 0.1 33.611269 98.653317
10 0.4 0.1 34.255060 128.978579
20 0.5 0.1 34.957515 155.358815
1 0.3 0.2 34.645117 177.140327
30 0.6 0.1 35.781643 178.004967
plt.plot(DES_train['Rose'], label='Train')
plt.plot(DES_test['Rose'], label='Test')
plt.plot(DES_test['predict', 0.3, 0.1], label='Alpha=0.3,Beta=0.1,Doubl

eExponentialSmoothing predictions on Test Set')
plt.title("Alpha= 0.3, Beta= 0.1 Double Exponential Smoothing Rose Win
e")
plt.grid();

['Test RMSE']).values[0][3]]}
,index=['Alpha=0.3,Beta=0.1,DoubleExponentia
lSmoothing'])

resultsDf
Out[100]:
Test RMSE
Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing 98.653317
Model 7: Triple Exponential Smoothing - Holt Winter's Model
In [101]: TES_train = train.copy()

TES_test = test.copy()
In [102]: model_TES = ExponentialSmoothing(TES_train['Rose'],trend='additive',sea

sonal='multiplicative',freq='M')
In [103]: model_TES_autofit = model_TES.fit()
In [104]: model_TES_autofit.params
'smoothing_slope': 0.04843854924060428,
'smoothing_seasonal': 0.0,
'initial_level': 76.65565156766287,
'initial_slope': 0.0,
'initial_seasons': array([1.47550285, 1.65927163, 1.80572657, 1.588888
44, 1.77822725,
1.92604397, 2.11649487, 2.25135227, 2.11690616, 2.08112861,
2.40927315, 3.30448176]),
'lamda': None,
In [105]: ## Prediction on the test data
TES_test['auto_predict'] = model_TES_autofit.forecast(steps=len(test))
TES_test.head()
Out[105]:
Rose auto_predict
Time_Stamp
1991-01-31 54.0 56.674338
1991-02-28 55.0 63.471275
1991-03-31 66.0 68.788790
1991-04-30 65.0 60.277828
1991-05-31 60.0 67.180379
In [106]: ## Plotting on both the Training and Test using autofit
plt.plot(TES_train['Rose'], label='Train')
plt.plot(TES_test['Rose'], label='Test')
plt.plot(TES_test['auto_predict'], label='Alpha=0.106,Beta=0.048,Gamma=
0.0,TripleExponentialSmoothing predictions on Test Set')
plt.title("Alpha= 0.106, Beta=0.048, Gamma=0.0 Triple Exponential Smoot
hing Rose Wine")
plt.grid();
rmse_model6_test_1 = metrics.mean_squared_error(TES_test['Rose'],TES_te
st['auto_predict'],squared=False)
print("For Alpha=0.106,Beta=0.048,Gamma=0.0, Triple Exponential Smoothi
ng Model forecast on the Test Data, RMSE is %3.3f" %(rmse_model6_test_
1))
For Alpha=0.106,Beta=0.048,Gamma=0.0, Triple Exponential Smoothing Mode

l forecast on the Test Data, RMSE is 17.369
In [108]: resultsDf_8_1 = pd.DataFrame({'Test RMSE': [rmse_model6_test_1]}

,index=['Alpha=0.106,Beta=0.048,Gamma=0.0,Tr
ipleExponentialSmoothing'])
resultsDf
Out[108]:
Test RMSE
Alpha=0.106,Beta=0.048,Gamma=0.0,TripleExponentialSmoothing 17.369489
loop
resultsDf_8_2 = pd.DataFrame({'Alpha Values':[],'Beta Values':[],'Gamma

Values':[],'Train RMSE':[],'Test RMSE': []})
resultsDf_8_2
Out[109]:
Alpha Values Beta Values Gamma Values Train RMSE Test RMSE

for k in np.arange(0.3,1.1,0.1):
model_TES_alpha_i_j_k = model_TES.fit(smoothing_level=i,smo
othing_slope=j,smoothing_seasonal=k,optimized=False,use_brute=True)
TES_train['predict',i,j,k] = model_TES_alpha_i_j_k.fittedva
lues
TES_test['predict',i,j,k] = model_TES_alpha_i_j_k.forecast(
steps=len(test))
rmse_model8_train = metrics.mean_squared_error(TES_train['R
ose'],TES_train['predict',i,j,k],squared=False)
rmse_model8_test = metrics.mean_squared_error(TES_test['Ros
e'],TES_test['predict',i,j,k],squared=False)
resultsDf_8_2 = resultsDf_8_2.append({'Alpha Values':i,'Bet

a Values':j,'Gamma Values':k,
'Train RMSE':rmse_mod
el8_train,'Test RMSE':rmse_model8_test}
, ignore_index=True)
In [111]: resultsDf_8_2.sort_values(by=['Test RMSE']).head()
Out[111]:
8 0.3 0.4 0.3 28.111886 10.945435
1 0.3 0.3 0.4 27.399095 11.201633
69 0.4 0.3 0.8 32.601491 12.615607
16 0.3 0.5 0.3 29.087520 14.414604
131 0.5 0.3 0.6 32.144773 16.720720
In [112]: ## Plotting on both the Training and Test data using brute force alpha,
beta and gamma determination
plt.plot(TES_train['Rose'], label='Train')
plt.plot(TES_test['Rose'], label='Test')
#The value of alpha and beta is taken like that by python

plt.plot(TES_test['predict', 0.3, 0.4, 0.3], label='Alpha=0.3,Beta=0.4,
Gamma=0.3,TripleExponentialSmoothing predictions on Test Set')
plt.title("Alpha= 0.3, Beta=0.4, Gamma=0.3 Triple Exponential Smoothing
Rose Wine")
plt.grid();
In [113]: resultsDf_8_3 = pd.DataFrame({'Test RMSE': [resultsDf_8_2.sort_values(b

y=['Test RMSE']).values[0][4]]}
,index=['Alpha=0.3,Beta=0.4,Gamma=0.3,Triple
ExponentialSmoothing'])

resultsDf
Out[113]:
Test RMSE
Test RMSE
5. Check for the stationarity of the data on which the model is being
built on using appropriate statistical tests and also mention the
hypothesis for the statistical test. If the data is found to be non-
stationary, take appropriate steps to make it stationary. Check the
new data for stationarity and comment. Note: Stationarity should be
checked at alpha = 0.05.
In [114]: ## Test for stationarity of the series - Dicky Fuller test
from statsmodels.tsa.stattools import adfuller

def test_stationarity(timeseries):
#Determing rolling statistics

rolmean = timeseries.rolling(window=7).mean() #determining the roll
ing mean
rolstd = timeseries.rolling(window=7).std() #determining the roll
ing standard deviation
#Plot rolling statistics:

orig = plt.plot(timeseries, color='blue',label='Original')
mean = plt.plot(rolmean, color='red', label='Rolling Mean')
std = plt.plot(rolstd, color='black', label = 'Rolling Std')
plt.title('Rolling Mean & Standard Deviation For Rose Wine Producti
on')
plt.show(block=False)
#Perform Dickey-Fuller test:

print ('Results of Dickey-Fuller Test:')
dftest = adfuller(timeseries, autolag='AIC')
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value'
,'#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
dfoutput['Critical Value (%s)'%key] = value
print (dfoutput,'\n')
In [115]: test_stationarity(df['Rose'])
Results of Dickey-Fuller Test:

Test Statistic -1.876699
p-value 0.343101
#Lags Used 13.000000
Number of Observations Used 173.000000
Critical Value (1%) -3.468726
dtype: float64
In [116]: test_stationarity(df['Rose'].diff().dropna())

Test Statistic -8.044392e+00
p-value 1.810895e-12
#Lags Used 1.200000e+01
Number of Observations Used 1.730000e+02
Critical Value (1%) -3.468726e+00
dtype: float64
6. Build an automated version of the ARIMA/SARIMA model in which
the parameters are selected using the lowest Akaike Information
Criteria (AIC) on the training data and evaluate this model on the test
data using RMSE.
In [117]: from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
In [118]: plot_acf(df['Rose'],lags=50)
plot_acf(df['Rose'].diff().dropna(),lags=50,title='Rose Wine Difference
d Data Autocorrelation')
plt.show()
In [119]: plot_pacf(df['Rose'],lags=50)
plot_pacf(df['Rose'].diff().dropna(),lags=50,title='Rose Wine Differenc
ed Data Partial Autocorrelation')
plt.show()
In [120]: test_stationarity(train['Rose'])
p-value 0.219476
dtype: float64
In [121]: test_stationarity(train['Rose'].diff().dropna())
p-value 7.061944e-09
#Lags Used 1.200000e+01
dtype: float64
Automated ARIMA Model
In [122]: import itertools

p = q = range(0, 3)
d= range(1,2)
pdq = list(itertools.product(p, d, q))
print('Some parameter combinations for the Model...')
for i in range(1,len(pdq)):
print('Model: {}'.format(pdq[i]))
Some parameter combinations for the Model...
Model: (0, 1, 1)
Model: (0, 1, 2)
Model: (1, 1, 0)
Model: (1, 1, 1)
Model: (1, 1, 2)
Model: (2, 1, 0)
Model: (2, 1, 1)
Model: (2, 1, 2)
In [123]: # Creating an empty Dataframe with column names only

ARIMA_AIC = pd.DataFrame(columns=['param', 'AIC'])
ARIMA_AIC
Out[123]:
param AIC
In [124]: from statsmodels.tsa.arima_model import ARIMA
for param in pdq:

ARIMA_model = ARIMA(train['Rose'].values,order=param).fit()
print('ARIMA{} - AIC:{}'.format(param,ARIMA_model.aic))
ARIMA_AIC = ARIMA_AIC.append({'param':param, 'AIC': ARIMA_model.aic
ARIMA(0, 1, 0) - AIC:1335.1526583086775
ARIMA(0, 1, 1) - AIC:1280.7261830464054
ARIMA(0, 1, 2) - AIC:1276.8353768178954
ARIMA(1, 1, 0) - AIC:1319.3483105801852
ARIMA(1, 1, 1) - AIC:1277.775753909814
ARIMA(1, 1, 2) - AIC:1277.3592237914563
ARIMA(2, 1, 0) - AIC:1300.6092611743886
ARIMA(2, 1, 1) - AIC:1279.0456894093122
ARIMA(2, 1, 2) - AIC:1279.298693936773
In [125]: ## Sort the above AIC values in the ascending order to get the paramete
rs for the minimum AIC value
ARIMA_AIC.sort_values(by='AIC',ascending=True)
Out[125]:
param AIC
2 (0, 1, 2) 1276.835377
5 (1, 1, 2) 1277.359224
4 (1, 1, 1) 1277.775754
7 (2, 1, 1) 1279.045689
8 (2, 1, 2) 1279.298694
1 (0, 1, 1) 1280.726183
6 (2, 1, 0) 1300.609261
3 (1, 1, 0) 1319.348311
0 (0, 1, 0) 1335.152658
In [126]: auto_ARIMA = ARIMA(train['Rose'], order=(0,1,2),freq='M')
results_auto_ARIMA = auto_ARIMA.fit()
print(results_auto_ARIMA.summary())
ARIMA Model Results

=======================================================================
=======
Dep. Variable: D.Rose No. Observations:
131
Model: ARIMA(0, 1, 2) Log Likelihood -
634.418
Method: css-mle S.D. of innovations
30.167
Date: Sun, 20 Jun 2021 AIC 1
276.835
Time: 23:02:56 BIC 1
288.336
Sample: 02-29-1980 HQIC 1
281.509
- 12-31-1990
=======================================================================
=========
coef std err z P>|z| [0.025
0.975]
-----------------------------------------------------------------------
---------
const -0.4885 0.085 -5.742 0.000 -0.655
-0.322
ma.L1.D.Rose -0.7601 0.101 -7.499 0.000 -0.959
-0.561
ma.L2.D.Rose -0.2398 0.095 -2.518 0.012 -0.427
-0.053
Roots
=======================================================================
======
Real Imaginary Modulus Fre
quency
-----------------------------------------------------------------------
------
MA.1 1.0001 +0.0000j 1.0001
0.0000
MA.2 -4.1695 +0.0000j 4.1695
0.5000
-----------------------------------------------------------------------
------
In [127]: predicted_auto_ARIMA = results_auto_ARIMA.forecast(steps=len(test))
In [128]: from sklearn.metrics import mean_squared_error

rmse = mean_squared_error(test['Rose'],predicted_auto_ARIMA[0],squared=
False)
print(rmse)
15.618912366977964
In [129]: resultsDf_9 = pd.DataFrame({'Test RMSE': [rmse]}
,index=['ARIMA(0,1,2)'])
resultsDf_9
Out[129]:
Test RMSE
ARIMA(0,1,2) 15.618912


resultsDf
Out[130]:
Test RMSE
ARIMA(0,1,2) 15.618912
Automated SARIMA Model
In [131]: plot_acf(df['Rose'].diff().dropna(),lags=50,title='Differenced Data Aut

ocorrelation for Rose Wine Production')
plt.show()
Setting the seasonality as 6 for the first iteration of the auto SARIMA model.
We see that there can be a seasonality of 6 as well as 12. We will run our auto SARIMA models
by setting seasonality both as 6 and 12.

p = q = range(0, 3)
d= range(1,2)
D = range(0,1)
model_pdq = [(x[0], x[1], x[2], 6) for x in list(itertools.product(p, D
, q))]
print('Examples of some parameter combinations for Model...')
print('Model: {}{}'.format(pdq[i], model_pdq[i]))
Examples of some parameter combinations for Model...

Model: (0, 1, 1)(0, 0, 1, 6)
Model: (0, 1, 2)(0, 0, 2, 6)
Model: (1, 1, 0)(1, 0, 0, 6)
Model: (1, 1, 1)(1, 0, 1, 6)
Model: (1, 1, 2)(1, 0, 2, 6)
Model: (2, 1, 0)(2, 0, 0, 6)
Model: (2, 1, 1)(2, 0, 1, 6)
Model: (2, 1, 2)(2, 0, 2, 6)
In [133]: SARIMA_AIC = pd.DataFrame(columns=['param','seasonal', 'AIC'])

SARIMA_AIC
Out[133]:
param seasonal AIC
In [134]: import statsmodels.api as sm
for param in pdq:

for param_seasonal in model_pdq:
SARIMA_model = sm.tsa.statespace.SARIMAX(train['Rose'].values,
order=param,
seasonal_order=param_season
al,
enforce_stationarity=False,
enforce_invertibility=False
)
results_SARIMA = SARIMA_model.fit(maxiter=1000)
print('SARIMA{}x{} - AIC:{}'.format(param, param_seasonal, resu
lts_SARIMA.aic))
SARIMA_AIC = SARIMA_AIC.append({'param':param,'seasonal':param_
seasonal ,'AIC': results_SARIMA.aic}, ignore_index=True)
SARIMA(0, 1, 0)x(0, 0, 0, 6) - AIC:1323.9657875279158

SARIMA(0, 1, 0)x(0, 0, 1, 6) - AIC:1264.4996261104898
SARIMA(0, 1, 0)x(0, 0, 2, 6) - AIC:1144.7077471821074
SARIMA(0, 1, 0)x(1, 0, 0, 6) - AIC:1274.7897737087983
SARIMA(0, 1, 0)x(1, 0, 1, 6) - AIC:1241.7870945124614
SARIMA(0, 1, 0)x(1, 0, 2, 6) - AIC:1146.309326670239
SARIMA(0, 1, 0)x(2, 0, 0, 6) - AIC:1137.9167236212038
SARIMA(0, 1, 0)x(2, 0, 1, 6) - AIC:1137.453362940508
SARIMA(0, 1, 0)x(2, 0, 2, 6) - AIC:1117.0224424595006
SARIMA(0, 1, 1)x(0, 0, 0, 6) - AIC:1263.536909726376
SARIMA(0, 1, 1)x(0, 0, 1, 6) - AIC:1201.383254801385
SARIMA(0, 1, 1)x(0, 0, 2, 6) - AIC:1097.1908217817183
SARIMA(0, 1, 1)x(1, 0, 0, 6) - AIC:1222.4354735757836
SARIMA(0, 1, 1)x(1, 0, 1, 6) - AIC:1160.4386253543662
SARIMA(0, 1, 1)x(1, 0, 2, 6) - AIC:1085.0930138762844
SARIMA(0, 1, 1)x(2, 0, 0, 6) - AIC:1095.7490380128581
SARIMA(0, 1, 1)x(2, 0, 1, 6) - AIC:1097.6455026282051
SARIMA(0, 1, 1)x(2, 0, 2, 6) - AIC:1053.0044075906678
SARIMA(0, 1, 2)x(0, 0, 0, 6) - AIC:1251.6675430829725
SARIMA(0, 1, 2)x(0, 0, 1, 6) - AIC:1192.0017194588286
SARIMA(0, 1, 2)x(0, 0, 2, 6) - AIC:1081.8324069308578
SARIMA(0, 1, 2)x(1, 0, 0, 6) - AIC:1222.0132244585552
SARIMA(0, 1, 2)x(1, 0, 1, 6) - AIC:1153.8519348143127
SARIMA(0, 1, 2)x(1, 0, 2, 6) - AIC:1061.4359815067078
SARIMA(0, 1, 2)x(2, 0, 0, 6) - AIC:1089.0244978427286
SARIMA(0, 1, 2)x(2, 0, 1, 6) - AIC:1090.226508419863
SARIMA(0, 1, 2)x(2, 0, 2, 6) - AIC:1043.6002608608978
SARIMA(1, 1, 0)x(0, 0, 0, 6) - AIC:1308.161871082466
SARIMA(1, 1, 0)x(0, 0, 1, 6) - AIC:1249.8763226674776
SARIMA(1, 1, 0)x(0, 0, 2, 6) - AIC:1135.549810610796
SARIMA(1, 1, 0)x(1, 0, 0, 6) - AIC:1250.6246888205665
SARIMA(1, 1, 0)x(1, 0, 1, 6) - AIC:1230.6009595740952
SARIMA(1, 1, 0)x(1, 0, 2, 6) - AIC:1133.8029697276697
SARIMA(1, 1, 0)x(2, 0, 0, 6) - AIC:1123.2830148973985
SARIMA(1, 1, 0)x(2, 0, 1, 6) - AIC:1120.9425391966383
SARIMA(1, 1, 0)x(2, 0, 2, 6) - AIC:1105.909264565081
SARIMA(1, 1, 1)x(0, 0, 0, 6) - AIC:1262.1840064428402
SARIMA(1, 1, 1)x(0, 0, 0, 6) AIC:1262.1840064428402
SARIMA(1, 1, 1)x(0, 0, 1, 6) - AIC:1201.5037144792482
SARIMA(1, 1, 1)x(0, 0, 2, 6) - AIC:1093.6044318878214
SARIMA(1, 1, 1)x(1, 0, 0, 6) - AIC:1213.6233142837375
SARIMA(1, 1, 1)x(1, 0, 1, 6) - AIC:1162.4240004669416
SARIMA(1, 1, 1)x(1, 0, 2, 6) - AIC:1083.2585835667671
SARIMA(1, 1, 1)x(2, 0, 0, 6) - AIC:1083.9006911564184
SARIMA(1, 1, 1)x(2, 0, 1, 6) - AIC:1083.1711266498435
SARIMA(1, 1, 1)x(2, 0, 2, 6) - AIC:1052.7784687435267
SARIMA(1, 1, 2)x(0, 0, 0, 6) - AIC:1251.9495040731845
SARIMA(1, 1, 2)x(0, 0, 1, 6) - AIC:1193.2804055763945
SARIMA(1, 1, 2)x(0, 0, 2, 6) - AIC:1083.8066266793235
SARIMA(1, 1, 2)x(1, 0, 0, 6) - AIC:1213.2183953173044
SARIMA(1, 1, 2)x(1, 0, 1, 6) - AIC:1155.4829112286116
SARIMA(1, 1, 2)x(1, 0, 2, 6) - AIC:1061.3428439327972
SARIMA(1, 1, 2)x(2, 0, 0, 6) - AIC:1081.9393759564061
SARIMA(1, 1, 2)x(2, 0, 1, 6) - AIC:1083.4056423060708
SARIMA(1, 1, 2)x(2, 0, 2, 6) - AIC:1041.6558173267808
SARIMA(2, 1, 0)x(0, 0, 0, 6) - AIC:1280.2537561535767
SARIMA(2, 1, 0)x(0, 0, 1, 6) - AIC:1231.9630734580444
SARIMA(2, 1, 0)x(0, 0, 2, 6) - AIC:1128.9876565093011
SARIMA(2, 1, 0)x(1, 0, 0, 6) - AIC:1219.0664589804435
SARIMA(2, 1, 0)x(1, 0, 1, 6) - AIC:1186.6130717644878
SARIMA(2, 1, 0)x(1, 0, 2, 6) - AIC:1111.6702481022785
SARIMA(2, 1, 0)x(2, 0, 0, 6) - AIC:1099.0398509003082
SARIMA(2, 1, 0)x(2, 0, 1, 6) - AIC:1093.0537127923583
SARIMA(2, 1, 0)x(2, 0, 2, 6) - AIC:1078.6114741686215
SARIMA(2, 1, 1)x(0, 0, 0, 6) - AIC:1263.2315232809276
SARIMA(2, 1, 1)x(0, 0, 1, 6) - AIC:1201.4126986427814
SARIMA(2, 1, 1)x(0, 0, 2, 6) - AIC:1092.4754616851496
SARIMA(2, 1, 1)x(1, 0, 0, 6) - AIC:1199.8335858392797
SARIMA(2, 1, 1)x(1, 0, 1, 6) - AIC:1161.5686920204394
SARIMA(2, 1, 1)x(1, 0, 2, 6) - AIC:1079.8188703829414
SARIMA(2, 1, 1)x(2, 0, 0, 6) - AIC:1071.6995915165269
SARIMA(2, 1, 1)x(2, 0, 1, 6) - AIC:1068.4781624473278
SARIMA(2, 1, 1)x(2, 0, 2, 6) - AIC:1051.6734609013886
SARIMA(2, 1, 2)x(0, 0, 0, 6) - AIC:1253.9102116555646
SARIMA(2, 1, 2)x(0, 0, 1, 6) - AIC:1185.7695607122846
SARIMA(2, 1, 2)x(0, 0, 2, 6) - AIC:1082.558108053046
SARIMA(2, 1, 2)x(1, 0, 0, 6) - AIC:1200.4217492658531
SARIMA(2, 1, 2)x(1, 0, 1, 6) - AIC:1148.0237769524824
SARIMA(2, 1, 2)x(1, 0, 1, 6) AIC:1148.0237769524824
SARIMA(2, 1, 2)x(1, 0, 2, 6) - AIC:1063.110321547121
SARIMA(2, 1, 2)x(2, 0, 0, 6) - AIC:1073.6961455819637
SARIMA(2, 1, 2)x(2, 0, 1, 6) - AIC:1070.0771818695766
SARIMA(2, 1, 2)x(2, 0, 2, 6) - AIC:1045.2869001173763
In [135]: SARIMA_AIC.sort_values(by=['AIC']).head()
Out[135]:
param seasonal AIC
53 (1, 1, 2) (2, 0, 2, 6) 1041.655817
26 (0, 1, 2) (2, 0, 2, 6) 1043.600261
80 (2, 1, 2) (2, 0, 2, 6) 1045.286900
71 (2, 1, 1) (2, 0, 2, 6) 1051.673461
44 (1, 1, 1) (2, 0, 2, 6) 1052.778469
auto_SARIMA_6 = sm.tsa.statespace.SARIMAX(train['Rose'].values,
order=(1, 1, 2),
seasonal_order=(2, 0, 2, 6),
enforce_invertibility=False)
results_auto_SARIMA_6 = auto_SARIMA_6.fit(maxiter=1000)
print(results_auto_SARIMA_6.summary())
SARIMAX Results
=======================================================================
==================
Dep. Variable: y No. Observations:
132
Model: SARIMAX(1, 1, 2)x(2, 0, 2, 6) Log Likelihood
-512.828
Date: Sun, 20 Jun 2021 AIC
1041.656
Time: 23:03:40 BIC
1063.685
Sample: 0 HQIC
1050.598
- 132
Covariance Type: opg
=======================================================================
=======
0.975]
-----------------------------------------------------------------------
-------
ar.L1 -0.5939 0.152 -3.899 0.000 -0.892
-0.295
ma.L1 -0.1954 841.392 -0.000 1.000 -1649.294 1
648.903
ma.L2 -0.8046 676.954 -0.001 0.999 -1327.610 1
326.000
ar.S.L6 -0.0625 0.035 -1.764 0.078 -0.132
0.007
ar.S.L12 0.8451 0.039 21.885 0.000 0.769
0.921
ma.S.L6 0.2226 988.816 0.000 1.000 -1937.820 1
938.266
ma.S.L12 -0.7774 768.775 -0.001 0.999 -1507.549 1
505.994
sigma2 335.2016 4.16e+05 0.001 0.999 -8.14e+05
8.15e+05
=======================================================================
============
Ljung-Box (Q): 15.89 Jarque-Bera (JB):
56.68
Prob(Q): 1.00 Prob(JB):
0.00
Heteroskedasticity (H): 0.47 Skew:
0.52
Prob(H) (two-sided): 0.02 Kurtosis:
6.26
=======================================================================
============
Warnings:
[1] Covariance matrix calculated using the outer product of gradients
(complex-step).
In [137]: results_auto_SARIMA_6.plot_diagnostics()
plt.show()
In [138]: predicted_auto_SARIMA_6 = results_auto_SARIMA_6.get_forecast(steps=len(

test))
In [139]: predicted_auto_SARIMA_6.summary_frame(alpha=0.05).head()
Out[139]:
y mean mean_se mean_ci_lower mean_ci_upper
0 62.840510 18.848142 25.898830 99.782190
1 67.629884 19.300034 29.802513 105.457256
2 74.746090 19.412600 36.698092 112.794087
3 71.324853 19.475547 33.153482 109.496224
4 76.017352 19.483825 37.829756 114.204948
In [140]: rmse = mean_squared_error(test['Rose'],predicted_auto_SARIMA_6.predicte

d_mean,squared=False)
print(rmse)
26.133554443168514
In [141]: temp_resultsDf = pd.DataFrame({'Test RMSE': [rmse]}

,index=['SARIMA(1,1,2)(2,0,2,6)'])
resultsDf = pd.concat([resultsDf,temp_resultsDf])
resultsDf
Out[141]:
Test RMSE
Test RMSE
ARIMA(0,1,2) 15.618912
SARIMA(1,1,2)(2,0,2,6) 26.133554
Setting the seasonality as 12 for the iteration of the auto SARIMA model.

p = q = range(0, 3)
d= range(1,2)
D = range(0,1)
model_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p,
D, q))]

Model: (0, 1, 1)(0, 0, 1, 12)
Model: (0, 1, 2)(0, 0, 2, 12)
Model: (1, 1, 0)(1, 0, 0, 12)
Model: (1, 1, 1)(1, 0, 1, 12)
Model: (1, 1, 2)(1, 0, 2, 12)
Model: (2, 1, 0)(2, 0, 0, 12)
Model: (2, 1, 1)(2, 0, 1, 12)
Model: (2, 1, 2)(2, 0, 2, 12)

SARIMA_AIC
Out[143]:
param seasonal AIC
for param in pdq:
SARIMA_model = sm.tsa.statespace.SARIMAX(train['Rose'].values,
order=param,
al,
)
lts_SARIMA.aic))
SARIMA(0, 1, 0)x(0, 0, 0, 12) - AIC:1323.9657875279158

SARIMA(0, 1, 0)x(0, 0, 1, 12) - AIC:1145.423082720736
SARIMA(0, 1, 0)x(0, 0, 2, 12) - AIC:976.4375296380883
SARIMA(0, 1, 0)x(1, 0, 0, 12) - AIC:1139.921738995602
SARIMA(0, 1, 0)x(1, 0, 1, 12) - AIC:1116.020786968748
SARIMA(0, 1, 0)x(1, 0, 2, 12) - AIC:969.691364004579
SARIMA(0, 1, 0)x(2, 0, 0, 12) - AIC:960.8812220353041
SARIMA(0, 1, 0)x(2, 0, 1, 12) - AIC:962.8794541633738
SARIMA(0, 1, 0)x(2, 0, 2, 12) - AIC:955.5735409343464
SARIMA(0, 1, 1)x(0, 0, 0, 12) - AIC:1263.536909726376
SARIMA(0, 1, 1)x(0, 0, 1, 12) - AIC:1098.5554825916975
SARIMA(0, 1, 1)x(0, 0, 2, 12) - AIC:923.6314049709464
SARIMA(0, 1, 1)x(1, 0, 0, 12) - AIC:1095.7936324695297
SARIMA(0, 1, 1)x(1, 0, 1, 12) - AIC:1054.7434331283894
SARIMA(0, 1, 1)x(1, 0, 2, 12) - AIC:918.857348260166
SARIMA(0, 1, 1)x(2, 0, 0, 12) - AIC:914.5982866189928
SARIMA(0, 1, 1)x(2, 0, 1, 12) - AIC:915.3332430440274
SARIMA(0, 1, 1)x(2, 0, 2, 12) - AIC:901.1988255682908
SARIMA(0, 1, 2)x(0, 0, 0, 12) - AIC:1251.6675430829725
SARIMA(0, 1, 2)x(0, 0, 1, 12) - AIC:1083.4866975348887
SARIMA(0, 1, 2)x(0, 0, 2, 12) - AIC:913.493848661737
SARIMA(0, 1, 2)x(1, 0, 0, 12) - AIC:1088.8332843917171
SARIMA(0, 1, 2)x(1, 0, 1, 12) - AIC:1045.5400933749636
SARIMA(0, 1, 2)x(1, 0, 2, 12) - AIC:904.8310906189827
SARIMA(0, 1, 2)x(2, 0, 0, 12) - AIC:913.0105912076175
SARIMA(0 1 2) (2 0 1 12) AIC 914 1707544194443
SARIMA(0, 1, 2)x(2, 0, 1, 12) - AIC:914.1707544194443
SARIMA(0, 1, 2)x(2, 0, 2, 12) - AIC:887.9375086839932
SARIMA(1, 1, 0)x(0, 0, 0, 12) - AIC:1308.161871082466
SARIMA(1, 1, 0)x(0, 0, 1, 12) - AIC:1135.2955447611132
SARIMA(1, 1, 0)x(0, 0, 2, 12) - AIC:963.9405391280565
SARIMA(1, 1, 0)x(1, 0, 0, 12) - AIC:1124.8860786766481
SARIMA(1, 1, 0)x(1, 0, 1, 12) - AIC:1105.4080055027964
SARIMA(1, 1, 0)x(1, 0, 2, 12) - AIC:958.5001972955552
SARIMA(1, 1, 0)x(2, 0, 0, 12) - AIC:939.0984778859015
SARIMA(1, 1, 0)x(2, 0, 1, 12) - AIC:940.9087133728217
SARIMA(1, 1, 0)x(2, 0, 2, 12) - AIC:942.2973104080617
SARIMA(1, 1, 1)x(0, 0, 0, 12) - AIC:1262.1840064428402
SARIMA(1, 1, 1)x(0, 0, 1, 12) - AIC:1094.3172708191385
SARIMA(1, 1, 1)x(0, 0, 2, 12) - AIC:923.0862223896353
SARIMA(1, 1, 1)x(1, 0, 0, 12) - AIC:1083.3937962968434
SARIMA(1, 1, 1)x(1, 0, 1, 12) - AIC:1054.7180546776658
SARIMA(1, 1, 1)x(1, 0, 2, 12) - AIC:916.3549420363358
SARIMA(1, 1, 1)x(2, 0, 0, 12) - AIC:905.9249060678128
SARIMA(1, 1, 1)x(2, 0, 1, 12) - AIC:907.2972867451367
SARIMA(1, 1, 1)x(2, 0, 2, 12) - AIC:900.6725796178105
SARIMA(1, 1, 2)x(0, 0, 0, 12) - AIC:1251.9495040731845
SARIMA(1, 1, 2)x(0, 0, 1, 12) - AIC:1085.4861928032612
SARIMA(1, 1, 2)x(0, 0, 2, 12) - AIC:915.4938405402323
SARIMA(1, 1, 2)x(1, 0, 0, 12) - AIC:1090.7768413244564
SARIMA(1, 1, 2)x(1, 0, 1, 12) - AIC:1042.6183156840766
SARIMA(1, 1, 2)x(1, 0, 2, 12) - AIC:906.7318502489444
SARIMA(1, 1, 2)x(2, 0, 0, 12) - AIC:906.1690198933999
SARIMA(1, 1, 2)x(2, 0, 1, 12) - AIC:907.4597828144317
SARIMA(1, 1, 2)x(2, 0, 2, 12) - AIC:905.1600388612114
SARIMA(2, 1, 0)x(0, 0, 0, 12) - AIC:1280.2537561535767
SARIMA(2, 1, 0)x(0, 0, 1, 12) - AIC:1128.77736963314
SARIMA(2, 1, 0)x(0, 0, 2, 12) - AIC:958.0793208597413
SARIMA(2, 1, 0)x(1, 0, 0, 12) - AIC:1099.5086021572488
SARIMA(2, 1, 0)x(1, 0, 1, 12) - AIC:1076.7863199029105
SARIMA(2, 1, 0)x(1, 0, 2, 12) - AIC:951.1988165852144
SARIMA(2, 1, 0)x(2, 0, 0, 12) - AIC:924.6004792567292
SARIMA(2, 1, 0)x(2, 0, 1, 12) - AIC:925.975780186385
SARIMA(2, 1, 0)x(2, 0, 2, 12) - AIC:927.8380693436321
SARIMA(2, 1, 1)x(0, 0, 0, 12) - AIC:1263.2315232809276
SARIMA(2, 1, 1)x(0, 0, 1, 12) - AIC:1094.2093491882445
SARIMA(2, 1, 1)x(0, 0, 2, 12) - AIC:922.9408472105727
SARIMA(2, 1, 1)x(1, 0, 0, 12) - AIC:1071.424960152966
SARIMA(2, 1, 1)x(1, 0, 1, 12) - AIC:1052.9244468735646
SARIMA(2, 1, 1)x(1, 0, 2, 12) - AIC:916.2424915017058
SARIMA(2, 1, 1)x(2, 0, 0, 12) - AIC:896.5181606626418
SARIMA(2, 1, 1)x(2, 0, 1, 12) - AIC:897.6399567764543
SARIMA(2, 1, 1)x(2, 0, 2, 12) - AIC:899.4835866487186
SARIMA(2, 1, 2)x(0, 0, 0, 12) - AIC:1253.9102116555646
SARIMA(2, 1, 2)x(0, 0, 1, 12) - AIC:1085.964368136149
SARIMA(2, 1, 2)x(0, 0, 2, 12) - AIC:916.3259134019722
SARIMA(2, 1, 2)x(1, 0, 0, 12) - AIC:1073.291271116631
SARIMA(2, 1, 2)x(1, 0, 1, 12) - AIC:1044.190935848489
SARIMA(2, 1, 2)x(1, 0, 2, 12) - AIC:907.6661538095441
SARIMA(2, 1, 2)x(2, 0, 0, 12) - AIC:897.3464978203967
SARIMA(2, 1, 2)x(2, 0, 1, 12) - AIC:898.3782488307854
SARIMA(2, 1, 2)x(2, 0, 2, 12) - AIC:890.6688480021876
Out[145]:
param seasonal AIC
26 (0, 1, 2) (2, 0, 2, 12) 887.937509
80 (2, 1, 2) (2, 0, 2, 12) 890.668848
69 (2, 1, 1) (2, 0, 0, 12) 896.518161
78 (2, 1, 2) (2, 0, 0, 12) 897.346498
70 (2, 1, 1) (2, 0, 1, 12) 897.639957
auto_SARIMA_12 = sm.tsa.statespace.SARIMAX(train['Rose'].values,
order=(0, 1, 2),
SARIMAX Results
=======================================================================
===================
132
-436.969
887.938
Time: 23:05:18 BIC
906.448
Sample: 0 HQIC
895.437
- 132
=======================================================================
=======
0.975]
-----------------------------------------------------------------------
-------
ma.L1 -0.8428 174.448 -0.005 0.996 -342.754
341.069
ma.L2 -0.1572 27.457 -0.006 0.995 -53.971
53.657
ar.S.L12 0.3467 0.079 4.374 0.000 0.191
0.502
ar.S.L24 0.3023 0.076 3.996 0.000 0.154
0.451
ma.S.L12 0.0767 0.133 0.577 0.564 -0.184
0.337
ma.S.L24 -0.0726 0.146 -0.498 0.618 -0.358
0.213
sigma2 251.3159 4.38e+04 0.006 0.995 -8.57e+04
8.62e+04
=======================================================================
============
2.33
0.31
0.37
3.03
=======================================================================
============
Warnings:
(complex-step).
plt.show()
In [148]: predicted_auto_SARIMA_12 = results_auto_SARIMA_12.get_forecast(steps=le

n(test))
Out[149]:
0 62.867710 15.928466 31.648491 94.086929
1 70.541191 16.147539 38.892596 102.189786
2 77.357273 16.147537 45.708683 109.005863
3 76.209639 16.147537 44.561049 107.858230
4 72.748088 16.147537 41.099498 104.396678
In [150]: rmse = mean_squared_error(test['Rose'],predicted_auto_SARIMA_12.predict

ed_mean,squared=False)
print(rmse)
26.929368240084877

,index=['SARIMA(0,1,2)(2,0,2,12)'])
resultsDf
Out[151]:
Test RMSE
Test RMSE
ARIMA(0,1,2) 15.618912
SARIMA(1,1,2)(2,0,2,6) 26.133554
SARIMA(0,1,2)(2,0,2,12) 26.929368
7. Build ARIMA/SARIMA models based on the cut-off points of ACF

and PACF on the training data and evaluate this model on the test
data using RMSE.
Manual ARIMA

plot_pacf(df['Rose'].diff().dropna(),lags=50,title='Differenced Data Pa
rtial Autocorrelation for Rose Wine Production')
plt.show()
In [153]: manual_ARIMA = ARIMA(train['Rose'].astype('float64'), order=(1,1,1),fre
q='M')
results_manual_ARIMA = manual_ARIMA.fit()
print(results_manual_ARIMA.summary())
ARIMA Model Results

=======================================================================
=======
Dep. Variable: D.Rose No. Observations:
131
Model: ARIMA(1, 1, 1) Log Likelihood -
634.888
Method: css-mle S.D. of innovations
30.280
277.776
Time: 23:05:20 BIC 1
289.277
282.449
- 12-31-1990
=======================================================================
=========
0.975]
-----------------------------------------------------------------------
---------
const -0.4871 0.086 -5.656 0.000 -0.656
-0.318
ar.L1.D.Rose 0.2006 0.087 2.293 0.022 0.029
0.372
ma.L1.D.Rose -0.9999 0.035 -28.645 0.000 -1.068
-0.931
Roots
=======================================================================
======
quency
-----------------------------------------------------------------------
------
AR.1 4.9857 +0.0000j 4.9857
0.0000
MA.1 1.0001 +0.0000j 1.0001
0.0000
-----------------------------------------------------------------------
------
In [154]: predicted_manual_ARIMA = results_manual_ARIMA.forecast(steps=len(test))
In [155]: rmse = mean_squared_error(test['Rose'],predicted_manual_ARIMA[0],square

d=False)
print(rmse)
15.734258659597247

resultsDf
Out[156]:
Test RMSE
Test RMSE
ARIMA(0,1,2) 15.618912
SARIMA(1,1,2)(2,0,2,6) 26.133554
SARIMA(0,1,2)(2,0,2,12) 26.929368
ARIMA(1,1,1) 15.734259
Manual SARIMA

plot_pacf(df['Rose'].diff().dropna(),lags=50,title='Differenced Data Pa
tial Autocorrelation For Rose Wine Production')
plt.show()
In [158]: df.plot()
plt.grid();
In [159]: (df['Rose'].diff(6)).plot()
plt.grid();
In [160]: (df['Rose'].diff(6)).diff().plot()
plt.grid();
In [161]: test_stationarity((train['Rose'].diff(6).dropna()).diff(1).dropna())

p-value 1.418693e-09
#Lags Used 1.300000e+01
dtype: float64
In [162]: plot_acf((df['Rose'].diff(6).dropna()).diff(1).dropna(),lags=30)
plot_pacf((df['Rose'].diff(6).dropna()).diff(1).dropna(),lags=30);
manual_SARIMA_6 = sm.tsa.statespace.SARIMAX(train['Rose'].values,
order=(1, 1, 1),
results_manual_SARIMA_6 = manual_SARIMA_6.fit(maxiter=1000)
print(results_manual_SARIMA_6.summary())
SARIMAX Results
=======================================================================
==================
132
-545.056
1100.112
Time: 23:05:23 BIC
1113.923
Sample: 0 HQIC
1105.719
- 132
=======================================================================
=======
0.975]
-----------------------------------------------------------------------
-------
ar.L1 0.1764 0.070 2.516 0.012 0.039
0.314
ma.L1 -1.0452 0.059 -17.608 0.000 -1.161
-0.929
ar.S.L6 -0.6660 0.053 -12.552 0.000 -0.770
-0.562
ma.S.L6 -0.4840 0.093 -5.213 0.000 -0.666
-0.302
sigma2 579.5488 67.298 8.612 0.000 447.648
711.450
=======================================================================
============
143.27
0.00
0.18
8.41
=======================================================================
============
Warnings:
(complex-step).
In [164]: results_manual_SARIMA_6.plot_diagnostics()
plt.show()
In [165]: predicted_manual_SARIMA_6 = results_manual_SARIMA_6.get_forecast(steps=

len(test))
In [166]: predicted_manual_SARIMA_6.summary_frame(alpha=0.05).head()
Out[166]:
0 53.820556 25.161040 4.505823 103.135288
1 65.533301 25.760765 15.043128 116.023473
2 74.100741 25.843170 23.449059 124.752424
3 77.920921 25.883870 27.189469 128.652374
4 71.659662 25.918694 20.859955 122.459369
In [167]: rmse = mean_squared_error(test['Rose'],predicted_manual_SARIMA_6.predic

ted_mean,squared=False)
print(rmse)
20.96410963443785

,index=['SARIMA(1,1,1)(1,1,1,6)'])
resultsDf
Out[168]:
Test RMSE
ARIMA(0,1,2) 15.618912
SARIMA(1,1,2)(2,0,2,6) 26.133554
SARIMA(0,1,2)(2,0,2,12) 26.929368
ARIMA(1,1,1) 15.734259
Test RMSE
SARIMA(1,1,1)(1,1,1,6) 20.964110
8. Build a table (create a data frame) with all the models built along
with their corresponding parameters and the respective RMSE values
on the test data.

,index=['SARIMA(1,1,2)(2,0,2,6)'])
resultsDf
Out[169]:
Test RMSE
ARIMA(0,1,2) 15.618912
Test RMSE
SARIMA(1,1,2)(2,0,2,6) 26.133554
SARIMA(0,1,2)(2,0,2,12) 26.929368
ARIMA(1,1,1) 15.734259
SARIMA(1,1,1)(1,1,1,6) 20.964110
SARIMA(1,1,2)(2,0,2,6) 20.964110
In [170]: print('Sorted by RMSE values on the Test Data for Rose Wine sale:','\n'
,)
resultsDf.sort_values(by=['Test RMSE'])
Sorted by RMSE values on the Test Data for Rose Wine sale:
Out[170]:
Test RMSE
ARIMA(0,1,2) 15.618912
ARIMA(1,1,1) 15.734259
SARIMA(1,1,2)(2,0,2,6) 20.964110
SARIMA(1,1,1)(1,1,1,6) 20.964110
SARIMA(1,1,2)(2,0,2,6) 26.133554
SARIMA(0,1,2)(2,0,2,12) 26.929368
Test RMSE
9. Based on the model-building exercise, build the most optimum

model(s) on the complete data and predict 12 months into the future
with appropriate confidence intervals/bands.
Building the most optimum model on the Full Data.
plt.plot(train['Rose'], label='Train')

plt.grid();
plt.title('Plot of Exponential Smoothing Acutal Values for Rose Wine Pr
oductions');
In [172]: fullmodel = ExponentialSmoothing(df,
trend='additive',
seasonal='multiplicative').fit(smooth
ing_level=0.3,
smooth
ing_slope=0.4,
smooth
ing_seasonal=0.3)
In [173]: RMSE_fullmodel = metrics.mean_squared_error(df['Rose'],fullmodel.fitted

values,squared=False)
print('RMSE:',RMSE_fullmodel)
RMSE: 24.266535961104537
In [174]: # Getting the predictions for the same number of times stamps that are
present in the test data
prediction = fullmodel.forecast(steps=len(test))
In [175]: df.plot()
prediction.plot();
In [176]: #In the below code, we have calculated the upper and lower confidence b
ands at 95% confidence level
#The percentile function under numpy lets us calculate these and adding
and subtracting from the predictions
#gives us the necessary confidence bands for the predictions
pred_df = pd.DataFrame({'lower_CI':prediction - 1.96*np.std(fullmodel.r
esid,ddof=1),
'prediction':prediction,
'upper_ci': prediction + 1.96*np.std(fullmode
l.resid,ddof=1)})
pred_df.head()
Out[176]:
lower_CI prediction upper_ci
1995-08-31 -7.559277 40.074648 87.708574
1995-09-30 -8.388196 39.245730 86.879656
1995 10 31 6 321556 41 312369 88 946295

1995-10-31 -6.321556 41.312369 88.946295
1995-11-30 0.433543 48.067469 95.701394
1995-12-31 21.101528 68.735454 116.369379
In [177]: # plot the forecast along with the confidence band
axis = df.plot(label='Actual', figsize=(13,6))

pred_df['prediction'].plot(ax=axis, label='Forecast', alpha=0.5)
axis.fill_between(pred_df.index, pred_df['lower_CI'], pred_df['upper_c
i'], color='k', alpha=.15)
axis.set_xlabel('Year-Months')
axis.set_ylabel('Rose')
plt.grid()
plt.show()
END
1. Read the data as an appropriate Time Series data and plot the
data.
In [178]: df1 = pd.read_csv("Sparkling.csv")
In [179]: df1.head()
Out[179]:
YearMonth Sparkling
0 1980-01 1686
1 1980-02 1591
2 1980-03 2304
3 1980-04 1712
4 1980-05 1471
In [180]: df1.tail()
Out[180]:
YearMonth Sparkling
182 1995-03 1897
183 1995-04 1862
184 1995-05 1670
185 1995-06 1688
186 1995-07 2031
In [181]: df1.plot();
plt.grid()
In [182]: date1 = pd.date_range(start='01/01/1980', end='08/01/1995', freq='M')
date1
Out[182]: DatetimeIndex(['1980-01-31', '1980-02-29', '1980-03-31', '1980-04-30',

'1980-05-31', '1980-06-30', '1980-07-31', '1980-08-31',
'1980-09-30', '1980-10-31',
...
'1994-10-31', '1994-11-30', '1994-12-31', '1995-01-31',
'1995-02-28', '1995-03-31', '1995-04-30', '1995-05-31',
'1995-06-30', '1995-07-31'],
dtype='datetime64[ns]', length=187, freq='M')
In [183]: df1['Time_Stamp'] = pd.DataFrame(date1,columns=['Month'])

df1.head()
Out[183]:
YearMonth Sparkling Time_Stamp
0 1980-01 1686 1980-01-31
1 1980-02 1591 1980-02-29
2 1980-03 2304 1980-03-31
YearMonth Sparkling Time_Stamp
3 1980-04 1712 1980-04-30
4 1980-05 1471 1980-05-31
In [184]: df1['Time_Stamp']=pd.to_datetime(df1['Time_Stamp'])
df1= df1.set_index('Time_Stamp')
df1.drop(['YearMonth'],axis=1, inplace = True)
df1.head()
Out[184]:
Sparkling
Time_Stamp
1980-01-31 1686
1980-02-29 1591
1980-03-31 2304
1980-04-30 1712
1980-05-31 1471
In [185]: df1.plot();
plt.grid()
2. Perform appropriate Exploratory Data Analysis to understand the
data and also perform decomposition.
In [186]: df1.describe(include='all').T
Out[186]:
count mean std min 25% 50% 75% max
Sparkling 187.0 2402.417112 1295.11154 1070.0 1605.0 1874.0 2549.0 7242.0
In [187]: df1.isnull().sum()
Out[187]: Sparkling 0
dtype: int64
In [188]: df1.shape
Out[188]: (187, 1)
In [189]: _, ax = plt.subplots(figsize=(30,10)) #yearly plot
sns.boxplot(x = df1.index.year,y = df1.values[:,0],ax=ax)
plt.xlabel('Yearly Sparkling Wine Sale');
plt.ylabel('Years');
plt.grid();
In [190]: _, ax = plt.subplots(figsize=(22,8))
sns.boxplot(x = df1.index.month_name(),y = df1.values[:,0],ax=ax)
plt.xlabel('Monthly Sparkling Wine Sale');
plt.ylabel('Monthls');
plt.grid();
In [191]: from statsmodels.graphics.tsaplots import month_plot
month_plot(df1,ylabel='Sparkling Wine Production')

plt.grid();
In [192]: monthly_sales_across_years = pd.pivot_table(df1, values = 'Sparkling',

columns = df1.index.month_name(), index = df1.index.year)
monthly_sales_across_years
Out[192]:
Time_Stamp April August December February January July June March May Nove
Time_Stamp
1980 1712.0 2453.0 5179.0 1591.0 1686.0 1966.0 1377.0 2304.0 1471.0 4
1981 1976.0 2472.0 4551.0 1523.0 1530.0 1781.0 1480.0 1633.0 1170.0 3
1982 1790.0 1897.0 4524.0 1329.0 1510.0 1954.0 1449.0 1518.0 1537.0 3
1983 1375.0 2298.0 4923.0 1638.0 1609.0 1600.0 1245.0 2030.0 1320.0 3
1984 1789.0 3159.0 5274.0 1435.0 1609.0 1597.0 1404.0 2061.0 1567.0 4
1985 1589.0 2512.0 5434.0 1682.0 1771.0 1645.0 1379.0 1846.0 1896.0 4
Time_Stamp April August December February January July June March May Nove
Time_Stamp
1986 1605.0 3318.0 5891.0 1523.0 1606.0 2584.0 1403.0 1577.0 1765.0 3
1987 1935.0 1930.0 7242.0 1442.0 1389.0 1847.0 1250.0 1548.0 1518.0 4
1988 2336.0 1645.0 6757.0 1779.0 1853.0 2230.0 1661.0 2108.0 1728.0 4
1989 1650.0 1968.0 6694.0 1394.0 1757.0 1971.0 1406.0 1982.0 1654.0 4
1990 1628.0 1605.0 6047.0 1321.0 1720.0 1899.0 1457.0 1859.0 1615.0 4
1991 1279.0 1857.0 6153.0 2049.0 1902.0 2214.0 1540.0 1874.0 1432.0 3
1992 1997.0 1773.0 6119.0 1667.0 1577.0 2076.0 1625.0 1993.0 1783.0 4
1993 2121.0 2795.0 6410.0 1564.0 1494.0 2048.0 1515.0 1898.0 1831.0 4
1994 1725.0 1495.0 5999.0 1968.0 1197.0 2031.0 1693.0 1720.0 1674.0 3
1995 1862.0 NaN NaN 1402.0 1070.0 2031.0 1688.0 1897.0 1670.0
In [193]: monthly_sales_across_years.plot()
plt.grid()
plt.legend(loc='best');
In [194]: df_yearly_sum1 = df1.resample('A').sum()
df_yearly_sum1.head()
Out[194]:
Sparkling
Time_Stamp
1980-12-31 28406
1981-12-31 26227
1982-12-31 25321
1983-12-31 26180
1984-12-31 28431
In [195]: df_yearly_sum1.plot();
plt.grid()
plt.xlabel('Sum of the Observations of each year');
In [196]: df_yearly_mean1 = df1.resample('Y').mean()
df_yearly_mean1.head()
Out[196]:
Sparkling
Time_Stamp
1980-12-31 2367.166667
1981-12-31 2185.583333
1982-12-31 2110.083333
1983-12-31 2181.666667
1984-12-31 2369.250000
In [197]: df_yearly_mean1.plot();
plt.grid()
plt.xlabel('Mean of the Observations of each year');
In [198]: df_quarterly_sum1 = df1.resample('Q').sum()
df_quarterly_sum1.head()
Out[198]:
Sparkling
Time_Stamp
1980-03-31 5581
1980-06-30 4560
1980-09-30 6403
1980-12-31 11862
1981-03-31 4686
In [199]: df_quarterly_sum1.plot();
plt.grid()
In [200]: df_quarterly_mean1 = df1.resample('Q').mean()
df_quarterly_mean1.head()
Out[200]:
Sparkling
Time_Stamp
1980-03-31 1860.333333
1980-06-30 1520.000000
1980-09-30 2134.333333
1980-12-31 3954.000000
1981-03-31 1562.000000
In [201]: df_quarterly_mean1.plot();
plt.grid()
In [202]: df_daily_sum1 = df1.resample('D').sum()
df_daily_sum1
Out[202]:
Sparkling
Time_Stamp
1980-01-31 1686
1980-02-01 0
1980-02-02 0
1980-02-03 0
1980-02-04 0
... ...
1995-07-27 0
1995-07-28 0
1995-07-29 0
Sparkling
Time_Stamp
1995-07-30 0
1995-07-31 2031
In [203]: df_daily_sum1.plot()
plt.grid();
In [204]: df_decade_sum1 = df1.resample('10Y').sum()

df_decade_sum1
Out[204]:
Sparkling
Time_Stamp
1980-12-31 28406
Sparkling
Time_Stamp
1990-12-31 288893
2000-12-31 131953
In [205]: df_decade_sum1.plot();
plt.grid()
In [206]: # statistics
from statsmodels.distributions.empirical_distribution import ECDF
cdf = ECDF(df1['Sparkling'])
plt.plot(cdf.x, cdf.y, label = "statmodels");
plt.grid()
plt.xlabel('Sparkling Wine Sales');
In [207]: # group by date and get average RetailSales, and precent change
average = df1.groupby(df1.index)["Sparkling"].mean()
pct_change = df1.groupby(df1.index)["Sparkling"].sum().pct_change()
fig, (axis1,axis2) = plt.subplots(2,1,sharex=True)
# plot average RetailSales over time(year-month)

ax1 = average.plot(legend=True,ax=axis1,marker='o',title="Average Spark
ling Wine Sales",grid=True)
ax1.set_xticks(range(len(average)))
ax1.set_xticklabels(average.index.tolist())
# plot precent change for RetailSales over time(year-month)
ax2 = pct_change.plot(legend=True,ax=axis2,marker='o',colormap="summer"
,title="Sparkling Wine Sales Percent Change",grid=True)
In [208]: from statsmodels.tsa.seasonal import seasonal_decompose
In [209]: decomposition1 = seasonal_decompose(df1['Sparkling'],model='additive')

decomposition1.plot();
In [210]: trend = decomposition1.trend
seasonality = decomposition1.seasonal
residual = decomposition1.resid
Trend
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 2360.666667
1980-08-31 2351.333333
1980-09-30 2320.541667
1980-10-31 2303.583333
1980-11-30 2302.041667
1980-12-31 2293.791667
Seasonality
Time_Stamp
1980-01-31 -854.260599
1980-02-29 -830.350678
1980-03-31 -592.356630
1980-04-30 -658.490559
1980-05-31 -824.416154
1980-06-30 -967.434011
1980-07-31 -465.502265
1980-08-31 -214.332821
1980-09-30 -254.677265
1980-10-31 599.769957
1980-11-30 1675.067179
1980-12-31 3386.983846
Residual
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 70.835599
1980-08-31 315.999487
1980-09-30 -81.864401
1980-10-31 -307.353290
1980-11-30 109.891154
1980-12-31 -501.775513
In [211]: decomposition2 = seasonal_decompose(df1['Sparkling'],model='multiplicat

ive')
decomposition2.plot();
In [212]: trend = decomposition2.trend
seasonality = decomposition2.seasonal
residual = decomposition2.resid
Trend
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 2360.666667
1980-08-31 2351.333333
1980-09-30 2320.541667
1980-10-31 2303.583333
1980-11-30 2302.041667
1980-12-31 2293.791667
Seasonality
Time_Stamp
1980-01-31 0.649843
1980-02-29 0.659214
1980-03-31 0.757440
1980-04-30 0.730351
1980-05-31 0.660609
1980-06-30 0.603468
1980-07-31 0.809164
1980-08-31 0.918822
1980-09-30 0.894367
1980-10-31 1.241789
1980-11-30 1.690158
1980-12-31 2.384776
Residual
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 1.029230
1980-08-31 1.135407
1980-09-30 0.955954
1980-10-31 0.907513
1980-11-30 1.050423
1980-12-31 0.946770
3. Split the data into training and test. The test data should start in
1991.
In [213]: train=df1[df1.index.year< 1991]

test=df1[df1.index.year>=1991]
In [214]: print(train.shape)
(132, 1)
In [215]: print(test.shape)
(55, 1)
In [216]: print('First few rows of Training Data','\n',train.head(),'\n')

print('Last few rows of Training Data','\n',train.tail(),'\n')
print('First few rows of Test Data','\n',test.head(),'\n')
print('Last few rows of Test Data','\n',test.tail(),'\n')
Sparkling
Time_Stamp
1980-01-31 1686
1980-02-29 1591
1980-03-31 2304
1980-04-30 1712
1980-05-31 1471

Sparkling
Time_Stamp
1990-08-31 1605
1990-09-30 2424
1990-10-31 3116
1990-11-30 4286
1990-12-31 6047

Sparkling
Time_Stamp
1991-01-31 1902
1991-02-28 2049
1991-03-31 1874
1991-04-30 1279
1991-05-31 1432

Sparkling
Time_Stamp
1995-03-31 1897
1995-04-30 1862
1995-05-31 1670
1995-06-30 1688
1995-07-31 2031
In [217]: train['Sparkling'].plot(figsize=(13,6), fontsize=14)

test['Sparkling'].plot(figsize=(13,6), fontsize=14)
plt.grid()
plt.legend(['Training Data','Test Data'])
plt.show()
4. Build various exponential smoothing models on the training data

and evaluate the model using RMSE on the test data. Other models
such as regression,naïve forecast models and simple average
models. should also be built on the training data and check the
performance on the test data using RMSE.
Model 1: Linear Regression
In [218]: train_time = [i+1 for i in range(len(train))]

test_time = [i+43 for i in range(len(test))]
print('Training Time instance','\n',train_time)
print('Test Time instance','\n',test_time)
Training Time instance

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2
0, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55,
56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73,
74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91,
92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107,
108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 1
22, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132]
Test Time instance
[43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 6
0, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
96, 97]
In [219]: LinearRegression_train = train.copy()

LinearRegression_test = test.copy()
In [220]: LinearRegression_train['time'] = train_time

LinearRegression_test['time'] = test_time
print('First few rows of Training Data','\n',LinearRegression_train.hea

d(),'\n')
print('Last few rows of Training Data','\n',LinearRegression_train.tail
(),'\n')
print('First few rows of Test Data','\n',LinearRegression_test.head(),'
\n')
print('Last few rows of Test Data','\n',LinearRegression_test.tail(),'
\n')

Sparkling time
Time_Stamp
1980-01-31 1686 1
1980-02-29 1591 2
1980-03-31 2304 3
1980-04-30 1712 4
1980-05-31 1471 5

Sparkling time
Time_Stamp
1990 08 31 1605 128
1990-08-31 1605 128
1990-09-30 2424 129
1990-10-31 3116 130
1990-11-30 4286 131
1990-12-31 6047 132

Sparkling time
Time_Stamp
1991-01-31 1902 43
1991-02-28 2049 44
1991-03-31 1874 45
1991-04-30 1279 46
1991-05-31 1432 47

Sparkling time
Time_Stamp
1995-03-31 1897 93
1995-04-30 1862 94
1995-05-31 1670 95
1995-06-30 1688 96
1995-07-31 2031 97
In [221]: from sklearn.linear_model import LinearRegression
In [222]: lr.fit(LinearRegression_train[['time']],LinearRegression_train['Sparkli
ng'].values)
Out[222]: LinearRegression()
In [223]: test_predictions_model1 = lr.predict(LinearRegression_test[['ti

me']])
LinearRegression_test['RegOnTime'] = test_predictions_model1
plt.plot( train['Sparkling'], label='Train')
plt.plot(test['Sparkling'], label='Test')
plt.title("Linear Regression plot for Sparkling Wine")

Test Data')
plt.grid();
In [224]: from sklearn import metrics
rmse_model1_test = metrics.mean_squared_error(test['Sparkling'],test_pr
edictions_model1,squared=False)
print("For RegressionOnTime forecast on the Test Data, RMSE is %3.3f"
%(rmse_model1_test))
For RegressionOnTime forecast on the Test Data, RMSE is 1275.867
In [226]: resultsDf = pd.DataFrame({'Test RMSE': [rmse_model1_test]},index=['Regr
essionOnTime'])
resultsDf
Out[226]:
Test RMSE
Model 2: Naive Approach
In [227]: NaiveModel_train = train.copy()

NaiveModel_test = test.copy()
In [228]: NaiveModel_test['naive'] = np.asarray(train['Sparkling'])[len(np.asarra

y(train['Sparkling']))-1]
NaiveModel_test['naive'].head()
Out[228]: Time_Stamp
1991-01-31 6047
1991-02-28 6047
1991-03-31 6047
1991-04-30 6047
1991-05-31 6047
Name: naive, dtype: int64
In [229]: plt.figure(figsize=(13,6))
plt.plot(NaiveModel_train['Sparkling'], label='Train')
plt.title("Naive Forecast for Sparkling Wine")
plt.grid();
rmse_model2_test = metrics.mean_squared_error(test['Sparkling'],NaiveMo
del_test['naive'],squared=False)
print("For Naive forecast on the Test Data, RMSE is %3.3f" %(rmse_mode
l2_test))
For Naive forecast on the Test Data, RMSE is 3864.279
In [231]: resultsDf_2 = pd.DataFrame({'Test RMSE': [rmse_model2_test]},index=['Na

iveModel'])

resultsDf
Out[231]:
Test RMSE
Model 3 : Simple Average
In [232]: SimpleAverage_train = train.copy()

SimpleAverage_test = test.copy()
In [233]: SimpleAverage_test['mean_forecast'] = train['Sparkling'].mean()

SimpleAverage_test.head()
Out[233]:
Sparkling mean_forecast
Time_Stamp
1991-01-31 1902 2403.780303
1991-02-28 2049 2403.780303
1991-03-31 1874 2403.780303
1991-04-30 1279 2403.780303
1991-05-31 1432 2403.780303
In [234]: plt.figure(figsize=(13,6))
plt.plot(SimpleAverage_train['Sparkling'], label='Train')
plt.plot(SimpleAverage_test['Sparkling'], label='Test')
Test Data')
plt.title("Simple Average Forecast for Sparkling Wine")
plt.grid();
rmse_model3_test = metrics.mean_squared_error(test['Sparkling'],SimpleA
verage_test['mean_forecast'],squared=False)
print("For Simple Average forecast on the Test Data, RMSE is %3.3f" %(
rmse_model3_test))
For Simple Average forecast on the Test Data, RMSE is 1275.082
In [236]: resultsDf_3 = pd.DataFrame({'Test RMSE': [rmse_model3_test]},index=['Si

mpleAverageModel'])

resultsDf
Out[236]:
Test RMSE
p g
Model 4: Moving Average
In [237]: MovingAverage = df1.copy()

Out[237]:
Sparkling
Time_Stamp
1980-01-31 1686
1980-02-29 1591
1980-03-31 2304
1980-04-30 1712
1980-05-31 1471
In [238]: MovingAverage['Trailing_2'] = MovingAverage['Sparkling'].rolling(2).mea

n()
MovingAverage['Trailing_4'] = MovingAverage['Sparkling'].rolling(4).mea
n()
n()
n()
Out[238]:
Sparkling Trailing_2 Trailing_4 Trailing_6 Trailing_9
Time_Stamp
1980-01-31 1686 NaN NaN NaN NaN
1980-02-29 1591 1638.5 NaN NaN NaN
1980-03-31 2304 1947.5 NaN NaN NaN
1980-04-30 1712
Sparkling 2008.0
Trailing_2 1823.25
Trailing_4 NaN
Trailing_6 NaN
Trailing_9
1980-05-31
Time_Stamp 1471 1591.5 1769.50 NaN NaN
In [239]: ## Plotting on the whole data
plt.plot(MovingAverage['Sparkling'], label='Train')
plt.title("Moving Average for Sparkling Wine")
plt.grid();
In [240]: trailing_MovingAverage_train=MovingAverage[df.index.year< 1991]

trailing_MovingAverage_test=MovingAverage[df.index.year>=1991]
plt.plot(trailing_MovingAverage_train['Sparkling'], label='Train')
plt.plot(trailing_MovingAverage_test['Sparkling'], label='Test')


plt.title("Moving Average for Sparkling Wine")
plt.grid();
In [242]: ## Test Data - RMSE --> 2 point Trailing MA
rmse_model4_test_2 = metrics.mean_squared_error(test['Sparkling'],trail
ing_MovingAverage_test['Trailing_2'],squared=False)
RMSE is %3.3f" %(rmse_model4_test_2))
RMSE is %3.3f " %(rmse_model4_test_9))
s 813.401
s 1156.590
s 1283.927
s 1346.278
In [243]: resultsDf_4 = pd.DataFrame({'Test RMSE': [rmse_model4_test_2,rmse_model

4_test_4
,rmse_model4_test_6,rmse_mode
l4_test_9]}
,index=['2pointTrailingMovingAverage','4poin
tTrailingMovingAverage'
,'6pointTrailingMovingAverage','9poi
ntTrailingMovingAverage'])

resultsDf
Out[243]:
Test RMSE
9 i tT ili M i A 1346 278315

Test RMSE
In [244]: ## Plotting on both Training and Test data
plt.plot(train['Sparkling'], label='Train')

Test Data')

Test Data')

ling Moving Average on Training Set')
plt.title("Model Comparison Plots for Sparkling Wine")
plt.grid();
Model 5: Simple Exponential Smoothing
In [245]: from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothin

g, Holt
In [246]: SES_train = train.copy()

SES_test = test.copy()
In [247]: model_SES = SimpleExpSmoothing(SES_train['Sparkling'])
In [248]: model_SES_autofit = model_SES.fit(optimized=True)
In [249]: model_SES_autofit.params

'smoothing_slope': nan,
'smoothing_seasonal': nan,
'initial_level': 2403.770045419478,
'initial_slope': nan,
'initial_seasons': array([], dtype=float64),
'lamda': None,
In [250]: SES_test['predict'] = model_SES_autofit.forecast(steps=len(test))

SES_test.head()
Out[250]:
Sparkling predict
Time_Stamp
1991-01-31 1902 2403.770045
1991-02-28 2049 2403.770045
Sparkling predict
Time_Stamp
1991-03-31 1874 2403.770045
1991-04-30 1279 2403.770045
1991-05-31 1432 2403.770045
plt.plot(SES_train['Sparkling'], label='Train')
plt.plot(SES_test['Sparkling'], label='Test')
plt.plot(SES_test['predict'], label='Alpha =0.0 Simple Exponential Smoo

thing predictions on Test Set')
plt.grid()
plt.title('Alpha =0.0 Predictions');
rmse_model5_test_1 = metrics.mean_squared_error(SES_test['Sparkling'],S
ES_test['predict'],squared=False)
print("For Alpha =0.0 Simple Exponential Smoothing Model forecast on th
e Test Data, RMSE is %3.3f" %(rmse_model5_test_1))
For Alpha =0.0 Simple Exponential Smoothing Model forecast on the Test
Data, RMSE is 1275.082
In [253]: resultsDf_5 = pd.DataFrame({'Test RMSE': [rmse_model5_test_1]},index=[

'Alpha=0.0,SimpleExponentialSmoothing'])

resultsDf
Out[253]:
Test RMSE
loop
resultsDf_6 = pd.DataFrame({'Alpha Values':[],'Train RMSE':[],'Test RMS

E': []})
resultsDf_6
Out[254]:
In [255]: for i in np.arange(0.3,1,0.1):

model_SES_alpha_i = model_SES.fit(smoothing_level=i,optimized=False
,use_brute=True)
SES_train['predict',i] = model_SES_alpha_i.fittedvalues
SES_test['predict',i] = model_SES_alpha_i.forecast(steps=len(test))
rmse_model5_train_i = metrics.mean_squared_error(SES_train['Sparkli
ng'],SES_train['predict',i],squared=False)
rmse_model5_test_i = metrics.mean_squared_error(SES_test['Sparklin
g'],SES_test['predict',i],squared=False)
resultsDf_6 = resultsDf_6.append({'Alpha Values':i,'Train RMSE':rms

e_model5_train_i
,'Test RMSE':rmse_model5_test_i},
ignore_index=True)
Out[256]:
0 0.3 1359.511747 1935.507132
1 0.4 1352.588879 2311.919615
2 0.5 1344.004369 2666.351413
3 0.6 1338.805381 2979.204388
4 0.7 1338.844308 3249.944092
plt.plot(SES_train['Sparkling'], label='Train')
plt.plot(SES_test['Sparkling'], label='Test')
plt.plot(SES_test['predict'], label='Alpha =1 Simple Exponential Smooth

ing predictions on Test Set')
plt.plot(SES_test['predict', 0.3], label='Alpha =0.3 Simple Exponential
Smoothing predictions on Test Set')
plt.grid();

['Test RMSE'],ascending=True).values[0][2]]}
,index=['Alpha=0.3,SimpleExponentialSmoothin
g'])

resultsDf
Out[258]:
Test RMSE
Test RMSE
Model 6: Double Exponential Smoothing - Holt's Model
In [259]: DES_train = train.copy()

DES_test = test.copy()
In [260]: model_DES = Holt(DES_train['Sparkling'])
loop
resultsDf_7 = pd.DataFrame({'Alpha Values':[],'Beta Values':[],'Train R

MSE':[],'Test RMSE': []})
resultsDf_7
Out[261]:

model_DES_alpha_i_j = model_DES.fit(smoothing_level=i,smoothing
_slope=j,optimized=False,use_brute=True)
DES_train['predict',i,j] = model_DES_alpha_i_j.fittedvalues
DES_test['predict',i,j] = model_DES_alpha_i_j.forecast(steps=le
n(test))
rmse_model6_train = metrics.mean_squared_error(DES_train['Spark
ling'],DES_train['predict',i,j],squared=False)
rmse_model6_test = metrics.mean_squared_error(DES_test['Sparkli
ng'],DES_test['predict',i,j],squared=False)
resultsDf_7 = resultsDf_7.append({'Alpha Values':i,'Beta Value

s':j,'Train RMSE':rmse_model6_train
,'Test RMSE':rmse_model6_test
In [263]: resultsDf_7
Out[263]:
0 0.3 0.3 1592.292788 18259.110704
1 0.3 0.4 1682.573828 26069.841401
2 0.3 0.5 1771.710791 34401.512440
3 0.3 0.6 1848.576510 42162.748095
4 0.3 0.7 1899.949006 47832.397419
... ... ... ... ...
59 1.0 0.6 1753.402326 49327.087977
60 1.0 0.7 1825.187155 52655.765663
61 1.0 0.8 1902.013709 55442.273880
62 1.0 0.9 1985.368445 57823.177011
63 1.0 1.0 2077.672157 59877.076519
Out[264]: Alpha Values Beta Values Train RMSE Test RMSE
0 0.3 0.3 1592.292788 18259.110704
8 0.4 0.3 1569.338606 23878.496940
1 0.3 0.4 1682.573828 26069.841401
16 0.5 0.3 1530.575845 27095.532414
24 0.6 0.3 1506.449870 29070.722592
plt.plot(DES_train['Sparkling'], label='Train')
plt.plot(DES_test['Sparkling'], label='Test')
plt.plot(DES_test['predict', 0.3, 0.3], label='Alpha=0.3,Beta=0.3,Doubl

eExponentialSmoothing predictions on Test Set')
plt.grid();
['Test RMSE']).values[0][3]]}
,index=['Alpha=0.3,Beta=0.3,DoubleExponentia
lSmoothing'])

resultsDf
Out[266]:
Test RMSE
Model 7: Triple Exponential Smoothing - Holt Winter's Model
In [267]: TES_train = train.copy()

TES_test = test.copy()
In [268]: model_TES = ExponentialSmoothing(TES_train['Sparkling'],trend='additiv

e',seasonal='multiplicative',freq='M')
In [269]: model_TES_autofit = model_TES.fit()
In [270]: model_TES_autofit.params

'smoothing_slope': 1.3071916052588054e-21,
'smoothing_seasonal': 0.3713310279757222,
'initial_level': 1639.9993541529163,
'initial_slope': 4.847700780681414,
'initial_seasons': array([1.00842157, 0.96898339, 1.24179259, 1.132059
82, 0.93982621,
0.93811361, 1.22459277, 1.54431221, 1.27337186, 1.63199233,
2.48297346, 3.11866318]),
'lamda': None,
In [271]: ## Prediction on the test data
TES_test['auto_predict'] = model_TES_autofit.forecast(steps=len(test))
TES_test.head()
Out[271]:
Sparkling auto_predict
Time_Stamp
1991-01-31 1902 1602.184489
1991-02-28 2049 1373.877337
1991-03-31 1874 1807.430667
1991-04-30 1279 1704.563503
1991-05-31 1432 1602.368982
plt.plot(TES_train['Sparkling'], label='Train')
plt.plot(TES_test['Sparkling'], label='Test')
plt.grid();
rmse_model6_test_1 = metrics.mean_squared_error(TES_test['Sparkling'],T
ES_test['auto_predict'],squared=False)
print("For Alpha=0.154,Beta=1.307,Gamma=0.371, Triple Exponential Smoot
hing Model forecast on the Test Data, RMSE is %3.3f" %(rmse_model6_tes
t_1))
For Alpha=0.154,Beta=1.307,Gamma=0.371, Triple Exponential Smoothing Mo

del forecast on the Test Data, RMSE is 383.156
In [274]: resultsDf_8_1 = pd.DataFrame({'Test RMSE': [rmse_model6_test_1]}

,index=['Alpha=0.154,Beta=1.307,Gamma=0.371,
TripleExponentialSmoothing'])
resultsDf
Out[274]:
Test RMSE
loop
resultsDf_8_2 = pd.DataFrame({'Alpha Values':[],'Beta Values':[],'Gamma

Values':[],'Train RMSE':[],'Test RMSE': []})
resultsDf_8_2
Out[275]:

for k in np.arange(0.3,1.1,0.1):
model_TES_alpha_i_j_k = model_TES.fit(smoothing_level=i,smo
othing_slope=j,smoothing_seasonal=k,optimized=False,use_brute=True)
TES_train['predict',i,j,k] = model_TES_alpha_i_j_k.fittedva
lues
TES_test['predict',i,j,k] = model_TES_alpha_i_j_k.forecast(
steps=len(test))
rmse_model8_train = metrics.mean_squared_error(TES_train['S
parkling'],TES_train['predict',i,j,k],squared=False)
rmse_model8_test = metrics.mean_squared_error(TES_test['Spa
rkling'],TES_test['predict',i,j,k],squared=False)
resultsDf_8_2 = resultsDf_8_2.append({'Alpha Values':i,'Bet

a Values':j,'Gamma Values':k,
'Train RMSE':rmse_mod
el8_train,'Test RMSE':rmse_model8_test}
, ignore_index=True)
In [277]: resultsDf_8_2.sort_values(by=['Test RMSE']).head()
Out[277]:
0 0.3 0.3 0.3 404.513320 392.786198
8 0.3 0.4 0.3 424.828055 410.854547
65 0.4 0.3 0.4 435.553595 421.409170
296 0.7 0.8 0.3 700.317756 518.188752
130 0.5 0.3 0.5 498.239915 542.175497
In [278]: ## Plotting on both the Training and Test data using brute force alpha,
beta and gamma determination
#The value of alpha and beta is taken like that by python

plt.grid();
In [279]: resultsDf_8_3 = pd.DataFrame({'Test RMSE': [resultsDf_8_2.sort_values(b

y=['Test RMSE']).values[0][4]]}
,index=['Alpha=0.3,Beta=0.3,Gamma=0.3,Triple
ExponentialSmoothing'])

resultsDf
Out[279]:
Test RMSE
Test RMSE
5. Check for the stationarity of the data on which the model is being
built on using appropriate statistical tests and also mention the
hypothesis for the statistical test. If the data is found to be non-
stationary, take appropriate steps to make it stationary. Check the
new data for stationarity and comment. Note: Stationarity should be
checked at alpha = 0.05.
In [280]: ## Test for stationarity of the series - Dicky Fuller test
from statsmodels.tsa.stattools import adfuller

def test_stationarity(timeseries):
#Determing rolling statistics

rolmean = timeseries.rolling(window=7).mean() #determining the roll
ing mean
rolstd = timeseries.rolling(window=7).std() #determining the roll
ing standard deviation
#Plot rolling statistics:

orig = plt.plot(timeseries, color='blue',label='Original')
mean = plt.plot(rolmean, color='red', label='Rolling Mean')
std = plt.plot(rolstd, color='black', label = 'Rolling Std')
plt.title('Rolling Mean & Standard Deviation')
plt.show(block=False)
#Perform Dickey-Fuller test:

print ('Results of Dickey-Fuller Test:')
dftest = adfuller(timeseries, autolag='AIC')
dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value'
,'#Lags Used','Number of Observations Used'])
for key,value in dftest[4].items():
dfoutput['Critical Value (%s)'%key] = value
print (dfoutput,'\n')
In [281]: test_stationarity(df1['Sparkling'])

p-value 0.601061
dtype: float64
In [282]: test_stationarity(df1['Sparkling'].diff().dropna())

p-value 0.000000
dtype: float64
6. Build an automated version of the ARIMA/SARIMA

model in which the parameters are selected using the
lowest Akaike Information Criteria (AIC) on the training
data and evaluate this model on the test data using
RMSE.
In [283]: from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
In [284]: plot_acf(df1['Sparkling'],lags=50)
plot_acf(df1['Sparkling'].diff().dropna(),lags=50,title='Sparkling Diff
erenced Data Autocorrelation')
plt.show()
In [285]: plot_pacf(df1['Sparkling'],lags=50)
plot_pacf(df1['Sparkling'].diff().dropna(),lags=50,title='Sparkling Dif
ferenced Data Partial Autocorrelation')
plt.show()
In [286]: test_stationarity(train['Sparkling'])

p-value 0.669744
dtype: float64
In [287]: test_stationarity(train['Sparkling'].diff().dropna())

p-value 2.280104e-12
#Lags Used 1.100000e+01
dtype: float64
In [288]: ## The following loop helps us in getting a combination of different pa

rameters of p and q in the range of 0 and 2
## We have kept the value of d as 1 as we need to take a difference of
the series to make it stationary.
import itertools
p = q = range(0, 3)
d= range(1,2)
print('Some parameter combinations for the Model...')
print('Model: {}'.format(pdq[i]))
Some parameter combinations for the Model...

Model: (0, 1, 1)
Model: (0, 1, 2)
Model: (1, 1, 0)
Model: (1, 1, 1)
Model: (1, 1, 2)
Model: (2, 1, 0)
Model: (2, 1, 1)
Model: (2, 1, 2)
In [289]: # Creating an empty Dataframe with column names only

ARIMA_AIC = pd.DataFrame(columns=['param', 'AIC'])
ARIMA_AIC
Out[289]:
param AIC
In [290]: from statsmodels.tsa.arima_model import ARIMA
for param in pdq:

ARIMA_model = ARIMA(train['Sparkling'].values,order=param).fit()
print('ARIMA{} - AIC:{}'.format(param,ARIMA_model.aic))
ARIMA_AIC = ARIMA_AIC.append({'param':param, 'AIC': ARIMA_model.aic
ARIMA(0, 1, 0) - AIC:2269.582796371201
ARIMA(0, 1, 1) - AIC:2264.906436778214
ARIMA(0, 1, 2) - AIC:2232.7830976841487
ARIMA(1, 1, 0) - AIC:2268.528060607777
ARIMA(1, 1, 1) - AIC:2235.0139453507686
ARIMA(1, 1, 2) - AIC:2233.5976471191107
ARIMA(2, 1, 0) - AIC:2262.035600076631
ARIMA(2, 1, 1) - AIC:2232.3604898787803
ARIMA(2, 1, 2) - AIC:2210.6166920985434
In [291]: ## Sort the above AIC values in the ascending order to get the paramete
rs for the minimum AIC value
ARIMA_AIC.sort_values(by='AIC',ascending=True)
Out[291]:
param AIC
8 (2, 1, 2) 2210.616692
7 (2, 1, 1) 2232.360490
2 (0, 1, 2) 2232.783098
5 (1, 1, 2) 2233.597647
4 (1, 1, 1) 2235.013945
6 (2, 1, 0) 2262.035600
1 (0, 1, 1) 2264.906437
3 (1, 1, 0) 2268.528061
0 (0, 1, 0) 2269.582796
In [292]: auto_ARIMA = ARIMA(train['Sparkling'], order=(2,1,2),freq='M')
results_auto_ARIMA = auto_ARIMA.fit()
print(results_auto_ARIMA.summary())
ARIMA Model Results

=======================================================================
=======
Dep. Variable: D.Sparkling No. Observations:
131
Model: ARIMA(2, 1, 2) Log Likelihood -1
099.308
Method: css-mle S.D. of innovations 1
012.075
210.617
Time: 23:06:12 BIC 2
227.868
217.627
- 12-31-1990
=======================================================================
==============
25 0.975]
-----------------------------------------------------------------------
--------------
const 5.5858 0.516 10.823 0.000 4.5
74 6.597
ar.L1.D.Sparkling 1.2697 0.074 17.044 0.000 1.1
24 1.416
ar.L2.D.Sparkling -0.5600 0.074 -7.615 0.000 -0.7
04 -0.416
ma.L1.D.Sparkling -1.9991 0.042 -47.161 0.000 -2.0
82 -1.916
ma.L2.D.Sparkling 0.9991 0.042 23.584 0.000 0.9
16 1.082
Roots
=======================================================================
======
quency
-----------------------------------------------------------------------
------
AR.1 1.1337 -0.7074j 1.3363 -
0.0888
AR.2 1.1337 +0.7074j 1.3363
0.0888
MA.1 1.0005 -0.0005j 1.0005 -
0.0001
MA.2 1.0005 +0.0005j 1.0005
0.0001
-----------------------------------------------------------------------
------
In [293]: predicted_auto_ARIMA = results_auto_ARIMA.forecast(steps=len(test))
In [294]: from sklearn.metrics import mean_squared_error

rmse = mean_squared_error(test['Sparkling'],predicted_auto_ARIMA[0],squ
ared=False)
print(rmse)
1374.9769475915175


resultsDf
Out[295]:
Test RMSE
Test RMSE
ARIMA(0,1,2) 1374.976948
SARIMA
In [296]: plot_acf(df1['Sparkling'].diff().dropna(),lags=50,title='Differenced Da
ta Autocorrelation')
plt.show()

p = q = range(0, 3)
d= range(1,2)
D = range(0,1)
model_pdq = [(x[0], x[1], x[2], 6) for x in list(itertools.product(p, D
, q))]

Model: (0, 1, 1)(0, 0, 1, 6)
Model: (0, 1, 2)(0, 0, 2, 6)
Model: (1, 1, 0)(1, 0, 0, 6)
Model: (1, 1, 1)(1, 0, 1, 6)
Model: (1, 1, 2)(1, 0, 2, 6)
Model: (2, 1, 0)(2, 0, 0, 6)
Model: (2, 1, 1)(2, 0, 1, 6)
Model: (2, 1, 2)(2, 0, 2, 6)

SARIMA_AIC
Out[298]:
param seasonal AIC
for param in pdq:

SARIMA_model = sm.tsa.statespace.SARIMAX(train['Sparkling'].val
ues,
order=param,
al,
)
lts_SARIMA.aic))
SARIMA(0, 1, 0)x(0, 0, 0, 6) - AIC:2251.3597196862966

SARIMA(0, 1, 0)x(0, 0, 1, 6) - AIC:2152.3780760934646
SARIMA(0, 1, 0)x(0, 0, 2, 6) - AIC:1955.635553691462
SARIMA(0, 1, 0)x(1, 0, 0, 6) - AIC:2164.4097581959904
SARIMA(0, 1, 0)x(1, 0, 1, 6) - AIC:2079.559988015178
SARIMA(0, 1, 0)x(1, 0, 2, 6) - AIC:1926.936011141801
SARIMA(0, 1, 0)x(2, 0, 0, 6) - AIC:1839.401298687227
SARIMA(0, 1, 0)x(2, 0, 1, 6) - AIC:1841.1993618161407
SARIMA(0, 1, 0)x(2, 0, 2, 6) - AIC:1810.9177811077682
SARIMA(0, 1, 1)x(0, 0, 0, 6) - AIC:2230.1629078505994
SARIMA(0, 1, 1)x(0, 0, 1, 6) - AIC:2130.565285908976
SARIMA(0, 1, 1)x(0, 0, 2, 6) - AIC:1918.1741580885578
SARIMA(0, 1, 1)x(1, 0, 0, 6) - AIC:2139.573319180911
SARIMA(0, 1, 1)x(1, 0, 1, 6) - AIC:2006.5174298478878
SARIMA(0, 1, 1)x(1, 0, 2, 6) - AIC:1855.7093275268933
SARIMA(0, 1, 1)x(2, 0, 0, 6) - AIC:1798.7885104786772
SARIMA(0, 1, 1)x(2, 0, 1, 6) - AIC:1800.772626397826
SARIMA(0, 1, 1)x(2, 0, 2, 6) - AIC:1741.7036713090783
SARIMA(0, 1, 2)x(0, 0, 0, 6) - AIC:2187.441010220092
SARIMA(0, 1, 2)x(0, 0, 1, 6) - AIC:2084.4254745419194
SARIMA(0, 1, 2)x(0, 0, 2, 6) - AIC:1886.117140891461
SARIMA(0, 1, 2)x(1, 0, 0, 6) - AIC:2129.739572913447
SARIMA(0, 1, 2)x(1, 0, 1, 6) - AIC:1988.4111271264148
SARIMA(0, 1, 2)x(1, 0, 2, 6) - AIC:1839.6938085591055
SARIMA(0, 1, 2)x(2, 0, 0, 6) - AIC:1791.6537078775532
SARIMA(0, 1, 2)x(2, 0, 1, 6) - AIC:1793.6190985238984
SARIMA(0, 1, 2)x(2, 0, 2, 6) - AIC:1727.8879855309042
SARIMA(1, 1, 0)x(0, 0, 0, 6) - AIC:2250.3181267386713
SARIMA(1, 1, 0)x(0, 0, 1, 6) - AIC:2151.0782683263546
SARIMA(1, 1, 0)x(0, 0, 2, 6) - AIC:1953.365224545567
SARIMA(1, 1, 0)x(1, 0, 0, 6) - AIC:2146.1836648745016
SARIMA(1, 1, 0)x(1, 0, 1, 6) - AIC:2098.1824774568586
SARIMA(1, 1, 0)x(1, 0, 2, 6) - AIC:1917.5889469234053
SARIMA(1, 1, 0)x(2, 0, 0, 6) - AIC:1813.2423977806711
SARIMA(1, 1, 0)x(2, 0, 1, 6) - AIC:1814.830160311021
SARIMA(1, 1, 0)x(2, 0, 2, 6) - AIC:1791.3715261766997
SARIMA(1, 1, 1)x(0, 0, 0, 6) - AIC:2204.934049198176
SARIMA(1, 1, 1)x(0, 0, 0, 6) AIC:2204.934049198176
SARIMA(1, 1, 1)x(0, 0, 1, 6) - AIC:2103.2471232163575
SARIMA(1, 1, 1)x(0, 0, 2, 6) - AIC:1906.395141260643
SARIMA(1, 1, 1)x(1, 0, 0, 6) - AIC:2109.667121270094
SARIMA(1, 1, 1)x(1, 0, 1, 6) - AIC:2005.6125660382927
SARIMA(1, 1, 1)x(1, 0, 2, 6) - AIC:1856.072780455429
SARIMA(1, 1, 1)x(2, 0, 0, 6) - AIC:1776.94176703765
SARIMA(1, 1, 1)x(2, 0, 1, 6) - AIC:1778.822255728223
SARIMA(1, 1, 1)x(2, 0, 2, 6) - AIC:1777.2232751647634
SARIMA(1, 1, 2)x(0, 0, 0, 6) - AIC:2188.463345041383
SARIMA(1, 1, 2)x(0, 0, 1, 6) - AIC:2089.1320927065735
SARIMA(1, 1, 2)x(0, 0, 2, 6) - AIC:1908.3550140777963
SARIMA(1, 1, 2)x(1, 0, 0, 6) - AIC:2108.564551028778
SARIMA(1, 1, 2)x(1, 0, 1, 6) - AIC:1989.9755159145898
SARIMA(1, 1, 2)x(1, 0, 2, 6) - AIC:1839.3494618363907
SARIMA(1, 1, 2)x(2, 0, 0, 6) - AIC:1773.4229389216596
SARIMA(1, 1, 2)x(2, 0, 1, 6) - AIC:1775.2482456129092
SARIMA(1, 1, 2)x(2, 0, 2, 6) - AIC:1727.6786974488084
SARIMA(2, 1, 0)x(0, 0, 0, 6) - AIC:2227.302761872421
SARIMA(2, 1, 0)x(0, 0, 1, 6) - AIC:2145.357699167862
SARIMA(2, 1, 0)x(0, 0, 2, 6) - AIC:1945.156142600047
SARIMA(2, 1, 0)x(1, 0, 0, 6) - AIC:2124.9071786719132
SARIMA(2, 1, 0)x(1, 0, 1, 6) - AIC:2054.170076930759
SARIMA(2, 1, 0)x(1, 0, 2, 6) - AIC:1915.6336928719586
SARIMA(2, 1, 0)x(2, 0, 0, 6) - AIC:1782.7357821169276
SARIMA(2, 1, 0)x(2, 0, 1, 6) - AIC:1782.3598160572335
SARIMA(2, 1, 0)x(2, 0, 2, 6) - AIC:1760.3426707271706
SARIMA(2, 1, 1)x(0, 0, 0, 6) - AIC:2199.8586131872835
SARIMA(2, 1, 1)x(0, 0, 1, 6) - AIC:2103.0859053283684
SARIMA(2, 1, 1)x(0, 0, 2, 6) - AIC:1903.0344579425016
SARIMA(2, 1, 1)x(1, 0, 0, 6) - AIC:2088.133636377472
SARIMA(2, 1, 1)x(1, 0, 1, 6) - AIC:1997.3692879679495
SARIMA(2, 1, 1)x(1, 0, 2, 6) - AIC:1852.7863807256508
SARIMA(2, 1, 1)x(2, 0, 0, 6) - AIC:1794.8112224461129
SARIMA(2, 1, 1)x(2, 0, 1, 6) - AIC:1763.2674869873797
SARIMA(2, 1, 1)x(2, 0, 2, 6) - AIC:1744.0407502338978
SARIMA(2, 1, 2)x(0, 0, 0, 6) - AIC:2176.8698050945236
SARIMA(2, 1, 2)x(0, 0, 1, 6) - AIC:2078.885409868417
SARIMA(2, 1, 2)x(0, 0, 2, 6) - AIC:1882.5052122587012
SARIMA(2, 1, 2)x(1, 0, 0, 6) - AIC:2074.0236540188266
SARIMA(2, 1, 2)x(1, 0, 1, 6) - AIC:1955.6059004541967
SARIMA(2, 1, 2)x(1, 0, 1, 6) AIC:1955.6059004541967
SARIMA(2, 1, 2)x(1, 0, 2, 6) - AIC:1836.9065979271265
SARIMA(2, 1, 2)x(2, 0, 0, 6) - AIC:1758.9610729847134
SARIMA(2, 1, 2)x(2, 0, 1, 6) - AIC:1765.202888909307
SARIMA(2, 1, 2)x(2, 0, 2, 6) - AIC:1782.339988397053
Out[300]:
param seasonal AIC
53 (1, 1, 2) (2, 0, 2, 6) 1727.678697
26 (0, 1, 2) (2, 0, 2, 6) 1727.887986
17 (0, 1, 1) (2, 0, 2, 6) 1741.703671
71 (2, 1, 1) (2, 0, 2, 6) 1744.040750
78 (2, 1, 2) (2, 0, 0, 6) 1758.961073
auto_SARIMA_6 = sm.tsa.statespace.SARIMAX(train['Sparkling'].values,
order=(1, 1, 2),
SARIMAX Results
=======================================================================
==================
132
-855.839
1727.679
Time: 23:07:48 BIC
1749.707
Sample: 0 HQIC
1736.621
- 132
=======================================================================
=======
0.975]
-----------------------------------------------------------------------
-------
ar.L1 -0.6449 0.286 -2.256 0.024 -1.205
-0.085
ma.L1 -0.1069 0.250 -0.428 0.669 -0.596
0.383
ma.L2 -0.7005 0.202 -3.469 0.001 -1.096
-0.305
ar.S.L6 -0.0044 0.027 -0.164 0.870 -0.057
0.049
ar.S.L12 1.0361 0.018 56.098 0.000 1.000
1.072
ma.S.L6 0.0674 0.152 0.443 0.658 -0.231
0.366
ma.S.L12 -0.6123 0.093 -6.591 0.000 -0.794
-0.430
sigma2 1.448e+05 1.71e+04 8.466 0.000 1.11e+05
1.78e+05
=======================================================================
============
25.24
0.00
0.47
5.09
=======================================================================
============
Warnings:
(complex-step).
plt.show()
In [303]: predicted_auto_SARIMA_6 = results_auto_SARIMA_6.get_forecast(steps=len(

test))
Out[304]:
0 1330.327397 380.564870 584.433959 2076.220835
1 1177.302125 392.117087 408.766757 1945.837493
2 1625.833239 392.311737 856.916363 2394.750115
3 1546.291624 397.712592 766.789267 2325.793981
4 1308.860522 398.932958 526.966292 2090.754753
In [305]: rmse = mean_squared_error(test['Sparkling'],predicted_auto_SARIMA_6.pre

dicted_mean,squared=False)
print(rmse)
626.880153949547

,index=['SARIMA(1,1,2)(2,0,2,6)'])
resultsDf = pd.concat([resultsDf, temp_resultsDf])
resultsDf
Out[306]:
Test RMSE
ARIMA(0,1,2) 1374.976948
SARIMA(1,1,2)(2,0,2,6) 626.880154
Setting the seasonality as 12 for the second iteration of the auto SARIMA model.

p = q = range(0, 3)
d= range(1,2)
D = range(0,1)
model_pdq = [(x[0], x[1], x[2], 12) for x in list(itertools.product(p,
D, q))]

Model: (0, 1, 1)(0, 0, 1, 12)
Model: (0, 1, 2)(0, 0, 2, 12)
Model: (1, 1, 0)(1, 0, 0, 12)
Model: (1, 1, 1)(1, 0, 1, 12)
Model: (1, 1, 2)(1, 0, 2, 12)
Model: (2, 1, 0)(2, 0, 0, 12)
Model: (2, 1, 1)(2, 0, 1, 12)
Model: (2, 1, 2)(2, 0, 2, 12)

SARIMA_AIC
Out[308]:
param seasonal AIC
for param in pdq:

SARIMA_model = sm.tsa.statespace.SARIMAX(train['Sparkling'].val
ues,
order=param,
al,
)
lts_SARIMA.aic))
SARIMA(0, 1, 0)x(0, 0, 0, 12) - AIC:2251.3597196862966

SARIMA(0, 1, 0)x(0, 0, 1, 12) - AIC:1956.261461684334
SARIMA(0, 1, 0)x(0, 0, 2, 12) - AIC:1723.153364023482
SARIMA(0, 1, 0)x(1, 0, 0, 12) - AIC:1837.436602245668
SARIMA(0, 1, 0)x(1, 0, 1, 12) - AIC:1806.990530137327
SARIMA(0, 1, 0)x(1, 0, 2, 12) - AIC:1633.2108735940185
SARIMA(0, 1, 0)x(2, 0, 0, 12) - AIC:1648.3776153470858
SARIMA(0, 1, 0)x(2, 0, 1, 12) - AIC:1647.2054160640976
SARIMA(0, 1, 0)x(2, 0, 2, 12) - AIC:1630.9898054509874
SARIMA(0, 1, 1)x(0, 0, 0, 12) - AIC:2230.1629078505994
SARIMA(0, 1, 1)x(0, 0, 1, 12) - AIC:1923.7688649611964
SARIMA(0, 1, 1)x(0, 0, 2, 12) - AIC:1692.708957238344
SARIMA(0, 1, 1)x(1, 0, 0, 12) - AIC:1797.17958817917
SARIMA(0, 1, 1)x(1, 0, 1, 12) - AIC:1738.0973022178505
SARIMA(0, 1, 1)x(1, 0, 2, 12) - AIC:1570.1319652608092
SARIMA(0, 1, 1)x(2, 0, 0, 12) - AIC:1605.6751954493398
SARIMA(0, 1, 1)x(2, 0, 1, 12) - AIC:1599.2245087130186
SARIMA(0, 1, 1)x(2, 0, 2, 12) - AIC:1570.3683739995645
SARIMA(0, 1, 2)x(0, 0, 0, 12) - AIC:2187.441010220092
SARIMA(0, 1, 2)x(0, 0, 1, 12) - AIC:1887.2897618827865
SARIMA(0, 1, 2)x(0, 0, 2, 12) - AIC:1657.6640310415592
SARIMA(0, 1, 2)x(1, 0, 0, 12) - AIC:1790.0326332203936
SARIMA(0, 1, 2)x(1, 0, 1, 12) - AIC:1724.166738337171
SARIMA(0, 1, 2)x(1, 0, 2, 12) - AIC:1557.1603191173776
SARIMA(0, 1, 2)x(2, 0, 0, 12) - AIC:1603.9654774483727
SARIMA(0, 1, 2)x(2, 0, 1, 12) - AIC:1600.5438801521245
SARIMA(0, 1, 2)x(2, 0, 2, 12) - AIC:1557.1215626799462
SARIMA(1, 1, 0)x(0, 0, 0, 12) - AIC:2250.3181267386713
SARIMA(1, 1, 0)x(0, 0, 1, 12) - AIC:1954.3938339929728
SARIMA(1 1 0) (0 0 2 12) AIC 1721 2688476313904
SARIMA(1, 1, 0)x(0, 0, 2, 12) - AIC:1721.2688476313904
SARIMA(1, 1, 0)x(1, 0, 0, 12) - AIC:1811.2440279327332
SARIMA(1, 1, 0)x(1, 0, 1, 12) - AIC:1788.5343592980894
SARIMA(1, 1, 0)x(1, 0, 2, 12) - AIC:1616.4894412533986
SARIMA(1, 1, 0)x(2, 0, 0, 12) - AIC:1621.635508043298

SARIMA(1, 1, 0)x(2, 0, 1, 12) - AIC:1617.1356135615595
SARIMA(1, 1, 0)x(2, 0, 2, 12) - AIC:1616.541210075358
SARIMA(1, 1, 1)x(0, 0, 0, 12) - AIC:2204.934049198176
SARIMA(1, 1, 1)x(0, 0, 1, 12) - AIC:1907.3558977806388
SARIMA(1, 1, 1)x(0, 0, 2, 12) - AIC:1678.098135279112
SARIMA(1, 1, 1)x(1, 0, 0, 12) - AIC:1775.142446655006
SARIMA(1, 1, 1)x(1, 0, 1, 12) - AIC:1739.5403526721561
SARIMA(1, 1, 1)x(1, 0, 2, 12) - AIC:1571.3248864655484
SARIMA(1, 1, 1)x(2, 0, 0, 12) - AIC:1590.6161607076579
SARIMA(1, 1, 1)x(2, 0, 1, 12) - AIC:1586.3142236394447
SARIMA(1, 1, 1)x(2, 0, 2, 12) - AIC:1571.8069968891837
SARIMA(1, 1, 2)x(0, 0, 0, 12) - AIC:2188.463345041383
SARIMA(1, 1, 2)x(0, 0, 1, 12) - AIC:1889.7708318442067
SARIMA(1, 1, 2)x(0, 0, 2, 12) - AIC:1659.6291422585368
SARIMA(1, 1, 2)x(1, 0, 0, 12) - AIC:1771.825980269072
SARIMA(1, 1, 2)x(1, 0, 1, 12) - AIC:1723.9871816320992
SARIMA(1, 1, 2)x(1, 0, 2, 12) - AIC:1555.5842544351642
SARIMA(1, 1, 2)x(2, 0, 0, 12) - AIC:1588.421693151794
SARIMA(1, 1, 2)x(2, 0, 1, 12) - AIC:1585.5152985346901
SARIMA(1, 1, 2)x(2, 0, 2, 12) - AIC:1556.0802592562013
SARIMA(2, 1, 0)x(0, 0, 0, 12) - AIC:2227.302761872421
SARIMA(2, 1, 0)x(0, 0, 1, 12) - AIC:1946.4383435279221
SARIMA(2, 1, 0)x(0, 0, 2, 12) - AIC:1711.4123039734561
SARIMA(2, 1, 0)x(1, 0, 0, 12) - AIC:1780.764606593656
SARIMA(2, 1, 0)x(1, 0, 1, 12) - AIC:1756.9357346857084
SARIMA(2, 1, 0)x(1, 0, 2, 12) - AIC:1600.970220445633
SARIMA(2, 1, 0)x(2, 0, 0, 12) - AIC:1592.2403465602624
SARIMA(2, 1, 0)x(2, 0, 1, 12) - AIC:1587.6344986172633
SARIMA(2, 1, 0)x(2, 0, 2, 12) - AIC:1585.9191734973074
SARIMA(2, 1, 1)x(0, 0, 0, 12) - AIC:2199.8586131872835
SARIMA(2, 1, 1)x(0, 0, 1, 12) - AIC:1905.0209496747561
SARIMA(2, 1, 1)x(0, 0, 2, 12) - AIC:1675.4234079628686
SARIMA(2, 1, 1)x(1, 0, 0, 12) - AIC:1759.7569277616458
SARIMA(2, 1, 1)x(1, 0, 1, 12) - AIC:1740.0911247357424
SARIMA(2 1 1)x(1 0 2 12) - AIC:1571 9888282581596
SARIMA(2, 1, 1)x(1, 0, 2, 12) - AIC:1571.9888282581596
SARIMA(2, 1, 1)x(2, 0, 0, 12) - AIC:1577.1235065031422
SARIMA(2, 1, 1)x(2, 0, 1, 12) - AIC:1573.1596420398046
SARIMA(2, 1, 1)x(2, 0, 2, 12) - AIC:1571.9780296421281
SARIMA(2, 1, 2)x(0, 0, 0, 12) - AIC:2176.8698050945236
SARIMA(2, 1, 2)x(0, 0, 1, 12) - AIC:1891.5463663420826
SARIMA(2, 1, 2)x(0, 0, 2, 12) - AIC:1664.2974982219203
SARIMA(2, 1, 2)x(1, 0, 0, 12) - AIC:1757.214093119035
SARIMA(2, 1, 2)x(1, 0, 1, 12) - AIC:1725.6086059339216
SARIMA(2, 1, 2)x(1, 0, 2, 12) - AIC:1557.439140240327
SARIMA(2, 1, 2)x(2, 0, 0, 12) - AIC:1576.0455911149202
SARIMA(2, 1, 2)x(2, 0, 1, 12) - AIC:1573.5476000272845
SARIMA(2, 1, 2)x(2, 0, 2, 12) - AIC:1557.6893214191882
Out[310]:
param seasonal AIC
50 (1, 1, 2) (1, 0, 2, 12) 1555.584254
53 (1, 1, 2) (2, 0, 2, 12) 1556.080259
26 (0, 1, 2) (2, 0, 2, 12) 1557.121563
23 (0, 1, 2) (1, 0, 2, 12) 1557.160319
77 (2, 1, 2) (1, 0, 2, 12) 1557.439140
auto_SARIMA_12 = sm.tsa.statespace.SARIMAX(train['Sparkling'].values,
order=(1, 1, 2),
SARIMAX Results
=======================================================================
===================
132
-770.792
1555.584
Time: 23:10:10 BIC
1574.095
Sample: 0 HQIC
1563.083
- 132
=======================================================================
=======
0.975]
-----------------------------------------------------------------------
-------
ar.L1 -0.6282 0.255 -2.462 0.014 -1.128
-0.128
ma.L1 -0.1041 0.225 -0.463 0.643 -0.545
0.337
ma.L2 -0.7276 0.154 -4.731 0.000 -1.029
-0.426
ar.S.L12 1.0439 0.014 72.799 0.000 1.016
1.072
ma.S.L12 -0.5550 0.098 -5.661 0.000 -0.747
-0.363
ma.S.L24 -0.1354 0.120 -1.133 0.257 -0.370
0.099
sigma2 1.506e+05 2.04e+04 7.398 0.000 1.11e+05
1.91e+05
=======================================================================
============
11.73
0.00
0.36
4.48
=======================================================================
============
Warnings:
(complex-step).
plt.show()
In [313]: predicted_auto_SARIMA_12 = results_auto_SARIMA_12.get_forecast(steps=le

n(test))
Out[314]:
0 1327.548225 388.400943 566.296365 2088.800086
1 1315.192780 402.062309 527.165135 2103.220425
2 1621.707620 402.055918 833.692501 2409.722738
3 1598.950391 407.292759 800.671253 2397.229530
4 1392.811018 408.022686 593.101248 2192.520788
In [315]: rmse = mean_squared_error(test['Sparkling'],predicted_auto_SARIMA_12.pr

edicted_mean,squared=False)
print(rmse)
528.4527303319784

,index=['SARIMA(1,1,2)(1,0,2,12)'])
resultsDf
Out[316]:
Test RMSE
Test RMSE
ARIMA(0,1,2) 1374.976948
SARIMA(1,1,2)(2,0,2,6) 626.880154
SARIMA(1,1,2)(1,0,2,12) 528.452730
7. Build ARIMA/SARIMA models based on the cut-off

points of ACF and PACF on the training data and
evaluate this model on the test data using RMSE.
Manual SARIMA
In [317]: plot_acf(df1['Sparkling'].diff().dropna(),lags=50,title='Differenced Da
ta Autocorrelation')
plot_pacf(df1['Sparkling'].diff().dropna(),lags=50,title='Differenced D
ata Partial Autocorrelation')
plt.show()
In [318]: manual_ARIMA = ARIMA(train['Sparkling'].astype('float64'), order=(1,1,1
),freq='M')
results_manual_ARIMA = manual_ARIMA.fit()
print(results_manual_ARIMA.summary())
ARIMA Model Results

=======================================================================
=======
Dep. Variable: D.Sparkling No. Observations:
131
Model: ARIMA(1, 1, 1) Log Likelihood -1
113.507
Method: css-mle S.D. of innovations 1
171.377
235.014
Time: 23:10:12 BIC 2
246.515
239.687
- 12-31-1990
=======================================================================
==============
25 0.975]
-----------------------------------------------------------------------
--------------
const 6.7493 4.616 1.462 0.144 -2.2
99 15.797
ar.L1.D.Sparkling 0.4289 0.082 5.221 0.000 0.2
68 0.590
ma.L1.D.Sparkling -1.0000 0.019 -51.962 0.000 -1.0
38 -0.962
Roots
=======================================================================
======
quency
-----------------------------------------------------------------------
------
AR.1 2.3313 +0.0000j 2.3313
0.0000
MA.1 1.0000 +0.0000j 1.0000
0.0000
-----------------------------------------------------------------------
------
In [319]: predicted_manual_ARIMA = results_manual_ARIMA.forecast(steps=len(test))
In [320]: rmse = mean_squared_error(test['Sparkling'],predicted_manual_ARIMA[0],s

quared=False)
print(rmse)
1461.6785026377725

resultsDf
Out[321]:
Test RMSE
Test RMSE
ARIMA(0,1,2) 1374.976948
SARIMA(1,1,2)(2,0,2,6) 626.880154
SARIMA(1,1,2)(1,0,2,12) 528.452730
ARIMA(1,1,1) 1461.678503
Manual SARIMA
In [322]: plot_acf(df1['Sparkling'].diff().dropna(),lags=50,title='Sparkling Diff

erenced Data Autocorrelation')
plot_pacf(df1['Sparkling'].diff().dropna(),lags=50,title='Sparkling Dif
ferenced Data Patial Autocorrelation')
plt.show()
In [323]: df.plot()
plt.grid();
In [324]: (df1['Sparkling'].diff(6)).plot()
plt.grid();
In [325]: (df1['Sparkling'].diff(6)).diff().plot()
plt.grid();
In [326]: test_stationarity((train['Sparkling'].diff(6).dropna()).diff(1).dropna
())

p-value 6.683657e-10
#Lags Used 1.300000e+01
dtype: float64
In [327]: plot_acf((df1['Sparkling'].diff(6).dropna()).diff(1).dropna(),lags=30)
plot_pacf((df1['Sparkling'].diff(6).dropna()).diff(1).dropna(),lags=30
);
manual_SARIMA_6 = sm.tsa.statespace.SARIMAX(train['Sparkling'].values,
order=(0,1,2),
results_manual_SARIMA_6 = manual_SARIMA_6.fit(maxiter=1000)
print(results_manual_SARIMA_6.summary())
SARIMAX Results
=======================================================================
==================
132
-814.410
1640.821
Time: 23:10:17 BIC
1657.023
Sample: 0 HQIC
1647.393
- 132
=======================================================================
=======
0.975]
-----------------------------------------------------------------------
-------
ma.L1 -0.7621 0.107 -7.150 0.000 -0.971
-0.553
ma.L2 -0.1433 0.113 -1.270 0.204 -0.364
0.078
ar.S.L6 -1.0186 0.009 -119.432 0.000 -1.035
-1.002
ma.S.L6 0.5540 0.143 3.875 0.000 0.274
0.834
ma.S.L12 -0.8700 0.174 -4.988 0.000 -1.212
-0.528
sigma2 9.961e+04 2.41e+04 4.129 0.000 5.23e+04
1.47e+05
=======================================================================
============
34.60
0.00
0.61
5.46
=======================================================================
============
Warnings:
(complex-step).
In [329]: results_manual_SARIMA_6.plot_diagnostics()
plt.show()
In [330]: predicted_manual_SARIMA_6 = results_manual_SARIMA_6.get_forecast(steps=
len(test))
In [331]: predicted_manual_SARIMA_6.summary_frame(alpha=0.05).head()
Out[331]:
0 1453.675438 394.525918 680.418848 2226.932029
1 1176.610635 405.533762 381.779068 1971.442203
2 1764.541710 407.243079 966.359942 2562.723477
3 1554.880467 408.947104 753.358872 2356.402062
4 1354.487019 410.644070 549.639430 2159.334607
In [332]: rmse = mean_squared_error(test['Sparkling'],predicted_manual_SARIMA_6.p

redicted_mean,squared=False)
print(rmse)
558.4383293762605

,index=['SARIMA(0,1,0)(1,1,3,6)'])
resultsDf
Out[333]:
Test RMSE
Test RMSE
ARIMA(0,1,2) 1374.976948
SARIMA(1,1,2)(2,0,2,6) 626.880154
SARIMA(1,1,2)(1,0,2,12) 528.452730
ARIMA(1,1,1) 1461.678503
SARIMA(0,1,0)(1,1,3,6) 558.438329
8.Build a table with all the models built along with their
corresponding parameters and the respective RMSE
values on the test data.
In [334]: resultsDf
Out[334]:
Test RMSE
Test RMSE
ARIMA(0,1,2) 1374.976948
SARIMA(1,1,2)(2,0,2,6) 626.880154
SARIMA(1,1,2)(1,0,2,12) 528.452730
ARIMA(1,1,1) 1461.678503
SARIMA(0,1,0)(1,1,3,6) 558.438329
In [335]: print('Sorted by RMSE values on the Test Data:','\n',)

resultsDf.sort_values(by=['Test RMSE'])
Sorted by RMSE values on the Test Data:
Out[335]:
Test RMSE
SARIMA(1,1,2)(1,0,2,12) 528.452730
SARIMA(0,1,0)(1,1,3,6) 558.438329
SARIMA(1,1,2)(2,0,2,6) 626.880154
Test RMSE
ARIMA(0,1,2) 1374.976948
ARIMA(1,1,1) 1461.678503
9.Based on the model-building exercise, build the most

optimum model(s) on the complete data and predict 12
months into the future with appropriate confidence
intervals/bands.
plt.grid();
plt.title('Plot of Exponential Smoothing Predictions and the Acutal Val
ues');
In [337]: fullmodel = ExponentialSmoothing(df1,

trend='additive',
seasonal='multiplicative').fit(smooth
ing_level=0.15422095573741845,
smooth
ing_slope=1.3071916052588054e-21,
smooth
ing_seasonal=0.3713310279757222)
In [338]: RMSE_fullmodel = metrics.mean_squared_error(df1['Sparkling'],fullmodel.

fittedvalues,squared=False)
print('RMSE:',RMSE_fullmodel)
RMSE: 353.91247239957886
In [339]: # Getting the predictions for the same number of times stamps that are
present in the test data
prediction= fullmodel.forecast(steps=len(test))
In [340]: df1.plot()
prediction.plot();
In [341]: #In the below code, we have calculated the upper and lower confidence b
ands at 95% confidence level
#The percentile function under numpy lets us calculate these and adding
and subtracting from the predictions
#gives us the necessary confidence bands for the predictions
pred_df = pd.DataFrame({'lower_CI':prediction - 1.96*np.std(fullmodel.r
esid,ddof=1),
'prediction':prediction,
'upper_ci': prediction + 1.96*np.std(fullmode
l.resid,ddof=1)})
pred_df.head()
Out[341]:
1995-08-31 1189.662677 1885.063397 2580.464118
1995-09-30 lower_CI
1739.199777 prediction
2434.600498 upper_ci
3130.001218
1995-10-31 2503.481875 3198.882596 3894.283316
1995-11-30 3173.025555 3868.426275 4563.826996
1995-12-31 5340.512111 6035.912831 6731.313552
In [342]: # plot the forecast along with the confidence band
axis = df1.plot(label='Actual', figsize=(13,6))

pred_df['prediction'].plot(ax=axis, label='Forecast', alpha=0.5)
axis.fill_between(pred_df.index, pred_df['lower_CI'], pred_df['upper_c
i'], color='k', alpha=.15)
axis.set_xlabel('Year-Months')
axis.set_ylabel('Sales')
plt.grid()
plt.show()
END

SANDYA VB TIME SERIES FORECASTING PROJECT - HTML PDF

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

SANDYA VB TIME SERIES FORECASTING PROJECT - HTML PDF

Uploaded by

Copyright:

Available Formats

Problem:

For this particular assignment, the data of different types

182 1995-03 45.0

183 1995-04 52.0

184 1995-05 28.0

185 1995-06 40.0

186 1995-07 62.0

Out[6]: DatetimeIndex(['1980-01-31', '1980-02-29', '1980-03-31', '1980-04-30',

In [7]: df['Time_Stamp'] = pd.DataFrame(date,columns=['Month'])

0 1980-01 112.0 1980-01-31

1 1980-02 118.0 1980-02-29

2 1980-03 129.0 1980-03-31

3 1980-04 99.0 1980-04-30

4 1980-05 116.0 1980-05-31

Rose 185.0 90.394595 39.175344 28.0 63.0 86.0 112.0 267.0

In [15]: _, ax = plt.subplots(figsize=(30,10)) #yearly plot

month_plot(df,ylabel='Rose Wine Production',ax=ax)

In [18]: monthly_sales_across_years = pd.pivot_table(df, values = 'Rose', column

5661 rows × 1 columns

In [30]: df_decade_sum = df.resample('10Y').sum()

fig, (axis1,axis2) = plt.subplots(2,1,sharex=True,figsize=(13,6))

# plot average RetailSales over time(year-month)

In [39]: decomposition = seasonal_decompose(df['Rose'],model='additive')

In [40]: trend = decomposition.trend

In [45]: deaseasonalized_ts = trend + residual

In [47]: train=df[df.index.year< 1991]

In [50]: print('First few rows of Training Data','\n',train.head(),'\n')

First few rows of Training Data

Last few rows of Training Data

First few rows of Test Data

Last few rows of Test Data

4. Build various exponential smoothing models on the training data

In [52]: train_time = [i+1 for i in range(len(train))]

Training Time instance

Model 1: Linear Regression

In [53]: LinearRegression_train = train.copy()

In [54]: LinearRegression_train['time'] = train_time

print('First few rows of Training Data','\n',LinearRegression_train.hea

First few rows of Training Data

Last few rows of Training Data

1990-11-30 110.0 131

First few rows of Test Data

Last few rows of Test Data

In [55]: from sklearn.linear_model import LinearRegression

In [58]: test_predictions_model1 = lr.predict(LinearRegression_test[['ti

In [59]: from sklearn import metrics

In [60]: ## Test Data - RMSE

For RegressionOnTime forecast on the Test Data, RMSE is 51.433

In [61]: resultsDf = pd.DataFrame({'Test RMSE': [rmse_model1_test]},index=['Regr

Model 2 : Naive Approach

In [62]: NaiveModel_train = train.copy()

In [63]: NaiveModel_test['naive'] = np.asarray(train['Rose'])[len(np.asarray(tra

In [64]: plt.plot(NaiveModel_train['Rose'], label='Train')

For Naive forecast on the Test Data, RMSE is 79.719

In [66]: resultsDf_2 = pd.DataFrame({'Test RMSE': [rmse_model2_test]},index=['Na

resultsDf = pd.concat([resultsDf, resultsDf_2])

In [67]: SimpleAverage_train = train.copy()

In [68]: SimpleAverage_test['mean_forecast'] = train['Rose'].mean()

1991-01-31 54.0 104.939394

1991-02-28 55.0 104.939394

1991-03-31 66.0 104.939394

1991-04-30 65.0 104.939394

1991-05-31 60.0 104.939394