Professional Documents
Culture Documents
SANDYA VB TIME SERIES FORECASTING PROJECT - HTML PDF
SANDYA VB TIME SERIES FORECASTING PROJECT - HTML PDF
Dataset - Rose.csv
In [1]: import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 13, 6
1. Read the data as an appropriate Time Series data and plot the
data.
In [2]: df = pd.read_csv("Rose.csv")
In [3]: df.head()
Out[3]:
YearMonth Rose
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
YearMonth Rose
0 1980-01 112.0
1 1980-02 118.0
2 1980-03 129.0
3 1980-04 99.0
4 1980-05 116.0
In [4]: df.tail()
Out[4]:
YearMonth Rose
In [5]: df.plot();
plt.grid()
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [6]: date = pd.date_range(start='01/01/1980', end='08/01/1995', freq='M')
date
In [8]: df.head()
Out[8]:
YearMonth Rose Time_Stamp
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
YearMonth Rose Time_Stamp
In [9]: df['Time_Stamp']=pd.to_datetime(df['Time_Stamp'])
df= df.set_index('Time_Stamp')
df.drop(['YearMonth'],axis=1, inplace = True)
df.head()
Out[9]:
Rose
Time_Stamp
1980-01-31 112.0
1980-02-29 118.0
1980-03-31 129.0
1980-04-30 99.0
1980-05-31 116.0
In [10]: df.plot();
plt.grid()
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
2. Perform appropriate Exploratory Data Analysis to understand the
data and also perform decomposition.
In [11]: df.describe(include='all').T
Out[11]:
count mean std min 25% 50% 75% max
In [12]: df.isnull().sum()
Out[12]: Rose 2
dtype: int64
In [13]: df.shape
Out[13]: (187, 1)
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [14]: df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 187 entries, 1980-01-31 to 1995-07-31
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Rose 185 non-null float64
dtypes: float64(1)
memory usage: 2.9 KB
In [16]: _, ax = plt.subplots(figsize=(22,8))
sns.boxplot(x = df.index.month_name(),y = df.values[:,0],ax=ax)
plt.xlabel('Monthly Rose Wine Sale');
plt.ylabel('Months');
plt.grid();
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [17]: from statsmodels.graphics.tsaplots import month_plot
fig, ax = plt.subplots(figsize=(22,8))
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Out[18]: Time_Stamp April August December February January July June March May Novembe
Time_Stamp
1980 99.0 129.0 267.0 118.0 112.0 118.0 168.0 129.0 116.0 150.0
1981 97.0 214.0 226.0 129.0 126.0 222.0 127.0 124.0 102.0 154.0
1982 97.0 117.0 169.0 77.0 89.0 117.0 121.0 82.0 127.0 134.0
1983 85.0 124.0 164.0 108.0 75.0 109.0 108.0 115.0 101.0 135.0
1984 87.0 142.0 159.0 85.0 88.0 87.0 87.0 112.0 91.0 139.0
1985 93.0 103.0 129.0 82.0 61.0 87.0 75.0 124.0 108.0 123.0
1986 71.0 118.0 141.0 65.0 57.0 110.0 67.0 67.0 76.0 107.0
1987 86.0 73.0 157.0 65.0 58.0 87.0 74.0 70.0 93.0 96.0
1988 66.0 77.0 135.0 115.0 63.0 79.0 83.0 70.0 67.0 100.0
1989 74.0 74.0 137.0 60.0 71.0 86.0 91.0 89.0 73.0 109.0
1990 77.0 70.0 132.0 69.0 43.0 78.0 76.0 73.0 69.0 110.0
1991 65.0 55.0 106.0 55.0 54.0 96.0 65.0 66.0 60.0 74.0
1992 53.0 52.0 91.0 47.0 34.0 67.0 55.0 56.0 53.0 58.0
1993 45.0 54.0 77.0 40.0 33.0 57.0 55.0 46.0 41.0 48.0
1994 48.0 NaN 84.0 35.0 30.0 NaN 45.0 42.0 44.0 63.0
1995 52.0 NaN NaN 39.0 30.0 62.0 40.0 45.0 28.0 NaN
In [19]: monthly_sales_across_years.plot(figsize=(20,10))
plt.grid()
plt.legend(loc='best');
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [20]: df_yearly_sum = df.resample('A').sum()
df_yearly_sum.head()
Out[20]:
Rose
Time_Stamp
1980-12-31 1758.0
1981-12-31 1780.0
1982-12-31 1348.0
1983-12-31 1324.0
1984-12-31 1280.0
In [21]: df_yearly_sum.plot();
plt.grid()
plt.xlabel('Sum of the Observations of each year');
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [22]: df_yearly_mean = df.resample('Y').mean()
df_yearly_mean.head()
Out[22]:
Rose
Time_Stamp
1980-12-31 146.500000
1981-12-31 148.333333
1982-12-31 112.333333
1983-12-31 110.333333
1984-12-31 106.666667
In [23]: df_yearly_mean.plot();
plt.grid()
plt.xlabel('Mean of the Observations of each year');
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [24]: df_quarterly_sum = df.resample('Q').sum()
df_quarterly_sum.head()
Out[24]:
Rose
Time_Stamp
1980-03-31 359.0
1980-06-30 383.0
1980-09-30 452.0
1980-12-31 564.0
1981-03-31 379.0
In [25]: df_quarterly_sum.plot();
plt.grid()
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [26]: df_quarterly_mean = df.resample('Q').mean()
df_quarterly_mean.head()
Out[26]:
Rose
Time_Stamp
1980-03-31 119.666667
1980-06-30 127.666667
1980-09-30 150.666667
1980-12-31 188.000000
1981-03-31 126.333333
In [27]: df_quarterly_mean.plot();
plt.grid()
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [28]: df_daily_sum = df.resample('D').sum()
df_daily_sum
Out[28]:
Rose
Time_Stamp
1980-01-31 112.0
1980-02-01 0.0
1980-02-02 0.0
1980-02-03 0.0
1980-02-04 0.0
... ...
1995-07-27 0.0
1995-07-28 0.0
1995-07-29 0.0
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Rose
Time_Stamp
1995-07-30 0.0
1995-07-31 62.0
In [29]: df_daily_sum.plot()
plt.grid();
Out[30]:
Rose
Time_Stamp
1980-12-31 1758.0
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Rose
Time_Stamp
1990-12-31 12094.0
2000-12-31 2871.0
In [31]: df_decade_sum.plot();
plt.grid()
In [32]: # statistics
from statsmodels.distributions.empirical_distribution import ECDF
cdf = ECDF(df['Rose'])
plt.plot(cdf.x, cdf.y, label = "statmodels");
plt.grid()
plt.xlabel('Rose Wine Sales');
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [33]: # group by date and get average RetailSales, and precent change
average = df.groupby(df.index)["Rose"].mean()
pct_change = df.groupby(df.index)["Rose"].sum().pct_change()
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [34]: df['1994']
Out[34]:
Rose
Time_Stamp
1994-01-31 30.0
1994-02-28 35.0
1994-03-31 42.0
1994-04-30 48.0
1994-05-31 44.0
1994-06-30 45.0
1994-07-31 NaN
1994-08-31 NaN
1994-09-30 46.0
1994-10-31 51.0
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Rose
Time_Stamp
1994-11-30 63.0
1994-12-31 84.0
In [35]: df.interpolate(methods='spline',order=3,inplace=True)
In [36]: df['1994']
Out[36]:
Rose
Time_Stamp
1994-01-31 30.000000
1994-02-28 35.000000
1994-03-31 42.000000
1994-04-30 48.000000
1994-05-31 44.000000
1994-06-30 45.000000
1994-07-31 45.333333
1994-08-31 45.666667
1994-09-30 46.000000
1994-10-31 51.000000
1994-11-30 63.000000
1994-12-31 84.000000
In [37]: df.isna().sum()
Out[37]: Rose 0
dtype: int64
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [38]: from statsmodels.tsa.seasonal import seasonal_decompose
print('Trend','\n',trend.head(12),'\n')
print('Seasonality','\n',seasonality.head(12),'\n')
print('Residual','\n',residual.head(12),'\n')
Trend
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
1980-07-31 147.083333
1980-08-31 148.125000
1980-09-30 148.375000
1980-10-31 148.083333
1980-11-30 147.416667
1980-12-31 145.125000
Name: trend, dtype: float64
Seasonality
Time_Stamp
1980-01-31 -27.908647
1980-02-29 -17.435632
1980-03-31 -9.285830
1980-04-30 -15.098330
1980-05-31 -10.196544
1980-06-30 -7.678687
1980-07-31 4.896908
1980-08-31 5.499686
1980-09-30 2.774686
1980-10-31 1.871908
1980-11-30 16.846908
1980-12-31 55.713575
Name: seasonal, dtype: float64
Residual
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 -33.980241
1980-08-31 -24.624686
1980-09-30 53.850314
1980-10-31 -2.955241
1980-11-30 -14.263575
1980-12-31 66.161425
Name: resid, dtype: float64
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [41]: deaseasonalized_ts = trend + residual
deaseasonalized_ts.head(12)
Out[41]: Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 113.103092
1980-08-31 123.500314
1980-09-30 202.225314
1980-10-31 145.128092
1980-11-30 133.153092
1980-12-31 211.286425
dtype: float64
In [42]: df.plot()
deaseasonalized_ts.plot()
plt.legend(["Original Time Series", "Time Series without Seasonality Co
mponent"]);
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [43]: decomposition = seasonal_decompose(df['Rose'],model='multiplicative')
decomposition.plot();
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [44]: trend = decomposition.trend
seasonality = decomposition.seasonal
residual = decomposition.resid
print('Trend','\n',trend.head(12),'\n')
print('Seasonality','\n',seasonality.head(12),'\n')
print('Residual','\n',residual.head(12),'\n')
Trend
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 147.083333
1980-08-31 148.125000
1980-09-30 148.375000
1980-10-31 148.083333
1980-11-30 147.416667
1980-12-31 145.125000
Name: trend, dtype: float64
Seasonality
Time_Stamp
1980-01-31 0.670111
1980-02-29 0.806163
1980-03-31 0.901164
1980-04-30 0.854024
1980-05-31 0.889415
1980-06-30 0.923985
1980-07-31 1.058038
1980-08-31 1.035881
1980-09-30 1.017648
1980-10-31 1.022573
1980-11-30 1.192349
1980-12-31 1.628646
Name: seasonal, dtype: float64
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Residual
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 0.758258
1980-08-31 0.840720
1980-09-30 1.357674
1980-10-31 0.970771
1980-11-30 0.853378
1980-12-31 1.129646
Name: resid, dtype: float64
Out[45]: Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 147.841592
1980-08-31 148.965720
1980-09-30 149.732674
1980-10-31 149.054104
1980-11-30 148.270045
1980-12-31 146.254646
dtype: float64
In [46]: df.plot()
deaseasonalized_ts.plot()
plt.legend(["Original Time Series", "Time Series without Seasonality Co
mponent"]);
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
3. Split the data into training and test. The test data should start in
1991.
In [48]: print(train.shape)
(132, 1)
In [49]: print(test.shape)
(55, 1)
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
print('First few rows of Test Data','\n',test.head(),'\n')
print('Last few rows of Test Data','\n',test.tail(),'\n')
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [51]: train['Rose'].plot(figsize=(13,5), fontsize=14)
test['Rose'].plot(figsize=(13,5), fontsize=14)
plt.grid()
plt.legend(['Training Data','Test Data'])
plt.show()
In [56]: lr = LinearRegression()
In [57]: lr.fit(LinearRegression_train[['time']],LinearRegression_train['Rose'].
values)
Out[57]: LinearRegression()
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
plt.figure(figsize=(13,6))
plt.plot( train['Rose'], label='Train')
plt.plot(test['Rose'], label='Test')
plt.plot(LinearRegression_test['RegOnTime'], label='Regression On Time_
Test Data')
plt.legend(loc='best')
plt.grid();
rmse_model1_test = metrics.mean_squared_error(test['Rose'],test_predict
ions_model1,squared=False)
print("For RegressionOnTime forecast on the Test Data, RMSE is %3.3f"
%(rmse_model1_test))
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
essionOnTime'])
resultsDf
Out[61]:
Test RMSE
RegressionOnTime 51.433312
Out[63]: Time_Stamp
1991-01-31 132.0
1991-02-28 132.0
1991-03-31 132.0
1991-04-30 132.0
1991-05-31 132.0
Name: naive, dtype: float64
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [65]: ## Test Data - RMSE
rmse_model2_test = metrics.mean_squared_error(test['Rose'],NaiveModel_t
est['naive'],squared=False)
print("For Naive forecast on the Test Data, RMSE is %3.3f" %(rmse_mode
l2_test))
Out[66]:
Test RMSE
RegressionOnTime 51.433312
NaiveModel 79.718773
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Model 3: Simple Average
Out[68]:
Rose mean_forecast
Time_Stamp
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [70]: ## Test Data - RMSE
rmse_model3_test = metrics.mean_squared_error(test['Rose'],SimpleAverag
e_test['mean_forecast'],squared=False)
print("For Simple Average forecast on the Test Data, RMSE is %3.3f" %(
rmse_model3_test))
Out[71]:
Test RMSE
RegressionOnTime 51.433312
NaiveModel 79.718773
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE
SimpleAverageModel 53.460570
Out[72]:
Rose
Time_Stamp
1980-01-31 112.0
1980-02-29 118.0
1980-03-31 129.0
1980-04-30 99.0
1980-05-31 116.0
MovingAverage.head()
Out[73]:
Rose Trailing_2 Trailing_4 Trailing_6 Trailing_9
Time_Stamp
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Rose Trailing_2 Trailing_4 Trailing_6 Trailing_9
Time_Stamp
plt.plot(MovingAverage['Rose'], label='Train')
plt.plot(MovingAverage['Trailing_2'], label='2 Point Moving Average')
plt.plot(MovingAverage['Trailing_4'], label='4 Point Moving Average')
plt.plot(MovingAverage['Trailing_6'],label = '6 Point Moving Average')
plt.plot(MovingAverage['Trailing_9'],label = '9 Point Moving Average')
plt.title("Rose Wine Moving Average")
plt.legend(loc = 'best')
plt.grid();
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [76]: ## Plotting on both the Training and Test data
plt.plot(trailing_MovingAverage_train['Rose'], label='Train')
plt.plot(trailing_MovingAverage_test['Rose'], label='Test')
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [77]: ## Test Data - RMSE --> 2 point Trailing MA
rmse_model4_test2 = metrics.mean_squared_error(test['Rose'],trailing_Mo
vingAverage_test['Trailing_2'],squared=False)
print("For 2 point Moving Average Model forecast on the Training Data,
RMSE is %3.3f" %(rmse_model4_test2))
rmse_model4_test4 = metrics.mean_squared_error(test['Rose'],trailing_Mo
vingAverage_test['Trailing_4'],squared=False)
print("For 4 point Moving Average Model forecast on the Training Data,
RMSE is %3.3f" %(rmse_model4_test4))
rmse_model4_test6 = metrics.mean_squared_error(test['Rose'],trailing_Mo
vingAverage_test['Trailing_6'],squared=False)
print("For 6 point Moving Average Model forecast on the Training Data,
RMSE is %3.3f" %(rmse_model4_test6))
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
## Test Data - RMSE --> 9 point Trailing MA
rmse_model4_test9 = metrics.mean_squared_error(test['Rose'],trailing_Mo
vingAverage_test['Trailing_9'],squared=False)
print("For 9 point Moving Average Model forecast on the Training Data,
RMSE is %3.3f " %(rmse_model4_test9))
For 2 point Moving Average Model forecast on the Training Data, RMSE i
s 11.529
For 4 point Moving Average Model forecast on the Training Data, RMSE i
s 14.451
For 6 point Moving Average Model forecast on the Training Data, RMSE i
s 14.566
For 9 point Moving Average Model forecast on the Training Data, RMSE i
s 14.728
Out[78]:
Test RMSE
RegressionOnTime 51.433312
NaiveModel 79.718773
SimpleAverageModel 53.460570
2pointTrailingMovingAverage 11.529278
4pointTrailingMovingAverage 14.451403
6pointTrailingMovingAverage 14.566327
9 i tT ili M i A 14 727630
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
9pointTrailingMovingAverage 14.727630
Test RMSE
In [83]: model_SES_autofit.params
Out[84]:
Rose predict
Time_Stamp
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Rose predict
Time_Stamp
plt.plot(SES_train['Rose'], label='Train')
plt.plot(SES_test['Rose'], label='Test')
plt.legend(loc='best')
plt.grid()
plt.title('Alpha =0.098 Predictions for Rose Wine');
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [86]: ## Test Data
rmse_model5_test_1 = metrics.mean_squared_error(SES_test['Rose'],SES_te
st['predict'],squared=False)
print("For Alpha =0.098 Simple Exponential Smoothing Model forecast on
the Test Data, RMSE is %3.3f" %(rmse_model5_test_1))
For Alpha =0.098 Simple Exponential Smoothing Model forecast on the Tes
t Data, RMSE is 36.796
Out[87]:
Test RMSE
RegressionOnTime 51.433312
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE
NaiveModel 79.718773
SimpleAverageModel 53.460570
2pointTrailingMovingAverage 11.529278
4pointTrailingMovingAverage 14.451403
6pointTrailingMovingAverage 14.566327
9pointTrailingMovingAverage 14.727630
Alpha=0.098,SimpleExponentialSmoothing 36.796244
In [88]: ## First we will define an empty dataframe to store our values from the
loop
Out[88]:
Alpha Values Train RMSE Test RMSE
rmse_model5_train_i = metrics.mean_squared_error(SES_train['Rose'],
SES_train['predict',i],squared=False)
rmse_model5_test_i = metrics.mean_squared_error(SES_test['Rose'],SE
S_test['predict',i],squared=False)
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
,'Test RMSE':rmse_model5_test_i},
ignore_index=True)
Out[90]:
Alpha Values Train RMSE Test RMSE
plt.plot(SES_train['Rose'], label='Train')
plt.plot(SES_test['Rose'], label='Test')
plt.legend(loc='best')
plt.grid();
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [92]: resultsDf_6_1 = pd.DataFrame({'Test RMSE': [resultsDf_6.sort_values(by=
['Test RMSE'],ascending=True).values[0][2]]}
,index=['Alpha=0.3,SimpleExponentialSmoothin
g'])
Out[92]:
Test RMSE
RegressionOnTime 51.433312
NaiveModel 79.718773
SimpleAverageModel 53.460570
2pointTrailingMovingAverage 11.529278
4pointTrailingMovingAverage 14.451403
6pointTrailingMovingAverage 14.566327
9pointTrailingMovingAverage 14.727630
Alpha=0.098,SimpleExponentialSmoothing 36.796244
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE
Alpha=0.3,SimpleExponentialSmoothing 47.504821
In [95]: ## First we will define an empty dataframe to store our values from the
loop
Out[95]:
Alpha Values Beta Values Train RMSE Test RMSE
rmse_model6_train = metrics.mean_squared_error(DES_train['Rose'
],DES_train['predict',i,j],squared=False)
rmse_model6_test = metrics.mean_squared_error(DES_test['Rose'],
DES_test['predict',i,j],squared=False)
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
resultsDf_7 = resultsDf_7.append({'Alpha Values':i,'Beta Value
s':j,'Train RMSE':rmse_model6_train
,'Test RMSE':rmse_model6_test
}, ignore_index=True)
In [97]: resultsDf_7
Out[97]:
Alpha Values Beta Values Train RMSE Test RMSE
80 rows × 4 columns
Out[98]:
Alpha Values Beta Values Train RMSE Test RMSE
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Alpha Values Beta Values Train RMSE Test RMSE
plt.plot(DES_train['Rose'], label='Train')
plt.plot(DES_test['Rose'], label='Test')
plt.legend(loc='best')
plt.grid();
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
lSmoothing'])
Out[100]:
Test RMSE
RegressionOnTime 51.433312
NaiveModel 79.718773
SimpleAverageModel 53.460570
2pointTrailingMovingAverage 11.529278
4pointTrailingMovingAverage 14.451403
6pointTrailingMovingAverage 14.566327
9pointTrailingMovingAverage 14.727630
Alpha=0.098,SimpleExponentialSmoothing 36.796244
Alpha=0.3,SimpleExponentialSmoothing 47.504821
Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing 98.653317
In [104]: model_TES_autofit.params
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
'smoothing_slope': 0.04843854924060428,
'smoothing_seasonal': 0.0,
'damping_slope': nan,
'initial_level': 76.65565156766287,
'initial_slope': 0.0,
'initial_seasons': array([1.47550285, 1.65927163, 1.80572657, 1.588888
44, 1.77822725,
1.92604397, 2.11649487, 2.25135227, 2.11690616, 2.08112861,
2.40927315, 3.30448176]),
'use_boxcox': False,
'lamda': None,
'remove_bias': False}
TES_test['auto_predict'] = model_TES_autofit.forecast(steps=len(test))
TES_test.head()
Out[105]:
Rose auto_predict
Time_Stamp
plt.plot(TES_train['Rose'], label='Train')
plt.plot(TES_test['Rose'], label='Test')
plt.plot(TES_test['auto_predict'], label='Alpha=0.106,Beta=0.048,Gamma=
0.0,TripleExponentialSmoothing predictions on Test Set')
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
plt.title("Alpha= 0.106, Beta=0.048, Gamma=0.0 Triple Exponential Smoot
hing Rose Wine")
plt.legend(loc='best')
plt.grid();
rmse_model6_test_1 = metrics.mean_squared_error(TES_test['Rose'],TES_te
st['auto_predict'],squared=False)
print("For Alpha=0.106,Beta=0.048,Gamma=0.0, Triple Exponential Smoothi
ng Model forecast on the Test Data, RMSE is %3.3f" %(rmse_model6_test_
1))
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
resultsDf = pd.concat([resultsDf, resultsDf_8_1])
resultsDf
Out[108]:
Test RMSE
RegressionOnTime 51.433312
NaiveModel 79.718773
SimpleAverageModel 53.460570
2pointTrailingMovingAverage 11.529278
4pointTrailingMovingAverage 14.451403
6pointTrailingMovingAverage 14.566327
9pointTrailingMovingAverage 14.727630
Alpha=0.098,SimpleExponentialSmoothing 36.796244
Alpha=0.3,SimpleExponentialSmoothing 47.504821
Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing 98.653317
Alpha=0.106,Beta=0.048,Gamma=0.0,TripleExponentialSmoothing 17.369489
In [109]: ## First we will define an empty dataframe to store our values from the
loop
Out[109]:
Alpha Values Beta Values Gamma Values Train RMSE Test RMSE
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
lues
TES_test['predict',i,j,k] = model_TES_alpha_i_j_k.forecast(
steps=len(test))
rmse_model8_train = metrics.mean_squared_error(TES_train['R
ose'],TES_train['predict',i,j,k],squared=False)
rmse_model8_test = metrics.mean_squared_error(TES_test['Ros
e'],TES_test['predict',i,j,k],squared=False)
Out[111]:
Alpha Values Beta Values Gamma Values Train RMSE Test RMSE
In [112]: ## Plotting on both the Training and Test data using brute force alpha,
beta and gamma determination
plt.plot(TES_train['Rose'], label='Train')
plt.plot(TES_test['Rose'], label='Test')
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
plt.title("Alpha= 0.3, Beta=0.4, Gamma=0.3 Triple Exponential Smoothing
Rose Wine")
plt.legend(loc='best')
plt.grid();
Out[113]:
Test RMSE
RegressionOnTime 51.433312
NaiveModel 79.718773
SimpleAverageModel 53.460570
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE
2pointTrailingMovingAverage 11.529278
4pointTrailingMovingAverage 14.451403
6pointTrailingMovingAverage 14.566327
9pointTrailingMovingAverage 14.727630
Alpha=0.098,SimpleExponentialSmoothing 36.796244
Alpha=0.3,SimpleExponentialSmoothing 47.504821
Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing 98.653317
Alpha=0.106,Beta=0.048,Gamma=0.0,TripleExponentialSmoothing 17.369489
Alpha=0.3,Beta=0.4,Gamma=0.3,TripleExponentialSmoothing 10.945435
5. Check for the stationarity of the data on which the model is being
built on using appropriate statistical tests and also mention the
hypothesis for the statistical test. If the data is found to be non-
stationary, take appropriate steps to make it stationary. Check the
new data for stationarity and comment. Note: Stationarity should be
checked at alpha = 0.05.
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
std = plt.plot(rolstd, color='black', label = 'Rolling Std')
plt.legend(loc='best')
plt.title('Rolling Mean & Standard Deviation For Rose Wine Producti
on')
plt.show(block=False)
In [115]: test_stationarity(df['Rose'])
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Number of Observations Used 173.000000
Critical Value (1%) -3.468726
Critical Value (5%) -2.878396
Critical Value (10%) -2.575756
dtype: float64
In [116]: test_stationarity(df['Rose'].diff().dropna())
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
6. Build an automated version of the ARIMA/SARIMA model in which
the parameters are selected using the lowest Akaike Information
Criteria (AIC) on the training data and evaluate this model on the test
data using RMSE.
In [118]: plot_acf(df['Rose'],lags=50)
plot_acf(df['Rose'].diff().dropna(),lags=50,title='Rose Wine Difference
d Data Autocorrelation')
plt.show()
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [119]: plot_pacf(df['Rose'],lags=50)
plot_pacf(df['Rose'].diff().dropna(),lags=50,title='Rose Wine Differenc
ed Data Partial Autocorrelation')
plt.show()
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [120]: test_stationarity(train['Rose'])
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Results of Dickey-Fuller Test:
Test Statistic -2.164250
p-value 0.219476
#Lags Used 13.000000
Number of Observations Used 118.000000
Critical Value (1%) -3.487022
Critical Value (5%) -2.886363
Critical Value (10%) -2.580009
dtype: float64
In [121]: test_stationarity(train['Rose'].diff().dropna())
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Results of Dickey-Fuller Test:
Test Statistic -6.592372e+00
p-value 7.061944e-09
#Lags Used 1.200000e+01
Number of Observations Used 1.180000e+02
Critical Value (1%) -3.487022e+00
Critical Value (5%) -2.886363e+00
Critical Value (10%) -2.580009e+00
dtype: float64
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Some parameter combinations for the Model...
Model: (0, 1, 1)
Model: (0, 1, 2)
Model: (1, 1, 0)
Model: (1, 1, 1)
Model: (1, 1, 2)
Model: (2, 1, 0)
Model: (2, 1, 1)
Model: (2, 1, 2)
Out[123]:
param AIC
ARIMA(0, 1, 0) - AIC:1335.1526583086775
ARIMA(0, 1, 1) - AIC:1280.7261830464054
ARIMA(0, 1, 2) - AIC:1276.8353768178954
ARIMA(1, 1, 0) - AIC:1319.3483105801852
ARIMA(1, 1, 1) - AIC:1277.775753909814
ARIMA(1, 1, 2) - AIC:1277.3592237914563
ARIMA(2, 1, 0) - AIC:1300.6092611743886
ARIMA(2, 1, 1) - AIC:1279.0456894093122
ARIMA(2, 1, 2) - AIC:1279.298693936773
In [125]: ## Sort the above AIC values in the ascending order to get the paramete
rs for the minimum AIC value
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
ARIMA_AIC.sort_values(by='AIC',ascending=True)
Out[125]:
param AIC
2 (0, 1, 2) 1276.835377
5 (1, 1, 2) 1277.359224
4 (1, 1, 1) 1277.775754
7 (2, 1, 1) 1279.045689
8 (2, 1, 2) 1279.298694
1 (0, 1, 1) 1280.726183
6 (2, 1, 0) 1300.609261
3 (1, 1, 0) 1319.348311
0 (0, 1, 0) 1335.152658
results_auto_ARIMA = auto_ARIMA.fit()
print(results_auto_ARIMA.summary())
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
281.509
- 12-31-1990
=======================================================================
=========
coef std err z P>|z| [0.025
0.975]
-----------------------------------------------------------------------
---------
const -0.4885 0.085 -5.742 0.000 -0.655
-0.322
ma.L1.D.Rose -0.7601 0.101 -7.499 0.000 -0.959
-0.561
ma.L2.D.Rose -0.2398 0.095 -2.518 0.012 -0.427
-0.053
Roots
=======================================================================
======
Real Imaginary Modulus Fre
quency
-----------------------------------------------------------------------
------
MA.1 1.0001 +0.0000j 1.0001
0.0000
MA.2 -4.1695 +0.0000j 4.1695
0.5000
-----------------------------------------------------------------------
------
15.618912366977964
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [129]: resultsDf_9 = pd.DataFrame({'Test RMSE': [rmse]}
,index=['ARIMA(0,1,2)'])
resultsDf_9
Out[129]:
Test RMSE
ARIMA(0,1,2) 15.618912
Out[130]:
Test RMSE
RegressionOnTime 51.433312
NaiveModel 79.718773
SimpleAverageModel 53.460570
2pointTrailingMovingAverage 11.529278
4pointTrailingMovingAverage 14.451403
6pointTrailingMovingAverage 14.566327
9pointTrailingMovingAverage 14.727630
Alpha=0.098,SimpleExponentialSmoothing 36.796244
Alpha=0.3,SimpleExponentialSmoothing 47.504821
Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing 98.653317
Alpha=0.106,Beta=0.048,Gamma=0.0,TripleExponentialSmoothing 17.369489
Alpha=0.3,Beta=0.4,Gamma=0.3,TripleExponentialSmoothing 10.945435
ARIMA(0,1,2) 15.618912
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Automated SARIMA Model
Setting the seasonality as 6 for the first iteration of the auto SARIMA model.
We see that there can be a seasonality of 6 as well as 12. We will run our auto SARIMA models
by setting seasonality both as 6 and 12.
Setting the seasonality as 6 for the first iteration of the auto SARIMA model.
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
pdq = list(itertools.product(p, d, q))
model_pdq = [(x[0], x[1], x[2], 6) for x in list(itertools.product(p, D
, q))]
print('Examples of some parameter combinations for Model...')
for i in range(1,len(pdq)):
print('Model: {}{}'.format(pdq[i], model_pdq[i]))
Out[133]:
param seasonal AIC
results_SARIMA = SARIMA_model.fit(maxiter=1000)
print('SARIMA{}x{} - AIC:{}'.format(param, param_seasonal, resu
lts_SARIMA.aic))
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
SARIMA_AIC = SARIMA_AIC.append({'param':param,'seasonal':param_
seasonal ,'AIC': results_SARIMA.aic}, ignore_index=True)
In [135]: SARIMA_AIC.sort_values(by=['AIC']).head()
Out[135]:
param seasonal AIC
auto_SARIMA_6 = sm.tsa.statespace.SARIMAX(train['Rose'].values,
order=(1, 1, 2),
seasonal_order=(2, 0, 2, 6),
enforce_stationarity=False,
enforce_invertibility=False)
results_auto_SARIMA_6 = auto_SARIMA_6.fit(maxiter=1000)
print(results_auto_SARIMA_6.summary())
SARIMAX Results
=======================================================================
==================
Dep. Variable: y No. Observations:
132
Model: SARIMAX(1, 1, 2)x(2, 0, 2, 6) Log Likelihood
-512.828
Date: Sun, 20 Jun 2021 AIC
1041.656
Time: 23:03:40 BIC
1063.685
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Sample: 0 HQIC
1050.598
- 132
Covariance Type: opg
=======================================================================
=======
coef std err z P>|z| [0.025
0.975]
-----------------------------------------------------------------------
-------
ar.L1 -0.5939 0.152 -3.899 0.000 -0.892
-0.295
ma.L1 -0.1954 841.392 -0.000 1.000 -1649.294 1
648.903
ma.L2 -0.8046 676.954 -0.001 0.999 -1327.610 1
326.000
ar.S.L6 -0.0625 0.035 -1.764 0.078 -0.132
0.007
ar.S.L12 0.8451 0.039 21.885 0.000 0.769
0.921
ma.S.L6 0.2226 988.816 0.000 1.000 -1937.820 1
938.266
ma.S.L12 -0.7774 768.775 -0.001 0.999 -1507.549 1
505.994
sigma2 335.2016 4.16e+05 0.001 0.999 -8.14e+05
8.15e+05
=======================================================================
============
Ljung-Box (Q): 15.89 Jarque-Bera (JB):
56.68
Prob(Q): 1.00 Prob(JB):
0.00
Heteroskedasticity (H): 0.47 Skew:
0.52
Prob(H) (two-sided): 0.02 Kurtosis:
6.26
=======================================================================
============
Warnings:
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
[1] Covariance matrix calculated using the outer product of gradients
(complex-step).
In [137]: results_auto_SARIMA_6.plot_diagnostics()
plt.show()
In [139]: predicted_auto_SARIMA_6.summary_frame(alpha=0.05).head()
Out[139]:
y mean mean_se mean_ci_lower mean_ci_upper
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
y mean mean_se mean_ci_lower mean_ci_upper
26.133554443168514
resultsDf = pd.concat([resultsDf,temp_resultsDf])
resultsDf
Out[141]:
Test RMSE
RegressionOnTime 51.433312
NaiveModel 79.718773
SimpleAverageModel 53.460570
2pointTrailingMovingAverage 11.529278
4pointTrailingMovingAverage 14.451403
6pointTrailingMovingAverage 14.566327
9pointTrailingMovingAverage 14.727630
Alpha=0.098,SimpleExponentialSmoothing 36.796244
Alpha=0.3,SimpleExponentialSmoothing 47.504821
Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing 98.653317
Alpha=0.106,Beta=0.048,Gamma=0.0,TripleExponentialSmoothing 17.369489
Alpha=0.3,Beta=0.4,Gamma=0.3,TripleExponentialSmoothing 10.945435
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE
ARIMA(0,1,2) 15.618912
SARIMA(1,1,2)(2,0,2,6) 26.133554
Setting the seasonality as 12 for the iteration of the auto SARIMA model.
Out[143]:
param seasonal AIC
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
for param_seasonal in model_pdq:
SARIMA_model = sm.tsa.statespace.SARIMAX(train['Rose'].values,
order=param,
seasonal_order=param_season
al,
enforce_stationarity=False,
enforce_invertibility=False
)
results_SARIMA = SARIMA_model.fit(maxiter=1000)
print('SARIMA{}x{} - AIC:{}'.format(param, param_seasonal, resu
lts_SARIMA.aic))
SARIMA_AIC = SARIMA_AIC.append({'param':param,'seasonal':param_
seasonal ,'AIC': results_SARIMA.aic}, ignore_index=True)
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
SARIMA(2, 1, 1)x(0, 0, 2, 12) - AIC:922.9408472105727
SARIMA(2, 1, 1)x(1, 0, 0, 12) - AIC:1071.424960152966
SARIMA(2, 1, 1)x(1, 0, 1, 12) - AIC:1052.9244468735646
SARIMA(2, 1, 1)x(1, 0, 2, 12) - AIC:916.2424915017058
SARIMA(2, 1, 1)x(2, 0, 0, 12) - AIC:896.5181606626418
SARIMA(2, 1, 1)x(2, 0, 1, 12) - AIC:897.6399567764543
SARIMA(2, 1, 1)x(2, 0, 2, 12) - AIC:899.4835866487186
SARIMA(2, 1, 2)x(0, 0, 0, 12) - AIC:1253.9102116555646
SARIMA(2, 1, 2)x(0, 0, 1, 12) - AIC:1085.964368136149
SARIMA(2, 1, 2)x(0, 0, 2, 12) - AIC:916.3259134019722
SARIMA(2, 1, 2)x(1, 0, 0, 12) - AIC:1073.291271116631
SARIMA(2, 1, 2)x(1, 0, 1, 12) - AIC:1044.190935848489
SARIMA(2, 1, 2)x(1, 0, 2, 12) - AIC:907.6661538095441
SARIMA(2, 1, 2)x(2, 0, 0, 12) - AIC:897.3464978203967
SARIMA(2, 1, 2)x(2, 0, 1, 12) - AIC:898.3782488307854
SARIMA(2, 1, 2)x(2, 0, 2, 12) - AIC:890.6688480021876
In [145]: SARIMA_AIC.sort_values(by=['AIC']).head()
Out[145]:
param seasonal AIC
auto_SARIMA_12 = sm.tsa.statespace.SARIMAX(train['Rose'].values,
order=(0, 1, 2),
seasonal_order=(2, 0, 2, 12),
enforce_stationarity=False,
enforce_invertibility=False)
results_auto_SARIMA_12 = auto_SARIMA_12.fit(maxiter=1000)
print(results_auto_SARIMA_12.summary())
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
SARIMAX Results
=======================================================================
===================
Dep. Variable: y No. Observations:
132
Model: SARIMAX(0, 1, 2)x(2, 0, 2, 12) Log Likelihood
-436.969
Date: Sun, 20 Jun 2021 AIC
887.938
Time: 23:05:18 BIC
906.448
Sample: 0 HQIC
895.437
- 132
Covariance Type: opg
=======================================================================
=======
coef std err z P>|z| [0.025
0.975]
-----------------------------------------------------------------------
-------
ma.L1 -0.8428 174.448 -0.005 0.996 -342.754
341.069
ma.L2 -0.1572 27.457 -0.006 0.995 -53.971
53.657
ar.S.L12 0.3467 0.079 4.374 0.000 0.191
0.502
ar.S.L24 0.3023 0.076 3.996 0.000 0.154
0.451
ma.S.L12 0.0767 0.133 0.577 0.564 -0.184
0.337
ma.S.L24 -0.0726 0.146 -0.498 0.618 -0.358
0.213
sigma2 251.3159 4.38e+04 0.006 0.995 -8.57e+04
8.62e+04
=======================================================================
============
Ljung-Box (Q): 24.56 Jarque-Bera (JB):
2.33
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Prob(Q): 0.97 Prob(JB):
0.31
Heteroskedasticity (H): 0.88 Skew:
0.37
Prob(H) (two-sided): 0.70 Kurtosis:
3.03
=======================================================================
============
Warnings:
[1] Covariance matrix calculated using the outer product of gradients
(complex-step).
In [147]: results_auto_SARIMA_12.plot_diagnostics()
plt.show()
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [149]: predicted_auto_SARIMA_12.summary_frame(alpha=0.05).head()
Out[149]:
y mean mean_se mean_ci_lower mean_ci_upper
26.929368240084877
resultsDf = pd.concat([resultsDf,temp_resultsDf])
resultsDf
Out[151]:
Test RMSE
RegressionOnTime 51.433312
NaiveModel 79.718773
SimpleAverageModel 53.460570
2pointTrailingMovingAverage 11.529278
4pointTrailingMovingAverage 14.451403
6pointTrailingMovingAverage 14.566327
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE
9pointTrailingMovingAverage 14.727630
Alpha=0.098,SimpleExponentialSmoothing 36.796244
Alpha=0.3,SimpleExponentialSmoothing 47.504821
Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing 98.653317
Alpha=0.106,Beta=0.048,Gamma=0.0,TripleExponentialSmoothing 17.369489
Alpha=0.3,Beta=0.4,Gamma=0.3,TripleExponentialSmoothing 10.945435
ARIMA(0,1,2) 15.618912
SARIMA(1,1,2)(2,0,2,6) 26.133554
SARIMA(0,1,2)(2,0,2,12) 26.929368
Manual ARIMA
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [153]: manual_ARIMA = ARIMA(train['Rose'].astype('float64'), order=(1,1,1),fre
q='M')
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
results_manual_ARIMA = manual_ARIMA.fit()
print(results_manual_ARIMA.summary())
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
0.0000
MA.1 1.0001 +0.0000j 1.0001
0.0000
-----------------------------------------------------------------------
------
15.734258659597247
resultsDf = pd.concat([resultsDf,temp_resultsDf])
resultsDf
Out[156]:
Test RMSE
RegressionOnTime 51.433312
NaiveModel 79.718773
SimpleAverageModel 53.460570
2pointTrailingMovingAverage 11.529278
4pointTrailingMovingAverage 14.451403
6pointTrailingMovingAverage 14.566327
9pointTrailingMovingAverage 14.727630
Alpha=0.098,SimpleExponentialSmoothing 36.796244
Alpha=0.3,SimpleExponentialSmoothing 47.504821
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE
Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing 98.653317
Alpha=0.106,Beta=0.048,Gamma=0.0,TripleExponentialSmoothing 17.369489
Alpha=0.3,Beta=0.4,Gamma=0.3,TripleExponentialSmoothing 10.945435
ARIMA(0,1,2) 15.618912
SARIMA(1,1,2)(2,0,2,6) 26.133554
SARIMA(0,1,2)(2,0,2,12) 26.929368
ARIMA(1,1,1) 15.734259
Manual SARIMA
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [158]: df.plot()
plt.grid();
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [159]: (df['Rose'].diff(6)).plot()
plt.grid();
In [160]: (df['Rose'].diff(6)).diff().plot()
plt.grid();
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [161]: test_stationarity((train['Rose'].diff(6).dropna()).diff(1).dropna())
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
p-value 1.418693e-09
#Lags Used 1.300000e+01
Number of Observations Used 1.110000e+02
Critical Value (1%) -3.490683e+00
Critical Value (5%) -2.887952e+00
Critical Value (10%) -2.580857e+00
dtype: float64
In [162]: plot_acf((df['Rose'].diff(6).dropna()).diff(1).dropna(),lags=30)
plot_pacf((df['Rose'].diff(6).dropna()).diff(1).dropna(),lags=30);
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [163]: import statsmodels.api as sm
manual_SARIMA_6 = sm.tsa.statespace.SARIMAX(train['Rose'].values,
order=(1, 1, 1),
seasonal_order=(1, 1, 1, 6),
enforce_stationarity=False,
enforce_invertibility=False)
results_manual_SARIMA_6 = manual_SARIMA_6.fit(maxiter=1000)
print(results_manual_SARIMA_6.summary())
SARIMAX Results
=======================================================================
==================
Dep. Variable: y No. Observations:
132
Model: SARIMAX(1, 1, 1)x(1, 1, 1, 6) Log Likelihood
-545.056
Date: Sun, 20 Jun 2021 AIC
1100.112
Time: 23:05:23 BIC
1113.923
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Sample: 0 HQIC
1105.719
- 132
Covariance Type: opg
=======================================================================
=======
coef std err z P>|z| [0.025
0.975]
-----------------------------------------------------------------------
-------
ar.L1 0.1764 0.070 2.516 0.012 0.039
0.314
ma.L1 -1.0452 0.059 -17.608 0.000 -1.161
-0.929
ar.S.L6 -0.6660 0.053 -12.552 0.000 -0.770
-0.562
ma.S.L6 -0.4840 0.093 -5.213 0.000 -0.666
-0.302
sigma2 579.5488 67.298 8.612 0.000 447.648
711.450
=======================================================================
============
Ljung-Box (Q): 34.58 Jarque-Bera (JB):
143.27
Prob(Q): 0.71 Prob(JB):
0.00
Heteroskedasticity (H): 0.22 Skew:
0.18
Prob(H) (two-sided): 0.00 Kurtosis:
8.41
=======================================================================
============
Warnings:
[1] Covariance matrix calculated using the outer product of gradients
(complex-step).
In [164]: results_manual_SARIMA_6.plot_diagnostics()
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
plt.show()
In [166]: predicted_manual_SARIMA_6.summary_frame(alpha=0.05).head()
Out[166]:
y mean mean_se mean_ci_lower mean_ci_upper
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
print(rmse)
20.96410963443785
resultsDf = pd.concat([resultsDf,temp_resultsDf])
resultsDf
Out[168]:
Test RMSE
RegressionOnTime 51.433312
NaiveModel 79.718773
SimpleAverageModel 53.460570
2pointTrailingMovingAverage 11.529278
4pointTrailingMovingAverage 14.451403
6pointTrailingMovingAverage 14.566327
9pointTrailingMovingAverage 14.727630
Alpha=0.098,SimpleExponentialSmoothing 36.796244
Alpha=0.3,SimpleExponentialSmoothing 47.504821
Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing 98.653317
Alpha=0.106,Beta=0.048,Gamma=0.0,TripleExponentialSmoothing 17.369489
Alpha=0.3,Beta=0.4,Gamma=0.3,TripleExponentialSmoothing 10.945435
ARIMA(0,1,2) 15.618912
SARIMA(1,1,2)(2,0,2,6) 26.133554
SARIMA(0,1,2)(2,0,2,12) 26.929368
ARIMA(1,1,1) 15.734259
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE
SARIMA(1,1,1)(1,1,1,6) 20.964110
8. Build a table (create a data frame) with all the models built along
with their corresponding parameters and the respective RMSE values
on the test data.
resultsDf = pd.concat([resultsDf,temp_resultsDf])
resultsDf
Out[169]:
Test RMSE
RegressionOnTime 51.433312
NaiveModel 79.718773
SimpleAverageModel 53.460570
2pointTrailingMovingAverage 11.529278
4pointTrailingMovingAverage 14.451403
6pointTrailingMovingAverage 14.566327
9pointTrailingMovingAverage 14.727630
Alpha=0.098,SimpleExponentialSmoothing 36.796244
Alpha=0.3,SimpleExponentialSmoothing 47.504821
Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing 98.653317
Alpha=0.106,Beta=0.048,Gamma=0.0,TripleExponentialSmoothing 17.369489
Alpha=0.3,Beta=0.4,Gamma=0.3,TripleExponentialSmoothing 10.945435
ARIMA(0,1,2) 15.618912
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE
SARIMA(1,1,2)(2,0,2,6) 26.133554
SARIMA(0,1,2)(2,0,2,12) 26.929368
ARIMA(1,1,1) 15.734259
SARIMA(1,1,1)(1,1,1,6) 20.964110
SARIMA(1,1,2)(2,0,2,6) 20.964110
In [170]: print('Sorted by RMSE values on the Test Data for Rose Wine sale:','\n'
,)
resultsDf.sort_values(by=['Test RMSE'])
Sorted by RMSE values on the Test Data for Rose Wine sale:
Out[170]:
Test RMSE
Alpha=0.3,Beta=0.4,Gamma=0.3,TripleExponentialSmoothing 10.945435
2pointTrailingMovingAverage 11.529278
4pointTrailingMovingAverage 14.451403
6pointTrailingMovingAverage 14.566327
9pointTrailingMovingAverage 14.727630
ARIMA(0,1,2) 15.618912
ARIMA(1,1,1) 15.734259
Alpha=0.106,Beta=0.048,Gamma=0.0,TripleExponentialSmoothing 17.369489
SARIMA(1,1,2)(2,0,2,6) 20.964110
SARIMA(1,1,1)(1,1,1,6) 20.964110
SARIMA(1,1,2)(2,0,2,6) 26.133554
SARIMA(0,1,2)(2,0,2,12) 26.929368
Alpha=0.098,SimpleExponentialSmoothing 36.796244
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE
Alpha=0.3,SimpleExponentialSmoothing 47.504821
RegressionOnTime 51.433312
SimpleAverageModel 53.460570
NaiveModel 79.718773
Alpha=0.3,Beta=0.1,DoubleExponentialSmoothing 98.653317
plt.plot(train['Rose'], label='Train')
plt.plot(test['Rose'], label='Test')
plt.legend(loc='best')
plt.grid();
plt.title('Plot of Exponential Smoothing Acutal Values for Rose Wine Pr
oductions');
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [172]: fullmodel = ExponentialSmoothing(df,
trend='additive',
seasonal='multiplicative').fit(smooth
ing_level=0.3,
smooth
ing_slope=0.4,
smooth
ing_seasonal=0.3)
print('RMSE:',RMSE_fullmodel)
RMSE: 24.266535961104537
In [174]: # Getting the predictions for the same number of times stamps that are
present in the test data
prediction = fullmodel.forecast(steps=len(test))
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [175]: df.plot()
prediction.plot();
In [176]: #In the below code, we have calculated the upper and lower confidence b
ands at 95% confidence level
#The percentile function under numpy lets us calculate these and adding
and subtracting from the predictions
#gives us the necessary confidence bands for the predictions
pred_df = pd.DataFrame({'lower_CI':prediction - 1.96*np.std(fullmodel.r
esid,ddof=1),
'prediction':prediction,
'upper_ci': prediction + 1.96*np.std(fullmode
l.resid,ddof=1)})
pred_df.head()
Out[176]:
lower_CI prediction upper_ci
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
END
1. Read the data as an appropriate Time Series data and plot the
data.
In [179]: df1.head()
Out[179]:
YearMonth Sparkling
0 1980-01 1686
1 1980-02 1591
2 1980-03 2304
3 1980-04 1712
4 1980-05 1471
In [180]: df1.tail()
Out[180]:
YearMonth Sparkling
In [181]: df1.plot();
plt.grid()
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [182]: date1 = pd.date_range(start='01/01/1980', end='08/01/1995', freq='M')
date1
Out[183]:
YearMonth Sparkling Time_Stamp
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
YearMonth Sparkling Time_Stamp
In [184]: df1['Time_Stamp']=pd.to_datetime(df1['Time_Stamp'])
df1= df1.set_index('Time_Stamp')
df1.drop(['YearMonth'],axis=1, inplace = True)
df1.head()
Out[184]:
Sparkling
Time_Stamp
1980-01-31 1686
1980-02-29 1591
1980-03-31 2304
1980-04-30 1712
1980-05-31 1471
In [185]: df1.plot();
plt.grid()
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
2. Perform appropriate Exploratory Data Analysis to understand the
data and also perform decomposition.
In [186]: df1.describe(include='all').T
Out[186]:
count mean std min 25% 50% 75% max
In [187]: df1.isnull().sum()
Out[187]: Sparkling 0
dtype: int64
In [188]: df1.shape
Out[188]: (187, 1)
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [189]: _, ax = plt.subplots(figsize=(30,10)) #yearly plot
sns.boxplot(x = df1.index.year,y = df1.values[:,0],ax=ax)
plt.xlabel('Yearly Sparkling Wine Sale');
plt.ylabel('Years');
plt.grid();
In [190]: _, ax = plt.subplots(figsize=(22,8))
sns.boxplot(x = df1.index.month_name(),y = df1.values[:,0],ax=ax)
plt.xlabel('Monthly Sparkling Wine Sale');
plt.ylabel('Monthls');
plt.grid();
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [191]: from statsmodels.graphics.tsaplots import month_plot
Out[192]:
Time_Stamp April August December February January July June March May Nove
Time_Stamp
1980 1712.0 2453.0 5179.0 1591.0 1686.0 1966.0 1377.0 2304.0 1471.0 4
1981 1976.0 2472.0 4551.0 1523.0 1530.0 1781.0 1480.0 1633.0 1170.0 3
1982 1790.0 1897.0 4524.0 1329.0 1510.0 1954.0 1449.0 1518.0 1537.0 3
1983 1375.0 2298.0 4923.0 1638.0 1609.0 1600.0 1245.0 2030.0 1320.0 3
1984 1789.0 3159.0 5274.0 1435.0 1609.0 1597.0 1404.0 2061.0 1567.0 4
1985 1589.0 2512.0 5434.0 1682.0 1771.0 1645.0 1379.0 1846.0 1896.0 4
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Time_Stamp April August December February January July June March May Nove
Time_Stamp
1986 1605.0 3318.0 5891.0 1523.0 1606.0 2584.0 1403.0 1577.0 1765.0 3
1987 1935.0 1930.0 7242.0 1442.0 1389.0 1847.0 1250.0 1548.0 1518.0 4
1988 2336.0 1645.0 6757.0 1779.0 1853.0 2230.0 1661.0 2108.0 1728.0 4
1989 1650.0 1968.0 6694.0 1394.0 1757.0 1971.0 1406.0 1982.0 1654.0 4
1990 1628.0 1605.0 6047.0 1321.0 1720.0 1899.0 1457.0 1859.0 1615.0 4
1991 1279.0 1857.0 6153.0 2049.0 1902.0 2214.0 1540.0 1874.0 1432.0 3
1992 1997.0 1773.0 6119.0 1667.0 1577.0 2076.0 1625.0 1993.0 1783.0 4
1993 2121.0 2795.0 6410.0 1564.0 1494.0 2048.0 1515.0 1898.0 1831.0 4
1994 1725.0 1495.0 5999.0 1968.0 1197.0 2031.0 1693.0 1720.0 1674.0 3
1995 1862.0 NaN NaN 1402.0 1070.0 2031.0 1688.0 1897.0 1670.0
In [193]: monthly_sales_across_years.plot()
plt.grid()
plt.legend(loc='best');
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [194]: df_yearly_sum1 = df1.resample('A').sum()
df_yearly_sum1.head()
Out[194]:
Sparkling
Time_Stamp
1980-12-31 28406
1981-12-31 26227
1982-12-31 25321
1983-12-31 26180
1984-12-31 28431
In [195]: df_yearly_sum1.plot();
plt.grid()
plt.xlabel('Sum of the Observations of each year');
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [196]: df_yearly_mean1 = df1.resample('Y').mean()
df_yearly_mean1.head()
Out[196]:
Sparkling
Time_Stamp
1980-12-31 2367.166667
1981-12-31 2185.583333
1982-12-31 2110.083333
1983-12-31 2181.666667
1984-12-31 2369.250000
In [197]: df_yearly_mean1.plot();
plt.grid()
plt.xlabel('Mean of the Observations of each year');
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [198]: df_quarterly_sum1 = df1.resample('Q').sum()
df_quarterly_sum1.head()
Out[198]:
Sparkling
Time_Stamp
1980-03-31 5581
1980-06-30 4560
1980-09-30 6403
1980-12-31 11862
1981-03-31 4686
In [199]: df_quarterly_sum1.plot();
plt.grid()
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [200]: df_quarterly_mean1 = df1.resample('Q').mean()
df_quarterly_mean1.head()
Out[200]:
Sparkling
Time_Stamp
1980-03-31 1860.333333
1980-06-30 1520.000000
1980-09-30 2134.333333
1980-12-31 3954.000000
1981-03-31 1562.000000
In [201]: df_quarterly_mean1.plot();
plt.grid()
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [202]: df_daily_sum1 = df1.resample('D').sum()
df_daily_sum1
Out[202]:
Sparkling
Time_Stamp
1980-01-31 1686
1980-02-01 0
1980-02-02 0
1980-02-03 0
1980-02-04 0
... ...
1995-07-27 0
1995-07-28 0
1995-07-29 0
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Sparkling
Time_Stamp
1995-07-30 0
1995-07-31 2031
In [203]: df_daily_sum1.plot()
plt.grid();
Out[204]:
Sparkling
Time_Stamp
1980-12-31 28406
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Sparkling
Time_Stamp
1990-12-31 288893
2000-12-31 131953
In [205]: df_decade_sum1.plot();
plt.grid()
In [206]: # statistics
from statsmodels.distributions.empirical_distribution import ECDF
cdf = ECDF(df1['Sparkling'])
plt.plot(cdf.x, cdf.y, label = "statmodels");
plt.grid()
plt.xlabel('Sparkling Wine Sales');
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [207]: # group by date and get average RetailSales, and precent change
average = df1.groupby(df1.index)["Sparkling"].mean()
pct_change = df1.groupby(df1.index)["Sparkling"].sum().pct_change()
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [208]: from statsmodels.tsa.seasonal import seasonal_decompose
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [210]: trend = decomposition1.trend
seasonality = decomposition1.seasonal
residual = decomposition1.resid
print('Trend','\n',trend.head(12),'\n')
print('Seasonality','\n',seasonality.head(12),'\n')
print('Residual','\n',residual.head(12),'\n')
Trend
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 2360.666667
1980-08-31 2351.333333
1980-09-30 2320.541667
1980-10-31 2303.583333
1980-11-30 2302.041667
1980-12-31 2293.791667
Name: trend, dtype: float64
Seasonality
Time_Stamp
1980-01-31 -854.260599
1980-02-29 -830.350678
1980-03-31 -592.356630
1980-04-30 -658.490559
1980-05-31 -824.416154
1980-06-30 -967.434011
1980-07-31 -465.502265
1980-08-31 -214.332821
1980-09-30 -254.677265
1980-10-31 599.769957
1980-11-30 1675.067179
1980-12-31 3386.983846
Name: seasonal, dtype: float64
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Residual
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 70.835599
1980-08-31 315.999487
1980-09-30 -81.864401
1980-10-31 -307.353290
1980-11-30 109.891154
1980-12-31 -501.775513
Name: resid, dtype: float64
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [212]: trend = decomposition2.trend
seasonality = decomposition2.seasonal
residual = decomposition2.resid
print('Trend','\n',trend.head(12),'\n')
print('Seasonality','\n',seasonality.head(12),'\n')
print('Residual','\n',residual.head(12),'\n')
Trend
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 2360.666667
1980-08-31 2351.333333
1980-09-30 2320.541667
1980-10-31 2303.583333
1980-11-30 2302.041667
1980-12-31 2293.791667
Name: trend, dtype: float64
Seasonality
Time_Stamp
1980-01-31 0.649843
1980-02-29 0.659214
1980-03-31 0.757440
1980-04-30 0.730351
1980-05-31 0.660609
1980-06-30 0.603468
1980-07-31 0.809164
1980-08-31 0.918822
1980-09-30 0.894367
1980-10-31 1.241789
1980-11-30 1.690158
1980-12-31 2.384776
Name: seasonal, dtype: float64
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Residual
Time_Stamp
1980-01-31 NaN
1980-02-29 NaN
1980-03-31 NaN
1980-04-30 NaN
1980-05-31 NaN
1980-06-30 NaN
1980-07-31 1.029230
1980-08-31 1.135407
1980-09-30 0.955954
1980-10-31 0.907513
1980-11-30 1.050423
1980-12-31 0.946770
Name: resid, dtype: float64
3. Split the data into training and test. The test data should start in
1991.
In [214]: print(train.shape)
(132, 1)
In [215]: print(test.shape)
(55, 1)
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Sparkling
Time_Stamp
1980-01-31 1686
1980-02-29 1591
1980-03-31 2304
1980-04-30 1712
1980-05-31 1471
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
plt.legend(['Training Data','Test Data'])
plt.show()
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55,
56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73,
74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91,
92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107,
108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 1
22, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132]
Test Time instance
[43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 6
0, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
96, 97]
In [222]: lr.fit(LinearRegression_train[['time']],LinearRegression_train['Sparkli
ng'].values)
Out[222]: LinearRegression()
plt.figure(figsize=(13,6))
plt.plot( train['Sparkling'], label='Train')
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
plt.plot(test['Sparkling'], label='Test')
plt.title("Linear Regression plot for Sparkling Wine")
rmse_model1_test = metrics.mean_squared_error(test['Sparkling'],test_pr
edictions_model1,squared=False)
print("For RegressionOnTime forecast on the Test Data, RMSE is %3.3f"
%(rmse_model1_test))
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
essionOnTime'])
resultsDf
Out[226]:
Test RMSE
RegressionOnTime 1275.867052
Out[228]: Time_Stamp
1991-01-31 6047
1991-02-28 6047
1991-03-31 6047
1991-04-30 6047
1991-05-31 6047
Name: naive, dtype: int64
In [229]: plt.figure(figsize=(13,6))
plt.plot(NaiveModel_train['Sparkling'], label='Train')
plt.plot(test['Sparkling'], label='Test')
plt.plot(NaiveModel_test['naive'], label='Naive Forecast on Test Data')
plt.legend(loc='best')
plt.title("Naive Forecast for Sparkling Wine")
plt.grid();
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [230]: ## Test Data - RMSE
rmse_model2_test = metrics.mean_squared_error(test['Sparkling'],NaiveMo
del_test['naive'],squared=False)
print("For Naive forecast on the Test Data, RMSE is %3.3f" %(rmse_mode
l2_test))
Out[231]:
Test RMSE
RegressionOnTime 1275.867052
NaiveModel 3864.279352
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Model 3 : Simple Average
Out[233]:
Sparkling mean_forecast
Time_Stamp
In [234]: plt.figure(figsize=(13,6))
plt.plot(SimpleAverage_train['Sparkling'], label='Train')
plt.plot(SimpleAverage_test['Sparkling'], label='Test')
plt.plot(SimpleAverage_test['mean_forecast'], label='Simple Average on
Test Data')
plt.legend(loc='best')
plt.title("Simple Average Forecast for Sparkling Wine")
plt.grid();
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [235]: ## Test Data - RMSE
rmse_model3_test = metrics.mean_squared_error(test['Sparkling'],SimpleA
verage_test['mean_forecast'],squared=False)
print("For Simple Average forecast on the Test Data, RMSE is %3.3f" %(
rmse_model3_test))
Out[236]:
Test RMSE
RegressionOnTime 1275.867052
NaiveModel 3864.279352
SimpleAverageModel 1275.081804
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
p g
Out[237]:
Sparkling
Time_Stamp
1980-01-31 1686
1980-02-29 1591
1980-03-31 2304
1980-04-30 1712
1980-05-31 1471
MovingAverage.head()
Out[238]:
Sparkling Trailing_2 Trailing_4 Trailing_6 Trailing_9
Time_Stamp
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
1980-04-30 1712
Sparkling 2008.0
Trailing_2 1823.25
Trailing_4 NaN
Trailing_6 NaN
Trailing_9
1980-05-31
Time_Stamp 1471 1591.5 1769.50 NaN NaN
plt.figure(figsize=(13,6))
plt.plot(MovingAverage['Sparkling'], label='Train')
plt.plot(MovingAverage['Trailing_2'], label='2 Point Moving Average')
plt.plot(MovingAverage['Trailing_4'], label='4 Point Moving Average')
plt.plot(MovingAverage['Trailing_6'],label = '6 Point Moving Average')
plt.plot(MovingAverage['Trailing_9'],label = '9 Point Moving Average')
plt.title("Moving Average for Sparkling Wine")
plt.legend(loc = 'best')
plt.grid();
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [241]: ## Plotting on both the Training and Test data
plt.figure(figsize=(13,6))
plt.plot(trailing_MovingAverage_train['Sparkling'], label='Train')
plt.plot(trailing_MovingAverage_test['Sparkling'], label='Test')
plt.grid();
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [242]: ## Test Data - RMSE --> 2 point Trailing MA
rmse_model4_test_2 = metrics.mean_squared_error(test['Sparkling'],trail
ing_MovingAverage_test['Trailing_2'],squared=False)
print("For 2 point Moving Average Model forecast on the Training Data,
RMSE is %3.3f" %(rmse_model4_test_2))
rmse_model4_test_4 = metrics.mean_squared_error(test['Sparkling'],trail
ing_MovingAverage_test['Trailing_4'],squared=False)
print("For 4 point Moving Average Model forecast on the Training Data,
RMSE is %3.3f" %(rmse_model4_test_4))
rmse_model4_test_6 = metrics.mean_squared_error(test['Sparkling'],trail
ing_MovingAverage_test['Trailing_6'],squared=False)
print("For 6 point Moving Average Model forecast on the Training Data,
RMSE is %3.3f" %(rmse_model4_test_6))
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
## Test Data - RMSE --> 9 point Trailing MA
rmse_model4_test_9 = metrics.mean_squared_error(test['Sparkling'],trail
ing_MovingAverage_test['Trailing_9'],squared=False)
print("For 9 point Moving Average Model forecast on the Training Data,
RMSE is %3.3f " %(rmse_model4_test_9))
For 2 point Moving Average Model forecast on the Training Data, RMSE i
s 813.401
For 4 point Moving Average Model forecast on the Training Data, RMSE i
s 1156.590
For 6 point Moving Average Model forecast on the Training Data, RMSE i
s 1283.927
For 9 point Moving Average Model forecast on the Training Data, RMSE i
s 1346.278
Out[243]:
Test RMSE
RegressionOnTime 1275.867052
NaiveModel 3864.279352
SimpleAverageModel 1275.081804
2pointTrailingMovingAverage 813.400684
4pointTrailingMovingAverage 1156.589694
6pointTrailingMovingAverage 1283.927428
plt.plot(train['Sparkling'], label='Train')
plt.plot(test['Sparkling'], label='Test')
plt.legend(loc='best')
plt.title("Model Comparison Plots for Sparkling Wine")
plt.grid();
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Model 5: Simple Exponential Smoothing
In [249]: model_SES_autofit.params
Out[250]:
Sparkling predict
Time_Stamp
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Sparkling predict
Time_Stamp
plt.plot(SES_train['Sparkling'], label='Train')
plt.plot(SES_test['Sparkling'], label='Test')
plt.legend(loc='best')
plt.grid()
plt.title('Alpha =0.0 Predictions');
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [252]: ## Test Data
rmse_model5_test_1 = metrics.mean_squared_error(SES_test['Sparkling'],S
ES_test['predict'],squared=False)
print("For Alpha =0.0 Simple Exponential Smoothing Model forecast on th
e Test Data, RMSE is %3.3f" %(rmse_model5_test_1))
For Alpha =0.0 Simple Exponential Smoothing Model forecast on the Test
Data, RMSE is 1275.082
Out[253]:
Test RMSE
RegressionOnTime 1275.867052
NaiveModel 3864.279352
SimpleAverageModel 1275.081804
2pointTrailingMovingAverage 813.400684
4pointTrailingMovingAverage 1156.589694
6pointTrailingMovingAverage 1283.927428
9pointTrailingMovingAverage 1346.278315
Alpha=0.0,SimpleExponentialSmoothing 1275.081766
In [254]: ## First we will define an empty dataframe to store our values from the
loop
Out[254]:
Alpha Values Train RMSE Test RMSE
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Alpha Values Train RMSE Test RMSE
Alpha Values Train RMSE Test RMSE
rmse_model5_train_i = metrics.mean_squared_error(SES_train['Sparkli
ng'],SES_train['predict',i],squared=False)
rmse_model5_test_i = metrics.mean_squared_error(SES_test['Sparklin
g'],SES_test['predict',i],squared=False)
Out[256]:
Alpha Values Train RMSE Test RMSE
plt.plot(SES_train['Sparkling'], label='Train')
plt.plot(SES_test['Sparkling'], label='Test')
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
plt.plot(SES_test['predict', 0.3], label='Alpha =0.3 Simple Exponential
Smoothing predictions on Test Set')
plt.legend(loc='best')
plt.grid();
Out[258]:
Test RMSE
RegressionOnTime 1275.867052
NaiveModel 3864.279352
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE
SimpleAverageModel 1275.081804
2pointTrailingMovingAverage 813.400684
4pointTrailingMovingAverage 1156.589694
6pointTrailingMovingAverage 1283.927428
9pointTrailingMovingAverage 1346.278315
Alpha=0.0,SimpleExponentialSmoothing 1275.081766
Alpha=0.3,SimpleExponentialSmoothing 1935.507132
In [261]: ## First we will define an empty dataframe to store our values from the
loop
Out[261]:
Alpha Values Beta Values Train RMSE Test RMSE
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
n(test))
rmse_model6_train = metrics.mean_squared_error(DES_train['Spark
ling'],DES_train['predict',i,j],squared=False)
rmse_model6_test = metrics.mean_squared_error(DES_test['Sparkli
ng'],DES_test['predict',i,j],squared=False)
In [263]: resultsDf_7
Out[263]:
Alpha Values Beta Values Train RMSE Test RMSE
64 rows × 4 columns
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Out[264]: Alpha Values Beta Values Train RMSE Test RMSE
plt.plot(DES_train['Sparkling'], label='Train')
plt.plot(DES_test['Sparkling'], label='Test')
plt.legend(loc='best')
plt.grid();
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [266]: resultsDf_7_1 = pd.DataFrame({'Test RMSE': [resultsDf_7.sort_values(by=
['Test RMSE']).values[0][3]]}
,index=['Alpha=0.3,Beta=0.3,DoubleExponentia
lSmoothing'])
Out[266]:
Test RMSE
RegressionOnTime 1275.867052
NaiveModel 3864.279352
SimpleAverageModel 1275.081804
2pointTrailingMovingAverage 813.400684
4pointTrailingMovingAverage 1156.589694
6pointTrailingMovingAverage 1283.927428
9pointTrailingMovingAverage 1346.278315
Alpha=0.0,SimpleExponentialSmoothing 1275.081766
Alpha=0.3,SimpleExponentialSmoothing 1935.507132
Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 18259.110704
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [270]: model_TES_autofit.params
TES_test['auto_predict'] = model_TES_autofit.forecast(steps=len(test))
TES_test.head()
Out[271]:
Sparkling auto_predict
Time_Stamp
plt.plot(TES_train['Sparkling'], label='Train')
plt.plot(TES_test['Sparkling'], label='Test')
plt.plot(TES_test['auto_predict'], label='Alpha=0.154,Beta=1.307,Gamma=
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
0.371,TripleExponentialSmoothing predictions on Test Set')
plt.legend(loc='best')
plt.grid();
rmse_model6_test_1 = metrics.mean_squared_error(TES_test['Sparkling'],T
ES_test['auto_predict'],squared=False)
print("For Alpha=0.154,Beta=1.307,Gamma=0.371, Triple Exponential Smoot
hing Model forecast on the Test Data, RMSE is %3.3f" %(rmse_model6_tes
t_1))
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
resultsDf = pd.concat([resultsDf, resultsDf_8_1])
resultsDf
Out[274]:
Test RMSE
RegressionOnTime 1275.867052
NaiveModel 3864.279352
SimpleAverageModel 1275.081804
2pointTrailingMovingAverage 813.400684
4pointTrailingMovingAverage 1156.589694
6pointTrailingMovingAverage 1283.927428
9pointTrailingMovingAverage 1346.278315
Alpha=0.0,SimpleExponentialSmoothing 1275.081766
Alpha=0.3,SimpleExponentialSmoothing 1935.507132
Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 18259.110704
Alpha=0.154,Beta=1.307,Gamma=0.371,TripleExponentialSmoothing 383.155684
In [275]: ## First we will define an empty dataframe to store our values from the
loop
Out[275]:
Alpha Values Beta Values Gamma Values Train RMSE Test RMSE
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
lues
TES_test['predict',i,j,k] = model_TES_alpha_i_j_k.forecast(
steps=len(test))
rmse_model8_train = metrics.mean_squared_error(TES_train['S
parkling'],TES_train['predict',i,j,k],squared=False)
rmse_model8_test = metrics.mean_squared_error(TES_test['Spa
rkling'],TES_test['predict',i,j,k],squared=False)
Out[277]:
Alpha Values Beta Values Gamma Values Train RMSE Test RMSE
In [278]: ## Plotting on both the Training and Test data using brute force alpha,
beta and gamma determination
plt.plot(TES_train['Sparkling'], label='Train')
plt.plot(TES_test['Sparkling'], label='Test')
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
plt.legend(loc='best')
plt.grid();
Out[279]:
Test RMSE
RegressionOnTime 1275.867052
NaiveModel 3864.279352
SimpleAverageModel 1275.081804
2pointTrailingMovingAverage 813.400684
4pointTrailingMovingAverage 1156.589694
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE
6pointTrailingMovingAverage 1283.927428
9pointTrailingMovingAverage 1346.278315
Alpha=0.0,SimpleExponentialSmoothing 1275.081766
Alpha=0.3,SimpleExponentialSmoothing 1935.507132
Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 18259.110704
Alpha=0.154,Beta=1.307,Gamma=0.371,TripleExponentialSmoothing 383.155684
Alpha=0.3,Beta=0.3,Gamma=0.3,TripleExponentialSmoothing 392.786198
5. Check for the stationarity of the data on which the model is being
built on using appropriate statistical tests and also mention the
hypothesis for the statistical test. If the data is found to be non-
stationary, take appropriate steps to make it stationary. Check the
new data for stationarity and comment. Note: Stationarity should be
checked at alpha = 0.05.
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
plt.show(block=False)
In [281]: test_stationarity(df1['Sparkling'])
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
dtype: float64
In [282]: test_stationarity(df1['Sparkling'].diff().dropna())
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
data and evaluate this model on the test data using
RMSE.
In [283]: from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
In [284]: plot_acf(df1['Sparkling'],lags=50)
plot_acf(df1['Sparkling'].diff().dropna(),lags=50,title='Sparkling Diff
erenced Data Autocorrelation')
plt.show()
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [285]: plot_pacf(df1['Sparkling'],lags=50)
plot_pacf(df1['Sparkling'].diff().dropna(),lags=50,title='Sparkling Dif
ferenced Data Partial Autocorrelation')
plt.show()
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [286]: test_stationarity(train['Sparkling'])
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Critical Value (10%) -2.579896
dtype: float64
In [287]: test_stationarity(train['Sparkling'].diff().dropna())
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
import itertools
p = q = range(0, 3)
d= range(1,2)
pdq = list(itertools.product(p, d, q))
print('Some parameter combinations for the Model...')
for i in range(1,len(pdq)):
print('Model: {}'.format(pdq[i]))
Out[289]:
param AIC
ARIMA(0, 1, 0) - AIC:2269.582796371201
ARIMA(0, 1, 1) - AIC:2264.906436778214
ARIMA(0, 1, 2) - AIC:2232.7830976841487
ARIMA(1, 1, 0) - AIC:2268.528060607777
ARIMA(1, 1, 1) - AIC:2235.0139453507686
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
ARIMA(1, 1, 2) - AIC:2233.5976471191107
ARIMA(2, 1, 0) - AIC:2262.035600076631
ARIMA(2, 1, 1) - AIC:2232.3604898787803
ARIMA(2, 1, 2) - AIC:2210.6166920985434
In [291]: ## Sort the above AIC values in the ascending order to get the paramete
rs for the minimum AIC value
ARIMA_AIC.sort_values(by='AIC',ascending=True)
Out[291]:
param AIC
8 (2, 1, 2) 2210.616692
7 (2, 1, 1) 2232.360490
2 (0, 1, 2) 2232.783098
5 (1, 1, 2) 2233.597647
4 (1, 1, 1) 2235.013945
6 (2, 1, 0) 2262.035600
1 (0, 1, 1) 2264.906437
3 (1, 1, 0) 2268.528061
0 (0, 1, 0) 2269.582796
results_auto_ARIMA = auto_ARIMA.fit()
print(results_auto_ARIMA.summary())
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
099.308
Method: css-mle S.D. of innovations 1
012.075
Date: Sun, 20 Jun 2021 AIC 2
210.617
Time: 23:06:12 BIC 2
227.868
Sample: 02-29-1980 HQIC 2
217.627
- 12-31-1990
=======================================================================
==============
coef std err z P>|z| [0.0
25 0.975]
-----------------------------------------------------------------------
--------------
const 5.5858 0.516 10.823 0.000 4.5
74 6.597
ar.L1.D.Sparkling 1.2697 0.074 17.044 0.000 1.1
24 1.416
ar.L2.D.Sparkling -0.5600 0.074 -7.615 0.000 -0.7
04 -0.416
ma.L1.D.Sparkling -1.9991 0.042 -47.161 0.000 -2.0
82 -1.916
ma.L2.D.Sparkling 0.9991 0.042 23.584 0.000 0.9
16 1.082
Roots
=======================================================================
======
Real Imaginary Modulus Fre
quency
-----------------------------------------------------------------------
------
AR.1 1.1337 -0.7074j 1.3363 -
0.0888
AR.2 1.1337 +0.7074j 1.3363
0.0888
MA.1 1.0005 -0.0005j 1.0005 -
0.0001
MA.2 1.0005 +0.0005j 1.0005
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
0.0001
-----------------------------------------------------------------------
------
1374.9769475915175
Out[295]:
Test RMSE
RegressionOnTime 1275.867052
NaiveModel 3864.279352
SimpleAverageModel 1275.081804
2pointTrailingMovingAverage 813.400684
4pointTrailingMovingAverage 1156.589694
6pointTrailingMovingAverage 1283.927428
9pointTrailingMovingAverage 1346.278315
Alpha=0.0,SimpleExponentialSmoothing 1275.081766
Alpha=0.3,SimpleExponentialSmoothing 1935.507132
Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 18259.110704
Alpha=0.154,Beta=1.307,Gamma=0.371,TripleExponentialSmoothing 383.155684
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE
Alpha=0.3,Beta=0.3,Gamma=0.3,TripleExponentialSmoothing 392.786198
ARIMA(0,1,2) 1374.976948
SARIMA
In [296]: plot_acf(df1['Sparkling'].diff().dropna(),lags=50,title='Differenced Da
ta Autocorrelation')
plt.show()
Setting the seasonality as 6 for the first iteration of the auto SARIMA model.
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
model_pdq = [(x[0], x[1], x[2], 6) for x in list(itertools.product(p, D
, q))]
print('Examples of some parameter combinations for Model...')
for i in range(1,len(pdq)):
print('Model: {}{}'.format(pdq[i], model_pdq[i]))
Out[298]:
param seasonal AIC
results_SARIMA = SARIMA_model.fit(maxiter=1000)
print('SARIMA{}x{} - AIC:{}'.format(param, param_seasonal, resu
lts_SARIMA.aic))
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
SARIMA_AIC = SARIMA_AIC.append({'param':param,'seasonal':param_
seasonal ,'AIC': results_SARIMA.aic}, ignore_index=True)
In [300]: SARIMA_AIC.sort_values(by=['AIC']).head()
Out[300]:
param seasonal AIC
auto_SARIMA_6 = sm.tsa.statespace.SARIMAX(train['Sparkling'].values,
order=(1, 1, 2),
seasonal_order=(2, 0, 2, 6),
enforce_stationarity=False,
enforce_invertibility=False)
results_auto_SARIMA_6 = auto_SARIMA_6.fit(maxiter=1000)
print(results_auto_SARIMA_6.summary())
SARIMAX Results
=======================================================================
==================
Dep. Variable: y No. Observations:
132
Model: SARIMAX(1, 1, 2)x(2, 0, 2, 6) Log Likelihood
-855.839
Date: Sun, 20 Jun 2021 AIC
1727.679
Time: 23:07:48 BIC
1749.707
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Sample: 0 HQIC
1736.621
- 132
Covariance Type: opg
=======================================================================
=======
coef std err z P>|z| [0.025
0.975]
-----------------------------------------------------------------------
-------
ar.L1 -0.6449 0.286 -2.256 0.024 -1.205
-0.085
ma.L1 -0.1069 0.250 -0.428 0.669 -0.596
0.383
ma.L2 -0.7005 0.202 -3.469 0.001 -1.096
-0.305
ar.S.L6 -0.0044 0.027 -0.164 0.870 -0.057
0.049
ar.S.L12 1.0361 0.018 56.098 0.000 1.000
1.072
ma.S.L6 0.0674 0.152 0.443 0.658 -0.231
0.366
ma.S.L12 -0.6123 0.093 -6.591 0.000 -0.794
-0.430
sigma2 1.448e+05 1.71e+04 8.466 0.000 1.11e+05
1.78e+05
=======================================================================
============
Ljung-Box (Q): 28.93 Jarque-Bera (JB):
25.24
Prob(Q): 0.90 Prob(JB):
0.00
Heteroskedasticity (H): 2.63 Skew:
0.47
Prob(H) (two-sided): 0.00 Kurtosis:
5.09
=======================================================================
============
Warnings:
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
[1] Covariance matrix calculated using the outer product of gradients
(complex-step).
In [302]: results_auto_SARIMA_6.plot_diagnostics()
plt.show()
In [304]: predicted_auto_SARIMA_6.summary_frame(alpha=0.05).head()
Out[304]:
y mean mean_se mean_ci_lower mean_ci_upper
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
y mean mean_se mean_ci_lower mean_ci_upper
626.880153949547
Out[306]:
Test RMSE
RegressionOnTime 1275.867052
NaiveModel 3864.279352
SimpleAverageModel 1275.081804
2pointTrailingMovingAverage 813.400684
4pointTrailingMovingAverage 1156.589694
6pointTrailingMovingAverage 1283.927428
9pointTrailingMovingAverage 1346.278315
Alpha=0.0,SimpleExponentialSmoothing 1275.081766
Alpha=0.3,SimpleExponentialSmoothing 1935.507132
Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 18259.110704
Alpha=0.154,Beta=1.307,Gamma=0.371,TripleExponentialSmoothing 383.155684
Alpha=0.3,Beta=0.3,Gamma=0.3,TripleExponentialSmoothing 392.786198
ARIMA(0,1,2) 1374.976948
SARIMA(1,1,2)(2,0,2,6) 626.880154
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Setting the seasonality as 12 for the second iteration of the auto SARIMA model.
Out[308]:
param seasonal AIC
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
al,
enforce_stationarity=False,
enforce_invertibility=False
)
results_SARIMA = SARIMA_model.fit(maxiter=1000)
print('SARIMA{}x{} - AIC:{}'.format(param, param_seasonal, resu
lts_SARIMA.aic))
SARIMA_AIC = SARIMA_AIC.append({'param':param,'seasonal':param_
seasonal ,'AIC': results_SARIMA.aic}, ignore_index=True)
In [310]: SARIMA_AIC.sort_values(by=['AIC']).head()
Out[310]:
param seasonal AIC
auto_SARIMA_12 = sm.tsa.statespace.SARIMAX(train['Sparkling'].values,
order=(1, 1, 2),
seasonal_order=(1, 0, 2, 12),
enforce_stationarity=False,
enforce_invertibility=False)
results_auto_SARIMA_12 = auto_SARIMA_12.fit(maxiter=1000)
print(results_auto_SARIMA_12.summary())
SARIMAX Results
=======================================================================
===================
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Dep. Variable: y No. Observations:
132
Model: SARIMAX(1, 1, 2)x(1, 0, 2, 12) Log Likelihood
-770.792
Date: Sun, 20 Jun 2021 AIC
1555.584
Time: 23:10:10 BIC
1574.095
Sample: 0 HQIC
1563.083
- 132
Covariance Type: opg
=======================================================================
=======
coef std err z P>|z| [0.025
0.975]
-----------------------------------------------------------------------
-------
ar.L1 -0.6282 0.255 -2.462 0.014 -1.128
-0.128
ma.L1 -0.1041 0.225 -0.463 0.643 -0.545
0.337
ma.L2 -0.7276 0.154 -4.731 0.000 -1.029
-0.426
ar.S.L12 1.0439 0.014 72.799 0.000 1.016
1.072
ma.S.L12 -0.5550 0.098 -5.661 0.000 -0.747
-0.363
ma.S.L24 -0.1354 0.120 -1.133 0.257 -0.370
0.099
sigma2 1.506e+05 2.04e+04 7.398 0.000 1.11e+05
1.91e+05
=======================================================================
============
Ljung-Box (Q): 23.02 Jarque-Bera (JB):
11.73
Prob(Q): 0.99 Prob(JB):
0.00
Heteroskedasticity (H): 1.47 Skew:
0.36
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Prob(H) (two-sided): 0.26 Kurtosis:
4.48
=======================================================================
============
Warnings:
[1] Covariance matrix calculated using the outer product of gradients
(complex-step).
In [312]: results_auto_SARIMA_12.plot_diagnostics()
plt.show()
In [314]: predicted_auto_SARIMA_12.summary_frame(alpha=0.05).head()
Out[314]:
y mean mean_se mean_ci_lower mean_ci_upper
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
y mean mean_se mean_ci_lower mean_ci_upper
528.4527303319784
resultsDf = pd.concat([resultsDf,temp_resultsDf])
resultsDf
Out[316]:
Test RMSE
RegressionOnTime 1275.867052
NaiveModel 3864.279352
SimpleAverageModel 1275.081804
2pointTrailingMovingAverage 813.400684
4pointTrailingMovingAverage 1156.589694
6pointTrailingMovingAverage 1283.927428
9pointTrailingMovingAverage 1346.278315
Alpha=0.0,SimpleExponentialSmoothing 1275.081766
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE
Alpha=0.3,SimpleExponentialSmoothing 1935.507132
Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 18259.110704
Alpha=0.154,Beta=1.307,Gamma=0.371,TripleExponentialSmoothing 383.155684
Alpha=0.3,Beta=0.3,Gamma=0.3,TripleExponentialSmoothing 392.786198
ARIMA(0,1,2) 1374.976948
SARIMA(1,1,2)(2,0,2,6) 626.880154
SARIMA(1,1,2)(1,0,2,12) 528.452730
Manual SARIMA
In [317]: plot_acf(df1['Sparkling'].diff().dropna(),lags=50,title='Differenced Da
ta Autocorrelation')
plot_pacf(df1['Sparkling'].diff().dropna(),lags=50,title='Differenced D
ata Partial Autocorrelation')
plt.show()
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [318]: manual_ARIMA = ARIMA(train['Sparkling'].astype('float64'), order=(1,1,1
),freq='M')
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
results_manual_ARIMA = manual_ARIMA.fit()
print(results_manual_ARIMA.summary())
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
AR.1 2.3313 +0.0000j 2.3313
0.0000
MA.1 1.0000 +0.0000j 1.0000
0.0000
-----------------------------------------------------------------------
------
1461.6785026377725
resultsDf = pd.concat([resultsDf,temp_resultsDf])
resultsDf
Out[321]:
Test RMSE
RegressionOnTime 1275.867052
NaiveModel 3864.279352
SimpleAverageModel 1275.081804
2pointTrailingMovingAverage 813.400684
4pointTrailingMovingAverage 1156.589694
6pointTrailingMovingAverage 1283.927428
9pointTrailingMovingAverage 1346.278315
Alpha=0.0,SimpleExponentialSmoothing 1275.081766
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE
Alpha=0.3,SimpleExponentialSmoothing 1935.507132
Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 18259.110704
Alpha=0.154,Beta=1.307,Gamma=0.371,TripleExponentialSmoothing 383.155684
Alpha=0.3,Beta=0.3,Gamma=0.3,TripleExponentialSmoothing 392.786198
ARIMA(0,1,2) 1374.976948
SARIMA(1,1,2)(2,0,2,6) 626.880154
SARIMA(1,1,2)(1,0,2,12) 528.452730
ARIMA(1,1,1) 1461.678503
Manual SARIMA
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [323]: df.plot()
plt.grid();
In [324]: (df1['Sparkling'].diff(6)).plot()
plt.grid();
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [325]: (df1['Sparkling'].diff(6)).diff().plot()
plt.grid();
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [326]: test_stationarity((train['Sparkling'].diff(6).dropna()).diff(1).dropna
())
In [327]: plot_acf((df1['Sparkling'].diff(6).dropna()).diff(1).dropna(),lags=30)
plot_pacf((df1['Sparkling'].diff(6).dropna()).diff(1).dropna(),lags=30
);
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [328]: import statsmodels.api as sm
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
manual_SARIMA_6 = sm.tsa.statespace.SARIMAX(train['Sparkling'].values,
order=(0,1,2),
seasonal_order=(1, 1, 2, 6),
enforce_stationarity=False,
enforce_invertibility=False)
results_manual_SARIMA_6 = manual_SARIMA_6.fit(maxiter=1000)
print(results_manual_SARIMA_6.summary())
SARIMAX Results
=======================================================================
==================
Dep. Variable: y No. Observations:
132
Model: SARIMAX(0, 1, 2)x(1, 1, 2, 6) Log Likelihood
-814.410
Date: Sun, 20 Jun 2021 AIC
1640.821
Time: 23:10:17 BIC
1657.023
Sample: 0 HQIC
1647.393
- 132
Covariance Type: opg
=======================================================================
=======
coef std err z P>|z| [0.025
0.975]
-----------------------------------------------------------------------
-------
ma.L1 -0.7621 0.107 -7.150 0.000 -0.971
-0.553
ma.L2 -0.1433 0.113 -1.270 0.204 -0.364
0.078
ar.S.L6 -1.0186 0.009 -119.432 0.000 -1.035
-1.002
ma.S.L6 0.5540 0.143 3.875 0.000 0.274
0.834
ma.S.L12 -0.8700 0.174 -4.988 0.000 -1.212
-0.528
sigma2 9.961e+04 2.41e+04 4.129 0.000 5.23e+04
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
1.47e+05
=======================================================================
============
Ljung-Box (Q): 28.53 Jarque-Bera (JB):
34.60
Prob(Q): 0.91 Prob(JB):
0.00
Heteroskedasticity (H): 1.83 Skew:
0.61
Prob(H) (two-sided): 0.07 Kurtosis:
5.46
=======================================================================
============
Warnings:
[1] Covariance matrix calculated using the outer product of gradients
(complex-step).
In [329]: results_manual_SARIMA_6.plot_diagnostics()
plt.show()
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
In [330]: predicted_manual_SARIMA_6 = results_manual_SARIMA_6.get_forecast(steps=
len(test))
In [331]: predicted_manual_SARIMA_6.summary_frame(alpha=0.05).head()
Out[331]:
y mean mean_se mean_ci_lower mean_ci_upper
558.4383293762605
resultsDf = pd.concat([resultsDf,temp_resultsDf])
resultsDf
Out[333]:
Test RMSE
RegressionOnTime 1275.867052
NaiveModel 3864.279352
SimpleAverageModel 1275.081804
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE
2pointTrailingMovingAverage 813.400684
4pointTrailingMovingAverage 1156.589694
6pointTrailingMovingAverage 1283.927428
9pointTrailingMovingAverage 1346.278315
Alpha=0.0,SimpleExponentialSmoothing 1275.081766
Alpha=0.3,SimpleExponentialSmoothing 1935.507132
Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 18259.110704
Alpha=0.154,Beta=1.307,Gamma=0.371,TripleExponentialSmoothing 383.155684
Alpha=0.3,Beta=0.3,Gamma=0.3,TripleExponentialSmoothing 392.786198
ARIMA(0,1,2) 1374.976948
SARIMA(1,1,2)(2,0,2,6) 626.880154
SARIMA(1,1,2)(1,0,2,12) 528.452730
ARIMA(1,1,1) 1461.678503
SARIMA(0,1,0)(1,1,3,6) 558.438329
8.Build a table with all the models built along with their
corresponding parameters and the respective RMSE
values on the test data.
In [334]: resultsDf
Out[334]:
Test RMSE
RegressionOnTime 1275.867052
NaiveModel 3864.279352
SimpleAverageModel 1275.081804
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE
2pointTrailingMovingAverage 813.400684
4pointTrailingMovingAverage 1156.589694
6pointTrailingMovingAverage 1283.927428
9pointTrailingMovingAverage 1346.278315
Alpha=0.0,SimpleExponentialSmoothing 1275.081766
Alpha=0.3,SimpleExponentialSmoothing 1935.507132
Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 18259.110704
Alpha=0.154,Beta=1.307,Gamma=0.371,TripleExponentialSmoothing 383.155684
Alpha=0.3,Beta=0.3,Gamma=0.3,TripleExponentialSmoothing 392.786198
ARIMA(0,1,2) 1374.976948
SARIMA(1,1,2)(2,0,2,6) 626.880154
SARIMA(1,1,2)(1,0,2,12) 528.452730
ARIMA(1,1,1) 1461.678503
SARIMA(0,1,0)(1,1,3,6) 558.438329
Out[335]:
Test RMSE
Alpha=0.154,Beta=1.307,Gamma=0.371,TripleExponentialSmoothing 383.155684
Alpha=0.3,Beta=0.3,Gamma=0.3,TripleExponentialSmoothing 392.786198
SARIMA(1,1,2)(1,0,2,12) 528.452730
SARIMA(0,1,0)(1,1,3,6) 558.438329
SARIMA(1,1,2)(2,0,2,6) 626.880154
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
Test RMSE
2pointTrailingMovingAverage 813.400684
4pointTrailingMovingAverage 1156.589694
Alpha=0.0,SimpleExponentialSmoothing 1275.081766
SimpleAverageModel 1275.081804
RegressionOnTime 1275.867052
6pointTrailingMovingAverage 1283.927428
9pointTrailingMovingAverage 1346.278315
ARIMA(0,1,2) 1374.976948
ARIMA(1,1,1) 1461.678503
Alpha=0.3,SimpleExponentialSmoothing 1935.507132
NaiveModel 3864.279352
Alpha=0.3,Beta=0.3,DoubleExponentialSmoothing 18259.110704
plt.plot(TES_train['Sparkling'], label='Train')
plt.plot(TES_test['Sparkling'], label='Test')
plt.plot(TES_test['auto_predict'], label='Alpha=0.154,Beta=1.307,Gamma=
0.371,TripleExponentialSmoothing predictions on Test Set')
plt.legend(loc='best')
plt.grid();
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
plt.title('Plot of Exponential Smoothing Predictions and the Acutal Val
ues');
print('RMSE:',RMSE_fullmodel)
RMSE: 353.91247239957886
In [339]: # Getting the predictions for the same number of times stamps that are
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
present in the test data
prediction= fullmodel.forecast(steps=len(test))
In [340]: df1.plot()
prediction.plot();
In [341]: #In the below code, we have calculated the upper and lower confidence b
ands at 95% confidence level
#The percentile function under numpy lets us calculate these and adding
and subtracting from the predictions
#gives us the necessary confidence bands for the predictions
pred_df = pd.DataFrame({'lower_CI':prediction - 1.96*np.std(fullmodel.r
esid,ddof=1),
'prediction':prediction,
'upper_ci': prediction + 1.96*np.std(fullmode
l.resid,ddof=1)})
pred_df.head()
Out[341]:
lower_CI prediction upper_ci
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
1995-09-30 lower_CI
1739.199777 prediction
2434.600498 upper_ci
3130.001218
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD
END
Create PDF in your applications with the Pdfcrowd HTML to PDF API PDFCROWD