
Time Series Forecasting

Submitted By: Saranya S


Batch Code: PGPDSBA.O.JULY23.B
Contents
S.No Topics Page No
Sparkling Wine
1.1 Wine - Sparkling Data 8
1.2 Basic Analysis 8
1.3 Time Series Conversion 10
1.4 Exploratory Data Analysis 12
1.5 Decomposition 16
1.6 Train & Test Split 18
1.7 Model Prediction 19
1.8 Stationary Check 28
1.9 Auto ARIMA & SARIMA 30
1.10 ACF & PACF Plot 34
1.11 Manual ARIMA & SARIMA 36
1.12 Test RMSE across models built 38
1.13 Optimum model on whole dataset 38
1.14 Future Prediction 39
Rose Wine
2.1 Wine - Rose Data 41
2.2 Basic Analysis 41
2.3 Time Series Conversion 44
2.4 Exploratory Data Analysis 45
2.5 Decomposition 49
2.6 Train & Test Split 52
2.7 Model Prediction 53
2.8 Stationary Check 62
2.9 Auto ARIMA & SARIMA 64

2.10 ACF & PACF Plot 67
2.11 Manual ARIMA & SARIMA 69
2.12 Test RMSE across models built 71
2.13 Optimum model on whole dataset 71
2.14 Future Prediction 72
2.15 Business Insights and Recommendation 74

List of Tables
S.No Topics Page No
1 Top 5 Rows 8
2 Last 5 Rows 9
3 Missing value check 9
4 Info Check 9
5 Descriptive Summary 10
6 Top 5 rows after converting 10
7 Pivot Table 14
8 Additive decompose value 17
9 Multiplicative decompose value 18
10 First and last 5 rows of Train & Test set 19
11 Moving Average forecast value 22
12 SES values between Alpha 1 to 9 24
13 RMSE across multiple models 27
14 Dickey fuller test results 28
15 Dickey fuller test results after differencing approach 29
16 ARIMA - AIC values 31
17 Auto ARIMA Summary 31
18 SARIMA - AIC values 32

19 Auto SARIMA Summary 33
20 Manual ARIMA Summary 36
21 Manual SARIMA Summary 37
22 RMSE across models 38
23 TES on entire data 39
24 Forecast for 12 months 39
25 Forecast Summary 40
26 Upper and Lower CI 40
Rose Wine
27 Top 5 Rows 42
28 Last 5 Rows 42
29 Missing value check 42
30 Info Check 43
31 Descriptive Summary 43
32 Top 5 rows after converting 44
33 Pivot Table 48
34 Additive decompose value 50
35 Multiplicative decompose value 51
36 Moving Average forecast value 56
37 RMSE value on Models 56
38 SES values between Alpha 1 to 9 58
39 TES value to find best fit 60
40 RMSE across multiple models 61
41 Dickey fuller test results 62
42 Dickey fuller test results after differencing approach 63
43 ARIMA - AIC values 65
44 Auto ARIMA Summary 65

45 Auto SARIMA Summary 66
46 Manual ARIMA Summary 69
47 Manual SARIMA Summary 70
48 RMSE across models 71
49 TES on entire data 72
50 Forecast for 12 months 72
51 Forecast Summary 73
52 Upper and Lower CI 73
List of Figures
S.No Topics Page No
1 Time series plot before & after time stamp 12
2 Year wise sales analysis 12
3 Monthly sales analysis 12
4 Weekly sales across years 13
5 Empirical Cumulative Distribution plot 13
6 Average Monthly sales across years 14
7 Month wise Comparison plot 15
8 Average and Percentage sales 15
9 Additive Decompose 16
10 Multiplicative Decompose 17
11 Train & Test Split 19
12 Linear Regression 20
13 Naives Model 20
14 Simple Average 21
15 MA prediction on Whole data 22
16 MA precision on Test data 22
17 Different model comparison 23
18 Simple Exponential Smoothing 23
19 SES Alpha 1 to 9 24
20 DES plot 25
21 TES Auto-fit model plot 26
22 TES Best fit model plot 26
23 Stationary check 28
24 Stationary check after differencing approach 29
25 SARIMA Diagnostic plot 33
26 ACF Plot 34
27 PACF plot 35
28 Manual SARIMA Diagnostic Plot 37
29 Future forecast plot 41
30 Plot before Time stamp 44
31 Time series plot after Time stamp 45
32 Year wise sales analysis 45
33 Monthly sales across years 46
34 Weekly sales across years 46
35 Empirical Cumulative Distribution plot 47
36 Average monthly sales across years 47
37 Month wise Comparison plot 48
38 Average & Percentage sales 49
39 Additive Decompose 50
40 Multiplicative Decompose 51
41 Train and Test Split 52
42 Linear Regression 53
43 Naives Model 54
44 Simple Average 54
45 MA prediction Whole data 55
46 MA prediction on Test data 55
47 Different Model predicition 56
48 Simple Exponential Smoothing 57
49 SES Alpha 1 to 9 57
50 DES Plot 58
51 TES Auto-fit Model plot 59
52 TES Model plot 60
53 TES Best fit plot 61
54 Stationary check 62
55 Stationary check after differencing approach 63
56 Auto SARIMA Diagnostic plot 67
57 ACF plot 68
58 PACF plot 69
59 Manual SARIMA Diagnostic plot 70
60 Future forecast plot 74

1.1 Wine - Sparkling Data:

As an analyst at ABC Estate Wines, we are presented with historical data


encompassing the sales of different types of wines throughout the 20th
century. These datasets originate from the same company but represent
sales figures for distinct wine varieties. Our objective is to delve into the
data, analyze trends, patterns, and factors influencing wine sales over the
course of the century. By leveraging data analytics and forecasting
techniques, we aim to gain actionable insights that can inform strategic
decision-making and optimize sales strategies for the future.
The primary objective of this project is to analyze and forecast wine sales
trends for the 20th century based on historical data provided by ABC
Estate Wines. We aim to equip ABC Estate Wines with the necessary
insights and foresight to enhance sales performance, capitalize on
emerging market opportunities, and maintain a competitive edge in the
wine industry.
1.2 Basic Analysis:
Shape:

There are 187 rows and 2 columns available in the given dataset.

First 5 rows of the dataset:

Table 1 : Top 5 rows


Last 5 rows of the dataset:

Table 2 : Last 5 rows

Duplicate Check:

There are no duplicate values in the Sparkling dataset.

Missing value Check:

There are no missing values in the Sparkling dataset.

Table 3 : Missing value check

Info Check:

Table 4 : Info check


In the Sparkling dataset we have one Object-type column and one Integer-type column.

Descriptive Summary:

Table 5 : Descriptive summary

- The maximum sale of Sparkling wine is 7,242 and the minimum is 1,070.

- The average sale of this particular type of wine is 2,402.

1.3 Time series conversion:


We have to convert the given data into time series data by changing the
‘YearMonth’ column into Date format and naming the new column Time
Stamp.

Table 6 : Top 5 rows after Converting

Let's drop the YearMonth column, which is not required after converting it
into date format.
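The conversion described above can be sketched with pandas. The three sample rows below are illustrative stand-ins for the real file, and the column names (`YearMonth`, `Sparkling`, `Time_Stamp`) follow the report.

```python
import pandas as pd

# Sketch of the YearMonth -> Time_Stamp conversion described above.
# The three sample rows are illustrative stand-ins for the real file.
df = pd.DataFrame({
    "YearMonth": ["1980-01", "1980-02", "1980-03"],
    "Sparkling": [1686, 1591, 2304],
})

# Parse the text column into datetimes, store it as Time_Stamp, make it
# the index, and drop the no-longer-needed YearMonth column.
df["Time_Stamp"] = pd.to_datetime(df["YearMonth"], format="%Y-%m")
df = df.drop(columns=["YearMonth"]).set_index("Time_Stamp")
```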

Before:

After:

Figure 1 - Time series plot before & After Time stamp

- We can see that seasonality follows a yearly pattern.

- From 1987 to 1989 there is a spike in sales compared to other years.

1.4 Exploratory data analysis:

Figure 2 - Year wise sales analysis

- From the above year-wise sales analysis we can see most of the years have
outliers except the year 1995. We need not treat the outliers.
- Sales peaked during the years 1987 to 1989.

Figure 3 - Monthly sales across years

- We could see increase in sales during the month of Dec across years, when
compared to other months.
- Oct, Nov and Dec shows increasing trend in sales, which could be due to
Holiday season.

Figure 4 - Weekly sales across years

From the above plot it is clear that Saturdays have the highest number of
sales compared to other days.
It is clear that people tend to buy more during weekends.

Figure 5 - Empirical Cumulative Distribution plot

- We can see the highest sale reached about 7,000.

- 50% of sales are within 4,000.
Figure 6 - Average monthly sales across years

- The average monthly sales for December are approximately 5,900.

Table 7 : Pivot table

- From the above pivot table we can understand the monthly sales across years
in detail.
- The numbers coloured in red indicate the highest sales for that particular
month.
Figure 7 - Month wise comparison plot

- This plot also indicates the same trend, with December having the highest
sales across all years, followed by November and October.

Figure 8 - Average and Percentage sales

- In certain years we can see up to a 75% drop in sales.
1.5 Decomposition:
Decomposition is used as a preprocessing step in time series analysis to
remove trend and seasonal effects, making it easier to identify and model
the underlying patterns or relationships in the data.
It can also be used for forecasting, anomaly detection, and other
applications in time series analysis and forecasting.
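As a rough illustration of what decomposition does (the report itself relies on statsmodels' seasonal_decompose), the additive version can be hand-rolled on a synthetic monthly series: estimate the trend with a centred moving average, average the detrended values by month to get the seasonal component, and keep the remainder as the residual.

```python
import numpy as np
import pandas as pd

# Hand-rolled additive decomposition on a synthetic monthly series; a
# sketch of the idea only -- the report uses seasonal_decompose.
idx = pd.date_range("1980-01-01", periods=48, freq="MS")
t = np.arange(48)
pattern = np.tile([0, 5, -3, 2, 0, -4, 3, 1, -2, 4, -1, -5], 4)
y = pd.Series(100 + 0.5 * t + pattern, index=idx)

trend = y.rolling(window=12, center=True).mean()           # smooth trend
detrended = y - trend
seasonal = detrended.groupby(idx.month).transform("mean")  # month effects
residual = y - trend - seasonal                            # what is left
```

Adding the three parts back together reproduces the series, which is exactly the "Trend + Season + Residual" identity stated below.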

Additive Decompose:
- We can see that seasonality is relatively constant over the years.
- The trend keeps changing over the years.
- The residuals are located around 0 in the residual plot below.

Figure 9 - Additive decompose

Table 8 - Additive decompose values

Multiplicative Decompose:

Figure 10 - Multiplicative decompose

Table 9 - Multiplicative decompose values

- Trend and seasonality remain the same between the Additive and
Multiplicative decompositions.
- The residual is around 1 in the Multiplicative decomposition.

Additive Decompose = Trend + Season + Residual


Multiplicative Decompose = Trend * Season * Residual

1.6 Train and Test Split:


- We are going to split the data into Train and Test.
- Train set till Dec 1990 and Test from Jan 1991.
- After splitting the data below is the shape of Train and Test set.

- From the below screenshots we can see the Train and Test set split are
achieved as expected.
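The split can be sketched with pandas date slicing. The values below are stand-ins; only the 187-month Jan 1980 to Jul 1995 index mimics the report's data.

```python
import numpy as np
import pandas as pd

# Date-based split sketch: train up to Dec 1990, test from Jan 1991.
idx = pd.date_range("1980-01-01", "1995-07-01", freq="MS")
sales = pd.Series(np.arange(len(idx), dtype=float), index=idx, name="Sales")

train = sales[:"1990-12"]    # up to and including Dec 1990
test = sales["1991-01":]     # from Jan 1991 onwards
```

This yields 132 training months and 55 test months out of the 187 total.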

Table 10 - First & last 5 rows of Train and Test set

Figure 11 - Train and Test split

The above plot helps us to visually see the range between Train and Test
set.

1.7 Model Predictions:


Linear regression: We are going to regress the 'Sales' variable against the
order of occurrence. For this we need to modify our training data
before fitting it into a linear regression.

Figure 12 - Linear Regression

The green and red lines indicate the model performance on the actual test
& train values.

The RMSE value for Linear regression is 1389.14
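The regress-on-time-order idea can be sketched with a plain least-squares line (numpy.polyfit here, which may differ from the regression class used in the report); the series is synthetic.

```python
import numpy as np

# Linear regression against the order of occurrence, on synthetic data:
# train months are numbered 1..132 and the test months continue 133..187.
rng = np.random.default_rng(0)
train_t = np.arange(1, 133)
test_t = np.arange(133, 188)
train_y = 2000 + 5 * train_t + rng.normal(0, 50, len(train_t))
test_y = 2000 + 5 * test_t + rng.normal(0, 50, len(test_t))

slope, intercept = np.polyfit(train_t, train_y, deg=1)
pred = intercept + slope * test_t                  # forecast on test
rmse = np.sqrt(np.mean((test_y - pred) ** 2))      # report's error metric
```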

Naives Model:

Figure 13 - Naives Model


The green line indicates the model performance on the orange test area,
where we can see the predicted values are far from the actual values.

The RMSE value for Naives Model is 3864.28
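A naive forecast simply repeats the last training observation; a minimal sketch with made-up numbers, including the RMSE computation used throughout the report.

```python
import numpy as np

# Naive model sketch: every test point is forecast as the last train
# value. The numbers are made up for illustration.
train = np.array([1686.0, 1591.0, 2304.0, 3265.0, 1870.0])
test = np.array([1902.0, 2049.0, 1874.0])

naive_forecast = np.full_like(test, train[-1])     # repeat last value
rmse = np.sqrt(np.mean((test - naive_forecast) ** 2))
```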

Simple Average:

Figure 14 - Simple Average

The green line indicates the Simple Average model prediction on the
orange test area, where we can see the predicted values and actual
values are far apart.

The RMSE value for Simple average Model is 1275.08

Moving Average:

Figure 15 - MA prediction on whole data

Figure 16 - MA prediction on Test data

Table 11 - Moving Average forecast value

The above table shows the moving-average performance for windows 2 to 9.
We can see the rolling average performs better than the simple
average. The higher the rolling window, the smoother the curve, since
more values are taken into account.
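The window-size effect described above can be sketched with pandas rolling means on a synthetic series; the mean absolute month-to-month change is used here as a crude smoothness score (an illustrative choice, not the report's metric).

```python
import numpy as np
import pandas as pd

# Rolling averages for windows 2..9 on a synthetic wavy series; larger
# windows average more values and so should wiggle less.
idx = pd.date_range("1980-01-01", periods=60, freq="MS")
sales = pd.Series(2000 + 50 * np.sin(np.arange(60)), index=idx)

# Crude smoothness score: mean absolute month-to-month change of the
# rolling mean (smaller = smoother).
smoothness = {
    w: sales.rolling(window=w).mean().diff().abs().mean()
    for w in range(2, 10)
}
```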
Model Comparison:

Figure 17 - Different model comparison

The above plot shows the performance of different models on Test values.
Simple Exponential Smoothing:

Figure 18 - Simple Exponential Smoothing


The auto-picked smoothing parameter is Alpha 0.049, with an RMSE
of 1316.136.
Let's run tests with different Alpha values to find the best fit.

Figure 19 - SES Alpha 1 to 9

Table 12 - SES values between Alpha 1 to 9

The above table gives the RMSE values for Alpha 0.1 to 0.9.
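The alpha-grid mechanics can be hand-rolled (the report uses statsmodels; this sketch only illustrates how SES works and why its forecast is flat).

```python
import numpy as np

# Hand-rolled simple exponential smoothing over an alpha grid.
def ses_level(y, alpha):
    """Final smoothed level of y; SES forecasts are flat at this value."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level

rng = np.random.default_rng(1)
train = 2000 + rng.normal(0, 100, 120)   # synthetic stand-in data
test = 2000 + rng.normal(0, 100, 24)

rmse_by_alpha = {}
for alpha in [a / 10 for a in range(1, 10)]:      # 0.1, 0.2, ..., 0.9
    forecast = ses_level(train, alpha)            # one flat forecast
    rmse_by_alpha[alpha] = np.sqrt(np.mean((test - forecast) ** 2))
```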


Double Exponential Smoothing:

Figure 20 - DES Plot

Alpha=0.1,Beta=0.1,DoubleExponentialSmoothing 1778.564670

For Double Exponential Smoothing with Alpha 0.1 & Beta 0.1, the RMSE
value is 1778.56.
The above RMSE is derived from the auto-fit model.

Triple Exponential Smoothing:

The auto-fit Triple Exponential Smoothing model has Alpha 0.12, Beta
0.049 and Gamma 0.361.

The RMSE is 404.418

Figure 21 - TES Auto-fit model Plot

Figure 22 - TES Best fit model Plot

The above plot is the best fit model in Triple Exponential Smoothing.

Alpha=0.4,Beta=0.1,Gamma=0.2,TripleExponentialSmoothing 317.434302

The best-fit Triple Exponential Smoothing uses Alpha 0.4, Beta 0.1 and
Gamma 0.2, with an RMSE of 317.43.
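The report's TES models come from statsmodels' ExponentialSmoothing; purely to show what the three parameters do, below is a minimal hand-rolled additive Holt-Winters recursion run with the best-fit values (Alpha 0.4, Beta 0.1, Gamma 0.2) on a synthetic series.

```python
import numpy as np

def holt_winters_additive(y, alpha, beta, gamma, m):
    """Minimal additive Holt-Winters; returns one-step-ahead fits."""
    y = np.asarray(y, dtype=float)
    level = y[:m].mean()                                # initial level
    trend = (y[m:2 * m].mean() - y[:m].mean()) / m      # initial trend
    season = list(y[:m] - level)                        # initial indices
    fitted = []
    for t, obs in enumerate(y):
        s = season[t % m]
        fitted.append(level + trend + s)                # forecast for t
        new_level = alpha * (obs - s) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[t % m] = gamma * (obs - new_level) + (1 - gamma) * s
        level = new_level
    return np.array(fitted)

# Synthetic trending + seasonal series, smoothed with the report's
# best-fit parameters (alpha=0.4, beta=0.1, gamma=0.2, m=12).
t = np.arange(120)
y = 50 + 0.5 * t + np.tile([0, 4, -2, 3, 0, -5, 2, 1, -3, 5, -1, -4], 10)
fitted = holt_winters_additive(y, alpha=0.4, beta=0.1, gamma=0.2, m=12)
```

Alpha smooths the level, Beta the trend, and Gamma the twelve seasonal indices, which is why TES handles this series where SES and DES cannot.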

Best Model fit:

We have built different models and have an idea of which model gives the least
error on the test data set.

Based on the RMSE score we need to build the model for the whole data
in order to see the forecast.

Table 13 - RMSE across multiple models

From the above table we could see RMSE for all the models we built.
The least error derived from both Triple exponential smoothing models.

Alpha=0.12,Beta=0.049,Gamma=0.361,TripleExponentialSmoothing 404.418372
Alpha=0.4,Beta=0.1,Gamma=0.2,TripleExponentialSmoothing 317.434302

1.8 Stationary Check:
P < 0.05 = We reject the null Hypothesis.
P > 0.05 = We fail to reject the null Hypothesis.

Figure 23 - Stationary Check

Table 14 - Dickey Fuller test results

We can see the p-value is 0.601061, which is greater than 0.05. Hence
we fail to reject the null hypothesis.
The dataset for Sparkling wine is non-stationary. We don't have enough
evidence to prove that it is stationary.

In order to make it stationary, let's take a difference of order 1 and drop
the NaN values from the dataset.
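The order-1 differencing step can be sketched with pandas (the Dickey-Fuller test itself comes from statsmodels' adfuller and is not reproduced here).

```python
import numpy as np
import pandas as pd

# Order-1 differencing sketch on a synthetic trending series.
idx = pd.date_range("1980-01-01", periods=100, freq="MS")
trending = pd.Series(1000.0 + 10.0 * np.arange(100), index=idx)

diffed = trending.diff().dropna()   # difference of order 1, NaNs dropped
# The steady upward drift is gone: every differenced value is the slope,
# so the mean no longer shifts over time.
```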

Figure 24 - Stationary Check after differencing approach

Table 15 - Dickey Fuller test result after differencing approach


Param AIC
10 (2, 1, 2) 2213.509212
15 (3, 1, 3) 2221.452537
14 (3, 1, 2) 2230.816576
11 (2, 1, 3) 2232.917659
9 (2, 1, 1) 2233.777626
3 (0, 1, 3) 2233.994858
2 (0, 1, 2) 2234.408323
6 (1, 1, 2) 2234.527200
13 (3, 1, 1) 2235.499067
7 (1, 1, 3) 2235.607814
5 (1, 1, 1) 2235.755095
12 (3, 1, 0) 2257.723379
8 (2, 1, 0) 2260.365744
1 (0, 1, 1) 2263.060016
4 (1, 1, 0) 2266.608539
0 (0, 1, 0) 2267.663036

From the above Dickey-Fuller test we can see the p-value is 0.000, which
is less than 0.05. Hence we reject the null hypothesis and conclude
that the data is stationary.

1.9 Auto ARIMA & SARIMA:


Auto ARIMA:
We use iteration to find the optimum p, d, q.
AR(Auto Regressive) - p
Differencing to make series stationary - d
MA(Moving Average) - q
We fix the value of d as 1, since we have already determined d to be 1.
Model: (0, 1, 1)
Model: (0, 1, 2)
Model: (0, 1, 3)
Model: (1, 1, 0)
Model: (1, 1, 1)
Model: (1, 1, 2)
Model: (1, 1, 3)
Model: (2, 1, 0)
Model: (2, 1, 1)
Model: (2, 1, 2)
Model: (2, 1, 3)
Model: (3, 1, 0)
Model: (3, 1, 1)
Model: (3, 1, 2)
Model: (3, 1, 3)

The Akaike Information Criterion (AIC) is computed for each model, and
the model with the lowest AIC is selected.
From the summary below we choose the model with the lowest AIC,
which is p=2, d=1, q=2 for the Auto ARIMA model.
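The grid-search-by-lowest-AIC pattern can be illustrated without statsmodels by fitting AR(p) models with plain least squares and scoring them with the Gaussian AIC approximation n·ln(RSS/n) + 2k; this is a sketch of the selection logic only, not the report's actual ARIMA fits.

```python
import numpy as np

# AIC-based order selection sketch: least-squares AR(p) fits on a
# simulated AR(2) series, scored with n*ln(RSS/n) + 2p.
rng = np.random.default_rng(2)
y = np.zeros(300)
for t in range(2, 300):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

def ar_aic(y, p):
    """Fit AR(p) by least squares and return its approximate AIC."""
    X = np.column_stack([y[p - i - 1 : len(y) - i - 1] for i in range(p)])
    target = y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = np.sum((target - X @ coef) ** 2)
    n = len(target)
    return n * np.log(rss / n) + 2 * p

aics = {p: ar_aic(y, p) for p in range(1, 5)}
best_p = min(aics, key=aics.get)     # the lowest-AIC order wins
```

The 2p penalty is what stops ever-larger models from winning on fit alone.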

Table 16 - ARIMA - AIC Value

Table 17 - Auto ARIMA Summary


Auto ARIMA Test RMSE value is 1299.9799

Auto SARIMA:

Similar to Auto ARIMA, we apply a for loop to determine the optimum
values.

Examples of some parameter combinations for Model...


Model: (0, 1, 1)(0, 0, 1, 12)
Model: (0, 1, 2)(0, 0, 2, 12)
Model: (0, 1, 3)(0, 0, 3, 12)
Model: (1, 1, 0)(1, 0, 0, 12)
Model: (1, 1, 1)(1, 0, 1, 12)
Model: (1, 1, 2)(1, 0, 2, 12)
Model: (1, 1, 3)(1, 0, 3, 12)
Model: (2, 1, 0)(2, 0, 0, 12)
Model: (2, 1, 1)(2, 0, 1, 12)
Model: (2, 1, 2)(2, 0, 2, 12)
Model: (2, 1, 3)(2, 0, 3, 12)
Model: (3, 1, 0)(3, 0, 0, 12)
Model: (3, 1, 1)(3, 0, 1, 12)
Model: (3, 1, 2)(3, 0, 2, 12)
Model: (3, 1, 3)(3, 0, 3, 12)

Table 18 - SARIMA - AIC Value

From the above summary we are going to choose the AIC with least value.

Table 19 - Auto SARIMA Summary

Figure 25 - SARIMA Diagnostic Plot


Auto SARIMA Test RMSE value is 528.586

1.10 ACF & PACF Plot:

Figure 26 - ACF Plot

In the ACF plot we can see decay after lag 1 for the original as well as the
differenced data. Hence we select the q value to be 1, i.e. q=1.

Figure 27 - PACF Plot


In the PACF plot we can see significant bars up to lag 1 for the differenced
data, which is stationary in nature; beyond lag 1 the decay is large.
Hence we choose the p value to be 1, i.e. p=1, with d=1 since we have
already seen the series becomes stationary after one difference.
Hence the values selected for manual ARIMA are p=1, d=1, q=1.
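The cut-off logic behind reading the ACF can be shown with a hand-computed autocorrelation function on a simulated MA(1) series (the report's plots come from statsmodels' plot_acf / plot_pacf).

```python
import numpy as np

# Hand-computed sample autocorrelations for a simulated MA(1) process,
# whose ACF should cut off after lag 1 -- the pattern used to pick q.
rng = np.random.default_rng(3)
e = rng.normal(size=500)
y = e[1:] + 0.8 * e[:-1]

def acf(x, lag):
    """Sample autocorrelation of x at the given positive lag."""
    x = x - x.mean()
    return np.sum(x[lag:] * x[:-lag]) / np.sum(x * x)

acfs = [acf(y, k) for k in range(1, 6)]   # lags 1..5
```

Only the lag-1 value stands clear of zero; the rest sit inside the noise band, which is what "decay after lag 1" means in the plot.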

1.11 Manual ARIMA & SARIMA:


Manual ARIMA:

Table 20 - Manual ARIMA Summary

The Model prediction of Manual ARIMA RMSE is 1319.937.

Manual SARIMA:

Selected Manual SARIMA is (1, 1, 1) (1, 1, 1, 12)
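The (1, 1, 1)(1, 1, 1, 12) specification applies one regular difference (d=1) and one seasonal lag-12 difference (D=1); what those two operations do can be sketched on a synthetic series with a linear trend and a repeating yearly pattern.

```python
import numpy as np
import pandas as pd

# Seasonal (lag-12) plus regular differencing, as in d=1, D=1, m=12.
idx = pd.date_range("1980-01-01", periods=72, freq="MS")
t = np.arange(72)
pattern = np.tile(np.arange(12) * 5.0, 6)     # repeating yearly shape
y = pd.Series(100.0 + 2.0 * t + pattern, index=idx)

seasonal_diff = y.diff(12).dropna()           # removes the yearly pattern
regular_diff = seasonal_diff.diff().dropna()  # removes the leftover drift
```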

Table 21 - Manual SARIMA Summary

Figure 28 - Manual SARIMA Diagnostic Plot


The Model prediction of Manual SARIMA RMSE is 359.613

1.12 Test RMSE across Models built:

Table 22 - RMSE across Models

From the above table we can see the least value is achieved by Triple
Exponential Smoothing. Hence we will use the TES model on the entire
dataset.

1.13 Optimum model on Whole dataset:

Now we are going to fit the optimum model, identified from all the model
predictions, on the whole dataset, without any Train and Test split.

Table 23 - TES on entire data
RMSE value on whole dataset is 414.324

1.14 Future Prediction:

Table 24 - Forecast for 12 months


Table 25 - Forecast Summary

Table 26 - Upper and Lower CI

The above table gives the prediction, Lower CI and Upper CI for future 12
months from Aug 1995 to Jul 1996.
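The Lower/Upper CI columns come from the fitted model's forecast variance; a rough normal-approximation sketch of the idea (not the exact SARIMA interval) is point forecast ± 1.96·σ of the residuals.

```python
import numpy as np

# Rough confidence-interval sketch: point forecast +/- 1.96 residual
# standard deviations. Numbers are made up; real SARIMA intervals also
# widen with the forecast horizon.
residuals = np.array([120.0, -80.0, 40.0, -150.0, 90.0, -20.0])
point_forecast = 2500.0

sigma = residuals.std(ddof=1)          # sample std of in-sample errors
lower_ci = point_forecast - 1.96 * sigma
upper_ci = point_forecast + 1.96 * sigma
```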

Figure 29 - Future forecast plot

*************************************
2.1 Wine - Rose Data:

As an analyst at ABC Estate Wines, we are presented with historical data


encompassing the sales of different types of wines throughout the 20th
century. These datasets originate from the same company but represent
sales figures for distinct wine varieties. Our objective is to delve into the
data, analyze trends, patterns, and factors influencing wine sales over the
course of the century. By leveraging data analytics and forecasting
techniques, we aim to gain actionable insights that can inform strategic
decision-making and optimize sales strategies for the future.
2.2 Basic Analysis:
Shape:
There are 187 rows and 2 columns available in the given dataset.

Duplicate Check:

There are no duplicate values in the Rose dataset.


First 5 rows of the dataset:

Table 27 : Top 5 rows


Last 5 rows of the dataset:

Table 28 : Last 5 rows

Missing value Check:

There are 2 missing values in the Rose dataset.

Table 29 : Missing value check

We will treat the missing values with the Linear Interpolation method.
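Linear interpolation fills each gap along the straight line between its neighbours; a pandas sketch with made-up Rose values.

```python
import numpy as np
import pandas as pd

# Linear interpolation of missing values, as applied to the Rose data;
# the six sample values are made up for illustration.
idx = pd.date_range("1994-01-01", periods=6, freq="MS")
rose = pd.Series([40.0, 44.0, np.nan, np.nan, 56.0, 60.0], index=idx)

rose = rose.interpolate(method="linear")   # fill gaps along a line
```

The two NaNs become 48 and 52, evenly spaced between the surrounding 44 and 56.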

Info Check:

Table 30 : Info check

In the Rose dataset we have one Object-type column and one Float-type column.

Descriptive Summary:

Table 31 : Descriptive summary

- The maximum sale of Rose wine is 267 and the minimum is 28.

- The average sale of this particular type of wine is 89.91.

Figure 30 - Plot before Time stamp

2.3 Time series conversion:


We have to convert the given data into time series data by changing the
‘YearMonth’ column into Date format and naming the new column Time
Stamp.

Table 32 : Top 5 rows after Converting

Let's drop the YearMonth column, which is not required after converting it
into date format.

Figure 31 - Time series plot after Time stamp

- We can see that seasonality follows a yearly pattern.

- During 1981 there is a spike in sales compared to other years.

2.4 Exploratory data analysis:

Figure 32 - Year wise sales analysis

- From the above year-wise sales analysis we can see certain years have
outliers. We need not treat the outliers.
- Sales peaked during the years 1980 & 1981.
Figure 33 - Monthly sales across years

- We could see increase in sales during the month of Dec across years, when
compared to other months.
- Holiday seasons show increasing trend in sales, when compared to other
seasons.

Figure 34 - Weekly sales across years

When we do a weekly comparison, we can see that all days except Saturday and
Sunday have outliers.
It looks like Rose wine was bought across all days, irrespective of the
weekend.
Figure 35 - Empirical Cumulative Distribution plot

- We can see the highest sale reached about 250.

- 50% of sales are within 150.

Figure 36 - Average monthly sales across years


- The average monthly sales for December are approximately 250.

Table 33 : Pivot table

- From the above pivot table we can understand the monthly sales across years
in detail.
- The cells coloured in green indicate the highest sales for that particular
month.

Figure 37 - Month wise comparison plot

Figure 38 - Average and Percentage sales

- Each year shows a certain percentage drop in sales, from a minimum of 40%
to a maximum of more than 60%.
2.5 Decomposition:
Decomposition is used as a preprocessing step in time series analysis to
remove trend and seasonal effects, making it easier to identify and model
the underlying patterns or relationships in the data.
It can also be used for forecasting, anomaly detection, and other
applications in time series analysis and forecasting.

Additive Decompose:
From the plot below we can see the trend peaks during 1981 and keeps
declining over the years.

The residual values are spread out rather than lying on a straight line.

We can also see Seasonality and Trend present in the dataset.

Figure 39 - Additive decompose

Table 34 - Additive decompose values

Multiplicative Decompose:

Figure 40 - Multiplicative decompose

Table 35 - Multiplicative decompose values


- Trend and seasonality remain the same between the Additive and
Multiplicative decompositions.
- The residual is flat around 1 in the Multiplicative decomposition.
- The residual lies between 0 and 1 in the Multiplicative decomposition,
whereas in the Additive one it was spread between 0 and 50.
- We can say the Multiplicative model is more stable in the residual
component than the Additive one.

Additive Decompose = Trend + Season + Residual


Multiplicative Decompose = Trend * Season * Residual

2.6 Train and Test Split:


- We are going to split the data into Train and Test.
- Train set till Dec 1990 and Test from Jan 1991.
- After splitting the data below is the shape of Train and Test set.

- From the below screenshots we can see the Train and Test set split are
achieved as expected.

Figure 41 - Train and Test split


The above plot helps us to visually see the range between Train and Test
set.

2.7 Model Predictions:


Linear regression: We are going to regress the 'Sales' variable against the
order of occurrence. For this we need to modify our training data
before fitting it into a linear regression.

Figure 42 - Linear Regression

The green and red lines indicate the model performance on the actual test
& train values.

The RMSE value for Linear regression is 15.27

Naives Model:
The green line indicates the model performance on the orange test area,
where we can see the predicted values are far from the actual values.

The RMSE value for Naives Model is 79.72


Figure 43 - Naives Model

Simple Average:

Figure 44 - Simple Average


The green line indicates the Simple Average model prediction on the
orange test area, where we can see the predicted values and actual
values are far apart.

The RMSE value for Simple average Model is 53.46

Moving Average:

Figure 45 - MA prediction on whole data

Figure 46 - MA prediction on Test data


Table 36 - Moving Average forecast value

The above table shows the moving-average performance for windows 2 to 9.
We can see the rolling average performs better than the simple
average. The higher the rolling window, the smoother the curve, since
more values are taken into account.
Model Comparison:

Figure 47 - Different model comparison

The above plot shows the performance of different models on Test values.

Table 37 - RMSE values on Models


Simple Exponential Smoothing:

Figure 48 - Simple Exponential Smoothing

The auto-picked smoothing parameter is Alpha = 0.098, with an RMSE
of 37.592.

Let's run tests with different Alpha values to find the best fit.

Figure 49 - SES Alpha 1 to 9


Table 38 - SES values between Alpha 1 to 9

The above table gives the RMSE values for Alpha 0.1 to 0.9.

Double Exponential Smoothing:

Figure 50 - DES Plot

Alpha=0.1,Beta=0.1,DoubleExponentialSmoothing 36.923416

For Double Exponential Smoothing with Alpha 0.1 & Beta 0.1, the RMSE
value is 36.92.

The above RMSE is derived from the auto-fit model.

Triple Exponential Smoothing:

The auto-fit Triple Exponential Smoothing model has Alpha = 0.072,
Beta = 0.044 and Gamma = 1.366.

The RMSE is 20.741

Figure 51 - TES Auto-fit model Plot


Best Model fit:

Below is the best fit model in Triple Exponential Smoothing.

Table 39 - TES values to find best fit

The best-fit Triple Exponential Smoothing uses Alpha 0.1, Beta 0.2 and
Gamma 0.1, with an RMSE of 9.22.

Figure 52 - TES model Plot

We have built different models and have an idea of which model gives the least
error on the test data set.

Based on the RMSE score we need to build the model for the whole data
in order to see the forecast.

Table 40 - RMSE across multiple models

From the above table we could see RMSE for all the models we built.
The least error derived from both Triple exponential smoothing models.

Figure 53 - TES Best fit model Plot


2.8 Stationary Check:
P < 0.05 = We reject the null Hypothesis.
P > 0.05 = We fail to reject the null Hypothesis.

Figure 54 - Stationary Check

Table 41 - Dickey Fuller test results


We can see the p-value is 0.343101, which is greater than 0.05. Hence
we fail to reject the null hypothesis.

The dataset for Rose wine is non-stationary. We don't have enough
evidence to prove that it is stationary.

In order to make it stationary, let's take a difference of order 1 and drop
the NaN values from the dataset.

Figure 55 - Stationary Check after differencing approach

Table 42 - Dickey Fuller test result after differencing approach


From the above Dickey-Fuller test we can see the p-value is 1.810895e-12,
which is less than 0.05. Hence we reject the null hypothesis and
conclude that the data is stationary.

2.9 Auto ARIMA & SARIMA:


Auto ARIMA:
We use iteration to find the optimum p, d, q.
AR(Auto Regressive) - p
Differencing to make series stationary - d
MA(Moving Average) - q
We fix the value of d as 1, since we have already determined d to be 1.
Some parameter combinations for the Model...
Model: (0, 1, 1)
Model: (0, 1, 2)
Model: (0, 1, 3)
Model: (1, 1, 0)
Model: (1, 1, 1)
Model: (1, 1, 2)
Model: (1, 1, 3)
Model: (2, 1, 0)
Model: (2, 1, 1)
Model: (2, 1, 2)
Model: (2, 1, 3)
Model: (3, 1, 0)
Model: (3, 1, 1)
Model: (3, 1, 2)
Model: (3, 1, 3)

The Akaike Information Criterion (AIC) is computed for each model, and
the model with the lowest AIC is selected.
From the summary below we choose the model with the lowest AIC,
which is p=2, d=1, q=3 for the Auto ARIMA model.

Table 43 - ARIMA - AIC Value

Table 44 - Auto ARIMA Summary


Auto ARIMA Test RMSE value is 136.8175

Auto SARIMA:

Similar to Auto ARIMA, we apply a for loop to determine the optimum
values.
Examples of some parameter combinations for Model...
Model: (0, 1, 1)(0, 0, 1, 12)
Model: (0, 1, 2)(0, 0, 2, 12)
Model: (0, 1, 3)(0, 0, 3, 12)
Model: (1, 1, 0)(1, 0, 0, 12)
Model: (1, 1, 1)(1, 0, 1, 12)
Model: (1, 1, 2)(1, 0, 2, 12)
Model: (1, 1, 3)(1, 0, 3, 12)
Model: (2, 1, 0)(2, 0, 0, 12)
Model: (2, 1, 1)(2, 0, 1, 12)
Model: (2, 1, 2)(2, 0, 2, 12)
Model: (2, 1, 3)(2, 0, 3, 12)
Model: (3, 1, 0)(3, 0, 0, 12)
Model: (3, 1, 1)(3, 0, 1, 12)
Model: (3, 1, 2)(3, 0, 2, 12)
Model: (3, 1, 3)(3, 0, 3, 12)

Table 45 - Auto SARIMA - AIC Value

From the above summary we are going to choose the AIC with least value.

Figure 56 - Auto SARIMA Diagnostic Plot

Auto SARIMA Test RMSE value is 518.883

2.10 ACF & PACF Plot:

Figure 57 - ACF Plot

In the ACF plot we can see decay after lag 2 for the original as well as the
differenced data. Hence we select the q value to be 2, i.e. q=2.

Figure 58 - PACF Plot

In the PACF plot we can see significant bars up to lag 2 for the differenced
data, which is stationary in nature; beyond lag 2 the decay is large.
Hence we choose the p value to be 2, i.e. p=2, with d=1 since the series
becomes stationary after one difference. The values selected for manual
ARIMA are p=2, d=1, q=2.
2.11 Manual ARIMA & SARIMA:
Manual ARIMA:

Table 46 - Manual ARIMA Summary


The Model prediction of Manual ARIMA RMSE is 36.872.

Manual SARIMA:
Selected Manual SARIMA is (2, 1, 2) (2, 1, 2, 12)

Table 47 - Manual SARIMA Summary

Figure 59 - Manual SARIMA Diagnostic Plot


The Model prediction of Manual SARIMA RMSE is 15.1696

2.12 Test RMSE across Models built:

Table 48 - RMSE across Models

From the above table we can see the least value is achieved by Triple
Exponential Smoothing. Hence we will use the TES model on the entire
dataset.

2.13 Optimum model on Whole dataset:


Now we are going to fit the optimum model, identified from all the model
predictions, on the whole dataset, without any Train and Test split.

Table 49 - TES on entire data

RMSE value on whole dataset is 18.899

2.14 Future Prediction:


Forecasted values for the next 12 months:
1995-08-31 46.576143
1995-09-30 48.463220
1995-10-31 51.275198
1995-11-30 59.982543
1995-12-31 84.153330
1996-01-31 32.062233
1996-02-29 40.612599
1996-03-31 47.214012
1996-04-30 48.421536
1996-05-31 40.498449
1996-06-30 47.692170
1996-07-31 52.675485

Table 50 - Forecast for 12 months


Table 51 - Forecast Summary

Table 52 - Upper and Lower CI

The above table gives the prediction, Lower CI and Upper CI for future 12
months from Aug 1995 to Jul 1996.

Figure 60 - Future forecast plot

2.15 Business Insights and Recommendation:


On comparing both wines, we could see Sparkling wine is higher in sales
over the years when compared to Rose wine.

- We can see that people are likely to purchase wine during the festive season,
with a drop in sales during the peak winter season.
- Rose wine has been on a declining trend over the years, with 1980 as an
exception.
- We can expect Rose wine to drop further in future years too.
- Sparkling wine has been popular since 1980 and is expected to maintain
the same trend in the future.
- There is an increase in sale of Sparkling wine during weekends and
Rose wine was purchased all days of the week.
- The company can arrange a campaign during non-peak months.
- The company can think of issuing some samples in stores and Markets
as an advertising strategy.
- The company should also analyse the reason for the lower popularity of Rose
wine and, if required, consider making some changes to the Rose wine
production and marketing strategy.

- Data can also be collected on the customers who purchase Rose wine, and
analytics done to find out what kind of people prefer Rose wine, so changes
can be made to attract other customers too.
- Since Sparkling is already popular we can give some combo offers by
selling Rose wine and Sparkling wine together at a certain discount to
increase sale of Rose wine along with Sparkling wine.

*********** Thank You ***********
