
Time Series Forecasting

Submitted By: Saranya S


Batch Code: PGPDSBA.O.JULY23.B
Contents
S.No Topics Page No
Sparkling Wine
1.1 Wine - Sparkling Data 8
1.2 Basic Analysis 8
1.3 Time Series Conversion 10
1.4 Exploratory Data Analysis 12
1.5 Decomposition 16
1.6 Train & Test Split 18
1.7 Model Prediction 19
1.8 Stationary Check 28
1.9 Auto ARIMA & SARIMA 30
1.10 ACF & PACF Plot 34
1.11 Manual ARIMA & SARIMA 36
1.12 Test RMSE across models built 38
1.13 Optimum model on whole dataset 38
1.14 Future Prediction 39
Rose Wine
2.1 Wine - Rose Data 41
2.2 Basic Analysis 41
2.3 Time Series Conversion 44
2.4 Exploratory Data Analysis 45
2.5 Decomposition 49
2.6 Train & Test Split 52
2.7 Model Prediction 53
2.8 Stationary Check 62
2.9 Auto ARIMA & SARIMA 64

2.10 ACF & PACF Plot 67
2.11 Manual ARIMA & SARIMA 69
2.12 Test RMSE across models built 71
2.13 Optimum model on whole dataset 71
2.14 Future Prediction 72
2.15 Business Insights and Recommendation 74

List of Tables
S.No Topics Page No
1 Top 5 Rows 8
2 Last 5 Rows 9
3 Missing value check 9
4 Info Check 9
5 Descriptive Summary 10
6 Top 5 rows after converting 10
7 Pivot Table 14
8 Additive decompose value 17
9 Multiplicative decompose value 18
10 First and last 5 rows of Train & Test set 19
11 Moving Average forecast value 22
12 SES values between Alpha 1 to 9 24
13 RMSE across multiple models 27
14 Dickey fuller test results 28
15 Dickey fuller test results after differencing approach 29
16 ARIMA - AIC values 31
17 Auto ARIMA Summary 31
18 SARIMA - AIC values 32

19 Auto SARIMA Summary 33
20 Manual ARIMA Summary 36
21 Manual SARIMA Summary 37
22 RMSE across models 38
23 TES on entire data 39
24 Forecast for 12 months 39
25 Forecast Summary 40
26 Upper and Lower CI 40
Rose Wine
27 Top 5 Rows 42
28 Last 5 Rows 42
29 Missing value check 42
30 Info Check 43
31 Descriptive Summary 43
32 Top 5 rows after converting 44
33 Pivot Table 48
34 Additive decompose value 50
35 Multiplicative decompose value 51
36 Moving Average forecast value 56
37 RMSE value on Models 56
38 SES values between Alpha 1 to 9 58
39 TES value to find best fit 60
40 RMSE across multiple models 61
41 Dickey fuller test results 62
42 Dickey fuller test results after differencing approach 63
43 ARIMA - AIC values 65
44 Auto ARIMA Summary 65

45 Auto SARIMA Summary 66
46 Manual ARIMA Summary 69
47 Manual SARIMA Summary 70
48 RMSE across models 71
49 TES on entire data 72
50 Forecast for 12 months 72
51 Forecast Summary 73
52 Upper and Lower CI 73
List of Figures
S.No Topics Page No
1 Time series plot before & after time stamp 12
2 Year wise sales analysis 12
3 Monthly sales analysis 12
4 Weekly sales across years 13
5 Empirical Cumulative Distribution plot 13
6 Average Monthly sales across years 14
7 Month wise Comparison plot 15
8 Average and Percentage sales 15
9 Additive Decompose 16
10 Multiplicative Decompose 17
11 Train & Test Split 19
12 Linear Regression 20
13 Naives Model 20
14 Simple Average 21
15 MA prediction on Whole data 22
16 MA precision on Test data 22
17 Different model comparison 23
18 Simple Exponential Smoothing 23
19 SES Alpha 1 to 9 24
20 DES plot 25
21 TES Auto-fit model plot 26
22 TES Best fit model plot 26
23 Stationary check 28
24 Stationary check after differencing approach 29
25 SARIMA Diagnostic plot 33
26 ACF Plot 34
27 PACF plot 35
28 Manual SARIMA Diagnostic Plot 37
29 Future forecast plot 41
30 Plot before Time stamp 44
31 Time series plot after Time stamp 45
32 Year wise sales analysis 45
33 Monthly sales across years 46
34 Weekly sales across years 46
35 Empirical Cumulative Distribution plot 47
36 Average monthly sales across years 47
37 Month wise Comparison plot 48
38 Average & Percentage sales 49
39 Additive Decompose 50
40 Multiplicative Decompose 51
41 Train and Test Split 52
42 Linear Regression 53
43 Naives Model 54
44 Simple Average 54
45 MA prediction Whole data 55
46 MA prediction on Test data 55
47 Different Model predicition 56
48 Simple Exponential Smoothing 57
49 SES Alpha 1 to 9 57
50 DES Plot 58
51 TES Auto-fit Model plot 59
52 TES Model plot 60
53 TES Best fit plot 61
54 Stationary check 62
55 Stationary check after differencing approach 63
56 Auto SARIMA Diagnostic plot 67
57 ACF plot 68
58 PACF plot 69
59 Manual SARIMA Diagnostic plot 70
60 Future forecast plot 74

1.1 Wine - Sparkling Data:

As an analyst at ABC Estate Wines, we are presented with historical data


encompassing the sales of different types of wines throughout the 20th
century. These datasets originate from the same company but represent
sales figures for distinct wine varieties. Our objective is to delve into the
data, analyze trends, patterns, and factors influencing wine sales over the
course of the century. By leveraging data analytics and forecasting
techniques, we aim to gain actionable insights that can inform strategic
decision-making and optimize sales strategies for the future.
The primary objective of this project is to analyze and forecast wine sales
trends for the 20th century based on historical data provided by ABC
Estate Wines. We aim to equip ABC Estate Wines with the necessary
insights and foresight to enhance sales performance, capitalize on
emerging market opportunities, and maintain a competitive edge in the
wine industry.
1.2 Basic Analysis:
Shape:

There are 187 rows and 2 columns available in the given dataset.

First 5 rows of the dataset:

Table 1 : Top 5 rows


Last 5 rows of the dataset:

Table 2 : Last 5 rows

Duplicate Check:

There are no duplicate values in the Sparkling dataset.

Missing value Check:

There are no missing values in the Sparkling dataset.

Table 3 : Missing value check

Info Check:

Table 4 : Info check


In the Sparkling dataset we have one Object-type column and one Integer-type column.

Descriptive Summary:

Table 5 : Descriptive summary

- The maximum sale of Sparkling wine is 7,242 and the minimum is 1,070.

- The average sale of this particular type of wine is 2,402.

1.3 Time series conversion:


We have to convert the given data into time series data by changing the
‘YearMonth’ column into Date format and naming the new column Time
Stamp.

Table 6 : Top 5 rows after Converting

Let's drop the YearMonth column, which is not required after converting it
into date format.
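The conversion described above can be sketched with pandas. The three sample rows below are illustrative stand-ins for the real file, and the column names (`YearMonth`, `Sparkling`, `Time_Stamp`) follow the report.

```python
import pandas as pd

# Sketch of the YearMonth -> Time_Stamp conversion described above.
# The three sample rows are illustrative stand-ins for the real file.
df = pd.DataFrame({
    "YearMonth": ["1980-01", "1980-02", "1980-03"],
    "Sparkling": [1686, 1591, 2304],
})

# Parse the text column into datetimes, store it as Time_Stamp, make it
# the index, and drop the no-longer-needed YearMonth column.
df["Time_Stamp"] = pd.to_datetime(df["YearMonth"], format="%Y-%m")
df = df.drop(columns=["YearMonth"]).set_index("Time_Stamp")
```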

Before:

After:

Figure 1 - Time series plot before & After Time stamp

- We can see that seasonality follows a yearly pattern.

- From 1987 to 1989 there is a spike in sales compared to other years.

1.4 Exploratory data analysis:

Figure 2 - Year wise sales analysis

- From the above year-wise sales analysis we can see most of the years have
outliers except the year 1995. We need not treat the outliers.
- Sales peaked during the years 1987 to 1989.

Figure 3 - Monthly sales across years

- We could see increase in sales during the month of Dec across years, when
compared to other months.
- Oct, Nov and Dec shows increasing trend in sales, which could be due to
Holiday season.

Figure 4 - Weekly sales across years

From the above plot it is clear that Saturdays have the highest number of
sales compared to other days.
It is clear that people tend to buy more during weekends.

Figure 5 - Empirical Cumulative Distribution plot

- We can see the highest sale reached about 7,000.

- 50% of sales are within 4,000.
Figure 6 - Average monthly sales across years

- The average monthly sales for December are approximately 5,900.

Table 7 : Pivot table

- From the above pivot table we can understand the monthly sales across years
in detail.
- The numbers coloured in red indicate the highest sales for that particular
month.
Figure 7 - Month wise comparison plot

- This plot also indicates the same trend, with December having the highest
sales across all years, followed by November and October.

Figure 8 - Average and Percentage sales

- In certain years we can see up to a 75% drop in sales.
1.5 Decomposition:
Decomposition is used as a preprocessing step in time series analysis to
remove trend and seasonal effects, making it easier to identify and model
the underlying patterns or relationships in the data.
It can also be used for forecasting, anomaly detection, and other
applications in time series analysis and forecasting.
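As a rough illustration of what decomposition does (the report itself relies on statsmodels' seasonal_decompose), the additive version can be hand-rolled on a synthetic monthly series: estimate the trend with a centred moving average, average the detrended values by month to get the seasonal component, and keep the remainder as the residual.

```python
import numpy as np
import pandas as pd

# Hand-rolled additive decomposition on a synthetic monthly series; a
# sketch of the idea only -- the report uses seasonal_decompose.
idx = pd.date_range("1980-01-01", periods=48, freq="MS")
t = np.arange(48)
pattern = np.tile([0, 5, -3, 2, 0, -4, 3, 1, -2, 4, -1, -5], 4)
y = pd.Series(100 + 0.5 * t + pattern, index=idx)

trend = y.rolling(window=12, center=True).mean()           # smooth trend
detrended = y - trend
seasonal = detrended.groupby(idx.month).transform("mean")  # month effects
residual = y - trend - seasonal                            # what is left
```

Adding the three parts back together reproduces the series, which is exactly the "Trend + Season + Residual" identity stated below.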

Additive Decompose:
- We can see that seasonality is relatively constant over the years.
- The trend keeps changing over the years.
- The residuals are located around 0 in the residual plot below.

Figure 9 - Additive decompose

Table 8 - Additive decompose values

Multiplicative Decompose:

Figure 10 - Multiplicative decompose

Table 9 - Multiplicative decompose values

- Trend and seasonality remain the same between the Additive and
Multiplicative decompositions.
- The residual is around 1 in the Multiplicative decomposition.

Additive Decompose = Trend + Season + Residual


Multiplicative Decompose = Trend * Season * Residual

1.6 Train and Test Split:


- We are going to split the data into Train and Test.
- Train set till Dec 1990 and Test from Jan 1991.
- After splitting the data below is the shape of Train and Test set.

- From the below screenshots we can see the Train and Test set split are
achieved as expected.
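The split can be sketched with pandas date slicing. The values below are stand-ins; only the 187-month Jan 1980 to Jul 1995 index mimics the report's data.

```python
import numpy as np
import pandas as pd

# Date-based split sketch: train up to Dec 1990, test from Jan 1991.
idx = pd.date_range("1980-01-01", "1995-07-01", freq="MS")
sales = pd.Series(np.arange(len(idx), dtype=float), index=idx, name="Sales")

train = sales[:"1990-12"]    # up to and including Dec 1990
test = sales["1991-01":]     # from Jan 1991 onwards
```

This yields 132 training months and 55 test months out of the 187 total.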

Table 10 - First & last 5 rows of Train and Test set

Figure 11 - Train and Test split

The above plot helps us to visually see the range between Train and Test
set.

1.7 Model Predictions:


Linear regression: We are going to regress the 'Sales' variable against the
order of occurrence. For this we need to modify our training data
before fitting it into a linear regression.

Figure 12 - Linear Regression

The green and red lines indicate the model performance on the actual test
& train values.

The RMSE value for Linear regression is 1389.14
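The regress-on-time-order idea can be sketched with a plain least-squares line (numpy.polyfit here, which may differ from the regression class used in the report); the series is synthetic.

```python
import numpy as np

# Linear regression against the order of occurrence, on synthetic data:
# train months are numbered 1..132 and the test months continue 133..187.
rng = np.random.default_rng(0)
train_t = np.arange(1, 133)
test_t = np.arange(133, 188)
train_y = 2000 + 5 * train_t + rng.normal(0, 50, len(train_t))
test_y = 2000 + 5 * test_t + rng.normal(0, 50, len(test_t))

slope, intercept = np.polyfit(train_t, train_y, deg=1)
pred = intercept + slope * test_t                  # forecast on test
rmse = np.sqrt(np.mean((test_y - pred) ** 2))      # report's error metric
```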

Naives Model:

Figure 13 - Naives Model


The green line indicates the model performance on the orange test area,
where we can see the predicted values are far from the actual values.

The RMSE value for Naives Model is 3864.28
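A naive forecast simply repeats the last training observation; a minimal sketch with made-up numbers, including the RMSE computation used throughout the report.

```python
import numpy as np

# Naive model sketch: every test point is forecast as the last train
# value. The numbers are made up for illustration.
train = np.array([1686.0, 1591.0, 2304.0, 3265.0, 1870.0])
test = np.array([1902.0, 2049.0, 1874.0])

naive_forecast = np.full_like(test, train[-1])     # repeat last value
rmse = np.sqrt(np.mean((test - naive_forecast) ** 2))
```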

Simple Average:

Figure 14 - Simple Average

The green line indicates the Simple Average model prediction on the
orange test area, where we can see the predicted values and actual
values are far apart.

The RMSE value for Simple average Model is 1275.08

Moving Average:

Figure 15 - MA prediction on whole data

Figure 16 - MA prediction on Test data

Table 11 - Moving Average forecast value

The above table shows the moving-average performance for windows 2 to 9.
We can see the rolling average performs better than the simple
average. The higher the rolling window, the smoother the curve, since
more values are taken into account.
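The window-size effect described above can be sketched with pandas rolling means on a synthetic series; the mean absolute month-to-month change is used here as a crude smoothness score (an illustrative choice, not the report's metric).

```python
import numpy as np
import pandas as pd

# Rolling averages for windows 2..9 on a synthetic wavy series; larger
# windows average more values and so should wiggle less.
idx = pd.date_range("1980-01-01", periods=60, freq="MS")
sales = pd.Series(2000 + 50 * np.sin(np.arange(60)), index=idx)

# Crude smoothness score: mean absolute month-to-month change of the
# rolling mean (smaller = smoother).
smoothness = {
    w: sales.rolling(window=w).mean().diff().abs().mean()
    for w in range(2, 10)
}
```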
Model Comparison:

Figure 17 - Different model comparison

The above plot shows the performance of different models on Test values.
Simple Exponential Smoothing:

Figure 18 - Simple Exponential Smoothing


The auto-picked smoothing parameter is Alpha 0.049, with an RMSE
of 1316.136.
Let's run tests with different Alpha values to find the best fit.

Figure 19 - SES Alpha 1 to 9

Table 12 - SES values between Alpha 1 to 9

The above table gives the RMSE values for Alpha 0.1 to 0.9.
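The alpha-grid mechanics can be hand-rolled (the report uses statsmodels; this sketch only illustrates how SES works and why its forecast is flat).

```python
import numpy as np

# Hand-rolled simple exponential smoothing over an alpha grid.
def ses_level(y, alpha):
    """Final smoothed level of y; SES forecasts are flat at this value."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level

rng = np.random.default_rng(1)
train = 2000 + rng.normal(0, 100, 120)   # synthetic stand-in data
test = 2000 + rng.normal(0, 100, 24)

rmse_by_alpha = {}
for alpha in [a / 10 for a in range(1, 10)]:      # 0.1, 0.2, ..., 0.9
    forecast = ses_level(train, alpha)            # one flat forecast
    rmse_by_alpha[alpha] = np.sqrt(np.mean((test - forecast) ** 2))
```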


Double Exponential Smoothing:

Figure 20 - DES Plot

Alpha=0.1,Beta=0.1,DoubleExponentialSmoothing 1778.564670

For Double Exponential Smoothing with Alpha 0.1 & Beta 0.1, the RMSE
value is 1778.56.
The above RMSE is derived from the auto-fit model.

Triple Exponential Smoothing:

The auto-fit Triple Exponential Smoothing model has Alpha 0.12, Beta
0.049 and Gamma 0.361.

The RMSE is 404.418

Figure 21 - TES Auto-fit model Plot

Figure 22 - TES Best fit model Plot

The above plot is the best fit model in Triple Exponential Smoothing.

Alpha=0.4,Beta=0.1,Gamma=0.2,TripleExponentialSmoothing 317.434302

The best-fit Triple Exponential Smoothing uses Alpha 0.4, Beta 0.1 and
Gamma 0.2, with an RMSE of 317.43.
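The report's TES models come from statsmodels' ExponentialSmoothing; purely to show what the three parameters do, below is a minimal hand-rolled additive Holt-Winters recursion run with the best-fit values (Alpha 0.4, Beta 0.1, Gamma 0.2) on a synthetic series.

```python
import numpy as np

def holt_winters_additive(y, alpha, beta, gamma, m):
    """Minimal additive Holt-Winters; returns one-step-ahead fits."""
    y = np.asarray(y, dtype=float)
    level = y[:m].mean()                                # initial level
    trend = (y[m:2 * m].mean() - y[:m].mean()) / m      # initial trend
    season = list(y[:m] - level)                        # initial indices
    fitted = []
    for t, obs in enumerate(y):
        s = season[t % m]
        fitted.append(level + trend + s)                # forecast for t
        new_level = alpha * (obs - s) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[t % m] = gamma * (obs - new_level) + (1 - gamma) * s
        level = new_level
    return np.array(fitted)

# Synthetic trending + seasonal series, smoothed with the report's
# best-fit parameters (alpha=0.4, beta=0.1, gamma=0.2, m=12).
t = np.arange(120)
y = 50 + 0.5 * t + np.tile([0, 4, -2, 3, 0, -5, 2, 1, -3, 5, -1, -4], 10)
fitted = holt_winters_additive(y, alpha=0.4, beta=0.1, gamma=0.2, m=12)
```

Alpha smooths the level, Beta the trend, and Gamma the twelve seasonal indices, which is why TES handles this series where SES and DES cannot.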

Best Model fit:

We have built different models and have an idea of which model gives the least
error on the test data set.

Based on the RMSE score we need to build the model for the whole data
in order to see the forecast.

Table 13 - RMSE across multiple models

From the above table we could see RMSE for all the models we built.
The least error derived from both Triple exponential smoothing models.

Alpha=0.12,Beta=0.049,Gamma=0.361,TripleExponentialSmoothing 404.418372
Alpha=0.4,Beta=0.1,Gamma=0.2,TripleExponentialSmoothing 317.434302

1.8 Stationary Check:
P < 0.05 = We reject the null Hypothesis.
P > 0.05 = We fail to reject the null Hypothesis.

Figure 23 - Stationary Check

Table 14 - Dickey Fuller test results

We can see the p-value is 0.601061, which is greater than 0.05. Hence
we fail to reject the null hypothesis.
The dataset for Sparkling wine is non-stationary. We don't have enough
evidence to prove that it is stationary.

In order to make it stationary, let's take a difference of order 1 and drop
the NaN values from the dataset.
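The order-1 differencing step can be sketched with pandas (the Dickey-Fuller test itself comes from statsmodels' adfuller and is not reproduced here).

```python
import numpy as np
import pandas as pd

# Order-1 differencing sketch on a synthetic trending series.
idx = pd.date_range("1980-01-01", periods=100, freq="MS")
trending = pd.Series(1000.0 + 10.0 * np.arange(100), index=idx)

diffed = trending.diff().dropna()   # difference of order 1, NaNs dropped
# The steady upward drift is gone: every differenced value is the slope,
# so the mean no longer shifts over time.
```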

Figure 24 - Stationary Check after differencing approach

Table 15 - Dickey Fuller test result after differencing approach


Param AIC
10 (2, 1, 2) 2213.509212
15 (3, 1, 3) 2221.452537
14 (3, 1, 2) 2230.816576
11 (2, 1, 3) 2232.917659
9 (2, 1, 1) 2233.777626
3 (0, 1, 3) 2233.994858
2 (0, 1, 2) 2234.408323
6 (1, 1, 2) 2234.527200
13 (3, 1, 1) 2235.499067
7 (1, 1, 3) 2235.607814
5 (1, 1, 1) 2235.755095
12 (3, 1, 0) 2257.723379
8 (2, 1, 0) 2260.365744
1 (0, 1, 1) 2263.060016
4 (1, 1, 0) 2266.608539
0 (0, 1, 0) 2267.663036

From the above Dickey-Fuller test we can see the p-value is 0.000, which
is less than 0.05. Hence we reject the null hypothesis and conclude
that the data is stationary.

1.9 Auto ARIMA & SARIMA:


Auto ARIMA:
We use iteration to find the optimum p, d, q.
AR(Auto Regressive) - p
Differencing to make series stationary - d
MA(Moving Average) - q
We fix the value of d as 1, since we have already determined d to be 1.
Model: (0, 1, 1)
Model: (0, 1, 2)
Model: (0, 1, 3)
Model: (1, 1, 0)
Model: (1, 1, 1)
Model: (1, 1, 2)
Model: (1, 1, 3)
Model: (2, 1, 0)
Model: (2, 1, 1)
Model: (2, 1, 2)
Model: (2, 1, 3)
Model: (3, 1, 0)
Model: (3, 1, 1)
Model: (3, 1, 2)
Model: (3, 1, 3)

The Akaike Information Criterion (AIC) is computed for each model, and
the model with the lowest AIC is selected.
From the summary below we choose the model with the lowest AIC,
which is p=2, d=1, q=2 for the Auto ARIMA model.
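The grid-search-by-lowest-AIC pattern can be illustrated without statsmodels by fitting AR(p) models with plain least squares and scoring them with the Gaussian AIC approximation n·ln(RSS/n) + 2k; this is a sketch of the selection logic only, not the report's actual ARIMA fits.

```python
import numpy as np

# AIC-based order selection sketch: least-squares AR(p) fits on a
# simulated AR(2) series, scored with n*ln(RSS/n) + 2p.
rng = np.random.default_rng(2)
y = np.zeros(300)
for t in range(2, 300):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()

def ar_aic(y, p):
    """Fit AR(p) by least squares and return its approximate AIC."""
    X = np.column_stack([y[p - i - 1 : len(y) - i - 1] for i in range(p)])
    target = y[p:]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = np.sum((target - X @ coef) ** 2)
    n = len(target)
    return n * np.log(rss / n) + 2 * p

aics = {p: ar_aic(y, p) for p in range(1, 5)}
best_p = min(aics, key=aics.get)     # the lowest-AIC order wins
```

The 2p penalty is what stops ever-larger models from winning on fit alone.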

Table 16 - ARIMA - AIC Value

Table 17 - Auto ARIMA Summary


Auto ARIMA Test RMSE value is 1299.9799

Auto SARIMA:

Similar to Auto ARIMA, we apply a for loop to determine the optimum
values.

Examples of some parameter combinations for Model...


Model: (0, 1, 1)(0, 0, 1, 12)
Model: (0, 1, 2)(0, 0, 2, 12)
Model: (0, 1, 3)(0, 0, 3, 12)
Model: (1, 1, 0)(1, 0, 0, 12)
Model: (1, 1, 1)(1, 0, 1, 12)
Model: (1, 1, 2)(1, 0, 2, 12)
Model: (1, 1, 3)(1, 0, 3, 12)
Model: (2, 1, 0)(2, 0, 0, 12)
Model: (2, 1, 1)(2, 0, 1, 12)
Model: (2, 1, 2)(2, 0, 2, 12)
Model: (2, 1, 3)(2, 0, 3, 12)
Model: (3, 1, 0)(3, 0, 0, 12)
Model: (3, 1, 1)(3, 0, 1, 12)
Model: (3, 1, 2)(3, 0, 2, 12)
Model: (3, 1, 3)(3, 0, 3, 12)

Table 18 - SARIMA - AIC Value

From the above summary we are going to choose the AIC with least value.

Table 19 - Auto SARIMA Summary

Figure 25 - SARIMA Diagnostic Plot


Auto SARIMA Test RMSE value is 528.586

1.10 ACF & PACF Plot:

Figure 26 - ACF Plot

In the ACF plot we can see decay after lag 1 for the original as well as the
differenced data. Hence we select the q value to be 1, i.e. q=1.

Figure 27 - PACF Plot


In the PACF plot we can see significant bars up to lag 1 for the differenced
data, which is stationary in nature; beyond lag 1 the decay is large.
Hence we choose the p value to be 1, i.e. p=1, with d=1 since we have
already seen the series becomes stationary after one difference.
Hence the values selected for manual ARIMA are p=1, d=1, q=1.
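The cut-off logic behind reading the ACF can be shown with a hand-computed autocorrelation function on a simulated MA(1) series (the report's plots come from statsmodels' plot_acf / plot_pacf).

```python
import numpy as np

# Hand-computed sample autocorrelations for a simulated MA(1) process,
# whose ACF should cut off after lag 1 -- the pattern used to pick q.
rng = np.random.default_rng(3)
e = rng.normal(size=500)
y = e[1:] + 0.8 * e[:-1]

def acf(x, lag):
    """Sample autocorrelation of x at the given positive lag."""
    x = x - x.mean()
    return np.sum(x[lag:] * x[:-lag]) / np.sum(x * x)

acfs = [acf(y, k) for k in range(1, 6)]   # lags 1..5
```

Only the lag-1 value stands clear of zero; the rest sit inside the noise band, which is what "decay after lag 1" means in the plot.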

1.11 Manual ARIMA & SARIMA:


Manual ARIMA:

Table 20 - Manual ARIMA Summary

The Model prediction of Manual ARIMA RMSE is 1319.937.

Manual SARIMA:

Selected Manual SARIMA is (1, 1, 1) (1, 1, 1, 12)
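The (1, 1, 1)(1, 1, 1, 12) specification applies one regular difference (d=1) and one seasonal lag-12 difference (D=1); what those two operations do can be sketched on a synthetic series with a linear trend and a repeating yearly pattern.

```python
import numpy as np
import pandas as pd

# Seasonal (lag-12) plus regular differencing, as in d=1, D=1, m=12.
idx = pd.date_range("1980-01-01", periods=72, freq="MS")
t = np.arange(72)
pattern = np.tile(np.arange(12) * 5.0, 6)     # repeating yearly shape
y = pd.Series(100.0 + 2.0 * t + pattern, index=idx)

seasonal_diff = y.diff(12).dropna()           # removes the yearly pattern
regular_diff = seasonal_diff.diff().dropna()  # removes the leftover drift
```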

Table 21 - Manual SARIMA Summary

Figure 28 - Manual SARIMA Diagnostic Plot


The Model prediction of Manual SARIMA RMSE is 359.613

1.12 Test RMSE across Models built:

Table 22 - RMSE across Models

From the above table we can see the least value is achieved by Triple
Exponential Smoothing. Hence we will use the TES model on the entire
dataset.

1.13 Optimum model on Whole dataset:

Now we are going to fit the optimum model, identified from all the model
predictions, on the whole dataset, without any Train and Test split.

Table 23 - TES on entire data
RMSE value on whole dataset is 414.324

1.14 Future Prediction:

Table 24 - Forecast for 12 months


Table 25 - Forecast Summary

Table 26 - Upper and Lower CI

The above table gives the prediction, Lower CI and Upper CI for future 12
months from Aug 1995 to Jul 1996.
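The Lower/Upper CI columns come from the fitted model's forecast variance; a rough normal-approximation sketch of the idea (not the exact SARIMA interval) is point forecast ± 1.96·σ of the residuals.

```python
import numpy as np

# Rough confidence-interval sketch: point forecast +/- 1.96 residual
# standard deviations. Numbers are made up; real SARIMA intervals also
# widen with the forecast horizon.
residuals = np.array([120.0, -80.0, 40.0, -150.0, 90.0, -20.0])
point_forecast = 2500.0

sigma = residuals.std(ddof=1)          # sample std of in-sample errors
lower_ci = point_forecast - 1.96 * sigma
upper_ci = point_forecast + 1.96 * sigma
```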

Figure 29 - Future forecast plot

*************************************
2.1 Wine - Rose Data:

As an analyst at ABC Estate Wines, we are presented with historical data


encompassing the sales of different types of wines throughout the 20th
century. These datasets originate from the same company but represent
sales figures for distinct wine varieties. Our objective is to delve into the
data, analyze trends, patterns, and factors influencing wine sales over the
course of the century. By leveraging data analytics and forecasting
techniques, we aim to gain actionable insights that can inform strategic
decision-making and optimize sales strategies for the future.
2.2 Basic Analysis:
Shape:
There are 187 rows and 2 columns available in the given dataset.

Duplicate Check:

There are no duplicate values in the Rose dataset.


First 5 rows of the dataset:

Table 27 : Top 5 rows


Last 5 rows of the dataset:

Table 28 : Last 5 rows

Missing value Check:

There are 2 missing values in the Rose dataset.

Table 29 : Missing value check

We will treat the missing values with the Linear Interpolation method.
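Linear interpolation fills each gap along the straight line between its neighbours; a pandas sketch with made-up Rose values.

```python
import numpy as np
import pandas as pd

# Linear interpolation of missing values, as applied to the Rose data;
# the six sample values are made up for illustration.
idx = pd.date_range("1994-01-01", periods=6, freq="MS")
rose = pd.Series([40.0, 44.0, np.nan, np.nan, 56.0, 60.0], index=idx)

rose = rose.interpolate(method="linear")   # fill gaps along a line
```

The two NaNs become 48 and 52, evenly spaced between the surrounding 44 and 56.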

Info Check:

Table 30 : Info check

In the Rose dataset we have one Object-type column and one Float-type column.

Descriptive Summary:

Table 31 : Descriptive summary

- The maximum sale of Rose wine is 267 and the minimum is 28.

- The average sale of this particular type of wine is 89.91.

Figure 30 - Plot before Time stamp

2.3 Time series conversion:


We have to convert the given data into time series data by changing the
‘YearMonth’ column into Date format and naming the new column Time
Stamp.

Table 32 : Top 5 rows after Converting

Let's drop the YearMonth column, which is not required after converting it
into date format.

Figure 31 - Time series plot after Time stamp

- We can see that seasonality follows a yearly pattern.

- During 1981 there is a spike in sales compared to other years.

2.4 Exploratory data analysis:

Figure 32 - Year wise sales analysis

- From the above year-wise sales analysis we can see certain years have
outliers. We need not treat the outliers.
- Sales peaked during the years 1980 & 1981.
Figure 33 - Monthly sales across years

- We could see increase in sales during the month of Dec across years, when
compared to other months.
- Holiday seasons show increasing trend in sales, when compared to other
seasons.

Figure 34 - Weekly sales across years

When we do a weekly comparison, we can see that all days except Saturday and
Sunday have outliers.
It looks like Rose wine was bought across all days, irrespective of the
weekend.
Figure 35 - Empirical Cumulative Distribution plot

- We can see the highest sale reached about 250.

- 50% of sales are within 150.

Figure 36 - Average monthly sales across years


- The average monthly sales for December are approximately 250.

Table 33 : Pivot table

- From the above pivot table we can understand the monthly sales across years
in detail.
- The cells coloured in green indicate the highest sales for that particular
month.

Figure 37 - Month wise comparison plot

Figure 38 - Average and Percentage sales

- Each year shows a certain percentage drop in sales, from a minimum of 40%
to a maximum of more than 60%.
2.5 Decomposition:
Decomposition is used as a preprocessing step in time series analysis to
remove trend and seasonal effects, making it easier to identify and model
the underlying patterns or relationships in the data.
It can also be used for forecasting, anomaly detection, and other
applications in time series analysis and forecasting.

Additive Decompose:
From the plot below we can see the trend peaks during 1981 and keeps
declining over the years.

The residual values are spread out rather than lying on a straight line.

We can also see Seasonality and Trend present in the dataset.

Figure 39 - Additive decompose

Table 34 - Additive decompose values

Multiplicative Decompose:

Figure 40 - Multiplicative decompose

Table 35 - Multiplicative decompose values


- Trend and seasonality remain the same between the Additive and
Multiplicative decompositions.
- The residual is flat around 1 in the Multiplicative decomposition.
- The residual lies between 0 and 1 in the Multiplicative decomposition,
whereas in the Additive one it was spread between 0 and 50.
- We can say the Multiplicative model is more stable in the residual
component than the Additive one.

Additive Decompose = Trend + Season + Residual


Multiplicative Decompose = Trend * Season * Residual

2.6 Train and Test Split:


- We are going to split the data into Train and Test.
- Train set till Dec 1990 and Test from Jan 1991.
- After splitting the data below is the shape of Train and Test set.

- From the below screenshots we can see the Train and Test set split are
achieved as expected.

Figure 41 - Train and Test split


The above plot helps us to visually see the range between Train and Test
set.

2.7 Model Predictions:


Linear regression: We are going to regress the 'Sales' variable against the
order of occurrence. For this we need to modify our training data
before fitting it into a linear regression.

Figure 42 - Linear Regression

The green and red lines indicate the model performance on the actual test
& train values.

The RMSE value for Linear regression is 15.27

Naives Model:
The green line indicates the model performance on the orange test area,
where we can see the predicted values are far from the actual values.

The RMSE value for Naives Model is 79.72


Figure 43 - Naives Model

Simple Average:

Figure 44 - Simple Average


The green line indicates the Simple Average model prediction on the
orange test area, where we can see the predicted values and actual
values are far apart.

The RMSE value for Simple average Model is 53.46

Moving Average:

Figure 45 - MA prediction on whole data

Figure 46 - MA prediction on Test data


Table 36 - Moving Average forecast value

The above table shows the moving-average performance for windows 2 to 9.
We can see the rolling average performs better than the simple
average. The higher the rolling window, the smoother the curve, since
more values are taken into account.
Model Comparison:

Figure 47 - Different model comparison

The above plot shows the performance of different models on Test values.

Table 37 - RMSE values on Models


Simple Exponential Smoothing:

Figure 48 - Simple Exponential Smoothing

The auto-picked smoothing parameter is Alpha = 0.098, with an RMSE
of 37.592.

Let's run tests with different Alpha values to find the best fit.

Figure 49 - SES Alpha 1 to 9


Table 38 - SES values between Alpha 1 to 9

The above table gives the RMSE values for Alpha 0.1 to 0.9.

Double Exponential Smoothing:

Figure 50 - DES Plot

Alpha=0.1,Beta=0.1,DoubleExponentialSmoothing 36.923416

For Double Exponential Smoothing with Alpha 0.1 & Beta 0.1, the RMSE
value is 36.92.

The above RMSE is derived from the auto-fit model.

Triple Exponential Smoothing:

The auto-fit Triple Exponential Smoothing model has Alpha = 0.072,
Beta = 0.044 and Gamma = 1.366.

The RMSE is 20.741

Figure 51 - TES Auto-fit model Plot


Best Model fit:

Below is the best fit model in Triple Exponential Smoothing.

Table 39 - TES values to find best fit

The best-fit Triple Exponential Smoothing uses Alpha 0.1, Beta 0.2 and
Gamma 0.1, with an RMSE of 9.22.

Figure 52 - TES model Plot

We have built different models and have an idea of which model gives the least
error on the test data set.

Based on the RMSE score we need to build the model for the whole data
in order to see the forecast.

Table 40 - RMSE across multiple models

From the above table we could see RMSE for all the models we built.
The least error derived from both Triple exponential smoothing models.

Figure 53 - TES Best fit model Plot


2.8 Stationary Check:
P < 0.05 = We reject the null Hypothesis.
P > 0.05 = We fail to reject the null Hypothesis.

Figure 54 - Stationary Check

Table 41 - Dickey Fuller test results


We can see the p-value is 0.343101, which is greater than 0.05. Hence
we fail to reject the null hypothesis.

The dataset for Rose wine is non-stationary. We don't have enough
evidence to prove that it is stationary.

In order to make it stationary, let's take a difference of order 1 and drop
the NaN values from the dataset.

Figure 55 - Stationary Check after differencing approach

Table 42 - Dickey Fuller test result after differencing approach


From the above Dickey-Fuller test we can see the p-value is 1.810895e-12,
which is less than 0.05. Hence we reject the null hypothesis and
conclude that the data is stationary.

2.9 Auto ARIMA & SARIMA:


Auto ARIMA:
We use iteration to find the optimum p, d, q.
AR(Auto Regressive) - p
Differencing to make series stationary - d
MA(Moving Average) - q
We fix the value of d as 1, since we have already determined d to be 1.
Some parameter combinations for the Model...
Model: (0, 1, 1)
Model: (0, 1, 2)
Model: (0, 1, 3)
Model: (1, 1, 0)
Model: (1, 1, 1)
Model: (1, 1, 2)
Model: (1, 1, 3)
Model: (2, 1, 0)
Model: (2, 1, 1)
Model: (2, 1, 2)
Model: (2, 1, 3)
Model: (3, 1, 0)
Model: (3, 1, 1)
Model: (3, 1, 2)
Model: (3, 1, 3)

The Akaike Information Criterion (AIC) is computed for each model, and
the model with the lowest AIC is selected.
From the summary below we choose the model with the lowest AIC,
which is p=2, d=1, q=3 for the Auto ARIMA model.

Table 43 - ARIMA - AIC Value

Table 44 - Auto ARIMA Summary


Auto ARIMA Test RMSE value is 136.8175

Auto SARIMA:

Similar to Auto ARIMA, we apply a for loop to determine the optimum
values.
Examples of some parameter combinations for Model...
Model: (0, 1, 1)(0, 0, 1, 12)
Model: (0, 1, 2)(0, 0, 2, 12)
Model: (0, 1, 3)(0, 0, 3, 12)
Model: (1, 1, 0)(1, 0, 0, 12)
Model: (1, 1, 1)(1, 0, 1, 12)
Model: (1, 1, 2)(1, 0, 2, 12)
Model: (1, 1, 3)(1, 0, 3, 12)
Model: (2, 1, 0)(2, 0, 0, 12)
Model: (2, 1, 1)(2, 0, 1, 12)
Model: (2, 1, 2)(2, 0, 2, 12)
Model: (2, 1, 3)(2, 0, 3, 12)
Model: (3, 1, 0)(3, 0, 0, 12)
Model: (3, 1, 1)(3, 0, 1, 12)
Model: (3, 1, 2)(3, 0, 2, 12)
Model: (3, 1, 3)(3, 0, 3, 12)

Table 45 - Auto SARIMA - AIC Value

From the above summary we are going to choose the AIC with least value.

Figure 56 - Auto SARIMA Diagnostic Plot

Auto SARIMA Test RMSE value is 518.883

2.10 ACF & PACF Plot:

Figure 57 - ACF Plot

In the ACF plot we can see decay after lag 2 for the original as well as the
differenced data. Hence we select the q value to be 2, i.e. q=2.

Figure 58 - PACF Plot

In the PACF plot we can see significant bars up to lag 2 for the differenced
data, which is stationary in nature; beyond lag 2 the decay is large.
Hence we choose the p value to be 2, i.e. p=2, with d=1 since the series
becomes stationary after one difference. The values selected for manual
ARIMA are p=2, d=1, q=2.
2.11 Manual ARIMA & SARIMA:
Manual ARIMA:

Table 46 - Manual ARIMA Summary


The Model prediction of Manual ARIMA RMSE is 36.872.

Manual SARIMA:
Selected Manual SARIMA is (2, 1, 2) (2, 1, 2, 12)

Table 47 - Manual SARIMA Summary

Figure 59 - Manual SARIMA Diagnostic Plot


The Model prediction of Manual SARIMA RMSE is 15.1696

2.12 Test RMSE across Models built:

Table 48 - RMSE across Models

From the above table we can see the least value is achieved by Triple
Exponential Smoothing. Hence we will use the TES model on the entire
dataset.

2.13 Optimum model on Whole dataset:


Now we are going to fit the optimum model, identified from all the model
predictions, on the whole dataset, without any Train and Test split.

Table 49 - TES on entire data

RMSE value on whole dataset is 18.899

2.14 Future Prediction:


Forecasted values for the next 12 months:
1995-08-31 46.576143
1995-09-30 48.463220
1995-10-31 51.275198
1995-11-30 59.982543
1995-12-31 84.153330
1996-01-31 32.062233
1996-02-29 40.612599
1996-03-31 47.214012
1996-04-30 48.421536
1996-05-31 40.498449
1996-06-30 47.692170
1996-07-31 52.675485

Table 50 - Forecast for 12 months


Table 51 - Forecast Summary

Table 52 - Upper and Lower CI

The above table gives the prediction, Lower CI and Upper CI for future 12
months from Aug 1995 to Jul 1996.

Figure 60 - Future forecast plot

2.15 Business Insights and Recommendation:


On comparing both wines, we could see Sparkling wine is higher in sales
over the years when compared to Rose wine.

- We can see that people are likely to purchase wine during the festive season,
with a drop in sales during the peak winter season.
- Rose wine has been on a declining trend over the years, with 1980 as an
exception.
- We can expect Rose wine to drop further in future years too.
- Sparkling wine has been popular since 1980 and is expected to maintain
the same trend in the future.
- There is an increase in sale of Sparkling wine during weekends and
Rose wine was purchased all days of the week.
- The company can arrange a campaign during non-peak months.
- The company can think of issuing some samples in stores and Markets
as an advertising strategy.
- The company should also analyse the reason for the lower popularity of Rose
wine and, if required, consider making some changes to the Rose wine
production and marketing strategy.

- Data can also be collected on the customers who purchase Rose wine, and
analytics done to find out what kind of people prefer Rose wine, so changes
can be made to attract other customers too.
- Since Sparkling is already popular we can give some combo offers by
selling Rose wine and Sparkling wine together at a certain discount to
increase sale of Rose wine along with Sparkling wine.

*********** Thank You ***********
