Time Series Forecasting
2.10 ACF & PACF Plot 67
2.11 Manual ARIMA & SARIMA 69
2.12 Test RMSE across models built 71
2.13 Optimum model on whole dataset 71
2.14 Future Prediction 72
2.15 Business Insights and Recommendation 74
List of Tables
S.No Topics Page No
1 Top 5 Rows 8
2 Last 5 Rows 9
3 Missing value check 9
4 Info Check 9
5 Descriptive Summary 10
6 Top 5 rows after converting 10
7 Pivot Table 14
8 Additive decompose value 17
9 Multiplicative decompose value 18
10 First and last 5 rows of Train & Test set 19
11 Moving Average forecast value 22
12 SES values between Alpha 1 to 9 24
13 RMSE across multiple models 27
14 Dickey fuller test results 28
15 Dickey fuller test results after differencing approach 29
16 ARIMA - AIC values 31
17 Auto ARIMA Summary 31
18 SARIMA - AIC values 32
19 Auto SARIMA Summary 33
20 Manual ARIMA Summary 36
21 Manual SARIMA Summary 37
22 RMSE across models 38
23 TES on entire data 39
24 Forecast for 12 months 39
25 Forecast Summary 40
26 Upper and Lower CI 40
Rose Wine
27 Top 5 Rows 42
28 Last 5 Rows 42
29 Missing value check 42
30 Info Check 43
31 Descriptive Summary 43
32 Top 5 rows after converting 44
33 Pivot Table 48
34 Additive decompose value 50
35 Multiplicative decompose value 51
36 Moving Average forecast value 56
37 RMSE value on Models 56
38 SES values between Alpha 1 to 9 58
39 TES value to find best fit 60
40 RMSE across multiple models 61
41 Dickey fuller test results 62
42 Dickey fuller test results after differencing approach 63
43 ARIMA - AIC values 65
44 Auto ARIMA Summary 65
45 Auto SARIMA Summary 66
46 Manual ARIMA Summary 69
47 Manual SARIMA Summary 70
48 RMSE across models 71
49 TES on entire data 72
50 Forecast for 12 months 72
51 Forecast Summary 73
52 Upper and Lower CI 73
List of Figures
S.No Topics Page No
1 Time series plot before & after time stamp 12
2 Year wise sales analysis 12
3 Monthly sales analysis 12
4 Weekly sales across years 13
5 Empirical Cumulative Distribution plot 13
6 Average Monthly sales across years 14
7 Month wise Comparison plot 15
8 Average and Percentage sales 15
9 Additive Decompose 16
10 Multiplicative Decompose 17
11 Train & Test Split 19
12 Linear Regression 20
13 Naive Model 20
14 Simple Average 21
15 MA prediction on Whole data 22
16 MA prediction on Test data 22
17 Different model comparison 23
18 Simple Exponential Smoothing 23
19 SES Alpha 1 to 9 24
20 DES plot 25
21 TES Auto fit model plot 26
22 TES Best fit model plot 26
23 Stationary check 28
24 Stationary check after differencing approach 29
25 SARIMA Diagnostic plot 33
26 ACF Plot 34
27 PACF plot 35
28 Manual SARIMA Diagnostic Plot 37
29 Future forecast plot 41
30 Plot before Time stamp 44
31 Time series plot after Time stamp 45
32 Year wise sales analysis 45
33 Monthly sales across years 46
34 Weekly sales across years 46
35 Empirical Cumulative Distribution plot 47
36 Average monthly sales across years 47
37 Month wise Comparison plot 48
38 Average & Percentage sales 49
39 Additive Decompose 50
40 Multiplicative Decompose 51
41 Train and Test Split 52
42 Linear Regression 53
43 Naive Model 54
44 Simple Average 54
45 MA prediction Whole data 55
46 MA prediction on Test data 55
47 Different Model prediction 56
48 Simple Exponential Smoothing 57
49 SES Alpha 1 to 9 57
50 DES Plot 58
51 TES Auto fit Model plot 59
52 TES Model plot 60
53 TES Best fit plot 61
54 Stationary check 62
55 Stationary check after differencing approach 63
56 Auto SARIMA Diagnostic plot 67
57 ACF plot 68
58 PACF plot 69
59 Manual SARIMA Diagnostic plot 70
60 Future forecast plot 74
1.1 Wine - Sparkling Data:
There are 187 rows and 2 columns available in the given dataset.
Duplicate Check:
Info Check:
Descriptive Summary:
Let's drop the Year Month column, which is not required after converting it
into date format.
Before:
After:
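The time stamp conversion described above can be sketched as follows. This is a minimal illustration, not the code used in the report: the column names `YearMonth` and `Sales`, the `"%Y-%m"` text format and the sample values are all assumptions made for the example.

```python
from datetime import datetime

def add_time_stamp(rows, column="YearMonth", fmt="%Y-%m"):
    """Return (date, remaining_fields) pairs with the text column dropped.

    `column` and `fmt` are assumed names/formats, not taken from the report.
    """
    converted = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is not mutated
        stamp = datetime.strptime(row.pop(column), fmt).date()
        converted.append((stamp, row))
    return converted

# Made-up rows for illustration only (not values from the dataset).
sample = [{"YearMonth": "1980-01", "Sales": 100.0},
          {"YearMonth": "1980-02", "Sales": 120.0}]
for stamp, fields in add_time_stamp(sample):
    print(stamp, fields)
```

After the conversion each record is keyed by a real date, so the original text column is no longer needed.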
1.4 Exploratory data analysis:
- From the above year-wise sales analysis we can see that most years have
outliers, except the year 1995. We need not treat these outliers.
- Sales peaked during the years 1987 to 1989.
- Sales increase during the month of Dec across years, when compared to
other months.
- Oct, Nov and Dec show an increasing trend in sales, which could be due to
the holiday season.
Figure 4 - Weekly sales across years
From the above plot it is clear that Saturdays have the highest number of
sales when compared to other days.
This suggests that people tend to buy more during weekends.
Figure 6 - Average monthly sales across years
- The average monthly sales for the month of Dec are approximately 5,900.
- The above Pivot table shows the monthly sales across years in detail.
- The numbers coloured in red are the highest sales for that particular
month.
Figure 7 - Month wise comparison plot
- This plot indicates the same trend: Dec has the highest sales across all
years, followed by Nov and Oct.
1.5 Decomposition:
Decomposition is used as a preprocessing step in time series analysis to
remove trend and seasonal effects, making it easier to identify and model
the underlying patterns or relationships in the data.
It can also be used for forecasting, anomaly detection, and other
applications in time series analysis and forecasting.
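To make the idea concrete, here is a rough from-scratch sketch of a classical additive decomposition (centred moving average for the trend, per-position averages for the seasonal part). It is an illustration under stated assumptions, not the routine used in the report: the function name is hypothetical, the series is assumed to be a plain list with at least two full seasons, and library implementations may handle the edges differently.

```python
from statistics import mean

def additive_decompose(series, period=12):
    """Split a series into trend, seasonal and residual lists (additive model)."""
    n, half = len(series), period // 2
    trend = [None] * n
    # Centred moving average; for an even period the two end points of the
    # window get half weight so the average stays centred on t.
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        total = sum(window)
        if period % 2 == 0:
            total -= 0.5 * (window[0] + window[-1])
        trend[t] = total / period
    # Average the detrended values for each position in the season.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    raw = [mean(values) for values in buckets]
    offset = mean(raw)  # re-centre so the seasonal effects sum to zero
    seasonal = [raw[t % period] - offset for t in range(n)]
    residual = [None if trend[t] is None else series[t] - trend[t] - seasonal[t]
                for t in range(n)]
    return trend, seasonal, residual
```

The trend and residual are undefined (`None`) for the first and last half-season, because the centred window does not fit there.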
Additive Decompose:
- Seasonality is relatively constant over the years.
- The trend keeps changing over the years.
- The residuals are centred around 0 in the residual plot below.
Table 8 - Additive decompose values
Multiplicative Decompose:
Table 9 - Multiplicative decompose values
- From the screenshots below we can see that the Train and Test split was
achieved as expected.
Table 10 - First & last 5 rows of Train and Test set
The above plot helps us visualise the ranges covered by the Train and Test
sets.
Figure 12 - Linear Regression
The green and red lines show the model's predictions against the actual
test and train values.
Naive Model:
Simple Average:
The green line shows the Simple Average model's predictions over the
orange test area, where the predicted values are far from the actual
values.
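As a rough illustration of the two baselines above and of the RMSE used to compare models, consider the sketch below. The function names are hypothetical and the series are assumed to be plain lists of numbers; this is not the report's own code.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def naive_forecast(train, horizon):
    """Naive model: repeat the last observed training value."""
    return [train[-1]] * horizon

def simple_average_forecast(train, horizon):
    """Simple Average model: repeat the mean of the training values."""
    return [sum(train) / len(train)] * horizon
```

Both baselines produce a flat line over the test period, which is why their predictions sit far from a seasonal series.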
Moving Average:
The above table shows the Moving Average performance for rolling windows
of 2 to 9.
We can see the rolling average performs better than the simple average.
The larger the rolling window, the smoother the curve will be, since more
values are taken into account.
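The rolling average described above can be sketched as a trailing mean over the last `window` observations. This is a minimal example with a hypothetical function name, assuming the series is a plain list.

```python
def trailing_moving_average(series, window):
    """Mean of the current and previous (window - 1) values.

    The first (window - 1) positions have no full window and are None.
    """
    averages = [None] * (window - 1)
    for t in range(window - 1, len(series)):
        averages.append(sum(series[t - window + 1:t + 1]) / window)
    return averages
```

A larger `window` averages over more points, so single spikes move the result less and the curve looks smoother.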
Model Comparison:
The above plot shows the performance of different models on Test values.
Simple Exponential Smoothing:
Alpha=0.1,Beta=0.1,DoubleExponentialSmoothing 1778.564670
For Double Exponential Smoothing with Alpha 0.1 and Beta 0.1, the RMSE
value is 1778.564670.
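For readers unfamiliar with the method, Double Exponential Smoothing (Holt's linear trend method) can be sketched as below. The initial level and trend are a simple assumption made for this example; library implementations usually estimate them differently, so this sketch is not expected to reproduce the RMSE reported above.

```python
def holt_forecast(train, alpha, beta, horizon):
    """Holt's linear trend method: smooth a level and a trend, then extrapolate."""
    # Assumed initial state: first value as level, first difference as trend.
    level, trend = train[0], train[1] - train[0]
    for value in train[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # The h-step-ahead forecast is the last level plus h times the last trend.
    return [level + (h + 1) * trend for h in range(horizon)]
```

Alpha controls how quickly the level follows new observations and Beta how quickly the trend is updated.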
The above RMSE is derived from the auto fit model.
The auto fit model for Triple Exponential Smoothing uses Alpha 0.12, Beta
0.049 and Gamma 0.361.
Alpha=0.4,Beta=0.1,Gamma=0.2,TripleExponentialSmoothing 317.434302
The best fit Triple Exponential Smoothing uses Alpha 0.4, Beta 0.1 and
Gamma 0.2, where RMSE is 317.434302.
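One way such a best fit can be searched for is to try every combination of Alpha, Beta and Gamma on a grid and keep the one with the lowest test RMSE. The sketch below assumes additive trend and additive seasonality, a simple hand-made initial state and a caller-supplied grid; none of these details are stated in the report, and the function names are hypothetical.

```python
from itertools import product
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def holt_winters_additive(train, alpha, beta, gamma, period, horizon):
    """Forecast with additive trend and additive seasonality.

    Needs at least two full seasons of training data.
    """
    # Assumed initial state: first-season mean, average season-over-season
    # change per step, and first-season deviations from that mean.
    level = sum(train[:period]) / period
    trend = (sum(train[period:2 * period]) - sum(train[:period])) / period ** 2
    season = [train[i] - level for i in range(period)]
    for t in range(period, len(train)):
        s = season[t % period]
        previous_level = level
        level = alpha * (train[t] - s) + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
        season[t % period] = gamma * (train[t] - level) + (1 - gamma) * s
    n = len(train)
    return [level + (h + 1) * trend + season[(n + h) % period]
            for h in range(horizon)]

def best_fit(train, test, period, grid):
    """Return (rmse, alpha, beta, gamma) for the best combination on the grid."""
    results = []
    for alpha, beta, gamma in product(grid, repeat=3):
        forecast = holt_winters_additive(train, alpha, beta, gamma,
                                         period, len(test))
        results.append((rmse(test, forecast), alpha, beta, gamma))
    return min(results)
```

Because the parameters are tuned against the test set, the resulting RMSE is optimistic compared with a model whose parameters were fixed before looking at the test data.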
Best Model fit:
We have built different models and identified which model gives the least
error on the test data set.
Based on the RMSE score, we need to build the chosen model on the whole
data in order to see the forecast.
The above table shows the RMSE for all the models we built.
The lowest errors come from the two Triple Exponential Smoothing models.
Alpha=0.12,Beta=0.049,Gamma=0.361,TripleExponentialSmoothing 404.418372
Alpha=0.4,Beta=0.1,Gamma=0.2,TripleExponentialSmoothing 317.434302
1.8 Stationary Check:
P < 0.05 = We reject the null Hypothesis.
P > 0.05 = We fail to reject the null Hypothesis.
The p-value is 0.601061, which is greater than 0.05. Hence we fail to
reject the null hypothesis.
We treat the Sparkling wine series as non-stationary, since we do not have
enough evidence to conclude that it is stationary.
After differencing, the Dickey-Fuller test gives a p-value of 0.000, which
is less than 0.05. Hence we reject the null hypothesis and conclude that
the differenced data is stationary.
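The decision rule and the differencing step can be sketched as follows. The null hypothesis of the Dickey-Fuller test is that the series has a unit root, i.e. is non-stationary. The function names are hypothetical, and the p-value itself is assumed to come from a statistics library; this sketch only interprets it.

```python
def adf_decision(p_value, significance=0.05):
    """Interpret a Dickey-Fuller p-value against the chosen significance level."""
    if p_value < significance:
        return "reject H0: the series is treated as stationary"
    return "fail to reject H0: the series is treated as non-stationary"

def first_difference(series):
    """Replace each value by its change from the previous value."""
    return [current - previous for previous, current in zip(series, series[1:])]

print(adf_decision(0.601061))  # p-value reported for the original series
print(adf_decision(0.000))     # p-value reported after differencing
```

Differencing removes a level or linear trend, which is why the test is repeated on the differenced series.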
The Akaike information criterion (AIC) value is derived for each model and
the model with the lowest AIC value is selected.
From the summary below we choose the order with the lowest AIC value,
which is p=2, d=1, q=2 for the Auto ARIMA model.
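The selection step can be sketched as below. AIC is defined as 2k - 2 ln(L), where k is the number of estimated parameters and L the maximised likelihood. The search ranges for p and q and the function names are assumptions for illustration; fitting each candidate model is left to a statistics library and is not shown.

```python
from itertools import product

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def candidate_orders(max_p=3, d=1, max_q=3):
    """All (p, d, q) combinations to try; the ranges are assumed, not from the report."""
    return [(p, d, q) for p, q in product(range(max_p + 1), range(max_q + 1))]

def pick_lowest_aic(aic_by_order):
    """Given {(p, d, q): aic_value}, return the order with the smallest AIC."""
    return min(aic_by_order, key=aic_by_order.get)
```

A lower AIC rewards a better fit while penalising extra parameters, so it does not always pick the largest model.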
Auto SARIMA:
From the above summary we choose the model with the lowest AIC value.
Table 19 - Auto SARIMA Summary
In the ACF plot we can see decay after lag 1 for the original as well as
the differenced data. Hence we select the q value to be 1, i.e. q=1.
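The bars in an ACF plot are sample autocorrelations. A minimal sketch of how they can be computed is shown below, together with the usual approximate 95% significance bound of 1.96 / sqrt(n). The function names are hypothetical and the series is assumed to be a plain list.

```python
from math import sqrt

def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean_value = sum(series) / n
    deviations = [x - mean_value for x in series]
    denominator = sum(d * d for d in deviations)
    return [sum(deviations[t] * deviations[t + k] for t in range(n - k)) / denominator
            for k in range(max_lag + 1)]

def significance_bound(n):
    """Approximate 95% bound; bars inside +/- this value are usually ignored."""
    return 1.96 / sqrt(n)
```

The lag after which the bars fall inside the bound is a common starting point for the q value.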
Manual SARIMA:
Table 21 - Manual SARIMA Summary
From the above table we can see that the lowest RMSE is achieved by Triple
Exponential Smoothing. Hence we will fit the TES model on the entire
dataset.
Table 23 - TES on entire data
RMSE value on whole dataset is 414.324
The above table gives the prediction, Lower CI and Upper CI for the next
12 months, from Aug 1995 to Jul 1996.
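A simple way to build such a table is sketched below. It assumes the interval is the forecast plus or minus 1.96 times the standard deviation of the in-sample residuals; the report does not state how its intervals were computed, so this formula and the function names are assumptions.

```python
from datetime import date
from statistics import pstdev

def next_months(year, month, count):
    """First day of `count` consecutive months starting at (year, month)."""
    months = []
    for i in range(count):
        total = (month - 1) + i
        months.append(date(year + total // 12, total % 12 + 1, 1))
    return months

def confidence_band(forecast, residuals, z=1.96):
    """Return (lower, prediction, upper) per forecast step (assumed formula)."""
    spread = z * pstdev(residuals)
    return [(f - spread, f, f + spread) for f in forecast]

forecast_months = next_months(1995, 8, 12)  # Aug 1995 to Jul 1996
```

With this formula the band has the same width at every step, which understates the uncertainty of forecasts further into the future.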
Figure 29 - Future forecast plot
*************************************
2.1 Wine - Rose Data:
Duplicate Check:
Info Check:
The Rose dataset has 1 Object type column and 1 Float type column.
Descriptive Summary:
Figure 30 - Plot before Time stamp
Let's drop the Year Month column, which is not required after converting it
into date format.
Figure 31 - Time series plot after Time stamp
- From the above year-wise sales analysis we can see that certain years have
outliers. We need not treat these outliers.
- Sales peaked during the years 1980 & 1981.
Figure 33 - Monthly sales across years
- Sales increase during the month of Dec across years, when compared to
other months.
- Holiday seasons show an increasing trend in sales, when compared to other
seasons.
Rose wine appears to be bought on all days of the week, irrespective of
the weekend.
Figure 35 - Empirical Cumulative Distribution plot
- The above Pivot table shows the monthly sales across years in detail.
- The cells coloured in green are the highest sales for that particular
month.
Figure 38 - Average and Percentage sales
- We can see that each year has a certain percentage drop in sales, with a
minimum of 40% and a maximum of more than 60%.
2.5 Decomposition:
Decomposition is used as a preprocessing step in time series analysis to
remove trend and seasonal effects, making it easier to identify and model
the underlying patterns or relationships in the data.
It can also be used for forecasting, anomaly detection, and other
applications in time series analysis and forecasting.
Additive Decompose:
From the plot below we can see that the trend peaks during 1981 and keeps
declining over the years.
Figure 39 - Additive decompose
Multiplicative Decompose:
- From the screenshots below we can see that the Train and Test split was
achieved as expected.
The green and red lines show the model's predictions against the actual
test and train values.
Naive Model:
The green line shows the model's predictions over the orange test area,
where the predicted values are far from the actual values.
Simple Average:
Moving Average:
The above plot shows the performance of different models on Test values.
Let's run tests with different Alpha values to find the best fit.
Alpha=0.1,Beta=0.1,DoubleExponentialSmoothing 36.923416
For Double Exponential Smoothing with Alpha 0.1 and Beta 0.1, the RMSE
value is 36.924.
The best fit Triple Exponential Smoothing uses Alpha 0.1, Beta 0.2 and
Gamma 0.1, where RMSE is 9.223504.
We have built different models and identified which model gives the least
error on the test data set.
Based on the RMSE score, we need to build the chosen model on the whole
data in order to see the forecast.
The above table shows the RMSE for all the models we built.
The lowest errors come from the two Triple Exponential Smoothing models.
We treat the Rose wine series as non-stationary, since we do not have
enough evidence to conclude that it is stationary.
The Akaike information criterion (AIC) value is derived for each model and
the model with the lowest AIC value is selected.
From the summary below we choose the order with the lowest AIC value,
which is p=2, d=1, q=3 for the Auto ARIMA model.
Table 43 - ARIMA - AIC Value
Auto SARIMA:
From the above summary we choose the model with the lowest AIC value.
Figure 56 - Auto SARIMA Diagnostic Plot
Figure 57 - ACF Plot
In the ACF plot we can see decay after lag 2 for the original as well as
the differenced data. Hence we select the q value to be 2, i.e. q=2.
Figure 58 - PACF Plot
In the PACF plot we can also see significant bars till lag 1 for the
differenced data, which is stationary in nature; past lag 2 the decay is
large enough. Hence we choose the p value to be 2, i.e. p=2, d=1. We have
seen earlier that the series is stationary with lag 2. The values selected
for manual ARIMA are p=2, d=1, q=2.
2.11 Manual ARIMA & SARIMA:
Manual ARIMA:
Manual SARIMA:
Selected Manual SARIMA is (2, 1, 2) (2, 1, 2, 12)
From the above table we can see that the lowest RMSE is achieved by Triple
Exponential Smoothing. Hence we will fit the TES model on the entire
dataset.
Table 49 - TES on entire data
The above table gives the prediction, Lower CI and Upper CI for the next
12 months, from Aug 1995 to Jul 1996.
Figure 60 - Future forecast plot
- People are likely to purchase wine during the festive season, and sales
drop during the peak winter season.
- Rose wine sales show a declining trend over the years, with 1980 as the
exception.
- We can expect Rose wine sales to drop further in future years too.
- Sparkling wine has been popular since 1980 and is expected to maintain
the same trend in future as well.
- Sales of Sparkling wine increase during weekends, while Rose wine is
purchased on all days of the week.
- The company can arrange a campaign during non-peak months.
- The company can consider issuing samples in stores and markets as an
advertising strategy.
- The company should also analyse the reasons for the lower popularity of
Rose wine and, if required, consider changes to the Rose wine production
and marketing strategy.
- Data can also be collected on the customers who purchase Rose wine and
analysed to find out what kind of people prefer Rose wine, so that changes
can be made to attract other customers too.
- Since Sparkling wine is already popular, we can give combo offers that
sell Rose wine and Sparkling wine together at a discount, to increase
sales of Rose wine along with Sparkling wine.