Recursive Chain Forecasting: A Hybrid Time Series Model - Blueocean MI
Bishal Neogi, Senior Data Scientist; Soumajyoti Mazumder, Data Scientist; Reetika Choudhary, Data Scientist; with inputs and guidance from Eron Kar, AVP, Advanced Analytics
www.blueoceanmi.com
ABSTRACT
A technology major wanted to replace a few of its existing brands with a single product that is easy to carry and occupies less space, sold through the major U.S. electronics and technology retailers in all states. In the course of the analysis, we came across the parent product (Product A), which had three existing variants. The company plans to replace all of them with a single branded product (Product B). Our methodology assesses the time-series sales data for a stipulated period using the conventional toolkits. What sets it apart is our hybrid methodology, which improves model precision beyond that of the standalone known techniques.
OBJECTIVE
To understand the sales trend of Product A and Product B over a period of time
To determine the effect of seasonality, which acts as a key influencer of Product A (and consequently Product B)
Data harmonization for reducing seasonal impact and carrying out an unbiased analysis over a period of time
To study the effect on the cross-sell unit volume of a product due to a change in the sales volume of Product A and/or Product B
To develop a final time series model to establish the unit sales volume for Product A and Product B
DATA
Our data set comprises a product and its sub-product, referred to as Product A and Product B respectively throughout the analysis. We have considered a bivariate monthly series for our analysis, running from July 2010 to March 2013.
Figure 1: Data Plot of Product A and Product B Sales (x-axis: Time)
TABLE 1
Variable      Product A   Product B
Description   Sales       Sales
As Figure 1 shows, Product A and Product B sales follow a similar pattern throughout the observed time period.
CHOICE OF MODEL
For monthly data, an additive model assumes that the difference between the values of any two distinct months remains approximately the same each year. In other words, the amplitude of the seasonal effect is the same every year. Based on this assumption, the graphical results, and the logical consideration of the extreme hypothetical situation in which sales reach zero, we chose the Additive model. For a clear picture, consider the two graphs below (Figure 2). On observation, the Additive model fits the given data best. Hence, we conclude that the Additive model is the most appropriate choice.
Figure 2: Product B plotted against Index and against Time (two panels)
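The additive assumption described above can be checked numerically as well as graphically. The sketch below is only an illustration and not the procedure used in this study: it compares how stable the yearly peak-to-trough range is in absolute terms (additive) versus relative to the yearly level (multiplicative). The function names and the synthetic series are assumptions introduced for the example.

```python
from statistics import mean, pstdev

def yearly_amplitudes(series, period=12):
    """Return (mean level, peak-to-trough range) for each complete year."""
    years = [series[i:i + period]
             for i in range(0, len(series) - period + 1, period)]
    return [(mean(y), max(y) - min(y)) for y in years]

def coefficient_of_variation(values):
    """Spread of the values relative to their mean."""
    return pstdev(values) / mean(values)

def suggest_model(series, period=12):
    """Heuristic: additive if the absolute range is the more stable quantity."""
    stats = yearly_amplitudes(series, period)
    ranges = [r for _, r in stats]
    ratios = [r / m for m, r in stats]
    if coefficient_of_variation(ranges) <= coefficient_of_variation(ratios):
        return "additive"
    return "multiplicative"

# Synthetic monthly data (hypothetical values, not the study's sales figures)
season = [0, 5, 10, 5, 0, -5, -10, -5, 0, 5, 10, 5]
additive_series = [100 + 2 * t + season[t % 12] for t in range(36)]
choice = suggest_model(additive_series)
```

A heuristic like this needs at least two complete years of data and should be read alongside the plots rather than instead of them.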
APPROACH 1:
Decomposition
Trend Analysis
Seasonality Indexing and Normalization
Stationarity Check and Granger Causality
Regression
DECOMPOSITION
To proceed, we first decompose the data into its components: trend ($T_t$), seasonality ($S_t$) and irregularity ($I_t$). The equation below summarizes the additive model, and the graphs that follow illustrate it.
$\text{Data}_t = T_t + S_t + I_t$
Figure 3.1: Decomposition for Product A Sales (decomposition of additive time series; panels: observed, trend, seasonal, random; x-axis: Time)
Figure 3.2: Decomposition for Product B Sales (same panels; x-axis: Time)
The graphs above (Figure 3.1 and Figure 3.2) show that the sales trend of both Product A and Product B increases linearly up to 2012, after which it flattens out. The seasonality of the two products is almost identical.
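For readers who want to reproduce an additive decomposition by hand, the following is a minimal sketch of the classical method: a centered moving average for the trend, month-wise averages of the detrended values for the seasonal index, and the remainder as the irregular component. It assumes monthly data with an even period of 12; the function names and the synthetic series are assumptions for illustration, not the study's data or code.

```python
from statistics import mean

def centered_moving_average(series, period=12):
    """2 x period centered moving average; None where the window does not fit.

    Assumes an even period (such as 12 for monthly data).
    """
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]
        # Half weight on both end points keeps an even-length average centered
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend

def decompose_additive(series, period=12):
    """Split a series into trend, seasonal and irregular parts (additive)."""
    trend = centered_moving_average(series, period)
    # Average the detrended values position by position to estimate S_t
    buckets = [[] for _ in range(period)]
    for t, (y, tr) in enumerate(zip(series, trend)):
        if tr is not None:
            buckets[t % period].append(y - tr)
    raw = [mean(b) for b in buckets]
    centre = mean(raw)
    seasonal_index = [s - centre for s in raw]  # normalized to sum to zero
    seasonal = [seasonal_index[t % period] for t in range(len(series))]
    irregular = [None if tr is None else y - tr - s
                 for y, tr, s in zip(series, trend, seasonal)]
    adjusted = [y - s for y, s in zip(series, seasonal)]  # seasonally adjusted
    return trend, seasonal, irregular, adjusted

# Synthetic 33-month series (hypothetical numbers): linear trend plus season
pattern = [-30, -20, -10, 0, 10, 20, 30, 20, 10, 0, -10, -20]
series = [1000 + 10 * t + pattern[t % 12] for t in range(33)]
trend, seasonal, irregular, adjusted = decompose_additive(series)
```

The `adjusted` series corresponds to the seasonally adjusted data that the later steps work with, under the assumption that the seasonal index is simply subtracted.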
TREND
Next, we plot the data of both products together to capture permanent changes over the given time period. The trend plot in Figure 4 helps us look for an overall pattern hidden in the data and, in the long run, helps us forecast future values.
Figure 4: Trend plot of Product A Sales and Product B Sales (x-axis: Time)
STATIONARITY CHECK
The stationarity check shows that both Product A and Product B are non-stationary to a significant degree. Hence, to make the series stationary, we apply second-order differencing. The test gives sufficient evidence that the seasonally adjusted, twice-differenced data is stationary at the 5% level of significance. After removing the non-stationarity in both Product A and Product B, we fit them in a linear regression framework.
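Second-order differencing simply applies the first difference twice. The short sketch below illustrates the operation on a synthetic series; the stationarity test itself is not shown, because the text does not name the test that was used. The function name `difference` is an assumption for this example.

```python
def difference(series, order=1):
    """Apply first differencing `order` times; each pass shortens the series by one."""
    out = list(series)
    for _ in range(order):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out

# A quadratic trend (synthetic) becomes constant after two differences
quadratic = [t * t for t in range(6)]        # 0, 1, 4, 9, 16, 25
twice_differenced = difference(quadratic, order=2)   # [2, 2, 2, 2]
```

Note that a series of 33 monthly observations leaves 31 values after second-order differencing.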
CAUSALITY
Since we are modelling bivariate time series data, we used the Granger causality test to check the causal dependence between the variables. The test gives us the following insights:
On checking causality from Product A to Product B, we see that Product A Granger-causes Product B up to order 3. The results stand valid up to 47% level of significance. However, the reverse does not hold.
On checking causality from Product B to Product A, we see that Product B does not Granger-cause Product A at the 5% level of significance.
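To make the idea behind the Granger test concrete, the sketch below computes its F-statistic from scratch: a restricted regression of the effect on its own lags is compared with an unrestricted regression that also includes lags of the candidate cause. This is an illustration under stated assumptions, not the implementation used in the study. The helper names, the synthetic data and the choice to stop at the F-statistic (no p-value is computed) are all assumptions.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def rss(rows, targets):
    """Residual sum of squares of an OLS fit (rows already hold a constant)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((y - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, y in zip(rows, targets))

def granger_f(cause, effect, lags=3):
    """F-statistic for 'cause Granger-causes effect' at the given lag order."""
    n = len(effect)
    targets = effect[lags:]
    restricted = [[1.0] + [effect[t - k] for k in range(1, lags + 1)]
                  for t in range(lags, n)]
    unrestricted = [row + [cause[t - k] for k in range(1, lags + 1)]
                    for row, t in zip(restricted, range(lags, n))]
    rss_r, rss_u = rss(restricted, targets), rss(unrestricted, targets)
    dof = len(targets) - (2 * lags + 1)
    return ((rss_r - rss_u) / lags) / (rss_u / dof)

# Synthetic stationary series (hypothetical): y follows x with a one-step lag
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0] + [0.8 * x[t - 1] + rng.gauss(0, 0.1) for t in range(1, 200)]
f_forward = granger_f(x, y, lags=3)   # x -> y, expected to be large
f_reverse = granger_f(y, x, lags=3)   # y -> x, expected to be small
```

In practice the statistic is compared against an F distribution with `lags` and `dof` degrees of freedom, and the test is run on the stationary (here, twice-differenced) series.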
LINEAR REGRESSION
To see whether the data fits the linear regression framework, we plot the adjusted data and obtain the graph in Figure 7:
Figure 7: $\Delta^2$ Product A plotted against $\Delta^2$ Product B
At a 0.93 level of significance, we conclude that the data fits the linear model well, and hence the derived model can be represented as below:
$\Delta^2\,\text{Product A}_t = \alpha + \beta \cdot \Delta^2\,\text{Product B}_t$
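A regression of this form, with twice-differenced Product A as the response and twice-differenced Product B as the single predictor, can be fitted with ordinary least squares in closed form. The sketch below is illustrative only: the names `alpha` and `beta` for the intercept and slope, the function name and the input values are assumptions, not figures from the study.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = alpha + beta * x; also returns R-squared."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    ss_res = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return alpha, beta, 1 - ss_res / ss_tot

# Hypothetical twice-differenced values, not the study's data
d2_product_b = [-2.0, -1.0, 0.0, 1.0, 2.0]
d2_product_a = [1.0 + 3.0 * v for v in d2_product_b]
alpha, beta, r_squared = fit_line(d2_product_b, d2_product_a)
```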
APPROACH 2:
MARS on complete data
Stationarity check
Fitting MARS on the normalized data
[Figure: Product A plotted against Index]
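MARS (multivariate adaptive regression splines) builds its fit from hinge functions of the form max(0, x - c) and max(0, c - x), where c is a knot chosen from the data. The text does not describe the fitted MARS model, so the sketch below only illustrates the building block: it searches for a single knot and fits one mirrored hinge term by least squares. A full MARS procedure adds and prunes many such terms. The function names and the synthetic series, which rises linearly and then flattens, are assumptions for illustration.

```python
from statistics import mean

def hinge(x, knot):
    """Mirrored hinge: positive before the knot, zero after it."""
    return max(0.0, knot - x)

def fit_one_hinge(xs, ys):
    """Fit y = a + b * max(0, knot - x), trying every interior x as the knot.

    Returns (sse, knot, a, b) for the knot with the smallest squared error.
    """
    best = None
    for knot in sorted(set(xs))[1:-1]:
        h = [hinge(x, knot) for x in xs]
        h_bar, y_bar = mean(h), mean(ys)
        shh = sum((v - h_bar) ** 2 for v in h)
        b = sum((v - h_bar) * (y - y_bar) for v, y in zip(h, ys)) / shh
        a = y_bar - b * h_bar
        sse = sum((y - a - b * v) ** 2 for v, y in zip(h, ys))
        if best is None or sse < best[0]:
            best = (sse, knot, a, b)
    return best

# Synthetic series (hypothetical numbers): linear rise, then flat from index 18
index = list(range(33))
sales = [154000.0 - 3000.0 * max(0, 18 - i) for i in index]
sse, knot, a, b = fit_one_hinge(index, sales)
```

A negative `b` on the mirrored hinge corresponds to a series that rises until the knot and is constant afterwards.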
MODEL VALIDATION
To validate our model selection process, we proceed to model validation. Model validation is necessary because, among other reasons, a high R-square alone is not sufficient to conclude goodness of fit. Hence, we conduct a few tests, such as residual analysis, and plot the normal Q-Q plot (Figure 10) to analyze the goodness of fit of the regression. We check whether the residuals are random and whether the model's predictive performance weakens substantially when we introduce new variables into the estimation process. Apart from this, we also apply standard methods to detect outliers among the residuals. In the residual plot below (Figure 9), the residuals scatter randomly around the line y = 0, which suggests that our model fits the data well. Further, autocorrelation tests have yielded 87% evidence that no autocorrelation exists between the residuals and time.
Figure 9: Residual plot (Residual Values against Predicted Values)
Figure 10: Normal Q-Q plot (Sample Quantiles against Theoretical Quantiles)
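The text does not name the autocorrelation test that was applied to the residuals. As one common choice, the sketch below computes the Durbin-Watson statistic together with the lag-1 autocorrelation; both the choice of statistic and the sample residuals are assumptions for illustration. A Durbin-Watson value near 2 is usually read as little first-order autocorrelation, while values towards 0 or 4 indicate positive or negative autocorrelation.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den

def lag1_autocorrelation(residuals):
    """Sample autocorrelation of the residuals at lag 1."""
    m = sum(residuals) / len(residuals)
    num = sum((a - m) * (b - m) for a, b in zip(residuals, residuals[1:]))
    den = sum((e - m) ** 2 for e in residuals)
    return num / den

# Hypothetical residuals that alternate in sign (strong negative autocorrelation)
alternating = [1.0, -1.0, 1.0, -1.0]
dw = durbin_watson(alternating)            # 3.0
rho = lag1_autocorrelation(alternating)    # -0.75
```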
FORECASTING
Using our final model as mentioned below, we now move ahead with forecasting sales for Product A, assuming that the sales for Product B are given and that the seasonal effect is known to us.
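As a rough sketch of how such a forecast could be chained, suppose the model is a regression of twice-differenced, seasonally adjusted Product A on twice-differenced, seasonally adjusted Product B. Each predicted second difference is then integrated back to the level of Product A using the two preceding values, and the known seasonal index is added at the end. This is an assumption about the form of the final model, not a statement of it; the function name, the parameters `alpha` and `beta`, and the numbers are hypothetical.

```python
def forecast_product_a(a_adj, b_adj, alpha, beta, seasonal_index, period=12):
    """Recursively forecast Product A for the periods where only B is known.

    a_adj: seasonally adjusted history of A (at least two values)
    b_adj: seasonally adjusted B, covering the history and the forecast horizon
    seasonal_index: one value per position in the cycle, aligned so that
                    position 0 of both series is position 0 of the cycle
    """
    a = list(a_adj)
    for t in range(len(a_adj), len(b_adj)):
        d2_b = b_adj[t] - 2 * b_adj[t - 1] + b_adj[t - 2]
        d2_a = alpha + beta * d2_b               # predicted second difference
        a.append(2 * a[t - 1] - a[t - 2] + d2_a)  # undo the second difference
    # Add the seasonal effect back to the adjusted forecasts
    return [a[t] + seasonal_index[t % period]
            for t in range(len(a_adj), len(b_adj))]

# Hypothetical inputs: two observed values of A, four of B, no seasonal effect
forecasts = forecast_product_a(
    a_adj=[10, 12], b_adj=[1, 2, 3, 5],
    alpha=0, beta=2, seasonal_index=[0] * 12,
)
```

Because each forecast feeds into the next one, errors accumulate over the horizon, so short horizons are the safer use of a recursion like this.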
Vrindavan Tech Village, Building 2A, Ground Floor, East Tower, Sarjapur Outer Ring Road, Bangalore, 560 037, INDIA. p: 91.80.41785800
4835 E Cactus Road, #300, Scottsdale, AZ 85254, USA. p: 602.441.2474
11280 Northup Way, #E200, Bellevue, WA 98005, USA