Time Series Forecasting
Introduction
A time series is a sequence of observations recorded over a period of time. A
simple example is the day-to-day or month-to-month change in temperature.
There are different methods to forecast future values. While forecasting time
series values, three important components need to be taken care of, and the
main task of time series forecasting is to forecast these three components.
1) Seasonality
Seasonality means that, in a given domain, the output value peaks in some
months compared to others. For example, if you observe the data of
tours-and-travels companies over the past 3 years, you can see that in
November and December the distribution is very high due to the holiday and
festival season. So while forecasting time series data we need to capture
this seasonality.
2) Trend
The trend is another important factor: it describes whether the time series
is steadily increasing or decreasing, that is, whether the value of the
organization or its sales, along with the seasonality, grows or shrinks over
a period of time.
3) Unexpected Events
Unexpected events are dynamic changes in an organization or in the market
that cannot be captured in advance. For example, during the current pandemic,
the Sensex and Nifty charts show a huge decrease in stock prices; this is an
unexpected event occurring in the surroundings. There are methods and
algorithms with which we can capture seasonality and trend, but unexpected
events occur dynamically, so capturing them is very difficult.
Rolling Statistics and Stationarity in Time-series
A stationary time series is one that has a constant mean and constant
variance. If I take the mean of T1 and T2 and compare it with the mean of T4
and T5, is it the same, and if different, how large is the difference? A
constant mean means this difference should be small, and the same goes for
the variance.
If the time series is not stationary, we have to make it stationary and then
proceed with modelling. Rolling statistics help us in making a time series
stationary: rolling statistics essentially calculate a moving average. To
calculate the moving average we need to define the window size, which is how
many past values are to be considered.
For example, if we take the window as 2, then at point T1 the moving average
is blank (there is no earlier value), at point T2 it is the mean of T1 and
T2, at point T3 the mean of T2 and T3, and so on. If, after calculating all
the moving averages, you plot them over the actual values, you can see that
the resulting line is smoother.
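To make this concrete, here is a minimal rolling-statistics sketch with
pandas. The series name and the window of 12 are illustrative assumptions;
if the rolling mean and rolling standard deviation stay roughly flat over
time, the series is close to stationary.

import pandas as pd
import matplotlib.pyplot as plt

# 'series' is any pandas Series; window=12 is an arbitrary illustrative choice
rolling_mean = series.rolling(window=12).mean()
rolling_std = series.rolling(window=12).std()

plt.plot(series, label='Original')
plt.plot(rolling_mean, label='Rolling mean')
plt.plot(rolling_std, label='Rolling std')
plt.legend()
plt.show()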
This is one method of making a time series stationary; there are other
methods as well, such as exponential smoothing, which we study next.
Additive and Multiplicative Time series
In the real world, we meet different kinds of time series data. To work with
them we must know the concept of exponential smoothing, and for that we first
need to study the two types of time series: additive and multiplicative. As
we studied, there are 3 components we need to capture: Trend (T),
Seasonality (S), and Irregularity (I).
An additive time series is the combination (addition) of trend, seasonality,
and irregularity, while a multiplicative time series is the multiplication of
these three terms.
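In symbols: an additive series is y(t) = T(t) + S(t) + I(t), while a
multiplicative series is y(t) = T(t) x S(t) x I(t). As a rough sketch,
statsmodels can decompose a series under either assumption (the series name
and period=12 are illustrative assumptions for monthly data):

from statsmodels.tsa.seasonal import seasonal_decompose

# 'series' is a pandas Series with a DatetimeIndex; period=12 assumes a
# yearly season in monthly data
additive = seasonal_decompose(series, model='additive', period=12)
multiplicative = seasonal_decompose(series, model='multiplicative', period=12)
additive.plot()  # panels: observed, trend, seasonal, residual
plt.show()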
Components of Time Series
Time series analysis provides a body of techniques to better understand a dataset.
Perhaps the most useful of these is the decomposition of a time series into 4
constituent parts:
1. Level. The baseline value for the series if it were a straight line.
2. Trend. The optional and often linear increasing or decreasing behavior of
the series over time.
3. Seasonality. The optional repeating patterns or cycles of behavior over time.
4. Noise. The optional variability in the observations that cannot be explained
by the model.
Time Series Models:
Model                    Use
Exponential Smoothing    Smooth-based model + exponential window function
TOP 10 ALGORITHMS
1. Autoregressive (AR)
2. Autoregressive Integrated Moving Average (ARIMA)
3. Seasonal Autoregressive Integrated Moving Average (SARIMA)
4. Exponential Smoothing (ES)
5. XGBoost
6. Prophet
7. LSTM (Deep Learning)
8. DeepAR
9. N-BEATS
Moving Averages
Moving averages alone aren’t that useful for forecasting. Instead, they are mainly
used for analysis. For example, moving averages help stock investors in technical
analysis by smoothing out the volatility of constantly changing prices. This way,
short-term fluctuations are reduced, and investors can get a more generalized
picture. Math-wise, simple moving averages are dead simple to implement:
MA_t = (x_t + x_(t-1) + ... + x_(t-s+1)) / s
where t represents the time period and s the size of the sliding window.
Let’s take a look at an example. x will represent a sample time series without the
time information, and we’ll calculate moving averages for sliding window sizes 2
and 3.
As you can see, the first data point is copied, as there's no data point
before it to calculate the average. Pandas won't copy the value by default,
but will instead return NaN; you can specify min_periods=1 inside the
rolling() function to avoid this behavior.
The calculations are dead simple, once again. The first value is copied, but the
second value is calculated like MA(2), since there isn’t enough data for a complete
calculation. You can use the rolling() function in Pandas to calculate
simple moving averages. The only parameter you should care about right now
is window, as it specifies the size of the sliding window. Read the previous
section once again if you don't know what it represents.
Calling rolling() by itself won’t do anything. You still need an aggregate function.
Since these are moving averages, sticking .mean() to the end of a call will do.
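For instance, a minimal sketch on a toy series (the numbers in x are made up
for illustration):

import pandas as pd

# A toy series standing in for the example's x
x = pd.Series([10, 12, 14, 13, 17, 16, 20])

ma2 = x.rolling(window=2, min_periods=1).mean()  # first value is just x[0]
ma3 = x.rolling(window=3, min_periods=1).mean()  # second value is an MA(2)
print(pd.DataFrame({'x': x, 'MA(2)': ma2, 'MA(3)': ma3}))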
Looking at the above visualization uncovers a couple of problems with simple
moving averages:
• Lag — moving average time series always lags from the original one.
Look at the peaks to verify that claim.
• Noise — a sliding window that is too small won't remove all noise from
the original data.
• Averaging issue — averaged data will never capture the low and high
points of the original series due to, well, averaging.
• Weighting — identical weights are assigned to all data points. This can
be an issue as frequently the most recent values have more impact on the
future.
Pandas includes a method to compute the exponential moving average (EMA) of
any time series: .ewm(). Will this method meet our needs and compute an
average that matches our definition?
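A minimal sketch, reusing the toy x from above (span=3 is an arbitrary
choice):

# Exponentially weighted moving average; adjust=False gives the recursive
# form EMA_t = alpha*x_t + (1 - alpha)*EMA_{t-1}, with alpha = 2/(span + 1)
ema3 = x.ewm(span=3, adjust=False).mean()
print(pd.DataFrame({'x': x, 'EMA(3)': ema3}))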
Models with low values of beta assume that the trend changes very slowly
over time, while models with larger values of beta assume that the trend
changes very rapidly.
The equations vary depending on whether the model is additive or
multiplicative. Honestly, one needs to put in significant effort to
understand the math behind these equations if you want to go deeper into the
intuition behind them. Here s is the season length, i.e. the number of data
points after which a new season begins.
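For reference, one common form of the additive Holt-Winters (triple
exponential smoothing) updates, with level l, trend b, seasonal component c,
season length s, and smoothing parameters alpha, beta, gamma, is:

l_t = alpha*(y_t - c_(t-s)) + (1 - alpha)*(l_(t-1) + b_(t-1))
b_t = beta*(l_t - l_(t-1)) + (1 - beta)*b_(t-1)
c_t = gamma*(y_t - l_t) + (1 - gamma)*c_(t-s)

In practice you rarely implement these by hand; a minimal fitting sketch
with statsmodels (the series name and season length 12 are assumptions):

from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Additive Holt-Winters; seasonal_periods=12 assumes monthly data
hw = ExponentialSmoothing(series, trend='add', seasonal='add',
                          seasonal_periods=12).fit()
print(hw.forecast(12))  # forecast one full season ahead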
Autoregression Model
Before you learn what autoregression is, let's recall what a regression model is.
For example, you have the stock prices of Bank of America (referred to as
BAC) and the stock prices of J.P. Morgan (referred to as JPM).
• Now you want to predict the stock price of JPM based on the stock price
of BAC.
• Here, the stock price of JPM would become the dependent variable, y
and the stock price of BAC would be the independent variable X.
• Assuming that there is a linear relationship between X and y, the
regression model estimating the relationship is of the form
y = mX + c
where m is the slope of the equation and c is the constant.
Now, let’s suppose you only have one series of data, say, the stock price of
JPM. Instead of using a second time series (the stock price of BAC), the current
value of the stock price of JPM will be estimated based on the past stock price
of JPM. Let the observation at any point in time t be denoted as yt. You can
estimate the relationship between the value at any point t, yt, and the value
at the previous point (t-1), yt−1, using the regression model below.
AR(1): yt = Φ1yt−1 + c
where Φ1 is the parameter of the model and c is the constant. This is the
autoregression model of order 1. The term autoregression means regression of
a variable against its own past values.
Like the linear regression model, the autoregression model assumes that there
is a linear relationship between yt and yt-1. This is called the autocorrelation.
You will be learning more about this later.
Let us have a look at other orders of the autoregression model.
AR(2): yt = Φ1yt−1 + Φ2yt−2 + c
where 𝚽1 and 𝚽2 are the parameters of the model and c is the constant. This is
the autoregression model of order 2. Here yt is assumed to have a correlation
with its past two values and is predicted as a linear combination of past 2 values.
This model can be generalised to order p as below.
AR(p): yt = Φ1yt−1 + Φ2yt−2 + ... + Φpyt−p + c
where Φ1, Φ2, ..., Φp are the parameters of the model and c is the
constant. This is the autoregression model of order p. Here yt is assumed to
have a correlation with its past p values and is predicted as a linear combination
of past p values.
There are a number of ways available to compute the parameters Φ1, ..., Φp.
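For instance, statsmodels can estimate them by least squares; a minimal
sketch for an AR(2) fit (the series name is an assumption):

from statsmodels.tsa.ar_model import AutoReg

# Fit an AR(2): yt = c + Φ1*yt-1 + Φ2*yt-2 + noise
ar2 = AutoReg(series, lags=2).fit()
print(ar2.params)  # the constant c and the two lag coefficients
print(ar2.predict(start=len(series), end=len(series) + 4))  # 5 steps ahead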
Just like the correlation coefficient, autocorrelation also measures the degree
of relationship between a variable’s current value and its past values. The value
of autocorrelation lies between -1 and +1.
What is autocovariance?
First, you need to understand what covariance and correlation are. Remember
that covariance is applied to 2 assets. The autocovariance is the same as the
covariance.
The only difference is that the autocovariance is applied to the same asset, i.e.,
you compute the covariance of the asset price return X with the same asset price
return X, but from a previous period.
In symbols, the autocovariance at lag k is
γ(k) = Cov(rt, rt−k) = E[(rt − μ)(rt−k − μ)]
and the autocorrelation at lag k is
ρ(k) = γ(k) / Var(rt)
where rt is the asset price return at time t and μ is its mean return.
Why variance and not the multiplication of the standard deviations of the
returns at the different lags? Well, you must remember that an ARMA model is
applied to stationary time series (this topic belongs to time series
analysis). Consequently, it is assumed that the price returns, if
stationary, have the same variance at any lag, i.e.:
σ(rt) = σ(rt−k) for every lag k, so σ(rt)·σ(rt−k) = Var(rt)
What are the autocovariance and autocorrelation at lag zero?
Interesting question and simple to answer! Let's see first for the former:
γ(0) = Cov(rt, rt) = Var(rt)
Since we know, from the above algebraic calculation, that the covariance of
a variable with itself is its variance, we have the following:
ρ(0) = γ(0) / Var(rt) = 1
Consequently, the autocorrelation function for any asset price return at lag
0 is always 1.
Let’s suppose we want to compute the autocovariance at lag 1. You will need
the returns up to day 10, and the 1-period lagged returns up to day 9.
Thus, you have the following data structure for returns on days 10 and 9:
Variable   Day 1  Day 2  Day 3  Day 4  Day 5  Day 6  Day 7  Day 8  Day 9  Day 10
rt           5%    -1%     2%    -3%     4%     6%    -2%     1%     3%    -4%
rt−1          -     5%    -1%     2%    -3%     4%     6%    -2%     1%     3%
If you paid attention to details, you can see that the average return used
is the same for both series, in our case, for returns up to day 10 and up to
day 9. As we explained before, autocovariance and autocorrelation functions
are applied only to stationary time series. Consequently, not only the
variance but also the mean is a single value for the whole span. That's why
the mean is the same at any lag of the price returns.
The mean of the Microsoft price returns is 1.1%. Let's follow the procedure
to compute the autocovariance: the autocovariance is just the sum of the
products of the lag-0 and lag-1 deviations from this mean, divided by
N−1 = 8, which results in -0.046%.
Let’s follow the algebraic formulas and use the numbers to compute the
autocorrelation:
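A minimal numpy check of both numbers, using the returns from the table
above (the printed values are approximate):

import numpy as np

# Returns for Days 1-10 from the table, as decimals
r = np.array([0.05, -0.01, 0.02, -0.03, 0.04, 0.06, -0.02, 0.01, 0.03, -0.04])

mu = r.mean()                               # 0.011, i.e. the 1.1% mean
dev = r - mu
autocov1 = (dev[1:] * dev[:-1]).sum() / 8   # the 9 products over N-1 = 8
autocorr1 = autocov1 / r.var(ddof=1)        # divide by the lag-0 variance
print(autocov1)    # ~ -0.00045, the -0.046% above up to rounding
print(autocorr1)   # ~ -0.37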
This means that values up to lag 20 are statistically significant, that is,
they affect the current price. Also, the autocorrelation gradually
approaches zero as the lag increases: the farther back we go, the smaller
the correlation. From the above plot you can see that lags 1, 2, 3, 4, etc.
are outside the confidence band (blue region) and hence are statistically
significant.
Finally, you will learn to estimate the parameters of the autoregression models.
But before that, let’s see if there are any challenges.
Any ‘non-seasonal’ time series that exhibits patterns and is not a random white
noise can be modeled with ARIMA models.
If a series is not stationary, the most common approach is to difference it:
subtract the previous value from the current value. Sometimes, depending on
the complexity of the series, more than one round of differencing may be
needed, as the sketch below shows.
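In pandas, differencing is a single call; a minimal sketch (the series name
is an assumption):

# First difference: current value minus previous value
diff1 = series.diff().dropna()

# Second-order difference, if one round is not enough
diff2 = series.diff().diff().dropna()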
‘p’ is the order of the ‘Auto Regressive’ (AR) term. It refers to the number of
lags of Y to be used as predictors. And ‘q’ is the order of the ‘Moving Average’
(MA) term. It refers to the number of lagged forecast errors that should go into
the ARIMA Model.
AR and MA models
So what are AR and MA models? What is the actual mathematical formula behind
them?
A pure Auto Regressive (AR only) model is one where Yt depends only on its
own lags. That is, Yt is a function of the 'lags of Yt':
Yt = α + β1Yt−1 + β2Yt−2 + ... + βpYt−p + εt
where Yt−1 is lag 1 of the series, β1 is the coefficient of lag 1 that the
model estimates, and α is the intercept term, also estimated by the model.
Likewise, a pure Moving Average (MA only) model is one where Yt depends only
on the lagged forecast errors:
Yt = α + εt + φ1εt−1 + φ2εt−2 + ... + φqεt−q
where the error terms ε are the errors of the autoregressive models of the
respective lags. That is, εt and εt−1 are the residuals from regressing Yt
and Yt−1, respectively, on their own lags.
But you need to be careful not to over-difference the series, because an
over-differenced series may still be stationary, which in turn will affect
the model parameters.
If the autocorrelations are positive for a large number of lags (10 or
more), then the series needs further differencing. On the other hand, if the
lag 1 autocorrelation itself is too negative, then the series is probably
over-differenced.
In the event, you can’t really decide between two orders of differencing, then
go with the order that gives the least standard deviation in the differenced
series.
Let’s see how to do it with an example.
First, I am going to check if the series is stationary using the Augmented
Dickey Fuller test (adfuller()), from the statsmodels package.
Why?
The null hypothesis of the ADF test is that the time series is non-stationary. So,
if the p-value of the test is less than the significance level (0.05) then you reject
the null hypothesis and infer that the time series is indeed stationary.
So, in our case, if the p-value > 0.05 we go ahead with finding the order of
differencing. Since the p-value is greater than the significance level,
let's difference the series and see what the autocorrelation plot looks
like.
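A minimal sketch of the test, assuming the same df with a 'value' column
that the later code in this section uses:

from statsmodels.tsa.stattools import adfuller

result = adfuller(df.value.dropna())
print('ADF Statistic:', result[0])
print('p-value:', result[1])  # > 0.05 here, so we proceed to differencing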
Order of Differencing
For the above series, the time series reaches stationarity with two orders
of differencing. But looking at the autocorrelation plot for the 2nd
differencing, the lag goes into the far negative zone fairly quickly, which
indicates the series might have been over-differenced.
So, I am going to tentatively fix the order of differencing as 1 even though
the series is not perfectly stationary (weak stationarity).
Find the order of the AR term (p)
The next step is to identify if the model needs any AR terms. You can find out
the required number of AR terms by inspecting the Partial Autocorrelation
(PACF) plot.
The partial autocorrelation at lag k of a series is the coefficient of that
lag in the autoregression equation of Y:
Yt = α0 + α1Yt−1 + α2Yt−2 + α3Yt−3 + ...
That is, if Yt is the current series and Yt−1 is lag 1 of Y, then the
partial autocorrelation of lag 3 (Yt−3) is the coefficient α3 of Yt−3 in the
above equation.
The ACF tells how many MA terms are required to remove any autocorrelation
in the stationarized series.
Let’s see the autocorrelation plot of the differenced series.
Order of MA Term
A couple of lags are well above the significance line. So, let's tentatively
fix q as 2. When in doubt, go with the simpler model that sufficiently
explains the Y.
# Fit ARIMA(1,1,2) per the p, d, q found above (older statsmodels used .fit(disp=0))
from statsmodels.tsa.arima.model import ARIMA
model_fit = ARIMA(df.value, order=(1, 1, 2)).fit()
print(model_fit.summary())
(ARIMA(1,1,2) model summary output, including the coefficients and Roots
tables, omitted)
The model summary reveals a lot of information. The table in the middle is the
coefficients table where the values under ‘coef’ are the weights of the
respective terms.
Notice here that the coefficient of the MA2 term is close to zero and the
p-value in the 'P>|z|' column is highly insignificant. It should ideally be
less than 0.05 for the respective term to be significant.
# Rebuild without the insignificant MA2 term, i.e. ARIMA(1,1,1)
model_fit = ARIMA(df.value, order=(1, 1, 1)).fit()
print(model_fit.summary())
(ARIMA(1,1,1) model summary output with the coef, std err, z, P>|z|, and
[0.025, 0.975] columns, followed by the Roots table, omitted)
The model AIC has reduced, which is good. The P Values of the AR1 and MA1
terms have improved and are highly significant (<< 0.05).
Let’s plot the residuals to ensure there are no patterns (that is, look for constant
mean and variance).
residuals = pd.DataFrame(model_fit.resid)
fig, ax = plt.subplots(1, 2)
residuals.plot(title="Residuals", ax=ax[0])
residuals.plot(kind='kde', title='Density', ax=ax[1])  # density panel
plt.show()
Residuals Density
The residual errors seem fine with near zero mean and uniform variance. Let’s
plot the actuals against the fitted values using plot_predict().
# Actual vs Fitted; plot_predict is a results-object method in older
# statsmodels (newer versions: statsmodels.graphics.tsaplots.plot_predict)
model_fit.plot_predict(dynamic=False)
plt.show()
Actual vs Fitted
When you set dynamic=False the in-sample lagged values are used for
prediction.
That is, the model gets trained up until the previous value to make the next
prediction. This can make the fitted forecast and actuals look artificially good.
So, we seem to have a decent ARIMA model. But is that the best?
Can’t say that at this point because we haven’t actually forecasted into the
future and compared the forecast with the actual performance.
So, the real validation you need now is the Out-of-Time cross-validation.
How to find the optimal ARIMA model manually using out-of-time
cross-validation
In out-of-time cross-validation, you take a few steps back in time and
forecast into the future for as many steps as you took back. Then you
compare the forecast against the actuals. You can't use a random train/test
split here, because the order sequence of the time series must be kept
intact in order to use it for forecasting.
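So instead of a random split, hold out the tail of the series; a minimal
sketch (splitting at observation 85 is an arbitrary illustrative choice):

# Out-of-time split: train on the head, validate on the held-out tail
train = df.value[:85]
test = df.value[85:]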
Forecast vs Actuals
From the chart, the ARIMA(1,1,1) model seems to give a directionally correct
forecast. And the actual observed values lie within the 95% confidence band.
That seems fine.
But each of the predicted forecasts is consistently below the actuals. That
means, by adding a small constant to our forecast, the accuracy will certainly
improve. So, there is definitely scope for improvement.
While doing this, I keep an eye on the p-values of the AR and MA terms in
the model summary. They should be as close to zero as possible, ideally less
than 0.05.
# Build Model: ARIMA(1,1,1) on the training set
model = ARIMA(train, order=(1, 1, 1))
fitted = model.fit()
print(fitted.summary())

# Forecast as many steps as the held-out test set, with 95% bounds
result = fitted.get_forecast(len(test))
fc = result.predicted_mean.values
conf = result.conf_int(alpha=0.05)
fc_series = pd.Series(fc, index=test.index)
lower_series = pd.Series(conf.iloc[:, 0].values, index=test.index)
upper_series = pd.Series(conf.iloc[:, 1].values, index=test.index)

# Plot
plt.figure(figsize=(12,5), dpi=100)
plt.plot(train, label='training')
plt.plot(test, label='actual')
plt.plot(fc_series, label='forecast')
plt.fill_between(lower_series.index, lower_series, upper_series,
                 color='k', alpha=.15)
plt.title('Forecast vs Actuals')
plt.legend(loc='upper left')
plt.show()
(Revised model summary, abridged: const coef 0.0483, std err 0.084, z 0.577,
P>|z| 0.565, 95% CI [-0.116, 0.212]; remaining coefficient rows and the
Roots table omitted)
Revised Forecast vs Actuals
The AIC has reduced to 440 from 515. Good. The p-values of the X terms are
less than 0.05, which is great.
Ideally, you should go back multiple points in time, like, go back 1, 2, 3 and 4
quarters and see how your forecasts are performing at various points in the year.
Here’s a great practice exercise: Try to go back 27, 30, 33, 36 data points and
see how the forcasts performs. The forecast performance can be judged using
various accuracy metrics discussed next.
Because only the above three are percentage errors that vary between 0 and
1, you can judge how good the forecast is irrespective of the scale of the
series.
The other error metrics are quantities. That implies an RMSE of 100 for a
series whose mean is in the 1000's is better than an RMSE of 5 for a series
in the 10's. So, you can't really use them to compare the forecasts of two
differently scaled time series.
# Accuracy metrics
def forecast_accuracy(forecast, actual):
    mape = np.mean(np.abs(forecast - actual) / np.abs(actual))  # MAPE
    me = np.mean(forecast - actual)                             # ME
    mae = np.mean(np.abs(forecast - actual))                    # MAE
    mpe = np.mean((forecast - actual) / actual)                 # MPE
    rmse = np.mean((forecast - actual)**2)**.5                  # RMSE
    corr = np.corrcoef(forecast, actual)[0, 1]                  # correlation
    mins = np.amin(np.hstack([forecast[:,None],
                              actual[:,None]]), axis=1)
    maxs = np.amax(np.hstack([forecast[:,None],
                              actual[:,None]]), axis=1)
    minmax = 1 - np.mean(mins/maxs)                             # minmax error
    return {'mape': mape, 'me': me, 'mae': mae, 'mpe': mpe,
            'rmse': rmse, 'corr': corr, 'minmax': minmax}

forecast_accuracy(fc, test.values)
Around 2.2% MAPE implies the model is about 97.8% accurate in predicting
the next 15 observations.
import pmdarima as pm

df = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/wwwusage.csv',
                 names=['value'], header=0)

# auto_arima searches p, d, q for us; the options below are the ones
# preserved in the text, with the series argument reconstructed
model = pm.auto_arima(df.value,
                      seasonal=False,  # No Seasonality
                      start_P=0,
                      D=0,
                      trace=True,
                      error_action='ignore',
                      suppress_warnings=True,
                      stepwise=True)

print(model.summary())
#> (stepwise search trace and the best model's summary output, including the
#> coefficients and Roots tables, omitted)
model.plot_diagnostics(figsize=(7,5))
plt.show()
Residuals Chart
So how to interpret the plot diagnostics?
Top left: The residual errors seem to fluctuate around a mean of zero and have
a uniform variance.
Top right: The density plot suggests a normal distribution with mean zero.
Bottom left: All the dots should fall perfectly in line with the red line.
Any significant deviations would imply the distribution is skewed.
Bottom right: The correlogram, a.k.a. the ACF plot, shows the residual
errors are not autocorrelated. Any autocorrelation would imply that there is
some pattern in the residual errors that is not explained by the model, so
you would need to add more predictors (X's) to the model.
# Forecast
n_periods = 24
fc, confint = model.predict(n_periods=n_periods, return_conf_int=True)
index_of_fc = np.arange(len(df.value), len(df.value) + n_periods)

# Series for plotting
fc_series = pd.Series(fc, index=index_of_fc)
lower_series = pd.Series(confint[:, 0], index=index_of_fc)
upper_series = pd.Series(confint[:, 1], index=index_of_fc)

# Plot
plt.plot(df.value)
plt.plot(fc_series, color='darkgreen')
plt.fill_between(lower_series.index,
                 lower_series,
                 upper_series,
                 color='k', alpha=.15)
plt.title("Final Forecast of WWW Usage")
plt.show()
If your model has well defined seasonal patterns, then enforce D=1 for a given
frequency ‘x’.
As a general rule, set the model parameters such that D never exceeds one. And
the total differencing ‘d + D’ never exceeds 2. Try to keep only either SAR or
SMA terms if your model has seasonal components.
data = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                   parse_dates=['date'], index_col='date')

# Plot
fig, axes = plt.subplots(2, 1, sharex=True)

# Usual Differencing (lag 1)
axes[0].plot(data.value.diff(1))
axes[0].set_title('Usual Differencing')

# Seasonal Differencing (lag 12, assuming monthly data)
axes[1].plot(data.value.diff(12), color='green')
axes[1].set_title('Seasonal Differencing')
plt.show()
Seasonal Differencing
As you can clearly see, the seasonal spikes are intact after applying the
usual (lag 1) differencing, whereas they are rectified after seasonal
differencing.
import pmdarima as pm

# Seasonal model; m=12 and D=1 reconstruct the seasonality advice above
smodel = pm.auto_arima(data, test='adf',
                       start_P=0, seasonal=True,
                       m=12, D=1, trace=True,
                       error_action='ignore',
                       suppress_warnings=True,
                       stepwise=True)

smodel.summary()
(...TRUNCATED...)
The best model SARIMAX(3, 0, 0)x(0, 1, 1, 12) has an AIC of 528.6 and the P
Values are significant.
Let’s forecast for the next 24 months.
# Forecast
n_periods = 24
fitted, confint = smodel.predict(n_periods=n_periods, return_conf_int=True)
index_of_fc = pd.date_range(data.index[-1], periods=n_periods, freq='MS')  # monthly

# Series for plotting
fitted_series = pd.Series(fitted, index=index_of_fc)
lower_series = pd.Series(confint[:, 0], index=index_of_fc)
upper_series = pd.Series(confint[:, 1], index=index_of_fc)

# Plot
plt.plot(data)
plt.plot(fitted_series, color='darkgreen')
plt.fill_between(lower_series.index,
                 lower_series,
                 upper_series,
                 color='k', alpha=.15)
plt.show()
There you have a nice forecast that captures the expected seasonal demand
pattern.
But for the sake of completeness, let's try to force an external predictor,
also called an 'exogenous variable', into the model. This model is called
the SARIMAX model.
The only requirement to use an exogenous variable is you need to know the
value of the variable during the forecast period as well.
For the sake of demonstration, I am going to use the seasonal index from
the classical seasonal decomposition on the latest 36 months of data.
Why the seasonal index? Isn't SARIMA already modeling the seasonality, you
ask? It is, but I also want to see how the model looks if we force the
recent seasonality pattern into the training and forecast.
Secondly, this is a good variable for demonstration purposes, so you can use
this as a template and plug any of your variables into the code. The
seasonal index is a good exogenous variable because it repeats every
frequency cycle, 12 months in this case.
So, you will always know what values the seasonal index will hold for the
future forecasts.
# Import Data
data = pd.read_csv('https://raw.githubusercontent.com/selva86/datasets/master/a10.csv',
                   parse_dates=['date'], index_col='date')
Let’s compute the seasonal index so that it can be forced as a (exogenous)
predictor to the SARIMAX model.
from statsmodels.tsa.seasonal import seasonal_decompose

# Multiplicative decomposition on the latest 36 months, as stated above
result_mul = seasonal_decompose(data['value'][-36:], model='multiplicative',
                                extrapolate_trend='freq')
seasonal_index = result_mul.seasonal[-12:].rename('value').to_frame()
seasonal_index['month'] = pd.to_datetime(seasonal_index.index).month
data['month'] = data.index.month
df = pd.merge(data, seasonal_index, how='left', on='month')
df.columns = ['value', 'month', 'seasonal_index']
The exogenous variable (seasonal index) is ready. Let’s build the SARIMAX
model.
import pmdarima as pm

# SARIMAX Model; exogenous= is the argument for external predictors in
# older pmdarima releases (newer versions call it X=)
sxmodel = pm.auto_arima(df[['value']], exogenous=df[['seasonal_index']],
                        start_p=1, start_q=1,
                        test='adf',
                        start_P=0, seasonal=True,
                        m=12, D=1, trace=True,
                        error_action='ignore',
                        suppress_warnings=True,
                        stepwise=True)

sxmodel.summary()
So, we have the model with the exogenous term. But the coefficient is very
small for x1, so the contribution from that variable will be negligible. Let’s
forecast it anyway.
We have effectively forced the latest seasonal effect of the latest 3 years into
the model instead of the entire history.
Alright let’s forecast into the next 24 months. For this, you need the value of
the seasonal index for the next 24 months.
# Forecast
n_periods = 24
fitted, confint = sxmodel.predict(n_periods=n_periods,
                                  exogenous=np.tile(seasonal_index.value, 2).reshape(-1,1),
                                  return_conf_int=True)
index_of_fc = pd.date_range(data.index[-1], periods=n_periods, freq='MS')  # monthly

# Series for plotting
fitted_series = pd.Series(fitted, index=index_of_fc)
lower_series = pd.Series(confint[:, 0], index=index_of_fc)
upper_series = pd.Series(confint[:, 1], index=index_of_fc)

# Plot
plt.plot(data['value'])
plt.plot(fitted_series, color='darkgreen')
plt.fill_between(lower_series.index,
                 lower_series,
                 upper_series,
                 color='k', alpha=.15)
plt.show()