
SKFORECAST:
FORECASTING TIME SERIES WITH SCIKIT-LEARN MODELS
Joaquín Amat Rodrigo
Javier Escobar Ortiz

Skforecast: GitHub Skforecast: Docs www.cienciadedatos.net


BEHIND THE SCENES

Javier Escobar Ortiz


Data Scientist @IKEA
LinkedIn
javier.escobar.ortiz@gmail.com

Joaquín Amat Rodrigo


Data Scientist @Veeva Systems
LinkedIn
j.amatrodrigo@gmail.com



INDEX
● Time series
● Forecasting
● Prediction horizon: single-step and multi-step
○ Recursive multi-step models
○ Direct multi-step models
● Validation of forecasting models
● Hyperparameter tuning
● Prediction intervals
● Multi-series and multivariate models
● Other features of Skforecast
● Additional material



TIME SERIES

A time series is a sequence of data arranged chronologically and spaced at equal or irregular
intervals.





FORECASTING
The forecasting process consists of predicting the future value of a time series, either by modeling the
series solely based on its past behavior (autoregressive) or by incorporating other external variables.

To create a forecasting model, historical data is used to obtain a mathematical representation capable
of predicting future values. This approach rests on the assumption that the future behavior of a
phenomenon can be explained by its past behavior.

In reality, this assumption rarely holds, or at least not entirely.



FORECASTING APPROACHES

Statistical-econometric models:
● Autoregression (AR)
● Moving Average (MA)
● Autoregressive Moving Average (ARMA)
● Autoregressive Integrated Moving Average (ARIMA)
● Simple Exponential Smoothing (SES)
● Holt-Winters Exponential Smoothing (HWES)

Machine learning models:
● Lasso
● Ridge
● Random Forest
● Gradient Boosting
● SVM
● Neural networks
○ LSTM

Prediction strategies:
● Single-step models
● Recursive multi-step models
● Direct multi-step models



PREDICTION
Single-step
The goal is to predict only the next value in the series, t+1.

Multi-step
The goal is to predict the next n values in the series.

● Recursive multi-step strategy

● Direct multi-step strategy


SINGLE-STEP PREDICTION
The goal is to predict only the next value in the series.

The observed values T-n … T-4, T-3, T-2, T-1 are used as predictors (lags). The model for step +1
takes these lags and returns the predicted value T+1.
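To make the idea of lags as predictors concrete, here is a minimal NumPy sketch (not skforecast's internal code) that turns a series into a training matrix: each row holds the previous `n_lags` values, most recent first, and the target is the value that follows them. The helper name `create_lag_matrix` is hypothetical.

```python
import numpy as np

def create_lag_matrix(y, n_lags):
    """Turn a 1-D series into (X, target) for single-step training.

    Row i of X holds the n_lags values preceding target[i], ordered from
    the most recent (lag 1) to the oldest (lag n_lags).
    """
    y = np.asarray(y, dtype=float)
    n_rows = len(y) - n_lags
    if n_rows <= 0:
        raise ValueError("Series is too short for the requested number of lags.")
    X = np.column_stack([
        y[n_lags - lag : len(y) - lag]  # values lagged by `lag` steps
        for lag in range(1, n_lags + 1)
    ])
    target = y[n_lags:]  # value to predict for each row
    return X, target

X, target = create_lag_matrix([10, 11, 12, 13, 14, 15], n_lags=3)
print(X)       # rows: [12 11 10], [13 12 11], [14 13 12]
print(target)  # [13 14 15]
```

Any scikit-learn regressor could then be fitted on `X` and `target` to obtain a model for step +1.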


RECURSIVE MULTI-STEP PREDICTION

The observed values T-n … T-1 are used as predictors (lags); T+1, T+2 and T+3 are predicted values.
● Prediction step 1: the model for step +1 uses T-n … T-1 to predict T+1.
● Prediction step 2: T+1 is appended to the lags, and the same model predicts T+2.
● Prediction step 3: T+1 and T+2 are used as lags, and the same model predicts T+3.
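The recursive strategy can be sketched in a few lines. This illustrative version assumes a linear autoregressive model fitted with NumPy least squares as a stand-in for a scikit-learn regressor; the function names `fit_linear_ar` and `recursive_forecast` are hypothetical.

```python
import numpy as np

def fit_linear_ar(y, n_lags):
    """Fit y[t] = c + w_1*y[t-1] + ... + w_p*y[t-p] by least squares.

    A linear model fitted with NumPy stands in here for any scikit-learn regressor.
    """
    y = np.asarray(y, dtype=float)
    lags = np.column_stack([y[n_lags - k : len(y) - k] for k in range(1, n_lags + 1)])
    X = np.column_stack([np.ones(len(lags)), lags])  # intercept column + lags
    coef, *_ = np.linalg.lstsq(X, y[n_lags:], rcond=None)
    return coef

def recursive_forecast(coef, last_window, steps):
    """Recursive strategy: a single one-step model, with each prediction fed
    back into the window as the newest lag for the next step."""
    n_lags = len(coef) - 1
    window = [float(v) for v in last_window]  # ordered oldest -> newest
    predictions = []
    for _ in range(steps):
        lags = np.array(window[-n_lags:])[::-1]  # most recent value first (lag 1)
        y_hat = coef[0] + np.dot(coef[1:], lags)
        predictions.append(y_hat)
        window.append(y_hat)  # the prediction becomes a predictor
    return np.array(predictions)

y = np.arange(20, dtype=float)  # toy series with a linear trend: 0, 1, ..., 19
coef = fit_linear_ar(y, n_lags=3)
predictions = recursive_forecast(coef, last_window=y[-3:], steps=3)
print(predictions)  # close to [20, 21, 22]
```

Because predicted values are reused as inputs, any error made at one step is carried into the following steps.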



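ForecasterAutoreg

# Libraries
# ==============================================================================
from sklearn.ensemble import RandomForestRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg

# Create and fit forecaster
# (data_train is assumed to be a pandas DataFrame with the target series in column 'y')
# ==============================================================================
forecaster = ForecasterAutoreg(
    regressor        = RandomForestRegressor(random_state=123),
    lags             = 12,    # the last 12 values are used as predictors
    transformer_y    = None,  # transformer for series y
    transformer_exog = None,  # transformer for exogenous variables
    weight_func      = None   # function to create weights based on an index
)

forecaster.fit(y=data_train['y'], exog=None)

# Prediction
# ==============================================================================
steps = 36  # steps to be predicted
predictions = forecaster.predict(steps=steps, last_window=None)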



ForecasterAutoregCustom
# Custom function to create predictors (row by row)
# ==============================================================================
import numpy as np

def create_predictors(y):  # series y as input
    """
    Create the first 10 lags of a time series.
    Calculate the 20-observation moving average.
    """
    lags = y[-1:-11:-1]      # last 10 values, most recent first
    mean = np.mean(y[-20:])  # mean of the last 20 values
    predictors = np.hstack([lags, mean])

    return predictors
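A quick way to check a custom predictor function is to call it on a toy window. This sketch repeats the function above so it runs on its own, and uses a window of exactly 20 observations, matching the `window_size = 20` used when creating the forecaster. The toy window `1..20` is an assumption for illustration.

```python
import numpy as np

def create_predictors(y):
    """Same logic as above: last 10 lags plus a 20-observation moving average."""
    lags = y[-1:-11:-1]      # y[t-1], y[t-2], ..., y[t-10] (most recent first)
    mean = np.mean(y[-20:])  # mean of the last 20 observations
    return np.hstack([lags, mean])

window = np.arange(1, 21, dtype=float)  # toy window of 20 observations: 1, 2, ..., 20
predictors = create_predictors(window)
print(predictors)  # 10 lags (20 down to 11) followed by the mean, 10.5
```

The function therefore produces 11 predictors per row, and it needs at least 20 past observations to compute the moving average.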



ForecasterAutoregCustom

# Create and fit forecaster
# ==============================================================================
from lightgbm import LGBMRegressor
from skforecast.ForecasterAutoregCustom import ForecasterAutoregCustom

forecaster = ForecasterAutoregCustom(
    regressor        = LGBMRegressor(random_state=123),
    fun_predictors   = create_predictors,  # custom function to create predictors
    window_size      = 20,                 # number of observations needed to create predictors
    transformer_y    = None,
    transformer_exog = None,
    weight_func      = None
)
forecaster.fit(y=data_train['y'], exog=None)

# Prediction
# ==============================================================================
steps = 36  # steps to be predicted
predictions = forecaster.predict(steps=steps, last_window=None)


DIRECT MULTI-STEP PREDICTION

The observed values T-n … T-1 are used as predictors (lags) for every step; T+1, T+2 and T+3 are
predicted values.
● Prediction step 1: the model for step 1 uses T-n … T-1 to predict T+1.
● Prediction step 2: a separate model for step 2 uses the same lags to predict T+2.
● Prediction step 3: a separate model for step 3 uses the same lags to predict T+3.
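The direct strategy can be sketched the same way: one model per horizon, all fed the same observed lags. As before, this is an illustration that assumes linear models fitted with NumPy least squares in place of scikit-learn regressors; `fit_direct` and `predict_direct` are hypothetical names.

```python
import numpy as np

def fit_direct(y, n_lags, steps):
    """Direct strategy: one least-squares model per horizon h = 1..steps.

    Model h maps (y[t-1], ..., y[t-n_lags]) to y[t+h-1]; all models share
    the same rows so that every horizon has a target available.
    """
    y = np.asarray(y, dtype=float)
    n_rows = len(y) - n_lags - steps + 1
    lags = np.column_stack([y[n_lags - k : n_lags - k + n_rows] for k in range(1, n_lags + 1)])
    X = np.column_stack([np.ones(n_rows), lags])  # intercept column + lags
    models = []
    for h in range(1, steps + 1):
        target = y[n_lags + h - 1 : n_lags + h - 1 + n_rows]  # value h steps ahead
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        models.append(coef)
    return models

def predict_direct(models, last_window):
    """Every model receives the same observed lags; predictions are not fed back."""
    lags = np.asarray(last_window, dtype=float)[::-1]  # most recent value first
    return np.array([coef[0] + np.dot(coef[1:], lags) for coef in models])

y = np.arange(20, dtype=float)  # toy series with a linear trend: 0, 1, ..., 19
models = fit_direct(y, n_lags=3, steps=3)
predictions = predict_direct(models, last_window=y[-3:])
print(predictions)  # close to [20, 21, 22]
```

Note that the number of horizons (`steps`) has to be fixed before fitting, since it determines how many models are trained.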



ForecasterAutoregDirect

# Create and fit forecaster
# ==============================================================================
from sklearn.linear_model import Ridge
from skforecast.ForecasterAutoregDirect import ForecasterAutoregDirect

forecaster = ForecasterAutoregDirect(
    regressor        = Ridge(random_state=123),
    steps            = 36,  # maximum number of steps to be predicted
    lags             = 15,
    transformer_y    = None,
    transformer_exog = None,
    weight_func      = None
)
forecaster.fit(y=data_train['y'], exog=None)

# Prediction
# ==============================================================================
steps = 36  # steps to be predicted
predictions = forecaster.predict(steps=steps, last_window=None)



COMPARISON OF MULTI-STEP APPROACHES

● No strategy outperforms the other in all scenarios; the best choice depends on the use case.

● Direct multi-step requires training a model for each step, so it has higher computational
requirements.

● With direct multi-step, the prediction horizon must be defined in advance, which is
unnecessary with the recursive multi-step approach.



MODEL VALIDATION (BACKTESTING)
● Backtesting without refit
● Backtesting with refit (time series cross-validation, rolling origin)
● Backtesting with refit (fixed origin)
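The schemes differ in how the training window moves between folds. The following hypothetical helper (not skforecast's implementation, which handles more cases) lists the training and test index ranges; its `fixed_train_size` flag mirrors the argument of the same name passed to `backtesting_forecaster`. With a fixed training size the window slides forward (rolling origin); otherwise it always starts at the first observation and keeps growing (fixed origin).

```python
def backtesting_folds(n_obs, initial_train_size, steps, fixed_train_size=True):
    """Return (train_start, train_end, test_end) triples; ends are exclusive.

    fixed_train_size=True : training window of constant length slides forward.
    fixed_train_size=False: training window starts at 0 and grows each fold.
    """
    folds = []
    train_end = initial_train_size
    while train_end < n_obs:
        test_end = min(train_end + steps, n_obs)  # last fold may be shorter
        train_start = train_end - initial_train_size if fixed_train_size else 0
        folds.append((train_start, train_end, test_end))
        train_end = test_end  # next fold starts where this test set ended
    return folds

print(backtesting_folds(10, initial_train_size=6, steps=2, fixed_train_size=True))
# [(0, 6, 8), (2, 8, 10)]
print(backtesting_folds(10, initial_train_size=6, steps=2, fixed_train_size=False))
# [(0, 6, 8), (0, 8, 10)]
```

With refit, a model is trained on each training range; without refit, the model is trained once on the first range and only the prediction origin moves.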



Backtesting

# Backtesting forecaster
# ==============================================================================
from skforecast.model_selection import backtesting_forecaster

metric, preds_backtest = backtesting_forecaster(
    forecaster         = forecaster,             # forecaster
    y                  = data['y'],              # full time series
    exog               = None,                   # exogenous variables
    steps              = 12,                     # steps predicted in each fold
    metric             = 'mean_absolute_error',  # metric
    initial_train_size = len(data_train),        # number of observations in the initial training set
    fixed_train_size   = True,                   # True: fixed training size; False: growing
    refit              = True,                   # retrain after each fold
    verbose            = True                    # print information about the folds
)



Grid search
# Grid search of hyperparameters and lags
# ==============================================================================
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler
from skforecast.ForecasterAutoreg import ForecasterAutoreg

forecaster = ForecasterAutoreg(
    regressor        = RandomForestRegressor(random_state=123),
    lags             = 12,  # placeholder, it will be replaced in the search
    transformer_y    = StandardScaler(),
    transformer_exog = None,
    weight_func      = None
)

# Grid of lags used as predictors
lags_grid = [6, 12, 18]

# Regressor hyperparameters
param_grid = {'n_estimators': [100, 200],
              'max_depth': [3, 5, 10]}



Grid search
# Grid search of hyperparameters and lags
# ==============================================================================
from sklearn.metrics import mean_absolute_error
from skforecast.model_selection import grid_search_forecaster

results_grid = grid_search_forecaster(
    forecaster         = forecaster,           # forecaster
    y                  = data['y'],            # full time series
    exog               = None,                 # exogenous variables
    param_grid         = param_grid,           # hyperparameters grid
    lags_grid          = lags_grid,            # lags grid
    steps              = 12,                   # number of steps to be predicted
    metric             = mean_absolute_error,  # callable metric
    initial_train_size = len(data_train),      # number of observations in the initial training set
    fixed_train_size   = True,                 # True: fixed training size; False: growing
    refit              = True,                 # retrain after each fold
    return_best        = True,                 # retrain forecaster with the best combination
    verbose            = False                 # verbose
)
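The grid search evaluates every combination of `lags_grid` and `param_grid`, each one with backtesting, so the cost grows multiplicatively with the size of the grids. This sketch only enumerates the candidate combinations for the grids defined above, using `itertools.product`; the variable name `candidates` is illustrative.

```python
from itertools import product

lags_grid = [6, 12, 18]
param_grid = {'n_estimators': [100, 200], 'max_depth': [3, 5, 10]}

keys = list(param_grid)
candidates = [
    (lags, dict(zip(keys, values)))  # one (lags, hyperparameters) pair per backtest
    for lags in lags_grid
    for values in product(*(param_grid[k] for k in keys))
]
print(len(candidates))  # 18 = 3 lag options x 2 n_estimators x 3 max_depth
print(candidates[0])    # (6, {'n_estimators': 100, 'max_depth': 3})
```

With these grids, 18 backtests are run, so random search or smaller grids may be preferable when the regressor is expensive to train.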



PROBABILISTIC FORECASTING: PREDICTION INTERVALS

A prediction interval defines the range within which the true value of y is expected to be found
with a given probability. For example, a 98% prediction interval is expected to contain the true
value with a probability of 98%.
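One way to check whether an interval behaves as described is to measure its empirical coverage on held-out data: the fraction of true values that fall inside the interval, which for a well-calibrated 98% interval should be close to 0.98. A minimal sketch with a hypothetical helper `interval_coverage` and toy numbers:

```python
import numpy as np

def interval_coverage(y_true, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    inside = (y_true >= lower) & (y_true <= upper)
    return inside.mean()

coverage = interval_coverage(
    y_true=[1.0, 2.0, 3.0, 4.0],
    lower=[0.5, 2.5, 2.0, 3.0],
    upper=[1.5, 3.5, 4.0, 5.0],
)
print(coverage)  # 0.75 (the second value, 2.0, falls outside [2.5, 3.5])
```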



PREDICTION INTERVALS: BOOTSTRAPPED RESIDUALS

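The observed values T-n … T-1 are used as predictors (lags) by the model for step +1, and ε1, ε2, ε3
are residuals resampled from the model's past errors.
● The model predicts T+1, and ε1 is added to obtain T+1+ε1.
● T+1+ε1 is used as a lag to predict T+2, and ε2 is added to obtain T+2+ε2.
● T+1+ε1 and T+2+ε2 are used as lags to predict T+3, and ε3 is added to obtain T+3+ε3.
Repeating this process many times, with new residuals each time, produces a set of simulated future
paths; the prediction interval is obtained from their percentiles.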
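The bootstrapped-residuals procedure can be illustrated with NumPy. This is a simplified sketch of the idea, not skforecast's `predict_interval` implementation; the toy model (next value = last value + 1), the residual values and the helper name `bootstrap_intervals` are all assumptions for illustration.

```python
import numpy as np

def bootstrap_intervals(predict_next, last_window, residuals, steps,
                        n_boot=500, interval=(10, 90), seed=123):
    """Recursive forecast where each step adds a residual resampled
    (with replacement) from the training residuals.

    predict_next: callable mapping a window (oldest -> newest) to the next value.
    Returns an array of shape (steps, 2) with the lower and upper percentiles.
    """
    rng = np.random.default_rng(seed)
    residuals = np.asarray(residuals, dtype=float)
    paths = np.empty((n_boot, steps))
    for b in range(n_boot):
        window = [float(v) for v in last_window]
        for step in range(steps):
            y_hat = predict_next(np.array(window)) + rng.choice(residuals)
            paths[b, step] = y_hat
            window.append(y_hat)  # the perturbed prediction becomes a lag
    return np.percentile(paths, interval, axis=0).T

bounds = bootstrap_intervals(
    predict_next=lambda w: w[-1] + 1.0,     # toy model: last value + 1
    last_window=[8.0, 9.0, 10.0],
    residuals=[-0.5, -0.2, 0.0, 0.2, 0.5],  # toy residuals centred on zero
    steps=3,
)
print(bounds)  # one [lower, upper] row per step
```

Because each perturbed prediction is reused as a lag, the uncertainty accumulates, and the intervals typically widen as the horizon grows.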



Forecaster.predict_interval()
# Create and fit forecaster
# ==============================================================================
from sklearn.linear_model import Ridge
from skforecast.ForecasterAutoreg import ForecasterAutoreg

forecaster = ForecasterAutoreg(
    regressor = Ridge(random_state=123),
    lags      = 12
)

forecaster.fit(y=data_train['y'], exog=None)

# Predict interval
# ==============================================================================
predictions = forecaster.predict_interval(
    steps    = 36,
    interval = [10, 90]  # 80% interval between the 10th and 90th percentiles
)
predictions.head(5)



MULTI-SERIES FORECASTING
A single model is trained on several time series at once and can predict any of them.



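MULTIVARIATE FORECASTING
The lags of several time series are used as predictors to forecast a single target series, with one
model per step.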



ForecasterMultiSeries

# Create and fit forecaster
# (data is assumed to be a pandas DataFrame with one column per series)
# ==============================================================================
from sklearn.linear_model import Ridge
from skforecast.ForecasterAutoregMultiSeries import ForecasterAutoregMultiSeries

forecaster = ForecasterAutoregMultiSeries(
    regressor          = Ridge(random_state=123),
    lags               = 24,
    transformer_series = None,  # individual transformation for each series
    transformer_exog   = None,
    weight_func        = None,  # weights for each observation based on index
    series_weights     = None   # weights for each series
)

forecaster.fit(series=data, exog=None)
forecaster
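Conceptually, a multi-series forecaster stacks the lag matrices of all series into one training set for a single (global) model, adding columns that identify which series each row comes from. The sketch below shows one way to do this with a one-hot block; skforecast's own encoding of the series identifier may differ, and the helper name `global_training_matrix` is hypothetical.

```python
import numpy as np

def global_training_matrix(series, n_lags):
    """Stack the lag matrices of several series into one training set.

    series: dict mapping series name -> 1-D array of observations.
    Each row is [lag_1, ..., lag_n, one-hot series indicator].
    """
    names = list(series)
    X_parts, y_parts = [], []
    for i, name in enumerate(names):
        values = np.asarray(series[name], dtype=float)
        n_rows = len(values) - n_lags
        lags = np.column_stack(
            [values[n_lags - k : len(values) - k] for k in range(1, n_lags + 1)]
        )
        one_hot = np.zeros((n_rows, len(names)))
        one_hot[:, i] = 1.0  # marks which series the row belongs to
        X_parts.append(np.hstack([lags, one_hot]))
        y_parts.append(values[n_lags:])
    return np.vstack(X_parts), np.concatenate(y_parts), names

X, y, names = global_training_matrix(
    {'series_1': [1, 2, 3, 4, 5], 'series_2': [10, 20, 30, 40]}, n_lags=2
)
print(X.shape, y.shape)  # (5, 4) (5,)
```

Series of different lengths can be combined, since each contributes its own rows.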



ForecasterMultiSeries predict

# Prediction
# ==============================================================================
steps = 24

# Predictions for all series (levels)
predictions_items = forecaster.predict(
    steps  = steps,
    levels = None  # levels to predict. If None, all series are predicted
)
print(predictions_items.head(3))

# Prediction intervals for series_1 and series_2
predictions_intervals = forecaster.predict_interval(steps=steps, levels=['series_1', 'series_2'])
print(predictions_intervals.head(3))



ForecasterMultiVariate

# Create and fit forecaster
# ==============================================================================
from sklearn.linear_model import Ridge
from skforecast.ForecasterAutoregMultiVariate import ForecasterAutoregMultiVariate

forecaster = ForecasterAutoregMultiVariate(
    regressor          = Ridge(random_state=123),
    level              = 'CO',  # target variable of the forecaster (column name)
    lags               = 7,     # same lags for all series; a dict allows a different configuration per series
    steps              = 7,     # maximum steps to predict; one regressor is trained per step
    transformer_series = None,
    transformer_exog   = None,
    weight_func        = None
)

forecaster.fit(series=data, exog=None)
forecaster
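The training data for a multivariate direct forecaster can be pictured as one shared predictor matrix (the lags of every series) and one target per step (the `level` series, h steps ahead). This is an illustrative sketch, not skforecast's implementation; the second series name `NO2`, the toy values and the helper name `multivariate_direct_matrices` are hypothetical.

```python
import numpy as np

def multivariate_direct_matrices(series, level, n_lags, steps):
    """Build a shared predictor matrix and one target vector per horizon.

    X: lags 1..n_lags of every series at time t.
    targets[h]: value of `level` at time t + h - 1, for h = 1..steps.
    series: dict name -> 1-D array (all of equal length).
    """
    data = {name: np.asarray(v, dtype=float) for name, v in series.items()}
    n_rows = len(data[level]) - n_lags - steps + 1
    X = np.column_stack([
        values[n_lags - k : n_lags - k + n_rows]  # lag k of this series
        for values in data.values()
        for k in range(1, n_lags + 1)
    ])
    targets = {
        h: data[level][n_lags + h - 1 : n_lags + h - 1 + n_rows]
        for h in range(1, steps + 1)
    }
    return X, targets

X, targets = multivariate_direct_matrices(
    {'CO': [1, 2, 3, 4, 5, 6], 'NO2': [10, 20, 30, 40, 50, 60]},
    level='CO', n_lags=2, steps=2,
)
print(X[0])        # [ 2.  1. 20. 10.]: CO lags 1-2, then NO2 lags 1-2
print(targets[1])  # CO one step ahead
print(targets[2])  # CO two steps ahead
```

Fitting one regressor per entry of `targets` reproduces the "one regressor per step" idea.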



ForecasterMultiVariate predict

# Prediction
# ==============================================================================
predictions = forecaster.predict(steps=None)  # None: predict all the steps defined in the forecaster (7)
print(predictions)



OTHER FEATURES OF SKFORECAST

● Transformation of time series and exogenous variables
● Use of models in production (last window)
● Weighted forecasting to exclude parts of the historical data
● Forecasting with missing values
● Model explainability with SHAP values
● Custom metrics

Skforecast 0.7.0 new release:
● SARIMAX wrapper
● Prediction of probabilistic distributions
● Bootstrapping predictions in ForecasterAutoregDirect
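Weighted forecasting relies on the `weight_func` argument seen in the forecaster code: a function that creates weights based on an index. A minimal sketch of such a function, assuming a pandas `DatetimeIndex` and a hypothetical exclusion period from March to June 2020, gives those observations a weight of 0 so they do not influence training:

```python
import numpy as np
import pandas as pd

def custom_weights(index):
    """Hypothetical weight function: 0 for observations in the period to be
    excluded (March to June 2020), 1 elsewhere."""
    excluded = (index >= '2020-03-01') & (index <= '2020-06-30')
    return np.where(excluded, 0, 1)

index = pd.date_range(start='2020-01-01', end='2020-12-01', freq='MS')  # monthly index
weights = custom_weights(index)
print(weights)  # [1 1 0 0 0 0 1 1 1 1 1 1]
```

Such a function would be passed as `weight_func=custom_weights` when creating the forecaster.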



ADDITIONAL MATERIAL

● Examples and tutorials
○ Skforecast: time series forecasting with Python and Scikit-learn
○ Forecasting electricity demand with Python
○ Forecasting web traffic with machine learning and Python
○ Forecasting time series with gradient boosting: Skforecast, XGBoost, LightGBM and CatBoost
○ Bitcoin price prediction with Python
○ Prediction intervals in forecasting models
○ Multi-series forecasting
○ Reducing the influence of Covid-19 on time series forecasting models
○ Forecasting time series with missing values

● Links
○ GitHub: Skforecast
○ Documentation and user guides



OTHER PYTHON FORECASTING LIBRARIES

● sktime
● StatsForecast (Nixtla)
● Prophet
● NeuralProphet



WE NEED YOUR HELP 😊

● Feedback by creating GitHub issues
● GitHub stars ⭐
● Improvement suggestions
● Recommending Skforecast to your friends



Thank you!
Joaquín Amat Rodrigo
Javier Escobar Ortiz
