Advanced Machine Learning and Feature Engineering: Stacking
Stacking
Stacking is an ensemble learning technique, much like Random Forests, in which prediction quality is improved by combining several base models and using their outputs as the input for a meta-classifier/regressor.
There are several stacking implementations that can be deployed:
1. Vecstack
2. Sklearn stacking
3. mlxtend
We have chosen sklearn stacking. In the setup we used, “earth”, “lin_model”, and “lgmbr” are the estimators, “MARS”, “Lasso”, and “LightGBM” are the strings naming them, and “final_estimator” is the parameter used to combine the base estimators; by default, “final_estimator” is “RidgeCV”.
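A minimal sketch of this setup with sklearn's StackingRegressor. The MARS (“earth”) and LightGBM estimators from the text require the pyearth and lightgbm packages, so a GradientBoostingRegressor is used here as a stand-in for LightGBM, and the dataset is a synthetic one for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import Lasso, RidgeCV

# Synthetic regression data for illustration only
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# Base estimators; the names are the strings the text refers to
estimators = [
    ("Lasso", Lasso(alpha=0.1)),
    ("LightGBM", GradientBoostingRegressor(random_state=0)),  # stand-in for lightgbm
]

# final_estimator combines the base predictions; RidgeCV is the sklearn default
stack = StackingRegressor(estimators=estimators, final_estimator=RidgeCV())
stack.fit(X, y)
print(round(stack.score(X, y), 3))  # R-squared on the training data
```

The meta-learner sees only the base models' predictions (via internal cross-validation), which is what lets stacking outperform any single base model.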
The benefit of stacking is that it can harness the capabilities of a range of well-performing models on a classification or regression task and make predictions with better performance than any single model in the ensemble.
Once we obtain the best model, we can perform feature selection depending on the model; the features obtained are model-dependent.
Hyperparameters are set before training, do not change over training time, and control the training process of a model.
The choices we need to make in deep learning are:
• Number of layers to choose
• Number of neurons in a layer to choose
• Choice of the optimization function
• Choice of the learning rate for optimization function
• Choice of the loss function
• Choice of metrics
• Choice of activation function
• Choice of layer weight initialization
From all the above options we are tuning the number of layers, the number of neurons per layer, and the activation function. Our main aim is a low Root Mean Squared Error (RMSE), so we use Mean Squared Error (MSE) as the loss.
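A minimal sketch of tuning exactly these three choices. The original uses Keras Tuner; here sklearn's MLPRegressor with GridSearchCV stands in, since the search space (layers, neurons per layer, activation) maps directly onto its parameters:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Synthetic data for illustration
X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)

# hidden_layer_sizes encodes both the number of layers and neurons per layer
param_grid = {
    "hidden_layer_sizes": [(32,), (64,), (32, 32)],
    "activation": ["relu", "tanh"],
}

# Minimize RMSE, consistent with using MSE as the training loss
search = GridSearchCV(
    MLPRegressor(max_iter=500, random_state=0),
    param_grid,
    cv=3,
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)
```

Each grid point is a candidate architecture; cross-validated RMSE decides which one to keep.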
Since we have used multiple machine learning and deep learning models, we select the model with the highest R-squared score and the lowest Root Mean Squared Error.
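Both selection metrics are standard; a minimal check of how they are computed with sklearn, on small made-up values for illustration:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Toy true/predicted values for illustration only
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.1, 2.7, 6.5])

# RMSE is the square root of MSE
rmse = mean_squared_error(y_true, y_pred) ** 0.5
# R-squared: 1 minus residual variance over total variance
r2 = r2_score(y_true, y_pred)

print(round(rmse, 3), round(r2, 3))  # → 0.292 0.973
```

Lower RMSE and higher R-squared both indicate a better fit, which is why the two are used together to rank models.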
We selected the best model and checked the loss obtained. As we can see, the Root Mean Squared Error is high for the MLP trained with Keras Tuner: the error is about 3646.
We now try an LSTM model. Long short-term memory (LSTM) is an artificial recurrent neural network architecture used in the field of deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections, so it can process not only single data points but entire sequences of data.
Now we look at how to build an LSTM model in Python. The bi-directional LSTM with dropout works fairly well compared with a uni-directional LSTM.
As the screenshot below shows, the LSTM model is trained on the previous 100 days and predicts the next 60 days, i.e., two months of future data.
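The key data-preparation step for that 100-days-in, 60-days-out setup is slicing the series into sliding windows before feeding it to the (bi-directional) LSTM. A minimal sketch with NumPy, on a dummy series; the function name is my own:

```python
import numpy as np

def make_windows(series, lookback=100, horizon=60):
    """Slice a 1-D series into (lookback -> horizon) training pairs."""
    X, y = [], []
    for i in range(len(series) - lookback - horizon + 1):
        X.append(series[i:i + lookback])           # previous 100 days
        y.append(series[i + lookback:i + lookback + horizon])  # next 60 days
    return np.array(X), np.array(y)

# Dummy daily series for illustration
series = np.arange(200.0)
X, y = make_windows(series)
print(X.shape, y.shape)  # → (41, 100) (41, 60)
```

Each row of X would then be reshaped to (lookback, 1) and passed to the recurrent layers, with y as the 60-step target.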
Error Analysis
The screenshot above reports the RMSE and R-squared values for the different kinds of datasets. We can clearly see that LSTM and LightGBM give a lower root mean squared error than the other models.
Model Interpretability
The major methods used to check model interpretability are:
• LIME
• SHAP
As we already know, the SHAP method is better than LIME: in LIME it is difficult to define the neighbourhood, the neighbourhood is based on an exponential kernel, and the method is unstable due to a lack of robustness. So we use SHAP rather than LIME; the formula used to calculate SHAP values is given below.
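As a reference for the formula: SHAP attributes to feature i its Shapley value, a weighted average of the model's change in output when i is added to every subset S of the other features F:

```latex
\phi_i = \sum_{S \subseteq F \setminus \{i\}}
  \frac{|S|!\,\left(|F| - |S| - 1\right)!}{|F|!}
  \left[ f_{S \cup \{i\}}\!\left(x_{S \cup \{i\}}\right) - f_S\!\left(x_S\right) \right]
```

Here f_S is the model retrained (or marginalized) on the feature subset S, so the bracketed term is the contribution of feature i given that subset.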
To work with SHAP we need to run the install command (pip install shap) in our command prompt to install the relevant library.
The screenshot below shows which features are most important, with feature importance ranked in descending order. “onpromotion”, “f_GROCERY I”, “f_CLEANING”, “store_nbr”, and “f_BEVERAGES” are the top 5 features for predicting the output.
As we already know, Multiple Linear Regression works well, and moreover it is considered the baseline model. When an unknown dataset is trained with Multiple Linear Regression, the Root Mean Squared Error is 10.72.
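A minimal sketch of such a baseline: fit Multiple Linear Regression on a held-out split and report test RMSE. The data here is synthetic for illustration; the 10.72 figure above comes from the report's own dataset:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the report's dataset
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Ordinary least squares on the training split
model = LinearRegression().fit(X_tr, y_tr)

# Baseline test RMSE, the number every other model must beat
rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
print(round(rmse, 2))
```

Any stacked or deep model is only worth keeping if its RMSE beats this simple baseline.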