Professional Documents
Culture Documents
Ensemble Machine Learning Method
Ensemble Machine Learning Method
ensemble learning
in my most recent
machine learning
project.
Bagging
Briefing
High variance
Usually when the model is overfitted
, there is always a problem of high
variance.
Fig 1
Image from : Codebasics machine learning course
Explanation of the fig 1
Training different models on different
dataset
After creating different subsets , we
train different models on all these
subsets
I used the same dataset from my last project , so I will be doing the same data preprocessing. As you
can see , I can’t throw this dataset into my machine learning model because it contains a lot of
categorical variables.
Scaling of my dataset
● The variables in my dataset are terribly lopsided , for example , my dummies are zero
and ones , and there are other variables that are way bigger, throwing them into the
model without scaling might confuse the model.
Data preparation
Missing values Label encoding
I tried to explore my dataset to see if ● So columns with just two unique
there is any missing values and variables ,I used label encoding
fortunately , there is none. while I used one hot encoding for
the rest.
4 3
Also , use cross validation
The I used the bagging
also just to get several
model to compare my results
results I can compare with.
Kfold with Decision Tree classifier
Bagging method
Accuracy from my
bagging model is:
83%
CONCLUSION