
Ensemble Machine Learning Approach

An ensemble is a composite model that combines a series of low-performing classifiers with
the aim of creating an improved classifier. Each individual classifier votes, and the final
prediction label is returned by majority voting. Ensembles generally offer higher accuracy
than an individual (base) classifier. Ensemble methods can be parallelized by allocating each
base learner to a different machine. In short, ensemble learning methods are meta-algorithms
that combine several machine learning methods into a single predictive model to increase
performance. Ensemble methods can decrease variance using a bagging approach, decrease
bias using a boosting approach, or improve predictions using a stacking approach.

1. Bagging stands for bootstrap aggregation. It combines multiple learners in a
way that reduces the variance of estimates. For example, a random forest trains M
decision trees: you train M different trees on different random subsets of
the data and perform voting for the final prediction. Bagging ensemble methods
include Random Forest and Extra Trees.
2. Boosting algorithms combine a set of low-accuracy classifiers to create a highly
accurate classifier. A low-accuracy classifier (or weak classifier) offers accuracy
only slightly better than flipping a coin. A highly accurate classifier (or strong
classifier) offers an error rate close to 0. A boosting algorithm keeps track of the
examples the model failed to predict accurately. Boosting algorithms are less affected
by the overfitting problem. The following three algorithms have gained massive
popularity in data science competitions.
 AdaBoost (Adaptive Boosting)
 Gradient Tree Boosting
 XGBoost
3. Stacking (or stacked generalization) is an ensemble learning technique that
combines the predictions of multiple base classification models into a new data set.
This new data set is treated as the input for another classifier, which is then trained
to solve the original problem. Stacking is often referred to as blending. A minimal
sketch of all three approaches follows this list.
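
Below is a minimal sketch of the three approaches using scikit-learn; the Iris dataset, the
particular base models, and the parameter values are illustrative assumptions rather than
part of the original text.

# Bagging, boosting and stacking side by side (illustrative sketch).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import (
    RandomForestClassifier,   # bagging
    AdaBoostClassifier,       # boosting
    StackingClassifier,       # stacking
)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "bagging (random forest)": RandomForestClassifier(n_estimators=100),
    "boosting (AdaBoost)": AdaBoostClassifier(n_estimators=100),
    "stacking": StackingClassifier(
        estimators=[
            ("tree", DecisionTreeClassifier(max_depth=3)),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        final_estimator=LogisticRegression(max_iter=1000),
    ),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))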

On the basis of the arrangement of base learners, ensemble methods can be divided into two
groups: in parallel ensemble methods, base learners are generated in parallel (for example,
Random Forest); in sequential ensemble methods, base learners are generated sequentially
(for example, AdaBoost).

On the basis of the type of base learners, ensemble methods can be divided into two
groups: a homogeneous ensemble method uses the same type of base learner in each
iteration, while a heterogeneous ensemble method uses different types of base learners in
each iteration.

AdaBoost Classifier
AdaBoost, or Adaptive Boosting, is an ensemble boosting classifier proposed by Yoav Freund
and Robert Schapire in 1996. It combines multiple classifiers to increase the overall
accuracy. AdaBoost is an iterative ensemble method: it builds a strong classifier by
combining multiple poorly performing classifiers, yielding a high-accuracy strong
classifier. The basic concept behind AdaBoost is to set the weights of the classifiers and
of the training samples in each iteration in a way that ensures accurate predictions of
unusual observations. Any machine learning algorithm that accepts weights on the training
set can be used as the base classifier. AdaBoost must meet two conditions:

1. The classifier should be trained iteratively on various weighted training examples.

2. In each iteration, it tries to provide an excellent fit for these examples by minimizing
the training error.

How does the AdaBoost algorithm work?


It works in the following steps:

1. Initially, AdaBoost selects a training subset randomly.

2. It iteratively trains the AdaBoost machine learning model by selecting the training set
based on the accuracy of the previous training round.
3. It assigns higher weights to wrongly classified observations so that in the next
iteration these observations get a higher probability of being selected for classification.
4. It also assigns a weight to the trained classifier in each iteration according to the
accuracy of the classifier: the more accurate the classifier, the higher its weight.
5. This process iterates until the complete training data fits without error or until the
specified maximum number of estimators is reached.
6. To classify, perform a "vote" across all of the weak learners you built (a minimal
sketch follows this list).
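
As a hedged, minimal sketch of the workflow above, assuming scikit-learn's
AdaBoostClassifier on the built-in breast cancer dataset (the dataset and parameter values
are illustrative choices, not taken from the original text):

# AdaBoost with its default weak learner, a one-level decision tree ("stump").
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators caps the number of boosting rounds; learning_rate shrinks
# the weight applied to each classifier at every iteration.
model = AdaBoostClassifier(n_estimators=50, learning_rate=1.0)
model.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))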
Algorithm

Before we dive into the code, it's important that we grasp how the
Gradient Boost algorithm is implemented under the hood. Suppose we
were trying to predict the price of a house given its age, square
footage and location.
Step 1: Calculate the average of the target label

When tackling regression problems, we start with a leaf that is the
average value of the variable we want to predict. This leaf will be used
as a baseline to approach the correct solution in the succeeding steps.
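
A tiny sketch of this step, assuming an invented housing table (the column names and
values below are made up for illustration):

import pandas as pd

# Toy housing data (invented values).
data = pd.DataFrame({
    "age":   [5, 12, 30, 8],
    "sqft":  [1500, 2100, 900, 1800],
    "price": [350_000, 420_000, 210_000, 380_000],
})

# Step 1: the initial leaf is simply the mean of the target column.
baseline = data["price"].mean()
print("baseline prediction:", baseline)   # 340000.0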
Step 2: Calculate the residuals

For every sample, we calculate the residual with the following formula:

residual = actual value – predicted value

In our example, the predicted value is equal to the mean calculated
in the previous step, and the actual value can be found in the price
column of each sample. After computing the residuals, we get a table
that lists the residual for every sample (reproduced in the sketch below).
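
Continuing the same invented prices from the step 1 sketch, a minimal illustration of the
residual computation:

import pandas as pd

# The prediction column holds the baseline mean; the residual column
# holds actual minus predicted for each sample.
data = pd.DataFrame({"price": [350_000, 420_000, 210_000, 380_000]})
data["prediction"] = data["price"].mean()
data["residual"] = data["price"] - data["prediction"]
print(data)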
Step 3: Construct a decision tree

Next, we build a tree with the goal of predicting the residuals. In other
words, every leaf will contain a prediction for the value of the residual
(not the desired label).
In the event there are more residuals than leaves, some residuals will
end up inside the same leaf. When this happens, we compute their
average and place that average inside the leaf.
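
A minimal sketch of this step, assuming the same invented data and a shallow scikit-learn
DecisionTreeRegressor fitted to the residuals:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy features (age, square footage) and prices, as in the earlier sketches.
X = np.array([[5, 1500], [12, 2100], [30, 900], [8, 1800]], dtype=float)
price = np.array([350_000, 420_000, 210_000, 380_000], dtype=float)

baseline = price.mean()          # step 1
residuals = price - baseline     # step 2

# Step 3: the tree's target is the residual, not the price itself.
# With a limited depth, several residuals can share a leaf, and the
# leaf value is their average.
tree = DecisionTreeRegressor(max_depth=1)
tree.fit(X, residuals)
print(tree.predict(X))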


Step 4: Predict the target label using all of the trees
within the ensemble

Each sample passes through the decision nodes of the newly formed
tree until it reaches a given leaf. The residual in said leaf is used to
predict the house price.

It's been shown through experimentation that taking small
incremental steps towards the solution achieves a comparable bias
with a lower overall variance (a lower variance leads to better accuracy
on samples outside of the training data). Thus, to prevent overfitting,
we introduce a hyperparameter called the learning rate. When we make a
prediction, each residual is multiplied by the learning rate. This forces
us to use more decision trees, each taking a small step towards the final
solution.
Step 5: Compute the new residuals

We calculate a new set of residuals by subtracting the predictions made
in the previous step from the actual house prices. These residuals
will then be used for the leaves of the next decision tree, as described in
step 3.
Step 6: Repeat steps 3 to 5 until the number of iterations
matches the number specified by the hyperparameter
(i.e. number of estimators)
Step 7: Once trained, use all of the trees in the
ensemble to make a final prediction as to the value of
the target variable

The final prediction will be equal to the mean we computed in the first
step, plus all of the residuals predicted by the trees that make up the
ensemble, multiplied by the learning rate.
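
Putting steps 1 through 7 together, here is a minimal from-scratch sketch of the procedure;
the toy data, tree depth, learning rate and number of estimators are all illustrative
assumptions:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: features are (age, square footage), target is price.
X = np.array([[5, 1500], [12, 2100], [30, 900], [8, 1800]], dtype=float)
y = np.array([350_000, 420_000, 210_000, 380_000], dtype=float)

n_estimators = 100      # step 6: number of boosting iterations
learning_rate = 0.1     # step 4: shrinks each tree's contribution

baseline = y.mean()                     # step 1: initial leaf
prediction = np.full_like(y, baseline)
trees = []

for _ in range(n_estimators):
    residuals = y - prediction          # steps 2 and 5: current residuals
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residuals)              # step 3: tree predicts residuals
    prediction += learning_rate * tree.predict(X)   # step 4: small step
    trees.append(tree)

# Step 7: final prediction = baseline + learning_rate * sum of all tree outputs.
def predict(X_new):
    return baseline + learning_rate * sum(t.predict(X_new) for t in trees)

print(predict(X))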
Code

In this tutorial, we'll make use of the GradientBoostingRegressor class from the
scikit-learn library.
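
A minimal usage sketch is shown below; the diabetes dataset and the particular
hyperparameter values are assumptions for illustration, not taken from the original
tutorial:

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = GradientBoostingRegressor(
    n_estimators=200,     # number of boosting stages (trees)
    learning_rate=0.1,    # shrinks each tree's contribution
    max_depth=3,          # depth of each individual regression tree
)
model.fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))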
