MTech Seminar II
1. INTRODUCTION
Small differences in how a classifier is trained can give rise to different decision boundaries, even if all other parameters are kept constant. The most commonly used procedure, choosing the classifier with the smallest error on the training data, is unfortunately a flawed one: performance on a training dataset, even when computed using a cross-validation approach, can be misleading in terms of classification performance on previously unseen data. Then, of all the (possibly infinitely many) classifiers that have the same training performance, or even the same (pseudo) generalization performance as computed on validation data (the part of the training data left unused during training and reserved for evaluating classifier performance), which one should be chosen? Everything else being equal, one may be tempted to choose at random, but with that decision comes the risk of selecting a particularly poor model. Using an ensemble of such models, instead of choosing just one, and combining their outputs, for example by simply averaging them, can reduce the risk of an unfortunate selection of a particularly poorly performing classifier. It is important to emphasize that there is no guarantee the combination of multiple classifiers will always perform better than the best individual classifier in the ensemble; combining classifiers may not beat the single best classifier, but it certainly reduces the overall risk of making a particularly poor selection.
If the individual classifiers make different errors on different instances, a strategic combination of them can reduce the total error. Figure 1.1 graphically illustrates this concept: each classifier, trained on a different subset of the available training data, makes different errors (shown as instances with dark borders), but the combination of the three classifiers provides the best decision boundary.
2. HISTORY
3. ENSEMBLE LEARNING METHODS
Ensemble methods are machine learning techniques that combine several base models to produce one optimal predictive model. Ensemble modeling is a powerful way to improve the performance of a model. The main principle behind an ensemble model is that a group of weak learners come together to form a strong learner, thus increasing the accuracy of the model. Ensemble learning usually produces more accurate solutions than a single model would. Ensemble learning methods are applied to regression as well as classification. Ensemble learning for regression creates multiple regressors, i.e., multiple regression models such as linear, polynomial, etc.; a minimal sketch of combining such regressors appears after the method descriptions below. The three main ensemble learning methods are:
1. Boosting
2. Stacking
3. Bagging
3.1 Boosting often considers homogeneous weak learners, learns them sequentially in a highly adaptive way (each base model depends on the previous ones), and combines them following a deterministic strategy.
3.2 Stacking often considers heterogeneous weak learners, learns them in parallel, and combines them by training a meta-model to output a prediction based on the different weak models' predictions.
3.3 Bagging often considers homogeneous weak learners, learns them independently from each other in parallel, and combines them following some kind of deterministic averaging process.
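
As promised above, here is a minimal sketch of ensemble regression, assuming scikit-learn, NumPy, and synthetic data (the sine-shaped data and the polynomial degrees are illustrative assumptions): a linear regressor and two polynomial regressors are trained separately and their predictions are simply averaged.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)   # synthetic data

# Base regressors: linear, quadratic, and cubic models.
models = [
    LinearRegression(),
    make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    make_pipeline(PolynomialFeatures(degree=3), LinearRegression()),
]
for model in models:
    model.fit(X, y)

# Ensemble prediction: a simple average of the individual predictions.
X_new = np.array([[0.5], [1.5]])
y_ensemble = np.mean([m.predict(X_new) for m in models], axis=0)
print(y_ensemble)

Averaging is the simplest deterministic combination rule; the methods described below differ mainly in how the base models are trained and how their outputs are combined.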
4. BOOSTING
Boosting consists of iteratively fitting a weak learner, aggregating it into the ensemble model, and "updating" the training dataset to better take into account the strengths and weaknesses of the current ensemble when fitting the next base model. The approach builds on the result that a collection of weak learners can be boosted into a single strong learner [3]. In AdaBoost-style boosting, for example, the update re-weights the training instances so that those misclassified by the current ensemble receive more attention from the next weak learner.
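
The following is a minimal sketch of that loop, assuming an AdaBoost-style weight update, decision stumps from scikit-learn, and binary labels coded as -1 and +1 (these choices are illustrative assumptions, not the only way to boost):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boosting_fit(X, y, n_rounds=10):
    n = len(y)
    weights = np.full(n, 1.0 / n)      # the "updated" view of the training set
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=weights)     # fit the next weak learner
        pred = stump.predict(X)
        err = np.clip(weights[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)      # this learner's vote weight
        weights *= np.exp(-alpha * y * pred)       # up-weight misclassified points
        weights /= weights.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def boosting_predict(X, learners, alphas):
    # The ensemble decision is a weighted vote of all weak learners.
    score = sum(a * m.predict(X) for m, a in zip(learners, alphas))
    return np.sign(score)

In practice, scikit-learn ships a ready-made version of this idea as AdaBoostClassifier.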
5. STACKED GENERALIZATION
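As summarized in Section 3.2, stacked generalization trains heterogeneous base learners in parallel and then trains a meta-model on their predictions. A minimal sketch, assuming scikit-learn's StackingClassifier and synthetic data (the particular base learners and meta-model are illustrative assumptions):

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=3)),   # heterogeneous base learners
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    final_estimator=LogisticRegression(),  # the meta-model
    cv=5,  # the meta-model is trained on out-of-fold base-learner predictions
)
stack.fit(X, y)
print(stack.score(X, y))

Training the meta-model on out-of-fold predictions, rather than on predictions for the very data the base learners were fitted on, keeps the meta-model from simply rewarding overfit base learners.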
6. BAGGING
Bagging, which stands for bootstrap aggregating, is one of the earliest, most intuitive and perhaps the simplest ensemble-based algorithms, with surprisingly good performance (Breiman 1996). Diversity of classifiers in bagging is obtained by using bootstrapped replicas of the training data: different training data subsets are randomly drawn, with replacement, from the entire training dataset. Each training data subset is used to train a different classifier of the same type. The individual classifiers are then combined by taking a simple majority vote of their decisions: for any given instance, the class chosen by the largest number of classifiers is the ensemble decision. Since the training datasets may overlap substantially, additional measures can be used to increase diversity, such as using only a subset of the training data for each classifier, or using relatively weak classifiers (such as decision stumps). An example of bagging is provided in Figure 6.1.
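
A minimal sketch of this procedure, assuming scikit-learn, NumPy, and integer class labels starting at 0 (the stump depth and ensemble size are illustrative assumptions):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_classifiers=15, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    ensemble = []
    for _ in range(n_classifiers):
        idx = rng.integers(0, n, size=n)             # bootstrap: draw with replacement
        stump = DecisionTreeClassifier(max_depth=1)  # a relatively weak classifier
        stump.fit(X[idx], y[idx])
        ensemble.append(stump)
    return ensemble

def bagging_predict(X, ensemble):
    votes = np.array([clf.predict(X) for clf in ensemble])  # shape (n_clf, n_samples)
    # Majority vote: for each instance, the class chosen by the most classifiers.
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

scikit-learn's BaggingClassifier implements the same scheme, and random forests extend it by also randomizing the features considered at each tree split.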
7. CONCLUSION
Although ensemble methods can help win machine learning competitions by devising sophisticated algorithms and producing results with high accuracy, they are often not preferred in industries where interpretability is more important. Nonetheless, the effectiveness of these methods is undeniable, and their benefits in appropriate applications can be tremendous. In fields such as healthcare, even a small improvement in the accuracy of a machine learning algorithm can be truly valuable.
References
[1] N. J. Nilsson, "Learning Machines: Foundations of Trainable Pattern-Classifying Systems", McGraw-Hill, New York, 1965.
[2] Joseph Rocca, "Ensemble methods: bagging, boosting and stacking", Accessed: August 2019. [Online]. Available: https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205
[3] Robert E. Schapire, "The strength of weak learnability", Machine Learning, 5(2):197-227, 1990.
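[4] Leo Breiman, "Bagging predictors", Machine Learning, 24(2):123-140, 1996.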