
Ensemble Methods: Bagging and Boosting

Supervised Learning

Goal: learn a predictor h(x)
• High accuracy (low error)
• Using training data {(x1, y1), …, (xn, yn)}

Different Classifiers
• Performance: none of the classifiers is perfect
• Complementary: examples that are not correctly classified by one classifier may be correctly classified by other classifiers

Potential Improvements?
• Utilize the complementary property

Ensembles of Classifiers
• Idea: combine the classifiers to improve the performance
• Combine the classification results from different classifiers to produce the final output
  – Unweighted voting
  – Weighted voting (a small sketch of both follows below)
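As a small illustration (not part of the original slides), the sketch below combines the label predictions of three already-trained classifiers by unweighted (majority) voting and by weighted voting; the prediction matrix and the weights are made-up placeholders.

import numpy as np

# Hypothetical label predictions from three already-trained classifiers
# on five test examples (rows = classifiers, columns = examples).
preds = np.array([
    [1, 0, 1, 1, 0],   # classifier 1
    [1, 1, 0, 1, 0],   # classifier 2
    [0, 1, 1, 1, 1],   # classifier 3
])

# Unweighted voting: every classifier gets one vote per example.
majority_vote = (preds.mean(axis=0) >= 0.5).astype(int)

# Weighted voting: classifiers judged more reliable get larger weights (assumed values).
weights = np.array([0.5, 0.3, 0.2])
weighted_vote = (weights @ preds >= 0.5).astype(int)

print("unweighted:", majority_vote)   # -> [1 1 1 1 0]
print("weighted  :", weighted_vote)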
Example: Weather Forecast

[Table: the predictions of five individual forecasters compared against reality over several days; each forecaster makes some errors (marked X), and the final row shows the combined prediction.]
Outline

• Bias/Variance Tradeoff
• Ensemble methods that minimize variance
  – Bagging
  – Random Forests
• Ensemble methods that minimize bias
  – Functional Gradient Descent
  – Boosting
  – Ensemble Selection
Some Simple Ensembles

Voting or Averaging of predictions of multiple pre-trained models

“Stacking”: Use predictions of multiple models as “features” to train a new model, and use the new model to make predictions on test data (a sketch of both ideas follows below)
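A minimal sketch of both ideas using scikit-learn; the synthetic data set, base models, and hyperparameters below are illustrative choices, not something prescribed by the slides. VotingClassifier votes on/averages the base models' predictions, while StackingClassifier uses their predictions as features for a new meta-model.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier, StackingClassifier

# Illustrative synthetic data set and base models.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base_models = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(max_depth=3)),
    ("svm", SVC(probability=True)),
]

# Voting/averaging of the base models' predictions ("soft" averages class probabilities).
voter = VotingClassifier(estimators=base_models, voting="soft").fit(X_tr, y_tr)

# Stacking: base-model predictions become features for a new meta-model.
stacker = StackingClassifier(estimators=base_models,
                             final_estimator=LogisticRegression()).fit(X_tr, y_tr)

print("voting accuracy  :", voter.score(X_te, y_te))
print("stacking accuracy:", stacker.score(X_te, y_te))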



Ensembles: Another Approach

Instead of training different models on the same data, train the same model multiple times on different data sets, and “combine” these “different” models

We can use some simple/weak model as the base model

How do we get multiple training data sets (in practice, we only have one data set at training time)?


Bagging

Bagging stands for Bootstrap Aggregation

Takes the original data set D with N training examples

Creates M copies {D̃_m}, m = 1, …, M

Each D̃_m is generated from D by sampling with replacement

Each data set D̃_m has the same number of examples as the data set D

These data sets are reasonably different from each other (since only about 63% of the original examples appear in any one of these data sets)

Train models h_1, …, h_M using D̃_1, …, D̃_M, respectively

Use the averaged model h(x) = (1/M) Σ_{m=1}^M h_m(x) as the final model (a from-scratch sketch follows below)

Useful for models with high variance and noisy data
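A from-scratch sketch of the procedure above, assuming a decision-tree regressor as the base model and a synthetic data set (both illustrative choices): M bootstrap copies of D are drawn by sampling with replacement, one model h_m is trained per copy, and the final prediction averages the M models.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Illustrative data set D with N examples; decision trees as a high-variance base model.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
N, M = len(X), 25                      # N training examples, M bootstrap copies
rng = np.random.default_rng(0)

models = []
for m in range(M):
    idx = rng.integers(0, N, size=N)   # sample N indices with replacement -> D̃_m
    models.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

# Averaged model: h(x) = (1/M) * sum_{m=1}^{M} h_m(x)
def bagged_predict(X_new):
    return np.mean([h_m.predict(X_new) for h_m in models], axis=0)

# Each bootstrap copy contains only ~63% of the distinct original examples.
print("distinct fraction in last copy:", len(np.unique(idx)) / N)
print("averaged predictions:", bagged_predict(X[:3]))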
Bagging: Illustration

[Figure] Top: the original data. Middle: three models (from some model class) learned using three data sets chosen via bootstrapping. Bottom: the averaged model.


Random Forests

An ensemble of decision tree (DT) classifiers

Uses bagging on features (each DT will use a random set of features)

Given a total of D features, each DT uses a random subset of roughly √D of the features

Randomly chosen features make the different trees uncorrelated

All DTs usually have the same depth

Each DT will split the training data differently at the leaves

Prediction for a test example votes on/averages the predictions from all the DTs (a short sketch follows below)
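A short sketch with scikit-learn's RandomForestClassifier; the data set and hyperparameter values are illustrative. Note that scikit-learn draws the random feature subset at each split rather than once per tree, but the decorrelating effect is the same; max_features="sqrt" corresponds to using about √D features.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative data: 500 examples with D = 25 features.
X, y = make_classification(n_samples=500, n_features=25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,        # number of DTs in the ensemble
    max_features="sqrt",     # random subset of about sqrt(D) features per split
    random_state=0,
).fit(X_tr, y_tr)

# A test prediction votes on/averages the predictions of all the trees.
print("test accuracy:", rf.score(X_te, y_te))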
