
Ensemble Methods: Bagging and Boosting

Supervised Learning

Goal: learn a predictor h(x)
• High accuracy (low error)
• Using training data {(x1, y1), …, (xn, yn)}

Different Classifiers
• Performance: none of the classifiers is perfect
• Complementary: examples that are not correctly classified by one classifier may be correctly classified by other classifiers

Potential Improvements?
• Utilize the complementary property

Ensembles of Classifiers
• Idea: combine the classifiers to improve the performance
• Combine the classification results from different classifiers to produce the final output
  – Unweighted voting
  – Weighted voting (a small sketch of both follows below)
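As a small illustration (not part of the original slides), the sketch below combines the label predictions of three already-trained classifiers by unweighted (majority) voting and by weighted voting; the prediction matrix and the weights are made-up placeholders.

import numpy as np

# Hypothetical label predictions from three already-trained classifiers
# on five test examples (rows = classifiers, columns = examples).
preds = np.array([
    [1, 0, 1, 1, 0],   # classifier 1
    [1, 1, 0, 1, 0],   # classifier 2
    [0, 1, 1, 1, 1],   # classifier 3
])

# Unweighted voting: every classifier gets one vote per example.
majority_vote = (preds.mean(axis=0) >= 0.5).astype(int)

# Weighted voting: classifiers judged more reliable get larger weights (assumed values).
weights = np.array([0.5, 0.3, 0.2])
weighted_vote = (weights @ preds >= 0.5).astype(int)

print("unweighted:", majority_vote)   # -> [1 1 1 1 0]
print("weighted  :", weighted_vote)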
Example: Weather Forecast

[Table: the predictions of five individual forecasters compared against reality over several days; each forecaster makes some errors (marked X), and the final row shows the combined prediction.]
Outline

• Bias/Variance Tradeoff
• Ensemble methods that minimize variance
  – Bagging
  – Random Forests
• Ensemble methods that minimize bias
  – Functional Gradient Descent
  – Boosting
  – Ensemble Selection
Some Simple Ensembles

Voting or Averaging of predictions of multiple pre-trained models

“Stacking”: Use predictions of multiple models as “features” to train a new model, and use the new model to make predictions on test data (a sketch of both ideas follows below)
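A minimal sketch of both ideas using scikit-learn; the synthetic data set, base models, and hyperparameters below are illustrative choices, not something prescribed by the slides. VotingClassifier votes on/averages the base models' predictions, while StackingClassifier uses their predictions as features for a new meta-model.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier, StackingClassifier

# Illustrative synthetic data set and base models.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base_models = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(max_depth=3)),
    ("svm", SVC(probability=True)),
]

# Voting/averaging of the base models' predictions ("soft" averages class probabilities).
voter = VotingClassifier(estimators=base_models, voting="soft").fit(X_tr, y_tr)

# Stacking: base-model predictions become features for a new meta-model.
stacker = StackingClassifier(estimators=base_models,
                             final_estimator=LogisticRegression()).fit(X_tr, y_tr)

print("voting accuracy  :", voter.score(X_te, y_te))
print("stacking accuracy:", stacker.score(X_te, y_te))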



Ensembles: Another Approach

Instead of training different models on the same data, train the same model multiple times on different data sets, and “combine” these “different” models

We can use some simple/weak model as the base model

How do we get multiple training data sets (in practice, we only have one data set at training time)?


Bagging

Bagging stands for Bootstrap Aggregation

Takes the original data set D with N training examples

Creates M copies {D̃_m}, m = 1, …, M

Each D̃_m is generated from D by sampling with replacement

Each data set D̃_m has the same number of examples as the data set D

These data sets are reasonably different from each other (since only about 63% of the original examples appear in any one of these data sets)

Train models h_1, …, h_M using D̃_1, …, D̃_M, respectively

Use the averaged model h(x) = (1/M) Σ_{m=1}^M h_m(x) as the final model (a from-scratch sketch follows below)

Useful for models with high variance and noisy data
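A from-scratch sketch of the procedure above, assuming a decision-tree regressor as the base model and a synthetic data set (both illustrative choices): M bootstrap copies of D are drawn by sampling with replacement, one model h_m is trained per copy, and the final prediction averages the M models.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Illustrative data set D with N examples; decision trees as a high-variance base model.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
N, M = len(X), 25                      # N training examples, M bootstrap copies
rng = np.random.default_rng(0)

models = []
for m in range(M):
    idx = rng.integers(0, N, size=N)   # sample N indices with replacement -> D̃_m
    models.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

# Averaged model: h(x) = (1/M) * sum_{m=1}^{M} h_m(x)
def bagged_predict(X_new):
    return np.mean([h_m.predict(X_new) for h_m in models], axis=0)

# Each bootstrap copy contains only ~63% of the distinct original examples.
print("distinct fraction in last copy:", len(np.unique(idx)) / N)
print("averaged predictions:", bagged_predict(X[:3]))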
Bagging: Illustration

[Figure] Top: the original data. Middle: three models (from some model class) learned using three data sets chosen via bootstrapping. Bottom: the averaged model.


Random Forests

An ensemble of decision tree (DT) classifiers

Uses bagging on features (each DT will use a random set of features)

Given a total of D features, each DT uses a random subset of roughly √D of the features

Randomly chosen features make the different trees uncorrelated

All DTs usually have the same depth

Each DT will split the training data differently at the leaves

Prediction for a test example votes on/averages the predictions from all the DTs (a short sketch follows below)
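A short sketch with scikit-learn's RandomForestClassifier; the data set and hyperparameter values are illustrative. Note that scikit-learn draws the random feature subset at each split rather than once per tree, but the decorrelating effect is the same; max_features="sqrt" corresponds to using about √D features.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative data: 500 examples with D = 25 features.
X, y = make_classification(n_samples=500, n_features=25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,        # number of DTs in the ensemble
    max_features="sqrt",     # random subset of about sqrt(D) features per split
    random_state=0,
).fit(X_tr, y_tr)

# A test prediction votes on/averages the predictions of all the trees.
print("test accuracy:", rf.score(X_te, y_te))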
