Ensemble Methods: Bagging and Boosting

Supervised Learning

• Goal: learn a predictor h(x)
 – High accuracy (low error)
 – Using training data {(x_1, y_1), ..., (x_n, y_n)}

Different Classifiers

• Performance: none of the classifiers is perfect
• Complementary: examples that are not correctly classified by one classifier may be correctly classified by other classifiers
• Potential improvement: utilize the complementary property

Ensembles of Classifiers

• Idea: combine the classifiers to improve performance
• Combine the classification results from different classifiers to produce the final output
 – Unweighted voting
 – Weighted voting
Example: Weather Forecast

• Compare several forecasters' predictions against reality over a number of days; each forecaster is wrong on some days, but not all on the same days
• Combining their forecasts (e.g., by voting) can correct many of the individual errors (a small voting sketch follows below)
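As a rough illustration (not from the original slides), the sketch below shows unweighted and weighted voting over the 0/1 predictions of three hypothetical classifiers; the predictions and the weights are made-up values.

import numpy as np

# Hypothetical 0/1 predictions of three classifiers on five test examples
preds = np.array([
    [1, 0, 1, 1, 0],   # classifier 1
    [1, 1, 0, 1, 0],   # classifier 2
    [0, 1, 1, 1, 1],   # classifier 3
])

# Unweighted (majority) voting: predict 1 if more than half the classifiers say 1
unweighted = (preds.mean(axis=0) > 0.5).astype(int)

# Weighted voting: classifiers judged more reliable get larger (assumed) weights
weights = np.array([0.5, 0.3, 0.2])
weighted = (weights @ preds > 0.5).astype(int)

print(unweighted)   # [1 1 1 1 0]
print(weighted)     # [1 0 1 1 0]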
Outline

• Bias/Variance Tradeoff
• Ensemble methods that minimize variance
 – Bagging
 – Random Forests
• Ensemble methods that minimize bias
 – Functional Gradient Descent
 – Boosting
 – Ensemble Selection
Some Simple Ensembles

• Voting or averaging of the predictions of multiple pre-trained models
• "Stacking": use the predictions of multiple models as "features" to train a new model, and use the new model to make predictions on test data (see the sketch below)
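A minimal sketch of stacking, assuming scikit-learn base models, a logistic-regression meta-model, and a synthetic data set (these choices are illustrative, not from the slides):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Base models (assumed choices for illustration)
base_models = [DecisionTreeClassifier(max_depth=3, random_state=0),
               LogisticRegression(max_iter=1000)]

# Out-of-fold predictions of the base models become "features" for the meta-model
meta_features = np.column_stack([
    cross_val_predict(m, X_tr, y_tr, cv=5, method="predict_proba")[:, 1]
    for m in base_models
])
meta_model = LogisticRegression().fit(meta_features, y_tr)

# At test time: get base-model predictions, then let the meta-model combine them
for m in base_models:
    m.fit(X_tr, y_tr)
test_features = np.column_stack([m.predict_proba(X_te)[:, 1] for m in base_models])
print("stacked accuracy:", meta_model.score(test_features, y_te))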


Ensembles: Another Approach

• Instead of training different models on the same data, train the same model multiple times on different data sets, and "combine" these "different" models
• We can use some simple/weak model as the base model
• How do we get multiple training data sets? In practice, we only have one data set at training time (a small resampling sketch follows below)
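One common answer, developed on the following slides, is to resample the single data set with replacement. A small illustrative sketch, assuming NumPy and a synthetic data set, which also shows that each resample contains only about 63% of the distinct original examples:

import numpy as np

rng = np.random.default_rng(0)
N, M = 1000, 5                       # N training examples, M resampled copies

X = rng.normal(size=(N, 3))          # stand-in training inputs
y = rng.integers(0, 2, size=N)       # stand-in labels

for m in range(M):
    idx = rng.integers(0, N, size=N)     # sample N indices with replacement
    X_m, y_m = X[idx], y[idx]            # the m-th resampled data set
    frac_unique = len(np.unique(idx)) / N
    print(f"copy {m}: {frac_unique:.2f} of the original examples appear")
    # typically prints values near 0.63, i.e. about 1 - 1/e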


Bagging

• Bagging stands for Bootstrap Aggregation
• Takes the original data set D with N training examples
• Creates M copies {D̃_m}, m = 1, ..., M
 – Each D̃_m is generated from D by sampling with replacement
 – Each data set D̃_m has the same number of examples as data set D
 – These data sets are reasonably different from each other, since only about 63% of the original examples appear in any one of them (the chance that a given example is drawn at least once in N draws with replacement is 1 - (1 - 1/N)^N ≈ 1 - 1/e ≈ 0.63)
• Train models h_1, ..., h_M using D̃_1, ..., D̃_M, respectively
• Use the averaged model h = (1/M) Σ_{m=1}^M h_m as the final model (see the sketch below)
• Useful for models with high variance and noisy data
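A compact sketch of the whole bagging procedure, assuming scikit-learn decision trees as the base model and majority voting as the "averaging" step (illustrative choices, not the slides' own code):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
M, N = 25, len(X_tr)
models = []
for m in range(M):
    idx = rng.integers(0, N, size=N)               # bootstrap sample D~_m
    h_m = DecisionTreeClassifier(random_state=m)
    models.append(h_m.fit(X_tr[idx], y_tr[idx]))   # train h_m on D~_m

# Combine the M models (here by majority vote) to get the final predictor h
votes = np.mean([h_m.predict(X_te) for h_m in models], axis=0)
y_hat = (votes > 0.5).astype(int)
print("bagged accuracy:", (y_hat == y_te).mean())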


Bagging: Illustration

• Top: original data; Middle: 3 models (from some model class) learned using three data sets chosen via bootstrapping; Bottom: averaged model


Random Forests

• An ensemble of decision tree (DT) classifiers
• Uses bagging on features (each DT will use a random subset of the features)
• Given a total of D features, each DT uses √D randomly chosen features
 – Randomly chosen features make the different trees uncorrelated
• All DTs usually have the same depth
• Each DT will split the training data differently at the leaves
• Prediction for a test example: vote on / average the predictions from all the DTs (a usage sketch follows below)
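For reference, a hedged usage sketch with scikit-learn's RandomForestClassifier; here max_features="sqrt" makes each split consider about √D randomly chosen features (a per-split variant of the per-tree feature subsampling described above), and the other settings are assumptions for illustration:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 100 trees; each split considers about sqrt(D) randomly chosen features
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf.fit(X_tr, y_tr)

# Prediction averages/votes over all the trees in the forest
print("forest accuracy:", rf.score(X_te, y_te))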
