
RANDOM FOREST

ENSEMBLE METHODS

 A single decision tree does not perform well
 But, it is super fast
 What if we learn multiple trees?

 We need to make sure they do not all just learn the same thing

BAGGING

 If we split the data in random different ways, decision trees give different
results: high variance.
 Bagging (Bootstrap Aggregating) is a method that results in low variance.
 If we had multiple realizations of the data (or multiple samples), we could
compute the prediction multiple times and take the average; averaging many
noisy estimates produces a less uncertain result.
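
The averaging argument can be checked numerically. Below is a minimal sketch,
assuming NumPy is installed; the quantities and names (true_value, noisy) are
illustrative, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 5.0
# 1000 trials; in each trial we have 25 independent noisy estimates (std = 2).
noisy = true_value + rng.normal(0.0, 2.0, size=(1000, 25))

single = noisy[:, 0]           # one noisy estimate per trial
averaged = noisy.mean(axis=1)  # average of 25 noisy estimates per trial

print("variance of a single estimate:", single.var())    # close to 4.0
print("variance of the average:      ", averaged.var())  # close to 4.0 / 25
```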

BAGGING (BOOTSTRAP AGGREGATING)

 Take random subsets of data points from the training set to create N smaller
data sets (drawn with replacement)
 Fit a decision tree on each subset

[Diagram: the training set is resampled, with replacement, into N subsets, and
one decision tree is fitted on each. Bagging is random sampling with
replacement.]
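
A minimal sketch of these two steps, assuming scikit-learn and NumPy are
available; the function name bagged_trees and its parameters are illustrative,
not from the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_trees(X, y, n_trees=10, seed=0):
    """Fit one decision tree per bootstrap sample of (X, y) (NumPy arrays)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # sample n indices WITH replacement
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees
```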

RANDOM FOREST

 Random Forest: Random Forest is one of the most popular ensemble methods, in
which several trees (thus the name "forest") are developed using different
sampling strategies. One of the most frequently used sampling strategies is
Bootstrap Aggregating (or Bagging).
 Bagging is random sampling with replacement.

 A decision tree is an easy-to-use tool, but it can be quite inaccurate. In
other words, it works great on the data used to create it, but it is not
flexible when it comes to classifying new samples.
 Random forests combine the simplicity of decision trees with added
flexibility, resulting in a vast improvement in accuracy.
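
As an illustration, the same idea is available off the shelf in scikit-learn;
RandomForestClassifier and its n_estimators/random_state parameters are real,
while the iris data here is only an example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)  # 100 bagged trees
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```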

BAGGING AT TESTING TIME

 At test time, the sample is run through every tree in the ensemble, each tree
casts a vote, and the fraction of agreeing trees is reported as the confidence
of the prediction.

[Diagram: a test sample is classified by each bagged tree; here 3 of the 4
trees agree on the class, giving 75% confidence.]
STEPS
Random forests are developed using the following steps (a sketch in code
follows the list):

 Step 1 : Assume that the training data has N observations. Generate several
bootstrap samples of size M (M < N), drawn with replacement (this is the
bagging step). Let the number of bootstrap samples be S.
 Step 2 : Develop a tree for each of the samples generated in step 1 using
CART, selecting a random subset of the predictors at each split.
 Step 3 : Repeat step 2 for all S samples generated in step 1.
 Step 4 : Predict the class of a new observation by majority voting across
all trees.
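
A sketch of the four steps end to end, assuming scikit-learn and NumPy; S, M,
and the function names are illustrative, and class labels are assumed to be
integer-coded.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_forest_fit(X, y, S=25, M=None, seed=0):
    rng = np.random.default_rng(seed)
    N = len(X)
    M = M if M is not None else N               # bootstrap sample size (M <= N)
    trees = []
    for _ in range(S):                          # Steps 1 and 3: draw S samples
        idx = rng.integers(0, N, size=M)        # with replacement (bagging)
        tree = DecisionTreeClassifier(max_features="sqrt")  # Step 2: CART with a
        trees.append(tree.fit(X[idx], y[idx]))  # random predictor subset per split
    return trees

def random_forest_predict(trees, X_new):
    # Step 4: each tree votes; the majority class wins for every new observation.
    votes = np.stack([t.predict(X_new) for t in trees])
    return np.array([np.bincount(col.astype(int)).argmax() for col in votes.T])
```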

RANDOM FOREST

 One has to be aware of possible overfitting while using random forests. The
model is validated using held-out data known as Out-of-Bag (OOB) data.

 Any observation that is not part of a given tree's training (bootstrap)
sample can be used as validation data for that tree; such observations are
called Out-of-Bag data.
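
A minimal sketch of OOB validation, assuming scikit-learn; oob_score=True and
the fitted attribute oob_score_ are real RandomForestClassifier features, and
the iris data is only an illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
forest.fit(X, y)
# Each tree is scored only on the observations it never saw during training.
print("OOB accuracy estimate:", forest.oob_score_)
```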
