TREE-BASED METHODS
FORESTS: The Wisdom of Crowds?
CONTENT
1. CLASSIFICATION TREES
2. REGRESSION TREES
3. RANDOM FORESTS
4. BOOSTED TREES
TREES AND RULES
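Every path from the root of a tree to a leaf reads as one IF-THEN rule, so a fitted tree can be reported as a rule set. A minimal sketch, assuming scikit-learn (not prescribed by these slides) and its built-in iris data:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text prints each root-to-leaf path as a nested IF-THEN rule.
print(export_text(tree, feature_names=load_iris().feature_names))
```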
RECURSIVE PARTITIONING
EXAMPLE:

Income   Lot_Size   Ownership
60.0     18.4       owner
85.5     16.8       owner
64.8     21.6       owner
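For a numeric predictor such as Income, the candidate split values are conventionally the midpoints between consecutive sorted values; recursive partitioning evaluates each candidate and keeps the best. A sketch over the three sample records above (NumPy assumed):

```python
import numpy as np

income = np.array([60.0, 85.5, 64.8])      # Income column of the sample
vals = np.unique(income)                   # sorted distinct values
midpoints = (vals[:-1] + vals[1:]) / 2     # candidate split points
print(midpoints)                           # [62.4  75.15]
```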
NOTE: CATEGORICAL VARIABLES
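Most tree implementations split only on numeric values, so a categorical predictor is usually dummy (one-hot) coded first, one 0/1 column per category. A sketch with pandas; the Zone column is hypothetical, not from the slides:

```python
import pandas as pd

df = pd.DataFrame({"Zone": ["urban", "rural", "urban", "suburban"]})  # hypothetical
dummies = pd.get_dummies(df["Zone"], prefix="Zone")  # one 0/1 column per category
print(dummies)
```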
THE FIRST SPLIT: INCOME = 60
SECOND SPLIT: LOT SIZE = 21
AFTER ALL SPLITS
MEASURING IMPURITY: GINI INDEX
▪ Gini index of impurity for node A, where p_k is the proportion of records in A belonging to class k (k = 1, ..., m):

I(A) = 1 - \sum_{k=1}^{m} p_k^2

▪ I(A) = 0 when all records in A belong to one class; it is maximal when the classes are equally represented.
MEASURING IMPURITY: ENTROPY MEASURE
▪ Entropy of node A, with the same class proportions p_k (k = 1, ..., m):

entropy(A) = -\sum_{k=1}^{m} p_k \log_2(p_k)

▪ Like the Gini index, entropy is 0 for a pure node and maximal when the classes are equally mixed.
EXAMPLE: GINI & ENTROPY INDEX
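Both measures translate directly into code. The sketch below (NumPy assumed) evaluates them on the 16-record group produced by the first split: 11 owners and 5 non-owners:

```python
import numpy as np

def gini(p):
    """I(A) = 1 - sum_k p_k^2."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    """-sum_k p_k * log2(p_k), treating 0 * log(0) as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p = [11 / 16, 5 / 16]                 # 11 owners, 5 non-owners
print(f"Gini    = {gini(p):.3f}")     # ~0.430
print(f"Entropy = {entropy(p):.3f}")  # ~0.896
```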
IMPURITY AND RECURSIVE PARTITIONING
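Putting impurity and partitioning together: at each step the algorithm scans every candidate split and keeps the one with the lowest record-weighted impurity of the resulting groups. An illustrative sketch for a single numeric predictor, on hypothetical toy data (not the slides' exact numbers):

```python
import numpy as np

def gini_counts(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Scan midpoints between sorted values; return (cut, weighted Gini)."""
    vals = np.unique(x)
    best = (None, np.inf)
    for cut in (vals[:-1] + vals[1:]) / 2:
        left, right = y[x < cut], y[x >= cut]
        score = (len(left) * gini_counts(left) +
                 len(right) * gini_counts(right)) / len(y)
        if score < best[1]:
            best = (cut, score)
    return best

# Hypothetical data in the spirit of the mower example (0 = nonowner, 1 = owner).
income = np.array([33.0, 45.0, 52.0, 61.5, 64.8, 69.0, 85.5, 87.0])
owner  = np.array([0, 0, 0, 1, 1, 1, 1, 1])
print(best_split(income, owner))  # (56.75, 0.0): a perfect split here
```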
THE TREE: FIRST SPLIT
THE TREE: AFTER THREE SPLITS
[Tree diagram]
▪ The dominant class in this portion of the first split (those with Income >= 60) is "owner": 11 owners and 5 non-owners. The next split for this group of 16 will be on the basis of Lot Size, splitting at 20.
▪ If Income < 60 AND Lot Size < 21, classify as "Nonowner".
▪ Internal splits are decision nodes; terminal nodes are leaves (terminal/leaf nodes).

The first split is on Income, then the next split is on Lot Size for both the low-income group (at lot size 21) and the high-income group (at lot size 20).
DETERMINING LEAF NODE LABEL
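The usual rule: label each leaf with its dominant (majority) class among the training records that land in it. A sketch, reusing the 11-owner / 5-non-owner group from the figure above:

```python
from collections import Counter

def leaf_label(labels):
    """Majority vote over the training records in a leaf."""
    return Counter(labels).most_common(1)[0][0]

group = ["owner"] * 11 + ["nonowner"] * 5
print(leaf_label(group))  # "owner"
```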
OVERFITTING PROBLEM
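The problem is easy to demonstrate: a tree grown until every leaf is pure can fit the training data perfectly yet do worse on new data than a shallower tree. A sketch assuming scikit-learn, on synthetic data with deliberately noisy labels:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, flip_y=0.2,
                           random_state=0)  # flip_y injects label noise
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (2, 5, None):  # None = grow until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_tr, y_tr)
    print(depth, round(tree.score(X_tr, y_tr), 3),
          round(tree.score(X_te, y_te), 3))  # train vs. test accuracy
```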
PRUNING
CC(T) = Err(T) + α · L(T)

▪ Err(T): fraction of training records misclassified by tree T
▪ L(T): number of leaf nodes in tree T
▪ α: penalty factor per leaf; α = 0 keeps the full-grown tree, larger α favors smaller trees
→ Prune by choosing the subtree that minimizes the cost complexity CC(T).
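scikit-learn's minimal cost-complexity pruning follows this same CC(T) idea: cost_complexity_pruning_path reports the α values at which the optimal pruned subtree changes, and ccp_alpha fits the tree pruned at a chosen α. A sketch; the breast-cancer dataset is just a convenient stand-in:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)
for alpha in path.ccp_alphas[::5]:  # sample every 5th alpha on the path
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_tr, y_tr)
    print(f"alpha={alpha:.4f}  leaves={pruned.get_n_leaves()}  "
          f"test acc={pruned.score(X_te, y_te):.3f}")
```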
REGRESSION TREES
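The same machinery handles a numeric outcome: splits are chosen to reduce squared error instead of impurity, and each leaf predicts the mean outcome of its training records, so the fit is piecewise constant. A sketch assuming scikit-learn, on a synthetic curve:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=200)  # noisy toy target

reg = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
print(reg.predict([[1.0], [4.5], [8.0]]))  # one constant per leaf region
```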
ADVANTAGES/DISADVANTAGES OF TREES
▪ Advantages: easy to interpret and explain; handle both numeric and categorical predictors; require no variable transformation or scaling; perform automatic variable selection.
▪ Disadvantages: unstable (small changes in the data can change the tree); prone to overfitting unless pruned; a single tree is usually less accurate than an ensemble of trees.
ENSEMBLE LEARNING
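A minimal bagging sketch (scikit-learn assumed): fit many trees on bootstrap resamples of the training data and combine their predictions by voting. The averaged ensemble usually outscores the single tree it is built from:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, flip_y=0.1, random_state=0)
single = DecisionTreeClassifier(random_state=0)
bagged = BaggingClassifier(single, n_estimators=100, random_state=0)

print(cross_val_score(single, X, y).mean())  # one tree
print(cross_val_score(bagged, X, y).mean())  # 100 bagged trees
```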
RANDOM FORESTS
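A random forest adds one twist to bagging: at each split only a random subset of predictors is considered (max_features), which de-correlates the trees and makes their average more stable. Sketch, scikit-learn assumed:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1,
                           random_state=0)
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                random_state=0)  # sqrt(p) predictors per split
print(cross_val_score(forest, X, y).mean())
```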
BOOSTED TREES
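Boosting grows shallow trees sequentially, each one fit to the errors the ensemble has made so far, so the trees correct one another rather than being averaged independently. A gradient-boosting sketch, scikit-learn assumed:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, flip_y=0.1, random_state=0)
boosted = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05,
                                     max_depth=2,  # shallow "weak" trees
                                     random_state=0)
print(cross_val_score(boosted, X, y).mean())
```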
SUMMARY
▪ Single trees are transparent and easy to explain but prone to overfitting; pruning controls tree size, while random forests and boosted trees trade interpretability for accuracy by combining many trees.