Evaluating Model Accuracy and Bias-Variance Tradeoff
Bias-Variance Tradeoff
Bias check:
How well do the predicted values fit the actual values? (Ideally, a low-bias model is preferred.)
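As a rough illustration of this check, the sketch below compares predicted and actual values using the mean signed error (a simple signal of systematic over- or under-prediction) and RMSE. The function names and the sample numbers are hypothetical and do not come from the slides.

```python
import math

def mean_error(actual, predicted):
    """Average signed difference; values far from 0 suggest systematic bias."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: overall size of the prediction errors."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical data: every prediction is exactly 1 unit too high.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 13.0, 15.0, 17.0]

print(mean_error(actual, predicted))  # 1.0
print(rmse(actual, predicted))        # 1.0
```

A mean error close to zero with a large RMSE would point to scattered errors rather than a consistent offset.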
Important Terminology:
Root Node : First test/decision point
Branch : Collection of nodes leading to a leaf
Leaves : End-of-route nodes/Final decisions/Conclusions
Advantages:
• Fast
• Robust
• Explicable
Regression Trees
• C5.0:
- Multi split
- Information Gain (Measure of Purity)
- Pessimistic pruning (To avoid overfitting)
• CART:
- Binary Split
- Gini Index (Measure of Purity)
- Cost Complexity Pruning (To avoid overfitting)
C5.0 Algorithm
MEASURE OF PURITY
Entropy becomes zero when the probability of any one class (Pi) equals 1, because log(1) = 0.
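The zero-entropy case can be checked with a minimal sketch that assumes the standard Shannon entropy, -sum(Pi * log2(Pi)), computed from class counts. The function name and the sample counts are illustrative only.

```python
import math

def entropy(class_counts):
    """Shannon entropy (base 2) of a node, given the record count per class."""
    total = sum(class_counts)
    result = 0.0
    for count in class_counts:
        if count == 0:
            continue  # a class with no records contributes nothing
        p = count / total
        result -= p * math.log2(p)
    return result

print(entropy([14, 0]))  # 0.0 (pure node: one class has probability 1)
print(entropy([7, 7]))   # 1.0 (evenly mixed two-class node)
```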
We can grow until we exhaust the data. But is that the right time to stop?
The weighted sum of errors for the lowest layer is calculated as follows:
(6/14)*0.47 + (2/14)*0.72 + (6/14)*0.47 = 0.51
The lower layer’s error (0.51) is higher than that of its parent branch, hence prune the complete layer.
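The pruning arithmetic can be reproduced with a short sketch. It weights each leaf's error estimate by its share of the records and compares the result with the parent's estimate. The function names are hypothetical, and the parent error of 0.46 is an assumed value for illustration only, since the slides do not state it.

```python
def weighted_error(leaves):
    """leaves: list of (record_count, error_estimate) pairs for the child nodes."""
    total = sum(count for count, _ in leaves)
    return sum(count / total * err for count, err in leaves)

def should_prune(leaves, parent_error):
    """Prune when the children's combined error exceeds the parent's error."""
    return weighted_error(leaves) > parent_error

# Leaf sizes and error estimates taken from the worked example.
leaves = [(6, 0.47), (2, 0.72), (6, 0.47)]
print(round(weighted_error(leaves), 2))  # 0.51

# Assumed parent error estimate (not given in the slides).
assumed_parent_error = 0.46
print(should_prune(leaves, assumed_parent_error))  # True
```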
CART Algorithm
MEASURE OF PURITY
** Here ‘S’ is the total number of records
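Since the formula itself is not reproduced here, the following is only a sketch that assumes the commonly used Gini definition, 1 - sum(Pi^2) for a node, with a binary split scored by weighting each side by its share of the S total records. The function names and sample counts are illustrative, and both functions assume non-empty nodes.

```python
def gini(class_counts):
    """Gini index of a node, given the record count per class."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

def split_gini(left_counts, right_counts):
    """Weighted Gini of a binary split; s is the total number of records."""
    n_left, n_right = sum(left_counts), sum(right_counts)
    s = n_left + n_right
    return n_left / s * gini(left_counts) + n_right / s * gini(right_counts)

print(gini([5, 5]))                # 0.5 (evenly mixed two-class node)
print(gini([10, 0]))               # 0.0 (pure node)
print(split_gini([5, 0], [0, 5]))  # 0.0 (a split that separates the classes)
```

Under this definition, a lower weighted Gini means a purer split, so CART would prefer the candidate split with the smallest value.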