
Decision Tree Assignment

Supervised By
Dr Mohamed Abo Rizka
Prepared By
Saif Allah Mohamed Bakry
1. How to handle training data with missing attribute values?
Decision trees handle missing values in the following ways (both strategies are sketched below):
• Fill the missing attribute value with the most common value of that attribute.
• Fill the missing value by assigning a probability to each possible value of the attribute, based on the other samples.
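A minimal sketch of both strategies in Python, assuming a toy pandas DataFrame; the "outlook" and "play" names and the data are illustrative only:

import numpy as np
import pandas as pd

# Hypothetical training data with a missing value in the "outlook" attribute.
df = pd.DataFrame({
    "outlook": ["sunny", "rain", None, "sunny", "rain"],
    "play":    ["no",    "yes", "yes", "no",    "yes"],
})

# Strategy 1: fill with the most common value (the mode) of the attribute.
mode_filled = df["outlook"].fillna(df["outlook"].mode()[0])

# Strategy 2: assign each possible value a probability proportional to its
# observed frequency in the other samples, then sample a replacement.
probs = df["outlook"].value_counts(normalize=True)
rng = np.random.default_rng(0)
prob_filled = df["outlook"].map(
    lambda v: rng.choice(probs.index.to_numpy(), p=probs.to_numpy())
    if pd.isna(v) else v
)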

2. Choosing the best attribute: what quality measure to use?


When choosing the best attribute, we select one of these two quality measures for the decision tree:
• Information gain
• Gini index

• Information Gain:
Information gain measures the change in entropy after segmenting a dataset on an attribute.
It tells us how much information a feature provides about the class.
We split a node and build the decision tree based on the attribute with the highest information gain (a sketch follows the formula).
Information Gain formula: Gain(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v), where v ranges over the values of attribute A
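A minimal sketch of this computation, assuming categorical attributes held in plain Python lists; the "wind" and "play" columns are illustrative toy data:

import math
from collections import Counter

def entropy(labels):
    # Entropy(S) = -sum_i p_i * log2(p_i) over the class proportions p_i.
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(values, labels):
    # Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v).
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for a, lab in zip(values, labels) if a == v]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

wind = ["weak", "strong", "weak", "weak", "strong"]
play = ["yes", "no", "yes", "yes", "yes"]
print(information_gain(wind, play))  # about 0.322 bits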

• Gini Index:
• The Gini index is a measure of impurity or purity used while creating a decision tree in the CART (Classification and Regression Trees) algorithm.
• An attribute with a low Gini index is preferred over one with a high Gini index.
• CART creates only binary splits, and it uses the Gini index to choose them (a sketch follows the formula).

Gini Index formula: Gini(S) = 1 − Σ_j p_j²
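A minimal sketch of the same idea for the Gini index, using a toy class column:

from collections import Counter

def gini(labels):
    # Gini(S) = 1 - sum_j p_j^2 over the class proportions p_j.
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

print(gini(["yes", "yes", "no"]))   # 1 - (2/3)^2 - (1/3)^2 ≈ 0.444
print(gini(["yes", "yes", "yes"]))  # 0.0: a pure node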

3. Determining when to stop splitting: how to avoid overfitting?
To avoid overfitting in a decision tree, we can use one of two pruning techniques (a reduced-error pruning sketch follows the list):
• Reduced-error pruning: Replace a subtree with a single node labelled with the most common classification of its training examples, keeping the change only if accuracy on a validation set does not decrease.
• Rule post-pruning: Convert the tree to rules, then prune each rule by removing any precondition whose removal improves the rule's estimated accuracy.
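A minimal sketch of reduced-error pruning on a hand-built tree, assuming internal nodes store the index of the attribute they test, their branches, and the majority class of their training examples; all attribute names and data here are illustrative:

def classify(tree, x):
    # A leaf is just a class label; an internal node tests one attribute.
    while isinstance(tree, dict):
        tree = tree["branches"].get(x[tree["attr"]], tree["majority"])
    return tree

def accuracy(tree, data):
    return sum(classify(tree, x) == y for x, y in data) / len(data)

def reduced_error_prune(node, root, val_data):
    # Bottom-up: tentatively replace each subtree with a leaf holding its
    # majority class; keep the change if validation accuracy does not drop.
    # (The root node itself is left to the caller in this sketch.)
    if not isinstance(node, dict):
        return
    for v, child in list(node["branches"].items()):
        reduced_error_prune(child, root, val_data)
        if isinstance(child, dict):
            before = accuracy(root, val_data)
            node["branches"][v] = child["majority"]   # tentative prune
            if accuracy(root, val_data) < before:
                node["branches"][v] = child           # revert

# Hypothetical tree: attribute 0 = "outlook", attribute 1 = "humidity".
tree = {"attr": 0, "majority": "yes", "branches": {
    "sunny": {"attr": 1, "majority": "no",
              "branches": {"high": "no", "normal": "yes"}},
    "rain": "yes"}}
val = [(("sunny", "high"), "no"), (("sunny", "normal"), "no"),
       (("rain", "high"), "yes")]
reduced_error_prune(tree, tree, val)  # collapses the "sunny" subtree to "no"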

4. Handling Attributes with Differing Costs?


We prefer decision trees that use low-cost attributes wherever possible, relying on high-cost attributes only when they are needed to produce reliable classifications.
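As a concrete illustration, one well-known heuristic (Tan and Schlimmer's, as presented in Mitchell's Machine Learning) ranks attributes by Gain²(S, A) / Cost(A) instead of plain information gain. A minimal sketch with made-up gains and costs:

# Hypothetical, precomputed information gains and acquisition costs.
gains = {"temperature": 0.25, "blood_test": 0.40, "biopsy": 0.45}
costs = {"temperature": 1.0, "blood_test": 5.0, "biopsy": 50.0}

# Rank attributes by Gain^2 / Cost: informative but cheap attributes win.
best = max(gains, key=lambda a: gains[a] ** 2 / costs[a])
print(best)  # "temperature": its lower gain is outweighed by its far lower cost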
