DECISION TREE REGRESSION
A decision tree model breaks a dataset down into smaller and smaller subsets, arranged in a tree structure. The core algorithm for building decision trees is ID3. For regression, ID3 builds the tree using Standard Deviation Reduction.
Standard Deviation:
A decision tree is built in a top-down manner by partitioning the data into subsets that are increasingly homogeneous. For a numerical target, we use the standard deviation to measure the homogeneity of a sample: if the sample is completely homogeneous, its standard deviation is zero.
A. Standard deviation for one attribute:

S = sqrt( Σ (x − x̄)² / n )

Count = n = 14
Mean = x̄ = 39.8
S = 7.66
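The standard deviation above can be computed directly; a minimal sketch in Python (the sample values here are illustrative, not the document's data):

```python
import math

def std_dev(values):
    # Population standard deviation: S = sqrt( sum((x - mean)^2) / n )
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative sample
print(std_dev(sample))  # -> 2.0 (mean is 5.0)
```

A completely homogeneous sample gives a standard deviation of zero, which is exactly the homogeneity measure described above.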
Standard Deviation Reduction
The goal is to find the attribute whose split yields the maximum reduction in standard deviation.
Step 1. The standard deviation of the target is calculated, as in section A above.
Step 2. The dataset is then split on each attribute in turn. The standard deviation of each resulting branch is calculated, and the weighted average of the branch standard deviations is subtracted from the standard deviation before the split. The resulting value is called the standard deviation reduction (SDR).
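This computation can be sketched as follows; the row and key names are illustrative assumptions, not part of the original:

```python
import math

def std_dev(values):
    # Population standard deviation of a numeric sample
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def sdr(rows, attr, target):
    # Standard deviation reduction for splitting `rows` on `attr`:
    # SDR = S(T) - sum_c P(c) * S(T_c), where c ranges over the values
    # of the attribute and P(c) is the fraction of rows in that branch.
    before = std_dev([r[target] for r in rows])
    branches = {}
    for r in rows:
        branches.setdefault(r[attr], []).append(r[target])
    after = sum(len(v) / len(rows) * std_dev(v) for v in branches.values())
    return before - after

# A perfect split removes all variance, so SDR equals the parent's S:
rows = [{"a": "x", "y": 10}, {"a": "x", "y": 10},
        {"a": "z", "y": 20}, {"a": "z", "y": 20}]
print(sdr(rows, "a", "y"))  # -> 5.0
```

Note the weighting by branch size: a branch holding most of the rows contributes proportionally more to the post-split standard deviation.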
Step 3. The attribute with the largest standard deviation reduction is chosen for the decision node. In this case, it was Outlook.
Step 4. The dataset is divided based on the values of the selected attribute, and the process is repeated recursively on each non-leaf branch until all the data has been processed.
The recursion terminates for a branch when its coefficient of variation (CV) falls below a chosen threshold (e.g., 10%), or when the number of instances (n) in the branch becomes too small (e.g., fewer than 4).
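The CV stopping criterion can be computed directly; a minimal sketch:

```python
import math

def coefficient_of_variation(values):
    # CV = 100 * (standard deviation / mean), in percent. A branch whose
    # CV drops below the threshold (e.g. 10%) is turned into a leaf.
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return 100.0 * s / mean

# Values clustered tightly around their mean give a low CV:
print(coefficient_of_variation([46, 43, 52, 44]))  # about 7.5 (%)
```

At a 10% threshold, a branch like the one above would stop splitting and become a leaf.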
Step 5. At each leaf node, the average of the target values of its instances is used as the predicted value.
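The whole procedure (Steps 1 through 5) can be sketched as a short recursive builder. The dataset below is the classic "hours played" weather example commonly used to illustrate this method; it is an assumption here, since the document does not list its rows:

```python
import math

def std_dev(values):
    # Population standard deviation: S = sqrt( sum((x - mean)^2) / n )
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def coeff_var(values):
    # Coefficient of variation in percent: CV = 100 * S / mean
    return 100.0 * std_dev(values) / (sum(values) / len(values))

def sdr(rows, attr, target):
    # Standard deviation reduction: S before the split minus the
    # weighted average of the branch standard deviations after it.
    before = std_dev([r[target] for r in rows])
    branches = {}
    for r in rows:
        branches.setdefault(r[attr], []).append(r[target])
    after = sum(len(v) / len(rows) * std_dev(v) for v in branches.values())
    return before - after

def build_tree(rows, attrs, target, cv_threshold=10.0, min_rows=4):
    values = [r[target] for r in rows]
    # Termination: CV below threshold, too few instances, or no attributes left.
    if not attrs or len(rows) < min_rows or coeff_var(values) < cv_threshold:
        return sum(values) / len(values)  # leaf: average of the instances
    best = max(attrs, key=lambda a: sdr(rows, a, target))
    remaining = [a for a in attrs if a != best]
    return {best: {val: build_tree([r for r in rows if r[best] == val],
                                   remaining, target, cv_threshold, min_rows)
                   for val in {r[best] for r in rows}}}

# Classic illustrative weather data (assumed, not taken from this document):
DATA = [
    {"outlook": "sunny",    "temp": "hot",  "humidity": "high",   "windy": False, "hours": 25},
    {"outlook": "sunny",    "temp": "hot",  "humidity": "high",   "windy": True,  "hours": 30},
    {"outlook": "overcast", "temp": "hot",  "humidity": "high",   "windy": False, "hours": 46},
    {"outlook": "rainy",    "temp": "mild", "humidity": "high",   "windy": False, "hours": 45},
    {"outlook": "rainy",    "temp": "cool", "humidity": "normal", "windy": False, "hours": 52},
    {"outlook": "rainy",    "temp": "cool", "humidity": "normal", "windy": True,  "hours": 23},
    {"outlook": "overcast", "temp": "cool", "humidity": "normal", "windy": True,  "hours": 43},
    {"outlook": "sunny",    "temp": "mild", "humidity": "high",   "windy": False, "hours": 35},
    {"outlook": "sunny",    "temp": "cool", "humidity": "normal", "windy": False, "hours": 38},
    {"outlook": "rainy",    "temp": "mild", "humidity": "normal", "windy": False, "hours": 46},
    {"outlook": "sunny",    "temp": "mild", "humidity": "normal", "windy": True,  "hours": 48},
    {"outlook": "overcast", "temp": "mild", "humidity": "high",   "windy": True,  "hours": 52},
    {"outlook": "overcast", "temp": "hot",  "humidity": "normal", "windy": False, "hours": 44},
    {"outlook": "rainy",    "temp": "mild", "humidity": "high",   "windy": True,  "hours": 30},
]

tree = build_tree(DATA, ["outlook", "temp", "humidity", "windy"], "hours")
```

On this data the root split comes out as outlook, matching Step 3, and the overcast branch immediately becomes a leaf (its CV is about 7.5%, below the 10% threshold) whose predicted value is the branch average, 46.25.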