Data Mining Part 1: Ing. Ridho Rahmadi, M.SC
All contents are from materials and book Introduction to Data Mining by
Tan, Steinbach, Kumar.
Task: Find a model for the class attribute as a function of the values of the
other attributes.
Classification techniques:
1 Decision tree based methods
2 Rule based methods
3 Support vector machine
4 and many more
There can be more than one tree that fits the same data set!
Ing. Ridho Rahmadi, M.Sc SPK-BI November 21, 2017 9 / 39
Applying model to the test data
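Applying a tree model to a test record means starting at the root and following, at each node, the branch that matches the record's attribute value until a leaf (a class label) is reached. The following is a minimal sketch of that traversal. The nested-dict tree layout, the attribute names (`refund`, `marital_status`) and the class labels are illustrative assumptions, not taken from the slides.

```python
# Hypothetical tree: internal nodes are dicts, leaves are class labels.
# All attribute names, values and labels here are assumed for illustration.
TREE = {
    "attribute": "refund",
    "branches": {
        "yes": "no_cheat",
        "no": {
            "attribute": "marital_status",
            "branches": {"married": "no_cheat", "single": "cheat"},
        },
    },
}


def classify(tree, record):
    """Walk from the root to a leaf, following the record's attribute values."""
    node = tree
    while isinstance(node, dict):
        value = record[node["attribute"]]  # value the test record has here
        node = node["branches"][value]     # follow the matching branch
    return node                            # a leaf: the predicted class


test_record = {"refund": "no", "marital_status": "married"}
print(classify(TREE, test_record))
```

A record with an attribute value that has no branch raises `KeyError` in this sketch; a real implementation would need a policy for unseen values.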
Multi-way split
Use as many partitions as distinct values.
Binary split
Divides values into two subsets; need to find optimal partitioning.
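To see why a binary split requires a search for the optimal partitioning, it helps to enumerate the candidates. The sketch below lists every way to divide a set of distinct attribute values into two non-empty subsets; for k values there are 2^(k-1) - 1 of them. The function name `binary_partitions` and the example values are assumptions made for illustration.

```python
from itertools import combinations


def binary_partitions(values):
    """Yield every split of the distinct values into two non-empty subsets."""
    values = sorted(set(values))
    if len(values) < 2:
        return  # nothing to split
    first, rest = values[0], values[1:]
    # Pin the first value to the left subset so each partition appears once.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            left = {first, *extra}
            right = set(values) - left
            yield left, right


# Example values are assumed, not taken from the slides.
for left, right in binary_partitions(["family", "sports", "luxury"]):
    print(sorted(left), "|", sorted(right))
```

Each candidate would then be scored with an impurity measure and the best one kept, which becomes expensive as the number of distinct values grows.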
Multi-way split
Use as many partitions as distinct values.
Binary split
Divides the values into two subsets while preserving their order, as ordinal
attributes require; need to find the optimal partitioning.
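When the order of the values must be respected, a binary split can only cut the ordered list at a single point, so k values give just k - 1 candidates. A minimal sketch, assuming the caller passes the values already in their natural order (the function name and the size scale are hypothetical):

```python
def ordered_binary_splits(ordered_values):
    """Return the binary splits that keep an ordered scale contiguous."""
    return [
        (ordered_values[:cut], ordered_values[cut:])
        for cut in range(1, len(ordered_values))
    ]


sizes = ["small", "medium", "large", "extra_large"]  # assumed ordering
for low, high in ordered_binary_splits(sizes):
    print(low, "|", high)
```

A grouping such as {small, large} versus {medium, extra_large} never appears, because it would break the order.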
Greedy approach: nodes with a homogeneous class distribution are preferred,
so a measure of node impurity is needed:
1. GINI index
2. Entropy
3. Misclassification error
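The three measures can be computed from the class proportions p_j at a node, using the standard definitions: Gini = 1 - sum(p_j^2), Entropy = -sum(p_j log2 p_j), and Error = 1 - max(p_j). The sketch below assumes a node is represented as a non-empty list of class labels; the function names are illustrative.

```python
from collections import Counter
from math import log2


def class_probabilities(labels):
    """Relative frequency of each class present at the node."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]


def gini(labels):
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def entropy(labels):
    # Only classes that occur are counted, so log2(0) never arises.
    return sum(-p * log2(p) for p in class_probabilities(labels))


def misclassification_error(labels):
    return 1.0 - max(class_probabilities(labels))


pure = ["a"] * 6               # homogeneous node
mixed = ["a"] * 3 + ["b"] * 3  # evenly mixed two-class node
for name, node in [("pure", pure), ("mixed", mixed)]:
    print(name, gini(node), entropy(node), misclassification_error(node))
```

All three measures are zero for the homogeneous node and reach their two-class maximum (0.5, 1.0 and 0.5 respectively) for the evenly mixed one, which is why lower values indicate a preferable node.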
Information gain:
\[
\mathrm{GAIN}_{\mathrm{split}} = \mathrm{Entropy}(p) - \left( \sum_{i=1}^{k} \frac{n_i}{n}\, \mathrm{Entropy}(i) \right)
\]
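The information gain of a split is the entropy of the parent node p minus the weighted entropy of its k child partitions, where each partition i is weighted by its share n_i / n of the parent's n records. A minimal sketch, assuming nodes are non-empty lists of class labels; the function names and the example labels are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy of the class distribution in a non-empty list of labels."""
    total = len(labels)
    return sum(-(c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(parent_labels, partitions):
    """Entropy(p) minus the size-weighted entropy of the child partitions."""
    n = len(parent_labels)
    weighted = sum(len(part) / n * entropy(part) for part in partitions)
    return entropy(parent_labels) - weighted


parent = ["yes"] * 5 + ["no"] * 5
perfect_split = [["yes"] * 5, ["no"] * 5]              # both children pure
useless_split = [["yes", "no"] * 2, ["yes", "no"] * 3]  # children as mixed as parent

print(information_gain(parent, perfect_split))
print(information_gain(parent, useless_split))
```

The split into pure children recovers the full parent entropy as gain, while the split that leaves both children as mixed as the parent gains nothing; a greedy tree builder would pick the candidate split with the largest gain.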