1.9-b - Discretization - Concept-Hierarchy

1.10 Discretization
Discretization is a frequently used process in data mining that transforms
attributes in continuous format into categorical ones.
Binarization, by contrast, transforms both discrete attributes and continuous
attributes into binary attributes.
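The contrast with binarization can be sketched in a few lines of Python. This is only an illustration: the function names, the sample values and the threshold are assumptions, and one-hot encoding and thresholding are just two common ways to obtain binary attributes.

```python
def binarize_categorical(values):
    """One binary attribute per distinct category (one-hot encoding)."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


def binarize_continuous(values, threshold):
    """A single binary attribute: 1 if the value exceeds the threshold, else 0."""
    return [int(v > threshold) for v in values]


# Made-up sample data, for illustration only.
colors = ["red", "green", "red"]
temps = [18.5, 25.0, 31.2]

print(binarize_categorical(colors))
print(binarize_continuous(temps, threshold=24.0))  # [0, 1, 1]
```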
Data discretization divides the range of a continuous attribute into intervals
so as to reduce the number of values the attribute can take.
The many values of a continuous attribute are replaced by a small number of
interval labels. Four approaches are distinguished:
a) Splitting - a top-down approach in which the range of the attribute is split
into progressively smaller ranges of values.
b) Merging - a bottom-up approach in which processing starts from the bottom
and works upward through merging. Initially all values are considered as
potential split points, and some are later removed by merging neighbouring
intervals.
c) Supervised - the class information is known and used
d) Unsupervised - the class information is not known
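As a rough sketch of the bottom-up (merging) approach in the unsupervised case, the Python below starts with one interval per distinct value and repeatedly merges the two neighbouring intervals that lie closest together. The smallest-gap criterion and the function name are assumptions made for illustration; the notes do not specify which merge criterion is used.

```python
def merge_intervals(values, k):
    """Bottom-up discretization sketch: merge neighbours until k intervals remain."""
    # Start with every distinct value as its own interval [low, high].
    intervals = [[v, v] for v in sorted(set(values))]
    while len(intervals) > k:
        # Gap between each interval and its right-hand neighbour.
        gaps = [intervals[i + 1][0] - intervals[i][1]
                for i in range(len(intervals) - 1)]
        i = gaps.index(min(gaps))  # assumed criterion: smallest gap merges first
        intervals[i] = [intervals[i][0], intervals[i + 1][1]]
        del intervals[i + 1]
    return [tuple(iv) for iv in intervals]


# Made-up sample data, for illustration only.
print(merge_intervals([1, 2, 3, 10, 11, 20], k=3))  # [(1, 3), (10, 11), (20, 20)]
```

A top-down (splitting) method would work in the opposite direction, starting from the full range and choosing split points one at a time.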
What are some well-known techniques of data discretization?
Histogram analysis, binning, correlation analysis, cluster analysis, decision tree
analysis, equal-width partitioning and equal-depth partitioning.
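Two of the techniques listed, equal-width and equal-depth (equal-frequency) partitioning, can be sketched in plain Python as follows. The function names and the sample values are assumptions chosen for illustration. Each function returns a bin index per input value.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k bins of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:  # all values identical: everything falls in one bin
        return [0] * len(values)
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_depth_bins(values, k):
    """Assign each value to one of k bins holding roughly equal counts."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        # Note: tied values may end up in different bins in this simple sketch.
        labels[i] = rank * k // n
    return labels


# Made-up sample data, for illustration only.
data = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(equal_width_bins(data, 3))  # [0, 0, 1, 1, 1, 2, 2, 2, 2]
print(equal_depth_bins(data, 3))  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Equal-width bins cover ranges of the same size (here 10 units each) but may hold different numbers of values, while equal-depth bins hold about the same number of values but cover ranges of different sizes.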
Thus, transforming a continuous attribute into a categorical attribute is called
discretization.
Concept hierarchy:
A concept hierarchy also helps in reducing the data by collecting low-level
concepts and replacing them with higher-level concepts.
Example: when mobile number and landline number exist as two attributes at the
low level, they can be replaced by a single attribute, telephone number, at the
higher level.
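The telephone example can be sketched as a simple mapping from low-level attributes to their higher-level concept. The attribute names, the `generalize` helper and the sample record are hypothetical, chosen only to mirror the example above.

```python
# Hypothetical hierarchy: low-level attribute -> higher-level concept.
HIERARCHY = {"mobile_no": "telephone_no", "landline_no": "telephone_no"}


def generalize(record, hierarchy):
    """Replace low-level attributes with their higher-level concept.

    Values that map to the same concept are collected into one list;
    attributes not named in the hierarchy are kept unchanged.
    """
    out = {}
    for attr, value in record.items():
        parent = hierarchy.get(attr)
        if parent is None:
            out[attr] = value
        else:
            bucket = out.setdefault(parent, [])
            if value is not None:  # skip missing numbers
                bucket.append(value)
    return out


# Made-up record, for illustration only.
record = {"name": "A", "mobile_no": "555-0101", "landline_no": "555-0199"}
print(generalize(record, HIERARCHY))
# {'name': 'A', 'telephone_no': ['555-0101', '555-0199']}
```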
Both discretization and concept hierarchy generation can be applied to numerical
data, for further processing, by using methods such as:
- Binning
- Histogram analysis
- Cluster analysis
