Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 2

CLUSTERING Of

CATEGORICAL DATA
Using Entropy based K
MODE algorithm
INTRODUCTION :-
The interest in attribute weighting for soft subspace clustering have been increasing
in the last years. However, most of the proposed approaches are designed for
dealing only with numeric data. In the Proposed work, our focus is on soft subspace
clustering for categorical data. In soft subspace clustering, the attribute weighting
approach plays a crucial role.We will implement an entropy-based approach for
measuring the relevance of each categorical attribute in each cluster and will
compare the performance of proposed algorithms with additional approach, using
three well-known evaluation metrics: accuracy, f-measure and adjusted Rand index.

PROPOSED THEORY:-
Most of the recent results in the soft subspace clustering for categorical data,
proposes modification of the k-modes algorithm. In general in these approaches the
contribution of each attribute is measured considering only the frequency of the
mode category of the average distance of the data objects from the mode of cluster.
In the proposed work the strategy for measuring the contribution of each attribute
considering the motion of the entropy, which measures the uncertainty of a given
random variable is explored. We will implements the EBK modes (Entropy based K
modes) an extension of the basic knowledge algorithm that uses the notion of the
entropy for measuring the relevance of each attribute in each cluster.
PLATFORM:-The given theory will be implemented using java jdk.1.8.
REFERENCES

JOES LUIS CARBONERA, MARA ABEL, An entropy-based


subspace clustering algorithm for categorical data, IEEE 26th
International Conference on Tools with Artificial Intelligence,2014,p.p
272-277.

D. Barbara, Y. Li, and J. Couto, Coolcat: an entropybased


algorithm for categorical clustering, in Proceedings of the
eleventh international conference on Information and knowledge
management. ACM, 2002, pp. 582589.
P. Andritsos and P. Tsaparas, Categorical data clustering, in
Encyclopedia of Machine Learning. Springer, 2010, pp. 154159.
J. L. Carbonera and M. Abel, Categorical data clustering:a
correlation-based approach for unsupervised attribute
weighting, in Proceedings of ICTAI , 2014.
L. Bai, J. Liang, C. Dang, and F. Cao, A novel attribute weighting
algorithm for clustering high-dimensional categoricaldata,
Pattern Recognition,vol. 44,no. 12,2011,pp. 28432861.
G. Gan and J. Wu, Subspace clustering for high dimensional
categorical data, ACM SIGKDD Explorat.
GROUP MEMBERS
1. Karan Chopra
2. Mayank Tagra
3. Sagar Jha
4. Vivek Khandelwal

You might also like