Advanced Cluster Analysis: Clustering High-Dimensional Data
Clustering high-dimensional data
SYLLABUS
Clustering techniques: hierarchical, K-means, and biclustering methods
SUBSPACE SEARCH METHODS
A subspace search method searches various subspaces for clusters.
Here, a cluster is a subset of objects that are similar to each other in a subspace.
The similarity is often captured by conventional measures such as distance or density.
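To make the idea concrete, here is a minimal Python sketch of a subspace search, assuming distance as the similarity measure. The function names `clusters_in_subspace` and `subspace_search` and the parameters `eps` (distance threshold) and `min_size` (smallest group that counts as a cluster) are hypothetical, chosen for illustration. Practical subspace clustering algorithms prune the search rather than trying every subspace as this sketch does.

```python
import math
from itertools import combinations


def clusters_in_subspace(points, dims, eps, min_size):
    """Group points that are within eps of each other, measuring distance
    only on the dimensions in `dims` (single-linkage connected components)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        dist = math.sqrt(sum((points[i][d] - points[j][d]) ** 2 for d in dims))
        if dist <= eps:
            parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    # Keep only groups large enough to count as clusters.
    return [sorted(g) for g in groups.values() if len(g) >= min_size]


def subspace_search(points, subspace_dim, eps, min_size):
    """Try every axis-parallel subspace with `subspace_dim` dimensions and
    record the clusters found in each one."""
    found = {}
    for dims in combinations(range(len(points[0])), subspace_dim):
        clusters = clusters_in_subspace(points, dims, eps, min_size)
        if clusters:
            found[dims] = clusters
    return found


# Points 0-2 agree on dimensions 0 and 1 but are spread out on dimension 2.
points = [(1.0, 1.0, 0.0), (1.1, 0.9, 5.0), (0.9, 1.1, 9.0), (8.0, 8.0, 5.1)]
result = subspace_search(points, subspace_dim=2, eps=0.5, min_size=3)
print(result)
```

In the full three-dimensional space, points 0 to 2 are far apart (they differ by up to 9 units on dimension 2), so the cluster only shows up in the subspace made of dimensions 0 and 1.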
[Figure: example graph with nodes N1, N2, N3, N4 (K = 4)]
COMMUNITY
A community is a group of K-CLIQUES in which every clique can be reached from every other clique through a chain of adjacent cliques, where two K-cliques are adjacent if they share K-1 nodes.
CLIQUE PERCOLATION METHOD (CPM)
CLIQUE: a K-clique is a set of K nodes in which every pair of nodes is connected by an edge.
CLIQUE & COMMUNITY
COMMUNITY = {CLIQUE 1, CLIQUE 2}
EXAMPLE
CLIQUES (K = 3)
a) {1,2,3}
b) {1,2,8}
c) {2,6,5}
d) {2,6,4}
e) {2,5,4}
f) {4,5,6}
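Community 1 = {a, b} (a and b share the K-1 = 2 nodes {1, 2})
Community 2 = {c, d, e, f} (c is adjacent to d via {2, 6}, to e via {2, 5}, and to f via {5, 6})
The two communities stay separate because a clique from Community 1 and a clique from Community 2 share at most one node (node 2).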
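The grouping in this example can be checked with a short union-find sketch, assuming the K-cliques have already been enumerated and given names. `cpm_communities` is a hypothetical helper written for illustration, not a library function. NetworkX provides a complete implementation as `k_clique_communities`.

```python
def cpm_communities(cliques, k):
    """Group named k-cliques into CPM communities.

    Two k-cliques are adjacent when they share k - 1 nodes; a community is
    a maximal set of cliques linked by chains of adjacent cliques."""
    names = list(cliques)
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if len(cliques[a] & cliques[b]) >= k - 1:
                parent[find(a)] = find(b)  # adjacent: same community

    communities = {}
    for name in names:
        communities.setdefault(find(name), set()).add(name)
    # Sort by the smallest clique name so the output order is stable.
    return sorted(communities.values(), key=min)


# The K = 3 cliques a) to f) from the example.
cliques = {
    "a": {1, 2, 3}, "b": {1, 2, 8}, "c": {2, 6, 5},
    "d": {2, 6, 4}, "e": {2, 5, 4}, "f": {4, 5, 6},
}
communities = cpm_communities(cliques, k=3)
print(communities)
```

Union-find is used because adjacency is only pairwise. Merging every adjacent pair automatically joins cliques that are connected only through a chain, such as d and f via c.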
EXAMPLE
IDENTIFY THE CLIQUES (K = 5 AND K = 4)
[Figure: exercise graph with numbered nodes]
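For the exercise, a brute-force enumerator is enough on small graphs. This sketch assumes the graph is given as an undirected edge list, and `k_cliques` is a hypothetical helper. The exercise graph is only available in the figure, so the demo uses the edges implied by cliques a) to f) of the previous example. Substitute the figure's edges to solve the exercise.

```python
from itertools import combinations


def k_cliques(edges, k):
    """Return every set of k nodes in which each pair is joined by an edge.

    Brute force over all k-node combinations: fine for small exercise
    graphs, exponential for large ones."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return [
        set(combo)
        for combo in combinations(sorted(adjacency), k)
        if all(v in adjacency[u] for u, v in combinations(combo, 2))
    ]


# Edges implied by cliques a) to f) of the K = 3 example (an assumption:
# the original figure may contain further edges).
edges = [(1, 2), (1, 3), (2, 3), (1, 8), (2, 8), (2, 5), (2, 6),
         (5, 6), (2, 4), (4, 6), (4, 5)]
for k in (3, 4, 5):
    print(k, k_cliques(edges, k))
```

With these edges, K = 3 returns exactly the six cliques a) to f). K = 4 returns {2, 4, 5, 6}, whose four triangles are cliques c) to f). K = 5 returns nothing.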
PROCLUS
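PROCLUS (PROjected CLUStering) is a medoid-based subspace clustering algorithm. For each cluster, it finds a medoid and the set of dimensions related to that medoid.
It runs in three phases: initialization, iterative, and refinement.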
INPUT AND OUTPUT FOR PROCLUS
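Input: the set of data points, the number of clusters k, and the average number of dimensions per cluster l
Output: k clusters plus a set of outliers, each cluster with its own set of relevant dimensions
Phases: initialization phase, iterative phase, refinement phase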
INITIALIZATION PHASE
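Choose a random sample of data points.
From the sample, choose a set of points that is likely to contain the medoids of the clusters. PROCLUS picks these candidates greedily so that they lie far apart from one another.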
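A minimal sketch of the initialization phase, assuming numeric points and Euclidean distance. It draws a random sample, then greedily picks candidates that are far apart, so each true cluster is likely to contribute at least one candidate. In PROCLUS both the sample size and the candidate count are constant multiples of k. The default multipliers `a` and `b` below are arbitrary placeholders, and the helper names are hypothetical.

```python
import math
import random


def greedy_far_points(sample, count, rng):
    """Start from a random point, then repeatedly add the sample point whose
    distance to its nearest already-chosen point is largest."""
    chosen = [rng.choice(sample)]
    # Distance from every sample point to its nearest chosen point.
    nearest = [math.dist(p, chosen[0]) for p in sample]
    while len(chosen) < count:
        idx = max(range(len(sample)), key=nearest.__getitem__)
        chosen.append(sample[idx])
        nearest = [min(d, math.dist(p, sample[idx])) for d, p in zip(nearest, sample)]
    return chosen


def proclus_init(data, k, a=10, b=3, seed=None):
    """Return the candidate medoid set M (placeholder multipliers a, b)."""
    rng = random.Random(seed)
    sample = rng.sample(data, min(len(data), a * k))
    return greedy_far_points(sample, min(len(sample), b * k), rng)


# Three tight, well-separated groups of 10 points each.
centers = [(0, 0), (100, 0), (0, 100)]
data = [(cx + 0.1 * i, cy + 0.1 * i) for cx, cy in centers for i in range(10)]
candidates = proclus_init(data, k=3, b=1, seed=0)
print(candidates)
```

With `b=1`, only three candidates are kept. Because the groups are far apart, the greedy rule takes one candidate from each group.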
ITERATIVE PHASE
The initialization phase produces a set of candidate points, denoted M, that should contain the medoids.
In this phase, we search M for the best set of medoids.
Randomly choose an initial set of medoids, M_current, from M, and replace the “bad” medoids with other points from M if necessary.
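The medoid search just described can be sketched as a simple hill climb. This is a simplified stand-in, not the full PROCLUS iterative phase. It assigns points by full-dimensional Manhattan distance instead of using the dimensions related to each medoid, and it assumes the “bad” medoid is the one with the fewest assigned points. All names (`assign`, `iterative_phase`, `tries`) are hypothetical.

```python
import random


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def assign(data, medoids):
    """Assign each point to its nearest medoid; return (clusters, total cost)."""
    clusters = {i: [] for i in range(len(medoids))}
    cost = 0.0
    for p in data:
        dists = [manhattan(p, m) for m in medoids]
        nearest = min(range(len(medoids)), key=dists.__getitem__)
        clusters[nearest].append(p)
        cost += dists[nearest]
    return clusters, cost


def iterative_phase(data, candidates, k, tries=50, seed=None):
    """Hill-climb over medoid sets drawn from the candidate set M."""
    rng = random.Random(seed)
    current = rng.sample(candidates, k)  # initial M_current
    clusters, cost = assign(data, current)
    for _ in range(tries):
        # Assumed rule: the "bad" medoid is the one with the fewest points.
        bad = min(range(k), key=lambda i: len(clusters[i]))
        unused = [c for c in candidates if c not in current]
        if not unused:
            break
        trial = list(current)
        trial[bad] = rng.choice(unused)
        trial_clusters, trial_cost = assign(data, trial)
        if trial_cost < cost:  # keep the swap only if it lowers the cost
            current, clusters, cost = trial, trial_clusters, trial_cost
    return current, cost


centers = [(0, 0), (100, 0), (0, 100)]
data = [(cx + 0.1 * i, cy + 0.1 * i) for cx, cy in centers for i in range(10)]
candidates = [(0, 0), (0.1, 0.1), (100, 0), (0, 100)]  # stand-in for M
medoids, cost = iterative_phase(data, candidates, k=3, seed=1)
print(medoids, cost)
```

With these candidates, the two distant points (100, 0) and (0, 100) always end up as medoids. Any swap that drops one of them sharply increases the total distance.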
For the current medoids, the following steps are performed:
Find the dimensions related to each medoid
Find the bad medoid, and try the result of replacing the bad medoid