Clustering High Dimensional Data
Introduction
• Noise
What happens when dimensionality increases?
• Data become increasingly sparse because the data points are likely located in different dimensional subspaces.
• Data points can be considered almost equally distant from one another, so the distance measure, which is essential for cluster analysis, becomes meaningless.
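The loss of distance contrast can be checked numerically. The sketch below is illustrative only: it assumes points drawn uniformly from the unit hypercube and uses Euclidean distance, and the helper name `relative_contrast` is hypothetical. It reports how far apart the nearest and farthest neighbours of a query point are, relative to the nearest one.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for distances from one random query point.

    Assumption: points are uniform in [0, 1]^dim and distance is Euclidean.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


# The contrast shrinks as the number of dimensions grows.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

A small contrast means the nearest and farthest points are at nearly the same distance, which is the sense in which the distance measure stops being informative.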
Solution Techniques
• Feature/Attribute Transformation
• Feature/Attribute Selection
• Subspace Clustering
Feature Transformation
• Transform the data onto a smaller space while preserving the original relative distance between objects.
• These methods summarize data by creating linear combinations of the attributes.
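The slides do not name a specific transformation method. As a minimal sketch, the example below uses Gaussian random projection, chosen here only because it is short and is one way of building linear combinations of attributes that roughly preserve pairwise distances. The function names, the dimensions (500 reduced to 200), and the seeds are all assumptions made for illustration.

```python
import math
import random


def random_projection_matrix(in_dim, out_dim, seed=0):
    """Build an out_dim x in_dim matrix of scaled Gaussian weights."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(point, matrix):
    """Each output coordinate is a linear combination of the input attributes."""
    return [sum(w * x for w, x in zip(row, point)) for row in matrix]


rng = random.Random(1)
a = [rng.random() for _ in range(500)]
b = [rng.random() for _ in range(500)]

matrix = random_projection_matrix(500, 200)
# Ratio of the projected distance to the original distance; values near 1
# mean the distance between a and b is roughly preserved.
ratio = math.dist(project(a, matrix), project(b, matrix)) / math.dist(a, b)
print(round(ratio, 3))
```

Note that the new coordinates mix all original attributes, so they are harder to interpret than the attributes themselves.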
Feature Selection
• The problem becomes how to find such subspace clusters effectively and
efficiently.
Feature Transformation Issues
01 Dimension-Growth Subspace Clustering, represented by CLIQUE.
02 Dimension-Reduction Projected Clustering, represented by PROCLUS.
03 Frequent Pattern-Based Clustering, represented by pCluster.
CLIQUE: Dimension-Growth Subspace Clustering
• CLIQUE is used for the clustering of high-dimensional data present in large tables. By high-dimensional data we mean records that have many attributes.
• CLIQUE identifies the dense units in the subspaces of high-dimensional data space, and uses these subspaces to provide more efficient clustering.
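The idea of dense units can be sketched in a few lines. The example below is a simplified illustration, not a full CLIQUE implementation. It assumes attribute values are already scaled to [0, 1], splits each dimension into equal-width intervals, and finds dense units in 1-D and 2-D subspaces only. The names `n_intervals` and `min_fraction` are illustrative stand-ins for the grid resolution and density threshold (often written ξ and τ in the literature), and the data points are made up.

```python
from collections import Counter
from itertools import combinations


def cell_index(value, n_intervals):
    """Map a value in [0, 1] to its interval index along one dimension."""
    return min(int(value * n_intervals), n_intervals - 1)


def dense_units(points, n_intervals=4, min_fraction=0.5):
    """Return (dense 1-D units, dense 2-D units).

    A 1-D unit is a (dimension, interval) pair; a 2-D unit is a frozenset of
    two such pairs. A unit is dense if it holds more than
    min_fraction * len(points) points.
    """
    threshold = min_fraction * len(points)
    n_dims = len(points[0])
    cells = [[(d, cell_index(p[d], n_intervals)) for d in range(n_dims)]
             for p in points]

    counts_1d = Counter(c for row in cells for c in row)
    dense_1d = {c for c, n in counts_1d.items() if n > threshold}

    # Dimension growth: 2-D candidates are built only from dense 1-D units,
    # since a dense 2-D unit must be dense in each of its 1-D projections.
    counts_2d = Counter()
    for row in cells:
        kept = [c for c in row if c in dense_1d]
        for pair in combinations(kept, 2):
            counts_2d[frozenset(pair)] += 1
    dense_2d = {u for u, n in counts_2d.items() if n > threshold}
    return dense_1d, dense_2d


# Made-up data: dimensions 0 and 1 share a cluster, dimension 2 is spread out.
points = [
    (0.10, 0.12, 0.05), (0.15, 0.20, 0.30), (0.20, 0.10, 0.55),
    (0.05, 0.22, 0.80), (0.12, 0.18, 0.95), (0.60, 0.90, 0.40),
    (0.85, 0.35, 0.70), (0.95, 0.65, 0.15),
]
dense_1d, dense_2d = dense_units(points)
print(sorted(dense_1d))
print([sorted(u) for u in dense_2d])
```

In this toy data the only dense 2-D unit lies in the subspace spanned by dimensions 0 and 1, while dimension 2 contributes no dense unit at all. The full algorithm continues the same growth step to higher-dimensional subspaces and then joins connected dense units into clusters.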
CLIQUE Overall Approach
PROCLUS Drawbacks: