Poster
Figure 1b: Data with 6 objects in one bin. Adding a second dimension introduces noise and outliers.
Figure 1c: Data with 4 objects in one bin. High-dimensional datasets (3 or more dimensions) produce more outliers or noise; clusters should intersect.
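The shrinking bin counts in Figures 1b and 1c can be reproduced with a small sketch. It assumes uniform random data in the unit cube and an equal-width grid; the function name `densest_bin` and the parameter `bins_per_axis` are illustrative and do not come from the poster.

```python
import random
from collections import Counter

def densest_bin(points, dims, bins_per_axis=4):
    """Largest number of points sharing one grid cell when only the
    first `dims` coordinates are used (equal-width bins on [0, 1))."""
    cells = Counter(
        tuple(min(int(x * bins_per_axis), bins_per_axis - 1) for x in p[:dims])
        for p in points
    )
    return max(cells.values())

# Assumed toy data: 40 uniform random points in 3 dimensions.
rng = random.Random(0)
points = [tuple(rng.random() for _ in range(3)) for _ in range(40)]

# Occupancy of the fullest cell as dimensions are added one by one.
counts = [densest_bin(points, d) for d in (1, 2, 3)]
print(counts)
```

Each added dimension splits every existing cell further, so the fullest cell can only stay the same or lose objects. This is why a fixed density threshold marks more objects as noise or outliers in higher dimensions.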
CLIQUE (Agrawal, et.al.) identifies dense clusters in subspaces of maximum dimensionality. Once the appropriate subspaces are found, the task is to find clusters in the corresponding projections. The data points are separated according to the valleys of the density function, and the clusters are unions of connected high-density units within a subspace. CLIQUE generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for the data distribution.
PreDeCon (subspace PREference weighted DEnsity CONnected clustering) introduces the concept of local subspace preferences, which capture the main directions of high point density. It uses a weighted Euclidean distance measure to compute smaller but more specific clusters, instead of trying to cluster all available points, which results in large and unspecific clusters (Böhm, et.al.).
SC2D (Subspace Clustering with Dimensional Density) puts objects into the same cluster if they have similar dimensional density (Huang, et.al.).
EDSC (Efficient Density-based Subspace Clustering) proposes lossless, efficient detection of density-based subspace clusters. It reduces the high computational cost of density-based subspace clustering with a complete multi-step filter-and-refine algorithm (Assent, et.al.).
SUBCLU (density-connected SUBspace CLUstering) uses the concept of density-connectivity underlying the DBSCAN algorithm (Kailing, et.al.).
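Density-connectivity, the DBSCAN concept that SUBCLU builds on, can be illustrated with a minimal sketch. This is a plain, unoptimized DBSCAN written for illustration only; the toy points and the values of `eps` and `min_pts` are assumptions, not values from the poster.

```python
from math import dist

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE              # may later turn into a border point
            continue
        cluster += 1                       # points[i] is a core point
        labels[i] = cluster
        seeds = list(neighbours)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours) # j is also a core point: keep expanding
    return labels

# Assumed toy data: two dense clumps and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
          (10.0, 0.0)]
labels = dbscan(points, eps=0.2, min_pts=3)
print(labels)
```

Two points end up in the same cluster when a chain of core points links them, each within `eps` of the next. The isolated point has too few neighbours and stays labelled as noise.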
The distances between objects in high-dimensional datasets are generally similar to each other. As a result, clusters tend to be very tight and may even coincide or overlap. Clusters are usually detected through similarity judgments between objects (Sembiring, et.al.), and similarity is often determined by measuring the distance between objects in various dimensions. Subspace methods are well suited to high-dimensional datasets. Subspace clustering is an extension of conventional clustering (Parsons, et.al.) that is used to find a second cluster, a third, and so on in the dataset, each in a different domain. Figure 2 illustrates the need for subspace clustering. Subspace clustering is a method of detecting all groups in all subspaces (Aggrawal, et.al.). One point may be a member of several groups that lie in different subspaces. Subspaces can be axis-parallel, a term commonly used in high-dimensional clustering.
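The claim that distances become similar in high dimensions can be checked with a short experiment. The sketch below assumes uniform random points in the unit hypercube and uses relative contrast, (farthest - nearest) / nearest, as an illustrative measure; the measure and the sample sizes are assumptions, not part of the poster.

```python
import random
from math import dist

def relative_contrast(n_points, n_dims, rng):
    """(farthest - nearest) / nearest Euclidean distance from one random
    query point to n_points uniform random points in the unit hypercube."""
    query = [rng.random() for _ in range(n_dims)]
    distances = [
        dist(query, [rng.random() for _ in range(n_dims)])
        for _ in range(n_points)
    ]
    return (max(distances) - min(distances)) / min(distances)

rng = random.Random(42)
contrast = {d: relative_contrast(200, d, rng) for d in (2, 20, 500)}
for d, c in contrast.items():
    print(f"{d:>4} dims: relative contrast = {c:.2f}")
```

A small relative contrast means the nearest and farthest neighbours are almost equally far away. Full-space distance then carries little information about cluster membership, which motivates searching in subspaces instead.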
Proposed Algorithm:
CONCLUSION: This paper has proposed a new subspace clustering algorithm for high-dimensional space, based on the concept of density connectivity. DBSCAN is used to find the initial clusters. Dimensional density is then used to find the subspace of each cluster. The implementation of this algorithm will be tested on synthetic and real datasets.
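The second step, finding a subspace for each initial cluster, might look roughly like the sketch below. The poster does not define dimensional density, so the criterion used here is a hypothetical stand-in: a dimension is kept when the cluster's extent along it is small relative to the extent of the whole dataset. The function name `relevant_dimensions`, the threshold `max_ratio`, and the toy clusters are all assumptions.

```python
def relevant_dimensions(cluster, data, max_ratio=0.3):
    """Return indices of dimensions along which `cluster` is compact.

    Hypothetical criterion (not taken from the poster): a dimension counts
    as relevant when the cluster's extent along it is at most `max_ratio`
    of the extent of the whole dataset along the same dimension.
    """
    n_dims = len(data[0])
    relevant = []
    for d in range(n_dims):
        data_values = [p[d] for p in data]
        cluster_values = [p[d] for p in cluster]
        data_extent = max(data_values) - min(data_values)
        cluster_extent = max(cluster_values) - min(cluster_values)
        if data_extent > 0 and cluster_extent / data_extent <= max_ratio:
            relevant.append(d)
    return relevant

# Assumed toy input: in practice the clusters would come from a DBSCAN run.
cluster_a = [(1.0, 1.1, 0.0), (1.1, 1.0, 5.0), (0.9, 1.2, 9.0)]
cluster_b = [(8.0, 0.0, 4.0), (8.2, 9.0, 4.1), (7.9, 5.0, 3.9)]
data = cluster_a + cluster_b

subspaces = {name: relevant_dimensions(c, data)
             for name, c in (("a", cluster_a), ("b", cluster_b))}
print(subspaces)
```

In this toy input, cluster a is compact in dimensions 0 and 1 but spread across dimension 2, while cluster b is compact in dimensions 0 and 2. Each cluster therefore receives its own axis-parallel subspace.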
References:
1. Ester, Martin, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu. A Density-Based Algorithm for Discovering Clusters in Large Spatial Database
2. Kailing, Karin, Hans-Peter Kriegel, Peer Kröger. Density-Connected Subspace Clustering for High-Dimensional Data
3. Sembiring, Rahmat Widia, Jasni Mohamad Zain, Abdullah Embong. Clustering High Dimensional Data Using Subspace And Projected Clustering Algorithm
4. Sembiring, Rahmat Widia, Jasni Mohamad Zain. Cluster Evaluation Of Density Based Subspace Clustering
5. Böhm, Christian, Karin Kailing, Hans-Peter Kriegel, Peer Kröger. Density Connected Clustering with Local Subspace Preferences
6. Huang, Wang Fei, Lifei Chen, Qingsan Jiang. A Novel Subspace Clustering Algorithm with Dimensional Density
7. Assent, Ira, Ralph Krieger, Emmanuel Müller, Thomas Seidl. EDSC: Efficient Density-Based Subspace Clustering
Figure 3a. Sample data plotted in 2 dimensions (a and b): two clusters are properly separated, but one cluster remains mixed.
Figure 3b. Sample data plotted in 2 dimensions (b and c): two clusters are properly separated, but one cluster is still mixed.
Figure 3c. Sample data showing 4 visible clusters that still overlap and are hard to separate using a conventional clustering algorithm.