
Subspace Clustering Based On Density Connection

Rahmat Widia Sembiring Assoc. Prof. Jasni Mohamad Zain


ABSTRACT: Objects are generally represented as vectors or points in one or more dimensions. Cluster analysis is performed to find groups or patterns that are similar, but conventional algorithms often produce irrelevant clusters. Distances between objects in high dimensional datasets are generally similar to each other, which produces tight or overlapping clusters. Because clusters are detected through similarity judgments between objects, a subspace method is well suited to this setting. For multidimensional data, let A = {A1, A2, ..., An} be a set of finite, bounded, totally ordered domains of an n-dimensional numerical space. The problem is how to place each object of the dataset into the appropriate subspace, explore the data, and assign objects to separate clusters. This research proposes a new subspace clustering algorithm based on the concept of density connectivity.

INTRODUCTION: Generally, objects are represented as vectors or points in one or more dimensions. Cluster analysis is performed to find groups (Parson et al.) or patterns that are similar (Figure 1a). As the amount of data grows and a second dimension has to be processed, clustering at this stage produces outliers or noise (Figure 1b). Conventional algorithms tend to fail to find maximal clusters and may even generate noise or outliers (Figure 1c). In two dimensions, three clusters can form. In Figure 3a, a sample plot of the data in dimensions a and b, two clusters are properly separated but one cluster remains mixed. In Figure 3b, a sample plot of the data in dimensions b and c, two clusters are again properly separated but one cluster is still mixed. Clusters may also appear clearly separated (Figure 3c) while their objects still overlap, which makes them hard to separate with conventional clustering algorithms.

RELATED WORK: DBSCAN (Ester et al.) relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape. DBSCAN requires only one input parameter and supports the user in determining an appropriate value for it.
DBSCAN starts with an arbitrary point p and retrieves all points density-reachable from p with respect to Eps and MinPts. A point p is directly density-reachable from a point q with respect to Eps and MinPts (Figure 4a) if $p \in N_{Eps}(q)$ and $|N_{Eps}(q)| \geq MinPts$ (the core point condition), where $N_{Eps}(q)$ denotes the set of points within distance Eps of q. Density-connectivity (Figure 4b) is a symmetric relation.

Figure 4a. Density reachability

Figure 4b. Density connectivity
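The reachability notions described above can be illustrated with a minimal sketch. It assumes Euclidean distance and a plain list of coordinate tuples; the function names and the sample points are hypothetical and are not taken from the DBSCAN implementation or from this paper.

```python
import math

def eps_neighborhood(points, q, eps):
    """Indices of all points within distance eps of points[q] (q included)."""
    return [i for i, p in enumerate(points) if math.dist(p, points[q]) <= eps]

def is_core(points, q, eps, min_pts):
    """Core point condition: the Eps-neighborhood holds at least MinPts points."""
    return len(eps_neighborhood(points, q, eps)) >= min_pts

def directly_density_reachable(points, p, q, eps, min_pts):
    """True if p lies in the Eps-neighborhood of q and q is a core point."""
    return p in eps_neighborhood(points, q, eps) and is_core(points, q, eps, min_pts)

def density_reachable_set(points, start, eps, min_pts):
    """Indices reachable from start by chaining through core points only."""
    visited = {start}
    queue = [start]
    while queue:
        q = queue.pop()
        if not is_core(points, q, eps, min_pts):
            continue  # border or noise points do not extend the chain
        for i in eps_neighborhood(points, q, eps):
            if i not in visited:
                visited.add(i)
                queue.append(i)
    return visited

# Hypothetical sample: four points on a unit square, one border point, one far point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (10, 10)]
eps, min_pts = 1.1, 3

border_from_core = directly_density_reachable(points, 4, 3, eps, min_pts)
core_from_border = directly_density_reachable(points, 3, 4, eps, min_pts)
cluster = density_reachable_set(points, 0, eps, min_pts)
```

In this sample, direct density-reachability is not symmetric: the border point (index 4) is reachable from the core point (index 3), but not the other way round, which is why the symmetric notion of density-connectivity is needed to define clusters.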

Figure 1a: Data with 11 objects in one bin

Figure 1b: Data with 6 objects in one bin. Adding a second dimension results in noise and outliers

Figure 1c: Data with 4 objects in one bin. High dimensional datasets (3 or more dimensions) produce more outliers or noise; clusters should have an intersection

CLIQUE (Kailing et al.) identifies dense clusters in subspaces of maximum dimensionality. Once the appropriate subspaces are found, the task is to find clusters in the corresponding projections. The data points are separated according to the valleys of the density function. The clusters are unions of connected high-density units within a subspace. CLIQUE generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for the data distribution. PreDeCon (subspace PREference weighted DEnsity CONnected Clustering) introduces the concept of local subspace preferences, which capture the main directions of high point density. It uses a weighted Euclidean distance measure to compute smaller but more specific clusters instead of trying to cluster all available points, which would result in large and unspecific clusters (Böhm et al.). SC2D (Subspace Clustering with Dimensional Density) puts objects into the same cluster if they have similar dimensional density (Huang et al.). EDSC (Efficient Density-based Subspace Clustering) proposes lossless, efficient detection of density-based subspace clusters and reduces the high computational cost of density-based subspace clustering with a complete multi-step filter-and-refine algorithm (Assent et al.). SUBCLU (density-connected SUBspace CLUstering) uses the concept of density-connectivity underlying the DBSCAN algorithm (Kailing et al.).
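The idea shared by these density-based subspace methods, measuring neighborhoods only on a chosen subset of attributes, can be sketched as follows. This is an illustrative sketch under the assumption of axis-parallel subspaces and Euclidean distance; the helper names and the sample data are hypothetical and do not reproduce any of the cited algorithms.

```python
import math

def subspace_dist(p, q, dims):
    """Euclidean distance between p and q restricted to the attribute indices in dims."""
    return math.sqrt(sum((p[d] - q[d]) ** 2 for d in dims))

def subspace_neighborhood(points, q, eps, dims):
    """Indices of points within eps of points[q], measured only in the subspace dims."""
    return {i for i, p in enumerate(points)
            if subspace_dist(p, points[q], dims) <= eps}

# Hypothetical data: the first three points agree on attributes 0 and 1,
# but are scattered along attribute 2.
points = [
    (1.0, 1.0, 0.0),
    (1.1, 0.9, 5.0),
    (0.9, 1.1, 9.0),
    (5.0, 5.0, 5.1),
]
eps = 0.5

full_space = subspace_neighborhood(points, 0, eps, (0, 1, 2))
sub_space = subspace_neighborhood(points, 0, eps, (0, 1))
```

Here the neighborhood of the first point is empty apart from itself in the full space, while the projection onto attributes 0 and 1 recovers the group. Dropping attributes can only shrink distances, so a neighborhood in a subspace always contains the corresponding neighborhood in any larger subspace.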

Distances between objects in high dimensional datasets are generally similar to each other. As a result, clusters tend to be very tight, and may even coincide or overlap. Clusters are usually detected through similarity judgments between objects (Sembiring et al.). The similarity between objects is often determined by measuring the distance between them in various dimensions. A subspace method is well suited to high dimensional datasets. Subspace clustering is an extension of conventional clustering (Parson et al.) that is used to find a second cluster, a third, and so on, which lie in different domains of the dataset. Figure 2 illustrates the need for subspace clustering. Subspace clustering is a method of detecting all groups in all subspaces (Aggrawal et al.). A single point may belong to several groups, each in a different subspace. Subspaces can be axis-parallel; this term is commonly used in high dimensional clustering.
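The observation that distances between objects become similar in high dimensional data can be checked numerically. The sketch below assumes uniformly distributed random points and uses the relative contrast (farthest minus nearest distance, divided by the nearest distance) as an illustrative measure; the function name, sample sizes, and seed are hypothetical choices, not part of the proposed method.

```python
import math
import random

def relative_contrast(n_points, n_dims, rng):
    """(max - min) / min of the distances from one query point to all other points."""
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    query = points[0]
    dists = [math.dist(query, p) for p in points[1:]]
    return (max(dists) - min(dists)) / min(dists)

rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
low_dim_contrast = relative_contrast(200, 2, rng)
high_dim_contrast = relative_contrast(200, 200, rng)

print(f"relative contrast in   2 dimensions: {low_dim_contrast:.2f}")
print(f"relative contrast in 200 dimensions: {high_dim_contrast:.2f}")
```

A small relative contrast means that the nearest and farthest neighbors are almost equally far away, so a distance threshold measured over all dimensions no longer separates groups well. This is the motivation for measuring similarity inside subspaces instead.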

Proposed Algorithm:

Figure 2. Clusters overlap each other

CONCLUSION: This paper has proposed a new subspace algorithm for clustering in high dimensional space, based on the concept of density connectivity. DBSCAN is used to find the initial clusters. Dimensional density is used to find the subspace of each cluster. The implementation of this algorithm will be tested on synthetic and real datasets.
References:
1. Ester, Martin, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu: A Density-Based Algorithm for Discovering Clusters in Large Spatial Database
2. Kailing, Karin, Hans-Peter Kriegel, Peer Kröger: Density-Connected Subspace Clustering for High-Dimensional Data
3. Sembiring, Rahmat Widia, Jasni Mohamad Zain, Abdullah Embong: Clustering High Dimensional Data Using Subspace And Projected Clustering Algorithm
4. Sembiring, Rahmat Widia, Jasni Mohamad Zain: Cluster Evaluation Of Density Based Subspace Clustering
5. Böhm, Christian, Karin Kailing, Hans-Peter Kriegel, Peer Kröger: Density Connected Clustering with Local Subspace Preferences
6. Huang, Wang Fei, Lifei Chen, Qingsan Jiang: A Novel Subspace Clustering Algorithm with Dimensional Density
7. Assent, Ira, Ralph Krieger, Emmanuel Müller, Thomas Seidl: EDSC: Efficient Density-Based Subspace Clustering

Figure 3a. Sample data plot in 2 dimensions (a and b): two clusters properly separated, but 1 cluster remains mixed

Figure 3b. Sample data plot in 2 dimensions (b and c): two clusters properly separated, but 1 cluster still mixed

Figure 3c. Sample data visible in 4 clusters that still overlap and are hard to separate using a conventional clustering algorithm
