SPATIAL CLUSTERING ALGORITHMS - AN OVERVIEW
K. Poulose Jacob
Cochin University of Science and Technology
Bindiya et al. / Spatial Clustering Algorithms - An Overview
dissimilarity of different clusters. Partitional algorithms are generally iterative in nature and converge to some local optimum. Given a set of data points xi ∈ ℜd, i = 1, …, N, partitional clustering algorithms aim to organize them into K clusters {C1, …, CK} while maximizing or minimizing a pre-specified criterion function J.

K-medoid
K-medoids algorithms are partitional algorithms which attempt to minimize the squared error, the distance between points labeled to be in a cluster and a point designated as the center of that cluster. A medoid can be defined as the object of a cluster whose average dissimilarity to all the objects in the cluster is minimal, i.e., it is the most centrally located point in the given data set. In contrast to the k-means algorithm, k-medoids chooses data points as centers.

PAM
The Partitioning Around Medoids (PAM) algorithm represents a cluster by a medoid (Ng, Raymond T. and Jiawei Han, 1994). PAM is based on the search for k representative objects among the objects of the data set. These objects, which should represent various aspects of the structure of the data, are often called centrotypes. In the PAM algorithm the representative objects are the so-called medoids of the clusters (Kaufman and Rousseeuw, 1987). After finding a set of k representative objects, the k clusters are constructed by assigning each object of the data set to the nearest representative object.

Initially, a random set of k items is taken to be the set of medoids. Then, at each step, all items other than the chosen medoids in the input sample set are examined one by one to see if they should become new medoids. The algorithm chooses the new set of medoids which improves the overall quality of the clustering and replaces the old set of medoids with it.

Let Ki be the cluster represented by the medoid ti. To swap ti with a non-medoid th, the cost change Cjih of an item tj associated with the exchange of ti and th has to be computed (M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, 1996). The total impact on quality of a medoid change, TCih, is given by TCih = Σj=1..n Cjih.

The k-medoid methods are very robust to the existence of outliers. Also, clusters found by k-medoid methods do not depend on the order in which the objects are examined, and they are invariant with respect to translations and orthogonal transformations of the data points. However, PAM does not scale well to large datasets because of its computational complexity: in each iteration, the cost TCih has to be computed for k(n − k) pairs of objects, so the total complexity per iteration is O(k(n − k)²), making PAM impractical for large databases.

CLARA
CLARA (Clustering Large Applications) improves on the time complexity of PAM (Ng, Raymond T. and Jiawei Han, 1994). CLARA relies on sampling: PAM is applied to samples drawn from the large data set. For better approximations, CLARA draws multiple samples and gives the best clustering as the result. For accuracy, the quality of a clustering is measured based on the average dissimilarity of all objects in the entire data set, and not only of those objects in the samples.

The method used in CLARA, first described by Kaufman and Rousseeuw (1986), is based on the selection of five (or more) random samples of objects. The size of the samples depends on the number of clusters: for a clustering into k clusters, the sample size is given by 40 + 2k [2]. For CLARA, by applying PAM just to the samples, each iteration is of O(k(40 + k)² + k(n − k)). This explains why CLARA is more efficient than PAM for large values of n.

CLARANS
CLARANS (Clustering Large Applications based on RANdomized Search) improves on CLARA by using multiple different samples (Ng, Raymond T. and Jiawei Han, 1994). While CLARA draws a sample of nodes at the beginning of a search, CLARANS draws a sample of neighbors in each step of a search. This has the benefit of not confining a search to a localized area. In addition to the normal input to PAM, CLARANS uses two additional parameters: numlocal and maxneighbor. Numlocal indicates the number of samples to be taken; it also indicates the number of clusterings to be made, since a new clustering has to be done on every sample. Maxneighbor is the number of neighbors of a node to which any specific node can be compared. As maxneighbor increases, CLARANS resembles PAM, because all nodes are then examined. J. Han et al. show that a good choice for the parameters is numlocal = 2 and maxneighbor = max(0.0125 × k(n − k), 250). The disadvantage of CLARANS is that it assumes all data are in main memory.

SDCLARANS
Spatial dominant CLARANS (SDCLARANS) assumes the data set to contain spatial and non-spatial components (Ng, Raymond T. and Jiawei Han, 1994). The general approach is to cluster the spatial components using CLARANS and then examine the non-spatial components within each cluster to derive a description of that cluster. For this attribute mining, a tool named DBLEARN is used (Jiawei Han, Yandong Cai, Nick Cercone, 1992). From a learning request, DBLEARN first extracts a set of relevant tuples via SQL queries; then, based on the generalization hierarchies of the attributes, it iteratively generalizes the tuples. SDCLARANS is thus a combination of CLARANS and DBLEARN.

NSDCLARANS
Opposite to SDCLARANS, NSDCLARANS considers the non-spatial attributes in the first phase (Jiawei Han, Yandong Cai, Nick Cercone, 1992). DBLEARN is applied to the non-spatial attributes until the final number of generalized tuples falls below a certain threshold. For each generalized tuple obtained above, the spatial components of the tuples represented by the current generalized tuple are collected, and CLARANS is applied.

K-Means
K-means is one of the simplest unsupervised learning algorithms used for clustering. K-means partitions n observations into k clusters in which each observation belongs to the cluster with the nearest mean. The algorithm aims at minimizing an objective function, in this case a squared error function: J = Σj=1..k Σi=1..n ‖xi(j) − cj‖², where ‖xi(j) − cj‖² is a chosen distance measure between a data point xi(j) and the cluster centre cj; J is thus an indicator of the distance of the n data points from their respective cluster centres.

DENCLUE
DENCLUE (DENsity-based CLUstEring) is a generalization of partitioning, locality-based and hierarchical or grid-based clustering approaches (A. Hinneburg and D. A. Keim, 1998). The influence of each data point can be modeled formally using a mathematical
function called an influence function, which is applied to each data point. The algorithm models the overall point density analytically as the sum of the influence functions of the points. An example influence function is the Gaussian fGauss(x, y) = exp(−d(x, y)² / 2σ²). The density function which results from a Gaussian influence function is fDGauss(x) = Σi=1..N exp(−d(x, xi)² / 2σ²). Clusters can then be determined mathematically by identifying density attractors, the local maxima of the overall density function. These can be either center-defined clusters, similar to k-means clusters, or multi-center-defined clusters, that is, a series of center-defined clusters linked by a particular path, which identify clusters of arbitrary shape. Clusters of arbitrary shape can also be defined mathematically. The mathematical model requires two parameters, σ and ξ: σ describes a threshold for the influence of a data point in the data space, and ξ sets a threshold for determining whether a density attractor is significant.

The three major advantages claimed by the authors for this method of higher-dimensional clustering are that the algorithm provides a firm mathematical basis for finding arbitrarily shaped clusters in high-dimensional datasets, that results show good clustering properties in data sets with large amounts of noise, and that it is significantly faster than existing algorithms.

Hierarchical
A sequence of partitions is said to be a hierarchical clustering if any two samples c1 and c2 which belong to the same cluster at some level k remain clustered together at all levels higher than k. The hierarchy is represented as a tree, called a dendrogram, with individual elements at one end and a single cluster containing every element at the other. Hierarchical clustering algorithms are either top-down or bottom-up. Bottom-up algorithms, called hierarchical agglomerative clustering, treat each object as a singleton cluster at the outset and then successively merge (or agglomerate) pairs of clusters until all clusters have been merged into a single cluster. Top-down or divisive clustering proceeds by splitting clusters recursively until individual objects are reached.

Agglomerative algorithms
CURE
The shrinking of a cluster's representative points toward its centroid reduces the adverse effects due to outliers, since outliers are typically further away from the mean and are thus shifted a larger distance by the shrinking. The kinds of clusters identified by CURE can be tuned by varying α between 0 and 1: CURE reduces to the centroid-based algorithm if α = 1, while for α = 0 it becomes similar to the all-points approach. CURE's hierarchical clustering algorithm has a space complexity linear in the input size n and a worst-case time complexity of O(n² log n); for lower dimensions the complexity is further reduced to O(n²). The overview of the CURE algorithm is represented diagrammatically in Figure 1 [12].

Figure 1: Overview of CURE

ROCK
ROCK (RObust Clustering using linKs) implements a new concept of links to measure the similarity/proximity between a pair of data points (Sudipto Guha, Rajeev Rastogi, Kyuseok Shim, 1999). A pair of data points are considered neighbors if their similarity exceeds a certain threshold, and the number of links between a pair of points is the number of common neighbors of the points. Points belonging to a single cluster will have a large number of common neighbors. Let sim(pi, pj) be a normalized similarity function that captures the closeness between the pair of points pi and pj; sim assumes values between 0 and 1. Given a threshold θ between 0 and 1, a pair of points (pi, pj) is defined to be neighbors if sim(pi, pj) > θ, and link(pi, pj) is the number of common neighbors of pi and pj. The criterion function is to maximize the sum of link(pq, pr) for data pairs pq, pr belonging to a single cluster and, at the same time, minimize the sum of link(pq, ps) for pq and ps in different clusters, i.e., maximize El = Σi=1..k ni · Σpq,pr ∈ Ci link(pq, pr) / ni^(1 + 2f(θ)), where Ci denotes cluster i of size ni.
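The neighbor and link computations underlying ROCK's criterion function can be sketched as follows. This is an illustrative sketch rather than the paper's implementation: the function names, the use of Jaccard similarity, and the toy data are assumptions made for the example.

```python
# Sketch of ROCK-style link computation: two points are neighbors when their
# similarity exceeds theta; link(i, j) counts their common neighbors.

def neighbor_sets(points, sim, theta):
    """Map each point index to the set of indices whose similarity exceeds theta."""
    n = len(points)
    return {
        i: {j for j in range(n) if j != i and sim(points[i], points[j]) > theta}
        for i in range(n)
    }

def link(nbrs, i, j):
    """Number of common neighbors of points i and j."""
    return len(nbrs[i] & nbrs[j])

# Toy categorical data, with Jaccard similarity standing in for sim(pi, pj).
def jaccard(a, b):
    return len(a & b) / len(a | b)

baskets = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "d"}, {"x", "y"}]
nbrs = neighbor_sets(baskets, jaccard, theta=0.4)
print(link(nbrs, 0, 1))   # points 0 and 1 have one common neighbor (point 2)
```

Raising θ toward 1 makes fewer pairs qualify as neighbors, so link counts drop; this is the knob by which ROCK trades off tight versus loose clusters.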
CHAMELEON's sparse graph representation of the data items is based on the k-nearest-neighbor graph approach. Each vertex of the k-nearest-neighbor graph represents a data item, and there exists an edge between two vertices if the data item corresponding to either of the nodes is among the k most similar data points of the data point corresponding to the other node. CHAMELEON determines the similarity between each pair of clusters Ci and Cj by looking both at their relative inter-connectivity RI(Ci, Cj) and at their relative closeness RC(Ci, Cj). CHAMELEON's hierarchical clustering algorithm selects to merge the pair of clusters for which both RI(Ci, Cj) and RC(Ci, Cj) are high; i.e., it merges clusters that are well inter-connected as well as close together, relative to the internal inter-connectivity and closeness of the clusters themselves.

The relative inter-connectivity between a pair of clusters Ci and Cj is defined as the absolute inter-connectivity between Ci and Cj normalized with respect to the internal inter-connectivity of the two clusters. The absolute inter-connectivity between Ci and Cj is defined as the sum of the weights of the edges that connect vertices in Ci to vertices in Cj; this is essentially the edge-cut EC{Ci,Cj} of the cluster containing both Ci and Cj, such that the cluster is broken into Ci and Cj. The relative inter-connectivity is then given by

RI(Ci, Cj) = |EC{Ci,Cj}| / ((|ECCi| + |ECCj|) / 2),

which normalizes the absolute inter-connectivity by the average internal inter-connectivity of the two clusters. The relative closeness between a pair of clusters Ci and Cj is computed as

RC(Ci, Cj) = s̄EC{Ci,Cj} / ((|Ci| / (|Ci| + |Cj|)) s̄ECCi + (|Cj| / (|Ci| + |Cj|)) s̄ECCj),

where s̄ECCi and s̄ECCj are the average weights of the edges that belong to the min-cut bisector of clusters Ci and Cj respectively, and s̄EC{Ci,Cj} is the average weight of the edges that connect vertices in Ci to vertices in Cj. The overall complexity of CHAMELEON's two-phase clustering algorithm is O(nm + n log n + m² log m).

Divisive Algorithms
STING
The Statistical Information Grid-based method (STING) exploits the clustering properties of index structures (Wei Wang, Jiong Yang, Richard R. Muntz, 1997). The spatial area is divided into rectangular cells which form a hierarchical structure: each cell at a high level is partitioned into a number of cells at the next lower level. Statistical information about each cell is calculated and stored beforehand and is used to answer queries. For each cell, two types of parameters are considered: attribute-dependent and attribute-independent. The attribute-independent parameter is the number of objects (points) in the cell, say n. The attribute-dependent parameters are:
• m: the mean of all values in the cell
• s: the standard deviation of all values of the attribute in the cell
• min: the minimum value of the attribute in the cell
• max: the maximum value of the attribute in the cell
• distribution: the type of distribution that the attribute values in the cell follow.
Clustering operations are performed using a top-down method, starting with the root. The relevant cells are determined using the statistical information, and only the paths from those cells down the tree are followed. Once the leaf cells are reached, the clusters are formed using a breadth-first search, by merging cells based on their proximity and on whether the average density of the area is greater than some specified threshold. The computational complexity is O(K), where K is the number of grid cells at the lowest level; usually K << N, where N is the number of objects.

STING+
STING+ is an approach to active spatial data mining which takes advantage of the rich research results of active database systems and of the efficient algorithms in STING (Wei Wang, Jiong Yang, Richard R. Muntz, 1997) for passive spatial data mining (Wei Wang, Jiong Yang, Richard Muntz, 1999). A region in STING+ is defined as a set of adjacent leaf-level cells, and object density and attribute conditions in STING+ are defined in terms of leaf-level cells. The density of a leaf-level cell is defined as the number of objects in the cell divided by the area of the cell. A region is said to have a certain density c if and only if the density of every leaf-level cell in the region is at least c. Conditions on attribute values are defined in a similar manner. Two kinds of conditions can be specified by the user: an absolute condition, which is satisfied when a certain state is reached, and a relative condition, which is satisfied when a certain degree of change has been detected. Therefore, four categories of triggers are supported by STING+:
1. Region trigger: absolute condition on certain regions
2. Attribute trigger: absolute condition on certain attributes
3. Region trigger: relative condition on certain regions
4. Attribute trigger: relative condition on certain attributes.
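The relative inter-connectivity defined above for CHAMELEON can be illustrated with a small sketch. This is not CHAMELEON itself: the toy graph, the function names, and the precomputed internal connectivities |ECCi| and |ECCj| are assumptions for the example (computing a cluster's min-cut bisector is a separate problem and is not shown here).

```python
# Sketch of CHAMELEON-style relative inter-connectivity RI(Ci, Cj).
# edges: {(u, v): weight}; a cluster is a set of vertex ids.

def edge_cut(edges, ci, cj):
    """Sum of weights of edges connecting a vertex in ci to a vertex in cj."""
    return sum(w for (u, v), w in edges.items()
               if (u in ci and v in cj) or (u in cj and v in ci))

def relative_interconnectivity(edges, ci, cj, ec_ci, ec_cj):
    """RI(Ci, Cj) = |EC{Ci,Cj}| / ((|EC_Ci| + |EC_Cj|) / 2).

    ec_ci and ec_cj are the internal inter-connectivities (min-cut bisector
    weights) of the two clusters, assumed precomputed for this sketch.
    """
    return edge_cut(edges, ci, cj) / ((ec_ci + ec_cj) / 2)

# Toy graph: two weight-2 triangles joined by a single weight-1 edge.
edges = {(0, 1): 2, (1, 2): 2, (0, 2): 2,   # cluster A
         (3, 4): 2, (4, 5): 2, (3, 5): 2,   # cluster B
         (2, 3): 1}                          # crossing edge
ri = relative_interconnectivity(edges, {0, 1, 2}, {3, 4, 5}, ec_ci=2, ec_cj=2)
print(ri)  # 1 / ((2 + 2) / 2) = 0.5
```

A low RI like this signals that the two clusters are only weakly connected relative to how well-knit each is internally, so CHAMELEON would deprioritize merging them.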
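The attribute-independent and attribute-dependent statistics that STING stores for each lowest-level cell, as described above, can be sketched as follows. The grid of unit square cells, the function name, and the toy points are assumptions for the example.

```python
# Sketch of STING-style per-cell statistics at the lowest grid level.
# Each point is (x, y, value); cells are unit squares for simplicity.
import math

def cell_stats(points, cell_size=1.0):
    """Group points into grid cells and compute n, m, s, min, max per cell."""
    cells = {}
    for x, y, v in points:
        key = (int(x // cell_size), int(y // cell_size))
        cells.setdefault(key, []).append(v)
    stats = {}
    for key, vals in cells.items():
        n = len(vals)
        m = sum(vals) / n                                  # mean
        s = math.sqrt(sum((v - m) ** 2 for v in vals) / n) # std deviation
        stats[key] = {"n": n, "m": m, "s": s,
                      "min": min(vals), "max": max(vals)}
    return stats

pts = [(0.2, 0.3, 10.0), (0.7, 0.9, 14.0), (1.5, 0.1, 3.0)]
stats = cell_stats(pts)
print(stats[(0, 0)]["n"], stats[(0, 0)]["m"])  # 2 12.0
```

In STING proper these statistics are rolled up the cell hierarchy, so a higher-level cell can be summarized from its children without revisiting the raw points, which is what makes query answering O(K) in the number of leaf cells.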
OPTICS
processed, and the information which would be used by an extended DBSCAN algorithm to assign cluster memberships. This information consists of only two values for each object: the core-distance and a reachability-distance. The core-distance of an object p is simply the smallest distance ε′ between p and an object in its ε-neighborhood such that p would be a core object with respect to ε′ if this neighbor were contained in Nε(p); otherwise, the core-distance is UNDEFINED. The reachability-distance of an object p with respect to another object o is the smallest distance such that p is directly density-reachable from o if o is a core object. Depending on the size of the database, the cluster ordering can be represented graphically for small data sets or using an appropriate visualization technique for large data sets.

DBCLASD
Distribution Based Clustering of Large Spatial Databases (DBCLASD) is another locality-based clustering algorithm, but unlike DBSCAN it assumes that the points inside each cluster are uniformly distributed (Xiaowei Xu, Martin Ester, Hans-Peter Kriegel, Jörg Sander, 1998). Three notions are defined in the algorithm: NNS(q), NNDist(q), and NNDistSet(S). Let q be a query point and S a set of points. The nearest neighbor of q in S, denoted NNS(q), is the point p in S − {q} which has the minimum distance to q. The distance from q to its nearest neighbor in S is called the nearest neighbor distance of q, NNDist(q) for short. Let S be a set of points with elements ei. The nearest neighbor distance set of S, denoted NNDistSet(S), or distance set for short, is the multi-set of all nearest neighbor distance values. The probability distribution of the nearest neighbor distances of a cluster is analysed based on the assumption that the points inside a cluster are uniformly distributed, i.e., that the points of a cluster are distributed as a homogeneous Poisson point process restricted to a certain part of the data space. A grid-based representation is used to approximate the clusters as part of the probability calculation. DBCLASD is an incremental algorithm: points are processed based on the points previously seen, without regard for the points yet to come, which makes the clusters produced by DBCLASD dependent on the input order. The major advantage of DBCLASD is that it requires no outside input, which makes it attractive for larger data sets and for sets with larger numbers of attributes.

Others
SpaRClus
SpaRClus (Spatial Relationship Pattern-Based Hierarchical Clustering), for clustering image data, is based on the SpIBag (Spatial Item Bag Mining) algorithm, which discovers frequent spatial patterns in images (S. Kim, X. Jin, and J. Han, 2008). SpIBag is invariant under semi-affine transformations; a semi-affine transformation is a way to express or detect shape-preserving images. SpaRClus uses the SpIBag algorithm internally to mine frequent patterns, and generates a hierarchical structure of image clusters based on their representative frequent patterns. When the SpIBag algorithm generates a frequent n-pattern p, SpaRClus computes a scoring function of its support image set Ip and decides whether p will be used to join with other n-patterns, which enables more pruning power than using SpIBag alone.

C2P
C2P (Clustering based on Closest Pairs) exploits spatial access methods for the determination of closest pairs (Nanopoulos, A., Theodoridis, Y., and Manolopoulos, 2001). C2P consists of two main phases. The first phase, which is iterative, takes n points as input and efficiently produces m sub-clusters which capture the shape of the final clusters, representing clusters by their center points. The second phase performs the final clustering by using the sub-clusters of the first phase together with a different cluster representation scheme, and merges two clusters at each step in order to better control the clustering procedure. The second phase is a specialization of the first, i.e., the latter can be modified by a) finding different points to represent the cluster instead of the center point, and b) finding at each iteration only the closest pair of clusters to be merged, instead of finding for each cluster the one closest to it. The time complexity of C2P for large datasets is O(n log n), so it scales well to large inputs.

DBRS+
Density-Based Spatial Clustering in the Presence of Obstacles and Facilitators (DBRS+) aims to cluster spatial data in the presence of both obstacles and facilitators (Wang, X., Rostoker, C., and Hamilton, H. J., 2004). The authors claim that DBRS+ processes constraints during clustering, without preprocessing. It can also find clusters with arbitrary shapes and varying densities. DBRS is a density-based clustering method with three parameters: Eps, MinPts, and MinPur. DBRS repeatedly picks an unclassified point at random and examines its neighborhood, i.e., all points within a radius Eps of the chosen point. The purity of the neighborhood is defined as the percentage of neighbor points with the same non-spatial property as the central point. If the neighborhood is sparsely populated (≤ MinPts) or the purity of the points in the neighborhood is too low (≤ MinPur), and the neighborhood is disjoint from all known clusters, the point is classified as noise. Otherwise, if any point in the neighborhood is part of a known cluster, the neighborhood is joined to that cluster, i.e., all points in the neighborhood are classified as being part of the known cluster. If neither of these two possibilities applies, a new cluster is begun with this neighborhood. The time complexity of DBRS is O(n log n) if an R-tree or SR-tree is used to store and retrieve all points in a neighborhood.

CONCLUSION
The main objective of spatial data mining is to find patterns in data with respect to their locational significance. The scope of spatial mining increases as the volume of geographically referenced data generated from various sources increases. Applications in areas such as health services, marketing and environmental agencies make use of spatial mining to find the information contained within their data. The major challenges in spatial data mining are that spatial data repositories tend to be very large, and that there is great range and diversity in representing spatial and non-spatial data attributes on the same canvas. Though the clustering algorithms discussed above try to resolve issues like scalability and complexity, it can be observed that a perfect clustering algorithm which comprehends all the issues with a dataset is an idealistic notion. Current research progresses on more dynamic, adaptive and innovative methods that decipher meaningful patterns and effectively satisfy the requirements of dealing with huge volumes of data of higher
dimensionality, while being insensitive to large amounts of noise, unaffected by the order of input, and requiring no prior knowledge of the domain.

REFERENCES
1. A. Hinneburg and D. A. Keim. (1998). An Efficient Approach to Clustering in Large Multimedia Databases with Noise. International Conference on Knowledge Discovery and Data Mining, (pp. 58-65).
2. Ankerst, Mihael, Markus M. Breunig, Hans-Peter Kriegel, and Jörg Sander. (1999). OPTICS: Ordering Points To Identify the Clustering Structure. SIGMOD Rec. 28, (pp. 49-60).
3. E. Schikuta. (1996). Grid-Clustering: An Efficient Hierarchical Clustering Method for Very Large Data Sets. Proceedings of the 13th International Conference on Pattern Recognition, (pp. 101-105).
4. Erich Schikuta, Martin Erhart. (1997). The BANG-Clustering System: Grid-Based Data Analysis. Proceedings of the Second International Symposium on Advances in Intelligent Data Analysis, Reasoning about Data, (pp. 513-524).
5. Ester, M., Frommelt, A., Kriegel, H.-P., and Sander, J. (1998). Algorithms for characterization and trend detection in spatial databases. Int. Conf. on Knowledge Discovery and Data Mining, (pp. 44-50). New York City, NY.
6. George Karypis, Eui-Hong (Sam) Han, Vipin Kumar. (1999). Chameleon: Hierarchical Clustering Using Dynamic Modeling. Computer, 32, 68-75.
7. Gholamhosein Sheikholeslami, Surojit Chatterjee, Aidong Zhang. (2000). WaveCluster: a wavelet-based clustering approach for spatial data in very large databases. The VLDB Journal, 8(3-4), 289-304.
8. Güting, R. H. (1994). An Introduction to Spatial Database Systems. VLDB Journal: Special Issue on Spatial Database Systems, 3(4).
9. Jiawei Han, Yandong Cai, Nick Cercone. (1992). Knowledge Discovery in Databases: An Attribute-Oriented Approach. Proceedings of the 18th International Conference on Very Large Data Bases, (pp. 547-559).
10. Jiyeon Choo, Rachsuda Jiamthapthaksin, Chun-sheng Chen, Oner Ulvi Celepcikay, Christian Giusti and Christoph F. Eick. (2007). MOSAIC: A proximity graph approach for agglomerative clustering. International Conference on Data Warehousing and Knowledge Discovery. Springer Berlin/Heidelberg.
11. Jörg Sander, Martin Ester, Hans-Peter Kriegel, Xiaowei Xu. (1998). Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications. Data Mining and Knowledge Discovery, 2(2), 169-194.
12. M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), (pp. 226-231).
13. Nanopoulos, A., Theodoridis, Y., and Manolopoulos, Y. (2001). C2P: Clustering based on Closest Pairs. Proceedings of the 27th International Conference on Very Large Data Bases, (pp. 331-340).
14. Ng, Raymond T. and Jiawei Han. (1994). Efficient and Effective Clustering Methods for Spatial Data Mining. Proceedings of the 20th International Conference on Very Large Data Bases, (pp. 144-155). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
15. Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan. (1998). Automatic subspace clustering of high dimensional data for data mining applications. Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, (pp. 94-105). Seattle, Washington, United States.
16. S. Kim, X. Jin, and J. Han. (2008). SpaRClus: Spatial Relationship Pattern-Based Hierarchical Clustering. SIAM International Conference on Data Mining (SDM), (pp. 49-60).
17. Sudipto Guha, Rajeev Rastogi, Kyuseok Shim. (1998). CURE: an efficient clustering algorithm for large databases. Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, (pp. 73-84). Seattle.
18. Sudipto Guha, Rajeev Rastogi, Kyuseok Shim. (1999). ROCK: A Robust Clustering Algorithm for Categorical Attributes. Proceedings of the 15th International Conference on Data Engineering, (pp. 512-521). IEEE Computer Society.
19. Tian Zhang, Raghu Ramakrishnan, Miron Livny. (1996). BIRCH: an efficient data clustering method for very large databases. Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, (pp. 103-114). Montreal, Canada.
20. Usama Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth. (1996). Knowledge Discovery and Data Mining: Towards a Unifying Framework.
21. Wang, X., Rostoker, C., and Hamilton, H. J. (2004). Density-based spatial clustering in the presence of obstacles and facilitators. European Conference on Principles and Practice of Knowledge Discovery in Databases, (pp. 446-458). Pisa, Italy.
22. Wei Wang, Jiong Yang, Richard R. Muntz. (1997). STING: A Statistical Information Grid Approach to Spatial Data Mining. Proceedings of the 23rd International Conference on Very Large Data Bases, (pp. 186-195).
23. Wei Wang, Jiong Yang, Richard Muntz. (1999). STING+: An Approach to Active Spatial Data Mining. Proceedings of the 15th International Conference on Data Engineering, (pp. 116-125).
24. Xiaowei Xu, Martin Ester, Hans-Peter Kriegel, Jörg Sander. (1998). Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases. Proceedings of the Fourteenth International Conference on Data Engineering, (pp. 324-331).