
Clustering

Concepts & Methods


Clustering
• Cluster: A collection of data objects
• Clustering: process of grouping a set of data objects into multiple
groups or clusters so that
– objects within a cluster have high similarity
– but are very dissimilar to objects in other clusters
• Unsupervised learning
– no predefined classes (i.e., learning by observations)
• Purpose: summarize data – makes the data easier to interpret
– at the cost of losing detailed information
Clustering
• Applications
– As a stand-alone tool to get insight into data distribution
– Business intelligence – market segmentation, CRM
– Image recognition – handwritten character recognition
– Web search
– As a preprocessing step for other algorithms
– Outlier detection - credit card fraud

• Challenges
– Scalability
– Ability to deal with different types of attributes
– Discovery of clusters with arbitrary shape
– Requirements for domain knowledge to determine input parameters
– Ability to deal with noisy data
Considerations for Cluster Analysis
• Partitioning criteria
– Single level vs. hierarchical partitioning (often, multi-level hierarchical
partitioning is desirable)
• Separation of clusters
– Exclusive (e.g., one customer belongs to only one region) vs. non-
exclusive (e.g., one document may belong to more than one class)
• Similarity measure
– Distance-based (e.g., Euclidean, road network, vector) vs. connectivity-
based (e.g., density or contiguity)
• Clustering space
– Full space (often when low dimensional) vs. subspaces (often in high-
dimensional clustering)
Major Clustering Approaches
• Partitioning approach
– Construct various partitions and then evaluate them by some
criterion, e.g., minimizing the sum of square errors
– Typical methods: k-means, k-medoids, CLARANS
• Hierarchical approach
– Create a hierarchical decomposition of the set of data (or objects)
using some criterion
– Typical methods: DIANA, AGNES, BIRCH, CHAMELEON
• Density-based approach
– Based on connectivity and density functions; can find arbitrarily
shaped clusters
– Typical methods: DBSCAN, OPTICS, DenClue
• Grid-based approach
– Based on a multiple-level granularity structure; fast processing time
– Typical methods: STING, WaveCluster, CLIQUE
Major Clustering Approaches
• Density-based approach
– Clusters are dense regions in the data space, separated by regions of lower density of
points
– Goal is to identify dense regions; measured by number of objects close to a given point
– Any point x in the data set whose ϵ-neighborhood contains at least Min_Pts points is
marked as a core point
– x is a border point if its ϵ-neighborhood contains fewer than Min_Pts points, but x
belongs to the ϵ-neighborhood of some core point z
– If a point is neither a core nor a border point, it is called a noise point or an outlier
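A minimal sketch of this core/border/noise labelling, assuming Euclidean distance; the function and parameter names (classify_points, eps, min_pts) are illustrative, not part of the slides:

```python
import numpy as np

def classify_points(X, eps=0.5, min_pts=5):
    """Label each row of X as 'core', 'border', or 'noise' (DBSCAN-style)."""
    # Pairwise Euclidean distances between all points
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbours = dists <= eps                 # eps-neighbourhood membership (includes the point itself)
    core = neighbours.sum(axis=1) >= min_pts  # core points have at least min_pts neighbours
    labels = []
    for i in range(len(X)):
        if core[i]:
            labels.append("core")
        elif neighbours[i][core].any():       # inside the eps-neighbourhood of some core point
            labels.append("border")
        else:
            labels.append("noise")
    return labels
```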
Major Clustering Approaches
• Grid-based approach
1. Partitioning the data space into a finite number of cells
2. Calculating the cell density for each cell
3. Sorting of the cells according to their densities
4. Identifying cluster centres (cells with the highest density)
5. Traversal of neighbor cells
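A rough sketch of steps 1–4 (fixed-size cells, density as a point count, the densest cells taken as centres); the cell size and names are illustrative, and the neighbour traversal of step 5 is omitted:

```python
import numpy as np
from collections import Counter

def grid_cells(X, cell_size=1.0, n_centres=3):
    # 1. Partition the data space into cells by flooring the coordinates
    cells = [tuple(np.floor(p / cell_size).astype(int)) for p in X]
    # 2. Cell density = number of points falling into the cell
    density = Counter(cells)
    # 3. Sort cells by density
    ranked = density.most_common()
    # 4. Take the densest cells as cluster centres
    centres = [cell for cell, _ in ranked[:n_centres]]
    return density, centres
```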
Partitioning Algorithms
• Given a data set, D, of n objects, and k, the number of clusters
to form, a partitioning algorithm organizes the objects into k
partitions (k<=n), where each partition represents a cluster
• Sum of squared distances is minimized (where ci is the centroid
or medoid of cluster Ci):

        E = Σ_{i=1..k} Σ_{p ∈ Ci} dist(p, ci)²

• Find a partition of k clusters that optimizes the chosen
partitioning criterion
– Global optimal: exhaustively enumerate all partitions
– Heuristic methods: k-means and k-medoids algorithms
– k-means: Each cluster is represented by the center of the cluster
– k-medoids or PAM (Partition around medoids): Each cluster is
represented by one of the objects in the cluster
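For reference, the squared-error criterion above can be evaluated for a given partition in a couple of lines (a sketch; the function name is illustrative):

```python
import numpy as np

def sse(X, labels, centres):
    """E = sum over clusters Ci of squared distances from each point in Ci to its centre ci."""
    return sum(np.sum((X[labels == i] - c) ** 2) for i, c in enumerate(centres))
```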
k-Means Clustering - Illustration
• k=3
• Step 1: Initialize cluster centers
– randomly pick three points C1, C2 and C3
• Step 2: Assign observations to the closest
cluster center
– For each point compute distance to each cluster
center
– Assign each point to the cluster whose center is at the
minimum distance
• Step 3: Revise cluster centers as mean of
assigned observations
• Step 4: Repeat Step 2 and Step 3 until
convergence
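The four steps translate almost directly into code; a minimal NumPy sketch (empty clusters are not handled, and in practice a library implementation such as scikit-learn's KMeans would be used):

```python
import numpy as np

def kmeans(X, k=3, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]    # Step 1: random initial centres
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None] - centres[None], axis=2)
        labels = dists.argmin(axis=1)                         # Step 2: assign to closest centre
        new_centres = np.array([X[labels == j].mean(axis=0) for j in range(k)])  # Step 3: means
        if np.allclose(new_centres, centres):                 # Step 4: stop at convergence
            break
        centres = new_centres
    return labels, centres
```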
k-Means: Comments
• Choosing the right k
– Elbow method
• Centroid Initialisation
– k-means++

• Sensitive to outliers
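One way to apply the elbow method is to plot the within-cluster SSE (inertia) over a range of k; a sketch using scikit-learn, whose KMeans uses k-means++ initialisation by default:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

def elbow_plot(X, k_max=10):
    inertias = []
    for k in range(1, k_max + 1):
        km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(X)
        inertias.append(km.inertia_)              # within-cluster sum of squared distances
    plt.plot(range(1, k_max + 1), inertias, marker="o")
    plt.xlabel("k")
    plt.ylabel("SSE (inertia)")
    plt.show()                                    # look for the 'elbow' in the curve
```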
k-Medoids Clustering Method
• Instead of taking the mean value of the objects in a cluster as
a reference point, an actual object (a medoid) is used
• Absolute-error criterion is used
[Figure: the same data set clustered by k-means (left) and k-medoids (right)]
k-Medoids Clustering Method
• Find representative objects (medoids) in clusters

• PAM (Partitioning Around Medoids)


– Starts from an initial set of medoids and iteratively replaces
one of the medoids by one of the non-medoids if it
improves the total distance of the resulting clustering

– PAM works effectively for small data sets, but does not
scale well for large data sets (due to the computational
complexity)
– O(k(n-k)²) for each iteration, n = # data, k = # clusters
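A naive sketch of the PAM swap loop (variable names are illustrative; each pass costs on the order of k(n-k)² distance evaluations, which is why PAM does not scale to large data sets):

```python
import numpy as np

def pam(X, k=2, seed=0):
    rng = np.random.default_rng(seed)
    D = np.linalg.norm(X[:, None] - X[None], axis=2)        # pairwise distances
    medoids = list(rng.choice(len(X), size=k, replace=False))

    def total_cost(meds):
        # absolute-error criterion: distance of every object to its nearest medoid
        return D[:, meds].min(axis=1).sum()

    improved = True
    while improved:                                          # iterate until no swap helps
        improved = False
        for m in list(medoids):
            for o in range(len(X)):
                if o in medoids:
                    continue
                candidate = [o if x == m else x for x in medoids]
                if total_cost(candidate) < total_cost(medoids):  # keep swap only if cost drops
                    medoids, improved = candidate, True
    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels
```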
k-Medoids Algorithm: Illustration
• K = 2; Total Cost = 20
• Arbitrarily choose k objects as initial medoids
• Assign each remaining object to the nearest medoid
• Randomly select a non-medoid object, Orandom
• Do loop until no change:
– Compute the total cost of swapping a medoid O with Orandom (Total Cost = 26)
– Swap O and Orandom if the quality is improved
Hierarchical Clustering
• Group data objects into a hierarchy
• Produces a set of nested clusters organized as a hierarchical tree
• Dendrogram - A tree structure representing the sequence of merging
decisions
• Useful for data summarization and visualization
• Does not require the number of clusters k as an input
• Needs a termination condition
• Methods – agglomerative, divisive, BIRCH, Chameleon

[Figure: dendrogram – clustering error vs. data objects]
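A small sketch of building such a dendrogram with SciPy (the one-dimensional sample values reuse the customer ratings from the Ward example later in the deck):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.array([[2.0], [5.0], [9.0], [10.0], [15.0]])   # five customer ratings A..E
Z = linkage(X, method="ward")                          # sequence of merging decisions
dendrogram(Z, labels=["A", "B", "C", "D", "E"])
plt.ylabel("merge distance")
plt.show()
```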
Hierarchical Clustering
• Two main types of hierarchical clustering
– Agglomerative: bottom-up (merging) fashion
1. Start with the points as individual clusters
2. At each step, merge the closest pair of clusters until only one
cluster (or k clusters) left

– Divisive: top-down (splitting) fashion


1. Start with one, all-inclusive cluster
2. At each step, split a cluster until each cluster contains a single
point (or there are k clusters)
– Requires at most n iterations
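A minimal sketch of the agglomerative loop above, merging the closest pair under a single-link distance until k clusters remain (illustrative only; in practice SciPy's linkage is the usual tool):

```python
import numpy as np

def agglomerative(X, k=2):
    clusters = [[i] for i in range(len(X))]      # 1. every point starts as its own cluster
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    while len(clusters) > k:                     # 2. merge the closest pair until k clusters remain
        pairs = [(a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        a, b = min(pairs, key=lambda ab: D[np.ix_(clusters[ab[0]], clusters[ab[1]])].min())
        clusters[a] += clusters.pop(b)           # single-link: distance of the closest member pair
    return clusters
```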
Agglomerative vs. Divisive
How to Define Inter-Cluster Similarity

• MIN
• MAX
• Group Average
• Distance Between Centroids
• Other methods driven by an objective
function
– Ward’s Method uses squared error
(ESS)
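These criteria map onto the method argument of SciPy's linkage function; a quick comparison sketch on random sample data, cutting each tree into three clusters:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(20, 2))
for method in ["single", "complete", "average", "centroid", "ward"]:
    Z = linkage(X, method=method)                     # MIN, MAX, group average, centroids, Ward
    labels = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into 3 clusters
    print(method, labels)
```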
Ward’s Method - Example
• Five customers – A, B, C, D, E
• Ratings provided – 2, 5, 9, 10, 15, respectively, on a 20-point scale
• Cluster customers based on ratings
• Stage 1:
– Five clusters of one, ESS = 0
– No loss of information since there is no
clustering
• Stage 2:
– Combining C and D as they are closest; Centroid = 9.5
– Four cluster solution: {A, B, {C,D}, E}
– ESS = 0 + 0 + [(9-9.5)² + (10-9.5)²] + 0 = 0.5
• Stage 3:
– ESS for the solution {{A,B}, {C,D}, E} = 5.0
– ESS for the solution {A, B, {C,D,E}} = 20.7
– ESS for the solution {{A,E}, {C,D}, B} = 85.0
– ESS for the solution {A, {B,E}, {C,D}} = 50.5
– ESS for the solution {A, {B,C,D}, E} = 14.0
– ESS for the solution {{A,C,D}, B, E} = 38.0
Ward’s Method - Example
• Stage 4:
– ESS for the solution {{A,B,C,D}, E} = 41.0
– ESS for the solution {{A,B,E}, {C,D}} = 93.2
– ESS for the solution {{A,B}, {C,D,E}} = 25.2

• Stage 5:
– ESS for the solution {{A,B, C,D,E}} = 98.8
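The ESS values in this example can be checked with a few lines of Python (a sketch using the ratings given above):

```python
ratings = {"A": 2, "B": 5, "C": 9, "D": 10, "E": 15}

def ess(partition):
    """Total error sum of squares for a partition given as a list of clusters."""
    total = 0.0
    for cluster in partition:
        vals = [ratings[c] for c in cluster]
        mean = sum(vals) / len(vals)
        total += sum((v - mean) ** 2 for v in vals)
    return total

print(ess([["A", "B"], ["C", "D"], ["E"]]))    # 5.0  (Stage 3)
print(ess([["A", "B", "C", "D"], ["E"]]))      # 41.0 (Stage 4)
print(ess([["A", "B", "C", "D", "E"]]))        # 98.8 (Stage 5)
```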
Ward’s Method - Comments
• Clusters from previous stage are never taken apart
• Less sensitive to outliers
• Tends to form spherical, tightly bound clusters – biased towards globular
clusters
• Can be used to decide value of k
