BIS 541 Ch04 20-21 S
Data Mining
Chapter 4
Cluster Analysis
2020/2021 Spring
Chapter 4. Cluster Analysis
What is Cluster Analysis?
Cluster: a collection of data objects
Similar to one another within the same cluster
Dissimilar to the objects in other clusters
Cluster analysis
Finding similarities between data according to the characteristics found in the data, and grouping similar data objects into clusters
As a preprocessing step for other algorithms
What Is Good Clustering?
Dm = sqrt( Σi=1..N Σj=1..N (tip − tjq)² / (N·(N−1)) )
(Dm: the diameter of a cluster — the average pairwise distance among its N members)
Typical Alternatives to Calculate the
Distance between Clusters
Single link: smallest distance between an element in one cluster and an element in the other, i.e., dis(Ki, Kj) = min over tip ∈ Ki, tjq ∈ Kj of dist(tip, tjq)
Complete link: largest distance between an element in one cluster and an element in the other, i.e., dis(Ki, Kj) = max dist(tip, tjq)
Average link: average distance over all pairs of elements, one from each cluster, i.e., dis(Ki, Kj) = avg dist(tip, tjq)
Medoid: distance between the medoids of two clusters, i.e., dis(Ki, Kj) = dis(Mi, Mj)
Medoid: one chosen, centrally located object in the cluster
Single link
[figure: two point clusters; the link is the closest pair of points between them]
Complete link
[figure: two point clusters; the link is the farthest pair of points between them]
Average link
[figure: two point clusters; the link is the average of all pairwise distances between them]
A medoid of a cluster
[figure: a cluster of points with its centrally located medoid marked]
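As a concrete illustration (not from the slides), a minimal Python sketch of these inter-cluster distance measures, applied to two small made-up clusters K1 and K2:

```python
import math

def dist(p, q):
    # Euclidean distance between two points
    return math.dist(p, q)

def single_link(K1, K2):
    # smallest distance between an element of K1 and an element of K2
    return min(dist(p, q) for p in K1 for q in K2)

def complete_link(K1, K2):
    # largest distance between an element of K1 and an element of K2
    return max(dist(p, q) for p in K1 for q in K2)

def average_link(K1, K2):
    # average of all pairwise distances between the two clusters
    return sum(dist(p, q) for p in K1 for q in K2) / (len(K1) * len(K2))

def medoid(K):
    # the object of K whose total distance to the other objects is smallest
    return min(K, key=lambda m: sum(dist(m, p) for p in K))

K1 = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
K2 = [(4.0, 0.0), (5.0, 0.0), (5.0, 1.0)]
print(single_link(K1, K2))    # 3.0: closest pair is (1, 0) and (4, 0)
print(complete_link(K1, K2))
print(medoid(K1))
```

By construction, the average-link distance always lies between the single-link and complete-link distances.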
Requirements of Clustering in Data Mining
Scalability
Ability to deal with different types of attributes
Ability to handle dynamic data
Discovery of clusters with arbitrary shape
Minimal requirements for domain knowledge to
determine input parameters
Ability to deal with noise and outliers
Insensitive to order of input records
High dimensionality
Incorporation of user-specified constraints
Interpretability and usability
Major Clustering Approaches (I)
Partitioning approach:
Construct various partitions and then evaluate them by some criterion,
e.g., minimizing the sum of square errors
Typical methods: k-means, k-medoids, CLARANS
Hierarchical approach:
Create a hierarchical decomposition of the set of data (or objects) using
some criterion
Typical methods: DIANA, AGNES, BIRCH, ROCK, CHAMELEON
Density-based approach:
Based on connectivity and density functions
Typical methods: DBSCAN, OPTICS, DenClue
Major Clustering Approaches (II)
Grid-based approach:
based on a multiple-level granularity structure
Typical methods: STING, WaveCluster, CLIQUE
Model-based:
A model is hypothesized for each of the clusters, and the idea is to find the best
fit of the data to the given model
Typical methods: EM, SOM, COBWEB
Frequent pattern-based:
Based on the analysis of frequent patterns
Typical methods: pCluster
User-guided or constraint-based:
Clustering by considering user-specified or application-specific constraints
Typical methods: COD (obstacles), constrained clustering
General Applications of Clustering
Students, clustered according to grades
Employees of a company, clustered according to sales performance
Examples of Clustering Applications
Marketing: Help marketers discover distinct groups in their
customer bases, and then use this knowledge to develop
targeted marketing programs
Land use: Identification of areas of similar land use in an
earth observation database
Insurance: Identifying groups of motor insurance policy
holders with a high average claim cost
City-planning: Identifying groups of houses according to
their house type, value, and geographical location
Earthquake studies: Observed earthquake epicenters
should be clustered along continent faults
WWW: document classification
Clustering regions by their characteristics:
Political: general elections (1999, 2002)
Demographic: population, urbanization rates
Economic: GNP per head, growth rate of GNP
Constraint-Based Clustering Analysis
Clustering analysis: fewer parameters but more user-desired
constraints, e.g., an ATM allocation problem
Basic Measures for Clustering
Partitioning Algorithms: Basic Concept
Partitioning method: Construct a partition of a database D of n objects
into a set of k clusters such that the sum of squared distances is minimized
How to Find the Sum of Squared Error (SSE)
SSE: sum of squared error (squared distance)
Given a data set D = {t1, t2, …, tn} partitioned into K clusters:
set SSE = 0 (initially the total sum of squared errors is 0)
for each cluster f from 1 to K:
  calculate and store the cluster center cf
  for each data object ti in cluster f:
    add dist(ti, cf)² to SSE
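The SSE loop above can be written directly in Python. This minimal sketch (not from the slides) uses the six points of the later worked example, grouped as in its first iteration:

```python
def sse(clusters):
    """Total sum of squared error: for each cluster, the sum of squared
    distances from its members to the cluster mean (center)."""
    total = 0.0
    for members in clusters:
        n, dim = len(members), len(members[0])
        # cluster center = coordinate-wise mean of the members
        center = tuple(sum(p[d] for p in members) / n for d in range(dim))
        total += sum(sum((p[d] - center[d]) ** 2 for d in range(dim))
                     for p in members)
    return total

clusters = [
    [(1.0, 1.5), (1.0, 4.5)],                          # cluster 1
    [(2.0, 1.5), (2.0, 3.5), (3.0, 2.5), (5.0, 6.0)],  # cluster 2
]
print(sse(clusters))   # 21.6875
```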
The K-Means Clustering Method
Example
[figure: three scatter plots illustrating one round of K-means with K = 2 — arbitrarily choose K objects as the initial cluster means; assign each object to the cluster with the most similar center; update the cluster means; then reassign and repeat until no object changes cluster]
Example TBP Sec 3.3 page 84 Table 3.6
Instance X Y
1 1.0 1.5
2 1.0 4.5
3 2.0 1.5
4 2.0 3.5
5 3.0 2.5
6 5.0 6.0
k is chosen as 2 (k = 2)
Choose two points at random as the initial
cluster centers
Objects 1 and 3 are chosen as cluster centers
[figure: the six instances plotted; objects 1 and 3 (starred) are the initial cluster centers]
Example cont.
Euclidean distance between points i and j
D(i, j) = ((Xi − Xj)² + (Yi − Yj)²)^(1/2)
Initial cluster centers
C1:(1.0,1.5) C2:(2.0,1.5) assigned to
D(C1 – 1) = 0.00 D(C2 –1) = 1.00 C1
D(C1 – 2) = 3.00 D(C2 –2) = 3.16 C1
D(C1 – 3) = 1.00 D(C2 –3) = 0.00 C2
D(C1 – 4) = 2.24 D(C2 –4) = 2.00 C2
D(C1 – 5) = 2.24 D(C2 –5) = 1.41 C2
D(C1 – 6) = 6.02 D(C2 –6) = 5.41 C2
C1: {1, 2}   C2: {3, 4, 5, 6}
[figure: the six instances with the first-round clusters C1 = {1, 2} and C2 = {3, 4, 5, 6}]
Example cont.
Recomputing cluster centers
For C1:
x-coordinate of C1 = (1.0 + 1.0)/2 = 1.0
y-coordinate of C1 = (1.5 + 4.5)/2 = 3.0
[figure: the six instances with the recomputed cluster centers marked]
Example cont.
New cluster centers
C1(1.0,3.0) and C2(3.0,3.375) assigned to
D(C1 – 1) = 1.50 D(C2 –1) = 2.74 C1
D(C1 – 2) = 1.50 D(C2 –2) = 2.29 C1
D(C1 – 3) = 1.80 D(C2 –3) = 2.13 C1
D(C1 – 4) = 1.12 D(C2 –4) = 1.01 C2
D(C1 – 5) = 2.06 D(C2 –5) = 0.88 C2
D(C1 – 6) = 5.00 D(C2 –6) = 3.30 C2
C1: {1, 2, 3}   C2: {4, 5, 6}
Example cont.
Computing new cluster centers
For C1:
x-coordinate of C1 = (1.0 + 1.0 + 2.0)/3 = 1.33
y-coordinate of C1 = (1.5 + 4.5 + 1.5)/3 = 2.50
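The two assign/update passes of the worked example can be reproduced in a short Python sketch (not from the textbook; it uses (5.0, 6.0) for instance 6, the value consistent with the computed distances):

```python
import math

points = {1: (1.0, 1.5), 2: (1.0, 4.5), 3: (2.0, 1.5),
          4: (2.0, 3.5), 5: (3.0, 2.5), 6: (5.0, 6.0)}

def assign(centers):
    # assign each point to the nearest center (Euclidean distance)
    clusters = {c: [] for c in centers}
    for i, p in points.items():
        best = min(centers, key=lambda c: math.dist(p, centers[c]))
        clusters[best].append(i)
    return clusters

def update(clusters):
    # recompute each center as the mean of its members
    return {c: (sum(points[i][0] for i in ids) / len(ids),
                sum(points[i][1] for i in ids) / len(ids))
            for c, ids in clusters.items()}

centers = {"C1": points[1], "C2": points[3]}   # objects 1 and 3
step1 = assign(centers)     # {'C1': [1, 2], 'C2': [3, 4, 5, 6]}
centers = update(step1)     # C1 = (1.0, 3.0), C2 = (3.0, 3.375)
step2 = assign(centers)     # {'C1': [1, 2, 3], 'C2': [4, 5, 6]}
```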
Exercise
Comments
Different initial cluster centers may end up with
different final cluster configurations
Finds a local optimum but not necessarily the
global optimum
The stopping condition is based on the sum of squared error (SSE)
Comments on the K-Means Method
Weaknesses of K-Means Algorithm
Presence of Outliers
[figure: two dense natural clusters plus a small group of outlier points, clustered with k = 2 and with k = 3]
When k = 2, the two natural clusters are not captured
Quality of clustering depends on the
unit of measure
[figure: income vs. age scatter plots — the clusters found change when income is measured in YTL rather than TL, while age stays in years]
So what to do?
Variations of the K-Means Method
Exercise
A: boundary object
Dist(A, C1) < Dist(A, C2), so A is assigned to cluster 1
[figure: two adjacent clusters of points with the boundary object A between them]
How to choose K
Plot of SSE vs. K
[figure: SSE plotted against k = 2, 4, 6, 8, 10, 12]
Choose k at the point where the SSE vs. k curve becomes flat
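To make this "elbow" heuristic concrete, here is a small self-contained Python sketch (not from the slides) that runs a basic 1-D K-means on made-up data and records SSE for several k. The evenly spaced initialization is an illustrative choice, not part of the standard algorithm. SSE drops sharply up to the natural k = 2 and flattens afterwards:

```python
def kmeans_sse(xs, k, iters=10):
    """Basic 1-D K-means with evenly spaced initial centers; returns the SSE."""
    lo, hi = min(xs), max(xs)
    centers = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            # assign x to the nearest center
            clusters[min(range(k), key=lambda j: abs(x - centers[j]))].append(x)
        # update centers; an empty cluster keeps its previous center
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sum(min((x - c) ** 2 for c in centers) for x in xs)

# two well-separated 1-D groups: the natural number of clusters is 2
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 10.0, 10.5, 11.0, 11.5, 12.0]
sse_by_k = {k: kmeans_sse(xs, k) for k in (1, 2, 3, 4)}
# the SSE vs. k curve drops sharply from k=1 to k=2, then flattens
```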
Validation of clustering
or
apply clustering to each of these groups
The K-Medoids Clustering Method
[figure: the six instances from the K-means example, with objects 1 and 3 starred as the initial medoids]
CLARA (Clustering Large Applications) (1990)
CLARA (Kaufmann and Rousseeuw in 1990)
Built into statistical analysis packages, such as S+
It draws multiple samples of the data set, applies PAM on
each sample, and gives the best clustering as the output
Strength: deals with larger data sets than PAM
Weakness:
Efficiency depends on the sample size
A good clustering based on samples will not necessarily
represent a good clustering of the whole data set if the
sample is biased
CLARANS (“Randomized” CLARA) (1994)
CLARANS (A Clustering Algorithm based on Randomized
Search) (Ng and Han’94)
CLARANS draws sample of neighbors dynamically
The clustering process can be presented as searching a
graph where every node is a potential solution, that is, a
set of k medoids
If a local optimum is found, CLARANS starts with a new
randomly selected node in search of a new local optimum
It is more efficient and scalable than both PAM and CLARA
Focusing techniques and spatial access structures may
further improve its performance (Ester et al.’95)
Hierarchical Clustering
[figure: three scatter plots showing successive merge steps of hierarchical clustering on the same points]
A Dendrogram Shows How the
Clusters are Merged Hierarchically
Example
A B C D E
A 0
B 1 0
C 2 2 0
D 2 4 1 0
E 3 3 5 3 0
Single Link Distance Measure
[figure: the five objects A–E drawn as weighted graphs at each merge step, with the resulting dendrogram over leaves A B C D E]
Merge sequence: {A}{B}{C}{D}{E} → {A,B}{C,D}{E} → {A,B,C,D}{E} → {A,B,C,D,E}
Merging Calculations with single link
At a distance of 1:
{A} and {B} merge: D({A},{B}) = 1
{C} and {D} merge: D({C},{D}) = 1
Clustering structure: {A,B}, {C,D}, {E}
At a distance of 2 (2 clusters):
{A,B} and {C,D} merge: D({A,B},{C,D}) = min(2, 2, 2, 4) = 2
Clustering structure: {A,B,C,D}, {E}
Dendrogram leaf order: C D A B E
Merging Calculations with complete link
At a distance of 1:
{A} and {B} merge: D({A},{B}) = 1
{C} and {D} merge: D({C},{D}) = 1
Clustering structure: {A,B}, {C,D}, {E}
At a distance of 3 (2 clusters):
{A,B} and {E} merge: D({A,B},{E}) = max(3, 3) = 3
Clustering structure: {A,B,E}, {C,D}
(compare: with single link the 2-cluster structure was {A,B,C,D}, {E})
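Both merging calculations can be checked with a small agglomerative-clustering sketch in Python (not from the slides), driven by the example's distance matrix; passing min as the linkage gives single link, max gives complete link:

```python
import itertools

# pairwise distances between the five objects, from the example matrix
d = {("A", "B"): 1, ("A", "C"): 2, ("A", "D"): 2, ("A", "E"): 3,
     ("B", "C"): 2, ("B", "D"): 4, ("B", "E"): 3,
     ("C", "D"): 1, ("C", "E"): 5, ("D", "E"): 3}

def dist(x, y):
    # the matrix is symmetric; look the pair up in either order
    return d[(x, y)] if (x, y) in d else d[(y, x)]

def agglomerate(link, stop_at):
    """Merge clusters bottom-up until stop_at clusters remain.
    link = min gives single link, link = max gives complete link."""
    clusters = [frozenset(x) for x in "ABCDE"]
    while len(clusters) > stop_at:
        # find the pair of clusters with the smallest linkage distance
        p, q = min(itertools.combinations(clusters, 2),
                   key=lambda pq: link(dist(x, y) for x in pq[0] for y in pq[1]))
        clusters = [c for c in clusters if c not in (p, q)] + [p | q]
    return {tuple(sorted(c)) for c in clusters}

print(agglomerate(min, 2))   # single link:   {A,B,C,D} and {E}
print(agglomerate(max, 2))   # complete link: {A,B,E} and {C,D}
```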
More on Hierarchical Clustering Methods
Major weakness of agglomerative clustering methods:
do not scale well: time complexity of at least O(n²), where n is the number of objects
CHAMELEON: hierarchical clustering using dynamic modeling
Probability Based Clustering
Simplest Case
K = 2: two clusters
There is one real-valued attribute X
Each cluster is Normally distributed
Five parameters to determine:
mean μA and standard deviation σA for cluster A
mean μB and standard deviation σB for cluster B
sampling probability pA for cluster A
(pB is not needed, as P(A) + P(B) = 1)
There are two populations, each normally
distributed with its own mean and standard deviation
An object comes from one of them, but which one
is unknown
Assign a probability that an object comes from
A or B
That is equivalent to assigning a cluster to each object
In parametric statistics
Assume that objects come from a distribution, usually a
Normal distribution with unknown parameters μ and σ
Estimate the parameters from the data
Two normal distributions: bimodal
Arbitrary shapes can be captured by a mixture of normals
Variable: age, K = 3
Clusters: C1, C2, C3
x1: age = 25
P(C1 | age = 25) = 0.40
P(C2 | age = 25) = 0.35
P(C3 | age = 25) = 0.25
x2: age = 65
P(C1 | age = 65) = 0.10
P(C2 | age = 65) = 0.25
P(C3 | age = 65) = 0.65
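Posteriors like these come from Bayes' rule once each cluster's parameters are known. The slide does not give the underlying parameters, so this Python sketch uses hypothetical Gaussian means, standard deviations, and priors purely for illustration:

```python
import math

def gauss_pdf(x, mu, sigma):
    # Normal density N(mu, sigma) evaluated at x
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# hypothetical cluster parameters (mean, std dev, prior) -- NOT the ones
# behind the slide's numbers, which are not given
params = {"C1": (30.0, 10.0, 1/3), "C2": (45.0, 12.0, 1/3), "C3": (65.0, 10.0, 1/3)}

def posteriors(age):
    # Bayes' rule: P(Ck | age) is proportional to P(age | Ck) * P(Ck)
    w = {c: gauss_pdf(age, mu, s) * p for c, (mu, s, p) in params.items()}
    z = sum(w.values())
    return {c: v / z for c, v in w.items()}

p25 = posteriors(25)   # the youngest cluster C1 gets the highest probability
p65 = posteriors(65)   # the oldest cluster C3 gets the highest probability
```

Note that for each object the posteriors sum to 1, so assigning probabilities is a soft version of assigning a cluster.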
Density-Based Clustering Methods
Clustering based on density (local cluster criterion), such
as density-connected points
Major features:
Discover clusters of arbitrary shape
Handle noise
One scan
Need density parameters as termination condition
Several interesting studies:
DBSCAN: Ester, et al. (KDD’96)
76
Density Concepts
Core object (CO): an object with at least ‘M’ objects within its radius-‘E’ neighborhood
Directly density reachable (DDR): x is a CO and y is in x’s E-neighborhood
Density reachable: there exists a chain of DDR objects from x to y
Density-based cluster: a set of density-connected objects that is maximal w.r.t. density-reachability
Density-Based Clustering: Background
Two parameters:
Eps: Maximum radius of the neighbourhood
MinPts: Minimum number of points in an Eps-
neighbourhood of that point
NEps(p): {q belongs to D | dist(p,q) <= Eps}
Directly density-reachable: A point p is directly density-
reachable from a point q wrt. Eps, MinPts if
1) p belongs to NEps(q)
2) core point condition: |NEps(q)| >= MinPts
[figure: p inside the Eps-neighborhood of q, with MinPts = 5 and Eps = 1 cm]
Density-Based Clustering: Background (II)
Density-reachable:
A point p is density-reachable from a point q wrt. Eps, MinPts if
there is a chain of points p1, …, pn, p1 = q, pn = p such that pi+1 is
directly density-reachable from pi
[figure: a chain of points from q through p1 to p]
Density-connected:
A point p is density-connected to a point q wrt. Eps, MinPts if there is
a point o such that both p and q are density-reachable from o wrt.
Eps and MinPts
[figure: p and q both density-reachable from o]
Let MinPts = 3
[figure: a set of points in which P and M are core objects and Q is density-reachable from P]
[figure: core, border, and outlier points for Eps = 1 cm, MinPts = 5]
DBSCAN: The Algorithm
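As a rough illustration of the algorithm (a minimal sketch, not the original DBSCAN pseudocode), the following Python function grows clusters from core points and labels everything unreachable as noise:

```python
import math
from collections import deque

def dbscan(points, eps, min_pts):
    """Return {point index: cluster id}; -1 marks noise (outliers)."""
    def neighbors(i):
        # all points within radius eps of point i (including i itself)
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]
    labels, cid = {}, 0
    for i in range(len(points)):
        if i in labels:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:       # not a core point; may later become a border point
            labels[i] = -1
            continue
        labels[i] = cid                # start a new cluster from core point i
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels.get(j) == -1:    # previously noise: becomes a border point
                labels[j] = cid
            if j in labels:
                continue
            labels[j] = cid
            nj = neighbors(j)
            if len(nj) >= min_pts:     # j is itself a core point: expand through it
                queue.extend(nj)
        cid += 1
    return labels

# two dense groups and one isolated point
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
       (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
       (5, 5)]
labels = dbscan(pts, eps=1.5, min_pts=3)   # (5, 5) is labeled -1 (noise)
```

The brute-force neighborhood query makes this O(n²); spatial index structures reduce the cost in practice.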