Session 05A: Clustering - Overview of Applications
Rong Jin
Outline
K-means for clustering
Expectation Maximization algorithm for clustering
Application (I): Search Result Clustering
Application (II): Navigation
Application (III): Google News
Application (IV): Visualization
Islands of Music
(Pampalk et al., KDD'03)
Application (V): Image Compression
http://www.ece.neu.edu/groups/rpl/kmeans/
How to Find a Good Clustering?
Minimize the sum of distances within clusters:

$$\{C, m\} = \arg\min_{C, m} \sum_{i=1}^{n} \sum_{j=1}^{k} m_{i,j}\, \lVert x_i - C_j \rVert^2$$

where

$$m_{i,j} = \begin{cases} 1 & x_i \text{ belongs to the } j\text{-th cluster} \\ 0 & x_i \text{ does not belong to the } j\text{-th cluster} \end{cases}
\qquad
\sum_{j=1}^{k} m_{i,j} = 1 \;\;\text{(any } x_i \text{ belongs to a single cluster)}$$

[Figure: data points grouped into clusters C1-C5]
How to Efficiently Cluster Data?
$$\{C, m\} = \arg\min_{C, m} \sum_{i=1}^{n} \sum_{j=1}^{k} m_{i,j}\, \lVert x_i - C_j \rVert^2$$

Memberships $m_{i,j}$ and centers $C_j$ are correlated.

Given centers $\{C_j\}$:
$$m_{i,j} = \begin{cases} 1 & j = \arg\min_k \lVert x_i - C_k \rVert^2 \\ 0 & \text{otherwise} \end{cases}$$

Given memberships $m_{i,j}$:
$$C_j = \frac{\sum_{i=1}^{n} m_{i,j}\, x_i}{\sum_{i=1}^{n} m_{i,j}}$$
K-means for Clustering

K-means:
Start with a random guess of the cluster centers.
Determine the membership of each data point.
Adjust the cluster centers.
Repeat the last two steps until the cluster centers stop moving.
K-means
1. Ask the user how many clusters they'd like (e.g., k = 5).
2. Randomly guess k cluster center locations.
3. Each data point finds out which center it is closest to (thus each center "owns" a set of data points).
4. Each center finds the centroid of the points it owns.

Computational complexity: O(N), where N is the number of points?
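As a concrete illustration of steps 1-4, here is a minimal NumPy sketch of the K-means loop; the function name, initialization scheme, and stopping test are illustrative choices, not from the slides.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-means: alternate memberships (step 3) and centers (step 4)."""
    rng = np.random.default_rng(seed)
    # Steps 1-2: k is given by the user; guess centers as k random data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: each data point finds the center it is closest to.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: each center moves to the centroid of the points it owns.
        new_centers = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):  # stop when centers stop moving
            break
        centers = new_centers
    return centers, labels
```

Each iteration computes all N x k point-to-center distances, i.e., O(Nkd) work, linear in the number of points N.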
Learning a Gaussian Mixture
(with known covariance)

Probability:
$$p(x = x_i) = \sum_j p(x = x_i, \theta_j) = \sum_j p(\theta_j)\, p(x = x_i \mid \theta_j) = \sum_j p(\theta_j)\, \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\!\left(-\frac{\lVert x_i - \mu_j \rVert^2}{2\sigma^2}\right)$$

Log-likelihood of data:
$$\sum_i \log p(x = x_i) = \sum_i \log \sum_j p(\theta_j)\, \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\!\left(-\frac{\lVert x_i - \mu_j \rVert^2}{2\sigma^2}\right)$$

Apply MLE to find the optimal parameters $p(\theta_j)$ and $\mu_j$ for all $j$.
Learning a Gaussian Mixture
(with known covariance)

E-Step:
$$E[z_{ij}] = p(\theta_j \mid x = x_i) = \frac{p(x = x_i \mid \theta_j)\, p(\theta_j)}{\sum_{n=1}^{k} p(x = x_i \mid \theta_n)\, p(\theta_n)} = \frac{\exp\!\left(-\frac{1}{2\sigma^2}(x_i - \mu_j)^2\right) p(\theta_j)}{\sum_{n=1}^{k} \exp\!\left(-\frac{1}{2\sigma^2}(x_i - \mu_n)^2\right) p(\theta_n)}$$
Learning a Gaussian Mixture
(with known covariance)

M-Step:
$$\mu_j = \frac{\sum_{i=1}^{m} E[z_{ij}]\, x_i}{\sum_{i=1}^{m} E[z_{ij}]}
\qquad
p(\theta_j) = \frac{1}{m} \sum_{i=1}^{m} E[z_{ij}]$$
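A minimal sketch of these E- and M-updates for a one-dimensional mixture with known, shared variance; the function name, initialization, and iteration count are illustrative, not from the slides.

```python
import numpy as np

def em_gaussian_mixture(x, k, sigma=1.0, n_iters=50, seed=0):
    """EM for a 1-D Gaussian mixture (x: 1-D array) with known variance sigma^2."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False)  # initial guess for the means
    prior = np.full(k, 1.0 / k)                # p(theta_j), uniform at the start
    for _ in range(n_iters):
        # E-step: E[z_ij] proportional to exp(-(x_i - mu_j)^2 / (2 sigma^2)) p(theta_j)
        logw = -((x[:, None] - mu[None, :]) ** 2) / (2 * sigma**2) + np.log(prior)
        logw -= logw.max(axis=1, keepdims=True)  # stabilize before exponentiating
        z = np.exp(logw)
        z /= z.sum(axis=1, keepdims=True)
        # M-step: mu_j is the E[z]-weighted mean; p(theta_j) the average responsibility
        mu = (z * x[:, None]).sum(axis=0) / z.sum(axis=0)
        prior = z.mean(axis=0)
    return mu, prior
```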
Gaussian Mixture Example
[Figure sequence: the fitted mixture at the start and after iterations 1-6 and 20]
Mixture Model for Doc Clustering

A set of language models $\theta_1, \theta_2, \ldots, \theta_K$:
$$\theta_i = \{p(w_1 \mid \theta_i),\, p(w_2 \mid \theta_i),\, \ldots,\, p(w_V \mid \theta_i)\}$$

Introduce a hidden variable $z_{ij}$: document $d_i$ is generated by the $j$-th language model $\theta_j$.

Probability:
$$p(d = d_i) = \sum_j p(d = d_i, \theta_j) = \sum_j p(\theta_j)\, p(d = d_i \mid \theta_j) = \sum_j p(\theta_j) \prod_{k=1}^{V} p(w_k \mid \theta_j)^{tf(w_k, d_i)}$$
Learning a Mixture Model

E-Step:
$$E[z_{ij}] = p(\theta_j \mid d = d_i) = \frac{p(d = d_i \mid \theta_j)\, p(\theta_j)}{\sum_{n=1}^{K} p(d = d_i \mid \theta_n)\, p(\theta_n)} = \frac{p(\theta_j) \prod_{m=1}^{V} p(w_m \mid \theta_j)^{tf(w_m, d_i)}}{\sum_{n=1}^{K} p(\theta_n) \prod_{m=1}^{V} p(w_m \mid \theta_n)^{tf(w_m, d_i)}}$$

M-Step:
$$p(w_m \mid \theta_j) = \frac{\sum_{k=1}^{N} E[z_{kj}]\, tf(w_m, d_k)}{\sum_{m'=1}^{V} \sum_{k=1}^{N} E[z_{kj}]\, tf(w_{m'}, d_k)}
\qquad
p(\theta_j) = \frac{1}{N} \sum_{i=1}^{N} E[z_{ij}]$$

N: the number of documents
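A sketch of this EM loop over a term-frequency matrix, working in log space for the E-step; the Dirichlet initialization and the small smoothing constant are illustrative choices to keep the logarithms finite.

```python
import numpy as np

def em_doc_mixture(tf, K, n_iters=50, seed=0):
    """EM for a multinomial mixture: tf is an (N, V) term-frequency matrix."""
    rng = np.random.default_rng(seed)
    word = rng.dirichlet(np.ones(tf.shape[1]), size=K)  # p(w_m | theta_j), K rows
    prior = np.full(K, 1.0 / K)                          # p(theta_j)
    for _ in range(n_iters):
        # E-step: up to a constant, log E[z_ij] is
        # log p(theta_j) + sum_m tf(w_m, d_i) * log p(w_m | theta_j)
        logz = np.log(prior)[None, :] + tf @ np.log(word).T
        logz -= logz.max(axis=1, keepdims=True)
        z = np.exp(logz)
        z /= z.sum(axis=1, keepdims=True)                # E[z_ij]
        # M-step: expected counts give p(w_m | theta_j); average responsibilities give p(theta_j)
        counts = z.T @ tf + 1e-12                        # smoothing keeps log(word) finite
        word = counts / counts.sum(axis=1, keepdims=True)
        prior = z.mean(axis=0)
    return word, prior, z
```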
Examples of Mixture Models
Other Mixture Models
Probabilistic latent semantic indexing (PLSI)
Latent Dirichlet Allocation (LDA)
Problems (I)
Both k-means and mixture models need to compute cluster centers and an explicit distance measure.
Given a strange distance measure, the cluster centers can be hard to compute.
E.g., $\lVert x - x' \rVert_\infty = \max\left(|x_1 - x_1'|,\, |x_2 - x_2'|,\, \ldots,\, |x_n - x_n'|\right)$

[Figure: points x, y, z with pairwise distances]
Problems (II)
Both k-means and mixture models look for compact clustering structures.
In some cases, connected clustering structures are more desirable.
Graph Partitioning
MinCut: bipartition the graph with the minimal number of cut edges.
[Figure: example bipartition with CutSize = 2]
2-way Spectral Graph Partitioning

Weight matrix $W$: $w_{i,j}$ is the weight between vertices $i$ and $j$.

Membership vector $q$:
$$q_i = \begin{cases} 1 & i \in \text{Cluster A} \\ -1 & i \in \text{Cluster B} \end{cases}$$

CutSize:
$$J = \frac{1}{4} \sum_{i,j} \left(q_i - q_j\right)^2 w_{i,j}$$
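A tiny numeric check of the CutSize identity on a 4-node path graph, assuming the sum runs over unordered pairs i < j so each edge is counted once:

```python
import numpy as np

# Path graph 0-1-2-3 split as A = {0, 1}, B = {2, 3}: the only cut edge is (1, 2).
W = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [0., 0., 1., 0.]])
q = np.array([1., 1., -1., -1.])
diff = (q[:, None] - q[None, :]) ** 2      # 4 across the cut, 0 within a cluster
J = 0.25 * np.triu(diff * W, k=1).sum()    # sum over i < j: each edge once
print(J)                                   # 1.0, matching the single cut edge
```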
Solving the Optimization Problem

$$q^* = \arg\min_{q \in \{-1, 1\}^n} \frac{1}{4} \sum_{i,j} \left(q_i - q_j\right)^2 w_{i,j}$$

Relaxing $q$ to real values turns this into the eigenvalue problem
$$(D - W)\, q = \lambda q,$$
whose second-smallest eigenvector gives the partition.

Graph Laplacian:
$$L = D - W: \quad W = [w_{i,j}], \qquad D_{i,i} = \sum_j w_{i,j}$$
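A small sketch of this relaxation: build the Laplacian, take the eigenvector of the second-smallest eigenvalue (the Fiedler vector), and split vertices on its sign. The function name is illustrative, and W is assumed to be a symmetric NumPy array.

```python
import numpy as np

def spectral_bipartition(W):
    """2-way spectral partition from the sign of the Fiedler vector."""
    d = W.sum(axis=1)
    L = np.diag(d) - W                    # graph Laplacian L = D - W
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]               # second-smallest eigenvector
    return fiedler >= 0                   # True -> Cluster A, False -> Cluster B
```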
Normalized Cut

$$J = \frac{s(A, B)}{d_A} + \frac{s(A, B)}{d_B}, \qquad s(A, B) = \sum_{i \in A} \sum_{j \in B} w_{i,j}, \quad d_A = \sum_{i \in A} d_i, \quad d_B = \sum_{i \in B} d_i$$

Minimize the similarity between clusters while maximizing the similarity within clusters:

$$J = \frac{s(A, B)}{d_A} + \frac{s(A, B)}{d_B} = \sum_{i \in A} \sum_{j \in B} w_{i,j}\, \frac{d_A + d_B}{d_A d_B} = \sum_{i \neq j} w_{i,j} \left(q_i - q_j\right)^2 = q^T (D - W)\, q$$

with $d = d_A + d_B$ and

$$q_i = \begin{cases} \sqrt{d_B / (d_A d)} & \text{if } i \in A \\ -\sqrt{d_A / (d_B d)} & \text{if } i \in B \end{cases}$$

Solution: the generalized eigenvalue problem
$$(D - W)\, q = \lambda D q$$
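A minimal sketch of this solution using SciPy's generalized symmetric eigensolver; it assumes a symmetric weight matrix with strictly positive degrees (so D is positive definite) and splits on the sign of the second-smallest generalized eigenvector.

```python
import numpy as np
from scipy.linalg import eigh

def normalized_cut(W):
    """2-way normalized cut: solve (D - W) q = lambda * D q, split on sign of q."""
    d = W.sum(axis=1)                # vertex degrees; assumed strictly positive
    D = np.diag(d)
    L = D - W                        # graph Laplacian
    eigvals, eigvecs = eigh(L, D)    # generalized eigenproblem, ascending order
    q = eigvecs[:, 1]                # second-smallest generalized eigenvector
    return q >= 0                    # True -> Cluster A, False -> Cluster B
```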
Image Segmentation
Non-negative Matrix Factorization