Data Mining and Machine Learning: Fundamental Concepts and Algorithms
Mohammed J. Zaki (1), Wagner Meira Jr. (2)
(1) Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA
(2) Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Zaki & Meira Jr. (RPI and UFMG) Data Mining and Machine Learning Chapter 16: Spectral & Graph Clustering
Graphs and Matrices: Adjacency Matrix
Given a dataset of n points, let A be the n × n matrix where A(i, j) = a_{ij} denotes the similarity or affinity between points x_i and x_j. We require the similarity to be symmetric and non-negative, that is, a_{ij} = a_{ji} and a_{ij} ≥ 0, respectively.
The matrix A is the weighted adjacency matrix for the data graph. If all affinities are 0 or 1, then A represents the regular adjacency relationship between the vertices.
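As a concrete sketch, a weighted adjacency matrix can be built from pairwise similarities; the Gaussian kernel below is one common affinity choice (an assumption here, not mandated by the slides):

```python
import numpy as np

def adjacency_matrix(X, sigma=1.0):
    """Weighted adjacency matrix with a_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    Any symmetric non-negative affinity would do; self-loops are zeroed out."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    return A

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
A = adjacency_matrix(X)
assert np.allclose(A, A.T) and (A >= 0).all()   # symmetric, non-negative
assert A[0, 1] > A[0, 2]                        # nearby points are more similar
```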
Iris Similarity Graph: Mutual Nearest Neighbors
|V| = n = 150, |E| = m = 1730
[Figure: similarity graph over the 150 Iris points, connecting mutual nearest neighbors.]
Graphs and Matrices: Degree Matrix
The degree of a vertex v_i is d_i = \sum_{j=1}^{n} a_{ij}. The degree matrix \Delta is the n × n diagonal matrix with \Delta(i, i) = d_i.
Graphs and Matrices: Normalized Adjacency Matrix
The normalized adjacency matrix is obtained by dividing each row of A by the degree of the corresponding vertex: M = \Delta^{-1} A, with entries m_{ij} = a_{ij} / d_i. Each row of M sums to 1.
Example Graph
Adjacency and Degree Matrices
[Figure: example graph on vertices 1–7.]
Example Graph
Normalized Adjacency Matrix
[Figure: example graph on vertices 1–7.]
Graphs and Matrices: Graph Laplacian Matrix
The Laplacian matrix of a graph is the difference between the degree matrix and the adjacency matrix:

L = \Delta - A
  = \begin{pmatrix}
      \sum_{j=1}^{n} a_{1j} & 0 & \cdots & 0 \\
      0 & \sum_{j=1}^{n} a_{2j} & \cdots & 0 \\
      \vdots & \vdots & \ddots & \vdots \\
      0 & 0 & \cdots & \sum_{j=1}^{n} a_{nj}
    \end{pmatrix}
  - \begin{pmatrix}
      a_{11} & a_{12} & \cdots & a_{1n} \\
      a_{21} & a_{22} & \cdots & a_{2n} \\
      \vdots & \vdots & \ddots & \vdots \\
      a_{n1} & a_{n2} & \cdots & a_{nn}
    \end{pmatrix}
  = \begin{pmatrix}
      \sum_{j \neq 1} a_{1j} & -a_{12} & \cdots & -a_{1n} \\
      -a_{21} & \sum_{j \neq 2} a_{2j} & \cdots & -a_{2n} \\
      \vdots & \vdots & \ddots & \vdots \\
      -a_{n1} & -a_{n2} & \cdots & \sum_{j \neq n} a_{nj}
    \end{pmatrix}
The Laplacian matrix is especially useful for irregular graphs: it lets us study certain properties of such graphs that the adjacency matrix alone does not expose.
Given a function on the vertices of a graph, the Laplacian measures how much the function's value at a vertex differs from the average of its values over the neighbors of that vertex.
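This averaging interpretation can be checked directly: for an unweighted graph and a vertex function f, (Lf)_i equals d_i times the gap between f_i and the neighbor average. A small sketch on an assumed toy graph:

```python
import numpy as np

# Assumed small unweighted graph (not the slides' example graph).
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 1.],
              [1., 1., 0., 1.],
              [0., 1., 1., 0.]])
Delta = np.diag(A.sum(axis=1))   # degree matrix
L = Delta - A                    # graph Laplacian

assert np.allclose(L.sum(axis=1), 0.0)   # rows of L sum to zero

# For a vertex function f, (L f)_i = d_i * (f_i - mean of f over i's neighbors).
f = np.array([1.0, 2.0, 3.0, 4.0])
i = 1
neighbors = np.nonzero(A[i])[0]
assert np.isclose((L @ f)[i], Delta[i, i] * (f[i] - f[neighbors].mean()))
```

For weighted graphs the same identity holds with a weighted neighbor average.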
Example Graph: Laplacian Matrix
[Figure: example graph on vertices 1–7.]
Given two subsets of the vertices S, T ⊆ V, the weight of the cut between them is the sum of the weights of all edges with one vertex in S and the other in T:

W(S, T) = \sum_{v_i \in S} \sum_{v_j \in T} a_{ij}
Cuts and Matrix Operations
Given a clustering C = {C_1, …, C_k} comprising k clusters, let c_i ∈ {0, 1}^n be the cluster indicator vector that records the cluster membership for cluster C_i, defined as

c_{ij} = \begin{cases} 1 & \text{if } v_j \in C_i \\ 0 & \text{if } v_j \notin C_i \end{cases}

The cluster size is then |C_i| = c_i^T c_i = \|c_i\|^2.

The volume of a cluster C_i is defined as the sum of all the weights on edges with one end in cluster C_i:

\mathrm{vol}(C_i) = W(C_i, V) = \sum_{v_r \in C_i} d_r = \sum_{r=1}^{n} c_{ir}\, d_r\, c_{ir} = \sum_{r=1}^{n} \sum_{s=1}^{n} c_{ir}\, \Delta_{rs}\, c_{is} = c_i^T \Delta\, c_i
Cuts and Matrix Operations
The sum of the weights of all internal edges of cluster C_i is

W(C_i, C_i) = \sum_{v_r \in C_i} \sum_{v_s \in C_i} a_{rs} = \sum_{r=1}^{n} \sum_{s=1}^{n} c_{ir}\, a_{rs}\, c_{is} = c_i^T A\, c_i

We can get the sum of weights for all the external edges as follows:

W(C_i, \overline{C_i}) = \sum_{v_r \in C_i} \sum_{v_s \in V - C_i} a_{rs} = W(C_i, V) - W(C_i, C_i) = c_i^T (\Delta - A)\, c_i = c_i^T L\, c_i
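These matrix identities are easy to check numerically; the graph and the candidate cluster below are assumed for illustration:

```python
import numpy as np

# Hypothetical weighted graph and a candidate cluster {v_0, v_1}.
A = np.array([[0., 2., 1., 0.],
              [2., 0., 0., 1.],
              [1., 0., 0., 3.],
              [0., 1., 3., 0.]])
Delta = np.diag(A.sum(axis=1))
L = Delta - A

c = np.array([1.0, 1.0, 0.0, 0.0])   # indicator vector for cluster {v_0, v_1}

size = c @ c                          # |C_i|
vol = c @ Delta @ c                   # vol(C_i) = W(C_i, V)
internal = c @ A @ c                  # W(C_i, C_i)
external = c @ L @ c                  # W(C_i, V - C_i)

assert size == 2
assert np.isclose(vol, internal + external)   # volume = internal + external weight
```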
Clustering Objective Functions: Ratio Cut
The clustering objective function can be formulated as an optimization problem
over the k-way cut C = {C1 , . . . , Ck }.
The ratio cut objective is defined over a k-way cut as follows:
\min_{C}\; J_{rc}(C) = \sum_{i=1}^{k} \frac{W(C_i, \overline{C_i})}{|C_i|} = \sum_{i=1}^{k} \frac{c_i^T L\, c_i}{c_i^T c_i} = \sum_{i=1}^{k} \frac{c_i^T L\, c_i}{\|c_i\|^2}
Ratio cut tries to minimize the sum of the similarities from a cluster Ci to other
points not in the cluster Ci , taking into account the size of each cluster.
Unfortunately, for binary cluster indicator vectors c_i, the ratio cut objective is NP-hard to optimize. An obvious relaxation is to allow c_i to take on any real value. In this case, we can rewrite the objective as

\min_{C}\; J_{rc}(C) = \sum_{i=1}^{k} \frac{c_i^T L\, c_i}{\|c_i\|^2} = \sum_{i=1}^{k} \left(\frac{c_i}{\|c_i\|}\right)^{T} L \left(\frac{c_i}{\|c_i\|}\right) = \sum_{i=1}^{k} u_i^T L\, u_i

where u_i = c_i / \|c_i\| is the unit vector in the direction of c_i.
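To see why the relaxation leads to an eigenvector problem: for symmetric L, the unit vector minimizing u^T L u is the eigenvector of the smallest eigenvalue (the Rayleigh quotient argument). A sketch on an assumed toy graph:

```python
import numpy as np

# Assumed small graph; its Laplacian L is symmetric positive semidefinite.
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
L = np.diag(A.sum(axis=1)) - A

evals, evecs = np.linalg.eigh(L)   # eigh returns eigenvalues in ascending order
u = evecs[:, 0]                    # unit eigenvector of the smallest eigenvalue
assert np.isclose(u @ L @ u, evals[0])

# Any other unit vector attains a value at least as large.
rng = np.random.default_rng(0)
v = rng.normal(size=4)
v /= np.linalg.norm(v)
assert v @ L @ v >= evals[0] - 1e-12
```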
Clustering Objective Functions: Normalized Cut
Normalized cut is similar to ratio cut, except that it divides the cut weight of each cluster by the volume of the cluster instead of its size. The objective function is given as

\min_{C}\; J_{nc}(C) = \sum_{i=1}^{k} \frac{W(C_i, \overline{C_i})}{\mathrm{vol}(C_i)} = \sum_{i=1}^{k} \frac{c_i^T L\, c_i}{c_i^T \Delta\, c_i}
Spectral Clustering Algorithm
The spectral clustering algorithm takes a dataset D as input and computes the similarity matrix A. For normalized cut we choose either the symmetric (L^s) or the asymmetric (L^a) normalized Laplacian, whereas for ratio cut we choose L. Next, we compute the k smallest eigenvalues and corresponding eigenvectors of the chosen matrix.
The main problem is that the eigenvectors u_i are not binary, and thus it is not immediately clear how we can assign points to clusters.
One solution to this problem is to treat the n × k matrix of eigenvectors as a new
data matrix:
U = \begin{pmatrix} | & | & & | \\ u_n & u_{n-1} & \cdots & u_{n-k+1} \\ | & | & & | \end{pmatrix} \;\xrightarrow{\text{normalize rows}}\; Y = \begin{pmatrix} \text{---} & y_1^T & \text{---} \\ \text{---} & y_2^T & \text{---} \\ & \vdots & \\ \text{---} & y_n^T & \text{---} \end{pmatrix}
We then cluster the new points in Y into k clusters via the K-means algorithm or any other fast clustering method, to obtain binary cluster indicator vectors c_i.
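The pipeline above can be sketched end to end. This is a minimal ratio-cut illustration on an assumed toy graph, with a tiny k-means loop inlined; it is not the book's exact pseudocode:

```python
import numpy as np

def spectral_clustering(A, k):
    """Sketch of ratio-cut spectral clustering: embed via the k smallest
    eigenvectors of L = Delta - A, normalize the rows, then run k-means
    on the embedded points."""
    L = np.diag(A.sum(axis=1)) - A
    _, evecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    U = evecs[:, :k]                          # k smallest eigenvectors
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    Y = U / np.where(norms > 0, norms, 1.0)   # row-normalized embedding

    # Tiny k-means on the rows of Y (farthest-point initialization).
    centers = [Y[0]]
    for _ in range(k - 1):
        d = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(100):
        labels = np.argmin(((Y[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([Y[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Two triangles joined by one weak edge (a hypothetical example graph).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
A[2, 3] = A[3, 2] = 0.1

labels = spectral_clustering(A, k=2)
assert labels[0] == labels[1] == labels[2]
assert labels[3] == labels[4] == labels[5]
assert labels[0] != labels[3]
```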
Spectral Clustering on Example Graph
k = 2, normalized cut (normalized asymmetric Laplacian)
[Figure: the example graph on vertices 1–7 and the spectral embedding of its vertices in the (u1, u2) plane.]
Normalized Cut on Iris Graph
k = 3, normalized asymmetric Laplacian
[Figure: Iris similarity graph with the three clusters found by normalized cut, indicated by circle, square, and triangle markers.]
Maximization Objectives: Average Weight
The average weight objective maximizes the sum of the average intracluster weights:

\max_{C}\; J_{aw}(C) = \sum_{i=1}^{k} \frac{W(C_i, C_i)}{|C_i|} = \sum_{i=1}^{k} \frac{c_i^T A\, c_i}{c_i^T c_i} = \sum_{i=1}^{k} u_i^T A\, u_i
Markov Chain Clustering
A random walk on the similarity graph defines a Markov chain over the vertices. Further, we assume that the Markov chain is homogeneous, that is, the transition probability P(X_t = j \mid X_{t-1} = i) = m_{ij} is independent of the time step t.
Markov Chain Clustering: Markov Matrix
The Markov matrix is the row-stochastic normalized adjacency matrix M = \Delta^{-1} A, whose rows each sum to 1. The t-step transition probabilities are obtained by taking powers of M:

M^{t-1} \cdot M = M^{t}
Markov Chain Clustering: Random Walk
Let \pi_0 denote the initial probability distribution over the nodes. The distribution after t steps of the random walk is

\pi_t^T = \pi_{t-1}^T M = \pi_{t-2}^T M^2 = \cdots = \pi_0^T M^t

or, equivalently,

\pi_t = (M^t)^T \pi_0 = (M^T)^t \pi_0
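A sketch of this evolution on an assumed three-node graph, checking that \pi_t = (M^T)^t \pi_0 matches step-by-step iteration:

```python
import numpy as np

# Assumed three-node complete graph; M = Delta^{-1} A is the Markov matrix.
A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
M = A / A.sum(axis=1, keepdims=True)

pi0 = np.array([1., 0., 0.])                  # walk starts at node 0
t = 4
pi_t = np.linalg.matrix_power(M, t).T @ pi0   # pi_t = (M^t)^T pi_0

p = pi0.copy()
for _ in range(t):                            # step-by-step: pi_s = M^T pi_{s-1}
    p = M.T @ p
assert np.allclose(p, pi_t)
assert np.isclose(pi_t.sum(), 1.0)            # remains a probability distribution
```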
Markov Clustering Algorithm
Consider a variation of the random walk, where the probability of transitioning from node i to j is inflated by taking each element m_{ij} to the power r ≥ 1. Given a transition matrix M, define the inflation operator Υ as follows:

\Upsilon(M, r) = \left( \frac{(m_{ij})^r}{\sum_{a=1}^{n} (m_{ia})^r} \right)_{i,j=1}^{n}
The net effect of the inflation operator is to increase the higher probability
transitions and decrease the lower probability transitions.
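A minimal sketch of the inflation operator; the matrix M below is an assumed example:

```python
import numpy as np

def inflate(M, r):
    """Inflation operator: raise each entry to the power r,
    then renormalize every row so it sums to 1."""
    P = M ** r
    return P / P.sum(axis=1, keepdims=True)

M = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.8, 0.1],
              [1/3, 1/3, 1/3]])
N = inflate(M, 2.0)
assert np.allclose(N.sum(axis=1), 1.0)          # rows remain stochastic
assert N[0, 0] > M[0, 0] and N[0, 2] < M[0, 2]  # strong transitions grow, weak shrink
assert np.allclose(N[2], M[2])                  # a uniform row is a fixed point
```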
The Markov clustering algorithm (MCL) is an iterative method that interleaves
matrix expansion and inflation steps. Matrix expansion corresponds to taking
successive powers of the transition matrix, leading to random walks of longer
lengths. On the other hand, matrix inflation makes the higher probability
transitions even more likely and reduces the lower probability transitions.
MCL takes as input the inflation parameter r ≥ 1. Higher values lead to more,
smaller clusters, whereas smaller values lead to fewer, but larger clusters.
Markov Clustering Algorithm: MCL
The final clusters are found by enumerating the weakly connected components in the directed graph induced by the converged transition matrix M_t, where the edges are defined as:

E = \{ (i, j) \mid M_t(i, j) > 0 \}
A directed edge (i, j) exists only if node i can transition to node j within t steps of the expansion and inflation process.
A node j is called an attractor if M_t(j, j) > 0, and we say that node i is attracted to attractor j if M_t(i, j) > 0. The MCL process yields a set of attractor nodes Va ⊆ V, such that other nodes are attracted to at least one attractor in Va.
To extract the clusters from Gt, MCL first finds the strongly connected components S1, S2, …, Sq over the set of attractors Va. Next, for each strongly connected set of attractors Sj, MCL finds the weakly connected components consisting of all nodes i ∈ Vt − Va attracted to an attractor in Sj. If a node i is attracted to multiple strongly connected components, it is added to each such cluster, resulting in possibly overlapping clusters.
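Putting the pieces together, a simplified MCL sketch: self-loops are added as a common preprocessing step (an assumption, not stated in the slides), and cluster extraction is reduced to plain weakly connected components, ignoring the overlapping-cluster subtleties described above:

```python
import numpy as np

def mcl(A, r=2.5, tol=1e-6, max_iter=100):
    """Simplified MCL sketch: alternate expansion (matrix squaring) and
    inflation (entrywise power r plus row renormalization) until the
    transition matrix stops changing, then extract clusters as weakly
    connected components of the induced graph."""
    A = A + np.eye(len(A))                    # add self-loops
    M = A / A.sum(axis=1, keepdims=True)      # row-stochastic Markov matrix
    for _ in range(max_iter):
        M2 = M @ M                            # expansion: longer random walks
        M2 = M2 ** r                          # inflation: boost strong transitions
        M2 /= M2.sum(axis=1, keepdims=True)
        converged = np.abs(M2 - M).max() < tol
        M = M2
        if converged:
            break
    # Weakly connected components of the graph with edge (i, j) iff M(i, j) > tol.
    adj = (M > tol) | (M > tol).T
    labels = -np.ones(len(A), dtype=int)
    cid = 0
    for s in range(len(A)):
        if labels[s] < 0:                     # BFS/DFS from each unvisited node
            stack = [s]
            labels[s] = cid
            while stack:
                u = stack.pop()
                for v in np.nonzero(adj[u])[0]:
                    if labels[v] < 0:
                        labels[v] = cid
                        stack.append(v)
            cid += 1
    return labels

# Two triangles joined by a single weak edge (a hypothetical example graph).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
A[2, 3] = A[3, 2] = 0.5

labels = mcl(A)
assert labels[0] == labels[1] == labels[2]
assert labels[3] == labels[4] == labels[5]
assert labels[0] != labels[3]
```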
MCL Attractors and Clusters
r = 2.5
[Figure: the example graph (top) and the converged MCL transition graph for r = 2.5 (bottom), with attractor edge weights of 1 and 0.5.]
MCL on Iris Graph
Comparison between two values of r
[Figure: MCL clusterings of the Iris similarity graph for two different values of the inflation parameter r.]
Contingency Table: MCL Clusters versus Iris Types
r = 1.3
dataminingbook.info