Professional Documents
Culture Documents
Ward K Means
Ward K Means
Rachel Ward
But often it is not so clear (especially with data in Rd for d large) ...
k-means clustering
Most popular unsupervised clustering method. Points embedded in
Euclidean space.
k X
2
X
1 X
min
x − x j
C 1 ∪C 2 ∪···∪C k =P |C |
i x ∈C
i=1 x ∈C i
j i
k-means
Spectral
k-means ++ Semidefinite
initialization
relaxation
Outline of Talk
Points within the same cluster are closer to each other than points in
different clusters – simple thresholding of distance matrix.
vs.
k X
2
X
1 X
min
x − x j
P=C 1 ∪C 2 ∪···∪C k
i=1 x ∈C i
|C |
i x ∈C
j i
k
X 1 X
= min Di, j
P=C 1 ∪C 2 ∪···∪C k
`=1
|C` | (i, j ) ∈C
`
k-means clustering
k X
2
X
1 X
min
x − x j
P=C 1 ∪C 2 ∪···∪C k
i=1 x ∈C i
|C |
i x ∈C
j i
k
X 1 X
= min Di, j
P=C 1 ∪C 2 ∪···∪C k
`=1
|C` | (i, j ) ∈C
`
k-means clustering
min hD, Zi
Z ∈R N ×N
subject to {Rank (Z ) = k, λ 1 (Z ) = · · · = λ k (Z ) = 1, Z1 = 1, Z ≥ 0}
min hD, Zi
Z ∈R N ×N
subject to {Rank (Z ) = k, λ 1 (Z ) = · · · = λ k (Z ) = 1, Z1 = 1, Z ≥ 0}
min hD, Zi
subject to {Tr(Z) = k, Z 0, Z1 = 1, Z ≥ 0}
I Matrix factorizations
I [Recht, Fazel, Parrilo ’10] Low-rank matrix recovery
I Many more...
Stability of k-means SDP
Stability of k-means SDP
Recall SDP:
min hD, Zi
Z ∈R N ×N
subject to {Rank (Z ) = k, λ 1 (Z ) = · · · = λ k (Z ) = 1, Z1 = 1, Z ≥ 0}
min hD, Zi
subject to {Tr(Z) = k, Z 0, Z1 = 1, Z ≥ 0}
min hD, Zi
subject to {Tr(Z) = k, Z 0, Z1 = 1, Z ≥ 0}
Mentioned papers:
1. Relax, no need to round: integrality of clustering formulations
with P. Awasthi, A. Bandeira, M. Charikar, R. Krishnaswamy,
and S. Villar. ITCS, 2015.
2. Stability of an SDP relaxation of k-means. D. Mixon, S. Villar,
R. Ward. Preprint, 2016.