Segmentation Using Superpixels: A Bipartite Graph Partitioning Approach
Figure 2. The proposed bipartite graph model with K over-segmentations of an image. A black dot denotes a pixel while a red square denotes a superpixel.

has a distinct bipartite structure, which allows highly efficient graph partitioning as shown later. Besides, our graph model is much sparser, because pixels are connected only to superpixels, while in [15] and [5] neighboring pixels are also connected to each other. Common to the three methods is that they all incorporate long-range grouping cues.

3.2. Bipartite Graph Partitioning

Given the above bipartite graph G = {X, Y, B}, the task is to partition it into k groups, where k is manually specified. Each group includes a subset of pixels and/or superpixels, and the pixels from one group form a segment. Various techniques can be employed for such a task. Here we use spectral clustering [26, 22], since it has been successfully applied to a variety of applications including image segmentation [26].

Spectral clustering captures the essential cluster structure of a graph using the spectrum of the graph Laplacian. Mathematically, it solves the generalized eigen-problem [26]

    L f = γ D f,                                  (4)

where L := D − W is the graph Laplacian and D := diag(W 1) is the degree matrix, with W denoting the affinity matrix of the graph and 1 a vector of ones of appropriate size. For a k-way partition, the bottom^3 k eigenvectors are computed. Clustering is then performed by applying k-means [22] or a discretization technique [30].

^3 Bottom (top) eigenvectors refer to the ones with the smallest (largest) eigenvalues. Similar arguments apply to singular vectors.

To solve (4), one can simply treat the bipartite graph constructed above as a general sparse^4 graph and apply the Lanczos method [13] to W̄ := D^{−1/2} W D^{−1/2}, the normalized affinity matrix, which takes O(k(N_X + N_Y)^{3/2}) running time empirically [26].

^4 In our bipartite graph model, a pixel is connected to only K superpixels for K over-segmentations of an image. Given K ≪ (N_X + N_Y), the graph {V, W} is actually highly sparse. In our experiments in Section 4, K = 5 or 6, and N_X + N_Y > 481 × 321 = 154,401.

Alternatively, it was shown in the literature of document clustering that the bottom k eigenvectors of (4) can be derived from the top k left and right singular vectors of the

3.3. Transfer Cuts

The above methods do not exploit the structure that the bipartite graph can be unbalanced, i.e., N_X ≠ N_Y. In our case, N_X = N_Y + |I|, and |I| ≫ N_Y in general. Thus we have N_X ≫ N_Y. This unbalance can be translated into the efficiency of partial SVDs without losing accuracy. One way is by using the dual property of SVD that a left singular vector can be derived from its right counterpart, and vice versa^5. In this paper, we pursue a "sophisticated" path which not only exploits such unbalance but also sheds light on spectral clustering when operated on bipartite graphs. Specifically, we reveal an essential equivalence between the eigen-problem (4) on the original bipartite graph and the one on a much smaller graph over superpixels only:

    L_Y v = λ D_Y v,                              (5)

where L_Y = D_Y − W_Y, D_Y = diag(B^⊤ 1), and W_Y = B^⊤ D_X^{−1} B. L_Y is exactly the Laplacian of the graph G_Y = {Y, W_Y}, because D_Y = diag(B^⊤ 1) = diag(W_Y 1). It should be noted that our analysis in this section applies to spectral clustering on any bipartite graph.

^5 Let v_i and u_i be the i-th right and left singular vectors of B̄, with singular value λ_i. Then B̄ v_i = λ_i u_i. The top k right singular vectors v_1, ..., v_k are exactly the top k eigenvectors of B̄^⊤ B̄, which can be computed efficiently by the Lanczos method, or even by a full eigenvalue decomposition given the much smaller size (N_Y × N_Y) of B̄^⊤ B̄.

Our main result states that the bottom k eigenvectors of (4) can be obtained from the bottom k eigenvectors of (5), which can be computed efficiently given the much smaller scale of (5). Formally, we have the following Theorem 1.

Theorem 1. Let {(λ_i, v_i)}_{i=1}^k be the bottom k eigenpairs of the eigen-problem (5) on the superpixel graph G_Y = {Y, B^⊤ D_X^{−1} B}, with 0 = λ_1 ≤ ··· ≤ λ_k < 1. Then {(γ_i, f_i)}_{i=1}^k are the bottom k eigenpairs of the eigen-problem (4) on the bipartite graph G = {X, Y, B}, where 0 ≤ γ_i < 1, γ_i(2 − γ_i) = λ_i, and f_i = (u_i^⊤, v_i^⊤)^⊤ with u_i = (1/(1 − γ_i)) P v_i. Here P := D_X^{−1} B is the transition probability matrix from X to Y.

Proof. See the Appendix.

It is interesting to note that, if B is by construction a transition probability matrix from Y to X (i.e., non-negative and the sum of the entries in each column is 1), then the in-

Table 1. Complexities of eigen-solvers on bipartite graphs

    Algorithm | Complexity
    ----------|--------------------------------------------
    Lanczos   | O(k(N_X + N_Y)^{3/2})
    SVD [31]  |
    Ours      | 2k(1 + d_X)N_X operations + O(k N_Y^{3/2})

[Figure: running time (seconds) of Ours, SVD, and Lanczos eigen-solvers versus (a) the number of rows, 10,000 to 90,000, and (b) the number of columns, 50 to 2000.]
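Theorem 1 amounts to a direct recipe for the transfer cut: solve the small eigen-problem (5) on the superpixel graph, map each λ_i back to γ_i via γ(2 − γ) = λ (i.e., γ = 1 − √(1 − λ)), and lift v_i to the pixel side with u_i = P v_i / (1 − γ_i). A minimal NumPy/SciPy sketch of this procedure (the function and variable names are ours, not the paper's; following footnote 5's remark, the small problem is solved with a full dense eigendecomposition):

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import eigh

def transfer_cut_eigenvectors(B, k):
    """Bottom-k eigenpairs of L f = gamma * D f (problem (4)) on the
    bipartite graph G = {X, Y, B}, computed via the superpixel problem (5)."""
    B = sp.csr_matrix(B, dtype=float)
    d_x = np.asarray(B.sum(axis=1)).ravel()      # D_X = diag(B 1)
    d_y = np.asarray(B.sum(axis=0)).ravel()      # D_Y = diag(B^T 1)
    P = sp.diags(1.0 / d_x) @ B                  # P = D_X^{-1} B, X -> Y
    W_Y = (B.T @ P).toarray()                    # W_Y = B^T D_X^{-1} B
    L_Y = np.diag(d_y) - W_Y                     # superpixel Laplacian

    # Solve (5): L_Y v = lambda * D_Y v. N_Y is small, so a full dense
    # generalized eigendecomposition is affordable (cf. footnote 5).
    lam, V = eigh(L_Y, np.diag(d_y))
    lam, V = lam[:k], V[:, :k]                   # bottom k eigenpairs

    # Map lambda -> gamma via gamma*(2 - gamma) = lambda, 0 <= gamma < 1.
    gamma = 1.0 - np.sqrt(np.clip(1.0 - lam, 0.0, None))
    # Lift to the pixel side: u_i = P v_i / (1 - gamma_i)  (Theorem 1).
    U = (P @ V) / (1.0 - gamma)
    F = np.vstack([U, V])                        # f_i = (u_i^T, v_i^T)^T
    return gamma, F
```

Only the N_Y × N_Y problem is ever decomposed; the pixel-side eigenvector components come from one sparse matrix-vector product per eigenpair, which is the source of the complexity advantage in Table 1.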
Figure 6. Segmentation with different combinations of superpixels. (a) Input image. (b) SAS(1, MS). (c) SAS(2, MS). (d) SAS(3, MS). (e) SAS(1, FH2). (f) SAS(2, FH2). (g) SAS(3, FH2). (h) SAS(1, MS & FH2). (i) SAS(2, MS & FH2). (j) SAS(3, MS & FH2). (k) MLSS. (l) TBES. Notations: SAS(i, MS) denotes SAS using the first i layers of superpixels of Mean Shift; SAS(i, MS & FH2) denotes SAS using both the first i layers of superpixels of Mean Shift and FH2.
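The SAS(i, ·) variants above differ only in which superpixel layers are stacked into the bipartite graph. A sketch of assembling B from K over-segmentation label maps, assuming unit pixel-superpixel edge weights and contiguous integer labels per layer (the paper's actual edge weights are defined elsewhere and may differ; the function name is ours):

```python
import numpy as np
import scipy.sparse as sp

def build_bipartite_B(label_maps, weight=1.0):
    """Stack K over-segmentations into one pixel-superpixel matrix B of
    shape (N_X pixels, N_Y superpixels). Each pixel is connected to exactly
    one superpixel per layer, hence to K superpixels in total (footnote 4).
    Assumes each label map uses contiguous integer labels 0..max."""
    n_pixels = label_maps[0].size
    rows, cols, offset = [], [], 0
    for lab in label_maps:
        flat = lab.ravel()
        rows.append(np.arange(n_pixels))
        cols.append(flat + offset)       # this layer's superpixel ids
        offset += int(flat.max()) + 1    # layers get disjoint id ranges
    rows, cols = np.concatenate(rows), np.concatenate(cols)
    data = np.full(rows.size, float(weight))
    return sp.csr_matrix((data, (rows, cols)), shape=(n_pixels, offset))
```

Each pixel row then has exactly K nonzeros, matching the sparsity argument of footnote 4.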
in terms of both quantitative and perceptual criteria. Future work should study the selection of superpixels more systematically and the incorporation of high-level cues.

Acknowledgments. This work is supported in part by Office of Naval Research (ONR) grant #N00014-10-1-0242.

Appendix. To prove Theorem 1, we need several results. The next lemma describes the distribution of the eigenvalues of (4).

Lemma 1. The eigenvalues of (4) are in [0, 2]. Particularly, if γ is its eigenvalue, so is 2 − γ.

Proof. From (4), we have D^{−1/2} L D^{−1/2} (D^{1/2} f) = γ (D^{1/2} f), meaning that γ is an eigenvalue of (4) iff it is an eigenvalue of the normalized graph Laplacian L̄ := D^{−1/2} L D^{−1/2}, whose eigenvalues are in [0, 2] [3]. Direct calculation shows that if (γ, f) is an eigenpair of (4), so is (2 − γ, (f|_X^⊤, −f|_Y^⊤)^⊤).

By Lemma 1, the eigenvalues of (4) are distributed symmetrically w.r.t. 1, in [0, 2], implying that half of the eigenvalues are not larger than 1. Therefore, it is reasonable to assume that the k smallest eigenvalues are less than 1, as k ≪ (N_X + N_Y)/2 in general. The following lemma will be used later to establish the correspondence between the eigenvalues of (4) and (5).

Lemma 2. Given 0 ≤ λ < 1, the value of γ satisfying 0 ≤ γ < 1 and γ(2 − γ) = λ exists and is unique. Such γ increases along with λ.

Proof. Define h(γ) := γ(2 − γ), γ ∈ [0, 1]. Then h(·) is continuous, and h(0) = 0, h(1) = 1. Since 0 ≤ λ < 1, there must exist 0 ≤ γ < 1 such that γ(2 − γ) = λ. The uniqueness comes from the strictly increasing monotonicity of h(·) in [0, 1]. For the same reason, γ satisfying γ(2 − γ) = λ increases when λ increases.

The following theorem states that an eigenvector of (4), if confined to Y, will be an eigenvector of (5).

Theorem 2. If Lf = γDf, then L_Y v = λ D_Y v, where λ := γ(2 − γ) and v := f|_Y.

Proof. Denote u := f|_X. From Lf = γDf, we have (a) D_X u − Bv = γ D_X u and (b) D_Y v − B^⊤ u = γ D_Y v. From (a), we have u = (1/(1 − γ)) D_X^{−1} B v. Substituting it into (b) yields (D_Y − B^⊤ D_X^{−1} B)v = γ(2 − γ) D_Y v, namely, L_Y v = λ D_Y v.
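Lemma 1 and Theorem 2 are easy to check numerically on a small bipartite graph: the spectrum of (4) should lie in [0, 2] and be symmetric about 1, and restricting an eigenvector of (4) to Y should solve (5) with λ = γ(2 − γ). A quick NumPy check on a toy graph of our own construction (for illustration only):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
NX, NY = 30, 6
B = rng.uniform(0.1, 1.0, size=(NX, NY))   # dense toy bipartite weights

# Full-graph quantities for eigen-problem (4): L f = gamma * D f.
W = np.block([[np.zeros((NX, NX)), B], [B.T, np.zeros((NY, NY))]])
D = np.diag(W.sum(axis=1))
gammas, F = eigh(D - W, D)                 # all generalized eigenpairs

# Lemma 1: eigenvalues lie in [0, 2] and are symmetric about 1.
assert gammas.min() > -1e-10 and gammas.max() < 2 + 1e-10
assert np.allclose(np.sort(gammas), np.sort(2 - gammas))

# Theorem 2: v = f|_Y satisfies L_Y v = gamma*(2 - gamma) * D_Y v.
D_X, D_Y = np.diag(B.sum(axis=1)), np.diag(B.sum(axis=0))
L_Y = D_Y - B.T @ np.linalg.inv(D_X) @ B
gamma, f = gammas[1], F[:, 1]              # a nontrivial eigenpair, gamma < 1
v = f[NX:]
lam = gamma * (2 - gamma)
assert np.allclose(L_Y @ v, lam * (D_Y @ v))
```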
The converse of Theorem 2 is also true, namely, an eigenvector of (5) can be extended to be one of (4), as detailed in the following theorem.

Theorem 3. Suppose L_Y v = λ D_Y v with 0 ≤ λ < 1. Then Lf = γDf, where γ satisfies 0 ≤ γ < 1 and γ(2 − γ) = λ, and f = (u^⊤, v^⊤)^⊤ with u = (1/(1 − γ)) D_X^{−1} B v.

Proof. First, by Lemma 2, γ exists and is unique. To prove Lf = γDf, it suffices to show (a) D_X u − Bv = γ D_X u and (b) D_Y v − B^⊤ u = γ D_Y v. (a) holds because u = (1/(1 − γ)) D_X^{−1} B v is known. Since L_Y v = λ D_Y v, λ = γ(2 − γ), L_Y = D_Y − B^⊤ D_X^{−1} B, u = (1/(1 − γ)) D_X^{−1} B v, and γ ≠ 1, we have γ(2 − γ) D_Y v = L_Y v = (D_Y − B^⊤ D_X^{−1} B)v = D_Y v − (1 − γ) B^⊤ u ⟺ (1 − γ)^2 D_Y v = (1 − γ) B^⊤ u ⟺ (1 − γ) D_Y v = B^⊤ u, which is equivalent to (b).

Proof of Theorem 1. By Theorem 3, f_i is the eigenvector of (4) corresponding to the eigenvalue γ_i, i = 1, ..., k. By Lemma 2, γ_1 ≤ ··· ≤ γ_k < 1. So our proof is done if the γ_i's are the k smallest eigenvalues of (4). Otherwise, assume (γ_1, ..., γ_k) ≠ (γ_1′, ..., γ_k′), where γ_1′, ..., γ_k′ denote the k smallest eigenvalues of (4) in non-decreasing order, whose corresponding eigenvectors are f_1′, ..., f_k′, respectively. Let γ_j′ be the first of the γ_i′'s such that γ_j′ < γ_j < 1, i.e., γ_i′ = γ_i, ∀i < j. Without loss of generality, assume f_i′ = f_i, ∀i < j. Denote λ* := γ_j′(2 − γ_j′). Then by Lemma 2, λ* < λ_j because γ_j′ < γ_j. By Theorem 2, λ* is an eigenvalue of (5) with eigenvector v* := f_j′|_Y. Note that v* is different from v_i, ∀i < j, showing that (λ*, v*) is the j-th eigenpair of (5), which contradicts the fact that (λ_j, v_j) is the j-th eigenpair of (5).

References

[1] P. Arbeláez, M. Maire, C. Fowlkes, and J. Malik. From contours to regions: An empirical evaluation. In CVPR, 2009.
[2] A. Barbu and S. Zhu. Graph partition by Swendsen-Wang cuts. In ICCV, pages 320–327, 2003.
[3] F. Chung. Spectral Graph Theory. American Mathematical Society, 1997.
[4] D. Comaniciu and P. Meer. Mean shift: A robust approach toward feature space analysis. IEEE Trans. PAMI, pages 603–619, 2002.
[5] T. Cour, F. Benezit, and J. Shi. Spectral segmentation with multiscale graph decomposition. In CVPR, 2005.
[6] Y. Deng and B. Manjunath. Unsupervised segmentation of color-texture regions in images and video. IEEE Trans. PAMI, 23(8):800–810, 2001.
[7] I. Dhillon. Co-clustering documents and words using bipartite spectral graph partitioning. In KDD, 2001.
[8] L. Ding and A. Yilmaz. Interactive image segmentation using probabilistic hypergraphs. Pattern Recognition, 43(5):1863–1873, 2010.
[9] M. Donoser, M. Urschler, M. Hirzer, and H. Bischof. Saliency driven total variation segmentation. In ICCV, 2009.
[10] P. Felzenszwalb and D. Huttenlocher. Efficient graph-based image segmentation. IJCV, 59(2):167–181, 2004.
[11] X. Fern and C. Brodley. Solving cluster ensemble problems by bipartite graph partitioning. In ICML, 2004.
[12] J. Freixenet, X. Muñoz, D. Raba, J. Martí, and X. Cufí. Yet another survey on image segmentation: Region and boundary information integration. In ECCV, 2002.
[13] G. Golub and C. Van Loan. Matrix Computations. Johns Hopkins University Press, 1996.
[14] A. Ion, J. Carreira, and C. Sminchisescu. Image segmentation by figure-ground composition into maximal cliques. In ICCV, 2011.
[15] T. Kim and K. Lee. Learning full pairwise affinities for spectral segmentation. In CVPR, 2010.
[16] G. Lanckriet, N. Cristianini, P. Bartlett, L. Ghaoui, and M. Jordan. Learning the kernel matrix with semidefinite programming. JMLR, 5:27–72, 2004.
[17] W. Liu, J. He, and S. Chang. Large graph construction for scalable semi-supervised learning. In ICML, 2010.
[18] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ICCV, pages 416–423, 2001.
[19] M. Meila. Comparing clusterings: An axiomatic view. In ICML, pages 577–584, 2005.
[20] B. Mičušík and A. Hanbury. Automatic image segmentation by positioning a seed. In ECCV, pages 468–480, 2006.
[21] H. Mobahi, S. Rao, A. Yang, S. Sastry, and Y. Ma. Segmentation of natural images by texture and boundary compression. IJCV, pages 1–13, 2011.
[22] A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, 2001.
[23] D. Opitz and R. Maclin. Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research, 11(1):169–198, 1999.
[24] X. Ren and J. Malik. Learning a classification model for segmentation. In ICCV, pages 10–17, 2003.
[25] E. Sharon, A. Brandt, and R. Basri. Fast multiscale image segmentation. In CVPR, pages 70–77, 2000.
[26] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. PAMI, 22(8):888–905, 2000.
[27] R. Unnikrishnan, C. Pantofaru, and M. Hebert. Toward objective evaluation of image segmentation algorithms. IEEE Trans. PAMI, 29(6):929–944, 2007.
[28] J. Wang, Y. Jia, X. Hua, C. Zhang, and L. Quan. Normalized tree partitioning for image segmentation. In CVPR, 2008.
[29] A. Yang, J. Wright, Y. Ma, and S. Sastry. Unsupervised segmentation of natural images via lossy data compression. CVIU, 110(2):212–225, 2008.
[30] S. X. Yu and J. Shi. Multiclass spectral clustering. In ICCV, pages 313–319, 2003.
[31] H. Zha, X. He, C. Ding, H. Simon, and M. Gu. Bipartite graph partitioning and data clustering. In CIKM, 2001.