

Bipartite mixed membership stochastic blockmodel

Huan Qing qinghuan@cumt.edu.cn


Department of Mathematics
China University of Mining and Technology
Xuzhou, 221116, P.R. China

Jingli Wang jlwang@nankai.edu.cn


School of Statistics and Data Science
Nankai University
Tianjin, 300071, P.R. China

Editor: **

Abstract


The mixed membership problem for undirected networks has been well studied in network
analysis in recent years. However, the more general case of mixed membership for directed
networks remains a challenge. Here, we propose an interpretable model, the bipartite mixed
membership stochastic blockmodel (BiMMSB for short), for directed mixed membership networks.
BiMMSB allows the row nodes and column nodes of the adjacency matrix to be different,
and these nodes may have distinct community structures in a directed network. We also
develop an efficient spectral algorithm called BiMPCA to estimate the mixed memberships
for both row nodes and column nodes in a directed network. We show that the approach
is asymptotically consistent under BiMMSB. We demonstrate the advantages of BiMMSB
with applications to a small-scale simulation study, the directed Political blogs network
and the Papers Citations network.
Keywords: bipartite networks, spectral clustering, mixed memberships, the general
Davis-Kahan theorem, rectangular Bernstein inequality

1. Introduction
Networks with meaningful structures are ubiquitous in our daily life in the big data era.
For example, the social networks generated by social platforms (such as Facebook, Twitter,
WeChat, Instagram, WhatsApp, Line, etc.) provide relationships or friendships among
users; the protein-protein interaction networks record the relationships among proteins;
and the citation networks reflect authors' research preferences (Dunne et al., 2002; Newman,
2004; Notebaart et al., 2006; Pizzuti, 2008; Gao et al., 2010; Lin et al., 2012; Su et al.,
2010; Scott and Carrington, 2014; Bedi and Sharma, 2016; Wang et al., 2020). To analyze
networks mathematically, researchers represent them as graphs in which subjects/individuals
are represented by nodes, and the relationships are captured by the edges,
directions of edges and weights. Community detection is one of the major tools to extract
structural information from these networks.


For simplicity, most researchers study undirected networks for community detection,
such as Lancichinetti and Fortunato (2009); Goldenberg et al. (2010); Karrer and Newman
(2011); Qin and Rohe (2013); Jin (2015); Chen et al. (2018); Jing et al. (2021). The Stochastic
Blockmodel (SBM) (Holland et al., 1983) is a classical and widely used model to generate
undirected networks. SBM assumes that each node belongs to exactly one community and that the
probability of a link between two nodes depends only on the community memberships of
the two nodes. SBM also assumes that the nodes within each community have the same expected
degrees. However, in real cases some nodes may be shared among multiple communities
with different weights, which leads to mixed membership (also known as overlapping)
networks. Airoldi et al. (2008) extended SBM to mixed membership networks and designed
the Mixed Membership Stochastic Blockmodel (MMSB). Many algorithms have been
developed based on MMSB, such as Gopalan and Blei (2013); Jin et al. (2017); Mao et al.
(2017, 2020); Zhang et al. (2020).
Directed networks such as citation networks, protein-protein interaction networks and
the hyperlink network of websites are also common in our life. Such directed networks
are more complex since they often involve two types of information, sending nodes and
receiving nodes. For instance, in a citation network, one paper may cite many other papers,
then this paper can be labeled as ‘sending node’ and these cited papers can be labeled
as ‘receiving nodes’. Several interesting works have been developed for directed networks.
Rohe et al. (2016) proposed a model called Stochastic co-Blockmodel (ScBM) to model
networks with directed (asymmetric) relationships where nodes have no mixed memberships
(i.e., one node only belongs to one community). Wang et al. (2020) studied the theoretical
guarantee for the algorithm D-SCORE (Ji and Jin, 2016) which is designed based on the
degree-corrected version of ScBM. Lim et al. (2018) proposed a flexible noise tolerant graph
clustering formulation based on non-negative matrix factorization (NMF), which solves
graph clustering such as community detection for either undirected or directed graphs. In
the bipartite setting, some authors have constructed new models by extending SBM, such as
Zhou and Amini (2018) and Razaee et al. (2019).
However, all the existing models and algorithms for directed network community detection
focus on non-mixed membership networks. As far as we know, there is no literature at
present that provides a general model for directed networks with mixed memberships.
Yet, similar to the undirected case, in reality there exist many directed networks
whose sending nodes and/or receiving nodes may belong to multiple clusters.
In this paper, we focus on directed networks with mixed memberships. Our contributions
are as follows:

(i) We propose a novel generative model for directed networks with mixed member-
ship, the Bipartite Mixed Membership Stochastic Blockmodel (BiMMSB for short).
BiMMSB allows that nodes in a directed network can belong to multiple communi-
ties. The proposed model also allows that sending nodes and receiving nodes can be
different, that is, the adjacency matrix can be a non-square matrix. We also prove
the identifiability of BiMMSB under certain constraints.

(ii) We construct a fast spectral algorithm Bipartite Mixed Principal Component Analysis
(BiMPCA for short) to fit BiMMSB. We show that our method produces asymptoti-
cally consistent parameter estimates.


Notations. We adopt the following general notations in this paper. For a vector x and
fixed q > 0, ‖x‖q denotes its ℓq-norm. We occasionally drop the subscript when q = 2. For
a matrix M , M ′ denotes the transpose of M , ‖M ‖ denotes the spectral
norm, and ‖M ‖F denotes the Frobenius norm. Let σi (M ) be the i-th largest singular value
of M , and let λi (M ) denote the i-th largest eigenvalue of M ordered by
magnitude. M (i, :) and M (:, j) denote the i-th row and the j-th column of M ,
respectively. M (Sr , :) and M (:, Sc ) denote the rows and columns of M in the index sets Sr and Sc ,
respectively. For any matrix M , we simply use Y = max(0, M ) to denote
Yij = max(0, Mij ) for all i, j.

2. The bipartite mixed membership stochastic blockmodel


We introduce the bipartite mixed membership stochastic blockmodel (BiMMSB for short).
First we define a bi-adjacency matrix A ∈ {0, 1}nr ×nc such that for each entry, A(i, j) = 1
if there is a directional edge from node i to node j, and A(i, j) = 0 otherwise, where nr
and nc indicate the number of rows and the number of columns, respectively (similar notation
is used in what follows), and they are not necessarily equal. So, the i-th row of A records how node i
sends edges, and the j-th column of A records how node j receives edges. Let Sr = {i :
i is a row node, 1 ≤ i ≤ nr }, and Sc = {j : j is a column node, 1 ≤ j ≤ nc }, then Sr is the
set of all row nodes in A and Sc is the set of all column nodes in A. Since row nodes can be
different from the column nodes, we may have Sr ∩ Sc = ∅, where ∅ denotes the null set.
We assume that the row nodes of A belong to K perceivable disjoint communities (called row
communities in this paper)

Cr(1) , Cr(2) , . . . , Cr(K) , (1)

and the column nodes of A belong to K perceivable disjoint communities (called column
communities in this paper)

Cc(1) , Cc(2) , . . . , Cc(K) . (2)

Define an nr ×K row nodes membership matrix Πr and an nc ×K column nodes membership


matrix Πc such that Πr (i, :) is a 1 × K Probability Mass Function (PMF) for row node i
and

P(row node i belongs to Cr(k) ) = Πr (i, k), 1 ≤ k ≤ K, 1 ≤ i ≤ nr , (3)

and Πc (j, :) is a 1 × K Probability Mass Function (PMF) for column node j, and

P(column node j belongs to Cc(l) ) = Πc (j, l), 1 ≤ l ≤ K, 1 ≤ j ≤ nc . (4)

We call row node i 'pure' if Πr (i, :) degenerates (i.e., one entry is 1 and the other K − 1 entries
are 0) and 'mixed' otherwise. The same definitions hold for column nodes.
Define a mixing matrix P ∈ RK×K , which is a nonnegative matrix such that for any 1 ≤
k, l ≤ K,

P (k, l) ∈ [0, 1]. (5)



Figure 1: Two schematic diagrams for BiMMSB.

Note that since we consider directed networks in this paper, P may be asymmetric.
For any fixed pair of (i, j), BiMMSB assumes that

P(A(i, j) = 1|i ∈ Cr(k) & j ∈ Cc(l) ) = P (k, l). (6)

Then, for all pairs of (i, j) with 1 ≤ i ≤ nr , 1 ≤ j ≤ nc , the A(i, j) are Bernoulli random
variables that are independent of each other, satisfying

P(A(i, j) = 1) = Σ_{k=1}^{K} Σ_{l=1}^{K} Πr (i, k)Πc (j, l)P (k, l). (7)

Definition 1 Call (1)-(7) the Bipartite Mixed Membership Stochastic Blockmodel (BiMMSB)
and denote it by BiM M SB(nr , nc , K, P, Πr , Πc ).
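To make the sampling scheme in (6)-(7) concrete, the following minimal Python sketch draws one adjacency matrix from BiM M SB(nr , nc , K, P, Πr , Πc ); the function name and its interface are illustrative, not part of the model definition.

    import numpy as np

    def sample_bimmsb(Pi_r, Pi_c, P, rng=None):
        """Draw one adjacency matrix A from BiMMSB given the n_r x K row
        membership matrix Pi_r, the n_c x K column membership matrix Pi_c,
        and the K x K mixing matrix P."""
        rng = np.random.default_rng() if rng is None else rng
        # Expected adjacency matrix Pi_r P Pi_c'; its (i, j) entry equals
        # P(A(i, j) = 1) in (7).
        Omega = Pi_r @ P @ Pi_c.T
        # Independent Bernoulli entries with success probabilities Omega(i, j).
        return (rng.random(Omega.shape) < Omega).astype(int)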

BiMMSB can be deemed as an extension of some previous models.

• When all row nodes and column nodes are pure, our BiMMSB reduces to the ScBM of
Rohe et al. (2016).

• When nr = nc , P = P ′ , BiMMSB reduces to MMSB (Airoldi et al., 2008).

• When nr = nc , P = P ′ , and all row nodes and column nodes are pure, BiMMSB
reduces to SBM (Holland et al., 1983).

BiMMSB can model various networks, and its generality can be concisely
explained by the two schematic diagrams in Figure 1, where there is an edge if there is an
arrow from one node to another node in the graph, nodes in the same cluster are enclosed
by a dashed circle, and nodes drawn in black have mixed memberships. In panel (a) of Figure 1,


row nodes and column nodes are the same, and there are 7 nodes in this network (i.e.,
A ∈ R7×7 ): nodes a, b, c, d belong to row cluster 1 and also to column cluster
1, and nodes e, f, g belong to row cluster 2 and column cluster 2. Since nodes c and d point
to node e, and node e points to nodes c and d, these three nodes c, d, e have mixed memberships.
In panel (b), row nodes are different from column nodes. There are 10 row nodes, where
nodes represented by solid circles belong to row cluster 1 and nodes represented by solid squares belong
to row cluster 2. There are 9 column nodes, where nodes represented by solid triangles belong
to column cluster 1 and nodes represented by solid stars belong to column cluster 2. The bipartite
adjacency matrix A in panel (b) is a 10 × 9 matrix, whose row nodes are different from its
column nodes. Meanwhile, for the row nodes, since the black circles and the black squares
point to the black triangle node and the black star node, they belong to row cluster 1 and
row cluster 2 simultaneously (i.e., they are mixed nodes). Since the black triangle node and the
black star node are pointed to by mixed nodes, they are treated as mixed column nodes.

2.1 Identifiability
The parameters in the BiMMSB model obviously need to be constrained to guarantee
identifiability of the model. The identifiability problem is important, since if a
model is not identifiable, methods designed based on it may lead to unreliable
results. For example, Mao et al. (2020), Jin et al. (2017) and Zhang et al. (2020) all studied
the identifiability of their proposed models. Therefore it is necessary to study the
identifiability of BiMMSB. As for all models with communities, BiMMSB is considered identifiable if it
is identifiable up to a permutation of community labels. The following conditions are
sufficient for the identifiability of BiMMSB:

• (I1) rank(P ) = K.

• (I2) There is at least one pure node for each of the K row and K column communities.

For convenience, from now on we assume that conditions (I1) and (I2) hold by default in
this paper. Now we decompose A into a sum of a 'signal' part and a 'noise' part:

A = Ω + W,

where the nr × nc matrix Ω is the expectation of the adjacency matrix A, and W is a


generalized Wigner matrix. Then, under BiMMSB, we have

Ω = Πr P Π′c . (8)

We refer to Ω as the population adjacency matrix. By basic algebra, Ω has rank K. The fact
that Ω is a low-rank matrix (K < min{nr , nc }) is the key to why spectral clustering works
for BiMMSB.
The next theorem guarantees that once conditions (I1) and (I2) hold, BiMMSB is identifiable.

Theorem 2 If conditions (I1) and (I2) hold, BiMMSB is identifiable, i.e., if a given matrix
Ω corresponds to a set of parameters (nr , nc , K, P, Πr , Πc ) through (8), these parameters are
unique up to a permutation of community labels.


3. A spectral algorithm for fitting BiMMSB


The primary goal of the proposed algorithm is to estimate the row membership matrix Πr
and the column membership matrix Πc from the observed adjacency matrix A with a given K
(see footnote 1). Considering computational scalability, we focus on the idea of spectral clustering via
spectral decomposition to design an efficient algorithm under BiMMSB in this paper.


We now discuss our intuition for the design of our algorithm. Under conditions (I1)
and (I2), by basic algebra, we have rank(Ω) = K, which is much smaller than min{nr , nc }.
Let Ω = U ΛV ′ be the compact singular value decomposition of Ω, where U ∈ Rnr ×K , Λ ∈
RK×K , V ∈ Rnc ×K , U ′ U = I, V ′ V = I, and I is a K × K identity matrix. Define a
K × K matrix Fr as Fr = P Π′c Ω′ Πr and a K × K matrix Fc as Fc = P ′ Π′r ΩΠc . Note that
rank(Fr ) = K and rank(Fc ) = K. Then we have the following lemma.
Lemma 3 Under BiM M SB(nr , nc , K, P, Πr , Πc ), assume that each row (column) community
has at least one pure node (i.e., the set {1 ≤ i ≤ nr : Πr (i, k) = 1} ≠ ∅ for all
1 ≤ k ≤ K, and the set {1 ≤ j ≤ nc : Πc (j, l) = 1} ≠ ∅ for all 1 ≤ l ≤ K). Then there exist a
unique K × K matrix Br and a unique K × K matrix Bc such that for 1 ≤ k, l ≤ K,

• U = Πr Br , and the k-th column of Br is the k-th right eigenvector of Fr , and σk²(Ω)
is the k-th eigenvalue of Fr .

• V = Πc Bc , and the l-th column of Bc is the l-th right eigenvector of Fc , and σl²(Ω) is
the l-th eigenvalue of Fc .

Lemma 3 says that the population left-singular vectors U lie on a rotated and scaled simplex,
and similarly for the population right-singular vectors V . Furthermore, since Br and Bc are full-rank
matrices, if U, V, Br and Bc were ideally known in advance, we could exactly recover Πr
and Πc by setting Πr = U Br′ (Br Br′ )−1 and Πc = V Bc′ (Bc Bc′ )−1 . However, if we only have
U and V at hand and do not know Br and Bc in advance, it seems that we cannot exactly
recover Πr and Πc . Luckily, this problem is solved by Lemma 4. The lemma says that each
row of U is a convex linear combination of the K rows of Br , hence all rows of U lie in the
simplex formed by the K rows of Br . Therefore, we can exactly obtain Br by a convex
hull algorithm. Similarly, we can obtain Bc from V .

Lemma 4 (Ideal BiMPCA) Under BiM M SB(nr , nc , K, P, Πr , Πc ), assume that each row
(column) community has at least one pure node. Then we have

• U (i, :) is a convex linear combination of Br (1, :), Br (2, :), . . . , Br (K, :) for 1 ≤ i ≤ nr .
All rows of U form a K-simplex in RK which is called the Row Ideal Simplex (RIS) with
Br (1, :), Br (2, :), . . . , Br (K, :) being vertices. V (j, :) is a convex linear combination of
Bc (1, :), Bc (2, :), . . . , Bc (K, :) for 1 ≤ j ≤ nc . All the rows of V form a K-simplex in
RK which is called the Column Ideal Simplex (CIS) with Bc (1, :), Bc (2, :), . . . , Bc (K, :)
being vertices.

• If Πr (i, :) = Πr (ī, :) for any two distinct row nodes i and ī, then U (i, :) = U (ī, :).
Furthermore, if row node i is pure, U (i, :) falls exactly on one of the vertices of the


RIS. If row node i is mixed, U (i, :) is in the interior or on a face of the RIS, but not on
any of the vertices. Similar conclusions hold for column nodes.

• Given U and V , we can obtain Br and Bc by a convex hull algorithm. Hence, the row
membership matrix Πr and the column membership matrix Πc can be exactly recovered
from U and V by setting Πr = U Br′ (Br Br′ )−1 and Πc = V Bc′ (Bc Bc′ )−1 .

1. The estimation of P and K may also be of interest; we mainly focus on estimating memberships in this paper.

To illustrate the RIS and CIS, we present some plots in Figure 2. The results in
panels (a) and (b) support the first two statements in Lemma 4: if node i is pure,
then U (i, :) falls on a vertex of the RIS, and if node i is mixed, then U (i, :) falls in the
interior of the RIS. Similar arguments hold for V . In panels (c)-(h), we plot Û and V̂ under
different settings of the number of pure nodes in row clusters and column clusters, where
the data is generated according to a BiMMSB from Experiment 1. In panels (c)-(h) of
Figure 2, we also plot the B̂r and B̂c obtained by applying the successive projection algorithm
(Gillis and Vavasis, 2015) to Û and V̂ , respectively. From the results in panels (c)-(e),
we find that points in Û generated from the same row cluster are always much closer to each other
than to row nodes from different row clusters. Meanwhile, as the number of pure row nodes
nr,0 in each row cluster increases, the number of points falling in the interior of the triangle
decreases, which is consistent with the first statement of Lemma 4: if fewer row
nodes are mixed, then fewer rows of U fall in the interior of the RIS.
Similar arguments hold for V̂ .
Based on the above analysis, we are now ready to give the following three-stage algorithm,
which we call Ideal BiMPCA. Input: Ω. Output: Πr and Πc .

• PCA step. Obtain the left singular vectors U ∈ Rnr ×K and the right singular vectors
V ∈ Rnc ×K from Ω.

• Vertex Hunting (VH) step. Run a convex hull algorithm on rows of U to obtain
Br and on rows of V to obtain Bc .

• Membership Reconstruction (MR) step. Recover Πr and Πc by setting Πr =
U Br′ (Br Br′ )−1 and Πc = V Bc′ (Bc Bc′ )−1 .

The process of obtaining Br from U (or obtaining Bc from V ) is known as vertex hunting
in Jin et al. (2017). Meanwhile, Jin et al. (2017) suggests several efficient vertex hunting
algorithms, and we choose the successive projection (SP for short) algorithm in this paper
for hunting vertices from U and V as well as from their empirical versions Û and V̂ (defined in
Algorithm 1). The SP algorithm was originally proposed in Gillis and Vavasis (2015), and
the details of SP can also be found in Jin et al. (2017). In fact, the SP algorithm is also
applied in Mao et al. (2020) for the vertex hunting step under the MMSB of Airoldi et al.
(2008). Readers can also apply the other VH algorithms provided in Jin et al. (2017) to find
B̂r and B̂c for our BiMPCA algorithm.
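For completeness, a minimal sketch of the successive projection idea of Gillis and Vavasis (2015) as used for vertex hunting here: repeatedly take the row of largest norm as a vertex estimate and project the remaining rows onto the orthogonal complement of that row. This is a simplified rendering under the assumption that K does not exceed the rank of the input, not the exact implementation used in the experiments.

    import numpy as np

    def successive_projection(X, K):
        """Return indices of K rows of X (rows of U-hat or V-hat) selected
        as estimated simplex vertices by successive projection."""
        R = np.array(X, dtype=float)
        vertices = []
        for _ in range(K):
            j = int(np.argmax(np.linalg.norm(R, axis=1)))   # row with the largest norm
            vertices.append(j)
            u = R[j] / np.linalg.norm(R[j])                  # unit vector of the chosen row
            R = R - np.outer(R @ u, u)                       # project onto its orthogonal complement
        return vertices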
Algorithm 1, called bipartite mixed principal component analysis (BiMPCA for short),
is a natural extension of the Ideal BiMPCA to the real case.
In the MR step, we set the negative entries of Ŷr to 0 by setting Ŷr = max(0, Ŷr ), because
the weights of any row node should be nonnegative while there may exist
some negative entries of Û B̂r′ (B̂r B̂r′ )−1 . Meanwhile, since B̂r has K distinct rows and nr is
always much larger than K, the inverse of B̂r B̂r′ always exists in practice. Similar statements
hold for column nodes.

(a) Ideal U (b) Ideal V

(c) Û when nr,0 = 60 (d) Û when nr,0 = 120 (e) Û when nr,0 = 180

(f) V̂ when nc,0 = 40 (g) V̂ when nc,0 = 100 (h) V̂ when nc,0 = 160

Figure 2: Panel (a) shows the RIS in Experiment 1 when nc,0 = nr,0 = 120, where
nr,0 and nc,0 are the numbers of pure nodes in each row and column cluster, respectively
(black: pure nodes; blue: mixed nodes; each point is a row of U ; many
rows are equal, so a point may represent many rows). All mixed (both row
and column) nodes are evenly distributed into 4 groups, whose PMFs equal
(0.4, 0.4, 0.2), (0.4, 0.2, 0.4), (0.2, 0.4, 0.4) and (1/3, 1/3, 1/3). Panel (b) shows the
CIS with the same setting as (a). Panel (c): each point is a row of Û , the black points
are the vertices obtained by the SP algorithm, and we also add the triangle (solid black)
estimated by the SP algorithm in Experiment 1 when setting nr,0 = 60. Similar
interpretations hold for panels (d)-(h). Since K = 3 in Experiment 1, for visualization
we have projected and rotated these points from R3 to R2 .


Algorithm 1 Bipartite Mixed Principal Component Analysis (BiMPCA)


Require: The adjacency matrix A ∈ Rnr ×nc of a directed network, the number of row
communities (column communities) K.
Ensure: The estimated nr × K row membership matrix Π̂r and the estimated nc × K
column membership matrix Π̂c .
1: PCA step. Compute the left singular vectors Û ∈ Rnr ×K and right singular vectors
V̂ ∈ Rnc ×K of A.
2: Vertex Hunting (VH) step. Apply SP algorithm on Û to obtain K estimated row
cluster centers B̂r (1, :), B̂r (2, :), . . . , B̂r (K, :) ∈ R1×K . We form a K × K matrix B̂r such
that the k-th row of B̂r is B̂r (k, :), 1 ≤ k ≤ K. Similarly, apply SP algorithm on V̂ to
obtain K estimated column cluster centers B̂c (1, :), B̂c (2, :), . . . , B̂c (K, :) ∈ R1×K . Form
a K × K matrix B̂c such that the l-th row of B̂c is B̂c (l, :), 1 ≤ l ≤ K.
3: Membership Reconstruction (MR) step. Project the rows of Û onto the spans of
B̂r (1, :), . . . , B̂r (K, :), i.e., compute the nr × K matrix Ŷr such that Ŷr = Û B̂r′ (B̂r B̂r′ )−1 .
Set Ŷr = max(0, Ŷr ) and estimate Πr (i, :) by Π̂r (i, :) = Ŷr (i, :)/kŶr (i, :)k1 , 1 ≤ i ≤ nr .
Similarly, compute the nc × K matrix Ŷc such that Ŷc = V̂ B̂c′ (B̂c B̂c′ )−1 . Set Ŷc =
max(0, Ŷc ) and estimate Πc (j, :) by Π̂c (j, :) = Ŷc (j, :)/kŶc (j, :)k1 , 1 ≤ j ≤ nc .
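A compact Python rendering of Algorithm 1, assuming the successive_projection sketch above as the VH step; this is a sketch of the three steps rather than a reference implementation, and it presumes that zero-degree nodes have already been removed (see Section 6).

    import numpy as np

    def bimpca(A, K):
        """Estimate the row and column membership matrices under BiMMSB
        following the three steps of Algorithm 1 (BiMPCA)."""
        # PCA step: leading K left and right singular vectors of A.
        U, _, Vt = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
        estimates = []
        for X in (U[:, :K], Vt[:K, :].T):
            # VH step: K estimated cluster centers B-hat via successive projection.
            B_hat = X[successive_projection(X, K), :]                    # K x K
            # MR step: Y-hat = X B-hat'(B-hat B-hat')^{-1}, truncated at 0,
            # then each row normalized so that it is a PMF.
            Y_hat = np.maximum(X @ B_hat.T @ np.linalg.inv(B_hat @ B_hat.T), 0)
            estimates.append(Y_hat / Y_hat.sum(axis=1, keepdims=True))
        return estimates[0], estimates[1]   # Pi_hat_r, Pi_hat_c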


4. Main results for BiMPCA


In this section, we show the consistency of our algorithm for fitting BiMMSB as the
number of row nodes nr and the number of column nodes nc increase. Thus we need to show
that the sample-based estimates Π̂r and Π̂c concentrate around the true mixed membership
matrices Πr and Πc . Throughout this paper, K ≥ 2 is a known integer.
For convenience, set n = nr + nc . First, we bound ‖A − Ω‖ under BiMMSB based on
an application of the rectangular version of the Bernstein inequality in Tropp (2012). This
technique allows us to deal with rectangular random matrices, and this is the cornerstone
that allows our algorithm BiMPCA to fit BiMMSB when nr ≠ nc . This technique also
supports proposing BiMMSB in such a general form that it can even model networks with
different row nodes and column nodes. Set Z ≡ max(nr , nc )/4. We assume that, for sufficiently
large n,

log(n/2)/Z ≤ 1. (9)

This assumption is mild since, as long as min(nr , nc ) is large enough, Z is always larger than
log(n/2). Then we have the following lemma.
Lemma 5 Under BiM M SB(nr , nc , K, P, Πr , Πc ), assume that the identifiability conditions
(I1) and (I2) hold and that assumption (9) holds. Then for sufficiently large n, with probability
at least 1 − o(n−4 ), we have

‖A − Ω‖ ≤ 6√(Z · log(n/2)).


Based on Lemma 5, by the extended Davis-Kahan theorem for general real matrices (Theorem 3
in Yu et al. (2015), which allows us to bound the distance between the singular vectors of two
rectangular matrices), we can obtain the bound between Û and U (as well as the
bound between V̂ and V ), up to an orthogonal matrix. Combining the extended Davis-Kahan
theorem and the rectangular version of the Bernstein inequality, the core foundations
for BiMPCA are already built. Again, these two techniques support the generality of
BiMMSB such that it can model networks with non-square adjacency matrices.
Lemma 6 Under BiM M SB(nr , nc , K, P, Πr , Πc ), assume that the identifiability conditions
(I1) and (I2) hold and that assumption (9) holds. Then for sufficiently large n, with probability
at least 1 − o(n−4 ), there exist two orthogonal matrices Ôr , Ôc ∈ RK×K such that

‖Û Ôr − U ‖F ≤ 24√(2K)(σ1 + 3√(Z · log(n/2)))√(Z · log(n/2)) / σK² ,

‖V̂ Ôc − V ‖F ≤ 24√(2K)(σ1 + 3√(Z · log(n/2)))√(Z · log(n/2)) / σK² ,

where σ1 , σK are the largest and smallest singular values of Ω, respectively.


Lemma 6 bounds the terms obtained in the first step of BiMPCA and of the Ideal BiMPCA.
Next, we move to the second step of BiMPCA, and we aim to bound the distance between B̂r
and Br . Theorem 1 in Gillis and Vavasis (2015) builds a bound on the matrix of vertex centers
obtained by the SP algorithm, which allows us to obtain Lemma 7 directly.
For notational convenience, set

errr = CKκ²(Br )(σ1 + 3√(Z · log(n/2)))√(Z · log(n/2)) / (σK² √nr ),

errc = CKκ²(Bc )(σ1 + 3√(Z · log(n/2)))√(Z · log(n/2)) / (σK² √nc ),

where C is a positive constant and κ(M ) denotes the condition number of any matrix M . Assume that

errr = O(σmin (Br ) / (√K κ²(Br ))), and errc = O(σmin (Bc ) / (√K κ²(Bc ))), (10)

where σmin (M ) denotes the minimum singular value of any matrix M . The next lemma bounds
‖B̂r Ôr − Br ‖F and ‖B̂c Ôc − Bc ‖F .
Lemma 7 Under BiM M SB(nr , nc , K, P, Πr , Πc ), assume that the identifiability conditions
(I1) and (I2) hold and that assumptions (9)-(10) hold. Then for sufficiently large n, with
probability at least 1 − o(n−4 ), we have

‖B̂r Ôr − Br ‖F ≤ errr , ‖B̂c Ôc − Bc ‖F ≤ errc .

Before presenting the next lemma, we first introduce one assumption and several new terms for
notational convenience. Assume that there exist two global constants mBr > 0 and mBc > 0
such that

λmin (Br Br′ ) ≥ mBr and λmin (Bc Bc′ ) ≥ mBc . (11)

Let nkr = |{i : Πr (i, k) = 1, 1 ≤ i ≤ nr }| be the number of pure row nodes in row community
k for 1 ≤ k ≤ K. Let nlc = |{j : Πc (j, l) = 1, 1 ≤ j ≤ nc }| be the number of pure


column nodes in column community l for 1 ≤ l ≤ K. Let nr,min = min1≤k≤K nkr , nc,min =
min1≤l≤K nlc . For convenience, set

Errr = ( 2K^{5/2} errr / (√mBr √(mBr − K(1 + 1/nr,min ) errr )) + K )
       + 24√2 K^{3/2} (σ1 + 3√(Z · log(n/2)))√(Z · log(n/2)) / (mBr σK² √nr,min ),

Errc = ( 2K^{5/2} errc / (√mBc √(mBc − K(1 + 1/nc,min ) errc )) + K )
       + 24√2 K^{3/2} (σ1 + 3√(Z · log(n/2)))√(Z · log(n/2)) / (mBc σK² √nc,min ).

Now we move to the MR step of BiMPCA, where we aim to bound ‖Ŷr − Yr ‖F and ‖Ŷc − Yc ‖F
based on the above lemmas, where Yr and Yc are defined as Yr = max(U Br′ (Br Br′ )−1 , 0), Yc =
max(V Bc′ (Bc Bc′ )−1 , 0).

Lemma 8 Under BiM M SB(nr , nc , K, P, Πr , Πc ), assume that the identifiability conditions
(I1) and (I2) hold and that assumptions (9)-(11) hold. Then for sufficiently large n, with
probability at least 1 − o(n−4 ), we have

‖Ŷr − Yr ‖F ≤ Errr , ‖Ŷc − Yc ‖F ≤ Errc .

The next theorem gives theoretical bounds on ‖Π̂r − Πr ‖F and ‖Π̂c − Πc ‖F , which is the main
theoretical result for our BiMPCA method.

Theorem 9 Under BiM M SB(nr , nc , K, P, Πr , Πc ), set mYr = min1≤i≤nr {‖Yr (i, :)‖} and
mYc = min1≤j≤nc {‖Yc (j, :)‖}. Under the same conditions as in Lemma 8, for sufficiently
large n, with probability at least 1 − o(n−4 ), we have

‖Π̂r − Πr ‖F / √nr ≤ 2Errr / (mYr √nr ), ‖Π̂c − Πc ‖F / √nc ≤ 2Errc / (mYc √nc ).

We can make some comments on the bounds in Theorem 9. With fixed nr and nc , increasing
K gives a larger error bound. If nr and nc are fixed but nr,min decreases, i.e., there are
fewer pure nodes in each row cluster, then Errr increases, which makes the error bound
between Π̂r and Πr increase. Thus, fewer pure nodes in each row cluster leads to a more
challenging case for detecting the row memberships. Furthermore, consider the extreme
case nr,min = 0, i.e., there exists a row cluster which has no pure node; then Errr tends
to infinity, the error bounds for all row clusters are infinite, and hence BiMPCA fails to
detect row clusters in this extreme case. This is why condition (I2) is needed for the
identifiability of BiMMSB. Similar arguments hold if we decrease nc,min . If we fix all other
parameters and change the entries of P such that P is closer to a singular matrix (we
can even consider the extreme case in which P is not of full rank), then both Br
and Bc will be closer to singular matrices. This makes λmin (Br Br′ ) and
λmin (Bc Bc′ ) close to 0, hence mBr and mBc tend to zero, and as a result
Errr and Errc tend to infinity. Thus the error bounds in Theorem 9 tend to infinity, and
BiMPCA fails to detect both row clusters and column clusters. This is why condition (I1)
is needed to guarantee the identifiability of BiMMSB.


5. Simulations
In this section, some simulations are conducted to investigate the performance of our BiMPCA
algorithm. We measure the performance of the proposed method by the Bi-Mixed-Hamming error
rate, the row-Mixed-Hamming error rate and the column-Mixed-Hamming error rate, defined as:

• BiMHamm = (minO∈S ‖Π̂r O − Πr ‖1 + minO∈S ‖Π̂c O − Πc ‖1 ) / n,

• row-MHamm = minO∈S ‖Π̂r O − Πr ‖1 / nr ,

• column-MHamm = minO∈S ‖Π̂c O − Πc ‖1 / nc ,

where Πr (Πc ) and Π̂r (Π̂c ) are the true and estimated row (column) mixed membership
matrices respectively, and S is the set of K × K permutation matrices. Here, we also consider
the permutation of labels since the measurement of error should not depend on how we label
each of the K communities. BiMHamm is used to measure BiMPCA's performance on
both row nodes and column nodes, while row-MHamm and column-MHamm are used to
measure its performance on row nodes and column nodes, respectively.
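A small Python sketch of these error rates, reading ‖·‖1 as the entrywise ℓ1 norm and searching over all K! permutations (feasible for the small K used here); the helper name is ours, for illustration only.

    import numpy as np
    from itertools import permutations

    def mixed_hamming(Pi_hat, Pi):
        """min over K x K permutation matrices O of ||Pi_hat O - Pi||_1 / n,
        with the norm taken entrywise."""
        n, K = Pi.shape
        best = np.inf
        for perm in permutations(range(K)):
            # Permuting the columns of Pi_hat corresponds to Pi_hat O.
            best = min(best, np.abs(Pi_hat[:, list(perm)] - Pi).sum())
        return best / n

    # BiMHamm combines both sides with n = n_r + n_c:
    # BiMHamm = (n_r * mixed_hamming(Pi_hat_r, Pi_r)
    #            + n_c * mixed_hamming(Pi_hat_c, Pi_c)) / (n_r + n_c)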
For all simulations in this section, the parameters (nr , nc , K, P, Πr , Πc ) under BiMMSB
are set as follows. For row nodes, nr = 600 and K = 3. For convenience, call the 3 row
clusters and 3 column clusters row cluster 1, row cluster 2, row cluster 3, column cluster
1, column cluster 2 and column cluster 3. Let each row cluster own nr,0 pure nodes, with
0 ≤ nr,0 ≤ 160, so there are 3nr,0 pure row nodes in total. We let the first 3nr,0
row nodes {1, 2, . . . , 3nr,0 } be pure and the remaining row nodes {3nr,0 + 1, 3nr,0 + 2, . . . , 600}
be mixed. Fixing x ∈ [0, 1/2), let all the mixed row nodes have four different memberships
(x, x, 1 − 2x), (x, 1 − 2x, x), (1 − 2x, x, x) and (1/3, 1/3, 1/3), each shared by (600 − 3nr,0 )/4
nodes. For column nodes, set nc = 500. Let each column cluster own nc,0
pure nodes. Let the first 3nc,0 column nodes {1, 2, . . . , 3nc,0 } be pure and the column nodes
{3nc,0 + 1, 3nc,0 + 2, . . . , 500} be mixed. The settings of the column mixed memberships are the
same as those of the row mixed memberships. When nr,0 = nc,0 , denote n0 = nr,0 = nc,0 for convenience.
Fixing ρ ∈ (0, 1) and γ > 0, unless specified otherwise, the mixing matrix P is

        ρ    0.1  0.3
P = γ   0.2  ρ    0.4  .
        0.5  0.2  ρ
For all settings, we report the averaged BiMHamm, the averaged row-MHamm and the
averaged column-MHamm over 50 repetitions.
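As an illustration of this setup, the sketch below assembles Πr , Πc and P for the Experiment 1 setting (K = 3, n0 = 120, x = 0.4, ρ = 0.8, γ = 1); all values come from the text above, and the helper name is ours.

    import numpy as np

    def make_memberships(n, n0, x):
        """Membership matrix with K = 3: n0 pure nodes per cluster, and the
        remaining nodes cycling through the four mixed PMFs of the text."""
        K = 3
        Pi = np.zeros((n, K))
        for k in range(K):                           # pure blocks: rows of the identity
            Pi[k * n0:(k + 1) * n0, k] = 1.0
        mixed = np.array([[x, x, 1 - 2 * x],
                          [x, 1 - 2 * x, x],
                          [1 - 2 * x, x, x],
                          [1 / 3, 1 / 3, 1 / 3]])
        for t in range(n - K * n0):                  # (n - 3 n0) / 4 nodes per mixed PMF
            Pi[K * n0 + t] = mixed[t % 4]
        return Pi

    rho, gamma, x = 0.8, 1.0, 0.4
    P = gamma * np.array([[rho, 0.1, 0.3],
                          [0.2, rho, 0.4],
                          [0.5, 0.2, rho]])
    Pi_r = make_memberships(600, 120, x)             # n_r = 600, n_r0 = 120
    Pi_c = make_memberships(500, 120, x)             # n_c = 500, n_c0 = 120
    # An adjacency matrix can then be drawn with the sample_bimmsb sketch from Section 2.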
Experiment 1: Fraction of pure nodes. Fix (x, ρ, γ) = (0.4, 0.8, 1) and let n0
range in {40, 60, 80, 100, 120, 140, 160}. A larger n0 indicates a case with a higher fraction of
pure nodes for both row clusters and column clusters. The numerical results are shown in
panels (a), (b) and (c) of Figure 3. From the three panels, we see that the three error rates
look similar, and that the fraction of pure nodes influences the performance of BiMPCA:
BiMPCA performs better as the number of pure nodes in the simulated network increases.


Experiment 2: Connectivity across communities. Fix (x, n0 , γ) = (0.4, 120, 1)
and let ρ range in {0.1, 0.2, . . . , 1}. A larger ρ generates more edges. The results are
displayed in panels (d), (e) and (f) of Figure 3. Again, the three error rates share similar behavior
in this experiment. From these figures we find that when ρ is small (ρ < 0.5), BiMPCA
performs poorly with large error rates, while when ρ > 0.5 the error rates decrease dramatically.
We conclude that, for k = 1, 2, 3, when the number of edges between row cluster k
and column cluster k is small, BiMPCA may fail to detect communities, while when this
number is suitably large, BiMPCA performs better with more edges.
Experiment 3: Purity of mixed nodes. Fix (n0 , ρ, γ) = (120, 0.8, 1), and let x
range in {0, 0.05, . . . , 0.5}. When 0 < x < 1/3, mixed nodes become less pure as x
increases, and when 1/3 < x < 1/2, mixed nodes become more pure as x increases.
Panels (g), (h) and (i) of Figure 3 record the numerical results of this experiment. All error rates
for different x are quite close, so we may argue that x has only a slight influence
on the performance of BiMPCA.
Experiment 4: Sparsity of the mixing matrix. We study how the sparsity of the mixing
matrix affects the behavior of BiMPCA in this experiment. The mixing matrix in this
experiment is set as

        ρ     0.1   0.1
P = γ   0.15  ρ     0.1  .
        0.15  0.15  ρ

Fix (n0 , ρ, x) = (120, 0.25, 0.4) and let γ ∈ {1, 1.5, . . . , 4}. A larger γ therefore indicates a
denser simulated network. Panels (j), (k) and (l) of Figure 3 display the simulation results of
this experiment. The three error rates are again similar in this experiment, and the results show
that BiMPCA performs better as the simulated network becomes denser.

6. Applications to real-world data sets


6.1 Political Blogs network
In this subsection, we apply the BiMPCA algorithm to the Political blogs network introduced
by Adamic and Glance (2005). The data was collected around the 2004 US presidential
election. Such political blogs data can be represented by a directed graph, in which each
node in the graph corresponds to a web blog labelled either as liberal or conservative. A
directed edge from node i to node j indicates that there is a hyperlink from blog i to blog
j. Clearly, such a political blog graph is directed, since a hyperlink from blog i to blog j
does not imply that there is also a hyperlink from blog j to blog i.
In our experiment, we first extract the largest component of the graph, which contains
1222 nodes, and let A ∈ R1222×1222 be the adjacency matrix of the largest component.
Before applying our algorithm to this network, we first define two diagonal matrices. For
any directed adjacency matrix A ∈ Rnr ×nc , let Dr be the nr × nr diagonal matrix such that
Dr (i, i) = Σ_{j=1}^{nc} A(i, j) for 1 ≤ i ≤ nr , and let Dc be the nc × nc diagonal matrix such that
Dc (j, j) = Σ_{i=1}^{nr} A(i, j) for 1 ≤ j ≤ nc . Hence, the i-th diagonal entry of Dr (without causing
confusion, we call it the row degree) is the degree of row node i, and the j-th diagonal entry
of Dc (the column degree) is the degree of column node j.
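The two degree matrices are simple row and column sums of A; a minimal numpy sketch (the function name is ours, for illustration only).

    import numpy as np

    def degree_matrices(A):
        """Row and column degree matrices D_r and D_c of a directed
        adjacency matrix A of shape (n_r, n_c)."""
        D_r = np.diag(A.sum(axis=1))   # D_r(i, i): degree of row node i
        D_c = np.diag(A.sum(axis=0))   # D_c(j, j): degree of column node j
        return D_r, D_c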


(a) Ex1: BiMHamm (b) Ex1: row-MHamm (c) Ex1: column-MHamm
(d) Ex2: BiMHamm (e) Ex2: row-MHamm (f) Ex2: column-MHamm
(g) Ex3: BiMHamm (h) Ex3: row-MHamm (i) Ex3: column-MHamm
(j) Ex4: BiMHamm (k) Ex4: row-MHamm (l) Ex4: column-MHamm

Figure 3: Numerical results of Experiments 1-4 where we use Ex1-Ex4 to denote Experi-
ments 1-4, respectively.


(a) row degree histogram (b) column degree histogram (c) leading 8 singular values

Figure 4: Row (column) degree distribution and leading singular values for Political blogs
network.

We plot the histograms of the row (column) degrees and the leading 8 singular values of
A ∈ R1222×1222 in Figure 4. In practice, we find that for the Political blogs network with
1222 nodes, some row nodes and column nodes may have degree 0 (see panels (a) and (b)
of Figure 4). In particular, if we do not remove nodes with degree 0 and apply BiMPCA to
the adjacency matrix directly, then for any row node ĩ (or column node j̃) with degree 0, the
ĩ-th row of Û (or the j̃-th row of V̂ ) is a vector with all entries equal to 0, with the result
that Π̂r (ĩ, :) (or Π̂c (j̃, :)) is undefined. Therefore, to apply our algorithm to
the network, we need to remove row nodes and column nodes with zero degree, that is, we
only keep nodes (either row nodes or column nodes) with degree at least one. Furthermore,
let nedges denote a degree threshold for both row nodes and column nodes. If we remove all row nodes
and column nodes with degree less than nedges , we obtain a new adjacency matrix
Anedges in which all nodes have degree at least nedges . Such pre-processing can be achieved by
Algorithm 2.

Algorithm 2 Pre-processing for a real-world network to obtain a network whose nodes have
degree at least nedges
Require: The adjacency matrix A ∈ Rnr ×nc and nedges (we only care about nodes with
degree at least nedges ); initially set Anedges = A ∈ Rnr ×nc .
Ensure: The new adjacency matrix Anedges .
1: Find the row nodes set Sr = {ĩ : row node ĩ has degree at least nedges } and the column nodes
set Sc = {j̃ : column node j̃ has degree at least nedges } from Anedges .
2: Set Anedges = Anedges (Sr , Sc ).
3: Repeat step 1 and step 2 until all nodes in Anedges have degree at least nedges .
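A minimal Python sketch of this pre-processing; it mirrors the alternating removal in Algorithm 2, recomputing degrees after every removal (function and variable names are illustrative).

    import numpy as np

    def prune(A, n_edges):
        """Iteratively remove row and column nodes of degree < n_edges,
        as in Algorithm 2, and return the reduced adjacency matrix."""
        A = np.asarray(A)
        while True:
            keep_r = A.sum(axis=1) >= n_edges   # row nodes with degree >= n_edges
            keep_c = A.sum(axis=0) >= n_edges   # column nodes with degree >= n_edges
            if keep_r.all() and keep_c.all():
                return A                        # every remaining node has degree >= n_edges
            A = A[keep_r][:, keep_c]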

Meanwhile, since both row nodes and column nodes in the Political blogs network denote
blogs, there are some common nodes among the rows and columns of Anedges . We pick out
these common nodes from Anedges and denote the smaller (compared with Anedges ) adjacency
matrix by Anedges ,common . For instance, if we only care about nodes with degree at least 1
(i.e., nedges = 1) for the Political blogs network, applying Algorithm 2 to A ∈ R1222×1222 yields
a network with A1 ∈ R1064×989 , i.e., there are 1064 row nodes and 989 column nodes in
A1 (since nedges = 1, Anedges = A1 ) and all nodes have degree at least 1. Using only the

common row nodes and column nodes of A1 , we obtain a square matrix A1,common ∈ R831×831 .
Without causing confusion, in the applications to real data sets below, we use A
to denote Anedges or Anedges ,common unless otherwise specified. After pre-processing the Political
blogs network by Algorithm 2, we define the five statistics of interest as follows.
• Fraction of estimated pure row nodes µr and fraction of estimated pure
column nodes µc : Since there are two communities, liberal and conservative, in the
Political blogs network, K is chosen as 2 (see footnote 2). We then run BiMPCA on A with K = 2.
Let nr , nc be the number of row nodes and column nodes in A, respectively. Let
Π̂r ∈ Rnr ×K be the estimated row membership matrix and Π̂c ∈ Rnc ×K be the estimated
column membership matrix obtained by BiMPCA with input (A, K). Let nr,0 be the
number of pure nodes in Π̂r and nc,0 be the number of pure nodes in Π̂c . Hence nr,0 /nr
(denoted µr ) is the fraction of estimated pure row nodes by BiMPCA, and nc,0 /nc (denoted
µc ) is the fraction of estimated pure column nodes by BiMPCA.
• Fraction of estimated highly mixed row nodes νr and fraction of estimated
highly mixed column nodes νc : For any estimated membership matrix Π̂ ∈ Rm×K , let Π̂(i, :)
be the estimated PMF of node i. Let πi,k be the k-th largest entry in Π̂(i, :) for
1 ≤ k ≤ K. Let ν be an m × 1 vector such that ν(i) = πi,2 /πi,1 is the diversity of node i.
The definition of the diversity of node i can be understood as follows: if node i is pure, then its
diversity is 0; if node i is highly mixed, then its diversity is close to 1; and if node i
belongs to multiple communities with equal probability, then its diversity is 1. Hence
ν(i) ∈ [0, 1], and a smaller ν(i) means that node i is purer. We say that node
i is highly mixed if ν(i) ≥ 0.5, and let νr = Σ_{i=1}^{nr} 1{ν(i) ≥ 0.5} /nr and νc = Σ_{j=1}^{nc} 1{ν(j) ≥ 0.5} /nc
be the proportions of highly mixed row nodes and highly mixed column nodes in the
network, respectively (a short sketch of these statistics is given after this list).
• MHamm, the measurement of structural difference between row clusters and column
clusters: When row nodes and column nodes share common nodes (i.e., nr = nc in
Anedges ,common ), we can compare Π̂r and Π̂c directly to see whether the structure
of the row clusters differs from the structure of the column clusters. To compare Π̂r and Π̂c ,
we simply use the mixed-Hamming error rate computed as

MHamm = minO∈S ‖Π̂r O − Π̂c ‖1 / nr ,

and we refer to MHamm as the measurement of structural difference between row clusters and
column clusters (see footnote 3).
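The sketch below computes µ and ν for one estimated membership matrix, reading 'pure' as a degenerate estimated PMF (second-largest entry exactly 0 after the MR truncation) and 'highly mixed' as diversity at least 0.5; the function name and the equality test for purity are our illustrative choices.

    import numpy as np

    def summary_stats(Pi_hat, threshold=0.5):
        """Fraction of estimated pure nodes and of estimated highly mixed
        nodes for an m x K estimated membership matrix Pi_hat."""
        ordered = np.sort(Pi_hat, axis=1)[:, ::-1]       # row entries in decreasing order
        nu = ordered[:, 1] / ordered[:, 0]               # diversity nu(i) = pi_{i,2} / pi_{i,1}
        pure_frac = float(np.mean(nu == 0))              # pure: second-largest entry is 0
        mixed_frac = float(np.mean(nu >= threshold))     # highly mixed: nu(i) >= 0.5
        return pure_frac, mixed_frac

    # When row and column nodes coincide (the A_{n_edges,common} case), MHamm can be
    # computed by applying the mixed_hamming sketch from Section 5 to Pi_hat_r and Pi_hat_c.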
In Table 1, we report µr , µc , νr , νc and MHamm obtained by applying our BiMPCA on the
Political blogs network with Anedges and Anedges ,common for K = 2 and different values of nedges .

2. Such a choice of K is also consistent with the results of panel (c) in Figure 4, which show that there is a
singular value gap between the 2nd and 3rd singular values, suggesting that there are 2 clusters in both
row nodes and column nodes.
3. Generally speaking, for Anedges , since nr and nc are usually not equal, there is no need to consider the
structural difference between row clusters and column clusters, unless we only focus on the common nodes
among the rows and columns (i.e., the case of Anedges ,common ).


We see that when nedges is small (e.g., nedges < 21), only a few row nodes or
column nodes in the Political blogs network are highly mixed, and most nodes are pure
(i.e., either liberal or conservative). When 1 ≤ nedges ≤ 25, MHamm is always close to 0.05,
suggesting that although the Political blogs network with adjacency matrix Anedges ,common is
a directed network, if nodes are in the same row cluster, then they also largely belong to
the same column cluster; hence the row community structure is highly consistent with
the column community structure for the Political blogs network. Furthermore, for adjacency
matrices generated with nedges ≤ 25, most of the MHamm values are close to 0.05, which also
suggests that most blogs, even if we shrink the size of the network by focusing on blogs
with degree at least nedges , have firm political stances. However, when nedges ≥ 26, we
see that MHamm is larger, and the fractions of highly mixed row nodes and column
nodes are also much larger than when nedges < 26, suggesting that if we only
consider blogs with high degrees (degrees larger than 25), then these blogs are
highly mixed (i.e., they are neutral). However, the total number of such blogs is small
compared to 1222 (the size of the giant component of the Political blogs network). For
example, when nedges = 26, the number of such blogs is 65, and A26 has 8 highly mixed
row nodes and 10 highly mixed column nodes; when nedges = 27, the number of such blogs
is 62, and A27 has 8 highly mixed row nodes and 13 highly mixed column nodes; when
nedges = 28, the number of such blogs is 49, and A28 has 6 highly mixed row nodes and
6 highly mixed column nodes. These highly mixed (neutral) blogs account for a large
proportion of their network's size. Thus, in the Political blogs network, only a
small fraction (less than 65/1222) of blogs have highly mixed memberships (are neutral), and
such blogs always have large degrees (degree larger than 25). Meanwhile, we also
find that the fraction of pure nodes (as well as the fraction of highly mixed nodes)
in Anedges is always close to that of Anedges ,common , suggesting that most blogs have firm
political stances.

6.2 Papers Citations network

Ji and Jin (2016) collected a network data set for statisticians, based on all published papers
in the Annals of Statistics, Biometrika, JASA, and JRSS-B from 2003 to the first half of
2012. In this paper, we focus on the Papers Citations network with adjacency matrix
A ∈ R3248×3248 , where each node is a paper and A(i, j) is 1 if and only if paper i is cited
by paper j, and 0 otherwise. The Papers Citations network can be downloaded from
http://www.stat.cmu.edu/~jiashun/StatNetwork/PhaseOne/. From the definition of an
edge in this network, the Papers Citations network is directed.

We also plot the histograms of the row (column) degrees and the leading 8 singular values
of A ∈ R3248×3248 in Figure 5. Panel (a) of Figure 5 shows that most papers are not cited by
the papers in this data set (in fact, 1693 papers are not cited by any paper in this
data set). Panel (b) of Figure 5 shows that most papers do not cite any paper in this data
set (in fact, 1450 papers do not cite any paper in this data set). Panel (c) shows that
there is a singular value gap between the 2nd and 3rd singular values, suggesting that there
may exist 2 clusters in both row nodes and column nodes, and we take K = 2 for the Papers
Citations network based on this finding.


A µr νr µc νc MHamm
A1 ∈ R1064×989 793/1064 74/1064 825/989 46/989 -
A1,common ∈ R831×831 633/831 57/831 700/831 38/831 0.0947
A2 ∈ R936×771 708/936 63/936 652/771 33/771 -
A2,common ∈ R642×642 510/642 42/642 547/642 29/642 0.0733
A3 ∈ R835×647 515/835 80/835 547/647 27/647 -
A3,common ∈ R545×545 365/545 40/545 463/545 23/545 0.0820
A4 ∈ R762×581 473/762 74/762 509/581 22/581 -
A4,common ∈ R479×479 325/479 34/479 325/479 18/479 0.0712
A5 ∈ R700×519 471/700 53/700 411/519 16/519 -
A5,common ∈ R415×415 300/415 25/415 333/415 13/415 0.0568
A6 ∈ R657×470 424/657 30/657 371/470 14/470 -
A6,common ∈ R368×368 246/368 12/368 295/368 11/368 0.0542
A7 ∈ R613×426 341/613 34/613 359/426 6/426 -
A7,common ∈ R321×321 184/321 13/321 273/321 5/321 0.0579
A8 ∈ R578×398 351/578 30/578 335/398 5/398 -
A8,common ∈ R291×291 173/291 11/291 247/291 4/291 0.0539
A9 ∈ R539×368 322/539 27/539 288/368 4/368 -
A9,common ∈ R260×260 149/260 9/260 208/260 3/260 0.0558
A10 ∈ R502×336 319/502 25/502 251/336 2/336 -
A10,common ∈ R227×227 137/227 8/227 176/227 1/227 0.0517
A11 ∈ R464×316 306/464 23/464 246/316 1/316 -
A11,common ∈ R203×203 124/203 7/203 162/203 1/203 0.0489
A12 ∈ R439×300 305/439 22/439 192/300 2/300 -
A12,common ∈ R185×185 121/185 7/185 120/185 1/185 0.0473
A13 ∈ R414×289 264/414 16/414 204/289 2/289 -
A13,common ∈ R177×177 102/177 6/177 126/177 1/177 0.0511
A14 ∈ R384×278 233/384 21/384 195/278 1/278 -
A14,common ∈ R164×164 89/164 7/164 114/164 1/164 0.0540
A15 ∈ R355×265 232/355 17/355 188/265 1/265 -
A15,common ∈ R152×152 90/152 6/152 108/152 1/152 0.0514
A16 ∈ R331×256 206/331 19/331 181/256 1/256 -
A16,common ∈ R146×146 84/146 6/146 105/146 1/146 0.0519
A17 ∈ R308×243 177/308 12/308 167/243 1/243 -
A17,common ∈ R137×137 68/137 4/137 96/137 1/137 0.0554
A18 ∈ R289×231 171/289 11/289 162/231 1/231 -
A18,common ∈ R125×125 62/125 4/125 91/125 1/125 0.0552
A19 ∈ R274×221 146/274 14/274 155/221 1/221 -
A19,common ∈ R121×121 55/121 3/121 89/121 1/121 0.0626
A20 ∈ R258×205 176/258 10/258 139/205 1/205 -
A20,common ∈ R112×112 67/112 3/112 78/112 1/112 0.0462
A21 ∈ R236×192 156/236 10/236 130/192 1/192 -
A21,common ∈ R98×98 56/98 3/98 68/98 1/98 0.0518
A22 ∈ R216×184 140/216 10/216 125/184 1/184 -
A22,common ∈ R94×94 56/94 1/94 67/94 1/94 0.0450
A23 ∈ R185×161 115/185 8/185 107/161 1/161 -
A23,common ∈ R77×77 41/77 1/77 52/77 1/77 0.0475
A24 ∈ R166×145 91/166 6/166 91/145 1/145 -
A24,common ∈ R69×69 31/69 1/69 44/69 1/69 0.0563
A25 ∈ R116×111 73/116 9/116 73/111 1/111 -
A25,common ∈ R48×48 32/48 1/48 34/48 1/48 0.0415
A26 ∈ R65×61 31/65 8/65 32/61 10/61 -
A26,common ∈ R31×31 14/31 3/31 14/31 5/31 0.3509
A27 ∈ R62×57 30/62 8/62 22/57 13/57 -
A27,common ∈ R27×27 12/27 4/27 10/27 7/27 0.2630
A28 ∈ R49×51 19/49 6/49 22/51 6/51 -
A28,common ∈ R19×19 7/19 2/19 8/19 4/19 0.3229

Table 1: Numerical results of BiMPCA on Political blogs network with K = 2.



(a) row degree histogram (b) column degree histogram (c) leading 8 singular values

Figure 5: Row (column) degree distribution and leading singular values for Papers Citations
network.

Similarly to the processing of the Political blogs network, we need to remove row nodes
and column nodes with degree 0 from the Papers Citations network, and we again pay
attention to nodes with degree at least nedges . With different choices of nedges and K = 2,
we apply our BiMPCA to the Papers Citations network with Anedges and Anedges ,common , and
we report µr , µc , νr , νc and MHamm in Table 2. From the table we find that when
nedges is 1, around 52% to 56% of the papers (whether in the row position or the column position) are
pure, and around 6.5% to 8% of the papers are highly mixed. MHamm is 0.1616, suggesting that
the structure of the citing clusters is slightly different from that of the cited clusters. A similar
analysis holds when nedges is 2, except that MHamm is 0.1014 (smaller than 0.1616).
Thus, if papers are cited by or cite at least two papers, the structure of the cited clusters
is much more similar to that of the citing clusters than in the case of papers with degree at least 1. This
phenomenon also occurs when nedges is 3, 4 and 5. However, if we further
focus on the network formed by papers with at least 6 edges, we see that MHamm is
a slightly larger number (0.1191, which is larger than 0.0674 and 0.0341), suggesting that
papers with large degree tend to be more mixed. This phenomenon can also be seen in
the four statistics of A6 , which has 4 highly mixed row nodes among 27 row nodes and 14
highly mixed column nodes among 44 column nodes. Meanwhile, this phenomenon is also
consistent with the findings in Table 1 and the empirical experience that nodes with high
degree tend to be more mixed than those with low degree in a network.

7. Discussions

In this paper, we introduced the bipartite mixed membership stochastic blockmodel (BiMMSB),
in which the row nodes of the adjacency matrix may differ from the column nodes for a given directed
network. BiMMSB allows row nodes to have mixed row memberships and column nodes
to have mixed column memberships. We propose a spectral algorithm, BiMPCA, based
on the SVD, the SP algorithm and a membership reconstruction step. The theoretical results for
BiMPCA show that the proposed method is consistent. Through the applications to the directed
Political blogs network and the Papers Citations network, BiMPCA reveals new insights on
the asymmetries in the structure of these networks, namely, the structure of the row clusters is
consistent with that of the column clusters as long as the fraction of highly mixed nodes (i.e.,
nodes with large degrees) is small.

A µr νr µc νc MHamm

A1 ∈ R1555×1798 880/1555 123/1555 1001/1798 134/1798 -


A1,common ∈ R883×883 462/883 68/883 471/883 58/883 0.1616
A2 ∈ R838×1035 397/838 75/838 511/1035 72/1035 -
A2,common ∈ R347×347 168/347 21/347 168/347 21/347 0.1014
A3 ∈ R367×493 186/367 26/367 250/493 26/493 -
A3,common ∈ R121×121 61/121 7/121 59/121 5/121 0.0676
A4 ∈ R148×227 78/148 5/148 117/227 4/227 -
A4,common ∈ R42×42 23/42 1/42 19/42 1/42 0.0341
A5 ∈ R77×125 39/77 6/77 54/125 5/125 -
A5,common ∈ R13×13 7/13 1/13 3/13 1/13 0.0674
A6 ∈ R27×44 20/27 4/27 15/44 14/44 -
A6,common ∈ R4×4 2/4 1/4 1/4 2/4 0.1191

Table 2: Numerical results of BiMPCA on Papers Citations network with K = 2.

However, there is one limitation of our BiMMSB model. Though BiMMSB allows the row
nodes to differ from the column nodes (hence the number of row nodes can also differ
from the number of column nodes), it assumes that the number of row clusters equals
the number of column clusters. In the next subsection, we provide a more general version of
BiMMSB in which the number of row clusters may be different from the number of column clusters.

7.1 The general bipartite mixed membership stochastic blockmodel (GBiMMSB)
We consider the general bipartite mixed membership stochastic blockmodel (GBiMMSB
for short) with bi-adjacency matrix A ∈ {0, 1}nr ×nc such that for each entry, A(i, j) = 1
if there is a directional edge from node i to node j, and A(i, j) = 0 otherwise, where r, c
denote row and column, respectively (similar notation is used in what follows). We assume that the row
nodes of A belong to Kr perceivable disjoint row communities

Cr(1) , Cr(2) , . . . , Cr(Kr ) , (12)

and the column nodes of A belong to Kc perceivable disjoint column communities

Cc(1) , Cc(2) , . . . , Cc(Kc ) . (13)

Define the nr × Kr row nodes membership matrix Πr and the nc × Kc column nodes
membership matrix Πc such that Πr (i, :) is a 1 × Kr Probability Mass Function (PMF) for
row node i and

P(row node i belongs to Cr(k) ) = Πr (i, k), 1 ≤ k ≤ Kr , 1 ≤ i ≤ nr , (14)

and Πc (j, :) is a 1 × Kc Probability Mass Function (PMF) for column node j, and

P(column node j belongs to Cc(l) ) = Πc (j, l), 1 ≤ l ≤ Kc , 1 ≤ j ≤ nc . (15)


Define a nonnegative matrix P ∈ RKr ×Kc such that for any 1 ≤ k ≤ Kr and 1 ≤ l ≤ Kc ,

P (k, l) ∈ [0, 1]. (16)

For any fixed pair of (i, j), GBiMMSB assumes that

P(A(i, j) = 1|i ∈ Cr(k) & j ∈ Cc(l) ) = P (k, l). (17)

Then, for all pairs of (i, j) with 1 ≤ i ≤ nr , 1 ≤ j ≤ nc , the A(i, j) are Bernoulli random
variables that are independent of each other, satisfying

P(A(i, j) = 1) = Σ_{k=1}^{Kr} Σ_{l=1}^{Kc} Πr (i, k)Πc (j, l)P (k, l). (18)

Definition 10 Call (12)-(18) the General Bipartite Mixed Membership Stochastic Blockmodel
(GBiMMSB) and denote it by GBiM M SB(nr , nc , Kr , Kc , P, Πr , Πc ).

When rank(P ) = K, where K is defined as K = min{Kr , Kc }, and there is at least one
pure node for each of the row and column clusters, then following a proof similar to that for BiMMSB,
we can show that GBiMMSB is identifiable; we omit the details here. Similarly, let
Ω = E(A) = Πr P Π′c be the expectation matrix of A, define a Kr × Kr matrix Fr as
Fr = P Π′c Ω′ Πr , and note that rank(Fr ) = K. Define a Kc × Kc matrix Fc as Fc = P ′ Π′r ΩΠc ,
and note that rank(Fc ) = K. Then we have the following lemma.

Lemma 11 Under GBiM M SB(nr , nc , Kr , Kc , P, Πr , Πc ), assume that each row (column)
community has at least one pure node. Let Ω = U ΛV ′ be the compact singular value
decomposition of Ω such that U ∈ Rnr ×K , V ∈ Rnc ×K . Then there exist a unique Kr × K
matrix Br and a unique Kc × K matrix Bc such that for 1 ≤ k, l ≤ K,

• U = Πr Br , where Br (:, k) is the k-th right eigenvector of Fr .

• V = Πc Bc , where Bc (:, l) is the l-th right eigenvector of Fc .

The proof of Lemma 11 is similar to that of Lemma 3, and we omit the details here.
When Kr = Kc , GBiMMSB reduces to BiMMSB. However, when Kr ≠ Kc (say, Kr > Kc ),
GBiMMSB differs from BiMMSB significantly because, even if we know that U =
Πr Br , we cannot obtain Πr by setting it to U Br′ (Br Br′ )−1 , since the inverse of Br Br′ does
not exist when Kr > Kc . Therefore, it is challenging to design algorithms under GBiMMSB
to estimate Πr and Πc . We leave this work for our future study.
Recall that K is assumed to be known in this paper. In fact, for a given adjacency
matrix A ∈ Rnr ×nc from an empirical network, K is always unknown, and hence it needs to be
estimated. We leave the estimation of K for future work.


References
Lada A. Adamic and Natalie Glance. The political blogosphere and the 2004 u.s. election:
divided they blog. In Proceedings of the 3rd international workshop on Link discovery,
pages 36–43, 2005.

Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, and Eric P. Xing. Mixed mem-
bership stochastic blockmodels. Journal of Machine Learning Research, 9:1981–2014,
2008.

Punam Bedi and Chhavi Sharma. Community detection in social networks. Wiley Inter-
disciplinary Reviews: Data Mining and Knowledge Discovery, 6(3):115–135, 2016.

Yudong Chen, Xiaodong Li, and Jiaming Xu. Convexified modularity maximization for
degree-corrected stochastic block models. Annals of Statistics, 46(4):1573–1602, 2018.

Jennifer A. Dunne, Richard J. Williams, and Neo D. Martinez. Food-web structure and
network theory: The role of connectance and size. Proceedings of the National Academy
of Sciences of the United States of America, 99(20):12917, 2002.

Jing Gao, Feng Liang, Wei Fan, Chi Wang, Yizhou Sun, and Jiawei Han. On commu-
nity outliers and their efficient detection in information networks. In Proceedings of the
16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,
pages 813–822, 2010.

Nicolas Gillis and Stephen A. Vavasis. Semidefinite programming based preconditioning


for more robust near-separable nonnegative matrix factorization. SIAM Journal on Op-
timization, 25(1):677–698, 2015.

Anna Goldenberg, Alice X. Zheng, Stephen E. Fienberg, and Edoardo M. Airoldi. A survey
of statistical network models. Foundations and Trends® in Machine Learning, 2(2):129–233, 2010.

P.K. Gopalan and D.M. Blei. Efficient discovery of overlapping communities in massive net-
works. Proceedings of the National Academy of Sciences of the United States of America,
110(36):14534–14539, 2013.

Paul W. Holland, Kathryn Blackmond Laskey, and Samuel Leinhardt. Stochastic block-
models: First steps. Social Networks, 5(2):109–137, 1983.

Pengsheng Ji and Jiashun Jin. Coauthorship and citation networks for statisticians. The
Annals of Applied Statistics, 10(4):1779–1812, 2016.

Jiashun Jin. Fast community detection by SCORE. Annals of Statistics, 43(1):57–89, 2015.

Jiashun Jin, Zheng Tracy Ke, and Shengming Luo. Estimating network memberships by
simplex vertex hunting. arXiv: Methodology, 2017.

Bingyi Jing, Ting Li, Ningchen Ying, and Xianshi Yu. Community detection in sparse
networks using the symmetrized Laplacian inverse matrix (SLIM). Statistica Sinica, 2021.


Brian Karrer and M. E. J. Newman. Stochastic blockmodels and community structure in


networks. Physical Review E, 83(1):16107, 2011.
Andrea Lancichinetti and Santo Fortunato. Community detection algorithms: a compara-
tive analysis. Physical Review E, 80(5):056117, 2009.
Woosang Lim, Rundong Du, and Haesun Park. Codinmf: Co-clustering of directed graphs
via nmf. In AAAI, pages 3611–3618, 2018.
Wangqun Lin, Xiangnan Kong, Philip S Yu, Quanyuan Wu, Yan Jia, and Chuan Li. Com-
munity detection in incomplete information networks. In Proceedings of the 21st Inter-
national Conference on World Wide Web, pages 341–350, 2012.
Xueyu Mao, Purnamrita Sarkar, and Deepayan Chakrabarti. On mixed memberships
and symmetric nonnegative matrix factorizations. International Conference on Machine
Learning, pages 2324–2333, 2017.
Xueyu Mao, Purnamrita Sarkar, and Deepayan Chakrabarti. Estimating mixed member-
ships with sharp eigenvector deviations. Journal of the American Statistical Association,
pages 1–13, 2020.
M. E. J. Newman. Coauthorship networks and patterns of scientific collaboration. Proceed-
ings of the National Academy of Sciences, 101(suppl 1):5200–5205, 2004.
Richard A Notebaart, Frank HJ van Enckevort, Christof Francke, Roland J Siezen, and
Bas Teusink. Accelerating the reconstruction of genome-scale metabolic networks. BMC
Bioinformatics, 7:296, 2006.
Clara Pizzuti. Ga-net: A genetic algorithm for community detection in social networks.
In International Conference on Parallel Problem Solving from Nature, pages 1081–1090.
Springer, 2008.
Tai Qin and Karl Rohe. Regularized spectral clustering under the degree-corrected stochas-
tic blockmodel. Advances in Neural Information Processing Systems 26, pages 3120–3128,
2013.
Zahra S. Razaee, Arash A. Amini, and Jingyi Jessica Li. Matched bipartite block model
with covariates. Journal of Machine Learning Research, 20(34):1–44, 2019.
Karl Rohe, Tai Qin, and Bin Yu. Co-clustering directed graphs to discover asymmetries and
directional communities. Proceedings of the National Academy of Sciences of the United
States of America, 113(45):12679–12684, 2016.
John Scott and Peter J. Carrington. The SAGE handbook of social network analysis. Lon-
don: SAGE Publications, 2014.
Gang Su, Allan Kuchinsky, John H Morris, David J States, and Fan Meng. Glay: community
structure analysis of biological networks. Bioinformatics, 26(24):3135–3137, 2010.
Joel A. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of
Computational Mathematics, 12(4):389–434, 2012.


Zhe Wang, Yingbin Liang, and Pengsheng Ji. Spectral algorithms for community detection
in directed networks. Journal of Machine Learning Research, 21(153):1–45, 2020.

Yi Yu, Tengyao Wang, and Richard J. Samworth. A useful variant of the davis–kahan
theorem for statisticians. Biometrika, 102(2):315–323, 2015.

Yuan Zhang, Elizaveta Levina, and Ji Zhu. Detecting overlapping communities in networks
using spectral methods. SIAM Journal on Mathematics of Data Science, 2(2):265–283,
2020.

Zhixin Zhou and Arash A. Amini. Analysis of spectral clustering algorithms for community
detection: the general bipartite setting. arXiv preprint arXiv:1803.04547, 2018.

