Non-Negative Matrix Factorizations for Multiplex Network Analysis
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 41, NO. 4, APRIL 2019
Abstract—Networks have been a general tool for representing, analyzing, and modeling relational data arising in several domains. One of the most important aspects of network analysis is community detection, or network clustering. Until recently, the major focus has been on discovering community structure in single (i.e., monoplex) networks. However, with the advent of relational data with multiple modalities, multiplex networks, i.e., networks composed of multiple layers representing different aspects of relations, have emerged. Consequently, community detection in multiplex networks, i.e., detecting clusters of nodes shared by all layers, has become a new challenge. In this paper, we propose Network Fusion for Composite Community Extraction (NF-CCE), a new class of algorithms, based on four different non-negative matrix factorization models, capable of extracting composite communities in multiplex networks. Each algorithm works in two steps: first, it finds a non-negative, low-dimensional feature representation of each network layer; then, it fuses the feature representations of the layers into a common non-negative, low-dimensional feature representation via collective factorization. The composite clusters are extracted from the common feature representation. We demonstrate the superior performance of our algorithms over state-of-the-art methods on various types of multiplex networks, including biological, social, economic, citation, phone communication, and brain multiplex networks.
Index Terms—Multiplex networks, non-negative matrix factorization, community detection, network integration
1 INTRODUCTION
Networks (or graphs1) along with their theoretical foundations are powerful mathematical tools for representing, modeling, and analyzing complex systems arising in several scientific disciplines, including sociology, biology, physics, and engineering, among others [1]. Concretely, social networks, economic networks, biological networks, telecommunication networks, etc. are just a few examples of graphs in which a large set of entities (or agents) correspond to nodes (or vertices) and relationships or interactions between entities correspond to edges (or links). Structural analysis of these networks has yielded important findings in the corresponding fields [2], [3].

Community detection (also known as graph clustering or module detection) is one of the foremost problems in network analysis. It aims to find groups of nodes (i.e., clusters, modules or communities) that are more densely connected to each other than they are to the rest of the network [4]. Even though the volume of research on community detection is large, e.g., [5], [6], [7], [8], [9], [10], [11], [12], [13], the majority of these methods focus on networks with only one type of relation between nodes (i.e., networks of single-type interactions). However, many real-world systems are naturally represented with multiple types of relationships, or with relationships that change in time. Such systems include subsystems or layers of connectivity representing different modes of complexity. For instance, in social systems, users in social networks engage in different types of interactions (e.g., personal, professional, social, etc.). In biology, different experiments or measurements can provide different types of interactions between genes. Reducing these networks to single-type interactions by disregarding their multiple modalities is often a very crude approximation that fails to capture the rich and holistic complexity of the system. In order to encompass the multimodal nature of these relations, a multiplex network representation has been proposed [14], [15], [16]. Multiplex networks (also known as multidimensional, multiview or multilayer networks) have recently attracted a lot of attention in the network science community. They can be represented as a set of graph layers that share a common set of nodes, but a different set of edges in each layer (cf. Fig. 1). With the emergence of this network representation, finding composite communities across different layers of a multiplex network has become a new challenge [17]. We refer the reader to Kivelä et al. [18] for a recent review of multiplex networks.

Here, distinct from the previous approaches (reviewed in Section 2), we focus on multiplex community detection. Concretely, a novel and general model, namely the Network

1. We use the terms graph and network interchangeably throughout this paper.

V. Gligorijevic was with the Department of Computing, Imperial College London, London SW7 2AZ, United Kingdom, and is now with the Flatiron Institute, Simons Foundation, New York, NY 10010. E-mail: vgligorijevic@flatironinstitute.org.
Y. Panagakis is with the Department of Computing, Imperial College London, London SW7 2AZ, United Kingdom, and also with the Department of Computer Science, Middlesex University London, London NW4 4BT, United Kingdom. E-mail: i.panagakis@imperial.ac.uk.
S. Zafeiriou is with the Department of Computing, Imperial College London, London SW7 2AZ, United Kingdom. E-mail: s.zafeiriou@imperial.ac.uk.

Manuscript received 25 Jan. 2017; revised 10 Jan. 2018; accepted 12 Mar. 2018. Date of publication 3 Apr. 2018; date of current version 13 Mar. 2019. (Corresponding author: Vladimir Gligorijevic.) Recommended for acceptance by G. Elidan. Digital Object Identifier no. 10.1109/TPAMI.2018.2821146
partitioning, spectral clustering, hierarchical clustering, modularity maximization, statistical inference and structure-based methods, as well as methods that rely on non-negative matrix factorizations:

- Graph partitioning aims to group nodes into partitions such that the cut size, i.e., the total number of edges between any two partitions, is minimal. Two widely used graph partitioning algorithms that also take into account the size of partitions are Ratio Cut and Normalized Cut [30]. Graph partitioning algorithms can alternatively be defined as spectral algorithms, in which the objective is to partition the nodes into communities based on the eigenvectors obtained from the eigendecomposition of the graph Laplacian matrix [31].

- In hierarchical clustering, the goal is to reveal network communities and their hierarchical structure based on a similarity (usually topological) measure computed between pairs of nodes [32].

- Modularity-based algorithms are among the most popular ones. Modularity was designed to measure the strength of a partition of a network into communities. It is defined as the fraction of the edges that fall within communities minus the expected fraction when these edges are randomly distributed [5], [29]. Various algorithms have been proposed for modularity optimization, including greedy techniques, simulated annealing, spectral optimization, etc. [4].

- Statistical inference methods aim at fitting a generative model to the network data based on some hypothesis. The most commonly used statistical inference method for community detection is the stochastic block model, which aims to approximate a given adjacency matrix by a block structure [33]. Each block in the model represents a community.

- Structure-based methods aim to find subgraphs representing meta definitions of communities. Their objective is to find maximal cliques, i.e., cliques which are not a subgraph of any other clique. The union of these cliques forms a subgraph whose components are interpreted as communities [34].

More recently, graph clustering methods that rely on Non-Negative Matrix Factorization [27] have been proposed, e.g., [19], [20]. Their goal is to approximate a symmetric adjacency matrix of a given network by a product of two non-negative, low-rank matrices, such that they have a clustering interpretation, i.e., they can be used for assigning nodes to communities. The methods proposed here follow this line of research but, as opposed to the existing methods [19], [20], NF-CCE can effectively handle multiplex networks.

2.2 Multiplex Networks
A multiplex network is a set of N monoplex networks (or layers), G_i(V, E_i), for i = 1, ..., N. The number of nodes in each layer is the same, n = |V|, while the connectivity pattern and the distribution of links in each layer differ, m_i = |E_i| (see Fig. 1). Similarly to monoplex networks, we consider the case where each layer represents a weighted, undirected graph, i.e., G_i(V, E_i, w_i). A multiplex network can be represented as a set of adjacency matrices encoding the connectivity patterns of individual layers, A^(i) in R_+^{n x n}, for i = 1, ..., N. The goal of community detection in multiplex networks is to infer a shared, latent community assignment that best fits all given layers. Given that each layer contains incomplete and complementary information, this process of finding shared communities by integrating information from all layers is also known in the literature as network integration (fusion) [25], [35].

Unlike the case of monoplex networks, research on community detection in multiplex networks is scarce. Existing methods extract communities from multiplex networks by first aggregating the links of all layers into a single layer, and then applying a monoplex method to that single layer [25], [36], [37]. However, this approach does not account for the shared information between layers and treats the noise present in each layer uniformly. Clearly, this is not the case in real-world multiplex networks, where each layer is contaminated by different noise in terms of magnitude and, possibly, distribution. Thus, by aggregating links from different layers, the noise in the aggregated layer significantly increases, resulting in poor community detection performance.

Current state-of-the-art methods are built on monoplex approaches and further generalized to multiplex networks. They can be divided into the following categories:

- Modularity-based approaches that generalize the notion of modularity from single-layer to multi-layer networks. Namely, to alleviate the above-mentioned limitations, Principal Modularity Maximization (PMM) [25] has been proposed. First, for each layer, PMM extracts structural features by optimizing its modularity, thus significantly denoising each layer; then, it applies PCA to the concatenated matrix of structural feature matrices to find the principal vectors, followed by K-means to perform the clustering assignment. The main drawback of this approach is that it treats the structural feature matrices of all layers on an equal basis (i.e., it is not capable of distinguishing between more and less informative network layers, or complementary layers). Even though the noise is properly handled by this method, the complementarity aspect cannot be captured well by the integration step. On the other hand, Mucha et al. [16] proposed a modularity-based method for multiplex and also time-dependent networks that can handle layer complementarity. Namely, in addition to inter-layer connections, their generalized modularity function takes into account the coupling between nodes of different layers. To optimize the modularity function, the authors proposed the Generalized Louvain (or GenLouvain, GL) algorithm, which has been shown to be the most efficient algorithm for their generalized modularity function [38]. Despite its efficiency, the method is not designed to find a consensus clustering assignment across different layers; instead, it only provides a node clustering assignment for each individual layer.
- Spectral clustering approaches that generalize the eigendecomposition from a single Laplacian matrix to multiple Laplacian matrices representing network layers. One of the state-of-the-art spectral clustering methods for multiplex graphs is Spectral Clustering on Multi-Layer graphs (SC-ML) [21]. First, for each network layer, SC-ML computes a subspace spanned by the principal eigenvectors of its Laplacian matrix. Then, by interpreting each subspace as a point on a Grassmann manifold, SC-ML merges the subspaces into a consensus subspace from which the composite clusters are extracted. The biggest drawback of this method is the underlying spectral clustering, which always finds tight, small-scale and, in some cases, almost trivial communities. For example, SC-ML cannot adequately handle network layers with missing or weak connections, or layers that have disconnected parts.

- Information diffusion-based approaches that utilize the concept of diffusion on networks to integrate network layers. One such method is Similarity Network Fusion (SNF), proposed by Wang et al. [39]. SNF captures both shared and complementary information in network layers. It computes a fused matrix from the similarity matrices derived from all layers through a parallel, interchanging diffusion process on the network layers. Then, by applying a spectral clustering method to the fused matrix, it extracts communities. Another widely used method that uses the concept of dynamics on networks (i.e., diffusion) is Multiplex Infomap [40]. Namely, Multiplex Infomap optimizes the map equation, which exploits the information-theoretic duality between network dimensionality reduction and the problem of network community detection. However, for noisy networks the diffusion process, i.e., information propagation, is not very efficient and may result in poor clustering performance [41].

- Matrix and tensor factorization-based approaches that utilize collective factorization of the adjacency matrices representing network layers. A few matrix and tensor decomposition-based approaches have been proposed so far [24], [26], [42], [43], [44]. Tang et al. [42] introduced Linked Matrix Factorization (LMF), which fuses information from multiple network layers by factorizing each adjacency matrix into a layer-specific factor and a factor common to all network layers. Dong et al. [26] introduced Spectral Clustering with Generalized Eigendecomposition (SC-GED), which factorizes Laplacian matrices instead of adjacency matrices. Papalexakis et al. [43] proposed GraphFuse, a method for clustering multi-layer networks based on sparse PARAllel FACtor (PARAFAC) decomposition [45] with non-negativity constraints. A similar approach was adopted by Gauvin et al. [44], who used PARAFAC decomposition for time-varying networks. Cheng et al. introduced Co-regularized Graph Clustering based on NMF (CGC-NMF). They factorize each adjacency matrix using symmetric NMF while keeping the Euclidean distance between their non-negative low-dimensional representations small. As already pointed out in Section 1, one of the major limitations of all of these factorization methods is that they treat each network layer on an equal basis and, unlike PMM or SC-ML, for example, they cannot filter out irrelevant information or noise.

To alleviate the drawbacks of the aforementioned methods, the NF-CCE framework is detailed in the following section. It consists of four models, where each layer is first denoised by computing its non-negative low-dimensional subspace representation. Then, the low-dimensional subspaces are merged into a consensus subspace whose non-negativity enables a clustering interpretation. The models are conceptually similar to the SC-ML method, since they use the same merging technique to find the common subspace of all layers.

3 PROPOSED FRAMEWORK
Here, we present four novel methods that are built upon four non-negative matrix factorization models, SNMF [19], PNMF [46], SNMTF [47] and Semi-NMTF [48], extended for the fusion and clustering of multiplex networks. Since the derivation of each method is similar, we present them in a unified framework, namely NF-CCE. NF-CCE extracts composite communities from a multiplex network consisting of N layers. In particular, given an N-layered multiplex network represented by adjacency matrices {A^(1), ..., A^(N)}, NF-CCE consists of two steps:

Step 1. For each network layer, i, we obtain its non-negative, low-dimensional representation, H^(i), under column orthonormality constraints, i.e., H^(i)T H^(i) = I, by using any of the non-negative factorization methods mentioned above.

Step 2. We fuse the low-dimensional representations into a common, consensus representation, H, by proposing a collective matrix factorization model. That is, we collectively decompose all adjacency matrices, A^(i), into a common matrix, H, whilst enforcing the non-negative low-dimensional representations of the network layers, H^(i) (computed in the previous step), to be close to the consensus low-dimensional representation, H. The general objective function capturing these two properties is written as follows:

  min_{H >= 0} J = sum_{i=1}^{N} J^(i)(H; A^(i)) + alpha * sum_{i=1}^{N} J_c^(i)(H; H^(i)),   (1)

where J^(i) is the objective function for clustering the ith layer and J_c^(i) is the loss function quantifying the inconsistency between each low-dimensional representation, H^(i), computed in Step 1, and the consensus representation, H. Below we provide the details of the second step for each individual factorization technique.

3.1 Collective SNMF (CSNMF)
We factorize each individual adjacency matrix using Symmetric NMF in the following way:

  A^(i) ~ H H^T,

under the following constraints: H >= 0 and H^T H = I, where i = 1, ..., N. The first part of our general objective function in Step 2 (Eq. (1)) has the following form:

  J^(i)(H; A^(i)) = || A^(i) - H H^T ||_F^2,   (2)

where H is the consensus, non-negative low-dimensional matrix, and ||.||_F denotes the Frobenius norm.
TABLE 1
Multiplicative Update Rules (MUR) for Single-Layer and Multiplex Network Analysis
3.3 Collective SNMTF (CSNMTF)
We tri-factorize each individual adjacency matrix using Symmetric NMTF in the following way:

  A^(i) ~ H S^(i) H^T,

under the following constraints: H >= 0 and H^T H = I, where i = 1, ..., N. The first part of our general objective function in Step 2 (Eq. (1)) has the following form:

  J^(i)(H; A^(i), S^(i)) = || A^(i) - H S^(i) H^T ||_F^2,   (4)

where H is the consensus low-dimensional matrix. In the derivation of our algorithm, we distinguish between two cases. In the first case, we consider the S^(i) matrices to be non-negative, i.e., S^(i) >= 0. We call that case CSNMTF. In the second case, we consider the S^(i) matrices to have both positive and negative entries. We call this case collective symmetric semi-NMTF, or CSsemi-NMTF (or CSsNMTF).

The distance between two subspaces, span(H_i) and span(H_j), can be measured by the squared projection distance

  d_proj^2(H_i, H_j) = sum_{i=1}^{k} sin^2 theta_i = k - sum_{i=1}^{k} cos^2 theta_i = k - tr(H_i H_i^T H_j H_j^T),

where {theta_i}_{i=1}^{k} are the principal angles between the k-dimensional subspaces span(H_i) and span(H_j). To find a consensus subspace, H, we factorize all the adjacency matrices, A^(i), and minimize the distance between their subspaces and the consensus subspace on the Grassmann manifold. Following this approach, we can write the second part of our general objective function in the following way:

  J_c^(i)(H; H^(i)) = k - tr(H H^T H^(i) H^(i)T) = (1/2) || H H^T - H^(i) H^(i)T ||_F^2.   (5)
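The projection distance used in Eq. (5) can be checked numerically. The sketch below is our own illustration (names are ours): it builds orthonormal bases via QR and verifies the identity against the principal angles, whose cosines are the singular values of H_i^T H_j.

```python
import numpy as np

def proj_distance_sq(Hi, Hj):
    """Squared projection distance k - tr(Hi Hi^T Hj Hj^T) between the
    k-dimensional subspaces span(Hi) and span(Hj); the columns of Hi
    and Hj are assumed orthonormal (Hi^T Hi = I)."""
    k = Hi.shape[1]
    return k - np.trace(Hi @ Hi.T @ Hj @ Hj.T)

rng = np.random.default_rng(0)
# Orthonormal bases for two random 3-dimensional subspaces of R^10.
Hi, _ = np.linalg.qr(rng.standard_normal((10, 3)))
Hj, _ = np.linalg.qr(rng.standard_normal((10, 3)))

d2 = proj_distance_sq(Hi, Hj)
# Identical subspaces are at distance 0, and the distance is symmetric
# and bounded by k, consistent with d^2 = sum_i sin^2(theta_i).
```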
3.5 Derivation of the General Multiplicative Update Rule
In Step 1, we use well-known non-negative factorization techniques, namely SNMF, PNMF, SNMTF and Ssemi-NMTF, for which the update rules for computing the low-dimensional non-negative matrices, H^(i), have been provided in the corresponding papers [19], [46], [47], [48], respectively. They are summarized in Table 1. As for Step 2, we derive the update rules for each of the collective factorization techniques presented in Sections 3.1, 3.2 and 3.3. The details of the derivation are given in Section 2 of the online supplementary material, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TPAMI.2018.2821146. Here we provide a general update rule for Eq. (1).

We minimize the general objective function shown in Eq. (1) under the following constraints: H >= 0 and H^T H = I. Namely, we adopt the idea from Ding et al. [47] to impose the orthonormality constraint on the H matrix, i.e., H^T H = I, which has been shown to lead to a more rigorous clustering interpretation [47]. Moreover, assignments of network nodes to composite communities can readily be done by examining the entries in the rows of the H matrix. Namely, we can interpret the matrix H in R^{n x k} as the cluster indicator matrix, where the entries in the ith row (after row normalization) can be interpreted as the posterior probabilities that node i belongs to each of the k composite communities. In all our experiments, we apply a hard clustering procedure, where a node is assigned to the cluster with the largest probability value.

We derive the update rule for matrix H for minimizing the objective function (Eq. (1)) following the procedure from constrained optimization theory [49]. Specifically, we follow the strategy employed in the derivation of NMF [27] to obtain a multiplicative update rule for the H matrix that can be used for finding a local minimum of the optimization problem (Eq. (1)).

The derivative of the objective function (Eq. (1)) with respect to H is as follows:

  grad_H J = sum_{i=1}^{N} grad_H J^(i)(H; A^(i)) - alpha * sum_{i=1}^{N} H^(i) H^(i)T H,   (6)

where the first term under the summation can be decomposed into two non-negative terms, namely:

  grad_H J^(i)(H; A^(i)) = [grad_H J^(i)(H; A^(i))]^+ - [grad_H J^(i)(H; A^(i))]^-,

where [grad_H J^(i)(H; A^(i))]^+ >= 0 and [grad_H J^(i)(H; A^(i))]^- >= 0 are non-negative terms. Depending on the type of collective factorization technique presented in Sections 3.1, 3.2 or 3.3, the first term represents the derivative of the corresponding objective function, i.e., Eqs. (2), (3) or (4), respectively. The second term represents the derivative of Eq. (5) with respect to H.

To incorporate the orthonormality constraint into the update rule, we introduce the concept of the natural gradient, following the work of Panagakis et al. [23]. Namely, we show in Section 3.4 that the columns of the H matrix span a vector subspace known as the Grassmann manifold G(k, n), i.e., span(H) in G(k, n) [23]. Using that, Amari [22] showed that when an optimization problem is defined over a Grassmann manifold, the ordinary gradient of the optimization function (Eq. (6)) does not represent its steepest direction, but the natural gradient does [22]. Therefore, we define a natural gradient to optimize our objective function (Eq. (1)) under the orthonormality constraint.

Following Panagakis et al. [23], the natural gradient of J on the Grassmann manifold at H, denoted ngrad_H J, can be written in terms of the ordinary gradient as follows:

  ngrad_H J = grad_H J - H H^T grad_H J,   (7)

where grad_H J is the ordinary gradient given in Eq. (6). Following the Karush-Kuhn-Tucker (KKT) complementarity condition [49] and preserving the non-negativity of H, the general update rule for the H matrix using the natural gradient is as follows:

  H_jk <- H_jk * [ngrad_H J]^-_jk / [ngrad_H J]^+_jk,   (8)

where

  [ngrad_H J]^- = H H^T sum_{i=1}^{N} [grad_H J^(i)(H; A^(i))]^+ + sum_{i=1}^{N} [grad_H J^(i)(H; A^(i))]^- + alpha * sum_{i=1}^{N} H^(i) H^(i)T H,

  [ngrad_H J]^+ = H H^T sum_{i=1}^{N} [grad_H J^(i)(H; A^(i))]^- + sum_{i=1}^{N} [grad_H J^(i)(H; A^(i))]^+ + alpha * H H^T sum_{i=1}^{N} H^(i) H^(i)T H.

The concrete update rule for each collective factorization method is summarized in Table 1 and united within the NF-CCE algorithm (see Algorithm 1).

Algorithm 1. NF-CCE
Input: Adjacency matrices A^(i) for each network layer i = 1, ..., N; number of clusters k; parameter alpha; factorization technique: FACTORIZATION in {SNMF, PNMF, SNMTF}
Output: Consensus cluster indicator matrix H
switch (FACTORIZATION)
  case 'SNMF':
    for i in [1, N] do
      H^(i) <- FACTORIZATION(A^(i), k)
    end for
    A_avg <- sum_{i=1}^{N} (A^(i) + (alpha/2) H^(i) H^(i)T)
    H <- FACTORIZATION(A_avg, k)
  case 'PNMF':
    for i in [1, N] do
      H^(i) <- FACTORIZATION(A^(i), k)
    end for
    A_avg <- sum_{i=1}^{N} (A^(i) A^(i)T + alpha * H^(i) H^(i)T)
    H <- FACTORIZATION(A_avg, k)
  case 'SNMTF':
    for i in [1, N] do
      (H^(i), S^(i)) <- FACTORIZATION(A^(i), k)
    end for
    H <- CSNMTF({A^(i)}_{i=1}^{N}, {H^(i)}_{i=1}^{N}, {S^(i)}_{i=1}^{N}, k)
end switch
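Under our reading of Algorithm 1, the SNMF branch can be sketched as follows. This is an illustrative sketch, not the authors' implementation: `symnmf` is a generic multiplicative-update symmetric NMF standing in for the Table 1 factorization, and all names and the toy layers are ours.

```python
import numpy as np

def symnmf(A, k, n_iter=300, eps=1e-9, seed=0):
    # Stand-in single-layer factorization: A ~ H H^T, H >= 0, via the
    # standard multiplicative update H <- H * (A H) / (H H^T H).
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[0], k))
    for _ in range(n_iter):
        H *= (A @ H) / (H @ (H.T @ H) + eps)
    return H

def nf_cce_snmf(layers, k, alpha=1.0):
    """NF-CCE, SNMF case, as we read Algorithm 1:
    Step 1: denoise each layer into its own low-dimensional factor H^(i);
    Step 2: factorize the aggregate sum_i (A^(i) + (alpha/2) H^(i) H^(i)^T)."""
    Hs = [symnmf(A, k, seed=i) for i, A in enumerate(layers)]
    A_avg = sum(A + 0.5 * alpha * (H @ H.T) for A, H in zip(layers, Hs))
    H = symnmf(A_avg, k)
    return H.argmax(axis=1)  # hard consensus cluster assignment

# Two toy layers over the same 6 nodes, each noisy but sharing the
# planted communities {0, 1, 2} and {3, 4, 5}.
rng = np.random.default_rng(42)

def noisy_layer(p_in, p_out):
    P = np.full((6, 6), p_out)
    P[:3, :3] = p_in
    P[3:, 3:] = p_in
    A = (rng.random((6, 6)) < P).astype(float)
    A = np.triu(A, 1)
    return A + A.T  # symmetric, no self-loops

layers = [noisy_layer(0.9, 0.1), noisy_layer(0.9, 0.1)]
labels = nf_cce_snmf(layers, k=2)
```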
4 EXPERIMENTS
We test our methods on synthetic as well as on real-world data. We designed synthetic multiplex networks with clear ground truth information and different properties in terms of noise and complementary information in the network layers. The goal is to assess the robustness of our methods against noise and their ability to handle the complementary information contained in layers. The real-world multiplex networks are taken from diverse experimental studies to demonstrate the applicability of our methods in a broad spectrum of disciplines. Namely, we consider social and biological networks, networks of mobile phone communications, brain networks and networks constructed from bibliographic data. The biological networks are treated as a special case because of their lack of ground truth clusters; we provide a detailed analysis of such networks based on the functional annotations of their nodes (genes). We present the results of a comparative analysis of our proposed methods against the state-of-the-art methods described in Section 2.2. Specifically, we compare our methods against PMM, SC-ML, SNF, LMF, GraphFuse and CGC-NMF. Moreover, we adopt the following single-layer methods as our baselines: SNMF, SNMTF, PNMF and MM (modularity maximization).

4.1 Synthetic Multiplex Networks
We generate two sets of synthetic multiplex networks. The first type, which we denote SYNTH-C, is designed to demonstrate complementary information in layers, whereas the second type, which we denote SYNTH-N, is designed to demonstrate different levels of noise between the communities contained in layers. Our synthetic networks are generated by using the planted partition model [50]. The procedure is as follows: we choose the total number of nodes, n, partitioned into N communities of equal or different sizes. For each layer, we split the corresponding adjacency matrix into blocks defined by the partition. Entries in each diagonal block are filled with ones randomly, with probability p_ii, representing the within-community probability, also referred to as the community edge density. We also add random noise between each pair of blocks, ij, with probability p_ij, representing the between-community probability. The larger the values of p_ij, the harder the clustering; similarly, the smaller the values of p_ii, the harder the clustering. We vary these probabilities across the layers to simulate complementary information and noise in the following way:

SYNTH-C. We generate two-layer multiplex networks with n = 200 nodes and N = 2 communities with an equal number of nodes each. We generate 11 different multiplex networks with different amounts of information shared between the two layers. Namely, we vary the within-community probability p11 = {0.05, 0.075, ..., 0.3} of the first community of the first layer across the different multiplex networks, while fixing the within-community probability of the second community to p22 = 0.2. In the second layer, we represent the complementary information by fixing the within-community probability of the first community to p11 = 0.2 and varying the within-community probability of the second community, p22 = {0.05, 0.075, ..., 0.3}, across the multiplex networks. For all multiplex networks, we set the same amount of noisy links by fixing the between-community probability to p12 = 0.05.

SYNTH-N. Similarly to SYNTH-C, we generate two-layer multiplex networks with two communities (n and N are the same as in SYNTH-C). We fix the within-community probabilities of both communities in both layers to p11 = 0.3 and p22 = 0.3 across all multiplex networks. We vary the between-community probability p12 = {0.02, 0.04, ..., 0.2} of the first layer, while keeping the between-community probability of the second layer fixed at p12 = 0.02 across all multiplex networks.

For each set of within-community, p_ii, and between-community, p_ij, probabilities, we generate 100 different random realizations of multiplex networks. This allows us to compute the average performance and the standard error for each artificial multiplex network. The spy plots of the adjacency matrices representing the layers of SYNTH-C and SYNTH-N are given in Section 1 of the online supplementary material, available online.

4.2 Real-World Multiplex Networks
Below we provide a brief description of the real-world multiplex networks used in our comparative study:

Bibliographic Data, CiteSeer. The data are adopted from [51]. The network consists of n = 3,312 papers belonging to 6 different research categories, which we grouped into k = 3 pairs of categories. We consider these categories as the ground truth classes. We construct two layers: the citation layer, representing the citation relations between papers extracted from the paper citation records; and the paper similarity layer, constructed by extracting a vector of the 3,703 most frequent and unique words for each paper and then computing the cosine similarity between each pair of papers. We construct the k-nearest neighbor graph from the similarity matrix by connecting each paper with its 10 most similar papers.

Bibliographic Data, CoRA. The data are adopted from [51]. The network consists of 1,662 machine learning papers grouped into k = 3 different research categories, namely Genetic Algorithms, Neural Networks and Probabilistic Methods. We use the same approach as for the CiteSeer dataset to construct the citation and similarity layers.

Mobile Phone Data (MPD). The data are adopted from [26]. The network consists of N = 3 layers representing different modes of mobile phone communication between n = 87 phone users on the MIT campus; namely, the layers represent physical location, bluetooth scans and phone calls. The ground truth clusters are known and manually annotated.

Social Network Data (SND). The data are adopted from [52]. The dataset represents the multiplex social network of a corporate law partnership, consisting of N = 3 layers with three types of edges, namely co-work, friendship and advice. Each layer has n = 71 nodes representing employees in a law firm. Nodes have many attributes. We use the location of the employees' offices as well as their status in the firm as the ground truth for clusters, and perform two different experiments, namely SND(o) and SND(s), respectively.

Worm Brain Networks (WBN). The data are retrieved from WormAtlas,2 i.e., from the original study of White et al. [53]. The network consists of n = 279 nodes representing neurons, connected via N = 5 different types of links (i.e., layers) representing five different types of synapses. We use the neuron types as the ground truth clusters.

World Trade Networks (WTN). The data represent different types of trade relationships (import/export) among n = 183 countries in the world [54]. The network consists of 339 layers representing different products (goods). Since, for some

2. http://www.wormatlas.org/
TABLE 3
Clustering Accuracy Measures for the Methods (from Left to Right): MM, SNMF, PNMF, SNMTF, SsNMTF, PMM, SNF, SC-ML, LMF, GraphFuse, CGC-NMF, GL, Infomap, CSNMF, CPNMF, CSNMTF, and CSsNMTF, Applied to Real-World Multi-Layer Networks (from Top to Bottom): CiteSeer, CoRA, MPD, SND, WBN, WTN, HBN, YBN, and MBN. The AVG Row Reports the Average Performance of Each Method Over All Datasets
Dataset   Metric  MM     SNMF   PNMF   SNMTF  SsNMTF PMM    SNF    SC-ML  LMF    GF     CGC    GL     Infomap CSNMF  CPNMF  CSNMTF CSsNMTF
CiteSeer  Purity  0.500  0.407  0.405  0.377  0.371  0.302  0.214  0.419  0.235  0.512  0.212  0.669  0.509   0.501  0.519  0.416  0.404
CiteSeer  NMI     0.187  0.222  0.221  0.170  0.164  0.145  0.023  0.191  0.013  0.211  0.013  0.323  0.265   0.237  0.216  0.172  0.195
CiteSeer  ARI     0.152  0.059  0.057  0.042  0.038  0.008  0.001  0.169  0.005  0.201  0.001  0.155  0.157   0.207  0.185  0.093  0.089
CoRA      Purity  0.706  0.669  0.660  0.669  0.669  0.496  0.733  0.787  0.492  0.642  0.678  0.781  0.922   0.802  0.790  0.683  0.684
CoRA      NMI     0.340  0.385  0.353  0.382  0.382  0.085  0.449  0.480  0.002  0.201  0.389  0.418  0.373   0.514  0.480  0.346  0.390
CoRA      ARI     0.257  0.280  0.247  0.277  0.277  0.030  0.470  0.485  0.001  0.209  0.296  0.334  0.016   0.491  0.470  0.279  0.288
MPD       Purity  0.563  0.678  0.666  0.620  0.678  0.689  0.620  0.701  0.471  0.689  0.678  0.678  0.701   0.701  0.655  0.655  0.724
MPD       NMI     0.313  0.471  0.466  0.384  0.473  0.533  0.395  0.495  0.191  0.565  0.457  0.467  0.410   0.504  0.451  0.458  0.521
MPD       ARI     0.147  0.268  0.259  0.228  0.272  0.383  0.280  0.379  0.029  0.411  0.357  0.372  0.115   0.394  0.368  0.346  0.422
SND(o)    Purity  0.929  0.943  0.943  0.676  0.943  0.943  0.943  0.943  0.788  0.943  0.943  0.929  0.676   0.943  0.943  0.943  0.943
SND(o)    NMI     0.618  0.681  0.681  0.133  0.681  0.675  0.689  0.681  0.303  0.675  0.673  0.618  0.001   0.681  0.685  0.773  0.678
SND(o)    ARI     0.460  0.493  0.493  0.021  0.493  0.477  0.515  0.493  0.239  0.477  0.472  0.460  0.012   0.493  0.503  0.811  0.484
SND(s)    Purity  0.619  0.577  0.633  0.634  0.619  0.591  0.633  0.591  0.633  0.619  0.662  0.671  0.507   0.633  0.633  0.747  0.605
SND(s)    NMI     0.038  0.025  0.052  0.055  0.041  0.037  0.057  0.030  0.053  0.045  0.078  0.097  0.001   0.053  0.053  0.276  0.034
SND(s)    ARI     0.024  0.012  0.058  0.058  0.043  0.022  0.058  0.021  0.059  0.044  0.092  0.089  0.001   0.059  0.059  0.234  0.031
WBN       Purity  0.473  0.512  0.501  0.476  0.523  0.473  0.534  0.272  0.283  0.509  0.516  0.566  0.401   0.577  0.537  0.548  0.530
WBN       NMI     0.333  0.382  0.400  0.327  0.363  0.373  0.425  0.079  0.098  0.426  0.370  0.398  0.355   0.463  0.432  0.404  0.424
WBN       ARI     0.199  0.226  0.213  0.112  0.180  0.130  0.211  0.001  0.009  0.216  0.211  0.185  0.200   0.291  0.233  0.237  0.225
WTN       Purity  0.506  0.475  0.464  0.388  0.284  0.388  0.289  0.497  0.453  0.415  0.278  0.783  0.453   0.579  0.420  0.371  0.420
WTN       NMI     0.231  0.269  0.242  0.176  0.077  0.205  0.073  0.226  0.191  0.176  0.072  0.426  0.273   0.322  0.172  0.154  0.155
WTN       ARI     0.080  0.114  0.114  0.073  0.001  0.039  0.005  0.133  0.094  0.107  0.002  0.068  0.273   0.160  0.094  0.035  0.088
AVG       Purity  0.613  0.608  0.610  0.548  0.583  0.554  0.566  0.601  0.479  0.618  0.566  0.725  0.595   0.676  0.642  0.623  0.616
AVG       NMI     0.294  0.347  0.345  0.232  0.311  0.293  0.301  0.311  0.121  0.328  0.293  0.392  0.239   0.396  0.355  0.369  0.342
AVG       ARI     0.188  0.207  0.205  0.115  0.186  0.155  0.220  0.240  0.062  0.237  0.204  0.237  0.110   0.300  0.273  0.290  0.232
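Table 3 scores each method by Purity, NMI, and ARI against the ground-truth clusters. All three measures can be computed from the contingency table of true classes versus predicted clusters; a self-contained sketch follows (the square-root normalisation in NMI is one common convention, assumed here since the exact variant used in the paper is not stated in this excerpt):

```python
import numpy as np

def contingency(true, pred):
    """C[i, j] = number of items with true class i and predicted cluster j."""
    t = np.unique(true, return_inverse=True)[1]
    p = np.unique(pred, return_inverse=True)[1]
    C = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(C, (t, p), 1)
    return C

def purity(true, pred):
    # Each predicted cluster votes for its majority ground-truth class.
    C = contingency(true, pred)
    return C.max(axis=0).sum() / C.sum()

def nmi(true, pred):
    # Mutual information between the two labelings, sqrt-normalised by entropies.
    P = contingency(true, pred) / len(np.asarray(true))
    pi, pj = P.sum(axis=1), P.sum(axis=0)
    nz = P > 0
    mi = (P[nz] * np.log(P[nz] / np.outer(pi, pj)[nz])).sum()
    ent = lambda q: -(q[q > 0] * np.log(q[q > 0])).sum()
    return mi / np.sqrt(ent(pi) * ent(pj))

def ari(true, pred):
    # Rand index corrected for chance via its expectation under random labelings.
    C = contingency(true, pred)
    comb2 = lambda x: x * (x - 1) / 2.0
    sum_ij = comb2(C).sum()
    a, b = comb2(C.sum(axis=1)).sum(), comb2(C.sum(axis=0)).sum()
    expected = a * b / comb2(C.sum())
    return (sum_ij - expected) / ((a + b) / 2.0 - expected)

# Toy example: one of six items placed in the wrong cluster.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1]
scores = (purity(y_true, y_pred), nmi(y_true, y_pred), ari(y_true, y_pred))
```

Note that Purity and NMI are invariant to permuting cluster labels, while Purity alone can be inflated by over-clustering, which is why the table reports all three measures together.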
TABLE 4
Clustering Accuracy Measures for the Methods (from Top to Bottom): MM, SNMF, PNMF, SNMTF, SsNMTF, PMM, SNF, SC-ML, LMF, GraphFuse, CGC-NMF, GL, Infomap, CSNMF, CPNMF, CSNMTF, and CSsNMTF, Applied to Biological Multi-Layer Networks (from Left to Right): HBN, YBN, and MBN
[31] U. von Luxburg, "A tutorial on spectral clustering," Statist. Comput., vol. 17, no. 4, pp. 395-416, 2007.
[32] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning. New York, NY, USA: Springer, 2001.
[33] P. W. Holland, K. B. Laskey, and S. Leinhardt, "Stochastic blockmodels: First steps," Soc. Netw., vol. 5, no. 2, pp. 109-137, 1983.
[34] M. G. Everett and S. P. Borgatti, "Analyzing clique overlap," Connections, vol. 21, no. 1, pp. 49-61, 1998.
[35] S. Zhang, H. Zhao, and M. K. Ng, "Functional module analysis for gene coexpression networks with network integration," IEEE/ACM Trans. Comput. Biol. Bioinformatics, vol. 12, no. 5, pp. 1146-1160, Sep. 2015.
[36] M. Berlingerio, M. Coscia, and F. Giannotti, "Finding and characterizing communities in multidimensional networks," in Proc. Int. Conf. Adv. Soc. Netw. Anal. Mining, Jul. 2011, pp. 490-494.
[37] M. A. Rodriguez and J. Shinavier, "Exposing multi-relational networks to single-relational network analysis algorithms," J. Informetrics, vol. 4, no. 1, pp. 29-41, 2010.
[38] I. S. Jutla, L. G. Jeub, and P. J. Mucha, "A generalized Louvain method for community detection implemented in MATLAB," 2011. [Online]. Available: http://netwiki.amath.unc.edu/GenLouvain
[39] B. Wang, A. M. Mezlini, F. Demir, M. Fiume, Z. Tu, M. Brudno, B. Haibe-Kains, and A. Goldenberg, "Similarity network fusion for aggregating data types on a genomic scale," Nature Methods, vol. 11, no. 3, pp. 333-337, 2014.
[40] M. De Domenico, A. Lancichinetti, A. Arenas, and M. Rosvall, "Identifying modular flows on multilayer networks reveals highly overlapping organization in interconnected systems," Phys. Rev. X, vol. 5, Mar. 2015, Art. no. 011027.
[41] Y. Wang and X. Qian, "Biological network clustering by robust NMF," in Proc. 5th ACM Conf. Bioinformatics Comput. Biol. Health Informat., 2014, pp. 688-689.
[42] W. Tang, Z. Lu, and I. S. Dhillon, "Clustering with multiple graphs," in Proc. 9th IEEE Int. Conf. Data Mining, 2009, pp. 1016-1021.
[43] E. E. Papalexakis, L. Akoglu, and D. Ienco, "Do more views of a graph help? Community detection and clustering in multi-graphs," in Proc. 16th Int. Conf. Inf. Fusion, Jul. 2013, pp. 899-905.
[44] L. Gauvin, A. Panisson, and C. Cattuto, "Detecting the community structure and activity patterns of temporal networks: A non-negative tensor factorization approach," PLoS One, vol. 9, no. 1, pp. 1-13, Jan. 2014.
[45] R. A. Harshman, "Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis," UCLA Working Papers Phonetics, vol. 16, no. 1, 1970, Art. no. 84.
[46] Z. Yang and E. Oja, "Linear and nonlinear projective nonnegative matrix factorization," IEEE Trans. Neural Netw., vol. 21, no. 5, pp. 734-749, May 2010.
[47] C. Ding, T. Li, W. Peng, and H. Park, "Orthogonal nonnegative matrix T-factorizations for clustering," in Proc. 12th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2006, pp. 126-135.
[48] C. H. Q. Ding, T. Li, and M. I. Jordan, "Convex and semi-nonnegative matrix factorizations," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 1, pp. 45-55, Jan. 2010.
[49] S. Boyd and L. Vandenberghe, Convex Optimization. New York, NY, USA: Cambridge Univ. Press, 2004.
[50] A. Condon and R. M. Karp, "Algorithms for graph partitioning on the planted partition model," Random Struct. Algorithms, vol. 18, no. 2, pp. 116-140, Mar. 2001.
[51] P. Sen, G. M. Namata, M. Bilgic, L. Getoor, B. Gallagher, and T. Eliassi-Rad, "Collective classification in network data," AI Mag., vol. 29, no. 3, pp. 93-106, 2008.
[52] T. A. B. Snijders, P. E. Pattison, G. L. Robins, and M. S. Handcock, "New specifications for exponential random graph models," Sociol. Methodology, vol. 36, no. 1, pp. 99-153, 2006.
[53] J. G. White, E. Southgate, J. N. Thomson, and S. Brenner, "The structure of the nervous system of the nematode Caenorhabditis elegans," Philosoph. Trans. Roy. Soc. London B: Biological Sci., vol. 314, no. 1165, pp. 1-340, 1986.
[54] M. De Domenico, V. Nicosia, A. Arenas, and V. Latora, "Structural reducibility of multilayer networks," Nature Commun., vol. 6, 2015, Art. no. 6864.
[55] S. Mostafavi and Q. Morris, "Fast integration of heterogeneous data sources for predicting gene function with limited annotation," Bioinformatics, vol. 26, no. 14, pp. 1759-1765, 2010.
[56] Y. Zhao and G. Karypis, "Empirical and theoretical comparisons of selected criterion functions for document clustering," Mach. Learn., vol. 55, no. 3, pp. 311-331, Jun. 2004.
[57] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. New York, NY, USA: Cambridge Univ. Press, 2008.
[58] M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, S. S. Dwight, J. T. Eppig, M. A. Harris, D. P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J. C. Matese, J. E. Richardson, M. Ringwald, G. M. Rubin, and G. Sherlock, "Gene Ontology: Tool for the unification of biology," Nature Genetics, vol. 25, no. 1, pp. 25-29, May 2000.
[59] Y.-K. Shih and S. Parthasarathy, "Identifying functional modules in interaction networks through overlapping Markov clustering," Bioinformatics, vol. 28, no. 18, pp. i473-i479, 2012.

Vladimir Gligorijevic received the BSc and MSc degrees from the Department of Physics, University of Belgrade, Serbia, and the PhD degree from the Department of Computing, Imperial College London, United Kingdom. He is a research fellow with the Center for Computational Biology (CCB), Flatiron Institute, Simons Foundation. His research interests include machine learning and network theory with applications in social sciences, computational biology, and bioinformatics.

Yannis Panagakis received the BSc degree in informatics and telecommunication from the University of Athens, Greece, and the MSc and PhD degrees from the Department of Informatics, Aristotle University of Thessaloniki. He is a lecturer (assistant professor equivalent) in computer science with Middlesex University London and a research faculty member with the Department of Computing, Imperial College London. His research interests include machine learning and its interface with signal processing, high-dimensional statistics, and computational optimization. Specifically, he is working on statistical models and algorithms for robust and efficient learning from high-dimensional data and signals conveying audio, visual, behavioural, medical, and social information. He has been awarded the prestigious Marie Curie Fellowship, among various scholarships and awards for his studies and research. His work has been published in leading journals and conference proceedings, including the IEEE Transactions on Pattern Analysis and Machine Intelligence, the International Journal of Computer Vision, CVPR, and ICCV. He currently serves as the managing editor-in-chief of the Image and Vision Computing Journal. He co-organised the British Machine Vision Conference (BMVC) in 2017 and numerous workshops in conjunction with international conferences. He is a member of the IEEE.

Stefanos Zafeiriou is currently a reader in pattern recognition/statistical machine learning for computer vision with the Department of Computing, Imperial College London, London, United Kingdom, and a distinguished research fellow with the University of Oulu under the Finnish Distinguished Professor Programme. He was a recipient of the prestigious Junior Research Fellowship from Imperial College London in 2011 to start his own independent research group. He was the recipient of the President's Medal for Excellence in Research Supervision for 2016. He has received various awards during his doctoral and post-doctoral studies. He currently serves as an associate editor of the IEEE Transactions on Cybernetics and the Image and Vision Computing Journal. He has been a guest editor of over six journal special issues and has co-organised over nine workshops/special sessions on face analysis topics in top venues such as CVPR/FG/ICCV/ECCV (including two very successful challenges run at ICCV13 and ICCV15 on facial landmark localisation/tracking). He has more than 5,000 citations to his work and an h-index of 37. He is the general chair of BMVC 2017 and a member of the IEEE.