
GRAE: Graph Recurrent Autoencoder for Multi-view Graph Clustering

Ercong Cai, South China Normal University, Guangzhou, Guangdong Province, China (caiercong@m.scnu.edu.cn)
Jin Huang∗, South China Normal University, Guangzhou, Guangdong Province, China (huangjin@m.scnu.edu.cn)
Bosong Huang, South China Normal University, Guangzhou, Guangdong Province, China
Shi Xu, South China Normal University, Guangzhou, Guangdong Province, China
Jia Zhu, Zhejiang Normal University, Jinhua, Zhejiang Province, China
ABSTRACT
Multi-view graph clustering aims to discover communities or groups in a graph with multiple views, which usually supplies more comprehensive information than single-view graph clustering. With the increasing scale of complex real-world data, multi-view graph clustering has drawn much attention. It has a solid theoretical foundation and high effectiveness in applications such as data mining and social network analysis. However, most existing methods obtain the clustering result only through shared feature representations, defectively overlooking the unique features of the individual views. To fill this gap, a Graph Recurrent AutoEncoder (GRAE) is proposed for attributed multi-view graph clustering, which attains a good node representation by learning the features of the different views. Specifically, we first design a global graph autoencoder and a partial graph autoencoder to extract the shared features of all views and the unique features of each view, respectively, which better represent the nodes in the graph. Then, from the perspective of representation fusion, we adopt an adaptive weight learning method to fuse the different features according to their importance. Moreover, we investigate a self-training clustering method that optimizes a clustering objective to improve the clustering effect. Finally, we conducted a large number of experiments on three real-world datasets, demonstrating the superior performance of our proposed GRAE model on the multi-view graph clustering task.

CCS CONCEPTS
• Computer systems organization → Embedded systems; Redundancy; Robotics; • Networks → Network reliability.

KEYWORDS
multi-view; graph clustering; recurrent autoencoder; adaptive weight learning

ACM Reference Format:
Ercong Cai, Jin Huang, Bosong Huang, Shi Xu, and Jia Zhu. 2021. GRAE: Graph Recurrent Autoencoder for Multi-view Graph Clustering. In 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence (ACAI'21), December 22–24, 2021, Sanya, China. ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3508546.3508618

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ACAI'21, December 22–24, 2021, Sanya, China. © 2021 Association for Computing Machinery. ACM ISBN 978-1-4503-8505-3/21/12...$15.00. https://doi.org/10.1145/3508546.3508618

1 INTRODUCTION
Graph clustering, aiming to partition the nodes of a graph into disjoint groups, is a basic task in graph analysis. Typical applications include biological networks [1], group segmentation [2], community detection [3], brain networks [4], and the structure analysis of communication networks [5, 6]. Nowadays, due to the rapid development of big data and social media, the information existing in social networks shows a trend of diversification. For example, the same user in an author network may have multiple relationships described by multiple graph views, such as a co-author view and a co-conference view. In this sense, building multiple graph views (i.e., a multi-layer network) to represent real-world data is more reasonable and acceptable. In addition, an author can also take representative keywords or labels, such as database or mathematician, as attribute information, and attribute information is sometimes treated as a particular view. We call this complex graph an attributed multi-view graph. Unlike single-view algorithms that only focus on dealing with a single graph view [7–10], attributed multi-view graph clustering methods fully utilize the shared and complementary information among multiple views to achieve competitive clustering results.
Figure 1 shows the generation process of the co-author view and the co-conference view on the DBLP dataset¹ (an author network). Figure 1(a) gives the node types of the DBLP dataset and Figure 1(b) represents the heterogeneous graph of the same dataset. In the heterogeneous graph, a relation between two co-authors can be described by the meta-path Author-Paper-Author (APA). Similarly, the meta-path Author-Paper-Conference-Paper-Author (APCPA) means that two authors have published papers in the same conference. Precisely because of the multiple types of edges and nodes in the heterogeneous graph, traditional single-view graph clustering methods cannot be directly applied to it [11]. Hence, we only retain author-type nodes and separate the different types of edges to generate the co-author view in Figure 1(c) and the co-conference view in Figure 1(d) through the meta-paths APA and APCPA.

¹ https://dblp.uni-trier.de/
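As a concrete illustration of how such meta-path views can be derived, the NumPy sketch below builds co-author (APA) and co-conference (APCPA) adjacency matrices from toy author–paper and paper–conference incidence matrices. The incidence matrices, sizes, and values are invented for this example and are not taken from the paper.

```python
import numpy as np

# Toy incidence matrices (hypothetical values):
# AP[i, j] = 1 if author i wrote paper j; PC[j, k] = 1 if paper j appeared at conference k.
AP = np.array([[1, 0, 0],
               [1, 1, 0],
               [0, 1, 1],
               [0, 0, 1]])          # 4 authors x 3 papers
PC = np.array([[1, 0],
               [1, 0],
               [0, 1]])             # 3 papers x 2 conferences

# Meta-path APA: two authors are linked if they co-authored at least one paper.
A_coauthor = (AP @ AP.T > 0).astype(int)
np.fill_diagonal(A_coauthor, 0)      # drop self-loops

# Meta-path APCPA: two authors are linked if they published at the same conference.
APC = AP @ PC                        # author-conference incidence
A_coconf = (APC @ APC.T > 0).astype(int)
np.fill_diagonal(A_coconf, 0)

print(A_coauthor)
print(A_coconf)
```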

Figure 1: The generation process of co-author view and co-conference view

We can see that author D has the features of authors B and C in the co-author view and of authors A and B in the co-conference view. It is evident that feature B is a shared feature of author D, while features A and C are unique features of author D. Therefore, a graph with multiple views can represent real-world graph data better than a graph with only a single view [12].
Attributed multi-view graph clustering is one of the essential tasks of multi-view learning [13]. Although there is in-depth research on multi-view learning [14–16] and multi-layer network embedding learning [17–19], few works have studied the problem of attributed multi-view graph clustering. Cen et al. [20] propound an attributed multi-view graph embedding method, called General Attributed Multiplex HeTerogeneous Network Embedding (GATNE), which is a unified framework that supports both transductive and inductive learning. Fan et al. [21] focus on the attributed multi-view graph clustering problem and put forward the One2Multi Graph Autoencoder (O2MAC) algorithm based on the shared features of the views. However, all of these methods only use the shared features of multiple views to learn the features of the graph, defectively overlooking that the unique features of multiple views also do an excellent favor for the learning task.
In this paper, we put forward a novel Graph Recurrent AutoEncoder (GRAE) model for attributed multi-view graph clustering to address the above limitations. Specifically, the GRAE model obtains the shared features and the unique features of multiple views through a Global Graph AutoEncoder (GGAE) and a Partial Graph AutoEncoder (PGAE), respectively. GGAE adopts a new graph recurrent neural network to learn the shared feature representation of multiple views. At the same time, PGAE aims to attain the unique features of multiple views; it leverages the graph autoencoder model to learn the unique feature representation of the corresponding view. Next, considering the importance of the different features, we design an adaptive weight learning method to obtain the node representation by fusing the shared features and the unique features. Finally, we propound a self-training clustering method to make the node representation more suitable for clustering and thus achieve better results. We conducted a large number of experiments on three real-world datasets, proving the effectiveness of our proposed GRAE model.
To summarize, our contributions are as follows:
• We emphasize the crucial significance of the inherent correlation between the shared features and the unique features of all views for utilizing the comprehensive information in the graph. We also propose a novel graph recurrent autoencoder model named GRAE to capture this inherent correlation by exploiting rich complementary semantics among the nodes of the graph.
• We design an adaptive weight learning method and a self-training clustering method, which fuse the different features well and produce a more accurate node representation that leads to better clustering results.
• We conduct experimental studies on three real-world datasets. Abundant results show that GRAE can produce a more robust and accurate partition than other state-of-the-art view-based graph clustering methods, which demonstrates the effectiveness of our proposed techniques.

2 NOTATIONS AND PROBLEM DEFINITION
In this section, we first describe some notations and definitions and then give details about the proposed method.
An attributed multi-view graph can be represented as $G = \{V, E_1, \cdots, E_M, X\}$, where $V = \{v_1, \cdots, v_N\}$ consists of $N$ nodes, and $M$ sets of edges $\{E_m\}_{m=1}^{M}$ describe the interactions between nodes in the corresponding $M$ graph views. These $M$ types of interactions can also be described by adjacency matrices $\{A_m \in \mathbb{R}^{N \times N}\}_{m=1}^{M}$, where $A_{m,ij} = 1$ if $e_{ij} \in E_m$ and $A_{m,ij} = 0$ otherwise. $x_i \in X$ indicates a real-valued attribute vector associated with node $v_i$. Here we focus on undirected and unweighted graphs. For clarity, important notations are summarized in Table 1. Then, we formally define the problem as follows:

Table 1: Main Symbols

Notation | Definition
M        | The number of graph views
X        | The node attributes
A_m      | The adjacency matrix of the m-th graph view
Z_m^g    | The m-th embedded representation of GGAE
Z_m^p    | The m-th embedded representation of PGAE
Z        | The node representation of GRAE
D        | The embedding dimension of the global graph encoder
d        | The embedding dimension of the partial graph encoder
T        | The maximum number of training iterations of the total objective function
λ        | Learning rate
α        | The coefficient of the KL divergence loss
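To make the notation concrete, the short NumPy sketch below builds a toy attributed multi-view graph with M = 2 views and N = 4 nodes and checks the properties assumed above (symmetric, unweighted adjacency matrices and a real-valued attribute matrix). The sizes and values are invented for illustration and do not correspond to any dataset used in the paper.

```python
import numpy as np

N, M, F = 4, 2, 3                      # nodes, views, attribute dimension (toy sizes)

# One symmetric, unweighted adjacency matrix per view: A_m[i, j] = 1 iff e_ij is in E_m.
A = [np.array([[0, 1, 1, 0],
               [1, 0, 0, 0],
               [1, 0, 0, 1],
               [0, 0, 1, 0]]),
     np.array([[0, 0, 1, 1],
               [0, 0, 1, 0],
               [1, 1, 0, 0],
               [1, 0, 0, 0]])]

X = np.random.rand(N, F)               # real-valued attribute vector x_i per node

for m, A_m in enumerate(A, start=1):
    assert A_m.shape == (N, N)
    assert np.array_equal(A_m, A_m.T)  # undirected
    assert np.isin(A_m, [0, 1]).all()  # unweighted
    print(f"view {m}: {int(A_m.sum() // 2)} edges")
```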

Attributed Multi-view Graph Clustering: Attributed multi-view graph clustering is designed to partition the nodes in G into K predefined disjoint clusters {C_1, C_2, ..., C_K}, so that nodes within the same cluster are generally (1) close to each other in terms of the graph view structure while distant otherwise, and (2) more likely to have similar attribute values.

3 METHODOLOGY
In this section, we propose a novel model for attributed multi-view graph clustering. The basic idea is to exploit both the shared features of all views and the unique features of each graph view. Specifically, we put forward a novel Graph Recurrent AutoEncoder (GRAE) model, which works as follows: (1) obtaining the shared feature representation and the unique feature representations; (2) designing an adaptive weight learning method to fuse the two kinds of feature representations into the node representation; and (3) proposing a self-training clustering method to optimize the node representation and improve the clustering results. The overall scheme of the GRAE method is shown in Figure 2.

3.1 Global Graph Autoencoder
Although multi-view graph data reflect the relationships between nodes from different aspects, they should still share some common node features. Hence, we propose a Global Graph AutoEncoder (GGAE) model to learn the shared features of multiple views. As shown in Figure 3, GGAE includes three parts: the global graph encoder, the global graph decoder, and the global reconstruction loss.
Global Graph Encoder: We propose a novel Graph Recurrent Neural Network (GRNN) model as the global graph encoder to learn the shared feature representation from multiple views. As shown in Figure 3, GRNN gradually learns the features of the views to obtain more shared features. The adjacency matrices {A_1, ..., A_M} and the node attributes X serve as the input of the global graph encoder. The specific steps are described below.
In the first layer of the graph recurrent neural network, we directly use the node attributes X and the adjacency matrix A_1 as input. Thus the embedded representation $Z_1^g$ is given by Eq. (1).
Starting from the second layer of the graph recurrent neural network, in order to better retain the node attributes X, we fuse the node attributes X with the hidden representation obtained from the previous layer and use them, together with the adjacency matrix of the current layer, as the input. That is, we calculate the hidden representation $Z_2^g$ as shown in Eq. (2).
Finally, we obtain the shared feature representation $Z^g$ of the global graph encoder as shown in Eq. (3).

$$Z_1^g = \sigma\left(\tilde{D}_1^{-\frac{1}{2}} \tilde{A}_1 \tilde{D}_1^{-\frac{1}{2}} X W_{g1}\right), \tag{1}$$

$$Z_2^g = \sigma\left(\tilde{D}_2^{-\frac{1}{2}} \tilde{A}_2 \tilde{D}_2^{-\frac{1}{2}} [Z_1^g, X] W_{g2}\right), \tag{2}$$

$$\vdots$$

$$Z^g = Z_M^g = \sigma\left(\tilde{D}_M^{-\frac{1}{2}} \tilde{A}_M \tilde{D}_M^{-\frac{1}{2}} [Z_{M-1}^g, X] W_{gM}\right), \tag{3}$$

where $X \in \mathbb{R}^{N \times F}$ ($N$ nodes and $F$ features), $W_{g1} \in \mathbb{R}^{F \times D}$ is the filter parameter matrix of the first layer, $\tilde{D}_{1,ii} = \sum_j \tilde{A}_{1,ij}$, and $\tilde{A}_1 = A_1 + I$. Here $I$ is the identity matrix of the same size as $A_1$ and $\sigma(\cdot)$ is the ReLU activation function. Finally, we would like to add that $\{W_{gm} \in \mathbb{R}^{(D+F) \times D}\}_{m=2}^{M}$.
Global Graph Decoder: A global graph decoder is proposed to reconstruct the graph views from the shared feature representation $Z^g$. A global reconstruction loss is then introduced to optimize it and obtain a better shared feature representation. The adjacency matrices of the graph views $\{A_m\}_{m=1}^{M}$ are given as prior values. Inner products allow the rigorous introduction of intuitive geometrical notions, such as the length of a vector or the angle between two vectors. An inner product associates each pair of vectors in the space with a scalar quantity known as the inner product of the vectors and provides the means of defining orthogonality between vectors [22], so we can use the inner product between two nodes to measure their correlation. According to the correlation of the nodes, we judge whether there is an edge between the two nodes to reconstruct the multi-view data $\{A_m\}_{m=1}^{M}$. Since our data have $M$ graph views, we add parameters $\{W_m^g \in \mathbb{R}^{D \times D}\}_{m=1}^{M}$ to better reconstruct the multi-view data. The formula of the global graph decoder is as follows:

$$\hat{A}_m^g = \mathrm{sigmoid}\left(Z^g W_m^g Z^{g\top}\right). \tag{4}$$

We then minimize the sum of the reconstruction errors over all graph views:

$$L_g = \sum_{m=1}^{M} \mathrm{loss}\left(A_m, \hat{A}_m^g\right), \tag{5}$$

where $L_g$ is the global reconstruction loss for all views. Owing to the multi-view architecture of the global graph decoder, the gradient of the global graph decoder is propagated back through the global graph encoder during backpropagation. Therefore, the global graph encoder can capture the shared representation of all views during forward propagation.

3.2 Partial Graph Autoencoder (PGAE)
As discussed before, although we have obtained the shared features, the unique features of each view, which describe the diversity of the different views, are core factors for improving the quality of clustering. Therefore, we propose a Partial Graph AutoEncoder (PGAE) to learn the unique features of each view. Since a graph autoencoder can learn the features of a single graph view well [7, 23], our proposed PGAE is composed of multiple graph autoencoders. Of course, when we obtain the unique features we will also obtain some shared features; this neither increases the data noise nor prevents us from obtaining unique features, but may instead help the node representation gain more shared features. As shown in Figure 4, the PGAE model includes the partial graph encoder, the partial graph decoder, and the partial reconstruction loss.
Partial Graph Encoder: Because a graph convolutional network (GCN) can fully integrate the network structure and the node attributes, multiple GCNs are used as the partial graph encoder to learn the unique features of the different views. Taking the l-th adjacency matrix $A_l$ and the node attributes $X$ as the input of the l-th GCN, the unique feature representation $Z_l^p \in \mathbb{R}^{N \times d}$ of the l-th view can be obtained:

$$Z_l^p = \sigma\left(\tilde{D}_l^{-\frac{1}{2}} \tilde{A}_l \tilde{D}_l^{-\frac{1}{2}} X W_{pl}\right). \tag{6}$$
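To make the two encoders concrete, here is a minimal NumPy sketch of one forward pass: a GRNN-style global encoder that threads [Z_{m-1}^g, X] through the views as in Eqs. (1)–(3), and a single-view GCN layer per view for the partial encoder of Eq. (6). The random weights, toy sizes, and the helper names normalize_adj / gcn_layer / global_encoder / partial_encoder are ours for illustration only and are not taken from the paper.

```python
import numpy as np

def normalize_adj(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} used by both encoders."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A, H, W):
    return np.maximum(normalize_adj(A) @ H @ W, 0.0)   # ReLU plays the role of sigma

def global_encoder(A_list, X, W_list):
    """GRNN global encoder: view m sees [Z_{m-1}^g, X] (Eqs. (1)-(3))."""
    Z = gcn_layer(A_list[0], X, W_list[0])              # Eq. (1): first view uses X only
    for A_m, W_m in zip(A_list[1:], W_list[1:]):
        Z = gcn_layer(A_m, np.concatenate([Z, X], axis=1), W_m)   # Eqs. (2)-(3)
    return Z                                            # shared representation Z^g

def partial_encoder(A_list, X, W_list):
    """One independent GCN per view (Eq. (6)); returns [Z_1^p, ..., Z_M^p]."""
    return [gcn_layer(A_l, X, W_l) for A_l, W_l in zip(A_list, W_list)]

# Toy shapes: N=4 nodes, F=3 attributes, M=2 views, D=8, d=4 (illustrative only).
rng = np.random.default_rng(0)
N, F, D, d = 4, 3, 8, 4
A_list = [rng.integers(0, 2, size=(N, N)) for _ in range(2)]
A_list = [np.triu(a, 1) + np.triu(a, 1).T for a in A_list]   # make symmetric, no self-loops
X = rng.random((N, F))
Wg = [rng.random((F, D)), rng.random((D + F, D))]            # W_g1, W_g2
Wp = [rng.random((F, d)) for _ in A_list]

Z_g = global_encoder(A_list, X, Wg)       # N x D shared features
Z_p = partial_encoder(A_list, X, Wp)      # list of N x d unique features
```

In a real training run these weights would be learned by backpropagation; the sketch only shows how the shared representation accumulates across views while the partial representations stay per-view.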

Figure 2: The framework of GRAE. The ⊙ is the element-wise product shown in Eq. (9), where $Z_1^p, \ldots, Z_M^p$ and $Z^g$ each represent an element.

Figure 3: The framework of GGAE

Figure 4: The framework of PGAE

Partial Graph Decoder: Like the global graph decoder, the purpose of the partial graph decoder is to reconstruct the adjacency matrix of the corresponding view from the node representation obtained by the partial graph encoder. Therefore, we use multiple graph decoders as the partial graph decoder, corresponding to the partial graph encoder. The m-th graph decoder is used to reconstruct the m-th adjacency matrix:

$$\hat{A}_m^p = \mathrm{sigmoid}\left(Z_m^p W_m^p Z_m^{p\top}\right). \tag{7}$$

Then, we use the partial reconstruction loss shown in Eq. (8) to obtain more unique features:

$$L_p = \sum_{m=1}^{M} L_m^p = \sum_{m=1}^{M} \mathrm{loss}\left(A_m, \hat{A}_m^p\right). \tag{8}$$
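The following sketch shows, under the same toy setup as before, how the inner-product decoders of Eqs. (4) and (7) and the summed reconstruction losses of Eqs. (5) and (8) could be computed. The section does not spell out the choice of loss(·, ·), so binary cross-entropy over the reconstructed edges is an assumption here, as are the helper names.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode(Z, W):
    """Inner-product decoder: A_hat = sigmoid(Z W Z^T), as in Eqs. (4) and (7)."""
    return sigmoid(Z @ W @ Z.T)

def bce(A, A_hat, eps=1e-9):
    """Assumed per-view loss(A_m, A_hat_m): mean binary cross-entropy over node pairs."""
    return -np.mean(A * np.log(A_hat + eps) + (1 - A) * np.log(1 - A_hat + eps))

def global_reconstruction_loss(A_list, Z_g, Wg_dec):
    # Eq. (5): one decoder weight W_m^g per view, all sharing the same Z^g.
    return sum(bce(A_m, decode(Z_g, W_m)) for A_m, W_m in zip(A_list, Wg_dec))

def partial_reconstruction_loss(A_list, Z_p_list, Wp_dec):
    # Eq. (8): the m-th decoder only sees the m-th unique representation Z_m^p.
    return sum(bce(A_m, decode(Z_m, W_m))
               for A_m, Z_m, W_m in zip(A_list, Z_p_list, Wp_dec))
```

With the toy tensors from the previous sketch, Wg_dec would hold M matrices of shape D × D and Wp_dec M matrices of shape d × d; L_g + L_p is then the reconstruction part of the total objective in Eq. (13).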

It can be seen from Eqs. (6), (7) and (8) that there is no interaction between the unique feature representations of the graph views, so we can learn the unique features of the corresponding view through the partial graph autoencoder.

3.3 Adaptive Weight Learning and Self-training Clustering
Although we have obtained the shared features through GGAE and the unique features through PGAE, these features are relatively independent. Therefore, we put forward an adaptive weight learning method to fuse the shared and unique features, and propose a self-training clustering method that collaboratively trains the node representation to improve the clustering result.
Adaptive Weight Learning: Due to the difference in importance between the shared features and the unique features, we propose an adaptive weight learning method to fuse them. Because the shared feature representation and the unique feature representations each achieve good clustering performance on most datasets, we need to preserve the independence of the shared features and the unique features as much as possible to further improve the clustering performance. Therefore, we propose the adaptive weight learning method as follows:

$$Z = \left[\frac{w_1}{\mathrm{mean}(w)} Z_1^p, \cdots, \frac{w_M}{\mathrm{mean}(w)} Z_M^p, \frac{w_{M+1}}{\mathrm{mean}(w)} Z^g\right]. \tag{9}$$

In order to adaptively adjust and optimize the weights between the different features, we combine the adaptive weight learning with the self-training clustering shown in Eq. (10).
Self-training Clustering: Although we fuse the shared features and the unique features through the adaptive weight learning method, GGAE and PGAE are unsupervised graph embedding algorithms, which may not guarantee that the node representation is suitable for clustering. Therefore, we use self-training clustering, following DEC [24], to improve the clustering as well as the node representation. The self-training clustering objective is defined as the Kullback-Leibler (KL) divergence loss:

$$L_c = \mathrm{KL}(P \,\|\, Q) = \sum_i \sum_u p_{iu} \log \frac{p_{iu}}{q_{iu}}, \tag{10}$$

where $q_{iu}$ is a soft assignment interpreted as the probability of assigning sample $i$ to cluster $u$. $q_{iu}$ is measured by using Student's t-distribution [25] as a kernel to indicate the similarity between the embedded point $z_i$ and the cluster centroid $\mu_u$:

$$q_{iu} = \frac{(1 + \|z_i - \mu_u\|)^{-1}}{\sum_k (1 + \|z_i - \mu_k\|)^{-1}}. \tag{11}$$

After obtaining the soft assignments, we optimize the node representation by learning from the high-confidence assignments. Therefore, the probability $p_{iu}$ in the auxiliary target distribution $P$ is calculated as:

$$p_{iu} = \frac{q_{iu}^2 / f_u}{\sum_k q_{ik}^2 / f_k}, \tag{12}$$

where $f_u = \sum_i q_{iu}$ are the soft cluster frequencies. As shown in Eq. (12), we raise $q_{iu}$ to the second power and then normalize by the frequency of each cluster to calculate $p_{iu}$.
Finally, we jointly optimize the GRAE embedding and the cluster learning through the total objective function:

$$L = L_g + L_p + \alpha L_c, \tag{13}$$

where $L_g$ is the global reconstruction loss, $L_p$ is the partial reconstruction loss, $L_c$ is the KL divergence loss, and $\alpha$ is the coefficient of the KL divergence loss.
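A minimal NumPy sketch of Eqs. (9)–(13) is given below: the weighted concatenation of the view representations, the Student's-t soft assignments Q, the sharpened target P, and the scalar objective. Treating w as a given vector and reusing the loss helpers from the earlier sketch are assumptions made for illustration; in the model these quantities are optimized jointly by gradient descent rather than computed once.

```python
import numpy as np

def fuse(Z_p_list, Z_g, w):
    """Eq. (9): concatenate the per-view and shared features scaled by w / mean(w)."""
    scaled = [wi / w.mean() * Z for wi, Z in zip(w, Z_p_list + [Z_g])]
    return np.concatenate(scaled, axis=1)

def soft_assign(Z, centroids):
    """Eq. (11): Student's t-kernel similarity between nodes and cluster centroids."""
    dist = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    q = 1.0 / (1.0 + dist)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(Q):
    """Eq. (12): sharpen Q and normalize by the soft cluster frequencies f_u."""
    weight = Q ** 2 / Q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)

def kl_loss(P, Q, eps=1e-9):
    """Eq. (10): KL(P || Q) summed over nodes and clusters."""
    return np.sum(P * np.log((P + eps) / (Q + eps)))

def total_loss(L_g, L_p, L_c, alpha):
    """Eq. (13): reconstruction losses plus the weighted clustering loss."""
    return L_g + L_p + alpha * L_c
```

Here Q has shape N × K; at each self-training step P is recomputed from the current Q, and a common choice following DEC [24] is to initialize the centroids µ_u by running k-means on the fused representation Z.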



Table 2: The statistics of the three public datasets

Dataset | Features | Nodes | View          | Meta-path | Edges   | Classes
ACM     | 1830     | 3025  | co-paper      | PAP       | 29281   | 3
        |          |       | co-subject    | PSP       | 2210761 |
DBLP    | 334      | 4057  | co-author     | APA       | 11113   | 4
        |          |       | co-conference | APCPA     | 5000495 |
        |          |       | co-term       | APTPA     | 6776335 |
IMDB    | 1232     | 4780  | co-actor      | MAM       | 98010   | 3
        |          |       | co-director   | MDM       | 21018   |

4 EXPERIMENT
In this section, we first describe the datasets we use to evaluate our proposed model in Section 4.1 and then present our baselines and experimental setup in Section 4.2. After that, we give the graph clustering results by comparing our model to other baseline methods in Section 4.3. Finally, we further discuss the parameter sensitivity in Section 4.4.

4.1 Datasets
We study the effectiveness of our proposed GRAE model on three public datasets. The statistics of these public datasets are shown in Table 2 and the detailed descriptions are as follows:
• ACM²: The ACM dataset is a paper network dataset. We use the two meta-paths Paper-Author-Paper (PAP) and Paper-Subject-Paper (PSP) to construct a co-paper view and a co-subject view, respectively. Paper attribute features are the elements of a bag-of-words represented by keywords. The research fields of the papers (Database, Wireless Communication, Data Mining) are used as the ground-truth labels in our experiments.
• DBLP¹: The DBLP dataset is an author network dataset. We use the three meta-paths Author-Paper-Author (APA), Author-Paper-Conference-Paper-Author (APCPA), and Author-Paper-Term-Paper-Author (APTPA) to construct a co-author view, a co-conference view, and a co-term view, respectively. Each author's attribute features are the elements of a bag-of-words represented by keywords. The authors' research fields (database, data mining, machine learning, and information retrieval) are used as the ground-truth labels in our experiments.
• IMDB³: The IMDB dataset is a movie network dataset. We use the two meta-paths Movie-Actor-Movie (MAM) and Movie-Director-Movie (MDM) to construct a co-actor view and a co-director view, respectively. Movie attribute features correspond to the elements of a bag-of-words of the plots. The movie types (Action, Comedy, Drama) are used as the ground-truth labels in our experiments.

² http://dl.acm.org
³ https://www.imdb.com/

4.2 Baselines and Experimental Setup
4.2.1 Baselines.
To prove the effectiveness, we compare our proposed GRAE model with three categories of methods: (1) single-view graph clustering methods; (2) multi-view graph embedding/clustering methods; and (3) attributed multi-view graph clustering methods. For the methods in (1), we use the adjacency matrix and the node attributes, or only the adjacency matrix, as the input and report the best result among all input data. For the methods in (2), we only utilize the graph views as the input. For the methods in (3), we use the graph views and the node attributes as the input. Once a baseline method obtains the node representation, we use the k-Means method for clustering.
• GAE [23] is a single-view graph autoencoder method based on the variational autoencoder (VAE).
• LINE [26] is a classical single-view graph embedding method, which can maintain both the global and local network structures.
• MNE [18] is a scalable multi-view network embedding method, which can represent information of multiple types of relations in a unified embedding space.
• PMNE [27] denotes a family of multi-view graph embedding models, including PMNE(c), PMNE(n), and PMNE(r).
• RMSC [28] is a robust multi-view spectral clustering method via low-rank and sparse decomposition.
• PwMC and SwMC [29] are multi-view graph clustering methods.
• O2MA and O2MAC [21] are both attributed multi-view graph clustering methods. O2MA is a variant of O2MAC: the former does not contain the KL divergence loss in the total objective function, while the latter does.
We describe our GRAE model series as follows:
• PGAE: an attributed multi-view graph clustering method which only considers the unique features of the multi-view graph. The parameter settings of PGAE are the same as those of GRAE.
• GGAE: an attributed multi-view graph clustering method which only considers the shared features of the multi-view graph. The training times and parameter settings of the GGAE model are the same as those of GRAE.
• (P+G)GAE: a combination method that directly merges the node representations of PGAE and GGAE. Except for the maximum training times of the total objective function, T = 0, the other parameter settings are the same as those of GRAE.
• GGAE+KL: an attributed multi-view graph clustering method which considers the shared features of the multi-view graph together with self-training clustering. The training times and parameter settings of the GGAE+KL model are the same as those of GRAE.
• GRAE: an attributed multi-view graph clustering method which considers both the shared features and the unique features.

4.2.2 Experimental Setup.
Due to the different convergence rates of the iterations on different datasets, the training times of our GRAE model may differ. Our GGAE model is trained 1500 times on the DBLP dataset, 100 times on the ACM dataset, and 400 times on the IMDB dataset, while the PGAE model is trained 100 times on all datasets. For the DBLP and IMDB datasets, we set the learning rate λ = 0.004 and the coefficient of the KL divergence loss α = 0.001. For the ACM dataset, we set the learning rate λ = 0.001 and the hyper-parameter α = 0.1. In GGAE, the embedding dimension of the global graph encoder is set to D = 200. In PGAE, the embedding dimension of the partial graph encoder is set to d = 32.
Since some clustering algorithms rely on initialization, we use random initialization and repeat the experiment 10 times for all methods, reporting the average performance. For the baseline models, we preserve all experimental settings described in their corresponding literature.

4.2.3 Evaluation Metrics.
We certify the performance of our proposed method using the following four widely adopted evaluation measures: clustering accuracy (ACC), normalized mutual information (NMI), F-score (F1), and adjusted Rand index (ARI) [31]. For all metrics, a higher value denotes better performance. Each metric penalizes or favors different properties in the clustering, and hence we report results on these diverse measures to perform a comprehensive evaluation.
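As a sketch of this evaluation protocol (k-Means on the learned representation Z, then ACC/F1/NMI/ARI against the ground-truth labels), the snippet below uses scikit-learn and SciPy. The Hungarian matching used to align predicted clusters with labels for ACC and macro-F1 is a standard choice and an assumption on our part, since the paper does not state the exact variant.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.metrics import (adjusted_rand_score, f1_score,
                             normalized_mutual_info_score)

def best_mapping(y_true, y_pred):
    """Map predicted cluster ids to label ids with the Hungarian algorithm."""
    K = int(max(y_true.max(), y_pred.max())) + 1
    cost = np.zeros((K, K), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1                                  # co-occurrence counts
    rows, cols = linear_sum_assignment(-cost)            # maximize matched pairs
    mapping = {int(r): int(c) for r, c in zip(rows, cols)}
    return np.array([mapping[int(p)] for p in y_pred])

def evaluate(Z, y_true, n_clusters):
    """k-Means on the node representation Z, then the four clustering metrics."""
    y_true = np.asarray(y_true)
    y_pred = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Z)
    y_aligned = best_mapping(y_true, y_pred)
    return {
        "ACC": float((y_aligned == y_true).mean()),
        "F1": f1_score(y_true, y_aligned, average="macro"),
        "NMI": normalized_mutual_info_score(y_true, y_pred),
        "ARI": adjusted_rand_score(y_true, y_pred),
    }
```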

4.3 Graph Clustering Performance
Table 3 reports the overall performance of the baselines and the GRAE model series on the three public datasets. We highlight the best performance for each metric and make the following observations.

Table 3: Performance comparison of different methods on the ACM, DBLP and IMDB datasets. The '*' indicates that the method ran out of memory on this dataset.

Method   | ACM: ACC / F1 / NMI / ARI         | DBLP: ACC / F1 / NMI / ARI        | IMDB: ACC / F1 / NMI / ARI
LINE     | 0.6479 / 0.6594 / 0.3941 / 0.3433 | 0.8689 / 0.8546 / 0.6676 / 0.6988 | 0.4268 / 0.2870 / 0.0031 / -0.0090
GAE      | 0.8216 / 0.8225 / 0.4914 / 0.5444 | 0.8859 / 0.8743 / 0.6925 / 0.7410 | 0.4298 / 0.4062 / 0.0402 / 0.0473
MNE      | 0.6379 / 0.6479 / 0.2999 / 0.2486 | * / * / * / *                     | 0.4268 / 0.2870 / 0.0031 / -0.0090
PMNE(n)  | 0.6936 / 0.6955 / 0.4648 / 0.4302 | 0.7925 / 0.7966 / 0.5914 / 0.5265 | 0.4958 / 0.3906 / 0.0359 / 0.0366
PMNE(r)  | 0.6492 / 0.6618 / 0.4063 / 0.3453 | 0.3688 / 0.3688 / 0.0872 / 0.0689 | 0.4697 / 0.3183 / 0.0014 / 0.0115
PMNE(c)  | 0.6998 / 0.7003 / 0.4775 / 0.4431 | * / * / * / *                     | 0.4719 / 0.3882 / 0.0285 / 0.0284
RMSC     | 0.6315 / 0.5746 / 0.3973 / 0.3312 | 0.8994 / 0.8248 / 0.7111 / 0.7647 | 0.2702 / 0.3775 / 0.0054 / 0.0018
PwMC     | 0.4162 / 0.3783 / 0.0332 / 0.0395 | 0.3253 / 0.2808 / 0.0190 / 0.0159 | 0.2453 / 0.3164 / 0.0023 / 0.0017
SwMC     | 0.3831 / 0.4709 / 0.0838 / 0.0187 | 0.6538 / 0.5602 / 0.3760 / 0.3800 | 0.2671 / 0.3714 / 0.0056 / 0.0004
O2MA     | 0.8880 / 0.8894 / 0.6515 / 0.6987 | 0.9040 / 0.8976 / 0.7257 / 0.7705 | 0.4697 / 0.4229 / 0.0524 / 0.0753
O2MAC    | 0.9042 / 0.9053 / 0.6923 / 0.7394 | 0.9074 / 0.9013 / 0.7287 / 0.7780 | 0.4502 / 0.4159 / 0.0421 / 0.0564
PGAE     | 0.8903 / 0.8977 / 0.6531 / 0.7038 | 0.4673 / 0.4403 / 0.2549 / 0.2135 | 0.4469 / 0.4198 / 0.0444 / 0.0549
GGAE     | 0.9092 / 0.9100 / 0.7034 / 0.7515 | 0.9074 / 0.9026 / 0.7253 / 0.7781 | 0.5041 / 0.4620 / 0.0815 / 0.0874
(P+G)GAE | 0.9102 / 0.9112 / 0.7048 / 0.7532 | 0.9102 / 0.9035 / 0.7335 / 0.7875 | 0.5427 / 0.4586 / 0.0713 / 0.0981
GGAE+KL  | 0.9162 / 0.9167 / 0.7204 / 0.7665 | 0.9124 / 0.9054 / 0.7396 / 0.7928 | 0.5287 / 0.4621 / 0.0766 / 0.0957
GRAE     | 0.9183 / 0.9188 / 0.7233 / 0.7739 | 0.9183 / 0.9128 / 0.7478 / 0.8052 | 0.5522 / 0.4611 / 0.0802 / 0.1114

• Our proposed GRAE series substantially outperforms the other baseline methods on all four metrics. The results clearly show that our GRAE model series are promising attributed multi-view clustering methods.
• The clustering performance of the shared features obtained by GGAE is higher than that of other clustering methods such as O2MA and O2MAC. Like O2MAC, we combine GGAE with the self-training clustering method for clustering. Compared with other methods that obtain shared features, the clustering performance of GGAE+KL is significantly improved, which proves that our GGAE model can capture more features.
• All the multi-view graph clustering methods, i.e., MNE, the PMNE series, the wMC series, and the O2MA series, are markedly better than the single-view graph clustering methods, i.e., GAE and LINE. This shows that single-view graph clustering methods cannot effectively use the multi-view information in various problems, while multi-view graph clustering methods can fuse different views to produce a more robust and accurate partition.
• Especially on the IMDB dataset, as shown in Table 3, the clustering ACC of the GRAE model is 10% higher than that of the state-of-the-art O2MAC model proposed in 2020, and the other metrics are also significantly improved.
In particular, to prove the effectiveness of using the unique features of multiple views, we construct the (P+G)GAE model by simply concatenating the node representations of PGAE and GGAE. Table 3 shows that the clustering performance is obviously improved. In addition, the last row of Table 3 reports the performance of the complete GRAE model. We find that GRAE reaches the best performance in all cases, which demonstrates that our fusion and clustering strategies lead to encouraging results.

4.4 Parameter Sensitivity Analysis
In this section, we explore how hyper-parameters influence the performance of the GRAE model. Specifically, we mainly report the analysis of the learning rate λ, the embedding dimension of the global graph encoder D, and the coefficient of the KL divergence loss α. For these parameters, we set the random seed to 16, and the other parameters remain unchanged from the prior experimental settings.

Figure 5: Performance under different learning rates on the ACM dataset

As can be seen from Figure 5, if λ is too small, the gradient descends very slowly, so after GRAE training the loss function is likely to end up at a saddle point or local minimum and may not even reach convergence. If λ is too large, the loss function may directly overshoot the global optimum and fail to converge, or even diverge, resulting in a rapid degradation of experimental performance. Therefore, we set λ of the GRAE model to 0.001 to ensure stable experimental results. At the same time, we find that the metrics ACC, F1, NMI, and ARI are positively correlated: as the ACC value grows, the F1, NMI, and ARI values also increase. Therefore, for the sake of simplicity, we only use the ACC metric to represent the performance of the models.
Figure 6 shows the accuracy of our GRAE model when the global graph encoder's embedding dimension D ranges from 40 to 400. On the ACM dataset, we observe that the accuracy is relatively stable, which indicates that the embedding dimension D has little effect on the accuracy. However, on the DBLP dataset, the accuracy fluctuates between 0.9058 and 0.9255 and varies significantly under different values of D. Therefore, D = 200 is a good choice for the embedding dimension of the global graph encoder.
Similarly, as shown in Figure 6, the impact of the KL divergence loss coefficient α, ranging from 0.001 to 0.512, on the accuracy is entirely different on the two datasets.

Between 0.001 and 0.016, the accuracy on the DBLP dataset shows a downward trend, while on the ACM dataset it shows an upward trend. Therefore, our GRAE model sets different KL divergence loss coefficient values on different datasets.

Figure 6: The embedding dimension of the global graph encoder and the coefficient α of the KL divergence loss on the ACM and DBLP datasets. (a) Embedding dimension on ACM; (b) embedding dimension on DBLP; (c) the coefficient α on ACM; (d) the coefficient α on DBLP.

5 CONCLUSION
In real-world applications, graph data can usually be represented by many different views, and all of these representations are meaningful and reasonable from their own perspectives. In order to discover more accurate and robust partitions, we put forward a novel GRAE model for attributed multi-view graph clustering. We investigate the inherent connection between the shared features and the unique features of multiple views. First, we design a global graph autoencoder and a partial graph autoencoder to extract them from the graph. Next, we propose an adaptive weight learning method to fuse them. At last, we introduce a self-training clustering method to obtain a meaningful node representation and produce competitive clustering results. Abundant experimental results on three real-world datasets demonstrate the superior performance and effectiveness of our proposed GRAE model.

REFERENCES
[1] Bansal, S., Khandelwal, S., & Meyers, L. A. (2009). Exploring biological network structure with clustered random networks. BMC Bioinformatics, 10, 405.
[2] Kim, S. Y., Jung, T. S., Suh, E. H., & Hwang, H. S. (2006). Customer segmentation and strategy development based on customer lifetime value: A case study. Expert Systems with Applications, 31(1), 101-107. https://doi.org/10.1016/j.eswa.2005.09.004
[3] Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3-5), 75-174. https://doi.org/10.1016/j.physrep.2009.11.002
[4] Ma, G., Lu, C. T., He, L., Philip, S. Y., & Ragin, A. B. (2017, November). Multi-view graph embedding with hub detection for brain network analysis. In 2017 IEEE International Conference on Data Mining (ICDM) (pp. 967-972). IEEE. https://doi.org/10.1109/ICDM.2017.123
[5] Wong, W., & Fu, A. W. (2002). Incremental document clustering for web page classification. In: Jin Q., Li J., Zhang N., Cheng J., Yu C., Noguchi S. (eds) Enabling Society with Information Technology. Springer, Tokyo. https://doi.org/10.1007/978-4-431-66979-1_10
[6] Li, X., Xu, G., Jiao, L., Zhou, Y., & Yu, W. (2019). Multi-layer network community detection model based on attributes and social interaction intensity. Computers & Electrical Engineering, 77, 300-313. https://doi.org/10.1016/j.compeleceng.2019.06.010
[7] Wang, C., Pan, S., Hu, R., Long, G., Jiang, J., & Zhang, C. (2019). Attributed graph clustering: A deep attentional embedding approach. arXiv preprint arXiv:1906.06532. https://arxiv.org/abs/1906.06532
[8] Tian, F., Gao, B., Cui, Q., Chen, E., & Liu, T.-Y. (2014). Learning deep representations for graph clustering. Proceedings of the AAAI Conference on Artificial Intelligence, 28(1). https://ojs.aaai.org/index.php/AAAI/article/view/8916
[9] Huang, S., Wang, H., Li, T., Li, T., & Xu, Z. (2018). Robust graph regularized nonnegative matrix factorization for clustering. Data Mining and Knowledge Discovery, 32(2), 483-503. https://doi.org/10.1007/s10618-017-0543-9
[10] Mei, J. P., Lv, H., Yang, L., & Li, Y. (2019). Clustering for heterogeneous information networks with extended star-structure. Data Mining and Knowledge Discovery, 33(4), 1059-1087. https://doi.org/10.1007/s10618-019-00626-2
[11] Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., & Philip, S. Y. (2020). A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 32(1), 4-24. https://doi.org/10.1109/TNNLS.2020.2978386
[12] Bazzi, M., Porter, M. A., Williams, S., McDonald, M., Fenn, D. J., & Howison, S. D. (2016). Community detection in temporal multilayer networks, with an application to correlation networks. Multiscale Modeling & Simulation, 14(1), 1-41. https://doi.org/10.1137/15M1009615
[13] Tripathi, A., Klami, A., Orešič, M., & Kaski, S. (2011). Matching samples of multiple views. Data Mining and Knowledge Discovery, 23(2), 300-321. https://doi.org/10.1007/s10618-010-0205-7
[14] Zhan, K., Nie, F., Wang, J., & Yang, Y. (2018). Multiview consensus graph clustering. IEEE Transactions on Image Processing, 28(3), 1261-1270. https://doi.org/10.1109/TIP.2018.2877335
[15] Gao, H., Nie, F., Li, X., & Huang, H. (2015). Multi-view subspace clustering. In Proceedings of the IEEE International Conference on Computer Vision (pp. 4238-4246). https://doi.org/10.1109/ICCV.2015.482
[16] Brbić, M., & Kopriva, I. (2018). Multi-view low-rank sparse subspace clustering. Pattern Recognition, 73, 247-258. https://doi.org/10.1016/j.patcog.2017.08.024
[17] Qu, M., Tang, J., Shang, J., Ren, X., Zhang, M., & Han, J. (2017, November). An attention-based collaboration framework for multi-view network representation learning. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (pp. 1767-1776). https://doi.org/10.1145/3132847.3133021
[18] Zhang, H., Qiu, L., Yi, L., & Song, Y. (2018, July). Scalable multiplex network embedding. In IJCAI (Vol. 18, pp. 3082-3088). https://doi.org/10.24963/ijcai.2018/428
[19] Wang, X., Ji, H., Shi, C., Wang, B., Ye, Y., Cui, P., & Yu, P. S. (2019, May). Heterogeneous graph attention network. In The World Wide Web Conference (pp. 2022-2032). https://doi.org/10.1145/3308558.3313562
[20] Cen, Y., Zou, X., Zhang, J., Yang, H., Zhou, J., & Tang, J. (2019, July). Representation learning for attributed multiplex heterogeneous network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 1358-1368). https://doi.org/10.1145/3292500.3330964
[21] Fan, S., Wang, X., Shi, C., Lu, E., Lin, K., & Wang, B. (2020, April). One2Multi graph autoencoder for multi-view graph clustering. In Proceedings of The Web Conference 2020 (pp. 3070-3076). https://doi.org/10.1145/3366423.3380079
[22] Deutsch, F. (2001). Best Approximation in Inner Product Spaces (Vol. 7). New York: Springer. https://doi.org/10.1007/978-1-4684-9298-9
[23] Kipf, T. N., & Welling, M. (2016). Variational graph auto-encoders. arXiv preprint arXiv:1611.07308.
[24] Xie, J., Girshick, R., & Farhadi, A. (2016, June). Unsupervised deep embedding for clustering analysis. In International Conference on Machine Learning (pp. 478-487). PMLR. https://arxiv.org/abs/1511.06335
[25] Van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11). http://jmlr.org/papers/v9/vandermaaten08a.html
[26] Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., & Mei, Q. (2015, May). LINE: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web (pp. 1067-1077). https://doi.org/10.1145/2736277.2741093
[27] Liu, W., Chen, P. Y., Yeung, S., Suzumura, T., & Chen, L. (2017, November). Principled multilayer network embedding. In 2017 IEEE International Conference on Data Mining Workshops (ICDMW) (pp. 134-141). IEEE. https://doi.org/10.1109/ICDMW.2017.23

[28] Xia, R., Pan, Y., Du, L., & Yin, J. (2014). Robust multi-view spectral clustering via low-rank and sparse decomposition. Proceedings of the AAAI Conference on Artificial Intelligence, 28(1). https://ojs.aaai.org/index.php/AAAI/article/view/8950
[29] Nie, F., Li, J., & Li, X. (2017, August). Self-weighted multiview clustering with multiple graphs. In IJCAI (pp. 2564-2570). https://doi.org/10.24963/ijcai.2017/357
[30] Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[31] Ding, Z., Zhao, H., & Fu, Y. (2019). Multi-view clustering with complete information. In Learning Representation for Multi-View Data Analysis (pp. 9-50). Springer, Cham. https://doi.org/10.1007/978-3-030-00734-8_2
