
Applied Intelligence (2023) 53:6978–6991

https://doi.org/10.1007/s10489-022-03891-9

A unified framework of graph structure learning, graph generation and classification for brain network analysis
Peng Cao1,2 · Guangqi Wen1,2 · Wenju Yang1,2 · Xiaoli Liu3 · Jinzhu Yang1 · Osmar Zaiane4

Accepted: 11 June 2022 / Published online: 13 July 2022


© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022

Abstract
Recently, functional brain networks have been employed for classifying neurological disorders such as autism spectrum disorders (ASDs). Graph convolutional networks (GCNs) have been shown to be successful in modeling applications with graph structures. However, brain network data generally have a complex structure and a small sample size, and applying GCNs to the available datasets remains a major challenge. Driven by this important issue, three questions arise: 1) how to capture the critical structures of brain networks by removing noisy connections, to facilitate the subsequent GAN and GCN; 2) how to generate graphs with generative adversarial networks (GANs) that preserve the local graph topology as well as the global data distribution; 3) how to sufficiently leverage the real and generated graphs, which exhibit domain gaps, for improved classification. In this paper, we propose a three-stage framework, named BrainGC-Net, which coherently joins the power of graph pooling, GAN and GCN for brain network generation and classification. Given the original brain network with a large number of noisy connections, we propose graph pooling to enhance the important connections with a supervision scheme. Then, based on the coarsened brain networks, we propose a graph GAN model, named EG-GAN, that focuses on the global data distribution in the embedding space and the local graph topology in the graph space simultaneously. Finally, a domain consistent GCN model is proposed to take full advantage of the two domains, rather than simply merging them, by incorporating multiple consistency regularizations from view correlation, class correlation and sample correlation. With extensive experiments on the ASD classification problem, we validate the effectiveness of our method: it achieves consistent improvements over state-of-the-art methods on the public ABIDE dataset.

Keywords Brain network · Graph convolutional networks · Generative adversarial networks · Graph classification

Corresponding author: Peng Cao, caopeng@cse.neu.edu.cn

1 College of Computer Science and Engineering, Northeastern University, Shenyang, China
2 Key Laboratory of Intelligent Computing in Medical Image, Northeastern University, Shenyang, China
3 Alibaba A.I. Labs, Hangzhou, China
4 Alberta Machine Intelligence Institute, University of Alberta, Edmonton, Alberta, Canada

1 Introduction

Recent studies have shown that rs-fMRI based analysis of brain functional connectivity (FC) is effective in helping understand the pathology of brain diseases [1, 2]. The functional network of the brain can be modeled as a graph in which each node is a brain region and the edges represent the strength of the connection between those regions. This task can be seen as a graph classification problem, whose fundamental task is the representation learning of graph-structured data. Motivated by the success of the graph convolution network (GCN) [3] on graph data, recent works [4-10] have applied GCN to the functional network derived from rs-fMRI data to learn latent features from graphs.

Embedding learning for brain networks with GCN requires a large collection of training data. However, the major challenge is that the quantity of available labeled data is usually very small in clinical applications, which limits the classification performance. While recent attempts to solve the limited-data problem using generative adversarial networks (GANs) have been successful in augmenting training data by generating realistic images, most of them are image based. In this paper, we aim to address the following challenges: (C.1) how to capture the critical structures of brain networks by removing noisy connections, which is beneficial for the subsequent GAN and GCN, (C.2) how to generate a realistic graph representation that preserves local graph structures and a globally consistent data distribution, and (C.3) how to alleviate the domain adaptation of both representations of graphs for improved classification. We propose a three-stage framework, named BrainGC-Net, which coherently joins the power of graph pooling, GAN and GCN for brain network generation and classification. In our work, we select a common disorder, autism spectrum disorder (ASD), for evaluation. ASD is a neurodevelopmental disorder characterized by communication, behavior and social interaction deficits in patients, which may include repetitive behavior, irritability, and attention problems [11, 12].

Due to the intrinsically complex structure of brain networks, few GAN models can be successfully applied, owing to the complex graph structure, diverse interpolations incompatible with the original distribution, and unstable optimization. To solve these challenges, we propose a graph pooling model to enhance the important connections and remove the irrelevant connections with a supervision scheme. Moreover, we first propose a prior subnetwork structure regularization to guide the pooling procedure and ensure accurate subnetwork identification. By removing the noisy connections, the clean structure not only reduces computational complexity, but also facilitates the downstream classification tasks. Then, based on the structure of α-GAN, we propose a graph GAN model focusing on both the Embedding and the Graph space, named EG-GAN, which incorporates a local topological measure constraint from the graph space and a distribution-consistent homeomorphic mapping from the latent code space. The aim is to enable the generator to preserve global structural properties and to enforce the compatibility between latent sample distances and the corresponding graph sample distances, both of which are important for modeling neurological disorders in brain networks. The proposed generative model is unsupervised through adversarial learning, and hence lacks sufficient discriminative ability for graph classification. Moreover, it only generates the graph structure of brain networks, without the node signal features. To alleviate the gap between the domains of real and generated graphs, we propose a domain consistent GCN model to improve the graph classification performance via three mechanisms: (1) cross-view consistency, (2) class center consistency and (3) sample similarity consistency. The proposed model is able to learn a consistent graph embedding by fully exploiting the correlation from the perspectives of view correlation, domain correlation and sample correlation through the training process. Extensive ablation studies and comparison experiments conducted on the ABIDE dataset have shown the effectiveness and superiority of our method over other competing methods on brain network classification for ASD diagnosis.

The main contributions of our work can be summarized as follows:

1) Graph structure learning: We propose an attention based graph pooling, which is able to remove the noisy functional connections and alleviate the multi-center subject inconsistency. Guided by the proposed prior subnetwork knowledge, it is able to identify the critical graph structure, which actually boosts the performance of the following graph generation and classification procedures.
2) Graph generation learning: Most graph GAN models focus only on modeling the underlying true connectivity distribution and producing indistinguishable fake connections. To improve the graph generation quality, we propose a graph GAN with two additional constraints: a local topological measure to preserve structural properties and a global homeomorphic mapping to regularize the positions of the generated samples. To the best of our knowledge, the proposed EG-GAN is the first work that applies a GAN to brain networks for augmenting data for improved classification performance.
3) Graph embedding learning: To take full advantage of the generated graphs, we propose a domain consistent GCN for graph embedding learning that alleviates the distribution shift between the real and generated graph domains through view consistency, class consistency and pairwise sample similarity consistency.
4) We draw a connection among graph pooling, graph generation and graph classification. For example, the graph pooling leads to a clean graph structure, which boosts the performance of the following graph generation and graph classification models; our proposed DC-GCN can reduce the distribution gap between the real graphs and the generated graphs.

The rest of the paper is organized as follows. Detailed mathematical formulations and descriptions of the proposed graph pooling, EG-GAN and DC-GCN are provided in Section 2. In Section 3, we evaluate the performance by extensive experiments on the ABIDE dataset. We conclude this study in Section 4.

2 Method

2.1 Overview

The notations and the corresponding descriptions are shown in Table 1.

Table 1 The notations and the corresponding descriptions

Variable name        Description
G / Ĝ                The original / coarsened brain network
A / Â                Adjacency matrix of the original / coarsened brain network
n and c              The numbers of nodes in G and Ĝ
Dr, Dg and Drg       The datasets of real graphs, generated graphs and the mixed graphs
h and Z              Graph embedding vector and node embedding matrix
F                    Cluster indicator matrix
s                    The importance score of each node vi in graph G
R                    The set containing part of the brain regions
w                    The weight of edge eij in A
Gθ(·)                Graph generator
fw                   Graph discriminator
Eφ(·)                Graph encoder
fwc                  Code discriminator
Nr                   The number of real graphs
Ng                   The number of generated graphs

Formally, given an original graph Gi = {Vi, Ai} and its corresponding label yi ∈ {−1, 1}, i = 1, . . . , Nr, we formulate our task as a graph classification problem of learning a mapping function f : X → Y. Ai ∈ R^{n×n} denotes the corresponding adjacency matrix of Gi. Each value of Ai is calculated between two brain regions by Pearson's correlation estimation. V = {v1, v2, ..., vn} represents the set of nodes in G, and each v carries a one-dimensional signal.

Recently, deep learning methods have made massive breakthroughs in medical applications. A large quantity of data is essential for training such models. However, in the medical domain it is usually difficult to collect sufficient data. To overcome these limitations, we propose a data augmentation method based on the coarsened graphs and cast ASD diagnosis as a classification problem. Specifically, we develop a unified three-stage learning framework for brain network classification. It involves graph pooling for structure learning, graph generation for data augmentation and graph embedding learning for classification. The framework is shown in Fig. 1.

Fig. 1 The overall architecture of our proposed three-stage BrainGC-Net framework. The original graph is coarsened by graph pooling; EG-GraphGAN generates graphs from the coarsened real graph set Dr; and DC-GCN (an f-GCN for node temporal feature learning on Dr and an s-GCN on the mixed set) is trained with cross-view consistency, class semantic consistency and pairwise similarity consistency
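As a concrete illustration of the overview above, the minimal sketch below builds the Pearson-correlation adjacency matrix A from ROI time series. The function name, the array shapes and the 116-region parcellation in the example are our illustrative assumptions, not the authors' code.

```python
# A minimal sketch of the adjacency construction in Section 2.1: each entry
# of A is the Pearson correlation between the BOLD time series of two brain
# regions (trivial self-connections zeroed out).
import numpy as np

def build_adjacency(bold_signals: np.ndarray) -> np.ndarray:
    """bold_signals: (n_regions, n_timepoints) array of ROI time series."""
    # np.corrcoef treats each row as one variable and returns the full
    # n_regions x n_regions matrix of pairwise Pearson correlations.
    adjacency = np.corrcoef(bold_signals)
    np.fill_diagonal(adjacency, 0.0)
    return adjacency

# Example with an assumed 116-region parcellation and 200 time points.
rng = np.random.default_rng(0)
A = build_adjacency(rng.standard_normal((116, 200)))
print(A.shape)  # (116, 116)
```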

2.2 Graph pooling

First, the highly noisy connections in the brain network lead to poor classification performance. Second, inconsistencies exist among the subjects from multiple sites. This inconsistency in the brain networks introduces a more significant challenge to our classification task. Therefore, it is essential to eliminate the inconsistency in FCs. Graph pooling is a critical operation for coarsening an original graph with highly noisy edges for graph representation learning in GCN. The aim of the graph pooling procedure is to preserve the global topology structure of brain networks and remove the noisy edges. Attention has been widely used in recent deep learning research. In our work, we propose a temporal graph pooling with an attention mechanism whose weights are determined by a GCN. We formulate graph pooling as a cluster assignment problem with an attention mechanism. By hiding the non-indicative edges and highlighting the indicative edges, the nodes of an original graph are grouped into clusters. Formally, an original graph G is transformed into a coarsened graph Ĝ = {V̂, Â} with a coarsened structure, where V̂ = {SN1, SN2, ..., SNc} denotes the set of supernodes in Ĝ, Â ∈ R^{c×c} denotes the corresponding weighted adjacency matrix of the coarsened graph Ĝ, and c denotes the number of supernodes. To achieve this, we introduce a learnable parameter F ∈ R^{n×c}. F indicates the membership of a node to a cluster and is formally defined as:

F_{ij} = s_i if i ∈ SN_j, and F_{ij} = 0 if i ∉ SN_j   (1)

where s_i denotes the importance score of each node v_i in graph G.

With the optimized F, we can obtain a set of clusters as supernodes V̂ = {SN1, SN2, ..., SNc} and the weighted adjacency matrix of the supergraph Â = F^T A F ∈ R^{c×c}. Then the nodes of a cluster are pooled into one supernode to produce a coarsened graph. The superedge between supernodes is the aggregation of the edges multiplied by the node importance: each superedge e_{ij} in Â is defined as s_i · w_{ij} · s_j, where w_{ij} denotes the weight of edge e_{ij} in A. Therefore, the learned connection weights help us identify the informative edges. The attention mechanism allows the GCN to focus more on important connections by learning a weight parameter. It is able to construct a coarsened and clean graph structure with only important edges by highlighting the critical connections and removing the irrelevant ones. Meanwhile, all the dynamic brain network series data become consistent. Furthermore, to constrain the consistency of the temporal multi-graph, we perform graph pooling on all the graphs with a shared F.

The aim of the graph pooling is to identify the indicative edges for our task. However, with the optimized F, the pooling may neglect potentially important subnetworks. In order to address this, we consider the prior information of subnetworks and incorporate this knowledge into the proposed model to encourage higher weights for the edges within the important subnetworks. Hence, the prior subnetwork knowledge regularization is:

R_{sub} = −log( (1 / |R_{prior}|²) Σ_{i,j ∈ R_{prior}} s_i s_j )   (2)

where R_{prior} contains the brain regions in the subnetwork. We choose six common networks including the DMN (default mode network), AN (auditory network), SN (salience network), SMN (somato-motor network), VN (visual network) and CEN (central executive network). The regularization helps our model identify potential subnetworks that are most predictive of ASD diagnosis.
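The coarsening step and the regularizer of Eq. (2) admit a compact implementation. The sketch below is one possible realization under our assumptions: the paper learns the assignment and the importance scores with a GCN-based attention mechanism, whereas here they are random stand-ins.

```python
# A sketch of the cluster-assignment pooling of Section 2.2: a soft
# assignment S of n nodes to c supernodes, scaled by per-node importance
# scores s, gives F, and the coarsened adjacency is F^T A F, so that each
# superedge aggregates terms of the form s_i * w_ij * s_j.
import torch

def pool_graph(A, S, scores):
    """A: (n, n) adjacency; S: (n, c) soft memberships; scores: (n,) in [0, 1]."""
    F_mat = S * scores.unsqueeze(1)   # F_ij = s_i on nodes assigned to SN_j, cf. Eq. (1)
    return F_mat.t() @ A @ F_mat      # (c, c) coarsened adjacency

def prior_subnetwork_reg(scores, prior_idx):
    """Eq. (2): encourage high importance scores inside a prior subnetwork."""
    s = scores[prior_idx]
    pairwise = s.unsqueeze(0) * s.unsqueeze(1)   # s_i * s_j over R_prior
    return -torch.log(pairwise.mean() + 1e-8)    # mean = sum / |R_prior|^2

n, c = 116, 11
A = torch.rand(n, n); A = (A + A.t()) / 2
S = torch.softmax(torch.randn(n, c), dim=1)      # learned in the actual model
scores = torch.sigmoid(torch.randn(n))           # learned in the actual model
A_coarse = pool_graph(A, S, scores)              # coarsened brain network
reg = prior_subnetwork_reg(scores, torch.tensor([0, 5, 9, 20]))
```

Sharing S and scores across all graphs of a subject would correspond to the shared F used for the temporal multi-graph.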

2.3 EG-GraphGAN

2.3.1 Architecture

Recently, generative adversarial networks (GANs) [13] have achieved state-of-the-art performance in the field of image generation, producing very realistic images in an unsupervised setting. In this study, we propose a graph GAN model focusing on the global data distribution in the embedding space and the local graph topology in the graph space simultaneously. First, in order to effectively address the mode collapse problem of GANs, we introduce additional autoencoder and code discriminator networks on top of the existing generator and discriminator. Instead of using the code discriminator from α-GAN or the KL loss from VAE-GAN, we introduce WGANs, which minimize an approximation of the Wasserstein distance between the real distribution and the distribution of the generated samples [14, 15]. Similar to any typical GAN architecture, our GAN consists of four main components: a graph generator Gθ, a graph discriminator fw, a graph encoder Eφ and a code discriminator fwc. The graph generator network Gθ(·) is trained to learn a function Gθ(·) : h^{(r)} → G, mapping each point in the latent embedding space to a graph sample in the graph space; h^{(r)} is the real graph embedding vector. The graph encoder network Eφ(·) is trained to learn a function Eφ(·) : G → h^{(e)}, mapping each real graph sample to a point in the latent space; h^{(e)} is the generated graph embedding vector. These networks are trained in tandem with the two discriminator networks, fwc and fw, which learn to discriminate between real and generated latent points and graph samples, respectively. Figure 2 illustrates the overall architecture of the EG-GAN framework.

Fig. 2 Our framework involves four modules: graph encoder, graph generator, graph discriminator and code discriminator

First, the basic total loss of the graph generative adversarial model is determined by:

L^{eg}_{adv} = L^{c}_{enc} + L^{c}_{d} + L^{g}_{gen} + L^{g}_{d}   (3)

where:

1) L^{c}_{enc} is the loss of the graph encoder module:

L^{c}_{enc} = −(1/N_r) Σ^{N_r}_{i=1} f_{wc}(E_φ(Ĝ_i))   (4)

2) L^{c}_{d} is the loss of the code discriminator module, obtained by treating h^{(r)} as real and the encoder output h^{(e)} as fake:

L^{c}_{d} = (1/m) Σ^{N_r}_{i=1} f_{wc}(h^{(r)}_i) − (1/m) Σ^{N_g}_{i=1} f_{wc}(h^{(e)}_i) + L^{c}_{gp}   (5)

3) L^{g}_{gen} is the loss of the graph generator module:

L^{g}_{gen} = −(1/m) Σ^{N_g}_{i=1} f_w(G_θ(h^{(r)}_i))   (6)

4) L^{g}_{d} is the loss of the graph discriminator module, which aims to distinguish the real graph data Ĝ from the generated samples:

L^{g}_{d} = (1/m) Σ^{N_r}_{i=1} f_w(Ĝ_i) − (1/m) Σ^{N_g}_{i=1} f_w(G_θ(h^{(r)}_i)) + L_{gp}   (7)

where N_r is the number of real graphs and N_g is the number of generated graphs.
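Equations (4)-(7) follow the WGAN objective with gradient penalty [14, 15]. The sketch below shows one way the four terms could be wired together; the modules E, G, f_w and f_wc are stand-ins, and the critic terms are written as losses to be minimized (the sign flips relative to the maximized Wasserstein difference stated in Eqs. (5) and (7)). This is a sketch under our assumptions, not the authors' implementation.

```python
# A sketch of how the WGAN-style terms of Eqs. (4)-(7) could be assembled.
# E (graph encoder), G (graph generator), f_w (graph discriminator) and
# f_wc (code discriminator) are placeholder modules; the gradient penalty
# follows [14].
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    # Penalize the critic's gradient norm on random interpolates of real/fake.
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
    x = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grad, = torch.autograd.grad(critic(x).sum(), x, create_graph=True)
    return lam * ((grad.flatten(1).norm(dim=1) - 1) ** 2).mean()

def eg_gan_losses(E, G, f_w, f_wc, real_graphs, h_r):
    h_e = E(real_graphs)                  # encoded latent codes ("fake" codes)
    fake_graphs = G(h_r)                  # graphs decoded from real codes
    # Encoder / generator terms (Eqs. (4) and (6)): fool the two critics.
    L_enc = -f_wc(h_e).mean()
    L_gen = -f_w(fake_graphs).mean()
    # Critic terms as minimized losses; generator outputs are detached because
    # in practice these terms drive separate optimizer steps.
    L_d_code = (f_wc(h_e.detach()).mean() - f_wc(h_r).mean()
                + gradient_penalty(f_wc, h_r, h_e.detach()))
    L_d_graph = (f_w(fake_graphs.detach()).mean() - f_w(real_graphs).mean()
                 + gradient_penalty(f_w, real_graphs, fake_graphs.detach()))
    return L_enc, L_d_code, L_gen, L_d_graph
```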

During the graph encoding stage, the graph convolution function used for each layer is Eφ(·) : φ(D̃^{−1/2} Ã D̃^{−1/2} Z^{(l)} W^{(l)}), where Z^{(l)} is the node embedding matrix of layer l. During the graph generation stage, the link prediction function is defined as p(Â | Ẑ) = Π^{n}_{i=1} Π^{n}_{j=1} p(Â_{ij} | ẑ_i, ẑ_j), with p(Â_{ij} | ẑ_i, ẑ_j) = σ(f(ẑ_i)^T f(ẑ_j)), where Â and Ẑ are the adjacency matrix and node embeddings of the generated graph Ĝ_{gen}. We also incorporate additional reconstruction terms, which discourage mode collapse, as Gθ(·) needs to be able to reconstruct every input graph Ĝ. In the graph space, we choose a pointwise reconstruction term with the L1 distance to match the distributions of Ĝ and Ĝ_{rec} = Gθ(h^{(e)}). Moreover, we propose another reconstruction term for the graph embedding in the embedding space. The two reconstruction losses thus consider the graph space and the latent embedding space, respectively. The reconstruction loss is calculated as:

L^{e}_{rec}(Ĝ, Ĝ_{rec}) = (1/m) Σ^{N_g}_{i=1} ( ‖E_φ(Ĝ_i) − E_φ(Ĝ^{rec}_i)‖²₂ + ‖Ĝ_i − Ĝ^{rec}_i‖₁ )   (8)
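The encoder layer and the inner-product link predictor above admit a direct sketch; the layer sizes and the single linear map f below are illustrative assumptions.

```python
# A sketch of the encoder/decoder pair of Section 2.3.1: one GCN layer of the
# form phi(D^{-1/2} A D^{-1/2} Z W) and an inner-product link predictor
# sigma(f(z_i)^T f(z_j)) evaluated for all node pairs at once.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, A, Z):
        A_hat = A + torch.eye(A.size(0))                  # add self-loops
        d = A_hat.sum(dim=1)
        D_inv_sqrt = torch.diag(d.clamp(min=1e-8).pow(-0.5))
        return torch.relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ self.W(Z))

class LinkDecoder(nn.Module):
    """p(A_ij | z_i, z_j) = sigma(f(z_i)^T f(z_j))."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Linear(dim, dim)

    def forward(self, Z_hat):
        H = self.f(Z_hat)
        return torch.sigmoid(H @ H.t())                   # (n, n) edge probabilities

n, d_in, d_hid = 11, 8, 16
A = torch.rand(n, n); A = (A + A.t()) / 2
Z = torch.randn(n, d_in)
A_prob = LinkDecoder(d_hid)(GCNLayer(d_in, d_hid)(A, Z))
```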

2.3.2 Topology loss

The previously introduced EG-GraphGAN only considers the global graph connectivity structure generation with an adversarial learning scheme (i.e., the graph connectivity structure) and ignores the local topology at the node scale of a graph (e.g., how central a node is in the graph). To solve this, we introduce a local topology loss to regularize the generator. In our study, we choose closeness centrality, a common measure used in graph theory. Closeness centrality directly relates to the lengths of the shortest paths between two nodes. It is calculated as CC(i) = (c − 1) / Σ_{j≠i} d(i, j), where d(i, j) defines the distance between two supernodes i and j, and c is the number of supernodes. It reflects the whole connectivity of the network structure. The topology loss is defined as follows:

L^{g}_{tp} = Σ^{m}_{i=1} MAE(CC(i), CC(j))   (9)

2.3.3 Distribution consistency loss

A high-quality generative model should be able to synthesize graphs that preserve the global distribution of realistic graphs. To prevent the generator from producing diverse graph samples whose corresponding latent codes are close, we further propose a consistency loss that encourages the graphs reconstructed by the decoder from nearby latent vectors to be consistently closer. It guarantees that interpolating between samples in the latent space leads to semantic interpolation in the graph space, further improving the mapping h^{(r)} → G. More specifically, we add a homeomorphic mapping to the model. We randomly sample h_{ij} along the line between pairs of vectors h_i and h_j. With the reconstructed graph data Ĝ^{(i,j)}_{rec}, it is desirable that Ĝ^{(i,j)}_{rec} is consistently close to the reconstructed Ĝ^{i}_{rec} and Ĝ^{j}_{rec}. The loss enables us to achieve smooth interpolations of generated fake samples. The homeomorphic mapping loss can be written as:

L^{e}_{hom} = L^{e}_{rec}(Ĝ^{i}_{rec}, Ĝ^{(i,j)}_{rec}) + L^{e}_{rec}(Ĝ^{j}_{rec}, Ĝ^{(i,j)}_{rec})   (10)

2.3.4 Overall loss

The training proceeds in terms of four kinds of losses: adversarial loss, reconstruction loss, topology loss and distribution-consistency loss. The overall loss for our EG-GraphGAN is as follows:

L^{eg}_{GAN} = L^{eg}_{adv} + λ₁₁ L^{g}_{tp} + λ₁₂ L^{e}_{hom} + λ₁₃ L^{e}_{rec}   (11)

Under the unified objective of the multiple losses in the GAN, the benefits of the graph generation model can be exploited and the generation performance improves. The adversarial loss forces the decoder to generate graphs that are likely to fool the discriminator in its task of distinguishing between real and reconstructed graphs, while the topology loss and the homeomorphic mapping loss aim to improve the generation performance from the two aspects of local graph topology and global embedding distribution.
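To make the topology term of Section 2.3.2 concrete, the sketch below compares the closeness-centrality profiles of a real and a generated coarsened graph with a mean absolute error in the spirit of Eq. (9). Reading Eq. (9) as a real-versus-generated comparison, and thresholding the weighted adjacency into binary edges, are our assumptions.

```python
# A sketch of a closeness-centrality comparison between two coarsened graphs,
# cf. Section 2.3.2 and Eq. (9).
import numpy as np
import networkx as nx

def closeness_vector(adj: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    G = nx.from_numpy_array((adj > threshold).astype(int))
    # On a connected graph this equals (c - 1) / sum of shortest-path distances.
    cc = nx.closeness_centrality(G)
    return np.array([cc[i] for i in range(adj.shape[0])])

def topology_loss(adj_real: np.ndarray, adj_gen: np.ndarray) -> float:
    # Mean absolute error between the two centrality profiles.
    return float(np.abs(closeness_vector(adj_real) - closeness_vector(adj_gen)).mean())

rng = np.random.default_rng(0)
A_real = rng.random((11, 11)); A_real = (A_real + A_real.T) / 2
A_gen = rng.random((11, 11)); A_gen = (A_gen + A_gen.T) / 2
print(topology_loss(A_real, A_gen))
```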

2.4 Domain consistent GCN, DC-GCN

The adversarial loss defined in the previous section forces the model to generate realistic samples, but does not guarantee that it provides discriminative information for the graph classification of ASD versus normal control (NC), due to the domain gap. To alleviate the existing gap between the domains of real and generated graphs, our aim is to improve the graph classification performance via three mechanisms: (1) cross-view consistency, (2) class center consistency and (3) sample similarity consistency. D_r = {V^r_i, A^r_i}^{N_r}_{i=1} denotes the real graphs, where V^r = {v_1, v_2, ..., v_c} represents the set of nodes with BOLD signal series and c indicates the number of brain supernodes. D_rg = {{A^r_i}^{N_r}_{i=1}; {A^g_j}^{N_g}_{j=1}} denotes the mixed graph dataset with the real and generated graphs, of which only the graph structure is considered. To this end, we train two GCN models, f-GCN and s-GCN, on D_r and D_rg for learning the graph embeddings with two specific graph convolution modules.

2.4.1 Node spatio-temporal feature learning in f-GCN

We are interested in dynamic functional connectivity analysis and focus on the design of a unified model that is capable of handling spatio-temporal data for capturing the dynamic functional connectivity. On D_r, we aim to jointly learn the node spatio-temporal features, involving the temporal features in the BOLD signal and the spatial features of the graph structures. In order to learn a representation of the temporal dynamics contained in the rs-fMRI time series, we employ the gating mechanism of gated linear units (GLU) [16] to capture dynamic temporal features. The output of a 1D convolutional layer is divided into two equal parts, namely P and Q, which are the outputs of the first and second halves of the convolution kernels, respectively. The temporal gated convolution can be defined as f^{(t+1)}_T = P ⊗ σ(Q), where ⊗ is the Hadamard product operator. The convolution kernel is designed to map the input f^{(t)}_T to the output [P, Q]. With the temporal features, we employ a stack of graph convolutional layers to capture the spatial features of the graph structures.
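The temporal gated convolution of Section 2.4.1 can be sketched as follows; the channel and kernel sizes are illustrative assumptions.

```python
# A sketch of the temporal gated convolution: a 1D convolution whose output
# channels are split into halves P and Q, combined as P * sigmoid(Q), i.e.,
# the GLU gating of [16].
import torch
import torch.nn as nn

class TemporalGatedConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, kernel: int = 3):
        super().__init__()
        # Produce 2 * out_ch channels, later split into P and Q.
        self.conv = nn.Conv1d(in_ch, 2 * out_ch, kernel, padding=kernel // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (batch, in_ch, time) BOLD series per node."""
        P, Q = self.conv(x).chunk(2, dim=1)   # first / second half of the kernels
        return P * torch.sigmoid(Q)           # f_T^{(t+1)} = P (Hadamard) sigma(Q)

# Example: 32 node signals, 1 input channel, 200 time points.
out = TemporalGatedConv(1, 16)(torch.randn(32, 1, 200))
print(out.shape)  # torch.Size([32, 16, 200])
```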

2.4.2 Multiple consistency regularization

Through the graph generation by our EG-GAN, the two graph domains, from the real graph dataset and the generated graph dataset, are nearly aligned from a distribution perspective. However, this only enforces the alignment of the global distribution through adversarial learning; it ignores the local embedding alignment and the class semantic alignment. Additionally, the two GCN models, proposed for capturing different graph properties on D_r and D_rg, lack correlation with each other. The two GCNs should be reinforced by each other through collaborative learning. To this end, we propose a domain consistent GCN, named DC-GCN, to find a low-dimensional graph embedding from multi-view, multi-domain data via three mechanisms: (1) cross-view consistency, (2) class center consistency and (3) sample similarity consistency. In order to fully exploit the correlation of the two view embeddings, we develop a collaborative learning framework for the two graph convolution modules with multiple regularizations from the perspectives of the multiple views of the same graph, the real and generated graph data distributions, and the graph data correlation. The main idea of collaborative learning is that each sample can learn a consistent graph embedding not only from the multiple views and domains but also from the whole population through the training process.

Cross-view consistency. For the real graph data, the cross-view pairs of the same graph are captured by the two different GCN models. We thus have two specific embeddings, h_f and h_s, for each real graph. In order to fully exploit the correlation of the two view embeddings, we propose a cross-view consistency for the two graph convolution modules. We introduce a cross-view consistency with a contrastive loss, which takes into account the consistency between the embeddings produced by the f-GCN and the s-GCN and gives rise to the following constraint:

L_{cv} = (1/2) I d² + (1/2) (1 − I) max(0, m_d − d)²   (12)

where I indicates whether the paired cross-view embeddings are for the same subject (if matched, I = 1; otherwise, I = 0), and d is the Euclidean distance between the feature descriptors h_f and h_s learned by the two GCNs. Such a feature consistency loss encourages the feature descriptors of matching pairs to be close and non-matching pairs to be separated by a distance of at least a margin m_d. Equation 12 encourages the cross-view embeddings to be close in the embedding space, and penalizes non-matching embeddings that are less than the margin m_d apart.

Class semantic consistency. In addition to the cross-view relationship, we also consider the possible consistency of the class semantics across the domains D_r and D_g. The graph generation is conducted independently on the two classes, ASD and NC, which may result in indiscrimination between the two classes in the graph dataset D_g. To explicitly alleviate the negative influence of those generated noisy samples and enforce the cross-domain category consistency, we propose a regularization to enforce the semantic alignment of the two domain distributions. It helps alleviate these domain gap issues caused by the GAN and facilitates the graph embedding by guaranteeing that samples from different domains with the same class label will be mapped nearby in the feature space. We denote C^{+}_{r}, C^{−}_{r}, C^{+}_{g} and C^{−}_{g} as the class centroids of the two domains D_r and D_g. Aligning the distributions of D_r and D_g is able to guide the s-GCN model to produce a consistent graph embedding for the two domains. We align the synthesized sample distribution to the real sample distribution with the Euclidean distance as follows:

L_{cs} = ‖C^{+}_{r} − C^{+}_{g}‖²₂ + ‖C^{−}_{r} − C^{−}_{g}‖²₂   (13)

Pairwise similarity consistency. The previous regularizations consider the consistency from the perspective of the global distribution, ignoring the local pairwise similarity consistency. The local pairwise similarity consistency requires that graph data with similar graph structures should be mapped close together. Incorporating and preserving the local consistency can promote learning a better embedding of the brain network. To this end, all the training samples are treated as a global graph to exploit the associations among the subjects. Given the mixed data D_rg, we construct an undirected global graph Ḡ_rg = (V̄_rg, Ā_rg), where Ā_rg describes the pairwise similarities between each pair of subjects with brain networks, and each subject is represented by a vertex v̄ associated with a graph.

In our work, we adopt the graph Laplacian regularizer, which is defined based on the undirected weighted graph Ḡ_rg. The definition of the graph's edges is critical in order to capture the underlying structure of the data and explain the similarities between each pair of samples. In this setting, a kernel K : G × G → R is called a graph kernel; it can capture the inherent similarity in the graph structure and is reasonably efficient to evaluate. Similar to kernels on vector spaces, graph kernels can be calculated implicitly by computing K. The similarity between two original graphs Ĝ_i and Ĝ_j is calculated as:

Ā^{ij}_{rg} = K(Ĝ_i, Ĝ_j) = ( Σ^{n}_{p=1} Σ^{n}_{q=1} w^{i}_{p} w^{j}_{q} k(v^{p}_{i}, v^{q}_{j}) ) / ( Σ^{n}_{p=1} w^{i}_{p} · Σ^{n}_{q=1} w^{j}_{q} )   (14)

where v^{p}_{i} and v^{q}_{j} denote the p-th and q-th node features (Pearson's correlation vectors) of the i-th and j-th original graphs, w^{i}_{p} = 1 / Σ^{n}_{u=1} k(v^{p}_{i}, v^{u}_{i}), and k(·, ·) denotes an RBF kernel. By minimizing the term L_{ps} in (15), we expect that if two samples G_i and G_j have similar graph structures, then h_i and h_j are mapped close to each other:

L_{ps} = Tr(H^{T} L̄_{rg} H)   (15)

where L̄_rg is the graph Laplacian matrix, L̄_rg = D̄_rg − Ā_rg, and H is the matrix of learned graph embeddings of the domains D_r and D_g.

Finally, the overall loss function for our graph classification can be formulated as:

L_{GCN} = L_{ce} + λ₂₁ L_{cv} + λ₂₂ L_{cs} + λ₂₃ L_{ps}   (16)

where L_ce indicates the cross entropy loss, and λ₂₁, λ₂₂ and λ₂₃ are regularization parameters.
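The three regularizers of Section 2.4.2 can be sketched as follows: a contrastive cross-view term (Eq. (12)), a class-centroid alignment term (Eq. (13)) and a Laplacian smoothness term (Eq. (15)). The batching and the binary class encoding are our illustrative choices.

```python
# A sketch of the DC-GCN consistency terms.
import torch

def cross_view_loss(h_f, h_s, matched, margin=1.0):
    """h_f, h_s: (batch, dim) embeddings; matched: (batch,) 1/0 floats. Eq. (12)."""
    d = (h_f - h_s).norm(dim=1)                                  # Euclidean distance
    pos = matched * d.pow(2)                                     # pull matched views together
    neg = (1 - matched) * torch.clamp(margin - d, min=0).pow(2)  # push others apart
    return 0.5 * (pos + neg).mean()

def class_center_loss(h_r, y_r, h_g, y_g):
    """Align per-class centroids of the real and generated domains. Eq. (13)."""
    loss = 0.0
    for cls in (0, 1):                                           # NC / ASD
        c_r = h_r[y_r == cls].mean(dim=0)
        c_g = h_g[y_g == cls].mean(dim=0)
        loss = loss + (c_r - c_g).pow(2).sum()                   # squared Euclidean
    return loss

def laplacian_loss(H, A_bar):
    """H: (N, dim) embeddings; A_bar: (N, N) kernel similarities. Eq. (15)."""
    L = torch.diag(A_bar.sum(dim=1)) - A_bar                     # graph Laplacian
    return torch.trace(H.t() @ L @ H)
```

In training, these would be weighted by λ₂₁, λ₂₂ and λ₂₃ and added to the cross entropy loss, as in Eq. (16).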

3 Experiments

The ABIDE database (Autism Brain Imaging Data Exchange), which investigates the neural basis of autism [17], aggregates data from 17 different acquisition sites and openly shares rs-fMRI and phenotypic data of 1112 subjects. In this work, the images analyzed were those preprocessed with the Connectome Computation System (CCS).

3.1 Comparison with the state-of-the-art methods

To demonstrate the overall performance of brain network classification, we compare the proposed method with other state-of-the-art methods. The performance in terms of accuracy is shown in Table 2. To more comprehensively evaluate our model, we compare BrainGC-Net with two types of state-of-the-art neural network based methods: a CNN based method (BrainNetCNN), GCN based methods (Eigenpooling GCN [18], BrainGNN [7] and Population GCN [6]) and an autoencoder based method (ASDdiagNet [19]). Moreover, the network based feature is a linear classification using a ridge classifier. We also compare BrainGC-Net with six state-of-the-art methods, including four topology-based approaches (i.e., Clustering Coefficient (CC), t-BNE [20], NAG-FS [21] and MC-NFE [10]) and two subgraph-based approaches (i.e., Graph Boosting and Ordinal Pattern [22]). To make a fair comparison, their originally released codes and published settings are used in the experiments. Compared with all baselines, the proposed BrainGC-Net achieves the best performance in terms of all metrics. This demonstrates that our BrainGC-Net is effective for graph classification in brain disorder diagnosis. We observe consistent improvements from the proposed graph pooling. It can actually boost the performance of graph generation and graph classification. This fact implies the necessity of developing a consistent coarsened graph structure to tackle graph-related problems.

Table 2 Performance comparison of various methods

Methods                    ACC             AUC             Specificity     Sensitivity
Network based feature      0.559±0.001∗    0.602±0.002∗    0.578±0.001∗    0.537±0.001∗
CC [22]                    0.607±0.002∗    0.624±0.003∗    0.587±0.001∗    0.559±0.004∗
t-BNE [20]                 0.655±0.001∗    0.708±0.001∗    0.621±0.002∗    0.608±0.002∗
Graph Boosting             0.646±0.002∗    0.725±0.003∗    0.639±0.003∗    0.617±0.003∗
NAG-FS [21]                0.618±0.006∗    0.599±0.006∗    0.607±0.005∗    0.588±0.005∗
MC-NFE [10]                0.684±0.003∗    0.693±0.005∗    0.636±0.003∗    0.701±0.006∗
Ordinal Pattern [22]       0.641±0.002∗    0.729±0.002∗    0.628±0.001∗    0.611±0.002∗
ASDdiagNet [19]            0.665±0.002∗    0.721±0.003∗    0.675±0.002∗    0.653±0.003∗
BrainNetCNN [23]           0.651±0.003∗    0.728±0.002∗    0.656±0.003∗    0.624±0.005∗
Eigenpooling GCN [18]      0.586±0.001∗    0.655±0.003∗    0.596±0.001∗    0.574±0.001∗
BrainGNN [7]               0.671±0.003     0.743±0.003     0.681±0.003     0.659±0.003
MVS-GCN [9]                0.699±0.005     0.691±0.003     0.631±0.005     0.702±0.007
Population GCN [6]         0.635±0.002∗    0.675±0.002∗    0.651±0.002∗    0.617±0.002∗
BrainGC-Net                0.728±0.008     0.787±0.003     0.708±0.003     0.749±0.003
BrainGC-Net w/o pooling    0.649±0.018     0.678±0.013∗    0.622±0.017∗    0.674±0.009∗

Each experiment is run 10 times and the average classification performance is reported. The best results are in bold. The values marked by ∗ indicate that our method achieves significantly different results compared with the competing methods.

3.2 Discussion on the EG-GraphGAN

To demonstrate the effectiveness of our EG-GAN, we compare EG-GAN with its three variants and three state-of-the-art graph generation methods on the original graphs and the coarsened graphs, respectively. VAEGAN [24] uses the analytical KL loss to minimize the distance between the prior and the posterior of the latents; AAE [25] is a probabilistic autoencoder that uses a GAN to perform variational inference by matching the aggregated posterior of the hidden code vector of the autoencoder with an arbitrary prior distribution; ARGA-VAE [26] encodes the topological structure and node content of a graph into a compact representation, on which a decoder is trained to reconstruct the graph structure, and the latent representation is further enforced to match a prior distribution via an adversarial training scheme. We measure the generated graphs from the different GAN models in Table 3.

Table 3 Performance evaluation of compared algorithms regarding graph statistical properties

Models         LCC     CPL     REDE    CC
Real           8.00    1.59    0.75    0.028
ARGA-VAE       1.61    0.31    0.10    0.046
α-GAN          1.74    0.32    0.12    0.051
VAEGAN         1.75    0.31    0.09    0.026
EG-GraphGAN    1.76    0.28    0.06    0.024

The Real row includes the values of the real graphs, while the rest are the absolute values of the differences between the graphs generated by each algorithm and the real graphs. LCC: size of the largest connected component; CPL: characteristic path length; REDE: relative edge distribution entropy; CC: clustering coefficient.

Compared to the state-of-the-art methods on the original graphs and the coarsened graphs, our method achieves better stability, balance, and competitive classification performance (Fig. 3). We show that the proposed EG-GAN can enrich the data to make it more beneficial for graph classification. The results indicate that the local measure and global mapping regularizations enable our GAN model to achieve improved graph generation quality. The results also suggest that graph GANs are not able to learn a graph representation without additional constraints. It can be observed that the performances of all the GAN models are poor on the original graphs, due to the large number of noisy and irrelevant connections, which limits the performance of the graph generation. The results in Table 3 show that our proposed EG-GAN model produces graphs that preserve local topology properties similar to those of the original graphs. A high-quality generative model should be able to synthesize graphs that are consistent with the original graph distribution.

Fig. 3 The comparison study of graph generation models (CONDGEN, VAEGAN, AAE, ARGA-VAE, EG-GAN w/o tp, EG-GAN w/o consistency, and EG-GAN) on the original graph and the coarsened graph for the graph classification, in terms of ACC and AUC

Fig. 4 t-SNE results of generated samples by comparable GAN models

Fig. 5 The difference in the data distribution of the original and generated graphs with the distribution consistency loss (a) or without it (b) in our GAN

3.3 The qualitative results of the comparable GANs

To intuitively understand the learning stability of EG-GAN, we further illustrate the data distribution of the original and generated samples of the two classes in two-dimensional space. We also present qualitative comparisons with α-GAN, ARGA-VAE and VAEGAN in Fig. 4, which shows the scatter plots produced by t-SNE, with each point representing a data sample. We apply the t-SNE analyses to both the original and the synthetic datasets. This visualizes how closely the distribution of the generated samples resembles that of the original samples in two-dimensional space, giving a qualitative assessment. From Fig. 4, it can be observed that our GAN is able to synthesize graphs that are consistent
with the original graph distribution, with the help of our homeomorphic interpolation mapping.

The advantage of adopting the distribution consistency loss with the homeomorphic mapping is that it enables us to achieve smooth interpolations of generated fake samples. Figure 5 illustrates the embedded representations of the generated and true data with and without the distribution consistency mechanism. From Fig. 5, it can be observed that the distribution consistency loss is able to synthesize graphs that are consistent with the original graph distribution with the help of our homeomorphic interpolation mapping, suggesting the strong ability of our method to model the distribution of complex data.

3.4 Discussion on the DC-GCN

In addition to the above-mentioned results, we are also interested in the effectiveness of each component of the proposed DC-GCN model. Accordingly, we conduct an ablation study on the multiple regularizations separately to investigate how these components affect the classification performance. From the results in Fig. 6, we can find that, compared with all baselines, the proposed DC-GCN generally achieves the best performance with respect to all the metrics. It also demonstrates that exploiting the complementary consistency is important for the graph data augmentation. It can be seen that DC-GCN without any regularization performs worst. This further confirms our finding that there exists an inconsistency between the real graphs and the generated graphs. The result demonstrates that our proposed DC-GCN module is able to alleviate the distribution gap between the two domains. The results suggest that the distribution gap should be considered in order to improve the classification performance when augmenting data. Most previous work on data augmentation ignores the issue of distribution gaps.

Fig. 6 Ablation study of the DC-GCN model for the graph classification (f-GCN on Dr, s-GCN on Drg, DC-GCN w/o any regularization, DC-GCN (only cv), DC-GCN (only cs), DC-GCN (only ps), and DC-GCN, in terms of ACC and AUC)

3.5 The effectiveness of graph pooling

In this paper, we attempt to improve explainability by identifying the critical subnetwork through the learned indicative edges. For instance, we visually explore the coarsened graphs generated by the graph pooling module and compare them with the original graphs. Two groups of samples, including ASD and normal controls, were selected and visualized. The results are presented in Fig. 5. Observed individually, the original graphs of each subject are highly irregular in appearance. However, the inconsistency of each subject is alleviated by the process of graph pooling. Therefore, our results seem to yield solid evidence that imposing a graph pooling method during training is a viable way of improving GCN performance for diagnosis. We can also find that most of the edges in the brain network are non-indicative for the final classification task. Through graph pooling, the indicative edges are highlighted. Furthermore, by comparing the results in Fig. 4(a) and (b), it is apparent that the difference between the original graphs of ASD individuals and normal controls is difficult to differentiate. In contrast, the difference between ASD individuals and normal controls with respect to the coarsened graphs is strengthened by the graph pooling.

3.6 The influence of the hyperparameters of BrainGC-Net

In addition, we are also interested in the influence of the model hyperparameters. To evaluate the impact of the loss weights on BrainGC-Net, we conducted six experiments

with varied values for the weights, including λ₁₁, λ₁₂, λ₁₃, λ₂₁, λ₂₂ and λ₂₃, which are illustrated in Figs. 7 and 8. For each experiment, we fix the weights of the other losses and individually adjust the current hyperparameter to measure its impact on the classification performance. Notably, Fig. 7 shows the results for varied values of λ₁₁, λ₁₂ and λ₁₃ for EG-GAN, while Fig. 8 shows the results for λ₂₁, λ₂₂ and λ₂₃ for the classification. We can find that the weight values have a significant impact on the classification performance, which demonstrates that the loss weights are important factors for the model.

Fig. 7 Classification performance (ACC and AUC) for different weight values (λ₁₁: the weight of the topology loss, λ₁₂: the weight of the distribution consistency loss and λ₁₃: the weight of the reconstruction loss) of the EG-GAN losses

Fig. 8 Classification performance (ACC and AUC) for different weight values (λ₂₁: the weight of the cross-view consistency loss, λ₂₂: the weight of the class semantic consistency loss and λ₂₃: the weight of the pairwise similarity consistency loss) of the classification losses

Fig. 9 The top 3 subnetworks identified by BrainGC-Net: the DMN, the SN and the CEN

3.7 Interpretability

To better understand the cortical circuitry underlying the connectivity between large-scale neural networks, we evaluate

our model to investigate the potential intrinsic subnetworks from a data-driven perspective. In this experiment, we also empirically investigate the effectiveness of subnetwork and inter-subnetwork connection identification. The top 3 subnetworks selected by our BrainGC-Net are shown in Fig. 9. We found that the subnetworks identified by BrainGC-Net yielded promising patterns that are expected from prior knowledge on neuroimaging and cognition. These included the CEN, DMN and SN. Moreover, the prior knowledge regularization is essential for identifying the biomarkers and understanding the characteristic patterns of the disease.

4 Conclusion

Functional connectivity networks constructed from rs-fMRI hold great promise for distinguishing disorder patients from NCs. We developed a unified three-stage learning framework for brain network classification, involving graph structure learning, graph generation and graph classification. By combining the strengths of these networks, the proposed approach significantly improves the state-of-the-art results in ASD classification on the benchmark dataset. Here we focused on a three-stage learning framework; however, end-to-end collaborative learning is more interesting and challenging. Moreover, we aim to investigate whether our framework is transferable and generalizable across disorders with insufficient data.

Acknowledgements This research was supported by the National Natural Science Foundation of China (No. 62076059) and the Science Project of Liaoning Province (2021-MS-105).

Data Availability The datasets generated during and/or analysed during the current study are available in the ABIDE repository, https://fcon_1000.projects.nitrc.org/indi/abide/.

Declarations

Conflict of Interests The authors declare that they have no conflicts of interest.

References

1. Kumar V, Garg R (2021) Resting state functional connectivity alterations in individuals with autism spectrum disorders: a systematic review. Front Psychiatry, pp 1-55
2. Wang M, Huang J, Liu M, Zhang D (2019) Functional connectivity network analysis with discriminative hub detection for brain disease identification. In: Proceedings of the AAAI conference on artificial intelligence, vol 33, pp 1198-1205
3. Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. ICLR, pp 1-11
4. Ktena SI, Parisot S, Ferrante E, Rajchl M, Lee M, Glocker B, Rueckert D (2018) Metric learning with spectral graph convolutions on brain connectivity networks. NeuroImage 169:431-442
5. Li X, Dvornek NC, Zhuang J, Ventola P, Duncan JS (2018) Brain biomarker interpretation in ASD using deep learning and fMRI. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 206-214
6. Parisot S, Ktena SI, Ferrante E, Lee M, Moreno RG, Glocker B, Rueckert D (2017) Spectral graph convolutions for population-based disease prediction. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 177-185
7. Li X, Zhou Y, Dvornek N, Zhang M, Gao S, Zhuang J, Scheinost D, Staib LH, Ventola P, Duncan JS (2021) BrainGNN: interpretable brain graph neural network for fMRI analysis. Med Image Anal 74:102233
8. Jiang H, Cao P, Xu M, Yang J, Zaiane O (2020) Hi-GCN: a hierarchical graph convolution network for graph embedding learning of brain network and brain disorders prediction. Comput Biol Med 127:104096
9. Wen G, Cao P, Bao H, Yang W, Zheng T, Zaiane O (2022) MVS-GCN: a prior brain structure learning-guided multi-view graph convolution network for autism spectrum disorder diagnosis. Comput Biol Med, p 105239
10. Wang N, Yao D, Ma L, Liu M (2022) Multi-site clustering and nested feature extraction for identifying autism spectrum disorder with resting-state fMRI. Med Image Anal 75:102279
11. Bajestani GS, Behrooz M, Khani AG, Nouri-Baygi M, Mollaei A (2019) Diagnosis of autism spectrum disorder based on complex network features. Comput Methods Prog Biomed 177:277-283
12. Heinsfeld AS, Franco AR, Craddock RC, Buchweitz A, Meneguzzi F (2018) Identification of autism spectrum disorder using deep learning and the ABIDE dataset. NeuroImage: Clinical 17:16-23
13. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Adv Neural Inf Process Syst 27:1-9
14. Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville AC (2017) Improved training of Wasserstein GANs. Adv Neural Inf Process Syst 30:1-11
15. Arjovsky M, Chintala S, Bottou L (2017) Wasserstein generative adversarial networks. In: International conference on machine learning. PMLR, pp 214-223
16. Dauphin YN, Fan A, Auli M, Grangier D (2017) Language modeling with gated convolutional networks. In: International conference on machine learning. PMLR, pp 933-941
17. Di Martino A, Yan C-G, Li Q, Denio E, Castellanos FX, Alaerts K, Anderson JS, Assaf M, Bookheimer SY, Dapretto M et al (2014) The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism. Mol Psychiatry 19:659-667
18. Ma Y, Wang S, Aggarwal CC, Tang J (2019) Graph convolutional networks with eigenpooling. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pp 723-731
19. Eslami T, Mirjalili V, Fong A, Laird AR, Saeed F (2019) ASD-DiagNet: a hybrid learning approach for detection of autism spectrum disorder using fMRI data. Front Neuroinform 13:70
20. Cao B, He L, Wei X, Xing M, Yu PS, Klumpp H, Leow AD (2017) t-BNE: tensor-based brain network embedding. In: Proceedings of the 2017 SIAM international conference on data mining. SIAM, pp 189-197
21. Mhiri I, Rekik I (2020) Joint functional brain network atlas estimation and feature selection for neurological disorder diagnosis with application to autism. Med Image Anal 60:101596
22. Zhang D, Huang J, Jie B, Du J, Tu L, Liu M (2018) Ordinal pattern: a new descriptor for brain connectivity networks. IEEE Trans Med Imaging 37(7):1711-1722
23. Kawahara J, Brown CJ, Miller SP, Booth BG, Chau V, Grunau RE, Zwicker JG, Hamarneh G (2017) BrainNetCNN: convolutional neural networks for brain networks; towards predicting neurodevelopment. NeuroImage 146:1038-1049
24. Rosca M, Lakshminarayanan B, Warde-Farley D, Mohamed S (2017) Variational approaches for auto-encoding generative adversarial networks, pp 1-21
25. Makhzani A, Shlens J, Jaitly N, Goodfellow I, Frey B (2015) Adversarial autoencoders. Int Conf Learn Representations, pp 1-6
26. Pan S, Hu R, Long G, Jiang J, Yao L, Zhang C (2018) Adversarially regularized graph autoencoder for graph embedding. Int Jt Conf Artif Intell, pp 2609-2615

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Peng Cao is an associate professor at the Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education. He earned his Ph.D. degree in computer application in 2014 at Northeastern University, China. His research interests include brain network analysis and machine learning for medical data. His research papers have been published in or accepted by journals and conferences including Pattern Recognition, Computer Methods and Programs in Biomedicine, Medical Image Computing and Computer Assisted Intervention (MICCAI) and the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22).

Guangqi Wen is currently a Ph.D. candidate at the Key Laboratory of Medical Image Computing of the Ministry of Education, Northeastern University, China. The objectives of his research are (i) to assist in the diagnosis of brain diseases by leveraging machine learning methods and to explore information on the critical biomarkers of brain disease occurrence and (ii) graph embedding learning in the brain network.

Wenju Yang is currently a Ph.D. candidate at the Key Laboratory of Medical Image Computing of the Ministry of Education, Northeastern University, China. The objectives of his research are brain disease diagnosis, time series analysis and speech emotion recognition.

Xiaoli Liu is a senior researcher at DAMO Academy, Alibaba Group. She earned her Ph.D. degree at Northeastern University, China. Her research interests include machine learning and computer vision.

Jinzhu Yang received the Ph.D. degree in pattern recognition and intelligent systems from Northeastern University, Shenyang, China, in 2007. He is a professor and the director of the Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education. His research interests focus on image processing and analysis, artificial intelligence, pattern recognition and data analysis. His research papers have been published in or accepted by journals including Magnetic Resonance Imaging, BMC Bioinformatics and Proceedings of the National Academy of Sciences of the United States of America (PNAS).
Osmar Zaiane is a professor in Computing Science at the University of Alberta, Canada, and Scientific Director of the Alberta Innovates Centre for Machine Learning. He is an associate editor of many international journals on data mining and data analytics and has served as program chair and general chair for scores of international conferences in the field of knowledge discovery and data mining. His current research interests include data mining and healthcare informatics.