Zhang Et Al. - 2021 - Adaptive Similarity Function With Structural Featu
1 Computer, Electrical and Mathematical Science Engineering Division, King Abdullah University of Science and
Technology (KAUST), Thuwal 23955, Saudi Arabia
2 Computational Communication Collaboratory, Nanjing University, Nanjing 210093, China
3 School of Information Science and Engineering, Shandong Normal University, Jinan 250022, China
arXiv:2111.07027v1 [cs.SI] 13 Nov 2021
Abstract—Link prediction is a fundamental problem of data science, which usually calls for unfolding the mechanisms that govern the micro-dynamics of networks. In this regard, using features obtained from network embedding for predicting links has drawn widespread attention. Though edge-feature-based and node-similarity-based methods have been proposed to solve the link prediction problem, many technical challenges still exist due to the unique structural properties of networks, especially when the networks are sparse. From the graph mining perspective, we first give empirical evidence of the inconsistency between heuristic and learned edge features. Then we propose a novel link prediction framework, AdaSim, by introducing an Adaptive Similarity function using features obtained from network embedding based on random walks. The node feature representations are obtained by optimizing a graph-based objective function. Instead of generating edge features using binary operators, we perform link prediction solely leveraging the node features of the network. We define a flexible similarity function with one tunable parameter, which serves as a penalty on the original similarity measure. The optimal value is learned through supervised learning and is thus adaptive to the data distribution. To evaluate the performance of our proposed algorithm, we conduct extensive experiments on eleven disparate real-world networks. Experimental results show that AdaSim achieves better performance than state-of-the-art algorithms and is robust to different sparsities of the networks.

I. INTRODUCTION

In many real-world networks, whether two nodes have a link must be determined experimentally, which is very costly. As a result, the known links may represent fewer than 1% of the actual links [8]. Besides, in social networks like Facebook, only part of the friendships among users are shown by the observed network; there still exist user pairs who already know each other but are not connected through Facebook. Due to this, it is always a challenging yet meaningful task to identify which pairs of nodes not connected in the current network are likely to be connected in the actual network, i.e., to predict missing links. Acquiring such knowledge is useful: in the biological domain, it gives invaluable guidance for carrying out targeted experiments, and in the social network domain, it can be used to recommend promising friendships, thus enhancing users' loyalty to web services.

The ways to solve the link prediction problem [9]–[14] can be roughly divided into two categories, i.e., unsupervised methods and supervised methods. Current research on unsupervised link prediction mainly focuses on defining a similarity metric s_uv for unconnected node pairs (u, v) using information extracted from the network topology. The defined metrics represent different kinds of proximity between a pair of nodes, perform differently across various networks, and none dominates the others.
Moreover, handcrafted features cost much human labor; besides, they often rely on domain knowledge, which restricts generalization across different fields.

An alternative is to learn the features of the network automatically. By treating networks as special documents consisting of a series of node sequences, the node features can be learned by solving an optimization problem [28]. After obtaining the features of the nodes, the link prediction task is traditionally conducted using two approaches. The first is the similarity-based ranking method [29]; for example, cosine similarity is used to measure the similarity of pairs of nodes, and for two unconnected nodes, the larger the similarity value, the higher their connection probability. The other is the edge-feature-based classification method [30], [31]. In this method, edge features are generated by heuristic binary operators such as the Hadamard operator and the Average operator. Then a classifier is trained using these features and is used to distinguish whether a link will form between two unconnected nodes.

As the features learned through network embedding preserve the network's local structure, the cosine similarity works
well for strongly assortative networks but fails to capture
the disassortativity of the network, i.e., nodes prefer to build
connections on large scales rather than on small scales [7]. Thus, using cosine similarity for link prediction suffers from statistical performance drawbacks. Besides, the edge features obtained through binary operators will potentially lose node information, since the features of nodes are learned by solving an optimization problem but the edge features are not (see Fig. 1 for an illustration; the details are discussed in Section III-C). Furthermore, the edge and node features have the same dimensionality, which is usually on the scale of several hundred. This means that even for linear models such as logistic regression, hundreds of parameters still need to be learned, which raises the question of feasibility, especially when the data size is large. How to design a simple, general yet efficient link prediction method using the node features directly learned from network embedding thus remains an open problem.

To solve the above-mentioned issues, we propose a novel link prediction method, AdaSim (Adaptive Similarity function), for large-scale networks. The node feature representations are obtained by optimizing a graph-based objective function using stochastic gradient descent techniques. Instead of generating edge features using heuristic binary operators, we perform link prediction solely leveraging the node features of the network. Our essential contribution lies in defining a flexible node similarity function with only one tunable parameter, which serves as a penalty on the original similarity. The optimal value can be obtained through supervised learning and is thus adaptive to the data distribution, which gives AdaSim the ability to capture the various link formation mechanisms of different networks. Compared with the original cosine similarity, the proposed method generalizes well across various network datasets.

In summary, our main contributions are listed as follows.
• We propose AdaSim, a novel link prediction method, by introducing an adaptive similarity function using features learned from network embedding.
• We show that AdaSim is flexible with only one tunable parameter, which is adjustable with respect to the network property. This flexibility endows AdaSim with the power of capturing the link formation mechanisms of different networks.
• We demonstrate the effectiveness of AdaSim by conducting experiments on various disparate real-world networks. The results show that the proposed method can boost the performance of link prediction to different degrees. Besides, we find that AdaSim works particularly well for highly sparse networks.

The rest of the paper is structured as follows. Section II reviews research related to link prediction. The problem definition of link prediction and the feature learning framework are described in Section III, where some empirical findings on the datasets are also given. Section IV illustrates the proposed link prediction method, AdaSim, with detailed explanations of each component. The experimental results and analysis are presented in Section V. Finally, Section VI concludes the paper.

II. RELATED WORK

Early works on link prediction mainly focus on exploring topological information derived from graphs. Liben-Nowell and Kleinberg [32] studied several topological features such as common neighbors, Adamic-Adar, PageRank and Katz, and found that topological information is beneficial in predicting links compared with a random predictor. Subsequently, more topology-based predictors were proposed for link prediction, e.g., resource allocation [33], community-enhanced predictors [8] and clustering-coefficient-based link prediction [34].

Hasan et al. were the first to model link prediction as a binary classification problem from a machine learning perspective [21]. Various similarity metrics between node pairs are extracted from the network and treated as features in a supervised learning setup; a classifier is then built with these features as inputs to distinguish positive samples (links that form) from negative samples (links that do not form). Thereafter, the supervised classification approach became prevalent in the link prediction domain. Lichtenwalter et al. proposed a high-performance link prediction framework called HPLP; some new perspectives on link prediction, e.g., the generality of an algorithm, topological causes and sampling approaches, were included in [15]. Later, a supervised random-walk-based algorithm was proposed by Backstrom and Leskovec [22] to effectively incorporate the information from the network structure with rich node and edge attribute data.

In addition, the link prediction problem has also been extended to heterogeneous information networks [35]–[38]. Among these works, a core concept based on the network schema was proposed, namely the meta-path. Multiple information sources can be effectively fused into a single path, and different meta-paths have different physical meanings. Some similarity measures can be calculated using meta-paths and then treated as features of a classifier to discriminate positive and negative links.
All the works mentioned above on supervised link prediction use handcrafted features, which require expensive human labor and often rely on domain knowledge. To alleviate this, one can use latent features learned automatically through representation learning [39]. For networks, unsupervised feature learning methods typically use the spectral properties of various matrix representations of graphs, such as the adjacency and Laplacian matrices. From the perspective of linear algebra, this kind of method can be regarded as a dimensionality reduction technique. Several works [40], [41] aim to acquire node features of graphs in this way, but the eigendecomposition of a matrix is costly to compute, which makes these methods impractical to scale up to large networks.

Perozzi et al. [28] extended the skip-gram model to graphs and proposed a framework, DeepWalk, which represents a network as a special "document" consisting of a series of node sequences generated by random walks. DeepWalk can learn features for the nodes in the network, and the representation learning process is independent of downstream tasks such as node classification and link prediction. Later, Node2vec was proposed by Grover and Leskovec [31]. Compared with DeepWalk, Node2vec uses a biased random walk to control the sampling space of node sequences. Network properties such as homophily and structural equivalence can be captured by Node2vec, and link prediction was performed using edge features obtained by applying heuristic binary operators to node features. In [42], the authors proposed a deep model called SDNE to capture the highly non-linear property of networks; the first-order and second-order proximities were jointly exploited to capture the local and global network structure, respectively. More recently, Wang et al. [43] proposed a novel Modularized Nonnegative Matrix Factorization (M-NMF) model to incorporate not only the local and global network structure but also community information into network embedding. To model the diverse roles that nodes play when interacting with other nodes, Tu et al. [44] presented a Context-Aware Network Embedding (CANE) method by introducing a mutual attention mechanism; CANE can model the semantic relationships between nodes more precisely. To save computation time, in [29] link prediction was directly carried out using the cosine similarity of node features instead of edge features.

The above works mainly focus on network embedding techniques and ignore the typical characteristics of link formation. The main difference between existing work and our efforts lies in the fact that we consider an adaptive similarity function with a learning-based idea, making our model flexible enough to capture the various link formation patterns of different networks. For example, a negative value of p can weaken the role of 'structural equivalence' and enhance the score of dissimilar node pairs, thus capturing disassortativity in link formation.

III. PROBLEM STATEMENT AND FEATURE LEARNING FRAMEWORK

In this section, we first give the formal definition of the link prediction problem. Then the feature learning framework for networks is presented. Finally, we introduce the empirical findings on several network datasets when using node features for link prediction.

A. Problem Formulation

Given a network G = (V, E), where V is the set of nodes and E ⊆ V × V is the set of links, no multiple links or self-links are allowed between any two nodes. It is assumed that some of the links in the network are unobserved or missing at the present stage. The link prediction task aims to predict the likelihood of a link between two unconnected nodes using information intrinsic to the network.

Since we consider a supervised approach to link prediction, we first need to construct a labeled dataset D = {(x_i, y_i), i ∈ [1, m]}, where x_i is the feature vector of the i-th sample and y_i ∈ {0, 1} is the corresponding label. Specifically, x_i = Φ(u_i, v_i), in which u_i and v_i denote the features of nodes u_i and v_i, respectively. The node features are learned through network representation learning, and Φ(·) is a mapping function from node features to node-pair features. For any node pair in D, y_i = 1 indicates that this node pair belongs to the positive samples; otherwise it belongs to the negative samples. Positive samples are edges, E^p, chosen randomly from the network G. We delete E^p from G and keep the obtained sub-network (G_s) fully connected. To generate negative samples, we sample an equal number of node pairs from G that have no edge connecting them. The dataset D is split into two parts: a training dataset D_T and a test dataset D_P. A classification model M is learned with dataset D_T, and this model is then used for predicting whether a pair of nodes in dataset D_P should have a link connecting them.

Our algorithms are typical methods from the field of graph mining. Hence, in contrast to part of our previous papers [9], [10], we follow the conventions of the field of artificial intelligence, in which E^p (positive samples) contains 50% of the observed links, and scores are based on the combination of E^p and the same number of non-observed links (negative samples). For the highly sparse sexual contact network, which has only a small number of nodes, E^p instead comprises all observed links.

B. Feature learning of network embedding

For a given network G = (V, E), a mapping function f : V → R^{|V|×d} from nodes to feature vectors can be learned for link prediction. Here d is a user-specified parameter that denotes the number of dimensions of the feature vectors, and f is a matrix of |V| × d parameters. The mapping function f is learned from a series of document-like node sequences, using optimization techniques that originated in language modeling.

The purpose of language modeling is to estimate the likelihood of a sentence appearing in a document. The model is built using a corpus C. More formally, it aims to maximize

Pr(w | context(w))    (1)

over the whole training corpus, where w is a word of the vocabulary and context(w) is the context of w, which includes the words that appear both to the left of w and to its right. Recent research on representation learning has put a lot of attention on leveraging probabilistic neural networks to build a general representation of words, extending the scope of language modeling beyond its original goals. Each word is represented by a continuous and low-dimensional feature vector. The problem, then, is to maximize

Pr(w | f(w_{i-j}), ..., f(w_{i-1}), f(w_{i+1}), ..., f(w_{i+j})),    (2)

where f(·) denotes the latent representation of a word.

TABLE I: Choice of binary operators for learning edge features.

Operator      Definition
Hadamard      [f(u) ⊙ f(v)]_i = f_i(u) * f_i(v)
Average       [f(u) ⊕ f(v)]_i = (f_i(u) + f_i(v)) / 2
Division      [f(u) ⊘ f(v)]_i = f_i(u) / f_i(v)
Weighted-L1   ||f(u) · f(v)||_1i = |f_i(u) − f_i(v)|
Weighted-L2   ||f(u) · f(v)||_2i = |f_i(u) − f_i(v)|^2
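To make the operators in Table I concrete, here is a minimal NumPy sketch of the five binary operators applied to two learned node vectors; the function names and example vectors are illustrative additions rather than part of the original paper, and Weighted-L2 is written as the squared element-wise difference.

import numpy as np

def hadamard(fu, fv):
    # [f(u) (*) f(v)]_i = f_i(u) * f_i(v)
    return fu * fv

def average(fu, fv):
    # [f(u) (+) f(v)]_i = (f_i(u) + f_i(v)) / 2
    return (fu + fv) / 2.0

def division(fu, fv, eps=1e-12):
    # [f(u) (/) f(v)]_i = f_i(u) / f_i(v); eps guards against division by zero
    return fu / (fv + eps)

def weighted_l1(fu, fv):
    # ||f(u) . f(v)||_1i = |f_i(u) - f_i(v)|
    return np.abs(fu - fv)

def weighted_l2(fu, fv):
    # ||f(u) . f(v)||_2i = |f_i(u) - f_i(v)|^2
    return np.abs(fu - fv) ** 2

rng = np.random.default_rng(0)
fu, fv = rng.normal(size=128), rng.normal(size=128)   # d = 128, as in the experiments
for op in (hadamard, average, division, weighted_l1, weighted_l2):
    print(op.__name__, op(fu, fv)[:3])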
The social representations of networks can be learned analogously through a series of node sequences generated by a specific sampling strategy S. Similar to the context of a word w in language modeling, N_S(u) is defined to be the neighborhood of node u under sampling strategy S. The node representations of the network can be obtained by optimizing the following expression:

Pr(u | f(u_{i-j}), ..., f(u_{i-1}), f(u_{i+1}), ..., f(u_{i+j})).    (3)

The learned representations can capture the similarities in local graph structure shared among nodes in the network: nodes that have similar neighborhoods will acquire similar representations.
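As an illustration of how such node representations can be obtained in practice, the following sketch generates uniform random walks on the toy graph of Fig. 1a and trains a skip-gram model on them, in the spirit of DeepWalk [28]. It assumes the networkx and gensim libraries; the walk length, walks per node, window size and dimensionality echo the values reported in Section V (l = 80, λ = 10, k = 10, d = 128), but this is only a sketch, not the authors' implementation.

import random
import networkx as nx
from gensim.models import Word2Vec

def random_walks(G, walks_per_node=10, walk_length=80, seed=42):
    # Generate uniform random-walk node sequences (the document-like "sentences").
    rng = random.Random(seed)
    walks, nodes = [], list(G.nodes())
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = list(G.neighbors(walk[-1]))
                if not nbrs:
                    break
                walk.append(rng.choice(nbrs))
            walks.append([str(n) for n in walk])
    return walks

G = nx.krackhardt_kite_graph()        # the toy network of Fig. 1a
walks = random_walks(G)
# Skip-gram (sg=1) maximizes Pr(u | context of u), in the spirit of Eq. (3).
model = Word2Vec(walks, vector_size=128, window=10, min_count=0, sg=1, workers=4, epochs=1)
u_vec = model.wv["0"]                 # learned representation of node 0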
Fig. 1: (a) A toy network. (b) Pearson correlation coefficients between learned edge features and heuristic edge features. The node sequence sampling strategy used in this example is the random walk.

C. Empirical findings on several network datasets

After learning the representations of the nodes in the network, there are two approaches to the link prediction task, i.e., the node-similarity-based method and the edge-feature-based method. The former is simple and scalable, the latter complex yet powerful. But both methods have their limitations in effectively characterizing the link formation patterns of node pairs. Since the node-similarity-based method is not involved in learning, it cannot be aware of the effects of global network properties in link prediction. The edge-feature-based method cannot describe the node-pair relationship very well at the feature level using a heuristic binary operator, as information is lost in the mapping from node features to edge features. We show empirical evidence for the limitations of these two kinds of methods in the following subsections.

1) Limitation of heuristic binary operators: Fig. 1a shows a toy network (the Krackhardt kite graph) with 10 nodes and 18 edges. Each node and edge is marked with a unique label. Given a specific sampling strategy S, we can obtain the node sequences and the corresponding edge sequences simultaneously after performing S on the network. Hence, both node representations and edge representations of the network can be learned using optimization techniques. For a specific pair of nodes, the learned edge representation is called the "true" features and the edge representation generated by a binary operator is called the "heuristic" features. If a binary operator is good enough, it should be able to accurately characterize the relationship of pairs of nodes, i.e., the correlation between heuristic features and true features should be as strong as possible. For the 18 edges in the toy network, five different kinds of heuristic binary operators [31] (see Table I) are chosen to generate edge features¹, and their correlations with the true edge features are displayed in Fig. 1b.

¹ For the Division operator, we omit the variant f_i(v)/f_i(u) since it gives very similar results to f_i(u)/f_i(v).

On the basis of the evidence from Fig. 1, we can tell that different operators give different results in representing the features of pairs of nodes and that none dominates the others. Some of the edges, e.g., edges 10 and 16, can be well characterized by the Hadamard operator, while others, for example, edges 12, 14 and 17, are better characterized by the Average operator. Furthermore, most values are less than 0.5, which means a weak correlation between the heuristic edge features and the true edge features. This verifies our claim that edge features obtained through heuristic binary operators may cause a loss of information from the node features.
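In principle, the correlations of Fig. 1b can be reproduced by comparing, for every edge, the learned ("true") vector with the vector produced by a binary operator. A minimal sketch, assuming both sets of edge vectors are already available as dictionaries keyed by edge label (scipy provides the Pearson coefficient):

from scipy.stats import pearsonr

def heuristic_vs_true(true_edge_vecs, heuristic_edge_vecs):
    # Pearson correlation, per edge, between learned and operator-generated features.
    corr = {}
    for e, true_vec in true_edge_vecs.items():
        r, _ = pearsonr(true_vec, heuristic_edge_vecs[e])
        corr[e] = r
    return corr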
2) Limitation of the similarity-based method: Given an unconnected node pair (u, v), several metrics can be used to measure their similarity, for example, the number of common neighbors between u and v or the number of reachable paths from u to v. But here we only consider cosine similarity, since we have the node pair's feature vectors u and v. The cosine similarity is used to characterize the link formation probability and is defined as

cos(u, v) = u^T v / (||u|| ||v||),    (4)

where (·)^T denotes the transpose and ||·|| the l2-norm of a vector. The cosine similarity measures the cosine of the angle between two d-dimensional vectors obtained from network representation learning. In fact, the idea of cosine similarity has been used for link prediction in several works [29], [45], [46]. But there are a few issues when directly using cosine similarity for link prediction. The first is that it does not consider the label information of node pairs, so it belongs to the category of unsupervised learning, whereas many works have demonstrated that supervised learning approaches to link prediction can enhance performance [15], [23], [24]. The other is that cosine similarity is too rigid to capture the different link formation mechanisms of different networks.

In the phase of representation learning for networks, it is assumed that two nodes have similar representations if they have similar contexts in the node sequences sampled by strategy S. For networks, this indicates that if two nodes are structurally close² to each other, then they have a high probability of occurring simultaneously in the same sequence, which results in a high value in terms of cosine similarity. But in real-world networks, whether two nodes will form a link is not simply determined by this kind of structural closeness. Two nodes far from each other in the network may also have a high chance of building a relationship if they are structurally equivalent [31]. For two nodes, "the closer the graph distance, the easier it is for them to build a link" does not necessarily hold, especially when the network is sparse and disassortative.

² For three nodes (v_1, v_2, v_3) in the graph, suppose the geodesic distance of (v_1, v_2) is 2 and that of (v_1, v_3) is 5; we say v_2 is closer to v_1 than v_3.

Fig. 2: Link formation patterns among different networks. The x axis represents the geodesic distance s between a pair of nodes (u, v) and the y axis represents the link formation probability. The probabilities are calculated by |E_s^p|/|E^p|, in which |E^p| is the number of positive links that are randomly sampled from the network, and |E_s^p| is the number of those pairs of nodes whose geodesic distance is exactly s.

As shown in Fig. 2, different networks have different patterns in building new connections³. These patterns are closely related to network properties such as the clustering coefficient, graph density and assortativity. In some networks with high assortativity, two unconnected nodes tend to be connected if they are structurally close, while in others this is not the case. More specifically, the link formation probability for two unconnected nodes decreases sharply with increasing geodesic distance in the C.elegans dataset, and 97.8% of new links span a geodesic distance of less than 3. But for the Gnutella dataset, the link formation probability first increases and then decreases with increasing geodesic distance, and most of the new links (62.4%) are generated by node pairs with a distance equal to 5 or 6. For the Router dataset, the new links span a wide range of geodesic distances from 2 to 34, and almost half of the new links (48.67%) span a distance larger than 5; the distribution of link formation probabilities is more complex than in the other two datasets.

³ The datasets are described in Section V-A.

The cosine similarity function assigns higher scores to pairs of nodes that are close to each other and vice versa. It can capture the link formation pattern in the case of Fig. 2a, i.e., the shorter the distance between two unconnected nodes, the higher the probability of their being connected. But cosine similarity fails to capture the patterns in the cases of Fig. 2b and Fig. 2c, especially Fig. 2b, in which the link formation pattern follows a Gaussian distribution. For the pattern of Fig. 2b, nodes prefer to build connections with those that are relatively farther from them. When performing a link prediction task in cases like this, pairs of nodes with a relatively longer distance should be more similar than those with a shorter one.
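The probabilities plotted in Fig. 2 follow directly from the definition |E_s^p|/|E^p| in the caption. A minimal sketch, assuming networkx, an observed graph G_obs from which the positive links have already been removed, and the list pos_links of those removed links (pairs whose endpoints are disconnected in G_obs are skipped, so the denominator here is the number of reachable positive pairs):

from collections import Counter
import networkx as nx

def link_prob_by_distance(G_obs, pos_links):
    # Estimate |E_s^p| / |E^p| for every geodesic distance s, as in Fig. 2.
    dist_counts, usable = Counter(), 0
    for u, v in pos_links:
        try:
            s = nx.shortest_path_length(G_obs, u, v)
        except nx.NetworkXNoPath:
            continue
        dist_counts[s] += 1
        usable += 1
    return {s: c / usable for s, c in sorted(dist_counts.items())}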
Fig. 3: The whole framework of link prediction based on adaptive similarity function.
Thus, we need to design a flexible similarity function for link prediction that captures the various patterns of link formation. Besides, the similarity function should be devised on the basis of concision and scalability. This can be achieved by adjusting the similarity of node pairs and balancing link formation probabilities among different distances. Inspired by this, we propose a modified similarity function, defined as

sim(u, v) = u^T v / (||u|| ||v||) + p / (||u|| ||v||),    (5)

where p is a balance factor to control the similarity of node pairs with different geodesic distances. As we have the labels of node pairs, the optimal value of p can be learned in a supervised way.
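Below is a small sketch of Eq. (5) together with one possible way to learn the penalty p from labeled node pairs: a grid search that maximizes training AUC, with a search range matching the p values explored in the sensitivity analysis. This is an illustrative variant under those assumptions, not the exact optimization procedure of the paper.

import numpy as np
from sklearn.metrics import roc_auc_score

def adasim_score(u, v, p):
    # Eq. (5): sim(u, v) = (u^T v) / (||u|| ||v||) + p / (||u|| ||v||)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return (u @ v + p) / denom

def fit_penalty(pairs, labels, emb, grid=np.linspace(-50, 50, 201)):
    # Pick the p that maximizes AUC on the training pairs (hypothetical grid-search variant).
    best_p, best_auc = 0.0, -1.0
    for p in grid:
        scores = [adasim_score(emb[u], emb[v], p) for u, v in pairs]
        auc = roc_auc_score(labels, scores)
        if auc > best_auc:
            best_p, best_auc = p, auc
    return best_p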
IV. THE PROPOSED FRAMEWORK

In this work, we propose a novel link prediction framework, AdaSim, based on an adaptive similarity function using the features learned from network representation. The whole framework is illustrated in Fig. 3. It can be divided into three parts: subgraph generation, feature representation and similarity function learning. First, the positive and negative node pair indexes are obtained through random sampling, and the corresponding subgraph G_s is generated via edge removal. Then we learn the representations of the nodes in the network in an unsupervised way. Finally, a similarity function is defined and its optimal parameter is determined through supervised learning. The obtained similarity function with the optimal penalty can be directly used to solve the link prediction problem.

A. Subgraph generation

Unlike other tasks such as link clustering or node classification, in which the complete structural information is available, for link prediction a certain fraction of the links needs to be removed before performing network representation learning. To achieve this, one can iteratively select one link and determine whether it is removable or not. But this operation is inefficient and very time consuming, especially when the network is very sparse, since it needs to traverse almost all the nodes in the graph.

Instead, we propose a fast positive sampling method based on the minimum spanning tree (MST). An MST is a subset of the edges of the original graph G that connects all the nodes together. This means that all the edges are removable except those that belong to the MST, and their deletion will not break the connectivity of G. Lines 1–4 in Algorithm 1 show the core of our approach. We first generate an MST of G, denoted as G_mst = (V, E_mst), using Kruskal's algorithm. The positive samples E_p are randomly selected from E − E_mst. To generate negative samples E_n, we sample an equal number of node pairs from G with no edge connecting them (lines 5–7). Then we delete all the edges in E_p from G and obtain the subgraph G_s (line 8).

Algorithm 1: Sub-graph generation
Input: G = (V, E), positive edge ratio r
Output: samples of node pairs and sub-graph G_s
1: E_mst = Kruskal(G)
2: n = |E| * r
3: Shuffle(E − E_mst)
4: E_p = (E − E_mst)[0 : n]
5: for e ∉ E and |E_n| ≤ n do
6:   append e to E_n
7: end for
8: G_s = (V, E − E_p)
9: return G_s, E_p + E_n
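A possible Python rendering of Algorithm 1 using networkx, under the assumption of an undirected, connected input graph; Kruskal's algorithm is obtained through networkx's minimum_spanning_edges.

import random
import networkx as nx

def subgraph_generation(G, r=0.5, seed=42):
    # Algorithm 1 (sketch): MST-based positive sampling and random negative sampling.
    rng = random.Random(seed)
    mst = set(frozenset(e) for e in nx.minimum_spanning_edges(G, algorithm="kruskal", data=False))
    removable = [e for e in G.edges() if frozenset(e) not in mst]   # lines 1 and 3
    rng.shuffle(removable)
    n = int(G.number_of_edges() * r)                                # line 2
    E_p = removable[:n]                                             # line 4
    E_n, nodes = [], list(G.nodes())
    while len(E_n) < n:                                             # lines 5-7
        u, v = rng.sample(nodes, 2)
        if not G.has_edge(u, v):
            E_n.append((u, v))
    G_s = G.copy()
    G_s.remove_edges_from(E_p)                                      # line 8
    return G_s, E_p, E_n

Note that for very sparse graphs the pool E − E_mst can contain fewer than n edges, in which case the slice simply returns all removable edges.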
B. Feature representation

Now we proceed to perform the feature learning task on subgraph G_s. This task consists of two core components, i.e., a node sequence sampling strategy and a language model.

1) Node sequence sampling: In terms of node sequence sampling, the most classical strategies are Breadth-First Search (BFS) and Depth-First Search (DFS) [31]. BFS starts at a specific node and explores the neighbors first before moving to the next level. Contrarily, DFS traverses the network starting at one node and explores as far as possible along each branch before backtracking. BFS and DFS represent two extreme sampling strategies with respect to the search space they explore, bringing about valuable implications for the learned representations. In fact, the neighborhood sampled by BFS can reflect the structural equivalence of the network, and the nodes sampled by DFS can reflect a macro-view of the neighborhoods, which is essential in inferring communities
TABLE II: Basic topological information of the datasets. |V| is the number of nodes in the network and |E| is the total number of links. Avg. Degree denotes the average node degree. Avg. CC represents the average clustering coefficient, which indicates the probability that the neighbors of a node are connected to each other. Diameter is the longest of all the calculated shortest paths in a network. Density is the ratio of |E| to the number of possible edges.
TABLE III: AUC results of different algorithms (each row lists the AUC of the compared algorithms on one dataset; the rightmost column corresponds to AdaSim). *For our algorithm, E^p has 100% of the observed links for the highly sparse sexual contact network, which moreover has only a small number of nodes.

Dense networks:
  C.elegans     0.7229  0.6737  0.7214  0.7176  0.7078  0.6171  0.7622  0.7700  0.7758
  PB            0.8734  0.8192  0.9020  0.8752  0.8710  0.7158  0.8765  0.8792  0.9243
  Wiki-Vote     0.8843  0.8694  0.9569  0.8855  0.8842  0.7999  0.8515  0.8675  0.9657
  Email-enron   0.8956  0.8894  0.9195  0.8912  0.8944  0.8440  0.9300  0.9301  0.9514
  Epinions      0.8131  0.8114  0.9715  0.8092  0.8129  0.9089  0.8099  0.8177  0.9746
  Slashdot      0.7012  0.7005  0.8564  0.6832  0.7009  0.7512  0.7241  0.7327  0.8756

Sparse networks:
  Sexual        0.4875  0.4875  0.4469  0.7481  0.5000  0.5031  0.7375  0.8750  0.8750*
  Roadnet       0.5171  0.5171  0.3746  0.5000  0.5171  0.5325  0.7832  0.7909  0.8449
  Power         0.6177  0.6177  0.4925  0.5000  0.6177  0.4963  0.7533  0.7579  0.7675
  Router        0.5557  0.5554  0.8919  0.5000  0.5556  0.8049  0.9074  0.9119  0.9315
  p2p-Gnutella  0.5065  0.5065  0.7911  0.5018  0.5065  0.5689  0.5379  0.5572  0.7970
B. Baseline methods and evaluation metrics

In order to validate the performance of our proposed algorithm, we compare AdaSim against the following link prediction models.

• Common Neighbors (CN). For a node u, let Γ(u) denote the set of neighbors of u. Two nodes, u and v, have a high probability of being connected if they have many common neighbors [57], [58]. The simplest way to measure this neighborhood overlap is to directly count the number of common neighbors, i.e.,
  s^CN_uv = |Γ(u) ∩ Γ(v)|.

• Resource Allocation (RA) [33]. For an unconnected node pair u and v, it is assumed that u can send some resources to v through the medium of their common neighbors. The similarity between u and v is defined as the amount of resources received by v from u, described as
  s^RA_uv = Σ_{z ∈ Γ(u)∩Γ(v)} 1/k_z,
where k_z is the degree of z.

• Preferential Attachment (PA) [59]. The preferential attachment mechanism is used to generate random scale-free networks, in which the probability of a new link connecting to u is proportional to k_u. Similarly, the probability of a new link connecting u and v is proportional to k_u × k_v. The PA similarity index is defined as
  s^PA_uv = k_u × k_v.

• Salton Index (SI) [45]. SI is also known as cosine similarity and is defined as
  s^SI_uv = |Γ(u) ∩ Γ(v)| / √(k_u × k_v).

• Clustering Coefficient for Link Prediction (CCLP) [34]. This is a similarity index that takes more local structural information into account; the local link information is conveyed by the clustering coefficient of the common neighbors:
  s^CCLP_uv = Σ_{z ∈ Γ(u)∩Γ(v)} t_z / (k_z(k_z − 1)/2),
where t_z is the number of triangles passing through node z.

• Heterogeneity Index (HEI) [10]. This method is based on the network heterogeneity and is the state of the art for sparse and treelike networks:
  s^HEI_uv = |k_u − k_v|^α,
where α is a free heterogeneity exponent.

• Node2vec [31]. This is a supervised way of doing link prediction using logistic regression. The features used in this method are generated through heuristic binary operators on node-pair features, which are learned from network embedding. There are two parameters, p and q, that control the node sequence sampling. Note that when p = q = 1, Node2vec is equal to DeepWalk [28].
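For reference, here is a sketch of a few of the neighborhood-based indices above (CN, RA, PA and CCLP) computed with networkx; the helper names are ours.

import networkx as nx

def cn(G, u, v):
    return len(set(G[u]) & set(G[v]))                                  # s^CN_uv

def ra(G, u, v):
    return sum(1.0 / G.degree(z) for z in set(G[u]) & set(G[v]))       # s^RA_uv

def pa(G, u, v):
    return G.degree(u) * G.degree(v)                                   # s^PA_uv

def cclp(G, u, v):
    # s^CCLP_uv: common neighbours weighted by their local clustering coefficient
    score = 0.0
    for z in set(G[u]) & set(G[v]):
        kz = G.degree(z)
        if kz > 1:
            score += nx.triangles(G, z) / (kz * (kz - 1) / 2.0)
    return score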
Besides Node2vec, there are other approaches to unsupervised feature learning for graphs, such as spectral clustering [60] and LINE [61]. We exclude them in this work since they have already been shown to be inferior to Node2vec [31]. We also exclude other supervised methods, such as ensemble learning [15] and support vector machines [21]; these methods can achieve relatively better performance, but at the cost of high complexity, which is not our original intention.

We adopt the area under the receiver operating characteristic curve (AUC) to quantitatively evaluate the performance of link prediction algorithms. The AUC value quantifies the probability that a randomly chosen missing link is given a higher score than a randomly chosen node pair without a link. A higher score means better performance.
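The AUC can be computed, for example, with scikit-learn by concatenating the scores of the held-out positive links and of the sampled negative pairs; a minimal sketch with hypothetical score arrays:

import numpy as np
from sklearn.metrics import roc_auc_score

# pos_scores / neg_scores: similarity scores assigned to removed (missing) links
# and to sampled unconnected pairs, respectively (hypothetical values).
pos_scores = np.array([0.91, 0.75, 0.83])
neg_scores = np.array([0.40, 0.62, 0.18])
y_true = np.concatenate([np.ones_like(pos_scores), np.zeros_like(neg_scores)])
y_score = np.concatenate([pos_scores, neg_scores])
print("AUC =", roc_auc_score(y_true, y_score))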
C. Experimental results

In order to obtain the following results, we set the parameters in line with the typical values in [31]. That is, d = 128, k = 10, l = 80, λ = 10, and the optimization is run for a single epoch. Fifty percent of the edges are removed and treated as positive examples. The negative node pairs, which have no edge connecting them, are randomly sampled from the network. The two parameters p and q in Node2vec are selected through a grid search over p, q ∈ {0.25, 0.5, 1, 2}. After the dataset is prepared, we use ten-fold cross validation to evaluate the performance.
[Figure: AUC value as a function of the penalty p on three datasets: (a) C.elegans, (b) Router, (c) Wiki-vote.]
For the sake of objectivity, the experiment is repeated ten times on each dataset and the average results are reported in Table III.

A general observation we can draw from these results is that the proposed link prediction algorithm, AdaSim, obtains better performance than all the baseline methods on all datasets. More specifically, the unsupervised similarity-

TABLE IV: Running time (s) comparisons among different algorithms on three representative sparse networks.

Dataset       CN      CCLP    HEI     Node2vec  AdaSim
Sex           0.0020  0.0015  0.0010  0.0240    0.0045
Power         0.2280  0.0475  0.0255  0.2735    1.2755
p2p-Gnutella  4.4310  0.8085  0.3205  1.6615    4.0910
We then follow the aforementioned experimental setup to report the results of the different methods. The results on the Wiki-Vote dataset are displayed in Fig. 5. Only four baseline methods are listed in the figure, since CN, CCLP and DeepWalk perform similarly to RA and Node2vec, respectively.

It can be seen from the results that the AUC values decrease as the ratio of removed edges increases, since it becomes more and more challenging to characterize node similarity using the information of the network topology. The similarity-based methods perform well when the removed-edge ratio is relatively small, but they degrade very quickly and give way to Node2vec and AdaSim, with the exception of the PA predictor, which has competitive performance but is still inferior to AdaSim. AdaSim performs consistently well and is robust to different sparsity conditions of the networks. Even when eighty percent of the edges are removed, AdaSim can still hold its performance at around 0.95 in terms of AUC. Overall, AdaSim is not only robust to different network conditions but also achieves better performance than the baselines.

VI. CONCLUSION

In this paper, we have quantitatively given evidence of the inconsistency between heuristic edge features and learned ones. Moreover, we have developed a novel link prediction framework, AdaSim, by introducing an adaptive similarity function to deal with the inflexibility of cosine similarity, especially for sparse or treelike networks. AdaSim first learns node representations of networks by solving a graph-based objective function and then adds a penalty parameter, p, to the original similarity function. Finally, the optimal value of p is learned through supervised learning. The proposed AdaSim is flexible, and thus adaptive to the data distribution, and can capture the various link formation mechanisms of different networks. We conducted experiments using publicly available real-world network datasets and extensively compared AdaSim with seven well-established representative baseline methods. The results show that AdaSim achieves better performance than state-of-the-art algorithms on all datasets. It is also robust to the sparsity of the networks and obtains competitive performance even when a large fraction of edges is missing.
[4] M. Newman, "The structure and function of complex networks," SIAM Rev., vol. 45, no. 2, pp. 167–256, Jan. 2003.
[5] M. E. J. Newman and A. Clauset, "Structure and inference in annotated networks," Nature Communications, vol. 7, p. 11863, June 2016.
[6] L. Lü, L. Pan, T. Zhou, Y.-C. Zhang, and H. E. Stanley, "Toward link predictability of complex networks," Proceedings of the National Academy of Sciences, vol. 112, no. 8, pp. 2325–2330, Feb. 2015.
[7] A. Clauset, C. Moore, and M. E. J. Newman, "Hierarchical structure and the prediction of missing links in networks," Nature, vol. 453, no. 7191, pp. 98–101, May 2008.
[8] S. Soundarajan and J. Hopcroft, "Using community information to improve the precision of link prediction methods," in Proceedings of the 21st International Conference on World Wide Web, ser. WWW '12 Companion. Lyon, France: ACM, Apr. 2012, pp. 607–608.
[9] K. ke Shang, M. Small, and W. sheng Yan, "Link direction for link prediction," Physica A: Statistical Mechanics and its Applications, vol. 469, pp. 767–776, 2017.
[10] K.-k. Shang, T.-c. Li, M. Small, D. Burton, and Y. Wang, "Link prediction for tree-like networks," Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 29, no. 6, p. 061103, 2019.
[11] X. Wu, J. Wu, Y. Li, and Q. Zhang, "Link prediction of time-evolving network based on node ranking," Knowledge-Based Systems, vol. 195, p. 105740, 2020.
[12] L. Wang, J. Ren, B. Xu, J. Li, W. Luo, and F. Xia, "Model: Motif-based deep feature learning for link prediction," IEEE Transactions on Computational Social Systems, vol. 7, no. 2, pp. 503–516, 2020.
[13] A. Rossi, D. Barbosa, D. Firmani, A. Matinata, and P. Merialdo, "Knowledge graph embedding for link prediction: A comparative analysis," ACM Transactions on Knowledge Discovery from Data, vol. 15, no. 2, pp. 1–49, Jan. 2021.
[14] L. Cai, J. Li, J. Wang, and S. Ji, "Line graph neural networks for link prediction," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1, 2021.
[15] R. N. Lichtenwalter, J. T. Lussier, and N. V. Chawla, "New perspectives and methods in link prediction," in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '10. New York, NY, USA: ACM, 2010, pp. 243–252.
[16] C. Zhang, H. Zhang, D. Yuan, and M. Zhang, "Citywide cellular traffic prediction based on densely connected convolutional neural networks," IEEE Communications Letters, vol. 22, no. 8, pp. 1656–1659, 2018.
[17] C. Zhang, H. Zhang, J. Qiao, D. Yuan, and M. Zhang, "Deep transfer learning for intelligent cellular traffic prediction based on cross-domain big data," IEEE Journal on Selected Areas in Communications, vol. 37, no. 6, pp. 1389–1401, 2019.
[18] W. Shen, H. Zhang, S. Guo, and C. Zhang, "Time-wise attention aided convolutional neural network for data-driven cellular traffic prediction," IEEE Wireless Communications Letters, vol. 10, no. 8, pp. 1747–1751, 2021.
[19] C. Zhang, S. Dang, B. Shihada, and M.-S. Alouini, "On telecommunication service imbalance and infrastructure resource deployment," IEEE Wireless Communications Letters, vol. 10, no. 10, pp. 2125–2129, 2021.
[20] ——, "Dual attention-based federated learning for wireless traffic prediction," in IEEE INFOCOM 2021 - IEEE Conference on Computer Communications, 2021, pp. 1–10.
[21] M. Al Hasan, V. Chaoji, S. Salem, and M. Zaki, "Link prediction using supervised learning," in SDM'06: Workshop on Link Analysis, Counter-terrorism and Security, 2006.
[22] L. Backstrom and J. Leskovec, "Supervised random walks: Predicting and recommending links in social networks," in Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, ser. WSDM '11. New York, NY, USA: ACM, 2011, pp. 635–644.
[23] D. Davis, R. N. Lichtenwalter, and N. V. Chawla, "Supervised methods for multi-relational link prediction," Social Network Analysis and Mining, vol. 3, no. 2, pp. 127–141, June 2013.
[24] A. De, S. Bhattacharya, S. Sarkar, N. Ganguly, and S. Chakrabarti, "Discriminative link prediction using local, community, and global signals," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 8, pp. 2057–2070, Aug. 2016.
[25] X. Chang, F. Nie, S. Wang, Y. Yang, X. Zhou, and C. Zhang, "Compound rank-k projections for bilinear analysis," IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 7, pp. 1502–1513, 2016.
[26] H. Wang, Z. Li, Y. Li, B. Gupta, and C. Choi, "Visual saliency guided complex image retrieval," Pattern Recognition Letters, vol. 130, pp. 64–72, 2020.
[27] D. Yuan, X. Chang, P.-Y. Huang, Q. Liu, and Z. He, "Self-supervised deep correlation tracking," IEEE Transactions on Image Processing, vol. 30, pp. 976–985, 2021.
[28] B. Perozzi, R. Al-Rfou, and S. Skiena, "Deepwalk: Online learning of social representations," in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '14. New York, NY, USA: ACM, Aug. 2014, pp. 701–710.
[29] A. Zhiyuli, X. Liang, and X. Zhou, "Learning structural features of nodes in large-scale networks for link prediction," in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, ser. AAAI'16. Phoenix, Arizona, USA: AAAI Press, Feb. 2016, pp. 4286–4287.
[30] C. Zhang, H. Zhang, D. Yuan, and M. Zhang, "Deep learning based link prediction with social pattern and external attribute knowledge in bibliographic networks," in 2016 IEEE International Conference on Smart Data (SmartData), 2016, pp. 815–821.
[31] A. Grover and J. Leskovec, "Node2vec: Scalable feature learning for networks," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '16. San Francisco, California, USA: ACM, Aug. 2016, pp. 855–864.
[32] D. Liben-Nowell and J. Kleinberg, "The link-prediction problem for social networks," J. Am. Soc. Inf. Sci. Technol., vol. 58, no. 7, pp. 1019–1031, May 2007.
[33] T. Zhou, L. Lü, and Y.-C. Zhang, "Predicting missing links via local information," The European Physical Journal B, vol. 71, no. 4, pp. 623–630, 2009.
[34] Z. Wu, Y. Lin, J. Wang, and S. Gregory, "Link prediction with node clustering coefficient," Physica A: Statistical Mechanics and its Applications, vol. 452, pp. 1–8, June 2016.
[35] Y. Sun, R. Barber, M. Gupta, C. C. Aggarwal, and J. Han, "Co-author relationship prediction in heterogeneous bibliographic networks," in Proceedings of the 2011 International Conference on Advances in Social Networks Analysis and Mining, ser. ASONAM'11. Washington, DC, USA: IEEE Computer Society, July 2011, pp. 121–128.
[36] Y. Dong, J. Tang, S. Wu, J. Tian, N. V. Chawla, J. Rao, and H. Cao, "Link prediction and recommendation across heterogeneous social networks," in Proceedings of the 12th IEEE International Conference on Data Mining, ser. ICDM '12. Washington, DC, USA: IEEE Computer Society, Dec. 2012, pp. 181–190.
[37] Y. Dong, J. Zhang, J. Tang, N. V. Chawla, and B. Wang, "Coupledlp: Link prediction in coupled networks," in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '15. Sydney, NSW, Australia: ACM, Aug. 2015, pp. 199–208.
[38] J. Zhang and P. S. Yu, "Integrated anchor and social link predictions across partially aligned social networks," in Proceedings of the 24th International Joint Conference on Artificial Intelligence, ser. IJCAI '15. Buenos Aires, Argentina: AAAI Press, July 2015, pp. 2125–2131.
[39] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, Aug. 2013.
[40] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, no. 5500, pp. 2323–2326, Dec. 2000.
[41] M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Computation, vol. 15, no. 6, pp. 1373–1396, June 2003.
[42] D. Wang, P. Cui, and W. Zhu, "Structural deep network embedding," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '16. San Francisco, California, USA: ACM, Aug. 2016, pp. 1225–1234.
[43] X. Wang, P. Cui, J. Wang, J. Pei, W. Zhu, and S. Yang, "Community preserving network embedding," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, ser. AAAI'17. North America: AAAI Press, Feb. 2017, pp. 203–209.
[44] C. Tu, H. Liu, Z. Liu, and M. Sun, "Cane: Context-aware network embedding for relation modeling," in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ser. ACL. Vancouver, Canada: ACL, July 2017, pp. 1–10.
[45] L. Lü and T. Zhou, "Link prediction in complex networks: A survey," Physica A: Statistical Mechanics and its Applications, vol. 390, no. 6, pp. 1150–1170, Dec. 2011.
[46] F. Gao, K. Musial, C. Cooper, and S. Tsoka, "Link prediction methods and their accuracy for different social networks and network metrics," Scientific Programming, vol. 2015, pp. 1–13, Jan. 2015.
[47] F. Fouss, A. Pirotte, J.-M. Renders, and M. Saerens, "Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 355–369, Mar. 2007.
[48] A. W. Yu, N. Mamoulis, and H. Su, “Reverse top-k search using random
walk with restart,” Proceedings of the VLDB Endowment, vol. 7, no. 5,
pp. 401–412, Jan. 2014.
[49] H.-H. Chen and C. L. Giles, “Ascos++: An asymmetric similarity
measure for weighted networks to address the problem of simrank,”
ACM Transactions on Knowledge Discovery from Data, vol. 10, no. 2,
pp. 15:1–15:26, Oct. 2015.
[50] C. Cooper, T. Radzik, and Y. Siantos, Fast Low-Cost Estimation of
Network Properties Using Random Walks, ser. WAW’13. Cambridge,
MA, USA: Springer International Publishing, Dec. 2013, vol. 8305, pp.
130–143.
[51] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient Estimation of
Word Representations in Vector Space,” ArXiv e-prints, Jan. 2013.
[52] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, no. 6684, pp. 440–442, 1998.
[53] L. A. Adamic and N. Glance, “The political blogosphere and the 2004
u.s. election: Divided they blog,” in Proceedings of the 3rd International
Workshop on Link Discovery, ser. LinkKDD ’05. New York, NY, USA:
ACM, 2005, pp. 36–43.
[54] J. Leskovec and A. Krevl, “SNAP Datasets: Stanford large network
dataset collection,” http://snap.stanford.edu/data, June 2014.
[55] P. Bearman, J. Moody, and K. Stovel, “Chains of affection: The structure
of adolescent romantic and sexual networks,” American Journal of
Sociology, vol. 110, no. 1, pp. 44–91, 2004.
[56] N. Spring, R. Mahajan, D. Wetherall, and T. Anderson, "Measuring ISP topologies with Rocketfuel," IEEE/ACM Transactions on Networking, vol. 12, no. 1, pp. 2–16, Feb. 2004.
[57] G. Kossinets, “Effects of missing data in social networks,” Social
Networks, vol. 28, no. 3, pp. 247 – 268, July 2006.
[58] M. E. J. Newman, “Clustering and preferential attachment in growing
networks,” Physical review E, vol. 64, p. 025102, July 2001.
[59] A.-L. Barabási and R. Albert, “Emergence of scaling in random net-
works,” Science, vol. 286, no. 5439, pp. 509–512, Oct. 1999.
[60] L. Tang and H. Liu, “Leveraging social media networks for classifi-
cation,” Data Min. Knowl. Discov., vol. 23, no. 3, pp. 447–478, Nov.
2011.
[61] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, “Line:
Large-scale information network embedding,” in Proceedings of the 24th
International Conference on World Wide Web, May 2015, pp. 1067–
1077.