Zhang Et Al. - 2021 - Adaptive Similarity Function With Structural Featu
1 Computer, Electrical and Mathematical Science Engineering Division, King Abdullah University of Science and
Technology (KAUST), Thuwal 23955, Saudi Arabia
2 Computational Communication Collaboratory, Nanjing University, Nanjing 210093, China
3 School of Information Science and Engineering, Shandong Normal University, Jinan 250022, China
arXiv:2111.07027v1 [cs.SI] 13 Nov 2021
Abstract—Link prediction is a fundamental problem of data science, which usually calls for unfolding the mechanisms that govern the micro-dynamics of networks. In this regard, using features obtained from network embedding for predicting links has drawn widespread attention. Though edge-feature-based and node-similarity-based methods have been proposed to solve the link prediction problem, many technical challenges still exist due to the unique structural properties of networks, especially when the networks are sparse. From the graph mining perspective, we first give empirical evidence of the inconsistency between heuristic and learned edge features. Then we propose a novel link prediction framework, AdaSim, by introducing an Adaptive Similarity function using features obtained from network embedding based on random walks. The node feature representations are obtained by optimizing a graph-based objective function. Instead of generating edge features using binary operators, we perform link prediction solely leveraging the node features of the network. We define a flexible similarity function with one tunable parameter, which serves as a penalty on the original similarity measure. The optimal value is learned through supervised learning and is thus adaptive to the data distribution. To evaluate the performance of our proposed algorithm, we conduct extensive experiments on eleven disparate real-world networks. Experimental results show that AdaSim achieves better performance than state-of-the-art algorithms and is robust to different sparsities of the networks.

I. INTRODUCTION

In many real-world networks, whether two nodes have a link must be determined experimentally, which is very costly. As a result, the known links may represent fewer than 1% of the actual links [8]. Besides, in social networks like Facebook, only part of the friendships among users are shown by the observed network; there still exist user pairs who already know each other but are not connected through Facebook. Due to this, it is always a challenging yet meaningful task to identify which pairs of nodes not connected in the current network are likely to be connected in the actual network, i.e., to predict missing links. Acquiring such knowledge is useful: in the biological domain, it gives invaluable guidance for carrying out targeted experiments, and in the social network domain, it can be used to recommend promising friendships, thus enhancing users' loyalty to web services.

The ways to solve the link prediction problem [9]–[14] can be roughly divided into two categories, i.e., unsupervised methods and supervised methods. Current research on unsupervised link prediction mainly focuses on defining a similarity metric s_uv for unconnected node pairs (u, v) using information extracted from the network topology. The defined metrics represent different kinds of proximity between a pair of nodes, perform differently across various networks, and none dominates the others.
Moreover, handcrafted features cost much human labor; besides, they often rely on domain knowledge, which restricts generalization across different fields.

An alternative is to learn the features of the network automatically. By treating networks as special documents consisting of a series of node sequences, the node features can be learned by solving an optimization problem [28]. After obtaining the features of the nodes, the link prediction task is traditionally conducted using two approaches. The first is the similarity-based ranking method [29]; for example, cosine similarity is used to measure the similarity of pairs of nodes, and for two unconnected nodes, the larger the similarity value, the higher their connection probability. The other is the edge-feature-based classification method [30], [31]. In this method, edge features are generated by heuristic binary operators such as the Hadamard operator and the Average operator. Then a classifier is trained using these features and is used to distinguish whether a link will form between two unconnected nodes.

As the features learned through network embedding preserve the network's local structure, the cosine similarity works
well for strongly assortative networks but fails to capture
the disassortativity of the network, i.e., nodes prefer to build
connections on large scales rather than on small scales [7]. Thus, using cosine similarity for link prediction suffers from statistical performance drawbacks. Besides, the edge features obtained through binary operators will potentially lose node information, since the features of nodes are learned by solving an optimization problem but the edge features are not (see Fig. 1 for an illustration; the details are discussed in Section III-C). Furthermore, the edge and node features have the same dimensionality, which is usually on the scale of several hundred. This means that even for linear models such as logistic regression, hundreds of parameters still need to be learned, which raises the question of feasibility, especially when the data size is large. How to design a simple, general yet efficient link prediction method using the node features directly learned from network embedding thus remains an open problem.

To solve the above-mentioned issues, we propose a novel link prediction method, AdaSim (Adaptive Similarity function), for large-scale networks. The node feature representations are obtained by optimizing a graph-based objective function using stochastic gradient descent techniques. Instead of generating edge features using heuristic binary operators, we perform link prediction solely leveraging the node features of the network. Our essential contribution lies in defining a flexible node similarity function with only one tunable parameter, which serves as a penalty on the original similarity. The optimal value can be obtained through supervised learning and is thus adaptive to the data distribution, which gives AdaSim the ability to capture the various link formation mechanisms of different networks. Compared with the original cosine similarity, the proposed method generalizes well across various network datasets.

In summary, our main contributions are listed as follows.
• We propose AdaSim, a novel link prediction method, by introducing an adaptive similarity function using features learned from network embedding.
• We show that AdaSim is flexible with only one tunable parameter, which is adjustable with respect to the network property. This flexibility endows AdaSim with the power of capturing the link formation mechanisms of different networks.
• We demonstrate the effectiveness of AdaSim by conducting experiments on various disparate real-world networks. The results show that the proposed method can boost the performance of link prediction to different degrees. Besides, we find that AdaSim works particularly well for highly sparse networks.

The rest of the paper is structured as follows. Section II reviews research related to link prediction. The problem definition of link prediction and the feature learning framework are described in Section III, where some empirical findings on the datasets are also given. Section IV illustrates the proposed link prediction method, AdaSim, with detailed explanations of each component. The experimental results and analysis are presented in Section V. Finally, Section VI concludes the paper.

II. RELATED WORK

Early works on link prediction mainly focus on exploring topological information derived from graphs. Liben-Nowell and Kleinberg [32] studied several topological features such as common neighbors, Adamic-Adar, PageRank and Katz, and found that topological information is beneficial in predicting links compared with a random predictor. Subsequently, more topology-based predictors were proposed for link prediction, e.g., resource allocation [33], community-enhanced predictors [8] and clustering-coefficient-based link prediction [34].

Hasan et al. were the first to model link prediction as a binary classification problem from a machine learning perspective [21]. Various similarity metrics between node pairs are extracted from the network and treated as features in a supervised learning setup; a classifier is then built with these features as inputs to distinguish positive samples (links that form) from negative samples (links that do not form). Thereafter, the supervised classification approach became prevalent in the link prediction domain. Lichtenwalter et al. proposed a high-performance link prediction framework called HPLP; some new perspectives on link prediction, e.g., the generality of an algorithm, topological causes and sampling approaches, were included in [15]. Later, a supervised random-walk-based algorithm was proposed by Backstrom and Leskovec [22] to effectively incorporate the information from the network structure with rich node and edge attribute data.

In addition, the link prediction problem has also been extended to heterogeneous information networks [35]–[38]. Among these works, a core concept based on the network schema was proposed, namely the meta-path. Multiple information sources can be effectively fused into a single path, and different meta-paths have different physical meanings. Some similarity measures can be calculated using meta-paths and then treated as features of a classifier to discriminate positive and negative links.
All the works mentioned above on supervised link prediction use handcrafted features, which require expensive human labor and often rely on domain knowledge. To alleviate this, one can use latent features learned automatically through representation learning [39]. For networks, unsupervised feature learning methods typically use the spectral properties of various matrix representations of graphs, such as the adjacency and Laplacian matrices. From the perspective of linear algebra, this kind of method can be regarded as a dimensionality reduction technique. Several works [40], [41] aim to acquire node features of graphs in this way, but the eigendecomposition of a matrix is costly to compute, which makes these methods impractical to scale up to large networks.

Perozzi et al. [28] extended the skip-gram model to graphs and proposed a framework, DeepWalk, which represents a network as a special "document" consisting of a series of node sequences generated by random walks. DeepWalk can learn features for the nodes in the network, and the representation learning process is independent of downstream tasks such as node classification and link prediction. Later, Node2vec was proposed by Grover and Leskovec [31]. Compared with DeepWalk, Node2vec uses a biased random walk to control the sampling space of node sequences. Network properties such as homophily and structural equivalence can be captured by Node2vec, and link prediction was performed using edge features obtained by applying heuristic binary operators to node features. In [42], the authors proposed a deep model called SDNE to capture the highly non-linear property of networks; the first-order and second-order proximities were jointly exploited to capture the local and global network structure, respectively. More recently, Wang et al. [43] proposed a novel Modularized Nonnegative Matrix Factorization (M-NMF) model to incorporate not only the local and global network structure but also community information into network embedding. To model the diverse roles that nodes play when interacting with other nodes, Tu et al. [44] presented a Context-Aware Network Embedding (CANE) method by introducing a mutual attention mechanism; CANE can model the semantic relationships between nodes more precisely. To save computation time, in [29] link prediction was directly carried out using the cosine similarity of node features instead of edge features.

The above works mainly focus on network embedding techniques and ignore the typical characteristics of link formation. The main difference between existing work and our efforts lies in the fact that we consider an adaptive similarity function with a learning-based idea, making our model flexible enough to capture the various link formation patterns of different networks. For example, a negative value of p can weaken the role of 'structural equivalence' and enhance the score of dissimilar node pairs, thus capturing disassortativity in link formation.

III. PROBLEM STATEMENT AND FEATURE LEARNING FRAMEWORK

In this section, we first give the formal definition of the link prediction problem. Then the feature learning framework for networks is presented. Finally, we introduce the empirical findings on several network datasets when using node features for link prediction.

A. Problem Formulation

Given a network G = (V, E), where V is the set of nodes and E ⊆ V × V is the set of links, no multiple links or self-links are allowed between any two nodes. It is assumed that some of the links in the network are unobserved or missing at the present stage. The link prediction task aims to predict the likelihood of a link between two unconnected nodes using information intrinsic to the network.

Since we consider a supervised approach to link prediction, we first need to construct a labeled dataset D = {(x_i, y_i), i ∈ [1, m]}, where x_i is the feature vector of the i-th sample and y_i ∈ {0, 1} is the corresponding label. Specifically, x_i = Φ(u_i, v_i), in which u_i and v_i denote the features of nodes u_i and v_i, respectively. The node features are learned through network representation learning, and Φ(·) is a mapping function from node features to node-pair features. For any node pair in D, y_i = 1 indicates that this node pair belongs to the positive samples; otherwise it belongs to the negative samples. Positive samples are edges, E^p, chosen randomly from the network G. We delete E^p from G and keep the obtained sub-network (G_s) fully connected. To generate negative samples, we sample an equal number of node pairs from G that have no edge connecting them. The dataset D is split into two parts: a training dataset D_T and a test dataset D_P. A classification model M is learned with dataset D_T, and this model is then used for predicting whether a pair of nodes in dataset D_P should have a link connecting them.

Our algorithms are typical methods from the field of graph mining. Hence, in contrast to part of our previous papers [9], [10], we follow the conventions of the field of artificial intelligence, in which E^p (positive samples) contains 50% of the observed links, and scores are based on the combination of E^p and the same number of non-observed links (negative samples). For the highly sparse sexual contact network, which has only a small number of nodes, E^p instead comprises all observed links.

B. Feature learning of network embedding

For a given network G = (V, E), a mapping function f : V → R^{|V|×d} from nodes to feature vectors can be learned for link prediction. Here d is a user-specified parameter that denotes the number of dimensions of the feature vectors, and f is a matrix of |V| × d parameters. The mapping function f is learned from a series of document-like node sequences, using optimization techniques that originated in language modeling.

The purpose of language modeling is to estimate the likelihood of a sentence appearing in a document. The model is built using a corpus C. More formally, it aims to maximize

Pr(w | context(w))    (1)

over the whole training corpus, where w is a word of the vocabulary and context(w) is the context of w, which includes the words that appear both to the left of w and to its right. Recent research on representation learning has put a lot of attention on leveraging probabilistic neural networks to build a general representation of words, extending the scope of language modeling beyond its original goals. Each word is represented by a continuous and low-dimensional feature vector. The problem, then, is to maximize

Pr(w | f(w_{i-j}), ..., f(w_{i-1}), f(w_{i+1}), ..., f(w_{i+j})),    (2)

where f(·) denotes the latent representation of a word.

TABLE I: Choice of binary operators for learning edge features.

Operator      Definition
Hadamard      [f(u) ⊙ f(v)]_i = f_i(u) * f_i(v)
Average       [f(u) ⊕ f(v)]_i = (f_i(u) + f_i(v)) / 2
Division      [f(u) ⊘ f(v)]_i = f_i(u) / f_i(v)
Weighted-L1   ||f(u) · f(v)||_1i = |f_i(u) − f_i(v)|
Weighted-L2   ||f(u) · f(v)||_2i = |f_i(u) − f_i(v)|^2
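To make the operators in Table I concrete, here is a minimal NumPy sketch of the five binary operators applied to two learned node vectors; the function names and example vectors are illustrative additions rather than part of the original paper, and Weighted-L2 is written as the squared element-wise difference.

import numpy as np

def hadamard(fu, fv):
    # [f(u) (*) f(v)]_i = f_i(u) * f_i(v)
    return fu * fv

def average(fu, fv):
    # [f(u) (+) f(v)]_i = (f_i(u) + f_i(v)) / 2
    return (fu + fv) / 2.0

def division(fu, fv, eps=1e-12):
    # [f(u) (/) f(v)]_i = f_i(u) / f_i(v); eps guards against division by zero
    return fu / (fv + eps)

def weighted_l1(fu, fv):
    # ||f(u) . f(v)||_1i = |f_i(u) - f_i(v)|
    return np.abs(fu - fv)

def weighted_l2(fu, fv):
    # ||f(u) . f(v)||_2i = |f_i(u) - f_i(v)|^2
    return np.abs(fu - fv) ** 2

rng = np.random.default_rng(0)
fu, fv = rng.normal(size=128), rng.normal(size=128)   # d = 128, as in the experiments
for op in (hadamard, average, division, weighted_l1, weighted_l2):
    print(op.__name__, op(fu, fv)[:3])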
The social representations of networks can be learned analogously through a series of node sequences generated by a specific sampling strategy S. Similar to the context of a word w in language modeling, N_S(u) is defined to be the neighborhood of node u under sampling strategy S. The node representations of the network can be obtained by optimizing the following expression:

Pr(u | f(u_{i-j}), ..., f(u_{i-1}), f(u_{i+1}), ..., f(u_{i+j})).    (3)

The learned representations can capture the similarities in local graph structure shared among nodes in the network: nodes that have similar neighborhoods will acquire similar representations.
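As an illustration of how such node representations can be obtained in practice, the following sketch generates uniform random walks on the toy graph of Fig. 1a and trains a skip-gram model on them, in the spirit of DeepWalk [28]. It assumes the networkx and gensim libraries; the walk length, walks per node, window size and dimensionality echo the values reported in Section V (l = 80, λ = 10, k = 10, d = 128), but this is only a sketch, not the authors' implementation.

import random
import networkx as nx
from gensim.models import Word2Vec

def random_walks(G, walks_per_node=10, walk_length=80, seed=42):
    # Generate uniform random-walk node sequences (the document-like "sentences").
    rng = random.Random(seed)
    walks, nodes = [], list(G.nodes())
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = list(G.neighbors(walk[-1]))
                if not nbrs:
                    break
                walk.append(rng.choice(nbrs))
            walks.append([str(n) for n in walk])
    return walks

G = nx.krackhardt_kite_graph()        # the toy network of Fig. 1a
walks = random_walks(G)
# Skip-gram (sg=1) maximizes Pr(u | context of u), in the spirit of Eq. (3).
model = Word2Vec(walks, vector_size=128, window=10, min_count=0, sg=1, workers=4, epochs=1)
u_vec = model.wv["0"]                 # learned representation of node 0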
Fig. 1: (a) A toy network. (b) Pearson correlation coefficients between learned edge features and heuristic edge features. The node sequence sampling strategy used in this example is the random walk.

C. Empirical findings on several network datasets

After learning the representations of the nodes in the network, there are two approaches to the link prediction task, i.e., the node-similarity-based method and the edge-feature-based method. The former is simple and scalable, the latter complex yet powerful. But both methods have their limitations in effectively characterizing the link formation patterns of node pairs. Since the node-similarity-based method is not involved in learning, it cannot be aware of the effects of global network properties in link prediction. The edge-feature-based method cannot describe the node-pair relationship very well at the feature level using a heuristic binary operator, as information is lost in the mapping from node features to edge features. We show empirical evidence for the limitations of these two kinds of methods in the following subsections.

1) Limitation of heuristic binary operators: Fig. 1a shows a toy network (the Krackhardt kite graph) with 10 nodes and 18 edges. Each node and edge is marked with a unique label. Given a specific sampling strategy S, we can obtain the node sequences and the corresponding edge sequences simultaneously after performing S on the network. Hence, both node representations and edge representations of the network can be learned using optimization techniques. For a specific pair of nodes, the learned edge representation is called the "true" features and the edge representation generated by a binary operator is called the "heuristic" features. If a binary operator is good enough, it should be able to accurately characterize the relationship of pairs of nodes, i.e., the correlation between heuristic features and true features should be as strong as possible. For the 18 edges in the toy network, five different kinds of heuristic binary operators [31] (see Table I) are chosen to generate edge features¹, and their correlations with the true edge features are displayed in Fig. 1b.

¹ For the Division operator, we omit the variant f_i(v)/f_i(u) since it gives very similar results to f_i(u)/f_i(v).

On the basis of the evidence from Fig. 1, we can tell that different operators give different results in representing the features of pairs of nodes and that none dominates the others. Some of the edges, e.g., edges 10 and 16, can be well characterized by the Hadamard operator, while others, for example, edges 12, 14 and 17, are better characterized by the Average operator. Furthermore, most values are less than 0.5, which means a weak correlation between the heuristic edge features and the true edge features. This verifies our claim that edge features obtained through heuristic binary operators may cause a loss of information from the node features.
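In principle, the correlations of Fig. 1b can be reproduced by comparing, for every edge, the learned ("true") vector with the vector produced by a binary operator. A minimal sketch, assuming both sets of edge vectors are already available as dictionaries keyed by edge label (scipy provides the Pearson coefficient):

from scipy.stats import pearsonr

def heuristic_vs_true(true_edge_vecs, heuristic_edge_vecs):
    # Pearson correlation, per edge, between learned and operator-generated features.
    corr = {}
    for e, true_vec in true_edge_vecs.items():
        r, _ = pearsonr(true_vec, heuristic_edge_vecs[e])
        corr[e] = r
    return corr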
2) Limitation of the similarity-based method: Given an unconnected node pair (u, v), several metrics can be used to measure their similarity, for example, the number of common neighbors between u and v or the number of reachable paths from u to v. But here we only consider cosine similarity, since we have the node pair's feature vectors u and v. The cosine similarity is used to characterize the link formation probability and is defined as

cos(u, v) = u^T v / (||u|| ||v||),    (4)

where (·)^T denotes the transpose and ||·|| the l2-norm of a vector. The cosine similarity measures the cosine of the angle between two d-dimensional vectors obtained from network representation learning. In fact, the idea of cosine similarity has been used for link prediction in several works [29], [45], [46]. But there are a few issues when directly using cosine similarity for link prediction. The first is that it does not consider the label information of node pairs, so it belongs to the category of unsupervised learning, whereas many works have demonstrated that supervised learning approaches to link prediction can enhance performance [15], [23], [24]. The other is that cosine similarity is too rigid to capture the different link formation mechanisms of different networks.

In the phase of representation learning for networks, it is assumed that two nodes have similar representations if they have similar contexts in the node sequences sampled by strategy S. For networks, this indicates that if two nodes are structurally close² to each other, then they have a high probability of occurring simultaneously in the same sequence, which results in a high value in terms of cosine similarity. But in real-world networks, whether two nodes will form a link is not simply determined by this kind of structural closeness. Two nodes far from each other in the network may also have a high chance of building a relationship if they are structurally equivalent [31]. For two nodes, "the closer the graph distance, the easier it is for them to build a link" does not necessarily hold, especially when the network is sparse and disassortative.

² For three nodes (v_1, v_2, v_3) in the graph, suppose the geodesic distance of (v_1, v_2) is 2 and that of (v_1, v_3) is 5; we say v_2 is closer to v_1 than v_3.

Fig. 2: Link formation patterns among different networks. The x axis represents the geodesic distance s between a pair of nodes (u, v) and the y axis represents the link formation probability. The probabilities are calculated by |E_s^p|/|E^p|, in which |E^p| is the number of positive links that are randomly sampled from the network, and |E_s^p| is the number of those pairs of nodes whose geodesic distance is exactly s.

As shown in Fig. 2, different networks have different patterns in building new connections³. These patterns are closely related to network properties such as the clustering coefficient, graph density and assortativity. In some networks with high assortativity, two unconnected nodes tend to be connected if they are structurally close, while in others this is not the case. More specifically, the link formation probability for two unconnected nodes decreases sharply with increasing geodesic distance in the C.elegans dataset, and 97.8% of new links span a geodesic distance of less than 3. But for the Gnutella dataset, the link formation probability first increases and then decreases with increasing geodesic distance, and most of the new links (62.4%) are generated by node pairs with a distance equal to 5 or 6. For the Router dataset, the new links span a wide range of geodesic distances from 2 to 34, and almost half of the new links (48.67%) span a distance larger than 5; the distribution of link formation probabilities is more complex than in the other two datasets.

³ The datasets are described in Section V-A.

The cosine similarity function assigns higher scores to pairs of nodes that are close to each other and vice versa. It can capture the link formation pattern in the case of Fig. 2a, i.e., the shorter the distance between two unconnected nodes, the higher the probability of their being connected. But cosine similarity fails to capture the patterns in the cases of Fig. 2b and Fig. 2c, especially Fig. 2b, in which the link formation pattern follows a Gaussian distribution. For the pattern of Fig. 2b, nodes prefer to build connections with those that are relatively farther from them. When performing a link prediction task in cases like this, pairs of nodes with a relatively longer distance should be more similar than those with a shorter one.
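The probabilities plotted in Fig. 2 follow directly from the definition |E_s^p|/|E^p| in the caption. A minimal sketch, assuming networkx, an observed graph G_obs from which the positive links have already been removed, and the list pos_links of those removed links (pairs whose endpoints are disconnected in G_obs are skipped, so the denominator here is the number of reachable positive pairs):

from collections import Counter
import networkx as nx

def link_prob_by_distance(G_obs, pos_links):
    # Estimate |E_s^p| / |E^p| for every geodesic distance s, as in Fig. 2.
    dist_counts, usable = Counter(), 0
    for u, v in pos_links:
        try:
            s = nx.shortest_path_length(G_obs, u, v)
        except nx.NetworkXNoPath:
            continue
        dist_counts[s] += 1
        usable += 1
    return {s: c / usable for s, c in sorted(dist_counts.items())}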
Fig. 3: The whole framework of link prediction based on adaptive similarity function.
Thus, we need to design a flexible similarity function for link prediction that captures the various patterns of link formation. Besides, the similarity function should be devised on the basis of concision and scalability. This can be achieved by adjusting the similarity of node pairs and balancing link formation probabilities among different distances. Inspired by this, we propose a modified similarity function, defined as

sim(u, v) = u^T v / (||u|| ||v||) + p / (||u|| ||v||),    (5)

where p is a balance factor to control the similarity of node pairs with different geodesic distances. As we have the labels of node pairs, the optimal value of p can be learned in a supervised way.
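Below is a small sketch of Eq. (5) together with one possible way to learn the penalty p from labeled node pairs: a grid search that maximizes training AUC, with a search range matching the p values explored in the sensitivity analysis. This is an illustrative variant under those assumptions, not the exact optimization procedure of the paper.

import numpy as np
from sklearn.metrics import roc_auc_score

def adasim_score(u, v, p):
    # Eq. (5): sim(u, v) = (u^T v) / (||u|| ||v||) + p / (||u|| ||v||)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return (u @ v + p) / denom

def fit_penalty(pairs, labels, emb, grid=np.linspace(-50, 50, 201)):
    # Pick the p that maximizes AUC on the training pairs (hypothetical grid-search variant).
    best_p, best_auc = 0.0, -1.0
    for p in grid:
        scores = [adasim_score(emb[u], emb[v], p) for u, v in pairs]
        auc = roc_auc_score(labels, scores)
        if auc > best_auc:
            best_p, best_auc = p, auc
    return best_p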
IV. THE PROPOSED FRAMEWORK

In this work, we propose a novel link prediction framework, AdaSim, based on an adaptive similarity function using the features learned from network representation. The whole framework is illustrated in Fig. 3. It can be divided into three parts: subgraph generation, feature representation and similarity function learning. First, the positive and negative node pair indexes are obtained through random sampling, and the corresponding subgraph G_s is generated via edge removal. Then we learn the representations of the nodes in the network in an unsupervised way. Finally, a similarity function is defined and its optimal parameter is determined through supervised learning. The obtained similarity function with the optimal penalty can be directly used to solve the link prediction problem.

A. Subgraph generation

Unlike other tasks such as link clustering or node classification, in which the complete structural information is available, for link prediction a certain fraction of the links needs to be removed before performing network representation learning. To achieve this, one can iteratively select one link and determine whether it is removable or not. But this operation is inefficient and very time consuming, especially when the network is very sparse, since it needs to traverse almost all the nodes in the graph.

Instead, we propose a fast positive sampling method based on the minimum spanning tree (MST). An MST is a subset of the edges of the original graph G that connects all the nodes together. This means that all the edges are removable except those that belong to the MST, and their deletion will not break the connectivity of G. Lines 1–4 in Algorithm 1 show the core of our approach. We first generate an MST of G, denoted as G_mst = (V, E_mst), using Kruskal's algorithm. The positive samples E_p are randomly selected from E − E_mst. To generate negative samples E_n, we sample an equal number of node pairs from G with no edge connecting them (lines 5–7). Then we delete all the edges in E_p from G and obtain the subgraph G_s (line 8).

Algorithm 1: Sub-graph generation
Input: G = (V, E), positive edge ratio r
Output: samples of node pairs and sub-graph G_s
1: E_mst = Kruskal(G)
2: n = |E| * r
3: Shuffle(E − E_mst)
4: E_p = (E − E_mst)[0 : n]
5: for e ∉ E and |E_n| ≤ n do
6:   append e to E_n
7: end for
8: G_s = (V, E − E_p)
9: return G_s, E_p + E_n
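A possible Python rendering of Algorithm 1 using networkx, under the assumption of an undirected, connected input graph; Kruskal's algorithm is obtained through networkx's minimum_spanning_edges.

import random
import networkx as nx

def subgraph_generation(G, r=0.5, seed=42):
    # Algorithm 1 (sketch): MST-based positive sampling and random negative sampling.
    rng = random.Random(seed)
    mst = set(frozenset(e) for e in nx.minimum_spanning_edges(G, algorithm="kruskal", data=False))
    removable = [e for e in G.edges() if frozenset(e) not in mst]   # lines 1 and 3
    rng.shuffle(removable)
    n = int(G.number_of_edges() * r)                                # line 2
    E_p = removable[:n]                                             # line 4
    E_n, nodes = [], list(G.nodes())
    while len(E_n) < n:                                             # lines 5-7
        u, v = rng.sample(nodes, 2)
        if not G.has_edge(u, v):
            E_n.append((u, v))
    G_s = G.copy()
    G_s.remove_edges_from(E_p)                                      # line 8
    return G_s, E_p, E_n

Note that for very sparse graphs the pool E − E_mst can contain fewer than n edges, in which case the slice simply returns all removable edges.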
B. Feature representation

Now we proceed to perform the feature learning task on subgraph G_s. This task consists of two core components, i.e., a node sequence sampling strategy and a language model.

1) Node sequence sampling: In terms of node sequence sampling, the most classical strategies are Breadth-First Search (BFS) and Depth-First Search (DFS) [31]. BFS starts at a specific node and explores the neighbors first before moving to the next level. Contrarily, DFS traverses the network starting at one node and explores as far as possible along each branch before backtracking. BFS and DFS represent two extreme sampling strategies with respect to the search space they explore, bringing about valuable implications for the learned representations. In fact, the neighborhood sampled by BFS can reflect the structural equivalence of the network, and the nodes sampled by DFS can reflect a macro-view of the neighborhoods, which is essential in inferring communities
TABLE II: Basic topological information of the datasets. |V| is the number of nodes in the network and |E| is the total number of links. Avg. Degree denotes the average node degree. Avg. CC represents the average clustering coefficient, which indicates the probability that the neighbors of a node are connected to each other. Diameter is the longest of all the calculated shortest paths in a network. Density is the ratio of |E| to the number of possible edges.
TABLE III: AUC results of different algorithms (each row lists the AUC of the compared algorithms on one dataset; the rightmost column corresponds to AdaSim). *For our algorithm, E^p has 100% of the observed links for the highly sparse sexual contact network, which moreover has only a small number of nodes.

Dense networks:
  C.elegans     0.7229  0.6737  0.7214  0.7176  0.7078  0.6171  0.7622  0.7700  0.7758
  PB            0.8734  0.8192  0.9020  0.8752  0.8710  0.7158  0.8765  0.8792  0.9243
  Wiki-Vote     0.8843  0.8694  0.9569  0.8855  0.8842  0.7999  0.8515  0.8675  0.9657
  Email-enron   0.8956  0.8894  0.9195  0.8912  0.8944  0.8440  0.9300  0.9301  0.9514
  Epinions      0.8131  0.8114  0.9715  0.8092  0.8129  0.9089  0.8099  0.8177  0.9746
  Slashdot      0.7012  0.7005  0.8564  0.6832  0.7009  0.7512  0.7241  0.7327  0.8756

Sparse networks:
  Sexual        0.4875  0.4875  0.4469  0.7481  0.5000  0.5031  0.7375  0.8750  0.8750*
  Roadnet       0.5171  0.5171  0.3746  0.5000  0.5171  0.5325  0.7832  0.7909  0.8449
  Power         0.6177  0.6177  0.4925  0.5000  0.6177  0.4963  0.7533  0.7579  0.7675
  Router        0.5557  0.5554  0.8919  0.5000  0.5556  0.8049  0.9074  0.9119  0.9315
  p2p-Gnutella  0.5065  0.5065  0.7911  0.5018  0.5065  0.5689  0.5379  0.5572  0.7970
B. Baseline methods and evaluation metrics

In order to validate the performance of our proposed algorithm, we compare AdaSim against the following link prediction models.

• Common Neighbors (CN). For a node u, let Γ(u) denote the set of neighbors of u. Two nodes, u and v, have a high probability of being connected if they have many common neighbors [57], [58]. The simplest way to measure this neighborhood overlap is to directly count the number of common neighbors, i.e.,
  s^CN_uv = |Γ(u) ∩ Γ(v)|.

• Resource Allocation (RA) [33]. For an unconnected node pair u and v, it is assumed that u can send some resources to v through the medium of their common neighbors. The similarity between u and v is defined as the amount of resources received by v from u, described as
  s^RA_uv = Σ_{z ∈ Γ(u)∩Γ(v)} 1/k_z,
where k_z is the degree of z.

• Preferential Attachment (PA) [59]. The preferential attachment mechanism is used to generate random scale-free networks, in which the probability of a new link connecting to u is proportional to k_u. Similarly, the probability of a new link connecting u and v is proportional to k_u × k_v. The PA similarity index is defined as
  s^PA_uv = k_u × k_v.

• Salton Index (SI) [45]. SI is also known as cosine similarity and is defined as
  s^SI_uv = |Γ(u) ∩ Γ(v)| / √(k_u × k_v).

• Clustering Coefficient for Link Prediction (CCLP) [34]. This is a similarity index that takes more local structural information into account; the local link information is conveyed by the clustering coefficient of the common neighbors:
  s^CCLP_uv = Σ_{z ∈ Γ(u)∩Γ(v)} t_z / (k_z(k_z − 1)/2),
where t_z is the number of triangles passing through node z.

• Heterogeneity Index (HEI) [10]. This method is based on the network heterogeneity and is the state of the art for sparse and treelike networks:
  s^HEI_uv = |k_u − k_v|^α,
where α is a free heterogeneity exponent.

• Node2vec [31]. This is a supervised way of doing link prediction using logistic regression. The features used in this method are generated through heuristic binary operators on node-pair features, which are learned from network embedding. There are two parameters, p and q, that control the node sequence sampling. Note that when p = q = 1, Node2vec is equal to DeepWalk [28].
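For reference, here is a sketch of a few of the neighborhood-based indices above (CN, RA, PA and CCLP) computed with networkx; the helper names are ours.

import networkx as nx

def cn(G, u, v):
    return len(set(G[u]) & set(G[v]))                                  # s^CN_uv

def ra(G, u, v):
    return sum(1.0 / G.degree(z) for z in set(G[u]) & set(G[v]))       # s^RA_uv

def pa(G, u, v):
    return G.degree(u) * G.degree(v)                                   # s^PA_uv

def cclp(G, u, v):
    # s^CCLP_uv: common neighbours weighted by their local clustering coefficient
    score = 0.0
    for z in set(G[u]) & set(G[v]):
        kz = G.degree(z)
        if kz > 1:
            score += nx.triangles(G, z) / (kz * (kz - 1) / 2.0)
    return score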
Besides Node2vec, there are other approaches to unsupervised feature learning for graphs, such as spectral clustering [60] and LINE [61]. We exclude them in this work since they have already been shown to be inferior to Node2vec [31]. We also exclude other supervised methods, such as ensemble learning [15] and support vector machines [21]; these methods can achieve relatively better performance, but at the cost of high complexity, which is not our original intention.

We adopt the area under the receiver operating characteristic curve (AUC) to quantitatively evaluate the performance of link prediction algorithms. The AUC value quantifies the probability that a randomly chosen missing link is given a higher score than a randomly chosen node pair without a link. A higher score means better performance.
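The AUC can be computed, for example, with scikit-learn by concatenating the scores of the held-out positive links and of the sampled negative pairs; a minimal sketch with hypothetical score arrays:

import numpy as np
from sklearn.metrics import roc_auc_score

# pos_scores / neg_scores: similarity scores assigned to removed (missing) links
# and to sampled unconnected pairs, respectively (hypothetical values).
pos_scores = np.array([0.91, 0.75, 0.83])
neg_scores = np.array([0.40, 0.62, 0.18])
y_true = np.concatenate([np.ones_like(pos_scores), np.zeros_like(neg_scores)])
y_score = np.concatenate([pos_scores, neg_scores])
print("AUC =", roc_auc_score(y_true, y_score))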
C. Experimental results

In order to obtain the following results, we set the parameters in line with the typical values in [31]. That is, d = 128, k = 10, l = 80, λ = 10, and the optimization is run for a single epoch. Fifty percent of the edges are removed and treated as positive examples. The negative node pairs, which have no edge connecting them, are randomly sampled from the network. The two parameters p and q in Node2vec are selected through a grid search over p, q ∈ {0.25, 0.5, 1, 2}. After the dataset is prepared, we use ten-fold cross validation to evaluate the performance.
[Figure: AUC value as a function of the penalty p on three datasets: (a) C.elegans, (b) Router, (c) Wiki-vote.]
For the sake of objectivity, the experiment is repeated ten times on each dataset and the average results are reported in Table III.

A general observation we can draw from these results is that the proposed link prediction algorithm, AdaSim, obtains better performance than all the baseline methods on all datasets. More specifically, the unsupervised similarity-

TABLE IV: Running time (s) comparisons among different algorithms on three representative sparse networks.

Dataset       CN      CCLP    HEI     Node2vec  AdaSim
Sex           0.0020  0.0015  0.0010  0.0240    0.0045
Power         0.2280  0.0475  0.0255  0.2735    1.2755
p2p-Gnutella  4.4310  0.8085  0.3205  1.6615    4.0910
We then follow the aforementioned experimental setup to report the results of the different methods. The results on the Wiki-Vote dataset are displayed in Fig. 5. Only four baseline methods are listed in the figure, since CN, CCLP and DeepWalk perform similarly to RA and Node2vec, respectively.

It can be seen from the results that the AUC values decrease as the ratio of removed edges increases, since it becomes more and more challenging to characterize node similarity using the information of the network topology. The similarity-based methods perform well when the removed-edge ratio is relatively small, but they degrade very quickly and give way to Node2vec and AdaSim, with the exception of the PA predictor, which has competitive performance but is still inferior to AdaSim. AdaSim performs consistently well and is robust to different sparsity conditions of the networks. Even when eighty percent of the edges are removed, AdaSim can still hold its performance at around 0.95 in terms of AUC. Overall, AdaSim is not only robust to different network conditions but also achieves better performance than the baselines.

VI. CONCLUSION

In this paper, we have quantitatively given evidence of the inconsistency between heuristic edge features and learned ones. Moreover, we have developed a novel link prediction framework, AdaSim, by introducing an adaptive similarity function to deal with the inflexibility of cosine similarity, especially for sparse or treelike networks. AdaSim first learns node representations of networks by solving a graph-based objective function and then adds a penalty parameter, p, to the original similarity function. Finally, the optimal value of p is learned through supervised learning. The proposed AdaSim is flexible, and thus adaptive to the data distribution, and can capture the various link formation mechanisms of different networks. We conducted experiments using publicly available real-world network datasets and extensively compared AdaSim with seven well-established representative baseline methods. The results show that AdaSim achieves better performance than state-of-the-art algorithms on all datasets. It is also robust to the sparsity of the networks and obtains competitive performance even when a large fraction of edges is missing.
[4] M. Newman, "The structure and function of complex networks," SIAM Rev., vol. 45, no. 2, pp. 167–256, Jan. 2003.
[5] M. E. J. Newman and A. Clauset, "Structure and inference in annotated networks," Nature Communications, vol. 7, p. 11863, June 2016.
[6] L. Lü, L. Pan, T. Zhou, Y.-C. Zhang, and H. E. Stanley, "Toward link predictability of complex networks," Proceedings of the National Academy of Sciences, vol. 112, no. 8, pp. 2325–2330, Feb. 2015.
[7] A. Clauset, C. Moore, and M. E. J. Newman, "Hierarchical structure and the prediction of missing links in networks," Nature, vol. 453, no. 7191, pp. 98–101, May 2008.
[8] S. Soundarajan and J. Hopcroft, "Using community information to improve the precision of link prediction methods," in Proceedings of the 21st International Conference on World Wide Web, ser. WWW '12 Companion. Lyon, France: ACM, Apr. 2012, pp. 607–608.
[9] K. ke Shang, M. Small, and W. sheng Yan, "Link direction for link prediction," Physica A: Statistical Mechanics and its Applications, vol. 469, pp. 767–776, 2017.
[10] K.-k. Shang, T.-c. Li, M. Small, D. Burton, and Y. Wang, "Link prediction for tree-like networks," Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 29, no. 6, p. 061103, 2019.
[11] X. Wu, J. Wu, Y. Li, and Q. Zhang, "Link prediction of time-evolving network based on node ranking," Knowledge-Based Systems, vol. 195, p. 105740, 2020.
[12] L. Wang, J. Ren, B. Xu, J. Li, W. Luo, and F. Xia, "Model: Motif-based deep feature learning for link prediction," IEEE Transactions on Computational Social Systems, vol. 7, no. 2, pp. 503–516, 2020.
[13] A. Rossi, D. Barbosa, D. Firmani, A. Matinata, and P. Merialdo, "Knowledge graph embedding for link prediction: A comparative analysis," ACM Transactions on Knowledge Discovery from Data, vol. 15, no. 2, pp. 1–49, Jan. 2021.
[14] L. Cai, J. Li, J. Wang, and S. Ji, "Line graph neural networks for link prediction," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1, 2021.
[15] R. N. Lichtenwalter, J. T. Lussier, and N. V. Chawla, "New perspectives and methods in link prediction," in Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '10. New York, NY, USA: ACM, 2010, pp. 243–252.
[16] C. Zhang, H. Zhang, D. Yuan, and M. Zhang, "Citywide cellular traffic prediction based on densely connected convolutional neural networks," IEEE Communications Letters, vol. 22, no. 8, pp. 1656–1659, 2018.
[17] C. Zhang, H. Zhang, J. Qiao, D. Yuan, and M. Zhang, "Deep transfer learning for intelligent cellular traffic prediction based on cross-domain big data," IEEE Journal on Selected Areas in Communications, vol. 37, no. 6, pp. 1389–1401, 2019.
[18] W. Shen, H. Zhang, S. Guo, and C. Zhang, "Time-wise attention aided convolutional neural network for data-driven cellular traffic prediction," IEEE Wireless Communications Letters, vol. 10, no. 8, pp. 1747–1751, 2021.
[19] C. Zhang, S. Dang, B. Shihada, and M.-S. Alouini, "On telecommunication service imbalance and infrastructure resource deployment," IEEE Wireless Communications Letters, vol. 10, no. 10, pp. 2125–2129, 2021.
[20] ——, "Dual attention-based federated learning for wireless traffic prediction," in IEEE INFOCOM 2021 - IEEE Conference on Computer Communications, 2021, pp. 1–10.
[21] M. Al Hasan, V. Chaoji, S. Salem, and M. Zaki, "Link prediction using supervised learning," in SDM'06: Workshop on Link Analysis, Counter-terrorism and Security, 2006.
[22] L. Backstrom and J. Leskovec, "Supervised random walks: Predicting and recommending links in social networks," in Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, ser. WSDM '11. New York, NY, USA: ACM, 2011, pp. 635–644.
[23] D. Davis, R. N. Lichtenwalter, and N. V. Chawla, "Supervised methods for multi-relational link prediction," Social Network Analysis and Mining, vol. 3, no. 2, pp. 127–141, June 2013.
[24] A. De, S. Bhattacharya, S. Sarkar, N. Ganguly, and S. Chakrabarti, "Discriminative link prediction using local, community, and global signals," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 8, pp. 2057–2070, Aug. 2016.
[25] X. Chang, F. Nie, S. Wang, Y. Yang, X. Zhou, and C. Zhang, "Compound rank-k projections for bilinear analysis," IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 7, pp. 1502–1513, 2016.
[26] H. Wang, Z. Li, Y. Li, B. Gupta, and C. Choi, "Visual saliency guided complex image retrieval," Pattern Recognition Letters, vol. 130, pp. 64–72, 2020.
[27] D. Yuan, X. Chang, P.-Y. Huang, Q. Liu, and Z. He, "Self-supervised deep correlation tracking," IEEE Transactions on Image Processing, vol. 30, pp. 976–985, 2021.
[28] B. Perozzi, R. Al-Rfou, and S. Skiena, "Deepwalk: Online learning of social representations," in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '14. New York, NY, USA: ACM, Aug. 2014, pp. 701–710.
[29] A. Zhiyuli, X. Liang, and X. Zhou, "Learning structural features of nodes in large-scale networks for link prediction," in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, ser. AAAI'16. Phoenix, Arizona, USA: AAAI Press, Feb. 2016, pp. 4286–4287.
[30] C. Zhang, H. Zhang, D. Yuan, and M. Zhang, "Deep learning based link prediction with social pattern and external attribute knowledge in bibliographic networks," in 2016 IEEE International Conference on Smart Data (SmartData), 2016, pp. 815–821.
[31] A. Grover and J. Leskovec, "Node2vec: Scalable feature learning for networks," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '16. San Francisco, California, USA: ACM, Aug. 2016, pp. 855–864.
[32] D. Liben-Nowell and J. Kleinberg, "The link-prediction problem for social networks," J. Am. Soc. Inf. Sci. Technol., vol. 58, no. 7, pp. 1019–1031, May 2007.
[33] T. Zhou, L. Lü, and Y.-C. Zhang, "Predicting missing links via local information," The European Physical Journal B, vol. 71, no. 4, pp. 623–630, 2009.
[34] Z. Wu, Y. Lin, J. Wang, and S. Gregory, "Link prediction with node clustering coefficient," Physica A: Statistical Mechanics and its Applications, vol. 452, pp. 1–8, June 2016.
[35] Y. Sun, R. Barber, M. Gupta, C. C. Aggarwal, and J. Han, "Co-author relationship prediction in heterogeneous bibliographic networks," in Proceedings of the 2011 International Conference on Advances in Social Networks Analysis and Mining, ser. ASONAM'11. Washington, DC, USA: IEEE Computer Society, July 2011, pp. 121–128.
[36] Y. Dong, J. Tang, S. Wu, J. Tian, N. V. Chawla, J. Rao, and H. Cao, "Link prediction and recommendation across heterogeneous social networks," in Proceedings of the 12th IEEE International Conference on Data Mining, ser. ICDM '12. Washington, DC, USA: IEEE Computer Society, Dec. 2012, pp. 181–190.
[37] Y. Dong, J. Zhang, J. Tang, N. V. Chawla, and B. Wang, "Coupledlp: Link prediction in coupled networks," in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '15. Sydney, NSW, Australia: ACM, Aug. 2015, pp. 199–208.
[38] J. Zhang and P. S. Yu, "Integrated anchor and social link predictions across partially aligned social networks," in Proceedings of the 24th International Joint Conference on Artificial Intelligence, ser. IJCAI '15. Buenos Aires, Argentina: AAAI Press, July 2015, pp. 2125–2131.
[39] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: A review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, Aug. 2013.
[40] S. T. Roweis and L. K. Saul, "Nonlinear dimensionality reduction by locally linear embedding," Science, vol. 290, no. 5500, pp. 2323–2326, Dec. 2000.
[41] M. Belkin and P. Niyogi, "Laplacian eigenmaps for dimensionality reduction and data representation," Neural Computation, vol. 15, no. 6, pp. 1373–1396, June 2003.
[42] D. Wang, P. Cui, and W. Zhu, "Structural deep network embedding," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '16. San Francisco, California, USA: ACM, Aug. 2016, pp. 1225–1234.
[43] X. Wang, P. Cui, J. Wang, J. Pei, W. Zhu, and S. Yang, "Community preserving network embedding," in Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, ser. AAAI'17. North America: AAAI Press, Feb. 2017, pp. 203–209.
[44] C. Tu, H. Liu, Z. Liu, and M. Sun, "Cane: Context-aware network embedding for relation modeling," in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ser. ACL. Vancouver, Canada: ACL, July 2017, pp. 1–10.
[45] L. Lü and T. Zhou, "Link prediction in complex networks: A survey," Physica A: Statistical Mechanics and its Applications, vol. 390, no. 6, pp. 1150–1170, Dec. 2011.
[46] F. Gao, K. Musial, C. Cooper, and S. Tsoka, "Link prediction methods and their accuracy for different social networks and network metrics," Scientific Programming, vol. 2015, pp. 1–13, Jan. 2015.
[47] F. Fouss, A. Pirotte, J.-M. Renders, and M. Saerens, "Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation," IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 355–369, Mar. 2007.
[48] A. W. Yu, N. Mamoulis, and H. Su, “Reverse top-k search using random
walk with restart,” Proceedings of the VLDB Endowment, vol. 7, no. 5,
pp. 401–412, Jan. 2014.
[49] H.-H. Chen and C. L. Giles, “Ascos++: An asymmetric similarity
measure for weighted networks to address the problem of simrank,”
ACM Transactions on Knowledge Discovery from Data, vol. 10, no. 2,
pp. 15:1–15:26, Oct. 2015.
[50] C. Cooper, T. Radzik, and Y. Siantos, Fast Low-Cost Estimation of
Network Properties Using Random Walks, ser. WAW’13. Cambridge,
MA, USA: Springer International Publishing, Dec. 2013, vol. 8305, pp.
130–143.
[51] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient Estimation of
Word Representations in Vector Space,” ArXiv e-prints, Jan. 2013.
[52] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, no. 6684, pp. 440–442, 1998.
[53] L. A. Adamic and N. Glance, “The political blogosphere and the 2004
u.s. election: Divided they blog,” in Proceedings of the 3rd International
Workshop on Link Discovery, ser. LinkKDD ’05. New York, NY, USA:
ACM, 2005, pp. 36–43.
[54] J. Leskovec and A. Krevl, “SNAP Datasets: Stanford large network
dataset collection,” http://snap.stanford.edu/data, June 2014.
[55] P. Bearman, J. Moody, and K. Stovel, “Chains of affection: The structure
of adolescent romantic and sexual networks,” American Journal of
Sociology, vol. 110, no. 1, pp. 44–91, 2004.
[56] N. Spring, R. Mahajan, D. Wetherall, and T. Anderson, "Measuring ISP topologies with Rocketfuel," IEEE/ACM Transactions on Networking, vol. 12, no. 1, pp. 2–16, Feb. 2004.
[57] G. Kossinets, “Effects of missing data in social networks,” Social
Networks, vol. 28, no. 3, pp. 247 – 268, July 2006.
[58] M. E. J. Newman, “Clustering and preferential attachment in growing
networks,” Physical review E, vol. 64, p. 025102, July 2001.
[59] A.-L. Barabási and R. Albert, “Emergence of scaling in random net-
works,” Science, vol. 286, no. 5439, pp. 509–512, Oct. 1999.
[60] L. Tang and H. Liu, “Leveraging social media networks for classifi-
cation,” Data Min. Knowl. Discov., vol. 23, no. 3, pp. 447–478, Nov.
2011.
[61] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, “Line:
Large-scale information network embedding,” in Proceedings of the 24th
International Conference on World Wide Web, May 2015, pp. 1067–
1077.