Download as pdf or txt
Download as pdf or txt
You are on page 1of 24

Link prediction

Leonid E. Zhukov

School of Data Analysis and Artificial Intelligence


Department of Computer Science
National Research University Higher School of Economics

Structural Analysis and Visualization of Networks

Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 1 / 24


Lecture outline

1 Link prediction problem

2 Proximity measures

3 Prediction by supervised learning

4 Other methods

Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 2 / 24


Link prediction

Link prediction. A network is changing over time. Given a snapshot


of a network at time t, predict edges added in the interval (t, t 0 )
Link completion (missing links identification). Given a network, infer
links that are consistent with the structure, but missing (find
unobserved edges)
Link reliability. Estimate the reliability of given links in the graph.

Predictions: link existence, link weight, link type

Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 3 / 24


Link prediction

Graph G(V,E)
Number of ”missing edges”: |V |(|V | − 1)/2 − |E |
In sparse graphs |E |  |V |2 , Prob. of correct random guess O( |V1|2 )

Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 4 / 24


Scoring algorithm

Link prediction by proximity scoring


1 For each pair of nodes compute proximity (similarity) score c(v1 , v2 )
2 Sort all pairs by the decreasing score
3 Select top n pairs (or above some threshold) as new links

Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 5 / 24


Scoring functions

Local neighborhood of vi and vj


Number of common neighbors:

|N (vi ) ∩ N (vj )|

Jaccard’s coefficient:
|N (vi ) ∩ N (vj )|
|N (vi ) ∪ N (vj )|
Adamic/Adar:
X 1
log |N (v )|
v ∈N (vi )∩N (vj )

Liben-Nowell and Kleinberg, 2003

Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 6 / 24


Scoring functions

Paths and ensembles of paths between vi and vj


Shortest path:
− min{pathijs > 0}
s

Katz score:

X ∞
X
l (l)
β |paths (vi , vj )| = (βA)lij = (I − βA)−1 − I
l=1 l=1

Personalized (rooted) PageRank:

PR = α(D −1 A)T PR + (1 − α)|

Liben-Nowell and Kleinberg, 2003

Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 7 / 24


Scoring functions

Expected number of random walk steps:


– hitting time: −Hij
– commute time −(Hij + Hji )
– normalized hitting/commute time (Hij πj + Hji πi )

SimRank:
C X X
SimRank(vi , vj ) = SimRank(m, n)
|N (vi )| · |N (vj )|
m∈N (vi ) n∈N (vj )

Liben-Nowell and Kleinberg, 2003

Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 8 / 24


Vertex feature aggregations

Preferential attachment:

ki · kj = |N (vi )| · |N (vj )|

or
ki + kj = |N (vi )| + |N (vj )|

Clustering coefficient:
CC (vi ) · CC (vj )
or
CC (vi ) + CC (vj )

Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 9 / 24


Low-rank approximations

Low-rank approximation (truncated SVD)


X
A≈ Uk Sk VkT
k

Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 10 / 24


Link prediction

Liben-Nowell and Kleinberg, 2007


Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 11 / 24
Link prediction

Liben-Nowell and Kleinberg, 2007

Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 12 / 24


Link prediction

Liben-Nowell and Kleinberg, 2007

Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 13 / 24


Binary classificaton

Challenging classification problem:


Computational cost of evaluating of very large number of possible
edges (quadratic in number of nodes)
Highly imbalanced class distribution: number of positive examples
(existing edges) grows linearly and negative quadratically with
number on nodes

Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 14 / 24


Prediction difficulty

Actual and possible collaborations between DBLP authors

from Rattigan and Jensen, 2005

Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 15 / 24


Link prediction with supervised learning
Supervised learning:
1 Features generation

2 Model training

3 Testing (model application)

Features:
Topological proximity features
Aggregated features
Content based node proximity features

Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 16 / 24


Features

Discriminative abilities of features

from Rattigan and Jensen, 2005

Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 17 / 24


Evaluation

Simple ”hold out set” evaluation

Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 18 / 24


Evaluation

Evaluation for evloving networks

image from Y. Yang et.al, 2014

Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 19 / 24


Evaluation metrics

Precision and Recall, F-measure


TP TP
Precision = , Recall =
TP + FP TP + FN
2 · Precision · Recall
F =
Precision + Recall

True positive rate (TPR), False positive rate (FPR), ROC curve, AUC

TP FP
TPR = , FPR =
TP + FN FP + TN

Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 20 / 24


Performance of classification algorithms

BIOBASE database (research publications)

DBLP dataset (research publications)

from M. Al Hasan, 2010


Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 21 / 24
ROC curves

from Lichtenwalter, 2010

Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 22 / 24


Probabilistic models

Local model, Markov random fields [Wang, 2007]


Hierarchical probabilistic model [Clauset, 2008]
Probabilistic relations models:
– Bayesian networks [Getoor, 2002]
– relational Markov networks [Tasker, 2003, 2007]

Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 23 / 24


References

D. Liben-Nowell and J. Kleinberg. The link prediction problem for


social networks. Journal of the American Society for Information
Science and Technology, 58(7):1019?1031, 2007
R. Lichtenwalter, J.Lussier, and N. Chawla. New perspectives and
methods in link prediction. KDD 10: Proceedings of the 16th ACM
SIGKDD, 2010
M. Al Hasan, V. Chaoji, S. Salem, M. Zaki, Link prediction using
supervised learning. Proceedings of SDM workshop on link analysis,
2006
M. Rattigan, D. Jensen. The case for anomalous link discovery. ACM
SIGKDD Explorations Newsletter. v 7, n 2, pp 41-47, 2005
M. Al. Hasan, M. Zaki. A survey of link prediction in social networks.
In Social Networks Data Analytics, Eds C. Aggarwal, 2011.

Leonid E. Zhukov (HSE) Lecture 18 26.05.2015 24 / 24

You might also like