
Complejidad y Redes

Centrality measures

Complejidad y Redes.
Universidad Politécnica de Madrid

Designed by starline / Freepik


Slides based on:

Network centrality: an introduction


by Francisco A. Rodrigues https://arxiv.org/pdf/1901.07901.pdf

MA5Q3 Topics in Complexity Science


slides by Francisco A. Rodrigues
Lecture 3 http://conteudo.icmc.usp.br/pessoas/francisco/networks/lecture3.pdf

Nuevas Tecnologías y Empresa


slides by J.I. Santos
Chapter 2 and 3 https://sites.google.com/site/meetnachosantos/

Overview

Centrality measures

Degree centrality
k-core centrality
Closeness centrality
Betweenness centrality
KATZ centrality
Page Rank centrality

How to describe node characteristics?
Picture from https://www.inverse.com/article/27435-stranger-things-season-2-super-bowl-trailer-demogorgon-villain

Until now we have seen:


- degree
- distances to other nodes
- clustering

Demogorgon: a very well-connected node, should have high values of centrality.
How can we measure importance, influence, power?

--> This is what centrality measures are made for.

Which kind of centrality measures can we calculate? Ideas?
Who is most important in a network?

Who is the most connected in the network?


Degree centrality
k-core centrality

Who is closest to everyone else?


Closeness centrality

Who links distant groups of the network?


Betweenness centrality

Who is connected to the best connected?


Eigenvector centrality
Katz centrality
Pagerank centrality

® Translated slide from NNTT y Empresa, by J.I. Santos
Degree Centrality
Picture from https://www.telltalesonline.com/26925/popular-celebs/

Hypothesis:

Individuals who have more links have more influence, more prestige, more access to
information, are more popular, ... than those who have fewer.

Ex: "Celebrities"

® Translated slide from NNTT y Empresa, by J.I. Santos
Degree Centrality: measure of connectedness

Remember: adjacency matrix Aij, where element (i, j) represents a link from j to i.

Directed network
How popular is an individual:
How many people know an individual:
Sometimes it is normalized by dividing by (N-1).

Undirected network
adjacency matrix Aij ((i, j) = (j, i)):

Undirected weighted network
adjacency matrix Wij:

Examples: #followers (Twitter); #friends (Facebook); #citations (scientific papers)
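
The degree formulas for the three cases can be written out as below. This is a sketch that assumes the standard definitions together with the convention stated on this slide (Aij is a link from j to i); the symbol names k_i^in, k_i^out, k_i and s_i are chosen here for illustration and are not taken from the slide.

```latex
\begin{align}
k_i^{\mathrm{in}}  &= \sum_j A_{ij} && \text{(directed: links pointing to } i\text{)} \\
k_i^{\mathrm{out}} &= \sum_j A_{ji} && \text{(directed: links leaving } i\text{)} \\
k_i &= \sum_j A_{ij} = \sum_j A_{ji} && \text{(undirected)} \\
s_i &= \sum_j W_{ij} && \text{(undirected weighted)}
\end{align}
```

Under the optional normalization, each quantity is divided by N-1, the largest degree a node can have in a simple network of N nodes.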


® Translated slide from NNTT y Empresa, by J.I. Santos
Example

Degree centrality (Gephi)

In Gephi, instead of the normalised degree centrality, we have just the degree.
In Python we can calculate the normalized degree centrality
with the NetworkX function degree_centrality
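
As an illustration of the normalization k/(N-1), here is a minimal pure-Python sketch. It is a hand-written stand-in, not the NetworkX implementation. The edge list is an assumption: it is inferred from the shortest-path list shown in the betweenness example of these slides.

```python
# Edge list of the 8-node example network (assumption: inferred from the
# shortest-path list in the betweenness example, not given explicitly).
EDGES = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3),
         (3, 4), (5, 6), (6, 7), (6, 8), (7, 8)]


def degree_centrality(edges):
    """Return {node: degree / (N - 1)} for an undirected edge list."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    n = len(degree)
    return {node: k / (n - 1) for node, k in degree.items()}


centrality = degree_centrality(EDGES)
for node in sorted(centrality):
    print(node, round(centrality[node], 4))
```

Multiplying each value by N-1 recovers the plain degree, which is what Gephi reports.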

® Translated slide from NNTT y Empresa, by J.I. Santos
Degree Centrality
Picture from Network centrality: an introduction, Francisco Aparecido Rodrigues
https://arxiv.org/pdf/1901.07901.pdf

Limitations

The degree centrality of a node depends exclusively on the number of links it has,
regardless of the importance of the nodes at the other end of those links.

E.g.: would my Twitter account be equally important if it were followed by an "unknown" Twitter
account as if it were followed by Mark Newman’s Twitter account?

It is a local measure; it does not depend on the rest of the network.

It can happen that the nodes with the highest degree
are at the periphery; they are not central, and we may
also be interested in highlighting centrality in terms of position.

® Translated slide from NNTT y Empresa, by J.I. Santos
K-Core Centrality

It allows us to identify peripheral hubs. The most central nodes have the highest values of
k-core centrality.

A k-core of a graph G is a maximal connected subgraph of G in which
all vertices have degree at least k.

A node i has coreness kc(i) = k if it belongs to the k-core
but not to the (k+1)-core.

® Slide from MA5Q3 Topics in Complexity Science by F. A. Rodrigues
K-Core Centrality

“This centrality measure is obtained by the k-shell decomposition, which partitions the network by iteratively
removing all nodes whose degree is smaller than k. After removing these nodes, the network is re-analyzed to verify
whether there are nodes with less than k connections. If such nodes are present, then they are also removed.”
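[Figure: two k-shell decomposition sequences. Panel captions: "Original network", "Remove nodes whose degree < 2", "Again remove nodes whose degree < 2"; and "Original network", "Remove nodes whose degree < 3", "Again remove nodes whose degree < 3". Annotation: "The remaining nodes have k-core=1".]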
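
The iterative removal described in the quote can be sketched as follows. This is a minimal pure-Python illustration, not Gephi's or NetworkX's implementation. The edge list is an assumption, inferred from the shortest-path list in the betweenness example of these slides. The sketch removes nodes of degree <= k to form shell k, which is equivalent to removing nodes of degree < k+1.

```python
# Assumed edge list of the 8-node example network (inferred, see lead-in).
EDGES = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3),
         (3, 4), (5, 6), (6, 7), (6, 8), (7, 8)]


def coreness(edges):
    """k-shell decomposition: return {node: core number}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    core = {}
    k = 0
    while adj:
        # Nodes that cannot belong to the (k+1)-core.
        low = [n for n, nbrs in adj.items() if len(nbrs) <= k]
        if not low:
            k += 1  # every remaining node has degree > k
            continue
        for n in low:
            core[n] = k
            # Removing n lowers the degree of its remaining neighbours,
            # so the network is re-analyzed on the next pass.
            for m in adj.pop(n):
                adj[m].discard(n)
    return core


core_numbers = coreness(EDGES)
print(sorted(core_numbers.items()))
```

On the assumed example network every node ends up with coreness 2, which illustrates why many nodes can share the same k-core value.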

® Text from Network centrality: an introduction by Francisco A. Rodrigues
Example

k-core centrality

[Figure: example network with nodes labelled 1 to 8]

Kci  Label
2    1
2    2
2    3
2    4
2    6
2    7
2    8

In Gephi you cannot compute the k-core centrality, but you can filter the graph according to k-cores.

K-Core Centrality

Limitations

The limitation of this measure lies in the fact that many nodes may be assigned to the
same k-core number.

[Figure: example networks with labelled nodes. Right picture from Network centrality: an introduction, Francisco Aparecido Rodrigues, https://arxiv.org/pdf/1901.07901.pdf]

Closeness Centrality: measure of proximity

The centrality of a node can also be seen from the perspective of proximity to the other
nodes.

Hypothesis:
The nodes closest to all the others have better access to information from other nodes
and / or can transmit their opinion more quickly to others.

Given the mean geodesic distance (shortest path length) di from node i to all other nodes,
its "closeness" Ci is defined as the inverse, Ci = 1/di: higher values of closeness indicate higher
centrality.
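
A minimal sketch of this definition, using breadth-first search to obtain the geodesic distances. It assumes a connected, unweighted, undirected network; the edge list is an assumption, inferred from the shortest-path list in the betweenness example of these slides. The function names are illustrative.

```python
from collections import deque

# Assumed edge list of the 8-node example network (inferred, see lead-in).
EDGES = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3),
         (3, 4), (5, 6), (6, 7), (6, 8), (7, 8)]


def bfs_distances(adj, source):
    """Geodesic distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(edges):
    """Return {node: 1 / mean distance to the other reachable nodes}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    result = {}
    for node in adj:
        dist = bfs_distances(adj, node)
        total = sum(dist.values())
        result[node] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


closeness_values = closeness(EDGES)
for node in sorted(closeness_values):
    print(node, round(closeness_values[node], 6))
```

For node 1 the distances to the other seven nodes sum to 12, so C1 = 7/12, which rounds to 0.583333.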

® Translated slide from NNTT y Empresa, by J.I. Santos
Example

Closeness centrality (Gephi)

Ci        Label
0.583333  1
0.583333  5
0.5       6
0.4375    3
0.411765  2
0.411765  4
0.368421  7
0.368421  8

® Translated slide from NNTT y Empresa, by J.I. Santos
Closeness Centrality

Limitations:

It is based only on the shortest distances and, therefore, in small-diameter networks the
range of variation is too narrow. The ratio between the largest and smallest distances is of
order log(N), since the minimal distance is equal to one.

It is very sensitive to changes in the network.

It is undefined when the network has several components, because the distance between
nodes in different components is infinite.
In disconnected networks, Gephi computes the closeness centrality within each
component. NetworkX computes the closeness centrality within the
component and scales it by the relative size of the component, (n-1)/(N-1)
(n: number of nodes in the component; N: number of nodes in the network).
https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.closeness_centrality.html

® Translated slide from NNTT y Empresa, by J.I. Santos
Betweenness Centrality: measure of load

“If we consider the flow of particles on a network, then we can define centrality in terms of the load.
It is natural to think that the most central node receives the largest number of particles in a defined
time interval. Assuming that these particles move following the shortest distances, the load in a node
i is given by the total number of shortest paths passing through i. However, since we can have more
than one shortest path between a pair of nodes a and b, it is more suitable to define the load in node
i as the fraction of shortest paths connecting each pair of nodes (a,b) that includes i.”

So the betweenness centrality is the total load of i:

$B_i = \sum_{(a,b)} \frac{\eta(a,i,b)}{\eta(a,b)}$

where η(a,i,b) is the number of shortest paths connecting vertices a and b that pass through node i
and η(a,b) is the total number of shortest paths between a and b.

In other words: it serves to identify "bridge" nodes between separate groups, that is,
"bottlenecks".

® Based on Network centrality: an introduction by F.A. Rodrigues
Betweenness Centrality

Nodes with high betweenness have a great power of intermediation insofar as they can
influence a greater number of messages.

They are "bridges" between remote groups. If they are eliminated, the network can fragment
into isolated groups (some algorithms for identifying communities are built on
this property).
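
The shortest-path betweenness defined above can be computed with a breadth-first search from every node that counts shortest paths and then accumulates the fractions. The sketch below follows the general scheme of Brandes' algorithm for unweighted, undirected networks; it is an illustration, not the code used by Gephi or NetworkX. The edge list is an assumption, inferred from the shortest-path list in the example slides.

```python
from collections import deque

# Assumed edge list of the 8-node example network (inferred, see lead-in).
EDGES = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3),
         (3, 4), (5, 6), (6, 7), (6, 8), (7, 8)]


def betweenness(edges):
    """Unnormalized betweenness, each unordered pair (a, b) counted once."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    total = {n: 0.0 for n in adj}
    for s in adj:
        # BFS from s: sigma[v] is the number of shortest paths from s to v.
        sigma = {n: 0 for n in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds = {n: [] for n in adj}
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Accumulate path fractions, starting from the farthest nodes.
        delta = {n: 0.0 for n in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                total[w] += delta[w]
    # Every unordered pair was visited from both of its endpoints.
    return {n: value / 2 for n, value in total.items()}


B = betweenness(EDGES)
for node in sorted(B):
    print(node, B[node])
```

On the assumed example network, node 3 lies on one of the two shortest paths between nodes 2 and 4 and on no others, so its betweenness is 0.5.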

® Translated slide from NNTT y Empresa, by J.I. Santos
Example

Betweenness centrality (Gephi)

List of shortest paths (grouped by starting node; each pair of nodes is listed once)

Node 1: {1,2} {1,3} {1,4} {1,5} {1,5,6} {1,5,6,7} {1,5,6,8}
Node 2: {2,3} {2,3,4}, {2,1,4} {2,1,5} {2,1,5,6} {2,1,5,6,7} {2,1,5,6,8}
Node 3: {3,4} {3,1,5} {3,1,5,6} {3,1,5,6,7} {3,1,5,6,8}
Node 4: {4,1,5} {4,1,5,6} {4,1,5,6,7} {4,1,5,6,8}
Node 5: {5,6} {5,6,7} {5,6,8}
Node 6: {6,7} {6,8}
Node 7: {7,8}
Node 8:

$B_3 = \frac{\#(\{2,3,4\})}{\#(\{2,3,4\}, \{2,1,4\})} = 0.5$

® Translated slide from NNTT y Empresa, by J.I. Santos
Example

Betweenness centrality (Gephi)

List of shortest paths: the same list as on the previous slide.

$B_6 = \frac{\#(\{1,5,6,7\})}{\#(\{1,5,6,7\})} + \frac{\#(\{1,5,6,8\})}{\#(\{1,5,6,8\})} + \dots = 10$

® Translated slide from NNTT y Empresa, by J.I. Santos
Betweenness Centrality

Limitation:

It is computationally expensive to calculate. It considers only the shortest distances and,
therefore, is not general, since information can also travel along longer paths in a network.

A straightforward computation requires O(N^3) time and O(N^2) space, where N is the number of nodes in the network.
Even the algorithm proposed by Brandes to calculate exact betweenness centrality, which
runs in O(NM), where M is the number of edges in the network, is computationally
expensive for large graphs.

To overcome these limitations, measures based on random walks can be considered.

In disconnected networks, both Gephi and NetworkX assign a zero value to the
betweenness centrality of isolated nodes (changing the indeterminate 0/0 to B=0).
Also, both compute the betweenness centrality inside each component separately.

® Slide from MA5Q3 Topics in Complexity Science by F. A. Rodrigues
Random-walk Betweenness

The previous metric assumes that "messages" use only the shortest paths, discarding the
contribution of any other alternative.
A variant is "random-walk betweenness" (Newman, 2005): the traffic between two nodes
(s, t) is measured (repeatedly) by a random walker that leaves s and eventually arrives at t. Now
Bi is the number of times the walker passes through node i.
With this algorithm any path between s and t contributes, although longer paths
contribute less (they are less likely).

So, the betweenness centrality based on random walks is given by the expected number
of visits to each node i during a random walk.
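
The idea of counting visits can be illustrated with a small Monte Carlo simulation. This is a deliberately simplified sketch: it counts every raw visit to an intermediate node and averages over walks, and it is not claimed to reproduce the exact definition in Newman (2005). The edge list is an assumption (inferred from the shortest-path list in the example slides), and the function names, the number of walks per pair and the seed are all illustrative choices.

```python
import random

# Assumed edge list of the 8-node example network (inferred, see lead-in).
EDGES = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3),
         (3, 4), (5, 6), (6, 7), (6, 8), (7, 8)]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def walk_visits(adj, s, t, rng):
    """One random walk from s until it first reaches t.

    Returns {node: number of visits} for nodes other than s and t.
    """
    visits = {}
    current = s
    while True:
        current = rng.choice(adj[current])
        if current == t:
            return visits
        if current != s:
            visits[current] = visits.get(current, 0) + 1


def random_walk_visits(edges, walks_per_pair=20, seed=0):
    """Average number of visits per walk, over all ordered pairs (s, t)."""
    rng = random.Random(seed)
    adj = build_adjacency(edges)
    totals = {n: 0 for n in adj}
    n_walks = 0
    for s in adj:
        for t in adj:
            if s == t:
                continue
            for _ in range(walks_per_pair):
                for node, count in walk_visits(adj, s, t, rng).items():
                    totals[node] += count
                n_walks += 1
    return {n: total / n_walks for n, total in totals.items()}


rw = random_walk_visits(EDGES)
for node in sorted(rw):
    print(node, round(rw[node], 3))
```

Unlike the shortest-path measure, nodes that lie on no shortest path can still receive a positive score here, because the walker may wander through them.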

Newman, M. (2005). A measure of betweenness centrality based on random walks. Social Networks. http://arxiv.org/pdf/cond-mat/0309045.pdf

® Translated slide from NNTT y Empresa, by J.I. Santos
Eigenvector Centrality: Influence / Prestige measure

“not what you know, but who you know...” M.O. Jackson, Social and Economic Networks: Models and Analysis

Hypothesis:

The importance of a node in the network grows
if it is linked to nodes that are also important.

Ex.: "Kim Kardashian"

Source: https://www.dailymail.co.uk/tvshowbiz/article-12580769/Fans-convinced-Anna-Wintour-snubbed-Kim-Kardashian-Victoria-Beckhams-Paris-fashion-show.html

® Translated slide from NNTT y Empresa, by J.I. Santos
Eigenvector Centrality

Let us make some initial guess about the centrality $x_i$ of each node i. For instance, we could
start off by setting $x_i = 1$ for all i. Obviously this is not a useful measure of centrality, but we
can use it to calculate a better one $x_i'$, which we define to be the sum of the centralities of
iʹs neighbors thus:

$x_i' = \sum_j A_{ij} x_j$,

where Aij is an element of the adjacency matrix. We can also write this expression in matrix
notation as xʹ = Ax, where x is the vector with elements $x_i$. Repeating this process to make
better estimates, we have after t steps a vector of centralities x(t) given by:

$\mathbf{x}(t) = \mathbf{A}^t \mathbf{x}(0)$

® Text from Networks: An Introduction, by M.E.J. Newman
Eigenvector Centrality

Now let us write x(0) as a linear combination of the eigenvectors vi of the adjacency matrix, A, thus:

$\mathbf{x}(0) = \sum_i c_i \mathbf{v}_i$

for some appropriate choice of constants ci. Then

$\mathbf{x}(t) = \mathbf{A}^t \mathbf{x}(0) = \mathbf{A}^t \sum_i c_i \mathbf{v}_i = \sum_i c_i \kappa_i^t \mathbf{v}_i = \kappa_1^t \sum_i c_i \left[ \frac{\kappa_i}{\kappa_1} \right]^t \mathbf{v}_i$

where the κi are the eigenvalues of A, and κ1 is the largest of them.

Since κi/κ1 < 1 for all i ≠ 1, all terms in the sum other than the first decay exponentially as t becomes
large, and hence in the limit t → ∞ we get $\mathbf{x}(t) \to c_1 \kappa_1^t \mathbf{v}_1$ (note that v1 is the leading eigenvector).

In other words, the limiting vector of centralities is simply proportional to the leading eigenvector of
the adjacency matrix.
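
The repeated multiplication x' = Ax can be sketched directly as a power iteration. In this minimal illustration the vector is rescaled after every step so that its largest entry is 1, which keeps the numbers bounded without changing the direction of the vector. The rescaling choice, the number of steps and the function names are assumptions made here; the edge list is also an assumption, inferred from the shortest-path list in the betweenness example of these slides.

```python
# Assumed edge list of the 8-node example network (inferred, see lead-in).
EDGES = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3),
         (3, 4), (5, 6), (6, 7), (6, 8), (7, 8)]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def eigenvector_centrality(edges, steps=1000):
    """Power iteration on the adjacency matrix of an undirected network."""
    adj = build_adjacency(edges)
    x = {n: 1.0 for n in adj}  # initial guess x_i = 1
    for _ in range(steps):
        # x'_i = sum_j A_ij x_j: add up the centralities of the neighbours.
        new = {n: sum(x[m] for m in adj[n]) for n in adj}
        top = max(new.values())
        x = {n: value / top for n, value in new.items()}  # largest entry = 1
    return x


x = eigenvector_centrality(EDGES)
for node in sorted(x, key=x.get, reverse=True):
    print(node, round(x[node], 6))
```

The values printed by this sketch may differ slightly from a table produced by another tool, since the result depends on the number of iterations and on the normalization each tool applies.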

® Text from Networks: An Introduction, by M.E.J. Newman
Eigenvector Centrality

Equivalently we could say that the centrality x satisfies $\mathbf{A} \mathbf{x} = \kappa_1 \mathbf{x}$.

This then is the eigenvector centrality, first proposed by Bonacich in 1987.

The centrality xi of node i is proportional to the sum of the centralities of iʹs neighbours:

$x_i = \kappa_1^{-1} \sum_j A_{ij} x_j$

which gives the eigenvector centrality the nice property that it can be large either because a node has
many neighbors or because it has important neighbors (or both).

In theory eigenvector centrality can be calculated for either undirected or directed networks. It works best however
for the undirected case. In the directed case other complications arise. First of all, a directed network has an adjacency
matrix that is, in general, asymmetric. This means that it has two sets of eigenvectors, the left eigenvectors and the
right eigenvectors, and hence two leading eigenvectors. So which of the two should we use to define the centrality? In
most cases the correct answer is to use the right eigenvector. The reason is that centrality in directed networks is
usually bestowed by other vertices pointing towards you, rather than by you pointing to others.
® Text from Networks: An Introduction, by M.E.J. Newman
Example

Eigenvector centrality (Gephi)

Gephi obtains the leading eigenvector and normalizes it by the leading eigenvalue.

xi        Label
1         1
0.846328  3
0.668745  2
0.668745  4
0.560831  5
0.490898  6
0.337226  7
0.337226  8

In NetworkX the centralities are not rescaled in this way (the vector is normalized to unit Euclidean norm instead).
https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.eigenvector_centrality.html

® Translated slide from NNTT y Empresa, by J.I. Santos
Eigenvector Centrality

Limitations

There are still problems with eigenvector centrality on directed networks. Only vertices that
are in a strongly connected component of two or more vertices, or the out-component of
such a component, can have non-zero eigenvector centrality. Recall also that acyclic
networks, such as citation networks, have no strongly connected components of more than
one node, so all vertices will have centrality zero. Clearly this makes the standard
eigenvector centrality completely useless for acyclic networks.

A variation on eigenvector centrality that addresses these problems is the Katz centrality,
which is the subject of the next section.

® Text from Networks: An Introduction, by M.E.J. Newman
Katz Centrality

Katz (1953): the definition of "eigenvector centrality" is modified by adding two
parameters α > 0 and β > 0:

$\mathbf{x} = \alpha \mathbf{A} \mathbf{x} + \beta \mathbf{1}$

where 1 is the vector (1, 1, 1, ...).

β is a positive constant that guarantees that every node has a non-zero, positive
centrality.
“We simply give each node a small amount of centrality “for free,” regardless of its
position in the network or the centrality of its neighbours”. ® Networks: An Introduction, by M.E.J. Newman

α modulates the weight of the "eigenvector" term with respect to the constant β in the centrality of
a node.
® Translated slide from NNTT y Empresa, by J.I. Santos
Katz Centrality

In matrix form:

$\mathbf{x} = \beta (\mathbf{I} - \alpha \mathbf{A})^{-1} \mathbf{1}$

The range of variation of α is limited by the principal eigenvalue κ1 of A
(for α = 1/κ1 the Katz centrality diverges; thus, we should choose a value of α smaller
than 1/κ1 if we wish the expression for the centrality to converge to meaningful values).

Inverting the matrix I − αA is computationally expensive: O(n^3). There
are recursive (iterative) algorithms to compute this in O(t·m).
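
One way to avoid the matrix inversion is to iterate the defining relation, updating each node from its neighbours plus the constant β, until the values stop changing. The sketch below assumes this fixed-point scheme; the values alpha = 0.1 and beta = 1.0, the number of steps and the function names are illustrative choices, with alpha kept well below 1/κ1 for the network used. The edge list is an assumption, inferred from the shortest-path list in the betweenness example of these slides.

```python
# Assumed edge list of the 8-node example network (inferred, see lead-in).
EDGES = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3),
         (3, 4), (5, 6), (6, 7), (6, 8), (7, 8)]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def katz_centrality(edges, alpha=0.1, beta=1.0, steps=200):
    """Iterate x_i <- alpha * sum of neighbours' x + beta.

    Converges only when alpha is smaller than 1 / (largest eigenvalue of A).
    """
    adj = build_adjacency(edges)
    x = {n: 0.0 for n in adj}
    for _ in range(steps):
        x = {n: alpha * sum(x[m] for m in adj[n]) + beta for n in adj}
    return x


katz = katz_centrality(EDGES)
for node in sorted(katz, key=katz.get, reverse=True):
    print(node, round(katz[node], 6))
```

Each step touches every edge a fixed number of times, so t steps on a network with m edges cost on the order of t·m operations.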

® Text from Networks: An Introduction, by M.E.J. Newman
Katz Centrality

Considering a constant term βi different for each node


Gephi plugin:

One could incorporate relevant information from each node not contained in the network,
for example in a social network: age, wealth, etc.

® Translated slide from NNTT y Empresa, by J.I. Santos
PageRank

Eigenvector centrality and Katz centrality have a drawback: a prestigious node also makes
prestigious all the nodes it points to.

Intuition: my association with a prestigious node should be "downgraded" if its prestige is
shared among many.

Hypothesis: the importance that a node receives from a neighbour is proportional to that
neighbour's centrality divided by the neighbour's out-degree.

Algorithm: PageRank (Larry Page)

® Translated slide from NNTT y Empresa, by J.I. Santos
PageRank

The range of variation of α is limited by the main eigenvalue of AD^-1.
This eigenvalue in undirected networks is 1.
PageRank arbitrarily fixes α = 0.85.

As before, calculating x by matrix inversion is computationally expensive, so
different heuristic approximations are used.
Example: http://ccl.northwestern.edu/netlogo/models/PageRank
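
A common approximation is to iterate the PageRank hypothesis directly: each node passes its current score, divided by its degree, to its neighbours, and every node also receives a constant term. The sketch below assumes an undirected network (so out-degree equals degree and no node is without links) and takes the constant term to be (1 - alpha)/N so that the scores sum to 1. That choice of constant, the number of steps and the function names are assumptions made here. The edge list is also an assumption, inferred from the shortest-path list in the betweenness example of these slides.

```python
# Assumed edge list of the 8-node example network (inferred, see lead-in).
EDGES = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3),
         (3, 4), (5, 6), (6, 7), (6, 8), (7, 8)]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def pagerank(edges, alpha=0.85, steps=200):
    """Iterate x_i <- alpha * sum_j x_j / degree_j + (1 - alpha) / N."""
    adj = build_adjacency(edges)
    n = len(adj)
    x = {node: 1.0 / n for node in adj}
    for _ in range(steps):
        x = {i: alpha * sum(x[j] / len(adj[j]) for j in adj[i])
                + (1 - alpha) / n
             for i in adj}
    return x


pr = pagerank(EDGES)
for node in sorted(pr, key=pr.get, reverse=True):
    print(node, round(pr[node], 6))
```

Dividing by the neighbour's degree is what distinguishes this update from the Katz one: a well-connected neighbour spreads its score over all of its links instead of passing the full amount to each.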
® Translated slide from NNTT y Empresa, by J.I. Santos
Example

PageRank centrality (Gephi)

® Translated slide from NNTT y Empresa, by J.I. Santos
Hubs-Authorities (For directed networks)

In directed networks, the previous measures compute the centrality of a node insofar as
central nodes point to it.

Sometimes it can also be interesting to identify those nodes that point to important
nodes:

Ex: Twitter: a twitter account (hub) that follows others that are a reference (authority) in
a subject
[Figure: Twitter example with the accounts twitter@gassol, twitter@falonso, twitter@cronaldo, twitter@marca, twitter@mesi, twitter@nadal]

® Translated slide from NNTT y Empresa, by J.I. Santos
Hubs-Authorities

Hypothesis:
Authority: nodes that are relevant because they contain important information.
Hubs: nodes that tell us where the best authorities are

Hub: points to "authorities".
Authority: is pointed to by "hubs".

It is a circular definition: a node can play both roles.

® Translated slide from NNTT y Empresa, by J.I. Santos
Hubs-Authorities

Algorithm Hyperlink-Induced Topic Search (HITS): each node is given an initial value (t = 0)
of "authority centrality" xi(0) and of "hub centrality" yi(0).

[Equations: iterative update of xi and yi, with terms of the form Aji xj(0)]

"Authority centrality" corresponds to the main eigenvector of
the cocitation matrix (#nodes that point simultaneously to (i, j)).

"Hub centrality" corresponds to the main eigenvector of
the bibliographic coupling matrix (#nodes to which (i, j) point simultaneously).
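
The mutual reinforcement between hubs and authorities can be sketched as an alternating update: authority scores are summed from the hubs that point to a node, and hub scores are summed from the authorities a node points to. The small directed network below is hypothetical (the node names are invented for illustration), and the rescaling so that the largest score is 1, the number of steps and the function name are choices made here, not taken from the slides.

```python
# Hypothetical directed network, given as (source, target) links.
LINKS = [("survey1", "classicA"), ("survey1", "classicB"),
         ("survey2", "classicA"), ("newcomer", "classicB")]


def hits(links, steps=100):
    """Return (authority, hub) scores, each rescaled so the maximum is 1."""
    nodes = sorted({n for link in links for n in link})
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(steps):
        # Authority of a node: sum of the hub scores of nodes pointing to it.
        new_auth = dict.fromkeys(nodes, 0.0)
        for src, dst in links:
            new_auth[dst] += hub[src]
        # Hub score of a node: sum of the authorities of nodes it points to.
        new_hub = dict.fromkeys(nodes, 0.0)
        for src, dst in links:
            new_hub[src] += new_auth[dst]
        a_top = max(new_auth.values())
        h_top = max(new_hub.values())
        auth = {n: value / a_top for n, value in new_auth.items()}
        hub = {n: value / h_top for n, value in new_hub.items()}
    return auth, hub


authority, hub_score = hits(LINKS)
for node in sorted(authority):
    print(node, round(authority[node], 4), round(hub_score[node], 4))
```

In this hypothetical network the node "newcomer" is pointed to by nobody, so its authority score is zero, yet its hub score is positive because it points to an authority.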

® Translated slide from NNTT y Empresa, by J.I. Santos
Example

cocitation matrix (#nodes that point simultaneously to (i, j))

bibliographic coupling matrix (#nodes to which (i, j) point simultaneously)

® Translated slide from NNTT y Empresa, by J.I. Santos
Hubs-Authorities

The HITS algorithm does not suffer from the disadvantages that eigenvector centrality has in
directed networks, which force the introduction of a constant term.
Ex: an "article" may not be cited by anyone (xi = 0) and yet cite relevant articles in a
subject (yi ≠ 0).

It is more suitable for directed networks than the eigenvector algorithms.

It was the base of the ask.com search engine.

® Translated slide from NNTT y Empresa, by J.I. Santos
Comparison

None is better than the others; their application depends on the nature of the problem and the relationships.

[Figure panels: Degree centrality, Closeness, Eigenvector centrality, PageRank, Betweenness]

® Translated slide from NNTT y Empresa, by J.I. Santos
Periodic Table of network centralities
http://schochastics.net/sna/periodic.html (Interactive version)

Thank you!
