5y6 ScaleFreeNetworks and BAM
Scale-free Networks
Complejidad y Redes.
Universidad Politécnica de Madrid
Network Science
slides by L. Barabási and R. Sinatra
Ch 4 Original slides available at: http://barabasi.com/networksciencebook/
Overview
Network models
We have seen that real networks are usually not like random networks:
® Picture by Network Science by A.L. Barabási
The Scale-Free property
Power law distribution
a.k.a. the 80-20 rule, a.k.a. the Pareto principle
[Figure: power-law distribution as a function of x; small values are common, large values are rare.]
“When the probability of measuring a particular value of some quantity varies inversely as a power of that value, the
quantity is said to follow a power law, also known variously as Zipf’s law or the Pareto distribution” (Newman, 2006)
® Image from Wikimedia commons
Power laws everywhere
“Cumulative distributions of twelve different quantities measured in
physical, biological, technological and social systems of various kinds. All
have been proposed to follow power laws over some part of their range.
The ubiquity of power-law behavior in the natural world has led many
scientists to wonder whether there is a single, simple, underlying
mechanism linking all these different systems together. Several candidates
for such mechanisms have been proposed, going by names like
self-organized criticality and highly optimized tolerance”.
® Newman 2006. Power laws, Pareto distributions and Zipf’s law
How are power-laws generated?
What does this have to do
with networks?
The first (network) analysis of the WWW R. Albert, H. Jeong, A-L Barabasi, Nature, 401 130 (1999).
Expected
ROBOT: collects all URL’s found in a document and
follows them recursively
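A minimal sketch of the robot described above, assuming a hypothetical in-memory "web" (`toy_web`) instead of real HTTP requests. The names `crawl`, `get_links` and `toy_web` are illustrative and do not come from the original study. The slide says the links are followed recursively; the sketch uses an explicit queue, which visits the same documents without deep recursion.

```python
from collections import deque

def crawl(start_url, get_links):
    """Visit a document, collect the URLs it contains, and follow them in turn."""
    seen = {start_url}
    queue = deque([start_url])
    edges = []  # directed links (source document, target document)
    while queue:
        url = queue.popleft()
        for target in get_links(url):
            edges.append((url, target))
            if target not in seen:      # follow each document only once
                seen.add(target)
                queue.append(target)
    return seen, edges

# Hypothetical toy web: each document maps to the URLs found in it.
toy_web = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["a.html"],  # never reached: no crawled document links to it
}
nodes, links = crawl("a.html", lambda url: toy_web.get(url, []))
print(len(nodes), "documents,", len(links), "links")
```

The collected `nodes` and `links` are the vertices and directed edges of the network whose degree distribution is then measured.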
® Slide by Network Science by A.L. Barabási
~$10^{12}$ nodes (documents)
$P(k) \sim k^{-\gamma}$
® Slide by Network Science by A.L. Barabási
Poisson degree distribution vs. Power law degree distribution
Hubs
® Slide by Network Science by A.L. Barabási
How big are the hubs?
[Figure: $k_{max}$ versus $N$ on log-log axes for a scale-free network (power law) and a random network ($k_{max} \sim \ln N$).]
$k_{max}$ increases with the size of the network: the larger a system is, the larger its biggest hub.
o For $\gamma = 2$, $k_{max} \sim N$: the size of the biggest hub is $O(N)$.
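As a rough illustration, the sketch below evaluates the natural cutoff $k_{max} = k_{min} N^{1/(\gamma - 1)}$. This formula is taken from Network Science, Ch. 4, and is not written out on the slide; it reduces to the $k_{max} \sim N$ statement above when $\gamma = 2$. The function names are illustrative.

```python
import math

def natural_cutoff(n_nodes, gamma, k_min=1):
    """Expected largest degree of a scale-free network: k_min * N**(1 / (gamma - 1))."""
    return k_min * n_nodes ** (1.0 / (gamma - 1.0))

def random_network_scale(n_nodes):
    """Order of magnitude of k_max in a random network, which grows like ln N."""
    return math.log(n_nodes)

for n in (10**2, 10**4, 10**6):
    print(n, natural_cutoff(n, gamma=2.0), natural_cutoff(n, gamma=3.0),
          random_network_scale(n))
```

Even for $\gamma = 3$ the largest hub grows polynomially with $N$, while the random-network value grows only logarithmically.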
® Slide by Network Science by A.L. Barabási
The meaning of scale-free
Moments of a distribution
® * Screenshot from networksciencebook.com
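In a random network:
[Figure: binomial probability mass function p(k) versus k, with curves for $\langle k \rangle = 10$ and $\langle k \rangle = 20$ and the labels $\sigma = 2.18$ and $\sigma = 3.12$. Figure by Tayste, own work, public domain, https://commons.wikimedia.org/w/index.php?curid=3646951]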
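• If $n - \gamma + 1 > 0$, then $\langle k^n \rangle$ goes to infinity as $k_{max} \to \infty$. Therefore, all moments with $n > \gamma - 1$ diverge.
For many scale-free networks the degree exponent $\gamma$ is between 2 and 3 (for $\gamma = 2$, every moment with $n > 1$ diverges).
Hence, for these networks, in the $N \to \infty$ limit the first moment $\langle k \rangle$ is finite, but the second and higher moments, $\langle k^2 \rangle$, $\langle k^3 \rangle$, ..., go to infinity.
We cannot say that node degrees lie within $\langle k \rangle \pm \sigma$, because $\sigma \to \infty$. Fluctuations around the average can be arbitrarily large, so a randomly chosen node can have a very small or an arbitrarily large degree.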
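The divergence can be checked numerically. The sketch below assumes a continuum power law $p(k) = C k^{-\gamma}$ on $[k_{min}, k_{max}]$, which is an approximation and not the exact discrete distribution. The function name is illustrative.

```python
import math

def power_law_moment(n, gamma, k_max, k_min=1.0):
    """n-th moment of p(k) = C * k**(-gamma) on [k_min, k_max] (continuum approximation)."""
    def integral(exponent):
        # Integral of k**exponent dk from k_min to k_max.
        if abs(exponent + 1.0) < 1e-12:
            return math.log(k_max / k_min)
        return (k_max ** (exponent + 1.0) - k_min ** (exponent + 1.0)) / (exponent + 1.0)
    normalisation = integral(-gamma)    # equals 1 / C
    return integral(n - gamma) / normalisation

for k_max in (1e2, 1e4, 1e6):
    first = power_law_moment(1, 2.5, k_max)
    second = power_law_moment(2, 2.5, k_max)
    print(k_max, first, second)
```

With $\gamma = 2.5$ the first moment settles near a finite value as $k_{max}$ grows, while the second moment keeps growing like $k_{max}^{n - \gamma + 1} = k_{max}^{0.5}$.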
$\langle d \rangle \sim \begin{cases} \dfrac{\ln \ln N}{\ln(\gamma - 1)} & 2 < \gamma < 3 \quad \text{(Ultra-Small World)} \\ \dfrac{\ln N}{\ln \ln N} & \gamma = 3 \quad \text{(Critical Point)} \end{cases}$
Anomalous Regime (γ = 2)
The degree of the biggest hub grows linearly with the system size, i.e., $k_{max} \sim N$. This forces the network into a hub and spoke configuration in which all nodes are close to each other because they all connect to the same central hub.
® From networksciencebook.com
Distances in scale-free networks
Ultra-Small World (2 < γ < 3)
The average distance increases as $\langle d \rangle \sim \ln \ln N$, a significantly slower growth than the $\langle d \rangle \sim \ln N$ derived for random networks.
The hubs radically reduce the path length.
[Figure 4.4: Poisson vs. Power-law Distributions.
(a) Comparing a Poisson function with a power-law function ($\gamma = 2.1$) on a linear plot. Both distributions have $\langle k \rangle = 10$.
(b) The same curves as in (a), but shown on a log-log plot, allowing us to inspect the difference between the two functions in the high-k regime.
(c) A random network with $\langle k \rangle = 3$ and N = 50, illustrating that most nodes have comparable degree $k \approx \langle k \rangle$.
(d) A scale-free network with $\gamma = 2.1$ and $\langle k \rangle = 3$, illustrating that numerous small-degree nodes coexist with a few highly connected hubs.]
® From networksciencebook.com
Distances in scale-free networks
Critical Point (γ = 3)
This value is of particular theoretical interest, as the second moment of the degree distribution does not diverge any longer.
At this critical point, the $\ln N$ dependence encountered for random networks returns:
$\langle d \rangle \sim \frac{\ln N}{\ln \ln N}$
® From networksciencebook.com
Distances in scale-free networks
Small World (γ > 3)
In this regime $\langle k^2 \rangle$ is finite and the average distance follows the small world result derived for random networks:
$\langle d \rangle \sim \ln N$
While hubs continue to be present, for γ > 3 they are not sufficiently large and numerous to have a significant impact on the distance between the nodes.
® From networksciencebook.com
Summary of the behaviour of scale-free networks
Large scale-free networks with γ < 2 that lack multi-links cannot exist.
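The distance regimes from the preceding slides can be collected in one small function. This is only a sketch of the leading-order scaling: all constant prefactors are omitted, and returning a constant for $\gamma = 2$ is an assumption standing in for "all nodes are close to each other". The function name is illustrative.

```python
import math

def distance_scaling(n_nodes, gamma):
    """Leading-order scaling of the average distance <d> (prefactors omitted)."""
    if gamma < 2:
        raise ValueError("large scale-free networks without multi-links need gamma >= 2")
    if gamma == 2:
        return 1.0                      # anomalous regime: hub and spoke, independent of N
    ln_n = math.log(n_nodes)
    if gamma < 3:
        return math.log(ln_n)           # ultra-small world: ln ln N
    if gamma == 3:
        return ln_n / math.log(ln_n)    # critical point: ln N / ln ln N
    return ln_n                         # small world: ln N

for gamma in (2, 2.5, 3, 3.5):
    print(gamma, distance_scaling(10**6, gamma))
```

For a fixed $N$ the values increase with $\gamma$: the smaller the degree exponent, the larger the hubs and the shorter the paths.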
® From networksciencebook.com
® Slide by Network Science by A.L. Barabási
Complejidad y Redes
Networks: An Introduction.
Book by M.J. Newman, Ch 14
Network Science
slides by L. Barabási and R. Sinatra
Ch 5 Original slides available at: http://barabasi.com/networksciencebook/
Complejidad y Redes.
Universidad Politécnica de Madrid The Barabási-Albert Model 27
Overview
• Introduction
INTRODUCTION
Network models
They model the mechanisms by which networks are created. The idea behind models such as these is to explore hypothesized generative mechanisms to see what structures they produce.
If the structures are similar to those of networks we observe in the real world, it suggests, though does not prove, that similar generative mechanisms may be at work in the real networks.
The best-known example of a generative network model, and the one that we study first in this chapter, is the “preferential attachment” model for the growth of networks with power-law degree distributions.
® From Networks: An Introduction, by M.J. Newman
The rich get richer
In 1955 Herbert Simon proposed the "rich get richer" hypothesis as the mechanism by which power-law distributions are generated.
(Simon, H. A. (1955). "On a class of skew distribution functions". Biometrika 42 (3–4): 425–440)
“Simon noted the occurrence of power laws in a variety of (non-network) economic data,
such as the distribution of people’s personal wealth. Simon proposed an explanation for the
wealth distribution based on the idea that people who have money already gain more at a
rate proportional to how much they already have”
® From Networks: An Introduction, by M.J. Newman
What does this have to do
with networks?
Growth and preferential attachment
® Slide by Network Science by A.L. Barabási
Growth
ER model:
Networks expand through the addition of ...
Barabási & Albert, Science 286, 509 (1999)
[Figure: growth curves versus years (1880 to 2020): number of hosts (World Wide Web), number of papers, number of movies.]
Nodes Prefer to Link to the More Connected Nodes
The random network model assumes that we randomly choose the interaction partners of a node. Yet, in most real networks new nodes prefer to link to the more connected nodes, a process called preferential attachment (Figure 5.2).
• ... the billions of less-prominent nodes that populate the Web. As our knowledge is biased towards the more popular Web documents, we ...
• ... the high-degree nodes of the citation network.
• The more movies an actor has played in, the more familiar is a casting director with her skills. Hence, the higher the degree of an actor in the actor network, the higher are the chances that she will be considered ...
® Slide by Network Science by A.L. Barabási
Preferential attachment
Preferential attachment:
New nodes prefer to connect to the more connected nodes.
® Slide by Network Science by A.L. Barabási
Network Science: Evolving Network Models
Cumulative advantage and Price’s model
Cumulative advantage
® From Networks: An Introduction, by M.J. Newman
Cumulative advantage
Price adapted Simon’s methods, with relatively little change, to the network context. Price gave a name to Simon’s mechanism: he called it cumulative advantage.
(Price, D. J. de S. (1976). "A general theory of bibliometric and other cumulative advantage processes". J. Amer. Soc. Inform. Sci. 27 (5): 292–306)
• We assume that papers are published continually (though they do not have to be published at a constant rate) and that newly appearing papers cite previously existing ones.
• The papers and citations form a directed citation network: the papers are the vertices and the citations are the edges.
• No paper ever disappears after it is published: vertices in this network are created but never destroyed.
® From Networks: An Introduction, by M.J. Newman
Cumulative advantage
• Let the average number of papers cited by a newly appearing paper be c. In the
language of graph theory, c is the average out-degree of the network.
• The probability of receiving a new citation is proportional to the number of citations.
• It cannot be precisely proportional since papers start out life with zero citations:
To get around this hitch, Price proposed that in fact the probability that a paper receives a
new citation should be proportional to the number that it already has plus a positive
constant a.
The constant a in effect gives each paper a number of “free” citations to get it started in the
race—each paper acts as though it started off with a citations instead of none.
An alternative interpretation is that a certain fraction of citations go to papers chosen
uniformly at random without regard for how many citations they currently have, while the
rest go to papers chosen in proportion to current citation count.
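A small simulation sketch of the mechanism just described. Two simplifications are assumptions of this sketch and not part of Price's definition: every new paper cites exactly `c` earlier papers (in the model, c is only an average), and a paper never cites the same paper twice. The function name and the starting state (one uncited paper) are also illustrative.

```python
import random

def price_model(n_papers, c, a, seed=None):
    """Simulate Price's model; returns the citation count (in-degree) of each paper."""
    rng = random.Random(seed)
    in_degree = [0]                     # start from a single uncited paper
    for new_paper in range(1, n_papers):
        # A paper is cited with probability proportional to (citations + a).
        weights = [q + a for q in in_degree]
        n_cited = min(c, new_paper)     # cannot cite more papers than exist
        cited = set()
        while len(cited) < n_cited:     # redraw until n_cited distinct papers are chosen
            cited.add(rng.choices(range(new_paper), weights=weights)[0])
        for paper in cited:
            in_degree[paper] += 1
        in_degree.append(0)             # the new paper starts with zero citations
    return in_degree

citations = price_model(n_papers=500, c=3, a=1.0, seed=42)
print("most cited:", max(citations), "median:", sorted(citations)[len(citations) // 2])
```

The constant `a` must be positive: with `a = 0` no paper could ever receive its first citation, which is the hitch the slide mentions.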
® From Networks: An Introduction, by M.J. Newman
Cumulative advantage
The crucial central assumption of Price’s model is that a newly appearing paper cites previous ones chosen at random with probability proportional to the number of citations those previous papers already have.
The model takes no account of which papers are most relevant topically, which papers are most original or best written, the difference between research articles and reviews, or any of the many other factors that certainly affect real citation patterns.
® From Networks: An Introduction, by M.J. Newman
The Barabási-Albert model (BAM)
The preferential attachment model (PAM)
The preferential attachment model
Preferential attachment did not become widely accepted as a mechanism for generating power laws in networks until much later than Price’s model: in the 1990s, it was independently discovered by Barabási and Albert, who proposed their own model of a growing network (along with the name “preferential attachment”).
The Barabási-Albert model, which is one of the best-known generative network models in use today, is similar to Price’s, though not identical, being a model of an undirected rather than a directed network.
Albert-László Barabási & Réka Albert (1999) Emergence of scaling in random networks. Science 286 (5439): 509–512
® From Networks: An Introduction, by M.J. Newman
The preferential attachment model
® Slide by Network Science by A.L. Barabási
The preferential attachment model
Undirected network.
https://youtu.be/-QEx9B5FyEQ
Algorithm:
$\Pi(k_i) = \frac{k_i}{\sum_j k_j}$
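A minimal sketch of growth with preferential attachment, where a new node links to node i with probability $k_i / \sum_j k_j$. Several details are assumptions of this sketch, because the model definition leaves them open: the initial configuration (a star on m + 1 nodes), one new node per step with m links, and redrawing duplicate targets to avoid multi-links (which slightly perturbs the exact probabilities). The function name is illustrative.

```python
import random

def barabasi_albert(n, m, seed=None):
    """Grow an undirected network by preferential attachment; returns an edge list."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    # Assumed initial configuration: a star on m + 1 nodes (node 0 is the centre).
    edges = [(0, i) for i in range(1, m + 1)]
    # Each node appears in `stubs` once per link end, so a uniform draw from
    # `stubs` selects node i with probability k_i / sum_j k_j.
    stubs = [node for edge in edges for node in edge]
    for new_node in range(m + 1, n):
        targets = set()
        while len(targets) < m:         # redraw duplicates: no multi-links
            targets.add(rng.choice(stubs))
        for target in sorted(targets):
            edges.append((new_node, target))
            stubs.extend((new_node, target))
    return edges

edges = barabasi_albert(n=1000, m=2, seed=7)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print("largest degree:", max(degree.values()), "smallest degree:", min(degree.values()))
```

Keeping a list of link ends avoids recomputing the sum over all degrees at every step, at the cost of storing two entries per link.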
Mathematical definition of the preferential attachment model
The definition of the Barabási-Albert model leaves many mathematical details open:
• It does not specify the precise initial configuration of the first m0 nodes.
• It does not specify whether the m links assigned to a new node are added one by one, or simultaneously.
This leads to potential mathematical conflicts: if the links are truly independent, they could connect to the same node i, resulting in multi-links.
Bollobás and collaborators proposed the Linearized Chord Diagram (LCD) to resolve these problems, making the model more amenable to mathematical approaches.
® Slide by Network Science by A.L. Barabási
NetworkX implementation
barabasi_albert_graph(n, m, seed=None, initial_graph=None)
The definition of the Barabási-Albert model leaves many mathematical details open:
• It does not specify whether the m links assigned to a new node are added one by one, or simultaneously.
This leads to potential mathematical conflicts: if the links are truly independent, they could connect to the same node i, resulting in multi-links.
® Slide by Network Science by A.L. Barabási
Model properties
Preferential attachment model properties
• Analytically, it can be proven (see Networks: An Introduction, by M.J. Newman) that the degree distribution follows a power law:
$p_k \sim k^{-3}$
• The hubs are large because they arrived earlier, a phenomenon called first-mover
advantage.
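The first-mover advantage can be made quantitative. In the continuum approximation of the Barabási-Albert model (Network Science, Ch. 5; not shown on this slide), a node that joined at time $t_i$ with $m$ links has, at time $t$, the degree given below. The symbols $t_i$ and $\beta$ are introduced here for illustration.

```latex
k_i(t) = m \left( \frac{t}{t_i} \right)^{\beta}, \qquad \beta = \frac{1}{2}
```

All nodes follow the same growth law, so the smaller $t_i$ is (the earlier the node arrived), the larger its degree at any later time $t$.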
® Translated slide from NNTT y Empresa, by J.I. Santos
Preferential attachment model properties
• Average distance (small-world property):
$\langle d \rangle \sim \frac{\ln N}{\ln \ln N}$
(since the degree exponent $\gamma = 3$)
® Slide by Network Science by A.L. Barabási
Preferential attachment model properties
® Translated slide from NNTT y Empresa, by J.I. Santos
But…
The preferential attachment model
® Slide by Network Science by A.L. Barabási
Complejidad y Redes
® From networksciencebook.com
Linear scale
Using a linear k-axis compresses the numerous small-degree nodes in the small-k region, rendering them invisible.
Similarly, as there can be orders of magnitude differences in $p_k$ for k = 1 and for large k, if we plot $p_k$ on a linear vertical axis, its value for large k will appear to be zero.
The use of a log-log plot avoids these problems.
[Figure, panel (a) LINEAR SCALE: $p_k$ versus k on linear axes.]
® From networksciencebook.com
Avoid Linear Binning
The most flawed method (yet frequently seen in the literature) is to simply plot $p_k = N_k/N$ on a log-log plot. This is called linear binning, as each bin has the same size Δk = 1. For a scale-free network linear binning results in an instantly recognizable plateau at large k, consisting of numerous data points that form a horizontal line.
This plateau has a simple explanation: typically, we have only one copy of each high-degree node, hence in the high-k region we either have $N_k = 0$ (no node with degree k) or $N_k = 1$ (a single node with degree k). Consequently, linear binning will either provide $p_k = 0$, not shown on a log-log plot, or $p_k = 1/N$, which applies to all hubs, generating a plateau at $p_k = 1/N$.
[Figure, panel (b) LINEAR BINNING: $p_k$ versus k on log-log axes.]
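The plateau is easy to reproduce. The sketch below computes $p_k = N_k/N$ with one bin per integer degree, on a hypothetical degree sequence made up for illustration (many small-degree nodes plus three hubs with distinct degrees).

```python
from collections import Counter

def linear_binning(degrees):
    """Return {k: p_k} with p_k = N_k / N, using one bin per integer degree."""
    n = len(degrees)
    counts = Counter(degrees)           # N_k: number of nodes with degree k
    return {k: counts[k] / n for k in sorted(counts)}

# Hypothetical degree sequence: 97 small-degree nodes and three distinct hubs.
degrees = [1] * 60 + [2] * 25 + [3] * 12 + [40, 75, 120]
p_k = linear_binning(degrees)
for k, p in p_k.items():
    print(k, p)
```

Every hub degree occurs exactly once, so all three hubs land on the same value $1/N$: the horizontal line described above.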
® From networksciencebook.com
Logarithmic Binning
Logarithmic binning corrects the non-uniform sampling of linear binning.
For log-binning we let the bin sizes increase with the degree, making sure that each bin has a comparable number of nodes.
For example, we can choose the bin sizes to be powers of 2, so that the first bin has size $b_0 = 1$, containing all nodes with k = 1; the second has size $b_1 = 2$, containing nodes with degrees k = 2 and 3; the third bin has size $b_2 = 4$, containing nodes with degrees k = 4, 5, 6, 7. By induction the nth bin has size $2^{n-1}$ and contains all nodes with degrees $k = 2^{n-1}, 2^{n-1}+1, ..., 2^{n}-1$.
The degree distribution is given by $p_{\langle k_n \rangle} = N_n / b_n$, where $N_n$ is the number of nodes found in the bin n of size $b_n$ and $\langle k_n \rangle$ is the average degree of the nodes in bin $b_n$.
[Figure, panel (c) LOG-BINNING: $p_k$ versus k on log-log axes.]
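A minimal sketch of log-binning with bins whose sizes double (1, 2, 4, ...). One detail is an assumption of this sketch: each bin's count is divided by the bin size and also by the total number of nodes N, so that the result is a normalised density comparable to $p_k$. The function name is illustrative.

```python
def log_binning(degrees):
    """Return a list of (<k_n>, N_n / (b_n * N)) for bins [2**j, 2**(j+1) - 1]."""
    n_total = len(degrees)
    bins = {}                           # bin index j -> degrees that fall in that bin
    for k in degrees:
        if k >= 1:                      # degree-0 nodes cannot be shown on a log axis
            j = k.bit_length() - 1      # floor(log2(k)) for positive integers
            bins.setdefault(j, []).append(k)
    result = []
    for j in sorted(bins):
        members = bins[j]
        bin_size = 2 ** j               # b_n: 1, 2, 4, ...
        mean_degree = sum(members) / len(members)
        result.append((mean_degree, len(members) / (bin_size * n_total)))
    return result

for mean_degree, density in log_binning([1, 1, 2, 3, 4, 7]):
    print(mean_degree, density)
```

Because the wide bins at large k pool many sparsely populated degrees, the hubs no longer collapse onto a single plateau value.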
® From networksciencebook.com
Another way to extract information from the tail of $p_k$ is to plot the complementary cumulative distribution
$P_k = \sum_{q=k+1}^{\infty} p_q$
which again enhances the statistical significance of the high-degree region.
The cumulative distribution again eliminates the plateau observed for linear binning and leads to an extended scaling region, allowing for a more accurate estimate of the degree exponent.
[Figure, panels (c) LOG-BINNING and (d) CUMULATIVE: $p_k$ and $P_k$ versus k on log-log axes; panel (d) marks a cut-off.]
[Truncated text from the source page: "... 4.22b using linear binning, the obtained γ is quite different from the real value γ = 2.5. The reason is that under linear binning we have a large number of nodes in small k bins, hence in this regime we can confident- ..."]
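A short sketch of the complementary cumulative distribution, taking $P_k$ to be the fraction of nodes with degree larger than k. This convention is an assumption of the sketch; some texts use "at least k" instead. The function name is illustrative.

```python
from collections import Counter

def complementary_cumulative(degrees):
    """Return {k: P_k}, where P_k is the fraction of nodes with degree larger than k."""
    n = len(degrees)
    counts = Counter(degrees)
    result = {}
    remaining = n                       # nodes not yet passed while scanning upwards
    for k in sorted(counts):
        remaining -= counts[k]          # now counts the nodes with degree > k
        result[k] = remaining / n
    return result

print(complementary_cumulative([1, 1, 2, 5]))
```

Each value sums over the whole tail above k, so single hubs no longer produce isolated points at $1/N$. With this convention $P_k = 0$ at the largest degree, which cannot be drawn on a logarithmic axis.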
® From networksciencebook.com
Scale Free network examples
The internet network
Nodes: papers
Links: citations
How popular is your paper? An empirical study of the citation distribution (S. Redner, 1998)
The number of papers with x citations, N(x), has a large-x power law decay with exponent 3.
$P(k) \sim k^{-\gamma}$ ($\gamma = 3$)
[Figure labels: 25; H.E. Stanley,...; 578...]
M: math
NS: neuroscience
Twitter: Facebook
Brian Karrer, Lars Backstrom, Cameron Marlow, 2011
® Slide by Network Science by A.L. Barabási
Metabolic network
Nodes: proteins
Links: physical interactions (binding)
$P(k) \sim (k + k_0)^{-\gamma} \exp\left(-\frac{k + k_0}{k_\tau}\right)$
H. Jeong, S.P. Mason, A.-L. Barabasi, Z.N. Oltvai, Nature 411, 41-42 (2001)
Rual et al. Nature 2005; Stelzl et al. Cell 2005
® Slide by Network Science by A.L. Barabási
Not all networks are scale-free