Session7Slides Draft Nov2 2020 2pm

Session 7: Preferential attachment and
random growth models
IDS 564, Prof. Ali Tafti
1
“sum of many small independent random effects...” (EK 18.1)
plot(function(x) dnorm(x), -4, 4, main = "Normal density")
2
2^(-k) vs. k^(-2)
exponential distribution vs. power-law (scale-free) distribution
3
2^(-k) vs. k^(-2)
4
Writing the power law (scale-free)
distribution in linear form:
f(k) = a / (k^c) = a k^(-c), where c is a positive constant (e.g. 1, 2, 3, …)
ln f(k) = ln(a) – c ln(k) (just take logs of both sides)

Rewrite ln(a) to be Beta_0 (this is a constant)
Rewrite –c to Beta_1
Rewrite ln f(k) to be Y
Rewrite ln(k) to be X
Y = Beta_0 + Beta_1 * X + epsilon
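The linear form above can be checked numerically. The following is a minimal sketch in R, assuming illustrative values a = 2 and c = 2.5 (both hypothetical, not from the slides): it builds exact power-law values and regresses ln f(k) on ln(k), so the intercept should recover ln(a) and the slope should recover -c.

```r
# Illustrative (assumed) parameters for f(k) = a * k^(-c)
a <- 2
c_exp <- 2.5
k <- 1:50
f <- a * k^(-c_exp)

# Regress Y = ln f(k) on X = ln(k): Y = Beta_0 + Beta_1 * X
fit <- lm(log(f) ~ log(k))
beta <- unname(coef(fit))
beta[1]  # Beta_0, compare with log(a)
beta[2]  # Beta_1, compare with -c_exp
```

With real degree data the points will not lie exactly on a line, and the epsilon term absorbs the deviation.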
5
Y = Beta_0 + Beta_1 * X + epsilon
Figure 18.2. A power law distribution (such as this one for the number of Web page in-links, from Broder et al. [79]) shows up as a straight line on a log-log plot.
6
2^(-k) vs. k^(-2)
7
Erdős-Rényi random graph
• Start with n nodes
• Each link is formed independently with some probability p
• Serves as a benchmark
8
Random Graph: Degree Distribution
• The probability that a node has d links is binomial:
[ (n-1)! / (d! (n-1-d)!) ] p^d (1-p)^(n-1-d)
• For large n and small p, this is approximately a Poisson distribution:
[ (n-1)^d / d! ] p^d e^(-(n-1)p)
• Hence the name "Poisson random graphs"
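As a quick check of the approximation, here is a small R sketch, assuming illustrative values n = 1000 and p = 0.005 (hypothetical, chosen so that n is large and p is small). It compares the binomial degree probabilities with the Poisson ones using the base R functions dbinom and dpois.

```r
# Assumed illustrative values: large n, small p
n <- 1000
p <- 0.005
d <- 0:15

binom_prob <- dbinom(d, size = n - 1, prob = p)   # exact degree distribution
pois_prob  <- dpois(d, lambda = (n - 1) * p)      # Poisson approximation

# Largest pointwise gap between the two distributions
max_gap <- max(abs(binom_prob - pois_prob))
max_gap
```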
9
Poisson distribution: random graphs
Poisson P(d) = (EXP(-(n-1)*p)*((n-1)*p)^d)/FACT(d)
[Figure: chart of the Poisson probability P(d) for d = 1 to 20; vertical axis from 0 to 0.16.]
10
Poisson
11
Scale-free
12
Figure 18.3. The distribution of popularity.
13
x-axis is ordered by popularity of book
Figure 18.4. The distribution of popularity.

14
PURE RANDOM SELECTION IN
GROWING NETWORKS
15
What do they add?
• They capture something in the real world that is not captured in the static models…
• Dynamic model…
• A natural way of varying degree distributions without just building them in: a model that ends up generating features that look like the real world.
16
Benchmark model: uniformly random growth
• Each date a new node is born
• Forms m links to existing nodes
• Each node is chosen with equal likelihood
17
Degree distribution
• Start with a complete graph of m nodes (i.e. fully connected)
• New nodes form m links to the t existing nodes
• An existing node has a probability m/t of getting a new link in each period
18
Distribution of expected degrees
• Expected degree at time t for node i born at date i, where m < i < t, is:
m + m/(i+1) + m/(i+2) + ... + m/t
– m: the m links formed at birth
– m/(i+1): the expected number of links from the next node, born at i + 1
– m/t: the expected number of links from the node entering at time t
• Timing: time starts at 0 with one node and no connections. At time 1, there is one already-existing node. At time t, an entering node chooses m links among the t already-existing nodes.
21
• Expected degree at time t for node i born at date i, where m < i < t, is:
m + m/(i+1) + m/(i+2) + ... + m/t
This is a harmonic sum, approximated by m(1 + log(t/i))
Nodes that have expected degree less than d at some time t are those such that m(1 + log(t/i)) < d, so it is those with i > t e^(-(d-m)/m)
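To see how close the logarithmic approximation is to the harmonic sum, here is a minimal R sketch. The function names are hypothetical, and the values i = 10, t = 100, m = 1 are assumed only for illustration.

```r
# Exact expected degree: m + m/(i+1) + ... + m/t (requires i < t)
expected_degree_exact <- function(i, t, m) {
  m + m * sum(1 / ((i + 1):t))
}

# Approximation from the slide: m * (1 + log(t / i))
expected_degree_approx <- function(i, t, m) {
  m * (1 + log(t / i))
}

exact  <- expected_degree_exact(10, 100, 1)
approx <- expected_degree_approx(10, 100, 1)
c(exact = exact, approx = approx, gap = approx - exact)
```

The gap shrinks as i grows, since the sum of 1/s over integers is better approximated by the integral of 1/s when the terms are small.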
22
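m <- 1
t <- 100
# Start at i = 1: the expression is infinite at i = 0
plot(function(i) m * (1 + log(t / i)), 1, t, main = "Expected degree at t=100 for birthtime i",
     xlab = "Birthdate i", ylab = "Expected degree at time t")
The curvature means it gets harder to gain new links over time: each existing node's chance of receiving a link in a period, m/t, falls as t grows.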
23
25
[Figure, two panels. First panel: x-axis corresponds to the red dotted line. Second panel: x-axis corresponds to the green dotted line.]
26
Based on Matthew Jackson ch. 4
m = 20, t = 100, d = 35
20(1 + log(100/i)) < 35
i > 100 e^(-(35-20)/20) = 47.2
Nodes with i > 47.2 make up a fraction (100 – 47.2)/100 = 52.8/100 of the nodes at time 100, so a fraction 0.528 of the nodes have expected degree less than 35.
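The arithmetic in this example can be reproduced in a few lines of R. This is only a sketch of the calculation on the slide; the variable names are chosen here for illustration.

```r
m <- 20     # links formed by each new node
t <- 100    # current time
d <- 35     # degree threshold

# Critical birthdate: nodes born after i_crit have expected degree below d
i_crit <- t * exp(-(d - m) / m)

# Fraction of nodes with expected degree less than d
frac_below <- (t - i_crit) / t
c(i_crit = i_crit, frac_below = frac_below)
```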
27
• Expected degree at time t for node i born at date i, where m < i < t, is:
m + m/(i+1) + m/(i+2) + ... + m/t
Nodes that have expected degree less than d at some time t are those such that m(1 + log(t/i)) < d, so it is those with i > t e^(-(d-m)/m)
F_t(d) = (t - t e^(-(d-m)/m)) / t = 1 - e^(-(d-m)/m)

(Even though this function describes the expected degrees, it is a good approximation for the actual degree distribution; MJ ch. 4 has more detail.)
28
MEAN FIELD APPROXIMATION
29
Mean field approximation
• Continuous time approximation
• Distribution of expected degrees
• Check by simulation??
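One way to check by simulation is sketched below in R. It grows a network by uniformly random attachment and compares the simulated share of nodes with degree below d to the expected-degree formula 1 - e^(-(d-m)/m) for uniformly random growth. The function name is hypothetical, and the starting graph is an assumption: a complete graph on m + 1 nodes, so that every initial node has degree m.

```r
# Hypothetical helper: uniformly random growth, tracking degrees only.
simulate_uniform_growth <- function(n_total, m) {
  degree <- numeric(n_total)
  degree[1:(m + 1)] <- m                 # assumed start: complete graph on m + 1 nodes
  for (new_node in (m + 2):n_total) {
    # The entering node picks m distinct existing nodes with equal likelihood
    targets <- sample.int(new_node - 1, m)
    degree[targets] <- degree[targets] + 1
    degree[new_node] <- m                # m links formed at birth
  }
  degree
}

set.seed(1)
m <- 5
deg <- simulate_uniform_growth(5000, m)
d <- 8
c(simulated = mean(deg < d), mean_field = 1 - exp(-(d - m) / m))
```

The two numbers are not expected to match exactly: the formula describes expected degrees in a continuous approximation, while the simulated degrees are integers.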
30
Mean-field approximation
RANDOM GROWTH
31
(random growth)
• d d_i(t)/dt = m/t, with initial condition d_i(i) = m
• solution (simple differential equation):
d_i(t) = m + m log(t/i)
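The solution follows by integrating the rate from the birth date i to the current time t. A short worked derivation, using s as a dummy integration variable (notation introduced here):

```latex
\[
\frac{d\,d_i(t)}{dt} = \frac{m}{t}, \qquad d_i(i) = m
\]
\[
d_i(t) = d_i(i) + \int_i^t \frac{m}{s}\,ds
       = m + m\left(\log t - \log i\right)
       = m + m \log\!\left(\frac{t}{i}\right)
\]
```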
32
Mean-field approximation
PREFERENTIAL ATTACHMENT
GROWTH
33
(preferential attachment)
• d d_i(t)/dt = m (d_i(t)/(2tm)) = d_i(t)/(2t), with initial condition d_i(i) = m
34
• d d_i(t)/dt = m (d_i(t)/(2tm)) = d_i(t)/(2t), with initial condition d_i(i) = m
• d_i(t) = m (t/i)^(1/2)
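This equation is separable, so the solution can be obtained by dividing by d_i(t) and integrating both sides from i to t. A short worked derivation (s is a dummy integration variable introduced here):

```latex
\[
\frac{d\,d_i(t)}{dt} = \frac{d_i(t)}{2t}
\;\Longrightarrow\;
\int_i^t \frac{1}{d_i(s)}\,\frac{d\,d_i(s)}{ds}\,ds = \int_i^t \frac{1}{2s}\,ds
\]
\[
\log\!\left(\frac{d_i(t)}{d_i(i)}\right) = \frac{1}{2}\log\!\left(\frac{t}{i}\right),
\qquad d_i(i) = m
\;\Longrightarrow\;
d_i(t) = m\left(\frac{t}{i}\right)^{1/2}
\]
```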
35
m=1
36
37
By Matthew Jackson (Fall 2015 MOOC)
38
• F_t(d) = 1 – (m/d)^2 and f_t(d) = 2m^2/d^3
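These expressions follow from the preferential attachment solution d_i(t) = m (t/i)^(1/2) by the same argument used for random growth: find the critical birth date, then take the share of nodes born after it. A short worked derivation:

```latex
\[
m\left(\frac{t}{i}\right)^{1/2} < d
\;\Longleftrightarrow\;
i > t\left(\frac{m}{d}\right)^{2}
\]
\[
F_t(d) = \frac{t - t\,(m/d)^2}{t} = 1 - \left(\frac{m}{d}\right)^{2},
\qquad
f_t(d) = \frac{dF_t(d)}{dd} = \frac{2m^2}{d^3}
\]
```

The density falls off as d^(-3), which is a power law and therefore a straight line on a log-log plot.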
39
HYBRID MODELS
40
Model of hybrid attachment
• A fraction a of incoming links is made uniformly at random; the remaining fraction 1 - a is made by searching the neighborhoods of those nodes (friends of friends).
• A new node meets am nodes uniformly at random and directs links to them. Then, it meets (1-a)m of their neighbors and attaches to them as well.
41
Relation to preferential attachment
• In a network where half of the individuals have degree k and half have degree 2k:
• Randomly select a link and then a node on one end of it: there is a 2/3 chance that it has degree 2k and a 1/3 chance that it has degree k
First example:
p(neighbor deg 2) = (1/3) (1/2 + 1 + 1/2) = 2/3
p(neighbor deg 1) = (1/3) (1/2 + 0 + 1/2) = 1/3
Second example:
p(neighbor deg 2) = (1/2) (1/2 + 1/2) = 1/2
p(neighbor deg 1) = (1/2) (1/2 + 1/2) = 1/2
42
Relation to preferential attachment
• Consider a similar model: Randomly select a node,
and then look at its neighbor. Over a very large
network, this will approximate the prior model of
selecting a link first.
• However, we expect a numerical difference with small models: Examples
43
Relation to preferential attachment
• Consider a similar model: Randomly select a node, and then look at its neighbor. Over a very large network, this will approximate the prior model of selecting a link first.
• However, we expect a numerical difference with small models.
Examples (compare to prior formulation):
First example:
p(neighbor deg 2) = ¼ (1 + ½ + ½ + 1) = ¾
p(neighbor deg 1) = ¼ (0 + ½ + ½ + 0) = ¼
Second example:
p(neighbor deg 2) = (1/3) (1 + 0 + 1) = 2/3
p(neighbor deg 1) = (1/3) (0 + 1 + 0) = 1/3
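The two sampling schemes can be compared directly on small networks. The R sketch below assumes the two examples are a 4-node path and a 3-node path; the slides do not state the networks, and these are chosen here only because they reproduce the numbers in both formulations. The function names are hypothetical.

```r
# Assumed example networks, stored as adjacency lists
path4 <- list(c(2), c(1, 3), c(2, 4), c(3))   # degrees 1, 2, 2, 1
path3 <- list(c(2), c(1, 3), c(2))            # degrees 1, 2, 1

# Link first: pick a link uniformly, then one of its two ends uniformly.
# A node is then reached with probability degree / (2 * number of links).
p_link_first <- function(adj, target_degree) {
  deg <- lengths(adj)
  sum(deg[deg == target_degree]) / sum(deg)
}

# Node first: pick a node uniformly, then one of its neighbors uniformly.
p_node_first <- function(adj, target_degree) {
  deg <- lengths(adj)
  mean(sapply(adj, function(nbrs) mean(deg[nbrs] == target_degree)))
}

rbind(
  link_first = c(path4 = p_link_first(path4, 2), path3 = p_link_first(path3, 2)),
  node_first = c(path4 = p_node_first(path4, 2), path3 = p_node_first(path3, 2))
)
```

In both schemes the higher-degree nodes are found more often than their share of the population, which is the link to preferential attachment.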
44
Friends of friends
• Randomly find a node
• Randomly pick one of the nodes it attached to
• The chance of finding some node via the second part of this procedure is proportional to its degree: the more neighbors a node has, the more likely you are to first find one of its neighbors and then reach it.
45
Simple Hybrid
• Fraction a uniformly at random, 1-a via preferential attachment:
• d d_i(t)/dt = am/t + (1-a) d_i(t)/(2t), with initial condition d_i(i) = m

Differential equations solution: (See EK ch. 18, advanced material for a related derivation)
• d_i(t) = (m + 2am/(1-a)) (t/i)^((1-a)/2) - 2am/(1-a)
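One way to reach this solution is to shift the degree by a constant so that the equation becomes the same separable form as pure preferential attachment. The sketch below introduces the helper variable phi_i(t), which is not in the slides.

```latex
\[
\phi_i(t) = d_i(t) + \frac{2am}{1-a}
\;\Longrightarrow\;
\frac{d\phi_i(t)}{dt}
= \frac{am}{t} + \frac{(1-a)\,d_i(t)}{2t}
= \frac{1-a}{2t}\,\phi_i(t)
\]
\[
\phi_i(t) = \phi_i(i)\left(\frac{t}{i}\right)^{(1-a)/2},
\qquad
\phi_i(i) = m + \frac{2am}{1-a}
\]
\[
d_i(t) = \left(m + \frac{2am}{1-a}\right)\left(\frac{t}{i}\right)^{(1-a)/2} - \frac{2am}{1-a}
\]
```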
46
Degree distribution
• Nodes that have expected degree less than d at some time t are those i such that:
(m + xam)(t/i)^(1/x) - xam < d, where x = 2/(1-a).
The critical i is such that i/t = [(m + xam)/(d + xam)]^x
47
Degree distribution
• F(d) = (t – i)/t
• F(d) = 1 – ((m + amx)/(d + amx))^x
where x = 2/(1-a)
Consider a as a randomness parameter.
• a near 1: nearly exponential
• a near 0: nearly preferential
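The two extremes can be checked numerically. The R sketch below evaluates the hybrid F(d) for a close to 0 and a close to 1 and compares it with the preferential attachment form 1 - (m/d)^2 and the random growth form 1 - e^(-(d-m)/m). The function name and the values m = 2, d = 5 are assumptions for illustration.

```r
# Hybrid model CDF: F(d) = 1 - ((m + a*m*x) / (d + a*m*x))^x, x = 2 / (1 - a)
hybrid_cdf <- function(d, m, a) {
  x <- 2 / (1 - a)
  1 - ((m + a * m * x) / (d + a * m * x))^x
}

m <- 2
d <- 5
near_preferential <- hybrid_cdf(d, m, a = 1e-8)     # a near 0
near_exponential  <- hybrid_cdf(d, m, a = 0.9999)   # a near 1

c(near_preferential = near_preferential, preferential = 1 - (m / d)^2)
c(near_exponential = near_exponential, exponential = 1 - exp(-(d - m) / m))
```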
48
Fitting hybrid models
• F(d) = 1 – ((m + amx)/(d + amx))^x
where x = 2/(1-a)
• log(1 - F(d)) = c – x log(d + amx)
• estimate m directly (see last slide)
• select a to minimize the distance between the actual distribution and the model's distribution
49
Spans Extremes
• F(d) = 1 – ((m + amx)/(d + amx))^x
where x = 2/(1-a)
• a near 1: nearly exponential
• a near 0: nearly preferential
50
Fitting hybrid models to the data
• F(d) = 1 – ((m + amx)/(d + amx))^x, where x = 2/(1-a)
• log(1 - F(d)) = c – x log(d + amx), where c = x log(m + amx)
• Estimate m directly from the data: it is one-half the average degree in the network.
– Note: total degree is 2tm, so m = (total degree)/(2t); in a network growth model, t is approximately the number of nodes.
• Select a to minimize the distance between the actual distribution and the model's distribution (use linear regression).
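The fitting steps can be sketched in R as follows. This is not the regression procedure named on the slide: it swaps in a simple grid search over a, picking the value whose model log(1 - F(d)) is closest, in squared error, to the empirical one. The function name and the grid are hypothetical.

```r
# Hypothetical helper: choose a by grid search on the log complementary CDF.
fit_hybrid_a <- function(degrees, a_grid = seq(0.01, 0.99, by = 0.01)) {
  m <- mean(degrees) / 2                   # m is one-half the average degree
  d_vals <- sort(unique(degrees))
  ccdf <- 1 - ecdf(degrees)(d_vals)        # empirical 1 - F(d)
  keep <- ccdf > 0                         # drop the largest degree, where 1 - F = 0
  sse <- sapply(a_grid, function(a) {
    x <- 2 / (1 - a)
    # Model: log(1 - F(d)) = x * log(m + a*m*x) - x * log(d + a*m*x)
    model <- x * (log(m + a * m * x) - log(d_vals[keep] + a * m * x))
    sum((log(ccdf[keep]) - model)^2)
  })
  list(m = m, a = a_grid[which.min(sse)])
}
```

A call such as fit_hybrid_a(deg), where deg is a numeric vector of node degrees, returns the estimated m and the grid value of a with the smallest squared error. Because the error is unweighted, the sparse upper tail of the distribution can have a noticeable influence on the chosen a.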
51
