
Session 7: Preferential attachment and random growth models

IDS 564, Prof. Ali Tafti

1
“sum of many small independent random effects...” (EK 18.1)

plot(function(x) dnorm(x), -4, 4, main = "Normal density")

2
2^(-k) vs. k^(-2)

exponential distribution vs. power-law (scale-free) distribution

3
Writing the power law (scale-free) distribution in linear form:

f(k) = a / (k^c) = a k^(-c), where c is some positive constant, e.g., 2 or 3

ln f(k) = ln(a) - c ln(k) (just take the log of both sides)

Rewrite ln(a) as Beta_0 (this is a constant)
Rewrite -c as Beta_1
Rewrite ln f(k) as Y
Rewrite ln(k) as X

Y = Beta_0 + Beta_1 * X + epsilon
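
A minimal sketch in R of this regression, assuming exact power-law frequencies with illustrative values a = 5 and c = 2.5 (these numbers are not from the slides). With real data the points scatter around the line, so epsilon is nonzero and the estimates are only approximate.

```r
# Illustrative (assumed) parameters of f(k) = a * k^(-c)
a_true <- 5
c_true <- 2.5
k <- 1:50
f <- a_true * k^(-c_true)

# Regress Y = ln f(k) on X = ln(k):
# the intercept estimates Beta_0 = ln(a), the slope estimates Beta_1 = -c.
fit <- lm(log(f) ~ log(k))
beta <- unname(coef(fit))
a_hat <- exp(beta[1])
c_hat <- -beta[2]
c(a_hat = a_hat, c_hat = c_hat)
```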

5
Y = Beta_0 + Beta_1 * X + epsilon

Figure 18.2. A power law distribution (such as this one for the number of Web page in-links, from Broder et al. [79]) shows up as a straight line on a log-log plot.
6
Erdős-Rényi random graph
• Start with n nodes

• Each link is formed independently with some probability p

• Serves as a benchmark
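
The three bullets above can be sketched in base R as follows. The function name `simulate_er` and the values n = 200, p = 0.05 are illustrative assumptions, not part of the slides.

```r
# Each of the n * (n - 1) / 2 possible links forms independently with probability p.
simulate_er <- function(n, p) {
  adj <- matrix(0L, n, n)
  upper <- upper.tri(adj)                       # one entry per unordered pair of nodes
  adj[upper] <- rbinom(sum(upper), size = 1, prob = p)
  adj + t(adj)                                  # symmetrize: links are undirected
}

set.seed(564)
g <- simulate_er(n = 200, p = 0.05)
degrees <- rowSums(g)
mean(degrees)  # should be close to (n - 1) * p = 9.95
```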

8
Random Graph: Degree Distribution

• Probability that a node has d links is binomial:

[ (n-1)! / (d! (n-1-d)!) ] p^d (1-p)^(n-1-d)

• For large n and small p, this is approximately a Poisson distribution:

[ (n-1)^d / d! ] p^d e^(-(n-1)p)

• Hence the name "Poisson random graphs"
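
A quick numerical check of the approximation, using R's built-in `dbinom` and `dpois`. The values n = 1000 and p = 0.01 are assumed for illustration.

```r
n <- 1000
p <- 0.01
d <- 0:30

binom <- dbinom(d, size = n - 1, prob = p)   # exact binomial degree distribution
pois  <- dpois(d, lambda = (n - 1) * p)      # Poisson approximation with mean (n - 1) * p

max_gap <- max(abs(binom - pois))            # largest pointwise difference over d = 0..30
max_gap
```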

9
Poisson distribution: random graphs
[Figure: chart of Poisson P(d) = (EXP(-(n-1)*p)*((n-1)*p)^d)/FACT(d); horizontal axis ticks 1 to 20, vertical axis from 0 to 0.16.]

10
Poisson

11
Scale-free

12
Figure 18.3. The distribution of popularity.
13
x-axis is ordered by popularity of book

Figure 18.4. The distribution of popularity.


14
PURE RANDOM SELECTION IN GROWING NETWORKS

15
What do they add?
• Captures something in the real world that is not
captured in the static models…

• Dynamic model…

• Natural way of varying degree distributions, without just building them in. A model that ends up generating features that look like the real world.

16
Benchmark model: uniformly random growth

• Each date a new node is born

• Forms m links to existing nodes

• Each node is chosen with equal likelihood

17
Degree distribution
• Start with a complete graph of m nodes (i.e. fully
connected)

• Each new node forms m links to existing nodes; at time t there are t existing nodes

• An existing node has probability m/t of getting a new link in each period
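
A minimal simulation sketch of this growth process in R. It follows the slide in seeding with a complete graph on m nodes; choosing each new node's m targets without replacement (so no duplicate links) is an added assumption, as is the function name.

```r
simulate_uniform_growth <- function(t_max, m) {
  stopifnot(m >= 1, t_max > m)
  deg <- numeric(t_max)
  deg[seq_len(m)] <- m - 1                 # complete seed graph on m nodes
  for (new in (m + 1):t_max) {
    # Assumption: m distinct targets, each existing node equally likely
    targets <- sample.int(new - 1, m)
    deg[targets] <- deg[targets] + 1
    deg[new] <- m                          # the m links formed at birth
  }
  deg
}

set.seed(564)
deg <- simulate_uniform_growth(t_max = 2000, m = 5)
mean(deg)        # close to 2 * m, since each new node adds 2 * m to the total degree
head(deg, 10)    # early-born nodes tend to have the highest degrees
```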

18
Distribution of expected degrees
• Expected degree at time t for node i born at time i, where m < i < t, is:
m + m/(i+1) + m/(i+2) + ... + m/t

(The leading m counts the links formed at birth.)

19
Distribution of expected degrees
• Expected degree at time t for node i born at time i, where m < i < t, is:
m + m/(i+1) + m/(i+2) + ... + m/t

(The term m/(i+1) is the number of links expected from the next node, born at time i + 1.)

Time starts at 0 with one node and no connections. At time 1, there is one already-existing node. At time t, an entering node chooses m links among the t already-existing nodes.

21
Distribution of expected degrees
• Expected degree at time t for node i born at time i, where m < i < t, is:
m + m/(i+1) + m/(i+2) + ... + m/t

The sum m/(i+1) + ... + m/t is a partial harmonic sum, so the expected degree is approximated by

m(1 + log(t/i))

Nodes that have expected degree less than d at some time t are those such that m(1 + log(t/i)) < d, that is, those with i > t e^(-(d-m)/m)
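
A small R check of how close the approximation is to the exact sum. The function names and the values i = 50, t = 100, m = 1 are chosen for illustration.

```r
# Exact expected degree: m + m/(i+1) + ... + m/t
expected_degree_exact <- function(i, t, m) m + m * sum(1 / ((i + 1):t))

# Approximation: m * (1 + log(t / i))
expected_degree_approx <- function(i, t, m) m * (1 + log(t / i))

exact  <- expected_degree_exact(i = 50, t = 100, m = 1)
approx <- expected_degree_approx(i = 50, t = 100, m = 1)
c(exact = exact, approx = approx, gap = approx - exact)
```

The gap shrinks as i grows, so the approximation is weakest for the very oldest nodes.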
22
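m <- 1    # links formed by each new node
t <- 100  # current time

# Expected degree m * (1 + log(t / i)) for a node born at time i.
# Start at i = 1, because log(t / i) is infinite at i = 0.
plot(function(i) m * (1 + log(t / i)), 1, t,
     main = "Expected degree at t = 100 for birth time i",
     xlab = "Birth date i", ylab = "Expected degree at time t")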

Curvature means it gets harder to gain new links over time: an existing node gains a link with probability m/t in each period, and this shrinks as t grows.

23
25
Left panel: x-axis corresponds to red dotted line
Right panel: x-axis corresponds to green dotted line

26
Based on Matthew Jackson ch. 4

m = 20, t = 100, d = 35:

20(1 + log(100/i)) < 35

i > 100 e^(-(35-20)/20) = 47.2

Nodes with i > 47.2 make up a fraction (100 - 47.2)/100 of the nodes at time 100. So 52.8/100, or a fraction 0.528 of the nodes, have expected degree less than 35.

27
Distribution of expected degrees
• Expected degree at time t for node i born at time i, where m < i < t, is:
m + m/(i+1) + m/(i+2) + ... + m/t

Nodes that have expected degree less than d at some time t are those such that m(1 + log(t/i)) < d, that is, those with i > t e^(-(d-m)/m)

F_t(d) = (t - t e^(-(d-m)/m))/t = 1 - e^(-(d-m)/m)

(Even though this function describes the expected degrees, it is a good approximation for the actual degree distribution; MJ ch. 4 has more detail.)
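
As a sketch, F_t(d) can be written as a one-line R function and checked against the earlier worked example with m = 20 and d = 35. The function name is an assumption, as is returning 0 for d below m (no node has fewer than the m links it forms at birth).

```r
# Fraction of nodes with expected degree less than d under uniformly random growth.
uniform_growth_cdf <- function(d, m) {
  ifelse(d < m, 0, 1 - exp(-(d - m) / m))
}

frac <- uniform_growth_cdf(35, m = 20)   # 1 - exp(-0.75), about 0.528
frac
```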

28
MEAN FIELD APPROXIMATION

29
Mean field approximation
• Continuous time approximation
• Distribution of expected degrees
• Check by simulation??

30
Mean-field approximation

RANDOM GROWTH

31
Distribution of expected degrees
(random growth)
• d d_i(t)/dt = m/t, with d_i(i) = m
• Solution (simple differential equation):
d_i(t) = m + m log(t/i)
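
For reference, the solution follows by integrating the rate m/s from the birth date i to time t and adding the initial condition:

```latex
\frac{d\,d_i(t)}{dt} = \frac{m}{t}, \qquad d_i(i) = m
\quad\Longrightarrow\quad
d_i(t) = m + \int_i^t \frac{m}{s}\,ds = m + m \log\frac{t}{i}.
```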

32
Mean-field approximation

PREFERENTIAL ATTACHMENT
GROWTH

33
Distribution of expected degrees
(preferential attachment)
• d d_i(t)/dt = m (d_i(t)/(2tm)) = d_i(t)/(2t), with d_i(i) = m

• d_i(t) = m (t/i)^(1/2)
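
A short derivation of this solution. Each of the m new links goes to node i with probability d_i(t)/(2tm), its share of the total degree 2tm, and the resulting equation is separable:

```latex
\frac{d\,d_i(t)}{dt} = \frac{d_i(t)}{2t}
\;\Longrightarrow\;
\int_{m}^{d_i(t)} \frac{du}{u} = \frac{1}{2}\int_i^t \frac{ds}{s}
\;\Longrightarrow\;
\log\frac{d_i(t)}{m} = \frac{1}{2}\log\frac{t}{i}
\;\Longrightarrow\;
d_i(t) = m\left(\frac{t}{i}\right)^{1/2}.
```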

35
m=1

36
37
By Matthew Jackson (Fall 2015 MOOC)

38
Distribution of expected degrees
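• Nodes with expected degree less than d at time t are those with m (t/i)^(1/2) < d, that is, i > t (m/d)^2
• F_t(d) = 1 - (m/d)^2 and f_t(d) = 2m^2/d^3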
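
One way to check this mean-field result by simulation, sketched in R. The assumptions here are not from the slides: the seed is a complete graph on m + 1 nodes (so every seed node starts with degree m), and targets are drawn with replacement, so repeated links are allowed. Drawing uniformly from the list of link endpoints picks a node with probability proportional to its degree.

```r
simulate_pref_attachment <- function(t_max, m) {
  stopifnot(m >= 1, t_max > m + 1)
  seed <- m + 1
  pairs <- t(combn(seed, 2))                   # links of the complete seed graph
  n_links <- nrow(pairs) + m * (t_max - seed)
  stubs <- integer(2 * n_links)                # every link contributes two endpoints
  n_stubs <- 2 * nrow(pairs)
  stubs[seq_len(n_stubs)] <- as.vector(pairs)
  for (new in (seed + 1):t_max) {
    # A uniform draw from the endpoints is a degree-proportional draw of nodes
    targets <- stubs[sample.int(n_stubs, m, replace = TRUE)]
    stubs[n_stubs + seq_len(2 * m)] <- c(targets, rep(new, m))
    n_stubs <- n_stubs + 2 * m
  }
  tabulate(stubs, nbins = t_max)               # degree of each node
}

set.seed(564)
m <- 2
deg <- simulate_pref_attachment(t_max = 5000, m = m)
d <- c(3, 5, 10, 20)
cbind(d,
      simulated  = sapply(d, function(x) mean(deg < x)),
      mean_field = 1 - (m / d)^2)
```

The two columns should be broadly similar rather than identical, since the mean-field formula describes expected degrees in continuous time.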

39
HYBRID MODELS

40
Model of hybrid attachment
• A fraction a of incoming links is made uniformly at random; the remaining fraction 1 - a is made by searching the neighborhoods of those nodes (friends of friends).

• A new node meets am nodes uniformly at random and directs links to them. Then it meets (1-a)m of their neighbors and attaches to them as well.

41
Relation to preferential attachment
• In a network with half degree k and half degree 2k individuals:
• Randomly select a link and then a node on one end of it: 2/3 chance that it has degree 2k, 1/3 chance that it has degree k

Left example:
p(neighbor deg 2) = (1/3) (1/2 + 1 + 1/2) = 2/3
p(neighbor deg 1) = (1/3) (1/2 + 0 + 1/2) = 1/3

Right example:
p(neighbor deg 2) = (1/2) (1/2 + 1/2) = 1/2
p(neighbor deg 1) = (1/2) (1/2 + 1/2) = 1/2

42
Relation to preferential attachment
• Consider a similar model: Randomly select a node, and then look at its neighbor. Over a very large network, this will approximate the prior model of selecting a link first.

• However, we expect a numerical difference with small models. Examples (compare to prior formulation):

Left example:
p(neighbor deg 2) = 1/4 (1 + 1/2 + 1/2 + 1) = 3/4
p(neighbor deg 1) = 1/4 (0 + 1/2 + 1/2 + 0) = 1/4

Right example:
p(neighbor deg 2) = (1/3) (1 + 0 + 1) = 2/3
p(neighbor deg 1) = (1/3) (0 + 1 + 0) = 1/3
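
The example networks themselves are figures that do not survive in this text. The numbers are consistent with path graphs on 4 nodes (left) and 3 nodes (right), so the R sketch below assumes those two networks and computes both the link-first and the node-first probabilities. The function names are illustrative.

```r
# Assumed example networks: a path graph 1 - 2 - ... - n
path_graph <- function(n) {
  adj <- matrix(0, n, n)
  adj[cbind(1:(n - 1), 2:n)] <- 1
  adj + t(adj)
}

# Probability that the node found has each degree, under the two procedures.
neighbor_degree_probs <- function(adj) {
  deg <- rowSums(adj)
  # Link first: random link, then a random end, reaches node j with prob. deg[j] / sum(deg)
  link_first <- sapply(split(deg / sum(deg), deg), sum)
  # Node first: random node i, then a random neighbor j of i
  reach <- colSums(adj / deg) / nrow(adj)      # adj / deg divides row i by deg[i]
  node_first <- sapply(split(reach, deg), sum)
  rbind(link_first, node_first)                # columns are labeled by degree
}

neighbor_degree_probs(path_graph(4))   # left example
neighbor_degree_probs(path_graph(3))   # right example
```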

44
Friends of friends
• Randomly find a node

• Randomly pick one of the nodes it attached to

• The chance of finding a given node via the second part of this procedure is proportional to its degree: the more neighbors a node has, the more likely you are to first find one of its neighbors...

45
Simple Hybrid
• Fraction a uniformly at random, 1-a via preferential attachment:

• d d_i(t)/dt = am/t + (1-a) d_i(t)/(2t), with d_i(i) = m

Differential equation solution (see EK ch. 18, advanced material, for a related derivation):
• d_i(t) = (m + 2am/(1-a)) (t/i)^((1-a)/2) - 2am/(1-a)
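
A numerical sanity check in R that this solution satisfies the differential equation and the initial condition. The parameter values (m = 2, a = 0.3, i = 10, t = 50) and function names are arbitrary illustrations; the formula requires a < 1.

```r
# Closed-form solution d_i(t) for the simple hybrid model (requires a < 1)
hybrid_expected_degree <- function(t, i, m, a) {
  k <- 2 * a * m / (1 - a)
  (m + k) * (t / i)^((1 - a) / 2) - k
}

# Right-hand side of the differential equation: am/t + (1 - a) d / (2t)
hybrid_rate <- function(d, t, m, a) a * m / t + (1 - a) * d / (2 * t)

m <- 2; a <- 0.3; i <- 10; t_now <- 50; h <- 1e-5

# Central-difference estimate of d d_i(t)/dt at t_now
slope <- (hybrid_expected_degree(t_now + h, i, m, a) -
          hybrid_expected_degree(t_now - h, i, m, a)) / (2 * h)
rhs <- hybrid_rate(hybrid_expected_degree(t_now, i, m, a), t_now, m, a)

c(slope = slope, rhs = rhs, degree_at_birth = hybrid_expected_degree(i, i, m, a))
```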

46
Degree distribution
• Nodes that have expected degree less than d at some time t are those i such that:

(m + xam)(t/i)^(1/x) - xam < d, where x = 2/(1-a).

The critical i is such that i/t = [(m + xam)/(d + xam)]^x

47
Degree distribution
• F(d) = (t - i)/t, with i the critical birth date

• F(d) = 1 - ((m+amx)/(d+amx))^x
where x = 2/(1-a)

Consider a as a randomness parameter.

• a near 1: nearly exponential
• a near 0: nearly preferential
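
The two limits can be checked numerically with a short R sketch. The function name and the test values (m = 2, d = 2, 4, 8, a = 0.999 standing in for "a near 1") are assumptions for illustration.

```r
# Hybrid-model distribution F(d) = 1 - ((m + a m x) / (d + a m x))^x, x = 2 / (1 - a)
hybrid_cdf <- function(d, m, a) {
  x <- 2 / (1 - a)
  1 - ((m + a * m * x) / (d + a * m * x))^x
}

m <- 2
d <- c(2, 4, 8)

# a = 0: pure preferential attachment, F(d) = 1 - (m / d)^2
gap_pref <- max(abs(hybrid_cdf(d, m, a = 0) - (1 - (m / d)^2)))

# a near 1: close to the exponential form 1 - exp(-(d - m) / m)
gap_exp <- max(abs(hybrid_cdf(d, m, a = 0.999) - (1 - exp(-(d - m) / m))))

c(gap_pref = gap_pref, gap_exp = gap_exp)
```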
48
Fitting hybrid models
• F(d) = 1 - ((m+amx)/(d+amx))^x
where x = 2/(1-a)

• log(1-F(d)) = c - x log(d+amx)
• estimate m directly (see last slide)
• select a to minimize distance between actual distribution and model's distribution

49
Spans Extremes
• F(d) = 1 - ((m+amx)/(d+amx))^x
where x = 2/(1-a)

• a near 1: nearly exponential

• a near 0: nearly preferential

50
Fitting hybrid models to the data
• F(d) = 1 - ((m+amx)/(d+amx))^x
• log(1-F(d)) = c - x log(d+amx)

• Estimate m directly from the data, because it is one-half the average degree in the network.
– Note: total degree is 2tm, so m = (total degree)/(2t); in a network growth model, t is approximately the number of nodes.

• Select a to minimize the distance between the actual distribution and the model's distribution (use linear regression).
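
A rough R sketch of this fitting recipe. It swaps in a simpler technique than the regression named above: a grid search over a that minimizes the squared distance between the empirical and model values of log(1 - F(d)). The function names, the grid, and the synthetic data (drawn from the model by inverse transform with m = 2, a = 0.5) are all assumptions for illustration.

```r
# Model value of 1 - F(d), with x = 2 / (1 - a)
hybrid_ccdf <- function(d, m, a) {
  x <- 2 / (1 - a)
  ((m + a * m * x) / (d + a * m * x))^x
}

fit_hybrid_a <- function(deg, m = mean(deg) / 2,
                         grid = seq(0.01, 0.99, by = 0.01)) {
  d_vals <- sort(unique(deg))
  d_vals <- d_vals[d_vals >= m]                        # the model is defined for d >= m
  # Empirical 1 - F(d): fraction of nodes whose degree is not less than d
  emp <- sapply(d_vals, function(d) mean(deg >= d))
  loss <- sapply(grid, function(a) {
    sum((log(emp) - log(hybrid_ccdf(d_vals, m, a)))^2)
  })
  grid[which.min(loss)]                                # grid value of a with the best fit
}

# Synthetic "degrees" drawn from the model (continuous, so only a rough stand-in for data)
set.seed(564)
m_true <- 2; a_true <- 0.5; x_true <- 2 / (1 - a_true)
shift <- a_true * m_true * x_true
deg <- (m_true + shift) * runif(5000)^(-1 / x_true) - shift

mean(deg) / 2        # estimate of m: one-half the average degree
fit_hybrid_a(deg)    # estimate of a
```

With real networks the estimate can be sensitive to the few highest-degree nodes, so it is worth plotting the empirical and fitted curves on log-log axes before trusting a single number.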

51
