
COMS 6998-06 Network Theory

Week 2: January 31, 2008

Dragomir R. Radev
Thursdays, 6-8 PM
233 Mudd
Spring 2008
(3) Random graphs
Statistical analysis of networks
• We want to be able to describe the behavior of
networks under certain assumptions.
• The behavior is described by the diameter,
clustering coefficient, degree distribution, size of
the largest connected component, the presence
and count of complete subgraphs, etc.
• For statistical analysis, we need to introduce the
concept of a random graph.
Erdos-Renyi model
• A very simple model with several variants.
• We fix n and include each candidate edge independently with probability p. This defines an ensemble $G_{n,p}$.
• The two examples below are specific instances of $G_{10,0.2}$.
In other models, the number of edges m is fixed instead. There are also versions in which some graphs are more likely than others, etc.

Try Pajek
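Below is a minimal Python sketch of how one instance of $G_{n,p}$ can be drawn. It assumes the nodes are labelled 0..n-1. The function name `sample_gnp` and the seeds are illustrative choices, not part of the slides. Pajek and the other packages listed at the end of these notes provide their own generators.

```python
import itertools
import random


def sample_gnp(n, p, seed=None):
    """Return the edge list of one random graph drawn from G(n, p).

    Each of the n(n-1)/2 candidate edges (i, j) with i < j is included
    independently with probability p.
    """
    rng = random.Random(seed)
    return [(i, j) for i, j in itertools.combinations(range(n), 2)
            if rng.random() < p]


# Two independent instances of G(10, 0.2).
for seed in (1, 2):
    edges = sample_gnp(10, 0.2, seed=seed)
    print(f"seed={seed}: {len(edges)} edges: {edges}")
```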
Erdos-Renyi model
• We are interested in the computation of specific properties of E-R random graphs.
• The number of candidate edges is: $\binom{n}{2} = \frac{n(n-1)}{2}$
• The actual number of edges m is on average: $\langle m \rangle = p\,\frac{n(n-1)}{2}$
• We will look at the actual distribution in a bit.
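Each of the $\binom{n}{2}$ candidate edges is present independently with probability p, so m follows a binomial distribution. The sketch below uses a hypothetical helper, `edge_count_distribution`, and only the standard library. It computes that distribution exactly for $G_{10,0.2}$ and checks that its mean matches $p\,n(n-1)/2$.

```python
from math import comb


def edge_count_distribution(n, p):
    """Exact distribution of the number of edges m in G(n, p).

    With N = n(n-1)/2 independent candidate edges, m ~ Binomial(N, p).
    Returns a list whose entry m is P(number of edges = m).
    """
    big_n = n * (n - 1) // 2                      # number of candidate edges
    return [comb(big_n, m) * p**m * (1 - p) ** (big_n - m)
            for m in range(big_n + 1)]


n, p = 10, 0.2
dist = edge_count_distribution(n, p)
mean_m = sum(m * prob for m, prob in enumerate(dist))
print(f"candidate edges: {n * (n - 1) // 2}")
print(f"<m> = {mean_m:.4f}, p*n*(n-1)/2 = {p * n * (n - 1) / 2}")
```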
Properties
• The expected value of a Poisson-
distributed random variable is equal to λ
and so is its variance.
• The mode of a Poisson-distributed random
variable with non-integer λ is equal to
floor(λ), which is the largest integer less
than or equal to λ. When λ is a positive
integer, the modes are λ and λ − 1.
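Here is a quick numerical check of these properties, written as a sketch with hypothetical helper names. The pmf is evaluated in log space to avoid overflow. The infinite sums are truncated at `kmax`, an assumption that is harmless for small λ.

```python
from math import exp, lgamma, log


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), lam > 0, computed in log space."""
    return exp(k * log(lam) - lam - lgamma(k + 1))


def poisson_moments(lam, kmax=100):
    """Mean and variance from the pmf, truncating the sums at kmax."""
    probs = [poisson_pmf(k, lam) for k in range(kmax + 1)]
    mean = sum(k * q for k, q in enumerate(probs))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
    return mean, var


def poisson_modes(lam, kmax=100):
    """All k in 0..kmax whose pmf equals the maximum (up to rounding)."""
    probs = [poisson_pmf(k, lam) for k in range(kmax + 1)]
    top = max(probs)
    return [k for k, q in enumerate(probs) if abs(q - top) <= 1e-12 * top]


for lam in (2.5, 3.0):
    mean, var = poisson_moments(lam)
    print(f"lambda={lam}: mean={mean:.6f}, variance={var:.6f}, "
          f"modes={poisson_modes(lam)}")
```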
Degree distribution
• The probability p(k) that a node has degree k is binomial:
$p(k) = \binom{n-1}{k} p^k (1-p)^{n-1-k}$
• In practice, for large n (n >> kz), this is the Poisson distribution
$p(k) = \frac{z^k e^{-z}}{k!}$
where z is the mean degree
• Average degree: $z = 2m/n = p(n-1) \approx pn$
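The sketch below compares the exact binomial degree distribution with its Poisson approximation for n = 10,000 and p = 0.0005, so z ≈ 5. These parameter values are illustrative and do not come from the slides. Both pmfs are computed in log space with `math.lgamma`.

```python
from math import exp, lgamma, log


def binomial_degree_pmf(k, n, p):
    """P(deg = k) in G(n, p), i.e. Binomial(n - 1, p), in log space."""
    log_comb = lgamma(n) - lgamma(k + 1) - lgamma(n - k)   # log C(n-1, k)
    return exp(log_comb + k * log(p) + (n - 1 - k) * log(1 - p))


def poisson_degree_pmf(k, z):
    """Poisson approximation with mean degree z."""
    return exp(k * log(z) - z - lgamma(k + 1))


n, p = 10_000, 0.0005
z = p * (n - 1)                    # mean degree, approximately p * n = 5
for k in range(0, 13, 2):
    print(f"k={k:2d}  binomial={binomial_degree_pmf(k, n, p):.5f}  "
          f"poisson={poisson_degree_pmf(k, z):.5f}")
```

The two columns should agree closely here because n is large and k stays small compared with n.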
Giant component size
• Let v be the number of nodes that are not in the giant component. Then u = v/n is the fraction of the graph outside of the giant component.
• If a node is outside of the giant component, its k neighbors are too. The probability of this happening is $u^k$. Averaging over the (Poisson) degree distribution $p_k$:
$u = \sum_{k=0}^{\infty} p_k u^k = e^{-z} \sum_{k=0}^{\infty} \frac{(zu)^k}{k!} = e^{z(u-1)}$
• Let S = 1 - u. We now have
$S = 1 - e^{-zS}$
For z < 1, the only non-negative solution is S = 0.
For z > 1 (after the phase transition), S = 0 is still a solution, but a second, positive solution appears. That positive solution is the size of the giant component, as a fraction of n.
• At the phase transition, the component sizes are distributed according to a power law with exponent 5/2.
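The self-consistency equation $S = 1 - e^{-zS}$ has no closed-form solution, but it is easy to solve numerically. Here is a minimal sketch using fixed-point iteration. The helper name is hypothetical.

```python
from math import expm1


def giant_component_fraction(z, tol=1e-12, max_iter=100_000):
    """Largest root S of S = 1 - exp(-z S) for an E-R graph with mean degree z.

    Fixed-point iteration started from S = 1. The map is increasing, so the
    iterates decrease monotonically to the largest root, which is 0 for z < 1.
    """
    s = 1.0
    for _ in range(max_iter):
        s_next = -expm1(-z * s)           # 1 - exp(-z s), accurate for small s
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s                              # may not have converged (z very close to 1)


for z in (0.5, 1.5, 2.0, 3.0, 5.0):
    print(f"z={z}: S={giant_component_fraction(z):.4f}")
```

Starting from S = 1 makes the iteration land on the largest root, which is the giant-component solution when z > 1. For z < 1 the iterates decay to 0. Convergence becomes very slow near z = 1, where the slope of the map at the root approaches 1.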
Giant component size
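• Similarly one can prove that the mean size of the (non-giant) component to which a randomly chosen vertex belongs is
$\langle s \rangle = \frac{1}{1 - z + zS}$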
[Newman 2003]
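Combining this formula with the equation for S gives ⟨s⟩ as a function of z alone. The sketch below uses hypothetical helper names and finds S by bisection rather than fixed-point iteration. It also shows the divergence at the transition. Below the transition S = 0, so the formula reduces to $1/(1 - z)$.

```python
from math import expm1


def giant_fraction(z):
    """Largest root S of S = 1 - exp(-z S), by bisection (0 if z <= 1)."""
    if z <= 1.0:
        return 0.0                        # no giant component at or below the transition
    lo, hi = 1e-12, 1.0                   # g(lo) < 0 < g(hi), g(S) = S - (1 - e^{-zS})
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid + expm1(-z * mid) < 0:     # g(mid) < 0: the root lies above mid
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mean_small_component_size(z):
    """<s> = 1 / (1 - z + z S); infinite at z = 1."""
    denom = 1.0 - z + z * giant_fraction(z)
    return float("inf") if denom <= 0.0 else 1.0 / denom


for z in (0.5, 0.9, 0.99, 1.5, 2.0, 3.0):
    print(f"z={z}: <s> = {mean_small_component_size(z):.3f}")
```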
Diameter
• A given vertex i has $N_i^1$ first neighbors. The expected value of this number is z.
• But we also know that z = pn.
• Now move to $N_i^2$, the number of second neighbors of i. Let's make the assumption that these are the neighbors of the first neighbors, with each first neighbor contributing about z of them and no overlaps (a tree-like neighborhood). So, $N_i^2 \approx z N_i^1 \approx z^2$

• What does this remind you of?


• When must the procedure end?
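The recursion $N_i^{d} \approx z N_i^{d-1}$ can be checked empirically. A breadth-first search from one vertex counts how many vertices sit at each distance. The sketch below makes illustrative assumptions (n = 1000, z = 5, seed 42), and its helper names are hypothetical.

```python
import itertools
import random
from collections import deque


def sample_gnp_adjacency(n, p, seed=None):
    """Adjacency sets of one G(n, p) instance; each pair is linked with prob. p."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def shell_sizes(adj, source):
    """[N^1, N^2, ...]: number of vertices at exactly distance d from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:                          # breadth-first search
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    depth = max(dist.values())
    counts = [0] * depth
    for d in dist.values():
        if d > 0:
            counts[d - 1] += 1
    return counts


n, z = 1000, 5.0
adj = sample_gnp_adjacency(n, z / (n - 1), seed=42)   # p chosen so the mean degree is z
shells = shell_sizes(adj, source=0)
print("observed N^d:", shells)
print("z**d        :", [round(z**d) for d in range(1, len(shells) + 1)])
```

The first few shells should grow roughly by a factor of z until they approach n. At that point the tree-like assumption breaks down, and the shells stop growing.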
Diameter (cont’d)
For D equal to the diameter of the graph:
$N_i^D \approx z^D$
At all distances:
$n \approx N_i^1 + N_i^2 + \cdots + N_i^D \approx z^D$
In other words (after taking a logarithm):
$D \approx \frac{\log n}{\log z}$
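Plugging numbers into $D \approx \log n / \log z$ shows how slowly the diameter grows. The (n, z) pairs below are illustrative. The base of the logarithm cancels in the ratio.

```python
from math import log


def er_diameter_estimate(n, z):
    """Rough diameter of an E-R graph, D ~ log(n) / log(z); needs z > 1."""
    if z <= 1:
        raise ValueError("the estimate requires mean degree z > 1")
    return log(n) / log(z)


for n, z in [(10**4, 10), (10**6, 10), (10**9, 10), (10**6, 100)]:
    print(f"n = {n:>13,}, z = {z:>3}: D ~ {er_diameter_estimate(n, z):.2f}")
```

With z = 10, each factor of 1000 in n adds only about 3 to D. This is the "small world" behavior noted on the next slide.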
Are E-R graphs realistic?
• They have small-world properties (diameter is logarithmic in the size of the graph)
• But a low clustering coefficient. Example for Internet autonomous systems: compare the measured 0.30 with 0.0004 for a random graph [Pastor-Satorras and Vespignani]
• And unrealistic degree distributions
• Not to mention thin tails: a Poisson degree distribution has essentially no high-degree hubs
Clustering coefficient
• Given a vertex i and two of its neighbors j and k, what is the probability that the graph contains an edge between j and k?
• $C_i$ = #triangles at i / #triples at i (a triple at i is a pair of neighbors of i)
• C = average over all $C_i$
• Typical values in real graphs can be as high as 50% [Newman 2002].
• In random graphs, C = p (ignoring the fact that j and k share a neighbor, i).
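The sketch below applies the definition above to adjacency sets and checks that a sampled $G_{n,p}$ gives C close to p. It assumes that vertices with fewer than two neighbors are left out of the average; conventions differ on this point. The helper names, n = 500, p = 0.1 and the seed are illustrative.

```python
import itertools
import random


def sample_gnp_adjacency(n, p, seed=None):
    """Adjacency sets of one G(n, p) instance (hypothetical helper)."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def local_clustering(adj, i):
    """C_i = (#edges among neighbors of i) / (#pairs of neighbors of i)."""
    nbrs = list(adj[i])
    pairs = len(nbrs) * (len(nbrs) - 1) // 2          # triples centered on i
    if pairs == 0:
        return None                                   # undefined for degree < 2
    closed = sum(1 for j, k in itertools.combinations(nbrs, 2) if k in adj[j])
    return closed / pairs


def average_clustering(adj):
    """C = mean of C_i over vertices with degree >= 2 (one common convention)."""
    values = [c for c in (local_clustering(adj, i) for i in range(len(adj)))
              if c is not None]
    return sum(values) / len(values) if values else 0.0


n, p = 500, 0.1
adj = sample_gnp_adjacency(n, p, seed=1)
print(f"C = {average_clustering(adj):.4f}, p = {p}")
```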
Some real networks
• From Newman 2002. C is the clustering coefficient; the last column is C for an E-R random graph with the same n and z.

Network                      n          Mean degree z   C       C for random graph
Internet (AS level)          6,374      3.8             0.24    0.00060
WWW (sites)                  153,127    35.2            0.11    0.00023
Power grid                   4,941      2.7             0.080   0.00054
Biology collaborations       1,520,251  15.5            0.081   0.000010
Mathematics collaborations   253,339    3.9             0.15    0.000015
Film actor collaborations    449,913    113.4           0.20    0.00025
Company directors            7,673      14.4            0.59    0.0019
Word co-occurrence           460,902    70.1            0.44    0.00015
Neural network               282        14.0            0.28    0.049
Metabolic network            315        28.3            0.59    0.090
Food web                     134        8.7             0.22    0.065

[Newman 2002]
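The last column is consistent with the E-R prediction $C = p \approx z/n$. The sketch below recomputes z/n from the n and z columns of the table, along with the ratio C / (z/n). It uses only the table's own numbers.

```python
# (network, n, mean degree z, C, C for random graph), copied from the table above.
ROWS = [
    ("Internet (AS level)", 6_374, 3.8, 0.24, 0.00060),
    ("WWW (sites)", 153_127, 35.2, 0.11, 0.00023),
    ("Power grid", 4_941, 2.7, 0.080, 0.00054),
    ("Biology collaborations", 1_520_251, 15.5, 0.081, 0.000010),
    ("Mathematics collaborations", 253_339, 3.9, 0.15, 0.000015),
    ("Film actor collaborations", 449_913, 113.4, 0.20, 0.00025),
    ("Company directors", 7_673, 14.4, 0.59, 0.0019),
    ("Word co-occurrence", 460_902, 70.1, 0.44, 0.00015),
    ("Neural network", 282, 14.0, 0.28, 0.049),
    ("Metabolic network", 315, 28.3, 0.59, 0.090),
    ("Food web", 134, 8.7, 0.22, 0.065),
]

for name, n, z, c, c_rand in ROWS:
    predicted = z / n                     # E-R prediction C = p, with p ~ z / n
    print(f"{name:28s} z/n = {predicted:.2g}  table = {c_rand:.2g}  "
          f"C / (z/n) = {c / predicted:,.0f}")
```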
Graphs with predetermined degree sequences
• Bender and Canfield introduced this concept.
• For a given degree sequence, give the same statistical weight to all graphs in the ensemble.
• Generate a random degree sequence in proportion to the predefined degree distribution
• Note that the sum of degrees must be even, since every edge contributes 2 to it.
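One common way to sample from this ensemble is the stub-matching construction usually called the configuration model. Whether it is the construction intended on this slide is an assumption. Each vertex gets as many half-edges as its degree, and the half-edges are paired uniformly at random. Pairing half-edges is also why the degree sum must be even. Here is a minimal sketch with a hypothetical helper name.

```python
import random


def configuration_model(degrees, seed=None):
    """Random multigraph with the given degree sequence, by stub matching.

    Vertex v receives degrees[v] half-edges ("stubs"); the stubs are shuffled
    and paired off consecutively, so every perfect matching of the stubs is
    equally likely. Self-loops and multi-edges are kept.
    """
    if sum(degrees) % 2 != 0:
        raise ValueError("the sum of degrees must be even")
    stubs = [v for v, k in enumerate(degrees) for _ in range(k)]
    rng = random.Random(seed)
    rng.shuffle(stubs)
    return [(stubs[t], stubs[t + 1]) for t in range(0, len(stubs), 2)]


degrees = [3, 2, 2, 1, 2]                 # illustrative sequence; sum = 10 (even)
edges = configuration_model(degrees, seed=1)
print(edges)
```

This version keeps self-loops and multi-edges. If a simple graph is required, one option is to resample until none occur, which is practical only for modest degrees.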
(4) Software
List of packages
• Pajek:
http://vlado.fmf.uni-lj.si/pub/networks/pajek/
• Jung: http://jung.sourceforge.net/
• Guess: http://graphexploration.cond.org/
• Networkx: https://networkx.lanl.gov/wiki
• Pynetconv: http://pynetconv.sourceforge.net/
• Clairlib: http://www.clairlib.org
• UCINET
