Professional Documents
Culture Documents
Stochastic Blockmodels and Community Structure in Networks
Stochastic Blockmodels and Community Structure in Networks
works have also explored blockmodels with various forms tially no difference between the model described here and
of degree heterogeneity [17–21], motivated largely by the the standard blockmodel.
recent focus on degree distributions in the networks lit- With this in mind, the model we study is now defined
erature. We note particularly the currently unpublished as follows. Let G be an undirected multigraph on n ver-
work of Patterson and Bader [21], who apply a variational tices, possibly including self-edges, and let Aij be an el-
Bayes approach to a model close, though not identical, ement of the adjacency matrix of the multigraph. Recall
to the one considered here. that the adjacency matrix for a multigraph is convention-
In this paper we build upon the ideas of these au- ally defined such that Aij is equal to the number of edges
thors but take a somewhat different tack, focusing on between vertices i and j when i 6= j, but the diagonal el-
the question of why degree heterogeneity in blockmodels ement Aii is equal to twice the number of self-edges from
is a good idea. To study this question, we develop a i to itself (and hence is always an even number).
degree-corrected blockmodel with closed-form parameter We let the number of edges between each pair of ver-
solutions, which allows us more directly to compare tra- tices (or between a vertex and itself in the case of self-
ditional and degree-corrected models. As we show, the edges) be independently Poisson distributed and define
incorporation of degree heterogeneity in the stochastic ωrs to be the expected value of the adjacency matrix el-
blockmodel results in a model that in practice performs ement Aij for vertices i and j lying in groups r and s
much better, giving significantly improved fits to network respectively. Note that this implies that the expected
data, while being only slightly more complex than the number of self-edges at a vertex in group r is 21 ωrr be-
simple model described above. Although we here exam- cause of the factor of two in the definition of the diagonal
ine only the simplest version of this idea, the approaches elements of the adjacency matrix.
we explore could in principle be incorporated into other Now we can write the probability P (G|ω, g) of graph G
blockmodels, such as the overlapping or mixed member- given the parameters and group assignments as
ship models.
In outline, the paper is as follows. We first review Y (ωgi gj )Aij
P (G|ω, g) = exp −ωgi gj
the ideas behind the ordinary stochastic blockmodel to Aij !
i<j
understand why degree heterogeneity causes problems. Aii /2
1
Then we introduce a degree-corrected version of the Y
2 ωgi gi
exp − 21 ωgi gi .
model and demonstrate its use in a posteriori blockmod- × (1)
i
(Aii /2)!
eling to infer group memberships in empirical network
data, showing that the degree-corrected model outper- Given that Aij = Aji and ωrs = ωsr , Eq. (1) can after
forms the original model both on actual networks and on a small amount of manipulation be rewritten in the more
new synthetic benchmarks. The benchmarks introduced, convenient form
which generalize previous benchmarks for community de-
tection, may also be of independent interest. 1
P (G|ω, g) = Q Q A /2
A
i<j ij ! i2
ii (Aii /2)!
Y
mrs /2
exp − 21 nr ns ωrs ,
× ωrs (2)
II. STANDARD STOCHASTIC BLOCKMODEL rs
In this section we review briefly the use of the original, where nr is the number of vertices in group r and
non-degree-corrected blockmodel, focusing on undirected X
networks since they are the most commonly studied. mrs = Aij δgi ,r δgj ,s , (3)
ij
For consistency with the degree-corrected case we will
allow our networks to contain both multi-edges and self- which is the total number of edges between groups r and
edges, even though many real-world networks have no group s, or twice that number if r = s.
such edges. Like most random graph models for sparse Our goal is to maximize this probability with respect to
networks the incorporation of multi-edges and self-edges the unknown model parameters ωrs and the group assign-
makes computations easier without affecting the funda- ments of the vertices. In most cases, it will in fact be sim-
mental outcome significantly—typically their inclusion pler to maximize the logarithm of the probability (whose
gives rise to corrections to the results that are of or- maximum is in the same place). Neglecting constants
der 1/n and hence vanishing as the size n of the net- and terms independent of the parameters and group as-
work becomes large. For networks with multi-edges, the signments (i.e., independent of ωrs , nr , and mrs ), the
previously-defined probability ψrs of an edge between logarithm is given by
vertices in groups r and s is replaced by the expected
number of such edges, and the actual number of edges log P (G|ω, g) =
X
(mrs log ωrs − nr ns ωrs ). (4)
between any pair of vertices will be drawn from a Pois- rs
son distribution with this mean. In the limit of a large
sparse graph, where the probability of an edge and the We will maximize this expression in two stages, first
expected number of edges become equal, there is essen- with respect to the model parameters ωrs , then with
3
respect to the group assignments gi . The maximum- for X and Y instead of the assumed true distribution pK .
likelihood values ω̂rs of the model parameters (where hat- So intuitively it can be considered as measuring how far
ted variables indicate maximum-likelihood estimates) are pK is from p1 . The most likely group assignments under
found by simple differentiation to be the ordinary stochastic blockmodel are then those as-
mrs signments that require the most information to describe
ω̂rs = , (5) starting from a model that does not have group structure.
nr ns
This type of approach, in which one constructs an ob-
and the value P of Eq. (4) at this maximum is jective function that measures the difference between an
log P (G|ω̂, g) = rs mrs log(mrs /nr ns ) − 2m, where
observed quantity and the expected value of the same
m = 21 rs mrs is the total number of edges in the net- quantity under an appropriate null model, is common in
P
work. Dropping the final constant, we define the unnor- work on community detection in networks. One widely
malized log-likelihood for the group assignment g: used objective function is the so-called modularity:
X mrs 1 X
L(G|g) = mrs log . (6) Q(g) = [Aij − Pij ]δ(gi , gj ), (9)
rs
nr ns 2m ij
The maximum of this quantity with respect to the group where Aij is an element of the adjacency matrix and Pij
assignments now tells us the most likely set of assign- is the expected value of the same element under some
ments [11, 22]. In effect, Eq. (6) gives us an objective or null model. The null model assumed in our blockmodel
quality function which is large for “good” group assign- calculation is one in which Pij is constant. Making the
ments and small for “poor” ones. Many such objective same choice for the modularity would lead to
functions have been defined elsewhere in the literature on
community detection and graph partitioning, but Eq. (6) K
X
differs from most other choices in being derived from first Q(g) = [pK (r, r) − p1 (r, r)]. (10)
principles, rather than heuristically motivated or simply r=1
proposed ad hoc.
The modularity, however, is not normally used this way
Equation (6) has an interesting information-theoretic
and for good reason. This null model, corresponding to
interpretation. By adding and dividing by constant fac-
a multigraph version of the Erdős–Rényi random graph,
tors of the total number of vertices and edges the equa-
produces highly unrealistic networks, even for networks
tion can be written in the alternative form
with no community structure. Specifically, it produces
X mrs mrs /2m networks with Poisson degree distributions, in stark con-
L(G|g) = log , (7)
2m nr ns /n
2 trast to most real networks, which tend to have broad
rs
distributions of vertex degree. To avoid this problem,
where again we have neglected irrelevant constants. Now modularity is usually defined using a different null model
imagine, for a given set of group assignments, that we that fixes the expected degree sequence to be the same
choose an edge uniformly at random from our network, as that of the observed network. Within this model
and let X be the group assignment at one (randomly se- Pij = ki kj /2m where ki is the degree of vertex i. Then
lected) end of the edge and Y be the group assignment at the probability distribution over the group assignments
the other end of the edge. The probability distribution at the end of a randomly chosen edge becomes
of the variables X and Y is then pK (X = r, Y = s) = κr κs
pK (r, s) = mrs /2m, which appears twice in the above ex- pdegree (X = r, Y = s) = pdegree (r, s) = , (11)
2m 2m
pression. The remaining terms in the denominator of the
logarithm in (7) are equal to the expected value of the where
same probability in a network with the same group as- X X
signments but different edges, the edges now being placed κr = mrs = ki δgi ,r (12)
completely at random without regard for the groups. Call s i
this second distribution p1 (r, s). Equation (7) can then is the total number of ends of edges, commonly called
be written stubs, that emerge from vertices in group r, or equiva-
X pK (r, s) lently the sum of the degrees of the vertices in group r.
L(G|g) = pK (r, s) log , (8) (Note that Eq. (12) correctly counts two stubs for edges
rs
p1 (r, s)
that both start and end in group r.) Then the desired
which is the well-known Kullback–Leibler divergence be- group assignments are given by the maximum of
tween the probability distributions pK and p1 [23]. K
The Kullback–Leibler divergence is not precisely a dis- X
Q(g) = [pK (r, r) − pdegree (r, r)]. (13)
tance measure since it’s not symmetric in pK and p1 .
r=1
However, if the logarithms are taken base 2 then it mea-
sures the expected number of extra bits required to en- This choice of null model is found to give significantly
code X and Y if p1 is mistakenly used as the distribution better results than the original uniform model because
4
it allows for the fact that vertices with high degree are, form
all other things being equal, more likely to be connected
than those with low degree, simply because they have 1
P (G|θ, ω, g) = Q Q A /2
more edges. From an information-theoretic viewpoint, an i<j Aij ! i2
ii (Aii /2)!
edge between two high-degree vertices is less surprising Y Y
θiki mrs /2
exp − 21 ωrs , (16)
× ωrs
than an edge between two low-degree vertices and we get
i rs
better results if we incorporate this observation in our
model. with ki being the degree of vertex i as previously and mrs
Returning to the stochastic blockmodel, using p1 in- defined as in Eq. (3). As before, rather than maximizing
stead of pdegree in the objective function causes problems this probability, it is more convenient to maximize its
similar to those that affect the modularity. Fits to the logarithm, which, ignoring constants, is
model may incorrectly suggest that structure in the net-
work due merely to the degree sequence is a result in-
X X
log P (G|θ, ω, g) = 2 ki log θi + (mrs log ωrs − ωrs ).
stead of group memberships. We will shortly see explicit i rs
real-world cases in which such incorrect conclusions arise. (17)
The solution to this problem, as with the modularity, is Allowing for the constraint (15), the maximum-likelihood
to define a stochastic blockmodel that directly incorpo- values of the parameters θi and ωrs are then given by
rates arbitrary heterogeneous degree distributions.
ki
θ̂i = , ω̂rs = mrs , (18)
κgi
III. DEGREE-CORRECTED STOCHASTIC where κr is the sum of the degrees in group r as be-
BLOCKMODEL fore (see Eq. (12)). This maximum-likelihood parameter
estimate has the appealing property of preserving the
In the degree-corrected blockmodel, the probability expected numbers of edges between groups and the ex-
distribution over undirected multigraphs with self-edges pected degree sequence of the network. To see this, let
(again denoted G) depends not only on the parameters hxi be the average of x in the ensemble of graphs with
introduced previously but also on a new set of parame- parameters (18). Then the expected number of edges
ters θi controlling the expected degrees of vertices i. between groups r and s is
As before, we assume there are K groups, ωrs is a
X ki kj mgi gj
K × K symmetric matrix of parameters controlling edges
X
hAij iδgi ,r δgj ,s = δgi ,r δgj ,s = mrs , (19)
between groups r and s, and gi is the group assignment ij ij
κgi κgj
of vertex i. As in the uncorrected blockmodel, let the
numbers of edges each be drawn from a Poisson distri- where we have made use of Eq. (12). Similarly, the aver-
bution, but now, following [20] and [21], let the expected age degree of vertex i in the ensemble is
value of the adjacency matrix element Aij be θi θj ωgi gj .
Then graph G has probability X X ki X kj
hAij i = θ̂i θ̂j ω̂gi gj = mg i g j
j j
κgi j κgj
Y (θi θj ωgi gj )Aij
P (G|θ, ω, g) = exp(−θi θj ωgi gj ) ki X X kj
Aij ! = mg ,r δgj ,r
i<j κgi j r κr i
1 2
Aii /2
2 θ i ωgi gi ki X
Y
exp − 21 θi2 ωgi gi .
× = mgi ,r = ki . (20)
(Aii /2)! κgi r
i
(14)
Traditional blockmodels, by contrast, preserve only the
expected value of the matrix mrs and not the expected
The θ parameters are arbitrary to within a multiplicative degree—every vertex in group r in the
constant which is absorbed into the ω parameters. Their P traditional block-
model has the same expected degree j mr,gj /(nr ngj ) =
normalization can be fixed by imposing the constraint
κr /nr .
X Substituting Eq. (18) into Eq. (17), the maximum of
θi δgi ,r = 1 (15) log P (G|θ, ω, g) for the degree-corrected blockmodel is
i
X ki X
log P (G|θ, ω, g) = 2 + ki log
mrs log mrs − 2m.
for all groups r, which makes θi equal to the probability i
κgi rs
that an edge connected to the community to which i be- (21)
longs lands on i itself. With this constraint, the probabil- where as before m is the total number of edges in the net-
ity P (G|θ, ω, g) can be simplified to the more convenient work. The first term in this expression can be rewritten
5
Consider moving vertex i from community r to community s. Let kit be the number of edges from vertex i to
vertices in group t excluding self-edges, and let ui be the number of self-edges for vertex i. These quantities are the
same for all possible moves of vertex i. Define a(x) = 2x log x and b(x) = x log x where a(0) = 0 and b(0) = 0. Then
the change in the log-likelihood can be written:
X
∆L = a(mrt + kit ) − a(mrt ) + a(mst + kit ) − a(mst ) + a(mrs + kir − kis ) − a(mrs )
t6=r,s
+ b(mrr − 2(kir + ui )) − b(mrr ) + b(mss + 2(kis + ui )) − b(mss ) − a(κr − ki ) + a(κr ) − a(κs + ki ) + a(κs ).
(25)
This quantity can be evaluated in time O(K + hki) on average and finding the s that gives the maximum ∆L for
6
stochastic blockmodel, which is ideally designed to play In mixed models such as these, each edge in effect has a
exactly this role. (Indeed, though it is not the primary probability λ of being chosen from the planted structure
focus of this article, we believe that the blockmodel may and 1−λ of being chosen from the null model. Among the
in general be of use as a source of flexible and challeng- edges attached to a given vertex, the expected fraction
ing benchmark networks for testing the performance of drawn from the planted structure is λ and the remainder
community detection strategies.) are drawn randomly.
In order to generate networks we must first choose the Once we have chosen our values for g, θ, and ω,
values of g, ω, and θ. The group assignments g can be the network generation itself is a straightforward imple-
chosen in any way we please, and we can also choose mentation of the blockmodel: we first draw a Poisson-
freely the values for the expected degrees of all vertices, distributed number of edges for each pair of groups r, s
which fixes the θ variables according to Eq. (18). Choos- with mean ωrs (or 21 ωrs when r = s), then we assign each
ing the values of ωrs involves a little more work. In prin- end of an edge to a vertex in the appropriate group with
ciple, any set of nonnegative values is acceptable
P provided probability θi .
it is symmetric in r and s and satisfies s ωrs = κr , with
κr as in Eq. (12). However, because we wish to be able
to vary the level of community structure in our networks C. Performance on synthetic networks
we choose ωrs in the present case to have the particular
form
There are two primary considerations in comparing
ωrs = planted
λωrs + (1 − random
λ)ωrs . (27) the degree-corrected and uncorrected blockmodels on our
synthetic benchmark networks. The first is how close
This form allows us to interpolate linearly between the the group assignments found in our calculations are to
planted random the planted group assignments. The second is the per-
values ωrs and ωrs using the parameter λ. The
random formance of the heuristic optimization algorithm. It is
ωrs represents a fully random network with no group
structure; it is defined to be the expected value of ωrs in possible that the maximum-likelihood group assignment
a random graph with fixed expected degrees [32], which may be close to the true group assignment but that our
random heuristic is unable to find it. And if the heuristic per-
is simply ωrs = κr κs /2m.
planted forms better in general for either the corrected or un-
The value of ωrs by contrast is chosen to create
group structure. A simple example with four groups is: corrected blockmodel it may make comparisons between
the models unreliable: we want to claim that the degree-
κ1 0 0 0
corrected model gives better results than the uncorrected
0 κ2 0 0 version because it has a better objective function for het-
ω planted = . (28)
0 0 κ3 0 erogeneous networks and not because we used a biased
0 0 0 κ4 optimization algorithm.
To shed light on these questions we take the following
With this choice, all edges will be placed within com- approach. For both the degree-corrected model and the
munities when λ = 1 and none between communities. uncorrected model we perform tests with random initial
When λ = 0, on the other hand, all edges will be placed conditions and with initial conditions equal to the known
randomly, conditioned on the degree sequence, and for in- planted group structure. The latter (planted) initializa-
termediate values of λ we interpolate between these two tions tell us whether the planted group assignment, or
extremes in a controlled fashion. (This model is similar to something close to it, is a local optimum of the respec-
the benchmark network ensemble previously proposed by tive objective function—if it is, our heuristic should find
Lancichinetti [33]—roughly speaking it is the “canonical that optimum most of the time and return a final assign-
ensemble” version of the “microcanonical” model in [33].) ment similar to the planted one. This should be true for
planted
More complicated choices of ωrs are also possible. essentially any reasonable heuristic, even a biased one,
Examples include the core-periphery structure since the heuristic will be making only minimal changes
κ1 − κ2 κ2
to the group assignments (or none at all).
ω planted = , (29) For small values of λ we expect that the planted as-
κ2 0
signment is not near a local maximum, but for large λ we
where κ1 ≥ κ2 . In the case where κ1 ≃ κ2 this choice would hope that it is. Thus, if we discover in the process
also generates approximately bipartite networks, where of running our heuristic that it is not, it strongly sug-
most edges run between the two groups and few lie inside. gests we have made a poor choice of objective function
Another possibility is a hierarchical structure of the form (and this conclusion should hold even if the heuristic is
biased).
κ1 − A A 0 The results of such tests on our synthetic networks are
ω planted = A κ2 − A 0 , (30) shown in Fig. 3. We plot the normalized mutual infor-
0 0 κ3 mation as a function of λ for various choices of planted
structure. Each data point represents an average over 30
where A ≤ min(κ1 , κ2 ). networks of size n = 1000 for both the degree-corrected
9
NMI
NMI
0.4 0.4 0.4
0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
FIG. 3: The average normalized mutual information as a function of λ for the three synthetic tests described in the text.
Filled squares and transparent circles indicate tests initialized with planted and random assignments respectively. Green points
denote results for the degree-corrected blockmodel and black for the ordinary uncorrected model. The left, middle, and right
panels show respectively the results for the two-group two-degree networks, core-periphery networks, and hierarchical networks.
The error bars indicate the standard error on the mean computed from 30 networks per data point.
and uncorrected objective functions. In the case of ran- We have also tested our blockmodels against syn-
dom initializations, ten initializations were performed for thetic networks with two other types of structure, one
each network and we take the best result among the ten. the core-periphery or approximately bipartite structure
The left panel in the figure shows results for networks of Eq. (29) and the other the hierarchical structure of
with two communities and just two possible values of the Eq. (30). In these examples we use a more realistic de-
expected degree, 10 and 30. Each of the 1000 vertices gree distribution that approximately follows a power law
was assigned to one of the four possible combinations of with a minimum expected degree of 10 and an exponent
degree and community with equal probability, and the of −2.5. For the core-periphery networks we randomly
planted structure was chosen diagonal, as in Eq. (28). assign vertices to one of the two groups, while for the
The green points in the figure indicate the perfor- hierarchical networks we fix 500 vertices to be in the
mance of the degree-corrected blockmodel, while the first of the three groups, put the rest randomly in the
black points are for the uncorrected model. Solid squares other two, and set A = 41 min(κ1 , κ2 ). (It has been sug-
and open circles show performance starting from the gested that choosing non-equal sizes for groups in this
planted community structure and random assignments way presents a harder challenge for structure detection
respectively. Bearing in mind that λ = 0 corresponds to algorithms [30, 34].)
zero planted structure (in which case neither algorithm The performance of our blockmodels on these two
should find any significant result) and that a normal- classes of networks is shown in the middle (core-
ized mutual information approaching 1 indicates success- periphery) and right (hierarchical) panels of Fig. 3.
ful detection of the planted structure, we can see from Again we see that the normalized mutual information in-
the figure that the degree-corrected blockmodel signifi- creases with increasing λ for all algorithms but that the
cantly out-performs the uncorrected one in this simple degree-corrected blockmodel performs significantly bet-
test. As λ increases from zero, the mutual information ter than the uncorrected model. The degree-corrected
for all algorithms rises, but the corrected model starts to model with planted assignments consistently does the
detect some signatures of the planted structure almost best among the four options as we would expect, and
immediately and for λ = 21 returns a normalized mutual the degree-corrected model with random initializations
information above 0.7 for both initial conditions. The un- performs respectably in all cases, although it’s entirely
corrected model, by contrast, finds no planted structure possible that better performance could be obtained with
at λ = 21 for either initialization—including when the al- a better optimization strategy. The performance of
gorithm is initialized to the known correct answer. The the uncorrected model with random initializations, on
reason for this poor performance is precisely because of the other hand, is quite poor [35]. But perhaps the
the variation in degrees: for values of λ up to around 0.6 most telling comparison is the one between the degree-
the uncorrected model fits these networks better if ver- corrected model with random initial assignments and the
tices are assigned to groups according to their degree than uncorrected model with the planted assignment. This
if they are assigned according to the planted structure, comparison tilts the playing field heavily in favor of the
and hence the best-fit group structure has no correlation uncorrected model and yet, as Fig. 3 shows, the degree-
with the planted structure. corrected model still performs about as well as, and in
10
some cases better than, the uncorrected model. scales with the size of the network, which for example
prevents fits to a network of one size being used to gen-
erate synthetic networks of another size.
V. CONCLUSIONS But perhaps the chief current drawback of the model
is that the number K of blocks or groups in the network
In this paper, we have studied how one can incorpo- is assumed given. In most structure detection problems
rate heterogeneous vertex degrees into stochastic block- the number of groups is not known and a complete calcu-
models in a simple way, improving the performance of the lation will therefore require not only the algorithms de-
models for statistical inference of group structure. The scribed in this paper but also a method for estimating K.
resulting degree-corrected blockmodels can also be used Some previously suggested approaches to this problem in-
as generative models for creating benchmark networks, clude cross-validation [14], minimum description length
retaining the generality and tractability of other block- methods using two-part or universal codes [30], max-
models while producing degree sequences closer to those imization of a marginal likelihood [10], and nonpara-
of real networks. metric Bayesian methods. The marginal likelihood for
We have found the performance of the degree-corrected our degree-corrected blockmodel can be computed explic-
model for inference of group structure to be quantita- itly if one assumes conjugate priors on the parameters—
tively better on both synthetic and real-world test net- Dirichlet for θ and gamma for ω—but then one must
works than the uncorrected model. In networks with also choose the parameters of those priors, called hyper-
substantial degree heterogeneity, the uncorrected model parameters in the statistical literature. In principle one
prefers to split networks into groups of high and low de- wants to choose values of the hyperparameters that pro-
gree, and this preference can prevent it from finding the vide asymptotic consistency—the blockmodel should re-
true group memberships. The degree-corrected model turn the correct number of groups when applied to a
correctly ignores divisions based solely on degree and network generated from the same blockmodel, at least
hence is more sensitive to underlying structure. in certain limits. At present, however, it is not known
It seems likely that other more sophisticated block- how to make this choice. An alternative possibility is
models, such as the recently proposed overlapping and to note that the blockmodel used here is equivalent to a
mixed membership models, would benefit from incor- model that generates an ensemble of matrices with inte-
porating degree sequences also. In applications to on- ger entries, implying potential connections to the large
line social network data, for example, where overlapping statistical literature on contingency table analysis that
groups are common, there is frequently substantial de- could be helpful in determining the number of groups in
gree heterogeneity and hence potentially significant ben- a principled fashion. We leave these questions for future
efits to using a degree-corrected model. work.
The degree-corrected blockmodel is not without its
faults. For instance, the model as described can produce
an unrealistic number of zero-degree vertices, and is also Acknowledgments
unable to model some degree sequences, such as those in
which certain values of the degree are entirely forbidden. The authors would like to thank Joel Bader and Cris
As a model of real-world networks, it may also fail to Moore for useful conversations, and Joel Bader for shar-
accurately represent higher-order network structure such ing an early draft of Ref. [21]. This work was funded
as overrepresented network motifs or degree correlations. in part by the National Science Foundation under grant
From a statistical point of view, it is also somewhat un- DMS–0804778 and by the James S. McDonnell Founda-
satisfactory that the number of parameters in the model tion.
[1] P. W. Holland, K. B. Laskey, and S. Leinhardt, Social [7] L. Danon, J. Duch, A. Diaz-Guilera, and A. Arenas, J.
Networks 5, 109 (1983). Stat. Mech. P09008 (2005).
[2] K. Faust and S. Wasserman, Social Networks 14, 5 [8] S. Fortunato, Physics Reports 486, 75 (2010).
(1992). [9] M. Hastings, Phys. Rev. E 74, 035102 (2006).
[3] C. J. Anderson, S. Wasserman, and K. Faust, Social Net- [10] J. M. Hofman and C. H. Wiggins, Phys. Rev. Lett. 100,
works 14, 137 (1992). 258701 (2008).
[4] T. A. Snijders and K. Nowicki, Journal of Classification [11] P. J. Bickel and A. Chen, Proc. Natl. Acad. Sci. USA
14, 75 (1997). 106, 21068 (2009).
[5] A. Goldenberg, A. X. Zheng, S. E. Feinberg, and E. M. [12] Technically this is only true if the number K of groups
Airoldi, Foundations and Trends in Machine Learning 2, is moderate. If K can become arbitrarily large then it is
1 (2009). possible to represent any network trivially as a stochastic
[6] A. Condon and R. M. Karp, Random Structures and Al- blockmodel by making each vertex its own group and
gorithms 18, 116 (1999). setting the independent probability of each edge to one
11
or zero depending on whether an edge exists between the [25] M. E. J. Newman, Phys. Rev. E 67, 026126 (2003).
corresponding vertices in the empirical data. [26] M. E. J. Newman, Phys. Rev. Lett. 89, 208701 (2002).
[13] P. Latouche, E. Birmele, and C. Ambroise, preprint [27] B. W. Kernighan and S. Lin, Bell System Technical Jour-
arXiv:0910.2098 (2009). nal 49, 291 (1970).
[14] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, [28] M. Meila, Journal of Multivariate Analysis 98, 873
J. Mach. Learn. Res. 9, 1981 (2008). (2007).
[15] Y. Park, C. Moore, and J. S. Bader, PLoS One 5, e8118 [29] W. Zachary, Journal of Anthropological Research 33, 452
(2010). (1977).
[16] Y. J. Wang and G. Y. Wong, Journal of the American [30] M. Rosvall and C. T. Bergstrom, Proc. Natl. Acad. Sci.
Statistical Association 82, 8 (1987). USA 104, 7327 (2007).
[17] A. Dasgupta, J. E. Hopcroft, and F. McSherry, in Pro- [31] L. A. Adamic and N. Glance, in Proceedings of the
ceedings of the 45th Annual Symposium on Foundations WWW-2005 Workshop on the Weblogging Ecosystem
of Computer Science, p. 602 (2004). (2005).
[18] J. Reichardt and D. White, Eur. Phys. J. B 60, 217 [32] F. Chung and L. Lu, Proc. Natl. Acad. Sci. USA 99,
(2007). 15879 (2002).
[19] M. Mørup and L. K. Hansen, in NIPS Workshop on An- [33] A. Lancichinetti, S. Fortunato, and F. Radicchi, Phys.
alyzing Networks and Learning With Graphs (2009). Rev. E 78, 046110 (2008).
[20] A. Coja-Oghlan and A. Lanka, SIAM Journal on Discrete [34] L. Danon, A. Diaz-Guilera, and A. Arenas, J. Stat. Mech.
Mathematics 23, 1682 (2009). P11010 (2006).
[21] A. S. Patterson and J. S. Bader, unpublished. [35] This algorithm’s performance may not be quite as bad as
[22] A. Clauset, C. Moore, and M. E. J. Newman, Nature the figure seems to indicate because for high λ the mu-
453, 98 (2008). tual information values are roughly bimodal, with most
[23] T. M. Cover and J. A. Thomas, Elements of Information values being close to either 0 or 1. The mean shown is
Theory, Second Edition (Wiley, New York, 2006). thus roughly the percentage of times that the uncorrected
[24] I. S. Dhillon, S. Mallela, and D. S. Modha, in Proceed- model successfully finds the planted structure. The same
ings of the Ninth ACM SIGKDD International Confer- does not apply for the other curves, where the error rep-
ence on Knowledge Discovery and Data Mining, p. 89 resents relatively minor variation about the mean.
(ACM, New York, 2003).