
Stochastic blockmodels and community structure in networks

Brian Karrer¹ and M. E. J. Newman¹,²

¹ Department of Physics, University of Michigan, Ann Arbor, MI 48109
² Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI 48109

Stochastic blockmodels have been proposed as a tool for detecting community structure in networks as well as for generating synthetic networks for use as benchmarks. Most blockmodels, however, ignore variation in vertex degree, making them unsuitable for applications to real-world networks, which typically display broad degree distributions that can significantly distort the results. Here we demonstrate how the generalization of blockmodels to incorporate this missing element leads to an improved objective function for community detection in complex networks. We also propose a heuristic algorithm for community detection using this objective function or its non-degree-corrected counterpart and show that the degree-corrected version dramatically outperforms the uncorrected one in both real-world and synthetic networks.

I. INTRODUCTION

A stochastic blockmodel is a generative model for blocks, groups, or communities in networks. Stochastic blockmodels fall in the general class of random graph models and have a long tradition of study in the social sciences and computer science [1–5]. In the simplest stochastic blockmodel (many more complicated variants are possible), each of n vertices is assigned to one of K blocks, groups, or communities, and undirected edges are placed independently between vertex pairs with probabilities that are a function only of the group memberships of the vertices. If we denote by g_i the group to which vertex i belongs, then we can define a K × K matrix ψ of probabilities such that the matrix element ψ_{g_i g_j} is the probability of an edge between vertices i and j.

While simple to describe, this model can produce a wide variety of different network structures. For example, a diagonal probability matrix would produce networks with disconnected components, while the addition of small off-diagonal elements would generate conventional “community structure”: a set of communities with dense internal connections and sparse external ones. Other choices of probability matrix can generate core-periphery, hierarchical, or multipartite structures, among others. This versatility, combined with analytic tractability, has made the blockmodel a popular tool in a number of contexts. For instance, the planted partition model [6], which is equivalent to the model above with a specific parametrization of the matrix ψ, is widely used as a theoretical testbed for graph partitioning and community detection algorithms [7, 8].

Another important application, and the one that is the primary focus of this paper, is the fitting of blockmodels to empirical network data as a way of discovering block structure, an approach referred to in the social networks literature as a posteriori blockmodeling [4]. A number of ways of performing the fitting have been suggested, including some that make use of techniques from physics [9, 10]. A posteriori blockmodeling can be thought of as a method for community structure detection in networks [8], though blockmodeling is considerably more general than traditional community detection methods, since it can detect many forms of structure in addition to simple communities of dense links. Moreover, it has the desirable property (not shared by most other approaches) of asymptotic consistency under certain conditions [11], meaning that if applied to networks that were themselves generated from the same blockmodel, the method can correctly recover the block structure.

Unfortunately, however, the simple blockmodel described above does not work well in many applications to real-world networks. The model is not flexible enough to generate networks with structure even moderately similar to that found in most empirical network data, meaning that a posteriori fits to such data often give poor results [12]. Just as the fitting of a straight line to intrinsically curved data is likely to miss important features of the data, so a fit of the simple stochastic blockmodel to the structure of a complex network is likely to miss much and, as we will show, can in some cases give radically incorrect answers.

Attempts to overcome these problems by extending the blockmodel have focused particularly on the use of (more complicated) p* or exponential random graph models, but while these are conceptually appealing, they quickly lose the analytic tractability of the original blockmodel as their complexity increases. Other recent attempts to extend blockmodels take the flavor of mixture models that allow vertices to participate in overlapping groups [13] or to have mixed membership [14, 15].

In this paper we adopt a different approach, considering a simple and apparently minor extension of the classic stochastic blockmodel to include heterogeneity in the degrees of vertices. Despite its innocuous appearance, this extension turns out to have substantial effects, as we will see. A number of previous authors have considered similar extensions of blockmodels. As early as 1987, Wang and Wong [16] proposed a stochastic blockmodel for directed simple graphs incorporating arbitrary expected in- and out-degrees, along with a selection of other features. Unfortunately, this model is not solvable for its parameter values in closed form, which limits its usefulness for the types of calculations we consider. Several more recent
works have also explored blockmodels with various forms of degree heterogeneity [17–21], motivated largely by the recent focus on degree distributions in the networks literature. We note particularly the currently unpublished work of Patterson and Bader [21], who apply a variational Bayes approach to a model close, though not identical, to the one considered here.

In this paper we build upon the ideas of these authors but take a somewhat different tack, focusing on the question of why degree heterogeneity in blockmodels is a good idea. To study this question, we develop a degree-corrected blockmodel with closed-form parameter solutions, which allows us more directly to compare traditional and degree-corrected models. As we show, the incorporation of degree heterogeneity in the stochastic blockmodel results in a model that in practice performs much better, giving significantly improved fits to network data, while being only slightly more complex than the simple model described above. Although we here examine only the simplest version of this idea, the approaches we explore could in principle be incorporated into other blockmodels, such as the overlapping or mixed membership models.

In outline, the paper is as follows. We first review the ideas behind the ordinary stochastic blockmodel to understand why degree heterogeneity causes problems. Then we introduce a degree-corrected version of the model and demonstrate its use in a posteriori blockmodeling to infer group memberships in empirical network data, showing that the degree-corrected model outperforms the original model both on actual networks and on new synthetic benchmarks. The benchmarks introduced, which generalize previous benchmarks for community detection, may also be of independent interest.

II. STANDARD STOCHASTIC BLOCKMODEL

In this section we review briefly the use of the original, non-degree-corrected blockmodel, focusing on undirected networks since they are the most commonly studied. For consistency with the degree-corrected case we will allow our networks to contain both multi-edges and self-edges, even though many real-world networks have no such edges. As in most random graph models for sparse networks, the incorporation of multi-edges and self-edges makes computations easier without affecting the fundamental outcome significantly; typically their inclusion gives rise to corrections to the results that are of order 1/n and hence vanishing as the size n of the network becomes large. For networks with multi-edges, the previously defined probability ψ_rs of an edge between vertices in groups r and s is replaced by the expected number of such edges, and the actual number of edges between any pair of vertices will be drawn from a Poisson distribution with this mean. In the limit of a large sparse graph, where the probability of an edge and the expected number of edges become equal, there is essentially no difference between the model described here and the standard blockmodel.

With this in mind, the model we study is now defined as follows. Let G be an undirected multigraph on n vertices, possibly including self-edges, and let A_ij be an element of the adjacency matrix of the multigraph. Recall that the adjacency matrix for a multigraph is conventionally defined such that A_ij is equal to the number of edges between vertices i and j when i ≠ j, but the diagonal element A_ii is equal to twice the number of self-edges from i to itself (and hence is always an even number).

We let the number of edges between each pair of vertices (or between a vertex and itself in the case of self-edges) be independently Poisson distributed and define ω_rs to be the expected value of the adjacency matrix element A_ij for vertices i and j lying in groups r and s respectively. Note that this implies that the expected number of self-edges at a vertex in group r is ½ω_rr because of the factor of two in the definition of the diagonal elements of the adjacency matrix.

Now we can write the probability P(G|ω, g) of graph G given the parameters and group assignments as

\[
P(G \mid \omega, g) = \prod_{i<j} \frac{(\omega_{g_i g_j})^{A_{ij}}}{A_{ij}!} \exp\bigl(-\omega_{g_i g_j}\bigr)
\times \prod_i \frac{\bigl(\tfrac{1}{2}\omega_{g_i g_i}\bigr)^{A_{ii}/2}}{(A_{ii}/2)!} \exp\bigl(-\tfrac{1}{2}\omega_{g_i g_i}\bigr). \tag{1}
\]

Given that A_ij = A_ji and ω_rs = ω_sr, Eq. (1) can after a small amount of manipulation be rewritten in the more convenient form

\[
P(G \mid \omega, g) = \frac{1}{\prod_{i<j} A_{ij}! \, \prod_i 2^{A_{ii}/2} (A_{ii}/2)!}
\prod_{rs} \omega_{rs}^{m_{rs}/2} \exp\bigl(-\tfrac{1}{2} n_r n_s \omega_{rs}\bigr), \tag{2}
\]

where n_r is the number of vertices in group r and

\[
m_{rs} = \sum_{ij} A_{ij}\, \delta_{g_i,r}\, \delta_{g_j,s}, \tag{3}
\]

which is the total number of edges between groups r and s, or twice that number if r = s.

Our goal is to maximize this probability with respect to the unknown model parameters ω_rs and the group assignments of the vertices. In most cases, it will in fact be simpler to maximize the logarithm of the probability (whose maximum is in the same place). Neglecting constants and terms independent of the parameters and group assignments (i.e., independent of ω_rs, n_r, and m_rs), the logarithm is given by

\[
\log P(G \mid \omega, g) = \sum_{rs} \bigl( m_{rs} \log \omega_{rs} - n_r n_s \omega_{rs} \bigr). \tag{4}
\]

We will maximize this expression in two stages, first with respect to the model parameters ω_rs, then with
respect to the group assignments g_i. The maximum-likelihood values ω̂_rs of the model parameters (where hatted variables indicate maximum-likelihood estimates) are found by simple differentiation to be

\[
\hat{\omega}_{rs} = \frac{m_{rs}}{n_r n_s}, \tag{5}
\]

and the value of Eq. (4) at this maximum is log P(G|ω̂, g) = Σ_rs m_rs log(m_rs / n_r n_s) − 2m, where m = ½ Σ_rs m_rs is the total number of edges in the network. Dropping the final constant, we define the unnormalized log-likelihood for the group assignment g:

\[
L(G \mid g) = \sum_{rs} m_{rs} \log \frac{m_{rs}}{n_r n_s}. \tag{6}
\]

The maximum of this quantity with respect to the group assignments now tells us the most likely set of assignments [11, 22]. In effect, Eq. (6) gives us an objective or quality function which is large for “good” group assignments and small for “poor” ones. Many such objective functions have been defined elsewhere in the literature on community detection and graph partitioning, but Eq. (6) differs from most other choices in being derived from first principles, rather than heuristically motivated or simply proposed ad hoc.

Equation (6) has an interesting information-theoretic interpretation. By adding and dividing by constant factors of the total number of vertices and edges the equation can be written in the alternative form

\[
L(G \mid g) = \sum_{rs} \frac{m_{rs}}{2m} \log \frac{m_{rs}/2m}{n_r n_s / n^2}, \tag{7}
\]

where again we have neglected irrelevant constants. Now imagine, for a given set of group assignments, that we choose an edge uniformly at random from our network, and let X be the group assignment at one (randomly selected) end of the edge and Y be the group assignment at the other end of the edge. The probability distribution of the variables X and Y is then p_K(X = r, Y = s) = p_K(r, s) = m_rs/2m, which appears twice in the above expression. The remaining terms in the denominator of the logarithm in Eq. (7) are equal to the expected value of the same probability in a network with the same group assignments but different edges, the edges now being placed completely at random without regard for the groups. Call this second distribution p_1(r, s). Equation (7) can then be written

\[
L(G \mid g) = \sum_{rs} p_K(r,s) \log \frac{p_K(r,s)}{p_1(r,s)}, \tag{8}
\]

which is the well-known Kullback–Leibler divergence between the probability distributions p_K and p_1 [23].

The Kullback–Leibler divergence is not precisely a distance measure since it is not symmetric in p_K and p_1. However, if the logarithms are taken base 2 then it measures the expected number of extra bits required to encode X and Y if p_1 is mistakenly used as the distribution for X and Y instead of the assumed true distribution p_K. So intuitively it can be considered as measuring how far p_K is from p_1. The most likely group assignments under the ordinary stochastic blockmodel are then those assignments that require the most information to describe starting from a model that does not have group structure.

This type of approach, in which one constructs an objective function that measures the difference between an observed quantity and the expected value of the same quantity under an appropriate null model, is common in work on community detection in networks. One widely used objective function is the so-called modularity:

\[
Q(g) = \frac{1}{2m} \sum_{ij} \bigl[ A_{ij} - P_{ij} \bigr] \delta(g_i, g_j), \tag{9}
\]

where A_ij is an element of the adjacency matrix and P_ij is the expected value of the same element under some null model. The null model assumed in our blockmodel calculation is one in which P_ij is constant. Making the same choice for the modularity would lead to

\[
Q(g) = \sum_{r=1}^{K} \bigl[ p_K(r,r) - p_1(r,r) \bigr]. \tag{10}
\]

The modularity, however, is not normally used this way and for good reason. This null model, corresponding to a multigraph version of the Erdős–Rényi random graph, produces highly unrealistic networks, even for networks with no community structure. Specifically, it produces networks with Poisson degree distributions, in stark contrast to most real networks, which tend to have broad distributions of vertex degree. To avoid this problem, modularity is usually defined using a different null model that fixes the expected degree sequence to be the same as that of the observed network. Within this model P_ij = k_i k_j / 2m, where k_i is the degree of vertex i. Then the probability distribution over the group assignments at the ends of a randomly chosen edge becomes

\[
p_{\text{degree}}(X = r, Y = s) = p_{\text{degree}}(r,s) = \frac{\kappa_r}{2m} \frac{\kappa_s}{2m}, \tag{11}
\]

where

\[
\kappa_r = \sum_s m_{rs} = \sum_i k_i\, \delta_{g_i,r} \tag{12}
\]

is the total number of ends of edges, commonly called stubs, that emerge from vertices in group r, or equivalently the sum of the degrees of the vertices in group r. (Note that Eq. (12) correctly counts two stubs for edges that both start and end in group r.) Then the desired group assignments are given by the maximum of

\[
Q(g) = \sum_{r=1}^{K} \bigl[ p_K(r,r) - p_{\text{degree}}(r,r) \bigr]. \tag{13}
\]

This choice of null model is found to give significantly better results than the original uniform model because
it allows for the fact that vertices with high degree are, all other things being equal, more likely to be connected than those with low degree, simply because they have more edges. From an information-theoretic viewpoint, an edge between two high-degree vertices is less surprising than an edge between two low-degree vertices and we get better results if we incorporate this observation in our model.

Returning to the stochastic blockmodel, using p_1 instead of p_degree in the objective function causes problems similar to those that affect the modularity. Fits to the model may incorrectly suggest that structure in the network due merely to the degree sequence is a result instead of group memberships. We will shortly see explicit real-world cases in which such incorrect conclusions arise. The solution to this problem, as with the modularity, is to define a stochastic blockmodel that directly incorporates arbitrary heterogeneous degree distributions.

III. DEGREE-CORRECTED STOCHASTIC BLOCKMODEL

In the degree-corrected blockmodel, the probability distribution over undirected multigraphs with self-edges (again denoted G) depends not only on the parameters introduced previously but also on a new set of parameters θ_i controlling the expected degrees of vertices i. As before, we assume there are K groups, ω_rs is a K × K symmetric matrix of parameters controlling edges between groups r and s, and g_i is the group assignment of vertex i. As in the uncorrected blockmodel, let the numbers of edges each be drawn from a Poisson distribution, but now, following [20] and [21], let the expected value of the adjacency matrix element A_ij be θ_i θ_j ω_{g_i g_j}. Then graph G has probability

\[
P(G \mid \theta, \omega, g) = \prod_{i<j} \frac{(\theta_i \theta_j \omega_{g_i g_j})^{A_{ij}}}{A_{ij}!} \exp\bigl(-\theta_i \theta_j \omega_{g_i g_j}\bigr)
\times \prod_i \frac{\bigl(\tfrac{1}{2}\theta_i^2 \omega_{g_i g_i}\bigr)^{A_{ii}/2}}{(A_{ii}/2)!} \exp\bigl(-\tfrac{1}{2}\theta_i^2 \omega_{g_i g_i}\bigr). \tag{14}
\]

The θ parameters are arbitrary to within a multiplicative constant which is absorbed into the ω parameters. Their normalization can be fixed by imposing the constraint

\[
\sum_i \theta_i\, \delta_{g_i,r} = 1 \tag{15}
\]

for all groups r, which makes θ_i equal to the probability that an edge connected to the community to which i belongs lands on i itself. With this constraint, the probability P(G|θ, ω, g) can be simplified to the more convenient form

\[
P(G \mid \theta, \omega, g) = \frac{1}{\prod_{i<j} A_{ij}! \, \prod_i 2^{A_{ii}/2} (A_{ii}/2)!}
\prod_i \theta_i^{k_i} \prod_{rs} \omega_{rs}^{m_{rs}/2} \exp\bigl(-\tfrac{1}{2}\omega_{rs}\bigr), \tag{16}
\]

with k_i being the degree of vertex i as previously and m_rs defined as in Eq. (3). As before, rather than maximizing this probability, it is more convenient to maximize its logarithm, which, ignoring constants, is

\[
\log P(G \mid \theta, \omega, g) = 2 \sum_i k_i \log \theta_i + \sum_{rs} \bigl( m_{rs} \log \omega_{rs} - \omega_{rs} \bigr). \tag{17}
\]

Allowing for the constraint (15), the maximum-likelihood values of the parameters θ_i and ω_rs are then given by

\[
\hat{\theta}_i = \frac{k_i}{\kappa_{g_i}}, \qquad \hat{\omega}_{rs} = m_{rs}, \tag{18}
\]

where κ_r is the sum of the degrees in group r as before (see Eq. (12)). This maximum-likelihood parameter estimate has the appealing property of preserving the expected numbers of edges between groups and the expected degree sequence of the network. To see this, let ⟨x⟩ be the average of x in the ensemble of graphs with parameters (18). Then the expected number of edges between groups r and s is

\[
\sum_{ij} \langle A_{ij} \rangle\, \delta_{g_i,r}\, \delta_{g_j,s}
= \sum_{ij} \frac{k_i k_j}{\kappa_{g_i} \kappa_{g_j}}\, m_{g_i g_j}\, \delta_{g_i,r}\, \delta_{g_j,s} = m_{rs}, \tag{19}
\]

where we have made use of Eq. (12). Similarly, the average degree of vertex i in the ensemble is

\[
\begin{aligned}
\sum_j \langle A_{ij} \rangle &= \sum_j \hat{\theta}_i \hat{\theta}_j \hat{\omega}_{g_i g_j}
= \frac{k_i}{\kappa_{g_i}} \sum_j \frac{k_j}{\kappa_{g_j}}\, m_{g_i g_j} \\
&= \frac{k_i}{\kappa_{g_i}} \sum_j \sum_r \frac{k_j}{\kappa_r}\, m_{g_i,r}\, \delta_{g_j,r}
= \frac{k_i}{\kappa_{g_i}} \sum_r m_{g_i,r} = k_i.
\end{aligned} \tag{20}
\]

Traditional blockmodels, by contrast, preserve only the expected value of the matrix m_rs and not the expected degree: every vertex in group r in the traditional blockmodel has the same expected degree Σ_j m_{r,g_j} / (n_r n_{g_j}) = κ_r / n_r.

Substituting Eq. (18) into Eq. (17), the maximum of log P(G|θ, ω, g) for the degree-corrected blockmodel is

\[
\log P(G \mid \theta, \omega, g) = 2 \sum_i k_i \log \frac{k_i}{\kappa_{g_i}} + \sum_{rs} m_{rs} \log m_{rs} - 2m, \tag{21}
\]

where as before m is the total number of edges in the network. The first term in this expression can be rewritten
as

\[
\begin{aligned}
2 \sum_i k_i \log \frac{k_i}{\kappa_{g_i}}
&= 2 \sum_i k_i \log k_i - 2 \sum_i \sum_r k_i\, \delta_{g_i,r} \log \kappa_r \\
&= 2 \sum_i k_i \log k_i - \sum_r \kappa_r \log \kappa_r - \sum_s \kappa_s \log \kappa_s \\
&= 2 \sum_i k_i \log k_i - \sum_{rs} m_{rs} \log \kappa_r \kappa_s,
\end{aligned} \tag{22}
\]

where we have again made use of Eq. (12). Substituting back into Eq. (21) and dropping overall constants then gives us an unnormalized log-likelihood function of

\[
L(G \mid g) = \sum_{rs} m_{rs} \log \frac{m_{rs}}{\kappa_r \kappa_s}. \tag{23}
\]

Notice that the only difference between this degree-corrected log-likelihood and the uncorrected log-likelihood of Eq. (6) is the replacement of the number n_r of vertices in each group by the number κ_r of stubs. Minor though this replacement may seem, however, it has a big effect, as we will shortly see.

As before, we can interpret the optimization of the objective function (23) through the lens of information theory. Adding and multiplying by constant factors allows us to write the log-likelihood in the form

\[
L(G \mid g) = \sum_{rs} \frac{m_{rs}}{2m} \log \frac{m_{rs}/2m}{(\kappa_r/2m)(\kappa_s/2m)}, \tag{24}
\]

which is the Kullback–Leibler divergence between p_K and p_degree. Alternatively, noting that p_degree is the product of the marginal distributions Σ_r p_K(r, s) and Σ_s p_K(r, s), this particular form of divergence can also be thought of as the mutual information of the random variables representing the group labels at either end of a randomly chosen edge. Loosely speaking, the best fit to the degree-corrected stochastic blockmodel gives the group assignment that is most surprising compared to the null model with given expected degree sequence, whereas the ordinary stochastic blockmodel gives the group assignment that is most surprising compared to the Erdős–Rényi random graph.

Information-theoretic quantities have been proposed previously as possible objective functions for community detection or clustering. Dhillon et al. [24], for instance, used mutual information as an objective function for clustering bipartite graphs, as part of an approach they call “information-theoretic co-clustering.” Equation (23) is also somewhat reminiscent of an objective function of Reichardt et al. [18] which, if translated into our terminology and adapted to undirected networks, is equivalent to the total variation distance between p_K and p_degree, variation distance being an alternative measure of the distance between two probability distributions. While the variation distance and the Kullback–Leibler divergence are related, both falling in the class of so-called f-divergences, the optimization of variation distance does not, to our knowledge, correspond to maximizing the likelihood of any generative model, and there are significant benefits to the connection with generative models. In particular, one can easily create networks from the ensemble of our model and in addition the connection to generative processes means that a posteriori blockmodeling fits into standard frameworks for statistical inference, which are well studied and understood in other contexts.

Equation (23) could also be used as a measure of assortative mixing among discrete vertex characteristics in networks [25, 26]. In a network such as a social network, where connections between individuals can depend on characteristics such as nationality, race, or gender, our objective function could be used, for instance, to quantify which of several such characteristics is more predictive of network structure.

A useful property of the objective function in Eq. (23) when used for a posteriori blockmodeling is that it is possible to quickly compute the change in the log-likelihood when a single vertex switches groups. When a vertex changes groups from r to s only κ_r, κ_s, m_rt, and m_st (for any t) can change (with m_rs symmetric). This means that many terms cancel out of the difference of the log-likelihoods and can be ignored in the computations.
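
As an illustrative sketch (not the authors' code), the objective functions of Eqs. (6), (23), and (24) can be computed directly from an adjacency matrix and a group assignment. The function names and the dense list-of-lists representation are assumptions made for this example; the diagonal of A is taken to hold twice the number of self-edges, as in Sec. II.

```python
import math


def block_stats(A, g, K):
    """Return (m, n, kappa): edge counts m_rs of Eq. (3), group sizes n_r,
    and stub counts kappa_r of Eq. (12)."""
    m = [[0] * K for _ in range(K)]
    n = [0] * K
    for i, row in enumerate(A):
        n[g[i]] += 1
        for j, a_ij in enumerate(row):
            m[g[i]][g[j]] += a_ij
    kappa = [sum(row) for row in m]
    return m, n, kappa


def _sum_mlog(m, denom):
    """Sum of m_rs * log(m_rs / denom(r, s)), skipping empty blocks."""
    K = len(m)
    return sum(m[r][s] * math.log(m[r][s] / denom(r, s))
               for r in range(K) for s in range(K) if m[r][s] > 0)


def L_uncorrected(A, g, K):
    """Eq. (6): sum_rs m_rs log(m_rs / (n_r n_s))."""
    m, n, _ = block_stats(A, g, K)
    return _sum_mlog(m, lambda r, s: n[r] * n[s])


def L_corrected(A, g, K):
    """Eq. (23): sum_rs m_rs log(m_rs / (kappa_r kappa_s))."""
    m, _, kappa = block_stats(A, g, K)
    return _sum_mlog(m, lambda r, s: kappa[r] * kappa[s])


def mutual_information(A, g, K):
    """Eq. (24): Kullback-Leibler divergence between p_K and p_degree."""
    m, _, kappa = block_stats(A, g, K)
    two_m = sum(kappa)
    return sum((m[r][s] / two_m)
               * math.log(m[r][s] * two_m / (kappa[r] * kappa[s]))
               for r in range(K) for s in range(K) if m[r][s] > 0)


# Hypothetical example: a ring of six vertices split into two halves.
ring = [[1 if abs(i - j) in (1, 5) else 0 for j in range(6)] for i in range(6)]
halves = [0, 0, 0, 1, 1, 1]
print(L_uncorrected(ring, halves, 2), L_corrected(ring, halves, 2))
```

Expanding the logarithm shows that Eq. (24) equals Eq. (23) divided by 2m, plus the constant log 2m, so both are maximized by the same assignment. In the ring every vertex has degree d = 2, hence κ_r = d n_r and the two objectives differ only by the constant 4m log d.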

Consider moving vertex i from community r to community s. Let k_it be the number of edges from vertex i to vertices in group t excluding self-edges, and let u_i be the number of self-edges for vertex i. These quantities are the same for all possible moves of vertex i. Define a(x) = 2x log x and b(x) = x log x, where a(0) = 0 and b(0) = 0. Then the change in the log-likelihood can be written:

\[
\begin{aligned}
\Delta L ={}& \sum_{t \neq r,s} \bigl[ a(m_{rt} - k_{it}) - a(m_{rt}) + a(m_{st} + k_{it}) - a(m_{st}) \bigr]
+ a(m_{rs} + k_{ir} - k_{is}) - a(m_{rs}) \\
& + b\bigl(m_{rr} - 2(k_{ir} + u_i)\bigr) - b(m_{rr}) + b\bigl(m_{ss} + 2(k_{is} + u_i)\bigr) - b(m_{ss}) \\
& - a(\kappa_r - k_i) + a(\kappa_r) - a(\kappa_s + k_i) + a(\kappa_s).
\end{aligned} \tag{25}
\]

This quantity can be evaluated in time O(K + ⟨k⟩) on average, and finding the s that gives the maximum ΔL for given i and r can thus be done in time O(K(K + ⟨k⟩)).
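
The single-vertex update can be checked against a full recomputation of Eq. (23). The sketch below is for verification only: for clarity it rebuilds the block counts in O(n²) time rather than achieving the O(K + ⟨k⟩) cost quoted above, and the names `delta_L` and `log_likelihood` are hypothetical. It uses the fact that m_rt decreases by k_it, and m_st increases by k_it, when vertex i leaves group r for group s.

```python
import math


def xlogx(x):
    """x log x with the convention 0 log 0 = 0."""
    return x * math.log(x) if x > 0 else 0.0


def block_counts(A, g, K):
    """m_rs of Eq. (3) and kappa_r of Eq. (12)."""
    m = [[0] * K for _ in range(K)]
    for i, row in enumerate(A):
        for j, a_ij in enumerate(row):
            m[g[i]][g[j]] += a_ij
    return m, [sum(row) for row in m]


def log_likelihood(A, g, K):
    """Eq. (23), expanded as sum m log m - 2 sum kappa log kappa."""
    m, kappa = block_counts(A, g, K)
    return (sum(xlogx(x) for row in m for x in row)
            - 2 * sum(xlogx(k) for k in kappa))


def delta_L(A, g, K, i, s):
    """Change in Eq. (23) when vertex i moves from group g[i] to group s."""
    r = g[i]
    if r == s:
        return 0.0
    m, kappa = block_counts(A, g, K)
    k_it = [0] * K                      # edges from i to each group, no self-edges
    for j, a_ij in enumerate(A[i]):
        if j != i:
            k_it[g[j]] += a_ij
    u = A[i][i] // 2                    # number of self-edges at i
    k_i = sum(A[i])                     # degree of i (self-edges count twice)

    def a(x):
        return 2 * xlogx(x)

    b = xlogx
    d = 0.0
    for t in range(K):
        if t != r and t != s:
            d += a(m[r][t] - k_it[t]) - a(m[r][t])
            d += a(m[s][t] + k_it[t]) - a(m[s][t])
    d += a(m[r][s] + k_it[r] - k_it[s]) - a(m[r][s])
    d += b(m[r][r] - 2 * (k_it[r] + u)) - b(m[r][r])
    d += b(m[s][s] + 2 * (k_it[s] + u)) - b(m[s][s])
    d += -a(kappa[r] - k_i) + a(kappa[r]) - a(kappa[s] + k_i) + a(kappa[s])
    return d


# Hypothetical multigraph with one self-edge (A[0][0] = 2) and multi-edges.
A = [[2, 1, 0, 2, 0],
     [1, 0, 1, 0, 1],
     [0, 1, 0, 1, 0],
     [2, 0, 1, 0, 3],
     [0, 1, 0, 3, 0]]
g = [0, 0, 1, 2, 2]
print(delta_L(A, g, 3, i=1, s=2))
```

A production implementation would instead maintain m_rs and κ_r incrementally and loop only over the neighbors of i, which is what gives the stated running time.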


Because these computations can be done quickly for a reasonable number of communities, local vertex switching algorithms, such as single-vertex Monte Carlo, can be implemented easily. Monte Carlo, however, is slow, and we have found competitive results using a local heuristic algorithm similar in spirit to the Kernighan–Lin algorithm used in minimum-cut graph partitioning [27].

Briefly, in this algorithm we divide the network into some initial set of K communities at random. Then we repeatedly move a vertex from one group to another, selecting at each step the move that will most increase the objective function (or least decrease it if no increase is possible), subject to the restriction that each vertex may be moved only once. When all vertices have been moved, we inspect the states through which the system passed from start to end of the procedure, select the one with the highest objective score, and use this state as the starting point for a new iteration of the same procedure. When a complete such iteration passes without any increase in the objective function, the algorithm ends. As with many deterministic algorithms, we have found it helpful to run the calculation with several different random initial conditions and take the best result over all runs.
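
The procedure described above can be sketched as follows. This is a minimal illustration under simplifying assumptions: the objective of Eq. (23) is recomputed from scratch for every candidate move (far slower than an incremental update), ties are broken by iteration order, K is assumed to be at least 2, and all function names are hypothetical.

```python
import math
import random


def score(A, g, K):
    """Degree-corrected objective, Eq. (23)."""
    m = [[0] * K for _ in range(K)]
    for i, row in enumerate(A):
        for j, a_ij in enumerate(row):
            m[g[i]][g[j]] += a_ij
    kappa = [sum(row) for row in m]
    return sum(m[r][s] * math.log(m[r][s] / (kappa[r] * kappa[s]))
               for r in range(K) for s in range(K) if m[r][s] > 0)


def sweep(A, g, K):
    """Move every vertex exactly once; return the best state visited."""
    g = list(g)
    best_g, best_score = list(g), score(A, g, K)
    unmoved = set(range(len(A)))
    while unmoved:
        move, move_score = None, -math.inf
        for i in unmoved:
            old = g[i]
            for s in range(K):
                if s == old:
                    continue
                g[i] = s                      # try the move
                val = score(A, g, K)
                if val > move_score:
                    move, move_score = (i, s), val
            g[i] = old                        # undo the trial
        i, s = move
        g[i] = s                              # accept the best move, even if worse
        unmoved.remove(i)
        if move_score > best_score:
            best_g, best_score = list(g), move_score
    return best_g, best_score


def fit(A, K, rng):
    """Repeat sweeps from a random start until the objective stops improving."""
    g = [rng.randrange(K) for _ in range(len(A))]
    current = score(A, g, K)
    while True:
        new_g, new_score = sweep(A, g, K)
        if new_score <= current + 1e-12:
            return g, current
        g, current = new_g, new_score


def best_of(A, K, restarts=10, seed=0):
    """Run several random restarts and keep the highest-scoring result."""
    rng = random.Random(seed)
    return max((fit(A, K, rng) for _ in range(restarts)), key=lambda t: t[1])


# Hypothetical example: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] += 1
    A[j][i] += 1
print(best_of(A, 2))
```

Swapping `score` for the uncorrected objective of Eq. (6) gives the non-degree-corrected variant of the same search.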
IV. RESULTS

We have tested the performance of the degree-corrected and uncorrected blockmodels in applications both to real-world networks with known community assignments and to a range of synthetic (i.e., computer-generated) networks. We evaluate performance by quantitative comparison of the community assignments found by the algorithms and the known assignments. As a metric for comparison we use the normalized mutual information, which is defined as follows [7]. Let n_rs be the number of vertices in community r in the inferred group assignment and in community s in the true assignment. Then define p(X = r, Y = s) = n_rs/n to be the joint probability that a randomly selected vertex is in r in the inferred assignment and s in the true assignment. Using this joint probability over the random variables X and Y, the normalized mutual information is

\[
\mathrm{NMI}(X, Y) = \frac{2\, \mathrm{MI}(X, Y)}{H(X) + H(Y)}, \tag{26}
\]

where MI(X, Y) is the mutual information and H(Z) is the entropy of random variable Z. The normalized mutual information measures the similarity of the two community assignments and takes a value of one if the assignments are identical and zero if they are uncorrelated. A discussion of this and other measures can be found in Ref. [28].

FIG. 1: Divisions of the karate club network found using the (a) uncorrected and (b) corrected blockmodels. The size of a vertex is proportional to its degree and vertex color reflects inferred group membership. The dashed line indicates the split observed in real life.

A. Empirical networks

We have tested our algorithms on real-world networks ranging in size from tens to tens of thousands of vertices. In networks with highly homogeneous degree distributions we find little difference in performance between the degree-corrected and uncorrected blockmodels, which is expected since for networks with uniform degrees the two models have the same likelihood up to an additive constant. Our primary concern, therefore, is with networks that have heterogeneous degree distributions, and we here give two examples that show the effects of heterogeneity clearly.

The first example, widely studied in the field, is the “karate club” network of Zachary [29]. This is a social network representing friendship patterns between the 34 members of a karate club at a US university. The club in question is known to have split into two different factions as a result of an internal dispute, and the members of each faction are known. It has been demonstrated that the factions can be extracted from a knowledge of the complete network by many community detection methods.

Applying our inference algorithms to this network, using corrected and uncorrected blockmodels with K = 2, we find the results shown in Fig. 1. As pointed out also by other authors [11, 30], the non-degree-corrected blockmodel fails to split the network into the known factions (indicated by the dashed line in the figure), instead splitting it into a group composed of high-degree vertices and another of low. The degree-corrected model, on the other hand, splits the vertices according to the known communities, except for the misidentification of one vertex on the boundary of the two groups. (The same vertex is also misplaced by a number of other commonly used community detection algorithms.)

The failure of the uncorrected model in this context is precisely because it does not take the degree sequence into account. The a priori probability of an edge between two vertices varies as the product of their degrees, a variation that can be fit by the uncorrected blockmodel if we divide the network into high- and low-degree groups. Given that we have only one set of groups to assign, however, we are obliged to choose between this fit and the true community structure. In the present case it turns out that the division into high and low degrees gives the higher likelihood and so it is this division that the algorithm returns. In the degree-corrected blockmodel, by contrast, the variation of edge probability with degree is already included in the functional form of the likelihood, which frees up the block structure for fitting to the true communities.

Moreover it is apparent that this behavior is not limited to the case K = 2. For K = 3, the ordinary stochastic blockmodel will, for sufficiently heterogeneous degrees, be biased towards splitting into three groups by degree (high, medium, and low), and similarly for higher values of K. It is of course possible that the true community structure itself corresponds entirely or mainly to groups of high and low degree, but we only want our model to find this structure if it is still statistically surprising once we know about the degree sequence, and this is precisely what the corrected model does.

As a second real-world example we show in Fig. 2 an application to a network of political blogs assembled by Adamic and Glance [31]. This network is composed of blogs (i.e., personal or group web diaries) about US politics and the web links between them, as captured on a single day in 2005. The blogs have known political leanings and were labeled by Adamic and Glance as either liberal or conservative in the data set. We consider the network in undirected form and examine only the largest connected component, which has 1222 vertices. Figure 2 shows that, as with the karate club, the uncorrected stochastic blockmodel splits the vertices into high- and low-degree groups, while the degree-corrected model finds a split more aligned with the political division of the network. While not matching the known labeling exactly, the split generated by the degree-corrected model has a normalized mutual information of 0.72 with the labeling of Adamic and Glance, compared with 0.0001 for the uncorrected model. (To make sure that these results were not due to a failure of the heuristic optimization scheme, we also checked that the group assignments found by the heuristic have a higher objective score than the known group assignments, and that using the known assignments as the initial condition for the optimization recovers the same group assignments as found with random initial conditions.)

FIG. 2: Divisions of the political blog network found using the (a) uncorrected and (b) corrected blockmodels. The size of a vertex is proportional to its degree and vertex color reflects inferred group membership. The division in (b) corresponds roughly to the division between liberal and conservative blogs given in [31].

B. Generation of synthetic networks

We turn now to synthetic networks. The networks we use are themselves generated from the degree-corrected
stochastic blockmodel, which is ideally designed to play exactly this role. (Indeed, though it is not the primary focus of this article, we believe that the blockmodel may in general be of use as a source of flexible and challenging benchmark networks for testing the performance of community detection strategies.)

In order to generate networks we must first choose the values of g, ω, and θ. The group assignments g can be chosen in any way we please, and we can also choose freely the values for the expected degrees of all vertices, which fixes the θ variables according to Eq. (18). Choosing the values of $\omega_{rs}$ involves a little more work. In principle, any set of nonnegative values is acceptable provided it is symmetric in r and s and satisfies $\sum_s \omega_{rs} = \kappa_r$, with $\kappa_r$ as in Eq. (12). However, because we wish to be able to vary the level of community structure in our networks we choose $\omega_{rs}$ in the present case to have the particular form

\[
\omega_{rs} = \lambda\,\omega_{rs}^{\text{planted}} + (1-\lambda)\,\omega_{rs}^{\text{random}}. \tag{27}
\]

This form allows us to interpolate linearly between the values $\omega_{rs}^{\text{planted}}$ and $\omega_{rs}^{\text{random}}$ using the parameter λ. The $\omega_{rs}^{\text{random}}$ represents a fully random network with no group structure; it is defined to be the expected value of $\omega_{rs}$ in a random graph with fixed expected degrees [32], which is simply $\omega_{rs}^{\text{random}} = \kappa_r \kappa_s / 2m$.

The value of $\omega_{rs}^{\text{planted}}$ by contrast is chosen to create group structure. A simple example with four groups is:

\[
\omega^{\text{planted}} =
\begin{pmatrix}
\kappa_1 & 0 & 0 & 0 \\
0 & \kappa_2 & 0 & 0 \\
0 & 0 & \kappa_3 & 0 \\
0 & 0 & 0 & \kappa_4
\end{pmatrix}. \tag{28}
\]

With this choice, all edges will be placed within communities when λ = 1 and none between communities. When λ = 0, on the other hand, all edges will be placed randomly, conditioned on the degree sequence, and for intermediate values of λ we interpolate between these two extremes in a controlled fashion. (This model is similar to the benchmark network ensemble previously proposed by Lancichinetti et al. [33]; roughly speaking it is the “canonical ensemble” version of the “microcanonical” model in [33].)

More complicated choices of $\omega_{rs}^{\text{planted}}$ are also possible. Examples include the core-periphery structure

\[
\omega^{\text{planted}} =
\begin{pmatrix}
\kappa_1 - \kappa_2 & \kappa_2 \\
\kappa_2 & 0
\end{pmatrix}, \tag{29}
\]

where $\kappa_1 \ge \kappa_2$. In the case where $\kappa_1 \simeq \kappa_2$ this choice also generates approximately bipartite networks, where most edges run between the two groups and few lie inside. Another possibility is a hierarchical structure of the form

\[
\omega^{\text{planted}} =
\begin{pmatrix}
\kappa_1 - A & A & 0 \\
A & \kappa_2 - A & 0 \\
0 & 0 & \kappa_3
\end{pmatrix}, \tag{30}
\]

where $A \le \min(\kappa_1, \kappa_2)$.

In mixed models such as these, each edge in effect has a probability λ of being chosen from the planted structure and 1 − λ of being chosen from the null model. Among the edges attached to a given vertex, the expected fraction drawn from the planted structure is λ and the remainder are drawn randomly.

Once we have chosen our values for g, θ, and ω, the network generation itself is a straightforward implementation of the blockmodel: we first draw a Poisson-distributed number of edges for each pair of groups r, s with mean $\omega_{rs}$ (or $\tfrac{1}{2}\omega_{rs}$ when r = s), then we assign each end of an edge to a vertex in the appropriate group with probability $\theta_i$.

C. Performance on synthetic networks

There are two primary considerations in comparing the degree-corrected and uncorrected blockmodels on our synthetic benchmark networks. The first is how close the group assignments found in our calculations are to the planted group assignments. The second is the performance of the heuristic optimization algorithm. It is possible that the maximum-likelihood group assignment may be close to the true group assignment but that our heuristic is unable to find it. And if the heuristic performs better in general for either the corrected or uncorrected blockmodel it may make comparisons between the models unreliable: we want to claim that the degree-corrected model gives better results than the uncorrected version because it has a better objective function for heterogeneous networks and not because we used a biased optimization algorithm.

To shed light on these questions we take the following approach. For both the degree-corrected model and the uncorrected model we perform tests with random initial conditions and with initial conditions equal to the known planted group structure. The latter (planted) initializations tell us whether the planted group assignment, or something close to it, is a local optimum of the respective objective function; if it is, our heuristic should find that optimum most of the time and return a final assignment similar to the planted one. This should be true for essentially any reasonable heuristic, even a biased one, since the heuristic will be making only minimal changes to the group assignments (or none at all).

For small values of λ we expect that the planted assignment is not near a local maximum, but for large λ we would hope that it is. Thus, if we discover in the process of running our heuristic that it is not, it strongly suggests we have made a poor choice of objective function (and this conclusion should hold even if the heuristic is biased).

The results of such tests on our synthetic networks are shown in Fig. 3. We plot the normalized mutual information as a function of λ for various choices of planted structure. Each data point represents an average over 30 networks of size n = 1000 for both the degree-corrected
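As an illustration of Eqs. (27) to (30), the following minimal Python sketch builds the planted and random matrices and mixes them. It is a sketch under stated assumptions, not the code used for the results reported here: all function names are hypothetical, the group totals κ_r are passed in as a plain list, and 2m is taken to be the sum of the κ_r. For any λ between 0 and 1 the rows of the mixed matrix should still sum to κ_r, which is the constraint on ω_rs stated in the text.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def omega_random(kappa: Sequence[float]) -> Matrix:
    """Null model: omega_rs = kappa_r * kappa_s / (2m).

    Assumption of this sketch: 2m equals the sum of all kappa_r.
    """
    two_m = float(sum(kappa))
    return [[kr * ks / two_m for ks in kappa] for kr in kappa]


def omega_diagonal(kappa: Sequence[float]) -> Matrix:
    """Planted structure of Eq. (28): every edge lies inside a group."""
    k = len(kappa)
    return [[float(kappa[r]) if r == s else 0.0 for s in range(k)]
            for r in range(k)]


def omega_core_periphery(kappa: Sequence[float]) -> Matrix:
    """Planted structure of Eq. (29); requires kappa_1 >= kappa_2."""
    k1, k2 = kappa
    if k1 < k2:
        raise ValueError("core-periphery structure needs kappa_1 >= kappa_2")
    return [[k1 - k2, k2], [k2, 0.0]]


def omega_hierarchical(kappa: Sequence[float], a: float) -> Matrix:
    """Planted structure of Eq. (30); requires A <= min(kappa_1, kappa_2)."""
    k1, k2, k3 = kappa
    if a > min(k1, k2):
        raise ValueError("hierarchical structure needs A <= min(kappa_1, kappa_2)")
    return [[k1 - a, a, 0.0], [a, k2 - a, 0.0], [0.0, 0.0, k3]]


def omega_mixed(lam: float, planted: Matrix, rand: Matrix) -> Matrix:
    """Linear interpolation of Eq. (27) between planted and random parts."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    return [[lam * p + (1.0 - lam) * q for p, q in zip(p_row, q_row)]
            for p_row, q_row in zip(planted, rand)]


def row_sums(omega: Matrix) -> List[float]:
    """Row sums of omega; these should equal kappa_r for a valid choice."""
    return [sum(row) for row in omega]
```

For instance, `omega_mixed(0.5, omega_diagonal(kappa), omega_random(kappa))` gives a matrix in which half of the expected edge weight follows the planted diagonal structure and half follows the null model.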
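The generation procedure of Sec. IV B (a Poisson number of edges for each pair of groups, then each end of an edge assigned to a vertex with probability θ_i) can be sketched as follows. This is an illustrative implementation only, not the code used in this work: `poisson` and `generate_network` are hypothetical names, θ is assumed to be normalized to sum to one within each group, and the Poisson sampler simply counts unit-rate exponential arrivals, which is easy to verify but takes time proportional to the mean. Self-edges and multiedges are kept rather than filtered, which we assume is what the model intends.

```python
import random
from typing import List, Sequence, Tuple


def poisson(mean: float, rng: random.Random) -> int:
    """Poisson sample: count unit-rate exponential arrivals before time `mean`."""
    count = 0
    t = rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def generate_network(groups: Sequence[int],
                     theta: Sequence[float],
                     omega: Sequence[Sequence[float]],
                     rng: random.Random) -> List[Tuple[int, int]]:
    """Draw an edge list from the degree-corrected blockmodel.

    groups[i] is the group of vertex i, theta[i] its within-group
    probability (assumed to sum to 1 in each group), and omega[r][s]
    the symmetric matrix of expected edge weights between groups.
    """
    k = len(omega)
    members: List[List[int]] = [[] for _ in range(k)]
    for vertex, r in enumerate(groups):
        members[r].append(vertex)
    weights = [[theta[v] for v in members[r]] for r in range(k)]

    edges: List[Tuple[int, int]] = []
    for r in range(k):
        for s in range(r, k):
            # Mean omega_rs between groups, omega_rs / 2 within a group.
            mean = omega[r][s] / 2.0 if r == s else omega[r][s]
            count = poisson(mean, rng)
            if count == 0:
                continue
            # Each end of each edge is chosen independently with prob. theta_i.
            ends_r = rng.choices(members[r], weights=weights[r], k=count)
            ends_s = rng.choices(members[s], weights=weights[s], k=count)
            edges.extend(zip(ends_r, ends_s))
    return edges
```

With a purely diagonal ω (λ = 1) every edge returned by `generate_network` joins two vertices of the same group, which gives a quick sanity check on the sampler.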

[Figure 3: three panels (left, middle, right), each plotting NMI on the vertical axis (0.0 to 1.0) against λ on the horizontal axis (0.0 to 1.0).]

FIG. 3: The average normalized mutual information as a function of λ for the three synthetic tests described in the text.
Filled squares and transparent circles indicate tests initialized with planted and random assignments respectively. Green points
denote results for the degree-corrected blockmodel and black for the ordinary uncorrected model. The left, middle, and right
panels show respectively the results for the two-group two-degree networks, core-periphery networks, and hierarchical networks.
The error bars indicate the standard error on the mean computed from 30 networks per data point.
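The quantities plotted in Fig. 3 can be computed along the following lines. This is a minimal sketch rather than the code behind the figure: the function names are hypothetical, and the normalization 2I(X;Y)/[H(X)+H(Y)] is an assumption that should be checked against the definition of the normalized mutual information given earlier in the paper. The standard error is the sample standard deviation divided by the square root of the number of networks.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def nmi(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Normalized mutual information between two group assignments.

    Assumed normalization: 2 I(A;B) / (H(A) + H(B)).
    """
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("assignments must be non-empty and of equal length")
    count_a, count_b = Counter(a), Counter(b)
    count_ab = Counter(zip(a, b))

    def entropy(counts: Counter) -> float:
        return -sum(c / n * math.log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    mutual = sum(c / n * math.log(c * n / (count_a[x] * count_b[y]))
                 for (x, y), c in count_ab.items())
    if h_a + h_b == 0.0:
        # Both assignments put every vertex in one group; returning 1.0
        # here is a convention chosen for this sketch.
        return 1.0
    return 2.0 * mutual / (h_a + h_b)


def mean_and_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error on the mean (sample std / sqrt(n))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values for a standard error")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)
```

Because the measure depends only on how vertices are grouped, relabeling the groups leaves it unchanged: `nmi([0, 0, 1, 1], [1, 1, 0, 0])` evaluates to 1, while two independent splits such as `[0, 0, 1, 1]` and `[0, 1, 0, 1]` give 0.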

and uncorrected objective functions. In the case of random initializations, ten initializations were performed for each network and we take the best result among the ten.

The left panel in the figure shows results for networks with two communities and just two possible values of the expected degree, 10 and 30. Each of the 1000 vertices was assigned to one of the four possible combinations of degree and community with equal probability, and the planted structure was chosen diagonal, as in Eq. (28).

The green points in the figure indicate the performance of the degree-corrected blockmodel, while the black points are for the uncorrected model. Solid squares and open circles show performance starting from the planted community structure and random assignments respectively. Bearing in mind that λ = 0 corresponds to zero planted structure (in which case neither algorithm should find any significant result) and that a normalized mutual information approaching 1 indicates successful detection of the planted structure, we can see from the figure that the degree-corrected blockmodel significantly out-performs the uncorrected one in this simple test. As λ increases from zero, the mutual information for all algorithms rises, but the corrected model starts to detect some signatures of the planted structure almost immediately and for λ = 1/2 returns a normalized mutual information above 0.7 for both initial conditions. The uncorrected model, by contrast, finds no planted structure at λ = 1/2 for either initialization, including when the algorithm is initialized to the known correct answer. The reason for this poor performance is precisely because of the variation in degrees: for values of λ up to around 0.6 the uncorrected model fits these networks better if vertices are assigned to groups according to their degree than if they are assigned according to the planted structure, and hence the best-fit group structure has no correlation with the planted structure.

We have also tested our blockmodels against synthetic networks with two other types of structure, one the core-periphery or approximately bipartite structure of Eq. (29) and the other the hierarchical structure of Eq. (30). In these examples we use a more realistic degree distribution that approximately follows a power law with a minimum expected degree of 10 and an exponent of −2.5. For the core-periphery networks we randomly assign vertices to one of the two groups, while for the hierarchical networks we fix 500 vertices to be in the first of the three groups, put the rest randomly in the other two, and set $A = \tfrac{1}{4}\min(\kappa_1, \kappa_2)$. (It has been suggested that choosing non-equal sizes for groups in this way presents a harder challenge for structure detection algorithms [30, 34].)

The performance of our blockmodels on these two classes of networks is shown in the middle (core-periphery) and right (hierarchical) panels of Fig. 3. Again we see that the normalized mutual information increases with increasing λ for all algorithms but that the degree-corrected blockmodel performs significantly better than the uncorrected model. The degree-corrected model with planted assignments consistently does the best among the four options as we would expect, and the degree-corrected model with random initializations performs respectably in all cases, although it’s entirely possible that better performance could be obtained with a better optimization strategy. The performance of the uncorrected model with random initializations, on the other hand, is quite poor [35]. But perhaps the most telling comparison is the one between the degree-corrected model with random initial assignments and the uncorrected model with the planted assignment. This comparison tilts the playing field heavily in favor of the uncorrected model and yet, as Fig. 3 shows, the degree-corrected model still performs about as well as, and in
some cases better than, the uncorrected model.

V. CONCLUSIONS

In this paper, we have studied how one can incorporate heterogeneous vertex degrees into stochastic blockmodels in a simple way, improving the performance of the models for statistical inference of group structure. The resulting degree-corrected blockmodels can also be used as generative models for creating benchmark networks, retaining the generality and tractability of other blockmodels while producing degree sequences closer to those of real networks.

We have found the performance of the degree-corrected model for inference of group structure to be quantitatively better on both synthetic and real-world test networks than the uncorrected model. In networks with substantial degree heterogeneity, the uncorrected model prefers to split networks into groups of high and low degree, and this preference can prevent it from finding the true group memberships. The degree-corrected model correctly ignores divisions based solely on degree and hence is more sensitive to underlying structure.

It seems likely that other more sophisticated blockmodels, such as the recently proposed overlapping and mixed membership models, would benefit from incorporating degree sequences also. In applications to online social network data, for example, where overlapping groups are common, there is frequently substantial degree heterogeneity and hence potentially significant benefits to using a degree-corrected model.

The degree-corrected blockmodel is not without its faults. For instance, the model as described can produce an unrealistic number of zero-degree vertices, and is also unable to model some degree sequences, such as those in which certain values of the degree are entirely forbidden. As a model of real-world networks, it may also fail to accurately represent higher-order network structure such as overrepresented network motifs or degree correlations. From a statistical point of view, it is also somewhat unsatisfactory that the number of parameters in the model scales with the size of the network, which for example prevents fits to a network of one size being used to generate synthetic networks of another size.

But perhaps the chief current drawback of the model is that the number K of blocks or groups in the network is assumed given. In most structure detection problems the number of groups is not known and a complete calculation will therefore require not only the algorithms described in this paper but also a method for estimating K. Some previously suggested approaches to this problem include cross-validation [14], minimum description length methods using two-part or universal codes [30], maximization of a marginal likelihood [10], and nonparametric Bayesian methods. The marginal likelihood for our degree-corrected blockmodel can be computed explicitly if one assumes conjugate priors on the parameters (Dirichlet for θ and gamma for ω), but then one must also choose the parameters of those priors, called hyperparameters in the statistical literature. In principle one wants to choose values of the hyperparameters that provide asymptotic consistency: the blockmodel should return the correct number of groups when applied to a network generated from the same blockmodel, at least in certain limits. At present, however, it is not known how to make this choice. An alternative possibility is to note that the blockmodel used here is equivalent to a model that generates an ensemble of matrices with integer entries, implying potential connections to the large statistical literature on contingency table analysis that could be helpful in determining the number of groups in a principled fashion. We leave these questions for future work.

Acknowledgments

The authors would like to thank Joel Bader and Cris Moore for useful conversations, and Joel Bader for sharing an early draft of Ref. [21]. This work was funded in part by the National Science Foundation under grant DMS–0804778 and by the James S. McDonnell Foundation.

[1] P. W. Holland, K. B. Laskey, and S. Leinhardt, Social Networks 5, 109 (1983).
[2] K. Faust and S. Wasserman, Social Networks 14, 5 (1992).
[3] C. J. Anderson, S. Wasserman, and K. Faust, Social Networks 14, 137 (1992).
[4] T. A. Snijders and K. Nowicki, Journal of Classification 14, 75 (1997).
[5] A. Goldenberg, A. X. Zheng, S. E. Feinberg, and E. M. Airoldi, Foundations and Trends in Machine Learning 2, 1 (2009).
[6] A. Condon and R. M. Karp, Random Structures and Algorithms 18, 116 (1999).
[7] L. Danon, J. Duch, A. Diaz-Guilera, and A. Arenas, J. Stat. Mech. P09008 (2005).
[8] S. Fortunato, Physics Reports 486, 75 (2010).
[9] M. Hastings, Phys. Rev. E 74, 035102 (2006).
[10] J. M. Hofman and C. H. Wiggins, Phys. Rev. Lett. 100, 258701 (2008).
[11] P. J. Bickel and A. Chen, Proc. Natl. Acad. Sci. USA 106, 21068 (2009).
[12] Technically this is only true if the number K of groups is moderate. If K can become arbitrarily large then it is possible to represent any network trivially as a stochastic blockmodel by making each vertex its own group and setting the independent probability of each edge to one or zero depending on whether an edge exists between the corresponding vertices in the empirical data.
[13] P. Latouche, E. Birmele, and C. Ambroise, preprint arXiv:0910.2098 (2009).
[14] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, J. Mach. Learn. Res. 9, 1981 (2008).
[15] Y. Park, C. Moore, and J. S. Bader, PLoS One 5, e8118 (2010).
[16] Y. J. Wang and G. Y. Wong, Journal of the American Statistical Association 82, 8 (1987).
[17] A. Dasgupta, J. E. Hopcroft, and F. McSherry, in Proceedings of the 45th Annual Symposium on Foundations of Computer Science, p. 602 (2004).
[18] J. Reichardt and D. White, Eur. Phys. J. B 60, 217 (2007).
[19] M. Mørup and L. K. Hansen, in NIPS Workshop on Analyzing Networks and Learning With Graphs (2009).
[20] A. Coja-Oghlan and A. Lanka, SIAM Journal on Discrete Mathematics 23, 1682 (2009).
[21] A. S. Patterson and J. S. Bader, unpublished.
[22] A. Clauset, C. Moore, and M. E. J. Newman, Nature 453, 98 (2008).
[23] T. M. Cover and J. A. Thomas, Elements of Information Theory, Second Edition (Wiley, New York, 2006).
[24] I. S. Dhillon, S. Mallela, and D. S. Modha, in Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, p. 89 (ACM, New York, 2003).
[25] M. E. J. Newman, Phys. Rev. E 67, 026126 (2003).
[26] M. E. J. Newman, Phys. Rev. Lett. 89, 208701 (2002).
[27] B. W. Kernighan and S. Lin, Bell System Technical Journal 49, 291 (1970).
[28] M. Meila, Journal of Multivariate Analysis 98, 873 (2007).
[29] W. Zachary, Journal of Anthropological Research 33, 452 (1977).
[30] M. Rosvall and C. T. Bergstrom, Proc. Natl. Acad. Sci. USA 104, 7327 (2007).
[31] L. A. Adamic and N. Glance, in Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem (2005).
[32] F. Chung and L. Lu, Proc. Natl. Acad. Sci. USA 99, 15879 (2002).
[33] A. Lancichinetti, S. Fortunato, and F. Radicchi, Phys. Rev. E 78, 046110 (2008).
[34] L. Danon, A. Diaz-Guilera, and A. Arenas, J. Stat. Mech. P11010 (2006).
[35] This algorithm’s performance may not be quite as bad as the figure seems to indicate because for high λ the mutual information values are roughly bimodal, with most values being close to either 0 or 1. The mean shown is thus roughly the percentage of times that the uncorrected model successfully finds the planted structure. The same does not apply for the other curves, where the error represents relatively minor variation about the mean.
