
Module3_CommunityNetworks_2

Reference: R. Zafarani, M. A. Abbasi, and H. Liu, Social Media Mining: An Introduction, Cambridge University Press, 2014. Book at http://socialmediamining.info/
Network and Community Evolution
• Community detection algorithms discussed so
far assume that networks are static
– Their nodes and edges are fixed and do not change
over time
• In reality, with the rapid growth of social media,
networks and their internal communities
change over time.
• Earlier community detection algorithms have to
be extended to deal with evolving networks.
Network and Community Evolution
• How does a network change over time?
• How does a community change over time?
• What properties do we expect to remain
roughly constant?
• What properties do we expect to change?
• For example,
– Where do we expect new edges to form?
– Which edges do we expect to be dropped?
Network Growth Patterns
• Large social networks are highly dynamic, where
nodes and links appear or disappear over time
• In these evolving networks, many interesting
patterns are observed
– For instance, when distances (in terms of shortest
path distance) between two nodes increase, their
probability of getting connected decreases
Network Growth Patterns:
1. Network Segmentation
2. Graph Densification
3. Diameter Shrinkage
1. Network Segmentation
• Often, in evolving networks, segmentation takes place: the large network is decomposed over time into three parts
• Giant component: as network connections stabilize, a giant component of nodes is formed, with a large proportion of network nodes and edges falling into this component (dark gray)
• Stars: isolated parts of the network that form star structures; a star is a tree with one internal node and n leaves (medium gray)
• Singletons: orphan nodes disconnected from all nodes in the network (light gray)
2. Graph Densification
• Density of the graph increases as the network grows
– The number of edges increases faster than the number of nodes
• This phenomenon is called densification
• Let |V(t)| denote the number of nodes at time t and |E(t)| denote the number of edges at time t
• If densification happens, then we have
  |E(t)| ∝ |V(t)|^α
• Densification exponent α, with 1 ≤ α ≤ 2:
– α = 1: linear growth (constant average out-degree)
– α = 2: quadratic growth (clique structures)
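Assuming the power-law relation above, the densification exponent can be estimated from a sequence of snapshots with a log-log fit; the node and edge counts below are hypothetical, for illustration only.

```python
import numpy as np

# Hypothetical snapshot sizes |V(t)| and |E(t)| of an evolving network.
nodes = np.array([100, 200, 400, 800, 1600])
edges = np.array([300, 780, 2030, 5280, 13700])

# If |E(t)| is proportional to |V(t)|**alpha, the slope of a
# log-log fit estimates the densification exponent alpha.
alpha, log_c = np.polyfit(np.log(nodes), np.log(edges), 1)
print(round(alpha, 2))  # ~1.38: between 1 (linear) and 2 (clique-like)
```

A slope strictly greater than 1 indicates that edges grow superlinearly in the number of nodes, i.e., the network densifies.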
Densification in Real Networks
• Examples (figures omitted): log-log plots of |E(t)| versus |V(t)| for real networks show slopes between 1 and 2
3. Diameter Shrinking
• In evolving networks, the diameter shrinks over time
• Examples (figures omitted)
Community Evolution
• Communities in evolving networks also evolve.
• They appear, grow, shrink, split, merge, or even dissolve over
time (in dynamic networks)
Community Detection in Evolving Networks
• Consider an instant messaging (IM) application in social media.
• In these IM systems, members become “available” or “online”
frequently.
• Consider individuals as nodes and messages between them as edges.
– In this example, we are interested in finding a community of individuals who
send messages to one another frequently.
– Community detection at any time stamp is not a valid solution because
interactions are limited at any point in time.
– A valid solution to this problem needs to use temporal information and
interactions between users over time.
• Hence, community detection algorithms must incorporate temporal
information
• To incorporate temporal information, previously discussed static
methods are extended.
Communities in Evolving Networks - Extending Previous Methods
1. Take t snapshots of the network, G1, G2, ..., Gt, where Gi is the snapshot at time i
2. Perform a static community detection algorithm (any of the methods discussed before) on all snapshots independently
3. Assign community members based on communities found across all time stamps
– E.g., assign nodes to communities based on voting (assign nodes to the communities they belong to the most over time)
• This method is unstable in highly dynamic networks, as community memberships are always changing
Communities in Evolving Networks - Extending Previous Methods
• Let us consider 3 snapshots of a network at times t1, t2, and t3.
• Assume that the following communities are detected using one of the existing algorithms. Use the extended (voting-based) community detection method to detect communities.
– Communities detected on the network snapshot at time t1: {1,3,4}, {2,5,7}, {6,9}
– Communities detected on the network snapshot at time t2: {1,3,5}, {2,4}, {6,7}
– Communities detected on the network snapshot at time t3: {1,3,8}, {2,4,5}, {6,7}
• Voting result: {1,3,8}, {2,4,5}, {6,7,9}
• If t1 < t2 < t3 and more recent snapshots are weighted more heavily, actor 9 (seen only at t1) in the last community may be excluded.
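A minimal Python sketch of this voting scheme. It assumes communities are matched across snapshots by position (first, second, third), a simplifying assumption; real pipelines would match communities between snapshots, e.g., by maximum overlap.

```python
from collections import Counter

def vote_communities(snapshots):
    """Majority-vote community assignment across snapshots.

    `snapshots` is a list of partitions; each partition is a list
    of node sets, assumed matched across snapshots by position.
    """
    votes = {}  # node -> Counter over community indices
    for partition in snapshots:
        for idx, community in enumerate(partition):
            for node in community:
                votes.setdefault(node, Counter())[idx] += 1
    k = max(len(p) for p in snapshots)
    result = [set() for _ in range(k)]
    for node, counter in votes.items():
        # Assign each node to the community it belonged to the most.
        result[counter.most_common(1)[0][0]].add(node)
    return result

# Snapshots from the example at times t1, t2, t3.
snaps = [
    [{1, 3, 4}, {2, 5, 7}, {6, 9}],
    [{1, 3, 5}, {2, 4}, {6, 7}],
    [{1, 3, 8}, {2, 4, 5}, {6, 7}],
]
print(vote_communities(snaps))
```

This recovers {1,3,8}, {2,4,5}, {6,7,9}; weighting later snapshots more heavily would instead drop actor 9.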


Evolutionary Clustering
• Assume that communities do not change drastically most of the time
• Minimize an objective function that considers
– Snapshot cost (SC): how well communities fit the network at each time
– Temporal cost (TC): how smoothly communities evolve over time
• The objective function is defined as
  Cost = α · SC + (1 − α) · TC,  0 ≤ α ≤ 1
Evolutionary Clustering (con’t)
• If we use spectral clustering for each snapshot, then the objective function at time t uses
  SC = Tr(X_t^T L̃_t X_t),  subject to X_t^T X_t = I
• X_t is the community membership matrix at time t, and L̃_t is the normalized Laplacian of the snapshot at time t
• To define TC we can use
  TC = (1/2) ||X_t − X_{t−1}||_F^2
• Challenges with this definition
– Assumes that we have the same number of communities at time t − 1 and t
– X_t is non-unique (any orthogonal transformation X_t Q is still a solution)
Evolutionary Clustering (con’t)
• We can instead define TC as
  TC = (1/2) ||X_t X_t^T − X_{t−1} X_{t−1}^T||_F^2
• Hence the cost will be
  Cost = α · Tr(X_t^T L̃_t X_t) + (1 − α) · (1/2) ||X_t X_t^T − X_{t−1} X_{t−1}^T||_F^2
Evolutionary Clustering (con’t)
• Assuming the normalized Laplacian L̃_t = I − D^{−1/2} W_t D^{−1/2} is used, and using X_t^T X_t = I, the cost simplifies (up to a constant) to
  Cost = Tr(X_t^T L̂ X_t),  where L̂ = α L̃_t − (1 − α) X_{t−1} X_{t−1}^T
• Similar to spectral clustering, X_t can be obtained from the eigenvectors of L̂ corresponding to its k smallest eigenvalues (equivalently, the top eigenvectors of −L̂)
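One step of this procedure can be sketched with NumPy. This is a sketch under the cost formulation described above: alpha trades off snapshot cost against temporal cost, and the relaxed memberships are the smallest eigenvectors of the combined matrix; the toy graph is an assumption for illustration.

```python
import numpy as np

def evolutionary_spectral_step(W_t, X_prev, k, alpha=0.8):
    """One evolutionary spectral clustering step (sketch).

    Returns the relaxed membership matrix X_t: the k eigenvectors
    with the smallest eigenvalues of
        L_hat = alpha * L_norm - (1 - alpha) * X_prev @ X_prev.T
    which minimizes alpha*SC + (1-alpha)*TC up to a constant.
    """
    d = W_t.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L_norm = np.eye(len(W_t)) - d_inv_sqrt @ W_t @ d_inv_sqrt
    L_hat = alpha * L_norm - (1 - alpha) * (X_prev @ X_prev.T)
    eigvals, eigvecs = np.linalg.eigh(L_hat)  # ascending eigenvalues
    return eigvecs[:, :k]

# Toy snapshot: two triangles joined by a single edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

# With alpha=1.0 the temporal term vanishes (plain spectral clustering);
# the result then serves as X_{t-1} for the next step.
X_prev = evolutionary_spectral_step(W, np.zeros((6, 2)), k=2, alpha=1.0)
X_t = evolutionary_spectral_step(W, X_prev, k=2, alpha=0.8)
```

As in static spectral clustering, communities are then obtained by running k-means on the rows of X_t.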
Evaluating the Communities
• When communities are found, one must evaluate how accurately the detection task has been performed
• Suppose we are given objects of two different kinds (two label types)
• The perfect community: all objects inside the community are of the same type
• Two scenarios
– Evaluation with ground truth
– Evaluation without ground truth
Evaluation with Ground Truth
• When ground truth is available
– We have partial knowledge of what communities should look
like
– Assume that we are given the correct community (clustering)
assignments

• Measures
– Precision and Recall
– F-Measure
– Purity
– Normalized Mutual Information (NMI)
Precision and Recall
• Community detection can be considered a problem of
assigning all similar nodes to the same community.
• In the simplest case, any two similar nodes should be
considered members of the same community.
• Based on our assignments, four cases can occur (four
ways of being right or wrong):
– True Positive (TP) Assignment
– True Negative (TN) Assignment
– False Negative (FN) Assignment
– False Positive (FP) Assignment
Precision and Recall
• True Positive (TP): when similar members are assigned to the same community (a correct decision)
• True Negative (TN): when dissimilar members are assigned to different communities (a correct decision)
• False Negative (FN): when similar members are assigned to different communities (an incorrect decision)
• False Positive (FP): when dissimilar members are assigned to the same community (an incorrect decision)
• Precision: the fraction of pairs assigned to the same community that were correctly assigned, P = TP / (TP + FP)
• Recall: of all the pairs that should have been in the same community, the fraction that the community detection algorithm actually assigned to the same community, R = TP / (TP + FN)
Precision and Recall: Example
• Precision defines the fraction of pairs that
have been correctly assigned to the same
community.
• Recall defines the fraction of pairs that the
community detection algorithm assigned to
the same community of all the pairs that
should have been in the same community.
Precision and Recall: Example
• TP: the number of pairs of nodes with the same label that are in the same community
• FP: pairs with different labels placed in the same community; FN: pairs with the same label placed in different communities
• Hence precision and recall follow from these counts (the worked figure for this example is omitted)
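The pairwise counting described above can be implemented directly. Since the original example figure is not reproduced, the node assignments below are hypothetical.

```python
from itertools import combinations

def pairwise_precision_recall(communities, labels):
    """Pairwise precision and recall for a detected partition.

    `communities` maps node -> community id; `labels` maps
    node -> ground-truth label. A pair counts as TP if it shares
    both community and label, FP if it shares only the community,
    FN if it shares only the label.
    """
    tp = fp = fn = 0
    for u, v in combinations(sorted(communities), 2):
        same_comm = communities[u] == communities[v]
        same_label = labels[u] == labels[v]
        if same_comm and same_label:
            tp += 1
        elif same_comm:
            fp += 1
        elif same_label:
            fn += 1
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical assignment: 5 nodes, 2 communities, 2 labels.
comms = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2}
labels = {"a": "x", "b": "x", "c": "y", "d": "y", "e": "y"}
p, r = pairwise_precision_recall(comms, labels)
print(p, r)  # 0.5 0.5
```

Here TP = 2 (pairs ab and de), FP = 2 (ac, bc), and FN = 2 (cd, ce), giving P = R = 0.5.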
F-Measure
• Consolidation of precision and recall into one measure
– To integrate them into one measure, we can use the harmonic mean of precision and recall:
  F = 2 · P · R / (P + R)
– For the earlier example, F is obtained by substituting the computed precision and recall values
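As a quick illustration of the harmonic mean (the precision and recall values here are hypothetical, not the slide's worked numbers):

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical precision/recall values.
print(f_measure(0.5, 0.75))  # 0.6
```

Note the harmonic mean is dominated by the smaller of the two values, so a high F requires both precision and recall to be high.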
Purity
• We can assume that the majority label of a community represents the community
– Hence, we compare the label of the majority of the community against the label of each member of the community to evaluate the algorithm
– Purity: the fraction of instances that have labels equal to their community’s majority label
  Purity = (1/N) Σ_{j=1}^{k} max_i |C_j ∩ L_i|
• k: the number of communities
• N: the total number of nodes
• L_i: the set of instances with label i in all communities
• C_j: the set of members in community j
Purity - Example
• (Example figure omitted.) The majority in Community 1 is X; therefore, we assume majority label X for that community.
• The purity is then defined as the fraction of instances that have labels equal to their community’s majority label.
• Purity = (5 + 6 + 4) / 20 = 0.75
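The same computation can be sketched in Python. The communities and labels below are hypothetical, since the slide's 20-node figure is not reproduced here.

```python
from collections import Counter

def purity(communities, labels):
    """Fraction of nodes whose label equals their community's
    majority label. `communities` is a list of node lists;
    `labels` maps node -> true label."""
    n = sum(len(c) for c in communities)
    correct = sum(Counter(labels[v] for v in c).most_common(1)[0][1]
                  for c in communities)
    return correct / n

# Hypothetical data: 7 nodes in 2 communities.
comms = [["a", "b", "c", "d"], ["e", "f", "g"]]
labels = {"a": "X", "b": "X", "c": "X", "d": "Y",
          "e": "Y", "f": "Y", "g": "X"}
print(purity(comms, labels))  # majority counts 3 + 2 -> 5/7
```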
Purity
• Purity can be easily manipulated to generate high values
– Consider when every node forms a singleton community (of size 1), or
– when we have one very large pure community (ground truth = majority label)
– In both cases, purity produces high values even though the detected communities are not meaningful
• A more precise measure that solves the problems associated with purity is normalized mutual information (NMI), which originates from information theory
Mutual Information
• Mutual information (MI): the amount of information that two random variables share
– By knowing one of the variables, MI measures the amount of uncertainty reduced regarding the other variable
– For two random variables X and Y:
  MI(X; Y) = Σ_x Σ_y P(x, y) log [ P(x, y) / (P(x) P(y)) ]
– Here P(x, y) is the joint distribution and P(x), P(y) are the marginal distributions
Mutual Information
• For labels L and found communities H, mutual information is defined by
  MI = Σ_{h∈H} Σ_{l∈L} (n_{h,l} / n) log [ (n · n_{h,l}) / (n_h · n_l) ]
• n_h and n_l are the number of data points in community h and with label l, respectively
• n_{h,l} is the number of nodes in community h with label l, and n is the total number of nodes
Normalizing Mutual Information (NMI)
• Mutual information (MI) is unbounded
• It is common for measures to have values in the range [0, 1]
• To address this issue, we can normalize MI
• The following inequality, stated without proof, helps us normalize mutual information:
  MI ≤ √(H(L) · H(H)),  since MI ≤ H(L) and MI ≤ H(H)
• H(·) is the entropy function
Normalized Mutual Information
• √(H(L) · H(H)) is used to normalize mutual information:
  NMI = MI / √(H(L) · H(H))
Normalized Mutual Information
• NMI = MI / √(H(L) · H(H))
• Expanding MI and the entropies in terms of counts, we can also define it as
  NMI = [ Σ_h Σ_l n_{h,l} log ( (n · n_{h,l}) / (n_h · n_l) ) ] / √( ( Σ_h n_h log (n_h / n) ) · ( Σ_l n_l log (n_l / n) ) )
• Note that 0 ≤ NMI ≤ 1
Normalized Mutual Information
– where L and H are the known labels and the found communities, respectively
– n_h and n_l are the number of members in community h and with label l, respectively
– n_{h,l} is the number of members in community h labeled l
– n is the size of the dataset
• NMI values close to one indicate high similarity between the communities found and the labels
– Values close to zero indicate high dissimilarity between them
Normalized Mutual Information: Example
• Found communities: H = [1,1,1,1,1,1, 2,2,2,2,2,2,2,2]
• Actual labels: L = [2,1,1,1,1,1, 2,2,2,2,2,2,1,1]
• n = 14
• Community sizes: n_{h=1} = 6, n_{h=2} = 8; label sizes: n_{l=1} = 7, n_{l=2} = 7
• Overlap counts: n_{1,1} = 5, n_{1,2} = 1, n_{2,1} = 2, n_{2,2} = 6
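The count-based NMI formula can be evaluated directly on this example:

```python
from math import log, sqrt
from collections import Counter

def nmi(found, labels):
    """Count-based NMI between found communities and true labels."""
    n = len(found)
    nh = Counter(found)                # community sizes n_h
    nl = Counter(labels)               # label sizes n_l
    nhl = Counter(zip(found, labels))  # overlap counts n_{h,l}
    mi_num = sum(c * log(n * c / (nh[h] * nl[l]))
                 for (h, l), c in nhl.items())
    denom = sqrt(sum(c * log(c / n) for c in nh.values())
                 * sum(c * log(c / n) for c in nl.values()))
    return mi_num / denom

H = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]  # found communities
L = [2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1]  # actual labels
print(round(nmi(H, L), 2))  # ~0.26
```

The value is well below 1, reflecting the four mismatched nodes between the found communities and the labels.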
Evaluation without Ground Truth
• Evaluation with semantics
– A simple way of analyzing detected communities is to analyze other attributes of community members (posts, profile information, content generated, etc.) to see whether there is coherency among community members
– The coherency is often checked via human subjects
– To help analyze these communities, one can use word frequencies
– By generating a list of frequent keywords for each community, human subjects determine whether these keywords represent a coherent topic
• Evaluation using clustering quality measures
– Use clustering quality measures such as the sum of squared error (SSE) or inter-cluster distance
– Run two or more community detection algorithms, compare the results, and pick the algorithm with the better quality measure