05 Motifs PDF

CS224W: Analysis of Networks
Jure Leskovec, Stanford University

http://cs224w.stanford.edu
Network Metrics
¡ Many metrics at the node level:
§ E.g., node degree, PageRank score, node clustering, betweenness, closeness
¡ Many metrics at the whole-network level:
§ E.g., diameter, average distance, density, clustering coefficient, size of giant component
¡ What about something in-between?
§ A mesoscale characterization of networks
Macroscopic (whole network) > Mesoscopic (??) > Microscopic (single node)
Figure credit: Pedro Ribeiro
10/9/18 Jure Leskovec, Stanford CS224W: Analysis of Networks 2
Building Blocks of Networks
¡ Subnetworks, or subgraphs, are the building blocks of networks
¡ They have the power to characterize and discriminate networks
Figure credit: Pedro Ribeiro
Figure: Subgraph decomposition of an electronic circuit (image: Oxford Protein Informatics Group)
Example Application
Consider all possible (non-isomorphic) directed subgraphs of size 3
Figure credit: Pedro Ribeiro
¡ For each subgraph:
§ Imagine you have a metric capable of classifying the
subgraph “significance” [more on that later]
§ Negative values indicate under-representation
§ Positive values indicate over-representation
¡ We create a network significance profile:

§ A feature vector with values for all subgraph types
¡ Next: Compare profiles of different networks:

§ Regulatory network (gene regulation)
§ Neuronal network (synaptic connections)
§ World Wide Web (hyperlinks between pages)
§ Social network (friendships)
§ Language networks (word adjacency)
Example Application
Figure: Network significance profiles, grouped into gene regulation networks, neurons, web and social networks, and language networks (Milo et al., Science 2004)
Different networks have similar significance profiles
Example Application
Figure: Network significance profile similarity. Correlation of significance profiles and clustering of networks based on their significance profiles (Milo et al., Science 2004)
Annotation: Correlation in significance profile of the English and French language networks
Closely related networks have more similar significance profiles

Example Application – Science
Figure: Network significance profiles of co-authorship networks in different scientific areas; the x-axis lists the subgraph types (Choobdar et al., ASONAM 2012; figure credit: Pedro Ribeiro)
¡ Network motifs: “recurring, significant
patterns of interconnections”
¡ How to define a network motif:

§ Pattern: induced/non-induced subgraph
§ Recurring: found many times, i.e., with high
frequency
§ Significant: more frequent than expected, i.e., more frequent than in randomly generated networks
§ E.g., Erdős–Rényi random graphs, scale-free networks

¡ Motifs:
§ Help us understand how networks work
§ Help us predict operation and reaction of the network in a given situation
¡ Examples:
§ Feed-forward loops: found in networks of neurons, where they neutralize “biological noise”
§ Parallel loops: found in food webs
§ Single-input modules: found in gene control networks
Figure: Feed-forward loop, single-input module, parallel loop
Induced subgraph of interest (aka Motif):
Figure: Two candidate placements of the motif in a network, one labeled “No match!” and one labeled “Match!”
Subgraph concepts - Frequency
How to count?
¡ Allow overlapping of motifs
¡ Network on the right has 4 occurrences of the motif:
§ {1,2,3,4,5}
§ {1,2,3,4,6}
§ {1,2,3,4,7}
§ {1,2,3,4,8}
Example borrowed from Pedro Ribeiro
¡ Key idea: Subgraphs that occur in a real network much more often than in a random network have functional significance
Milo et al., Science 2002
¡ Motifs are overrepresented in a network when compared to randomized networks:
§ $Z_i$ captures statistical significance of motif $i$:
$Z_i = (N_i^{real} - \bar{N}_i^{rand}) / \mathrm{std}(N_i^{rand})$
§ $N_i^{real}$ is #(subgraphs of type $i$) in network $G^{real}$
§ $N_i^{rand}$ is #(subgraphs of type $i$) in randomized network $G^{rand}$
¡ Network significance profile:
$SP_i = Z_i / \sqrt{\sum_j Z_j^2}$
§ $SP$ is a vector of normalized Z-scores
§ $SP$ emphasizes relative significance of subgraphs:
§ Important for comparison of networks of different sizes
§ Generally, larger networks display higher Z-scores
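The Z-score and significance-profile definitions can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function names, the made-up counts, and the use of the population standard deviation (rather than the sample one) are all assumptions.

```python
import math
import statistics


def z_scores(real_counts, rand_counts):
    """real_counts[i]: count of subgraph type i in the real network.
    rand_counts[r][i]: count of subgraph type i in the r-th randomized network."""
    zs = []
    for i, n_real in enumerate(real_counts):
        samples = [counts[i] for counts in rand_counts]
        mean = statistics.mean(samples)
        std = statistics.pstdev(samples)  # assumption: population std
        # Guard against a zero spread in the randomized ensemble.
        zs.append((n_real - mean) / std if std > 0 else 0.0)
    return zs


def significance_profile(zs):
    """Normalize the Z-score vector to unit Euclidean length."""
    norm = math.sqrt(sum(z * z for z in zs))
    return [z / norm for z in zs] if norm > 0 else [0.0] * len(zs)


# Made-up counts for two subgraph types and three randomized networks.
real = [30, 5]
rand = [[10, 9], [14, 11], [12, 10]]
z = z_scores(real, rand)
sp = significance_profile(z)
print(z, sp)
```

In this toy input the first subgraph type is over-represented (positive Z-score) and the second is under-represented (negative Z-score); the profile has unit length, which is what makes networks of different sizes comparable.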
¡ Goal: Generate a random graph with a
given degree sequence k1, k2, … kN
¡ Useful as a “null” model of networks:
§ We can compare the real network G and a “random”
G’ which has the same degree sequence as G
¡ Configuration model:
Figure: Nodes with spokes (A, B, C, D) → randomly pair up “mini”-nodes → resulting graph
We ignore double edges and self-loops when creating the final graph
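A minimal Python sketch of the configuration model, assuming nodes are numbered 0..N-1 and the graph is undirected. The function name and the `seed` parameter are illustrative choices, not part of the lecture. Because self-loops and double edges are dropped, the realized degrees can come out slightly lower than the requested ones.

```python
import random


def configuration_model(degrees, seed=None):
    """Return a set of undirected edges (a, b) with a < b.

    degrees[i] is the target degree k_i of node i.
    """
    rng = random.Random(seed)
    # One "mini-node" (spoke) per unit of degree.
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    if len(stubs) % 2 != 0:
        raise ValueError("degree sequence must have an even sum")
    rng.shuffle(stubs)  # random pairing of mini-nodes
    edges = set()
    for a, b in zip(stubs[::2], stubs[1::2]):
        if a != b:  # ignore self-loops
            edges.add((min(a, b), max(a, b)))  # the set ignores double edges
    return edges


print(configuration_model([2, 2, 1, 1], seed=0))
```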
¡ Start from a given graph $G$
¡ Repeat the switching step $Q \cdot |E|$ times:
§ Select a pair of edges A→B, C→D at random
§ Exchange the endpoints to give A→D, C→B
§ Exchange edges only if no multiple edges or self-edges are generated
¡ Result: A randomly rewired graph:
§ Same node degrees, randomly rewired edges
¡ $Q$ is chosen large enough (e.g., $Q = 100$) for the process to converge
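The switching procedure can be sketched as follows for a directed graph given as a list of (source, target) pairs. This is an illustrative sketch: the function name, the `q` and `seed` parameters, and the assumption that the input has no self-edges or duplicate edges are mine.

```python
import random


def rewire(edges, q=100, seed=None):
    """Degree-preserving randomization by repeated edge switching."""
    rng = random.Random(seed)
    edge_list = list(edges)
    if len(edge_list) < 2:
        return edge_list
    edge_set = set(edge_list)
    for _ in range(q * len(edge_list)):
        # Select a pair of distinct edges A->B, C->D at random.
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        new1, new2 = (a, d), (c, b)
        # Switch only if no self-edge or multiple edge would be created.
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
    return edge_list


original = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(rewire(original, q=10, seed=1))
```

Every accepted switch keeps each node's in-degree and out-degree unchanged, since A and C keep one outgoing edge each and B and D keep one incoming edge each.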
Example Application
Figure: Network significance profile, with $Z_i = (N_i^{real} - \bar{N}_i^{rand}) / \mathrm{std}(N_i^{rand})$ (Milo et al., Science 2004; figure credit: Pedro Ribeiro)
Different networks have similar fingerprints!
¡ Count subgraphs $i$ in $G^{real}$
¡ Count subgraphs $i$ in random networks $G^{rand}$:
§ Configuration model: Each $G^{rand}$ has the same #(nodes), #(edges) and degree distribution as $G^{real}$
¡ Assign Z-score to $i$:
§ $Z_i = (N_i^{real} - \bar{N}_i^{rand}) / \mathrm{std}(N_i^{rand})$
§ High Z-score: Subgraph $i$ is a network motif of $G$

Network Motifs Applicability
¡ Canonical definition:
§ Directed and undirected
§ Colored and uncolored
§ Temporal and static motifs
¡ Variations on the concept:
§ Different frequency concepts
§ Different significance metrics
§ Under-representation (anti-motifs)
§ Different constraints for null model
§ Weighted networks
Figure credit: Pedro Ribeiro
Figure: Z-scores of individual motifs for different networks
¡ Network of neurons and a gene network
contain similar motifs:
§ Feed-forward loops and bi-fan structures
§ Both are information processing networks with
sensory and acting components
¡ Food webs have parallel loops:

§ Prey of a particular predator share prey
¡ WWW network has bidirectional links

§ Design that allows the shortest path among sets of
related pages
¡ Graphlets: connected non-isomorphic subgraphs
§ Induced subgraphs of any frequency
For $n = 3, 4, 5, \ldots, 10$ there are $2, 6, 21, \ldots, 11716571$ graphlets!
Przulj et al., Bioinformatics 2004
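The first few graphlet counts can be checked by brute force. The sketch below enumerates every undirected graph on n labeled nodes, keeps the connected ones, and counts isomorphism classes using a naive canonical form (the smallest relabeled edge list over all node permutations). The helper names are illustrative, and the approach is only practical for very small n.

```python
from itertools import combinations, permutations


def is_connected(n, edges):
    """Depth-first search from node 0 over an undirected edge list."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for u in adj[v] - seen:
            seen.add(u)
            stack.append(u)
    return len(seen) == n


def count_graphlets(n):
    """Number of connected non-isomorphic undirected graphs on n nodes."""
    pairs = list(combinations(range(n), 2))
    perms = list(permutations(range(n)))
    classes = set()
    for mask in range(1 << len(pairs)):
        edges = [p for bit, p in enumerate(pairs) if (mask >> bit) & 1]
        if not is_connected(n, edges):
            continue
        # Canonical form: minimum sorted edge list over all relabelings.
        canon = min(
            tuple(sorted(tuple(sorted((p[a], p[b]))) for a, b in edges))
            for p in perms
        )
        classes.add(canon)
    return len(classes)


print([count_graphlets(n) for n in (3, 4, 5)])  # [2, 6, 21]
```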
¡ Next: Use graphlets to obtain a node-level
subgraph metric
¡ Degree counts #(edges) that a node touches:

§ Can we generalize this notion for graphlets? – Yes!
¡ Graphlet degree vector counts #(graphlets)

that a node touches

Graphlet Degree Vector
¡ An automorphism orbit takes into account the symmetries of a subgraph
¡ Graphlet Degree Vector (GDV): a vector with the frequency of the node in each orbit position
¡ Example: Graphlet degree vector of node v
For a node $u$ of graph $G$, the automorphism orbit of $u$ is $Orb(u) = \{ v \in V(G) ; v = f(u) \text{ for some } f \in Aut(G) \}$.
$Aut(G)$ denotes the automorphism group of $G$; an automorphism is an isomorphism from $G$ to itself.
Figure credit: Pedro Ribeiro
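The orbit definition can be made concrete with a brute-force sketch that tries every node permutation, keeps those that map the edge set onto itself (the automorphisms), and collects the images of each node. The function name is illustrative, and the input is assumed to be a small undirected graph without duplicate edges.

```python
from itertools import permutations


def automorphism_orbits(nodes, edges):
    """Return the set of automorphism orbits of a small undirected graph."""
    nodes = list(nodes)
    edge_set = {frozenset(e) for e in edges}
    orbit_of = {v: {v} for v in nodes}
    for image in permutations(nodes):
        f = dict(zip(nodes, image))
        mapped = {frozenset((f[a], f[b])) for a, b in edges}
        if mapped == edge_set:  # f is an automorphism of the graph
            for u in nodes:
                orbit_of[u].add(f[u])
    return {frozenset(orbit) for orbit in orbit_of.values()}


# Path 0 - 1 - 2: the two end nodes share an orbit, the middle node is alone.
print(automorphism_orbits([0, 1, 2], [(0, 1), (1, 2)]))
```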
¡ Graphlet degree vector counts #(graphlets) that a
node touches at a particular orbit
¡ Considering graphlets on 2 to 5 nodes we get:
§ Vector of 73 coordinates is a signature of a node that
describes the topology of node's neighborhood
§ Captures its interconnectivities out to a distance of 4 hops
¡ Graphlet degree vector provides a measure of a

node’s local network topology:
§ Comparing vectors of two nodes provides a highly
constraining measure of local topological similarity
between them

Graphlet Degree Vector (GDV) of node A:
§ $i$-th element of GDV(A): #(graphlets) that touch A at orbit $i$
§ Highlighted are graphlets that touch node A at orbits 15, 19, 27, and 35 from left to right
Orbit 0 1 2...3 4 5 6 7...14 15 16...18 19 20...26 27 28...34 35 36...72
GDV(A) 1 2 0...0 3 0 1 0...0 1 0...0 1 0...0 1 0...0 1 0...0
Figure: An illustration of the graphlet signature of node A. The number of graphlets that touch node A at orbit i is the ith element of the graphlet degree vector of a node (Kuchaiev et al. 2010).
Yaveroglu et al., Journal of Statistical Software 2015
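As a small-scale illustration of a graphlet degree vector, the sketch below computes only the first four coordinates, i.e., the orbits of graphlets on 2 and 3 nodes. It assumes the common orbit numbering (orbit 0: edge; orbit 1: end of an induced 3-node path; orbit 2: middle of an induced 3-node path; orbit 3: triangle); the full 73-coordinate vector needs 4- and 5-node graphlets and is not covered here. The function name and the example graph are made up.

```python
from itertools import combinations


def gdv_up_to_3(adj, v):
    """First four GDV coordinates of node v.

    adj: dict mapping each node to the set of its neighbours (undirected).
    """
    nbrs = adj[v]
    orbit0 = len(nbrs)  # number of edges that v touches
    orbit1 = orbit2 = orbit3 = 0
    # Pairs of neighbours: v is in a triangle or in the middle of a path.
    for a, b in combinations(nbrs, 2):
        if b in adj[a]:
            orbit3 += 1
        else:
            orbit2 += 1
    # Induced paths v - a - b: v sits at the end of the path.
    for a in nbrs:
        for b in adj[a]:
            if b != v and b not in nbrs:
                orbit1 += 1
    return [orbit0, orbit1, orbit2, orbit3]


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(gdv_up_to_3(adj, 2))  # [3, 0, 2, 1]
```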
¡ Finding size-k motifs/graphlets requires solving
two challenges:
§ 1) Enumerating all size-k connected subgraphs
§ 2) Counting #(occurrences of each subgraph type)
¡ Just knowing if a certain subgraph exists in a

graph is a hard computational problem!
§ Subgraph isomorphism is NP-complete
¡ Computation time grows exponentially as the

size of the motif/graphlet increases
§ Feasible motif size is usually small (3 to 8)
¡ Network-centric approaches:
§ Step 1) Enumerate all connected size-k sets of nodes
§ Step 2) Count subgraphs of each type via graph isomorphism tests
¡ Algorithms:
§ Exact subgraph enumeration (ESU) [Wernicke 2006]
§ Kavosh [Kashani et al. 2009]
§ Subgraph sampling [Kashtan et al. 2004]
¡ Today: ESU algorithm

¡ Two sets:
§ $V_{subgraph}$: currently constructed subgraph (motif)
§ $V_{extension}$: set of candidate nodes to extend the motif
¡ Idea: Starting with a node $v$, add those nodes to $V_{extension}$ set that have two properties:
§ Their node_id must be larger than that of $v$
§ They may only be neighbored to the newly added node $w$ but not to a node already in $V_{subgraph}$
¡ ESU is implemented as a recursive function:
§ The running of this function can be displayed as a tree-like structure of depth $k$, called the ESU-Tree

Figure: Pseudocode for the algorithm ESU, which enumerates all size-k subgraphs in a given graph G (Fig. 3 in Wernicke, IEEE/ACM Transactions on Computational Biology and Bioinformatics 2006). The definition of the exclusive neighborhood $N_{excl}(v, V')$ is given in Section 2.1 of that paper.

Figure: ESU-Tree example. The root corresponds to EnumerateSubgraphs(G, k), lower levels correspond to calls to ExtendSubgraph, and each tree node is labeled with the pair ($V_{subgraph}$, $V_{extension}$), e.g., ({1, 3}, {2, 4, 5}) and ({2, 3}, {4, 5})
Leaves represent all size-3 induced sub-graphs
¡ Nodes in the ESU-tree include two adjoining sets:
§ $V_{subgraph}$: A set of adjacent nodes
§ $V_{extension}$: Nodes adjacent to at least one of the $V_{subgraph}$ nodes whose numerical labels are larger than the $V_{subgraph}$ nodes' labels
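The enumeration step of ESU can be sketched recursively in Python. This is a sketch following the description above, not a transcription of Wernicke's pseudocode; the function names are mine, node ids are assumed to be integers, and the 5-node example graph is hypothetical, chosen only so that it reproduces the two ESU-Tree labels ({1, 3}, {2, 4, 5}) and ({2, 3}, {4, 5}).

```python
def esu(adj, k):
    """Return all connected size-k node sets of an undirected graph.

    adj: dict mapping each integer node id to the set of its neighbours.
    """
    found = []

    def extend(sub, ext, v):
        if len(sub) == k:
            found.append(frozenset(sub))  # a leaf of the ESU-Tree
            return
        ext = set(ext)
        # Nodes already in the subgraph or adjacent to it; a candidate must
        # lie outside this set to be in the exclusive neighbourhood of w.
        closed = sub | {u for s in sub for u in adj[s]}
        while ext:
            w = ext.pop()
            new = {u for u in adj[w] if u > v and u not in closed}
            extend(sub | {w}, ext | new, v)

    for v in adj:
        extend({v}, {u for u in adj[v] if u > v}, v)
    return found


# Hypothetical example graph with edges 1-3, 2-3, 3-4, 3-5, 4-5.
adj = {1: {3}, 2: {3}, 3: {1, 2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
for nodes in esu(adj, 3):
    print(sorted(nodes))
```

Removing `w` from the extension set before recursing is what ensures that each connected node set is reported exactly once.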
¡ So far, we enumerated all size-k subgraphs in
the input graph
¡ Next step of ESU: Classify subgraphs placed in the ESU-Tree leaves into non-isomorphic size-k classes:
§ Determine which subgraphs in ESU-Tree leaves are
topologically equivalent (isomorphic) and group
them into subgraph classes accordingly
§ Use McKay’s nauty algorithm [McKay 1981]

¡ Graphs $G$ and $H$ are isomorphic if there exists a bijection $f: V(G) \to V(H)$ such that:
§ Any two nodes $u$ and $v$ of $G$ are adjacent in $G$ iff $f(u)$ and $f(v)$ are adjacent in $H$
¡ Example: Are $G$ and $H$ topologically equivalent?
Figure: Graph $G$ (nodes A to I) and graph $H$ (nodes 1 to 9) with the bijection $f$: A→4, B→7, C→1, D→3, E→5, F→8, G→2, H→9, I→6
Need to check 9! possible bijections between node sets
Hard computational problem! $G$ and $H$ are isomorphic!
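The definition translates directly into a brute-force test that tries every bijection between the node sets, which is exactly why the problem is expensive: n nodes give n! candidates. The sketch below is illustrative only (practical tools such as nauty are far more sophisticated); the function name and the toy graphs are assumptions, and both graphs are taken to be simple and undirected.

```python
from itertools import permutations


def find_isomorphism(nodes_g, edges_g, nodes_h, edges_h):
    """Return a dict f: V(G) -> V(H) that is an isomorphism, or None."""
    nodes_g, nodes_h = list(nodes_g), list(nodes_h)
    eg = {frozenset(e) for e in edges_g}
    eh = {frozenset(e) for e in edges_h}
    if len(nodes_g) != len(nodes_h) or len(eg) != len(eh):
        return None
    for image in permutations(nodes_h):
        f = dict(zip(nodes_g, image))
        # With equal edge counts, mapping every edge of G onto an edge of H
        # is enough to guarantee the "iff" in the definition.
        if all(frozenset(f[u] for u in e) in eh for e in eg):
            return f
    return None


# Path a - b - c versus path 1 - 3 - 2: isomorphic, with b mapped to 3.
print(find_isomorphism("abc", [("a", "b"), ("b", "c")], [1, 2, 3], [(1, 3), (3, 2)]))
```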

¡ Project is a substantial part of the class
§ Students put significant effort, and great
results have been obtained in the past
¡ Types of projects:
§ (1) Analysis of an interesting dataset with the goal to
develop a (new) model or an algorithm
§ (2) A test of a model or algorithm (that you have read
about or your own) on real & simulated data.
§ Fast algorithms for big graphs. Can be integrated into SNAP.
¡ Other points:
§ The project should contain some mathematical analysis,
and some experimentation on real or synthetic data
§ The result of the project will typically be an 8 page paper,
describing the approach, the results, and related work.
§ Come to us if you need help with a project idea!
Project proposal: 3-5 pages, teams of up to 3 students
¡ Project proposal has 3 parts:
§ (0) Quick 200 word abstract
§ (1) Related work / Reaction paper (2-3 pages):
§ Read 3 papers related to the project/class
§ Do reading beyond what was covered in class
§ Think beyond what you read. Don’t take other’s work for granted!
§ 2-3 pages: Summary (~1 page), Critique (~1 page)
§ (2) Proposal (1-2 pages):
§ Clearly define the problem you are solving.
§ How does it relate to what you read for the Reaction paper?
§ What data will you use? (make sure you already have it!)
§ Which algorithm/model will you use/develop? Be specific!
§ How will you evaluate/test your method?
See http://cs224w.stanford.edu/info.html for detailed instructions
and examples of previous proposals
¡ Logistics:
§ 1) Register your group on this Google Form:
https://goo.gl/forms/2wvoXXLm8R91y4jV2
§ 2) Submit PDF on GradeScope AND at
http://snap.stanford.edu/submit/
§ Due in 9 days: Thu Oct 18 at 23:59 PST!
¡ If you need help/ideas/advice come to Office Hours (Jure and Michele) or email us

¡ Here you can find a list of interesting datasets
for your CS224W project:
http://cs224w.stanford.edu/data.html
