Professional Documents
Culture Documents
05 Motifs PDF
05 Motifs PDF
05 Motifs PDF
> ?? ? >
Macroscopic: Microscopic:
Mesoscopic Pedro Ribeiro
10/9/18
Whole network
Jure Leskovec, Stanford CS224W: Analysis of Networks
Single node Pedro Ribeiro
2
Building of
Building Blocks Blocks of Networks
Networks
¡ Subnetworks, or subgraphs,
Subnetworks, are the
or subgraphs, arebuilding
the building
Subnetworks, or subgraphs,
of networks are the building
blocks of networks:
blocks
blocks of networks
¡ They
Theyhave
havethe
the power tocharacterize
power to characterizeand
and Pedro Ribeiro
discriminate networks
discriminate networks
Pedro Ribeiro
10/9/18 Jure Leskovec, Stanford CS224W: Analysis of Networks 3
Subgraph decomposition of an electronic circuit
10/9/18 Jure Leskovec, Stanford CS224W: Analysis of Networks Oxford Protein Informatics Group 4
Example Application
Consider
Let’s considerall
all possible directed
possible (non-isomorphoic)
subgraphs of size 3
directed subgraphs of size 3
Neurons
Language networks
Correlation
Clustering of networks based on their significance profiles
No match!
Match!
How to Subgraph
count? concepts - Frequency
– Allow overlapping
Motif of interest:
How to count?
– Allow overlapping
10/9/18 Jure Leskovec, Stanford CS224W: Analysis of Networks Milo et. al., Science 2002 16
¡ Motifs are overrepresented in a network
when compared to randomized networks:
§ !" captures statistical significance of motif #:
!" = (&"'()* −& ,"')-. )/std(&"')-. )
§ &"'()* is #(subgraphs of type 4) in network 5 '()*
§ &"')-. is #(subgraphs of type 4) in randomized network 5 ')-.
¡ Network significance profile:
67" = !" / ∑!9:
§ 67 is a vector of normalized Z-scores
§ 67 emphasizes relative significance of subgraphs:
§ Important for comparison of networks of different sizes
10/9/18 § Generally, larger networks display higher Z-scores 17
¡ Goal: Generate a random graph with a
given degree sequence k1, k2, … kN
¡ Useful as a “null” model of networks:
§ We can compare the real network G and a “random”
G’ which has the same degree sequence as G
¡ Configuration model:
A C
A C
B D B D
A B C D
Randomly pair up
Nodes with spokes “mini”-n0des Resulting graph
We ignore double edges and self-loops when creating the final graph
10/12/18 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 18
¡ Start from a given graph !
¡ Repeat the switching step "|$ % | times: C
A
§ Select a pair of edges AàB, CàD at random
§ Exchange the endpoints to give AàD, CàB B D
10/9/18 Jure Leskovec, Stanford CS224W: Analysis of Networks Milo et al., Science 2004
Pedro Ribeiro 20
¡ Count subgraphs ! in " #$%&
¡ Count subgraphs ! in random networks " #%'( :
§ Configuration model: Each " #%'( has the same
#(nodes), #(edges) and #(degree distribution) as " #$%&
¡ Assign Z-score to !:
0+#%'( )/std(.+#%'( )
§ *+ = (.+#$%& −.
§ High Z-score: Subgraph !
is a network motif of 6
¡ Variations
Variations onconcept
on the the concept
– Different frequency concepts
§ Different frequency concepts
– Different significance metrics
§ Different significance metrics
– Under-Representation (anti-motifs)
§ Under-Representation (anti-motifs)
– Different constraints for null model
§ – Weighted
Different networks for null model
constraints
10/9/18 Jure Leskovec, Stanford CS224W: Analysis of Networks 22
Pedro Ribeiro
Z-scores of individual motifs for different networks
10/9/18 Jure Leskovec, Stanford CS224W: Analysis of Networks 23
Z-scores of individual motifs for different networks
10/9/18 Jure Leskovec, Stanford CS224W: Analysis of Networks 24
¡ Network of neurons and a gene network
contain similar motifs:
§ Feed-forward loops and bi-fan structures
§ Both are information processing networks with
sensory and acting components
F F F F
D D D D
A B A B A B A B
E E E E
C C C C
Highlighted
§ only
terms, as arepatterns
induced subgraph graphlets
are takenthat touch
into account node
by the A at orbits
ergm.graphlets terms.
15, package
ergm.graphlets 19, 27, alsoand 35anfrom
provides leftlistto
extended right pattern based terms that
of subgraph
covers any observable subgraph pattern of size 2, 3, 4, and 5.
10/9/18 Jure Leskovec, Stanford CS224W: Analysis of Networks Yaveroglu et al., Journal of Statistical Software 2015 30
In the remainder of this article, we proceed as follows. First, we provide a brief summary
¡ Finding size-k motifs/graphlets requires solving
two challenges:
§ 1) Enumerating all size-k connected subgraphs
§ 2) Counting #(occurrences of each subgraph type)
¡ Algorithms:
§ Exact subgraph enumeration (ESU) [Wernicke 2006]
§ Kavosh [Kashani et al. 2009]
§ Subgraph sampling [Kashtan et al. 2004]
subrouti
a node
w1 # w2 .
in this se
lemma s
for provi
Lemma
1.
2.
0A741B829:56;<( D , @ , D)
!"#$%&'() !*+,*-"./-
Leaves represent
all size-3 induced
sub-graphs
#: E
F
5
8
G 2
H 9
http://cs224w.stanford.edu/data.html