05 Motifs PDF


CS224W: Analysis of Networks

Jure Leskovec, Stanford University


http://cs224w.stanford.edu
Network Metrics
¡ Many metrics at the node level:
§ E.g., node degree, PageRank score, node clustering, betweenness, closeness
¡ Many metrics at the whole-network level:
§ E.g., diameter, clustering, size of giant component, average distance, density, clustering coefficient
¡ What about something in-between?
§ A mesoscale characterization of networks

[Figure: Macroscopic (whole network) > Mesoscopic (??) > Microscopic (single node). Credit: Pedro Ribeiro]
10/9/18 Jure Leskovec, Stanford CS224W: Analysis of Networks
Building Blocks of Networks
¡ Subnetworks, or subgraphs, are the building blocks of networks
¡ They have the power to characterize and discriminate networks
[Figure credit: Pedro Ribeiro]
Subgraph decomposition of an electronic circuit (image: Oxford Protein Informatics Group)
Example Application

Let’s consider all possible (non-isomorphic) directed subgraphs of size 3

[Figure: the directed subgraphs of size 3. Credit: Pedro Ribeiro]
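To make "all possible non-isomorphic directed subgraphs of size 3" concrete, here is a minimal brute-force sketch. It assumes no self-loops, allows an edge in each direction between a pair of nodes, and keeps only weakly connected subgraphs; the helper names `canonical` and `weakly_connected` are illustrative, not from the lecture.

```python
from itertools import combinations, permutations

NODES = (0, 1, 2)
# The 6 possible directed edges (arcs) on 3 nodes, no self-loops.
ARCS = [(u, v) for u in NODES for v in NODES if u != v]

def canonical(arcs):
    """Smallest relabeled arc tuple over all 3! node permutations."""
    return min(
        tuple(sorted((p[u], p[v]) for u, v in arcs))
        for p in permutations(NODES)
    )

def weakly_connected(arcs):
    """True if all 3 nodes are reachable when arc direction is ignored."""
    seen, frontier = {0}, [0]
    while frontier:
        x = frontier.pop()
        for u, v in arcs:
            for a, b in ((u, v), (v, u)):
                if a == x and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return len(seen) == 3

# Group every connected arc subset by its canonical form.
classes = set()
for r in range(len(ARCS) + 1):
    for arcs in combinations(ARCS, r):
        if weakly_connected(arcs):
            classes.add(canonical(arcs))

print(len(classes))  # 13 connected directed subgraph types of size 3
```

Two arc sets land in the same class exactly when some relabeling of the three nodes turns one into the other, which is the isomorphism notion used throughout these slides.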
¡ For each subgraph:
§ Imagine you have a metric capable of classifying the
subgraph “significance” [more on that later]
§ Negative values indicate under-representation
§ Positive values indicate over-representation

¡ We create a network significance profile:


§ A feature vector with values for all subgraph types

¡ Next: Compare profiles of different networks:


§ Regulatory network (gene regulation)
§ Neuronal network (synaptic connections)
§ World Wide Web (hyperlinks between pages)
§ Social network (friendships)
§ Language networks (word adjacency)
Example Application
Network significance profile

[Figure: network significance profiles in four groups: gene regulation networks, neurons, web and social networks, language networks. Image: Milo et al., Science 2004]

Different networks have similar significance profiles
Example Application
Network significance profile similarity

¡ Clustering of networks based on the correlation of their significance profiles
¡ Example: correlation in significance profile of the English and French language networks
¡ Closely related networks have more similar significance profiles

Image: Milo et al., Science 2004
Example Application – Science
Network significance profile

[Figure: significance profiles of co-authorship networks in different scientific areas; the X-axis of the plot lists the subgraph types. Image: Choobdar et al., ASONAM 2012. Credit: Pedro Ribeiro]
¡ Network motifs: “recurring, significant
patterns of interconnections”

¡ How to define a network motif:


§ Pattern: induced/non-induced subgraph
§ Recurring: found many times, i.e., with high
frequency
§ Significant: more frequent than expected, i.e., more frequent than in randomly generated networks
§ E.g., Erdős-Rényi random graphs, scale-free networks


¡ Motifs:
§ Help us understand how networks work
§ Help us predict operation and reaction of the network in a given situation
¡ Examples:
§ Feed-forward loops: found in networks of neurons, where they neutralize “biological noise”
§ Parallel loops: found in food webs
§ Single-input modules: found in gene control networks

[Figure: feed-forward loop, single-input module, parallel loop]
Induced subgraph of interest (aka Motif):

[Figure: the motif next to two candidate placements in a network, one labeled "No match!" and one labeled "Match!"]
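The difference between an induced and a non-induced match can be sketched in a few lines. This is an illustrative check for small undirected graphs, not the lecture's implementation; `edge_set` and `matches` are hypothetical helper names, and the toy graph is made up. An induced match requires that the chosen nodes carry exactly the motif's edges and no extra ones.

```python
from itertools import permutations

def edge_set(pairs):
    """Undirected edges as a set of frozensets."""
    return {frozenset(p) for p in pairs}

def matches(graph, pattern, nodes, induced=True):
    """Can `pattern` be placed on exactly the graph nodes in `nodes`?"""
    pattern_nodes = sorted({x for e in pattern for x in e})
    if len(pattern_nodes) != len(nodes):
        return False
    # Edges of the graph that run between the chosen nodes.
    present = {e for e in graph if e <= set(nodes)}
    for perm in permutations(nodes):
        f = dict(zip(pattern_nodes, perm))
        mapped = {frozenset(f[x] for x in e) for e in pattern}
        # Induced: edges must agree exactly. Non-induced: extra edges allowed.
        if (mapped == present) if induced else (mapped <= present):
            return True
    return False

graph = edge_set([(1, 2), (2, 3), (1, 3), (3, 4)])  # triangle 1-2-3 plus edge 3-4
path = edge_set([("a", "b"), ("b", "c")])           # motif: path on 3 nodes

print(matches(graph, path, (2, 3, 4)))                 # induced match
print(matches(graph, path, (1, 2, 3)))                 # extra edge 1-3 breaks it
print(matches(graph, path, (1, 2, 3), induced=False))  # non-induced match
```

On nodes {1, 2, 3} the path's two edges are present, but so is a third edge, so the placement counts only under the non-induced definition.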


Subgraph concepts - Frequency

How to count?
¡ Allow overlapping of motifs
¡ Network on the right has 4 occurrences of the motif:
§ {1,2,3,4,5}
§ {1,2,3,4,6}
§ {1,2,3,4,7}
§ {1,2,3,4,8}

[Figure: the motif of interest and the example network. Example borrowed from Pedro Ribeiro]
¡ Key idea: Subgraphs that occur in a real
network much more often than in a random
network have functional significance

Milo et al., Science 2002
¡ Motifs are overrepresented in a network when compared to randomized networks:
§ $Z_i$ captures statistical significance of motif $i$:
$Z_i = (N_i^{real} - \bar{N}_i^{rand}) / \mathrm{std}(N_i^{rand})$
§ $N_i^{real}$ is #(subgraphs of type $i$) in network $G^{real}$
§ $N_i^{rand}$ is #(subgraphs of type $i$) in randomized network $G^{rand}$
¡ Network significance profile:
$SP_i = Z_i / \sqrt{\sum_j Z_j^2}$
§ $SP$ is a vector of normalized Z-scores
§ $SP$ emphasizes relative significance of subgraphs:
§ Important for comparison of networks of different sizes
§ Generally, larger networks display higher Z-scores
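The two formulas above translate directly into code. The sketch below assumes the subgraph counts are already available; the numbers in `real_counts` and `rand_counts` are made up for illustration, and using the population standard deviation (`pstdev`) rather than the sample one is an assumption, since the slides only write std.

```python
import math
from statistics import mean, pstdev

def z_scores(real_counts, rand_counts):
    """Z_i = (N_i^real - mean(N_i^rand)) / std(N_i^rand) for each subgraph type i.

    real_counts[i]: count of type i in the real network.
    rand_counts[i]: counts of type i across the randomized networks.
    Raises ZeroDivisionError if a type has zero variance in the random ensemble.
    """
    return [
        (n_real - mean(n_rand)) / pstdev(n_rand)
        for n_real, n_rand in zip(real_counts, rand_counts)
    ]

def significance_profile(z):
    """SP_i = Z_i / sqrt(sum_j Z_j^2): the Z-score vector scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in z))
    return [x / norm for x in z]

# Hypothetical counts for two subgraph types over four randomized networks.
real_counts = [30, 5]
rand_counts = [[10, 12, 8, 10], [9, 11, 10, 10]]

z = z_scores(real_counts, rand_counts)
sp = significance_profile(z)
print(z)   # first type over-represented, second under-represented
print(sp)
```

Because the profile has unit length, two networks of very different size can be compared even when their raw Z-scores differ by orders of magnitude.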
¡ Goal: Generate a random graph with a
given degree sequence k1, k2, … kN
¡ Useful as a “null” model of networks:
§ We can compare the real network G and a “random”
G’ which has the same degree sequence as G
¡ Configuration model:

[Figure: three panels: nodes A, B, C, D with spokes; randomly pair up “mini”-nodes; resulting graph]

We ignore double edges and self-loops when creating the final graph
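The spoke-pairing procedure can be sketched as follows. This is a minimal illustration under the slide's convention of ignoring double edges and self-loops; the function name `configuration_model` and its signature are assumptions made here, not a library API.

```python
import random

def configuration_model(degrees, seed=None):
    """Randomly pair up spokes ("mini"-nodes) for a given degree sequence.

    degrees[i] is the target degree k_i of node i. Self-loops and double
    edges are ignored, so a node's final degree can be below its target.
    """
    if sum(degrees) % 2:
        raise ValueError("the degree sequence must have an even sum")
    rng = random.Random(seed)
    # One spoke per unit of degree: node i appears degrees[i] times.
    spokes = [node for node, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(spokes)
    edges = set()
    for u, v in zip(spokes[::2], spokes[1::2]):
        if u != v:                              # ignore self-loops
            edges.add((min(u, v), max(u, v)))   # the set ignores double edges
    return edges

edges = configuration_model([3, 2, 2, 1], seed=0)
print(sorted(edges))
```

Dropping the offending pairs is the simplest choice; it keeps the code short at the cost of slightly lowering some degrees, which matters little for large sparse graphs.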
10/12/18 Jure Leskovec, Stanford CS224W: Analysis of Networks, http://cs224w.stanford.edu 18
¡ Start from a given graph $G$
¡ Repeat the switching step $Q \cdot |E|$ times:
§ Select a pair of edges A→B, C→D at random
§ Exchange the endpoints to give A→D, C→B
§ Exchange edges only if no multiple edges or self-edges are generated
¡ Result: A randomly rewired graph:
§ Same node degrees, randomly rewired edges
¡ $Q$ is chosen large enough (e.g., $Q = 100$) for the process to converge

[Figure: edges A→B and C→D before the switch; edges A→D and C→B after the switch]
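The switching procedure can be sketched as below for a directed edge list. The function name `rewire` and the parameter `q` (standing in for $Q$) are illustrative choices, and the input is assumed to contain at least two edges, no duplicates, and no self-edges.

```python
import random

def rewire(edges, q=100, seed=None):
    """Degree-preserving randomization by repeated edge switching."""
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    for _ in range(q * len(edges)):
        # Select a pair of edges A->B, C->D at random.
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        # Skip the switch if it would create a self-edge or a multiple edge.
        if a == d or c == b or new1 in present or new2 in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges

original = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
rewired = rewire(original, q=10, seed=1)
print(rewired)
```

Every accepted switch keeps each node's in-degree and out-degree unchanged, because A and C each still have one outgoing edge and B and D each still have one incoming edge.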
Example Application
Network significance profile, with $Z_i = (N_i^{real} - \bar{N}_i^{rand}) / \mathrm{std}(N_i^{rand})$

Different networks have similar fingerprints!

Image: Milo et al., Science 2004. Credit: Pedro Ribeiro
¡ Count subgraphs $i$ in $G^{real}$
¡ Count subgraphs $i$ in random networks $G^{rand}$:
§ Configuration model: Each $G^{rand}$ has the same #(nodes), #(edges) and degree distribution as $G^{real}$
¡ Assign Z-score to $i$:
§ $Z_i = (N_i^{real} - \bar{N}_i^{rand}) / \mathrm{std}(N_i^{rand})$
§ High Z-score: Subgraph $i$ is a network motif of $G$


Network Motifs Applicability
¡ Canonical definition:
§ Directed and undirected
§ Colored and uncolored
§ Temporal and static motifs
¡ Variations on the concept:
§ Different frequency concepts
§ Different significance metrics
§ Under-representation (anti-motifs)
§ Different constraints for null model
§ Weighted networks

[Figure: colored network example. Credit: Pedro Ribeiro]
[Figures: Z-scores of individual motifs for different networks]
¡ Network of neurons and a gene network
contain similar motifs:
§ Feed-forward loops and bi-fan structures
§ Both are information processing networks with
sensory and acting components

¡ Food webs have parallel loops:
§ Prey of a particular predator share prey
¡ WWW network has bidirectional links
§ Design that allows the shortest path among sets of related pages
¡ Graphlets: connected non-isomorphic subgraphs
§ Induced subgraphs of any frequency

For $n = 3, 4, 5, \dots, 10$ there are $2, 6, 21, \dots, 11716571$ graphlets!

Pržulj et al., Bioinformatics 2004
¡ Next: Use graphlets to obtain a node-level subgraph metric
¡ Degree counts #(edges) that a node touches:
§ Can we generalize this notion for graphlets? Yes!
¡ Graphlet degree vector counts #(graphlets) that a node touches


Graphlet Degree Vector
¡ An automorphism orbit takes into account the symmetries of a subgraph
¡ Graphlet Degree Vector (GDV): a vector with the frequency of the node in each orbit position
¡ Example: Graphlet degree vector of node v

For a node $u$ of graph $G$, the automorphism orbit of $u$ is
$Orb(u) = \{ v \in V(G) ; v = f(u) \text{ for some } f \in Aut(G) \}$.

$Aut(G)$ denotes the automorphism group of $G$. An automorphism is an isomorphism from $G$ to itself.

[Figure credit: Pedro Ribeiro]
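The orbit definition can be checked by brute force on a small graph. The sketch below tries every node permutation, keeps those that preserve the edge set (the automorphisms), and collects the images of each node; the name `automorphism_orbits` is illustrative, the graph is assumed undirected with no duplicate edges, and the approach is only practical for a handful of nodes.

```python
from itertools import permutations

def automorphism_orbits(nodes, edges):
    """Return the automorphism orbits of a small undirected graph."""
    edge_set = {frozenset(e) for e in edges}
    orbit = {u: {u} for u in nodes}
    for perm in permutations(nodes):
        f = dict(zip(nodes, perm))
        # f is an automorphism if it maps the edge set onto itself.
        if {frozenset((f[u], f[v])) for u, v in edges} == edge_set:
            for u in nodes:
                orbit[u].add(f[u])   # f(u) lies in Orb(u)
    return {frozenset(o) for o in orbit.values()}

# Path a-b-c: the two end nodes are interchangeable, the middle node is not.
path_orbits = automorphism_orbits(["a", "b", "c"], [("a", "b"), ("b", "c")])
# Triangle: every node can be mapped to every other node.
triangle_orbits = automorphism_orbits([1, 2, 3], [(1, 2), (2, 3), (1, 3)])
print(path_orbits)
print(triangle_orbits)
```

The 3-node path therefore contributes two orbit positions (end and middle) while the triangle contributes one, which is why orbit positions outnumber graphlets.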
¡ Graphlet degree vector counts #(graphlets) that a
node touches at a particular orbit
¡ Considering graphlets on 2 to 5 nodes we get:
§ Vector of 73 coordinates is a signature of a node that
describes the topology of node's neighborhood
§ Captures its interconnectivities out to a distance of 4 hops

¡ Graphlet degree vector provides a measure of a node’s local network topology:
§ Comparing vectors of two nodes provides a highly constraining measure of local topological similarity between them
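As a small illustration, the first four coordinates of a graphlet degree vector (graphlets on 2 and 3 nodes) can be computed directly from adjacency sets. The orbit numbering used here is an assumption following the usual convention: orbit 0 is an edge endpoint, orbit 1 the end of an induced 3-node path, orbit 2 the middle of such a path, orbit 3 a triangle corner. The function name and the toy graph are made up.

```python
from itertools import combinations

def gdv_orbits_0_to_3(adj, v):
    """First four GDV coordinates of node v; adj maps node -> set of neighbors."""
    nbrs = adj[v]
    orbit0 = len(nbrs)                      # edges touching v (the degree)
    orbit2 = orbit3 = 0
    for a, b in combinations(nbrs, 2):
        if b in adj[a]:
            orbit3 += 1                     # v, a, b form a triangle
        else:
            orbit2 += 1                     # a - v - b is an induced path, v in the middle
    # v - u - w is an induced path with v at the end: w must not touch v.
    orbit1 = sum(1 for u in nbrs for w in adj[u] if w != v and w not in nbrs)
    return [orbit0, orbit1, orbit2, orbit3]

# Toy graph: triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
for node in adj:
    print(node, gdv_orbits_0_to_3(adj, node))
```

Nodes 0 and 1 get identical vectors because they are structurally equivalent, while nodes 2 and 3 differ from them and from each other.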


Graphlet Degree Vector (GDV) of node A:
§ $i$-th element of GDV(A): #(graphlets) that touch A at orbit $i$
§ Highlighted are graphlets that touch node A at orbits 15, 19, 27, and 35 from left to right

[Figure: four copies of a graph on nodes A, B, C, D, E, F, highlighting the graphlets that touch node A at Orbit 15, Orbit 19, Orbit 27, and Orbit 35]

Orbit 0 1 2...3 4 5 6 7...14 15 16...18 19 20...26 27 28...34 35 36...72
GDV(A) 1 2 0...0 3 0 1 0...0 1 0...0 1 0...0 1 0...0 1 0...0

Figure 2: An illustration of the graphlet signature of node A. The number of graphlets that touch node A at orbit i is the ith element of graphlet degree vector of a node. The figure highlights the graphlets that touch node A at orbits 15, 19, 27, and 35 from left to right, respectively (Kuchaiev et al. 2010).

Source: Yaveroglu et al., “ergm.graphlets: A Package for ERG Modeling Based on Graphlet Statistics”, Journal of Statistical Software 2015
¡ Finding size-k motifs/graphlets requires solving
two challenges:
§ 1) Enumerating all size-k connected subgraphs
§ 2) Counting #(occurrences of each subgraph type)

¡ Just knowing if a certain subgraph exists in a graph is a hard computational problem!
§ Subgraph isomorphism is NP-complete
¡ Computation time grows exponentially as the size of the motif/graphlet increases
§ Feasible motif size is usually small (3 to 8)
¡ Network-centric approaches:
§ Step 1) Enumerate all connected size-k sets of nodes
§ Step 2) Count subgraphs of each type via graph isomorphism tests

¡ Algorithms:
§ Exact subgraph enumeration (ESU) [Wernicke 2006]
§ Kavosh [Kashani et al. 2009]
§ Subgraph sampling [Kashtan et al. 2004]

¡ Today: ESU algorithm


¡ Two sets:
§ $V_{subgraph}$: currently constructed subgraph (motif)
§ $V_{extension}$: set of candidate nodes to extend the motif
¡ Idea: Starting with a node $v$, add those nodes to $V_{extension}$ set that have two properties:
§ Their node_id must be larger than that of $v$
§ They may only be neighbored to the newly added node $w$ but not to a node already in $V_{subgraph}$
¡ ESU is implemented as a recursive function:
§ The running of this function can be displayed as a tree-like structure of depth $k$, called the ESU-Tree
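The recursion described above can be sketched as follows. This is a compact reading of the two extension rules for an undirected graph stored as adjacency sets with comparable node ids, not a transcription of the published pseudocode; the names `esu` and `extend` are illustrative.

```python
def esu(adj, k):
    """Enumerate every connected size-k node set exactly once.

    adj maps each node id to the set of its neighbors.
    """
    found = []

    def extend(v_sub, v_ext, v):
        if len(v_sub) == k:
            found.append(frozenset(v_sub))   # a leaf of the ESU-Tree
            return
        v_ext = set(v_ext)
        # Nodes that are in the current subgraph or adjacent to it.
        closed = set(v_sub).union(*(adj[x] for x in v_sub))
        while v_ext:
            w = v_ext.pop()
            # New candidates: neighbors of w with id larger than v that are
            # neither in nor adjacent to the current subgraph.
            excl = {u for u in adj[w] if u > v and u not in closed}
            extend(v_sub | {w}, v_ext | excl, v)

    for v in adj:
        extend({v}, {u for u in adj[v] if u > v}, v)
    return found

# Toy graph: triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
subgraphs = esu(adj, 3)
print(sorted(sorted(s) for s in subgraphs))
```

The id restriction makes the smallest node of each set its unique root, and the exclusive-neighborhood rule prevents reaching the same set along two different branches, so no deduplication step is needed.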


[Figure: Fig. 3 of Wernicke, IEEE/ACM Transactions on Computational Biology and Bioinformatics 2006: “Pseudocode for the algorithm ESU which enumerates all size-k subgraphs in a given graph G. (The definition of the exclusive neighborhood $N_{excl}(v, V')$ is given in Section 2.1.)”]


[Figure: ESU-Tree. The root is EnumerateSubgraphs($G$, $k$) and the levels below are ExtendSubgraph calls; each tree node is labeled with its $V_{subgraph}$ and $V_{extension}$ sets, e.g., {1, 3}, {2, 4, 5} and {2, 3}, {4, 5}. Leaves represent all size-3 induced subgraphs.]

¡ Nodes in the ESU-tree include two adjoining sets:
§ $V_{subgraph}$: A set of adjacent nodes
§ $V_{extension}$: Nodes adjacent to at least one of the SUB nodes whose numerical labels are larger than the SUB nodes labels
¡ So far, we enumerated all size-k subgraphs in the input graph
¡ Next step of ESU: Classify subgraphs placed in the ESU-Tree leaves into non-isomorphic size-k classes:
§ Determine which subgraphs in ESU-Tree leaves are topologically equivalent (isomorphic) and group them into subgraph classes accordingly
§ Use McKay’s nauty algorithm [McKay 1981]


¡ Graphs $G$ and $H$ are isomorphic if there exists a bijection $f: V(G) \to V(H)$ such that:
§ Any two nodes $u$ and $v$ of $G$ are adjacent in $G$ iff $f(u)$ and $f(v)$ are adjacent in $H$
¡ Example: Are $G$ and $H$ topologically equivalent?

[Figure: graph $G$ on nodes A to I, graph $H$ on nodes 1 to 9, and a bijection $f$ between the two node sets]

§ Need to check 9! possible bijections between node sets
§ Hard computational problem!
§ $G$ and $H$ are isomorphic!
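The definition suggests a direct, if slow, test: try every bijection and check that it maps edges onto edges. The sketch below does exactly that for small undirected graphs. It is a brute-force illustration, not the nauty algorithm, and `find_isomorphism` is a hypothetical helper name.

```python
from itertools import permutations

def find_isomorphism(nodes_g, edges_g, nodes_h, edges_h):
    """Return a bijection f: V(G) -> V(H) preserving adjacency, or None."""
    eg = {frozenset(e) for e in edges_g}
    eh = {frozenset(e) for e in edges_h}
    if len(nodes_g) != len(nodes_h) or len(eg) != len(eh):
        return None
    # Up to n! candidate bijections (9! = 362880 for two 9-node graphs).
    for perm in permutations(nodes_h):
        f = dict(zip(nodes_g, perm))
        if {frozenset(f[x] for x in e) for e in eg} == eh:
            return f
    return None

# Two drawings of the 3-node path, and a triangle for contrast.
f = find_isomorphism(["A", "B", "C"], [("A", "B"), ("B", "C")],
                     [1, 2, 3], [(1, 3), (3, 2)])
g = find_isomorphism(["A", "B", "C"], [("A", "B"), ("B", "C")],
                     [1, 2, 3], [(1, 2), (2, 3), (1, 3)])
print(f)  # a mapping that sends the middle node B to 3
print(g)  # None: a path is not isomorphic to a triangle
```

Since the mapping is a bijection and both edge sets have the same size, mapping edges onto edges also guarantees that non-edges map to non-edges, which covers the "iff" in the definition.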
¡ Project is a substantial part of the class
§ Students put in significant effort, and great results have been obtained in the past
¡ Types of projects:
§ (1) Analysis of an interesting dataset with the goal to
develop a (new) model or an algorithm
§ (2) A test of a model or algorithm (that you have read
about or your own) on real & simulated data.
§ Fast algorithms for big graphs. Can be integrated into SNAP.
¡ Other points:
§ The project should contain some mathematical analysis,
and some experimentation on real or synthetic data
§ The result of the project will typically be an 8 page paper,
describing the approach, the results, and related work.
§ Come to us if you need help with a project idea!
Project proposal: 3-5 pages, teams of up to 3 students
¡ Project proposal has 3 parts:
§ (0) Quick 200 word abstract
§ (1) Related work / Reaction paper (2-3 pages):
§ Read 3 papers related to the project/class
§ Do reading beyond what was covered in class
§ Think beyond what you read. Don’t take others’ work for granted!
§ 2-3 pages: Summary (~1 page), Critique (~1 page)
§ (2) Proposal (1-2 pages):
§ Clearly define the problem you are solving.
§ How does it relate to what you read for the Reaction paper?
§ What data will you use? (make sure you already have it!)
§ Which algorithm/model will you use/develop? Be specific!
§ How will you evaluate/test your method?
See http://cs224w.stanford.edu/info.html for detailed instructions
and examples of previous proposals
¡ Logistics:
§ 1) Register your group on this Google Form:
https://goo.gl/forms/2wvoXXLm8R91y4jV2
§ 2) Submit PDF on GradeScope AND at
http://snap.stanford.edu/submit/
§ Due in 9 days: Thu Oct 18 at 23:59 PST!

¡ If you need help/ideas/advice come to Office Hours (Jure and Michele) or email us


¡ Here you can find a list of interesting datasets
for your CS224W project:

http://cs224w.stanford.edu/data.html
