Graph Neural Networks: Introduction, some theoretical properties

Nicolas Keriven
CNRS, GIPSA-lab

Graphs?

A graph is formed by:

Nodes (or vertices)

Edges

Can include:

Node features

Edge features

Directed or undirected edges

A graph is:

A purely mathematical object!

A principled way to represent many types of complex data (e.g. any type of network)

Graphs: examples

Molecule, knowledge graph, social network, brain connectivity network, Internet, computer network, gene regulatory network, transportation network, scene understanding, protein interaction network, 3D mesh...

"if all you have is a hammer, everything looks like a nail"

Graphs: notations

A graph is usually represented by:

An adjacency matrix A

(Optionally) node/edge feature matrices

A is usually sparse (lots of 0s), so it is fast to handle with dedicated tools.

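As a concrete illustration (a minimal sketch of my own, not from the slides), a small graph, its sparse adjacency matrix, and an optional node feature matrix can be built with NetworkX and SciPy, two of the tools listed later in this deck:

```python
# Minimal sketch (not from the slides): a small undirected graph, its sparse
# adjacency matrix, and an optional node feature matrix.
import networkx as nx
import numpy as np

G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (2, 0), (2, 3)])   # 4 nodes, 4 undirected edges

# Adjacency matrix as a SciPy sparse matrix: mostly zeros, cheap to store and multiply
A = nx.adjacency_matrix(G)
print(A.toarray())

# Optional node features: one feature vector per node (random here, just for shape)
Z = np.random.randn(G.number_of_nodes(), 5)
```
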
Machine learning… on graphs
Machine learning on graphs comes in many flavors

Supervised/semi-supervised:

Graph classification: labelled graphs -> label new graph

Molecule classification, drug efficiency prediction

Node (or edge) classification: labelled nodes -> label other nodes

Advertisement, protein interface prediction

Unsupervised (... also semi-supervised):

Community detection: one graph -> group nodes

Social network analysis

Link prediction: one graph -> potential new edge?

Recommender systems

But also: dynamic graph (node, edge) prediction (physical systems simulation),
graph generation (drug design)...

ML on graphs: Graph Neural Networks

Graph Neural Networks (GNNs) are "deep architectures" to do ML on graphs.

Very (very) trendy right now!

A lot of good papers, a lot of not-so-good papers, a lot of "noise"! (review papers coming out regularly)

Does NOT work that well! (compared to other "deep learning")

Simple methods may perform better, and people might not test them...

Room for improvement! (many interesting challenges)

No "ImageNet moment" yet for GNNs.

ML on graphs: some material

(Some) GNN reviews

Bronstein et al. Geometric Deep Learning: Going beyond Euclidean Data (2017)
Wu et al. A Comprehensive Survey on Graph Neural Networks (2020)
Hamilton. Graph Representation Learning (2020) (book)
Dwivedi et al. Benchmarking Graph Neural Networks (2020)

Datasets

Stanford Large Network Dataset Collection: snap.stanford.edu/data
Hu et al. Open Graph Benchmark (2020)

Python libraries

NetworkX (medium-sized graph manipulation, visualization)
PyTorch Geometric (PyTorch-based GNN library)
Deep Graph Library (GNN library with TensorFlow/PyTorch backends)

Online material, etc.

Sergey Ivanov. GraphML Newsletter: graphml.substack.com
M. Bronstein's posts on Medium: medium.com/@michael.bronstein
Xavier Bresson's talks on YouTube (search his name)

Outline

From Deep Convolutional Networks to GNNs

Some recent (theoretical) results


On small graphs
On large graphs

Deep Neural Networks

"Deep" learning: alternates between linearities and (differentiable) non-linearities.

State-of-the-art in: most everything? (with sufficient data and domain knowledge…)
• Computer vision
• Speech recognition
• Natural Language Processing
• Reinforcement learning
• Etc.

How do we extend them to graphs?
No node ordering: must be invariant to relabelling of the nodes (graph isomorphism).

ConvNets: convolution?

The building blocks of Deep (Convolutional) Neural Networks are convolutions.

Convolutions are local pattern-matching linear operators.

Usual filter banks (wavelets) use fixed filters; in ML, the filters are (usually) learned.

ConvNets: multiscale

How to detect complex "high-level" shapes?

Trying every pattern is impossible! -> stack filters (and subsampling) to make it hierarchical.

Naively stacking convolutions is still linear... -> add non-linear functions between each scale (layer).

Deeper layers are indeed activated by "higher-level" patterns.
Zeiler and Fergus. Visualizing and Understanding Convolutional Networks (2013)

DNNs are robust to (small) spatial deformations.
Bietti and Mairal. Group invariance, stability to deformations, and complexity of deep convolutional representations (2019)

Convolution on graphs?

How to perform convolution on graphs?

Two (main) problems for "pattern-matching" on graphs:

No inherent node ordering

No fixed neighborhood size

Convolution and Fourier
The “convolution theorem”: convolution is multiplication in the Fourier domain.

Fourier transform on graphs: Laplacian

How to define the Fourier transform on graphs?

(On the torus) complex exponentials are eigenfunctions of the Laplacian.

Laplacian operators (matrices) on graphs can be defined!

Dirichlet energy: $x^\top L x = \tfrac{1}{2}\sum_{i,j} a_{ij}(x_i - x_j)^2$, where $L = D - A$ and $D$ is the diagonal degree matrix.

Normalized Laplacian: $L_{\mathrm{norm}} = I - D^{-1/2} A D^{-1/2}$ (eigenvalues between 0 and 2).

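To make the definitions concrete, here is a minimal NumPy sketch (my illustration, not from the slides) that builds both Laplacians for a small graph and checks the Dirichlet-energy identity numerically:

```python
# Minimal sketch (not from the slides): combinatorial and normalized Laplacians
# of a small undirected graph, plus a numerical check of the Dirichlet energy.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)     # adjacency matrix
d = A.sum(axis=1)                              # node degrees
L = np.diag(d) - A                             # combinatorial Laplacian L = D - A

# Dirichlet energy: x^T L x = 1/2 * sum_ij a_ij (x_i - x_j)^2
x = np.random.randn(4)
lhs = x @ L @ x
rhs = 0.5 * sum(A[i, j] * (x[i] - x[j]) ** 2 for i in range(4) for j in range(4))
assert np.isclose(lhs, rhs)

# Normalized Laplacian: I - D^{-1/2} A D^{-1/2}, eigenvalues in [0, 2]
D_inv_sqrt = np.diag(d ** -0.5)
L_norm = np.eye(4) - D_inv_sqrt @ A @ D_inv_sqrt
print(np.linalg.eigvalsh(L_norm))              # all eigenvalues lie in [0, 2]
```
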
Fourier transform on graphs

Diagonalize the Laplacian: $L = U \Lambda U^\top$

Eigenvalues: "frequencies"

Eigenvectors: "Fourier modes"

Fourier transform: $\hat{x} = U^\top x$

Inverse Fourier transform: $x = U \hat{x}$

Filtering on graphs

How to filter a signal $x$ defined on the nodes?

Compute its Fourier transform: $\hat{x} = U^\top x$

Multiply by the filter: $h \odot \hat{x}$

Compute the inverse Fourier transform: $U (h \odot \hat{x})$

Chung. Spectral Graph Theory (1999)
Shuman et al. The Emerging Field of Signal Processing on Graphs (2013)

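The three steps translate directly into code. A minimal NumPy sketch (my illustration, with an arbitrary low-pass filter h(λ) = exp(−5λ); not from the slides):

```python
# Minimal sketch (not from the slides): spectral filtering of a graph signal
# following the three steps above, with an arbitrary low-pass filter.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
L = np.eye(4) - np.diag(d ** -0.5) @ A @ np.diag(d ** -0.5)   # normalized Laplacian

lam, U = np.linalg.eigh(L)     # eigenvalues ("frequencies") and eigenvectors ("Fourier modes")
x = np.random.randn(4)         # a signal over the nodes

h = np.exp(-5.0 * lam)         # example low-pass filter h(lambda)
x_hat = U.T @ x                # 1) Fourier transform
y_hat = h * x_hat              # 2) multiply by the filter
y = U @ y_hat                  # 3) inverse Fourier transform
```
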
Filtering on graphs

Frequencies and Fourier modes are graph-dependent.

Use a function of the frequencies: a filter $h(\lambda)$. Example: low-pass.

Diagonalization is (very) costly on large graphs!

Use polynomial filters: $h(\lambda) = \sum_k \beta_k \lambda^k$, so that $h(L) = \sum_k \beta_k L^k$ (polynomial approximation of the low-pass filter).

Polynomial filters are "localized": a degree-k polynomial only mixes information within k-hop neighborhoods.

They can make use of efficient sparse matrix-vector multiplication.

Hammond et al. Wavelets on Graphs via Spectral Graph Theory (2011)

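Here is a minimal SciPy sketch of this idea (my illustration, with arbitrary coefficients β_k; not from the slides): the filtered signal h(L)x is computed with sparse matrix-vector products only, so the Laplacian is never diagonalized.

```python
# Minimal sketch (not from the slides): apply a polynomial filter
# h(L) x = sum_k beta_k L^k x using only sparse matrix-vector products.
import numpy as np
import scipy.sparse as sp

n = 1000
A = sp.random(n, n, density=0.01, format="csr", random_state=0)
A = ((A + A.T) > 0).astype(float)               # random sparse undirected graph
d = np.asarray(A.sum(axis=1)).ravel() + 1e-12   # degrees (avoid division by zero)
D_inv_sqrt = sp.diags(1.0 / np.sqrt(d))
L = sp.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt     # normalized Laplacian, still sparse

beta = [1.0, -0.5, 0.1]                         # illustrative filter coefficients beta_k
x = np.random.randn(n)

y = np.zeros(n)
Lk_x = x.copy()                                 # L^0 x
for b in beta:
    y += b * Lk_x
    Lk_x = L @ Lk_x                             # next power of L via sparse matvec
```
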
(Spectral) GNNs

Spectral GNN: each layer applies trainable (polynomial) filters, adds a trainable bias, and applies a non-linear function (e.g. ReLU).
Henaff et al. Deep Convolutional Networks on Graph-Structured Data (2015)
Defferrard et al. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering (2016)

Final layer: output a signal over the nodes (for node classification) or perform pooling (for graph classification).

Early architectures include "graph coarsening" (subsampling), but it is a difficult problem.

Need input node features. No real solution otherwise...
Duong et al. On Node Features for Graph Neural Networks (2019)
Vignac et al. Building powerful and equivariant graph neural networks with structural message-passing (2020)

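A minimal NumPy sketch of one such layer (the exact parameterization below is my illustrative assumption, not the formulation used in the cited papers):

```python
# Minimal sketch (not from the slides): one spectral GNN layer with trainable
# polynomial filters, a trainable bias, and a ReLU non-linearity.
import numpy as np

def spectral_gnn_layer(L, Z, W, b):
    """Z_out = ReLU( sum_k L^k Z W[k] + b ).

    L : (n, n) normalized Laplacian
    Z : (n, d_in) node features
    W : list of (d_in, d_out) weight matrices, one per power of L
    b : (d_out,) bias
    """
    out = np.zeros((Z.shape[0], W[0].shape[1]))
    Lk_Z = Z.copy()                      # L^0 Z
    for Wk in W:
        out += Lk_Z @ Wk
        Lk_Z = L @ Lk_Z                  # next power of the Laplacian
    return np.maximum(out + b, 0.0)      # ReLU
```
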
The message-passing paradigm

"Modern" GNNs are often built around a message-passing interpretation.
Gilmer et al. Neural Message Passing for Quantum Chemistry (2017)
Kipf and Welling. Semi-Supervised Classification with Graph Convolutional Networks (2017)

At each layer, each node receives "messages" from its neighbors.

Messages are treated as a set: no node ordering!

When AGGREGATE is SUM, this is an order-1 polynomial filter! (but it can be more general: e.g. MAX or MIN, attention-based...)

Tip of the iceberg: approx. 100 GNN papers a month on arXiv.

Despite 1000s of papers, the same ideas keep coming around: be critical, learn to spot incremental changes!

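A minimal sketch of one message-passing layer with SUM aggregation (the names, shapes and UPDATE rule below are my illustrative assumptions, not a specific published model):

```python
# Minimal sketch (not from the slides): one message-passing layer with SUM
# aggregation over an adjacency list.
import numpy as np

def message_passing_layer(neighbors, Z, W_self, W_msg):
    """neighbors[i]: list of neighbors of node i; Z: (n, d_in) node features."""
    n = Z.shape[0]
    Z_new = np.empty((n, W_self.shape[1]))
    for i in range(n):
        # AGGREGATE: sum the neighbors' messages (a set: order does not matter)
        msg = sum((Z[j] @ W_msg for j in neighbors[i]), np.zeros(W_msg.shape[1]))
        # UPDATE: combine own features with the aggregated message, then ReLU
        Z_new[i] = np.maximum(Z[i] @ W_self + msg, 0.0)
    return Z_new

# Example: 4-node graph, random features and weights
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
Z = np.random.randn(4, 8)
W_self, W_msg = np.random.randn(8, 16), np.random.randn(8, 16)
Z_next = message_passing_layer(neighbors, Z, W_self, W_msg)
```
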
Outline

From Deep Convolutional Networks to GNNs

Some recent (theoretical) results


On small graphs
On large graphs

Expressive power of GNNs

Classical DNNs are "universal": as the number of neurons grows, they can approximate any continuous function. What about GNNs?
Hornik et al. Multilayer Feedforward Networks are Universal Approximators (1989)
Cybenko. Approximation by superpositions of a sigmoidal function (1989)

"Graph-classif" GNNs are insensitive to relabelling of the nodes, aka graph isomorphism: they are permutation-invariant. "Node-classif" GNNs are permutation-equivariant.

Graph isomorphism problem:

No known polynomial algorithm. Best known: quasipolynomial time.

Not known if NP-complete.

Might be a complexity class of its own!
Babai. Graph Isomorphism in Quasipolynomial Time (2015)

GNN vs. WL

A classical algorithm for graph isomorphism is the Weisfeiler-Lehman (WL) test.

Start with an arbitrary labelling of the nodes (from a discrete set).

Propagate labels with an injective aggregation function, repeat n times, and compare the final sets of labels.

Can distinguish a "large class" of non-isomorphic graphs (but not all!).
[Figure on the slide: a pair of non-isomorphic graphs on which WL fails.]
Weisfeiler and Lehman. A reduction of a graph to a canonical form and an algebra arising during this reduction (1968)
Babai and Kucera. Canonical labelling of graphs in linear average time (1979)

By construction, message-passing GNNs are not more powerful than the WL test, and can be as powerful if AGGREGATE is injective (with a sufficient number of neurons).
Xu et al. How Powerful are Graph Neural Networks? (2019)

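For concreteness, here is a minimal sketch of the 1-dimensional WL (color refinement) test (my illustration, not from the slides); it returns True, i.e. "cannot distinguish", on the classic pair of a 6-cycle versus two disjoint triangles:

```python
# Minimal sketch (not from the slides): the 1-WL / color refinement test.
# Returns False if the two graphs are certainly non-isomorphic, True if WL
# cannot distinguish them (they may still be non-isomorphic).
from collections import Counter

def wl_labels(neighbors, n_iter=3):
    labels = {v: 0 for v in neighbors}        # arbitrary (constant) initial labels
    for _ in range(n_iter):
        # New label = (own label, sorted multiset of neighbor labels), hashed;
        # sorting makes the aggregation injective on multisets of labels.
        labels = {v: hash((labels[v], tuple(sorted(labels[u] for u in neighbors[v]))))
                  for v in neighbors}
    return Counter(labels.values())           # multiset of final labels

def wl_test(neighbors_1, neighbors_2, n_iter=3):
    return wl_labels(neighbors_1, n_iter) == wl_labels(neighbors_2, n_iter)

# 6-cycle vs. two disjoint triangles: both 2-regular and not isomorphic, yet WL
# assigns identical label multisets -- the test fails to separate them.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(wl_test(cycle6, two_triangles))         # True: WL cannot tell them apart
```
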
Beyond WL...

GNN expressivity can be improved...

...to be as powerful as "higher-order WL"
Maron et al. Provably Powerful Graph Networks (2019)
Chen et al. On the equivalence between graph isomorphism testing and function approximation with GNNs (2019)

...by giving nodes unique/random identifiers
Vignac et al. Building powerful and equivariant graph neural networks with structural message-passing (2020)

...by counting/sampling substructures
Bouritsas et al. Improving Graph Neural Network Expressivity via Subgraph Isomorphism Counting (2020)

True universality can be attained by allowing an unbounded "tensorization" order (a new Stone-Weierstrass theorem!), but this is far too expensive to implement in practice...
Maron et al. On the Universality of Invariant Networks (2019)
Keriven and Peyré. Universal Invariant and Equivariant Graph Neural Networks (2019)

Limitations...

As with classical NNs, universality is hardly related to practical results.
Dwivedi et al. Benchmarking Graph Neural Networks (2020)

Real graphs are never even close to being isomorphic!

Outline

From Deep Convolutional Networks to GNNs

Some recent (theoretical) results


On small graphs
On large graphs

Large graphs?

Large graphs may "look the same", but are never isomorphic.

CNNs (translation-invariant) are robust to spatial deformations.

GNNs: stability with respect to discrete graph metrics?

Difficult to interpret, difficult to define for different-sized graphs.

What is a meaningful notion of deformation for a graph?

We use models of large random graphs to study GNNs.
Keriven, Bietti, Vaiter. Convergence and Stability of Graph Convolutional Networks on Large Random Graphs. NeurIPS 2020 (Spotlight)

Random graph models

Long history of modelling large graphs with random generative models.
Chung and Lu. Complex Graphs and Networks (2004)
Penrose. Random Geometric Graphs (2008)
Lovasz. Large networks and graph limits (2012)
Frieze and Karonski. Introduction to random graphs (2016)

Latent position models (W-random graphs, kernel random graphs...): unknown latent variables, a connectivity kernel, and node features.

Regimes: dense, relatively sparse, sparse.

Includes Erdös-Rényi, Stochastic Block Models, Gaussian kernels, epsilon-graphs...

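A minimal sketch of sampling from such a latent position model (the Gaussian kernel, uniform latent distribution and parameter values are my illustrative choices, not those of the cited works):

```python
# Minimal sketch (not from the slides): a latent position random graph.
# Latent variables x_i are drawn from a distribution P, and an edge i~j is
# drawn with probability alpha_n * W(x_i, x_j) for a connectivity kernel W.
import numpy as np

def sample_latent_position_graph(n, alpha_n=1.0, bandwidth=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, size=(n, 2))              # latent positions x_i ~ P
    sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq_dists / (2 * bandwidth ** 2))     # Gaussian connectivity kernel
    p_edge = np.clip(alpha_n * W, 0.0, 1.0)
    A = (rng.uniform(size=(n, n)) < p_edge).astype(float)
    A = np.triu(A, 1)
    A = A + A.T                                      # undirected, no self-loops
    return x, A

# Smaller alpha_n gives sparser graphs (dense / relatively sparse / sparse regimes)
x, A = sample_latent_position_graph(n=500, alpha_n=np.log(500) / 500)
```
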
Discrete vs. continuous

As the number of nodes grows, the GNN converges to a limit "continuous" model.

(Spectral) Graph Neural Networks:
Propagate a signal over the nodes
Polynomial graph filters with the normalized Laplacian
Output: a signal over the nodes (permutation-equivariant) or a single vector (permutation-invariant)

Continuous Graph Neural Networks (c-GNNs):
Propagate a function over the latent space
Filters with a normalized Laplacian operator
Output: a function ("continuous" permutation-equivariant) or a vector ("continuous" permutation-invariant)

Continuous limit of GNNs

Deviation measured in a direct norm for permutation-invariant GNNs, in MSE for permutation-equivariant GNNs.

Thm (Non-asymptotic convergence)

Under conditions on the random graph model, with high probability, the "deviation" between the discrete and the continuous GNN is bounded by an explicit rate that decreases with the number of nodes.

NB: thanks to the normalized Laplacian, the limit does not depend on the sparsity level, but the rate of convergence does...

Stability of continuous GNNs

Latent position models allow us to define intuitive geometric deformations: deformation of the distribution and deformation of the kernel.

Thm (Stability, simplified)

For translation-invariant kernels, if:

the latent distribution is replaced by a deformed version,

the kernel is replaced by a deformed version (and is translated),

the node-feature function is replaced by a deformed version,

then the deviation of the c-GNN is bounded in terms of the sizes of these deformations.

Outlooks: approximation power, generalization, optimization, other random graph models...

Conclusion

Graph ML and GNNs are now "first-class citizens" in ML.

Mostly "engineering/computer-science" driven, with some blind spots (statistics, probability...).

Still a lot to do! ("low-hanging fruits")

The community is fast-paced and growing exponentially: it is important to keep a critical eye!

Many feel that the "message-passing" paradigm is coming to an end.

"Real" challenging applications/datasets are starting to emerge; the "ImageNet moment" may be around the corner (or not?).

Incredibly many open questions (including many in "non-deep" graph ML!).

Don't hesitate to contact me if you're interested in the topic: nkeriven.github.io
