
Contrastive Learning methods for Graph

Representation Learning

Chandan Kumar G P

M Tech AI
Indian Institute of Science
Bangalore

July 21, 2023



Graph representation learning:

The aim of graph representation learning is to learn effective representations of graphs.

The main goal is to map each node in a graph to a vector representation in a continuous vector space, commonly referred to as an embedding.

Some representation learning methods are Node2Vec, Graph Neural Networks (GNNs), and Graph Autoencoders.

Some applications are Node Classification, Link Prediction, Graph Clustering and Community Detection, etc.



Contrastive Learning:
Contrastive learning is a self-supervised representation learning method.
Contrastive learning in computer vision:

Figure 1



Contrastive learning in graphs
Generate two different views from a single graph and learn better representations.

Figure 2

Encode the views with encoders and use a contrastive loss to learn better representations.



Contrastive Learning framework:
Let G = (V, E) denote a graph, where V = {v_1, v_2, …, v_N} and E ⊆ V × V represent the node set and the edge set respectively; X ∈ R^{N×F} and A ∈ {0, 1}^{N×N} denote the feature matrix and the adjacency matrix respectively.

Our objective is to learn a GNN encoder f(X, A) ∈ R^{N×F'} with F' ≪ F, i.e., one that produces low-dimensional node embeddings.

Figure 3: Illustrative model
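For concreteness, here is a minimal sketch of such an encoder: a two-layer GCN operating on a dense, symmetrically normalized adjacency matrix. The class name TwoLayerGCN, the layer sizes, and the dense-adjacency formulation are assumptions for illustration, not the specific encoder used by the methods discussed here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerGCN(nn.Module):
    """Minimal GCN encoder f(X, A) -> Z in R^{N x F'} (illustrative sketch)."""

    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden_dim)
        self.lin2 = nn.Linear(hidden_dim, out_dim)

    @staticmethod
    def normalize_adj(A: torch.Tensor) -> torch.Tensor:
        # Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}
        A_hat = A + torch.eye(A.size(0), device=A.device)
        d_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)
        return A_hat * d_inv_sqrt.unsqueeze(0) * d_inv_sqrt.unsqueeze(1)

    def forward(self, X: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
        A_norm = self.normalize_adj(A)
        h = F.relu(A_norm @ self.lin1(X))   # first propagation layer
        return A_norm @ self.lin2(h)        # low-dimensional node embeddings
```

For example, TwoLayerGCN(F, 256, 128)(X, A) would map an N×F feature matrix and dense adjacency matrix to 128-dimensional node embeddings.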



Loss function: for each positive pair (u_i, v_i),

\[
\ell(u_i, v_i) = \log \frac{e^{\theta(u_i, v_i)/\tau}}
{\underbrace{e^{\theta(u_i, v_i)/\tau}}_{\text{positive pair}}
 + \underbrace{\sum_{k \neq i} e^{\theta(u_i, v_k)/\tau}}_{\text{inter-view negative pairs}}
 + \underbrace{\sum_{k \neq i} e^{\theta(u_i, u_k)/\tau}}_{\text{intra-view negative pairs}}}
\]

where τ is a temperature parameter, θ(u, v) = s(g(u), g(v)), s(·, ·) is the cosine similarity, and g(·) is a nonlinear projection (implemented with a two-layer perceptron model).

The overall objective to be maximized is the average over all positive pairs,

\[
\mathcal{L} = \frac{1}{2N} \sum_{i=1}^{N} \big[ \ell(u_i, v_i) + \ell(v_i, u_i) \big]
\]
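A minimal PyTorch sketch of this objective, assuming two row-aligned embedding matrices U and V from the two views (row i of each forms a positive pair); the function name grace_loss and the projection head g are illustrative, not fixed by the slides.

```python
import torch
import torch.nn.functional as F

def grace_loss(U: torch.Tensor, V: torch.Tensor, g: torch.nn.Module, tau: float = 0.5) -> torch.Tensor:
    """Symmetric InfoNCE-style contrastive loss over two views (illustrative sketch)."""
    # theta(u, v) = cosine similarity of nonlinearly projected embeddings
    hu = F.normalize(g(U), dim=1)
    hv = F.normalize(g(V), dim=1)

    def one_side(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        inter = torch.exp(a @ b.t() / tau)   # a_i vs all b_k; diagonal entries are positive pairs
        intra = torch.exp(a @ a.t() / tau)   # a_i vs all a_k; off-diagonals are intra-view negatives
        pos = inter.diag()
        denom = inter.sum(dim=1) + intra.sum(dim=1) - intra.diag()
        return torch.log(pos / denom)        # one term per positive pair

    # Average over all positive pairs in both directions; negated so that
    # minimizing this value maximizes the objective stated above.
    return -0.5 * (one_side(hu, hv) + one_side(hv, hu)).mean()
```

Here g could be, for example, torch.nn.Sequential(torch.nn.Linear(d, d), torch.nn.ELU(), torch.nn.Linear(d, d)), matching the two-layer perceptron projection described above.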



Adaptive Graph Augmentation:

This augmentation scheme tends to keep important structures and attributes unchanged, while perturbing possibly unimportant links and features.

Topology-level augmentation: we sample a modified edge set Ẽ from the original E, keeping each edge (u, v) with probability

\[
P\big((u, v) \in \tilde{E}\big) = 1 - p^{e}_{uv}
\]

The removal probability p^e_{uv} should reflect the importance of (u, v): important edges should be dropped only rarely.

We define edge centrality as the average of the two adjacent nodes' centrality scores, i.e., w^e_{uv} = (ϕ_c(u) + ϕ_c(v))/2.

On directed graphs, we simply use the centrality of the tail node, i.e., w^e_{uv} = ϕ_c(v).

\[
s^{e}_{uv} = \log w^{e}_{uv}, \qquad
p^{e}_{uv} = \min\!\left( \frac{s^{e}_{\max} - s^{e}_{uv}}{s^{e}_{\max} - \mu^{e}_{s}} \cdot p_e,\; p_\tau \right)
\]

where s^e_max and μ^e_s are the maximum and the mean of the s^e_{uv} values, p_e is a hyperparameter controlling the overall edge-removal probability, and p_τ < 1 is a cut-off that truncates extremely high removal probabilities so the graph structure is not over-corrupted.
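As a rough sketch of how these removal probabilities might be computed with degree centrality as ϕ_c; the function name edge_drop_probs and the default values of p_e and p_τ are assumptions, not taken from the slides.

```python
import torch

def edge_drop_probs(edge_index: torch.Tensor, num_nodes: int,
                    p_e: float = 0.3, p_tau: float = 0.7) -> torch.Tensor:
    """Per-edge removal probabilities p^e_{uv} from degree centrality (illustrative sketch).

    edge_index: LongTensor of shape [2, E], listing each undirected edge (u, v) once.
    """
    # Degree centrality as phi_c
    deg = torch.zeros(num_nodes)
    deg.scatter_add_(0, edge_index.reshape(-1), torch.ones(edge_index.numel()))

    u, v = edge_index[0], edge_index[1]
    w_e = (deg[u] + deg[v]) / 2.0          # edge centrality: mean of endpoint centralities
    s_e = torch.log(w_e + 1e-8)            # log damps the effect of heavy-tailed degrees

    s_max, s_mean = s_e.max(), s_e.mean()
    p = (s_max - s_e) / (s_max - s_mean + 1e-8) * p_e
    return torch.clamp(p, max=p_tau)       # cut-off: never drop an edge with probability > p_tau

# Keep each edge with probability 1 - p^e_{uv}, e.g.:
# keep_mask = torch.rand(edge_index.size(1)) > edge_drop_probs(edge_index, N)
```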



We can use Degree centrality, Eigenvector centrality, or PageRank centrality as the centrality measure ϕ_c.

Node-attribute-level augmentation: we add noise to node attributes by randomly masking a fraction of dimensions of the node features with zeros, masking the i-th dimension with probability p^f_i.

The probability p^f_i should reflect the importance of the i-th dimension of the node features.

For each feature dimension we calculate a weight

\[
w^{f}_{i} = \sum_{u \in V} |x_{u,i}| \cdot \phi_c(u)
\]

and compute the masking probability, analogously to the topology-level case, as

\[
s^{f}_{i} = \log w^{f}_{i}, \qquad
p^{f}_{i} = \min\!\left( \frac{s^{f}_{\max} - s^{f}_{i}}{s^{f}_{\max} - \mu^{f}_{s}} \cdot p_f,\; p_\tau \right)
\]
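A matching sketch for the feature-masking probabilities, again with degree centrality as ϕ_c; feature_mask_probs and its default values are hypothetical.

```python
import torch

def feature_mask_probs(X: torch.Tensor, deg: torch.Tensor,
                       p_f: float = 0.3, p_tau: float = 0.7) -> torch.Tensor:
    """Per-dimension masking probabilities p^f_i (illustrative sketch).

    X: [N, F] node feature matrix; deg: [N] node centrality scores (e.g. degrees).
    """
    w_f = (X.abs() * deg.unsqueeze(1)).sum(dim=0)   # w^f_i = sum_u |x_{u,i}| * phi_c(u)
    s_f = torch.log(w_f + 1e-8)
    s_max, s_mean = s_f.max(), s_f.mean()
    p = (s_max - s_f) / (s_max - s_mean + 1e-8) * p_f
    return torch.clamp(p, max=p_tau)

# Masking: zero out dimension i of every node with probability p^f_i, e.g.:
# mask = torch.rand(X.size(1)) >= feature_mask_probs(X, deg)
# X_aug = X * mask.float()
```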



Canonical Correlation Analysis-based Contrastive Learning:

This introduces a non-contrastive and non-discriminative objective for self-supervised learning, which is inspired by Canonical Correlation Analysis (CCA) methods.
Canonical Correlation Analysis: for two random variables x ∈ R^m and y ∈ R^n, their cross-covariance matrix is Σ_xy = Cov(x, y).

CCA aims at seeking two vectors a ∈ R^m and b ∈ R^n such that the correlation

\[
\rho = \operatorname{corr}(a^{\top} x,\, b^{\top} y)
     = \frac{a^{\top} \Sigma_{xy}\, b}{\sqrt{a^{\top} \Sigma_{xx}\, a}\, \sqrt{b^{\top} \Sigma_{yy}\, b}}
\]

is maximized.

Equivalently, the objective is

\[
\max_{a,\, b} \; a^{\top} \Sigma_{xy}\, b
\quad \text{s.t.} \quad a^{\top} \Sigma_{xx}\, a = b^{\top} \Sigma_{yy}\, b = 1
\]
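As an illustrative sanity check (not from the slides), the correlation for given projection vectors a and b can be estimated empirically from paired samples as follows; all names are hypothetical.

```python
import torch

def canonical_correlation(X: torch.Tensor, Y: torch.Tensor,
                          a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Empirical corr(a^T x, b^T y) for paired samples X:[n, m], Y:[n, p] (illustrative sketch)."""
    u = (X - X.mean(dim=0)) @ a      # centered projections a^T x
    v = (Y - Y.mean(dim=0)) @ b      # centered projections b^T y
    return (u * v).sum() / (u.norm() * v.norm())

# CCA searches over a and b to maximize this quantity, subject to
# a^T Sigma_xx a = b^T Sigma_yy b = 1 (unit variance of the projections).
```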



The linear transformations are replaced with neural networks. Concretely, assume x1 and x2 are two views of an input; the objective is

\[
\max_{\theta_1, \theta_2} \; \operatorname{Tr}\!\big( P_{\theta_1}(x_1)^{\top} P_{\theta_2}(x_2) \big)
\quad \text{s.t.} \quad
P_{\theta_1}(x_1)^{\top} P_{\theta_1}(x_1) = P_{\theta_2}(x_2)^{\top} P_{\theta_2}(x_2) = I
\]

where P_{θ1} and P_{θ2} are two feedforward neural networks and I is an identity matrix.

Still, such computation is really expensive, and soft CCA removes the hard decorrelation constraint by adopting the following Lagrangian relaxation:

\[
\min_{\theta_1, \theta_2} \;
\mathcal{L}_{\text{dist}}\big( P_{\theta_1}(x_1), P_{\theta_2}(x_2) \big)
+ \lambda \big( \mathcal{L}_{\text{SDL}}\big( P_{\theta_1}(x_1) \big) + \mathcal{L}_{\text{SDL}}\big( P_{\theta_2}(x_2) \big) \big)
\]

where L_dist measures the distance between the two views' representations and L_SDL is the stochastic decorrelation loss.

Applied to the normalized node embeddings Z̃_A and Z̃_B of the two graph views, the loss becomes

\[
\mathcal{L} =
\underbrace{\| \tilde{Z}_A - \tilde{Z}_B \|_F^2}_{\text{invariance term}}
+ \lambda \underbrace{\Big( \| \tilde{Z}_A^{\top} \tilde{Z}_A - I \|_F^2 + \| \tilde{Z}_B^{\top} \tilde{Z}_B - I \|_F^2 \Big)}_{\text{decorrelation term}}
\]
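A minimal PyTorch sketch of this invariance-plus-decorrelation loss; the standardization step and the name cca_ssg_loss are assumptions based on the formula above, not code from the slides.

```python
import torch

def cca_ssg_loss(Z_a: torch.Tensor, Z_b: torch.Tensor, lam: float = 1e-3) -> torch.Tensor:
    """Invariance + decorrelation loss over two views' embeddings [N, D] (illustrative sketch)."""
    N = Z_a.size(0)
    # Standardize each dimension and scale by 1/sqrt(N) so Z^T Z approximates a correlation matrix
    Z_a = (Z_a - Z_a.mean(dim=0)) / (Z_a.std(dim=0) + 1e-8) / N ** 0.5
    Z_b = (Z_b - Z_b.mean(dim=0)) / (Z_b.std(dim=0) + 1e-8) / N ** 0.5

    eye = torch.eye(Z_a.size(1), device=Z_a.device)
    invariance = (Z_a - Z_b).pow(2).sum()                       # || Z_a - Z_b ||_F^2
    decorrelation = (Z_a.t() @ Z_a - eye).pow(2).sum() + \
                    (Z_b.t() @ Z_b - eye).pow(2).sum()          # push cross-dimension correlations to zero
    return invariance + lam * decorrelation
```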



Results:

X for node features, A for adjacency matrix, S for diffusion matrix, and Y for node labels.



Thank You

