
Unsupervised Representation Learning Methods for Knowledge Graphs

Chandankumar G P

M Tech AI
Indian Institute of Science
Bangalore

Prof. Sundeep Prabhakar Chepuri

November 28, 2023

Outline of Discussion:

Contrastive Learning.

Contrastive Learning with Adaptive Graph Augmentation.

Canonical Correlation Analysis based Contrastive Learning.

Scattering Transforms.

Graph representation learning:

The aim of graph representation learning is to learn effective representations of graphs.

The main goal is to map each node in a graph to a vector representation in a continuous vector space, commonly referred to as an embedding.

Some representation learning methods are Node2Vec, Graph Neural Networks (GNNs), and Graph Autoencoders.

Some applications are node classification, link prediction, graph clustering and community detection, etc.

Contrastive Learning:

Contrastive learning is a self-supervised representation learning method.

Figure 1: Contrastive learning in computer vision.

Contrastive learning in graphs:

Generate two different views from a single graph.

Figure 2

Encode the views using encoders and use a contrastive loss to learn better representations.

Contrastive Learning framework:

Let G = (V, E) denote a graph, where V = {v_1, v_2, ..., v_N} and E ⊆ V × V represent the node set and the edge set respectively, and let X ∈ R^{N×F} and A ∈ {0, 1}^{N×N} denote the feature matrix and the adjacency matrix respectively.

Our objective is to learn a GNN encoder f(X, A) ∈ R^{N×F′} with F′ ≪ F, i.e., one that produces node embeddings in low dimensionality.

Figure 3: Illustrative model

Loss function: for each positive pair (u_i, v_i),

ℓ(u_i, v_i) = log [ e^{θ(u_i, v_i)/τ} / ( e^{θ(u_i, v_i)/τ} + Σ_{k≠i} e^{θ(u_i, v_k)/τ} + Σ_{k≠i} e^{θ(u_i, u_k)/τ} ) ]

where the first term in the denominator corresponds to the positive pair, the first sum runs over inter-view negative pairs, and the second sum over intra-view negative pairs; τ is a temperature parameter and θ(u, v) = s(g(u), g(v)), where s(·, ·) is the cosine similarity and g(·) is a nonlinear projection (implemented with a two-layer perceptron model).

The overall objective to be maximized is the average over all positive pairs:

L = (1 / 2N) Σ_{i=1}^{N} [ ℓ(u_i, v_i) + ℓ(v_i, u_i) ]

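A minimal NumPy sketch of this objective, assuming U and V already hold the projected embeddings g(u_i) and g(v_i) of the two views stacked row-wise; the function and variable names are illustrative, not taken from the slides.

```python
import numpy as np

def nce_half(sim_cross, sim_same):
    """l(a_i, b_i) for all i, given exponentiated similarity matrices."""
    pos = np.diag(sim_cross)                              # positive pairs (a_i, b_i)
    inter_neg = sim_cross.sum(axis=1) - pos               # inter-view negatives, k != i
    intra_neg = sim_same.sum(axis=1) - np.diag(sim_same)  # intra-view negatives, k != i
    return np.log(pos / (pos + inter_neg + intra_neg))

def contrastive_objective(U, V, tau=0.5):
    """Average of l(u_i, v_i) + l(v_i, u_i) over all N positive pairs (to be maximized)."""
    Un = U / np.linalg.norm(U, axis=1, keepdims=True)     # row-normalize for cosine similarity
    Vn = V / np.linalg.norm(V, axis=1, keepdims=True)
    uv = np.exp(Un @ Vn.T / tau)                          # e^{theta(u_i, v_k)/tau}
    uu = np.exp(Un @ Un.T / tau)
    vv = np.exp(Vn @ Vn.T / tau)
    return 0.5 * (nce_half(uv, uu) + nce_half(uv.T, vv)).mean()
```
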
Adaptive Graph Augmentation:

This augmentation scheme tends to keep important structures and attributes unchanged, while perturbing possibly unimportant links and features.

Topology-level augmentation: we sample a modified edge set Ẽ from the original E with probability

P((u, v) ∈ Ẽ) = 1 − p^e_{uv}

where p^e_{uv} is the probability of removing edge (u, v); the keep probability 1 − p^e_{uv} should reflect the importance of (u, v), so that important edges are unlikely to be removed.

We define edge centrality as the average of the two adjacent nodes' centrality scores, i.e., w^e_{uv} = (φ_c(u) + φ_c(v)) / 2. On a directed graph, we simply use the centrality of the tail node, i.e., w^e_{uv} = φ_c(v).

s^e_{uv} = log w^e_{uv}

p^e_{uv} = min( ((s^e_max − s^e_{uv}) / (s^e_max − μ^e_s)) · p_e , p_τ )

where s^e_max and μ^e_s are the maximum and mean of the s^e_{uv} values, p_e is a hyperparameter controlling the overall probability of removing edges, and p_τ is a cut-off probability.

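As a concrete illustration, here is a short NumPy sketch of this edge-dropping scheme using degree centrality as φ_c (one of the centrality choices listed just below); the edge-list layout and function names are assumptions made for the example.

```python
import numpy as np

def edge_drop_probs(edges, deg, p_e=0.3, p_tau=0.7):
    """Removal probability p^e_uv per undirected edge (rows of `edges`), from degree centrality."""
    w = (deg[edges[:, 0]] + deg[edges[:, 1]]) / 2.0   # w^e_uv = (phi_c(u) + phi_c(v)) / 2
    s = np.log(w)                                     # s^e_uv = log w^e_uv
    p = (s.max() - s) / (s.max() - s.mean()) * p_e    # assumes centralities are not all equal
    return np.minimum(p, p_tau)                       # truncate at the cut-off p_tau

def sample_view(edges, p_drop, rng=np.random.default_rng()):
    """Keep edge (u, v) with probability 1 - p^e_uv to obtain one augmented view."""
    return edges[rng.random(len(edges)) >= p_drop]
```

Two independent calls to sample_view then give the two graph views that are fed to the encoders.
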
We can use degree centrality, eigenvector centrality, or PageRank centrality.

Node-attribute-level augmentation: we add noise to node attributes by randomly masking a fraction of dimensions with zeros in the node features, masking dimension i with probability p^f_i.

The probability p^f_i should reflect the importance of the i-th dimension of the node features.

For each feature dimension i we calculate the weight

w^f_i = Σ_{u∈V} |x_{ui}| · φ_c(u)

and compute the masking probability as

s^f_i = log w^f_i

p^f_i = min( ((s^f_max − s^f_i) / (s^f_max − μ^f_s)) · p_f , p_τ )

where s^f_max and μ^f_s are defined analogously to the topology case and p_f controls the overall magnitude of feature masking.

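A matching NumPy sketch for the attribute-level augmentation, assuming X is the (N, F) feature matrix and phi the vector of node centrality scores; names are illustrative.

```python
import numpy as np

def feature_mask_probs(X, phi, p_f=0.3, p_tau=0.7):
    """Masking probability p^f_i for each feature dimension, weighted by node centrality."""
    w = np.abs(X).T @ phi              # w^f_i = sum_u |x_ui| * phi_c(u)
    s = np.log(w + 1e-12)              # s^f_i = log w^f_i (epsilon added here to guard empty dimensions)
    p = (s.max() - s) / (s.max() - s.mean()) * p_f
    return np.minimum(p, p_tau)

def mask_features(X, p_mask, rng=np.random.default_rng()):
    """Zero out each feature dimension independently with probability p^f_i."""
    return X * (rng.random(X.shape[1]) >= p_mask)   # broadcasting masks whole columns
```
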
Canonical Correlation Analysis based Contrastive Learning:

This introduces a non-contrastive and non-discriminative objective for self-supervised learning, which is inspired by Canonical Correlation Analysis methods.

Canonical Correlation Analysis: for two random variables x ∈ R^m and y ∈ R^n, their covariance matrix is Σ_{xy} = Cov(x, y).

CCA aims at seeking two vectors a ∈ R^m and b ∈ R^n such that the correlation

ρ = corr(a^T x, b^T y) = (a^T Σ_{xy} b) / ( √(a^T Σ_{xx} a) · √(b^T Σ_{yy} b) )

is maximized.

The objective is: max_{a,b} a^T Σ_{xy} b   s.t.   a^T Σ_{xx} a = b^T Σ_{yy} b = 1.

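For intuition, the sample version of ρ for given projection vectors can be computed directly; a short NumPy sketch, assuming paired samples are stacked row-wise in X (N, m) and Y (N, n).

```python
import numpy as np

def canonical_correlation(X, Y, a, b):
    """Sample estimate of rho = corr(a^T x, b^T y) for paired, row-wise samples."""
    u = (X - X.mean(axis=0)) @ a       # projected and centred first view
    v = (Y - Y.mean(axis=0)) @ b       # projected and centred second view
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
```
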
Replacing the linear transformations with neural networks, and taking x1, x2 to be two views of an input, the objective becomes:

max_{θ1,θ2} Tr( P_{θ1}(x1)^T P_{θ2}(x2) )   s.t.   P_{θ1}(x1)^T P_{θ1}(x1) = P_{θ2}(x2)^T P_{θ2}(x2) = I

where P_{θ1} and P_{θ2} are two feedforward neural networks and I is an identity matrix.

Still, such computation is really expensive, and soft CCA removes the hard decorrelation constraint by adopting the following Lagrangian relaxation:

min_{θ1,θ2} L_dist( P_{θ1}(x1), P_{θ2}(x2) ) + λ ( L_SDL(P_{θ1}(x1)) + L_SDL(P_{θ2}(x2)) )

where L_dist measures the distance between the two views' representations and L_SDL is called the stochastic decorrelation loss.

For the graph setting, with Z̃_A and Z̃_B denoting the normalized node embeddings of the two views, the overall loss is

L = ‖Z̃_A − Z̃_B‖²_F + λ ( ‖Z̃_A^T Z̃_A − I‖²_F + ‖Z̃_B^T Z̃_B − I‖²_F )

where the first term is the invariance term and the second is the decorrelation term.

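A minimal NumPy sketch of this loss; the column-wise standardization with a 1/√N scale is an assumption about how Z̃ is obtained (it makes Z̃^T Z̃ a correlation-like matrix), and the names are illustrative.

```python
import numpy as np

def cca_ssg_loss(ZA, ZB, lam=1e-3):
    """Invariance + decorrelation loss over two views' node embeddings of shape (N, d)."""
    N, d = ZA.shape
    # Standardize each embedding dimension and scale by 1/sqrt(N), so Z^T Z approximates
    # the d x d correlation matrix of a view (assumed normalization, see lead-in).
    ZA = (ZA - ZA.mean(axis=0)) / (ZA.std(axis=0) * np.sqrt(N))
    ZB = (ZB - ZB.mean(axis=0)) / (ZB.std(axis=0) * np.sqrt(N))

    invariance = np.sum((ZA - ZB) ** 2)                 # ||ZA - ZB||_F^2
    I = np.eye(d)
    decorrelation = np.sum((ZA.T @ ZA - I) ** 2) + np.sum((ZB.T @ ZB - I) ** 2)
    return invariance + lam * decorrelation
```
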
Geometric Scattering Transform for Graph Data

Transfer learning approaches provide evidence that suitably constructed deep filter banks should be able to extract task-agnostic semantic information from the data.

The geometric scattering transform is a deep filter bank for feature extraction on graphs, which generalizes the Euclidean scattering transform.

The scattering transform is constructed as a cascade of linear wavelet transforms and nonlinear complex modulus operations that provides features with guaranteed invariance to a predetermined Lie group of operations such as rotations, translations, or scaling.

Graph Wavelets

Graph wavelets are defined as the difference between lazy random walks that have propagated at different time scales.

Let G = (V, E, W) be a weighted graph, consisting of n vertices V = {v_1, ..., v_n}, edges E ⊆ {(v_l, v_m) : 1 ≤ l, m ≤ n}, and weights W = {w(v_l, v_m) > 0 : (v_l, v_m) ∈ E}.

The weighted degree of vertex v_l is deg(v_l) = Σ_m A(v_l, v_m), and the corresponding diagonal n × n degree matrix D is given by D(v_l, v_l) = deg(v_l), D(v_l, v_m) = 0 for l ≠ m.

The n × n graph Laplacian matrix L_G = L on G is defined as L = D − A, and the normalized Laplacian is N = D^{−1/2} L D^{−1/2} = I − D^{−1/2} A D^{−1/2}.

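A tiny NumPy sketch of these matrices built from a (weighted) adjacency matrix A; the function name is illustrative and isolated vertices are assumed away.

```python
import numpy as np

def graph_matrices(A):
    """Degree matrix D, Laplacian L = D - A, and normalized Laplacian N from adjacency A."""
    deg = A.sum(axis=1)                        # weighted degrees deg(v_l)
    D = np.diag(deg)
    L = D - A
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))   # requires deg(v_l) > 0 for every vertex
    return D, L, D_inv_sqrt @ L @ D_inv_sqrt   # N = D^{-1/2} L D^{-1/2}
```
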
The n × n lazy random walk matrix is P = ½ (I + A D^{−1}), and P^t governs the probability distribution of a lazy random walk after t steps.

The value P^t x(v_l) is the weighted average of x(v_l) with all values x(v_m) such that v_m is within t steps of v_l.

The n × n wavelet matrix at the scale 2^j is defined as

Ψ_j = P^{2^{j−1}} − P^{2^j} = P^{2^{j−1}} ( I − P^{2^{j−1}} ).

The wavelet coefficients up to the scale 2^J are:

Ψ^{(J)} x(v_l) = [ Ψ_j x(v_l) : 1 ≤ j ≤ J ].

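A compact NumPy sketch of this wavelet filter bank, computing the dyadic powers of P by repeated squaring; names are illustrative.

```python
import numpy as np

def wavelet_filter_bank(A, J):
    """Lazy random walk matrix P and graph wavelets Psi_1, ..., Psi_J."""
    n = A.shape[0]
    D_inv = np.diag(1.0 / A.sum(axis=1))   # D^{-1}, assuming no isolated vertices
    P = 0.5 * (np.eye(n) + A @ D_inv)      # P = 1/2 (I + A D^{-1})
    wavelets, Pk = [], P                   # Pk holds P^{2^{j-1}}
    for _ in range(J):
        P_next = Pk @ Pk                   # P^{2^j} obtained by squaring
        wavelets.append(Pk - P_next)       # Psi_j = P^{2^{j-1}} - P^{2^j}
        Pk = P_next
    return P, wavelets
```
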
Geometric Scattering on Graphs:

The zero-order scattering moments are given by the unnormalized q-th moments of x:

Sx(q) = Σ_{l=1}^{n} x(v_l)^q ,   1 ≤ q ≤ Q.

The first-order geometric scattering moments are given by:

Sx(j, q) = Σ_{l=1}^{n} |Ψ_j x(v_l)|^q ,   1 ≤ j ≤ J,  1 ≤ q ≤ Q.

The second-order geometric scattering moments are:

Sx(j, j′, q) = Σ_{l=1}^{n} |Ψ_{j′} |Ψ_j x(v_l)||^q ,   1 ≤ j < j′ ≤ J,  1 ≤ q ≤ Q.

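Continuing the sketch above, the moments of a graph signal x can be collected into a single feature vector; this is a minimal illustration, not the exact implementation behind the slides.

```python
import numpy as np

def scattering_moments(x, wavelets, Q=4):
    """Zero-, first-, and second-order geometric scattering moments of a node signal x."""
    feats = [np.sum(x ** q) for q in range(1, Q + 1)]           # Sx(q)
    for j, Psi_j in enumerate(wavelets):
        u = np.abs(Psi_j @ x)                                   # |Psi_j x|
        feats += [np.sum(u ** q) for q in range(1, Q + 1)]      # Sx(j, q)
        for Psi_jp in wavelets[j + 1:]:                         # scales j < j'
            v = np.abs(Psi_jp @ u)                              # |Psi_j' |Psi_j x||
            feats += [np.sum(v ** q) for q in range(1, Q + 1)]  # Sx(j, j', q)
    return np.array(feats)
```

For a graph with adjacency A, calling wavelet_filter_bank(A, J) followed by scattering_moments(x, wavelets) yields the scattering feature vector of the signal x.
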
Thank You

