
Unsupervised Representation Learning Methods for Knowledge Graphs

Chandankumar G P

M Tech AI
Indian Institute of Science
Bangalore

Prof. Sundeep Prabhakar Chepuri

November 28, 2023

Outline of Discussion:

Contrastive Learning.

Contrastive Learning with Adaptive Graph Augmentation.

Canonical Correlation Analysis based Contrastive Learning.

Scattering Transforms.

Graph representation learning:

The aim of graph representation learning is to learn effective representations of graphs.

The main goal is to map each node in a graph to a vector representation in a continuous vector space, commonly referred to as an embedding.

Some representation learning methods are Node2Vec, Graph Neural Networks (GNNs), and Graph Autoencoders.

Some applications are node classification, link prediction, graph clustering and community detection, etc.

Contrastive Learning:

Contrastive learning is a self-supervised representation learning method.

Figure 1: Contrastive learning in computer vision.

Contrastive learning in graphs:

Generate two different views from a single graph.

Figure 2

Encode the views using encoders and use a contrastive loss to learn better representations.

Contrastive Learning framework:

Let G = (V, E) denote a graph, where V = {v_1, v_2, ..., v_N} and E ⊆ V × V represent the node set and the edge set respectively, and let X ∈ R^{N×F} and A ∈ {0, 1}^{N×N} denote the feature matrix and the adjacency matrix respectively.

Our objective is to learn a GNN encoder f(X, A) ∈ R^{N×F′} with F′ ≪ F, i.e., one that produces node embeddings in low dimensionality.

Figure 3: Illustrative model

Loss function: for each positive pair (u_i, v_i),

ℓ(u_i, v_i) = log [ e^{θ(u_i, v_i)/τ} / ( e^{θ(u_i, v_i)/τ} + Σ_{k≠i} e^{θ(u_i, v_k)/τ} + Σ_{k≠i} e^{θ(u_i, u_k)/τ} ) ]

where the first term in the denominator corresponds to the positive pair, the first sum runs over inter-view negative pairs, and the second sum over intra-view negative pairs; τ is a temperature parameter and θ(u, v) = s(g(u), g(v)), where s(·, ·) is the cosine similarity and g(·) is a nonlinear projection (implemented with a two-layer perceptron model).

The overall objective to be maximized is the average over all positive pairs:

L = (1 / 2N) Σ_{i=1}^{N} [ ℓ(u_i, v_i) + ℓ(v_i, u_i) ]

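A minimal NumPy sketch of this objective, assuming U and V already hold the projected embeddings g(u_i) and g(v_i) of the two views stacked row-wise; the function and variable names are illustrative, not taken from the slides.

```python
import numpy as np

def nce_half(sim_cross, sim_same):
    """l(a_i, b_i) for all i, given exponentiated similarity matrices."""
    pos = np.diag(sim_cross)                              # positive pairs (a_i, b_i)
    inter_neg = sim_cross.sum(axis=1) - pos               # inter-view negatives, k != i
    intra_neg = sim_same.sum(axis=1) - np.diag(sim_same)  # intra-view negatives, k != i
    return np.log(pos / (pos + inter_neg + intra_neg))

def contrastive_objective(U, V, tau=0.5):
    """Average of l(u_i, v_i) + l(v_i, u_i) over all N positive pairs (to be maximized)."""
    Un = U / np.linalg.norm(U, axis=1, keepdims=True)     # row-normalize for cosine similarity
    Vn = V / np.linalg.norm(V, axis=1, keepdims=True)
    uv = np.exp(Un @ Vn.T / tau)                          # e^{theta(u_i, v_k)/tau}
    uu = np.exp(Un @ Un.T / tau)
    vv = np.exp(Vn @ Vn.T / tau)
    return 0.5 * (nce_half(uv, uu) + nce_half(uv.T, vv)).mean()
```
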
Adaptive Graph Augmentation:

This augmentation scheme tends to keep important structures and attributes unchanged, while perturbing possibly unimportant links and features.

Topology-level augmentation: we sample a modified edge set Ẽ from the original E with probability

P((u, v) ∈ Ẽ) = 1 − p^e_{uv}

where p^e_{uv} is the probability of removing edge (u, v); the keep probability 1 − p^e_{uv} should reflect the importance of (u, v), so that important edges are unlikely to be removed.

We define edge centrality as the average of the two adjacent nodes' centrality scores, i.e., w^e_{uv} = (φ_c(u) + φ_c(v)) / 2. On a directed graph, we simply use the centrality of the tail node, i.e., w^e_{uv} = φ_c(v).

s^e_{uv} = log w^e_{uv}

p^e_{uv} = min( ((s^e_max − s^e_{uv}) / (s^e_max − μ^e_s)) · p_e , p_τ )

where s^e_max and μ^e_s are the maximum and mean of the s^e_{uv} values, p_e is a hyperparameter controlling the overall probability of removing edges, and p_τ is a cut-off probability.

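As a concrete illustration, here is a short NumPy sketch of this edge-dropping scheme using degree centrality as φ_c (one of the centrality choices listed just below); the edge-list layout and function names are assumptions made for the example.

```python
import numpy as np

def edge_drop_probs(edges, deg, p_e=0.3, p_tau=0.7):
    """Removal probability p^e_uv per undirected edge (rows of `edges`), from degree centrality."""
    w = (deg[edges[:, 0]] + deg[edges[:, 1]]) / 2.0   # w^e_uv = (phi_c(u) + phi_c(v)) / 2
    s = np.log(w)                                     # s^e_uv = log w^e_uv
    p = (s.max() - s) / (s.max() - s.mean()) * p_e    # assumes centralities are not all equal
    return np.minimum(p, p_tau)                       # truncate at the cut-off p_tau

def sample_view(edges, p_drop, rng=np.random.default_rng()):
    """Keep edge (u, v) with probability 1 - p^e_uv to obtain one augmented view."""
    return edges[rng.random(len(edges)) >= p_drop]
```

Two independent calls to sample_view then give the two graph views that are fed to the encoders.
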
We can use degree centrality, eigenvector centrality, or PageRank centrality.

Node-attribute-level augmentation: we add noise to node attributes by randomly masking a fraction of dimensions with zeros in the node features, masking dimension i with probability p^f_i.

The probability p^f_i should reflect the importance of the i-th dimension of the node features.

For each feature dimension i we calculate the weight

w^f_i = Σ_{u∈V} |x_{ui}| · φ_c(u)

and compute the masking probability as

s^f_i = log w^f_i

p^f_i = min( ((s^f_max − s^f_i) / (s^f_max − μ^f_s)) · p_f , p_τ )

where s^f_max and μ^f_s are defined analogously to the topology case and p_f controls the overall magnitude of feature masking.

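A matching NumPy sketch for the attribute-level augmentation, assuming X is the (N, F) feature matrix and phi the vector of node centrality scores; names are illustrative.

```python
import numpy as np

def feature_mask_probs(X, phi, p_f=0.3, p_tau=0.7):
    """Masking probability p^f_i for each feature dimension, weighted by node centrality."""
    w = np.abs(X).T @ phi              # w^f_i = sum_u |x_ui| * phi_c(u)
    s = np.log(w + 1e-12)              # s^f_i = log w^f_i (epsilon added here to guard empty dimensions)
    p = (s.max() - s) / (s.max() - s.mean()) * p_f
    return np.minimum(p, p_tau)

def mask_features(X, p_mask, rng=np.random.default_rng()):
    """Zero out each feature dimension independently with probability p^f_i."""
    return X * (rng.random(X.shape[1]) >= p_mask)   # broadcasting masks whole columns
```
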
Canonical Correlation Analysis based Contrastive Learning:

This introduces a non-contrastive and non-discriminative objective for self-supervised learning, which is inspired by Canonical Correlation Analysis methods.

Canonical Correlation Analysis: for two random variables x ∈ R^m and y ∈ R^n, their covariance matrix is Σ_{xy} = Cov(x, y).

CCA aims at seeking two vectors a ∈ R^m and b ∈ R^n such that the correlation

ρ = corr(a^T x, b^T y) = (a^T Σ_{xy} b) / ( √(a^T Σ_{xx} a) · √(b^T Σ_{yy} b) )

is maximized.

The objective is: max_{a,b} a^T Σ_{xy} b   s.t.   a^T Σ_{xx} a = b^T Σ_{yy} b = 1.

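For intuition, the sample version of ρ for given projection vectors can be computed directly; a short NumPy sketch, assuming paired samples are stacked row-wise in X (N, m) and Y (N, n).

```python
import numpy as np

def canonical_correlation(X, Y, a, b):
    """Sample estimate of rho = corr(a^T x, b^T y) for paired, row-wise samples."""
    u = (X - X.mean(axis=0)) @ a       # projected and centred first view
    v = (Y - Y.mean(axis=0)) @ b       # projected and centred second view
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
```
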
Replacing the linear transformations with neural networks, and taking x1, x2 to be two views of an input, the objective becomes:

max_{θ1,θ2} Tr( P_{θ1}(x1)^T P_{θ2}(x2) )   s.t.   P_{θ1}(x1)^T P_{θ1}(x1) = P_{θ2}(x2)^T P_{θ2}(x2) = I

where P_{θ1} and P_{θ2} are two feedforward neural networks and I is an identity matrix.

Still, such computation is really expensive, and soft CCA removes the hard decorrelation constraint by adopting the following Lagrangian relaxation:

min_{θ1,θ2} L_dist( P_{θ1}(x1), P_{θ2}(x2) ) + λ ( L_SDL(P_{θ1}(x1)) + L_SDL(P_{θ2}(x2)) )

where L_dist measures the distance between the two views' representations and L_SDL is called the stochastic decorrelation loss.

For the graph setting, with Z̃_A and Z̃_B denoting the normalized node embeddings of the two views, the overall loss is

L = ‖Z̃_A − Z̃_B‖²_F + λ ( ‖Z̃_A^T Z̃_A − I‖²_F + ‖Z̃_B^T Z̃_B − I‖²_F )

where the first term is the invariance term and the second is the decorrelation term.

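A minimal NumPy sketch of this loss; the column-wise standardization with a 1/√N scale is an assumption about how Z̃ is obtained (it makes Z̃^T Z̃ a correlation-like matrix), and the names are illustrative.

```python
import numpy as np

def cca_ssg_loss(ZA, ZB, lam=1e-3):
    """Invariance + decorrelation loss over two views' node embeddings of shape (N, d)."""
    N, d = ZA.shape
    # Standardize each embedding dimension and scale by 1/sqrt(N), so Z^T Z approximates
    # the d x d correlation matrix of a view (assumed normalization, see lead-in).
    ZA = (ZA - ZA.mean(axis=0)) / (ZA.std(axis=0) * np.sqrt(N))
    ZB = (ZB - ZB.mean(axis=0)) / (ZB.std(axis=0) * np.sqrt(N))

    invariance = np.sum((ZA - ZB) ** 2)                 # ||ZA - ZB||_F^2
    I = np.eye(d)
    decorrelation = np.sum((ZA.T @ ZA - I) ** 2) + np.sum((ZB.T @ ZB - I) ** 2)
    return invariance + lam * decorrelation
```
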
Geometric Scattering Transform for Graph Data

Transfer learning approaches provide evidence that suitably constructed deep filter banks should be able to extract task-agnostic semantic information from the data.

The geometric scattering transform is a deep filter bank for feature extraction on graphs, which generalizes the Euclidean scattering transform.

The scattering transform is constructed as a cascade of linear wavelet transforms and nonlinear complex modulus operations that provides features with guaranteed invariance to a predetermined Lie group of operations such as rotations, translations, or scaling.

Graph Wavelets

Graph wavelets are defined as the difference between lazy random walks that have propagated at different time scales.

Let G = (V, E, W) be a weighted graph, consisting of n vertices V = {v_1, ..., v_n}, edges E ⊆ {(v_l, v_m) : 1 ≤ l, m ≤ n}, and weights W = {w(v_l, v_m) > 0 : (v_l, v_m) ∈ E}.

The weighted degree of vertex v_l is deg(v_l) = Σ_m A(v_l, v_m), and the corresponding diagonal n × n degree matrix D is given by D(v_l, v_l) = deg(v_l), D(v_l, v_m) = 0 for l ≠ m.

The n × n graph Laplacian matrix L_G = L on G is defined as L = D − A, and the normalized Laplacian is N = D^{−1/2} L D^{−1/2} = I − D^{−1/2} A D^{−1/2}.

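A tiny NumPy sketch of these matrices built from a (weighted) adjacency matrix A; the function name is illustrative and isolated vertices are assumed away.

```python
import numpy as np

def graph_matrices(A):
    """Degree matrix D, Laplacian L = D - A, and normalized Laplacian N from adjacency A."""
    deg = A.sum(axis=1)                        # weighted degrees deg(v_l)
    D = np.diag(deg)
    L = D - A
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))   # requires deg(v_l) > 0 for every vertex
    return D, L, D_inv_sqrt @ L @ D_inv_sqrt   # N = D^{-1/2} L D^{-1/2}
```
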
The n × n lazy random walk matrix is P = ½ (I + A D^{−1}), and P^t governs the probability distribution of a lazy random walk after t steps.

The value P^t x(v_l) is the weighted average of x(v_l) with all values x(v_m) such that v_m is within t steps of v_l.

The n × n wavelet matrix at the scale 2^j is defined as

Ψ_j = P^{2^{j−1}} − P^{2^j} = P^{2^{j−1}} ( I − P^{2^{j−1}} ).

The wavelet coefficients up to the scale 2^J are:

Ψ^{(J)} x(v_l) = [ Ψ_j x(v_l) : 1 ≤ j ≤ J ].

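A compact NumPy sketch of this wavelet filter bank, computing the dyadic powers of P by repeated squaring; names are illustrative.

```python
import numpy as np

def wavelet_filter_bank(A, J):
    """Lazy random walk matrix P and graph wavelets Psi_1, ..., Psi_J."""
    n = A.shape[0]
    D_inv = np.diag(1.0 / A.sum(axis=1))   # D^{-1}, assuming no isolated vertices
    P = 0.5 * (np.eye(n) + A @ D_inv)      # P = 1/2 (I + A D^{-1})
    wavelets, Pk = [], P                   # Pk holds P^{2^{j-1}}
    for _ in range(J):
        P_next = Pk @ Pk                   # P^{2^j} obtained by squaring
        wavelets.append(Pk - P_next)       # Psi_j = P^{2^{j-1}} - P^{2^j}
        Pk = P_next
    return P, wavelets
```
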
Geometric Scattering on Graphs:

The zero-order scattering moments are given by the unnormalized q-th moments of x:

Sx(q) = Σ_{l=1}^{n} x(v_l)^q ,   1 ≤ q ≤ Q.

The first-order geometric scattering moments are given by:

Sx(j, q) = Σ_{l=1}^{n} |Ψ_j x(v_l)|^q ,   1 ≤ j ≤ J,  1 ≤ q ≤ Q.

The second-order geometric scattering moments are:

Sx(j, j′, q) = Σ_{l=1}^{n} |Ψ_{j′} |Ψ_j x(v_l)||^q ,   1 ≤ j < j′ ≤ J,  1 ≤ q ≤ Q.

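Continuing the sketch above, the moments of a graph signal x can be collected into a single feature vector; this is a minimal illustration, not the exact implementation behind the slides.

```python
import numpy as np

def scattering_moments(x, wavelets, Q=4):
    """Zero-, first-, and second-order geometric scattering moments of a node signal x."""
    feats = [np.sum(x ** q) for q in range(1, Q + 1)]           # Sx(q)
    for j, Psi_j in enumerate(wavelets):
        u = np.abs(Psi_j @ x)                                   # |Psi_j x|
        feats += [np.sum(u ** q) for q in range(1, Q + 1)]      # Sx(j, q)
        for Psi_jp in wavelets[j + 1:]:                         # scales j < j'
            v = np.abs(Psi_jp @ u)                              # |Psi_j' |Psi_j x||
            feats += [np.sum(v ** q) for q in range(1, Q + 1)]  # Sx(j, j', q)
    return np.array(feats)
```

For a graph with adjacency A, calling wavelet_filter_bank(A, J) followed by scattering_moments(x, wavelets) yields the scattering feature vector of the signal x.
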
Thank You

