DRDO_PPT_m1
Chandankumar G P
M Tech AI
Indian Institute of Science
Bangalore
Chandankumar G P 1/16
Outline of Discussion:
Contrastive Learning.
Scattering Transforms.
Graph representation learning:
Contrastive Learning:
Contrastive learning is a self-supervised representation learning method.
Contrastive learning in computer vision
Figure 1
Contrastive learning in graphs
Generate two different views from a single graph and learn better representations.
Figure 2
Contrastive Learning framework:
Let G = (V, E) denote a graph, where V = {v_1, v_2, ..., v_N} and E ⊆ V × V represent the node set and the edge set respectively; X ∈ ℝ^{N×F} and A ∈ {0, 1}^{N×N} denote the feature matrix and the adjacency matrix respectively.
Our objective is to learn a GNN encoder f(X, A) ∈ ℝ^{N×F′} that produces node embeddings in low dimensionality.
Loss function: for each positive pair (u_i, v_i),

\ell(u_i, v_i) = \log \frac{e^{\theta(u_i, v_i)/\tau}}{\underbrace{e^{\theta(u_i, v_i)/\tau}}_{\text{positive pair}} + \underbrace{\sum_{k \neq i} e^{\theta(u_i, v_k)/\tau}}_{\text{inter-view negative pairs}} + \underbrace{\sum_{k \neq i} e^{\theta(u_i, u_k)/\tau}}_{\text{intra-view negative pairs}}}
where τ is a temperature parameter and θ(u, v) = s(g(u), g(v)), where s(·, ·) is the cosine similarity and g(·) is a nonlinear projection (implemented with a two-layer perceptron).
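The per-pair objective above can be sketched in NumPy. This is a minimal sketch, assuming the projection g(·) has already been applied to the inputs; the function names are hypothetical, and the loss is written in its negated, to-be-minimized form:

```python
import numpy as np

def cosine_sim(a, b):
    # Row-wise pairwise cosine similarity: entry (i, k) is s(a_i, b_k).
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def contrastive_loss(u, v, tau=0.5):
    """Negated per-node objective, averaged over anchors u_i.

    u, v: (N, d) embeddings of the same N nodes under two views,
    assumed to be outputs of the projection head g(.)."""
    sim_uv = np.exp(cosine_sim(u, v) / tau)  # exp(theta(u_i, v_k) / tau)
    sim_uu = np.exp(cosine_sim(u, u) / tau)  # exp(theta(u_i, u_k) / tau)
    pos = np.diag(sim_uv)                             # positive pairs (u_i, v_i)
    inter_neg = sim_uv.sum(axis=1) - pos              # (u_i, v_k), k != i
    intra_neg = sim_uu.sum(axis=1) - np.diag(sim_uu)  # (u_i, u_k), k != i
    return float(np.mean(-np.log(pos / (pos + inter_neg + intra_neg))))
```

When the two views embed each node near its counterpart, the positive term dominates the denominator and the loss is small; embeddings unrelated across views give a large loss.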
Adaptive Graph Augmentation:
Topology-level augmentation: each edge (u, v) is assigned an importance weight w^e_{uv} computed from node centrality, and unimportant edges are removed with higher probability:

s^e_{uv} = \log w^e_{uv}

p^e_{uv} = \min\left( \frac{s^e_{\max} - s^e_{uv}}{s^e_{\max} - \mu^e_s} \cdot p_e,\; p_\tau \right)

where s^e_{\max} and \mu^e_s are the maximum and the mean of the s^e_{uv}, p_e controls the overall removal probability, and p_\tau is a cut-off.
We can use degree centrality, eigenvector centrality, or PageRank centrality as the node centrality measure ϕ_c(·).
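The edge-removal probabilities can be sketched as follows, using degree centrality for ϕ_c and averaging the two endpoint centralities as the edge weight (a minimal sketch; the function name and the averaging choice are assumptions, and eigenvector or PageRank centrality could be plugged in instead):

```python
import numpy as np

def edge_drop_probs(edges, adj, p_e=0.3, p_tau=0.7):
    """Removal probability per edge from degree centrality."""
    deg = adj.sum(axis=1)  # degree centrality phi_c(u)
    # Edge importance w^e_{uv}: mean centrality of the two endpoints.
    w = np.array([(deg[u] + deg[v]) / 2.0 for u, v in edges])
    s = np.log(w)
    denom = max(s.max() - s.mean(), 1e-12)  # guard for regular graphs
    p = (s.max() - s) / denom * p_e
    return np.minimum(p, p_tau)  # truncate at the cut-off p_tau
```

On a path graph, the central edges receive the lowest removal probability, so the augmentation preserves the important structure while perturbing the periphery.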
Node attribute level augmentation: We add noise to node attributes by randomly masking a fraction of dimensions of the node features with zeros, masking the i-th dimension with probability p^f_i. The probability p^f_i should reflect the importance of the i-th dimension of the node features.
For each feature dimension i we calculate a weight

w^f_i = \sum_{u \in V} |x_{ui}| \cdot \phi_c(u)

and compute the masking probability as

s^f_i = \log w^f_i

p^f_i = \min\left( \frac{s^f_{\max} - s^f_i}{s^f_{\max} - \mu^f_s} \cdot p_f,\; p_\tau \right)
Canonical Correlation Analysis based Contrastive Learning:
Classical CCA learns linear transformations of two views such that the correlation between the transformed views is maximized.
Replacing the linear transformations with neural networks gives deep CCA. Concretely, assume x_1, x_2 are two views of an input datum; the objective is

\max_{\theta_1, \theta_2} \operatorname{Tr}\left( P_{\theta_1}(x_1)^\top P_{\theta_2}(x_2) \right) \quad \text{s.t.} \quad P_{\theta_1}(x_1)^\top P_{\theta_1}(x_1) = P_{\theta_2}(x_2)^\top P_{\theta_2}(x_2) = I

where P_{\theta_1} and P_{\theta_2} are two feedforward neural networks and I is an identity matrix.
Still, such computation is expensive, and soft CCA removes the hard decorrelation constraint by adopting the following Lagrangian relaxation:

\min_{\theta_1, \theta_2} \; \mathcal{L}_{\text{dist}}\left(P_{\theta_1}(x_1), P_{\theta_2}(x_2)\right) + \lambda \left( \mathcal{L}_{\text{SDL}}\left(P_{\theta_1}(x_1)\right) + \mathcal{L}_{\text{SDL}}\left(P_{\theta_2}(x_2)\right) \right)

where \mathcal{L}_{\text{dist}} measures the distance between the representations of the two views and \mathcal{L}_{\text{SDL}} is the stochastic decorrelation loss.
\mathcal{L} = \underbrace{\|\tilde{Z}_A - \tilde{Z}_B\|_F^2}_{\text{invariance term}} + \lambda \underbrace{\left( \|\tilde{Z}_A^\top \tilde{Z}_A - I\|_F^2 + \|\tilde{Z}_B^\top \tilde{Z}_B - I\|_F^2 \right)}_{\text{decorrelation term}}
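The loss above can be sketched in NumPy. This is a minimal sketch under the assumption that Z̃ denotes the column-standardized embeddings scaled by 1/√N (so that Z̃ᵀZ̃ is the empirical correlation matrix of the embedding dimensions); the function name is hypothetical:

```python
import numpy as np

def cca_ssg_loss(za, zb, lam=1e-3):
    """Invariance + decorrelation loss on two views' node embeddings."""
    n, d = za.shape
    # Standardize each dimension and scale by 1/sqrt(N) to obtain Z-tilde.
    za = (za - za.mean(0)) / za.std(0) / np.sqrt(n)
    zb = (zb - zb.mean(0)) / zb.std(0) / np.sqrt(n)
    invariance = ((za - zb) ** 2).sum()          # ||Za - Zb||_F^2
    eye = np.eye(d)
    decorrelation = (((za.T @ za - eye) ** 2).sum()
                     + ((zb.T @ zb - eye) ** 2).sum())
    return float(invariance + lam * decorrelation)
```

Identical views zero out the invariance term, leaving only the (λ-weighted) pressure to decorrelate the embedding dimensions; no negative pairs are needed, which is the practical appeal of this formulation.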
Geometric Scattering Transform for Graph Data
Graph Wavelets
The n × n lazy random walk matrix is P = (1/2)(I + AD⁻¹).
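Constructing P is a one-liner in NumPy (a minimal sketch, assuming a symmetric adjacency matrix with no isolated nodes; the function name is hypothetical):

```python
import numpy as np

def lazy_random_walk(adj):
    """P = (1/2)(I + A D^{-1}): column-stochastic, and every node keeps
    probability 1/2 of staying in place (the "lazy" step)."""
    d = adj.sum(axis=0)                 # node degrees (column sums)
    n = adj.shape[0]
    return 0.5 * (np.eye(n) + adj / d)  # adj / d scales column j by 1/d_j
```

Each column of P sums to 1 and the diagonal is exactly 1/2, which makes powers of P well-behaved diffusion operators for building the graph wavelets used in scattering.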
Geometric Scattering on Graphs:
Thank You