
Kolmogorov-Arnold Networks (KANs) Are Being Used To Boost Graph Deep Learning Like Never Before

A deep dive into how Graph Kolmogorov-Arnold Networks (GKANs) are improving Graph Deep Learning to surpass traditional approaches

Dr. Ashish Bamania · Published in Level Up Coding · 8 min read · 2 days ago

Image generated with DALL-E 3

KANs have gained a lot of attention since they were first introduced in April 2024.

They are being used to solve several machine-learning problems that previously used Multi-layer Perceptrons
(MLPs), and their results have been impressive.

A team of researchers recently applied KANs to graph-structured data.

They called this new neural network architecture Graph Kolmogorov-Arnold Networks (GKANs).

And how did it go, you ask?


They found that GKANs achieve higher accuracy in semi-supervised learning tasks on a real-world graph
dataset (Cora) than the traditional ML models used for Graph Deep Learning, i.e. Graph Convolutional Networks
(GCNs).

This is a big step for KANs!

Here is a story where we dive deep into GKANs, learn how they are used with graph-structured data, and discuss
how they surpass traditional approaches in Graph Deep Learning.

But First, What Is Graph Deep Learning?


Graphs are mathematical structures that consist of nodes (or vertices) and edges (or links) connecting these
nodes.

Graph visualised (Image from author’s upcoming book ‘Computer Science In 100 Images’)


Some examples of real-world data that is structured in the form of graphs include:

Social network connections (Users as nodes and relationships as edges)

Recommendation systems (Items as nodes and user interactions as edges)

Chemical Molecules and compounds (Atoms as nodes and bonds as edges)

Biological molecules such as Proteins (Amino acids as nodes and bonds as edges)

Transportation networks like Roadways (Intersections as nodes and pathways as edges)

Graph Deep Learning is a set of methods developed to learn from such graph-structured data and solve
problems based on this learning.

Some of these Graph Learning problems involve:

Graph classification (labelling a graph according to its properties)

Node classification (predicting the label of a new node)

Link prediction (predicting the existence of relations/ edges between nodes)

Graph generation (creating new graphs based on existing ones)

Community detection (identifying clusters of densely connected nodes within a graph)

Graph Embedding generation (generating low dimensional representations of higher dimensional graphs)

Graph clustering (grouping similar graph nodes together)

Graph anomaly detection (figuring out abnormal nodes or edges that do not match the expected pattern in a
graph)

Traditionally, these problems have been solved using Graph Neural Networks (GNNs) and their variants (notably
Graph Convolutional Networks), which use MLPs at their core.


Graph Neural Networks visualised (Image from author’s book ‘AI In 100 Images’)

Let’s explore Graph Convolutional Networks (GCNs) in a bit more detail.

What Are Graph Convolutional Networks?


A Graph Convolutional Network (GCN) combines a graph’s node features with its topology (or how the nodes are
connected in space). This allows it to effectively capture the dependencies and relationships in the graph.

In other words, GCNs are based on the assumption that node labels y are mathematically dependent on both
the node features X and the graph’s structure (i.e. its adjacency matrix A ).

This can be mathematically expressed as:

y = f(X, A)


A multi-layer GCN updates the node representations by aggregating the information from neighbouring nodes
using a layer-wise propagation rule:

Layer-wise propagation rule employed by a Graph Convolutional Network (Image from original research paper)
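Since the image may not be visible here, this rule in the standard GCN formulation (which matches the definitions below) can be written as:

```latex
H^{(l+1)} = \sigma\left( \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)} \right)
```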

where:

à = A + I is the Augmented adjacency matrix or the graph's adjacency matrix with added self-connections
for each node. It is the sum of the graph’s adjacency matrix with its identity matrix I.

D̃ is the diagonal degree matrix of Ã, where each diagonal element is the degree of node i in the augmented graph

D̃^(-1/2) Ã D̃^(-1/2) symmetrically normalizes Ã so that each node's influence is appropriately scaled by its degree. This normalized adjacency matrix is usually denoted Â.

H(l) is the matrix of node features at layer l

H(0) represents the initial node features (or X)

W(l) is the trainable weight matrix at layer l

σ represents an activation function (e.g. ReLU)

A simple two-layer GCN’s forward propagation (used for node classification) can be expressed as follows:

Forward propagation in a 2-layer GCN for graph node classification (Image from original research paper)
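Written out, this two-layer forward pass (with the normalized adjacency matrix Â) is:

```latex
Z = \mathrm{softmax}\left( \hat{A}\, \mathrm{ReLU}\!\left( \hat{A} X W^{(0)} \right) W^{(1)} \right)
```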

where:

 is the normalized adjacency matrix

X is the input feature matrix

W(0) and W(1) are the weight matrices for the first and second layers, respectively. These weights are
optimized using Gradient Descent.
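As a minimal sketch (not the paper's code, and with illustrative names), here is how the symmetric normalization of Ã and this two-layer forward pass could look in PyTorch:

```python
import torch
import torch.nn.functional as F

def normalize_adjacency(A: torch.Tensor) -> torch.Tensor:
    """Compute A_hat = D̃^(-1/2) (A + I) D̃^(-1/2): add self-loops, then
    symmetrically normalize by node degree."""
    A_tilde = A + torch.eye(A.size(0))
    deg_inv_sqrt = A_tilde.sum(dim=1).pow(-0.5)
    D_inv_sqrt = torch.diag(deg_inv_sqrt)
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def gcn_forward(A_hat, X, W0, W1):
    """Two-layer GCN: Z = softmax(A_hat · ReLU(A_hat · X · W0) · W1)."""
    H1 = F.relu(A_hat @ X @ W0)               # first graph convolution + ReLU
    return F.softmax(A_hat @ H1 @ W1, dim=1)  # second convolution + class probabilities
```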


Overview of a two-layer GCN (Image from original research paper)

Now that we know about GCNs, let's move on to learning about KANs.

Next, What Are KANs?


A Kolmogorov-Arnold Network (KAN) is a novel neural network architecture based on the Kolmogorov-Arnold representation theorem.

KANs are a promising alternative to the currently popular MLPs, which are based on the Universal Approximation Theorem.

The core idea behind KANs is to use learnable univariate activation functions (shaped as a B-Spline) on the
edges and simple summations on the nodes of a neural network.

This contrasts with MLPs that use learnable weights on the edges while having a fixed activation function on
the neural network nodes.
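For context, the Kolmogorov-Arnold representation theorem states that any multivariate continuous function on a bounded domain can be written as a finite composition of univariate functions and addition:

```latex
f(x_1, \ldots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```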

A comparison between MLPs and KANs (Image from the research paper titled ‘KAN: Kolmogorov–Arnold Networks’ published on arXiv)

When compared with MLPs, KANs:

Lead to smaller computational graphs

Are more parameter-efficient and accurate

Converge faster and achieve lower losses

Have steeper scaling laws

Are highly interpretable

On the other hand, given the same number of parameters, they take longer to train than MLPs.

The Birth Of GKANs


Considering the advantages that KANs offer, researchers devised a novel hybrid architecture called Graph Kolmogorov-Arnold Networks (GKANs) that extends the use of KANs to graph-structured data.

They aimed to find out whether GKANs could effectively learn from both labelled and unlabelled data in a semi-supervised setting and outperform traditional graph learning methods.

The team developed two GKAN architectures, which are described below.

GKAN Architecture 1: Activations After Summation


In this architecture, the learnable univariate activation functions are applied to the aggregated node features
after the summation step.

In other words, the node embeddings are first aggregated using the normalized adjacency matrix, and then they
are passed through the KAN layer.

The layer-wise propagation rule for GKAN Architecture 1 is shown below.

Layer-wise propagation rule for GKAN Architecture 1 (Image created by author)
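Based on the description above, this rule can be written as (aggregation with Â first, then the KAN layer):

```latex
H^{(l+1)} = \mathrm{KANLayer}\!\left( \hat{A}\, H^{(l)} \right)
```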

where:

H(l) and H(l+1) represent the node feature matrices at layers l and l+1, respectively

 is the normalized adjacency matrix

the KANLayer operation applies learnable univariate activation functions (B-splines) to the aggregated node features ÂH(l)


The forward propagation model for the architecture is expressed as:

Forward propagation model for GKAN Architecture 1 with L layers (Image created by author)

Overview of a two-layer GKAN Architecture 1 (Image from original research paper)

GKAN Architecture 2: Activations Before Summation


In this architecture, the learnable univariate activation functions are applied to the node features before the aggregation (summation) step.

In other words, the node features are first passed through the KAN layer and then aggregated using the normalized adjacency matrix.

The layer-wise propagation rule for GKAN Architecture 2 is shown below.

Layer-wise propagation rule for GKAN Architecture 2 (Image created by author)
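Based on the description above, this rule can be written as (KAN layer first, then aggregation with Â):

```latex
H^{(l+1)} = \hat{A}\; \mathrm{KANLayer}\!\left( H^{(l)} \right)
```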

where:

H(l) and H(l+1) represent the node feature matrices at layers l and l+1, respectively

 is the normalized adjacency matrix

the KANLayer operation applies learnable univariate activation functions (B-splines) to each element of the input node features H(l)


The forward propagation model for the architecture is expressed as:

Forward propagation model for GKAN Architecture 2 with L layers (Image created by author)

Overview of a two-layer GKAN Architecture 2 (Image from original research paper)
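To make the difference between the two architectures concrete, here is a minimal PyTorch-style sketch. `KANLayer` is assumed to be any module that applies learnable spline activations to a feature matrix (e.g. from a KAN library); this is not the research team's implementation:

```python
import torch.nn as nn

class GKANArch1(nn.Module):
    """Architecture 1: aggregate neighbours first, then apply the KAN layer."""
    def __init__(self, kan_layers):
        super().__init__()
        self.kan_layers = nn.ModuleList(kan_layers)  # assumed KANLayer modules

    def forward(self, A_hat, X):
        H = X
        for kan in self.kan_layers:
            H = kan(A_hat @ H)   # summation over neighbours, then spline activations
        return H

class GKANArch2(nn.Module):
    """Architecture 2: apply the KAN layer first, then aggregate neighbours."""
    def __init__(self, kan_layers):
        super().__init__()
        self.kan_layers = nn.ModuleList(kan_layers)

    def forward(self, A_hat, X):
        H = X
        for kan in self.kan_layers:
            H = A_hat @ kan(H)   # spline activations, then summation over neighbours
        return H
```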

The Performance Of GKANs


GKANs vs. GCN
Both GKAN architectures were first trained on the Cora dataset.

The Cora dataset is a citation network that consists of documents as nodes and the citation links between these
documents as edges.

The dataset contains 7 document classes, and each document is represented by 1,433 features.

Next, their performance was compared to that of a conventional GCN with a comparable number of parameters, over both the training and test data, using subsets of the first 100 and 200 features out of the 1,433 available in the dataset.
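As a rough sketch of what such a semi-supervised comparison involves (illustrative names, not the researchers' code): every node's features are propagated through the graph, but the loss is computed only on the small labelled subset.

```python
import torch.nn.functional as F

def train_step(model, A_hat, X, y, labelled_mask, optimizer):
    """One training step for semi-supervised node classification (e.g. on Cora)."""
    model.train()
    optimizer.zero_grad()
    logits = model(A_hat, X)                  # predictions for every node in the graph
    loss = F.cross_entropy(logits[labelled_mask], y[labelled_mask])  # labelled nodes only
    loss.backward()
    optimizer.step()
    return loss.item()
```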

And the results were quite incredible!

GKANs achieved higher accuracy than GCNs for both the 100-feature and 200-feature sets.

On the first 100 features of the dataset, both GKAN architectures achieved higher accuracy than the GCN.

Notably, the GKAN Architecture 2 achieved 61.76% accuracy compared to 53.5% for GCN.


Performance of different architectures on the first 100 features of the Cora dataset, where k is the polynomial degree
in the spline functions, g is the spline grid size, and h is the size of hidden layers. (Image from original research paper)

Similarly, both GKAN architectures achieved higher accuracy than the GCN on the first 200 features of the
dataset, with the GKAN Architecture 2 achieving 67.66% accuracy compared to 61.24% for GCN.

Performance of different architectures on the first 200 features of the Cora dataset (Image from original research paper)

The training and test accuracy plots below showed that GKANs achieved higher accuracy during both the
training and testing phases.

Training and Test Accuracy plots for different architectures (Image from original research paper)

It was also noted that the GKAN architectures showed a sharper decrease in loss values during training and required fewer epochs to converge.


Training and Test Loss for different architectures (Image from original research paper)

Influence Of Parameters on GKANs


Researchers also evaluated how different parameters impacted the performance of GKANs.

These parameters were:

k: the degree of the polynomial in the spline functions

g: the grid size for the spline functions

h: the size of hidden layers in the network

It was found that the following settings led to the most effective GKANs:

Lower polynomial degrees (k = 1 out of {1, 2, 3})

Intermediate grid sizes (g = 7 out of {3, 7, 11})

Moderate hidden layer sizes (h = 12 out of {8, 12, 16})

Training Time
Although GKANs achieved higher accuracy and converged in fewer epochs, the researchers noted that their overall training time was relatively long, requiring future optimizations.

KANs have opened up a new avenue for improved graph learning, and they could also be a promising replacement for the MLPs at the core of other graph learning approaches (including Graph Autoencoders, Graph Transformers, and more).

What are your thoughts on them? Have you used KANs in your projects yet? Let me know in the comments below!

Further Reading
Research paper titled ‘GKAN: Graph Kolmogorov-Arnold Networks’ on ArXiv

Software implementation of GKANs on GitHub (yet to be publicly released by the research team)


Author’s story titled ‘Kolmogorov-Arnold Networks (KANs) Might Change AI As We Know It, Forever’, which explains KANs in detail

GitHub repository featuring a curated list of projects using Kolmogorov-Arnold Networks (KANs)

Here are my mailing list links if you’d like to stay connected to my work:

Get an email whenever Dr. Ashish Bamania publishes: bamania-ashish.medium.com

Ashish’s Substack (Sharing Everything That I Have Learned & Have Been Learning About, Unfiltered): ashishbamania.substack.com

Byte Surgery (🚀 A Deep Dive Into The Best Of Software Engineering ⚙️): bytesurgery.substack.com

Subscribe to Dr. Ashish Bamania on Gumroad: bamaniaashish.gumroad.com
