Kolmogorov-Arnold Networks (KANs) Are Being Used To Boost Graph Deep Learning Like Never Before

A deep dive into how Graph Kolmogorov-Arnold Networks (GKANs) are improving Graph Deep Learning to
surpass traditional approaches

Dr. Ashish Bamania

Published in Level Up Coding
8 min read · 2 days ago

Image generated with DALL-E 3

KANs have gained a lot of attention since they were published in April 2024.

They are being used to solve several machine-learning problems that previously used Multi-layer Perceptrons
(MLPs), and their results have been impressive.

A team of researchers recently used KANs on Graph-structured data.

They called this new neural network architecture Graph Kolmogorov-Arnold Networks (GKANs).

And, how did it go — you’d ask?

They found that GKANs achieve higher accuracy in semi-supervised learning tasks on a real-world graph
dataset (Cora) than the traditional models used for Graph Deep Learning, i.e. Graph Convolutional Networks
(GCNs).

This is a big step for KANs!

Here is a story where we dive deep into GKANs, learn how they are used with graph-structured data, and discuss
how they surpass traditional approaches in Graph Deep Learning.

But First, What Is Graph Deep Learning?

Graphs are mathematical structures that consist of nodes (or vertices) and edges (or links) connecting these
nodes.

Graph visualised (Image from author’s upcoming book ‘Computer Science In 100 Images’)

Some examples of real-world data that is structured in the form of graphs include:

Social network connections (Users as nodes and relationships as edges)

Recommendation systems (Items as nodes and user interactions as edges)

Chemical Molecules and compounds (Atoms as nodes and bonds as edges)

Biological molecules such as Proteins (Amino acids as nodes and bonds as edges)

Transportation networks like Roadways (Intersections as nodes and pathways as edges)

Graph Deep Learning is a set of methods developed to learn from such graph-structured data and solve
problems based on this learning.

Some of these Graph Learning problems involve:

Graph classification (labelling a graph according to its properties)

Node classification (predicting the label of a new node)

Link prediction (predicting the existence of relations/ edges between nodes)

Graph generation (creating new graphs based on existing ones)

Community detection (identifying clusters of densely connected nodes within a graph)

Graph Embedding generation (generating low dimensional representations of higher dimensional graphs)

Graph clustering (grouping similar graph nodes together)

Graph anomaly detection (figuring out abnormal nodes or edges that do not match the expected pattern in a
graph)

Traditionally, these problems have been solved using Graph Neural Networks (GNNs) and their variants (notably
Graph Convolutional Networks), which use MLPs at their core.

Graph Neural Networks visualised (Image from author’s book ‘AI In 100 Images’)

Let’s explore Graph Convolutional Networks (GCNs) in a bit more detail.

What Are Graph Convolutional Networks?

A Graph Convolutional Network (GCN) combines a graph’s node features with its topology (or how the nodes are
connected in space). This allows it to effectively capture the dependencies and relationships in the graph.

In other words, GCNs are based on the assumption that node labels y are mathematically dependent on both
the node features X and the graph’s structure (i.e. its adjacency matrix A ).

This can be mathematically expressed as:

y = f (X, A)
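
To make the two inputs concrete, here is a minimal sketch of how X and A might be built for a toy graph. The helper name `adjacency_from_edges`, the 4-node cycle, and the one-hot placeholder features are illustrative assumptions, not something taken from the paper.

```python
import numpy as np

def adjacency_from_edges(num_nodes, edges):
    """Build a dense, symmetric adjacency matrix A from an undirected edge list."""
    A = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        A[i, j] = 1.0
        A[j, i] = 1.0  # undirected graph: mirror every edge
    return A

# Toy 4-node cycle: 0 - 1 - 2 - 3 - 0 (hypothetical example graph)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
A = adjacency_from_edges(4, edges)

# Placeholder node features X: one row per node (here simply one-hot vectors)
X = np.eye(4)
```

A model f(X, A) then receives one feature row per node together with the matrix that says which nodes are connected.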

https://levelup.gitconnected.com/kolmogorov-arnold-networks-kans-are-being-used-to-boost-graph-deep-learning-like-never-before-2d39fec7dfc3 4/19
A multi-layer GCN updates the node representations by aggregating the information from neighbouring nodes
using a layer-wise propagation rule:

Layer-wise propagation rule employed by a Graph Convolutional Network (Image from original research paper)

H(l+1) = σ( D~^(-1/2) Ã D~^(-1/2) H(l) W(l) )

where:

à = A + I is the Augmented adjacency matrix or the graph's adjacency matrix with added self-connections
for each node. It is the sum of the graph’s adjacency matrix with its identity matrix I.

D~ is the diagonal degree matrix of Ã, where the i-th diagonal element is the degree of node i in the
augmented graph

D~^(-1/2) Ã D~^(-1/2) is used to symmetrically normalize Ã, to make sure that each node’s influence is
appropriately scaled by its degree. This normalized adjacency matrix is usually represented with Â.

H(l) is the matrix of node features at layer l

H(0) represents the initial node features (or X)

W(l) is the trainable weight matrix at layer l

σ represents an activation function (e.g. ReLU)
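
The terms above can be sketched in a few lines of NumPy. This is a minimal illustration that assumes a dense adjacency matrix and ReLU as σ; the function names and the toy path graph are hypothetical, and real implementations typically use sparse matrices.

```python
import numpy as np

def normalized_adjacency(A):
    """Return A_hat = D~^(-1/2) (A + I) D~^(-1/2) for a dense adjacency matrix A."""
    A_tilde = A + np.eye(A.shape[0])      # add self-connections
    deg = A_tilde.sum(axis=1)             # node degrees in the augmented graph (always >= 1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Scale row i by 1/sqrt(deg_i) and column j by 1/sqrt(deg_j)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_hat, H, W):
    """One propagation step: ReLU(A_hat @ H @ W)."""
    return np.maximum(A_hat @ H @ W, 0.0)

# Toy path graph 0 - 1 - 2 (hypothetical)
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
A_hat = normalized_adjacency(A)

rng = np.random.default_rng(0)
H0 = rng.normal(size=(3, 4))   # 3 nodes, 4 input features
W0 = rng.normal(size=(4, 2))   # maps 4 features to 2
H1 = gcn_layer(A_hat, H0, W0)  # updated node representations, shape (3, 2)
```

Each row of H1 mixes a node's own features with those of its neighbours, weighted by the normalized adjacency matrix.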

A simple two-layer GCN’s forward propagation (used for node classification) can be expressed as follows:

Forward propagation in a 2-layer GCN for graph node classification (Image from original research paper)

Z = f(X, A) = softmax( Â ReLU( Â X W(0) ) W(1) )

where:

 is the normalized adjacency matrix

X is the input feature matrix

W(0) and W(1) are the weight matrices for the first and second layers, respectively. These weights are
optimized using Gradient Descent.
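
As a rough sketch, the two-layer forward pass can be written as follows. It assumes ReLU in the hidden layer and a row-wise softmax on the output, with randomly initialised weights standing in for trained ones; the graph and all sizes are made up for illustration.

```python
import numpy as np

def normalized_adjacency(A):
    """A_hat = D~^(-1/2) (A + I) D~^(-1/2)."""
    A_tilde = A + np.eye(len(A))
    d = A_tilde.sum(axis=1) ** -0.5
    return A_tilde * d[:, None] * d[None, :]

def softmax(Z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def gcn_two_layer_forward(A_hat, X, W0, W1):
    """Z = softmax(A_hat @ ReLU(A_hat @ X @ W0) @ W1)."""
    H = np.maximum(A_hat @ X @ W0, 0.0)   # hidden node representations
    return softmax(A_hat @ H @ W1)        # per-node class probabilities

# Hypothetical toy graph: 4 nodes, 5 features, 3 hidden units, 2 classes
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
A_hat = normalized_adjacency(A)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))
W0 = rng.normal(size=(5, 3))   # untrained weights, for shape illustration only
W1 = rng.normal(size=(3, 2))
Z = gcn_two_layer_forward(A_hat, X, W0, W1)
```

Each row of Z is a probability distribution over the classes for one node. Training would adjust W0 and W1 to minimise the classification loss on the labelled nodes.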

Overview of a two-layer GCN (Image from original research paper)

Now that we know about GCNs, let’s move on to learning about KANs.

Next, What Are KANs?

A Kolmogorov-Arnold Network (KAN) is a novel neural network architecture based on the
Kolmogorov-Arnold representation theorem.

KANs are a promising alternative to the currently popular MLPs, which are based on the Universal Approximation
Theorem.

The core idea behind KANs is to use learnable univariate activation functions (shaped as a B-Spline) on the
edges and simple summations on the nodes of a neural network.

This contrasts with MLPs that use learnable weights on the edges while having a fixed activation function on
the neural network nodes.
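
To illustrate the idea, here is a deliberately simplified KAN layer. It assumes degree-1 B-splines (piecewise-linear "hat" functions) on a fixed uniform grid and leaves out the residual base function and grid updates described in the KAN paper; the class name `ToyKANLayer` and its parameters are hypothetical.

```python
import numpy as np

def hat_basis(x, grid):
    """Degree-1 B-spline (hat) basis on a uniform grid.

    x has shape (...,); the result has shape (..., G) with G = len(grid).
    Inputs are assumed to lie inside [grid[0], grid[-1]]; outside it all bases are zero.
    """
    step = grid[1] - grid[0]
    return np.clip(1.0 - np.abs(x[..., None] - grid) / step, 0.0, None)

class ToyKANLayer:
    """Every edge i -> j carries its own learnable univariate function phi_ij."""

    def __init__(self, in_dim, out_dim, grid_points=8, x_min=-1.0, x_max=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.grid = np.linspace(x_min, x_max, grid_points)
        # coef[i, j, m] is the weight of basis function m in phi_ij (the learnable part)
        self.coef = rng.normal(scale=0.1, size=(in_dim, out_dim, grid_points))

    def __call__(self, X):
        B = hat_basis(X, self.grid)  # shape (N, in_dim, G)
        # phi_ij(x_i) = sum_m coef[i, j, m] * B_m(x_i); node j simply sums over incoming edges i
        return np.einsum("nim,ijm->nj", B, self.coef)

layer = ToyKANLayer(in_dim=3, out_dim=2)
X = np.array([[0.1, -0.5, 0.9],
              [0.0, 0.25, -1.0]])
out = layer(X)  # shape (2, 2)
```

There is no weight matrix and no fixed activation here: the spline coefficients on each edge are the only parameters, and the nodes do nothing but add.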

A comparison between MLPs and KANs (Image from the research paper titled ‘KAN: Kolmogorov–Arnold Networks’
published in ArXiv)

When compared with MLPs, KANs:

Lead to smaller computational graphs

Are more parameter-efficient and accurate

Converge faster and achieve lower losses

Have steeper scaling laws

Are highly interpretable

On the downside, given the same number of parameters, they take longer to train than MLPs.

The Birth Of GKANs

Considering the advantages that KANs offer, researchers devised a novel hybrid architecture called Graph
Kolmogorov-Arnold Networks (GKANs) that extended the use of KANs on graph-structured data.

They aimed to find out if GKANs could effectively learn from both labelled and unlabeled data in a semi-
supervised setting and outperform traditional graph learning methods.

The team developed two GKAN architectures, which are described below.

GKAN Architecture 1: Activations After Summation

In this architecture, the learnable univariate activation functions are applied to the aggregated node features
after the summation step.

In other words, the node embeddings are first aggregated using the normalized adjacency matrix, and then they
are passed through the KAN layer.

The layer-wise propagation rule for GKAN Architecture 1 is shown below.

Layer-wise propagation rule for GKAN Architecture 1 (Image created by author)

H(l+1) = KANLayer( Â H(l) )

where:

H(l) and H(l+1) represent the node feature matrices at layers l and l+1, respectively

 is the normalized adjacency matrix

the KANLayer operation applies learnable univariate activation functions or B-Splines to the aggregated node
features ÂH(l)
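
A minimal sketch of this "aggregate first, then KAN" ordering is shown below. The toy KAN layer is a simplified stand-in (degree-1 splines on a fixed grid, inputs clipped to the grid range) and is an assumption for illustration, not the authors' implementation; all names are hypothetical.

```python
import numpy as np

def normalized_adjacency(A):
    """A_hat = D~^(-1/2) (A + I) D~^(-1/2)."""
    A_tilde = A + np.eye(len(A))
    d = A_tilde.sum(axis=1) ** -0.5
    return A_tilde * d[:, None] * d[None, :]

def make_toy_kan_layer(in_dim, out_dim, grid_points=8, lo=-2.0, hi=2.0, seed=0):
    """Stand-in KAN layer: one piecewise-linear learnable function per edge."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(lo, hi, grid_points)
    step = grid[1] - grid[0]
    coef = rng.normal(scale=0.5, size=(in_dim, out_dim, grid_points))

    def layer(X):
        X = np.clip(X, lo, hi)  # simplification: keep inputs inside the spline grid
        B = np.clip(1.0 - np.abs(X[..., None] - grid) / step, 0.0, None)
        return np.einsum("nim,ijm->nj", B, coef)

    return layer

def gkan_arch1_layer(A_hat, H, kan_layer):
    """Architecture 1: aggregate neighbours first, then apply the KAN layer."""
    return kan_layer(A_hat @ H)

# Toy path graph 0 - 1 - 2 with 4 features per node (hypothetical)
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
A_hat = normalized_adjacency(A)
H0 = np.random.default_rng(1).uniform(-1.0, 1.0, size=(3, 4))
kan_layer = make_toy_kan_layer(in_dim=4, out_dim=2)
H1 = gkan_arch1_layer(A_hat, H0, kan_layer)  # shape (3, 2)
```

Here the learnable activations see features that have already been mixed across neighbouring nodes.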

The forward propagation model for the architecture is expressed as:

Forward propagation model for GKAN Architecture 1 with L layers (Image created by author)

Overview of a two-layer GKAN Architecture 1 (Image from original research paper)

GKAN Architecture 2: Activations Before Summation

In this architecture, the learnable univariate activation functions are applied to the node features
before the summation (aggregation) step.

In other words, the node features are first passed through the KAN layer and then aggregated using the
normalized adjacency matrix.

The layer-wise propagation rule for GKAN Architecture 2 is shown below.

Layer-wise propagation rule for GKAN Architecture 2 (Image created by author)

H(l+1) = Â KANLayer( H(l) )

where:

H(l) and H(l+1) represent the node feature matrices at layers l and l+1, respectively

 is the normalized adjacency matrix

the KANLayer operation applies learnable univariate activation functions or B-Splines to each element of the
input node features H(l)
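
The "KAN first, then aggregate" ordering can be sketched in the same spirit. Again, the toy KAN layer below is a simplified, hypothetical stand-in (degree-1 splines, fixed grid, clipped inputs) rather than the authors' code.

```python
import numpy as np

def normalized_adjacency(A):
    """A_hat = D~^(-1/2) (A + I) D~^(-1/2)."""
    A_tilde = A + np.eye(len(A))
    d = A_tilde.sum(axis=1) ** -0.5
    return A_tilde * d[:, None] * d[None, :]

def make_toy_kan_layer(in_dim, out_dim, grid_points=8, lo=-2.0, hi=2.0, seed=0):
    """Stand-in KAN layer: one piecewise-linear learnable function per edge."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(lo, hi, grid_points)
    step = grid[1] - grid[0]
    coef = rng.normal(scale=0.5, size=(in_dim, out_dim, grid_points))

    def layer(X):
        X = np.clip(X, lo, hi)  # simplification: keep inputs inside the spline grid
        B = np.clip(1.0 - np.abs(X[..., None] - grid) / step, 0.0, None)
        return np.einsum("nim,ijm->nj", B, coef)

    return layer

def gkan_arch2_layer(A_hat, H, kan_layer):
    """Architecture 2: apply the KAN layer to each node's features, then aggregate."""
    return A_hat @ kan_layer(H)

# Same hypothetical toy setup: path graph 0 - 1 - 2, 4 features per node
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
A_hat = normalized_adjacency(A)
H0 = np.random.default_rng(1).uniform(-1.0, 1.0, size=(3, 4))
kan_layer = make_toy_kan_layer(in_dim=4, out_dim=2)
H1 = gkan_arch2_layer(A_hat, H0, kan_layer)  # shape (3, 2)
```

Because the KAN layer is nonlinear, applying it before or after the aggregation generally gives different node representations, which is why the two architectures are evaluated separately.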

The forward propagation model for the architecture is expressed as:

Forward propagation model for GKAN Architecture 2 with L layers (Image created by author)

Overview of a two-layer GKAN Architecture 2 (Image from original research paper)

The Performance Of GKANs

Both GKAN architectures were first trained on the Cora dataset.

The Cora dataset is a citation network that consists of documents as nodes and the citation links between these
documents as edges.

The documents fall into 7 different classes, and each document is described by 1433 features.

Next, their performance was compared to that of a conventional GCN with a comparable number of
parameters, over both train and test data, using only the first 100 and then the first 200 of the 1433
features available in the dataset.

And the results were quite incredible!

GKANs achieved higher accuracy than GCNs for both 100 and 200 feature sets.

On the first 100 features of the dataset, both GKAN architectures achieved higher accuracy than the GCN.

Notably, the GKAN Architecture 2 achieved 61.76% accuracy compared to 53.5% for GCN.

Performance of different architectures on the first 100 features of the Cora dataset, where k is the polynomial degree
in the spline functions, g is the spline grid size, and h is the size of hidden layers. (Image from original research paper)

Similarly, both GKAN architectures achieved higher accuracy than the GCN on the first 200 features of the
dataset, with the GKAN Architecture 2 achieving 67.66% accuracy compared to 61.24% for GCN.

Performance of different architectures on the first 200 features of the Cora dataset (Image from original research
paper)

The training and test accuracy plots below showed that GKANs achieved higher accuracy during both the
training and testing phases.

Training and Test Accuracy plots for different architectures (Image from original research paper)

It was also noted that GKAN architectures showed a sharper decrease in loss values during training and required
fewer epochs to be trained.

Training and Test Loss for different architectures (Image from original research paper)

Influence Of Parameters on GKANs

Researchers also evaluated how different parameters impacted the performance of GKANs.

These parameters were:

k : the degree of the polynomial in the spline functions

g: the grid size for the spline functions

h: the size of hidden layers in the network

It was found that the following led to the most effective GKANs.

Lower polynomial degrees ( k = 1 out of {1, 2, 3} )

Intermediate grid sizes ( g = 7 out of {3, 7, 11} )

Moderate hidden layer sizes ( h = 12 out of {8, 12, 16} )
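
For readers who want to run a similar study, here is a small sketch of how the candidate values listed above could be enumerated. It assumes an exhaustive grid over all combinations, which may differ from how the researchers actually varied the parameters, and `evaluate` is a hypothetical callback that would train a GKAN and return its test accuracy. The dummy score in the usage lines is arbitrary and is not a result from the paper.

```python
from itertools import product

K_VALUES = (1, 2, 3)     # spline polynomial degree
G_VALUES = (3, 7, 11)    # spline grid size
H_VALUES = (8, 12, 16)   # hidden layer size

def all_configs():
    """List every (k, g, h) combination as a dictionary."""
    return [{"k": k, "g": g, "h": h}
            for k, g, h in product(K_VALUES, G_VALUES, H_VALUES)]

def grid_search(evaluate):
    """Return (best_config, best_score); `evaluate` is a user-supplied scoring function."""
    best_config, best_score = None, float("-inf")
    for config in all_configs():
        score = evaluate(**config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

# Usage with an arbitrary dummy score (NOT real accuracies), just to show the call shape
best_config, best_score = grid_search(lambda k, g, h: -(k + g + h))
```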

Training Time

Although GKANs showed high accuracy and better efficiency with faster convergence, researchers noted that
their training process was relatively slow, requiring future optimizations.

KANs have opened up a new avenue for improved graph learning and could also be a promising alternative
for other graph learning approaches (including Graph Autoencoders, Graph Transformers, and more)
that use MLPs at their core.

What are your thoughts on them? Have you used KANs in your projects yet? Let me know in the comments below!

Further Reading

Research paper titled ‘GKAN: Graph Kolmogorov-Arnold Networks’ on ArXiv

Software implementation of GKANs on GitHub (yet to be publicly released by the research team)

Author’s story titled ‘Kolmogorov-Arnold Networks (KANs) Might Change AI As We Know It, Forever’, that explains
KANs in detail

GitHub repository featuring a curated list of projects using Kolmogorov-Arnold Networks (KANs)

29/06/2024, 12:51 Kolmogorov-Arnold Networks (KANs) Are Being Used To Boost Graph Deep Learning Like Never Before | by Dr. Ashish Bama…


