Review Article
A State-of-the-Art Computer Vision Adopting Non-Euclidean
Deep-Learning Models
Sakib H. Chowdhury, Md. Robius Sany, Md. Hafiz Ahamed, Sajal K. Das,
Faisal Rahman Badal, Prangon Das, Zinat Tasneem, Md. Mehedi Hasan,
Md. Robiul Islam, Md. Firoj Ali, Sarafat Hussain Abhi, Md. Manirul Islam,
and Subrata Kumar Sarker
Department of Mechatronics Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
Received 7 November 2022; Revised 17 April 2023; Accepted 18 April 2023; Published 9 May 2023
Copyright © 2023 Sakib H. Chowdhury et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly
cited.
A distance metric known as non-Euclidean distance deviates from the laws of Euclidean geometry, which is the geometry that governs most physical spaces. It is utilized when Euclidean distance is inappropriate, such as when dealing with curved surfaces or spaces with complex topologies. The ability to apply deep-learning techniques to non-Euclidean domains, including graphs, manifolds, and point clouds, is made possible by non-Euclidean deep learning. The use of non-Euclidean deep learning is rapidly expanding to study real-world datasets that are intrinsically non-Euclidean. Over the years, numerous novel techniques have been introduced, each with its benefits and drawbacks. This paper provides a categorized archive of non-Euclidean approaches used in computer vision up to this point. It starts by outlining the context, pertinent information, and the development of the field's history. Modern state-of-the-art methods are described briefly and categorized by application field. It also highlights the models' shortcomings in tables and graphs and shows different real-world applications. Overall, this work contributes to a collective information and performance comparison that will help enhance non-Euclidean deep-learning research and development in the future.
1. Introduction

For decades, machine learning has been enriched in many dimensions in terms of rich data inputs, nobler algorithms, and output optimization. Computer vision's primary goal is to analyze, process, and give meaning to digital images. To achieve this feat, machine-learning (ML) algorithms following different methods are being developed day by day [1]. For text-based models or 1D inputs, K-nearest neighbor (KNN) models are being used with proficiency [2, 3]. Text-based inputs are more straightforward, as only one dimension is enough to assess features. Then came the 2D or image-based algorithms, which are segmented with grids for analysis. Convolutional neural network (CNN) [4], artificial neural network (ANN) [5], and recurrent neural network (RNN) [6] models are established as almost saturated models for image processing [7–9]. Until this point, these were directed with structured parameters and traceable features. While 2D models are tamed, 3D models have been the main focus in the computer vision department for the last decade. Modern GPU-based computers' increasing processing power, the accessibility of large training datasets, and effective stochastic optimization techniques have all made it possible in recent years to design and successfully train complex network frameworks with numerous degrees of freedom. This sparked the field's growth by enabling deep neural networks to significantly improve productivity across a large range of applications, from speech and language processing to image processing and computer vision.

3D models are more potent in terms of extracting analytics, simply containing more information. Analyzing 3D datasets became a necessity with the requirements of analyzing massive information from social media with many
2 International Journal of Intelligent Systems
parameters added with each node [10], observations of physical objects found in nature with archaeological value [11], analysis of protein structures [12], 3D scanning in the medical department [13], and many more. Virtual reality is becoming more embedded in the world with each passing day, but it is still in its infancy, with its usage confined mainly to entertainment. For practical embedding of virtual reality, flawless accuracy must be achieved in analyzing the 3D surroundings [14]. That said, 3D models come with non-Euclidean geometry, which brings a lot of difficulties while measuring features.

As 1D and 2D models are easy to organize and connectivity among nodes is prioritized, the CNN model can easily recognize and analyze the provided input patterns. These inputs are called Euclidean inputs: they feature Euclidean geometry that can be defined with 2D shapes and figures following explainable mathematical rules. The main difference lies in Euclidean geometry being only on the same planes, whereas non-Euclidean geometry is the geometry of 3D spaces with infinite planes. Thus, the conventional 2D geometry used until now is useless in non-Euclidean geometry. That said, non-Euclidean geometry uses complex structures that hold more data. Euclidean geometry has only one plane to store data and, therefore, has less freedom for inputs. Existing models for 2D assessment are being upgraded for 3D analysis, unlocking their full potential to analyze data on multiple planes. Thus, new algorithms are being created for 3D analysis by non-Euclidean deep learning, also known as "geometric deep learning" [15], taking on the challenge of giving structure to unstructured mesh grids, manifolds, and point clouds.

Data having a non-Euclidean spatial structure are of interest to many scientific disciplines. Social networks in the field of computational social sciences, sensor networks in information exchange, functional networks in neuroimaging, regulatory networks in genetics, and meshed surfaces in 3D modeling are just a few examples.

In social networking sites, user characteristics may be replicated using signals on the vertex points of the social graph. Distributed, interconnected sensors make up sensor networks, and their measurements are shown on graphs as time-varying signals. 3D objects are modeled in computer vision and graphics as Riemannian manifolds [16] with attributes like color, texture, and motion fields such as dynamic meshes. Since these data are non-Euclidean, it follows that they lack well-known characteristics like global parametrization, a standard set of coordinates, graph-based structure, and shift invariance. As a result, fundamental operations like linear combination and convolution, which are assumed to be clearly defined in the Euclidean context, are considerably less so in non-Euclidean domains. This is a significant barrier that has prevented the application of effective deep-learning techniques, like convolutional or recurrent neural networks, to non-Euclidean geometric data up to this point. As a result, fields like computer graphics and computational sociology have not yet experienced the practical and theoretical breakthroughs that deep-learning models have given to voice recognition, natural language, and computer vision. So the evolution of this deep learning begins.

Figure 1 describes practices of deep geometric learning that started a while back, beginning with recursive neural networks (RvNN) in 1997, which are used on directed graphs [17]. A breakthrough in machine learning came in 1998 with the convolutional neural network, which was further developed to assist in the creation of many more models for understanding 3D objects [18]. The reason for this possibility is local spatial feature extraction at multiple scales. The first modified version of a graph neural network (GNN) [19] to calculate graph data started its journey in 2005 and fully came to attention in 2009 with improved performance using the SL algorithm [20]. Following that, the graph convolutional neural network (GCNN) [21] model was introduced to counter the difficulties of analyzing non-Euclidean manifolds. GCNN was followed by lookup-based CNN (LCNN) [22], which used a learned lexicon feature to encode CNN convolution. The diffusion CNN (DCNN) [23], also known as the diffusion-convolutional neural network, was the following method put forth to develop a diffusion-based model of nodes from network data to categorize them. A common technique for extracting local features from the graph was proposed in the work of GNN, which is comparable to convolutional networks that work based on images and on inputs connected by local regions. ChebNet [24] was first proposed in 2016. Following that, an anisotropic CNN (ACNN) [25] model was introduced that added a shape factor to CNN for analyzing using shapes as a unit. PointNet [26] models that already existed were improved using a recursive structure the following year and established PointNet++ [27]. Afterward, GCN was suggested as a more specific variant. CayleyNet [28] was proposed a year later, applying Cayley polynomials to the existing GCN [29]. Two recently proposed models are the anisotropic Chebyshev spectral CNN (ACSCNN) [30], which aggregates local feature values for effective signal collection, in 2020, and UV-Net [31], which combines image data with GCN for low memory overhead and computational cost, in 2021. Convolutional networks constituted the foundation for the research findings mentioned above for graph-based methods. Similarly, manifold- or voxel-based algorithms saw their development over the years side by side, using 3D CNN as the base model. Working inherently with geometric forms is commonplace in the field of computer graphics [32]. Here, 3D objects are often treated as Riemannian manifolds and discretized using meshes.

Many image-based machine-learning methods have been directly applied to 3D geometric data, with varying degrees of success, with the data being represented as range pictures [33, 34] or rasterized volumes [35, 36]. The primary problem with these methods is that they incorrectly assume that geometric data can be represented as Euclidean structures. To explain Figure 2: when dealing with complicated 3D objects, representations based on Euclidean geometry, such as depth pictures or voxels, may distort or even destroy the object's topological structure, resulting in a considerable loss of information. Second, when an item is
Figure 1: Timeline of deep geometric learning models, 1997–2021: RvNN (1997), CNN (1998), GNN (2005), GNN + SL with improved performance (2009), LCNN (encodes CNN convolutions using a learned lexicon), ChebNet, ACNN (adds a shape factor to CNN), PointNet++, CayleyNet (introduces Cayley polynomials to GCN), ACSCNN (2020), and UV-Net (2021, combines image and GCN in a computation- and memory-efficient manner).
posed differently or deformed, its Euclidean representation will change [37].

There have been some reviews on geometric deep learning in the past few years. Bronstein et al. gave a sophisticated overview of this context; quite a few general models with corresponding mathematical derivations and some wide-ranging applications were described in detail in 2017 [15]. Zhang et al. [38] reviewed the different types of deep-learning methods on graphs only [39, 40]. Later on, Cao et al. [41] reviewed some models with their mathematical derivations on graphs and manifolds.

This paper shifts the emphasis from a broad overview to computer vision exclusively, since the previous publications all concentrate on their models in general use cases across disciplines. Models like UV-Net [31] and MDGCN [42], presented more recently, improved computer vision in an intuitive manner with effective visual representations for hobbyist computer vision researchers. In contrast to the cited publications, the strengths and weaknesses in terms of the performance of these models are discussed and summarized. Unlike previous works published in the modern age of computer vision, this one covers a broad range of cutting-edge applications. The sole contributions of this study are statistical findings, trends, obstacles, and recommendations for further research. This article provides a detailed examination of current deep-learning architectures in computer vision, together with an exhaustive account of their methodology and their applications.

Our contributions can be summarized as follows:

(i) This paper offered a quick summary of the most-used and efficient non-Euclidean deep-learning models suggested during the last two decades. In addition, it offered theoretical and operational analyses of the models, with visualizations, explanations, and mathematical reasoning.

(ii) This literature categorized the models according to their underpinnings, such as spectral and spatial types. Newer hybrid frameworks, such as a spatial-spectral-based model, are also included. Each model was outlined together with its underlying logic and operational principles.

(iii) To fully comprehend these models, it has summarized their key features, such as their uniqueness and their limits, into a performance table. Various numerical findings demonstrating performance metrics on their respective datasets are also included.

(iv) To help put the spotlight on the current trend in this area, it has compiled a table summarizing the most recent uses of these graph-based frameworks in a variety of contexts.

(v) To emphasize the importance, it provides data from recent research showing the current trajectory and promising future of the field.

(vi) In light of these most recent work patterns and technical challenges, this paper analyzed potential future scopes.

This paper reviews the most recent deep-learning methods for computer vision on graphs and manifolds and the applications that traverse them. It started with the introduction in Section 1, which gave a succinct history and evolution of the various models put forth over the years; it also discussed the importance and influence of the field. The background research in the linked fields of point clouds, graphs, and manifolds is detailed in Section 2, along with relevant theories. In Section 3, it has categorized every suggested methodology for non-Euclidean fields with simple math and descriptions. The models were divided into GCNs and manifolds, which were then divided into spectral and spatial subcategories. Section 4 is devoted to outlining several current algorithmic limitations, applications, and prospective future opportunities for the progress of our field.
Figure 2: (a) Image in the Euclidean space and (b) graph data in the non-Euclidean space.
Geometric deep-learning models use point clouds, shapes, graphs, and manifolds, whose mathematical operations are described below.

2.1. Point Cloud. A common way to analyze 3D objects is to use a laser scanner to gather a large amount of data as a point cloud. Given such inputs, analysis becomes much more complicated, as all the nodes are undirected and independent [43]. The points are non-Euclidean and require feature segmentation for further calculation. Object recognition and detection take point clouds as inputs in computer vision tasks due to their availability.

In point clouds, a single point from the point cloud $P$ can be defined as $R - (P_x, P_y, P_z) \in \mathbb{R}^3$, where $R$ is the reference point and $P_x, P_y, P_z$ represent the 3D position of the point $P$ [44]. As $\mathbb{R}^3$ represents the three-dimensional space, navigating between two points is performed by radial rotation from a fixed point. The connecting line is defined by $r, \theta$, in which $r$ is the distance between two locations and $\theta$ represents the angle between the axes.

While the point cloud needs a reference point to be located and used, the mesh feature allows the points to be referred through another point. Mesh form is built by point-by-point addition and reduction of some vertices by compression.

The data's complexity necessitates using a B-spline curve for this kind of mathematical manipulation, as stated above. In this part, the B-spline curve will be discussed.

2.1.1. B-Spline Curve. Approximate 3D segmentation comes with the problem of denoting features as fixed or defined as continuous curvature surfaces, and there are no reference points [45]. To counter the problem, the B-spline curve is presented, which is visualized in Figure 3. At the maximum bending portion, a V-shaped joint is provided, called a control point, whereas the corresponding points on the curve are presented as knot vectors [46]. The control points can be used as reference points, and local features can be obtained.

Figure 3: B-spline curve.

The B-spline curve of degree $d$ in $\mathbb{R}^3$ is exemplified by [47]:

$$H(t) = \sum_{i=0}^{n} S_{i,d}(t)\, C_i, \qquad (1)$$

where $S_{i,d}$ are the B-spline basis functions of degree $d$ defined on the knot vector [47]:

$$V = \{0, \ldots, 0, v_{d+1}, \ldots, v_n, 1, \ldots, 1\}, \qquad (2)$$

where the repeated end knots number $d + 1$; the control points [48] of $P(t)$ are exemplified by

$$C = \{C_i \in \mathbb{R}^2\}, \quad i = 0, \ldots, n. \qquad (3)$$

A B-spline curve fits only on a fixed knot vector. Assuming any point cloud $Z = \{Z_k\}, k = 0, \ldots, N$, the fitting objective on the B-spline curve $P(t)$ is [48]:

$$\min \frac{1}{N+1} \sum_{k=0}^{N} d^2\big(Z_k, P(t)\big) + \lambda \frac{1}{n+1} f_s. \qquad (4)$$
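To make the basis expansion in eq. (1) concrete, the following is a minimal NumPy sketch (not from the reviewed work) that evaluates a clamped cubic B-spline via the Cox–de Boor recursion; the control points and knot vector are illustrative.

```python
import numpy as np

def bspline_basis(i, d, t, knots):
    """Cox-de Boor recursion for the basis function S_{i,d}(t).

    Zero-width knot spans use the 0/0 := 0 convention; the right
    endpoint t = 1 would need special-casing with these half-open spans.
    """
    if d == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = 0.0
    den = knots[i + d] - knots[i]
    if den > 0:
        left = (t - knots[i]) / den * bspline_basis(i, d - 1, t, knots)
    right = 0.0
    den = knots[i + d + 1] - knots[i + 1]
    if den > 0:
        right = (knots[i + d + 1] - t) / den * bspline_basis(i + 1, d - 1, t, knots)
    return left + right

def bspline_point(t, control, d, knots):
    """H(t) = sum_i S_{i,d}(t) * C_i, as in eq. (1)."""
    return sum(bspline_basis(i, d, t, knots) * c
               for i, c in enumerate(np.asarray(control, dtype=float)))

# Clamped cubic example with knot vector of the form in eq. (2)
control = [(0, 0), (1, 2), (2, -1), (3, 3), (4, 0)]  # illustrative control points
knots = [0, 0, 0, 0, 0.5, 1, 1, 1, 1]
p0 = bspline_point(0.0, control, 3, knots)           # clamped curve starts at C_0
```

Because the knot vector is clamped, the curve interpolates its first control point, and the basis functions sum to one everywhere inside the knot range (partition of unity).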
The minimal distance between $Z_k$ and $P(t)$ is stated as $d(Z_k, P(t))$, in which $f_s$ is the regularization term and $\lambda$ is the error coefficient. This regularization is defined as in [49]:

$$f_s = \alpha \int_0^1 \left\| P'(t) \right\|^2 dt + (1 - \alpha) \int_0^1 \left\| P''(t) \right\|^2 dt. \qquad (5)$$

For $0 \le \alpha \le 1$, the multiplicands of $\alpha$ and $(1 - \alpha)$ are denoted as $f_1$ and $f_2$. Assuming $P(t_k)$ to be the projection point of $Z_k$ on $P(t)$, the local Frenet frame on $P(t)$ at $P(t_k)$ is $\{T_k, N_k\}$, given that $T_k$ is the unit tangential vector and $N_k$ is a unit normal vector of the curve at $P(t_k)$.

Let $W_k = Z_k - P(t_k)$. The quadratic approximation for $d^2(Z_k, P(t))$ is given by [48]:

$$d^2\big(Z_k, P(t)\big) \approx c_k \left( W_k^{T} T_k \right)^2 + \left( W_k^{T} N_k \right)^2. \qquad (6)$$

Thus, the local quadratic model [47] is

$$F = \frac{1}{N+1} \sum_{k=0}^{N} \left[ c_k \left( W_k^{T} T_k \right)^2 + \left( W_k^{T} N_k \right)^2 \right] + \frac{\lambda}{n+1} f_s. \qquad (7)$$

The approximation of $F$ matches the Gauss–Newton method for $c_k = 0$, and the approximation leads to intrinsic parametrization for $c_k = 1$.

2.2. Graph Theory. Graphs have been used for a long time for analyzing and enhancing grid inputs from pictures. For 2D pictures, grid inputs are directed graphs whose components link with a direction [50]. While successful in 2D analysis, 3D observation comes with a few difficulties in handling undirected graphs and different types of mesh-grid inputs. The undirected graphs are structured by using the discrete Laplacian and the Fourier transform, using eigenvectors to add local features.

Graph theories come with these challenges while analyzing:

(a) Node classification
(b) Graph categorization
(c) Clustering of nodes
(d) Prediction of links
(e) Influence maximization

A graph [51] is defined by $G(V, E)$, given that $V$ and $E$ represent vertices and edges, respectively. We assume $v_i \in V$ and $e_{ij} = (v_i, v_j) \in E$; here, $v_i$ denotes the nodes and $e_{ij}$ denotes the edge between $v_i$ and $v_j$. For $N = |V|$, the adjacency matrix $A$ is given by an $N \times N$ matrix of edge weights, where $A_{ij} = a_{ij} > 0$ if $e_{ij} \in E$ and $A_{ij} = 0$ if $e_{ij} \notin E$. Graphs can be of two types, directed and undirected, based on the types of edges. The edges are connected with a direction for directed graphs, and undirected graphs have connected edges without a direction. Thus, $A_{ij} = A_{ji} = 1$ for undirected graphs and $A_{ij} \ne A_{ji}$ for directed graphs.

By analyzing the eigenvalues of the graph Laplacian matrix, spectral graph theory may provide insight into whether a graph is connected and the quality of that connection. The Fourier transform is utilized for eigendecomposition, or the breakdown of a matrix into its parts, of the graph's Laplacian matrix. The term "convolution" is used to describe the process of multiplying the input neurons by a set of weights, sometimes called "filters" or "kernels," on a graph. In this part, the discrete Laplacian, the Fourier transform, and convolutional operations on graphs will be discussed.

2.2.1. Discrete Laplacian on Graph Theory. The discrete Laplacian (also known as the Laplacian matrix) is a spectral graph machine-learning tool. The Laplacian matrix allows the creation of a link between discrete inputs of graphs and manifolds. The function provides a mathematically tractable solution to graph localization limitations. The eigenvalues of the adjacency matrix are an essential factor in localizing two graph vectors, as similar graphs provide the same eigenvalues [52]. Thus, the graph Laplacian is a must to understand the undirected graph.

The Laplacian matrix is given by $L = D - A$. The degree matrix is denoted as $D$ such that $D_{ii} = \sum_j A_{ij}$. The Laplacian matrix can take three forms [53]:

$$\begin{aligned} \text{combinatorial:} \quad & L = D - A, \\ \text{symmetric normalized:} \quad & L^{\text{sym}} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} A D^{-1/2}, \\ \text{random-walk normalized:} \quad & L^{\text{rw}} = D^{-1} L = I - D^{-1} A. \end{aligned} \qquad (8)$$

Assuming a graph $G$ that is undirected and straightforward, the adjacency matrix will only contain 1 and 0. Thus, the value of $L$ is given by [53]:

$$L_{ij} = \begin{cases} \deg(v_i), & i = j, \\ -1, & i \ne j,\ v_i \text{ adjacent to } v_j, \\ 0, & \text{else}, \end{cases}$$

$$L^{\text{sym}}_{ij} = \begin{cases} 1, & i = j,\ \deg(v_i) \ne 0, \\ \dfrac{-1}{\sqrt{\deg(v_i)\deg(v_j)}}, & i \ne j,\ v_i \text{ adjacent to } v_j, \\ 0, & \text{else}, \end{cases}$$

$$L^{\text{rw}}_{ij} = \begin{cases} 1, & i = j,\ \deg(v_i) \ne 0, \\ \dfrac{-1}{\deg(v_i)}, & i \ne j,\ v_i \text{ adjacent to } v_j, \\ 0, & \text{else}, \end{cases} \qquad (9)$$

where $\deg(v_i)$ denotes the degree of node $i$. The different forms of the Laplacian matrix are given by the type of the degree of the node.
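The three Laplacian variants in eqs. (8) and (9) can be checked numerically. Below is a small NumPy sketch on a toy undirected graph (a 4-node path; the adjacency matrix is illustrative):

```python
import numpy as np

# Adjacency matrix of a small undirected graph: a path 0-1-2-3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))                  # degree matrix, D_ii = sum_j A_ij
L = D - A                                   # combinatorial Laplacian
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
L_sym = D_inv_sqrt @ L @ D_inv_sqrt         # symmetric normalized form
L_rw = np.linalg.inv(D) @ L                 # random-walk normalized form

# Rows of the combinatorial Laplacian sum to zero, and its smallest
# eigenvalue is 0; since this graph is connected, 0 appears exactly once.
eigvals = np.linalg.eigvalsh(L)
```

The zero eigenvalue multiplicity counting connected components is exactly the "insight into whether a graph is linked" mentioned above.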
For using the eigenvectors in the Fourier transform, the eigenfunction must be transformed via the basis function $e^{-i\omega t}$. For any graph signal $g \in \mathbb{R}^N$, $g_i$ is denoted as the value at the $i$th node. The Fourier transform can be expressed as $\hat{g} = U^{T} g$ and the inverse Fourier transform as $g = U \hat{g}$. Here, the eigenvectors are used as the basis, and the transform projects the input graphs onto the orthogonal space.

2.2.3. Convolutional Operations on Graphs. Using the Fourier transform of a graph in the frequency domain, the undirected graph becomes well directed, and convolution turns into component-wise multiplication. Convolution requires a filter to find the compact representation in the convolution layers. Now, for any input $b$ and using a filter $g$, the result is [55] as follows:

$$b * g = U \hat{g}(\Lambda) U^{T} b = \hat{g}(L)\, b. \qquad (12)$$

Although such filters bring complexity with the number of nodes and do not clarify a lot, we use a parametrization of $\hat{g}$. Thus, the resultant is [55] as follows:

$$\hat{g}_{\theta}(L) = \sum_{d=0}^{D-1} \theta_d L^{d}, \qquad (13)$$

where $\hat{g}_{\theta}$ is the polynomial parametrization and $\hat{g}$ has a degree of $D$. The complexity becomes clearer as $\hat{g}$ is $D$-localized with the known relation $\theta_0, \ldots, \theta_{D-1}$.

2.3. Manifold Geometry. A manifold can be defined as curved, but locally, it can be seen as flat, as in Figure 4. Thus, although a manifold is a non-Euclidean shape, it can be treated locally as a Euclidean model. Consider a manifold of the topological space $M$ with dimension number $n$. Also, let $m_1, m_2 \in M$ such that $m_1$ and $m_2$ are neighboring points. The inner product of the tangent space is denoted by $(\cdot, \cdot): T_{m_1} M \times T_{m_2} M \rightarrow \mathbb{R}$, given that $T_m$ represents the tangential space of the Euclidean part and $\mathbb{R}$ is an abstract manifold capable of comparable measurements. For describing 3D data objects, computer vision takes two neighboring surfaces as input and embeds them in the $\mathbb{R}^3$ space. Although the model is not completely

2.3.1. Calculus Operation on Manifolds. Manifolds being non-Euclidean surfaces, calculus cannot be performed on them directly, as declaring variables is not possible. However, various methods exist to create smooth surfaces on a manifold, which is then known as a differentiable manifold. A differential manifold is simply a space on a manifold where calculus can be performed. That said, manifold data can be segmented into many more spaces, and the calculus function must be applied to all of them. Thus, the difference between normal calculus and manifold calculus is that normal calculus works in only one dimension, whereas manifold calculus is structured with higher dimensions to adapt to multiple spaces.

We state that $f: M \rightarrow \mathbb{R}$, such that $f$ is a smooth function on the manifold defined in the scalar field, where the mapping function is given by $M \rightarrow TM$, $F(x) \in T_x M$, such that $F(x)$ is a tangent vector at the point $x$. Considering Hilbert space fields with $L^2(M)$ as a scalar field and $L^2(TM)$ as a vector field, the inner products [56] are

$$(f, g)_{L^2(M)} = \int f(x)\, g(x)\, dx, \qquad (F, G)_{L^2(TM)} = \int \big(F(x), G(x)\big)_{T_x M}\, dx. \qquad (14)$$

With $dx$ being the area element, differentiating $f$ results in $df: TM \rightarrow \mathbb{R}$. Again, differentiation at the closest neighbor points is defined as $df(x) = (\nabla f(x), \cdot)_{T_x M}$. Applying the operation to the tangent vector, $df(x) = (\nabla f(x), F(x))_{T_x M}$. A small displacement of $x$ results in $\nabla f: L^2(M) \rightarrow L^2(TM)$, which is the gradient operator. Thus, the divergence [56] becomes $\operatorname{div}: L^2(TM) \rightarrow L^2(M)$:

$$(F, \nabla f)_{L^2(TM)} = (-\operatorname{div} F, f)_{L^2(M)}. \qquad (15)$$

The relation between the gradient and divergence is as follows [56]:

$$\Delta f = -\operatorname{div}(\nabla f). \qquad (16)$$

The Laplacian is found to be symmetric as in [56]:

$$(\nabla f, \nabla f)_{L^2(TM)} = (-\operatorname{div}(\nabla f), f)_{L^2(M)} = (f, \Delta f)_{L^2(M)}. \qquad (17)$$
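As a sanity check on eqs. (12) and (13), the sketch below (an illustration, not code from the reviewed works) filters a toy graph signal both in the spectral basis, $U \hat{g}(\Lambda) U^{T} b$, and as a matrix polynomial $\hat{g}(L)\, b$; the graph and the coefficients $\theta_d$ are arbitrary choices for the demo.

```python
import numpy as np

# Combinatorial Laplacian of a 4-node path graph (illustrative)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(1)) - A

lam, U = np.linalg.eigh(L)           # L = U diag(lam) U^T; columns of U = Fourier basis
b = np.array([1.0, -2.0, 0.5, 3.0])  # a graph signal

# Polynomial filter g(lambda) = sum_d theta_d * lambda^d, as in eq. (13), with D = 3
theta = [0.5, -0.1, 0.02]
g_hat = sum(c * lam**d for d, c in enumerate(theta))

# Eq. (12): filtering in the spectral domain equals applying g(L) directly
spectral = U @ np.diag(g_hat) @ U.T @ b
matrix_poly = sum(c * np.linalg.matrix_power(L, d) for d, c in enumerate(theta)) @ b
```

The equality of the two results is exactly why the polynomial parametrization avoids explicit eigendecomposition at inference time: only sparse powers of $L$ are needed.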
For compact manifolds, the functions of $x$ work similarly to the spectral analysis for graphs. Also, it can be seen that the Laplacian stands as a vital part of analyzing non-Euclidean space.

Figure 5: Input graphs of GCNs.

2.3.2. Discretization of Manifold. Discretization is required for data conversion, such as from point clouds to manifolds. For discretization, a manifold [57] can be sampled by $N$ points, where the positions of the nodes are stated as $x_1, x_2, \ldots, x_N$, thus creating a graph accordingly. The undirected graphs can be difficult to pose for the increase in nodes of a unit area. Again, the surface can be created as the mesh $(V, E, F)$, where $V$ is the vertex set, $E$ is the edge set, and $F$ is the triangular face set [57]: $V = \{1, \ldots, N\}$, $ijk, ijh \in F$, and $(i, j), (i, k), (j, k) \in E$. The triangular mesh must have the designated manifold boundary, and the edge length should be $l_{ij} > 0$, satisfying the triangle inequality. For $l_{ij} = \| x_i - x_j \|_2$, the cotangent weights are given by $w_{ij} = \frac{1}{2}(\cot \alpha_{ij} + \cot \beta_{ij})$ for the manifold's mesh Laplacian.
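The cotangent weights $w_{ij} = \frac{1}{2}(\cot \alpha_{ij} + \cot \beta_{ij})$ can be computed directly from vertex positions. A minimal sketch for a single interior edge shared by two triangles (the coordinates are illustrative):

```python
import numpy as np

def cot_at(p, q, r):
    """Cotangent of the angle at vertex p in triangle (p, q, r)."""
    u, v = q - p, r - p
    return np.dot(u, v) / np.linalg.norm(np.cross(u, v))

# Two faces (i, j, k) and (i, j, h) sharing the edge (i, j); positions are illustrative.
x = {
    "i": np.array([0.0, 0.0, 0.0]),
    "j": np.array([1.0, 0.0, 0.0]),
    "k": np.array([0.5, 1.0, 0.0]),
    "h": np.array([0.5, -1.0, 0.0]),
}

# alpha_ij and beta_ij are the angles opposite edge (i, j) in the two faces,
# i.e., the angles at vertices k and h.
cot_alpha = cot_at(x["k"], x["i"], x["j"])
cot_beta = cot_at(x["h"], x["i"], x["j"])
w_ij = 0.5 * (cot_alpha + cot_beta)   # cotangent weight for the mesh Laplacian
```

Looping this computation over every edge of a mesh fills the off-diagonal entries of the discrete (cotangent) Laplacian, with diagonal entries set to the negated row sums.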
3. Methods
Based on the context, it expressed the graph Fourier transform and its inverse equation. The equations distinguish between the spectral and spatial domains of a graph: $g = U \hat{g}$ represents the spatial domain, while $\hat{g} = U^{T} g$ indicates the spectral domain. Consequently, if a signal $g$ is used in the spatial domain, $\hat{g}$ is likewise significant in the spectral domain. These signals are typically referred to as kernels, and the Fourier coefficients of a graph often decay fast. The signal is compressible, since its Fourier coefficients may be calculated from a few graph coefficients.

3.1.1. Spectral-Based GCNs. The spectral-based GCN model that has been constructed cannot be applied directly to graphs; nevertheless, because of its effective feature-extraction capacity, it may be highly beneficial for extracting features [60]. It is possible to define non-Euclidean convolution using it, and by analogy, it may be used to describe the relation in terms of the frequency domain. In recent times, the graph Laplacian matrix has been used directly because its convolutional layer architecture is so effective.

(1) Spectral CNN. The "specifying" architectural cluster in spectral CNNs includes additional inputs. A vector representing the network's most recent output distribution is sent to the specified input for each training dataset on each training step. The propagation of this input then proceeds in a conventional feed-forward manner, with the specification of a cluster and network layer at the conclusion. This extra cluster is likewise subject to a learning rule. This strategy comes in a variety of architectural forms: using a single-layer structure or a multilayer structure, we link the outputs of the specified cluster with the output layer and a hidden level of the network [61]. With the use of this model, convolution filters are altered to provide greater optimization capabilities via complex-coefficient spectral parameterization. Competitive outcomes on classification and approximation tasks were accomplished without the need for dropout or max pooling, thanks to a more recent method of randomized change of resolution.

(2) CayleyNets. The CayleyNet model is a modified model of ChebNet [24]. ChebNet uses Chebyshev filters that avoid expensive computation without using eigenvectors. The main drawback of the model is that it cannot produce narrow-band filters; this occurs when eigenvalues are clustered around minimal frequencies and the spectral gap is high [62]. CayleyNets add a new type of filter that takes the simplicity of the Chebyshev filters and can also produce narrow-band filters to counter these disadvantages. The real value of a complex function is determined as the Cayley polynomial of order $r$ [24]:

$$g_{v,z}(\lambda) = v_0 + 2 \operatorname{Re} \sum_{j=1}^{r} v_j (z\lambda - i)^{j} (z\lambda + i)^{-j}, \qquad (18)$$

where $v = (v_0, v_1, \ldots, v_r)$ is a vector with one real coefficient and $r$ complex coefficients, and $z > 0$ is denoted as the zoom parameter. For a real signal $f$, the Cayley filter $G$ is defined by [24]:

$$Gf = g_{v,z}(\Delta) f = v_0 f + 2 \operatorname{Re} \sum_{j=1}^{r} v_j (z\Delta - iI)^{j} (z\Delta + iI)^{-j} f. \qquad (19)$$

The parameters $v$ and $z$ are optimized in training. The filter works with basic matrix operations, similar
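Equations (18) and (19) reduce to a handful of complex matrix operations, since $(z\Delta - iI)^{j} (z\Delta + iI)^{-j}$ is the $j$-th power of the Cayley transform of $z\Delta$. A NumPy sketch of this idea (the coefficients $v$ and zoom $z$ are illustrative and untrained, and real $v_j$ are used for simplicity):

```python
import numpy as np

# Laplacian of a 4-node path graph (illustrative)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Delta = np.diag(A.sum(1)) - A

def cayley_filter(f, Delta, v, z):
    """Gf = v[0]*f + 2*Re sum_j v[j] (z*Delta - iI)^j (z*Delta + iI)^{-j} f, eq. (19)."""
    n = Delta.shape[0]
    I = np.eye(n)
    # Cayley transform of z*Delta; the inverse exists because z*Delta is
    # real symmetric, so z*lambda + i is never zero.
    C = (z * Delta - 1j * I) @ np.linalg.inv(z * Delta + 1j * I)
    out = v[0] * f.astype(complex)
    Cj = I.astype(complex)
    for vj in v[1:]:
        Cj = Cj @ C                    # accumulates C^j
        out = out + 2 * vj * (Cj @ f)
    return out.real                    # taking 2*Re(...) keeps the output real

f = np.array([1.0, 0.0, -1.0, 2.0])   # a real graph signal
Gf = cayley_filter(f, Delta, v=[0.3, 0.1, -0.05], z=1.0)
```

In CayleyNet proper, the inversion is approximated with Jacobi iterations to stay scalable; the dense `inv` here is just for demonstration on a tiny graph.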
to ChebNet. Thus, no eigendecomposition is required for the $Gf$ filter to work. Cayley filters are based on a rational function of the Laplacian, and such filters are named ARMA filters. As general ARMA filters require matrix inversion, there is no way to ensure stable inversion, since the training path is unknown; the Cayley filters, in contrast, ensure that the inversion is stable. Also, the general ARMA filter uses a larger number of parameters, which is overfitting for the objective, whereas the Cayley filter uses a moderate number of parameters.

The model presents a new class of extremely regular, localized, complicated rational Cayley filters that can represent any smooth spectral transfer function. The fundamental characteristic of the model is its ability to maintain localization in the spatial domain while specializing in narrow frequency bands with a limited number of filter parameters.

(3) UV-Net (Boundary Representations). U and V parameters of curves and surfaces are clearly expressed, while an adjacency graph explicitly defines topology in a boundary-representation data model. That happens when the user combines convolutional neural networks with image processing to create UV-Net [31], a network that both memorizes and computes economically.

Numerous topological components, such as faces, edges, half-edges, vertices, and their connections, compose the boundary-representation data model. With a few clicks, it extracts the most critical geometric and topological data from the boundary representation and transforms it into a format suitable for current neural network architectures [31].

The UV-Net representation offers some benefits:

(1) For both primitive and geometric surface types, curve assessment of parameters can be applied quickly and easily [31]
(2) The representation is sparse and proportional to the number of B-rep contour surfaces
(3) The grid is mostly independent of precise parametrization

Using graph convolutions, features of local curves as well as surfaces are aggregated from neighbors, and $l_{\theta}$ represents a linear projection from the edge to the node feature space [31]. End-point features influence a concealed edge feature. The following recursive model underpins this learning process [31]:

$$g_{uv}^{(h)} = \beta^{(h)}\!\left( (1 + \epsilon)\, g_{uv}^{(h-1)} + l_{\theta'}\!\left( g_{u}^{(h-1)} + g_{v}^{(h-1)} \right) \right), \qquad (21)$$

where $\beta^{(h)}$ is also an MLP having 2 layers. The final shape embedding is obtained by projecting (linearly) these characteristics into 128-D vectors and summing them [31]:

$$g_X = \sum_{h=1}^{H} \left( y^{(h)} g^{(h)} + c^{(h)} \right). \qquad (22)$$

The model utilizes existing image and graph convolutional neural networks and can operate on B-rep data. On both supervised and self-supervised tasks spanning five B-rep datasets, its advantages and adaptability are demonstrated, outperforming other representations such as point clouds, voxels, and meshes. A fresh synthetic B-rep dataset with differences in geometry and topology was also introduced, named SolidLetters.

(4) CurvaNet. Analyzing 2D images with a regular grid is much less challenging than analyzing 3D images using mesh surfaces or manifold input. Although traditional GNN models segment and use smaller surfaces, considering them flat, the model cannot differentiate higher surface changes due to a lack of data. CurvaNet modifies the GNN model by integrating differential geometry [63]. Data accuracy is ensured by downsampling through mesh pooling and upsampling through an unpooling operation, using an encoder and decoder. Thus, minimization of classification error is done by considering more input properties. The architecture is quite similar to the U-Net model, which runs through the curvature filter (CF) and graph convolution filter (GC) for segmentation. A skip connection is used to preserve precise boundaries [64].

A directional curvature filter negates fixed-curvature limitations such as data loss and underfitting or overfitting. Addressing the weight parameter direction is easier by segmenting all directions with many tangent vectors. The vectors must have a unique origin to point out null values. To ensure fixed parameters are provided for pool rotation for
diferent angles. Graph convolution layers are used for
characteristics are conveyed over the whole boundary rep-
sampling a neighborhood using the graph Laplacian matrix.
resentation. Curve and surface convolution is performed by
ChebyNet24 and graph attention network (GAT) 56 are used
taking 2D UV-grids. For message passing, contour CNNs
for the function. Afterward, the properties of curvature are
are hidden features considered input edges and node fea-
conserved by downsampling and upsampling. Let σ be the
tures of the GNN. We calculate the hidden node features
nonlinear activation function and b and d be the shared
gv (h) in the graph layer h ∈ 1 · · · H by combining all the
kernels. Te feature matrix of curvature yields C, where k is
input features of the node gv (h − 1) from a one-hop
the interval number and m is the maximum number [63]:
neighborhood u ∈ A(v) while conditioning them on the
edge features [31] guv (h − 1):
c(k)
i � σ cTi A(k)
i b
(k)
+ d(k) ,
(23)
g(h) ⎢
(h) ⎡
⎣1 + e(h) g(h− 1)
+ lθ g(h− 1)
⊙ gu ⎤⎥⎦.
(h− 1)
ci � max c(1) (k) (m)
v � α v uv i , . . . , ci , . . . , ci .
u∈A(v)
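As a rough illustration of the directional filtering in (23), the following sketch applies one shared kernel (b, d) to curvature values sampled along m tangent directions and then max-pools over the directions. The array shapes, the choice of ReLU for σ, and the function name are illustrative assumptions, not CurvaNet's actual implementation.

```python
import numpy as np

def directional_curvature_filter(C, b, d):
    """Directional curvature filtering in the spirit of eq. (23):
    each of the m direction intervals gets the same shared kernel
    (b, d), and a max over directions yields a response that does
    not depend on how the directions are ordered.

    C : (n, m, k) array -- for n vertices, curvature values sampled
        along m tangent directions, k values per direction interval.
    b : (k,) shared kernel weights; d : scalar shared bias.
    """
    pre = C @ b + d                # (n, m) response per direction
    act = np.maximum(pre, 0.0)     # sigma chosen as ReLU (assumption)
    return act.max(axis=1)         # max-pool over the m directions

rng = np.random.default_rng(0)
C = rng.normal(size=(5, 8, 4))     # 5 vertices, 8 directions, 4 samples
b = rng.normal(size=4)
out = directional_curvature_filter(C, b, 0.1)
print(out.shape)                   # (5,)
```

Because the final step is a maximum over directions, the response is invariant to reordering the m tangent directions, which is the point of the directional pooling.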
A design dependent on mesh simplification was presented to make use of multiscale curvature features.

(5) Anisotropic Chebyshev Spectral CNNs (ACSCNNs). The anisotropic Chebyshev spectral CNN [30] is a new shape correspondence architecture based on manifold convolution. Extended convolution operators combine local signal characteristics through a series of directed kernels around each point, capturing additional signal information. Spectral filtering, based on the eigendecomposition of multiple anisotropic Laplace–Beltrami operators (LBOs), is used to train the kernels. To decrease the computational difficulty, trainable Chebyshev polynomial expansion coefficients are used to represent the spectral filters [30].

The manifold X [30] is defined by

M²(X) = { g: X ⟶ R : ∫_X g(x)² dx < ∞ }, (24)

where dx is the area element.

Vallet and Lévy [65] observed that the eigenvalues and eigenfunctions of the LBO are comparable to the frequencies and the Fourier basis in Euclidean space. The LBO can be described [30] as

Δ_X g(x) = −div_X( ∇_X g(x) ). (25)

The inner product ĝ(c_p) = ⟨g, σ_p⟩_X is known as the Fourier transform (coefficient) for manifolds, because the eigenvalues and eigenfunctions of the LBO have periodic properties. The inverse Fourier transform for the manifold g ∈ M²(X) can be expressed [30] as

g(x) = Σ_{p≥0} ⟨g, σ_p⟩_X σ_p(x) = Σ_{p≥0} ĝ(c_p) σ_p(x). (26)

The convolution theorem based on manifolds [30] is then defined by

(g ∗ h)(x) = Σ_{p≥0} ĝ(c_p) ĥ(c_p) σ_p(x). (27)

The anisotropic LBO [66] can be defined as

Δ_X g(x) = −div_X( T(x) ∇_X g(x) ). (28)

As illustrated in the following model, the ALBO is redefined [66] as

Δ_βλ g(x) = −div_X( C_λ T_β(x) C_λ^T ∇_X g(x) ). (29)

Here, C_λ is a rotation by the angle λ about the surface normal in the tangent plane, T(x) is a thermal conductivity tensor, and the parameter β controls the anisotropy level [30].

Instead of employing the anisotropic heat kernels of [25], the model aims to learn task-dependent kernels by learning their parameterized filters ĥ(c). As stated in [24, 67], a polynomial filter may be used to solve these problems. Chebyshev polynomials are adopted for the filter ĥ(c), and thanks to their properly functioning recursive relation, the eigendecomposition of the ALBO is eliminated and learning becomes easier [24, 68]. If the Chebyshev polynomial of order n is L_n(c), the filter ĥ(c) can be expressed [30] as

ĥ(c) = Σ_{n=0}^{N−1} b_n L_n(c). (30)

As an extension of the manifold convolution operator, the model suggests the anisotropic convolution operator. Due to its direction-based consideration, this sort of anisotropic convolution enables a more thorough capture of the intrinsic local information of signals when compared to earlier works. To simplify the computation, Chebyshev polynomials are used to express the filters with trainable coefficients. In certain cases, the achieved outcome was superior to that of the earlier models.

3.1.2. Spatial-Based GCN. Unlike the spectral base, this convolution can be used directly because the kernel size is fixed, so the neighbors of concern must be selected and convolved in a traditional manner. Pseudocoordinates are used by the filter function and are accumulated at the time of convolution. The most difficult aspect of developing CNNs that operate on core nodes with a variable number of neighboring nodes is establishing local invariance for such CNNs.

The first intrinsic version of CNNs was introduced by Masci et al. [16]. The evolution was then continued by Boscaini et al. [25], who introduced anisotropic heat kernels. A more general framework (MoNet) was then introduced by Monti et al. [69] to develop deep convolutional architectures on graphs and manifolds. Then, a B-spline-based filter was introduced by Fey et al. [70], which works quite efficiently on input of arbitrary dimensionality.

(1) Diffusion CNN (DCNN). The motivation for the diffusion CNN is that a form encompassing graph diffusion can serve as a more reliable foundation for forecasting than a graph alone. Graph diffusion, which can be calculated in polynomial time and run effectively on the GPU, provides a simple method for including contextual information about the entities being classified. Many methods, such as probabilistic structural models and kernel methods, incorporate depth information in classification tasks; DCNNs offer a supplementary strategy that significantly improves predictive performance at node classification [23]. When performing node classification tasks, diffusion-convolutional neural networks outperform probabilistic relational models and kernel approaches thanks to a representation that captures the effects of graph diffusion. Diffusion processes do a good job of representing nodes, but they are ineffective at summarizing complete graphs.
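The graph-diffusion representation behind DCNN can be approximated in a few lines: features are repeatedly propagated by the degree-normalized transition matrix and the per-hop results are stacked per node. This is a minimal sketch under assumed names and hop count, not the published DCNN code (which additionally applies learned hop weights and a nonlinearity).

```python
import numpy as np

def diffusion_features(A, X, n_hops):
    """Diffusion-convolution sketch in the spirit of DCNN: stack
    node features propagated by powers of the degree-normalized
    transition matrix P = D^-1 A (hop 0 keeps the raw features)."""
    P = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)
    hops, Z = [X], X
    for _ in range(n_hops):
        Z = P @ Z                  # one more diffusion step
        hops.append(Z)
    return np.stack(hops, axis=1)  # (n_nodes, n_hops + 1, n_feats)

# Toy path graph 0 - 1 - 2 with one-hot node features
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
X = np.eye(3)
feats = diffusion_features(A, X, n_hops=2)
print(feats.shape)                 # (3, 3, 3)
```

A learned weight per hop and feature, followed by a nonlinearity, would turn these stacked hops into the diffusion-convolutional activations described above.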
(2) Graph Neural Network (GNN). The graph neural network is a supervised neural architecture that works well for graph- and node-based applications. Two existing concepts are combined into a single framework by this model, which will be referred to as the GNN. Both random walk models and recursive neural networks are shown to be extensions of the GNN and to retain their features. The model expands recurrent neural networks because it can handle node-focused tasks with no preprocessing stage and can analyze a wider range of graphs, covering cyclic, directed, and undirected graphs. The method broadens the range of processes that may be described and adds a learning mechanism to random walk theory [19].

The model offers a cutting-edge neural network architecture that can handle inputs from cyclic, directed, and undirected graphs or a combination of these. The diffusion of information and relaxation mechanisms are the model's foundations. Analysis of the outcomes shows that the strategy is also appropriate for huge datasets.

(3) GraphSAGE. To accomplish the objective of node categorization, GraphSAGE models fully utilize the attribute information, structure, and knowledge of nodes in social networks, as well as mine the implicit mutual information among nodes. The best performance of a graph neural network may be comparable to that of the Weisfeiler–Lehman (WL) graph isomorphism test when the data structure (update feature plus aggregate feature) in graph networks is singular. An enhanced GraphSAGE [71] method is used to create the model for learning about GraphSAGE.

The model offers a revolutionary method that makes it possible to effectively create embeddings for unseen nodes. GraphSAGE successfully balances performance and runtime via sampling node neighborhoods, regularly outperforms state-of-the-art baselines, and offers a theoretical analysis that sheds light on understanding local graph structures.

(4) Large-Scale Graph Convolution Network (LGCN). An algorithm-structure co-optimization to speed up large-scale GCN inference on FPGAs is provided to combat the significant expense of external storage access when evaluating graph-structured datasets. To comply with the on-chip storage restriction, data splitting is executed first. Then, to decrease the computational complexity and improve data locality, a two-phase preprocessing approach is created. The main computational kernels are mapped onto an FPGA during hardware design, and data transmission for pipelined execution occurs through on-chip memory [72]. Varied GCN architectures and various analytic orders are supported by the data path.

The suggested architecture allows for the application of standard convolutional methods while transforming generic graphs into data with grid-like patterns. The transformation is carried out using a brand-new k-largest node selection method that ranks the values of the node features.

(5) Mixture Model CNN (MoNet). For analyzing the non-Euclidean geometries of GCNN and ACNN for graphs, GCN and DCNN models are proposed, but they come with some shortcomings: each segment of the analyzed shape requires a separate local function, while the functions are not learnable for reuse on similar locales. Using a parametric construction that can be reused in similar localities, a different framework, the mixture model network, is presented [69]. The operations are convolutions working in the spatial domain using local parameters of "patches" of manifolds or graphs [73]. The patches create a local function that can be represented as a mixture of Gaussian kernels.

A general spatial framework is used in mixture model CNNs for graphs and manifolds. For any point x on the manifold and a vertex y of the neighborhood N such that y ∈ N(x), a d-dimensional vector of pseudocoordinates is denoted as u(x, y). A weighting function is achieved using learnable parameters. Using fixed parameters of Gaussian kernels and geodesic coordinates, the mixture model CNN can be reverted to the GCNN, ACNN, and GCN models. The mixture model is based on parametric kernels with learnable parameters, denoted as follows [69]:

g_j(u) = exp( −(1/2) (u − w_j)^T Σ_j^{−1} (u − w_j) ), (31)

where Σ_j is the covariance matrix and w_j is the mean vector of the Gaussian kernel. The covariances are restricted to 2D and 2jD for the patch operators.

To achieve deep learning, the aggregate ensemble CNN (AECNN) model integrates two distinct convolutional neural network architectures. Two deep-learning networks, AlexNet and NIN, are integrated to calculate the weighted average of feature vectors. Based on AECNN modeling runs, the aggregate model outperforms the single-CNN ensemble model in terms of classification accuracy and retrieval precision for images.

(6) SplineCNN. SplineCNN is a modified version of CNN for countering non-Euclidean geometry, focused on using B-splines. The model uses convolutions over a spline basis, and the convolution layer takes undirected data with a directed graph as input [70].

A trainable set is used to aggregate the node features in the spatial layer. Denoted by n(i), the node features are weighted by a continuous kernel function [74]. The spatial relation of the nodes is stated by pseudocoordinates in U. Placing no restrictions on U, no values are lost in a local neighborhood, as it can contain edge weights, node features, and local data.

A continuous kernel function based on B-spline bases is used for the convolution operation. Constant values of trainable sets are used for parametrizing the function. For computational efficiency, all input outside the preset interval is set to zero. Now, consider a B-spline basis of degree d and a trainable variable t_{p,l} ∈ T for every element p formed from the Cartesian product [70] P = (N^m_{1,i})_i × · · · × (N^m_{d,i})_i, where l is the input feature. For I_in input features, the number of trainable parameters is K = I_in · Π_{i=1}^{d} k_i. The continuous convolution function [70] g_l: [a_1, b_1] × · · · × [a_d, b_d] ⟶ R is then defined as

g_l(u) = Σ_{p∈P} t_{p,l} · B_p(u), (32)

where B_p represents the product of the basis functions. Relative to a traditional CNN, SplineCNN adds a normalization factor and a differently defined filter. SplineCNN can also take 2D inputs with a few kernels to analyze the dataset.

The model learns on irregularly structured, geometric input data. The proposed convolution filter combines nearby information in the spatial domain by using a trainable continuous kernel function with trainable B-spline control values. SplineCNN is the first architecture that enables robust end-to-end deep learning directly from geometric data.

3.1.3. Spectral- and Spatial-Based Models. In this method, models benefit from both spectral and spatial characteristics. Their combined features are used for better data extraction.

Spatial models accomplish localization by grouping the nodes in their immediate surroundings. The most significant thing to note is that they may be applied directly to graph data, in contrast to the spectral ones. While a model based on spectral analysis has difficulty dealing with graph-based input data, a model based on spatial analysis can very effectively handle many sources of graph input (such as edge features and edge directions). Because it functions well only on certain preset graphs, the spectral-based approach has a restricted range of applications. To put it another way, it is improbable that a model trained for one application would function well for another, because its graphs and Laplacian nature are unique. A spatial-based model, on the other hand, is somewhat reliant on each node of the graphs. Because of this, it is used extensively in three-dimensional form and structural analysis (e.g., shape correspondence on the FAUST dataset), and it is relatively easy to apply this model to a variety of roles and structures. Hence, the spatial model is broader and is gaining more appreciation day by day.
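The parametric patch weighting of (31) that underlies MoNet, and, with B-spline bases in place of Gaussians, SplineCNN, can be sketched as below. The array shapes and names are illustrative assumptions; in the real models the means and covariances are learned.

```python
import numpy as np

def gaussian_patch_weights(U, means, covs):
    """MoNet-style parametric patch weighting, as in eq. (31):
    g_j(u) = exp(-0.5 (u - w_j)^T Sigma_j^{-1} (u - w_j)) for each
    of J Gaussian kernels over pseudocoordinates u.

    U : (n, d) pseudocoordinates of n neighbors,
    means : (J, d) kernel means, covs : (J, d, d) covariances."""
    W = np.empty((U.shape[0], means.shape[0]))
    for j, (w, S) in enumerate(zip(means, covs)):
        diff = U - w                       # (n, d)
        Sinv = np.linalg.inv(S)
        W[:, j] = np.exp(-0.5 * np.einsum('nd,de,ne->n', diff, Sinv, diff))
    return W                               # (n, J) neighbor weights

U = np.array([[0.0, 0.0], [1.0, 0.0]])     # two neighbors' coordinates
means = np.zeros((1, 2))                   # one kernel at the origin
covs = np.eye(2)[None]
print(gaussian_patch_weights(U, means, covs))
```

A convolution then sums neighbor features multiplied by these weights, one output channel per kernel; swapping the Gaussian for a product of B-spline basis functions gives the SplineCNN weighting of (32).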
[Figure: feature comparison of the two domains. Spectral domain: requires eigendecomposition and preset graphs, effective on moderate data, has difficulty handling graph data directly. Spatial domain: effective on larger data, dependent on each graph node, groups nodes and is applied to graph data directly, used in 3D forms and structural design.]
Figure 10: (a) Voxel data using a 3D segment and (b) image data using the 2D grid. Because of its higher dimensionality, the quality of the former is higher than that of the latter.
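A binary occupancy grid of the kind shown in Figure 10(a) can be produced from a point cloud in a few lines. The 30³ resolution matches the voxel grids discussed in this section, while the unit-cube scaling and the function name are illustrative assumptions.

```python
import numpy as np

def voxelize(points, grid=30):
    """Binary occupancy tensor from a point cloud, as consumed by
    volumetric CNNs (e.g. a 30 x 30 x 30 grid as in the ModelNet40
    comparison). Points are scaled into the unit cube first."""
    p = points - points.min(axis=0)
    p = p / max(p.max(), 1e-12)                  # fit the unit cube
    idx = np.minimum((p * grid).astype(int), grid - 1)
    vox = np.zeros((grid, grid, grid), dtype=bool)
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = True  # mark occupied cells
    return vox

pts = np.random.rand(1000, 3)
v = voxelize(pts, grid=30)
print(v.shape)                                   # (30, 30, 30)
```

Note that many points can fall into the same cell, so the number of occupied voxels is at most the number of input points; this lossy binning is one reason resolution matters for volumetric CNNs.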
Figure 11: (a) Shape analysis with different views; (b) the multiview model architecture takes views as 2D input to analyze 3D objects.
recovered using sketches with high precision, and take advantage of the implicit understanding of 3D shapes included in their 2D views by connecting the information of 3D shapes to 2D representations like sketches.

3.2.3. Difference between Volumetric CNN and MVCNN. A 3D form is encoded in the volumetric representation as a 3-dimensional tensor of binary or real values. In contrast, the multiview representation organizes a 3-dimensional form as an accumulation of multiple perspective representations. It seems intuitive that the volumetric representation should be able to carry extra information about the characteristics of three-dimensional structures compared with the multiview representation. The most important highlights are shown in Figure 12.

[Figure 12 highlights: the multiview CNN takes input generated from multi-angle views, accumulates the 3D form from multi-perspective representations, combines data with sphere representations, and achieves better classification performance; the volumetric CNN takes the encoded 3-D form of binary or real values as input, with larger encoded data, and scores about 7.3% lower in the comparison.]
Figure 12: MVCNN vs. volumetric CNN model features.

However, Qi et al. [36] replicated the tests using a grid of 30 voxels with 3D ShapeNets and with multiview CNNs on the ModelNet40 dataset. According to the data, the categorization performance of the individual algorithms suggests that the volumetric CNN with voxel-dependent performance is 7.3% less accurate than the MVCNN [82]. There are at least two probable explanations, including the input data performance and the diversity in network architecture [36]. However, when both networks are fed equal levels of information, the classification accuracy of MVCNNs is much higher (89.5%) than that of 3D ShapeNets (84.7%). In this experiment, data from MVCNNs are combined with sphere representations of the input on a grid of 30 equal in height, width, and length. Even with lower-resolution input data, the classification performance of the MVCNN is much greater than that of 3D ShapeNets. This shows that the design of volumetric CNNs has many opportunities for improvement.
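The fusion step that lets an MVCNN combine its per-view CNN branches into one shape descriptor is an element-wise max over the view descriptors. The sketch below assumes the views have already been mapped to fixed-length descriptors by a shared 2D CNN (omitted here); the 12-view, 256-D sizes are illustrative.

```python
import numpy as np

def view_pool(view_features):
    """Element-wise max over per-view descriptors -- the view-pooling
    step that fuses the 2D view branches of an MVCNN into a single
    shape descriptor (the shared per-view CNN itself is omitted)."""
    return np.max(view_features, axis=0)

# 12 rendered views, each already mapped to a 256-D CNN descriptor
views = np.random.rand(12, 256)
desc = view_pool(views)
print(desc.shape)        # (256,)
```

Taking the maximum makes the descriptor invariant to the order in which the views are rendered, which is why the camera arrangement need not be fixed at test time.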
3.2.4. Point-Based. The advantage of the point-based approach is that it may take point cloud data as input without first transforming it to a voxel, mesh, or another type of 3D representation. A point cloud is a combination of geometrically important points that form a structure. Alongside its large number of benefits, there are certain difficulties, such as sparsity and the unpredictability of the geometric data. Because it may be used effectively in various contexts, researchers are becoming more interested in this model, and as a direct consequence there is persistent progress [83, 84].

(1) PointNet++. PointNet++ [27] is a hierarchical neural network that performs recursion of PointNet [26] on a layered partitioning of the input; it is a direct successor of PointNet. Although PointNet was the very first DNN capable of manipulating 3D point clouds natively, several networks have since been developed. It learns the spatial coding of every location in the input cloud and then, by aggregating all the features, determines the global characteristics of the point cloud. PointNet++ removes this shortcoming by solving how a local division of a point cloud can be carried out and how local characteristics of a point cloud can be extracted. In other words, it learns local characteristics at increasing contextual scales [27]. Using an analysis of metric space distances, Qi et al. [27] claimed that it is capable of learning features very robustly, even on nonuniformly sampled point sets. The way it extracts features can be summarized in three sections:

(a) Sampling Layer. The sampling method is known as FPS, or farthest point sampling, which begins its work by picking a random series of points out of the point cloud functioning as its input.

(b) Grouping Layer. The objective of the grouping layer is to construct local areas before extracting characteristics. More specifically, this research uses the neighborhood ball approach rather than the KNN algorithm, since it is feasible to ensure a set area scale. Multiple subpoint clouds are created by employing the points surrounding centroid points (within a defined radius) [27].

(c) PointNet Layer. As described earlier, it uses the PointNet algorithm to originate a preliminary assumption and then runs it through some iterations to get closer to the extraction of features.

In terms of 3D shape, there may be a nonuniform density of sampling points; perspective effects and radial density variations cause great trouble when learning features. Qi et al. [27] proposed an abstraction layer that may aggregate information from multiple scales according to local point densities. With this algorithm, the authors achieved an accuracy of 90.7% on ModelNet40 [83].

The model is suggested for handling sampled point sets in metric spaces. PointNet++ efficiently learns hierarchical features with respect to the distance metric by performing recursive operations on a nested partitioning of the input point set. Two unique set abstraction layers were suggested that intelligently aggregate multiscale data according to local point densities, producing improved results to address the problem of nonuniform point sampling.

(2) Taylor GMM Convolutional Network (TGNet). The Taylor GMM convolutional network constructs a graph pyramid by clustering point clouds. In each layer of the pyramid, local regions are abstracted progressively. For learning local features, TGConv is applied, and the targeted data are again interpolated at a finer scale at each layer. On a similar scale, the features are interconnected [85]. TGNet uses limited computation, and information losses occur; to counter the issue, an MLP can be applied to the input to conserve information along the process. Finally, the sampled features and the finer scale are combined for per-point segmentation.

The TGNet model is driven by TGConv [86]. Let a graph G = (V, E, D) be formed of a point cloud C = {c_1, c_2, . . . , c_n} ⊆ R³, where V = {1, 2, . . . , n} is the collection of vertices and E ⊆ V × V is the collection of edges [86]. Every directed edge (x, y) ∈ E has 3D pseudocoordinates defined as d(x, y), given that D is the set of the coordinates. We consider all points y ∈ H(x), where H(x) is the neighbor set of the vertex x, and d(x, y) indicates the 3D vector of coordinates of y. Let a = {a_1, a_2, . . . , a_N} be the set of input features of the vertices. For F as the feature dimension, the features a_i ∈ R^F are associated with the corresponding vertex i ∈ V. Local coordinates are generated from the input characteristics using Taylor kernel functions [86] with Gaussian weighting. The learnable weighting functions are denoted by D_s. The convolution function is performed by the aggregation of the feature sets and is given by [86]:

(f ∗ g)(x) = Agg_{gθ} ( Σ_{s=1}^{S} D_s(x) a_y ), y ∈ H(x). (36)

The model suggests altering the linear combination of convolutional feature maps in the conventional convolutional operation of CNNs to collect detailed high-frequency and low-frequency information. It is shown that TaylorNets form a nonlinear combination of the convolutional feature maps based on Taylor expansion. The steerable module created by TaylorNets is generic, making it simple to include in various deep architectures and to train using the same backpropagation pipeline. This results in a higher representational capacity.

(3) MongeNet. In the ShapeNet model, triangular meshes are used for sampling 3D surfaces, but the sampling gets irregular as the surface angle changes, and clamping or undersampling occurs. The problem can be defined as a transport problem between discrete measures and simplices. MongeNet is a neural network working as a uniform mesh sampler to counter the mentioned problem [87]. Computation is performed on GPUs and batchwise across triangles. This model's direct competitor is the current cutting-edge sampler of PyTorch3D. Test findings suggest that, for a moderate increase in computational cost, the model provides greater performance than the previous model.
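The farthest point sampling used by PointNet++'s sampling layer, described above, can be sketched in a few lines; the greedy strategy keeps picking the point farthest from everything chosen so far, which spreads the samples over the shape. The seeding and names are illustrative assumptions.

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Farthest point sampling (FPS), as in PointNet++'s sampling
    layer: start from a random point, then repeatedly pick the point
    farthest from the set chosen so far, for good surface coverage."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(points)))]
    dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))            # farthest remaining point
        chosen.append(nxt)
        # keep, per point, its distance to the nearest chosen sample
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.asarray(chosen)

cloud = np.random.rand(2048, 3)
idx = farthest_point_sampling(cloud, k=64)
print(idx.shape)                              # (64,)
```

The returned indices are then used as centroids for the grouping layer's neighborhood balls.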
MongeNet is proposed to replace the already established sampling technique using a triangle mesh. The model minimizes the 2-Wasserstein distance, which implies a favorable transport distance for segmented Euclidean metrics [87]. The MongeNet approximation relies on solving a convex problem, which in turn relies on Laguerre tessellation for computing the optimal distance.

Generally, solving for the resulting point cloud would consume a lot of time at a high cost, but the model proposes learning optimal point positions using a feedforward neural network with satisfactory approximation and fast calculation. The network is denoted as f_θ, where θ is the learnable parameter. The input is the triangle t with a limited number of output points such that o ∈ [a, b] and random noise n ∈ R that follows the normal distribution, and the output provided is a random order of f_θ(t, o, n) ∈ R^{3×o}. For a training set D with components [t, S], the sampled points [87] are

argmin_θ Σ_{i=1}^{N} Σ_{o=a}^{b} L( f_θ(t_i, o, n_i), S_i ). (37)

Stating W²_ε as the optimal transport and n_i ∼ N(0, 1), the loss function L is given as follows [88]:

L(t, o, n, S) = W²_ε( f_θ(t, o, n), S ) − α W²_ε( f_θ(t, o, n), f_θ(t, o, n′) ). (38)

The model overcomes drawbacks of the common mesh sampling approach used by most 3D deep-learning models, such as its proneness to erroneous sampling and clamping, which leads to noisy distance estimates. For a small additional investment in computation, MongeNet outperforms already used methods, such as the widely used random uniform sampling.

3.2.5. Spatial-Based. Spatial convolution is the direct implementation of graph-based convolution processes. The size of the standard convolution kernel being predefined, a predetermined-length neighborhood must be selected for convolution if standard convolution is performed on a graph. However, graph nodes often contain a variable number of neighbors, unlike data with a normal grid form. The graph convolution technique mimics the image convolution process and is constructed from spatial node relationships. Similar to the central pixel in normal 3 × 3 CNN filters, the representation of a center node depends on the aggregate output of its neighboring nodes.

(1) Geodesic Convolutional Neural Networks (GCNNs). The GCNN [16] model is an extension of the convolutional neural network (CNN) paradigm to non-Euclidean manifolds. A local geodesic framework of polar coordinates is used to extract "patches," which pass through a series of filters including linear and nonlinear processes. To reduce a task-specific cost function, the values of the filters with linear combination weights are the optimized variables. Utilizing the diagonal results of heat-like operators yields a variety of very good spectral shape descriptors; thus, spectral descriptors such as the heat kernel signature (HKS), the wave kernel signature (WKS), and optimal spectral descriptors (OSDs) may be utilized [16]:

(D(x)f)(ρ, θ) = ∫_X v_{ρ,θ}(x, x′) f(x′) dx′. (39)

The mapping (D(x)f)(ρ, θ) takes the values of the function f in the neighborhood of x ∈ X to the polar coordinates (ρ, θ), in which v_ρ(x, x′) are the radial interpolation weights based on the geodesic distance from x around ρ, v_θ(x, x′) represents the angular weights derived from a collection of geodesics radiating from x in the direction θ, dx′ indicates the surface element of the Riemannian measure, and v_{ρ,θ} is a localized weighting function [16]. (D(x)f)(ρ, θ) in GCNN converts the values of the function f around the node x into the regional polar coordinates (ρ, θ), hence forming the geodesic convolution [16]:

(f ∗ h)(x) = max_{Δθ ∈ [0, 2π)} ∫ h(ρ, θ + Δθ) (D(x)f)(ρ, θ) dρ dθ, (40)

where h(ρ, θ + Δθ) acts as a filter. GCNN is composed of numerous consecutively applied layers. The layers are differentiated as follows:

(1) Typically, the linear layer comes after the input layer and before the output layer to modify the input and output sizes by a linear function.
(2) The usual Euclidean convolutional layer is replaced with the geodesic convolution (GC) layer.
(3) The angular max pooling layer combines with the GC layer to estimate the optimum filter rotation [16].
(4) The FTM layer is an extra constant layer that performs the patch operation on every input dimension, followed by the rotational coordinates as well as a real-valued Fourier transform.
(5) The covariance (COV) layer is utilized in retrieval applications that need the aggregation of point-wise descriptors into something like a global shape descriptor [89].

The model was developed for uses like shape correspondence or retrieval, to learn hierarchical task-specific features on non-Euclidean manifolds. The model is extremely flexible and general, and by stacking additional layers it may be made arbitrarily complicated. By altering the local geodesic charting process, GCNN could be used for different forms of representations, such as point clouds.

(2) Anisotropic Convolutional Neural Networks (ACNNs). These networks are a generalization of standard CNNs to non-Euclidean entities in which traditional convolutions are substituted by projections over a set of centered anisotropic diffusion kernels [25]. As spatial scaling functions, anisotropic heat kernels retrieve the inherent regional representation of a function defined on the manifold.
The ACNN architecture is itself a CNN. Its patch operator design is far simpler than that of GCNN, irrespective of the manifold's subsurface diameter, and is not limited to triangular meshes. The basis of this method is the generation of regional geodesic polar coordinates utilizing a technique that may have been used for fundamental shape context descriptors [90].

ACNN translates heat kernels as regional weighting functions and builds the patch operator as follows [25]:

(D_α(x)f)(θ, t) = ( ∫_X h_{αθt}(x, y) f(y) dy ) / ( ∫_X h_{αθt}(x, y) dy ), (41)

for some anisotropy level α > 1. Here, h_{αθt}(x, y) is the anisotropic heat kernel that indicates the quantity of heat transmitted in the period t from the point x to the point y [25].

Convolution [25] can be described as

(f ∗ b)(x) = ∫ b(θ, t) (D_α(x)f)(θ, t) dt dθ. (42)

The major curvature direction is primarily used as the reference θ = 0 in ACNN's construction. The most promising future work path is the application of ACNN to graph learning. The GCNN and ACNN approaches work in spatial domains, avoiding the limitations of standard spectral approaches with varied domains; these techniques have proven to be more successful than traditional hand-crafted methods in locating deformable shapes.

Convolutional neural networks are generalized to non-Euclidean domains in the proposed model, which enables deep learning on geometric data. The work, which is currently the most generic intrinsic CNN model, continues the very recent trend of applying machine-learning techniques to computer graphics and geometry processing applications.

The provided models are fairly efficient and beneficial for a variety of reasons; therefore, their results and analyses are quite valuable for surveying their efficacy on various datasets. The accuracy and error of the previously stated models are summarized in Table 1 together with their related

"Convolutional Neural Network")), which is shown in Table 2. It describes the most advanced uses of computer vision applications of these models, including object detection, medical imaging, face detection, action and activity detection, human pose detection, network detection, and pedestrian trajectory. These applications are stated along with their specific employments.

3.2.6. Comparative Analysis of Described Models. Table 1 provides a comparative analysis of the described models in terms of numerical values, novelty, and their limitations, sorted according to the proposed year. Initially, ShapeNet [78] was proposed in 2015 in the field of non-Euclidean geometry and was successful in generalizing CNNs to learn specific features; however, it is limited to mesh features. With continuous improvements, in the same year, MVCNN [33] was proposed, which is compact and efficient with improved accuracy; however, its 3D descriptor is untested. Later, in 2017, TGNet [85] changed the linear convolution of the CNN in the feature map, but its TGConv has a very narrow dynamic range. In 2017, MoNet [69] was proposed to ensemble the CNN model, but this method does not follow segmentation and weighted categorical cross-entropy outcomes.

In the next year, SplineCNN [70] was proposed, which uses trainable continuous kernel functions and B-spline values to extract local features. However, its global behavior becomes worse along a large geodesic error. In 2019, CayleyNet [28] was proposed, which specializes in small frequency bands with few filter parameters while maintaining spatial localization. However, its bidirectional line cost is greater, and the routing method is more sophisticated. In this continuation, CurvaNet [63], proposed in 2020 with a U-Net-like hierarchical structure, is shown to exploit multiscale curvature characteristics but lacks structural regularization such as segment class topology. In this succession, MDGCN [42], initiated in 2020, utilizes dynamic graphs that are gradually refined and can accurately encode the inherent similarities between image regions.
datasets. LSTM is used which requires high computational cost. Te
A second perspective was studied, based on the Scopus introduced GC unit has lower-order approximations of
string TITLE-ABS-KEY (geometric AND deep AND spectral graph convolution.
learning, OR graph, OR manifold) AND (LIMIT-TO Again in 2020, ACSCNN [30] ofers anisotropic con-
(PUBSTAGE, “fnal”)) AND (LIMIT-TO (PUBYEAR, 2022) volution enabling a more thorough capture of the intrinsic
OR LIMIT-TO (PUBYEAR, 2021) OR LIMIT-TO (PUB- local information of signals. However, the model needs
YEAR, 2020) OR LIMIT-TO (PUBYEAR, 2019) OR LIM- shape segmentation and classifcation. In 2021, MongeNet
IT-TO (PUBYEAR, 2018) OR LIMIT-TO (PUBYEAR, 2017) [87] was proposed. Te gap between the target point cloud
OR LIMIT-TO (PUBYEAR, 2016) OR LIMIT-TO (PUB- and the sample point cloud from the mesh shrunk more
YEAR, 2015)) AND (LIMIT-TO (SUBJAREA, “COMP”)) quickly. So, at a given optimization time, the input point
AND (LIMIT-TO (EXACTKEYWORD, “Deep Learning”) cloud was represented more accurately. However, it is
OR LIMIT-TO (EXACTKEYWORD, “Geometry”) OR a costly and time-consuming computation. Tis year, UV-
LIMIT-TO (EXACTKEYWORD, “Deep Neural Networks”) Net [31] utilizes existing image and graph convolutional
OR LIMIT-TO (EXACTKEYWORD, “Convolution”) OR neural networks and can operate on B-rep data. However,
LIMIT-TO (EXACTKEYWORD, “Convolutional Neural the model did not use the B-rep curve and surface types, edge
Networks”) OR LIMIT-TO (EXACTKEYWORD, “Com- convexity, half-edge ordering, etc. Moreover, UV-grid fea-
puter Vision”) OR LIMIT-TO (EXACTKEYWORD, tures do not rotate.
Table 1: Performance summary of the described models with their corresponding datasets as well as their limitations and complexity.

Architectures | Year | Category | Datasets | Result (accuracy) | Novelty | Limitations
ShapeNet [78] | 2015 | Voxel-based | FAUST [91]; SCAPE [92]; TOSCA [93] | 4.04 (EER %); 3.34 (EER %); 5.61 (EER %) | Suggests generalizing CNN to learn hierarchical task-specific features | Experiments were limited to meshes only
MVCNN [33] | 2015 | Multiview-based | ModelNet40 [94]; SketchClean [95]; ScanNet [96] | 90.1%; 87.2%; 62.2% | Compactness, efficiency, and improved accuracy are attained by creating descriptors that are aggregations of data from many viewpoints | Building compact and discriminative 3D descriptors is untested; time-consuming, large datasets
Figure 13: Graph shows documents published by year (till 2022), and charts show published documents by type.
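The year-over-year growth percentages quoted in the discussion of Figure 13 can be recomputed directly from the bar-chart counts. A short sketch (counts transcribed from the figure; percentages truncated to whole numbers, as in the text):

```python
# Documents published per year, read off the Figure 13 bar chart.
counts = {2015: 14, 2016: 42, 2017: 91, 2018: 174,
          2019: 349, 2020: 498, 2021: 591}
years = sorted(counts)

# Truncated percentage increase relative to the previous year.
growth = [(counts[b] - counts[a]) * 100 // counts[a]
          for a, b in zip(years, years[1:])]

for year, g in zip(years[1:], growth):
    print(f"{year}: {counts[year]} documents (+{g}%)")
# 2016: +200%, 2017: +116%, 2018: +91%, 2019: +100%, 2020: +42%, 2021: +18%
```

Note that the 2016 figure works out to a 200% increase (14 to 42 documents), not 121%.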
4. Future Prospects and Challenges

These models are meant to deal with data that cannot be conveniently represented in Euclidean space and have shown exceptional performance in tasks like shape identification, chemical design, and social network analysis. These non-Euclidean deep-learning models have several promising applications, including computer vision, robotics, natural language processing, and drug discovery. As a result, the future scope of non-Euclidean deep-learning models is very promising, and research in this field is expanding at an impressive rate. In addition to computer vision, it has already been used in applications such as AI pathfinding, 3D mapping, medical diagnostics, molecular analysis, VR-based applications, and even big data classification.

The data extracted by the string are plotted as "Documents Published by Year" and "Documents by Type" in the graphs and pie chart in Figure 13.

The bar graph gives information about documents published by year (till 2022). According to the graph, the number of publications increased steadily from 2015 to 2021. In 2015, there were 14 documents published. Then, 42 were published in 2016 (a 200% increase), 91 in 2017 (a 116% increase), 174 in 2018 (a 91% increase), 349 in 2019 (a 100% increase), and 498 in 2020 (a 42% increase). Publication peaked in 2021, when 591 documents were published (an 18% increase). 311 documents were published in the first half of 2022.

From the pie chart, it is clear that conference papers were published at the highest rate, at 1126, while 890 articles were published. However, the review rate is far lower than that of articles and conference papers: there are a total of 31 reviews. Taking the pie chart as 100%, the total documents published break down as follows: conference papers 55% (1126) + articles 43% (890) + reviews 1.5% (31) = 100% (2047).

By evaluating these statistics, it may be concluded that demand for geometrically based deep learning will increase. However, it leads us to newer challenges and difficulties day after day. Some difficulties are solved, and more of these are just ahead of us.

Various usages and challenges are illustrated in Figure 14. As challenges tend to be resolved, our future will progress toward these advanced applications. Key points of the challenges are visualized and explained as follows:

(1) Computational Difficulties. Despite the success of GNNs in several disciplines, the high cost of computing remains a challenge for academics and applications. A DNN includes a large set of variables, which makes the testing and training stages computationally costly; it requires high-end hardware resources such as GPUs. GNNs exhibit the same characteristics. In addition, computation is difficult due to the complex relationships among graph nodes as well as the nongrid structure. In contrast, the great majority of existing deep-learning systems deal with normalized datasets in Euclidean space, such as a 1D or 2D grid, which can make use of the strong processing capacity of contemporary GPUs. In most instances, however, geometric data may not have a grid-like structure, necessitating other approaches for efficient and sophisticated computing. Therefore, accelerating graph neural network performance is a pressing demand.

(2) Complex Architecture. For Euclidean data, the CNN architecture has achieved remarkable progress in the area of deep learning. The model gets more sophisticated as the number of network levels grows, and, empirically, neural networks with more variables have the potential to perform well. However, stacking many GNN layers exacerbates the over-smoothing issue. The nodes in a graph are connected to their surroundings, and the graph function in a network is a stream of data dispersion and consolidation. The more layers that are stacked, the more data from nodes will be integrated, resulting in an identical picture for all nodes: each vertex converges toward the same value. For instance, the majority of complex GCN architectures include little more than three or four layers. The study [206] sought to use a more sophisticated network architecture; however, its
Figure 14: Challenges and scopes of non-Euclidean deep learning.

efficacy is insufficient. Nonetheless, in 2019, DeepGCNs [207] were able to construct a 56-layer network by leveraging the concepts of residual connections as well as dilated convolutions. Building a neural network based on a deep graph remains an interesting yet difficult challenge.

(3) Unpredictable Graph. Most known techniques are used to edit static graphs, and many datasets are also static, though many graphs change over time. In social media platforms, the loss of existing users and the addition of new users may occur at any time, and the relationships among members can vary considerably. The question of how to adequately describe the development of dynamic graphs is unresolved, which has some impact on the applicability of graph neural networks. There are several efforts to resolve this issue, such as [208–210].

(4) Scalability. It is a challenging task to apply graph neural networks to huge graphs. On the one side, each node has its own neighborhood topology, which includes the hidden layers of surrounding nodes, making it challenging to train using the batch technique [39]. On the other side, while dealing with millions of nodes and edges, the Laplacian matrix of the network is challenging for researchers to compute. There are additional approaches to increase the performance of the model by quick sampling [211, 212] and subgraph training [213, 214], although the results are not particularly impressive.

5. Conclusion

Even while deep learning has traditionally relied on Euclidean geometry, the modern era's complex structural geometry demands the use of non-Euclidean geometry. The current world is aiming to include this non-Euclidean geometry in deep-learning methods to satisfy the need for 3D complex structure analysis. Learning from

Abbreviations

Px, Py, Pz: Single points of point cloud P
ĝ: Fourier transform
ΔX g(x): Laplace–Beltrami operator
R: Reference point
g: Inverse Fourier transform
M2(X): Manifold
r: Distance between two locations
f: Smooth function of manifolds
Ln(c): Chebyshev polynomial
θ: The angle between the axes
df(x): Closest neighbor points
ĥ(c): Filter
H(t): B-spline curve
gv,z(λ): Cayley polynomial
α: Anisotropy level
Si,d: Knot vectors
Gf: Cayley filter for f signal
n(i): Node features
P(t): Control points
gv(h): Hidden node features
Bp: Product of basic functions
fs: Regularization
α(h): Multilayer perceptron
l(∅): Minimized loss
λ: Error coefficient
e(h): Differentiate center nodes and neighbors
T±: Set of layers
V: Vertices
U: Eigenvector matrix
‖pi − qj‖: Space between feature vectors
E: Edges
lθ: Linear projection
rx, ry: Image descriptors
G(V, E): Graph function
β(h): MLP (2 layers)
S: Neighbor set vertex
L: Laplacian matrix
σ: Nonlinear activation function
(ρ, θ): Neighbor polar coordinates
D: Degree matrix
C: Feature matrix of curvature
vρ(x, x′): Interpolation weight
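The over-smoothing behavior described in challenge (2) above can be made concrete with a tiny experiment: repeatedly applying a weightless GCN-style propagation step (random-walk-normalized adjacency with self-loops) drives all node features toward the same value. The 4-node graph and feature dimensions below are arbitrary toy choices, not drawn from any model in this survey:

```python
import numpy as np

# Small connected graph: edges 0-1, 0-2, 1-2, 2-3.
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
A_hat = A + np.eye(4)                            # add self-loops
P = A_hat / A_hat.sum(axis=1, keepdims=True)     # row-normalized propagation

rng = np.random.default_rng(0)
H = rng.random((4, 2))                           # initial 2-dim node features

def spread(M):
    # Max-min gap of each feature across nodes; 0 means all nodes identical.
    return M.max(axis=0) - M.min(axis=0)

for k in (1, 4, 16, 64):
    Hk = np.linalg.matrix_power(P, k) @ H        # k stacked propagation steps
    print(k, spread(Hk))                         # gap shrinks as k grows
```

After a few dozen propagation steps the per-feature spread collapses to numerical zero: every vertex carries essentially the same representation, which is exactly why very deep plain GCN stacks lose discriminative power.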
Advances in Neural Information Processing Systems, vol. 30, 2017.
[28] R. Levie, F. Monti, X. Bresson, and M. M. Bronstein, "CayleyNets: graph convolutional neural networks with complex rational spectral filters," IEEE Transactions on Signal Processing, vol. 67, no. 1, pp. 97–109, 2019.
[29] T. N. Kipf and M. Welling, "Semi-supervised classification with graph convolutional networks," in Proceedings of the 5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings, Toulon, France, April 2016.
[30] Q. Li, S. Liu, L. Hu, and X. Liu, "Shape correspondence using anisotropic Chebyshev spectral CNNs," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 14646–14655, July 2020.
[31] P. K. Jayaraman, A. Sanghi, J. G. Lambourne et al., "UV-net: learning from boundary representations," in Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, June 2021.
[32] S. Arshad, M. Shahzad, Q. Riaz, and M. M. Fraz, "DPRNet: deep 3D point based residual network for semantic segmentation and classification of 3D point clouds," IEEE Access, vol. 7, pp. 68892–68904, 2019.
[33] H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller, "Multi-view convolutional neural networks for 3D shape recognition," in 2015 IEEE International Conference on Computer Vision (ICCV), pp. 945–953, IEEE Computer Society, Washington, DC, USA, 2015.
[34] L. Wei, Q. Huang, D. Ceylan, E. Vouga, and H. Li, "Dense human body correspondences using convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1544–1553, Las Vegas, NV, USA, June 2016.
[35] Z. Wu, S. Song, A. Khosla et al., "3d shapenets: a deep representation for volumetric shapes," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1912–1920, IEEE Computer Society, Washington, DC, USA, 2015.
[36] C. R. Qi, H. Su, M. Niebner, A. Dai, M. Yan, and L. J. Guibas, "Volumetric and multi-view CNNs for object classification on 3D data," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, Washington, DC, USA, pp. 5648–5656, 2016.
[37] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, P. Martinez-Gonzalez, and J. Garcia-Rodriguez, "A survey on deep learning techniques for image and video semantic segmentation," Applied Soft Computing, vol. 70, pp. 41–65, 2018.
[38] Z. Zhang, P. Cui, and W. Zhu, "Deep learning on graphs: a survey," IEEE Transactions on Knowledge and Data Engineering, vol. 34, no. 1, pp. 249–270, 2022.
[39] J. Zhou, G. Cui, S. Hu et al., "Graph neural networks: a review of methods and applications," AI Open, vol. 1, pp. 57–81, 2020.
[40] N. A. Asif, Y. Sarker, R. K. Chakrabortty et al., "Graph neural network: a comprehensive review on non-euclidean space," IEEE Access, vol. 9, Article ID 60588, 2021.
[41] W. Cao, Z. Yan, Z. He, and Z. He, "A comprehensive survey on geometric deep learning," IEEE Access, vol. 8, Article ID 35929, 2020.
[42] S. Wan, C. Gong, P. Zhong, B. Du, L. Zhang, and J. Yang, "Multiscale dynamic graph convolutional network for hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 58, no. 5, pp. 3162–3177, 2020.
[43] J. Yu, C. Yu, C. Lin, and F. Wei, "Improved iterative closest point (ICP) point cloud registration algorithm based on matching point pair quadratic filtering," in Proceedings of the 2021 International Conference on Computer, Internet of Things and Control Engineering, CITCE 2021, pp. 1–5, Guangzhou, China, November 2021.
[44] R. B. Rusu, Z. C. Marton, N. Blodow, M. Dolha, and M. Beetz, "Towards 3D Point cloud based object maps for household environments," Robotics and Autonomous Systems, vol. 56, no. 11, pp. 927–941, 2008.
[45] Wikipedia, "B-Spline - wikipedia," 2022, https://en.wikipedia.org/wiki/B-spline.
[46] M. Ammad and A. Ramli, "Cubic B-spline curve interpolation with arbitrary derivatives on its data points," in Proceedings of the 2019 23rd International Conference in Information Visualization - Part II, pp. 156–159, Adelaide, SA, Australia, July 2019.
[47] W. Boehm, "Inserting new knots into B-spline curves," Computer-Aided Design, vol. 12, no. 4, pp. 199–201, 1980.
[48] H. Yang, W. Wang, and J. Sun, "Control point adjustment for B-spline curve approximation," Computer-Aided Design, vol. 36, no. 7, pp. 639–652, 2004.
[49] Towards Data Science, "Understanding regularization in machine learning," 2023, https://towardsdatascience.com/understanding-regularization-in-machine-learning-d7dd0729dde5.
[50] M. Xu, W. Dai, C. Li, J. Zou, H. Xiong, and P. Frossard, "Graph neural networks with lifting-based adaptive graph wavelets," IEEE Transactions on Signal and Information Processing over Networks, vol. 8, pp. 63–77, 2022.
[51] W. T. Tutte, Graph Theory, Vol. 21, Cambridge University Press, Cambridge, UK, 2001.
[52] T. Wang, B. Li, J. Zou, F. Sun, and Z. Zhang, "New bounds of the Nordhaus-Gaddum type of the Laplacian matrix of graphs," in Proceedings of the 2012 8th International Conference on Computational Intelligence and Security, CIS 2012, pp. 411–414, Guangzhou, China, November 2012.
[53] R. B. Bapat, S. J. Kirkland, and S. Pati, "The perturbed laplacian matrix of a graph," Linear and Multilinear Algebra, vol. 49, no. 3, pp. 219–242, 2001.
[54] J. Domingos and J. M. F. Moura, "Graph fourier transform: a stable approximation," IEEE Transactions on Signal Processing, vol. 68, pp. 4422–4437, 2020.
[55] M. Niepert, M. Ahmed, and K. Kutzkov, "Learning convolutional neural networks for graphs," 2023, https://proceedings.mlr.press/v48/niepert16.html.
[56] S. Fiori, "Manifold calculus in system theory and control—fundamentals and first-order systems," Symmetry, vol. 13, no. 11, p. 2092, 2021.
[57] F. Demengel and J. M. Ghidaglia, "Inertial manifolds for partial differential evolution equations under time-discretization: existence, convergence, and applications," Journal of Mathematical Analysis and Applications, vol. 155, no. 1, pp. 177–225, 1991.
[58] Analytics Vidhya, "What are Graph Neural Networks, and how do they work?," 2022, https://www.analyticsvidhya.com/blog/2022/03/what-are-graph-neural-networks-and-how-do-they-work/.
[59] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: extending high-dimensional data analysis to
networks and other irregular domains," IEEE Signal Processing Magazine, vol. 30, no. 3, pp. 83–98, 2013.
[60] Y. Ma, J. Hao, Y. Yang, H. Li, J. Jin, and G. Chen, "Spectral-based Graph Convolutional Network for Directed Graphs," 2019, https://arxiv.org/abs/1907.08990.
[61] K. Kochetov and E. Putin, "SpecNN: the specifying neural network," in Proceedings of the 2016 International Symposium on INnovations in Intelligent SysTems and Applications, INISTA 2016, Sinaia, Romania, August 2016.
[62] Z. L. Wang, X. O. Song, and X. R. Wang, "Spectrum sensing detection algorithm based on eigenvalue variance," in Proceedings of the 2019 IEEE 8th Joint International Information Technology and Artificial Intelligence Conference, ITAIC 2019, pp. 1656–1659, Chongqing, China, May 2019.
[63] A. A. M. Muzahid, W. Wan, F. Sohel, L. Wu, and L. Hou, "CurveNet: curvature-based multitask learning deep networks for 3D object recognition," IEEE/CAA Journal of Automatica Sinica, vol. 8, no. 6, pp. 1177–1187, 2021.
[64] W. Zhang, P. Tang, L. Zhao, and Q. Huang, "A comparative study of U-nets with various convolution components for building extraction," in Proceedings of the 2019 Joint Urban Remote Sensing Event (JURSE) 2019, Vannes, France, May 2019.
[65] B. Vallet and B. Lévy, "Spectral geometry processing with manifold harmonics," Computer Graphics Forum, vol. 27, no. 2, pp. 251–260, 2008.
[66] M. Andreux, E. Rodolá, M. Aubry, and D. Cremers, "Anisotropic laplace-beltrami operators for shape analysis," Lecture Notes in Computer Science, vol. 8928, pp. 299–312, 2015.
[67] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, "Spectral networks and locally connected networks on graphs," in Proceedings of the 2nd International Conference on Learning Representations, ICLR 2014, Scottsdale, AZ, USA, December 2013.
[68] D. K. Hammond, P. Vandergheynst, and R. Gribonval, "Wavelets on graphs via spectral graph theory," Applied and Computational Harmonic Analysis, vol. 30, no. 2, pp. 129–150, 2011.
[69] F. Monti, D. Boscaini, J. Masci, E. Rodola, J. Svoboda, and M. M. Bronstein, "Geometric deep learning on graphs and manifolds using mixture model cnns," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5115–5124, Hangzhou, China, December 2017.
[70] M. Fey, J. E. Lenssen, F. Weichert, and H. Müller, "Splinecnn: fast geometric deep learning with continuous b-spline kernels," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 869–877, IEEE Computer Society, Washington, DC, USA, 2018.
[71] L. Xiao, X. Wu, and G. Wang, "Social network analysis based on graph SAGE," in Proceedings of the 2019 12th International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, China, December 2019.
[72] B. Zhang, H. Zeng, and V. Prasanna, "Hardware acceleration of large scale GCN inference," in Proceedings of the 2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP), Manchester, UK, July 2020.
[73] X. Gao and H. Xiong, "A hybrid wavelet convolution network with sparse-coding for image super-resolution," in Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP) 2016, pp. 1439–1443, Phoenix, AZ, USA, August 2016.
[74] K. Lazri, A. Blin, J. Sopena, and G. Muller, "Toward an in-kernel high performance key-value store implementation," in Proceedings of the 2019 38th Symposium on Reliable Distributed Systems (SRDS), p. 268, Lyon, France, October 2019.
[75] T. Sederberg, "Computer aided geometric design," 2012, https://scholarsarchive.byu.edu/facpub/1.
[76] P. Veličković, A. Casanova, P. Liò, G. Cucurull, A. Romero, and Y. Bengio, "Graph attention networks," in Proceedings of the 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 2018.
[77] D. Xu, D. Zheng, F. Chen, E. Korpeoglu, S. Kumar, and K. Achan, "Studying the effect of carrier type on the perception of vocoded stimuli via mismatch negativity," in Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 2019, pp. 3167–3170, Berlin, Germany, July 2019.
[78] J. Masci, D. Boscaini, M. Bronstein, and P. Vandergheynst, "ShapeNet: Convolutional Neural Networks on Non-euclidean Manifolds," 2015, http://infoscience.epfl.ch/record/204949.
[79] D. Chicco, "Siamese neural networks: an overview," Artificial Neural Networks, Methods in Molecular Biology, Vol. 2190, Humana, New York, NY, USA, 2021.
[80] J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek, "Image classification with the Fisher vector: theory and practice," International Journal of Computer Vision, vol. 105, no. 3, pp. 222–245, 2013.
[81] J. Donahue, Y. Jia, O. Vinyals et al., "Decaf: a deep convolutional activation feature for generic visual recognition," in Proceedings of the International Conference on Machine Learning, PMLR, pp. 647–655, Beijing, China, June 2014.
[82] ModelNet, "Princeton ModelNet," 2022, https://modelnet.cs.princeton.edu/.
[83] A. Nguyen and B. Le, "3D point cloud segmentation: a survey," in Proceedings of the 2013 6th IEEE Conference on Robotics, Automation and Mechatronics (RAM), pp. 225–230, Manila, Philippines, November 2013.
[84] Y. Ishikawa, R. Hachiuma, N. Ienaga, W. Kuno, Y. Sugiura, and H. Saito, "Semantic segmentation of 3D point cloud to virtually manipulate real living space," in Proceedings of the 2019 12th Asia Pacific Workshop on Mixed and Augmented Reality, APMAR 2019, Ikoma, Japan, March 2019.
[85] Y. Li, L. Ma, Z. Zhong, D. Cao, and J. Li, "TGNet: geometric graph CNN on 3-D point cloud segmentation," IEEE Transactions on Geoscience and Remote Sensing, vol. 58, no. 5, pp. 3588–3600, 2020.
[86] M. Yuan, Y. Chen, H. Guan, S. Shi, X. Wei, and B. Li, "An improved analytical probabilistic load flow method with GMM models of correlated der Generation," in Proceedings of the 2020 IEEE Sustainable Power and Energy Conference (iSPEC), pp. 73–78, Chengdu, China, November 2020.
[87] L. Lebrat, R. S. Cruz, C. Fookes, and O. Salvado, "MongeNet: efficient sampler for geometric deep learning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, Washington, DC, USA, Article ID 16664, 2021.
[88] K. Jothiramalingam, N. R. Dhineshbabu, and S. R. Srither, "Study the Complex Dielectric and Energy loss function of Graphene oxide nanostructures using linear optical constant," in Proceedings of the 2018 Conference on Emerging Devices and Smart Systems (ICEDSS), pp. 209–212, Tiruchengode, India, March 2018.
[89] O. Tuzel, F. Porikli, and P. Meer, "Region covariance: a fast descriptor for detection and classification," Lecture Notes in Computer Science, vol. 3952, pp. 589–600, Springer, Berlin, Germany, 2006.
[90] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, "SLIC superpixels compared to state-of-the-art superpixel methods," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2274–2282, 2012.
[91] Faust Dataset, "MPI Faust Dataset," 2023, https://faust-leaderboard.is.tuebingen.mpg.de/.
[92] Stanford, "Drago Anguelov," 2023, http://ai.stanford.edu/drago/Projects/projects.html.
[93] Tosca, "OpenDataLab," 2023, https://opendatalab.com/TOSCA.
[94] ModelNet, "Princeton ModelNet," 2023, https://modelnet.cs.princeton.edu/.
[95] R. G. Schneider and T. Tuytelaars, "Sketch classification and classification-driven analysis using Fisher vectors," ACM Transactions on Graphics, vol. 33, no. 6, pp. 1–9, 2014.
[96] D. Angela, X. C. Angel, S. Manolis, H. Maciej, F. Thomas, and N. Matthias, "ScanNet | richly-annotated 3D reconstructions of indoor scenes," 2023, http://www.scan-net.org/.
[97] Stanford, "S3DIS dataset | papers with code," 2023, https://paperswithcode.com/dataset/s3dis.
[98] X. Roynard, J. E. Deschaud, and F. Goulette, "Paris-Lille-3D: a large and high-quality ground-truth urban point cloud dataset for automatic segmentation and classification," The International Journal of Robotics Research, vol. 37, no. 6, pp. 545–557, 2018.
[99] Yann, "MNIST handwritten digit database, Yann LeCun, Corinna Cortes and Chris Burges," 2023, http://yann.lecun.com/exdb/mnist/.
[100] Cora, "Dataset," 2023, https://relational.fit.cvut.cz/dataset/CORA.
[101] NCBI, "PubMed," 2023, https://pubmed.ncbi.nlm.nih.gov/.
[102] M. Graña, M. A. Veganzons, and B. Ayerdi, "Hyperspectral Remote sensing scenes - grupo de Inteligencia computacional (GIC)," 2023, https://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes.
[103] Pavia University, "V7 Open Datasets," 2023, https://www.v7labs.com/open-datasets/pavia-university.
[104] Kennedy Space Center, "V7 Open Datasets," 2023, https://www.v7labs.com/open-datasets/kennedy-space-center.
[105] Z. Zhang, P. Jaiswal, and R. Rai, "FeatureNet: machining feature recognition based on 3D convolution neural network," Computer-Aided Design, vol. 101, pp. 12–22, 2018.
[106] A. Angrish, E. P. Fitts, B. Craver, and B. Starly, ""FabSearch": a 3D CAD model based search engine for sourcing manufacturing services," 2018, https://arxiv.org/abs/1809.06329.
[107] W. Cao, T. Robinson, Y. Hua, F. Boussuge, A. R. Colligan, and W. Pan, "Graph representation of 3D CAD models for machining feature recognition with deep learning," in Proceedings of the ASME Design Engineering Technical Conference, pp. 11A–2020A, St Louis, Missouri, USA, November 2020.
[108] S. Koch, A. Matveev, Z. Jiang et al., "ABC: a Big CAD model dataset for geometric deep learning," in Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9593–9603, Long Beach, CA, USA, June 2019.
[109] Y. Wang, T. Zhou, Z. Li, H. Huang, and B. Qu, "Salient object detection based on multi-feature graphs and improved manifold ranking," Multimedia Tools and Applications, vol. 81, Article ID 27551, 2022.
[110] P. Zhang, G. Chen, Y. Li, and W. Dong, "Agile Formation control of drone flocking enhanced with active vision-based relative localization," IEEE Robotics and Automation Letters, vol. 7, no. 3, pp. 6359–6366, 2022.
[111] T. Hempel and A. Al-Hamadi, "An online semantic mapping system for extending and enhancing visual SLAM," Engineering Applications of Artificial Intelligence, vol. 111, Article ID 104830, 2022.
[112] P. Antonante, V. Tzoumas, H. Yang, and L. Carlone, "Outlier-robust estimation: hardness, minimally tuned algorithms, and applications," IEEE Transactions on Robotics, vol. 38, no. 1, pp. 281–301, 2022.
[113] P. Dhiman, V. Kukreja, P. Manoharan et al., "A novel deep learning model for detection of severity level of the disease in citrus fruits," Electronics, vol. 11, p. 495, 2022.
[114] L. Jiang, W. Nie, J. Zhu, X. Gao, and B. Lei, "Lightweight object detection network model suitable for indoor mobile robots," Journal of Mechanical Science and Technology, vol. 36, no. 2, pp. 907–920, 2022.
[115] M. A. Ansari, M. Meraz, P. Chakraborty, and M. Javed, "Angle-based feature learning in GNN for 3D object detection using point cloud," Lecture Notes in Electrical Engineering, vol. 858, pp. 419–432, 2022.
[116] X. Liu, X. Jing, Z. Zheng, W. Du, X. Ding, and Q. Zhu, "A symmetric fusion learning model for detecting visual relations and scene parsing," Scientific Programming, vol. 2022, pp. 1–9, 2022.
[117] P. Jiang, X. Deng, L. Wang, Z. Chen, and S. Zhang, "Hypergraph representation for detecting 3D objects from noisy point clouds," IEEE Transactions on Knowledge and Data Engineering, vol. 1, 2022.
[118] C. Xia, X. Gao, X. Fang, K. C. Li, S. Su, and H. Zhang, "RLP-AGMC: robust label propagation for saliency detection based on an adaptive graph with multiview connections," Signal Processing: Image Communication, vol. 98, Article ID 116372, 2021.
[119] R. Ji, Z. Liu, L. Zhang et al., "Multi-peak graph-based multi-instance learning for weakly supervised object detection," ACM Transactions on Multimedia Computing, Communications, and Applications, vol. 17, no. 2, pp. 1–21, 2021.
[120] H. Wang, C. Zhu, J. Shen, Z. Zhang, and X. Shi, "Salient object detection by robust foreground and background seed selection," Computers & Electrical Engineering, vol. 90, Article ID 106993, 2021.
[121] K. Meier, S. J. Chung, and S. Hutchinson, "River segmentation for autonomous surface vehicle localization and river boundary mapping," Journal of Field Robotics, vol. 38, no. 2, pp. 192–211, 2021.
[122] N. Gothoskar, M. Cusumano-Towner, B. Zinberg et al., "3DP3: 3D scene perception via probabilistic programming," Advances in Neural Information Processing Systems, vol. 34, pp. 9600–9612, 2021.
[123] B. Mao and B. Li, "City object detection from airborne Lidar data with OpenStreetMap-tagged superpixels," Concurrency and Computation: Practice and Experience, vol. 32, no. 23, Article ID e6026, 2020.
[124] B. Santra, A. K. Shaw, and D. P. Mukherjee, "Graph-based non-maximal suppression for detecting products on the rack," Pattern Recognition Letters, vol. 140, pp. 73–80, 2020.
[125] R. Ren, H. Fu, X. Hu, H. Xue, X. Li, and M. Wu, “Towards efficient and robust LiDAR-based 3D mapping in urban environments,” in Proceedings of the 2020 3rd International Conference on Unmanned Systems (ICUS), pp. 511–516, Harbin, China, November 2020.
[126] J. Poschmann, T. Pfeifer, and P. Protzel, “Factor graph based 3D multi-object tracking in point clouds,” in Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Article ID 10343, Las Vegas, NV, USA, October 2020.
[127] M. Zhang, J. Chen, P. Li, M. Jiang, and Z. Zhou, “Topic scene graphs for image captioning,” IET Computer Vision, vol. 16, no. 4, pp. 364–375, 2022.
[128] X. Li, B. Wu, J. Song, L. Gao, P. Zeng, and C. Gan, “Text-instance graph: exploring the relational semantics for text-based visual question answering,” Pattern Recognition, vol. 124, Article ID 108455, 2022.
[129] W. Gao, L. Yu, Y. Du, and S. Lu, “Single image 3D reconstruction based on attention mechanism and graph convolution network,” in Proceedings of the 8th International Conference on Computing and Artificial Intelligence, vol. 18, pp. 670–676, Tianjin, China, March 2022.
[130] B. Guldur Erkal and J. F. Hajjar, “Using extracted member properties for laser-based surface damage detection and quantification,” Structural Control and Health Monitoring, vol. 27, no. 11, Article ID e2616, 2020.
[131] J. Li, J. Yi, L. Hua et al., “Development and validation of a predictive score for venous thromboembolism in newly diagnosed non-small cell lung cancer,” Thrombosis Research, vol. 208, pp. 45–51, 2021.
[132] N. Yang, R. Wang, J. Stuckler, and D. Cremers, “Deep virtual stereo odometry: leveraging deep depth prediction for monocular direct sparse odometry,” in Proceedings of the European Conference on Computer Vision (ECCV), pp. 817–833, Munich, Germany, September 2018.
[133] T. Graph, “Graph convolutional enhanced discriminative broad learning system for hyperspectral image classification,” IEEE Access, vol. 10, Article ID 90299, 2022.
[134] M. Zeng, Y. Cai, X. Liu, Z. Cai, and X. Li, “Spectral-spatial clustering of hyperspectral image based on Laplacian regularized deep subspace clustering,” in Proceedings of the IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, pp. 2694–2697, Yokohama, Japan, July 2019.
[135] H. Wu, Y. Yan, Y. Ye, M. K. Ng, and Q. Wu, “Geometric knowledge embedding for unsupervised domain adaptation,” Knowledge-Based Systems, vol. 191, Article ID 105155, 2020.
[136] Q. Yang, Y. Xu, Z. Wu, and Z. Wei, “Hyperspectral and multispectral image fusion based on deep attention network,” in Proceedings of the 2019 10th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS), Amsterdam, Netherlands, September 2019.
[137] H. Fang, A. Li, T. Wang, H. Chang, and H. Xu, “Hyperspectral unmixing using graph-regularized and sparsity-constrained deep NMF,” in Proceedings Volume 10846, Optical Sensing and Imaging Technologies and Applications, pp. 664–672, Beijing, China, December 2018.
[138] Z. Lin, F. Zhu, Q. Wang et al., “RSSGG CS: remote sensing image scene graph generation by fusing contextual information and statistical knowledge,” Remote Sensing, vol. 14, p. 3118, 2022.
[139] A. Bessadok, M. A. Mahjoub, and I. Rekik, “Brain graph synthesis by dual adversarial domain alignment and target graph prediction from a source graph,” Medical Image Analysis, vol. 68, Article ID 101902, 2021.
[140] F. P. Kalaganis, N. A. Laskaris, E. Chatzilari, S. Nikolopoulos, and I. Kompatsiaris, “A data augmentation scheme for geometric deep learning in personalized brain–computer interfaces,” IEEE Access, vol. 8, Article ID 162218, 2020.
[141] M. B. Gurbuz and I. Rekik, “Deep graph normalizer: a geometric deep learning approach for estimating connectional brain templates,” Lecture Notes in Computer Science, vol. 12267, pp. 155–165, Springer Cham, New York, NY, USA, 2020.
[142] V. Vosylius, A. Wang, C. Waters et al., “Geometric deep learning for post-menstrual age prediction based on the neonatal white matter cortical surface,” Lecture Notes in Computer Science, vol. 12443, pp. 174–186, Springer Cham, New York, NY, USA, 2020.
[143] Z. Huang, H. Cai, T. Dan, Y. Lin, P. Laurienti, and G. Wu, “Detecting brain state changes by geometric deep learning of functional dynamics on Riemannian manifold,” Lecture Notes in Computer Science, vol. 12907, pp. 543–552, Springer Cham, New York, NY, USA, 2021.
[144] A. Bessadok, M. A. Mahjoub, and I. Rekik, “Dual adversarial connectomic domain alignment for predicting isomorphic brain graph from a baseline graph,” Lecture Notes in Computer Science, vol. 11767, pp. 465–474, Springer Cham, New York, NY, USA, 2019.
[145] Z. Zhen, Y. Chen, I. Segovia-Dominguez, and Y. R. Gel, “Tlife-GDN: detecting and forecasting spatio-temporal anomalies via persistent homology and geometric deep learning,” Lecture Notes in Computer Science, vol. 13281, pp. 511–525, Springer Cham, New York, NY, USA, 2022.
[146] L. Hansen, D. Dittmer, and M. P. Heinrich, “Learning deformable point set registration with regularized dynamic graph CNNs for large lung motion in COPD patients,” Lecture Notes in Computer Science, vol. 11849, pp. 53–61, Springer Cham, New York, NY, USA, 2019.
[147] V. C. Gogineni, S. R. E. Langberg, V. Naumova et al., “Data-driven personalized cervical cancer risk prediction: a graph-perspective,” in Proceedings of the 2021 IEEE Statistical Signal Processing Workshop (SSP), pp. 46–50, Rio de Janeiro, Brazil, July 2021.
[148] B. Das, M. Kutsal, and R. Das, “A geometric deep learning model for display and prediction of potential drug-virus interactions against SARS-CoV-2,” Chemometrics and Intelligent Laboratory Systems, vol. 229, Article ID 104640, 2022.
[149] T. Dissanayake, T. Fernando, S. Denman, S. Sridharan, and C. Fookes, “Geometric deep learning for subject independent epileptic seizure prediction using scalp EEG signals,” IEEE Journal of Biomedical and Health Informatics, vol. 26, no. 2, pp. 527–538, 2022.
[150] L. Z. J. Williams, A. Fawaz, M. F. Glasser, A. D. Edwards, and E. C. Robinson, “Geometric deep learning of the human connectome project multimodal cortical parcellation,” Lecture Notes in Computer Science, vol. 13001, pp. 103–112, Springer Cham, New York, NY, USA, 2021.
[151] B. Zhao, X. Zhang, Z. Zhan, Q. Wu, and H. Zhang, “Multiscale graph-guided convolutional network with node attention for intelligent health state diagnosis of a 3-PRR planar parallel manipulator,” IEEE Transactions on Industrial Electronics, vol. 69, no. 11, Article ID 11733, 2022.
[152] F. Bagheri, M. J. Tarokh, and M. Ziaratban, “Skin lesion segmentation from dermoscopic images by using Mask R-
CNN, Retina-Deeplab, and graph-based methods,” Biomedical Signal Processing and Control, vol. 67, Article ID 102533, 2021.
[153] L. Yang, X. Zhan, D. Chen, J. Yan, C. C. Loy, and D. Lin, “Learning to cluster faces on an affinity graph,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2298–2306, Long Beach, CA, USA, June 2019.
[154] X. Wu, X. Ma, J. Zhang, and Z. Jin, “Salient object detection via reliable boundary seeds and saliency refinement,” IET Computer Vision, vol. 13, no. 3, pp. 302–311, 2019.
[155] F. Albalas, A. Alzu’Bi, A. Alguzo, T. Al-Hadhrami, and A. Othman, “Learning discriminant spatial features with deep graph-based convolutions for occluded face detection,” IEEE Access, vol. 10, pp. 35162–35171, 2022.
[156] N. Dall’Asen, Y. Wang, H. Tang, L. Zanella, and E. Ricci, “Graph-based generative face anonymisation with pose preservation,” Lecture Notes in Computer Science, vol. 13232, pp. 503–515, Springer Cham, New York, NY, USA, 2022.
[157] S. Nayak, A. Routray, M. Sarma, and S. Uttarkabat, “GNN based embedded framework for consumer affect recognition using thermal facial ROIs,” IEEE Consumer Electronics Magazine, 2022.
[158] P. Karuppanan and K. Dhanalakshmi, “Criminal persons recognition using improved feature extraction based local phase quantization,” Intelligent Automation & Soft Computing, vol. 33, no. 2, pp. 1025–1043, 2022.
[159] Z. Shao, Y. Zhou, B. Liu, H. Zhu, W. L. Du, and J. Zhao, “Facial action unit detection via hybrid relational reasoning,” The Visual Computer, vol. 38, no. 9, pp. 3045–3057, 2022.
[160] A. Alguzo, A. Alzu’Bi, and F. Albalas, “Masked face detection using multi-graph convolutional networks,” in Proceedings of the 2021 12th International Conference on Information and Communication Systems (ICICS), pp. 385–391, Valencia, Spain, May 2021.
[161] V. B. Kamath, S. S. Pai, S. S. Poojary, A. Rastogi, and K. S. Srinivas, “Drunkenness face detection using graph neural networks,” in Proceedings of the CSITSS 2021 - 5th International Conference on Computational Systems and Information Technology for Sustainable Solutions, Bangalore, India, December 2021.
[162] Z. Lu, C. Zhou, X. Xuyang, and W. Zhang, “Face detection and recognition method based on improved convolutional neural network,” International Journal of Circuits, Systems and Signal Processing, vol. 15, pp. 774–781, 2021.
[163] J. Bai, W. Yu, Z. Xiao et al., “Two-stream spatial-temporal graph convolutional networks for driver drowsiness detection,” IEEE Transactions on Cybernetics, vol. 52, no. 12, Article ID 13821, 2022.
[164] Z. Wen, “Large-scale face clustering method research based on deep learning,” in Proceedings of the 2021 3rd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI), pp. 731–734, Taiyuan, China, December 2021.
[165] Z. Liu, J. Dong, C. Zhang, L. Wang, and J. Dang, “Relation modeling with graph convolutional networks for facial action unit detection,” Lecture Notes in Computer Science, vol. 11962, pp. 489–501, Springer Cham, New York, NY, USA, 2020.
[166] X. Xu, Z. Ruan, and L. Yang, “Facial expression recognition based on graph neural network,” in Proceedings of the 2020 IEEE 5th International Conference on Image, Vision and Computing (ICIVC), pp. 211–214, Beijing, China, July 2020.
[167] B. Zhang, J. Wan, Y. Zhao, Z. Tong, and Y. Du, “Multi-actor activity detection by modeling object relationships in extended videos based on deep learning,” Engineering Applications of Artificial Intelligence, vol. 114, Article ID 105055, 2022.
[168] Y. Han, T. Zhuo, P. Zhang et al., “One-shot video graph generation for explainable action reasoning,” Neurocomputing, vol. 488, pp. 212–225, 2022.
[169] J. N. Zaech, A. Liniger, D. Dai, M. Danelljan, and L. van Gool, “Learnable online graph representations for 3D multi-object tracking,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 5103–5110, 2022.
[170] B. Xie, Y. Deng, Z. Shao, H. Liu, and Y. Li, “VMV-GCN: volumetric multi-view based graph CNN for event stream classification,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 1976–1983, 2022.
[171] Z. Xiong, H. Ma, Y. Wang, T. Hu, and Q. Liao, “LiDAR-based 3D video object detection with foreground context modeling and spatiotemporal graph reasoning,” in Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), pp. 2994–3001, Indianapolis, IN, USA, September 2021.
[172] J. Zhan, H. Zhao, P. Zheng, H. Wu, and L. Wang, “Salient superpixel visual tracking with graph model and iterative segmentation,” Cognitive Computation, vol. 13, no. 4, pp. 821–832, 2019.
[173] N. Gkalelis, A. Goulas, D. Galanopoulos, and V. Mezaris, “ObjectGraphs: using objects and a graph convolutional network for the bottom-up recognition and explanation of events in video,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3375–3383, Nashville, TN, USA, June 2021.
[174] W. Zhang and F. Meyer, “Graph-based multiobject tracking with embedded particle flow,” in Proceedings of the 2021 IEEE Radar Conference (RadarConf21), Atlanta, GA, USA, May 2021.
[175] G. Jung, J. Lee, and I. Kim, “Tracklet pair proposal and context reasoning for video scene graph generation,” Sensors, vol. 21, p. 3164, 2021.
[176] M. Tomei, L. Baraldi, S. Calderara, S. Bronzin, and R. Cucchiara, “Video action detection by learning graph-based spatio-temporal interactions,” Computer Vision and Image Understanding, vol. 206, Article ID 103187, 2021.
[177] Y. Yin, X. Feng, and H. Wu, “Learning for graph matching based multi-object tracking in auto driving,” Journal of Physics: Conference Series, vol. 1871, no. 1, Article ID 012152, 2021.
[178] S. A. Kazakova, P. A. Leonteva, M. I. Frolova, J. V. Donetskaya, I. Y. Popov, and A. Y. Kuznetsov, “A study of human motion in computer vision systems based on a skeletal model,” Scientific and Technical Journal of Information Technologies, Mechanics and Optics, vol. 21, no. 4, pp. 571–577, 2021.
[179] Y. M. Seo and Y. S. Choi, “Graph convolutional networks for skeleton-based action recognition with LSTM using tool-information,” in Proceedings of the 36th Annual ACM Symposium on Applied Computing, pp. 986–993, Virtual Event, Korea, March 2021.
[180] M. Xu, B. Liu, P. Fu, J. Li, Y. H. Hu, and S. Feng, “Video salient object detection via robust seeds extraction and multi-graphs manifold propagation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 7, pp. 2191–2206, 2019.
[181] M. Xu, B. Liu, P. Fu, J. Li, and Y. H. Hu, “Video saliency detection via graph clustering with motion energy and
spatiotemporal objectness,” IEEE Transactions on Multimedia, vol. 21, no. 11, pp. 2790–2805, 2019.
[182] Y. Zhang, H. Sheng, Y. Wu, S. Wang, W. Ke, and Z. Xiong, “Multiplex labeling graph for near-online tracking in crowded scenes,” IEEE Internet of Things Journal, vol. 7, no. 9, pp. 7892–7902, 2020.
[183] Z. Su, Y. Wang, Q. Xie, and R. Yu, “Pose graph parsing network for human-object interaction detection,” Neurocomputing, vol. 476, pp. 53–62, 2022.
[184] L. He and J. Zhang, “Railway driver behavior recognition system based on deep learning algorithm,” in Proceedings of the 2021 4th International Conference on Artificial Intelligence and Big Data (ICAIBD), pp. 398–403, Chengdu, China, May 2021.
[185] Y. Tang, B. Wang, W. He, and F. Qian, “PointDet: an object detection framework based on human local features in the task of identifying violations,” in Proceedings of the 2021 11th International Conference on Information Science and Technology (ICIST), pp. 673–680, Chengdu, China, May 2021.
[186] S. Zheng, S. Chen, and Q. Jin, “Skeleton-based interactive graph network for human object interaction detection,” in Proceedings of the 2020 IEEE International Conference on Multimedia and Expo (ICME), London, UK, July 2020.
[187] G. Ning, J. Pei, and H. Heng, “LightTrack: a generic framework for online top-down human pose tracking,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 1034-1035, Seattle, WA, USA, June 2020.
[188] J. Lin and G. H. Lee, “Learning spatial context with graph neural network for multi-person pose grouping,” in Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 4230–4236, Xi’an, China, May 2021.
[189] T. Cui, W. Song, G. An, and Q. Ruan, “Prototype generation based shift graph convolutional network for semi-supervised anomaly detection,” Communications in Computer and Information Science, vol. 1480, pp. 159–169, Springer, Singapore, 2021.
[190] C. Zhang and H. He, “Research on pose recognition algorithm for sports players based on machine learning of sensor data,” Security and Communication Networks, vol. 2021, 2021.
[191] D. Yang and Y. Zou, “A graph-based interactive reasoning for human-object interaction detection,” in Proceedings of the IJCAI International Joint Conference on Artificial Intelligence, pp. 1111–1117, Messe Wien, Vienna, Austria, January 2020.
[192] P. R. K. Pranav, S. Verma, S. Shenoy, and S. Saravanan, “Detection of botnets in IoT networks using graph theory and machine learning,” in Proceedings of the 2022 6th International Conference on Trends in Electronics and Informatics (ICOEI), pp. 590–597, Tirunelveli, India, April 2022.
[193] I. Papakis, A. Sarkar, and A. Karpatne, “A graph convolutional neural network based approach for traffic monitoring using augmented detections with optical flow,” in Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), pp. 2980–2986, Indianapolis, IN, USA, September 2021.
[194] F. Zhou, S. Chen, J. Wu, C. Cao, and S. Zhang, “Trajectory-user linking via graph neural network,” in Proceedings of the ICC 2021 - IEEE International Conference on Communications, Montreal, QC, Canada, June 2021.
[195] A. A. Liu, Y. Wang, N. Xu, S. Liu, and X. Li, “Scene-graph-guided message passing network for dense captioning,” Pattern Recognition Letters, vol. 145, pp. 187–193, 2021.
[196] I. U. Hewapathirana, D. Lee, E. Moltchanova, and J. McLeod, “Change detection in noisy dynamic networks: a spectral embedding approach,” Social Network Analysis and Mining, vol. 10, no. 1, pp. 14–22, 2020.
[197] A. Paul, M. Shamsul Arefin, and R. Karim, “Developing a framework for analyzing heterogeneous data from social networks,” International Journal of Advanced Computer Science and Applications (IJACSA), vol. 10, 2019.
[198] Z. Yuan, Q. Yuan, and J. Wu, “Phishing detection on Ethereum via learning representation of transaction subgraphs,” Communications in Computer and Information Science, vol. 1267, pp. 178–191, 2020.
[199] Y. Ren, B. Wang, J. Zhang, and Y. Chang, “Adversarial active learning based heterogeneous graph neural network for fake news detection,” in Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM), pp. 452–461, Sorrento, Italy, November 2020.
[200] D. Zhu, Y. Ma, and Y. Liu, “A flexible attentive temporal graph networks for anomaly detection in dynamic networks,” in Proceedings of the 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), pp. 870–875, Guangzhou, China, December 2020.
[201] R. Wen, J. Wang, C. Wu, and J. Xiong, “ASA: adversary situation awareness via heterogeneous graph convolutional networks,” in Proceedings of the Web Conference 2020 - Companion of the World Wide Web Conference (WWW), pp. 674–678, Taipei, Taiwan, April 2020.
[202] C. Shen, X. Zhao, X. Fan et al., “Multi-receptive field graph convolutional neural networks for pedestrian detection,” IET Intelligent Transport Systems, vol. 13, no. 9, pp. 1319–1328, 2019.
[203] J. Xie, Y. Pang, H. Cholakkal, R. Anwer, F. Khan, and L. Shao, “PSC-Net: learning part spatial co-occurrence for occluded pedestrian detection,” Science China Information Sciences, vol. 64, no. 2, Article ID 120103, 2020.
[204] A. K. F. Lui, Y. H. Chan, and M. F. Leung, “Modelling of destinations for data-driven pedestrian trajectory prediction in public buildings,” in Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), pp. 1709–1717, Orlando, FL, USA, December 2021.
[205] A. K. F. Lui, Y. H. Chan, and M. F. Leung, “Modelling of pedestrian movements near an amenity in walkways of public buildings,” in Proceedings of the 2022 8th International Conference on Control, Automation and Robotics (ICCAR), pp. 394–400, Xiamen, China, April 2022.
[206] Q. Li, Z. Han, and X. M. Wu, “Deeper insights into graph convolutional networks for semi-supervised learning,” in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, February 2018.
[207] G. Li, M. Muller, A. Thabet, and B. Ghanem, “DeepGCNs: can GCNs go as deep as CNNs?” in Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9267–9276, Seoul, Korea, October 2019.
[208] M. Looks, M. Herreshoff, D. L. Hutchins, and P. Norvig, “Deep learning with dynamic computation graphs,” in Proceedings of the 5th International Conference on Learning Representations (ICLR), Toulon, France, February 2017.
[209] Y. Ma, Z. Guo, Z. Ren, J. Tang, and D. Yin, “Streaming graph neural networks,” in Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in