
IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 22, NO. 9, SEPTEMBER 2023

GGCNN: An Efficiency-Maximizing Gated Graph Convolutional Neural Network Architecture for Automatic Modulation Identification

Pejman Ghasemzadeh, Graduate Student Member, IEEE, Michael Hempel, Member, IEEE, Honggang Wang, Fellow, IEEE, and Hamid Sharif, Fellow, IEEE

Abstract—Automatic modulation identification (AMI) is a technique to detect the modulation type and order of a received signal, which has the potential to enhance cognitive radio capabilities for future generations of communication devices. However, AMI classifiers traditionally have exhibited low efficiency in low signal-to-noise ratio (SNR) environments. Hence, to address this problem we present our novel Gated Graph Convolutional Neural Network (GGCNN) classifier for feature-based AMI. This architecture includes a robust feature extraction stage to extract deep correlative patterns from the received symbols. Not only does this feature extraction stage use the temporal characteristics of the received symbols, but it also takes advantage of embedded signaling features from the received signal. In the proposed classifier, the received constellations are treated as a graph, allowing it to outperform state-of-the-art classifiers due to its strong performance in graph classification. This is observed clearly in the visualization of the extracted features, even for high-order modulation schemes. In this paper, we present our systematic research conducted for maximizing the efficiency obtainable by our classifier. Extensive simulation results demonstrate a significant accuracy improvement of 18.44 percentage points and an efficiency increase of 60.78% for our GGCNN-AMI classifier compared to state-of-the-art classifiers in low-SNR environments.

Index Terms—Automatic modulation identification (AMI), feature-based (FB) classification, gated graph convolutional neural network (GGCNN), performance analysis, efficiency improvement.

Manuscript received 26 June 2022; revised 20 December 2022; accepted 15 January 2023. Date of publication 30 January 2023; date of current version 12 September 2023. This work was supported in part by the U.S. Federal Railroad Administration (FRA) and in part by the Rural Railroad Safety Center (RRSC). The associate editor coordinating the review of this article and approving it for publication was A. El Gamal. (Corresponding author: Pejman Ghasemzadeh.) Pejman Ghasemzadeh, Michael Hempel, and Hamid Sharif are with the Advanced Telecommunication Engineering Laboratory (TEL), Department of Electrical and Computer Engineering, University of Nebraska–Lincoln, Lincoln, NE 68588 USA (e-mail: pejman.ghasemzadeh@huskers.unl.edu; mhempel@unl.edu; hsharif@unl.edu). Honggang Wang is with the Internet of Things and Data Engineering Laboratory, Department of Electrical and Computer Engineering, University of Massachusetts at Dartmouth, Dartmouth, MA 02747 USA (e-mail: hwang1@umassd.edu). Color versions of one or more figures in this article are available at https://doi.org/10.1109/TWC.2023.3239311. Digital Object Identifier 10.1109/TWC.2023.3239311
I. INTRODUCTION

AUTOMATIC modulation identification (AMI), also known as automatic modulation classification (AMC), refers to a signal processing procedure in which a received signal's modulation scheme type and order are identified [1]. AMI enables receivers to blindly identify a received signal's modulation scheme, without a priori information about the signal. This ability has been recognized as an important contribution towards cognitive radio technology [2]. This is because implementing AMI in receivers provides the transmitter the freedom to select the best modulation scheme from a wide pool of available modulation schemes based on the environmental conditions, provided that the transmitter operates on a software-defined-radio-equipped platform. Such changes in transmission characteristics then do not need to be communicated to the receiver [1], [2]. In addition, providing this capability to a transceiver system can also further help in adapting other communication parameters, such as sample rate, in order to maximize achievable channel throughput [3]. Furthermore, AMI possesses multiple other civilian applications such as intelligent modem designs [4], spectrum sensing [5], safety monitoring or threat assessment [6], interference mitigation and dynamic spectrum access [7]. All these applications have either direct or indirect contributions towards the improvement of cognitive radio technology [8]. Out of these applications, interference mitigation plays a particularly prominent role; yet it traditionally has been a challenge for receivers. With the near-ubiquitous presence of wireless devices, communication systems face a significant problem in spectrum congestion [3]. Thus, by deploying AMI at the receiver and then conducting a sweep of all supported frequencies, the receiver can efficiently conduct a survey that maps each frequency to the main modulation scheme used at that frequency. Any other modulation scheme used at that frequency is then considered an interfering one. Hence, through interference reduction or avoidance processes driven by that modulation scheme [9], the receiver can potentially avoid or mitigate the effects of the interfering signal and more accurately recover the source signal of interest [1]. Applying AMI within the realm of civilian cognitive radio applications presents significant research challenges and opportunities to improve the performance and efficiency of AMI [10].

AMI designs can be categorized into model-driven and data-driven approaches [11]. Model-driven approaches, in particular likelihood-based (LB) [12] and distribution test-based AMI [13], suffer from either high computational complexity in the estimation of the hypothesis model's parameters or poor performance when abrupt changes in the pattern of the received symbols occur [14]. Hence, model-driven approaches fail to meet the performance efficiency requirements of modern
communications systems in low signal-to-noise ratio (SNR) environments [15]. On the other hand, data-driven AMI essentially involves deep learning models acting as classifiers that are robust in pattern recognition [16]. These types of AMI classifiers are trained before deployment for the modulations they support over the environments they are intended to be used in. From among the various deep learning models [17], convolutional neural networks (CNN) [18] have been broadly and successfully adopted for AMI by learning valid representations from complex data and mapping them into image characteristics and features [19]. Therefore, we select CNN-based deep learning models to review and assess their operations in improving the performance and efficiency of AMI in low-SNR environments.

A. Related Work

Wang et al. [20] proposed a data-driven deep learning model, which we refer to as DR-CNN in our paper, that combines two CNN-based models trained over different simulated datasets generated by the authors. Even though their method of combining CNN-based models performed slightly better than other AMI classifiers they investigated, it achieves lower efficiency, since their classification processing chain appears more computationally complex than other schemes in this domain due to the need for training and testing two separate CNN structures. Additionally, the use of simulated datasets rather than real-world captured datasets makes their approach and the trained classifier less applicable to real-world scenarios. Zhang et al. [21] proposed a dual-stream CNN-based model that utilizes long short-term memory (LSTM) to explore feature interactions of the received signal. However, while this model could increase the performance of their AMI classifier for low-order modulation schemes, such an increase does not occur for higher-order modulation schemes. This limits their model's real-world applicability. Moreover, using dual-stream and LSTM structures significantly increases their model's computational complexity, which leads to lower efficiency for implementation in low-SNR environments. This model is referred to as LSTM-CNN in our paper. Meng et al. [22] proposed a CNN-based deep learning model, which is trained in two different steps to obtain a closer approximation to the optimal LB approach. This method exhibits a slight increase in the performance of the AMI classifier in low-SNR environments based on the evaluation provided in their paper. On the other hand, this model's high computational complexity due to the dual-training of their model does not represent a practical implementation of an AMI classifier in a real-world scenario. In our paper's comparative analysis, we refer to this method as TwoStep-CNN. Wang et al. [23] proposed a light-weight CNN-based model with smaller model sizes, obtained through reductions in kernel sizes or filter counts, which leads to lower computational complexity. In our paper, this model is referred to as Light-CNN. Moreover, in their model they also took advantage of SNR estimation in the receiver structure. Due to its very light-weight feature extraction structure, this model performs poorly in low-SNR environments. Zhang et al. [24] proposed a CNN-based model that takes advantage of the smooth pseudo Wigner-Ville distribution (SPWVD), which can represent both time and frequency features of the received signal. Considering a low-SNR environment and a high-order modulation scheme, the SPWVD cannot effectively carry deep correlative information of the received constellation with a low number of samples, due to the integration process in its mathematical representation. Improving their AMI classifier therefore increases the execution latency, which further implies that their classifier cannot be efficiently used for real-time classification in such environments and for high-order modulation schemes. We refer to this model as SPWVD-CNN in our paper.

The closest model to CNN-based architectures that observes the input data as an image and exhibits the capability of learning convolutional features is the graph convolutional neural network (GCNN). In the only GCNN-based work for AMI [25], referred to as AMRGCNN in this paper, the authors implemented a stacked-layer GCNN architecture with 2 layers followed by a softmax operation acting as the final classification stage. The results presented in this work show not only higher classification accuracy compared to CNN-based architectures, but also the likelihood of lower computational complexity (a smaller number of trainable parameters). This initial study of a GCNN-based model indicates the potential for improving the general AMI performance in low-SNR environments.

B. Problem Statement

The presented related works and their efforts to improve AMI classifier performance can be categorized as follows:
• The first group contains those works adopting a heavy CNN-based architecture to extract deep correlative information from received symbols. Even though these models have made improvements in increasing the ultimate classification accuracy of their AMI classifier, they also significantly increased the computational complexity of their model structure. This further reduced their model's efficiency, which we define as the tradeoff between classification accuracy and computational complexity, as well as its suitability for real-world implementations.
• The second group are those related works that adopted a light-weight CNN-based architecture to maintain or reduce their model's computational complexity compared to literature works. These models fail, however, to produce significant improvements in their ultimate AMI classifier's classification accuracy.

Even though related works all have attempted to increase the ultimate performance of their AMI classifiers, their models fail to either produce significant improvements in classification accuracy or maintain the computational complexity, i.e., execution latency as defined in subsection B of section III in the context of AMI. As analyzed in subsection D of section IV, this results in a further decrease in the efficiency of their AMI classifiers in low-SNR environments. This is especially true considering that high-order modulation schemes will further exacerbate these problems.

C. Contributions

To improve both the classification accuracy and the efficiency of the AMI classifier in low-SNR environments,
we present our research on a deep learning architecture called "gated graph convolutional neural network" (GGCNN), which is built upon a feature extraction stage coupled with a graph convolutional neural network (GCNN) sequence in order to take advantage of the graph classification capabilities of such networks. In our architecture for AMI classification, we design and evaluate a feature extraction stage that consists of three components: The first component extracts embedded signaling features, i.e., unbiased cumulants, from the received samples. Computing these features with any order greater than 2 provides the advantage of eliminating the effect of noise that follows the Additive White Gaussian Noise (AWGN) distribution over the received samples. Consequently, providing such features unaffected by AWGN noise to the subsequent extraction procedures will improve the ultimate classification accuracy of the AMI classifier. To further empower the total feature extraction process, the second component utilizes a two-stacked gated recurrent unit (GRU) followed by a short time-distributed neural network in order to extract more of the temporal features of the received signal. These features can help the AMI classifier's decision-making to be more robust and stable against abrupt changes in the received samples. The third component is the designed GCNN sequence that implements a pattern detection procedure through an adjacency matrix operation coupled with a graph convolutional block. This procedure treats the received constellation as a graph, and it shows strong performance in graph classification. It is this capability that is responsible for the improvement in the proposed GGCNN-AMI classifier's training and classification performance in low-SNR environments. In order to further improve the classification accuracy of the proposed GGCNN-AMI classifier, we prevent overfitting by utilizing a dropout operation in the graph convolutional blocks. Additionally, we implement a dense layer to keep the growth rate of the network low, so as to incur only a very small increase in the entire AMI classifier's computational complexity. In order to design an efficient AMI classifier, we evaluate the effect of different activation functions and pooling operations, and the number of GCNN blocks in the GCNN sequence. The designed feature extraction stage is very well-suited to produce highly differentiated features, as can be observed from visualizing the results by plotting the scatter map of the extracted features. A classification module follows the feature extraction stage to label the received signal's modulation scheme and its order. Based on our extensive evaluation in section IV, our proposed GGCNN-AMI classifier not only exhibits significantly improved classification accuracy, but also much higher efficiency compared to the state-of-the-art related works. In our quantitative analysis, we present the lower and upper performance bounds of the AMI classifier as a broader performance evaluation compared to literature works by using the highest- and lowest-order digital modulation schemes available within the RadioML dataset, i.e., 256QAM and BPSK.

D. Paper Organization and List of Symbols

The rest of the paper is organized as follows. Section II provides a detailed description of the proposed GGCNN-AMI classifier's architecture. Information on our simulation integration and the procedure we utilized to maximize the efficiency of our architecture through the systematic evaluation of activation functions and pooling operations is provided in section III. Section IV presents the comparison models utilized for our quantitative evaluation and a detailed performance analysis of the proposed GGCNN-AMI classifier compared to related works. This analysis includes visualization of our designed feature extraction capability, investigation of classification accuracy and computational complexity, and finally an assessment of efficiency. Finally, we conclude the paper in Section V. Definitions of the important symbols used in this paper are summarized in Table I.

TABLE I
LIST OF SYMBOLS
Fig. 1. The proposed GGCNN-AMI classifier’s architecture.

II. PROPOSED GGCNN-AMI CLASSIFIER

The overview of our proposed GGCNN-AMI classifier's architecture can be observed in Fig 1. As can be seen, the proposed architecture consists of two main components: the feature extraction module and the classification module. The feature extraction module itself is comprised of three procedures to extract embedded signaling features, temporal features, and graph-based features. Next, we will present the operation of each procedure in the feature extraction stage and the classification module.

A. Embedded Signaling-Based Feature Extraction

One set of embedded signaling features that can be extracted from the received signal is the set of high-order statistics-based (HoS) features. These features are capable of extracting highly correlative information about the symbols in the received constellations. This characteristic is of significant importance to AMI classifiers in order to determine patterns among the received symbols for modulation schemes with high orders [26]. Moreover, the effects of the AWGN noise impacting the received symbols' extracted HoS features of any order greater than 2 will be removed [6]. This property enables the AMI classifier to identify modulation schemes with high classification accuracy even in low-SNR environments. This was experimentally proven for AMI classifiers using the 4th-order cumulants [19]. This feature can be computed based on the numeric complexity of the received symbols. If the received symbols $x_i$ in constellation $\mathbf{X}$ are real-valued, the 4th-order cumulants are recursively computed from [26]:

$$C_n = M_n - \sum_{m=1}^{n-1} \frac{(n-1)!}{(m-1)!\,(n-m)!}\, C_m M_{n-m}, \qquad (1)$$

where $n = 4$, and the unbiased signal moment estimator $M_n$ of order $n$ is computed in (2) based on the two-pass correction algorithm via computing the trial mean [19]:

$$M_n = \sum_{k=0}^{n} \binom{n}{k} \left( \sum_{i=1}^{L} (x_i - \overline{X})^{\,n-k} \right) \left( \frac{1}{L}\sum_{i=1}^{L} x_i - \overline{X} \right)^{k}. \qquad (2)$$

In equation (2), where $L$ is the number of received symbols, the mean of the received symbols $\overline{X}$ is computed from:

$$\overline{X} = \frac{1}{L} \sum_{i=1}^{L} x_i. \qquad (3)$$

When the received symbols $x_i$ are complex-valued, the unbiased joint 4th-order cumulants estimator of the received constellation $\mathbf{X}$ needs to be computed from:

$$C_4(\mathbf{X}) = \frac{L^2}{L^3 - 6L^2 + 13L - 12}\Big[(L+1)\,\overline{X^4} - 4(L+1)\,\overline{X^3}\,\overline{X} - 3(L-1)\,\overline{X^2}^{\,2} + 12L\,\overline{X^2}\,\overline{X}^{\,2} - 6L\,\overline{X}^{\,4}\Big]. \qquad (4)$$

After these embedded signaling-based features are extracted, they are appended to their original corresponding received signal vector (constellation) for further processing in the GCNN sequence.
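As an illustration of the feature computation in (1)-(4), the following minimal NumPy sketch estimates the 4th-order cumulant of a real-valued symbol vector via the recursion in (1); the moment normalization here is a simplified assumption rather than the paper's exact unbiased two-pass estimator.

```python
# Minimal sketch of the HoS feature extraction in (1)-(4); assumes
# real-valued symbols and simple central sample moments.
import numpy as np
from math import factorial

def sample_moments(x, n_max):
    """Central sample moments M_1..M_{n_max} around the trial mean, eq. (3)."""
    x_bar = x.mean()
    return [((x - x_bar) ** n).mean() for n in range(1, n_max + 1)]

def cumulant(n, M, cache):
    """Recursive cumulant C_n from moments M, following eq. (1)."""
    if n in cache:
        return cache[n]
    c = M[n - 1]
    for m in range(1, n):
        w = factorial(n - 1) / (factorial(m - 1) * factorial(n - m))
        c -= w * cumulant(m, M, cache) * M[n - m - 1]
    cache[n] = c
    return c

rng = np.random.default_rng(0)
symbols = rng.choice([-3.0, -1.0, 1.0, 3.0], size=1024)  # PAM-like constellation
symbols += 0.1 * rng.standard_normal(1024)               # AWGN-like noise
C4 = cumulant(4, sample_moments(symbols, 4), {})         # 4th-order cumulant
augmented = np.append(symbols, C4)                       # appended to the vector
print(C4)
```

For $n = 4$ with central moments this recursion reduces to the classical $C_4 = M_4 - 3M_2^2$, which is the feature appended to the constellation vector.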

B. GRU & Short Time-Distributed Neural Network

A fully gated recurrent unit exhibits an increased ability to be trained over long-term changes with lower frequency than LSTM, which has conventionally been used in the AMI domain. Thus, the two-stacked GRU is implemented to determine temporal patterns among the received symbols, to more accurately determine symbol boundaries [18]. Additionally, it requires a lower execution latency than LSTM because it requires fewer parameters in its optimization procedure. The other reason to implement a two-stacked GRU followed by a short time-distributed neural network (STDNN) to extract temporal features is to incorporate the timing factor in the training and classification processes, to more closely represent a real-world scenario where the signal is received in a time-division-based mechanism. The first GRU unit accepts the input data $X_t \in \mathbb{R}^{n \times d}$, with $n$ the number of samples and $d$ the number of signal inputs at time step $t$ from the received signal vectors (dataset), and outputs the vector $H_t \in \mathbb{R}^{n \times h}$, with $h$ the number of hidden units. The output vector $H_t$ is computed from:

$$H_t = Z_t \odot H_{t-1} + (1 - Z_t) \odot K_t, \qquad (5)$$

where $H_{t-1}$ is the old state of $H_t$, and where the update gate $Z_t \in \mathbb{R}^{n \times h}$ and hidden state $K_t \in \mathbb{R}^{n \times h}$ are computed based on (6) and (7), respectively:

$$Z_t = \sigma(X_t W_\alpha + H_{t-1} W_\beta + b_z), \qquad (6)$$

$$K_t = \tanh(X_t W_\alpha + (R_t \odot H_{t-1}) W_\gamma + b_k). \qquad (7)$$

In (6), $\sigma$ represents the sigmoid function, and the reset gate $R_t \in \mathbb{R}^{n \times h}$ is computed similarly to the update gate. The output of the first GRU unit ($H_t$) will be used as the input for the second GRU unit with the same operation.

The short time-distributed neural network condenses the derived information from the two-stacked GRU using the following procedure: Through processing of a short time-distributed neural network layer, which is characterized by a weight matrix $W_\zeta$, a bias matrix $b_\zeta$ and a total number of time steps $T$, the output of the second GRU unit $\{H_t\}_{t=0}^{T-1}$ results in the calculation of attention weights $\alpha = \{\alpha_t\}_{t=0}^{T-1}$ based on a softmax activation function, as shown in (8):

$$\alpha_t = \frac{\sigma(H_t \cdot W_\zeta + b_\zeta)}{\sum_{t=0}^{T-1} \sigma(H_t \cdot W_\zeta + b_\zeta)}. \qquad (8)$$

It should be noted that $\sum_{t=0}^{T-1} \alpha_t = 1$ while $\alpha_t \geq 0$. A calculated $\alpha_t$ will be discarded if $\alpha_t \leq 0$. Then the final output of the short time-distributed neural network is calculated by:

$$Y = \sum_{t=0}^{T-1} \alpha_t H_t. \qquad (9)$$

This operation results in: 1) reducing the final output dimensionality of the two-stacked GRU blocks for each time step [26], and 2) helping to avoid overfitting by following a temporal attention mechanism performed over the training time steps. Further details can be found in [26]. The extracted temporal features will be appended to the original corresponding received signal vectors for processing in the graph-based feature extraction procedure.
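A minimal Keras sketch of the two-stacked GRU and the short time-distributed attention computation in (5)-(9) is given below; the layer sizes are illustrative assumptions, not the paper's exact hyperparameters.

```python
# Two stacked GRUs followed by a time-distributed attention layer, (5)-(9).
import tensorflow as tf
from tensorflow.keras import layers

T, d, h = 32, 64, 128                                # assumed dimensions

inputs = tf.keras.Input(shape=(T, d))
x = layers.GRU(h, return_sequences=True)(inputs)     # first GRU, eqs. (5)-(7)
x = layers.GRU(h, return_sequences=True)(x)          # second GRU, same operation

scores = layers.TimeDistributed(layers.Dense(1))(x)  # per-time-step score, eq. (8)
alpha = layers.Softmax(axis=1)(scores)               # attention weights alpha_t
y = layers.Lambda(                                   # weighted sum over time, eq. (9)
    lambda z: tf.reduce_sum(z[0] * z[1], axis=1))([alpha, x])

temporal_extractor = tf.keras.Model(inputs, y)
temporal_extractor.summary()
```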
C. GCNN Sequence

This sequence is comprised of 3 GCNN blocks and 2 dense layers, each followed by a flattening layer. Please see subsection C of section III for the justification of the number of implemented GCNN blocks. Each GCNN block consists of an adjacency matrix operation followed by a convolutional block operation. In the context of modulation identification, the adjacency matrix operation, which interprets the received constellation as a graph, can be defined as follows: We consider a family of graph intrinsic operators $A$ acting locally on a given input set of constellations or signals $S \in \mathbb{R}^{l \times d}$, with $d$ the number of data inputs for each signal, on the set of nodes $V$ ($v_k \in V$ corresponds to the $k$th signal, where $1 < k < d$) of a weighted graph $G(V, E)$, with $E$ the set of edges, which are unidirectional and acyclic due to the randomness (noise) of the received samples in the constellations [25]. The adjacency operator is defined as $A : S \to A(S)$ where $(AS)_i := \sum_{j \sim i} w_{i,j} S_j$, with $i \sim j$ if and only if $(i, j) \in E$ and $w_{i,j}$ its associated weight [27]. The implementation of the aforementioned adjacency operation in deep learning is a graph neural network layer $G(\cdot)$ receiving the vector of the $k$th signal $S^{(k)} \in \mathbb{R}^{V \times d_k}$ as input and generating $S^{(k+1)} \in \mathbb{R}^{V \times d_{k+1}}$ from:

$$S_l^{(k+1)} = G(S^{(k)}) = \text{ActivationFunction}\Big( \sum_{l \in A} B\, S^{(k)} \theta_l^{(k)} \Big), \qquad (10)$$

where $l = d_1, \cdots, d_{k+1}$ and $B$ represent the sets of input signal vectors and biases. The set of trainable parameters is $\Theta = \{\theta_1^{(k)}, \cdots, \theta_{|A|}^{(k)}\}$ with $\theta_l^{(k)} \in \mathbb{R}^{d_k \times d_{k+1}}$ [28]. The activation function can be selected from the candidate pool of {ReLU, LeakyReLU, GeLU, ELU} to achieve an efficient model [29], [30]. We will numerically investigate this in the next section. We have selected these activation functions for their two common properties: 1) their derivative is in the shape of a step function, and 2) they all are zero-centered [31], [32], [33], [34]. These properties will reduce the gradient vanishing problem that can occur during the training of the model [26]. Approximation to a wide range of graph inference tasks can be achieved by cascaded operations in the form of (10) [35]. In particular, generalized cascaded operations are capable of learning edge features and collecting them in the form of the adjacency matrix $A^{(K)}_{\{i,j\}}$ from:

$$A^{(K)}_{\{i,j\}} = \text{MLP}_{\hat{\theta}}\big( \text{abs}(v_i^{(k)} - v_j^{(k)}) \big). \qquad (11)$$

Fig. 2. Implementation of the GCNN block adjacency matrix operation, the forming layer and their corresponding elements with FC layer filter counts.

As shown in Fig 2, this operation can be implemented through a multilayer perceptron (MLP) neural network - a sequence of fully-connected (FC) layers - that takes as input the absolute difference between the two vector nodes $v_i^{(k)}$ and $v_j^{(k)}$ (received samples) [36]. Fig 2 also shows the forming layers and their corresponding elements of the MLP neural network. A batch normalization process with $\epsilon = 0.001$ and $\pi = 0.99$ is applied after the MLP neural network implementation to mitigate the effect of potentially large noise variance in the feature maps [27]. $\epsilon$ and $\pi$ represent the small float added to the variance to avoid dividing by zero, and the momentum for the filter moving average, respectively. Across all of the operations and features of the GCNN blocks, it is the learning of more precise edge features, i.e., detecting the relationships of the received samples through the adjacency matrix operation, that is the key factor in achieving a more accurate training process, which in turn enables the AMI classifier to obtain higher classification accuracy over the entire SNR range, specifically including for environments with low-SNR characteristics. This is due to the fact that in this particular problem, edge features measuring the differences between the vector nodes (received samples) of two consecutive constellations carry the geometric information of the constellations and their points (graph nodes). This can mathematically be seen in (11).
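The following minimal sketch illustrates (11): a small MLP applied to the absolute difference of two node vectors produces one adjacency entry. The layer widths and weights are hypothetical placeholders; the paper's actual FC filter counts are given in Fig 2.

```python
# Edge-feature / adjacency-matrix construction per eq. (11).
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((16, 2)), np.zeros(16)   # hypothetical FC layer 1
W2, b2 = rng.standard_normal((1, 16)), np.zeros(1)    # hypothetical FC layer 2

def edge_mlp(z):
    hidden = np.maximum(W1 @ z + b1, 0.0)             # FC + ReLU
    return (W2 @ hidden + b2)[0]                      # scalar edge weight

V = rng.standard_normal((8, 2))                       # 8 node vectors (I/Q pairs)
A = np.array([[edge_mlp(np.abs(V[i] - V[j]))          # A_{i,j} = MLP(|v_i - v_j|)
               for j in range(len(V))] for i in range(len(V))])
print(A.shape)                                        # (8, 8) adjacency matrix
```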
Moreover, edge features also include the direction information of graph nodes, which can determine whether nodes have a forward, backward, or undirected relationship. Determining the directionality of the nodes' relationship will improve the training process due to the fact that it can determine the correlation between graph nodes. The more accurate these features are, the higher the ultimate classification accuracy will be.

Fig. 3. Graph convolutional blocks' structure.

The adjacency matrix operation is conducted for every two received signal vectors (constellations), which results in a set of adjacency matrices: $\{A^{(K)}_{\{1,2\}}, A^{(K)}_{\{3,4\}}, \ldots, A^{(K)}_{\{d-1,d\}}\}$. This set will be used in the next procedure, i.e., the graph convolutional block, whose overall structure can be seen in Fig 3. In each GCNN block, the graph convolutional blocks extract the information related to the frequency of the computed adjacency matrix characteristics from the previous operation in the GCNN block. Hence, a Fourier-based operation based on the adjacency matrix should be adopted. We let the eigenvectors $\{X_m\}_{m=0}^{N-1}$ of the Laplacian of adjacency matrix $A^{(K)}_{\{i,j\}}$ that satisfy the orthogonality condition be used as the decomposition bases in the Fourier transform instead of the conventional complex exponentials [37]. $N$ represents the dimension of the eigenvectors. Then, the Fourier transform and its inverse can be defined, respectively, as:

$$\hat{A}^{(K)}_{\{i,j\}}(\lambda_m) = \sum_{n=0}^{N-1} X_m^T(n)\, A^{(K)}_{\{i,j\}}(n) = X^T A^{(K)}_{\{i,j\}}, \qquad (12)$$

$$A^{(K)}_{\{i,j\}} = \sum_{m=0}^{N-1} \hat{A}^{(K)}_{\{i,j\}}(\lambda_m)\, X_m(n) = X \hat{A}^{(K)}_{\{i,j\}}. \qquad (13)$$

This can be interpreted as the expansion of $A^{(K)}_{\{i,j\}}$ in terms of the eigenvectors of the Laplacian [38]. Thus, the convolution over two consecutive adjacency matrices can be converted into a point-wise product in the Fourier domain as:

$$A^{(K)}_{\{i,i+1\}} * A^{(K)}_{\{i+2,i+3\}} = \sum_{m=0}^{N-1} \hat{A}^{(K)}_{\{i,i+1\}}(\lambda_m)\, \hat{A}^{(K)}_{\{i+2,i+3\}}(\lambda_m)\, X_m(n), \qquad (14)$$

which can be shortened based on equation (12) as:

$$A^{(K)}_{\{i,i+1\}} * A^{(K)}_{\{i+2,i+3\}} = X\big( (X^T A^{(K)}_{\{i,i+1\}}) \cdot (X^T A^{(K)}_{\{i+2,i+3\}}) \big). \qquad (15)$$

Due to the diagonality of either $A^{(K)}_{\{i,i+1\}}$ or $A^{(K)}_{\{i+2,i+3\}}$, equation (14) can be rewritten based on equation (15) as:

$$A^{(K)}_{\{i,i+1\}} * A^{(K)}_{\{i+2,i+3\}} = X \operatorname{diag}\big( \hat{A}^{(K)}_{\{i+2,i+3\}}(\lambda_0), \ldots, \hat{A}^{(K)}_{\{i+2,i+3\}}(\lambda_{N-1}) \big) X^T A^{(K)}_{\{i,i+1\}}. \qquad (16)$$

This operation can be considered as a convolution kernel, which can be implemented through a set of free parameters $\{\theta_m\}_{m=0}^{N-1} \sim \operatorname{diag}\big( \hat{A}^{(K)}_{\{i+2,i+3\}}(\lambda_0), \ldots, \hat{A}^{(K)}_{\{i+2,i+3\}}(\lambda_{N-1}) \big)$ in the Laplacian eigenspace, i.e., the Fourier domain. Moreover, these parameters can also be considered as a function of eigenvalues $G(\Theta) = \operatorname{diag}(\theta_0, \ldots, \theta_{N-1})$, which leads to a rewritten form of equation (16) as:

$$A^{(K)}_{\{i,i+1\}} * A^{(K)}_{\{i+2,i+3\}} = X\, G(\Theta)\, X^T A^{(K)}_{\{i,i+1\}}. \qquad (17)$$

The above convolution kernel operation exhibits notable computational cost due to the eigendecomposition performed in each computation described by (17), as well as incompatibility with CNN local connections due to the relationship of these parameters to the global vertices in the defined Laplacian space [39]. Addressing these problems involves implementing a fast localized convolution based on a low-order polynomial approximation. Therefore, $G(\Theta)$ can be expressed as:

$$G(\Theta) = \sum_{r=0}^{R} \theta_r \Lambda^r, \qquad (18)$$

where $R$ is the polynomial order and $\{\Lambda\}_{r=0}^{R}$ is the vector of polynomial coefficients. Then, equation (17) can be rewritten as:

$$A^{(K)}_{\{i,i+1\}} * A^{(K)}_{\{i+2,i+3\}} = X \Big( \sum_{r=0}^{R} \theta_r \Lambda^r \Big) X^T A^{(K)}_{\{i,i+1\}}. \qquad (19)$$

Due to the independence of the free parameters and the modified Fourier transform, we can obtain:

$$A^{(K)}_{\{i,i+1\}} * A^{(K)}_{\{i+2,i+3\}} = \sum_{r=0}^{R} \theta_r (X \Lambda^r X^T) A^{(K)}_{\{i,i+1\}}, \qquad (20)$$

through which the eigendecomposition computation is avoided, since the convolution kernel is then calculated by $K$ multiplications instead.
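To make the preceding derivation concrete, the sketch below applies the eigendecomposition-free polynomial filter of (20) and then forms a block output in the style of (22) with the no-pooling option; X and Λ are random stand-ins for the eigenbasis and eigenvalue matrix, used purely for illustration.

```python
# Polynomial graph filtering per eq. (20) and block output per eq. (22).
import numpy as np

rng = np.random.default_rng(2)
N, R = 8, 3
X_basis, _ = np.linalg.qr(rng.standard_normal((N, N)))  # stand-in eigenbasis X
Lam = np.diag(rng.uniform(0.0, 2.0, N))                 # stand-in eigenvalues
theta = rng.standard_normal(R + 1)                      # free parameters theta_r
A_in = rng.standard_normal((N, N))                      # adjacency matrix A^{(K)}

# eq. (20): sum_r theta_r (X Lambda^r X^T) A -- only additions/multiplications
Y = sum(theta[r] * (X_basis @ np.linalg.matrix_power(Lam, r) @ X_basis.T) @ A_in
        for r in range(R + 1))
Y = np.maximum(Y, 0.0)                                  # ReLU, as in eq. (21)

# eq. (22): V = Pooling(Y) + A; with the no-pooling option the shapes match
# directly. Max- or average-pooling would be applied here in the variants
# evaluated in Section III.
V_out = Y + A_in
print(V_out.shape)
```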
Incorporating the ReLU activation function after the graph convolution operation with a layer-wise propagation rule can be expressed as:

$$Y_i^{(K)} = \sigma\Big( \sum_{i=0}^{P} \Big( \sum_{r=0}^{R} \theta^{(K)}_{\{i,i+1\},r} (X \Lambda^r X^T) A^{(K)}_{\{i,i+1\}} \Big) + b_j \Big), \qquad (21)$$

where $P$ and $b_j$ represent the number of trainable parameters and the biases, respectively, which are initialized using random values. A pooling operation, which is typically used to compute the statistics of local neighborhoods into a single output value, is applied to reduce the complexity of the model. However, we will investigate the format of this pooling operation, i.e., max-, average- or no-pooling, to achieve an efficient GGCNN model in the next section. We chose not to investigate the min-pooling operation as part of this research because its principal functionality - selection of identified features with the least amount of repetition within the feature set and removal of all others - contradicts the goal of training the model to focus on tightly clustered features. If overfitting happens in an AMI classifier, it generally causes a reduced classification accuracy, especially when the data behaves more randomly, which is the case with received samples in low-SNR environments. Hence, a dropout is employed to prevent overfitting in the training process. A concatenation process is applied to pass the raw features extracted from the received symbols by the adjacency matrix operation to the next GCNN block, where the same operation is implemented. Hence, the output of the graph convolutional block is:

$$V_i^{(K)} = \text{Pooling}\big(Y_i^{(K)}\big) + A^{(K)}_{\{i,i+1\}}. \qquad (22)$$

Network optimization and parameter training are conducted through minimizing the loss function for each batch of data and through the gradient back-propagation algorithm, respectively. This procedure is similar to that of conventional CNN models, apart from the convolution operation, as can be seen in equation (20). As stated above, this modified convolution operation only involves the addition and multiplication of the adjacency matrix and the training parameters. Hence, the two gradients can be obtained as:

$$\frac{\partial J}{\partial \theta^{(K)}_{\{i,i+1\},r}} = (X \Lambda^r X^T) A^{(K)}_{\{i,i+1\}} \frac{\partial J}{\partial A^{(K)}_{\{i+2,i+3\}}}, \qquad (23)$$

and

$$\frac{\partial J}{\partial A^{(K)}_{\{i,i+1\}}} = \sum_{i=0}^{P} \Big( \frac{\partial J}{\partial A^{(K)}_{\{i+2,i+3\}}} \Big( \sum_{r=0}^{R} \theta^{(K)}_{\{i,i+1\},r} (X \Lambda^r X^T) \Big) \Big). \qquad (24)$$

This training operation exhibits the same learning computational complexity as conventional CNN models [40]. This fact helps limit the computational complexity of the designed GCNN sequence. To serve the same purpose, a dense layer with 16 units, the Xavier uniform kernel initializer, and no activation function or kernel regularizer is implemented between each pair of GCNN blocks. This operation is followed by a flatten layer that turns the high-dimensional space of extracted features into a one-dimensional vector so that it can be used in the subsequent GCNN block, which leverages the same operation as explained above. Finally, the features that were extracted and abstracted in the feature extraction stage will then be used in the classification module.

D. Classification Module

The classification module consists of a fully connected neural network with two layers. The first layer has 96 neurons, while the second layer's number of neurons is equivalent to the targeted number of modulation schemes to be classified. The Softmax activation function produces the final probability of detecting a given modulation type for a received signal.
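A minimal Keras sketch of this classification module is shown below; the input feature dimension and the hidden activation are assumptions for illustration, while the 96-neuron first layer and the softmax output follow the description above.

```python
# Classification module: 96-neuron dense layer + softmax over the schemes.
import tensorflow as tf
from tensorflow.keras import layers

n_classes, feat_dim = 24, 256                      # assumed placeholder sizes

classifier = tf.keras.Sequential([
    layers.Input(shape=(feat_dim,)),               # flattened extracted features
    layers.Dense(96, activation="relu"),           # first layer (96 neurons)
    layers.Dense(n_classes, activation="softmax")  # per-modulation probabilities
])
classifier.summary()
```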
With the functionality of each GGCNN-AMI classifier element detailed, we can observe that designing such a feature extraction stage, capable of extracting signaling- and temporal-based features on top of the GCNN sequence, where the deep feature extraction and training processes are conducted by the adjacency matrix and graph convolutional block operations, enables the proposed GGCNN-AMI classifier to operate with high robustness in low-SNR environments.

III. SIMULATION INTEGRATION, EVALUATION PROCEDURE AND EFFICIENT ARCHITECTURE

This section provides information about the simulation integration, such as the chosen dataset and the environment selected for training and testing. We also introduce the evaluation procedure along with its comparison factors in our performance analysis. Using this evaluation procedure, we will determine an efficient architecture for our proposed GGCNN-AMI classifier.

A. Simulation Integration

The RadioML 2018.01A dataset [41] is selected as the modulation signal reference for our evaluation effort. This dataset contains 24 digital and analog modulation schemes, each with 4096 signal waveforms at each individual SNR point. Each signal waveform includes 1024 separate complex IQ samples (2 × 1024). This creates a dataset of 2,555,904 vectors of modulated signal waveforms. Additionally, having separate complex IQ samples provides flexibility for the AMI classifier to process received symbols in their complex or real-valued forms. The numbers of training, validation and testing samples of the RadioML dataset are set to be 50%, 10% and 40%, respectively. We selected a smaller than usual training dataset compared to literature works, specifically with the aim to represent a real-world scenario where the available and usable samples for training an AMI classifier might be few. By studying the training procedures of most of the literature works, we realized that their models were each trained on a subset of the modulation schemes available in the RadioML dataset. However, it should be noted that such limited training does not necessarily represent a real-world scenario, where typically a transceiver is capable of handling a wide range of modulation schemes and thus any AMI classifier similarly needs to be trained on a wide set of modulation schemes. Therefore, in order to achieve a more realistic view of the performance, and as part of our comparative evaluation, we train our GGCNN-AMI model as well as the comparison models, which will be introduced in the next section, on all digital modulation schemes in the dataset. The training procedure is designed to be executed pre-deployment on a per-modulation-scheme basis, over the entire supported SNR range. We perform this process for the proposed GGCNN-AMI classifier and all comparison models. This training procedure represents a more realistic scenario where the AMI classifier has no a priori knowledge of the SNR value of the environment it will be deployed in.

Our training, validation and testing environments are implemented using the deep learning library Keras running on top of TensorFlow, executed on our university's supercomputing infrastructure, HCC Crane [42]. The training factors for our designed GGCNN-AMI classifier can be seen in Table II.
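A minimal sketch of the 50%/10%/40% train/validation/test partitioning described above follows; the arrays are zero-filled stand-ins shaped like the dataset's 2 × 1024 IQ frames, since the actual RadioML loading step is omitted here.

```python
# 50/10/40 split of RadioML-style frames into train/validation/test sets.
import numpy as np

def split_dataset(X, y, seed=42):
    idx = np.random.default_rng(seed).permutation(len(X))
    n_tr, n_val = int(0.5 * len(X)), int(0.1 * len(X))
    tr, val, te = np.split(idx, [n_tr, n_tr + n_val])
    return (X[tr], y[tr]), (X[val], y[val]), (X[te], y[te])

X = np.zeros((4096, 2, 1024), dtype=np.float32)    # one modulation at one SNR
y = np.zeros(4096, dtype=np.int64)                 # class labels
train, val, test = split_dataset(X, y)
print(len(train[0]), len(val[0]), len(test[0]))    # 2048 409 1639
```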
TABLE II
GCNN SEQUENCE HYPERPARAMETERS

The adaptive moment estimation (Adam) algorithm has been used to optimize the learning rate.

B. Evaluation Procedure

The majority of the literature works only investigate the probability of correct classification of their classifier, while some also attempt to assess their classifier's computational complexity. However, we will incorporate two more factors into our evaluation procedure to analyze the classifier performance in more detail. Thus, our evaluation will utilize the following factors:
• Feature Extraction Robustness: In order to highlight the capabilities of the AMI classifier's feature extraction stage to produce highly differentiated features, we can visualize the scatter map of the extracted features, i.e., the learned node embeddings for our GGCNN-AMI classifier. The t-SNE method with learning rate, perplexity and step size equal to 200, 50 and 5000, respectively, was used to carry out the visualization procedure for the extracted features [43]. Through t-SNE, the high dimensionality of the feature space from the last GCNN block is reduced into a 2-dimensional space for visualization. We created two groups of modulation schemes: {256QAM, 128QAM and 64QAM} representing the difficult class, and {8PSK, QPSK and BPSK} representing the normal class. Each will be plotted at SNR = 5 dB. It is important to note that through t-SNE, the distance between any two points in the 2-dimensional output represents the similarity of their respective feature vectors in the high-dimensional space.
• Probability of Correct Classification (PCC): This factor is directly related to the classification accuracy of the classifier, and hence is determined by the fraction of the number of correctly classified received signals over the total number of received signals that were classified. In our evaluation, in order to more broadly investigate and analyze the performance of the AMI classifiers, we selected both the highest- and lowest-order digital modulation schemes available within the RadioML dataset, i.e., 256QAM and BPSK, in order to determine the lower and upper performance bounds of an AMI classifier's PCC. It can therefore be stated that the performance of all these AMI classifiers for any other modulation scheme found in the RadioML dataset is bounded by these two limits.
• Execution Latency (τ): The execution time of the AMI classifier, which is a key contributor to the overall execution latency of the receiver, was selected as an expression representing the computational complexity of the AMI classifier design. Since the training and validation processes of the AMI models are assumed to be carried out before their deployment in the environment, we separately assess the computational complexity for training+validation and classification in terms of execution latency. In order to ensure a fair comparison regardless of the computing power of the implementation platform, we independently normalize the execution latency for training+validation and classification by their respective fastest performing model.
• Efficiency (ρ): We define the efficiency of an AMI classifier as the tradeoff between its probability of correct classification and its execution latency. In other words, this factor shows us the fraction of what an AMI classifier achieves over what it spends. Hence, we can mathematically define the efficiency ρ for any modulation scheme as the ratio of the average probability of correct classification across the entire evaluated SNR range over the execution latency, for τ ∈ {training+validation, classification}, as:

$$\rho_{\{\tau\}} = \frac{\frac{1}{N} \sum_{SNR} PCC_i^{SNR}}{\tau}, \qquad (25)$$

where $i \in M$ and $N$ are the set of all modulation schemes to be investigated and the number of SNR values, respectively. For this evaluation, $M = \{\text{BPSK}, \text{256QAM}\}$ and $N = 40$. A minimal computational sketch of (25) is given below.
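The sketch below implements (25) directly: the PCC values across the N evaluated SNR points are averaged and divided by the execution latency; the PCC curve shown is synthetic and purely illustrative.

```python
# Efficiency metric rho per eq. (25).
import numpy as np

def efficiency(pcc_per_snr, latency):
    """rho = mean PCC over the evaluated SNR range / execution latency tau."""
    return float(np.mean(pcc_per_snr)) / latency

snr_points = np.arange(-20, 20)                            # N = 40 SNR values
pcc = np.clip(0.5 + 0.0125 * (snr_points + 20), 0.0, 1.0)  # synthetic PCC curve
print(efficiency(pcc, latency=1.25))                       # normalized latency
```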
C. Efficiency-Maximization Approach for Our Architecture

Now that the efficiency of an AMI classifier is defined, and before we proceed to present the evaluation of our GGCNN-AMI model compared to literature works, we need to establish the most efficient architecture configuration of our GGCNN-AMI model. In the first step, we determine how many GCNN blocks are needed to efficiently extract deep correlative information from the received samples. However, before determining this, we need to ensure the correctness of our model design, our dataset and our approach to parameter initialization. To achieve this verification, we can observe whether the trend of the training and validation accuracy and loss over different training iterations follows the expected trend, i.e., the accuracy increases and the loss decreases with each iteration. We perform this verification stage by selecting 3 GCNN blocks, ELU as the activation function and the no-pooling operation, and then training this model architecture for 16QAM at SNR = 0 dB. The results of this test are shown in Fig 4. From the results we can indeed confirm the expected behavior of our model during training. It should be noted that all the curves in this paper have been interpolated and smoothed for better visibility.

Fig. 4. Training and validation accuracy and loss for our designed model during the model verification process (3 GCNN blocks, ELU as activation function, no-pooling, 16-QAM, 0 dB SNR).

From Fig 4, we can observe that as the number of iterations increases, the training and validation accuracy also increases, while the training and validation loss decreases.

Now that we are assured that every element in the model's architecture works as expected, we can proceed with determining the most effective number of GCNN blocks to be used in the final architecture. To accomplish this, we can simply investigate the PCC gain versus the execution latency increase while stepwise increasing the number of GCNN blocks. Fig 5 shows the average PCC for all modulation schemes for these tests using 1 through 4 GCNN blocks. It should be noted that since at this point we have not yet conducted our investigation into determining the best choice of activation function and pooling operations, we continue to use ELU and no-pooling for determining the best number of GCNN blocks.

Fig. 5. Average PCC based on the number of GCNN blocks.

As can be seen from Fig 5, increasing the number of GCNN blocks from 1 to 2 and from 2 to 3 results in a significant improvement of the PCC, by on average 15.45 and 17.09 percentage points (p.p.) across the entire range of SNR. However, having 4 GCNN blocks does not exhibit any substantial PCC gain. We observe only a gain of 2.81 p.p. on average compared to using 3 GCNN blocks, especially in the lower SNR values, which are the focus of this paper. Investigating the execution latency, shown in Table III, alongside the results in Fig 5 provides confirmation that the selection of 3 GCNN blocks for the final model's architecture provides the most effective and efficient performance outcome.

TABLE III
EXECUTION LATENCY FOR 1-4 GCNN BLOCKS IN SECONDS

Based on Table III, adding a GCNN block incurs a relative increase of 8% in execution latency. This increase in execution latency justifies the PCC improvement for up to 3 GCNN blocks. Hence, we select 3 GCNN blocks for our proposed model's architecture.

In the second step of determining the efficient architecture, we evaluate different options for the activation function selection in the adjacency matrix, as shown in Fig 2 and equation (10), and for different pooling operations within the graph convolutional block, as shown in Fig 3 and equation (22). For this purpose, as mentioned earlier, we study four different activation functions (ReLU, LeakyReLU, GeLU and ELU) and three different pooling operations (max-, average- or no-pooling), resulting in 12 combinations that can be used in our model. To determine the most effective combination, we first need to investigate each combination's probability of correct classification. We conduct this evaluation by averaging across all available modulation schemes within our dataset. This allows us to include any modulation-specific effects in this selection process. The resulting PCC averaged across modulation schemes can be seen in Fig 6, plotted for all 12 aforementioned activation+pooling combinations.

Fig. 6. Probability of correct classification for the 12 combinations of activation functions and pooling operations after averaging across all modulation schemes.

We can observe from these results that purely in terms of accuracy, the combination of LeakyReLU + No Pooling
achieved the highest overall results. However, in order to obtain the efficiency of each combination of activation function and pooling operation, we also need to consider the execution latency. The execution latency for training+validation and classification is computed and can be seen in Table IV. For ease of comparison we normalized the execution latency using the lowest achieved latency.

TABLE IV
NORMALIZED EXECUTION LATENCY OF 12 COMBINATIONS TO DETERMINE EFFICIENT ARCHITECTURE

This leads to the computed efficiency for each combination, which can be seen in Table V.

TABLE V
COMPUTED EFFICIENCY FOR EACH AFOREMENTIONED COMBINATION

As can be seen from Table IV and Fig 6, the max-pooling operation demonstrates a higher probability of correct classification as well as higher efficiency with any of the activation functions compared to the average-pooling operation. This connects very well to the training of the free parameters in the graph convolutional block and their relationship with the global vertices in the Laplacian space, as explained in the previous section. Additionally, selecting LeakyReLU as the activation function and max-pooling as the pooling operation achieves the highest efficiency among all evaluated combinations. Hence, this architecture is adopted for our performance analysis and comparison against other literature works, which is shown in the next section.

IV. PERFORMANCE ANALYSIS AND DISCUSSION

In this section, we compare the performance of our GGCNN-AMI model against various models, including the CNN-based models discussed in section I-A as related works [20], [21], [22], [23], [24], [25]. Our comparison also extends to the neural network models referred to as VGG and RN implemented in [41]. The CNN-based models were selected because they exhibit a capability of learning convolutional features from the input data similar to GCNN-based models. The models in [41] were also selected because they have a very high referral rate within the AMI research community. We reconstructed their models and validated this implementation by comparing the number of trainable parameters and the trend of their results for the PCC. In our evaluation, we adjusted their training procedure as explained in the previous section. In order to ensure the correct reconstruction of these models as well as the adjusted training procedure, the training and validation accuracy and losses of these models are plotted in Fig 7 for 16QAM at SNR = 0 dB, similar to Fig 4.

Fig. 7. Training and validation accuracy and losses of the comparison models.

As we can see from Fig 7, the training and validation accuracy increases as the number of trained epochs increases, while the training and validation loss decreases, as expected.

Next, we will investigate and compare the performance of our model against the comparison models based on the evaluation factors introduced in the previous section.

A. Extracted Feature Visualization

One of the factors that can provide intuition for the robustness of a feature extraction module or stage is visualizing the extracted features to determine how distinct these features are. The more distinct the extracted features of a class or group are, the more robust the feature extraction stage is. This has a direct impact on the classification accuracy, since better separation aids in distinguishing between feature classes. Visualization of the extracted features is conducted by using the t-SNE method, as described earlier. Fig 8 shows the plotted scatter map visualization of the extracted features for our GGCNN-AMI model.
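The t-SNE projection described in section III can be sketched as follows, using the stated learning rate (200), perplexity (50) and 5000 optimization steps; the embedding matrix here is a random placeholder standing in for the last GCNN block's features.

```python
# t-SNE scatter-map visualization of extracted features (cf. Fig. 8).
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
features = rng.standard_normal((600, 64))      # placeholder feature vectors
labels = np.repeat(np.arange(6), 100)          # six modulation classes

emb = TSNE(n_components=2, learning_rate=200, perplexity=50,
           n_iter=5000).fit_transform(features)  # newer sklearn uses max_iter
plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=4)
plt.savefig("tsne_scatter_map.png")
```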
Fig. 8. Scatter map visualization of the proposed GGCNN-AMI classifier.

As can be seen in Fig 8, the centers of the extracted features show significant separation from one another. This further helps with grouping the extracted features and leads to higher classification accuracy of the GGCNN-AMI classifier. We can observe that the extracted features for the lowest-order modulation schemes in both the difficult and normal classes are entirely separated from the other extracted features. For the higher-order modulation schemes in both classes, the separation of the extracted features is less pronounced but still observable. Such groupings cannot be observed, however, when evaluating the comparison models' approaches, as demonstrated in Fig 9.

Fig. 9. Scatter map visualization of the extracted features of each comparison model's approach.

The LSTM-CNN model, shown in Fig 9(e), is the only one capable of clustering the extracted features closer to one another. Yet, the coverage areas of these features significantly overlap and are far less separable, which visually indicates difficulty for the classifier in determining any decision-making boundaries for classification. On the other hand, the AMRGCNN model, which has a similar operating principle to our proposed GGCNN-AMI model, shows some capability in clustering the extracted features for the BPSK and QPSK modulation schemes. This can be seen in Fig 9(b).

Based on this analysis, the feature extraction module designed in this research demonstrates its capacity for grouping and separating the extracted features. This capability is the foundation of the improvement in the AMI classifier's classification accuracy, directly impacting the PCC, which will be evaluated in the following subsection.

B. Probability of Correct Classification

As mentioned in the previous section, this evaluation of the PCC is presented utilizing the lower and upper bounds of the PCC, corresponding to the PCC obtained for 256QAM and BPSK, respectively. The results of this analysis are shown in Fig 10.

Fig. 10. Classification accuracy of the proposed GGCNN-AMI classifier compared to related works for lower and upper bounds.

The curves are interpolated for more convenient observation of each curve's trend. We will separately analyze and discuss the lower and upper bounds.

1) Lower Performance Bound Discussion: As can be observed from Fig 10(a), the proposed GGCNN-AMI classifier outperforms all other AMI classifiers over the entire studied SNR range. For the lower SNR range (SNR < 5 dB), as a quantitative evaluation, the proposed GGCNN-AMI classifier achieves on average a 15.38 p.p. higher classification accuracy for the 256QAM modulation scheme compared to all other AMI classifiers, which all follow a very similar trend. This is a very substantial improvement for low-SNR environments. For the upper SNR range (SNR >= 5 dB), the improvement achieved by the proposed GGCNN-AMI classifier further increases compared to the other AMI classifiers. We can clearly observe a steeper slope in accuracy improvement as the SNR increases. This allows the proposed GGCNN-AMI classifier to achieve 100% accuracy at around SNR = 26 dB, whereas all other AMI classifiers achieve this accuracy level only at SNR = 30 dB or higher. For the same SNR range, numerical assessment shows that the proposed GGCNN-AMI classifier achieves on average a 26.41 p.p. higher classification accuracy compared to all other AMI classifiers.

2) Upper Performance Bound Discussion: For the lower range of SNR (SNR < 5 dB), observing Fig 10(b) demonstrates that the proposed GGCNN-AMI classifier outperforms the other approaches by on average 14.18 p.p. higher accuracy compared to all other AMI classifiers. Similar to the previous discussion, this is also a significant improvement in low-SNR environments. For the SNR range between 5 dB and 15 dB, we can observe similar performance gains. For the upper SNR range (SNR > 15 dB), the performance difference begins to shrink, since all AMI classifiers tend to reach their highest accuracy.

In summary, Table VI indicates the average percentage point improvement of the proposed GGCNN-AMI classifier compared to the other tested classifiers, for the lower and upper performance bounds in two SNR ranges: low-SNR (SNR < 5 dB) and high-SNR (SNR >= 5 dB).

TABLE VI
AVERAGE PERFORMANCE GAIN OF THE PROPOSED GGCNN-AMI CLASSIFIER COMPARED TO OTHER CLASSIFIERS

Fig. 11. Receiver operating characteristic (ROC), decision threshold versus detection probability, and precision versus recall plots for the proposed GGCNN model compared to related works.

Analyzing the metrics in Fig 11 helps explain the higher performance of the proposed GGCNN model. The area under the receiver operating characteristic (ROC) curve in Fig 11(a) for the proposed GGCNN model is intuitively larger than for the other models, indicating that the proposed GGCNN model exhibits more robustness in labeling each received signal frame's modulation scheme. This can also be observed in Fig 11(b), where the decision threshold versus detection probability is plotted. As the decision threshold increases, i.e., the threshold upon which the decision is made for labeling the received signal's modulation scheme, the detection probability for the other models drops more dramatically than for the proposed GGCNN model. This means that the other models label the received signal's modulation scheme with less confidence. For example, the Light-CNN model never labels a received signal's modulation scheme with a probability equal to or higher than 90%, whereas the proposed GGCNN model labels 10% of the total labeled data with 100% probability. This scenario mostly happens for low-order modulation schemes at very high SNRs. Such higher performance can lead to a higher success of prediction, which can be seen in Fig 11(c), where the precision versus recall curves are plotted. As can be seen, the larger area under the curve for the proposed GGCNN model represents both high recall and high precision, where high precision relates to a low false-positive rate, and high recall relates to a low false-negative rate. High scores for both demonstrate that the proposed GGCNN model is returning accurate results (high precision) as well as returning a majority of all positive results (high recall).
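As a hedged illustration of how the Fig 11 metrics can be computed, the sketch below derives ROC, thresholded detection probability and precision-recall values for one class from placeholder scores; it is not the paper's evaluation harness.

```python
# ROC / detection-probability / precision-recall metrics (cf. Fig. 11).
import numpy as np
from sklearn.metrics import roc_curve, auc, precision_recall_curve

rng = np.random.default_rng(4)
y_true = rng.integers(0, 2, 2000)                       # 1 = target modulation
scores = np.clip(0.35 * y_true + 0.65 * rng.random(2000), 0.0, 1.0)

fpr, tpr, _ = roc_curve(y_true, scores)                 # Fig. 11(a)
print("ROC AUC:", auc(fpr, tpr))

for thr in (0.5, 0.7, 0.9):                             # Fig. 11(b)
    print(thr, (scores[y_true == 1] >= thr).mean())

precision, recall, _ = precision_recall_curve(y_true, scores)  # Fig. 11(c)
```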
6044 IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, VOL. 22, NO. 9, SEPTEMBER 2023

TABLE VI
AVERAGE P ERFORMANCE G AIN OF THE P ROPOSED GGCNN-AMI
C LASSIFIER C OMPARED TO OTHER C LASSIFIERS

other models, indicating that the proposed GGCNN model


exhibits more robustness in labeling each received signal
frame’s modulation scheme. This can also be observed in
Fig 11(b), where the decision threshold versus detection prob-
ability is plotted. As the decision threshold is increases, i.e.,
the threshold upon which the decision is made for labeling the
received signal’s modulation scheme, the detection probability
for other models drops more dramatically than for the pro-
posed GGCNN model. This means that other models label the
received signal’s modulation scheme with less confidence. For
example, the Light-CNN model never labels a received signal’s
modulation scheme equal to or higher than 90%, whereas
the proposed GGCNN model labels 10% of the total labeled
data with 100% probability. This scenario mostly happens for
low-order modulation schemes in very high SNRs. Such higher
performance can lead to higher success of prediction, which
can be seen in Fig 11(c) where the precision versus recall
curves are plotted. As can be seen, the higher area under
the curve for the proposed GGCNN model represents both
high recall and high precision, where high precision relates
to a low false-positive rate, and high recall relates to a low
false-negative rate. High scores for both demonstrate that the
Fig. 10. Classification accuracy of the proposed GGCNN-AMI classifier proposed GGCNN model is returning accurate results (high
compared to related works for lower and upper bounds.
precision) as well as returning a majority of all positive results
(high recall).
C. Execution Latency

Based on the approach for evaluating the execution latency of AMI classifiers described in the previous section, the normalized execution latency of the investigated AMI classifiers, computed separately for training + validation and classification, can be seen in Table VII.

For the normalized training + validation and classification execution latency, the proposed GGCNN-AMI classifier achieves a 37.62% lower normalized execution latency compared to TwoStep-CNN, a 23.55% lower latency compared to LSTM-CNN, and an 11.14% reduction in latency compared to RN. On the other hand, it resulted in a 91.5%, 27.67%, 18.21%, 23.95% and 6.39% higher execution latency compared to the Light-CNN, DR-CNN, SPWVD-CNN, VGG and AMRGCNN models, respectively. We will numerically show that even though the proposed GGCNN-AMI classifier has a higher execution latency than some AMI classifiers, it has significantly higher efficiency and is thus a more suitable choice for real-time implementations.
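As a hedged illustration of how such a table could be produced, the sketch below assumes one plausible normalization: dividing each model's measured latency by the slowest model's latency so that all values fall in (0, 1]. The timing harness and the numbers shown are placeholders, not the authors' benchmark.

import time

def time_phase(run_phase, repeats=3):
    """Average wall-clock time of one phase (training + validation or
    classification) of a given classifier; run_phase is any callable."""
    start = time.perf_counter()
    for _ in range(repeats):
        run_phase()
    return (time.perf_counter() - start) / repeats

def normalize(latencies):
    """Scale each model's latency by the slowest model's latency."""
    slowest = max(latencies.values())
    return {model: t / slowest for model, t in latencies.items()}

# Placeholder values in seconds, for illustration only:
print(normalize({"GGCNN": 41.2, "TwoStep-CNN": 66.1, "Light-CNN": 21.5}))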


Fig. 11. Receiver operating characteristic (ROC), decision threshold versus detection probability, and precision versus recall plots for the proposed GGCNN model compared to related works.

TABLE VII
NORMALIZED EXECUTION LATENCY OF EVALUATED AMI CLASSIFIERS

D. Efficiency

Based on the definition of AMI classifier efficiency presented in the previous section, Fig. 12 shows the computed ρ for each AMI classifier based on the lower and upper performance bounds, shown separately for training + validation and classification.
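The exact definition of ρ is given in the preceding section. Purely for illustration, the sketch below assumes one common form, accuracy gained per unit of normalized execution latency; the function and the values used are placeholders, not the paper's formula.

def efficiency(accuracy, normalized_latency):
    """rho as assumed here: accuracy / normalized execution latency."""
    return accuracy / normalized_latency

# Placeholder usage: a model at 0.92 accuracy with 0.65 normalized latency.
print(efficiency(0.92, 0.65))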
Fig. 12(a) shows that the efficiency of the proposed GGCNN-AMI classifier is on average 63% and 87.84% higher than the comparison models' training + validation and classification efficiency, respectively. Fig. 12(b) indicates an average efficiency improvement of 43.23% and 49% for training + validation and classification efficiency, respectively, for the proposed GGCNN-AMI classifier compared to all other investigated comparison models.

Fig. 12. Computed efficiency for each AMI classifier based on lower and upper bounds for their respective training + validation and classification execution latency.
To summarize, the efficiency of the proposed GGCNN-AMI classifier is on average 75.42% and 46.11% higher for the lower and upper bounds, respectively, when considering both training + validation and classification execution latency, compared to all comparison models. In other words, our GGCNN approach achieves on average a 60.78% higher efficiency, which provides a compelling argument justifying the incurred increase in execution latency. In particular, the high efficiency of GGCNN for the lower bound confirms that the proposed GGCNN-AMI classifier was very successful in improving AMI classifier performance for low-SNR environments while not significantly increasing computational complexity.

V. CONCLUSION

In this research, we proposed a novel architecture to maximize the efficiency of feature-based AMI in low-SNR environments. The proposed GGCNN-AMI classifier includes a feature extraction module, where embedded signaling-based and temporal features are extracted and appended to the original received signal. Additionally, a sequence of graph convolutional neural networks is implemented to extract the graph-based features. Considering the significant effect of the activation function and pooling operation on the performance of the AMI classifier, we adopted an efficient architecture


through investigating all combinations of activation functions and pooling operations considered in this research. Visualization of the scatter maps of the extracted features showed the robustness of our designed feature extraction module in producing highly distinguishable features, which directly impacts the decision boundaries of the AMI classifier. The proposed GGCNN-AMI classifier exhibits a significant improvement of 20.89 p.p. and 16 p.p. in low-SNR environments for 256-QAM and BPSK, respectively, representing the lower and upper performance bounds. We also evaluated the execution latency and efficiency of our proposed GGCNN-AMI classifier and compared them against those of other state-of-the-art published AMI classifiers. We showed that the proposed GGCNN-AMI classifier achieves a significant efficiency improvement of, on average, 60.78% compared to all AMI classifiers investigated as part of this research.

ACKNOWLEDGMENT

This study is being conducted at the University of Nebraska-Lincoln by the faculty and students at the Advanced Telecommunications Engineering Laboratory (www.TEL.unl.edu).


Michael Hempel (Member, IEEE) received the Ph.D. degree in computer engineering from the University of Nebraska–Lincoln, Nebraska. He is currently working as a Research Assistant Professor at the Advanced Telecommunication Engineering Laboratory (TEL), University of Nebraska–Lincoln. He has authored or coauthored more than 150 publications in major international journals and conferences. His research interests include wireless communication protocol design and performance analysis, wireless multimedia services, and distributed computing. For his research in networking, he has also been developing various network simulation and analysis solutions for streaming media and WiFi/WiMAX technologies. He has served as a TPC member for numerous international conferences.

Honggang Wang (Fellow, IEEE) is currently a Professor at the University of Massachusetts (UMass) Dartmouth. His research interests include wireless health, body area networks, cyber security, mobile multimedia and cloud, wireless networks and cyber-physical systems, and big data in mHealth. He has also been serving as the Editor-in-Chief for IEEE INTERNET OF THINGS JOURNAL.

Pejman Ghasemzadeh (Graduate Student Member, IEEE) received the M.Sc. degree in telecommunications engineering from the University of Nebraska–Lincoln, Lincoln, NE, USA, in 2020, where he is currently pursuing the Ph.D. degree in computer engineering with the Department of Electrical and Computer Engineering. He has been a Research Assistant at the Advanced Telecommunication Engineering Laboratory (TEL). He has authored a number of technical research papers covering topics in wireless communications, signal processing, backscatter communications, and machine learning in IEEE/ASME TRANSACTIONS, journals, and international conferences. He has also served as a technical reviewer for several IEEE TRANSACTIONS, journals, and international conferences.

Hamid Sharif (Fellow, IEEE) is currently the Charles J. Vranek Professor with the Department of Electrical and Computer Engineering, University of Nebraska–Lincoln (UNL), where he is also the Director of the Advanced Telecommunication Engineering Laboratory (TEL). He has published about 400 research papers in national and international journals and conferences. He has been serving on many IEEE and other international journals' editorial boards. He was a recipient of a number of research and best paper awards. He is currently a Distinguished Lecturer of the IEEE Vehicular Technology Society.
