IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, VOL. 19, 2022, Art. no. 5504205

Graph Sample and Aggregate-Attention Network for Hyperspectral Image Classification

Yao Ding, Xiaofeng Zhao, Zhili Zhang, Wei Cai, and Nengjun Yang

Abstract— Graph convolutional networks (GCNs) have shown potential in hyperspectral image (HSI) classification. However, the GCN is a transductive learning method, which makes it difficult to aggregate new nodes. The available GCN-based methods also fail to capture the global and contextual information of the graph. To address these deficiencies, a novel semisupervised network based on graph sample and aggregate-attention (SAGE-A) is proposed for HSI classification. Different from GCN-based methods, SAGE-A adopts a multilevel graph sample and aggregate (graphSAGE) network, which can flexibly aggregate new neighbor nodes in arbitrarily structured non-Euclidean data and capture long-range contextual relations. Inspired by the self-attention mechanism of convolutional neural networks (CNNs), the proposed network uses a graph attention mechanism to characterize the importance of spatially neighboring regions, so the deep contextual and global information of the graph can be learned automatically by focusing on important spatial targets. Extensive experimental results on different real hyperspectral data sets demonstrate the performance of the proposed method compared with state-of-the-art methods.

Index Terms— Global and contextual information, graph convolutional neural network, hyperspectral image (HSI) classification.

Manuscript received February 8, 2021; accepted February 22, 2021. Date of publication March 15, 2021; date of current version December 28, 2021. This work was supported in part by the National Natural Science Foundation of China under Grant 41404022 and in part by the National Natural Science Foundation of Shanxi Province under Grant 2015JM4128. (Xiaofeng Zhao is co-first author.) (Corresponding author: Xiaofeng Zhao.) The authors are with the Xi'an Research Institute of High Technology, Xi'an 710000, China (e-mail: xife_zhao@163.com). Digital Object Identifier 10.1109/LGRS.2021.3062944

I. INTRODUCTION

HYPERSPECTRAL images (HSIs) provide detailed spectral information through hundreds of (narrow) spectral channels, which can be used to accurately classify diverse materials of interest [1], [2]. However, the increased dimensionality of such data poses a challenge to conventional techniques, so hyperspectral classification has great research value.

In the past few decades, significant effort has been devoted to HSI classification, which can be summarized into two categories: traditional methods and neural network methods. Traditional methods have focused on exploring more discriminative feature representations, such as morphological and texture features [3]. Apart from these, subspace learning, sparse learning algorithms, and machine learning methods such as random forests and support vector machines (SVMs) [4], [5] have received great attention in the community. However, traditional methods have defects in feature extraction completeness and may suffer from overfitting because of the deficiency in training samples.

Deep learning has achieved great success in many applications, and it has also greatly promoted the technological progress of HSI classification. For example, in [6] and [7], HSIs were classified using convolutions of different dimensions; Liu et al. [8] introduced a spectral–spatial feature extraction method based on the long short-term memory (LSTM) network; Zhang et al. [9] used image semantic context to classify HSIs; He et al. [10] adopted residual networks to learn the spatial and spectral characteristics of the image to improve the classification rate; and in [11], an unsupervised spectral–spatial feature extraction network was proposed. However, a convolutional neural network (CNN) needs a large number of training labels and a large amount of computation. Moreover, the CNN only performs convolution on regular regions, and the size of the CNN convolution kernel is fixed, which leads to missing edge information in the process of feature extraction [12].

To ameliorate these issues, extensive research has been conducted on classification using graph convolutional networks (GCNs). The GCN conducts semisupervised learning on graph-structured data and can operate on graph signals directly via a variant of CNNs. Sha et al. [13] applied the graph attention network to hyperspectral classification. Mou et al. [14] proposed a nonlocal graph convolution network, which constructs a graph by calculating the relationships between nonadjacent pixels to improve the classification accuracy. Hong et al. [15] proposed a graph convolution classification method combining GCN and CNN to increase the classification accuracy. Wan et al. [16] used a multiscale graph convolutional network to extract multiscale graph features. Wan et al. [17] adopted a context-aware mechanism to learn the local context of the graph. The mentioned methods are GCN-based. However, the GCN is a transductive, whole-graph training method, which makes it difficult to aggregate new nodes and incurs a huge amount of computation.

The main contributions of this letter are as follows: 1) incorporation of sample and aggregate (SAGE), for the first time, to extract contextual relations among superpixels; 2) utilization of a multilevel graph projection and flexible reprojection framework to extract long-range contextual relations and produce truthful local-region features; and 3) adoption of attention-mechanism graph refinement to characterize global and contextual relations and accurately find precise region representations.

II. RELATED WORK

Many researchers have published methods for classifying HSIs. In this part, we mainly introduce graph neural network (GNN) methods, which are closely related to our work.

A. Spectral Graph Convolution

For a graph $G = (V, E)$, the Laplacian matrix $L$ is expressed as

$$L = D - A \tag{1}$$

where $V$ represents the vertex set, $E$ represents the edge set, $D$ is the degree matrix of the graph, and $A$ is the adjacency matrix of the graph.

The corresponding symmetrically normalized Laplacian $L_n$ is

$$L_n = I - D^{-1/2} A D^{-1/2} \tag{2}$$

where $I$ is an identity matrix.

To conduct node embedding for $G$, a filter $g_\theta = \mathrm{diag}(\theta)$ parameterized by $\theta \in \mathbb{R}^N$ is defined, which can be expressed as the multiplication of a signal $x$ (a scalar for each node) with $g_\theta$ in the Fourier domain, that is,

$$g_\theta * x = U g_\theta(\Lambda) U^T x \tag{3}$$

where $U$ is the matrix of eigenvectors of $L_n$, computed from $L_n = U \Lambda U^{-1}$, and $\Lambda$ is the diagonal matrix of eigenvalues of $L_n$. Thus $g_\theta$ can be understood as a function of the eigenvalues of $L_n$, that is, $g_\theta(\Lambda)$. However, evaluating (3) requires explicitly computing the eigendecomposition. We can approximately fit $g_\theta(\Lambda)$ using a truncated expansion in terms of Chebyshev polynomials up to the $K$th order, which leads to

$$g_\theta(\Lambda) \approx \sum_{k=0}^{K} \theta_k T_k(L_n) \tag{4}$$

where $T_k$ denotes the Chebyshev polynomials.

We can then simplify (4) by taking $K = 1$, the first-order Chebyshev approximation, with the largest eigenvalue $\lambda_{\max} \approx 2$. Equation (4) can be rewritten as

$$g_\theta * x \approx \theta\left(I_N + D^{-1/2} A D^{-1/2}\right) x = \theta\, \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} x. \tag{5}$$

With the activation layer $\sigma$, the GCN propagation rule is as follows:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} W^{(l)}\right) \tag{6}$$

where $H^{(l+1)}$ and $H^{(l)}$ denote the values of the $(l+1)$th and $l$th layers, respectively, and $W^{(l)}$ is the weight matrix.
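As an illustration of propagation rule (6), the following is a minimal PyTorch sketch of a single GCN layer. The class name GCNLayer, the dense adjacency representation, and the use of ReLU for $\sigma$ are our illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One GCN layer: H^(l+1) = sigma(D~^{-1/2} A~ D~^{-1/2} H^(l) W^(l))."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)  # W^(l)

    def forward(self, h, adj):
        # adj: dense (N, N) adjacency matrix A; adding I gives A~ with self-loops
        a_tilde = adj + torch.eye(adj.size(0), device=adj.device)
        d_inv_sqrt = a_tilde.sum(dim=1).pow(-0.5)             # D~^{-1/2} as a vector
        norm_adj = d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]
        return torch.relu(self.linear(norm_adj @ h))          # sigma taken as ReLU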
B. Spatial Graph Convolution

Following the traditional CNN operation on images, the spatial-based GNN defines the graph convolution operator based on the spatial relationships of a node. The image is regarded as a special graph in which each pixel represents a node; because the adjacent nodes have a fixed order, the training weights can be shared across different local spaces. The spatial-based GNN has better efficiency, flexibility, and versatility than the GCN. For details of these algorithms, the reader may refer to the relevant papers.
III. PROPOSED METHOD

In this section, we present SAGE-attention (SAGE-A) for HSI classification (Fig. 1). It is mainly composed of three parts: pixel-to-region assignment (Section III-A), contextual relation refinement (Sections III-B and III-C), and region-to-pixel assignment (Section III-D).

Fig. 1. Overview of the SAGE-A network. (a) HSI. (b) Superpixels segmented by the SLIC algorithm. (c) Circles and lines represent the superpixels (graph nodes) and edges, respectively; different colors of the nodes represent different land-cover types, and the input of the network is the spectral characteristics of each node. (d) Multilevel new-node embedding using SAGE and the attention mechanism. (e) Output of the network.

A. Pixel-to-Region Assignment

An HSI contains a large number of pixels in the spatial dimension, so a huge, sometimes unacceptable, amount of computation is needed for classification. To ameliorate this issue, we note that neighboring pixels have a large probability of belonging to the same land-cover type. Therefore, simple linear iterative clustering (SLIC) is adopted to segment the entire image into a small number of local regions, such that the pixels within each region have a strong spectral–spatial similarity. Concretely, SLIC conducts image region segmentation by iteratively growing local clusters using a k-means algorithm. In this letter, the local regions are treated as graph nodes, which significantly reduces the number of graph nodes and improves the computational efficiency. The average spectral signature of the pixels in a node (local region) is taken as the feature vector of that node.
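To make the pixel-to-region assignment concrete, a sketch of the superpixel feature extraction is given below, assuming scikit-image's slic (with the channel_axis argument of recent versions for multiband input) as a stand-in for the authors' unspecified SLIC implementation:

```python
import numpy as np
from skimage.segmentation import slic

def pixels_to_regions(hsi, n_segments=30000):
    """Segment an HSI cube (H, W, B) with SLIC and return the region map
    plus the average spectral signature of each region as its node feature."""
    # n_segments=30000 mirrors the segment scale S reported in Section IV-D
    segments = slic(hsi, n_segments=n_segments, channel_axis=-1)
    labels = np.unique(segments)
    features = np.stack([hsi[segments == s].mean(axis=0) for s in labels])
    return segments, features  # (H, W) label map, (n_regions, B) node features
```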
B. GraphSAGE

Traditional GCN models assume that adjacent pixels on a graph are more likely to share the same label, so label information can be propagated from labeled to unlabeled samples via graph Laplacian regularization. However, transductive learning requires all nodes to participate in training to obtain node embeddings, so it cannot quickly produce embeddings for new nodes. Therefore, the GCN can only learn information about neighboring nodes and cannot naturally generalize to unseen vertices. In this letter, a graph sample and aggregate (graphSAGE) algorithm is adopted to learn information at more spatial scales, which improves the generalization ability of the model for new nodes. The forward propagation rule is expressed as Algorithm 1.

Algorithm 1 GraphSAGE Embedding Generation (i.e., Forward Propagation)
Input: graph $G = (V, E)$; input features $\{x_v, \forall v \in V\}$; number of network layers $K$; weight matrices $W^k, \forall k \in \{1, \ldots, K\}$; nonlinearity $\sigma$; mean aggregator function AGG; neighborhood function $N : v \to 2^V$
Output: vector representations $z_v$ for all $v \in V$
1: $h_v^0 \leftarrow x_v, \forall v \in V$
2: for $k = 1, \ldots, K$ do
3:   for $v \in V$ do
4:     $h_{N(v)}^k \leftarrow \mathrm{AGG}(\{h_u^{k-1}, \forall u \in N(v)\})$
5:     $h_v^k \leftarrow \sigma(W^k \cdot \mathrm{CONCAT}(h_v^{k-1}, h_{N(v)}^k))$
6:   end for
7:   $h_v^k \leftarrow h_v^k / \|h_v^k\|_2, \forall v \in V$
8: end for
9: $z_v \leftarrow h_v^K, \forall v \in V$

In Algorithm 1, $K$ is the number of layers of the network, which also represents the number of hops of neighbors that can be aggregated at each vertex, because each additional layer aggregates information from neighbors one hop farther away. Here $h_u^{k-1}$ is the feature vector of node $u$, $\{h_u^{k-1}, \forall u \in N(v)\}$ denotes the embeddings of the neighbors $u$ of node $v$ at the $(k-1)$th layer, and $h_{N(v)}^k$ represents the aggregated characteristics of all neighbors of node $v$ at the $k$th level. In this letter, the aggregator function is the mean aggregator $\mathrm{AGG} = \left(\sum_{u \in N(v)} h_u^{k-1}\right) / |N(v)|$. As Algorithm 1 shows, at each iteration, or search depth, the nodes collect information from their local neighbors, and as the process iterates, the nodes gradually obtain more and more information from farther reaches of the graph. Thus, long-range contextual relations are extracted.
to analyze the association degree of the learned information.
C. Graph Attention

In our experiments, we find that the degree of association between different nodes differs. To better extract the global and contextual information, a graph attention mechanism is added to the network so that important node information receives greater weight. The graph attention mechanism can obtain global geometric features by calculating the relationship between any two nodes in the graph. To obtain the corresponding transformation between the input and the output, the output features must be obtained from the input characteristics by at least one linear transformation. A weight matrix $W \in \mathbb{R}^{F' \times F}$, which relates the $F$ input features to the $F'$ output features, is trained for all nodes. Node-to-node correlation can then be learned through the network layer

$$e_{ij} = \mathrm{LeakyReLU}\left(a^T \left[ W h_i \,\|\, W h_j \right]\right). \tag{7}$$

Equation (7) expresses the importance of node $j$ to node $i$, where $a \in \mathbb{R}^{2F'}$ is the parameter vector of the network, $\|$ denotes the concatenation operation, and $\mathrm{LeakyReLU}(\cdot)$ is a nonlinear layer.

Then, $e_{ij}$ is normalized and converted to a probability output $a_{ij}$ through a softmax function

$$a_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(a^T \left[ W h_i \,\|\, W h_j \right]\right)\right)}{\sum_{k \in N_i} \exp\left(\mathrm{LeakyReLU}\left(a^T \left[ W h_i \,\|\, W h_k \right]\right)\right)}. \tag{8}$$

Therefore, the graph convolution output of each node can be expressed as

$$h_i^l = \sigma\left(\sum_{j \in N_i} a_{ij} \cdot W^T h_j^{l-1}\right) \tag{9}$$

where $\sigma$ denotes the activation function and $a_{ij}$ is the learned attention weight.

Fig. 2. Schematic of the graph attention mechanism [see (9)]. $l$ denotes the $l$th layer of the network; different colors of the nodes represent different land-cover type features.

Fig. 2 shows the working process of the graph attention mechanism in SAGE-A. By learning the importance weight of each node for the node being classified, the graph attention mechanism gives the important nodes greater weight; hence, global and contextual information can be learned from the graph via the attention mechanism.
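Equations (7)–(9) together amount to a single-head graph attention layer. A sketch follows; masking the pairwise scores with the adjacency matrix so that the softmax in (8) runs only over each neighborhood $N_i$, and the random initialization of $a$, are our implementation choices:

```python
import torch
import torch.nn as nn

class GraphAttention(nn.Module):
    """Single-head graph attention implementing (7)-(9)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)        # W in (7)
        self.a = nn.Parameter(torch.randn(2 * out_dim) * 0.1)  # a in (7)
        self.leaky_relu = nn.LeakyReLU(0.2)

    def forward(self, h, adj):
        wh = self.W(h)                                  # (N, F')
        n = wh.size(0)
        # e_ij = LeakyReLU(a^T [W h_i || W h_j]) for all node pairs, as in (7)
        pairs = torch.cat([wh.unsqueeze(1).expand(n, n, -1),
                           wh.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = self.leaky_relu(pairs @ self.a)             # (N, N) scores
        # softmax over each neighborhood N_i as in (8); adj should have self-loops
        att = torch.softmax(e.masked_fill(adj == 0, float('-inf')), dim=1)
        return torch.relu(att @ wh)                     # weighted aggregation of (9)
```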
D. Region-to-Pixel Assignment

Multiscale information has been widely proved to be very useful for HSI classification [18], [19]. Ground objects have different geometric features, and multilevel feature extraction can fully learn the contextual information of the image. The network uses multilayer graphSAGE to learn the relationships between superpixels at different scales, as in Algorithm 1. Fig. 3 demonstrates the 1-hop and 2-hop neighbors of a central example A to illustrate the multilevel design.

Fig. 3. Network multilevel information learning.

The receptive field of A at scale $s$ is formed as

$$H^s(x_i) = H^1\left(H^{s-1}(x_i), x^{s-1}\right) \tag{10}$$

where $H^0(x_i) = x_i$ and $H^1(x_i)$ is the new node embedding of the 1-hop neighbors of $x_i$. Considering that the degree of information association differs between nodes, we use (10) to analyze the association degree of the learned information. The network output is expressed as

$$O = A\left(H^s(x_i)\right) \tag{11}$$

where $A$ is the attention mechanism and $O$ is the output of SAGE-A. In our network, the cross-entropy error is adopted to penalize the difference between the network output and the labels of the originally labeled examples, namely

$$L = -\sum_{z \in y_G} \sum_{f=1}^{C} Y_{zf} \ln O_{zf} \tag{12}$$

where $y_G$ is the set of labeled examples, $C$ denotes the number of classes, and $Y_{zf}$ is the label matrix. The details of our SAGE-A are shown in Algorithm 2. The input features of SAGE-A are the average spectral signatures of the graph nodes, which enables the network to process the spectral information of the HSI. At the same time, the SAGE method processes the spatial relationships of the nodes in the graph, so the model can learn the long-range spatial information of the HSI, and the graph attention mechanism processes the overall information of the graph to learn the global and contextual information of each node.
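The loss in (12) is an ordinary cross-entropy restricted to the labeled node set $y_G$. A sketch follows, where logits stands for the network output O and labels/labeled_mask are assumed bookkeeping arrays, not names from the authors' code:

```python
import torch.nn.functional as F

def sage_a_loss(logits, labels, labeled_mask):
    """Cross-entropy of (12) over the labeled example set y_G only.

    logits: (N, C) raw network outputs; labels: (N,) class indices;
    labeled_mask: (N,) boolean mask marking the labeled nodes.
    F.cross_entropy applies log-softmax internally, which matches
    -sum Y_zf ln O_zf when O is the softmax of the logits.
    """
    return F.cross_entropy(logits[labeled_mask], labels[labeled_mask])
```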
IV. EXPERIMENTAL RESULTS

A. Data Set Description and Implementation

Two real hyperspectral data sets, Pavia University (PU) and Houston 2013, are adopted to verify the classification performance of the proposed method. The first data set, PU, contains 610 × 340 pixels and 103 bands; excluding a large number of background pixels, 42 776 pixels can be used for classification. The whole map contains nine kinds of land cover. The second data set, Houston 2013, was used in the 2013 Geoscience and Remote Sensing Society (GRSS) Data Fusion Contest. The scene is composed of 144 spectral bands and 349 × 1905 pixels, which are divided into 15 categories.
B. Experimental Setting

For the two HSI data sets described in Section IV-A, 30 labeled pixels in each class are randomly selected for network training, and the remaining unlabeled pixels are used for network testing. The hyperparameter selection for our SAGE-A is shown in Algorithm 2.

Algorithm 2 Proposed SAGE-A for HSI Classification
Input: input image; number of epochs T = 2000; learning rate 0.0001; number of graph convolutional layers L = 3; dropout 0.2; Adam gradient descent; implementation in Python 3.8 and PyTorch 1.6.0
1: Segment the whole image into superpixels via the SLIC algorithm;
2: Extract the superpixel input features (average spectral signatures);
3: // Construct the graph and train the SAGE-A model
4: for t = 1 to T do
5:   Compute the graph convolution node features;
6:   Perform graph learning on the spatial features of adjacent points by Algorithm 1;
7:   Batch normalization, dropout, and ReLU;
8:   Perform graph learning at the spatial level of adjacent and farther points by Algorithm 1;
9:   Batch normalization and ReLU;
10:  // Graph convolution attention mechanism
11:  Perform graph learning at the global level by (9) and output the spatial and spectral features;
12:  Calculate the error term according to (12) and update the weight matrices using Adam gradient descent;
13: end for
14: Conduct label prediction based on the trained network;
Output: predicted label for each pixel.
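Assembled from the sketches above, the training procedure of Algorithm 2 might look roughly as follows. The layer widths, the omission of batch normalization, and the variables features, neighbors, adj, labels, and labeled_mask are all illustrative assumptions; only the epoch count, learning rate, and dropout rate come from Algorithm 2:

```python
import torch

# Components from the earlier sketches; widths assume the 103-band,
# nine-class PU scene and are not taken from the paper.
model = torch.nn.ModuleDict({
    'sage1': SAGELayer(103, 128),     # 1-hop aggregation (Algorithm 2, lines 6-7)
    'sage2': SAGELayer(128, 128),     # 2-hop aggregation (lines 8-9)
    'att':   GraphAttention(128, 9),  # global attention level, Eq. (9)
})
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
dropout = torch.nn.Dropout(0.2)

for epoch in range(2000):             # T = 2000
    h = dropout(model['sage1'](features, neighbors))
    h = model['sage2'](h, neighbors)
    logits = model['att'](h, adj)     # spatial and spectral features, global level
    loss = sage_a_loss(logits, labels, labeled_mask)
    optimizer.zero_grad()
    loss.backward()                   # error term of Eq. (12) drives the Adam update
    optimizer.step()
```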

To validate the performance of SAGE-A, six other state-of-the-art image classification methods are used for comparison. Specifically, our network is compared with two CNN-based methods, that is, the convolutional autoencoder (CAE) [11] and the convolutional recurrent neural network (CRNN) [8], and two GCN-based methods, that is, the context-aware dynamic graph convolutional network (CAD-GCN) [17] and the multiscale dynamic graph convolutional network (MDGCN) [16]. Meanwhile, two traditional machine learning methods are also adopted, namely, the multiband compact texture unit (MBCTU) [3] and joint collaborative representation and SVM with decision fusion (JSDF) [4]. Overall accuracy (OA), average accuracy (AA), the kappa coefficient (κ), and per-class accuracy are adopted as evaluation indices.
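For reference, the three summary metrics can be computed from a confusion matrix as follows. This is a standard sketch of the textbook definitions, not tied to the authors' evaluation code:

```python
import numpy as np

def classification_metrics(confusion):
    """OA, AA, and kappa from a (C, C) confusion matrix (rows = true classes)."""
    total = confusion.sum()
    oa = np.trace(confusion) / total                        # overall accuracy
    per_class = np.diag(confusion) / confusion.sum(axis=1)  # per-class accuracy
    aa = per_class.mean()                                   # average accuracy
    pe = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / total ** 2
    kappa = (oa - pe) / (1 - pe)                            # Cohen's kappa
    return oa, aa, kappa
```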
TABLE I. ACCURACY COMPARISONS FOR THE PU SCENE. BOLD NUMBERS INDICATE THE BEST PERFORMANCE.

TABLE II. ACCURACY COMPARISONS FOR THE HOUSTON 2013 SCENE. BOLD NUMBERS INDICATE THE BEST PERFORMANCE.

C. Comparisons With Other Methods

From Tables I and II, we conclude that the proposed SAGE-A achieves better results than the other state-of-the-art models in OA, AA, and κ, which validates the effectiveness of the proposed multilevel graphSAGE network with an attention mechanism. It is also notable that the GCN-based methods perform better than MBCTU, JSDF, CAE, and CRNN. This is because the graph convolutional network can learn the relations among neighboring nodes automatically, which is suitable for classification with limited labeled training samples. However, the GCN is a transductive learning method, which makes it difficult to aggregate new nodes. In other words, the GCN-based methods have difficulty learning the long-range contextual information of the graph, whereas our proposed SAGE-A aggregates nodes (superpixels) at different levels by adjusting the node-embedding layers of the network. The employment of an attention mechanism enables SAGE-A to automatically learn the global and contextual information of the graph.

D. Impact of Parameters/Hyperparameters

Many significant parameters/hyperparameters must be tuned in the proposed SAGE-A architecture. In this experiment, the sensitivity of the classification performance to different hyperparameter settings is evaluated in detail.

Fig. 4 demonstrates the classification performance of the seven algorithms as the number of labeled examples (i.e., pixels) used for training is varied. We vary the number of labeled examples per class from 5 to 30 with an interval of 5 and report the OA achieved by the seven algorithms on the PU and Houston 2013 data sets. From the results, we find that the OA of each method on the PU and Houston 2013 data sets improves significantly as the number of labeled examples per class increases. Moreover, the proposed SAGE-A model performs better than the competing algorithms throughout, which shows the effectiveness of multilevel spatial information for HSI classification. Furthermore, the proposed SAGE-A can automatically learn global contextual features based on the classified land cover, which is more robust than using a precomputed fixed graph.

Fig. 4. OAs of various methods under different numbers of labeled examples per class. (a) University of Pavia data set. (b) Houston 2013 data set.

The impact of the number of convolution layers $l$ and the segmentation scale $S$ on the two data sets is revealed in Fig. 5. We can conclude that both $l$ and $S$ have a significant impact on the classification accuracies. Meanwhile, the best result is usually reached with three convolution layers: the multilevel design is able to learn more spatial information at a larger scale, but the characteristics learned at greater iteration or search depth have an inhibitory effect on the classification because of their low correlation. For $S$, the classification accuracy increases as the segmentation scale increases. However, the amount of calculation also increases exponentially, which may be unacceptable under limited experimental conditions. In our proposed method, the segmentation scale $S$ is 30 000, which reaches the limits of our computing resources.

Fig. 5. Parametric sensitivity of $l$ and $S$. (a) PU data set. (b) Houston 2013 data set.

E. Ablation Study

In this experiment, we investigate the ablative effect of the SAGE-based attention mechanism. For the sake of comparison, we record the classification results produced without the attention mechanism and denote the simplified model as "SAGE." The experimental setting is kept identical to that of Section IV-B. The comparative results are shown in Table III. As the table shows, the SAGE-based attention mechanism plays an important role in improving learning efficiency.

TABLE III. OA, AA (%), AND KAPPA COEFFICIENT ACHIEVED BY DIFFERENT MODEL SETTINGS ON THE PU AND HOUSTON 2013 DATA SETS.

V. CONCLUSION

In this letter, a novel SAGE-A network for HSI classification is proposed. To extract long-range contextual relations, we go beyond the regular image grid by adopting a pixel-to-region (superpixel) assignment and further encode the contextual relations among local regions, so that nodes that are far apart in the 2-D space can be connected by the multilevel SAGE. Moreover, we learn the importance weight of each node for the node being classified; therefore, the global and contextual relations among pixels can be gradually refined, and local-region features can be represented precisely.
REFERENCES

[1] B. Rasti et al., "Feature extraction for hyperspectral imagery: The evolution from shallow to deep: Overview and toolbox," IEEE Geosci. Remote Sens. Mag., vol. 8, no. 4, pp. 60–88, Dec. 2020, doi: 10.1109/MGRS.2020.2979764.
[2] P. Zhong, Z. Gong, and J. Shan, "Multiple instance learning for multiple diverse hyperspectral target characterizations," IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 1, pp. 246–258, Jan. 2020.
[3] K. Djerriri, A. Safia, R. Adjoudj, and M. S. Karoui, "Improving hyperspectral image classification by combining spectral and multiband compact texture features," in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Jul. 2019, pp. 465–468.
[4] C. Bo, H. Lu, and D. Wang, "Hyperspectral image classification via JCR and SVM models with decision fusion," IEEE Geosci. Remote Sens. Lett., vol. 13, no. 2, pp. 177–181, Feb. 2016.
[5] L. Wang, S. Hao, Q. Wang, and Y. Wang, "Semi-supervised classification for hyperspectral imagery based on spatial-spectral label propagation," ISPRS J. Photogramm. Remote Sens., vol. 97, pp. 123–137, Nov. 2014.
[6] W. Hu, Y. Huang, L. Wei, F. Zhang, and H. Li, "Deep convolutional neural networks for hyperspectral image classification," J. Sensors, vol. 2015, pp. 1–12, Jul. 2015.
[7] K. Makantasis, K. Karantzalos, A. Doulamis, and N. Doulamis, "Deep supervised learning for hyperspectral data classification through convolutional neural networks," in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Jul. 2015, pp. 4959–4962.
[8] Q. Liu, F. Zhou, R. Hang, and X. Yuan, "Bidirectional-convolutional LSTM based spectral-spatial feature learning for hyperspectral image classification," Remote Sens., vol. 9, no. 12, p. 1330, Dec. 2017.
[9] M. Zhang, W. Li, and Q. Du, "Diverse region-based CNN for hyperspectral image classification," IEEE Trans. Image Process., vol. 27, no. 6, pp. 2623–2634, Jun. 2018.
[10] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 1–9.
[11] R. Kemker and C. Kanan, "Self-taught feature learning for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 55, no. 5, pp. 2693–2705, May 2017, doi: 10.1109/TGRS.2017.2651639.
[12] D. Hong, N. Yokoya, J. Chanussot, and X. X. Zhu, "An augmented linear mixing model to address spectral variability for hyperspectral unmixing," IEEE Trans. Image Process., vol. 28, no. 4, pp. 1923–1938, Apr. 2019.
[13] A. Sha, B. Wang, X. Wu, and L. Zhang, "Semisupervised classification for hyperspectral images using graph attention networks," IEEE Geosci. Remote Sens. Lett., vol. 18, no. 1, pp. 157–161, Jan. 2021.
[14] L. Mou, X. Lu, X. Li, and X. X. Zhu, "Nonlocal graph convolutional networks for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 58, no. 12, pp. 8246–8257, Dec. 2020, doi: 10.1109/TGRS.2020.2973363.
[15] D. Hong, L. Gao, J. Yao, B. Zhang, A. Plaza, and J. Chanussot, "Graph convolutional networks for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., early access, Aug. 18, 2020, doi: 10.1109/TGRS.2020.3015157.
[16] S. Wan, C. Gong, P. Zhong, B. Du, L. Zhang, and J. Yang, "Multiscale dynamic graph convolutional network for hyperspectral image classification," IEEE Trans. Geosci. Remote Sens., vol. 58, no. 5, pp. 3162–3177, May 2020.
[17] S. Wan, C. Gong, P. Zhong, S. Pan, G. Li, and J. Yang, "Hyperspectral image classification with context-aware dynamic graph convolutional network," IEEE Trans. Geosci. Remote Sens., vol. 59, no. 1, pp. 597–612, Jan. 2021, doi: 10.1109/TGRS.2020.2994205.
[18] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, Jun. 2014.
[19] S. Zhang and S. Li, "Spectral-spatial classification of hyperspectral images via multiscale superpixels based sparse representation," in Proc. IEEE IGARSS, Jul. 2016, pp. 2423–2426.
