A Deep Learning approach to Hyperspectral Image Classification using an improved Hybrid 3D-2D Convolutional Neural Network


DIMITRA KOUMOUTSOU
National Technical University of Athens, Department of Electrical and Computer Engineering, Athens, Greece
dkoumoutsou@gmail.com

ELENI CHAROU
National Centre for Scientific Research “Demokritos”, Institute of Informatics & Telecommunications, Athens, Greece
exarou@iit.demokritos.gr

ABSTRACT
In recent years, the task of Hyperspectral Image (HSI) classification has appeared in various fields, including Remote Sensing. Meanwhile, the evolution of Deep Learning and the prevalence of the Convolutional Neural Network (CNN) have revolutionized the way unstructured, especially visual, data are processed. 2D CNNs have proved highly efficient in exploiting the spatial information of images, but in HSI classification the data contain both spectral and spatial features. To make use of these characteristics, many variations of a 3D CNN have been proposed, but a 3D convolution comes at a high computational cost. A fusion of 3D and 2D convolutions decreases processing time by distributing spectral-spatial feature extraction across a lighter, less complex model. An enhanced hybrid network architecture is proposed alongside a data preprocessing plan, with the aim of achieving a significant improvement in classification results. Four benchmark datasets (Indian Pines, Pavia University, Salinas and Data Fusion 2013 Contest) are used to compare the model to other hand-crafted or deep learning architectures. It is demonstrated that the proposed network outperforms state-of-the-art approaches in terms of classification accuracy, while avoiding some commonly used, computationally expensive design choices.

CCS CONCEPTS
• Computing Methodologies; • Machine Learning; • Machine Learning Algorithms; • Spectral Methods; • Machine Learning Approaches; • Neural Networks; • Applied Computing; • Computers in other domains; • Agriculture;

KEYWORDS
Hyperspectral Image, Classification, Deep Learning, Convolutional Neural Network, Remote Sensing

ACM Reference Format:
DIMITRA KOUMOUTSOU and ELENI CHAROU. 2020. A Deep Learning approach to Hyperspectral Image Classification using an improved Hybrid 3D-2D Convolutional Neural Network. In 11th Hellenic Conference on Artificial Intelligence (SETN 2020), September 02–04, 2020, Athens, Greece. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3411408.3411462

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
SETN 2020, September 02–04, 2020, Athens, Greece
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-8878-8/20/09. . . $15.00
https://doi.org/10.1145/3411408.3411462

1 INTRODUCTION
Hyperspectral Imaging is a combination of imaging and spectroscopy technology. Hyperspectral data are acquired over multiple spectral bands, but unlike multispectral imaging, the measurement is performed at hundreds of contiguous and narrow wavelength intervals. The spectral resolution of a sensor is the narrowest bandwidth in which it can measure radiation, and the spatial resolution is the smallest distance between two distinct objects that can be resolved. Together they determine the accuracy of the features that can be extracted. HSI has emerged in Remote Sensing due to its property of containing both spatial and spectral high-resolution information. Since hyperspectral sensors capture the light reflected by objects in different spectra, which is characteristic of different materials [1], the abundance of information in these data can ameliorate the detection of objects and land-cover types for classification applications. Hyperspectral imaging is a promising technology with contributions in numerous branches of science other than Remote Sensing, including, but not limited to, environmental monitoring, medical diagnosis, food analysis, agriculture, mineralogy and chemical imaging [2].

Although it would seem that denser information collected by sensors will result in more accurate classification outcomes, this is not always the case with hyperspectral data. The Curse of Dimensionality, or Hughes effect, was first mentioned by Richard Bellman in 1961 in his work on Dynamic Programming. The Curse of Dimensionality is the term used to describe the issue of “an increase in sparseness and dissimilarity of data” [3, 4]. The problem is that an increase in dimensionality will actually deteriorate classification, affecting clustering and the categorization of objects, materials or targets in general. In HSI, the features generated by continuous spectral bands can be so many that the available training samples are inadequate and the trained model cannot distinguish one feature from another. Consequently, we encounter a very common issue in Machine Learning called overfitting, meaning the inability of a classifier to generalize, which results in poor classification outcomes.

In order to harvest the vast amount of information stored in hyperspectral data, while avoiding the downsides, the branch of mathematics called Dimensionality Reduction (DR) comes to our aid. Since many channels in a hyperspectral image are correlated and/or noisy, redundant data could be eliminated from the datasets. The challenge of Dimensionality Reduction is to dispose of unnecessary or low-quality information, while retaining all of the significant features about the targets of interest. In Remote Sensing HSI applications, Dimensionality Reduction is used as a preprocessing step, mainly in two forms: feature extraction and band selection.


The first transforms the original image onto a lower-dimensional space through a projection, while the latter selects a subset of the original bands using various criteria.

The rest of the paper is organized in 4 sections. Section 2 describes previous work on DR and classification techniques. Section 3 presents the methodology, with a detailed outline of the proposed preprocessing steps and the overall network. In Section 4, experiments are carried out and discussed and, finally, in Section 5 the paper concludes on the proposed approach.

2 PREVIOUS WORK
As previously mentioned, band selection identifies the most information-dense channels from the original data. The selected bands retain all their physical characteristics, unlike feature extraction algorithms. Unsupervised band selection can be ranking-based, meaning that the employed criteria consider band information and properties to select the most information-dense ones. Principal Component Analysis (PCA) has been utilized to rank bands according to minimum variance [5]. Another approach based on Independent Component Analysis (ICA) picks bands by comparing the average absolute weight coefficient of independent components [6].

Feature extraction can be supervised or unsupervised. PCA is widely used as an unsupervised method in HSI applications [7–9]. ICA, also unsupervised, was studied in different variations by Falco et al. [10], including fastICA, INFOMAX, JADE and RobustICA, giving good results, especially the last one. Supervised feature extraction was implemented with Linear Discriminant Analysis (LDA) by Joy et al. [11] and was compared to PCA and a fusion of both. The best classification outcomes were achieved by PCA alone, followed by LDA alone, followed by their combination. Depending on whether labelled data are available as training samples, classification can be either supervised or unsupervised. In this paper we focus on supervised methods, which are more precise since they utilize class-specific information [12].

Neural Networks have been used in HSI classification for many years with great success, from basic applications in the 1990s [13] to more complex ones [8]. Another widely applied method is the Support Vector Machine (SVM), for the reason that it can produce accurate results in high-dimensional data even in the absence of large training datasets. Since SVMs originated in linear classification, kernels have been introduced to handle nonlinear problems, with the most common in Remote Sensing being the Gaussian RBF [12]. SVMs are extremely popular in HSI classification [14] and are commonly used when preprocessing methods are tested [3, 11], as simple but reliable classifiers to evaluate those studies. In recent years, neural networks with more layers have emerged, called Deep Neural Networks (DNNs), and constitute a promising area of study called Deep Learning. DNNs are models that automate the feature construction process, otherwise performed by hand, by extracting higher-level features from lower-level ones. Apart from some fine-tuning, the learning process is regulated by the DNN, which determines on its own how feature learning will be distributed across its layers in an optimal way. In sophisticated applications, with large datasets or complex unstructured data types, it is rarely easy for humans to adequately determine which features are of importance. For that reason, we observe a significant improvement in classification outcomes from Deep Neural Networks when compared to hand-crafted machine learning methods. Different versions of DNNs have been exploited in Remote Sensing, but the CNN stands out. A CNN is a subset of DNNs with convolutional layers used where hidden layers would be in a conventional neural network. Convolutions performed on each layer's input are the way in which the layer learns the features. A CNN can be 1D, 2D or 3D depending on the type of kernel used in each layer. A 1D convolutional layer is a convolution with a 1D filter that can only move across one dimension of the input data, which could be useful for example in time-series applications. 2D convolutional layers can learn spatial features across a 2D object, for example an image with pixels (x,y). For that reason, the 2D CNN is a standard method for conventional imaging applications [15]. In the same fashion, in a 3D CNN the kernel moves across 3 directions, enabling simultaneous spectral-spatial feature extraction, for example in 3D or multidimensional imaging.

Based on the characteristics described above, CNNs are extremely promising in multiple fields like language processing [16], especially when visual information is of interest (medical applications [17, 18], image classification [19], etc.). In Remote Sensing, many CNN models have been proposed. A 2D-CNN was proposed by Makantasis et al. [20] for spatial classification, which showed great potential. However, plenty of data in the spectral domain were left unused. Soon, 3D-CNNs outperformed previous methods [21, 22], because a 3D network is able to handle both spectral and spatial information. But despite their excellent classification results, 3D-CNNs come at a high computational cost, so new, “lighter” techniques had to be discovered [23].

Recently, a state-of-the-art CNN approach was proposed by Roy et al. [24]. The model is called HybridSN and is a 3D-2D CNN, with the 3D convolution performing the joint spectral-spatial feature recognition, followed by a 2D convolution to provide more depth for the spatial features. It is notable not only for its accuracy, but also for its decreased computational complexity compared to classic 3D-CNNs. Here, we propose an improved version of the original HybridSN approach. The purpose of our work is two-fold: firstly, to present the altered CNN model, justifying each step and proposition; secondly, to discuss different preprocessing methods and carry out experiments to conclude on a full pre-processing plan that will achieve accurate and efficient classification. Finally, a conclusion is drawn for the optimal classification method, with detailed explanations of the parameters that affect performance, and our improved accuracy is confirmed on several reference datasets.

3 METHODOLOGY
An outline of the network's architecture is shown in Figure 1(a) and the overall classification steps in Figure 1(b). Specific parameters are discussed below, alongside an introduction to the different dimensionality reduction techniques which will be tested and evaluated in Section 4.


Figure 1: Graphical outline of the proposed model for HSI classification: (a) 3D-2D CNN architecture; (b) preprocessing, training and evaluation steps.

3.1 Feature Extraction
The PCA transformation, as previously mentioned, is the prevalent unsupervised feature extraction method in HSI data pre-processing. PCA employs global second-order statistics to derive Principal Components (PCs), which are linear transformations that contain almost all of the total variance. A very important property of PCA is whitening, meaning that it decorrelates the original data. Whitening the data is crucial because of the sparseness of HSI. Unwhitened data contain redundant and highly correlated bands that increase computational time and decrease accuracy due to the Curse of Dimensionality explained above.

ICA, also unsupervised, searches for a linear transformation that minimizes the statistical dependence between components, and Independent Components (ICs) are derived that, like PCA, project the original data to a lower-dimensional feature space. The idea behind ICA is that it can extract ICs that represent distinct materials in the original image, solely by transforming the mixture signal directly from the sensor. A common approach in the literature when using ICA in Remote Sensing HSI is to first perform PCA [25, 26]. As mentioned, other dimensionality reduction techniques prior to ICA feature extraction were thoroughly studied in [10]. In our paper, both PCA and ICA are studied, as well as their combinations. Applying PCA first is expected to improve ICA performance for the following reasons. Firstly, the low-variance components contained in the original data are discarded. Secondly, PCA whitens the input to the ICA algorithm, fixing the issue of highly correlated bands deteriorating ICA results. Lastly, apart from being decorrelated, the resulting components will be independent, noise will be suppressed, and more patterns can emerge from distinctive materials expressed through the ICs [27]. In the following section we consider two different ICA variations. One is the very common FastICA and the second is CoroICA, a novel method proposed by [28]. To the best of our knowledge, CoroICA has not been evaluated on hyperspectral data before.

3.2 Proposed 3D-2D Model
The hyperspectral image is represented as a 3D tensor of size (H, W, C), with H denoting the height, W the width and C the number of spectral bands. The first preprocessing step is one of the aforementioned dimensionality reduction techniques, which aims to decrease spectral complexity while preserving all of the spatial features, and will be evaluated in the following section. After projecting the tensor to a lower-dimensional space, its shape becomes (H, W, N), with N denoting the number of components used in the reduction step.

The next step is patch extraction. Here, overlapping patches are used, which are small 3D cubes centered at a pixel (x,y) of the hyperspectral plane, formed when an SxS cube is extracted around those pixels. Each patch is assigned the label of its central pixel (x,y) and is then given as an input to our CNN. As a tensor, a patch of size S is represented as having (S, S, N) dimensions and thus contains both spectral and spatial information. The patch size is crucial to the performance of the network and will be discussed in Section 4. After converting the original HSI to overlapping patches, the dataset is split into 60% testing and 40% training data; the training data will be further split into training and validation subsets.
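The following is a minimal sketch of these two preprocessing steps (spectral reduction and patch extraction), assuming a NumPy data cube of shape (H, W, C), a ground-truth map of shape (H, W) in which 0 marks unlabeled pixels, and scikit-learn's PCA; it is illustrative and not the code used for the experiments.

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_spectral(cube, n_components=30):
    """Project an (H, W, C) hyperspectral cube to (H, W, N) with whitened PCA."""
    H, W, C = cube.shape
    flat = cube.reshape(-1, C)                     # one spectrum per pixel
    pca = PCA(n_components=n_components, whiten=True)
    return pca.fit_transform(flat).reshape(H, W, n_components)

def extract_patches(cube, labels, patch_size=13):
    """Cut overlapping SxS patches; each patch takes the label of its centre pixel."""
    margin = patch_size // 2
    padded = np.pad(cube, ((margin, margin), (margin, margin), (0, 0)), mode="reflect")
    patches, targets = [], []
    for r in range(cube.shape[0]):
        for c in range(cube.shape[1]):
            if labels[r, c] == 0:                  # assume 0 marks unlabeled background
                continue
            patches.append(padded[r:r + patch_size, c:c + patch_size, :])
            targets.append(labels[r, c] - 1)       # shift class ids to start at 0
    return np.asarray(patches), np.asarray(targets)
```

Under these assumptions, a stratified 60/40 test/train split (for example with scikit-learn's train_test_split) would then be applied to the returned patches before the oversampling step described next.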


Then, a new step is proposed for preprocessing the training data: Oversampling of Weak Classes. Oversampling weaker classes aims to determine which classes have fewer training samples and increase their presence in the training set. This method is expected to be highly beneficial for all hyperspectral datasets, due to the fact that they usually lack labeled data, but especially for those that present significant class imbalance. Class imbalance may often lead to a few prevailing classes taking over the dataset, leaving weaker classes unrepresented. The number of samples for each label is computed and sorted from maximum to minimum. Then, classes are selected in reverse order, from weaker to stronger, and their samples are repeated and stored at a random place in the dataset. The implementation is based on the code of [29].

Our CNN model is based on HybridSN [24], but a few distinctive alterations have been made. Our approach is described below, with details and explanations on the proposed, adjusted CNN.

The first layer of our model is a 3D convolutional layer of 8 filters, with a kernel of size 3x3x5, and is responsible for an initial, simultaneous spectral-spatial training on our data. Then, instead of using more KxKxN layers, we propose a spatially and spectrally separable 3D convolution. Thus, the second layer, of 16 filters, reduces the spatial dimension with a 3x3x1 filter, while extracting only spatial features, and the third layer, of 32 filters, reduces the spectral dimension with a 1x1x3 kernel. In this way, spectral and spatial features are learned in alternation. In this paper, it is shown that the method proposed above outperforms the original approach of using more spectral-spatial 3D convolution layers. Then, a 2D convolution follows with a 3x3 kernel and, finally, three Fully Connected Layers map the learned features to the number of classes for each dataset.

The ReLU activation function was used in all the layers for its combination of excellent performance in deep neural networks and low computational complexity when compared to alternatives such as the sigmoid [1]. Training is carried out using the Adam optimizer to optimize the CNN's weights. Adam was selected for its capacity to adjust the learning rate [7]. Also, Bera and Shrivastava [30] have shown that Adam outperforms all popular optimizers on deep neural networks for hyperspectral remote sensing classification. To avoid overfitting, dropout regularization is employed before each fully connected layer. When training the model, shuffling of the dataset after each epoch is applied to stabilize the optimization and avert repeating patterns. Also, the training set is divided into 70% training and 30% validation data. The optimal model does not necessarily correspond to the last epoch, so we employ checkpoints; the criterion is minimization of the loss on the validation set. The maximum number of epochs is set to 200, but an Early Stopping callback is utilized to stop training after 30 epochs of no recorded improvement in validation loss. This saves significantly on computational time, since most of the time CNNs require no more than 50 epochs to reach a satisfactory performance.
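As an illustration of the architecture and training setup just described, the Keras sketch below reproduces the layer sequence. It is a reconstruction, not the authors' released code: the input size (13x13xN with N=30), the 2D-convolution width (64 filters), the dropout rate and the dense-layer widths are assumptions made here so that the example is self-contained, since the description above does not fix all of them.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(patch_size=13, n_bands=30, n_classes=16, dropout=0.4):
    """Sketch of the proposed 3D-2D CNN: a 3x3x5 spectral-spatial layer, a separable
    3x3x1 (spatial) + 1x1x3 (spectral) pair, a 3x3 2D layer, then three dense layers."""
    inp = layers.Input(shape=(patch_size, patch_size, n_bands, 1))
    x = layers.Conv3D(8, (3, 3, 5), activation="relu")(inp)   # joint spectral-spatial
    x = layers.Conv3D(16, (3, 3, 1), activation="relu")(x)    # spatial-only 3D convolution
    x = layers.Conv3D(32, (1, 1, 3), activation="relu")(x)    # spectral-only 3D convolution
    # collapse the spectral axis into channels so that a 2D convolution can follow
    x = layers.Reshape((x.shape[1], x.shape[2], x.shape[3] * x.shape[4]))(x)
    x = layers.Conv2D(64, (3, 3), activation="relu")(x)       # 64 filters is an assumed width
    x = layers.Flatten()(x)
    x = layers.Dropout(dropout)(x)                            # dropout before each dense layer
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dropout(dropout)(x)
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(dropout)(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    model = models.Model(inp, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Under the training setup described above, model.fit would be called with shuffle=True, validation_split=0.3 and epochs=200, together with tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=30) and a ModelCheckpoint with save_best_only=True, so that the weights with the lowest validation loss are kept.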


3.3 Datasets
We used the 3 reference datasets used in the original paper [24] (Pavia University, Indian Pines, Salinas), plus the dataset provided in the IEEE (Institute of Electrical and Electronics Engineers) GRSS (Geoscience and Remote Sensing Society) Data Fusion Contest in 2013. A brief description follows for each one.

3.3.1 Pavia University. The Pavia University dataset contains a hyperspectral image of 610x340 pixels in the spatial dimensions and 103 spectral bands, ranging from 430 to 860 nm in wavelength. It was captured with the ROSIS sensor, with a Ground Sample Distance (GSD) of 1.3m, over the University of Pavia, Italy. It is comprised of 9 urban land-cover classes.

3.3.2 Indian Pines. The Indian Pines dataset consists of 145x145 pixels in the spatial dimension and 224 spectral bands, although the corrected version we use has 220, in the range of 400 to 2500nm. Data was captured with the AVIRIS sensor in North-Western Indiana, USA, with a GSD of 20m, and is labeled with 16 classes of vegetation.

3.3.3 Salinas. The Salinas dataset consists of a 512x217 hyperspectral image on 224 spectral bands, with wavelengths varying from 360 to 2500nm. It was captured with the AVIRIS sensor over Salinas Valley, California, with a GSD of 3.7m, and has 16 classes of labeled data.

3.3.4 Data Fusion Contest 2013 (DFC2013). The Data Fusion Contest 2013 (DFC2013) dataset consists of 144 spectral bands and 349x1905 spatial pixels. Spectral bands range from 380 to 1050 nm with a spectral resolution of 4.8nm. Data was captured over the University of Houston campus in Texas, United States, with a GSD of 2.5m [31].

4 EXPERIMENTS
All tests were run on Google Colaboratory, a platform by Google that provided 25GB of Random Access Memory (RAM) and the option to run on a Graphics Processing Unit (GPU), which is highly recommended for time-consuming Deep Learning projects. Our experiments focus on three preprocessing steps: patch size selection, the feature extraction algorithm, and oversampling of weak classes. For evaluation, we compute three metrics from the extracted confusion matrices. These are the Kappa coefficient (κ), a statistical measure of agreement; the Overall Accuracy (OA), the fraction of correctly classified samples over the total number of examples; and the Average Accuracy (AA), which evaluates the class-wise classification outcomes. To visualize the results, we also present a geographical display of each class, called a classification map.

4.1 Patch Size
The proposed CNN was used to test how different patch sizes affect classification results on all four datasets. We tried 10 different patch sizes, from 7x7 through to 25x25, for Indian Pines, Pavia University and Salinas, and 7 patch sizes for Data Fusion, from 7x7 to 19x19. The reason for not testing Data Fusion on bigger patches is computational complexity. Testing for patch size was conducted with Oversampling of Weak Classes, which will be discussed in the following experiment, and dimensionality reduction was performed with PCA. For Indian Pines and Data Fusion, the first 30 PCA components were selected, while on Pavia University and Salinas 10 principal components were used. The numbers selected are explained in the respective sub-section.

It is evident from the results in Table 1 that the optimal value for the patch size is 13. This result is in accordance with studies by Makantasis et al. [20] on a 2D CNN. Beyond this size, classification results often deteriorate, but even in instances where they remain roughly the same, or slightly improve, larger patch sizes should not be preferred due to their increased computational cost. These results can be justified considering that a larger patch size introduces more noise to our model; we found that the most useful contextual information lies in a 13x13 window around a target pixel. When smaller patches are considered, we observe a gradual improvement leading up to the optimum, probably because including neighboring pixels brings more abstract patterns that aid in determining complex features. Selecting a smaller patch size is an important step in decreasing preprocessing and training time. The original HybridSN model in [24] used a 25x25 patch size, therefore the fact that our proposed model achieves comparable or even superior classification accuracies with 13x13 patches is an important contribution.

4.2 Oversampling
Oversampling of weak classes, as described in Section 3, is proven to be an important pre-processing step. In Table 2 we observe the results from training the same model with 13x13 patches, as indicated by the results of the previous experiment. Classification results improved for all four datasets, with the most notable improvement being in Overall Accuracy, with a 0.65% improvement in Indian Pines and 0.29% in Data Fusion. On closer examination, we can see that classification outcomes outperformed the original method especially in cases where a larger number of weak classes was present. This is visible in the confusion matrices shown in Figure 2. For the Indian Pines dataset, where we have the most noticeable results, there are 4 classes with fewer than 100 samples, making their appearance in training rare and their features underrepresented, leading to misclassifications and an overall deterioration of the classification outcome. In Pavia University, although the Overall Accuracy is already very high, the three classes with fewer samples can be more adequately represented, and this is evident from the increase in class-wise accuracy (AA).

4.3 Feature Extraction
The feature extraction algorithms presented in Section 3 are tested in 5 different ways and the classification outcomes are observed. After reducing the spectral dimensions, all datasets are processed in the same way, with a patch size of 13x13, using Oversampling, and the model is trained for a maximum of 200 epochs, with a patience of 30 epochs on the validation loss. Since there are no solid studies that address the issue of component selection in feature extraction algorithms based on the application, dataset characteristics, etc., our choice is supported by experimental validation. For the Indian Pines dataset, the total number of components was fixed at 30, while Pavia University is trained with 10 components. These values not only result in satisfactory classification accuracies on our model, but were also proposed as optimal by Makantasis et al. [20] for the same datasets in their 2D CNN approach. The Salinas dataset is not considered in this part, since a perfect classification result (100%) has already been achieved, so there is no improvement we can expect to see.

The methods compared are the following: PCA on the input data; fastICA on the input data; CoroICA on the input data; PCA on the input data followed by fastICA with the same number of components on the PCA-transformed data; and PCA on the input data followed by CoroICA in the same fashion. In all cases, the total number of components remains the same. The results are displayed in Table 3, and it is evident that PCA outperforms all other methods. FastICA, although believed to have certain advantages over PCA, performed worse in our experiments. It is also clear that CoroICA could be applied to hyperspectral data, but compared to the other feature extraction methods it has the worst performance and should not be applied in HSI applications as is.

4.4 Overall Results
In Table 4 we present a comparison of our model's classification accuracies on three out of the four benchmark datasets against 4 other methods: SVM [14], 2D-CNN [20], 3D-CNN [32] and HybridSN [24]. Since the papers referenced for Indian Pines, Pavia University and Salinas do not include tests on the Data Fusion dataset, and the latter was released as part of a classification competition, we compare our approach to the winning paper, which had the highest Overall Accuracy in the classification task. We observe that our method outperformed all the hand-crafted and CNN approaches we considered, as well as the methodology proposed in [33] in the Data Fusion Contest. For a visual representation of the results, we use the classification map (Figure 3), which shows the geographic locations of the input data. We conclude that our model is comparable to the state-of-the-art methods in terms of classification accuracy, and very close to the ground truth, which is the annotated label value for each pixel.

In Table 5, we compare the computational times on the Indian Pines dataset for the original HybridSN model and our proposed model. To have comparable results for HybridSN, we used the code available in [24] and ran the model on Google Colab. Computational times were monitored for three phases: preprocessing of the Indian Pines dataset, training the model, and testing. Preprocessing time is similar, so the added preprocessing step of Oversampling Weak Classes, although computationally complex, was balanced out by the reduction of the patch size. In training and testing, the reduction of our model's computational complexity is clear. Training time was less than 50% of the original, while testing time was reduced by more than 80%. These results, alongside the improved accuracies shown in Table 4, demonstrate the contribution of our proposed method.

5 CONCLUSIONS
HSI classification is a computationally complex and challenging task. New sensors make hyperspectral data more widely available, thus exploiting the vast amount of information contained in these data is a promising area of study. In Remote Sensing, the CNN has facilitated land-cover applications with tremendous success. The proposed method focuses on tackling the issue of computational complexity, while performing remarkably well compared to state-of-the-art models. Experimental validation demonstrates that HSI Remote Sensing classification has taken excellent advantage of Deep Learning techniques. The methods developed provide an interesting insight into complex image classification applications. Future perspectives could investigate how those methods perform in interdisciplinary studies, or whether existing knowledge could be transferred to or incorporated by other fields of science. Several benchmark datasets were employed to facilitate the comparison to previous work, but in the future further testing could be carried out on different, newer datasets.


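For reference, the three metrics reported in the tables below (the Kappa coefficient, OA and AA defined in Section 4) can all be derived from a confusion matrix, as in the following sketch; it assumes scikit-learn and NumPy and is illustrative rather than the evaluation code used for the paper.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, cohen_kappa_score

def evaluate(y_true, y_pred):
    """Return (Kappa, Overall Accuracy, Average Accuracy) in percent."""
    cm = confusion_matrix(y_true, y_pred)
    oa = np.trace(cm) / cm.sum()              # correctly classified / total samples
    per_class = np.diag(cm) / cm.sum(axis=1)  # accuracy of each class separately
    aa = per_class.mean()                     # class-wise (average) accuracy
    kappa = cohen_kappa_score(y_true, y_pred) # agreement corrected for chance
    return 100 * kappa, 100 * oa, 100 * aa
```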
Table 1: Classification results using different patch sizes (Kappa, OA and AA in %; -: patch size not tested for Data Fusion)

Patch size   Indian Pines                 Pavia University             Salinas                      Data Fusion
             Kappa    OA       AA         Kappa    OA       AA         Kappa    OA       AA         Kappa    OA       AA
7x7          97.217   97.560   98.088     99.690   99.766   99.649     99.593   99.547   99.593     99.833   99.820   99.833
9x9          99.684   99.723   99.723     99.943   99.957   99.932     99.886   99.873   99.886     99.733   99.712   99.733
11x11        99.833   99.853   99.665     99.948   99.961   99.941     99.987   99.986   99.987     99.767   99.748   99.767
13x13        99.925   99.934   99.941     99.994   99.996   99.993     99.996   99.996   99.996     99.966   99.964   99.966
15x15        99.777   99.804   99.885     99.989   99.992   99.974     99.996   99.996   99.996     99.922   99.916   99.922
17x17        99.703   99.739   99.333     99.984   99.988   99.954     100.00   100.00   100.00     99.988   99.988   99.988
19x19        99.666   99.707   99.407     99.994   99.996   99.980     100.00   100.00   100.00     99.878   99.868   99.878
21x21        99.814   99.837   99.897     99.994   99.996   99.980     100.00   100.00   100.00     -        -        -
23x23        99.555   99.609   99.488     99.984   99.988   99.981     100.00   100.00   100.00     -        -        -
25x25        99.647   99.691   99.727     99.974   99.980   99.943     100.00   100.00   100.00     -        -        -

Table 2: Classification results with and without oversampling weak classes in data preprocessing

Oversampling   Indian Pines                 Pavia University             Salinas                      Data Fusion
               Kappa    OA       AA         Kappa    OA       AA         Kappa    OA       AA         Kappa    OA       AA
With           99.925   99.934   99.941     99.994   99.996   99.993     99.996   99.996   99.998     99.964   99.966   99.966
Without        99.184   99.284   98.916     99.994   99.996   99.991     99.993   99.993   99.993     99.652   99.678   99.601
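The Oversampling of Weak Classes step evaluated in Table 2 can be sketched as follows. Bringing every class up to the size of the largest one is an assumption made here for illustration, since Section 3.2 describes the ordering and random placement of the repeated samples but does not fix a target count; this is a sketch in the spirit of [29], not the paper's implementation.

```python
import numpy as np

def oversample_weak_classes(X, y, seed=0):
    """Repeat samples of under-represented classes, inserting the copies at
    random positions in the training set (weakest classes handled first)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()                         # assumed target per class
    X_out, y_out = list(X), list(y)
    for cls, cnt in sorted(zip(classes, counts), key=lambda t: t[1]):
        if cnt >= target:
            continue
        idx = np.flatnonzero(y == cls)
        for i in rng.choice(idx, size=target - cnt, replace=True):
            pos = rng.integers(0, len(X_out) + 1)  # store each copy at a random place
            X_out.insert(pos, X[i])
            y_out.insert(pos, y[i])
    return np.asarray(X_out), np.asarray(y_out)
```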

Table 3: Classification results using different feature extraction algorithms (PCA, FastICA, CoroICA) applied in several ways

Method          Indian Pines                 Pavia University
                Kappa    OA       AA         Kappa    OA       AA
PCA             99.925   99.934   99.941     99.994   99.996   99.993
fastICA         99.573   99.626   99.589     99.989   99.992   99.992
coroICA         95.474   96.032   96.626     99.302   99.474   99.165
fastICA – PCA   99.555   99.609   99.844     99.989   99.992   99.974
coroICA – PCA   98.052   98.292   98.371     98.987   99.236   99.092
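The reduction variants compared in Table 3 differ only in the transform applied to the spectra before patch extraction. For example, the PCA-followed-by-FastICA variant described in Sections 3.1 and 4.3 could be sketched with scikit-learn as below; this is an illustrative sketch, and CoroICA (distributed as a separate package by the authors of [28]) is not shown.

```python
from sklearn.decomposition import PCA, FastICA
from sklearn.pipeline import Pipeline

def pca_then_fastica(flat_pixels, n_components=30):
    """flat_pixels: (H*W, C) array of spectra. PCA first whitens and decorrelates
    the bands, then FastICA extracts the same number of independent components."""
    pipeline = Pipeline([
        ("pca", PCA(n_components=n_components, whiten=True)),
        ("ica", FastICA(n_components=n_components, max_iter=1000)),
    ])
    return pipeline.fit_transform(flat_pixels)
```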

Figure 2: Confusion Matrices for (a) Indian Pines and (b) Pavia University.


Figure 3: Classification map for the Indian Pines dataset: (a) ground truth; predicted images by (b) SVM, (c) 2D-CNN, (d) 3D-CNN, (e) HybridSN, (f) our model.

Table 4: Final classification results in percentages for the benchmark datasets

Method     Indian Pines              Pavia University          Salinas
           Kappa   OA      AA        Kappa   OA      AA        Kappa   OA      AA
SVM        83.10   85.30   79.03     92.59   94.34   92.98     92.11   92.95   94.60
2D CNN     87.96   89.48   86.14     97.16   97.86   96.55     97.08   97.38   98.84
3D CNN     89.98   91.10   91.58     95.51   96.53   97.57     93.32   93.96   97.01
HybridSN   99.71   99.75   99.63     99.98   99.98   99.97     100.0   100.0   100.0
Ours       99.93   99.93   99.94     99.99   99.99   99.99     100.0   100.0   100.0

           Data Fusion
Method           OA      Kappa
1st place [33]   94.43   93.96
Ours             99.99   99.99

Table 5: Computational time for the Indian Pines dataset

Model      Preprocessing (s)   Training (min)   Testing (s)
HybridSN   4.23                11.76            19.97
Ours       3.92                5.48             3.54
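The per-phase times in Table 5 can be collected with a simple wall-clock harness like the sketch below; the function and variable names in the commented usage are placeholders for illustration, not the paper's actual scripts.

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn, print the elapsed wall-clock time, and return fn's result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.2f} s")
    return result

# Hypothetical usage with placeholder names:
# patches, labels = timed("Preprocessing", preprocess, raw_cube, ground_truth)
# history = timed("Training", model.fit, X_train, y_train, validation_split=0.3)
# scores = timed("Testing", model.evaluate, X_test, y_test)
```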

REFERENCES
[1] Audebert, N. et al. 2019. Deep Learning for Classification of Hyperspectral Data: A Comparative Review. IEEE Geoscience and Remote Sensing Magazine. 7, 2 (Jun. 2019), 159–173. DOI:https://doi.org/10.1109/mgrs.2019.2912563.
[2] Khan, M.J. et al. 2018. Modern Trends in Hyperspectral Image Analysis: A Review. IEEE Access. 6 (2018), 14118–14129. DOI:https://doi.org/10.1109/access.2018.2812999.
[3] Agrawal, N. and Verma, K. 2020. Dimensionality Reduction on Hyperspectral Data Set. 2020 First International Conference on Power, Control and Computing Technologies (ICPC2T) (Jan. 2020).
[4] Bellman, R. 1961. Adaptive Control Processes: A Guided Tour. Princeton University Press, Princeton, NJ.
[5] van der Meer, F.D. et al. 2012. Multi- and hyperspectral geologic remote sensing: A review. International Journal of Applied Earth Observation and Geoinformation. 14, 1 (Feb. 2012), 112–128. DOI:https://doi.org/10.1016/j.jag.2011.08.002.
[6] Du, H. et al. 2003. Band selection using independent component analysis for hyperspectral image processing. 32nd Applied Imagery Pattern Recognition Workshop, 2003. Proceedings.
[7] Feng, Q. et al. 2019. Multisource Hyperspectral and LiDAR Data Fusion for Urban Land-Use Mapping based on a Modified Two-Branch Convolutional Neural Network. ISPRS International Journal of Geo-Information. 8, 1 (Jan. 2019), 28. DOI:https://doi.org/10.3390/ijgi8010028.
[8] Giampouras, P. et al. 2013. Artificial Neural Network Approach for Land Cover Classification of Fused Hyperspectral and Lidar Data. IFIP Advances in Information and Communication Technology. Springer Berlin Heidelberg. 255–261.
[9] Chen, Y. et al. 2014. Deep Learning-Based Classification of Hyperspectral Data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 7, 6 (Jun. 2014), 2094–2107. DOI:https://doi.org/10.1109/jstars.2014.2329330.
[10] Falco, N. et al. 2014. A Study on the Effectiveness of Different Independent Component Analysis Algorithms for Hyperspectral Image Classification. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 7, 6 (Jun. 2014), 2183–2199. DOI:https://doi.org/10.1109/jstars.2014.2329792.
[11] Joy, A.A. et al. 2019. A Comparison of Supervised and Unsupervised Dimension Reduction Methods for Hyperspectral Image Classification. 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE) (Feb. 2019).
[12] Ghamisi, P. et al. 2017. Advanced Spectral Classifiers for Hyperspectral Images: A review. IEEE Geoscience and Remote Sensing Magazine. 5, 1 (Mar. 2017), 8–32. DOI:https://doi.org/10.1109/mgrs.2016.2616418.
[13] Yang, H. 1999. A back-propagation neural network for mineralogical mapping from AVIRIS data. International Journal of Remote Sensing. 20, 1 (Jan. 1999), 97–110. DOI:https://doi.org/10.1080/014311699213622.
[14] Melgani, F. and Bruzzone, L. 2004. Classification of hyperspectral remote sensing images with support vector machines. IEEE Transactions on Geoscience and Remote Sensing. 42, 8 (Aug. 2004), 1778–1790. DOI:https://doi.org/10.1109/tgrs.2004.831865.
[15] Deng, J. et al. 2009. ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition (Jun. 2009).
[16] Kalchbrenner, N. et al. 2014. A Convolutional Neural Network for Modelling Sentences. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (2014).
[17] Milletari, F. et al. 2016. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. 2016 Fourth International Conference on 3D Vision (3DV) (Oct. 2016).
[18] Kamnitsas, K. et al. 2017. Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Medical Image Analysis. 36 (Feb. 2017), 61–78. DOI:https://doi.org/10.1016/j.media.2016.10.004.


[19] Krizhevsky, A. et al. 2017. ImageNet classification with deep convolutional neural networks. Communications of the ACM. 60, 6 (May 2017), 84–90. DOI:https://doi.org/10.1145/3065386.
[20] Makantasis, K. et al. 2015. Deep supervised learning for hyperspectral data classification through convolutional neural networks. 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS) (Jul. 2015).
[21] Chen, Y. et al. 2016. Deep Feature Extraction and Classification of Hyperspectral Images Based on Convolutional Neural Networks. IEEE Transactions on Geoscience and Remote Sensing. 54, 10 (Oct. 2016), 6232–6251. DOI:https://doi.org/10.1109/tgrs.2016.2584107.
[22] Li, Y. et al. 2017. Spectral–Spatial Classification of Hyperspectral Imagery with 3D Convolutional Neural Network. Remote Sensing. 9, 1 (Jan. 2017), 67. DOI:https://doi.org/10.3390/rs9010067.
[23] Zhang, H. et al. 2019. Hyperspectral Classification Based on Lightweight 3-D-CNN With Transfer Learning. IEEE Transactions on Geoscience and Remote Sensing. 57, 8 (Aug. 2019), 5813–5828. DOI:https://doi.org/10.1109/tgrs.2019.2902568.
[24] Roy, S.K. et al. 2020. HybridSN: Exploring 3-D–2-D CNN Feature Hierarchy for Hyperspectral Image Classification. IEEE Geoscience and Remote Sensing Letters. 17, 2 (Feb. 2020), 277–281. DOI:https://doi.org/10.1109/lgrs.2019.2918719.
[25] Palmason, J.A., Benediktsson, J.A., Sveinsson, J.R. and Chanussot, J. 2005. Classification of hyperspectral data from urban areas using morphological preprocessing and independent component analysis. Proc. IEEE International Geoscience and Remote Sensing Symposium (IGARSS'05). vol. 1, 176–179.
[26] Jutten, C. et al. 2007. How to Apply ICA on Actual Data? Example of Mars Hyperspectral Image Analysis. 2007 15th International Conference on Digital Signal Processing (Jul. 2007).
[27] Abbasi, A.N. and He, M. 2019. CNN with ICA-PCA-DCT Joint Preprocessing for Hyperspectral Image Classification. 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) (Nov. 2019).
[28] Pfister, N., Weichwald, S., Bühlmann, P. and Schölkopf, B. 2019. Robustifying Independent Component Analysis by Adjusting for Group-Wise Stationary Noise. Journal of Machine Learning Research. 20, 147 (2019), 1–50.
[29] KonstantinosF. 2019. KonstantinosF/Classification-of-Hyperspectral-Image. (September 2019). https://github.com/KonstantinosF/Classification-of-Hyperspectral-Image
[30] Bera, S. and Shrivastava, V.K. 2019. Analysis of various optimizers on deep convolutional neural network model in the application of hyperspectral remote sensing image classification. International Journal of Remote Sensing. 41, 7 (Dec. 2019), 2664–2683. DOI:https://doi.org/10.1080/01431161.2019.1694725.
[31] 2013 IEEE GRSS Data Fusion Contest. GRSS: IEEE Geoscience & Remote Sensing Society. http://www.grss-ieee.org/community/technical-committees/data-fusion/2013-ieee-grss-data-fusion-contest/
[32] Ben Hamida, A. et al. 2018. 3-D Deep Learning Approach for Remote Sensing Image Classification. IEEE Transactions on Geoscience and Remote Sensing. 56, 8 (Aug. 2018), 4420–4434. DOI:https://doi.org/10.1109/tgrs.2018.2818945.
[33] Debes, C. et al. 2014. Hyperspectral and LiDAR Data Fusion: Outcome of the 2013 GRSS Data Fusion Contest. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 7, 6 (Jun. 2014), 2405–2418. DOI:https://doi.org/10.1109/jstars.2014.2305441.
