Machine Learning and Deep Learning Techniques for Spectral
Spatial Classification of Hyperspectral Images: A
Comprehensive Survey
Reaya Grewal 1 , Singara Singh Kasana 2 and Geeta Kasana 3

1 Computer Science and Engineering Department, Thapar Institute of Engineering and Technology,
Patiala 147004, India;
2 Computer Science and Engineering Department, Thapar Institute of Engineering and Technology,
Patiala 147004, India;
3 Computer Science and Engineering Department, Thapar Institute of Engineering and Technology,
Patiala 147004, India;
Abstract: The growth of Hyperspectral Image (HSI) analysis is due to technology advancements that
enable cameras to collect hundreds of contiguous spectral bands for each pixel in an image. HSI
classification is challenging due to the large number of redundant spectral bands, limited training
samples and the non-linear relationship between the collected spatial position and the spectral bands.
Our survey highlights recent research in HSI classification using traditional Machine Learning (ML)
techniques like kernel-based learning, Support Vector Machines, Dimension Reduction and
Transform-based techniques. Our study also examines Deep Learning (DL) techniques that use
Autoencoders and 1D, 2D and 3D Convolutional Neural Networks to classify HSI. From the comparison,
it is observed that DL-based classification techniques outperform ML-based techniques. It has also
been observed that spectral-spatial HSI classification outperforms pixel-by-pixel classification because
it incorporates both spectral signatures and spatial domain information. The performance of ML and
DL-based classification techniques has been reviewed on commonly used land cover datasets like
Indian Pines, Salinas Valley and Pavia University.

1. Introduction
Remote Sensing (RS) has advanced significantly, leading to the use of new technologies.
This enables novel data processing methods and better quality data with enhanced spatial
electromagnetic spectrum images with a large number of contiguous bands. The spectral
dimension (Sz). The hyperspectral data is illustrated as a 3D hyperspectral data cube in
Figure 2. Spatial and spectral resolution play essential roles in various HSI applications.
It has drawn researchers interested in developing HSI techniques related to HSI in both
Electronics 2023, 1, 0 2 of 35

classification consistently suffers from various hurdles such as high dimensionality, limited
or unbalanced training patterns, spectral variability and mixed pixels.

Figure 1. Spectral curves of soil, unhealthy and healthy plant [1].

Figure 2. Hyperspectral Image Cube Representation [2].

1.1. Advantages of HSI

Hyperspectral techniques are widely used across research areas and applications because they can
detect and distinguish the characteristic features of entities using multiple contiguous spectral
channels. A few advantages of HSI technology are listed below:
• HSI captures finer details about different objects with similar features (e.g.,
different plant species or tree species).
• HSI delivers more detailed information than multispectral or RGB images. For example,
HSI can differentiate three minerals in a region because of its higher spectral
resolution, whereas multispectral or RGB imaging cannot do the same because of its
wider spectral bands.
• Another benefit of HSI is that the data analyst does not need any prior knowledge of
the sample as a full spectrum is acquired at each point and post-processing can extract
all available information from the data set.
• HSI can also exploit the structural relationships between the various spectra in a
neighborhood, fostering more sophisticated spectral structure models for more precise
analysis and classification of an image.
• The rich spectral-spatial detail has been very beneficial for implementing traditional
and some newer classification methods.

1.2. Applications of HSI

HSI is being employed for various industrial, commercial and military applications.
A few applications are listed below and illustrated in Figure 3.

Figure 3. General Applications of HSI.

• Food and Safety: HSI has contributed to food quality assessment and safety. It
has been used to identify defects and levels of contamination. For example,
Leiva et al. [3] employed HSI to determine the firmness of blueberries and achieved an
accuracy of 87%.
• Medical Diagnosis: Owing to its high spectral resolution, HSI captures materials sharply
and highlights their chemical and physical compositions. HSI has shown excellent
performance in studying and diagnosing tissues. For example, Liu, Wang and Li [4] utilized
HSI images of tongue tissues to detect tumors; the spectral signatures of tissues played
a vital role in detection.
• Precision Agriculture: Manual crop monitoring is limited since apparent symptoms
often develop late in the disease’s progression, making it difficult to restore plant
health. Advances in HSI methodologies have made crop stress assessment and the study
of soil and vegetation attributes more cost-effective. For example, Liu et al. [5] used
spectral signatures to estimate the yield of wheat crops.
• Environment Monitoring: HSI has also been applied to flood and water resource
management. HSI provides efficient and reliable information on water quality
parameters, which include hydrophysical, biochemical and biological properties. For
example, Kutser et al. [6] used HSI to measure chlorophyll content in water bodies.

There are many approaches to classifying an HSI. In this work, ML and DL classification
techniques have been reviewed and compared. ML-based image classification focuses on
developing algorithms that predict and detect patterns without human intervention. Various
classifiers like Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Decision Trees (DT)
are trained. Several steps of data pre-processing and feature engineering need to be performed
to get insights from raw images and improve the performance of classification techniques. In
this study, we have sub-categorised traditional ML techniques into the techniques commonly
employed in recent years: kernel-based learning, SVM classification, dimension reduction and
transform-based techniques. Researchers have mainly used kernel-based techniques to efficiently
learn the non-linearity of HSI datasets. Spectral and spectral-spatial kernels have been added
by authors as another dimension of learning to capture complex details of HSI. The SVM classifier
also belongs to the family of kernel learning. SVM has been extensively used to classify
high-dimensional HSI data and is discussed later in this survey. With transform-based techniques,
authors have been able to extract useful information while suppressing noise in HSI.
Classification performance improves as the number of available training samples increases. When
HSI training samples are limited, classification performance diminishes as the spectral dimension
rises. This effect is famously termed the “Hughes phenomenon.” To address this challenge, many
authors have implemented dimension reduction techniques prior to classification. We have discussed
various dimension reduction driven HSI classification techniques that work on spectral features.
Unlike traditional ML techniques, DL delivers a dynamic approach for unsupervised
feature learning from huge raw image datasets. DL-based techniques can depict complex
relationships in data using numerous neural connections. The DL models for HSI classification
generally consist of three layers: (i) input data, (ii) construction of the deep layer and
(iii) classification [7]. A general representation of DL-based HSI classification is
illustrated in Figure 4.

Figure 4. A general representation of DL classification of a hyperspectral image [8].

The papers reviewed focus on how different state-of-the-art classification techniques
have been used for HSI in the previous decade. Section 2 briefly discusses existing
classification techniques and states the methodology adopted to conduct this survey.
Section 3 elaborates on the traditional ML techniques employed by authors, such as SVM,
kernel-based methods, dimension reduction and transform-based methods. Section 4 focuses on
DL techniques for spectral and spectral-spatial HSI classification. Sections 5 and 6 present
the analysis of this survey and compare the performance of ML and DL techniques for HSI.
The paper concludes with challenges and the future scope of research and improvement in
HSI analysis.

2. Preliminaries
This section briefly defines the HSI classification techniques utilised in the surveyed

2.1. Overview of Classification Techniques

The major techniques mentioned in the paper are briefly discussed:
• Kernel-based Classification Techniques: HSI data is highly non-linear and complex in
nature, which can be addressed using mathematical functions termed kernels. A kernel
helps with better data representation and segregation into separate classes, as depicted
in Figure 5. Since available HSI datasets suffer from limited training samples, this
approach has been viewed as stable and efficient.

Figure 5. A schematic approach of Kernel-based learning to classify data.

• Support Vector Machine-based Classification: SVM is typically a linear classifier
associated with kernel functions and optimization theory, and is prominent in HSI
classification. It learns a separating hyperplane that maximises the margin, that is, the
distance between the hyperplane and the nearest data points of each class, as shown in
Figure 6. With a kernel, SVM separates data that is not linearly separable by learning an
optimal hyperplane in a high-dimensional feature space. It outperforms conventional
supervised classification methods, especially when the number of spectral bands is large
and the availability of training samples is limited.

Figure 6. A schematic approach of SVM hyperplane separating different classes [10].

• Wavelet Transform-based Classification: Wavelet transform-based classification
techniques have been used in many of the subsequent studies. These techniques filter
out noise and also aid in data compression. Wavelet analysis is a time-frequency
method for determining the optimum frequency band for images based on their
attributes. Figure 7 shows a general approach of how the wavelet transform breaks down
data into approximated data and horizontal and vertical detailed data at different
decomposition levels.
• Dimension Reduction: The curse of dimensionality, often known as the Hughes
phenomenon, poses the greatest difficulty to HSI classification. To overcome this problem,
feature extraction methods are employed to reduce dimensionality by retaining the
most important characteristics. Unsupervised approaches arrange pixels with comparable
spectral features (means, standard deviations, etc.) into discrete clusters based
on statistically derived parameters. Furthermore, unsupervised algorithms require no
prior label information for training. Well-known unsupervised methods are Principal
Component Analysis (PCA) and Independent Component Analysis (ICA). Figure 8 illustrates
the basic steps of PCA dimension reduction.

Figure 7. A schematic approach of Wavelet Transform decomposing data into two levels.

Figure 8. General schema of PCA dimension reduction.

• Deep Learning-based Classification: The vast and growing family of Neural Networks
has been studied for classification of HSI, as depicted in Figure 4. Recent research
shows that DL has improved the accuracy of HSI classification. DL models have exploited
spectral, spatial and joint spectral-spatial features. Unlike kernel and dimension
reduction methods, DL methods extract features automatically and handle the
non-linearity problem themselves. Some of the commonly employed DL models for HSI are
Convolutional Neural Networks (CNNs), Long Short Term Memory Networks (LSTMs), Multilayer
Perceptrons (MLPs) and Deep Belief Networks (DBNs). However, limitations persist:
training samples are limited, and results can be over-optimistic when the training and
testing datasets overlap.
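
As a rough illustration of the PCA-based dimension reduction mentioned in the list above, the following sketch flattens a hyperspectral cube into pixel spectra, centres each band and projects the spectra onto the leading principal axes. The function name `pca_reduce_cube`, the synthetic cube and the choice of three components are assumptions made for illustration and do not come from any of the surveyed papers.

```python
import numpy as np

def pca_reduce_cube(cube, n_components):
    """Project an (H, W, B) cube onto its first n_components principal axes."""
    h, w, b = cube.shape
    pixels = cube.reshape(-1, b).astype(float)   # one row per pixel spectrum (copy)
    pixels -= pixels.mean(axis=0)                # centre each spectral band
    # Right singular vectors of the centred data are the principal axes.
    _, s, vt = np.linalg.svd(pixels, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)          # variance ratio per component
    scores = pixels @ vt[:n_components].T
    return scores.reshape(h, w, n_components), explained[:n_components]

# Synthetic cube (assumption): 3 latent sources mixed into 50 bands plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(20, 20, 3))
mixing = rng.normal(size=(3, 50))
cube = latent @ mixing + 0.01 * rng.normal(size=(20, 20, 50))

reduced, ratio = pca_reduce_cube(cube, n_components=3)
print(reduced.shape, ratio.sum())
```

Because the synthetic cube has only three underlying sources, three components retain almost all of the variance; real HSI data would usually need the number of components to be chosen from the explained-variance curve.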

2.2. Methodology Adopted for Survey

• The research papers have been reviewed from 2010 onwards, from reputed and leading
sources like ScienceDirect, IEEE and others.
• The paper has been organised mainly into two groups of techniques: traditional ML and
DL techniques.
• A comparison of the performance of the aforementioned techniques is discussed at the end
to help readers understand the difference in their performance.
• Land cover benchmark datasets such as Indian Pines, Pavia University, and others
have been used to assess the performance of techniques.

• Each technique’s performance was compared on the basis of their accuracy in


3. Traditional Machine Learning Classification Techniques

3.1. Kernel Learning-based Classification Techniques
Gu and Liu [13] classified the limited samples of HSI using Sample Screening Multiple
Kernel Learning (S2MKL) in 2016. The training samples were adaptively screened using
their probability distributions. Subsets of samples were fed into different base kernels
of SVM. Using a linear combination, the optimal weights of the base kernels were
determined. Instead of the original spectra, Morphological Profiles (MPs) were extracted
using erosion and dilation operations. The MPs provided spatial and spectral features used
in Multiple Kernel Learning (MKL) with an Adaboost strategy. Evaluated on the Indian Pines,
Salinas Valley and Pavia University datasets, the proposed method achieved better results
than other state-of-the-art approaches.
In 2017, Fang et al. [14] used a spectral-spatial feature extraction method based on
Extinction Profiles (EPs). Earlier researchers stacked EPs, which does not utilise all of
the available information. Hence, to extract better features, the authors created a
fusion framework, namely EPs-F, which exploited the information both among and within
EPs. For within-EP information, three independent components of the HSI were taken into
account, and adaptive superpixel-based composite kernels were proposed to fuse the
kernels of various features. The classification maps obtained from different EPs reflected
the information of the whole HSI from different perspectives. Hence, for information among
EPs, these maps were fused together and a final classification map was produced. Their
approach achieved an Overall Accuracy (OA) of 96.12% and 99.14% on Indian Pines and
Pavia University, respectively.
Li et al. [15] used composite kernels in 2018 to extract spatial and spectral features.
An Adaboost framework with weighted Extreme Learning Machine (ELM) was used. A Radial
Basis Function (RBF) kernel was applied for spatial information, and a polynomial kernel
was applied to extract spectral information. The Adaboost algorithm was applied iteratively
and the weights were updated accordingly. The experiment was performed on the Indian Pines
and Pavia University datasets and achieved accuracies of 98.08% and 96.46%, respectively,
when compared against SVM, ELM, Kernel-SVM, Kernel-ELM, Combined Kernel-SVM and
Combined Kernel-ELM. In this work, only simple spatial features such as standard and
local mean were extracted. Features like Gabor filters and MPs could be applied in future
for better results. The classification accuracy could also be improved using other machine
learning algorithms.
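
The composite-kernel idea can be sketched in a few lines of NumPy, assuming each pixel already has a spectral feature vector and a spatial feature vector. Following the description above, an RBF kernel is applied to the spatial features and a polynomial kernel to the spectral features. The weighted-sum combination, the weight `mu` and the values of `gamma`, `degree` and `coef0` are illustrative assumptions rather than the settings of Li et al. [15], and the Adaboost and weighted ELM stages are omitted.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """RBF kernel exp(-gamma * ||a_i - b_j||^2) between the rows of a and b."""
    sq_dist = (np.sum(a ** 2, axis=1)[:, None]
               + np.sum(b ** 2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))  # clip round-off negatives

def poly_kernel(a, b, degree=2, coef0=1.0):
    """Polynomial kernel (a_i . b_j + coef0) ** degree."""
    return (a @ b.T + coef0) ** degree

def composite_kernel(spec_a, spat_a, spec_b, spat_b, mu=0.5, gamma=0.5):
    """Weighted sum: mu * K_spatial (RBF) + (1 - mu) * K_spectral (polynomial)."""
    k_spatial = rbf_kernel(spat_a, spat_b, gamma)
    k_spectral = poly_kernel(spec_a, spec_b)
    return mu * k_spatial + (1.0 - mu) * k_spectral

# Synthetic features (assumption): 6 pixels, 10 spectral bands, 4 spatial features.
rng = np.random.default_rng(0)
spectral = rng.normal(size=(6, 10))
spatial = rng.normal(size=(6, 4))
gram = composite_kernel(spectral, spatial, spectral, spatial)
print(gram.shape)
```

A non-negative weighted sum of valid kernels is itself a valid kernel, so a Gram matrix built this way can be handed to any kernel classifier that accepts precomputed kernels.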
In 2019, Li, Lu and Zhang [16] proposed a combination of SVM and Multiple Kernel
Boosting (MKB) that searched for the optimum combination of kernel SVMs. Initially, multiple
kernel functions were prepared and every SVM was trained over the dataset, forming a pool
of kernel SVMs. From this pool, the strongest SVM classifiers were repeatedly chosen
and added to the final decision function. The weights of each SVM and of the samples were
updated in each iteration. The experiment was performed on a plankton reflectance spectrum
dataset and on the Salinas Valley and Pavia University datasets. It outperformed classifiers
like KNN, SVM, Bayes and the Sparse Representation method. In future, combining
spatial-contextual constraints with the MKB framework could achieve higher accuracy.
In 2020, Li, Wang and Kong [17] proposed an adaptive kernel sparse representation
method based on multiple feature learning (AKSR-MFL). Initially, multiple spatial and
spectral features from different perspectives were extracted. They extracted five types
of feature descriptors from the original HSIs: spectral, Extended MP (EMP),
differential MP, Local Binary Pattern (LBP) and Gabor texture. The last four features gave
spatial information. For each test pixel, a shape adaptive region was constructed using a
shape adaptive algorithm, which is based on local variations of the HSI, in order to conform
to the spatial structure and explore the contextual information. The AKSR method was designed
using a kernel joint sparse pattern to address the linearly inseparable problem in the
multiple feature space. It grouped pixels having similar distributions. In the composite
kernel, the base kernels used for different feature descriptors were combined with optimal
weights. The experiment was performed on the Indian Pines, Pavia University and Salinas
Valley datasets and achieved the highest OA in comparison with various state-of-the-art
approaches.

In the same year, Gao et al. [18] used a composite Spectral-Spatial Kernel for Anomaly
Detection (SSCAD). It considered the non-linear characteristics of data, unlike other
detection models that worked in linear space and exploited only spectral information. Using
a kernel-based approach, the data is implicitly mapped into a high-dimensional feature space
that deals with non-linear problems well. Using local homogeneity, superpixels were extracted
with Entropy Rate Superpixel (ERS) segmentation, which provided spatial information. This was
fused with the spectral information extracted directly from the images to form a composite
kernel. Weights were adaptively determined using an iterative kernel learning algorithm based
on Centred Kernel Alignment (CKA). CKA measures the cosine similarity between two centred
kernels; a high CKA value indicates that the two kernels are similar to each other. The
authors focused on obtaining the highest possible value of CKA between the composite kernel
and the target kernel. The detection map was built using the kernel-based Reed-Xiaoli anomaly
detection algorithm, which uses the Mahalanobis distance to form decision rules that
distinguish test pixels from the background. The proposed work was implemented on real
datasets obtained using the HYDICE sensor, the ROSIS sensor over Pavia Centre and the AVIRIS
sensor over the San Diego area. It gave better performance in terms of the Receiver Operating
Characteristic (ROC) curve and the Area Under the ROC curve (AUC) when compared with
state-of-the-art anomaly detection methods.
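
Centred kernel alignment can be written compactly. The sketch below follows the usual definition, namely the cosine similarity (under the Frobenius inner product) between two kernel matrices after centring. It is a generic illustration and not the iterative weight-learning procedure of Gao et al. [18]; the synthetic data and the ideal target kernel built from class labels are assumptions.

```python
import numpy as np

def centre_kernel(k):
    """Centre a kernel matrix in feature space: H K H with H = I - (1/n) 11^T."""
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ k @ h

def cka(k1, k2):
    """Cosine similarity between two centred kernel matrices."""
    k1c, k2c = centre_kernel(k1), centre_kernel(k2)
    # np.linalg.norm on a 2-D array defaults to the Frobenius norm.
    return np.sum(k1c * k2c) / (np.linalg.norm(k1c) * np.linalg.norm(k2c))

rng = np.random.default_rng(0)
labels = np.array([1, 1, 1, -1, -1, -1])
features = rng.normal(size=(6, 5)) + labels[:, None]   # two shifted synthetic classes
k_data = features @ features.T                          # linear kernel on the data
k_target = np.outer(labels, labels).astype(float)       # +1 same class, -1 otherwise

print(cka(k_data, k_data), cka(k_data, k_target))
```

The alignment of a kernel with itself is 1, and the measure is invariant to rescaling either kernel, which is why it is convenient as an objective when tuning kernel weights.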
Following this, Wang et al. [20] used an MKL-based approach involving spectral, spatial and
semantic information with SVM for better HSI classification results. The first three
Principal Components (PC1-PC3) were obtained by applying PCA. These were used to obtain Gabor
features, an entropy rate superpixel segmentation map and EMPs. Structural and textural
features were extracted and stacked as feature vectors for each pixel using a combination
of Gabor and EMP features. For uniformity in spatial characteristics, mean filtering was
performed within each superpixel. For semantic information, a k-means clustering map and the
segmentation map obtained via ERS were used to produce a semantic feature vector for each
superpixel. Each superpixel was treated as a separate document/image. The spectral features,
the ERS map and a manually chosen number ‘k’ of cluster centroids were the inputs used to
create semantic features with Bag of Visual Words (BOVW). K-means clustering was performed on
the spectral features to group them into ‘k’ cluster centres, which were used as the visual
dictionary. The number of pixels belonging to each cluster inside each superpixel was
counted, giving a k × 1 histogram feature vector for each superpixel. Three individual kernels
were used to extract spectral, spatial and semantic information. For the final results, a
composite kernel formed as the weighted sum of these three kernels was applied with SVM. The
work was implemented on Indian Pines and Pavia University and obtained the highest OA of
98.39% and 99.77%, respectively.
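
The per-superpixel histogram step can be sketched as follows, assuming a cluster-index map (for example, from k-means on the spectral features) and a superpixel label map are already available. The function name and the tiny hand-made maps are illustrative assumptions.

```python
import numpy as np

def superpixel_histograms(cluster_map, superpixel_map, k):
    """Count, for every superpixel, how many of its pixels fall in each of k clusters.

    Both maps are integer arrays of the same shape; superpixel labels are
    assumed to run from 0 to n_superpixels - 1.
    """
    n_superpixels = int(superpixel_map.max()) + 1
    # Encode each (superpixel, cluster) pair as one integer, then count occurrences.
    pair_index = superpixel_map.ravel() * k + cluster_map.ravel()
    counts = np.bincount(pair_index, minlength=n_superpixels * k)
    return counts.reshape(n_superpixels, k)

cluster_map = np.array([[0, 0, 1],
                        [2, 1, 1]])
superpixel_map = np.array([[0, 0, 1],
                           [0, 1, 1]])
hist = superpixel_histograms(cluster_map, superpixel_map, k=3)
print(hist)  # row s is the k-bin histogram of superpixel s
```

Each row of the result plays the role of the k × 1 semantic feature vector of one superpixel.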
HSI datasets contain mixed pixels, and purely pixel-driven classifiers like SVM
cannot deal with overlapping data. Recently in 2021, Ma et al. [19] overcame this using Kernel
Constrained Energy Minimization (KCEM) and Kernel Linearly Constrained Minimum
Variance (KLCMV) classification. KCEM was used for binary classification, whereas KLCMV was
used for multi-class classification. KCEM and KLCMV achieved an OA of 99.48% and 99.50% on
the Indian Pines dataset, respectively, and both achieved an OA of 99.6% on Salinas
Valley. They surpassed the performance of other spectral-spatial methods. The aforementioned
kernel-based classification techniques have been compared in Table 1.

Table 1. Comparison analysis of Kernel-based classification.

| Year | Authors | Methodology Used | Evaluation Parameters |
|------|---------|------------------|-----------------------|
| 2016 | Gu and Liu [13] | Sample Screening MKL, Boosting strategy for limited number of training samples, MPs, SVM classification. | Indian Pines: OA 93.30%, AA 95.52%; Pavia University: OA 96.03%; Salinas Valley: OA 94.78%, AA 97.15% |
| 2018 | Li et al. [15] | Composite kernels, Adaboost framework, weighted ELM, RBF kernel, polynomial function kernel, spectral spatial features extraction. | Indian Pines: OA 98.08%, AA 97.67%; Pavia University: OA 96.46% |
| 2019 | Li, Lu and Zhang [16] | Combination of SVM, MKL, Boosting algorithm. | Pavia University: OA 85.3%, AA 89.74%; Salinas Valley: OA 97.55%, AA 98.02% |
| 2020 | Li, Wang and Kong [17] | Multiple feature learning, Shape Adaptive kernel with Sparse Representation. Spectral, EMP, DMP, LBP and Gabor texture feature descriptors were used. | Indian Pines: OA 99.19%, AA 99.24%; Pavia University: OA 97.09%; Salinas Valley: OA 99.57%, AA 99.51% |
| 2020 | Gao et al. [18] | Composite spectral-spatial kernel, ERS for superpixels, CKA-based kernel learning, Reed-Xiaoli anomaly detection algorithm. Mahalanobis distance to form decision rules. | Better performances in terms of ROC and area under the ROC curve. |
| 2020 | Wang et al. [20] | A MKL approach. Obtained spectral features through PCA, spatial features through Gabor and EMP, semantic information using k-means clustering on each superpixel. Composite kernels used in SVM classification. | Indian Pines: OA 98.39%, AA 98.30%; Pavia University: OA 99.77% |
| 2021 | Ma et al. [19] | Spectral spatial kernel generation network, Region segmentation, clustering and mapping operations for spatial kernels, Correlation characteristics of bands into spectral attention mechanism for spectral kernels. | Indian Pines: OA 99.29%; Pavia University: OA 99.56%; Salinas Valley: OA 99.37% |
| 2021 | Ansari et al. [21] | Convolutional kernel classifier, Nystrom’s approximation to reduce high dimensionality of basis kernels, Deep kernels using 1D CNN. | Pavia University: OA 95.22%, AA 94.26%; Salinas Valley: OA 95.23%, AA 97.66% |
| 2022 | Krishna et al. [22] | Fuzzy twin SVM kernel deep learning framework, Gaussian, ANOVA and linear spline kernel with fuzzy triangular function SVM hyperplanes. | Indian Pines: OA 99.82%; Pavia University: OA 99.73%; Salinas Valley: OA 99.83% |
| 2022 | Wang et al. [23] | Spectral spatial kernel, Vision transformer | Indian Pines: OA 98.81%, AA 98.83%; Pavia University: OA 99.76% |

3.2. Support Vector Machine

In 2010, Dalla Mura et al. [24] dealt with data redundancy and extracted useful
information from raw HSI using ICA and Extended Morphological Attribute Profiles
(EAPs). ICA mapped the data into a subspace where the components were as
independent as possible. Using various attributes, Attribute Profiles (APs) were applied
on each of these components, followed by morphological processing and SVM classification.
Their methodology obtained the highest OA of 94.47% on Pavia University.
PCA has previously been studied for extracting useful bands from HSI, but it is a linear
spectral technique applied to non-linear HSI data. In 2011, Licciardi et al. [25] classified
HSI using MPs built from informative features selected by non-linear PCA. The authors
performed the classification using SVM and Neural Networks. This yielded improved results
compared with Kernel PCA and standard PCA.
In 2013, Dopido et al. [26] worked on the limited training samples challenge of
HSI classification. They used active learning methods, in which a trained expert selects
unlabeled samples based on spatial information derived from the labeled samples. This
was further adapted to a self-learning framework in which the classifier chose the most
informative unlabeled samples for classification. This reduced effort and cost, as the
classifier itself determined the labels for the selected pixels. They employed probabilistic
SVM and multinomial logistic regression as classifiers.
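
One common way for a classifier to pick the "most informative" unlabeled samples is to choose those whose two most probable classes are nearly tied. The sketch below implements that margin-based criterion purely as an assumption for illustration; the text above does not specify which selection rule Dopido et al. [26] used, and the probability values are made up.

```python
import numpy as np

def most_informative(probs, n_select):
    """Indices of the n_select samples with the smallest gap between
    their two highest class probabilities (rows of probs sum to 1)."""
    top2 = np.sort(probs, axis=1)[:, -2:]     # second-largest and largest per row
    margin = top2[:, 1] - top2[:, 0]
    return np.argsort(margin)[:n_select]      # smallest margins first

# Hypothetical class probabilities for four unlabeled pixels and three classes.
probs = np.array([[0.90, 0.05, 0.05],
                  [0.40, 0.35, 0.25],
                  [0.50, 0.49, 0.01],
                  [0.36, 0.33, 0.31]])
chosen = most_informative(probs, n_select=2)
print(chosen)
```

In a self-learning loop, the selected pixels would be labeled by the classifier itself and added to the training set before retraining.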
Zhong et al. [27] used both spatial and spectral information for HSI classification
in 2018. The SVM classifier is a pixel-based spectral technique, so they used it iteratively
in combination with a Gaussian filter. Initially, they combined the original image with its
first PC and obtained a classification map using SVM. Using this map, spatial information was
extracted through a Gaussian filter. They fed this map back into the next iteration to be
combined with the data cube currently being processed. Their approach obtained high OAs of
98.60% and 98.68% on the Indian Pines and Pavia University datasets, respectively.
SVM is a spectral classifier that cannot deal with spectral variability on its own.
Recently in 2022, Pathak et al. [28] addressed this issue by implementing
SVM with RBF, polynomial and linear kernels to obtain spectral-spatial features of each
pixel. Patches for each pixel were chosen and the corresponding spatial features were fused
with the spectral values. It achieved an OA of 95.75% with the RBF kernel on the Indian Pines
dataset. For Pavia University, the polynomial kernel gave the highest OA of 98.84%.
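
A minimal sketch of fusing patch-based spatial features with per-pixel spectra is given below. It assumes that the spatial feature is simply the mean spectrum over a square window around each pixel and that fusion means concatenation; the description above does not state which spatial statistics Pathak et al. [28] used, so `spectral_spatial_features` and the window size are hypothetical.

```python
import numpy as np

def spectral_spatial_features(cube, window=3):
    """Concatenate each pixel's spectrum with the mean spectrum of its patch.

    cube: (H, W, B) array; window: odd patch width. Returns (H*W, 2*B).
    """
    h, w, b = cube.shape
    r = window // 2
    # Reflect-pad the two spatial axes so border pixels also get full patches.
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode="reflect")
    spatial = np.zeros((h, w, b), dtype=float)
    for dy in range(window):                  # sum the window shifts ...
        for dx in range(window):
            spatial += padded[dy:dy + h, dx:dx + w, :]
    spatial /= window * window                # ... and divide to get the mean
    return np.concatenate([cube, spatial], axis=2).reshape(h * w, 2 * b)

# A constant cube: the patch mean must equal the pixel spectrum everywhere.
cube = np.ones((4, 5, 3)) * np.array([1.0, 2.0, 3.0])
feats = spectral_spatial_features(cube, window=3)
print(feats.shape)
```

The resulting rows can be fed to any pixel-wise classifier, such as an SVM with an RBF, polynomial or linear kernel.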
In 2022, Li et al. [29] also focused on spectral-spatial features using SVM. The authors
used Shape Adaptive Reconstruction in pre-processing, which extracted neighbours for
each pixel based on Pearson correlation within their shape adaptive regions. The probability
maps for classification were obtained from SVM. In post-processing, these were denoised and
the final classification map was obtained using Smoothed Total Variation (STV). It achieved
an OA of 92% and 95% on the Pavia University and Indian Pines datasets, respectively.

3.3. Transform-based Techniques

In 2008, Akbari et al. [30] applied HSI analysis in surgeries to detect organs. The images
were compressed using the wavelet transform. Linear Vector Quantization (LVQ) and
ANN were used for segmentation and classification. The images were post-processed using an
image fill function and region growing. The experiment was performed on seven images,
each of different abdominal organs, captured using an ImSpector HSI sensor. False Negative
Rate (FNR) and False Positive Rate (FPR) were the evaluation criteria for quality of
detection. The spleen was detected best, with an FNR of 1.3% and an FPR of 0.5%.
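
For reference, the two detection criteria follow directly from the confusion counts: FNR = FN / (FN + TP) and FPR = FP / (FP + TN). The short sketch below computes both from binary masks; the example masks are invented and unrelated to the data of Akbari et al. [30], and the function assumes that both classes are present in the ground truth.

```python
import numpy as np

def fnr_fpr(truth, pred):
    """Return (FNR, FPR) for binary arrays in which 1 marks the target class."""
    truth = np.asarray(truth, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    tp = np.sum(truth & pred)      # target pixels detected
    fn = np.sum(truth & ~pred)     # target pixels missed
    fp = np.sum(~truth & pred)     # background pixels wrongly flagged
    tn = np.sum(~truth & ~pred)    # background pixels correctly rejected
    return fn / (fn + tp), fp / (fp + tn)

truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
fnr, fpr = fnr_fpr(truth, pred)
print(fnr, fpr)
```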
In 2014, Chen et al. [31] used the Mean Shift Algorithm (MSA) and wavelet transform to
obtain useful features and good classification results. The dimension was reduced using
PCA. The images were smoothed using MSA, which made the spectral curves of similar
pixels converge. Edge information was extracted using the wavelet transform. The experiment
was conducted on the Indian Pines and Changchun datasets. Compared with the Canny and LoG
operators, it produced better results with less noisy edges.
Spectral-spatial classification of HSI was performed using wavelet transforms and
morphological operations by Quesada et al. [32]. The dimensions of the HSI were reduced using
spectral features extracted by wavelets. EMPs were obtained from the dimensionally reduced
images. Noise removal was done using wavelets as well. The information obtained from both was
combined and classified using SVM. The proposed work achieved an OA of 98.8% and an AA of
99.0% on the Pavia University dataset.
In 2017, the Two-Dimensional Empirical Wavelet Transform (2D-EWT) was used by Prabhakar
and Geetha [33] for the selection of informative and non-redundant bands. It was
compared with Image Empirical Mode Decomposition (IEMD). EWT segmented the signal
via the Fourier transform and estimated supports, which helped in building
the wavelet filter banks. The signal was then filtered to provide frequency components,
with detail and approximation coefficients, for further processing. The proposed work
implemented a 2-D extension of the Littlewood-Paley transform. Sparse-based classifiers such
as Subspace Pursuit (SP) and Orthogonal Matching Pursuit (OMP) were employed for the
classification of the HSI dataset, along with SVM and Hybrid Support Vector Selection and
Adaptation (HSVSA). The methodology was evaluated on the Indian Pines dataset. IEMD gave
better OA than 2D-EWT but required more time. The low frequency components of IEMD
and 2D-EWT had improved kappa measure, OA and Average Accuracy (AA).
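
The three scores used throughout the comparisons (OA, AA and the kappa measure) can all be derived from a confusion matrix. The sketch below uses the standard definitions, assuming rows hold the true classes and columns the predicted classes; the small example matrix is invented for illustration.

```python
import numpy as np

def classification_scores(confusion):
    """Return (OA, AA, kappa) from a square confusion matrix (rows = true classes)."""
    cm = np.asarray(confusion, dtype=float)
    total = cm.sum()
    oa = np.trace(cm) / total                     # fraction of correctly labelled pixels
    aa = np.mean(np.diag(cm) / cm.sum(axis=1))    # mean of the per-class accuracies
    # Agreement expected by chance, from the row and column marginals.
    expected = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / total ** 2
    kappa = (oa - expected) / (1.0 - expected)
    return oa, aa, kappa

confusion = [[90, 10],
             [5, 15]]
oa, aa, kappa = classification_scores(confusion)
print(round(oa, 4), round(aa, 4), round(kappa, 4))
```

With unbalanced classes, as in this example, OA exceeds AA because the larger class dominates the overall count, which is one reason surveys report both.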
In 2019, Ji et al. [34] detected bruises on potatoes using the Discrete Wavelet Transform
(DWT) technique. Characteristic bands were selected using PCA, and the most significant PC
images were chosen. The texture of the images was enhanced with histogram equalization. The
processed PC images were decomposed using DWT. Textural properties like contrast,
entropy and correlation were obtained using the Gray-Level Co-occurrence Matrix (GLCM). These
features were further refined using the AdaBoost-Fisher Linear Discriminant (FLD) algorithm.
The identification of bruised potatoes was done by Adaboost modeling. The highest detection
accuracy obtained was 99.82%.
In 2021, Anand et al. [35] extracted 3D spectral-spatial features of an HSI cube
simultaneously, using Haar, Coiflet and Fejer-Korovkin filters. The features were
fed into SVM, KNN and Random Forest classifiers. The highest performance was achieved with
KNN and Random Forest.
Recently in 2022, Xu, Zhao and Liu [36] implemented a 3D wavelet transform in the
pre-processing step to reduce the number of learnable parameters of the CNN. It extracted
both spatial and spectral features and provided a robust feature representation. The Haar
wavelet was used as the mother wavelet.
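
A one-level 3D Haar decomposition of a hyperspectral cube can be sketched with plain NumPy, as below. This is a generic orthonormal Haar transform applied along the two spatial axes and the spectral axis, producing eight sub-bands of half size in each dimension. It assumes even dimensions and is not taken from the implementation of Xu, Zhao and Liu [36]; the function names are illustrative.

```python
import numpy as np

def haar_step(x, axis):
    """One-level orthonormal Haar split along one axis: (approximation, detail)."""
    x = np.moveaxis(x, axis, 0)
    even, odd = x[0::2], x[1::2]                 # neighbouring sample pairs
    approx = (even + odd) / np.sqrt(2.0)         # low-pass: scaled pair sums
    detail = (even - odd) / np.sqrt(2.0)         # high-pass: scaled pair differences
    return np.moveaxis(approx, 0, axis), np.moveaxis(detail, 0, axis)

def haar_dwt3d(cube):
    """One-level 3D Haar DWT of an (H, W, B) cube with even dimensions.

    Returns a dict of 8 sub-bands keyed by 'L'/'H' per axis, e.g. 'LLH'.
    """
    bands = {"": np.asarray(cube, dtype=float)}
    for axis in range(3):
        split = {}
        for name, data in bands.items():
            low, high = haar_step(data, axis)
            split[name + "L"] = low
            split[name + "H"] = high
        bands = split
    return bands

rng = np.random.default_rng(0)
cube = rng.normal(size=(8, 8, 16))               # synthetic cube (assumption)
subbands = haar_dwt3d(cube)
print(sorted(subbands), subbands["LLL"].shape)
```

The `LLL` sub-band is a smoothed, eight-times smaller version of the cube, which illustrates how a wavelet pre-processing step can shrink the input that a subsequent network has to handle.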
In 2022, Miclea et al. [37] used wavelets to obtain spectral features, which were
concatenated with LBP spatial features. The spectral-spatial features were fed to SVM.
To prevent data overlap, which causes exaggerated classification results, the training
and testing sets were divided through controlled sampling. The training samples were
selected so that they had spectral and spatial variance. The samples were added through
region growing with a specified window size. The aforementioned transform-based
classification techniques have been compared in Table 2.

Table 2. Comparison analysis of transform-based methods.

| Year | Authors | Methodology Used | Evaluation Parameters |
|------|---------|------------------|-----------------------|
| 2008 | Akbari et al. [30] | Compression of medical images using wavelet transform. LVQ and ANN for segmentation and classification. Post-processing using region growing. | False Negative Rate (FNR) and False Positive Rate (FPR) were evaluation criteria for quality of detection. Spleen was detected the best with FNR of 1.3% and FPR of 0.5%. |
| 2014 | Chen et al. [31] | PCA, MSA and edge extraction using wavelet transform. | In comparison with Canny and LoG operators, it produced better results with less noisy edges. |
| 2014 | Quesada et al. [32] | Morphological operations, spectral spatial features extraction through wavelet, SVM classification. | OA of 98.8% and AA of 99.0% on the Pavia University dataset. |
| 2017 | Prabhakar and Geetha [33] | 2D-EWT for band selection, 2-D extension of Littlewood-Paley transform and sparse-based classifiers. | IEMD gave better OA but in more time as compared to 2D-EWT. The low frequency components of IEMD and 2D-EWT had improved kappa measure, OA and AA. |
| 2019 | Ji et al. [38] | PCA, image decomposition using DWT, GLCM and AdaBoost-FLD. Classification using Adaboost modeling. | Accuracy of 99.82% in detection of bruises. |
| 2020 | Cao et al. [39] | 3D-DWT to extract features and remove stripe noise effect. CNN classification with active learning strategy. | Indian Pines: OA 91.98%, AA 80.34%; Pavia University: OA 91.27%, AA 82.37% |
| 2020 | Zikiou et al. [40] | Texture classification using graph-based wavelet transform, de-correlation between close pixels based on spectral similarity to build spectral graph wavelets, SVM classification. | Indian Pines: OA 98.90%, AA 98.77%; Pavia University: OA 99.65%, AA 99.47%; Kennedy Space Centre: OA 99.73%, AA 99.80% |
| 2021 | Manoharan et al. [41] | Whale optimization band selection technique, 3D-DWT to increase variation in selected bands, CNN with 3D convolutions fused using spectral and wavelet spatial features. | Indian Pines: OA 99.44%; Pavia University: OA 99.85%; Salinas Valley: OA 99.83% |
| 2021 | Anand et al. [35] | 3D spectral spatial features. Haar, Fejer-Korovkin and Coiflet filters. SVM, KNN and RF classification. | Indian Pines: OA 90.4%; Salinas Valley: OA 96.7% |

• Indian Pines: OA- 98.15%.

3D Haar wavelet filter for spectral spatial
2022 Xu et al. [36] • Pavia University: OA- 96.27%.
features, CNN.
• Kennedy Space Centre: OA- 96.84%.

Spectral features through wavelet, LBP

• Indian Pines: OA- 93.85%.
2022 Miclea et al. [37] for spatial features and
• Pavia University: OA- 92.86%.
SVM classification.

Multi-head Transformer-based attention

• Indian Pines: OA- 65.54% and AA-57.95%.
mechanism to select features, Coiflet
2022 Tulapurkar et al. [42] • Pavia University: OA- 94.62% and
wavelet filter-CNN-based
feature extraction.

3.4. Dimension Reduction-based Techniques

High dimensionality is a challenge for HSI classification. This section discusses and
analyses researchers' contributions to improving HSI classification through dimension
reduction. A general schema of dimension reduction is depicted in Figure 9.

3.4.1. Unsupervised
In 2011, Villa et al. [43] focused on removal of redundant bands and used Independent
Component Discriminant Analysis (ICDA) for the same. The authors obtained classifi-
cation results using Bayesian classifier. Their approach achieved better accuracy than
SVM classification.
In 2016, Santos and Pedrini [44] performed HSI band selection using a combination of
entropy filtering and K-means clustering. To increase intra-cluster similarity and
inter-cluster variance, the bands were grouped using their correlations. The images
were first downsized using bi-cubic interpolation, which gave shorter feature vectors and
improved computation time. K-means was then applied with each band treated as a sample
and the Pearson correlation coefficient as the similarity measure. K representative bands
were selected from the grouped bands and a 2D entropy filter was applied to each of them.
The central pixel of each kernel was replaced with the computed entropy, giving a new
vector that was submitted to a radial kernel SVM. The methodology obtained an OA of 97.1%,
98.3% and 97.1% on Indian Pines, Salinas valley and Pavia centre datasets, respectively.
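The band-grouping step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `select_representative_bands`, the farthest-point initialisation and the choice of the band nearest each centroid as representative are assumptions made for illustration. Each band is z-scored so that squared Euclidean distance between two bands is proportional to one minus their Pearson correlation, which lets plain k-means group correlated bands.

```python
import numpy as np

def select_representative_bands(cube, k, n_iter=20):
    """Group bands with k-means and pick one representative band per group.

    cube: array of shape (rows, cols, bands). Returns (representatives, labels).
    """
    bands = cube.shape[2]
    # Each band is one sample (its flattened image). After z-scoring, squared
    # Euclidean distance is proportional to 1 - Pearson correlation.
    X = cube.reshape(-1, bands).T.astype(float)
    X = (X - X.mean(axis=1, keepdims=True)) / (X.std(axis=1, keepdims=True) + 1e-12)

    # Deterministic farthest-point initialisation (an illustrative assumption).
    centroids = [X[0]]
    for _ in range(1, k):
        nearest = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[nearest.argmax()])
    centroids = np.array(centroids)

    # Standard k-means iterations: assign bands, then update centroids.
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = X[labels == c].mean(axis=0)

    # Final assignment; the representative is the band closest to its centroid.
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)
    reps = [int(np.flatnonzero(labels == c)[dist[labels == c, c].argmin()])
            for c in range(k) if np.any(labels == c)]
    return sorted(reps), labels
```

In a full pipeline the entropy filter and the SVM would then operate only on the returned representative bands.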

Figure 9. General schema of dimension reduction used for classification.

In 2017, Schclar and Averbuch [45] focused on improving the classification results of
HSI using a Diffusion Bases (DB)-based methodology. The non-linear correlations amongst
wavelengths were captured to produce a low-dimensional representation of the data, which
also reduced the amount of noise. A modified version of the DB method was also proposed
that used the eigendecomposition of symmetric matrices. These were conjugate to the
non-symmetric Markov matrix and used weight functions comprising pairwise similarity
between pixels. To cluster the low-dimensional data, a two-phase histogram-based
segmentation method named Wavelength-Wise Global segmentation (WWG) was used. In the
wavelength-wise view, an n-band HSI cube was considered as a collection of n images of
size m × m. The clustering was performed on the basis of colour similarity. The
colour-based segmentation included normalisation of the input image followed by its
quantization. A colour frequency histogram was then built, in which a certain number of
the highest peaks were detected and assumed to belong to different objects in the image,
the highest peak being the largest homogeneous area, i.e., the background. It was assumed
that quantized colour vectors belonging to the same peak were part of the same coloured
object. After identification of the peaks, each quantized colour vector was associated
with a single peak using Euclidean distance and the final images were constructed.
Microscopy images and remotely sensed images of Washington DC's National Mall were used,
on which various iterations of the proposed methodology were performed. The classification
results depended on the dimension of the diffusion space, whose optimal value selection
was yet to be studied by the authors.
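The role of the symmetric conjugate matrix can be made concrete with a generic diffusion-map sketch. This is a textbook construction rather than the exact DB algorithm of [45]: the Gaussian weight function, the scale `eps` and the function name `diffusion_embedding` are assumptions. With W the pairwise similarity matrix and D its diagonal degree matrix, the Markov matrix P = D^{-1} W is non-symmetric, while A = D^{-1/2} W D^{-1/2} is symmetric and shares its eigenvalues, so a stable symmetric eigensolver can be used.

```python
import numpy as np

def diffusion_embedding(X, eps, dim):
    """X: (n_samples, n_features). Returns (eigenvalues, embedding coordinates)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / eps)                  # pairwise similarity weights (assumed Gaussian)
    d = W.sum(axis=1)                      # degrees, i.e. the diagonal of D
    A = W / np.sqrt(np.outer(d, d))        # symmetric conjugate of P = D^{-1} W
    vals, vecs = np.linalg.eigh(A)         # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]       # right eigenvectors of P
    # Skip the trivial constant eigenvector (eigenvalue 1) and scale by eigenvalue.
    return vals[1:dim + 1], psi[:, 1:dim + 1] * vals[1:dim + 1]
```

The parameter `dim` plays the part of the diffusion-space dimension whose optimal value the authors left for future study.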
In 2018, Jain et al. [46] proposed classification of HSI in which the important features
were learned by optimizing the SVM using Self Organizing Maps (SOM), and the interior and
exterior pixels were classified using posterior probabilities. SOM is a data compression
technique in which an incoming signal/pattern of any dimension is reduced to a 1D or 2D
lattice using competitive learning of neurons. In their approach the input images were
converted to grayscale, and ROIs were selected over which the SOM algorithm was applied to
group the pixels in terms of features and intensity levels. The SOM training algorithm
provided inputs and weights to each edge of the image. On the basis of the neighbourhood
of the Best Matching Unit (BMU), found using Euclidean distance, each neighbouring node's
weights were updated iteratively, which brought them closer to the input pattern. For
classification of interior and exterior pixels, posterior probabilities and an optimal
threshold were computed. If the probability of a pixel was greater than the threshold,
the pixel belonged to the interior of the particular class; otherwise it belonged to the
boundary of that class. The experiment was performed on the Indian Pines and Pavia
University datasets, where it outperformed other baseline methods, achieving highest
accuracies of 85.29% and 95.46%, respectively.
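The competitive-learning update at the heart of a SOM can be sketched as follows. This is a generic 1D-lattice SOM, not the pipeline of [46]: the linear decay schedules, the Gaussian neighbourhood function and the names `train_som` and `quantization_error` are illustrative assumptions.

```python
import numpy as np

def train_som(data, n_nodes, n_epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a 1D SOM; data has shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((n_nodes, data.shape[1]))
    grid = np.arange(n_nodes)                     # node positions on the 1D lattice
    sigma0 = sigma0 or n_nodes / 2.0
    n_steps = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)               # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3  # neighbourhood radius shrinks
            # Best Matching Unit: node whose weights are closest (Euclidean) to x.
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            # Neighbours of the BMU on the lattice are pulled towards x as well.
            h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def quantization_error(data, weights):
    """Mean distance from each sample to its nearest SOM node."""
    d = np.sqrt(((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2))
    return d.min(axis=1).mean()
```

A falling quantization error during training indicates that the node weights are moving closer to the input patterns, as described above.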
Band reduction techniques can reveal nonlinear properties, but at the expense of
losing the original data's representation. To address this, Ahmad et al. [47] in 2019 used
a non-linear Unsupervised Denoising Autoencoder (UDAE)-based method, in non-segmented and
segmented variants, for improving the classification of HSI. For segmented UDAE, the HSI
cubes were segmented spatially based on the pixel locations, and the segmented HSI images
were then processed spectrally by the autoencoder. The experiment was performed on the
Pavia Centre, Pavia University and Salinas valley datasets, where the proposed methodology
achieved highest accuracy using SVM.

3.4.2. Semi-Supervised
In 2016, Romaszewski et al. [48] proposed a co-training approach based on the P-N
learning scheme, inspired by the Tracking-Learning-Detection (TLD) framework used to
track objects in videos. In the P-N scheme, two independent learners, P and N, scored the
unlabeled samples in different feature spaces and extended the training set. The P-expert
assumed the same class for spatially close pixels based on region growing. Its score
function was estimated using Gaussian Kernel Density Estimation based on the distance from
known samples (seeds). The N-expert assumed the same class for pixels with similar spectra
and was defined as a Nearest Neighbor (NN) classifier with a rejection score for pixel
i. It identified the n closest spectral neighbours among the seeds, and the spectral
Euclidean distance was computed between pixel i and pixel j. The score formula was based
on probability estimation with the distance-weighted KNN rule. The scores from both
experts were combined. Spectral classification was performed for unlabeled pixels that
could not be labeled using region growing due to disjoint regions. They applied the
approach on six data sets: the Indian Pines, Salinas Valley, University of Pavia, La Selva
Biological Station and Madonna, Villelongue, France. The method achieved the highest
classification accuracy in comparison with various state of the art approaches.

3.4.3. Supervised
In 2016, Li et al. [49] used a dual-layer supervised Mahalanobis distance kernel for
HSI classification. The traditional unsupervised approach was modified using a supervised
Mahalanobis matrix to obtain a new kernel using relativity information of the various
materials present in the images. The proposed approach was executed in two steps. Firstly,
the traditional Mahalanobis matrix was used to map the raw data. Then, using the mapped
data, difficult-to-identify classes were selected from the various classes and a second
Mahalanobis matrix was learned using this particular data only. A new Mahalanobis kernel
was formed using the combination of these two matrices. In the end, SVM was applied to
this dimensionally reduced data, achieving high performance on the Indian Pines, Salinas
valley and Pavia university datasets. It resolved the drawback of traditional Mahalanobis
distance metric learning, which learned a matrix without taking into account the weights
of each class.
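For readers unfamiliar with Mahalanobis kernels, the sketch below shows how a matrix M enters a kernel function. The Gaussian form exp(-(x - y)^T M (x - y) / (2σ²)) and the name `mahalanobis_kernel` are assumptions for illustration. The survey does not state how [49] combines the two learned matrices, so that step is deliberately not shown; here M is simply the inverse sample covariance, i.e. the traditional unsupervised choice.

```python
import numpy as np

def mahalanobis_kernel(X, Y, M, sigma=1.0):
    """Gaussian kernel on the Mahalanobis distance induced by matrix M.

    X: (n, d), Y: (m, d), M: (d, d) positive semi-definite. Returns an (n, m) matrix.
    """
    diff = X[:, None, :] - Y[None, :, :]
    # Squared Mahalanobis distance (x - y)^T M (x - y) for every pair.
    d2 = np.einsum('ijk,kl,ijl->ij', diff, M, diff)
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Usage with the unsupervised choice M = inverse covariance of the data.
rng = np.random.default_rng(0)
pixels = rng.normal(size=(50, 4))          # 50 synthetic pixels, 4 bands
M_cov = np.linalg.inv(np.cov(pixels, rowvar=False))
K = mahalanobis_kernel(pixels, pixels, M_cov)
```

The resulting matrix K could be passed to any SVM implementation that accepts a precomputed kernel. With M set to the identity the kernel reduces to the ordinary RBF kernel.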
Nhaila et al. [50] performed supervised classification of HSI in 2019 using SVM, KNN,
RF and Linear Discriminant Analysis (LDA) with different kernels, along with MI for
dimension reduction. The features/bands were selected by computing the MI between the
ground truth and each band. The subset of bands was initialised with the band having the
highest MI value with the ground truth. The average of the last band and the new candidate
band built a reference map called the estimated ground truth. Finally, the candidate band
was added to the subset if it increased the previous MI value between the ground truth and
the reference map. The experiment was performed on the Indian Pines, Salinas valley and
Pavia university datasets. SVM with RBF kernel and RF outperformed the other learners.
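The greedy selection loop described above can be sketched with a histogram-based MI estimate. The details here are assumptions rather than the procedure of [50]: the bin count, the visiting order of candidates (descending individual MI) and the reading of "average of the last band and the candidate" as a running average of the reference map are all illustrative choices, as are the function names.

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Histogram estimate of the mutual information (in bits) between two images."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = joint / joint.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

def select_bands_by_mi(cube, gt, bins=16):
    """Greedy band selection: keep a band only if it raises MI(reference map, gt)."""
    n_bands = cube.shape[2]
    mi = [mutual_information(cube[:, :, b], gt, bins) for b in range(n_bands)]
    first = int(np.argmax(mi))                  # start from the most informative band
    selected, best_mi = [first], mi[first]
    reference = cube[:, :, first].astype(float)
    for b in np.argsort(mi)[::-1]:              # assumed order: descending individual MI
        if b == first:
            continue
        candidate_ref = (reference + cube[:, :, b]) / 2.0   # estimated ground truth
        cand_mi = mutual_information(candidate_ref, gt, bins)
        if cand_mi > best_mi:
            selected.append(int(b))
            reference, best_mi = candidate_ref, cand_mi
    return selected
```

The selected band indices would then be passed to the chosen classifier (SVM, KNN, RF or LDA in the study).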
The aforementioned supervised, semi-supervised and unsupervised dimension reduction-
based classification techniques have been compared in Table 3.
Table 3. Comparison analysis of supervised, semi-supervised and unsupervised dimension reduction-based classification techniques.

| Year | Authors | Methodology Used | Evaluation Parameters |
|---|---|---|---|
| 2016 | Romaszewski et al. [48] | P-N learning scheme, P learner extracted spatial features through region growing, N learner extracted spectral filters using NN classifier. | Indian Pines: OA-94.05%. Pavia University: OA-97.40%. Salinas Valley: OA-98.38%. |
| 2016 | Li et al. [49] | Dual-layer supervised Mahalanobis distance kernel for HSI classification. SVM classification. | Indian Pines: OA-71.24% and AA-78.43%. Pavia University: OA-77.67% and. Salinas valley: OA-88.04% and AA-94.01%. |
| 2016 | Santos and Pedrini [44] | Entropy filtering, K-means clustering, band grouping using correlation, SVM classification. | Indian Pines: OA-97.1%. Salinas valley: OA-97.1%. Pavia centre: OA-98.3%. Pavia university: OA-96.2%. |
| 2017 | Schclar and Averbuch [45] | Diffusion bases method, wavelength wise global segmentation to cluster low dimensional data. | NA |
| 2018 | Jain et al. [46] | Data compression using SOM, SVM classification. | Indian Pines: OA-85.29% and AA-86.23%. Pavia University: OA-95.46% and |
| 2019 | Ahmad et al. [47] | UDAE for spectral and spatial features extraction. | For segmented DAE: OA = 99.06% on Salinas-A, 91.37% on Salinas, 97.69% on Pavia Centre, 84.07% on Pavia University. For non-segmented DAE: OA = 98.79% on Salinas-A, 91.50% on Salinas, 98.13% on Pavia Centre, 89.20% on Pavia University. |
| 2019 | Nhaila et al. [50] | MI band reduction, supervised classification of HSI using SVM, KNN, RF and LDA with different kernels. | Indian Pines: Highest OA-93.27% of SVM. Salinas Valley: Highest OA-97.09% of RF. Pavia University: Highest OA-95.50% of RF. |
| 2020 | Ren et al. [51] | Supervised dimension reduction using Relief-F method. Band selection from reduced feature set on the basis of high importance scores to eliminate contiguous bands. | Salinas Valley: OA-94.45%. Pavia University: OA-94.71%. |
| 2021 | Liu et al. [52] | Superpixel Collaborative Representation of pixels (SPCR) having similar spectral signatures and spatial adjacency. Global projection matrix to reduce discrepancies between original spectral features and SPCR. | OA of 97.64% and AA of 97.77% on Pavia University. |
| 2022 | Ding et al. [53] | Supervised Isometric Feature mapping for dimension reduction. Triple geodesic distance learning using pixel label, neighbourhood and credibility information. Generalised regression NN classification. | OA of 96.83% on Indian Pines. |

3.4.4. Features Selection

In 2016, an MKB framework was used by Qi et al. [54]. A Kullback-Leibler (KL)
distance-based kernel function was used to develop the SVM. In this ensemble learning
framework, KL-MKB used the Adaboost strategy to learn an MKL-based classifier. The Optimum
Index Factor (OIF) was employed for selection of informative features: the OIF value
selected bands with the most variance and the least correlation. The work had a stable
performance and gave a higher accuracy of 85.89% on the Indian Pines dataset in comparison
with SVM using a single fixed kernel and with Simple MKL. In future, band clustering and
selection could be used, and sparse MKL could be built for compact representation. The
drawback was choosing an appropriate number of kernels, which was a tradeoff between
efficiency and accuracy. The number was chosen between 9 and 12.
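As an illustration of how the OIF favours high-variance, weakly correlated bands, a minimal sketch follows. It uses the common definition of OIF as the sum of band standard deviations divided by the sum of absolute pairwise correlations; the exhaustive search in `best_band_subset` and both function names are assumptions, and the search is only practical for small band counts.

```python
import numpy as np
from itertools import combinations

def optimum_index_factor(cube, band_idx):
    """OIF of a band subset: sum of std devs / sum of |pairwise correlations|."""
    X = cube.reshape(-1, cube.shape[2])[:, band_idx].astype(float)
    stds = X.std(axis=0)
    corr = np.corrcoef(X, rowvar=False)
    pairs = combinations(range(len(band_idx)), 2)
    return stds.sum() / sum(abs(corr[i, j]) for i, j in pairs)

def best_band_subset(cube, size=3):
    """Exhaustively pick the subset of `size` bands with the largest OIF."""
    combos = combinations(range(cube.shape[2]), size)
    return max(combos, key=lambda c: optimum_index_factor(cube, list(c)))
```

A subset containing two nearly identical bands scores poorly because their correlation inflates the denominator.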
In 2017, Yang et al. [55] also worked on representative band selection in HSI. The
distances between spectral bands were computed using disjoint information. Bands were
clustered using k-means and K representative bands were selected from these clusters. The
criterion for optimal selection was based on minimizing the distances between bands inside
the clusters and maximizing the gap between different representative bands. The disjoint
information was calculated using the joint entropy and MI of two spectral images. The
proposed technique used KNN and SVM classifiers on the Indian Pines dataset and
outperformed various state of the art techniques.
In 2018, Medjahed et al. [56] formulated feature selection in HSI as an optimization
problem solved with a stochastic approach, namely simulated annealing. Simulated annealing
was used to optimize an objective function that embedded the classification accuracy rate
and the relevance among features in terms of MI. The approach was compared with existing
feature selection approaches like Mutual Information (MI) Feature Selection, MI
Maximization (MIM), Joint MI (JMI), Minimum Redundancy Maximum Relevance (MRMR) and
Conditional MI Maximization (CMIM). The proposed work achieved the highest accuracy rate
of 88.75% with 10 features as compared to the above techniques on the Pavia university
dataset. Their study achieved the highest OA of 91.47% as compared to the other
classifiers in their literature on the same dataset. For the Indian Pines dataset, the
highest OA of 76.48% and AA of 71.72% were obtained in comparison with SVM and a genetic
algorithm, using 10 features and 20% training pixels.
Xie et al. [57] addressed the problem of dimensionality reduction in 2019 by selecting
features/bands that were information rich and less redundant. Improved Subspace
Decomposition (ISD) and the Artificial Bee Colony (ABC) algorithm were used. The
correlation coefficients between adjacent bands were calculated. Local minima and spectral
curve visualization helped in achieving the subspace decomposition for choosing m bands
from the original n bands. Band subset selection was then done by randomly choosing k
bands from each band subspace. The selection was optimized by the ABC algorithm with the
help of ISD and maximum entropy. In the end, SVM was applied for the classification of the
optimized band subsets. The proposed work was implemented on the Pavia University, Indian
Pines and Salinas Valley datasets and achieved better performance than various state of
the art approaches for features selection.
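The subspace decomposition step can be sketched by cutting the band axis wherever the correlation between adjacent bands reaches a local minimum. This is a simplified reading of the description above, not the ISD algorithm of [57]: no smoothing or visual inspection of the spectral curve is modelled, and the function names are assumptions.

```python
import numpy as np

def subspace_boundaries(cube):
    """Return adjacent-band correlations and the band indices that start a new subspace."""
    X = cube.reshape(-1, cube.shape[2]).astype(float)
    corr = np.corrcoef(X, rowvar=False)
    # adj[i] is the correlation between band i and band i + 1.
    adj = np.array([corr[i, i + 1] for i in range(corr.shape[0] - 1)])
    # A strict local minimum of adj marks a boundary between band i and band i + 1.
    cuts = [i + 1 for i in range(1, len(adj) - 1)
            if adj[i] < adj[i - 1] and adj[i] < adj[i + 1]]
    return adj, cuts

def split_bands(n_bands, cuts):
    """Turn cut positions into lists of contiguous band indices."""
    edges = [0] + list(cuts) + [n_bands]
    return [list(range(a, b)) for a, b in zip(edges[:-1], edges[1:])]
```

On noisy data small fluctuations can create spurious minima, so a practical version would likely add a depth threshold before accepting a cut.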
In 2019, Sellami et al. [58] focused on tackling the curse of dimensionality and the
limited number of training samples by selecting appropriate features/bands. Adaptive
dimension reduction was used that sought relevant bands with high discrimination, high
information and low redundancy. To extract spatial-spectral information, a spatial window
included features from neighbouring pixels. These were loaded into a semi-supervised 3-D
CNN with convolutional encoder-decoder layers for 3-D convolution and max-pooling. The
categorization map was created using a linear regression classifier. The investigation was
carried out using data from Indian Pines, Pavia University and Salinas Valley. In
comparison to other recent techniques, the suggested study attained the highest OA for all
datasets.
Elzaimi et al. [59] used a filter-based approach with an information gain function to
reduce the dimensionality in 2019. The bands were chosen based on their interaction
and complementarity. Classification was performed using SVM. The algorithm selected
the discriminative bands using an evaluation of interaction gain that maximised the
compromise of the MI between the ground truth and the selected band. The average of the
interaction information helped in controlling the redundancy. The selected band subset
was initialized with the band that had the highest MI with the class label, which served
as the estimated ground truth. Iteratively, candidate bands were added by computing their
MI with the ground truth. Their information gain was calculated based on the mean
interaction information between the candidate bands, the ground truth and the estimated
ground truth. The band that maximized the information gain criterion was chosen in each
step. The experiment was performed on two benchmark hyperspectral datasets, Indian Pines
and Pavia University, and compared with other band selection algorithms like MI Feature
Selection, the Minimum Redundancy Maximum Relevance (MRMR) method and the MI-based Filter
approach (MIBF). The proposed work achieved the highest OA of 95.25% and 96.83% on the
Indian Pines and Pavia University datasets, respectively.
In 2020, Sawant et al. [60] proposed a meta-heuristic-based optimization method for
band selection using the Modified Cuckoo Search (MCS) algorithm. Initially, a Chebyshev
chaotic map was used in the algorithm to initialize the nest locations (solutions). This
avoided repeated generation of similar bands. The fitness value and current iteration
number were used to iteratively update the step size and a scaling factor of the Levy
Flight method, which generated new solutions (bands) in every iteration. These two
modifications to the standard Cuckoo Search algorithm gave MCS and helped in escaping from
local optima. They used a wrapper-based selection method, so accuracy was checked by
involving the classifier in every iteration. The global best solution was obtained at the
end. The proposed technique outperformed the standard CS algorithm and achieved the
maximum OA of 95.10% for the Pavia University dataset and 86.92% for the Indian Pines
dataset.
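The chaotic initialisation can be illustrated with the Chebyshev map x_{n+1} = cos(a · arccos(x_n)), which stays in [-1, 1]. How [60] converts the chaotic values into band indices is not stated in this survey, so the ranking-based mapping in `init_nests`, the map order and the starting value `x0` below are assumptions made purely for illustration.

```python
import numpy as np

def chebyshev_sequence(x0, length, order=4):
    """Iterate the Chebyshev chaotic map x <- cos(order * arccos(x))."""
    xs = np.empty(length)
    x = x0
    for i in range(length):
        x = np.cos(order * np.arccos(x))
        xs[i] = x
    return xs

def init_nests(n_nests, n_select, n_bands, x0=0.7):
    """Build n_nests candidate solutions, each a set of n_select distinct band indices."""
    seq = chebyshev_sequence(x0, n_nests * n_bands)
    scores = seq.reshape(n_nests, n_bands)      # one chaotic score per (nest, band)
    # Assumed mapping: each nest keeps the bands with the highest chaotic scores,
    # which guarantees that no band index is repeated inside a nest.
    return np.sort(np.argsort(scores, axis=1)[:, -n_select:], axis=1)
```

Each row of the returned array would serve as one initial nest whose fitness is then evaluated by the wrapped classifier.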
To reduce the complexity of numerous spectral bands, Zhu et al. [61] used the Affinity
Propagation (AP) clustering algorithm. An improved AP was used in which subsets were
created inside the clusters and the information entropy was combined to change the
availability matrix and create clusters with arbitrary shapes. It achieved an OA of 91.5%
on Salinas Valley.
The aforementioned features selection-based classification techniques have been compared
in Table 4.

Table 4. Comparison analysis of features selection-based classification.

| Year | Authors | Methodology Used | Evaluation Parameters |
|---|---|---|---|
| 2016 | Qi et al. [54] | OIF for dimension reduction, KL distance-based kernel function, MKL and SVM classification. | Accuracy of 85.89% on Indian Pines dataset. |
| 2017 | Yang et al. [55] | K-means for band clustering, KNN and SVM classification. | NA |
| 2018 | Medjahed and Ouali [56] | Simulated annealing for features, MI and classification embedded with objective function. | Accuracy rate of 88.75% having 10 features on the Pavia university dataset. For Indian Pines dataset, highest OA of 76.48% and AA of 71.72%. |
| 2019 | Xie et al. [57] | Band selection through ISD, ABC and maximum entropy algorithms. SVM classification. | NA |
| 2019 | Sellami et al. [58] | Adaptive dimension reduction and spectra spatial features using semi-supervised 3-D CNN. | NA |
| 2019 | Elzaimi et al. [59] | Filter-based approach using information gain function to reduce the dimensionality. | Indian Pines: OA-95.25%. Pavia University: OA-96.83%. |
| 2020 | Sawant and Manoharan [60] | Band selection using Modified Cuckoo Search algorithm, Levy flight and meta-heuristic-based optimization method. | Indian Pines: OA-86.92%. Pavia University: OA-95.10%. |
| 2021 | Uddin et al. [62] | Feature selection using MI-based minimum redundancy and maximum variance. | OA-95.39% and AA-95.09% on Indian Pines. |
| 2022 | Zhang et al. [63] | Hybrid clustering with filtering feature selection based on weights of similarity measure between bands. | OA of 79.24% on Indian Pines. |

3.4.5. Features Extraction

In 2016, Imani et al. [64] used Binary Coding-based Feature Extraction (BCFE) in HSI
data. The spectral signature of every pixel was divided into equal segments. In each
segment the features were extracted using weighted mean of the spectral bands. A new
method for calculation of weights was used in BCFE where the class means’ binary codes
were obtained. The information present in the positive or the negative edges and the binary
values of the class means in each band were used to calculate the weight of each spectral
band. The proposed work employed SVM and maximum likelihood classifiers. It achieved
better results than various other features extraction techniques on Indian Pines, Pavia
University and Kennedy Space Center.
In 2017, Qi et al. [65] used an MKB framework based on Particle Swarm Optimization
(PSO). Features were extracted using standard deviation, correlation coefficient and KL
divergence. PSO with a mutation mechanism was chosen to select the best parameters
for SVM. The proposed mechanism outperformed many state of the art approaches by
achieving OA of 88.02% and 95.81% on the Indian Pines and Pavia University, respectively.
The methodology performed faster than single kernel methods but slower than mixture
kernel methods. Computational costs and efficiency need to be improved.
Ksieniewicz et al. [66] proposed a novel pipeline in 2018 for features extraction and
classification of HSI. Statistical properties of the images were extracted and embedded
into a feature space having 14 features/channels for dimension reduction, which was input
into ensemble learning based on randomized neural networks. An ensemble of ELMs was used
with randomized feature subspaces and a trained combiner. For the statistical features,
edge detection was performed first, followed by generation of a filter to drain out the
noisy pixels. Features such as the red channel, green channel, blue channel, minimum
value, highest value, median and mean of the spectral signature were then extracted from
the filtered images. The ensemble of ELMs was formed on the basis of the Random Subspace
Method (RSM): for an image with a d-dimensional feature vector in a given feature space F,
each base classifier was constructed using r features, where r < d, randomly selected from
F. For the trained combination of ELMs, a weighted classifier combination was used. The
continuous outputs of the ELMs for the different classes were considered as support
values, expressing how strongly a classifier supports that a particular object belongs to
class m, and the final class output was based on the maximum rule (winner takes all). A
perceptron-based combiner was used to compute the weights assigned to each class and
classifier. The experiment was performed on the Salinas Valley, Salinas A (subset of
Salinas Valley), Indian Pines, Pavia Centre and University, Botswana and Kennedy Space
Center datasets and compared with a single-model ELM and SVM with one-versus-all and
one-versus-one strategies. The proposed method achieved higher accuracy.
Qiao et al. [67] used joint bilateral filtering and spectral similarity-based Joint Sparse
Representation Classification (SS-JSRC). The spatial features were extracted via a joint
bilateral filter on the first PC image. SS-JSRC filtered out the pixels of different
classes within the neighborhood window of every pixel. The proposed work gave better
results than SRC, JSRC and NLW-JSRC. The OA of the proposed method on the Indian Pines
dataset was 98.13%, which was higher than other state of the art approaches. Similarly,
for the Pavia University dataset, the highest OA of 99.76% was achieved. The work could be
improved using saliency-based algorithms, weakly supervised learning and histograms of
sparse codes.
Paul et al. [68] used an MI-based Segmented Stacked Autoencoder (S-SAE) method in 2018.
MI is a dependency measure between bands: a value of 1 indicates high dependency while 0
indicates independent bands. Non-parametric MI-based spectral segmentation was performed.
Local features of each segment were extracted using S-SAE. MPs of the segmented spectral
features gave spatial information. The experiment was performed on 10%, 5% and 10%
training samples of each class of the Indian Pines, Pavia University and Botswana
datasets. SVM with Gaussian kernel gave better performance in classification of the Pavia
University and Botswana datasets, while Random Forest classified the Indian Pines dataset
better. The method overcame the limitation of the time-consuming and complex SAE-based
features extraction method, and performed well even for a limited number of samples. In
future, various other non-linear feature extraction methods like kernel PCA could be used
with the proposed method. DL models could be assimilated for spectral-spatial
classification.
The comparative study of aforementioned features extraction-based classification
techniques is presented in Table 5.

Table 5. Comparison analysis of features extraction-based classification.

| Year | Authors | Methodology Used | Evaluation Parameters |
|---|---|---|---|
| 2016 | Imani et al. [64] | Binary coding-based feature extraction (BCFE), SVM and maximum likelihood classifiers. | NA |
| 2017 | Qi et al. [65] | Features extraction using PSO, standard deviation, correlation coefficient and KL divergence, MKB framework. | Indian Pines: OA-88.02%. Pavia University: OA-95.81%. |
| 2018 | Ksieniewicz et al. [66] | 14 statistical features, ensemble of ELMs using Random Subspace Method. | NA |
| 2018 | Qiao et al. [67] | Joint bilateral filtering, spectral similarity, PCA, Joint Sparse Representation Classification (JSRC). | Indian Pines: OA-98.13%. Pavia University: OA-99.76%. |
| 2018 | Paul et al. [68] | MI-based SAEs, MPs for spatial features. | NA |
| 2019 | Chen et al. [69] | Bilateral filter-based feature extraction on superpixels, SVM classification. | Indian Pines: OA-93.69% and AA-89.98%. Pavia University: OA-93.30% and |
| 2020 | Li et al. [70] | PCA, adaptive total variation filtering to extract features from top PCs and transformed features using Ensemble IEMD. Stacked all the features obtained for classification. | Indian Pines: OA-98.84% and AA-99.01%. Salinas Valley: OA-99.51% and AA-99.68%. |
| 2021 | Wang et al. [71] | Multi-scale spectral features using band grouping and LSTM, spatial features based on multi-scale spectral features and convolution LSTM. | Indian Pines: OA-98.30% and AA-99.09%. Pavia University: OA-96.26% and |
| 2022 | Liang et al. [72] | Dimension reduction using Minimum Noise Fraction (MNF) and local features extracted using relative total variation, superpixel segmentation to extract non-local structural features. | Indian Pines: OA-90.32% and AA-93.04%. Salinas Valley: OA-98.13% and AA-97.57%. |

4. Deep Learning-based Classification

Recently, researchers have explored DL-based classification techniques extensively. Their
research is discussed below and a general mechanism of DL is depicted in Figure 10.
In 2010, Ratle et al. [73] proposed a semi-supervised neural network-based framework
to deal with limited samples of HSI. They added a flexible embedding regularizer to the loss
function and some additional balancing constraints to Stochastic Gradient Descent (SGD)
to avoid the local minima problem. In comparison with methods like k-means, Laplacian
SVM and Transductive SVM, their approach had better accuracy and scalability.
Lin et al. [74] used autoencoders of different depths in 2013 to classify HSI. PCA
was used for spectral dimension reduction while autoencoders extracted spatial features.
A single-layer autoencoder extracted spectral features, which were classified using SVM.
For deep representation, a stacked autoencoder was used with logistic regression
classification. PCA was introduced in the second layer for dimension reduction. The
experiment was performed on the Kennedy Space Centre and Pavia University datasets, and
low error rates were recorded.
In 2015, Yue et al. [75] utilised both spatial and spectral features using a novel DL
framework. Their framework was a hybrid of Deep CNNs (DCNNs), PCA and Logistic
Regression (LR). The spectral feature maps were generated using a mathematical algorithm,
over which PCA was applied to generate joint spectral spatial information. DCNNs and LR
were then applied to this joint information to extract high-level features and fine-tune
the model. They achieved the highest OA of 95.18% on Pavia University.

Figure 10. General DL mechanism for image classification.

Hu et al. [76] were inspired by the application of CNNs to 2D images and in 2015 applied
the same idea in the spectral domain of HSIs. They used a 1-D CNN with five layers
consisting of input, convolution, max pooling and fully connected layers. It helped in
discriminating each spectral signature from the others. Their 5-layer CNN architecture
achieved better accuracy than traditional SVM, a 2-layer Neural Network and the LeNet-5
architecture.
Chan et al. [77] proposed a DL-based network in 2015 that consisted of basic processing
components: cascaded PCA to learn multistage filter banks, and binary hashing and
blockwise histograms for indexing and pooling. This net was called PCANet. It was applied
to benchmark visual datasets for digit and face recognition. PCANet served as an effective
baseline against which more advanced processing components or more sophisticated
architectures could be justified.
DL has been extensively used for HSI analysis and classification, but high quality
labeled samples are needed for DL to be utilised efficiently. In 2016, Liu et al. [78]
tackled this challenge using weighted incremental dictionary learning, on which an active
learning-based algorithm was developed. They selected only those training samples which
improved the selection criteria, namely uncertainty and representativeness. This taught
the deep network how, and which, samples to select at each iteration for training. Their
approach achieved accuracy of 92.4% and 91.6% on the Pavia University and Botswana
datasets, respectively.
In 2016, Chen et al. [79] dealt with the challenges of limited training samples and
high dimensionality using regularized deep feature extraction method. To obtain better
spectral spatial features, the authors employed 3D CNN. They also applied L2 regular-
ization and dropout techniques to overcome overfitting. The authors improved the CNN
performance by also using virtual samples. These were generated by multiplying a random
factor with training samples and added noise. Their work achieved an OA of 97.56%,
99.54% and 96.31% on Indian Pines, Pavia University and Kennedy Space Centre dataset,
respectively. In future, a post processing methodology could help in further improvements
in classification.
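The virtual-sample idea described for [79] (a random multiplicative factor plus additive noise) can be illustrated with a short sketch. The factor range, the noise level and the function name below are assumptions chosen for the example, not values reported by the authors.

```python
import numpy as np

def make_virtual_samples(x, n_virtual, factor_range=(0.9, 1.1), noise_std=0.01, seed=0):
    """Return n_virtual perturbed copies of every row of x (shape: samples x bands)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    base = np.repeat(x, n_virtual, axis=0)               # each sample repeated n_virtual times
    factor = rng.uniform(factor_range[0], factor_range[1], size=(base.shape[0], 1))
    noise = rng.normal(0.0, noise_std, size=base.shape)  # assumed Gaussian noise
    return factor * base + noise

# Two toy spectra with three bands each (assumed reflectance values).
train = np.array([[0.2, 0.4, 0.6],
                  [0.5, 0.5, 0.1]])
virtual = make_virtual_samples(train, n_virtual=4)
print(virtual.shape)
```

Each virtual sample keeps the class label of the training sample it was derived from, so the labelled set grows without any new annotation effort.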
Electronics 2023, 1, 0 22 of 35

In 2016, Zabalza et al. [80] performed dimension reduction and feature extraction using
Segmented Stacked Autoencoders (S-SAE). With S-SAE, spectral segmentation of the
pixels was performed: the original features were partitioned into smaller segments of data,
which were processed separately by smaller, local SAEs on the segmented spectrum. The
complexity was greatly reduced with the proposed method. It achieved better accuracy in
segmentation and classification of the scenes in the Indian Pines and Centre of Pavia datasets.
The work could be extended using saliency detection methods, adaptive sparse representation
and weakly supervised learning. The major drawback was that spatial features were not extracted.
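A rough way to see why segmenting the spectrum reduces complexity is to count first-layer weights. The sketch below assumes equal-width contiguous segments and a hidden size split evenly across the local autoencoders; both are illustrative assumptions rather than the segmentation rule used in [80].

```python
def split_bands(n_bands, n_segments):
    """Return (start, stop) index pairs for contiguous, near-equal spectral segments."""
    base, extra = divmod(n_bands, n_segments)
    bounds, start = [], 0
    for i in range(n_segments):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

def first_layer_weights(n_inputs, n_hidden):
    """Number of weights in one fully connected encoder layer (biases ignored)."""
    return n_inputs * n_hidden

# Assumed sizes: 200 bands, 100 hidden units in total, 4 segments.
n_bands, n_hidden, n_segments = 200, 100, 4
segments = split_bands(n_bands, n_segments)
full = first_layer_weights(n_bands, n_hidden)
segmented = sum(first_layer_weights(stop - start, n_hidden // n_segments)
                for start, stop in segments)
print(segments, full, segmented)
```

Under these assumptions the single autoencoder needs 20,000 first-layer weights while the four local ones need 5,000 in total.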
To deal with highly correlated bands and limited samples, Yu et al. [81] proposed a
CNN in 2017 that processed raw HSI input in an end-to-end manner. They also used a
small training dataset to optimise the parameters of the CNN, which helped with the problem
of overfitting. To deal with HSI information, 1 × 1 convolutional layers were adopted. Their
approach obtained an OA of 64.19% on Indian Pines, 67.85% on Pavia University and
85.4% on Salinas Valley using 3 labelled samples per class for training.
A CNN has many parameters, and hence more training samples are desired to learn
the convolution filters. With the limited samples available for HSI, overfitting occurs, which
gives overoptimistic results. Addressing these concerns, Chen et al. [82] focused on reducing
the feature extraction burden of the CNN by using Gabor filters, which extracted spatial
information, edge information and textural features. They combined convolution filters with
Gabor filters. Grid search was also used to find the parameters of the Gabor filters. In
comparison with traditional methods like SVM, CNN with PCA and a simple CNN, their approach
achieved the highest OA and AA.
Yunsong et al. [83] used a deep CNN to reconstruct images and enhance their spatial
features. Each band was normalized to the range [0,1]. The spatial features of
different classes that had similar characteristics were enhanced to avoid spectral distortion.
PCA was performed to extract PC images, and the first PC image was chosen as the reference
image due to its high spatial information. The Gray Level Co-occurrence Matrix (GLCM) was
used to extract spatial features like entropy, contrast, correlation and dissimilarity. The
GLCM features of each band were compared with the corresponding features of the first PC in
the form of a ratio, and the band with the minimum ratio was selected as the training label.
A CNN model with optimized parameters was used to train on the data, and ELM was used for
further classification. This combined framework gave high performance with fewer training
samples on the Indian Pines, Salinas Valley and Centre of Pavia datasets. Image reconstruction
increased the AA of ELM by as much as 30.04%. It also ran faster than other state-of-the-art classifiers.
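The GLCM texture features mentioned above can be computed directly from a quantised band. The sketch below is a simplified illustration that assumes a single horizontal offset of one pixel, a non-symmetric matrix and base-2 entropy; [83] may use different offsets, grey levels or feature definitions.

```python
import numpy as np

def glcm(image, levels):
    """Normalised co-occurrence matrix of horizontally adjacent grey levels."""
    img = np.asarray(image)
    counts = np.zeros((levels, levels), dtype=float)
    left, right = img[:, :-1].ravel(), img[:, 1:].ravel()
    np.add.at(counts, (left, right), 1.0)   # count every (left, right) pair
    return counts / counts.sum()

def glcm_features(p):
    """Contrast, dissimilarity and entropy of a normalised GLCM."""
    i, j = np.indices(p.shape)
    nonzero = p[p > 0]
    return {
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "dissimilarity": float(np.sum(p * np.abs(i - j))),
        "entropy": float(-np.sum(nonzero * np.log2(nonzero))),
    }

# Toy band quantised to three grey levels (assumed values).
band = np.array([[0, 0, 1],
                 [1, 2, 2],
                 [0, 1, 2]])
features = glcm_features(glcm(band, levels=3))
print(features)
```

Computing the same features for the first PC image and for each band would give the per-band ratios that the selection step compares.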
Although earlier authors obtained good performance with 1-D CNNs, these resulted in
information loss when representing HSI pixels, which are sequence-based data. Hence,
Mou et al. [84] analysed the pixels using a deep Recurrent Neural Network (RNN) with
Parametric Rectified tanh (PRetanh) instead of the regular activation functions used by
others, like tanh or the rectified linear unit. With this approach, band-to-band variability
and spectral correlation were captured well. It also allowed training with high learning rates
without the risk of divergence. The authors reduced the number of parameters
by using gated recurrent units to build their network. These units used PRetanh for the hidden
representation and processed HSI efficiently. Their approach outperformed traditional
methods like SVM, RF and CNN.
In 2018, Zhang et al. [85] used a context-aware CNN framework encoded with semantic
information. Their approach had more discriminative power due to diverse
region-based inputs. Their model had different CNN branches, with each branch
representing a different region for the pixel under inspection. Unlike the traditional square
window around a pixel, they extracted six regions, namely the right, left, top, bottom, whole
and local regions of a pixel, with flexible patch shapes. They also extracted deep
spectral-spatial features using a multi-scale summation module, which dealt with limited training
samples, enhanced learning capability and improved generalization. Accuracies of 98.54%
and 98.33% were recorded for Indian Pines and Salinas Valley, respectively.
Although many joint spectral-spatial feature representations for limited samples had
been proposed earlier, they were not very generic or robust. Deng et al. [86] built
a unified deep network in combination with Active Transfer Learning (ATL). Initially,
the authors extracted joint spectral-spatial features using Stacked Sparse AutoEncoders
(SSAE). With the help of ATL, they transferred the pre-trained SSAE network and limited
training samples from a source domain to a target domain. The SSAE network was
correspondingly fine-tuned using limited samples from both the source and target domains
using active learning strategies. They obtained the highest OAs of 99.61% and 99.86% after
transferring the samples from the Pavia University to the Pavia Centre dataset and vice versa,
respectively.
HSI classification is improved by fusing spectral-spatial information. Taking advantage
of this, Liang et al. [87] extracted deep multi-scale spectral-spatial features for HSI and
named the framework DMSF. They transferred the filter banks of the VGG16 model, which
learned the spatial structure of HSI. They fused these deep spatial features with raw
spectral information using sparse autoencoders. They obtained the final discriminative
features by a weighted fusion of these spectral-spatial features in VGG16. The resulting
features were classified using SVM and obtained high accuracy.
Wang et al. [88] focused on improving the training time and accuracy of HSI
classification. Traditional methods used hand-crafted features and needed improvement in
accuracy. To solve this, they developed the end-to-end Fast Dense Spectral-Spatial Convolution
(FDSSC) framework. They did not rely on PCA or any other feature extractors. In FDSSC,
they used “valid” convolutions of different sizes to extract spectral-spatial features and
reduce dimensions. To achieve highly accurate results, they used densely connected
layers, where each previous layer contributed to the subsequent layers. The authors
resorted to a dynamic learning rate, parametric Rectified Linear Unit (PReLU) activation,
batch normalisation and dropout layers to increase speed and reduce overfitting. This helped
the authors to achieve high performance within 80 epochs.
Yang et al. [89] exploited the success of CNNs in HSI classification in 2018. They
used both spectral and spatial information and built different models, namely 2D CNN, 3D
CNN, Recurrent 2D CNN (R-2D CNN) and R-3D CNN. Their models converged faster in
comparison with traditional methods like CNN and SVM. Although their models were
superior, they needed more training samples than other methods. Incorporating prior
domain knowledge of the dataset and transfer learning could further improve performance.
Pan et al. [90] used PCANet as the foundation, into which multi-grain and semi-supervised
information were integrated. The resulting multi-grained network, called MugNet, was
a simplified DL model designed to deal with few training samples. Each grain contained
a DL model, and classification results were obtained via an ensemble approach. MugNet
was built with three strategies to enhance the classification accuracy. First, a multi-grained
scanning approach utilized the spectral relationships between the bands and the spatial
correlation among neighbouring pixels; this scanning strategy extracted the joint
spatial-spectral information. Second, the convolutional kernels were generated
in a semi-supervised manner. Lastly, it did not include any hyperparameters for tuning.
MugNet had two parallel branches: spectral MugNet and spatial MugNet. Both
were based on Semi-Supervised PCA Net (SSPCANet), which had 4 layers: 1
input layer, 2 convolutional layers and 1 output layer. SSPCANet used the unlabeled pixels
to learn more representative convolutional kernels. The labeled pixels were used in training
with an SVM classifier. It obtained the highest OAs of 90.65%, 90.82% and 93.15% on the Indian
Pines, Grss_dfc_2013 and Grss_dfc_2014 datasets, respectively, in comparison with other
state-of-the-art approaches. The computational efficiency needs to be improved. In future,
MugNet could be transformed into a completely end-to-end model.
Paoletti et al. [91] proposed a 3D CNN architecture to obtain spectral and spatial
features of HSI and classified them using Graphics Processing Units (GPUs). A border
mirroring strategy was applied to process the border areas of the image. The images were
divided into patches of size d × d × n, where d was the width and height of the neighbourhood
window centred at a pixel and n was the number of spectral bands of the original image.
d/2 pixels of the border were mirrored outwards so that border pixels could be used like any
other pixel in the image. The 3D patches were grouped into batches and sent to the convolution
layers. Four fully connected layers were used, and cross entropy was the loss function of
the CNN. The experiment was performed on the Indian Pines and Pavia University datasets using
various values of the parameter d. In comparison with 1D, 2D, 3D CNNs and Multi-Layer, it
achieved the highest accuracy for different values of the parameter d. The classification
accuracy was dependent on manual selection of parameters.
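The border mirroring and d × d × n patch extraction described for [91] can be sketched with NumPy padding. The helper name, the use of `mode="reflect"` and the requirement that d be odd are assumptions of this illustration, not details taken from the original implementation.

```python
import numpy as np

def extract_patch(cube, row, col, d):
    """Return the d x d x n patch centred at (row, col), mirroring the image borders."""
    if d % 2 == 0:
        raise ValueError("d is assumed odd so that the patch has a centre pixel")
    half = d // 2
    # Mirror half a window of pixels outwards on the two spatial axes only.
    # (Padding on every call is wasteful; a real pipeline would pad once.)
    padded = np.pad(cube, ((half, half), (half, half), (0, 0)), mode="reflect")
    # Original pixel (row, col) now sits at (row + half, col + half).
    return padded[row:row + d, col:col + d, :]

# Toy cube (assumption): 5 x 5 pixels with n = 3 bands.
cube = np.arange(5 * 5 * 3, dtype=float).reshape(5, 5, 3)
corner = extract_patch(cube, 0, 0, d=3)
print(corner.shape)
```

With mirroring, even the corner pixel receives a full neighbourhood, so every labelled pixel can serve as a patch centre.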
In 2018, Chen et al. [92] proposed HSI classification driven by joint spatial and spectral
features. Image blocks containing local neighbourhood features provided the spatial
information, and the spatial and spectral features were merged using the convolutional layers.
The results were obtained from the fully connected layer, and the network outperformed other
state-of-the-art approaches. The proposed network was also combined with an SVM (RBF kernel)
in some of the fully connected layers. An adaptive mechanism to select the spatial window size
was proposed. For obtaining the features, the first convolution layer was a multi-scale feature
extraction layer that extracted features invariant to deformation and scaling. The second
convolution layer, a feature fusion layer, merged the spatial and spectral features and was
followed by a feature reduction convolution layer. The proposed network obtained an OA of
98.02% on the Indian Pines dataset, which was higher than other approaches. In combination
with SVM, the highest accuracies of 98.39% and 98.44% were obtained on the Indian Pines and
Pavia University datasets, respectively. The best size for the adaptive window was selected on
the basis of a confidence criterion, where Conf(k) represented the possibility of the input
pattern being classified into the kth class. The algorithm worked as follows. Two window sizes,
A×A and B×B with A > B, were chosen at random. With the A×A window, ‘m’ was the most probable
class and ‘n’ the second most probable class. If Conf(n) < Conf(m)×theta for the A×A window,
the output was the mth class. If the condition was not satisfied, the B×B window was taken to
give the more confident result, and the input block was classified into its most probable
class, m’. Adaptive window size selection helped in overcoming the problem of a large window
that might contain many intersecting categories and hence confuse the network. The proposed
method improved the classification accuracy for HSI significantly.
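The confidence criterion for choosing between the two window sizes can be written as a small decision rule. The function below is a sketch of the rule as summarised above; the confidence vectors and the value of theta are made-up inputs, and the function name is hypothetical.

```python
def choose_class(conf_large, conf_small, theta):
    """Pick a class from the A x A window if it is confident enough, else from B x B.

    conf_large and conf_small hold one confidence per class for the large (A x A)
    and small (B x B) windows, with A > B.
    """
    ranked = sorted(range(len(conf_large)), key=lambda k: conf_large[k], reverse=True)
    m, n = ranked[0], ranked[1]               # most and second most probable classes
    if conf_large[n] < conf_large[m] * theta:
        return m, "A"                         # large window is decisive
    m_prime = max(range(len(conf_small)), key=lambda k: conf_small[k])
    return m_prime, "B"                       # fall back to the small window

# Assumed confidences for three classes and theta = 0.5.
decisive = choose_class([0.70, 0.20, 0.10], [0.30, 0.60, 0.10], theta=0.5)
ambiguous = choose_class([0.45, 0.40, 0.15], [0.20, 0.70, 0.10], theta=0.5)
print(decisive, ambiguous)
```

In the second call the two leading classes are nearly tied in the large window, which is the mixed-category situation the smaller window is meant to resolve.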
Earlier classification techniques did not extract HSI features effectively. To address
this concern, Singh and Kasana [93] used deep features to classify HSI. The authors
initially reduced the dimension to suppress data redundancy using Locality Preserving
Projection (LPP). The processed data was forwarded to a Stacked Auto Encoder (SAE) for
deep feature extraction. Logistic regression was used for classification, and their work
achieved OAs of 84.4% and 87.2% on Indian Pines and Salinas Valley, respectively.
In 2019, Zhou et al. [94] used the spectral-spatial LSTM networks shown in Figure 11
for the classification of HSI. The spectral values of each pixel in all the channels were fed
into the Spectral LSTM (SeLSTM). Initially, the pixel vector having K
bands was transformed into a K-length sequence. This sequence was fed element by element into
SeLSTM, and the last output was fed to the SVM. For the spatial LSTM (SaLSTM), local patches
centred at a pixel were taken from the first PC image, and the row vectors of each image patch
were fed one by one into SaLSTM; that is, the rows of the neighbourhood were converted into an
S-length sequence. Figures 12 and 13 display the structure of SeLSTM and SaLSTM, respectively.
For classification, spectral and spatial features were obtained separately for each pixel. A
decision fusion strategy was adopted to obtain joint features: for joint spectral-spatial
classification, the results of the individual LSTMs were fused by weighted summation. The
performance of SeLSTM, SaLSTM and SSLSTMs was compared with several methods, including PCA,
LDA, non-parametric weighted feature extraction (NWFE), regularized local discriminant
embedding (RLDE), matrix-based discriminant analysis (MDA) and CNN. Their
method improved the classification accuracy by at least 2.69%, 1.53% and 1.08% on the Indian
Pines, Pavia University and Kennedy Space Centre datasets, respectively.
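The decision-level fusion by weighted summation can be illustrated as below. The sketch assumes that each LSTM branch outputs one probability per class and that the two weights sum to one; the weight value and the toy probabilities are illustrative, not those of [94].

```python
def fuse_decisions(p_spectral, p_spatial, w_spectral=0.5):
    """Fuse per-class outputs of the spectral and spatial branches by weighted sum."""
    if not 0.0 <= w_spectral <= 1.0:
        raise ValueError("w_spectral must lie in [0, 1]")
    w_spatial = 1.0 - w_spectral
    fused = [w_spectral * a + w_spatial * b for a, b in zip(p_spectral, p_spatial)]
    label = max(range(len(fused)), key=fused.__getitem__)  # index of the largest score
    return label, fused

# Assumed branch outputs for one pixel and three classes.
p_se = [0.6, 0.3, 0.1]   # spectral branch favours class 0
p_sa = [0.2, 0.7, 0.1]   # spatial branch favours class 1
label_equal, fused_equal = fuse_decisions(p_se, p_sa, w_spectral=0.5)
label_spectral, _ = fuse_decisions(p_se, p_sa, w_spectral=0.9)
print(label_equal, label_spectral)
```

The example shows that the fused label can follow either branch depending on the weight, which is why the weighting has to be tuned per dataset.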

Figure 11. Joint spectral spatial-based LSTM [94].

Figure 12. Spectral LSTM architecture [94].

Figure 13. Spatial LSTM architecture [94].

In 2019, Fang et al. [95] also extracted deep spectral-spatial features at different patch
scales using 3D dilated convolutions. All the feature maps were densely connected with
each other. To obtain more distinguishing and less redundant spectral features, the authors
also built a spectral-wise attention (SA) mechanism, which used soft weights for features. It
achieved an OA of 86.62% on Indian Pines and 92.99% on Pavia University.
Earlier research implementing ELM did not deal with insufficient samples efficiently.
To address this, Liu et al. [96] in 2020 implemented ELM-based ensemble transfer
learning. The learners of the target domain helped in determining whether the source
dataset was useful or not. They retained the biases and weights of the ELM learned in the
target domain and utilised the instances of the source domain to iteratively update the output
weights of the ELM. These weights were used by the authors to train models, which
were further ensembled using the same weights. In this manner, they used source data to improve
the ability of the learner in the target domain. They used Pavia University and Pavia Centre
interchangeably as source and target domains to check the efficiency of their approach.
Ramamurthy et al. [97] tried to reduce computational complexity by denoising and
reducing the dimensions of HSI. Initially, they recognised the edges of images through image
denoising and David Marr edge recognition with the Canny edge detector. Further, they
segmented HSIs into pixels, reconstructed them and optimised the reconstruction loss.
The HSIs were denoised again using AutoEncoders, and the dimension was reduced using PCA.
In the end, they obtained classification results using a CNN. They obtained a high OA of 92.5%
on the Pavia University dataset.
Sharifi et al. [98] also focused on extracting spectral-spatial features of HSI. Earlier,
Gabor filters were used to extract shallow texture features, which were fed into a DL model.
The authors aimed to improve the performance and hence extracted two-stage textural features.
The authors applied PCA, then extracted Gabor features and took their mean over
all directions at each scale. They then obtained the LBP of these Gabor features, which was more
discriminative than Gabor features or LBP alone. They stacked these features and used a
3D CNN for classification. Their work recorded an OA of 97.72% on the Indian Pines dataset.
Cao et al. [99] proposed a new CNN architecture termed 3D-2D SSHDR, an end-to-end
hybrid dilated residual network that took 3D hyperspectral cubes as
input. 3D-2D SSHDR contained five parts, i.e., a spectral feature learning process, a 3D to 2D
deformable part, a spatial feature learning process, an average pooling layer, and a fully
connected layer. The 3D spectral residual blocks learned discriminative spectral features.
For spatial feature learning, the extracted spectral features of the 3D images were converted
into 2D feature maps. To continue learning discriminative spatial features, hybrid dilated
convolution (HDC) residual blocks were used, which increased the receptive field of the
convolution kernel without increasing the number of parameters. The proposed network was
trained using supervised learning. The experiment was conducted on the Indian Pines, Kennedy
Space Centre and Pavia University datasets, achieving high OAs of 99.46%, 99.89% and
99.81%, respectively, compared with other CNN models. The spatial features were not
extracted in 3D. In future, transfer learning could help to extend the samples and
improve accuracy.
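The claim that dilation enlarges the receptive field at no extra parameter cost follows from the standard formulas for dilated convolutions. The sketch below assumes stride-1 layers with a 3 × 3 kernel and uses the dilation rates (1, 2, 5) purely as an illustrative hybrid schedule; the rates used in [99] are not stated here.

```python
def effective_kernel(k, rate):
    """Spatial extent covered by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (rate - 1)

def receptive_field(k, rates):
    """Receptive field of stacked stride-1 k x k convolutions with the given rates."""
    field = 1
    for rate in rates:
        field += (k - 1) * rate
    return field

def weights_per_filter(k):
    """Weights of one k x k filter on one channel; independent of the dilation rate."""
    return k * k

plain = receptive_field(3, (1, 1, 1))    # three ordinary 3 x 3 layers
hybrid = receptive_field(3, (1, 2, 5))   # assumed hybrid dilation schedule
print(plain, hybrid, weights_per_filter(3))
```

Both stacks use nine weights per filter and channel, yet the dilated stack covers a 17 × 17 neighbourhood instead of 7 × 7.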
Nalepa et al. [100] proposed a resource-frugal quantized spectral CNN. The weights and
activations were represented in a compact format, such as integer or binary numbers, without
affecting the classification process. They utilized multi-stage quantization-aware training:
the deep model was trained in full precision, followed by fake quantization, and trained
again before being quantized to the final low-bit version. Fake quantization was used as an
intermediate step to simulate the quantization of weights and activations. The experiment was
performed on Pavia University and Salinas Valley. The model, four times smaller
than its original counterparts, segmented equally well. It helped to reduce the memory
footprint of large-capacity models for HSI classification. Varying the quantization levels
could help to better understand the abilities of the DL model.
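Fake quantization rounds values to a low-bit grid but keeps them in floating point, so that training can continue while seeing the quantization error. The sketch below assumes symmetric uniform quantization with a single per-tensor scale and at most 8 bits; the scheme actually used in [100] may differ.

```python
import numpy as np

def fake_quantize(w, n_bits=8):
    """Return (dequantised float weights, integer codes, scale) for n_bits <= 8."""
    w = np.asarray(w, dtype=np.float32)
    q_max = 2 ** (n_bits - 1) - 1                    # e.g. 127 for 8 bits
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / q_max if max_abs > 0 else 1.0  # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -q_max, q_max)  # snap to the integer grid
    return (q * scale).astype(np.float32), q.astype(np.int8), scale

# Toy weights (assumption).
weights = np.array([-1.0, -0.5, 0.0, 0.25, 1.0], dtype=np.float32)
simulated, codes, scale = fake_quantize(weights, n_bits=8)
print(weights.nbytes, codes.nbytes)
```

Storing 8-bit codes instead of 32-bit floats is one way a fourfold size reduction can arise; whether this matches the bit widths behind the figure reported above is an assumption.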
Vaddi et al. [101] worked on data normalization and CNN-based classification of HSI.
The normalization was performed by dividing the pixel values by
the maximum pixel intensity value. Probabilistic PCA was used to extract spectral
features, and a Gabor filter helped in acquiring the spatial features. The spatial and spectral
information were integrated to form fused features used by the CNN. The experiment was
performed on the Indian Pines, Salinas Valley and Pavia University datasets, where the proposed
approach gave the highest accuracy compared with other state-of-the-art approaches. The running
time of the proposed approach needs to be improved.
Various deep neural network models were used by Jiao et al. [102] for HSI classification.
In the first approach, multi-scale spatial features were extracted using a convolution
network based on VGG-verydeep-16. It contained 13 convolutional layers, five pooling layers,
three fully connected layers, and activation and dropout layers. The deep spatial
features were fused with spectral features using a weighted fusion method and z-score. It
was used to segment the scenes and obtained pixel-based classification results on the Indian
Pines dataset. In the second approach, Recursive Autoencoders (RAEs) were employed. They formed
high-level spatial-spectral features from the original data and learned the local homogeneous
area of the image around the pixel under investigation. The spatial features of the pixel were
learned using a weighting scheme based on the neighbouring pixels, with the weights
determined by the spectral similarity between the investigated pixel and its neighbours.
The unsupervised RAE was employed on the Pavia University dataset, achieving an accuracy of
99.91%. The third approach involved a Superpixel-based Multi Local CNN (SML-CNN).
Superpixels were formed using a linear iterative clustering algorithm. Multiple local regions
of the superpixels, namely the original, central and corner regions, were jointly represented.
This gave a different semantic environment for each superpixel even where there was spectral
similarity, and features were fused from these regions. The classification was improved using
a multi-information modification strategy that eliminated errors by combining semantic
(superpixel-level) and detailed (pixel-level) information. The proposed algorithm achieved
good accuracy.
Sharifi et al. [98] extracted complex spatial features using a multi-scale CNN in which
patches of different sizes were used. Spatial features have been shown to improve
classification performance. Hence, the authors included spatial features obtained from
Gabor filters, morphological operations and LBP. All these features were fused with PCA’s
spectral features at the decision level for classification. It achieved OAs of 97.98% and
99.44% with 1% and 5% of the samples of each class used for training, respectively.
Due to radiometric and atmospheric corrections, many informative bands may be
lost. In 2021, Singh and Kasana [103] performed a different spectral-spatial classification
by approximating the lost noisy bands. They used linear interpolation to obtain the approximated
bands. Further, they reduced the spectral dimension and obtained spatial features through a
combination of LPP and PCA. The features were classified using a deep network along with
SAE. The work achieved OAs of 88.9%, 93.3%, 91% and 91.5% on Indian Pines (IP), Salinas
Valley (Sa), Kennedy Space Centre (KSC) and Pavia University (PU), respectively.
The recent DL classification techniques discussed above are compared in Table 6.
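As an illustration of approximating lost bands by linear interpolation, as described for [103], the sketch below fills the listed band indices of every pixel from the nearest retained bands along the spectral axis. Interpolating along the band index, and the helper name, are assumptions of this example; the exact procedure of the authors is not specified here.

```python
import numpy as np

def fill_lost_bands(cube, lost):
    """Replace the bands listed in `lost` by linear interpolation over band index."""
    cube = np.asarray(cube, dtype=float)
    n_bands = cube.shape[-1]
    lost = np.asarray(sorted(lost), dtype=int)
    kept = np.setdiff1d(np.arange(n_bands), lost)   # indices of trusted bands
    filled = cube.copy()
    flat = filled.reshape(-1, n_bands)              # view: one row per pixel
    for spectrum in flat:
        spectrum[lost] = np.interp(lost, kept, spectrum[kept])
    return filled

# Toy 1 x 1 pixel cube with five bands; bands 1 and 3 are treated as lost.
cube = np.array([[[10.0, 0.0, 30.0, 0.0, 50.0]]])
restored = fill_lost_bands(cube, lost=[1, 3])
print(restored[0, 0])
```

Lost bands at either end of the spectrum would simply take the value of the nearest retained band, because `np.interp` does not extrapolate.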

Table 6. Comparison analysis of DL Classification Techniques.

| Year | Authors | Methodology Used | Evaluation Parameters |
|---|---|---|---|
| 2016 | Zabalza et al. [80] | Bands segmentation using SSAE, local SAEs to process original features. | Indian Pines: OA-80.66%. Pavia Centre: OA-97.5%. |
| 2017 | Li et al. [104] | Band normalisation between [0,1], PCA, GLCM for spatial features extraction, CNN and ELM model. | Indian Pines: OA-98.08%, AA-97.67% and k-97.81%. Pavia University: OA-96.46%, AA-93.32% and k-95.31%. Using image reconstruction helped in increasing the AA of ELM by as high as 30.04%. |
| 2018 | Singh and Kasana [93] | LPP, Deep features using SAE, Logistic Regression. | Indian Pines: OA-84.4%. Salinas Valley: OA-87.2%. |
| 2018 | Pan et al. [90] | Spectral and Spatial Mugnet DL architecture to deal with lesser samples, SVM classification. | Indian Pines: OA-90.65%. OA of 90.82% and 93.15% on Grss_dfc_2013 and Grss_dfc_2014 datasets. |
| 2018 | Paoletti et al. [91] | 3D CNN run on GPUs for spatial and spectral features. | Indian Pines: OA-98.37%, AA-99.27% and k-98.15%. Pavia University: OA-98.06%, AA-98.61% and k-97.44%. |
| 2018 | Chen et al. [92] | Large spatial windows to extract local neighbourhood features, convolution kernels to merge spectral features. SVM classification with RBF kernel. | Indian Pines: OA-98.02%. On combination with SVM, highest accuracy of 98.39% and 98.44% was obtained in the Indian Pines and Pavia University dataset. |
| 2019 | Zhou et al. [94] | Spectral-spatial LSTM, PCA and softmax classification. | Improved the classification accuracy by at least 2.69%, 1.53% and 1.08% on Indian Pines, Pavia University and Kennedy Space Centre datasets, respectively. |
| 2020 | Cao et al. [99] | Supervised learning using 3D-2D spectral spatial hybrid dilated residual networks. | Indian Pines: OA-99.46%, AA-99.43% and k-99.38%. Kennedy Space center: OA-99.89%, AA-99.77% and k-99.88%. Pavia University: OA-99.81%, AA-99.69% and k-99.74%. |
| 2020 | Nalepa et al. [100] | Resource frugal spectral CNN. Deep model trained in full precision followed by fake quantization and then trained again before being quantized to final low-bit version. | The model four times smaller in size than original counterparts, segmented equally well. |
| 2020 | Vaddi et al. [101] | Data normalization and CNN-based classification of HSI, Probabilistic PCA for spectral features and gabor filter for spatial features. | Indian Pines: OA-99.02% and AA-99.17%. Pavia University: OA-99.94% and AA-99.92%. |
| 2020 | Jiao et al. [102] | Various approaches of deep neural network models. VGG-verydeep-16, RAEs and superpixels-based multi-local CNN. | In second approach, accuracy of 99.91% was obtained on Pavia University dataset. |
| 2021 | Singh and Kasana [103] | Approximation of lost noisy bands, PCA and LPP-based spectral-spatial features, Deep network SAE. | Indian Pines: OA-99.02% and AA-99.17%. Pavia University: OA-99.94% and AA-99.92%. |
| 2021 | Manifold et al. [105] | U-within-UNet architecture to handle both spectral and spatial features for segmentation, feature extraction and classification. | OA of 99.48% on Indian Pines. |
| 2021 | Xue et al. [106] | Hierarchical Residual network with attention mechanism for multi-scale spectral spatial features. | OA of 99.80% and AA of 99.29% on Pavia University. |
| 2022 | Sellami et al. [107] | Spatial features using AE, Fused spectral and spatial features using deep AE into joint latent representation, graph CNN. | Indian Pines: OA-97.68% and AA-97.55%. Salinas Valley: OA-98.24% and AA-98.17%. Pavia University: OA-99.16% and AA-99.04%. |
| 2022 | Zhan et al. [108] | Combination of LSTM, residual network and spectral spatial attention network for HSI classification. | Indian Pines: OA-97.69% and AA-97.19%. Salinas Valley: OA-98.34% and AA-98.84%. Pavia University: OA-95.87% and AA-95.37%. |
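The OA, AA and k (kappa) values reported in Table 6 are standard quantities derived from a confusion matrix. The sketch below shows the usual definitions on a made-up two-class matrix; it assumes rows are true classes, columns are predicted classes, and every class has at least one test sample.

```python
def classification_scores(confusion):
    """Return (OA, AA, kappa) for confusion[i][j] = true class i predicted as j."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))
    oa = correct / total                                   # overall accuracy
    per_class = [confusion[i][i] / sum(confusion[i]) for i in range(n_classes)]
    aa = sum(per_class) / n_classes                        # mean per-class accuracy
    col_totals = [sum(confusion[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    expected = sum(sum(confusion[i]) * col_totals[i]
                   for i in range(n_classes)) / total ** 2  # chance agreement
    kappa = (oa - expected) / (1 - expected)
    return oa, aa, kappa

# Made-up confusion matrix for two classes with 50 test pixels each.
oa, aa, kappa = classification_scores([[45, 5],
                                       [10, 40]])
print(oa, aa, kappa)
```

Because AA weights every class equally while OA weights every pixel equally, the two can diverge strongly on imbalanced scenes such as Indian Pines, which is why both are usually reported.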

5. Discussion
After an extensive survey of spectral, spatial and spectral-spatial feature-based
classification of HSI, the following insights have been observed.
• Mostly land cover HSI datasets have been covered in this work. Indian Pines and
Pavia University are the most commonly used datasets for classification, as depicted in
Figure 14. Figure 15 displays the highest and lowest OA achieved by different
classification techniques in the survey.
• In traditional ML, kernel-based techniques have been employed for land cover images.
Table 1 shows the greatest OA of 99.5%, obtained with shape adaptable kernels. These
incorporated spectral and spatial features, which helped to increase performance.
The main disadvantage of mathematical kernels is the computational overhead.
• The SVM classifier, a kernel-based classifier, has been widely used for land cover
images. The highest accuracy achieved was 98.68%. The SVM classifier improves
classification results when combined with a spatial Gaussian filter.
• The transform-based techniques aid in the denoising and compression of HSI. Table 2
demonstrates the highest OA of 99.0% with SVM on benchmark land cover images, and of
99.82% using Adaboost modelling to detect bruising in fruits.
• PCA has been commonly utilised as a data pre-processing step in traditional ML
approaches. It aids in the elimination of unnecessary spectral data.
• Many classification methods include dimension reduction techniques as pre-processing
steps. However, we have explicitly included a few different strategies, such as
supervised, unsupervised, feature selection and feature extraction, to emphasise their
performance. Table 5 demonstrates that bilateral filtering combined with spectral
similarity, used in sparse representation classification of land cover images, had the
greatest OA of 99.76%.
• DL techniques have become dominant in HSI research. They have shown better
performance due to built-in feature processing and convolution kernels that deal with
complex HSI data. The resource frugal networks for land cover images achieved the
highest OA of 99.89%, as evident in Table 7. However, data partitioning remains
a challenge for HSI. Due to limited samples, training and testing data overlap and
exaggerated results are recorded.

Figure 14. Percentage of majorly used datasets in existing techniques.

Figure 15. The highest and lowest OA achieved by different classification techniques in the survey.

Table 7. Comparison of performances of ML and DL techniques on landcover HSI.

| Techniques | OA | Remarks |
|---|---|---|
| SVM [28] | 95.75% | SVM implemented with different kernels. |
| SVM + DL [92] | 98.4% | CNN offers more computations to handle complex data and generate useful features. It does not need an expert for manual labeling which is the case for supervised classifiers like SVM. In this work, convolution kernels were used for spatial features. It helped spectral SVM to perform better classification. |
| Wavelet Transform [37] | 93.85% | DWT was used for denoising and enhancement of HSI and fed to SVM. |
| Wavelet Transform + DL [36] | 98.64% | CNN offers high computations and deals with complex data with many parameters. Here, DWT combined with CNN reduced the learnable parameters and created a light, robust CNN architecture. |
| Simple Band reduction [65] | 95.8% | Extracted features through heavy computations of KL divergence, PSO and MKB/ |
| Band Reduction + DL [47] | 99.06% | DL offers automatic and multi-layer processing for extracting features. It is more powerful than manual hit and trial of different feature engineering techniques. The authors implemented the computation power of Autoencoders to extract informative spectral spatial features. |
| SVM + Gaussian Filter [27] | 98.68% | SVM is a spectral classifier. The spatial features from filter were combined with SVM classification map. |
| DL + Gabor Filter [109] | 99.38% | High computation power of multi-scale CNN extracted better spectral-spatial features than SVM + Gaussian. |

The purpose of this paper is to explore how well various classification techniques
perform for HSI analysis. Some authors employed either spectral or spatial data; however,
in recent papers, the emphasis has shifted to using both spectral and spatial data. In terms of
OA, Table 7 demonstrates the differences between classic ML and DL approaches.
Although the OA of the two families is comparable, DL outperforms ML due to its automatic
feature learning and robustness in dealing with complex HSI.

6. Conclusions and Future Scope

6.1. Conclusions
Exploration of HSI has created a new path in the field of research. With numerous
real-world applications, its efficient classification is of utmost importance. Imaging hundreds
of spectral bands has certain advantages over multispectral and RGB imaging, but there is
still a fair share of limitations and disadvantages. There needs to be an improvement in
dealing with the challenges that HSI analysis brings with it, the prime ones of which are
listed below:
• High cost and data complexity.
• Even though rich spectral information is available, the low spatial resolution
introduces irregularities and makes interpretation difficult.
• The vast number of continuous spectral channels also gives rise to redundant and less
informative bands.
• The available datasets have limited labelled training samples.
• With few samples and a huge number of spectral bands, the Hughes phenomenon occurs in
HSI: for a fixed number of training samples, classification performance initially
improves as bands are added but then gradually degrades.
• Target detection also remains one of HSI's significant challenges, as the inherent
variability in target and background spectra poses a severe obstacle to developing
effective target detection algorithms for HSI. Unknown backgrounds and a shortage of
sufficient target data make the problem harder still, so it calls for more
sophisticated techniques.
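
The band redundancy mentioned in the list above can be made concrete with a small sketch. It assumes the HSI cube is a NumPy array of shape (rows, cols, bands) and drops a band when its Pearson correlation with the last kept band exceeds a threshold. The function name `select_decorrelated_bands`, the 0.95 threshold and the synthetic cube are illustrative assumptions, not a method from the surveyed papers.

```python
import numpy as np


def select_decorrelated_bands(cube: np.ndarray, threshold: float = 0.95) -> list[int]:
    """Greedily keep bands whose |correlation| with the last kept band is <= threshold."""
    rows, cols, n_bands = cube.shape
    # Flatten the spatial dimensions so each column holds one band's pixel values.
    flat = cube.reshape(rows * cols, n_bands).astype(float)
    kept = [0]  # the first band is always kept
    for b in range(1, n_bands):
        r = np.corrcoef(flat[:, kept[-1]], flat[:, b])[0, 1]
        if abs(r) <= threshold:
            kept.append(b)
    return kept


# Synthetic 4x4 cube with 4 bands (made-up data):
# band 1 is a scaled copy of band 0, band 3 is a shifted copy of band 2.
base_a = np.arange(16.0).reshape(4, 4)
base_b = base_a % 4
cube = np.stack([base_a, 2.0 * base_a, base_b, base_b + 1.0], axis=-1)
print(select_decorrelated_bands(cube))  # [0, 2]
```

Comparing each band only with the last kept band is a deliberate simplification. Comparing against all kept bands, or using a dimension reduction technique such as PCA, would be more thorough at a higher cost.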

6.2. Future Scope

This survey has brought to light the existing classification techniques applied to HSI
and compared their performance. Keeping the limitations and challenges of HSI in view,
the directions discussed below may open new avenues for researchers in HSI analysis.
• Meta-Learning: Learning how to learn is the backbone of meta-learning. It constructs
algorithms that combine the predictions of other models applied to the same datasets.
Meta-learning remains a largely untapped area of research for HSI classification.
• Additional datasets: Existing and improved techniques need to be tested on newer
datasets such as Berlin, the RIT-18 remote sensing dataset and others. This would
yield more robust and more general classification techniques.
• ELM: ELM is a relatively new direction to explore for HSI that might better handle
overfitting and slow training.
• Research on the automatic selection and optimization of parameters for SVM,
dimensionality reduction, DL and other techniques calls for efficient evolutionary
techniques or genetic algorithms; this is still an open research area with enormous
room for refinement.
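
The last item above, automatic parameter selection, can be illustrated with a simple evolutionary search. The sketch below is a mutation-only (mu + lambda)-style loop, not a full genetic algorithm with crossover, over two hyperparameters in log10 space standing in for an SVM's C and gamma. The function `evolve`, its parameters, the bounds and the toy objective are all hypothetical; in practice the `fitness` callable would return something like a cross-validated accuracy.

```python
import random
from typing import Callable, Tuple

Params = Tuple[float, float]  # (log10 C, log10 gamma); names are illustrative


def evolve(fitness: Callable[[Params], float],
           bounds: Tuple[Params, Params] = ((-3.0, -5.0), (3.0, 1.0)),
           pop_size: int = 10, n_children: int = 20,
           generations: int = 30, sigma: float = 0.5,
           seed: int = 0) -> Params:
    """Maximise `fitness` using Gaussian mutation and (mu + lambda) selection."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(p):
        # Keep every coordinate inside its [lower, upper] bound.
        return tuple(min(max(v, l), h) for v, l, h in zip(p, lo, hi))

    # Initial population sampled uniformly inside the bounds.
    pop = [tuple(rng.uniform(l, h) for l, h in zip(lo, hi))
           for _ in range(pop_size)]
    for _gen in range(generations):
        children = []
        for _child in range(n_children):
            parent = rng.choice(pop)
            mutated = tuple(v + rng.gauss(0.0, sigma) for v in parent)
            children.append(clip(mutated))
        # Parents compete with children; the best pop_size individuals survive.
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return pop[0]


# Toy objective with a known optimum at (1.0, -2.0); a stand-in for CV accuracy.
def toy_fitness(p: Params) -> float:
    return -((p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2)


best = evolve(toy_fitness)
print(best)
```

Searching in log space is a common choice for scale-type hyperparameters, since useful values often span several orders of magnitude. Each fitness evaluation would involve training a classifier, so the population size and number of generations directly set the computational budget.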

Author Contributions: All the authors made significant contributions to this work. Conceptualiza-
tion, S.S.K. and G.K.; Writing—original draft preparation, R.G.; Writing—revision and editing, R.G.,
S.S.K. and G.K. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: Publicly available datasets were analyzed in this study. This data can
be found here []
Conflicts of Interest: The authors declare that there is no conflict of interest.

