Effective Denoising and Classification of Hyperspectral Images Using Curvelet Transform and Singular Spectrum Analysis

This article has been accepted for inclusion in a future issue of this journal.
Content is final as presented, with the exception of pagination.
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING 1
Effective Denoising and Classification

of Hyperspectral Images Using Curvelet
Transform and Singular Spectrum Analysis
Tong Qiao, Jinchang Ren, Member, IEEE, Zheng Wang, Jaime Zabalza, Meijun Sun, Huimin Zhao,
Shutao Li, Senior Member, IEEE, Jón Atli Benediktsson, Fellow, IEEE, Qingyun Dai, and Stephen Marshall
Abstract—Hyperspectral imaging (HSI) classification has be- I. I NTRODUCTION

come a popular research topic in recent years, and effective
feature extraction is an important step before the classification
task. Traditionally, spectral feature extraction techniques are ap-
plied to the HSI data cube directly. This paper presents a novel
H YPERSPECTRAL imaging (HSI) sensors, which capture
hundreds of continuous bands in a broad spectral range
covering visible, near-infrared, and beyond, play an important
algorithm for HSI feature extraction by exploiting the curvelet- role in many research areas, including applications in food
transformed domain via a relatively new spectral feature process- quality control and analysis [1], pharmaceuticals [2], computer-
ing technique—singular spectrum analysis (SSA). Although the
wavelet transform has been widely applied for HSI data analysis, based forensics and security [3], as well as planet surface
the curvelet transform is employed in this paper since it is able investigation such as Mars [4]. In addition to these applications,
to separate image geometric details and background noise effec- one of the most active areas of HSI is remote sensing, where
tively. Using the support vector machine classifier, experimental researchers develop diverse algorithms based on it, e.g., target
results have shown that features extracted by SSA on curvelet detection for military surveillance [5], data compression for
coefficients have better performance in terms of classification
accuracy over features extracted on wavelet coefficients. Since the faster transmission [6], [7], and surface and data classification
proposed approach mainly relies on SSA for feature extraction on for land-cover analysis [8]–[12]. However, due to the character-
the spectral dimension, it actually belongs to the spectral feature istics of HSI, data redundancy is inevitable. Furthermore, for re-
extraction category. Therefore, the proposed method has also been mote sensing HSI, noise could be involved during the process of
compared with some state-of-the-art spectral feature extraction data acquisition and transmission. Therefore, effective feature
techniques to show its efficacy. In addition, it has been proven that
the proposed method is able to remove the undesirable artifacts extraction and denoising of HSI data is necessary for remote
introduced during the data acquisition process. By adding an sensing applications, particularly for supervised classification
extra spatial postprocessing step to the classified map achieved problems as discussed in this paper.
using the proposed approach, we have shown that the classification Feature extraction could be considered as a linear or nonlin-
performance is comparable with several recent spectral–spatial ear data transformation. Several unsupervised and supervised
classification methods.
feature extraction methods have been proposed by researchers,
Index Terms—Classification, curvelet transform, hyperspectral including the most widely used principal component analysis
imaging (HSI), singular spectrum analysis (SSA), spatial post- (PCA) [8], [12], along with other approaches such as inde-
processing (SPP), support vector machine (SVM).
pendent component analysis [13], linear discriminant analysis
(LDA) [14], minimum noise fraction [15], nonnegative ma-
trix factorization (NMF) [16], and singular value decompo-
Manuscript received March 9, 2016; revised June 10, 2016; accepted July 23,
2016. This work was supported in part by the National Natural Science Foun- sition (SVD) [17]. The aforementioned approaches not only
dation of China under Grants 61672008, 61202165, and 61572351, by the extract features but also could reduce the dimensionality of
Science and Technology Major Project of Education Department of Guangdong HSI data. Other alternatives exist for supervised feature ex-
Province, China under Grant 2014KZDXM060, by the Natural Science Foun-
dation of Guangdong Province, China under Grant 2016A030311013, and by traction without reducing data dimensionality, such as spectral
the International Scientific and Technological Cooperation Projects of Educa- curve fitting of hyperspectral image bands [18]. Additionally,
tion Department of Guangdong Province, China under Grant 2015KGJHZ021. an impressive technique, which is called singular spectrum
(Corresponding author: Jinchang Ren.)
T. Qiao, J. Ren, J. Zabalza, and S. Marshall are with the Department analysis (SSA), has recently demonstrated its ability in effective
of Electronic and Electrical Engineering, University of Strathclyde, Glasgow feature extraction for HSI data. SSA is a great improvement
G1 1XW, U.K. (e-mail: jinchang.ren@strath.ac.uk). in terms of potential classification accuracy compared with
Z. Wang and M. Sun are with the School of Computer Software and School
of Computers, Tianjin University, Tianjin 300072, China. another popular technique, namely, empirical mode decomposi-
H. Zhao and Q. Dai are with the School of Electronic and Information, tion (EMD) [9]–[11]. Rather than reducing the dimensionality,
Guangdong Polytechnic Normal University, Guangzhou 510665, China. SSA ‘smooths’ the spectral profile of HSI data so that effective
S. Li is with the College of Electrical and Information Engineering, Hunan
University, Changsha 410082, China. features can be extracted and classification accuracy can be
J. A. Benediktsson is with the Faculty of Electrical and Computer Engineer- improved accordingly.
ing, University of Iceland, 101 Reykjavik, Iceland. It is assumed that directly applying denoising techniques on
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org. the original image could probably remove fine features and
Digital Object Identifier 10.1109/TGRS.2016.2598065 noise at the same time, which is undesirable for following
0196-2892 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
2 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
applications. To avoid this problem, the multiscale wavelet- This paper is organized as follows: The theory of the curvelet
transform-based approaches are widely used for image denois- transform and SSA is introduced in Section II. The proposed
ing, where the transform decomposes the image into a set denoising and feature extraction approach applied on HSI data
of wavelet coefficients at different decomposition levels, and is given in Section III. Section IV introduces the experimental
noise in the low-energy channels of the transformed domain setup, including data sets, the classifier, and tuning of optimal
can be removed. A famous algorithm called soft threshold- parameters. Results and analysis are presented in Section V.
ing in the wavelet domain was proposed by Donoho in [19]. Conclusions are drawn in Section VI.
By removing small coefficients under a certain threshold and
shrinking large coefficients, most unwanted noise can be easily
II. BACKGROUND P RINCIPLES
discarded. For HSI data, Othman and Qian proposed a hybrid
spatial–spectral derivative-domain wavelet shrinkage approach In this section, the theory of the curvelet transform and
based on soft thresholding [20]. This algorithm works in the SSA [9], [26], [30]–[33], which will be used in the following
spectral derivative domain, in which the noise level is elevated sections, is reviewed.
and signal regularity is dissimilar in the spatial and spectral
domains of HSI data. Recently, Chen and Qian combined
A. Curvelet Transform
feature extraction and denoising, leading to a more effective
denoising method for HSI data using PCA and wavelet shrink- Similar to the wavelet transform, the curvelet transform
age [21]. In this approach, first, PCA transform is performed could also provide multiscale analysis on images. It was first
on the HSI data, and then, a 2-D bivariate wavelet thresh- proposed by Candes and Donoho in 1999, and their version is
olding method is used to remove noise for low-energy PCA also called the first-generation curvelet transform [34]. How-
channels. ever, this transform is rather complicated and needs at least four
In this paper, we are also aiming at combining ideas of steps to complete, including subband decomposition, smooth
feature extraction and denoising together for improving clas- partitioning, renormalization, and ridgelet analysis [34]. A few
sification accuracy of remote sensing hyperspectral images. years later, a simpler and faster version of the curvelet transform
This work is inspired by that in [22] where the curvelet was developed, which is the second-generation transform called
transform is applied to HSI data, and the representation of the fast discrete curvelet transform (FDCT) [31]. Two forms of
noise-free signals in the curvelet domain is predicted us- FDCT with the same computational complexity were proposed
ing multiple linear regression (MLR), which is a regression by Candes et al., based on the unequally spaced fast Fourier
that has been previously used for noise analysis [23], [24]. transforms (USFFTs) and the wrapping of specially selected
Although the wavelet transform has been widely applied for Fourier samples, respectively. The curvelet transform used here
image denoising, many studies have concluded that the wavelet is the one based on USFFT.
transform cannot provide a good representation of anisotropic Let us assume there is a pair of smooth, nonnegative, and
singularity, such as curves or edges in the image [25]–[27]. For real-valued windows called the radial window W (r) and the
this reason, soft-thresholding directional curvelet coefficients angular window V (t), such that W is supported on r ∈ (1/2, 2)
that match image edges could achieve better noise reduction and that V is supported on t ∈ [−1, 1]. These windows will
effect than the coefficients obtained in the wavelet domain. It always satisfy the admissibility conditions, i.e.,
has been proven that the curvelet transform could represent
∞
piecewise-linear contours on multiple scales through a few 3 3
W 2 (2j r) = 1, r ∈ , (1)
significant coefficients, leading to a better separation between j=−∞
4 2
geometric details and background noise [28]. Therefore, the ∞
1
curvelet transform is a good candidate for image denoising and V 2 (t − l) = 1, t ∈ −1/2, . (2)
2
enhancement. After the curvelet transform, the coefficients of l=−∞
two adjacent bands still maintain the correlation similarity of
For each scale j, the frequency window Uj is defined in the
the original HSI data [29], which means spectral processing
Fourier domain by
techniques could then be applied in the curvelet domain. Fea-
[j/2]
ture extraction by MLR in [22] is achieved by estimating the 2 θ
noise-free band by using all pixels in adjacent bands at the same Uj (r, θ) = 2−3j/4 W (2−j r)V (3)
2π
time, while more accurate spectral feature extraction is pro-
posed to be achieved by applying the aforementioned SSA to where [j/2] is the integer part of j/2. Consequently, the support
each pixel. The denoising performance in [22] is then assessed of Uj is defined by the support of W and V , which will be a
by adding noise to the original HSI data and comparing the polar wedge.
mean square errors and the mean structural similarity between A mother curvelet waveform ϕj (x) is defined related to
the original data and denoised data, where the classification Uj (ω) (Uj (r, θ) will be abbreviated as Uj (ω)), where the
performance on the denoised image is unknown. We therefore Fourier transform of ϕj is equal to Uj . Thus, all curvelets
adopt the curvelet transform instead of the wavelet transform in of scale 2−j can be acquired by rotations and translations
our approach and combine SSA to extract effective spectral fea- of ϕj , where the equispaced sequence of rotation angles is
tures in the curvelet domain to improve classification accuracy θl = 2π · 2−[j/2] · l, with l = 0, 1, . . . such that 0 ≤ θl < 2π,
for remote sensing hyperspectral images. and the sequence of translation parameters k = (k1 , k2 ) ∈ Z 2 .
QIAO et al.: DENOISING AND CLASSIFICATION OF HYPERSPECTRAL IMAGES USING CT AND SSA 3
Then, we can have the curvelets at scale 2−j , orientation θl , and 1) 2-D FFT: Apply a 2-D fast Fourier transform (FFT) to
position xk = Rθ−1 (k1 · 2−j , k2 · 2−j/2 ), defined as
(j,l)
input arrays and obtain Fourier samples as

l
(j,l)
n−1
ϕj,l,k (x) = ϕj Rθl x − xk (4) fˆ[n1 , n2 ] = f [t1 , t2 ]e−i2π(n1 t1 +n2 t2 )/n (13)
t1 ,t2 =0
where Rθ is the rotation by θ radians, and Rθ−1 is its inverse, i.e.,
with −n/2 ≤ n1 , n2 < n/2.
cos θ sin θ 2) Interpolation: For each pair of scale and angle (j, l),
Rθ = . (5)
− sin θ cos θ Fourier samples fˆ[n1 , n2 ] are interpolated to get new values,
fˆ[n1 , n2 − n1 tan θl ], for n1 , n2 ∈ Pj . Pj is defined as a set in
A curvelet coefficient of an element f ∈ 2 is the inner the following equation:
product of f and a curvelet ϕj,l,k , i.e.,
Pj = {(n1 , n2 ) : n10 ≤ n1 < n10 +L1,j n20 ≤ n2 < n20 +L2,j }
c(j, l, k) := f, ϕj,l,k = f (x)ϕj,l,k (x)dx. (6) (14)
R2
where L1,j and L2,j are the length and width of a rectangle,
and (n10 , n20 ) is the index of the pixel at the bottom left of the
Our discussion of the digital curvelet transforms will always rectangle.
be in the frequency domain. Therefore, the curvelet coeffi- 3) Multiplication: Multiply the interpolated samples with
cient could be reexpressed in the frequency domain using the frequency window U
j and obtain
Plancherel’s theorem, i.e.,
f
j,l [n1 , n2 ] = fˆ[n1 , n2 − n1 tan θl ]U

j [n1 , n2 ]. (15)
1
c(j, l, k) := fˆ(ω)ϕ̂j,l,k (ω)dω 4) Inverse 2-D FFT: The last step is to apply the inverse 2-D
(2π)2
1

(j,l)
FFT to f
j,l to get the discrete curvelet coefficients, i.e.,
= fˆ(ω)Uj (Rθ ω) ei xk ,ω dω. (7)
l
(2π)2 cD (j, l, k) = fˆ[n1 , n2 − n1 tan θl ]
n1 ,n2 ∈Pj
In practice, instead of using the polar window defined in (3),
j [n1 , n2 ]ei2π(k1 n1 /L1,j +k2 n2 /L2,j ) . (16)
×U
it is more common to use Cartesian equivalents. The Cartesian
frequency window is shown as follows: With all four steps, the discrete curvelet transform via USFFT
requires O(n2 log n) flops for computation and O(n2) for storage.

j (ω) := ψj (ω1 )Vj (ω)
U (8)
B. Singular Spectrum Analysis
where we define ψ(ω1 ) = φ(ω1 /2)2 − φ(ω1 )2 as a bandpass As a well-established approach, SSA has been applied for
profile in ψj (ω1 ) = ψ(2−j ω1 ) and Vj (ω) = V (2[j/2] ω2 /ω1 ) with time-series analysis and forecasting and is widely used in
V still obeying (2). By introducing the set of equispaced slopes different areas, including mathematics, economics, and even
tan θl := l · 2−[j/2] with l = −2[j/2] , . . . , 2[j/2] −1, for each θl ∈ biomedical engineering [35]. SSA shares the same theoretical
[−π/4, π/4), the Cartesian window can be rewritten as foundations as PCA. Both of them are able to decompose

j,l (ω) := ψj (ω1 )Vj (Sθ ω) the original time series into a linear combination of a new
U (9)
l
orthogonal basis, which includes eigenvectors generated from
the diagonalization of the data correlation matrix [36]. The main
where Sθ is the shear matrix defined as
capability of SSA is that it can decompose the original series

1 0 into some interpretable components, such as the trend, oscilla-
Sθ := . (10) tions, and unstructured noise [9]. The SSA algorithm consists
− tan θ 1
of two stages including decomposition and reconstruction, and
Similarly, curvelets should be redefined in Cartesian form as it is briefly explained as follows.
follows: 1) Decomposition: Assume that X is a nonzero 1-D series
vector with length N , i.e., X = (x1 , x2 , . . . , xN ). Given a

j SθTl x − Sθ−T
3j

j,l,k (x) = 2 4 ϕ
ϕ l
b (11) window size L(1 < L < N ), the original series X is mapped
to K lagged vectors, Xi = (xi , xi+1 , . . . , xi+L−1 )T for i =
with b := (k1 · 2−j , k2 · 2−j/2 ), where superscript T represents 1, 2, . . . , K, where K = N − L + 1. The window size should
the transpose of the matrix. Accordingly, the curvelet coeffi- be properly chosen depending on the application. Then, the
cient will be trajectory matrix is formed as

T = X1 X2 ··· XK
c(j, l, k) = fˆ (Sθl ω) U
j (ω)eib,ω dω. (12) ⎛ ⎞
x1 x2 ··· xK
⎜ x2 x3 ··· xK+1 ⎟
⎜ ⎟
Assume the input Cartesian arrays for an n × n image are in =⎜ . . . .. ⎟ . (17)
⎝ .. .. .. . ⎠
the form of f [t1 , t2 ],0 ≤ t1 ,t2 < n, a four-step implementation
of FDCT via USFFT can be obtained as follows. xL xL+1 ··· xN
It should be noted that the matrix T in (17) is a Hankel

matrix of size L × K, which means that its entries along the
antidiagonals are equal.
The next step is to compute the SVD of trajectory matrix T.
First, the eigenvalues of TTT are calculated and sorted in
descending order, i.e., λ1 ≥ λ2 ≥ · · · ≥ λL ≥ 0. Let the corre-
sponding eigenvectors be (U1 , U2 , . . . , UL ), and the resulting
trajectory matrix after SVD is shown as
T = T1 + T2 + · · · + Td (18)
√
where d is the rank of T, Ti = λi Ui ViT (i = 1, 2, . . . , d)
is called
√ the elementary matrix with rank 1, and Vi =
TT Ui / λi are often referred to as the principal components of
matrix T. Generally, the contribution of the elementary matrix Ti
to the trajectory matrix T is determined by the ratio of each
eigenvalue and the sum of all eigenvalues, as shown in the
following equation:
λi
ηi = . (19)

d
λi
i=1
2) Reconstruction: The first step of reconstruction is group-

ing, where the set of indexes {1, 2, . . . , d} is divided into m
disjoint subsets {I1 , I2 , . . . , Im }. Each Ij (j = 1, 2, . . . , m)
contains one or several elementary matrices Ti , which are
summed up within each group. Then, the resultant trajectory
matrices can be computed for each group, and consequently,
(18) is expanded as follows:
Fig. 1. Flowchart of the proposed methodology.
SVD

T = T1 + T2 + · · · + Td Therefore, the original series vector could be reconstructed
Grouping by only using the first or the first few groups generated from
its eigenvalues, and the rest could be discarded as noise. For
= TI1 + TI2 + · · · + TIm . (20)
example, if the window size L is set as ten, reconstruction using
the first three to five components should usually be enough to
The last step is diagonal averaging, which transforms each
achieve a good denoising performance. However, there is no
grouped matrix TIj into a new series with length N . This
general rule for the grouping. Like the window size L, one can
process is also known as Hankelization of the matrix TIj .
choose the eigenvalue grouping (EVG) value depending on the
Assume that Y1 = (y1 , y2 , . . . , yN ) is the transformed 1-D
application. If all decomposed components are involved in the
series of TI1 , then the elements in Y1 can be calculated using
EVG, the reconstructed series will be the same as the original
(21) by averaging the corresponding diagonals of TI1 , i.e.,
series.
⎧ k
⎪ ∗
⎪
⎪ 1
yj,k−j+1 , 1 ≤ k < L∗
⎪
⎪ k
III. P ROPOSED A PPROACH : A PPLYING
⎪
⎪ j=1
⎨
L∗ SSA IN THE C URVELET D OMAIN
∗
yk = L1∗ yj,k−j+1 , L∗ ≤ k ≤ K ∗ (21)
⎪
⎪ As briefly mentioned in Section I, the proposed denoising and
⎪
⎪
j=1
⎪
⎪ ∗ +1 ∗
N −K
feature extraction approach will be applied on hyperspectral
⎪ 1
⎩ N −k+1 yj,k−j+1 , K ∗ < k ≤ N
j=k−K ∗ +1 images, and the flowchart is shown in Fig. 1 for reference.
Hyperspectral sensors have a relatively high spectral resolution
where L∗ = min(L, K), K ∗ = max(L, K), yj,k−j+1 ∗
refers and can generate hundreds of observation channels [37]. The
∗
to the elements in TI1 , yj,k−j+1 = yj,k−j+1 if L < K, and obtained 3-D HSI data cube can be regarded as a stack of
∗
yj,k−j+1 = yk−j+1,j if L ≥ K. Then, the initial series X = 2-D images of the same scene corresponding to different wave-
(x1 , x2 , . . . , xN ) is decomposed into m series, as shown in the lengths [22], and the correlation between each two adjacent
following equation: bands is fairly high. The first step of the proposed method
is to perform the curvelet transform on each band of the
X = Y1 + Y2 + · · · + Ym . (22) hyperspectral image, and a few image stacks are generated after
the transform as shown in Fig. 1. By applying the curvelet

transform on each band, the band correlation property can be
preserved so that SSA is able to exploit the spectral signature.
The next step is to apply SSA in the spectral dimension for
each detail image stack for feature extraction and denoising.
After that, the denoised detail images at each band are gathered
according to their original location, followed by the inverse Fig. 2. Illustration of a 128 × 128 image with four decomposition scales.
(a) First (finest) scale with one orientation. (b) Second scale with eight
curvelet transform to get the denoised hyperspectral image. orientations. (c) Third scale with eight orientations. (d) Fourth (coarsest) scale
In this paper, a hyperspectral image is denoted as I(p, q) = with one orientation.
(I1 (p, q), I2 (p, q), . . . , IB (p, q)) ∈ B , where p ∈ [1, P ], q ∈
[1, Q], P × Q is the spatial dimension of the hyperspectral
image, and B is the number of bands. represents the set As suggested by the commonly adopted wavelet denois-
of real numbers with the pixel intensity Ib (p, q) at all sen- ing rule, noise is removed by thresholding only the wavelet
sor channels with b ∈ [1, B]. The 2-D curvelet transform via coefficients of the detail subbands, while keeping the low-
USFFT employed in this paper is completed using the toolbox frequency coefficients unchanged [39]. Therefore, in our ap-
CurveLab (version 2.1.2) [38]. Similar to the wavelet trans- proach, the coarse image stack stays unaltered as well. For
form, the curvelet transform can also decompose the image each detail image stack, SSA is applied to the spectral di-
into a coarse image and several detail images. Just like most mension for smoothing the spectral profile, followed by the
image processing algorithms, the curvelet transform requires inverse curvelet transform to get the denoised HSI data cube.
the processed image to be a square whose dimension is a power Given a detail image stack at scale 2−j and orientation l,
of 2. If the size of the original image is not a power of 2, pixels which is denoted as Wb,j,l (x, y) (b ∈ [1, B], x ∈ [1, X], y ∈
with a value of zero are padded to the next larger power of 2. [1, Y ]) containing X × Y coefficients in each band, the
Given the zero-padded HSI data set, I_pad(m, n) = spectral series vector for one pixel can be constructed
(I_pad1 (m, n), I_pad2 (m, n), . . . , I_padB (m, n)) where m, as Wj,l (x, y) = (W1,j,l (x, y), W2,j,l (x, y), . . . , WB,j,l (x, y)).
n ∈ [1, N ], consisting of B bands where each band has Then, the smoothing process on the spectral dimension by SSA
N 2 pixels, the curvelet transform is performed on a band image could be achieved, as follows:
I_padb to obtain the curvelet coefficients corresponding to that
Yj,l (x, y) = SSAL,EVG (Wj,l (x, y)) (26)
band cD b , i.e.,
in which SSA is the operation of the SSA listed in
cD
b = CT(I_padb ) (23) (18)–(22), L, EVG are SSA parameters, and Yj,l (x, y) =
(Y1,j,l (x, y), Y2,j,l (x, y), . . . , YB,j,l (x, y)) is the smoothed
where CT stands for the discrete curvelet transform operation
spectral feature in the curvelet domain. After collecting all
explained in Section II. The decomposition results of the trans-
smoothed spectra for each detail stack, the inverse discrete
form can be regarded as a superposition in the following form:
curvelet transform is performed on the smoothed curvelet co-

J−1 efficients band by band using the same toolbox CurveLab, and
cD
b = Cb,J + Wb,j (24) the denoised HSI data cube is achieved, shown as follows:
j=1 ⎛ ⎞

J−1 8
where Cb,J is the coarse version of the original band image with I_db = ICT ⎝Cb,J,1 + Yb,1,1 + Yb,j,l ⎠ (27)
low-frequency contents, and Wb,j stands for the detail band j=2 l=1
image at scale 2−j containing high-frequency contents. With an
N × N image, the default number of decomposition scales is where ICT stands for the inverse curvelet transform operation.
J = log2 (N ) − 3 as set in the CurveLab toolbox. Take a band The last step is to crop the denoised HSI cube to its original size.
image with the size of 128 × 128 for instance, the number of The pseudocode of the proposed curvelet and SSA approach is
decomposition scales J will be four. According to the settings summarized in Algorithm 1.
in the toolbox, there are eight orientations of the curvelet in the
second and third scales, starting from the top-left wedge and Algorithm 1 The Curvelet and SSA algorithm
increasing in a clockwise fashion, but for the coarsest and finest
scales, there is only one direction. Thus, (24) is rewritten as 1: procedure CURVELETSSA(I, L, EVG)
P × Q × B
HSI data set I, SSA window size L, and grouping param-

J−1 8
cD eter EVG
b = Cb,J,1 + Wb,1,1 + Wb,j,l (25)
j=2 l=1 2: if P = Q & P , Q are not power of 2 then
3: zero-pad each band of I to the next larger power of 2
where Wb,1,1 represents the finest scale with one orientation. to get
The schematic of four decomposition scales corresponding to 4: a new data cube I_pad with size of
a 128×128 image is shown in Fig. 2. Then, after applying the 5: N ×N ×B
curvelet transform to all bands, there will be a coarse image 6: else
stack and 17 detail image stacks. 7: I_pad ← I
8: end if
9: for b ← 1, B do
10: apply curvelet transform to I_padb to get
11: coarse Cb,J and detail Wb,j,l images
12: end for
13: for all detail image stacks do
14: for x ← 1, X do
15: for y ← 1, Y do
16: Wj,l (x, y) ← (W1,j,l (x, y), . . . ,
17: WB,j,l (x, y))
18: Yj,l (x, y) ← SSA(Wj,l (x, y), L, EVG)
19: end for
20: end for Fig. 3. Indian Pines data set. (Left) Band 150 out of 200 bands. (Right)
Reduced nine-class ground truth map with the number of samples shown.
21: end for
22: substitute original detail curvelet coefficients
23: with SSA-processed detail coefficients
24: for b ← 1, B do
25: apply inverse curvelet transform to Cb,J and
26: Yb,j,l to get denoised band image I_db
27: end for
28: crop I_db to the original size P × Q × B
29: end procedure
IV. DATA S ETS AND E XPERIMENTAL S ETUP

In this section, the experimental data sets will be introduced
as well as the choice of the classification model and parameters.
Three publicly available and widely used HSI remote sens-
ing data sets are employed to evaluate the performance of
the proposed method in this paper, including the Airborne
Visible/Infrared Imaging Spectrometer natural scenes Indian
Pines [40] and Salinas Valley [41] as well as the Reflec-
tive Optics System Imaging Spectrometer urban scene Pavia
University [42].
The Indian Pines data set was collected in the Indian Pines
test site in Northwest Indiana, USA on June 12, 1992. The
scene consists of two-thirds agriculture, and one-third forest, Fig. 4. Salinas Valley data set. (Left) Band 50 out of 204 bands. (Right) The
or other natural perennial vegetation [43]. Acquired in band- 16-class ground truth map with the number of samples shown.
interleaved format, it has 145 × 145 pixels with a spatial reso-
lution of 18 m and 220 continuous spectral channels ranging
from 400 to 2500 nm covering the complete visible–near- resolution of 3.7 m per pixel [46]. The full scene, which
infrared–shortwave-infrared spectrum. The nominal spectral includes vegetables, bare soils, and vineyard fields, comprises
resolution is 10 nm, and the radiometric resolution is 16 bits 512 lines by 217 samples with 224 spectral bands. The spectral
[44]. Due to atmospheric water absorption, bands 104–108, and radiometric resolutions of this data set are the same as for
150–163, and 220 do not contain useful information and are Indian Pines. After removing water absorption bands (108–112,
consequently removed to prevent from decreasing the classifi- 154–167, and 224), the obtained image then has 204 channels
cation accuracy, resulting in a reduced data set with 200 spectral covering 400–2500 nm [44]. The ground truth map of Salinas
bands. There are 16 classes in the original ground truth map. Valley contains 16 classes as shown in Fig. 4.
However, some classes have insufficient samples for training The Pavia University data set was collected during a flight
the classification model. As suggested by other researchers [9], campaign over the Pavia district in north Italy, with a spatial res-
[37], [45], 7 of 16 classes are discarded for more consistent olution of 1.3 m per pixel [47]. There are originally 115 bands
results, leaving nine classes considered in experiments. One with a spectral coverage ranging from 430 to 860 nm. However,
band image and the reduced ground truth map of Indian Pines 12 channels have been removed due to noise, leaving 103 bands
are shown in Fig. 3. with 610 × 340 pixels per band. Nine classes of interest are
The second data set was collected over Salinas Valley, CA provided in the ground truth map, including urban, soil, and
at low altitude on October 9, 1998, resulting in a high spatial vegetation features, as shown in Fig. 5.
Fig. 5. Pavia University data set. (Left) Band 70 out of 103 bands. (Right)
Nine-class ground truth map with the number of samples shown.
The spatial area of all data sets is not a square; hence,

before applying the curvelet transform, Indian Pines, Salinas
Valley, and Pavia University are zero-padded to the size of
256 × 256 × 200, 512 × 512 × 204, and 1024 × 1024 × 103,
Fig. 6. Sensitivity analysis of (left) window size and (right) number of decom-
respectively. Then, when the denoising and feature extraction position levels for (top) Indiana Pines, (middle) Salinas Valley, and (bottom)
process is finished, reconstructed images are cropped to their Pavia University.
original sizes for the following performance evaluation.
In the context of supervised classification, a variety of
methods have been developed for HSI data classification prob- Since the proposed methodology is inspired by the feature
lems, including the famous artificial neural networks [48]–[50], extraction approach based on SSA [9] and the denoising ap-
multinomial logistic regression [51], as well as the widely proach in [22], which uses MLR applied in the curvelet domain,
used support vector machine (SVM) that shows its outstanding the proposed methodology is compared with the feature ex-
performance in many papers [1], [8], [37], [52]. The properties traction approach and the denoising approach in Section V-A.
of SVM make it an effective tool for HSI data classification Moreover, the curvelet transform is similar to the wavelet
problems, which are influenced by the Hughes phenomenon transform as a multiscale geometric analysis. Therefore, SSA
[53]. The Hughes phenomenon is also known as the “curse of and MLR are also applied in the wavelet domain for comparison
dimensionality,” where the high number of spectral bands and in this section. For the wavelet-transform-based approaches,
low number of labeled training sample result in a decrease in the the hyperspectral data are also zero-padded, and the number of
generalization capability of the classifier [37]. In this paper, a decomposition scales is set the same as the curvelet transform,
publicly available SVM library called LIBSVM [54] is adopted, where the Cohen–Daubechies–Feauveau (CDF) 9/7 wavelet is
and the Gaussian radial basis function (RBF) kernel is chosen as adopted. For convenience, the proposed approach is denoted as
it has been shown to outperform other kernel functions for HSI CT-SSA, and other inspirational approaches are named SSA,
data, such as linear and polynomial kernel functions, in terms of CT-MLR (MLR applied in the curvelet domain), WT-MLR
classification accuracy [37]. The optimal values of regulariza- (MLR applied in the wavelet domain), and WT-SSA (SSA ap-
tion parameter C and width parameter γ in the Gaussian RBF plied in the wavelet domain). The computational cost required
kernel are achieved using tenfold cross validation on training for these methods is also discussed in this part.
samples, where values of both parameters are tested within the Then, the CT-SSA method as well as its extended version
exponentially increasing sequence {2−10 , 2−9 , . . . , 210 }. CT-SSA-PCA are further compared with some state-of-the-art
spectral feature extraction techniques in Section V-B, including
PCA, LDA, NMF, and ensemble EMD (EEMD) [55], to show
V. R ESULTS AND A NALYSIS
the efficacy of the proposed methodology.
In this section, the classification results based on SVM for The proposed approach only takes into account the spectral
three hyperspectral data sets are revealed. The classification information of the HSI data, while ignoring the important
performance is evaluated by the overall accuracy (OA) and the spatial information. For this reason, in Section V-C, a simple yet
average accuracy (AA), where OA refers to the percentage of all powerful postprocessing technique is applied to the classified
pixels that are correctly labeled, and AA stands for the average ground truth map, and it is compared with several recent
percentage of correctly labeled pixels for each class. spectral–spatial classification methods listed in [56].
Fig. 7. Classification results (OA in percentage) of Indian Pines obtained on (a) original data, (b) SSA processed data, (c) CT-MLR processed data, (d) WT-MLR
processed data, (e) WT-SSA processed data, and (f) CT-SSA processed data.
Fig. 8. Classification results (OA in percentage) of Salinas Valley obtained on (a) original data, (b) SSA processed data, (c) CT-MLR processed data, (d) WT-MLR
processed data, (e) WT-SSA processed data, and (f) CT-SSA processed data.
A. Comparison With Inspirational Approaches that the default setting of decomposition levels should always
The SSA-based feature extraction approach mentioned in [9] be used in the curvelet transform.
has two parameters: window size L and grouping parameter For the Indian Pines data set, in each class out of the nine
EVG. It is noticed in [9] that good results can be achieved with classes, 10% of pixels are randomly extracted into the training
only the first component used for EVG, as long as a proper set with the rest allocated in the testing set. As Salinas Valley
window size L is chosen. Another parameter in the proposed is a high-resolution HSI data set, spectra with this scene are
approach is the decomposition level in the curvelet transform. more separable than those with Indian Pines. Therefore, a lower
As previously mentioned, the default number of decomposition percentage of samples in each class (5%) is used for training
levels in the CurveLab toolbox is J = log2 (N ) − 3, and it is the classification model of SVM to give a more interpretable
advised that the number of decomposition levels should not result. A recent publication proposed to use a fixed number
be higher than the default setting. Therefore, different values (200) from each class in the Pavia University data set as training
have been tested to look for optimal parameters. First, the samples [56], which account for 4% of the whole labeled
default number of decomposition levels is used in the curvelet pixels. Therefore, 4% of pixels are also randomly chosen from
transform, and the window size L is tested from 2 to 5 as Pavia University for training SVM in the following experiment.
plotted in Fig. 6. Similarly, experiments have been carried out Figs. 7–9 have shown the classification maps as well as OA
ten times, and the average OA as well as the standard deviation levels for three data sets. Compared with those inspirational
are shown in the plots. By trial and error, the optimal value for L methods, the CT-SSA approach always presents better denois-
is set as 5 for both Indian Pines and Salinas Valley. For Pavia ing and feature extraction performance in terms of the highest
University, L is set as 4. The same settings of SSA are used for classification accuracy.
CT-SSA and WT-SSA as well. With the chosen optimal values Additionally, the capability of the proposed CT-SSA method
for window size L, the number of decomposition levels is tested in removing noise is compared. One noisy band is taken from
from 2 to the default setting. Results plotted in Fig. 6 indicate each data set, and the same band after being processed by
Fig. 9. Classification results (OA in percentage) of Pavia University obtained on (a) original data, (b) SSA processed data, (c) CT-MLR processed data,
(d) WT-MLR processed data, (e) WT-SSA processed data, and (f) CT-SSA processed data.
corrupted band is effectively suppressed, and local details of the

original image can be kept simultaneously.
To avoid errors and to get a more consistent result, ten
repeated experiments are carried out based on randomly chosen
training and testing samples, and average classification accu-
racy values are calculated, with numerical results shown in
Tables I–III for three experimental data sets. It can be noticed
that compared with the original image, OA using the CT-SSA
Fig. 10. Band 199 of Indian Pines in (a) original data and (b) CT-SSA processed method is increased with a percentage of 10.02%, 4.44%, and
data. 0.89%, respectively, for three data sets, achieving an impressive
improvement over other methods. For both Indian Pines and
Salinas Valley, in most cases, the proposed method presents
the highest classification accuracy; particularly in class 15
of Salinas Valley, the accuracy is improved by over 20% com-
pared with the raw image. Although it was found that the clas-
sification of hyperspectral urban data is a challenging problem
without combining the spatial and spectral information together
[47], both AA and OA are slightly improved for Pavia Univer-
sity using the proposed spectral processing method CT-SSA.
In comparison with CT-MLR, WT-MLR yields slightly worse
results in both AA and OA. Similar situations occur for
WT-SSA and CT-SSA, where the latter presents slightly higher
AA and OA than the former. It proves that the curvelet trans-
Fig. 11. Band 2 of Salinas Valley in (a) original data and (b) CT-SSA processed form does have advantages over the wavelet transform for
data.
extracting geometric details in some ways, but the performance
of the curvelet transform is largely dependent on the data set.
Despite the fact that CT-SSA is inspired by CT-MLR, SSA is
applied to the detail curvelet coefficients for each pixel, whereas
MLR is applied to each detail image stack all at once. This,
along with the embedding process and SVD decomposition in
SSA, is the main reason why CT-SSA overperforms MLR in
potentially maximizing the opportunity in reducing redundancy
and noise within the hyperspectral image data for improved
accuracy of classification.
For an N × N block, the computational complexity for the
curvelet transform is given as O(N 2 log N ) [31], whereas the
Fig. 12. Band 1 of Pavia University in (a) original data and (b) CT-SSA processed
data.
computation of the wavelet transform requires O(3/2(mH +
mL )(1 − 1/4J )N 2 ) flops for a block with the same size, where
CT-SSA is visually compared in Figs. 10–12. It can be observed J is the decomposition level, and mH , mL are the lengths of the
that the CT-SSA method works for both high-noise (see Figs. 10 filter [57]. With the CDF 9/7 filter used in the experiments, the
and 11) and low-noise (see Fig. 12) cases, where the noise in the lengths of the filter are both 9, i.e., mH = mL = 9. Therefore,
TABLE I
M EAN C LASS - BY-C LASS AND AVERAGE AND OVERALL A CCURACY VALUES (%) OF T EN R EPEATED E XPERIMENTS ON T ESTING S AMPLES
OF THE O RIGINAL I NDIAN P INES D ATA S ET AND SSA, CT-MLR, WT-MLR, WT-SSA, AND CT-SSA P ROCESSED
D ATA S ETS W ITH 10% OF D ATA U SED FOR T RAINING , F OLLOWED BY THE S TANDARD D EVIATION
TABLE II
OF THE O RIGINAL S ALINAS VALLEY D ATA S ET AND SSA, CT-MLR, WT-MLR, WT-SSA, AND CT-SSA P ROCESSED
when J is higher than 2, the computational complexity for the SSA [10] has been proposed without degrading the feature
wavelet transform can be approximated as O(27N 2 ), which extraction performance. For the Indian Pines data set, the
is much higher than that of the curvelet transform. Moreover, computing time of the proposed CT-SSA method (using the
it should be noted that the processing time for MLR and original SSA implementation) is about 5 min on a personal
SSA is largely dependent on the number of detail coefficient computer with an Intel Core i5-2400 CPU at 3.10 GHz using
stacks and the number of total pixels in those stacks. How- MATLAB 2014a (Mathworks). Hence, it is reasonable to be-
ever, it is not a problem for SSA, as fast implementation of lieve that with the fully optimized CurveLab toolbox and the
TABLE III
OF THE O RIGINAL PAVIA U NIVERSITY D ATA S ET AND SSA, CT-MLR, WT-MLR, WT-SSA, AND CT-SSA P ROCESSED
fast implementation of SSA, the time cost of the proposed not biased with any predetermined basis. Although EMD has
method should be much lower. shown its efficacy in time-series decomposition (e.g., speech
recognition), results reported in [60] indicated that spectral
feature extraction using EMD could potentially reduce the fol-
B. Comparison With Other State-of-the-Art Spectral
lowing classification accuracy. Consequently, a more robust and
Processing Techniques
data-driven technique, i.e., EEMD, was proposed to alleviate
Although the curvelet transform is applied in the spatial the addressed problem. It is achieved by sifting an ensemble of
domain of the hyperspectral images, the dominating part of white noise-added signal, and similarly, it generates a series of
the proposed denoising and feature extraction method, i.e., IMFs, where the sum of them represents the processed signal.
SSA, is applied in the spectral domain. As a matter of fact, In this paper, the fast EEMD MATLAB toolbox is adopted [59].
the proposed CT-SSA approach actually belongs to the 1-D The performance of PCA on the proposed approach for further
spectral processing technique for hyperspectral images. There- feature extraction is also included for comparison.
fore, a few state-of-the-art 1-D spectral feature extraction and The number of resulting features using PCA on both original
dimensionality reduction techniques, including PCA, LDA, and data and CT-SSA processed data is tested within 5–50 in a
NMF, are compared with the proposed approach for further step of 5 [11], whereas the dimensions of extracted features are
performance assessments. LDA is a standard supervised dimen- tested from 2 to 10 for both LDA and NMF. Best results with the
sionality reduction technique for pattern recognition, where the highest classification accuracy are chosen for comparison. With
highest degree of class separability is obtained by maximiz- regard to EEMD, it is suggested that the input white noise level
ing the Raleigh quotient, i.e., the ratio of the between-class ε0 should be in the range of 0.1–0.4 and the number of ensemble
scatter matrix to the within-class scatter matrix [14]. Different NE should be the order of 100 [61]. Therefore, we keep these
from PCA, which could only extract holistic features from the parameters the same as in [55], where ε0 = 0.2, NE = 200,
original HSI data, NMF is a parts-based learning algorithm, and the number of IMFs is 7. Experiments are also repeated
decomposing a nonnegative matrix into two nonnegative ma- ten times, and comparison results are given in Table IV.
trices that are more intuitive and interpretable. One of the It can be seen that those dimensionality reduction techniques
decomposed matrices is formed with a set of basis vectors, only give comparable results compared with the original data
which can be utilized to project the original data onto the lower set, although they could speed up the classification process.
dimensional subspace [16]. The NMF algorithm is realized by As given in [9], with a 10% training rate of Indian Pines,
the NMF MATLAB toolbox [58]. Additionally, as an upgraded the highest classification OA achieved by EMD is 75.49%.
version of EMD, EEMD is also included for comparison as it Although EEMD has made an improvement of over 8% on this
overcomes the drawback of EMD, and therefore, it outperforms data set, it is not as good as the proposed CT-SSA approach.
EMD [55]. Being the fundamental part of the Hilbert–Huang For Indian Pines and Salinas Valley, adding an extra PCA step
transform, EMD is a nonlinear and nonstationary signal decom- leads to further improvement in classification accuracy while
position method that decomposes the signal into a finite number reducing the classification time with less number of features.
of intrinsic mode functions (IMFs) [59]. Each IMF represents Although the extra step of PCA has limited performance on the
a zero-mean frequency-amplitude modulation component that Pavia University data set, the proposed CT-SSA method still
is often related to a specific physical process so that EMD is outperforms other spectral feature extraction techniques.
TABLE IV TABLE V
M EAN OVERALL A CCURACY VALUES (%) OF T EN R EPEATED C OMPARISON OF C LASS - BY-C LASS AND AVERAGE AND OVERALL
E XPERIMENTS ON THE O RIGINAL D ATA S ETS , SSA, CT-MLR, WT-MLR, C LASSIFICATION A CCURACY VALUES (%) OF THE CT-SSA-SPP
WT-SSA, CT-SSA, CT-SSA-PCA, AND S OME S TATE - OF - THE -A RT A PPROACH AND R ECENT S PECTRAL –S PATIAL M ETHODS FOR
S PECTRAL F EATURE E XTRACTION T ECHNIQUES P ROCESSED D ATA S ETS , THE I NDIAN P INES D ATA S ET (10% T RAINING R ATE )
W ITH D IMENSIONALITY OF F EATURES S HOWN IN PARENTHESES
TABLE VI
C OMPARISON OF C LASS - BY-C LASS AND AVERAGE AND OVERALL
C. Comparison With Recent Spectral–Spatial C LASSIFICATION A CCURACY VALUES (%) OF THE CT-SSA-SPP
Classification Methods A PPROACH AND R ECENT S PECTRAL –S PATIAL M ETHODS FOR
THE S ALINAS VALLEY D ATA S ET (2% T RAINING R ATE )
For the majority of supervised classifiers, such as neural
networks, decision trees, and SVM, the spectral-feature-based
classification chain is adopted. However, they are not able
to incorporate spatial dependence relations presented in the
original scene into the classification process [62]. Several recent
publications have demonstrated that the integration of spectral
and spatial information could be beneficial to hyperspectral im-
age classification problems, where some spectral–spatial classi-
fication methods, including the extended morphological profile
(EMP) [47], the logistic regression and multilevel logistic
(LMLL) [63], the edge-preserving filter (EPF) [64], image fusion
and recursive filter (IFRF) [65], and intrinsic image decomposi-
tion (IID) [56], have been proved to be superior to the spectral-
feature-based classification method. Therefore, a postprocessing
step is added to the proposed CT-SSA approach to increase
the spatial consistency in the classification result, where a
T × T spatial window is applied around each central pixel in
the classification map, and the final classified label is decided
in favor of the class that appears most in the window. The
window shape could be designed to conform to different scenes;
however, to make it simple, here, the square window is used.
As can be noticed in ground truth maps in Figs. 3–5, Indian
spectral–spatial classification methods. For the proposed
Pines has a smaller spatial area for some classes compared
CT-SSA approach, it is found that the best OA values achieved
with Salinas Valley and Pavia University. Therefore, the spatial
are 94.13, 97.35, and 93.85 for the three data sets Indiana Pines,
postprocessing (SPP) window T is chosen as 5 for Indian Pines,
Salinas Valley, and Pavia University, where the corresponding
9 for Salinas Valley, and 9 for Pavia University, respectively.
training ratios are 10%, 5%, and 4%, respectively. With the
Results using EMP, LMLL, EPF, IFRF, and IID are cited
introduced spatial postprocessing and under the same or even
directly from [56]. Since only nine classes in Indian Pines are
less training ratios, the OA values for the three data sets have
involved in our experiments, we have eliminated the other seven
been dramatically improved to 98.40, 99.39, and 99.19 from the
classes from their results and recalculated AA and OA accord-
new CCT-SSA-SPP approach. This has clearly demonstrated
ingly. For Salinas Valley in [56], 2% of pixels in each class
the efficacy of the SPP procedure in the proposed approach.
have been used for training. Therefore, we have reperformed
the experiment under the same training rate with other parame-
ters unchanged. Numerical results are shown in Tables V–VII, VI. C ONCLUSION
where the proposed approach with spatial postprocessing In this paper, two powerful tools, i.e., SSA and the
(CT-SSA-SPP) gives comparable accuracy values with other curvelet transform, are combined for HSI feature extraction. By
TABLE VII R EFERENCES

C OMPARISON OF C LASS - BY-C LASS AND AVERAGE AND OVERALL
C LASSIFICATION A CCURACY VALUES (%) OF THE CT-SSA-SPP [1] T. Kelman, J. Ren, and S. Marshall, “Effective classification of Chinese
A PPROACH AND R ECENT S PECTRAL –S PATIAL M ETHODS FOR tea samples in hyperspectral imaging,” Artif. Intell. Res., vol. 2, no. 4,
THE PAVIA U NIVERSITY D ATA S ET (4% T RAINING R ATE ) pp. 87–96, Oct. 2013.
[2] Y. Roggo, A. Edmond, P. Chalus, and M. Ulmschneider, “Infrared hyper-
spectral imaging for qualitative analysis of pharmaceutical solid forms,”
Anal. Chim. Acta., vol. 535, no. 1/2, pp. 79–87, Apr. 2005.
[3] K. Gill, J. Ren, S. Marshall, S. Karthick, and J. Gilchrist, “Quality-
assured fingerprint image enhancement and extraction using hyperspectral
imaging,” in Proc. ICDP, London, U.K., 2011, pp. 1–6.
[4] A. J. Brown, W. M. Calvin, P. C. McGuire, and S. L. Murchie, “Com-
pact reconnaissance imaging spectrometer for Mars (CRISM) south polar
mapping: First Mars year of observations,” J. Geophys. Res.—Planet,
vol. 115, no. E2, Feb. 2010, Art. no. E00D13.
[5] C. Zhao, X. Li, J. Ren, and S. Marshall, “Improved sparse representa-
tion using adaptive spatial support for effective target detection in hyper-
spectral imagery,” Int. J. Remote Sens., vol. 34, no. 24, pp. 8669–8684,
Oct. 2013.
[6] T. Qiao, J. Ren, M. Sun, J. Zheng, and S. Marshall, “Effective compres-
sion of hyperspectral imagery using an improved 3-D DCT approach for
land-cover analysis in remote-sensing applications,” Int. J. Remote Sens.,
vol. 35, no. 20, pp. 7316–7337, Oct. 2014.
[7] X. Li, J. Ren, C. Zhao, T. Qiao, and S. Marshall, “Novel multivariate
vector quantization for effective compression of hyperspectral imagery,”
Opt. Commun., vol. 332, pp. 192–200, Dec. 2014.
[8] J. Zabalza et al., “Novel folded-PCA for improved feature extraction and
applying SSA in the curvelet domain, noise can be smoothed data reduction with hyperspectral imaging and SAR in remote sensing,”
from the decomposed signals, resulting in more effective fea- ISPRS J. Photogramm. Remote Sens., vol. 93, no. 7, pp. 112–122, Jul. 2014.
ture extraction. Inspired by applying MLR in the curvelet [9] J. Zabalza, J. Ren, W. Zheng, S. Marshall, and J. Wang, “Singular spectrum
analysis for effective feature extraction in hyperspectral imaging,” IEEE
domain and another method of solely applying SSA on HSI Geosci. Remote Sens. Lett., vol. 11, no. 11, pp. 1886–1890, Nov. 2014.
data, the proposed method takes advantage of those approaches. [10] J. Zabalza, J. Ren, Z. Wang, H. Zhao, J. Wang, and S. Marshall, “Fast
Using SVM on three publicly available data sets, i.e., Indian implementation of singular spectrum analysis for effective feature extrac-
tion in hyperspectral imaging,” IEEE J. Sel. Topics Appl. Earth Observ.
Pines, Salinas Valley, and Pavia University for performance as- Remote Sens., vol. 8, no. 6, pp. 2845–2853, Jun. 2015.
sessment, the proposed CT-SSA method significantly improves [11] J. Zabalza et al., “Novel two-dimensional singular spectrum analysis for
OA by as much as 10% compared with the result of the original effective feature extraction and data classification in hyperspectral imag-
ing,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 8, pp. 4418–4433,
Indian Pines data set. In addition, noise could be effectively Aug. 2015.
removed by checking the band image visually. Compared with [12] J. Ren, J. Zabalza, S. Marshall, and J. Zheng, “Effective feature extraction
the inspirational approaches and some state-of-the-art spectral and data reduction in remote sensing using hyperspectral imaging,” IEEE
Signal Process. Mag., vol. 31, no. 4, pp. 149–154, Jul. 2014.
feature extraction techniques, the proposed method always [13] J. Wang and C.-I. Chang, “Independent component analysis-based dimen-
stands out with the highest classification accuracy values. Fur- sionality reduction with applications in hyperspectral image analysis,”
thermore, it is noticed that with the spatial postprocessing step, IEEE Trans. Geosci. Remote Sens., vol. 44, no. 6, pp. 1586–1600, Jun. 2006.
[14] W. Liao, A. Pizurica, P. Scheunders, W. Philips, and Y. Pi, “Semisupervised
another 4% improvement in accuracy could be added to the final local discriminant analysis for feature extraction in hyperspectral images,”
result of Indian Pines, whereas for Salinas Valley and Pavia IEEE Trans. Geosci. Remote Sens., vol. 51, no. 1, pp. 184–198, Jan. 2013.
University, classification results can even be improved to nearly [15] U. Amato, R. M. Cavalli, A. Palombo, S. Pignatti, and F. Santini, “Experi-
mental approach to the selection of the components in the minimum noise
100%. It is worth noting that by adding the simple spatial post- fraction,” IEEE Trans. Geosci. Remote Sens., vol. 47, no. 1, pp. 153–160,
processing step, the performance of the proposed denoising and Jan. 2009.
feature extraction method is either comparable with in terms [16] X. Huang and L. Zhang, “An adaptive mean-shift analysis approach for
object extraction and classification from urban hyperspectral imagery,”
of accuracy or even higher than some recent spectral–spatial IEEE Trans. Geosci. Remote Sens., vol. 46, no. 12, pp. 4173–4185,
classification methods. The choice of the spatial postprocessing Dec. 2008.
techniques will be a topic of our future work. [17] X. Geng, L. Ji, Y. Zhao, and F. Wang, “A small target detection method
for the hyperspectral image based on Higher Order Singular Value De-
Although, in this paper, the use of SSA is proposed for spec- composition (HOSVD),” IEEE Geosci. Remote Sens. Lett., vol. 10, no. 6,
tral processing in the curvelet domain, other filtering techniques pp. 1305–1308, Nov. 2013.
[18] A. J. Brown, “Spectral curve fitting for automatic hyperspectral data anal-
could also give a relatively good result, where 2-D filtering in ysis,” IEEE Trans. Geosci. Remote Sens, vol. 44, no. 6, pp. 1601–1608,
the spatial domain in combination with SSA will be further Jun. 2006.
investigated as well. [19] D. Donoho, “De-noising by soft-thresholding,” IEEE Trans. Inf. Theory,
vol. 41, no. 3, pp. 613–627, May 1995.
[20] H. Othman and S.-E. Qian, “Noise reduction of hyperspectral imagery
using hybrid spatial–spectral derivative-domain wavelet shrinkage,” IEEE
ACKNOWLEDGMENT Trans. Geosci. Remote Sens., vol. 44, no. 2, pp. 397–408, Feb. 2006.
[21] G. Chen and S.-E. Qian, “Denoising of hyperspectral imagery using prin-
The authors would like to thank Prof. D. Landgrebe, cipal component analysis and wavelet shrinkage,” IEEE Trans. Geosci.
Remote Sens., vol. 49, no. 3, pp. 973–980, Mar. 2011.
Prof. L. Johnson, and Prof. P. Gamba for providing the ex- [22] D. Xu, L. Sun, J. Luo, and Z. Liu, “Analysis and denoising of hyperspec-
perimental data sets online. They would also like to thank the tral remote sensing image in the curvelet domain,” Math. Probl. Eng.,
Associate Editor and anonymous reviewers for their construc- vol. 2013, pp. 1–11, 2013.
[23] R. E. Roger and J. F. Arnold, “Reliably estimating the noise in AVIRIS hy-
tive suggestions that further improved the clarity and quality of perspectral images,” Int. J. Remote Sens., vol. 17, no. 10, pp. 1951–1962,
this paper. 1996.
[24] L. Gao, Q. Du, B. Zhang, W. Yang, and Y. Wu, “A comparative study [51] P. Zhong and R. Wang, “Modeling and classifying hyperspectral im-
on linear regression-based noise estimation for hyperspectral imagery,” agery by CRFS with sparse higher order potentials,” IEEE Trans. Geosci.
IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 6, no. 2, Remote Sens., vol. 49, no. 2, pp. 688–705, Feb. 2011.
pp. 488–498, Apr. 2013. [52] G. Camps-Valls and L. Bruzzone, “Kernel-based methods for hyperspec-
[25] I. Selenick, R. Baraniuk, and N. Kingsbury, “The dual-tree complex wave- tral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 43,
et transform,” IEEE Signal Process. Mag., vol. 22, no. 6, pp. 123–151, no. 6, pp. 1351–1362, Jun. 2005.
Nov. 2005. [53] G. P. Hughes, “On the mean accuracy of statistical pattern recognizers,”
[26] J.-L. Starck, E. Candes, and D. Donoho, “The curvelet transform for im- IEEE Trans. Inf. Theory, vol. 14, no. 1, pp. 55–63, Jan. 1968.
age denoising,” IEEE Trans. Image Process., vol. 11, no. 6, pp. 670–684, [54] C.-C. Chang and C.-J. Lin, LIBSVM: A Library for Support Vector
Jun. 2002. Machines (Version 3.20). [Online]. Available: http://www.csie.ntu.edu.tw/
[27] M. Do and M. Vetterli, “The contourlet transform: An efficient direc- ~cjlin/libsvm/
tional multiresolution image representation,” IEEE Trans. Image Process., [55] Z. He, Y. Shen, Q. Wang, and Y. Wang, “Optimized ensemble EMD-based
vol. 14, no. 12, pp. 2091–2106, Dec. 2005. spectral features for hyperspectral image classification,” IEEE Trans.
[28] F. Nencini, A. Garzelli, S. Baronti, and L. Alparone, “Remote sensing Instrum. Meas., vol. 63, no. 5, pp. 1041–1056, May 2014.
image fusion using the curvelet transform,” Inf. Fus., vol. 8, no. 2, [56] X. Kang, S. Li, L. Fang, and J. A. Benediktsson, “Intrinsic image de-
pp. 143–156, 2007. composition for feature extraction of hyperspectral images,” IEEE Trans.
[29] L. Sun and J. Luo, “Junk band recovery for hyperspectral image based on Geosci. Remote Sens., vol. 53, no. 4, pp. 2241–2253, Apr. 2015.
curvelet transform,” J. Central South Univ. Technol., vol. 18, pp. 816–822, [57] A. Ignat, “Combining features for texture analysis,” in Computer Analy-
2011. sis of Images and Patterns. Berlin, Germany: Springer-Verlag, 2015,
[30] J.-L. Starck, F. Murtagh, E. Candes, and D. Donoho, “Gray and color pp. 220–229.
image contrast enhancement by the curvelet transform,” IEEE Trans. [58] Y. Li and A. Ngom, “The non-negative matrix factorization toolbox for
Image Process., vol. 12, no. 6, pp. 706–717, Jun. 2003. biological data mining,” Source Code Biol. Med., vol. 8, no. 1, pp. 1–15,
[31] E. Candes, L. Demanet, D. Donoho, and L. Ying, “Fast discrete curvelet Apr. 2013.
transforms,” Multiscale Model. Simul., vol. 5, no. 3, pp. 861–899, 2006. [59] Y.-H. Wang, C.-H. Yeh, H.-W. V. Young, K. Hu, and M.-T. Lo,
[32] N. Golyandina and A. Zhigljavsky, Singular Spectrum Analysis for Time “On the computational complexity of the empirical mode decomposition
Series. Berlin, Germany: Springer-Verlag, 2013. algorithm,” Phys. A, vol. 400, pp. 159–167, Apr. 2014.
[33] N. Golyandina, V. Nekrutkin, and A. A. Zhigljavsky, Analysis of Time [60] B. Demir and S. Ertürk, “Empirical mode decomposition of hyperspectral
Series Structure: SSA and Related Techniques. London, U.K.: images for support vector machine classification,” IEEE Trans. Geosci.
CRC Press, 2001. Remote Sens., vol. 48, no. 11, pp. 4071–4084, Nov. 2010.
[34] E. Candes and D. Donoho, “Curvelets—A surprisingly effective nonadap- [61] Z. Wu and N. E. Huang, “Ensemble empirical mode decomposition:
tive representation for objects with edges,” in Curve and Surface Fitting. A noise-assisted data analysis method,” Adv. Adapt. Data Anal., vol. 1,
Nashville, TN, USA: Vanderbilt Univ. Press, 1999, pp. 105–120. no. 1, pp. 1–41, Jan. 2009.
[35] S. Kouchaki, S. Sanei, E. Arbon, and D.-J. Dijk, “Tensor based singular [62] M. Rojas, I. Dópido, A. Plaza, and P. Gamba, “Comparison of support
spectrum analysis for automatic scoring of sleep EEG,” IEEE Trans. vector machine-based processing chains for hyperspectral image classifi-
Neural Syst. Rehabil. Eng., vol. 23, no. 1, pp. 1–9, Jan. 2015. cation,” in Proc. Satellite Data Compress., Commun., Process. VI, 2010,
[36] W. Pereira, S. Bridal, A. Coron, and P. Laugier, “Singular spectrum pp. 1–12.
analysis applied to backscattered ultrasound signals from in vitro human [63] J. Li, J. M. Bioucas-Dias, and A. Plaza, “Hyperspectral image segmenta-
cancellous bone specimens,” IEEE Trans. Ultrason., Ferroelect., Freq. tion using a new Bayesian approach with active learning,” IEEE Trans.
Control, vol. 51, no. 3, pp. 302–312, Mar. 2004. Geosci. Remote Sens., vol. 49, no. 10, pp. 3947–3960, Oct. 2011.
[37] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sens- [64] X. Kang, S. Li, and J. A. Benediktsson, “Spectral–spatial hyperspectral
ing images with support vector machines,” IEEE Trans. Geosci. Remote image classification with edge-preserving filtering,” IEEE Trans. Geosci.
Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004. Remote Sens., vol. 52, no. 5, pp. 2666–2677, May 2014.
[38] E. Candes, L. Demanet, D. Donoho, and L. Ying, CurveLab Toolbox [65] X. Kang, S. Li, and J. A. Benediktsson, “Feature extraction of hyper-
(V 2.1.2). [Online]. Available: http://www.curvelet.org/software.html spectral images with image fusion and recursive filtering,” IEEE Trans.
[39] S. Chang, B. Yu, and M. Vetterli, “Adaptive wavelet thresholding for Geosci. Remote Sens., vol. 52, no. 6, pp. 3742–3752, Jun. 2014.
image denoising and compression,” IEEE Trans. Image Process., vol. 9,
no. 9, pp. 1532–1546, Sep. 2000.
Tong Qiao received the B.Eng. degree (first class)
[40] D. Landgrebe, AVIRIS Image Indian Pines. [Online]. Available: https://
in electrical and electronic engineering from the
engineering.purdue.edu/~biehl/MultiSpec
University of Manchester, Manchester, U.K., in
[41] L. Johnson, AVIRIS Image Salinas Valley. [Online]. Available: http://
2011; the M.Sc. degree (distinction) in electrical
www.ehu.eus/ccwintco/index.php
and electronic engineering from the University of
[42] P. Gamba, ROSIS Image Pavia University. [Online]. Available: http://
Strathclyde, Glasgow, U.K., in 2012; and the Ph.D.
www.ehu.eus/ccwintco/index.php
degree in hyperspectral imaging from the Centre for
[43] G. Camps-Valls and L. Bruzzone, Eds., Kernel Methods for Remote
Excellence in Signal and Image Processing, Univer-
Sensing Data Analysis. Chichester, U.K.: Wiley, 2009.
sity of Strathclyde, in 2016.
[44] A. Chakrabarty, O. Choudhury, P. Sarkar, A. Paul, and D. Sarkar, “Hyper-
Her research interests include hyperspectral-
spectral image classification incorporating bacterial foraging-optimized
imaging-based applications and feature extraction
spectral weighting,” Artif. Intell. Res., vol. 1, no. 1, pp. 63–83, Sep. 2012.
for remote sensing image classification.
[45] R. Archibald and G. Fann, “Feature selection and classification of hy-
perspectral images with support vector machines,” IEEE Geosci. Remote
Sens. Lett., vol. 4, no. 4, pp. 674–677, Oct. 2007. Jinchang Ren (M’07) received the B.E. degree in
[46] A. Plaza, P. Martinez, J. Plaza, and R. Perez, “Dimensionality reduc- computer software, the M.Eng. degree in image
tion and classification of hyperspectral image data using sequences of processing, and the D.Eng. degree in computer vi-
extended morphological transformations,” IEEE Trans. Geosci. Remote sion from Northwestern Polytechnical University,
Sens., vol. 43, no. 3, pp. 466–479, Mar. 2005. Xi’an, China, and the Ph.D. degree in electronic
[47] J. A. Benediktsson, M. Pesaresi, and K. Amason, “Classification and fea- imaging and media communication from Bradford
ture extraction for remote sensing images from urban areas based on mor- University, Bradford, U.K.
phological transformations,” IEEE Trans. Geosci. Remote Sens., vol. 41, He is currently a Senior Lecturer with the Centre
no. 9, pp. 1940–1949, Sep. 2003. for Excellence in Signal and Image Processing and
[48] D. L. Civco, “Artificial neural networks for land-cover classification and the Deputy Director of the Strathclyde Hyperspectral
mapping,” Int. J. Geophys. Inf. Syst., vol. 7, no. 2, pp. 173–186, 1993. Imaging Centre, University of Strathclyde, Glasgow,
[49] H. Bischof and A. Leonardis, “Finding optimal neural networks for land U.K. He has published over 150 peer-reviewed journal and conference papers.
use classification,” IEEE Trans. Geosci. Remote Sens., vol. 36, no. 1, His research interests are focused mainly on visual computing and multimedia
pp. 337–341, Jan. 1998. signal processing and particularly on semantic content extraction for video
[50] L. Bruzzone and D. F. Prieto, “A technique for the selection of kernel- analysis and understanding and, more recently, hyperspectral imaging.
function parameters in RBF neural networks for classification of remote- Dr. Ren is an Associate Editor of two international journals including
sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 37, no. 2, Multidimensional Systems and Signal Processing and the International Journal
pp. 1179–1184, Mar. 1999. of Pattern Recognition and Artificial Intelligence.
Zheng Wang received the Ph.D. degree in com- Jón Atli Benediktsson (S’84–M’90–SM’99–F’04)
puter science from Tianjin University (TJU), Tianjin, received the Cand.Sci. degree in electrical engi-
China, in 2009. neering from the University of Iceland, Reykjavik,
From 2007 to 2008, he was a Visiting Scholar Iceland, in 1984 and the M.S.E.E. and Ph.D. degrees
with the National Institute for Research in Com- in electrical and computer engineering from Purdue
puter Sciences and Control (INRIA), France. He is University, West Lafayette, IN, USA, in 1987 and
currently an Associate Professor with the School 1990, respectively.
of Computer Software, TJU. His research interests He is currently the Rector and a Professor of elec-
include video analysis, hyperspectral imaging, and trical and computer engineering with the University
computer graphics. of Iceland. He is a Cofounder of the biomedical
startup company Oxymap. He has published exten-
sively in his areas of interest. His research interests include remote sensing,
biomedical analysis of signals, pattern recognition, image processing, and
Jaime Zabalza received the M.Eng. degree in in- signal processing.
dustrial engineering from Universitat Jaume I (UJI), Dr. Benediktsson was the 2011–2012 President of the IEEE Geoscience and
Castellón de la Plana, Spain, in 2006; the MAS Remote Sensing Society (GRSS) and has been on the GRSS Administrative
degree in electrical technology from the Universitat Committee since 2000. He is a Fellow of the International Society for Optics
Politècnica de València, València, Spain, in 2010; and Photonics (SPIE). He is a member of the Association of Chartered Engi-
and the M.Sc. degree (with distinction) in electronic neers in Iceland (VFI), Societas Scinetiarum Islandica, and Tau Beta Pi. From
and electrical engineering and the Ph.D. degree 2003 to 2008, he was an Editor of the IEEE T RANSACTIONS ON G EOSCIENCE
in hyperspectral imaging from the University of AND R EMOTE S ENSING (TGRS). He has served as an Associate Editor for
Strathclyde, Glasgow, U.K., in 2012 and 2015, TGRS since 1999, the IEEE G EOSCIENCE AND R EMOTE S ENSING L ETTERS
respectively. since 2003, and IEEE Access since 2013. He is on the International Editorial
From 2006 to 2011, he joined UJI as an R&D Board of the International Journal of Image and Data Fusion and was the
Engineer and worked in the Energy Technological Institute, Spain, in mul- Chairman of the Steering Committee of the IEEE J OURNAL OF S ELECTED
tidisciplinary fields comprising power electronics, automation, and computer T OPICS IN A PPLIED E ARTH O BSERVATIONS AND R EMOTE S ENSING
science. He is currently with the Department of Electronic and Electrical (J-STARS) during 2007–2010. He received the Stevan J. Kristof Award from
Engineering, University of Strathclyde. His research interests are related to Purdue University in 1991 as an outstanding graduate student in remote sensing.
synthetic aperture radar and remote sensing, hyperspectral imaging, and digital He was the recipient of the Icelandic Research Council’s Outstanding Young
signal processing, including signal processing in a wide range of applications. Researcher Award in 1997, was granted the IEEE Third Millennium Medal in
2000, was a corecipient of the University of Iceland’s Technology Innovation
Award in 2004, and received the yearly research award from the Engineering
Meijun Sun received the Ph.D. degree in com- Research Institute of the University of Iceland in 2006 and the Outstanding
puter science from Tianjin University (TJU), Tianjin, Service Award from the IEEE GRSS in 2007. He was also a corecipient of
China, in 2009. the 2012 IEEE TGRS Paper Award. He received the 2013 IEEE/VFI Electrical
From 2007 to 2008, she was a Visiting Scholar Engineer of the Year Award and was a corecipient of the IEEE GRSS Highest
with the National Institute for Research in Com- Impact Paper Award.
puter Sciences and Control (INRIA), France. She
is currently an Associate Professor with the School
of Computer Science and Technology, TJU. Her
research interests include computer graphics, hyper-
spectral imaging, and image processing.
Huimin Zhao was born in Shaanxi, China, in 1966. Qingyun Dai received the Ph.D. degree from the
He received the B.Sc. and M.Sc. degrees in signal South China University of Technology, Guangzhou,
processing from Northwestern Polytechnical Univer- China.
sity, Xi’an, China, in 1992 and 1997, respectively, She is a Full Professor with the School of Infor-
and the Ph.D. degree in electrical engineering from mation Engineering, the Director of the Intelligent
Sun Yat-sen University, Guangzhou, China, in 2001. Signal Processing and Big Data Research Center,
He is currently a Professor with the Guangdong and the Director of the Science and Technology
Polytechnic Normal University, Guangzhou. His re- Research Office of Guangdong University of Tech-
search interests include image, video, and informa- nology, Guangzhou. Her research interests mainly
tion security technology. include wavelets, image processing, image retrieval,
pattern recognition, manufacturing engineering sys-
tems, radio frequency identification, cloud computing, and big data.
Shutao Li (M’07–SM’15) received the B.S.,

M.S., and Ph.D. degrees from Hunan University,
Changsha, China, in 1995, 1997, and 2001, respec-
tively, all in electrical engineering.
In 2001, he joined the College of Electrical and
Information Engineering, Hunan University. In 2013,
he was granted the National Science Fund for Dis-
tinguished Young Scholars in China. He is currently Stephen Marshall received the B.Sc. degree in elec-
a Full Professor with the College of Electrical and trical and electronic engineering from the University
Information Engineering, Hunan University. He is of Nottingham, Nottingham, U.K., and the Ph.D.
also a Chang-Jiang Scholar Professor appointed by degree in image processing from the University of
the Ministry of Education of China. He has authored/coauthored more than 180 Strathclyde, Glasgow, U.K.
refereed papers. His professional interests include compressive sensing, sparse He is currently a Professor with the Department of
representation, image processing, and pattern recognition. Electronic and Electrical Engineering, University of
Dr. Li is an Associate Editor of the IEEE T RANSACTIONS ON G EO - Strathclyde. He has published over 150 papers in his
SCIENCE AND R EMOTE S ENSING and a member of the Editorial Board of areas of interest. His research activities are focused
Information Fusion and Sensing and Imaging. He was a recipient of two 2nd- on nonlinear image processing and hyperspectral
Grade National Awards at the Science and Technology Progress of China in imaging.
2004 and 2006. Dr. Marshall is a Fellow of the Institution of Engineering and Technology.

Effective Denoising and Classification of Hyperspectral Images Using Curvelet Transform and Singular Spectrum Analysis

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Effective Denoising and Classification of Hyperspectral Images Using Curvelet Transform and Singular Spectrum Analysis

Uploaded by

Copyright:

Available Formats

This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING 1

Effective Denoising and Classification

Abstract—Hyperspectral imaging (HSI) classification has be- I. I NTRODUCTION

2 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

4 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

It should be noted that the matrix T in (17) is a Hankel

2) Reconstruction: The first step of reconstruction is group-

the transform as shown in Fig. 1. By applying the curvelet

6 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

IV. DATA S ETS AND E XPERIMENTAL S ETUP

The spatial area of all data sets is not a square; hence,

8 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

corrupted band is effectively suppressed, and local details of the

10 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

12 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

TABLE VII R EFERENCES

14 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

Shutao Li (M’07–SM’15) received the B.S.,

You might also like