Professional Documents
Culture Documents
Effective Denoising and Classification of Hyperspectral Images Using Curvelet Transform and Singular Spectrum Analysis
Effective Denoising and Classification of Hyperspectral Images Using Curvelet Transform and Singular Spectrum Analysis
applications. To avoid this problem, the multiscale wavelet- This paper is organized as follows: The theory of the curvelet
transform-based approaches are widely used for image denois- transform and SSA is introduced in Section II. The proposed
ing, where the transform decomposes the image into a set denoising and feature extraction approach applied on HSI data
of wavelet coefficients at different decomposition levels, and is given in Section III. Section IV introduces the experimental
noise in the low-energy channels of the transformed domain setup, including data sets, the classifier, and tuning of optimal
can be removed. A famous algorithm called soft threshold- parameters. Results and analysis are presented in Section V.
ing in the wavelet domain was proposed by Donoho in [19]. Conclusions are drawn in Section VI.
By removing small coefficients under a certain threshold and
shrinking large coefficients, most unwanted noise can be easily
II. BACKGROUND P RINCIPLES
discarded. For HSI data, Othman and Qian proposed a hybrid
spatial–spectral derivative-domain wavelet shrinkage approach In this section, the theory of the curvelet transform and
based on soft thresholding [20]. This algorithm works in the SSA [9], [26], [30]–[33], which will be used in the following
spectral derivative domain, in which the noise level is elevated sections, is reviewed.
and signal regularity is dissimilar in the spatial and spectral
domains of HSI data. Recently, Chen and Qian combined
A. Curvelet Transform
feature extraction and denoising, leading to a more effective
denoising method for HSI data using PCA and wavelet shrink- Similar to the wavelet transform, the curvelet transform
age [21]. In this approach, first, PCA transform is performed could also provide multiscale analysis on images. It was first
on the HSI data, and then, a 2-D bivariate wavelet thresh- proposed by Candes and Donoho in 1999, and their version is
olding method is used to remove noise for low-energy PCA also called the first-generation curvelet transform [34]. How-
channels. ever, this transform is rather complicated and needs at least four
In this paper, we are also aiming at combining ideas of steps to complete, including subband decomposition, smooth
feature extraction and denoising together for improving clas- partitioning, renormalization, and ridgelet analysis [34]. A few
sification accuracy of remote sensing hyperspectral images. years later, a simpler and faster version of the curvelet transform
This work is inspired by that in [22] where the curvelet was developed, which is the second-generation transform called
transform is applied to HSI data, and the representation of the fast discrete curvelet transform (FDCT) [31]. Two forms of
noise-free signals in the curvelet domain is predicted us- FDCT with the same computational complexity were proposed
ing multiple linear regression (MLR), which is a regression by Candes et al., based on the unequally spaced fast Fourier
that has been previously used for noise analysis [23], [24]. transforms (USFFTs) and the wrapping of specially selected
Although the wavelet transform has been widely applied for Fourier samples, respectively. The curvelet transform used here
image denoising, many studies have concluded that the wavelet is the one based on USFFT.
transform cannot provide a good representation of anisotropic Let us assume there is a pair of smooth, nonnegative, and
singularity, such as curves or edges in the image [25]–[27]. For real-valued windows called the radial window W (r) and the
this reason, soft-thresholding directional curvelet coefficients angular window V (t), such that W is supported on r ∈ (1/2, 2)
that match image edges could achieve better noise reduction and that V is supported on t ∈ [−1, 1]. These windows will
effect than the coefficients obtained in the wavelet domain. It always satisfy the admissibility conditions, i.e.,
has been proven that the curvelet transform could represent
∞
piecewise-linear contours on multiple scales through a few 3 3
W 2 (2j r) = 1, r ∈ , (1)
significant coefficients, leading to a better separation between j=−∞
4 2
geometric details and background noise [28]. Therefore, the ∞
1
curvelet transform is a good candidate for image denoising and V 2 (t − l) = 1, t ∈ −1/2, . (2)
2
enhancement. After the curvelet transform, the coefficients of l=−∞
two adjacent bands still maintain the correlation similarity of
For each scale j, the frequency window Uj is defined in the
the original HSI data [29], which means spectral processing
Fourier domain by
techniques could then be applied in the curvelet domain. Fea-
[j/2]
ture extraction by MLR in [22] is achieved by estimating the 2 θ
noise-free band by using all pixels in adjacent bands at the same Uj (r, θ) = 2−3j/4 W (2−j r)V (3)
2π
time, while more accurate spectral feature extraction is pro-
posed to be achieved by applying the aforementioned SSA to where [j/2] is the integer part of j/2. Consequently, the support
each pixel. The denoising performance in [22] is then assessed of Uj is defined by the support of W and V , which will be a
by adding noise to the original HSI data and comparing the polar wedge.
mean square errors and the mean structural similarity between A mother curvelet waveform ϕj (x) is defined related to
the original data and denoised data, where the classification Uj (ω) (Uj (r, θ) will be abbreviated as Uj (ω)), where the
performance on the denoised image is unknown. We therefore Fourier transform of ϕj is equal to Uj . Thus, all curvelets
adopt the curvelet transform instead of the wavelet transform in of scale 2−j can be acquired by rotations and translations
our approach and combine SSA to extract effective spectral fea- of ϕj , where the equispaced sequence of rotation angles is
tures in the curvelet domain to improve classification accuracy θl = 2π · 2−[j/2] · l, with l = 0, 1, . . . such that 0 ≤ θl < 2π,
for remote sensing hyperspectral images. and the sequence of translation parameters k = (k1 , k2 ) ∈ Z 2 .
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
QIAO et al.: DENOISING AND CLASSIFICATION OF HYPERSPECTRAL IMAGES USING CT AND SSA 3
Then, we can have the curvelets at scale 2−j , orientation θl , and 1) 2-D FFT: Apply a 2-D fast Fourier transform (FFT) to
position xk = Rθ−1 (k1 · 2−j , k2 · 2−j/2 ), defined as
(j,l)
input arrays and obtain Fourier samples as
l
(j,l)
n−1
ϕj,l,k (x) = ϕj Rθl x − xk (4) fˆ[n1 , n2 ] = f [t1 , t2 ]e−i2π(n1 t1 +n2 t2 )/n (13)
t1 ,t2 =0
where Rθ is the rotation by θ radians, and Rθ−1 is its inverse, i.e.,
with −n/2 ≤ n1 , n2 < n/2.
cos θ sin θ 2) Interpolation: For each pair of scale and angle (j, l),
Rθ = . (5)
− sin θ cos θ Fourier samples fˆ[n1 , n2 ] are interpolated to get new values,
fˆ[n1 , n2 − n1 tan θl ], for n1 , n2 ∈ Pj . Pj is defined as a set in
A curvelet coefficient of an element f ∈ 2 is the inner the following equation:
product of f and a curvelet ϕj,l,k , i.e.,
Pj = {(n1 , n2 ) : n10 ≤ n1 < n10 +L1,j n20 ≤ n2 < n20 +L2,j }
c(j, l, k) := f, ϕj,l,k = f (x)ϕj,l,k (x)dx. (6) (14)
R2
where L1,j and L2,j are the length and width of a rectangle,
and (n10 , n20 ) is the index of the pixel at the bottom left of the
Our discussion of the digital curvelet transforms will always rectangle.
be in the frequency domain. Therefore, the curvelet coeffi- 3) Multiplication: Multiply the interpolated samples with
cient could be reexpressed in the frequency domain using the frequency window U
j and obtain
Plancherel’s theorem, i.e.,
f
j,l [n1 , n2 ] = fˆ[n1 , n2 − n1 tan θl ]U
j [n1 , n2 ]. (15)
1
c(j, l, k) := fˆ(ω)ϕ̂j,l,k (ω)dω 4) Inverse 2-D FFT: The last step is to apply the inverse 2-D
(2π)2
1
(j,l)
FFT to f
j,l to get the discrete curvelet coefficients, i.e.,
= fˆ(ω)Uj (Rθ ω) ei xk ,ω dω. (7)
l
(2π)2 cD (j, l, k) = fˆ[n1 , n2 − n1 tan θl ]
n1 ,n2 ∈Pj
In practice, instead of using the polar window defined in (3),
j [n1 , n2 ]ei2π(k1 n1 /L1,j +k2 n2 /L2,j ) . (16)
×U
it is more common to use Cartesian equivalents. The Cartesian
frequency window is shown as follows: With all four steps, the discrete curvelet transform via USFFT
requires O(n2 log n) flops for computation and O(n2) for storage.
j (ω) := ψj (ω1 )Vj (ω)
U (8)
B. Singular Spectrum Analysis
where we define ψ(ω1 ) = φ(ω1 /2)2 − φ(ω1 )2 as a bandpass As a well-established approach, SSA has been applied for
profile in ψj (ω1 ) = ψ(2−j ω1 ) and Vj (ω) = V (2[j/2] ω2 /ω1 ) with time-series analysis and forecasting and is widely used in
V still obeying (2). By introducing the set of equispaced slopes different areas, including mathematics, economics, and even
tan θl := l · 2−[j/2] with l = −2[j/2] , . . . , 2[j/2] −1, for each θl ∈ biomedical engineering [35]. SSA shares the same theoretical
[−π/4, π/4), the Cartesian window can be rewritten as foundations as PCA. Both of them are able to decompose
j,l (ω) := ψj (ω1 )Vj (Sθ ω) the original time series into a linear combination of a new
U (9)
l
orthogonal basis, which includes eigenvectors generated from
the diagonalization of the data correlation matrix [36]. The main
where Sθ is the shear matrix defined as
capability of SSA is that it can decompose the original series
1 0 into some interpretable components, such as the trend, oscilla-
Sθ := . (10) tions, and unstructured noise [9]. The SSA algorithm consists
− tan θ 1
of two stages including decomposition and reconstruction, and
Similarly, curvelets should be redefined in Cartesian form as it is briefly explained as follows.
follows: 1) Decomposition: Assume that X is a nonzero 1-D series
vector with length N , i.e., X = (x1 , x2 , . . . , xN ). Given a
j SθTl x − Sθ−T
3j
j,l,k (x) = 2 4 ϕ
ϕ l
b (11) window size L(1 < L < N ), the original series X is mapped
to K lagged vectors, Xi = (xi , xi+1 , . . . , xi+L−1 )T for i =
with b := (k1 · 2−j , k2 · 2−j/2 ), where superscript T represents 1, 2, . . . , K, where K = N − L + 1. The window size should
the transpose of the matrix. Accordingly, the curvelet coeffi- be properly chosen depending on the application. Then, the
cient will be trajectory matrix is formed as
T = X1 X2 ··· XK
c(j, l, k) = fˆ (Sθl ω) U
j (ω)eib,ω dω. (12) ⎛ ⎞
x1 x2 ··· xK
⎜ x2 x3 ··· xK+1 ⎟
⎜ ⎟
Assume the input Cartesian arrays for an n × n image are in =⎜ . . . .. ⎟ . (17)
⎝ .. .. .. . ⎠
the form of f [t1 , t2 ],0 ≤ t1 ,t2 < n, a four-step implementation
of FDCT via USFFT can be obtained as follows. xL xL+1 ··· xN
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
T = T1 + T2 + · · · + Td (18)
√
where d is the rank of T, Ti = λi Ui ViT (i = 1, 2, . . . , d)
is called
√ the elementary matrix with rank 1, and Vi =
TT Ui / λi are often referred to as the principal components of
matrix T. Generally, the contribution of the elementary matrix Ti
to the trajectory matrix T is determined by the ratio of each
eigenvalue and the sum of all eigenvalues, as shown in the
following equation:
λi
ηi = . (19)
d
λi
i=1
QIAO et al.: DENOISING AND CLASSIFICATION OF HYPERSPECTRAL IMAGES USING CT AND SSA 5
8: end if
9: for b ← 1, B do
10: apply curvelet transform to I_padb to get
11: coarse Cb,J and detail Wb,j,l images
12: end for
13: for all detail image stacks do
14: for x ← 1, X do
15: for y ← 1, Y do
16: Wj,l (x, y) ← (W1,j,l (x, y), . . . ,
17: WB,j,l (x, y))
18: Yj,l (x, y) ← SSA(Wj,l (x, y), L, EVG)
19: end for
20: end for Fig. 3. Indian Pines data set. (Left) Band 150 out of 200 bands. (Right)
Reduced nine-class ground truth map with the number of samples shown.
21: end for
22: substitute original detail curvelet coefficients
23: with SSA-processed detail coefficients
24: for b ← 1, B do
25: apply inverse curvelet transform to Cb,J and
26: Yb,j,l to get denoised band image I_db
27: end for
28: crop I_db to the original size P × Q × B
29: end procedure
QIAO et al.: DENOISING AND CLASSIFICATION OF HYPERSPECTRAL IMAGES USING CT AND SSA 7
Fig. 5. Pavia University data set. (Left) Band 70 out of 103 bands. (Right)
Nine-class ground truth map with the number of samples shown.
Fig. 7. Classification results (OA in percentage) of Indian Pines obtained on (a) original data, (b) SSA processed data, (c) CT-MLR processed data, (d) WT-MLR
processed data, (e) WT-SSA processed data, and (f) CT-SSA processed data.
Fig. 8. Classification results (OA in percentage) of Salinas Valley obtained on (a) original data, (b) SSA processed data, (c) CT-MLR processed data, (d) WT-MLR
processed data, (e) WT-SSA processed data, and (f) CT-SSA processed data.
A. Comparison With Inspirational Approaches that the default setting of decomposition levels should always
The SSA-based feature extraction approach mentioned in [9] be used in the curvelet transform.
has two parameters: window size L and grouping parameter For the Indian Pines data set, in each class out of the nine
EVG. It is noticed in [9] that good results can be achieved with classes, 10% of pixels are randomly extracted into the training
only the first component used for EVG, as long as a proper set with the rest allocated in the testing set. As Salinas Valley
window size L is chosen. Another parameter in the proposed is a high-resolution HSI data set, spectra with this scene are
approach is the decomposition level in the curvelet transform. more separable than those with Indian Pines. Therefore, a lower
As previously mentioned, the default number of decomposition percentage of samples in each class (5%) is used for training
levels in the CurveLab toolbox is J = log2 (N ) − 3, and it is the classification model of SVM to give a more interpretable
advised that the number of decomposition levels should not result. A recent publication proposed to use a fixed number
be higher than the default setting. Therefore, different values (200) from each class in the Pavia University data set as training
have been tested to look for optimal parameters. First, the samples [56], which account for 4% of the whole labeled
default number of decomposition levels is used in the curvelet pixels. Therefore, 4% of pixels are also randomly chosen from
transform, and the window size L is tested from 2 to 5 as Pavia University for training SVM in the following experiment.
plotted in Fig. 6. Similarly, experiments have been carried out Figs. 7–9 have shown the classification maps as well as OA
ten times, and the average OA as well as the standard deviation levels for three data sets. Compared with those inspirational
are shown in the plots. By trial and error, the optimal value for L methods, the CT-SSA approach always presents better denois-
is set as 5 for both Indian Pines and Salinas Valley. For Pavia ing and feature extraction performance in terms of the highest
University, L is set as 4. The same settings of SSA are used for classification accuracy.
CT-SSA and WT-SSA as well. With the chosen optimal values Additionally, the capability of the proposed CT-SSA method
for window size L, the number of decomposition levels is tested in removing noise is compared. One noisy band is taken from
from 2 to the default setting. Results plotted in Fig. 6 indicate each data set, and the same band after being processed by
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
QIAO et al.: DENOISING AND CLASSIFICATION OF HYPERSPECTRAL IMAGES USING CT AND SSA 9
Fig. 9. Classification results (OA in percentage) of Pavia University obtained on (a) original data, (b) SSA processed data, (c) CT-MLR processed data,
(d) WT-MLR processed data, (e) WT-SSA processed data, and (f) CT-SSA processed data.
TABLE I
M EAN C LASS - BY-C LASS AND AVERAGE AND OVERALL A CCURACY VALUES (%) OF T EN R EPEATED E XPERIMENTS ON T ESTING S AMPLES
OF THE O RIGINAL I NDIAN P INES D ATA S ET AND SSA, CT-MLR, WT-MLR, WT-SSA, AND CT-SSA P ROCESSED
D ATA S ETS W ITH 10% OF D ATA U SED FOR T RAINING , F OLLOWED BY THE S TANDARD D EVIATION
TABLE II
M EAN C LASS - BY-C LASS AND AVERAGE AND OVERALL A CCURACY VALUES (%) OF T EN R EPEATED E XPERIMENTS ON T ESTING S AMPLES
OF THE O RIGINAL S ALINAS VALLEY D ATA S ET AND SSA, CT-MLR, WT-MLR, WT-SSA, AND CT-SSA P ROCESSED
D ATA S ETS W ITH 5% OF D ATA U SED FOR T RAINING , F OLLOWED BY THE S TANDARD D EVIATION
when J is higher than 2, the computational complexity for the SSA [10] has been proposed without degrading the feature
wavelet transform can be approximated as O(27N 2 ), which extraction performance. For the Indian Pines data set, the
is much higher than that of the curvelet transform. Moreover, computing time of the proposed CT-SSA method (using the
it should be noted that the processing time for MLR and original SSA implementation) is about 5 min on a personal
SSA is largely dependent on the number of detail coefficient computer with an Intel Core i5-2400 CPU at 3.10 GHz using
stacks and the number of total pixels in those stacks. How- MATLAB 2014a (Mathworks). Hence, it is reasonable to be-
ever, it is not a problem for SSA, as fast implementation of lieve that with the fully optimized CurveLab toolbox and the
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
QIAO et al.: DENOISING AND CLASSIFICATION OF HYPERSPECTRAL IMAGES USING CT AND SSA 11
TABLE III
M EAN C LASS - BY-C LASS AND AVERAGE AND OVERALL A CCURACY VALUES (%) OF T EN R EPEATED E XPERIMENTS ON T ESTING S AMPLES
OF THE O RIGINAL PAVIA U NIVERSITY D ATA S ET AND SSA, CT-MLR, WT-MLR, WT-SSA, AND CT-SSA P ROCESSED
D ATA S ETS W ITH 4% OF D ATA U SED FOR T RAINING , F OLLOWED BY THE S TANDARD D EVIATION
fast implementation of SSA, the time cost of the proposed not biased with any predetermined basis. Although EMD has
method should be much lower. shown its efficacy in time-series decomposition (e.g., speech
recognition), results reported in [60] indicated that spectral
feature extraction using EMD could potentially reduce the fol-
B. Comparison With Other State-of-the-Art Spectral
lowing classification accuracy. Consequently, a more robust and
Processing Techniques
data-driven technique, i.e., EEMD, was proposed to alleviate
Although the curvelet transform is applied in the spatial the addressed problem. It is achieved by sifting an ensemble of
domain of the hyperspectral images, the dominating part of white noise-added signal, and similarly, it generates a series of
the proposed denoising and feature extraction method, i.e., IMFs, where the sum of them represents the processed signal.
SSA, is applied in the spectral domain. As a matter of fact, In this paper, the fast EEMD MATLAB toolbox is adopted [59].
the proposed CT-SSA approach actually belongs to the 1-D The performance of PCA on the proposed approach for further
spectral processing technique for hyperspectral images. There- feature extraction is also included for comparison.
fore, a few state-of-the-art 1-D spectral feature extraction and The number of resulting features using PCA on both original
dimensionality reduction techniques, including PCA, LDA, and data and CT-SSA processed data is tested within 5–50 in a
NMF, are compared with the proposed approach for further step of 5 [11], whereas the dimensions of extracted features are
performance assessments. LDA is a standard supervised dimen- tested from 2 to 10 for both LDA and NMF. Best results with the
sionality reduction technique for pattern recognition, where the highest classification accuracy are chosen for comparison. With
highest degree of class separability is obtained by maximiz- regard to EEMD, it is suggested that the input white noise level
ing the Raleigh quotient, i.e., the ratio of the between-class ε0 should be in the range of 0.1–0.4 and the number of ensemble
scatter matrix to the within-class scatter matrix [14]. Different NE should be the order of 100 [61]. Therefore, we keep these
from PCA, which could only extract holistic features from the parameters the same as in [55], where ε0 = 0.2, NE = 200,
original HSI data, NMF is a parts-based learning algorithm, and the number of IMFs is 7. Experiments are also repeated
decomposing a nonnegative matrix into two nonnegative ma- ten times, and comparison results are given in Table IV.
trices that are more intuitive and interpretable. One of the It can be seen that those dimensionality reduction techniques
decomposed matrices is formed with a set of basis vectors, only give comparable results compared with the original data
which can be utilized to project the original data onto the lower set, although they could speed up the classification process.
dimensional subspace [16]. The NMF algorithm is realized by As given in [9], with a 10% training rate of Indian Pines,
the NMF MATLAB toolbox [58]. Additionally, as an upgraded the highest classification OA achieved by EMD is 75.49%.
version of EMD, EEMD is also included for comparison as it Although EEMD has made an improvement of over 8% on this
overcomes the drawback of EMD, and therefore, it outperforms data set, it is not as good as the proposed CT-SSA approach.
EMD [55]. Being the fundamental part of the Hilbert–Huang For Indian Pines and Salinas Valley, adding an extra PCA step
transform, EMD is a nonlinear and nonstationary signal decom- leads to further improvement in classification accuracy while
position method that decomposes the signal into a finite number reducing the classification time with less number of features.
of intrinsic mode functions (IMFs) [59]. Each IMF represents Although the extra step of PCA has limited performance on the
a zero-mean frequency-amplitude modulation component that Pavia University data set, the proposed CT-SSA method still
is often related to a specific physical process so that EMD is outperforms other spectral feature extraction techniques.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
TABLE IV TABLE V
M EAN OVERALL A CCURACY VALUES (%) OF T EN R EPEATED C OMPARISON OF C LASS - BY-C LASS AND AVERAGE AND OVERALL
E XPERIMENTS ON THE O RIGINAL D ATA S ETS , SSA, CT-MLR, WT-MLR, C LASSIFICATION A CCURACY VALUES (%) OF THE CT-SSA-SPP
WT-SSA, CT-SSA, CT-SSA-PCA, AND S OME S TATE - OF - THE -A RT A PPROACH AND R ECENT S PECTRAL –S PATIAL M ETHODS FOR
S PECTRAL F EATURE E XTRACTION T ECHNIQUES P ROCESSED D ATA S ETS , THE I NDIAN P INES D ATA S ET (10% T RAINING R ATE )
W ITH D IMENSIONALITY OF F EATURES S HOWN IN PARENTHESES
TABLE VI
C OMPARISON OF C LASS - BY-C LASS AND AVERAGE AND OVERALL
C. Comparison With Recent Spectral–Spatial C LASSIFICATION A CCURACY VALUES (%) OF THE CT-SSA-SPP
Classification Methods A PPROACH AND R ECENT S PECTRAL –S PATIAL M ETHODS FOR
THE S ALINAS VALLEY D ATA S ET (2% T RAINING R ATE )
For the majority of supervised classifiers, such as neural
networks, decision trees, and SVM, the spectral-feature-based
classification chain is adopted. However, they are not able
to incorporate spatial dependence relations presented in the
original scene into the classification process [62]. Several recent
publications have demonstrated that the integration of spectral
and spatial information could be beneficial to hyperspectral im-
age classification problems, where some spectral–spatial classi-
fication methods, including the extended morphological profile
(EMP) [47], the logistic regression and multilevel logistic
(LMLL) [63], the edge-preserving filter (EPF) [64], image fusion
and recursive filter (IFRF) [65], and intrinsic image decomposi-
tion (IID) [56], have been proved to be superior to the spectral-
feature-based classification method. Therefore, a postprocessing
step is added to the proposed CT-SSA approach to increase
the spatial consistency in the classification result, where a
T × T spatial window is applied around each central pixel in
the classification map, and the final classified label is decided
in favor of the class that appears most in the window. The
window shape could be designed to conform to different scenes;
however, to make it simple, here, the square window is used.
As can be noticed in ground truth maps in Figs. 3–5, Indian
spectral–spatial classification methods. For the proposed
Pines has a smaller spatial area for some classes compared
CT-SSA approach, it is found that the best OA values achieved
with Salinas Valley and Pavia University. Therefore, the spatial
are 94.13, 97.35, and 93.85 for the three data sets Indiana Pines,
postprocessing (SPP) window T is chosen as 5 for Indian Pines,
Salinas Valley, and Pavia University, where the corresponding
9 for Salinas Valley, and 9 for Pavia University, respectively.
training ratios are 10%, 5%, and 4%, respectively. With the
Results using EMP, LMLL, EPF, IFRF, and IID are cited
introduced spatial postprocessing and under the same or even
directly from [56]. Since only nine classes in Indian Pines are
less training ratios, the OA values for the three data sets have
involved in our experiments, we have eliminated the other seven
been dramatically improved to 98.40, 99.39, and 99.19 from the
classes from their results and recalculated AA and OA accord-
new CCT-SSA-SPP approach. This has clearly demonstrated
ingly. For Salinas Valley in [56], 2% of pixels in each class
the efficacy of the SPP procedure in the proposed approach.
have been used for training. Therefore, we have reperformed
the experiment under the same training rate with other parame-
ters unchanged. Numerical results are shown in Tables V–VII, VI. C ONCLUSION
where the proposed approach with spatial postprocessing In this paper, two powerful tools, i.e., SSA and the
(CT-SSA-SPP) gives comparable accuracy values with other curvelet transform, are combined for HSI feature extraction. By
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
QIAO et al.: DENOISING AND CLASSIFICATION OF HYPERSPECTRAL IMAGES USING CT AND SSA 13
[24] L. Gao, Q. Du, B. Zhang, W. Yang, and Y. Wu, “A comparative study [51] P. Zhong and R. Wang, “Modeling and classifying hyperspectral im-
on linear regression-based noise estimation for hyperspectral imagery,” agery by CRFS with sparse higher order potentials,” IEEE Trans. Geosci.
IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 6, no. 2, Remote Sens., vol. 49, no. 2, pp. 688–705, Feb. 2011.
pp. 488–498, Apr. 2013. [52] G. Camps-Valls and L. Bruzzone, “Kernel-based methods for hyperspec-
[25] I. Selenick, R. Baraniuk, and N. Kingsbury, “The dual-tree complex wave- tral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 43,
et transform,” IEEE Signal Process. Mag., vol. 22, no. 6, pp. 123–151, no. 6, pp. 1351–1362, Jun. 2005.
Nov. 2005. [53] G. P. Hughes, “On the mean accuracy of statistical pattern recognizers,”
[26] J.-L. Starck, E. Candes, and D. Donoho, “The curvelet transform for im- IEEE Trans. Inf. Theory, vol. 14, no. 1, pp. 55–63, Jan. 1968.
age denoising,” IEEE Trans. Image Process., vol. 11, no. 6, pp. 670–684, [54] C.-C. Chang and C.-J. Lin, LIBSVM: A Library for Support Vector
Jun. 2002. Machines (Version 3.20). [Online]. Available: http://www.csie.ntu.edu.tw/
[27] M. Do and M. Vetterli, “The contourlet transform: An efficient direc- ~cjlin/libsvm/
tional multiresolution image representation,” IEEE Trans. Image Process., [55] Z. He, Y. Shen, Q. Wang, and Y. Wang, “Optimized ensemble EMD-based
vol. 14, no. 12, pp. 2091–2106, Dec. 2005. spectral features for hyperspectral image classification,” IEEE Trans.
[28] F. Nencini, A. Garzelli, S. Baronti, and L. Alparone, “Remote sensing Instrum. Meas., vol. 63, no. 5, pp. 1041–1056, May 2014.
image fusion using the curvelet transform,” Inf. Fus., vol. 8, no. 2, [56] X. Kang, S. Li, L. Fang, and J. A. Benediktsson, “Intrinsic image de-
pp. 143–156, 2007. composition for feature extraction of hyperspectral images,” IEEE Trans.
[29] L. Sun and J. Luo, “Junk band recovery for hyperspectral image based on Geosci. Remote Sens., vol. 53, no. 4, pp. 2241–2253, Apr. 2015.
curvelet transform,” J. Central South Univ. Technol., vol. 18, pp. 816–822, [57] A. Ignat, “Combining features for texture analysis,” in Computer Analy-
2011. sis of Images and Patterns. Berlin, Germany: Springer-Verlag, 2015,
[30] J.-L. Starck, F. Murtagh, E. Candes, and D. Donoho, “Gray and color pp. 220–229.
image contrast enhancement by the curvelet transform,” IEEE Trans. [58] Y. Li and A. Ngom, “The non-negative matrix factorization toolbox for
Image Process., vol. 12, no. 6, pp. 706–717, Jun. 2003. biological data mining,” Source Code Biol. Med., vol. 8, no. 1, pp. 1–15,
[31] E. Candes, L. Demanet, D. Donoho, and L. Ying, “Fast discrete curvelet Apr. 2013.
transforms,” Multiscale Model. Simul., vol. 5, no. 3, pp. 861–899, 2006. [59] Y.-H. Wang, C.-H. Yeh, H.-W. V. Young, K. Hu, and M.-T. Lo,
[32] N. Golyandina and A. Zhigljavsky, Singular Spectrum Analysis for Time “On the computational complexity of the empirical mode decomposition
Series. Berlin, Germany: Springer-Verlag, 2013. algorithm,” Phys. A, vol. 400, pp. 159–167, Apr. 2014.
[33] N. Golyandina, V. Nekrutkin, and A. A. Zhigljavsky, Analysis of Time [60] B. Demir and S. Ertürk, “Empirical mode decomposition of hyperspectral
Series Structure: SSA and Related Techniques. London, U.K.: images for support vector machine classification,” IEEE Trans. Geosci.
CRC Press, 2001. Remote Sens., vol. 48, no. 11, pp. 4071–4084, Nov. 2010.
[34] E. Candes and D. Donoho, “Curvelets—A surprisingly effective nonadap- [61] Z. Wu and N. E. Huang, “Ensemble empirical mode decomposition:
tive representation for objects with edges,” in Curve and Surface Fitting. A noise-assisted data analysis method,” Adv. Adapt. Data Anal., vol. 1,
Nashville, TN, USA: Vanderbilt Univ. Press, 1999, pp. 105–120. no. 1, pp. 1–41, Jan. 2009.
[35] S. Kouchaki, S. Sanei, E. Arbon, and D.-J. Dijk, “Tensor based singular [62] M. Rojas, I. Dópido, A. Plaza, and P. Gamba, “Comparison of support
spectrum analysis for automatic scoring of sleep EEG,” IEEE Trans. vector machine-based processing chains for hyperspectral image classifi-
Neural Syst. Rehabil. Eng., vol. 23, no. 1, pp. 1–9, Jan. 2015. cation,” in Proc. Satellite Data Compress., Commun., Process. VI, 2010,
[36] W. Pereira, S. Bridal, A. Coron, and P. Laugier, “Singular spectrum pp. 1–12.
analysis applied to backscattered ultrasound signals from in vitro human [63] J. Li, J. M. Bioucas-Dias, and A. Plaza, “Hyperspectral image segmenta-
cancellous bone specimens,” IEEE Trans. Ultrason., Ferroelect., Freq. tion using a new Bayesian approach with active learning,” IEEE Trans.
Control, vol. 51, no. 3, pp. 302–312, Mar. 2004. Geosci. Remote Sens., vol. 49, no. 10, pp. 3947–3960, Oct. 2011.
[37] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sens- [64] X. Kang, S. Li, and J. A. Benediktsson, “Spectral–spatial hyperspectral
ing images with support vector machines,” IEEE Trans. Geosci. Remote image classification with edge-preserving filtering,” IEEE Trans. Geosci.
Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004. Remote Sens., vol. 52, no. 5, pp. 2666–2677, May 2014.
[38] E. Candes, L. Demanet, D. Donoho, and L. Ying, CurveLab Toolbox [65] X. Kang, S. Li, and J. A. Benediktsson, “Feature extraction of hyper-
(V 2.1.2). [Online]. Available: http://www.curvelet.org/software.html spectral images with image fusion and recursive filtering,” IEEE Trans.
[39] S. Chang, B. Yu, and M. Vetterli, “Adaptive wavelet thresholding for Geosci. Remote Sens., vol. 52, no. 6, pp. 3742–3752, Jun. 2014.
image denoising and compression,” IEEE Trans. Image Process., vol. 9,
no. 9, pp. 1532–1546, Sep. 2000.
Tong Qiao received the B.Eng. degree (first class)
[40] D. Landgrebe, AVIRIS Image Indian Pines. [Online]. Available: https://
in electrical and electronic engineering from the
engineering.purdue.edu/~biehl/MultiSpec
University of Manchester, Manchester, U.K., in
[41] L. Johnson, AVIRIS Image Salinas Valley. [Online]. Available: http://
2011; the M.Sc. degree (distinction) in electrical
www.ehu.eus/ccwintco/index.php
and electronic engineering from the University of
[42] P. Gamba, ROSIS Image Pavia University. [Online]. Available: http://
Strathclyde, Glasgow, U.K., in 2012; and the Ph.D.
www.ehu.eus/ccwintco/index.php
degree in hyperspectral imaging from the Centre for
[43] G. Camps-Valls and L. Bruzzone, Eds., Kernel Methods for Remote
Excellence in Signal and Image Processing, Univer-
Sensing Data Analysis. Chichester, U.K.: Wiley, 2009.
sity of Strathclyde, in 2016.
[44] A. Chakrabarty, O. Choudhury, P. Sarkar, A. Paul, and D. Sarkar, “Hyper-
Her research interests include hyperspectral-
spectral image classification incorporating bacterial foraging-optimized
imaging-based applications and feature extraction
spectral weighting,” Artif. Intell. Res., vol. 1, no. 1, pp. 63–83, Sep. 2012.
for remote sensing image classification.
[45] R. Archibald and G. Fann, “Feature selection and classification of hy-
perspectral images with support vector machines,” IEEE Geosci. Remote
Sens. Lett., vol. 4, no. 4, pp. 674–677, Oct. 2007. Jinchang Ren (M’07) received the B.E. degree in
[46] A. Plaza, P. Martinez, J. Plaza, and R. Perez, “Dimensionality reduc- computer software, the M.Eng. degree in image
tion and classification of hyperspectral image data using sequences of processing, and the D.Eng. degree in computer vi-
extended morphological transformations,” IEEE Trans. Geosci. Remote sion from Northwestern Polytechnical University,
Sens., vol. 43, no. 3, pp. 466–479, Mar. 2005. Xi’an, China, and the Ph.D. degree in electronic
[47] J. A. Benediktsson, M. Pesaresi, and K. Amason, “Classification and fea- imaging and media communication from Bradford
ture extraction for remote sensing images from urban areas based on mor- University, Bradford, U.K.
phological transformations,” IEEE Trans. Geosci. Remote Sens., vol. 41, He is currently a Senior Lecturer with the Centre
no. 9, pp. 1940–1949, Sep. 2003. for Excellence in Signal and Image Processing and
[48] D. L. Civco, “Artificial neural networks for land-cover classification and the Deputy Director of the Strathclyde Hyperspectral
mapping,” Int. J. Geophys. Inf. Syst., vol. 7, no. 2, pp. 173–186, 1993. Imaging Centre, University of Strathclyde, Glasgow,
[49] H. Bischof and A. Leonardis, “Finding optimal neural networks for land U.K. He has published over 150 peer-reviewed journal and conference papers.
use classification,” IEEE Trans. Geosci. Remote Sens., vol. 36, no. 1, His research interests are focused mainly on visual computing and multimedia
pp. 337–341, Jan. 1998. signal processing and particularly on semantic content extraction for video
[50] L. Bruzzone and D. F. Prieto, “A technique for the selection of kernel- analysis and understanding and, more recently, hyperspectral imaging.
function parameters in RBF neural networks for classification of remote- Dr. Ren is an Associate Editor of two international journals including
sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 37, no. 2, Multidimensional Systems and Signal Processing and the International Journal
pp. 1179–1184, Mar. 1999. of Pattern Recognition and Artificial Intelligence.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
QIAO et al.: DENOISING AND CLASSIFICATION OF HYPERSPECTRAL IMAGES USING CT AND SSA 15
Zheng Wang received the Ph.D. degree in com- Jón Atli Benediktsson (S’84–M’90–SM’99–F’04)
puter science from Tianjin University (TJU), Tianjin, received the Cand.Sci. degree in electrical engi-
China, in 2009. neering from the University of Iceland, Reykjavik,
From 2007 to 2008, he was a Visiting Scholar Iceland, in 1984 and the M.S.E.E. and Ph.D. degrees
with the National Institute for Research in Com- in electrical and computer engineering from Purdue
puter Sciences and Control (INRIA), France. He is University, West Lafayette, IN, USA, in 1987 and
currently an Associate Professor with the School 1990, respectively.
of Computer Software, TJU. His research interests He is currently the Rector and a Professor of elec-
include video analysis, hyperspectral imaging, and trical and computer engineering with the University
computer graphics. of Iceland. He is a Cofounder of the biomedical
startup company Oxymap. He has published exten-
sively in his areas of interest. His research interests include remote sensing,
biomedical analysis of signals, pattern recognition, image processing, and
Jaime Zabalza received the M.Eng. degree in in- signal processing.
dustrial engineering from Universitat Jaume I (UJI), Dr. Benediktsson was the 2011–2012 President of the IEEE Geoscience and
Castellón de la Plana, Spain, in 2006; the MAS Remote Sensing Society (GRSS) and has been on the GRSS Administrative
degree in electrical technology from the Universitat Committee since 2000. He is a Fellow of the International Society for Optics
Politècnica de València, València, Spain, in 2010; and Photonics (SPIE). He is a member of the Association of Chartered Engi-
and the M.Sc. degree (with distinction) in electronic neers in Iceland (VFI), Societas Scinetiarum Islandica, and Tau Beta Pi. From
and electrical engineering and the Ph.D. degree 2003 to 2008, he was an Editor of the IEEE T RANSACTIONS ON G EOSCIENCE
in hyperspectral imaging from the University of AND R EMOTE S ENSING (TGRS). He has served as an Associate Editor for
Strathclyde, Glasgow, U.K., in 2012 and 2015, TGRS since 1999, the IEEE G EOSCIENCE AND R EMOTE S ENSING L ETTERS
respectively. since 2003, and IEEE Access since 2013. He is on the International Editorial
From 2006 to 2011, he joined UJI as an R&D Board of the International Journal of Image and Data Fusion and was the
Engineer and worked in the Energy Technological Institute, Spain, in mul- Chairman of the Steering Committee of the IEEE J OURNAL OF S ELECTED
tidisciplinary fields comprising power electronics, automation, and computer T OPICS IN A PPLIED E ARTH O BSERVATIONS AND R EMOTE S ENSING
science. He is currently with the Department of Electronic and Electrical (J-STARS) during 2007–2010. He received the Stevan J. Kristof Award from
Engineering, University of Strathclyde. His research interests are related to Purdue University in 1991 as an outstanding graduate student in remote sensing.
synthetic aperture radar and remote sensing, hyperspectral imaging, and digital He was the recipient of the Icelandic Research Council’s Outstanding Young
signal processing, including signal processing in a wide range of applications. Researcher Award in 1997, was granted the IEEE Third Millennium Medal in
2000, was a corecipient of the University of Iceland’s Technology Innovation
Award in 2004, and received the yearly research award from the Engineering
Meijun Sun received the Ph.D. degree in com- Research Institute of the University of Iceland in 2006 and the Outstanding
puter science from Tianjin University (TJU), Tianjin, Service Award from the IEEE GRSS in 2007. He was also a corecipient of
China, in 2009. the 2012 IEEE TGRS Paper Award. He received the 2013 IEEE/VFI Electrical
From 2007 to 2008, she was a Visiting Scholar Engineer of the Year Award and was a corecipient of the IEEE GRSS Highest
with the National Institute for Research in Com- Impact Paper Award.
puter Sciences and Control (INRIA), France. She
is currently an Associate Professor with the School
of Computer Science and Technology, TJU. Her
research interests include computer graphics, hyper-
spectral imaging, and image processing.
Huimin Zhao was born in Shaanxi, China, in 1966. Qingyun Dai received the Ph.D. degree from the
He received the B.Sc. and M.Sc. degrees in signal South China University of Technology, Guangzhou,
processing from Northwestern Polytechnical Univer- China.
sity, Xi’an, China, in 1992 and 1997, respectively, She is a Full Professor with the School of Infor-
and the Ph.D. degree in electrical engineering from mation Engineering, the Director of the Intelligent
Sun Yat-sen University, Guangzhou, China, in 2001. Signal Processing and Big Data Research Center,
He is currently a Professor with the Guangdong and the Director of the Science and Technology
Polytechnic Normal University, Guangzhou. His re- Research Office of Guangdong University of Tech-
search interests include image, video, and informa- nology, Guangzhou. Her research interests mainly
tion security technology. include wavelets, image processing, image retrieval,
pattern recognition, manufacturing engineering sys-
tems, radio frequency identification, cloud computing, and big data.