Sparse-Adaptive Hypergraph Discriminant Analysis for Hyperspectral Image Classification

Fulin Luo, Liangpei Zhang, Fellow, IEEE, Xiaocheng Zhou, Tan Guo, Yanxiang Cheng, and Tailang Yin

Abstract: A hyperspectral image (HSI) contains complex multiple structures. Therefore, the key problem in analyzing the intrinsic properties of an HSI is how to represent its structural relationships effectively. A hypergraph is very effective for describing the intrinsic relationships of the HSI. In general, the Euclidean distance is adopted to construct the hypergraph. However, this approach cannot effectively represent the structural properties of high-dimensional data. To address this problem, we propose a sparse-adaptive hypergraph discriminant analysis (SAHDA) method to obtain the embedding features of the HSI in this letter. SAHDA uses sparse representation to reveal the structural relationships of the HSI adaptively. Then, an adaptive hypergraph is constructed by using the intraclass sparse coefficients. Finally, we develop an adaptive dimensionality-reduction model to calculate the weights of the hyperedges and the projection matrix. SAHDA can adaptively reveal the intrinsic properties of the HSI and enhance the performance of the embedding features. Experiments on the Washington DC Mall hyperspectral data set demonstrate the effectiveness of the proposed SAHDA method, and SAHDA achieves better classification accuracies than the traditional graph learning methods.

Index Terms: Dimensionality reduction, hypergraph learning, hyperspectral image (HSI), sparse representation.

I. INTRODUCTION

Hyperspectral images (HSIs) contain hundreds of 2-D images that are captured under different electromagnetic spectra [1]–[3]. In an HSI, each pixel is a continuous spectral curve that has good discriminant performance for materials. Thus, the HSI has been widely used in the fields of target detection [4], anomaly detection [5], and land-cover classification [6]. However, a large number of bands will result in the Hughes phenomenon [7]. Therefore, a key problem is to reduce the number of bands.

Dimensionality reduction is an effective approach that can transform high-dimensional data into a low-dimensional space while preserving significant information [8]. The extracted low-dimensional features possess better discriminant performance than the original features. Graph learning is an effective method to represent the intrinsic properties of data [9]. Some classic methods have been proposed, such as locally linear embedding (LLE) [10] and Laplacian eigenmaps (LEs) [11]. Motivated by these methods, some novel methods were developed, such as regularized local discriminant embedding (RLDE) [8] and local geometric structure Fisher analysis

Fig. 1. Flowchart of the proposed SAHDA method.

analysis (SAHDA) method to implement the dimensionality reduction of the HSI. SAHDA first uses the sparse representation to reveal the intrinsic structure relationships of the HSI adaptively. Then, we use the intraclass sparse coefficients to construct an adaptive hyperedge. Finally, we construct a dimensionality-reduction model to compute the projection matrix and the hyperedge weights adaptively, which can be solved by the alternating direction method of multipliers (ADMM). SAHDA is more robust to data and can better reveal the intrinsic properties of the HSI. Experiments on a hyperspectral data set achieve better performance than BH and SGDA.

The rest of this letter is organized as follows. Section II details our method. Experimental results are presented in Section III to demonstrate the effectiveness of the proposed method. Finally, Section IV draws some conclusions.

To represent the intrinsic properties of the HSI adaptively, a novel dimensionality-reduction method, termed SAHDA, is proposed in this letter, as shown in Fig. 1. First, sparse representation is adopted to represent the intrinsic relationships of the HSI adaptively. Then, according to the intraclass sparse coefficients, we construct an adaptive hypergraph model. Finally, an adaptive dimensionality-reduction model is designed to learn the weights of the hyperedges and the projection matrix.

Suppose a hyperspectral data set $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{D \times n}$ contains $n$ pixels with $D$ spectral bands. $\ell(x_i) \in \{1, 2, \ldots, c\}$ denotes the class label of $x_i$, where $c$ is the number of land-cover classes. The low-dimensional embedding of $X$ is denoted as $Y = [y_1, y_2, \ldots, y_n] \in \mathbb{R}^{d \times n}$, where $d$ is the embedding dimension. We can get $Y = V^{T} X$ with a projection matrix $V \in \mathbb{R}^{D \times d}$, where $d \ll D$.

Sparse representation aims to represent a data point with a dictionary and to make the representation coefficients as sparse as possible. Most of the representation coefficients are zero, and only a few elements, whose corresponding data points are strongly relevant, are nonzero. The data points corresponding to these nonzero coefficients can reveal the intrinsic properties of the data. For a data point $x_i$, its sparse coefficients can be obtained by the following problem:

$$
\min \|s_i\|_1 \quad \text{s.t.} \quad \|x_i - X s_i\| < \varepsilon \tag{1}
$$

where $\varepsilon > 0$ is the error tolerance, $\|\cdot\|_1$ is the $l_1$-norm that controls the sparsity of the coefficients, and $s_i = [s_{i,1}, s_{i,2}, \ldots, s_{i,i-1}, 0, s_{i,i+1}, \ldots, s_{i,n}]$ are the sparse coefficients.

With the intraclass sparse coefficients, we construct a hypergraph $G = \{X, E, W\}$, where $X$ is the vertex set, $E$ is the hyperedge set, $W$ is the weight matrix of the hyperedges, and each hyperedge $e_i \in E$ has a weight $w_i$ that can be adaptively calculated in this letter.

To represent the relationship between the vertex and the hyperedge, we construct an incidence matrix $H = [h_{ij}]_{i,j} \in \mathbb{R}^{|V| \times |E|}$ with intraclass sparse coefficients, where $h_{ij}$ denotes the relationship between $x_i$ and $e_j$ and is defined as

$$
h_{ij} =
\begin{cases}
s_{i,j}, & \text{if } \ell(x_i) = \ell(x_j) \text{ and } i \neq j \\
0, & \text{otherwise.}
\end{cases} \tag{2}
$$

According to the incidence matrix, we can obtain the degrees of $x_i$ and $e_j$, which are represented as

$$
d_i^{v} = \sum_{j=1}^{n} w_j h_{ij}, \qquad d_j^{e} = |e_j| = \sum_{i=1}^{n} h_{ij}. \tag{3}
$$

In a low-dimensional space, we preserve the structures of the hypergraph and keep the homogeneous data as compact as possible. That is to say, the vertices on the same hyperedge should be close in the low-dimensional space, while the similarity of each hyperedge can be effectively calculated by an adaptive weight. Thus, the objective function can be denoted as

$$
\begin{aligned}
J(V, W) = \min \; & \frac{1}{2} \sum_{e_i \in E} \sum_{(x_j, x_k) \in e_i} \frac{w_i}{d_i^{e}} \left\| \frac{V^{T} x_j}{\sqrt{d_j^{v}}} - \frac{V^{T} x_k}{\sqrt{d_k^{v}}} \right\|^{2} + \alpha \sum_{i=1}^{n} w_i^{2} \\
\text{s.t.} \; & \operatorname{tr}(V^{T} X X^{T} V) = 1, \quad \sum_{i=1}^{n} w_i = 1
\end{aligned} \tag{4}
$$

where $W = [w_i]_{i=1}^{n}$ and $\alpha > 0$ is a balance parameter.

For the optimization problem of (4), we construct an augmented Lagrangian function with Lagrangian multipliers $\delta$ and $\lambda$ as

$$
\begin{aligned}
L(V, W, \delta, \lambda) = \; & \frac{1}{2} \sum_{e_i \in E} \sum_{(x_j, x_k) \in e_i} \frac{w_i}{d_i^{e}} \left\| \frac{V^{T} x_j}{\sqrt{d_j^{v}}} - \frac{V^{T} x_k}{\sqrt{d_k^{v}}} \right\|^{2} \\
& + \alpha \sum_{i=1}^{n} w_i^{2} + \delta \left( \sum_{i=1}^{n} w_i - 1 \right) + \lambda \left( 1 - \operatorname{tr}(V^{T} X X^{T} V) \right).
\end{aligned} \tag{5}
$$

Then, we use the ADMM to update the solution of (5). We first fix $W$ to update $V$, and the objective function can be represented as

$$
\begin{aligned}
L(V, \lambda) &= \frac{1}{2} \sum_{e_i \in E} \sum_{(x_j, x_k) \in e_i} \frac{w_i}{d_i^{e}} \left\| \frac{V^{T} x_j}{\sqrt{d_j^{v}}} - \frac{V^{T} x_k}{\sqrt{d_k^{v}}} \right\|^{2} + \lambda \left( 1 - \operatorname{tr}(V^{T} X X^{T} V) \right) \\
&= \frac{1}{2} \sum_{e_i \in E} \sum_{(x_j, x_k) \in X} \frac{h_{ij} h_{ik} w_i}{d_i^{e}} \left\| \frac{V^{T} x_j}{\sqrt{d_j^{v}}} - \frac{V^{T} x_k}{\sqrt{d_k^{v}}} \right\|^{2} + \lambda \left( 1 - \operatorname{tr}(V^{T} X X^{T} V) \right) \\
&= \operatorname{tr}(V^{T} X L_H X^{T} V) + \lambda \left( 1 - \operatorname{tr}(V^{T} X X^{T} V) \right)
\end{aligned} \tag{6}
$$

where $L_H = I - D_v^{-1/2} H W D_e^{-1} H^{T} D_v^{-1/2}$ is the hyper-Laplacian matrix and $I$ is an identity matrix. $D_v = \operatorname{diag}([d_1^{v}, d_2^{v}, \ldots, d_n^{v}])$ and $D_e = \operatorname{diag}([d_1^{e}, d_2^{e}, \ldots, d_n^{e}])$ are the vertex and hyperedge degree matrices, respectively.
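
As a rough illustration of how the incidence matrix, the two degree matrices, and the hyper-Laplacian fit together, the following NumPy sketch builds $L_H$ from a coefficient matrix, class labels, and a weight vector. It is not the authors' implementation: the function names `incidence_from_coefficients` and `hyper_laplacian` and the toy inputs are hypothetical, and the sketch assumes that absolute values of the coefficients are used so that every degree is positive and the square roots are defined.

```python
import numpy as np

def incidence_from_coefficients(S, labels):
    """Incidence matrix in the spirit of (2): keep s_ij only for
    same-class pairs with i != j, and set everything else to zero.

    Taking absolute values is an assumption of this sketch; it keeps
    all entries nonnegative so the degrees below are nonnegative too.
    """
    S = np.asarray(S, dtype=float)
    labels = np.asarray(labels)
    same_class = labels[:, None] == labels[None, :]
    off_diagonal = ~np.eye(len(labels), dtype=bool)
    return np.where(same_class & off_diagonal, np.abs(S), 0.0)

def hyper_laplacian(H, w):
    """Return L_H = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}.

    H : (n_vertices, n_edges) nonnegative incidence matrix.
    w : (n_edges,) positive hyperedge weights.
    """
    H = np.asarray(H, dtype=float)
    w = np.asarray(w, dtype=float)
    d_v = H @ w              # vertex degrees: d_i^v = sum_j w_j h_ij
    d_e = H.sum(axis=0)      # hyperedge degrees: d_j^e = sum_i h_ij
    if np.any(d_v <= 0) or np.any(d_e <= 0):
        raise ValueError("every vertex and hyperedge needs a positive degree")
    Hn = H / np.sqrt(d_v)[:, None]        # Dv^{-1/2} H
    theta = (Hn * (w / d_e)) @ Hn.T       # Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
    return np.eye(H.shape[0]) - theta

# Toy example: 5 pixels in 2 classes, random stand-ins for sparse coefficients.
rng = np.random.default_rng(0)
labels_toy = np.array([0, 0, 0, 1, 1])
S_toy = rng.uniform(0.1, 1.0, size=(5, 5))
H_toy = incidence_from_coefficients(S_toy, labels_toy)
w_toy = np.full(5, 0.2)                   # uniform weights that sum to 1
L_toy = hyper_laplacian(H_toy, w_toy)
```

With a nonnegative incidence matrix, the resulting matrix is symmetric and positive semidefinite, which is what lets the trace term in (6) act as a smoothness penalty on the embedding.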
Fig. 2. Washington DC Mall HSI including (Left) false color and (Right) ground truth. (Note that the number of samples for each class is shown in parentheses.)

We can obtain the partial derivative of (6) with respect to $V$, that is,

$$
\frac{\partial L(V, \lambda)}{\partial V} = X L_H X^{T} V - \lambda X X^{T} V. \tag{7}
$$

We set (7) to zero, and the solution of $V$ can be obtained, which is denoted as a generalized eigenvalue problem

$$
X L_H X^{T} V = \lambda X X^{T} V. \tag{8}
$$
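
Setting (7) to zero gives a symmetric generalized eigenvalue problem of the form $X L_H X^{T} v = \lambda X X^{T} v$. The sketch below shows one way such a problem might be solved with NumPy alone, by whitening with a Cholesky factor and keeping the eigenvectors of the $d$ smallest eigenvalues. The helper name `solve_projection`, the small ridge `reg` added to $X X^{T}$ to keep it positive definite, and the random test data are assumptions made for illustration, not details taken from the letter.

```python
import numpy as np

def solve_projection(X, L_H, d, reg=1e-6):
    """Eigenpairs of the d smallest eigenvalues of X L_H X^T v = lam X X^T v.

    X   : (D, n) data matrix, one pixel per column.
    L_H : (n, n) symmetric Laplacian-type matrix.
    reg : small ridge on X X^T (an assumption, for numerical stability).
    """
    A = X @ L_H @ X.T
    A = 0.5 * (A + A.T)                       # remove round-off asymmetry
    B = X @ X.T + reg * np.eye(X.shape[0])    # positive definite right-hand matrix
    C = np.linalg.cholesky(B)                 # B = C C^T, C lower triangular
    T = np.linalg.solve(C, A)                 # C^{-1} A
    M = np.linalg.solve(C, T.T).T             # C^{-1} A C^{-T}
    M = 0.5 * (M + M.T)
    eigvals, U = np.linalg.eigh(M)            # eigenvalues in ascending order
    V = np.linalg.solve(C.T, U[:, :d])        # back-transform: V = C^{-T} U
    return eigvals[:d], V

# Hypothetical data: D = 6 bands, n = 20 pixels, and an ordinary graph
# Laplacian standing in for the hyper-Laplacian.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((6, 20))
S_demo = np.abs(rng.standard_normal((20, 20)))
S_demo = 0.5 * (S_demo + S_demo.T)
np.fill_diagonal(S_demo, 0.0)
L_demo = np.diag(S_demo.sum(axis=1)) - S_demo
lam_demo, V_demo = solve_projection(X_demo, L_demo, d=3)
Y_demo = V_demo.T @ X_demo                    # low-dimensional embedding, shape (3, 20)
```

The returned columns satisfy $V^{T} B V = I$ for the regularized matrix $B$, which is a rescaled version of the trace constraint in (4) and spans the same subspace. Where SciPy is available, `scipy.linalg.eigh(A, B)` solves the same symmetric generalized problem directly.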
Then, the optimal projection matrix $V = [V_1, V_2, \ldots, V_d]$ can be obtained from the eigenvectors corresponding to the $d$ smallest eigenvalues.

To update $W$, we fix $V$ and obtain the following objective function with respect to $W$:

$$
\begin{aligned}
L(W, \delta) &= \frac{1}{2} \sum_{e_i \in E} \sum_{(x_j, x_k) \in e_i} \frac{w_i}{d_i^{e}} \left\| \frac{V^{T} x_j}{\sqrt{d_j^{v}}} - \frac{V^{T} x_k}{\sqrt{d_k^{v}}} \right\|^{2} + \alpha \sum_{i=1}^{n} w_i^{2} + \delta \left( \sum_{i=1}^{n} w_i - 1 \right) \\
&= - \sum_{i=1}^{n} w_i q_i \|r_i\|^{2} + \alpha \sum_{i=1}^{n} w_i^{2} + \delta \left( \sum_{i=1}^{n} w_i - 1 \right)
\end{aligned} \tag{9}
$$

where $q_i$ is the $i$th element of the diagonal of $D_e^{-1}$ and $r_i$ is the $i$th column of $V^{T} X D_v^{-1/2} H$.

For (9), we use the coordinate descent algorithm to solve this problem. In each iteration, we choose two elements for updating, while the other elements are fixed. For example, in an iteration, we update two elements $w_a$ and $w_b$. Because $\sum_{i=1}^{n} w_i = 1$, the sum of $w_a$ and $w_b$ is a fixed value. Thus, the updating of $w_a$ and $w_b$ can be denoted as

$$
\begin{cases}
w_a^{k+1} = 0, \quad w_b^{k+1} = w_a^{k} + w_b^{k}, & \text{if } 2\alpha (w_a^{k} + w_b^{k}) + (t_b - t_a) \le 0 \\
w_a^{k+1} = w_a^{k} + w_b^{k}, \quad w_b^{k+1} = 0, & \text{if } 2\alpha (w_a^{k} + w_b^{k}) + (t_a - t_b) \le 0 \\
w_a^{k+1} = \dfrac{2\alpha (w_a^{k} + w_b^{k}) + (t_b - t_a)}{4\alpha}, \quad w_b^{k+1} = w_a^{k} + w_b^{k} - w_a^{k+1}, & \text{otherwise}
\end{cases} \tag{10}
$$

where $t_a = -q_a \|r_a\|^{2}$. By updating all the $w_i$, we can adaptively obtain the optimum weight matrix of the hypergraph.

By alternately updating $W$ and $V$ until convergence, we can achieve the optimal projection matrix and hyperedge weights. Since the hypergraph is constructed adaptively, SAHDA is more robust to data and can better represent the intrinsic properties of the HSI. According to the projection matrix, the low-dimensional embedding of $x_i$ is

$$
y_i = V^{T} x_i. \tag{11}
$$

III. EXPERIMENTAL RESULTS

A. Data Set

To demonstrate the effectiveness of the proposed method, we conduct some experiments on the Washington DC Mall data set, as shown in Fig. 2. This data set was captured by the airborne hyperspectral digital imagery collection experiment (HYDICE) from the mall in Washington DC. The original size is 1208 × 307 with a total of 210 bands in the region of the visible and infrared spectra. In this letter, we use a subset of size 250 × 307 with 191 bands after removing the water absorption bands.

B. Experimental Setup

In the experiments, we select two corresponding methods, i.e., BH and SGDA, to compare with SAHDA. For BH, we set the neighbor size to 5. For SAHDA and SGDA, the error tolerance of sparse representation was set to 0.1. In this letter, we adopted the SPGL1 toolbox [22] to solve the sparse problem of SGDA and SAHDA. The parameter $\alpha$ is set to 5 for SAHDA. After obtaining the low-dimensional features, the NN classifier and the support-vector machine (SVM) were used to discriminate the class types of unknown data, and we also showed the classification results of the “RAW” spectrum. For the SVM, we adopted the LibSVM Toolbox [23] with a radial basis function (RBF) kernel, and a grid search method was used to select the penalty term $C$ and the RBF kernel width $\sigma$ in a given set $\{2^{-10}, \ldots, 2^{10}\}$. The embedding dimension was set to 30 for all the methods. For the experimental results, we use the accuracy of each class, average accuracy (AA), overall accuracy (OA), and Kappa coefficient (KC) to evaluate the effect of each algorithm. To represent the results robustly, we repeated each experiment ten times and showed the AA with standard deviations (STD).

C. Classification Results

To evaluate the classification accuracies of each class, we randomly selected 30 samples from each class as the training set and the remaining samples were used for testing. Table I shows the results of different methods under different classifiers.

According to Table I, under both the NN and SVM classifiers, the proposed method achieves better results than BH and SGDA for most classes. BH uses Euclidean distance to describe the structure of the HSI data, while Euclidean distance is in general inaccurate for representing complex high-dimensional data. SGDA can adaptively represent the intrinsic relationship of data, but it considers only the simple one-to-one relationship, which makes it very difficult to reveal the complex structures of the HSI. SAHDA inherits the merits of BH and SGDA. It not only adaptively represents the intrinsic structures of the HSI but also reveals the multiple properties of the HSI. BH, SGDA, and SAHDA generate better accuracies than RAW because the dimensionality reduction
methods can reduce the redundancy and preserve the valuable information to enhance the discriminant performance of the HSI. Among all the compared methods, SAHDA possesses the best AA, OA, and KC. The visualized results are shown in Fig. 3. The proposed method achieves a smoother region than RAW, BH, and SGDA, especially in the areas of Road and Water, because SAHDA can adaptively represent the intrinsic multiple relationships and the similarity of the homogeneous samples.

Fig. 3. Classification map with different methods, where the first and second rows show the results of SVM and NN, and the first to fourth columns show the results of RAW, BH, SGDA, and SAHDA.

D. Dimensionality Analysis

To analyze the effect of embedding dimension, 30 samples were randomly selected from each class for training and the other samples were used for testing. We repeated the experiment ten times under each case, and Fig. 4 shows the results of each method under different embedding dimensions.

Fig. 4. Classification results with SVM (the first row) and NN (the second row) under different embedding dimensions.

According to Fig. 4, the classification accuracies improve and then reach a stable value as the embedding dimension increases, because a larger dimension can capture more discriminative features. When the discriminant features are sufficient to represent the intrinsic properties of the data, the classification results reach a peak value and remain stable even as the embedding dimension increases further. In addition, SAHDA generates better results than the compared methods; the reason is that the proposed method can adaptively represent the complex multiple properties of the HSI.

E. Results With Different Numbers of Training Samples

In this section, we analyzed the results with different numbers of training samples. We randomly selected 5, 10, 15, 20, 25, and 30 samples from each class for training and repeated the experiment ten times under each condition. Table II shows the average OAs with STD and KCs.

In Table II, the results indicate that the accuracies improve as the number of training samples increases because more information can be used to construct the training model. Furthermore, the proposed method achieves better OAs and KCs than the other methods under each case.

F. Parameter Analysis

SAHDA has a parameter $\alpha$ to adjust the weights. To evaluate the influence of $\alpha$, we randomly selected 30 samples from each class as the training set and the remaining samples were considered as the test set. We set $\alpha$ to 0.01, 0.1, 1, 5, 10, 15, 20, 25, and 30, and Fig. 5 shows the average OAs with a ten-time-repeated experiment under each condition.
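
To make the role of $\alpha$ more concrete, here is a minimal sketch of a pairwise coordinate-descent update for a weight subproblem of the form: minimize $\sum_i t_i w_i + \alpha \sum_i w_i^{2}$ subject to $\sum_i w_i = 1$ and $w_i \ge 0$, with $t_i = -q_i \|r_i\|^{2}$ as in (9). The interior update is derived here by substituting $w_b = s - w_a$, with $s = w_a + w_b$ held fixed, and setting the derivative to zero, which gives $w_a = (2\alpha s + t_b - t_a) / (4\alpha)$, clipped to $[0, s]$. The function name `update_weights`, the sweep over all pairs, the uniform initialization, and the toy values of `t` are assumptions of this sketch, not the authors' code.

```python
import numpy as np

def update_weights(t, alpha, n_sweeps=200):
    """Pairwise coordinate descent for
        min_w  sum(t * w) + alpha * sum(w ** 2)
        s.t.   sum(w) == 1,  w >= 0.
    Sweeping over all pairs (a, b) is an assumption of this sketch."""
    t = np.asarray(t, dtype=float)
    n = t.size
    w = np.full(n, 1.0 / n)                   # feasible starting point
    for _ in range(n_sweeps):
        for a in range(n):
            for b in range(a + 1, n):
                s = w[a] + w[b]               # stays fixed during this update
                wa = (2.0 * alpha * s + (t[b] - t[a])) / (4.0 * alpha)
                w[a] = min(max(wa, 0.0), s)   # clip to the feasible segment [0, s]
                w[b] = s - w[a]
    return w

# Toy costs: the first hyperedge has the most negative t, i.e. the largest
# q_i * ||r_i||^2 term.
t_toy = np.array([-1.0, -0.5, 0.0])
w_small = update_weights(t_toy, alpha=0.1)    # approximately [1, 0, 0]
w_large = update_weights(t_toy, alpha=5.0)    # approximately [0.383, 0.333, 0.283]
```

In this toy case, a small $\alpha$ puts all the weight on a single hyperedge, while a larger $\alpha$ pushes the weights toward a uniform distribution, which is one way to read the sensitivity to $\alpha$ examined in this section.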

