Cking Network For Fast POLSAR Image Classification

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 25, NO.
7, JULY 2016 3273
Wishart Deep Stacking Network for

Fast POLSAR Image Classification
Licheng Jiao, Senior Member, IEEE, and Fang Liu
Abstract— Inspired by the popular deep learning architecture, processing, including natural image classification [2], [9],
deep stacking network (DSN), a specific deep model for polari- object detection [4] and scene labeling [5], [10]. DBN is a
metric synthetic aperture radar (POLSAR) image classification is deep network as a probabilistic generative structure, which
proposed in this paper, which is named Wishart DSN (W-DSN).
First of all, a fast implementation of Wishart distance is achieved performs good in many tasks, including image recognition [3],
by a special linear transformation, which speeds up the classi- speech recognition [6] and natural language processing [8].
fication of POLSAR image and makes it possible to use this With successful application in many tasks (such as speech
polarimetric information in the following neural network (NN). classification [7], [11], information retrieval [12], and image
Then, a single-hidden-layer NN based on the fast Wishart classification [1], [7]), the deep learning architecture DSN
distance is defined for POLSAR image classification, which is
named Wishart network (WN) and improves the classification has received increasing attention as well. Despite many fields
accuracy. Finally, a multi-layer NN is formed by stacking WNs, have been studied, this paper focuses on issues in the field of
which is in fact the proposed deep learning architecture W-DSN remote sensing, where deep learning is not fully developed.
for POLSAR image classification and improves the classification To be exact, an exploration of polarimetric synthetic aperture
accuracy further. In addition, the structure of WN can be radar (POLSAR) image classification by DSN is made in this
expanded in a straightforward way by adding hidden units if nec-
essary, as well as the structure of the W-DSN. As a preliminary paper.
exploration on formulating specific deep learning architecture With the fact that more than one polarization is used,
for POLSAR image classification, the proposed methods may POLSAR acquires a much richer characterization of the
establish a simple but clever connection between POLSAR image observed land and plays an important role in many areas,
interpretation and deep learning. The experiment results tested such as military, agriculture and geology [13], [14]. As a
on real POLSAR image show that the fast implementation
of Wishart distance is very efficient (a POLSAR image with result, it is of great significance to interpret POLSAR image
768 000 pixels can be classified in 0.53 s), and both the single- effectively. POLSAR image classification is one of the most
hidden-layer architecture WN and the deep learning architecture fundamental issues in the process of interpretation, where
W-DSN for POLSAR image classification perform well and work each pixel in POLSAR image is assigned to one class (such
efficiently. as urban, water, and grass). In fact, the task of POLSAR
Index Terms— Deep stacking network (DSN), POLSAR image image classification corresponds to the task of scene label-
classification, Wishart network (WN), Wishart deep stacking
network (W-DSN).
ing in nature image. Nevertheless, POLSAR data is always
represented by a coherency/covariance matrix, which contains
I. I NTRODUCTION fully polarimetric information, rather than by a real-valued
scalar/vector (in gray/color image).
W ITH A booming development of deep learning in recent
years, many deep learning architectures have been well
known in various fields, such as image processing [1]–[5],
Methods for POLSAR image classification have been stud-
ied for decades. Taking the specificity of POLSAR data
speech recognition [6], [7], natural language processing [8] into consideration, both scattering mechanism and statistical
and so on. These architectures include, but are not limited property of POLSAR data are widely used in many classical
to, Convolutional Neural Network (CNN) [2], [4], [9], [10], methods, such as Pauli decomposition [15], Entroy/Alpha
Deep Belief Network (DBN) [3], [6], [8] and Deep Stacking (H − α) decomposition [16] and Wishart distance [17], [18].
Network (DSN) [1], [7], [11], [12]. CNN is aimed at image In the usage of scattering mechanism, a professional and
thorough analysis of POLSAR data is of great importance
Manuscript received October 23, 2015; revised March 30, 2016; accepted to design a proper classifier [19]–[21], which is challenging
April 28, 2016. Date of publication May 11, 2016; date of current version
May 24, 2016. This work was supported in part by the National Basic for scholar who does not specialize in POLSAR and still
Research Program (973 Program) of China under Grant 2013CB329402 and intends to settle the task by machine learning. As for sta-
in part by the National Natural Science Foundation of China under Grant tistical property, the popular Whishart distance was proposed
61271302, Grant 61272282, Grant 61572383, and Grant 61573267. The
associate editor coordinating the review of this manuscript and approving based on the Wishart distribution of coherency matrix and
it for publication was Prof. David Clausi. covariance matrix [17], [22], [23], which is a maximum
The authors are with the Key Laboratory of Intelligent Perception and Image likelihood classifier actually [17] and has been used in both
Understanding of Ministry of Education, International Research Center for
Intelligent Perception and Computation, School of Electronic Engineering, unsupervised and supervised POLSAR image classification
Xidian University, Xi’an 710071, China (e-mail: lchjiao@mail.xidian.edu.cn; [18], [24], [25]. However, the calculation of Wishart distance is
fayliu77@163.com). time-consuming and the accuracy of classifier Wishart distance
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org. is low. Methods based on more complicated distribution, take
Digital Object Identifier 10.1109/TIP.2016.2567069 too much time in estimating parameters [26], [27]. Besides the
1057-7149 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: Chang Gung Univ.. Downloaded on December 28,2020 at 10:18:18 UTC from IEEE Xplore. Restrictions apply.
3274 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 25, NO. 7, JULY 2016
property of POLSAR data, lots of machine learning methods

are also used in the task of POLSAR image classification,
including Neural Network (NN) [28], [29], Support Vector
Machine (SVM) [30], [31] and so on. Even though these
machine learning methods perform well, few of them are
specially designed for POLSAR data. They are only used as
classifiers, with features extracted from POLSAR data. What’s
more, the speed of POLSAR image classification is limited
by the process of training or testing, which is necessary in
machine learning.
Yet deep learning architectures are less popular in remote
sensing. Three available works about POLSAR image classi-
fication by deep models were revealed in [32], [33], and [34]
respectively, where all of them employed the deep
model DBN, which is staked by Restricted Boltzmann Fig. 1. DSN with three modules.
Machines (RBMs). The first used the traditional RBM without
any modification, while the second revised it into Wishart
RBM (WRBM) according to the Wishart distribution of including a fast implementation of Wishart distance and a
POLSAR data, and the third stepped further by providing single-hidden-layer network WN, where the first makes it
a more mathematic stricter and stable version, i.e., Wishart- feasible to directly speed up the classification methods based
Bernoulli RBM (WBRBM). Unfortunately, the generative on Wishart distance and the second improves the classification
structure DBN with RBMs is not a task-oriented model, accuracy by supervised training. Experiment results show that
where DBN is trained by unsupervised training to model input these methods work very well. Specially, a separate experiment
data at first, and then it is modified to fit particular task comparison with DSN is made to show the necessary of
(e.g., classification) by supervised training. Hence the process W-DSN in the task of POLSAR image classification.
of training is time-consuming in [32]–[34]. By contrast, DSN The remainder of this paper is organized as below.
is task-oriented, which is stacked by modules and trained mod- Section II states the problem to be solved, which introduces
ule by module. Based on DSN, Wishart Deep Stacking Net- the traditional DSN and POLSAR image classification. The
work (W-DSN) is proposed in this paper, which is a specific connection between the prior knowledge of POLSAR data
deep learning architecture for POLSAR image classification. and NN is described at large in the next section, including the
At the very beginning, two primary missions should be fast implementation of Wishart distance and the single-hidden-
finished to reach this point, including a linear implementation layer neural network WN. In section IV, the specific deep
of Wishart distance and a single-hidden-layer network named leaning architecture W-DSN for POLSAR image classification
Wishart Network (WN). The first provides a fast implemen- is formed by stacking WNs, which is the ultimate goal in this
tation of Wishart distance, where the equation of Wishart paper. Experiment results are demonstrated in section V to
distance is re-organized in a linear form. The re-organization verify the effectiveness of the proposed methods. At last, a
makes it possible to use this polarimetric information in brief conclusion and the future work are discussed. Bold-faced
the proposed network. Both the computation analysis and letters in this paper denote vector or matrix, where lower-
experiment results show that the proposed fast implementation case letter in bold denotes vector and upper-case letter in bold
of Wishart distance works very well (reducing the cost from denotes matrix.
336s to 0.53s for a POLSAR image with 768000 pixels
and of 15 classes). Based on Wishart distance, the second II. P ROBLEM S TATEMENT
(i.e., WN) is a simple but valuable usage of prior information DSN comprises a series of layered modules and the structure
of POLSAR data in NN, where the column-vector form of of DSN with three modules is illustrated in Fig. 1. Each
coherency matrix for a pixel is used as an input data and module includes three layers, i.e., a linear input layer, a
the corresponding output of WN is an estimated label vector. nonlinear hidden layer and a linear output layer again [11],
These two researches establish a subtle connection between referring to the sub-structure T − H − Y in Fig. 1. In the
POLSAR image data and NN, and then W-DSN is achieved lowest module, the input units contain only the raw training
by stacking WNs. Experiment results show that the proposed data. While in each higher module, the input units contain raw
W-DSN performs well in both accuracy and efficiency. More- training data as well as the output units of lower modules.
over, the structure of WN can be expanded in a straightforward In scene labeling, the input of DSN in this task should be a
way by adding hidden units if necessary, as well as the pixel block, where the pixel block contains the pixel to be
structure of the W-DSN. classified and is used to represent that pixel. The output layer
The most important contribution of our work is the of each module output the estimated label vector for the input
proposed specific deep learning architecture W-DSN for data. Thus, the estimated label vector of higher module can
POLSAR image classification, which establishes a simple be regarded as a rectification of the adjacent lower one, since
but clever bridge between POLSAR image classification and the output of the lower module is used as a portion of input
deep learning. In addition, two byproducts are also obtained, data for the higher one. What’s more, the DSN is trained in a
JIAO AND LIU: W-DSN FOR FAST POLSAR IMAGE CLASSIFICATION 3275
module-by-module and supervised way, without propagation classification as well [18]. In unsupervised classification [18],
over all modules. J. S. Lee et al. initially clustered the POLSAR image by
As stated in the previous section, the classification of polarimetric decomposition, and then estimated the cluster
POLSAR image is consistent with the scene labeling of mean Tm by averaging all pixels from class m, i.e.,
natural image, i.e., each pixel in POLSAR image corresponds
Tm = E [T |T ∈ m ] = |1m | T, (2)
to one label. Fortunately, benefiting from the polarimetric T∈m
characteristics of POLSAR data, each pixel can be recognized
where m is the set of pixels from class m, and |m | is
without considering its neighborhoods in the task of POLSAR
the number of pixels in m . In supervised classification,
the
image classification. Then a pixel is represented by itself here,
labeled training data set = 1 2 · · · M is given,
rather than by a pixel block, and used as the input data.
where M is the number of classes. Then the cluster mean of
Furthermore, each POLSAR pixel is denoted by a complex-
each class can be calculated directly by eq. (2) with the labeled
valued matrix, which is different from the real-valued pixel
training data set.
in natural image. Specifically the 3 ∗ 3 complex coherency
2) A Linear Implementation of Wishart Distance: Although
matrix T is used to represent a POLSAR data for its Wishart
Wishart distance has been widely used in a great many of
distribution, where the corresponding Wishart distance is fully
literatures for POLSAR image classification [18], [24], [35],
used.
little attention is paid to the high-computation in calculating
Based on DSN, this paper reveals a specific deep model
the Wishart distance. Traditionally, a two-level loop (where
named W-DSN, which is designed for POLSAR data and
one level corresponds to the number of classes and the
classifies each POLSAR pixel T effectively and efficiently.
other one corresponds to the number of pixels) is needed
to compute the Wishart distance of each pixel from each
III. F ROM POLSAR DATA TO NN
cluster mean, since the equation of Wishart distance contains
To exploit a specific deep model for POLSAR image matrix operations (as shown in eq. (1)). It is a very important
classification, unique characteristics of POLSAR data should issue in POLSAR image classification, since that there are
be considered. In what follows, it is fully discussed that the always millions of pixels in a POLSAR image and the Wishart
Wishart distance serves for a guide to transfer the information distance of every pixel from each class should be calculated.
of POLSAR data to NN. In 2006, W. Wang el al. explored a fast implementation of
H-Alpha Wishart classification [36]. But, it just reduces the
A. Fast Implementation of Wishart Distance number of pixels in computing new cluster means to speed up,
Having a complex Wishart distribution [22], coherency without realizing that what matters most is the computation of
matrix and covariance matrix have been widely used in the Wishart distance. A fast implementation of Wishart distance
analysis of POLSAR data [17], [18]. J. S. Lee proposed a is proposed in this paper, which exactly computes the Wishart
maximum likelihood classifier based on Wishart distribution distance by a special linear transformation.
using covariance matrix [17], which is also suitable for As shown in eq. (1),
there are two items in the Wishart
T −1 T
coherency matrix (coherency matrix and covariance matrix distance, i.e.,
T r ace m and ln |Tm |. The first
can be transformed into each other by a linear operator). item T r ace Tm −1 T is usually calculated by multiplying
Complex coherency matrix T is chosen in this paper, which Tm −1 by T to obtain the matrix Tm −1 T firstly and
is conjugate symmetric, i.e., then the trace of the obtained matrix is computed. It needs
⎡ ⎤ 27 multiplication operations and 20 addition operations, with
T11 T12 T13
the obtained Tm −1 . Let = Tm −1 T. Both Tm −1
T = ⎣T12 T22 T23 ⎦ and T are 3*3 complex matrices, so is . It is noticed
T13 T23 T33 that the trace of a matrix is just a summation of diagonal
where T11 , T22 and T33 are real-valued, the remaining elements elements of the matrix. Namely, the computation of getting
are complex-valued and • is a conjugate of an element. is redundant and only the diagonal elements are necessary. The
1) Wishart Distance: With the maximum likelihood clas- fast implementation of Wishart distance is based on leaving
sifier [17], a multi-look POLSAR pixel T is classified out the redundant computation.
according to a so-called Wishart distance d (T |Tm ), as Let σ = f () be a function which arranges
shown in eq. (1), all elements in the matrix into column and out-
puts the column-vector form of (e.g., f (T) =
d (T |Tm ) = T r ace Tm −1 T + ln |Tm | (1)
T11 , T12 , T13 , T12 , T22 , T23 , T13 , T23 , T33 ). = f −1 (σ ) is
where tr ace (•) is the trace of a matrix, •-1 is the inverse of a the inverse function of σ = f (). In the following part of
matrix, |•| is the determinant of a matrix, Tm for class m is this paper, the n-th pixel in POLSAR image is denoted by tn ,
estimated by training samples from class m and it is regarded n = 1, 2, ..., N, where N is the total number of POLSAR
as the cluster mean of the m-th class. With calculated distance pixels. Then let T = [t1 , t2 , ..., t N ], where tn = f (Tn )
for each class, a pixel is assigned to the class with minimum is the column-vector form of matrix Tn and Tn is the
coherency matrix of the n-th pixel in POLSAR image.
What’s
distance.
Even through Wishart distance was originally proposed for more, let W = [w1 , w2 , ..., w M ], where wm = f Tm −1 ,
supervised classification [17], it has been used in unsupervised m = 1, 2, ..., M. Thus, it is easy to notice the fact that
Fig. 2. Comparison of traditional implementation and the fast implementation. (a) Traditional implementation. (b) Fast implementation.

T r ace Tm −1 Ti = (wm ) ti , where both wm and ti are form of Wishart distance is the key point to establish the

9-dim complex column vectors and (wm ) is merely the following WN.
transposition of wm without conjugation. The computation Experiment results listed in section V verify this improve-

of (wm ) ti includes 9 multiplication operations and 8 addi- ment: a POLSAR image with 768000 pixels and of 15 classes
tion operations, can be classified in 0.53s, while it needs over 330s with

which
are almost just one-third of what
T r ace Tm −1 Ti needs in traditional way. traditional implementation by eq. (1).
Furthermore, let

b = [ln (|T1 |) , ln (|T2 |) , . . . , ln (|T M |)] B. Wishart Network
be a column vector. As a result, the Wishart distances of a Despite the fact that it is highly efficient to compute the
pixel t from different cluster means are calculated by the linear Wishart distance by the proposed method as discussed in

transformation W t + b, where t = f (T). Thus the Wishart section III-A, it does not improve the classification accuracy
distance matrix is calculated by eq. (3), at all. It is because the proposed eq. (3) is just a fast but

exact implementation of Wishart distance. As a matter of
D=WT+B (3) fact, the classification accuracy of Wishart classifier depends
where B = [b, b, ..., b] is formed by repeating b N times, on the cluster mean of each class, which is not changed in
⎡

⎤ the fast implementation of Wishart distance. In supervised
d T1 |T1 d T2 |T1 · · · d T N |T1 POLSAR image classification, the cluster mean of class m is
⎢
1
2
N ⎥
⎢ d T |T2 d T |T2 · · · d T |T2 ⎥ simply estimated by averaging pixels from class m, as shown
D=⎢ ⎢ .. .. ..
⎥
⎥ in eq. (2). Inspired by the method of learning in machine
⎣ . . . ⎦

1
2
N learning, an effort to explore higher accuracy is made by a
d T |T M d T |T M · · · d T |T M learning method named WN, for the task of POLSAR image
is the matrix of Wishart distance, and D (m, n) denotes the classification.
Wishart distance of the n-th pixel Tn from the m-th cluster 1) The Definition of Wishart Network (WN): The Wishart
mean Tm . In this way, the Wishart distance of each pixel distance matrix D shown in eq. (3) is a linear transformation
from each cluster mean is calculated by eq. (3) without of T, with W and B as weight parameter and bias parameter.
looping, but a relative efficiency of compiled array-handling The n-th pixel is assigned to the class with minimum distance
code, while it needs a two-level loop by eq. (1). The omission in the n-th column of D. In supervised POLSAR image
of loops make it convenient to execute on computer, then the classification by Wishart classifier, labeled training pixels tend
method shown in eq. (3) is a fast and linear implementation to be classified correctly with
proper cluster

means.
Let

labeled
of Wishart distance matrix. training pixel set be = t1l , yl1 , t2l , yl2 , ..., tlK , ylK ,
A clear comparison is demonstrated in Fig. 2. The fast where tkl , k = 1, 2, ..., K denotes the k-th labeled pixel, ylk is
implementation leaves out the redundant computation of , the corresponding label vector and ylk is a unit column vector
and it is further organized in a way so that the Wishart distance where the index of the only non-zero element indicates the
matrix is calculated by a special linear transformation. As a label of tkl , and K is the number of labeled training pixels
result, the efficiency of calculating Wishart distance is highly in .
improved. In addition, the more classes and pixels, the higher Let W = [w1 , w2 , ..., w M ], where each cluster
the acceleration, since the two-level loop is omitted. The linear mean is associated with a column of W, as discussed
It should be noticed that the parameters of WN include

both complex and real values. As discussed in the
previous subsection, pixel t is the complex input data
(i.e., a complex column vector t = f (T)), W is a
complex matrix containing cluster means as columns, b =

[ln (|T1 |) , ln (|T2 |) , ..., ln (|T M |)] is a real column vec-

tor, and W t + b is the real value which corresponds to Wishart
distance (i.e., the input of hidden layer is the Wishart distance
of a pixel from each cluster mean). U and c are real matrix

and real bias vector respectively. Let = W; b ,
t = [t; 1], W

Fig. 3. The structure of WN.
h = [h; 1] and U = U; c . The output of WN can be denoted

in section III-A2. Given a POLSAR pixel t, the proposed by y = g (t) = U sigm W t , then the optimization is
learning method classifies it by the following function, as

K K
2 K
shown in eq. (4), l l 2
mi n E = ek = yk − g (tk ) = yk − U hk

y = g (t) = U sigm W t + b + c (4) k=1 k=1 k=1
(6)
where sigm (α) = 1+exp(−α)
1
is the sigmoid function, U and c

are weight parameter and bias parameter respectively, and y is where
hk = [hk ; 1] = sigm W tk ; 1 and
tk = [tk ; 1].
the estimated label vector where the index of the maximal ele- 2) Supervised Training: With the training set , the initial-

ment associates with the predicted label. Moreover, W t + b is ization of W is associated with the cluster mean of each class
just the Wishart distance of pixel t from cluster mean of each (which iscalculated by eq. (2)), i.e., each column of W is

class, if there is no update for W. The motivation of eq. (4) wm = f Tm −1 as discussed in section III-A2. Vector
is to learn proper parameters to improve the performance b is initialized by the natural logarithm of each cluster mean

of classification, including W, U, b and c. In supervised [ln (|T1 |) , ln (|T2 |) , ..., ln (|T M |)] .
classification, the predicted label of a training pixel should be b=
as element, i.e.,
Then W = W; b . In fact, the initialization of W and b
as close
as possible
2 to the true label. With the training set ,
ek = ylk − yk denotes the k-th error between the estimated

agrees with the discussion in section III-A2, which is much
label vector and the true label vector, where yk = g tkl . better than random initialization. This initialization is very
As a result, the proposed method should be trained to minimize important in the training process, because it makes the super-
the total error, as shown in eq. (5). vised training proceeds based on the existing Wishart distance
classifier rather than a fully random initialization, which

K K
2
l reduces uncertainty to a great extent. Given the initialized
mi n E = ek = yk − yk (5)
W and b, the optimization in eq. (6) is convex and it is
k=1 k=1 K
−1
With the structure shown in Fig. 3, the WN is defined as easy to obtain that U = hk
hk is
hk ylk . Then W
k=1
below, to realize the proposed supervised method shown in =W − λ ∂ E , where λ is the step.
updated by the equation W ∂W
eq. (4) and (5) by neural network. The input data of the
The updating equations here are obtained just by the simple
network is POLSAR pixel t, the input and output of hidden
derivation operator.
layer are W t + b and h = sigm (Wt + b) respectively, and
An explanation is provided here in detail that Tm is
the output of the output layer is the
corresponding
estimated
associated with the first 9 elements in the m-th
column of

label vector y, where y = U sigm W t + b +c. The number i.e., wm = W (1 : 9, m) = W (:, m) = f Tm −1 .
W,
of input units is 9, which is equal to the dimension of
The m-th element of b corresponds to the 10-th element in
input data t (i.e., the number of elements in t); the number i.e., bm = W (10, m). However, the
the m-th column of W,
of hidden units is equal to the number of cluster means
bias parameter b is updated in terms
of the updated W, i.e.,
(i.e., the number of columns in W); the number of output
−1
−1
units is equal to the number of classes (i.e., the dimension bm = ln (|Tm |), where Tm = f (wm ) and the
of label vector y). The weight matrix connecting input layer (10, m) is not used. It is because b is totally determined
W
and hidden layer is W, the bias parameter in hidden layer by the coherency matrix in Wishart distance, which should be
is b and the activation function of hidden layer is the sigmoid preserved anyhow. The way of updating b is a key point in
function. U is the weight matrix connecting hidden layer and the process of WN training, which has a close relationship
output layer and c is the bias parameter in output layer. to the specific design of WN. Let Tl = t1l , t2l , ..., tlK and

WN defines a function y = g (t),
Generally, the proposed Yl = yl1 , yl2 , ..., ylK . The training process of WN is summa-
i.e., g (t) = U sigm W t + b + c. The network WN should rized as Algorithm 1, where ’Require’ is meant to indicate a
be trained to learn proper parameters so that it can classify list of inputs and ’Ensure’ a list of outputs, as listed below.
labeled training pixels correctly. Therefore, the training of WN Although the structure of WN illustrated in Fig. 3 is
is aimed at the same optimization with eq. (5). similar to the general single-hidden-layer network, it should
Algorithm 1 Training of WN error condition. That is to say, if the total error E of WN

cannot be reduced to a certain threshold, then add some hidden
units in WN. Weights associated with the added hidden units
are initialized by rearranging the picked pixels with f (•),
where the picked pixels are randomly selected from different
classes. WN with M hidden units is called basic WN and WN
with more hidden units is called expanded WN. Experiment
results in section V show that the addition of hidden units can
improve the classification accuracy effectively.
IV. A D EEP M ODEL OF WN-W ISHART D EEP

S TACKING N ETWORK (W-DSN)
As a single-hidden-layer network, WN achieves higher
accuracy than the basic Wishart distance. In the following
discussion, a deep model of WN is formed to improve the
performance further, which is based on the popular DSN and
named as Wishart Deep Stacking Network (W-DSN). It is
precisely the specific deep learning architecture for POLSAR
image classification proposed in this paper.
A. Wishart Deep Stacking Network (W-DSN)

It is obvious that the proposed WN is directly a single
module of DSN, where the main differences are that the raw
input data is complex and the weights connecting input layer
and hidden layer are complex and initialized by cluster means.
The structure of W-DSN with three modules is same to that of
DSN, as demonstrated in Fig. 1. W-DSN is different from the
traditional DSN in the fact that the modules stacked in W-DSN
are WNs and the parameters in WNs are unique. Specially,
W-DSN with only one module is exactly WN.
Suppose that the parameter in the
i -th WN module
is

Wi , Ui , where Wi = Wi ; bi and Ui = Ui ; ci .
The inputs of three modules are T, [T; Y1 ] and [T; Y1 ; Y2 ]
respectively and the corresponding outputs are Y1 , Y2 and Y3
be emphasized that difference between WN and general one respectively,
as listed in eq. (7)∼(9). The lowest module with
parameter W 1,
U1 is exactly the WN, so it is trained as
is mainly that the input data is the complex-valued vector t,
the initialization of W is explicit and the update of bias b is described in Algorithm 1. However when the second module
adjacent to the lowest one is trained, W 2 is initialized by com-
unique (i.e., determined by the updated W, as discussed in the
bining the trained W 1 with several random real values, where
previous paragraph).
With learned parameters W and U, a POLSAR pixel t is these real values correspond to the weights connecting Y1
classified by the estimated label vector, i.e., y = g (t) = and hidden units, because Y1 is a portion of input data for

the second module. The initialization of W 3 is similar to that
U sigm W t , where t = [t; 1]. The index of maximal value
of W2 . Thus each module is trained as Algorithm 1 one by
in the estimated label vector y corresponds to the predicted
one, and the whole training process of W-DSN is summarized
label. Experiments shown in section V demonstrate that the
as Algorithm 2. The number of modules in W-DSN is needed
proposed WN does improve the classification accuracy.
by Algorithm 2 besides what is needed by Algorithm 1.
3) Expanded WN: The proposed WN mentioned above

contains 9 input units (corresponding to the dimension of input Y1 = U1 sigm W 1T (7)
data), M hidden units (corresponding to the number of cluster

means) and M output units (corresponding to the number of Y2 = U2 sigm W 2 [T; Y1 ] (8)

classes). From the point of view of function learning, the Y3 = U3 sigm W 3 [T; Y1 ; Y2 ] (9)
training of WN listed in Algorithm 1 is aimed to learn a
function (i.e., y = g (t)) so that it can classify training pixels With the fact that the estimated label vector of lower module
correctly. WN with more hidden units has a higher ability is used as a portion of input data for the adjacent higher
to represent complicated function, because more hidden units one, the estimated label vector of the higher module can
bring about more parameters. Hence some hidden units should be regarded as a rectification of that of the lower one. In
be added if the WN with M hidden units does not meet the training process, when the output of lower module (e.g., Y1 ) is
Algorithm 2 Training of W-DSN where tk,1 = [tk ; 1], as shown in eq. (6). When the second
lowest module (which is adjacent to the lowest one) is trained,
the optimization is
K
2
l 2
mi n E 2 = yk − U2 sigm W tk,2 , (11)
k=1

where tk,2 = t ; y . The initialization of parame-
k,1 k,1
ter is W 2 = W 1 ; P , where P is a M × M ran-
dom real-valued matrix associated with yk,1 . Thus E 2 =
K
2
l
yk − U2 sigm W 1 t k,1 + P y k,1 . If P a null matrix,
k=1
these two optimizations shown in eq. (10) and (11) are the
same, i.e., mi n E 1 is equal to mi n E 2 . Yet, P is a variable
in the second optimization, hence the latter optimization is
a generalization of the former one. Consequently, E 2 may
have a smaller minimum than E 1 , and parameters learned by
minimizing E 2 provide a better estimated label vector than that
learned by minimizing E 1 . Namely, the second lowest module
provides a better estimated label vector than the lowest one.
This will continue for higher modules. So W-DSN with more
modules can improve the classification accuracy further.
It has been discussed that both W-DSN (with more than
one WN) and expanded WN have a higher ability to express
complicated classification function than the basic WN. There-
fore, W-DSN with expanded WN will exhibit better classi-
combined with raw input data to form a new input in adjacent fication performance than both of them. Experiment results
higher module (e.g., [T; Y1 ]), the output (e.g., Y1 ) is exactly demonstrated in the following section confirm this conclusion
the linear combination of hidden units in the lower module, completely.
without estimating the predicted label. That is, only the output
of the last module (e.g., Y3 ) is used to estimate the final V. E XPERIMENTS
predicted label, in the process of testing.
A detail should be noticed that the module used here can be The POLSAR image which is tested on to verify the
the basic WN with M hidden units or the expanded WN with effectiveness of the proposed methods is related to the site of
more hidden units. The more hidden units there are, the higher Flevoland, the Netherland. The size of this POLSAR image is
ability to express complicated function the WN has, so is the 750*1024 and it has been used in many papers [37], [38].
W-DSN. If the number of classes is big or the training pixels Fig. 4 illustrates the corresponding Pauli RGB image
from different classes are difficult to distinguish from each and groundtruth respectively. There are 15 classes in this
other, which lead to the fact that the classification function groundtruth, where each class indicates a type of land covering
should be complicated enough to distinguish these pixels, more and is identified by one color. 167712 pixels are labeled in the
hidden units will be needed. groundtruth and only 5% of them are used as training pixels.
Some notes are laid out here for a clearer understanding of The reported testing accuracies are obtained by testing on
our work. W-DSN is different from DSN in that the modules the 95% residual pixels. What’s more, the proposed methods
stacked in W-DSN are WNs, which are defined specialized are compared with six other methods, including Support
for POLSAR data in section III. Thus the initialization and Vector Machine (SVM) [30], Radial Basis Function (RBF)
training of W-DSN are modified accordingly, despite the network, Wishart classifier, DBN with RBM [32], DBN with
similar procedure. WRBM [33] and DBN with WBRBM [34]. The importance
of the proposed W-DSN is emphasized by comparing W-DSN
B. Analysis of W-DSN with DSN separately, which is demonstrated in section V-D.
Although the explanation that estimated label vector of the As discussed in previous sections, the input data of WN
higher module can be regarded as a rectification of that of and W-DSN is the column-vector form of coherency matrix,
the lower one is intuitional, a rigorous mathematic analysis is i.e., t = f (T). Wishart classifier classifies pixel using the
provided. For simplicity, only the first two lower modules are original coherency matrix T. While in the last five comparing
considered, since higher modules work in a similar way. methods, POLSAR pixel is denoted by a 9-dim real-valued
When the lowest module is trained, the optimization is vector, i.e., [T11 , T22 , T33 , r e(T12), i m(T12 ), r e(T13 ), i m(T13),
K
2 r e(T23), i m(T23 )], where Ti, j denotes the element locating in
l 1 the i -th row and the j -th column, r e (•) and i m (•) denote the
mi n E 1 = yk − U1 sigm W tk,1 , (10)
k=1 real part and the image part of a complex number respectively.
Fig. 4. (a) The Pauli RGB of Flevoland. (b) Groundtruth.
TABLE I
C OMPARISON OF T RADITIONAL AND FAST I MPLEMENTATION
This real-valued vector contains all information of T, where the latter time-consumption is less than 1/600 of the
so does t. former one. This is because the redundant computation and
Experiment results below are reported to confirm three main two-level loop embedded in the traditional implementation
contributions of this paper, including the fast implementation are left out of the proposed fast implementation. Even though
of Wishart distance, the single-hidden-layer network WN and an extra step is needed in the fast implementation, it works
the deep learning architecture W-DSN for POLSAR image very efficiently, noting that the preparation time is negligible
classification. Finally, a comprehensive evaluation is provided. (0.0003s) and the speedup is large (over 600 times). The
On a 3.20 GHz machine with 4.00 GB RAM, all the experi- accuracy of the implementation of the Wishart distance is
ments are conducted using MATLAB. unchanged, since the fast implementation gives exactly the
same results as the traditional method.
A. The Comparison of Traditional and Fast
Implementation of Wishart Distance B. Effectiveness of WN
As discussed in section III, the proposed fast implementa- The proposed learning method WN, which is initialized by
tion of Wishart distance is achieved by a special linear trans- cluster mean of each class and uses the column-vector form of
form, without redundant computation and the two-level loop. original coherency matrix as input data, improves the accuracy
In supervised classification, Wishart classifiers with traditional by supervised training. The result of WN is compared with
implementation and with fast implementation respectively are that of SVM, RBF, Wishart classifier, and DBNs with RBM,
both tested. A simple preparation (i.e., the construction of WRBM and WBRBM respectively. For fairness, there are only
W and b, which corresponds to the first loop in Fig. 2 (b) one hidden layer in each DBN and linear classifier is used,
and the ’preparation’ part of Algorithm 1) is needed in the which are the same to the basic WN. In the following, these
fast implementation and the corresponding time-consumption three DBN methods are called RBM, WRBM and WBRBM
is called ’preparation time’. The time for classifying the whole respectively for conciseness.
image of 768000 pixels is called ’classification time’. First, all methods are repeatedly tested 50 times without
Running times and accuracies are listed in TABLE I. The changing training samples and the total accuracies are demon-
proposed fast implementation takes 0.0003s to construct W strated in boxplot in Fig. 5, to explore their robustness to
and b, which is unnecessary in the traditional implementation. randomness. It is clear that SVM, RBF, Wishart classifier
The traditional implementation takes 336.4s to classify the and the proposed WN are stable, which are not affected by
whole image, while the fast implementation takes just 0.5288s, random parameters. However, RBM, WBRBM and WBRBM
TABLE II
T HE A CCURACIES
Fig. 5. Total accuracy of each method by repeatedly running. Fig. 6. Total accuracy of each method by repeatedly sampling.
are sensitive to some extent, where RBM is the most sensitive Experiment results demonstrated in Fig. 5 and Fig. 6 show
one, WRBM performs more stable than RBM due to its that the proposed WN is robust to both random parameters
utility of some polarimetric information and WBRBM is the and different training samples, with relative high accuracy.
most stable one in these three methods because of its stricter Third, individual class accuracies are listed in TABLE II
mathematic support. for better insight, since many times they are more attractive to
Second, sample 5% labeled pixels repeatedly 50 times as users. In addition, the expanded WNs with 2M, 3M and 4M
training samples for each method, and the total accuracies are hidden units respectively are tested as well, to illustrate that
revealed in boxplot as well, as shown in Fig. 6. Similarly, the addition of hidden units gives rise to the improvement of
SVM, RBF, Wishar classifier and the proposed WN perform accuracy indeed.
rather stable with different training samples. However, the To analyze individual class accuracies, it should be declared
other three comparing methods are unstable, because they are that percentages of different classes in the whole training
not task-oriented. Note that, WRBM is less stable than RBM sample set are 7.89%, 10.77%, 6.08%, 4.20%, 5.72%, 4.52%,
here, however WBRBM are still more stable than both of them 3.04%, 5.98%, 6.65%, 13.27%, 3.77%, 8.27%, 9.78%, 0.43%
(the total accuracies of RBM, WRBM and WBRBM are not as and 9.63% respectively, corresponding to the class order listed
high as listed in [32]–[34], since more hidden layers with more in TABLE II. Even though SVM, RBF and Wishart classifier
units were employed there and nonlinear classifiers were used). reach similar total accuracies, SVM and RBF misclassify
TABLE III
A CCURACY OF W-DSN W ITH D IFFERENT N UMBERS OF M ODULES
‘Buildings’ (with least training samples, i.e., 0.43%) terribly,

which is well recognized by Wishart classifier. It is because
the optimizations of SVM and RBF are sensitive to small
samples, where more resources are distributed to model larger
samples for better optimal solution. Similarly, by unsupervised
training, RBM, WRBM and WBRBM are used to model all
samples at first, and then supervised training is conducted to
complete the classification task, so they are sensitive to small
samples for the same reason. In fact, they cannot recognize
‘Bare soil’ besides ‘Buildings’, where training samples of
‘Bare soil’ (3.04%) are just more than ‘Buildings’ (0.43%) but
less than other classes. However, Wishart classifier depends on
the center of each class, which is less sensitive to the number
Fig. 7. The variation of total error with more hidden units.
of samples from each class, so it performs stable in classifying
small samples, such as ‘Buildings’ and ‘Bare soil’. Another
opposite evidence is that ‘Water’ is not classified very well
WN and W-DSN with two or three modules are formed as
by Wishart classifier, even though the percentage of ‘Water’ is
illustrated in Fig. 1. The initialization and training process
high up to 7.89%. Moreover, WN is based on Wishart distance
of W-DSN are conducted module by module as discussed in
as well, so it is less sensitive to small samples, similar to
section IV, which is summarized as Algorithm 2.
Wishart classifier. Even so, compared with Wishart classifier,
W-DSNs with different numbers of modules are compared
WN is slightly effected by the number of training samples
with each other and the resulting accuracies are demonstrated
in the process of supervised training, which results in an
in TABLE III. Obviously, W-DSN with more modules or with
improvement in ‘Water’ accuracy and a decrease in ‘Bare soil’
WN of more hidden units achieves higher accuracy, including
accuracy. Next, with increasing total accuracy and comparable
most individual class accuracies and the total accuracy. More
individual class accuracies, the expanded WNs with 2M, 3M
specifically, with M hidden units in each module, W-DSNs
and 4M hidden units respectively perform better than the basic
of 1 module, 2 modules and 3 modules respectively result in
WN, confirming the conclusion in section III-B3 directly.
total accuracies 0.8650, 0.8855 and 0.9138. With 4M hidden
In addition, the final total errors of WNs with different
units in each module, W-DSNs of 1 module, 2 modules
numbers of hidden units are demonstrated in Fig. 7, where
and 3 modules respectively result in total accuracies 0.9018,
vertical axis and horizontal axis denote the total error and the
0.9163 and 0.9268. It is concluded that W-DSN with more
number of iterations for training respectively. It is obvious
modules performs better than W-DSN with less modules.
that the total error deceases with the increase of iterations.
Meanwhile, of the same number of modules (e.g., 2 modules),
What’s important is that the total error of WN with more
W-DSN of module with more hidden units results in higher
hidden units is smaller than that of WN with less hidden units,
total accuracy (e.g., W-DSNs of 2 modules with 4M and
for the same iterations. The results show that additional hidden
M hidden units in each module respectively provide the
units enhance WN’s ability of estimating label vector indeed,
accuracies 0.9163 and 0.8855). The results confirm that the
resulting in higher accuracy as shown in TABLE II.
proposed deep learning architecture W-DSN is an efficient new
type of model for the task of POLSAR image classification.
C. W-DSN for POLSAR Image Classification Similar to Fig. 7, the variation of total error of W-DSN
The deep learning architecture named W-DSN, which is with different numbers of modules is demonstrated in Fig. 8.
formed by stacking WN, could improve the accuracy further. The vertical axis and horizontal axis indicate the total error
Concretely, W-DSN with only one module is exactly the and the number of modules respectively. It is clear that the
TABLE IV
A CCURACY AND RUNNING T IME
Fig. 9. Accuracies of W-DSN and DSN.

Fig. 8. The variation of total error with the increase of the number of
modules.
since WN is a learning method based on Wishart distance.
total error decreases with more modules, resulting in better Therefore W-DSN with 1 module would achieve an accuracy
classification results as listed in TABLE III. With the same as high as Wishart classifier (0.8504) at the very beginning.
number of modules, W-DSN with modules of more hidden This polarimetric information is a very useful guide for
units has a lower total error, leading to higher accuracy. This classification task. While DSN with 1 module starts from a
observation is consistent with the fact that WN with more totally random initialization, which results in a rather low
hidden units has a higher ability of improving the classification accuracy (1/M ≈ 0.0667). As a result, W-DSN with 1 module
accuracy, as shown in TABLE II in detail. would perform better than the DSN with 1 module even after
supervised training. The performance of W-DSN with more
D. The Comparison of W-DSN and DSN modules is better than that of DSN with more modules for
the same reason. Furthermore, the most complicated DSN
To emphasize the importance of the proposed W-DSN in
(of 3 modules and 4M hidden units in each module) works
the task of POLSAR image classification, W-DSN is com-
rather worse than the simplest W-DSN (of 1 module and
pared with the traditional DSN separately. As discussed in
M hidden units in each module), where these two methods
sections III and IV, W-DSN takes the polarization information
achieve accuracies less than 0.70 and more than 0.85 respec-
of POLSAR data into consideration, which does not happen
tively. The results provide a powerful experimental evidence
in DSN. The DSN used here is stacked by 3 modules (with M
that the proposed W-DSN is real necessary for POLSAR image
hidden units in each module) like W-DSN (as shown in Fig. 1),
classification than DSN.
while these 3 modules stacked in DSN are just general one-
hidden-layer networks. That is to say, all units and weights
in DSN are real-valued and DSN classifies POLSAR pixels E. Comprehensive Evaluation
by treating them as general data. Thus the input data of DSN At last, the performance of each method is evaluated in
is a 9-dim real vector, which is same to SVM, RBF, RBM, both accuracy and time-consumption, as shown in TABLE IV.
WBRBM and WBRBM. Total accuracies of W-DSN and DSN It should be noticed that only WNs with M hidden units and
are shown in Fig. 9 to make the comparison clearly. 4M hidden units respectively are listed. The W-DSN shown
Similar to the discussed W-DSN, DSN achieves higher here are of 3 modules with 4M hidden units in each module.
accuracy with more modules or more hidden units in each Moreover, only the total accuracy is listed to make it clear
module, as demonstrated in Fig. 9. However DSN performs to exhibit. However both the time for training and the time
much worse than W-DSN, where accuracies obtained by DSN for classifying the whole POLSAR image are listed, which
is much lower than that obtained by W-DSN. In fact, WN are called ‘training time’ and ‘classifying time’ respectively.
(i.e., W-DSN with 1 module) begins with Wishart classifier, The total time is obtained by summing these two parts.
Fig. 10. (a)∼(i) show the classification results of SVM, RBF, Wishart classifier, RBM, WRBM, WBRBM, WN (with M hidden units), WN (with 4M hidden
units) and W-DSN (of 3 modules with 4M hidden units in each module).
The ‘preparation time’ in Wishart classifier is renamed as is needed (17s). W-DSN with more modules results in higher
‘training time’, to keep consistent with other methods. For accuracy as well (i.e., the accuracy of W-DSN of 3 modules
RBM, WRBM and WBRBM, both the time for unsupervised 0.9258 is higher than than of W-DSN of 1 module (i.e., WN)
training and the time for supervised training are contained in 0.9018). Even though both WN with more hidden units and
‘training time’. Besides, DSN is not included here for its rather W-DSN with more modules take much more time for training,
poor performance as revealed in section V-D. the total time is much less than that of the comparing methods.
According to the results listed in TABLE IV, the fast Specifically, W-DSN with 3 modules takes 50s to complete
implementation of Wishart distance works very efficient and the task of classification, while Wishart classifier (traditional
reduces the time-consumption greatly, compared with the implementation), SVM and RBF take 336.40s, 295.33s and
traditional implementation. WN with M hidden layer is the 988.89s respectively. In short, the proposed WN and W-DSN
original form of the learning method and achieves a higher achieve higher accuracy with less time.
accuracy than Wishart classifier, at the cost of 2.2851s for Meanwhile, the corresponding classification results are also
training. With the same number of hidden units, WN with M illustrated in pictures, as shown in Fig. 10. It is clear that
hidden units takes much less time than RBM, WRBM and Fig. 10(g) shows better results than Fig. 10(a)∼(f), con-
WBRBM (3s is much less than 61.34s, 69.56s and 58.77s), firming the conclusion that WN does achieve better results,
while WN still achieves higher accuracy. With more hidden compared with SVM, RBF, Wishart classifier, RBM, WRBM
units in WN, the accuracy becomes higher (i.e., the accuracy and WBRBM. By comparing Fig. 10(g), (h) and (i) with
of WN with 4M hidden units 0.9018 is higher than that of each other, it is clearly illustrated that WN with more hid-
WN with M hidden units 0.8650), where a little more time den units or W-DSN with more modules can improve the
classification results, which agrees with the numerical values [17] J. S. Lee, M. R. Grunes, and R. Kwok, “Classification of multi-look
listed in TABLE III. polarimetric SAR imagery based on complex Wishart distribution,” Int.
J. Remote Sens., vol. 15, no. 11, pp. 2299–2311, 1994.
[18] J.-S. Lee, M. R. Grunes, T. L. Ainsworth, L.-J. Du, D. L. Schuler, and
VI. C ONCLUSION AND F UTURE W ORK S. R. Cloude, “Unsupervised classification using polarimetric decompo-
sition and the complex Wishart classifier,” IEEE Trans. Geosci. Remote
A deep learning architecture named W-DSN is constructed Sens., vol. 37, no. 5, pp. 2249–2258, Sep. 1999.
specialized for POLSAR image classification in this paper. [19] F. Shang and A. Hirose, “Use of Poincare sphere parameters for fast
supervised PolSAR land classification,” in Proc. IEEE Int. Geosci.
Two other methods about POLSAR image classification are Remote Sens. Symp. (IGARSS), Jul. 2013, pp. 3175–3178.
also proposed for the construction of W-DSN, including a fast [20] G. Singh, Y. Yamaguchi, and S.-E. Park, “General four-component scat-
implementation of Wishart distance and a network named WN. tering power decomposition with unitary transformation of coherency
matrix,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 5,
The fast implementation of Wishart distance makes it possible pp. 3014–3022, May 2013.
to directly speed up the methods based on Wishart distance, [21] J.-S. Lee, M. R. Grunes, E. Pottier, and L. Ferro-Famil, “Unsupervised
and WN is a single-hidden-layer network which improves the terrain classification preserving polarimetric scattering characteristics,”
IEEE Trans. Geosci. Remote Sens., vol. 42, no. 4, pp. 722–731,
classification accuracy at a high speed. Most importantly, the Apr. 2004.
W-DSN achieves higher accuracy further, which is proved to [22] N. R. Goodman, “Statistical analysis based on a certain multivariate
be an effective and efficient deep architecture specialized for complex Gaussian distribution (an introduction),” Ann. Math. Statist.,
vol. 34, no. 1, pp. 152–177, 1963.
POLSAR image classification. [23] F. Cao, W. Hong, Y. Wu, and E. Pottier, “An unsupervised segmentation
The spatial information, which is also very important in with an adaptive number of clusters using the SPAN/H/α/A space and the
POLSAR image classification, will be considered in our future complex Wishart clustering for fully polarimetric SAR data analysis,”
IEEE Trans. Geosci. Remote Sens., vol. 45, no. 11, pp. 3454–3467,
work, to complete the classification more rapidly and precisely. Nov. 2007.
[24] L. J. Du and J. S. Lee, “Polarimetric SAR image classification based
on target decomposition theorem and complex Wishart distribution,” in
R EFERENCES Proc. IEEE Remote Sens. Sustain. Future, Int, Geosci. Remote Sens.
[1] J. Li, H. Chang, and J. Yang. (2015). “Sparse deep stacking Symp. (IGARSS), vol. 1. May 1996, pp. 439–441.
network for image classification.” [Online]. Available: https://arxiv. [25] G. Zhou, Y. Cui, Y. Chen, J. Yin, J. Yang, and Y. Su, “Pol-SAR
org/abs/1501.00777 images classification using texture features and the complex Wishart
[2] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification distribution,” in Proc. IEEE Radar Conf., May 2010, pp. 491–494.
with deep convolutional neural networks,” in Proc. Adv. Neural Inf. [26] M. Liu, H. Zhang, C. Wang, and F. Wu, “Change detection of multilook
Process. Syst., 2012, pp. 1097–1105. polarimetric SAR images using heterogeneous clutter models,” IEEE
[3] O. Russakovsky et al., “ImageNet large scale visual recognition chal- Trans. Geosci. Remote Sens., vol. 52, no. 12, pp. 7483–7494, Dec. 2014.
lenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, 2014. [27] J. S. Lee, D. L. Schuler, R. H. Lang, and K. J. Ranson,
[4] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature “K-distribution for multi-look processed polarimetric SAR imagery,” in
hierarchies for accurate object detection and semantic segmentation,” Proc. Surf. Atmospheric Remote Sens., Technol., Data Anal. Interpre-
in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2014, tation, Int. Geosci. Remote Sens. Symp. (IGARSS), vol. 4. Aug. 1994,
pp. 580–587. pp. 2179–2181.
[5] P. H. O. Pinheiro and R. Collobert. (2013). “Recurrent convo- [28] Y. C. Tzeng and K. S. Chen, “A fuzzy neural network to SAR image
lutional neural networks for scene parsing.” [Online]. Available: classification,” IEEE Trans. Geosci. Remote Sens., vol. 36, no. 1,
http://arxiv.org/abs/1306.2795 pp. 301–307, Jan. 1998.
[6] G. E. Dahl, D. Yu, L. Deng, and A. Acero, “Context-dependent pre- [29] C. Chen, K.-S. Chen, and J.-S. Lee, “The use of fully polarimetric
trained deep neural networks for large-vocabulary speech recognition,” information for the fuzzy neural classification of SAR images,” IEEE
IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 30–42, Trans. Geosci. Remote Sens., vol. 41, no. 9, pp. 2089–2100, Sep. 2003.
Jan. 2012. [30] C. Lardeux et al., “Support vector machine for multifrequency SAR
[7] L. Deng, D. Yu, and J. Platt, “Scalable stacking and learning for building polarimetric data classification,” IEEE Trans. Geosci. Remote Sens.,
deep architectures,” in Proc. IEEE Int. Conf. Acoust., Speech Signal vol. 47, no. 12, pp. 4143–4152, Dec. 2009.
Process. (ICASSP), Mar. 2012, pp. 2133–2136. [31] S. Fukuda and H. Hirosawa, “Support vector machine classification of
[8] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and land cover: Application to polarimetric SAR data,” in Proc. IEEE Int.
P. Kuksa, “Natural language processing (almost) from scratch,” J. Mach. Geosci. Remote Sens. Symp. (IGARSS), vol. 1. Jul. 2001, pp. 187–189.
Learn. Res., vol. 12, pp. 2493–2537, Nov. 2011. [32] Q. Lv, Y. Dou, X. Niu, J. Xu, J. Xu, and F. Xia, “Urban land use and
[9] K. Simonyan and A. Zisserman. (2014). “Very deep convolutional land cover classification using remotely sensed SAR data through deep
networks for large-scale image recognition.” [Online]. Available: belief networks,” J. Sensors, vol. 2015, Jul. 2015, Art. no. 538063.
http://arxiv.org/abs/1409.1556 [33] Y. Guo, S. Wang, C. Gao, D. Shi, D. Zhang, and B. Hou, “Wishart
[10] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and RBM based DBN for polarimetric synthetic radar data classification,”
A. L. Yuille. (2014). “Semantic image segmentation with deep in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Jul. 2015,
convolutional nets and fully connected CRFs.” [Online]. Available: pp. 1841–1844.
https://arxiv.org/abs/1412.7062 [34] F. Liu, L. Jiao, B. Hou, and S. Yang, “POL-SAR image classification
[11] L. Deng and D. Yu, “Deep convex net: A scalable architecture for speech based on Wishart DBN and local spatial information,” IEEE Trans.
pattern classification,” in Proc. Interspeech, 2011, pp. 2285–2288. Geosci. Remote Sens., vol. 54, no. 6, pp. 3292–3308, Jun. 2016.
[12] L. Deng, X. He, and J. Gao, “Deep stacking networks for infor- [35] C. Wang, W. Yu, R. Wang, Y. Deng, F. Zhao, and Y. Lu, “Unsuper-
mation retrieval,” in Proc. IEEE Int. Conf. Acoust., Speech Signal vised classification based on non-negative eigenvalue decomposition
Process. (ICASSP), May 2013, pp. 3153–3157. and Wishart classifier,” IET Radar, Sonar Navigat., vol. 8, no. 8,
[13] E. Attema et al., “ENVISAT ASAR science and applications,” ESA Publ. pp. 957–964, Oct. 2014.
SP, vol. 1225, no. 1, p. 59, Nov. 1998. [36] W. Wang, J. Wang, S. Mao, and P. Wu, “Fast implementation of
[14] F. T. Ulaby and C. Elachi, Eds., Radar Polarimetry for Geoscience H/α-Wishart classification to polarimetric SAR images images,” in Proc.
Applications, vol. 1. Norwood, MA, USA: Artech House, 1990, p. 376. CIE Int. Conf. Radar, Oct. 2006, pp. 1–4.
[15] E. Pottier, “Dr. J. R. Huynen’s main contributions in the development [37] T. L. Ainsworth, J. P. Kelly, and J.-S. Lee, “Classification comparisons
of polarimetric radar techniques and how the ‘radar targets phenomeno- between dual-pol, compact polarimetric and quad-pol SAR imagery,”
logical concept’ becomes a theory,” Proc. SPIE, vol. 1748, pp. 72–85, ISPRS J. Photogram. Remote Sens., vol. 64, no. 5, pp. 464–471, 2009.
Feb. 1993. [38] P. Yu, A. K. Qin, and D. A. Clausi, “Unsupervised polarimetric
[16] S. R. Cloude and E. Pottier, “An entropy based classification scheme SAR image segmentation and classification using region growing with
for land applications of polarimetric SAR,” IEEE Trans. Geosci. Remote edge penalty,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 4,
Sens., vol. 35, no. 1, pp. 68–78, Jan. 1997. pp. 1302–1317, Apr. 2012.
Licheng Jiao (SM’89) received the B.S. degree Fang Liu was born in China, in 1990. She received
from Shanghai Jiaotong University, China, in 1982, the B.S. degree in information and computing
and the M.S. and Ph.D. degrees from Xi’an science from Henan University, Kaifeng, China,
Jiaotong University, Xi’an, China, in 1984 and 1990, in 2012. Since then, she has been taking suc-
respectively. Since 1992, he has been a Professor cessive post-graduate and doctoral programs with
with the School of Electronic Engineering, Xidian the Key Laboratory of Intelligent Perception and
University, Xi’an, where he is currently the Direc- Image Understanding of Ministry of Education,
tor of the Key Laboratory of Intelligent Perception Xidian University, Xi’an, China. Her research inter-
and Image Understanding of the Ministry of Edu- ests include deep learning, polarimetric sar image
cation of China, International Research Center of classification, and change detection in polarimetric
Intelligent Perception and Computation. His current sar images.
research interests include intelligent information processing, image processing,
machine learning, and pattern recognition. He is a member of the IEEE Xi’an
Section Execution Committee, the President of Computational Intelligence
Chapter, the IEEE Xi’an Section and IET Xi’an Network, the Chairman of
the awards and Recognition Committee, the Vice Board Chair-Person of the
Chinese Association of Artificial Intelligence, a Councilor of the Chinese
Institute of Electronics, a Committee Member of the Chinese Committee of
Neural Networks, and an Expert of the Academic Degrees Committee of the
State Council.

Cking Network For Fast POLSAR Image Classification

Uploaded by

Copyright:

Available Formats

You might also like

Cking Network For Fast POLSAR Image Classification

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Cking Network For Fast POLSAR Image Classification

Uploaded by

Copyright:

Available Formats

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 25, NO.

7, JULY 2016 3273

Wishart Deep Stacking Network for

property of POLSAR data, lots of machine learning methods

It should be noticed that the parameters of WN include

Algorithm 1 Training of WN error condition. That is to say, if the total error E of WN

IV. A D EEP M ODEL OF WN-W ISHART D EEP

A. Wishart Deep Stacking Network (W-DSN)

Fig. 4. (a) The Pauli RGB of Flevoland. (b) Groundtruth.

‘Buildings’ (with least training samples, i.e., 0.43%) terribly,

Fig. 9. Accuracies of W-DSN and DSN.

You might also like