
Statistical Feature Selection of Narrowband RCS Sequence

Based on Greedy Algorithm

Xuehui Lei, Xiongjun Fu*, Cai Wang, Meiguo Gao


Radar Research Laboratory, Beijing Institute of Technology, Beijing 100081, P.R. China
20905139@bit.edu.cn; fuxiongjun@bit.edu.cn

Abstract

The Greedy algorithm is introduced to select statistical features of narrowband RCS sequences, and a novel cost function for the Greedy algorithm is proposed. The experimental results demonstrate the rationality of the novel cost function and the effectiveness of the Greedy algorithm.

Keywords: greedy algorithm, cost function, feature selection, narrowband RCS sequence, target recognition

1. Introduction

The technology of space target recognition is one of the key technologies in the space target surveillance system, which is mainly used for detecting and tracking space targets and extracting their features. Space targets include space stations, satellites, debris, rocket bodies and so on; they can be recognized and catalogued according to their own features. It is meaningful to study space target recognition based on low resolution radar, since many low resolution radars are still in use today. Although only limited information about a target can be obtained by low resolution radar, the structural properties and movement information of the target are contained in its echoes. Statistical features extracted from the Radar Cross Section (RCS) sequence can therefore be used to distinguish different space targets.

The number of RCS sequence samples is very large and the samples are imperfect. In order to distinguish different targets in their feature space, as many features as possible are extracted, which leads to a high-dimensional original feature space for the RCS sequence. To decrease the computational complexity, eliminate redundant features and obtain good recognition results, it is necessary to select RCS features in this high dimensional feature space. A search method selects the features that are useful for target recognition according to a particular criterion. Search methods can be divided into two categories: globally optimal and locally optimal. The Exhaustion method and the B&B (Branch and Bound) method [1] are typical globally optimal search methods. The Exhaustion method finds the globally optimal feature combination by traversing all feature combinations, but if the dimension of the feature space is too high, it becomes time-consuming and may be infeasible. Although the computational load of the B&B method is less than that of the Exhaustion method, B&B requires the separation criterion to be monotonic; even if the separation criterion is monotonic in theory, the actual criterion obtained from limited samples may not be. The Genetic Algorithm (GA) [2], the Simulated Annealing (SA) algorithm [3] and the Greedy algorithm [4] are all locally optimal search methods. GA and SA depend strongly on their algorithm parameters. The Greedy algorithm is more efficient than both GA and SA, and for some problems it may still find the globally optimal feature combination.

The criterion used by the search method plays an important part in deciding the optimal feature combination. For target recognition, a separation criterion can be used to measure the performance of the selected feature combinations. Commonly used separation criteria include the inter-intra class distance, the class probability density function and the entropy function [1], etc. Following literature [5], the purity parameter is introduced to represent the degree of overlapping among different target feature combinations, and it can serve as a new class separation criterion. This paper adopts the Greedy algorithm to select optimized feature combinations of narrowband RCS sequences, which are then used for target recognition. A new cost function, the purity parameter $p$, is proposed and applied to statistical feature selection of narrowband RCS sequences in order to demonstrate its validity. In Section 5 the new cost function is compared with the other cost function, $D_{intra}/D_{inter}$, where $D_{intra}$ and $D_{inter}$ stand for the intra class distance and the inter class distance respectively. The target recognition results are given at the end.
2. RCS statistical features

There are many factors that affect the measurement of RCS, such as the target attitude, the radar wavelength and the polarization mode of the radar wave. After a period of observation, a sequence of RCS values is obtained, and the statistics of this sequence contain a certain amount of information about the target. The commonly used statistical features of an RCS sequence are listed below; a computational sketch of these features follows the list.

(1) Mean value: the arithmetic mean of the sequence $x$, i.e.,
$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$  (1)
where $N$ is the total number of elements in the sequence;

(2) Median value: the number in the middle position of the sorted sequence $x'$. If the total number $N$ is odd,
$x_{median} = x'_{(N+1)/2}$  (2)
and if $N$ is even,
$x_{median} = (x'_{N/2} + x'_{N/2+1})/2$  (3)

(3) Truncated mean value: the arithmetic mean of the sorted sequence $x'$ with the first $k$ and the last $k$ numbers truncated, i.e.,
$x_{tr} = \frac{1}{N-2k}\sum_{i=k+1}^{N-k} x'_i$  (4)
where $k$ is the number of data points truncated at each end;

(4) Minimum value: the minimal value of the sequence, i.e.,
$x_{\min} = \min_i\{x_i\}$  (5)

(5) Maximum value: the maximal value of the sequence, i.e.,
$x_{\max} = \max_i\{x_i\}$  (6)

(6) Extreme difference: the difference between the maximum value and the minimum value, i.e.,
$\Delta x = x_{\max} - x_{\min}$  (7)

(7) Standard deviation: also known as the root-mean-square deviation, i.e.,
$S = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}$  (8)

(8) Average absolute deviation:
$P_{abs} = \frac{1}{N}\sum_{i=1}^{N} |x_i - \bar{x}|$  (9)

(9) Standard mean difference:
$M_x = \frac{1}{N(1-2/\pi)}\sum_{i=1}^{N}\left(\frac{|x_i - \bar{x}|}{S} - \sqrt{2/\pi}\right)$  (10)

(10) $q$-order central moment:
$B_q = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^q$  (11)

(11) Clustering centre: the most frequently occurring value of the sequence $x$. Divide the sorted sequence $x'$ into several intervals; the clustering centre is the median value of the interval in which the most frequent value lies, i.e.,
$C_x = x_{\min} + \left(I_{Loc} - \tfrac{1}{2}\right)\frac{x_{\max} - x_{\min}}{N_{interval}}$  (12)
where $I_{Loc}$ is the index of the interval containing the most frequent value and $N_{interval}$ is the total number of intervals;

(12) Coefficient of variation: the ratio of a variation index to an average index of the sequence. The coefficient of standard variation is one such coefficient, i.e.,
$\delta = S/\bar{x}$  (13)

(13) Coefficient of skewness: a measure of the asymmetry of the sequence probability density function, i.e.,
$b_s = B_3 / B_2^{3/2}$  (14)
where $B_q\ (q = 2, 3)$ stands for the second-order central moment when $q = 2$ and the third-order central moment when $q = 3$;

(14) Coefficient of kurtosis: a measure of the peakedness of the sequence probability density function, i.e.,
$b_k = B_4 / B_2^{2}$  (15)
where $B_q\ (q = 2, 4)$ stands for the second-order central moment when $q = 2$ and the fourth-order central moment when $q = 4$.

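To make the feature definitions concrete, the following Python sketch computes the fourteen statistics of one RCS sequence according to Eqs. (1)-(15). It is only an illustrative implementation, not the authors' code; the truncation depth k and the number of histogram intervals are free parameters chosen here arbitrarily.

```python
import numpy as np

def rcs_statistical_features(x, k=5, n_intervals=10):
    """Illustrative computation of the 14 statistical features of an RCS
    sequence x (Eqs. (1)-(15)); k and n_intervals are assumed parameters."""
    x = np.asarray(x, dtype=float)
    N = x.size
    xs = np.sort(x)                                   # sorted sequence x'
    mean = x.mean()                                   # (1) mean value
    median = np.median(x)                             # (2) median value
    trunc_mean = xs[k:N - k].mean()                   # (3) truncated mean
    x_min, x_max = x.min(), x.max()                   # (4), (5)
    extreme_diff = x_max - x_min                      # (6) extreme difference
    std = x.std(ddof=1)                               # (7) standard deviation
    avg_abs_dev = np.abs(x - mean).mean()             # (8) average absolute deviation
    std_mean_diff = (np.abs(x - mean) / std - np.sqrt(2 / np.pi)).sum() \
        / (N * (1 - 2 / np.pi))                       # (9) standard mean difference
    B2 = ((x - mean) ** 2).mean()                     # (10) q-order central moments
    B3 = ((x - mean) ** 3).mean()                     #      (q = 2, 3, 4 shown)
    B4 = ((x - mean) ** 4).mean()
    # (11) clustering centre: midpoint of the most populated histogram interval
    counts, _ = np.histogram(x, bins=n_intervals)
    loc = np.argmax(counts)                           # 0-based I_Loc
    cluster_centre = x_min + (loc + 0.5) * (x_max - x_min) / n_intervals
    coeff_var = std / mean                            # (12) coefficient of variation
    skewness = B3 / B2 ** 1.5                         # (13) coefficient of skewness
    kurtosis = B4 / B2 ** 2                           # (14) coefficient of kurtosis
    return np.array([mean, median, trunc_mean, x_min, x_max, extreme_diff,
                     std, avg_abs_dev, std_mean_diff, B2, cluster_centre,
                     coeff_var, skewness, kurtosis])
```

Applying this function to each recorded RCS sequence yields the 14-dimensional feature vectors used in the experiments of Section 5.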



3. Greedy algorithm

The Greedy algorithm is an efficient search method and a good alternative for feature selection in a high dimensional feature space. The basic idea of the greedy algorithm for feature selection is to start from an original feature combination, randomly generate a new feature combination from the neighbourhood of the original solution, and calculate the variation of the cost function, $\Delta E$, between the two combinations: if $\Delta E < 0$ the new feature combination is accepted, and if $\Delta E \ge 0$ the original feature combination is retained.

The steps of the Greedy algorithm are as follows (a sketch in code is given after the list):

1. Determine the range of the dimension $m$ to be $[1, M]$, where $M$ is the total dimension of the high dimensional feature space. Generate $m$ integers within $[1, M]$ which represent the original feature combination $m_0$, and calculate the corresponding value of the cost function $E(m_0)$.

2. Regenerate $m'\ (1 \le m' \le M)$ integers within $[1, M]$ to form the new feature combination $m_1$, and calculate the corresponding value of the cost function $E(m_1)$. The variation between $E(m_1)$ and $E(m_0)$ is:
$\Delta E = E(m_1) - E(m_0)$  (16)

3. If $\Delta E \ge 0$, the original feature combination is retained; if $\Delta E < 0$, the new feature combination is accepted, and we set $m_0 = m_1$, $E(m_0) = E(m_1)$.

4. Repeat steps 2 and 3 a certain number of times.
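The Python sketch below implements the search loop described above. The cost function is passed in as a callable; the number of iterations and the way candidate combinations are drawn are assumptions made for illustration rather than details taken from the paper.

```python
import numpy as np

def greedy_feature_selection(cost_fn, M, n_iter=1000, rng=None):
    """Minimal sketch of the greedy search of Section 3.

    cost_fn : callable mapping a tuple of feature indices (subset of 1..M)
              to a scalar cost to be minimized.
    M       : total number of candidate features.
    n_iter  : assumed number of repetitions of steps 2-3.
    """
    rng = np.random.default_rng() if rng is None else rng

    def random_combination():
        # Draw a random dimension m' in [1, M], then m' distinct feature indices.
        m = rng.integers(1, M + 1)
        idx = rng.choice(np.arange(1, M + 1), size=m, replace=False)
        return tuple(sorted(int(i) for i in idx))

    current = random_combination()                    # original combination m0
    current_cost = cost_fn(current)                   # E(m0)
    for _ in range(n_iter):
        candidate = random_combination()              # new combination m1
        delta_e = cost_fn(candidate) - current_cost   # Eq. (16)
        if delta_e < 0:                               # accept only strict improvements
            current, current_cost = candidate, current_cost + delta_e
        # otherwise keep the original combination (delta_e >= 0)
    return current, current_cost
```

Here the candidate is drawn freely from all dimensions, matching step 2; a practical variant could instead restrict the candidate to a neighbourhood of the current combination.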

4. Cost function

The cost function in the Greedy algorithm plays an important role in obtaining the optimal feature combination, and different cost functions usually lead to different results. The intra-inter class distance ratio $D_{intra}/D_{inter}$ is commonly used as a separation criterion. For feature spaces of the same dimension, the optimal feature combination can be obtained according to this criterion, but it is not suitable for comparing feature combinations of different dimensions, because the criterion is monotone increasing as the feature dimension increases. In view of this, the purity parameter $p$, which represents the degree of overlapping among different feature combinations and is monotone decreasing as the feature dimension increases, is introduced as the new cost function of this paper. The value of the purity parameter is decided by the feature with the smallest degree of overlapping, so any feature combination that contains this feature has the same degree of overlapping. The optimal feature combination is the one that makes the value of the cost function the smallest. In the experiment section of this paper, we compare this new cost function with the other cost function, $D_{intra}/D_{inter}$, by selecting statistical features of real RCS sequences.

4.1. Intra class distance

The intra class distance stands for the degree of dispersion of the samples of the same class. Suppose that $N$ kinds of targets are to be recognized, and the $i$th sample of the $l$th kind of target is denoted as $x_{li}\ (l = 1, 2, \ldots, N;\ i = 1, 2, \ldots, N_l)$, where $N_l$ is the number of samples. Every sample $x_{li}$ is an $L$-dimensional vector, i.e., $x_{li} = \{x_{li}(1), x_{li}(2), \ldots, x_{li}(L)\}$. All these samples compose the sample set of the $l$th kind of target, $\Omega_l\ (l = 1, 2, \ldots, N)$, and all $N$ sample sets compose the whole sample set $\Omega$.

Different features may have different units, so it is necessary to normalize the value of every dimension. The translation-extreme difference transform is used to normalize the samples:
$x'_{li}(d) = \dfrac{x_{li}(d) - \min_{1 \le l \le N, 1 \le i \le N_l}\{x_{li}(d)\}}{\max_{1 \le l \le N, 1 \le i \le N_l}\{x_{li}(d)\} - \min_{1 \le l \le N, 1 \le i \le N_l}\{x_{li}(d)\}}, \quad d = 1, 2, \ldots, L$  (17)
where $x_{li}(d)$ is the $d$th dimension of the $i$th sample of the $l$th kind of target. The normalized samples, denoted as $x'_{li}\ (l = 1, 2, \ldots, N;\ i = 1, 2, \ldots, N_l)$, compose a new sample set $\Omega'_l$. The mean value of the samples of the $l$th kind of target is:
$m_l = \frac{1}{N_l}\sum_{i=1}^{N_l} x'_{li}, \quad (x'_{li} \in \Omega'_l,\ l = 1, 2, \ldots, N)$  (18)
The degree of dispersion of the samples of the $l$th kind of target is:
$D_l = \frac{1}{N_l}\sum_{i=1}^{N_l}(x'_{li} - m_l)^2, \quad (x'_{li} \in \Omega'_l,\ l = 1, 2, \ldots, N)$  (19)
and the total degree of dispersion of the whole sample set, namely the total within-class distance, is:
$D_{intra} = \sum_{l=1}^{N} D_l$  (20)

4.2. Inter class distance

The inter class distance stands for the degree of dispersion between samples of different classes. The inter class distance between the sample sets $\Omega'_l$ and $\Omega'_k$ is defined as:
$D_{lk} = (m_l - m_k)^2, \quad (l, k = 1, 2, \ldots, N,\ l \ne k)$  (21)
or
$D_{lk} = \frac{1}{2 N_l N_k}\sum_{i=1}^{N_l}\sum_{j=1}^{N_k}(x'_{li} - x'_{kj})^T (x'_{li} - x'_{kj}), \quad (l, k = 1, 2, \ldots, N,\ l \ne k)$  (22)
The total inter class distance is:
$D_{inter} = \sum_{l,k=1,\ l \ne k}^{N} D_{lk}$  (23)



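As an illustration of Eqs. (17)-(23), the following Python sketch normalizes the feature vectors and computes the $D_{intra}/D_{inter}$ cost for a given feature combination. The mean-based form of Eq. (21) is used for the inter class distance, and the data layout (a list of per-class feature matrices) is an assumption made for this example.

```python
import numpy as np

def intra_inter_cost(class_samples, feature_idx):
    """Sketch of the D_intra / D_inter cost function (Eqs. (17)-(23)).

    class_samples : list of arrays, one per target class, each of shape
                    (N_l, L), holding the L statistical features per sequence.
    feature_idx   : tuple of 1-based feature indices (the feature combination).
    """
    cols = np.array(feature_idx) - 1                  # 1-based -> 0-based columns
    X = [np.asarray(c, dtype=float)[:, cols] for c in class_samples]

    # Eq. (17): translation-extreme difference normalization over all samples.
    allx = np.vstack(X)
    lo, hi = allx.min(axis=0), allx.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)            # guard constant features
    X = [(c - lo) / span for c in X]

    means = [c.mean(axis=0) for c in X]               # Eq. (18)
    # Eqs. (19)-(20): total within-class distance.
    d_intra = sum(((c - m) ** 2).sum(axis=1).mean() for c, m in zip(X, means))
    # Eqs. (21), (23): total between-class distance from the class means.
    d_inter = sum(((means[l] - means[k]) ** 2).sum()
                  for l in range(len(X)) for k in range(len(X)) if l != k)
    return d_intra / d_inter
```

Used as the cost in the `greedy_feature_selection` sketch above, smaller values favour compact, well-separated classes; as noted in Section 4, however, the value of this criterion tends to grow with the number of selected features.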
4.3. Purity parameter $p$

The purity parameter represents the degree of overlapping among the features of different targets. Its value lies within $0 \sim 1$; the smaller the purity parameter, the smaller the degree of overlapping among the features of different targets.

Suppose the RCS sequences of $N$ kinds of targets have been obtained and their statistical features have been extracted. The degree of overlapping between the feature combination samples of the $k$th kind and those of the $l$th kind is:
$p_{k,l} = \frac{1}{N_k}\sum_{i=1}^{N_k} |{}_k\omega_i \cap {}_l\omega|$  (24)
where $\omega$ represents the feature combination, $N_k$ is the number of RCS sequences of the $k$th kind of target, ${}_k\omega_i$ is the $i$th value of the feature combination of the $k$th kind of RCS sequence, and ${}_l\omega$ stands for all the values of the feature combination of the $l$th kind of target. If $|{}_k\omega_i \cap {}_l\omega| = 1$, the $i$th value of the feature combination of the $k$th kind of RCS sequence overlaps with the values of the feature combination of the $l$th kind; if $|{}_k\omega_i \cap {}_l\omega| = 0$, there is no overlapping. The total degree of overlapping of the $k$th kind of RCS sequence is:
$p_k = \frac{1}{N-1}\sum_{l=1, l \ne k}^{N} p_{k,l}, \quad (k = 1, \ldots, N)$  (25)
For a feature combination $\omega$, the average degree of overlapping among the values of the different kinds of RCS sequence features is:
$p = \frac{1}{N}\sum_{k=1}^{N} p_k$  (26)
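A possible implementation of Eqs. (24)-(26) is sketched below. The overlap test is an assumption: a sample of class k is counted as overlapping class l if every selected feature value falls inside class l's value range, which reproduces the property that the purity of a combination is governed by its least-overlapping feature.

```python
import numpy as np

def purity_cost(class_samples, feature_idx):
    """Sketch of the purity parameter p (Eqs. (24)-(26)).

    class_samples : list of arrays, one per target class, shape (N_k, L).
    feature_idx   : tuple of 1-based feature indices.
    Overlap test (assumed): a class-k sample overlaps class l when each
    selected feature value lies inside class l's value range.
    """
    cols = np.array(feature_idx) - 1
    X = [np.asarray(c, dtype=float)[:, cols] for c in class_samples]
    N = len(X)
    ranges = [(c.min(axis=0), c.max(axis=0)) for c in X]    # per-class value ranges

    p_total = 0.0
    for k in range(N):
        p_k = 0.0
        for l in range(N):
            if l == k:
                continue
            lo, hi = ranges[l]
            # Eq. (24): fraction of class-k samples overlapping class l.
            inside = np.all((X[k] >= lo) & (X[k] <= hi), axis=1)
            p_k += inside.mean()
        p_total += p_k / (N - 1)                             # Eq. (25)
    return p_total / N                                       # Eq. (26)
```

Because adding features can only shrink the joint overlap region, this cost tends to decrease as the feature dimension grows, which matches the monotonicity stated in Section 4.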

5. Experiment and result analysis

In this section, an experiment on selecting statistical features of real RCS sequences is carried out and the results are given.

There are in total 238 narrowband RCS sequences of 13 space targets, and the 14 statistical features described in Section 2 have been extracted and saved for all 238 sequences. Every RCS sequence therefore corresponds to a 14-dimensional feature vector, giving 238 feature vectors in total; 161 of them are used as feature templates and 77 of them are used for testing. The Greedy algorithm with the two different cost functions, $D_{intra}/D_{inter}$ and $p$, is used to select the optimal feature combination for distinguishing the targets. The experiment is run 5 times for each cost function.

5.1. Experiment flow

The steps of a single run of the experiment are as follows (a sketch in code is given after Fig. 1):

(1) Initialize the index of dimension: $m = 1$;
(2) Randomly generate $m$ integers within the range $[1, 14]$ and save them into a matrix as a feature combination. Repeat this process $N_c\ (N_c = C_{14}^{m})$ times, then eliminate the repeated feature combinations;
(3) Set $m = m + 1$;
(4) Repeat steps (2) and (3) until $m = 14$; the matrix of randomly selected feature combinations is then complete;
(5) Calculate the value of the cost function for all the feature combinations in the matrix and choose the feature combination with the smallest value of the cost function.

The flow chart of a single run of the experiment is shown in Fig. 1.

[Figure 1: flow chart of a single experiment run, from START (initialize m = 1), through generating and saving m integers N_c times and incrementing m until m = 14, to calculating the cost function of all feature combinations and choosing the one with the smallest value, then END.]

Fig. 1. Flow chart of a single experiment run.



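The experiment flow above can be sketched as follows: it enumerates random feature combinations for every dimension m, removes duplicates, and picks the combination with the smallest cost. The helper functions from the earlier sketches (`purity_cost`, `intra_inter_cost`) are reused, and reading $N_c = C_{14}^{m}$ as the number of random draws per dimension is an assumption for illustration.

```python
import numpy as np
from math import comb

def single_run(class_samples, cost_fn, n_features=14, rng=None):
    """Sketch of one run of the experiment flow in Section 5.1."""
    rng = np.random.default_rng() if rng is None else rng
    combos = set()
    for m in range(1, n_features + 1):                  # steps (1)-(4)
        n_c = comb(n_features, m)                       # assumed N_c = C(14, m)
        for _ in range(n_c):
            idx = rng.choice(np.arange(1, n_features + 1), size=m, replace=False)
            combos.add(tuple(sorted(int(i) for i in idx)))
    # step (5): evaluate the cost of every combination and keep the smallest.
    costs = {c: cost_fn(class_samples, c) for c in combos}
    best = min(costs, key=costs.get)
    return best, costs[best]
```

For example, `single_run(class_samples, purity_cost)` returns the feature combination with the smallest purity parameter for one run; repeating it five times for each cost function mirrors the structure of Tables 1 and 2.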
5.2. Results and analysis

With the selected optimal feature combination, the nearest-neighbour fuzzy classifier [6] is applied to the 77 narrowband RCS test sequences. The target recognition results, together with the corresponding feature combinations, are shown in Table 1 and Table 2.

Table 1. Optimal Feature Combination of the 1st Cost Function and Results of Target Recognition

    Feature Combination              Recognition Rate
    12                               27.27%
    1 12                             49.35%
    1                                28.57%
    12                               27.27%
    1 12                             49.35%

Table 2. Optimal Feature Combination of the 2nd Cost Function and Results of Target Recognition

    Feature Combination              Recognition Rate
    1 2 4 7 8 9 10 11 12 14          62.34%
    1 2 4 7 8 9 10 11 12 13 14       58.44%
    2 3 4 7 8 9 10 11 12 14          62.34%
    2 3 4 7 8 9 10 11 12 13 14       58.44%
    1 2 4 7 8 9 10 11 12 14          62.34%

Comparing the target recognition results in Table 1 with those in Table 2, it is obvious that the cost function $p$ outperforms $D_{intra}/D_{inter}$. The feature combination obtained through the Greedy algorithm is suboptimal, since the highest recognition rate, 72.72%, is obtained through the Exhaustion method.
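For completeness, the sketch below shows how a selected feature combination, the 161 template vectors and the 77 test vectors could be combined with a plain nearest-neighbour rule. The paper itself uses the nearest-neighbour fuzzy classifier of [6], whose membership computation is not reproduced here; this stand-in only illustrates the data flow.

```python
import numpy as np

def nearest_neighbour_recognition(templates, template_labels, tests, feature_idx):
    """Plain nearest-neighbour stand-in (not the fuzzy classifier of [6]).

    templates       : array (161, 14) of template feature vectors.
    template_labels : array (161,) of target labels.
    tests           : array (77, 14) of test feature vectors.
    feature_idx     : selected feature combination (1-based indices).
    """
    cols = np.array(feature_idx) - 1
    T = np.asarray(templates, dtype=float)[:, cols]
    Q = np.asarray(tests, dtype=float)[:, cols]
    # Assign each test vector the label of its closest template (Euclidean distance).
    d = np.linalg.norm(Q[:, None, :] - T[None, :, :], axis=2)
    return np.asarray(template_labels)[np.argmin(d, axis=1)]
```

In practice the features would be normalized as in Eq. (17) before computing the distances.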
6. Conclusions

In this paper we aim to find a proper cost function for selecting an optimal or suboptimal feature combination in a high dimensional feature space. The efficient Greedy search algorithm is adopted to select the optimal/suboptimal statistical feature combination of narrowband RCS sequences, and the selected feature combination is used for space target recognition. Although a narrowband RCS sequence contains only limited information about a space target, the Greedy algorithm with the cost function $p$ can select a suboptimal feature combination that distinguishes different space targets and obtains good recognition results. This new cost function can also be used to select features in other applications.

References

[1] Zhaoqi Bian, Xuegong Zhang, Pattern Recognition (2nd Edition), Tsinghua University Press: Beijing, 2000.
[2] KuanChieh Huang, YauHwang Kuo, ICheng Yeh, "A Novel Fitness Function in Genetic Algorithms to Optimize Neural Networks for Imbalanced Data Sets", 8th International Conference on Intelligent Systems Design and Applications, 2008, pp. 647-650.
[3] Hai Deng, "Polyphase Code Design for Orthogonal Netted Radar Systems", IEEE Transactions on Signal Processing, Vol. 52, No. 11, November 2004.
[4] Joonmin Gil, Chanmyung Kim, Younhee Han, "A Greedy Algorithm to Extend the Lifetime of Randomly Deployed Directional Sensor Networks", Proceedings of the 5th International Conference on Ubiquitous Information Technologies and Applications (CUTE 2010), 2010, pp. 1-5.
[5] Jiawei Gao, Research and Applications of Classification Algorithms in Imbalanced Data Sets, Shanxi University: Shanxi, 2008.
[6] Xiankang Liu, Meiguo Gao, Xiongjun Fu, "A Nearest Neighbour Fuzzy Classifier for Radar Target Recognition Using Combined Features", 8th International Conference on Signal Processing, Vol. 3, 2006.



