Torrione 2002 Masters
LANDMINE DETECTION
by
Date:
Approved:
Contents
List of Tables
List of Figures
1 Introduction
2 Background
2.2 Data Collection
4.2 Basis Estimation
5.1 Decay Rates
5.2 Estimation Procedure
Bibliography
List of Tables
2.1
List of Figures
3.3 Typical quadrature background measurements corrupted by some multiplicative constant, or some additive term which increases with frequency
Plots of the Cramer-Rao lower bound, calculated, and sample estimator variances versus the standard deviation of k. Parameters: $b_i = 10$, $\sigma_n^2 = 1$.
Estimated landmine decay rates plotted against $\omega_1$ and $\omega_2$ in Hz (closeup). Each landmine type is represented by a different shape. Note the high degree of spatial correlation between landmines of each type.
Gaussian PDF contours with scattered landmine and clutter decay rates
Receiver operating characteristics of non-rejecting support vector machines trained on decay rates, matched subspace outputs, and full signal responses operating in the calibration grid
Chapter 1
Introduction
Although estimates vary, agencies including the Red Cross and the United Nations
concede that there are between 60 and 70 million active landmines in the ground,
buried across 70 countries around the globe. Every year approximately 26,000 people
are maimed or killed by landmines and 8,000 to 10,000 of these victims are children
[1].
Currently, there are approximately 340 different models of anti-personnel landmines. Although these landmines cost as little as three dollars to produce, their
presence inflicts a tremendous cost - especially in developing areas. Firstly, the
cost to safely detect and remove each landmine can range between $300 and $1000.
Furthermore, many surviving landmine victims require prosthetic limbs. These
artificial limbs can cost between $100 and $3000, and they must be regularly replaced
(every 3-5 years in adults, and every 6 months in children) [1]. It is impossible to
measure the damage landmines inflict upon productivity, emotional well being, and
the peaceful reconciliation of neighbors after years of war.
As of 2002, the landmine crisis primarily affects poorer countries for which the
economic impact of landmines is especially devastating. There are an estimated 22.5
million landmines in Egypt, 16 million in Iran, and 10 million in Iraq, to list only
some of the most egregiously affected countries [1].
The primary contributor to the large cost of landmine removal is a high false alarm
rate stemming from large amounts of anthropic clutter that pervades minefields. Until
it is excavated and determined to not pose a threat, this clutter must be considered
as dangerous as an actual landmine. On occasion, false alarm rates as high as 95%
A large body of literature exists dealing with the applications of EMI sensors and
processing of EMI data to the detection of buried landmines and unexploded ordnance
(UXO). Some of this work has focused on determining the EMI responses from rotationally symmetric bodies [11, 12] and development of simplified phenomenological
models to fit such responses [13]. Several researchers have applied
time-domain EMI responses to landmine detection using estimated decay rates
[14, 15, 16, 17, 18, 19]. Other work by Won et al. has indicated that the wideband
EMI spectral responses from different landmines are unique [20]. Gao et al. have
derived the complicated optimal wideband EMI detector and have compared its results to sub-optimal detectors [21, 22]. Additional signal processing research on the
detection and classification of low-metallic content landmines via EMI data has been
performed by Collins et al. [23, 24].
In this thesis, we build on this body of work in three ways. First, we will address
the problem of landmine response estimation via removal of the soil (background)
response and show that our proposed estimator achieves the Cramer-Rao lower bound under specific
statistical models of the received data. Second, we will apply the theory of matched
subspace detectors [25, 26, 27] to the detection and classification of landmines versus
clutter. Third, we will explore the possible applications of support vector machines
(SVMs) [28, 29, 30, 31, 32, 33] to the landmine detection problem.
The remainder of this thesis is organized as follows.
In chapter two we review some of the information fundamental to the rest of
the thesis. We begin with a brief overview of electromagnetic induction sensors, the
data collection procedure used, and the particular EMI sensor used in this study:
the GEM-3. This is followed by a review of the Cramer-Rao lower bound and some
linear algebra preliminaries to the matched subspace detector. A full treatment of
matched subspace detectors is given prior to discussing the derivation and properties
Chapter 2
Background
2.1
In 1831 Michael Faraday discovered that a changing magnetic field can
generate, or induce, a current in a nearby conductor. Building upon Faraday's work,
Maxwell formulated the four equations upon which all of classical electromagnetics is
based. The phenomenology associated with EMI sensors (like hobby metal detectors)
is based directly on these equations.
2.1.1
A standard EMI sensor has a primary coil, or transmitter coil, composed of wire
through which alternating current flows. This current flow generates a changing
magnetic field around the sensor that penetrates the ground. As Faraday noted, the
changing magnetic field from the transmitter coil induces current flow in the ground.
The current flowing through the earth (and any contaminants therein) generates
another magnetic field. Thus, it is possible to use a receiver coil to listen for the
magnetic field that results from the induced current flow in the earth. Of course, care
must be taken in the placement of and recording of measurements from the receiver
coil since the magnetic field of the transmitter will, in general, be much stronger than
the secondary field resulting from the earth's response. The magnitude and phase of
the measured wideband EMI responses can be used to discern the amount, type, and
shape of buried metal objects [20, 34, 35].
Although Maxwell's equations completely govern the responses of conducting materials in any shape and orientation, solving these equations for shapes of arbitrary
complexity is mathematically problematic.
It has been shown [12, 36] that the frequency-domain response of a buried highly
conducting object subject to EMI radiation can be modeled as:
$$H(\omega) = a + \sum_n \frac{b_n \omega}{\omega - j\omega_n} \tag{2.1}$$
Furthermore, the initial term $a$ has been shown to be non-zero only for ferrous
targets [37]. Similarly, the time-domain response of such a system has been shown
[14, 38, 39] to be the weighted sum of exponentials:
$$S(t) = a\,\delta(t) + \sum_n A_n e^{-\omega_n t} \tag{2.2}$$
where, since the real part of the pole $j\omega_n$ is negligible, $\omega_n$ is real. In practice, the actual
responses of buried targets are well approximated by the first few terms in each of
the above summations. The primary parameters of interest are often assumed to be
the first few decay rates: $\omega_1$ and $\omega_2$. A significant amount of work has focused on the
application of estimated decay rates to landmine detection [14, 15, 11, 19, 17, 18, 40].
For high metal-content objects, the primary decay rates are generally fairly small,
resulting in slowly decaying exponential responses. Such responses are relatively easy
to sample in the time-domain. However, for objects containing small amounts of
metal (like most modern landmines), the decay rate parameters are very large and
the resulting exponential signature decays very rapidly. This makes time-domain
measurement of the decay rates difficult due to the rate at which the signal decays.
In this work, a wideband frequency-domain EMI sensor is utilized. Since the responses of a wideband frequency-domain sensor are not time dependent, these sensors are
advantageous when measuring quickly decaying exponential signals.
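The two response models in this section can be explored numerically. The following is only a minimal sketch: it assumes the pole-expansion form common in the cited EMI literature, H(w) = a + sum_n b_n w / (w - j w_n) in the frequency domain and sum_n A_n exp(-w_n t) for the exponential part of the time-domain response. The function names and the two-term target parameters are hypothetical, chosen purely for illustration and not taken from measured data.

```python
import math

def emi_frequency_response(omega, a, b, omega_n):
    """Pole-expansion response: a + sum_n b_n * omega / (omega - 1j * omega_n)."""
    return a + sum(bn * omega / (omega - 1j * wn) for bn, wn in zip(b, omega_n))

def emi_time_response(t, amplitudes, omega_n):
    """Exponential part of the time response: sum_n A_n * exp(-omega_n * t)."""
    return sum(An * math.exp(-wn * t) for An, wn in zip(amplitudes, omega_n))

# Hypothetical two-term target (illustrative values only, not measured data).
b = [1.0, 0.5]
omega_n = [2 * math.pi * 3000.0, 2 * math.pi * 20000.0]  # decay rates in rad/s

# Evaluate the model at the ten frequencies (Hz) used with the GEM-3 in this work.
gem3_hz = [750, 1410, 2370, 4050, 6030, 8250, 10890, 14430, 19450, 23970]
response = [emi_frequency_response(2 * math.pi * f, 0.0, b, omega_n) for f in gem3_hz]
for f, h in zip(gem3_hz, response):
    print(f"{f:6d} Hz  in-phase={h.real:+.4f}  quadrature={h.imag:+.4f}")
```

Under this assumed model, a larger decay rate gives a time-domain term that vanishes sooner, while the frequency-domain samples remain well defined. This is the practical argument for a frequency-domain sensor made above.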
2.1.2
In this work, data from a Geophex GEM-3 sensor was used [41]. This section describes
the GEM-3 sensor.
The GEM-3 is a wideband digital electromagnetic sensor weighing about 10
pounds. The sensor head of the GEM-3 consists of three concentric coils. The
inner coil is the receiver coil, and the two outer coils comprise the transmitter coil.
The combination of the magnetic fields induced by the outer coils creates a magnetic
cavity (area with zero magnetic field) at the receiving coil. This prevents interference
between the transmitted and induced magnetic fields [42].
When operating as a wideband sensor the GEM-3 prompts for a set of frequencies
at which to collect the induced EMI response. The GEM-3 can operate at frequencies
between 30 Hz and 24 kHz. In this work, the GEM-3 was programmed to collect data
at the following ten frequencies:
750 1410 2370 4050 6030 8250 10890 14430 19450 23970 Hz
A sensor that operates at multiple frequencies has the advantage of being able
to see at multiple depths into the medium, since a low-frequency signal will penetrate further into the medium than a high-frequency signal. It has been previously
shown that the GEM-3 performs significantly better at discriminating landmines
from clutter than several other sensors at blind, government-run test sites [43].
It has also been established that different types of landmines generate unique
frequency-domain signatures, which are relatively independent of target-sensor orientation and distance for high metal content objects [20, 44]. However, the signatures
are dependent on target-sensor orientation and distance if the object's metal content
is low [43]. Recent work has also shown that these signatures change when the objects
are buried [43]. The goal of this research is to develop algorithms that reduce the effects of the soil on the measured signal and maximize the detection and classification
of landmines using their frequency-domain EMI signatures.
2.2
Data Collection
The GEM-3 data used in this work was taken from a government test site in Virginia.
The site is segmented into a large (50 m x 20 m) grid consisting of squares measuring
1 meter per side. Before the site was used as a testing ground, all of the anthropic clutter
was systematically removed. Some clutter was subsequently replaced to
provide discrete opportunities for clutter-induced false alarms. At the center of each
1 m x 1 m grid square a landmine, a clutter item, or nothing is emplaced. Ground
truth, i.e. the object buried in each square, is sequestered for this area and is known
only to the government sponsor. A separate area measuring 25 meters by 5 meters
was designated for sensor calibration and algorithm testing. The ground truth for the
calibration section is available to the public so that algorithms can be tested prior to
application on the blind grid.
The calibration data used for algorithm training in this work was recorded from
various spots throughout the calibration grid. In all, 20 clutter responses and 27
landmine signatures from 12 different landmine types at varying depths were collected from the calibration lanes. Data from 980 potential targets was measured in
the blind grid. In the calibration lanes, where the ground truth is known, two background measurements were taken from either side of the center target location as
shown in Fig. 2.1. In the blind grid, measurements alternated between background
and potential targets at locations shown in Fig. 2.2. All of the central and background measurements were taken by human operators. Although the sensor height is
approximately constant across all measurements, variations are bound to exist due to
uneven ground, operator height and posture, and other factors. Thus, sensor height
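[Figure 2.1: background and target measurement locations in the calibration lanes; grid squares measure 1 m per side.]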
2.3
Estimating an unknown parameter from data is a research topic that has been studied
extensively [47, 27, 48]. In this section two standard approaches to parameter estimation and the Cramer-Rao bound which places limits on the best possible unbiased
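[Figure 2.2: background and potential-target measurement locations in the blind grid; grid squares measure 1 m per side.]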
Mine type     Number of measurements
VS-50         5
TS-50         3
M-14          3
M-14 (HE)     2
PMA-3         2
VAL69         1
VS-2.2        2
M-19          2
TMA-4         2
TM62P3        2
T-72          1
TM-46         1
VS1.6         1
is said to be a consistent
if E(|x)
= (where E represents the expected value).
2
(E(()
f (|x)
(2.3)
where:
$$f(\theta|x) = \frac{f(x|\theta) f(\theta)}{f(x)} \tag{2.4}$$
In maximum likelihood estimation one considers the density $f(x, \theta)$; the estimate $\hat{\theta}$
is chosen to maximize $f(x, \theta)$.
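$$\mathrm{VAR}[\hat{\theta}] \geq \frac{1}{J(\theta)} \tag{2.5}$$
(2.6)
$$J(\theta) = -E\left[\frac{\partial^2}{\partial \theta^2} \ln f(x|\theta) \,\Big|\, \theta\right] \tag{2.7}$$
2.4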
This thesis is primarily concerned with the detection of signals in noise. In this
section the optimal solution to the hypothesis testing problem - the likelihood ratio,
and a sub-optimal version of this test - the generalized likelihood ratio are reviewed.
2.4.1
In most binary decision problems, one has a set of data and wishes to determine
which of two separate distributions the data was drawn from. The two hypotheses
are generally termed $H_0$ and $H_1$, or the null and alternative hypotheses respectively.
The likelihood ratio is the optimal decision statistic for a wide range of decision
problems [48] and is defined as:
$$\Lambda(x) = \frac{p(x|H_1)}{p(x|H_0)} \gtrless \eta \tag{2.8}$$
The null hypothesis is accepted if $\Lambda(x)$ is less than a certain threshold, $\eta$; otherwise
the alternative hypothesis is accepted.
Determining the optimal threshold value to use depends on the performance criterion chosen. The two most commonly used performance criteria are the Neyman-Pearson criterion and the Bayes criterion [48].
2.4.2
The standard likelihood ratio test assumes that the conditional distributions of the
data under the two hypotheses are known. Often this assumption is invalid. When
the two probability density functions are not known or are difficult to estimate,
the Generalized Likelihood Ratio Test (GLRT) is often utilized. The GLRT is an
intuitive (although not optimal) mechanism by which to approach the problem of
unknown distributions in a two-hypothesis decision scenario. Consider again the two
probability density functions, except assume that some parameter, denoted $\theta$,
associated with the probability density function $p$ is unknown:
$$p(x|H_1) \rightarrow p(x|\theta, H_1) \tag{2.9}$$
$$p(x|H_0) \rightarrow p(x|\theta, H_0) \tag{2.10}$$
$$\Lambda(x) = \frac{\int p(x|\theta, H_1)\, p(\theta|H_1)\, d\theta}{\int p(x|\theta, H_0)\, p(\theta|H_0)\, d\theta} \tag{2.11}$$
In practice, the calculation of this integral is often difficult, or if $p(\theta|H_1)$ is unknown, impossible. One sub-optimal solution results from substituting estimates of
the unknown $\theta$ into the density functions. This formulation is termed the generalized
likelihood ratio test [48]:
$$\Lambda(x) = \frac{p(x|\theta, H_1)|_{\theta = \hat{\theta}_1}}{p(x|\theta, H_0)|_{\theta = \hat{\theta}_0}} \tag{2.12}$$
2.4.3
One simple and commonly encountered hypothesis testing problem involves determining the presence of a known signal s in the presence of additive zero-mean white
noise. In this case, the likelihood ratio reduces to a filter known as a correlation
detector or matched filter [48].
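Let $s$ and $n$ be length $i$ vectors consisting of the known signal and statistically
independent, $N(0, I\sigma_n^2)$ noise respectively. Consider a received data vector $x$. Under
the null and alternative hypotheses
$$H_0 : x = n$$
$$H_1 : x = s + n$$
The distributions of $x$ under $H_0$ and $H_1$ are:
$$f(x|H_0) = \prod_{j=1}^{i} \frac{1}{\sqrt{2\pi\sigma_n^2}} \exp\left(-\frac{x_j^2}{2\sigma_n^2}\right)$$
$$f(x|H_1) = \prod_{j=1}^{i} \frac{1}{\sqrt{2\pi\sigma_n^2}} \exp\left(-\frac{(x_j - s_j)^2}{2\sigma_n^2}\right)$$
The likelihood ratio is then:
$$\Lambda(x) = \prod_{j=1}^{i} \exp\left(\frac{2x_j s_j - s_j^2}{2\sigma_n^2}\right)$$
Taking the natural logarithm and incorporating the known values ($\sigma_n^2$, $s_j$) into the
threshold ($\eta$) yields:
$$\Lambda(x) = \sum_{j=1}^{i} x_j s_j \gtrless \eta$$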
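The correlation statistic above, the sum over j of x_j s_j, is straightforward to simulate. The sketch below is illustrative only: the signal vector, noise level, trial count, and function names are hypothetical choices, and it merely checks that the average output is near zero under the null hypothesis and near the signal energy under the alternative.

```python
import random

def matched_filter(x, s):
    """Correlation statistic: sum_j x_j * s_j."""
    return sum(xj * sj for xj, sj in zip(x, s))

def simulate(s, sigma_n, signal_present, trials, rng):
    """Average matched-filter output over many noisy measurements."""
    total = 0.0
    for _ in range(trials):
        noise = [rng.gauss(0.0, sigma_n) for _ in s]
        x = [n + sj if signal_present else n for n, sj in zip(noise, s)]
        total += matched_filter(x, s)
    return total / trials

rng = random.Random(0)             # fixed seed so the sketch is repeatable
s = [1.0, -1.0, 2.0, 0.5]          # hypothetical known signal
energy = matched_filter(s, s)      # signal energy s^T s
mean_h0 = simulate(s, 1.0, False, 2000, rng)
mean_h1 = simulate(s, 1.0, True, 2000, rng)
print(f"energy {energy}, mean under H0 {mean_h0:.3f}, mean under H1 {mean_h1:.3f}")
```

The gap between the two averages equals the signal energy, so a threshold placed between them separates the hypotheses; where exactly to place it depends on the performance criterion chosen.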
2.5
The common matched filter is a special case of a more general class of filters termed
matched subspace detectors [27]. Scharf's derivation of the matched subspace detectors (see [27, 25]) requires some linear algebra preliminaries which allow him to
show that the matched subspace detector has many interesting and powerful properties including invariance to rotations in certain subspaces and optimal performance
under certain assumptions. In this section the linear algebra associated with projection matrices (which are an integral part of matched subspace filters) is discussed. A
summary of Scharf's definitions of invariance and maximal invariant statistics (closely
following the discussion from [27]) is given, and finally, summaries of Scharf's application of these ideas to the development of the matched subspace filter and his proof
that the matched subspace detector is a uniformly most powerful test are provided.
2.5.1
(2.13)
has a solution. When the vectors $\{v_i\}$ are considered columns in a matrix H, the span
of $\{v_i\}$ is equivalent to the subspace denoted by <H>. The orthogonal complement
of <H> is denoted <H>⊥.
(2.14)
(2.15)
The most common orthogonal projection matrices are the Cartesian coordinate projections in $\mathbb{R}^2$:
$$P_x = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \tag{2.16}$$
$$P_y = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \tag{2.17}$$
which map a vector onto the x and y axes respectively. It is possible to generate an
orthogonal projection matrix onto any subspace <H> using the following formula:
$$P_H = H(H^H H)^{-1} H^H \tag{2.18}$$
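The projection formula can be checked numerically. The following is a minimal sketch that assumes a real-valued H with full column rank, so the Hermitian transpose reduces to the ordinary transpose; the helper names (`transpose`, `matmul`, `inverse`, `projection`) and the 3 x 2 example matrix are hypothetical and exist only for this illustration.

```python
def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse(A):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; H must have full column rank")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]

def projection(H):
    """P_H = H (H^T H)^{-1} H^T for a real matrix H given as a list of rows."""
    Ht = transpose(H)
    return matmul(matmul(H, inverse(matmul(Ht, H))), Ht)

Px = projection([[1.0], [0.0]])             # H spans the x-axis
H = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]    # hypothetical 3x2 basis matrix
P = projection(H)
print(Px)
print(matmul(P, H))                          # reproduces H up to rounding
```

Two properties are worth checking on any such matrix: applying P twice gives the same result as applying it once, and P leaves the columns of H unchanged.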
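For example, to form a projection onto the x-axis, the H vector is:
$$H = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \tag{2.19}$$
and:
$$P_H = H(H^H H)^{-1} H^H \tag{2.20}$$
$$= \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} = P_x \tag{2.21}$$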
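An orthogonal projection onto <H> maps vectors contained in the subspace <H>
to themselves, and maps vectors lying in <H>⊥ to the zero vector. This can be seen
using the Cartesian projections:
$$P_x \begin{bmatrix} c \\ 0 \end{bmatrix} = \begin{bmatrix} c \\ 0 \end{bmatrix} \tag{2.22}$$
$$P_x \begin{bmatrix} 0 \\ d \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \tag{2.23}$$
2.5.2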
In many decision problems, there are parameters associated with the probability
distribution functions of the measured signals which are considered nuisance parameters. In these cases it is desirable to reduce the set of viable decision rules to
those which are (in some sense) invariant to changes in the nuisance parameters.
As Scharf states:
This leads to the key idea behind invariance in hypothesis testing: When
presented with nuisance parameters that are extraneous to the hypothesis
test, look for transformations of the measured data that would introduce
these nuisance parameters and then look for a decision rule that is invariant to these transformations. [27] pg. 128
Consider the hypothesis testing problem of determining if X was drawn from
$F_1(x)$ or $F_0(x)$. If for every $g$ in $G$:
$$x : F_\theta(x) \tag{2.24}$$
$$y = g(x) \tag{2.25}$$
$$F_{\theta'}(y) = P[g(X) \leq y] \tag{2.26}$$
(2.27)
(that is, if the only effect of the function $g(x)$ on the distribution $F_\theta(x)$ is to change
the parameter from $\theta$ to $g(\theta)$) then the family of distributions for which equation
2.27 holds is said to be invariant to G. Also, if the transformation g maintains the
dichotomy between H1 and H0 , the hypothesis testing problem is said to be invariant
to G.
2.5.3
(2.28)
(2.29)
Thus, all invariant tests may be written as a function of a maximally invariant statistic
[27]:
$$\phi(x) = \phi(M(x)) \tag{2.30}$$
These results are important for the landmine detection problem because they
show that when deriving a decision rule for an invariant hypothesis testing problem,
it is possible to consider only functions of a maximal invariant statistic.
2.5.4
In this section a review of Scharf's work is presented which shows that the problem
statement leading up to the matched filter is naturally invariant to a set of transformations and that the matched subspace detector is a maximal invariant statistic.
Scharf's explanation of why the matched subspace detector is uniformly most powerful is also reviewed.
In a detection problem, the exact form of the signal of interest is often unknown.
The signal may be subject to an arbitrary gain, or it may be a random (unknown)
combination of a set of basis vectors. As has been previously noted, a vector x which
lies in the subspace <H> can always be represented by a linear combination of a set
of vectors comprising the matrix H. The signal x can then be represented as:
$$x = \sum_n \theta_n h_n = H\theta \tag{2.31}$$
(2.32)
(2.33)
where $Q_H$ is a rotation matrix in <H> and v lies in the subspace <A>. Note that
the rotation of v leaves v unchanged (since we are rotating in <H>), and $H\theta$ is
mapped to $H\theta'$. Let
$$y = Q_H(X + v) \tag{2.34}$$
$$y : N[\mu H\theta' + v, \sigma^2 I] \tag{2.35}$$
The hypothesis test is then to discern between the null hypothesis ($\mu = 0$) and
the alternative ($\mu > 0$). As mentioned above, since $Q_H$ and v are unknown, they are
considered nuisance parameters and the matched subspace detector should ideally be
invariant to them. To show that the matched subspace detector is uniformly most
powerful, Scharf shows that the distribution of y is invariant to these parameters,
the matched subspace detector is a maximal invariant statistic, and the matched
subspace detector has a monotone likelihood ratio.
It can be shown that the hypothesis testing problem in this case is invariant to
the set of functions
$$G = \{g : g(y) = Q_H(y + w)\} \tag{2.36}$$
(2.37)
and the distribution of y is given by eq. 2.35. Note that the form of the distribution
has not changed (only the mean parameter has been altered), thus the distribution
of y is invariant to G. Also, since the transformation of the parameter ($\mu H\theta + v$) is:
$$g(\mu H\theta + v) = \mu H\theta' + v + w \tag{2.38}$$
(2.39)
$$g(H_1) = \mu H\theta' + v + w = H_1 \tag{2.40}$$
and
the dichotomy of the original parameter space is maintained, and the hypothesis
testing problem is G-invariant.
To show that the matched subspace statistic
$$\chi^2 = M(y) = y^T P_H y \tag{2.41}$$
is maximal invariant to the group G, Scharf shows that eq. 2.28 and 2.29 hold with:
$$g(y) = Q_H(y + v) \tag{2.42}$$
(2.43)
$$= (y + v)^T P_H (y + v) \tag{2.44}$$
$$= y^T P_H y \tag{2.45}$$
(2.46)
since $Q_H^T Q_H = I$:
note that the quadratic form involving $P_H$ is the energy of the vectors in the subspace
<H>. Since the energies of both $y_1$ and $y_2$ in the subspace <H> are the same, $y_2$
must be a rotation of $y_1$ and/or differ only in the subspace <A>. Thus:
$$y_1 = Q_H(y_2 + v) \tag{2.47}$$
random variable. By the Karlin-Rubin theorem, since all $\chi^2$ random variables have
monotone likelihood ratios, the $\chi^2$ test is uniformly most powerful [27].
In the above discussions, the variance of the noise ($\sigma^2$) has been assumed to be
known. If this is not the case, then the maximal invariant statistic becomes:
$$F = \frac{x^T P_H x}{x^T P_H^{\perp} x} \tag{2.48}$$
or
$$F = \frac{x^T P_H x}{x^T (I - P_H) x} \tag{2.49}$$
Furthermore, note that the constant false alarm rate matched filter can be described using a cosine statistic as [50]:
$$\cos^2 = \frac{x^T P_H x}{x^T x} \tag{2.50}$$
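The three statistics in this section all depend on the energy of the measurement inside the signal subspace. The sketch below is one possible way to compute them, not Scharf's implementation: it obtains the quadratic form x^T P_H x from an orthonormal basis built by Gram-Schmidt (equivalent to using the projection matrix for real data), and it uses the unnormalized ratios as written above. All names and the example vectors are hypothetical.

```python
import math

def orthonormal_basis(columns):
    """Gram-Schmidt on a list of real column vectors spanning <H>."""
    basis = []
    for v in columns:
        w = list(v)
        for q in basis:
            c = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > 1e-12:                      # skip linearly dependent columns
            basis.append([wi / norm for wi in w])
    return basis

def subspace_energy(x, basis):
    """x^T P_H x, computed as the sum of squared projections onto the basis."""
    return sum(sum(xi * qi for xi, qi in zip(x, q)) ** 2 for q in basis)

def matched_subspace_statistics(x, columns):
    basis = orthonormal_basis(columns)
    inside = subspace_energy(x, basis)        # x^T P_H x
    total = sum(xi * xi for xi in x)          # x^T x
    outside = total - inside                  # x^T (I - P_H) x
    return {
        "energy": inside,
        "F": inside / outside if outside > 0 else math.inf,
        "cos2": inside / total if total > 0 else 0.0,
    }

columns = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]  # hypothetical basis for the x-y plane
stats = matched_subspace_statistics([3.0, 4.0, 12.0], columns)
print(stats)
```

Scaling the measurement by any gain leaves `F` and `cos2` unchanged, which is the sense in which these statistics do not require the noise variance to be known.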
Although matched subspace detectors are significantly more complicated than the
special case of the matched filter, they provide a wide range of invariances and are
significantly more robust than matched filters when the signal of interest is not known
exactly, as is the case in the particular problem of landmine detection.
2.6
Support vector machines (SVMs) are a relatively new type of learning machine that
has many interesting properties [29, 32, 28, 31]. Support vector machines operate
by mapping the data of interest to a high dimensional space and generating a separating hyperplane in that space. The high dimensional separating hyperplane can
then be used for hypothesis testing. In this section, we describe the mathematics
associated with SVMs and review how they avoid the complexities usually associated
with decision making in a high dimensional space.
2.6.1
Assume that a set of training vectors $\{x_i\}$ are available which were drawn from some
probability density function $P(x, y)$ where $y \in Y : \{-1, 1\}$. Here, y represents the
classification of the training data into one of two sets or hypotheses. Let $y = -1$
correspond to $H_0$ and $y = 1$ correspond to $H_1$. Consider then the sets of training
data:
$$(x_1, y_1), \ldots, (x_N, y_N) \in \mathbb{R}^N \times Y \tag{2.51}$$
$$R[f] = \int l(f(x), y)\, dP(x, y) \tag{2.52}$$
However, since P (x, y) is generally unknown, this problem often cannot be solved
directly. In order to estimate the solution, one can minimize the empirical risk :
$$\hat{R}[f] = \sum_{i=1}^{n} l(f(x_i), y_i) \tag{2.53}$$
which is effectively an estimation of the risk function using only the data available
to us. It is important to note that since full knowledge of the distribution P is rarely
available, the function f which minimizes the empirical risk may tend to overfit and
yield a complicated and non-realistic decision boundary. To address this dilemma,
f can be restricted to functions whose complexity (as calculated from the Vapnik-Chervonenkis (VC) dimension) is low (see [29] and [28]).
(2.54)
(hyperplanes in some space), it can be shown [28] that the VC dimension is bounded
by the minimal distance from the hyperplane to a data point; this distance is called
the margin.
2.6.2
(2.55)
(2.56)
Now the obvious question arises: All we have done is increase the complexity of
the problem we are trying to solve. By mapping to a much higher space, doesn't
the curse of dimensionality ensure that the decision making process should be more
difficult? [29]
While it seems as if the problem has become more complicated, statistical learning
theory tells us that as long as the complexity of the decision surface remains low
learning in F may actually be easier than learning in $\mathbb{R}^N$. [29]
A simple example from [32] and [29] illustrates this point. Consider a set of data
distributed in $\mathbb{R}^2$. The goal in this example is to devise a decision rule to discern
between the two sets of data shown in figure 2.3. The decision boundary is shown by
the dashed line; note that it is non-linear. Consider the feature space mapping:
$$\Phi : (x_1, x_2) \mapsto (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2) \tag{2.57}$$
The same data transformed via the above mapping is re-plotted in fig. 2.4 where
the decision boundary is now a plane in $\mathbb{R}^3$ (in the form of eq. 2.56). The decision
boundary has been simplified by mapping the original data into a higher space. As
Muller et al. state:
All of the variability and richness that one needs to have a powerful
function class is then introduced by the mapping $\Phi$. [29]
where their function class is equivalent to the decision rule.
In the above example, the dimension of the space F was not large enough to be
of concern, but actual data sets of arbitrary dimension combined with mappings of
significant complexity often result in very large feature spaces which then become
impossible to manage [29]. However, for certain spaces F (and mappings $\Phi$) there
exist functions which allow one to compute scalar products of high dimensional vectors easily. Such functions are called kernel functions and are denoted k [29, 28]. For
example, in the mapping presented earlier the dot product between the mapped vectors is easily calculated without actually mapping into the higher dimensional space
[29]:
$$\Phi(x) \cdot \Phi(y) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)(y_1^2, \sqrt{2}\, y_1 y_2, y_2^2)^T = (x \cdot y)^2 = k(x, y) \tag{2.58}$$
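The kernel identity can be verified directly. This sketch assumes the standard second-order monomial map from the cited tutorial literature, sending (x1, x2) to (x1^2, sqrt(2) x1 x2, x2^2); the function names and the two test points are hypothetical.

```python
import math

def phi(x):
    """Feature map from R^2 to R^3: (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def kernel(x, y):
    """Polynomial kernel k(x, y) = (x . y)^2, evaluated in the input space."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, -1.0)     # arbitrary example points
explicit = dot(phi(x), phi(y))     # dot product computed in the feature space
implicit = kernel(x, y)            # same quantity without ever forming phi
print(explicit, implicit)
```

The two values agree up to floating-point rounding, while the kernel evaluation never constructs the three-dimensional vectors. For mappings into much larger spaces this saving is what makes the approach practical.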
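[Figures 2.3 and 2.4: the example data in the input space (axes X1, X2) and in the feature space (axes Z1, Z2, Z3).]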
Previous work, including the illustrative example above, has shown that in some cases
mapping data into higher dimensions may decrease the complexity of the data separation problem. Furthermore, kernel functions provide a tool to obtain dot products of
vectors in high-dimensional spaces without actually performing the high-dimensional
mapping. However, a technique for determining the optimal hyperplane so as to achieve
the best possible performance has not been presented. In order to find the optimal
hyperplane, the discussion given in [29] is reviewed.
Optimal performance, and thus the optimal hyperplane, can be found by minimizing the expected risk. Since the expected risk is generally unknown, the optimal
hyperplane is found by minimizing the upper bound on the expected risk via [28]:
$$R[f] \leq \hat{R}[f] + \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\delta}{4}}{n}} \tag{2.59}$$
i = 1, ..., n.
(2.60)
(2.61)
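subject to:
$$\alpha^T Y = 0 \tag{2.62}$$
(2.63)
where:
$$\mathbf{1}^T = [1, \ldots, 1] \tag{2.64}$$
$$\alpha^T = [\alpha_1, \ldots, \alpha_n] \tag{2.65}$$
$$w = \sum_{i=1}^{n} \alpha_i y_i \Phi(x_i) \tag{2.66}$$
$$D_{ij} = y_i y_j\, k(x_i, x_j) \tag{2.67}$$
$$f(x) = \mathrm{sign}\left[\sum_{i=1}^{n} y_i \alpha_i (\Phi(x) \cdot \Phi(x_i)) + b\right] \tag{2.68}$$
or:
$$f(x) = \mathrm{sign}\left[\sum_{i=1}^{n} y_i \alpha_i k(x, x_i) + b\right] \tag{2.69}$$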
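Once the multipliers and offset are known, the kernel form of the decision rule, the sign of the sum over i of y_i alpha_i k(x, x_i) plus b, takes only a few lines to evaluate. The sketch below covers the evaluation step alone, not the optimization that produces the multipliers. The support vectors, multipliers, offset, and function names are hypothetical, hand-picked so that both support vectors sit exactly on the margin of a linear-kernel machine.

```python
def linear_kernel(x, z):
    return sum(xi * zi for xi, zi in zip(x, z))

def svm_decision(x, support_vectors, labels, alphas, b, kernel=linear_kernel):
    """Return sign(sum_i y_i * alpha_i * k(x, x_i) + b) as +1 or -1 (ties go to +1)."""
    score = sum(y * a * kernel(x, sv)
                for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if score >= 0 else -1

# Hypothetical trained machine: two support vectors with hand-picked multipliers.
support_vectors = [(1.0, 1.0), (-1.0, -1.0)]
labels = [1, -1]
alphas = [0.25, 0.25]   # satisfy sum_i alpha_i * y_i = 0
b = 0.0

print(svm_decision((2.0, 0.0), support_vectors, labels, alphas, b))
print(svm_decision((-3.0, 1.0), support_vectors, labels, alphas, b))
```

Passing a different `kernel` callable changes the feature space without touching the rest of the rule, which mirrors the step from the explicit-mapping form to the kernel form.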
In the above discussion it is assumed that the training data available is perfectly
separable by a hyperplane in F. If this is not the case, a hyperplane that is a solution
to:
$$\max_{\alpha}\; \alpha^T \mathbf{1} - \frac{1}{2}\left[\alpha^T D \alpha + \frac{\alpha_{\max}^2}{C}\right] \tag{2.70}$$
Chapter 3
The Cramer-Rao Lower Bound
The response of the ground to wideband EMI sensors is a random vector b which
depends upon the makeup of the soil and the height of the sensor above the ground.
When measuring the EMI responses of buried targets in the earth, the variability in
the background response degrades our received signal. Thus, the measured response
from a buried M-14 landmine will differ significantly depending on the composition
of the soil under which the landmine is buried [43]. Since landmines are found
throughout the world in varying environments, background interference adversely
affects one's ability to define a robust non-adaptive decision algorithm.
One approach to reducing the effect of the background response is to take measurements near the potential target and use these measurements to estimate the
background signal at the target location. In this chapter we discuss several models of
the received background data and show that under certain assumptions the Cramer-Rao lower bound can be achieved by using the available background measurements to
remove an estimate of the background signature from the potential target location.
In the measurements from the site in Virginia, two background signals were taken
for each potential target (see figures 2.1 and 2.2). We will assume that the background
response at the site is constant over a distance of one meter. This allows us to
model the background response as constant over the potential target location and
two neighboring background measurements. The assumption that the background
response is constant is reasonable since the composition of the soil is not expected
to change substantially over one meter and it has been shown [43] that sensor drift
occurs over a longer time span than would be required to take EMI readings over a
[Figure 3.1: response versus log frequency.]
3.1
For each target we have three measurements from the GEM-3. They will be denoted
$s_i$ and are modeled as:
$$s_1 = n_1 + b \tag{3.1}$$
$$s_2 = n_2 + b + \mu r \tag{3.2}$$
$$s_3 = n_3 + b \tag{3.3}$$
where
$b$ is some unknown (but constant across the three measurements) vector representing the ground response,
$n_i$ is additive zero-mean white Gaussian noise [43],
$r$ is the response of a buried target, and
$\mu$ represents an arbitrary (non-negative) gain affecting the target response due to the target's depth beneath the ground and the sensor's height above the ground.
The hypothesis test will be to decide between $\mu > 0$ and $\mu = 0$. First, we are
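concerned with obtaining the best estimate of $b$ so that we can estimate $r$ via
$$\hat{r} = s_2 - \hat{b}, \tag{3.4}$$
where the background estimate is the mean of the two neighboring background measurements:
$$\hat{b} = \frac{s_1 + s_3}{2}. \tag{3.5}$$
This estimator is widely used in practice [43], but little analysis has been performed
to evaluate its statistical properties. First we must show that $\hat{b}$ is unbiased. This is
easily shown by:
$$E[\hat{b}] = E\left[\frac{s_1 + s_3}{2}\right] \tag{3.6}$$
$$= \frac{1}{2} E[n_1 + b + n_3 + b] \tag{3.7}$$
$$= b. \tag{3.8}$$
The variance of each element of the estimator is:
$$\mathrm{VAR}[\hat{b}_i] = E[(\hat{b}_i - b_i)^2] \tag{3.9}$$
$$= E[\hat{b}_i^2] - b_i^2 \tag{3.10}$$
$$= \frac{1}{4} E[(s_{1i} + s_{3i})^2] - b_i^2 \tag{3.11}$$
$$= \frac{1}{4} E[n_{1i}^2 + n_{3i}^2 + 4n_{1i}b_i + 4n_{3i}b_i + 2n_{1i}n_{3i} + 4b_i^2] - b_i^2 \tag{3.12}$$
$$= \frac{\sigma_n^2}{2} \tag{3.13}$$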
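The unbiasedness and variance results above can be checked by simulation. The following Monte Carlo sketch assumes the additive white Gaussian noise model for the two background measurements, s1 = n1 + b and s3 = n3 + b; the ground response values, noise level, trial count, and function name are hypothetical.

```python
import random
import statistics

def estimate_background(s1, s3):
    """Element-wise mean of the two background measurements."""
    return [(a + c) / 2.0 for a, c in zip(s1, s3)]

rng = random.Random(1)       # fixed seed so the sketch is repeatable
b_true = [10.0, -4.0, 2.5]   # hypothetical ground response at three frequencies
sigma_n = 1.0
trials = 20000

estimates = []
for _ in range(trials):
    s1 = [bi + rng.gauss(0.0, sigma_n) for bi in b_true]
    s3 = [bi + rng.gauss(0.0, sigma_n) for bi in b_true]
    estimates.append(estimate_background(s1, s3)[0])   # track the first element

sample_mean = statistics.fmean(estimates)
sample_var = statistics.pvariance(estimates, mu=b_true[0])
print(f"mean {sample_mean:.3f} (true {b_true[0]}), "
      f"variance {sample_var:.3f} (predicted {sigma_n ** 2 / 2})")
```

With these settings the sample mean should sit close to the true value and the sample variance close to half the noise variance, in line with the derivation.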
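To determine optimality, we must show that the variance of $\hat{b}_i$ achieves the CRLB
(eq. 2.5), using eq. 2.7 for the Fisher information. Since $s_1$ and $s_3$ are distributed as
$N(b, \sigma_n^2 I)$, we have:
$$J(b_i) = -E_{b_i}\left[\frac{\partial^2}{\partial b_i^2} \ln f(s_{1i}, s_{3i}|b_i)\right] \tag{3.14}$$
$$\ln f(s_{1i}, s_{3i}|b_i) = \ln(C) - \frac{(s_{1i} - b_i)^2 + (s_{3i} - b_i)^2}{2\sigma_n^2} \tag{3.15}$$
$$= \ln(C) - \frac{1}{2\sigma_n^2}(s_{1i}^2 - 2s_{1i}b_i + b_i^2 + s_{3i}^2 - 2s_{3i}b_i + b_i^2) \tag{3.16}$$
$$\frac{\partial}{\partial b_i} \ln f(s_{1i}, s_{3i}|b_i) = -\frac{1}{2\sigma_n^2}(-2s_{1i} + 2b_i - 2s_{3i} + 2b_i) \tag{3.17}$$
(3.18)
$$\frac{\partial^2}{\partial b_i^2} \ln f(s_{1i}, s_{3i}|b_i) = -\frac{2}{\sigma_n^2} \tag{3.19}$$
(3.20)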
[Figure 3.2: response versus log frequency.]
Figure 3.3: Typical quadrature background measurements corrupted by some multiplicative constant, or some additive term which increases with frequency
be subject to the same noise effects (additive, multiplicative, etc.). However, it is
unclear which statistical assumptions better model the background interference. For
completeness, we present the Cramer-Rao lower bound derivations for both cases. We
proceed to determine whether the previously posed estimator is still optimal when
the assumptions regarding the statistics of the noise are modified.
3.2
For the in-phase case we will model the extra interference as a random DC term $c_j$ with variance $\sigma_c^2$:
$$\mathbf{s}_1 = \mathbf{n}_1 + \mathbf{b} + c_1 \tag{3.21}$$
$$\mathbf{s}_2 = \mathbf{n}_2 + \mathbf{b} + \alpha\mathbf{r} + c_2 \tag{3.22}$$
$$\mathbf{s}_3 = \mathbf{n}_3 + \mathbf{b} + c_3 \tag{3.23}$$
We assume that the $c_j$ are distributed as $N(0, \sigma_c^2)$. Note that while the $\mathbf{b}$ and $\mathbf{n}$ vectors are functions of frequency, the DC terms $c_j$ are constant across frequency. Under these assumptions, the $\mathbf{s}_i$ are distributed $N(\mathbf{b}, I(\sigma_n^2 + \sigma_c^2))$. It is easy to show that the estimator $\hat{\mathbf{b}}$ is unbiased, and that its variance is $\sigma_{\hat{b}}^2 = \frac{\sigma_n^2 + \sigma_c^2}{2}$. From the distribution of $f(s_{1i}, s_{3i}|b_i)$, we can show that the form of the CRLB corresponding to equation 3.16 is:
$$\ln(f(s_{1i}, s_{3i}|b_i)) = \ln(C) - \frac{1}{2(\sigma_n^2 + \sigma_c^2)}[(s_{1i} - b_i)^2 + (s_{3i} - b_i)^2] \tag{3.24}$$
$$\frac{\partial^2}{\partial b_i^2} \ln(f(s_{1i}, s_{3i}|b_i)) = -\frac{2}{(\sigma_n^2 + \sigma_c^2)} \tag{3.25}$$
Multiplying by negative one and taking the inverse, we again find the CRLB equal to the variance of the estimator, and the estimator is thus optimal under the in-phase hypothesis.
3.3
This derivation is very similar to the in-phase model. In fact, the in-phase model of
an additive DC term is really a special case of the general additive vector encountered
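here. In this model, the extra interference is modeled as a vector $\mathbf{c}_j$ whose individual terms $c_{ji}$ have variance $\sigma_{c_i}^2$:
$$\mathbf{s}_1 = \mathbf{n}_1 + \mathbf{b} + \mathbf{c}_1 \tag{3.26}$$
$$\mathbf{s}_2 = \mathbf{n}_2 + \mathbf{b} + \alpha\mathbf{r} + \mathbf{c}_2 \tag{3.27}$$
$$\mathbf{s}_3 = \mathbf{n}_3 + \mathbf{b} + \mathbf{c}_3 \tag{3.28}$$
From the observed data, we can see that the variance of the $c_{ji}$ increases with frequency. We assume that the $c_{ji}$ are distributed as $N(0, \sigma_{c_i}^2)$. Let $\boldsymbol{\sigma}_c^2$ be the vector of $c_i$ variances.
$$\boldsymbol{\sigma}_c^2 = \left[\sigma_{c_1}^2 \;\; \sigma_{c_2}^2 \;\; \cdots \;\; \sigma_{c_N}^2\right]^T \tag{3.29}$$
The $\mathbf{c}_j$ vectors are distributed as $N(\mathbf{0}, I\boldsymbol{\sigma}_c^2)$. Under these assumptions, the $\mathbf{s}_j$ are distributed $N(\mathbf{b}, I(\sigma_n^2 + \boldsymbol{\sigma}_c^2))$. It is easy to show that the estimator $\hat{\mathbf{b}}$ is unbiased, and that its variance is $\sigma_{\hat{b}}^2 = \frac{\sigma_n^2 + \boldsymbol{\sigma}_c^2}{2}$. Following the same steps as before:
$$\ln(f(s_{1i}, s_{3i}|b_i)) = \ln(C) - \frac{(s_{1i} - b_i)^2 + (s_{3i} - b_i)^2}{2(\sigma_n^2 + \sigma_{c_i}^2)} \tag{3.30}$$
$$= \ln(C) - \frac{1}{2(\sigma_n^2 + \sigma_{c_i}^2)}(s_{1i}^2 - 2 s_{1i} b_i + b_i^2 + s_{3i}^2 - 2 s_{3i} b_i + b_i^2) \tag{3.31}$$
$$\frac{\partial}{\partial b_i} \ln(f(s_{1i}, s_{3i}|b_i)) = \frac{1}{(\sigma_n^2 + \sigma_{c_i}^2)}(s_{1i} + s_{3i} - 2 b_i) \tag{3.32}$$
$$\frac{\partial^2}{\partial b_i^2} \ln(f(s_{1i}, s_{3i}|b_i)) = -\frac{2}{(\sigma_n^2 + \sigma_{c_i}^2)} \tag{3.33}$$
$$J(b_i) = \frac{2}{(\sigma_n^2 + \sigma_{c_i}^2)} \tag{3.34}$$
$$\mathrm{CRLB}(b_i) = \frac{\sigma_n^2 + \sigma_{c_i}^2}{2} \tag{3.35}$$
3.4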
We now consider the quadrature case and assume that multiplicative Gaussian noise
is affecting the measured background signals. In this model, the multiplicative scaling
effects known to affect target responses are also assumed to affect the background
responses. This makes this model perhaps the most intuitively satisfying of all the
statistical models presented.
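The multiplicative noise terms affecting the background responses are denoted $k_j$ and are assumed to be distributed as $N(1, \sigma_k^2)$. The received signals are modeled as:
$$\mathbf{s}_1 = \mathbf{n}_1 + k_1\mathbf{b} \tag{3.36}$$
$$\mathbf{s}_2 = \mathbf{n}_2 + k_2\mathbf{b} + \alpha\mathbf{r} \tag{3.37}$$
$$\mathbf{s}_3 = \mathbf{n}_3 + k_3\mathbf{b} \tag{3.38}$$
Note that $\mathbf{s}_1$ and $\mathbf{s}_3$ are distributed $N(\mathbf{b}, (b^2\sigma_k^2 + \sigma_n^2)I)$. Furthermore, the estimator $\hat{\mathbf{b}} = \frac{\mathbf{s}_1 + \mathbf{s}_3}{2}$ is still unbiased.
Since the mean value ($b_i$) enters the signal distribution in the variance as well as the mean, the calculations are more complicated. Since we assume that the noise interference is white, we can consider the scalar equivalents of the pdf. The variance of $\hat{b}_i$ is given by:
$$\mathrm{VAR}[\hat{b}_i] = E[\hat{b}_i^2] - b_i^2 \tag{3.39}$$
$$= E\left[\left(\frac{s_{1i} + s_{3i}}{2}\right)^2\right] - b_i^2 \tag{3.40}$$
$$= \frac{1}{4} E[n_{1i}^2 + 2 n_{1i} k_{1i} b_i + 2 n_{1i} n_{3i} + 2 n_{1i} k_{3i} b_i + k_{1i}^2 b_i^2 + \tag{3.41}$$
$$2 k_{1i} b_i n_{3i} + 2 k_{1i} b_i^2 k_{3i} + n_{3i}^2 + 2 n_{3i} k_{3i} b_i + k_{3i}^2 b_i^2] - b_i^2 \tag{3.42}$$
which evaluates to $\sigma_{s_i}^2 / 2$, where
$$\sigma_{s_i}^2 = (b_i^2 \sigma_k^2 + \sigma_n^2) \tag{3.43}$$
To compute the CRLB, we start with
$$f(s_{1i}, s_{3i}|b_i) = \frac{1}{2\pi\sigma_{s_i}^2} \exp\left[-\frac{1}{2\sigma_{s_i}^2}((s_{1i} - b_i)^2 + (s_{3i} - b_i)^2)\right] \tag{3.44}$$
and apply equation 3.16. After taking the natural logarithm, the equation can be separated into two terms from the coefficient and exponential portions of equation 3.44:
$$\ln\left(\frac{1}{2\pi\sigma_{s_i}^2}\right) - \frac{1}{2\sigma_{s_i}^2}((s_{1i} - b_i)^2 + (s_{3i} - b_i)^2) \tag{3.45}$$
Differentiating twice with respect to $b_i$, noting that $\sigma_{s_i}^2$ itself depends on $b_i$, and taking the negative expectation (equations 3.46 through 3.48) gives the Fisher information
$$J(b_i) = \frac{2}{\sigma_{s_i}^2} + \frac{4 b_i^2 \sigma_k^4}{\sigma_{s_i}^4}$$
so that the CRLB is $1/J(b_i)$, or:
$$\frac{1}{2} \, \frac{\sigma_{s_i}^4}{2 b_i^2 \sigma_k^4 + \sigma_{s_i}^2} \tag{3.49}$$
Note that in this case, our estimator does not achieve the Cramer-Rao lower bound. In order to determine how close the variance of the proposed estimator is to the variance of the optimal estimator, consider the term:
$$2 b_i^2 \sigma_k^4 \tag{3.50}$$
in the denominator. Since this term differentiates the CRLB from the variance of the proposed estimator, as the term approaches zero, the difference between the variances becomes negligible.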
[Figure 3.4 plot: $\sigma_{\hat{b}}^2$ versus $\sigma_k$ (from 1e-5 to 0.3); legend: CRLB, Sample Variance, $(\sigma_k^2 b^2 + \sigma_n^2)/2$]
Figure 3.4: Plots of the Cramer-Rao lower bound, calculated, and sample estimator variances versus the standard deviation of $k$. Parameters: $b_i = 10$, $\sigma_n^2 = 1$.
To determine how well the proposed estimator performs compared to the CRLB, a set of data was generated under the proposed assumptions and the actual (sample) variance of the estimator was compared with the theoretically calculated variance of the estimator and the Cramer-Rao lower bound. Figure 3.4 shows the Cramer-Rao lower bound, the sample variance from a set of ten thousand data points, and the calculated variance of the estimator ($\sigma_{s_i}^2/2$). Note that the difference between the CRLB and the sample and computed variances is small, especially for small $\sigma_k^2$ values. In experiments, almost all estimated $\sigma_k^2$ values were found to be below 0.1 (except for the lowest frequency measurement which, due to near-zero average magnitude, had a high estimated $\sigma_k^2$). Thus, despite not achieving the CRLB, the proposed estimator is expected to perform well on this data set.
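The comparison described above can be sketched as a small Monte Carlo experiment. This is only an illustration under the stated multiplicative-noise model for a single frequency bin; the function name, the random seed, and the default value of the standard deviation of $k$ are assumptions made here, not details taken from the original experiment.

```python
import random
import statistics


def simulate_background_estimator(b_i=10.0, var_n=1.0, sigma_k=0.1,
                                  trials=10_000, seed=0):
    """Monte Carlo check of the estimator (s1 + s3) / 2 under multiplicative noise.

    Returns (sample variance, calculated variance, CRLB) for one frequency bin.
    """
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    sigma_n = var_n ** 0.5
    estimates = []
    for _ in range(trials):
        # s_j = n_j + k_j * b_i with k_j ~ N(1, sigma_k^2) and n_j ~ N(0, var_n)
        s1 = rng.gauss(0.0, sigma_n) + rng.gauss(1.0, sigma_k) * b_i
        s3 = rng.gauss(0.0, sigma_n) + rng.gauss(1.0, sigma_k) * b_i
        estimates.append(0.5 * (s1 + s3))
    sample_var = statistics.variance(estimates)
    var_s = b_i ** 2 * sigma_k ** 2 + var_n      # equation 3.43
    calculated_var = var_s / 2.0                 # variance of the estimator
    crlb = 0.5 * var_s ** 2 / (2.0 * b_i ** 2 * sigma_k ** 4 + var_s)  # eq. 3.49
    return sample_var, calculated_var, crlb
```

Sweeping `sigma_k` from near zero to 0.3 and plotting the three returned values against it would give curves of the kind summarized in Figure 3.4, with the gap between the CRLB and the calculated variance growing as `sigma_k` increases.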
We have shown that the intuitive estimation procedure that involves subtracting
the mean of the received background signals is optimal under three different assumptions regarding the underlying stochastic nature of the received signals:
1. if the signal is corrupted by additive white noise
2. if the signal is corrupted by additive white noise and a Gaussian-distributed
additive DC term (in-phase)
3. if the signal is corrupted by additive white noise and a Gaussian-distributed
additive vector (quadrature model 1)
and although not optimal, the intuitive procedure is a low-variance estimate when the
signal is corrupted by additive white noise and a Gaussian-distributed multiplicative
term (quadrature model 2). In the following chapters we will utilize the proposed
estimation technique to obtain estimates of the actual target responses for use in our
detection algorithms.
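As a minimal sketch of the background removal procedure summarized above, the following assumes each measurement is a list of real values over the same set of frequencies, with the first and third measurements taken over target-free ground. The function names are hypothetical.

```python
def estimate_background(s1, s3):
    """Element-wise mean of the two target-free measurements (equation 3.5)."""
    if len(s1) != len(s3):
        raise ValueError("measurements must cover the same frequencies")
    return [(a + c) / 2.0 for a, c in zip(s1, s3)]


def estimate_target_response(s1, s2, s3):
    """Subtract the estimated background from the middle measurement (equation 3.4)."""
    b_hat = estimate_background(s1, s3)
    if len(s2) != len(b_hat):
        raise ValueError("measurements must cover the same frequencies")
    return [m - b for m, b in zip(s2, b_hat)]
```

In-phase and quadrature components would each be passed through the same function separately, since the analysis in this chapter treats them under separate noise models.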
Chapter 4
Signal Processing Using Matched
Subspace Detectors
In chapter 3 we proposed an estimator of the background signal b which is an optimal
or low-variance estimator under several models of the underlying stochastic processes.
Using this estimator, we can now estimate the target response via
$$\hat{\mathbf{r}} = \mathbf{s}_2 - \hat{\mathbf{b}}. \tag{4.1}$$
Using this target response estimate, a detection algorithm that distinguishes between
landmines and clutter and between different landmine types can be developed. In this
section the application of matched subspace filters to correctly identify and classify
landmines is presented.
4.1
[Figure: in-phase and quadrature Response versus LogFrequency]
[Figure: in-phase and quadrature Response versus LogFrequency]
4.2
Basis Estimation
[Figure: mean, actual, and estimated M14 responses; Response versus LogFrequency]
4.3
The clutter present in the blind grid poses a unique problem to traditional subspace
detection techniques. Clutter is by nature difficult to classify (generally made up of
anthropic and natural conductors with an enormous range of sizes and shapes). Also,
the calibration data set contained only 20 clutter responses. One approach considered
was to model the clutter as a set of basis functions and have a clutter-detection
algorithm to compare against our landmine detection algorithm. However, attempts
to formulate a basis to model clutter are inherently limited since clutter is comprised
of an infinite set of possible shapes, sizes, and materials. Despite the wide range
of clutter which impedes most detection techniques, a matched subspace detector
should be somewhat naturally robust to clutter interference. Consider a piece of
random clutter whose response is some vector $x$. Our decision statistic is the cosine statistic (equation 2.50):
$$\lambda(x) = \frac{x P_H x'}{x x'}. \tag{4.2}$$
(4.3)
(4.4)
$$\lambda = \sum_i \lambda_i(x) \, p(H_i) \tag{4.5}$$
where $p(H_i)$ represents the a priori probability of mine type $i$. Since all mine types are considered equally likely a priori, this reduces to:
$$\lambda = \sum_i \lambda_i(x) \tag{4.6}$$
Equation 4.6 suggests implementing a bank of matched subspace filters and summing their outputs to form a decision statistic. However, this formulation also assumes that the distribution p(x|H0 ) is known, but in this work, the distribution of
clutter is unknown and difficult to estimate. As an illustrative example of the problems encountered when p(x|H0 ) is unknown, consider n matched subspace filters each
tuned to a specific landmine type. When a landmine response is presented to the
bank of filters, a typical set of outputs contains one large response coinciding with
the matched subspace filter tuned to that landmine type. When a clutter response
is fed to the same bank of filters, although no filter bank produces a particularly
large result, the clutter vector generates significant responses from several different
filter banks because the clutter model in the denominator which would normally offset the numerator is missing. That clutter induces significant responses from several
filter banks makes intuitive sense since all of the landmine responses, when taken
together, span a large subspace and clutter will undoubtedly have some energy in
the span of this space. For typical examples of the matched subspace filter bank
outputs for clutter and landmine data, see figure 4.4. Note that the sum of the outputs across filter banks for the input clutter vector is larger than the sum for the
landmine vector. In this case, a better (although sub-optimal) decision statistic than
the summation across the filter banks is the maximum value across the filter banks.
Although this technique is not equivalent to the Bayesian solution to the multiple
hypothesis test problem, the similarities are evident. The Bayesian solution to the
multiple hypothesis testing problem is to choose Hi such that Hi maximizes the a
posteriori probability p(Hi |x) [48].
[Figure 4.4 plot: Matched Subspace Outputs vs. Filter Bank for Landmine and Clutter Responses; x-axis: Filter Bank Number; legend: Landmine Filter Bank Outputs, sum = 1.0110; Clutter Filter Bank Outputs, sum = 1.2803]
Figure 4.4: Comparison of filter bank outputs resulting from landmine and clutter responses. Note that the sum across the filter banks from the clutter response is larger than from the landmine response.
A bank of matched subspace detectors was thus generated, with each filter tuned to a specific landmine type. The decision statistic chosen was the maximum value across the bank of filters.
$$\lambda = \max_i \lambda_i \tag{4.7}$$
Despite not being optimal, we shall see that the performance of this statistic is very
good. Furthermore, the maximum value across the filter banks provides an intuitive
method to perform landmine classification - the landmine type corresponding to the
largest filter bank output is considered our best guess of the underlying landmine type.
This is different from the maximum a posteriori Bayesian solution which chooses Hi
to maximize $p(H_i|x)$; here $H_i$ is chosen to maximize the percent energy of $x$ in $P_{H_i}$, which is an intuitive measure of $p(x|H_i)$.
Note that since equation 4.2 contains a normalization term in the denominator ($x x'$), the detector ensures that the maximum output from the detector is one regardless of the energy of the input vector. This is important in a bank of detectors since the numerator ($x P_H x'$) will very often produce a large result for a large input energy $x$.
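The filter bank and its maximum-value decision statistic can be sketched as follows. The sketch assumes each landmine type is represented by a single real basis vector (a rank-one subspace), in which case the cosine statistic reduces to a normalized squared inner product. The function names and the dictionary layout are illustrative assumptions.

```python
def cosine_statistic(x, basis):
    """Fraction of the energy of x lying along one basis vector.

    For a rank-one subspace spanned by `basis` (h), x P_H x' / (x x')
    reduces to (x . h)^2 / ((x . x)(h . h)).
    """
    xh = sum(a * b for a, b in zip(x, basis))
    xx = sum(a * a for a in x)
    hh = sum(b * b for b in basis)
    if xx == 0.0 or hh == 0.0:
        return 0.0  # no energy to project; treat as no match
    return (xh * xh) / (xx * hh)


def filter_bank_decision(x, bases):
    """Return (maximum output, winning label, all outputs) for a dict of bases."""
    outputs = {label: cosine_statistic(x, h) for label, h in bases.items()}
    best = max(outputs, key=outputs.get)
    return outputs[best], best, outputs
```

Because of the normalization, scaling the input by any non-zero gain leaves every output unchanged, which is the gain invariance discussed in the text. The winning label doubles as the landmine type estimate.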
The invariance that matched subspace detectors provide to gain occasionally has
some drawbacks. The primary drawback in this work stems from very low energy
clutter which often looks like deeply buried high-energy landmines. Consider the
VS-50 landmine (see fig. 4.1) which has a relatively high energy and rather flat frequency response. A substantial amount of low energy clutter also has a flat frequency
response. As a result, scaled low energy clutter often looks like a VS-50 landmine
to a matched subspace detector.
However, our prior knowledge regarding the depths at which landmines can be
buried leads us to conclude that very low energy flat signatures are not landmines
buried meters in the ground, rather they are small pieces of clutter. In this work we
assume that landmines will not be buried beyond their tactical depths. We further
assume that the distribution of landmine depths in the blind grid is uniform and commensurate with the depths found in the calibration grid. Under these assumptions,
we implemented an energy pre-screener that evaluates the energy of each potential
target vector to ensure that it is commensurate with the current filter bank landmine
type (within one order of magnitude from the lowest and highest energies from the
calibration grid for that particular landmine type). If the energy is within limits, the
subspace detector proceeds normally, otherwise that particular bank of the subspace
detector (wherever the input energy was found to be outside the reasonable range of
energies for that landmine type) is manually assigned a low output value.
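A minimal sketch of the energy pre-screener described above is given here. It assumes the lowest and highest calibration-grid energies for each landmine type are already available, and it uses a low output value of zero; both the data layout and that value are assumptions for illustration.

```python
def energy(x):
    """Sum of squared samples of a response vector."""
    return sum(v * v for v in x)


def prescreen_outputs(x, outputs, energy_ranges, low_value=0.0):
    """Suppress filter outputs whose landmine type is implausible for this energy.

    outputs:       {label: filter bank output for x}
    energy_ranges: {label: (lowest, highest) calibration energy for that type}
    A bank is kept only if the energy of x is within one order of magnitude
    of the calibration range for its landmine type.
    """
    e = energy(x)
    screened = {}
    for label, value in outputs.items():
        lowest, highest = energy_ranges[label]
        in_range = lowest / 10.0 <= e <= highest * 10.0
        screened[label] = value if in_range else low_value
    return screened
```

The screened outputs can then be passed to the maximum-value decision in place of the raw filter bank outputs.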
Besides discriminating between clutter and landmines, detection algorithms must
also discriminate between empty ground signatures and landmines. While the blind
grid contains several blank squares containing neither anthropic clutter nor landmines,
no such squares were measured in the calibration grid, so our detector may be subject
to false alarms caused by empty grid squares. We did not consider this a serious problem because background-corrected responses from blank grid squares should contain
very little energy and be automatically rejected by the energy pre-screener.
4.4
[Figure: ROC curves, Pd versus Pf; legend: Quadrature MSS ROC, InPhase MSS ROC]
[Figure: ROC curve, Pd versus Pf]
Chapter 5
Decay Rate Estimation
As discussed in chapter 1, a popular method of discriminating landmines from clutter
is through characteristic decay rate estimates. In this chapter we discuss why decay
rates may be useful for target identification and discrimination, the estimation procedure that has been utilized, the relative locations of poles from the calibration lanes,
and a simple method of discrimination using Gaussian probability density functions.
5.1
Decay Rates
The EMI responses of a highly conducting body are given by equations 2.1 and 2.2
which are repeated here for convenience.
$$H(\omega) = a + \sum_n \frac{\omega b_n}{\omega - j\omega_n} \tag{5.1}$$
$$S(t) = a\delta(t) + \sum_n A_n e^{-\omega_n t} \tag{5.2}$$
computational load required to estimate these parameters and the fact that decay
rates do not provide a sufficient statistic [15].
5.2
Estimation Procedure
In this work, we focused on estimating the two primary decay rates from our EMI data. In order to estimate $\omega_1$ and $\omega_2$, an objective function was generated to minimize the mean-square error between our estimated responses and the data. The MATLAB function FMINUNC (in the optimization toolbox) was then used to find the optimal five parameters to model each landmine. (Five parameters: DC term $a$, two gains $b_1$, $b_2$ and two decay rates $\omega_1$, $\omega_2$.)
Often (especially when modeling clutter), the algorithm used by FMINUNC could
not find potential solutions any significant distance from the initial values provided.
This may be due to a local minimum in the objective function near the initial guess.
In these cases (when the resulting parameters were deemed too close to the initial
guesses), the initial decay rates were varied over a wide range and the optimization
was carried out at each point. The resulting estimate with the lowest error was chosen
as the best estimate of the target decay rates.
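The estimation procedure can be sketched as below, with several caveats. The two-pole response form is an assumption about the frequency-domain model, a simple derivative-free compass search stands in for MATLAB's FMINUNC, and every pair of initial decay rates is tried rather than only those cases where the first fit stalls. All function names and default values are hypothetical.

```python
def emi_response(omega, a, b1, b2, w1, w2):
    """Two-pole EMI frequency response; this functional form is an assumption."""
    return a + b1 * omega / (omega - 1j * w1) + b2 * omega / (omega - 1j * w2)


def mean_square_error(params, omegas, data):
    """Mean-square error between the five-parameter model and complex data."""
    a, b1, b2, w1, w2 = params
    return sum(abs(emi_response(w, a, b1, b2, w1, w2) - d) ** 2
               for w, d in zip(omegas, data)) / len(data)


def compass_search(objective, start, rel_step=0.5, tol=1e-6, max_iter=500):
    """Derivative-free local search (a stand-in for FMINUNC, not its algorithm)."""
    best = list(start)
    best_err = objective(best)
    step = rel_step
    for _ in range(max_iter):
        improved = False
        for i in range(len(best)):
            for sign in (1.0, -1.0):
                trial = list(best)
                # scale each move by the magnitude of the parameter being varied
                trial[i] += sign * step * max(abs(best[i]), 1e-3)
                err = objective(trial)
                if err < best_err:
                    best, best_err, improved = trial, err, True
        if not improved:
            step *= 0.5  # shrink the pattern when no move helps
            if step < tol:
                break
    return best, best_err


def multi_start_fit(omegas, data, rate_guesses, gains=(0.0, 1.0, 1.0)):
    """Fit from every ordered pair of initial decay rates; keep the lowest error."""
    results = []
    for w1 in rate_guesses:
        for w2 in rate_guesses:
            if w2 <= w1:
                continue
            start = [gains[0], gains[1], gains[2], w1, w2]
            results.append(compass_search(
                lambda p: mean_square_error(p, omegas, data), start))
    return min(results, key=lambda r: r[1])
```

Restarting from a spread of initial decay rates trades computation for robustness to the local minima mentioned above, which matters most for clutter responses.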
The error in these models was very low across a wide range of mine energies.
Figures 5.1 and 5.2 show the parametrized fits to the data for one high-energy and
one low-energy landmine.
Since the responses modeled with the estimated decay rates approximate the actual responses well and the
estimated response shapes are highly correlated, it is intuitive to suppose that the
decay rates estimated from different responses from the same landmine type would be
clustered together to some degree. Such clustering would indicate that the estimated
decay rates are drawn from some target dependent distribution and could facilitate
the formulation of a detector based on them.
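[Figure 5.1 plot: quadrature and in-phase data with fitted responses versus LogFrequency; legend: Quadrature Data, Inphase Data, Quadrature Fit, Inphase Fit]
[Figure 5.2 plot: quadrature and in-phase data with fitted responses versus LogFrequency; legend: Quadrature Data, Inphase Data, Quadrature Fit, Inphase Fit]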
5.3
One of the simplest approaches to incorporate the estimated decay rates into a detection algorithm is to model their statistical distribution with a 2-Dimensional Gaussian
probability density function and generate detectors based on these PDFs. By combining the probability density functions for the different landmine types, a mixture of Gaussian densities is formed. For each cluster of decay rates (clusters do not necessarily represent all landmines of a given landmine type) the sample mean and variance were calculated using standard techniques. However, with so few data points for each landmine type, these estimates are suspect. For example, the calibration grid contains only one instance of certain landmines. These solitary landmines are clustered alone. To estimate the variance of their decay rate distribution functions, an estimate of the average decay rate variance across landmine types was used. Also, no attempt was made to generate estimated correlation matrices since there was rarely enough data to support a reliable estimate. Contours of some of the resulting estimated Gaussian distributions are shown in figure 5.6. The combination of the separate decay rate PDFs results in a mixture of Gaussian pdfs across the range of $\omega_i$ values.
[Figure 5.3 plot: estimated landmine decay rates; legend: VS50, TS50, M14, PMA3, VAL69, VS2.2, M19, TMA4, TM62P3, T72, TM46, V31.6]
Figure 5.3: Estimated landmine decay rates plotted against $\omega_1$ and $\omega_2$ in Hz. Each landmine type is represented by a different shape.
[Figure 5.4 plot: close-up of estimated landmine decay rates, $\omega_2$ versus $\omega_1$; legend: VS50, TS50, M14, PMA3, VAL69, VS2.2, M19, TMA4, TM62P3, T72, TM46, V31.6]
We assumed that the clutter decay rates were totally random (uniform across the range of frequencies) since we had little information to base any general clutter model upon. Under this assumption the optimal detector for each landmine type is a threshold on the mixture of Gaussian PDFs (or a monotonic function thereof). Since we have estimated the means and variances of the landmine clusters, this decision statistic is a GLRT. To make the detector capable of discerning between all landmine types and clutter, we followed the filter bank procedure outlined in Chapter 4. Thus, our results could be used to discriminate between landmine types by choosing the filter bank with the highest response to an estimated set of decay rates.
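A minimal sketch of this decay-rate detector follows. It assumes each cluster is a list of two-element decay rate pairs, fits an axis-aligned Gaussian (no correlation term, as in the text), and lets single-member clusters borrow the average variance of the other clusters. The names and data layout are hypothetical, and at least one cluster is assumed to have two or more members with non-zero spread.

```python
import math
import statistics


def fit_clusters(clusters):
    """Fit an axis-aligned 2-D Gaussian to each cluster of (w1, w2) decay rates."""
    fitted, variances = {}, []
    for label, points in clusters.items():
        mean = tuple(statistics.fmean(p[d] for p in points) for d in (0, 1))
        var = None
        if len(points) > 1:
            var = tuple(statistics.variance([p[d] for p in points]) for d in (0, 1))
            variances.append(var)
        fitted[label] = (mean, var)
    if not variances:
        raise ValueError("at least one cluster needs two or more members")
    # solitary landmines borrow the average variance across the other clusters
    avg_var = tuple(statistics.fmean(v[d] for v in variances) for d in (0, 1))
    return {label: (mean, var or avg_var) for label, (mean, var) in fitted.items()}


def gaussian_pdf(x, mean, var):
    """2-D Gaussian density with a diagonal covariance matrix."""
    norm = 2.0 * math.pi * math.sqrt(var[0] * var[1])
    expo = sum((x[d] - mean[d]) ** 2 / var[d] for d in (0, 1))
    return math.exp(-0.5 * expo) / norm


def classify_decay_rates(x, model):
    """Return (largest PDF value, label) across the bank of per-cluster PDFs."""
    scores = {label: gaussian_pdf(x, m, v) for label, (m, v) in model.items()}
    best = max(scores, key=scores.get)
    return scores[best], best
```

Thresholding the returned PDF value gives the mine-versus-clutter decision, and the returned label gives the landmine type estimate.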
[Figure 5.5 plot: estimated clutter decay rates]
Figure 5.5: Estimated clutter decay rates plotted against $\omega_1$ and $\omega_2$ in Hz. Note that the estimated decay rates for clutter objects are spread throughout a wide frequency range.
5.4
In this section we briefly discuss the ROC curves generated from the parameter based
detector discussed above. Figure 5.7 shows the ROC generated from the calibration
data.
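For reference, an empirical ROC of this kind can be computed by sweeping a threshold over the detector outputs. The sketch below assumes the detector outputs for landmines and clutter are available as two lists of numbers; the function names are hypothetical.

```python
def roc_curve(mine_scores, clutter_scores):
    """Sweep a threshold over all observed scores; return a list of (Pf, Pd)."""
    thresholds = sorted(set(mine_scores) | set(clutter_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        pd = sum(s >= t for s in mine_scores) / len(mine_scores)
        pf = sum(s >= t for s in clutter_scores) / len(clutter_scores)
        points.append((pf, pd))
    return points


def false_alarm_rate_at(points, target_pd):
    """Smallest Pf at which Pd reaches target_pd, or None if it never does."""
    candidates = [pf for pf, pd in points if pd >= target_pd]
    return min(candidates) if candidates else None
```

Operating points such as the false alarm rate needed for a given detection rate can then be read directly from the returned list.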
Note that the algorithm does not achieve a 95% detection rate until its false alarm rate approaches 35%, and the algorithm only achieves a 100% detection rate at a 45% false alarm rate. Furthermore, we have good reason to believe that the detector will not be robust in a situation with a large amount of clutter. This belief is based on the very small amount of training data with which the Gaussian distributions were generated (often using only one or two data points). Since this detector did not perform as well as the matched subspace detector, and since we have reason to doubt its robustness to unseen data, the blind grid results were not sent to the government sponsor to be scored. In Chapter 7 we discuss ways by which we might improve the detector prior to sending the results to be scored.
[Figure 5.6 plot: Gaussian PDF contours over decay rates; legend: Mines, Clutter]
Figure 5.6: Gaussian PDF contours with scattered landmine and clutter decay rates
[Figure 5.7 plot: Pd versus Pf]
Figure 5.7: ROC for Gaussian-PDF estimated decay rate-based detector operating
in the calibration grid.
Chapter 6
Support Vector Machine Algorithms
Support vector machines are statistical learning machines as discussed in Chapter 2.
The learning algorithm in a SVM requires the solution of the quadratic programming
problem which maximizes the margin between the optimal hyperplane and the data
set (see equation 6.1). This chapter outlines some of the uses and advantages of support vector machines before discussing the implementation of an SVM for landmine
detection.
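$$\max_{\boldsymbol{\alpha}} \; \boldsymbol{\alpha}^T \mathbf{1} - \frac{1}{2} \boldsymbol{\alpha}^T D \boldsymbol{\alpha} \tag{6.1}$$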
Support vector machines are versatile tools that can be applied to almost all
classification and detection problems. Their power stems from mapping observed
data to a high dimensional space through a kernel function. Their simplicity stems
from restricting the decision boundary to a hyperplane in the high dimensional space.
Support vector machines have been successfully used in digit recognition, object
recognition, function estimation, and time-series prediction (for a list of references,
see [29] and [30]).
As shown in figures 2.3 and 2.4, support vector machines can find linear solutions to decision problems that require non-linear solutions in the observed space.
Furthermore, by maximizing equation 6.1 one guarantees that the optimal linear
decision surface in the high dimensional space has been found.
6.1
In this work, support vector machines were trained on three sets of data: estimated
decay rates, estimated signal responses, and the outputs from the bank of matched
subspace detectors.
Originally, a support vector algorithm was created using MATLAB's QUADPROG function (available in the Optimization Toolbox). This function worked well
on small data sets, but performance was poor on large data sets and in cases where a
soft classifier was required to separate the data (if the SVM cannot perfectly separate
the calibration data using a hyperplane, the constraints on equation 6.1 need to be
relaxed, see equation 2.70). The performance degradation was probably due to the
lack of support vector machine specific optimizations in the QUADPROG function
(see [29] and [31] for a summary of numerical techniques for improving support vector
machine computation times). To reduce computation time, a support vector toolbox
was chosen that used the sequential minimal optimization technique (see the end of
chapter 2 for details and references).
6.2
One of the biggest concerns when creating a support vector machine is the choice
of kernel function. A significant body of work has been devoted to optimizing the
choice of kernel and kernel parameters (for references see [29]). Most of the techniques
available for kernel function selection are computationally intensive and require a
large number of data points to analyze.
In our work, we based the kernel selection solely on the model performance in the calibration grid. Two kernels were considered: Gaussian and polynomial. These kernels are shown in equations 6.2 and 6.3 respectively. The Gaussian kernel consistently out-performed the polynomial kernel and was chosen to act as our kernel function. The good performance of the Gaussian kernel is intuitively satisfying because the main metric in the Gaussian kernel is the 2-norm of the vectors and it is reasonable to suppose that the distance between our data vectors is a useful metric for discrimination [31].
$$k(x_1, x_2) = \exp\left(-\frac{|x_1 - x_2|^2}{c}\right) \tag{6.2}$$
$$k(x_1, x_2) = (x_1 \cdot x_2 + \theta)^d \tag{6.3}$$
The only parameter in a Gaussian kernel function is the variance term c. Three
different variances were used for the three different support vector machines (decay rates, landmine responses, and MSS outputs). Experiments were performed
by shifting the variance term across several orders of magnitude for each data set.
The resulting ROC curves were analyzed as were the decision boundaries each SVM
generated. Based on performance observations, the following $c$ values were chosen: $1 \times 10^7$, $1 \times 10^3$, and $1$ (corresponding to variance terms in the decay rates, full signal, and matched subspace detector based support vector machines respectively).
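The two kernels and the role of the variance term can be sketched as follows. The function names, and the parameter names `offset` and `degree` for the polynomial kernel, are chosen here for illustration.

```python
import math


def gaussian_kernel(x1, x2, c):
    """Gaussian kernel exp(-|x1 - x2|^2 / c), where c is the variance term."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-sq_dist / c)


def polynomial_kernel(x1, x2, offset, degree):
    """Polynomial kernel (x1 . x2 + offset)^degree."""
    return (sum(a * b for a, b in zip(x1, x2)) + offset) ** degree
```

The need for a different variance term per data set follows from the scale of each feature space. For example, two decay rate pairs that differ by 1000 in one coordinate give a Gaussian kernel value of about 0.905 with `c = 1e7`, but a value that underflows to zero with `c = 1`, in which case every training point would look unrelated to every other.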
To perform detection across a range of different landmine types, the winner takes
all methodology was utilized [33]. The winner takes all technique is akin to the filter
bank technique described in Chapter 5. The maximum result across the bank of SVMs
was chosen as the threshold value, and the landmine type corresponding to that filter
bank became our landmine type estimate.
To generate different SVMs corresponding to the different landmine types, the
landmines were clustered according to type, except that the two M-19 landmines
were separated from one another. The M-19 landmines required separation because
of their close proximity to a large number of clutter decay rates. As a result, the
decision boundary generated by the SVM was very counterintuitive when the M-19
mines were grouped together. Also, as with the matched subspace detector, the
M-14 mines containing the high explosive (HE) were separated from the other M-14
landmines.
Since the support vector machine is a learning algorithm, a shortage of training
data will result in poor generalization capability. To overcome the lack of training
data, each landmine response was estimated three separate ways - once with the mean
of the nearest background signals (as discussed in Chapter 3), and twice more by
using each of the neighboring background signals as our estimate of the background.
Although this estimation technique is non-optimal, a support vector machine has no
a priori information regarding the data being modeled, so a large amount of training
data must be available for the SVM to generalize well.
Furthermore, two versions of each of the three different support vector machines
were created. All of the SVMs were trained to reject clutter, but we were also
interested in how the SVMs would behave if the SVM for each landmine type was
trained to reject all other landmine types. These different SVMs will be referred to
as non-rejecting (SVMs that were not trained to reject other landmine types) and
rejecting (those that were trained to reject other landmine types).
Under the above parameters the SVMs were trained. In the next section we
discuss the results from these three support vector machines.
6.3
To better visualize the decision boundaries generated by our SVMs, the SVMs' decision contours were plotted along with the locations of nearby clutter and landmine decay rates (see Figure 6.1). Note the intricate curve of the support vector machine decision boundary, and especially the class two margin. We also note that one of the landmines being modeled falls outside the decision boundary. This is to be expected if a hyperplane in F cannot separate the data perfectly. We also note that the support vector machine may be slightly over-fitting the clutter which we believe is more uniformly distributed than the decision boundary would indicate.
[Figure 6.1 plot: SVM boundaries for one mine type with scattered mines and clutter; axes: decay rates $\omega_1$ and $\omega_2$]
Figure 6.1: Support Vector Machine decision boundaries for non-rejecting SVMs and relevant landmine and clutter parameter locations from the calibration grid.
The calibration grid ROC curves for the three different support vector machines
are shown in figures 6.2 and 6.3. Although the support vector
machines achieve a detection rate of at least 95% at 10% false alarm rate, several of
the support vector machines do not achieve 100% detection until high false alarm rates
have been encountered. Also, the receiver operating characteristics of the rejecting
and non-rejecting versions of the SVMs did not differ significantly in the calibration
data.
The blind grid results for these SVMs are shown in figure 6.4. The difference between the rejecting and non-rejecting versions of the SVMs was negligible in the blind grid, so only the results from the non-rejecting SVMs are shown.
[Figure 6.2 plot: Pd versus Pf]
Figure 6.2: Receiver operating characteristics of non-rejecting support vector machines trained on decay rates, matched subspace outputs, and full signal responses operating in the calibration grid
We note that the resulting ROC curves have long tails, i.e. they rise very
quickly to a probability of detection around 60% or 70%, but increase slowly beyond
that point. In fact, none of the support vector machines achieves a 100% detection
rate until their false alarm rate is above 99%.
The support vector machine which performed best was the SVM trained using
the entire signal. This makes sense since there is more information regarding the
true nature of a buried object in the full target response than in the outputs from
a bank of matched subspace filters or the estimated decay rates. Essentially, these
parameterizations of the signal do not provide a sufficient statistic.
[Figure 6.3 plot: Pd versus Pf; legend: Decay Rates, MSS, Full Signal]
[Figure 6.4 plot: Pd versus Pf; legend: Energy, SVM on MSS results, SVM on parameters, SVM on full signal]
Figure 6.4: Receiver operating characteristics for three different support vector
machines operating in the blind grid
lies in the subspace of an average M-14 landmine, the M-14 filter bank would register
a small amount of energy whenever a VS-50 landmine was present. However, since
the matched subspace detector's decision statistic is the maximum value across the
bins, it would not use this possible correlation. In that case, it might be possible to
train a support vector machine that would take into account the information in the
other filter banks that the matched subspace detector missed.
To summarize, if there were no overlap in the information extracted by the
matched subspace detector filter banks, the only important information in the filter bank outputs would be the maximum value across the banks. If there were
significant overlap between the filter banks, it might be reasonable to assume that
there was information in the remaining filter bank outputs that could be useful. The
poor performance of the SVM operating on the matched subspace detector outputs
suggests that there is little information to be gleaned from the outputs of the other
filter banks, and thus, little overlap among filter banks.
These results are discussed in more detail in chapter 7.
Chapter 7
Conclusions and Future Work
In the preceding chapters, several statistics-based methods of discriminating landmines from clutter using wideband EMI responses were discussed. From the government generated receiver operating characteristics, it is clear that significant improvement in landmine detection techniques is possible using statistical signal processing
algorithms.
All of the statistical algorithms discussed are based on mine responses estimated
with the intuitive background removal technique presented in chapter 3. A statistical
analysis of the estimation technique was presented, and the estimation procedure was
found to achieve the Cramer-Rao lower bound under several assumptions regarding
the underlying stochastic process. In the case of multiplicative Gaussian interference,
the proposed background subtraction technique does not achieve the CRLB but is
still a low variance estimator. One possible avenue for further work is to determine
the optimal or maximum likelihood estimator under the multiplicative interference
assumption.
In chapter 4 the application of matched subspace detectors to the landmine detection problem was explored. By modeling the landmine subspace as scaled versions
of a single basis function, the matched subspace detector's gain invariance was utilized. The matched subspace detector and support vector machine receiver operating
characteristics are shown in figure 7.1. Under the stated assumptions regarding the
underlying signals, the matched subspace detector is an optimal detector, so its dominance over the support vector machines reinforces our assumptions regarding the
nature of the received signals.
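As an illustration of gain invariance for a subspace spanned by a single basis function, the sketch below computes the fraction of a measurement's energy that lies along a basis vector. The normalized energy-ratio form used here is an assumption chosen for illustration, and the vectors h and y are hypothetical values; the statistic actually used is the one developed in chapter 4, which may differ in its normalization.

```python
def dot(a, b):
    """Inner product of two equal-length real sequences."""
    return sum(x * y for x, y in zip(a, b))


def rank_one_statistic(y, h):
    """Fraction of the energy of y lying along the basis vector h.

    This equals the squared cosine of the angle between y and h, so it is
    unchanged when y is multiplied by any nonzero gain.
    """
    energy = dot(y, y)
    if energy == 0.0:
        return 0.0
    return dot(y, h) ** 2 / (dot(h, h) * energy)


# Hypothetical basis function and measurement (illustrative values only).
h = [1.0, 0.5, 0.25, 0.125]
y = [2.0, 1.1, 0.4, 0.3]

s_original = rank_one_statistic(y, h)
s_scaled = rank_one_statistic([-7.5 * v for v in y], h)
print(abs(s_original - s_scaled) < 1e-12)
```

Because the unknown gain cancels in the ratio, a single basis function can represent the same landmine type across responses of different strength.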
[Figure 7.1: Receiver operating characteristics, Pd versus Pf. Curves: MSS; SVM on MSS results; SVM on parameters; SVM on full signal.]
threshold on the maximum value across the filter banks provides the excellent results
obtained by the matched subspace detector. It appears that the support vector
machine based on that data is over-complicating the decision surface and finding
a much more computationally intensive (yet sub-optimal) solution. This can be
explained through the lack of available data (the support vector machine is trying
to learn the correct decision surface and cannot do so with only 27 landmine data
points). Alternatively, the Gaussian kernel may not be the optimal kernel in the case
of matched subspace outputs. The performance of all the support vector machines
presented may also be improved by altering the variance parameter in our Gaussian
kernel function and by exploring alternative kernel functions. Future applications of
support vector machines to the problem of landmine detection should involve larger
training sets. Furthermore, a mathematical attempt to obtain the optimal kernel
function parameters should be explored.
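To make the role of the variance parameter concrete, the sketch below evaluates a Gaussian kernel at several widths. The parameterization exp(-||x - y||^2 / (2 sigma^2)) is one common convention and is assumed here; the feature vectors and the sigma values are hypothetical.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)), an assumed parameterization."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


# Two hypothetical feature vectors (illustrative values only).
x = [0.0, 0.0]
y = [1.0, 1.0]

for sigma in (0.1, 1.0, 10.0):
    print(sigma, gaussian_kernel(x, y, sigma))
```

A very small sigma drives the kernel toward zero for any two distinct points, so each training point influences only its immediate neighborhood, while a very large sigma makes all points look alike. With few training points, the choice of sigma can therefore change the learned decision surface substantially.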
In the support vector machine presented, a winner-takes-all multiple hypothesis test was chosen because of its relation to the filter bank methodology used in
the matched subspace and decay rate detectors. However, the winner-takes-all approach is not optimal. Future work should apply one of the alternative SVM multiple
hypothesis frameworks [33] to the landmine detection and classification problem.
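For reference, a winner-takes-all decision over several two-class machines can be sketched as below. The sketch assumes one decision value per landmine type, with larger values indicating a stronger match, and a single threshold on the winning value; the labels, values, and threshold are hypothetical and do not reproduce the trained machines.

```python
def winner_takes_all(decision_values, threshold=0.0):
    """Pick the landmine type with the largest decision value.

    decision_values maps a type label to the output of the machine trained
    for that type. Returns (label, is_mine), where is_mine is True only if
    the winning value exceeds the threshold.
    """
    label = max(decision_values, key=decision_values.get)
    return label, decision_values[label] > threshold


# Hypothetical per-type decision values for two measurements.
measurement_a = {"type_1": -0.4, "type_2": 1.3, "type_3": 0.2}
measurement_b = {"type_1": -0.9, "type_2": -0.1, "type_3": -0.6}

print(winner_takes_all(measurement_a))
print(winner_takes_all(measurement_b))
```

As with the filter bank detectors, only the largest output determines the decision, which is why this rule parallels the matched subspace and decay rate methodology.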
The application of wideband EMI data to the problem of landmine detection has
been presented. Matched subspace detectors were considered for landmine discrimination. These detectors have invariances which fit the landmine detection problem
well. The decay rates inherent to EMI responses of buried conductors were estimated
and a simple detection algorithm was built based on them. Support vector machines,
a type of learning algorithm, were applied to the landmine detection hypothesis testing problem with mixed results. The matched subspace detector was seen to provide
the lowest false alarm probability at very high detection rates. Furthermore, suggestions for future work in this vein have been made which may further reduce the false
alarm rates inherent to landmine/clutter discrimination.
Bibliography
[1] Adopt a Minefield Association. Adopt a minefield. http://www.landmines.org/.
[2] Report to the U.S. Congress. Unexploded ordnance clearance: A coordinated
approach to requirements and technology development. Office of the Undersecretary of Defense, Washington, DC, Mar. 25, 1998.
[3] International standards for humanitarian mine clearance operations.
http://www.un.org/Depts/dpko/mine/Standard/s-index.htm.
[4] Y. Wang, I. D. Longstaff, C. J. Leat, and N. V. Shuley. Complex natural resonances of conducting planar objects buried in a dielectric half-space. IEEE
Transactions on Geoscience and Remote Sensing, 39(6):1183-1189, June 2001.
[5] C. Chen, M. B. Higgins, K. O'Neill, and R. Detsch. Ultrawide-bandwidth fully-polarimetric ground penetrating radar classification of subsurface unexploded
ordnance. IEEE Transactions on Geoscience and Remote Sensing, 39(6):1221-1230, June 2001.
[6] A. van der Merwe and I. J. Gupta. A novel signal processing technique for clutter
reduction in GPR measurements of small, shallow land mines. IEEE Transactions
on Geoscience and Remote Sensing, 38(6):2627-2637, November 2000.
[7] B. Karlsen, J. Larsen, H. B. D. Sorensen, and K. B. Jakobsen. Comparison of PCA
and ICA based clutter reduction in GPR systems for anti-personal landmine detection. In Proceedings of the 11th IEEE Signal Processing Workshop on Statistical
Signal Processing, pages 146-149, 2001.
[8] P. D. Gader, M. Mystkowski, and Y. Zhao. Landmine detection with ground
penetrating radar using hidden Markov models. IEEE Transactions on Geoscience and Remote Sensing, 39(6):1237-1244, June 2001.
[9] W. R. Scott, J. S. Martin, and G. D. Larson. Experimental model for a seismic landmine detection system. IEEE Transactions on Geoscience and Remote
Sensing, 39(6):1155-1164, June 2001.
[10] J. M. Sabatier and N. Xiang. Investigation of acoustic-to-seismic coupling to detect buried antitank landmines. IEEE Transactions on Geoscience and Remote
Sensing, 39(6):1146-1154, June 2001.
[11] Y. Das, J. E. McFee, and R. H. Chesney. Time domain response of a sphere in
the field of a coil: Theory and experiment. IEEE Transactions on Geoscience
and Remote Sensing, GE-22:360-367, July 1984.
[12] L. Carin, H. Yu, Y. Dalichaouch, A. R. Perry, P. V. Czipott, and C. E.
Baum. On the wideband EMI response of a rotationally symmetric permeable
and conducting target. IEEE Transactions on Geoscience and Remote Sensing,
39(6):1206-1213, June 2001.
[13] J. T. Miller, T. H. Bell, J. Soukup, and K. Keiswetter. Simple phenomenological models for wideband frequency-domain electromagnetic induction. IEEE
Transactions on Geoscience and Remote Sensing, 39(6):1294-1298, June 2001.
[14] S. L. Tantum and L. Collins. Performance bounds for target identification using
decay rates estimation from EMI measurements. In Proceedings of the Geoscience
and Remote Sensing Symposium, volume 5, pages 2278-2280, 2000.
[15] S. L. Tantum and L. Collins. A comparison of algorithms for subsurface target
detection and identification using time-domain electromagnetic induction data.
IEEE Transactions on Geoscience and Remote Sensing, 39(6):1299-1306, June
2001.
[16] B. Barrow and H. H. Nelson. Model-based characterization of electromagnetic
induction signatures obtained with the MTADS electromagnetic array. IEEE
Transactions on Geoscience and Remote Sensing, 39(6):1279-1285, June 2001.
[17] L. S. Riggs, J. E. Mooney, and D. E. Lawrence. Identification of metallic mine-like
objects using low frequency magnetic fields. IEEE Transactions on Geoscience
and Remote Sensing, 39(1):56-66, 2001.
[18] G. D. Sower and S. P. Cave. Detection and identification of mines from natural
magnetic and electromagnetic resonances. Proceedings of the SPIE, April 1995.
[19] Y. Das, J. E. McFee, J. Toews, and G. C. Stuart. Analysis of an electromagnetic
induction detector for real-time location of buried objects. IEEE Transactions
on Geoscience and Remote Sensing, 28:278-287, May 1990.
[20] I. J. Won, D. A. Keiswetter, and T. H. Bell. Electromagnetic induction spectroscopy for clearing landmines. IEEE Transactions on Geoscience and Remote
Sensing, 39(4):703-709, April 2001.
[21] P. Gao and L. Collins. A comparison of optimal and suboptimal processors for
classification of buried metal objects. IEEE Signal Processing Letters, 6(8):216-218, August 1999.
[22] P. Gao, L. Collins, N. Geng, and L. Carin. Comparison of PCA and ICA based
clutter reduction in GPR systems for anti-personal landmine detection. In Proceedings of the Geoscience and Remote Sensing Symposium, volume 3, pages
1819-1822, 1999.
[23] L. Collins, P. Gao, and S. Tantum. Model-based statistical signal processing
using electromagnetic induction data for landmine detection and classification.
In Proceedings of the 11th IEEE Signal Processing Workshop on Statistical Signal
Processing, pages 162-165, 2001.
[24] L. Collins, L. Makowsky, P. Gao, J. Moulton, D. Reidy, and D. Weaver. Improving detection of low-metallic content landmines using EMI data. In Proceedings
of the Geoscience and Remote Sensing Symposium, pages 1631-1633, 2000.
[25] L. L. Scharf and B. Friedlander. Matched subspace detectors. IEEE Transactions on Signal Processing, 42(8):2146-2157, August 1994.
[26] D. Keiswetter, E. Novikova, I. J. Won, T. Hall, and D. Hanson. Electromagnetic
induction spectroscopy for ordnance identification. In Proc. SAGEEP, pages
743-751, 1999.
[27] L. L. Scharf. Statistical Signal Processing: Detection, Estimation, and Time
Series Analysis. Addison-Wesley, Reading, MA, 1991.
[28] V. N. Vapnik. Statistical Learning Theory. John Wiley and Sons Inc., New York,
NY, 1998.
[29] K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf. An introduction
to kernel-based learning algorithms. IEEE Transactions on Neural Networks,
12(2):181-201, March 2001.
[30] Kernel machines. http://www.kernel-machines.org/.
[31] Nello Cristianini and John Shawe-Taylor. An Introduction to Support Vector
Machines and Other Kernel-Based Learning Methods. Cambridge University Press,
Cambridge, 2000.
[43] Leslie Collins, Ping Gao, Deborah Schofield, John Moulton, Larry Makowski,
Denis Reidy, and Richard Weaver. A statistical approach to landmine detection
using broadband electromagnetic induction data. In press, IEEE Transactions
on Geoscience and Remote Sensing.
[44] P. Gao, L. Collins, P. M. Garner, N. Geng, and L. Carin. Classification of
landmine-like metal targets using wideband electromagnetic induction. IEEE
Transactions on Geoscience and Remote Sensing, 38(3):1352-1361, May 2000.
[45] JUXOCO, Ft. Belvoir, VA. Hand Held Metallic Mine Detector Performance
Baselining Collection Plan, December 1998.
[46] Joint Unexploded Ordnance Coordination Office. Hand held metallic mine
detector performance baselining collection plan.
http://www.uxocoe.brtrc.com/testdata/PDF/HHTESTPLAN.PDF.