Torrione 2002 Masters

A COMPARISON OF STATISTICAL ALGORITHMS FOR
LANDMINE DETECTION
by
Peter Acerbo Torrione

Department of Electrical and Computer Engineering
Duke University
Date:
Approved:
Dr. Leslie Collins, Supervisor

Dr. Gary Ybarra
Dr. Gregg Trahey
A thesis submitted in partial fulfillment of the

requirements for the degree of Master of Science
in the Department of Electrical and Computer Engineering
in the Graduate School of
Duke University
2002
Contents
List of Tables
List of Figures
1 Introduction
2 Background
  2.1 Electromagnetic Induction Systems
    2.1.1 Physics of EMI Systems
    2.1.2 The GEM-3 Sensor
  2.2 Data Collection
  2.3 Parameter Estimation and the Cramer-Rao Lower Bound
  2.4 The Detection Problem: Likelihood Ratios and Generalized Likelihood Ratios
    2.4.1 The Likelihood Ratio Test
    2.4.2 The Generalized Likelihood Ratio Test
    2.4.3 The Matched Filter
  2.5 Linear Algebra Preliminaries and Matched Subspace Detectors
    2.5.1 Linear Algebra Preliminaries
    2.5.2 Invariance of Hypothesis Testing Problems
    2.5.3 Invariance Tests and Maximal Invariant Statistics
    2.5.4 Matched Subspace Detectors
  2.6 Support Vector Machines
    2.6.1 Problem Statement and the Vapnik-Chervonenkis Dimension
    2.6.2 Kernel Functions and Avoiding the Complexities of a High Dimensional Space
    2.6.3 Finding the Optimal Hyperplane
3 The Cramer-Rao Lower Bound
  3.1 Additive White Noise
  3.2 Additive White Noise and DC Term (in-phase)
  3.3 Additive White Noise and Additive Function of Frequency (model 1 quadrature)
  3.4 Additive White Noise and Multiplicative Term (model 2 quadrature)
4 Signal Processing Using Matched Subspace Detectors
  4.1 Properties of Estimated Landmine Responses
  4.2 Basis Estimation
  4.3 Designing the Matched Subspace Filter
  4.4 Matched Subspace Results
5 Decay Rate Estimation
  5.1 Decay Rates
  5.2 Estimation Procedure
  5.3 Gaussian Models and Detection
  5.4 Decay Rate Estimation Results
6 Support Vector Machine Algorithms
  6.1 Building the Support Vector Machine
  6.2 Model and Parameter Selection and Implementation
  6.3 Support Vector Machine Results
7 Conclusions and Future Work
Bibliography
List of Tables
2.1 Calibration grid landmine type and depth specifications
List of Figures
2.1 Calibration Lane Data Collection
2.2 Blind Lane Data Collection
2.3 Data separation in 2 Dimensions
2.4 Data separation in 3 Dimensions
3.1 Typical in-phase and quadrature background measurements versus log-frequency
3.2 Typical in-phase background measurements visibly shifted by some constant
3.3 Typical quadrature background measurements corrupted by some multiplicative constant, or some additive term which increases with frequency
3.4 Plots of the Cramer-Rao lower bound, calculated, and sample estimator variances versus the standard deviation of k. Parameters: $b_i = 10$, $\sigma_n^2 = 1$.
4.1 Signatures of VS-50 landmines versus log-frequency
4.2 Signatures of M-14 landmines versus log-frequency
4.3 Actual, mean, and estimated signatures of M-14 landmines
4.4 Comparison of filter bank outputs resulting from landmine and clutter responses. Note that the sum across the filter banks from the clutter response is larger than from the landmine response.
4.5 Comparison of in-phase and quadrature matched subspace receiver operating characteristics from the calibration grid
4.6 Comparison of quadrature matched subspace detector and baseline energy detector receiver operating characteristics from the blind and calibration grids
5.1 Estimation of VS-50 Response
5.2 Estimation of M-14 Response
5.3 Estimated landmine decay rates plotted against $\omega_1$ and $\omega_2$ in Hz. Each landmine type is represented by a different shape.
5.4 Estimated landmine decay rates plotted against $\omega_1$ and $\omega_2$ in Hz (close-up). Each landmine type is represented by a different shape. Note the high degree of spatial correlation between landmines of each type.
5.5 Estimated clutter decay rates plotted against $\omega_1$ and $\omega_2$ in Hz. Note that the estimated decay rates for clutter objects are spread throughout a wide frequency range.
5.6 Gaussian PDF contours with scattered landmine and clutter decay rates
5.7 ROC for Gaussian-PDF estimated decay rate-based detector operating in the calibration grid.
6.1 Support Vector Machine decision boundaries for non-rejecting SVMs and relevant landmine and clutter parameter locations from the calibration grid.
6.2 Receiver operating characteristics of non-rejecting support vector machines trained on decay rates, matched subspace outputs, and full signal responses operating in the calibration grid
6.3 Receiver operating characteristics of rejecting support vector machines trained on decay rates, matched subspace outputs, and full signal responses operating in the calibration grid
6.4 Receiver operating characteristics for three different support vector machines operating in the blind grid
7.1 Comparison of detector operating characteristics for matched subspace and support vector machines
Chapter 1
Introduction
Although estimates vary, agencies including the Red Cross and the United Nations
concede that there are between 60 and 70 million active landmines in the ground,
buried across 70 countries around the globe. Every year approximately 26,000 people
are maimed or killed by landmines and 8,000 to 10,000 of these victims are children
[1].
Currently, there are approximately 340 different models of anti-personnel landmines. Although these landmines cost as little as three dollars to produce, their
presence inflicts a tremendous cost - especially in developing areas. Firstly, the
cost to safely detect and remove each landmine can range between $300 and $1000.
Furthermore, many surviving landmine victims require artificial prosthetics. These
artificial limbs can cost between $100 and $3000, and they must be regularly replaced
(every 3-5 years in adults, and every 6 months in children) [1]. It is impossible to
measure the damage landmines inflict upon productivity, emotional well being, and
the peaceful reconciliation of neighbors after years of war.
As of 2002, the landmine crisis primarily affects poorer countries for which the
economic impact of landmines is especially devastating. There are an estimated 22.5
million landmines in Egypt, 16 million in Iran, and 10 million in Iraq, to list only
some of the most egregiously affected countries [1].
The primary contributor to the large cost of landmine removal is a high false alarm
rate stemming from large amounts of anthropic clutter that pervades minefields. Until
it is excavated and determined to not pose a threat, this clutter must be considered
as dangerous as an actual landmine. On occasion, false alarm rates as high as 95
percent have been reported when clearing minefields [2].

There are two distinct categories of landmine remediation: military and humanitarian [1]. The primary goal of military landmine removal is to clear a path through
a suspected minefield to allow troop movement in the area. Generally, this must
be accomplished quickly, usually at night, to avoid exposure to the enemy. Military
landmine clearance is often accomplished by driving large rollers, flails, or plows over
landmines to detonate them and clear a path [1]. Unfortunately, these techniques may
only achieve clearance rates of 80 percent, which is not an acceptable detection level
for humanitarian situations (the regulations for humanitarian de-mining are described
by the International Standards for Humanitarian Mine Clearance Operations, see
reference [3]). Humanitarian landmine removal is a much more arduous process generally involving indigenous workers using hand-held devices to locate possible targets
and safely remove them.
Landmines are indiscriminate killers, and while the UN is lobbying for a worldwide ban on their use, concerted efforts are underway to remove landmines in areas
where they can cause harm to civilians. The goal of the research presented in this
thesis is to develop signal processing techniques that expedite accurate detection
of potential threats by decreasing the false alarm rates associated with currently
deployed landmine detectors while maintaining high detection rates.
Several novel sensor modalities have been investigated for discerning the locations
of buried landmines. Possible ground querying techniques include neutron backscattering [4], ground penetrating radar [5, 6, 7, 8], seismic detectors [9], and acoustic-to-seismic coupling [10]. While many of these technologies hold great promise for future
application, currently almost all fielded or nearly-fielded landmine detection systems
use electromagnetic induction (EMI) sensors which operate on the same principles
as a standard metal-detector.
A large body of literature exists dealing with the applications of EMI sensors and
processing of EMI data to the detection of buried landmines and unexploded ordnance
(UXO). Some of this work has focused on determining the EMI responses from rotationally symmetric bodies [11, 12] and development of simplified phenomenological
models to fit such responses [13]. Several researchers have explored the application
of time-domain EMI responses to landmine detection using estimated decay rates
[14, 15, 16, 17, 18, 19]. Other work by Won et al. has indicated that the wideband
EMI spectral responses from different landmines are unique [20]. Gao et al. have
derived the complicated optimal wideband EMI detector and have compared its results to sub-optimal detectors [21, 22]. Additional signal processing research on the
detection and classification of low-metallic content landmines via EMI data has been
performed by Collins et al. [23, 24].
In this thesis, we build on this body of work in three ways. First, we will address
the problem of landmine response estimation via soil (background) removal and
show that our proposed estimator achieves the Cramer-Rao lower bound under specific
statistical models of the received data. Second, we will apply the theory of matched
subspace detectors [25, 26, 27] to the detection and classification of landmines versus
clutter. Third, we will explore the possible applications of support vector machines
(SVMs) [28, 29, 30, 31, 32, 33] to the landmine detection problem.
The remainder of this thesis is organized as follows.
In chapter two we review some of the information fundamental to the rest of
the thesis. We begin with a brief overview of electromagnetic induction sensors, the
data collection procedure used, and the particular EMI sensor used in this study:
the GEM-3. This is followed by a review of the Cramer-Rao lower bound and some
linear algebra preliminaries to the matched subspace detector. A full treatment of
matched subspace detectors is given prior to discussing the derivation and properties
of support vector machines.

Chapter three focuses on applying the Cramer-Rao lower bound to background
response estimation. A method of estimating the received signal by subtracting an
estimate of the background signal is proposed. The performance of the estimation
procedure is considered under four different models of the received data. The estimation procedure is shown to achieve the Cramer-Rao bound for three of the models
and to approach the bound for the fourth model.
Chapter four discusses the particulars of the matched subspace detector as applied to the landmine detection problem. This includes subspace basis estimation
procedures and energy pre-screening. Results from a blind field trial are presented.
Chapter five deals with the problems of decay rate estimation from frequency-domain EMI data. We briefly discuss a simple detection technique based on multiple
Gaussian probability density functions.
Chapter six describes the application of support vector machines to the landmine
detection problem. Three different support vector machines are presented, and their
receiver operating characteristics from blind field trials are discussed.
In chapter 7 we will review the research covered by this thesis and present thoughts
on the results. A comparison of the results of the support vector machine and matched
subspace landmine detection techniques is presented. Possible avenues for future work
are also discussed.
Chapter 2
Background
2.1
Electromagnetic Induction Systems
In 1831 Michael Faraday made the discovery that a changing magnetic field can
generate or induce a current in a nearby conductor. Building upon Faraday's work,
Maxwell formulated the four famous equations upon which all electromagnetics is
based. The phenomenology associated with EMI sensors (like hobby metal-detectors)
is based directly on these equations.
2.1.1
Physics of EMI Systems
A standard EMI sensor has a primary coil, or transmitter coil, composed of wire
through which alternating current flows. This current flow generates a changing
magnetic field around the sensor that penetrates the ground. As Faraday noted, the
changing magnetic field from the transmitter coil induces current flow in the ground.
The current flowing through the earth (and any contaminants therein) generates
another magnetic field. Thus, it is possible to use a receiver coil to listen for the
magnetic field that results from the induced current flow in the earth. Of course, care
must be taken in the placement of and recording of measurements from the receiver
coil since the magnetic field of the transmitter will, in general, be much stronger than
the secondary field resulting from the earth's response. The magnitude and phase of
the measured wideband EMI responses can be used to discern the amount, type, and
shape of buried metal objects [20, 34, 35].
Although Maxwell's equations completely govern the responses of conducting materials in any shape and orientation, solving these equations for shapes of arbitrary
complexity is mathematically problematic.
It has been shown [12, 36] that the frequency-domain response of a buried highly
conducting object subject to EMI radiation can be modeled as:
\[ H(\omega) = a + \sum_n \frac{b_n \omega}{\omega - j\omega_n} \tag{2.1} \]
Furthermore, the initial term a has been shown to be non-zero only for ferrous
targets [37]. Similarly, the time-domain response of such a system has been shown
[14, 38, 39] to be the weighted sum of exponentials:
\[ S(t) = a\,\delta(t) + \sum_n A_n e^{-\omega_n t} \tag{2.2} \]
where, since the real part of each pole $j\omega_n$ is negligible, $\omega_n$ is real. In practice, the actual
responses of buried targets are well approximated by the first few terms in each of
the above summations. The primary parameters of interest are often assumed to be
the first few decay rates: $\omega_1$ and $\omega_2$. A significant amount of work has focused on the
application of estimated decay rates to landmine detection [14, 15, 11, 19, 17, 18, 40].
For high metal-content objects, the primary decay rates are generally fairly small,
resulting in slowly decaying exponential responses. Such responses are relatively easy
to sample in the time-domain. However, for objects containing small amounts of
metal (like most modern landmines), the decay rate parameters are very large and
the resulting exponential signature decays very rapidly. This makes time-domain
measurement of the decay rates difficult due to the rate at which the signal decays.
In this work, a wideband frequency-domain EMI sensor is utilized. Since a wideband frequency-domain sensor's responses are not time dependent, these sensors are
advantageous when measuring quickly decaying exponential signals.
2.1.2
The GEM-3 Sensor
In this work, data from a Geophex GEM-3 sensor was used [41]. This section describes
the GEM-3 sensor.
The GEM-3 is a wideband digital electromagnetic sensor weighing about 10
pounds. The sensor head of the GEM-3 consists of three concentric coils. The
inner coil is the receiver coil, and the two outer coils comprise the transmitter coil.
The combination of the magnetic fields induced by the outer coils creates a magnetic
cavity (area with zero magnetic field) at the receiving coil. This prevents interference
between the transmitted and induced magnetic fields [42].
When operating as a wideband sensor the GEM-3 prompts for a set of frequencies
at which to collect the induced EMI response. The GEM-3 can operate at frequencies
between 30 Hz and 24 kHz. In this work, the GEM-3 was programmed to collect data
at the following ten frequencies:
750 1410 2370 4050 6030 8250 10890 14430 19450 23970 Hz
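To make the role of these frequencies concrete, the following minimal sketch evaluates a pole-expansion response of the form a + sum_n b_n f / (f - j f_n), which is the form assumed here for equation 2.1, at the ten GEM-3 frequencies. The function name and all parameter values (a, b, and the two decay rates) are hypothetical and chosen only for illustration; they are not measured landmine parameters.

```python
import numpy as np

# The ten GEM-3 operating frequencies listed above, in Hz.
GEM3_FREQS_HZ = np.array(
    [750, 1410, 2370, 4050, 6030, 8250, 10890, 14430, 19450, 23970], dtype=float
)

def emi_response(freqs_hz, a, b, decay_rates_hz):
    """Evaluate H = a + sum_n b_n * f / (f - j * f_n) at each frequency f.

    Frequencies and decay rates share the same unit (Hz), so the ratio is the
    same as the one written in terms of angular frequency.
    """
    f = np.asarray(freqs_hz, dtype=float)[:, None]          # shape (num_freqs, 1)
    b = np.asarray(b, dtype=float)[None, :]                 # shape (1, num_terms)
    f_n = np.asarray(decay_rates_hz, dtype=float)[None, :]  # shape (1, num_terms)
    return a + np.sum(b * f / (f - 1j * f_n), axis=1)

# Hypothetical two-term target: values are illustrative, not measured.
H = emi_response(GEM3_FREQS_HZ, a=0.0, b=[1.0, 0.5], decay_rates_hz=[2000.0, 15000.0])
in_phase, quadrature = H.real, H.imag
print(np.round(in_phase, 3))
print(np.round(quadrature, 3))
```

Under this assumed model, the real part of H corresponds to the in-phase measurement and the imaginary part to the quadrature measurement at each frequency.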
A sensor that operates at multiple frequencies has the advantage of being able
to see at multiple depths into the medium, since a low-frequency signal will penetrate further into the medium than a high-frequency signal. It has been previously
shown that the GEM-3 performs significantly better for discriminating landmines
from clutter than several other sensors at blind government-run test sites [43].
It has also been established that different types of landmines generate unique
frequency-domain signatures, which are relatively independent of target-sensor orientation and distance for high metal content objects [20, 44]. However, the signatures
are dependent on target-sensor orientation and distance if the object's metal content
is low [43]. Recent work has also shown that these signatures change when the objects
are buried [43]. The goal of this research is to develop algorithms that reduce the effects
of the soil on the measured signal and maximize the detection and classification
of landmines using their frequency-domain EMI signatures.
2.2
Data Collection
The GEM-3 data used in this work was taken from a government test site in Virginia.
The site is segmented into a large (50 m x 20 m) grid consisting of squares measuring
1 meter per side. Before being used as a testing ground, all of the anthropic clutter
was systematically removed from the site. Some clutter was subsequently replaced to
provide discrete opportunities for clutter-induced false alarms. At the center of each
1 m x 1 m grid square a landmine, a clutter item, or nothing is emplaced. Ground
truth, i.e. the object buried in each square, is sequestered for this area and is known
only to the government sponsor. A separate area measuring 25 meters by 5 meters
was designated for sensor calibration and algorithm testing. The ground truth for the
calibration section is available to the public so that algorithms can be tested prior to
application on the blind grid.
The calibration data used for algorithm training in this work was recorded from
various spots throughout the calibration grid. In all, 20 clutter responses and 27
landmine signatures from 12 different landmine types at varying depths were collected from the calibration lanes. Data from 980 potential targets was measured in
the blind grid. In the calibration lanes, where the ground truth is known, two background measurements were taken from either side of the center target location as
shown in Fig. 2.1. In the blind grid, measurements alternated between background
and potential targets at locations shown in Fig. 2.2. All of the central and background measurements were taken by human operators. Although the sensor height is
approximately constant across all measurements, variations are bound to exist due to
uneven ground, operator height and posture, and other factors. Thus, sensor height
is essentially a random variable.
Figure 2.1: Calibration Lane Data Collection
In summary, for each calibration-grid data point two unique background signals
were measured on each side of the possible target location. For each blind grid data
point there are two shared background signals for each square with the exception of
the first and last squares from each column.
Table 2.1 indicates the depths and number of occurrences of each landmine type
in the calibration area. In the table, HE means high explosive present. For additional information regarding the data collection, please see the Hand Held Metallic
Mine Detector Performance Baselining Collection Plan [45, 46].
Figure 2.2: Blind Lane Data Collection

Minetype     Number of measurements   Depth Range (in)
VS-50        5                        0 - 2.25
TS-50        3                        0 - 1.75
M-14         3                        0.25 - 1.75
M-14 (HE)    2                        0.5 - 1.125
PMA-3        2                        0 - 1.5
VAL69        1                        0
VS-2.2       2                        1.50 - 3
M-19         2                        1.25 - 2.5
TMA-4        2                        1.75 - 3
TM62P3       2                        1.50 - 3
T-72         1                        1.25
TM-46        1                        3
VS1.6        1                        1
Table 2.1: Calibration grid landmine type and depth specifications
2.3
Parameter Estimation and the Cramer-Rao Lower Bound
Estimating an unknown parameter from data is a research topic that has been studied
extensively [47, 27, 48]. In this section two standard approaches to parameter estimation and the Cramer-Rao bound, which places limits on the best possible unbiased
estimator, are discussed.

Consider a data set x consisting of data points $x_i$ drawn from some distribution
F with parameter $\theta$: $F(x, \theta)$. The goal of an estimator is to predict the value of $\theta$
using only the set of data given and (possibly) some prior knowledge of F. The
estimated value is then referred to as $\hat{\theta}$. $\hat{\theta}$ is said to be an unbiased estimator of $\theta$
if $E(\hat{\theta}|x) = \theta$ (where E represents the expected value). $\hat{\theta}$ is said to be a consistent
estimator of $\theta$ if the variance of $\hat{\theta}$, $E((\hat{\theta} - \theta)^2|x)$, tends toward zero with probability
one as the size of the data set grows to infinity.

There are two common approaches to parameter estimation: Bayesian and Maximum Likelihood [47]. In Bayesian estimation one assumes a prior distribution on the
parameter of interest, $F(\theta)$. One then considers the distribution $F_x(x|\theta)$, and
\[ \hat{\theta} = E(\theta|x) = \int \theta f(\theta|x)\, d\theta \tag{2.3} \]
where:
\[ f(\theta|x) = \frac{f(x|\theta) f(\theta)}{f(x)} \tag{2.4} \]
In maximum likelihood estimation one considers the density $f(x, \theta)$ and maximizes
this function such that given a set of data x, $\hat{\theta}$ is chosen to maximize $f(x, \theta)$.
Often it is difficult to derive, implement, or show that the optimal estimator exists for a given problem [47]. Although consistency guarantees that the variance of
an estimate tends to zero, the variance of some estimators approaches zero more quickly than that of others. It is useful to determine if a given estimator
approaches or achieves the statistics of the best possible estimator; the Cramer-Rao
lower bound (CRLB) provides such a tool [47, 49]. The CRLB is a measure of the
smallest variance that an unbiased estimator can achieve on a given set of data. If an
estimator achieves this bound, the estimator is the best unbiased estimator. Consider
an estimator $\hat{\theta}$ of some parameter $\theta$. Further, consider a set of data $X = \{x_i\}$ drawn
from the density $f(x_i, \theta)$. In mathematical terms, the CRLB states that the variance
of an unbiased estimator $\hat{\theta}$ satisfies:
\[ VAR(\hat{\theta}) \geq \frac{1}{J(\theta)} \tag{2.5} \]
where J is the Fisher information defined as:
\[ J(\theta) = E\left[ \left( \frac{\partial}{\partial\theta} \ln f(x; \theta) \right)^2 \right] \tag{2.6} \]
An alternative formulation of $J(\theta)$ is given in [48] as:
\[ J(\theta) = -E\left[ \frac{\partial^2}{\partial\theta^2} \ln f(x|\theta) \right] \tag{2.7} \]
2.4
The Detection Problem: Likelihood Ratios and Generalized Likelihood Ratios
This thesis is primarily concerned with the detection of signals in noise. In this
section the optimal solution to the hypothesis testing problem - the likelihood ratio,
and a sub-optimal version of this test - the generalized likelihood ratio are reviewed.
2.4.1
The Likelihood Ratio Test
In most binary decision problems, one has a set of data and wishes to determine
which of two separate distributions the data was drawn from. The two hypotheses
are generally termed H0 , and H1 , or the null and alternative hypotheses respectively.
The likelihood ratio is the optimal decision statistic for a wide range of decision
problems [48] and is defined as:
\[ \Lambda(x) = \frac{p(x|H_1)}{p(x|H_0)} \gtrless \eta \tag{2.8} \]
The null hypothesis is accepted if $\Lambda(x)$ is less than a certain threshold, $\eta$; otherwise
the alternative hypothesis is accepted.
Determining the optimal threshold value to use depends on the performance criterion chosen. The two most commonly used performance criteria are the Neyman-Pearson criterion and the Bayes criterion [48].
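The test in equation 2.8 can be sketched for a simple case. The example below assumes, for illustration only, independent Gaussian samples with a known common variance and a known mean under each hypothesis; the function names and the threshold value of 1 are hypothetical choices, not values used in this thesis.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Joint density of i.i.d. Gaussian samples with the given mean and variance."""
    x = np.asarray(x, dtype=float)
    return float(np.prod(np.exp(-(x - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)))

def likelihood_ratio(x, mean0, mean1, var):
    """Ratio p(x | H1) / p(x | H0), as in equation 2.8."""
    return gaussian_pdf(x, mean1, var) / gaussian_pdf(x, mean0, var)

def decide(x, mean0, mean1, var, threshold=1.0):
    """Return 1 (accept H1) if the ratio exceeds the threshold, else 0 (accept H0)."""
    return int(likelihood_ratio(x, mean0, mean1, var) > threshold)

print(decide([0.9, 1.2, 0.7], mean0=0.0, mean1=1.0, var=1.0))   # data near the H1 mean
print(decide([-0.1, 0.2, 0.0], mean0=0.0, mean1=1.0, var=1.0))  # data near the H0 mean
```

Raising the threshold trades detections for fewer false alarms, which is the trade-off that the Neyman-Pearson and Bayes criteria resolve in different ways.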
2.4.2
The Generalized Likelihood Ratio Test
The standard likelihood ratio test assumes that the conditional distributions of the
data under the two hypotheses are known. Often this assumption is invalid. When
the two probability density functions are not known or are difficult to estimate,
the Generalized Likelihood Ratio Test (GLRT) is often utilized. The GLRT is an
intuitive (although not optimal) mechanism by which to approach the problem of
unknown distributions in a two-hypothesis decision scenario. Consider again the two
probability distribution functions, except assume that some parameter, denoted $\theta$,
associated with the probability density function p is unknown:
\[ p(x|H_1) \rightarrow p(x|\theta, H_1) \tag{2.9} \]
\[ p(x|H_0) \rightarrow p(x|\theta, H_0) \tag{2.10} \]
The likelihood ratio is [47]:
\[ \Lambda(x) = \frac{\int p(x|\theta, H_1)\, p(\theta|H_1)\, d\theta}{\int p(x|\theta, H_0)\, p(\theta|H_0)\, d\theta} \tag{2.11} \]
In practice, the calculation of this integral is often difficult, or if $p(\theta|H_1)$ is unknown, impossible. One sub-optimal solution results from substituting estimates of
the unknown parameter into the density functions. This formulation is termed the generalized
likelihood ratio test [48]:
\[ \Lambda(x) = \frac{p(x|\theta, H_1)|_{\theta = \hat{\theta}}}{p(x|\theta, H_0)|_{\theta = \hat{\theta}}} \tag{2.12} \]
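A minimal sketch of the substitution idea follows. It assumes a case not taken from this thesis: a known signal shape s with unknown amplitude $\theta$ in white Gaussian noise of known variance, so that H0 is x = n and H1 is x = $\theta$ s + n. The ML estimate of $\theta$ under H1 is substituted into the H1 density; the function name is hypothetical.

```python
import numpy as np

def glrt_log_statistic(x, s, noise_var):
    """Log generalized likelihood ratio for x = theta * s + n with theta unknown."""
    x = np.asarray(x, dtype=float)
    s = np.asarray(s, dtype=float)
    theta_hat = float(s @ x) / float(s @ s)   # ML estimate of theta under H1
    resid_h1 = x - theta_hat * s              # residual after fitting H1
    # ln p(x | theta_hat, H1) - ln p(x | H0); the Gaussian normalizers cancel.
    log_glr = (float(x @ x) - float(resid_h1 @ resid_h1)) / (2.0 * noise_var)
    return log_glr, theta_hat

rng = np.random.default_rng(3)
s = np.array([1.0, 2.0, 3.0])
x = 0.8 * s + 0.1 * rng.standard_normal(3)   # hypothetical received data under H1
log_glr, theta_hat = glrt_log_statistic(x, s, noise_var=0.01)
print(log_glr, theta_hat)
```

Under these assumptions the statistic reduces to the squared correlation of x with s, scaled by the signal energy and the noise variance, so a received vector orthogonal to s produces a statistic of zero.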
2.4.3
The Matched Filter
One simple and commonly encountered hypothesis testing problem involves determining the presence of a known signal s in the presence of additive zero-mean white
noise. In this case, the likelihood ratio reduces to a filter known as a correlation
detector or matched filter [48].
Let s and n be length-i vectors consisting of the known signal and statistically
independent, $N(0, I\sigma_n^2)$ noise respectively. Consider a received data vector x under
the null and alternative hypotheses:
\[ H_0 : x = n \]
\[ H_1 : x = s + n \]
The distributions of x under $H_0$ and $H_1$ are:
\[ f(x|H_0) = \prod_{j=1}^{i} \frac{1}{\sqrt{2\pi\sigma_n^2}} \exp\left( -\frac{x_j^2}{2\sigma_n^2} \right) \]
\[ f(x|H_1) = \prod_{j=1}^{i} \frac{1}{\sqrt{2\pi\sigma_n^2}} \exp\left( -\frac{(x_j - s_j)^2}{2\sigma_n^2} \right) \]
The likelihood ratio (equation 2.8) is
\[ \Lambda(x) = \prod_{j=1}^{i} \exp\left( \frac{2 x_j s_j - s_j^2}{2\sigma_n^2} \right) \]
Taking the natural logarithm and incorporating the known values ($\sigma_n^2$, $s_j$) into the
threshold ($\eta$) yields:
\[ \Lambda(x) = \sum_{j=1}^{i} x_j s_j \gtrless \eta \]
which is the well-known matched filter.
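The resulting statistic is simply the inner product of the received vector with the known signal. A minimal sketch, using a hypothetical signal and noise level chosen only for illustration:

```python
import numpy as np

def matched_filter(x, s):
    """Correlate the received vector x with the known signal s."""
    return float(np.dot(x, s))

rng = np.random.default_rng(1)
s = np.array([1.0, -1.0, 2.0, 0.5])          # hypothetical known signal
noise = 0.1 * rng.standard_normal(s.size)    # hypothetical white noise draw

stat_h0 = matched_filter(noise, s)           # H0: x = n
stat_h1 = matched_filter(s + noise, s)       # H1: x = s + n
print(stat_h0, stat_h1)
```

Because the filter is linear, the two outputs differ by exactly the signal energy (the sum of the squared signal samples), which is what the threshold comparison exploits.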
2.5
Linear Algebra Preliminaries and Matched Subspace Detectors
The common matched filter is a special case of a more general class of filters termed
matched subspace detectors [27]. Scharf's derivation of the matched subspace detectors (see [27, 25]) requires some linear algebra preliminaries which allow him to
show that the matched subspace detector has many interesting and powerful properties including invariance to rotations in certain subspaces and optimal performance
under certain assumptions. In this section the linear algebra associated with projection matrices (which are an integral part of matched subspace filters) is discussed. A
summary of Scharf's definitions of invariance and maximal invariant statistics (closely
following the discussion from [27]) is given, and finally, summaries of Scharf's application of these ideas to the development of the matched subspace filter and his proof
that the matched subspace detector is a uniformly most powerful test are provided.
2.5.1
Linear Algebra Preliminaries
Before discussing matched subspace filters, it is important to review the formation

and properties of projection matrices.
The span of a set of vectors $\{v_1, v_2, \ldots, v_N\}$ is defined as the set of all linear combinations of $\{v_1, v_2, \ldots, v_N\}$. A vector b is then an element of the span of $\{v_i\}$ if and
only if the equation:
\[ b = a_1 v_1 + a_2 v_2 + \ldots + a_N v_N \tag{2.13} \]
has a solution. When the vectors $\{v_i\}$ are considered columns in a matrix H, the span
of $\{v_i\}$ is equivalent to the subspace denoted by $\langle H \rangle$. The orthogonal complement
of $\langle H \rangle$ is denoted $\langle H \rangle^{\perp}$.
A projection matrix E is a square matrix that gives a projection onto a given
subspace. The projection onto a subspace $\langle H \rangle$ is denoted as $E_H$. A projection
matrix must be idempotent (equal to its own square):
\[ E^2 = E \tag{2.14} \]
An orthogonal projection matrix has the additional constraint of being Hermitian
(equal to its Hermitian transpose). Such projections are denoted with the letter P:
\[ P^H = P \tag{2.15} \]
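The most common orthogonal projection matrices are the Cartesian coordinate projections in $\mathbb{R}^2$:
\[ P_x = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \tag{2.16} \]
\[ P_y = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \tag{2.17} \]
which map a vector onto the x and y axes respectively. It is possible to generate an
orthogonal projection matrix onto any subspace $\langle H \rangle$ using the following formula:
\[ P_H = H (H^H H)^{-1} H^H \tag{2.18} \]
For example, to form a projection onto the x-axis, the H vector is:
\[ H = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \tag{2.19} \]
and:
\[ P_H = H (H^H H)^{-1} H^H = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} = P_x \tag{2.20} \]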
The projection onto the orthogonal subspace, $P_H^{\perp}$, is found by subtracting $P_H$ from
the identity matrix:
\[ P_H^{\perp} = I - P_H \tag{2.21} \]
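The projection formulas above can be checked numerically. The sketch below builds the orthogonal projection from a basis matrix and verifies the idempotent and Hermitian properties for the x-axis example; the function name is hypothetical, and the explicit matrix inverse is used only to mirror the formula (it assumes H has linearly independent columns).

```python
import numpy as np

def orthogonal_projection(H):
    """Return P_H = H (H^H H)^{-1} H^H for a 2-D array H with independent columns."""
    H = np.asarray(H, dtype=complex)
    return H @ np.linalg.inv(H.conj().T @ H) @ H.conj().T

H = np.array([[1.0], [0.0]])           # single basis vector along the x-axis
P_H = orthogonal_projection(H)
P_H_perp = np.eye(2) - P_H             # projection onto the orthogonal complement

print(np.allclose(P_H @ P_H, P_H))         # idempotent
print(np.allclose(P_H.conj().T, P_H))      # Hermitian
print(np.allclose(P_H_perp @ H, 0.0))      # the basis vector is annihilated by P_H_perp
```

The same function applies to any basis matrix with more rows than columns, which is the situation that arises when a signal subspace is estimated from a small set of training signatures.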
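An orthogonal projection onto $\langle H \rangle$ maps vectors contained in the subspace $\langle H \rangle$
to themselves, and maps vectors lying in $\langle H \rangle^{\perp}$ to the zero vector. This can be seen
using the Cartesian projections:
\[ P_x \begin{bmatrix} c \\ 0 \end{bmatrix} = \begin{bmatrix} c \\ 0 \end{bmatrix} \tag{2.22} \]
\[ P_x \begin{bmatrix} 0 \\ d \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \tag{2.23} \]
2.5.2
Invariance of Hypothesis Testing Problems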
In many decision problems, there are parameters associated with the probability
distribution functions of the measured signals which are considered nuisance parameters. In these cases it is desirable to reduce the set of viable decision rules to
those which are (in some sense) invariant to changes in the nuisance parameters.
As Scharf states:
This leads to the key idea behind invariance in hypothesis testing: When
presented with nuisance parameters that are extraneous to the hypothesis
test, look for transformations of the measured data that would introduce
these nuisance parameters and then look for a decision rule that is invariant to these transformations. [27] pg. 128
Consider the hypothesis testing problem of determining if X was drawn from
$F_1(x)$ or $F_0(x)$. If for every g in G:
\[ x : F_\theta(x) \tag{2.24} \]
\[ y = g(x) \tag{2.25} \]
\[ F'_\theta(y) = P[g(X) \leq y] \tag{2.26} \]
where $F'_\theta(y)$ is the distribution of y with parameter $\theta$, and
\[ F'_\theta(y) = F_{g(\theta)}(y) \tag{2.27} \]
(that is, if the only effect of the function g(x) on the distribution $F_\theta(x)$ is to change
the parameter from $\theta$ to $g(\theta)$) then the family of distributions for which equation
2.27 holds is said to be invariant to G. Also, if the transformation g maintains the
dichotomy between $H_1$ and $H_0$, the hypothesis testing problem is said to be invariant
to G.
2.5.3
Invariance Tests and Maximal Invariant Statistics
A hypothesis test $\phi$ is invariant to G if $\phi(g(x)) = \phi(x)$ [27]. Furthermore, a statistic M is maximally invariant if
$$M(g(x)) = M(x) \text{ for all } g \text{ in } G \quad \text{(invariant)} \tag{2.28}$$
$$M(x_1) = M(x_2) \text{ implies } x_1 = g(x_2) \text{ for some } g \text{ in } G \quad \text{(maximal)} \tag{2.29}$$
Thus, all invariant tests may be written as a function of a maximally invariant statistic [27]:
$$\phi(x) = \phi(M(x)) \tag{2.30}$$
These results are important for the landmine detection problem because they show that when deriving a decision rule for any invariant hypothesis testing problem, it is possible to consider only functions of a maximal invariant statistic.
2.5.4
Matched Subspace Detectors
In this section a review of Scharf's work is presented which shows that the problem statement leading up to the matched filter is naturally invariant to a set of transformations and that the matched subspace detector is a maximal invariant statistic. Scharf's explanation of why the matched subspace detector is uniformly most powerful is also reviewed.
In a detection problem, the exact form of the signal of interest is often unknown.
The signal may be subject to an arbitrary gain, or it may be a random (unknown)
combination of a set of basis vectors. As has been previously noted, a vector x which
lies in the subspace <H> can always be represented by a linear combination of a set
of vectors comprising the matrix H. The signal x can then be represented as:
$$x = \sum_n \theta_n h_n = H\theta \tag{2.31}$$
where H is an N × P matrix and $\theta$ is a P × 1 vector containing the coordinates of x in <H>. If the weight vector $\theta$ is known a priori then, since the subspace <H> is known, the vector x is completely determined, and the optimal detector is the matched filter. However, if $\theta$ is unknown, then all that is known about the vector x is that it lies somewhere in the space spanned by H. Under these assumptions, if x is corrupted by white noise and biased in the orthogonal subspace $\langle H \rangle^{\perp}$, Scharf [25] has shown that the optimal test statistic is:
$$\chi^2 = x^T P_H x \tag{2.32}$$
Here we summarize his proof.
Let $X = \mu H\theta + N$ with $N : N[0, \sigma^2 I]$, where $\mu$ is a non-negative scalar gain on the signal. If a channel also rotates the signal in <H> and adds a bias v in the subspace $\langle A \rangle = \langle H \rangle^{\perp}$, this can be described mathematically as:
$$Q_H(X + v) \tag{2.33}$$
where $Q_H$ is a rotation matrix in <H> and v lies in the subspace <A>. Note that the rotation of v leaves v unchanged (since we are rotating in <H>), and $\mu H\theta$ is mapped to $\mu H\theta'$. Let
$$y = Q_H(X + v) \tag{2.34}$$
$$y : N[\mu H\theta' + v, \sigma^2 I] \tag{2.35}$$
The hypothesis test is then to discern between the null hypothesis ($\mu = 0$) and the alternative ($\mu > 0$). As mentioned above, since $Q_H$ and v are unknown, they are considered nuisance parameters and the matched subspace detector should ideally be invariant to them. To show that the matched subspace detector is uniformly most powerful, Scharf shows that the distribution of y is invariant to these parameters, the matched subspace detector is a maximal invariant statistic, and the matched subspace detector has a monotone likelihood ratio.
It can be shown that the hypothesis testing problem in this case is invariant to the set of functions
$$G = \{g : g(y) = Q_H(y + w)\} \tag{2.36}$$
since the distribution of $Q_H(y + w)$ is
$$N[\mu H\theta'' + v + w, \sigma^2 I] \tag{2.37}$$
and the distribution of y is given by eq. 2.35. Note that the form of the distribution has not changed (only the mean parameter has been altered), thus the distribution of y is invariant to G. Also, since the transformation of the parameter $(\mu H\theta + v)$ is:
$$g(\mu H\theta + v) = \mu H\theta' + v + w \tag{2.38}$$
the transformations of the hypotheses are:
$$g(H_0) = v + w = H_0 \tag{2.39}$$
and
$$g(H_1) = \mu H\theta' + v + w = H_1 \tag{2.40}$$
so the dichotomy of the original parameter space is maintained, and the hypothesis testing problem is G-invariant.
To show that the matched subspace statistic
$$\chi^2 = M(y) = y^T P_H y \tag{2.41}$$
is maximal invariant to the group G, Scharf shows that eq. 2.28 and 2.29 hold with:
$$g(y) = Q_H(y + v) \tag{2.42}$$
For eq. 2.28:
$$M(g(y)) = (Q_H(y + v))^T P_H (Q_H(y + v)) \tag{2.43}$$
since $Q_H^T Q_H = I$:
$$= (y + v)^T P_H (y + v) \tag{2.44}$$
and since v is in <A>:
$$= y^T P_H y \tag{2.45}$$
For eq. 2.29, suppose that
$$y_1^T P_H y_1 = y_2^T P_H y_2 \tag{2.46}$$
and note that the quadratic form involving $P_H$ is the energy of the vectors in the subspace <H>. Since the energies of both $y_1$ and $y_2$ in the subspace <H> are the same, $y_2$ must be a rotation of $y_1$ and/or differ only in the subspace <A>. Thus:
$$y_1 = Q_H(y_2 + v) \tag{2.47}$$
for some $Q_H$ and v.

Since the statistic $\chi^2/\sigma^2$ ($\chi^2$ from eq. 2.41) is primarily the square of a Gaussian-distributed vector, it can be shown (see [27]) that it is distributed as a chi-squared random variable. By the Karlin-Rubin theorem, since all $\chi^2$ random variables have monotone likelihood ratios, the $\chi^2$ test is uniformly most powerful [27].
In the above discussions, the variance of the noise ($\sigma^2$) has been assumed to be known. If this is not the case, then the maximal invariant statistic becomes:
$$F = \frac{x^T P_H x}{x^T P_H^{\perp} x} \tag{2.48}$$
or
$$F = \frac{x^T P_H x}{x^T (I - P_H) x} \tag{2.49}$$
Furthermore, note that the constant false alarm rate matched filter can be described using a cosine statistic as [50]:
$$\cos^2 = \frac{x^T P_H x}{x^T x} \tag{2.50}$$
Although matched subspace detectors are significantly more complicated than the
special case of the matched filter, they provide a wide range of invariances and are
significantly more robust than matched filters when the signal of interest is not known
exactly, as is the case in the particular problem of landmine detection.
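The three statistics of eq. 2.32, 2.49 and 2.50 can be illustrated with a short sketch. It assumes the simplest possible subspace, one spanned by a single real vector h, so that $x^T P_H x = (h \cdot x)^2 / (h \cdot h)$; the function names and the numeric vectors are hypothetical choices for illustration. With a one-dimensional subspace the only rotations inside <H> are sign changes, so the sketch demonstrates only invariance to a bias in the orthogonal subspace and to an unknown gain.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def subspace_energy(h, x):
    """x^T P_H x (eq. 2.32) for the rank-one subspace spanned by h."""
    return dot(h, x) ** 2 / dot(h, h)


def f_statistic(h, x):
    """Ratio of energy inside <H> to energy outside <H> (eq. 2.49)."""
    inside = subspace_energy(h, x)
    return inside / (dot(x, x) - inside)


def cos2_statistic(h, x):
    """Fraction of the total energy of x that lies in <H> (eq. 2.50)."""
    return subspace_energy(h, x) / dot(x, x)


h = [1.0, 2.0, 2.0]   # basis vector of the signal subspace (hypothetical)
x = [2.0, 3.0, 5.0]   # a measurement (hypothetical)
v = [2.0, -1.0, 0.0]  # bias orthogonal to h, since dot(h, v) == 0

x_biased = [a + b for a, b in zip(x, v)]
x_scaled = [3.0 * a for a in x]

# A bias in the orthogonal subspace leaves the subspace energy unchanged.
print(subspace_energy(h, x), subspace_energy(h, x_biased))

# An unknown gain changes the energy but not the F or cosine statistics.
print(f_statistic(h, x), f_statistic(h, x_scaled))
print(cos2_statistic(h, x), cos2_statistic(h, x_scaled))
```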
2.6
Support Vector Machines
Support vector machines (SVMs) are a relatively new type of learning machine with many interesting properties [29, 32, 28, 31]. Support vector machines operate by mapping the data of interest to a high dimensional space and generating a separating hyperplane in that space. The high dimensional separating hyperplane can then be used for hypothesis testing. In this section, we describe the mathematics associated with SVMs and review how they avoid the complexities usually associated with decision making in a high dimensional space.
2.6.1
Problem Statement and the Vapnik-Chervonenkis Dimension
Assume that a set of training vectors $\{x_i\}$ is available, drawn from some probability density function P(x, y) where $y \in Y : \{-1, 1\}$. Here, y represents the classification of the training data into one of two sets or hypotheses. Let y = -1 correspond to $H_0$ and y = 1 correspond to $H_1$. Then consider the set of training data:
$$(x_1, y_1), \ldots, (x_N, y_N) \in \mathbb{R}^N \times Y \tag{2.51}$$
Consider a classification function f such that $f : x \in \mathbb{R}^N \to Y$. f can be trained on the available data with the goal of finding the function f which best (in some sense) classifies the yet unseen data vectors x. To define what is meant by "best", consider any loss function l. For a given l, define the best f as the function which minimizes the expected error (risk) [29]:
$$R[f] = \int l(f(x), y)\, dP(x, y) \tag{2.52}$$
However, since P(x, y) is generally unknown, this problem often cannot be solved directly. In order to estimate the solution, one can minimize the empirical risk:
$$\hat{R}[f] = \sum_{i=1}^{n} l(f(x_i), y_i) \tag{2.53}$$
which is effectively an estimate of the risk function using only the data available to us. It is important to note that since full knowledge of the distribution P is rarely available, the function f which minimizes the empirical risk may tend to overfit and yield a complicated and unrealistic decision boundary. To address this dilemma, f can be restricted to functions whose complexity (as calculated from the Vapnik-Chervonenkis (VC) dimension) is low (see [29] and [28]).
Unfortunately, the equations governing the VC dimension are complicated and usually not of practical value [29]. If the search for f is restricted to linear forms:
$$f(x) = (w \cdot x) + b \tag{2.54}$$
(hyperplanes in some space), it can be shown [28] that the VC dimension is bounded
by the minimal distance from the hyperplane to a data point; this distance is called
the margin.
2.6.2
Kernel Functions and Avoiding the Complexities of a High Dimensional Space
Although the linear restrictions suggested above appear to be somewhat limiting, this apparent shortcoming can be overcome by mapping the observed data into high dimensional spaces. Consider a function $\Phi : \mathbb{R}^N \to F$, $x \mapsto \Phi(x)$, where F is a much higher dimensional space than $\mathbb{R}^N$. Our original data can be mapped to the new (high-dimensional) data set:
$$(\Phi(x_1), y_1), \ldots, (\Phi(x_N), y_N) \in F \times Y \tag{2.55}$$
and the linear mapping f becomes:
$$f(x) = (w \cdot \Phi(x)) + b \tag{2.56}$$
Now the obvious question arises: All we have done is increase the complexity of the problem we are trying to solve. By mapping to a much higher space, doesn't the curse of dimensionality ensure that the decision making process should be more difficult? [29]
While it seems as if the problem has become more complicated, statistical learning theory tells us that as long as the complexity of the decision surface remains low, learning in F may actually be easier than learning in $\mathbb{R}^N$. [29]
A simple example from [32] and [29] illustrates this point. Consider a set of data distributed in $\mathbb{R}^2$. The goal in this example is to devise a decision rule to discern between the two sets of data shown in figure 2.3. The decision boundary is shown by the dashed line; note that it is non-linear. Consider the feature space mapping:
$$(x_1, x_2) \mapsto (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2) := (z_1, z_2, z_3) \tag{2.57}$$
The same data transformed via the above mapping is re-plotted in fig. 2.4 where the decision boundary is now a plane in $\mathbb{R}^3$ (in the form of eq. 2.56). The decision boundary has been simplified by mapping the original data into a higher space. As Muller et al. state:
All of the variability and richness that one needs to have a powerful function class is then introduced by the mapping $\Phi$. [29]
where their function class is equivalent to the decision rule.
In the above example, the dimension of the space F was not large enough to be of concern, but actual data sets of arbitrary dimension combined with mappings of significant complexity often result in very large feature spaces which then become impossible to manage [29]. However, for certain spaces F (and mappings $\Phi$) there exist functions which allow one to compute scalar products of high dimensional vectors easily. Such functions are called kernel functions and are denoted k [29, 28]. For example, in the mapping presented earlier the dot product between the mapped vectors is easily calculated without actually mapping into the higher dimensional space [29]:
$$\Phi(x) \cdot \Phi(y) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)(y_1^2, \sqrt{2}\,y_1 y_2, y_2^2)^T = (x \cdot y)^2 = k(x, y) \tag{2.58}$$
Figure 2.3: Data separation in 2 Dimensions (axes: X1, X2)
Figure 2.4: Data separation in 3 Dimensions (axes: Z1, Z2, Z3)
Special rules can be applied to determine if a function is a valid kernel. In this thesis, we restrict ourselves to polynomial and Gaussian functions of the 2-norm of the data. These are valid kernel functions by Mercer's Theorem [29].
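The identity in eq. 2.58 is easy to check numerically. The sketch below is illustrative only: `phi` implements the mapping of eq. 2.57, `kernel` evaluates $(x \cdot y)^2$ directly in the original two-dimensional space, and the two test points are arbitrary values chosen for the example.

```python
import math


def phi(x):
    """Explicit feature map of eq. 2.57 from R^2 into R^3."""
    x1, x2 = x
    return (x1 ** 2, math.sqrt(2.0) * x1 * x2, x2 ** 2)


def kernel(x, y):
    """Second-order polynomial kernel k(x, y) = (x . y)^2 of eq. 2.58."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


x = (1.0, 2.0)
y = (3.0, -1.0)

# Dot product computed after mapping both points into the feature space.
explicit = sum(a * b for a, b in zip(phi(x), phi(y)))

# Same quantity computed without ever leaving R^2.
implicit = kernel(x, y)

print(explicit, implicit)
```

The two printed values agree up to floating-point rounding, which is the point of the kernel: the feature-space dot product is obtained at the cost of a dot product in the original space.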
2.6.3
Finding the Optimal Hyperplane
Previous work, including the illustrative example above, has shown that in some cases
mapping data into higher dimensions may decrease the complexity of the data separation problem. Furthermore, kernel functions provide a tool to obtain dot products of
vectors in high-dimensional spaces without actually performing the high-dimensional
mapping. However, a technique for determining the optimal hyperplane, so as to achieve the best possible performance, has not yet been presented. In order to find the optimal hyperplane, the discussion given in [29] is reviewed.
Optimal performance, and thus the optimal hyperplane, can be found by minimizing the expected risk. Since the expected risk is generally unknown, the optimal hyperplane is found by minimizing the upper bound on the expected risk via [28]:
$$R[f] \le \hat{R}[f] + \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\delta}{4}}{n}} \tag{2.59}$$
with probability of at least $1 - \delta$ for n > h, where h is the VC dimension of the function class F.
If the training data is assumed to be perfectly separable by f, then $\hat{R}[f]$ is zero, and the risk is bounded by a monotonic function of the VC dimension h [29].
Furthermore, Vapnik has shown [28] that for linear classifiers (like the one determined by the optimal hyperplane) the VC dimension itself is bounded by a monotonic function of $\|w\|$. Thus, one can find the optimal hyperplane by minimizing $\|w\|$ while maintaining perfect training data separation:
$$y_i((w \cdot \Phi(x_i)) + b) \ge 1, \quad i = 1, \ldots, n. \tag{2.60}$$
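This minimization is complicated, but through Lagrange multipliers, it is possible to arrive at the following quadratic programming formula [29, 32, 28]:
$$\max_{\alpha}\; \alpha^T \mathbf{1} - \frac{1}{2}\alpha^T D \alpha \tag{2.61}$$
subject to:
$$\alpha^T Y = 0 \tag{2.62}$$
$$\alpha \ge 0 \tag{2.63}$$
where:
$$\mathbf{1}^T = [1, \ldots, 1] \tag{2.64}$$
$$\alpha^T = [\alpha_1, \ldots, \alpha_n] \tag{2.65}$$
$$w = \sum_{i=1}^{n} \alpha_i y_i \Phi(x_i) \tag{2.66}$$
$$D_{ij} = y_i y_j\, k(x_i, x_j) \tag{2.67}$$
k being the kernel function.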
The decision statistic is then:
$$f(x) = \mathrm{sign}\left[\sum_{i=1}^{n} y_i \alpha_i (\Phi(x) \cdot \Phi(x_i)) + b\right] \tag{2.68}$$
or:
$$f(x) = \mathrm{sign}\left[\sum_{i=1}^{n} y_i \alpha_i k(x, x_i) + b\right] \tag{2.69}$$
In the above discussion it is assumed that the training data available is perfectly separable by a hyperplane in F. If this is not the case, a hyperplane that is a solution to:
$$\max_{\alpha}\; \alpha^T \mathbf{1} - \frac{1}{2}\left[\alpha^T D \alpha + \frac{\alpha_{\max}^2}{C}\right] \tag{2.70}$$
(subject to the same constraints) must be determined.
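Once the multipliers $\alpha_i$ and the offset b are known, evaluating the decision statistic of eq. 2.69 needs only kernel evaluations against the training vectors. The sketch below is a minimal illustration, not the SVM package used in this work: the function names are hypothetical, and the two-point training set with $\alpha = [0.5, 0.5]$, b = 0 is a hand-worked hard-margin solution for a linear kernel, chosen so that the constraints of eq. 2.60 and eq. 2.62 can be checked directly.

```python
def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))


def decision_value(x, train_x, train_y, alphas, b, kernel):
    """Argument of the sign function in eq. 2.69."""
    return sum(y_i * a_i * kernel(x, x_i)
               for x_i, y_i, a_i in zip(train_x, train_y, alphas)) + b


def classify(x, train_x, train_y, alphas, b, kernel):
    """Eq. 2.69; a value of exactly zero is assigned to +1 by convention here."""
    return 1 if decision_value(x, train_x, train_y, alphas, b, kernel) >= 0 else -1


# Hypothetical two-point training set, one vector per hypothesis.
train_x = [(-1.0, 0.0), (1.0, 0.0)]
train_y = [-1, 1]

# Hand-worked solution for this set: w = sum(alpha_i y_i x_i) = (1, 0), b = 0.
alphas = [0.5, 0.5]
b = 0.0

# Equality constraint of eq. 2.62: alpha^T Y = 0.
print(sum(a * y for a, y in zip(alphas, train_y)))

# Both training points sit exactly on the margin, y_i f(x_i) = 1 (eq. 2.60).
for x_i, y_i in zip(train_x, train_y):
    print(y_i * decision_value(x_i, train_x, train_y, alphas, b, linear_kernel))

# Classify two unseen points.
print(classify((2.0, 3.0), train_x, train_y, alphas, b, linear_kernel))
print(classify((-0.5, 1.0), train_x, train_y, alphas, b, linear_kernel))
```

Swapping `linear_kernel` for another valid kernel changes the feature space without changing the structure of the decision rule, although the multipliers would then have to be recomputed by the quadratic program.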

There is a substantial body of literature on solving the quadratic programming problem (for a list of references, see [29]). In this work, we use Cawley's SVM package (available from [30] or [51]). It achieves good performance by splitting the quadratic optimization problem into mini-problems of size two using the sequential minimal optimization technique [29].
Chapter 3
The Cramer-Rao Lower Bound
The response of the ground to wideband EMI sensors is a random vector b which
depends upon the makeup of the soil and the height of the sensor above the ground.
When measuring the EMI responses of buried targets in the earth, the variability in
the background response degrades our received signal. Thus, the measured response
from a buried M-14 landmine will differ significantly depending on the composition
of the soil under which the landmine is buried [43]. Since landmines are found
throughout the world in varying environments, background interference adversely
affects one's ability to define a robust non-adaptive decision algorithm.
One approach to reducing the effect of the background response is to take measurements near the potential target and use these measurements to estimate the background signal at the target location. In this chapter we discuss several models of the received background data and show that under certain assumptions the Cramer-Rao lower bound can be achieved by using the available background measurements to remove an estimate of the background signature from the potential target location.
In the measurements from the site in Virginia, two background signals were taken
for each potential target (see figures 2.1 and 2.2). We will assume that the background
response at the site is constant over a distance of one meter. This allows us to
model the background response as constant over the potential target location and
two neighboring background measurements. The assumption that the background
response is constant is reasonable since the composition of the soil is not expected
to change substantially over one meter and it has been shown [43] that sensor drift occurs over a longer time span than would be required to take EMI readings over a one meter square. For examples of in-phase and quadrature background signals, see figure 3.1.
Figure 3.1: Typical in-phase and quadrature background measurements versus log-frequency (axes: Response versus Log-Frequency; traces: Inphase, Quadrature)
This chapter is divided into four sections each considering a different model of
the background response: additive zero mean white Gaussian noise, additive white
Gaussian noise with an additive constant term across frequencies, additive white
Gaussian noise with an additive non-constant variance term across frequencies, and
additive white Gaussian noise with a multiplicative term across frequencies.
3.1
Additive White Noise
For each target we have three measurements from the GEM-3. They will be denoted $s_i$ and are modeled as:
$$s_1 = n_1 + b \tag{3.1}$$
$$s_2 = n_2 + b + \alpha r \tag{3.2}$$
$$s_3 = n_3 + b \tag{3.3}$$
where
b is some unknown (but constant across the three measurements) vector representing the ground response
$n_i$ is additive zero-mean white Gaussian noise [43]
r is the response of a buried target
$\alpha$ represents an arbitrary (non-negative) gain affecting the target response due to the target's depth beneath the ground and the sensor's height above the ground
The hypothesis test will be to decide between $\alpha > 0$ and $\alpha = 0$. First, we are concerned with obtaining the best estimate of b so that we can estimate r via
$$\hat{r} = s_2 - \hat{b}. \tag{3.4}$$
We propose the estimator
$$\hat{b} = \frac{s_1 + s_3}{2}. \tag{3.5}$$
This estimator is widely used in practice [43], but little analysis has been performed to evaluate its statistical properties. First we must show that $\hat{b}$ is unbiased. This is easily shown by:
$$E[\hat{b}] = E\left[\frac{s_1 + s_3}{2}\right] \tag{3.6}$$
$$= \frac{1}{2}E[n_1 + b + n_3 + b] \tag{3.7}$$
$$= b. \tag{3.8}$$
Note that b is a vector. In the following mathematical treatment we exploit the assumption that the interfering noise is always white [43], so the measurements between data points are uncorrelated. We use $b_i$ to represent the ith element of b and show that our estimators satisfy our criteria for general $b_i$ and thus for b (also, $x_{ji}$ represents the ith data point in vector x from ground measurement $j \in \{1, 2, 3\}$). The variance of $\hat{b}_i$ is:
$$\mathrm{VAR}[\hat{b}_i] = E[(\hat{b}_i - b_i)^2] \tag{3.9}$$
$$= E[\hat{b}_i^2] - b_i^2 \tag{3.10}$$
$$= \frac{1}{4}E[(s_{1i} + s_{3i})^2] - b_i^2 \tag{3.11}$$
$$= \frac{1}{4}E[n_{1i}^2 + n_{3i}^2 + 4n_{1i}b_i + 4n_{3i}b_i + 2n_{1i}n_{3i} + 4b_i^2] - b_i^2 \tag{3.12}$$
$$= \frac{\sigma_n^2}{2} \tag{3.13}$$
To determine optimality, we must show that the variance of $\hat{b}_i$ achieves the CRLB (eq. 2.5), using eq. 2.7 for the Fisher information. Since $s_1$ and $s_3$ are distributed as $N(b, \sigma_n^2 I)$, we have:
$$J(b_i) = -E_{b_i}\left[\frac{\partial^2}{\partial b_i^2} \ln(f(s_{1i}, s_{3i}|b_i))\right] \tag{3.14}$$
Simplifying from the inside out:
$$f(s_{1i}, s_{3i}|b_i) = C \exp\left[\frac{-(s_{1i} - b_i)^2 - (s_{3i} - b_i)^2}{2\sigma_n^2}\right] \tag{3.15}$$
$$\ln(f(s_{1i}, s_{3i}|b_i)) = \ln(C) - \frac{1}{2\sigma_n^2}(s_{1i}^2 - 2s_{1i}b_i + b_i^2 + s_{3i}^2 - 2s_{3i}b_i + b_i^2) \tag{3.16}$$
$$\frac{\partial}{\partial b_i}\ln(f(s_{1i}, s_{3i}|b_i)) = -\frac{1}{2\sigma_n^2}(-2s_{1i} + 2b_i - 2s_{3i} + 2b_i) \tag{3.17}$$
differentiating again yields:
$$-\frac{2}{\sigma_n^2} \tag{3.18}$$
Finally, taking the expected value and multiplying by -1, we have
$$J(b_i) = \frac{2}{\sigma_n^2} \tag{3.19}$$
And the CRLB is satisfied:
$$\frac{1}{J(b_i)} = \frac{\sigma_n^2}{2} = \mathrm{VAR}(\hat{b}_i) \tag{3.20}$$
Figure 3.2: Typical in-phase background measurements visibly shifted by some constant (axes: Response versus Log-Frequency)
Thus we have the optimal estimator of b given s1 and s3 .
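The result that the averaging estimator is unbiased with variance $\sigma_n^2/2$ under the additive white noise model can be checked with a small Monte Carlo sketch. Everything numeric here is an assumption made for illustration: a single frequency element with background value 10, unit noise variance, 20,000 trials and a fixed random seed. The function name is hypothetical.

```python
import random


def simulate_background_estimates(b_i, sigma_n, trials, seed=0):
    """Draw s1 = n1 + b and s3 = n3 + b (eq. 3.1, 3.3) and average them (eq. 3.5)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        s1 = b_i + rng.gauss(0.0, sigma_n)
        s3 = b_i + rng.gauss(0.0, sigma_n)
        estimates.append((s1 + s3) / 2.0)
    return estimates


b_i = 10.0      # true background value at one frequency (assumed)
sigma_n = 1.0   # noise standard deviation (assumed)
trials = 20000

estimates = simulate_background_estimates(b_i, sigma_n, trials)
sample_mean = sum(estimates) / trials
sample_var = sum((e - sample_mean) ** 2 for e in estimates) / (trials - 1)

crlb = sigma_n ** 2 / 2.0  # 1 / J(b_i), from the Fisher information of eq. 3.19

print("sample mean    :", sample_mean)
print("sample variance:", sample_var)
print("CRLB           :", crlb)
```

With this many trials the sample mean should land close to the true background value and the sample variance close to the bound, up to Monte Carlo error.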
In analyzing the experimental data, we noted that the data received from the
GEM-3 processor for adjacent background measurements was more variable than
could be accounted for by additive zero-mean Gaussian noise. The in-phase readings
appeared to be shifted by some additive constant across the frequency range, and
the quadrature readings appeared to either be corrupted by an additive term with
variance that increases across the frequency range, or have some small multiplicative
noise effects. For examples of these effects, see figures 3.2 and 3.3.
Figure 3.3: Typical quadrature background measurements corrupted by some multiplicative constant, or some additive term which increases with frequency (axes: Response versus Log-Frequency)
For clarity, we will refer to the additive-noise quadrature model as quadrature model 1, and the multiplicative-noise quadrature model as quadrature model 2. Intuitively, it is reasonable to assume that the in-phase and quadrature signals should be subject to the same noise effects (additive, multiplicative, etc.). However, it is
unclear which statistical assumptions better model the background interference. For
completeness, we present the Cramer-Rao lower bound derivations for both cases. We
proceed to determine whether the previously posed estimator is still optimal when
the assumptions regarding the statistics of the noise are modified.
3.2
Additive White Noise and DC Term (in-phase)
For the in-phase case we will model the extra interference as a random DC term $c_j$ with variance $\sigma_c^2$:
$$s_1 = n_1 + b + c_1 \tag{3.21}$$
$$s_2 = n_2 + b + \alpha r + c_2 \tag{3.22}$$
$$s_3 = n_3 + b + c_3 \tag{3.23}$$
We assume that the $c_j$ are distributed as $N(0, \sigma_c^2)$. Note that while the b and n vectors are functions of frequency, the DC terms $c_j$ are constant across frequency. Under these assumptions, the $s_i$ are distributed $N(b, I(\sigma_n^2 + \sigma_c^2))$. It is easy to show that the estimator $\hat{b}$ is unbiased, and that its variance is $\sigma_{\hat{b}}^2 = \frac{\sigma_n^2 + \sigma_c^2}{2}$. From the distribution of $f(s_{1i}, s_{3i}|b_i)$, we can show that the form of the CRLB corresponding to equation 3.16 is:
$$\ln(f(s_{1i}, s_{3i}|b_i)) = \ln(C) - \frac{1}{2(\sigma_n^2 + \sigma_c^2)}[(s_{1i} - b_i)^2 + (s_{3i} - b_i)^2] \tag{3.24}$$
Differentiating twice with respect to $b_i$ yields the equation corresponding to 3.18:
$$-\frac{2}{(\sigma_n^2 + \sigma_c^2)}. \tag{3.25}$$
Multiplying by negative one and taking the inverse, we again find the CRLB equal to the variance of the estimator and the estimator is thus optimal under the in-phase hypothesis.
3.3
Additive White Noise and Additive Function of Frequency (model 1 quadrature)
This derivation is very similar to the in-phase model. In fact, the in-phase model of an additive DC term is really a special case of the general additive vector encountered here. In this model, the extra interference is modeled as a vector $c_j$ whose individual terms $c_{ji}$ have variance $\sigma_{c_i}^2$:
$$s_1 = n_1 + b + c_1 \tag{3.26}$$
$$s_2 = n_2 + b + \alpha r + c_2 \tag{3.27}$$
$$s_3 = n_3 + b + c_3 \tag{3.28}$$
From the observed data, we can see that the variance of the $c_{ji}$ increases with frequency. We assume that the $c_{ji}$ are distributed as $N(0, \sigma_{c_i}^2)$. Let $\sigma_c^2$ be the vector of $c_i$ variances.
$$\sigma_c^2 = [\sigma_{c_1}^2, \ldots, \sigma_{c_i}^2, \ldots, \sigma_{c_n}^2]^T \tag{3.29}$$
The $c_j$ vectors are distributed as $N(0, I\sigma_c^2)$. Under these assumptions, the $s_j$ are distributed $N(b, I(\sigma_n^2 + \sigma_c^2))$. It is easy to show that the estimator $\hat{b}$ is unbiased, and that its variance is $\sigma_{\hat{b}}^2 = \frac{\sigma_n^2 + \sigma_c^2}{2}$. The distribution of the individual $s_{ji}$ is:
$$f(s_{1i}, s_{3i}|b_i) = C \exp\left[\frac{-(s_{1i} - b_i)^2 - (s_{3i} - b_i)^2}{2(\sigma_n^2 + \sigma_{c_i}^2)}\right] \tag{3.30}$$
$$\ln(f(s_{1i}, s_{3i}|b_i)) = \ln(C) - \frac{1}{2(\sigma_n^2 + \sigma_{c_i}^2)}(s_{1i}^2 - 2s_{1i}b_i + b_i^2 + s_{3i}^2 - 2s_{3i}b_i + b_i^2) \tag{3.31}$$
$$\frac{\partial}{\partial b_i}\ln(f(s_{1i}, s_{3i}|b_i)) = -\frac{(-2s_{1i} + 2b_i - 2s_{3i} + 2b_i)}{2(\sigma_n^2 + \sigma_{c_i}^2)} \tag{3.32}$$
differentiating again yields:
$$-\frac{2}{(\sigma_n^2 + \sigma_{c_i}^2)} \tag{3.33}$$
Taking the expected value and multiplying by -1, we have
$$J(b_i) = \frac{2}{(\sigma_n^2 + \sigma_{c_i}^2)} \tag{3.34}$$
And the CRLB is satisfied:
$$\frac{1}{J(b_i)} = \frac{(\sigma_n^2 + \sigma_{c_i}^2)}{2} = \mathrm{VAR}(\hat{b}_i) \tag{3.35}$$
3.4
Additive White Noise and Multiplicative Term (model 2 quadrature)
We now consider the quadrature case and assume that multiplicative Gaussian noise
is affecting the measured background signals. In this model, the multiplicative scaling
effects known to affect target responses are also assumed to affect the background
responses. This makes this model perhaps the most intuitively satisfying of all the
statistical models presented.
The multiplicative noise terms affecting the background responses are denoted $k_j$ and are assumed to be distributed as $N(1, \sigma_k^2)$. The received signals are modeled as:
$$s_1 = n_1 + k_1 b \tag{3.36}$$
$$s_2 = n_2 + k_2 b + \alpha r \tag{3.37}$$
$$s_3 = n_3 + k_3 b \tag{3.38}$$
Note that $s_1$ and $s_3$ are distributed $N(b, (b^2\sigma_k^2 + \sigma_n^2)I)$. Furthermore, the estimator $\hat{b} = \frac{s_1 + s_3}{2}$ is still unbiased.
Since the mean value ($b_i$) enters the signal distribution in the variance as well as the mean, the calculations are more complicated. Since we assume that the noise interference is white, we can consider the scalar equivalents of the pdf. The variance of $\hat{b}_i$ is given by:
$$\mathrm{VAR}[\hat{b}_i] = E[\hat{b}_i^2] - b_i^2 \tag{3.39}$$
$$= E\left[\left(\frac{n_{1i} + k_{1i}b_i + n_{3i} + k_{3i}b_i}{2}\right)^2\right] - b_i^2 \tag{3.40}$$
$$= \frac{1}{4}E[n_{1i}^2 + 2n_{1i}k_{1i}b_i + 2n_{1i}n_{3i} + 2n_{1i}k_{3i}b_i + k_{1i}^2 b_i^2 + 2k_{1i}b_i n_{3i} + 2k_{1i}b_i^2 k_{3i} + n_{3i}^2 + 2n_{3i}k_{3i}b_i + k_{3i}^2 b_i^2] - b_i^2 \tag{3.41}$$
Taking the expected value, we obtain:
$$\mathrm{VAR}[\hat{b}_i] = \frac{\sigma_{si}^2}{2} \tag{3.42}$$
with
$$\sigma_{si}^2 = (b_i^2\sigma_k^2 + \sigma_n^2) \tag{3.43}$$
To determine optimality, we begin with the conditional probability density function:
$$f(s_{1i}, s_{3i}|b_i) = \frac{1}{2\pi\sigma_{si}^2} \exp\left[-\frac{1}{2\sigma_{si}^2}((s_{1i} - b_i)^2 + (s_{3i} - b_i)^2)\right] \tag{3.44}$$
and apply equation 3.16. After taking the natural logarithm, the equation can be separated into two terms from the coefficient and exponential portions of equation 3.44:
$$\ln\left(\frac{1}{2\pi\sigma_{si}^2}\right) - \frac{1}{2\sigma_{si}^2}((s_{1i} - b_i)^2 + (s_{3i} - b_i)^2) \tag{3.45}$$
Differentiating equation 3.45 twice with respect to $b_i$ yields:
$$\begin{aligned}\big(&6\sigma_n^2 b_i^2\sigma_k^2 - 2\sigma_n^4 + 2s_{1i}b_i^3\sigma_k^4 - 6s_{1i}b_i\sigma_k^2\sigma_n^2 + 2s_{3i}b_i^3\sigma_k^4 - 6s_{3i}b_i\sigma_k^2\sigma_n^2 \\ &- 3s_{3i}^2\sigma_k^4 b_i^2 + s_{3i}^2\sigma_k^2\sigma_n^2 - 3\sigma_k^4 s_{1i}^2 b_i^2 + s_{1i}^2\sigma_k^2\sigma_n^2 + 2b_i^4\sigma_k^6 - 2\sigma_k^2\sigma_n^4\big) / (b_i^2\sigma_k^2 + \sigma_n^2)^3\end{aligned} \tag{3.46}$$
The expected value operator then replaces $s_{ji}^2$ with $\sigma_{si}^2 + b_i^2$ and $s_{ji}$ with $b_i$, yielding:
$$-2\,\frac{(2b_i^2\sigma_k^4 + b_i^2\sigma_k^2 + \sigma_n^2)}{(b_i^2\sigma_k^2 + \sigma_n^2)^2} \tag{3.47}$$
Multiplying by -1 and inverting, the Cramer-Rao lower bound is given by:
$$\frac{1}{2\,\frac{(2b_i^2\sigma_k^4 + b_i^2\sigma_k^2 + \sigma_n^2)}{(b_i^2\sigma_k^2 + \sigma_n^2)^2}} \tag{3.48}$$
or:
$$\frac{1}{2}\,\frac{\sigma_{si}^4}{2b_i^2\sigma_k^4 + \sigma_{si}^2} \tag{3.49}$$
Note that in this case, our estimator does not achieve the Cramer-Rao lower bound. In order to determine how close the variance of the proposed estimator is to the variance of the optimal estimator, consider the term:
$$2b_i^2\sigma_k^4 \tag{3.50}$$
in the denominator. Since this term differentiates the CRLB from the variance of the proposed estimator, as the term approaches zero, the difference between the variances becomes negligible.
Figure 3.4: Plots of the Cramer-Rao lower bound, calculated, and sample estimator variances versus the standard deviation of k. Parameters: $b_i = 10$, $\sigma_n^2 = 1$. (Plot title: Comparison of CRLB, Sample, and Calculated variances vs. $\sigma_k$; traces: CRLB, Sample Variance, $(\sigma_k^2 b^2 + \sigma_n^2)/2$)
To determine how well the proposed estimator performs compared to the CRLB, a set of data was generated under the proposed assumptions and the actual (sample) variance of the estimator was compared with the theoretically calculated variance of the estimator and the Cramer-Rao lower bound. Figure 3.4 shows the Cramer-Rao lower bound, the sample variance from a set of ten thousand data points, and the calculated variance of the estimator ($\sigma_s^2/2$). Note that the difference between the CRLB and the sample and computed variances is small, especially for small $\sigma_k^2$ values. In experiments, almost all estimated $\sigma_k^2$ values were found to be below 0.1 (except for the lowest frequency measurement which, due to near-zero average magnitude, had a high estimated $\sigma_k^2$). Thus, despite not achieving the CRLB, the proposed estimator is expected to perform well on this data set.
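A comparison of this kind can be sketched as follows. This is not the code used to produce figure 3.4; it is a minimal illustration under the multiplicative model of eq. 3.36 and 3.38, with a hypothetical function name, an assumed trial count and seed, and the figure's parameters $b_i = 10$, $\sigma_n^2 = 1$ evaluated at the single value $\sigma_k = 0.1$.

```python
import random


def estimator_variances(b_i, sigma_n, sigma_k, trials=20000, seed=1):
    """Return (sample variance, calculated variance, CRLB) for the averaging estimator.

    Measurements follow s_j = n_j + k_j * b_i with n_j ~ N(0, sigma_n^2)
    and k_j ~ N(1, sigma_k^2), as in eq. 3.36 and 3.38.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        s1 = rng.gauss(0.0, sigma_n) + rng.gauss(1.0, sigma_k) * b_i
        s3 = rng.gauss(0.0, sigma_n) + rng.gauss(1.0, sigma_k) * b_i
        estimates.append((s1 + s3) / 2.0)
    mean = sum(estimates) / trials
    sample_var = sum((e - mean) ** 2 for e in estimates) / (trials - 1)

    var_s = b_i ** 2 * sigma_k ** 2 + sigma_n ** 2                      # eq. 3.43
    calculated = var_s / 2.0                                            # eq. 3.42
    crlb = 0.5 * var_s ** 2 / (2.0 * b_i ** 2 * sigma_k ** 4 + var_s)   # eq. 3.49
    return sample_var, calculated, crlb


sample_var, calculated, crlb = estimator_variances(b_i=10.0, sigma_n=1.0, sigma_k=0.1)
print("sample variance    :", sample_var)
print("calculated variance:", calculated)
print("CRLB               :", crlb)
```

For these parameters the calculated variance is about 1.0 and the bound about 0.99, so the gap between the estimator and the bound is roughly one percent, consistent with the small-$\sigma_k$ behavior described above.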
We have shown that the intuitive estimation procedure that involves subtracting
the mean of the received background signals is optimal under three different assumptions regarding the underlying stochastic nature of the received signals:
1. if the signal is corrupted by additive white noise
2. if the signal is corrupted by additive white noise and a Gaussian-distributed
additive DC term (in-phase)
3. if the signal is corrupted by additive white noise and a Gaussian-distributed
additive vector (quadrature model 1)
and although not optimal, the intuitive procedure is a low-variance estimate when the
signal is corrupted by additive white noise and a Gaussian-distributed multiplicative
term (quadrature model 2). In the following chapters we will utilize the proposed
estimation technique to obtain estimates of the actual target responses for use in our
detection algorithms.
Chapter 4
Signal Processing Using Matched
Subspace Detectors
In chapter 3 we proposed an estimator of the background signal b which is an optimal
or low-variance estimator under several models of the underlying stochastic processes.
Using this estimator, we can now estimate the target response via
$$\hat{r} = s_2 - \hat{b}. \tag{4.1}$$
Using this target response estimate, a detection algorithm that distinguishes between
landmines and clutter and between different landmine types can be developed. In this
section the application of matched subspace filters to correctly identify and classify
landmines is presented.
4.1
Properties of Estimated Landmine Responses
We begin by inspecting the responses of different landmine types. As expected, the

landmines all have unique wideband EMI signatures [20].
Figure 4.1 shows the estimated in-phase and quadrature responses of five VS-50
landmines which were obtained by subtracting the estimated background as suggested
in Chapter 3. These five landmines were buried at depths from 0 to 1.875 inches.
Figure 4.2 shows the estimated in-phase and quadrature responses of three M-14
landmines which were obtained in the same manner. These three landmines were
buried at depths from 0.25 to 1.75 inches.
The responses of different landmine types are distinguishable from one another and, as has been shown (see [20, 43]), the general shape of the responses stays constant across measurements despite differences in target-sensor orientation and mine depth.
Figure 4.1: Signatures of VS-50 landmines versus log-frequency (plot title: Estimated VS50 Landmine Responses vs. Log-Frequency; traces: Inphase, Quadrature)
Note that the final data point, corresponding to 23,970 Hz, in the estimated signals appears to be markedly out of place - especially in the quadrature measurements.
Comparisons to previous work on landmine responses and the theoretical treatment
of responses given in chapter 2 led us to believe that the final data points are distorted. Whether this corruption is a function of the sensor (it is operating at the
very limit of its frequency range), the additional noise inherent to measurements at
these frequencies, or user error is unclear. Due to the apparent erroneous nature of
the highest frequency measurement, the final data point is excluded in the work that
follows.
Figure 4.2: Signatures of M-14 landmines versus log-frequency (plot title: Estimated M14 Landmine Responses vs. Log-Frequency; traces: Inphase, Quadrature)
Although the landmine responses are discernible from one another and maintain their approximate shape despite differences in their depth, it is clear that the energies of the responses from any particular landmine type vary widely. This problem is inherent in fielded landmine detection: the depth at which a landmine is buried substantially alters the energy of the received signal [36]. This is particularly evident in the quadrature responses of high metal-content mines like the VS-50 (see figure 4.1). This signal distortion can be modeled as an uncertainty parameter in the distributions of our data. Consider an unknown parameter $\alpha$ which acts as a multiplicative gain on the received data. Physically, $\alpha$ represents the depth at which the landmine is buried. An effective detector should be robust or invariant to changes in the uncertainty parameter $\alpha$. The matched subspace detector is such a detector [52].
4.2
Basis Estimation
In order to apply a matched subspace detector, a linear subspace containing the

received signals is needed. Alternatively, a set of basis functions that spans the
responses from a particular landmine type must be found.
Estimating a signal subspace is a well-studied problem [27], but the maximum likelihood solution was not appropriate in this situation. The maximum likelihood estimate of a signal subspace consists of the eigenvectors of the sample covariance matrix that correspond to its p largest eigenvalues [27]. However, the calibration data available often contained only between one and three instances of any particular landmine type. The sample covariance matrix in this case would clearly be inaccurate. Furthermore, if it is assumed that variation in target-sensor distance leads primarily to a change in the gain of the received signals, we can model the subspace in a much simpler fashion: as scaled versions of a mean vector.
In figure 4.3 an actual M-14 landmine quadrature response, the mean of all M-14
landmine responses, and an estimate of the actual M-14 using a scaled version of the
mean are shown. The error in the resulting signal estimate is about 0.7% of the
original signal's energy. In this particular case the estimation of a landmine response
as a scaled version of the mean of all landmine responses is very accurate, and this
result holds across all different landmine types (although the technique performs
significantly better on the quadrature data).
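The scaled-mean model can be illustrated with a minimal sketch in Python with NumPy (an assumption; the original work used MATLAB). The gain is taken here to be the least-squares scale factor, and the error is reported as residual energy relative to the energy of the original signal. The function name and the 9-point signatures are hypothetical and are not data from this work.

```python
import numpy as np

def fit_scaled_mean(response, mean_response):
    """Least-squares gain g minimizing ||response - g * mean_response||^2."""
    response = np.asarray(response, dtype=float)
    mean_response = np.asarray(mean_response, dtype=float)
    gain = float(response @ mean_response) / float(mean_response @ mean_response)
    estimate = gain * mean_response
    residual = response - estimate
    # Residual energy as a percentage of the original signal energy.
    error_pct = 100.0 * float(residual @ residual) / float(response @ response)
    return gain, estimate, error_pct

# Hypothetical quadrature signatures: a mean response and a half-gain copy
# of it with a small perturbation added.
mean_response = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 13.0, 14.0, 15.0])
perturbation = np.array([0.1, -0.1, 0.0, 0.1, 0.0, -0.1, 0.1, 0.0, -0.1])
actual = 0.5 * mean_response + perturbation
gain, estimate, error_pct = fit_scaled_mean(actual, mean_response)
```

A response that is an exact multiple of the mean gives zero error under this measure, so the error quantifies only the part of the signature that a pure gain change cannot explain.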
The decision to model the different responses as scaled versions of a single response is also intuitively satisfying, since it applies a simple law to account for
distance-induced differences in measurements. Furthermore, the scaling relationship
associated with target-sensor distance is well known [36, 35].
Figure 4.3: Actual, mean, and estimated signatures of M-14 landmines (plot of the mean, actual, and estimated M-14 responses versus log-frequency).
4.3 Designing the Matched Subspace Filter
The clutter present in the blind grid poses a unique problem to traditional subspace
detection techniques. Clutter is by nature difficult to classify (generally made up of
anthropic and natural conductors with an enormous range of sizes and shapes). Also,
the calibration data set contained only 20 clutter responses. One approach considered
was to model the clutter as a set of basis functions and have a clutter-detection
algorithm to compare against our landmine detection algorithm. However, attempts
to formulate a basis to model clutter are inherently limited since clutter is comprised
of an infinite set of possible shapes, sizes, and materials. Despite the wide range
of clutter which impedes most detection techniques, a matched subspace detector
should be somewhat naturally robust to clutter interference. Consider a piece of
random clutter whose response is some vector x. Our decision statistic is the cosine
statistic (equation 2.50):
\[ \frac{x P_H x'}{x x'} \tag{4.2} \]
The numerator can be considered a matched-energy detector since the output of

the numerator is the amount of the energy in x which lies in the subspace spanned
by <H>. We have assumed that there is only one basis vector in H corresponding to
the mean of the landmine responses for a given landmine-type. Therefore, for clutter
to register a large response in the detector, it must look much like a scaled version
of our landmine response (i.e. lie in the subspace spanned by the mean vector of the
landmine responses).
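As a minimal sketch of the cosine statistic (in Python with NumPy, an assumption; the original work used MATLAB), the projection onto the subspace can be computed by least squares and the projected energy divided by the total energy. The function name and the example vectors are hypothetical.

```python
import numpy as np

def cosine_statistic(x, basis):
    """Fraction of the energy of x that lies in the span of the basis columns."""
    x = np.asarray(x, dtype=float)
    # Accept a single basis vector or an (n, p) matrix of basis vectors.
    H = np.asarray(basis, dtype=float).reshape(len(x), -1)
    coeffs, *_ = np.linalg.lstsq(H, x, rcond=None)
    projection = H @ coeffs            # P_H applied to x
    return float(x @ projection) / float(x @ x)

mean_vector = np.array([1.0, 2.0, 3.0])          # hypothetical mean signature
deep_mine = 0.01 * mean_vector                   # same shape, much lower gain
clutter = np.array([3.0, 0.0, -1.0])             # orthogonal to the mean

stat_mine = cosine_statistic(deep_mine, mean_vector)
stat_clutter = cosine_statistic(clutter, mean_vector)
```

The statistic is unchanged when x is multiplied by any nonzero gain, which is the invariance to burial depth described above.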
The standard matched subspace detector is appropriate for finding a single landmine type amongst background or clutter (binary hypothesis test). However, the
blind grid is populated with various landmine types. In the multiple hypothesis
test case our detector must decide between H0 and all the alternative hypotheses:
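{H1, H2, ..., Hn}. The standard likelihood ratio then becomes:
\[ \frac{p(x \mid \{H_1, H_2, \ldots, H_n\})}{p(x \mid H_0)} \tag{4.3} \]
\[ = \frac{p(x \mid H_1)p(H_1) + p(x \mid H_2)p(H_2) + \ldots + p(x \mid H_n)p(H_n)}{p(x \mid H_0)} \tag{4.4} \]
\[ = \sum_i \lambda_i(x)\, p(H_i) \tag{4.5} \]
where λ_i(x) = p(x|Hi)/p(x|H0) is the likelihood ratio for mine type i and p(Hi) represents the a priori probability of mine type i. Since all mine types
are considered equally likely a priori, this reduces (up to a constant factor) to:
\[ \lambda(x) = \sum_i \lambda_i(x) \tag{4.6} \]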
Equation 4.6 suggests implementing a bank of matched subspace filters and summing their outputs to form a decision statistic. However, this formulation also assumes that the distribution p(x|H0 ) is known, but in this work, the distribution of
clutter is unknown and difficult to estimate. As an illustrative example of the problems encountered when p(x|H0 ) is unknown, consider n matched subspace filters each
tuned to a specific landmine type. When a landmine response is presented to the
bank of filters, a typical set of outputs contains one large response coinciding with
the matched subspace filter tuned to that landmine type. When a clutter response
is fed to the same bank of filters, although no filter bank produces a particularly
large result, the clutter vector generates significant responses from several different
filter banks because the clutter model in the denominator which would normally offset the numerator is missing. That clutter induces significant responses from several
filter banks makes intuitive sense since all of the landmine responses, when taken
together, span a large subspace and clutter will undoubtedly have some energy in
the span of this space. For typical examples of the matched subspace filter bank
outputs for clutter and landmine data, see figure 4.4. Note that the sum of the outputs across filter banks for the input clutter vector is larger than the sum for the
landmine vector. In this case, a better (although sub-optimal) decision statistic than
the summation across the filter banks is the maximum value across the filter banks.
Although this technique is not equivalent to the Bayesian solution to the multiple
hypothesis test problem, the similarities are evident. The Bayesian solution to the
multiple hypothesis testing problem is to choose Hi such that Hi maximizes the a
posteriori probability p(Hi |x) [48].
Figure 4.4: Comparison of filter bank outputs resulting from landmine and clutter
responses. Note that the sum across the filter banks from the clutter response is
larger than from the landmine response. (Plot of filter bank output versus filter bank number; the landmine outputs sum to 1.0110 and the clutter outputs sum to 1.2803.)
A bank of matched subspace detectors was thus generated, with each filter tuned
to a specific landmine type. The decision statistic chosen was the maximum value
across the bank of filters.
\[ \lambda = \max_i \lambda_i \tag{4.7} \]
Despite not being optimal, we shall see that the performance of this statistic is very
good. Furthermore, the maximum value across the filter banks provides an intuitive
method to perform landmine classification: the landmine type corresponding to the
largest filter bank output is considered our best guess of the underlying landmine type.
This is different from the maximum a posteriori Bayesian solution which chooses Hi
to maximize p(Hi |x); here Hi is chosen to maximize the fraction of the energy of x that lies in the subspace <Hi>,
which is an intuitive measure of p(x|Hi ).
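The difference between summing and maximizing across the bank can be illustrated with a minimal sketch in Python with NumPy (an assumption; the original work used MATLAB). Each filter is taken to be a rank-one matched subspace filter built from a mean vector. The three mean vectors and the two test inputs are hypothetical and chosen only to reproduce the qualitative behavior described above.

```python
import numpy as np

def filter_bank_outputs(x, mean_vectors):
    """Cosine statistic of x against each rank-one subspace in the bank."""
    x = np.asarray(x, dtype=float)
    outputs = []
    for m in mean_vectors:
        m = np.asarray(m, dtype=float)
        outputs.append((x @ m) ** 2 / ((m @ m) * (x @ x)))
    return np.array(outputs)

def max_statistic(x, mean_vectors):
    """Decision statistic (maximum output) and index of the winning filter."""
    outputs = filter_bank_outputs(x, mean_vectors)
    best = int(np.argmax(outputs))
    return float(outputs[best]), best

# Hypothetical, non-orthogonal mean signatures for three mine types.
bank = [np.array([1.0, 0.0, 0.0]),
        np.array([0.6, 0.8, 0.0]),
        np.array([0.0, 0.6, 0.8])]
mine = np.array([2.0, 0.0, 0.0])       # scaled copy of the first signature
clutter = np.array([1.0, 1.0, 1.0])    # some energy in every subspace

mine_sum = filter_bank_outputs(mine, bank).sum()
clutter_sum = filter_bank_outputs(clutter, bank).sum()
mine_max, mine_type = max_statistic(mine, bank)
clutter_max, _ = max_statistic(clutter, bank)
```

In this example the clutter vector produces the larger sum while the landmine vector produces the larger maximum, and the index of the winning filter serves as the landmine type estimate.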
Note that since equation 4.2 contains a normalization term in the denominator
(x x'), the maximum output of the detector is one
regardless of the energy of the input vector. This is important in a bank of detectors
since the numerator (x P_H x') will very often produce a large result for a
high-energy input x.
The invariance that matched subspace detectors provide to gain occasionally has
some drawbacks. The primary drawback in this work stems from very low energy
clutter which often looks like deeply buried high-energy landmines. Consider the
VS-50 landmine (see fig. 4.1) which has a relatively high energy and rather flat frequency response. A substantial amount of low energy clutter also has a flat frequency
response. As a result, scaled low energy clutter often looks like a VS-50 landmine
to a matched subspace detector.
However, our prior knowledge regarding the depths at which landmines can be
buried leads us to conclude that very low energy flat signatures are not landmines
buried meters in the ground but rather small pieces of clutter. In this work we
assume that landmines will not be buried beyond their tactical depths. We further
assume that the distribution of landmine depths in the blind grid is uniform and commensurate with the depths found in the calibration grid. Under these assumptions,
we implemented an energy pre-screener that evaluates the energy of each potential
target vector to ensure that it is commensurate with the current filter bank landmine
type (within one order of magnitude of the lowest and highest energies from the
calibration grid for that particular landmine type). If the energy is within limits, the
subspace detector proceeds normally; otherwise, the output of that particular bank of the subspace
detector (the one for which the input energy was found to be outside the reasonable range of
energies for that landmine type) is manually assigned a low value.
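A minimal sketch of such a pre-screener follows, in Python with NumPy (an assumption; the original work used MATLAB). It reads "within one order of magnitude" as the range from one tenth of the lowest calibration energy to ten times the highest, which is one possible interpretation. The function names, the value assigned to a rejected bank, and the example vectors are hypothetical.

```python
import numpy as np

LOW_OUTPUT = 0.0  # hypothetical value assigned to a screened-out filter bank

def energy_limits(calibration_responses):
    """Allowed energy range for one mine type, padded by a factor of ten."""
    energies = [float(np.dot(r, r)) for r in calibration_responses]
    return min(energies) / 10.0, max(energies) * 10.0

def prescreened_output(x, mean_vector, limits):
    """Cosine statistic for one filter, or LOW_OUTPUT if the energy is implausible."""
    x = np.asarray(x, dtype=float)
    m = np.asarray(mean_vector, dtype=float)
    energy = float(x @ x)
    low, high = limits
    if not (low <= energy <= high):
        return LOW_OUTPUT
    return float((x @ m) ** 2 / ((m @ m) * energy))

# Hypothetical flat signature, as for a mine with a flat frequency response.
flat_mean = np.ones(4)
limits = energy_limits([1.0 * flat_mean, 4.0 * flat_mean])

mine_output = prescreened_output(2.0 * flat_mean, flat_mean, limits)
clutter_output = prescreened_output(0.01 * flat_mean, flat_mean, limits)
```

Without the energy check, both inputs would score one, since the cosine statistic ignores gain; with it, the very low energy flat input is rejected for this mine type.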
Besides discriminating between clutter and landmines, detection algorithms must
also discriminate between empty ground signatures and landmines. While the blind
grid contains several blank squares containing neither anthropic clutter nor landmines,
no such squares were measured in the calibration grid, so our detector may be subject
to false alarms caused by empty grid squares. We did not consider this a serious problem because background-corrected responses from blank grid squares should contain
very little energy and be automatically rejected by the energy pre-screener.
4.4 Matched Subspace Results
To determine the effectiveness of our matched subspace detector in discriminating

landmines from clutter, receiver operating characteristic (ROC) curves were generated for the calibration and blind grids. The calibration grid ROCs were generated
manually, and the blind grid ROCs were generated by the government sponsor of the
test site. We expect our calibration lane ROCs to be very good since the filter was
trained on that data, while good results from the blind grid would be an indicator of
the algorithm's robustness.
Before sending our results to be scored, the algorithm was run on the calibration
grid to determine its effectiveness. Two separate detectors utilizing the in-phase and
quadrature data were created and tested. As can be seen in figure 4.5, the algorithm
performs significantly better on the quadrature data than on the in-phase data. In
fact, the in-phase results are not significantly better than a simple energy detector.
We believe the poor in-phase performance is due to the relatively high amount of
noise inherent in the in-phase readings. Alternatively, the in-phase data may be
more difficult to model as a linear combination of a set of vectors. Future efforts that
may improve the in-phase processor results are discussed in chapter 7.
Figure 4.5: Comparison of in-phase and quadrature matched subspace receiver operating characteristics from the calibration grid (plot of Pd versus Pf for the quadrature and in-phase MSS detectors).
Figure 4.6 shows the ROCs of the matched subspace filter operating on the blind
and calibration data as well as a simple baseline energy detector operating on the
blind grid data. The matched subspace detector is nearly as effective on the blind
grid as on the calibration grid. This indicates that the algorithm is fairly robust and
that the assumptions made regarding the interfering noise statistics are reasonable.
Further, we note the substantial decrease in the false alarm rate as compared to the
simple energy detector. The matched subspace detector achieves a false alarm rate of
11% (at a probability of detection of 95%) in the blind grid, which is an improvement
of over a factor of 6 versus the energy detector.
The major difference between the two matched subspace curves appears between
the 60% and 95% probability of detection range. We believe the difference between
the two curves here stems from the vast amount of clutter present in the blind grid
as compared to the calibration lanes. The smoothness of the blind-grid ROC stems
from the 800 or so pieces of clutter present therein, and the discrete-jump nature of
the calibration ROC stems from the 20 pieces of clutter found there.
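For reference, an empirical ROC and the false alarm rate at a given detection rate can be computed as in the following minimal sketch, written in Python with NumPy (an assumption; the original work used MATLAB, and the blind grid ROCs were scored by the government sponsor). The function names and the scores are hypothetical.

```python
import numpy as np

def roc_curve(scores, is_mine):
    """Empirical (Pf, Pd) pairs obtained by thresholding at every observed score."""
    scores = np.asarray(scores, dtype=float)
    is_mine = np.asarray(is_mine, dtype=bool)
    thresholds = np.unique(scores)[::-1]          # highest threshold first
    pd = np.array([(scores[is_mine] >= t).mean() for t in thresholds])
    pf = np.array([(scores[~is_mine] >= t).mean() for t in thresholds])
    return pf, pd

def pf_at_pd(pf, pd, target_pd):
    """Smallest false alarm rate at which the detection rate reaches target_pd."""
    return float(pf[pd >= target_pd].min())

# Hypothetical decision statistics for four mines and five clutter objects.
scores = [0.99, 0.95, 0.90, 0.60, 0.70, 0.50, 0.30, 0.20, 0.10]
is_mine = [True, True, True, True, False, False, False, False, False]
pf, pd = roc_curve(scores, is_mine)
pf_full_detection = pf_at_pd(pf, pd, 1.0)
```

With N clutter objects, Pf can only move in steps of 1/N, which is consistent with the discrete jumps seen in a calibration ROC built from 20 pieces of clutter and the smoother curve obtained from roughly 800.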
Figure 4.6: Comparison of quadrature matched subspace detector and baseline
energy detector receiver operating characteristics from the blind and calibration grids (plot of Pd versus Pf for the calibration MSS, blind MSS, and blind energy detector ROCs).
Chapter 5
Decay Rate Estimation
As discussed in chapter 1, a popular method of discriminating landmines from clutter
is through characteristic decay rate estimates. In this chapter we discuss why decay
rates may be useful for target identification and discrimination, the estimation procedure that has been utilized, the relative locations of poles from the calibration lanes,
and a simple method of discrimination using Gaussian probability density functions.
5.1 Decay Rates
The EMI responses of a highly conducting body are given by equations 2.1 and 2.2
which are repeated here for convenience.
\[ H(\omega) = a + \sum_n \frac{\omega b_n}{\omega - j\omega_n} \tag{5.1} \]
\[ S(t) = a\delta(t) + \sum_n A_n e^{-\omega_n t} \tag{5.2} \]
There is a substantial amount of work in the literature pertaining to estimating

decay rates from time-domain signals (see [14, 15, 17, 18, 19, 53, 54]). Decay rates
have been investigated for several reasons. First, they provide a compact space with
which to model landmine responses. In our experiments two decay rates are used to
model a signal which has eighteen data points (9 in-phase and 9 quadrature), thus
the computational load on our detection algorithm is reduced (note that obtaining
these decay rates is, however, computationally very expensive). Furthermore, as has
been noted [37], the decay rates should be purely target dependent at the frequencies
the GEM-3 sensor is operating at. Arguments against using decay rates cite the
computational load required to estimate these parameters and the fact that decay
rates do not provide a sufficient statistic [15].
5.2 Estimation Procedure
In this work, we focused on estimating the two primary decay rates from our EMI
data. In order to estimate these two decay rates, an objective function was defined as
the mean-square error between our estimated responses and the data. The MATLAB
function FMINUNC (in the Optimization Toolbox) was then used to find the
five parameters that minimize this error for each landmine: the DC term a, the two gains b1 and
b2, and the two decay rates.
Often (especially when modeling clutter), the algorithm used by FMINUNC could
not find potential solutions any significant distance from the initial values provided.
This may be due to a local minimum in the objective function near the initial guess.
In these cases (when the resulting parameters were deemed too close to the initial
guesses), the initial decay rates were varied over a wide range and the optimization
was carried out at each point. The resulting estimate with the lowest error was chosen
as the best estimate of the target decay rates.
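The idea of scanning the decay rates over a wide range and keeping the lowest-error fit can be sketched as follows, in Python with NumPy. This is not the FMINUNC procedure used in this work: it swaps in a grid search over candidate decay-rate pairs, with the DC term and gains solved by linear least squares at each pair. It assumes a two-pole response of the form H(w) = a + b1 w/(w - j w1) + b2 w/(w - j w2) with real a, b1, and b2. The frequencies, candidate rates, and synthetic target are hypothetical.

```python
import numpy as np
from itertools import combinations

def model_matrix(freqs, w1, w2):
    """Columns multiplying a, b1, b2 in the assumed two-pole model."""
    w = np.asarray(freqs, dtype=float)
    cols = np.column_stack([
        np.ones(w.shape, dtype=complex),
        w / (w - 1j * w1),
        w / (w - 1j * w2),
    ])
    # Stack in-phase (real) and quadrature (imaginary) parts so that the
    # fitted a, b1, b2 stay real.
    return np.vstack([cols.real, cols.imag])

def fit_decay_rates(freqs, response, candidate_rates):
    """Return (error, w1, w2, [a, b1, b2]) for the best candidate pair."""
    response = np.asarray(response, dtype=complex)
    target = np.concatenate([response.real, response.imag])
    best = None
    for w1, w2 in combinations(sorted(float(r) for r in candidate_rates), 2):
        A = model_matrix(freqs, w1, w2)
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        err = float(np.sum((A @ coef - target) ** 2))
        if best is None or err < best[0]:
            best = (err, w1, w2, coef)
    return best

# Synthetic, noise-free target with known parameters (hypothetical values).
freqs = np.logspace(2, 4.3, 9)
response = (0.5 + 10.0 * freqs / (freqs - 1j * 2000.0)
            + 3.0 * freqs / (freqs - 1j * 15000.0))
candidates = [500.0, 1000.0, 2000.0, 4000.0, 8000.0, 15000.0, 30000.0]
err, w1, w2, coef = fit_decay_rates(freqs, response, candidates)
```

Because the model is linear in a, b1, and b2 once the decay rates are fixed, only the two decay rates need to be searched; a finer grid or a local optimizer started from the best grid point would be needed for measured data.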
The error in these models was very low across a wide range of mine energies.
Figures 5.1 and 5.2 show the parametrized fits to the data for one high-energy and
one low-energy landmine.
Since the responses modeled with the estimated decay rates approximate the actual responses well and the
estimated response shapes are highly correlated, it is intuitive to suppose that the
decay rates estimated from different responses from the same landmine type would be
clustered together to some degree. Such clustering would indicate that the estimated
decay rates are drawn from some target-dependent distribution and could facilitate
the formulation of a detector based on them.
Figure 5.1: Estimation of VS-50 Response (plot of in-phase and quadrature data and fits versus log-frequency; fitted response error = 0.33042%).

Several attempts were made to use clustering algorithms available in MATLAB to
group the different landmines automatically. However, the results obtained seemed
slightly counter-intuitive and did not take into account our a priori knowledge of
which estimates were from which landmine types. Figure 5.3 illustrates a clustering
of decay rates by landmine type made manually, and figure 5.4 provides a closeup of
the same figure.
Note the high degree of intra-mine type correlation. The majority of landmines for
a given type were grouped together. The only two instances where all the responses
from a particular landmine type were not grouped together were the M-14 HE / non-HE landmines. In the calibration lanes, two of the M-14 landmines were measured
with their primary high-explosive fills present. This altered the responses enough to
warrant the separation of these M-14s from their counterparts (the difference between
HE and non-HE landmine responses is documented in [43]).
Figure 5.2: Estimation of M-14 Response (plot of in-phase and quadrature data and fits versus log-frequency; fitted response error = 0.047565%).
The decay rate estimates for the clutter from the calibration grid are shown in
figure 5.5. As can be seen from the figure, the clutter decay rates are distributed
throughout the range of frequencies but are more densely concentrated at low values
of the first decay rate.
5.3 Gaussian Models and Detection
One of the simplest approaches to incorporate the estimated decay rates into a detection algorithm is to model their statistical distribution with a two-dimensional Gaussian
probability density function and generate detectors based on these PDFs. By combining the probability density functions for the different landmine types, a mixture of
Gaussian densities is formed. For each cluster of decay rates (clusters do not necessarily represent all landmines of a given landmine type) the sample mean and variance
were calculated using standard techniques. However, with so few data points for each
landmine type, these estimates are suspect. For example, the calibration grid contains only one instance of certain landmines. These solitary landmines are clustered
alone. To estimate the variance of their decay rate distribution functions, an estimate
of the average decay rate variance across landmine types was used. Also, no attempt
was made to generate estimated correlation matrices since there was rarely enough
data to make for a decent estimation. Contours of some of the resulting estimated
Gaussian distributions are shown in figure 5.6. The combination of the separate decay rate PDFs results in a mixture of Gaussian PDFs across the range of decay rate values.
Figure 5.3: Estimated landmine decay rates plotted against the first and second decay rates in Hz. Each
landmine type is represented by a different shape. (Plot titled "Clustering of Mine Decay Rate Estimates by Mine Type"; legend: VS50, TS50, M14, PMA3, VAL69, VS2.2, M19, TMA4, TM62P3, T72, TM46, V31.6.)
Figure 5.4: Estimated landmine decay rates plotted against the first and second decay rates in Hz
(close-up). Each landmine type is represented by a different shape. Note the high
degree of spatial correlation between landmines of each type.
We assumed that the clutter decay rates were totally random (uniform across
the range of frequencies) since we had little information to base any general clutter
model upon. Under this assumption the optimal detector for each landmine type is a
threshold on the mixture of Gaussian PDFs (or a monotonic function thereof). Since
we have estimated the means and variances of the landmine clusters, this decision
statistic is a GLRT. To make the detector capable of discerning between all landmine
types and clutter, we followed the filter bank procedure outlined in Chapter 4.
Thus, our results could be used to discriminate between landmine types by choosing
the filter bank with the highest response to an estimated set of decay rates.
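A minimal sketch of this detector follows, in Python with NumPy (an assumption; the original work used MATLAB). Each cluster is modeled with an uncorrelated two-dimensional Gaussian, solitary clusters borrow the average variance of the other clusters, and the decision statistic is the largest log-density across the bank (the logarithm being a monotonic function of the PDF). The function names, cluster labels, and decay-rate values are hypothetical.

```python
import numpy as np

def fit_clusters(clusters):
    """Mean and per-axis variance per cluster; singletons use the average variance."""
    data = {name: np.asarray(pts, dtype=float) for name, pts in clusters.items()}
    means = {name: pts.mean(axis=0) for name, pts in data.items()}
    variances = {name: pts.var(axis=0, ddof=1)
                 for name, pts in data.items() if len(pts) > 1}
    average_var = np.mean(list(variances.values()), axis=0)
    return {name: (means[name], variances.get(name, average_var)) for name in data}

def log_gaussian(x, mean, var):
    """Log of an axis-aligned (uncorrelated) Gaussian density at x."""
    x = np.asarray(x, dtype=float)
    return float(-0.5 * np.sum((x - mean) ** 2 / var + np.log(2.0 * np.pi * var)))

def classify(x, models):
    """Winning cluster and its log-density, used as the decision statistic."""
    scores = {name: log_gaussian(x, m, v) for name, (m, v) in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical decay-rate pairs (Hz) for three clusters, one of them solitary.
clusters = {
    "A": [[1000.0, 20000.0], [1200.0, 22000.0], [1100.0, 21000.0]],
    "B": [[5000.0, 9000.0], [5400.0, 9500.0]],
    "C": [[3000.0, 30000.0]],
}
models = fit_clusters(clusters)
label_near_a, _ = classify([1050.0, 20500.0], models)
label_near_c, _ = classify([3100.0, 29500.0], models)
```

Thresholding the returned log-density gives the detection decision, and the winning label gives the landmine type estimate.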
Figure 5.5: Estimated clutter decay rates plotted against the first and second decay rates in Hz. Note that
the estimated decay rates for clutter objects are spread throughout a wide frequency
range.
5.4 Decay Rate Estimation Results
In this section we briefly discuss the ROC curves generated from the parameter-based
detector discussed above. Figure 5.7 shows the ROC generated from the calibration
data.
Note that the algorithm does not achieve a 95% detection rate until its false alarm
rate approaches 35%, and the algorithm only achieves a 100% detection rate at a 45%
false alarm rate. Furthermore, we have good reason to believe that the detector will
not be robust in a situation with a large amount of clutter. This belief is based on
the very small amount of training data with which the Gaussian distributions were
generated (often using only one or two data points). Since this detector did not
perform as well as the matched subspace detector, and since we have reason to doubt
its robustness to unseen data, the blind grid results were not sent to the government
sponsor to be scored. In Chapter 7 we discuss ways by which we might improve the
detector prior to sending the results to be scored.
Figure 5.6: Gaussian PDF contours with scattered landmine and clutter decay rates
Figure 5.7: ROC for Gaussian-PDF estimated decay rate-based detector operating
in the calibration grid (plot of Pd versus Pf).
Chapter 6
Support Vector Machine Algorithms
Support vector machines are statistical learning machines as discussed in Chapter 2.
The learning algorithm in an SVM requires the solution of the quadratic programming
problem which maximizes the margin between the optimal hyperplane and the data
set (see equation 6.1). This chapter outlines some of the uses and advantages of support vector machines before discussing the implementation of an SVM for landmine
detection.
\[ \max_{\alpha} \; \alpha^T \mathbf{1} - \frac{1}{2} \alpha^T D \alpha \tag{6.1} \]
Support vector machines are versatile tools that can be applied to almost all
classification and detection problems. Their power stems from mapping observed
data to a high dimensional space through a kernel function. Their simplicity stems
from restricting the decision boundary to a hyperplane in the high dimensional space.
Support vector machines have been successfully used in digit recognition, object
recognition, function estimation, and time-series prediction (for a list of references,
see [29] and [30]).
As shown in figures 2.3 and 2.4, support vector machines can find linear solutions to decision problems that require non-linear solutions in the observed space.
Furthermore, by maximizing equation 6.1 one guarantees that the optimal linear
decision surface in the high dimensional space has been found.
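The quantity being maximized can be evaluated with a minimal sketch in Python with NumPy (an assumption; the original work used MATLAB). It assumes the standard SVM dual form, in which the objective is the sum of the multipliers minus one half of a quadratic form in a matrix D with entries D_ij = y_i y_j k(x_i, x_j). The function name, the linear kernel, and the two-point data set are hypothetical.

```python
import numpy as np

def dual_objective(alpha, X, y, kernel):
    """Sum of multipliers minus half the quadratic form in D_ij = y_i y_j k(x_i, x_j)."""
    alpha = np.asarray(alpha, dtype=float)
    y = np.asarray(y, dtype=float)
    K = np.array([[kernel(a, b) for b in X] for a in X])
    D = np.outer(y, y) * K
    return float(alpha.sum() - 0.5 * alpha @ D @ alpha)

def linear_kernel(a, b):
    return float(np.dot(a, b))

# Two one-dimensional points, one per class. The usual equality constraint
# (sum of alpha_i * y_i equal to zero) forces the two multipliers to be equal.
X = [np.array([1.0]), np.array([-1.0])]
y = [1.0, -1.0]
at_optimum = dual_objective([0.5, 0.5], X, y, linear_kernel)
below = dual_objective([0.25, 0.25], X, y, linear_kernel)
above = dual_objective([0.75, 0.75], X, y, linear_kernel)
```

For this toy problem the objective along the feasible line is 2a - 2a^2, which peaks at a = 0.5; a quadratic programming solver or sequential minimal optimization performs the same maximization for realistic data sets.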
6.1 Building the Support Vector Machine
In this work, support vector machines were trained on three sets of data: estimated
decay rates, estimated signal responses, and the outputs from the bank of matched
subspace detectors.
Originally, a support vector algorithm was created using MATLAB's QUADPROG function (available in the Optimization Toolbox). This function worked well
on small data sets, but performance was poor on large data sets and in cases where a
soft classifier was required to separate the data (if the SVM cannot perfectly separate
the calibration data using a hyperplane, the constraints on equation 6.1 need to be
relaxed, see equation 2.70). The performance degradation was probably due to the
lack of support vector machine specific optimizations in the QUADPROG function
(see [29] and [31] for a summary of numerical techniques for improving support vector
machine computation times). To reduce computation time, a support vector toolbox
was chosen that used the sequential minimal optimization technique (see the end of
chapter 2 for details and references).
6.2 Model and Parameter Selection and Implementation
One of the biggest concerns when creating a support vector machine is the choice
of kernel function. A significant body of work has been devoted to optimizing the
choice of kernel and kernel parameters (for references see [29]). Most of the techniques
available for kernel function selection are computationally intensive and require a
large number of data points to analyze.
In our work, we based the kernel selection solely on the model performance in
the calibration grid. Two kernels were considered: Gaussian and polynomial. These
kernels are shown in equations 6.2 and 6.3 respectively. The Gaussian kernel consistently outperformed the polynomial kernel and was chosen to act as our kernel
function. The good performance of the Gaussian kernel is intuitively satisfying because the main metric in the Gaussian kernel is the 2-norm of the difference between the vectors, and it is
reasonable to suppose that the distance between our data vectors is a useful metric
for discrimination [31].
\[ k(x_1, x_2) = \exp\left( -\frac{|x_1 - x_2|^2}{c} \right) \tag{6.2} \]
\[ k(x_1, x_2) = (x_1 \cdot x_2 + \theta)^d \tag{6.3} \]
The only parameter in a Gaussian kernel function is the variance term c. Three
different variances were used for the three different support vector machines (decay rates, landmine responses, and MSS outputs). Experiments were performed
by shifting the variance term across several orders of magnitude for each data set.
The resulting ROC curves were analyzed, as were the decision boundaries each SVM
generated. Based on performance observations, the following c values were chosen:
1 × 10^7, 1 × 10^3, and 1 (corresponding to the variance terms in the decay rate, full signal,
and matched subspace detector based support vector machines respectively).
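The role of the variance term can be illustrated with a minimal sketch in Python with NumPy (an assumption; the original work used MATLAB). It assumes the Gaussian kernel has the form exp(-|x1 - x2|^2 / c) and the polynomial kernel the form (x1 . x2 + theta)^d; the name theta for the polynomial offset, the function names, and the example vectors are hypothetical.

```python
import numpy as np

def gaussian_kernel(x1, x2, c):
    """Gaussian kernel with variance term c on the squared 2-norm distance."""
    diff = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
    return float(np.exp(-(diff @ diff) / c))

def polynomial_kernel(x1, x2, theta, d):
    """Polynomial kernel of degree d with offset theta."""
    return float((np.dot(x1, x2) + theta) ** d)

# Two hypothetical decay-rate pairs (Hz); their squared distance is 2e6.
rates_a = [2000.0, 15000.0]
rates_b = [3000.0, 14000.0]
k_wide = gaussian_kernel(rates_a, rates_b, 1e7)    # c matched to the data scale
k_narrow = gaussian_kernel(rates_a, rates_b, 1.0)  # c far too small for this data
k_poly = polynomial_kernel([1.0, 2.0], [3.0, 4.0], 1.0, 2)
```

With c far below the typical squared distance, the kernel value underflows to zero for every pair of distinct points, so each training point is similar only to itself; this is one reason the variance term has to be tuned separately for data sets with such different scales.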
To perform detection across a range of different landmine types, the winner-takes-all
methodology was utilized [33]. The winner-takes-all technique is akin to the filter
bank technique described in Chapter 4. The maximum result across the bank of SVMs
was chosen as the threshold value, and the landmine type corresponding to that filter
bank became our landmine type estimate.
To generate different SVMs corresponding to the different landmine types, the
landmines were clustered according to type, except that the two M-19 landmines
were separated from one another. The M-19 landmines required separation because
of their close proximity to a large number of clutter decay rates. As a result, the
decision boundary generated by the SVM was very counterintuitive when the M-19 mines were grouped together. Also, as with the matched subspace detector, the
M-14 mines containing the high explosive (HE) were separated from the other M-14
landmines.
Since the support vector machine is a learning algorithm, a shortage of training
data will result in poor generalization capability. To overcome the lack of training
data, each landmine response was estimated three separate ways - once with the mean
of the nearest background signals (as discussed in Chapter 3), and twice more by
using each of the neighboring background signals as our estimate of the background.
Although this estimation technique is non-optimal, a support vector machine has no
a priori information regarding the data being modeled, so a large amount of training
data must be available for the SVM to generalize well.
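A minimal sketch of this three-way estimation follows, in Python with NumPy (an assumption; the original work used MATLAB). It assumes that background removal is a simple subtraction of a background estimate from the raw measurement; the function name and the example vectors are hypothetical.

```python
import numpy as np

def augmented_responses(measurement, background_a, background_b):
    """Three response estimates from one measurement and two nearby backgrounds."""
    m = np.asarray(measurement, dtype=float)
    a = np.asarray(background_a, dtype=float)
    b = np.asarray(background_b, dtype=float)
    # Mean of the neighbors first, then each neighbor on its own.
    backgrounds = [(a + b) / 2.0, a, b]
    return [m - bg for bg in backgrounds]

# Hypothetical raw measurement and its two neighboring background readings.
measurement = np.array([10.0, 12.0, 15.0])
background_a = np.array([1.0, 1.2, 0.9])
background_b = np.array([0.8, 1.0, 1.1])
estimates = augmented_responses(measurement, background_a, background_b)
```

The three estimates are not independent samples (the first is the average of the other two), so this triples the number of training vectors without tripling the information they carry.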
Furthermore, two versions of each of the three different support vector machines
were created. All of the SVMs were trained to reject clutter, but we were also
interested in how the SVMs would behave if the SVM for each landmine type was
trained to reject all other landmine types. These different SVMs will be referred to
as non-rejecting (SVMs that were not trained to reject other landmine types) and
rejecting (those that were trained to reject other landmine types).
Under the above parameters the SVMs were trained. In the next section we
discuss the results from these three support vector machines.
6.3 Support Vector Machine Results
To better visualize the decision boundaries generated by our SVMs, the SVMs' decision contours were plotted along with the locations of nearby clutter and landmine
decay rates (see Figure 6.1). Note the intricate curve of the support vector machine
decision boundary, and especially the class two margin. We also note that one of the
landmines being modeled falls outside the decision boundary. This is to be expected
if a hyperplane in F cannot separate the data perfectly. We also note that the support vector machine may be slightly over-fitting the clutter, which we believe is more
uniformly distributed than the decision boundary would indicate.
Figure 6.1: Support Vector Machine decision boundaries for non-rejecting SVMs
and relevant landmine and clutter parameter locations from the calibration grid.
The calibration grid ROC curves for the three different support vector machines
are shown in figures 6.2 and 6.3.
We note that while all of the support vector
machines achieve a detection rate of at least 95% at 10% false alarm rate, several of
the support vector machines do not achieve 100% detection until high false alarm rates
have been encountered. Also, the receiver operating characteristics of the rejecting
and non-rejecting versions of the SVMs did not differ significantly in the calibration
data.
The blind grid results for these SVMs are shown in figure 6.4. The difference
between the rejecting and non-rejecting versions of the SVMs was negligible in the
blind grid, so only the results from the non-rejecting SVMs are shown.
Figure 6.2: Receiver operating characteristics of non-rejecting support vector machines trained on decay rates, matched subspace outputs, and full signal responses
operating in the calibration grid (plot of Pd versus Pf).
We note that the resulting ROC curves have long tails, i.e. they rise very
quickly to a probability of detection around 60% or 70%, but increase slowly beyond
that point. In fact, none of the support vector machines achieves a 100% detection
rate until their false alarm rate is above 99%.
The support vector machine which performed best was the SVM trained using
the entire signal. This makes sense since there is more information regarding the
true nature of a buried object in the full target response than in the outputs from
a bank of matched subspace filters or the estimated decay rates. Essentially, these
parameterizations of the signal do not provide a sufficient statistic.
Figure 6.3: Receiver operating characteristics of rejecting support vector machines
trained on decay rates, matched subspace outputs, and full signal responses operating
in the calibration grid (plot of Pd versus Pf).
Since the support vector machine trained on the two estimated decay rates regularly
outperforms the support vector machine trained on the matched subspace outputs,
we can conclude that the estimated parameters hold more information than the output of the matched subspace filter. This is reasonable since the outputs of a hypothesis test (like the matched subspace filter) ideally would compress the information in
a given signal to a single bit denoting the presence of H1 or H0 .
Figure 6.4: Receiver operating characteristics for three different support vector machines operating in the blind grid. [Plot omitted: "ROC Curves for three different SVMs in Blind Grid", Pd versus Pf, with curves for Energy, SVM on MSS results, SVM on parameters, and SVM on full signal.]
Our goal in training the SVM on the matched subspace filter bank outputs was
to exploit the inaccuracy inherent in modeling mine responses as scaled versions
of one another. We also hoped to discern whether any information could be gathered from
the amount of energy in each of the landmine bins, not just the maximum across the
bins. For example, if an average VS-50 landmine has some amount of energy which
lies in the subspace of an average M-14 landmine, the M-14 filter bank would register
a small amount of energy whenever a VS-50 landmine was present. However, since
the matched subspace detector's decision statistic is the maximum value across the
bins, it would not use this possible correlation. In that case, it might be possible to
train a support vector machine that takes into account the information in the
other filter banks that the matched subspace detector missed.
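The distinction between the maximum bin and the full vector of bin energies can be sketched as follows. This is an illustration under stated assumptions: each mine type is represented by a single basis vector (the rank-one model described in the text), the basis vectors are invented and deliberately correlated, and the helper names are hypothetical.

```python
import math


def unit(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def bin_energy(signal, basis):
    """Energy of the signal's projection onto the 1-D subspace spanned by basis."""
    b = unit(basis)
    return sum(s * x for s, x in zip(signal, b)) ** 2


def filter_bank(signal, bases):
    """Energy registered in every bin of the bank."""
    return {name: bin_energy(signal, basis) for name, basis in bases.items()}


# Hypothetical, deliberately correlated basis vectors (not measured signatures).
bases = {"VS-50": [1.0, 0.0, 0.0], "M-14": [0.6, 0.8, 0.0]}
signal = [2.0, 0.0, 0.0]  # a scaled copy of the hypothetical "VS-50" basis

energies = filter_bank(signal, bases)  # what an SVM on the bank outputs would see
statistic = max(energies.values())     # what the max-across-bins detector keeps
```

With these made-up vectors the M-14 bin registers roughly 36 percent of the energy in the VS-50 bin. That cross-bin energy is discarded by the maximum, and it is exactly the kind of information the SVM trained on the filter bank outputs was meant to use.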
To summarize, if there were no overlap in the information extracted by the
matched subspace detector filter banks, the only important information in the filter bank outputs would be the maximum value across the banks. If there were
significant overlap between the filter banks, it might be reasonable to assume that
there was useful information in the remaining filter bank outputs. The
poor performance of the SVM operating on the matched subspace detector outputs
suggests that there is little information to be gleaned from the outputs of the other
filter banks, and thus little overlap among filter banks.
These results are discussed in more detail in chapter 7.
Chapter 7
Conclusions and Future Work
In the preceding chapters, several statistics-based methods of discriminating landmines from clutter using wideband EMI responses were discussed. From the government-generated receiver operating characteristics, it is clear that significant improvement in landmine detection techniques is possible using statistical signal processing
algorithms.
All of the statistical algorithms discussed are based on mine responses estimated
with the intuitive background removal technique presented in chapter 3. A statistical
analysis of the estimation technique was presented, and the estimation procedure was
found to achieve the Cramer-Rao lower bound under several assumptions regarding
the underlying stochastic process. In the case of multiplicative Gaussian interference,
the proposed background subtraction technique does not achieve the CRLB but is
still a low variance estimator. One possible avenue for further work is to determine
the optimal or maximum likelihood estimator under the multiplicative interference
assumption.
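As a reminder of what achieving the bound means, consider the simplest additive white Gaussian noise case. The model below is an assumed textbook example, not the background removal model of chapter 3: N independent measurements of a constant response s, with known noise variance.

```latex
% Assumed model (illustration only): constant response s in white Gaussian noise
x_n = s + w_n, \qquad w_n \sim \mathcal{N}(0, \sigma^2), \qquad n = 1, \dots, N

% Fisher information and the resulting bound for any unbiased estimator
I(s) = -\mathrm{E}\!\left[ \frac{\partial^2 \ln p(\mathbf{x}; s)}{\partial s^2} \right]
     = \frac{N}{\sigma^2},
\qquad
\operatorname{var}(\hat{s}) \ge \frac{1}{I(s)} = \frac{\sigma^2}{N}

% The sample mean is unbiased and meets the bound with equality
\hat{s} = \frac{1}{N} \sum_{n=1}^{N} x_n,
\qquad
\operatorname{var}(\hat{s}) = \frac{\sigma^2}{N}
```

An estimator whose variance equals the bound, as the sample mean does here, cannot be improved upon by any other unbiased estimator under the same assumptions.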
In chapter 4 the application of matched subspace detectors to the landmine detection problem was explored. Modeling each landmine subspace as scaled versions
of a single basis function made it possible to exploit the matched subspace detector's gain invariance. The matched subspace detector and support vector machine receiver operating
characteristics are shown in figure 7.1. Under the stated assumptions regarding the
underlying signals, the matched subspace detector is an optimal detector, so its dominance over the support vector machines reinforces our assumptions regarding the
nature of the received signals.
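Gain invariance can be illustrated with a normalized statistic: the fraction of the measurement's energy that lies along the basis vector. The normalized form used here is an assumption made for illustration and may differ from the exact statistic defined in chapter 4; the basis and measurement values are hypothetical.

```python
def normalized_statistic(signal, basis):
    """Fraction of signal energy lying along basis (cosine squared of the angle)."""
    dot = sum(s * b for s, b in zip(signal, basis))
    energy_signal = sum(s * s for s in signal)
    energy_basis = sum(b * b for b in basis)
    return dot * dot / (energy_signal * energy_basis)


basis = [1.0, 2.0, 2.0]        # hypothetical mine signature
measurement = [1.1, 1.9, 2.2]  # hypothetical noisy response

# The same response observed with a larger and a smaller unknown gain.
strong = [4.0 * x for x in measurement]
weak = [0.25 * x for x in measurement]

reference = normalized_statistic(measurement, basis)
```

Because the gain appears squared in both the numerator and the denominator, `strong` and `weak` produce the same statistic as `measurement`, so the detector does not need to know the amplitude of the response.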
Figure 7.1: Comparison of detector operating characteristics for matched subspace and support vector machines. [Plot omitted: "Comparison of ROC curves for different detectors", Pd versus Pf, with curves for MSS, SVM on MSS results, SVM on parameters, and SVM on full signal.]
We believe that improvements to the matched subspace detector can be made in
several ways. First, the EMI responses of buried landmines may not be adequately
modeled as multiplicative scalings of a single basis vector; they may be better modeled
as linear combinations of several basis vectors spanning a linear subspace. As mentioned earlier, there
was not enough EMI data available to us to generate a set of basis functions for
each landmine type. An attempt to model all the mines as functions in a single
linear subspace through maximum likelihood subspace estimation also failed, since the
landmine responses are so distinct. If significantly more data were available, it
may be possible to accurately model the subspace from which each landmine response
was drawn.
Also, figure 4.4 shows the performance disparity between the quadrature and
in-phase matched subspace detectors operating in the calibration grid. It is our
belief that the in-phase responses are not properly modeled as scaled versions of
one another, and that improved in-phase receiver operating characteristics could be
achieved through a better model of the underlying in-phase subspace.
Matched subspace detectors also have subspace-noise cancellation or rejection
capabilities. It would be interesting to compare the results of a matched subspace
filter built to pass energy of one landmine type and reject energies of all others
with our results here. Finally, although a good subspace model for the response of
clutter to wideband EMI radiation was not found, it may be possible to generate a
set of basis functions which can adequately model most clutter, and incorporate that
into the matched subspace formulation.
In chapter 5 the applications of frequency-domain decay rate estimation to landmine detection were explored. The resulting decay rate estimates were used in Gaussian and SVM detection algorithms with mixed results. We believe that the underlying distribution of the decay rates is Gaussian and feel that the lack of robustness
and high false alarm rates encountered with the Gaussian-based detector stem from
the lack of training data available to us. A larger sample of experimental data
may help to formulate more accurate estimates of the underlying pdfs and improve
the detector's performance.
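A Gaussian-based detector of this kind can be sketched as a log-likelihood ratio between two fitted densities. The version below is a simplified illustration, not the detector of chapter 5: it assumes a diagonal covariance, uses invented decay rate pairs in arbitrary units, and all function names are hypothetical.

```python
import math


def fit_gaussian(samples):
    """Per-dimension mean and unbiased variance (diagonal covariance assumed)."""
    n, dims = len(samples), len(samples[0])
    means = [sum(s[d] for s in samples) / n for d in range(dims)]
    variances = [
        sum((s[d] - means[d]) ** 2 for s in samples) / (n - 1)
        for d in range(dims)
    ]
    return means, variances


def log_pdf(x, model):
    """Log density of x under an independent Gaussian in each dimension."""
    means, variances = model
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, means, variances)
    )


def log_likelihood_ratio(x, mine_model, clutter_model):
    """Positive values favor the mine hypothesis, negative values favor clutter."""
    return log_pdf(x, mine_model) - log_pdf(x, clutter_model)


# Hypothetical decay rate pairs in arbitrary units (assumption, not thesis data).
mines = [(1.0, 5.0), (1.2, 5.5), (0.9, 4.8), (1.1, 5.2)]
clutter = [(3.0, 9.0), (2.5, 8.0), (3.5, 10.0), (2.8, 9.5)]
mine_model = fit_gaussian(mines)
clutter_model = fit_gaussian(clutter)
```

With only four samples per class, the fitted variances are themselves noisy estimates, and the log-likelihood ratio depends strongly on them. This is one way a small training set can translate into a lack of robustness.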
In chapter 6, the creation of support vector machines for the landmine detection
problem was presented. We feel that the same issues that impede the decay rate
and matched subspace approaches seriously inhibit the application of support vector
machines to the landmine detection problem. Since support vector machines have
no a priori information on which to base a decision statistic, a lack of training data is
especially detrimental to their performance. Consider the application of the support
vector machine to the matched subspace detection results. In this case, a simple
threshold on the maximum value across the filter banks provides the excellent results
obtained by the matched subspace detector. It appears that the support vector
machine based on that data is over-complicating the decision surface and finding
a much more computationally intensive (yet sub-optimal) solution. This can be
explained by the lack of available data: the support vector machine is trying
to learn the correct decision surface and cannot do so with only 27 landmine data
points. Alternatively, the Gaussian kernel may not be the optimal kernel in the case
of matched subspace outputs. The performance of all the support vector machines
presented may also be improved by altering the variance parameter in our Gaussian
kernel function and by exploring alternative kernel functions. Future applications of
support vector machines to the problem of landmine detection should involve larger
training sets. Furthermore, a mathematical method for obtaining the optimal kernel
function parameters should be explored.
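The influence of the variance parameter can be seen directly in the kernel matrix. The sketch below assumes the common parameterization k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), which may differ from the one used in chapter 6; the feature vectors and names are hypothetical.

```python
import math


def gaussian_kernel(x, y, sigma):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), an assumed parameterization."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def gram_matrix(points, sigma):
    """Kernel evaluated between every pair of training points."""
    return [[gaussian_kernel(p, q, sigma) for q in points] for p in points]


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]  # hypothetical feature vectors

narrow = gram_matrix(points, sigma=0.1)   # each point resembles only itself
wide = gram_matrix(points, sigma=10.0)    # all points look nearly identical
```

A very small width drives the matrix toward the identity, so the machine can memorize the training points without generalizing. A very large width makes every entry close to one, so the classes become hard to separate. With few training points, the useful range between these extremes is difficult to locate, which is why a principled selection method is worth pursuing.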
In the support vector machines presented, a winner-takes-all multiple hypothesis test was chosen because of its relation to the filter bank methodology used in
the matched subspace and decay rate detectors. However, the winner-takes-all approach is not optimal. Future work should apply one of the alternative SVM multiple
hypothesis frameworks [33] to the landmine detection and classification problem.
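A winner-takes-all decision over per-class outputs can be sketched in a few lines. The rejection rule shown (declare clutter when even the best class score falls below a threshold) is an assumption made for illustration and is not necessarily the rule used for the rejecting SVMs of chapter 6; the scores are hypothetical.

```python
def winner_takes_all(scores, reject_below=None):
    """Pick the class with the largest score; optionally reject weak winners."""
    label = max(scores, key=scores.get)
    if reject_below is not None and scores[label] < reject_below:
        return "clutter"
    return label


# Hypothetical per-class SVM outputs for two measurements.
confident = {"VS-50": 0.3, "M-14": 1.2}
doubtful = {"VS-50": -0.5, "M-14": -0.2}

decision_a = winner_takes_all(confident)
decision_b = winner_takes_all(doubtful, reject_below=0.0)
```

The scheme mirrors the maximum-across-bins rule of the filter bank detectors, which is why it was a natural choice. Its weakness is that the individual machines are trained separately, so their outputs are not guaranteed to be comparable on a common scale.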
The application of wideband EMI data to the problem of landmine detection has
been presented. Matched subspace detectors were considered for landmine discrimination. These detectors have invariances which fit the landmine detection problem
well. The decay rates inherent to EMI responses of buried conductors were estimated
and a simple detection algorithm was built based on them. Support vector machines,
a type of learning algorithm, were applied to the landmine detection hypothesis testing problem with mixed results. The matched subspace detector was seen to provide
the lowest false alarm probability at very high detection rates. Furthermore, suggestions for future work in this vein have been made which may further reduce the false
alarm rates inherent to landmine/clutter discrimination.
Bibliography
[1] Adopt a Minefield Association. Adopt a minefield. http://www.landmines.org/.
[2] Report to the U.S. Congress. Unexploded ordnance clearance: A coordinated approach to requirements and technology development. Office of the Undersecretary of Defense, Washington, DC, Mar. 25, 1998.
[3] International standards for humanitarian mine clearance operations. http://www.un.org/Depts/dpko/mine/Standard/s-index.htm.
[4] Y. Wang, I. D. Longstaff, C. J. Leat, and N. V. Shuley. Complex natural resonances of conducting planar objects buried in a dielectric half-space. IEEE Transactions on Geoscience and Remote Sensing, 39(6):1183-1189, June 2001.
[5] C. Chen, M. B. Higgins, K. O'Neill, and R. Detsch. Ultrawide-bandwidth fully-polarimetric ground penetrating radar classification of subsurface unexploded ordnance. IEEE Transactions on Geoscience and Remote Sensing, 39(6):1221-1230, June 2001.
[6] A. van der Merwe and I. J. Gupta. A novel signal processing technique for clutter reduction in GPR measurements of small, shallow land mines. IEEE Transactions on Geoscience and Remote Sensing, 38(6):2627-2637, November 2000.
[7] B. Karlsen, J. Larsen, H. B. D. Sorensen, and K. B. Jakobsen. Comparison of PCA and ICA based clutter reduction in GPR systems for anti-personal landmine detection. In Proceedings of the 11th IEEE Signal Processing Workshop on Statistical Signal Processing, pages 146-149, 2001.
[8] P. D. Gader, M. Mystkowski, and Y. Zhao. Landmine detection with ground penetrating radar using hidden Markov models. IEEE Transactions on Geoscience and Remote Sensing, 39(6):1237-1244, June 2001.
[9] W. R. Scott, J. S. Martin, and G. D. Larson. Experimental model for a seismic landmine detection system. IEEE Transactions on Geoscience and Remote Sensing, 39(6):1155-1164, June 2001.
[10] J. M. Sabatier and N. Xiang. Investigation of acoustic-to-seismic coupling to detect buried antitank landmines. IEEE Transactions on Geoscience and Remote Sensing, 39(6):1146-1154, June 2001.
[11] Y. Das, J. E. McFee, and R. H. Chesney. Time domain response of a sphere in the field of a coil: Theory and experiment. IEEE Transactions on Geoscience and Remote Sensing, GE-22:360-367, July 1984.
[12] L. Carin, H. Yu, Y. Dalichaouch, A. R. Perry, P. V. Czispott, and C. E. Baum. On the wideband EMI response of a rotationally symmetric permeable and conducting target. IEEE Transactions on Geoscience and Remote Sensing, 39(6):1206-1213, June 2001.
[13] J. T. Miller, T. H. Bell, J. Soukup, and K. Keiswetter. Simple phenomenological models for wideband frequency-domain electromagnetic induction. IEEE
[14] S. L. Tantum and L. Collins. Performance bounds for target identification using decay rates estimation from EMI measurements. In Proceedings of the Geoscience and Remote Sensing Symposium, volume 5, pages 2278-2280, 2000.
[15] S. L. Tantum and L. Collins. A comparison of algorithms for subsurface target detection and identification using time-domain electromagnetic induction data. IEEE Transactions on Geoscience and Remote Sensing, 39(6):1299-1306, June 2001.
[16] B. Barrow and H. H. Nelson. Model-based characterization of electromagnetic induction signatures obtained with the MTADS electromagnetic array. IEEE
[17] L. S. Riggs, J. E. Mooney, and D. E. Lawrence. Identification of metallic mine-like objects using low frequency magnetic fields. IEEE Transactions on Geoscience and Remote Sensing, 39(1):56-66, 2001.
[18] G. D. Sower and S. P. Cave. Detection and identification of mines from natural magnetic and electromagnetic resonances. Proceedings of the SPIE, April 1995.
[19] Y. Das, J. E. McFee, J. Toews, and G. C. Stuart. Analysis of an electromagnetic induction detector for real-time location of buried objects. IEEE Transactions on Geoscience and Remote Sensing, 28:278-287, May 1990.
[20] I. J. Won, D. A. Keiswetter, and T. H. Bell. Electromagnetic induction spectroscopy for clearing landmines. IEEE Transactions on Geoscience and Remote Sensing, 39(4):703-709, April 2001.
[21] P. Gao and L. Collins. A comparison of optimal and suboptimal processors for classification of buried metal objects. IEEE Signal Processing Letters, 6(8):216-218, August 1999.
[22] P. Gao, L. Collins, N. Geng, and L. Carin. Comparison of PCA and ICA based clutter reduction in GPR systems for anti-personal landmine detection. In Proceedings of the Geoscience and Remote Sensing Symposium, volume 3, pages 1819-1822, 1999.
[23] L. Collins, P. Gao, and S. Tantum. Model-based statistical signal processing using electromagnetic induction data for landmine detection and classification. In Proceedings of the 11th IEEE Signal Processing Workshop on Statistical Signal Processing, pages 162-165, 2001.
[24] L. Collins, L. Makowsky, P. Gao, J. Moulton, D. Reidy, and D. Weaver. Improving detection of low-metallic content landmines using EMI data. In Proceedings of the Geoscience and Remote Sensing Symposium, pages 1631-1633, 2000.
[25] L. L. Scharf and B. Friedlander. Matched subspace detectors. IEEE Transactions on Signal Processing, 42(8):2146-2157, August 1994.
[26] D. Keiswetter, E. Novikova, I. J. Won, T. Hall, and D. Hanson. Electromagnetic induction spectroscopy for ordnance identification. In Proc. SAGEEP, pages 743-751, 1999.
[27] L. L. Scharf. Statistical Signal Processing: Detection, Estimation, and Time Series Analysis. Addison-Wesley, Reading, MA, 1991.
[28] V. N. Vapnik. Statistical Learning Theory. John Wiley and Sons, Inc., New York, NY, 1998.
[29] K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf. An introduction to kernel-based learning algorithms. IEEE Transactions on Neural Networks, 12(2):181-201, March 2001.
[30] Kernel machines. http://www.kernel-machines.org/.
[31] Nello Cristianini and John Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, Cambridge, 2000.
[32] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20:273-297, 1995.
[33] D. J. Sebald and J. A. Bucklew. Support vector machines and the multiple hypothesis test problem. IEEE Transactions on Signal Processing, 49(11):2865-2872, November 2001.
[34] T. H. Bell, B. J. Barrow, and J. T. Miller. Subsurface discrimination using electromagnetic induction sensors. IEEE Transactions on Geoscience and Remote Sensing, 39(6):1286-1293, June 2001.
[35] Norbert Geng, Carl E. Baum, and Lawrence Carin. On the low-frequency natural response of conducting and permeable targets. IEEE Transactions on Geoscience and Remote Sensing, 37(1), January 1999.
[36] C. E. Baum, editor. Detection and Identification of Visually Obscured Targets. Taylor and Francis, London, U.K., 1998.
[37] L. Carin. Wideband electromagnetic induction spectroscopy. Technical report, Geophex, Ltd., 2000.
[38] G. D. Sower and S. P. Cave. Detection and identification of mines from natural magnetic and electromagnetic resonances. Proceedings of the SPIE, 2496, April 1995.
[39] N. Geng, P. Garger, L. Collins, L. Carin, D. Hansen, D. Keiswetter, and I. J. Won. Wideband electromagnetic induction for metal-target identification: Theory, measurement, and signal processing. IEEE Transactions on Geoscience and Remote Sensing, UNKNOWN, UNKNOWN UNKNOWN.
[40] D. W. A. H. Trang and P. V. Czispott. Characterization of small metallic objects and nonmetallics anti-personnel mines. Proc. SPIE Detection and Remediation Technologies for Mines and Minelike Targets II, 3079, 1997.
[41] I. J. Won, D. Keiswetter, G. Fields, and L. Sutton. GEM-3: a monostatic broadband electromagnetic induction sensor. Journal of Environmental Engineering and Geophysics, 2(1):129, 1997.
[42] Ping Gao. Improved approaches to landmine remediation using signal detection and estimation theory. Master's thesis, Duke University, 1997.
[43] Leslie Collins, Ping Gao, Deborah Schofield, John Moulton, Larry Makowski, Denis Reidy, and Richard Weaver. A statistical approach to landmine detection using broadband electromagnetic induction data. In press, IEEE Transactions on Geoscience and Remote Sensing.
[44] P. Gao, L. Collins, P. M. Garner, N. Geng, and L. Carin. Classification of landmine-like metal targets using wideband electromagnetic induction. IEEE Transactions on Geoscience and Remote Sensing, 38(3):1352-1361, May 2000.
[45] JUXOCO, Ft. Belvoir, VA. Hand Held Metallic Mine Detector Performance Baselining Collection Plan, December 1998.
[46] Joint Unexploded Ordnance Coordination Office. Hand held metallic mine detector performance baselining collection plan. http://www.uxocoe.brtrc.com/testdata/PDF/HHTESTPLAN.PDF.
[47] Athanasios Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, New York, NY, 3rd edition, 1991.
[48] R. N. McDonough and A. D. Whalen. Detection of Signals in Noise. Academic Press, New York, NY, 2nd edition, 1995.
[49] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. John Wiley and Sons, Inc., New York, NY, 1991.
[50] S. Kraut, L. L. Scharf, and L. T. McWhorter. Adaptive subspace detectors. IEEE Transactions on Signal Processing, 49(1):1-16, January 2001.
[51] Matlab support vector machine toolbox. http://theoval.sys.uea.ac.uk/~gcc/svm/toolbox/.
[52] S. Kraut and L. L. Scharf. The CFAR adaptive subspace detector is a scale-invariant GLRT. IEEE Transactions on Signal Processing, 47(9):2538-2541, September 1999.
[53] W. M. Steedly and R. L. Moses. The Cramer-Rao lower bound for pole and amplitude coefficient estimates of damped exponential signals in noise. IEEE Transactions on Signal Processing, 41:1305-1318, March 1993.
[54] Ping Gao and L. M. Collins. A theoretical performance analysis and simulation of time-domain EMI sensor data for land mine detection. IEEE Transactions on Geoscience and Remote Sensing, 38(2):2042-2055, July 2000.
