Localization of Sound Sources
Studies on Mechatronics
Spring 2009
Christian Lenz
Declaration of Originality
I, Christian Lenz, hereby declare that this Studies on Mechatronics with the title "Localization of Sound Sources" was written by myself, that all sources used are declared, and that all citations are clearly marked.
Zürich
May 10, 2009
Christian Lenz
Abstract
The goal of this work is to give an overview of the field of artificial sound localization techniques. For this purpose the problem is decomposed into three parts, treated in this document: firstly the data acquisition, secondly the signal processing, and thirdly the microphones and their physical arrangement.
The most commonly used technique, the inter-aural time difference (ITD), will be discussed, as well as techniques based on the inter-aural level difference (ILD), beamforming (BF), microphone directivity (MD) and head related transfer functions (HRTF). These techniques mainly concern the data acquisition. Equally important are the different ways of signal processing: the influence of different correlation techniques and filtering processors that help to separate the desired signal from noise will be discussed. The third part describes the influence of different microphone arrangements.
In a last section the "Mosquito Localization Problem" is discussed, and possible solutions are outlined to localize a flying mosquito in order to "blind" it.
Contents
1 Introduction
1.1 Motivation
1.2 Problem Area Overview
1.2.1 Scheme of Sound Source Localization
1.3 Assumptions and Restrictions
2 Sound Localization – Data Acquisition and Signal Processing
2.1 Inter-aural Time Difference – ITD
2.2 Inter-aural Level Difference – ILD
2.3 Beamforming
2.7 Generalized Cross-Correlation – GCC
2.7.1 Principle
2.7.3 Pre-Filtering
3 Hardware
3.1 Classification of Microphones
3.1.1 Differentiation by Principles
3.1.2 Type of Construction and Directivity
3.1.3 Additional Structures
3.2 Number and Arrangement of Microphones
3.2.1 One Microphone
3.2.2 Two Microphones
3.2.3 Arrays of Microphones
4 Mosquito Localization Problem
5 Conclusion
6 Bibliography
Nomenclature
ASL Autonomous Systems Lab
BF Beam Forming
FD Frequency Domain
GCC Generalized Cross-Correlation
HRTF Head Related Transfer Function
ILD Inter-aural Level Difference
IPD Inter-aural Phase Difference
ITD Inter-aural Time Difference
MD Microphone Directivity
ML Maximum Likelihood
SNR Signal-to-Noise Ratio
SoM Studies on Mechatronics
SRP Steered Response Power
TD Time Domain
TDOA Time Difference Of Arrival
1 Introduction
1.1 Motivation
The motivation to do the Studies on Mechatronics (SoM) on the localization of sound sources has its origin in my job as a sound engineer. Working mostly at live concerts, and thus trying to listen precisely to every sound, made me wonder about artificial ways to "listen". Human hearing is easily influenced by psychoacoustic effects. In live concert applications this is desired; elsewhere it can be interesting to take measurements without being "deceived".
Looking for a possibility to write the SoM, I had a conversation with Ambroise Krebs, a PhD student at the ASL at ETH Zürich. He knew Laurent Kneip, also a PhD student at the ASL, who had already done some work on artificial sound source localization and who finally had the idea of the "Mosquito Localization Problem". Once I received the project description, the work began.
1.2.1 Scheme of Sound Source Localization
For a better understanding of the task of sound source localization, different process steps have to be identified. Under the assumption of listening passively to the environment, without the use of any active localization technique, the scheme shown in figure 1.1 can be established.
Here, already with the data acquisition, information about the direction is gained. By choosing the overall strategy, and therefore the physical principle one would like to use, a restriction or even a 'partial solution' is often already imposed on one or several of the three other problem fields.
The structure of this paper matches the schematics of figure 1.1. The first part focuses on data acquisition; different physical principles to obtain data are discussed. In the second part the focus is on ways of signal processing, in order to compute the estimated position of a sound source or to make the system more robust to noise. The third part is about different microphone types and varying forms of arranging them. Finally, the last part discusses the mosquito localization problem in more detail, and possible approaches are proposed.
1.3 Assumptions and Restrictions
artificial Only artificial ways of sound localization are discussed. Ideas like neural networks and other biologically based approaches are beyond the scope of this text.
passive It is assumed that the localization system only passively listens to a generic, unknown situation. No active techniques of localizing obstacles, like sonar or echo location, are discussed.
source Since the sound source has no defined shape, all techniques which combine passive sound localization with other modalities like video imaging, radar or IR are omitted. The sound source is supposed to be an omnidirectional point source.
2 Sound Localization – Data Acquisition
and Signal Processing
2.1 Inter-aural Time Difference – ITD
The most common sound source localization technique is the inter-aural time difference (ITD). This technique is also known as Inter-aural Phase Difference (IPD) or Time Difference Of Arrival (TDA or TDOA). Since it is relatively simple to measure the phase shift between two signals, this technique has been investigated extensively and has found its way into many applications.
The basic idea of ITD is based on time shifts between received signals due to the finite speed of sound. A sound wave front propagates through a medium at a certain speed, the speed of sound (in a gas it depends on the gas itself and on the temperature: $a = \sqrt{\gamma R T} \approx 344\,$m/s for air at 295 K). The wave front therefore arrives at different times at different locations; differently placed microphones receive more or less the same signal, but with a small time shift. As shown later, the computation to estimate the sound source's position (here, 'position' means the 'direction to the sound source'; the distance, in contrast, is not at all independent of the speed of sound) is independent of the speed of sound. This makes the ITD principle highly adaptive to different environments: it can be applied in a gas as well as in a liquid.
Figure 2.1: Illustration of the time shift between two microphones induced by a single, immovable sound source.
The mathematical model proposed now is based on the far-field assumption [4, 5].
The far-field assumption simply states that the distance l from the sound source to the microphones is much larger than the distance d between the two microphones forming a pair. The two incident "sound rays" can therefore be assumed to be parallel, which allows simple calculations. Further, we assume that no diffraction is involved.
Figure 2.2: The far-field assumption l >> d allows simple calculations. (Source: [5])
Since most sound signals can be approximated as stationary for short periods of time, the sound source can be assumed to be immovable during the computations. For speech, as an example, this period is on the order of 20–30 ms [1].
$\cos\Phi = \dfrac{c\,\tau_{ij}}{\|\vec{p}_i - \vec{p}_j\|} = \dfrac{(\vec{p}_i - \vec{p}_j)\cdot\vec{u}}{\|\vec{p}_i - \vec{p}_j\|\,\|\vec{u}\|} = \sin\Theta$   (2.1)

$\Rightarrow\quad \tau_{ij} = \dfrac{1}{c}\,(\vec{p}_i - \vec{p}_j)\cdot\vec{u}$   (2.2)
where $\vec{u}$, with $\|\vec{u}\| = 1$, is the unit vector denoting the sound source's direction, and $c$ is the speed of sound. A highly significant aspect is the distance $\|\vec{p}_i - \vec{p}_j\|$ between the microphones: making it larger in general means higher accuracy on the estimated angle. This can easily be seen in formula 2.3, as derived in [6]:
$\cos\Phi = \dfrac{d_L - d_R}{2k}$   (2.3)
where $\Phi$ denotes the angle to the sound source, $d_L$, $d_R$ the distances to the sound source measured from the left/right microphone, and $2k$ the distance between the two microphones. It is irrelevant to the principle, but worth mentioning here, that there exists an approach which incorporates the sampling rate $F_s$ into the calculation of $\tau$ [5]. The time difference, expressed in samples, then becomes
$\Rightarrow\quad \tau_{ij} = \dfrac{F_s}{c}\,(\vec{p}_i - \vec{p}_j)\cdot\vec{u}$   (2.4)

where $F_s$ is the sampling rate.
It is important to understand that the equations stated above only describe the principle of sound source localization via ITD. Further processing has to be done in order to compute the sound source's position. By using correlation techniques, $\tau$ can be determined. If a beamformer is used, this leads directly to an estimate of the source's position (beamforming, see section 2.3). If no beamformer is used, there have to be other ways of computing the position. One is proposed by Kneip in [6]: a purely geometric model to deduce the direction to the source, or the position of the sound source respectively.
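To make the principle concrete, here is a minimal sketch of the whole ITD chain for one microphone pair, using plain cross-correlation to find $\tau$ and the far-field relation of equation 2.3 (with the microphone distance $d = 2k$) to obtain the angle. All names are illustrative; this is a sketch of the principle, not an implementation from the cited works.

```python
import numpy as np

def itd_angle_deg(x1, x2, fs, d, c=344.0):
    """Estimate the far-field angle Phi between the source direction and
    the microphone axis: cross-correlate to find tau, then apply
    cos(Phi) = c * tau / d (equation 2.3 with d = 2k)."""
    corr = np.correlate(x1, x2, mode="full")
    lag = int(np.argmax(corr)) - (len(x2) - 1)   # positive lag: x1 arrives later
    tau = lag / fs
    cos_phi = np.clip(c * tau / d, -1.0, 1.0)    # guard against |cos(Phi)| > 1
    return np.degrees(np.arccos(cos_phi))

# Synthetic check: broadband noise reaching microphone 1 five samples late.
s = np.random.randn(4096)
print(itd_angle_deg(np.roll(s, 5), s, fs=48000, d=0.2))   # roughly 80 degrees
```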
2.2 Inter-aural Level Difference – ILD
Figure 2.3: ILD cues are purely based on relative energy differences between two or several ears. (Source: Thomson Higher Education, 2007)
Consider a sound source $s(t)$. While it propagates, the sound is corrupted by noise from the environment. As sensors, we suppose $N$ microphones. The signal received by the $i$th microphone can then be modeled as

$x_i(t) = \dfrac{s(t)}{d_i} + \xi_i(t)$   (2.5)

where $d_i$ is the constant distance of the $i$th microphone from the sound source and $\xi_i(t)$ is white Gaussian noise simulating the noisy environment.
In order to compute the relative energy difference between microphone pairs, some simplifications have to be made. The position of the source $s(t)$, as well as the positions of the microphones, are supposed to be constant while a block of data is processed. This time window is defined as the interval $[0, W]$, where $W$ is the window size. The signal must be audible.
The energy received by the $i$th microphone can be computed as

$E_i = \displaystyle\int_0^W x_i^2(t)\,dt = \int_0^W \left(\dfrac{s(t)}{d_i} + \xi_i(t)\right)^2 dt = \int_0^W \left(\dfrac{s^2(t)}{d_i^2} + \underbrace{2\,\dfrac{s(t)}{d_i}\,\xi_i(t)}_{\to\,0} + \xi_i^2(t)\right) dt = \dfrac{1}{d_i^2}\int_0^W s^2(t)\,dt + \int_0^W \xi_i^2(t)\,dt$   (2.6)
where it is assumed that integrating the non-squared noise $\xi_i(t)$ results in a mean value of zero; the cross-term can therefore be neglected. As the energy is inversely proportional to the square of the distance from the source to the microphone ($E_i \propto 1/d_i^2$), this relation is known as the inverse-square law [7].
For the further discussion, a planar problem with only two microphones is considered. Using equation (2.6) for two microphones, and neglecting the noise terms, leads to the following simple relation between energies and distances:

$\dfrac{E_1}{E_2} = \dfrac{d_2^2}{d_1^2}$   (2.7)
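A minimal sketch of how this relation could be used in practice, assuming the noise energies in equation 2.6 are negligible compared to the signal; the helper name is illustrative.

```python
import numpy as np

def distance_ratio(x1, x2):
    """Inverse-square law (equations 2.6/2.7): with the noise terms
    neglected, E1 / E2 = d2^2 / d1^2, hence d2 / d1 = sqrt(E1 / E2)."""
    e1 = float(np.sum(np.square(x1)))   # windowed energy at microphone 1
    e2 = float(np.sum(np.square(x2)))   # windowed energy at microphone 2
    return np.sqrt(e1 / e2)

# Same source signal, microphone 2 at twice the distance (half amplitude):
s = np.random.randn(4096)
print(distance_ratio(s, 0.5 * s))   # -> 2.0
```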
2.3 Beamforming
Steering a beamformer and looking for the highest output can simply be done with a so-called delay-and-sum beamformer. Having a certain number of microphones positioned in space in a fixed structure (an array of microphones) allows the time delays between any two microphones to be determined exactly. In a delay-and-sum beamformer all received signals are aligned (corrected in phase) and added. Since the microphone signals are in phase, they add constructively in the summation; the noise, assumed to be white, cancels out.
The output of a delay-and-sum beamformer as in figure 2.4 with $M$ microphones is defined as:
$y(n) = \displaystyle\sum_{m=0}^{M-1} x_m(n - \tau_m)$   (2.8)
where $x_m(n)$ represents the received signal of the $m$th microphone and $\tau_m$ the corresponding time delay of arrival [8]. Finding the maximum output can be done by computing the energy over a frame of length $L$:
$E = \displaystyle\sum_{n=0}^{L-1} \big( x_0(n-\tau_0) + \ldots + x_{M-1}(n-\tau_{M-1}) \big)^2$   (2.9)
$E$ is maximized if all the $\tau_m$ are such that the signals are in phase. Note that it is assumed that only one sound source is present; otherwise the probability that energy peaks of different signals overlap increases strongly, in which case it would of course be impossible to differentiate them.
The further signal processing often requires a lot of computational power, especially if the problem is handled in the time domain. It is often better to transform to the frequency domain to reduce the computational cost. A description of a delay-and-sum beamformer is also given in [8].
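The following sketch implements equations 2.8 and 2.9 directly in the time domain, assuming integer sample delays; np.roll wraps samples around, which is acceptable here only under the assumption that the frame is much longer than the delays.

```python
import numpy as np

def delay_and_sum(signals, delays):
    """Delay-and-sum beamformer (equation 2.8): undo each channel's assumed
    arrival delay, in samples, then sum. signals has shape (M, L)."""
    y = np.zeros(signals.shape[1])
    for x_m, tau_m in zip(signals, delays):
        y += np.roll(x_m, -int(tau_m))   # wrap-around ignored for L >> tau
    return y

def steered_energy(signals, delays):
    """Frame energy of the beamformer output (equation 2.9); maximal when
    the assumed delays put all channels in phase."""
    return float(np.sum(delay_and_sum(signals, delays) ** 2))
```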
By expanding equation 2.9 the beamformer output can be written as:

$E = \displaystyle\sum_{m=0}^{M-1}\sum_{n=0}^{L-1} x_m^2(n-\tau_m) \;+\; 2\sum_{m_1=0}^{M-1}\sum_{m_2=0}^{m_1-1}\sum_{n=0}^{L-1} x_{m_1}(n-\tau_{m_1})\,x_{m_2}(n-\tau_{m_2})$   (2.10)

$E = K + 2\displaystyle\sum_{m_1=0}^{M-1}\sum_{m_2=0}^{m_1-1} R_{x_{m_1}x_{m_2}}(\tau_{m_1}-\tau_{m_2})$   (2.11)
where $K = \sum_{m=0}^{M-1}\sum_{n=0}^{L-1} x_m^2(n-\tau_m)$ is assumed to be constant and is therefore neglected when maximizing the output energy $E$. In the frequency domain, the cross-correlation function can be approximated as:
$R_{ij}(\tau) \approx \displaystyle\sum_{k=0}^{L-1} X_i(k)\,\overline{X_j(k)}\,e^{\,j2\pi k\tau/L}$   (2.12)
where $X_i(k)$ is the discrete Fourier transform of $x_i(n)$, $X_i(k)\overline{X_j(k)}$ is the cross-spectrum of $x_i(n)$ and $x_j(n)$, and $\overline{(\,\cdot\,)}$ denotes the complex conjugate. The resulting reduction of the required computational power is described with an example in [5]; here, only a very short summary is given. By pre-computing the $R_{ij}(\tau)$, evaluating a steering hypothesis takes just $N(N-1)/2$ lookup and accumulation operations, whereas in the time domain the computation requires $2L(N+2)$ operations.
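A sketch of this frequency-domain shortcut: all pairwise correlations are pre-computed once via the FFT (cf. equation 2.12, circular correlation for simplicity), after which each steering hypothesis of equation 2.11 costs only one table lookup per microphone pair. The constant term $K$ is dropped, as in the text; names are illustrative.

```python
import numpy as np

def precompute_correlations(signals):
    """Circular cross-correlation R_ij of every microphone pair via the
    FFT (cf. equation 2.12). signals has shape (M, L)."""
    M, L = signals.shape
    X = np.fft.rfft(signals, axis=1)
    return {(i, j): np.fft.irfft(X[i] * np.conj(X[j]), n=L)
            for i in range(M) for j in range(i)}

def steered_energy_lookup(R, delays, L):
    """Relative steered energy (equation 2.11): M(M-1)/2 lookups per
    steering hypothesis instead of a full time-domain summation."""
    return 2.0 * sum(r[int(delays[j] - delays[i]) % L]
                     for (i, j), r in R.items())
```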
The main problem when using a delay-and-sum beamformer is that the energy peaks are relatively wide. This makes the resolution poorer, and it therefore becomes more difficult to identify sources or to separate them from other sources or from noise. One way to improve the accuracy is to 'whiten' the signals, as explained in section 2.7.
The main idea of head related transfer functions (HRTF) is to look closely at nature. The human hearing system is a very complex organ with amazing capabilities for locating sound sources. With the full capabilities of our hearing system available (including inter-aural level difference and inter-aural time delay), we can even locate sound precisely with one ear shut (this works mainly for frequencies above 5 kHz; below, the human auditory system uses ILD as its main cue). This is because our hearing system performs a spectral shaping of the sound. HRTF are a way to model this shaping of the sound as it enters the ear.
Incoming sound is shaped by the whole hearing system: the pinnae and the ear canal, but also the head and torso, perform a spectral shaping of the sound. This shaping strongly depends on the direction of the sound source relative to the hearing system. Since shorter wavelengths belong to higher frequencies, sound localization systems based on HRTF show better results when locating higher frequencies, mainly above 5 kHz [9], where the shaping of the sound is much stronger than at lower frequencies.
For technical applications two approaches are known: a binaural one (two ears like the human auditory system, each containing one microphone) [9] and a monaural one (two microphones implemented in one ear, one placed inside and one outside) [10]. The first approach, as proposed by Keyrouz, works with a database of the well-known KEMAR HRTF as a look-up table: the measured signal is correlated against this table and the most likely position estimate is chosen. This needs relatively little computational power. The second approach, as proposed by Hao, needs much more computational power due to the lack of closed-form algorithms. This is why he proposes to first narrow down the huge dataset of KEMAR transfer functions by applying IIR/FIR filters to the data set, and secondly to apply self-learning neural networks to cope with the high complexity. All this is beyond the scope of this paper, and the reader is therefore referred to [10].
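As a purely illustrative sketch of the look-up-table idea, the snippet below matches a measured binaural pair against a hypothetical table of HRTF magnitude responses using an inter-aural spectral-ratio distance. Both the table format and the matching criterion are assumptions made here for illustration; they are not the actual algorithms of [9] or [10].

```python
import numpy as np

def hrtf_lookup_direction(left, right, hrtf_table):
    """Pick the direction whose stored HRTF pair best explains the measured
    inter-aural magnitude ratio. hrtf_table: {direction: (h_left, h_right)},
    magnitude responses sampled on the same rfft grid as the input frames."""
    eps = 1e-12
    measured = np.abs(np.fft.rfft(left)) / (np.abs(np.fft.rfft(right)) + eps)
    best_direction, best_err = None, np.inf
    for direction, (h_left, h_right) in hrtf_table.items():
        predicted = h_left / (h_right + eps)
        # log-spectral distance between measured and predicted ratios
        err = float(np.sum((np.log(measured + eps) - np.log(predicted + eps)) ** 2))
        if err < best_err:
            best_direction, best_err = direction, err
    return best_direction
```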
In the following, signal processing is the focus of interest. Due to noise, reverberation and system inaccuracies, sound localization often has to cope with poor signal quality, which makes it hard to run the localization cue. To increase the system's stability and its robustness to noise, specific signal processing can be done. The most common way to do so in sound localization tasks is presented below.
The goal of pre-filters is to accentuate the signal where the highest signal-to-noise ratio (SNR) can be found; noise, in contrast, shall be suppressed. There also exist other possibilities to handle and process the acquired data which are not mentioned here.
2.7 Generalized Cross-Correlation – GCC
As ITD cues are the most commonly used, most algorithms are developed for applications in combination with microphone arrays, and therefore for steered response power. In general, the task is to compute an estimate of the time delay of arrival $D$. Cross-correlation and pre-filtering can be done in the time domain (TD), but it takes much less computational power to do it in the frequency domain (FD). The derivations shown here are mainly based on a paper by Charles H. Knapp [11].
[Figure: block diagram of the generalized cross-correlator. The received signals x1 and x2 are pre-filtered by H1 and H2, one branch is delayed, the product is integrated over the observation time, squared, and passed to a peak detector.]
A signal $s_1(t)$ is assumed to be the only signal in a room; this signal is uncorrelated with the noise. Noise and reverberation are represented by $n_i(t)$, simply called 'noise' and assumed to be white. The signals $x_i(t)$ received by the microphones can therefore be modelled as in equation 2.14:

$x_1(t) = s_1(t) + n_1(t), \qquad x_2(t) = \alpha\,s_1(t+D) + n_2(t)$   (2.14)

where for simplification only two microphones are assumed, $\alpha$ is the attenuation, and $D$ is the delay (phase shift) between the two received signals. As a relatively slowly varying environment is assumed, the signal $s_1(t)$, the noise $n_i(t)$, the attenuation $\alpha$ and the delay $D$ are supposed to be stationary during the observation time $T$. Further, a sufficiently high signal-to-noise ratio is necessary to be able to detect the sound source; this will be discussed later.
2.7.1 Principle
The received signals being defined, this subsection shows how two signals can be cross-correlated. As a reminder, the goal is to get an estimate $\hat{D}$ of the true delay $D$. The correlation function is defined by

$R_{x_1x_2}(\tau) = E\big[x_1(t)\,x_2(t-\tau)\big]$   (2.15)

where $E$ denotes the expectation value. The argument $\tau$ that maximizes (2.15) is the value of the delay $D$. An estimate of the cross-correlation is given by
$\hat{R}_{x_1x_2}(\tau) = \dfrac{1}{T-\tau}\displaystyle\int_{\tau}^{T} x_1(t)\,x_2(t-\tau)\,dt$   (2.16)
Again, the $\tau$ maximizing the correlation yields the delay estimate. It is important to understand that this can only be an estimate, because of the finite observation time. The correlation can be calculated either in the time domain or in the frequency domain; as already mentioned, it is normally more reasonable to correlate in the FD because of the lower computational cost. Since the link between TD and FD is given by the Fourier transformation, the correlation can be written as:

$R_{x_1x_2}(\tau) = \displaystyle\int_{-\infty}^{\infty} G_{x_1x_2}(f)\,e^{\,i2\pi f\tau}\,df$   (2.17)
For the signal model (2.14), the correlation and the cross-spectrum become

$R_{x_1x_2}(\tau) = \alpha R_{s_1s_2}(\tau) + \underbrace{R_{n_1n_2}(\tau)}_{=\,0}\,, \qquad G_{x_1x_2}(f) = \alpha G_{s_1s_2}(f)\,e^{-i2\pi fD} + G_{n_1n_2}(f)$   (2.21)

where $n_1(t)$ and $n_2(t)$ are assumed to be uncorrelated. Since a multiplication in the frequency domain becomes a convolution in the time domain:
$R_{x_1x_2}(\tau) = \alpha R_{s_1s_2}(\tau) \otimes \delta(\tau - D)$   (2.22)
The term $\delta(\tau - D)$ is 'spread' or 'smeared' by the transformation; this is how the peaks in the correlation become broadened. As long as there is only a single delay, peak broadening is not a big problem. With multiple delays, or even multiple sources, the resolution decreases and it may become impossible to distinguish the peaks or delay times:
$R_{x_1x_2}(\tau) = R_{s_1s_2}(\tau) \otimes \displaystyle\sum_i \alpha_i\,\delta(\tau - D_i)$   (2.23)
Good time delay resolution means choosing $\Psi_g(f)$ such that a large, sharp peak in $R_{y_1y_2}(\tau)$ is ensured. The system then, however, becomes more sensitive to errors (for example due to the finite observation time $T$), especially in the case of a low SNR. The choice of $\Psi_g(f)$ is therefore a tradeoff between stability and resolution.
2.7.3 Pre-Filtering
As mentioned above, the choice of $\Psi_g(f) = H_1(f)\,\overline{H_2(f)}$ is an important step in the design of a pre-filter. How it is best chosen depends strongly on the problem the filter will be applied to. Charles H. Knapp presents possible processors in his paper [11], among them the phase transform (PHAT), which whitens the signals.
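The phase transform performs exactly the 'whitening' mentioned in section 2.7: the cross-spectrum is normalized to unit magnitude so that only phase information remains, which sharpens the correlation peak. A minimal sketch (zero-padded FFT, integer-lag peak picking; names are illustrative):

```python
import numpy as np

def gcc_phat(x1, x2, fs):
    """Generalized cross-correlation with PHAT weighting; returns the
    estimated delay in seconds (positive: x1 arrives later than x2)."""
    n = len(x1) + len(x2)                        # zero-pad against wrap-around
    X1, X2 = np.fft.rfft(x1, n=n), np.fft.rfft(x2, n=n)
    G = X1 * np.conj(X2)                         # cross-spectrum
    G /= np.abs(G) + 1e-12                       # PHAT: keep only the phase
    r = np.fft.irfft(G, n=n)
    max_lag = n // 2
    r = np.concatenate((r[-max_lag:], r[:max_lag + 1]))   # lags -max..+max
    return (int(np.argmax(np.abs(r))) - max_lag) / fs
```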
3 Hardware
3.1 Classification of Microphones
3.1.1 Differentiation by Principles
carbon microphone These were the first microphones built, well known from the mouthpieces of early telephones. As a sensor for the sound pressure, a capsule filled with carbon granules is used; the resistance of the capsule depends on the actual pressure on the membrane.
electret microphone These work on the same principle as condenser microphones; the difference is that they do not need any supply voltage for the sound conversion, since the charge is permanently 'frozen' into the membrane. These microphones are cheap and are therefore found in a lot of simple applications. As 'normal' electret microphones are not particularly low-noise, they usually do not find their way into demanding applications.
laser microphone A laser beam is directed at the surface of an object which is affected by the sound one would like to listen to. The fine vibrations of the surface in the sound field produce interference between the emitted laser beam and its detected reflection; by demodulation it is possible to reconstruct the sound.
Figure 3.1: Usual combinations of directivities for professional microphones.
(Source of images: Wikipedia)
These microphones are often used in noisy environments where one would like to pick up a specific sound, for example on film sets or for TV interviews.
Parabolic microphones or 'big ears' are the extension of the above-mentioned lobe microphones. They are based on the same principle as parabolic antennas for radio waves: a parabolic reflector focuses incoming sound waves onto a microphone placed in its focal point. This makes it possible to listen precisely in a specific direction over long distances. The low-frequency performance is limited by the diameter of the reflector, which the longest wavelength to be received shall not exceed. This fact makes the device unattractive for full-range, high-quality sound recordings, because sufficiently large structures would be very unwieldy.
Additional structures can be combined with a microphone array to obtain better directivity characteristics of the system and to suppress aliasing effects [20]. Other structures are needed for HRTF approaches, as can be found in [18, 9].
3.2.3 Arrays of Microphones
Three or more microphones may in principle be arranged linearly, planarly or spatially; this list already covers all different types of microphone arrays. Which specific arrangement fits a given problem of course strongly depends on the problem itself and on the localization principle one would like to use. In principle, every localization technique can be implemented in combination with a microphone array, and more microphones generally bring more accuracy and stability to the system.
Planar arrays are the best-known microphone arrays, since they are commonly used for beamforming and often applied in wind tunnels to examine the formation of noise on an aerodynamic structure. Applications on mobile robots are proposed by Sasaki using different 32-channel arrays [22, 15]. Another approach, with microphones on three rings, is proposed by Tamai [16]. As shown by Kneip [6], in principle four planarly arranged microphones are needed to completely define the source's position by geometrical calculations, neglecting the usual front-back ambiguity.
Spatial arrays allow many different combinations of microphones and can therefore be applied to all kinds of techniques, offering the whole spectrum of possibilities. Valin, for example, proposes an array with eight microphones at the corners of a cube [4, 23, 24, 5, 8]. It makes an important difference whether the array is used to listen to a sound source in the far-field, in the near-field, or whether the source is even within the array; in general, far-field conditions are assumed when a spatial array is used. There are also approaches which place microphones in the walls of a room to locate a sound source within this array [14, 25, 26].
Another distinction can be made by looking at the structure the array is embedded in. Normally, free-standing arrays are used; placing microphones in the walls of a room can in principle be regarded as using an additional structure with an embedded array. There are also approaches which use special structures to improve the array's directivity and aliasing characteristics (see subsection 3.1.3), as for example proposed by Dedieu [20].
4 Mosquito Localization Problem
In this chapter an effort is made to characterize the mosquito localization problem by giving a brief overview of its most important specifications. In a second part, different possibilities are discussed in order to choose the most adequate technique.
The main idea behind the mosquito localization problem is the quest for a possibility to localize, and annihilate, a flying mosquito within a room. Assuming that a laser or another method with high enough accuracy to shoot the mosquito down is available, the main focus is on the localization of the mosquito. To narrow the field down further, only techniques that work passively on the sound of the flying mosquito are considered.
The most important difficulty to be discussed is the difference between near-field and far-field applications. The natural mosquito auditory system normally works in the near-field. The positive effect of being in the near-field is that the signal-to-noise ratio becomes high enough for artificial techniques to work; the negative one is that the sound characteristics strongly vary with the relative distance of the sound source to the microphone, since the wavelength of the sound remains relatively constant.
The mosquito is assumed to be a point source of sound. The sound of a flying mosquito can be simulated by band-limited random noise in the range of 230–3100 Hz [27]. Experience teaches that the sound of a flying mosquito is a relatively quiet one; no precise values on the loudness of a mosquito could be found, and it will likely vary between mosquitoes. Therefore, a 'good' volume is assumed.
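For simulation purposes, such a test signal can be synthesized, for example, by band-limiting white noise in the frequency domain; the sketch below is one simple way to do this and makes no claim about how realistic the result sounds.

```python
import numpy as np

def mosquito_like_noise(duration_s, fs, f_lo=230.0, f_hi=3100.0):
    """Band-limited random noise in the 230-3100 Hz range reported in [27],
    as a crude stand-in for the sound of a flying mosquito."""
    n = int(duration_s * fs)
    spectrum = np.fft.rfft(np.random.randn(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0   # zero out-of-band bins
    noise = np.fft.irfft(spectrum, n=n)
    return noise / np.max(np.abs(noise))              # normalize peak to 1

test_signal = mosquito_like_noise(0.5, fs=48000)
```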
For the localization system it is simpler to localize a static source than a moving one. Therefore the sound source is assumed to be quasi-static, meaning that during the computation time the sound source does not move significantly. These assumptions shall be realistic enough for a system that works in the end.
With the goal of achieving the highest possible signal-to-noise ratio, it makes sense to use interfacial (boundary) microphones to minimize the influence of reflected sound waves. This requires planar walls, which could be realized as the bottom, two walls and the top of a cube. Since exceptionally high accuracy is needed, it makes sense to use highly sensitive microphones such as condensers. It is difficult to assume an environment that is quiet enough for our localization methods to work; in reality, the required signal-to-noise ratios could never be achieved. This problem can best be dealt with by using a beamforming technique in combination with a microphone array for localization. Using many microphones generally increases the system's stability and accuracy, and by beamforming an attenuation of the environmental noise can be achieved, so that locally the signal-to-noise ratio is increased. In one of his papers, Valin presents a couple of experiments with an array that is able to localize a sound source
within the near-field, the dimensions of which depend on the frequency the system is tuned to. This needs a small adaptation concerning the sound source direction vector $\vec{u}$, which in the near-field case has a norm smaller than unity and therefore must be normalized [4].
The use of highly directional microphones like lobe microphones or parabolic microphones is interesting for permanently monitoring one or several points within a defined space. If a mosquito is detected, it is immediately shot down by an appropriate device like a laser beam or a water jet.
Finally, it can be stated that further experiments are necessary for a more precise proposition of a solution to the mosquito localization problem. Most important of all is to answer the question "What does a mosquito sound like?", shortly followed by "How does the spectrum of the mosquito sound develop with distance?". It could also be interesting to investigate the male mosquito auditory system more closely: it might be possible to extract ideas for the artificial localization of very quiet sounds in a relatively noisy environment.
5 Conclusion
Passive sound localization techniques are an interesting and not yet exhausted field of research. On the one hand, most applications use ITD sound source localization techniques and are applied to far-field situations; this is therefore surely the best-known technique, and it is mostly combined with microphone arrays for higher accuracy and stability. On the other hand, techniques like HRTF, MD and especially ILD are not yet used often in sound source localization, and further research there is indeed important.
Signal processing makes it possible to significantly increase the system's accuracy, stability or frequency selectivity. It is important to carefully choose the most appropriate filtering processor to be applied to a given system.
A high variety of microphone characteristics and spatial arrangements completes the list of possibilities. In principle, it would be possible to combine almost every technique with every algorithm or microphone.
As soon as it comes to system design for a specific problem, it is important to carefully define the boundary conditions. Since these are not well known for the mosquito localization problem, it is very difficult to propose a really well-founded solution. Further research, or even experiments, have to be done to get more detailed specifications for this problem.
6 Bibliography
[1] AMIDA Augmented Multi party Interaction with Distance Access. State-of-the-
art overview — localization and tracking of multiple interlocutors with multiple
sensors. January 2006.
[4] J.-M. Valin, F. Michaud, J. Rouat, and D. Letourneau. Robust sound source
localization using a microphone array on a mobile robot. In Proc. IEEE/RSJ In-
ternational Conference on Intelligent Robots and Systems (IROS 2003), volume 2,
pages 1228–1233, 27–31 Oct. 2003.
[5] J.-M. Valin. Auditory System for a Mobile Robot. PhD thesis, Université de Sherbrooke, Faculté de génie, Génie électrique et génie informatique, August 2005.
[6] Laurent Kneip and Claude Baumann. Binaural model for artificial spatial sound
localization based on interaural time delays and movements of the interaural axis.
J. Acoust. Soc. Am., (124):3108–3119, November 2008.
[7] S.T. Birchfield and R. Gangishetty. Acoustic localization by interaural level dif-
ference. In Proc. IEEE International Conference on Acoustics, Speech, and Signal
Processing (ICASSP ’05), volume 4, pages iv/1109–iv/1112 Vol. 4, 2005.
[8] Jean-Marc Valin, Francois Michaud, and Jean Rouat. Robust localization and
tracking of simultaneous moving sound sources using beamforming and particle
filtering. Robotics and Autonomous Systems, 55(3):216 – 228, 2007.
[9] Keyrouz, Bou Saleh, and Diepold. A novel approach to robotic monaural sound localization. May 2007.
[10] Ma Hao, Zhou Lin, Hu Hongmei, and Wu Zhenyang. A novel sound localization
method based on head related transfer function. In Proc. 8th International Confer-
ence on Electronic Measurement and Instruments ICEMI ’07, pages 4–428–4–432,
2007.
[11] C. Knapp and G. Carter. The generalized correlation method for estimation of time
delay. IEEE Transactions on Acoustics, Speech and Signal Processing, 24(4):320–
327, 1976.
[12] M.D. Gillette and H.F. Silverman. A linear closed-form algorithm for source localization from time-differences of arrival. IEEE Signal Processing Letters, 15:1–4, 2008.
[14] B. Mungamuru and P. Aarabi. Enhanced sound localization. Systems, Man, and
Cybernetics, Part B, IEEE Transactions on, 34(3):1526–1540, June 2004.
[15] Yoko Sasaki, Satoshi Kagami, and Hiroshi Mizoguchi. Multiple sound source map-
ping for a mobile robot by self-motion triangulation. Intelligent Robots and Sys-
tems, 2006 IEEE/RSJ International Conference on, pages 380–385, Oct. 2006.
[16] Y. Tamai, Y. Sasaki, S. Kagami, and H. Mizoguchi. Three ring microphone array for 3D sound localization and separation for mobile robot audition. In Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2005), pages 4172–4177, 2005.
[17] I. McCowan and H. Bourlard. Microphone array post-filter for diffuse noise field. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '02), volume 1, pages 905–908, 2002. IDIAP-RR 01-39.
[18] Fakheredine Keyrouz and Klaus Diepold. An enhanced binaural 3d sound local-
ization algorithm. In Proc. IEEE International Symposium on Signal Processing
and Information Technology, pages 662–665, 2006.
[22] Y. Sasaki, S. Kagami, and H. Mizoguchi. Main-lobe canceling method for multiple
sound sources localization on mobile robot. Advanced intelligent mechatronics,
2007 ieee/asme international conference on, pages 1–6, Sept. 2007.
[24] J.-M. Valin, J. Rouat, and F. Michaud. Enhanced robot audition based on micro-
phone array source separation with post-filter. In Proc. IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS 2004), volume 3, pages 2123–
2128, 28 Sept.–2 Oct. 2004.
[27] Martin C. Göpfert, Hans Briegel, and Daniel Robert. Mosquito hearing: Sound-induced antennal vibrations in male and female Aedes aegypti. The Journal of Experimental Biology, 202:2727–2738, 1999.
[28] Atsushi Ikeda et al. 2D sound source localization in azimuth and elevation from a microphone array by using a directional pattern of elements. 2007.
[31] W. M. Hartmann. Localization of sound in rooms. J. Acoust. Soc. Am., 74, 1983.
[32] Jack Hebrank and D. Wright. Spectral cues used in the localization of sound
sources on the median plane. The Journal of the Acoustical Society of America,
56(6):1829–1834, 1974.
[33] J.M. Loomis. Some research issues in spatial hearing. In Proc. IEEE ASSP Work-
shop on Applications of Signal Processing to Audio and Acoustics, pages 67–71,
1995.
[34] Yong Rui and D. Florencio. New direct approaches to robust sound source localization. In Proc. IEEE International Conference on Multimedia and Expo (ICME '03), volume 1, pages I-737–I-740, July 2003.
[36] Sampo Vesa. Sound source distance learning based on binaural signals. In Proc.
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics,
pages 271–274, 2007.
[37] H.-W. Wei and S.-F. Ye. Comments on a linear closed-form algorithm for source localization from time-differences of arrival. IEEE Signal Processing Letters, 15:895, 2008.