Signal Processing in Auditory Neuroscience: Temporal and Spatial
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or
mechanical, including photocopying, recording, or any information storage and retrieval system, without
permission in writing from the publisher. Details on how to seek permission, further information about
the Publisher’s permissions policies and our arrangements with organizations such as the Copyright
Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher
(other than as may be noted herein).
Notices
Practitioners and researchers must always rely on their own experience and knowledge in evaluating
and using any information, methods, compounds or experiments described herein. Because of rapid
advances in the medical sciences, in particular, independent verification of diagnoses and drug
dosages should be made. To the fullest extent of the law, no responsibility is assumed by Elsevier,
authors, editors or contributors for any injury and/or damage to persons or property as a matter of
products liability, negligence or otherwise, or from any use or operation of any methods, products,
instructions, or ideas contained in the material herein.
The author would like to thank Manfred Schroeder, who encouraged him to undertake a series of works since 1970, including concert hall acoustics. M. Schroeder was Director of the Drittes Physikalisches Institut, Georg-August-Univ. Goettingen, and wrote a recommendation letter to the Alexander von Humboldt Foundation supporting the author's investigations in 1975–77. The author would like to share a memorable episode from a seminar at the institute. After a presentation on subjective preference and the initial time delay gap between the direct sound and the first reflection in relation to the effective duration (τe) of the ACF, Schroeder pointed out that the value of 1/τ1 could be the pitch (missing fundamental). This moment was the very beginning of the author's investigations into speech. Later, Frau Edith Kuhfuss, Head of Secretaries, called the author to her office and explained that the institute might offer a doctorate degree (in physics) if he joined a course for practice. The author's response was that he had just received the degree under Takeshi Itoh, Professor of Waseda University, in 1975 and would not have time to perform his own study in parallel.
The author would also like to extend his gratitude to Hans Werner Strube, who assisted the author with an initial investigation into speech recognition for about 3 weeks in 2011 and gave precise and precious comments; Yoshiharu Soeta, who gave reprints of papers on evoked MEG for each factor extracted from the ACF/IACF; Marianne Jõgi, who improved the English usage; and Christa Trepte, a long-time friend, who provided room and board in the summer of 2011 (Photo P1).
Ando, Y. Architectural Acoustics, Blending Sound Sources, Sound Fields, and Listeners. AIP Press/Springer-Verlag, New York (1998). In part with written permission of the publisher.
Photo P1 Christa Trepte (left) and the author (right), together with Keiko, enjoying tea on a summer day in 2011 in the garden, which was arranged by her husband Hainer Trepte during his lifetime. We all became instant friends when we first met at Rohnsterrassen in 1975. Photographed by Keiko Ando.
Preface
Drawing on neural evidence and on cerebral hemisphere specialization related to subjective preference, we have previously proposed a signal-processing model of the auditory brain system. Based on the model, we have thoroughly described primary percepts by autocorrelation features as a basis for phonetic and syllabic distinctions. These features have emerged from the theory of subjective preference of the sound field that was originally developed for architectural acoustics. Correlation-based auditory temporal features extracted from the monaural autocorrelation are used to predict perceptual attributes such as pitch, timbre, loudness, duration, and coloration of reflected sound. It is worth noting, for example, that the missing fundamental phenomenon has long remained mysterious when approached through spectral analysis, whereas it is described in very simple terms by the factor τ1 extracted from the running autocorrelation function (ACF) of the source signal.
This study investigates the use of features of monaural ACFs for representing phonetic elements (vowels), syllables (/ka/, /sa/, /ta/, /na/, /ha/, /ma/, /ya/, /ra/, /wa/ (CV pairs)), and phrases using a small set of temporal factors extracted from the short-term running ACF. These factors include the listening level (loudness), the width of the zero-lag ACF peak Wφ(0) (spectral tilt), τ1 (voice pitch period), φ1 (voice pitch strength), τe (effective duration of the ACF envelope; temporal repetitive continuity/contrast), (τe)min (segment duration), and Δφ1/Δt (the rate of pitch strength change, related to voice pitch attack-decay dynamics). Times at which the ACF effective duration τe is minimal reflect rapid signal pattern changes that usefully demarcate segmental boundaries. Results suggest that vowels, CV syllables, and phrases can be distinguished on the basis of this ACF-derived feature set, whose neural correlates lie in population-wide distributions of all-order interspike intervals in early auditory stations.
On the other hand, spatial factors extracted from the interaural cross-correlation function (IACF) represent the binaural listening level (LL; mainly binaural loudness), the amplitude of the IACF, IACC (subjective diffuseness), the binaural time delay τIACC (localization in the horizontal plane), and the width of the IACF, WIACC (apparent source width, ASW). The quality of musical sound in a concert hall and of speech in a conference room can be comprehensively described by these temporal and spatial factors. Every result obtained from these scientific procedures is always "simple and beautiful" without any complicated mathematical calculation.
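The factors τ1 and φ1, and the missing-fundamental example above, can be illustrated with a short sketch. It computes a normalized ACF on a single 100-ms frame (a running ACF would repeat this on successive, overlapping frames) and takes τ1 and φ1 from the largest peak after the ACF first turns negative. The normalization by the energies of the overlapping segments, that peak-picking rule, and the names `normalized_acf` and `tau1_phi1` are assumptions made for illustration, not the author's published implementation. The test signal contains only the 2nd to 4th harmonics of 200 Hz, yet τ1 comes out at 5 ms, that is, a pitch of 200 Hz.

```python
import math


def normalized_acf(x, max_lag):
    """Normalized ACF phi(k), k = 0..max_lag samples, for one analysis frame.

    Each lag is divided by the geometric mean of the energies of the two
    overlapping segments, so phi(0) == 1 and |phi(k)| <= 1.
    """
    n = len(x)
    phi = []
    for k in range(max_lag + 1):
        a, b = x[: n - k], x[k:]
        num = sum(u * v for u, v in zip(a, b))
        den = math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))
        phi.append(num / den if den > 0 else 0.0)
    return phi


def tau1_phi1(phi, fs):
    """Delay (s) and height of the largest ACF peak after phi first turns negative.

    Assumption: this rule is a simple stand-in for picking the first major peak.
    """
    start = next((k for k, v in enumerate(phi) if v < 0), None)
    if start is None:
        raise ValueError("phi never becomes negative; increase max_lag")
    k1 = max(range(start, len(phi)), key=lambda k: phi[k])
    return k1 / fs, phi[k1]


# Missing-fundamental demo: harmonics 2-4 of 200 Hz, with no energy at 200 Hz.
fs = 8000.0
x = [sum(math.cos(2 * math.pi * h * 200.0 * n / fs) for h in (2, 3, 4))
     for n in range(800)]                   # one 100-ms frame
phi0 = sum(v * v for v in x) / len(x)       # Phi(0): mean power of the frame
phi = normalized_acf(x, max_lag=60)         # lags up to 7.5 ms
tau1, phi1 = tau1_phi1(phi, fs)
print(round(1 / tau1, 1), round(phi1, 3))   # 200.0 1.0
```

Because every harmonic component repeats after 5 ms, the ACF peaks there even though no 200-Hz component is present, whereas the lowest spectral component of this signal is at 400 Hz.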
Foreword
It has been more than 30 years since the publication of Yoichi Ando's "Concert Hall Acoustics" (CHA). The current book describes Ando's research since CHA was published, taking a neuroscientific approach to human preferences regarding sound.
The book provides readers with an explanation of the present status of concert hall acoustics, which has been well formulated via both physiological and psychological research. Recent developments in the theoretical understanding of sound preferences can enable new approaches to speech recognition and the design of hearing aids.
As described in Manfred Schroeder's foreword in the original version of CHA, much scientific research in the 1970s and 1980s focused on improving the acoustics of concert halls from the perspective of binaural hearing science. Based on this research, Ando developed the notion of the "four-dimensional world," which describes the most significant factors for concert hall acoustics in terms of preferences for transmitted sound from a source to the listener's ears.
The four dimensions can be summarized as follows:

1. total sound energy,
2. reverberation,
3. delay time of early reflections, and
4. spatial-binaural (interaural) cross-correlation.

These dimensions imply that the qualities of sound fields cannot be represented by a single number. Rather, sound dimensions can be understood from a subjective point of view, such as via listeners' preferences.
The most preferable delay time of early reflections of sound depends on the properties of the sound source, which can be expressed by the autocorrelation function of the signal.
The autocorrelation function mechanism proposed in the central auditory signal-processing model provides an estimate of the period of a signal. Consequently, the fundamental frequency or pitch (the inverse of the estimated period) need not be contained in the spectrum of the signal itself. One of the most interesting aspects of Ando's work is the formulation of the relationship between the envelope of the autocorrelation function and the most preferable time delay of an early reflection.
Decaying envelopes are familiar to acousticians in terms of reverberation, which is another dimension in Ando's four-dimensional space.
Responses to a sound source can be formulated by the convolution of the source waveform and the impulse response from the source to the listening position in the sound field.
The impulse response yields the reverberation in the field as an almost exponentially decaying envelope of its waveform.
Through convolution, the decaying property of the reverberation shapes the signal dynamics of sound traveling in the space, such as the onset and offset portions and the ongoing envelopes.
In contrast, the decaying envelope of the autocorrelation function shows the duration or fluctuation of the periodic properties of the sound itself, because purely periodic signals are rarely encountered in daily life.
Interestingly, the effective duration, which is the delay at which the autocorrelation envelope of a sound decreases to 0.1, referred to as "tau_e" in Ando's terms, provides the most preferable delay time of an early reflection of the sound.
The sound-enhancing interference between direct and reflected sound can be predicted under this criterion, subject to the fusing of direct and reflected sounds into a single sound image. Ando's criterion for the most preferred delay time of an initial reflection, which is formulated by the envelope of the autocorrelation function, makes it possible to design an acoustic space that is adapted to a particular purpose.
Spatial correlations exhibit the periodic characteristics of sound fields with respect to spatial coordinates rather than time. Sinc functions of spatial coordinates play an important role in spatial cross-correlation functions, just as sinc functions of time do for autocorrelation functions. The interaural cross-correlation function of a pair of binaural sounds perceived by a listener, defined in the spatial-time domain, is a new dimension in concert hall acoustics proposed by Ando. Subjective diffuseness is a fundamental notion representing listeners' spatial and temporal impressions in response to sound fields where the spatial correlation, in principle, follows the sinc function.
Interestingly, dissimilarity of the binaural sound pair is necessary for preferable sound fields. The difference in a pair of binaural signals cannot be perceived in monaural listening, and the dissimilarity is yielded by the time or phase difference between the binaural sound pair.
The cross-correlation function is sensitive to such differences, whereas the autocorrelation function is independent of the phase information of signals.
Received sound waves can be represented by the superposition of direct and reverberated sound. The energy ratio of the direct sound followed by early reflections to the reverberated sound is a conventional parameter representing the sound field from a subjective point of view, because the ratio is a function of the distance from the source and of its directivity.
In contrast, interaural correlations are determined by the time structures of binaural sound pairs, which sets them apart from the energy criterion.
Ando developed the theory of acoustic space design, including concert hall design, based on his own neuroscientific research over the past 30 years.
The theoretical and experimental issues are well organized, from introductory matters such as basic terminology and a brief historical review to explanations of recent research. As such, the book is accessible to a broad audience.
Ando's preference theory focused concert hall acoustics research on the qualities of reverberation and sound fields, such as studies of delay time for early reflections and subjective diffuseness for reverberating sound fields.
Sound is an informative phenomenon and is very much qualitative in nature.
The theoretical understanding of the qualities of sound represents an important subject for sound-based communication, which is an important aspect of daily life across cultures.
The research into sound preferences examined in this book involves an exploration of the science of qualities, which is a significant issue for modern science.
Readers will appreciate this well-written book and may find in it appropriate themes for their future projects.

Mikio Tohyama
Fujisawa, Japan
January 2018
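The effective duration τe defined in the foreword, the delay at which the ACF envelope falls to 0.1 (−10 dB on a 10·log10 scale), can be estimated numerically. The following is a minimal sketch under stated assumptions: the envelope is approximated by a running maximum of |φ(τ)|, and, to cope with irregular envelopes, a straight line is fitted to the initial 0 to −5 dB decay and extrapolated to −10 dB. The fitting range, the function name `effective_duration`, and the synthetic exponential envelope are illustrative assumptions, not details given in the foreword.

```python
import math


def effective_duration(phi, fs, fit_db=-5.0, target_db=-10.0):
    """Estimate tau_e in seconds: where the ACF envelope reaches -10 dB (0.1).

    Assumptions of this sketch: the envelope is the running maximum of |phi|
    taken from the longest lag back to lag 0; a least-squares line is fitted
    to the envelope (in dB, 10*log10) over its initial 0 to fit_db decay and
    extrapolated to target_db.
    """
    env, running = [0.0] * len(phi), 0.0
    for k in range(len(phi) - 1, -1, -1):
        running = max(running, abs(phi[k]))
        env[k] = running
    pts = []
    for k, e in enumerate(env):
        if e <= 0 or 10 * math.log10(e) < fit_db:
            break  # the envelope is non-increasing, so no later point qualifies
        pts.append((k, 10 * math.log10(e)))
    if len(pts) < 2:
        raise ValueError("too few envelope points above fit_db")
    n = len(pts)
    mk = sum(k for k, _ in pts) / n
    md = sum(d for _, d in pts) / n
    slope = (sum((k - mk) * (d - md) for k, d in pts)
             / sum((k - mk) ** 2 for k, _ in pts))
    if slope >= 0:
        raise ValueError("envelope does not decay")
    intercept = md - slope * mk
    return (target_db - intercept) / slope / fs


# Synthetic check: an envelope exp(-k/10) reaches 0.1 at k = 10*ln(10) lags.
fs = 1000.0
phi = [math.exp(-k / 10) for k in range(60)]
tau_e = effective_duration(phi, fs)
print(round(tau_e * 1000, 2))  # 23.03 (ms)
```

For a purely exponential envelope the fit is exact, so the estimate equals 10·ln 10 ≈ 23.03 lags, or 23.03 ms at the assumed 1-kHz lag resolution. In practice φ would come from the running ACF of successive short frames of the source signal.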
Introduction
When designing enclosures (or any environment) for spoken communication, the acoustic properties of the sound field should be taken into account. This volume describes the human hearing system and the possible auditory mechanisms responsible for the rise of subjective preference. Because subjective preference is a primitive response that steers the judgment and behavior of the organism in the direction of maintaining life, we investigated corresponding cerebral activity in the slow vertex response (SVR) and brain waves in the electroencephalogram (EEG) and magnetoencephalography (MEG). The results suggest an auditory signal-processing model that yields primary percepts and a theory of subjective preference for the sound field.1-4
The temporal and spatial primary percepts may be well described by temporal factors extracted from the running autocorrelation function (ACF) and spatial factors extracted from the interaural cross-correlation function (IACF), respectively. In the context of this volume, these mechanisms are the bases for automatic speech recognition and the design of hearing aids.

AUDITORY TEMPORAL AND SPATIAL FACTORS
For the past five decades, we have pursued a theory of architectural acoustics based on acoustics and auditory theory. In the summer of 1971, I visited the III Physics Institute at the University of Goettingen, where Manfred R. Schroeder encouraged me to investigate the aspects of spatial hearing that are most relevant to the design of concert halls. Peter Damaske and I were, at that time, interested in explaining the subjective diffuseness of sound fields using the IACF. The maximum magnitude of this function, the interaural cross-correlation (IACC), indicates the level of subjective diffuseness of a given sound field perceived by an individual due to binaural effects. We reproduced sounds in a room using a multiple-loudspeaker reproduction system and recorded the signals at the two ears of a dummy head.5 Because the IACC was known to be an important determinant in the horizontal localization of sounds, we also believed it to be significant in the subjective perception of diffuseness. Two years later, in 1974, a comparative study of European concert halls performed by Schroeder, Gottlob, and Siebrasse showed that the IACC was the most important factor in the incipient subjective preference reactions that established a consensus among individuals.
In early 1975 at Kobe University, we observed that a superior sound field for a speech signal was achieved by adjusting the horizontal direction and the delay time of a single reflection. A loudspeaker in front of a single listener reproduced the direct sound. The angle of the single reflection was about 30 degrees in the horizontal plane measured from the front, the delay time was about 20 ms, and the amplitude was the same as that of the direct sound.6 These working hypotheses were reconfirmed in the fall of 1975 while the author was an Alexander-von-Humboldt Fellow in Goettingen.7,8 We were also able to explain the perception of coloration produced by the single reflection in terms of the envelope of the ACF of the source signal.9
In 1983, a method of calculating subjective preference at each seat in a concert hall was described by four orthogonal factors of the sound field.10 Soon after, a concert hall design theory was formulated based on a model of the auditory system. We assumed that some aspects depended on the auditory periphery (the ear), while others depended on processing in the auditory central nervous system.11 The model takes into account both temporal factors and spatial factors that determine the subjective preference for sound fields.1 The model consists of a monaural ACF mechanism and an IACF mechanism for binaural processing. These two representations are used to describe monaural temporal and binaural spatial hearing operations that we presume to be taking place at several stations in the auditory pathway, from the auditory brainstem to the hemispheres of the cerebral cortex.
Special attention was given to computing optimal individual preferences by adjusting the weighting coefficients of four orthogonal factors (two temporal factors and two spatial factors), which were used to determine the most preferred seating position for each individual in the room.2 "Subjective preference" is important to us for philosophical and aesthetic reasons as well as for practical, architectural acoustics reasons. We consider preference as the primitive response of a living creature that directs its judgment and behavior in the pursuit of maintaining life: of body, of mind, and of personality.12 Thus, the neural evidence obtained could be used to identify the auditory system's signal-processing model.

CORRELATION MODEL FOR TEMPORAL AND SPATIAL INFORMATION PROCESSING
To develop a theory of temporal and spatial hearing for room acoustics that is grounded in the human auditory system, we attempted to learn how sounds are represented and processed in the cochlea, the auditory nerve, and the two cerebral hemispheres. Once effective models of auditory processing are developed, designs for concert halls can proceed in a straightforward fashion, according to guidelines derived from the model.1 In addition, understanding the basic operations of the auditory system may lead to a new generation of automatic systems for recognizing speech,4 analyzing music,2 and identifying environmental noise and its subjective effects.13 In more general terms, the first book on a brain-grounded theory of temporal and spatial design in architecture and the environment was published in 2016.14
It is remarkable that the temporal discharge patterns of neurons at the level of the auditory nerve and brainstem include sufficient information to effectively represent the ACF of an acoustic stimulus. Mechanisms for the neural analysis of interaural time differences through neural temporal cross-correlation operations, and for the analysis of stimulus periodicities through neural temporal autocorrelations, were proposed over half a century ago.15-17 Since then, many electrophysiological studies based on single neurons and neural populations have more clearly elucidated the neuronal basis for these operations. Binaural cross-correlations are computed by axonal tapped delay transmission lines that feed into neurons in the medial superior olive of the auditory brainstem, which act as coincidence detectors.18 If one examines the temporal patterning of discharges in the auditory nerve,19 one immediately sees the basis for a robust time-domain representation of the acoustic stimulus. Here, the stimulus autocorrelation is represented directly in the interspike interval distribution of the entire population of auditory nerve fibers.20,21 This autocorrelation-like neural representation subserves the perception of pitch and tonal quality (aspects of timbre based on spectral contour).22,23
In our laboratory, we have found neural correlates of spatial hearing in the left and right auditory brainstem responses (ABRs). Here, the maximum neural activity (wave V) corresponds to the IACC, that is, the magnitude of the IACF.24 Also, wave IV of the left and right brainstem responses (IVl,r) nearly corresponds to the sound energies at the right- and left-ear entrances. SVRs are averaged auditory-evoked responses computed from scalp EEG signals. We carried out a series of experiments aimed at developing correlations between brain activity, measurable with the SVR and the EEG, and subjective sound field preference. Subjective sound field preference is well described by four orthogonal acoustic factors, two temporal and two spatial. The two temporal factors are (1) the initial time delay gap between the direct sound and the first reflection (Δt1) and (2) the subsequent reverberation time (Tsub). The two spatial factors are (1) the listening level and (2) the maximum magnitude of the IACF (IACC). The SVR- and EEG-based neural correlates of the two temporal factors are associated with the left hemisphere, whereas those of the two spatial factors are associated with the right hemisphere.
Higher in the auditory pathway, we reconfirmed by MEG that the left cerebral hemisphere is associated with the delay time of the first reflection Δt1. We also found that the duration of coherent alpha wave activity (effective duration of the ACF of the MEG response signal) directly corresponds to how much a given stimulus is preferred, that is, the scale value of individual subjective preference.25,26 The right cerebral hemisphere was activated by the typical spatial factor, that is, the magnitude of the IACC.27 The information corresponding to subjective preference of sound fields was found in the effective duration of the ACF of the alpha waves of both EEG and MEG recordings. The repetitive feature of the alpha wave, as measured in its ACF, was observed at the preferred condition. This evidence can be put into practice by applying the basic theory of subjective preference to music and speech signals for each individual's preference.2
We also investigated temporal aspects of sensation, such as pitch or the missing fundamental,28 loudness,29 timbre,30 and the duration of sensation.31 These are well described by the temporal factors extracted from the ACF.32,33 Remarkably, the temporal factors of sound fields such as Δt1 and Tsub are associated with left hemisphere responses.24,34-37 Aspects of sound fields involving spatial percepts, such as subjective diffuseness38 and the apparent source width (ASW),32,39 have also been investigated; these are associated with right hemisphere responses.1,2,26,35,40,41 Tests on dissimilarity judgments were conducted in an existing hall in relation to both temporal and spatial factors.42
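The statement that the stimulus autocorrelation is carried by the pooled interspike interval distribution can be illustrated with a toy all-order interval histogram. This is a minimal sketch assuming idealized, jitter-free spikes that are perfectly phase-locked to a 5-ms period; real auditory nerve fibers fire stochastically, skip cycles irregularly, and show timing jitter. The function name `all_order_isi_histogram` and all spike parameters are hypothetical.

```python
from collections import Counter


def all_order_isi_histogram(spike_trains, max_interval):
    """Pool all-order interspike intervals (samples) over a population of fibers.

    Every forward interval t_j - t_i (j > i) up to max_interval is counted,
    not only the intervals between consecutive spikes.
    """
    hist = Counter()
    for train in spike_trains:
        times = sorted(train)
        for i, t0 in enumerate(times):
            for t1 in times[i + 1:]:
                if t1 - t0 > max_interval:
                    break  # later spikes in this train are even farther away
                hist[t1 - t0] += 1
    return hist


# Idealized fibers phase-locked to a 5-ms period (40 samples at 8 kHz):
# fiber f has a latency of f samples and fires on every (f + 1)-th cycle.
fs, period, n_cycles = 8000.0, 40, 20
trains = [[f + m * period for m in range(0, n_cycles, f + 1)] for f in range(3)]
hist = all_order_isi_histogram(trains, max_interval=100)
print(sorted(hist.items()))  # [(40, 19), (80, 27)]
print(fs / min(hist))        # 200.0
```

The pooled intervals fall only at the period and its multiples, mirroring the ACF peaks at τ1 and 2τ1; with real, jittered spikes these become broadened peaks rather than single bins, and the fiber latencies drop out because only intervals within each train are counted.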
Features of the IACF correspond to the binaural perceptual attributes of binaural listening level, sound direction, ASW, and subjective diffuseness (envelopment). Neural correlates of ACF-related monaural percepts, which we called "temporal percepts," were observed in electrical and magnetic neural responses over the left cerebral cortical hemisphere, whereas those of binaural IACF-related percepts, which we called "spatial percepts," were observed over the right hemisphere.3 The correlational features of primary importance to speech recognition lie in the monaural ACF, which could help suppress environmental noise, and in the IACF, which could help suppress unwanted conversations of other people (a spatial attribute that can largely be ignored in telephone communication).
… the listener is especially important in the presence of other speakers and surrounding noise, both of which are conditions that may separate the perceived signal from the target voice due to the IACF mechanism.
Because individual differences in subjective preference in relation to the IACC, a spatial factor, are very small (nearly everyone has the same basic preferences), we can determine the architectural spatial form of a room by first taking common preferences into account. The temporal factors, which involve successive delays produced by sets of reflective surfaces, are closely related to the dimensions of a specific room or concert hall. These dimensions can be altered to optimize the space according to the minimum effective duration of the running ACF for specific types of sound: music, such as organ music, chamber music, or choral works, or speech.2
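The spatial factors discussed above can be read off a normalized IACF. This sketch assumes that the IACC is the maximum of |IACF| within |τ| ≤ 1 ms, that τIACC is the lag of that maximum, and that WIACC is the width of the peak region where |IACF| stays within 10% of the IACC (δ = 0.1·IACC); the sign convention for τ, the helper names `iacf` and `iacf_factors`, and the noise signals are illustrative assumptions. A pure interaural delay of 12 samples at 48 kHz gives IACC ≈ 1 at τIACC = 0.25 ms, whereas independent noise at the two ears gives a low IACC, the condition associated with high subjective diffuseness.

```python
import math
import random


def iacf(left, right, max_lag):
    """Normalized IACF for lags -max_lag..max_lag (samples), as {lag: value}.

    Assumed sign convention: lag k pairs left[n] with right[n + k], so a peak
    at k > 0 means the right-ear signal lags the left-ear signal.
    """
    n = len(left)
    out = {}
    for k in range(-max_lag, max_lag + 1):
        a, b = (left[: n - k], right[k:]) if k >= 0 else (left[-k:], right[: n + k])
        num = sum(u * v for u, v in zip(a, b))
        den = math.sqrt(sum(u * u for u in a) * sum(v * v for v in b))
        out[k] = num / den if den > 0 else 0.0
    return out


def iacf_factors(phi_lr, fs, delta_ratio=0.1):
    """Return (IACC, tau_IACC [s], W_IACC [s]) from an IACF dict.

    W_IACC is counted in whole samples around the peak where |IACF| stays
    within delta_ratio * IACC of the peak (assumed criterion, no interpolation).
    """
    k_max = max(phi_lr, key=lambda k: abs(phi_lr[k]))
    iacc = abs(phi_lr[k_max])
    floor = iacc * (1.0 - delta_ratio)
    lo = hi = k_max
    while lo - 1 in phi_lr and abs(phi_lr[lo - 1]) >= floor:
        lo -= 1
    while hi + 1 in phi_lr and abs(phi_lr[hi + 1]) >= floor:
        hi += 1
    return iacc, k_max / fs, (hi - lo + 1) / fs


fs = 48000.0
max_lag = round(0.001 * fs)                       # search |tau| <= 1 ms
rng = random.Random(0)
left = [rng.gauss(0.0, 1.0) for _ in range(2000)]
d = 12                                            # hypothetical 0.25-ms interaural delay
right = [0.0] * d + left[:-d]                     # right ear = left ear delayed by d
iacc, tau_iacc, w_iacc = iacf_factors(iacf(left, right, max_lag), fs)
print(round(iacc, 6), round(tau_iacc * 1000, 3))  # 1.0 0.25

# Independent noise at the two ears: a crude stand-in for a diffuse field.
other = [rng.gauss(0.0, 1.0) for _ in range(2000)]
iacc_diffuse = iacf_factors(iacf(left, other, max_lag), fs)[0]  # well below 1
```

The width is counted in whole samples, so for white noise it collapses to a single sample; band-limited signals, whose IACF peaks are broader, give more informative WIACC values, and interpolating between lags would refine all three factors. The binaural listening level is not computed here.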
9. Ando Y, Alrutz H. Perception of coloration in sound fields in relation to the autocorrelation function. J Acoust Soc Am. 1982;71:616–618.
10. Ando Y. Calculation of subjective preference at each seat in a concert hall. J Acoust Soc Am. 1983;74:873–887.
11. Ando Y. Investigations on cerebral hemisphere activities related to subjective preference of the sound field, published for 1983–2003. J Temporal Des Architect Environ. 2003;3:2–27.
12. Ando Y. On the temporal design of environments. J Temporal Des Architect Environ. 2004;4:2–14. http://www.jtdweb.org/journal/.
13. Soeta Y, Ando Y. Neurally Based Measurement and Evaluation of Environmental Noise. Tokyo: Springer; 2015.
14. Ando Y. Brain-Grounded Theory of Temporal and Spatial Design in Architecture and the Environment. Tokyo: Springer; 2016.
15. Jeffress LA. A place theory of sound localization. J Comp Physiol Psychol. 1948;41:35–39.
16. Licklider JCR. A duplex theory of pitch perception. Experientia. 1951;VII:128–134.
17. Cherry EC, Sayers BMA. "Human 'cross-correlator'": a technique for measuring certain parameters of speech perception. J Acoust Soc Am. 1956;28:889–895.
18. Colburn S. Computational models of binaural processing. In: Hawkins H, McMullin T, Popper AN, Fay RR, eds. Auditory Computation. New York: Springer-Verlag; 1996.
19. Secker-Walker HE, Searle CL. Time domain analysis of auditory-nerve-fiber firing rates. J Acoust Soc Am. 1990;88:1427–1436.
20. Cariani PA, Delgutte B. Neural correlates of the pitch of complex tones. I. Pitch and pitch salience. J Neurophysiol. 1996a;76:1698–1716.
21. Cariani PA, Delgutte B. Neural correlates of the pitch of complex tones. II. Pitch shift, pitch ambiguity, phase-invariance, pitch circularity, and the dominance region for pitch. J Neurophysiol. 1996b;76:1717–1734.
22. Meddis R, O'Mard L. A unitary model of pitch perception. J Acoust Soc Am. 1997;102:1811–1820.
23. Cariani P. Temporal coding of periodicity pitch in the auditory system: an overview. Neural Plast. 1999;6:147–172.
24. Ando Y, Yamamoto K, Nagamatsu H, Kang SH. Auditory brainstem response (ABR) in relation to the horizontal angle of sound incidence. Acoust Lett. 1991;15:57–64.
25. Soeta Y, Nakagawa S, Tonoike M, Ando Y. Magnetoencephalographic responses corresponding to individual subjective preference of sound fields. J Sound Vib. 2002;258:419–428.
26. Soeta Y, Nakagawa S, Tonoike M, Ando Y. Spatial analysis of magnetoencephalographic alpha waves in relation to subjective preference of a sound field. J Temporal Des Architect Environ. 2003;3:28–35. http://www.jtdweb.org/Journal/.
27. Sato S, Nishio K, Ando Y. Propagation of alpha waves corresponding to subjective preference from the right hemisphere to the left with change in the IACC of a sound field. J Temporal Des Architect Environ. 2003;3:60–69. http://www.jtdweb.org/Journal/.
28. Inoue M, Ando Y, Taguti T. The frequency range applicable to pitch identification based upon the auto-correlation function model. J Sound Vib. 2001;241:105–116.
29. Sato S, Kitamura T, Sakai H, Ando Y. The loudness of "complex noise" in relation to the factors extracted from the autocorrelation function. J Sound Vib. 2001;241:97–103.
30. Hanada K, Kawai K, Ando Y. A study of the timbre of an electric guitar sound with distortion. Proceedings of the 3rd International Symposium on Temporal Design. Guangzhou: J South China University of Technology (Natural Science Edition); 2007:96–99.
31. Saifuddin K, Matsushima T, Ando Y. Duration sensation when listening to pure tone and complex tone. J Temporal Des Architect Environ. 2002;2:42–47. http://www.jtdweb.org/Journal/.
32. Ando Y, Sakai H, Sato S. Formulae describing subjective attributes for sound fields based on a model of the auditory-brain system. J Sound Vib. 2000;232:101–127.
33. Ando Y. Correlation factors describing primary and spatial sensations of sound fields. J Sound Vib. 2002;258:405–417.
34. Ando Y, Kang SH, Morita K. On the relationship between auditory-evoked potential and subjective preference for sound field. J Acoust Soc Jpn (E). 1987;8:197–204.
35. Ando Y. Evoked potentials relating to the subjective preference of sound fields. Acustica. 1992;76:292–296.
36. Ando Y, Chen C. On the analysis of the autocorrelation function of α-waves on the left and right cerebral hemispheres in relation to the delay time of single sound reflection. J Architect Plann Environ Eng, AIJ. 1996;488:67–73.
37. Chen C, Ando Y. On the relationship between the autocorrelation function of the α-waves on the left and right cerebral hemispheres and subjective preference for the reverberation time of music sound field. J Architect Plann Environ Eng, AIJ. 1996;489:73–80.
38. Ando Y, Kurihara Y. Nonlinear response in evaluating the subjective diffuseness of sound field. J Acoust Soc Am. 1986;80:833–836.
39. Sato S, Ando Y. On the apparent source width (ASW) for bandpass noises related to the IACC and the width of the interaural cross-correlation function (WIACC). J Acoust Soc Am. 1999;105:1234.
40. Ando Y, Hosaka I. Hemispheric difference in evoked potentials to spatial sound field stimuli. J Acoust Soc Am. 1983;74(S1):S64–S65(A).
41. Ando Y, Kang SH, Nagamatsu H. On the auditory-evoked potentials in relation to the IACC of sound field. J Acoust Soc Jpn (E). 1987;8:183–190.
42. Hotehama T, Sato S, Ando Y. Dissimilarity judgments in relation to temporal and spatial factors for the sound fields in an existing hall. J Sound Vib. 2002;258:429–441.
43. Ando Y. Concert hall acoustics based on subjective preference theory. In: Rossing TD, ed. Springer Handbook of Acoustics. New York: Springer-Verlag; 2007 [Chapter 10].
CHAPTER 2
PHYSICAL SYSTEMS OF THE HUMAN EAR
Head, Pinna, and External Auditory Canal
An acoustic signal is perceived by the ears as a time sequence. Three-dimensional space is also perceived by the two ears, mainly through the head-related transfer functions Hl,r(r/r0, ω) between a source point and the two ear entrances, which have directional qualities based on the shape of the head and the pinna system. The directional information, including the interaural time difference and the interaural amplitude difference, is contained in these head-related transfer functions.
Fig. 2.1 shows examples of the amplitude of the head-related transfer function H(ξ,η,ω) with the vertical angle of incidence η as a parameter (ξ = 0), ξ being the horizontal angle. These were measured by the single-pulse method in far-field conditions.1 The angle η = 0°, ξ = 0° corresponds to the frontal direction and η = 90° to the upper direction. In this situation, the time difference between the two ears is τIACC = 0, and the only difference lies in the angular frequency characteristic, ω.
[Figure: |H(ξ,η,ω)| for vertical angles η labeled 9°, 27°, 45°, 63°, and 85°; scale bar 10 dB]
Because the diameter of the external canal is small enough compared to the wavelength below 8 kHz, the transfer function of the entrance canal E(ω) is independent …
… of a transfer function from a sound source in front of the listener to the eardrum is shown in Fig. 2.3. This corresponds to direct sound when the listener is facing a speaker. The transfer functions obtained by three reports1-3 are not significantly different for frequencies up to 10 kHz.

Eardrum and Bone Chain
Behind the eardrum are the tympanic cavities, which contain the three auditory ossicles: the malleus, incus, and stapes. This area is called the middle ear (Fig. 2.4).
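The argument in the Head, Pinna, and External Auditory Canal subsection, that the external canal is narrow compared with the wavelength below 8 kHz, rests on the relation λ = c/f. A minimal worked check follows, assuming c = 343 m/s and a canal diameter of about 7 mm; both values are typical figures we assume here, not numbers given in the text, and the helper name `wavelength` is hypothetical. At 8 kHz the wavelength is about 4.3 cm, roughly six times the assumed diameter, and it grows at lower frequencies.

```python
SPEED_OF_SOUND = 343.0   # m/s, air at about 20 degrees C (assumed value)
CANAL_DIAMETER = 0.007   # m, assumed typical ear-canal diameter (not given in the text)


def wavelength(freq_hz, c=SPEED_OF_SOUND):
    """Acoustic wavelength in metres, lambda = c / f."""
    return c / freq_hz


for f in (1000.0, 4000.0, 8000.0):
    lam = wavelength(f)
    print(f"{f / 1000:.0f} kHz: wavelength {100 * lam:.1f} cm, "
          f"diameter/wavelength {CANAL_DIAMETER / lam:.2f}")
```

The ratio printed in the last column shrinks toward lower frequencies, which is the sense in which the canal is "small enough" below 8 kHz.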
[Fig. 2.2: amplitude |E(ω)| in dB (−10 to 20 dB) versus frequency, 0.2–15 kHz]
FIG. 2.2 Transfer functions of the ear canal. ([————], From Wiener FM, Ross DA. The pressure distribution in the auditory canal in a progressive sound field. J Acoust Soc Am. 1946;18:401–408; [......], From Shaw [1974]; [____], From Mehrgardt S, Mellert V. Transformation characteristics of the external human ear. J Acoust Soc Am. 1977;61:1567–1576.)
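The statement that the external canal is narrow compared with the wavelength below 8 kHz can be checked with the relation λ = c/f. The short Python sketch below assumes a speed of sound of 343 m/s (air at about 20 °C) and a canal diameter of about 7 mm; both values, and the helper names, are illustrative assumptions rather than figures from this text.

```python
# Illustrative check that the ear canal is narrow relative to the wavelength.
# Assumed values (not taken from the source text):
SPEED_OF_SOUND_M_S = 343.0   # air at about 20 degrees C
CANAL_DIAMETER_M = 0.007     # typical adult ear-canal diameter, about 7 mm


def wavelength_m(frequency_hz: float, c: float = SPEED_OF_SOUND_M_S) -> float:
    """Acoustic wavelength lambda = c / f, in metres."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return c / frequency_hz


def wavelength_to_diameter(frequency_hz: float) -> float:
    """Number of (assumed) canal diameters that fit into one wavelength."""
    return wavelength_m(frequency_hz) / CANAL_DIAMETER_M


for f_hz in (1_000.0, 4_000.0, 8_000.0):
    print(f"{f_hz / 1000:.0f} kHz: wavelength {wavelength_m(f_hz) * 1000:.1f} mm, "
          f"{wavelength_to_diameter(f_hz):.1f} canal diameters")
```

At 8 kHz the wavelength is roughly 43 mm, about six times the assumed diameter, and the ratio grows as frequency falls.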
[Fig. 2.3: amplitude |H(ω)E(ω)| in dB (−20 to 20 dB) versus frequency, 0.2–15 kHz]
FIG. 2.3 Transfer functions to the eardrum from a sound source in front of the listener. ([————], From Wiener FM, Ross DA. The pressure distribution in the auditory canal in a progressive sound field. J Acoust Soc Am. 1946;18:401–408; [......], From Shaw [1974]; [____], From Mehrgardt S, Mellert V. Transformation characteristics of the external human ear. J Acoust Soc Am. 1977;61:1567–1576.)
[Fig. 2.4: schematic of the human ear; labels: external canal, eardrum, malleus, incus, footplate of stapes in oval window, semicircular canal, cochlea, cochlear nerve, round window, auditory tube]
FIG. 2.4 Schematic illustration of the human ear. (Modified from Dorland, W.A.N. (1947). The American Illustrated Medical Dictionary. 24th ed. Saunders, Philadelphia.)
[Fig. 2.5: contour map of eardrum vibration amplitude; contour values from 1.2 to 8.2]
FIG. 2.5 Contour lines of equal amplitude of human eardrum vibration at 525 Hz (121 dB SPL). Each value should be multiplied by 10⁻⁵ cm. (From Tonndorf J, Khanna SM. Tympanic-membrane vibrations in human cadaver ears studied by time-averaged holography. J Acoust Soc Am. 1972;52:1221–1233.)
The sound pressure striking the eardrum is transduced into vibration, and the middle-ear ossicles transmit this vibration to the cochlea. The vibration pattern of the human eardrum was first measured by Békésy,4 who examined it point by point with an electric capacitive probe. Later, Tonndorf and Khanna5 measured the vibration pattern by time-averaged holography, which resolves finer vibration patterns on the eardrum, as shown in Fig. 2.5. Note that the outline of the malleus is visible in the pattern at the amplitude value of 3.5. The vibration of the malleus is transmitted to the incus and the stapes. The transfer function C(ω) of the human middle ear (the three auditory ossicles), between the sound pressure at the eardrum and the apparent sound pressure on the cochlea, is plotted in Fig. 2.6. The data were obtained from cadavers by Onchi6 and Rubinstein7, and the values have been rearranged by Ando.8
CHAPTER 2 Human Hearing System 7
[Fig. 2.7: measurement setup, lateral and posterior views; labels include sound source, sound delivery tube, microphone (Knowles EK-3133), brass coupling rings, static pressure vent, acoustic resistor, nylon screen, metal screen, cotton fiber, dental acrylic, TM, Eustachian tube, Pv, mastoid, cochlear turns, internal auditory canal, flush inlet, flush outlet, hydropressure transducer; scale bar 5 mm]
FIG. 2.7 Measurement system of middle ear transfer function. To measure the inner-ear pressure, a hydropressure transducer was placed in the vestibule facing the stapes. In order to ensure that the cochlea remains fluid filled during the measurement, an inlet flush tube was cemented into the superior semicircular canal and an outlet flush tube was cemented into the apical turn of the cochlea. (Puria S, Rosowski JJ, Peake WT. Middle-ear pressure gain in humans: preliminary results. In: Duifhuis H, Horst JW, Dijk P, Netten SM, eds. Proceedings of the International Symposium on Biophysics of Hair Cell Sensory Systems. Singapore: World Scientific; 1993:345–351.)
[Fig. 2.8: amplitude |C(ω)| in dB (0 to 30 dB) versus frequency, 0.2–10 kHz]
FIG. 2.8 Transfer function of the human middle ear between the sound pressure at the eardrum and the inner-ear pressure. The global behavior is surprisingly similar to that in Fig. 2.6. (Puria S, Rosowski JJ, Peake WT. Middle-ear pressure gain in humans: preliminary results. In: Duifhuis H, Horst JW, Dijk P, Netten SM, eds. Proceedings of the International Symposium on Biophysics of Hair Cell Sensory Systems. Singapore: World Scientific; 1993:345–351.)
[Fig. 2.9: sensitivity of the ear |S(ω)| (10-dB scale) versus frequency, 0.2–10 kHz]
FIG. 2.9 Sensitivity of the human ear to a sound source in front of listeners. (____), Normal hearing threshold (ISO recommendation); (......), reexamined in the low-frequency range. (C, B), Transformation characteristics between the sound source and the cochlea, S(ω) = H(ω)E(ω)C(ω); (C), data obtained from measured values C(ω) by Onchi (1961), and (B), from Rubinstein (1966), which are combined with the transfer function H(ω)E(ω) measured by Mehrgardt and Mellert (1977). (Berger EH. Re-examination of the low-frequency (50–1000 Hz) normal threshold of hearing in free and diffuse sound fields. J Acoust Soc Am. 1981;70:1635–1645.)
For the usual sound field, the transfer function between a sound source located in front of the listener and the cochlea can be represented by
$$S(\omega) = H(0, 0, \omega)\,E(\omega)\,C(\omega). \qquad (2.1)$$
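Eq. (2.1) treats the path from the source to the cochlea as a cascade of three stages, so the overall response is the frequency-by-frequency product of the stage responses, and its level in decibels is the sum of the stage levels. The sketch below illustrates this with made-up complex values at three frequencies; the arrays `h`, `e`, and `c` and the helper names are hypothetical placeholders, not the measured data behind Fig. 2.9.

```python
import numpy as np


def magnitude_db(x: np.ndarray) -> np.ndarray:
    """Magnitude of a complex transfer function in dB (20 log10 |x|)."""
    return 20.0 * np.log10(np.abs(x))


def overall_transfer(h: np.ndarray, e: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Eq. (2.1): S(w) = H(0, 0, w) E(w) C(w), evaluated point by point in frequency."""
    return h * e * c


# Hypothetical complex values at three frequencies (illustrative only, not measured data).
freqs_khz = np.array([0.5, 2.0, 4.0])
h = np.array([1.1 + 0.1j, 1.8 - 0.4j, 2.5 + 0.9j])  # frontal source -> ear-canal entrance
e = np.array([1.0 + 0.0j, 1.3 + 0.2j, 2.2 - 0.6j])  # ear-canal entrance -> eardrum
c = np.array([2.0 + 1.0j, 3.5 - 0.5j, 1.6 + 0.4j])  # eardrum -> cochlea (middle ear)

s = overall_transfer(h, e, c)
for f, level in zip(freqs_khz, magnitude_db(s)):
    print(f"{f:.1f} kHz: |S| = {level:.1f} dB")
```

Working in decibels is convenient here because curves such as those in Figs. 2.3 and 2.8 can be added directly to estimate |S(ω)|.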
The values of S(ω) are plotted in Fig. 2.9, based on cadaver data from Onchi6 and Rubinstein7. The pattern of the transfer function agrees with the ear sensitivity of people with normal hearing, so ear sensitivity can be characterized primarily by the transfer function from the free field to the cochlea (Zwislocki10). Better agreement is obtained by reexamining the values in the low-frequency range (Berger11).
The Cochlea
The stapes, the smallest bone in the human body, is the last of the three auditory ossicles. It is connected to the oval window and drives the fluid in the cochlea, producing a traveling wave along the basilar membrane. The cochlea contains the sensory receptor organ on the basilar membrane, which transforms the fluid vibration into a neural code, as shown in Fig. 2.10. The basilar membrane is so flexible that each section can move independently of the neighboring sections. The traveling waves on the basilar membrane observed by Békésy,4 shown in Fig. 2.11A and B, are consistent with this representation.
AUDITORY BRAINSTEM RESPONSES IN AUDITORY PATHWAYS
A possible mechanism underlying the percepts of localization, apparent source width (ASW), and subjective diffuseness is a spatial factor: the IACC, the maximum value of the interaural cross-correlation function between the sound signals arriving at the two ear entrances. To seek evidence for such a mechanism of spatial information processing, the left and right auditory brainstem responses (ABRs) were recorded (Ando, Yamamoto, Nagamatsu, and Kang12).
Auditory-Brainstem Response Recording and Flow of Neural Signals
As the source signal p(t), a short pulse (50 μs) was supplied to a loudspeaker whose frequency response was flat within ±3 dB from 100 Hz to 10 kHz. This signal was repeated every 100 ms for 200 seconds (2000 times), and the responses were integrated; the left and right ABRs were recorded simultaneously through electrodes placed on the vertex and the left and right mastoids. The distance between the loudspeakers and the center of the head was kept at
[Fig. 2.10: cross section of the cochlea (axes x, z); labels: scala vestibuli, Reissner's membrane, scala media, tectorial membrane, outer hair cells, inner hair cell, basilar membrane, scala tympani, spiral ganglion, capsule of ganglion cell, axon, nerve fiber]
FIG. 2.10 Cross section through the cochlea showing the fluid-filled canals and the basilar membrane supporting hair cells. (Modified from Rasmussen AT. Outline of Neuro-Anatomy. 3rd ed. Dubuque, IA: William C. Brown; 1943.)
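The IACC referred to in the ABR section above is the maximum of the normalized interaural cross-correlation function. The sketch below follows the conventional room-acoustics definition, which is an assumption here rather than a statement from this section: Φlr(τ) = Σ pl(t) pr(t + τ) / √(Σ pl² · Σ pr²), IACC = max |Φlr(τ)| over |τ| ≤ 1 ms, and τIACC is the lag at which that maximum occurs (positive when the right-ear signal lags the left). The function name `iacc` and its parameters are hypothetical.

```python
import numpy as np


def iacc(p_left, p_right, fs, max_lag_s=1e-3):
    """Return (IACC, tau_IACC in seconds) for two equal-length ear signals.

    Phi(k) = sum_t p_left[t] * p_right[t + k], normalized by the square root of
    the product of the two signal energies; lags cover |k| <= max_lag_s * fs.
    """
    p_left = np.asarray(p_left, dtype=float)
    p_right = np.asarray(p_right, dtype=float)
    if p_left.ndim != 1 or p_left.shape != p_right.shape:
        raise ValueError("expected two 1-D signals of equal length")
    n = p_left.size
    norm = np.sqrt(np.dot(p_left, p_left) * np.dot(p_right, p_right))
    if norm == 0:
        raise ValueError("signals must have nonzero energy")
    max_lag = min(int(round(max_lag_s * fs)), n - 1)
    lags = np.arange(-max_lag, max_lag + 1)
    phi = np.empty(lags.size)
    for i, k in enumerate(lags):
        if k >= 0:  # right-ear signal shifted later by k samples
            phi[i] = np.dot(p_left[: n - k], p_right[k:])
        else:       # right-ear signal shifted earlier by |k| samples
            phi[i] = np.dot(p_left[-k:], p_right[: n + k])
    phi /= norm
    best = int(np.argmax(np.abs(phi)))
    return float(abs(phi[best])), float(lags[best] / fs)


fs = 48_000
rng = np.random.default_rng(0)
noise = rng.standard_normal(fs // 10)  # 100 ms of white noise

# Idealized median-plane source: identical ear signals, so IACC = 1 and tau_IACC = 0.
print(iacc(noise, noise, fs))

# Right-ear signal delayed by 10 samples (about 0.21 ms) relative to the left ear.
delayed = np.concatenate([np.zeros(10), noise[:-10]])
print(iacc(noise, delayed, fs))
```

The explicit loop over lags keeps the correspondence with the summation formula visible; for long signals an FFT-based correlation would be faster.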
[Fig. 2.11: traveling waves on the basilar membrane; panels A and B, amplitude versus distance from the stapes (mm); annotations 200 Hz and Δφ = π/2]
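The ABR recording procedure described earlier presents the pulse 2000 times (200 s at one pulse per 100 ms) and integrates the responses. Synchronous averaging of this kind leaves the stimulus-locked response unchanged while reducing uncorrelated background activity by roughly a factor of √N, about 45 for N = 2000. The sketch below simulates this; the sampling rate, the 10-ms analysis window, the toy response waveform, the noise level, and the helper name `synchronous_average` are hypothetical choices, not parameters of the original experiment.

```python
import numpy as np


def synchronous_average(recording, onsets, epoch_len):
    """Average the fixed-length epochs that start at each stimulus onset."""
    epochs = np.stack([recording[s : s + epoch_len] for s in onsets])
    return epochs.mean(axis=0)


# Stimulus timing from the text: one pulse every 100 ms for 200 s.
period_s, duration_s = 0.100, 200.0
n_repeats = int(round(duration_s / period_s))  # 2000 presentations

# Hypothetical recording parameters (assumptions for illustration only).
fs = 10_000                          # sampling rate, Hz
period = int(round(period_s * fs))   # samples between pulses
epoch_len = int(round(0.010 * fs))   # 10-ms window after each pulse
noise_sd = 5.0                       # background activity, arbitrary units

t = np.arange(epoch_len) / fs
# Toy stimulus-locked "response": a damped 1-kHz oscillation, not a real ABR shape.
evoked = 0.5 * np.exp(-t / 0.002) * np.sin(2 * np.pi * 1000 * t)

rng = np.random.default_rng(1)
recording = rng.normal(0.0, noise_sd, n_repeats * period)
onsets = np.arange(n_repeats) * period
for s in onsets:
    recording[s : s + epoch_len] += evoked  # same response after every pulse

average = synchronous_average(recording, onsets, epoch_len)
residual_sd = np.std(average - evoked)
print(f"N = {n_repeats}, residual noise SD = {residual_sd:.3f}, "
      f"expected about {noise_sd / np.sqrt(n_repeats):.3f}")
```

The evoked waveform is ten times smaller than the noise in any single epoch, yet after averaging it dominates the residual noise, which is why a large number of repetitions is used.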
[Fig. 2.13: left and right ABR waveforms for subject MR at horizontal angles ξ = 0°, 30°, 60°; waves I to VII marked; ordinate, amplitude of ABR]
shown in Fig. 2.13B; l > r for ξ = 60° and 90° (P < .05). The behavior of wave III, shown in Fig. 2.13C, is similar to that of wave I; r > l for ξ = 30°–120° (P < .01). This tendency is reversed again in wave IV, as shown in Fig. 2.13D; l > r for ξ = 60°–150° (P < .05), and this is maintained further in wave VI, as shown in Fig. 2.13F, even though absolute values are amplified; l > r for ξ = 60°–150° (P < .05). From this evi-