William D'Angelo, Mark Ericson¹
Eric Scarborough, Steven Rogers, Philip Amburn, and Dennis Ruck²
¹ Armstrong Laboratory
² Air Force Institute of Technology
Wright-Patterson Air Force Base, OH
ABSTRACT

Relative auditory distance perception of a direct path signal by
itself, with synthetic reflection(s), and with reverberation, presented
over headphones, was measured using a 2AFC task. The stimulus, either a
500 millisecond pink noise burst or a two second phrase of male speech,
was presented twice, first at a reference distance and second at an
incremental distance from the reference. Reference distances included
five, fourteen, and twenty-two feet, and the incremental distances
included multiples of .25, .5, and 1 foot, respectively. Stimulus pairs
at each of the three distances were presented from four directions:
front, back, left, and right. For each stimulus pair, the task of three
subjects was to indicate which of the two sounds appeared closer to
himself/herself. From histograms for each condition, the just
noticeable difference (JND) was calculated by determining the minimum
interval at which the subjects could respond correctly seventy-five
percent of the time. The results of all locations indicated that the
JNDs for the direct path signal were seven percent, JNDs for the
reflections were around six percent, and JNDs for the reverberant
sounds were about twelve percent. All of the sounds appeared
out-of-head; however, the sounds presented from behind were perceived
to be in front. Side sounds appeared to be biased towards the front
when presented at the farther distances. Virtual audio cues were shown
to be effective in creating the perception of auditory distance over
headphones.

INTRODUCTION

Most humans can perceive, separate, and locate sound sources in three
dimensions (3-D) and correlate the apparent source location with the
visual and vestibular senses. In nature, this ability serves as a low
resolution early-warning system and is useful in pointing the high
resolution visual system to a location of interest. A large body of
experimental data on auditory localization capabilities has been
collected over the last century and continues to grow (Blauert, 1983;
Yost and Gourevitch, 1987; and Durlach, 1991).

Recently, the synthesis of the cues necessary to give directionality to
a sound source heard over headphones has become feasible (McKinley,
1988). Virtual audio is the creation of the illusion of a sound source
existing at a location in 3-D space, either over an array of
loudspeakers or headphones, by encoding a signal
with various localization cues.

Virtual audio may help a pilot to perform a variety of functions in the
military cockpit. The visual workload may be reduced by providing a
spatial auditory beacon for navigation waypoints. The current audio
alarm used by the radar warning receiver could be more effective if it
led the pilot's eyes to the area of danger (Folds, 1990). The
intelligibility of speech has been shown to increase by spatially
separating the communications channels (Ericson and McKinley, 1992).
Formation flying at night or in bad weather may be aided by attaching
audio "running lights" to wingmen. Virtual audio may also provide a
means for pilots to preserve their spatial orientation in addition to
existing visual cues.

Background

The process of encoding localization cues can be framed in terms of a
transmitter-medium-receiver relationship, in this case sound
generation-auralization-head related characteristics. Each sound source
has its own spectral and temporal qualities. Properties of the source
such as the bandwidth, pitch, degree of modulation, intensity and
directionality play an important role in localization. Familiarity with
the sound source and a priori information affect the listener's
perception. For a familiar source, the level of the sound as it reaches
the ears can give an indication of distance (Moore, 1989).

The effects that the acoustic environment imposes on the sound source
before it is received by the ears are called auralization cues.
Reflections and reverberation occur when a sound wave bounces off
structures such as the ground or walls. These structures absorb and
pass sound differently depending on the frequencies of the sound and
the material of the structures. At distances greater than 100 feet, the
atmosphere begins to absorb high frequency sounds (Coleman, 1963). An
inverse relationship exists between the ratio of direct path to
reflected energy and the source distance (von Bekesy, 1960). These
auralization cues are the primary contribution to the perception of
distance (Moore, 1989).

The final factor in localization is the transformation of the sound
source due to an individual's pinnae, head and torso. The fact that a
person has two ears separated by the width of the head produces two
cues: a difference in the time of arrival and in the spectral content
of the soundwave reaching each ear. The unique shape of each ear and
the mass of the head and shoulders alter the frequency spectrum that
reaches the eardrum in a characteristic way for each person. These two
cues combined are called the head related transfer function (HRTF). The
cues included in one's HRTFs enable the localization of sounds in
azimuth and elevation. The time information dominates the perception of
azimuth, while the spectral cues are important for elevation perception
(Wightman and Kistler, 1989).

While ample experimental data exists for human performance in azimuth
and elevation, data for distance performance is scarce. This situation
is reflected in the state of virtual audio technology in that the
performance in azimuth is very successful, the perception of elevation
is good and improves with training, while the use of distance cues has
not been fully successful. Therefore, the need exists for research into
distance performance. Existing and current virtual audio technology can
be very useful to meet this need.

OBJECTIVE AND PURPOSE

The objective of this work was to study the effectiveness of using
intensity scaling, reflections, and reverberation to create the
perception of distance over headphones. The purpose was to incorporate
distance cues into virtual audio synthetic environments.

METHOD

A distance perception experiment was conducted at the Air Force
Institute of Technology (AFIT). In order to study synthetic audio and
visual environments, AFIT has combined two 3-D Audio Generators with a
Silicon Graphics, Inc. (SGI) Personal Iris 4D/35 and an accompanying
SGI audio processor board (Scarborough, 1992). This setup provided a
powerful testbed with flexibility and control of both graphics and
digital audio (Figure 1).

The Bioacoustics and Biocommunications Branch (AL/CFBA) of the
Armstrong Laboratory, United States Air Force, has developed the 3-D
Audio display generator which provides virtual acoustic capabilities.
In the procedure developed by AL/CFBA, the combined frequency
transformations of the pinna and head of a mannequin are measured for
272 points in 3-D space. These transformations were recorded in the
center of a 14 foot diameter geodesic sphere of loudspeakers in an
anechoic chamber. The HRTFs are stored in memory and used to encode the
sound source, separately for each ear, to make the sound appear to come
from a particular location.

The use of a pair of two-channel 3-D Audio Generators provided four
independent channels for simulating four spatially separate acoustic
sources. The SGI audio processor board allowed the coordination (time
delays and intensity scaling) of the four sources to produce one source
consisting of a direct path and up to three reflections. Reverberation
was provided with a Realistic (Cat. No. 32-1110B) electronic
reverberation box. The delay between echoes was set at 15 ms. Two
echoes were produced with a successive 75% attenuation for each. The
stimulus was presented to the subject over Sennheiser HD520 headphones.

In software, a virtual room was created. It was 60 feet in length and
12 feet wide, with the subject centered (Figure 2). Virtual sound
sources could then be placed along the center of the room, lengthwise,
from 3 to 30 feet on either side of the subject. With the source in any
location, the subject could hear the direct path sound source alone or
accompanied by any combination of a floor reflection or a right or left
wall reflection. The software would calculate the appropriate time
delays and intensity scaling of the reflections using an acoustic ray
tracing program to determine path length and the inverse ratio law to
simulate absorption. The SGI main processor would pass these factors to
the audio processing board.

Stimuli

When the subject pressed the appropriate mouse button, the audio
processor board would play the sampled stimulus over the four 3-D Audio
channels, modifying them as instructed by the main processor. The
stimulus consisted of a digitized, 500 millisecond, pink noise burst.
The noise burst was bandlimited between 100 Hz and 10 kHz with a 50 ms
onset and offset time. Also, the digitized, two second phrase "fear of
lawyers" was used to test the effects of sound source familiarity. The
subject's response was entered using the mouse. Data files were used to
set
the initial conditions, control the experiment trials, and to store the
subject's response data.

The sound source locations were divided into two sets: the primary and
the secondary sources. The primary sources were set at ±5, 14, and 22
feet from the subject. Each primary source had eight secondary sources
associated with it: four in front and four behind. The minimum interval
used was one quarter foot, yielding secondary sources for the primary
source at 5 feet at 4, 4.25, 4.5, 4.75, 5.25, 5.5, 5.75, and 6 feet.
The interval for the 14 foot source was one half foot and the interval
was one foot for the source at 22 feet.

Experimental Design

The experiment consisted of a two alternative, forced choice task in
which the subject indicated which of the two sounds appeared closer. In
each trial, the subject heard a primary sound source followed
immediately by a secondary sound source. The subject was not allowed to
repeat the trial presentation. A section consisted of 48 trials
corresponding to six primary source locations, each with eight
secondary source locations. In each section, the 48 source pair
locations were randomized.

The eight sections corresponded to four reflection conditions for two
head-orientation angles. The reflection conditions consisted of (1) a
direct path signal, (2) a direct path and a floor reflection, (3) a
direct path with reflections from walls on either side of the subject,
and (4) a direct path with both floor and wall reflections. In half of
the sections, the subject faced forward in the virtual room, while in
the other half, the subject faced to the right. The order of the eight
sections was randomized for each subject. Each subject was trained for
at least one hour (eight sections for each of the noise and speech
stimuli) but with intervals twice as large as those used in the actual
experiment. The subjects were presented the eight sections with the
noise stimulus, the speech stimulus, and the noise stimulus with
reverberation.

Subjects

The three subjects tested included a female and two males. One of the
males, the experimenter, had prior experience with the distance
stimuli; the other two subjects had no prior experience in distance
perception experiments. All subjects were in their early twenties and
had normal hearing sensitivity and function.

RESULTS

Data was collected for the three subjects and percent just noticeable
difference (%JND) was calculated for different experimental conditions.
From histograms for each condition, the %JND was calculated by first
determining the minimum interval at which the subjects could respond
correctly (the sound they perceived to be closer was actually designed
to be closer) seventy-five percent of the time. The length of this
interval was then divided by the distance from the subject to the
primary source. For example, if the primary source was at 5 feet and,
for a certain set of conditions, the subjects reached seventy-five
percent correct at the second interval (one half foot), then the %JND
would be 10%. In several cases, all using reverberation without
reflections, the percent correct never reached seventy-five percent,
indicating the need for more or larger intervals.

The first relationship studied was the effect of reflections and
reverberation in general. For those sections without reflections the
%JND was 7%, while for those
sections with reflections, a 6% JND was calculated. With the limited
data set, the addition of reflections did not affect the judgement.
When reverberation was included and when the subjects were able to
reach seventy-five percent correct, a 12% JND was calculated,
indicating a two-fold increase in the difficulty of discrimination.

When the data was broken down for the three different source distances
(Table 1), the effect of reflections was found to depend on the source
location. For the five foot source, the %JND without reflections was 9%
(.45 ft.), while with reflections added, the %JND dropped to 5% (.25
ft.). With the source at fourteen feet, the opposite relationship was
found. The %JND without reflections was 5% (.7 ft.), while the addition
of reflections increased the %JND to 8% (1.12 ft.). The reflections had
no effect on the discrimination at twenty-two feet. No difference was
found for the results using the noise versus speech stimuli in terms of
the effects of reflections.

As mentioned previously, the case of reverberation without reflections
did not yield seventy-five percent correct responses for the five and
fourteen foot source locations. For the source at twenty-two feet, the
%JND was 15% (3.3 ft.). When the reflections were included with
reverberation, a 12% JND (2.64 ft.) was calculated.

While being trained, the subjects were questioned about their
impression of sound source location, absolute distance, and the
discrimination task. All subjects reported externalization of the sound
source, more so with the speech stimulus than the noise burst. The
reflections and reverberation were reported to add volume or
spaciousness to the sound source.

The stimuli were never perceived to be coming from the rear of the
subject. Sounds presented in the median plane sometimes appeared to be
biased slightly to either side. For the sources presented on the sides,
at the farther distances, the positions were perceived to wrap around
towards the front of the subject. Elevation was at the horizon or ear
level.

All subjects reported hearing a close, medium and far source region.
The subjects consistently heard the nearest sounds at an arm's length
from the head. The absolute judgement of the further distances was more
difficult, with estimates ranging from ten to one hundred feet.

DISCUSSION

Although the data set is small, some interesting trends emerged. As the
reflections are configured presently, they do not contribute greatly to
the discrimination task. The reflections are important in that they do
impact the perception of the sound source by giving it volume. In one
case, the reflections may aid in the discrimination task. When the
sound source is close to the head, the distance discrimination seems to
be achieved by an intensity judgement alone. The energy of the
reflections may increase the intensity difference heard between sources
that are near to each other, easing the discrimination.

The addition of reverberation, while adding color to the sound, has a
negative effect on the subjects' discrimination capability. The task
appears particularly difficult under the contradictory condition of
having reverberation without reflections. The reverberation segment of
the experiment should be repeated with larger intervals so that
seventy-five percent correct performance is reached.

For the noise burst stimuli, a
pitch shift was heard when the distance was changed. This cue destroyed
the perception of a distance change as externalization was lost.
Subjects reported attempting to use pitch shift as a means of
discrimination but were advised that pitch shift was an unreliable cue
and to attempt to visualize the source at a distance. With training and
concentration this unwanted percept could be overcome. For close source
locations, the intensity alone was most often used as the means of
discrimination.

This experiment demonstrated the functionality of an audio distance
simulator using a virtual environment. Because of the processing power
available, data can be collected without the need for extensive
physical hardware. Also, with the flexibility of the system, room
dynamics can be simulated with modifications to the software only.
Without the need for mobile physical sources, the length of time for an
experiment is much less.

Further experimentation is planned to include collecting more data with
the present setup, studying the role of differential Doppler shifts,
the role of visual cues, and the effect of training, and measuring
absolute distance judgement.

REFERENCES

Blauert, J. (1983). Spatial Hearing. MIT Press, Cambridge MA.

Coleman, P. D. (1963). An analysis of cues to auditory depth perception
in free space. Psychol. Bull., 60(3), 302-315.

Durlach, N. (1991). Auditory localization in teleoperator and virtual
environment systems: ideas, issues, and problems. Perception, 20,
543-554.

Ericson, M. A., and McKinley, R. L. (1992). Experiments involving
auditory localization over headphones using synthesized cues. J.
Acoust. Soc. Am., 92(4), 2296.

Folds, D. J. (1990). Advanced audio displays in aerospace systems:
Technology requirements and expected benefits. Proceedings of the
National Aerospace Electronics Conference, pp 739-743.

McKinley, R. L. (1988). Concept and design of an auditory localization
cue synthesizer. Unpublished Master's Thesis, Air Force Institute of
Technology, Wright-Patterson Air Force Base, OH.

Moore, B. C. J. (1989). Introduction to the Psychology of Hearing.
Academic Press, San Diego CA.

Scarborough, E. L. (1992). Enhancement of audio localization cue
synthesis by adding environmental and visual cues. Unpublished Master's
Thesis, Air Force Institute of Technology, Wright-Patterson Air Force
Base, OH.

von Bekesy, G. (1960). Experiments in Hearing. McGraw-Hill, New York.

Wightman, F. L., and Kistler, D. J. (1989). Headphone simulation of
free-field listening. II. Psychophysical validation. J. Acoust. Soc.
Am., 85, 868-878.

Yost, W. A., and Gourevitch, G. (1987). Directional Hearing.
Springer-Verlag, New York NY.

Source Location:           5 FT    14 FT    22 FT    TOTAL
Condition                  %JND
Without Reflections         9%      5%       7%       7%
With Reflections            5%      8%                6%
Reverberation Alone         -       -       15%
Reverb and Reflections     15%     14%       7%
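
The %JND calculation described in the text (smallest interval at which
the subjects reach seventy-five percent correct, divided by the
distance to the primary source) can be sketched as follows. This is a
minimal illustration, not the analysis program used in the experiment.
The function name percent_jnd and the proportion-correct values are
hypothetical; only the 5 foot example with a one half foot interval
comes from the text.

```python
from typing import Optional, Sequence

THRESHOLD = 0.75  # criterion stated in the text: 75% correct


def percent_jnd(primary_distance_ft: float,
                intervals_ft: Sequence[float],
                proportion_correct: Sequence[float],
                threshold: float = THRESHOLD) -> Optional[float]:
    """Return the %JND, or None if the criterion is never reached.

    intervals_ft[i] is the separation between primary and secondary
    source; proportion_correct[i] is the fraction of correct responses
    observed at that separation.
    """
    # Scan from the smallest interval upward and stop at the first one
    # that meets the criterion.
    for interval, p in sorted(zip(intervals_ft, proportion_correct)):
        if p >= threshold:
            return 100.0 * interval / primary_distance_ft
    return None


# Hypothetical response data for a 5 ft primary source (0.25 ft steps).
intervals = [0.25, 0.5, 0.75, 1.0]
correct = [0.60, 0.75, 0.85, 0.95]
print(percent_jnd(5.0, intervals, correct))  # 10.0, as in the worked example

# Criterion never reached, as reported for reverberation without reflections.
print(percent_jnd(5.0, intervals, [0.50, 0.55, 0.60, 0.70]))  # None
```

Returning None when the criterion is not met mirrors the dashes in the
table above, where no %JND could be calculated.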

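
The time delay and intensity scaling of a side wall reflection in the
virtual room can be illustrated with simple mirror-image geometry. The
paper does not give the details of its ray tracing program or of the
"inverse ratio law", so the sketch below rests on stated assumptions:
the source and subject lie on the centerline of the 12 foot wide room,
the reflected amplitude falls off as 1/path length with no wall
absorption, and the speed of sound is about 1125 ft/s. The function
name wall_reflection is hypothetical.

```python
import math

SPEED_OF_SOUND_FT_PER_S = 1125.0  # assumed value, roughly room temperature
HALF_ROOM_WIDTH_FT = 6.0          # 12 ft wide room, subject on the centerline


def wall_reflection(source_distance_ft: float,
                    wall_offset_ft: float = HALF_ROOM_WIDTH_FT,
                    c: float = SPEED_OF_SOUND_FT_PER_S):
    """Return (path_ft, delay_s, relative_gain) for one wall reflection.

    The reflected path equals the straight-line distance to the image
    of the source mirrored across the wall, which sits 2 * wall_offset_ft
    to the side of the real source.
    """
    path_ft = math.hypot(source_distance_ft, 2.0 * wall_offset_ft)
    # Extra travel time of the reflection relative to the direct path.
    delay_s = (path_ft - source_distance_ft) / c
    # Amplitude relative to the direct path under assumed 1/r spreading.
    relative_gain = source_distance_ft / path_ft
    return path_ft, delay_s, relative_gain


for d in (5.0, 14.0, 22.0):
    path, delay, gain = wall_reflection(d)
    print(f"{d:4.0f} ft: path {path:5.2f} ft, "
          f"delay {1000.0 * delay:4.2f} ms, gain {gain:4.2f}")
```

Under these assumptions the reflection grows stronger relative to the
direct path as the source moves away, which is the inverse relationship
between the direct-to-reflected ratio and source distance noted in the
Background.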

