A General Model For Spatial Processing of Sounds
The MIT Press is collaborating with JSTOR to digitize, preserve and extend access to Computer Music
Journal.
F. Richard Moore
Computer Audio Research Laboratory
Center for Music Experiment, Q-037
University of California, San Diego
La Jolla, California 92093 USA
Introduction

We perceive sounds in a spatial context. Without visual cues, we can often tell the direction or distance from which a sound comes. We also perceive things about the apparent acoustic environment of sounds, such as whether they seem to come from a reverberant cave or a padded cell. Multichannel recordings can portray the spatial characteristics of recorded sounds independent of listening conditions. In ways analogous to looking through windows, we can discern things about one acoustic environment through headphones or loudspeakers while we move about in another.

Ideally, spatial processing of sounds would allow us to have complete control over the acoustic environment heard through the loudspeakers. Each sound located within this heard environment could have a specified "size," direction, distance, and apparent motion. We can use computers to gain such control over the spatial characteristics of sounds, but for musical applications we must always specify the acoustic processing we believe will produce the intended psychological effect. Spatial processing therefore involves the simultaneous consideration of two sets of problems: the physical characteristics of a space to be simulated and the psychological characteristics of sounds presented to listeners over loudspeakers.

The work described in this article consists of (1) a conceptual model for representing the problem of spatial processing and (2) a description of an implementation of this model in the context of the Cmusic sound synthesis program (Moore 1982).

This work was described by the author in a talk presented at the International Computer Music Conference in Venice, Italy, in September 1982.

Computer Music Journal, Vol. 7, No. 3, Fall 1983, 0148-9267/83/020006-10 $04.00/0, © 1983 Massachusetts Institute of Technology.

Localization

Much attention has been paid to our ability to localize sounds. Roederer (1975) points out that, especially at high frequencies, intensity cues (amplitude differences between the sound waves arriving at our two ears) help us to determine the direction from which a sound comes. At lower frequencies, time cues also contribute to localization (Molino 1974). What distinguishes high from low? At a speed of about 335 m per second, it takes a sound wavefront about 500 µsec to travel the 17 cm or so between our ears. A 2000-Hz tone therefore has a wavelength about equal to our interaural distance. At frequencies below this, interaural time delay can be an important factor in localization.

For interaural time differences to be important, something must exist in our neural mechanism that correlates the signals coming into our two ears. Neural models of such crosscorrelators were proposed as early as 1959 (Licklider 1959), and physiological evidence for such mechanisms has since been found (Rose et al. 1969). Roederer also points out that the existence of such a crosscorrelation mechanism has implications for spatial control:

It is easy to see that the location ... will depend on the interaural time delay, which in turn depends on the direction of the incoming sound. Two tones, a mistuned interval apart, fed into separate ears, may "foul up" the crosscorrelator: The gradually shifting phase difference between the two tones ... will be interpreted by this mechanism as a changing difference in the time of arrival of the left and right auditory signals, hence signaling to the brain the sensation of a (physically nonexistent) cyclically changing sound direction! This is why two pure tones forming a mistuned consonant interval, presented dichotically with headphones, gives the eerie sensation of a sound image that seems to be "rotating inside the head" [Roederer 1975].

For localization to occur, we must know not only the direction of a sound source but also its distance. In research based on earlier work by Gardner (1962) and Wendt (1961), Chowning (1977) demonstrated that the relative mixture of direct-to-reverberant sound is a powerful cue for determining the distance between sound source and listener. By combining simulated cues for angular location, distance, and velocity (i.e., Doppler shift) with artificial reverberation (Schroeder 1962) via the Music V program (Mathews et al. 1969), Chowning was able to create convincing illusions of moving sound sources.

Improvements in our understanding of the acoustics of rooms (Schroeder, Gottlob, and Siebrasse 1974) allowed Moorer (1979) to synthesize a concise yet powerful model for artificial reverberation. Following Schroeder's suggestions, Moorer based his processing model on a tapped-delay-line filter (also known as a finite-impulse response [FIR] filter) to simulate the "early echo" response of a room, followed by a bank of recirculating filters (also known as infinite-impulse response [IIR] filters) to produce the effect of dense global reverberation. Moorer used the data gathered by Gottlob (1975) and Schroeder to obtain "reasonable-sounding" values for the tap-delay and gain parameters of the delay-line filter, together with acoustic data on sound absorption, to obtain similar values for the comb filters. These all led to a loose but useful simulation of Boston's Symphony Hall, with suggestions for alternatives.

Psychophysics Versus Performance

To gain more general control over spatial characteristics of musical sound, we need a general way to obtain reasonable-sounding values for the processing algorithm. The work of Schroeder, Gottlob, and Siebrasse, on which Moorer's values are based, was oriented toward improving the subjective impression of the sound of concert halls. With spatial processing, however, we wish not to analyze but rather to synthesize the sound of a concert hall or some other acoustic environment. In terms of the tapped-delay-plus-recirculating (TDR) filter model, we must find ways to synthesize the gain, delay, and recirculation parameters according to a specified musical intent.

Direct manipulation of such psychophysical parameters as interaural time delay can lead to compelling illusions of sounds in space. Using headphones or biteboards (which a listener grasps with the teeth) to control relative head position, interaural time delays can be used to obtain TDR filter parameters. We would expect the results of such an approach to produce strong impressions of localization.

Unfortunately, the relative positions of listeners and sound sources in a concert situation are unpredictable. No two listeners in a concert hall hear exactly the same sound, rendering such factors as interaural time delay useless as control parameters for music intended to be heard under concert conditions. Even the relative amplitude of sound entering the two ears of each listener is likely to vary throughout the performance space.

Even though no two listeners in a concert hall hear the same thing, there is an invariance in their subjective perception: they all hear the same music, if from different vantage points. We are clearly able to compensate for our own vantage points, except under unusual conditions that fool the perceptual mechanism. (The failure of this compensation is the basis of most so-called illusions.) In the visual realm we compensate readily for the difference between size and distance, and we make similar sonic distinctions between loudness and intensity.

Since we cannot control psychophysical parameters directly in a concert, a practical model for spatial processing must be based on physical characteristics of the real or imaginary space or spaces to be simulated. This suggests modeling the playback situation itself, rather than the details of the listener's perception of it, which is the approach taken in this study. The model described here is based on the following elements:

The relevant characteristics of the listening
Fig. 1. An outer room enclosing an inner room. The circles on the periphery of the inner room represent holes in its walls (loudspeaker positions). A sound source is shown in the upper right quadrant. The small circle represents the base of the radiation vector associated with this source, and the line points in the direction of greatest radiation (the length of the vector is proportional to the amplitude of radiation in that direction).

Fig. 2. Signal flow diagram of a basic TDR filter. A characteristic set of delays (D[·]) and frequency-dependent gains (G[·]) determine the operation of the tapped-delay (FIR) part of the filter. The summed output of the delay taps is then further processed by a recirculating (IIR) filter R, which provides dense-echo reverberation.
room would pass more closely to someone sitting close to a wall than to someone sitting in the center. After listening a while, however, all listeners should be able to agree on sound source locations regardless of where they sit in the inner room, at least as well as they could by listening through holes to actual sound sources moving about in a real outer room.

Sound sources may in general be located at any position in the outer room. Having more loudspeakers leads to a clearer depiction of the outer room. But even a small number of speakers can give the listener a great deal of information about the entire outer room and all sound source locations in it. Imagine, for example, that a performer in the outer room is walking in a circle around the inner room while beating on a tambourine. If we listen to the tambourine performance through two holes in the front two corners of the inner room (stereo), our ability to locate the sound will be excellent when it is in the front (the azimuth, or angle between the front-back line and a line pointing toward the sound source, lies between the two loudspeakers), less good when it is to the sides (outside the "cone" described by the lines drawn between the listener and the two loudspeakers), and ambiguous at the single point when it is directly behind the listeners. Thus even two loudspeakers would do a fair job of representing the entire two-dimensional plane of locations in the outer room. More loudspeakers would sample the acoustic field of the outer room with greater spatial frequency, lessening source location ambiguity.

By specifying sound paths in the outer room separately from the characteristics of the listening space, we separate the intended percept from its manner of presentation. Thus a given spatial composition might exist in several versions: one for two loudspeakers in a small room (a living-room-stereo version), one for headphones, and one for eight channels of sound in a large room (a concert version). The sound paths themselves would be identical in each version, that is, the composition itself would be invariant. Only the inner room specification would vary according to intended presentation.

A sound source in the outer room is modeled as one or more radiation vectors, each with an adjustable position, directionality, magnitude, and field shape. A single radiation vector suffices for most sound sources. Multiple radiation vectors (Fig. 4) may be used to describe sound-radiating surfaces to an arbitrary degree of precision. Individual radiation vectors may have time-varying characteristics, allowing sounds to move in arbitrary paths through-
simple shapes such as squares and rectangles should usually suffice for the outer room.

The inner room polygon is determined not by the actual shape of the performance hall but by the location of the loudspeakers within it. Four speakers at the corners of a square 10 m on a side therefore define a square inner room 10 m on a side, regardless of the actual size and shape of the listening room. No particular allowance is made in this model for reverberant or other properties of the listening space, since matching and/or compensating for these is largely independent of the spatial characteristics of the illusory outer room.

Radiation Vectors

Sound sources are injected into the space by means of radiation vectors. A radiation vector RV is completely defined by the quintuple

$$RV = (x, y, \theta, amp, back), \qquad (1)$$

where

x and y are the base of the vector (all coordinates are given in meters, with the origin (0, 0) in the center of the inner room);
$\theta$ is the direction of the vector (an angle of 0 rad points to the right as viewed from above);
amp is the length of the vector and is used to scale the amplitude of the source sound; and
back is the relative amplitude of the radiation in the direction opposite to that of the vector.

Sound is considered to be radiated in a supercardioidal pattern principally in the direction of the vector but with smaller amplitude to the sides and back (see Fig. 6).

The back value given in the specification of the radiation vector varies between 0 and 1. A back value of 0 implies no back radiation and a strongly directional radiation pattern. A back value of 1 implies an omnidirectional radiation pattern. The supercardioidal shape of the radiation pattern is given by:

$$r(\phi) = \text{scalar for radiation in ray direction } \phi = \left[1 + (back - 1)\frac{|\theta - \phi|}{\pi}\right]^2 \qquad (2)$$

where $\theta$ and back are defined as in Eq. (1). A single sound source emanates from one or more radiation vectors. Each radiation vector may be located anywhere outside the inner room in the space described by the outer room. Each radiation vector represents a source of sound in the virtual sound space.

Sound Paths

Sound sources radiate to each speaker channel in two ways: (1) by direct paths and (2) by reflected paths. There is exactly one potential direct path between each source and each speaker channel. For each source, there is also one potential reflected
Fig. 7. Direct radiation between a single source and four loudspeakers (note the cut path) (a). Reflected radiation between a single source and four loudspeakers (note cut paths) (b).
path to each speaker channel from each wall of the outer room. Thus a single source radiating sound to quad loudspeakers in a square outer room is modeled with four potential direct paths (one to each loudspeaker) and sixteen potential reflected paths (one from the source to each wall to each loudspeaker) (Fig. 7).

The shape of each path determines the following parameters:

Attenuation along the path due to distance
Frequency-dependent attenuation due to air absorption
Frequency-dependent attenuation due to reflection (absorption by the reflecting surface)
Absorption due to collision with the outer walls of the inner room (these are modeled as being completely absorptive)
Time delay due to the finite speed of sound transmission

The paths are calculated in the following manner. Each direct sound path is simply a straight line between the source and the loudspeaker. One reflection path is used for each wall of the outer room. Of the possible reflection paths from a given wall, the one chosen for this model is the principal reflection path, that is, the one that results in the greatest amplitude at the loudspeaker position. The reflection point chosen is therefore the one that results in the shortest distance from source to wall to loudspeaker. This is the point at which the angles of sound incidence and reflection are equal. Such reflections are easily modeled by standard acoustic techniques involving virtual sources located on the opposite side of the reflecting wall.

Cut Factors

Modeling the outer wall of the inner room as completely absorptive yields good directional distinctions among various source locations. Therefore, a sound path is potential until it is determined whether it is obstructed by the inner room. An obstructed sound path is considered to be "cut" by the barrier, and to be completely absorbed at that point. We can define the cut factor for a sound path to have a value of 0 when the path is obstructed by a barrier and 1 when it is unobstructed. The cut factor can then be combined multiplicatively with the overall amplitude of the sound on a particular path.

Since the cut factors may dynamically vary for moving sound sources, flipping back and forth between 0 and 1 as ray paths change with changing
source locations, some mechanism must be used to avoid clicks (due to stopping or starting the sound too abruptly). A simple method for dealing with this problem involves a linear interpolation between the 0 and 1 values as necessary to avoid clicks. A more sophisticated approach would be to model the refraction of the path around the edge of a cutting surface, but it seems unlikely that the computation involved would be justified on perceptual grounds.

Early Echo Pattern

The early echo pattern for each sound source is a collection of delays and frequency-dependent gains. At each moment, for each radiation vector, one potential direct path exists between the base of the vector and each loudspeaker. The cut factors determine whether the path is actually present or not. In addition, one potential path exists from each radiation source to each reflecting surface of the outer room to each loudspeaker. Thus the total number of sound paths included in this model is

$$N_{path} = N_{vec} N_{chan} (1 + N_{surf}), \qquad (3)$$

where

$N_{path}$ is the total number of paths;
$N_{vec}$ is the number of radiation vectors;
$N_{chan}$ is the number of speaker channels; and
$N_{surf}$ is the number of reflecting surfaces in the outer room.

For each of these paths, $P_i$, $i = 1, 2, \ldots, N_{path}$, we define a delay $D[P_i]$ and a frequency-dependent gain $G_\omega[P_i]$:

$$D[P_i] = \frac{Dist[P_i]}{c} \qquad (4)$$

and

Amp[·] is the amplitude scalar given in the radiation vector that is the source of the path;
Rad[·] is a "radiant," that is, an amplitude scalar for the direction of radiation (= $r(\phi)$ for the path); and
Cut[·] is the cut factor for the path (0 if cut and 1 if not).

A changing source position would likely cause a changing delay parameter, which would result in pitch shift as a side effect (shrinking delay values would shift the pitch up and vice versa). The magnitude of this pitch shift is precisely the same as that of a Doppler shift for a moving sound source. Since the TDR filter (with properly interpolated delay taps) provides such shifts automatically, no specification of Doppler shift is necessary for moving sounds with this model (they happen automatically!).

The complexity $O_{dir}$ of the direct path computation is proportional to

$$O_{dir} \propto N_{vec} N_{chan}. \qquad (6)$$

The complexity of the reflected path computation is then

$$O_{refl} \propto N_{vec} N_{chan} N_{surf}, \qquad (7)$$

where the factors are defined as above. The overall complexity of the computation involved is proportional to the total number of paths:

$$O_{tot} = O_{dir} + O_{refl} \propto N_{vec} N_{chan} (1 + N_{surf}) = N_{path}. \qquad (8)$$

Since for each path $P_i$ we must compute both $D[P_i]$ and $G_\omega[P_i]$ as defined above, the total amount of computation for this model is significant. If all radiation-vector elements are allowed to be time varying, for example, new values must in general be calculated at every sample.
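As a rough illustration of the path bookkeeping, the following Python sketch enumerates the direct and reflected paths for one radiation vector, computes each delay as distance divided by the speed of sound, and evaluates the radiant for a given ray direction. It is a minimal sketch under stated assumptions, not the Cmusic implementation: the function names, the 50-m outer room, and the source position are hypothetical, the outer room is assumed to have axis-aligned walls so that each reflection can be handled with a mirrored virtual source, the radiant uses the squared form of the radiation-pattern formula as read here, and cut factors and frequency-dependent gains are left out.

```python
import math

SPEED_OF_SOUND = 335.0  # m/s, the approximate value quoted in the text


def radiant(theta, phi, back):
    """Amplitude scalar for radiation in ray direction phi (squared form, assumed)."""
    # Wrap the angular difference into the range [0, pi].
    diff = abs((theta - phi + math.pi) % (2.0 * math.pi) - math.pi)
    return (1.0 + (back - 1.0) * diff / math.pi) ** 2


def image_source(src, wall):
    """Mirror a source across an axis-aligned wall given as ('x', value) or ('y', value)."""
    axis, value = wall
    x, y = src
    return (2.0 * value - x, y) if axis == "x" else (x, 2.0 * value - y)


def path_delays(src, speakers, walls):
    """Return (kind, speaker index, distance in m, delay in s) for every potential path."""
    paths = []
    for k, spk in enumerate(speakers):
        d = math.dist(src, spk)  # direct path: straight line to the speaker
        paths.append(("direct", k, d, d / SPEED_OF_SOUND))
        for wall in walls:
            # Reflected path length equals the distance from the virtual source.
            d = math.dist(image_source(src, wall), spk)
            paths.append(("reflected", k, d, d / SPEED_OF_SOUND))
    return paths


# Hypothetical layout: four speakers on a 10-m square, a 50-m square outer room.
speakers = [(-5.0, 5.0), (5.0, 5.0), (5.0, -5.0), (-5.0, -5.0)]
walls = [("x", -25.0), ("x", 25.0), ("y", -25.0), ("y", 25.0)]
paths = path_delays((0.0, 15.0), speakers, walls)
print(len(paths))  # 1 vector * 4 channels * (1 + 4 surfaces) = 20 paths
```

With one vector, four channels, and four walls the list holds 20 entries, matching the path-count formula. A fuller version would multiply each path's amplitude by its cut factor after testing the path against the inner room.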
Fig. 8. Cmusic score example. This score produces a sound that moves in a 10-m-radius circle centered about the point (32,22).
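The Cmusic score itself is not reproduced here. As a loose, non-Cmusic illustration of the trajectory named in the caption, the Python sketch below computes source positions on a 10-m-radius circle centered at (32, 22) and estimates the pitch ratio that a time-varying delay line would produce for a listening point at the origin. The 8-second period, the counterclockwise direction, the listening position, and all function names are assumptions introduced for the example.

```python
import math

SPEED_OF_SOUND = 335.0  # m/s, the approximate value quoted in the text


def circle_position(t, center=(32.0, 22.0), radius=10.0, period=8.0):
    """Source position at time t on a counterclockwise circle (period is assumed)."""
    angle = 2.0 * math.pi * t / period
    return (center[0] + radius * math.cos(angle),
            center[1] + radius * math.sin(angle))


def delay_to(listener, t):
    """Propagation delay in seconds from the moving source to a fixed point."""
    return math.dist(circle_position(t), listener) / SPEED_OF_SOUND


def pitch_ratio(listener, t, dt=1e-4):
    """Output/input frequency ratio of a delay line whose delay changes over time.

    Reading a signal through a delay D(t) scales its frequency by 1 - dD/dt,
    so a shrinking delay raises the pitch and a growing delay lowers it.
    The derivative is estimated with a central difference.
    """
    slope = (delay_to(listener, t + dt) - delay_to(listener, t - dt)) / (2.0 * dt)
    return 1.0 - slope


origin = (0.0, 0.0)
for t in (0.0, 2.0, 4.0, 6.0):
    x, y = circle_position(t)
    print(f"t={t:.0f}s  pos=({x:.1f}, {y:.1f})  ratio={pitch_ratio(origin, t):.4f}")
```

The ratio drops below 1 while the source recedes from the listening point and rises above 1 while it approaches, which is the Doppler-like side effect of a changing delay described in the text.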