The Creation of Musical Sounds For Playback Through Loudspeakers - Dave Moulton - 1990


THE CREATION OF MUSICAL SOUNDS FOR

PLAYBACK THROUGH LOUDSPEAKERS

BY DAVID MOULTON

Chairman of the Department of Music Production and Engineering


Berklee College of Music
Boston, MA.

The evolution of multitrack recording of popular music has led to significant changes in style and
aesthetics, compared to live performances and the recordings derived from them. Central to these
issues is the idea that music may be represented by a single waveform over time and the emergence
of a "loudspeaker music" aesthetic based on the tension that exists between the A+B and A-B
elements in stereophonic playback of multitrack recordings. This paper reviews the technology
systems and usages that have influenced these changes, and discusses the validity of the current
assumptions and practices in terms of psychoacoustical principles. Particular emphasis is given to
the idea of stereophony as it is employed in multitrack popular music recording.

1. HISTORICAL BACKGROUND

Sound recording has been viewed since its inception as a method for the storage and reproduction of acoustical musical events. This archival activity has obscured an emerging and equally important facet of sound recording: its use as a primary medium for the creation of music. This creative use finds many of its origins in popular music, the evolution of popular electric musical instruments, and the multitrack efforts of Les Paul and others.

While stereophony has evolved from efforts to enhance the listening experience (particularly of classical music), multitrack recording has evolved from efforts to enhance popular music performance, by artificially expanding often limited resources and by deferring musical decisions until the moment of fixing the master recording. The practitioners of multitrack recording have been highly pragmatic in their approach and have sought, in common-sense musical and technical ways, to improve the apparent quality of their performances and recordings. The tremendous commercial success their work has enjoyed has in turn led manufacturers to eagerly support their multitrack endeavors and respond quickly to their expressed needs. A recording industry based on this synergy has evolved extremely rapidly. At the same time, comparatively little has been done to assess the underlying principles and aesthetics of this method of creating music, or the psychoacoustical implications and validity of those principles. The evolution of multitrack recording as a primary music-creation medium has been so rapid that there simply hasn't been time yet to evaluate and place into historical perspective the immense amount of creative accomplishment that has gone on in the recording of popular music since 1950.

Suffice it to say that multitrack recording has proceeded on the premise that music consists of a multiplicity of voices, each of which can be represented by a single electrical waveform, and that successful, beautiful, and exciting music can be created from a set of such waveforms. It should be further noted that the preferred performance of such music is by loudspeaker pair. Our efforts have been mainly directed at the development of a craft and technology to allow us to create, modify, and mix the waveforms of such a set. Little attention has been paid to the exploration of other assumptions about and possibilities for the creation of music for loudspeakers.

2. TECHNICAL PREDISPOSITIONS OF MULTITRACK RECORDING PRACTICE

In many respects, art is shaped and driven by the technology which surrounds it. The three-minute song, for instance, is clearly a response to the physical constraints of the Edison cylinder and its descendants, and the applications and practices of stereophony are equally related to the physical and psychological forces inherent to the multitrack recording process and its environs. The assumptions underlying the configuration and operation of the modern recording studio have profoundly affected the nature of the music created in it (such as the evolution of the A+B/A-B aesthetic process noted below). Therefore it is important to be aware of the configuration and practices of the multitrack recording process, as well as those of the end-user audience, in order to more fully understand the music that is created with it, as well as the appropriate design goals for the hardware audio systems used to transmit that music.

2A. Microphone and Monitor Usage

Multitrack recording is based on the assumption of multiple, separate and independent voices. This has led to a predisposition for isolation of instrumental sounds. A primary method for obtaining isolation is through the placement of microphones in the near-field of a musical instrument. Typical placements are usually within inches of the source, with considerable attention being paid to the exact placement as a primary determinant of timbre. Further isolation is obtained through the preferred use of microphones with cardioid and hypercardioid polar patterns. These placement traditions are sufficiently pervasive that even when near-ideal isolation is obtained via overdubbing, near-field placements continue to be preferred, except for special effect. The convention that precise microphone placement in the near field is a primary determinant of timbre is carried even further as a matter of course with the mixing of multiple near-field microphone placements for the same instrument to obtain the desired timbre. Sometimes, electrical pickups will also be mixed with near-field microphones.

At the playback end of the recording/reproduction process, the evolution of monitor loudspeakers and control room configurations have arisen from a variety of design needs, historical traditions and acoustical philosophies. The goal of much of this design effort and usage has been to eliminate the effect of the control room on the audio signal, based on the assumption that the listening room will add coloration to and will mask the true musical character of that audio signal.[1,2] A central ingredient in such design methods has been the attempt to minimize the amount and effect of control room reflections on the listener through the use of highly absorbent rooms, geometries that prevent early reflections from reaching the listener, near-field monitor placement, etc. A result of this approach has been the creation of rooms that support abnormally strong and stable phantom images (which permit users to derive maximum audibility from the effect of amplitude-distribution panpots). At the same time these rooms yield comparatively audible interference effects (i.e. comb-filtering) around the median plane. Details of reverberance and time delay are also extremely well revealed. Dynamic range may approach the limits of human thresholds.

Such rooms bear little resemblance to the end-user environments for which the recordings are intended. It is now axiomatic that music mixes are subsequently evaluated in a range of such end-user environments and then remixed to compensate for undesirable qualities perceived in those end-user environments. Producers and engineers also employ a common practice of using their own small near-field monitors in production control rooms to minimize errors of this sort and thereby shorten this postproduction process.

2B. Signal Processing

The recording console carries multiple parallel channels, each designed to condition and process an audio signal. Apart from the many functions designed to facilitate production, consoles have as a primary function the interconnection of these channels via an array of mix buses. Some of the buses are used to collect signals for global signal-processing effects (such as reverb) which are then routed through yet other channels. These parallel channels are then summed to two buses (for stereo) for final mixing, distributed to these buses via amplitude-distribution panpots. Monaural versions of mixes are obtained via a simple summation of the stereo buses, usually for monitoring purposes only, and then again later in broadcast transmission for radio and television.

Each channel will have equalization (the modification of frequency response of an audio signal) available. Equalization is, along with microphone placement, considered to be the primary determinant of timbre. In contemporary consoles, equalization is fairly flexible, with usually four separate circuits of so-called "parametric" control (i.e. control of amplitude, frequency and bandwidth of the response modification) per channel. The stereo localization of an audio channel is determined by a panpot (usually dual potentiometers with opposing tapers on a common shaft). This circuit is used today as the primary determinant of stereo, and the practicing assumption is that relative distribution of amplitude to the Left or Right bus determines the location of the perceived stereophonic phantom image for that particular audio signal. (Manufacturers also view stereo itself in this light: if a device has two outputs and the amplitude of a signal distributed to those two outputs is controlled by a panpot, the device is by definition a "stereo" device.)

Gain-manipulation devices (compressors, expanders et al.) are not generally fitted to all channels of a console, but are available widely in most studios (typically a minimum of one for every two channels of audio on the console). These devices were not originally developed to enhance the musicality of a signal, but to avoid overload distortion and system noise while recording acoustic sounds whose dynamic range exceeded that of the audio system. Recording engineers, musicians and producers have noticed the substantial impact of such devices on the timbre and musical quality of a sound, and have increasingly used companders for the purpose of manipulating musical values even as the available dynamic range of the audio channel itself has increased to a point where overload distortion and/or objectionable system noise are no longer serious constraints in the recording process.

Signal processing in the time domain has proved to be one of the most fruitful and musically compelling realms of craft in multitrack production, for reasons related to the psychological nature of the echo-location portion of our hearing mechanism. This realm includes the generation and reiteration of delays ranging from approximately 100 microseconds to about a second. Due to the comparatively high cost of delay lines and the forbiddingly broad and complex range of possible configurations that might be desired, delay lines have not become an integral part of each channel, and reverberators remain "outboard" units, usually dedicated to one function at a time (although often time-shared through overdubbing, in a multitrack cost-saving tradition).

One of the earliest and still most powerful musical gestures to emerge from recording is the reiterated "tape echo," audio signal delays of greater than 50 milliseconds (and usually less than 500) generated originally by the playback head of a spare

deck, and sometimes looped for multiple iterations. This became a cliched vocal reverb effect in the 1950s and has proved to be tremendously useful for a range of musical gestures over the intervening years.

Short delays (ca. 5-50 milliseconds) have proved to be very useful for a broad range of uses, such as the simulation of instrument doubling and timbral (spectral) modification of a signal (due to comb filtering). The mixing of a signal with its delayed iteration generates such comb filtering and subsequent timbral interest. This can also take the form of doubling or flanging if the time base of the delay is modulated. In addition, the distribution of short delays to the other channel of a stereo pair results in much stronger spatial localization (due to the precedence effect) than can be obtained by amplitude panning. Stereo flanging, doubling, chorusing and a host of other effects have also become common.

Live doubling (the overdubbing in unison of the same performer) is a remarkably beautiful sound, unique to multitrack recording. Whether done in stereo or mono, it adds substantial intensity, impact and interest to the part. This sort of doubling, like tape echo, has become a durable cliche. Interestingly, unison doublings of two identical instruments are discouraged in traditional orchestration, particularly when the instruments are rich in upper spectra.[3] I speculate that the interference effects which prove to be of interest in the multitrack realm are less so in the traditional orchestral realm due to the comparative richness of reverberant information in that realm.

With the advent of digital reverberators, artificial reverberation has become a powerful and extremely flexible tool. Using a broad array of algorithms for modeling the reverberant behavior of rooms, reverberators have led us to remarkably natural-sounding artificial reverberance. In addition, numerous non-natural sounds and special effects, such as so-called "gated reverb" (the sudden truncation of an extremely reverberant sound), have evolved into normal production effects.

2C. Music Synthesis

Music synthesis has been with us since the advent of the baroque pipe organ. After numerous developments in synthesis during the first part of this century, the analog voltage-controlled synthesizer developed by Moog and others in the 1960s became the first widely available synthesizer. In keeping with multitrack sensibilities, the voltage-controlled synthesizer treats the musical voice as an electrical waveform. Other synthesis methods have followed in this tradition.

The most significant, enduring and original aspect of Moog's design is his development of an integrated control system, which defines musical parameters in electrical terms (i.e. one octave = 1 volt, etc.), and applies this simple, focused system to the physical elements of frequency (corresponding to pitch), amplitude (loudness) and spectra (timbre). The other key element in Moog's control system is the envelope generator, which creates control time-contours for each individual sound, permitting extremely flexible shaping (if not varying expressive nuance) of the loudness and spectral character of individual notes. Sequencing is an extension of synthesis control language to permit the global control of multiple synthesis parts in real- or step-time, as well as the creation of more expressive note-to-note progressions. The simultaneous emergence of the Musical Instrument Digital Interface (MIDI) language, personal computers and inexpensive multitrack recorders has fueled explosively rapid growth of music creation utilizing this sort of control system.

The full implications of this group of developments in synthesis, particularly sequencing, cannot yet be assessed, but it may prove as significant to the history of music as the development of the chromatic keyboard. As the technology matures, we can expect to see the reintegration of the acts of composition and performance at a high level of sophistication in both areas, satisfying both the creative organizational urge of the composer and at the same time the need for expressive nuance of the performer, resolving a dichotomy that has existed in Western Music since the development of notation, a dichotomy that has on the one hand led to some of Western Music's greatest achievements (polyphony, the orchestra, common-practice tonality and modulation), but that has also proved frustrating and limiting for performers and composers alike, for a variety of practical, economic and creative reasons.

A variety of methods of synthesis have evolved since 1960. In general, the origins of synthesis are firmly rooted in the work of Fourier and Helmholtz, in the assumptions of waveform construction and resonant structures as fixed elements. These methods include:
- Additive synthesis: synthesis by creation of each sinusoidal partial of a sound, building timbral character through the independent generation of each partial's frequency, amplitude and (theoretically) relative phase.
- Subtractive synthesis: synthesis which begins with complex waveforms and removes spectral content by filtering to achieve desired timbral character.
- FM synthesis: an array of algorithms in the digital realm that permits the creation of any spectral configuration through the generation of sidebands by the modulation of frequency of one sinusoidal wave by another, where the amplitude, frequency and phase of both carrier and modulator, and the amount of modulation are all precisely controlled over time.
- Sampling: the recording of single notes of instrumental sound and treating the waveforms of those notes as musical voices, controlling them with the same control language used for other synthesis procedures. This modality has many advantages in terms of mimicking traditional instruments, but also has proven to be fraught with difficulties related to (a) the range of expressive nuance of acoustical instruments and (b) problems pertaining to the audibility of frequency shift of the formant of an acoustical sound. A special case of sampling is the so-called "drum machine," which combines samples of drum kit sounds with a small sequencer which permits the generation of complex(?) patterns of drum sounds over time.
- Resynthesis: a technique wherein sampled sounds are analyzed and converted into an equivalent additive, subtractive or other synthesis modality so that they can be manipulated within the frame of reference of that modality.
- LA (Linear Arithmetic) synthesis: a hybrid modality combining sampled sounds with subtractive synthesis. Typically, the beginning or attack of the sound is a recorded sample and this attack is spliced to a sustaining waveform generated by subtractive techniques.
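
As an illustration of two of the synthesis methods described in this section, additive synthesis and FM synthesis, the following is a minimal Python sketch. It is not taken from the paper: the sample rate, the function names `additive` and `fm`, and the example frequencies are all assumptions chosen for illustration, and the FM function uses the phase-modulation form commonly used in digital implementations rather than any particular manufacturer's algorithm.

```python
import math

SAMPLE_RATE = 44100  # samples per second; an assumed value, not from the paper


def additive(partials, num_samples, sample_rate=SAMPLE_RATE):
    """Additive synthesis: sum independently specified sinusoidal partials.

    partials is a list of (frequency_hz, amplitude, phase_radians) tuples.
    """
    return [
        sum(amp * math.sin(2 * math.pi * freq * n / sample_rate + phase)
            for freq, amp, phase in partials)
        for n in range(num_samples)
    ]


def fm(carrier_hz, modulator_hz, index, num_samples, sample_rate=SAMPLE_RATE):
    """Two-oscillator FM, written in its common phase-modulation form.

    index sets the amount of modulation; a larger index puts more energy
    into sidebands at carrier_hz +/- k * modulator_hz.
    """
    return [
        math.sin(2 * math.pi * carrier_hz * n / sample_rate
                 + index * math.sin(2 * math.pi * modulator_hz * n / sample_rate))
        for n in range(num_samples)
    ]


# Hypothetical example voices: three harmonics of 220 Hz, and a 2:1 FM pair.
additive_voice = additive(
    [(220.0, 1.0, 0.0), (440.0, 0.5, 0.0), (660.0, 0.25, 0.0)], 1024)
fm_voice = fm(carrier_hz=440.0, modulator_hz=220.0, index=2.0, num_samples=1024)
```

In this sketch the additive voice needs one oscillator per partial, while the FM voice obtains a comparably rich spectrum from two oscillators and a single modulation index, which is one plausible reading of why the two methods are listed separately above.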

AES 8th INTERNATIONAL CONFERENCE

Other synthesis techniques, not necessarily dependent upon the ideas of Fourier and Helmholtz, have emerged but not entered into general usage. These include modalities such as VOSIM and granular synthesis, which are based on principles generally outside of the conceptual realm of fixed note structures derived from waveforms.[4] Experimental modalities continue to be developed.

3. STEREOPHONY

Stereophony has become such a fundamental and ubiquitous element in popular music production that some of its key features need to be discussed briefly. The original notion of stereophony lies in the idea that if a reverberant space is observed from two separate but nearby points simultaneously (as our ears do), we can derive from the resulting signals an illusion of that space and the position of sound sources within it. From this idea we derive the conventional two-microphone/two-loudspeaker recording/playback stereophonic system. Technically, stereophonic audio is a dual-channel transmission medium based on the premise that spatial information that is both entertaining and satisfying can be included as a function of amplitude differences (Δ-Amplitude) and time differences (Δ-Time) between the two channels of transmission. In multitrack recording we mostly use an array of monophonic signals that are mixed and distributed with amplitude differences (via pan-pots) to the two stereo channels. The original sources are mostly single monaural signals (either acoustically recorded in a space or electrically generated) that are electrically summed with varying distributions of amplitude between the two stereo channels for each monaural source signal. In keeping with the above characterizations, it is possible to approximately define in the abstract the physical parameters of the stereophonic signal as follows:

The stereophonic artifact consists of a pair of signals derived either from a common and simultaneously recorded acoustic sound source or from a common electrical audio signal source, so that the pair have a fixed or locked phase relationship. The pair's common elements have less than 10 decibels amplitude difference between them and less than 50 milliseconds time offset.[5]

The phantom image, an illusion derived from the (A+B) component of the stereo signal, is a clearly perceived image localized (usually) between two loudspeakers. It is the result of a peculiarity of the stereophonic playback system vis-a-vis our auditory localization process. In pre-technological nature, no two sources are phase-locked, and our echo-location system relies upon this fact: we localize through the Precedence Effect, consciously identifying direction of the first arrival of a phase-locked series of artifacts as the direction in which the source of that sound lies. All subsequent phase-locked artifacts are, in pre-technological nature, reflections.[6] When two loudspeakers each emit the same sound artifact, a listener equidistant from those loudspeakers perceives two phase-locked artifacts arriving from different directions simultaneously. This is paradoxical, because it represents a condition that 1) does not occur in pre-technological nature, and 2) our neurological perception system is not equipped to identify. The resulting perception is the illusion of a single sound source located somewhere along the median plane between the actual sound sources.[7] It has been generally assumed in the recording industry that the phantom image is a common-sense adjustment, a neurological intuition that the actual source must be somewhere between the two actual artifacts. I speculate that the auditory system in fact infers a more complex situation, to wit: that those identical phase-locked, equal-time-of-arrival, different-angle artifacts are in fact perceived as the first among many reflection artifacts of a sound whose direct artifact, for some reason, was not perceived, and that the location of that unperceived imaginary source is to be inferred from all of the reflected artifacts. In the case of a loudspeaker system whose two sources are in polarity, the situation is comparatively simple: the actual artifacts suggest reflected paths from imaginary side walls equidistant from an imaginary source located somewhere (depth, for conventional loudspeakers, is ambiguous) along the median plane. When the two sources are not in polarity, the situation is more confusing, because the reversal in polarity yields a pair of artifacts that suggest a greater paradox: the imaginary and not-perceived source must be someplace other than the region where the sound is coming from; in fact it must be someplace other than any normal place in the space in which we are listening.

In popular music multitrack recording practice, the monophonic (A+B) element is played off against both (A,B) and (A-B) elements of stereophony, as will be described below. The clearly audible differentiation between (A+B) and the other elements is used to musical and dramatic effect. The monophonic origins of these sounds are usually quite audible (particularly in a reflection-controlled control room or anechoic listening environment). Often, the sound of an acoustical stereophonic recording (usually derived from a coincident or an AB microphone placement on the drum kit) serves as a sort of sonic "glue" to hold the stereophonic soundstage together, while (A-B) and (A,B) components spread the remainder of the elements on the soundstage as widely as possible for maximum stereophonic impact. This process has as its counterpart in acoustical stereophonic recording the adjustment of amplitude of the "side" information in a middle-side recording.

4. THE EVOLUTION OF "RECORDED MUSIC AESTHETICS" IN POPULAR MUSIC

The effectiveness of the illusion of acoustical sound reproduction is sufficiently compelling that it has caused us to mostly overlook the physical reality behind that illusion. When we listen to sound recordings we rarely notice the playback loudspeaker per se; instead we hear pianos, singers, voices, electric guitar amps, and orchestras. We have little sense of the loudspeaker itself as a musical instrument, or of its essential musical character. However, by physical definition the loudspeaker is in fact a musical instrument, and it has become, without formal recognition, the predominant musical instrument of our time.

In the reproduction of classical music (or any music normally performed acoustically by tradition and preference), the generally accepted goal of the sound recording is to simulate the acoustical performance that was recorded and to evoke its spirit, ambience and artistic force as compellingly and realistically as possible. In popular recorded music since 1950, no such goal for the sound recording has existed. In large part, the music is created for performance by loudspeaker, and any acoustical performance is normally a simulation. This has significant aesthetic implications and represents at least two major changes in the way music is created:
- The performance of the music is fixed by the storage medium and subsequently dispersed (in software form) in time and space and then generated mechanically by a large range of different loudspeaker systems;
- The producer or composer exercises a greater control than ever before over choice of (relative) timbre, tempo, interpretation and mood of the performance and obtains for the first time ever the ability to create a definitive "best performance", while at the same time losing control over the quality, volume level, playback ambience and gross tonal character of the performance.

In the multitrack recording, each of an array of tracks usually carries a signal assumed to be equivalent to a musical part, as described above. These parts, in popular music, exist in families, including the drum kit and electric bass, vocal parts, other rhythm parts, instrumental solo parts (i.e. "leads"), and instrumental backgrounds (i.e. "horns," "sweeteners," "string pads," "keyboards," etc.). Primary musical elements are the bass (or "kick") drum and electric bass (which together provide the rhythmic and harmonic foundation of the music), the lead vocal, and various instrumental lead solos that occur throughout the song. The recorded kick drum sound is highly stylized and dynamically unrelated to the rest of the drum kit. The recorded electric bass sound is no longer of acoustic origin at all, but is derived directly from the electrical output of the bass instrument itself. (Usually, when recording, the bass player will perform in the control room rather than the studio, hearing himself via the control room monitors instead of through a bass guitar amplifier, by preference and established tradition.) Supporting this basic primary group is a broad array of musical elements: the recorded sounds of rhythm guitar(s) and keyboard parts, the rest of the drum kit (especially the snare drum), wind and string background instruments and sections, and harmony vocals. These parts are recorded acoustically in sections or part by part, and/or sampled or synthesized in whole or in part.

The spatial ambience accompanying such recordings, technically speaking, consists of discrete delays and reverberance derived from the direct parts, often with delays added to the onset of reverberance ("predelays"). These delays may be treated with equalization, compression/expansion, and/or amplitude-panning in a manner similar to the treatment of the direct voices, as they are passed through similar hardware. The values pertaining to ambience are established intuitively, and are an essential part of the stylistic signature of a multitrack producer and/or remix engineer. Separate delays or groups of delays may be used on individual parts, and separate artificial reverberances may be similarly allocated, so that a finished pop/rock mix may include the simulation of multiple reverberant environments: one for the lead vocal, another for the snare drum, a third for the strings, and so on. In the face of the above treatments and practices, it is clear that the simulation of a distinct, singular, coherent acoustical space and the placement of direct voices in such a space is irrelevant in present-day popular recordings.

Arising from these treatments of direct voices and the layered ambiences assigned to those voices is an aesthetic of stereophonic playback that has evolved over the past two decades with little or no formal notice. This aesthetic (which I have called the "A+B/A-B aesthetic")[8] is de facto and clearly in response to the desirability of a strong stereophonic illusion coupled with the need to have that illusion work for the widest range of possible stereophonic playback environments.

Briefly stated, the A+B/A-B aesthetic is based on the idea that popular multitrack music consists of two complementary and interactive realms: the realm of direct musical statement and exposition, which is represented by the (A+B) component of the recording, and the realm of supporting and often antiphonal rhythmic, harmonic and textural parts, which is represented by the (A-B) and (A,B) components of the recording. (The [A+B] component is that group of elements that is common to both channels A and B, the [A-B] component is the group of elements that are common but with differing signs or polarities, and the [A,B] component is the group of elements that are present independently in either channel. For purposes of discussion, [A,B] is usually included in the set of [A-B].) The interaction of these realms results in a clear and musically effective spatial polyphony that is audible over a broad range of playback systems. This interaction sums effectively to monophony; although the spatial polyphony is lost, other musical elements remain essentially intact. The aesthetic arises naturally from the act of mixing on the median plane with panning-by-amplitude as the primary determinant of localization.

The aesthetic is comparatively simple in its realization in pop music: lead parts, kick-drum and electric bass are (A+B), which is to say that they are centered in the monaural phantom image between speakers, and virtually everything else is either (A-B) or (A,B) in a balanced array, for the purposes of either "framing" or "answering" the (A+B) part(s). Ambience is generally and most effectively transmitted in the (A-B) realm and is used as a textural agent in the recording production. There is no simulation of the environment of a live performance, no attempt to realistically invoke the illusion of such an environment. The A+B/A-B aesthetic results in a distinct musical character, one that is only heard via loudspeakers playing back recordings. It is not heard in acoustic or reinforced live performance, nor is there any attempt to simulate it in those venues. It is, simply, a particular and unique characteristic of music intended for playback over loudspeakers.

A note about spectra of these various components: for reasons originally having to do with limitations of the stylus movement in the record groove, low frequency (A-B) signals have been avoided in popular multitrack recording. Further, as localization of acoustical sources in space is done in significant part by simultaneous intra-ear analysis of pinna effects at the basilar membrane,[9] the use of high-frequency (A-B) and (A,B) spectra for spatial localization has proved perfectly appropriate and effective, while at the same time being technically acceptable for stereophonic record production. The "preferred" low frequency spatial information[10,11] that normally leads to our sense of sonic space (as opposed to source) has


been lost and/or not simulated, and sense of space is transmitted primarily via reverberant wash. Therefore, a spectral differentiation between (A+B) and the other elements can be heard. Low frequency content is generally restricted to (A+B), while (A-B) and (A,B) components are particularly rich in spectra above 2500 Hertz.

Incidentally, study of the (A-B) component of a stereophonic recording (by inversion of the [B] component prior to summation to mono) leads to many quite useful insights about the nature of the recording methods and style of any given producer. First suggested to me by Bob Ludwig, this study is highly recommended to any serious student of music recording, particularly as it applies to popular multitrack recordings [12].

5. THE LISTENING ENVIRONMENTS
It is unreasonable to attempt any assessment of style and usage of an art and its medium without also a consideration of the audience(s) for that art. The audiences for recorded music are extremely diverse. However, it is possible to make some generalizations about the audience for so-called popular music, and about the conditions under which they listen to music.

The audience for popular music is comparatively young (and/or nostalgic for their youth as a function of their listening to music). Their experience with music is predominantly through recording and broadcast, and involves little, if any, experience with acoustic performances of music. The range of listening systems/environments which they use is broad and diverse. The various types of environments may all be regarded as "typical," which leads to the paradoxical and problematic notion that "ideal" recordings have characteristics that are "ideal" for all of these environments. The general group of listening environments includes the following:

- The Living Room: typically with two loudspeakers and a viable median plane, possibly integrated with a monaural television. The typical residential living room is the basis for the IEC listening room. The useful range of Sound Pressure Levels is from ca. 40 to ca. 100 dB SPL, with a "normal" range probably being 55-90 dB SPL. Extremes of the range are represented on one hand by the "high-end audiophile" environment and the "compact" stereo in a small apartment on the other.
- The Kitchen/Bedroom: characterized by monaural table radio/TV. No stereo is available and no median plane is possible. There is a useful dynamic range from 40 to 75 dB Sound Pressure Level.
- Headphones: characterized by wide dynamic range (up to 70 dB in the ear), broad bandwidth and a constant median plane. Headphones yield comparatively strong stereophonic illusions and significant audible differentiation between recordings of stereophonic and monophonic origin.
- The Automobile: characterized by the presence of stereo (often with multiple pairs of speakers, all off-axis to the listeners), no median plane possible, very limited dynamic range (10-30 decibels maximum) combined with extremely high background noise levels (60-100 dB SPL).
- The Boom-box: a portable stereo playback system designed for nearfield individual listening. Usually includes a "stereo-enhance" switch which reverses the polarity of one of the speakers for greater illusion of space in the playback, at the sacrifice of a stable monaural phantom image. The ambient noise levels are the same as the range of the levels of the environment itself.

The environments mentioned above are sufficiently different that they are almost incapable of being reconciled, with the automobile, the headphone and the table radio presenting stereophonic experiences that are mutually exclusive. Clearly, any music that is to be successfully enjoyed in all of these environments must:
(a) be comparatively simple spatially and texturally (which is not to imply either crudeness or lack of musical sensibility),
(b) have limited note-to-note dynamic range,
(c) have a low noise floor,
(d) have exaggerated spatial characteristics that do not become significantly degraded timbrally when summed to mono, and
(e) not be dependent on the outer octaves (20-80 Hz and 10-20 kHz) of the audible spectrum for musical effect.

Popular music generally fits this template, by economic necessity. The recording practices currently in use support such a template, and it is not reasonable to assume any narrower range of end-user environments for the purposes of assessing the desirability of any recording craft, technique, suitability, or realism, except when specifically considering either a "niche" style of recording (such as an audiophile recording) or a "niche" style of music (such as heavy-metal). The creators of music for mass distribution can certainly be expected to continue to create music which remains broadly suitable for these diverse environments, while perhaps not ideal for any of them.

6. THE NATURE OF THE MUSICAL VOICE
Music is not a waveform, but a psychological and spiritual construct within the mind. The waveform and its physical dimensions are simply carriers of musical information, while music itself lies within higher realms of processing in the mind: the processing of patterns, the mindplay of neural templates and associations, and the mysteries underlying our emotional, spiritual and physical responses [13]. This is true for all music, including that generated by loudspeakers, and it may be that the best production goal for loudspeaker music is to create sound artifacts that are as rich in the patterns of musical performance as are musical sounds created in the acoustical realm.

The basic elements of sound are the patterns of frequencies, spectra and amplitude that we traditionally associate with waveform. The higher-level musical elements are the patterns of change of these basic elements. It is these patterns of change that our neurological system seems particularly well-suited to observe and enjoy and that are central to the idea of musicality. Another way of putting it is to say that musical expression resides primarily in the transitions between notes, the differences between the waveforms, and the dynamic ebb and flow of intensity and spectrum, rather than in the notes, waveforms, spectra and intensities themselves.

At both conscious and pre-conscious levels, our enjoyment of music is associated with our observation of and mental play


with patterns of variation in time, frequency and localization. This enjoyment is derived from both the obvious and the inexplicable audible realities, and the tension between them. We enjoy the sound of the musical ensemble, for instance, partly because we don't hear the many separate voices, but one greater voice that transcends the quality of individual voices. Misdirection, timbral ambiguity, phantom sources, and meta-instruments are all normal elements in the craft of traditional, acoustical orchestration. By the same token, we enjoy the mysteries of illusion associated with the (A-B) stereophonic artifact and the illusions of stereophonic acoustical realities that differ from the physical and visual ones that actually surround us. Our enjoyment is also based upon the predictable and unpredictable patterns of change in micro-, intermediate- and macro-_s of time: change in shapes of the dynamic elements of notes (attacks, releases, etc.), change in shapes of loudness and timbre of individual notes, and change in patterns of shapes of loudness and timbre of patterns of notes [14, 15]. We respond to the implied speech and motor templates such patterns evoke, and respond to the tensions that emerge between those templates and the image templates evoked by lyrics and other associative properties of the musical setting.

One of the most successful aspects of acoustical-source sound recording has been its ability to accurately recreate the emotive patterns generated by the recorded performers and to thereby transmit something of the emotional intensity inherent in the original performance. A corresponding deficiency of sound recording to date arises from the failure of loudspeakers to mimic the way musical instruments interact with the performance space, thereby reducing the impact on the listener related to the perception of that interaction of source and room. This deficiency is particularly evident in multitrack recording of popular music.

6A. THE MUSICAL VOICE IN THE ACOUSTICAL REALM
In the acoustical realm, the sound of music arrives at our ears via multiple paths with a complexity that is, when confronted with the conventional tools of physical acoustical measurement, both bewildering and baffling. As Arthur Benade has noted, the musical source is coherent, our perception of that source is coherent, but the intervening transmission path is chaotic [16]. It is not clear, at this time, how we reconstruct the musicality of the source into such a clear perceptual construct from the chaotic transmission path, but it is axiomatic that we do. Benade further observes, and general observation of musicians supports this idea, that the complexity of the transmission path seems to enhance, not confuse, our perception of the acoustical musical sound [17]. We take for granted the acoustical reality of our musical perception. Processes leading to the precedence effect integrate our perception of room reflections with our perception of the direct sound, so that most of what we hear appears to be direct sound (even though the intensity of the reflected sounds is usually equal to or greater than the intensity of the direct sound), and the early reflections themselves are not apparent to us. Reverberance is perceived as a wash of sound following the direct sound, and we tend to define "good" and "bad" acoustics based upon the perceived character of this wash, rather than in response to the ensemble of heard but unnoticed early reflections.

Listening to music via loudspeakers presents some interesting issues in regard to this process. The high-frequency emissions from loudspeakers tend to be quite directional, and our intuitive design tendency has been to suppress reverberant paths in the playback space as well. Further, recording tradition has emphasized pickup of only direct sound in the recording studio (the microphone cannot make the use of the multiple paths that the human hearing mechanism does, and its summation of the information of those multiple paths leads to a subsequent degradation of timbre, due to the quite audible interference effects resulting in comb-filtering). The result is that musical sound from a loudspeaker usually lacks the multiple, reflected paths of a typical acoustical musical sound to the ear and suffers in comparison with such a sound due to its one-dimensional singularity. I speculate that a large part of the reason for the commercial success of stereophony lies in its partial recreation of the multiplicity of paths inherent in the acoustical sound, and consequent enhanced enjoyment of the musical artifact patterns that such multiplicity seems to provide.

When we artificially invoke the precedence effect with the use of short delays (less than 50 milliseconds), thereby simulating the presence of multiple paths, it becomes immediately apparent that the presence of the delayed (reflected path) artifact has a large impact on the subjective quality of the sound. Working with synthesizers and delay lines quickly leads to the insight that synthesizer sounds "come to life" with the addition of early delays, particularly when the delays have different localization cues than the undelayed sound (thus mimicking the directional behavior of direct and reflected sounds in the acoustical realm), and subsequent removal of such delays causes the sound to appear flat and one-dimensional, without body or depth. (Interestingly, synthesists almost invariably work with reverberance added, because, they will tell you, the dry sounds are simply too dull to put up with for the hours that are required for most synthesis work.)

6B. THE MUSICAL VOICE IN THE ELECTRICAL REALM
In contrast to the acoustical source, the electrical audio waveform is singular, and involves no ensemble of complex, phase-locked iterations such as we encounter in acoustical space. The conceptual substitution of this waveform for its complex and multi-faceted acoustical counterpart is a primary feature of multitrack recording. This has led to the practices discussed in this paper: the in-series manipulation of the waveform by equalization and compression/expansion in order to manipulate timbre, and the parallel branches of delay and reverberation, which are clearly an attempt to reintroduce the interest caused by the multiplicity of signal paths in the acoustical realm. In general, little effort has been expended on the development of a more complex electrical signal model for recording production, except in the area of binaural sound and for a variety of post-production illusion enhancement processes (such as the various quadraphonic matrices and more recent commercial processes). Synthesis development continues to explore the development of singular algorithms and


samples, and the evolution of consoles, signal processors and microphones seems inextricably wedded to the notion of music-as-single-waveform.

Much of this has to do with the need to minimize data requirements. Audio is, thanks to the extreme dynamic range of our hearing, data-hungry. The great promise of multitrack recording and synthesis (to give the composer/producer greater control of and facility with music at reduced cost) has been traditionally limited by data limitations (manifesting themselves as track number limits and clearly audible bandwidth limitations and distortion/noise thresholds). Development efforts have been expended primarily upon improving control and facility, and reducing data needs through time-sharing, the substitution of control algorithms and samples for raw data, and simplified signal processing modes of operation (particularly compression, expansion and reverberance) that provide at least a rudimentary and sometimes a sophisticated mimicry of the more complex acoustical reality. At the present time, the notion of a multiple-iteration version of each musical voice (thus replicating a basic acoustical reverberant behavior) is theoretical.

7. CONCLUSIONS
The production of popular music via multitrack recording represents by far the great majority of all music recording production done in the world today (perhaps 90% of it). Over the forty years since the advent of multitrack technology, a style of multitrack recording has come into being that is distinct, particularly in its stereophonic realization, from the styles of recording employed in the recording of acoustical musical events. As this paper has described, this style is in large part a response to comparatively simple models for the musical signal, the demands of an extremely wide range of playback environments, and a production environment that emphasizes qualities that are suited to the configuration of the production hardware but not particularly relevant to the end-user's playback environment(s). This has led to the evolved aesthetic process described earlier. We are at a point, however, where both changes in the hardware system and a growing body of knowledge and historical perspective have led us to begin to institute changes in the ways we create this music.

Central to this change is the acknowledgement of the importance of pattern-change in musical aesthetics. The development of more sophisticated and musical algorithms for sequence manipulation and control and for note-to-note timbral control in synthesis (to permit the comparatively easy integration of expressive nuance into a group of notes) is beginning to occur in the realm of music synthesis, and there are a wide variety of ambience enhancement devices of all types and at all prices in de facto acknowledgement of the need for sonic complexity in our musical signal generation.

In the hardware realm, increasing availability of memory for data and speed for control algorithms are relaxing the constraints we have faced more quickly than we can assimilate the new capabilities they present. Conceptually, we can now reconsider some of our technical notions regarding the nature of music. For instance, the representation of a recorded or synthesized musical sound as a seven-iteration construct (each with its own spectrum and angle-of-arrival spectral signature), to mimic what happens in a conventional room, may prove to be aesthetically and aurally very satisfying and it certainly is a significantly more robust representation of acoustical reality than is our current electronic representation of a sound. The synthesis of music via the generation and implementation of templates of patterns of change related to specific emotional states, gestures and language, overlaid on values of frequency, amplitude and spectrum, may prove to be far more palpable, moving and effective than our present synthesis modalities.

Toward this end, more complex microphone, signal-handling and signal reproduction systems might be called for, although I expect two-channel stereo to remain the playback medium of choice, and (more likely) we will develop elaborate signal-processing algorithms functioning globally in a console to simulate such systems effectively. Possibilities include the ideas of Middle-Side and Seven-Path Model Synthesis and global reverberators (consoles?) whose architecture permits the simulation of complex acoustical spaces with the insertion of multitrack musical voices at differentiated points within those virtual spaces. At another level, simulation and insertion of psychoacoustic cues to invoke auditory illusions (such as pinna reflection simulation to invoke intra-aural localization, and non-linearity simulation to invoke pseudo-loudness) are technologically in development and limited use now, and almost certainly will be used broadly and freely as various correlations between psychoacoustic stimuli and responses are more fully established and their windows of effectiveness defined.

Philosophically, this moves away from the original archival notion of accuracy in sound recording, and toward the notion of intensity of illusion. Such a direction is certainly in keeping with the aesthetic propensities of popular music, and it probably represents a more sensible overall goal artistically as well: to produce the most powerful, affecting and compelling musical art that we can.

ACKNOWLEDGEMENTS
The author would like to thank Dr. Robert Myers, Associate Dean of Curriculum, and Dr. Thomas Rhea, Associate Professor of Music Synthesis, both at Berklee College of Music, Boston, MA, for their invaluable assistance in the preparation of this paper.

REFERENCES
1. Davis, Don & Davis, Chips, The LEDE® Concept for the Control of Acoustic and Psychoacoustic Parameters in Recording Control Rooms, Journal of the Audio Engineering Society 28:9, Sept. 1986, pp. 585-595.
2. Muncy, Neil, private conversations, 1986-89.
3. Druckman, Jacob, lectures in orchestration (cf. Berlioz, Stravinsky, Piston), Juilliard School of Music, New York, NY, 1963-6.
4. Rhea, Tom, private conversation, 1990.
5. Moulton, David, The Perception of Stereophonic Artifacts in a Reverberant Space, lecture presented to the Fredonia section of the AES, Fredonia, NY, April 20, 1989. Unpublished.
6. Moulton, Ferralli, Hebrock and Pezzo, Localization


of Phantom Images in an Omnidirectional Stereophonic Loudspeaker System, The, presented at the Audio Engineering Society 81st Convention, November, 1986, Los Angeles, CA.
7. Blauert, Jens (tr. Allen), Spatial Hearing, MIT Press, Cambridge, MA, 1974, 1983, pp. 204-213.
8. Moulton, David, Miscellaneous Notes in Support of Critical Listening and Audio Ear Training Presentations, pp. 7-10. National Public Radio Training, Washington, DC, 1990.
9. Wright, Hebrank & Wilson, Pinna reflections as cues for localization, Journal of the Acoustical Society of America 56:3, 1974, p. 957.
10. Schroeder, Gottlob & Siebrasse, Comparative study of European concert halls: correlation of subjective preference with geometric and acoustic parameters, Journal of the Acoustical Society of America 56:4, 1974, p. 1195ff.
11. Griesinger, David, Spaciousness and Localization in Listening Rooms and Their Effects on the Recording Technique, Journal of the Audio Engineering Society 34:4, 1986, pp. 255-268.
12. Ludwig, Robert, private conversation, 1985.
13. Roederer, Juan G., Introduction to the Physics and Psychophysics of Music, 2nd Ed., Springer Verlag, NYC, 1973, 1979, pp. 11-12, 161-170.
14. Meyer, Leonard B., Emotion & Meaning in Music, The University of Chicago Press, Chicago, IL, 1956, pp. 22-43.
15. Roederer, Juan G., pp. 3-5.
16. Benade, A.H., From Instrument to Ear in a Room: Direct or via Recording, Journal of the Audio Engineering Society 33:4, April, 1985, pp. 218-9.
17. Benade, A.H., p. 226.
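
ILLUSTRATIVE NOTE ON THE (A+B) AND (A-B) COMPONENTS
Section 4 recommends studying the (A-B) component of a stereophonic recording by inverting the B channel before summing to mono. The following is a minimal sketch of that arithmetic, offered as an illustration and not as part of the original paper. The function name sum_and_difference and the four-sample test signal are hypothetical; a real study would read the two channels from a recording and would normally rescale the results to avoid clipping.

```python
def sum_and_difference(left, right):
    """Return the (A+B) and (A-B) signals for two equal-length channels."""
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    # (A+B): the ordinary mono sum. Center-panned material reinforces here.
    a_plus_b = [a + b for a, b in zip(left, right)]
    # (A-B): invert the B channel, then sum to mono.
    # Material that is identical in both channels cancels out.
    a_minus_b = [a + (-b) for a, b in zip(left, right)]
    return a_plus_b, a_minus_b


# Hypothetical test signal: a center-panned "vocal" plus a hard-left "guitar".
vocal = [0.5, -0.25, 0.125, 0.0]
guitar = [0.25, 0.25, -0.5, 0.125]
left = [v + g for v, g in zip(vocal, guitar)]
right = list(vocal)

a_plus_b, a_minus_b = sum_and_difference(left, right)
print("A+B:", a_plus_b)
print("A-B:", a_minus_b)
```

In this toy case the center-panned voice cancels completely in (A-B), leaving only the hard-panned part. This is why listening to the difference signal exposes how a producer has distributed ambience and panned elements around the phantom center.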
