Journal of the Audio Engineering Society, Volume 52, Number 4 (April 2004)
In this issue...
Interpreting Multitone Distortion Spectrum
Compensating for Lossy Voice Coils
Database Searching to Identify Audio
Pitch Detection with Percussive Background

Features...
25th Conference, London—Preview
Technology Overview of Loudspeakers
DSP in Loudspeakers
Surround Live Summary
26th Conference, Baarn—Call for Papers
AUDIO ENGINEERING SOCIETY, INC.
INTERNATIONAL HEADQUARTERS
60 East 42nd Street, Room 2520, New York, NY 10165-2520, USA
Tel: +1 212 661 8528
Fax: +1 212 682 0477
E-mail: HQ@aes.org
Internet: http://www.aes.org

ADMINISTRATION
Roger K. Furness, Executive Director
Sandra J. Requa, Executive Assistant to the Executive Director

OFFICERS 2003/2004
Ronald Streicher, President
Theresa Leonard, President-Elect
Kees A. Immink, Past President
Jim Anderson, Vice President, Eastern Region, USA/Canada
Frank Wells, Vice President, Central Region, USA/Canada
Bob Moses, Vice President, Western Region, USA/Canada
Søren Bech, Vice President, Northern Region, Europe
Bozena Kostek, Vice President, Central Region, Europe
Ivan Stamac, Vice President, Southern Region, Europe
Mercedes Onorato, Vice President, Latin American Region
Neville Thiele, Vice President, International Region
Han Tendeloo, Secretary
Marshall Buck, Treasurer

GOVERNORS
Jerry Bruck, Curtis Hoyt, Garry Margolis, Roy Pritts, Don Puluse, Richard Small, Peter Swarte, Kunimaro Tanaka

COMMITTEES
Awards: Garry Margolis, Chair
Conference Policy: Søren Bech, Chair
Convention Policy & Finance: Marshall Buck, Chair
Education: Theresa Leonard, Chair
Future Directions: Ron Streicher, Chair
Historical: J. G. (Jay) McKnight, Chair; Irving Joel, Vice Chair; Donald J. Plunkett, Chair Emeritus
Laws & Resolutions: Theresa Leonard, Chair
Membership/Admissions: Francis Rumsey, Chair
Nominations: Kees A. Immink, Chair
Publications Policy: Richard H. Small, Chair
Regions and Sections: Subir Pramanik and Roy Pritts, Cochairs
Standards: Richard Chalmers, Chair
Tellers: Christopher V. Freitag, Chair

TECHNICAL COUNCIL
Wieslaw V. Woszczyk, Chair
Jürgen Herre and Robert Schulein, Vice Chairs

TECHNICAL COMMITTEES
Acoustics & Sound Reinforcement: Mendel Kleiner, Chair; Kurt Graffy, Vice Chair
Archiving, Restoration and Digital Libraries: David Ackerman, Chair
Audio for Games: Martin Wilde, Chair
Audio for Telecommunications: Bob Zurek, Chair; Andrew Bright, Vice Chair
Automotive Audio: Richard S. Stroud, Chair; Tim Nind, Vice Chair
Coding of Audio Signals: James Johnston and Jürgen Herre, Cochairs
High-Resolution Audio: Malcolm Hawksford, Chair; Vicki R. Melchior and Takeo Yamamoto, Vice Chairs
Loudspeakers & Headphones: David Clark, Chair; Juha Backman, Vice Chair
Microphones & Applications: David Josephson, Chair; Wolfgang Niehoff, Vice Chair
Multichannel & Binaural Audio Technologies: Francis Rumsey, Chair; Gunther Theile, Vice Chair
Network Audio Systems: Jeremy Cooperstock, Chair; Robert Rowe and Thomas Sporer, Vice Chairs
Audio Recording & Storage Systems: Derk Reefman, Chair; Kunimaro Tanaka, Vice Chair
Perception & Subjective Evaluation of Audio Signals: Durand Begault, Chair; Søren Bech and Eiichi Miyasaka, Vice Chairs
Semantic Audio Analysis: Mark Sandler, Chair
Signal Processing: Ronald Aarts, Chair; James Johnston and Christoph M. Musialik, Vice Chairs
Studio Practices & Production: George Massenburg, Chair; Alan Parsons, David Smith, and Mick Sawaguchi, Vice Chairs
Transmission & Broadcasting: Stephen Lyman, Chair; Neville Thiele, Vice Chair

STANDARDS COMMITTEE
Richard Chalmers, Chair
Mark Yonge, Secretary, Standards Manager
John Woodgate, Vice Chair
Yoshizo Sohma, Vice Chair, International
Bruce Olson, Vice Chair, Western Hemisphere

SC-02 SUBCOMMITTEE ON DIGITAL AUDIO
Robin Caine, Chair; Robert A. Finger, Vice Chair
Working Groups
SC-02-01 Digital Audio Measurement Techniques: Richard C. Cabot, I. Dennis, M. Keyhl
SC-02-02 Digital Input-Output Interfacing: John Grant, Robert A. Finger
SC-02-05 Synchronization: Robin Caine

SC-03 SUBCOMMITTEE ON THE PRESERVATION AND RESTORATION OF AUDIO RECORDING
Ted Sheldon, Chair; Dietrich Schüller, Vice Chair
Working Groups
SC-03-01 Analog Recording: J. G. McKnight
SC-03-02 Transfer Technologies: Lars Gaustad, Greg Faris
SC-03-04 Storage and Handling of Media: Ted Sheldon, Gerd Cyrener
SC-03-06 Digital Library and Archives Systems: David Ackerman, Ted Sheldon
SC-03-12 Forensic Audio: Tom Owen, M. McDermott, and Eddy Bogh Brixen

SC-04 SUBCOMMITTEE ON ACOUSTICS
Mendel Kleiner, Chair; David Josephson, Vice Chair
Working Groups
SC-04-01 Acoustics and Sound Source Modeling: Richard H. Campbell, Wolfgang Ahnert
SC-04-03 Loudspeaker Modeling and Measurement: David Prince, Neil Harris, Steve Hutt
SC-04-04 Microphone Measurement and Characterization: David Josephson, Jackie Green
SC-04-07 Listening Tests: David Clark, T. Nousaine

SC-05 SUBCOMMITTEE ON INTERCONNECTIONS
Ray Rayburn, Chair; John Woodgate, Vice Chair
Working Groups
SC-05-02 Audio Connectors: Ray Rayburn, Werner Bachmann
SC-05-05 Grounding and EMC Practices: Bruce Olson, Jim Brown

SC-06 SUBCOMMITTEE ON NETWORK AND FILE TRANSFER OF AUDIO
Robin Caine, Chair; Steve Harris, Vice Chair
Working Groups
SC-06-01 Audio-File Transfer and Exchange: Mark Yonge, Brooks Harris
SC-06-02 Audio Applications Using the High Performance Serial Bus (IEEE 1394): John Strawn, Bob Moses
SC-06-04 Internet Audio Delivery System: Karlheinz Brandenburg
SC-06-06 Audio Metadata: C. Chambers

Correspondence to AES officers and committee chairs should be addressed to them at the society's international headquarters.
PAPERS
Graphing, Interpretation, and Comparison of Results of Loudspeaker Nonlinear Distortion
Measurements........Alexander Voishvillo, Alexander Terekhov, Eugene Czerwinski, and Sergei Alexandrov 332
For loudspeaker nonlinearity, measurement techniques range from single-tone harmonic distortion, which
is easy to interpret but not indicative of performance with music, to reactions to multitone stimuli, which
are hard to interpret but highly informative. Because multitone techniques have the potential to predict the
perception of nonlinearities, the authors focus on various presentation formats and analysis techniques to
make the relevant information in the thousands of intermodulation products accessible and meaningful.
ENGINEERING REPORTS
Impedance Compensation Networks for the Lossy Voice-Coil Inductance of Loudspeaker
Drivers....................................................................................................................W. Marshall Leach, Jr. 358
The high-frequency rise in the voice-coil impedance of a loudspeaker driver produced by lossy voice-coil
inductance can be approximately cancelled by a Zobel network connected in parallel. Such networks
improve performance by presenting purely resistive impedance to the crossover network. Although higher
order networks can be used, a pair of resistors and capacitors is sufficient for typical drivers.
Scalable, Content-Based Audio Identification by Multiple Independent Psychoacoustic
Matching .............................................................................Geoff R. Schmidt and Matthew K. Belmonte 366
A software audio search system, as an analog to text searching, allows a target music sample to be
identified by matching it to a database containing an inventory of reference samples. Rather than rely on
autonomous metadata, the algorithm uses a sequence of vectors based on perceptual attributes. By
iteratively testing a progression of such vectors, the algorithm can trade accuracy against
computation time. With storage capacities growing to hold virtually unlimited quantities of audio
data, an efficient search algorithm is a necessity.
On the Detection of Melodic Pitch in a Percussive Background .........Preeti Rao and Saurabh Shandilya 378
Although many pitch detection algorithms have been proposed over the years, the problem is particularly
difficult when melodic instruments are accompanied by percussive background. The authors propose a
temporal autocorrelation pitch detector motivated by an auditory model that attempts to suppress errors
produced by inharmonic interfering partials of such instruments as a kick drum. Separate processing
of frequency channels proved crucial in reducing the distortion products between the signal
harmonics and the interfering partials that arise in the nonlinear hair-cell model.
FEATURES
25th Conference Preview, London ......................................................................................................... 402
Calendar.............................................................................................................................................. 404
Program .............................................................................................................................................. 405
Registration Form .............................................................................................................................. 411
Historical Perspectives and Technology Overview of Loudspeakers for Sound Reinforcement
............................................................................................................................J. Eargle and M. Gander 412
DSP in Loudspeakers.............................................................................................................................. 434
Surround Live Summary ...............................................................................................Frederick Ampel 440
26th Conference, Baarn, Call for Papers ............................................................................................... 457
DEPARTMENTS
Review of Acoustical Patents ................................ 395
News of the Sections ........................................ 443
Upcoming Meetings ........................................... 447
Sound Track ................................................. 448
New Products and Developments ............................... 449
Available Literature ........................................ 450
Membership Information ...................................... 451
Advertiser Internet Directory ............................... 453
In Memoriam ................................................. 456
Sections Contacts Directory ................................. 458
AES Conventions and Conferences ............................. 464
Graphing, Interpretation, and Comparison of Results of Loudspeaker Nonlinear Distortion Measurements

ALEXANDER VOISHVILLO, ALEXANDER TEREKHOV, EUGENE CZERWINSKI, AND SERGEI ALEXANDROV
Harmonic distortion and total harmonic distortion may not convey sufficient information about nonlinearity in loudspeakers and horn drivers to judge their perceptual acceptability. Multitone stimuli and Gaussian noise produce a more informative nonlinear response. The reaction to Gaussian noise can be transformed into coherence or incoherence functions. These functions provide information about nonlinearity in the form of "easy-to-grasp" frequency-dependent curves. Alternatively, a multitone stimulus generates a variety of "visible" harmonic and intermodulation spectral components. If the number of input tones is significant, the nonlinear reaction may consist of hundreds, if not thousands, of distortion spectral components. The results of such measurements are difficult to interpret, compare, and overlay. A new method of depicting the results of multitone measurements has been developed. The measurement result is a single, continuous, frequency-dependent curve that takes into account the level of the distortion products and their "density." The curves can be easily overlaid and compared. Future developments of this new method may lead to a correlation between curves of the level of distortion and the audibility of nonlinear distortion. Using nonlinear dynamic loudspeaker models, multitone and Gaussian noise test signals are compared with traditional and nontraditional measurement techniques. The relationship between harmonics and intermodulation products in static and dynamic nonlinear systems is analyzed.
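The incoherence idea mentioned in the abstract can be sketched numerically. In the toy model below, the gain, the quadratic coefficient, and all signal parameters are arbitrary illustrative choices (not values from this paper); SciPy's Welch-averaged coherence estimator serves as the analysis tool:

```python
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(0)

def incoherence(system, n=2**16, fs=48000.0, nperseg=1024):
    """Estimate the frequency-averaged incoherence 1 - gamma^2(f)
    between a Gaussian-noise input and the system output."""
    x = rng.standard_normal(n)
    y = system(x)
    _, cxy = coherence(x, y, fs=fs, nperseg=nperseg)
    return 1.0 - cxy.mean()

linear = lambda x: 2.0 * x            # distortion-free gain
quadratic = lambda x: x + 0.2 * x**2  # weak static nonlinearity

i_lin = incoherence(linear)
i_nl = incoherence(quadratic)
print(f"incoherence, linear: {i_lin:.4f}  quadratic: {i_nl:.4f}")
```

For the distortion-free system the estimated coherence stays at unity, so the incoherence is near zero; the quadratic term routes part of the output power into components uncorrelated with the input and raises the incoherence across the band.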
never been popular in the assessment of loudspeaker nonlinearity. The multitone stimulus has been gaining popularity in many applications during the last decade [7]–[12]. Various aspects of using multitone signals in loudspeaker testing will be discussed in Section 3. The multitone stimulus produces a rich spectrum of distortion products. The statistical distribution and crest factor of multitone signals are close to those of a musical signal. However, the results of a measurement presented in the form of an output spectrum are difficult to interpret, compare, and overlay.

A new method of depicting the results of multitone measurements has been developed in this work. According to this method, the result of the measurement is presented as a single, continuous, frequency-dependent curve that takes into account not only the distortion level of the spectral components but also their "density." Many such curves, corresponding to different levels of input signal, can be overlaid easily in a way that is practically impossible using "unprocessed" responses to the multitone stimulus. These two- or three-dimensional graphs can easily demonstrate how the overall nonlinear distortion in a measured device (loudspeaker or horn driver) increases with the level of the input multitone stimulus.

In the technical publications of previous years other approaches to measure, model, and assess loudspeaker nonlinearity have been discussed. Some of these methods have not been included in the existing standards and probably never will be. However, a comparative survey helps us to look at the problem of loudspeaker nonlinearity measurements from a systematic standpoint and to understand better their meaning, advantages, and limitations. The traditional and nontraditional methods that will be compared to the methods based on the application of Gaussian noise (incoherence function) and the multitone stimulus are reviewed hereafter.

One of the unconventional methods is the measurement and graphing of high-order frequency response functions (HFRFs) derived from the second- and third-order Volterra time-domain kernels. The HFRFs are three-dimensional graphs representing a "surface" of distortion products. Volterra series expansion stems from the fundamental theoretical input of Volterra [13] and was introduced by Wiener for the analysis of weakly nonlinear systems, characterized by low levels of distortion (see [14] for a history of this subject). Since then Volterra series expansion has been used widely in many areas where the structure of weakly nonlinear systems is not known, or the parametric analysis of their behavior is too complicated. Kaizer pioneered the use of Volterra series in loudspeaker nonlinear analysis. He derived explicit expressions for HFRFs through loudspeaker excursion-dependent parameters [15]. Kaizer's research was followed by a number of works (for example, [16]–[18]), where the second- and third-order kernels were measured, transformed into corresponding HFRFs, and then plotted as three-dimensional graphs depicting loudspeaker second- and third-order distortions. Advantages and drawbacks of Volterra series expansion will be discussed in Section 3.

Two-tone intermodulation distortion of the second and third order has traditionally been measured to assess nonlinearity in audio equipment since two methods were introduced in the 1940s by Hilliard [19] (SMPTE intermodulation distortion) and Scott [20] (difference-frequency or CCIF distortion). The former uses one sweeping tone and one stationary low-frequency tone of four times higher amplitude. The latter method uses two closely spaced, simultaneously swept tones. Products of the kind P(f2±f1) and P(f2±2f1) are plotted. The frequency f1 corresponds to the fixed tone in the SMPTE method or to the lower frequency tone in the CCIF method, and f2 is the frequency of the higher sweeping tone. Neither method measures intermodulation products of order higher than three.

For the measurement of loudspeaker nonlinearity AES standard AES2-1984 [21] recommends a measurement of only the second- and third-order harmonic distortion. IEC standard 60268-5 [22] recommends that a wider set of characteristics be measured, including THD, individual second and third harmonics, and individual second-order difference intermodulation products. In addition this standard recommends aggregated criteria of sound pressure level (SPL) intermodulation of the form (P(f2+f1) + P(f2−f1))/P(f2) and (P(f2+2f1) + P(f2−2f1))/P(f2), with f2 >> f1, where the sum and difference products of similar order (second or third only) are summed and related to one of the two primary tones.

Alternative approaches to measure loudspeaker intermodulation distortion were proposed by Keele [23]. Keele recommends two methods for consideration, one based on the use of a two-tone signal, 40 and 400 Hz, of equal amplitude. The percentage of distortion is to be plotted as a function of input power. The other method includes a fixed-frequency upper range signal coupled with a swept bass signal. Keele also advocates the use of the shaped tone burst for the assessment of loudspeaker maximum SPL [24].

These various methods and signals provide different information about the nonlinearity in a measured loudspeaker. Nevertheless, the following questions remain open: "Which method conveys the most adequate information about nonlinearity in a measured loudspeaker?", "How well are the measurement data related to the perceived deterioration of sound quality or to the malfunctioning of a loudspeaker?", and "How can these data be represented in the most comprehensible manner?"

This work is intended to illustrate and compare several methods of assessment and graphical presentation of weak nonlinearity in loudspeakers. The comparison is carried out using a nonlinear dynamic model of a low-frequency loudspeaker that includes excursion-dependent parameters: Bl product, suspension stiffness, voice-coil inductance, parainductance, eddy-current–caused resistance, and voice-coil current-dependent magnetic flux modulation. The models of three different woofers are used for comparison: 8-in (203-mm) diaphragm with a long voice coil, 8-in (203-mm) diaphragm with a short voice coil, and 12-in (305-mm) diaphragm with a long voice coil. The measurement results are simulated at different signal levels. A comparison is made of THD, harmonic distortion, Volterra second-order frequency-domain kernels, also called high-order frequency response functions (HOFRFs), two-tone sum and difference intermodulation distortion, two-tone total nonlinear distortion,
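The aggregated IEC-style criterion quoted above can be computed directly from the output spectrum. A minimal sketch, assuming a hypothetical weakly nonlinear device modeled as y = x + 0.05 x^2 and an illustrative 50-Hz/3-kHz tone pair (none of these values come from the standard):

```python
import numpy as np

fs = 48000
n = fs                              # one-second record -> 1-Hz bin spacing
t = np.arange(n) / fs
f1, f2 = 50.0, 3000.0               # f2 >> f1, illustrative values only
x = np.sin(2*np.pi*f1*t) + np.sin(2*np.pi*f2*t)
y = x + 0.05 * x**2                 # hypothetical weakly nonlinear device

spec = np.abs(np.fft.rfft(y)) / (n / 2)     # peak amplitudes per bin
amp = lambda f: spec[int(round(f))]          # 1-Hz bins: index == frequency

# IEC 60268-5-style second-order modulation distortion ratio
d2 = (amp(f2 + f1) + amp(f2 - f1)) / amp(f2)
print(f"second-order intermodulation distortion: {100*d2:.1f}%")
```

With a pure quadratic term each second-order sideband has amplitude 0.05, so the ratio evaluates to about 10%; choosing a one-second record at 48 kHz puts every tone exactly on a 1-Hz FFT bin and avoids leakage.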
J. Audio Eng. Soc., Vol. 52, No. 4, April 2004 333
VOISHVILLO ET AL. PAPERS
multitone intermodulation and multitone total nonlinear distortion (MTND), and incoherence function.

1 TEST SUBJECT, TEST RESULT PRESENTATION, AND GOAL

The possible uncertainty in the objective assessment of nonlinear systems by traditional testing methods stems from the complex nature of nonlinear systems. A linear time-invariant system with a single input and output is fully described by its impulse response (or by its complex transfer function). The output signal can be calculated by the convolution of the input signal with the impulse response (in the time domain), or by multiplication of the input spectrum by the complex transfer function (in the frequency domain). In addition, the relationship between an input and an output signal of a linear system can also be expressed in the form of a linear differential equation, or a linear difference equation if the system is discrete. Simply speaking, a linear system does not add new frequency components to the output signal.

The behavior of a nonlinear system is substantially more complex. Traditional methods used for the analysis of linear systems are not applicable to the analysis of even weakly nonlinear systems. The properties of such systems can be described in the time domain by the sum of Volterra kernels [14]. The latter are essentially the impulse responses responsible for the transformation of the input signal by nonlinearities of different orders. The overall impulse response of a weakly nonlinear system is the sum of kernels of different orders that are multidimensional functions of time. For example, the impulse response of a simple nonlinear system characterized by a second-order dynamic weak nonlinearity is the sum of the kernel of the first order (which is essentially the linear impulse response) and a second-order kernel. The latter can be presented graphically as a three-dimensional surface with two horizontal time scales.

The output of such a system can be expressed as the convolution of an input signal with the first- and second-order kernels. This convolution is expressed in general by multiple integrals. The multidimensionality is also valid for a frequency-domain complex transfer function of nonlinear systems. The amplitude and phase frequency responses of a second-order distortion are also three-dimensional surfaces having two horizontal frequency scales. The second harmonic distortion response (amplitude and phase) is merely a diagonal "cut" through these two surfaces. Similarly, the impulse response of the second harmonic is merely a diagonal cut across the surface of the three-dimensional kernel of the second order [1]. It is obvious that neither the frequency response of the second harmonic nor its impulse response legitimately represents the entire second-order nonlinear response of a weakly nonlinear dynamic system. These cuts may not correspond to the maxima of the distortion surface. Using only harmonic distortion may cause mistakes in the assessment of the nonlinearity. Therefore a search for a correlation between the audibility of nonlinearly distorted musical signals and the level of harmonic distortion can lead to wrong conclusions. The present example considers the three-dimensional representation of the second-order nonlinearity. The responses of the higher order nonlinearities are multidimensional functions. A real dynamic nonlinearity existing in loudspeakers and horn drivers is significantly more complex than this simple example.

Imagine a hypothetical loudspeaker whose amplitude frequency response is presented by only a few samples at a few frequencies. If there is no information about the behavior of the amplitude frequency response between these sparse samples, we cannot make a judgment about its performance. The response of this loudspeaker might be perfectly flat between the available samples; it might as well have a strong irregularity. Similarly, a single frequency response of nonlinear distortion, be it a harmonic or an intermodulation curve, conveys only limited information about the nonlinearity. If there is no information about a surface of nonlinear responses between the available cuts of harmonic or intermodulation frequency responses, the behavior of the nonlinear system cannot be assessed accurately. This statement is valid for loudspeakers and horn drivers, which are complex, dynamic nonlinear systems with many degrees of freedom and whose nonlinear responses depend strongly, and in a complex manner, on frequency. In amplifiers, for example, the nonlinear characteristics do not exhibit that strong a frequency dependence. Therefore, in their analysis the relationship between harmonic and intermodulation distortions might be more predictable.

The examples with nonlinear distortion in the loudspeakers described in this work will assume weak nonlinearity (distortion products are at least 20–30 dB lower than the fundamental signal). In reality, however, the distortion in loudspeakers and horn drivers can be higher, placing loudspeakers and drivers in the category of strongly nonlinear systems. These systems are characterized by even more sophisticated properties that may include bifurcation and chaotic and stochastic behavior. This class of nonlinear systems will not be considered in the current work.

There is a dilemma in measuring, graphing, and interpreting nonlinear distortion. On the one hand the assessment of nonlinear distortion needs the analysis of much more information than is required to assess a linear system. On the other hand this information should be presented in a simple and comprehensible graphical manner. These two requirements may contradict each other. Furthermore, the graphed data should be pertinent from the standpoint of distortion audibility. The final goal of a loudspeaker nonlinear distortion measurement is to obtain data that convey adequate information about the nonlinearity, so that this information can be related unambiguously to the perceived sound quality of a loudspeaker under test, and so that the performance of different loudspeakers can be compared objectively. The measurement data must be "manageable." In spite of the seeming simplicity of these goals, and a nearly 90-year history of numerous efforts by many researchers (see [1] for a history of the subject), these goals have never been fully achieved.
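The argument about diagonal cuts can be made concrete with a toy kernel magnitude |H2(fa, fb)|. The Gaussian bump below, deliberately centered off the diagonal, is invented for illustration (a real kernel would follow from the driver's excursion-dependent parameters):

```python
import numpy as np

f = np.linspace(0.0, 1000.0, 201)        # frequency grid, Hz (5-Hz steps)
FA, FB = np.meshgrid(f, f)

# Toy second-order kernel magnitude, deliberately peaked off the diagonal
# at (fa, fb) = (200, 800) Hz with a 120-Hz-wide Gaussian bump.
H2 = np.exp(-((FA - 200.0)**2 + (FB - 800.0)**2) / (2 * 120.0**2))

surface_max = H2.max()                   # true peak of the distortion surface
diagonal_max = np.diag(H2).max()         # the second-harmonic "cut" H2(f, f)
print(f"surface max: {surface_max:.3f}  diagonal-cut max: {diagonal_max:.4f}")
```

The second-harmonic curve is the diagonal H2(f, f); here it never rises above roughly 0.002 even though the full surface peaks at 1, which is exactly the situation in which harmonic-only measurements understate the nonlinearity.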
2 PSYCHOACOUSTICAL CONSIDERATIONS

The search for a correlation between an objective measurement of nonlinear distortion and the subjective audibility of nonlinear distortion in audio equipment in general, and in loudspeakers in particular, has always been and remains the Holy Grail of the audio industry. Loudspeaker distortion measurement data related to the perceived sound quality must not only have some readily comprehensible interpretation, but must also be supported psychoacoustically. There must be credible knowledge relating the graphically presented objective data to the subjectively perceived sound quality. Due to the complex nature of the nonlinearity and the intricacy of the human auditory system's reaction to a musical signal adversely affected by the nonlinearity, there are no undisputedly credible and commonly recognized thresholds, expressed in terms of the traditional nonlinear distortion measures, related to the perceived sound quality. The problem is aggravated by the fact that the objective measurement of nonlinearity deals merely with the symptoms of a nonlinear system, that is, with the reactions of a nonlinear system, such as a loudspeaker, to various test signals. Here we operate with objective categories, such as measured levels, responses, characteristics, and parameters. Meanwhile the subjective assessment of musical signals impaired by the nonlinearity deals with human psychoacoustical reactions and impressions expressed in a quite different vernacular, such as "acceptable, annoying, pleasant, or irritating." The objective of a researcher is to build a bridge between these two different domains.

The dynamic reaction of a complex nonlinear system (such as a direct-radiating loudspeaker or a horn driver) to a musical signal cannot be extrapolated from its reaction to a simple test signal such as a mere sweeping tone. Hence the credible thresholds of subjectively perceived nonlinear distortion expressed in terms of the reaction to simple sinusoidal signals (THD, harmonics, or two-tone intermodulation distortion) may not be valid. More complex signals, such as random or pseudorandom noise or a multitone stimulus, are believed (by the authors) to be required in the search for subjectively relevant thresholds.

The complex properties of the human hearing system, which is a far cry from a mere Fourier frequency analyzer, only add complexity to the problem. The behavior of the hearing system is characterized by many effects described in various publications on psychoacoustics (see [25], for example). The properties of the auditory system most relevant to the subject of this work are the intrinsic nonlinearity of the hearing system and temporal and frequency-domain masking. These effects have been treated in detail in the psychoacoustical literature, and it is not the authors' goal to replicate these texts. However, it is worth mentioning that the intrinsic nonlinearity of the human hearing system manifests itself at high levels of sound pressure, whereas masking is a general property of the hearing system, "working" at any level of the sound pressure signal.

Masking plays a crucial role in the perception of nonlinear distortion. The crux of masking is a psychoacoustical suppression of a weaker masked signal by a stronger signal, called the masker. Masking may be observed in the time domain in the form of post- and premasking, when a stronger short-term masker "obliterates" a weaker masked signal, even if the latter precedes the masker. Masking may also occur in the frequency domain, where a stronger masker produces a shadow zone around itself. This shadow psychoacoustically suppresses those masked signals whose spectrum components happen to be within the spectrum and below the level of the masking frequency-domain curve. The masking frequency-domain curve produced by a single tone, for example, resembles a triangle. With an increase in the level of the masker, the triangle becomes asymmetrical, with its longer side stretching toward high frequencies [25]. As the level of the masking tone increases, the asymmetrical masking triangle rises and stretches over a wider frequency range, producing a stronger masking effect above the frequency of the sinusoidal masker than below it (Fig. 1). The masker shown in Fig. 1 corresponds to curve a.

The asymmetrical triangular shape of the masking curve explains why the higher order harmonics and intermodulation products are more audible than the lower order ones, which are more prone to be masked. In Fig. 2 the harmonics and intermodulation products produced by a two-tone signal affected by a static fifth-order nonlinearity are overlaid with the masking curve produced by the two-tone masker. This also explains why the difference intermodulation products are more likely to fall outside the narrower lower side of the masking curve, which makes them more audible.

Fig. 1. Masking curves corresponding to levels a–e of a sinusoidal tone masker. The masker corresponds to curve a.

Fig. 2. Masking effects produced by two closely spaced fundamental tones (maskers) on their harmonics and intermodulation products.
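A crude numerical caricature of the situation in Figs. 1 and 2: model the masking curve as an asymmetrical triangle on a log-frequency axis (the 80- and 20-dB-per-octave skirts, the 90-dB masker level, and all product levels are invented for illustration, not data from the paper) and check which distortion products of a tone pair near 1 kHz rise above it:

```python
import math

def masking_threshold(f, f_m, level_db, lower=80.0, upper=20.0):
    """Crude triangular masking model: the threshold falls off linearly
    in dB per octave away from the masker, with a steep lower skirt and
    a shallow upper skirt. Slope values are illustrative guesses."""
    octaves = abs(math.log2(f / f_m))
    slope = upper if f >= f_m else lower
    return level_db - octaves * slope

# Hypothetical distortion products of a 1000/1100-Hz tone pair; for
# simplicity the pair is treated as a single 1-kHz masker at 90 dB.
products = {
    100: 55.0,    # f2 - f1 difference product, far below the masker
    900: 50.0,    # 2*f1 - f2
    2000: 60.0,   # 2*f1 second harmonic
    2100: 58.0,   # f1 + f2 sum product
}
audible = {f: lvl for f, lvl in products.items()
           if lvl > masking_threshold(f, 1000.0, 90.0)}
print(audible)   # only the 100-Hz difference product escapes masking
```

Consistent with the discussion above, the low-frequency difference product falls far below the steep lower skirt of the masking curve and remains audible, while harmonics and sum products just above the masker are swallowed by the shallow upper skirt.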
In addition, the frequencies of the harmonics have a higher probability of coinciding with overtones of particular musical instruments, and of being masked by these overtones as well. Meanwhile the variety of "dissonant" intermodulation products of various orders do not coincide with the overtones of the musical instruments and, therefore, are more noticeable.

The complexity of nonlinear systems such as loudspeakers and the complexity of the hearing system explain why thresholds of distortion audibility expressed in terms of such plain metrics as harmonic distortion, THD, and two-tone intermodulation distortion depend strongly on the type of musical signal used in the experiments, and why the data obtained by different researchers are often inconsistent. A historical review of the search for a relationship between objective data (expressed in terms of such metrics as THD, harmonic distortion, and two-tone intermodulation distortion) and the subjectively perceived deterioration of the sound quality of reproduced material is given in [1]. Historically many early research works in this area did not have a clear understanding of the complex nature of nonlinear dynamic systems and suffered from a lack of the modern knowledge of the principles of operation of the hearing system. Since then the theory of nonlinear systems and the knowledge of psychoacoustics have progressed enormously. Examples of the use of this progress are the systems of low-bit-rate compression (such as MP3, WMA, ATRAC) that "deceive" the hearing system by deleting significant parts of signal information without a significant deterioration of the perceived sound quality. These low-bit-rate compression systems, evaluated by standard metrics such as THD or two-tone intermodulation distortion, would have exhibited unacceptable levels of distortion, proving that the standard metrics have no im- … popular measure of nonlinear distortion in audio, is not a …

… interpretation of multitone test results does not have well-substantiated psychoacoustical support. So far we cannot derive precise judgments about the sound quality of a loudspeaker (that has been tested by a multitone stimulus) from the response to this signal. However, the results of recent research on the correlation between objective measurements and subjectively perceived nonlinearly distorted speech and musical signals [26] prove that for certain kinds of nonlinearity the postprocessed reaction to a multitone stimulus, expressed as a single number dubbed by the authors of that work the distortion score (DS), has a very high correlation with subjectively perceived sound quality. The distortion score is obtained by the summation of the levels of distortion products within the mean equivalent rectangular bandwidth (ERBN) of the auditory filter, which is conceptually similar to the traditional critical bandwidth but differs in numerical values. It is believed that future experiments with multitone stimuli might lead to further positive results in attempts to find a relationship between the objective measurement data and subjectively perceived sound quality.

3 TESTING METHODS AND INTERPRETATION OF MEASUREMENT DATA

3.1 Relationship between Harmonics and Intermodulation Products—Effects Produced by Static Nonlinearity of Different Orders

Measurements of nonlinear distortion using simple excitation signals may not provide adequate information about the nonlinear properties of a device under test. Even considering a simple form of static nonlinearity, some not immediately obvious effects appear. Let a hypothetical static nonlinear system be governed by the simple polynomial expression

y(t) = Σ_{i=0}^{n} h_i z^i(t)

where z(t) is an input signal, y(t) is an output signal, h0 is …
reliable measure of the psychoacoustically meaningful the dc distortion component, h1 is the linear gain coeffi-
nonlinearity in a loudspeaker. First, it does not add any- cient, and h2, . . . , hn are the weighting coefficients re-
thing to what individual harmonic curves can show. Sec- sponsible for the influence of a nonlinearity of a particular
ond, since useful information about harmonics of different order beginning from the second. The coefficients hi in
orders is not available from THD, its interpretation may general may have positive or negative signs, and some of
result in wrong conclusions about the character of the them may be zero.
nonlinearity in a loudspeaker tested. In other words, the A nonlinearity of this kind might, for example, approxi-
same 10% THD of the sound pressure level at a certain mate a loudspeaker suspension in the form of a relation-
level of input voltage might be produced by the dominant ship between the diaphragm displacement x and the force
second- and third-order nonlinearities in one loudspeaker, F if creep effect (the long-term dependence of the com-
or it might include the higher order harmonics in another pliance on the time of loading) and hysteresis are omitted.
loudspeaker as well. The difference in the amount and Then the coefficients h0, . . ., hn in Eq. (1) represent the
level of intermodulation products and, correspondingly, in suspension compliance. As a loudspeaker operates, non-
the sound quality of these two loudspeakers could be sig- linear compliance causes nonlinear displacement, and this
nificant. This THD would not indicate. effect interacts with other nonlinear phenomena. The over-
As has been mentioned, the multitone stimulus, whose all nonlinearity of loudspeakers is dynamic and more com-
objective parameters, such as the probability density func- plex than the simple relationship described by Eq. (1).
tion, have similarity with a musical signal, seems to be a Bearing in mind that this particular example is not a com-
good candidate for a better testing signal. However, there plete representation of the operation of a loudspeaker, we
is an important aspect of using multitone stimuli that will nevertheless analyze this simple static nonlinearity to
should be considered here to be objective. Currently the illustrate some general effects.
336 J. Audio Eng. Soc., Vol. 52, No. 4, April 2004
PAPERS COMPARISON OF LOUDSPEAKER NONLINEAR DISTORTION MEASUREMENTS
Let us assume that the relationship between the displacement x (output) and the force F (input) is described by the expression x(t) = c1F(t) − c5F⁵(t), where the coefficients c1 and −c5 represent the nonlinear mechanical compliance. We also assume that the input driving force is sinusoidal and the coefficient c5 is set to c5 = 0.26344 to produce 10% THD in displacement. Fig. 3 shows the dependence of the displacement on the driving force and the spectrum of displacement corresponding to the sinusoidal input. The spectral components of displacement are described by the expression

   x(t) = c1F sin ωt − c5F⁵ sin⁵ ωt
        = c1F sin ωt − c5F⁵ (0.625 sin ωt − 0.3125 sin 3ωt + 0.0625 sin 5ωt).    (2)

The fifth-order "limiting" nonlinearity produces the fifth harmonic (which is quite predictable). It also generates the third harmonic and a spectral component having the same frequency as the input signal. Since the latter spectral component is out of phase with the fundamental tone, it produces the limiting effect of the suspension: it decreases the level of the first harmonic in the displacement compared to the linear case. The fifth harmonic is five times (−14 dB) smaller than the third and ten times (−20 dB) smaller than the spectral component having the same frequency as the input tone.

If the input force F has the form of a two-tone signal, F·0.5(sin ω1t + sin ω2t), then the output signal (displacement) consists of the linear part c1F·0.5(sin ω1t + sin ω2t) and of the distortion products generated by the fifth-order nonlinearity, which include two fifth harmonics, two third harmonics, and twelve intermodulation products. In addition, the fifth-order nonlinearity produces two spectral components having frequencies identical to the frequencies of the initial input signals. Fig. 4 depicts the output spectrum. We assume that the amplitude of each tone is half the amplitude of the previous sinusoidal tone to maintain the same maximum level as in the single-tone signal,

   x(t) = c1F·0.5 (sin ω1t + sin ω2t) − c5F⁵·0.5⁵ (sin ω1t + sin ω2t)⁵
        = c1F·0.5 (sin ω1t + sin ω2t)
          − c5F⁵·0.03125 [0.0625 sin 5ω1t + 0.0625 sin 5ω2t
            − 1.5625 sin 3ω1t − 1.5625 sin 3ω2t
            + 6.25 sin ω1t + 6.25 sin ω2t
            − 3.125 sin (2ω1 + ω2)t + 3.125 sin (2ω1 − ω2)t
            − 3.125 sin (2ω2 + ω1)t + 3.125 sin (2ω2 − ω1)t
            + 0.3125 sin (4ω1 + ω2)t − 0.3125 sin (4ω1 − ω2)t
            + 0.3125 sin (4ω2 + ω1)t − 0.3125 sin (4ω2 − ω1)t
            + 0.625 sin (3ω1 + 2ω2)t + 0.625 sin (3ω1 − 2ω2)t
            + 0.625 sin (3ω2 + 2ω1)t + 0.625 sin (3ω2 − 2ω1)t].    (3)
The balance between fifth and third harmonics becomes
significantly different compared to the single-tone excita-
tion. The fifth harmonic turns out to be much lower in
amplitude than the third harmonic and all intermodulation
products. The difference between the fifth and third har-
monics produced by the same fifth-order nonlinearity be-
comes 28 dB. All twelve intermodulation products are
higher in amplitude than the fifth harmonic. If the maxi-
mum level of the two-tone signal is chosen equal to the
amplitude of the single-tone signal producing 10% THD,
the relationship between the harmonics produced by these
two signals is as shown in Table 1.
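The arithmetic behind Eqs. (2) and (3) is easy to check numerically. The sketch below is not the paper's software; c1 = 1 and a unit force amplitude are illustrative assumptions. It samples the fifth-order suspension x → x − c5·x⁵ with the quoted c5 = 0.26344 and reads the harmonic and intermodulation amplitudes off an FFT:

```python
import numpy as np

# Sketch (not the paper's software): fifth-order suspension x -> x - c5*x**5
# with c5 = 0.26344 as quoted in the text; c1 = 1 and unit force amplitude
# are illustrative assumptions.
c5 = 0.26344
N = 1024
n = np.arange(N)

# single tone, Eq. (2): fundamental placed in FFT bin 8
x1 = np.sin(2 * np.pi * 8 * n / N)
A = 2 * np.abs(np.fft.rfft(x1 - c5 * x1**5)) / N
a1, a3, a5 = A[8], A[24], A[40]
thd = np.hypot(a3, a5) / a1
print(f"THD = {thd:.4f}")                                   # ~0.10 (10%)
print(f"5th re 3rd harmonic: {20*np.log10(a5/a3):.1f} dB")  # ~ -14 dB

# two tones of half amplitude, Eq. (3)
x2 = 0.5 * (np.sin(2*np.pi*8*n/N) + np.sin(2*np.pi*11*n/N))
A2 = 2 * np.abs(np.fft.rfft(x2 - c5 * x2**5)) / N
lines = np.flatnonzero(A2 > 1e-6)
print(f"{len(lines)} spectral lines at bins {lines.tolist()}")
```

The two-tone response shows exactly 18 lines: the two fundamentals, the four odd harmonics, and the twelve intermodulation products of Eq. (3), consistent with the redistribution of distortion energy discussed around Table 1.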
From this observation it might follow that if someone tests only the harmonic distortion in this hypothetical nonlinear suspension, he might come to the conclusion that the suspension is impaired predominantly by third-harmonic distortion and to a lesser degree by fifth-harmonic distortion. This conclusion might lead to a "picture" of a musical signal contaminated predominantly with third-order and slightly with fifth-order harmonic distortion. However, the application of a more complex two-tone signal changes the reaction of this nonlinear system. For example, the fifth harmonic becomes very small and virtually unessential compared to the other distortion components. In its turn, the power of the stronger third harmonic becomes negligible compared to the power of the intermodulation products. This observation can be extrapolated easily to higher order static nonlinearities.

This simple example illustrates the fact that harmonics and intermodulation products are merely symptoms of nonlinearity. They are the reaction of a nonlinear system to a particular input signal. Different inputs produce different symptoms in the same nonlinear system. Since a real musical signal is a set of various spectral components rather than a mere sinusoidal tone, we can assume that the share of harmonics is much less significant than the share of intermodulation products if a musical signal is applied to a static nonlinear system. The relationship between harmonics and intermodulation products in dynamic nonlinear systems will be analyzed in detail in the next sections.

The preceding example illustrated the different reactions of the same static nonlinear system to different signals. Now let us consider the role played by static nonlinearities of different orders. Let the hypothetical nonlinear suspension be approximated by the expression x(t) = c1F(t) − c3F³(t) and let a sinusoidal signal be applied to it,

   x(t) = c1F sin ωt − c3F³ sin³ ωt
        = c1F sin ωt − c3F³ (0.75 sin ωt − 0.25 sin 3ωt).    (4)

Fig. 5 illustrates the dependence of the displacement on the driving force and the spectrum of displacement corresponding to the sinusoidal input. A sinusoidal input obviously produces a third harmonic. It also produces a spectral component of the "first order," which has the same frequency as the input signal. If a two-tone signal is applied to the same system, it produces four intermodulation products, two third harmonics, and two terms of the "first order,"

   x(t) = c1F·0.5 (sin ω1t + sin ω2t) − c3F³·0.5³ (sin ω1t + sin ω2t)³
        = c1F·0.5 (sin ω1t + sin ω2t)
          − c3F³·0.125 [−0.25 sin 3ω1t − 0.25 sin 3ω2t
            + 2.25 sin ω1t + 2.25 sin ω2t
            − 0.75 sin (2ω1 + ω2)t + 0.75 sin (2ω1 − ω2)t
            − 0.75 sin (2ω2 + ω1)t + 0.75 sin (2ω2 − ω1)t].    (5)

If we set the coefficient c3 = 0.30888 to produce 10% THD at the single-tone input, and if we set the maximum level of the two-tone signal equal to the amplitude of a single tone, we obtain the levels of harmonic components listed in Table 2.

Table 1

Fig. 5. (a) Dependence of suspension displacement on force; third-order approximation. (b) Spectrum of displacement corresponding to third-order approximation. Sinusoidal input.

If someone compares the measurement results of harmonic distortion produced by the two hypothetical suspensions, he might come to the conclusion that they perform essentially similarly because their THD is equal to 10%, the levels of their third harmonics are close (−21.7 dB versus −22.3 dB), and the fifth harmonic produced by the fifth-order suspension is small and not important. However, the larger number of intermodulation products produced by the fifth-order nonlinearity (which remained beyond the scope of this particular harmonic measurement)
might produce a different effect on the perceived sound quality. Fig. 6 illustrates the output spectrum. The spectrum of the distortion products is wider and the density of the spectrum is higher in the fifth-order suspension. This may cause higher perceptibility of nonlinear distortion in the fifth-order system because some distortion products might not be masked by the hearing system.

This simple example illustrates a situation in which the measurement of only harmonics does not convey enough information about the performance of even a simple static nonlinear system. It also shows that a higher order static nonlinearity produces a larger number of intermodulation products if excited by a similar input signal. Three conclusions follow.

1) The overall level of harmonics is typically lower than the overall level of intermodulation products within the same nonlinear system, and this difference is stronger in a system impaired by a higher order nonlinearity.

2) A nonlinear system of a higher order exposed to a complex signal produces more intermodulation products with a wider spectrum. This effect is not revealed by an analysis of the harmonic distortion.

3) The wider spectrum of intermodulation products might be more noticeable because some of the spectral components would not be masked by the hearing system.

Table 3 shows the reaction of second- and third-order static nonlinearities to different multitone stimuli. The number of intermodulation products of a static nonlinearity characterized by only second and third orders increases dramatically with the number of input testing tones compared to the number of harmonics. It can also be observed that the third-order nonlinearity produces the same number of harmonic products as the second order but a significantly larger number of intermodulation products. This tendency increases in higher order nonlinearities.

In this example the increase in the number of intermodulation products generated stems from the nature of the testing signal (multitone) that was chosen for its affinity with a musical signal. Truly, the multitone stimulus is closer to a musical signal than the single-tone stimulus in the crest factor, the spectrum, and the probability density function. This example illustrates the dominance of intermodulation products revealed by multicomponent testing signals. The tendency of intermodulation products to dominate harmonics, illustrated here through the use of a multitone signal, can be extrapolated to a musical signal. More details on this subject can be found in [1].

Table 2

Table 3

3.2 THD and Harmonic Distortion

By observing only harmonic distortion curves we might not be able to come to an accurate conclusion about the entire nonlinear properties of a loudspeaker under test, and we cannot predict how the distortion products generated in a musical signal will be masked by the hearing system. In addition, harmonic distortion measurements may not reveal some nonlinear effects at all. A typical example is the Doppler distortion in direct-radiating loudspeakers: this distortion is not revealed by a single tone; at least two tones are required.

The "supremacy" of intermodulation distortion may lead to the straightforward but wrong conclusion that harmonic distortion is irrelevant in any application and may be omitted in measurements of loudspeaker nonlinear distortion. However, while not being able to characterize the nonlinearity in its entirety and complexity, and link it to the audibility of signal deterioration, the harmonic distor-
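The combinatorial growth of intermodulation products can be reproduced by bookkeeping alone. The following sketch (an illustration in the spirit of Table 3, not the paper's method) enumerates the distinct spectral components generated by a pure k-th-order static nonlinearity driven by a sum of incommensurate tones:

```python
from itertools import combinations_with_replacement, product

# Sketch, in the spirit of Table 3: count the distinct spectral components
# produced by a pure k-th-order static nonlinearity acting on a sum of
# incommensurate tones. A frequency is represented by an integer coefficient
# vector (c1, ..., cN) meaning c1*f1 + ... + cN*fN.
def order_k_products(k, num_tones):
    seen = set()
    for idx in combinations_with_replacement(range(num_tones), k):
        for signs in product((1, -1), repeat=k):
            v = [0] * num_tones
            for i, s in zip(idx, signs):
                v[i] += s
            v = tuple(v)
            seen.add(max(v, tuple(-c for c in v)))   # +v and -v coincide
    nonzero = lambda v: sum(c != 0 for c in v)
    return {
        "dc": sum(1 for v in seen if nonzero(v) == 0),
        "single-tone (harmonics)": sum(1 for v in seen if nonzero(v) == 1),
        "intermodulation": sum(1 for v in seen if nonzero(v) >= 2),
    }

for order in (2, 3):
    for tones in (2, 3, 4):
        print(f"order {order}, {tones} tones: {order_k_products(order, tones)}")
```

For a third-order term with two tones this yields the four intermodulation products of Eq. (5); with three tones the intermodulation count reaches 16, and it keeps growing much faster than the number of harmonics as tones are added.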
tion curves, plotted separately as functions of frequency and input signal level, provide useful information about the loudspeaker under test. For example, a strong level of high-order harmonics may be indicative of a rubbing voice coil or of nonlinear breakups of a compression driver's metallic diaphragm and suspension. The relationship between harmonics of even and odd orders tells about the symmetry (or the lack of it) of the loudspeaker displacement-dependent parameters. The buildup of high-order harmonics with an increase in input voltage may be indicative of approaching the limit of the spider's deflection. When performing harmonic distortion measurements, one should keep in mind that the harmonics will be accompanied by an outweighing number of intermodulation products as soon as the testing tone is replaced by a musical signal.

The THD test can be used legitimately in "passed–not passed" production tests where similar types of loudspeakers are tested. Certainly, THD gives an idea about audibly noticeable nonlinear distortion if its level is high. A loudspeaker having 50% THD in the midrange will hardly be a source of mellifluous sound; it does not take sophisticated analysis of nonlinearity or fine listening tests to figure that out.

Fig. 7 illustrates the sensitivity of THD test results to variations of the loudspeaker parameters. Fig. 7(a) corresponds to an input voltage of 10 V applied to a nonlinear dynamic model of a 12-in (305-mm) woofer characterized by a nonlinear Bl product, suspension stiffness, voice-coil inductance, and flux modulation. The parameters of the woofer are given in Appendix 1. The modeling was carried out through numerical solution of a system of nonlinear differential equations describing the behavior of an electrodynamic loudspeaker (see Appendix 2). Fig. 7(b) shows the THD curve corresponding to the same woofer, but with the flux modulation distortion omitted. The difference in the physical properties of the two models is reflected in the difference in the THD curves: the flux modulation distortion affects the THD curve at high frequencies. It is convenient to overlay THD curves corresponding to different input levels. Fig. 8 shows SPL THD curves corresponding to an increase of the input voltage from 10 to 40 V in 3-dB increments.

Figs. 9 and 10 show THD curves corresponding to two 8-in (203-mm) woofers having different motors. (One has a long 12-mm coil and a short 6-mm gap, the other a short 6-mm coil and a long 12-mm gap.) The parameters of the woofers are given in Appendix 1. The levels of the input signals correspond to maximum voice-coil displacements of 4 and 10 mm. The difference in the THD curves at the smaller level (a) is pronounced at low frequencies, whereas the difference at the larger level (b) is pronounced at frequencies above 80 Hz. Therefore THD gives an idea of the difference in the objective performance of two loudspeakers being compared.
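The basic computation behind THD curves such as those in Figs. 7-10 can be sketched on the two hypothetical suspensions of Section 3.1 (c1 = 1 and a unit drive are assumed here; the paper's own curves come from a nonlinear differential-equation model instead):

```python
import numpy as np

# Sketch of a THD computation, applied to the two hypothetical suspensions
# of Section 3.1 (c1 = 1, unit drive assumed; not the paper's simulation).
def thd(y, f0, max_harmonic=10):
    A = 2 * np.abs(np.fft.rfft(y)) / len(y)
    harm = [A[k * f0] for k in range(2, max_harmonic + 1) if k * f0 < len(A)]
    return np.sqrt(np.sum(np.square(harm))) / A[f0], A

N = 4096
x = np.sin(2 * np.pi * 16 * np.arange(N) / N)
thd3, A3 = thd(x - 0.30888 * x**3, 16)      # third-order suspension
thd5, A5 = thd(x - 0.26344 * x**5, 16)      # fifth-order suspension
print(f"THD: {100*thd3:.1f}% vs {100*thd5:.1f}%")
print(f"3rd harmonic: {20*np.log10(A3[48]):.1f} dB vs {20*np.log10(A5[48]):.1f} dB")
```

Both suspensions report about 10% THD and nearly equal third-harmonic levels (about −22.2 dB and −21.7 dB re the unit drive), even though, as shown earlier, their intermodulation behavior differs substantially — which is exactly the blind spot of THD discussed here.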
Neither the second harmonic nor the second-order difference-frequency intermodulation products provide sufficient information about the distortion of a hypothetical loudspeaker characterized by only a second-order nonlinearity. Figs. 11 and 12 show the second-order frequency responses (frequency-domain Volterra kernels of the second order) of the same 8-in (203-mm) woofers having different motors. A typical HFRF is presented in the form of a three-dimensional "mountain terrain" with two horizontal frequency axes. Figs. 11(a) and 12(a) illustrate the three-dimensional terrains of the second-order distortion products, whereas Figs. 11(b) and 12(b) show maps of this surface. The vertical axis shows the level of all second-order distortion frequency responses, including the second harmonics, the sum and difference intermodulation products at all combinations of two frequencies, and the frequency-dependent constant component (dc or zero harmonic) if it is excited in a particular nonlinear system [see Figs. 11(a) and 12(a)].

In this interpretation the Cartesian coordinates of a point on this map corresponding to one negative and one positive frequency describe the second-order difference-frequency component Pf2−f1, whereas a point with coordinates belonging to two positive frequencies is a second-order sum intermodulation product Pf2+f1. The diagonal line a, characterized by frequencies equal in modulus but opposite in sign, describes the zero-order harmonic Pf1−f1, which is a constant displacement if the HFRF describes the voice-coil excursion. The diagonal line characterized by equal values of positive frequencies is the second harmonic distortion Pf1+f1 (see Figs. 11 and 12). By comparing the second harmonic distortion cut with the entire surface of the second-order distortion, one may clearly see what a small share of all the information required to describe the entire second-order nonlinearity is represented by the single second harmonic distortion response. To know the dynamic reaction of the second-order nonlinearity we should also take into account the three-dimensional surface of the second-order phase response and perform a
twofold inverse Fourier transform, which yields the
second-order impulse response. Two-dimensional convo-
lution of the input signal with this two-dimensional pulse
response provides the dynamic reaction of the second-
order nonlinearity. It is not hard to imagine how far the
distortion signal may be from the results of harmonic or
THD measurement. Whether or not this dynamic distor-
tion signal is noticed by the hearing system depends on a
number of factors that should be considered in the context
of masking such as the level of the signal, its dynamics,
and the spectral contents.
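The two-dimensional convolution described above can be sketched for a discrete second-order kernel. The kernel h2 below is random and purely hypothetical; the point is that a two-tone input excites only the component families mapped in Figs. 11 and 12 (dc, the second harmonics 2f1 and 2f2, and the sum and difference products f2 ± f1):

```python
import numpy as np

# Sketch: a discrete second-order Volterra operator, i.e., the 2-D
# convolution of the input with a hypothetical kernel h2[i, j].
def volterra2(x, h2):
    y = np.zeros(len(x))
    M = h2.shape[0]
    for i in range(M):
        for j in range(M):
            # periodic input, so circular delays x[n-i], x[n-j] are exact
            y += h2[i, j] * np.roll(x, i) * np.roll(x, j)
    return y

rng = np.random.default_rng(0)
M = 4
h2 = rng.normal(size=(M, M))
h2 = 0.5 * (h2 + h2.T)                 # Volterra kernels can be symmetrized

N, f1, f2 = 512, 12, 31
n = np.arange(N)
x = np.sin(2*np.pi*f1*n/N) + np.sin(2*np.pi*f2*n/N)
A = np.abs(np.fft.rfft(volterra2(x, h2))) / N
lines = set(np.flatnonzero(A > 1e-8).tolist())
print(sorted(lines))                   # subset of {0, 19, 24, 43, 62}
```

The relative levels of these lines depend on the shape of h2, which is precisely the information a single second-harmonic cut of the HFRF surface discards.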
The concept of Volterra expansion can be formally ex-
tended to higher orders. Unfortunately the third-order non-
linearity needs four-dimensional space for its description
(three frequency scales), which defies simple graphic rep-
resentation. One possible solution to plot the third-order
HFRFs is a cut through one of the three frequency scales
corresponding to the worst case of distortion, and using the
remaining two scales in the three-dimensional graph. Reed
and Hawksford used this approach in [17]. For the higher
orders of nonlinearity the situation becomes even more
desperate, and graphical representation is even less prac-
tical. To make matters worse, with increasing orders of
nonlinearity the volume of calculations required to de-
scribe a Volterra model increases tremendously, making a
practical application impossible. This “curse of dimen-
sionality” is clearly illustrated by an analysis of the ex-
pression for the output signal of a nonlinear system de-
scribed by the first three terms of a Volterra expansion,
   y(t) = ∫ (0..t) h1(τ1) x(t − τ1) dτ1
        + ∫∫ (0..t) h2(τ1, τ2) x(t − τ1) x(t − τ2) dτ1 dτ2
        + ∫∫∫ (0..t) h3(τ1, τ2, τ3) x(t − τ1) x(t − τ2) x(t − τ3) dτ1 dτ2 dτ3.    (6)
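The "curse of dimensionality" mentioned above can be put in numbers with a few lines (M = 64 is an arbitrary illustrative memory length):

```python
from math import comb

# Sketch: a kernel with a memory of M samples per dimension has M**n
# samples at order n; kernel symmetry reduces the count to the number
# of multisets, comb(M + n - 1, n). M = 64 is an illustrative choice.
M = 64
for order in (1, 2, 3, 4, 5):
    full, sym = M ** order, comb(M + order - 1, order)
    print(f"order {order}: {full:>12} kernel samples, {sym:>10} after symmetry")
```

Even after exploiting symmetry, the sample count still grows combinatorially with the order, which is why the expansion is rarely taken past the third order in practice.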
A part of all possible permutations of the third-order system's response samples is enough to describe it completely, and one-fourth of all permutations of the fourth-order samples [14]. Still, the number of samples to be analyzed increases enormously with an increase in the order of the Volterra expansion.

The calculation of HFRFs from multidimensional pulse responses needs n-fold Fourier transforms. The volume of calculations also increases significantly with an increase in the order of nonlinearity,

   H1(iω1) = ∫ h1(τ1) e^(−iω1τ1) dτ1

   H2(iω1, iω2) = ∫∫ h2(τ1, τ2) e^(−i(ω1τ1 + ω2τ2)) dτ1 dτ2

   H3(iω1, iω2, iω3) = ∫∫∫ h3(τ1, τ2, τ3) e^(−i(ω1τ1 + ω2τ2 + ω3τ3)) dτ1 dτ2 dτ3

   . . .

   Hn(iω1, …, iωn) = ∫…∫ hn(τ1, …, τn) e^(−i(ω1τ1 + … + ωnτn)) dτ1 … dτn    (7)

where all integrals are taken from −∞ to ∞. In addition, the Volterra series expansion has a fundamental constraint stemming from the assumption that there is no energy exchange between nonlinear products of different orders. This constraint confines the application of Volterra series expansions to only weakly nonlinear systems. Attempting to use a Volterra expansion for a system with a strong nonlinearity causes divergence of the Volterra series. This simple example illustrates why Volterra expansions of orders higher than three are practically never used; the divergence also precludes the Volterra expansion from handling strong and high-order nonlinearities. The measurement of Volterra HFRFs can be carried out using special signals, such as maximum-length sequences (MLS) [27], multitone stimuli [28], and Gaussian noise [29]. There are also methods providing a direct calculation of HFRFs from NARMAX output data [30]. Straightforward methods operating with a variation of two or three sinusoidal signals are not practical because of the measurement time burden.

3.4 Two-Tone Intermodulation Frequency Responses

Measuring the frequency responses of intermodulation products by using a two-tone signal has nearly as old a history as measuring harmonic distortion and THD. Two methods have been used predominantly in the audio industry. One, proposed by Hilliard, uses one fixed low-frequency tone and one sweeping tone. The method was adopted by SMPTE [19]. It is often called the modulation method. The second method, using two sweeping tones and keeping the frequency difference between them constant, was proposed by Scott [20]. It is called the difference-frequency or CCIF method.

If a two-tone signal is applied to a second-order nonlinear system, it generates the following distortion products: a dc component, two different intermodulation spectral components (sum and difference products), and two second harmonics. Meanwhile the nonlinear reaction of a third-order nonlinearity to the same two-tone signal will consist of two third harmonics, two spectral components having the same frequencies as the initial tones but lower amplitudes, and four intermodulation products. However, it is known from the theory of nonlinear systems that a full description of the third-order nonlinearity formally needs at least a three-tone signal [14]. This increases the number of third-order harmonics to three, and the number of spectral components having the input signal frequencies to three as well. The number of intermodulation products goes as high as 16.

It is not practical to plot 16 different frequency responses of intermodulation products. Traditionally only the products Pf2±2f1 are analyzed, omitting components of the type Pf3±f2±f1, which are also generated by the third-order nonlinearity. Hence the measurement of individual intermodulation products of the second and third orders gives limited information about the third-order nonlinearity if a two-tone signal is used. With regard to higher order nonlinearity, the standardized two-tone intermodulation methods supply limited information as well.

Plotting all four "conventional" intermodulation curves (Pf2±f1 and Pf2±2f1) on a single graph still produces a picture that is difficult to comprehend and interpret. An integrated criterion in the form of total intermodulation distortion (TIMD) is a simplifying solution, leading to fewer frequency responses of intermodulation distortion. For example, standard IEC 60268-5 [22] determines: "The modulation distortion (MD) of the nth order shall be specified as the ratio of the arithmetic sum of the r.m.s. values of the sound pressures due to distortion components at frequencies f2 ± (n − 1)f1 to the r.m.s. value of the sound pressure Pf2 due to the signal f2." The total intermodulation coefficient of the second order according to IEC 60268-5 is

   d2 = (Pf2−f1 + Pf2+f1) / Pf2 × 100%.    (8)

The total intermodulation coefficient of the third order is

   d3 = (Pf2−2f1 + Pf2+2f1) / Pf2 × 100%.    (9)

The frequencies f1 and f2 satisfy the condition f2 >> f1, and the ratio of the amplitudes of the input signals is specified by the user. The standard gives no recommendation regarding the measurement of intermodulation and harmonic products having orders higher than three. This omission was probably due to practical concerns. Without calling into question the validity of the standard's recommendations, the authors do not exclude situations when measuring higher order harmonic and intermodulation products might be useful in the assessment of audio equipment performance.
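Eqs. (8) and (9) translate directly into a measurement routine. In the sketch below the device under test is a hypothetical memoryless polynomial, and the 4:1 tone-amplitude ratio is an illustrative choice (the standard leaves the ratio to the user):

```python
import numpy as np

# Sketch of Eqs. (8) and (9): second- and third-order modulation
# distortion per IEC 60268-5, here for a hypothetical memoryless
# polynomial device. The 4:1 amplitude ratio is an illustrative choice.
N = 1024
n = np.arange(N)
f1, f2 = 5, 60                       # FFT bins, f2 >> f1 as required
x = 0.8 * np.sin(2*np.pi*f1*n/N) + 0.2 * np.sin(2*np.pi*f2*n/N)
y = x + 0.1 * x**2 + 0.05 * x**3     # hypothetical device under test

P = 2 * np.abs(np.fft.rfft(y)) / N
d2 = (P[f2 - f1] + P[f2 + f1]) / P[f2] * 100
d3 = (P[f2 - 2*f1] + P[f2 + 2*f1]) / P[f2] * 100
print(f"d2 = {d2:.1f}%, d3 = {d3:.1f}%")
```

With these particular choices the routine reports d2 of about 15% and d3 of about 4.6%, the second- and third-order terms of the polynomial showing up separately in the two coefficients.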
VOISHVILLO ET AL. PAPERS
As was mentioned in Section 2, higher order nonlinear products may be more detrimental to the sound quality than lower order products. This effect was recognized a long time ago (see, for example, [31]), and modern research confirms it [1], [32]. The importance of higher order distortion products is twofold. First, a high-order nonlinearity produces a very large number of intermodulation products, whose number and energy increase dramatically with increasing input signal level. Second, higher order products are usually spread over a wide frequency range, which results in weaker psychoacoustic masking of these distortion products [1]. In the wake of this, the authors developed an alternative way to formulate two-tone intermodulation distortion characteristics.

Figs. 13 and 14 show the two-tone intermodulation distortion curves of the same two 8-in (203-mm) woofers. The distortion curves correspond to the same 10-mm maximum voice-coil displacement. The difference between the intermodulation distortion curves of the two loudspeakers is significantly more pronounced than the difference between the THD curves of the same woofers. The intermodulation distortion curves presented in Figs. 13 and 14 are calculated differently from the traditional intermodulation coefficients recommended by existing standards. The two-tone intermodulation distortion (TIMD) is specified by the authors as

   d_TIMD(f) = √[Σ (i = 1..N) pi²(f)] / Pf × 100%    (10)

where

   p1(f) = Pf2+f1,   p2(f) = Pf2−f1,   p3(f) = Pf2+2f1,
   p4(f) = Pf2−2f1,  p5(f) = Pf2+3f1,  p6(f) = Pf2−3f1,
   p7(f) = Pf2+4f1,  p8(f) = Pf2−4f1,  . . . ,
   pn−1(f) = Pf2+mf1,  pn(f) = Pf2−mf1

are the amplitudes of the intermodulation products, Pf = Pf1 = Pf2 is the amplitude of either one of the fundamental tones, f1 is the fixed low-frequency tone, and f2 is the higher frequency sweeping tone.

In this approach the excitation signal consists of two tones having equal amplitudes. One of the two tones is swept across the frequency range. The level of distortion, calculated according to Eq. (10), is plotted at the frequency of the sweeping tone. The authors attempted to extend the recommendations of IEC 60268-5 [22] to the
Fig. 13. Two-tone intermodulation distortion (IEC 60268-5); loudspeaker A (long coil, short gap). a—(Pf2−f1 + Pf2+f1)/Pf1; b—(Pf2−2f1 + Pf2+2f1)/Pf1. (a) Voice-coil maximum displacement Xmax = 4 mm. (b) Voice-coil maximum displacement Xmax = 10 mm.

Fig. 14. Two-tone intermodulation distortion (IEC 60268-5); loudspeaker B (short coil, long gap). a—(Pf2−f1 + Pf2+f1)/Pf1; b—(Pf2−2f1 + Pf2+2f1)/Pf1. (a) Voice-coil maximum displacement Xmax = 4 mm. (b) Voice-coil maximum displacement Xmax = 10 mm.
measurement of two-tone intermodulation distortion in the form of the intermodulation products having frequencies (f2 ± f1) and (f2 ± 2f1). The latter are the products of the interaction between the first and second harmonics of the first tone with the second tone. The authors merely extended the set of harmonics of the first signal to higher orders (3f1, 4f1), which led to intermodulation terms of the kind (f2 ± 3f1) and (f2 ± 4f1). The intermodulation terms corresponding to the interaction between the harmonics of the first tone and the harmonics of the second tone, such as (2f2 ± 2f1), (3f2 ± 2f1), (2f2 ± 3f1), (3f2 ± 3f1), etc., were omitted. The authors do not claim the ultimate validity of this approach. Including all the intermodulation products produced by two-tone excitation would probably provide more accurate results.

An alternative method to measure intermodulation distortion has been used by Keele [23], [24]. His test signal, consisting of two tones, 40 Hz and 400 Hz, of equal amplitude, is applied to a loudspeaker, the input level is increased, and the intermodulation distortion is measured and plotted as a function of the input level. Such a test is a simple way to evaluate the intermodulation of the midrange output of a loudspeaker by a simultaneous bass signal.

In the current work Keele's general approach to measuring intermodulation distortion versus input level was simulated using two different criteria. The first, dTTIMD, includes all N measurable output intermodulation products; the second, dTTHD, takes into account only the M harmonic distortion products produced by the two primary tones. Here TTIMD stands for two-tone total intermodulation distortion and TTHD designates two-tone total harmonic distortion:

   d_TTIMD = √[Σ (i = 1..N) P(i)IM²] / Pf × 100%    (11)

   d_TTHD = √[Σ (k = 1..M) P(k)H²] / Pf × 100%.    (12)

The measurement of total harmonic and intermodulation distortion is then extended to a larger number of input tones. The ten-tone total harmonic distortion coefficient takes into account the harmonics of all input tones. The level of this distortion, similar to the TTIMD, is related to one of the fundamental tones. This simplifies the comparison of the distortions evaluated by these two criteria.

3.5 Multitone Stimulus

The possible circumvention of the partial "blindness" of the conventional two-tone intermodulation tests is not to plot continuous frequency responses of the corresponding intermodulation products, but rather to show the full discrete spectra of all nonlinear products corresponding to particular frequencies and levels of the two test tones. By extending this idea to a larger number of excitation tones, we naturally arrive at the concept of the multitone signal. Indeed, if we obtain and graph the spectrum of the nonlinear reaction to the two-tone signal, which, as has been shown, gives limited information even about the third-order nonlinearity, let alone the higher orders, why not use as many tones as it takes to detect all conceivable higher
order intermodulation products, cover the entire frequency Usually the multitone signal is generated according to
range of measurement, and have a signal statistically much the simple rule
closer to the real musical signal than a single-tone or two-
N
兺A sin 共 t + 兲.
tone signal?
This idea can be extended to a tone in every FFT fre- x共t兲 = i i i (13)
i=1
quency bin. This abundance of tones turns the multitone
stimulus into a noiselike signal. Truly, noise signals are A strong advantage of the multitone stimulus is a short
used widely in the identification and analysis of nonlinear measurement time and the ability to reveal simultaneously
systems as, for example, in the measurement of the coher- a set of visible harmonic and intermodulation products. In
ence function. However, once the FFT is applied to the this capacity the multitone signal is beyond competition
output signal of a nonlinear system excited by such noise- with other signals. Multitone testing handles high-order
like signals, all individual distortion spectral components nonlinearity, and its use is not hampered by the existence
are obscured by the fundamental tones and become invis- of such effects as hard limiting, hysteresis, and dead zone.
ible on a graph. Meanwhile the multitone signal, produc- Also, the multitone stimulus can be used in applications
ing a “sparse” and discrete spectrum at the output of a where the loudspeaker short-term performance must be
nonlinear system, makes the majority of distortion prod- evaluated, such as the maximum SPL. Comparing a mul-
ucts visible on a graph. At the same time the multitone titone burst with a tone burst, the advantages of the former
signal is rather close to noise and musical signals in the become obvious. After the time-domain reaction of the
probability density function, bandwidth, and crest factor. loudspeaker to the tone burst of a particular frequency has
The multitone stimulus fills the gap between the noise- been received and preprocessed to skip the transients and
based methods of nonlinear identification and measure- then put through the Fourier transform, only harmonic
ments, and the traditional standardized methods using one distortion and THD become available. The distortion (har-
or two stationary or swept (stepped) tones. monic or THD) corresponds to only a single excitation
frequency. To cover the whole frequency range of interest,
these measurements have to be repeated at different fre-
quencies. Taking into account the number of measure-
ments needed to cover the entire frequency range with a
decent resolution, the overall measurement time may be
significant. Meanwhile, by applying the multitone burst,
only one measurement is needed and, in addition, the mul-
titone signal yields more information about the nonlin-
earity in a loudspeaker under test. Furthermore, a multi-
tone burst’s crest factor can be “tuned” by adjusting the
phases of individual spectral components.
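The generation rule of Eq. (13), and the way the phases of the components set the crest factor, can be sketched in a few lines (a minimal illustration, not the authors' measurement code; the tone count, frequency plan, and function names are arbitrary):

```python
import numpy as np

def multitone(fs, n_samples, freqs, amps=None, phases=None, seed=None):
    """x(t) = sum_i A_i sin(2*pi*f_i*t + phi_i) -- Eq. (13)."""
    rng = np.random.default_rng(seed)
    freqs = np.asarray(freqs, dtype=float)
    amps = np.ones(freqs.size) if amps is None else np.asarray(amps, float)
    if phases is None:
        # random initial phases; a different draw changes the crest factor
        phases = rng.uniform(0.0, 2.0 * np.pi, freqs.size)
    t = np.arange(n_samples) / fs
    args = 2.0 * np.pi * freqs[:, None] * t + np.asarray(phases)[:, None]
    return np.sum(amps[:, None] * np.sin(args), axis=0)

def crest_factor(x):
    return np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2))

fs, n = 48000, 1 << 16
df = fs / n  # FFT bin width of the n-point analysis
# ten tones, log-spaced and snapped to exact FFT bins, so each
# fundamental occupies a single bin and the distortion products
# between the tones stay visible (no spectral leakage)
freqs = np.round(np.geomspace(50.0, 5000.0, 10) / df) * df
x = multitone(fs, n, freqs, seed=0)
print(crest_factor(x))
```

Because the tones fall on exact bins, the rms of the signal is fixed by the amplitudes alone, while redrawing (or optimizing) the phases "tunes" the crest factor without changing the power spectrum.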
However, the interpretation of a nonlinear reaction to a
multitone stimulus may be arduous if the number of gen-
erated nonlinear products of different orders is substantial.
A multitone stimulus gives such “abundant” spectral in-
formation about nonlinearities that it is difficult to com-
prehend at first sight. (Truly, a second look at the reaction
to the multitone stimulus may not be helpful either when
one has to analyze hundreds if not thousands of distortion
spectral components.) In addition, an engineer has no in-
formation on how a particular pattern of nonlinear reac-
tions to multitone stimuli is related to the perceived sound
quality. Moreover, the spectrum of reactions to multitone
stimuli is not convenient to overlay and compare, espe-
cially if the responses to several input levels are to be
observed. There are several possible ways to overcome
this impediment. One is to distinguish the products of
different orders by postprocessing and to plot them either
separately or in different colors on the same graph. An-
other solution is to plot the averaged value of all distor-
tions located between two adjacent tones, as is done in
[10] and in the FASTTEST multitone measurement [8].
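The averaging between adjacent tones can be sketched as follows (an illustrative reduction of the idea in [8], [10], not the FASTTEST implementation; the function, parameter names, and guard-band choice are hypothetical):

```python
import numpy as np

def distortion_between_tones(mag, tone_bins, guard=1):
    """For each pair of adjacent fundamental bins, take the rms of the
    spectrum magnitudes lying between them (a guard band excludes the
    tones' own bins) and report it at the midpoint bin."""
    tone_bins = sorted(tone_bins)
    out = []
    for lo, hi in zip(tone_bins[:-1], tone_bins[1:]):
        seg = np.asarray(mag[lo + guard + 1: hi - guard], dtype=float)
        if seg.size:
            out.append(((lo + hi) // 2, float(np.sqrt(np.mean(seg ** 2)))))
    return out  # list of (bin index, averaged distortion level)

# toy magnitude spectrum: tones in bins 10 and 20 over a -40 dB floor
mag = np.full(32, 0.01)
mag[[10, 20]] = 1.0
print(distortion_between_tones(mag, [10, 20]))
```

Everything between two fundamentals is treated as distortion (plus noise), which is exactly what collapses the cluttered discrete spectrum into one readable curve.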
This approach permits a simple graphical representation of nonlinear distortions at different levels of input signal.

Fig. 16. Ten-tone total nonlinear distortion as a function of input voltage. a—intermodulation products; b—harmonic products. Logarithmic frequency distribution in frequency range fs to 5.5 fs. (a) Loudspeaker A (long coil, short gap). Umax = 7.3 V corresponds to Xmax = 10 mm. (b) Loudspeaker B (short coil, long gap). Umax = 7.8 V corresponds to Xmax = 10 mm.

Distinguishing different intermodulation and harmonic products of different orders is comparatively easy when the number of initial tones is reasonably low (less than ten,
346 J. Audio Eng. Soc., Vol. 52, No. 4, April 2004
PAPERS COMPARISON OF LOUDSPEAKER NONLINEAR DISTORTION MEASUREMENTS
for example). With an increasing number of input tones overlapping of different frequencies occurs, and the problem of separation becomes much more difficult, but not a theoretically impossible task. The separation can be carried out by a discrete progressive increase of the input level accompanied by an analysis of the rate of increase of each distortion product in a particular frequency bin, in such a way that the evaluated spectral components (including possible overlapped ones) measured at different levels of input signal form a so-called polynomial Vandermonde matrix [33]. Corresponding mathematical manipulations with this matrix, which remain beyond the scope of this work, provide a separation of the overlapped spectral components and make it possible to evaluate the level and phase of each, disregarding the fact that they overlap. This approach is described, for example, in [28], where a multitone signal is used in the identification of weakly nonlinear systems and the measurement of Volterra kernels.

An alternative method to represent the results of multitone testing is the averaging of distortion products in a "sliding window." The spectral components are averaged in a window (such as rectangular or Hanning), and the averaged value of the distortion products is plotted at the frequency corresponding to the center of the window [34]. Afterward the window is shifted one frequency bin "up" and the process is repeated. Ultimately it provides a continuous frequency response of the distortion products, which encapsulates all harmonics and a variety of intermodulation products generated by a particular loudspeaker at a particular level of multitone stimuli, and a particular distribution of primary tones. It has been dubbed the multitone total nonlinear distortion (MTND). With a rectangular window it can be calculated as

d_{MTND}(f_i) = 20 \log \left( \sqrt{ \frac{1}{K} \sum_{k=i-K/2}^{i+K/2} D_k^2 } \,/\, p_0 \right) \ (dB SPL)    (14)

and one of the possible ways to calculate MTND, where the Hanning window is used, is presented in the expression

d_{MTND}(f_i) = 20 \log \left( \sqrt{ \sum_{k=i-K/2}^{i+K/2} \left\{ D_k \cdot \frac{1}{2} \left[ \cos\left( \frac{\pi |f_i - f_k|}{\Delta f} \right) + 1 \right] \right\}^2 } \,/\, p_0 \right) \ (dB SPL).    (15)

The expression of Eq. (14) uses a rectangular window and the weighting coefficient 1/K, where K is the number of distortion products in the rectangular window. This way to formulate d_MTND(f) gives values that are too low if the number of distortion products in a current window is significant but their level is not high. This statement is purely empirical and bears no relationship to subjective sensations. An alternative way might be to omit the weighting coefficient 1/K entirely. In this case, however, the level of d_MTND(f) may become disproportionately high if the number of distortion products corresponding to a particular position of the rectangular window is high, even if their amplitude is low. These practical considerations mean that when the level of an MTND curve is much lower or much higher than the level of distortion spectral components, the graph of distortion looks unnatural. This is merely the authors' subjective point of view derived from numerous modeling and measurement experiments.

Fig. 17 shows the spectrum of the SPL reaction to the input multitone stimulus of the same two 8-in (203-mm) woofers. The solid curves correspond to MTND calculated according to Eq. (15).

Fig. 17. Sound pressure reaction to multitone stimulus. Peak level of input signal corresponds to Xmax = 10 mm. (a) Loudspeaker A (long coil, short gap). (b) Loudspeaker B (short coil, long gap).

Fig. 18 shows the reaction to multitone stimuli of the 12-in (305-mm) woofer with and without flux modulation distortion. The
solid curves correspond to MTND calculated according to Eq. (14).

The presentation of the reaction of a device impaired by nonlinearity to multitone stimuli in the form of an averaged curve (MTND) makes it easy to overlay different curves belonging, for example, to different levels of input signals or to different loudspeakers. Fig. 19 shows two overlaid MTND curves, indicating that the flux modulation produces distortion in the upper part of the frequency range.

The frequency response of the MTND curve can be expressed in dB SPL [Eqs. (14) and (15)] as well as in the percentage of the fundamental frequency response,

d_{MTND}(f_i) = \sqrt{ \sum_{k=i-K/2}^{i+K/2} \left\{ D_k \cdot \frac{1}{2} \left[ \cos\left( \frac{\pi |f_i - f_k|}{\Delta f} \right) + 1 \right] \right\}^2 } \,/\, A(f_i) \times 100\%    (16)

where A(f_i) is the amplitude of the frequency response of a loudspeaker at the frequency f_i.

Fig. 20 shows MTND responses of the 12-in (305-mm) woofer calculated according to Eq. (16) and at different levels of the input signal in 3-dB increments.

There is a current impediment to the widespread use of multitone stimuli for measuring nonlinearity in loudspeakers. This is the ambiguity of the nonlinear reaction of a particular device to a multitone signal. A different distribution and a different number of tones produce different reactions. In theory all these responses belong to the same multidimensional space of nonlinear reactions; however, for an observer these responses look different. This complicates the comparison of responses measured using different distributions and numbers of tones. So the current disadvantage of using multitone stimuli is the lack of a common agreement regarding the number of tones, their distribution, and the initial phases. To avoid this problem the number and distribution of tones should be standardized.

There are many methods of forming the frequency distribution of multitone fundamentals. The major goal of some of the frequency distributions of primary tones (different from the evenly distributed tones on a logarithmic frequency scale) is to minimize the overlapping of primary tones and distortion components [7], [12].

As was mentioned, the separation becomes increasingly difficult with an increasing order of the distortion products due to the effect of overlapping. The separation can be
handled through the use of the polynomial Vandermonde matrix [28], [33], which is not a trivial procedure. The separation of low-order and high-order distortion components can be performed easily by, for example, a two-tone signal. This is important for the detection of loudspeaker defects (rub and buzz) separate from regular motor and suspension nonlinearities. The simplicity of the distorted two-tone signal allows one to "understand" the relationship between some of the loudspeaker nonlinear parameters (causes) and nonlinear distortion (symptoms).

3.6 Coherence and Incoherence Functions

The next method that deserves discussion is the measurement of the coherence function, which characterizes the degree of linear relationship between input and output as a function of frequency. By definition, the coherence function is expressed as the ratio of the square of the cross spectrum (between input and output) to the product of the autospectra of input and output [35],

\gamma^2(f_i) = \frac{|G_{xy}(f_i)|^2}{G_{xx}(f_i) \, G_{yy}(f_i)}    (17)

where G_xx(f_i) is the autospectrum of the input signal x(t) at the frequency f_i, G_yy(f_i) is the autospectrum of the output signal y(t), and G_xy(f_i) is the cross spectrum of the input signal x(t) and the output signal y(t). The functions G_xx(f_i), G_yy(f_i), and G_xy(f_i) are calculated as follows:

G_{xx}(f_i) = E[X(f_i) X^*(f_i)] = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} X_n(f_i) X_n^*(f_i)    (18)

G_{yy}(f_i) = E[Y(f_i) Y^*(f_i)] = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} Y_n(f_i) Y_n^*(f_i)    (19)

G_{xy}(f_i) = E[X(f_i) Y^*(f_i)] = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} X_n(f_i) Y_n^*(f_i)    (20)

where * denotes complex conjugation, E indicates averaging, and X(f) and Y(f) are the complex spectra of the input and output signals x(t) and y(t), respectively,

X(f) = F\{x(t)\} = \int_{-\infty}^{\infty} x(t) \, e^{-j 2\pi f t} \, dt    (21)

Y(f) = F\{y(t)\} = \int_{-\infty}^{\infty} y(t) \, e^{-j 2\pi f t} \, dt.    (22)

This method has found an application in the assessment of nonlinearity in hearing aids [3], [4]. The coherence function gives an integral lumped measure of the nonlinearity in a device under test, but it also takes into account noise if it is present in the device under test. In loudspeaker testing the presence of noise does not seem to be an impeding factor. The attractive feature of the coherence function is its simple graphic representation. Plotting the set of coherence functions corresponding to different input levels is another nice option.

It seems to be more convenient to present the coherence function in the following manner and call it the incoherence function,

I(f) = \sqrt{1 - \gamma^2(f)} \times 100\%.    (23)

Expressed in percent, the incoherence function I(f) is intuitively close to the concept of nonlinear distortion. Zero incoherence indicates the absence of nonlinear distortion and noise. There is a seeming similarity between the incoherence function and THD. However, there is a principal difference between these two characteristics. THD takes only harmonics into account, whereas the incoherence function is sensitive to the overall nonlinear contamination of the output signal and noise.

Fig. 21 shows the incoherence function of the 12-in (305-mm) woofer corresponding to different levels of the input signal. The initial level of the noise signal was 0.6 V rms. This level produced a voice-coil peak displacement of 2.5 mm. The same peak displacement corresponded to 10 V rms set for the measurement of THD SPL (see Fig. 8), and to 3.3 V rms for the multitone measurement (see Fig. 20). This difference in initial rms levels is attributed to the different crest factors of these signals. The incoherence function, THD, and MTND each show a different pattern of nonlinear distortion. Due to the different nature of these three methods, they produce different data, all related to the same particular nonlinearity. This example demonstrates the complexity of assessment of nonlinear effects and the nontrivial reactions of a nonlinear system to different testing signals. Fig. 22 shows the difference between the incoherence functions of two 8-in (203-mm) woofers corresponding to voice-coil displacement of 4 and
10 mm. An increase in the nonlinear distortion corresponding to the increasing input signal can be observed.

The incoherence function was calculated by means of the noise signal generated as a multitone signal with 4096 frequency components of equal amplitude and the random distribution of phases. The sampling frequency was 7680 Hz. The crest factor of this signal is 5.9. To adjust the properties of this noise signal for the numerical integration of the system of nonlinear differential equations governing the operation of a loudspeaker, an adaptive algorithm was used to provide the initial zero value of the testing signal. In the given examples the incoherence function resulted from 1000 averages, which would correspond to approximately 500 seconds of testing time. During this time the warming of the voice coil would change the behavior of the loudspeaker significantly if the driver were operated at high amplitudes. This is a drawback of this technique.

4 CONCLUSION

Due to the complex nature of loudspeaker nonlinearity and the intricacy of the human auditory system's reaction to musical signals contaminated with nonlinear distortion products, there are no undisputedly credible and commonly recognized thresholds of traditional nonlinear distortion measures related to the perceived sound quality. Since the dynamic reaction of a complex nonlinear system such as a loudspeaker cannot be extrapolated from its reaction to simple testing signals, such as a sweeping tone, the thresholds expressed in terms of the loudspeaker reaction to these signals (THD, harmonics, and two-tone intermodulation distortion) may not be valid.

The requirements for an optimal method of measuring nonlinear distortion in loudspeakers were formulated. The optimal method to measure the nonlinearity in loudspeakers must be informative, that is, it must obtain enough objective information about the nonlinearity of different orders. The plotted measurement results must have a clear interpretation and be readily comprehensible. The measurement data must be supported psychoacoustically, meaning that there should exist an unambiguous relationship between the results presented and the expected sound quality.

In nonlinear systems such as a loudspeaker, the intermodulation distortion outweighs the harmonic distortion if a musical signal is reproduced. Harmonics may not give a quantitative measure of the nonlinear distortion in a loudspeaker, especially in the context of nonlinear distortion audibility. Nevertheless, the harmonic distortion measurement provides valuable information, illustrating, for example, the dominance of the nonlinearity of certain orders.
A wide spectrum of harmonics and a strong level of high-
order harmonics may be indicative of a loudspeaker mal-
function such as a rubbing voice coil.
It has been demonstrated that high orders of static nonlinearity are characterized by a significant difference between the harmonic and intermodulation products: the intermodulation products outnumber the harmonics and outweigh them in power. It has
also been demonstrated that a high-order nonlinearity pro-
duces intermodulation and harmonic products of its “own”
order, and of lower orders as well. The latter might have
higher levels. Drawing the conclusion that a certain high-
order nonlinearity is not essential because it produces a
low level of its “own” harmonics may lead to wrong
results.
THD does not seem to be a good measure of psycho-
acoustically meaningful distortion in loudspeakers. Not
distinguishing different orders of harmonics, the THD fre-
quency response may lead to the wrong conclusions about
the performance of a loudspeaker. Similar levels of THD
may correspond to very different distributions of harmon-
ics of different orders. This difference, invisible to THD,
may correspond to a strong diversity in intermodulation
products and correspondingly significant differences in
sound quality. However, THD can be legitimately used in
testing where similar types of loudspeaker are compared
(for example, in production testing).
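A toy calculation (with made-up amplitudes, not measured data) shows how two very different harmonic distributions collapse to the same THD figure:

```python
import numpy as np

def thd(fundamental, harmonics):
    """THD: rms sum of the harmonic amplitudes relative to the
    fundamental amplitude, expressed in percent."""
    return np.sqrt(np.sum(np.square(harmonics))) / fundamental * 100.0

# loudspeaker A: all distortion in the comparatively benign 2nd harmonic
thd_a = thd(1.0, [0.01])
# loudspeaker B: the same total harmonic power spread over the 5th-8th
# harmonics, which are far more audible and imply a much richer set of
# intermodulation products -- yet the THD number is identical
thd_b = thd(1.0, [0.005, 0.005, 0.005, 0.005])
print(thd_a, thd_b)  # both are 1% THD
```

A single THD number discards exactly the order information that, as argued above, separates an innocuous motor nonlinearity from an audible one.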
Multitone testing possesses a number of advantages
compared to other methods. It is fast and gives a detailed
graphical representation of the distortion products. When a
large number of input tones are applied to a loudspeaker,
the spectrum of the output signal becomes very rich with
intermodulation products (harmonic products have only a minuscule share of these spectral components). A visual examination of such a spectrum, though, may be difficult.

Fig. 22. Sound pressure incoherence function. a—peak level of input signal corresponds to Xmax = 4 mm; b—peak level of input signal corresponds to Xmax = 10 mm. (a) Loudspeaker A (long coil, short gap). (b) Loudspeaker B (short coil, long gap).
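The incoherence measurement of Eqs. (17)–(23) reduces to block-averaged auto- and cross-spectra. A minimal simulation sketch (a memoryless cubic stands in for the loudspeaker model here, and the block and average counts are arbitrary, not the 1000-average setup above):

```python
import numpy as np

def incoherence(x, y, n_fft, n_avg):
    """I(f) = sqrt(1 - gamma^2(f)) * 100%  (Eq. (23)), with the auto- and
    cross-spectra of Eqs. (18)-(20) averaged over n_avg blocks."""
    Gxx = np.zeros(n_fft // 2 + 1)
    Gyy = np.zeros(n_fft // 2 + 1)
    Gxy = np.zeros(n_fft // 2 + 1, dtype=complex)
    for n in range(n_avg):
        X = np.fft.rfft(x[n * n_fft:(n + 1) * n_fft])
        Y = np.fft.rfft(y[n * n_fft:(n + 1) * n_fft])
        Gxx += (X * np.conj(X)).real      # Eq. (18)
        Gyy += (Y * np.conj(Y)).real      # Eq. (19)
        Gxy += X * np.conj(Y)             # Eq. (20)
    gamma2 = np.abs(Gxy) ** 2 / (Gxx * Gyy)   # Eq. (17)
    return np.sqrt(np.clip(1.0 - gamma2, 0.0, 1.0)) * 100.0

rng = np.random.default_rng(0)
x = rng.standard_normal(256 * 1024)       # noiselike stimulus
y_lin = 2.0 * x                           # distortion-free linear system
y_nl = x + 0.2 * x ** 3                   # weak cubic nonlinearity
I_lin = incoherence(x, y_lin, 1024, 256)
I_nl = incoherence(x, y_nl, 1024, 256)
print(I_lin.mean(), I_nl.mean())
```

For the noiseless linear system γ²(f) = 1 and the incoherence stays near zero; the cubic term pulls γ²(f) below unity across the band, and the nonzero I(f) lumps all of that contamination together, order-blind, exactly as described above.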
To circumvent this problem, the spectral components of different orders can be plotted separately. The separation of products of different orders needs postprocessing. Another option to simplify the visual interpretation of measurement results is to plot the average level of distortion confined within adjacent tones, or to plot the level of the distortion products averaged in a sweeping frequency window. The comparatively high crest factor of multitone signals may give a "pessimistic" evaluation of the distortion level, registering the low-probability high-level peaks that may not be psychoacoustically relevant when a testing signal is replaced by a musical one. More research is required to put a reliable bridge between a loudspeaker's response to multitone stimuli and the sound quality of the loudspeaker. The results of recently published psychoacoustical research of a correlation between the responses to multitone stimuli and the audibility of distortion [26] imply that such a goal might possibly be reached.

The incoherence functions of two 8-in (203-mm) woofers were modeled at two different levels of the input noise signal. In addition, the incoherence function of a 12-in (305-mm) woofer was modeled for different levels of input signal. The incoherence function detected the difference in performance of the two motors, showing an increase in the overall nonlinear distortion for a loudspeaker having stronger voice-coil inductance modulation and stronger dependence of the Bl product on the voice-coil displacement.

There is a significant difference between THD, incoherence function, and reaction to multitone stimuli. All three methods provide an "integral" assessment of nonlinear distortion. However, the information conveyed by these methods is principally different. THD characterizes only harmonic distortion, omitting the intermodulation products, which significantly outweigh harmonics in a distorted musical signal. The incoherence function expressed in percent may be interpreted as a measure of the "lack of similarity" between the reference and the output signal. Contrary to THD, the incoherence function takes into account all nonlinear transformations of the signal as well as the influence of noise. However, this function does not distinguish the products of different orders, giving a "lumped" integral measure. The multitone stimulus provides information about harmonic and intermodulation products of various orders, but does it in a more diversified manner, making it possible to distinguish and analyze individual nonlinear products of different orders. The MTND response simplifies the interpretation of the nonlinear reaction to multitone stimuli by merging the numerous individual distortion spectral components into a single frequency response of distortion.

Measurement of the frequency-domain Volterra kernels is also discussed. Plotting these three-dimensional graphs of distortion of the second and third order is only feasible if a loudspeaker is characterized by a small level of distortion (weak nonlinearity). This method quickly loses its accuracy if the level of distortion is high. High-order Volterra kernels do not have a readily comprehensible graphical representation.

Of all the methods surveyed and simulated, multitone testing seems to be the most feasible in the context of distortion audibility for the assessment of loudspeaker large-signal performance and nonlinearity measurements. Nevertheless, harmonic and the traditional two-tone intermodulation distortion should not be withdrawn from the list of standard characteristics. THD is a lower resolution measure of nonlinearity, but can still be used for the comparison of loudspeakers of the same type. Multitone testing is good for both intermodulation distortion measurements and the maximum SPL check. For the latter the multitone burst should be used. In addition, multitone testing is good for loudspeaker quality control testing.

Setting any boundaries relating objective information and nonlinear distortion audibility requires extensive computer simulation and involved psychoacoustical tests. Without such information about the relationship between objective and subjective parameters, the measurement data will only be able to tell us that one loudspeaker has more or less nonlinear distortion. The question of how critical this difference is from the standpoint of distortion audibility will remain unanswered.

5 REFERENCES

[1] E. Czerwinski, A. Voishvillo, S. Alexandrov, and A. Terekhov, "Multitone Testing of Sound System Components—Some Results and Conclusions, Part 1: History and Theory," J. Audio Eng. Soc., vol. 49, pp. 1011–1048 (2001 Nov.); "Multitone Testing of Sound System Components—Some Results and Conclusions, Part 2: Modeling and Application," ibid., pp. 1181–1192 (2001 Dec.).
[2] N. Wiener, Nonlinear Problems in Random Theory (Technology Press, M.I.T., and Wiley, New York, 1958).
[3] O. Dyrlund, "Characterization of Nonlinear Distortion in Hearing Aids Using Coherence Function," Scand. Audiol., vol. 18, pp. 143–148 (1989).
[4] J. Kates, "On Using Coherence to Measure Distortion in Hearing Aids," J. Acoust. Soc. Am., vol. 91, pt. 1, pp. 2236–2244 (1992 Apr.).
[5] Y. Cho, S. Kim, E. Hixson, and E. Powers, "A Digital Technique to Estimate Second-Order Distortion Using Higher Order Coherence Spectra," IEEE Trans. Signal Process., vol. 40, pp. 1029–1040 (1992 May).
[6] U. Totzek and D. Preis, "How to Measure and Interpret Coherence Loss in Magnetic Recording," J. Audio Eng. Soc., vol. 35, pp. 869–887 (1987 Nov.).
[7] D. Jensen and G. Sokolich, "Spectral Contamination Measurements," presented at the 85th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 36, p. 1034 (1988 Dec.), preprint 2725.
[8] R. C. Cabot, "Fast Response and Distortion Testing," presented at the 90th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 39, p. 385 (1991 May), preprint 3045.
[9] R. Metzler, "Test and Calibration Application of Multitone Signals," in Proc. AES 11th Int. Conf. (1992 May), pp. 29–36.
[10] J. Vanderkooy and S. G. Norcross, "Multitone Testing of Audio Systems," presented at the 101st Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 44, p. 1174 (1996 Dec.), preprint 4378.
[11] P. Schweizer, "Feasibility of Audio Performance Using Multitones," in Proc. AES UK Conf. on the Measure of Audio (1997 Apr.), pp. 34–40.
[12] J. M. Risch, "A New Class of In-Band Multitone Test Signals," presented at the 105th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 46, p. 1037 (1998 Nov.), preprint 4803.
[13] V. Volterra, Theory of Functionals and of Integral and Integrodifferential Equations (Dover, New York, 1959).
[14] M. Schetzen, Volterra and Wiener Theories of Nonlinear Systems (Krieger Publ., Malabar, FL, 1989).
[15] A. M. Kaizer, "Modeling of the Nonlinear Response of an Electrodynamic Loudspeaker by a Volterra Series Expansion," J. Audio Eng. Soc., vol. 35, pp. 421–433 (1987 June).
[16] M. J. Reed and M. O. Hawksford, "Practical Modeling of Nonlinear Audio Systems Using the Volterra Series," presented at the 100th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 44, pp. 649–650 (1996 July/Aug.), preprint 4264.
[17] M. J. Reed and M. O. J. Hawksford, "Comparison of Audio System Nonlinear Performance in Volterra Space," presented at the 103rd Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 45, p. 1026 (1997 Nov.), preprint 4606.
[18] G. Cibelli, E. Ugolotti, and A. Bellini, "Dynamic Measurements of Low-Frequency Loudspeakers Modeled by Volterra Series," presented at the 106th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 47, p. 534 (1999 June), preprint 4968.
[19] J. Hilliard, "Distortion Tests by the Intermodulation Method," Proc. IRE, vol. 29, pp. 614–620 (1941 Dec.).
[20] H. Scott, "The Measurement of Audio Distortion," Communications, pp. 25–32, 52–56 (1946 Apr.).
[21] AES2-1984 (r. 2003), "AES Recommended Practice—Specification of Loudspeaker Components Used in Professional Audio and Sound Reinforcement," Audio Engineering Society, New York (2003).
[22] IEC 60268-5, "Sound System Equipment—Part 5: Loudspeakers," International Electrotechnical Commission, Geneva, Switzerland (2000).
[23] D. Keele, "Method to Measure Intermodulation Distortion in Loudspeakers," proposals for the working group SC-04-03-C, Audio Engineering Society, New York (2000).
[24] D. B. Keele, "Development of Test Signals for the EIA-426-B Loudspeaker Power-Rating Compact Disk," presented at the 111th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 49, p. 1224 (2001 Dec.), convention paper 5451.
[25] E. Zwicker, H. Fastl, and H. Frater, Psychoacoustics: Facts and Models, Springer Ser. in Information Sciences, 2nd updated ed. (Springer, New York, 1999).
[26] C. T. Tan, B. C. J. Moore, and N. Zacharov, "The Effect of Nonlinear Distortion on the Perceived Quality of Music and Speech Signals," J. Audio Eng. Soc., vol. 51, pp. 1012–1031 (2003 Nov.).
[27] M. Reed and M. Hawksford, "Identification of Discrete Volterra Series Using Maximum Length Sequences," IEEE Proc. Circuits Dev. Sys., vol. 143, pp. 241–248 (1996 Oct.).
[28] S. Boyd, Y. Tang, and L. Chua, "Measuring Volterra Kernels," IEEE Trans. Circuits Sys., vol. CAS-30, pp. 571–577 (1983 Aug.).
[29] R. Nowak and B. Van Veen, "Random and Pseudorandom Inputs for Volterra Filter Identification," IEEE Trans. Signal Process., vol. 42, pp. 2124–2135 (1994 Aug.).
[30] H. K. Jang and K. J. Kim, "Identification of Loudspeaker Nonlinearities Using the NARMAX Modeling Technique," J. Audio Eng. Soc., vol. 42, pp. 50–59 (1994 Jan./Feb.).
[31] D. Shorter, "The Influence of High Order Products on Nonlinear Distortion," Electron. Eng., vol. 22, pp. 152–153 (1950).
[32] E. Geddes, Audio Transducers (Geddlee, 2002), pp. 236–241.
[33] G. Golub and C. Van Loan, Matrix Computations (Johns Hopkins University Press, Baltimore, MD, 1996).
[34] A. Voishvillo, "Nonlinear Distortion in Professional Sound Systems—From Voice Coil to the Listener," presented at the Acoustical Conference of the Institute of Acoustics "Reproduced Sound 17" (Stratford-upon-Avon, UK, 2001 Nov. 16–18).
[35] J. Bendat and A. Piersol, Random Data: Analysis and Measurement Procedures (Wiley, New York, 1986).

APPENDIX 1
REFERENCE LOUDSPEAKERS

A1.1 12-in (305-mm) Woofer
The parameters of an experimental 12-in woofer were obtained from measurements by the Klippel analyzer. In addition, force factor modulation by the voice-coil current was modeled using FEM (Fig. 23).

Fig. 23. Bl product affected by flux modulation. Experimental 12-in (305-mm) woofer with overhung voice coil. Coil diameter 75 mm; coil height 38 mm; top plate thickness 15 mm.

The small-signal (rest-
position) parameters of the 12-in woofer used in this work are given in Table 4. The length of the voice coil is 38 mm, the diameter is 75 mm, and the thickness of the top plate is 15 mm. The excursion-dependent parameters Cms, Kms, Bl, and L1 are shown in Fig. 24. Distortions were simulated for the woofer placed in a sealed 40-liter box.

Table 4

Fig. 24. Excursion-dependent parameters of 12-in (305-mm) woofer. (a) Suspension compliance. (b) Suspension stiffness. (c) Bl product. (d) Voice-coil inductance.

A1.2 Two 8-in (203-mm) Woofers
The two 8-in woofers used in the experiments and modeling have similar suspensions and different motors (Fig. 25). One loudspeaker has a long coil (12 mm) and a short gap (6 mm); the other has a short coil (6 mm) and a long gap (12 mm) (Fig. 26). The diameter of both coils is 1.5 in (38 mm). Neither loudspeaker had a dust cap, to prevent any possible artifacts caused by the compression of the air underneath a dust cap or distortion due to the turbulent airflow in a pole piece vent. The small-signal (rest-position) parameters of the loudspeakers are listed in Table 5.

The nonlinear displacement-dependent parameters for loudspeaker A are given in Fig. 27, those for loudspeaker B in Fig. 28. Loudspeaker A (long coil, short gap) has stronger overall variations of the Bl product and voice-coil inductance. Using the criterion of the maximum displacement Xmax corresponding to a decrease in the suspension compliance Cms(x) to 0.12 mm/N, which is 30% of its initial value of 0.41 mm/N, the Xmax values of both loudspeakers were set to 10 mm. At this displacement the Bl product in loudspeaker A is 2.0 T·m, which is 22% of its initial value of 9.0 T·m. The Bl product of the second driver drops to 3.5 T·m, which is 47% of its resting position value of 7.5 T·m. Such a comparatively moderate decrease in the Bl product for loudspeaker B is explained by the use of an underhung voice coil.

Using similar suspensions in both loudspeakers and setting identical values of Xmax = 10 mm helped to compare the difference in nonlinear distortion in these loudspeakers caused by the difference in motor parameters.

Table 5

Fig. 27. Parameters of loudspeaker A (long coil, short gap) as a function of voice-coil displacement. (a) Suspension compliance. (b)
Suspension stiffness. (c) Bl product; current increments 2 A. (d) Voice-coil inductance.
Fig. 28. Parameters of loudspeaker B (short coil, long gap) as a function of voice-coil displacement. (a) Suspension compliance. (b)
Suspension stiffness. (c) Bl product; current increments 2 A. (d) Voice-coil inductance.
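The percentage figures quoted above can be checked with one line of arithmetic each (an illustrative check, not from the paper):

```python
def percent(value, initial):
    """Ratio of a displaced parameter value to its rest value, in percent."""
    return 100.0 * value / initial

print(round(percent(2.0, 9.0)))    # Bl of loudspeaker A at Xmax: 22 (%)
print(round(percent(3.5, 7.5)))    # Bl of loudspeaker B at Xmax: 47 (%)
print(round(percent(0.12, 0.41)))  # Cms criterion: 29 (%), i.e. roughly 30%
```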
The system of Eqs. (24), (25) was transformed into the canonical Cauchy form and solved numerically by the classical Runge–Kutta method of the fourth order. The vector form of the state variables is

Φ(t, Z) = {φ1(t, Z), φ2(t, Z), φ3(t, Z), φ4(t, Z)}    (30)

where z1(t) denotes the current i2(t) in the parainductance L2(x), z2(t) stands for the voice-coil current i, z3(t) accounts for the voice-coil displacement x(t), z4(t) is the voice-coil velocity dx(t)/dt, and Φ(t, Z) is the vector formulation of the derivative of the current di2(t)/dt, the derivative of the voice-coil current di(t)/dt, the voice-coil velocity y(t), and the voice-coil acceleration dy(t)/dt.

The Cauchy vector form of the system of Eqs. (24), (25) and the initial conditions are

dZ(t)/dt = Φ[t, Z(t)],    Z(t0) = Z0.    (31)

The integration steps are carried out according to the following algorithm:

Zn+1 = Zn + hKn    (32)

Kn = (1/6)[Kn^(1) + 2Kn^(2) + 2Kn^(3) + Kn^(4)]    (33)

and

Kn^(1) = Φ(tn, Zn)
Kn^(2) = Φ(tn + h/2, Zn + (h/2)Kn^(1))
Kn^(3) = Φ(tn + h/2, Zn + (h/2)Kn^(2))    (34)
Kn^(4) = Φ(tn + h, Zn + hKn^(3))

where h is the time increment. The vector

dZ(t)/dt = [dZ1(t)/dt, dZ2(t)/dt, dZ3(t)/dt, dZ4(t)/dt]^T

is expressed through the loudspeaker parameters as

dZ1(t)/dt = di2/dt = (1/L2(x)){R2(x)i − i2[R2(x) + (dL2(x)/dx)(dx/dt)]}

dZ2(t)/dt = di/dt = (1/Le(x)){u − Rei − R2(x)(i − i2) − i(dLe(x)/dx)(dx/dt) − [Bl(x) + ΔBl(x, i)](dx/dt)}

dZ3(t)/dt = dx/dt = y    (35)

dZ4(t)/dt = dy/dt = (1/mms){[Bl(x) + ΔBl(x, i)]i − (dLe(x)/dx)(i²/2) − (dL2(x)/dx)(i2²/2) − Rmsy − Kms(x)x}.

The function ΔBl(x, i) was calculated by the finite-element method (FEM). The FEM static model of the magnet assembly was built, and the model of the voice coil was incorporated. The voice-coil model was ascribed the geometrical dimensions, number of turns, and a constant current. Using the quasi-dynamic approach, that is, assigning different values of voice-coil current (of positive and negative polarity), the distribution of the gap induction was calculated. This procedure was repeated a number of times for different positions of the voice coil. Afterward the Bl product was calculated for the corresponding discrete values of the voice-coil current ±Ia, ±Ib, …, ±Im and positions of the voice coil ±X1, ±X2, …, ±Xn. The function ΔBl(x, i) approximated the variation of the Bl product caused by the voice-coil current.

The loudspeaker parameters were measured by the Klippel analyzer and incorporated into the model. Integration of the system of Eqs. (24), (25) was performed using different input signals to model different measurement conditions. The signal duration and sampling frequency were optimized for each particular signal. The sampling was linked to the time interval h used in the Runge–Kutta solution of the system (24), (25). The details of the solution are not discussed here because they do not have a direct relation to the subject of this paper.
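The update of Eqs. (32)–(34) is the textbook fourth-order Runge–Kutta step and is easy to sketch in code. The sketch below is illustrative and is not the authors' implementation: the loudspeaker right-hand side Φ(t, Z) of Eq. (35) is replaced by a simple linear-oscillator derivative so the integrator can be checked against an exact solution.

```python
import math

def rk4_step(f, t, z, h):
    """One classical fourth-order Runge-Kutta step, Eqs. (32)-(34)."""
    def axpy(a, x, y):                       # elementwise y + a*x
        return [yi + a * xi for xi, yi in zip(x, y)]
    k1 = f(t, z)
    k2 = f(t + h / 2, axpy(h / 2, k1, z))
    k3 = f(t + h / 2, axpy(h / 2, k2, z))
    k4 = f(t + h, axpy(h, k3, z))
    return [zi + (h / 6) * (a + 2 * b + 2 * c + d)
            for zi, a, b, c, d in zip(z, k1, k2, k3, k4)]

def integrate(f, t0, z0, h, n_steps):
    """Integrate dZ/dt = f(t, Z) from the initial condition Z(t0) = Z0."""
    t, z = t0, list(z0)
    for _ in range(n_steps):
        z = rk4_step(f, t, z, h)
        t += h
    return z

# Stand-in for the state derivative Phi(t, Z): an undamped oscillator
# x'' = -w0^2 x, whose exact solution cos(w0*t) checks the integrator.
w0 = 2 * math.pi * 50.0
f = lambda t, z: [z[1], -w0 ** 2 * z[0]]
z_end = integrate(f, 0.0, [1.0, 0.0], h=1e-5, n_steps=2000)  # t = 0.02 s
print(z_end[0])   # one full 50-Hz period later, x returns to ~1.0
```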
THE AUTHORS
Alexander Voishvillo was born and raised in Leningrad (now Saint Petersburg), Russia. He received a Ph.D. degree in 1987 for work centered on computer optimization of loudspeaker crossover systems.

From 1977 to 1995 he worked at the Laboratory of Prospective Research and Development, Popov Institute for Radio and Acoustics, Saint Petersburg. While at the Popov Institute he designed loudspeaker systems for manufacturers and did research work on loudspeakers. He was responsible for the development of specialized studio monitors for the Russian Broadcasting Corporation. In 1995 he moved to California, accepting an invitation of Gene Czerwinski of Cerwin-Vega Inc. to head a new research and development group. His responsibilities included the development of new transducers, as well as research work on nonlinearity in sound systems and advanced methods of measurement of nonlinear distortion in audio equipment. He continued his collaboration with Gene Czerwinski at Cerwinski Labs, a new R&D company established in 2002, where he has been working on the development of original professional high-frequency transducers and on alternative methods of assessing nonlinearity in audio equipment.

Dr. Voishvillo holds several U.S. patents on new types of transducers. He is the author of more than 30 publications on loudspeakers, including the engineering book on loudspeaker theory and design, High Quality Loudspeaker Systems and Transducers, published in Russia in 1985, as well as several publications in the Journal. He is a member of the Audio Engineering Society and participates in a working group on loudspeaker measurements and modeling at the AES Standards Committee. He is also a member of the JAES Review Board.

●

Alexander Terekhov was born in Leningrad (now Saint Petersburg), Russia, in 1952. He received an M.Sc. degree in broadcasting and radio communication from the State University of Telecommunications in 1974.

From 1981 to 1991 he worked as a research associate in the Laboratory of Prospective Acoustic Research and Development at the Popov R&D Institute for Radio and Acoustics, Saint Petersburg. His activities included research on loudspeaker testing and measurements and binaural stereophony. In 1991 he joined Audion Ltd., Saint Petersburg, as a senior research associate and chief engineer. He came to the United States in 1996 and since 1997 has been employed as an acoustic research engineer at Cerwin-Vega, Inc., and subsequently at Czerwinski Laboratories. His professional activity is focused on acoustic research, the development of new software for various R&D needs, including loudspeaker measurement systems, and research on air propagation distortion.

Mr. Terekhov has presented technical papers at Russian audio conventions and in recent years has coauthored several papers on air propagation distortion published in the Journal.

●

Eugene Czerwinski studied electrical engineering at the University of Toledo, where he received a B.S.E.E. degree. He worked as an associate professor of electronics at the University of Michigan Engineering Research Institute. Later he was a development engineer at Willys Motors Electronic Division, where he designed UHF TV transmitting equipment, studio cameras, and audio systems. In 1954 he joined the test division of Douglas Aircraft, where he designed measurement equipment, including a multichannel sound spectrum analyzer for jet engines. At his next job, with Bendex Pacific, he designed the first-of-its-kind 10 000-watt germanium solid-state amplifier for sonar transducers.

His passion for music determined his future life and career. Coming from a musically deprived childhood, he was exposed to live orchestral sound as a young man, which affected him profoundly. He wanted to be able to replicate this experience at a time of his choosing. While at Bendex he established Vega Associates, which designed and manufactured custom high-fidelity systems. In 1973 it became Cerwin-Vega, Inc. He headed the company for the next three decades and developed the first quarter-kilowatt transistor audio amplifier and four-way 18-in loudspeaker system. In 1964 he started experimenting with live sound reinforcement for large-venue rock concerts. He worked with Universal Studios to develop the Sensurround system, which received an Academy Award in 1974 for special technical achievement in sound. Sensurround was used nationwide in cinemas to realistically simulate vibrations for the "Earthquake" movie, and was a precursor to the high-impact theater sound systems later developed by Lucas Sound and Dolby, Inc. His lifelong interest in music motivated Mr. Czerwinski to establish in 1989 a nonprofit recording company, the MAMA (Musical Archives, Musical Archives) Foundation. The Foundation's goal is to preserve the music of culturally significant artists whose music does not have broad commercial appeal. MAMA has released over 30 state-of-the-art digital recordings of jazz and big-band music, and garnered critical acclaim, including six Grammy nominations and two Grammy Awards. In 2003 he sold Cerwin-Vega and founded Cerwinski Laboratories, Inc. At Cerwinski Labs he continues important research into air-propagation distortion and multichannel sound reinforcement and reproduction.

Mr. Czerwinski has authored and coauthored numerous patents on loudspeakers (six patents are currently pending).

●

Sergei Alexandrov was born in Leningrad (now Saint Petersburg), Russia. He received an M.Sc. degree in electrical engineering from Leningrad University in 1979.

From 1969 to 1978 he worked as a development engineer at the Marine Equipment R&D Institute. From 1978 to 1991 he held a principal position in the R&D group at the Popov R&D Institute for Radio and Acoustics, Saint Petersburg, where he developed power amplifiers and audio signal processors for high-fidelity and studio applications. In 1991 he cofounded and became president and CEO of Audion Ltd., Saint Petersburg, an audio electronics and loudspeaker test equipment manufacturing company. In 1996 he came to the United States and held a staff position as an acoustic R&D engineer at Cerwin-Vega, Inc., until 2002. There he developed original computer-controlled loudspeaker measurement systems, loudspeaker power testing devices, and a computer-based system for nonlinear distortion audibility research. He continued his collaboration with Gene Czerwinski (the founder of Cerwin-Vega) at Cerwinski Labs, where he has been working on loudspeaker magnet system field optimization and nonstandard test equipment design.

Mr. Alexandrov has presented a number of technical papers at Russian audio and electronics conventions. He has publications on homomorphic signal analysis, binaural stereophony, audio signal processing, and loudspeaker measurements to his credit and has coauthored several publications in the Journal.
Georgia Institute of Technology, School of Electrical and Computer Engineering, Atlanta, GA 30332-0250, USA
Two simple Zobel impedance compensation networks for the lossy voice-coil inductance
of a loudspeaker driver are described. Design equations for the element values are given, and
a numerical example is presented. The synthesis procedure can be extended to realize general
RC networks which exhibit an impedance that decreases with frequency at a rate of −n
dec/dec, where 0 < n < 1.
R0 + Z′0 be connected in parallel with Z1. The condition that the parallel connection have a resistive impedance equal to R0 is that Z′1 = R0²/Z1. This is a general result that is not specific to loudspeakers. For completeness, its derivation is given in the following, where the notation used is that for the voice-coil impedance.

The voice coil of a loudspeaker driver exhibits both a series resistance and an inductance. In the following it is assumed that the resistance is separated and treated as a separate element, that is, not a part of the voice-coil inductance. Fig. 1 shows the voice-coil equivalent circuit of a driver in an infinite baffle [8]. The resistor RE and the inductor LE represent the voice-coil resistance and inductance. The elements RES, LCES, and CMES model the motional impedance generated when the voice coil moves. These elements are related to the small-signal parameters of the driver by the equations [9]

RES = (QMS/QES)RE    (1)

LCES = RE/(2πfSQES)    (2)

CMES = QES/(2πfSRE)    (3)

where QMS is the mechanical quality factor, QES is the electrical quality factor, and fS is the fundamental resonance frequency.

Fig. 1. Equivalent circuit of voice-coil impedance.

Above the fundamental resonance frequency the capacitor CMES becomes a short circuit and the voice-coil impedance can be approximated by RE in series with LE. The equivalent high-frequency circuit is shown in Fig. 2(a). A resistor R1 in series with a capacitor C1 is shown in parallel with the voice-coil impedance. At low frequencies the impedance of the circuit is RE. If the inductor is lossless, the high-frequency impedance is R1. If R1 = RE and R1C1 = LE/RE, it is straightforward to show that the circuit has an impedance equal to RE at all frequencies [2]. In this case R1 and C1 form a simple Zobel network, which cancels the lossless LE from the input impedance of the driver.

In [10] it is shown that a lossy voice-coil inductance has an impedance that can often be approximated by

ZL(jω) = Le(jω)^n    (4)

where Le and n are constants. Fig. 2(b) shows the circuit of Fig. 2(a) with LE replaced with ZL(jω) and C1 replaced with an impedance Z1(jω). Let Zin be the input impedance to the circuit. The source current Is can be written

Is = Vs/Zin = (Vs − V1)/R1 + (Vs − VL)/RE.    (5)

If Zin = RE and R1 = RE, this equation can be solved for Vs to obtain

Vs = V1 + VL = [Z1/(RE + Z1)]Vs + [ZL/(RE + ZL)]Vs    (6)

where voltage division has been used to express V1 and VL as functions of Vs. This equation can be solved for Z1 to obtain

Z1(jω) = RE²/ZL(jω) = [RE²/(Leω^n)][cos(nπ/2) − j sin(nπ/2)].    (7)

It follows that Zin = RE if R1 = RE and Z1(jω) is given by Eq. (7). In this case the high-frequency voice-coil impedance is resistive at all frequencies. Note that |Z1(jω)| ∝ ω^−n, so that a plot of |Z1(jω)| versus ω on log–log scales is a straight line with a slope of −n dec/dec. It should also be noted that Z1(jω) is the dual of ZL(jω) scaled by the factor RE², which follows from the fundamental principle derived by Zobel.

2 APPROXIMATING IMPEDANCE

Fig. 3 shows the Bode magnitude plot of an impedance which exhibits a slope of −n dec/dec between the frequencies f1 and f6. Also shown are the asymptotes of an approximating impedance which exhibit alternating slopes of −1 and 0. Four frequencies are labeled between f1 and f6 at which the slopes of the asymptotes change. In the general case, let there be N frequencies, where N is even and N ≥ 4. In this case the number of asymptotes having a slope of 0 is (N − 2)/2. Let k be the ratio of the asymptotic approximating impedance to the desired impedance at f = f1.
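The cancellation claimed after Eq. (7) is easy to verify numerically. The sketch below is illustrative and not from the report; the values RE = 5.1 Ω, Le = 0.0150, and n = 0.764 are borrowed from the numerical example later in the report, and the parallel combination of the R1–Z1 branch with RE + ZL is checked against RE.

```python
import math

RE = 5.1      # voice-coil resistance, ohms (from the numerical example)
Le = 0.0150   # lossy-inductance coefficient
n = 0.764     # lossy-inductance exponent

def ZL(w):
    """Lossy voice-coil inductance, Eq. (4): ZL(jw) = Le*(jw)^n."""
    return Le * (1j * w) ** n

def Z1(w):
    """Compensating impedance of Eq. (7): the dual of ZL scaled by RE^2."""
    return RE ** 2 / ZL(w)

def Zin(w):
    """Input impedance of Fig. 2(b): (R1 + Z1) parallel (RE + ZL), R1 = RE."""
    za, zb = RE + Z1(w), RE + ZL(w)
    return za * zb / (za + zb)

for f in (20.0, 1e3, 20e3):
    w = 2 * math.pi * f
    print(abs(Zin(w) - RE))   # ~0 at every frequency: the result is resistive
```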
The desired impedance at f1 is labeled |Z1| in Fig. 3 and is given by

|Z1| = Le(2πf1)^n.    (8)

The approximating impedance at f1 is labeled k|Z1|. With n, f1, and fN specified, the object is to specify k and f2 through fN−1 such that the ratios of each even-subscripted frequency to the odd-subscripted frequency to its left are equal and the intersection points (indicated by dots on the plot) occur at the geometric mean of the adjacent frequencies. In this case the lengths of the six dashed vertical lines in Fig. 3 are equal and the asymptotes of the approximating impedance approximate the desired impedance in an equal-ripple sense between f1 and fN.

It is straightforward to show that the following conditions must hold:

k = (f2/f1)^((1−n)/2) = (f4/f3)^((1−n)/2) = ⋯ = (fN/fN−1)^((1−n)/2)    (9)

f2 = f1^(1−n) f3^n    (10)

f3 = f2^n f4^(1−n)
f5 = f4^n f6^(1−n)    (11)
  ⋮
fN−1 = fN−2^n fN^(1−n).

Solutions to these equations are given next for the cases N = 4 and N = 6.

2.1 Case A: N = 4

Let f1 and f4 be specified. For N = 4, Eqs. (9)–(11) can be solved to obtain

k = (f4/f1)^(n(1−n)/[2(1+n)])    (12)

f2 = f1^(1/(1+n)) f4^(n/(1+n))    (13)

f3 = f1^(n/(1+n)) f4^(1/(1+n)).    (14)

Let Z1( f ) be the approximating impedance function. It is given by

Z1( f ) = [k|Z1|/(jf/f1)] × [(1 + jf/f2)/(1 + jf/f3)].    (15)

2.2 Case B: N = 6

Let f1 and f6 be specified. For N = 6, Eqs. (9)–(11) can be solved to obtain

k = (f6/f1)^(n(1−n)/[2(2+n)])    (16)

f2 = f1^(2/(2+n)) f6^(n/(2+n))    (17)

f3 = f1^((1+n)/(2+n)) f6^(1/(2+n))    (18)

f4 = f1^(1/(2+n)) f6^((1+n)/(2+n))    (19)

f5 = f1^(n/(2+n)) f6^(2/(2+n)).    (20)

2.3 Example Plots

To illustrate the accuracy of the approximating functions, let the impedance given by Eq. (7) be approximated over a three-decade band for the case n = 0.5. The smaller the value of n, the poorer the approximation. In the author's experience, the value of n for most loudspeaker drivers is in the range from 0.6 to 0.7. Thus the value n = 0.5 results in an approximation that is worse than what can be expected with the typical driver.

Fig. 4 shows the calculated Bode magnitude plots. Curve a is the desired impedance. Curve b is the approximating impedance for N = 4. Curve c is the approximating impedance for N = 6. It can be seen that the approximating impedance functions ripple about the desired function over the band of interest, with a maximum deviation occurring at the two frequency extremes. Between the two extremes the maximum deviation is less than it is at the extremes because the design equations are derived from the asymptotes of the approximating function.
Fig. 4. Example plots of desired impedance (curve a) and approximating impedances (curves b, c) versus frequency for n = 0.5.
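Eqs. (12)–(14) and (16)–(20) are simple to evaluate directly. As an illustrative sketch (not code from the report), the functions below reproduce the intermediate values later tabulated in Table 1 of the numerical example (n = 0.764, f1 = 300 Hz, fN = 20 kHz):

```python
def case_n4(f1, f4, n):
    """Ripple factor k and break frequencies for N = 4, Eqs. (12)-(14)."""
    k = (f4 / f1) ** (n * (1 - n) / (2 * (1 + n)))
    f2 = f1 ** (1 / (1 + n)) * f4 ** (n / (1 + n))
    f3 = f1 ** (n / (1 + n)) * f4 ** (1 / (1 + n))
    return k, f2, f3

def case_n6(f1, f6, n):
    """Ripple factor k and break frequencies for N = 6, Eqs. (16)-(20)."""
    k = (f6 / f1) ** (n * (1 - n) / (2 * (2 + n)))
    f2 = f1 ** (2 / (2 + n)) * f6 ** (n / (2 + n))
    f3 = f1 ** ((1 + n) / (2 + n)) * f6 ** (1 / (2 + n))
    f4 = f1 ** (1 / (2 + n)) * f6 ** ((1 + n) / (2 + n))
    f5 = f1 ** (n / (2 + n)) * f6 ** (2 / (2 + n))
    return k, f2, f3, f4, f5

print(case_n4(300.0, 20e3, 0.764))  # k ~ 1.24; f2 ~ 1.85 kHz, f3 ~ 3.24 kHz
print(case_n6(300.0, 20e3, 0.764))  # k ~ 1.15; f2 ~ 958 Hz ... f5 ~ 6.26 kHz
```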
3 THE COMPENSATING CIRCUITS

3.1 Network A

Fig. 5(a) shows a circuit consisting of two capacitors and one resistor, which can be used to realize the impedance of Eq. (15). The impedance is given by

Z1(s) = [1/(s(C1 + C2))] × [(1 + s/ω2)/(1 + s/ω3)]    (22)

where s = jω = j2πf and

ω2 = 2πf2 = 1/(R2C2)    (23)

ω3 = 2πf3 = (C1 + C2)/(R2C1C2).    (24)

3.2 Network B

The impedance of the circuit is equal to that of Eq. (21) if

C1 = f2f4/(2πf1f3f5k|Z1|)    (33)

C2 = C1(f2 − f3 − f5 + f3f5/f2)/(f4 − f2)    (34)

R2 = 1/(2πf2C2)    (35)

C3 = C1(f3 − f4 + f5 − f3f5/f4)/(f4 − f2)    (36)

R3 = 1/(2πf4C3).    (37)
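Eqs. (33)–(37), together with Eqs. (8) and (16)–(20), reproduce the Network B column of Tables 1 and 2 when evaluated with the parameter values of the numerical example later in the report. An illustrative sketch (not the report's code):

```python
import math

def network_b_elements(freqs, k, z1_mag):
    """Element values from Eqs. (33)-(37); freqs = (f1, ..., f6) for N = 6."""
    f1, f2, f3, f4, f5, f6 = freqs
    C1 = f2 * f4 / (2 * math.pi * f1 * f3 * f5 * k * z1_mag)
    C2 = C1 * (f2 - f3 - f5 + f3 * f5 / f2) / (f4 - f2)
    R2 = 1.0 / (2 * math.pi * f2 * C2)
    C3 = C1 * (f3 - f4 + f5 - f3 * f5 / f4) / (f4 - f2)
    R3 = 1.0 / (2 * math.pi * f4 * C3)
    return C1, C2, R2, C3, R3

# Intermediate values per Eqs. (8) and (16)-(20) for the numerical example:
n, Le, f1, f6 = 0.764, 0.0150, 300.0, 20e3
z1_mag = Le * (2 * math.pi * f1) ** n                    # Eq. (8)
k = (f6 / f1) ** (n * (1 - n) / (2 * (2 + n)))           # Eq. (16)
f2 = f1 ** (2 / (2 + n)) * f6 ** (n / (2 + n))           # Eq. (17)
f3 = f1 ** ((1 + n) / (2 + n)) * f6 ** (1 / (2 + n))     # Eq. (18)
f4 = f1 ** (1 / (2 + n)) * f6 ** ((1 + n) / (2 + n))     # Eq. (19)
f5 = f1 ** (n / (2 + n)) * f6 ** (2 / (2 + n))           # Eq. (20)
C1, C2, R2, C3, R3 = network_b_elements((f1, f2, f3, f4, f5, f6), k, z1_mag)
print(C1 * 1e6, C2 * 1e6, R2, C3 * 1e6, R3)
# ~47.4 uF, ~31.7 uF, ~5.25 ohm, ~18.0 uF, ~2.03 ohm (cf. Table 2, Network B)
```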
calculated from Eqs. (38) through (40). It can be shown that the elements RP, LP, and CP are given by

RP = αQESQLRE/h    (44)

LP = αQESRE/(2πfSh²)    (45)

CP = 1/(2πfSαQESRE)    (46)

where α = VAS/VB is the system compliance ratio, QL is the enclosure quality factor at the Helmholtz resonance frequency, and h = fB/fS is the system tuning ratio [13].

Fig. 6. Compensation circuits for low-frequency impedance rise. (a) Infinite-baffle and closed-box drivers. (b) Vented-box drivers.

5 NUMERICAL EXAMPLE

One sample of the JBL model 2241H 18-in (0.457-m) professional woofer was selected to illustrate the networks and their application. The dc voice-coil resistance was found to be RE = 5.1 Ω. The voice-coil impedance was measured at 62 frequencies between 14.8 Hz and 20 kHz with an MLSSA analyzer. The data in the range from 1.8 to 20 kHz were used to calculate the lossy voice-coil inductance parameters. Calculations on the MLSSA data yielded the parameters RES = 26.9 Ω, LCES = 38.1 mH, CMES = 424 µF, n = 0.764, and Le = 0.0150. Fig. 7 shows the measured magnitude and phase of the impedance as circles and the impedance calculated from the equation

ZVC(jω) = RE + Le(jω)^n + [1/RES + 1/(jωLCES) + jωCMES]^(−1)    (47)

shown as a solid line, where ω = 2πf. The figure shows excellent agreement between the measured and calculated data, thus verifying the calculated values of the driver parameters.

Fig. 7. Impedance measured and calculated from Eq. (47) (———) for JBL driver. (a) Magnitude. (b) Phase.

The element values for the Zobel networks were calculated to compensate for the voice-coil inductance over the frequency band from f1 = 300 Hz to fN = 20 kHz. Table 1 summarizes the intermediate calculations for the two networks. Table 2 gives the calculated element values. Network A is the network of Fig. 5(a). Network B is that of Fig. 5(b). The element values for the optional network to compensate for the impedance rise at resonance have the values RS = 0.858 Ω, LS = 11 mH, and CS = 1460 µF. It is quite obvious that these values would be impractical in a passive crossover network. Indeed, an 11-mH air-core inductor would in all probability have a series resistance greater than 0.858 Ω. For these reasons the impedance Z2(jω) has been omitted in the following. However, it would be expected that the element values would fall in a more practical range for midrange and tweeter drivers, which have a much higher resonance frequency than the driver considered here.

Table 1. Summary of intermediate calculations.

        Network A    Network B
N       4            6
k       1.24         1.15
|Z1|    4.77 Ω       4.77 Ω
f1      300 Hz       300 Hz
f2      1.85 kHz     958 Hz
f3      3.24 kHz     1.37 kHz
f4      20 kHz       4.38 kHz
f5      –            6.26 kHz
f6      –            20 kHz

Table 2. Element values.

        Network A    Network B
R1      5.1 Ω        5.1 Ω
C1      51.2 µF      47.4 µF
R2      2.23 Ω       5.25 Ω
C2      38.5 µF      31.6 µF
R3      –            2.03 Ω
C3      –            17.9 µF

Fig. 8 shows the magnitude and phase of the voice-coil impedance with and without Zobel network A. The plots are calculated from the measured voice-coil data and not those predicted by Eq. (47). The plots for network B are not shown because, for all practical purposes, they are not distinguishable from those of network A. However, this may not be the case with drivers that have a lower value of n.

Fig. 8. Impedance of JBL driver with and without Zobel network A. (a) Magnitude. (b) Phase.

To evaluate the effect of the Zobel networks on the performance of passive crossover networks, the voice-coil
voltage of the JBL driver was calculated for a source voltage of 1 V rms with second- and third-order low-pass crossover networks. The crossover frequency was chosen to be fc = 800 Hz, which might be a typical value when this driver is used with a midrange horn. The circuit diagrams are shown in Fig. 9. Second-order networks are usually designed for critical damping. The crossover frequency is the −6-dB frequency of the network. The element values for the second-order network in Fig. 9(a) are given by

L1 = RE/(πfc) = 2.03 mH    (48)

C1 = 1/(4πfcRE) = 8.43 µF.    (49)

Third-order networks are usually designed for a Butterworth response. The crossover frequency is the −3-dB frequency of the network. The element values for the third-order network in Fig. 9(b) are given by

L1 = 3RE/(4πfc) = 1.52 mH    (50)

C1 = 2/(3πfcRE) = 52 µF    (51)

L2 = RE/(4πfc) = 0.507 mH.    (52)

Fig. 9. (a) Second-order crossover network. (b) Third-order crossover network.

Fig. 10. Voice-coil voltage of JBL driver with (curve a) and without (curve b) Zobel network A. (a) Second-order crossover network. (b) Third-order crossover network.

Figure 10(a) shows the calculated voice-coil voltage for the second-order crossover network with and without Zobel network A. With the network, the response follows what would be expected of a second-order crossover network. Without the network, the voltage exhibits a peak at 1.8 kHz that is 16.7 dB greater than the response with the network. Fig. 10(b) shows the calculated voice-coil voltage for the third-order crossover network with and without Zobel network A. With the network, the response follows what would be expected of a third-order crossover network. Without the network, the voltage exhibits a peak at
630 Hz that is 10 dB greater than the response with the network. Above 1.6 kHz the response without the network lies above the response with the network and exhibits a slope of approximately −43 dB/dec. The slope with network A approaches −60 dB/dec, which is the correct slope for a third-order network. Crossover simulations with Zobel network B have been omitted because the results were almost identical. However, this may not be the case with drivers having a lower value of n. The plots in Fig. 10 were calculated using the measured voice-coil data and not that predicted by Eq. (47). The plots show some evidence of the rise in impedance at the fundamental resonance frequency of the driver. This could be eliminated by the addition of the circuit in Fig. 6(a) in parallel with the Zobel network.

6 CONCLUSION

The high-frequency rise in the voice-coil impedance of a loudspeaker driver caused by a lossy voice-coil inductance can be approximately canceled in the audio band by an RC Zobel network connected in parallel with the voice coil. The simplest network consists of two resistors and two capacitors. More complicated networks have three or more resistors and three or more capacitors. For a typical driver, the simplest network can yield excellent results. Because the lossy voice-coil inductance can cause major perturbations in the performance of crossover networks, the parameters n and Le should be included in the list of specifications for drivers as an aid in the design of Zobel compensation networks.

7 REFERENCES

[1] O. J. Zobel, "Theory and Design of Uniform and Composite Electric Wave Filters," Bell Sys. Tech. J., vol. 2, pp. 1–46 (1923 Jan.).
[2] R. H. Small, "Constant-Voltage Crossover Network Design" (Reprint), J. Audio Eng. Soc., vol. 19, pp. 12–19 (1971 Jan.).
[3] A. N. Thiele, "Optimum Passive Loudspeaker Dividing Networks," Proc. IREE (Australia), vol. 36, pp. 220–224 (1975 July).
[4] General Radio Co., Instruction Manual—Type 1382 Random-Noise Generator (1968).
[5] National Semiconductor Corp., Audio Handbook (1976, 1977, 1980).
[6] P. Horowitz and W. Hill, The Art of Electronics (Cambridge University Press, Cambridge, MA, 1980).
[7] P. G. L. Mills and M. O. J. Hawksford, "Transconductance Power Amplifier Systems for Current-Driven Loudspeakers," J. Audio Eng. Soc., vol. 37, pp. 809–822 (1989 Oct.).
[8] R. H. Small, "Direct-Radiator Loudspeaker System Analysis," J. Audio Eng. Soc., vol. 20, pp. 383–395 (1972 June).
[9] W. M. Leach, Jr., Introduction to Electroacoustics and Audio Amplifier Design, 3rd ed. (Kendall/Hunt, Dubuque, IA, 2003).
[10] W. M. Leach, Jr., "Loudspeaker Voice-Coil Inductance Losses: Circuit Models, Parameter Estimation, and Effect on Frequency Response," J. Audio Eng. Soc., vol. 50, pp. 442–450 (2002 June).
[11] J. Borwick, Ed., Loudspeaker and Headphone Handbook, p. 216 (Focal Press–Elsevier, Burlington, MA, 2001).
[12] R. H. Small, "Closed-Box Loudspeaker Systems, Parts I and II," J. Audio Eng. Soc., vol. 20, pp. 798–808 (1972 Dec.); vol. 21, pp. 11–18 (1973 Jan./Feb.).
[13] R. H. Small, "Vented-Box Loudspeaker Systems, Parts I–IV," J. Audio Eng. Soc., vol. 21, pp. 363–372 (1973 June); pp. 438–444 (1973 July/Aug.); pp. 549–554 (1973 Sept.); pp. 635–639 (1973 Oct.).
THE AUTHOR
W. Marshall Leach, Jr. received B.S. and M.S. degrees in electrical engineering from the University of South Carolina, Columbia, in 1962 and 1964, and a Ph.D. degree in electrical engineering from The Georgia Institute of Technology in 1972. In 1964 he worked at the National Aeronautics and Space Administration in Hampton, VA. From 1965 to 1968 he served as an officer in the U.S. Air Force. Since 1972 he has been a faculty member at The Georgia Institute of Technology, where he is presently professor of electrical engineering. Dr. Leach teaches courses in applied electromagnetics and electronic design. He is a fellow of the Audio Engineering Society and a senior member of the IEEE.
GEOFF R. SCHMIDT**
AND
MATTHEW K. BELMONTE**
Departments of Psychiatry and Experimental Psychology, University of Cambridge, Cambridge CB2 2AH, UK
0 INTRODUCTION

The field of informatics is in the midst of a transformation from purely textual systems, in which indexing is driven by tried-and-true methods of string matching, to multimedia systems, in which measures of similarity are more dimensionally complex and computationally intensive. As the capacity for information storage has outpaced developments in algorithms, indexing of pictures and sounds has been left to rely not on the actual records being indexed, but rather on file names or other textual labels attached to these records. Anyone who has made use of image search engines or peer-to-peer file sharing systems knows that these labels, or metadata, inevitably fail to capture essential information. The top matches returned by a search may turn out to be altered editions related to the target item (for example, a live concert recording, a remix, or a cover), or even the wrong item altogether.

To escape this dependence on fallible metadata, search algorithms are needed that are keyed not on attached text but on the actual data being indexed. The complexity inherent in such algorithms is a product of the fact that perceptual similarities appear not so much in a media file's raw data as in its many derived properties. Similarities apparent to the human senses are seldom evident in comparisons of the data's raw bytes. Images whose actual pixel values are utterly different from each other may nonetheless look alike to the human visual system, and sounds whose time-series data are uncorrelated may nonetheless sound alike to the human auditory system. Samples may vary along multiple perceptual axes, making the search space high-dimensional and therefore making nearest-neighbor searches exponentially more difficult. A simple example of this dimensionality problem is a time–frequency representation of a sound, in which the number of dimensions is the product of the number of time steps and the number of frequency bands. In order for the problem of comparing media files to be rendered tractable, the dimensionality of the inputs must be reduced. The process of deterministically computing a relatively low-dimensional, unique identifier for a media file is known as the fingerprinting problem. Fingerprinting in the audio domain, in particular, has received a great deal of attention due to its immediate applications in archive indexing and searching, automated broadcast monitoring, and digital rights management.

*Manuscript received 2003 August 12; revised 2003 December 30.
**Formerly with Tuneprint, Inc., Cambridge, MA 02139-4056, USA.
366 J. Audio Eng. Soc., Vol. 52, No. 4, April 2004
ENGINEERING REPORTS AUDIO IDENTIFICATION BY MULTIPLE INDEPENDENT PSYCHOACOUSTIC MATCHING
SCHMIDT AND BELMONTE

The audio fingerprinting schemes published to date extract information by applying frequency-domain analyses within narrow temporal windows. Some of these methods use a straightforward time–frequency representation as input to further processing steps [1]–[6] whereas others use the Fourier spectrum to compute derived measures such as modulation frequency [7], measures of spectral shape and tonality [8], [9], MPEG-7 feature sets [10], or hash bits based on frequency-specific temporal changes [11], [12]. One published method extracts individual notes and then applies conventional string-matching algorithms to sequences of these notes [13], though of course such a method does not allow for complexities such as superpositions of various instruments or vocals. All of these systems either depend on supervised learning algorithms applied to fairly raw time–frequency representations, or decide a priori what specific spectral features or measures will be relevant. In the former case, perceptually relevant information is submerged in a large corpus of data, making statistical learning algorithms liable to discriminate on the basis of perceptually irrelevant features. In the latter case, feature extraction discards a great deal of useful information along with the irrelevant details. Either way, classification is degraded.

In addition to this loss of perceptual information within time points, many existing techniques do not make full use of information on changes in a recording's perceptual qualities across time points. A single recording may evolve temporally through many different styles, tempos, and timbres. Previous audio fingerprinting methods have suffered from a needlessly exclusive view of time and frequency representations, collapsing localized frequency-domain features across long intervals of time. This strategy accomplishes a great deal of data reduction, at the expense of blurring out short-lived properties that could be useful for identification. The Muscle Fish system [8], for example, computes feature vectors for a large set of closely spaced time frames within a recording, but then retains only the mean, variance, and autocorrelation of these feature vectors over the entire recording. This strategy works well for brief samples such as sound effects [8] but is unlikely to scale well to temporally extended recordings. A system described recently by Sukittanon and Atlas [7] adopts a similar approach, computing subband modulation frequencies in each frame and then preserving only the centroids of these features across frames. A system described by Burges et al. [6] operates only on brief clips, basing its classifications solely on the one frame that differs least from a frame in the database, without using information from surrounding frames. Other systems [2], [4] pool feature vectors across the entire recording into a histogram of vector-quantization bins, destroying temporal sequence information as the data from the individual vectors are summed into the histogram. Another system based on vector quantization [9] sums the error between feature vectors and vector quantization codebook entries, and then classifies the recording as a whole on the basis of the codebook entry whose summed error is least. Here again, information on the recording's time course is lost in the summation.

One way around the problem of temporal blurring in pooled data is to sum not the match scores themselves but rather the outputs of some nonlinear "squashing function" applied to the match scores. Good matches of short-lived features will thus be given a disproportionate effect on the match quality of the recording as a whole. A system designed by Papaodysseus et al. [5] adopts this approach using a simple step function as a squashing function, that is, matches at each time point are thresholded, and the summed match score is incremented only if the match at the current time point is above threshold. With this approach also, though, useful data are lost, since information on partial, subthreshold matches within individual time points has no effect on matching the recording as a whole, even when such partial matches occur at a large number of time points.

A difficulty in applying information from partial matches is the time complexity of searching for these matches. Exact matches, on the other hand, are much more efficiently identified. Quantization of the signal's features will produce some number of exact matches with identically quantized records in the database. The database recordings that generated these exact matches can then be searched for approximate matches. Constraining the search space in this way renders the approximate search problem tractable. This strategy has been applied in a hashing system based on temporal differences in subband amplitude differences (that is, a double differentiation in frequency and time) [11], [12]. Such a search method preserves information across time points, though it remains an open question whether this method of time–frequency differentiation can be improved on for extracting perceptually relevant information within time points.

Losses of useful data within and across time points are likely a major reason why fingerprinting schemes in general have suffered from error rates that would be unacceptable in any large-scale system. Though more advanced systems (such as [12]) promise improved results, most of the methods cited feature rates of successful matching in the range of 90 to 99%. When matching against a database of hundreds of thousands or even millions of recordings, even 99% is unacceptable. Worsening the outlook for scalability is the fact that many of these error rates arise from tests against small databases on the order of hundreds [8], [1]–[4] or thousands [13], [9], [7] of recordings, or within restricted musical genres [1], [3]. In order to improve performance, strategies must be developed to extract selectively the perceptually relevant information within time points, and to preserve this information across time points. For both of these goals one can look to human neurobiology as a model.

The goal of preservation of feature-specific information across time points has been addressed in a biologically motivated neural network model proposed by Hopfield and Brody [14]. This model recognizes audio signals via a massively parallel network of independent feature detectors whose outputs decay with various time constants. The recognition signal consists of a selectively weighted sum of these many time-dependent, feature-dependent measures. Such a network of multiple independent feature detectors is the parallel-processing equivalent of a serial strategy involving multiple independent matching of features at a series of time points, followed by an evaluation of the match results for temporal consistency. This strategy makes the recognition problem tractable by separating the problem of matching within time points from the problem of matching across time points. Matching within time points produces in general some scalar measure of goodness of fit. Matching across time points then amounts to an instance of the well-known problem of weighted linear regression, where the time indices of an unknown input recording are regressed against the time indices of a reference recording using these within-time-points measures as weights, and a regression line of unit (or near-unit) slope indicates a match. This method of weighted linear regression has been applied to audio data by Schmidt [15], [16] and independently by Wang [17].

The choice of a method for extracting perceptual information within time points depends on the context for which the audio fingerprinting system has been designed. While some systems [4] only classify the input as to musical genre or class, others rank better and worse matches within a class [8], or identify single best matches [13], [2], [9], [5], [7]. Among the systems designed for individual identification, variously inclusive or expansive criteria can be established. The most narrow and least useful sense of identity is the simple equality of two signals. In this case, byte-for-byte comparison suffices and no fingerprinting is needed. At the opposite extreme, some applications may need to establish broad identity, retrieving sounds that contain similar sequences or thematic elements but different instrumentals or voices (such as remixes or covers). Perhaps most useful for applications of digital rights management, though, is an audio indexing system that identifies an input as one record within a database of releases, after that input has perhaps been distorted by slight variations in playback speed, by lossy compression, or by transmission through a bandwidth-limited system. Such an indexing system would identify recordings that sound the same to a human listener, and would differentiate recordings that contain obvious differences.

This criterion of equivalence to a human listener leads to a natural approach to enhancing perceptually relevant information within time points: if the software is made to model human auditory processing, then features not important in human auditory discrimination will be lost, and features that contribute to discrimination will be retained. As a result, the criterion of equivalence to a human listener can be attained without any assumptions as to what particular features are relevant. A classifier based on human auditory modeling is in a way an elaboration of time–frequency methods that partition the audible spectrum into frequency bands akin to the critical bands of the human cochlea [2], [4], [5]. One such method has achieved 99.8% recognition by applying dimensionality reduction and supervised learning after accounting for frequency-specific human auditory perceptual thresholds [6], though more complex psychoacoustic phenomena such as spreading and compression are not accounted for.

In order to preserve information across time points, we adopt the philosophy that many approximate tests of matching are better than a single, make-or-break test. Other systems have constructed databases whose elements are whole recordings, and this strategy leads inevitably to the problem of pooling data across time points. In contrast, the system discussed in this study, known as Tuneprint, matches each short fragment of a recording against a large database consisting of every individual fragment of every recording loaded. Fast vector quantization within such a large database is inevitably fallible, and thus many fragments within an input recording will produce incorrect matches. A significant portion of matches, however, will be correct, and it is this redundancy across time points that is the key to the Tuneprint algorithm: if significantly many fragments from the input produce matches that turn out to be the temporally corresponding fragments from a single recording, the input can be identified as this recording. Using a sophisticated psychoacoustic model as the front end to this fragment-based analysis, Tuneprint has achieved a high success rate on a database of 100 000 commercial music releases.

1 THE ALGORITHM

1.1 General Considerations

The identification algorithm can be conceptualized as the combination of a perceptual model, which eliminates features that are not significant to a human listener, a fragment function, which matches each of the input's individual temporal fragments against fragments in the database, and an assembly function, which puts together results from the fragment function over time and evaluates their consistency. In general, a fragment function takes an input recording and a temporal offset within that recording, and produces a set of triples, each consisting of a recording in the database, a temporal offset within that recording, and a distance measure or some other quantification of match quality. An assembly function takes a series of the fragment function's outputs over time, and produces a set of outputs each of which associates a recording in the database with a confidence level with which the input matches that recording.

The goal in defining the fragment function is not so much to maximize accuracy as to maximize the information garnered per unit of computational resources spent. Depending on how the fragment function is implemented, the limiting resource may be input–output bandwidth, memory bandwidth, network bandwidth, or CPU time. It is expected that a fragment function will be noisy, even returning no information at all for some particularly difficult cases. The assembly function is able to extract a signal from this noisy output by exploiting the constraint that any genuine match must be consistent over an interval of time.

Since matching is evaluated independently for each fragment, it is possible for a match to occur between any subinterval of a recording submitted for identification and any matching subinterval of a recording in the database. This ability to construct independent matches on subintervals makes the Tuneprint system robust to truncation of
audio, to the addition of silent intervals, and to momentary glitches such as sometimes occur during radio or streaming transmission. Useful results can be obtained even from less than a second of input.

1.2 Psychoacoustic Modeling

As a front end to Tuneprint's fragment function, audio recordings are transformed by a psychoacoustically based model of human hearing. Input is sampled at 44.1 kHz, either from raw CD content or by playback from a compressed format. In the case of a stereo recording, left and right channels are mixed to mono. Intervals of 185.715 ms (8190 samples) are normalized for playback volume by subtracting the mean and then scaling to the interval's maximum excursion or to one-sixteenth of the playback medium's dynamic range, whichever is greater. (This dynamic range criterion prevents silent intervals from becoming high-amplitude noise.) The interval is multiplied by a Hann window, padded with a single zero on each end, and then Fourier-transformed. A power spectrum with a resolution of 5.38 Hz is extracted from the Fourier transform, over the frequency range from 253 to 12 500 Hz. Frequencies outside this range are discarded so that matches cannot depend on them. We have found this strategy effective at improving the identification of band-limited transmissions while having no negative effect on the identification of full-bandwidth recordings. The power spectrum is transformed from a hertz scale to a Bark scale by linear interpolation using the Bark frequency values given in [18]. This transformation yields a power spectrum extending from 2.53 Bark to 23.17 Bark in 128 discrete steps, energy from multiple frequency bins at the high end of the spectrum being summed into single Bark bins.

Transformation to the Bark scale sets the stage for the computation of frequency spreading. In the human ear, mechanical properties of the basilar membrane and coarse coding within the cochlear nucleus cause a single-frequency input to excite neurons encoding a range of frequencies, with a central peak of excitation occurring at the input frequency. This spreading of neural excitation across frequencies underlies the psychoacoustic phenomenon of masking, in which a low-amplitude tone, at a frequency near that of a higher amplitude tone or a band of noise, cannot be resolved [18]. Most applications based on human perceptual modeling (for example, MPEG layer 3 [19]) compute an intensity threshold below which a masked sound will not be heard. Since thresholds differ depending on whether the masker is a pure tone or noise, this thresholding method has the disadvantage of relying on rather arbitrary measures of spectral flatness as indices of tonality.

An alternative to modeling the masking threshold is to model the frequency spreading itself. With this approach, the masked threshold is not explicitly computed. Instead, it arises from decreases in discriminability due to spreading of the spectrum. The computational distinction between noise masking and tone masking also is obviated [20]: the same model is used in both cases, and the difference arises in the model's behavior in response to masking inputs whose levels are fairly uniform across frequencies (noise maskers) versus those whose levels are frequency-specific (tone maskers). As specified in [20], we spread each Bark band with a left-sided rolloff of 31 dB/Bark and a right-sided rolloff of 22 dB/Bark + (230 Hz/Bark)/f, where f is the frequency (in hertz) of the masker. We apply an intensity compression factor α of 0.8, making the compressed sum of excitations greater than the linear sum. Algorithmic details on the application of this compression factor and the computation of spreading are presented in [20]. In addition to accounting for masking, this spreading of spectral peaks makes the identification robust to discretization errors that may arise from small variations in playback speed.

Following spreading and conversion to an intensity (dB) scale, the minimum audible field (MAF) [21], expressed in dB as a function of Bark frequency, is subtracted from the signal. Bark frequency bands in which this subtraction yields a negative result are zeroed, whereas nonnegative results are transformed according to a perceptual loudness measure based on that defined in [22], depending on Bark frequency z and frequency-specific intensity I_z:

[0.7 + (0.4/Bark) · (2.5 Bark − max[2.5 Bark, min(3 Bark, z)])] · (I_z − 100 dB) + 100 dB + 8.5 dB · [1 − 1/(1 + e^(−(0.09/dB)·(I_z − 60 dB)))].

This step completes the psychoacoustic transformation, an example of which is shown in Fig. 1. The resulting output changes fairly slowly across samples (Fig. 2), making Tuneprint robust to temporal frame shifts.

As a postprocessing step, the spectrum is high-pass filtered by subtracting from each point the linear trend in the 6-Bark interval centered on that point. (In the case of points that lie within 3 Bark of the spectrum's upper or lower edge, the detrending window is narrowed accordingly.) Although this last filtering step may seem to diverge from the goal of modeling human perception, we find it useful in practice since it removes band-limited intensity offsets that arise from equalization (see example in Fig. 3). Such equalization can be produced deliberately, but is more often an unintentional consequence of the limited frequency response of amplifiers, loudspeakers, microphones, or transmission systems.

The end result of all these transformations is a feature vector whose 128 components represent high-pass-filtered perceptual intensity as a function of Bark frequency, in a brief interval around the time point of interest. In order to pair this frequency-domain information with some local temporal information, the psychoacoustic transformation is repeated in adjoining fragments, one immediately preceding and the other immediately following the fragment of interest. These temporal offsets produce a total of three 128-dimensional feature vectors, which are concatenated to form a 384-dimensional feature vector. The time point of interest is advanced through the recording in half-fragment steps of 92.8575 ms, so each time point is incorporated into two different 128-vectors, and each 128-vector is used in three different 384-vectors.

Especially after frequency spreading and duplication across time, these feature vectors contain a great deal of
Fig. 1. Steps in psychoacoustic transformation. Power is transformed to a Bark scale and spreading is applied to mimic the mechanical
properties of the human cochlea, implementing the psychoacoustic phenomenon of masking. Amplitude of the spread spectrum is
converted to dB, auditory threshold is subtracted, and resulting levels are converted to an arbitrarily scaled, dB-like measure of
perceptual loudness.
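The per-fragment front end described above (volume normalization, Hann window, band-limited power spectrum, Bark mapping, threshold subtraction, and the perceptual loudness measure) can be sketched as follows. This is a much-simplified illustration, not the authors' implementation: the Traunmüller Bark formula, the flat 60-dB threshold, and the omission of spreading and high-pass detrending are stand-ins for the calibrated curves of [18], [20], and [21].

```python
import numpy as np

FS = 44100     # sampling rate (Hz)
FRAME = 8190   # 185.715 ms at 44.1 kHz

def loudness(z, intensity_db):
    """Perceptual loudness measure from the text (z in Bark, I_z in dB)."""
    slope = 0.7 + 0.4 * (2.5 - max(2.5, min(3.0, z)))
    return (slope * (intensity_db - 100.0) + 100.0
            + 8.5 * (1.0 - 1.0 / (1.0 + np.exp(-0.09 * (intensity_db - 60.0)))))

def psychoacoustic_frame(x):
    """Simplified sketch of the per-fragment transformation."""
    x = x - x.mean()                              # volume normalization
    x = x / max(np.abs(x).max(), 1.0 / 16.0)      # dynamic-range floor
    w = np.concatenate(([0.0], x * np.hanning(FRAME), [0.0]))  # Hann + pad
    spec = np.abs(np.fft.rfft(w)) ** 2            # power spectrum
    freqs = np.fft.rfftfreq(len(w), 1.0 / FS)     # ~5.38-Hz resolution
    keep = (freqs >= 253) & (freqs <= 12500)      # band-limit the match
    spec, freqs = spec[keep], freqs[keep]
    bark = 26.81 * freqs / (1960.0 + freqs) - 0.53  # Traunmüller stand-in
    edges = np.linspace(bark[0], bark[-1] + 1e-9, 129)
    idx = np.clip(np.digitize(bark, edges) - 1, 0, 127)
    bark_spec = np.zeros(128)
    np.add.at(bark_spec, idx, spec)               # sum energy into 128 Bark bins
    db = 10.0 * np.log10(bark_spec + 1e-12)
    db = np.maximum(db - db.max() + 60.0, 0.0)    # placeholder MAF subtraction
    centers = 0.5 * (edges[:-1] + edges[1:])
    return np.array([loudness(z, i) for z, i in zip(centers, db)])

v = psychoacoustic_frame(np.random.randn(FRAME))
assert v.shape == (128,)
```

With the 8192-point padded frame, `np.fft.rfftfreq` yields the 5.38-Hz bin spacing quoted in the text; everything else here is a hedged approximation.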
across each group of three successive fragments, and the other six capture frequency-domain information that is fairly constant across successive fragments. Although pro-

within a given partition times the information contained in such a partitioning:
Fig. 4. Basis vectors computed by principal-components analysis. Each input is a concatenation of three 128-dimensional feature
vectors in successive fragments; frequency-selective components therefore replicate the same features three times over. Note broad
frequency selectivity of components 1 through 3, and time dependence of components 4 (decay and attack) and 5 (beat). Higher order
components show a spiky pattern of frequency selectivity, perhaps incorporating information from notes and harmonics. Since basis
vectors are normalized to unit length, magnitudes are arbitrary and are not specified here.
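The paper gives no code for this principal-components step, but basis vectors like those of Fig. 4 could be computed along these lines. The corpus here is random stand-in data, and the choice of eight retained dimensions follows the eight-dimensional search space mentioned in the text.

```python
import numpy as np

# Hypothetical corpus of 384-dimensional feature vectors (3 x 128),
# standing in for the concatenated psychoacoustic fragments.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 384))

# Principal components via SVD of the mean-centered data;
# the rows of Vt play the role of the basis vectors pictured in Fig. 4.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

# Projecting onto the first D components is a single matrix product;
# discarding the higher-order dimensions afterward costs nothing.
D = 8
coords = Xc @ Vt[:D].T      # (1000, 8) low-dimensional representation
assert coords.shape == (1000, 8)
```

Because the basis is orthonormal, partial sums of squared coordinate differences in this representation are lower bounds on full Euclidean distances, which is what the incremental search below exploits.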
The tradeoff is that with high entropy comes a high proportion of incorrectly partitioned queries. In the limiting case, in which a query vector is identical to one of the vectors in the database, of course, the query will necessarily be allocated to the same partition as its best match. However, if the query vector lies some distance from its best match, there is some risk that the query and the best match will fall on different sides of a partition boundary. In such a case the best match will never be identified, because the partition that contains it will not be searched. The optimal partitioning scheme maximizes entropy with the constraint that the rate of query partitioning errors is low enough to produce good matches at a preponderance of time points. The balance between these two factors of high entropy and low partitioning error depends on the average distance between queries and their best matches: again, in the limiting case where this distance is 0, each vector in the database could be its own partition (and the entire problem could be handled using string-matching algorithms).

Within the selected partition we consider only the neighborhood N of vectors whose Euclidean distance from the query vector q does not exceed a threshold distance d_limit. The theme behind this computation is to do only as much work as is necessary to produce the desired number of best matches. In general, the algorithm operates by considering successively larger subspaces of the query space. We begin by considering the one-dimensional subspace defined by the first principal component. (Recall that the principal components are the basis in terms of which the query space is represented; projections onto the subspaces defined by these components therefore have zero computational cost, being implemented simply by discarding the higher order dimensions.) To facilitate distance computations within this one-dimensional subspace, the contents of the database partition are maintained in sorted order according to the value of the first principal component. The set N0 of all vectors that lie within a distance ±d_limit of the query vector along this principal component can thus be computed with an efficient search. Since distance along the principal component is a lower bound for actual distance, N ⊆ N0.

In order to exclude those vectors that are in N0 but not in N, we organize the elements of N0 into a heap keyed on lower bound distance from the query vector. A heap, as used in this case, is a binary tree in which the value of every child node is greater than or equal to the value of that child's parent node. The root node in such a tree is termed the "top" of the heap, and it necessarily contains the heap's minimal value. We consider this minimal value, summing into it the squared distances along the next Δn principal components. The addition of these extra dimensions tightens the lower bound on the vector's actual distance from the query. In the case in which this addition causes the value at the top of the heap to grow larger than the distances lower down in the heap, we swap values down the tree in order to reestablish the heap property.

We continue this process, on each iteration incrementing the number of dimensions in the top vector's distance estimate by Δn, and then reestablishing the heap property on the basis of this increased distance estimate. When a vector appears at the top of the heap for which all the dimensions in the query space have been considered, that is, for which the lower bound distance is identical to the actual distance, we identify that vector as the next match, and delete it from the heap. This incremental computation of distances allows us to perform only the computation that is necessary in order to identify the closest matches.

In the case of the eight-dimensional space around which the current implementation is built, Δn = 7, and thus all the higher dimensional distance computation is done at once. However, this incremental algorithm has improved performance in tests with higher dimensionality, and may prove useful if scaling of the system demands an increase in the number of basis vectors. Although the search algorithm without incremental distance computation is O[D |N0| log(|N0|)] for N0 vectors in a space of D dimensions, in practice constant factors are such that this time bound is an improvement on O(D|N0|) exhaustive search. The algorithm is specified formally in Fig. 7.
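The incremental lower-bound search can be sketched with a standard binary heap. This is an illustrative reimplementation, not Tuneprint's code (which Fig. 7 specifies), and it assumes the candidate set N0 has already been gathered and expressed in principal-component coordinates, so that a partial squared distance over the first d components is a lower bound on the true squared distance.

```python
import heapq
import numpy as np

def incremental_knn(query, candidates, k=1, dn=1):
    """Return the k nearest candidates to `query` as (index, distance) pairs,
    tightening per-candidate lower bounds dn dimensions at a time."""
    D = len(query)
    # Heap entries: (lower-bound squared distance, index, dimensions used).
    heap = [(float((c[0] - query[0]) ** 2), i, 1)
            for i, c in enumerate(candidates)]
    heapq.heapify(heap)
    matches = []
    while heap and len(matches) < k:
        dist, i, d = heapq.heappop(heap)
        if d == D:
            # Lower bound equals the true distance: this is the next match.
            matches.append((i, dist))
            continue
        # Tighten the bound by summing in the next dn squared coordinates,
        # then restore the heap property by reinserting the entry.
        nd = min(d + dn, D)
        dist += float(np.sum((candidates[i][d:nd] - query[d:nd]) ** 2))
        heapq.heappush(heap, (dist, i, nd))
    return matches

pts = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.2]])
print(incremental_knn(np.array([0.0, 0.0]), pts, k=1))  # → [(0, 0.0)]
```

Because an entry is only finalized when it surfaces with all D dimensions summed, no candidate's full distance is ever computed unless its lower bound is already among the smallest, which is the economy the text describes.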
comparisons: as long as there are enough successful queries to raise the score of the matching database entry above those of nonmatching entries, the input will be correctly identified. An increase in partition error simply produces a corresponding increase in the length of input required for reliable identification. For example, 150 fragments at 50% loss will produce identifications as reliable as 100 fragments at 25% loss, since on average both scenarios yield the same number of successful queries, namely, 75. Although the relationship between match scores and identification rate has not yet been systematically explored, our experience is that the top match can be reliably detected if it exceeds the next closest match by a score of 8 or more. (Since matches at individual queries have values between 1/2 and 1, this threshold corresponds to matching at 8 to 16 individual fragments. Recall that the frames from which each fragment is produced are spaced at intervals of 92.8575 ms. Therefore this threshold corresponds to an input length of between about 0.75 and 1.5 s, neglecting the effect of temporal autocorrelation between frames. These figures match our experiences with inputs consisting of short clips, where identification becomes unreliable for input lengths below about 1 s.)

3 FURTHER WORK

The current Tuneprint system is a work in progress, several aspects of which deserve further research. Some of these issues are points that will become important as the system scales up beyond the current 100 000 recordings. Others are possibilities for improvement regardless of scale.

With large databases, the question of what constitutes an acceptable test set becomes increasingly important. Tuneprint is meant to be applied to recordings that have undergone lossy compression and other subtle distortions as are commonly found on file-sharing networks, and is designed to implement an identity criterion of equivalence to a human listener. Any validation of Tuneprint's output must therefore depend on a sampling of recordings representative of those found on file-sharing networks, and on the independent evaluations of human listeners which, for this purpose of validation, cannot be automated. This requirement for listening tests makes expansion of the test set very labor intensive and is the reason for the current test set's small size. More significant than the size of the test set, though, is the size of the database against which the test recordings are being matched. It is the database size that determines the system's liability to collisions, and hence to false positives and to match failures. Since the elements of the test set are randomly selected, each element constitutes an independent test of the database. Although the further development of Tuneprint would benefit from more detailed and larger scale testing than was possible during the period of Tuneprint's commercial funding, the present result with a database of size 100 000 is an indicator of the promise of Tuneprint's methods.

Partly because of the small size of Tuneprint's test set, a detailed, quantitative comparison with the performance of other systems remains an open question. Wang [17] independently implemented a temporal consistency measure similar to Tuneprint's, though without full psychoacoustic modeling at the input stage; the experimental performance of this system has not been reported. Using the vector quantization method of Allamanche et al. [9], Hellmuth et al. have implemented an advanced classifier [10] that makes use of the information inherent in the temporal sequence of feature vectors, though the exact nature of this method is left unspecified. An alternative to vector quantization methods is robust hashing, in which feature vectors are discretized by simple thresholding of their components, and the search space for approximate matches is then constrained to only those database recordings of which one or more frames produced an exact match in this discretized domain [11]. Although the distribution of within-recordings hash errors in a database of size 10 000 using this method theoretically predicts a very low rate of false positives with a high rate of identification [12], the between-recordings error rate has yet to be assayed experimentally in a large database. An important feature of systems based on multiple independent comparisons is that the method of analysis within time points is separable from the method of evaluation for consistency across time points. This separability raises the possibility of plugging any within-points method into an alternative across-points method—for example, Tuneprint's psychoacoustically based model could be integrated with a robust hashing system.

Fig. 9. Rate of query misclassification as a function of distance between query and target. Note linearity of relationship.

Another key question regarding scaling is how the growth of the database may increase the likelihood of collision, that is, a situation in which two items in the database map so close together in Tuneprint's search space that discrimination between them is degraded or impossible. As Tuneprint matches first at the level of isolated fragments and then at the level of whole recordings, there are two senses in which collision can be considered. First, collisions might occur between individual fragments of different recordings. Such collisions would affect matching of the recording as a whole in the same way that partitioning error does. Tuneprint's redundant strategy of multiple independent matching makes it robust to collisions at the fragment level in the same way that it is robust to partitioning error—as long as there are significantly many successful matches, failed matches do not affect the identification [15]–[17]. Although in tests to date any small effect of such collisions has been swamped by the effect of partitioning error, a high rate of collisions could be expected to mimic the effect of partitioning error, increasing the length of input required for reliable identification.

Second, one could imagine a collision involving very strong matching to more than one recording in the database. Although we have observed such occurrences during our tests, all have turned out to involve cases in which a recording loaded into the database from a peer-to-peer network had been mislabeled and was actually a copy of

database partitioning can be characterized by the slope of this partitioning-error function in combination with the entropy figure. The total expected CPU time per identification is the product of the expected number of queries (a function of the partitioning-error rate) and the expected CPU time per query (a function of entropy). It likely will be possible to develop an optimization procedure that applies these descriptive statistics to find the partitioning scheme that minimizes the expected CPU time per identification.

Perhaps the most compelling question regarding partitioning is that of how much information needs to be preserved within a single partition. Vector quantization yields an efficiency of time, arising from the restriction of the search space to a particular partition, and an efficiency of space, arising from the lossy coding of individual vectors in terms of the partition to which each maps. Currently Tuneprint takes advantage only of the temporal efficiency, applying vector quantization to select a particular database partition to search. The potential spatial efficiency is not realized, since the original, unquantized vectors are preserved for use in nearest-neighbor matching within the selected partition. A database with very high entropy might offer the potential for eliminating the query-to-match distance measure within a partition, and instead treating all elements of the partition as equally good matches. The winning match at the level of whole recordings could then be determined by consistency of matching as in the current model. Such an ability to discard the original vectors would offer large savings in space, by eliminating the bulk of the database, and in time, by eliminating the demand for nearest-neighbor matching in a high-dimensional space. We have implemented a prototype of such a system which compresses the entire database into 3 Gbyte, runs on a single CPU, and, when successful, identifies a recording in only 110 ms of processing time on average.

Variations on this high-entropy strategy also are possible. One optimization might involve overlapping partitions, that is, allowing database vectors that lie near partition boundaries to be included in more than one partition. Conversely, query vectors that lie near partition boundaries could be made to trigger a search of multiple partitions in the database. Either of these strategies would decrease the rate of partitioning error, at the expense of a
another matching recording already present in the data- modest increase in search time. One can also imagine a
base. We have never observed a true collision at the level two-pass system in which distance measures are computed
of whole recordings, and this absence of collision may only for those match candidates that show consistent
perhaps be expected given the length of a typical recording matching over time at the level of partitions.
and the number of dimensions within which it can vary as Currently the query is sampled at constant intervals
it evolves through time. throughout its length. Higher confidence in matching
Scaling certainly can be expected to figure into the likely can be achieved by varying this sampling period,
tradeoff between entropy and partitioning error. It remains both as a function of overall match confidence (more sam-
undetermined what degree of partitioning would be opti- pling of difficult-to-identify recordings) and as a function
mal even at the database’s current size. The limit in which of the temporal derivative of the psychoacoustic function
partitioning errors begin to affect accuracy has not yet (more sampling in the time intervals surrounding abrupt
been reached, and the current stopping point for the num- changes). Landmarking of the input recording is one way
ber of partitions is somewhat arbitrary. Assuming that the of implementing increased sampling at intervals of abrupt
rate of partitioning errors remains a linear function of the change, and it would be interesting to compare the perfor-
distance between the query and its best match (Fig. 9), a mance of landmarking based on simple acoustic properties
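The partition-restricted nearest-neighbor search and the vote-based whole-recording decision described above can be sketched as follows. This is a minimal illustration, not Tuneprint's implementation: the function names, the flat `(recording_id, vector)` partition layout, and the reduction of the two-level fragment/recording consistency check to simple vote counting are all our own simplifications.

```python
import math
from collections import Counter

def nearest_centroid(codebook, v):
    """Vector quantization step: index of the closest codebook centroid."""
    return min(range(len(codebook)), key=lambda i: math.dist(codebook[i], v))

def identify(query_frames, codebook, partitions):
    """Each query frame selects one partition via VQ (the temporal
    efficiency: only that partition is searched), then votes for the
    recording whose stored vector is its nearest neighbor there."""
    votes = Counter()
    for v in query_frames:
        partition = partitions[nearest_centroid(codebook, v)]
        if not partition:
            continue
        rec_id, _ = min(partition, key=lambda entry: math.dist(entry[1], v))
        votes[rec_id] += 1
    # Whole-recording decision: the most consistently matched recording wins.
    return votes.most_common(1)[0][0] if votes else None
```

Discarding the stored vectors, as in the high-entropy variant discussed above, would amount to replacing the inner nearest-neighbor search with a vote for every recording present in the selected partition.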
J. Audio Eng. Soc., Vol. 52, No. 4, April 2004 375
SCHMIDT AND BELMONTE ENGINEERING REPORTS
[12] J. Haitsma and T. Kalker, "A Highly Robust Audio Fingerprinting System," presented at the 3rd Int. Conf. on Music Information Retrieval, Paris, France, 2002 Oct. 13–17.
[13] R. J. McNab, L. A. Smith, I. H. Witten, C. L. Henderson, and S. J. Cunningham, "Towards the Digital Music Library: Tune Retrieval from Acoustic Input," in Proc. 1st ACM Int. Conf. on Digital Libraries (Bethesda, MD, 1996), pp. 11–18.
[14] J. J. Hopfield and C. D. Brody, "What Is a Moment? Transient Synchrony as a Collective Mechanism for Spatiotemporal Integration," Proc. Nat. Acad. Sci. USA, vol. 98, pp. 1282–1287 (2000).
[15] G. Schmidt, "Tuneprint Audio Fingerprinting Technology: Technical Overview Version 1.1," paper prepared in response to Recording Industry Association of America/IFPI request for information (2001 June 6).
[16] G. Schmidt, "Tuneprint Fingerprinting Technology," presented to the Recording Industry Association of America, Washington, DC (2001 Sept. 5).
[17] A. Wang, "System and Methods for Recognizing Sound and Music Signals in High Noise and Distortion," International Patent Publ. WO 02/11123 A2 (2002).
[18] E. Zwicker and H. Fastl, Psycho-acoustics: Facts and Models, 2nd ed. (Springer, Berlin, 1999).
[19] K. Brandenburg and G. Stoll, "ISO/MPEG-1 Audio: A Generic Standard for Coding of High-Quality Digital Audio," J. Audio Eng. Soc., vol. 42, pp. 780–792 (1994 Oct.).
[20] J. G. Beerends and J. A. Stemerdink, "A Perceptual Audio Quality Measure Based on a Psychoacoustic Sound Representation," J. Audio Eng. Soc., vol. 40, pp. 963–978 (1992 Dec.).
[21] ISO 389-7, "Acoustics—Reference Zero for the Calibration of Audiometric Equipment—Part 7: Reference Threshold of Hearing under Free-Field and Diffuse-Field Listening Conditions," International Organization for Standardization, Geneva, Switzerland (1996).
[22] B. C. J. Moore, R. W. Peters, and B. R. Glasberg, "Detection of Decrements and Increments in Sinusoids at High Overall Levels," J. Acoust. Soc. Am., vol. 99, pp. 3669–3677 (1996).
[23] I. T. Jolliffe, Principal Component Analysis, 2nd ed. (Springer, New York, 2002).
[24] Y. Linde, A. Buzo, and R. M. Gray, "An Algorithm for Vector Quantizer Design," IEEE Trans. Commun., vol. 28, pp. 84–95 (1980).
[25] S. P. Lloyd, "Least Squares Quantization in PCM," IEEE Trans. Inform. Theory, vol. 28, pp. 129–137 (1982).
[26] A. J. Bell and T. J. Sejnowski, "An Information-Maximization Approach to Blind Separation and Blind Deconvolution," Neural Comput., vol. 7, pp. 1004–1034 (1995).
THE AUTHORS

Geoff Schmidt was born in Branson, MO, in 1980. He left undergraduate studies at the Massachusetts Institute of Technology in Cambridge, MA, after one term to pursue entrepreneurial interests. Working alone, he developed the Tuneprint framework in mid-2000 and in 2001 secured venture capital funding from Sage Hill Partners in Cambridge to scale up and commercialize the system. Tuneprint Corporation ceased operations in 2002.
Mr. Schmidt's previous work on output-sensitive visible surface determination algorithms for 3-D rendering has won awards from the International Science and Engineering Fair, the U.S. Junior Science and Humanities Symposium, and other research competitions. He is now a commercial research consultant, currently performing machine vision research at Intellivid, a Cambridge startup company.
Mr. Schmidt's personal interests include meditation, teaching, and politics. He recently served as campaign manager for Matt DeBergalis's Cambridge City Council campaign, where in addition to day-to-day management he designed a successful direct mail effort. He plans to pursue a career in machine learning or programming language design. E-mail: gschmidt@mit.edu.

Matthew Belmonte studied English literature and computer science as an undergraduate, developing an interest in the processing of formal and natural languages. Moving from artificial to biological computing systems, he applied this computational interest to problems in cognitive neuroscience. He is the author of several papers on the neurophysiology of attention and perception in normal and autistic populations and on computational methods for statistical analysis of biophysical time series. He was a research scientist at Tuneprint Corporation, the architect of Tuneprint's psychoacoustic model, and a major contributor to the technical description of the Tuneprint system. He left the United States in 2002 and currently works in functional magnetic resonance imaging at the University of Cambridge Autism Research Centre in the UK. E-mail: belmonte@mit.edu.
The extraction of pitch (or fundamental frequency) information from polyphonic audio
signals remains a challenging problem. The specific case of detecting the pitch of a melodic
instrument playing in a percussive background is presented. Time-domain pitch detection
algorithms based on a temporal autocorrelation model, including the Meddis–Hewitt algo-
rithm, are considered. The temporal and spectral characteristics of percussive interference
degrade the performance of the pitch detection algorithms to various extents. From an
experimental study of the pitch estimation errors obtained on a set of synthetic musical
signals, the effectiveness of the auditory-perception–based modules of the Meddis–Hewitt
pitch detection algorithm in improving the robustness of fundamental frequency tracking in
the presence of percussive interference is discussed.
widely in speech analysis [1], has also been found suitable for the pitch tracking of monophonic musical signals [3]. The present study investigates the Meddis–Hewitt perceptual PDA as an example of a more sophisticated algorithm also based on the detection of periodicity via temporal autocorrelation.
The engineering report is organized as follows. Section 1 provides a brief overview of various PDAs, with an introduction to the PDAs chosen for the present study. The subsequent sections describe the implementation of the functional blocks of the PDAs and the evaluation of the PDAs by an experiment on the synthetic signal test set. The study concludes with a discussion of the observations targeted toward obtaining insights into the performance of the PDAs with respect to signal and interference characteristics.

1 PITCH DETECTION ALGORITHMS

Time-domain PDAs, the oldest pitch detection algorithms, are based on measuring the periodicity of the signal via the repetition rate of specific temporal features. Frequency-domain PDAs, on the other hand, are based on detecting the harmonic structure of the spectrum. Among the simpler time-domain PDAs is the popular autocorrelation function (ACF)–based PDA. The definition of the "biased" autocorrelation function is given by [1]

ACF(k, τ) = Σ_{i=0}^{N−τ−1} y(k + i) y(k + i + τ)        (1)

where k and τ are the window position and the correlation lag, respectively, and y is the input signal.
For a pure tone, the ACF exhibits peaks at lags corresponding to the period and its integral multiples. The peak in the ACF of Eq. (1) at the lag corresponding to the signal period will be higher than that at the lag values corresponding to multiples of the period. For a musical tone consisting of the fundamental frequency component and several harmonics, one of the peaks due to each of the higher harmonics occurs at the same lag position as that corresponding to the fundamental, in addition to several other integer multiples of the period (subharmonics) of each harmonic. Thus a large peak corresponding to the sum contribution of all spectral components occurs at the period of the fundamental (and higher integral multiples of the period of the fundamental). This property of the ACF makes it very suitable for the pitch tracking of monophonic musical signals. The ACF PDA chooses as the pitch period the lag corresponding to the highest peak within a range of lags.
In contrast to the simplicity of the ACF pitch detector are more recent PDAs, also based on autocorrelation, but derived more closely from the mechanism of temporal coding in the human auditory system. These PDAs can in fact be viewed in terms of preprocessing of the signal followed by autocorrelation-based detection. There are a number of variants in this class of PDAs, but they all share some important characteristics [4]. They decompose the signal into frequency bands defined by the auditory filters of the cochlea. Next, nonlinear processing corresponding to hair-cell transduction is applied and the temporal periodicity detected separately in each frequency channel by means of autocorrelation. Finally the across-channel information is combined to produce a single pitch estimate. The recent pitch perception model of Meddis and Hewitt [5] has gained much prominence due to its demonstrated ability to predict the results of certain crucial pitch perception experiments. The Meddis–Hewitt PDA is based on the functional modules of the auditory periphery with added processing stages that emulate auditory processing, which is considered to be more central. The various stages of the PDA are a bandpass filter representing the transfer function of the outer ear and middle ear canal, a bank of filters modeling the basilar membrane response, followed by a model of the inner hair cell applied to each filter channel output to simulate neural transduction, obtaining a series of firing probabilities. Next an ACF periodicity detector is applied to each of the hair-cell model outputs. Finally a summary autocorrelation function (SACF) is formed by adding the ACFs so obtained across the frequency channels. A search for the highest peak in the relevant range of SACF lags provides an estimate of the pitch period.
The added presence of noise and inharmonic partials due to an interfering signal perturbs the shape and location of the peaks contributed by the signal harmonics in the ACF. Thus the traditional ACF pitch detector applied to a musical signal with percussive accompaniment would be expected to be adversely affected by the presence of noise and inharmonic frequency components contributed by the percussion. On the other hand, in the case of perception-based PDAs, the signal is processed by a number of auditory-model–based blocks before being subjected to periodicity extraction via the ACF. In particular, a combination of linear and nonlinear filtering is applied, and the temporal periodicity information itself is computed via the ACF separately in each frequency channel. It is of interest to examine whether and to what extent these perceptually motivated enhancements improve the reliability of the pitch estimation for harmonic musical signals with conspicuous background interference. While much recent work has investigated the ability of perceptual PDAs to predict subjectively perceived pitch in psychoacoustic experiments, the present work examines the robustness of the PDA for estimating the signal fundamental frequency in the presence of interference. By means of carefully designed test signals, we study the pitch estimation errors obtained by the PDAs in the presence of percussive interference with respect to the underlying pitch of the melodic voice, and later attempt to explain these observations.

2 IMPLEMENTATION OF PDA FUNCTIONAL BLOCKS

Fig. 1 provides a modular structure for the PDA based on the Meddis–Hewitt pitch perception model [5]. The individual blocks represent various stages of the algorithm, each of which may, in principle, be implemented in multiple ways.
Block 1 represents the outer ear and middle ear (OEM) prefiltering, with the magnitude response shown in Fig. 2.
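The biased ACF of Eq. (1) and the highest-peak decision rule of the traditional ACF PDA can be stated compactly in code. The sketch below is our own illustration, not the authors' implementation; the function names and the test tone are assumptions.

```python
import math

def biased_acf(y, k, tau, N):
    """Biased ACF of Eq. (1): sum over i = 0 .. N - tau - 1 of
    y[k+i] * y[k+i+tau]. The sum shortens as tau grows, which is
    the 'bias' favoring smaller lags."""
    return sum(y[k + i] * y[k + i + tau] for i in range(N - tau))

def acf_pitch_period(y, k, N, lag_min, lag_max):
    """Traditional ACF PDA: the pitch period is the lag of the highest
    ACF peak within the expected lag range."""
    return max(range(lag_min, lag_max + 1),
               key=lambda tau: biased_acf(y, k, tau, N))

# A pure tone of period 100 samples yields its period as the peak lag.
tone = [math.sin(2 * math.pi * n / 100) for n in range(1800)]
period = acf_pitch_period(tone, 0, 1500, 50, 200)
```

For a sampling rate fs, the pitch estimate is fs/period; the paper's 40-ms window at 44.1 kHz corresponds to N = 1764 samples.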
RAO AND SHANDILYA ENGINEERING REPORTS
Essentially a bandpass filter with a resonance frequency near 3 kHz, this block has been implemented by the cascade of an eighth-order low-pass IIR filter and a second-order parameterized high-pass filter with a high-pass filter parameter value of 0.94 [6]. The magnitude response is similar to the inverted absolute threshold-of-hearing curve, whereas below 1 kHz it is approximated by the inverted equal-loudness contour for high loudness levels.
Block 2 represents the cochlear filter bank with filter center frequencies that are equally spaced on the ERB (equivalent rectangular bandwidth) scale, with the bandwidth increasing with the center frequency. A filter bank of 27 ninth-order gammatone filters, of bandwidth 1 ERB each, is based on the corresponding function of the HUTear library [7]. These filters have center frequencies ranging from 123 Hz to 5.636 kHz, or 4 to 30 on the ERB scale. The output of each filter simulates the pattern of vibration at a particular location on the basilar membrane.
Next the conversion of this mechanical activity to the neural spike generation events of the auditory nerve is simulated by block 3. The implementation of this module can range from a full model of the hair cell derived from a computational analysis of actual hair-cell and auditory-nerve processes [5] to simple half-wave rectification followed by low-pass filtering [8]. In 1986 Meddis proposed a model for the hair cell which simulates, through difference equations rich in parameters, several properties of the neural transduction and auditory nerve firing [9]. Important characteristics of the hair-cell model are its nonlinearity and frequency selectivity. The present implementation is based on the hair-cell model of Meddis and coworkers [9], [10] as implemented in the auditory model library of Slaney [11].

Fig. 1. Block diagram of functional blocks of Meddis–Hewitt PDA with postprocessing added.

Block 4 calculates the ACF [Eq. (1)] of the signal input to this block. Keeping in mind the range of 150–800 Hz for the expected fundamental frequency, we have used a 40-ms window with 50% overlap (frame space 20 ms) for the computation of the ACF. Once the ACFs are obtained for all the channels [by implementing Eq. (1) on the hair-cell model output of each channel], block 5 performs the task of combining them. Combining can occur in the form of either simple or weighted addition. We use simple summing. The combined ACFs are known as the summary ACF (SACF). In block 6 the SACF is searched for the highest peak within a prespecified range (corresponding to the expected fundamental frequency range of 150–800 Hz). The lag value corresponding to the highest peak is accepted as the estimated pitch period. Block 7 is a postprocessing block, which smooths out local variations in the pitch estimates across frames using a simple three-point median filter. The combination of all seven blocks constitutes the Meddis–Hewitt PDA with added postprocessing. In the next section we describe a procedure for evaluating the contribution of the various functional blocks of the Meddis–Hewitt PDA to improving robustness over direct ACF peak-based pitch detection.

3 EXPERIMENTAL EVALUATION

In order to investigate the performance of the pitch detection algorithms for the pitch estimation of a harmonic signal in a percussive background, a set of test signals was designed from available MIDI songs. Apart from the access to the "ground truth" pitch, the use of MIDI files provides great flexibility by allowing the inclusion or elimination of individual monophonic instrument channels, modifying the relative strengths of the component sounds, pitch transformations, and choice of the instrument playing the melody line as well as the percussive instrument.

3.1 Test Signal Set

A MIDI song of length 8 seconds was selected. It had a single harmonic instrument (alto sax) playing the melody accompanied by several percussive (nonpitched) voices, namely, hi hat, kick drum, and low agogo channels. The pitch range of the melody was 350–620 Hz, and the melody consisted of four similar phrases, with each phrase comprising five notes of various durations. [The pitch contour of the melody can be seen as the solid line in Fig. 5(b).] Further, in order to create a number of different test conditions, pitch-shifted versions of the melody were created as follows: high (up by 4 semitones to the range of 440–787 Hz) or low (down by 12 semitones to the range of 174.5–311 Hz). It may be remarked that the pitch transformations are achieved via "instrumental" pitch shifting, which implies that the relative amplitudes of the harmonics remain unchanged across fundamental frequency changes, in contrast to formant-corrected pitch shifting.
The set of percussive instruments represents a range of signal characteristics, as illustrated in the spectrograms of Fig. 3. The kick drum is a relatively fast decaying signal with predominantly low-frequency content. The hi hat is characterized by a slow time decay and a broadly spread spectral mixture of moderately strong partials and noise. Low agogo has low-noise content and strong partials all the way from 1.1 to 10 kHz, with a moderate rate of decay.
To obtain a variety of combinations of target and interference timbres, the song was transformed by changing the target instrument and then selecting only one of the interfering (percussive) instruments at a time. The selected target instrument voices were of different timbres, as shown by the magnitude spectra of a fixed note in Fig. 4. For the middle- and high-frequency ranges, baritone sax (prominent high harmonics), flute (weak high harmonics), and oboe (harmonics spread in frequency) were used. For the low-frequency range the flute was replaced by alto sax to incorporate a more natural sound.
The JAZZ MIDI sequencer, available as shareware,1 was used to achieve the needed transformations. The melody line was recorded, switching off all the other channels, in each of the three pitch ranges with each of three target instruments. Thus a set of nine files containing pure melody was obtained. For each of these melody files we created three "corrupted" versions, each with only a single percussion channel turned on. The synchronization between the melody and each of the percussion tracks was such that a minimum number of percussion strikes fell in the silence region between target instrument notes. This led to percussion onsets being located at a variety of positions with respect to target note onset, steady state, and decay. Then the relative amplitudes of the target and interfering signals were set so that the ratio of signal power to interference power (each of the powers being computed as the corresponding average over the nonsilent regions of the musical piece) remained at a fixed predefined value (equal to 2.0) for a set of test signals across various target instruments, percussive instruments, and pitch ranges. The signal-to-interference ratio, however, is only an average value, with local values deviating greatly, depending on the position of the percussion strike with respect to the target instrument note onset. We thus obtained a total of 27 test signals, all sampled at 44.1 kHz.

1 www.jazzware.com.

3.2 Experiment

The PDAs were run on the pure test signals, with the same postprocessing applied to each PDA estimate to ensure a fair comparison.
Fig. 5 shows a sample of the experimental results obtained. The selected test signal is the combination of baritone sax and low agogo percussion. Fig. 5(a) shows the spectrogram of the test signal. The relatively continuous dark lines correspond to the harmonic partials of the baritone sax, and the very short dark segments that occur during the first (2 strikes), third, and fourth notes of each phrase correspond to the partials of the low agogo. Fig. 5(b) compares the true pitch track with the pitch track estimated by the traditional ACF pitch detector followed by a three-point median filter. We see that there are large pitch estimation errors, which coincide with the occurrence of the percussion and last over several frames. The simple three-point median postfilter only corrects isolated pitch errors. The instances of percussion where pitch errors do not occur seem to be characterized by overlapping partials between target and percussion. Fig. 5(c) illustrates the pitch contour obtained by using the Meddis–Hewitt PDA, also followed by the postfilter, on the same test signal. Indeed the performance has improved on using this algorithm.
A controlled-parameter experiment was carried out to study the behavior of the PDAs on the underlying signal characteristics, and to obtain an understanding of the role of each of the functional modules of the Meddis–Hewitt algorithm in influencing the pitch detection. Specifically, four PDAs were evaluated on the test data set. All four algorithms are derived from the generic block diagram of Fig. 1 by choosing different combinations of subblocks and/or different realizations of a specific subblock. The postprocessor of block 7 is a three-point median filter that is included in all four algorithms. The details of the four PDAs are as follows.
1) AC1: This algorithm incorporates blocks 4, 6, and 7 only. It corresponds to the traditional ACF pitch detector. Block 4, that is, the ACF calculation block, uses a rectangular window and the biased ACF computation of Eq. (1).
2) AC2: This algorithm incorporates block 1 with blocks 4, 6, and 7. This again is a traditional ACF pitch detector, but with outer ear/middle ear filtering included as a preprocessing function.
3) AC3: This algorithm comprises blocks 1, 3, 4, 6, and 7. This algorithm is an extension of AC2, where the neural transduction block (based on the hair-cell model of [9]) has been introduced. Unlike the Meddis–Hewitt PDA, the signal is not decomposed into separate frequency channels. Rather, the hair-cell nonlinearity is applied to the full-band signal followed by ACF pitch detection to result in a single estimate of temporal periodicity per frame.
4) MH1: All blocks 1 to 7 are included, making this a complete implementation of the Meddis–Hewitt algorithm with postprocessing added. For the ACF, a Hamming-windowed biased ACF is computed for each channel using an efficient fast Fourier transform implementation.

3.3 Observations

The bar charts of Figs. 6 to 9 display the results of the experiment in terms of a count of the pitch errors with respect to the known reference pitch contour, arranged by the PDA configuration used. A pitch estimate is obtained for every analysis frame only in regions where the target instrument is playing. At a frame spacing of 20 ms, this comprises a total of 285 frames. A pitch error is defined to occur whenever the detected pitch deviates from the reference pitch by 3% or more (about half a semitone) of the reference pitch frequency. Of the detected pitch errors, those of magnitude less than 6% are labeled "fine" errors, whereas those of higher magnitude are labeled "gross." The gross errors are found typically to be pitch octave errors. It may be noted that in the absence of percussion, no pitch errors were observed in any of the PDA configurations.
From an inspection of the bar charts we note that the extent of pitch errors depends not only on the PDA but also on the percussion instrument, the pitch range of the target instrument, and the target instrument itself. Of the latter three factors, the most marked is the dependence of the PDA performance on the percussion characteristics. The ACF pitch detector AC1 makes a large number of errors for all three percussions. The target instrument pitch range that is most affected is seen to depend on the spectral characteristics of the interference. The kick drum with its low-frequency support affects the lowest pitch range the most, whereas the low agogo with its broad spectral mixture of partials and noise impacts all three pitch ranges.
The introduction of OEM filtering in AC2 has the effect of an overall lowering of the extent of pitch errors. A strong exception to this happens to be the hi hat in the high-pitch range (note the changed scale of the error axis). The hair-cell model followed by autocorrelation (AC3) serves to reduce all errors further, with the only significant errors remaining in the low agogo signals. Finally, the full Meddis–Hewitt PDA MH1 reduces the errors in the low agogo signals of the low- and middle-pitch ranges, but worsens the performance slightly in the high-pitch range.
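The error-counting criteria above, and the three-point median postfilter of block 7, translate directly into code. This sketch is our own (with hypothetical function names), not the authors' evaluation script:

```python
def median3(track):
    """Three-point median postfilter (block 7): removes isolated
    single-frame pitch errors but leaves multi-frame error runs intact."""
    out = list(track)
    for i in range(1, len(track) - 1):
        out[i] = sorted(track[i - 1:i + 2])[1]
    return out

def classify_errors(estimates, reference):
    """Count fine (deviation of 3% up to 6% of the reference pitch) and
    gross (6% or more) pitch errors, per the definitions in Sec. 3.3."""
    fine = gross = 0
    for est, ref in zip(estimates, reference):
        deviation = abs(est - ref) / ref
        if deviation >= 0.06:
            gross += 1
        elif deviation >= 0.03:
            fine += 1
    return fine, gross
```

A pitch track of [100, 100, 180, 100, 100] Hz is fully repaired by `median3`, whereas a two-frame run of 180-Hz errors survives the filter, consistent with the observation that percussion-induced errors lasting several frames are not corrected.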
Fig. 5. Pitch estimates for test signal "middle pitch, baritone sax with low agogo." (a) Input signal spectrogram (prominent low-frequency partials of percussion encircled). (b) Actual pitch (——) and pitch estimated from AC1 PDA (- - -). (c) Actual pitch (——) and pitch estimated from MH1 PDA (- - -).
Fig. 6. Error performance of AC1 PDA for various target instruments and pitch ranges with percussion instrument in background. (a) Kick drum. (b) Hi hat. (c) Low agogo.
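The channel-wise processing of blocks 2 to 6 can be approximated in a few lines of Python. This is our own stand-in sketch, not the paper's implementation: a two-pole resonator replaces the ninth-order gammatone filters, the neural transduction uses the simpler half-wave rectification plus low-pass smoothing option mentioned in the text [8] rather than the Meddis hair-cell model, and all parameter values are assumed.

```python
import math

def resonator(x, fc, fs, r=0.98):
    """Two-pole resonator standing in for one cochlear band-pass channel."""
    w = 2 * math.pi * fc / fs
    b1, b2 = 2 * r * math.cos(w), -r * r
    out, y1, y2 = [], 0.0, 0.0
    for s in x:
        y = s + b1 * y1 + b2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out

def haircell(x, alpha=0.8):
    """Half-wave rectification followed by one-pole low-pass smoothing,
    the simple neural-transduction option mentioned in the text [8]."""
    out, state = [], 0.0
    for s in x:
        state = alpha * state + (1 - alpha) * max(s, 0.0)
        out.append(state)
    return out

def sacf_pitch_period(x, fs, centers, lag_min, lag_max, k=200, N=800):
    """Blocks 4-6: per-channel biased ACF of the hair-cell outputs,
    summed across channels (SACF); the highest peak in the lag range
    gives the pitch period estimate."""
    chans = [haircell(resonator(x, fc, fs)) for fc in centers]
    def sacf(tau):
        return sum(sum(c[k + i] * c[k + i + tau] for i in range(N - tau))
                   for c in chans)
    return max(range(lag_min, lag_max + 1), key=sacf)
```

Even this crude chain preserves the property that matters here: periodicity is measured per channel after a nonlinearity, and the summary ACF combines the channels before the peak search.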
For the pure target signals, each of which contains a number of harmonics, including the fundamental, the windowed ACF of the input signal computed according to Eq. (1) shows peaks at lags corresponding to the pitch period and multiples of the pitch period. The highest peak corresponds to the pitch period, and there is no error in the estimated pitch. On the other hand, when the input signal to the ACF contains noise or interfering partials, there is a perturbation of the peak corresponding to the correct pitch period. The ACF of the interference partial (which can be considered to combine additively with the target ACF to form the corrupted ACF) modifies the values of the original ACF at all lags, thus modifying amplitudes at all lags to some extent. Unless the interference partial is very strong, this is not sufficient to change the locations of the prominent peaks (at pitch and pitch multiples) but affects only their relative amplitudes. As a result, the "choose the highest peak in the ACF" approach typically results in either a fine error due to a misshapen pitch peak or a gross error in the form of a pitch octave error. Fig. 10(a) shows the ACF of a periodic signal of fundamental frequency 600 Hz with the first four harmonics of amplitudes 10, 18, 14, and 12. At the sampling rate of 44.1 kHz, the signal pitch period is 73.5 samples. A single interference partial of fundamental frequency 3300 Hz and amplitude 16 (corresponding to signal-to-interference power ratio [SIR] 3.0) is added to the signal, resulting in the ACF of the noisy signal shown in Fig. 10(b). We see that the likelihood of an octave error in the ACF of the noisy signal is highest when, as depicted in Fig. 10, a valley of the ACF of the
Fig. 7. Error performance of AC2 PDA for various target instruments and pitch ranges with percussion instrument in background. (a) Kick drum. (b) Hi hat. (c) Low agogo.
noise partial coincides with the signal pitch peak and the peak of the noise ACF coincides with a target pitch multiple. This is true whenever an interference partial occurs at or near an odd multiple of half the target fundamental frequency. Likewise, we expect no pitch errors when the interference partial is near a multiple of the target fundamental frequency. It is easy to see from Fig. 10 that the likelihood of pitch octave error would increase as the amplitude of the noise partial increases relative to the target signal strength. This explains why the introduction of a linear filter such as the OEM filter, affecting the relative amplitudes of the signal and noise partials, leads to a change in the error profiles, as seen in Fig. 7. The introduction of outer ear–middle ear filtering reduces the errors in the case of the kick drum, but has the contrary effect in the case of the hi hat and, to some extent, on the low agogo. This can be explained by the low-frequency nature of the spectrum of the kick drum, which consequently is heavily attenuated by the OEM filter. The hi hat and low agogo on the other hand have brighter spectra with much middle-frequency content that remains after the OEM filter. The target spectrum, because of its preponderant lower harmonics, suffers greater overall attenuation than the bright-spectra percussions. The unusually sharp rise in pitch errors in the high-pitch target range with hi-hat interference was found to be due to the chance occurrence of an interference partial at an odd multiple of half the fundamental frequency of a note of recurring pitch throughout the song. This partial fell near the resonance frequency (3 kHz) of the OEM filter and was a prominent spectral component in the filtered signal. It resulted in an octave error in the ACF pitch estimate almost throughout the duration of the note in question.

Fig. 8. Error performance of AC3 PDA for various target instruments and pitch ranges with percussion instrument in background. (a) Kick drum. (b) Hi hat. (c) Low agogo.

Introducing the hair-cell model prior to ACF computation is equivalent to a nonlinear processing of the signal that, among other effects, gives rise to new frequency components located at sum and difference frequencies of the original components. In the case of a weak or missing fundamental, the creation of distortion components contributes to the enhancement of the fundamental frequency component [1]. In addition the hair-cell model introduces a dc bias and a low-pass frequency selectivity [9]. The presence of interference partials at nonharmonic locations gives rise to nonharmonic distortion components, whose magnitudes depend on the magnitudes and phases of the interacting components (both signal and interference). Due to the dc bias and the low-pass selectivity of the hair-cell model, the processing leads to a bias favoring lower pitch lags, as seen in Fig. 10(c), where we also observe the attenuation of the high-frequency partial. Such effects contribute to the overall improvement in the performance demonstrated by the AC3 PDA in Fig. 8. In particular, a more robust pitch estimator is obtained in the case of the interference partial at a high odd multiple of half the target fundamental frequency. The Meddis–Hewitt PDA is an enhancement of the AC3 algorithm in that a cochlear filter bank is included. The ACF is computed separately in each frequency channel, and summed across channels to obtain the pitch estimate as the largest peak lag in the search range. The frequency decomposition effected by the filter bank limits the number of interacting partials through the hair-cell model nonlinearity applied separately to each channel.
to this the distortion components affect the peak at pitch Fig. 11, obtained for the same signal and interference as
lag in the ACF in different ways. One consistent effect is Fig. 10, illustrates the effect of this on the SACF. Shown,
the dc level introduced by the hair-cell processing that for two different channel configurations (of four channels
Fig. 9. Error performance of the MH1 PDA for various target instruments and pitch ranges with percussion instrument in background.
(a) Kick drum. (b) Hi hat. (c) Low agogo.
simulated by ideal bandpass filters), are the signal and of gross errors for all three percussions, depending on the
interference frequency components at the output of the frequency relation between target and interference partials.
channel nonlinearities as well as the corresponding SACF. The low-pitch-range errors are the most pronounced in the
In Fig. 11(a) the interference partial is in a separate chan- case of the kick drum due to a strong low-frequency partial
nel by itself. This eliminates any distortion components from this percussion. Fig. 12 illustrates this effect on ACF
created due to an interaction of target harmonics and in- peak-based pitch estimation by simulating the kick drum by
terference. On the other hand, the co-occurrence of several a strong interference tone at 68 Hz. Shown in Fig. 12 are
higher harmonics of the target in a single channel strength- ACFs for signals of fundamental frequency 600 Hz and 200
ens the fundamental frequency component in the SACF. Hz with the same harmonic amplitudes as the signal of Fig.
These two effects lead to an improved ACF peak at the 10. Both clean signals yield accurate pitch peaks in the SACF
signal pitch period of 73 samples. This explains the im- (at lags of 73 samples and 220 samples, respectively). How-
proved performance of the Meddis–Hewitt PDA for the ever, the addition of the 68-Hz low-frequency interference
low agogo samples for the low- and middle-pitch ranges. tone (with amplitude 26, corresponding to SIR 1.1) intro-
In the high-pitch range, however, it was observed that due duces a low-lag bias in the overall ACF in both cases. This
to the higher interharmonic spacing, several of the chan- leads to a gross pitch error (pitch submultiple selected) in
nels contained only a single harmonic of the target instru- the case of the lower fundamental frequency signal since
ment accompanied by interference components. This con- its pitch period is comparable to that of the interference.
dition is depicted and simulated by the configuration of
Fig. 11(b), where the signal partials occupy different chan- 5 CONCLUSIONS
nels and the noise partial shares a channel with a target In this engineering report an experimental investigation
harmonic. The last channel gives rise to inharmonic dis- is presented of the performance of pitch-detection algo-
tortion components, one of which is visible in the figure. rithms based on temporal autocorrelation for the pitch
Together with the reduced contribution to the fundamental tracking of a melodic signal with percussive accompani-
frequency due to the absence of unresolved harmonics, ment characterized by inharmonic partials. The perfor-
this leads to a degradation of the pitch estimate. mance of the autocorrelation pitch detector as well as its
Finally we return to the AC1 PDA and explain the low- enhancements based on the Meddis–Hewitt auditory
frequency errors due to the kick drum in Fig. 6. The au- model are studied experimentally on synthetic musical sig-
tocorrelation method of AC1 leads to a significant number nals. The ACF peak-based pitch detector incurs pitch es-
Fig. 10. ACF plotted as a function of lag. (a) Signal of fundamental frequency 600 Hz (——) and noise tone of frequency 3300 Hz
(- - -). (b) Noisy signal. (c) Nonlinearly processed noisy signal.
Fig. 11. SACF and spectral components of noisy signal after channel filtering and nonlinear processing corresponding to different
four-channel groupings of signal harmonics and noise. (a) SACF for /h1/h2/h3+h4/n/ and corresponding power spectrum. (b) SACF
for /h1/h2/h3/h4+n/ and corresponding power spectrum.
Fig. 12. ACF plotted as a function of lag for signal and interference tone of 68 Hz. - ⭈ - ⭈ - ACF of signal; - - - ACF of interference;
—— ACF of noisy signal. (a) Signal fundamental frequency 600 Hz. (b) Signal fundamental frequency 200 Hz.
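The ACF peak-picking rule that underlies the AC1 detector (choose the lag of the largest autocorrelation peak within the search range) can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' implementation; the sampling rate, the 60–1000 Hz search range, and the five-harmonic 600-Hz test signal are assumptions chosen to echo the examples of Figs. 10 and 12.

```python
import numpy as np

FS = 44100  # sampling rate in Hz; an assumption for this sketch

def acf_pitch(x, fmin=60.0, fmax=1000.0, fs=FS):
    """ACF peak-based pitch estimate: the lag of the largest
    autocorrelation value inside [fs/fmax, fs/fmin] samples."""
    x = x - np.mean(x)
    # one-sided autocorrelation (lags 0 .. len(x)-1)
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + np.argmax(acf[lo:hi])
    return fs / lag

# Harmonic target at 600 Hz with 1/k harmonic amplitudes (50-ms frame)
t = np.arange(int(0.05 * FS)) / FS
sig = sum((1.0 / k) * np.sin(2 * np.pi * 600 * k * t) for k in range(1, 6))
print(acf_pitch(sig))  # close to 600 Hz (peak lag near 73 samples)
```

Adding a strong inharmonic partial to `sig` shifts or creates competing ACF peaks, which is exactly the octave- and submultiple-error mechanism analyzed above.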
THE AUTHORS

P. Rao S. Shandilya

Preeti Rao received a Bachelor's degree in electrical engineering from the Indian Institute of Technology, Bombay, in 1984, and a Ph.D. degree specializing in signal processing from the University of Florida, Gainesville, in 1990. She taught in the Electrical Engineering department at the Indian Institute of Technology, Kanpur from 1994 to 1999. Following a six-month visiting position at the Institute of Perception Research, Eindhoven, The Netherlands, she joined the Indian Institute of Technology, Bombay, where she is presently an associate professor. Her current research interests include speech and audio signal compression and audio content retrieval.

●

Saurabh Shandilya received a B.E. degree in electrical engineering from the Government Engineering College, Bilaspur, India, in 2001 and an M.Tech. degree in electrical engineering from the Indian Institute of Technology, Bombay in 2003. Currently he works for Neomagic Semiconductors Inc., Noida, India. His interests include speech and audio processing, video coding, and associative computing.
Notations for expressing levels has been revised. The revision has clarified the use of the "dBu" and abandons the earlier intention to recommend the adoption of 1 V as the reference quantity for audio levels in decibels. The revised document can be found on the Web site.

AES14-1992 AES standard for professional audio equipment—Application of connectors, Part 1: XLR-type polarity and gender has been reaffirmed.

AES17-1998 AES standard method for digital audio equipment—Measurement of digital audio equipment has been reaffirmed.

Call for Comment

The following document will be withdrawn by the AES after any adverse comment received within three months of the publication of their call on the AES Standards Web site has been resolved. For more detailed information on each call please go to www.aes.org/standards/b_comments. Comments should be sent by e-mail to the secretariat at standards@aes.org. All comments will be published on the Web site. Persons unable to obtain this document from the Web site may request a copy from the secretariat.

CALL FOR COMMENT on withdrawal of AES33-1999, AES standard procedure for maintenance of AES audio connector database has been published, 2004-02-13.

New Project AES-X144 on DSD over AES47

Direct stream digital (DSD) audio coding is becoming established within the industry as an alternative to linear PCM coding for professional audio applications. Project AES-X140 is intended to standardize a transport which will carry both linear PCM and DSD. If AES47 is not extended to include DSD in a standardized way, a variety of incompatible implementations can be anticipated.

Project AES-X144 Carriage of DSD Audio Data in AES47

Scope: "[T]o amend AES47-2002 clauses 4.1.2.1, 5.2.2.2 and 6 to provide the option of transmitting DSD audio instead of linear PCM."

Summary Report: SC-06-01 Working Group on Audio-File Transfer and Exchange

This meeting was held in conjunction with the AES 115th Convention, NY, 2003-10-10 and was convened by chair M. Yonge.

DVD Forum Liaison

J. Yoshio reported from DVD Forum Working Group 10. This group is considering professional applications of DVD optical disc. There is currently no standard for professional audio recording on DVD. Application areas include studios and broadcast. The intention is to consider a new format for recording and playback of high quality audio data and associated data. The next generation of DVD technology will handle the higher bandwidth and storage capacity necessary for: a) timecode; b) metadata; c) video recording possibilities; d) audio recording for professional use.

Two possibilities are being considered for development as the DVD Audio Professional (DVD-AP) format: a) Broadcast Wave Format (BWF), a computer file format on DVD-ROM; b) a DVD audio format dedicated for professional audio application.

It could be possible to have two sessions on a single DVD disc: DVD audio plus a DVD-ROM area. However, this will need a special DVD-AP recorder/player.

It had been noted that BWF files will be a standard for broadcasters. The Japanese Post Production Association (JPPA) had promoted the use of the BWF-J format on 2.5 in MO disc, which was very popular in Japan although not so widely used elsewhere. However, the capacity of an MO disc was not enough for multichannel audio files. Further development of this style of interchange would require the capacity of the DVD-AP instead of the MO disc.

AES31-3 Audio file transfer and exchange—Part 3: Simple project interchange

Mix automation

A proposed extension of AES31-3 is intended to provide a simple but reliable method of interchanging gain and pan automation data between workstations.

Should there be some maximum gain in a fader, referenced to unity? U. Henry observed that many workstations have difficulty applying more than 12 dB gain to an audio path; sometimes less. It would be helpful if it was generally understood that this was a realistic constraint on free interchange. Bull agreed but noted that replay systems should be prepared to accommodate larger gain maxima in case these are encountered in practice.

There was a discussion concerning gains in decibels expressed to two decimal places. Henry noted that because AES31-3 requires the value to be rendered in ASCII characters, some limit to the number of characters is practically necessary. It was noted also that receiving applications will need to interpolate, or ramp, between gain steps in any case, and this will largely eliminate the need for higher precision gain values; two decimal places were felt to be sufficient.

Panning

Bull felt that 100 points in each panning axis (left to right and front to back) would not be sufficient to define a smooth pan locus. Imagine a circular pan movement; there could be audible path errors, or gain artifacts similar to zipper noise. With such coarse steps, eliminating these artifacts would need a greater degree of look-ahead to upcoming data which could make practical implementations unnecessarily complex.

It was felt that the default pan position should be at front-center, which should be identified by zero coordinates in both axes.

It was also felt that there needed to be a convention for interpreting pan points in a similar manner to that proposed for the faders. This should avoid the appearance of instantaneous transitions—implementations should always use ramped coefficients in the same way as the faders.

Following a question about panning law, there was agreement that the document should define the total SPL in the room from all loudspeakers to be constant, independent of the pan position.

New business

S. Aoki spoke on the subject of Broadcast Wave Format (BWF) files in international interchange. BWF as currently defined does not support international interchange because, for example, the metadata in the "bext" chunk is specified to use the ASCII character code. While ASCII is a robust and simple character code, it is limited to representing the Roman alphabet. ASCII code is not useful in Japanese operations because operators cannot read information in their own language. Interchange becomes difficult when interchanging commercials for broadcasting where flexibility of interchange is very important.

Aoki felt that Japanese character sets based on ISO 10646 Universal Multi-Octet Coded Character Set (UCS) or its derivative UCS Transformation Format, UTF-16, would be appropriate. Unicode and the Japanese-specific "Shift-JIS" are also in use. In the Japanese industry there appeared no clear consensus as to which character code to prefer.

It was pointed out that any general standard should probably cover all international interchange and support all character sets, not just Japanese. UTF-16 appears to be derived from ISO 10646; Unicode may be considered a subset of this same international standard.

Bull noted that it would be a significant task to convert existing applications to a different character code set. Also, character code sets are not interoperable so there will need to be some degree of active translation.

Henry observed that multi-octet file names will be a problem; they may not be readable at all in some systems! Aoki said that the Japan Post Production Association (JPPA) has no special requirement but uses the file name scheme provided in the Japanese version of the computer operating system.

Henry remained concerned that file names will not be read correctly—AES31-3 depends on locating files by file names. Multibyte file names could corrupt accurate file name reading. This issue has implications that will need further consideration.

Summary Report: SC-06-02 Working Group on Audio Applications Using the High Performance Serial Bus (IEEE 1394)

This meeting was held in conjunction with the AES 115th Convention, New York, 2003-10-10 and was convened by chair J. Strawn.

Liaison with 1394 Trade Association (1394 TA)

Fujimori reported as follows: The A/V Working Group is looking at the following topics:
• Blu-Ray DVD recording.
• Japanese terrestrial digital television (DTV).
• A point-to-point test network.
• AV/C Camera storage subunit 2.1.
• IEEE Ballot Review Committee (BRC) review of 1394 implementation tests.
• S800 Base-T, now known as IEEE-1394c.
• Pin assignments on UTP5 cable.

The proposal from Digital Home Technology is to use pins 4 and 5 for 48V, and 3 and 6 for ground. B. Moses will get more information on this.

M. Mora of Apple asked if the Trade Association plans to publish audio guidelines similar to video guidelines. Fujimori answered that the DVD Forum and other organizations are expected to do that within their application space.

New projects

Strawn will submit a project initiation form for General Data based on IEC 61883-6.
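The two interchange conventions discussed for AES31-3 automation, ramped gain coefficients between breakpoints and a pan law that keeps total power constant, can be illustrated as follows. This is only a sketch: the function names are invented here, and the equal-power law is one common realization of the constant-SPL requirement, not wording from the draft.

```python
import math

def ramp_gains(g0_db, g1_db, n_samples):
    """Linearly interpolate gain in dB between two automation breakpoints
    and return per-sample linear coefficients, so no audible step occurs."""
    step = (g1_db - g0_db) / (n_samples - 1)
    return [10 ** ((g0_db + step * i) / 20) for i in range(n_samples)]

def pan_gains(p):
    """Equal-power pan law for p in [-1 (left) .. +1 (right)];
    p = 0 is front-center, and left^2 + right^2 = 1 at every position."""
    theta = (p + 1) * math.pi / 4
    return math.cos(theta), math.sin(theta)
```

For example, `ramp_gains(0.0, -6.02, 5)` descends smoothly from 1.0 to roughly 0.5, and the two coefficients returned by `pan_gains` always have squares summing to 1, so the total power stays constant along the pan locus.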
Proceedings of the AES 24th International Conference:
Multichannel Audio, The New Reality
Banff, Alberta, Canada, 2003 June 26–28
This conference was a follow-up to the 19th Conference on surround sound. These papers describe multichannel sound from production and engineering to research and development, manufacturing, and marketing. 350 pages
LLOYD RICE
11222 Flatiron Drive, Lafayette, Colorado 80026
REVIEWERS
GEORGE L. AUGSPURGER, Perception Incorporated, Box 39536, Los Angeles, California 9003
MARK KAHRS, Department of Electrical Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15261
6,535,610
43.38.Hz DIRECTIONAL MICROPHONE UTILIZING SPACED APART OMNI-DIRECTIONAL MICROPHONES
Brett B. Stewart, assignor to Morgan Stanley & Company Incorporated
18 March 2003 (Class 381/92); filed 7 February 1996
The current buzzword in audio pickup for teleconferencing is ‘‘beamforming.’’ If mere beamforming is not up to the task, then ‘‘adaptive beamforming’’ will surely do the trick. This latest invention mounts a few omni-directional microphones around the periphery of a video display, digitizes their outputs, and then processes the signals through tapped delay lines under computer control.—GLA

6,556,687
43.38.Hz SUPER-DIRECTIONAL LOUDSPEAKER USING ULTRASONIC WAVE
Koji Manabe, assignor to NEC Corporation
29 April 2003 (Class 381/387); filed in Japan 23 February 1998
It is known that an array of ultrasonic transducers can be driven by a modulated carrier to produce audible sound from empty space. This patent suggests that if the transducers are mounted on a concave surface, then their

6,529,107
43.38.Ja SPEAKER COMPRISING RING MAGNET
Motoharu Shimizu and Hiroyuki Daichoh, assignors to Hitachi Metals Limited
4 March 2003 (Class 335/302); filed in Japan 16 December 1999
Modern magnetic materials allow moving coil loudspeakers to be built with thin, lightweight ring magnets. In the illustration, voice coil 55 moves in magnetic gap 57 formed between ring magnet 11 and pole piece 52. It seems obvious that the magnet must have one pole on its inner surface facing the voice coil and the other pole on its outer surface abutting the pole piece. According to this patent, a more linear magnetic field can be achieved if magnetization is divided into three areas. Only the central portion of the ring is magnetized at right angles. The upper and lower portions are magnetized at angles of 40° or less.—GLA

6,535,613
43.38.Ja AIR FLOW CONTROL DEVICE FOR LOUDSPEAKER
Jason A. Ssutu, assignor to JL Audio, Incorporated
18 March 2003 (Class 381/397); filed 28 December 1999
Airtight dust cap 44 pumps air in and out of cavity 46 through rear vent 31. Plate 52 directs the air flow against the inner surface of bobbin 35 to cool voice coil 36.—GLA

6,542,617
43.38.Ja SPEAKER
Masao Fujihira et al., assignors to Sony Corporation
1 April 2003 (Class 381/402); filed in Japan 26 May 1999
What appears to be a conventional self-shielded loudspeaker is in fact a small unit designed to reproduce frequencies up to 70 kHz or so. Voice coil bobbin 11 is made of a conductive material such as aluminum. Coil 6 is attached to the bobbin by a soft bonding agent that decouples the coil from the bobbin at very high frequencies. The patent explains that in this very high range the bobbin is driven as an induction motor and continues to radiate sound, presumably from dust cap 15.—GLA

6,543,574

6,549,637
43.38.Ja LOUDSPEAKER WITH DIFFERENTIAL FLOW VENT MEANS
Jon M. Risch, assignor to Peavey Electronics Corporation
15 April 2003 (Class 381/397); filed 24 September 1998
This patent includes just a little bit of everything, culminating in 40 fairly lengthy claims. However, the heart of the invention is the differential flow vent shown. This is an open-ended cylinder 100 fitted with funnel-shaped insert 104. We are informed that air flowing from left to right will

quite different from a standard closed-box loudspeaker. Some general rules for predicting and optimizing the performance of such a system are developed and explained.—GLA

6,554,098
43.38.Ja PANEL SPEAKER WITH WIDE FREE SPACE
Tatsumi Komura, assignor to NEC Corporation
29 April 2003 (Class 181/173); filed in Japan 15 June 1999
To save space, a panel-type loudspeaker diaphragm 1 can be driven at, or very near, its outer edge. A single panel can be driven by more than one transducer 2, 2′.—GLA

6,557,664
43.38.Ja LOUDSPEAKER
Anthony John Andrews and John Newsham, both of Dorking, Surrey, the United Kingdom
6 May 2003 (Class 181/152); filed 22 February 1994
Central plug 21 is actually chisel-shaped, and surrounding horn 11 is similarly asymmetrical. The overall assembly is a close cousin to the JBL 2405 high-frequency transducer designed more than 25 years ago. In both cases the objective is to create a coverage pattern that is relatively wide horizontally but vertically narrow.—GLA

6,557,665
43.38.Ja ACTIVE DIPOLE INLET USING DRONE CONE SPEAKER DRIVER
Richard D. McWilliam and Ian R. McLean, assignors to Siemens Canada Limited
6 May 2003 (Class 181/206); filed 16 May 2001
The invention is intended to provide active noise cancellation at the air intake of an internal combustion engine. Inner diaphragm 18 is electrically driven by conventional means 46. Outer diaphragm 22 is driven acoustically to generate a noise attenuating signal. At the same time, air is somehow

6,560,343
43.38.Ja SPEAKER SYSTEM
Jae-Nam Kim, assignor to Samsung Electronics Company, Limited
6 May 2003 (Class 381/349); filed in the Republic of Korea 22 April 1996
Part of the backwave energy from loudspeaker 16 is conducted through horn 24 to the face of cabinet 10. The remainder energizes vent 26. The patent explains that since only a portion of the rear sound waves are collected and amplified, ‘‘...reflected waves or standing waves will not be generated in a sound wave amplifying horn to increase amplification efficiency of bass sounds and improve the clearness of the resulting sounds.’’—GLA

6,563,932
43.38.Ja MAGNET SYSTEM FOR LOUDSPEAKERS
Paul Cork, assignor to KH Technology
13 May 2003 (Class 381/412); filed in the United Kingdom 16 January 2001
An electrodynamic loudspeaker magnetic circuit has a ring-shaped gap 42 between inner and outer pole pieces 26 and 36. Typically, the gap would be energized by a single magnet 20. The addition of complementary magnet 44 is said to overcome deficiencies of prior art in terms of reduced size, improved performance, and ease of assembly.—GLA

6,549,632
43.38.Kb MICROPHONE
Hiroshi Akino et al., assignors to Kabushiki Kaisha Audio-Technica
15 April 2003 (Class 381/174); filed 19 March 1997
Some hand-held microphones are extremely sensitive to mechanical shocks and scrapes. This patent describes a simple, passive shock isolation system derived from mechanical analog circuit analysis. Although the patent text describes embodiments for both dynamic and capacitor microphones, all of the eight patent claims refer to capacitor microphones only.—GLA

6,549,627
43.38.Lc GENERATING CALIBRATION SIGNALS FOR AN ADAPTIVE BEAMFORMER
Jim Rasmusson et al., assignors to Telefonaktiebolaget LM Ericsson
15 April 2003 (Class 381/71.11); filed 30 January 1998
Consider a hands-free communications system installed in a vehicle. The equipment includes two or more microphones 405, 407 and a loudspeaker 401. By introducing adaptive filters at the outputs of individual microphones it is possible to achieve in-phase summation of the signals from the direction of a talker while largely canceling the unwanted signals from the loudspeaker. Because the acoustical environment and the location of the talker may change, the system must somehow calibrate itself. The patent describes an improved one-step calibration process that may also be augmented by utilizing the filters as fixed echo cancellers during normal operation.—GLA

6,549,630
43.38.Lc SIGNAL EXPANDER WITH DISCRIMINATION BETWEEN CLOSE AND DISTANT ACOUSTIC SOURCE
James F. Bobisuthi, assignor to Plantronics, Incorporated
15 April 2003 (Class 381/94.7); filed 4 February 2000
The objective is to reliably turn on a microphone when its user speaks and to minimize false triggering from other sound sources. A handset or headset is fitted with two microphones, one 310 near the talker’s mouth and the other 330 as far as possible from the first. Their outputs are filtered, rectified, and then divided—not compared. The patent explains in some detail how this arrangement estimates source proximity rather than signal-to-noise ratio. If the proximity estimation signal exceeds a predetermined threshold, then microphone 310 is gated on or its gain raised.—GLA

6,535,269
43.38.Md VIDEO KARAOKE SYSTEM AND METHOD OF USE
Gary Sherman and Michael Chase, both of Los Angeles, California
18 March 2003 (Class 352/6); filed 29 June 2001
The sound track of a commercial motion picture is already created and stored in a multi-track format. Individual tracks may be rerecorded later to correct audio problems or to dub dialog into another language. Suppose that you could purchase a DVD for home viewing that allowed you to record and replace selected dialog tracks with your own overdubs. The patent describes an interactive system to facilitate such customized viewing.—GLA

6,542,614
43.38.Si BOOMLESS HEARING/SPEAKING CONFIGURATION FOR SOUND RECEIVING MEANS
Heinz Renner, assignor to Koninklijke Philips Electronics N.V.
1 April 2003 (Class 381/370); filed in the European Patent Office 21 March 2001
Certain hands-free communication applications require the user to hear local sounds as well as incoming signals. This requirement can be met by a single headphone having an attached boom microphone. Although the microphone is advantageously close to the user’s mouth, its proximity results in unwanted pickup of pops and air noises as well. Moreover, the assembly is easily dislodged if the user is moving. This patent describes a lightweight, clip-on earpiece that contains not only a headphone but an embedded directional microphone.—GLA

6,549,629
43.38.Tj DVE SYSTEM WITH NORMALIZED SELECTION
Brian M. Finn and Shawn K. Steenhagen, assignors to Digisonix LLC
15 April 2003 (Class 381/92); filed 21 February 2001
DVE stands for digital voice enhancement which, in this case, includes echo cancellation, background noise suppression, and optimal selection of multiple-zone microphones in a hands-free communications system.

6,546,105
43.38.Vk SOUND IMAGE LOCALIZATION DEVICE AND SOUND IMAGE LOCALIZATION METHOD
Takashi Katayama et al., assignors to Matsushita Electric Industrial Company, Limited
8 April 2003 (Class 381/17); filed in Japan 30 October 1998
Using head-related transfer functions (HRTFs) to create virtual sound sources from a pair of loudspeakers is theoretically intriguing but messy in practice. Assuming that some kind of all-purpose HRTFs can be derived, then FIR filter coefficients can be calculated for any angular location. However, computing and/or storing filter coefficients for all possible locations is inefficient and time-consuming. This patent, like a number of earlier inventions, attempts to find a better way. In this case, the angular location of a virtual source is fed to a coefficient control device which then performs digital mathematical operations involving only three predetermined frequency response functions. The process is said to result in a dramatic reduction in memory requirements and computational time, as compared to prior art.—GLA

6,553,121
43.38.Vk THREE-DIMENSIONAL ACOUSTIC PROCESSOR WHICH USES LINEAR PREDICTIVE COEFFICIENTS
Naoshi Matsuo and Kaori Suzuki, assignors to Fujitsu Limited
22 April 2003 (Class 381/17); filed in Japan 8 September 1995
To create convincing three-dimensional audio for computer games, a number of virtual sound sources must be controlled within a virtual sound field. At the same time, the acoustical characteristics of the actual reproducing sound field must be subtracted from the sound source. A bank of large FIR filters seems to be called for, but the patent suggests a more efficient approach. By performing linear predictive analysis of the impulse response of the sound field to be added, the number of taps can be greatly reduced. A similar procedure can be applied to sound sources in motion—in effect, panning between locations rather than creating a plurality of individual sources. The patent is clearly written and includes a great many helpful illustrations.—GLA

6,574,339
43.38.Vk THREE-DIMENSIONAL SOUND REPRODUCING APPARATUS FOR MULTIPLE LISTENERS AND METHOD THEREOF
Doh-hyung Kim and Yang-seock Seo, assignors to Samsung Electronics Company, Limited
3 June 2003 (Class 381/17); filed 20 October 1998
Using head-related transfer functions and some fancy digital filtering, it is possible to create a convincing three-dimensional sound field from two loudspeakers. A major drawback is that the illusion is effectively limited to a single listener at a defined location. We might give the listener a choice of locations by allowing him to select from, say, three filter settings. But suppose that all three settings are selected sequentially at some optimum time interval. Will the result be aural confusion or will the effective sound field be expanded to accommodate multiple listeners? The patent argues for the latter.—GLA

6,577,736
43.38.Vk METHOD OF SYNTHESIZING A THREE DIMENSIONAL SOUND-FIELD
Richard David Clemow, assignor to Central Research Laboratories Limited
10 June 2003 (Class 381/18); filed in the United Kingdom 15 October 1998
A home surround sound system typically uses two or three front speakers plus two rear speakers. When mixing program material for this format, conventional panning techniques work well when moving sound images laterally but cannot create accurate phantom sources between front and rear speakers. Conversely, by making use of head-related transfer functions and

6,587,565
43.38.Vk SYSTEM FOR IMPROVING A SPATIAL EFFECT OF STEREO SOUND OR ENCODED SOUND
Pyung Choi, assignor to 3S-Tech Company, Limited
1 July 2003 (Class 381/98); filed in the Republic of Korea 13 March 1997
This patent describes yet another stereo enhancement method based on filtering and cross-coupling. In this case, signal processing is applied to left and right channels individually. The original left and right signals are ‘‘enhanced’’ at low frequencies but otherwise unmodified. The difference signal is ‘‘enhanced’’ in the high-frequency range.—GLA
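Several of the reviews above (6,535,610 and 6,549,627 in particular) turn on delay-and-sum beamforming: digitized microphone signals pass through tapped delay lines so that sound from a chosen direction adds in phase. A minimal sketch of that idea, assuming a simple linear array and integer-sample delays; none of this code is taken from the patents, and the sampling rate, geometry, and function names are invented here.

```python
import numpy as np

FS = 16000            # sampling rate in Hz; an assumption for this sketch
SPEED_OF_SOUND = 343  # meters per second, roughly, at room temperature

def delay_and_sum(mic_signals, mic_positions_m, steer_angle_rad, fs=FS):
    """Align and average microphone signals so that a plane wave arriving
    from steer_angle_rad (0 = broadside) adds in phase, while sound from
    other directions sums incoherently and is partially canceled."""
    delays = [int(round(fs * x * np.sin(steer_angle_rad) / SPEED_OF_SOUND))
              for x in mic_positions_m]
    delays = [d - min(delays) for d in delays]  # keep all delays >= 0
    n = len(mic_signals[0])
    out = np.zeros(n)
    for sig, d in zip(mic_signals, delays):
        sig = np.asarray(sig, dtype=float)
        if d == 0:
            out += sig
        else:
            out[d:] += sig[:n - d]   # tapped-delay-line shift of d samples
    return out / len(mic_signals)
```

Steered broadside (angle 0), identical signals at the two microphones pass through unchanged; as the steering angle moves away from a source, its copies sum out of phase and are attenuated, which is the directional pickup these patents exploit.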
The Audio Engineering Society has published a 20-disk electronic library containing most of the Journal technical articles, convention preprints, and conference papers published by the AES since its inception through the year 2003. The approximately 10,000 papers and articles are stored in PDF format, preserving the original documents to the highest fidelity possible while permitting full-text and field searching. The library can be viewed on Windows, Mac, and UNIX platforms.

You can purchase the entire 20-disk library or disk 1 alone. Disk 1 contains the program and installation files that are linked to the PDF collections on the other 19 disks. For reference and citation convenience, disk 1 also contains a full index of all documents within the library, permitting you to retrieve titles, author names, original publication name, publication date, page numbers, and abstract text without ever having to swap disks.
AES 25th CONFERENCE

On the 135-meter-high London Eye you can see as far as 25 miles away, and you have a bird’s eye view of such major London sights as St. Paul’s Cathedral, Buckingham Palace, the Houses of Parliament, and Big Ben. Chair John Grant and his committee are planning a conference this June 17–19 that will give you a great view of the critically important subject of metadata. Metadata for Audio will be held at Church House, the conference center that is just a stone’s throw from Westminster Abbey and the Houses of Parliament in central London.

As the means for production and distribution of digital audio proliferate, appropriate metadata tools are needed to facilitate, control, and extend these activities. There has been a great deal of activity in individual organizations to develop metadata tools. However, substantial issues remain to be addressed before the desired goal of global exchange and common understanding can be reached. International standardization, such as the work on MPEG-7 and MPEG-21, may hold some important

TUTORIAL DAY
Gerhard Stoll and Russell Mason, papers cochairs, have targeted a number of papers for tutorial presentations on Thursday, June 17 as a good way to offer attendees a thorough introduction to the subject of metadata. Two invited papers in the first morning session, “Metadata, Identities, and Handling Strategies,” by Chris Chambers, and “Before There Was Metadata,” by Mark Yonge, are introductory papers to set the stage for everything that follows.

The next session, File Basics, has three invited papers: “Introduction to MXF and AAF,” by Philip DeNier; “XML Primer,” by Claude Seyrat; and “Keeping it Simple: BWF and AES31,” by John Emmett.

The first session on Thursday afternoon, Practical Schemes, starts with an invited paper by Philippa Morrell, “The Role of Registries.” The next paper, by researchers from Pompeu Fabra University of Barcelona, will look at a system for managing sound effects. And Richard Wright will present an invited paper on the Dublin Core. The final session on Thursday is a workshop on MPEG-7. This tutorial day is also available as a single-day registration option (see the
Photos courtesy British Tourist Office

[Program-at-a-glance grid omitted; its surviving fragments list Session T-1 (Introduction), Session 1, Session 7 (Broadcast Implementations), Session A (Frameworks), a banquet (optional), and the Houses of Parliament.]
Technical Sessions

Thursday, June 17
TUTORIALS

SESSION T-1: INTRODUCTION

T1-1 Metadata, Identities, and Handling Strategies—Chris Chambers, BBC R&D, Tadworth, Surrey, UK (invited)

With all the potential media material and its associated metadata becoming accessible on IT-based systems, how are systems going to find and associate the elements of any single item? How are the users going to know they have the correct items when assembling audio, video, and information for use within a larger project? This short talk will explore the way areas of our industry are hoping to tackle the problem and some of the standards being introduced to ensure management of this material is possible.

what it can do for the user, and various ways that it can be employed in the area of metadata for audio. The contents of this paper form the basis for a number of the papers that appear later in the conference.

T2-3 Keeping it Simple: BWF and AES31—John Emmett, Broadcast Project Research Ltd., Teddington, Middlesex, UK (invited)

Digital audio is spreading outward to the furthest reaches of the broadcast chain. Making the best use of the opportunities presented by this demands a standardization procedure that is adaptable to a vast number of past, present, and future digital audio formats and scenarios. In addition, would it not be just great if it cost nothing? This paper will point out the benefits of what we already have and tell a tale of borrowing economical audio technology from many sources.
Tools for Content-Based Retrieval and Transformation of Audio Using MPEG-7: The SPOff and the MDTools—Oscar Celma, Emilia Gómez, Fabien Gouyon, Perfecto Herrera, Jordi Janer, David García, University Pompeu Fabra, Barcelona, Spain

In this workshop we will demonstrate three applications for content-based retrieval and transformations of audio recordings. They illustrate diverse aspects of a common framework for music content description and structuring implemented using the MPEG-7 standard. MPEG-7 descriptions can be generated either

This paper addresses the automated extraction of musical meter from audio signals on three hierarchical levels, namely tempo, tatum, and measure length. The presented approach analyzes consecutive segments of the audio signal, each a few seconds in length, and detects periodicities in the temporal progression of the amplitude envelope in a range between 0.25 Hz and 10 Hz. The tatum period, beat period, and measure length are estimated in a probabilistic manner from the periodicity function. The special advantages of the presented method reside in the ability to track tempo also in music with strong syncopated rhythms, and its computational efficiency.
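The meter-extraction idea sketched in the abstract above, finding periodicities of the amplitude envelope between 0.25 Hz and 10 Hz, can be illustrated in a few lines. This is a simplified sketch, not the authors' algorithm: it recovers only a single dominant envelope periodicity, and it is run on a synthetic amplitude-modulated test signal.

```python
import numpy as np

def estimate_beat_frequency(x, sr, f_lo=0.25, f_hi=10.0, frame=512):
    """Return the dominant periodicity (Hz) of the amplitude envelope."""
    n = len(x) // frame
    # Amplitude envelope: mean absolute value per frame
    env = np.abs(x[:n * frame]).reshape(n, frame).mean(axis=1)
    env = env - env.mean()                 # remove DC so it does not dominate
    spec = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(n, d=frame / sr)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return freqs[band][np.argmax(spec[band])]

# Synthetic test: a 440-Hz tone amplitude-modulated at 2 Hz (120 BPM)
sr = 8000
t = np.arange(10 * sr) / sr
x = (1 + np.cos(2 * np.pi * 2.0 * t)) * np.sin(2 * np.pi * 440.0 * t)
print(estimate_beat_frequency(x, sr))      # close to 2.0 Hz
```

A full system would evaluate a periodicity function over the whole band and estimate tatum, beat, and measure length probabilistically; here the FFT peak of the envelope stands in for that function.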
406 J. Audio Eng. Soc., Vol. 52, No. 4, 2004 April
25th International Conference Program
2-2 Percussion-Related Semantic Descriptors of Music Audio Files—Perfecto Herrera1, Vegard Sandvold2, Fabien Gouyon1
1Universitat Pompeu Fabra, Barcelona, Spain
2University of Oslo, Oslo, Norway

Automatic extraction of semantic music content metadata from polyphonic audio files has traditionally focused on melodic, rhythmic, and harmonic aspects. In the present paper we will present several music content descriptors that are related to percussion instrumentation. The “percussion index” estimates the amount of percussion that can be found in a music audio file and yields a (numerical or categorical) value that represents the amount of percussion detected in the file. A further refinement is the “percussion profile,” which roughly indicates the existing balance between drums and cymbals. We finally present the “percussiveness” descriptor, which represents the overall impulsiveness or abruptness of the percussive events. Data from initial evaluations, both objective (i.e., errors, misses, false alarms) and subjective (usability, usefulness), will also be presented and discussed.

2-3 Tonal Description of Polyphonic Audio for Music Content Processing—Emilia Gómez, Perfecto Herrera, Universitat Pompeu Fabra, Barcelona, Spain

The purpose of this paper is to describe a system that automatically extracts metadata from polyphonic audio signals. This metadata describes the tonal aspects of music. We use a set of features to estimate the key of the piece and to represent its tonal structure, but they could also be used to measure the tonal similarity between two songs, to perform key-based segmentation, or to establish a tonal structure of a piece.

2-4 Phone-Based Spoken Document Retrieval in Conformance with the MPEG-7 Standard—Nicolas Moreau, Hyoung-Gook Kim, Thomas Sikora, Technical University of Berlin, Berlin, Germany

This paper presents a phone-based approach to spoken document retrieval, developed in the framework of the emerging MPEG-7 standard. The audio part of MPEG-7 encloses a SpokenContent tool that provides a standardized description of the content of spoken documents. In the context of MPEG-7, we propose an indexing and retrieval method that uses phonetic information only and a vector space IR model. Experiments are conducted on a database of German spoken documents with ten city name queries. Two phone-based retrieval approaches are presented and combined. The first one is based on the combination of phone N-grams of different lengths used as indexing terms. The other consists of expanding the document representation by means of the phone confusion probabilities.

2-5 Efficient Features for Musical Instrument Recognition on Solo Performances—Slim Essid, Gaël Richard, Bertrand David, GET-Télécom Paris (ENST), Paris, France

Musical instrument recognition is one of the important goals of musical signal indexing. While much effort has already been dedicated to such a task, most studies were based on limited amounts of data that often included only isolated musical notes. In this paper we address musical instrument recognition on real solo performances based on larger training and test sets. A highly efficient set of features is proposed that is obtained from the signal cepstrum but also from low- and higher-order spectral statistical moments describing the signal spectral shape. The use of principal component analysis in conjunction with support vector machine classification yields nearly perfect recognition accuracy on varied musical solo phrases from ten instruments drawn from different instrument families.

Friday, June 18
SESSION CD-3: TOOLKITS

3-1 Digital Media Project—R. Nicol, BT, Ipswich, UK (invited)
[Abstract Not Available at Press Time]

3-2 MPEG-21: What and Why—Jan Bormans1, Kate Grant2 (invited)
1IMEC, Leuven, Belgium
2Nine Tiles, Cambridge, UK

The MPEG-21 vision is to define a multimedia framework to enable transparent and augmented use of multimedia resources across a wide range of networks and devices used by different communities. The technical report “Vision, Technologies and Strategy” describes the two basic building blocks: the definition of a fundamental unit of distribution and transaction (the digital item) and the concept of users interacting with digital items. The digital items can be considered the “what” of the multimedia framework (e.g., a video collection, a music album), and the users can be considered the “who” of the multimedia framework. MPEG-21 is developing a number of specifications enabling the integration of components and standards to facilitate harmonisation of “technologies” for the creation, modification, management, transport, manipulation, distribution, and consumption of digital items. This paper will explain the relationship of the different MPEG-21 specifications by describing a detailed use-case scenario.

3-3 A 3-D Audio Scene Description Scheme Based on XML—Guillaume Potard, Ian Burnett, University of Wollongong, NSW, Australia

An object-oriented schema for describing time-varying 3-D audio scenes is proposed. The creation of this schema was motivated by the fact that current virtual reality description schemes (VRML, X3D) have only basic 3-D audio description capabilities. In contrast, MPEG-4 AudioBIFs have advanced 3-D audio features but are not designed as a metadata language. MPEG-4 BIFs are particularly targeted as a binary scene description language for scene rendering purposes only. Our proposed 3-D audio scene description schema features state-of-the-art 3-D audio description capabilities while being usable both as a metadata scheme for describing 3-D audio content (for example, 5.1 or Ambisonics B-format) and as a format for scene rendering.

Friday, June 18
SESSION CD-4: FEATURE EXTRACTION, SESSION A

4-1 A System for Harmonic Analysis of Polyphonic Music—Claas Derboven, Markus Cremer, Fraunhofer IIS AEMT, Ilmenau, Germany

A system for harmonic analysis of polyphonic musical signals is presented. The system uses a transform with a nonuniform frequency resolution for the extraction of prominent tonal components and determines the key and the contained chords of a musical input signal with high accuracy. A statistical approach based on the
frequency of occurrence of musical notes for determining the key is described. An algorithmic solution for chord determination is presented with a concise explanation. Finally, a qualitative evaluation of the system’s performance is conducted to demonstrate the applicability to real-world audio signals.

4-2 Robust Identification of Time-Scaled Audio—Rolf Bardeli, Frank Kurth, University of Bonn, Bonn, Germany

Automatic identification of audio titles on radio broadcasts is a first step toward automatic annotation of radio programs. Systems designed for the purpose of identification have to deal with a variety of postprocessing potentially imposed on audio material at the radio stations. One of the more difficult techniques to be handled is time-scaling, i.e., the variation of playback speed. In this paper we propose a robust fingerprinting technique designed for the identification of time-scaled audio data. This technique has been applied as a feature extractor to an algebraic indexing technique that has already been successfully applied to the task of audio identification.

4-3 Computing Structural Descriptions of Music through the Identification of Representative Excerpts from Audio Files—Bee Suan Ong, Perfecto Herrera, Universitat Pompeu Fabra, Barcelona, Spain

With the rapid growth of audio databases, many music retrieval applications have employed metadata descriptions to facilitate better handling of huge databases. Music structure creates a unique identity for each musical piece. Therefore, structural description is capable of providing a powerful way of interacting with audio content and serves as a link between low-level description and higher-level descriptions of audio (e.g., audio summarization, audio fingerprinting, etc.). Identification of representative musical excerpts is the primary step toward the goal of generating structural descriptions of audio signals. In this paper we discuss various approaches to identifying representative musical excerpts of music audio signals and propose to classify them into a few categories. Pros and cons of each approach will also be discussed.

Friday, June 18
SESSION CD-5: POSTERS, PART 2

5-1 Toward Describing Perceived Complexity of Songs: Computational Methods and Implementation—Sebastian Streich, Perfecto Herrera, Universitat Pompeu Fabra, Barcelona, Spain

Providing valuable semantic descriptors of multimedia content is a topic of high interest in current research. Such descriptors should merge the two predicates of being useful for retrieval and being automatically extractable from the source. In this paper the semantic descriptor concept of music complexity is introduced. Its benefit for music retrieval and automated music recommendation is addressed. The authors provide a critical review of existing methods and a detailed prospect of new methods for automated music complexity estimation.

5-2 How Efficient Is MPEG-7 for General Sound Recognition?—Hyoung-Gook Kim, Juan José Burred, Thomas Sikora, Technical University Berlin, Berlin, Germany

Our challenge is to analyze/classify video sound track content for indexing purposes. To this end we compare the performance of MPEG-7 audio spectrum projection (ASP) features based on several basis decomposition algorithms vs. mel-scale frequency cepstrum coefficients (MFCC). For basis decomposition in the feature extraction we evaluate three approaches: principal component analysis (PCA), independent component analysis (ICA), and non-negative matrix factorization (NMF). Audio features are computed from these reduced vectors and are fed into a hidden Markov model (HMM) classifier. We found that established MFCC features yield better performance compared to MPEG-7 ASP in general sound recognition under practical constraints.

5-3 Automatic Optimization of a Musical Similarity Metric Using Similarity Pairs—Thorsten Kastner, Eric Allamanche, Oliver Hellmuth, Christian Ertel, Marion Schalek, Jürgen Herre, Fraunhofer IIS, Ilmenau, Germany

With the growing amount of multimedia data available everywhere and the necessity to provide efficient methods for browsing and indexing this plethora of audio content, automated musical similarity search and retrieval has gained considerable attention in recent years. We present a system which combines a set of perceptual low-level features with appropriate classification schemes for the task of retrieving similar sounding songs in a database. A methodology for analyzing the classification results to avoid time-consuming subjective listening tests for an optimum feature selection and combination is shown. It is based on a calculated “similarity index” that reflects the similarity between specifically embedded similarity pairs. The system’s performance as well as the usefulness of the analyzing methodology is evaluated through a subjective listening test.

5-4 Automatic Extraction of MPEG-7 Metadata for Audio Using the Media Asset Management System iFinder—Jobst Löffler, Joachim Köhler, Fraunhofer IMK, Sankt Augustin, Germany

This paper describes the MPEG-7 compliant media asset management system iFinder, which provides a set of automatic methods and software tools for media analysis, archiving, and retrieval. The core technology of iFinder comprises several modules for audio and video metadata extraction that are bundled in the iFinderSDK, a commercial product offered to the media industry. The workflow for audio content processing together with the pattern recognition methods used will be presented. Of special note, a technique for precise audio-text alignment together with a browser application for synchronized display of retrieval results will be demonstrated. An insight into using MPEG-7 as a standardized metadata format for media asset management will be provided from a practical point of view.

5-5 An Opera Information System Based on MPEG-7—Oscar Celma Herrada, Enric Mieza, Universitat Pompeu Fabra, Barcelona, Spain

We present an implementation of the MPEG-7 standard for a multimedia content description of lyric opera in the context of the European IST project OpenDrama. The project goals are the definition, development, and integration of a novel platform to author and deliver the rich cross-media digital objects of lyric opera. MPEG-7 has been used in OpenDrama as the base technology for a music information retrieval system. In addition to the MPEG-7 multimedia description scheme, different classification schemes have been proposed to deal with operatic concepts such as musical forms (acts, scenes, frames, introduction, etc.),
9-3 Integration of Audio Computer Systems and Archives Via the SAM/EBU Dublin Core Standard, Tech. doc 3293—Lars Jonsson1, Gunnar Dahl2
1Swedish Radio
2KSAD, Norsk Rikskringkasting, Oslo, Norway

Dublin Core is a well-known metadata initiative that has been widely adopted for text and Web pages on the Internet. The Scandinavian SAM-group, with 25 archive specialists and engineers, has defined semantic definitions and converted the commonly used Dublin Core initiative for general use within the audio industry. The 15 basic elements of Dublin Core and new subsets have proven to cover most of the tape protocols and database fields existing in the broadcast production chain, from early capturing over various types of production all the way to distribution and archiving. This presentation covers some examples of the use of metadata transfer with Dublin Core expressed in XML in Sweden and Norway. It ends in a discussion of the future possibilities of Dublin Core in comparison with other existing metadata initiatives.

10-3 Audio Meta Data Generation for the Continuous Media Web—Claudia Schremmer1, Steve Cassidy2, Silvia Pfeiffer1
1CSIRO, Epping, NSW, Australia
2Macquarie University, Sydney, Australia

The Continuous Media Web (CMWeb) integrates time-continuous media into the searching, linking, and browsing functions of the World Wide Web. The file format underlying the CMWeb technology, Annodex, streams the media content multiplexed with metadata in CMML format that contains information relevant to the whole media file (e.g., title, author, language) as well as time-sensitive information (e.g., topics, speakers, time-sensitive hyperlinks). This paper discusses the problem of generating Annodex streams from complex linguistic annotations: annotated recordings collected for use in linguistic research. We are particularly interested in automatically annotated recordings of meetings and teleconferences and see automatically generated CMML files as one way of viewing such recordings. The paper presents some experiments with generating Annodex files from hand-annotated meeting recordings.
INTRODUCTION

Horns and direct-radiating systems have provided the basis for sound reinforcement for more than a century. Both technologies have benefited from engineering and manufacturing improvements as well as demands for pushing the performance envelope. Trends of fashion have often intersected with engineering development, economics, and even marketplace opportunism. A survey tutorial of the significant developments in transduction, signal transmission, and system synthesis is presented here and discussed in historical perspective.

We begin with an overview of sound reinforcement and the technologies that have supported it. This is followed by more detailed technical discussions of both direct radiating and horn systems, leading to a discussion of modern loudspeaker array techniques. The presentation ends with a comprehensive bibliography.

HISTORICAL PERSPECTIVES

In the early days of transducer development, horn systems offered the only means possible for achieving suitable levels for speech reinforcement. Early power amplifiers were limited to about 10 watts output capability, and horn-driver efficiencies on the order of 20% to 30% were necessary to reach the desired sound pressure levels.

The direct field level, referred to a distance of one meter, produced by one acoustic watt radiated omnidirectionally from a point source in free space is 109 dB LP [1, p. 314]. If we use a 10-watt amplifier with a horn–driver combination that is 30% efficient, we can produce three acoustical watts. If the horn has a directivity index (DI) on axis of, say, 10 dB, then we can increase that level to:

Level (re 1 meter) = 109 + 10 log (3) + DI = 124 dB LP

At a more realistic listening distance of 10 meters, the level would be, by the inverse square relationship, 20 dB lower, or 104 dB. If wider coverage is needed, more horns can be added and splayed as required.

There is little documentation of early examples of general speech reinforcement, and that art progressed fairly slowly [9]. The first example of large-scale sound reinforcement occurred on Christmas Eve, 1915, when E. S. Pridham, cofounder of the Magnavox company, played Christmas carols for an audience of 50,000 using Pridham-Jensen rocking armature transducers connected to phonograph horns [12]. Western Electric set up a public address system capable of addressing 12,000 persons through 18 loudspeakers in 1916 [24, p. 24].

The first distributed system was employed in 1919; 113 balanced armature driving units mounted on long horns were strung along New York City’s Park Avenue “Victory Way” as a part of a Victory Bond sale [24, p. 25] [2], as shown in Fig. 1. The first successful indoor use of a public address system was at the 1920 Chicago Republican Convention, which also employed the first central cluster configuration [24, p. 25], as shown in Fig. 2. On March 4, 1921, President Harding’s inauguration was amplified [24, p. 24], and on November 11, 1921, President Harding’s address in Arlington, Virginia, was transmitted by Western Electric, using Edgerton’s 1918 design of four-air-gap balanced-armature units. For the first time 150,000 people, at Madison Square Garden in New York, in the adjoining park, and in the Civic Auditorium in San Francisco, simultaneously listened to a person speaking [2].

It was the cinema that paved the way

*Revised and expanded from a presentation at the Institute of Acoustics 12th Annual Weekend Conference, Windermere, England, October 25-27, 1996.
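The direct-field level arithmetic above can be restated as a small calculation. This is a sketch; the 109-dB one-acoustic-watt, one-meter reference and the example figures (10-watt amplifier, 30% efficiency, 10-dB DI) are taken from the text.

```python
import math

def spl_from_amp(power_w, efficiency, di_db, distance_m):
    """Direct-field SPL from amplifier power, horn-driver efficiency,
    and on-axis directivity index.  One acoustic watt radiated
    omnidirectionally in free space gives 109 dB SPL at 1 m."""
    acoustic_w = power_w * efficiency
    level_1m = 109.0 + 10.0 * math.log10(acoustic_w) + di_db
    return level_1m - 20.0 * math.log10(distance_m)  # inverse square law

print(round(spl_from_amp(10, 0.30, 10, 1)))    # 124 dB at 1 m
print(round(spl_from_amp(10, 0.30, 10, 10)))   # 104 dB at 10 m
```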
Fig. 8. The cone driver. Section view (A); equivalent circuit (B); radiation impedance of a cone in a large wall (C); power response of system (D).

sients and other telegraphic signals. He remarked at the time that it could be used “for moving visible and audible signals.”

Half a century later in 1925, Rice and Kellogg of General Electric described “a new hornless loudspeaker” that resembled that of Siemens—a similarity that prompted Rice to say: “The ancients have stolen our inventions!” [37].

The key difference in the Rice and Kellogg design was the adjustment of mechanical parameters so that the fundamental resonance of the moving system took place at a lower frequency than that at which the cone’s radiation impedance had become uniform. Over this range, the motion of the cone was mass controlled, and the cone looked into a rising radiation impedance. This in effect provided a significant frequency region of flat power response for the design. Details of this are shown in Fig. 8.

Region of Flat Power Response
Fig. 8A shows a section view of the cone loudspeaker with all electrical, mechanical, and acoustical parameters labeled. The equivalent circuit is shown in Fig. 8B; here the mechanical and acoustical parameters are shown in the mobility analogy.

When mounted in a large baffle, the moving system looks into a complex acoustical load as shown in Fig. 8C. The resistive component rises with frequency to approximately ka = 2, above which point it is essentially constant (ka is equal to cone circumference divided by wavelength, or 2πa/λ). System response is shown in Fig. 8D, and system efficiency over the so-called piston band is given [49] as:

Fig. 9. Illustration of mutual coupling of LF drivers.
HISTORICAL PERSPECTIVES AND TECHNOLOGY OVERVIEW
OF LOUDSPEAKERS FOR SOUND REINFORCEMENT
η = [ρ0(Bl)²SD²/RE] / (2πcMMS²)     (1)
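Equation (1) can be checked numerically. The driver parameters in the sketch below (Bl, SD, RE, MMS) are invented for illustration, not taken from the article; the two-driver cases apply the bookkeeping described in the text for identical drivers wired in parallel and in series, and in both cases the efficiency doubles.

```python
import math

RHO0, C = 1.18, 345.0  # air density (kg/m^3) and speed of sound (m/s)

def efficiency(bl, sd, re, mms):
    """Eq. (1): eta = rho0*(Bl)^2*SD^2 / (2*pi*c * RE * MMS^2)."""
    return RHO0 * bl**2 * sd**2 / (2 * math.pi * C * re * mms**2)

# Illustrative single-driver parameters
bl, sd, re, mms = 15.0, 0.085, 6.0, 0.070
eta1 = efficiency(bl, sd, re, mms)

# Two identical drivers in parallel: 2x cone area, 2x moving mass, RE/2
eta2p = efficiency(bl, 2 * sd, re / 2, 2 * mms)
# Two in series: 2x area, 2x mass, 2x RE, and (Bl)^2 quadrupled
eta2s = efficiency(2 * bl, 2 * sd, 2 * re, 2 * mms)

print(round(eta2p / eta1, 6), round(eta2s / eta1, 6))  # 2.0 2.0
```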
Fig. 10. The ported system. Section view (A); equivalent circuit (B); port and cone contributions to total output (C).

Mutual Coupling
In the LF range over which their response is essentially omnidirectional (ka = 0.2 or lower), a doubling of closely spaced driving units will result in an increase in acoustical output of 3 dB for a fixed input power reference level [39, 48, 52, 53]. The progression in efficiency increase is shown in Fig. 9 for one, two, and four LF transducers, respectively. In each case, the electrical power delivered to each ensemble of drivers is constant. Assume that the reference power fed to the single driver is one watt; then for the set of two drivers, the power per driver is 0.5 watt, and for the set of four, the power per driver is 0.25 watt.

Fig. 11. Power compression in a 380-mm-diameter LF driver. Curves for 1 and 100 watts input are superimposed and displaced by 20 dB. (Data courtesy JBL.)

One may imagine that, in the two-driver case with both drivers wired in parallel, those two drivers have, in a sense, coalesced into a new driver—one with twice the cone area, twice the moving mass, and half the value of RE. Thus, by Equation 1, the efficiency will have doubled. For the case where the two drivers are wired in series, the analysis goes as follows: The new driver has twice the cone area, twice the moving mass, four times the (Bl)2 product, and twice the value of RE. Again, by Equation 1, there will be a doubling of efficiency.

Mutual coupling often appears to give something for nothing, but there are clear limits to its effectiveness. With each doubling of cone area, the ka = 0.2 upper response frequency corner moves downward approximately by a factor of 0.7, since this is the reciprocal of the value by which the effective cone radius has increased. As the process of adding drivers is continued, in the limit it can be shown that the efficiency of an ensemble of direct radiators cannot exceed a value of 25% [38]. Because of these constraints, the approximation of power doubling for each two-times increase in drivers is accurate only at very low frequencies and only if the efficiency values are low to begin with.

Distortion

Mechanical Effects
The primary distortion mechanism in cone transducers is due to mechanical stress–strain limits. Small identified a practical mechanical displacement limit from rest position in the axial
direction as the excursion at which Below that frequency the electrical to thermal failure. Thermal failure is
10% harmonic distortion is reached. drive signal is generally rolled off to reached when the power dissipated in
This limit is known as xMAX. While a avoid subsonic over-excursions of the the voice coil as heat cannot be
loudspeaker may be operated beyond cone. removed at a sufficient rate to maintain
this displacement limit, at least on a A secondary mechanical distortion a safe operating temperature. A great
momentary basis, the 10% linearity effect will be seen when the voice coil deal of loudspeaker development has
departure is generally recognized as a driver is far enough out of the gap so gone into designing structures and
safe limit for good engineering prac- that there is a momentary loss of Bl moving elements that are not only
tice. Since cone displacement tends to product at peak excursion values. The resistant to heat but aid in its removal
increase as the inverse square of frequency down to the f0 region, it is easy to see how the xMAX limitation may easily be encountered in normal operation.

The onset of the cone displacement limit at low frequencies can be alleviated by using ported LF enclosures. The nature of this design is shown in Fig. 10. A section view of a ported system is shown in Fig. 10A, and the equivalent circuit is shown in Fig. 10B. The design relies on controlling the Helmholtz resonance of the enclosure to provide an "assisted output," via the port, that minimizes cone motion (and thus distortion) at low frequencies, as shown in Fig. 10C. Thiele–Small parameters are universally used today to synthesize these systems.

Virtually all commercial PC design programs for ported systems will indicate transducer displacement limits, so that the design engineer will always be aware of whether a system, while still on the drawing board, will go into displacement overload before it reaches its thermal limit. Good engineering practice demands that a ported system remain thermally limited down to f0.

effect is asymmetrical and gives rise to both even and odd distortion components.

Port Turbulence in Vented Systems
In vented systems the ultimate output at low frequencies may be limited not by considerations of maximum cone excursion, but rather by air turbulence in the enclosure port when the system is operating at the tuning frequency [34]. A tentative limit here is to restrict the port air particle velocity so that its peak value does not exceed about 5% of the speed of sound. In general, ports should be designed with contoured boundaries to minimize turbulence and the noise and losses it often produces. Significant studies of port turbulence and its minimization through tapering the port tube's cross-section area have been carried out by Vanderkooy [50, 51], Salvatti and Button [45], and Roozen et al. [44].

Thermal Effects
Modern cone transducers intended for heavy-duty professional applications take advantage of newer materials and adhesives to make them more immune […] from the transducer [32, 36].

For most applications in sound reinforcement, the effects of loudspeaker heating are more likely to result in component failure than those associated with displacement limitations. Dynamic linearity or power compression are terms used to describe the effects of heating on audio performance [34]. The data shown in Fig. 11 presents the frequency response of a single 380-mm LF transducer with inputs of 1 watt and 100 watts. In each case the chart recording of the levels has been adjusted to account for the 20-dB offset between the curves. In this manner the response differences can be clearly seen. If there were no dynamic compression, the two curves would lie one on top of the other. As it is, the progressive heating results in an increased value of RE, which lowers the efficiency. In extreme cases, the increase in RE can result in changes in the LF alignment, which may be clearly audible as such.

Another way of viewing power compression is shown in Fig. 12. Here, several 380-mm transducers have been driven with a wide-band signal, and the
418 J. Audio Eng. Soc., Vol. 52, No. 4, 2004 April
HISTORICAL PERSPECTIVES AND TECHNOLOGY OVERVIEW
OF LOUDSPEAKERS FOR SOUND REINFORCEMENT
effectively cancel. For useful output from the front of the transducer, the signal fed to the system must be equalized with a 6-dB-per-octave rise for each halving of frequency to compensate for the diminishing gradient. The equivalent physical circuit of the loudspeaker is shown in Fig. 18B, and off-axis polar data is shown in Fig. 18C. A system such as this would normally be used for speech purposes in highly reverberant spaces where the loudspeaker's DI of 6 dB at LF would work to its advantage. Vertical stacks of the device can increase total output capability as well as increase the on-axis DI.

HORNS AND COMPRESSION DRIVERS

Early Development
Many engineers and physicists have contributed to horn and compression driver development over the years. Early versions of the horn were used by many tinkerers who basically did not understand how the horn worked—they knew only that somehow the horn increased acoustical output [58]. The first example of thorough engineering was carried out by Bell Telephone Laboratories [83], working from the model of horn impedance described by Webster [81]. Significant later development was carried out by Klipsch [66], who designed a remarkably compact bass horn, and Salmon [76, 77], who described the impedance characteristics of several important horn flares, including the hyperbolic, or Hypex, profile [68]. Geddes [55] sought to position Webster's model as a special case within a broader context.

Fig. 19 shows the real part of the radiation impedance for hyperbolic, exponential, and conical horn profiles. Here, only the exponential and hyperbolic profiles provide useful output at low frequencies. In our discussion we will restrict ourselves to the exponential profile, since it has found almost universal application over the years.

Fig. 20A shows the real and imaginary parts of throat impedance for a long exponential horn. For a horn of practical length, we might observe

Fig. 20. Real and imaginary components of horn impedance for a long exponential horn (A) and a short exponential horn (B).

Theoretical Modeling
The compression driver is designed to match the impedance of the electromechanical system to the throat of the horn, and the efficiency, in terms of the radiation resistance RET reflected to the electrical side of the circuit, is:

η = 2RERET/(RE + RET)^2   (4)

where RE is the voice coil resistance. When the voice coil resistance is made equal to the radiation resistance, the efficiency of the driver over its normal pass-band will in theory be 50%. In practice, efficiencies of the order of 30% can be achieved in the midrange—and this is only about 2 dB below the theoretical maximum.

Region of Flat Power Output
The data of Fig. 21 shows the normal power response for a compression driver/horn combination when the horn's throat impedance is resistive. The LF limit is due to the primary resonance of the driver; for a typical HF compression driver this may be in the range of 500 Hz.

Fig. 21. Plane wave tube measurements of a compression driver showing amplitude response and impedance.

The principal midband rolloff commences at what is called the mass break point, fHM, given by:

fHM = (Bl)^2/(πREMMS)   (5)

where MMS is the mass of the moving system. For most HF compression drivers the mass breakpoint takes place in the range of 2500 to 4500 Hz. It is
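Eqs. (4) and (5) are straightforward to evaluate. The sketch below checks the 50% maximum of eq. (4) and computes a mass break frequency from eq. (5); the driver values used (Bl, RE, MMS) are hypothetical, chosen only for illustration:

```python
import math

def driver_efficiency(r_e, r_et):
    """Eq. (4): compression driver efficiency, where r_e is the voice coil
    resistance and r_et the radiation resistance reflected to the electrical
    side.  The maximum is 0.5 (50%) when the two are equal."""
    return 2.0 * r_e * r_et / (r_e + r_et) ** 2

def mass_break_frequency(bl, r_e, m_ms):
    """Eq. (5): f_HM = (Bl)^2 / (pi * R_E * M_MS), the frequency above which
    the moving mass m_ms (kg) rolls off the driver's power response."""
    return bl ** 2 / (math.pi * r_e * m_ms)
```

With the matched condition, driver_efficiency(6.0, 6.0) returns 0.5; hypothetical values of Bl = 12 T·m, RE = 6 Ω, and MMS = 2.5 g give fHM of about 3.1 kHz.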
considered a fundamental limit in HF drivers, inasmuch as today's magnetic flux densities are normally maximized in the range of 2 tesla, and low moving mass is limited by metals such as titanium and beryllium that are not likely to be replaced in the near future.

Two additional inflection points are often seen in the HF driver response curve. One is due to the volume of the front air chamber in the driver, the space between the diaphragm and the phasing plug; its effect on response may be seen as low as 8 kHz in some drivers. Voice coil inductance may cause an additional rolloff at high frequencies. This may be compensated for through the use of a silver or copper shorting ring plated on the polepiece in the region of the voice coil. (See Distortion in Transducer Magnetic Systems, page 419.)

Cone-Driven Horns
From the earliest days, cone transducers have been employed as horn drivers [71, 72]. The theoretical principles that govern the design parameters for horn drivers apply equally to the adaptation of cone drivers as well as to purpose-designed compression drivers. Keele [60] presented a straightforward and useful analysis of LF horn design using both Thiele–Small and traditional electromechanical parameters. Leach [69] summarized Keele's work, together with Small's approach to the subject [78], and addressed other factors such as reactance annulling.

Reactance Annulling
In some compression driver designs, a mechanical stiffness in the form of a small air chamber is located behind the driver's diaphragm. The mechanical reactance resulting from the stiffness cancels in part the mass reactance portion of the radiation impedance, resulting in a more resistive impedance in the region of the cutoff frequency. The effect of this is greater acoustic output in the horn cutoff frequency range for a given drive signal [73, 74]. Reactance annulling is not normally used in HF compression drivers, but it is used in the design of bass horns, most notably in the case of the Klipschorn, where the normal response associated with a 47-Hz flare rate is extended down to about 40 Hz [66].

Distortion
The dominant cause of distortion in compression driver–horn systems is thermodynamic, or air, overload [75]. This comes as a result of the extremely high pressures that exist at the horn throat:

LP = 94 + 20 log (WAρ0c/ST)^0.5   (6)

where WA is the acoustical power generated and ST is the throat area (m^2). For example, in plane wave propagation, an intensity of one watt per square centimeter will produce a sound pressure level of 160 dB LP. For levels in this range, successive pressure peaks are tilted forward as they propagate down the horn due to the increase in sound velocity at elevated temperatures under adiabatic conditions.

Thuras, Jenkins, and O'Neil [80] and Goldstein and McLachlan [57] analyzed the problem, leading to a simplified equation that gives the percent second-harmonic distortion in horn systems:

% 2nd HD = 1.73(f/fC)√IT × 10^-2   (7)

where IT is the intensity in watts per square meter at the horn's throat, f is the driving frequency, and fC is the cutoff frequency of the horn.

Fig. 22 presents measurements of the second-harmonic distortion produced by two horns of differing flare rates.

Fig. 22. Second-harmonic distortion in two horn systems, using a compressed fundamental. (Data courtesy JBL.)

Fig. 23. Plane wave tube amplitude response of three compression drivers.
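Eq. (7) is simple to evaluate in SI units. A minimal sketch (the numeric values below are hypothetical, not taken from Fig. 22):

```python
import math

def pct_second_harmonic(f, f_c, i_t):
    """Eq. (7): percent second-harmonic distortion in an exponential horn.

    f   : driving frequency (Hz)
    f_c : horn cutoff frequency (Hz)
    i_t : intensity at the horn throat (W/m^2)
    """
    return 1.73 * (f / f_c) * math.sqrt(i_t) * 1e-2
```

Since distortion grows with f/fC, a horn with a lower cutoff frequency (slower flare) produces more second harmonic at a given frequency and throat intensity, the trend compared for two flare rates in Fig. 22.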
kind of distortion can be defined mathematically, a model can be implemented and used to predistort the signal, resulting in reduced distortion in the system's output over a given power operating range. Klippel [65] describes some of the techniques for accomplishing this.
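As a toy illustration of the predistortion idea (not Klippel's actual algorithm), suppose the dominant nonlinearity is a known memoryless quadratic y = x + a·x². Driving the system with a first-order inverse removes the dominant second-harmonic term, leaving a residual of order a²:

```python
def nonlinear_system(x, a):
    """Toy model of the distorting system: y = x + a*x^2 (a assumed known)."""
    return x + a * x * x

def predistort(x, a):
    """First-order inverse x' = x - a*x^2; feeding x' through the system
    cancels the a*x^2 term up to a residual of order a^2."""
    return x - a * x * x
```

For a = 0.05 and x = 1, the raw system errs by 0.05 while the predistorted chain errs by about 0.005, roughly a tenfold reduction.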
Fig. 25. Beamwidth and directivity data for typical radial horn (A) and slant-plate lens system (B).
diaphragm with a half-roll surround. Note that, due to the greater stiffness and lower mass of the material, the secondary resonance has shifted out to about 17 kHz. Driver C has an aluminum diaphragm with distributed surround geometry that moves the secondary resonance to beyond 20 kHz, resulting in smooth, extended response within the normal audio band with no pronounced peaks [70].

Directional Response
The basic exponential horn exhibits directional response as shown in Fig. 24. From the earliest days it was recognized that directional characteristics were a key element of loudspeaker performance [84]. Over decades of development, numerous methods have been used to improve directional performance at high frequencies for sound reinforcement applications.

For multicellular horns, in the early days [83], groups of exponential cells, each about 15° wide in each plane, were clustered together to define a specific solid radiation angle; this produced excellent results at mid frequencies, but there was pronounced "fingering" of the response along the cell boundaries at higher frequencies.

For radial horns, in this application, the horn's horizontal profile is conical, with straight, radial sides defining a target coverage angle, and the vertical profile is tailored to make a net exponential profile along the horn's primary axis; the nominal horizontal and vertical -6 dB beamwidth of a radial horn is shown in Fig. 25A [71].

For acoustical lenses, a slant-plate acoustical lens can be placed at the mouth of an exponential horn to diverge the exiting waves in one dimension, as shown in Fig. 25B [54, 67].

Constant Directivity Horns
Also known as uniform coverage or constant coverage horns, these designs date from the mid-1970s to the early 1980s [59, 62, 79]. The basic design common to a number of manufacturers uses a combination of exponential or conical throat loading, diffraction waveguide principles, and flared terminations to produce uniform nominal coverage angles in the horizontal and vertical planes. The general shape of the beamwidth curve is shown in Fig. 26A, as it applies to the horizontal and vertical planes independently. Fig. 26B shows the measured beamwidth and DI of a typical constant directivity horn with nominal 90°-by-40° pattern control. Within certain limitations,
Fig. 28. Directivity of a four-element vertical line array with 0.2 meter separation between driver centers. 200 Hz (A); 350 Hz
(B); 500 Hz (C); 1 kHz (D); directivity factor for arrays of 4, 6, 8, and 10 elements (E).
acoustic waveguide theory has proposed an alternate approach to achieving similar goals [55, 56].

ARRAYS
Both horns and direct radiators may be treated the same in terms of arraying. In this section we will examine some useful concepts. Single-element, line, and planar arrays differ in their radiation characteristics over distance, as shown in Fig. 27. At long wavelengths, the simple inverse square relationship of a point source A is modified by a line array as shown at B, and a very large planar array will show little attenuation with distance up to limits proportional to the array dimensions [98]. A finite planar array will have the characteristics shown at D. Long horizontal line arrays have been placed above prosceniums in performance spaces to extend the range of a high direct-to-reverberant ratio toward the rear of the space; large planar arrays are the mainstay of mega-event music reinforcement [87].

The Simple Line Array
Kuttruff [95] describes the polar response of a line array of omnidirectional sources in the plane of the array as:

R(θ) = sin[(1/2)Nkd sin θ] / (N sin[(1/2)kd sin θ])   (8)

where N is the number of elements in the array, k is 2πf/c, d is the spacing of the elements in the array, c is the speed of sound, and θ is the measurement angle in radians. For four elements as shown in Fig. 28, the polar response
Fig. 29. Tapering of arrays. Electrical tapering (A); acoustical tapering (B); tapering through component rotation (C).

Fig. 31. Grateful Dead "Wall of Sound" direct radiator system, 1974, using line, planar, and arc segment arrays. Individual discrete systems were employed for each instrument separately from the vocal reinforcement. (Photo courtesy Richard Pechner.)

Fig. 32. A large array for music reinforcement. Physical layout (A); off-axis response on the ground plane (B). (Data courtesy Gander and Eargle.)

Fig. 35. A programmable array. View of array (A); signal flow diagram (B); examples of variable polar response (C). (Data courtesy Eastern Acoustics Works.)
sound pressure levels at considerable distances with relatively low distortion. The primary defect is the dense comb filtering (lobing) and "time smearing" that inevitably result from such a multiplicity of sources covering a given listening position. Actually, since the required acoustic power cannot be achieved by a single source, the aim here should be to keep the coverage at each listening position as uniform as possible. The greater the number of effective sound sources, the finer the lobing patterns and the more uniform the received signal will be. Ideally, we would like the interference peaks and dips among the elements to be well within the ear's critical bandwidths.

Continuous Line Arrays
A continuous line array is an approximation of a uniformly illuminated "ribbon" of sound, and its directional behavior in the far field can be determined by equation (8). At low frequencies, the far field for a straight line array begins approximately at a distance equal to the array length divided by π [98]. For progressively shorter wavelengths, this distance increases according to the following equation:

r = l^2 f/700   (9)

where l is the array length, f is the frequency in Hz, and r and l are in meters [99].

Fig. 33A shows the attenuation pattern with distance from a straight 3-meter array at 10 kHz, and the polar response in the far field of that array is shown at B. Note that the -6-dB beamwidth is 0.8°, as given by the equation:

θ(line array) = 2 sin^-1(0.6λ/l)   (10)

where l is the length of the array and λ is the wavelength [99].

The pronounced HF beaming of straight arrays is of course a liability, and articulation of the array is one way of dealing with the problem.

J and Spiral Arrays
Ureda [100] describes a J array as having two distinct portions: a straight upper section and a uniformly curved lower section. Each segment of the array will act independently, with the upper section producing a highly directive HF beam for distant coverage and the lower section producing broader radiation for near coverage.

Ureda proposes the spiral array for more uniform overall coverage. The spiral array is continuously curved from the top down, beginning with small angular increments, which increase downward in arithmetic fashion. Fig. 34A shows a side view of such an array with a total length of 5 meters and a terminal angle of 45°. The directivity function is remarkably uniform with frequency over about a decade. Fig. 34B shows a group of polar plots from 500 Hz to 5 kHz.

Steerable Arrays
The very large arrays for music discussed earlier consist of elements equally driven in terms of level and bandwidth, and the directional properties are due entirely to the spatial relationships among the elements. A steerable array is one in which the elements are fixed in space, with relative drive
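Eqs. (9) and (10) reproduce the numbers quoted for the straight 3-meter array at 10 kHz: a far-field transition near 129 m and a -6 dB beamwidth of roughly 0.8°. A minimal sketch, assuming c = 343 m/s:

```python
import math

def far_field_distance(l, f):
    """Eq. (9): r = l^2 * f / 700, the distance (m) beyond which a straight
    array of length l (m) is in its far field at frequency f (Hz)."""
    return l * l * f / 700.0

def line_array_beamwidth_deg(l, f, c=343.0):
    """Eq. (10): -6 dB beamwidth (degrees) of a straight line array of
    length l (m) at frequency f (Hz)."""
    return math.degrees(2.0 * math.asin(0.6 * (c / f) / l))
```
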
levels, signal delay, and frequency tapering individually adjustable for each transducer.

Relatively simple arrays can be reconfigured, through sequential timing, to steer their beams as needed [85, 86, 96, 97]. While far-field modeling may be fairly simple, the fact that many listeners are seated in the transition region between near and far fields makes the problems of reconfiguration and uniformity of coverage fairly complex to estimate.

The relatively small system shown in Fig. 35 can be configured via a PC by the user as required. The system profile and signal flow diagram are shown at A and B, and a family of typical far-field polar plots is shown at C. Systems such as these, large or small, are presently used to solve intelligibility problems in a variety of large reverberant spaces.

REFERENCES AND SUPPLEMENTAL BIBLIOGRAPHY

Historical Perspectives
[1] Beranek, L., Acoustics, John Wiley & Sons, New York, 1954. Corrected edition published by the American Institute of Physics for the Acoustical Society of America, 1986.
[2] Beranek, L., "Loudspeakers and Microphones," J. Acoust. Soc. Am., 26:5 (1954).
[3] Clark, L. and Hilliard, J., "Headphones and Loudspeakers," Chapter VII in Motion Picture Sound Engineering, D. Van Nostrand, New York, 1938.
[4] Eargle, J. and Gelow, W., "Performance of Horn Systems: Low-Frequency Cut-off, Pattern Control, and Distortion Trade-offs," presented at the 101st AES Convention, Los Angeles, 8-11 November 1996; preprint 4330.
[5] Engebretson, M. and Eargle, J., "Cinema Sound Reproduction Systems: Technology Advances and System Design Considerations," SMPTE, 91:11 (1982). Also see AES preprint 1799.
[6] Engebretson, M., "Low Frequency Sound Reproduction," J. Audio Eng. Soc., 32:5, pp. 340-352 (May 1984).
[7] Flanagan, C., Wolf, R., and Jones, W., "Modern Theater Loudspeakers and Their Development," SMPTE (March 1937).
[8] Frayne, J. and Locanthi, B., "Theater Loudspeaker System Incorporating an Acoustic Lens Radiator," SMPTE, 63:3, pp. 82-85 (September 1954).
[9] Green, I. and Maxfield, J., "Public Address Systems," Bell System Technical Journal, 2:2, p. 113 (April 1923). Reprinted in J. Audio Eng. Soc., 25:4, pp. 184-195 (April 1977).
[10] Hilliard, J., "An Improved Theater-Type Loudspeaker System," J. Audio Eng. Soc., 17:5, pp. 512-514 (October 1969). Reprinted in J. Audio Eng. Soc., November 1978.
[11] Hilliard, J., "A Study of Theater Loudspeakers and the Resultant Development of the Shearer Two-Way Horn System," SMPTE, pp. 45-59 (July 1936).
[12] Hilliard, J., "Historical Review of Horns Used for Audience-Type Sound Reproduction," J. Acoust. Soc. Am., 59:1, pp. 1-8 (January 1976).
[13] Keele, D., "An Efficiency Constant Comparison between Low-Frequency Horns and Direct Radiators," presented at the 54th AES Convention, Los Angeles, 4-7 May 1976; preprint 1127.
[14] Kock, W. and Harvey, F., "Refracting Sound Waves," J. Acoust. Soc. Am., 21:5, pp. 471-481 (September 1949).
[15] Lansing, J., "New Permanent Magnet Public Address Loudspeaker," SMPTE, 46:3, p. 212 (March 1946).
[16] Lansing, J. and Hilliard, J., "An Improved Loudspeaker System for Theaters," SMPTE, 45:5, pp. 339-349 (November 1945).
[17] Locanthi, B., "Application of Electric Circuit Analogies to Loudspeaker Design Problems," IRE Transactions on Audio, vol. PGA-4, March 1952. Reprinted in J. Audio Eng. Soc., 19:9, pp. 778-785 (1971).
[18] Martin, D., "Speaker Technology for Sound Reinforcement," Studio Sound, March 1976.
[19] Novak, J., "Performance of Enclosures for Low-Resonance, High-Compliance Loudspeakers," J. Audio Eng. Soc., 7:1, pp. 29-37 (January 1959).
[20] Olson, H., "Horn Loudspeakers, Part I. Impedance and Directional Characteristics," RCA Review, vol. I, no. 4 (1937).
[21] Olson, H., "Horn Loudspeakers, Part II. Efficiency and Distortion," RCA Review, vol. II, no. 2 (1937).
[22] Rice, C. and Kellogg, E., "Notes on the Development of a New Type of Hornless Loudspeaker," Transactions, AIEE, vol. 44, pp. 982-991 (September 1925). Reprinted in J. Audio Eng. Soc., 30:7/8, pp. 512-521 (July/August 1982).
[23] Thiele, N. and Small, R., Direct Radiator Sealed Box, Vented Box, and Other Papers Collected in AES Loudspeaker Anthologies, Volumes 1, 2, and 3, Audio Engineering Society, New York, 1978, 1984, 1996.
[24] Thrasher, F., ed., Okay for Sound . . . How the Screen Found Its Voice, Duell, Sloan, and Pearce, New York, 1946.
[25] Villchur, E., "Problems of Bass Reproduction in Loudspeakers," J. Audio Eng. Soc., 5:3, pp. 122-126 (July 1957).
[26] Villchur, E., "Revolutionary Loudspeaker and Enclosure," Audio, vol. 38, no. 10 (October 1954).
[27] Wente, E. and Thuras, A., "Auditory Perspective—Loudspeakers and Microphones," Electrical Engineering, vol. 53, pp. 17-24 (January 1934). Also, Bell System Technical Journal, XIII:2, p. 259 (April 1934), and J. Audio Eng. Soc., vol. 26, no. 3 (March 1978).

Direct Radiators
[28] Badmaieff, A., "Sound Reproducing Device," Altec "Duocone," U.S. Patent 2,834,424, issued 13 May 1958; filed 26 January 1956.
[29] Bank, G. and Harris, N., "The Distributed Mode Loudspeaker—Theory and Practice," AES UK Conference, London (16-17 March 1998).
[30] Beers, G. and Belar, H., "Frequency Modulation Distortion in Loudspeakers," Proceedings, IRE, vol. 31, no. 4 (April 1943). Reprinted in J. Audio Eng. Soc., 29:5, pp. 320-326 (May 1981).
[31] Benson, J. E., "Theory and Design of Loudspeaker Enclosures," Amalgamated Wireless Australia Technical Review (1968, 1971, 1972).
[32] Button, D., "Heat Dissipation and Power Compression in Loudspeakers," J. Audio Eng. Soc., 40:1/2, pp. 32-41 (January/February 1992).
[33] Gander, M., "Moving-Coil
Loudspeaker Topology as an Indicator of Linear Excursion Capability," J. Audio Eng. Soc., 29:1, pp. 10-26 (January/February 1981).
[34] Gander, M., "Dynamic Linearity and Power Compression in Moving-Coil Loudspeakers," J. Audio Eng. Soc., 34:9, pp. 627-646 (September 1986).
[35] Harris, N. and Hawksford, O., "The Distributed Mode Loudspeaker as a Broad-Band Acoustic Radiator," presented at the 103rd AES Convention, New York, September 1997; preprint 4526.
[36] Henricksen, C., "Heat Transfer Mechanisms in Loudspeakers: Analysis, Measurement, and Design," J. Audio Eng. Soc., 35:10, pp. 778-791 (October 1987).
[37] Hunt, F., Electroacoustics, John Wiley & Sons, New York (1954). Reprinted by the American Institute of Physics for the Acoustical Society of America, 1982, p. 59.
[38] Keele, D. B. (Don), "Maximum Efficiency of Direct Radiator Loudspeakers," presented at the 91st AES Convention, New York, October 1991; preprint 3193.
[39] Klapman, "Interaction Impedance of a System of Circular Pistons," J. Acoust. Soc. Am., vol. 11, pp. 289-295 (January 1940).
[40] Klipsch, P., "Modulation Distortion in Loudspeakers: Parts 1, 2, and 3," J. Audio Eng. Soc., 17:2, pp. 194-206 (April 1969); 18:1, pp. 29-33 (February 1970); 20:10, pp. 827-828 (December 1972).
[41] Leach, M., "Electroacoustic-Analogous Circuit Models for Filled Enclosures," J. Audio Eng. Soc., 37:7/8, pp. 586-592 (July 1989).
[42] Olson, H., Acoustical Engineering, D. Van Nostrand, New York, 1957. Reprinted by Professional Audio Journals, Philadelphia, PA, 1991.
[43] Olson, H., "Gradient Loudspeakers," J. Audio Eng. Soc., 21:2, pp. 86-93 (March 1973).
[44] Roozen, N. B., et al., "Vortex Sound in Bass-Reflex Ports of Loudspeakers, Parts 1 and 2," J. Acoust. Soc. Am., vol. 104, no. 4 (October 1998).
[45] Salvatti, A., Button, D., and Devantier, A., "Maximizing Performance of Loudspeaker Ports," presented at the 105th AES Convention, San Francisco, 1998; preprint 4855.
[46] Siemens, E., U.S. Patent 149,797 (1874).
[47] Small, R., "Closed-Box Loudspeaker Systems, Parts 1 and 2," J. Audio Eng. Soc., 20:10, pp. 798-808 (December 1972) and 21:1, pp. 11-18 (January/February 1973).
[48] Strahm, C., "Complete Analysis of Single and Multiple Loudspeaker Enclosures," presented at the 81st AES Convention, Los Angeles, 12-16 November 1986; preprint 2419.
[49] Thiele, N., "Loudspeakers in Vented Boxes, Parts 1 and 2," J. Audio Eng. Soc., 19:5 and 6, pp. 382-392, 471-483 (May and June 1971).
[50] Vanderkooy, J., "Loudspeaker Ports," presented at the 103rd AES Convention, New York, September 1997; preprint 4523.
[51] Vanderkooy, J., "Nonlinearities in Loudspeaker Ports," presented at the 104th AES Convention, Amsterdam, The Netherlands, May 1998; preprint 4748.
[52] Wolff, I. and Malter, L., "Sound Radiation from a System of Circular Diaphragms," Physical Review, vol. 33, pp. 1061-1065 (June 1929).
[53] Zacharia, K. and Mallela, S., "Efficiency of Multiple-Driver Speaker Systems," presented at the IREE (Australia) Convention, 1975.

Horns and Compression Drivers
[54] Frayne, J. and Locanthi, B., "Theater Loudspeaker System Incorporating an Acoustic Lens Radiator," SMPTE, 63:3, pp. 82-85 (September 1954).
[55] Geddes, E., "Acoustic Waveguide Theory," J. Audio Eng. Soc., 37:7/8, pp. 554-569 (July/August 1989).
[56] Geddes, E., "Acoustic Waveguide Theory Revisited," J. Audio Eng. Soc., 41:6, pp. 452-461 (June 1993).
[57] Goldstein, S. and McLachlan, N., "Sound Waves of Finite Amplitude in an Exponential Horn," J. Acoust. Soc. Am., vol. 6, pp. 275-278 (April 1935).
[58] Hanna, C. and Slepian, J., "The Function and Design of Horns for Loudspeakers," Transactions, AIEE, vol. 43, pp. 393-404 (February 1924); also, abridged text in J. AIEE, vol. 43, pp. 250-256 (March 1924); discussion pp. 1191-1197. Reprinted in J. Audio Eng. Soc., 25:9, pp. 573-585 (September 1977) and vol. 26, no. 3 (March 1978).
[59] Henricksen, C. and Ureda, M., "The Manta-Ray Horns," J. Audio Eng. Soc., 26:9, pp. 629-634 (September 1978).
[60] Keele, D., "Low-Frequency Horn Design Using Thiele-Small Parameters," presented at the 57th AES Convention, Los Angeles, 10-13 May 1977; preprint 1250.
[62] Keele, D., "What's So Sacred About Exponential Horns," presented at the 51st AES Convention, Los Angeles, 13-16 May 1975; preprint 1038.
[63] Keele, D., "Optimum Horn Mouth Size," presented at the 46th AES Convention, New York, September 1973; preprint 933.
[64] Kinoshita, S. and Locanthi, B., "The Influence of Parasitic Resonances on Compression Driver Loudspeaker Performance," presented at the 61st AES Convention, New York, November 1978; preprint 1422.
[65] Klippel, W., "Modeling the Nonlinearities in Horn Loudspeakers," J. Audio Eng. Soc., 44:6, pp. 470-480 (June 1996).
[66] Klipsch, P., "A Low-Frequency Horn of Small Dimensions," J. Acoust. Soc. Am., vol. 13, pp. 137-144 (October 1941). Reprinted in J. Audio Eng. Soc., 27:3, pp. 141-148 (March 1979).
[67] Kock, W. and Harvey, F., "Refracting Sound Waves," J. Acoust. Soc. Am., 21:5, pp. 471-481 (September 1949).
[68] Leach, M., "A Two-Port Analogous Circuit and SPICE Model for Salmon's Family of Acoustic Horns," J. Acoust. Soc. Am., 99:3, pp. 1459-1464 (March 1996).
[69] Leach, M., "On the Specification of Moving-Coil Drivers for Low-Frequency Horn-Loaded Loudspeakers," J. Audio Eng. Soc., 27:12, pp. 950-959 (December 1979). Comments: J. Audio Eng. Soc., 29:7/8, pp. 523-524 (July/August 1981).
[70] Murray, F. and Durbin, H., "Three-Dimensional Diaphragm Suspensions for Compression Drivers," J. Audio Eng. Soc., 28:10, pp. 720-725 (October 1980).
[71] Olson, H., "A New High-Efficiency Theatre Loudspeaker of the Directional Baffle Type," J. Acoust. Soc. Am., pp. 485-498 (April 1931).
[72] Olson, H., "Recent Developments in Theatre Loudspeakers of the Directional Baffle Type," SMPTE (May 1932).
[73] Plach, D., "Design Factors in Horn-Type Loudspeakers," J. Audio Eng. Soc., 1:4, pp. 276-281 (October 1953).
[74] Plach, D. and Williams, P., "Reactance Annulling for Horn Loudspeakers," Radio-Electronic Engineering, pp. 15-17, 35 (February 1955).
[75] Rocard, M., "Sur la Propagation des Ondes Sonores d'Amplitude Finie" ("On the Propagation of Sound Waves of Finite Amplitude"), Comptes Rendus, p. 161, 16 January 1933.
[76] Salmon, V., "Hypex Horns," Electronics, vol. 14, pp. 34-35 (July 1941).
[77] Salmon, V., "A New Family of Horns," J. Acoust. Soc. Am., 17:3, pp. 212-218 (January 1946).
[78] Small, R., "Suitability of Low-Frequency Drivers for Horn-Loaded Loudspeaker Systems," presented at the 57th AES Convention, Los Angeles, 10-13 May 1977; preprint 1251.
[79] Smith, D., Keele, D., and Eargle, J., "Improvements in Monitor Loudspeaker Design," J. Audio Eng. Soc., 31:6, pp. 408-422 (June 1983).
[80] Thuras, A., Jenkins, R., and O'Neill, H., "Extraneous Frequencies Generated in Air Carrying Intense Sound Waves," J. Acoust. Soc. Am., vol. 6, pp. 173-180 (January 1935).
[81] Webster, A., "Acoustical Impedance and the Theory of Horns and of the Phonograph," Proceedings of the National Academy of Sciences, vol. 5, pp. 275-282 (May 1919). Reprinted in J. Audio Eng. Soc., 25:1/2, pp. 24-28 (January/February 1977).
[82] Wente, E. and Thuras, A., "A High Efficiency Receiver for Horn-Type Loudspeakers of Large Power Capacity," Bell System Technical Journal, VII:1, p. 40 (January 1928). Reprinted in J. Audio Eng. Soc., 26:3, pp. 139-144 (March 1978).
[83] Wente, E. and Thuras, A., "Auditory Perspective—Loudspeakers and Microphones," Electrical Engineering, vol. 53, pp. 17-24 (January 1934). Also, Bell System Technical Journal, XIII:2, p. 259 (April 1934). Reprinted in J. Audio Eng. Soc., 26:7/8, pp. 518-525 (July/August 1978).
[84] Wolff and Malter, "Directional Radiation of Sound," J. Acoust. Soc. Am., vol. 2, pp. 201-241 (October 1930).

Arrays
[85] Augspurger, G. and Brawley, J., "An Improved Collinear Array," presented at the 74th AES Convention, New York, 8-12 October 1983; preprint 2047.
[86] Augspurger, G., "Near-Field and Far-Field Performance of Large Woofer Arrays," J. Audio Eng. Soc., 38:4, pp. 231-236 (April 1990).
[87] Davis, D. and Wickersham, R., "Experiments in the Enhancement of the Artist's Ability to Control His Interface with the Acoustic Environment in Large Halls," presented at the 51st AES Convention, Los Angeles, 13-16 May 1975; preprint 1033.
[88] Franssen, N., "Direction and Frequency Independent Column of Electroacoustic Transducers," Philips "Bessel" Array, Netherlands Patent 8,001,119, 25 February 1980; U.S. Patent 4,399,328, issued 16 August 1983.
[89] Gander, M. and Eargle, J., "Measurement and Estimation of Large Loudspeaker Array Performance," J. Audio Eng. Soc., 38:4, pp. 204-220 (1990).
[90] Keele, D., "Effective Performance of Bessel Arrays," J. Audio Eng. Soc., 38:10, pp. 723-748 (October 1990).
[91] Kinsler, L., et al., Fundamentals of Acoustics, third edition, Wiley, New York, 1980.
[92] Kitzen, J., "Multiple Loudspeaker Arrays Using Bessel Coefficients," Electronic Components & Applications, 5:4 (September 1983).
[93] Kleis, D., "Modern Acoustical Engineering," Philips Technical Review, 20:11, pp. 309-348 (1958/59).
[94] Klepper, D. and Steele, D., "Constant Directional Characteristics from a Line Source Array," J. Audio Eng. Soc., 11:3, pp. 198-202 (July 1963).
[95] Kuttruff, H., Room Acoustics, Applied Science Publishers, London, 1979.
[96] Meyer, D., "Multiple-Beam Electronically Steered Line-Source Arrays for Sound Reinforcement Applications," J. Audio Eng. Soc., 38:4, pp. 237-249 (April 1990).
[97] Meyer, D., "Digital Control of Loudspeaker Array Directivity," J. Audio Eng. Soc., 32:10, pp. 747-754 (1984).
[98] Rathe, E., "Note on Two Common Problems of Sound Reproduction," J. Sound and Vibration, vol. 10, pp. 472-479 (1969).
[99] Ureda, M., "Line Arrays: Theory and Applications," presented at the 110th AES Convention, Amsterdam, May 2001; preprint 5304.
[100] Ureda, M., "'J' and 'Spiral' Line Arrays," presented at the 111th AES Convention, New York, December 2001; preprint 5485.

SUPPLEMENTAL BIBLIOGRAPHY
[101] Borwick, J., ed., Loudspeaker and Headphone Handbook, third ed., Focal Press, Oxford, UK, 2001.
[102] Colloms, M., High Performance Loudspeakers, fifth ed., John Wiley & Sons, New York, 1997.
[103] Dickason, V., The Loudspeaker Cookbook, sixth ed., Audio Amateur Press, Peterborough, NH, 2000.
[104] Eargle, J., Electroacoustical Reference Data, Van Nostrand Reinhold, New York, 1994.
[105] Eargle, J., Loudspeaker Handbook, second ed., Kluwer Academic Publishers, Boston, 2003.
[106] Eargle, J. and Foreman, C., JBL Audio Engineering for Sound Reinforcement, Hal Leonard Publications, 2002.
[107] Langford-Smith, F., ed., Radiotron Designer's Handbook, Amalgamated Wireless Valve Co., Sydney, and Radio Corporation of America, Harrison, NJ, 1953 (available on CD-ROM).
[108] Merhaut, J., Theory of Electroacoustics, McGraw-Hill, New York, 1981.
[109] Olson, H., Solutions of Engineering Problems by Dynamical Analogies, second ed., D. Van Nostrand, Princeton, NJ, 1958.
Fig. 1. General model describing the basic signal flow in loudspeakers using linear (thin) and nonlinear (bold) subsystems
(Figs. 1 and 2 courtesy Klippel).
has to be adopted. The filter parameters are adapted based on the loudspeaker input signal u(t) and the acoustic output p(t), making the system self-tuning. However, the sensor can be deactivated at any time and the filter will continue to operate based on current parameters.

Bright, in "Simplified Loudspeaker Distortion Compensation by DSP," discusses similar ideas. He describes a relatively straightforward concept in which a discrete-time model of the loudspeaker in question is inverted to create an appropriate linearization filter. He says, "the resulting algorithm for compensation of nonlinear distortion is relatively simple, consisting of one or more second-order IIR filter blocks (depending on the order of the linear dynamics) and several polynomial evaluations."

As in Klippel's paper, Bright considers a feed-forward system, which needs to be tuned in situ so as to take into account specific conditions within the loudspeaker in question such as long-term drift, aging, and thermal changes. This can be achieved with feedback from an electrical current sensor. Overall this compensation enables manufacturers to use a shorter voice coil that can be located in the most concentrated part of the loudspeaker's magnetic field, having correspondingly lower mass and higher sensitivity. While this is normally not done because of the nonlinearity that results at high voice-coil displacements, the signal processing described makes it possible. Experimental results demonstrated increased loudspeaker sensitivity compared with traditional designs, owing to the use of shorter voice coils, together with a reduction in distortion compared with the uncompensated case as shown in Fig. 3.

PSYCHOACOUSTIC LF EXTENSION
Aarts, in "Applications of DSP for Sound Reproduction Improvement," describes a novel means for giving the impression that a loudspeaker is reproducing more bass energy than it actually is. This is based on the psychoacoustic principle of the missing fundamental or "residue pitch," in which the hearing mechanism tends to assume the presence of a fundamental frequency when harmonics of that ➥
Fig. 3. Bright’s results from one experimental loudspeaker with a shortened voice coil showing the distortion of an 800-Hz tone.
The left plot shows the acoustic pressure at 28 cm from the loudspeaker, and the right plot shows the amplifier’s output voltage.
The darker line shows the result with distortion compensation active (courtesy Bright).
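A minimal sketch of the compensator structure Bright describes (a second-order IIR block followed by polynomial evaluation) might look like this in Python; the coefficient values, the cubic order, and the function names are illustrative assumptions, not details from the paper:

```python
import numpy as np

def iir2(x, b, a):
    """One second-order IIR block (Direct Form I); a[0] is assumed to be 1."""
    y = np.zeros_like(x)
    for n in range(len(x)):
        acc = b[0] * x[n]
        if n >= 1:
            acc += b[1] * x[n - 1] - a[1] * y[n - 1]
        if n >= 2:
            acc += b[2] * x[n - 2] - a[2] * y[n - 2]
        y[n] = acc
    return y

def compensate(u, b, a, c):
    """Feed-forward pre-distortion in the spirit of Bright's scheme: pass the
    input through a second-order IIR block modeling the (inverted) linear
    dynamics, then evaluate a correction polynomial
    c[0]*x + c[1]*x**2 + c[2]*x**3 (illustrative placeholder coefficients)."""
    x = iir2(np.asarray(u, dtype=float), b, a)
    return c[0] * x + c[1] * x ** 2 + c[2] * x ** 3
```

With identity coefficients the chain passes the signal through unchanged, which makes the structure easy to verify before real, measured loudspeaker parameters are substituted.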
frequency are present even if the fundamental is not physically present. Small loudspeakers tend to suffer from a poor low-frequency response, making them ideal candidates for some form of enhancement. In this psychoacoustic bass-enhancement system synthetic harmonics of the "missing" bass (that which the loudspeaker cannot reproduce satisfactorily) are derived from the audio signal and added to the part of the spectrum where the loudspeaker does radiate well, thereby giving the impression that the lower-frequency audio is also present.

There are various advantages and disadvantages to this idea. The advantages Aarts cites are: little energy is radiated below the loudspeaker's cutoff frequency; less headroom is required compared with more traditional "bass boost" approaches for a comparable bass effect; the system is computationally and power efficient; it can be implemented with a simple analog circuit if required; and it can also be tuned to any kind and size of loudspeaker. However, drawbacks include …

…porate sufficient DSP power, not only to realize simple functions such as volume control, delay, and shelving filtering, but also to implement transducer response correction and adaptation to its local acoustic environment via the use of equalization methods. These can incorporate FIR filtering on PCM data by using room response inverse filters derived from responses individually measured for each of the channel receivers." A possible implementation scheme is shown in Fig. 4. However, as we will see in the next section, this may not always be the most appropriate way of dealing with the problem of the loudspeaker–room interface.

Tatlas et al. go on to discuss digital loudspeakers, explaining that current research is centered on two different approaches—digital transducer arrays (DTA) and multiple voice coil digital loudspeakers (MVCDL). They prefer to concentrate on the former for their analysis owing to promising characteristics of the technology. DTAs are presented and tested with a variety of different digital audio signal types including multibit PCM, PWM, and 1-bit sigma–delta (otherwise known as Direct Stream Digital in the Sony/Philips SACD paradigm).

A typical digital transducer array, as shown in Fig. 5, consists of three stages: DSP (digital signal processing), DAMP (digital amplification), and DAE (digital audio emission). The authors show how a typical PCM digital transducer array might be

Fig. 5. Three different approaches to the digital transducer array. (A) PCM; (B) PWM; (C) 1-bit sigma–delta modulation.
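The harmonic-synthesis step at the heart of such a bass-enhancement system can be sketched with a half-wave rectifier as the harmonic generator; the rectifier choice and the omitted scaling and band-pass filtering are illustrative simplifications, not details from Aarts's paper:

```python
import numpy as np

def synth_harmonics(bass):
    """Half-wave rectify a band-limited bass signal: the output contains
    harmonics of the 'missing' fundamental, which a full system would scale
    and band-pass into the loudspeaker's passband before mixing it in."""
    rectified = np.maximum(np.asarray(bass, dtype=float), 0.0)
    return rectified - rectified.mean()  # remove the DC offset

# A 50-Hz fundamental yields energy at 100 Hz, 200 Hz, ... after rectification,
# which the ear can interpret as evidence of the absent 50-Hz fundamental.
fs = 8000
t = np.arange(fs) / fs  # one second
h = synth_harmonics(np.sin(2 * np.pi * 50 * t))
```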
436 J. Audio Eng. Soc., Vol. 52, No. 4, 2004 April
DSP in Loudspeakers
constructed in a matrix of small drivers as arranged in Fig. 6, with each driver having a certain "weight" and driven by a specific bit of the digital signal. The authors simulated different types and topologies of these arrays, in an attempt to evaluate their relative performances. The three arrangements used are described in Table 1.

Table 1. Theoretical (A) and actual (B) characteristics of the arrays tested. N is the resolution of the digital signal and R is the oversampling ratio employed.

(A)
Digital Speaker    Transducer Number    Transducer Frequency
PCM                2^N' - 1             R·fs
PWM                2(2^N' - 1)          R·fs
SDM                R                    fs

(B)
Digital Speaker    Transducer Number    Transducer Frequency
PCM                63                   44–176 kHz
PWM                510                  44–176 kHz
SDM                32                   44–176 kHz

Fig. 6. Simulated 6-bit PCM-based array showing transducer matrix; 1 corresponds to the MSB of the digital signal.

Overall it was found that the PWM version required an impractically large number of drivers as well as showing higher distortion than the other array types. Oversampled low-bit PCM arrangements showed promising results with a manageable array size. The SDM version performed well on-axis and its physical characteristics made it a practical proposition, but its directivity characteristics would require some attention. Generally, the smaller sized DTAs were found to be preferable and the distortion results measured off-axis were considerably poorer than those on-axis.

LOUDSPEAKER–ROOM INTERFACE EQUALIZATION
Pedersen, in "Adaptive Bass Control—The ABC Room Adaptation System," points out that traditional loudspeaker-room equalization systems concentrate on measuring the transfer response to a certain listening position and creating an inverse filter. This has various problems including gain differences of maybe 20 to 30 dB, sensitivity to changes in listener position (coloration, preechoes), and nonminimum phase components in the filter. Furthermore, a constant amplitude transfer function is not necessarily optimal despite the fact that it might intuitively seem so.

ABC concentrates on the bass frequencies (20 to 500 Hz), views the problem from the loudspeaker's situation, and works by examining the radiation resistance of the loudspeaker (which is related to the power output). The radiation resistance is affected by the existence of room modes so this method makes it possible to compensate for the effect of these on the perceived loudspeaker timbre, adjusting for the loudspeaker's current position in the room. (For example, the bass sound pressure level might be up to 9 dB higher when the loudspeaker is placed in a corner when compared to the free field.) By controlling the acoustic power of the loudspeaker in this way it is claimed that the equalization of timbre is perceived at any point in the listening room, rather than being strongly affected by changes in listening position (see Fig. 7).

Fig. 7. The basic principle of the ABC system (Figs. 7 and 8 courtesy Pedersen).

The aim of ABC is to measure the loudspeaker radiation resistance in a reference position in a reference room and then in the target room, using the ABC filter to equalize the resulting response so that the target timbre is close to the reference timbre. This assumes that the reference response timbre is the desired one, but allows for the possibility of obtaining any reference response by this method. In other words, the designer is able to ensure that the loudspeaker sounds similar wherever it is placed in different listening rooms.

Measurement of radiation resistance requires that measurements are taken at two different distances from the loudspeaker, using a method similar to that shown in Fig. 8. (A commercial implementation of this principle has a motorized microphone, mounted below the woofer, which pops out to two different distances during the calibration phase.) Once the measurements are available, a smoothed 16th order IIR filter is calculated that attempts to approximate the desired correction. ➥

Fig. 8. The ABC system embedded into an active loudspeaker. The microphone is mounted on a vertical rod, which by rotation through 180 degrees effectively moves the microphone to a position 4 cm further away from the diaphragm.
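The theoretical element counts in Table 1 can be checked against the arrays actually simulated. The 6-bit word length for the PCM case matches Fig. 6; the 8-bit word length assumed for the PWM case is inferred here from its 510-element count, so treat it as an assumption:

```python
def pcm_elements(bits):
    """PCM digital transducer array: 2**N' - 1 weighted elements (Table 1)."""
    return 2 ** bits - 1

def pwm_elements(bits):
    """PWM digital transducer array: 2 * (2**N' - 1) elements (Table 1)."""
    return 2 * (2 ** bits - 1)

print(pcm_elements(6), pwm_elements(8))  # 63 510, as in Table 1(B)
```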
Wilson et al., in "The Loudspeaker-Room Interface—Controlling Excitation of Room Modes," discuss a similar concept. They reinforce the points about the problems of equalizing the magnitude response at the listening position, and they also claim that it is preferable to consider the way in which the loudspeaker interacts with room modes. They summarize some of the problems noted in relation to loudspeaker-room mode interaction as follows:
1. Some frequencies are emphasized or deemphasized, therefore some notes in a series are too loud or missing.
2. Some notes overhang or ring on much longer than others, contributing to their dominance.
3. The pitch changes during the decay of a note.
4. The pitch of short notes changes, such that the pitch heard is different from the original.
5. Echoes occur where a single tone burst or note is changed to two or more, shorter, tone bursts.

The energy decay rate at modal frequencies tends to be longer than at non-modal frequencies, causing overhang and other nasty effects. One of the aims, therefore, of the approach they describe appears to be to create a filter such that the combined response of the filter and mode has a shorter decay time.

In order to devise the filters needed to undertake this equalization, a measurement to identify room modes is required, finding those with the longest decay time (see Fig. 9). The authors describe a novel method of achieving this that attempts to overcome the problem that the measuring microphone may be near a point of minimum SPL in the modal response at the measuring position. Using a long Hanning window, the impulse response is transformed into the frequency domain and even the smallest peaks are registered as potential candidates for modes. Information about the amplitude and RT60 (reverberation time) of the modes is used in a weighted fashion to determine the most critical modes for equalization. Most emphasis is placed on RT60. Filters are then calculated with a target RT based on the average RT in the room between 500 and 2000 Hz. Such correction can cause distortions to the direct sound signal from the loudspeaker, and so the question arises as to what primarily governs the perceived sound of the loudspeaker in the room and the relative importance of early and late sound. A tentative proposal is offered that psychoacoustic evidence would suggest it is dangerous to introduce filters that have a notch greater than 6 dB (which therefore reduce the decay time of a room mode by more than half), but this needs more testing to verify.

Fig. 9. System for measuring and controlling room modes (Figs. 9 and 10 courtesy Wilson et al.).

Waterfall plots from experimental work suggest a smoothed modal response resulting from this system, as shown in Fig. 10. Measurements of mode decay time also suggest that the effect is successful over a range of measuring positions.

Fig. 10. Waterfall plot showing room modes excited by the left loudspeaker (A). Plot showing controlled excitation of modes from the same loudspeaker (B).

Karjalainen et al. continue this theme in "Modal Equalization by Temporal Shaping of Room Response." They too work on the principle that traditional magnitude equalization (such as third-octave graphic EQ or basic loudspeaker equalization) cannot serve to control modal decay time
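The aim of a shorter combined decay can be illustrated with a standard pole-shifting biquad, a generic sketch rather than the authors' actual design: zeros cancel the measured mode poles, and new poles reinsert the mode with the target decay time.

```python
import numpy as np

def pole_radius(t60, fs):
    """z-plane pole radius whose envelope decays 60 dB in t60 seconds."""
    return 10.0 ** (-3.0 / (t60 * fs))

def mode_shortening_biquad(f0, t60_measured, t60_target, fs):
    """Zeros at the measured mode poles, poles at the shorter target decay."""
    w = 2.0 * np.pi * f0 / fs
    rz = pole_radius(t60_measured, fs)  # radius of the measured mode poles
    rp = pole_radius(t60_target, fs)    # radius of the faster target poles
    b = np.array([1.0, -2.0 * rz * np.cos(w), rz * rz])  # numerator (zeros)
    a = np.array([1.0, -2.0 * rp * np.cos(w), rp * rp])  # denominator (poles)
    return b, a
```

Because the target decay time is shorter, the new poles sit further inside the unit circle than the cancelled ones, which is exactly the "combined response decays faster" property described above.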
or the detailed structure of modal response. So they resort to more sophisticated forms of digital filtering in the time domain that attempt to alter the detailed modal response of the loudspeaker–room combination.

The principle of the approach is based on frequency-dependent windowing of the impulse response of the system. Essentially the impulse response at the listening position is measured and the decay time determined at each frequency, then the impulse response is filtered with an exponentially decaying function at those frequencies where it is deemed necessary to reduce it. This is achieved by means of a relatively high-order FIR (finite impulse response) filter. They described two different filtering methods, known as AMK and ARMA, in previous papers, the second of which they say is better suited to the fitting of closely spaced modes and does not require the prior estimation of modal decay rate.

Fig. 11 shows the effect of such an AMK filter, based on synthetically created room modes at frequencies of 50, 55, 100, 130, and 180 Hz. The corresponding decay times were given as 1.4, 0.8, 1.0, 0.8, and 0.7 s, and the modal equalizer design target was to reduce those decay times to 0.30, 0.30, 0.26, 0.24, and 0.20 s (some LF rise is allowed as this is normal within room acoustics standards). The authors note that both types of filter tend to reduce the initial decay rate quite well but that then the decay slips back to being closer to the original (as noted at the lower frequency end on these graphs), particularly when the modes are closely spaced (for instance at 50 and 55 Hz as here).

Looking at the overall decay rate below 200 Hz (see Fig. 12), it was found that the new windowing method was more effective than either of the original two methods, especially if no hand-tuning was used with the AMK and ARMA methods.

Fig. 12. Decay envelopes (Schroeder-integrated) for the original impulse response (upper solid line), for the AMK equalized (dotted line), for the ARMA equalized (dashed line), and for the windowing mode equalized response (lower solid line).

The plot of decay time versus frequency (see Fig. 13) shows that the decay time has been reduced successfully using the new windowing filter at most frequencies, close to the target maximum (dashed line).

It is clear from the examples presented in the papers summarized here that the use of DSP in loudspeakers can be an effective tool to counteract inadequacies in their physical characteristics and in their interaction with the room. It may also be that all-digital reproduction chains will one day become a reality if the problems associated with digital transducer arrays can be solved.
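The frequency-dependent windowing principle can be sketched as follows, with an FFT-based band split standing in for the paper's high-order FIR implementation (function and parameter names are illustrative):

```python
import numpy as np

def shorten_band_decay(ir, fs, f_lo, f_hi, t60_from, t60_to):
    """Multiply the chosen frequency band of an impulse response by an
    exponential window that turns a t60_from decay into a t60_to decay,
    leaving the rest of the spectrum untouched."""
    n = len(ir)
    spec = np.fft.rfft(ir)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    in_band = np.fft.irfft(np.where(band, spec, 0.0), n)  # the modal band
    t = np.arange(n) / fs
    # extra decay needed so a 60-dB drop takes t60_to instead of t60_from
    window = 10.0 ** (-3.0 * t * (1.0 / t60_to - 1.0 / t60_from))
    return (ir - in_band) + in_band * window
```

Outside the selected band the response passes through unchanged, which mirrors the idea of shortening decay only at the frequencies where a long-ringing mode was identified.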
SURROUND LIVE!
SUMMARY
Frederick Ampel
Technology Visions, Overland Park, KS, USA
On … 2003 a unique day-long event took place at the Manhattan Center Studio complex's Grand Ballroom. Surround Live was the first-ever comprehensive event devoted exclusively to the creation, production, and reproduction of live performance audio in multichannel surround.

Surround Live brought together nearly 250 working professionals from a cross section of the audio industry in a one-day interactive workshop, in conjunction with the AES 115th Convention in New York, to discuss the issues and technological challenges created by presenting music, drama, and theater in full multichannel surround audio formats to a live audience.

In an interesting piece of historical juxtaposition, the Manhattan Center complex is adjacent to and connected by passageways to what is now the New Yorker Hotel, which was the site of the very first New York AES Convention more than a half century ago.

Surround Live panelists: clockwise from above, Fred Ampel, Kurt Graffy, Kurt Eric Fischer, Bruce Olson, and Steve Schull.

The Surround Live event was organized into several distinct sections, with formal presentations and discussions taking up the morning and a portion of the time after lunch. The event then segued into a series of demonstrations followed by a 90-minute segment with a live band on stage, showcasing various mixing and music-presentation ideas and concepts.

Legendary FOH mixing engineer Buford Jones—whose 30-year career has included tours as a live-sound mixer or engineer for such luminaries as Stevie Wonder, Eric Clapton, Pink Floyd (including a Platinum Record for "Delicate Sound of Thunder"), David Bowie, Faith Hill, Jeff Beck, George Harrison, James Taylor, SHeDAISY, ZZ Top, and many others—handled the console and the external signal processing. A lively and intense interaction ensued between the attendees, the band, and the mixing team, discussing various ways to present the music experience and various ways to use electronic reverb and spatial effects, and then testing those concepts live with the band while listening and evaluating the results immediately.

The live music performance was provided by the Surround Live Performers featuring guitarist Jeff Golub (GRP Records, www.jeffgolub.com).

The formal presentation portion of Surround Live was opened by Kurt Graffy … Francisco. Graffy challenged the attendees to start thinking about physioceptualsophicacoustics, a term he created to encompass the many complex aspects of defining and understanding the multichannel audio environment and its physical and psychoacoustic aspects. His presentation first explored some of the previous literature on the topic, including contributions from Durand Begault on 3-D sound, Jens Blauert on spatial hearing, the extensive literature published by David Griesinger, Brian C. J. Moore's work on masking, David R. Perrott's papers on audio and visual localization and modalities, J. Robert Stuart's defining work on the psychoacoustics of multichannel audio, Nick Zacharov's papers covering multichannel level alignment, and the AES 16th Conference in 1999 in Rovaniemi, Finland on spatial sound reproduction, as well as other papers from the AES 8th, 12th, and 15th Conferences. He then conducted a review and refresher for attendees on the physiological aspects such as head shape, torso, pinnae; perceptual aspects including loudness, pitch, localization, envelopment; the philosophical aspects including reality modes, visual-auditory modalities; the real-world acoustical aspects such as spectrum, level, signal-to-noise, directivity, and reverberant level. His presentation then covered a range of other topics related to multichannel audio including perceptual environmental distortions, perception and depth, and formats and presentations of sonic imaging.

After a very focused question-and-answer discussion, Bruce Olson of Olson Sound Design in Minneapolis presented a detailed look at theatrical sound concepts and formats incorporating materials supplied by John Leonard of Aura Sound Design in London.

Interaction among the attendees and the panelists was very lively throughout the event.

Issues raised by Olson included the number of channels, loudspeaker location and position, audience perception of sound effects, and spatial recreation. It is hoped that a more extensive discussion of this topic will be part of Surround Live 2004 at the AES 117th Convention in San Francisco with live demonstrations of many of the points from Leonard.

Next the attendees heard from Kurt Eric Fischer of Attic Sound Design. His sound design and production credits include Tell Me on Sunday (Kennedy Center), Marty (Huntington Theater Company), Dorian (Denver Center Theater Company), Blue (Roundabout Theater Company, Audelco Award), Linda Eder (Carnegie Hall), Finian's Rainbow (Coconut Grove Playhouse, The Cleveland Playhouse), Macbeth (New York Shakespeare Festival), the Broadway production and first national tour of Rent, Nine (Eugene O'Neill Theater), Jesus Christ Superstar (Ford Center Theater), Fascinating Rhythm and Judgment at Nuremberg (Longacre Theater), the world premieres of Parade (Drama Desk Nomination) and Whistle Down the Wind, Sunset Boulevard, Jelly's Last Jam, and Nick and Nora. Fischer expanded on many of the concepts addressed by Olson, and added his practical understanding of production realities and the difficulties of achieving repeatability for a long running show or an event that moves from location to location, such as an on-the-road Broadway show or live music tour.

Following Fischer was Steve Schull of Acoustic Dimensions, Dallas, Texas. Schull's extensive background includes production credits on Les Miserables, Grand Hotel, Cats, Lena Horne, Annie, Little Shop of Horrors, The Real Thing, Sophisticated Ladies, Dreamgirls, Fences, and The Rocky Horror Show. He has designed or worked on aspects of the preservation and restoration of a number of historic theaters including the New Victory Theater, the New Amsterdam ➥
SECTIONS
We appreciate the assistance of the
section secretaries in providing the
information for the following reports.
…sary, which is not appropriate for DVD video playback.

Modal ranges cover 20 Hz to 200 Hz. Room effects include emphasis of some frequencies, ringing at resonant frequencies, pitch changes during decay, and beat and echo effects. The room resonant modes of rigid-walled rooms of certain dimensions can be calculated. Usually, both fundamental and harmonic modes are set up. Non-rigid walls cause the modes to decay exponentially. Wilson showed graphs of the decay of room resonance, beat frequency effects, and echo effects. Although the magnitude of a room mode varies with listening position, the decay time of a room mode is the same at different listening positions. There is no requirement to use a special microphone. Hence there is a focus on decay time.

The filter to control decay time uses original Q, target Q, and center frequency. The resulting notch filter has a gain and bandwidth that can be calculated based on target decay time and room mode decay time. Plots of level against delay time after correction showed some sensitivity to pole frequency errors. To avoid errors, 32-bit coefficients and double precision implementations are also necessary.

Wilson said that it is not desirable to create an anechoic environment. Previous work in the measurement of the reverberation time of living rooms has yielded times of 0.4 to 1 second, quite long compared to studios.

Some work was done to make sure that the direct response was not overly corrupted, and plots showed that the direct response changes phase when the exciting signal is switched off. Music is much more complex than tones. All prefiltering is done below 250 Hz. The trade-off is to prefilter enough to reduce the reverberation without affecting the direct response too much. Notches up to 6 dB deep are okay. Larger notches are more problematic.

Capp continued, saying that the standard process is used to characterize the decay time in the range 500 Hz to 2 kHz and then to set a target decay time for modes below 250 Hz, which increases as frequency decreases. A third-octave bandpass filter is used for measuring the decay time, followed by Schroeder integration. According to Capp, a more accurate process is required to determine decay of individual room modes. Waterfall FFT plots do give a better resolution. The Schroeder integration is prone to noise, so special attention is given to starting from the peak impulse and ending just above the noise floor. As an example, Capp showed several waterfall plots. Least squares regression is used to give the optimum decay time.

To identify the dominant room modes, all the peaks in the magnitude response must be measured. After the decay time at each room mode is found, the dominant modes are identified using the decay time. A more sophisticated method is to calculate many decay profiles from a waterfall plot. All the decay profiles are then summed and the largest peaks represent the dominant modes. Example plots clearly showed the dominant modes.

Capp described the system implemented to make these measurements, which includes a microphone or SPL meter in the center of the room, a PC and a surround decoder. An MLS test signal is used for each loudspeaker, from which the impulse response for each loudspeaker is calculated. Then the designer performs an FFT and decay time analysis, followed by design of the filters for each loudspeaker. Meridian has developed PC software driven by a user-friendly wizard to perform the measurements and correction.

Capp discussed a sample room and showed magnitude responses for different listening positions before correction. Postcorrection magnitude and decay responses showed clear improvement. The filters generated for one listening position improved the responses for many listening positions.

In conclusion, this new spatially robust filter design technique uses only one microphone position but controls the decay time at multiple positions. The filter design is automated and does not require specialist knowledge. According to Capp and Wilson, so far initial feedback from dealers and customers has been positive.

More information on this topic is available in the paper, "The Loudspeaker-Room Interface—Controlling Excitation of Room Modes," by Rhonda J. Wilson, Michael D. Capp, and J. Robert Stuart, Meridian Audio, AES 23rd Int'l. Conference, Copenhagen, 2003.
Steven Harris

Triple Treat in New York
Some 70 members and guests of the New York Section met on December 8 at Innovative Audio's high-end showrooms on Manhattan's East Side for a holiday party and annual triple treat meeting.

The evening's format featured three presenters, Thom Cadley, Malcolm Addey, and Jim Anderson, who held court to share their engineering skills. Cadley played DVD-Audio 5.1 surround mixes from his work with artists Beyonce, Billy Joel, Stevie Ray Vaughan, and others. Addey brought in a beautifully spacious two-track remote of a small string orchestra he recorded in a West Side church. Anderson compared four release versions of Patricia Barber's Companion album on CD, XRCD, SACD, and vinyl.

After a brief welcome and introduction by emcee Allan Tucker, the participants split up into three isolated sound rooms to spend the first of three half-hour sessions listening to the presenter of their choice. After thirty minutes, each participant switched to another room to experience a different show. In this way, everyone got to enjoy all three programs.

The central meeting area provided a perfect spot for relaxing, schmoozing, and snacking. Conversations casually overheard touched on topics such as the placement of Blumlein microphones, the difficult state of the recording business, and postulations on why each attendee could not go home with a complimentary plasma display. Maybe next year.
Allan Tucker

Webster Reports Growth
Students from the Webster University Student Section are pleased to report that a number of new students have applied for membership and ➥
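The Schroeder integration step that Capp describes (integrating the squared impulse response backward from the end, after truncating just above the noise floor) can be sketched as:

```python
import numpy as np

def schroeder_edc_db(ir):
    """Backward (Schroeder) integration of the squared impulse response,
    returning the energy decay curve in dB, normalized to 0 dB at time zero.
    The response should already be truncated just above the noise floor,
    as the report recommends, since trailing noise biases the curve."""
    energy = np.cumsum(np.asarray(ir, dtype=float)[::-1] ** 2)[::-1]
    return 10.0 * np.log10(energy / energy[0])

# A pure exponential envelope with t60 = 0.5 s decays 30 dB by t = 0.25 s.
fs, t60 = 1000, 0.5
env = 10.0 ** (-3.0 * np.arange(2 * fs) / (fs * t60))
edc = schroeder_edc_db(env)
```

Least-squares regression over a portion of this curve, as in the report, then yields the decay time estimate.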
that the section appears to be steadily growing. The section welcomed some of these new members at a meeting held on January 13 at the university. Section chair John Jory talked to the 32 students present and described both the section's achievements and hopes for the upcoming semester. The students worked on a radio spot to announce the surround sound demonstration on February 17. Eric Blackmer, chairman of Earthworks, agreed to give a lecture to the group sometime in the future. In addition, the section plans to hold elections for new committee members. Officers will remain in their positions for the next semester.
Andy Weidmann

Planar Transducers
The Los Angeles Section's November 25 meeting was hosted by HPV Technologies in Costa Mesa. The company develops, manufactures, and markets concert loudspeaker systems using planar transducers.

Before entering the meeting, each person was required by company staff to sign a nondisclosure agreement.

Dragoslav Colich began his talk with the recent evolution of planar transducers. He spoke of his own journey in audio transducer research and development, including a description of the magnet arrangements used in such drivers.

HPV integrates these relatively small, flat drivers into large arrays, creating highly efficient yet remarkably lightweight systems. Colich explained the benefits and restrictions of arraying the drivers. Two end users, … of audible distortion. Because the presentations and demonstration ended late, the group didn't get a chance to hear the concert-sized system that had been set up outdoors.
Bob Lee

Disney Concert Hall
Thanks to the efforts of section treasurer Kahne Krause, the section enjoyed a holiday concert on December 13 by the Canadian Brass at the new Walt Disney Concert Hall. At a pre-concert dinner at McCormick and Schmick's, Yasuhisa Toyota, a principal designer for Nagata Acoustics, spoke to the group about the acoustical design and lineage of the hall. Toyota began by explaining the characteristics of two leading types of concert hall architecture: the shoebox and the vineyard. He showed photos and diagrams of halls representative of the two philosophies. The shoebox, compact and rectangular in shape, offers excellent acoustics to the listening audience but is limited in usable size. To accommodate larger numbers of patrons, designers have increasingly turned to the vineyard approach, so named because the curving rows and fractured seating sections as viewed from the stage resemble the hills of a vineyard.

According to Toyota, a hurdle to be overcome in the vineyard design is the issue of primary reflecting surfaces. In a shoebox, the walls define the room boundaries and are fairly close to the listening audience, providing strong primary reflections to reinforce the direct sound from the stage. Vineyard design- …

…gram included sing-alongs of Christmas songs, a condensed and minimalist performance of Bizet's Carmen, and a medley of Hanukkah tunes that featured Josef Burgstaller's remarkable imitation of a klezmer clarinetist, performed on a piccolo trumpet.

The group thanked Kahne Krause for putting the program together. She did it almost single-handedly (with some help from her husband Ira), handling the tickets, reservations, dinner, and guest speaker.

Penn State Wrap-up
The Penn State Section announces membership growth due to the group's successful attempts to reach out to the campus and community. The section is now on a campaign to gain more national members, as well as add more variety to meeting topics.

On September 29, 30 people gathered to hear Robin Miller of Filmmaker, Inc. Miller gave an abbreviated version of his talk for the 115th AES meeting in New York. The presentation outlined his patent-pending method of recording 3-D sound (PerAmbio 3D/2D) to more completely capture the sound field experienced by a listener in a live acoustic space. These recordings can be reproduced in 2-D on current 5.1/6.1 surround sound systems. The complete 3-D sound field can be reproduced with the help of a decoder.

On October 7, nine members joined life fellow and former AES governor (1977-1979) Geoffrey Wilson in a discussion entitled "British Audio—75 Years Ago." In his talk, Wilson reminisced about his boyhood memories of
including the well-known event audio ers incorporate numerous small, often his father, Percy, who was technical
specialist Gary Hardesty, described curving walls that not only separate the advisor to The Gramophone Magazine
their experience with using the arrayed seating sections from one another but from the late 1920s until the start of
systems in situations ranging from the also serve as primary reflectors. The World War II in 1939. Most of his
Arco Arena in Sacramento to a Broad- Walt Disney Concert Hall is but the lat- memories had to do with the large horns
way on Times Square show, to an out- est example of this vineyard design. that Geoffrey’s father kept around the
door concert in Washington, D.C. For most of the AES contingent, this house, as well as his experiments with
After the presentations came the concert was the first visit beneath and early phonographs—many of which
demonstration of what these loud- within the wavy stainless steel of archi- were kept in their living room. The
speaker systems can do, with seg- tect Frank Gehry’s conception, which meeting served as a wonderful excur-
ments of jazz, pop, and classical was certainly a treat. The Canadian sion into the history of audio.
music. Some was quite loud and chest- Brass, a quintet of tuba, trombone, Wilson showed many slides, most of
thumping—especially the subwoofer- French horn, and two trumpets enter- which were taken from a 1974 paper
fed cannon shots in Tchaikovsky’s tained with a winning combination of “Horn Theory and the Phonograph,”
1812 Overture Solenelle—yet devoid virtuoso playing and humor. The pro- which historically detailed the devel-
DEVELOPMENTS
Product information is provided as a service to our readers. Contact manufacturers directly for additional information, and please refer to the Journal of the Audio Engineering Society.
AES SUSTAINING MEMBER
BROADCAST LINE MICROPHONE is the BCM 104 for radio announcing. The microphone features a K 104 large-diaphragm condenser with cardioid directional pattern and switchable proximity-effect compensation. A high-pass filter reduces frequencies below 100 Hz by 12 dB/octave. A second, pre-attenuation switch allows the sensitivity to be reduced by 14 dB to optimize performance for circuits designed for dynamic microphones. Both switches are mounted internally within the microphone housing. The BCM 104 head grille twists off easily for quick cleaning. Additionally, a fitted elastic mount prevents structure-borne noise and is compatible with standard broadcast-segment microphone arms. Neumann, 1 Enterprise Drive, Old Lyme, CT 06371, USA; tel. +1 860 434 5220; fax +1 860 434 3148; Web site www.neumannusa.com.

ACTIVE SUBWOOFER SYSTEM is designed for large-scale installations in either stereo or surround. The 7073A features four 12-in drivers, fast-acting low-distortion amplifiers, 124-dB SPL capability down to 19 Hz, and a 6.1-capable bass management system for flexibility. All electronics are integrated into the 7073A subwoofer cabinet, including active crossover filters, driver overload protection circuits, and power amplifiers. The bass management system features six inputs and outputs, LFE input, and summed signal output connectors. The dedicated LFE input has a low-pass filter selectable to 85 or 120 Hz, plus a +10-dB sensitivity switch. If the LFE channel includes frequencies above the crossover frequency, the system's redirect function sends them to the center front output to ensure they are audible. Genelec Inc., 7 Tech Circle, Natick, MA 01760, USA; tel. +1 508 652 0900; fax +1 508 652 0909; e-mail genelec.usa@genelec.com; Web site www.genelec.com.

REFERENCE/MIXING MONITOR incorporates the drivers from the highly acclaimed A-20 in a radically new cabinet design that disperses sound over a wider listening area. The new M-20 features a flat, wide baffle across the tweeter and a narrower baffle surrounding the woofer for more even dispersion and enhanced imaging. Unlike the A-20, there are no parallel surfaces inside the M-20's cabinet, so internal standing waves remain random; frequency response is 45 Hz to 20 kHz +/- 5 dB. Cabinet dimensions are 15 in x 13 in x 10 in, and each monitor weighs 18 lbs. The M-80 is sold by the channel and includes a 250-W monaural control amplifier that fits in a single rack space. NHTPro, 6400 Goodyear Road, Benicia, CA 94510, USA; tel. 800-648-9993 (toll free); fax +1 707 747 1273; Web site www.nhtpro.com.

AES SUSTAINING MEMBER
WIRELESS SYSTEM for theater and event production is composed of the Artist Elite 5000 System, the AT899 subminiature lavalier, and a transmitter system. The Artist Elite 5000 Wireless System offers frequency-agile operation with IntelliScan™ frequency selection for a choice of 200 selectable UHF channels. The system also features PC-compatible control software, easy setup, and maximum operating flexibility. The AT899cW subminiature condenser microphone measures 5 mm in diameter and features a low-profile housing with internal construction designed to minimize noise from handling, clothing, costumes, and wind. The AEW-5111 UHF Wireless Dual-UniPak™ Transmitter System includes an AEW-R5200 dual receiver and two AEW-T1000 UniPak™ transmitters. Finally, at the heart of the system is the AEW-R5200 receiver, which combines two independent receivers in a single full-rack housing. The 5000 Series Wireless Systems are available with a choice of two condenser and two dynamic handheld Artist Elite microphone/transmitters. Audio-Technica U.S., Inc., 1221 Commerce Drive, Stow, OH 44224, USA; tel. +1 330 686 2600; fax +1 330 688 3752; Web site www.audio-technica.com.

SOFTWARE OPTION complements the signal analyzers of the Rohde & Schwarz FSQ series as well as the company's FSU and FSP spectrum analyzers. The software expands the applications with measurement functions in accordance with the 3GPP2 specifications for mobile phones. Thus, measurements in line with the 1xEV-DV standard are feasible for the first time. The regular cdma2000 standard, which is supported particularly in Asia and North America, is also covered. Revision C of the 1xEV-DV standard enables a higher data transmission rate compared to the regular cdma2000 standard, allowing additional data services. Used in combination with software option FS-K83, the signal and spectrum analyzers are ideally suited for use in the development, production, and quality assurance of mobile phones that function in accordance with the cdma2000 and 1xEV-DV standards. Rohde & Schwarz GmbH & Co. KG, Mühldorfstr. 15, D-81671 Munich, Germany; tel. +49 89 4129 13779; fax +49 89 4129 13777; e-mail customersupport@rsd.rohde-schwarz.com; Web site www.rohde-schwarz.com.

AES SUSTAINING MEMBER
ANALOG AND DIGITAL CONVERTERS manage complex titles for high-resolution/multichannel music mastering. Meitner's ADC8 provides accuracy in converting analog masters to the digital domain with the proper amount of warmth. The DAC8 sends audio from the digital to the analog arena with no loss of quality in the conversion process. During the mastering process, the user is able to control all of the audio sources using the Meitner Switchman source controller, which allows the user to A/B audio from the mastering console, analog tape decks, and DVD player. SADiE, Inc., 475 Craighead Street, Nashville, TN 37204, USA; tel. +1 615 327 1140; fax +1 615 327 1699; e-mail sales@sadieus.com; Web site www.sadie.com.

AVAILABLE LITERATURE
The opinions expressed are those of the individual reviewers and are not necessarily endorsed by the Editors of the Journal.

CATALOGS, BROCHURES…
A new catalog of miniature microphones and accessories is now available. The 34-page, full-color catalog has details of DPA's MSS6000 Microphone Summation System and the MPS6000 Microphone Power Supply, as well as all current miniature microphone products and solutions. Also featured is the company's range of 33 connection adapters, which support more than 80 different wireless transmitter systems. A wide selection of mounts, grids, windscreens, clips, and magnets is included, among them the popular MHS6001 Holder for Strings and the EMK and FMK4071 microphone kits developed for film and broadcast use. The illustrated catalog contains specifications as well as a detailed description of product applications. For a copy contact: DPA Microphones A/S, Gydevang 42-44, DK-3450 Alleroed, Denmark; tel. +45 4814 2828; fax +45 4814 2700; Internet: www.dpamicrophones.com; e-mail: info@dpamicrophones.com.

IN BRIEF AND OF INTEREST…
Thomson ISI has announced that it is expanding its Web of Science® coverage with Century of Science™, a new initiative that will provide the research community with access to the world's most influential scientific research throughout the 20th century. This comprehensive archive, Web of Science, The Century of Science, includes bibliographic data from the highest-impact scientific literature published between 1944 and 2000. It will add nearly 850 000 articles from approximately 200 journals. The Thomson ISI editorial team carefully selected journals based on criteria such as citation patterns, geographic origin, and meaningful balance across scientific disciplines. The Century of Science initiative will extend through 2004, with new material available in 2005. Thomson ISI, The Thomson Corporation, Stamford, Connecticut, USA; tel. 215-366-0100, ext. 1396; Internet: www.thomsonisi.com.

Soundspace: Architecture for Sound and Vision by Peter Grueneisen is published by Birkhauser Publishing Ltd., Basel, Switzerland. The 240-page book is written and compiled by studio bau:ton's founder and principal architect. It addresses issues central to building for the creation of sound, picture, and contemporary media. With over 700 illustrations, the book covers diverse topics aimed at professionals, students, and music and film lovers as well as architecture aficionados.
The relationship between space, sound, and vision is explored. Recent developments in the music, film, and media industries have resulted in new types of buildings; these projects require a comprehensive approach from various disciplines to bridge architecture, art, and technology. The book is based on the experiences of studio bau:ton over 13 years in combining these disciplines.
Essays and projects by renowned experts in audio engineering, science, music, and architecture are included. A chapter on buildings and projects covers over forty projects, including architectural competitions for museums and concert halls, music studios, film facilities, and TV and radio broadcast studios. Many of the buildings are presented in a beautifully illustrated portfolio. The format is accessible and can be enjoyed by beginners as well as professionals.
The hardcover book costs $77 and is available online at www.birkhauser.ch. ISBN 3-7643-6975-2.
Barbara Vlahides
301 E. 22nd St., #11D, New York, NY 10010 (IAR)

Aaron Vlasnik
801 E. Benjamin, Dorm Box 218, Norfolk, NE 68701

Jelle Vlietstra
Appelhof 7, NL 9201 KT, Drachten, Netherlands (NES)

Robin Watkins
467 Napoleon Ave., Columbus, OH 43213 (HPTU)

Bethany Watson
2827 N. Spaulding #3, Chicago, IL 60618 (CC)

Felipe Wernet
Schrijnwerker 12, NL 3201 TK, Spijkenisse, Netherlands (NES)

Daniel White
1135 S. 15th #4, Lincoln, NE 68502

Felton White
P.O. Box 5275, River Forest, IL 60305 (CC)

Dennis Wikstroem
Ankarskatavagen 83 A, SE 941 31, Pitea, Sweden (ULP)

Andreas Woerle
Zimmerplatzgasse 1, AT 8010, Graz, Austria (GZ)

Endale Worku
1831 8th Ave. #502, Seattle, WA 98101 (TAIS)

Christopher Wraith
The Coach House, 46 Painshawfield Rd., Stocksfield, Northumberland, NE43 7QY, UK

Fan Wu
P.O. Box 147, Beijing Broadcasting Institute, Beijing 100024, People's Republic of China

Phillip Yarrow
750 Font Blvd. #C332, San Francisco, CA 94132 (SFU)

Chris Yates
297 Commissioners Rd. E., London, N6C 2T3, Ontario, Canada

Anthony N. Yeager
813 Sullivan Dr., Lansdale, PA 19446

Emi S. Yonemura
5312 Markwood Ln., Fair Oaks, CA 95628 (ARC)

Luis G. Zamora Manriquez
Avenida Chillan 27 85, Independencia, Santiago, Chile

Jose R. Zapata Gonzalez
cra 64A #39-15 Apto. 102, Medellin, Antioquia, Colombia (LAU)

Glen Akins, AES life member, died of heart failure on November 14, 2003, in Los Angeles, CA. He was 87 years old.
Akins was born in York, PA, in 1916. While attending Gettysburg College, he worked as a theater projectionist, which spurred his interest in all things audio. Trained as an electronics engineer, Glen was employed by International Telephone and Telegraph in their shortwave radio transmitter final test department.
In 1942, he joined the Office of War Information, and after a series of harrowing transportation incidents spent four years in China setting up and maintaining radio transmitting equipment, which often involved dubious political consequences. This position involved associations with prominent journalists and political figures in the area, which led to an additional post of gathering and broadcasting news from China for the CBS radio network.
In 1948, Glen joined the sound department of RKO Studios in Hollywood, CA. When commercial television became a reality, he moved to ABC television, Hollywood, in 1951. Hired as a video projectionist, he rapidly advanced to positions involving the planning and maintenance of new equipment and facilities. He developed a working 3-D video system, which was demonstrated at the 1953 NAB convention.
In 1960, he designed the first video/audio/machine-control routing switcher that served 24 sources to six control rooms. Automation of video and audio switching for the local station control room was another project that involved unique approaches. Many landmark broadcasts bore his fingerprints, such as the 1960 Democratic convention, the Nixon-Kennedy debates, and the Wide World of Sports. The conversion to color television under his aegis was accomplished with speed, innovation, and economy.
Akins retired in 1977 after leading his departments with a calm, wise, and humanistic statesmanship that belied the frenzy of television broadcasting. He then traveled the U.S. and worldwide with his wife Alice, visiting all continents but Antarctica.
Audio was equally important as video in Glen's mind. He encouraged and promoted the design and construction of production audio consoles and communication systems, which were not available commercially at the time. At home, he was an avid gardener, an ardent amateur radio operator, and a hi-fi audio enthusiast. Loudspeaker systems were his particular interest. He built much of his own equipment. Ham swap meets and conventions continued to be a part of his life. He never stopped learning.
Don McCroskey

Joseph Habig, AES life member, died of Parkinson's disease on September 21, 2003, in Tinton Falls, NJ. He was 79 years old.
Born in New York City, Habig received a bachelor's degree in music education at the City College of New York. He also attended Juilliard and the Manhattan School of Music. Trained as a classical musician, he worked with many of the top classical artists during his career as a producer. He also recorded major popular and jazz artists.
A Grammy Award-winning record producer, Habig won the award for his recording of Stravinsky's Symphony of Psalms. For 10 years he played the trombone with various symphony orchestras. He later became an artists' and repertoire producer at RCA Victor Red Seal for 19 years (1954 to 1973). After that he worked as an executive producer at Reader's Digest Music Division for 16 years. He is survived by his wife, Virginia Harlow Habig.
The AES 26th International Conference intends to explore the new insights in analog audio technology that have contributed to the overall increase in the subjective and objective quality of modern digital audio systems. The resolution of digital audio systems, both in the time domain and in the amplitude domain, has undergone a spectacular improvement in recent years. Because nearly all digital audio signals are derived from analog microphone signals, the recording industry has directed new efforts to the design of low-level and line-level analog circuitry to keep up with the increasing demands of the digital audio world. In particular, analog microphone amplifiers and associated circuits such as line drivers, power supplies, cables, etc., have to match the quality of modern high-resolution A-to-D converters. Recent years have also seen an increase in the attention paid to a system's ability to create an illusion of depth, of space surrounding the performers, and of the feeling of "being there." The relationship of these subjective experiences to aspects of equipment design will be one of the main topics of this conference. Because of the subjective nature of this field, preference will be given to papers that combine a lecture with a listening demonstration. For these demonstrations, three identical first-class listening rooms equipped for stereo listening will be available at the Polyhymnia Studios.

The AES 26th Conference Committee invites submission of technical papers and proposals for demonstrations at the conference in October 2004 in Baarn. By 2004 May 17, a proposed title, 60- to 120-word abstract, and 500- to 1000-word precis of the paper should be submitted via the Internet to the AES 26th Conference paper-submission site at www.aes.org/26th_authors. You can visit this site for more information and complete instructions for using the site anytime after 2004 March 19. The author's information, title, abstract, and precis should all be submitted online. The precis should describe the work performed, methods employed, conclusion(s), and significance of the paper. Titles and abstracts should follow the guidelines in Information for Authors at www.aes.org/journal/con_infoauth.html. Acceptance of papers will be determined by the 26th Conference review committee based on an assessment of the abstract and precis.
EASTERN REGION, USA/CANADA

Vice President:
Jim Anderson
12 Garfield Place, Brooklyn, NY 11215
Tel. +1 718 369 7633, Fax +1 718 669 7631
E-mail vp_eastern_usa@aes.org

UNITED STATES OF AMERICA

CONNECTICUT
University of Hartford Section (Student)
Timothy Britt, Faculty Advisor
AES Student Section, University of Hartford, Ward College of Technology, 200 Bloomfield Ave., West Hartford, CT 06117
Tel. +1 860 768 5358, Fax +1 860 768 5074
E-mail aes@hartfordaes.org

FLORIDA
Full Sail Real World Education Section (Student)
Bill Smith, Faculty Advisor
AES Student Section, Full Sail Real World Education, 3300 University Blvd., Suite 160, Winter Park, FL 32792
Tel. +1 800 679 0100
E-mail full_sail@aes.org

University of Miami Section (Student)
Ken Pohlmann, Faculty Advisor
AES Student Section, University of Miami, School of Music, PO Box 248165, Coral Gables, FL 33124-7610
Tel. +1 305 284 6252, Fax +1 305 284 4448
E-mail miami@aes.org

GEORGIA
Atlanta Section
Robert Mason
2712 Leslie Dr., Atlanta, GA 30345
Tel./Fax +1 770 908 1833
E-mail atlanta@aes.org

MARYLAND
Peabody Institute of Johns Hopkins University Section (Student)
Neil Shade, Faculty Advisor
AES Student Section, Peabody Institute of Johns Hopkins University, Recording Arts & Science Dept., 2nd Floor Conservatory Bldg., 1 E. Mount Vernon Place, Baltimore, MD 21202
Tel. +1 410 659 8100 ext. 1226
E-mail peabody@aes.org

MASSACHUSETTS
Berklee College of Music Section (Student)
Eric Reuter, Faculty Advisor
Berklee College of Music, Audio Engineering Society, c/o Student Activities, 1140 Boylston St., Box 82, Boston, MA 02215
Tel. +1 617 747 8251, Fax +1 617 747 2179
E-mail berklee@aes.org

Boston Section
J. Nelson Chadderdon
c/o Oceanwave Consulting, Inc., 21 Old Town Rd., Beverly, MA 01915
Tel. +1 978 232 9535 x201, Fax +1 978 232 9537
E-mail boston@aes.org

University of Massachusetts–Lowell Section (Student)
John Shirley, Faculty Advisor
AES Student Chapter, University of Massachusetts–Lowell, Dept. of Music, 35 Wilder St., Ste. 3, Lowell, MA 01854-3083
Tel. +1 978 934 3886, Fax +1 978 934 3034
E-mail umass_lowell@aes.org

Worcester Polytechnic Institute Section (Student)
William Michalson, Faculty Advisor
AES Student Section, Worcester Polytechnic Institute, 100 Institute Rd., Worcester, MA 01609
Tel. +1 508 831 5766
E-mail wpi@aes.org

NEW JERSEY
William Paterson University Section (Student)
David Kerzner, Faculty Advisor
AES Student Section, William Paterson University, 300 Pompton Rd., Wayne, NJ 07470-2103
Tel. +1 973 720 3198, Fax +1 973 720 2217
E-mail wpu@aes.org

NEW YORK
Fredonia Section (Student)
Bernd Gottinger, Faculty Advisor
AES Student Section, SUNY–Fredonia, 1146 Mason Hall, Fredonia, NY 14063
Tel. +1 716 673 4634, Fax +1 716 673 3154
E-mail fredonia@aes.org

Institute of Audio Research Section (Student)
Noel Smith, Faculty Advisor
AES Student Section, Institute of Audio Research, 64 University Pl., New York, NY 10003
Tel. +1 212 677 7580, Fax +1 212 677 6549
E-mail iar@aes.org

New York Section
Bill Siegmund
Digital Island Studios, 71 West 23rd Street, Suite 504, New York, NY 10010
Tel. +1 212 243 9753
E-mail new_york@aes.org

New York University Section (Student)
Robert Rowe, Faculty Advisor
Steinhardt School of Education, 35 West 4th St., 777G, New York, NY 10012
Tel. +1 212 998 5435
E-mail nyu@aes.org

NORTH CAROLINA
Appalachian State University Section (Student)
Michael S. Fleming, Faculty Advisor
Sonaura Sound, 152 Village Drive, Boone, NC 28607
Tel. +1 828 263 0454
E-mail appalachian@aes.org

University of North Carolina at Asheville Section (Student)
Wayne J. Kirby, Faculty Advisor
AES Student Section, University of North Carolina at Asheville, Dept. of Music, One University Heights, Asheville, NC 28804
Tel. +1 828 251 6487, Fax +1 828 253 4573
E-mail north_carolina@aes.org

PENNSYLVANIA
Carnegie Mellon University Section (Student)
Thomas Sullivan, Faculty Advisor
AES Student Section, Carnegie Mellon University, University Center Box 122, Pittsburgh, PA 15213
Tel. +1 412 268 3351
E-mail carnegie_mellon@aes.org

Duquesne University Section (Student)
Francisco Rodriguez, Faculty Advisor
AES Student Section, Duquesne University, School of Music, 600 Forbes Ave., Pittsburgh, PA 15282
Tel. +1 412 434 1630, Fax +1 412 396 5479
E-mail duquesne@aes.org

Pennsylvania State University Section (Student)
Dan Valente
AES Penn State Student Chapter, Graduate Program in Acoustics, 217 Applied Science Bldg., University Park, PA 16802
Home Tel. +1 814 863 8282, Fax +1 814 865 3119
E-mail penn_state@aes.org

Philadelphia Section
Rebecca Mercuri
P.O. Box 1166, Philadelphia, PA 19105
Tel. +1 215 327 7105
E-mail philly@aes.org

VIRGINIA
Hampton University Section (Student)
Bob Ransom, Faculty Advisor
AES Student Section, Hampton University, Dept. of Music, Hampton, VA 23668
Office Tel. +1 757 727 5658, +1 757 727 5404, Home Tel. +1 757 826 0092, Fax +1 757 727 5084
E-mail hampton_u@aes.org

WASHINGTON, DC
American University Section (Student)
Rebecca Stone-gordon, Faculty Advisor
AES Student Section, American University, 4400 Massachusetts Ave., N.W.
458 J. Audio Eng. Soc., Vol. 52, No. 4, 2004 April
SECTIONS CONTACTS DIRECTORY
Washington, DC 20016
Tel. +1 202 885 3242
E-mail american_u@aes.org

District of Columbia Section
John W. Reiser, DC AES Section Secretary
P.O. Box 169, Mt. Vernon, VA 22121-0169
Tel. +1 703 780 4824, Fax +1 703 780 4214
E-mail dc@aes.org

CANADA
McGill University Section (Student)
John Klepko, Faculty Advisor
AES Student Section, McGill University, Sound Recording Studios, Strathcona Music Bldg., 555 Sherbrooke St. W., Montreal, Quebec H3A 1E3, Canada
Tel. +1 514 398 4535 ext. 0454
E-mail mcgill_u@aes.org

Toronto Section
Anne Reynolds
606-50 Cosburn Ave., Toronto, Ontario M4K 2G8, Canada
Tel. +1 416 957 6204, Fax +1 416 364 1310
E-mail toronto@aes.org

CENTRAL REGION, USA/CANADA

Vice President:
Frank Wells
2130 Creekwalk Drive, Murfreesboro, TN
Tel. +1 615 848 1769, Fax +1 615 848 1108
E-mail vp_central_usa@aes.org

UNITED STATES OF AMERICA

ARKANSAS
University of Arkansas at Pine Bluff Section (Student)
Robert Elliott, Faculty Advisor
AES Student Section, Music Dept., Univ. of Arkansas at Pine Bluff, 1200 N. University Drive, Pine Bluff, AR 71601
Tel. +1 870 575 8916, Fax +1 870 543 8108
E-mail pinebluff@aes.org

ILLINOIS
Chicago Section
Tom Miller
Knowles Electronics, 1151 Maplewood Dr., Itasca, IL 60143
Tel. +1 630 285 5882, Fax +1 630 250 0575
E-mail chicago@aes.org

Columbia College Section (Student)
Dominique J. Chéenne, Faculty Advisor
AES Student Section, 676 N. LaSalle, Ste. 300, Chicago, IL 60610
Tel. +1 312 344 7802, Fax +1 312 482 9083
E-mail columbia@aes.org

University of Illinois at Urbana-Champaign Section (Student)
Mark Hasegawa-Johnson, Faculty Advisor
AES Student Section, University of Illinois, Urbana-Champaign, Urbana, IL 61801
E-mail urbana@aes.org

INDIANA
Ball State University Section (Student)
Michael Pounds, Faculty Advisor
AES Student Section, Ball State University, MET Studios, 2520 W. Bethel, Muncie, IN 47306
Tel. +1 765 285 5537, Fax +1 765 285 8768
E-mail ball_state@aes.org

Central Indiana Section
James Latta
Sound Around, 6349 Warren Ln., Brownsburg, IN 46112
Office Tel. +1 317 852 8379, Fax +1 317 858 8105
E-mail central_indiana@aes.org

KANSAS
Kansas City Section
Jim Mitchell
Custom Distribution Limited, 12301 Riggs Rd., Overland Park, KS 66209
Tel. +1 913 661 0131, Fax +1 913 663 5662

LOUISIANA
New Orleans Section
Joseph Doherty
Factory Masters, 4611 Magazine St., New Orleans, LA 70115
Tel. +1 504 891 4424, Cell +1 504 669 4571, Fax +1 504 899 9262
E-mail jdoherty@accesscom.net

MICHIGAN
Detroit Section
David Carlstrom
DaimlerChrysler
E-mail detroit@aes.org

Michigan Technological University Section (Student)
Greg Piper
AES Student Section, Michigan Technological University, 1400 Townsend Dr., 121 EERC Building, Houghton, MI 49931
Tel. +1 906 482 3581
E-mail michigan_tech@aes.org

University of Michigan Section (Student)
Jason Corey, Faculty Advisor
University of Michigan School of Music, 1100 Baits Drive, Ann Arbor, MI 48109
E-mail univ_michigan@aes.org

West Michigan Section
Carl Hordyk
Calvin College, 3201 Burton S.E., Grand Rapids, MI 49546
Tel. +1 616 957 6279, Fax +1 616 957 6469
E-mail west_mich@aes.org

MINNESOTA
Music Tech College Section (Student)
Michael McKern, Faculty Advisor
AES Student Section, Music Tech College, 19 Exchange Street East, Saint Paul, MN 55101
Tel. +1 651 291 0177, Fax +1 651 291 0366
E-mail musictech_student@aes.org

Ridgewater College, Hutchinson Campus Section (Student)
Dave Igl, Faculty Advisor
AES Student Section, Ridgewater College, Hutchinson Campus, 2 Century Ave. S.E., Hutchinson, MN 55350
E-mail ridgewater@aes.org

Upper Midwest Section
Greg Reierson
Rare Form Mastering, 4624 34th Avenue South, Minneapolis, MN 55406
Tel. +1 612 327 8750
E-mail upper_midwest@aes.org

MISSOURI
St. Louis Section
John Nolan, Jr.
693 Green Forest Dr., Fenton, MO 63026
Tel./Fax +1 636 343 4765
E-mail st_louis@aes.org

Webster University Section (Student)
Gary Gottleib, Faculty Advisor
Webster University, 470 E. Lockwood Ave., Webster Groves, MO 63119
Tel. +1 961 2660 x7962
E-mail webster_st_louis@aes.org

NEBRASKA
Nebraska Section
Anthony D. Beardslee
Northeast Community College, P.O. Box 469, Norfolk, NE 68702
Tel. +1 402 844 7365, Fax +1 209 254 8282
E-mail nebraska@aes.org

OHIO
Cincinnati Section
Dan Scherbarth
Digital Groove Productions, 5392 Conifer Dr., Mason, OH 45040
Tel. +1 513 325 5329
E-mail cincinnati@aes.org

Ohio University Section (Student)
Erin M. Dawes
AES Student Section, Ohio University, RTVC Bldg., 9 S. College St., Athens, OH 45701-2979
Home Tel. +1 740 597 6608
E-mail ohio@aes.org

University of Cincinnati Section (Student)
Thomas A. Haines, Faculty Advisor
AES Student Section, University of Cincinnati, College-Conservatory of Music, M.L. 0003, Cincinnati, OH 45221
Tel. +1 513 556 9497, Fax +1 513 556 0202
E-mail cincinnati@aes.org

TENNESSEE
Belmont University Section (Student)
Wesley Bulla, Faculty Advisor
AES Student Section, Belmont University, Nashville, TN 37212
E-mail belmont@aes.org

Middle Tennessee State University Section (Student)
Phil Shullo, Faculty Advisor
AES Student Section, Middle Tennessee State University, 301 E. Main St., Box 21, Murfreesboro, TN 37132
Tel. +1 615 898 2553
E-mail mtsu@aes.org

Nashville Section
Tom Edwards
MTV Networks, 330 Commerce St., Nashville, TN 37201
Tel. +1 615 335 8520, Fax +1 615 335 8625
E-mail nashville@aes.org

SAE Nashville Section (Student)
Larry Sterling, Faculty Advisor
AES Student Section, 7 Music Circle N., Nashville, TN 37203
Tel. +1 615 244 5848, Fax +1 615 244 3192
E-mail saenash_student@aes.org

TEXAS
Texas State University–San Marcos (Student)
Mark C. Erickson, Faculty Advisor
AES Student Section, Southwest Texas State University, 224 N. Guadalupe St., San Marcos, TX 78666
Tel. +1 512 245 8451, Fax +1 512 396 1169
E-mail tsu_sm@aes.org
San Francisco