
AES

JOURNAL OF THE AUDIO ENGINEERING SOCIETY


AUDIO / ACOUSTICS / APPLICATIONS
Volume 52 Number 4 2004 April

In this issue...
Interpreting Multitone Distortion Spectrum
Compensating for Lossy Voice Coils
Database Searching to Identify Audio
Pitch Detection with Percussive Background

Features...
25th Conference, London—Preview
Technology Overview of Loudspeakers
DSP in Loudspeakers
Surround Live Summary
26th Conference, Baarn—Call for Papers
AUDIO ENGINEERING SOCIETY, INC.
INTERNATIONAL HEADQUARTERS
60 East 42nd Street, Room 2520, New York, NY 10165-2520, USA
Tel: +1 212 661 8528. Fax: +1 212 682 0477
E-mail: HQ@aes.org. Internet: http://www.aes.org

ADMINISTRATION
Roger K. Furness, Executive Director
Sandra J. Requa, Executive Assistant to the Executive Director

OFFICERS 2003/2004
Ronald Streicher, President
Theresa Leonard, President-Elect
Kees A. Immink, Past President
Jim Anderson, Vice President, Eastern Region, USA/Canada
Frank Wells, Vice President, Central Region, USA/Canada
Bob Moses, Vice President, Western Region, USA/Canada
Søren Bech, Vice President, Northern Region, Europe
Bozena Kostek, Vice President, Central Region, Europe
Ivan Stamac, Vice President, Southern Region, Europe
Mercedes Onorato, Vice President, Latin American Region
Neville Thiele, Vice President, International Region
Han Tendeloo, Secretary
Marshall Buck, Treasurer

GOVERNORS
Jerry Bruck, Curtis Hoyt, Garry Margolis, Roy Pritts, Don Puluse, Richard Small, Peter Swarte, Kunimaro Tanaka

COMMITTEES
Awards: Garry Margolis, Chair
Conference Policy: Søren Bech, Chair
Convention Policy & Finance: Marshall Buck, Chair
Education: Theresa Leonard, Chair
Future Directions: Ron Streicher, Chair
Historical: J. G. (Jay) McKnight, Chair; Irving Joel, Vice Chair; Donald J. Plunkett, Chair Emeritus
Laws & Resolutions: Theresa Leonard, Chair
Membership/Admissions: Francis Rumsey, Chair
Nominations: Kees A. Immink, Chair
Publications Policy: Richard H. Small, Chair
Regions and Sections: Subir Pramanik and Roy Pritts, Cochairs
Standards: Richard Chalmers, Chair
Tellers: Christopher V. Freitag, Chair

TECHNICAL COUNCIL
Wieslaw V. Woszczyk, Chair; Jürgen Herre and Robert Schulein, Vice Chairs

TECHNICAL COMMITTEES
Acoustics & Sound Reinforcement: Mendel Kleiner, Chair; Kurt Graffy, Vice Chair
Archiving, Restoration and Digital Libraries: David Ackerman, Chair
Audio for Games: Martin Wilde, Chair
Audio for Telecommunications: Bob Zurek, Chair; Andrew Bright, Vice Chair
Audio Recording & Storage Systems: Derk Reefman, Chair; Kunimaro Tanaka, Vice Chair
Automotive Audio: Richard S. Stroud, Chair; Tim Nind, Vice Chair
Coding of Audio Signals: James Johnston and Jürgen Herre, Cochairs
High-Resolution Audio: Malcolm Hawksford, Chair; Vicki R. Melchior and Takeo Yamamoto, Vice Chairs
Loudspeakers & Headphones: David Clark, Chair; Juha Backman, Vice Chair
Microphones & Applications: David Josephson, Chair; Wolfgang Niehoff, Vice Chair
Multichannel & Binaural Audio Technologies: Francis Rumsey, Chair; Gunther Theile, Vice Chair
Network Audio Systems: Jeremy Cooperstock, Chair; Robert Rowe and Thomas Sporer, Vice Chairs
Perception & Subjective Evaluation of Audio Signals: Durand Begault, Chair; Søren Bech and Eiichi Miyasaka, Vice Chairs
Semantic Audio Analysis: Mark Sandler, Chair
Signal Processing: Ronald Aarts, Chair; James Johnston and Christoph M. Musialik, Vice Chairs
Studio Practices & Production: George Massenburg, Chair; Alan Parsons, David Smith, and Mick Sawaguchi, Vice Chairs
Transmission & Broadcasting: Stephen Lyman, Chair; Neville Thiele, Vice Chair

STANDARDS COMMITTEE
Richard Chalmers, Chair; Mark Yonge, Secretary, Standards Manager; John Woodgate, Vice Chair; Yoshizo Sohma, Vice Chair, International; Bruce Olson, Vice Chair, Western Hemisphere

SC-02 SUBCOMMITTEE ON DIGITAL AUDIO
Robin Caine, Chair; Robert A. Finger, Vice Chair
Working Groups:
SC-02-01 Digital Audio Measurement Techniques: Richard C. Cabot, I. Dennis, M. Keyhl
SC-02-02 Digital Input-Output Interfacing: John Grant, Robert A. Finger
SC-02-05 Synchronization: Robin Caine

SC-03 SUBCOMMITTEE ON THE PRESERVATION AND RESTORATION OF AUDIO RECORDING
Ted Sheldon, Chair; Dietrich Schüller, Vice Chair
Working Groups:
SC-03-01 Analog Recording: J. G. McKnight
SC-03-02 Transfer Technologies: Lars Gaustad, Greg Faris
SC-03-04 Storage and Handling of Media: Ted Sheldon, Gerd Cyrener
SC-03-06 Digital Library and Archives Systems: David Ackerman, Ted Sheldon
SC-03-12 Forensic Audio: Tom Owen, M. McDermott, Eddy Bogh Brixen

SC-04 SUBCOMMITTEE ON ACOUSTICS
Mendel Kleiner, Chair; David Josephson, Vice Chair
Working Groups:
SC-04-01 Acoustics and Sound Source Modeling: Richard H. Campbell, Wolfgang Ahnert
SC-04-03 Loudspeaker Modeling and Measurement: David Prince, Neil Harris, Steve Hutt
SC-04-04 Microphone Measurement and Characterization: David Josephson, Jackie Green
SC-04-07 Listening Tests: David Clark, T. Nousaine

SC-05 SUBCOMMITTEE ON INTERCONNECTIONS
Ray Rayburn, Chair; John Woodgate, Vice Chair
Working Groups:
SC-05-02 Audio Connectors: Ray Rayburn, Werner Bachmann
SC-05-05 Grounding and EMC Practices: Bruce Olson, Jim Brown

SC-06 SUBCOMMITTEE ON NETWORK AND FILE TRANSFER OF AUDIO
Robin Caine, Chair; Steve Harris, Vice Chair
Working Groups:
SC-06-01 Audio-File Transfer and Exchange: Mark Yonge, Brooks Harris
SC-06-02 Audio Applications Using the High Performance Serial Bus (IEEE 1394): John Strawn, Bob Moses
SC-06-04 Internet Audio Delivery System: Karlheinz Brandenburg
SC-06-06 Audio Metadata: C. Chambers
Correspondence to AES officers and committee chairs should be addressed to them at the society’s international headquarters.
AES REGIONAL OFFICES
Europe Conventions: Zevenbunderslaan 142/9, BE-1190 Brussels, Belgium, Tel: +32 2 345 7971, Fax: +32 2 345 3419, E-mail for convention information: euroconventions@aes.org.
Europe Services: B.P. 50, FR-94364 Bry Sur Marne Cedex, France, Tel: +33 1 4881 4632, Fax: +33 1 4706 0648, E-mail for membership and publication sales: euroservices@aes.org.
United Kingdom: British Section, Audio Engineering Society Ltd., P. O. Box 645, Slough, SL1 8BJ UK, Tel: +44 1628 663725, Fax: +44 1628 667002, E-mail: UK@aes.org.
Japan: AES Japan Section, 1-38-2 Yoyogi, Room 703, Shibuya-ku, Tokyo 151-0053, Japan, Tel: +81 3 5358 7320, Fax: +81 3 5358 7328, E-mail: aes_japan@aes.org.

AES Journal of the Audio Engineering Society (ISSN 0004-7554), Volume 52, Number 4, 2004 April. Published monthly, except January/February and July/August when published bimonthly, by the Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA. Telephone: +1 212 661 8528. Fax: +1 212 682 0477. E-mail: HQ@aes.org. Periodical postage paid at New York, New York, and at an additional mailing office. Postmaster: Send address corrections to AES Journal of the Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520.
The Audio Engineering Society is not responsible for statements made by its contributors.

EDITORIAL STAFF
Daniel R. von Recklinghausen, Editor
William T. McQuaide, Managing Editor
Gerri M. Calamusa, Senior Editor
Abbie J. Cohen, Senior Editor
Mary Ellen Ilich, Associate Editor
Patricia L. Sarch, Art Director
Flávia Elzinga, Advertising
Ingeborg M. Stochmal, Copy Editor
Barry A. Blesser, Consulting Technical Editor
Stephanie Paynes, Writer

REVIEW BOARD
Ronald M. Aarts, James A. S. Angus, George L. Augspurger, Jerry Bauck, James W. Beauchamp, Søren Bech, Durand Begault, Barry A. Blesser, John S. Bradley, Robert Bristow-Johnson, John J. Bubbers, Marshall Buck, Mahlon D. Burkhard, Richard C. Cabot, Robert R. Cordell, Andrew Duncan, John M. Eargle, Louis D. Fielder, Edward J. Foster, Mark R. Gander, Earl R. Geddes, David Griesinger, Malcolm O. J. Hawksford, Jürgen Herre, Tomlinson Holman, Andrew Horner, Jyri Huopaniemi, James D. Johnston, Arie J. M. Kaizer, James M. Kates, D. B. Keele, Jr., Mendel Kleiner, David L. Klepper, Wolfgang Klippel, W. Marshall Leach, Jr., Stanley P. Lipshitz, Robert C. Maher, Dan Mapes-Riordan, J. G. (Jay) McKnight, Guy W. McNally, D. J. Meares, Robert A. Moog, Brian C. J. Moore, James A. Moorer, Dick Pierce, Martin Polon, D. Preis, Derk Reefman, Francis Rumsey, Kees A. Schouhamer Immink, Manfred R. Schroeder, Robert B. Schulein, Richard H. Small, Julius O. Smith III, Gilbert Soulodre, Herman J. M. Steeneken, John S. Stewart, John Strawn, G. R. (Bob) Thurmond, Jiri Tichy, Floyd E. Toole, Emil L. Torick, John Vanderkooy, Alexander Voishvillo, Daniel R. von Recklinghausen, Rhonda Wilson, John M. Woodgate, Wieslaw V. Woszczyk

AES REGIONS AND SECTIONS
Eastern Region, USA/Canada
Sections: Atlanta, Boston, District of Columbia, New York, Philadelphia, Toronto
Student Sections: American University, Appalachian State University, Berklee College of Music, Carnegie Mellon University, Duquesne University, Fredonia, Full Sail Real World Education, Hampton University, Institute of Audio Research, McGill University, New York University, Peabody Institute of Johns Hopkins University, Pennsylvania State University, University of Hartford, University of Massachusetts-Lowell, University of Miami, University of North Carolina at Asheville, William Patterson University, Worcester Polytechnic Institute
Central Region, USA/Canada
Sections: Central Indiana, Chicago, Cincinnati, Detroit, Kansas City, Nashville, Nebraska, New Orleans, St. Louis, Upper Midwest, West Michigan
Student Sections: Ball State University, Belmont University, Columbia College, Michigan Technological University, Middle Tennessee State University, Music Tech College, SAE Nashville, Ohio University, Ridgewater College, Hutchinson Campus, Texas State University–San Marcos, University of Arkansas-Pine Bluff, University of Cincinnati, University of Illinois-Urbana-Champaign, University of Michigan, Webster University
Western Region, USA/Canada
Sections: Alberta, Colorado, Los Angeles, Pacific Northwest, Portland, San Diego, San Francisco, Utah, Vancouver
Student Sections: American River College, Brigham Young University, California State University–Chico, Citrus College, Cogswell Polytechnical College, Conservatory of Recording Arts and Sciences, Expression Center for New Media, Long Beach City College, San Diego State University, San Francisco State University, Cal Poly San Luis Obispo, Stanford University, The Art Institute of Seattle, University of Colorado at Denver, University of Southern California, Vancouver
Northern Region, Europe
Sections: Belgian, British, Danish, Finnish, Moscow, Netherlands, Norwegian, St. Petersburg, Swedish
Student Sections: All-Russian State Institute of Cinematography, Danish, Netherlands, Russian Academy of Music, St. Petersburg, University of Lulea-Pitea
Central Region, Europe
Sections: Austrian, Belarus, Czech, Central German, North German, South German, Hungarian, Lithuanian, Polish, Slovakian Republic, Swiss, Ukrainian
Student Sections: Aachen, Berlin, Czech Republic, Darmstadt, Detmold, Düsseldorf, Graz, Ilmenau, Technical University of Gdansk (Poland), Vienna, Wroclaw University of Technology
Southern Region, Europe
Sections: Bosnia-Herzegovina, Bulgarian, Croatian, French, Greek, Israel, Italian, Portugal, Romanian, Slovenian, Spanish, Serbia and Montenegro, Turkish
Student Sections: Croatian, Conservatoire de Paris, Italian, Louis-Lumière
Latin American Region
Sections: Argentina, Brazil, Chile, Colombia, Ecuador, Mexico, Peru, Uruguay, Venezuela
Student Sections: Del Bosque University, I.A.V.Q., Javeriana University, Los Andes University, Orson Welles Institute, San Buenaventura University, Taller de Arte Sonoro (Caracas)
International Region
Sections: Adelaide, Brisbane, Hong Kong, India, Japan, Korea, Malaysia, Melbourne, Philippines, Singapore, Sydney

PURPOSE: The Audio Engineering Society is organized for the purpose of: uniting persons performing professional services in the audio engineering field and its allied arts; collecting, collating, and disseminating scientific knowledge in the field of audio engineering and its allied arts; advancing such science in both theoretical and practical applications; and preparing, publishing, and distributing literature and periodicals relative to the foregoing purposes and policies.

MEMBERSHIP: Individuals who are interested in audio engineering may become members of the AES. Information on joining the AES can be found at www.aes.org. Grades and annual dues are: full members and associate members, $95 for both the printed and online Journal, $60 for online Journal only; student members, $55 for printed and online Journal, $20 for online Journal only. A subscription to the Journal is included with all memberships. Sustaining memberships are available to persons, corporations, or organizations who wish to support the Society.

COPYRIGHT: Copyright © 2004 by the Audio Engineering Society, Inc. It is permitted to quote from this Journal with customary credit to the source.

COPIES: Individual readers are permitted to photocopy isolated articles for research or other noncommercial use. Permission to photocopy for internal or personal use of specific clients is granted by the Audio Engineering Society to libraries and other users registered with the Copyright Clearance Center (CCC), provided that the base fee of $1 per copy plus $.50 per page is paid directly to CCC, 222 Rosewood Dr., Danvers, MA 01923, USA. 0004-7554/95. Photocopies of individual articles may be ordered from the AES Headquarters office at $5 per article.

REPRINTS AND REPUBLICATION: Multiple reproduction or republication of any material in this Journal requires the permission of the Audio Engineering Society. Permission may also be required from the author(s). Send inquiries to the AES Editorial office.

ONLINE JOURNAL: AES members can view the Journal online at www.aes.org/journal/online.

SUBSCRIPTIONS: The Journal is available by subscription. Annual rates are $190 surface mail, $240 air mail. For information, contact AES Headquarters.

BACK ISSUES: Selected back issues are available: from Vol. 1 (1953) through Vol. 12 (1964), $10 per issue (members), $15 (nonmembers); Vol. 13 (1965) to present, $6 per issue (members), $11 (nonmembers). For information, contact the AES Headquarters office.

MICROFILM: Copies of Vol. 19, No. 1 (1971 January) to the present edition are available on microfilm from University Microfilms International, 300 North Zeeb Rd., Ann Arbor, MI 48106, USA.

ADVERTISING: Call the AES Editorial office or send e-mail to: JournalAdvertising@aes.org.

MANUSCRIPTS: For information on the presentation and processing of manuscripts, see Information for Authors.
AES
JOURNAL OF THE
AUDIO ENGINEERING SOCIETY
AUDIO/ACOUSTICS/APPLICATIONS
VOLUME 52 NUMBER 4 2004 APRIL
CONTENTS
President’s Message .........................................................................................................Ron Streicher 331

PAPERS
Graphing, Interpretation, and Comparison of Results of Loudspeaker Nonlinear Distortion
Measurements........Alexander Voishvillo, Alexander Terekhov, Eugene Czerwinski, and Sergei Alexandrov 332
For loudspeaker nonlinearity, measurement techniques range from single-tone harmonic distortion, which
is easy to interpret but not indicative of performance with music, to reactions to multitone stimuli, which
are hard to interpret but highly informative. Because multitone techniques have the potential to predict the
perception of nonlinearities, the authors focus on various presentation formats and analysis techniques to
make the relevant information in the thousands of intermodulation products accessible and meaningful.

ENGINEERING REPORTS
Impedance Compensation Networks for the Lossy Voice-Coil Inductance of Loudspeaker
Drivers....................................................................................................................W. Marshall Leach, Jr. 358
The high-frequency rise in the voice-coil impedance of a loudspeaker driver produced by lossy voice-coil
inductance can be approximately cancelled by a Zobel network connected in parallel. Such networks
improve performance by presenting purely resistive impedance to the crossover network. Although higher
order networks can be used, a pair of resistors and capacitors is sufficient for typical drivers.
Scalable, Content-Based Audio Identification by Multiple Independent Psychoacoustic
Matching .............................................................................Geoff R. Schmidt and Matthew K. Belmonte 366
A software audio search system, as an analog to text searching, allows a target music sample to be
identified by matching it to a database containing an inventory of reference samples. Rather than rely on
autonomous metadata, the algorithm uses a sequence of vectors based on perceptual attributes. By
iteratively testing a progression of such vectors, the algorithm has the ability to trade accuracy versus
compute time. With increasing storage capacity to hold virtually unlimited quantities of audio data, an
efficient search algorithm is a necessity.
On the Detection of Melodic Pitch in a Percussive Background .........Preeti Rao and Saurabh Shandilya 378
Although many pitch detection algorithms have been proposed over the years, the problem is particularly
difficult when melodic instruments are accompanied by percussive background. The authors propose a
temporal autocorrelation pitch detector motivated by an auditory model that attempts to suppress errors
produced by inharmonic interfering partials of such instruments as a kick drum. Separate processing of
frequency channels proved crucial in reducing the distortion products, due to the nonlinear hair-cell model,
between the signal harmonics and the interfering partials.

STANDARDS AND INFORMATION DOCUMENTS


AES Standards Committee News........................................................................................................... 392
New AESSC chairman; audio file interchange; audio over IEEE 1394

FEATURES
25th Conference Preview, London ......................................................................................................... 402
Calendar.............................................................................................................................................. 404
Program .............................................................................................................................................. 405
Registration Form .............................................................................................................................. 411
Historical Perspectives and Technology Overview of Loudspeakers for Sound Reinforcement
............................................................................................................................J. Eargle and M. Gander 412
DSP in Loudspeakers.............................................................................................................................. 434
Surround Live Summary ...............................................................................................Frederick Ampel 440
26th Conference, Baarn, Call for Papers ............................................................................................... 457

DEPARTMENTS
Review of Acoustical Patents ...........................395 Membership Information...................................451
News of the Sections.........................................443 Advertiser Internet Directory............................453
Upcoming Meetings ..........................................447 In Memoriam ......................................................456
Sound Track .......................................................448 Sections Contacts Directory ............................458
New Products and Developments ....................449 AES Conventions and Conferences ................464
Available Literature............................................450
PRESIDENT’S MESSAGE

DISTINGUISHED SPEAKERS PROGRAM

It is with great pleasure that I am able to announce two initiatives recently approved by the Board of Governors. The first is the Distinguished Speakers Program, which will establish a roster of touring guest lecturers who will be available to provide presentations at the meetings of local sections throughout the Society. To begin the program I would ask that anyone who travels regularly on business and has the time, interest, and ability to offer a technical presentation to one or more local sections contact the AES Headquarters office at speakers@aes.org and let us know your schedule and itinerary. We will then work to connect you to the appropriate section officers. Of course, the greater advance notice you can provide of your planned trip, the better we will be able to make these arrangements to suit your schedule.

It is our goal that this project will develop to provide a wealth of interesting meetings for all of our local sections, large and small, throughout the Society, augmenting their activities and stimulating increased membership in the Society.

To support the Distinguished Speakers Program, the Executive Committee has established a special fund that we hope ultimately will be sufficiently endowed to be fully self-supporting. To initiate this fund, I have made a significant personal contribution, and I invite you, either individually or corporately, to join me in supporting this project. Contributions are tax deductible and will be used solely to finance the activities of this program.

To find out more about the Distinguished Speakers Program, to offer your services as a guest lecturer, or to make a contribution in support of the program, please contact the AES Headquarters office at speakers@aes.org. You can also find out more about the program at www.aes.org/info/speakers.

LEGACY FUND

Another new initiative is being developed to improve the long-term financial stability of the AES. Called the Legacy Fund, this provides an opportunity for members to make tax-deductible contributions at any time or to leave bequests to the AES in their wills; similar programs are used by other professional societies. This endowment fund will support the general operating budget and activities of the Society and lessen the need to increase dues or other membership fees.

As your president for this year, it is my hope that both of these initiatives will help our Society grow and prosper long into the future. To accomplish this, however, we need qualified people to join the Distinguished Speakers Program roster, and we need your financial support and contributions to help us develop and sustain these and all of the activities of the AES.

Ron Streicher, President, rds@aes.org



PAPERS

Graphing, Interpretation, and Comparison of


Results of Loudspeaker Nonlinear
Distortion Measurements*

ALEXANDER VOISHVILLO, AES Member, ALEXANDER TEREKHOV, EUGENE CZERWINSKI,


AND SERGEI ALEXANDROV**

Cerwinski Laboratories Inc., Simi Valley, CA 93063, USA

Harmonic distortion and total harmonic distortion may not convey sufficient information
about nonlinearity in loudspeakers and horn drivers to judge their perceptual acceptability.
Multitone stimuli and Gaussian noise produce a more informative nonlinear response. The
reaction to Gaussian noise can be transformed into coherence or incoherence functions. These
functions provide information about nonlinearity in the form of “easy-to-grasp” frequency-
dependent curves. Alternatively, a multitone stimulus generates a variety of “visible” har-
monic and intermodulation spectral components. If the number of input tones is significant,
the nonlinear reaction may consist of hundreds, if not thousands, of distortion spectral
components. The results of such measurements are difficult to interpret, compare, and over-
lay. A new method of depicting the results of multitone measurements has been developed.
The measurement result is a single, continuous, frequency-dependent curve that takes into
account the level of the distortion products and their “density.” The curves can be easily
overlaid and compared. Future developments of this new method may lead to a correlation
between curves of the level of distortion and the audibility of nonlinear distortion. Using
nonlinear dynamic loudspeaker models, multitone and Gaussian noise test signals are com-
pared with traditional and nontraditional measurement techniques. The relationship between
harmonics and intermodulation products in static and dynamic nonlinear systems is analyzed.

0 INTRODUCTION

Loudspeakers and horn drivers are complex nonlinear dynamic systems whose nonlinear reactions to different stimuli may be significantly different. It has been demonstrated earlier that traditional methods of measuring nonlinear distortion using a sweeping tone, and such criteria as harmonic distortion and total harmonic distortion (THD), may not convey sufficient information about the nonlinear properties of loudspeakers and horn drivers [1]. It has been demonstrated by using the multitone stimuli that the intermodulation products outweigh the harmonics in even comparatively simple nonlinear systems characterized by static polynomial nonlinearity. This difference between harmonics and intermodulation products is especially pronounced if a higher order nonlinearity occurs. In complex nonlinear dynamic systems, such as electrodynamic direct-radiating loudspeakers or horn drivers, the intermodulation and harmonic products are characterized by their individual dependence on frequency. These individual dependencies prevent a substitution of intermodulation testing with harmonic measurements.

Complex signals such as multitone stimuli and Gaussian noise produce nonlinear reactions that carry more information about intermodulation of various kinds and orders in loudspeakers and horn drivers. Nonlinear reactions to the Gaussian noise can be transformed into, for example, the Wiener kernels [2], coherence or incoherence functions [3], [4], or the higher order spectra (HOS) [5]. Wiener kernels do not have an intuitive simple interpretation. The coherence (or incoherence) functions and HOS provide information about the overall nonlinear behavior of a measured object in the form of “easy-to-grasp” frequency-dependent curves. However, these functions are also sensitive to noises and other effects (such as reflections) that may “mar up” the results of nonlinear testing. In addition, it takes a comparatively long time to measure them. Historically the coherence function has been used in the testing of hearing aids [3], [4] and in the evaluation of nonlinearity and noises in magnetic recording [6], but has never been popular in the assessment of loudspeaker nonlinearity.

*Manuscript received 2002 May 9; revised 2003 December 12 and 2004 February 13.
**Now with JBL Professional, Northridge, CA 91329, USA.
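The incoherence function mentioned above is straightforward to estimate numerically. The sketch below is a minimal illustration, assuming a memoryless cubic nonlinearity as a stand-in for a driver model and using the ordinary magnitude-squared coherence available in SciPy; it is not the measurement procedure of [3], [4], and the coefficient values are arbitrary.

```python
# Sketch: incoherence function of a weakly nonlinear system driven by Gaussian noise.
# gamma^2 = |Sxy|^2 / (Sxx * Syy); incoherence = 1 - gamma^2 is the fraction of output
# power that cannot be explained by a linear relationship to the input. Note that for a
# Gaussian input, part of the cubic distortion is correlated with the input and therefore
# does not appear in the incoherence; only the orthogonal remainder does.
import numpy as np
from scipy import signal

fs = 48_000                                              # sampling rate, Hz
x = np.random.default_rng(0).standard_normal(fs * 10)    # 10 s of Gaussian noise
y = x + 0.05 * x**3                                      # assumed weak static nonlinearity

f, gamma2 = signal.coherence(x, y, fs=fs, nperseg=4096)
incoherence = 1.0 - gamma2
print(f"median incoherence: {np.median(incoherence):.4f}")
```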
The multitone stimulus has been gaining popularity in many applications during the last decade [7]–[12]. Various aspects of using multitone signals in loudspeaker testing will be discussed in Section 3. The multitone stimulus produces a rich spectrum of distortion products. The statistical distribution and crest factor of multitone signals are close to those of a musical signal. However, the results of a measurement presented in the form of an output spectrum are difficult to interpret, compare, and overlay.

A new method of depicting the results of multitone measurements has been developed in this work. According to this method, the result of the measurement is presented as a single, continuous, frequency-dependent curve that takes into account not only the distortion level of the spectral components but also their “density.” Many such curves, corresponding to different levels of input signal, can be overlaid easily in a way that is practically impossible using “unprocessed” responses to the multitone stimulus. These two- or three-dimensional graphs can easily demonstrate how the overall nonlinear distortion in a device under test (loudspeaker or horn driver) increases with the level of the input multitone stimulus.

In the technical publications of previous years other approaches to measure, model, and assess loudspeaker nonlinearity have been discussed. Some of these methods have not been included in the existing standards and probably never will be. However, a comparative survey helps us to look at the problem of loudspeaker nonlinearity measurements from a systematic standpoint and to understand better their meaning, advantages, and limitations. The traditional and nontraditional methods that will be compared to the methods based on the application of Gaussian noise (incoherence function) and multitone stimulus are reviewed hereafter.

One of the unconventional methods is the measurement and graphing of high-order frequency response functions (HFRFs) derived from the second- and third-order Volterra time-domain kernels. The HFRFs are three-dimensional graphs, representing a “surface” of distortion products. Volterra series expansion stems from the fundamental theoretical input of Volterra [13] and was introduced by Wiener for the analysis of weakly nonlinear systems, characterized by low levels of distortion (see [14] for a history of this subject). Since then Volterra series expansion has been used widely in many areas where the structure of weakly nonlinear systems is not known, or the parametric analysis of their behavior is too complicated. Kaizer pioneered the use of Volterra series in loudspeaker nonlinear analysis. He derived explicit expressions for HFRFs through loudspeaker excursion-dependent parameters [15]. Kaizer’s research was followed by a number of works (for example, [16]–[18]), where the second- and third-order kernels were measured, transformed into corresponding HFRFs, and then plotted as three-dimensional graphs depicting loudspeaker second- and third-order distortions. Advantages and drawbacks of Volterra series expansion will be discussed in Section 3.

Two-tone intermodulation distortion of the second and third order has traditionally been measured to assess nonlinearity in audio equipment since two methods were introduced in the 1940s by Hilliard [19] (SMPTE intermodulation distortion) and Scott [20] (difference-frequency or CCIF distortion). The former uses one sweeping tone and one stationary low-frequency tone of four times higher amplitude. The latter method uses two closely spaced, simultaneously swept tones. Products of the kind P_{f2±f1} and P_{f2±2f1} are plotted. The frequency f1 corresponds to the fixed tone in the SMPTE method or to the lower frequency tone in the CCIF method, and f2 is the frequency of the higher sweeping tone. Neither method measures intermodulation products of order higher than three.

For the measurement of loudspeaker nonlinearity AES standard AES2-1984 [21] recommends a measurement of only the second- and third-order harmonic distortion. IEC standard 60268-5 [22] recommends that a wider set of characteristics be measured, including THD, individual second and third harmonics, and individual second-order difference intermodulation products. In addition this standard recommends the aggregated criteria of sound pressure level (SPL) intermodulation in the form (P_{f2+f1} + P_{f2−f1})/P_{f2} and (P_{f2+2f1} + P_{f2−2f1})/P_{f2}, with f2 >> f1, where the sum and difference products of similar order (second or third only) are summed and related to one of the two primary tones.

Alternative approaches to measuring loudspeaker intermodulation distortion were proposed by Keele [23]. Keele recommends two methods for consideration, one based on the use of two-tone signals, 40 and 400 Hz, of equal amplitude. The percentage of distortion is to be plotted as a function of input power. The other method includes a fixed-frequency upper range signal coupled with a swept bass signal. Keele also advocates the use of the shaped tone burst for the assessment of loudspeaker maximum SPL [24].

These various methods and signals provide different information about the nonlinearity in a measured loudspeaker. Nevertheless, the following questions remain open: “What method conveys the most adequate information about nonlinearity in a measured loudspeaker?”, “How well are the measurement data related to the perceived deterioration of sound quality or to the malfunctioning of a loudspeaker?”, and “How can these data be represented in the most comprehensible manner?”.

This work is intended to illustrate and compare several methods of assessment and graphical presentation of weak nonlinearity in loudspeakers. The comparison is carried out using a nonlinear dynamic model of a low-frequency loudspeaker that includes excursion-dependent parameters: Bl product, suspension stiffness, voice-coil inductance, parainductance, resistance caused by eddy currents, and voice-coil current-dependent magnetic flux modulation. The models of three different woofers are used for comparison: an 8-in (203-mm) diaphragm with a long voice coil, an 8-in (203-mm) diaphragm with a short voice coil, and a 12-in (305-mm) diaphragm with a long voice coil. The measurement results are simulated at different signal levels. A comparison is made of THD, harmonic distortion, Volterra second-order frequency-domain kernels, also called high-order frequency response functions (HOFRF), two-tone sum and difference intermodulation distortion, two-tone total nonlinear distortion, multitone intermodulation and multitone total nonlinear distortion (MTND), and incoherence function.
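As a rough illustration of the kind of post-processing described above, the following sketch aggregates the distortion-only bins of a simulated multitone response into fractional-octave bands to obtain a single frequency-dependent curve. The band weighting and the polynomial nonlinearity are assumptions made here for illustration only; this is not the authors' actual algorithm or loudspeaker model.

```python
# Sketch: turning a multitone distortion spectrum into one overlay-able curve.
# Tone frequencies are snapped to FFT bins so every harmonic and intermodulation
# product also falls exactly on a bin; a 2nd/3rd-order polynomial stands in for the DUT.
import numpy as np

fs, N = 48_000, 2**16
t = np.arange(N) / fs
rng = np.random.default_rng(1)

tone_bins = np.unique(np.round(np.geomspace(40, 8000, 24) * N / fs).astype(int))
x = sum(np.sin(2 * np.pi * (k * fs / N) * t + rng.uniform(0, 2 * np.pi)) for k in tone_bins)
x /= len(tone_bins)

y = x + 0.2 * x**2 + 0.1 * x**3                 # assumed weak static nonlinearity
Y2 = (np.abs(np.fft.rfft(y)) / (N / 2)) ** 2    # power per bin

dist = Y2.copy()
dist[tone_bins] = 0.0                           # drop the fundamentals, keep distortion
f = np.fft.rfftfreq(N, 1 / fs)

edges = 20 * 2 ** (np.arange(0, 31) / 3)        # 1/3-octave band edges, 20 Hz - 20 kHz
centers = np.sqrt(edges[:-1] * edges[1:])
curve_db = [10 * np.log10(np.sum(dist[(f >= lo) & (f < hi)]) + 1e-20)
            for lo, hi in zip(edges[:-1], edges[1:])]
# (centers, curve_db) is a single continuous distortion-vs-frequency curve that reflects
# both the level and the density of the distortion products in each band.
```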
1 TEST SUBJECT, TEST RESULT PRESENTATION, AND GOAL

The possible uncertainty in the objective assessment of nonlinear systems by traditional testing methods stems from the complex nature of nonlinear systems. A linear time-invariant system with a single input and output is fully described by its pulse response (or by its complex transfer function). The output signal can be calculated by the convolution of the input signal with the impulse response (in the time domain), or by multiplication of the input spectrum by the complex transfer function (in the frequency domain). In addition, the relationship between an input and an output signal of a linear system can also be expressed in the form of a linear differential equation, or a linear difference equation if the system is discrete. Simply speaking, a linear system does not add new frequency components to the output signal.

The behavior of a nonlinear system is substantially more complex. Traditional methods used for the analysis of linear systems are not applicable for an analysis of even weakly nonlinear systems. The properties of such systems can be described in the time domain by the sum of Volterra kernels [14]. The latter are essentially the pulse responses responsible for the transformation of the input signal by nonlinearities of different orders. The overall pulse response of a weakly nonlinear system is the sum of kernels of different orders that are multidimensional functions of time. For example, the pulse response of a simple nonlinear system characterized by a second-order dynamic weak nonlinearity is the sum of the first-order kernel (which is essentially the linear pulse response) and a second-order kernel. The latter can be presented graphically as a three-dimensional surface with two horizontal time scales.

The output of such a system can be expressed as the convolution of an input signal with the first- and second-order kernels. This convolution is expressed in general by multiple integrals. The multidimensionality is also valid for a frequency-domain complex transfer function of nonlinear systems. The amplitude and phase frequency responses of a second-order distortion are also three-dimensional surfaces having two horizontal frequency scales. The second harmonic distortion response (amplitude and phase) is merely a diagonal “cut” through these two surfaces. Similarly, the impulse response of the second harmonic is merely a diagonal cut across the surface of the three-dimensional kernel of the second order [1]. It is obvious that neither the frequency response of the second harmonic nor its impulse response will legitimately represent the entire second-order nonlinear response of a weakly nonlinear dynamic system. These cuts may not correspond to the maxima of the distortion surface. Using only harmonic distortion may cause mistakes in the assessment of the nonlinearity. Therefore a search for a correlation between the audibility of nonlinearly distorted musical signals and the level of harmonic distortion can lead to wrong conclusions. The present example considers the three-dimensional representation of the second-order nonlinearity. The responses of the higher order nonlinearities are multidimensional functions. A real dynamic nonlinearity existing in loudspeakers and horn drivers is significantly more complex than this simple example.

Imagine a hypothetical loudspeaker whose amplitude frequency response is presented by only a few samples at a few frequencies. If there is no information about the behavior of the amplitude frequency response between these sparse samples, we cannot make a judgment about its performance. The response of this loudspeaker might be perfectly flat between the available samples; it might as well have a strong irregularity. Similarly, a single frequency response of nonlinear distortion, be it a harmonic or an intermodulation curve, conveys only limited information about the nonlinearity. If there is no information about a surface of nonlinear responses between the available cuts of harmonic or intermodulation frequency responses, the behavior of the nonlinear system cannot be assessed accurately. This statement is valid for loudspeakers and horn drivers, which are complex, dynamic nonlinear systems with many degrees of freedom and whose nonlinear responses depend strongly, and in a complex manner, on the frequency. In amplifiers, for example, the nonlinear characteristics do not exhibit that strong a frequency dependence. Therefore, in their analysis the relationship between harmonic and intermodulation distortions might be more predictable.

The examples with nonlinear distortion in the loudspeakers described in this work will assume weak nonlinearity (distortion products are at least 20–30 dB lower than the fundamental signal). In reality, however, the distortion in loudspeakers and horn drivers can be higher, placing loudspeakers and drivers in the category of strongly nonlinear systems. These systems are characterized by even more sophisticated properties that may include bifurcation and chaotic and stochastic behavior. This class of nonlinear systems will not be considered in the current work.

There is a dilemma in measuring, graphing, and interpreting nonlinear distortion. On the one hand the assessment of nonlinear distortion needs the analysis of much more information than is required to assess a linear system. On the other hand this information should be presented in a simple and comprehensible graphical manner. These two requirements may contradict each other. Furthermore, the graphed data should be pertinent from the standpoint of distortion audibility. The final goal of a loudspeaker nonlinear distortion measurement is to obtain data that convey adequate information about the nonlinearity so that this information can be related unambiguously to the perceived sound quality of a loudspeaker under test, and so that the performance of different loudspeakers can be compared objectively. The measurement data must be “manageable.” In spite of the seeming simplicity of these goals, and a nearly 90-year history of numerous efforts by many researchers (see [1] for a history of the subject), these goals have never been fully achieved.
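For a discrete system, the statement that the output of a weakly nonlinear system is the convolution of the input with the first-order kernel plus a double sum over the second-order kernel can be written in a few lines. The short kernels below are arbitrary illustrative values, not a fitted loudspeaker model.

```python
# Sketch: output of a discrete weakly nonlinear system as a first-order (linear)
# convolution plus a second-order Volterra term over a small 2x2 kernel "surface".
import numpy as np

h1 = np.array([0.9, 0.3, -0.1])                      # first-order (linear) kernel
h2 = 0.05 * np.outer([1.0, 0.5], [1.0, 0.5])         # second-order kernel (assumed values)

def volterra2(x, h1, h2):
    y = np.convolve(x, h1)[: len(x)]                 # linear part
    M = h2.shape[0]
    for n in range(len(x)):                          # quadratic part: double sum over h2
        for k1 in range(M):
            for k2 in range(M):
                if n - k1 >= 0 and n - k2 >= 0:
                    y[n] += h2[k1, k2] * x[n - k1] * x[n - k2]
    return y

x = np.sin(2 * np.pi * np.arange(256) / 16)          # single-tone input
y = volterra2(x, h1, h2)
# the quadratic term adds a dc offset and a second harmonic that the linear part alone lacks
```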
2 PSYCHOACOUSTICAL CONSIDERATIONS

The search for a correlation between an objective measurement of nonlinear distortion and the subjective audibility of nonlinear distortion in audio equipment in general, and in loudspeakers in particular, has always been and remains the Holy Grail of the audio industry. Loudspeaker distortion measurement data related to the perceived sound quality must not only have some readily comprehensible interpretation, but must also be supported psychoacoustically. There must be a credible knowledge relating the graphically presented objective data to the subjectively perceived sound quality. Due to the complex nature of the nonlinearity and the intricacy of the human auditory system’s reaction to a musical signal adversely affected by the nonlinearity, there are no undisputedly credible and commonly recognized thresholds expressed in terms of the traditional nonlinear distortion measures related to the perceived sound quality. The problem is aggravated by the fact that the objective measurement of nonlinearity deals merely with the symptoms of a nonlinear system, that is, with the reactions of a nonlinear system, such as a loudspeaker, to various testing signals. Here we operate with objective categories, such as measured levels, responses, characteristics, and parameters. Meanwhile the subjective assessment of musical signals impaired by the nonlinearity deals with the human psychoacoustical reactions and impressions expressed in a quite different vernacular, such as “acceptable, annoying, pleasant, or irritating.” The objective of a researcher is to build a bridge between these two different domains.

The dynamic reaction of a complex nonlinear system (such as a direct-radiating loudspeaker or a horn driver) to a musical signal cannot be extrapolated from its reaction to a simple testing signal such as a mere sweeping tone. Hence the credible thresholds of subjectively perceived nonlinear distortion expressed in terms of the reaction to simple sinusoidal signals (THD, harmonics, or two-tone intermodulation distortion) may not be valid. More complex signals, such as a random or pseudorandom noise or a multitone stimulus, are believed (by the authors) to be required to search for subjectively relevant thresholds.

The complex properties of the human hearing system, which is a far cry from a mere Fourier frequency analyzer, only add complexity to the problem. The behavior of the hearing system is characterized by many effects described in various publications on psychoacoustics (see [25], for example). The properties of the auditory system most relevant to the subject of this work are the intrinsic nonlinearity of the hearing system and temporal and frequency-domain masking. These effects have been treated in detail in the psychoacoustical literature, and it is not the authors’ goal to replicate these texts. However, it is worth mentioning that the intrinsic nonlinearity of the human hearing system manifests itself at high levels of sound pressure, whereas masking is a general property of the hearing system, “working” at any level of the sound pressure signal.

Masking plays a crucial role in the perception of nonlinear distortion. The crux of masking is a psychoacoustical suppression of a weaker masked signal by a stronger signal, called the masker. The masking may be observed in the time domain in the form of post- and premasking, when a stronger short-term masker “obliterates” a weaker masked signal, even if the latter precedes the masker. Masking may also occur in the frequency domain, where a stronger masker produces a shadow zone around itself. This shadow psychoacoustically suppresses those masked signals whose spectrum components happen to be within the spectrum and below the level of the masking frequency-domain curve. The masking frequency-domain curve produced by a single tone, for example, resembles a triangle. With an increase in the level of the masker, the triangle becomes asymmetrical, with its longer side stretching toward high frequencies [25]. With an increase in the level of the masking tone the level of the masking asymmetrical triangle increases and stretches over a wider frequency range, producing a stronger masking effect above the frequency of the sinusoidal masker rather than below it (Fig. 1). The masker shown in Fig. 1 corresponds to curve a.

The asymmetrical triangular shape of the masking curve explains why the higher order harmonics and intermodulation products are more audible than the lower order ones, which are more prone to being masked. In Fig. 2 the harmonics and intermodulation products, produced by a two-tone signal affected by the static fifth-order nonlinearity, are overlaid with the masking curve produced by the two-tone masker. This also explains why the difference intermodulation products are more likely to fall outside the narrower lower side of the masking curve, which makes them more audible.

Fig. 1. Masking curves corresponding to levels a–e of a sinusoidal tone masker. The masker corresponds to curve a.

Fig. 2. Masking effects produced by two closely spaced fundamental tones (maskers) on their harmonics and intermodulation products.
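The triangular masking pattern of Figs. 1 and 2 can be turned into a crude audibility check. The slopes and offset in the sketch below are rough illustrative values only, chosen so that masking extends farther above the masker than below it; this is not the masking model used by the authors or in [26].

```python
# Sketch: crude frequency-domain masking check around a single sinusoidal masker.
# The triangular pattern is steeper below the masker than above it, which is why
# low-side difference products escape masking more easily than harmonics above it.
import numpy as np

def masking_threshold_db(f, f_masker, L_masker_db,
                         slope_low=27.0, slope_high=12.0, offset_db=10.0):
    """Very simplified triangular masking pattern in dB (slopes in dB per octave)."""
    octaves = np.log2(np.asarray(f, dtype=float) / f_masker)
    slope = np.where(octaves < 0, slope_low, slope_high)
    return L_masker_db - offset_db - slope * np.abs(octaves)

f_masker, L_masker = 1000.0, 90.0
products = {"2nd harmonic (2 kHz)": (2000.0, 55.0),
            "difference IM product (200 Hz)": (200.0, 55.0)}
for name, (f, level) in products.items():
    audible = level > masking_threshold_db(f, f_masker, L_masker)
    print(f"{name}: {'audible' if audible else 'masked'}")
```

With these toy numbers the 200-Hz difference product falls below the steep low-frequency skirt and remains audible, while the equally strong second harmonic sits under the shallow high-frequency skirt and is masked, which is the qualitative point made in the text.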


In addition, the frequencies of the harmonics have a higher probability of coinciding with the overtones of particular musical instruments, and of being masked by these overtones as well. Meanwhile the variety of “dissonant” intermodulation products of various orders do not coincide with the overtones of the musical instruments and, therefore, are more noticeable.

The complexity of nonlinear systems such as loudspeakers and the complexity of the hearing system explain why thresholds of distortion audibility expressed in terms of such plain metrics as harmonic distortion, THD, and two-tone intermodulation distortion strongly depend on the type of musical signal used in the experiments, and why the data obtained by different researchers are often inconsistent. An historical review of the search for a relationship between objective data (expressed in terms of such metrics as THD, harmonic distortion, and two-tone intermodulation distortion) and the subjectively perceived deterioration of the sound quality of reproduced material is given in [1]. Historically many early research works in this area did not have a clear understanding of the complex nature of nonlinear dynamic systems and suffered from a lack of the modern knowledge of the principles of operation of the hearing system. Since then the theory of nonlinear systems and the knowledge of psychoacoustics have progressed enormously. Examples of the use of this progress are the systems of low-bit-rate compression (such as MP3, WMA, ATRAC) that “deceive” the hearing system by deleting significant parts of the signal information without a significant deterioration of the perceived sound quality. These low-bit-rate compression systems, evaluated by standard metrics such as THD or two-tone intermodulation distortion, would have exhibited unacceptable levels of distortion, proving that the standard metrics have no immediate relationship with the perceived sound quality.

Continuing this line of thought, THD, which is the most popular measure of nonlinear distortion in audio, is not a reliable measure of the psychoacoustically meaningful nonlinearity in a loudspeaker. First, it does not add anything to what individual harmonic curves can show. Second, since useful information about harmonics of different orders is not available from THD, its interpretation may result in wrong conclusions about the character of the nonlinearity in a loudspeaker tested. In other words, the same 10% THD of the sound pressure level at a certain level of input voltage might be produced by the dominant second- and third-order nonlinearities in one loudspeaker, or it might include the higher order harmonics in another loudspeaker as well. The difference in the amount and level of intermodulation products and, correspondingly, in the sound quality of these two loudspeakers could be significant. THD alone would not indicate this.

As has been mentioned, the multitone stimulus, whose objective parameters, such as the probability density function, have similarity with a musical signal, seems to be a good candidate for a better testing signal. However, to be objective, there is an important aspect of using multitone stimuli that should be considered here. Currently the interpretation of multitone test results does not have a well-substantiated psychoacoustical support. So far we cannot derive precise judgments about the sound quality of a loudspeaker (that has been tested by a multitone stimulus) from the response to this signal. However, the results of recent research on the correlation between objective measurements and subjectively perceived nonlinearly distorted speech and musical signals [26] prove that for certain kinds of nonlinearity the postprocessed reaction to a multitone stimulus, expressed as a single number dubbed by the authors of that work the distortion score (DS), has a very high correlation with subjectively perceived sound quality. The distortion score is obtained by the summation of the levels of distortion products within the mean equivalent rectangular bandwidth (ERBN) of the auditory filter, which is conceptually similar to the traditional critical bandwidth but differs in numerical values. It is believed that future experiments with multitone stimuli might lead to further positive results in attempts to find a relationship between the objective measurement data and subjectively perceived sound quality.

3 TESTING METHODS AND INTERPRETATION OF MEASUREMENT DATA

3.1 Relationship between Harmonics and Intermodulation Products—Effects Produced by Static Nonlinearity of Different Orders

Measurements of nonlinear distortion using simple excitation signals may not provide adequate information about the nonlinear properties of a device under test. Even considering a simple form of static nonlinearity, some not immediately obvious effects appear. Let a hypothetical static nonlinear system be governed by the simple polynomial expression

y(t) = Σ_{i=0}^{n} h_i z^i(t)     (1)

where z(t) is an input signal, y(t) is an output signal, h0 is the dc distortion component, h1 is the linear gain coefficient, and h2, . . . , hn are the weighting coefficients responsible for the influence of a nonlinearity of a particular order beginning from the second. The coefficients hi in general may have positive or negative signs, and some of them may be zero.

A nonlinearity of this kind might, for example, approximate a loudspeaker suspension in the form of a relationship between the diaphragm displacement x and the force F if the creep effect (the long-term dependence of the compliance on the time of loading) and hysteresis are omitted. Then the coefficients h0, . . ., hn in Eq. (1) represent the suspension compliance. As a loudspeaker operates, nonlinear compliance causes nonlinear displacement, and this effect interacts with other nonlinear phenomena. The overall nonlinearity of loudspeakers is dynamic and more complex than the simple relationship described by Eq. (1). Bearing in mind that this particular example is not a complete representation of the operation of a loudspeaker, we will nevertheless analyze this simple static nonlinearity to illustrate some general effects.
Let us assume that the relationship between the displacement x (output) and the force F (input) is described by the expression x(t) = c1 F(t) − c5 F^5(t), where the coefficients c1 and −c5 represent the nonlinear mechanical compliance. We also assume that the input driving force is sinusoidal and the coefficient c5 is set to c5 = 0.26344 to produce 10% THD in displacement. Fig. 3 shows the dependence of the displacement on the driving force and the spectrum of displacement corresponding to the sinusoidal input. The spectral components of displacement are described by the expression

x(t) = c1 F sin ωt − c5 F^5 sin^5 ωt
     = c1 F sin ωt − c5 F^5 (0.625 sin ωt − 0.3125 sin 3ωt + 0.0625 sin 5ωt).     (2)

The fifth-order “limiting” nonlinearity produces the fifth harmonic (which is quite predictable). It also generates the third harmonic and a spectral component having the same frequency as the input signal. Since the latter spectral component is out of phase with the fundamental tone, it produces the limiting effect of the suspension because it decreases the level of the first harmonic in the displacement compared to the linear one. The fifth harmonic is five times (−14 dB) smaller than the third and ten times (−20 dB) smaller than the spectral component having the same frequency as the input tone.

If the input force F has the form of a two-tone signal, F · 0.5(sin ω1t + sin ω2t), then the output signal (displacement) consists of the linear part c1 F · 0.5(sin ω1t + sin ω2t) and of the distortion products generated by the fifth-order nonlinearity, which include two fifth and two third harmonics, and twelve intermodulation products. In addition, the fifth-order nonlinearity produces two spectral components having frequencies identical to the frequencies of the initial input signals. Fig. 4 depicts the output spectrum. We assume that the amplitude of each tone is half the amplitude of the previous sinusoidal tone to maintain the same maximum level as the single-tone signal,

x(t) = c1 F · 0.5(sin ω1t + sin ω2t) − c5 F^5 · 0.5^5 (sin ω1t + sin ω2t)^5
     = c1 F · 0.5(sin ω1t + sin ω2t)
       − c5 F^5 · 0.03125 [0.0625 sin 5ω1t + 0.0625 sin 5ω2t − 1.5625 sin 3ω1t − 1.5625 sin 3ω2t
       + 6.25 sin ω1t + 6.25 sin ω2t
       − 3.125 sin(2ω1 + ω2)t + 3.125 sin(2ω1 − ω2)t − 3.125 sin(2ω2 + ω1)t + 3.125 sin(2ω2 − ω1)t
       + 0.3125 sin(4ω1 + ω2)t − 0.3125 sin(4ω1 − ω2)t + 0.3125 sin(4ω2 + ω1)t − 0.3125 sin(4ω2 − ω1)t
       + 0.625 sin(3ω1 + 2ω2)t + 0.625 sin(3ω1 − 2ω2)t + 0.625 sin(3ω2 + 2ω1)t + 0.625 sin(3ω2 − 2ω1)t].     (3)

The balance between the fifth and third harmonics becomes significantly different compared to the single-tone excitation. The fifth harmonic turns out to be much lower in amplitude than the third harmonic and all intermodulation products. The difference between the fifth and third harmonics produced by the same fifth-order nonlinearity becomes 28 dB. All twelve intermodulation products are higher in amplitude than the fifth harmonic. If the maximum level of the two-tone signal is chosen equal to the amplitude of the single-tone signal producing 10% THD, the relationship between the harmonics produced by these two signals is as shown in Table 1.

From this observation it might follow that if someone tests only the harmonic distortion in this hypothetical nonlinear suspension, he might come to the conclusion that this suspension is impaired predominantly by third-harmonic distortion and to a lesser degree by fifth-harmonic distortion.

Fig. 3. (a) Dependence of suspension displacement on force; fifth-order approximation. (b) Spectrum of displacement corresponding to fifth-order approximation. Sinusoidal input.

Fig. 4. Spectrum of nonlinear reaction to two-tone input; fifth-order approximation.
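The bracketed coefficients of Eqs. (2) and (3) can be checked numerically by expanding the powered sinusoids with an FFT. The sketch below assumes unit amplitude (F = 1) and omits the c coefficients, so the recovered line amplitudes correspond directly to the trigonometric expansion terms; the two test frequencies are arbitrary values chosen to fall on exact FFT bins.

```python
# Sketch: numerical check of the expansion coefficients used in Eqs. (2) and (3).
import numpy as np

fs, N = 48_000, 48_000          # 1-Hz bin spacing, so integer frequencies are leakage-free
t = np.arange(N) / fs
f1, f2 = 997.0, 1303.0

def line_amp(x, f):
    """Amplitude of the spectral line at frequency f (assumes f falls on an FFT bin)."""
    X = np.fft.rfft(x) / (N / 2)
    return abs(X[int(round(f * N / fs))])

s5 = np.sin(2 * np.pi * f1 * t) ** 5
print(line_amp(s5, f1), line_amp(s5, 3 * f1), line_amp(s5, 5 * f1))
# -> approx. 0.625, 0.3125, 0.0625, as in Eq. (2)

two5 = (0.5 * (np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t))) ** 5
print(line_amp(two5, 2 * f1 - f2), line_amp(two5, 3 * f1 - 2 * f2))
# -> approx. 0.03125 * 3.125 and 0.03125 * 0.625, the intermodulation terms of Eq. (3)
```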


This conclusion might lead to a “picture” of a musical signal contaminated predominantly with third- and slightly with fifth-order harmonic distortion. However, the application of a more complex two-tone signal changes the reaction of this nonlinear system. For example, the fifth harmonic becomes very small and virtually unessential compared to the other distortion components. In its turn, the power of the stronger third harmonic becomes negligible compared to the power of the intermodulation products. This observation can be extrapolated easily to higher order static nonlinearities.

This simple example illustrates the fact that harmonics and intermodulation products are merely symptoms of nonlinearity. They are the reaction of a nonlinear system to a particular input signal. Different inputs produce different symptoms in the same nonlinear system. Since a real musical signal is a set of various spectral components rather than a merely sinusoidal tone, we can assume that the share of harmonics is much less significant than the share of intermodulation products if a musical signal is applied to a static nonlinear system. The relationship between harmonics and intermodulation products in dynamic nonlinear systems will be analyzed in detail in the next sections.

The preceding example illustrated the different reactions of the same static nonlinear system to different signals. Now let us consider the role played by static nonlinearity of different orders. Let the hypothetical nonlinear suspension be approximated by the expression x(t) = c1 F(t) − c3 F^3(t) and let a sinusoidal signal be applied to it,

x(t) = c1 F sin ωt − c3 F^3 sin^3 ωt
     = c1 F sin ωt − c3 F^3 (0.75 sin ωt − 0.25 sin 3ωt).     (4)

Fig. 5 illustrates the dependence of displacement on the driving force and the spectrum of displacement corresponding to sinusoidal input.

A sinusoidal input obviously produces third harmonics. It also produces the spectral component of the “first order,” which has the same frequency as the input signal. If a two-tone signal is applied to the same system, it produces four intermodulation products, two third harmonics, and two terms of the “first order,”

x(t) = c1 F · 0.5(sin ω1t + sin ω2t) − c3 F^3 · 0.5^3 (sin ω1t + sin ω2t)^3
     = c1 F · 0.5(sin ω1t + sin ω2t)
       − c3 F^3 · 0.125 [−0.25 sin 3ω1t − 0.25 sin 3ω2t + 2.25 sin ω1t + 2.25 sin ω2t
       − 0.75 sin(2ω1 + ω2)t + 0.75 sin(2ω1 − ω2)t − 0.75 sin(2ω2 + ω1)t + 0.75 sin(2ω2 − ω1)t].     (5)

If we set the coefficient c3 = 0.30888 to produce 10% THD at the single-tone input, and if we set the maximum level of the two-tone signal equal to the amplitude of a single tone, we obtain the levels of harmonic components listed in Table 2.

If someone compares the measurement results of harmonic distortion produced by the two hypothetical suspensions, he might come to the conclusion that they perform essentially similarly because their THD is equal to 10%, the levels of their third harmonics are close (−21.7 dB versus −22.3 dB), and the fifth harmonic produced by the fifth-order suspension is small and not important. However, the larger number of intermodulation products produced by the fifth-order nonlinearity (which remained beyond the scope of this particular harmonic measurement) might produce a different effect on the perceived sound quality. Fig. 6 illustrates the output spectrum. The spectrum of the distortion products is wider and the density of the spectrum is higher in the fifth-order suspension. It may cause higher perceptibility of nonlinear distortion in the fifth-order system because some distortion products might not be masked by the hearing system.

Table 1
              Single Tone                        Two Tones
Harmonic      1st       3rd        5th           1st       3rd        5th
Level         −1.6 dB   −21.7 dB   −35.7 dB      −7.0 dB   −37.8 dB   −65.8 dB

Fig. 5. (a) Dependence of suspension displacement on force; third-order approximation. (b) Spectrum of displacement corresponding to third-order approximation. Sinusoidal input.


PAPERS COMPARISON OF LOUDSPEAKER NONLINEAR DISTORTION MEASUREMENTS

might produce a different effect on the perceived sound affinity with a musical signal. Truly, the multitone stimu-
quality. Fig. 6 illustrates the output spectrum. The spec- lus is closer to a musical signal than the single-tone stimu-
trum of the distortion products is wider and the density of lus in the crest factor, the spectrum, and the probability
the spectrum is higher in the fifth-order suspension. It may density function. This example illustrates the dominance
cause higher perceptibility of nonlinear distortion in the of intermodulation products revealed by the multicompo-
fifth-order system because some distortion products might nent testing signals. The tendency of intermodulation
not be masked by a hearing system. products to dominate harmonics, illustrated here through
This simple example illustrates a situation when the the use of a multitone signal, can be extrapolated to a musical
measurement of only harmonics does not convey enough signal. More details on this subject can be found in [1].
information about the performance of even a simple static
nonlinear system. It also shows that a higher order static 3.2 THD and Harmonic Distortion
nonlinearity produces a larger number of intermodulation
By observing only harmonic distortion curves we might
products if excited by a similar input signal.
not be able to come to an accurate conclusion about the
Three conclusions follow.
entire nonlinear properties of a loudspeaker under test, and
1) The overall level of harmonics is typically lower than
we cannot predict how the distortion products generated in
the overall level of intermodulation products within the
a musical signal will be masked by the hearing system. In
same nonlinear system, and this difference is stronger in a
addition, harmonic distortion measurements may not re-
system impaired by a higher order nonlinearity.
veal some nonlinear effects at all. A typical example is the
2) A nonlinear system of a higher order being exposed
Doppler distortion in direct-radiating loudspeakers. This
to a complex signal produces more intermodulation prod-
distortion is not revealed by a single tone. At least two
ucts with a wider spectrum. This effect is not revealed by
tones are required.
an analysis of the harmonic distortion.
The “supremacy” of intermodulation distortion may
3) The wider spectrum of intermodulation products
lead to the straightforward but wrong conclusion that har-
might be more noticeable because some of the spectral
monic distortion is irrelevant in any application and may
components would not be masked by the hearing system.
be omitted in measurements of loudspeaker nonlinear dis-
Table 3 shows the reaction of second- and third-order
tortion. However, while not being able to characterize
static nonlinearity to different multitone stimuli. The num-
nonlinearity in its entirety and complexity, and link it to
ber of intermodulation products of static nonlinearity char-
the audibility of signal deterioration, the harmonic distor-
acterized by only second and third orders increases dra-
matically with the number of input testing tones compared
to the number of harmonics. It can also be observed that
the third-order nonlinearity produces the same number of
harmonic products (compared to the second order) but a
significantly larger amount of intermodulation products.
This tendency increases in higher order nonlinearities.
In this example the increase in the number of inter-
modulation products generated stems from the nature of
the testing signal (multitone) that was chosen for some

Table 2

Single Tone Two Tones


Harmonics Harmonics
1st 3rd 1st 3rd
−2.3 dB −22.3 dB −7.7 dB −40.3 dB Fig. 6. Spectrum of nonlinear reaction to two-tone input; third-
order approximation.

Table 3

Second-Order Nonlinearity Third-Order Nonlinearity Overall (Second and Third)


Number
of Initial Number of All Number of Number of All Number of Number of All Number of
Tones IM Products All Harmonics IM Products All Harmonics IM Products All Harmonics
1 0 1 0 1 0 2
2 2 2 4 2 6 4
3 6 3 16 3 22 6
4 12 4 40 4 52 8
5 20 5 80 5 100 10
10 90 10 660 10 750 20
15 210 15 2240 15 2450 30
20 380 20 5320 20 5700 40

J. Audio Eng. Soc., Vol. 52, No. 4, April 2004 339


VOISHVILLO ET AL. PAPERS

tion curves, plotted separately as a function of frequency Fig. 7 illustrates the sensitivity of THD test results to
and level of input signal, provide useful information about variations of the loudspeaker parameters. Fig. 7(a) corre-
loudspeaker under test. For example, a strong level of sponds to input voltage of 10 V applied to a nonlinear
high-order harmonics may be indicative of a rubbing voice dynamic model of a 12-in (305-mm) woofer characterized
coil or the presence of nonlinear breakups of a compres- by a nonlinear Bl product, suspension stiffness, voice-coil
sion driver’s metallic diaphragm and suspension. The re- inductance, and flux modulation. The parameters of the
lationship between harmonics of even and odd orders tells woofer are given in Appendix 1. The modeling was carried
about the symmetry (or the lack of it) in loudspeaker dis- out through numerical solution of a system of nonlinear
placement-dependent parameters. The buildup of high- differential equations describing the behavior of an elec-
order harmonics with an increase in input voltage may be trodynamic loudspeaker (see Appendix 2). Fig. 7(b) shows
indicative of approaching the limit of a spider’s deflection. the THD curve corresponding to the same woofer, but the
When performing harmonic distortion measurements, one flux modulation distortion is omitted. The difference in the
should keep in mind that the harmonics will be accompa- physical properties between the two models is reflected in
nied by an outweighing number of intermodulation prod- the difference in THD curves. It can be observed that the
ucts as soon as the testing tone is replaced by a musical flux modulation distortion affects the THD curve at high
signal. frequencies. It is convenient to overlay THD curves cor-
The THD test can be used legitimately in “passed–not responding to different input levels. Fig. 8 shows SPL
passed” production tests where similar types of loudspeak- THD curves corresponding to an increase of the input
ers are tested. Certainly, THD gives an idea about audibly voltage from 10 to 40 V in 3-dB increments.
noticeable nonlinear distortion if its level is high. A loud- Figs. 9 and 10 show THD curves corresponding to two
speaker having 50% THD in the midrange will hardly be 8-in (203-mm) woofers having different motors. (One has
a source of mellifluous sound. It does not take some other a long 12-mm coil and a 6-mm short gap, the other a short
sophisticated analysis of nonlinearity or fine listening tests 6-mm coil and a long 12-mm gap.) The parameters of the
to figure that out. woofers are given in Appendix 1. The level of the input
signals corresponds to maximum voice-coil displacements
of 4 and 10 mm. The difference in the THD curves of the
small-level signal a is pronounced at low frequencies,
whereas the difference in the large-level signal b is pro-
nounced at frequencies above 80 Hz. Therefore THD gives
an idea of the difference in the objective performance of
two loudspeakers being compared.

3.3 High-Order Frequency Response Functions


(Frequency-Domain Volterra Kernels)
All the foregoing conclusions concerning the deficiency
of information about loudspeaker nonlinearity provided by
harmonic and THD measurements do not mean, however,
that a frequency response of traditionally measured indi-
vidual intermodulation products of the second or third or-
ders will always represent accurately the nonlinear prop-
erties of a loudspeaker in their entire complexity. The
following example will illustrate a situation when neither

Fig. 7. SPL THD of 12-in (305-mm) woofer. Nonlinearity is


produced by Bl product, suspension stiffness, voice-coil induc- Fig. 8. Increase in SPL THD of 12-in (305-mm) woofer corre-
tance, and flux modulation. Input voltage 10 V. (b) Same as (a), sponding to increase in input voltage from 10 to 40 V in 3-dB
but flux modulation omitted. increments.

340 J. Audio Eng. Soc., Vol. 52, No. 4, April 2004


PAPERS COMPARISON OF LOUDSPEAKER NONLINEAR DISTORTION MEASUREMENTS

the second harmonic nor the second-order difference- order sum intermodulation product Pf2+f1. The diagonal line
frequency intermodulation products provides sufficient in- a, characterized by the equal in modulus but opposite in
formation about the distortion of a hypothetical loud- sign frequencies, describes the zero-order harmonic Pf1−f1,
speaker characterized by only a second-order nonlinearity. which is a constant displacement if the HFRF describes
Figs. 11 and 12 show second-order frequency responses the voice-coil excursion. The diagonal line characterized
(frequency-domain Volterra kernels of the second order)
of the same 8-in (203-mm) woofers having different mo-
tors. A typical HFRF is presented in the form of a three-
dimensional “mountain terrain” with two horizontal fre-
quency axes. Figs. 11(a) and 12(a) illustrate the three-
dimensional terrains of the second distortion products,
whereas Figs. 11(b) and 12(b) show the maps of this sur-
face. The vertical axis shows the level of all second-order
distortion frequency responses, including second harmon-
ics, the sum and difference intermodulation products at all
combinations of two frequencies, and the frequency-
dependent constant component (dc or zero harmonic) if it
is excited in a particular nonlinear system [see Figs. 11(a)
and 12(a)].
In this interpretation the Cartesian coordinates of a point
on this map corresponding to one negative and one posi-
tive frequency describe the second-order difference fre-
quency component Pf2−f1, whereas a point with the coor-
dinates belonging to both positive frequencies is a second-

Fig. 9. SPL THD corresponding to voice-coil maximum dis-


placement. Loudspeaker A (long coil, short gap). a—Xmax ⳱ 4
mm; b—Xmax ⳱ 10 mm.

Fig. 11. Sound pressure response of loudspeaker A (long coil,


short gap). (a) Second-order frequency-domain Volterra kernel.
Peak level of input signal corresponds to Xmax ⳱ 4 mm. (b)
Topological view of second-order SPL response. a—second har-
Fig. 10. SPL THD corresponding to voice-coil maximum dis- monic P2f; b—sum intermodulation product Pf2+f1; c—difference
placement. Loudspeaker B (short coil, short gap). a—Xmax ⳱ 4 intermodulation product Pf2−f1. Unique area of kernel is high-
mm; b—Xmax ⳱ 10 mm. lighted.

J. Audio Eng. Soc., Vol. 52, No. 4, April 2004 341


VOISHVILLO ET AL. PAPERS

by the equal values of positive frequencies is a second scribe entire second-order nonlinearity is represented by
harmonic distortion Pf1+f1 (see Figs. 11 and 12). By com- the single second harmonic distortion response. To know
paring the second harmonic distortion cut with the entire the dynamic reaction of the second-order nonlinearity we
surface of the second-order distortion, one may clearly see should also take into account the three-dimensional sur-
what a small share of all the information required to de- face of the second-order phase response and perform a
twofold inverse Fourier transform, which will obtain the
second-order impulse response. Two-dimensional convo-
lution of the input signal with this two-dimensional pulse
response provides the dynamic reaction of the second-
order nonlinearity. It is not hard to imagine how far the
distortion signal may be from the results of harmonic or
THD measurement. Whether or not this dynamic distor-
tion signal is noticed by the hearing system depends on a
number of factors that should be considered in the context
of masking such as the level of the signal, its dynamics,
and the spectral contents.
The concept of Volterra expansion can be formally ex-
tended to higher orders. Unfortunately the third-order non-
linearity needs four-dimensional space for its description
(three frequency scales), which defies simple graphic rep-
resentation. One possible solution to plot the third-order
HFRFs is a cut through one of the three frequency scales
corresponding to the worst case of distortion, and using the
remaining two scales in the three-dimensional graph. Reed
and Hawksford used this approach in [17]. For the higher
orders of nonlinearity the situation becomes even more
desperate, and graphical representation is even less prac-
tical. To make matters worse, with increasing orders of
nonlinearity the volume of calculations required to de-
scribe a Volterra model increases tremendously, making a
practical application impossible. This “curse of dimen-
sionality” is clearly illustrated by an analysis of the ex-
pression for the output signal of a nonlinear system de-
scribed by the first three terms of a Volterra expansion,
t t t

y共t兲 = 兰h 共␶ 兲x共t − ␶兲 d␶ + 兰兰h 共␶ , ␶ 兲x共t − ␶ 兲


0
1 1
0 0
2 1 2 1

t t t

× x共t − ␶2兲 d␶1 d␶2 + 兰兰兰h 共␶ , ␶ , ␶ 兲x共t − ␶ 兲


0 0 0
3 1 2 3 1

× x共t − ␶2兲x共t − ␶3兲 d␶1 d␶2 d␶3. (6)


Here h2(␶1, ␶2) is a second-order kernel, or a two-
dimensional impulse response, which depends on the two
time arguments ␶1 and ␶2. Correspondingly, h3(␶1, ␶2, ␶3)
is a third-order kernel, depending on three time arguments.
If, for example, the analysis of the first-order kernel
(linear impulse response) is carried out on one thousand
samples, the second-order kernel (second-order impulse
response) requires one-quarter million samples to be ana-
lyzed with analogous resolution. The analysis of the third-
order kernel with the same resolution amounts to 375 mil-
lion samples, whereas the fourth-order kernel requires 250
billion samples to keep the same accuracy. In this estima-
Fig. 12. Sound pressure response of loudspeaker B (short coil, tion the property of Volterra kernel symmetry was used
long gap). (a) Second-order frequency-domain Volterra kernel. and the redundant parts of kernels were omitted.
Peak level of input signal corresponds to Xmax ⳱ 4 mm. (b) Unique areas of the second-order kernels presented are
Topological view of second-order SPL response. a—second har-
monic P2 f ; b—sum intermodulation product Pf2+f1; c—difference highlighted in Figs. 11(b) and 12(b). One quarter of all
intermodulation product Pf2−f1. Unique area of kernel is high- possible permutations of the samples of the second-order
lighted. system is required to describe the system. Three-eighths of
342 J. Audio Eng. Soc., Vol. 52, No. 4, April 2004
PAPERS COMPARISON OF LOUDSPEAKER NONLINEAR DISTORTION MEASUREMENTS

all possible permutations of the third-order system’s re- stant, was proposed by Scott [20]. It is called difference
sponse samples is enough to describe it completely, and frequency or CCIF method. If a two-tone signal is applied
one-fourth of all permutations of the fourth-order samples to a second-order nonlinear system, it generates the fol-
[14]. Still, the number of samples to be analyzed increases lowing distortion products: dc component, two different
enormously with an increase in the order of Volterra intermodulation spectral components (sum and difference
expansion. products), and two second harmonics. Meanwhile the non-
The calculation of HFRFs from multidimensional pulse linear reaction of the third-order nonlinearity to the same
responses needs n-fold Fourier transforms. The volume of two-tone signal will consist of two third-order harmonics,
calculations also increases significantly with an increase in two spectral components having the same frequencies as
the order of nonlinearity, the initial tones but lower amplitudes, and four intermodu-
⬁ lation products. However, it is known from the theory of
H1共i␻1兲 = 兰h 共␶ 兲e
−⬁
1 1
−i共␻1␶1兲
d␶1
nonlinear systems that full description of the third-order
nonlinearity formally needs at least a three-tone signal
[14]. This increases the number of third-order harmonics

to three, and the number of spectral components having
H2共i␻1, i␻2兲 = 兰h 共␶ , ␶ 兲e
−⬁
2 1 2
−i共␻1␶1+␻2␶2兲
d␶1 d␶2 input signal frequencies to three as well. The number of
intermodulation products goes as high as 16.
⬁ It is not practical to plot 16 different frequency re-
H3共i␻1, i␻2, i␻3兲 = 兰h 共␶ , ␶ , ␶ 兲e
3 1 2 3
−i共␻1␶1+␻2␶2+␻3␶3兲 sponses of intermodulation products. Traditionally only
the products Pf2±2f1 are analyzed, omitting components of
−⬁ (7)
d␶1 d␶2 d␶3 the type Pf3±f2±f1, which are also generated by the third-
order nonlinearity. Hence the measurement of individual
⭈ intermodulation products of the second and third orders

⭈ gives limited information about the third-order nonlinear-
ity if a two-tone signal is used. With regard to higher order

兰h 共␶ , . . . , ␶ 兲e −i共␻1␶1+⭈⭈⭈+␻n␶n兲
nonlinearity, the standardized two-tone intermodulation
Hn共i␻1, . . . , i␻n兲 = n 1 n methods supply limited information as well.
−⬁ Plotting all four “conventional” intermodulation curves
d␶1 . . . d␶n. (Pf2±f1 and Pf2±2f1) on a single graph still produces a picture
In addition, the Volterra series expansion has a fundamen- that is difficult to comprehend and interpret. An integrated
tal constraint stemming from the assumption that there is criterion in the form of total intermodulation distortion
no energy exchange between nonlinear products of differ- (TIMD) is a simplifying solution, leading to fewer fre-
ent orders. This constraint confines the application of quency responses of intermodulation distortion. For ex-
Volterra series expansions to only weakly nonlinear sys- ample, standard IEC 60268-5 [22] determines: “The
tems. Attempting to use Volterra expansions for a nonlin- modulation distortion (MD) of the nth order shall be speci-
ear system with a strong nonlinearity causes divergence of fied as the ratio of the arithmetic sum of the r.m.s. values
the Volterra series. of the sound pressures due to distortion components at
This simple example illustrates why Volterra expan- frequencies f2 ± (n − 1)f1 to the r.m.s. value of the sound
sions of orders higher than three are practically never used. pressure Pf2 due to the signal f2.” The total intermodulation
It precludes Volterra expansion from handling strong and coefficient of the second order according to IEC 60268-5 is
high-order nonlinearities. The measurement of Volterra
HFRFs can be carried out using special signals, such as Pf2−f1 + Pf2+f1
d2 = × 100%. (8)
maximum-length sequences (MLS) [27], multitone stimuli Pf2
[28], and Gaussian noise [29]. There are methods provid-
ing a direct calculation of HFRFs from NARMAX output The total intermodulation coefficient of the third order is
data [30]. Straightforward methods operating with a varia-
tion of two or three sinusoidal signals are not practical Pf2−2f1 + Pf2+2f1
d3 = × 100%. (9)
because of the measurement time burden. Pf2

3.4 Two-Tone Intermodulation Frequency The frequencies f1 and f2 satisfy the condition f2 >> f1, and
Responses the ratio of the amplitudes of the input signal is specified
Measuring the frequency responses of intermodulation by the user. The standard gives no recommendation re-
products by using a two-tone signal has nearly as old a garding the measurement of intermodulation and harmonic
history as measuring harmonic distortion and THD. Two products having orders higher than three. This omission
methods have been used predominantly in the audio in- was probably due to practical concerns. Without calling
dustry. One, proposed by Hilliard, uses one fixed low- into question the validity of the standard’s recommenda-
frequency tone and one sweeping tone. The method was tions, the authors do not exclude situations when measur-
adopted by SMPTE [19]. It is often called the modulation ing higher order harmonic and intermodulation products
method. The second method, using two sweeping tones might be useful in the assessment of audio equipment
and keeping the frequency difference between them con- performance.
J. Audio Eng. Soc., Vol. 52, No. 4, April 2004 343
VOISHVILLO ET AL. PAPERS

As was mentioned in Section 2, higher order nonlinear tone intermodulation distortion (TIMD) is specified by the
products may be more detrimental to the sound quality authors as

冑兺
compared to the lower order products. This effect was
N
recognized a long time ago (see, for example, [31]), and
modern research confirms it [1], [32]. The importance of p2i 共 f 兲
i=1
higher order distortion products is somewhat twofold. dTIMD共f 兲 = × 100% (10)
First, high-order nonlinearity produces a very large num- Pf
ber of intermodulation products, whose number and en- where
ergy increase dramatically with an increasing input signal
level. Second, higher order products are usually spread p1共 f 兲 = Pf2+f1, p2共 f 兲 = Pf2−f1, p3共 f 兲 = Pf2+2f1,
over a wide frequency range, which results in weaker psy- p4共 f 兲 = Pf2−2f1, p5共 f 兲 = Pf2+3f1, p6共 f 兲 = Pf2−3f1,
choacoustic masking of these distortion products [1]. In p7共 f 兲 = Pf2+4f1, p8共 f 兲 = Pf2−4f1, . . .
the wake of it, the authors developed an alternative way to pn−1共 f 兲 = Pf2+mf1, pn = Pf2−mf1
formulate two-tone intermodulation distortion characteris-
tics. Figs. 13 and 14 show the two-tone intermodulation are the amplitudes of the intermodulation products, Pf ⳱
distortion curves of the same two 8-in (203-mm) woofers. Pf1 ⳱ Pf2 is the amplitude of either one of the fundamental
The distortion curves correspond to similar 10-mm maxi- tones, f1 is the fixed low-frequency tone, and f2 is the
mum voice-coil displacement. The difference between higher frequency sweeping tone.
these intermodulation distortion curves of the two loud- In this approach the excitation signal consists of two
speakers is significantly more pronounced when compared tones having equal amplitude. One of these two tones is
with the THD curves of the same woofers. The intermodu- swept across the frequency range. The level of distortion,
lation distortion curves presented in Figs. 13 and 14 are calculated according to Eq. (10), is plotted at the fre-
calculated differently from traditional intermodulation co- quency of the sweeping tone. The authors attempted to
efficients recommended by existing standards. The two- extend the recommendations of IEC 60268-5 [22] to the

Fig. 13. Two-tone intermodulation distortion (IEC 60268-5); Fig. 14. Two-tone intermodulation distortion (IEC 60268-5);
loudspeaker A (long coil, short gap). a—(Pf2−f1 + Pf2+f1)/Pf1; loudspeaker B (short coil, long gap). a—(Pf2−f1 + Pf2+f1)/Pf1;
b—(Pf2−2f1 + Pf2+2f1)/Pf1. (a) Voice-coil maximum displacement b—(Pf2−2f1 + Pf2+2f1)/Pf1. (a) Voice-coil maximum displacement
Xmax ⳱ 4 mm. (b) Voice-coil maximum displacement Xmax ⳱ Xmax ⳱ 4 mm. (b) Voice-coil maximum displacement Xmax ⳱
10 mm. 10 mm.

344 J. Audio Eng. Soc., Vol. 52, No. 4, April 2004


PAPERS COMPARISON OF LOUDSPEAKER NONLINEAR DISTORTION MEASUREMENTS

measurement of two-tone intermodulation distortion in the total harmonic and intermodulation distortion is extended
form of the intermodulation products having frequencies to a larger number of input tones. The ten-tone total har-
(f2 ± f1) and (f2 ± 2f1). The latter are the products of the monic distortion coefficient takes into account the har-
interaction between the first and second harmonics; the monics of all input tones. The level of this distortion,
first tone with the first harmonic of the second tone. The similar to the TTIMD, is related to one of the fundamental
authors merely extended the set of harmonics of the first tones. This simplifies the comparison of the distortions
signal to higher orders (3f1, 4f1), which led to intermodu- evaluated by these two criteria.
lation terms of the kind (f2 ± 3f1), (f2 ± 4f1). The inter-
modulation terms corresponding to the interaction be- 3.5 Multitone Stimulus
tween the harmonics of the first tone and the harmonics of
The possible circumvention of the partial “blindness” of
the second tone, such as (2f2 ± 2f1), (3f2 ± 2f1), (2f2 ± 3f1),
the conventional two-tone intermodulation tests is not to
(3f2 ± 3f1), etc., were omitted. The authors do not claim the
plot continuous frequency responses of the corresponding
ultimate validity of this approach. Including all the inter-
intermodulation products, but rather to show the full dis-
modulation products produced by two-tone excitation
crete spectra of all nonlinear products corresponding to
would probably provide more accurate results.
particular frequencies and levels of the two test tones. By
An alternative method to measure intermodulation dis-
extending this idea to a larger number of excitation tones,
tortion has been used by Keele [23], [24]. His test signal,
we naturally arrive at the concept of the multitone signal.
consisting of two tones, 40 Hz and 400 Hz, of equal am-
Indeed, if we obtain and graph the spectrum of a nonlinear
plitude, is applied to a loudspeaker, the input level is in-
reaction to the two-tone signal, which, as it has been
creased, and the intermodulation is measured and plotted
shown, gives limited information even about the third-
as a function of the input level. Such a test is a simple way
order nonlinearity, let alone the higher orders, why not use
to evaluate the intermodulation of the midrange output of
as many tones as it takes to detect all conceivable higher
a loudspeaker by a simultaneous bass signal.
In the current work Keele’s general approach to mea-
suring intermodulation distortion versus input level was
simulated using two different criteria. The first, dTTIMD,
includes all N measurable output intermodulation prod-
ucts; the second, dTTHD, takes into account only M har-
monic distortion products produced by two primary tones.
Here TTIMD stands for two-tone total intermodulation
distortion and TTHD designates two-tone total harmonic
distortion

dTTIMD =
冑兺 N

i=1
P共i兲IM
2

× 100% (11)
Pf

dTTHD =
冑兺 M

k=1
P共k兲H2
× 100% (12)
Pf

where P(i)IM is the amplitude of an intermodulation prod-


uct corresponding to the ith frequency, P(k)H is the am-
plitude of the harmonic corresponding to the kth fre-
quency, and N and M are the number of intermodulation
and harmonic products, respectively. To avoid the over-
lapping of the fundamentals and the distortion products,
the frequencies of the primary tones were chosen as f1 ⳱
fs and f2 ⳱ 5.5fs with frequency fs being the resonance
frequency of the loudspeaker.
Figure 15 shows the graphs of two-tone total intermodu-
lation and harmonic distortion of two 8-in (203-mm)
woofers as a function of the input level. The frequencies of
the tones are 57 and 313.5 Hz. The harmonic distortion of Fig. 15. Two-tone total nonlinear distortion as a function of input
the loudspeaker with the long voice coil a prevails at the voltage. a—intermodulation products; b—harmonic products. (a)
Loudspeaker A (long coil, short gap.). f1 ⳱ fs ⳱ 57 Hz; f2 ⳱
lower level of the input signal. However, this effect is not 5.5 ⭈ f1 ⳱ 313.5 Hz. Umax ⳱ 29 V corresponds to Xmax ⳱ 10 mm.
observed when the two-tone signal is replaced by a ten- (b) Loudspeaker B (short coil, long gap). f1 ⳱ fs ⳱ 62 Hz; f2 ⳱
tone input signal (Fig. 16). Here the concept of two-tone 5.5 ⭈ f1 ⳱ 341 Hz. Umax ⳱ 31 V corresponds to Xmax ⳱ 10 mm.

J. Audio Eng. Soc., Vol. 52, No. 4, April 2004 345


VOISHVILLO ET AL. PAPERS

order intermodulation products, cover the entire frequency Usually the multitone signal is generated according to
range of measurement, and have a signal statistically much the simple rule
closer to the real musical signal than a single-tone or two-
N

兺A sin 共␻ t + ␸ 兲.
tone signal?
This idea can be extended to a tone in every FFT fre- x共t兲 = i i i (13)
i=1
quency bin. This abundance of tones turns the multitone
stimulus into a noiselike signal. Truly, noise signals are A strong advantage of the multitone stimulus is a short
used widely in the identification and analysis of nonlinear measurement time and the ability to reveal simultaneously
systems as, for example, in the measurement of the coher- a set of visible harmonic and intermodulation products. In
ence function. However, once the FFT is applied to the this capacity the multitone signal is beyond competition
output signal of a nonlinear system excited by such noise- with other signals. Multitone testing handles high-order
like signals, all individual distortion spectral components nonlinearity, and its use is not hampered by the existence
are obscured by the fundamental tones and become invis- of such effects as hard limiting, hysteresis, and dead zone.
ible on a graph. Meanwhile the multitone signal, produc- Also, the multitone stimulus can be used in applications
ing a “sparse” and discrete spectrum at the output of a where the loudspeaker short-term performance must be
nonlinear system, makes the majority of distortion prod- evaluated, such as the maximum SPL. Comparing a mul-
ucts visible on a graph. At the same time the multitone titone burst with a tone burst, the advantages of the former
signal is rather close to noise and musical signals in the become obvious. After the time-domain reaction of the
probability density function, bandwidth, and crest factor. loudspeaker to the tone burst of a particular frequency has
The multitone stimulus fills the gap between the noise- been received and preprocessed to skip the transients and
based methods of nonlinear identification and measure- then put through the Fourier transform, only harmonic
ments, and the traditional standardized methods using one distortion and THD become available. The distortion (har-
or two stationary or swept (stepped) tones. monic or THD) corresponds to only a single excitation
frequency. To cover the whole frequency range of interest,
these measurements have to be repeated at different fre-
quencies. Taking into account the number of measure-
ments needed to cover the entire frequency range with a
decent resolution, the overall measurement time may be
significant. Meanwhile, by applying the multitone burst,
only one measurement is needed and, in addition, the mul-
titone signal obtains more information about the nonlin-
earity in a loudspeaker under test. Furthermore, a multi-
tone burst’s crest factor can be “tuned” by adjusting the
phases of individual spectral components.
However, the interpretation of a nonlinear reaction to a
multitone stimulus may be arduous if the number of gen-
erated nonlinear products of different orders is substantial.
A multitone stimulus gives such “abundant” spectral in-
formation about nonlinearities that it is difficult to com-
prehend at first sight. (Truly, a second look at the reaction
to the multitone stimulus may not be helpful either when
one has to analyze hundreds if not thousands of distortion
spectral components.) In addition, an engineer has no in-
formation on how a particular pattern of nonlinear reac-
tions to multitone stimuli is related to the perceived sound
quality. Moreover, the spectrum of reactions to multitone
stimuli is not convenient to overlay and compare, espe-
cially if the responses to several input levels are to be
observed. There are several possible ways to overcome
this impediment. One is to distinguish the products of
different orders by postprocessing and to plot them either
separately or in different colors on the same graph. An-
other solution is to plot the averaged value of all distor-
tions located between two adjacent tones, as it is done in
[10] and in the FASTTEST multitone measurement [8].
Fig. 16. Ten-tone total nonlinear distortion as a function of input This approach permits a simple graphical representation of
voltage. a—intermodulation products; b—harmonic products. nonlinear distortions at different levels of input signal.
Logarithmic frequency distribution in frequency range fs to 5.5 fs.
(a) Loudspeaker A (long coil, short gap). Umax ⳱ 7.3 V corre- Distinguishing different intermodulation and harmonic
sponds to Xmax ⳱ 10 mm. (b) Loudspeaker B (short coil, long products of different orders is comparatively easy when
gap). Umax ⳱ 7.8 V corresponds to Xmax ⳱ 10 mm. the number of initial tones is reasonably low (less than ten,
346 J. Audio Eng. Soc., Vol. 52, No. 4, April 2004
PAPERS COMPARISON OF LOUDSPEAKER NONLINEAR DISTORTION MEASUREMENTS

for example). With an increasing number of input tones This expression for the MTND characteristic uses a rect-
overlapping of different frequencies occurs, and the prob- angular window and the weighting coefficient 1/K, where
lem of separation becomes much more difficult, but not a K is the number of distortion products in the rectangular
theoretically impossible task. The separation can be car- window. This way to formulate dMTND(f) gives values that
ried out by a discrete progressive increase of the input are too low if the number of distortion products in a cur-
level accompanied by an analysis of the rate of increase of rent window is significant but the level of them is not high.
each distortion product in a particular frequency bin in This statement is purely empirical and bears no relation-
such a way that the evaluated spectral components (in- ship to subjective sensations. An alternative way might be
cluding possible overlapped ones) measured at different to omit the weighting coefficient 1/K entirely. In this case,
levels of input signal form a so-called polynomial Van- however, the level of dMTND(f) may become dispropor-
dermonde matrix [33]. Corresponding mathematical ma- tionally high if the number of distortion products corre-
nipulations with this matrix, which remain beyond the sponding to a particular position of the rectangular win-
scope of this work, provide a separation of the overlapped dow is high, even if their amplitude is low. These practical
spectral components and make it possible to evaluate the considerations mean that when the level of an MTND
level and phase of each, disregarding the fact that they curve is much lower or much higher than the level of
overlap. This approach is described, for example, in [28], distortion spectral components, the graph of distortion
where a multitone signal is used in the identification of looks unnatural. This is merely the authors’ subjective
weakly nonlinear systems and the measurement of Volt- point of view derived from numerous modeling and mea-
erra kernels. surement experiments.
An alternative method to represent the results of multi- Fig. 17 shows the spectrum of the SPL reaction to
tone testing is the averaging of distortion products in a the input multitone stimulus of the same two 8-in (203-
“sliding window.” The spectral components are averaged mm) woofers. The solid curves correspond to MTND
in a window (such an rectangular or Hanning), and the calculated according to Eq. (15). Fig. 18 shows the
averaged value of the distortion products is plotted at the reaction to multitone stimuli of the 12-in (305-mm)
frequency corresponding to the center of the window [34]. woofer with and without flux modulation distortion. The
Afterward the window is shifted one frequency bin “up”
and the process is repeated. Ultimately it provides a con-
tinuous frequency response of the distortion products,
which encapsulates all harmonics and a variety of inter-
modulation products generated by a particular loudspeaker
at a particular level of multitone stimuli, and a particular
distribution of primary tones. It has been dubbed the mul-
titone total nonlinear distortion (MTND). One of the pos-
sible ways to calculate MTND, where the Hanning win-
dow is used, is presented in the expression
dMTND共 fi兲 =

20 log 冉冑 兺 再 冋 冉 冊 册 冎 冒 冊
KⲐ2

k=i−K Ⲑ 2
Dk cos
␲| fi − fk|
⌬f
+1
1
2
2

p0

共dB SPL兲 (14)


where ⌬f is the width of the frequency window, fi is the
window center frequency, Dk is the amplitude of a sound
pressure distortion product (Pa) at the frequency fk , and K
is the number of spectral components; p0 ⳱ 2 × 10−5 Pa.
The frequency window consists of K frequency bins.
The window is essentially a weighting function that has a
maximum at the frequency fi corresponding to the center
of the window. This way of formulating multitone distor-
tion was chosen experimentally. The resulting frequency-
dependent function dMTNDf) looks like an envelope of the
distortion products.
There are other possible ways to express multitone total
nonlinear distortion. For example, dMTND(f) can be ex-
pressed as

dMTND共 f1兲 = 20 log 冉冑 1



KⲐ2

D2
Kk=i−K Ⲑ 2 k 冒冊
p0 共dB SPL兲.
Fig. 17. Sound pressure reaction to multitone stimulus. Peak
level of input signal corresponds to Xmax ⳱ 10 mm. (a) Loud-
speaker A (long coil, short gap). (b) Loudspeaker B (short coil,
(15) long gap).

J. Audio Eng. Soc., Vol. 52, No. 4, April 2004 347


VOISHVILLO ET AL. PAPERS

solid curves correspond to MTND calculated according to There is a current impediment to the widespread use of
Eq. (14). multitone stimuli for measuring nonlinearity in loudspeak-
The presentation of the reaction of a device impaired by ers. This is the ambiguity of the nonlinear reaction of a
nonlinearity to multitone stimuli in the form of an aver- particular device to a multitone signal. A different distri-
aged curve (MTND) makes it easy to overlay different bution and a different number of tones produce different
curves belonging, for example, to different levels of input reactions. In theory all these responses belong to the same
signals or to different loudspeakers. Fig. 19 shows two over- multidimensional space of nonlinear reactions; however,
laid MTND curves, indicating that the flux modulation pro- for an observer these responses look different. This com-
duces distortion in the upper part of the frequency range. plicates the comparison of responses measured using
The frequency response of the MTND curve can be different distributions and number of tones. So the cur-
expressed in dB SPL [Eqs. (14) and (15)] as well as in the rent disadvantage of using multitone stimuli is the lack
percentage of the fundamental frequency response, of a common agreement regarding the number of tones,
their distribution, and the initial phases. To avoid this
dMTND共 fi兲

冉冑 兺 再 冋 冉 冊 册 冎 冒 冊
problem the number and distribution of tones should be
KⲐ2 2 standardized.
␲| fi − fk| 1 There are many methods of forming the frequency dis-
= Dk cos +1 A共 fi兲
k=i−K Ⲑ 2 ⌬f 2 tribution of multitone fundamentals. The major goal of
× 100共%兲 (16) some of the frequency distributions of primary tones (dif-
ferent from the evenly distributed tones on a logarithmic
where A(fi) is the amplitude of the frequency response of frequency scale) is to minimize the overlapping of primary
a loudspeaker at the frequency fi. tones and distortion components [7], [12].
Fig. 20 shows MTND responses of the 12-in (305-mm) As was mentioned, the separation becomes increasingly
woofer calculated according to Eq. (16) and at different difficult with an increasing order of the distortion products
levels of the input signal in 3-dB increments. due to the effect of overlapping. The separation can be

Fig. 19. SPL MTND corresponding to 12-in (305-mm) woofer. U


⳱ 3.3 V; Xmax ⳱ 2.5 mm. a—flux modulation taken into ac-
count; b—flux modulation omitted.

Fig. 18. Multitone reaction and MTND (solid curve) of 12-in


(305-mm) woofer. (a) Nonlinearity is produced by Bl product,
suspension stiffness, voice-coil inductance, and flux modulation. Fig. 20. MTND of 12-in (305-mm) woofer corresponding to
Input voltage U ⳱ 3.3 V; Xmax ⳱ 2.5 mm. (b) Same as (a), but increasing level of input voltage. Umin ⳱ 3.3 V; voltage incre-
flux modulation not taken into account. ments 3 dB.

348 J. Audio Eng. Soc., Vol. 52, No. 4, April 2004


PAPERS COMPARISON OF LOUDSPEAKER NONLINEAR DISTORTION MEASUREMENTS

handled through the use of the polynomial Vandermonde while this method has found an application in the assess-
matrix [28], [33], which is not a trivial procedure. The ment of nonlinearity in hearing aids [3], [4]. The coher-
separation of low-order and high-order distortion compo- ence function gives an integral lumped measure of the
nents can be performed easily by, for example, a two-tone nonlinearity in a device under test, but it also takes into
signal. This is important for the detection of loudspeaker account noise if it is presented in a device under test. In
defects (rub and buzz) separate from regular motor and loudspeaker testing the presence of noise does not seem to
suspension nonlinearities. The simplicity of the distorted be an impeding factor. The attractive feature of the coher-
two-tone signal allows one to “understand” the relation- ence function is its simple graphic representation. Plotting
ship between some of the loudspeaker nonlinear param- the set of coherence functions corresponding to different
eters (causes) and nonlinear distortion (symptoms). input levels is another nice option.
It seems to be more convenient to present the coherence
3.6 Coherence and Incoherence Functions function in the following manner and call it the incoher-
The next method that deserves discussion is the mea- ence function,
surement of the coherence function that characterizes the
degree of linear relationship between input and output as a I共 f 兲 = 公1 − ␥2共 f 兲 × 100共%兲. (23)
function of frequency. By definition, the coherence func- Expressed in percent, the incoherence function I(f) is in-
tion is expressed as the ratio of the square of the cross tuitively close to the concept of nonlinear distortion. Zero
spectrum (between input and output) to the product of the incoherence indicates the absence of nonlinear distortion
autospectra of input and output [35], and noise. There is a seeming similarity between the in-
|Gxy共 fi兲|2 coherence function and THD. However, there is a princi-
␥2共 fi兲 = (17) pal difference between these two characteristics. THD
Gxx共 fi兲Gyy共 fi兲
takes only harmonics into account, whereas the incoher-
where Gxx(fi) is the autospectrum of the input signal x(t) at ence function is sensitive to the overall nonlinear contami-
the frequency fi, Gyy(fi) is the autospectrum of the output nation of the output signal and noise.
signal y(t), and Gxy(fi) is the cross spectrum of the input Fig. 21 shows the incoherence function of the 12-in
signal x(t) and the output signal y(t). (305-mm) woofer corresponding to different levels of the
The functions Gxx(fi), Gyy(fi), and Gxy(fi) are calculated input signal. The initial level of the noise signal was 0.6 V
as follows: rms. This level produced a voice-coil peak displacement of
2.5 mm. The same peak displacement corresponded to 10
N


1 V rms set for the measurement of THD SPL (see Fig. 8),
Gxx共 fi兲 = E 关X共 fi兲 X*共 fi兲兴 = lim Xn共 fi兲 X*n 共 fi兲 (18) and to 3.3 V rms for the multitone measurement (see Fig.
N→⬁ N n=1
20). This difference in initial rms levels is attributed to
N
different crest factors of these signals. The incoherence

1
Gyy共 fi兲 = E 关Y共 fi兲 Y*共 fi兲兴 = lim Yn共 fi兲 Y*n 共 fi兲 (19) function, THD, and MTND each show a different pattern
N→⬁ N n=1
of nonlinear distortion. Due to the different nature of these
N three methods, they produce different data, all related to

1
Gxy共 fi兲 = E 关X共 fi兲 Y*共 fi兲兴 = lim Xn共 fi兲 Y*n 共 fi兲 (20) the same particular nonlinearity. This example demon-
N→⬁ N n=1
strates the complexity of assessment of nonlinear effects
where * denotes complex conjugation, E indicates aver- and the nontrivial reactions of a nonlinear system to dif-
aging, and X( f ) and Y( f ) are the complex spectra of the ferent testing signals. Fig. 22 shows the difference be-
input and output signals x(t) and y(t), respectively, tween the incoherence functions of two 8-in (203-mm)
woofers corresponding to voice-coil displacement of 4 and


X共 f 兲 = F兵x共t兲其 = x共t兲e−j2␲ft dt (21)
−⬁

Y共 f 兲 = F兵y共t兲其 = 兰

y共t兲e−j2␲ft dt (22)
−⬁

with F{⭈} being the Fourier transform.


In a strictly linear noiseless system the coherence func-
tion ␥2(f) equals unity at all frequencies. To the contrary,
if the input x(t) and the output y(t) have no relation to each
other, the coherence function is zero. If the coherence
function has a value between 0 and 1, the system under test
may either be impaired by the nonlinear distortion or the
noise, or both, or the output y(t) depends on some other
input processes along with x(t). Therefore the coherence
function may be used as a measure of nonlinearity in a
device under test. Fig. 21. Sound pressure incoherence function of 12-in (305-mm)
Historically use of the coherence function has never woofer corresponding to increasing level of input voltage. Umin
been immensely popular in loudspeaker testing. Mean- ⳱ 0.6 V; voltage increments 3 dB.

J. Audio Eng. Soc., Vol. 52, No. 4, April 2004 349


VOISHVILLO ET AL. PAPERS

10 mm. An increase in the nonlinear distortion corre- tortion measures related to the perceived sound quality.
sponding to the increasing input signal can be observed. Since the dynamic reaction of a complex nonlinear system
The incoherence function was calculated by means of such as a loudspeaker cannot be extrapolated from its re-
the noise signal generated as a multitone signal with 4096 action to simple testing signals, such as a sweeping tone,
frequency components of equal amplitude and the random the thresholds expressed in terms of the loudspeaker reac-
distribution of phases. The sampling frequency was 7680 tion to these signals (THD, harmonics, and two-tone in-
Hz. The crest factor of this signal is 5.9. To adjust the termodulation distortion) may not be valid.
properties of this noise signal for the numerical integration The requirements for an optimal method of measuring
of the system of nonlinear differential equations governing nonlinear distortion in loudspeakers were formulated. The
the operation of a loudspeaker, an adaptive algorithm was optimal method to measure the nonlinearity in loudspeak-
used to provide the initial zero value of the testing signal. ers must be informative, that is, it must obtain enough
In the given examples the incoherence function resulted objective information about the nonlinearity of different
from 1000 averages, which would correspond to approxi- orders. The plotted measurement results must have a clear
mately 500 seconds of testing time. During this time the interpretation and be readily comprehensible. The mea-
warming of the voice coil would change the behavior of surement data must be supported psychoacoustically,
the loudspeaker significantly if the driver were operated at meaning that there should exist an unambiguous relation-
high amplitudes. This is a drawback of this technique. ship between the results presented and the expected sound
quality.
In nonlinear systems such as a loudspeaker, the inter-
4 CONCLUSION
modulation distortion outweighs the harmonic distortion if
Due to the complex nature of loudspeaker nonlinearity a musical signal is reproduced. Harmonics may not give a
and the intricacy of the human auditory system’s reaction quantitative measure of the nonlinear distortion in a loud-
to musical signals contaminated with nonlinear distortion speaker, especially in the context of nonlinear distortion
products, there are no undisputedly credible and com- audibility. Nevertheless, the harmonic distortion measure-
monly recognized thresholds of traditional nonlinear dis- ment provides valuable information, illustrating, for ex-
ample, the dominance of the nonlinearity of certain orders.
A wide spectrum of harmonics and a strong level of high-
order harmonics may be indicative of a loudspeaker mal-
function such as a rubbing voice coil.
It has been demonstrated that high orders of static non-
linearity are characterized by a significant difference be-
tween the harmonic and intermodulation products that out-
number the harmonics and outweigh them in power. It has
also been demonstrated that a high-order nonlinearity pro-
duces intermodulation and harmonic products of its “own”
order, and of lower orders as well. The latter might have
higher levels. Drawing the conclusion that a certain high-
order nonlinearity is not essential because it produces a
low level of its “own” harmonics may lead to wrong
results.
THD does not seem to be a good measure of psycho-
acoustically meaningful distortion in loudspeakers. Not
distinguishing different orders of harmonics, the THD fre-
quency response may lead to the wrong conclusions about
the performance of a loudspeaker. Similar levels of THD
may correspond to very different distributions of harmon-
ics of different orders. This difference, invisible to THD,
may correspond to a strong diversity in intermodulation
products and correspondingly significant differences in
sound quality. However, THD can be legitimately used in
testing where similar types of loudspeaker are compared
(for example, in production testing).
Multitone testing possesses a number of advantages
compared to other methods. It is fast and gives a detailed
graphical representation of the distortion products. When a
large number of input tones are applied to a loudspeaker,
the spectrum of the output signal becomes very rich with
Fig. 22. Sound pressure incoherence function. a—peak level of
input signal corresponds to Xmax ⳱ 4 mm; b—peak level of input intermodulation products (harmonic products have only a
signal corresponds to Xmax ⳱ 10 mm. (a) Loudspeaker A (long minuscule share of these spectral components). A visual
coil, short gap). (b) Loudspeaker B (short coil, long gap). examination of such a spectrum, though, may be difficult.
350 J. Audio Eng. Soc., Vol. 52, No. 4, April 2004
PAPERS COMPARISON OF LOUDSPEAKER NONLINEAR DISTORTION MEASUREMENTS

To circumvent this problem, the spectral components of Of all the methods surveyed and simulated, multitone
different orders can be plotted separately. The separation testing seems to be the most feasible in the context of
of products of different orders needs postprocessing. distortion audibility for the assessment of loudspeaker
Another option to simplify the visual interpretation of large-signal performance and nonlinearity measurements.
measurement results is to plot the average level of dis- Nevertheless, harmonic and the traditional two-tone inter-
tortion confined within adjacent tones or to plot the level modulation distortion should not be withdrawn from the
of the distortion products averaged in a sweeping fre- list of standard characteristics. THD is a lower resolution
quency window. The comparatively high crest factor of measure of nonlinearity, but can still be used for the com-
multitone signals may give a “pessimistic” evaluation of parison of loudspeakers of the same type. Multitone test-
the distortion level, registering the low-probability high- ing is good for both intermodulation distortion measure-
level peaks that may not be psychoacoustically relevant ments and the maximum SPL check. For the latter the
when a testing signal is replaced by a musical one. More multitone burst should be used. In addition, multitone test-
research is required to put a reliable bridge between a loudspeaker's response to multitone stimuli and the sound quality of the loudspeaker. The results of recently published psychoacoustical research of a correlation between the responses to multitone stimuli and the audibility of distortion [26] imply that such a goal might possibly be reached.

The incoherence functions of two 8-in (203-mm) woofers were modeled at two different levels of the input noise signal. In addition, the incoherence function of a 12-in (305-mm) woofer was modeled for different levels of input signal. The incoherence function detected the difference in performance of the two motors, showing an increase in the overall nonlinear distortion for a loudspeaker having stronger voice-coil inductance modulation and stronger dependence of the Bl product on the voice-coil displacement.

There is a significant difference between THD, incoherence function, and reaction to multitone stimuli. All three methods provide an "integral" assessment of nonlinear distortion. However, the information conveyed by these methods is principally different. THD characterizes only harmonic distortion, omitting the intermodulation products, which significantly outweigh harmonics in a distorted musical signal. The incoherence function expressed in percent may be interpreted as a measure of the "lack of similarity" between the reference and the output signal. Contrary to THD, the incoherence function takes into account all nonlinear transformations of the signal as well as the influence of noise. However, this function does not distinguish the products of different orders, giving a "lumped" integral measure. The multitone stimulus provides information about harmonic and intermodulation products of various orders, but does it in a more diversified manner, making it possible to distinguish and analyze individual nonlinear products of different orders. The MTND response simplifies the interpretation of the nonlinear reaction to multitone stimuli by merging the numerous individual distortion spectral components into a single frequency response of distortion.

Measurement of the frequency-domain Volterra kernels is also discussed. Plotting these three-dimensional graphs of distortion of the second and third order is only feasible if a loudspeaker is characterized by a small level of distortion (weak nonlinearity). This method quickly loses its accuracy if the level of distortion is high. High-order Volterra kernels do not have a readily comprehensible graphical representation. [...]ing is good for loudspeaker quality control testing.

Setting any boundaries relating objective information and nonlinear distortion audibility requires extensive computer simulation and involved psychoacoustical tests. Without such information about the relationship between objective and subjective parameters, the measurement data will only be able to tell us that one loudspeaker has more or less nonlinear distortion. The question of how critical this difference is from the standpoint of distortion audibility will remain unanswered.

5 REFERENCES

[1] E. Czerwinski, A. Voishvillo, S. Alexandrov, and A. Terekhov, "Multitone Testing of Sound System Components—Some Results and Conclusions, Part 1: History and Theory," J. Audio Eng. Soc., vol. 49, pp. 1011–1048 (2001 Nov.); "Multitone Testing of Sound System Components—Some Results and Conclusions, Part 2: Modeling and Application," ibid., pp. 1181–1192 (2001 Dec.).
[2] N. Wiener, Nonlinear Problems in Random Theory (Technology Press, M.I.T., and Wiley, New York, 1958).
[3] O. Dyrlund, "Characterization of Nonlinear Distortion in Hearing Aids Using Coherence Function," Scand. Audiol., vol. 18, pp. 143–148 (1989).
[4] J. Kates, "On Using Coherence to Measure Distortion in Hearing Aids," J. Acoust. Soc. Am., vol. 91, pt. 1, pp. 2236–2244 (1992 Apr.).
[5] Y. Cho, S. Kim, E. Hixson, and E. Powers, "A Digital Technique to Estimate Second-Order Distortion Using Higher Order Coherence Spectra," IEEE Trans. Signal Process., vol. 40, pp. 1029–1040 (1992 May).
[6] U. Totzek and D. Preis, "How to Measure and Interpret Coherence Loss in Magnetic Recording," J. Audio Eng. Soc., vol. 35, pp. 869–887 (1987 Nov.).
[7] D. Jensen and G. Sokolich, "Spectral Contamination Measurements," presented at the 85th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 36, p. 1034 (1988 Dec.), preprint 2725.
[8] R. C. Cabot, "Fast Response and Distortion Testing," presented at the 90th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 39, p. 385 (1991 May), preprint 3045.
[9] R. Metzler, "Test and Calibration Application of Multitone Signals," in Proc. AES 11th Int. Conf. (1992 May), pp. 29–36.
[10] J. Vanderkooy and S. G. Norcross, "Multitone Testing of Audio Systems," presented at the 101st Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 44, p. 1174 (1996 Dec.), preprint 4378.
[11] P. Schweizer, "Feasibility of Audio Performance Using Multitones," in Proc. AES UK Conf. on the Measure of Audio (1997 Apr.), pp. 34–40.
[12] J. M. Risch, "A New Class of In-Band Multitone Test Signals," presented at the 105th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 46, p. 1037 (1998 Nov.), preprint 4803.
[13] V. Volterra, Theory of Functionals and of Integral and Integrodifferential Equations (Dover, New York, 1959).
[14] M. Schetzen, Volterra and Wiener Theories of Nonlinear Systems (Krieger Publ., Malabar, FL, 1989).
[15] A. M. Kaizer, "Modeling of the Nonlinear Response of an Electrodynamic Loudspeaker by a Volterra Series Expansion," J. Audio Eng. Soc., vol. 35, pp. 421–433 (1987 June).
[16] M. J. Reed and M. O. Hawksford, "Practical Modeling of Nonlinear Audio Systems Using the Volterra Series," presented at the 100th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 44, pp. 649–650 (1996 July/Aug.), preprint 4264.
[17] M. J. Reed and M. O. J. Hawksford, "Comparison of Audio System Nonlinear Performance in Volterra Space," presented at the 103rd Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 45, p. 1026 (1997 Nov.), preprint 4606.
[18] G. Cibelli, E. Ugolotti, and A. Bellini, "Dynamic Measurements of Low-Frequency Loudspeakers Modeled by Volterra Series," presented at the 106th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 47, p. 534 (1999 June), preprint 4968.
[19] J. Hilliard, "Distortion Tests by the Intermodulation Method," Proc. IRE, vol. 29, pp. 614–620 (1941 Dec.).
[20] H. Scott, "The Measurement of Audio Distortion," Communications, pp. 25–32, 52–56 (1946 Apr.).
[21] AES2-1984 (r. 2003), "AES Recommended Practice—Specification of Loudspeaker Components Used in Professional Audio and Sound Reinforcement," Audio Engineering Society, New York (2003).
[22] IEC 60268-5, "Sound System Equipment—Part 5: Loudspeakers," International Electrotechnical Commission, Geneva, Switzerland (2000).
[23] D. Keele, "Method to Measure Intermodulation Distortion in Loudspeakers," proposals for the working group SC-04-03-C, Audio Engineering Society, New York (2000).
[24] D. B. Keele, "Development of Test Signals for the EIA-426-B Loudspeaker Power-Rating Compact Disk," presented at the 111th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 49, p. 1224 (2001 Dec.), convention paper 5451.
[25] E. Zwicker, H. Fastl, and H. Frater, Psychoacoustics: Facts and Models, Springer Ser. in Information Sciences, 2nd updated ed. (Springer, New York, 1999).
[26] C. T. Tan, B. C. J. Moore, and N. Zacharov, "The Effect of Nonlinear Distortion on the Perceived Quality of Music and Speech Signals," J. Audio Eng. Soc., vol. 51, pp. 1012–1031 (2003 Nov.).
[27] M. Reed and M. Hawksford, "Identification of Discrete Volterra Series Using Maximum Length Sequences," IEEE Proc. Circuits Dev. Sys., vol. 143, pp. 241–248 (1996 Oct.).
[28] S. Boyd, Y. Tang, and L. Chua, "Measuring Volterra Kernels," IEEE Trans. Circuits Sys., vol. CAS-30, pp. 571–577 (1983 Aug.).
[29] R. Nowak and B. Van Veen, "Random and Pseudorandom Inputs for Volterra Filter Identification," IEEE Trans. Signal Process., vol. 42, pp. 2124–2135 (1994 Aug.).
[30] H. K. Jang and K. J. Kim, "Identification of Loudspeaker Nonlinearities Using the NARMAX Modeling Technique," J. Audio Eng. Soc., vol. 42, pp. 50–59 (1994 Jan./Feb.).
[31] D. Shorter, "The Influence of High Order Products on Nonlinear Distortion," Electron. Eng., vol. 22, pp. 152–153 (1950).
[32] E. Geddes, Audio Transducers (Geddlee, 2002), pp. 236–241.
[33] G. Golub and C. Van Loan, Matrix Computations (Johns Hopkins University Press, Baltimore, MD, 1996).
[34] A. Voishvillo, "Nonlinear Distortion in Professional Sound Systems—From Voice Coil to the Listener," presented at the Acoustical Conference of the Institute of Acoustics "Reproduced Sound 17" (Stratford-upon-Avon, UK, 2001 Nov. 16–18).
[35] J. Bendat and A. Piersol, Random Data: Analysis and Measurement Procedures (Wiley, New York, 1986).

APPENDIX 1
REFERENCE LOUDSPEAKERS

A1.1 12-in (305-mm) Woofer
The parameters of an experimental 12-in woofer were obtained from measurements by the Klippel analyzer. In addition, force factor modulation by the voice-coil current was modeled using FEM (Fig. 23).

Fig. 23. Bl product affected by flux modulation. Experimental 12-in (305-mm) woofer with overhung voice coil. Coil diameter 75 mm; coil height 38 mm; top plate thickness 15 mm.
The small-signal (rest-position) parameters of the 12-in woofer used in this work are given in Table 4. The length of the voice coil is 38 mm, the diameter is 75 mm, and the thickness of the top plate is 15 mm. The excursion-dependent parameters Cms, Kms, Bl, and Le are shown in Fig. 24. Distortions were simulated for the woofer placed in a sealed 40-liter box.

Table 4. Small-signal (rest-position) parameters of the 12-in (305-mm) woofer.

Bl     20.5   T·m
Kms    6.55   N/mm
Cms    0.15   mm/N
mms    232    g
Rms    2.82   kg/s
Re     5.22   Ω
R2     10.4   Ω
Le     2.46   mH
L2     1.28   mH
fs     26.8   Hz
Qes    0.48
Qms    13.8
Qts    0.46
Vas    335    dm³

Fig. 24. Excursion-dependent parameters of 12-in (305-mm) woofer. (a) Suspension compliance. (b) Suspension stiffness. (c) Bl product. (d) Voice-coil inductance.

A1.2 Two 8-in (203-mm) Woofers
The two 8-in woofers used in the experiments and modeling have similar suspensions and different motors (Fig. 25). One loudspeaker has a long coil (12 mm) and a short gap (6 mm); the other has a short coil (6 mm) and a long gap (12 mm) (Fig. 26). The diameter of both coils is 1.5 in (38 mm).

Fig. 25. Reference 8-in (203-mm) woofers used in experiments and modeling. (a) Loudspeaker A [long coil (12 mm), short gap (6 mm)]. (b) Loudspeaker B [short coil (6 mm), long gap (12 mm)].

Fig. 26. Rest positions of voice coils in gap. (a) Loudspeaker A (long coil, short gap). (b) Loudspeaker B (short coil, long gap).
Neither loudspeaker had a dust cap, to prevent any possible artifacts caused by compression of the air underneath a dust cap or distortion due to turbulent airflow in a pole-piece vent. The small-signal (rest-position) parameters of the loudspeakers are listed in Table 5.

The nonlinear displacement-dependent parameters for loudspeaker A are given in Fig. 27, those for loudspeaker B in Fig. 28. Loudspeaker A (long coil, short gap) has stronger overall variations of the Bl product and voice-coil inductance. Using the criterion of the maximum displacement Xmax corresponding to a decrease in the suspension compliance Cms(x) to 0.12 mm/N, which is 30% of its initial value of 0.41 mm/N, the Xmax values of both loudspeakers were set to 10 mm. At this displacement the Bl product of loudspeaker A is 2.0 T·m, which is 22% of its initial value of 9.0 T·m. The Bl product of the second driver drops to 3.5 T·m, which is 47% of its rest-position value of 7.5 T·m. Such a comparatively moderate decrease in the Bl product for loudspeaker B is explained by the use of an underhung voice coil.

Using similar suspensions in both loudspeakers and setting identical values of Xmax = 10 mm made it possible to compare the difference in nonlinear distortion in these loudspeakers caused by the difference in motor parameters.

Table 5. Small-signal (rest-position) parameters of the two 8-in (203-mm) woofers.

Parameter      A (long coil, short gap)   B (short coil, long gap)
Bl (T·m)       9.0                        7.5
Kms (N/mm)     2.44                       2.44
Cms (mm/N)     0.41                       0.41
mms (g)        19.2                       16.0
Rms (kg/s)     1.5                        1.5
Re (Ω)         5.5                        5.3
R2 (Ω)         5.4                        2.4
Le (mH)        0.72                       0.56
L2 (mH)        0.38                       0.29
fs (Hz)        57                         62
Qes            0.46                       0.60
Qms            4.6                        4.2
Qts            0.42                       0.52
Vas (dm³)      33                         33

Fig. 27. Parameters of loudspeaker A (long coil, short gap) as a function of voice-coil displacement. (a) Suspension compliance. (b)
Suspension stiffness. (c) Bl product; current increments 2 A. (d) Voice-coil inductance.

APPENDIX 2
LOUDSPEAKER NONLINEAR MODEL

The nonlinear behavior of the reference loudspeakers was researched numerically using the model described by a system of two nonlinear differential equations,

U = R_e i + \frac{d\Phi_1(x, t)}{dt} + \frac{d\Phi_2(x, t)}{dt} + [Bl(x) + \Delta Bl(x, i)]\frac{dx}{dt}    (24)

[Bl(x) + \Delta Bl(x, i)] i - \frac{dL_e(x)}{dx}\frac{i^2}{2} - \frac{dL_2(x)}{dx}\frac{i_2^2}{2} = m_{ms}\frac{d^2 x}{dt^2} + R_{ms}\frac{dx}{dt} + x K_{ms}(x)    (25)

where U is the input voltage, i is the voice-coil current, Re is the voice-coil resistance, x is the voice-coil excursion, Φ1(x, t) is the alternating magnetic flux related to the voice-coil inductance Le(x), Φ2(x, t) is the alternating flux related to the parainductance L2(x), Bl(x) is the force product, ΔBl(x, i) is the function responsible for the modulation of the flux, induction, and Bl(x) product of the gap, i2 is the current through the parainductance L2(x), mms is the moving mass of the diaphragm and voice coil, Rms denotes the mechanical losses in the suspension, and Kms(x) is the suspension stiffness.

The terms \frac{dL_e(x)}{dx}\frac{i^2}{2} and \frac{dL_2(x)}{dx}\frac{i_2^2}{2} are the reluctance forces produced by the voice-coil inductance Le(x) and the parainductance L2(x). The time derivatives of the alternating fluxes Φ1(x, t) and Φ2(x, t) are expressed as

\frac{d\Phi_1(x, t)}{dt} = L_e(x)\frac{di}{dt} + i\frac{dL_e(x)}{dx}\frac{dx}{dt}    (26)

\frac{d\Phi_2(x, t)}{dt} = L_2(x)\frac{di_2}{dt} + i_2\frac{dL_2(x)}{dx}\frac{dx}{dt}.    (27)

The sound pressure response was calculated by the simple far-field half-space expression

p(t) = \frac{dy(t)}{dt}\,\frac{S_{eff}\,\rho}{2\pi r}    (28)

where y(t) is the voice-coil velocity, dy(t)/dt is the voice-coil acceleration, Seff is the diaphragm effective area, ρ is the air density, and r is the distance from the diaphragm to the observation point.

Fig. 28. Parameters of loudspeaker B (short coil, long gap) as a function of voice-coil displacement. (a) Suspension compliance. (b)
Suspension stiffness. (c) Bl product; current increments 2 A. (d) Voice-coil inductance.

The system of Eqs. (24), (25) was transformed into the canonical Cauchy form and solved numerically by the classical Runge–Kutta method of the fourth order. The vector form of the state variables is

Z(t) = \{z_1(t), z_2(t), z_3(t), z_4(t)\}^T    (29)

\Phi(t, Z) = \{\varphi_1(t, Z), \varphi_2(t, Z), \varphi_3(t, Z), \varphi_4(t, Z)\}    (30)

where z1(t) denotes the current i2(t) in the parainductance L2(x), z2(t) stands for the voice-coil current i, z3(t) accounts for the voice-coil displacement x(t), z4(t) is the voice-coil velocity dx(t)/dt, and Φ(t, Z) is the vector formulation of the derivative of the current di2(t)/dt, the derivative of the voice-coil current di(t)/dt, the voice-coil velocity y(t), and the voice-coil acceleration dy(t)/dt.

The Cauchy vector form of the system of Eqs. (24), (25) and the initial conditions are

\frac{dZ(t)}{dt} = \Phi[t, Z(t)], \qquad Z(t_0) = Z_0.    (31)

The integration steps are carried out according to the following algorithm:

Z_{n+1} = Z_n + h K_n    (32)

K_n = \frac{1}{6}\left(K_n^{(1)} + 2K_n^{(2)} + 2K_n^{(3)} + K_n^{(4)}\right)    (33)

and

K_n^{(1)} = \Phi(t_n, Z_n)
K_n^{(2)} = \Phi\!\left(t_n + \frac{h}{2},\; Z_n + \frac{h}{2}K_n^{(1)}\right)
K_n^{(3)} = \Phi\!\left(t_n + \frac{h}{2},\; Z_n + \frac{h}{2}K_n^{(2)}\right)    (34)
K_n^{(4)} = \Phi(t_n + h,\; Z_n + hK_n^{(3)})

where h is the time increment. The vector

\frac{dZ(t)}{dt} = \left[\frac{dZ_1(t)}{dt}, \frac{dZ_2(t)}{dt}, \frac{dZ_3(t)}{dt}, \frac{dZ_4(t)}{dt}\right]^T

is expressed through the loudspeaker parameters as

\frac{dZ_1(t)}{dt} = \frac{di_2}{dt} = \frac{1}{L_2(x)}\left\{R_2(x)\,i - i_2\left[R_2(x) - \frac{dL_2(x)}{dx}\frac{dx}{dt}\right]\right\}

\frac{dZ_2(t)}{dt} = \frac{di}{dt} = \frac{1}{L_e(x)}\left[U - iR_e - i\frac{dL_e(x)}{dx}\frac{dx}{dt} - L_2(x)\frac{di_2}{dt} - i_2\frac{dL_2(x)}{dx}\frac{dx}{dt} - [Bl(x) + \Delta Bl(x, i)]\frac{dx}{dt}\right]    (35)

\frac{dZ_3(t)}{dt} = \frac{dx}{dt} = y

\frac{dZ_4(t)}{dt} = \frac{dy}{dt} = \frac{1}{m_{ms}}\left\{[Bl(x) + \Delta Bl(x, i)]\,i - \frac{dL_e(x)}{dx}\frac{i^2}{2} - \frac{dL_2(x)}{dx}\frac{i_2^2}{2} - R_{ms}y - K_{ms}(x)\,x\right\}.

The function ΔBl(x, i) was calculated by the finite-element method (FEM). The FEM static model of the magnet assembly was built and the model of the voice coil was incorporated. The voice-coil model was ascribed the geometrical dimensions, number of turns, and constant current. Using the quasi-dynamic approach, that is, assigning different values of voice-coil current (of positive and negative polarity), the distribution of the gap induction was calculated. This procedure was repeated a number of times for different positions of the voice coil. Afterward the Bl product was calculated for the corresponding discrete values of the voice-coil current ±Ia, ±Ib, …, ±Im and the positions of the voice coil ±X1, ±X2, …, ±Xn. The function ΔBl(x, i) approximated the variation of the Bl product caused by the voice-coil current.

The loudspeaker parameters were measured by the Klippel analyzer and incorporated into the model. Integration of the system of Eqs. (24), (25) was performed using different input signals to model different measurement conditions. The signal duration and sampling frequency were optimized for a particular signal. The sampling was linked to the time interval h used in the Runge–Kutta solution of the system (24), (25). The details of the solution are not discussed here because they do not have direct relation to the subject of this paper.
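For readers who want to reproduce this kind of time-domain simulation, the following Python sketch integrates the same four-component state vector with the classical fourth-order Runge–Kutta step of Eqs. (32)–(34). It is a minimal illustration rather than the authors' code: the nonlinear curves Bl(x), Kms(x), Le(x), L2(x), R2(x) and the flux-modulation term ΔBl(x, i) are replaced by assumed placeholder shapes loosely based on Table 5, and the drive level, frequency, and step size are arbitrary choices made only for this example.

import numpy as np

# Small-signal values of loudspeaker A (Table 5); the nonlinear curves below
# are assumed placeholder shapes for illustration, not the measured Klippel fits.
Re, Rms, mms = 5.5, 1.5, 19.2e-3           # ohm, kg/s, kg
Le0, L20, R20 = 0.72e-3, 0.38e-3, 5.4      # H, H, ohm

def Bl(x):      return 9.0 * max(0.0, 1.0 - (x / 0.012) ** 2)   # T*m, assumed roll-off
def dBl(x, i):  return 0.0                                       # flux modulation neglected here
def Kms(x):     return 2440.0 * (1.0 + (x / 0.010) ** 2)         # N/m, assumed stiffening
def Le(x):      return Le0
def L2(x):      return L20
def R2(x):      return R20
def dLe_dx(x):  return 0.0
def dL2_dx(x):  return 0.0

f0 = 30.0                                   # drive frequency, Hz (assumed)
def u(t):       return 10.0 * np.sin(2 * np.pi * f0 * t)         # drive voltage, V

def phi(t, z):
    """State derivatives of Eq. (35): z = [i2, i, x, y]."""
    i2, i, x, y = z
    di2 = (R2(x) * i - i2 * (R2(x) - dL2_dx(x) * y)) / L2(x)
    di  = (u(t) - i * Re - i * dLe_dx(x) * y - L2(x) * di2
           - i2 * dL2_dx(x) * y - (Bl(x) + dBl(x, i)) * y) / Le(x)
    dy  = ((Bl(x) + dBl(x, i)) * i - dLe_dx(x) * i ** 2 / 2
           - dL2_dx(x) * i2 ** 2 / 2 - Rms * y - Kms(x) * x) / mms
    return np.array([di2, di, y, dy])

def rk4_step(t, z, h):
    """Classical fourth-order Runge-Kutta step, Eqs. (32)-(34)."""
    k1 = phi(t, z)
    k2 = phi(t + h / 2, z + h / 2 * k1)
    k3 = phi(t + h / 2, z + h / 2 * k2)
    k4 = phi(t + h, z + h * k3)
    return z + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

h, z = 1.0 / 192_000, np.zeros(4)
xs = []
for step in range(int(0.2 / h)):            # 0.2 s of drive
    z = rk4_step(step * h, z, h)
    xs.append(z[2])
print("peak excursion: %.2f mm" % (1e3 * max(abs(min(xs)), max(xs))))

Harmonic and intermodulation spectra can then be obtained by applying an FFT to the simulated acceleration, in the spirit of Eq. (28), once real fitted parameter curves are substituted for the placeholders.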

THE AUTHORS

A. Voishvillo A. Terekhov E. Czerwinski S. Alexandrov

Alexander Voishvillo was born and raised in Leningrad ing a multichannel sound spectrum analyzer for jet en-
(now Saint Petersburg), Russia. He received a Ph.D. de- gines. At his next job, with Bendex Pacific, he designed
gree in 1987 for work centered on computer optimization the first of its kind 10 000-watt germanium solid-state
of loudspeaker crossover systems. amplifier for sonar transducers.
From 1977 to 1995 he worked at the Laboratory of His passion for music determined his future life and
Prospective Research and Development, Popov Institute career. Coming from a musically deprived childhood, he
for Radio and Acoustics, Saint Petersburg. While at the was exposed to a live orchestral sound as a young man,
Popov Institute he designed loudspeaker systems for which affected him profoundly. He wanted to be able to
manufacturers and did research work on loudspeakers. He replicate this experience at his time of choosing. While at
was responsible for the development of specialized studio Bendex he established Vega Associates, which designed
monitors for the Russian Broadcasting Corporation. In and manufactured custom high-fidelity systems. In 1973 it
1995 he moved to California, accepting an invitation of became Cerwin-Vega, Inc. He headed the company for the
Gene Czerwinski of Cerwin-Vega Inc., to head a new next three decades and developed the first quarter-kilowatt
research and development group. His responsibilities in- transistor audio amplifier and four-way 18-in loudspeaker
cluded the development of new transducers, as well as system. In 1964 he started experimenting with live sound
research work on nonlinearity in sound systems and ad- reinforcement for large-venue rock concerts. He worked
vanced methods of measurement of nonlinear distortion in with Universal Studios to develop the Sensurround sys-
audio equipment. He continued his collaboration with tem, which received an Academy Award in 1974 for spe-
Gene Czerwinski at Cerwinski Labs, a new R&D com- cial technical achievement in sound. Sensurround was
pany established in 2002, where he has been working on used nationwide in cinemas to realistically simulate vibra-
the development of original professional high-frequency tions for the “Earthquake” movie, and was a precursor to
transducers and on alternative methods of assessing non- the high-impact theater sound systems later developed by
linearity in audio equipment. Lucas Sound and Dolby, Inc. His lifelong interest in music
Dr. Voishvillo holds several U.S. patents on new types motivated Mr. Czerwinski to establish in 1989 a nonprofit
of transducers. He is the author of more than 30 publica- recording company, the MAMA (Musical Archives, Mu-
tions on loudspeakers, including the engineering book on sical Archives) Foundation. The Foundation’s goal is to
loudspeaker theory and design, High Quality Loudspeaker preserve the music of culturally significant artists whose
Systems and Transducers, published in Russia in 1985 as music does not have broad commercial appeal. MAMA
well as several publications in the Journal. He is a mem- has released over 30 state-of-the-art digital recordings of
ber of the Audio Engineering Society and participates in a jazz and big-band music, and garnered critical acclaim,
working group on loudspeaker measurements and model- including six Grammy nominations and two Grammy
ing at the AES Standards Committee. He is also a member Awards. In 2003 he sold Cerwin-Vega and founded Cer-
of the JAES Review Board. winski Laboratories, Inc. At Cerwinski Labs he continues
important research into air-propagation distortion and

multichannel sound reinforcement and reproduction.
Alexander Terekhov was born in Leningrad (now Saint Mr. Czerwinski has authored and coauthored numerous
Petersburg), Russia, in 1952. He received an M.Sc. degree patents on loudspeakers (six patents are currently pending).
in broadcasting and radio communication from the State

University of Telecommunications in 1974.
From 1981 to 1991 he worked as a research associate in Sergei Alexandrov was born in Leningrad (now Saint
the Laboratory of Prospective Acoustic Research and De- Petersburg), Russia. He received an M.Sc. degree in elec-
velopment at the Popov R&D Institute for Radio and trical engineering from Leningrad University in 1979.
Acoustics, Saint Petersburg. His activities included re- From 1969 to 1978 he worked as a development engi-
search on loudspeaker testing and measurements and bin- neer at the Marine Equipment R&D Institute. From 1978
aural stereophony. In 1991 he joined Audion Ltd., Saint to 1991 he held a principal position in the R&D group at
Petersburg, as a senior research associate and chief engi- Popov R&D Institute for Radio and Acoustics, Saint Pe-
neer. He came to the United States in 1996 and since 1997 tersburg, where he developed power amplifiers and audio
has been employed as an acoustic research engineer at signal processors for high-fidelity and studio applications.
Cerwin-Vega, Inc., and subsequently at Czerwinski Labo- In 1991 he cofounded and became president and CEO of
ratories. His professional activity is being focused on Audion Ltd., Saint Petersburg, an audio electronics and
acoustic research, the development of new software for loudspeaker test equipment manufacturing company. In
various R&D needs, including loudspeaker measurement 1996 he came to the United States and held a staff position
systems, and research on air propagation distortion. as an acoustic R&D engineer at Cerwin-Vega, Inc., until
Mr. Terekhov has presented technical papers at Russian 2002. There he developed original computer-controlled
audio conventions and in recent years coauthored several loudspeaker measurement systems, loudspeaker power
papers on air propagation distortion published in the Journal. testing devices, and a computer-based system for nonlin-
ear distortion audibility research. He continued his col-

laboration with Gene Czerwinski (the founder of Cerwin-
Eugene Czerwinski studied electrical engineering at the Vega) at Cerwinski Labs, where he has been working on
University of Toledo, where he received a B.S.E.E. degree. loudspeaker magnet system field optimization and non-
He worked as an associate professor of electronics at standard test equipment design.
the University of Michigan Engineering Research Insti- Mr. Alexandrov has presented a number of technical
tute. Later he was a development engineer at Willys Mo- papers at Russian audio and electronics conventions. He
tors Electronic Division, where he designed UHF TV has publications on homomorphic signal analysis, binaural
transmitting equipment, studio cameras, and audio sys- stereophony, audio signal processing, and loudspeaker
tems. In 1954 he joined the test division of Douglas Air- measurements to his credit and coauthored several publi-
craft, where he designed measurement equipment, includ- cations in the Journal.

ENGINEERING REPORTS

Impedance Compensation Networks for the Lossy Voice-Coil Inductance of Loudspeaker Drivers*

W. MARSHALL LEACH, JR., AES Fellow

Georgia Institute of Technology, School of Electrical and Computer Engineering, Atlanta, GA 30332-0250, USA

Two simple Zobel impedance compensation networks for the lossy voice-coil inductance
of a loudspeaker driver are described. Design equations for the element values are given, and
a numerical example is presented. The synthesis procedure can be extended to realize general
RC networks which exhibit an impedance that decreases with frequency at a rate of −n
dec/dec, where 0 < n < 1.

*Manuscript received 2003 January 24; revised 2003 November 11.

0 INTRODUCTION

A two-terminal network that is connected in series or in parallel with a circuit to cause its terminal impedance to be transformed into a desired impedance is commonly called a Zobel [1] network. In loudspeaker design a Zobel network consisting of a series resistor and capacitor connected in parallel with the voice coil of a driver has been described to compensate for the impedance rise at high frequencies caused by the voice-coil inductance [2]. If the inductance is lossless, the network can be designed so that the effective high-frequency impedance is resistive. By maintaining a resistive load on the crossover network, its performance is improved. However, the voice-coil inductance of the typical loudspeaker driver is not lossless. In this case a Zobel network consisting of one resistor and one capacitor can be used to obtain a resistive input impedance at only one frequency in the high-frequency range where the voice-coil inductance dominates.

In this engineering report two Zobel networks are described, one consisting of two resistors and two capacitors and the other consisting of three resistors and three capacitors. Each can be designed to compensate for the lossy voice-coil inductance of a driver. It is shown that the networks can be designed to approximate the desired impedance in an "equal-ripple" sense. Although the approximation can be improved with the use of more elements, it is shown by example that the simpler four-element network can give excellent results with a typical driver. The effects of this network on the responses of second-order and third-order low-pass crossover networks for a specific driver are presented.

At low frequencies the voice-coil impedance is dominated by its motional impedance. For infinite-baffle systems the low-frequency impedance exhibits a peak at the fundamental resonance frequency of the driver. In [3] a modification of the circuit proposed in [2] is described which provides an additional compensation for this impedance peak. The circuit is also applicable to closed-box systems. Although the present report concerns impedance compensation at the high frequencies where the voice-coil inductance dominates, the low-frequency compensation circuit proposed in [3] is reviewed. In addition, a modification of this circuit for vented-box systems is given.

It is assumed that the loudspeaker driver is operated in its small-signal range. Otherwise the voice-coil inductance becomes a time-varying nonlinear function, its value varying with diaphragm displacement. This would preclude a linear circuit analysis and make it impossible to derive the compensation networks.

The impedance approximation technique presented here has been used in the design of filters that convert white noise into pink noise. These circuits exhibit a gain slope of −3 dB per octave over the audio band. Example circuit diagrams of such filters can be found in [4], [5], and [6], but no design equations are given. In [5] the network is described as one in which "the zeros of one stage partially cancel the poles of the next stage." In [7] a similar network is described to realize an operational-amplifier circuit which exhibits a gain slope of +4.6 dB per octave over the audio band. The authors stated that the network component values were selected with the aid of a software optimization routine to match the desired slope. An analytical solution is given here for the design of such networks.

The general impedance compensation theorem described by Zobel can be succinctly summarized as follows. Given an impedance Z1 = R0 + Z0, let an impedance Z′1 =
R0 + Z′0 be connected in parallel with Z1. The condition that the parallel connection have a resistive impedance equal to R0 is that Z′1 = R0²/Z1. This is a general result that is not specific to loudspeakers. For completeness, its derivation is given in the following, where the notation used is that for the voice-coil impedance.

1 IMPEDANCE COMPENSATION CONDITION

The voice coil of a loudspeaker driver exhibits both a series resistance and an inductance. In the following it is assumed that the resistance is separated and treated as a separate element, that is, not a part of the voice-coil inductance. Fig. 1 shows the voice-coil equivalent circuit of a driver in an infinite baffle [8]. The resistor RE and the inductor LE represent the voice-coil resistance and inductance. The elements RES, LCES, and CMES model the motional impedance generated when the voice coil moves. These elements are related to the small-signal parameters of the driver by the equations [9]

R_{ES} = \frac{Q_{MS}}{Q_{ES}} R_E    (1)

L_{CES} = \frac{R_E}{2\pi f_S Q_{ES}}    (2)

C_{MES} = \frac{Q_{ES}}{2\pi f_S R_E}    (3)

where QMS is the mechanical quality factor, QES is the electrical quality factor, and fS is the fundamental resonance frequency.

Above the fundamental resonance frequency, the capacitor CMES becomes a short circuit and the voice-coil impedance can be approximated by RE in series with LE. The equivalent high-frequency circuit is shown in Fig. 2(a). A resistor R1 in series with a capacitor C1 is shown in parallel with the voice-coil impedance. At low frequencies the impedance of the circuit is RE. If the inductor is lossless, the high-frequency impedance is R1. If R1 = RE and R1C1 = LE/RE, it is straightforward to show that the circuit has an impedance equal to RE at all frequencies [2]. In this case R1 and C1 form a simple Zobel network, which cancels the lossless LE from the input impedance of the driver.

In [10] it is shown that a lossy voice-coil inductance has an impedance that can often be approximated by

Z_L(j\omega) = L_e (j\omega)^n = L_e\left[\cos\left(\frac{n\pi}{2}\right) + j\sin\left(\frac{n\pi}{2}\right)\right]\omega^n    (4)

where Le and n are constants. Fig. 2(b) shows the circuit of Fig. 2(a) with LE replaced with ZL(jω) and C1 replaced with an impedance Z1(jω). Let Zin be the input impedance to the circuit. The source current Is can be written

I_s = \frac{V_s}{Z_{in}} = \frac{V_s - V_1}{R_1} + \frac{V_s - V_L}{R_E}.    (5)

If Zin = RE and R1 = RE, this equation can be solved for Vs to obtain

V_s = V_1 + V_L = \frac{Z_1}{R_E + Z_1}V_s + \frac{Z_L}{R_E + Z_L}V_s    (6)

where voltage division has been used to express V1 and VL as functions of Vs. This equation can be solved for Z1 to obtain

Z_1(j\omega) = \frac{R_E^2}{Z_L(j\omega)} = \frac{R_E^2}{L_e\omega^n}\left[\cos\left(\frac{n\pi}{2}\right) - j\sin\left(\frac{n\pi}{2}\right)\right].    (7)

It follows that Zin = RE if R1 = RE and Z1(jω) is given by Eq. (7). In this case the high-frequency voice-coil impedance is resistive at all frequencies. Note that |Z1(jω)| ∝ ω^−n so that a plot of |Z1(jω)| versus ω on log–log scales is a straight line with a slope of −n dec/dec. It should also be noted that Z1(jω) is the dual of ZL(jω) scaled by the factor RE², which follows from the fundamental principle derived by Zobel.

2 APPROXIMATING IMPEDANCE

Fig. 3 shows the Bode magnitude plot of an impedance which exhibits a slope of −n dec/dec between the frequencies f1 and f6. Also shown are the asymptotes of an approximating impedance which exhibit alternating slopes of −1 and 0. Four frequencies are labeled between f1 and f6 at which the slopes of the asymptotes change. In the general case, let there be N frequencies, where N is even and N ≥ 4. In this case the number of asymptotes having a slope of 0 is (N − 2)/2. Let k be the ratio of the asymptotic approximating impedance to the desired impedance at f = f1.

Fig. 1. Equivalent circuit of voice-coil impedance.

Fig. 2. (a) High-frequency voice-coil equivalent circuit with two-element Zobel network. (b) Circuit used to derive Zobel network impedance transfer function.

Fig. 3. Desired impedance and asymptotes of approximating impedance versus frequency.

The desired impedance at f1 is labeled |Z1| in Fig. 3 and is given by

|Z_1| = L_e (2\pi f_1)^n.    (8)

The approximating impedance at f1 is labeled k|Z1|. With n, f1, and fN specified, the object is to specify k and f2 through fN−1 such that the ratios of each even-subscripted frequency to the odd-subscripted frequency to its left are equal and the intersection points (indicated by dots on the plot) occur at the geometric mean of the adjacent frequencies. In this case the lengths of the six dashed vertical lines in Fig. 3 are equal and the asymptotes of the approximating impedance approximate the desired impedance in an equal-ripple sense between f1 and fN.

It is straightforward to show that the following conditions must hold:

k = \left(\frac{f_2}{f_1}\right)^{(1-n)/2} = \left(\frac{f_4}{f_3}\right)^{(1-n)/2} = \cdots = \left(\frac{f_N}{f_{N-1}}\right)^{(1-n)/2}    (9)

f_2 = f_1^{1-n} f_3^{n}
f_4 = f_3^{1-n} f_5^{n}
⋮    (10)
f_{N-2} = f_{N-3}^{1-n} f_{N-1}^{n}

f_3 = f_2^{n} f_4^{1-n}
f_5 = f_4^{n} f_6^{1-n}    (11)
⋮
f_{N-1} = f_{N-2}^{n} f_N^{1-n}.

Solutions to these equations are given next for the cases N = 4 and N = 6.

2.1 Case A: N = 4
Let f1 and f4 be specified. For N = 4, Eqs. (9)–(11) can be solved to obtain

k = \left(\frac{f_4}{f_1}\right)^{n(1-n)/[2(1+n)]}    (12)

f_2 = f_1^{1/(1+n)} f_4^{n/(1+n)}    (13)

f_3 = f_1^{n/(1+n)} f_4^{1/(1+n)}.    (14)

Let Z1(f) be the approximating impedance function. It is given by

Z_1(f) = \frac{k|Z_1|}{j(f/f_1)} \times \frac{1 + j(f/f_2)}{1 + j(f/f_3)}.    (15)

2.2 Case B: N = 6
Let f1 and f6 be specified. For N = 6, Eqs. (9)–(11) can be solved to obtain

k = \left(\frac{f_6}{f_1}\right)^{n(1-n)/[2(2+n)]}    (16)

f_2 = f_1^{2/(2+n)} f_6^{n/(2+n)}    (17)

f_3 = f_1^{(1+n)/(2+n)} f_6^{1/(2+n)}    (18)

f_4 = f_1^{1/(2+n)} f_6^{(1+n)/(2+n)}    (19)

f_5 = f_1^{n/(2+n)} f_6^{2/(2+n)}.    (20)

The approximating impedance as a function of frequency for this case is given by

Z_1(f) = \frac{k|Z_1|}{j(f/f_1)} \times \frac{1 + j(f/f_2)}{1 + j(f/f_3)} \times \frac{1 + j(f/f_4)}{1 + j(f/f_5)}.    (21)

2.3 Example Plots
To illustrate the accuracy of the approximating functions, let the impedance given by Eq. (7) be approximated over a three-decade band for the case n = 0.5. The smaller the value of n, the poorer the approximation. In the author's experience, the value of n for most loudspeaker drivers is in the range from 0.6 to 0.7. Thus the value n = 0.5 results in an approximation that is worse than what can be expected with the typical driver.

Fig. 4 shows the calculated Bode magnitude plots. Curve a is the desired impedance. Curve b is the approximating impedance for N = 4. Curve c is the approximating impedance for N = 6. It can be seen that the approximating impedance functions ripple about the desired function over the band of interest with a maximum deviation occurring at the two frequency extremes. Between the two extremes, the maximum deviation is less than it is at the extremes because the design equations are derived from the asymptotes of the approximating function.

Fig. 4. Example plots of desired impedance (curve a) and approximating impedances (curves b, c) versus frequency for n = 0.5.
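Eqs. (12)–(14) and (16)–(20) are straightforward to evaluate numerically. The following Python helper is a sketch of that calculation (it is not code from the report); the example call uses n = 0.5 over a three-decade band simply to mirror the illustration above, and the printed numbers are for orientation only.

def equal_ripple_breaks(n, f1, fN, N=4):
    """Ripple factor k and interior break frequencies for the equal-ripple
    approximation of a -n dec/dec impedance slope, Eqs. (12)-(14) (N = 4)
    and (16)-(20) (N = 6)."""
    if N == 4:
        k  = (fN / f1) ** (n * (1 - n) / (2 * (1 + n)))
        f2 = f1 ** (1 / (1 + n)) * fN ** (n / (1 + n))
        f3 = f1 ** (n / (1 + n)) * fN ** (1 / (1 + n))
        return k, (f2, f3)
    if N == 6:
        k  = (fN / f1) ** (n * (1 - n) / (2 * (2 + n)))
        f2 = f1 ** (2 / (2 + n)) * fN ** (n / (2 + n))
        f3 = f1 ** ((1 + n) / (2 + n)) * fN ** (1 / (2 + n))
        f4 = f1 ** (1 / (2 + n)) * fN ** ((1 + n) / (2 + n))
        f5 = f1 ** (n / (2 + n)) * fN ** (2 / (2 + n))
        return k, (f2, f3, f4, f5)
    raise ValueError("only N = 4 and N = 6 are implemented")

# Example: n = 0.5 over a three-decade band (20 Hz to 20 kHz, assumed)
for N in (4, 6):
    k, breaks = equal_ripple_breaks(0.5, 20.0, 20_000.0, N)
    print(N, round(k, 3), [round(f, 1) for f in breaks])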

3 THE COMPENSATING CIRCUITS

3.1 Network A
Fig. 5(a) shows a circuit consisting of two capacitors and one resistor, which can be used to realize the impedance of Eq. (15). The impedance is given by

Z_1(s) = \frac{1}{s(C_1 + C_2)} \times \frac{1 + s/\omega_2}{1 + s/\omega_3}    (22)

where s = jω = j2πf and

\omega_2 = 2\pi f_2 = \frac{1}{R_2 C_2}    (23)

\omega_3 = 2\pi f_3 = \frac{C_1 + C_2}{R_2 C_1 C_2}.    (24)

The impedance of the circuit is equal to that of Eq. (15) if

C_1 = \frac{f_2}{2\pi f_1 f_3 k|Z_1|}    (25)

C_2 = C_1 \frac{f_3 - f_2}{f_2}    (26)

R_2 = \frac{1}{2\pi f_2 C_2}.    (27)

The circuit of Fig. 5(a) corresponds to Fincham's more general compensating network in [11].

3.2 Network B
Fig. 5(b) shows a circuit consisting of three capacitors and two resistors, which can be used to realize the impedance of Eq. (21). The impedance is given by

Z_1(s) = \frac{1}{s(C_1 + C_2 + C_3)} \times \frac{(1 + s/\omega_2)(1 + s/\omega_4)}{s^2/(\omega_3\omega_5) + s(1/\omega_3 + 1/\omega_5) + 1}    (28)

where

\omega_2 = 2\pi f_2 = \frac{1}{R_2 C_2}    (29)

\omega_4 = 2\pi f_4 = \frac{1}{R_3 C_3}    (30)

\omega_3\omega_5 = 2\pi f_3 \times 2\pi f_5 = \frac{C_1 + C_2 + C_3}{R_2 R_3 C_1 C_2 C_3}    (31)

\frac{1}{\omega_3} + \frac{1}{\omega_5} = \frac{1}{2\pi f_3} + \frac{1}{2\pi f_5} = \frac{R_2 C_2(C_1 + C_3) + R_3 C_3(C_1 + C_2)}{C_1 + C_2 + C_3}.    (32)

The impedance of the circuit is equal to that of Eq. (21) if

C_1 = \frac{f_2 f_4}{2\pi f_1 f_3 f_5 k|Z_1|}    (33)

C_2 = C_1 \frac{f_2 - f_3 - f_5 + f_3 f_5/f_2}{f_4 - f_2}    (34)

R_2 = \frac{1}{2\pi f_2 C_2}    (35)

C_3 = C_1 \frac{f_3 - f_4 + f_5 - f_3 f_5/f_4}{f_4 - f_2}    (36)

R_3 = \frac{1}{2\pi f_4 C_3}.    (37)

4 COMPENSATING THE LOW-FREQUENCY DRIVER IMPEDANCE

The impedance Z1(jω) in Fig. 2(b) is the dual of ZL(jω) scaled by the factor RE². Following [3], the impedance rise at resonance can be canceled by adding an impedance Z2(jω) in parallel with Z1(jω), which is the dual of the motional impedance of the driver scaled by the factor RE². This impedance consists of a series RLC circuit having the element values [3]

R_S = \frac{Q_{ES}}{Q_{MS}} R_E    (38)

L_S = \frac{Q_{ES} R_E}{2\pi f_S}    (39)

C_S = \frac{1}{2\pi f_S Q_{ES} R_E}.    (40)

The circuit for Z2(jω) is shown in Fig. 6(a). The preceding equations apply to a driver in an infinite baffle. For a closed-box baffle the element values are given by

R_S = \frac{Q_{EC}}{Q_{MC}} R_E    (41)

L_S = \frac{Q_{EC} R_E}{2\pi f_C}    (42)

C_S = \frac{1}{2\pi f_C Q_{EC} R_E}    (43)

where QEC is the closed-box electrical quality factor, QMC is the closed-box mechanical quality factor, and fC is the closed-box resonance frequency [12].

The circuit for Z2(jω) for a vented-box baffle is shown in Fig. 6(b).

Fig. 5. Circuits for approximating impedance Z1(jω).
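To make the synthesis procedure concrete, the following Python sketch combines the break-frequency equations of Section 2 with the element-value equations (25)–(27) and (33)–(37). It is an illustrative helper rather than code from the report. The driver values in the example call (RE = 5.1 Ω, Le = 0.0150, n = 0.764, f1 = 300 Hz, fN = 20 kHz) are those of the numerical example in Section 5, so the printed element values should agree with Tables 1 and 2 there to within rounding.

import math

def design_zobel(RE, Le, n, f1, fN, N=4):
    """Element values for Zobel network A (N = 4) or B (N = 6),
    from Eqs. (8), (12)-(14)/(16)-(20), (25)-(27), and (33)-(37)."""
    Z1 = Le * (2 * math.pi * f1) ** n                        # Eq. (8)
    if N == 4:
        k  = (fN / f1) ** (n * (1 - n) / (2 * (1 + n)))
        f2 = f1 ** (1 / (1 + n)) * fN ** (n / (1 + n))
        f3 = f1 ** (n / (1 + n)) * fN ** (1 / (1 + n))
        C1 = f2 / (2 * math.pi * f1 * f3 * k * Z1)           # Eq. (25)
        C2 = C1 * (f3 - f2) / f2                             # Eq. (26)
        R2 = 1 / (2 * math.pi * f2 * C2)                     # Eq. (27)
        return {"R1": RE, "C1": C1, "R2": R2, "C2": C2}
    # N = 6
    k  = (fN / f1) ** (n * (1 - n) / (2 * (2 + n)))
    f2 = f1 ** (2 / (2 + n)) * fN ** (n / (2 + n))
    f3 = f1 ** ((1 + n) / (2 + n)) * fN ** (1 / (2 + n))
    f4 = f1 ** (1 / (2 + n)) * fN ** ((1 + n) / (2 + n))
    f5 = f1 ** (n / (2 + n)) * fN ** (2 / (2 + n))
    C1 = f2 * f4 / (2 * math.pi * f1 * f3 * f5 * k * Z1)     # Eq. (33)
    C2 = C1 * (f2 - f3 - f5 + f3 * f5 / f2) / (f4 - f2)      # Eq. (34)
    R2 = 1 / (2 * math.pi * f2 * C2)                         # Eq. (35)
    C3 = C1 * (f3 - f4 + f5 - f3 * f5 / f4) / (f4 - f2)      # Eq. (36)
    R3 = 1 / (2 * math.pi * f4 * C3)                         # Eq. (37)
    return {"R1": RE, "C1": C1, "R2": R2, "C2": C2, "R3": R3, "C3": C3}

# JBL 2241H example of Section 5: RE = 5.1 ohm, Le = 0.0150, n = 0.764
for N in (4, 6):
    print(N, design_zobel(5.1, 0.0150, 0.764, 300.0, 20_000.0, N))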

The element values for RS, LS, and CS are calculated from Eqs. (38) through (40). It can be shown that the elements RP, LP, and CP are given by

R_P = \frac{\alpha Q_{ES} Q_L R_E}{h}    (44)

L_P = \frac{\alpha Q_{ES} R_E}{2\pi f_S h^2}    (45)

C_P = \frac{1}{2\pi f_S \alpha Q_{ES} R_E}    (46)

where α = VAS/VB is the system compliance ratio, QL is the enclosure quality factor at the Helmholtz resonance frequency, and h = fB/fS is the system tuning ratio [13].

5 NUMERICAL EXAMPLE

One sample of the JBL model 2241H 18-in (0.457-m) professional woofer was selected to illustrate the networks and their application. The dc voice-coil resistance was found to be RE = 5.1 Ω. The voice-coil impedance was measured at 62 frequencies between 14.8 Hz and 20 kHz with an MLSSA analyzer. The data in the range from 1.8 to 20 kHz were used to calculate the lossy voice-coil inductance parameters. Calculations on the MLSSA data yielded the parameters RES = 26.9 Ω, LCES = 38.1 mH, CMES = 424 μF, n = 0.764, and Le = 0.0150. Fig. 7 shows the measured magnitude and phase of the impedance as circles and the impedance calculated from the equation

Z_{VC}(j\omega) = R_E + L_e(j\omega)^n + \left(\frac{1}{R_{ES}} + \frac{1}{j\omega L_{CES}} + j\omega C_{MES}\right)^{-1}    (47)

shown as a solid line, where ω = 2πf. The figure shows excellent agreement between the measured and calculated data, thus verifying the calculated values of the driver parameters.

Fig. 6. Compensation circuits for low-frequency impedance rise. (a) Infinite-baffle and closed-box drivers. (b) Vented-box drivers.

Fig. 7. Impedance measured and calculated from Eq. (47) (———) for JBL driver. (a) Magnitude. (b) Phase.
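Eq. (47) is easy to evaluate directly from the quoted parameter values. The short Python sketch below does so over the measurement band; the frequency grid and the printed summary are choices made for this illustration and are not part of the report.

import numpy as np

# Parameters of the JBL 2241H sample quoted in Section 5
RE, Le, n = 5.1, 0.0150, 0.764
RES, LCES, CMES = 26.9, 38.1e-3, 424e-6

def Zvc(f):
    """Voice-coil impedance model of Eq. (47)."""
    w = 2 * np.pi * np.asarray(f, dtype=float)
    motional = 1.0 / (1.0 / RES + 1.0 / (1j * w * LCES) + 1j * w * CMES)
    return RE + Le * (1j * w) ** n + motional

f = np.logspace(np.log10(14.8), np.log10(20_000.0), 200)
Z = Zvc(f)
f_peak = f[np.argmax(np.abs(Z))]        # the motional resonance shows up as the magnitude peak
print("peak |Z| = %.1f ohm near %.1f Hz" % (np.abs(Z).max(), f_peak))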

The element values for the Zobel networks were calculated to compensate for the voice-coil inductance over the frequency band from f1 = 300 Hz to fN = 20 kHz. Table 1 summarizes the intermediate calculations for the two networks. Table 2 gives the calculated element values. Network A is the network of Fig. 5(a). Network B is that of Fig. 5(b).

Table 1. Summary of intermediate calculations.

        Network A    Network B
N       4            6
k       1.24         1.15
|Z1|    4.77 Ω       4.77 Ω
f1      300 Hz       300 Hz
f2      1.85 kHz     958 Hz
f3      3.24 kHz     1.37 kHz
f4      20 kHz       4.38 kHz
f5      -            6.26 kHz
f6      -            20 kHz

Table 2. Element values.

        Network A    Network B
R1      5.1 Ω        5.1 Ω
C1      51.2 μF      47.4 μF
R2      2.23 Ω       5.25 Ω
C2      38.5 μF      31.6 μF
R3      -            2.03 Ω
C3      -            17.9 μF

The element values for the optional network to compensate for the impedance rise at resonance have the values RS = 0.858 Ω, LS = 11 mH, and CS = 1460 μF. It is quite obvious that these values would be impractical in a passive crossover network. Indeed, an 11-mH air-core inductor would in all probability have a series resistance greater than 0.858 Ω. For these reasons, the impedance Z2(jω) has been omitted in the following. However, it would be expected that the element values would fall in a more practical range for midrange and tweeter drivers, which have a much higher resonance frequency than the driver considered here.

Fig. 8 shows the magnitude and phase of the voice-coil impedance with and without Zobel network A. The plots are calculated from the measured voice-coil data and not those predicted by Eq. (47). The plots for network B are not shown because, for all practical purposes, they are not distinguishable from those of network A. However, this may not be the case with drivers that have a lower value of n.

Fig. 8. Impedance of JBL driver with and without Zobel network A. (a) Magnitude. (b) Phase.

To evaluate the effect of the Zobel networks on the performance of passive crossover networks, the voice-coil

voltage of the JBL driver was calculated for a source voltage of 1 V rms with second- and third-order low-pass crossover networks. The crossover frequency was chosen to be fc = 800 Hz, which might be a typical value when this driver is used with a midrange horn. The circuit diagrams are shown in Fig. 9. Second-order networks are usually designed for critical damping. The crossover frequency is the −6-dB frequency of the network. The element values for the second-order network in Fig. 9(a) are given by

L_1 = \frac{R_E}{\pi f_c} = 2.03\ \mathrm{mH}    (48)

C_1 = \frac{1}{4\pi f_c R_E} = 8.43\ \mu\mathrm{F}.    (49)

Third-order networks are usually designed for a Butterworth response. The crossover frequency is the −3-dB frequency of the network. The element values for the third-order network in Fig. 9(b) are given by

L_1 = \frac{3R_E}{4\pi f_c} = 1.52\ \mathrm{mH}    (50)

C_1 = \frac{2}{3\pi f_c R_E} = 52\ \mu\mathrm{F}    (51)

L_2 = \frac{R_E}{4\pi f_c} = 0.507\ \mathrm{mH}.    (52)

Fig. 9. (a) Second-order crossover network. (b) Third-order crossover network.

Figure 10(a) shows the calculated voice-coil voltage for the second-order crossover network with and without Zobel network A. With the network, the response follows what would be expected of a second-order crossover network. Without the network, the voltage exhibits a peak at 1.8 kHz that is 16.7 dB greater than the response with the network. Fig. 10(b) shows the calculated voice-coil voltage for the third-order crossover network with and without Zobel network A. With the network, the response follows what would be expected of a third-order crossover network. Without the network, the voltage exhibits a peak at 630 Hz that is 10 dB greater than the response with the network.

Fig. 10. Voice-coil voltage of JBL driver with (curve a) and without (curve b) Zobel network A. (a) Second-order crossover network. (b) Third-order crossover network.
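The behavior shown in Fig. 10 can be approximated with a few lines of circuit analysis: the load on the crossover is the driver impedance, optionally shunted by the Zobel branch, and the voice-coil voltage follows from a voltage-divider calculation. The Python sketch below does this for the second-order network of Fig. 9(a), using the Eq. (47) model in place of the measured impedance data together with the Table 2 and Eq. (48), (49) element values. Because the model replaces the measured data, the printed peak should only be expected to fall in the same region as the value reported above, not to reproduce it exactly.

import numpy as np

# Driver model, Eq. (47), with the Section 5 parameter values
RE, Le, n = 5.1, 0.0150, 0.764
RES, LCES, CMES = 26.9, 38.1e-3, 424e-6

def Z_driver(w):
    motional = 1.0 / (1.0 / RES + 1.0 / (1j * w * LCES) + 1j * w * CMES)
    return RE + Le * (1j * w) ** n + motional

# Zobel network A, Table 2: R1 in series with [C1 parallel with (R2 + C2)], Eq. (22)
R1, C1z, R2z, C2z = 5.1, 51.2e-6, 2.23, 38.5e-6
def Z_zobel(w):
    s = 1j * w
    za, zb = 1.0 / (s * C1z), R2z + 1.0 / (s * C2z)
    return R1 + za * zb / (za + zb)

# Second-order crossover of Fig. 9(a): series L1, shunt C1, fc = 800 Hz
L1, C1x = 2.03e-3, 8.43e-6

def vc_voltage(f, with_zobel=True):
    """|voice-coil voltage| for a 1-V source behind the crossover."""
    w = 2 * np.pi * f
    Zload = Z_driver(w)
    if with_zobel:
        Zload = Zload * Z_zobel(w) / (Zload + Z_zobel(w))
    Zc = 1.0 / (1j * w * C1x)
    Zp = Zc * Zload / (Zc + Zload)
    return np.abs(Zp / (Zp + 1j * w * L1))

f = np.logspace(1.3, 4.3, 400)
v_with, v_without = vc_voltage(f, True), vc_voltage(f, False)
ratio_db = 20 * np.log10(v_without / v_with)
print("peak excess without Zobel: %.1f dB near %.0f Hz"
      % (ratio_db.max(), f[np.argmax(ratio_db)]))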

Above 1.6 kHz the response without the network lies above the response with the network and exhibits a slope of approximately −43 dB/dec. The slope with network A approaches −60 dB/dec, which is the correct slope for a third-order network. Crossover simulations with Zobel network B have been omitted because the results were almost identical. However, this may not be the case with drivers having a lower value of n. The plots in Fig. 10 were calculated using the measured voice-coil data and not that predicted by Eq. (47). The plots show some evidence of the rise in impedance at the fundamental resonance frequency of the driver. This could be eliminated by the addition of the circuit in Fig. 6(a) in parallel with the Zobel network.

6 CONCLUSION

The high-frequency rise in the voice-coil impedance of a loudspeaker driver caused by a lossy voice-coil inductance can be approximately canceled in the audio band by an RC Zobel network connected in parallel with the voice coil. The simplest network consists of two resistors and two capacitors. More complicated networks have three or more resistors and three or more capacitors. For a typical driver, the simplest network can yield excellent results. Because the lossy voice-coil inductance can cause major perturbations in the performance of crossover networks, the parameters n and Le should be included in the list of specifications for drivers as an aid in the design of Zobel compensation networks.

7 REFERENCES

[1] O. J. Zobel, "Theory and Design of Uniform and Composite Electric Wave Filters," Bell Sys. Tech. J., vol. 2, pp. 1–46 (1923 Jan.).
[2] R. H. Small, "Constant-Voltage Crossover Network Design" (Reprint), J. Audio Eng. Soc., vol. 19, pp. 12–19 (1971 Jan.).
[3] A. N. Thiele, "Optimum Passive Loudspeaker Dividing Networks," Proc. IREE (Australia), vol. 36, pp. 220–224 (1975 July).
[4] General Radio Co., Instruction Manual—Type 1382 Random-Noise Generator (1968).
[5] National Semiconductor Corp., Audio Handbook (1976, 1977, 1980).
[6] P. Horowitz and W. Hill, The Art of Electronics (Cambridge University Press, Cambridge, MA, 1980).
[7] P. G. L. Mills and M. O. J. Hawksford, "Transconductance Power Amplifier Systems for Current-Driven Loudspeakers," J. Audio Eng. Soc., vol. 37, pp. 809–822 (1989 Oct.).
[8] R. H. Small, "Direct-Radiator Loudspeaker System Analysis," J. Audio Eng. Soc., vol. 20, pp. 383–395 (1972 June).
[9] W. M. Leach, Jr., Introduction to Electroacoustics and Audio Amplifier Design, 3rd ed. (Kendall/Hunt, Dubuque, IA, 2003).
[10] W. M. Leach, Jr., "Loudspeaker Voice-Coil Inductance Losses: Circuit Models, Parameter Estimation, and Effect on Frequency Response," J. Audio Eng. Soc., vol. 50, pp. 442–450 (2002 June).
[11] J. Borwick, Ed., Loudspeaker and Headphone Handbook, p. 216 (Focal Press–Elsevier, Burlington, MA, 2001).
[12] R. H. Small, "Closed-Box Loudspeaker Systems, Parts I and II," J. Audio Eng. Soc., vol. 20, pp. 798–808 (1972 Dec.); vol. 21, pp. 11–18 (1973 Jan./Feb.).
[13] R. H. Small, "Vented-Box Loudspeaker Systems, Parts I–IV," J. Audio Eng. Soc., vol. 21, pp. 363–372 (1973 June); pp. 438–444 (1973 July/Aug.); pp. 549–554 (1973 Sept.); pp. 635–639 (1973 Oct.).
THE AUTHOR

W. Marshall Leach, Jr. received B.S. and M.S. degrees in electrical engineering from the University of South Carolina, Columbia, in 1962 and 1964, and a Ph.D. degree in electrical engineering from The Georgia Institute of Technology in 1972. In 1964 he worked at the National Aeronautics and Space Administration in Hampton, VA. From 1965 to 1968 he served as an officer in the U.S. Air Force. Since 1972 he has been a faculty member at The Georgia Institute of Technology, where he is presently professor of electrical engineering. Dr. Leach teaches courses in applied electromagnetics and electronic design. He is a fellow of the Audio Engineering Society and a senior member of the IEEE.

ENGINEERING REPORTS

Scalable, Content-Based Audio Identification by Multiple Independent Psychoacoustic Matching*

GEOFF R. SCHMIDT**

Intellivid Corporation, Cambridge, MA 02138-1171, USA

AND

MATTHEW K. BELMONTE**

Departments of Psychiatry and Experimental Psychology, University of Cambridge, Cambridge CB2 2AH, UK

A software system for content-based identification of audio recordings is presented. The


system transforms its input using a perceptual model of the human auditory system, making
its output robust to lossy compression and to other distortions. In order to make use of both
the instantaneous pattern of a recording’s perceptual features and the information contained
in the evolution of these features over time, the system first matches fragments of the input
against a database of fragments of known recordings. In a subsequent step, these matches at
the fragment level are assembled in order to identify a single recording that matches con-
sistently over time. In a small-scale test the system has matched all queries successfully
against a database of 100 000 commercially released recordings.

0 INTRODUCTION but on the actual data being indexed. The complexity in-
herent in such algorithms is a product of the fact that
The field of informatics is in the midst of a transforma- perceptual similarities appear not so much in a media file’s
tion from purely textual systems, in which indexing is raw data as in its many derived properties. Similarities
driven by tried-and-true methods of string matching, to apparent to the human senses are seldom evident in com-
multimedia systems, in which measures of similarity are parisons of the data’s raw bytes. Images whose actual
more dimensionally complex and computationally inten- pixel values are utterly different from each other may
sive. As the capacity for information storage has outpaced nonetheless look alike to the human visual system, and
developments in algorithms, indexing of pictures and sounds whose time-series data are uncorrelated may none-
sounds has been left to rely not on the actual records being theless sound alike to the human auditory system. Samples
indexed, but rather on file names or other textual labels may vary along multiple perceptual axes, making the
attached to these records. Anyone who has made use of search space high-dimensional and therefore making near-
image search engines or peer-to-peer file sharing systems est-neighbor searches exponentially more difficult. A
knows that these labels, or metadata, inevitably fail to simple example of this dimensionality problem is a time–
capture essential information. The top matches returned by frequency representation of a sound, in which the number
a search may turn out to be altered editions related to the of dimensions is the product of the number of time steps
target item (for example, a live concert recording, a remix, and the number of frequency bands. In order for the prob-
or a cover), or even the wrong item altogether. lem of comparing media files to be rendered tractable, the
To escape this dependence on fallible metadata, search dimensionality of the inputs must be reduced. The process
algorithms are needed that are keyed not on attached text of deterministically computing a relatively low-dimensional,
unique identifier for a media file is known as the fingerprint-
*Manuscript received 2003 August 12; revised 2003 Decem- ing problem. Fingerprinting in the audio domain, in particu-
ber 30. lar, has received a great deal of attention due to its immediate
**Formerly with Tuneprint, Inc., Cambridge, MA 02139- applications in archive indexing and searching, automated
4056, USA. broadcast monitoring, and digital rights management.

The audio fingerprinting schemes published to date ex- One way around the problem of temporal blurring in
tract information by applying frequency-domain analyses pooled data is to sum not the match scores themselves but
within narrow temporal windows. Some of these methods rather the outputs of some nonlinear “squashing function”
use a straightforward time–frequency representation as in- applied to the match scores. Good matches of short-lived
put to further processing steps [1]–[6] whereas others use features will thus be given a disproportionate effect on the
the Fourier spectrum to compute derived measures such as match quality of the recording as a whole. A system de-
modulation frequency [7], measures of spectral shape and signed by Papaodysseus et al. [5] adopts this approach
tonality [8], [9], MPEG-7 feature sets [10], or hash bits using a simple step function as a squashing function, that
based on frequency-specific temporal changes [11], [12]. is, matches at each time point are thresholded, and the
One published method extracts individual notes and then summed match score is incremented only if the match at
applies conventional string-matching algorithms to se- the current time point is above threshold. With this ap-
quences of these notes [13], though of course such a proach also, though, useful data are lost, since information
method does not allow for complexities such as superpo- on partial, subthreshold matches within individual time
sitions of various instruments or vocals. All of these sys- points has no effect on matching the recording as a whole,
tems either depend on supervised learning algorithms ap- even when such partial matches occur at a large number of
plied to fairly raw time–frequency representations, or time points.
decide a priori what specific spectral features or measures A difficulty in applying information from partial
will be relevant. In the former case, perceptually relevant matches is the time complexity of searching for these
information is submerged in a large corpus of data, making matches. Exact matches, on the other hand, are much more
statistical learning algorithms liable to discriminate on the efficiently identified. Quantization of the signal’s features
basis of perceptually irrelevant features. In the latter case, will produce some number of exact matches with identi-
feature extraction discards a great deal of useful informa- cally quantized records in the database. The database re-
tion along with the irrelevant details. Either way, classifi- cordings that generated these exact matches can then be
cation is degraded. searched for approximate matches. Constraining the
In addition to this loss of perceptual information within search space in this way renders the approximate search
time points, many existing techniques do not make full use problem tractable. This strategy has been applied in a
of information on changes in a recording’s perceptual hashing system based on temporal differences in subband
qualities across time points. A single recording may amplitude differences (that is, a double differentiation in
evolve temporally through many different styles, tempos, frequency and time) [11], [12]. Such a search method pre-
and timbres. Previous audio fingerprinting methods have serves information across time points, though it remains an
suffered from a needlessly exclusive view of time and open question whether this method of time–frequency dif-
frequency representations, collapsing localized frequency- ferentiation can be improved on for extracting perceptu-
domain features across long intervals of time. This strat- ally relevant information within time points.
egy accomplishes a great deal of data reduction, at the Losses of useful data within and across time points are
expense of blurring out short-lived properties that could be likely a major reason why fingerprinting schemes in gen-
useful for identification. The Muscle Fish system [8], for eral have suffered from error rates that would be unac-
example, computes feature vectors for a large set of ceptable in any large-scale system. Though more advanced
closely spaced time frames within a recording, but then systems (such as [12]) promise improved results, most of
retains only the mean, variance, and autocorrelation of the methods cited feature rates of successful matching in
these feature vectors over the entire recording. This strat- the range of 90 to 99%. When matching against a database
egy works well for brief samples such as sound effects [8] of hundreds of thousands or even millions of recordings,
but is unlikely to scale well to temporally extended re- even 99% is unacceptable. Worsening the outlook for scal-
cordings. A system described recently by Sukittanon and ability is the fact that many of these error rates arise from
Atlas [7] adopts a similar approach, computing subband tests against small databases on the order of hundreds [8],
modulation frequencies in each frame and then preserving [1]–[4] or thousands [13], [9], [7] of recordings, or within
only the centroids of these features across frames. A sys- restricted musical genres [1], [3]. In order to improve per-
tem described by Burges et al. [6] operates only on brief formance, strategies must be developed to extract selec-
clips, basing its classifications solely on the one frame that tively the perceptually relevant information within time
differs least from a frame in the database, without using points, and to preserve this information across time points.
information from surrounding frames. Other systems [2], For both of these goals one can look to human neurobiol-
[4] pool feature vectors across the entire recording into a ogy as a model.
histogram of vector-quantization bins, destroying temporal The goal of preservation of feature-specific information
sequence information as the data from the individual vec- across time points has been addressed in a biologically
tors are summed into the histogram. Another system based motivated neural network model proposed by Hopfield
on vector quantization [9] sums the error between feature and Brody [14]. This model recognizes audio signals via a
vectors and vector quantization codebook entries, and then massively parallel network of independent feature detec-
classifies the recording as a whole on the basis of the tors whose outputs decay with various time constants. The
codebook entry whose summed error is least. Here again, recognition signal consists of a selectively weighted sum
information on the recording’s time course is lost in the of these many time-dependent, feature-dependent mea-
summation. sures. Such a network of multiple independent feature de-

Such a network of multiple independent feature detectors is the parallel-processing equivalent of a serial strategy involving multiple independent matching of features at a series of time points, followed by an evaluation of the match results for temporal consistency. This strategy makes the recognition problem tractable by separating the problem of matching within time points from the problem of matching across time points. Matching within time points produces in general some scalar measure of goodness of fit. Matching across time points then amounts to an instance of the well-known problem of weighted linear regression, where the time indices of an unknown input recording are regressed against the time indices of a reference recording using these within-time-points measures as weights, and a regression line of unit (or near-unit) slope indicates a match. This method of weighted linear regression has been applied to audio data by Schmidt [15], [16] and independently by Wang [17].

The choice of a method for extracting perceptual information within time points depends on the context for which the audio fingerprinting system has been designed. While some systems [4] only classify the input as to musical genre or class, others rank better and worse matches within a class [8], or identify single best matches [13], [2], [9], [5], [7]. Among the systems designed for individual identification, variously inclusive or expansive criteria can be established. The most narrow and least useful sense of identity is the simple equality of two signals. In this case, byte-for-byte comparison suffices and no fingerprinting is needed. At the opposite extreme, some applications may need to establish broad identity, retrieving sounds that contain similar sequences or thematic elements but different instrumentals or voices (such as remixes or covers). Perhaps most useful for applications of digital rights management, though, is an audio indexing system that identifies an input as one record within a database of releases, after that input has perhaps been distorted by slight variations in playback speed, by lossy compression, or by transmission through a bandwidth-limited system. Such an indexing system would identify recordings that sound the same to a human listener, and would differentiate recordings that contain obvious differences.

This criterion of equivalence to a human listener leads to a natural approach to enhancing perceptually relevant information within time points: if the software is made to model human auditory processing, then features not important in human auditory discrimination will be lost, and features that contribute to discrimination will be retained. As a result, the criterion of equivalence to a human listener can be attained without any assumptions as to what particular features are relevant. A classifier based on human auditory modeling is in a way an elaboration of time–frequency methods that partition the audible spectrum into frequency bands akin to the critical bands of the human cochlea [2], [4], [5]. One such method has achieved 99.8% recognition by applying dimensionality reduction and supervised learning after accounting for frequency-specific human auditory perceptual thresholds [6], though more complex psychoacoustic phenomena such as spreading and compression are not accounted for.

In order to preserve information across time points, we adopt the philosophy that many approximate tests of matching are better than a single, make-or-break test. Other systems have constructed databases whose elements are whole recordings, and this strategy leads inevitably to the problem of pooling data across time points. In contrast, the system discussed in this study, known as Tuneprint, matches each short fragment of a recording against a large database consisting of every individual fragment of every recording loaded. Fast vector quantization within such a large database is inevitably fallible, and thus many fragments within an input recording will produce incorrect matches. A significant portion of matches, however, will be correct, and it is this redundancy across time points that is the key to the Tuneprint algorithm: if significantly many fragments from the input produce matches that turn out to be the temporally corresponding fragments from a single recording, the input can be identified as this recording. Using a sophisticated psychoacoustic model as the front end to this fragment-based analysis, Tuneprint has achieved a high success rate on a database of 100 000 commercial music releases.

1 THE ALGORITHM

1.1 General Considerations

The identification algorithm can be conceptualized as the combination of a perceptual model, which eliminates features that are not significant to a human listener, a fragment function, which matches each of the input's individual temporal fragments against fragments in the database, and an assembly function, which puts together results from the fragment function over time and evaluates their consistency. In general, a fragment function takes an input recording and a temporal offset within that recording, and produces a set of triples, each consisting of a recording in the database, a temporal offset within that recording, and a distance measure or some other quantification of match quality. An assembly function takes a series of the fragment function's outputs over time, and produces a set of outputs each of which associates a recording in the database with a confidence level with which the input matches that recording.

The goal in defining the fragment function is not so much to maximize accuracy as to maximize the information garnered per unit of computational resources spent. Depending on how the fragment function is implemented, the limiting resource may be input–output bandwidth, memory bandwidth, network bandwidth, or CPU time. It is expected that a fragment function will be noisy, even returning no information at all for some particularly difficult cases. The assembly function is able to extract a signal from this noisy output by exploiting the constraint that any genuine match must be consistent over an interval of time.

Since matching is evaluated independently for each fragment, it is possible for a match to occur between any subinterval of a recording submitted for identification and any matching subinterval of a recording in the database. This ability to construct independent matches on subintervals makes the Tuneprint system robust to truncation of audio, to the addition of silent intervals, and to momentary glitches such as sometimes occur during radio or streaming transmission. Useful results can be obtained even from less than a second of input.
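The division of labor just described can be made concrete with a small sketch. The type names and function signatures below (FragmentMatch, FragmentFunction, assembly_function) are hypothetical illustrations, not an interface taken from the paper, and the toy assembly step here only counts matching time steps rather than checking their temporal consistency.

    from dataclasses import dataclass
    from typing import Callable, Iterable

    @dataclass
    class FragmentMatch:
        recording_id: str     # recording in the database
        offset_s: float       # temporal offset within that recording
        distance: float       # distance measure (match quality)

    # A fragment function maps (input recording, offset within it) to a set of
    # candidate matches, each a (recording, offset, distance) triple.
    FragmentFunction = Callable[[object, float], list[FragmentMatch]]

    def assembly_function(per_step_matches: Iterable[list[FragmentMatch]]) -> dict[str, float]:
        """Toy assembly step: count, for each database recording, how many
        time steps produced at least one match to it.  The real assembly
        function also requires the matched offsets to advance consistently
        with query time."""
        confidence: dict[str, float] = {}
        for matches in per_step_matches:
            for rec in {m.recording_id for m in matches}:
                confidence[rec] = confidence.get(rec, 0.0) + 1.0
        return confidence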
1.2 Psychoacoustic Modeling

As a front end to Tuneprint's fragment function, audio recordings are transformed by a psychoacoustically based model of human hearing. Input is sampled at 44.1 kHz, either from raw CD content or by playback from a compressed format. In the case of a stereo recording, left and right channels are mixed to mono. Intervals of 185.715 ms (8190 samples) are normalized for playback volume by subtracting the mean and then scaling to the interval's maximum excursion or to one-sixteenth of the playback medium's dynamic range, whichever is greater. (This dynamic range criterion prevents silent intervals from becoming high-amplitude noise.) The interval is multiplied by a Hann window, padded with a single zero on each end, and then Fourier-transformed. A power spectrum with a resolution of 5.38 Hz is extracted from the Fourier transform, over the frequency range from 253 to 12 500 Hz. Frequencies outside this range are discarded so that matches cannot depend on them. We have found this strategy effective at improving the identification of band-limited transmissions while having no negative effect on the identification of full-bandwidth recordings. The power spectrum is transformed from a hertz scale to a Bark scale by linear interpolation using the Bark frequency values given in [18]. This transformation yields a power spectrum extending from 2.53 Bark to 23.17 Bark in 128 discrete steps, energy from multiple frequency bins at the high end of the spectrum being summed into single Bark bins.
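A minimal sketch of this framing and spectral analysis follows (NumPy). The normalization floor of 1/16 assumes samples already scaled to the range −1 to 1, and the Bark resampling step is omitted since the band-edge values come from [18]; both are assumptions for illustration only.

    import numpy as np

    FS = 44100                 # sampling rate, Hz
    FRAME = 8190               # 185.715-ms analysis interval, samples

    def fragment_power_spectrum(stereo, start, dynamic_range_floor=1.0 / 16.0):
        """Mono mix, volume normalization, Hann window, zero padding, and
        band-limited power spectrum for one analysis interval."""
        x = stereo.mean(axis=1)[start:start + FRAME]        # mix to mono
        x = x - x.mean()                                     # remove the mean
        scale = max(np.abs(x).max(), dynamic_range_floor)    # avoid boosting silence
        x = x / scale
        x = np.concatenate(([0.0], x * np.hanning(FRAME), [0.0]))   # 8192 points
        spectrum = np.abs(np.fft.rfft(x)) ** 2               # ~5.38-Hz resolution
        freqs = np.fft.rfftfreq(x.size, d=1.0 / FS)
        keep = (freqs >= 253.0) & (freqs <= 12500.0)         # discard out-of-band bins
        return freqs[keep], spectrum[keep]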
Transformation to the Bark scale sets the stage for the computation of frequency spreading. In the human ear, mechanical properties of the basilar membrane and coarse coding within the cochlear nucleus cause a single-frequency input to excite neurons encoding a range of frequencies, with a central peak of excitation occurring at the input frequency. This spreading of neural excitation across frequencies underlies the psychoacoustic phenomenon of masking, in which a low-amplitude tone, at a frequency near that of a higher amplitude tone or a band of noise, cannot be resolved [18]. Most applications based on human perceptual modeling (for example, MPEG layer 3 [19]) compute an intensity threshold below which a masked sound will not be heard. Since thresholds differ depending on whether the masker is a pure tone or noise, this thresholding method has the disadvantage of relying on rather arbitrary measures of spectral flatness as indices of tonality.

An alternative to modeling the masking threshold is to model the frequency spreading itself. With this approach, the masked threshold is not explicitly computed. Instead, it arises from decreases in discriminability due to spreading of the spectrum. The computational distinction between noise masking and tone masking also is obviated [20]: the same model is used in both cases, and the difference arises in the model's behavior in response to masking inputs whose levels are fairly uniform across frequencies (noise maskers) versus those whose levels are frequency-specific (tone maskers). As specified in [20], we spread each Bark band with a left-sided rolloff of 31 dB/Bark and a right-sided rolloff of 22 dB/Bark + (230 Hz/Bark)/ν, where ν is the frequency (in hertz) of the masker. We apply an intensity compression factor α of 0.8, making the compressed sum of excitations greater than the linear sum. Algorithmic details on the application of this compression factor and the computation of spreading are presented in [20]. In addition to accounting for masking, this spreading of spectral peaks makes the identification robust to discretization errors that may arise from small variations in playback speed.

Following spreading and conversion to an intensity (dB) scale, the minimum audible field (MAF) [21], expressed in dB as a function of Bark frequency, is subtracted from the signal. Bark frequency bands in which this subtraction yields a negative result are zeroed, whereas nonnegative results are transformed according to a perceptual loudness measure based on that defined in [22], depending on Bark frequency z and frequency-specific intensity I_z:

\[
\left[0.7 + \frac{0.4}{\mathrm{Bark}}\left(2.5\,\mathrm{Bark} - \max\!\left[2.5\,\mathrm{Bark},\ \min(3\,\mathrm{Bark},\ z)\right]\right)\right]\cdot(I_z - 100\ \mathrm{dB}) + 100\ \mathrm{dB} + 8.5\ \mathrm{dB}\left(1 - \frac{1}{1 + e^{-(0.09/\mathrm{dB})(I_z - 60\ \mathrm{dB})}}\right).
\]

This step completes the psychoacoustic transformation, an example of which is shown in Fig. 1. The resulting output changes fairly slowly across samples (Fig. 2), making Tuneprint robust to temporal frame shifts.
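Read directly off the expression above (as reconstructed here), the mapping for a single Bark band can be written as a small function; the function name is illustrative.

    import math

    def perceptual_loudness(z_bark, intensity_db):
        """Loudness-like measure for one Bark band: a slope that falls from
        0.7 below 2.5 Bark to 0.5 above 3 Bark, applied about a 100-dB pivot,
        plus a sigmoidal boost of up to 8.5 dB around a 60-dB pivot."""
        slope = 0.7 + 0.4 * (2.5 - max(2.5, min(3.0, z_bark)))
        sigmoid = 1.0 / (1.0 + math.exp(-0.09 * (intensity_db - 60.0)))
        return slope * (intensity_db - 100.0) + 100.0 + 8.5 * (1.0 - sigmoid)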
As a postprocessing step, the spectrum is high-pass filtered by subtracting from each point the linear trend in the 6-Bark interval centered on that point. (In the case of points that lie within 3 Bark of the spectrum's upper or lower edge, the detrending window is narrowed accordingly.) Although this last filtering step may seem to diverge from the goal of modeling human perception, we find it useful in practice since it removes band-limited intensity offsets that arise from equalization (see example in Fig. 3). Such equalization can be produced deliberately, but is more often an unintentional consequence of the limited frequency response of amplifiers, loudspeakers, microphones, or transmission systems.
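A sketch of this detrending high-pass filter follows; the use of a least-squares line fit within each window is an assumption, since the paper specifies only that the local linear trend is subtracted.

    import numpy as np

    BARK_STEP = (23.17 - 2.53) / 127          # spacing of the 128 Bark bins

    def detrend_highpass(loudness, half_window_bark=3.0):
        """Subtract from each point the linear trend fitted over the 6-Bark
        interval centered on it, narrowing the window near the edges."""
        half = int(round(half_window_bark / BARK_STEP))
        out = np.empty_like(loudness)
        for i in range(loudness.size):
            lo, hi = max(0, i - half), min(loudness.size, i + half + 1)
            x = np.arange(lo, hi)
            slope, intercept = np.polyfit(x, loudness[lo:hi], 1)
            out[i] = loudness[i] - (slope * i + intercept)
        return out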
The end result of all these transformations is a feature vector whose 128 components represent high-pass-filtered perceptual intensity as a function of Bark frequency, in a brief interval around the time point of interest. In order to pair this frequency-domain information with some local temporal information, the psychoacoustic transformation is repeated in adjoining fragments, one immediately preceding and the other immediately following the fragment of interest. These temporal offsets produce a total of three 128-dimensional feature vectors, which are concatenated to form a 384-dimensional feature vector. The time point of interest is advanced through the recording in half-fragment steps of 92.8575 ms, so each time point is incorporated into two different 128-vectors, and each 128-vector is used in three different 384-vectors.

Especially after frequency spreading and duplication across time, these feature vectors contain a great deal of redundant information.
In order to avoid the problem of searching in a very high-dimensional space, the 384-vectors are reduced to 8-vectors by principal components analysis (PCA), a method also applied in a system by Burges et al. [6]. PCA [23] is a standard method for reducing dimensionality by exploiting correlations between dimensions. In 3-space, for example, a set of vectors whose coordinates were perfectly uncorrelated might form a spherical volume of scattering, while a set of vectors whose coordinates were perfectly correlated would form a line. A correlation much greater than 0 but somewhat less than 1 would describe a cylindrical volume, the long axis of which would capture most of the variance in the data. If one wanted to reduce the dimensionality of the data from 3 to 1, PCA could be applied in order to discover this axis of greatest variation, and all of the original 3-vectors could then be projected onto it.

More formally, PCA operates on an N-dimensional space by computing from a set of training data the N × N covariance matrix where element (i, j) is the covariance of the ith dimension with the jth. The eigenvectors of this matrix are the principal components, and for each of these principal components the corresponding eigenvalue describes the amount of variance captured. Reduction from 384 dimensions to a basis of eight dimensions thus can be accomplished by computing the eigenvectors of the 384 × 384 covariance matrix and extracting the eight eigenvectors whose eigenvalues are largest. The transformation matrix from the original 384-space to a variance-maximizing 8-space then consists of these eight vectors, or principal components.
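As a simplified sketch of this reduction, assuming the training vectors are collected in a NumPy array of shape (number of samples, 384):

    import numpy as np

    def pca_basis(training, n_components=8):
        """Top principal components of the training vectors and a projector
        from 384-dimensional features onto that basis."""
        mean = training.mean(axis=0)
        cov = np.cov(training, rowvar=False)              # 384 x 384 covariance
        eigvals, eigvecs = np.linalg.eigh(cov)            # ascending eigenvalues
        order = np.argsort(eigvals)[::-1][:n_components]  # take the largest
        basis = eigvecs[:, order]                         # 384 x n_components
        explained = eigvals[order].sum() / eigvals.sum()  # fraction of variance kept

        def project(v):
            return (np.asarray(v) - mean) @ basis         # 384-vector -> 8-vector

        return basis, explained, project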
The training data for our PCA consist of all of the 384-dimensional feature vectors computed from nonoverlapping groups of three fragments, that is, from samples at intervals of 557.145 ms. The eight basis vectors computed in this analysis (Fig. 4) account for 46.5% of the variance in our database of 100 000 recordings. Two of these basis vectors capture temporal information, changing sign across each group of three successive fragments, and the other six capture frequency-domain information that is fairly constant across successive fragments.

Fig. 1. Steps in psychoacoustic transformation. Power is transformed to a Bark scale and spreading is applied to mimic the mechanical properties of the human cochlea, implementing the psychoacoustic phenomenon of masking. Amplitude of the spread spectrum is converted to dB, auditory threshold is subtracted, and resulting levels are converted to an arbitrarily scaled, dB-like measure of perceptual loudness.

Fig. 2. Evolution of psychoacoustic transformation across successive fragments. Bark frequency is on horizontal axis, loudness on vertical axis, and time on depth axis. In these half-overlapping, 93-ms fragments, psychoacoustic features arise and disappear gradually over time.

Fig. 3. Comparison of two inputs, identical except for equalization profiles. Note almost exact similarity in local psychoacoustic features and frequency-dependent offset in absolute amplitudes.

Although projecting onto this compact basis preserves slightly less than half of the variance in the total corpus of recordings, our framework of multiple independent comparisons makes the system robust to the resulting increase in match failures. An eight-component basis greatly reduces storage requirements in comparison to larger bases, and even doubling the dimension of the basis would account for only an additional 10.4% of the sample variance (Fig. 5).

1.3 Fragment Analysis

Vectors are stored in the database as eight 4-byte IEEE floating-point numbers, with 20 additional bytes associating the vector with its source recording and a temporal offset within that recording. With an average recording length of about 4 min, the 100 000 recordings in the database occupy 12.2 Gbyte of storage. In order to attain high throughput for match queries, this storage is split across seven separate back-end servers, each holding a 1.75-Gbyte slice of the master database. Vectors are quantized into partitions using the generalized Lloyd algorithm [24], [25], and only the partition whose centroid has the closest Euclidean distance to the query vector is searched. One or more front-end index servers accept queries and dispatch each of them asynchronously to the back-end machine that holds the appropriate partition (Fig. 6). In general, the number of back-end machines required for a particular application is a function of both the size of the database (given a limit on memory per single PC bus) and the volume of queries per unit time. These two factors scale somewhat independently: a small, heavily loaded Tuneprint system can split a small amount of memory across a large number of CPUs, whereas a large Tuneprint system with a smaller query volume can concentrate more memory in a smaller number of machines.

Fig. 6. Block diagram of Tuneprint processing. Queries are dispatched to appropriate back-end server.

The partitioned database can be described by two inversely related measures: entropy and partition error. The entropy of the system is the expected amount of information necessary for its description—in this case the sum, over all partitions, of the probability that a vector will fall within a given partition times the information contained in such a partitioning:

\[
\sum_{\mathrm{partition}} \left[\frac{|\mathrm{partition}|}{|\mathrm{database}|}\cdot\left(-\log_2 \frac{|\mathrm{partition}|}{|\mathrm{database}|}\right)\right].
\]

High entropy decreases processing time. One factor in this decrease is that more partitions allow a database to be spread over many separate back-end servers. More significantly, though, even when many partitions are allocated to a single server, high entropy means that each partition contains fewer search candidates.

Fig. 4. Basis vectors computed by principal-components analysis. Each input is a concatenation of three 128-dimensional feature vectors in successive fragments; frequency-selective components therefore replicate the same features three times over. Note broad frequency selectivity of components 1 through 3, and time dependence of components 4 (decay and attack) and 5 (beat). Higher order components show a spiky pattern of frequency selectivity, perhaps incorporating information from notes and harmonics. Since basis vectors are normalized to unit length, magnitudes are arbitrary and are not specified here.

Fig. 5. Variance accounted for as a function of number of principal components preserved. Eight components account for 46.5% of variance. From there, curve increases more gradually.
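The partition entropy defined above depends only on the partition sizes, so it can be checked with a few lines. (For orientation, the 8.45-bit figure reported later corresponds to 2^8.45, about 349, equally likely classes.)

    import math

    def partition_entropy(partition_sizes):
        """Entropy (in bits) of a database split into partitions of the given sizes."""
        total = float(sum(partition_sizes))
        return sum((n / total) * -math.log2(n / total)
                   for n in partition_sizes if n > 0)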

The tradeoff is that with high entropy comes a high proportion of incorrectly partitioned queries. In the limiting case, in which a query vector is identical to one of the vectors in the database, of course, the query will necessarily be allocated to the same partition as its best match. However, if the query vector lies some distance from its best match, there is some risk that the query and the best match will fall on different sides of a partition boundary. In such a case the best match will never be identified, because the partition that contains it will not be searched. The optimal partitioning scheme maximizes entropy with the constraint that the rate of query partitioning errors is low enough to produce good matches at a preponderance of time points. The balance between these two factors of high entropy and low partitioning error depends on the average distance between queries and their best matches: again, in the limiting case where this distance is 0, each vector in the database could be its own partition (and the entire problem could be handled using string-matching algorithms).

Within the selected partition we consider only the neighborhood N of vectors whose Euclidean distance from the query vector q does not exceed a threshold distance d_limit. The theme behind this computation is to do only as much work as is necessary to produce the desired number of best matches. In general, the algorithm operates by considering successively larger subspaces of the query space. We begin by considering the one-dimensional subspace defined by the first principal component. (Recall that the principal components are the basis in terms of which the query space is represented; projections onto the subspaces defined by these components therefore have zero computational cost, being implemented simply by discarding the higher order dimensions.) To facilitate distance computations within this one-dimensional subspace, the contents of the database partition are maintained in sorted order according to the value of the first principal component. The set N_0 of all vectors that lie within a distance ±d_limit of the query vector along this principal component can thus be computed with an efficient search. Since distance along the principal component is a lower bound for actual distance, N ⊆ N_0.

In order to exclude those vectors that are in N_0 but not in N, we organize the elements of N_0 into a heap keyed on lower bound distance from the query vector. A heap, as used in this case, is a binary tree in which the value of every child node is greater than or equal to the value of that child's parent node. The root node in such a tree is termed the "top" of the heap, and it necessarily contains the heap's minimal value. We consider this minimal value, summing into it the squared distances along the next Δn principal components. The addition of these extra dimensions tightens the lower bound on the vector's actual distance from the query. In the case in which this addition causes the value at the top of the heap to grow larger than the distances lower down in the heap, we swap values down the tree in order to reestablish the heap property.

We continue this process, on each iteration incrementing the number of dimensions in the top vector's distance estimate by Δn, and then reestablishing the heap property on the basis of this increased distance estimate. When a vector appears at the top of the heap for which all the dimensions in the query space have been considered, that is, for which the lower bound distance is identical to the actual distance, we identify that vector as the next match, and delete it from the heap. This incremental computation of distances allows us to perform only the computation that is necessary in order to identify the closest matches.

In the case of the eight-dimensional space around which the current implementation is built, Δn = 7, and thus all the higher dimensional distance computation is done at once. However, this incremental algorithm has improved performance in tests with higher dimensionality, and may prove useful if scaling of the system demands an increase in the number of basis vectors. Although the search algorithm without incremental distance computation is O[D |N_0| log(|N_0|)] for N_0 vectors in a space of D dimensions, in practice constant factors are such that this time bound is an improvement on O(D |N_0|) exhaustive search. The algorithm is specified formally in Fig. 7.

    Let D be the dimensionality of the query space;
    Let q = (q_0, ..., q_{D-1}) be the query vector within this space;
    Let k be the number of matches desired;
    for {v | (v_0 - q_0)² ≤ d²_limit}:
        dim(v) := 1;
        d²(v) := (v_0 - q_0)²;
    initialize N as a heap keyed on d²;
    MATCH := ∅;
    do |MATCH| ≠ k ∧ |N| ≠ 0 →
        let v be the vector at the top of heap N;
        if dim(v) = D →
            MATCH := MATCH ∪ {v};
            delete v from N
        □ dim(v) < D →
            d²(v) := d²(v) + Σ_{i=dim(v)}^{dim(v)+Δn-1} (v_i - q_i)²;
            dim(v) := dim(v) + Δn;
            reestablish N as a heap on d²
        fi
    od

Fig. 7. Algorithm for incremental search of query neighborhood.
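For readers who prefer an executable rendering, the following sketch mirrors the Fig. 7 listing using a binary heap from the standard library; the function and variable names are mine, and the candidate list is assumed to be pre-sorted on the first component so that the initial range scan is cheap.

    import heapq

    def incremental_search(query, candidates, d_limit, k, delta_n):
        """Incremental neighborhood search in the spirit of Fig. 7.

        query      -- sequence of D principal-component coordinates
        candidates -- list of database vectors of the same dimensionality
        d_limit    -- neighborhood radius along the first component
        k          -- number of nearest matches desired
        delta_n    -- extra dimensions added to the bound per refinement step
        """
        D = len(query)
        # N0: vectors within +/- d_limit of the query along the first component,
        # keyed on the (lower-bound) squared distance in that one dimension.
        heap = []
        for idx, v in enumerate(candidates):
            d0 = (v[0] - query[0]) ** 2
            if d0 <= d_limit ** 2:
                # entries are (lower-bound squared distance, dims used, index)
                heapq.heappush(heap, (d0, 1, idx))

        matches = []
        while len(matches) < k and heap:
            d2, dims, idx = heapq.heappop(heap)
            if dims == D:
                # lower bound equals the true distance: this is the next match
                matches.append((d2, idx))
            else:
                # tighten the bound by summing delta_n more squared components
                hi = min(dims + delta_n, D)
                v = candidates[idx]
                d2 += sum((v[i] - query[i]) ** 2 for i in range(dims, hi))
                heapq.heappush(heap, (d2, hi, idx))
        return matches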

1.4 Assembly

Using this procedure we obtain the closest 10 matches for each time step of the query recording. Each of these matching database entries is tagged with its source recording and its temporal offset within that source recording. Assembling the information from these multiple independent matches is a matter of detecting whether a preponderance of them originates from a single recording, with temporal offsets corresponding to their sequence within the query recording.

For each time step in the query, we first scale the results for each recording in the database that has produced more than a single match for this time step. (Singleton match sets are likely spurious—the results of partition errors and of collisions in the database—and are discarded.) If M is the number of matches originating from a particular matching recording, d_i is the distance between the query vector and the match vector for the ith match, and (d_min, d_max) is the interval covering all the d_i's, then the score for each match is defined as

\[
S = \frac{d_i + d_{\min} - 2\,d_{\max}}{\left(\sum_{i=0}^{M-1} d_i\right) + M\,d_{\min} - 2M\,d_{\max}}.
\]

This expression scales the best match to 1 and the worst to 1/2, with the intermediate scores being a linear function of the match distance. The resulting scores are pooled across time steps for each matching recording. These match scores over time can be visualized as a two-dimensional plot, with query time on one axis, match time on the other, and the intensity of each pixel corresponding to the score of the match against the query at the corresponding time point. A strong match thus appears as a bright diagonal line (Fig. 8) [16].

Fig. 8. Graphical representation of a successful match. Horizontal axis is temporal offset within a query recording; vertical axis is temporal offset within a candidate match in database. Intensity of each pixel (i, j) represents distance between vector representing ith sample of query recording and vector representing jth sample of database recording. Thus a perfect match appears as a bright diagonal line, matches against refrains appear as diagonal line segments offset from main line, and absence of a match appears as a dark field.

The top candidates in terms of number of time steps matched are considered further. Linear regression on the temporal offsets from each match, weighted by the match scores, establishes the slope of the regression line between query time and match time—in other words, the playback speed. In most cases this slope will be 1. However, it is possible for the playback speed to differ slightly from 1, for example, in the case of a broadcaster trying to pack as many tracks as possible into an hour of air time. In applications in which the playback speed is known to be 1, this regression step can be skipped.

Linear regression could also be applied to estimate the intercept, that is, the length of the recording's initial silence or the amount by which the recording has been truncated at its beginning. Such an estimate, however, would be perturbed by the repetition of themes and refrains, which produce short sequences of matches with very different temporal offsets (see Fig. 8). In order to avoid such perturbation, we estimate the intercept by sliding the regression line through the match plot and computing the total number of matches within a five-point window centered on the regression line. The intercept that maximizes this local match total is taken as the actual temporal offset, and the maximal total itself is the match score of the recording as a whole. All the recordings under consideration are then ranked in order of this match score.
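A NumPy sketch of this assembly step follows. The function name, the specific weighted least-squares formulation, and the choice of candidate intercepts (one per observed residual) are illustrative assumptions rather than details taken from the paper.

    import numpy as np

    def playback_slope_and_offset(query_t, match_t, weights, window=5):
        """Weighted regression of match time on query time gives the
        playback-speed slope; the intercept is then found by sliding the line
        through the match plot and counting matches within a window on it."""
        q = np.asarray(query_t, float)
        m = np.asarray(match_t, float)
        w = np.asarray(weights, float)

        # weighted least-squares slope of m against q (the playback speed)
        qm, mm = np.average(q, weights=w), np.average(m, weights=w)
        slope = np.sum(w * (q - qm) * (m - mm)) / np.sum(w * (q - qm) ** 2)

        # slide the intercept and count matches lying within +/- window/2
        residual = m - slope * q
        best_count, best_intercept = 0, 0.0
        for b in residual:                      # candidate intercepts
            count = int(np.sum(np.abs(residual - b) <= window / 2.0))
            if count > best_count:
                best_count, best_intercept = count, b
        return slope, best_intercept, best_count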
2 PERFORMANCE

For a preliminary evaluation of Tuneprint's performance on real-world data, a test corpus was created of randomly selected recordings that were publicly available as MPEG-1 layer 3 files on a peer-to-peer network. In cases in which listening tests indicated that one of these downloaded recordings was not in our database, or was a duplicate of a recording already in the test set, the recording in question was deleted from the test set. These deletions left a test set of 73 unique recordings. In all 73 cases, Tuneprint identified the test recording correctly as the database entry with the greatest match score.

The system used to conduct this test split 1000 database partitions between 14 server processes, two on each of seven physical servers. Each physical server had two 1.0-GHz Athlon processors and 2 Gbyte of RAM. Since each partition is a different size, a simple, greedy method was used to balance the total number of vectors assigned to each server process, considering each partition in turn from largest to smallest and assigning it to the server process with the smallest total number of vectors so far. This system computed matches for an entire recording against its database of size 100 000 in approximately 1.5 s of CPU time. This figure includes the client's time spent generating query vectors from the audio input, the front-end database server's time spent dispatching queries, and the back-end servers' time spent matching the queries.
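The greedy balancing just described is straightforward to express; a sketch (function and variable names are mine, not the paper's):

    import heapq

    def assign_partitions(partition_sizes, n_processes):
        """Largest-first greedy assignment of partitions to server processes:
        each partition goes to the process with the fewest vectors so far."""
        # min-heap of (vectors assigned so far, process index)
        load = [(0, p) for p in range(n_processes)]
        heapq.heapify(load)
        assignment = {p: [] for p in range(n_processes)}
        for pid, size in sorted(partition_sizes.items(),
                                key=lambda kv: kv[1], reverse=True):
            total, proc = heapq.heappop(load)
            assignment[proc].append(pid)
            heapq.heappush(load, (total + size, proc))
        return assignment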
The entropy of the partitioned database was 8.45, equivalent to a collection of 349 equally likely classifications. Fig. 9 shows the linear relationship, within this test database, between the distance of a query vector from its match target and the fraction of queries lost to partition errors. In our test system, about 25% of queries were lost to partition errors.
However, this high error rate is well tolerated by Tuneprint's system of multiple independent comparisons: as long as there are enough successful queries to raise the score of the matching database entry above those of nonmatching entries, the input will be correctly identified. An increase in partition error simply produces a corresponding increase in the length of input required for reliable identification. For example, 150 fragments at 50% loss will produce identifications as reliable as 100 fragments at 25% loss, since on average both scenarios yield the same number of successful queries, namely, 75. Although the relationship between match scores and identification rate has not yet been systematically explored, our experience is that the top match can be reliably detected if it exceeds the next closest match by a score of 8 or more. (Since matches at individual queries have values between 1/2 and 1, this threshold corresponds to matching at 8 to 16 individual fragments. Recall that the frames from which each fragment is produced are spaced at intervals of 92.8575 ms. Therefore this threshold corresponds to an input length of between about 0.75 and 1.5 s, neglecting the effect of temporal autocorrelation between frames. These figures match our experiences with inputs consisting of short clips, where identification becomes unreliable for input lengths below about 1 s.)

Fig. 9. Rate of query misclassification as a function of distance between query and target. Note linearity of relationship.

3 FURTHER WORK

The current Tuneprint system is a work in progress, several aspects of which deserve further research. Some of these issues are points that will become important as the system scales up beyond the current 100 000 recordings. Others are possibilities for improvement regardless of scale.

With large databases, the question of what constitutes an acceptable test set becomes increasingly important. Tuneprint is meant to be applied to recordings that have undergone lossy compression and other subtle distortions as are commonly found on file-sharing networks, and is designed to implement an identity criterion of equivalence to a human listener. Any validation of Tuneprint's output must therefore depend on a sampling of recordings representative of those found on file-sharing networks, and on the independent evaluations of human listeners which, for this purpose of validation, cannot be automated. This requirement for listening tests makes expansion of the test set very labor intensive and is the reason for the current test set's small size. More significant than the size of the test set, though, is the size of the database against which the test recordings are being matched. It is the database size that determines the system's liability to collisions, and hence to false positives and to match failures. Since the elements of the test set are randomly selected, each element constitutes an independent test of the database. Although the further development of Tuneprint would benefit from more detailed and larger scale testing than was possible during the period of Tuneprint's commercial funding, the present result with a database of size 100 000 is an indicator of the promise of Tuneprint's methods.

Partly because of the small size of Tuneprint's test set, a detailed, quantitative comparison with the performance of other systems remains an open question. Wang [17] independently implemented a temporal consistency measure similar to Tuneprint's, though without full psychoacoustic modeling at the input stage; the experimental performance of this system has not been reported. Using the vector quantization method of Allamanche et al. [9], Hellmuth et al. have implemented an advanced classifier [10] that makes use of the information inherent in the temporal sequence of feature vectors, though the exact nature of this method is left unspecified.

An alternative to vector quantization methods is robust hashing, in which feature vectors are discretized by simple thresholding of their components, and the search space for approximate matches is then constrained to only those database recordings of which one or more frames produced an exact match in this discretized domain [11]. Although the distribution of within-recordings hash errors in a database of size 10 000 using this method theoretically predicts a very low rate of false positives with a high rate of identification [12], the between-recordings error rate has yet to be assayed experimentally in a large database. An important feature of systems based on multiple independent comparisons is that the method of analysis within time points is separable from the method of evaluation for consistency across time points. This separability raises the possibility of plugging any within-points method into an alternative across-points method—for example, Tuneprint's psychoacoustically based model could be integrated with a robust hashing system.

Another key question regarding scaling is how the growth of the database may increase the likelihood of collision, that is, a situation in which two items in the database map so close together in Tuneprint's search space that discrimination between them is degraded or impossible. As Tuneprint matches first at the level of isolated fragments and then at the level of whole recordings, there are two senses in which collision can be considered. First, collisions might occur between individual fragments of different recordings. Such collisions would affect matching of the recording as a whole in the same way that partitioning error does. Tuneprint's redundant strategy of multiple independent matching makes it robust to collisions at the fragment level in the same way that it is robust to partitioning error—as long as there are significantly many successful matches, failed matches do not affect the identification [15]–[17]. Although in tests to date any small effect of such collisions has been swamped by the effect of partitioning error, a high rate of collisions could be expected to mimic the effect of partitioning error, increasing the length of input required for reliable identification. Second, one could imagine a collision involving very strong matching to more than one recording in the database. Although we have observed such occurrences during our tests, all have turned out to involve cases in which a recording loaded into the database from a peer-to-peer network had been mislabeled and was actually a copy of another matching recording already present in the database. We have never observed a true collision at the level of whole recordings, and this absence of collision may perhaps be expected given the length of a typical recording and the number of dimensions within which it can vary as it evolves through time.

Scaling certainly can be expected to figure into the tradeoff between entropy and partitioning error. It remains undetermined what degree of partitioning would be optimal even at the database's current size. The limit in which partitioning errors begin to affect accuracy has not yet been reached, and the current stopping point for the number of partitions is somewhat arbitrary. Assuming that the rate of partitioning errors remains a linear function of the distance between the query and its best match (Fig. 9), a database partitioning can be characterized by the slope of this partitioning-error function in combination with the entropy figure. The total expected CPU time per identification is the product of the expected number of queries (a function of the partitioning-error rate) and the expected CPU time per query (a function of entropy). It likely will be possible to develop an optimization procedure that applies these descriptive statistics to find the partitioning scheme that minimizes the expected CPU time per identification.

Perhaps the most compelling question regarding partitioning is that of how much information needs to be preserved within a single partition. Vector quantization yields an efficiency of time, arising from the restriction of the search space to a particular partition, and an efficiency of space, arising from the lossy coding of individual vectors in terms of the partition to which each maps. Currently Tuneprint takes advantage only of the temporal efficiency, applying vector quantization to select a particular database partition to search. The potential spatial efficiency is not realized, since the original, unquantized vectors are preserved for use in nearest-neighbor matching within the selected partition. A database with very high entropy might offer the potential for eliminating the query-to-match distance measure within a partition, and instead treating all elements of the partition as equally good matches. The winning match at the level of whole recordings could then be determined by consistency of matching as in the current model. Such an ability to discard the original vectors would offer large savings in space, by eliminating the bulk of the database, and in time, by eliminating the demand for nearest-neighbor matching in a high-dimensional space. We have implemented a prototype of such a system which compresses the entire database into 3 Gbyte, runs on a single CPU, and, when successful, identifies a recording in only 110 ms of processing time on average.

Variations on this high-entropy strategy also are possible. One optimization might involve overlapping partitions, that is, allowing database vectors that lie near partition boundaries to be included in more than one partition. Conversely, query vectors that lie near partition boundaries could be made to trigger a search of multiple partitions in the database. Either of these strategies would decrease the rate of partitioning error, at the expense of a modest increase in search time. One can also imagine a two-pass system in which distance measures are computed only for those match candidates that show consistent matching over time at the level of partitions.

Currently the query is sampled at constant intervals throughout its length. Higher confidence in matching likely can be achieved by varying this sampling period, both as a function of overall match confidence (more sampling of difficult-to-identify recordings) and as a function of the temporal derivative of the psychoacoustic function (more sampling in the time intervals surrounding abrupt changes).
Landmarking of the input recording is one way of implementing increased sampling at intervals of abrupt change, and it would be interesting to compare the performance of landmarking based on simple acoustic properties [17] with landmarking based on psychoacoustic properties. Although in general the psychoacoustic output varies slowly over time, the presence of two principal components that encode time-varying information from neighboring fragments is an indication that abrupt changes contain much useful information.

Although dimensionality reduction by principal components analysis is an expedient strategy, it may not be the best way to form a basis. In particular, the roles of the two time-varying components extracted by PCA have yet to be ascertained. In an environment in which queries may be offset from their best matches by up to half the length of a fragment, an inclusion of time-varying components may simply add noise. In addition, the construction of a space based on independent component analysis (ICA) [26] has not yet been explored, and should be.

Although Tuneprint has performed well in preliminary tests, the effects of distortion on Tuneprint's performance have not yet been explored systematically. Since our test set was selected at random from recordings publicly available on peer-to-peer networks, its audio quality was representative of this population. Our statistics indicate that only about 1% of these recordings are encoded at bit rates less than 128 kbits/s. In addition, our bandwidth of 253 to 12 500 Hz ranges four times as high as that available in telephone transmission. Testing and retuning for low-bit-rate encodings, for signals varying in playback speed, and for signals varying in bandwidth and other equalization properties were under way with positive early results, but unfortunately could not be completed within the period of Tuneprint's commercial funding. The outlook for low-bandwidth processing seems particularly positive since most of the amplitude of the basis vectors (see Fig. 4) is contained in the lower Bark frequencies.

It is worth noting that audio is not the only medium to which our general strategy of multiple independent matching of fragments might apply. Any medium that can be reduced to an array of features, either over time or over spatial dimensions, can be subjected to fragmentation over these dimensions. A fragment analysis of images, for instance, might compare features that are independent of translation, rotation, and scaling, and the corresponding assembly function might compute affine transformations between the query image and matching portions of the database images.

We have described a system for content-based identification of audio recordings, built around a strategy of multiple independent matching. Using a psychoacoustically based representation, this system matches inputs according to a criterion meant to model that of a human listener. Its identifications are robust to truncation, and work well with the lossy compression commonly found in recordings on peer-to-peer networks. In addition, the system is designed to be robust to modest variations in equalization and playback speed. For equalization in particular, we found that high-pass filtering of the output of the psychoacoustic transform (see Fig. 3) was very effective in maintaining performance. Successful operation on a database of 100 000 recordings bodes well for further scaling.

4 ACKNOWLEDGMENT

Profound thanks are due to our colleagues Daren Gill, Martin Stiaszny, Amittai Axelrod, Josh Pollack, Jennifer Chung, and Lex Nemzer, without whose many day and night hours the Tuneprint system could never have been implemented. In addition, we wish to acknowledge Sage Hill Partners, who funded the development of Tuneprint.

5 REFERENCES

[1] A. Pikrakis, S. Theodoridis, and D. Kamarotos, "Recognition of Isolated Musical Patterns in the Context of Greek Traditional Music," presented at the IEEE Int. Conf. on Electronics, Circuits, and Systems, Rhodes, Greece, 1996 Oct. 13–16.
[2] J. T. Foote, "Content-Based Retrieval of Music and Audio," in C. C. J. Kuo, Ed., Multimedia Storage and Archiving Systems II, Proc. SPIE, vol. 3229, pp. 138–147 (1997).
[3] A. Pikrakis, S. Theodoridis, and D. Kamarotos, "Recognition of Isolated Musical Patterns Using Discrete Observation Hidden Markov Models," presented at the European Signal Processing Conf., Rhodes, Greece, 1998 Sept. 8–11.
[4] D. Pye, "Content-Based Methods for the Management of Digital Music," in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (Istanbul, Turkey, 2000), vol. 4, pp. 2437–2440.
[5] C. Papaodysseus, G. Roussopoulos, D. Fragoulis, T. Panagopoulos, and C. Alexiou, "A New Approach to the Automatic Recognition of Musical Recordings," J. Audio Eng. Soc., vol. 49, pp. 23–35 (2001 Jan./Feb.).
[6] C. J. C. Burges, J. C. Platt, and S. Jana, "Distortion Discriminant Analysis for Audio Fingerprinting," IEEE Trans. Speech Audio Process., vol. 11, pp. 165–174 (2003).
[7] S. Sukittanon and L. E. Atlas, "Modulation Frequency Features for Audio Fingerprinting," presented at the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, Orlando, FL, 2002 May 13–17.
[8] E. Wold, T. Blum, D. Keislar, and J. Wheaton, "Content-Based Classification, Search, and Retrieval of Audio," IEEE Multimedia, vol. 3, pp. 27–36 (1996).
[9] E. Allamanche, J. Herre, O. Hellmuth, B. Fröba, and M. Cremer, "AudioID: Towards Content-Based Identification of Audio Material," presented at the 110th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 49, p. 542 (2001 June), convention paper 5380.
[10] O. Hellmuth, E. Allamanche, J. Herre, T. Kastner, M. Cremer, and W. Hirsch, "Advanced Audio Identification Using MPEG-7 Content Description," presented at the 111th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 49, pp. 1227–1228 (2001 Dec.), convention paper 5463.
[11] J. Haitsma, T. Kalker, and J. Oostveen, "Robust Audio Hashing for Content Identification," presented at the International Workshop on Content-Based Multimedia Indexing, Brescia, Italy, 2001 Sept. 19–21.
[12] J. Haitsma and T. Kalker, "A Highly Robust Audio Fingerprinting System," presented at the 3rd Int. Conf. on Music Information Retrieval, Paris, France, 2002 Oct. 13–17.
[13] R. J. McNab, L. A. Smith, I. H. Witten, C. L. Henderson, and S. J. Cunningham, "Towards the Digital Music Library: Tune Retrieval from Acoustic Input," in Proc. 1st ACM Int. Conf. on Digital Libraries (Bethesda, MD, 1996), pp. 11–18.
[14] J. J. Hopfield and C. D. Brody, "What Is a Moment? Transient Synchrony as a Collective Mechanism for Spatiotemporal Integration," Proc. Nat. Acad. Sci. USA, vol. 98, pp. 1282–1287 (2001).
[15] G. Schmidt, "Tuneprint Audio Fingerprinting Technology: Technical Overview Version 1.1," paper prepared in response to Recording Industry Association of America/IFPI request for information (2001 June 6).
[16] G. Schmidt, "Tuneprint Fingerprinting Technology," presented to the Recording Industry Association of America, Washington, DC (2001 Sept. 5).
[17] A. Wang, "System and Methods for Recognizing Sound and Music Signals in High Noise and Distortion," International Patent Publ. WO 02/11123 A2 (2002).
[18] E. Zwicker and H. Fastl, Psycho-acoustics: Facts and Models, 2nd ed. (Springer, Berlin, 1999).
[19] K. Brandenburg and G. Stoll, "ISO/MPEG-1 Audio: A Generic Standard for Coding of High-Quality Digital Audio," J. Audio Eng. Soc., vol. 42, pp. 780–792 (1994 Oct.).
[20] J. G. Beerends and J. A. Stemerdink, "A Perceptual Audio Quality Measure Based on a Psychoacoustic Sound Representation," J. Audio Eng. Soc., vol. 40, pp. 963–978 (1992 Dec.).
[21] ISO 389-7, "Acoustics—Reference Zero for the Calibration of Audiometric Equipment—Part 7: Reference Threshold of Hearing under Free-Field and Diffuse-Field Listening Conditions," International Organization for Standardization, Geneva, Switzerland (1996).
[22] B. C. J. Moore, R. W. Peters, and B. R. Glasberg, "Detection of Decrements and Increments in Sinusoids at High Overall Levels," J. Acoust. Soc. Am., vol. 99, pp. 3669–3677 (1996).
[23] I. T. Jolliffe, Principal Component Analysis, 2nd ed. (Springer, New York, 2002).
[24] Y. Linde, A. Buzo, and R. M. Gray, "An Algorithm for Vector Quantizer Design," IEEE Trans. Commun., vol. 28, pp. 84–95 (1980).
[25] S. P. Lloyd, "Least Squares Quantization in PCM," IEEE Trans. Inform. Theory, vol. 28, pp. 129–137 (1982).
[26] A. J. Bell and T. J. Sejnowski, "An Information-Maximization Approach to Blind Separation and Blind Deconvolution," Neural Comput., vol. 7, pp. 1004–1034 (1995).

THE AUTHORS

Geoff Schmidt was born in Branson, MO, in 1980. He left undergraduate studies at the Massachusetts Institute of Technology in Cambridge, MA, after one term to pursue entrepreneurial interests. Working alone, he developed the Tuneprint framework in mid-2000 and in 2001 secured venture capital funding from Sage Hill Partners in Cambridge to scale up and commercialize the system. Tuneprint Corporation ceased operations in 2002.

Mr. Schmidt's previous work on output-sensitive visible surface determination algorithms for 3-D rendering has won awards from the International Science and Engineering Fair, the U.S. Junior Science and Humanities Symposium, and other research competitions. He is now a commercial research consultant, currently performing machine vision research at Intellivid, a Cambridge startup company.

Mr. Schmidt's personal interests include meditation, teaching, and politics. He recently served as campaign manager for Matt DeBergalis's Cambridge City Council campaign, where in addition to day-to-day management he designed a successful direct mail effort. He plans to pursue a career in machine learning or programming language design. E-mail: gschmidt@mit.edu.

Matthew Belmonte studied English literature and computer science as an undergraduate, developing an interest in the processing of formal and natural languages. Moving from artificial to biological computing systems, he applied this computational interest to problems in cognitive neuroscience. He is the author of several papers on the neurophysiology of attention and perception in normal and autistic populations and on computational methods for statistical analysis of biophysical time series. He was a research scientist at Tuneprint Corporation, the architect of Tuneprint's psychoacoustic model, and a major contributor to the technical description of the Tuneprint system. He left the United States in 2002 and currently works in functional magnetic resonance imaging at the University of Cambridge Autism Research Centre in the UK. E-mail: belmonte@mit.edu.

ENGINEERING REPORTS

On the Detection of Melodic Pitch in a Percussive Background*

PREETI RAO, AES Member, AND SAURABH SHANDILYA

Department of Electrical Engineering, Indian Institute of Technology, Bombay, India
The extraction of pitch (or fundamental frequency) information from polyphonic audio
signals remains a challenging problem. The specific case of detecting the pitch of a melodic
instrument playing in a percussive background is presented. Time-domain pitch detection
algorithms based on a temporal autocorrelation model, including the Meddis–Hewitt algo-
rithm, are considered. The temporal and spectral characteristics of percussive interference
degrade the performance of the pitch detection algorithms to various extents. From an
experimental study of the pitch estimation errors obtained on a set of synthetic musical
signals, the effectiveness of the auditory-perception–based modules of the Meddis–Hewitt
pitch detection algorithm in improving the robustness of fundamental frequency tracking in
the presence of percussive interference is discussed.

0 INTRODUCTION

The problem of pitch (or fundamental frequency) extraction of periodic signals in the presence of interfering sounds and noise is an important problem in both speech and music applications. Apart from the value of pitch information per se, a knowledge of the time-varying fundamental frequency can be useful in the separation and reconstruction of a harmonic source from a sound mixture. A number of pitch detection algorithms (PDAs) have been proposed over the decades. But while each has had a measure of success in the targeted application, no single PDA is found suitable for all types of signals and conditions. This engineering report presents an investigation of the performance of some well-known PDAs in estimating the fundamental frequency of a melodic instrument playing in the presence of a percussive background. This is a restricted case of the larger problem of musical pitch detection in polyphony. Nevertheless it is an important problem. For instance, classical Indian vocal and instrumental music is always accompanied by percussive instruments providing the rhythmic structure. The melody itself is strongly characterized by the presence of microtones and continuous pitch variation. Detecting the melodic pitch contour has important applications in music recognition and for generating metadata in audio content retrieval systems.

*Manuscript received 2003 July 9; revised 2004 February 9. An early version of this work was presented at the 114th Convention of the Audio Engineering Society, Amsterdam, The Netherlands, 2003 March 22–25, under the title "Pitch Detection of the Singing Voice in Musical Audio."

Pitch determination of speech signals has been the subject of research for decades [1]. It is marked by challenges arising from the complex temporal and spectral structure of speech as well as its nonstationary nature. While musical applications require a higher accuracy of pitch estimation than speech applications, tracking the pitch of a melodic instrument or singing voice is easier than tracking that of speech signals due to the relatively slowly changing signal characteristics. The presence of interfering sounds from percussive accompaniment, however, would be expected to adversely affect the accuracy of pitch estimates of any given PDA for musical applications. Percussive sounds are characterized by rapidly varying temporal envelopes, mixed partials-plus-broad-band noise spectra, and low values of signal-to-interference ratio in localized time intervals. The peculiar problems in pitch detection posed by such interference form the main focus of the present study. In particular, we consider the degradation caused by the presence of inharmonic interfering partials. The robustness of pitch detection methods to additive broad-band noise has been studied in various contexts in the literature (see, for example, [2]). While the motivation for the present work is the pitch tracking of melodic instruments including the singing voice in the presence of percussion, we use in our experiments test signals from a set of MIDI instrument voices to enable controlled experiments focusing on the effects of percussion, with access to the "ground truth" pitch available.

In this study an important subclass of PDAs, namely, those based on the detection of periodicity in the time-domain signal by means of short-term correlation, is considered.
Autocorrelation-based pitch determination, used widely in speech analysis [1], has also been found suitable for the pitch tracking of monophonic musical signals [3]. The present study investigates the Meddis–Hewitt perceptual PDA as an example of a more sophisticated algorithm also based on the detection of periodicity via temporal autocorrelation.

The engineering report is organized as follows. Section 1 provides a brief overview of various PDAs, with an introduction to the PDAs chosen for the present study. The subsequent sections describe the implementation of the functional blocks of the PDAs and the evaluation of the PDAs by an experiment on the synthetic signal test set. The study concludes with a discussion of the observations targeted toward obtaining insights into the performance of the PDAs with respect to signal and interference characteristics.

1 PITCH DETECTION ALGORITHMS

Time-domain PDAs, the oldest pitch detection algorithms, are based on measuring the periodicity of the signal via the repetition rate of specific temporal features. Frequency-domain PDAs, on the other hand, are based on detecting the harmonic structure of the spectrum. Among the simpler time-domain PDAs is the popular autocorrelation function (ACF)–based PDA. The definition of the "biased" autocorrelation function is given by [1]

\[
\mathrm{ACF}(k,\tau) = \sum_{i=0}^{N-\tau-1} y(k+i)\,y(k+i+\tau) \qquad (1)
\]

where k and τ are the window position and the correlation lag, respectively, and y is the input signal.
respectively, and y is the input signal. ditory-model–based blocks before being subjected to pe-
For a pure tone, the ACF exhibits peaks at lags corre- riodicity extraction via the ACF. In particular, a combina-
sponding to the period and its integral multiples. The peak tion of linear and nonlinear filtering is applied, and the
in the ACF of Eq. (1) at the lag ␶ corresponding to the temporal periodicity information itself is computed via the
signal period will be higher than that at the lag values ACF separately in each frequency channel. It is of interest
corresponding to multiples of the period. For a musical to examine whether and to what extent these perceptually
tone consisting of the fundamental frequency component motivated enhancements improve the reliability of the
and several harmonics, one of the peaks due to each of the pitch estimation for harmonic musical signals with con-
higher harmonics occurs at the same lag position as that spicuous background interference. While much recent
corresponding to the fundamental, in addition to several work has investigated the ability of perceptual PDAs to
other integer multiples of the period (subharmonics) of predict subjectively perceived pitch in psychoacoustic ex-
each harmonic. Thus a large peak corresponding to the periments, the present work examines the robustness of the
sum contribution of all spectral components occurs at the PDA for estimating the signal fundamental frequency in
period of the fundamental (and higher integral multiples of the presence of interference. By means of carefully de-
the period of the fundamental). This property of the ACF signed test signals, we study the pitch estimation errors
makes it very suitable for the pitch tracking of monopho- obtained by the PDAs in the presence of percussive inter-
nic musical signals. The ACF PDA chooses as the pitch ference with respect to the underlying pitch of the melodic
period the lag corresponding to the highest peak within a voice, and later attempt to explain these observations.
range of lags.
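As a concrete illustration of this decision rule, a minimal NumPy sketch of the biased ACF of Eq. (1) and the highest-peak search might read as follows. It is written for this report rather than taken from the study; the function names are invented, and the 44.1-kHz sampling rate and 150–800-Hz search range anticipate the values quoted later in the text.

    import numpy as np

    def biased_acf(frame, max_lag):
        # Biased short-term ACF of Eq. (1) for one analysis frame of length N.
        n = len(frame)
        return np.array([np.dot(frame[:n - tau], frame[tau:]) for tau in range(max_lag + 1)])

    def acf_pitch(frame, fs=44100, fmin=150.0, fmax=800.0):
        # Traditional ACF PDA: pick the lag of the highest ACF peak in the
        # expected fundamental-frequency range and convert it to Hz.
        lag_min = int(round(fs / fmax))      # shortest candidate period
        lag_max = int(round(fs / fmin))      # longest candidate period
        acf = biased_acf(frame, lag_max)
        best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
        return fs / best_lag

Applied frame by frame (for example, to 40-ms frames hopped by 20 ms), such a routine produces the raw pitch track that the enhancements discussed below attempt to make more robust.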
In contrast to the simplicity of the ACF pitch detector are more recent PDAs, also based on autocorrelation, but derived more closely from the mechanism of temporal coding in the human auditory system. These PDAs can in fact be viewed in terms of preprocessing of the signal followed by autocorrelation-based detection. There are a number of variants in this class of PDAs, but they all share some important characteristics [4]. They decompose the signal into frequency bands defined by the auditory filters of the cochlea. Next, nonlinear processing corresponding to hair-cell transduction is applied and the temporal periodicity detected separately in each frequency channel by means of autocorrelation. Finally the across-channel information is combined to produce a single pitch estimate. The recent pitch perception model of Meddis and Hewitt [5] has gained much prominence due to its demonstrated ability to predict the results of certain crucial pitch perception experiments. The Meddis–Hewitt PDA is based on the functional modules of the auditory periphery with added processing stages that emulate auditory processing, which is considered to be more central. The various stages of the PDA are a bandpass filter representing the transfer function of the outer ear and middle ear canal, a bank of filters modeling the basilar membrane response, followed by a model of the inner hair cell applied to each filter channel output to simulate neural transduction, obtaining a series of firing probabilities. Next an ACF periodicity detector is applied to each of the hair-cell model outputs. Finally a summary autocorrelation function (SACF) is formed by adding the ACFs so obtained across the frequency channels. A search for the highest peak in the relevant range of SACF lags provides an estimate of the pitch period.
The added presence of noise and inharmonic partials due to an interfering signal perturbs the shape and location of the peaks contributed by the signal harmonics in the ACF. Thus the traditional ACF pitch detector applied to a musical signal with percussive accompaniment would be expected to be adversely affected by the presence of noise and inharmonic frequency components contributed by the percussion. On the other hand, in the case of perception-based PDAs, the signal is processed by a number of auditory-model–based blocks before being subjected to periodicity extraction via the ACF. In particular, a combination of linear and nonlinear filtering is applied, and the temporal periodicity information itself is computed via the ACF separately in each frequency channel. It is of interest to examine whether and to what extent these perceptually motivated enhancements improve the reliability of the pitch estimation for harmonic musical signals with conspicuous background interference. While much recent work has investigated the ability of perceptual PDAs to predict subjectively perceived pitch in psychoacoustic experiments, the present work examines the robustness of the PDA for estimating the signal fundamental frequency in the presence of interference. By means of carefully designed test signals, we study the pitch estimation errors obtained by the PDAs in the presence of percussive interference with respect to the underlying pitch of the melodic voice, and later attempt to explain these observations.

2 IMPLEMENTATION OF PDA FUNCTIONAL BLOCKS

Fig. 1 provides a modular structure for the PDA based on the Meddis–Hewitt pitch perception model [5]. The individual blocks represent various stages of the algorithm, each of which may, in principle, be implemented in multiple ways.
Block 1 represents the outer ear and middle ear (OEM) prefiltering, with the magnitude response shown in Fig. 2.
Essentially a bandpass filter with a resonance frequency near 3 kHz, this block has been implemented by the cascade of an eighth-order low-pass IIR filter and a second-order parameterized high-pass filter with a high-pass filter parameter value of 0.94 [6]. The magnitude response is similar to the inverted absolute threshold-of-hearing curve, whereas below 1 kHz it is approximated by the inverted equal-loudness contour for high loudness levels.
Block 2 represents the cochlear filter bank with filter center frequencies that are equally spaced on the ERB (equivalent rectangular bandwidth) scale, with the bandwidth increasing with the center frequency. A filter bank of 27 ninth-order gammatone filters, of bandwidth 1 ERB each, is based on the corresponding function of the HUTear library [7]. These filters have center frequencies ranging from 123 Hz to 5.636 kHz, or 4 to 30 on the ERB scale. The output of each filter simulates the pattern of vibration at a particular location on the basilar membrane.
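The ERB-rate spacing of the channel center frequencies can be illustrated with the widely used Glasberg–Moore approximation; the constants inside the HUTear gammatone routines may differ slightly, so the sketch below is indicative only.

    import numpy as np

    def erb_rate(f_hz):
        # Approximate ERB-rate (in ERB units) of a frequency in Hz.
        return 21.4 * np.log10(4.37e-3 * f_hz + 1.0)

    def erb_rate_to_hz(erb):
        # Inverse mapping from ERB-rate back to frequency in Hz.
        return (10.0 ** (erb / 21.4) - 1.0) / 4.37e-3

    # 27 center frequencies equally spaced between 4 and 30 on the ERB-rate
    # scale, spanning roughly 123 Hz to 5.6 kHz as quoted in the text.
    center_freqs_hz = erb_rate_to_hz(np.linspace(4.0, 30.0, 27))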
Next the conversion of this mechanical activity to the neural spike generation events of the auditory nerve is simulated by block 3. The implementation of this module can range from a full model of the hair cell derived from a computational analysis of actual hair-cell and auditory-nerve processes [5] to simple half-wave rectification followed by low-pass filtering [8]. In 1986 Meddis proposed a model for the hair cell which simulates, through difference equations rich in parameters, several properties of the neural transduction and auditory nerve firing [9]. Important characteristics of the hair-cell model are its nonlinearity and frequency selectivity. The present implementation is based on the hair-cell model of Meddis and coworkers [9], [10] as implemented in the auditory model library of Slaney [11].
Block 4 calculates the ACF [Eq. (1)] of the signal input to this block. Keeping in mind the range of 150–800 Hz for the expected fundamental frequency, we have used a 40-ms window with 50% overlap (frame space 20 ms) for the computation of the ACF. Once the ACFs are obtained for all the channels [by implementing Eq. (1) on the hair-cell model output of each channel], block 5 performs the task of combining them. Combining can occur in the form of either simple or weighted addition.

Fig. 1. Block diagram of functional blocks of Meddis–Hewitt PDA with postprocessing added.

Fig. 2. Magnitude spectrum of outer-ear–middle-ear (OEM) filter.

We use simple summing. The combined ACFs are known as the summary ACF (SACF). In block 6 the SACF is searched for the highest peak within a prespecified range (corresponding to the expected fundamental frequency range of 150–800 Hz). The lag value corresponding to the highest peak is accepted as the estimated pitch period. Block 7 is a postprocessing block, which smooths out local variations in the pitch estimates across frames using a simple three-point median filter. The combination of all seven blocks constitutes the Meddis–Hewitt PDA with added postprocessing. In the next section we describe a procedure for evaluating the contribution of the various functional blocks of the Meddis–Hewitt PDA to improving robustness over direct ACF peak-based pitch detection.
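For concreteness, blocks 5 to 7 can be sketched as follows. This is an illustrative rendering under the stated choices of simple summation and a three-point median, not the authors' implementation; channel_acfs is assumed to hold the per-channel ACFs of one 40-ms frame.

    import numpy as np

    def sacf_pitch(channel_acfs, fs=44100, fmin=150.0, fmax=800.0):
        # Blocks 5 and 6: sum the per-channel ACFs into an SACF and pick the
        # lag of its highest peak within the expected pitch range.
        sacf = np.sum(channel_acfs, axis=0)      # simple (unweighted) summation
        lag_min, lag_max = int(round(fs / fmax)), int(round(fs / fmin))
        best_lag = lag_min + int(np.argmax(sacf[lag_min:lag_max + 1]))
        return fs / best_lag

    def median_postfilter(pitch_track):
        # Block 7: three-point median filtering of the frame-level estimates.
        smoothed = np.array(pitch_track, dtype=float)
        for n in range(1, len(smoothed) - 1):
            smoothed[n] = np.median(pitch_track[n - 1:n + 2])
        return smoothed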
3 EXPERIMENTAL EVALUATION

In order to investigate the performance of the pitch detection algorithms for the pitch estimation of a harmonic signal in a percussive background, a set of test signals was designed from available MIDI songs. Apart from the access to the “ground truth” pitch, the use of MIDI files provides great flexibility by allowing the inclusion or elimination of individual monophonic instrument channels, modifying the relative strengths of the component sounds, pitch transformations, and choice of the instrument playing the melody line as well as the percussive instrument.

3.1 Test Signal Set

A MIDI song of length 8 seconds was selected. It had a single harmonic instrument (alto sax) playing the melody accompanied by several percussive (nonpitched) voices, namely, hi hat, kick drum, and low agogo channels. The pitch range of the melody was 350–620 Hz and consisted of four similar phrases, with each phrase comprising five notes of various durations. [The pitch contour of the melody can be seen as the solid line in Fig. 5(b).] Further, in order to create a number of different test conditions, pitch-shifted versions of the melody were created as follows: high (up by 4 semitones to the range of 440–787 Hz) or low (down by 12 semitones to the range of 174.5–311 Hz). It may be remarked that the pitch transformations are achieved via “instrumental” pitch shifting, which implies that the relative amplitudes of the harmonics remain unchanged across fundamental frequency changes, in contrast to formant-corrected pitch shifting.
The set of percussive instruments represents a range of signal characteristics, as illustrated in the spectrograms of Fig. 3. The kick drum is a relatively fast decaying signal with predominantly low-frequency content. The hi hat is characterized by a slow time decay and a broadly spread spectral mixture of moderately strong partials and noise. Low agogo has low-noise content and strong partials all the way from 1.1 to 10 kHz, with a moderate rate of decay.
To obtain a variety of combinations of target and interference timbres, the song was transformed by changing the target instrument and then selecting only one of the interfering (percussive) instruments at a time. The selected target instrument voices were of different timbres, as shown by the magnitude spectra of a fixed note in Fig. 4. For the middle- and high-frequency ranges, baritone sax (prominent high harmonics), flute (weak high harmonics), and oboe (harmonics spread in frequency) were used. For the low-frequency range the flute was replaced by alto sax to incorporate a more natural sound.
The JAZZ MIDI sequencer, available as shareware,¹ was used to achieve the needed transformations. The melody line was recorded, switching off all the other channels, in each of the three pitch ranges with each of three target instruments. Thus a set of nine files containing pure melody was obtained. For each of these melody files we created three “corrupted” versions, each with only a single percussion channel turned on.

¹ www.jazzware.com.

Fig. 3. Spectrogram illustrating time–frequency behavior of three percussion instruments.

The synchronization between the melody and each of the percussion tracks was such that a minimum number of percussion strikes fell in the silence region between target instrument notes. This led to percussion onsets being located at a variety of positions with respect to target note onset, steady state, and decay. Then relative amplitudes of the target and interfering signals were set so that the ratio of signal power to interference power (each of the powers being computed as the corresponding average over the nonsilent regions of the musical piece) remained at a fixed predefined value (equal to 2.0) for a set of test signals across various target instruments, percussive instruments, and pitch ranges. The signal-to-interference ratio, however, is only an average value with local values deviating greatly, depending on the position of the percussion strike with respect to the target instrument note onset. We thus obtained a total of 27 test signals, all sampled at 44.1 kHz.
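The level adjustment described above can be sketched as follows. The sketch is illustrative only; in particular, detecting the nonsilent regions by a simple amplitude threshold is an assumption made here, not a detail given in the text.

    import numpy as np

    def mix_at_sir(target, interference, sir=2.0, silence_threshold=1e-3):
        # Scale the percussion track so that the average target power divided
        # by the average interference power, both taken over nonsilent samples
        # only, equals the requested SIR; then return the corrupted mixture.
        p_target = np.mean(target[np.abs(target) > silence_threshold] ** 2)
        p_interf = np.mean(interference[np.abs(interference) > silence_threshold] ** 2)
        gain = np.sqrt(p_target / (sir * p_interf))
        return target + gain * interference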
3.2 Experiment

The PDAs were run on the pure test signals, with the same postprocessing applied to each PDA estimate to ensure a fair comparison.
Fig. 5 shows a sample of the experimental results obtained. The selected test signal is the combination of baritone sax and low agogo percussion. Fig. 5(a) shows the spectrogram of the test signal. The relatively continuous dark lines correspond to the harmonic partials of the baritone sax, and the very short dark segments that occur during the first (2 strikes), third, and fourth notes of each phrase correspond to the partials of the low agogo. Fig. 5(b) compares the true pitch track with the pitch track estimated by the traditional ACF pitch detector followed by a three-point median filter. We see that there are large pitch estimation errors, which coincide with the occurrence of the percussion and last over several frames. The simple three-point median postfilter only corrects isolated pitch errors. The instances of percussion where pitch errors do not occur seem to be characterized by overlapping partials between target and percussion. Fig. 5(c) illustrates the pitch contour obtained by using the Meddis–Hewitt PDA, also followed by the postfilter, on the same test signal. Indeed the performance has improved on using this algorithm.
A controlled-parameter experiment was carried out to study the behavior of the PDAs on the underlying signal characteristics, and to obtain an understanding of the role of each of the functional modules of the Meddis–Hewitt algorithm in influencing the pitch detection. Specifically, four PDAs were evaluated on the test data set. All four algorithms are derived from the generic block diagram of Fig. 1 by choosing different combinations of subblocks and/or different realizations of a specific subblock. The postprocessor of block 7 is a three-point median filter that is included in all four algorithms. The details of the four PDAs are as follows.
1) AC1: This algorithm incorporates blocks 4, 6, and 7 only. It corresponds to the traditional ACF pitch detector. Block 4, that is, the ACF calculation block, uses a rectangular window and the biased ACF computation of Eq. (1).
2) AC2: This algorithm incorporates block 1 with blocks 4, 6, and 7. This again is a traditional ACF pitch detector, but with outer ear/middle ear filtering included as a preprocessing function.
3) AC3: This algorithm comprises blocks 1, 3, 4, 6, and 7. This algorithm is an extension of AC2, where the neural transduction block (based on the hair-cell model of [9]) has been introduced. Unlike the Meddis–Hewitt PDA, the signal is not decomposed into separate frequency channels. Rather, the hair-cell nonlinearity is applied to the full band signal followed by ACF pitch detection to result in a single estimate of temporal periodicity per frame.

Fig. 4. Magnitude spectra of melodic instruments used in experimental study.

4) MH1: All blocks 1 to 7 are included, making this a complete implementation of the Meddis–Hewitt algorithm with postprocessing added. For the ACF, a Hamming-windowed biased ACF is computed for each channel using an efficient fast Fourier transform implementation.

3.3 Observations

The bar charts of Figs. 6 to 9 display the results of the experiment in terms of a count of the pitch errors with respect to the known reference pitch contour, arranged by the PDA configuration used. A pitch estimate is obtained for every analysis frame only in regions where the target instrument is playing. At a frame spacing of 20 ms, this comprises a total of 285 frames. A pitch error is defined to occur whenever the detected pitch deviates from the reference pitch by 3% or more (about half a semitone) of the reference pitch frequency. Of the detected pitch errors, those of magnitude less than 6% are labeled “fine” errors, whereas those of higher magnitude are labeled “gross.” The gross errors are found typically to be pitch octave errors. It may be noted that in the absence of percussion, no pitch errors were observed in any of the PDA configurations.
From an inspection of the bar charts we note that the extent of pitch errors depends not only on the PDA but also on the percussion instrument, the pitch range of the target instrument, and the target instrument itself. Of the latter three factors, the most marked is the dependence of the PDA performance on the percussion characteristics. The ACF pitch detector AC1 makes a large number of errors for all three percussions. The target instrument pitch range that is most affected is seen to depend on the spectral characteristics of the interference. The kick drum with its low-frequency support affects the lowest pitch range the most, whereas the low agogo with its broad spectral mixture of partials and noise impacts all three pitch ranges. The introduction of OEM filtering in AC2 has the effect of an overall lowering of the extent of pitch errors. A strong exception to this happens to be the hi hat in the high-pitch range (note the changed scale of the error axis). The hair-cell model followed by autocorrelation (AC3) serves to reduce all errors further, with the only significant errors remaining in the low agogo signals. Finally, the full Meddis–Hewitt PDA MH1 reduces the errors in the low agogo signals of the low- and middle-pitch ranges, but worsens the performance slightly in the high-pitch range.
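For reference, the error bookkeeping described above amounts to a small helper of the following kind (a hypothetical utility written for this report, not code from the study).

    def classify_frame(estimated_hz, reference_hz):
        # Label one frame as correct, a "fine" error (3-6%), or a "gross" error (>6%).
        deviation = abs(estimated_hz - reference_hz) / reference_hz
        if deviation < 0.03:                 # within about half a semitone
            return "correct"
        return "fine" if deviation < 0.06 else "gross"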

Fig. 5. Pitch estimates for test signal “middle pitch, baritone sax with low agogo.” (a) Input signal spectrogram (prominent low-
frequency partials of percussion encircled). (b) Actual pitch (——) and pitch estimated from AC1 PDA (- - -). (c) Actual pitch (——)
and pitch estimated from MH1 PDA (- - -).

4 DISCUSSION

In each of the PDA configurations used in the experiment, the final pitch estimate (prior to postprocessing) is obtained by searching the ACF (or SACF) for the lag value corresponding to the highest peak within the expected range of lags. The ACF is computed either directly on the input signal or after nonlinear processing by the hair-cell model. In the case of the Meddis–Hewitt PDA, the hair-cell nonlinearity followed by the ACF is computed separately in frequency bands as determined by the gammatone filter bank, and then combined linearly to obtain the SACF. To understand the behavior of ACF peak-based pitch detection better it is useful to think of the ACF of a signal comprising several components (harmonic and inharmonic) as the inverse Fourier transform of the power spectrum of the signal [1]. The signal power spectrum is insensitive to the relative phases of the components, to the extent that the window is long enough that there is no significant leakage of the frequency components. Due to the linearity of the Fourier transform, the ACF of the signal is the summation of the ACFs of the individual components in the signal power spectrum, and is therefore insensitive to the phase relations between components. Based on this interpretation of the ACF, the observations of the previous section are discussed and justified via simulations using simplified implementations of channel separation (ideal bandpass filters) and hair-cell nonlinearity (a half-wave rectifier followed by a low-pass filter given by first-order Butterworth with 1-kHz cutoff frequency).
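A simulation of this kind is straightforward to reproduce. The sketch below, an approximation written for illustration rather than the authors' exact code, builds the Fig. 10 test case (a 600-Hz target with harmonic amplitudes 10, 18, 14, and 12 plus a 3300-Hz interfering partial of amplitude 16), applies the simplified hair-cell stage, and computes the ACFs that underlie the discussion which follows.

    import numpy as np
    from scipy.signal import butter, lfilter

    fs = 44100
    t = np.arange(int(0.04 * fs)) / fs                     # one 40-ms frame

    # 600-Hz target with four harmonics of amplitudes 10, 18, 14, 12 (Fig. 10)
    target = sum(amp * np.sin(2 * np.pi * 600.0 * (k + 1) * t)
                 for k, amp in enumerate([10.0, 18.0, 14.0, 12.0]))
    interference = 16.0 * np.sin(2 * np.pi * 3300.0 * t)   # inharmonic partial, SIR about 3
    noisy = target + interference

    # Simplified hair-cell stage: half-wave rectifier, then first-order 1-kHz Butterworth
    b, a = butter(1, 1000.0 / (fs / 2.0), btype="low")
    processed = lfilter(b, a, np.maximum(noisy, 0.0))

    def acf(x):
        # Biased ACF over nonnegative lags, as in Eq. (1).
        return np.correlate(x, x, mode="full")[len(x) - 1:]

    acf_noisy = acf(noisy)          # cf. Fig. 10(b)
    acf_processed = acf(processed)  # cf. Fig. 10(c)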

Fig. 6. Error performance of AC1 PDA for various target instruments and pitch ranges with percussion instrument in background. (a)
Kick drum. (b) Hi hat. (c) Low agogo.

For the pure target signals, each of which contains a prominent peaks (at pitch and pitch multiples) but affects
number of harmonics, including the fundamental, the win- only their relative amplitudes. As a result, the “choose the
dowed ACF of the input signal computed according to Eq. highest peak in the ACF” approach typically results in
(1) shows peaks at lags corresponding to the pitch period either a fine error due to a misshapen pitch peak or a gross
and multiples of the pitch period. The highest peak corre- error in the form of a pitch octave error. Fig. 10(a) shows
sponds to the pitch period, and there is no error in the the ACF of a periodic signal of fundamental frequency
estimated pitch. On the other hand, when the input signal 600 Hz with the first four harmonics of amplitudes 10, 18,
to the ACF contains noise or interfering partials, there is a 14, and 12. At the sampling rate of 44.1 kHz, the signal
perturbation of the peak corresponding to the correct pitch pitch period is 73.5 samples. A single interference partial
period. The ACF of the interference partial (which can be of fundamental frequency 3300 Hz and amplitude 16 (cor-
considered to combine additively with the target ACF to responding to signal-to-interference power ratio [SIR] 3.0)
form the corrupted ACF) modifies the values of the origi- is added to the signal, resulting in the ACF of the noisy
nal ACF at all lags, thus modifying amplitudes at all lags signal shown in Fig. 10(b). We see that the likelihood of
to some extent. Unless the interference partial is very an octave error in the ACF of the noisy signal is highest
strong, this is not sufficient to change the locations of the when, as depicted in Fig. 10, a valley of the ACF of the

Fig. 7. Error performance of AC2 PDA for various target instruments and pitch ranges with percussion instrument in background. (a)
Kick drum. (b) Hi hat. (c) Low agogo.

noise partial coincides with the signal pitch peak and the the case of the hi hat and, to some extent, on the low
peak of the noise ACF coincides with a target pitch mul- agogo. This can be explained by the low-frequency nature
tiple. This is true whenever an interference partial occurs of the spectrum of the kick drum, which consequently is
at or near an odd multiple of half the target fundamental heavily attenuated by the OEM filter. The hi hat and low
frequency. Likewise, we expect no pitch errors when the agogo on the other hand have brighter spectra with much
interference partial is near a multiple of the target funda- middle frequency content that remains after the OEM fil-
mental frequency. It is easy to see from Fig. 10 that the ter. The target spectrum, because of its preponderant lower
likelihood of pitch octave error would increase as the am- harmonics, suffers greater overall attenuation than the bright
plitude of the noise partial increases relative to the target spectra percussions. The unusually sharp rise in pitch errors
signal strength. This explains why the introduction of a in the high-pitch target range with hi-hat interference was
linear filter such as the OEM filter affecting the relative found to be due to the chance occurrence of an interference
amplitudes of the signal and noise partials leads to a partial at an odd multiple of half the fundamental frequency
change in the error profiles, as seen in Fig. 7. The intro- of a note of recurring pitch throughout the song. This partial
duction of outer ear–middle ear filtering reduces the errors fell near the resonance frequency (3 kHz) of the OEM filter
in the case of the kick drum, but has the contrary effect in and was a prominent spectral component in the filtered sig-

Fig. 8. Error performance of AC3 PDA for various target instruments and pitch ranges with percussion instrument in background. (a)
Kick drum. (b) Hi hat. (c) Low agogo.

nal. It resulted in an octave error in the ACF pitch estimate leads to a bias favoring lower pitch lags, as seen in Fig.
almost throughout the duration of the note in question. 10(c), where we also observe the attenuation of the high-
Introducing the hair-cell model prior to ACF computa- frequency partial. Such effects contribute to the overall
tion is equivalent to a nonlinear processing of the signal improvement in the performance demonstrated by the
that, among other effects, gives rise to new frequency AC3 PDA in Fig. 8. In particular, a more robust pitch
components located at sum and difference frequencies of estimator is obtained in the case of the interference partial
the original components. In the case of a weak or missing at a high odd multiple of half the target fundamental fre-
fundamental, the creation of distortion components con- quency. The Meddis–Hewitt PDA is an enhancement of
tributes to the enhancement of the fundamental frequency the AC3 algorithm in that a cochlear filter bank is in-
component [1]. In addition the hair-cell model introduces cluded. The ACF is computed separately in each fre-
a dc bias and a low-pass frequency selectivity [9]. The quency channel, and summed across channels to obtain the
presence of interference partials at nonharmonic locations pitch estimate as the largest peak lag in the search range.
gives rise to nonharmonic distortion components, whose The frequency decomposition affected by the filter bank
magnitudes depend on the magnitudes and phases of the limits the number of interacting partials through the hair-
interacting components (both signal and interference). Due cell model nonlinearity applied separately to each channel.
to this the distortion components affect the peak at pitch Fig. 11, obtained for the same signal and interference as
lag in the ACF in different ways. One consistent effect is Fig. 10, illustrates the effect of this on the SACF. Shown,
the dc level introduced by the hair-cell processing that for two different channel configurations (of four channels

Fig. 9. Error performance of the MH1 PDA for various target instruments and pitch ranges with percussion instrument in background.
(a) Kick drum. (b) Hi hat. (c) Low agogo.

simulated by ideal bandpass filters), are the signal and of gross errors for all three percussions, depending on the
interference frequency components at the output of the frequency relation between target and interference partials.
channel nonlinearities as well as the corresponding SACF. The low-pitch-range errors are the most pronounced in the
In Fig. 11(a) the interference partial is in a separate chan- case of the kick drum due to a strong low-frequency partial
nel by itself. This eliminates any distortion components from this percussion. Fig. 12 illustrates this effect on ACF
created due to an interaction of target harmonics and in- peak-based pitch estimation by simulating the kick drum by
terference. On the other hand, the co-occurrence of several a strong interference tone at 68 Hz. Shown in Fig. 12 are
higher harmonics of the target in a single channel strength- ACFs for signals of fundamental frequency 600 Hz and 200
ens the fundamental frequency component in the SACF. Hz with the same harmonic amplitudes as the signal of Fig.
These two effects lead to an improved ACF peak at the 10. Both clean signals yield accurate pitch peaks in the SACF
signal pitch period of 73 samples. This explains the im- (at lags of 73 samples and 220 samples, respectively). How-
proved performance of the Meddis–Hewitt PDA for the ever, the addition of the 68-Hz low-frequency interference
low agogo samples for the low- and middle-pitch ranges. tone (with amplitude 26, corresponding to SIR 1.1) intro-
In the high-pitch range, however, it was observed that due duces a low-lag bias in the overall ACF in both cases. This
to the higher interharmonic spacing, several of the chan- leads to a gross pitch error (pitch submultiple selected) in
nels contained only a single harmonic of the target instru- the case of the lower fundamental frequency signal since
ment accompanied by interference components. This con- its pitch period is comparable to that of the interference.
dition is depicted and simulated by the configuration of
Fig. 11(b), where the signal partials occupy different chan- 5 CONCLUSIONS
nels and the noise partial shares a channel with a target In this engineering report an experimental investigation
harmonic. The last channel gives rise to inharmonic dis- is presented of the performance of pitch-detection algo-
tortion components, one of which is visible in the figure. rithms based on temporal autocorrelation for the pitch
Together with the reduced contribution to the fundamental tracking of a melodic signal with percussive accompani-
frequency due to the absence of unresolved harmonics, ment characterized by inharmonic partials. The perfor-
this leads to a degradation of the pitch estimate. mance of the autocorrelation pitch detector as well as its
Finally we return to the AC1 PDA and explain the low- enhancements based on the Meddis–Hewitt auditory
frequency errors due to the kick drum in Fig. 6. The au- model are studied experimentally on synthetic musical sig-
tocorrelation method of AC1 leads to a significant number nals. The ACF peak-based pitch detector incurs pitch es-

Fig. 10. ACF plotted as a function of lag. (a) Signal of fundamental frequency 600 Hz (——) and noise tone of frequency 3300 Hz
(- - -). (b) Noisy signal. (c) Nonlinearly processed noisy signal.

Fig. 11. SACF and spectral components of noisy signal after channel filtering and nonlinear processing corresponding to different
four-channel groupings of signal harmonics and noise. (a) SACF for /h1/h2/h3+h4/n/ and corresponding power spectrum. (b) SACF
for /h1/h2/h3/h4+n/ and corresponding power spectrum.

Fig. 12. ACF plotted as a function of lag for signal and interference tone of 68 Hz. – · – · – ACF of signal; - - - ACF of interference; —— ACF of noisy signal. (a) Signal fundamental frequency 600 Hz. (b) Signal fundamental frequency 200 Hz.

timation errors when interference partials co-occur with signal harmonics. The following enhancements to the basic ACF PDA improve the robustness of pitch extraction in the presence of percussive interference in the form of inharmonic tones. Outer-ear–middle-ear prefiltering, cochlear bandpass filtering, and hair-cell nonlinear processing represent a combination of linear and nonlinear preprocessing of the signal before computing the ACF to estimate pitch periodicity. The sum of channel autocorrelations (SACF) would simply be proportional to the autocorrelation of the input signal to the cochlear filter bank if it were not for the hair-cell nonlinearity [12]. Of significance then is the combined role of channel separation and hair-cell nonlinearity. The experimental results provide important insights into the nature of the pitch errors and their dependence on the relative frequencies of the signal and interference. The noise sensitivity of ACF peak-based pitch detection is highest when interfering partials fall exactly between signal harmonics. The hair-cell nonlinearity serves to increase the accuracy of pitch detection primarily via an increased bias of lower lag peaks in the ACF and the attenuation of high-frequency partials. The separate processing of frequency channels by the introduction of the cochlear filter bank is crucial in reducing the distortion components from the interaction of signal harmonics and interference partials while reinforcing the contribution to the fundamental frequency component from unresolved higher harmonics of the signal.
It would be interesting to explore the perceptual implications of the preceding observations. That is, since the observations on the accuracy of the Meddis–Hewitt PDA in the current context are explained by the signal-processing algorithms used, it is relevant to wonder whether these specific predictions of the model hold for subjective pitch perception. From a practical viewpoint, the results of this study may be applied to construct pitch detectors for musical signals that are robust to the presence of nonpitched percussion. It is interesting to consider tuning the several available parameters of the hair-cell model nonlinearity to increase its effectiveness for known signal and interference characteristics, including possibly the presence of broad-band noise. Alternative means of obtaining the SACF, such as summing with channel weighting, provide further promising directions for future work.

6 REFERENCES

[1] W. Hess, Pitch Determination of Speech Signals (Springer, New York, 1983).
[2] J. D. Wise, J. R. Capiro, and T. W. Parks, “Maximum Likelihood Pitch Estimation,” IEEE Trans. Acoustics, Speech, Signal Process., vol. ASSP-24 (1976 Oct.).
[3] G. Monti and M. Sandler, “Monophonic Transcription with Autocorrelation,” in Proc. COST G-6 Conf. on Digital Audio Effects, DAFX-00 (Verona, Italy, 2000).
[4] D. J. Hermes, “Pitch Analysis,” in Visual Representations of Speech Signals, M. Cooke, S. Beet, and M. Crawford, Eds. (Wiley, New York, 1993).
[5] R. Meddis and M. J. Hewitt, “Virtual Pitch and Phase Sensitivity of a Computer Model of the Auditory Periphery. I: Pitch Identification,” J. Acoust. Soc. Am., vol. 89 (1991 June).
[6] M. Pflueger, R. Höldrich, and W. Riedler, “A Nonlinear Model of the Peripheral Auditory System,” presented at the 103rd Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 45, p. 1002 (1997 Nov.), preprint 4510.
[7] “HUTear—Matlab Toolbox for Auditory Modeling,” available at www.acoustics.hut.fi/software/HUTear/.
[8] A. Klapuri, “Wide-Band Pitch Estimation for Natural Sound Sources with Inharmonicities,” presented at the 106th Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 47, p. 520 (1999 June), preprint 4906.
[9] R. Meddis, “Simulation of Mechanical to Neural Transduction in the Auditory Receptor,” J. Acoust. Soc. Am., vol. 79 (1986 Mar.).
[10] R. Meddis, M. Hewitt, and T. Shackleton, “Implementation Details of a Computation Model of the Inner Hair-Cell/Auditory-Nerve Synapse,” J. Acoust. Soc. Am., vol. 87 (1990 Apr.).
[11] M. Slaney, Auditory Toolbox, version 2, Interval Research Corporation.
[12] P. Cariani, M. Tramo, and B. Delgutte, “Neural Representation of Pitch through Temporal Autocorrelation,” presented at the 103rd Convention of the Audio Engineering Society, J. Audio Eng. Soc. (Abstracts), vol. 45, pp. 1021–1022 (1997 Nov.), preprint 4583.

THE AUTHORS

P. Rao S. Shandilya

Preeti Rao received a Bachelor degree in Electrical Engineering from the Indian Institute of Technology, Bombay, in 1984, and a Ph.D. degree specializing in signal processing from the University of Florida, Gainesville, in 1990. She taught in the Electrical Engineering department at the Indian Institute of Technology, Kanpur from 1994 to 1999. Following a six-month visiting position at the Institute of Perception Research, Eindhoven, The Netherlands, she joined the Indian Institute of Technology, Bombay, where she is presently an associate professor. Her current research interests include speech and audio signal compression and audio content retrieval.

Saurabh Shandilya received a B.E. degree in electrical engineering from the Government Engineering College, Bilaspur, India, in 2001 and an M.Tech. degree in electrical engineering from the Indian Institute of Technology, Bombay in 2003. Currently he works for Neomagic Semiconductors Inc., Noida, India. His interests include speech and audio processing, video coding, and associative computing.

AES STANDARDS
COMMITTEE NEWS
Detailed information regarding AES Standards Committee
(AESSC) proceedings including structure, procedures, reports,
meetings, and membership is published on the AES Standards Web
site at http://www.aes.org/standards/. Membership of AESSC work-
ing groups is open to any individual materially and directly affect-
ed by the work of the group. For current project schedules, see the
project-status document also on the Web site.

climate following AESSC has rightfully earned a high


New AESSC Chair Named the problems reputation in the international standards
I am delighted to announce that Richard resulting from world and I believe it is well placed to
Chalmers has been appointed as the September 11, continue to be respected for years to
new AES Standards Committee Chair 2001, but I hope come.
by the AES President, Ron Streicher. that the standards “I wish the AESSC well and I hope
You will recall that, just before the work of the AES Richard Chalmers has as much fun as I had.
Richard Chalmers
AES 115th Convention in New York, will continue to “John Nunn”
John Nunn notified us, “[T]hat for reflect the broad interests of the A note from Mark Yonge, AES
personal reasons I am unable to members and be driven by their needs. Standards Secretary:
consider re-appointment when my “I look forward to meeting everyone I would like to express here the
current term as AESSC Chair comes to and helping as much as I can to forward appreciation of everybody involved in
an end.” After extensive research and the work of the AES Standards the Standards Committee for John’s
consideration, this announcement Committee. stewardship and astonishing hard work
provides the AESSC with a clear path “Richard Chalmers” through some very interesting times. In
into the future. John Nunn, outgoing chair, writes: the past nine years, the Standards
Richard Chalmers writes: “Just before the New York con- Committee has moved from a paper-
“I am pleased and honored to take vention I informed the AESSC that I based operation to a modern, internet-
over from John Nunn as chair of the would be standing down as its Chair centered organization; the AES has
AESSC. For many years I was secretary once the appointment of a new Chair acquired an increased international sig-
to a number of EBU audio technical could be made by the AES President, nificance through its relationship with
groups and had the pleasure of meeting Ron Streicher. I have known Richard for the IEC; on-line communications now
several AESSC chairs as well as many years and I am confident that his permit faster document development
keeping in touch with AES work on a knowledge of audio and his experience than at any time in the past.
more formal basis through John and in the standards business will ensure the From a personal point of view, I owe
Mark Yonge. future of AES Standards in safe hands. a huge debt of gratitude to John for the
“Of course the interests of AES “When I first became involved with support he gave to both Dan Queen and
members are wider (and deeper!) than the AES standards activities, more than myself during the nontrivial process of
those of the EBU broadcasters so I am twenty years ago, I never considered secretariat transition in 2002.
expecting to follow a learning curve on that one day I might become Chair of Richard Chalmers’s own activities
a number of subjects. the AESSC and it has been a privilege within the European Broadcasting
“I am very conscious that the AESSC to serve the Society in this capacity for Union (EBU) over a number of years
has to operate in a more restrictive some nine years. Yes, it has been hard shows him to be someone who under-
work at times but it has also been very stands the purpose and value of audio
For its published documents and enjoyable and satisfying leading the standards as well as someone who
reports the AESSC is guided by Inter- AESSC through a period of consid- brings an abundance of knowledge and
national Electrotechnical Commission erable evolution. The rapid techno- experience to the work of the AESSC.
(IEC) style as described in the ISO/IEC logical changes in the audio industry Please join me in welcoming Richard
Directives, Part 3. IEC style differs in have produced many challenges, both to the AES Standards Committee. I very
some respects from the style of the AES technical and structural, for the AESSC much look forward to working with him
as used elsewhere in this Journal. and I should like to thank the AES in the times that lie ahead.
AESSC document stages referenced Board of Governors for their support
are: Project initiation request (PIR);
during this period. I should also like to
Proposed task-group draft (PTD);
Proposed working-group draft (PWD);
thank all the members of the AESSC Newly Published
Proposed call for comment (PCFC); who have worked on the standards AES-R2, AES project report for
Call for comment (CFC) which have been published and who articles on professional audio and
have given me their support. The for equipment specifications—

Notations for expressing levels has Project AES-X144 Carriage of DS AES31-3 Audio file transfer and
been revised. Audio Data in AES47 exchange—Part 3: Simple project
The revision has clarified the use of Scope: “[T]o amend AES47-2002 interchange
the “dBu” and abandons the earlier clauses 4.1.2.1, 5.2.2.2 and 6 to provide
intention to recommend the adoption of the option of transmitting DSD audio Mix automation
1 V as the reference quantity for audio instead of linear PCM.” A proposed extension of AES31-3 is
levels in decibels. intended to provide a simple but
The revised document can be found reliable method of interchanging gain
on the Web site. Summary Report: SC-06-01 and pan automation data between
AES14-1992 AES standard for Working Group on Audio- workstations.
professional audio equipment— File Transfer and Exchange Should there be some maximum gain
Application of connectors, Part 1: This meeting was held in conjunction in a fader, referenced to unity? U. Henry
XLR-type polarity and gender has been with the AES 115th Convention, NY, observed that many workstations have
reaffirmed. 2003-10-10 and was convened by chair difficulty applying more than 12 dB
AES17-1998 AES standard method M. Yonge. gain to an audio path; sometimes less. It
for digital audio equipment— would be helpful if it was generally
Measurement of digital audio DVD Forum Liaison understood that this was a realistic con-
equipment has been reaffirmed. J. Yoshio reported from DVD Forum straint on free interchange. Bull agreed
Working Group 10. This group is con- but noted that replay systems should be
sidering professional applications of prepared to accommodate larger gain
Call for Comment DVD optical disc. maxima in case these are encountered in
The following document will be There is currently no standard for pro- practice.
withdrawn by the AES after any adverse fessional audio recording on DVD. There was a discussion concerning
comment received within three months Application areas include studios and gains in decibels expressed to two
of the publication of their call on the broadcast. The intention is to consider a decimal places. Henry noted that
AES Standards Web site has been new format for recording and playback because AES31-3 requires the value to
resolved. For more detailed information of high quality audio data and associated be rendered in ASCII characters, some
on each call please go to data. The next generation of DVD tech- limit to the number of characters is prac-
www.aes.org/standards/b_comments. nology will handle the higher bandwidth tically necessary. It was noted also that
Comments should be sent by and storage capacity necessary for: a) receiving applications will need to inter-
e-mail to the secretariat at timecode; b) metadata; c) video polate, or ramp, between gain steps in
standards@aes.org. All comments will recording possibilities; d) audio any case, and this will largely eliminate
be published on the Web site. recording for professional use. the need for higher precision gain
Persons unable to obtain this Two possibilities are being con- values; two decimal places were felt to
document from the Web site may sidered for development as the DVD be sufficient.
request a copy from the secretariat. Audio Professional (DVD-AP) format:
a) Broadcast Wave Format (BWF), a Panning
CALL FOR COMMENT on with- computer file format on DVD-ROM; b) Bull felt that 100 points in each
drawal of AES33-1999, AES standard a DVD audio format dedicated for pro- panning axis (left to right and front to
procedure for maintenance of AES fessional audio application. back) would not be sufficient to define
audio connector database has been It could be possible to have two a smooth pan locus. Imagine a circular
published, 2004-02-13. sessions on a single DVD disc: DVD pan movement; there could be audible
audio plus a DVD-ROM area. However, path errors, or gain artifacts similar to
this will need a special DVD AP zipper noise. With such coarse steps,
New Project AES-X144 on recorder/player. eliminating these artifacts would need a
DSD over AES47 It had been noted that BWF files will greater degree of look-ahead to up-
Direct stream digital (DSD) be a standard for broadcasters. The coming data which could make
audio coding is becoming estab- Japanese Post Production Association practical implementations unneces-
lished within the industry as an (JPPA) had promoted the use of the sarily complex.
alternative to linear PCM BWF-J format on 2.5 in MO disc, which It was felt that the default pan
c o d i n g f o r p r o f e s sional audio was very popular in Japan position should be at front-center,
applications. Project AES-X140 is although not so widely used elsewhere. which should be identified by zero
intended to standardize a transport However, the capacity of an MO disc coordinates in both axes.
which will carry both linear PCM and was not enough for multichannel audio It was also felt that there needed to be
DSD. If AES47 is not extended to files. Further development of this style a convention for interpreting pan points
include DSD in a standardized way, a of interchange would require the in a similar manner to that proposed for
variety of incompatible implemen- capacity of the DVD-AP instead of the faders. This should avoid the ap-
tations can be anticipated. MO disc. pearance of instantaneous tran- ➥

sitions—implementations should always It was pointed out that any general with the AES 115th Convention, New
use ramped coefficients in the same way standard should probably cover all York, 2003-10-10 and was convened
as the faders. international interchange and support all by chair J. Strawn.
Following a question about panning character sets, not just Japanese. UTF-
law, there was agreement that the 16 appears to be derived from ISO Liaison with 1394 Trade Association
document should define the total SPL in 10646; Unicode may be considered a (1394 TA)
the room from all loudspeakers to be subset of this same international Fujimori reported as follows:
constant, independent of the pan standard. The A/V Working Group is looking
position. Bull noted that it would be a sig- at the following topics:
nificant task to convert existing appli- • Blu-Ray DVD recording.
New business cations to a different character code set. • Japanese terrestrial digital television
S. Aoki spoke on the subject of Also, character codes sets are not inter- (DTV).
Broadcast Wave Format (BWF) files in operable so there will need to be some • A point-to-point test network.
international interchange. degree of active translation. • AV/C Camera storage subunit 2.1.
BWF as currently defined does not Henry observed that multi-octet file • IEEE Ballot Review Committee
support international interchange names will be a problem; they may not (BRC) review of 1394 implementation
because, for example, the metadata in be readable at all in some systems! Aoki tests.
the “bext” chunk is specified to use the provided in the Japanese version of the 48V, and 3 and 6 for ground. B.
ASCII character code. While ASCII is a Association (JPPA) has no special re- 1394c.
robust and simple character code, it is quirement but uses the file name scheme • Pin assignments on UTP5 cable.
limited to representing the Roman provided in the Japanese version of the The proposal from Digital Home
alphabet. ASCII code is not useful in computer operating system. Technology is to use pins 4 and 5 for
Japanese operations because operators Henry remained concerned that file 48V, and 3 and 6 for ground. B.
cannot read information in their own names will not be read correctly— Moses will get more information on
language. Interchange becomes difficult AES31-3 depends on locating files by this.
when interchanging commercials for file names. Multibyte file names could M. Mora of Apple asked if the
broadcasting where flexibility of corrupt accurate file name reading. Trade Association plans to publish
interchange is very important. This issue has implications that will audio guidelines similar to video
Aoki felt that Japanese character sets need further consideration. guidelines. Fujimori answered that the
based on ISO 10646 Universal Multi- DVD Forum and other organizations
Octet Coded Character Set (UCS) or its are expected to do that within their
derivative UCS Transformation Format, Summary Report: SC-06-02 application space.
UTF-16, would be appropriate. Unicode Working Group on Audio
and the Japanese-specific “Shift-JIS” are Applications Using the High New projects
also in use. In the Japan industry there Performance Serial Bus Strawn will submit a project initiation
appeared no clear consensus as to which (IEEE 1394) form for General Data based on IEC
character code to prefer. This meeting was held in conjunction 61883-6.

Proceedings of the
AES 24th International Conference:
THE PROCEEDINGS Multichannel Audio, The New Reality
OF THE AES 24 th
INTERNATIONAL Banff, Alberta, Canada
CONFERENCE 2003 June 26-28.
This conference was a follow-up to the 19th Conference on surround sound.
2003 June 26–28 These papers describe multichannel sound from production and engineering
to research and development, manufacturing, and marketing. 350 pages

Also available on CD-ROM

You can purchase the book and CD-ROM online at


www.aes.org.
Banff, Alberta, Canada For more information email Andy Veloz at
AAV@aes.org
Conference Chair:
Theresa Leonard
or telephone +1 212 661 8528 x39.

REVIEWS OF
ACOUSTICAL PATENTS*
Any opinions expressed here are those of reviewers as
individuals and are not legal opinions. Printed copies
of United States Patents here reviewed may be ordered
at $3.00 each from the Commissioner of Patents and
Trademarks, Washington, D.C. 20231. Patents are
available via the Internet at http://www.uspto.gov.

LLOYD RICE
11222 Flatiron Drive, Lafayette, Colorado 80026
REVIEWERS
GEORGE L. AUGSPURGER, Perception Incorporated, MARK KAHRS, Department of Electrical Engineering,
Box 39536, Los Angeles, California 9003 University of Pittsburgh, Pittsburgh, Pennsylvania 15261

6,535,610 in magnetic gap 57 formed between ring magnet 11 and pole piece 52. It
seems obvious that the magnet must have one pole on its inner surface
43.38.Hz DIRECTIONAL MICROPHONE UTILIZING facing the voice coil and the other pole on its outer surface abutting the pole
SPACED APART OMNI-DIRECTIONAL
MICROPHONES
Brett B. Stewart, assignor to Morgan Stanley & Company
Incorporated
18 March 2003 „Class 381Õ92…; filed 7 February 1996
The c urrent buzzword in audio pickup for teleconferencing is ‘‘beam-
forming.’’ If mere beamforming is not up to the task, then ‘‘adaptive beam-
forming’’ will surely do the trick. This latest invention mounts a few omni-
directional microphones around the periphery of a video display, digitizes
their outputs, and then processes the signals through tapped delay lines
under computer control.—GLA
piece. According to this patent, a more linear magnetic field can be achieved
if magnetization is divided into three areas. Only the central portion of the
6,542,614 ring is magnetized at right angles. The upper and lower portions are mag-
netized at angles of 40° or less.—GLA
43.38.Si BOOMLESS HEARINGÕSPEAKING
CONFIGURATION FOR SOUND RECEIVING MEANS
6,542,617
Heinz Renner, assignor to Koninklijke Philips Electronics N.V.
1 April 2003 „Class 381Õ370…; filed in the European Patent Office
43.38.Ja SPEAKER
21 March 2001
Certain hands-free communication applications require the user to hear Masao Fujihira et al., assignors to Sony Corporation
local sounds as well as incoming signals. This requirement can be met by a 1 April 2003 „Class 381Õ402…; filed in Japan 26 May 1999
single headphone having an attached boom microphone. Although the mi-
What appears to be a conventional self-shielded loudspeaker is in fact
crophone is advantageously close to the user’s mouth, its proximity results
in unwanted pickup of pops and air noises as well. Moreover, the assembly a small unit designed to reproduce frequencies up to 70 kHz or so. Voice
is easily dislodged if the user is moving. This patent describes a lightweight, coil bobbin 11 is made of a conductive material such as aluminum. Coil 6 is
clip-on earpiece that contains not only a headphone but an embedded direc- attached to the bobbin by a soft bonding agent that decouples the coil from
tional microphone.—GLA

6,529,107
43.38.Ja SPEAKER COMPRISING RING MAGNET
Motoharu Shimizu and Hiroyuki Daichoh, assignors to Hitachi
Metals Limited
4 March 2003 „Class 335Õ302…; filed in Japan 16 December 1999
Modern magnetic materials allow moving coil loudspeakers to be built
with thin, lightweight ring magnets. In the illustration, voice coil 55 moves

* The editors of this Journal are grateful to the Journal of the


Acoustical Society of America for allowing us to republish, from
their monthly Reviews, selected patents of interest to our readers.
6,554,098

43.38.Ja PANEL SPEAKER WITH WIDE FREE SPACE

Tatsumi Komura, assignor to NEC Corporation
29 April 2003 (Class 181/173); filed in Japan 15 June 1999

To save space, a panel-type loudspeaker diaphragm 1 can be driven at, or very near, its outer edge. A single panel can be driven by more than one transducer 2, 2′.—GLA

6,543,574

43.38.Ja METHOD OF MAKING A SPEAKER EDGE CONTAINING ISOCYANATE AND POLYOL

Sinya Mizone et al., assignors to Inoac Corporation; Matsushita Electric Industrial Company, Limited
8 April 2003 (Class 181/171); filed in Japan 9 March 1999

A half-roll loudspeaker cone suspension is usually formed from sheet stock molded to shape by heat and pressure. The patent describes an interesting alternative in which a liquid material is injected into a die cavity and then subjected to reaction foaming and curing—a little like baking a waffle.—GLA

6,535,269

43.38.Md VIDEO KARAOKE SYSTEM AND METHOD OF USE

Gary Sherman and Michael Chase, both of Los Angeles, California
18 March 2003 (Class 352/6); filed 29 June 2001

The sound track of a commercial motion picture is already created and stored in a multi-track format. Individual tracks may be rerecorded later to correct audio problems or to dub dialog into another language. Suppose that you could purchase a DVD for home viewing that allowed you to record and replace selected dialog tracks with your own overdubs. The patent describes an interactive system to facilitate such customized viewing.—GLA

6,549,632

43.38.Kb MICROPHONE

Hiroshi Akino et al., assignors to Kabushiki Kaisha Audio-Technica
15 April 2003 (Class 381/174); filed 19 March 1997

Some hand-held microphones are extremely sensitive to mechanical shocks and scrapes. This patent describes a simple, passive shock isolation system derived from mechanical analog circuit analysis. Although the patent text describes embodiments for both dynamic and capacitor microphones, all of the eight patent claims refer to capacitor microphones only.—GLA

6,556,687

43.38.Hz SUPER-DIRECTIONAL LOUDSPEAKER USING ULTRASONIC WAVE

Koji Manabe, assignor to NEC Corporation
29 April 2003 (Class 381/387); filed in Japan 23 February 1998

It is known that an array of ultrasonic transducers can be driven by a modulated carrier to produce audible sound from empty space. This patent suggests that if the transducers are mounted on a concave surface, then their combined energy can be focused at a specific point in space. Moreover, if the curvature is adjustable, then the focal point can be shifted. These two observations would seem to be self-evident, but they were expanded into 17 patent claims.—GLA
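The modulated-carrier principle behind parametric loudspeakers such as patent 6,556,687 is easy to illustrate. The short Python sketch below simply amplitude-modulates an audio signal onto a 40 kHz carrier, which the air itself demodulates through its nonlinearity; the carrier frequency, modulation depth, and sample rate are arbitrary illustrative choices, and nothing here reflects the concave-surface focusing that the patent actually claims.

    # Illustrative double-sideband AM drive signal for an ultrasonic emitter
    # (generic idea only, not the patented focusing scheme).
    import numpy as np

    def am_ultrasound(audio, fs, carrier_hz=40000.0, depth=0.8):
        """audio: mono signal in [-1, 1]; fs must exceed twice the carrier."""
        t = np.arange(len(audio)) / fs
        carrier = np.sin(2 * np.pi * carrier_hz * t)
        return (1.0 + depth * audio) * carrier   # conventional AM

    fs = 192000                                   # high rate needed for a 40 kHz carrier
    tone = 0.5 * np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)
    drive_signal = am_ultrasound(tone, fs)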

6,549,629

43.38.Tj DVE SYSTEM WITH NORMALIZED SELECTION

Brian M. Finn and Shawn K. Steenhagen, assignors to Digisonix LLC
15 April 2003 (Class 381/92); filed 21 February 2001

DVE stands for digital voice enhancement which, in this case, includes echo cancellation, background noise suppression, and optimal selection of multiple-zone microphones in a hands-free communications system. Although the procedure is too complicated to describe in a few sentences, it is clearly explained in the patent. Anyone interested in the field will find the patent informative.—GLA

6,549,637

43.38.Ja LOUDSPEAKER WITH DIFFERENTIAL FLOW VENT MEANS

Jon M. Risch, assignor to Peavey Electronics Corporation
15 April 2003 (Class 381/397); filed 24 September 1998

This patent includes just a little bit of everything, culminating in 40 fairly lengthy claims. However, the heart of the invention is the differential flow vent shown. This is an open-ended cylinder 100 fitted with funnel-shaped insert 104. We are informed that air flowing from left to right will encounter more resistance than that flowing in the reverse direction. If two conventional vents in a woofer box are replaced by differential vents of opposite polarity, the result is forced air ventilation—benign turbulence, so to speak.—GLA

6,553,124

43.38.Ja ACOUSTIC DEVICE

Henry Azima and Joerg Panzer, assignors to New Transducers Limited
22 April 2003 (Class 381/345); filed in the United Kingdom 2 September 1995

This is an interesting patent that includes more than 20 pages of actual test results. The document is really a research paper followed by seven patent claims. The patent teaches that enclosing the rear radiation of a bending wave, panel-type loudspeaker results in a system that is acoustically quite different from a standard closed-box loudspeaker. Some general rules for predicting and optimizing the performance of such a system are developed and explained.—GLA

6,570,079

43.38.Md RECORDING AND REPRODUCING APPARATUS, RECORDING AND REPRODUCING METHOD, AND DATA PROCESSING APPARATUS

Shinichi Fukuda, assignor to Sony Corporation
27 May 2003 (Class 84/602); filed in Japan 19 February 1998

Sony let the examiner do the legwork on this one. Suppose an audio CD owner wishes to make a copy. If a lossy copy is chosen, then the user is stuck with just a bad fidelity copy but not a bill. If the high-fidelity copy is chosen, then the "accounting system" is involved and the owner is charged a fee. The patent doesn't address any of the wider issues regarding music distribution.—MK

6,535,610

43.38.Hz DIRECTIONAL MICROPHONE UTILIZING SPACED APART OMNI-DIRECTIONAL MICROPHONES

Brett B. Stewart, assignor to Morgan Stanley & Company Incorporated
18 March 2003 (Class 381/92); filed 7 February 1996

The current buzzword in audio pickup for teleconferencing is "beamforming." If mere beamforming is not up to the task, then "adaptive beamforming" will surely do the trick. This latest invention mounts a few omnidirectional microphones around the periphery of a video display, digitizes their outputs, and then processes the signals through tapped delay lines under computer control.—GLA
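For readers unfamiliar with the technique, the sketch below shows the garden-variety delay-and-sum beamforming that patent 6,535,610 builds on: each digitized omnidirectional microphone signal is delayed so that sound from the chosen direction adds in phase, then the channels are summed. The geometry, sample rate, and steering logic here are generic assumptions, not the patented arrangement.

    # Minimal delay-and-sum beamformer sketch (illustrative only).
    import numpy as np

    def delay_and_sum(signals, mic_positions, steer_unit_vector, fs, c=343.0):
        """signals: (n_mics, n_samples) digitized omni-mic outputs.
        mic_positions: (n_mics, 3) array in meters.
        steer_unit_vector: unit vector pointing toward the desired talker."""
        n_mics, n_samples = signals.shape
        # Per-microphone delay (in samples) that time-aligns a plane wave
        # arriving from the steering direction; these are the "tapped delay
        # line" settings chosen under computer control.
        delays = mic_positions @ steer_unit_vector / c            # seconds
        delays = np.round((delays - delays.min()) * fs).astype(int)
        out = np.zeros(n_samples + delays.max())
        for m in range(n_mics):
            out[delays[m]:delays[m] + n_samples] += signals[m]
        return out / n_mics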

6,557,665

43.38.Ja ACTIVE DIPOLE INLET USING DRONE CONE SPEAKER DRIVER

Richard D. McWilliam and Ian R. McLean, assignors to Siemens Canada Limited
6 May 2003 (Class 181/206); filed 16 May 2001

The invention is intended to provide active noise cancellation at the air intake of an internal combustion engine. Inner diaphragm 18 is electrically driven by conventional means 46. Outer diaphragm 22 is driven acoustically to generate a noise attenuating signal. At the same time, air is somehow flowing from mouth 38 in direction A through passageway 14 into the engine even though both diaphragms are provided with seals 30, 34. The patent claims do not clarify the arrangement.—GLA

6,535,613

43.38.Ja AIR FLOW CONTROL DEVICE FOR LOUDSPEAKER

Jason A. Ssutu, assignor to JL Audio, Incorporated
18 March 2003 (Class 381/397); filed 28 December 1999

Airtight dust cap 44 pumps air in and out of cavity 46 through rear vent 31. Plate 52 directs the air flow against the inner surface of bobbin 35 to cool voice coil 36.—GLA

6,557,664

43.38.Ja LOUDSPEAKER

Anthony John Andrews and John Newsham, both of Dorking, Surrey, the United Kingdom
6 May 2003 (Class 181/152); filed 22 February 1994

Central plug 21 is actually chisel-shaped, and surrounding horn 11 is similarly asymmetrical. The overall assembly is a close cousin to the JBL 2405 high-frequency transducer designed more than 25 years ago. In both cases the objective is to create a coverage pattern that is relatively wide horizontally but vertically narrow.—GLA

6,560,343

43.38.Ja SPEAKER SYSTEM

Jae-Nam Kim, assignor to Samsung Electronics Company, Limited
6 May 2003 (Class 381/349); filed in the Republic of Korea 22 April 1996

Part of the backwave energy from loudspeaker 16 is conducted through horn 24 to the face of cabinet 10. The remainder energizes vent 26. The patent explains that since only a portion of the rear sound waves are collected and amplified, "...reflected waves or standing waves will not be generated in a sound wave amplifying horn to increase amplification efficiency of bass sounds and improve the clearness of the resulting sounds."—GLA

6,546,105

43.38.Vk SOUND IMAGE LOCALIZATION DEVICE AND SOUND IMAGE LOCALIZATION METHOD

Takashi Katayama et al., assignors to Matsushita Electric Industrial Company, Limited
8 April 2003 (Class 381/17); filed in Japan 30 October 1998

Using head-related transfer functions (HRTFs) to create virtual sound sources from a pair of loudspeakers is theoretically intriguing but messy in practice. Assuming that some kind of all-purpose HRTFs can be derived, then FIR filter coefficients can be calculated for any angular location. However, computing and/or storing filter coefficients for all possible locations is inefficient and time-consuming. This patent, like a number of earlier inventions, attempts to find a better way. In this case, the angular location of a virtual source is fed to a coefficient control device which then performs digital mathematical operations involving only three predetermined frequency response functions. The process is said to result in a dramatic reduction in memory requirements and computational time, as compared to prior art.—GLA

6,553,121

43.38.Vk THREE-DIMENSIONAL ACOUSTIC PROCESSOR WHICH USES LINEAR PREDICTIVE COEFFICIENTS

Naoshi Matsuo and Kaori Suzuki, assignors to Fujitsu Limited
22 April 2003 (Class 381/17); filed in Japan 8 September 1995

To create convincing three-dimensional audio for computer games, a number of virtual sound sources must be controlled within a virtual sound field. At the same time, the acoustical characteristics of the actual reproducing sound field must be subtracted from the sound source. A bank of large FIR filters seems to be called for, but the patent suggests a more efficient approach. By performing linear predictive analysis of the impulse response of the sound field to be added, the number of taps can be greatly reduced. A similar procedure can be applied to sound sources in motion—in effect, panning between locations rather than creating a plurality of individual sources. The patent is clearly written and includes a great many helpful illustrations.—GLA

6,549,627

43.38.Lc GENERATING CALIBRATION SIGNALS FOR AN ADAPTIVE BEAMFORMER

Jim Rasmusson et al., assignors to Telefonaktiebolaget LM Ericsson
15 April 2003 (Class 381/71.11); filed 30 January 1998

Consider a hands-free communications system installed in a vehicle. The equipment includes two or more microphones 405, 407 and a loudspeaker 401. By introducing adaptive filters at the outputs of individual microphones it is possible to achieve in-phase summation of the signals from the direction of a talker while largely canceling the unwanted signals from the loudspeaker. Because the acoustical environment and the location of the talker may change, the system must somehow calibrate itself. The patent describes an improved one-step calibration process that may also be augmented by utilizing the filters as fixed echo cancellers during normal operation.—GLA

6,549,630

43.38.Lc SIGNAL EXPANDER WITH DISCRIMINATION BETWEEN CLOSE AND DISTANT ACOUSTIC SOURCE

James F. Bobisuthi, assignor to Plantronics, Incorporated
15 April 2003 (Class 381/94.7); filed 4 February 2000

The objective is to reliably turn on a microphone when its user speaks and to minimize false triggering from other sound sources. A handset or headset is fitted with two microphones, one 310 near the talker's mouth and the other 330 as far as possible from the first. Their outputs are filtered, rectified, and then divided—not compared. The patent explains in some detail how this arrangement estimates source proximity rather than signal-to-noise ratio. If the proximity estimation signal exceeds a predetermined threshold, then microphone 310 is gated on or its gain raised.—GLA
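The filter-rectify-divide scheme of patent 6,549,630 can be mocked up in a few lines. The Python sketch below is only a rough illustration of the idea, with arbitrary band limits, envelope smoothing, and threshold; it is not the patented circuit.

    # Near/far level-ratio gate: a nearby talker raises the ratio because
    # distant sources reach both microphones at roughly equal levels.
    import numpy as np
    from scipy.signal import butter, lfilter

    def proximity_gate(near, far, fs, threshold=2.0):
        b, a = butter(2, [300 / (fs / 2), 3000 / (fs / 2)], btype="band")
        def level(x):
            rectified = np.abs(lfilter(b, a, x))           # filter and rectify
            smooth_b, smooth_a = butter(1, 10 / (fs / 2))  # slow envelope follower
            return lfilter(smooth_b, smooth_a, rectified) + 1e-12
        ratio = level(near) / level(far)                   # divided, not compared
        return ratio > threshold                           # True where the mic is gated on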

6,577,736

43.38.Vk METHOD OF SYNTHESIZING A THREE DIMENSIONAL SOUND-FIELD

Richard David Clemow, assignor to Central Research Laboratories Limited
10 June 2003 (Class 381/18); filed in the United Kingdom 15 October 1998

A home surround sound system typically uses two or three front speakers plus two rear speakers. When mixing program material for this format, conventional panning techniques work well when moving sound images laterally but cannot create accurate phantom sources between front and rear speakers. Conversely, by making use of head-related transfer functions and interaural crosstalk cancellation, a two-speaker playback system can theoretically locate sound images almost anywhere if it is designed for one particular listener at a specific location. The inventor has developed an interesting method for combining the best features of both systems. According to the patent, it can easily be combined with conventional multi-channel mixing techniques.—GLA

6,563,932

43.38.Ja MAGNET SYSTEM FOR LOUDSPEAKERS

Paul Cork, assignor to KH Technology
13 May 2003 (Class 381/412); filed in the United Kingdom 16 January 2001

An electrodynamic loudspeaker magnetic circuit has a ring-shaped gap 42 between inner and outer pole pieces 26 and 36. Typically, the gap would be energized by a single magnet 20. The addition of complementary magnet 44 is said to overcome deficiencies of prior art in terms of reduced size, improved performance, and ease of assembly.—GLA

6,574,339

43.38.Vk THREE-DIMENSIONAL SOUND REPRODUCING APPARATUS FOR MULTIPLE LISTENERS AND METHOD THEREOF

Doh-hyung Kim and Yang-seock Seo, assignors to Samsung Electronics Company, Limited
3 June 2003 (Class 381/17); filed 20 October 1998

Using head-related transfer functions and some fancy digital filtering, it is possible to create a convincing three-dimensional sound field from two loudspeakers. A major drawback is that the illusion is effectively limited to a single listener at a defined location. We might give the listener a choice of locations by allowing him to select from, say, three filter settings. But suppose that all three settings are selected sequentially at some optimum time interval. Will the result be aural confusion or will the effective sound field be expanded to accommodate multiple listeners? The patent argues for the latter.—GLA

6,587,565

43.38.Vk SYSTEM FOR IMPROVING A SPATIAL EFFECT OF STEREO SOUND OR ENCODED SOUND

Pyung Choi, assignor to 3S-Tech Company, Limited
1 July 2003 (Class 381/98); filed in the Republic of Korea 13 March 1997

This patent describes yet another stereo enhancement method based on filtering and cross-coupling. In this case, signal processing is applied to left and right channels individually. The original left and right signals are "enhanced" at low frequencies but otherwise unmodified. The difference signal is "enhanced" in the high-frequency range.—GLA
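As a rough illustration of the kind of processing patent 6,587,565 describes, the sketch below boosts the left and right channels at low frequencies and boosts only the difference signal at high frequencies before reconstructing the channels. The crossover points and gains are arbitrary placeholders, not the patented curves.

    # Schematic sum/difference ("shuffler") stereo enhancement sketch.
    import numpy as np
    from scipy.signal import butter, lfilter

    def enhance_stereo(left, right, fs, lf_gain=1.5, hf_gain=1.5):
        lp_b, lp_a = butter(2, 200 / (fs / 2), btype="low")    # low-frequency band
        hp_b, hp_a = butter(2, 4000 / (fs / 2), btype="high")  # high-frequency band
        # "Enhance" each channel at low frequencies, leave the rest unmodified.
        left_e = left + (lf_gain - 1.0) * lfilter(lp_b, lp_a, left)
        right_e = right + (lf_gain - 1.0) * lfilter(lp_b, lp_a, right)
        # Enhance the difference signal at high frequencies and fold it back in.
        diff = 0.5 * (left_e - right_e)
        boost = (hf_gain - 1.0) * lfilter(hp_b, hp_a, diff)
        return left_e + boost, right_e - boost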

UPDATED THROUGH 2003
10,000 Journal technical articles, convention preprints, and conference papers at your fingertips

The Audio Engineering Society has published a 20-disk electronic library containing most of the
Journal technical articles, convention preprints, and conference papers published by the AES since
its inception through the year 2003. The approximately 10,000 papers and articles are stored in PDF
format, preserving the original documents to the highest fidelity possible while permitting full-text
and field searching. The library can be viewed on Windows, Mac, and UNIX platforms.
You can purchase the entire 20-disk library or disk 1 alone. Disk 1 contains the program and
installation files that are linked to the PDF collections on the other 19 disks. For reference and
citation convenience, disk 1 also contains a full index of all documents within the library, permit-
ting you to retrieve titles, author names, original publication name, publication date, page num-
bers, and abstract text without ever having to swap disks.

For price and ordering information send email to Andy Veloz at aav@aes.org, visit the AES web site at www.aes.org, or call any AES office at +1 212 661 8528 ext. 39 (USA); +44 1628 663725 (UK); +33 1 4881 4632 (Europe).
AES 25th INTERNATIONAL CONFERENCE
On the 135-meter-high London Eye you can see as far as 25 miles away, and you have a bird's eye view of such major London sights as St. Paul's Cathedral, Buckingham Palace, the Houses of Parliament, and Big Ben. Chair John Grant and his committee are planning a conference this June 17–19 that will give you a great view of the critically important subject of metadata. Metadata for Audio will be held at Church House, the conference center that is just a stone's throw from Westminster Abbey and the Houses of Parliament in central London.

As the means for production and distribution of digital audio proliferate, appropriate metadata tools are needed to facilitate, control, and extend these activities. There has been a great deal of activity in individual organizations to develop metadata tools. However, substantial issues remain to be addressed before the desired goal of global exchange and common understanding can be reached. International standardization, such as the work on MPEG-7 and MPEG-21, may hold some important answers.

This conference seeks to describe the state of the art, identify the issues, and indicate directions for the development of advanced metadata systems, both for consumer distribution and business-to-business. It will bring together media publishers and software designers, media librarians and archivists, database managers and streaming engineers whose operations are increasingly dependent on the success of sophisticated metadata systems.

TUTORIAL DAY
Gerhard Stoll and Russell Mason, papers cochairs, have targeted a number of papers for tutorial presentations on Thursday, June 17 as a good way to offer attendees a thorough introduction to the subject of metadata. Two invited papers in the first morning session, "Metadata, Identities, and Handling Strategies," by Chris Chambers, and "Before There Was Metadata," by Mark Yonge, are introductory papers to set the stage for everything that follows.

The next session, File Basics, has three invited papers: "Introduction to MXF and AAF," by Philip DeNier; "XML Primer," by Claude Seyrat; and "Keeping it Simple: BWF and AES31," by John Emmett.

The first session on Thursday afternoon, Practical Schemes, starts with an invited paper by Philippa Morrell, "The Role of Registries." The next paper, by researchers from Pompeu Fabra University of Barcelona, will look at a system for managing sound effects. And Richard Wright will present an invited paper on the Dublin Core. The final session on Thursday is a workshop on MPEG-7. This tutorial day is also available as a single-day registration option (see the registration form on page 411).



Metadata for Audio
London, UK
June 17–19, 2004
CONFERENCE DAY 1
On Friday the papers sessions begin with Frameworks, which includes an invited paper by Wes Curtis, "P-META: Program Data Exchange in Practice." This will be followed by the first posters session. And the final morning session will be Toolkits, which will include two invited papers: "Digital Media Project," by R. Nicol, and "MPEG-21: What and Why," by Jan Bormans and Kate Grant. After lunch there will be the two-part session Feature Extraction. The second part of the posters session will also be on Friday afternoon.

On Friday evening there will be an optional (not included in conference registration fee) banquet and guided tour at the historic Houses of Parliament. The evening will start with a tour of the debating chambers of the Houses of Commons and Lords. There will be a brief technical talk about the sound-reinforcement system in the Lords Chamber, which uses 84 microphones on motorized winches and has over 400 individually controlled loudspeakers. Afterwards dinner will be served in a room overlooking the River Thames.

CONFERENCE DAY 2
The entire Saturday morning session will be Broadcast Implementations, which will include papers on the metadata processes of British, German, and Japanese broadcasters. Metadata is the "bread and butter" of libraries and archives, so the first afternoon session on Saturday will include papers about projects at the U.S. Library of Congress, Spanish National Radio, and Swedish Radio. The final conference session will be on metadata needed for the online delivery of audio. The calendar, complete program with abstracts, and conference registration form follow on pages 404–411.

And, of course, you should try to schedule an extra day or two to visit one of the world's great cities. London preserves its magnificent history and at the same time encourages new music, art, literature, and architecture; just what good metadata does for audio. Meet your colleagues there June 17–19 for the AES 25th International Conference, Metadata for Audio; it's going to be absolutely fabulous. Go to www.aes.org now for more details and register online.



AES 25th INTERNATIONAL CONFERENCE
Metadata for Audio
London, UK • June 17–19, 2004

THURSDAY, JUNE 17 (TUTORIAL DAY)
Tutorial Registration
Session T-1: Introduction
Session T-2: File Basics
Lunch
Session T-3: Practical Schemes
Workshop: MPEG-7

FRIDAY, JUNE 18 (CONFERENCE DAY 1)
Conference Registration
Session 1: Frameworks
Session 2: Posters, Part 1
Session 3: Toolkits
Lunch
Session 4: Feature Extraction, Session A
Session 5: Posters, Part 2
Session 6: Feature Extraction, Session B
Banquet (optional), Houses of Parliament

SATURDAY, JUNE 19 (CONFERENCE DAY 2)
Session 7: Broadcast Implementations, Session A
Session 8: Broadcast Implementations, Session B
Lunch
Session 9: Libraries and Archives
Session 10: Delivery of Audio

Schedule is subject to change. Check www.aes.org for updates.


AES 25th International
Conference Program
Metadata for Audio
2004 June 17–19
London, UK

Technical Sessions

Thursday, June 17
TUTORIALS

SESSION T-1: INTRODUCTION

T1-1 Metadata, Identities, and Handling Strategies—Chris Chambers, BBC R&D, Tadworth, Surrey, UK (invited)

With all the potential media material and its associated metadata becoming accessible on IT-based systems, how are systems going to find and associate the elements of any single item? How are the users going to know they have the correct items when assembling audio, video, and information for use within a larger project? This short talk will explore the way areas of our industry are hoping to tackle the problem and some of the standards being introduced to ensure management of this material is possible.

T1-2 Before There Was Metadata—Mark Yonge (invited)

Audio has never existed in isolation. There has always been a mass of associated information, both explicit and implicit, to direct, inform, and enhance the use of the audio. In the blithe days before information theory we didn't know it was all metadata. This paper reviews the extent of traditional metadata covering a range of forms. Some of them may be surprising; all of them need to be re-appraised in the light of newer, more formal metadata schemes.

Thursday, June 17
SESSION T-2: FILE BASICS

T2-1 Introduction to MXF and AAF—Philip DeNier, BBC R&D, Tadworth, Surrey, UK (invited)

The AAF and MXF file formats provide a means to exchange digital media along with a rich (extendible) set of metadata. This presentation will be a basic introduction into the content of these file formats and will include a description of the metadata scheme used.

T2-2 XML Primer—Claude Seyrat (invited)

Most audio professionals have heard of the term "XML" but not many know for sure what it means or have yet had to work with it. This paper sets out what XML is, what it can do for the user, and various ways that it can be employed in the area of metadata for audio. The contents of this paper form the basis for a number of the papers that appear later in the conference.

T2-3 Keeping it Simple: BWF and AES31—John Emmett, Broadcast Project Research Ltd., Teddington, Middlesex, UK (invited)

Digital audio is spreading outward to the furthest reaches of the broadcast chain. Making the best use of the opportunities presented by this demands a standardization procedure that is adaptable to a vast number of past, present, and future digital audio formats and scenarios. In addition, would it not be just great if it cost nothing? This paper will point out the benefits of what we already have and tell a tale of borrowing economical audio technology from many sources.

Thursday, June 17
SESSION T-3: PRACTICAL SCHEMES

T3-1 The Role of Registries—Phillipa Morell, Metadata Associates Ltd., London, UK (invited)

Some forms of metadata, especially those that identify objects or classes of objects, form classes of their own that need to be administered centrally in order to avoid the risk of duplication and consequent misidentification. The concept of such a registry is not new; for example, International Standard Book Numbers (ISBN) derive from a central registry that was originally set up in 1970. The registry that ensures that every ethernet-connected device in the world is uniquely identifiable is another example. Formal identifiers and other metadata for use in commercial transactions will increasingly use the services of one or more metadata registries, as this paper will discuss.

T3-2 Sound Effect Taxonomy Management in Production Environments—Pedro Cano, Markus Koppenberger, Perfecto Herrera, Oscar Celma, Universitat Pompeu Fabra, Barcelona, Spain

Categories or classification schemes offer ways of navigating and having higher control over the search and retrieval of audio content. The MPEG-7 standard provides description mechanisms and ontology management tools for multimedia documents. We have implemented a classification scheme for sound effects management inspired by the MPEG-7 standard on top of an existing lexical network, WordNet. WordNet is a semantic network that organizes over 100,000 concepts of the real world with links between them. We show how to extend WordNet with the concepts of the specific domain of sound effects. We review some of the taxonomies to acoustically describe sounds. Mining legacy metadata from sound effects libraries further supplies us with terms. The extended semantic network includes the semantic, perceptual, and sound effects specific terms in an unambiguous way. We show the usefulness of the approach, easing the task for the librarian and providing higher control on the search and retrieval for the user.
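For readers who want to see what anchoring a sound-effect label to WordNet looks like in practice, the short sketch below uses NLTK's WordNet interface to list the noun senses and broader concepts for a term. It is only a toy illustration of the general idea, not the authors' extended semantic network, and it assumes NLTK with the WordNet corpus installed.

    # Look up a sound-effect term in WordNet and print its broader concepts.
    # Requires a one-time nltk.download("wordnet").
    from nltk.corpus import wordnet as wn

    def describe_effect(term):
        for synset in wn.synsets(term, pos=wn.NOUN):
            hypernyms = [h.name() for h in synset.hypernyms()]
            print(synset.name(), "-", synset.definition())
            print("  broader concepts:", ", ".join(hypernyms))

    describe_effect("thunder")   # links the label to its noun senses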

T3-3 Dublin Core—R. Wright, BBC (invited)

Dublin Core metadata provides card catalog-like definitions for defining the properties of objects for Web-based resource discovery systems. The importance of the Dublin Core is its adoption as a basis for many more elaborate schemes. When the view ahead is obscured by masses of local detail, a firm grasp of the Dublin Core will often reveal the real landscape.

Thursday, June 17
WORKSHOP—MPEG-7

Coordinator: G. Peeters, IRCAM, Paris, France (in association with SAA TC)

Managing Large Sound Databases Using MPEG—Max Jacob, IRCAM, Paris, France

Sound databases are widely used for scientific, commercial, and artistic purposes. Nevertheless there is yet no standard way to manage them. This is due to the complexity of describing and indexing audio content and to the variety of purposes a sound database might address. Recently there appeared MPEG-7, a standard for audio/visual content metadata that could be a good starting point. MPEG-7 not only defines a set of description tools but is more generally an open framework hosting specific extensions for specific needs in a common environment. This is crucial since there would be no way to freeze in a monolithic definition all the possible needs of a sound database. This paper outlines how the MPEG-7 framework can be used, how it can be extended, and how all this can fit into an extensible database design, gathering three years of experience during the CUIDADO project at IRCAM.

Integrating Low-Level Metadata in Multimedia Database Management Systems—Michael Casey, City University, London, UK

[Abstract Not Available at Press Time]

Tools for Content-Based Retrieval and Transformation of Audio Using MPEG-7: The SPOff and the MDTools—Emilia Gómez, Oscar Celma, Emilia Gómez, Fabien Gouyon, Perfecto Herrera, Jordi Janer, David García, University Pompeu Fabra, Barcelona, Spain

In this workshop we will demonstrate three applications for content-based retrieval and transformations of audio recordings. They illustrate diverse aspects of a common framework for music content description and structuring implemented using the MPEG-7 standard. MPEG-7 descriptions can be generated either manually or automatically and are stored in an XML database. Retrieval services are implemented in the database. A set of musical transformations are defined directly at the level of musically meaningful MPEG-7 descriptors and are automatically mapped onto low-level audio signal transformations. Topics included in the presentation are: (1) Description generation procedure, manual annotation of editorial description: the MDTools, automatic description of audio recordings, the SPOffline; (2) Retrieval functionalities, local retrieval: SPOffline, remote retrieval: Web-based retrieval; and (3) Transformation utilities: the SPOffline.

Using MPEG-7 Audio Low-Level Scalability: A Guided Tour—Jürgen Herre, Eric Allamanche, Fraunhofer IIS, Ilmenau, Germany

[Abstract Not Available at Press Time]

Friday, June 18
CONFERENCE DAY 1
SESSION CD-1: FRAMEWORKS

1-1 Data Model for Audio/Video Production—A. Ebner, IRT, Munich, Germany

When changing from traditional production systems to IT-based production systems the introduction and usage of metadata is unavoidable. Direct access of the information stored in IT-based systems is not possible. Descriptive and structural metadata are the enablers to have proper access of selected material. Metadata does not focus on descriptive information about the content only. It describes the usage of the material, the structure of a program, handling processes, relevant information, delivery information about properties, and storage of information. The basis to achieve a complete collection of metadata is a detailed analysis of a broadcaster's production processes and usage cases. A logical data model expresses the relationship between the information and is the foundation for implementations that enable a controlled exchange and storage of metadata.

1-2 P-META: Program Data Exchange in Practice—Wes Curtis, BBC Television, London, UK (invited)

[Abstract Not Available at Press Time]

Friday, June 18
SESSION CD-2: POSTERS, PART 1

2-1 Low-Complexity Musical Meter Estimation from Polyphonic Music—Christian Uhle1, Jan Rohden1, Markus Cremer1, Jürgen Herre2
1Fraunhofer AEMT, Erlangen, Germany
2Fraunhofer IIS, Ilmenau, Germany

This paper addresses the automated extraction of musical meter from audio signals on three hierarchical levels, namely tempo, tatum, and measure length. The presented approach analyzes consecutive segments of the audio signal equivalent to a few seconds length each, and detects periodicities in the temporal progression of the amplitude envelope in a range between 0.25 Hz and 10 Hz. The tatum period, beat period, and measure length are estimated in a probabilistic manner from the periodicity function. The special advantages of the presented method reside in the ability to track tempo also in music with strong syncopated rhythms, and its computational efficiency.
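The envelope-periodicity idea described in this last abstract can be sketched very compactly. The Python fragment below computes a crude amplitude-envelope onset signal, autocorrelates it, and picks the strongest periodicity in the 0.25–10 Hz range as a tempo estimate; the hop size and onset measure are arbitrary choices, and the probabilistic tatum and measure estimation of the paper is not attempted here.

    # Crude tempo estimate from envelope periodicity (illustrative sketch).
    import numpy as np

    def estimate_tempo(x, fs, hop=512):
        # Frame-wise amplitude envelope, then its positive first difference
        # as a simple onset strength signal.
        frames = x[: len(x) // hop * hop].reshape(-1, hop)
        envelope = np.sqrt((frames ** 2).mean(axis=1))
        onset = np.maximum(np.diff(envelope), 0.0)
        env_rate = fs / hop                                  # envelope sample rate
        ac = np.correlate(onset, onset, mode="full")[len(onset) - 1:]
        lags = np.arange(len(ac)) / env_rate                 # lag in seconds
        valid = (lags >= 1.0 / 10.0) & (lags <= 1.0 / 0.25)  # 0.25 to 10 Hz band
        best_lag = lags[valid][np.argmax(ac[valid])]
        return 60.0 / best_lag                               # beats per minute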
2-2 Percussion-Related Semantic Descriptors of Music Audio Files—Perfecto Herrera1, Vegard Sandvold2, Fabien Gouyon1
1Universitat Pompeu Fabra, Barcelona, Spain
2University of Oslo, Oslo, Norway

Automatic extraction of semantic music content metadata from polyphonic audio files has traditionally focused on melodic, rhythmic, and harmonic aspects. In the present paper we will present several music content descriptors that are related to percussion instrumentation. The "percussion index" estimates the amount of percussion that can be found in a music audio file and yields a (numerical or categorical) value that represents the amount of percussion detected in the file. A further refinement is the "percussion profile," which roughly indicates the existing balance between drums and cymbals. We finally present the "percussiveness" descriptor, which represents the overall impulsiveness or abruptness of the percussive events. Data from initial evaluations, both objective (i.e., errors, misses, false alarms) and subjective (usability, usefulness) will also be presented and discussed.

2-3 Tonal Description of Polyphonic Audio for Music Content Processing—Emilia Gómez, Perfecto Herrera, Universitat Pompeu Fabra, Barcelona, Spain

The purpose of this paper is to describe a system that automatically extracts metadata from polyphonic audio signals. This metadata describes the tonal aspects of music. We use a set of features to estimate the key of the piece and to represent its tonal structure, but they could also be used to measure the tonal similarity between two songs and to perform some key-based segmentation or establish a tonal structure of a piece.

2-4 Phone-Based Spoken Document Retrieval in Conformance with the MPEG-7 Standard—Nicolas Moreau, Hyoung-Gook Kim, Thomas Sikora, Technical University of Berlin, Berlin, Germany

This paper presents a phone-based approach of spoken document retrieval, developed in the framework of the emerging MPEG-7 standard. The audio part of MPEG-7 encloses a SpokenContent tool that provides a standardized description of the content of spoken documents. In the context of MPEG-7, we propose an indexing and retrieval method that uses phonetic information only and a vector space IR model. Experiments are conducted on a database of German spoken documents with ten city name queries. Two phone-based retrieval approaches are presented and combined. The first one is based on the combination of phone N-grams of different lengths used as indexing terms. The other consists of expanding the document representation thanks to the phone confusion probabilities.

2-5 Efficient Features for Musical Instrument Recognition on Solo Performances—Slim Essid, Gaël Richard, Bertrand David, GET-Télécom Paris (ENST), Paris, France

Musical instrument recognition is one of the important goals of musical signal indexing. While much effort has already been dedicated to such a task, most studies were based on limited amounts of data that often included only isolated musical notes. In this paper we address musical instrument recognition on real solo performance based on larger training and test sets. A highly efficient set of features is proposed that is obtained from signal cepstrum but also from spectrum low- and higher-order statistical moments describing signal spectral shape. The use of principal component analysis in conjunction with support vector machine classification yields nearly perfect recognition accuracy on varied musical solo phrases from ten instruments issued from different instrument families.

Friday, June 18
SESSION CD-3: TOOLKITS

3-1 Digital Media Project—R. Nicol, BT, Ipswich, UK (invited)

[Abstract Not Available at Press Time]

3-2 MPEG-21: What and Why—Jan Bormans1, Kate Grant2 (invited)
1IMEC, Leuven, Belgium
2Nine Tiles, Cambridge, UK

The MPEG-21 vision is to define a multimedia framework to enable transparent and augmented use of multimedia resources across a wide range of networks and devices used by different communities. The technical report "Vision, Technologies and Strategy" describes the two basic building blocks: the definition of a fundamental unit of distribution and transaction (the digital item) and the concept of users interacting with digital items. The digital items can be considered the "what" of the multimedia framework (e.g., a video collection, a music album), and the users can be considered the "who" of the multimedia framework. MPEG-21 is developing a number of specifications enabling the integration of components and standards to facilitate harmonisation of "technologies" for the creation, modification, management, transport, manipulation, distribution, and consumption of digital items. This paper will explain the relationship of the different MPEG-21 specifications by describing a detailed use-case scenario.

3-3 A 3-D Audio Scene Description Scheme Based on XML—Guillaume Potard, Ian Burnett, University of Wollongong, NSW, Australia

An object-oriented schema for describing time-varying 3-D audio scenes is proposed. The creation of this schema was motivated by the fact that current virtual reality description schemes (VRML, X3D) have only basic 3-D audio description capabilities. In contrast, MPEG-4 AudioBIFs have advanced 3-D audio features but are not designed as a metadata language. MPEG-4 BIFs are particularly targeted as a binary scene description language for scene rendering purposes only. Our proposed 3-D audio scene description schema features state-of-the-art 3-D audio description capabilities while being usable both as a metadata scheme for describing 3-D audio content (for example, 5.1 or Ambisonics B-format) and as a format for scene rendering.

Friday, June 18
SESSION CD-4: FEATURE EXTRACTION, SESSION A

4-1 A System for Harmonic Analysis of Polyphonic Music—Claas Derboven, Markus Cremer, Fraunhofer IIS AEMT, Ilmenau, Germany

A system for harmonic analysis of polyphonic musical signals is presented. The system uses a transform with a nonuniform frequency resolution for the extraction of prominent tonal components and determines the key and the contained chords of a musical input signal with high accuracy. A statistical approach based on the frequency of occurrence of musical notes for determining the key is described. An algorithmic solution for chord determination is presented with a concise explanation. Finally, a qualitative evaluation of the system's performance is conducted to demonstrate the applicability to real-world audio signals.
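A minimal sketch of key determination from note-occurrence statistics, the general approach mentioned in this last abstract: correlate a 12-bin pitch-class histogram against rotated major and minor key templates and keep the best match. The templates below are the commonly used Krumhansl-Kessler profiles; the paper's actual transform and chord analysis are not reproduced here.

    # Key estimate from a pitch-class histogram (illustrative sketch).
    import numpy as np

    MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                      2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
    MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                      2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
    NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

    def estimate_key(pitch_class_histogram):
        best = None
        for mode, profile in (("major", MAJOR), ("minor", MINOR)):
            for tonic in range(12):
                # Rotate the template so its tonic falls on this pitch class.
                score = np.corrcoef(np.roll(profile, tonic),
                                    pitch_class_histogram)[0, 1]
                if best is None or score > best[0]:
                    best = (score, NAMES[tonic] + " " + mode)
        return best[1]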
4-2 Robust Identification of Time-Scaled Audio—Rolf Bardeli, Frank Kurth, University of Bonn, Bonn, Germany

Automatic identification of audio titles on radio broadcasts is a first step toward automatic annotation of radio programs. Systems designed for the purpose of identification have to deal with a variety of postprocessing potentially imposed on audio material at the radio stations. One of the more difficult techniques to be handled is time-scaling, i.e., the variation of playback speed. In this paper we propose a robust fingerprinting technique designed for the identification of time-scaled audio data. This technique has been applied as a feature extractor to an algebraic indexing technique that has already been successfully applied to the task of audio identification.

4-3 Computing Structural Descriptions of Music through the Identification of Representative Excerpts from Audio Files—Bee Suan Ong, Perfecto Herrera, Universitat Pompeu Fabra, Barcelona, Spain

With the rapid growth of audio databases, many music retrieval applications have employed metadata descriptions to facilitate better handling of huge databases. Music structure creates the uniqueness identity for each musical piece. Therefore, structural description is capable of providing a powerful way of interacting with audio content and serves as a link between low-level description and higher-level descriptions of audio (e.g., audio summarization, audio fingerprinting, etc.). Identification of representative musical excerpts is the primary step toward the goal of generating structural descriptions of audio signals. In this paper we discuss various approaches in identifying representative musical excerpts of music audio signals and propose to classify them into a few categories. Pros and cons of each approach will also be discussed.

Friday, June 18
SESSION CD-5: POSTERS, PART 2

5-1 Toward Describing Perceived Complexity of Songs: Computational Methods and Implementation—Sebastian Streich, Perfecto Herrera, Universitat Pompeu Fabra, Barcelona, Spain

Providing valuable semantic descriptors of multimedia content is a topic of high interest in current research. Such descriptors should merge the two predicates of being useful for retrieval and being automatically extractable from the source. In this paper the semantic descriptor concept of music complexity is introduced. Its benefit for music retrieval and automated music recommendation is addressed. The authors provide a critical review of existing methods and a detailed prospect of new methods for automated music complexity estimation.

5-2 How Efficient Is MPEG-7 for General Sound Recognition?—Hyoung-Gook Kim, Juan José Burred, Thomas Sikora, Technical University Berlin, Berlin, Germany

Our challenge is to analyze/classify video sound track content for indexing purposes. To this end we compare the performance of MPEG-7 audio spectrum projection (ASP) features based on several basis decomposition algorithms vs. mel-scale frequency cepstrum coefficients (MFCC). For basis decomposition in the feature extraction we evaluate three approaches: principal component analysis (PCA), independent component analysis (ICA), and non-negative matrix factorization (NMF). Audio features are computed from these reduced vectors and are fed into a hidden Markov model (HMM) classifier. We found that established MFCC features yield better performance compared to MPEG-7 ASP in general sound recognition under practical constraints.

5-3 Automatic Optimization of a Musical Similarity Metric Using Similarity Pairs—Thorsten Kastner, Eric Allamanche, Oliver Hellmuth, Christian Ertel, Marion Schalek, Jürgen Herre, Fraunhofer IIS, Ilmenau, Germany

With the growing amount of multimedia data available everywhere and the necessity to provide efficient methods for browsing and indexing this plethora of audio content, automated musical similarity search and retrieval has gained considerable attention in recent years. We present a system which combines a set of perceptual low-level features with appropriate classification schemes for the task of retrieving similar sounding songs in a database. A methodology for analyzing the classification results to avoid time consuming subjective listening tests for an optimum feature selection and combination is shown. It is based on a calculated "similarity index" that reflects the similarity between specifically embedded similarity pairs. The system's performance as well as the usefulness of the analyzing methodology is evaluated through a subjective listening test.

5-4 Automatic Extraction of MPEG-7 Metadata for Audio Using the Media Asset Management System iFinder—Jobst Löffler, Joachim Köhler, Fraunhofer IMK, Sankt Augustin, Germany

This paper describes the MPEG-7 compliant media asset management system iFinder, which provides a set of automatic methods and software tools for media analysis, archiving, and retrieval. The core technology of iFinder comprises several modules for audio and video metadata extraction that are bundled in the iFinderSDK, a commercial product offered to the media industry. The workflow for audio content processing together with pattern recognition methods used will be presented. Of special note, a technique for precise audio-text alignment together with a browser application for synchronized display of retrieval results will be demonstrated. An insight to using MPEG-7 as a standardized metadata format for media asset management will be provided from a practical point of view.

5-5 An Opera Information System Based on MPEG-7—Oscar Celma Herrada, Enric Mieza, Universitat Pompeu Fabra, Barcelona, Spain

We present an implementation of the MPEG-7 standard for a multimedia content description of lyric opera in the context of the European IST project: OpenDrama. The project goals are the definition, development, and integration of a novel platform to author and deliver the rich cross-media digital objects of lyric opera. MPEG-7 has been used in OpenDrama as the base technology for a music information retrieval system. In addition to the MPEG-7 multimedia description scheme, different classification schemes have been proposed to deal with operatic concepts such as musical forms (acts, scenes, frames, introduction, etc.), musical indications (piano, forte, ritardando, etc.), and genre and creator roles (singers, musicians, production staff, etc.). Moreover, this project has covered the development of an authoring tool for an MPEG-7 standard, namely MDTools, which includes segmentation, classification scheme generation, creation and production, and media information descriptors.
5-6 Morphological Sound Description: Computational Model and Usability Evaluation—Julien Ricard, Perfecto Herrera, Universitat Pompeu Fabra, Barcelona, Spain

Metadata for sound samples is usually limited to low-level descriptors and a source label. In the context of sound retrieval only the latter is used as a search criterion, which makes the retrieval of sounds having no identifiable source (abstract sounds) a difficult task. We propose a description framework focusing on intrinsic perceptual sound qualities, based on Schaeffer's research on sound objects, that could be used to represent and retrieve abstract sounds and to refine a traditional search by source for nonabstract sounds. We show that some perceptual labels can be automatically extracted with good performance, avoiding the time-consuming manual labeling task, and that the resulting representation is evaluated as useful and usable by a pool of users.

Friday, June 18
SESSION CD-6: FEATURE EXTRACTION, SESSION B

6-1 Drum Pattern-Based Genre Classification from Popular Music—Christian Uhle, Christian Dittmar, Fraunhofer AEMT, Ilmenau, Germany

This paper addresses the identification of drum patterns and the classification of their musical genres. The drum patterns are estimated from audio data automatically. This process involves the transcription of percussive unpitched instruments with a method based on independent subspace analysis and a robust estimation of the tatum grid and the musical meter. The rhythmic patterns are identified from pattern histograms, describing the frequency of occurrence of the percussive events. The classification procedure evaluates the meter information, the pattern histogram as well as other high-level rhythmic features derived from the estimated drum pattern.

6-2 Assessing the Relevance of Rhythmic Descriptors in a Musical Genre Classification Task—Fabien Gouyon1, Simon Dixon2, Elias Pampalk2, Gerhard Widmer2
1Universitat Pompeu Fabra, Barcelona, Spain
2Austrian Research Institute for AI, Vienna, Austria

Organizing or browsing music collections in a musically meaningful way calls for tagging the data in terms of, e.g., rhythmic, melodic or harmonic aspects, among others. In some cases, such metadata can be extracted automatically from musical files; in others, a trained listener must extract it by hand. In this paper we consider a specific set of rhythmic descriptors for which we provide procedures of automatic extraction from audio signals. Evaluating the relevance of such descriptors is a difficult task that can easily become highly subjective. To avoid this pitfall, we assessed the relevance of these descriptors by measuring their rate of success in genre classification experiments.

6-3 Music Genre Estimation from Low-Level Audio Features—Oliver Hellmuth, Eric Allamanche, Thorsten Kastner, Ralf Wistorf, Nicolas Lefebvre, Jürgen Herre, Fraunhofer IIS, Ilmenau, Germany

Despite the subjective nature of associating a certain song or artist with a specific musical genre, this type of characterization is frequently used to provide a convenient way of expressing very coarse information on the basic stylistic and rhythmic elements and/or instrumentation of a song. An audio database that is structured according to different musical genres is a first important step to provide an easy/intuitive access to a large music collection. Thus, a convenient way for indexing large databases by musical genre is desired. This paper describes a system for an automatic genre classification into several musical genres. Different features as well as classification strategies will be evaluated and compared. The system's performance is assessed by means of a subjective listening test.

Saturday, June 19
CONFERENCE DAY 2
SESSION CD-7: BROADCAST IMPLEMENTATIONS, SESSION A

7-1 Audio Metadata in Radio Broadcasting—Shigeru Aoki1, Masahito Kawamori2
1Tokyo FM Broadcasting, Tokyo, Japan
2NTT, Tokyo, Japan

Generally an audio sequence or program is produced by a DAW (digital audio workstation) and delivered as a digital audio file. However, the descriptive data of the audio program, such as the cue sheet of the radio program, is transferred apart from the audio file. The content descriptive data is commonly known as metadata. The most effective method to transfer the audio data and the metadata is to embed those as one digital file that an audio player plays and offer the description of that audio sequence simultaneously. This paper describes the format and scheme of the audio file with metadata.

7-2 Integrated Metadata in the Broadcast Environment—Joe Bull1, Kai-Uwe Kaup2
1SADiE UK, Cambridgeshire, UK
2VCS Aktiengesellschaft, Bochum, Germany

In a modern broadcast environment, efficient and effective handling of metadata becomes more important every day. Much time and money can be wasted reentering data that is already present in the digital domain. This money could be better spent on program-making. The authors will describe practical examples of how this can be achieved in a real broadcast environment using real products in use or in development.

Saturday, June 19
SESSION CD-8: BROADCAST IMPLEMENTATIONS, SESSION B

8-1 Broadcast Wave and AES Audio in MXF—Bruce Devlin, Snell & Wilcox

The SMPTE has established MXF as the new open standard file format for interchange in the broadcast world. One important aspect of the standard is audio mapping. This paper will be a basic tutorial on how MXF and the audio mapping standard work. It will include issues of physically interleaving audio and video as well as adding rich metadata using the MXF data model.

8-2 The Advanced Authoring Format and its Relevance to the Exchange of Audio Editing Decisions—David McLeish1, Phil Tudor2
1SADiE, Cambridgeshire, UK
2BBC R&D, Tadworth, Surrey, UK

This paper explores how the Advanced Authoring Format (AAF) model, a vehicle for exchanging metadata and media-rich content, can be used to describe audio program compositions so that they can be more seamlessly exchanged between audio editors who work with tools designed by different manufacturers. In addition, the extensibility of the format is discussed as a means of looking at its future potential.
Saturday, June 19
SESSION CD-9: LIBRARIES AND ARCHIVES

9-1 Development of a Digital Preservation Program at the Library of Congress—Carl Fleischhauer, Samuel Brylawski, Library of Congress, Washington, DC, USA

This paper will trace the development of a digital preservation program for sound recordings at the Library of Congress. It will outline the Library's use of METS (Metadata Encoding and Transmission Standard); survey the challenges faced in the Library's work to create digital objects for public use, comprised of sound files and images of packaging and accompanying materials; and review the tools and methods utilized to create metadata.

9-2 Audio Metadata Used in the Radio Nacional de España Sound Archive Project—Miguel Rodeno1, Jesus Nicolas2, Isabel Diaz2
1Alcala University, Madrid, Spain
2Radio Nacional de España, Madrid, Spain

The 20th century Spanish sound history has been preserved in digital format and may now be consulted online through the Internet. This pioneering project in the broadcasting industry around the world was finished in December 2002. The archive is considered the most important audio archive in the Spanish language in the world. This paper describes the metadata used in this project. Radio Nacional de España followed the 1997/98 European Broadcasting Union (EBU) standard for the interchange of audio files and their broadcasting: the Broadcast Wave Format (BWF). The voice/word and the different kinds of music (classical, light, international or Spanish) have different types of metadata. Some examples are shown with the detailed metadata.

9-3 Integration of Audio Computer Systems and Archives Via the SAM/EBU Dublin Core Standard, Tech.doc 3293—Lars Jonsson1, Gunnar Dahl2
1Swedish Radio
2KSAD, Norsk Rikskringkasting, Oslo, Norway

Dublin Core is a well-known metadata initiative from W3C that has been widely spread and used for text and Web pages on the Internet. The Scandinavian SAM-group, with 25 archive specialists and engineers, has defined semantic definitions and converted the commonly used Dublin Core initiative for general use within the audio industry. The 15 basic elements of Dublin Core and new subsets have proven to cover most of the tape protocols and database fields existing in the broadcast production chain, from early capturing over various types of production and all the way to distribution and archiving. This presentation covers some examples of the use of metadata transfer with Dublin Core expressed in XML in Sweden and Norway. It ends in a discussion of the future possibilities of Dublin Core in comparison with other existing metadata initiatives in an integrated world of interconnected databases coming into all audio-related companies.

Saturday, June 19
SESSION CD-10: DELIVERY OF AUDIO

10-1 Watermarking and Copy Protection by Information Hiding in Soundtracks—Tim Jackson, Keith Yates, Francis Li, Manchester Metropolitan University, Manchester, UK

In this paper digital audio watermarking techniques are reviewed and categorized. Applications of watermarking schemes are discussed, and their capabilities and limitations clarified in the context of audio copyright management and copy protection. Traditional watermarking schemes embed hidden signatures in soundtracks and are found to be effective in ownership authentication and copyright management. Nevertheless, they do not prevent unauthorized copying unless dedicated watermark detectors are added to the recording devices. Purpose-chosen hidden signals are known to interfere with some recording devices, for example, magnetic tape recorders, offering a potential solution to copy protection. It is therefore reasonable to postulate that watermarking techniques could be extended to general audio copy protection without resorting to dedicated detectors.

10-2 Metadata Requirements for Enabling the On-line Music Industry's New Business Models and Pricing Structures—Nicolas Sincaglia, MusicNow Inc., Chicago, IL, USA

The music industry has begun selling and distributing its media assets online. Online music distribution is vastly different from the normal means of media distribution. These untested methods of music sales and distribution require experimentation in order to determine which business models and pricing tiers will most resonate with the consumer. This translates into the need for versatile and robust data models to enable these market trials. Copyright owners and media companies require well-designed data structures to enable them to transmit and receive these complicated sets of business rules. This metadata is an essential part of an overall digital rights management system to control and limit access to the associated media assets.

10-3 Audio Meta Data Generation for the Continuous Media Web—Claudia Schremmer1, Steve Cassidy2, Silvia Pfeiffer1
1CSIRO, Epping, NSW, Australia
2Macquarie University, Sydney, Australia

The Continuous Media Web (CMWeb) integrates time-continuous media into the searching, linking, and browsing function of the World Wide Web. The file format underlying the CMWeb technology, Annodex, streams the media content multiplexed with metadata in CMML format that contains information relevant to the whole media file (e.g., title, author, language) as well as time-sensitive information (e.g., topics, speakers, time-sensitive hyperlinks). This paper discusses the problem of generating Annodex streams from complex linguistic annotations: annotated recordings collected for use in linguistic research. We are particularly interested in automatically annotated recordings of meetings and teleconferences and see automatically-generated CMML files as one way of viewing such recordings. The paper presents some experiments with generating Annodex files from hand annotated meeting recordings.

AES 25TH INTERNATIONAL CONFERENCE REGISTRATION
2004 June 17–19, London, UK
Registration: Fee includes attendance at the 25th conference including a copy of the Proceedings of the AES 25th International Conference as well as lunch and refreshments Thursday through Saturday. There is an optional dinner on Friday evening at the Houses of Parliament. Rooms have been reserved at St. Ermin's Hotel, which is a few minutes walk from Church House. Delegates will have the opportunity to continue their discussions there in a relaxed atmosphere. There are three price options—£99, £119, and £149—with VAT and breakfast included; these are special rates for the conference. To book a room fill out item 2 below; payment is made to hotel at end of stay.
Please return by mail or fax to:
Heather Lane, AES, P.O. Box 645, Slough SL1 8BJ, UK
Fax: +44 1628 667002 Tel: +44 1628 663725 Email: uk@aes.org
Conference registration is also available online at www.aes.org/events/25/registration.html.
No registration is final until the conference fee has been received.

Please print/type all information as you wish it to appear on your badge.


1
First Name Last Name
Affiliation (Company/Institution)
Street Address
City State Postal Code Country
Telephone INT+ Fax INT+
Email AES Membership No.

■ Book me a room at St. Ermin’s Hotel Number of people______


2 ■ £ 99 Standard ■ £ 119 Superior ■ £ 149 Deluxe (Prices are per room, not per person)
Arrival Date Departure Date Number of nights

Metadata Conference only (Friday and Saturday, June 18–19)


3 ■ AES MEMBERS £ 298.00 plus £52.15 VAT Total: £ 350.15
■ NONMEMBERS £ 348.00 plus £60.90 VAT Total: £ 408.90
■ AUTHORS, STUDENTS £ 198.00 plus £34.65 VAT Total: £ 232.65
Tutorial Day only (Thursday, June 17)
■ AES MEMBERS £ 98.00 plus £17.15 VAT Total: £ 115.15
■ NONMEMBERS £ 128.00 plus £22.40 VAT Total: £ 150.40
■ AUTHORS, STUDENTS £ 68.00 plus £11.90 VAT Total: £ 79.90
Tutorial Day and Conference
■ AES MEMBERS £ 368.00 plus £64.40 VAT Total: £ 432.40
■ NONMEMBERS £ 448.00 plus £78.40 VAT Total: £ 526.40
■ AUTHORS, STUDENTS £ 248.00 plus £43.40 VAT Total: £ 291.40
■ Optional Dinner___number of tickets £ 38.00 plus £ 6.65 VAT Total: £ 44.65

Payment Modes (check box) Total Amount £_________________________


4
■ Check (payable to AES Ltd.) ■ Mastercard/Eurocard ■ Visa
Card Number Expiration Date

/
Month Year
Cardholder's Name (print)________________________________________________________________________
Signature of Cardholder__________________________________________________________________________



HISTORICAL PERSPECTIVES AND TECHNOLOGY OVERVIEW OF LOUDSPEAKERS FOR SOUND REINFORCEMENT*

J. Eargle,1,2 Honorary Member, and M. Gander,1 AES Fellow
(1) JBL Professional, Northridge, California, U.S.A.
(2) JME Consulting Corporation, Los Angeles, California, U.S.A.

INTRODUCTION
Horns and direct-radiating systems have provided the basis for sound reinforcement for more than a century. Both technologies have benefited from engineering and manufacturing improvements as well as demands for pushing the performance envelope. Trends of fashion have often intersected with engineering development, economics, and even marketplace opportunism. A survey tutorial of the significant developments in transduction, signal transmission, and system synthesis is presented here and discussed in historical perspective.

We begin with an overview of sound reinforcement and the technologies that have supported it. This is followed by more detailed technical discussions of both direct radiating and horn systems, leading to a discussion of modern loudspeaker array techniques. The presentation ends with a comprehensive bibliography.

HISTORICAL PERSPECTIVES
In the early days of transducer development, horn systems offered the only means possible for achieving suitable levels for speech reinforcement. Early power amplifiers were limited to about 10 watts output capability, and horn-driver efficiencies on the order of 20% to 30% were necessary to reach the desired sound pressure levels.

The direct field level, referred to a distance of one meter, produced by one acoustic watt radiated omnidirectionally from a point source in free space is 109 dB LP [1, p. 314]. If we use a 10-watt amplifier with a horn–driver combination that is 30% efficient, we can produce three acoustical watts. If the horn has a directivity index (DI) on axis of, say, 10 dB, then we can increase that level to:

Level (re 1 meter) = 109 + 10 log (3) + DI = 124 dB LP

At a more realistic listening distance of 10 meters, the level would be, by the inverse square relationship, 20 dB lower, or 104 dB. If wider coverage is needed, more horns can be added and splayed as required.

There is little documentation of early examples of general speech reinforcement, and that art progressed fairly slowly [9]. The first example of large-scale sound reinforcement occurred on Christmas Eve, 1915, when E. S. Pridham, cofounder of the Magnavox company, played Christmas carols for an audience of 50,000 using Pridham-Jensen rocking armature transducers connected to phonograph horns [12]. Western Electric set up a public address system capable of addressing 12,000 persons through 18 loudspeakers in 1916 [24, p. 24].

The first distributed system was employed in 1919; 113 balanced armature driving units mounted on long horns were strung along New York City's Park Avenue "Victory Way" as a part of a Victory Bond sale [24, p. 25] [2], as shown in Fig. 1. The first successful indoor use of a public address system was at the 1920 Chicago Republican Convention, which also employed the first central cluster configuration [24, p. 25], as shown in Fig. 2. On March 4, 1921, President Harding's inauguration was amplified [24, p. 24], and on November 11, 1921, President Harding's address in Arlington, Virginia, was transmitted by Western Electric, using Edgerton's 1918 design of four-air-gap balanced-armature units. For the first time 150,000 people, at Madison Square Garden in New York, in the adjoining park, and in the Civic Auditorium in San Francisco, simultaneously listened to a person speaking [2].

*Revised and expanded from a presentation at the Institute of Acoustics 12th Annual Weekend Conference, Windermere, England, October 25-27, 1996.
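The level arithmetic in this example is simple enough to script. The short Python sketch below merely restates the calculation above, using the 109-dB free-field reference and the 10-W, 30%, DI = 10 dB figures quoted in the text; the function name and structure are illustrative only and are not part of the original presentation.

```python
import math

def horn_spl(amp_power_w, efficiency, di_db, distance_m):
    """On-axis SPL estimate for a horn-driver combination.

    Assumes 109 dB SPL at 1 m for 1 acoustic watt radiated
    omnidirectionally in free space, as cited in the text.
    """
    acoustic_watts = amp_power_w * efficiency
    level_1m = 109.0 + 10.0 * math.log10(acoustic_watts) + di_db
    # Inverse-square loss from the 1-m reference distance
    return level_1m - 20.0 * math.log10(distance_m)

# Worked example from the text: 10-W amplifier, 30%-efficient horn, DI = 10 dB
print(round(horn_spl(10, 0.30, 10, 1)))   # ~124 dB LP at 1 m
print(round(horn_spl(10, 0.30, 10, 10)))  # ~104 dB LP at 10 m
```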


Fig. 1. First use of a distributed sound system, New York, Park Avenue (1919). Note trumpet horns in upper left hung above crowd aiming downward. (Photo from Lester Cowan, Recording Sound for Motion Pictures, McGraw-Hill, New York, 1931.)

Fig. 2. First central array sound system, 1920 Chicago Republican Convention. (Photo from Frederick Thrasher, ed., Okay for Sound … How the Screen Found its Voice, Duell, Sloan, and Pearce, New York, 1946.)

Fig. 3. Western Electric "Roxy" horn type 12A with 555W dynamic driver. (Photo, Cowan, p. 329.)

It was the cinema that paved the way for rapid development of professional sound reproduction. Talking pictures required more than speech intelligibility, however. The single horns of the day, as shown in Fig. 3, were limited in response to the range from about 125 Hz to 3 kHz. While this was adequate for speech, music required greater bandwidth, especially at lower frequencies [7].

The first widely accepted cinema system used a two-way approach. A multicellular high-frequency horn with a compression driver was coupled to a cone-driven, folded low-frequency horn assembly [3], as shown in Fig. 4. Its high level of performance and approval by the Academy of Motion Picture Arts and Sciences, in the form of a technical achievement award, led to its acceptance as an industry standard [11].

The next step in low-frequency (LF) response was to employ multiple large-diameter direct radiators in some sort of "directional baffle" that could both load the drivers for increased efficiency and give them added forward directionality [20]. The best LF direct-radiating transducers of the day were about 10 dB lower in efficiency than horn systems, but their use was mandated due to the size and complexity of traditional bass horns. When employed in multiples, they could provide sufficient level and coverage in large theaters. In time, the various directional baffles evolved into the familiar front-loaded quasi-horns that became universally identified with the cinema [16], as shown in Fig. 5. Field coil energized transducers also gave way to permanent magnets following WWII [15].

The LF transducers chosen for this application had relatively small moving masses and high (Bl)²/RE ratios. This enabled them to efficiently handle the acoustical load transformed by the horn and maintain good response up to about 400 Hz, at which point the high-frequency (HF) horn took over. The HF horn itself was undergoing modifications aimed at improving its directional response. RCA favored the radial horn, while Bell Laboratories favored the multicellular device [27]. Bell Laboratories later pioneered the use of the acoustical lens in similar applications [8, 14].

By the mid-1940s, HF horns had established their primacy in the frequency range above 500 Hz, while hybrid LF horns, which relied on reflex loading of cones below about 100 Hz, dominated the range below 500 Hz. These families of components formed the basis of postwar sound reinforcement activities and maintained that position for nearly three decades.

By the early 1970s available amplifier power, which had been steadily rising through the introduction of consumer high fidelity during the mid-1950s and 1960s, reached the point where beneficial tradeoffs could be made among the three "eternal" loudspeaker variables of size, efficiency, and LF bandwidth extension. As early as the mid-1950s, consumers began to enjoy relatively small sealed systems that provided substantial LF output [25, 26].

Fig. 4. Shearer MGM horn system, which won the Academy of Motion Picture Arts and Sciences scientific award in 1936. Contributors to this program included Douglas Shearer, John Hilliard, James B. Lansing, Harry Olson, John Volkmann, and William Snow. (Photo from Research Council of AMPAS, Motion Picture Sound Engineering, Van Nostrand, New York, 1938.)

Fig. 5. The Hilliard and Lansing designed Altec-Lansing A-4 "Voice of the Theatre" system, 1945, utilizing a short LF straight exponential horn with an acoustical path length equal to that of the HF horn to avoid delay between the LF and HF sections. (Photo, Altec-Lansing)


Additional analysis and synthesis of ported enclosure design by Thiele and Small [23], following the earlier work of Locanthi [17], Novak [19], and Benson [31], led to the general adoption of ported enclosures as a LF building block in professional sound system design.

Through the use of Thiele-Small driver parameters, LF systems could be rationally designed for best response and enclosure size. Gone were the days of cutting and trying, and also gone were the days of poorly designed LF transducers. Thiele-Small analysis pointed to the need for identifying the right cone mass, resonance, excursion capability, Bl product, enclosure volume, and tuning to achieve a targeted performance goal.

During the 1970s and early 1980s ported LF systems became the basis for most sound reinforcement LF design. For many music and speech applications the horn HF crossover frequency was raised from 500 Hz to 800 Hz and higher, with cone drivers filling in the midrange [18]. This change from the traditional two-way philosophy was brought on by modern program requirements for generating increased level. Higher power handling transducers were substituted in the LF horns, but these new drivers forced the HF power bandwidth of the horn downward. With the crossover frequency raised to improve HF reliability, the result was uniform on-axis response, but grossly uneven power response [6].

By the mid-1970s ported LF systems were poised to take a commanding lead in the design of speech reinforcement systems. Fig. 6 shows examples of the older approach and the newer one. The system shown in Fig. 6A makes use of the previous generation of hardware (theater-type LF hybrid horns and HF multicellular horns), while the system shown in Fig. 6B makes use of ported LF systems and modern HF horn hardware. In terms of LF response and power handling, both systems have six LF drivers. With horn loading, the output capability of the system shown in Fig. 6A will be higher in the 100 to 200 Hz range than the system shown in Fig. 6B. However, the LF system in Fig. 6B is smaller and has been designed for flat power bandwidth over its operating spectrum in the range from 40 to 400 Hz. It uses high-powered drivers and can accommodate approximately four times the amplifier power of the system in Fig. 6A, but by the mid-1970s the cost per watt of amplifier power had diminished significantly.

Fig. 6. Large systems for speech reproduction. Design using pre-1970 components, multicellular exponential horns with ported LF horns (A, J. Audio Eng. Soc., vol. 20, pp. 571); design using post-1970 components, constant directivity HF horns and ported LF direct radiators (B, courtesy KMK Ltd.)

For large-scale music reinforcement there were other considerations. LF horn systems had developed along lines unique to the music industry, since fairly high mid-bass directivity was a prime concern. The older radial horns were likewise favored for their directionality, which increased at HF. Typical usage is shown in Fig. 7.

Fig. 7. Typical early-1970s horn-loaded system using ported LF horns, 90° radial horns with large-format drivers, and 90° radial horns with small-format drivers for augmenting the upper frequencies. A narrow coverage horn with a large-format driver is used for long-throw coverage. (Photo, JBL)

The conservative motion picture industry continued with two-way solutions but eventually traded in its hybrid bass horns for standard ported systems and retired the old multicellular HF horns of the 1930s [5]. The concept of flat power response in system design stated that not only was uniform direct arrival sound important; reflected sound (proportional to the total radiated power of the system) should also be uniform. This goal could be more easily met with ported LF systems, along with uniform coverage HF horns.
Hilliard [10] was well ahead of his time in bringing this concept to the attention of the film industry, but his attempts at commercialization failed.

The well-engineered ported LF system of the 1970s established its dominance at the low end of the frequency spectrum. The combination of low-cost power amplification, low-distortion, high-power LF transducer development, and enclosure simplicity has given the ported system an advantage at low frequencies that all but the very largest horn systems could never match.

Keele [13] made an instructive comparison between LF horns and direct radiators, pointing out the virtual parity between them. For the same LF cutoff, the horn will have highest efficiency, a large complex enclosure, and will require a small number of drive units. By comparison, a direct radiator multiple-driver vented box will have moderate efficiency, a small, relatively simple enclosure, a large number of drivers, and higher power handling capability (because of the multiple drivers). In Keele's words, "This roughly means that if one has a lot of space, not much money to spend on drivers and amplifiers, and lots of cheap labor—build a horn system. If labor is not cheap, and you don't have much space, and if you can afford drivers and amplifier power—build a direct radiator system." In essence, the costs of labor and lumber had far outstripped those of drivers and watts.

Through the late 1980s and 1990s, however, the professional sound industry saw a return to horn systems for covering the range down to about 300 Hz, especially for large-scale speech reinforcement. The reasons have to do not with efficiency per se but rather directional control. These new systems have taken the form of large-format compression drivers, or cone transducers designed for horn–driver applications, and large horns optimized for the range from 100 to 300 Hz up to 1 to 3 kHz, with ported LF systems handling the range below. Rapid flare HF horns are now employed, and their distortion characteristics, level for level, may be up to 10 dB lower than the older hardware [4]. New digital methods of signal control have made multiway systems more acceptable than they were in the days of passive dividing networks, through time alignment and steep crossover slopes.

As we continue our discussion of direct radiators and horns, we will present a detailed analysis of the engineering fundamentals of both methods of transduction. This discussion will cover basic operating principles, directional control, and distortion mechanisms associated with each method. We will follow this with a discussion of array concepts encompassing both kinds of transducers.

DIRECT RADIATORS

Early Development
Ernst Werner Siemens' 1874 U.S. Patent 149,797 was prophetic; he described in detail a radial magnet structure in which a coil of wire was placed. The coil was connected with a radiating surface that Siemens described as the frustum of a cone.
Fig. 8. The cone driver. Section view (A); equivalent circuit (B); radiation impedance of a cone in a large wall (C); power response of system (D).

He had literally invented the cone loudspeaker—with nothing to play over it except dc transients and other telegraphic signals. He remarked at the time that it could be used "for moving visible and audible signals."

Half a century later, in 1925, Rice and Kellogg of General Electric described "a new hornless loudspeaker" that resembled that of Siemens—a similarity that prompted Rice to say: "The ancients have stolen our inventions!" [37].

The key difference in the Rice and Kellogg design was the adjustment of mechanical parameters so that the fundamental resonance of the moving system took place at a lower frequency than that at which the cone's radiation impedance had become uniform. Over this range, the motion of the cone was mass controlled, and the cone looked into a rising radiation impedance. This in effect provided a significant frequency region of flat power response for the design. Details of this are shown in Fig. 8.

Region of Flat Power Response
Fig. 8A shows a section view of the cone loudspeaker with all electrical, mechanical, and acoustical parameters labeled. The equivalent circuit is shown in Fig. 8B; here the mechanical and acoustical parameters are shown in the mobility analogy.

When mounted in a large baffle, the moving system looks into a complex acoustical load as shown in Fig. 8C. The resistive component rises with frequency to approximately ka = 2, above which point it is essentially constant (ka is equal to cone circumference divided by wavelength, or 2πa/λ).
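For orientation, the ka values used throughout this discussion are easy to evaluate. The sketch below assumes a hypothetical large LF driver with an effective piston radius of about 0.15 m (an assumption, not a figure from the article) and finds the frequencies at which ka reaches 2 and 0.2, the latter being the essentially omnidirectional region referred to later under Mutual Coupling.

```python
import math

def ka(freq_hz, cone_radius_m, c=344.0):
    """ka = cone circumference / wavelength = 2*pi*a*f / c."""
    return 2 * math.pi * cone_radius_m * freq_hz / c

def freq_for_ka(target_ka, cone_radius_m, c=344.0):
    """Frequency at which a given ka value is reached."""
    return target_ka * c / (2 * math.pi * cone_radius_m)

a = 0.15  # hypothetical effective piston radius, m
print(round(freq_for_ka(2.0, a)))   # radiation resistance roughly constant above this (~730 Hz)
print(round(freq_for_ka(0.2, a)))   # essentially omnidirectional below this (~73 Hz)
```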
System response is shown in Fig. 8D, and system efficiency over the so-called piston band is given [49] as:

η = ρ0(Bl)²SD² / (2πcMMS²RE)     (1)

where ρ0 is the density of air (kg/m³), (Bl)²/RE is the electromechanical coupling coefficient (N²/W), c is the speed of sound (m/s), SD is the area of the cone (m²), and MMS is the mass of the moving system (kg).

The larger the coupling coefficient, the lower the resonant Q at f0 and the higher the piston band efficiency will be. Likewise, the higher the Q at f0, the lower the piston band efficiency. Depending on the application, both kinds of response may be useful to the design engineer.

It can easily be seen that, for maximum extension of the piston band, the lower f0 must be, and the lower the system efficiency will be. The efficiency-bandwidth product, for a given cone diameter and coupling coefficient, thus tends to be constant over a relatively large range.

Mutual Coupling
In the LF range over which their response is essentially omnidirectional (ka = 0.2 or lower), a doubling of closely spaced driving units will result in an increase in acoustical output of 3 dB for a fixed input power reference level [39, 48, 52, 53]. The progression in efficiency increase is shown in Fig. 9 for one, two, and four LF transducers, respectively. In each case, the electrical power delivered to each ensemble of drivers is constant. Assume that the reference power fed to the single driver is one watt; then for the set of two drivers, the power per driver is 0.5 watt, and for the set of four, the power per driver is 0.25 watt.

Fig. 9. Illustration of mutual coupling of LF drivers.

One may imagine that, in the two-driver case with both drivers wired in parallel, those two drivers have, in a sense, coalesced into a new driver—one with twice the cone area, twice the moving mass, and half the value of RE. Thus, by Equation 1, the efficiency will have doubled. For the case where the two drivers are wired in series, the analysis goes as follows: the new driver has twice the cone area, twice the moving mass, four times the (Bl)² product, and twice the value of RE. Again, by Equation 1, there will be a doubling of efficiency.

Mutual coupling often appears to give something for nothing, but there are clear limits to its effectiveness. With each doubling of cone area, the ka = 0.2 upper response frequency corner moves downward approximately by a factor of 0.7, since this is the reciprocal of the value by which the effective cone radius has increased. As the process of adding drivers is continued, in the limit it can be shown that the efficiency of an ensemble of direct radiators cannot exceed a value of 25% [38]. Because of these constraints, the approximation of power doubling for each two-times increase in drivers is accurate only at very low frequencies and only if the efficiency values are low to begin with.
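Equation (1) and the mutual-coupling argument above lend themselves to a quick numerical check. The sketch below uses hypothetical driver parameters (the Bl, RE, SD, and MMS values are invented for illustration and do not describe any product discussed here) and confirms that paralleling two identical drivers doubles the calculated piston-band efficiency.

```python
import math

RHO0 = 1.18   # density of air, kg/m^3 (approximate)
C = 344.0     # speed of sound, m/s (approximate)

def piston_band_efficiency(bl, re, sd, mms):
    """Eq. (1): eta = rho0 (Bl)^2 SD^2 / (2 pi c MMS^2 RE)."""
    return (RHO0 * bl**2 * sd**2) / (2 * math.pi * C * mms**2 * re)

# Hypothetical 380-mm-class LF driver parameters (illustration only)
bl, re, sd, mms = 19.0, 5.5, 0.085, 0.095   # T·m, ohms, m^2, kg
eta1 = piston_band_efficiency(bl, re, sd, mms)

# Two identical drivers in parallel behave like one driver with twice the
# cone area, twice the moving mass, and half RE (see text):
eta2 = piston_band_efficiency(bl, re / 2, 2 * sd, 2 * mms)

print(f"single driver: {100*eta1:.1f}%  pair: {100*eta2:.1f}%  ratio: {eta2/eta1:.2f}")
```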
Distortion

Mechanical Effects
The primary distortion mechanism in cone transducers is due to mechanical stress–strain limits. Small identified a practical mechanical displacement limit from rest position in the axial direction as the excursion at which 10% harmonic distortion is reached. This limit is known as xMAX. While a loudspeaker may be operated beyond this displacement limit, at least on a momentary basis, the 10% linearity departure is generally recognized as a safe limit for good engineering practice. Since cone displacement tends to increase as the inverse square of frequency down to the f0 region, it is easy to see how the xMAX limitation may easily be encountered in normal operation.

The onset of the cone displacement limit at low frequencies can be alleviated by using ported LF enclosures. The nature of this design is shown in Fig. 10. A section view of a ported system is shown in Fig. 10A, and the equivalent circuit is shown in Fig. 10B. The design relies on controlling the Helmholtz resonance of the enclosure to provide an "assisted output," via the port, that minimizes cone motion (and thus distortion) at low frequencies, as shown in Fig. 10C. Thiele–Small parameters are universally used today to synthesize these systems.

Fig. 10. The ported system. Section view (A); equivalent circuit (B); port and cone contributions to total output (C).

Virtually all commercial PC design programs for ported systems will indicate transducer displacement limits so that the design engineer will always be aware of whether a system, while still on the drawing board, will go into displacement overload before it reaches its thermal limit. Good engineering practice demands that a ported system remain thermally limited down to f0. Below that frequency the electrical drive signal is generally rolled off to avoid subsonic over-excursions of the cone.

A secondary mechanical distortion effect will be seen when the voice coil is driven far enough out of the gap that there is a momentary loss of Bl product at peak excursion values. The effect is asymmetrical and gives rise to both even and odd distortion components.

Port Turbulence in Vented Systems
In vented systems the ultimate output at low frequencies may be limited not by considerations of maximum cone excursion, but rather by air turbulence in the enclosure port when the system is operating at the tuning frequency [34]. A tentative limit here is to restrict the port air particle velocity so that its peak value does not exceed about 5% of the speed of sound. In general, ports should be designed with contoured boundaries to minimize turbulence and the noise and losses it often produces. Significant studies of port turbulence and its minimization through tapering of the port tube's cross-section area have been carried out by Vanderkooy [50, 51], Salvatti and Button [45], and Roozen et al. [44].

Thermal Effects
Modern cone transducers intended for heavy-duty professional applications take advantage of newer materials and adhesives to make them more immune to thermal failure. Thermal failure is reached when the power dissipated in the voice coil as heat cannot be removed at a sufficient rate to maintain a safe operating temperature. A great deal of loudspeaker development has gone into designing structures and moving elements that are not only resistant to heat but aid in its removal from the transducer [32, 36].

Fig. 11. Power compression in a 380-mm-diameter LF driver. Curves for 1 and 100 watts input are superimposed and displaced by 20 dB. (Data courtesy JBL.)

For most applications in sound reinforcement, the effects of loudspeaker heating are more likely to result in component failure than those associated with displacement limitations. Dynamic linearity or power compression are terms used to describe the effects of heating on audio performance [34]. The data shown in Fig. 11 presents the frequency response of a single 380-mm LF transducer with inputs of 1 watt and 100 watts. In each case the chart recording of the levels has been adjusted to account for the 20-dB offset between the curves. In this manner the response differences can be clearly seen. If there were no dynamic compression, the two curves would lie one on top of the other. As it is, the progressive heating results in an increased value of RE, which lowers the efficiency. In extreme cases, the increase in RE can result in changes in the LF alignment, which may be clearly audible as such.
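The RE increase described above can be roughed out from the temperature coefficient of copper, about 0.39% per °C. The sketch below is only an approximation under the piston-band view of Eq. (1), with an assumed 150 °C coil temperature rise; the values are illustrative and do not model any measured driver.

```python
import math

ALPHA_CU = 0.0039  # approximate temperature coefficient of copper resistance, per deg C

def sensitivity_loss_db(re_cold, temp_rise_c):
    """Estimated level drop when voice-coil heating raises RE.

    Piston-band efficiency varies roughly as 1/RE (Eq. 1), so for a
    constant-voltage drive the output falls by about 10*log10(RE_hot/RE_cold).
    """
    re_hot = re_cold * (1 + ALPHA_CU * temp_rise_c)
    return 10 * math.log10(re_hot / re_cold), re_hot

# Hypothetical example: 5.5-ohm (cold) coil running 150 deg C above ambient
loss, re_hot = sensitivity_loss_db(5.5, 150)
print(f"RE rises to {re_hot:.1f} ohms; output drops about {loss:.1f} dB")
```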
Another way of viewing power compression is shown in Fig. 12. Here, several 380-mm transducers have been driven with a wide-band signal, and the effect of temperature rise has been plotted against time. For each transducer design, the reduction in output level eventually reaches an asymptotic value that depends on how effectively heat can be removed from the voice coil and magnet structure. In general, larger diameter voice coils remove heat more efficiently than smaller ones; additional measures, such as increasing convection cooling and increasing radiation to outside mechanical parts of the driver, can be helpful as well.

Fig. 12. Power compression. Output level versus time for three LF driver designs. (Data courtesy JBL.)

As the temperature of a ferrite magnetic structure increases, both flux density and efficiency are reduced. The effect is similar to the resistance increase caused by voice coil heating. If the temperature rise in the ferrite material has been moderate, normal performance may be restored upon cooling.

Distortion in Transducer Magnetic Systems
Many aspects of the direct radiator's magnetic system can affect the distortion performance of a transducer [33]. The primary effect is the variation in the magnetic operating point that comes as a result of signal current in the voice coil as it modulates the magnetic structure's operating point. This happens to some degree in all designs, but the effect on acoustical performance depends on the degree of flux modulation of the static magnetic field. The general result of flux modulation is to increase the level of single-ended distortion, which is largely second harmonic.

Fig. 13 shows the demagnetization curves for three magnetic materials: Alnico V, ferrite, and neodymium-based systems. With the Alnico V system, the flux curve has a moderate slope in the operating region, which is near the intersection with the B-axis; at that position there is little tendency for flux modulation to be a problem. However, a strong input signal can result in permanent demagnetization if the operating point is forced downward on the steep portion of the curve to the left of the normal operating point.

Fig. 13. Demagnetization plots for three loudspeaker magnetic materials. Alnico V (1); ferrite (2); neodymium-iron-boron type (3).

With the ferrite magnet there is a certain amount of flux modulation due to the uniform slope of the curve. With the high-energy neodymium materials, the demagnetization curve is normally located high enough in the second quadrant of the B-H graph that the magnetic circuit is very likely to be operating at or near saturation. In this case the degree of flux modulation will be minimal.

Fig. 14 shows how a typical problem was solved. The distortion data shown in Fig. 14A is that of an older 300-mm-diameter LF transducer operating with an Alnico V magnet structure. Keeping the same moving system but changing to a typical ferrite magnet structure yields the data shown in Fig. 14B. Note the significant increase in mid-frequency distortion.

Fig. 14. Performance with the same moving system in three different motor assemblies. Alnico V (A); traditional ferrite (B); ferrite with undercut polepiece and aluminum shorting ring (C). (Data courtesy JBL.)

The data in Fig. 14C shows the effect of a ferrite magnet system outfitted with undercut polepiece geometry and a large, low-resistance aluminum flux shorting ring placed at the base of the polepiece. The significant reduction in second-harmonic distortion results from the setting up of an induction current in the shorting ring that counteracts the normal tendency of voice coil current to shift the magnetic operating point.

Other magnetic distortion effects include the generation of eddy currents in local iron structures, which results in an increase in third-harmonic distortion and I²R losses, and inductance modulation of the voice coil, due to the varying amount of iron instantaneously surrounded by the voice coil, which results in increased second-harmonic distortion.
Thermodynamic and FM Distortion Effects
Thermodynamic distortion, or air overload, is present in relatively small quantities in direct radiator systems and may be disregarded in the normal operation of cones and domes.

Frequency modulation (FM) components are more likely to occur, especially at low to mid frequencies, where high cone excursions at lower frequencies may modulate higher frequencies as the cone's velocity attains a significant percentage of the speed of sound. The effect is quite noticeable in single-cone systems but is minimized in multiway systems. Beers and Belar [30] and Klipsch [40] were among the first to describe this phenomenon in detail.

Another aspect of thermodynamic distortion present in sealed direct radiator systems is due to the nonlinearity of the enclosed air spring. In general, if the maximum instantaneous change in an enclosed volume can be limited to about 0.5%, the effect of the nonlinearity can be ignored [34]. The type and amount of enclosure damping material in a sealed enclosure has the additional effect of increasing the actual enclosed volume. The work of Leach [41] is significant in this area.

The Decoupled Cone
Over the years loudspeaker designers have observed that, at high frequencies, the cone ceases to move as a single unit, but rather breaks up into more complex motions. These result in an effectively lighter moving mass at high frequencies, extending the HF response of the system. This was first commercialized in the Altec Duocone loudspeaker [28]. Fig. 15A shows a section view of the cone profile used in the Duocone loudspeaker, and a mobility mechanical circuit is shown in Fig. 15B. The modern soft-dome HF unit exploits this effect more predictably through high damping of the moving system. Finite element analysis (FEA) provides a means of analyzing in detail the directionality of loudspeaker cones and domes, taking into account multiple breakup effects. Many low-cost drivers intended for ceiling installation in distributed systems make good use of decoupling for attaining wider coverage at HF.

Fig. 15. The Altec-Lansing Duocone. Section view of cone (A); equivalent circuit (B).

Fig. 16. Polar plots for a piston mounted in a wall. (Data after Beranek, 1954.)

Fig. 17. Polar plots for a piston mounted at the end of a long tube. (Data after Beranek, 1954.)
In recent years, the notion of decoupling has resurfaced with the distributed mode loudspeaker (DML). Here, a panel of light, stiff, and fairly lossless material is driven by a mass-loaded transducer. When operated in the frequency range over which the panel exhibits a dense array of higher-order two-dimensional bending modes, the resulting power output and dispersion are uniform over a frequency range defined by:

Upper frequency limit, fmax = RP/(2πMc)     (2A)
Lower frequency limit, fmin = RP/(2πMm)     (2B)

where RP is the radiation impedance of the panel and Mc and Mm are, respectively, the masses of the driven system and the magnetic actuator.

Due to its random, diffuse nature, the response of a typical DML panel departs from that of a typical dynamic driver in several regards: the modulus of electrical impedance is quite uniform over the normal system passband; the solid-angle radiation pattern is quite uniform over the normal system passband; the impulse response for a typical panel is fairly long, upwards of 25 msec, however it is free of ringing at any specific frequency; and, commensurate with the long impulse train, the amplitude response will exhibit, in all directions of radiation, many fine response dips on the order of 6 dB [29].

With its unique combination of acoustical properties and relatively low cost, the DML offers great promise in the area of distributed system design.
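Equations (2A) and (2B) can be exercised with representative numbers. The values of RP, Mc, and Mm below are purely illustrative assumptions chosen to give a plausible band; they do not describe any particular DML panel.

```python
import math

def dml_band(rp, m_driven, m_actuator):
    """Nominal DML operating band from Eqs. (2A) and (2B)."""
    f_max = rp / (2 * math.pi * m_driven)
    f_min = rp / (2 * math.pi * m_actuator)
    return f_min, f_max

# Purely illustrative values: panel radiation impedance (mechanical ohms), masses (kg)
f_min, f_max = dml_band(rp=30.0, m_driven=0.003, m_actuator=0.060)
print(f"uniform dispersion roughly from {f_min:.0f} Hz to {f_max:.0f} Hz")
```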

Directional Properties of Direct Radiators
Assuming that the moving systems have only a single degree of freedom, the polar data in Fig. 16 shows the theoretical directionality of a piston mounted in a large baffle as a function of ka. The on-axis directivity index (DI) of each directional pattern is also shown in the figure [1, 42]. Similar data is shown in Fig. 17 for a piston mounted at the end of a long tube. These two conditions simulate normal 2π and 4π mounting conditions for loudspeakers. For many routine system design applications, this data is sufficiently accurate.

Gradient Loudspeakers
The simplest gradient loudspeaker is an unbaffled cone transducer operating as a dipole. The natural directional response for a dipole is a cosine pattern, or a "figure-eight" [43]. Dipoles are very inefficient and have not been widely used in sound reinforcement applications; however, there are certain applications where their specific radiation pattern could be very useful. During the early 1970s, Altec produced a modified gradient loudspeaker in which one side of the dipole operated in series with an acoustical delay path. The delay produced a hypercardioid pattern in much the same manner as a typical dynamic microphone does.

Fig. 18. The Altec Extenda-Voice gradient loudspeaker system. Section view (A); physical circuit (B); nominal off-axis response curves (C).

Fig. 18A shows a section view through the hypercardioid loudspeaker. The output from the rear of the transducer is sent through a constant delay, provided by the path length and the resistive elements in that path.
The result of the delay is that, at 135°, output from the front and back of the transducer will effectively cancel. For useful output from the front of the transducer, the signal fed to the system must be equalized with a 6-dB-per-octave rise for each halving of frequency to compensate for the diminishing gradient. The equivalent physical circuit of the loudspeaker is shown in Fig. 18B, and off-axis polar data is shown in Fig. 18C. A system such as this would normally be used for speech purposes in highly reverberant spaces where the loudspeaker's DI of 6 dB at LF would work to its advantage. Vertical stacks of the device can increase total output capability as well as increase the on-axis DI.

HORNS AND COMPRESSION DRIVERS

Early Development
Many engineers and physicists have contributed to horn and compression driver development over the years. Early versions of the horn were used by many tinkerers who basically did not understand how the horn worked—they knew only that somehow the horn increased acoustical output [58]. The first example of thorough engineering was carried out by Bell Telephone Laboratories [83], working from the model of horn impedance described by Webster [81]. Significant later development was carried out by Klipsch [66], who designed a remarkably compact bass horn, and Salmon [76, 77], who described the impedance characteristics of several important horn flares, including the hyperbolic, or Hypex, profile [68]. Geddes [55] sought to position Webster's model as a special case within a broader context.

Fig. 19 shows the real part of the radiation impedance for hyperbolic, exponential, and conical horn profiles. Here, only the exponential and hyperbolic profiles provide useful output at low frequencies. In our discussion we will restrict ourselves to the exponential profile, since it has found almost universal application over the years.

Fig. 19. Radiation resistance for conical, exponential, and hyperbolic horns.

Fig. 20A shows the real and imaginary parts of throat impedance for a long exponential horn. For a horn of practical length, we might observe impedance components such as those shown in Fig. 20B. The slight peaks and dips in response are due to reflections from the mouth of the horn back to the throat. There is an optimum mouth size for a horn of specific cutoff frequency to minimize reflected waves from the horn's mouth [63].

Fig. 20. Real and imaginary components of horn impedance for a long exponential horn (A) and a short exponential horn (B).

Theoretical Modeling
The compression driver is designed to match the impedance of the electromechanical system to the throat of the horn, and the radiation impedance, reflected to the electrical side of the circuit, is:

RET = ST(Bl)²/(ρ0cSD²)     (3)

where ST is the area of the driver throat and SD is the area of the driver diaphragm. The phasing plug in the driver is the means by which the ratio of the two areas is adjusted for best HF response.

When the driver is attached to the horn, the efficiency in the range where the horn's radiation impedance is essentially resistive is:

η = 2RERET/(RE + RET)²     (4)

where RE is the voice coil resistance. When the voice coil resistance is made equal to the radiation resistance, the efficiency of the driver over its normal pass-band will in theory be 50%. In practice, efficiencies of the order of 30% can be achieved in the midrange—and this is only about 2 dB below the theoretical maximum.
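Equations (3) and (4) make the matched-load result easy to verify numerically. The driver parameters below (Bl, RE, diaphragm and throat areas) are hypothetical and chosen only to illustrate the calculation; they are not data for any driver mentioned in the text.

```python
import math

RHO0, C = 1.18, 344.0  # air density (kg/m^3) and speed of sound (m/s), approximate

def reflected_throat_resistance(st, sd, bl):
    """Eq. (3): radiation impedance reflected to the electrical side."""
    return (st * bl**2) / (RHO0 * C * sd**2)

def driver_efficiency(re, ret):
    """Eq. (4): efficiency where the horn throat impedance is resistive."""
    return (2 * re * ret) / (re + ret) ** 2

# Hypothetical large-format compression driver (illustration only)
bl, re = 12.0, 6.0            # T·m, ohms
sd = math.pi * 0.05 ** 2      # 100-mm diaphragm area, m^2
st = sd / 8.0                 # ~8:1 loading ratio set by the phasing plug (assumed)
ret = reflected_throat_resistance(st, sd, bl)
print(f"RET = {ret:.1f} ohms, efficiency = {100 * driver_efficiency(re, ret):.0f}%")
print(f"matched case (RET = RE): {100 * driver_efficiency(re, re):.0f}%")  # 50% in theory
```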
Region of Flat Power Output
The data of Fig. 21 shows the normal power response for a compression driver/horn combination when the horn's throat impedance is resistive. The LF limit is due to the primary resonance of the driver; for a typical HF compression driver this may be in the range of 500 Hz.

Fig. 21. Plane wave tube measurements of a compression driver showing amplitude response and impedance.

The principal midband rolloff commences at what is called the mass break point, fHM, given by:

fHM = (Bl)²/(πREMMS)     (5)

where MMS is the mass of the moving system. For most HF compression drivers the mass breakpoint takes place in the range of 2500 to 4500 Hz. It is considered a fundamental limit in HF drivers, inasmuch as today's magnetic flux densities are normally maximized in the range of 2 tesla, and low moving mass is limited by metals such as titanium and beryllium that are not likely to be replaced in the near future.

Two additional inflection points are often seen in the HF driver response curve. One is due to the volume of the front air chamber in the driver, the space between the diaphragm and the phasing plug; its effect on response may be seen as low as 8 kHz in some drivers. Voice coil inductance may cause an additional HF rolloff at high frequencies. This may be compensated for through the use of a silver or copper shorting ring plated on the polepiece in the region of the voice coil. (See Distortion in Transducer Magnetic Systems, page 419.)

Cone-Driven Horns
From the earliest days, cone transducers have been employed as horn drivers [71, 72]. The theoretical principles that govern the design parameters for horn drivers apply equally to the adaptation of cone drivers as well as to purpose-designed compression drivers. Keele [60] presented a straightforward and useful analysis of LF horn design using both Thiele–Small and traditional electromechanical parameters. Leach [69] summarized Keele's work, together with Small's approach to the subject [78], and addressed other factors such as reactance annulling.

Reactance Annulling
In some compression driver designs, a mechanical stiffness in the form of a small air chamber is located behind the driver's diaphragm. The mechanical reactance resulting from the stiffness cancels in part the mass reactance portion of the radiation impedance, resulting in a more resistive impedance in the region of the cutoff frequency. The effect of this is greater acoustic output in the horn cut-off frequency range for a given drive signal [73, 74]. Reactance annulling is not normally used in HF compression drivers, but it is used in the design of bass horns, most notably in the case of the Klipschorn, where the normal response associated with a 47-Hz flare rate is extended down to about 40 Hz [66].

Fig. 22. Second-harmonic distortion in two horn systems, using a compressed fundamental. (Data courtesy JBL.)

Fig. 23. Plane wave tube amplitude response of three compression drivers.

Distortion
The dominant cause of distortion in compression driver–horn systems is due to thermodynamic, or air, overload [75]. This comes as a result of the extremely high pressures that exist at the horn throat:

LP = 94 + 20 log (WAρ0c/ST)^0.5     (6)

where WA is the acoustical power generated and ST is the throat area (m²).

For example, in plane wave propagation, an intensity of one watt per square centimeter will produce a sound pressure level of 160 dB LP. For levels in this range, successive pressure peaks are tilted forward as they propagate down the horn due to the increase in sound velocity at elevated temperatures under adiabatic conditions.

Thuras, Jenkins, and O'Neil [80] and Goldstein and McLachlan [57] analyzed the problem, leading to a simplified equation that gives the percent second-harmonic distortion in horn systems:

% 2nd HD = 1.73(f/fC)√IT × 10⁻²     (7)

where IT is the intensity in watts per square meter at the horn's throat, f is the driving frequency, and fC is the cutoff frequency of the horn.
Fig. 22 presents measurements of the second-harmonic distortion produced by two horns of differing flare rates. The fundamental output in each case was held constant at a level of 107 dB at a distance of one meter, and the second-harmonic distortion has been raised 20 dB for ease in reading. The scale on the right ordinate indicates the second-harmonic distortion in percentage. The cutoff frequency of the horn used for the data shown in Fig. 22A is 70 Hz, while that of the horn used for the data shown in Fig. 22B is approximately 560 Hz. The average difference in distortion is 8 dB.

A horn with a high cutoff frequency has a rapid flare rate and as such may lack good directional control at low frequencies. This is a tradeoff that the design engineer has to reconcile for a variety of applications. For example, sound-reinforcement applications require specific pattern control in the range from 300 Hz upward, while music-monitoring applications may require horn pattern control no lower than about 800 Hz.

If the exact mechanism for a given kind of distortion can be defined mathematically, a model can be implemented and used to predistort the signal, resulting in reduced distortion in the system's output over a given power operating range. Klippel [65] describes some of the techniques for accomplishing this.

The Role of Secondary Resonances
As shown earlier in Region of Flat Power Response (page 416), the power response of a horn driver is flat up to its mass break point, above which the response rolls off 6 dB per octave. However, beneficial secondary resonances may be used to increase the driver's output above this point [64, 70]. These resonances generally occur in the surround sections of the diaphragm and are decoupled from the diaphragm itself. As in the case of decoupled resonances in cones discussed earlier, the lowering of moving mass at higher frequencies can result in a considerable increase in useful HF response. Fig. 23 shows the response for three compression drivers, all with 100-mm-diameter diaphragms and mounted on the same horn. Driver A has an aluminum diaphragm and half-roll surround. The secondary resonance is about 9 kHz. Response is maintained fairly flat to that frequency, falling off rapidly above.

Fig. 24. Directivity of various exponential horns. (Data from Olson, 1957.)

Fig. 25. Beamwidth and directivity data for typical radial horn (A) and slant-plate lens system (B).


Fig. 26. Directivity of a uniform coverage horn. Basic directivity regimes (A); beamwidth and directivity for a 90° by 40° uniform directivity horn (B).

Fig. 27. Attenuation of sound with distance from point, line, and planar sound sources.

Driver B has a beryllium diaphragm with a half-roll surround. Note that, due to the greater stiffness and lower mass of the material, the secondary resonance has shifted out to about 17 kHz. Driver C has an aluminum diaphragm with distributed surround geometry that moves the secondary resonance to beyond 20 kHz, resulting in smooth, extended response within the normal audio band with no pronounced peaks [70].

Directional Response
The basic exponential horn exhibits directional response as shown in Fig. 24. From the earliest days it was recognized that directional characteristics were a key element of loudspeaker performance [84]. Over decades of development, numerous methods have been used to improve directional performance at high frequencies for sound reinforcement applications. For multicellular horns, in the early days [83], groups of exponential cells, each about 15° wide in each plane, were clustered together to define a specific solid radiation angle; this produced excellent results at mid-frequencies, but there was pronounced "fingering" of the response along the cell boundaries at higher frequencies. For radial horns, in this application, the horn's horizontal profile is conical, with straight, radial sides defining a target coverage angle, and the vertical profile is tailored to make a net exponential profile along the horn's primary axis; the nominal horizontal and vertical -6 dB beamwidth of a radial horn is shown in Fig. 25A [71]. For acoustical lenses, a slant-plate acoustical lens can be placed at the mouth of an exponential horn to diverge the exiting waves in one dimension, as shown in Fig. 25B [54, 67].

Constant Directivity Horns
Also known as uniform coverage or constant coverage horns, these designs date from the mid-1970s to the early 1980s [59, 62, 79]. The basic design common to a number of manufacturers uses a combination of exponential or conical throat loading, diffraction wave guide principles, and flared terminations to produce uniform nominal coverage angles in the horizontal and vertical planes. The general shape of the beamwidth curve is shown in Fig. 26A, as it applies to the horizontal and vertical planes independently. Fig. 26B shows the measured beamwidth and DI of a typical constant directivity horn with nominal 90°-by-40° pattern control.
Within certain limitations, acoustic waveguide theory has proposed an alternate approach to achieving similar goals [55, 56].

ARRAYS
Both horns and direct radiators may be treated the same in terms of arraying. In this section we will examine some useful concepts. Single-element, line, and planar arrays differ in their radiation characteristics over distance, as shown in Fig. 27. At long wavelengths, the simple inverse square relationship of a point source at A is modified by a line array as shown at B, and a very large planar array will show little attenuation with distance up to limits proportional to the array dimensions [98]. A finite planar array will have the characteristics shown at D. Long horizontal line arrays have been placed above prosceniums in performance spaces to extend the range of a high direct-to-reverberant ratio toward the rear of the space; large planar arrays are the mainstay of mega-event music reinforcement [87].

Fig. 28. Directivity of a four-element vertical line array with 0.2 meter separation between driver centers. 200 Hz (A); 350 Hz (B); 500 Hz (C); 1 kHz (D); directivity factor for arrays of 4, 6, 8, and 10 elements (E).

Fig. 29. Tapering of arrays. Electrical tapering (A); acoustical tapering (B); tapering through component rotation (C).

The Simple Line Array
Kuttruff [95] describes the polar response of a line array of omnidirectional sources in the plane of the array as:

R(θ) = sin(½Nkd sin θ) / (N sin(½kd sin θ))     (8)

where N is the number of elements in the array, k is 2πf/c, d is the spacing of the elements in the array, c is the speed of sound, and θ is the measurement angle in radians.
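Equation (8) is straightforward to evaluate. The sketch below computes the normalized response of a four-element array with 0.2-m spacing, as in Fig. 28, at a few angles; the implementation details (the on-axis limit handling and the chosen angles) are an illustration rather than part of the original analysis.

```python
import math

def line_array_response(theta_deg, n, d, f, c=344.0):
    """Eq. (8): normalized far-field response of a line of omnidirectional sources."""
    k = 2 * math.pi * f / c
    x = 0.5 * k * d * math.sin(math.radians(theta_deg))
    if abs(math.sin(x)) < 1e-12:        # on-axis (or grating-lobe) limit
        return 1.0
    return abs(math.sin(n * x) / (n * math.sin(x)))

# Four elements at 0.2-m spacing, as in Fig. 28
for f in (200, 350, 500, 1000):
    levels = [round(20 * math.log10(max(line_array_response(a, 4, 0.2, f), 1e-6)), 1)
              for a in (0, 30, 60, 90)]
    print(f, "Hz  response at 0/30/60/90 deg (dB):", levels)
```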


For four elements as shown in Fig. 28, the polar response is shown in A through D. The directivity factor is shown at E. The four-element array will exhibit good pattern control over the range from d/λ = 0.5 to 2.0. At higher frequencies the pattern will exhibit narrowing and lobing, and simple arrays of more than six elements will generally have unsatisfactory characteristics.

Tapering the Line Array
The accepted method of extending the uniform coverage range of a line array is through frequency tapering, or shaping, to allow the array to, in effect, reduce in size with rising frequency. Some techniques are shown in Fig. 29. Electrical frequency tapering is shown at A, and acoustical frequency tapering is shown at B [94]. A unique "barber-pole" array is shown at C [93].

The Product Theorem
The product theorem [91] states that an array composed of uniform directional elements will exhibit the same far-field response as a like array of omnidirectional elements multiplied by the directional properties of one of the single directional elements. This is another way of stating the principle of superposition, and it is useful in estimating the directional response of complex arrays.

The Bessel Array
Franssen [88] describes an array of elements whose amplitude drive characteristics are derived from Bessel coefficients [90, 92]. A simple 5-element array is shown in Fig. 30A, and simplified wiring diagrams to derive the drive coefficients are shown at B and C. The far-field response of the array, modeled with omnidirectional sources, is shown at D. Note that the response is essentially omnidirectional over a 100-to-1 wavelength ratio.

Fig. 30. The Bessel array. Layout (A); series wiring diagram (B); parallel wiring diagram (C); polar response (D). (Data courtesy Keele, 1990.)

Via the product theorem, each omnidirectional element could be replaced with a directional element, all oriented in the same direction, with the resulting response of the array exhibiting the chosen directional characteristic over the same 100-to-1 wavelength ratio.

The Bessel array has potential for speech reinforcement in live spaces. Its phase response varies with angle and frequency, however, and thus it may be difficult to integrate the concept into a system that includes standard radiating elements.

Very Large Arrays for Music
Concert sound reinforcement in very large venues, indoors or out, requires large arrays, and the accepted method of assembling these arrays is to use building blocks that are each relatively full-range units. Thus, the assembled system, normally resembling a large plane, or sets of planes with curved sections connecting them, has much in common with the principles described in Fig. 31 [87]. As an example of this we show in Fig. 32 an elevation view of a typical large vertical array (A) along with the on- and off-axis response (B) measured in an arc along the listening plane [89].

Fig. 31. Grateful Dead "Wall of Sound" direct radiator system, 1974, using line, planar, and arc segment arrays. Individual discrete systems were employed for each instrument separately from the vocal reinforcement. (Photo courtesy Richard Pechner.)
Fig. 32. A large array for music reinforcement. Physical layout (A); off-axis response on the ground plane (B). (Data courtesy Gander and Eargle.)

Fig. 33. Performance of a continuous array 3 meters in length. Attenuation with distance at 10 kHz (A); polar response at 10 kHz (B).

Fig. 34. Performance of a spiral array. Side view of array (A); polar response of array at 500 Hz, 1, 2, and 4 kHz (B). (Data courtesy M. Ureda.)
The great virtue of these systems is their ability to deliver very high sound pressure levels at considerable distances with relatively low distortion. The primary defect is the dense comb filtering (lobing) and "time smearing" that inevitably results from such a multiplicity of sources covering a given listening position. Actually, since the required acoustic power cannot be achieved by a single source, the aim here should be to keep the coverage at each listening position as uniform as possible. The greater the number of effective sound sources, the finer the lobing patterns and the more uniform the received signal will be. Ideally, we would like the interference peaks and dips among the elements to be well within the ear's critical bandwidths.

Continuous Line Arrays
A continuous line array is an approximation of a uniformly illuminated "ribbon" of sound, and its directional behavior in the far field can be determined by equation (8). At low frequencies, the far field for a straight line array begins approximately at a distance equal to the array length divided by π [98]. For progressively shorter wavelengths, this distance increases according to the following equation:

r = l²f/700 meters     (9)

where l is the array length, and r and l are in meters [99].

Fig. 33A shows the attenuation pattern with distance from a straight 3-meter array at 10 kHz, and the polar response in the far field of that array is shown at B. Note that the beamwidth (-6 dB) is 0.8°, as given by the equation:

θ(line array) = 2 sin⁻¹(0.6λ/l)     (10)

where l is the length of the array and λ is the wavelength [99].

The pronounced HF beaming of straight arrays is of course a liability, and articulation of the array is one way of dealing with the problem.

J and Spiral Arrays
Ureda [100] describes a J array as having two distinct portions: a straight upper section and a uniformly curved lower section. Each segment of the array will act independently, with the upper section producing a highly directive HF beam for distant coverage and the lower section producing broader radiation for near coverage.

Ureda proposes the spiral array for more uniform overall coverage. The spiral array is continuously curved from the top down, beginning with small angular increments, which increase downward in arithmetic fashion. Fig. 34A shows a side view of such an array with a total length of 5 meters and a terminal angle of 45°. The directivity function is remarkably uniform with frequency over about a decade. Fig. 34B shows a group of polar plots from 500 Hz to 5 kHz.
HISTORICAL PERSPECTIVES AND TECHNOLOGY OVERVIEW
OF LOUDSPEAKERS FOR SOUND REINFORCEMENT
levels, signal delay, and frequency speakers and Their Development,” no. 4 (1937).
tapering individually adjustable for SMPTE (March 1937). [21] Olson, H., “Horn Loudspeak-
each transducer. [8] Frayne, J. and Locanthi, B., ers, Part II. Efficiency and Distortion,”
Relatively simple arrays can be “Theater Loudspeaker System Incor- RCA Review, vol. II, no. 2 (1937).
reconfigured, through sequential tim- porating an Acoustic Lens Radiator,” [22] Rice, C. and Kellogg, E.,
ing, to steer their beams as needed [85, SMPTE, 63:3, pp. 82-85 (September “Notes on the Development of a New
86, 96, 97]. While far-field modeling 1954). Type of Hornless Loudspeaker,”
may be fairly simple, the fact that many [9] Green, I. and Maxfield, J., “Pub- Transactions, AIEE, volume 44, pp.
listeners are seated in the transition lic Address Systems,” Bell System 982-991 (September 1925). Reprinted
region between near and far fields Technical Journal, 2:2, p. 113 (April J. Audio Eng. Soc., 30:7/8, pp. 512-
makes the problems of reconfiguration 1923). Reprinted in J. Audio Eng. 521 (July/August 1982).
and uniformity of coverage fairly com- Soc., 25:4, pp. 184-195 (April 1977). [23] Thiele, N. and Small, R., Direct
plex to estimate. [10] Hilliard, J., “An Improved The- Radiator Sealed Box, Vented Box, and
The relatively small system shown in ater-Type Loudspeaker System,” J. Other Papers Collected in AES Loud-
Fig. 35 can be configured via a PC by Audio Eng. Soc., 17:5, pp. 512-514 speaker Anthologies, Volumes 1, 2,
the user as required. The system profile (Oct. 1969). Reprinted in JAES, Nov and 3, Audio Engineering Society,
and signal flow diagram are shown at A 1978. New York, 1978, 1984, 1996.
and B, and a family of typical far-field [11] Hilliard, J., "A Study of The- [24] Thrasher, F., ed., Okay for
polar plots is shown at C. Systems such ater Loudspeakers and the Resultant Sound . . . How the Screen Found Its
as these, large or small, are presently Development of the Shearer Two-Way Voice, Duell, Sloan, and Pearce, New
used to solve intelligibility problems in Horn System," SMPTE, pp. 45-59 York, 1946.
a variety of large reverberant spaces. (July 1936). [25] Villchur, E., “Problems of Bass
[12] Hilliard, J., “Historical Review Reproduction in Loudspeakers,” J.
REFERENCES AND of Horns Used for Audience-Type Audio Eng. Soc., 5:3, pp. 122-126
SUPPLEMENTAL Sound Reproduction,” J. Acoust. Soc. (July 1957).
BIBLIOGRAPHY Am., 59:1, pp. 1-8 (January 1976). [26] Villchur, E., “Revolutionary
[13] Keele, D., “An Efficiency Con- Loudspeaker and Enclosure,” Audio,
Historical Perspectives stant Comparison between Low-Fre- vol. 38, no. 10 (October 1954).
[1] Beranek, L. Acoustics, John quency Horns and Direct Radiators,” [27] Wente, E. and Thuras, A.,
Wiley & Sons, New York, 1954. Cor- presented at the 54th AES Conven- “Auditory Perspective—Loudspeakers
rected edition published by the Ameri- tion, Los Angeles, 4-7 May 1976; and Microphones,” Electrical Engi-
can Institute of Physics for the Acous- preprint 1127. neering, vol. 53, pp. 17-24 (January
tical Society of America, 1986. [14] Kock, W. and Harvey, F., 1934). Also, BSTJ, XIII:2, p. 259
[2] Beranek, L., “Loudspeakers and “Refracting Sound Waves,” J. Acoust. (April 1934), and J. Audio Eng. Soc.,
Microphones,” J. Acoust. Soc. Am., Soc. Am., 21:5, pp. 471-481 (Septem- volume 26, number 3 (March 1978).
26:5 (1954). ber 1949).
[3] Clark, L. and Hilliard, J., “Head- [15] Lansing, J., “New Permanent Direct Radiators
phones and Loudspeakers,” Chapter Magnet Public Address Loudspeaker,” [28] Badmaieff, A., “Sound Repro-
VII in Motion Picture Sound Engi- SMPTE, 46:3, pp. 212 (March 1946). ducing Device,” Altec “Duocone,” U.
neering, D. Van Nostrand, New York, [16] Lansing, J. and Hilliard, J., “An S. Patent 2,834,424, issued 13 May
1938. Improved Loudspeaker System for 1958; filed 26 January 1956.
[4] Eargle, J. and Gelow, W., “Per- Theaters,” SMPTE, 45:5, pp. 339-349 [29] Bank, G., and Harris, N., “The
formance of Horn Systems: Low-Fre- (November 1945). Distributed Mode Loudspeaker—The-
quency Cut-off, Pattern Control, and [17] Locanthi, B., “Application of ory and Practice,” AES UK Confer-
Distortion Trade-offs,” presented at Electric Circuit Analogies to Loud- ence, London (16-17 March 1998).
the 101st AES Convention, Los Ange- speaker Design Problems,” IRE Trans- [30] G. Beers and H. Belar, “Fre-
les, 8-11 November 1996; preprint actions on Audio, volume PGA-4, quency Modulation Distortion in
4330. March 1952. Reprinted in J. Audio Loudspeakers,” Proceedings, IRE,
[5] Engebretson, M. and Eargle, J., Eng. Soc., 19:9, pp. 778-785 (1971). volume 31, number 4 (April 1943).
“Cinema Sound Reproduction Sys- [18] Martin, D., “Speaker Technol- Reprinted in J. Audio Eng. Soc., 29:5,
tems: Technology Advances and Sys- ogy for Sound Reinforcement,” Studio pp. 320-326 (May 1981).
tem Design Considerations,” SMPTE, Sound, March 1976. [31] Benson, J. E. “Theory and
91:11 (1982). Also see AES preprint [19] Novak, J. “Performance of Design of Loudspeaker Enclosures,”
1799. Enclosures for Low-Resonance, High Amalgamated Wireless Australia
[6] Engebretson, M., “Low Fre- Compliance Loudspeakers,” J. Audio Technical Review (1968, 1971, 1972).
quency Sound Reproduction,” J. Eng. Soc., 7:1, pp. 29-37 (January [32] Button, D., “Heat Dissipation
Audio Eng. Soc., 32:5, pp. 340-352 1959). and Power Compression in Loud-
(May 1984). [20] Olson, H., “Horn Loudspeak- speakers,” J. Audio Eng. Soc., 40:1/2,
[7] Flanagan, C., Wolf, R., and ers, Part I. Impedance and Directional pp. 32-41 (January/February 1992).
Jones, W., “Modern Theater Loud- Characteristics,” RCA Review, vol. I, [33] Gander, M., “Moving-Coil
430 J. Audio Eng. Soc., Vol. 52, No. 4, 2004 April
HISTORICAL PERSPECTIVES AND TECHNOLOGY OVERVIEW
OF LOUDSPEAKERS FOR SOUND REINFORCEMENT
Loudspeaker Topology and an Indica- sented at the 105th AES Convention, pp. 250-256 (March 1924); discussion
tor of Linear Excursion Capability,” J. San Fransisco, 1998; preprint 4855. pp. 1191-1197. Reprinted in J. Audio
Audio Eng. Soc., 29:1, pp. 10-26 (Jan- [46] Siemens, E., U.S. Patent Eng. Soc., 25:9, pp. 573-585 (Septem-
uary/February 1981). 149,797 (1874). ber 1977) and volume 26, number 3
[34] Gander, M., “Dynamic Linear- [47] Small, R., “Closed-box Loud- (March 1978).
ity and Power Compression in Mov- speaker Systems, Parts 1 and 2.” J. [59] Henricksen, C. and Ureda, M.,
ing-Coil Loudspeakers,” J. Audio Eng. Audio Eng. Soc., 20:10, pp. 798-808 “The Manta-Ray Horns,” J. Audio
Soc., 34:9, pp. 627-646 (September (December 1971) and 21:1, pp. 11-18 Eng. Soc., 26:9, pp. 629-634 (Septem-
1986). (January/February 1972). ber 1978).
[35] Harris, N., and Hawksford, O., [48] Strahm, C., “Complete Analy- [60] Keele, D., “Low-Frequency
“The Distributed Mode Loudspeaker sis of Single and Multiple Loud- Horn Design Using Thiele-Small
as a Broad-Band Acoustic Radiator,” speaker Enclosures,” presented at the Parameters,” presented at the 57th
presented at the AES 103rd Conven- 81st AES Convention, Los Angeles, AES Convention, Los Angeles, 10-13
tion, New York, September 1997, 12-16 November 1986; preprint 2419. May 1977; preprint 1250.
preprint 4526. [49] Thiele, N., “Loudspeakers in [62] Keele, D., “What’s So Sacred
[36] Henricksen, C., “Heat Transfer Vented Boxes, Parts 1 and 2,” J. About Exponential Horns,” presented
Mechanisms in Loudspeakers: Analy- Audio Eng. Soc., 19:5 and 6, pp. 382- at the 51st AES Convention, Los
sis, Measurement, and Design,” J. 392, 471-483 (May and June 1971). Angeles, 13-16 May 1975; preprint
Audio Eng. Soc., 35:10, pp. 778-791 [50] Vanderkooy, J., “Loudspeaker 1038.
(October 1987). Ports,” presented at the 103rd AES [63] Keele, D., “Optimum Horn
[37] Hunt, F., Electroacoustics, J. Convention, New York, September Mouth Size,” presented at the 46th
Wiley & Son, New York (1954). 1997; preprint 4523. AES Convention, New York, Septem-
Reprinted by the American Institute of [51] Vanderkooy, J., “Nonlinearities ber 1973; preprint 933.
Physics for the Acoustical Society of in Loudspeaker Ports,” presented at [64] Kinoshita, S. and Locanthi, B.,
America, 1982, p. 59. the 104th AES Convention, Amster- “The Influence of Parasitic Reso-
[38] Keele, D. B. (Don), “Maximum dam, The Netherlands, May 1998; nances on Compression Driver Loud-
Efficiency of Direct Radiator Loud- preprint 4748. speaker Performance,” presented at
speakers, 91st AES Convention, New [52] Wolff, I., and Malter, L., the 61st AES Convention, New York,
York, October 1991; preprint 3193. “Sound Radiation from a System of November 1978; preprint 1422.
[39] Klapman, “Interaction Circular Diaphragms,” Physical [65] Klippel, W., “Modeling the
Impedance of a System of Circular Review, vol. 33, pp. 1061-1065 (June Nonlinearities in Horn Loudspeakers,”
Pistons,” J. Acoustical Society of 1929). J. Audio Eng. Soc., 44:6, pp. 470-480
America, vol. 11, pp. 289-295 (Jan- [53] Zacharia, K. and Mallela, S., (June 1996).
uary 1940). “Efficiency of Multiple-Driver [66] Klipsch, P., “A Low-Frequency
[40] Klipsch, P., “Modulation Dis- Speaker Systems,” presented at the Horn of Small Dimensions,” J.
tortion in Loudspeakers: Parts 1, 2, IREE (Australia) Convention, 1975. Acoust. Soc. Am., vol. 13, pp. 137-144
and 3,” J. Audio Eng. Soc., 17:2, pp. (October 1941). Reprinted in J. Audio
194-206 (April 1969); 18:1, pp. 29-33 Horns and Compression Drivers Eng. Soc., 27:3, pp. 141-148 (March
(February 1970); 20:10, pp. 827-828 [54] Frayne, J. and Locanthi, B., 1979).
(December 1972). “Theater Loudspeaker System Incor- [67] Kock, W. and Harvey, F.,
[41] Leach, M., “Electroacoustic- porating an Acoustic Lens Radiator,” “Refracting Sound Waves,” J. Acoust.
Analogous Circuit Models for Filled SMPTE, 63:3, pp. 82-85 (September Soc. Am., 21:5, pp. 471-481 (Septem-
Enclosures,” J. Audio Eng. Soc., 1954). ber 1949).
37:7/8, pp. 586-592 (July 1989). [55] Geddes, E., “Acoustic Wave- [68] Leach, M., “A Two-Port Anal-
[42] Olson, H., Acoustical Engi- guide Theory,” J. Audio Eng. Soc., 37: ogous Circuit and SPICE Model for
neering, D. Van Nostrand, New York, 7/8, pp. 554-569 (July/August 1989). Salmon’s Family of Acoustic Horns,”
1957. Reprinted by Professional [56] Geddes, E., “Acoustic Wave- J. Acoust. Soc. Am., 99:3, pp. 1459-
Audio Journals, Philadelphia, PA, guide Theory Revisited,” J. Audio 1464 (March 1996).
1991. Eng. Soc., 41:6, pp. 452-461 (June [69] Leach, M., “On the Specifica-
[43] Olson, H., “Gradient Loud- 1993). tion of Moving-Coil Drivers for Low-
speakers,” J. Audio Eng. Soc., 21:2, [57] Goldstein, S. and McLachlan, Frequency Horn-Loaded Loudspeak-
pp. 86-93 (March 1973). N., “Sound Waves of Finite Ampli- ers,” J. Audio Eng. Soc., 27:12, pp.
[44] Roozen, N. B., et al., “Vortex tude in an Exponential Horn,” J. 950-959 (December 1979). Com-
Sound in Bass-Reflex Ports of Loud- Acoust. Soc. Am., vol. 6, pp. 275-278 ments: J. Audio Eng. Soc., 29:7/8, pp.
speakers, Parts 1 and 2,” J. Acoust. (April 1935). 523-524 (July/August 1981).
Soc. Am., vol. 104, no. 4, (October [58] Hanna, C., and Slepian J., “The [70] Murray, F. and Durbin, H.,
1998). Function and Design of Horns for “Three-Dimensional Diaphragm Sus-
[45] Salvatti A., Button, D., and Loudspeakers,” Transactions, AIEE, pensions for Compression Drivers,” J.
Devantier, A., “Maximizing Perfor- vol. 43, pp. 393-404 (February 1924); Audio Eng. Soc., 28:10, pp.720-725
mance of Loudspeaker Ports,” pre- also, abridged text in J. AIEE, vol. 43, (October 1980). ➥
J. Audio Eng. Soc., Vol. 52, No. 4, 2004 April 431
HISTORICAL PERSPECTIVES AND TECHNOLOGY OVERVIEW
OF LOUDSPEAKERS FOR SOUND REINFORCEMENT
[71] Olson, H., “A New High-Effi- 1934). Also, Bell System Technical J., Applied Science Publishers, London,
ciency Theatre Loudspeaker of the XIII:2, p. 259 (April 1934). Reprinted 1979.
Directional Baffle Type,” J. Acoust. in J. Audio Eng. Soc., 26:7/8, pp. 518- [96] Meyer, D., “Multiple-Beam
Soc. Am., pp. 485-498 (April 1931). 525 (July/August 1978). Electronically Steered Line-Source
[72] Olson, H., “Recent Develop- [84] Wolff and Malter, “Directional Arrays for Sound Reinforcement
ments in Theatre Loudspeakers of the Radiation of Sound,” J. Acoust. Soc. Applications,” J. Audio Eng. Soc.,
Directional Baffle Type,” SMPTE Am., vol. 2, pp. 201-241 (October 38:4, pp. 237-249 (April 1990).
(May 1932). 1930). [97] Meyer, D., “Digital Control of
[73] Plach, D., “Design Factors in Loudspeaker Array Directivity,” J.
Horn-Type Loudspeakers,” J. Audio Arrays Audio Eng. Soc., 32:1, pp. 747-754
Eng. Soc., 1:4, pp. 276-281 (October [85] Augspurger, G. and Brawley, (1984).
1953). J., “An Improved Collinear Array,” [98] Rathe, E., “Note on Two Com-
[74] Plach, D. and Williams, P., presented at the 74th AES Conven- mon Problems of Sound Reproduc-
“Reactance Annulling for Horn Loud- tion, New York, 8-12 October 1983; tion,” J. Sound and Vibration, vol. 10,
speakers,” Radio-Electronic Engineer- preprint 2047. pp. 472-479 (1969).
ing, pp. 15-17, 35 (February 1955). [86] Augspurger, G., “Near-Field [99] Ureda, M., “Line Arrays: The-
[75] Rocard, M., “Sur la Propaga- and Far-Field Performance of Large ory and Applications," presented at the
tion des Ondes Sonores d’Amplitude Woofer Arrays,” J. Audio Eng. Soc., 110th AES Convention, Amsterdam,
Finie,” Comptes Rendus, p. 161, 16 38:4, pp. 231-236 (April 1990). May 2001; preprint 5304.
January 1933. [87] Davis., D and Wickersham, R., [100] Ureda, M., “‘J’ and ‘Spiral’
[76] Salmon, V., “Hypex Horns,” “Experiments in the Enhancement of Line Arrays,” presented at the 111th
Electronics, vol. 14, pp. 34-35 (July the Artist’s Ability to Control His AES Convention, New York, Decem-
1941). Interface with the Acoustic Environ- ber 2001; preprint 5485.
[77] Salmon, V., “A New Family of ment in Large Halls,” presented a the
Horns,” J. Acoust. Soc. Am., 17:3, pp. 51st AES Convention, Los Angeles, SUPPLEMENTAL
212-218 (January 1946). 13-16 May 1975; preprint number BIBLIOGRAPHY:
[78] Small, R., “Suitability of Low- 1033. [101] Borwick, J., ed., Loudspeaker
Frequency Drivers for Horn-Loaded [88] Franssen, N., “Direction and and Headphone Handbook, third ed.,
Loudspeaker Systems,” presented at Frequency Independent Column of Focal Press, Oxford, UK, 2001.
the 57th AES Convention, Los Ange- Electroacoustic Transducers,” Philips [102] Colloms, M., High Perfor-
les, 10-13 May 1977; preprint 1251. “Bessel” Array, Netherlands Patent mance Loudspeakers, fifth ed., John
[79] Smith, D., Keele, D., and Ear- 8,001,119, 25 February 1980; U.S. Wiley & Sons, New York, 1997.
gle, J., “Improvements in Monitor Patent 4,399,328, issued 16 August [103] Dickason, V., The Loud-
Loudspeaker Design,” J. Audio Eng. 1983. speaker Cookbook, sixth ed., Audio
Soc., 31:6, pp. 408-422 (June 1983). [89] Gander, M. and Eargle, J., Amateur Press, Peterborough, NH,
[80] Thuras, A., Jenkins, R., and “Measurement and Estimation of 2000.
O’Neill, H., “Extraneous Frequencies Large Loudspeaker Array Perfor- [104] Eargle, J., Electroacoustical
Generated in Air Carrying Intense mance,” J. Audio Eng. Soc., 38:4, pp. Reference Data, Van Nostrand Rein-
Sound Waves,” J. Acoust. Soc. Am., 204-220 (1990). hold, New York, 1994.
vol. 6, pp. 173-180 (January 1935). [90] Keele, D., “Effective Perfor- [105] Eargle, J., Loudspeaker Hand-
[81] Webster, A., “Acoustical mance of Bessel Arrays,” J. Audio book, 2nd ed., Kluwer Academic Pub-
Impedance and the Theory of Horns Eng. Soc., 38:10, pp. 723-748 (Octo- lishers, Boston, 2003.
and of the Phonograph,” Proceedings ber 1990). [106] Eargle, J. and Foreman, C.,
of the National Academy of Sciences, [91] Kinsler, L., et al., Fundamen- JBL Audio Engineering for Sound
vol. 5, pp. 275-282 (May 1919). tals of Acoustics, third edition, Wiley, Reinforcement, Hal Leonard Publica-
Reprinted in J. Audio Eng. Soc., New York, 1980. tions, 2002.
25:1/2, pp. 24-28 (January/February [92] Kitzen, J., “Multiple Loud- [107] Langford-Smith, F., ed.,
1977). speaker Arrays Using Bessel Coeffi- Radiotron Designer’s Handbook,
[82] Wente, E. and Thuras, A., “A cients,” Electronic Components & Amalgamated Wireless Valve Co,
High Efficiency Receiver for Horn- Applications, 5:4 (September 1983) Sydney, and Radio Corporation of
Type Loudspeakers of Large Power [93] Kleis, D., “Modern Acoustical America, Harrison, NJ, 1953 (avail-
Capacity,” Bell System Technical Engineering,” Philips Technical able on CD ROM).
Journal, VII:1, p. 40 (January 1928). Review, 20:11, pp. 309-348 (1958/59). [108] Merhaut, J., Theory of Elec-
Reprinted in J. Audio Eng. Soc., 26:3, [94] Klepper, D. and Steele, D. troacoustics, McGraw-Hill, New
pp. 139-144 (March 1978). “Constant Directional Characteristics York, 1981.
[83] Wente, E. and Thuras, A., from a Line Source Array,” J. Audio [109] Olson, H. Solutions of Engi-
“Auditory Perspective—Loudspeakers Eng. Soc., 11:3, pp. 198-202 (July neering Problems by Dynamical
and Microphones,” Electrical Engi- 1963). Analogies, second ed., D. Van Nos-
neering, vol. 53, pp. 17-24 (January [95] Kuttruff, H., Room Acoustics, trand, Princeton, NJ, 1958.

432 J. Audio Eng. Soc., Vol. 52, No. 4, 2004 April


Why advertise in the
Journal of the
Audio Engineering Society?
It's the best way to reach the broadest spectrum of
decision makers in the audio industry.

The more than 12,000 AES members and subscribers


worldwide who read the Journal include
record producers and engineers, audio equipment designers and
manufacturers, scientists and researchers,
educators and students (tomorrow's decision makers).

Deliver your message where you get


the best return on your investment.

For information contact:


Advertising Department, Flavia Elzinga
Journal of the Audio Engineering Society
60 E 42nd Street, Room 2520
New York, NY 10165-2520
Tel: +1 212 661 8528, ext. 34
journal_advertising@aes.org
www.aes.org/journal/ratecard.html
DSP in Loudspeakers
By AES Staff Writer
Loudspeaker systems are getting clev- Compensation of Transducer Nonlin- ing of the voice coil and long term
erer thanks to the incorporation of earities.” He points out that such cor- changes in loudspeaker parameters, and
advanced digital signal processing rection involves using DSP to deal with that any sensor employed to monitor
algorithms that can compensate for what he calls “regular nonlinearities” as the reproduced acoustic signal or cone
some of the deficiencies in the trans- opposed to mechanical defects in con- movement should be robust and stable.
duction process. Various forms of non- struction (such as voice coil rubbing Fig. 1 shows a generic model of the
linear distortion may be reduced or it and cone offset). The regular non-lin- various stages in the loudspeaker at
may be possible to get better perfor- earities are those that would be present which nonlinearities can occur (shown
mance out of smaller units by using in any loudspeaker with the same bold). Notice that only the amplifier
electronics to counteract physical inad- design, owing to physical limitations crossover and room acoustics are con-
equacies. Some of these processes can such as cone excursion, voice coil sidered to be linear elements in the
make use of psychoacoustical phenom- length, and so forth. Such techniques chain, all other elements having some
ena, such as a means of extending the are increasingly required to produce degree of nonlinearity.
perceived bass response without actu- small loudspeakers with a high output Using a signal processing software
ally reproducing the relevant low fre- level, which can result in them being module to compensate for physical
quencies, and it may also be possible to driven into the large signal region aspects of the loudspeaker linearity is
modify the way in which the loud- where linearity problems become more said to provide considerable new free-
speaker interacts with the listening noticeable. In discussing this issue, dom for the designer because it now
room. Finally there are various ways by Klippel says: “Active loudspeaker con- becomes possible to consider loud-
which it may be possible to engineer an trol can only be justified in applications speaker parameters with a different
all-digital signal chain, even using digi- requiring smaller, less expensive trans- order of priority. For example, size,
tal forms of representation right up to ducers giving more output with lower weight, and directivity can be chosen
the point where the binary data is con- distortion at higher sensitivity.” more freely, assuming that distortions
verted into an acoustical waveform. He goes on to describe the generic arising from nonoptimal combinations
This article is based on a selection of characteristics of compensation sys- of such can be compensated for elec-
recent papers that were given at the tems that act to control the output of the tronically. The hardware costs of
23rd AES International Conference on loudspeaker. Such controllers can be implementing such an approach may be
Signal Processing in Audio Recording based upon models of the loudspeaker relatively small, perhaps requiring only
and Reproduction, 23–25 May 2003. in question, so that they model more the addition of a current sensor.
The full proceedings of this confer- precisely the known behavior of the As part of a detailed exposition of
ence can be ordered online at device. These, he points out, are to be techniques, he makes the point that
www.aes.org/publications/conf.cfm. preferred to generic controllers and are with analog linearization techniques
likely to be more successful. It is also there cannot be a delay between the
DISTORTION COMPENSATION important that any control scheme loudspeaker input and the sensor out-
Klippel considers the issues of distor- should be sufficiently flexible to adapt put, therefore a feed-forward adaptive
tion compensation in his paper “Active to changes in conditions such as heat- approach, such as that shown in Fig. 2,

Fig. 1. General model describing the basic signal flow in loudspeakers using linear (thin) and nonlinear (bold) subsystems
(Figs. 1 and 2 courtesy Klippel).

434 J. Audio Eng. Soc., Vol. 52, No. 4, 2004 April


Fig. 2. Adaptive indirect control of a loudspeakers using a generic feed-forward structure both in controller and detector.

has to be adopted. The filter parameters uations.” increased loudspeaker sensitivity com-
are adapted based on the loudspeaker As in Klippel’s paper, Bright consid- pared with traditional designs, owing to
input signal u(t) and the acoustic output ers a feed-forward system, which needs the use of shorter voice coils, together
p(t), making the system self-tuning. to be tuned in situ so as to take into with a reduction in distortion compared
However, the sensor can be deactivated account specific conditions within the with the uncompensated case as shown
at any time and the filter will continue loudspeaker in question such as long- in Fig. 3.
to operate based on current parameters. term drift, aging, and thermal changes.
Bright, in “Simplified Loudspeaker This can be achieved with feedback PSYCHOACOUSTIC LF
Distortion Compensation by DSP,” dis- from an electrical current sensor. Over- EXTENSION
cusses similar ideas. He describes a rel- all this compensation enables manufac- Aarts, in “Applications of DSP for
atively straightforward concept in turers to use a shorter voice coil that can Sound Reproduction Improvement,”
which a discrete-time model of the be located in the most concentrated part describes a novel means for giving the
loudspeaker in question is inverted to of the loudspeaker’s magnetic field, impression that a loudspeaker is
create an appropriate linearization filter. having correspondingly lower mass and reproducing more bass energy than it
He says, “the resulting algorithm for higher sensitivity. While this is nor- actually is. This is based on the psy-
compensation of nonlinear distortion is mally not done because of the nonlin- choacoustic principle of the missing
relatively simple, consisting of one or earity that results at high voice-coil dis- fundamental or “residue pitch,” in
more second-order IIR filter blocks placements, the signal processing which the hearing mechanism tends to
(depending on the order of the linear described makes it possible. assume the presence of a fundamental
dynamics) and several polynomial eval- Experimental results demonstrated frequency when harmonics of that ➥

Fig. 3. Bright’s results from one experimental loudspeaker with a shortened voice coil showing the distortion of an 800-Hz tone.
The left plot shows the acoustic pressure at 28 cm from the loudspeaker, and the right plot shows the amplifier’s output voltage.
The darker line shows the result with distortion compensation active (courtesy Bright).

J. Audio Eng. Soc., Vol. 52, No. 4, 2004 April 435


DSP in Loudspeakers
the possibility that intermodulation dis-
tortion may cause audible artifacts, and
that the added harmonics may interfere
with the existing sound spectrum
resulting in some timbral modification.

ALL-DIGITAL SIGNAL CHAIN


Tatlas et al., in “Towards the All-Digi-
tal Audio/Acoustic Chain: Challenges
and Solutions,” describe various issues
associated with the construction of an
all-digital signal chain from source to
loudspeaker, looking at ways in which
Fig. 4. Proposed implementation scheme for loudspeaker equalization in a room digital signals can be used directly to
(Figs. 4, 5, and 6 and Table 1 courtesy Tatlas et al.). drive loudspeakers.
After considering various aspects of
DSP DAMP DAE wireless networking and data reduction,
the authors consider loudspeaker–room
equalization, which is also covered later
in this article. According to Tatlas et al.,
PCM Q PCM Demux
“The all-digital transducer should incor-
(n/s, o/s)

N
porate sufficient DSP power, not only to
N’ (N’<N)
fs
Rfs realize simple functions such as volume
A control, delay, and shelving filtering, but
also to implement transducer response
DSP
correction and adaptation to its local
DAMP DAE
acoustic environment via the use of
equalization methods. These can incor-
porate FIR filtering on PCM data by
Q
PCM
(n/s, o/s)
PCM PCM to PWM Buffer
using room response inverse filters
N
fs
N’ (N’<N)
Rfs
1
2(2N-1)Rf s
derived from responses individually
B measured for each of the channel
receivers.” A possible implementation
DSP DAMP DAE scheme is shown in Fig. 4. However, as
we will see in the next section, this may
not always be the most appropriate way
of dealing with the problem of the
loudspeaker–room interface.
S/D
PCM
(n/s, o/s) Buffer Tatlas et al. go on to discuss digital
N
loudspeakers, explaining that current
fs 1
Rfs research is centered on two different
approaches—digital transducer arrays
C
(DTA) and multiple voice coil digital
Fig. 5. Three different approaches to the digital transducer array. (A) PCM; (B) PWM; loudspeakers (MVCDL). They prefer to
(C) 1-bit sigma–delta modulation. concentrate on the former for their anal-
ysis owing to promising characteristics
frequency are present even if the fun- frequency audio is also present. of the technology. DTAs are presented
damental is not physically present. There are various advantages and and tested with a variety of different dig-
Small loudspeakers tend to suffer disadvantages to this idea. The advan- ital audio signal types including multibit
from a poor low-frequency response, tages Aarts cites are: little energy is PCM, PWM, and 1-bit sigma–delta
making them ideal candidates for some radiated below the loudspeaker’s cut- (otherwise known as Direct Stream
form of enhancement. In this psychoa- off frequency; less headroom is Digital in the Sony/Philips SACD
coustic bass-enhancement system syn- required compared with more tradi- paradigm).
thetic harmonics of the “missing” bass tional “bass boost” approaches for a A typical digital transducer array, as
(that which the loudspeaker cannot comparable bass effect; the system is shown in Fig. 5, consists of three stages:
reproduce satisfactorily) are derived computationally and power efficient; it DSP (digital signal processing), DAMP
from the audio signal and added to the can be implemented with a simple ana- (digital amplification), and DAE (digital
part of the spectrum where the loud- log circuit if required; and it can also be audio emission).
speaker does radiate well, thereby giv- tuned to any kind and size of loud- The authors show how a typical
ing the impression that the lower- speaker. However, drawbacks include PCM digital transducer array might be
436 J. Audio Eng. Soc., Vol. 52, No. 4, 2004 April
DSP in Loudspeakers
Table 1. Theoretical (A) and actual (B) characteristics of for the possibility
the arrays tested. N is the resolution of the digital signal of obtaining any refer-
and R is the oversampling ratio employed. ence response by this
Digital Transducer Transducer
method. In other words,
Speaker Number Frequency the designer is able to
PCM 2N’-1 Rfs
ensure that the loud-
speaker sounds similar
PWM 2(2N’-1) Rfs
wherever it is placed in
SDM R fs different listening
A rooms.
Measurement of radi-
Digital Transducer Transducer ation resistance requires
Speaker Number Frequency
that measurements are
PCM 63 44–176 kHz taken at two different
PWM 510 44–176 kHz distances from the loud-
Fig. 6 . Simulated 6-bit PCM-based SDM 32 44–176 kHz speaker, using a method
array showing transducer matrix; 1 similar to that shown in
corresponds to the MSB of the digital B
Fig. 8. (A commercial
signal.
implementation of this
constructed in a matrix of small drivers not necessarily optimal despite the fact principle has a motorized microphone,
as arranged in Fig. 6, with each driver that it might intuitively seem so. mounted below the woofer, which pops
having a certain “weight” and driven ABC concentrates on the bass fre- out to two different distances during the
by a specific bit of the digital signal. quencies (20 to 500 Hz), views the calibration phase.) Once the measure-
The authors simulated different types problem from the loudspeaker’s situa- ments are available, a smoothed 16th
and topologies of these arrays, in an tion, and works by examining the radi- order IIR filter is calculated that
attempt to evaluate their relative perfor- ation resistance of the loudspeaker attempts to approximate the desired
mances. The three arrangements used (which is related to the power output). correction. ➥
are described in Table 1. The radiation resis-
Overall it was found that the PWM tance is affected by
version required an impractically large the existence of room
number of drivers as well as showing modes so this method
higher distortion than the other array makes it possible to
types. Oversampled low-bit PCM compensate for the
arrangements showed promising effect of these on the
results with a manageable array size. perceived loud-
The SDM version performed well on- speaker timbre,
axis and its physical characteristics adjusting for the
made it a practical proposition, but its loudspeaker’s current Fig. 7. The basic principle of the ABC system (Figs. 7 and 8
directivity characteristics would position in the room. courtesy Pedersen).
require some attention. Generally, the (For example, the
smaller sized DTAs were found to be bass sound pressure
preferable and the distortion results level might be up to 9 dB higher when
measured off-axis were considerably the loudspeaker is placed in a corner
poorer than those on-axis. when compared to the free field.) By
controlling the acoustic power of the
LOUDSPEAKER–ROOM loudspeaker in this way it is claimed
INTERFACE EQUALIZATION that the equalization of timbre is per-
Pedersen, in “Adaptive Bass Control— ceived at any point in the listening
The ABC Room Adaptation System,” room, rather than being strongly
points out that traditional loudspeaker- affected by changes in listening posi-
room equalization systems concentrate tion (see Fig. 7).
on measuring the transfer response to a The aim of ABC is to measure the
certain listening position and creating loudspeaker radiation resistance in a
an inverse filter. This has various prob- reference position in a reference room
lems including gain differences of and then in the target room, using the Fig. 8. The ABC system embedded
maybe 20 to 30 dB, sensitivity to ABC filter to equalize the resulting into an active loudspeaker. The
changes in listener position (coloration, response so that the target timbre is microphone is mounted on a vertical
rod, which by rotation through 180
preechoes), and nonminimum phase close to the reference timbre. This degrees effectively moves the
components in the filter. Furthermore, a assumes that the reference response microphone to a position 4 cm further
constant amplitude transfer function is timbre is the desired one, but allows away from the diaphragm.
J. Audio Eng. Soc., Vol. 52, No. 4, 2004 April 437
DSP in Loudspeakers
Wilson et al., in “The Loud- ceived sound of the loudspeaker in
speaker-Room Interface—Con- the room and the relative impor-
trolling Excitation of Room tance of early and late sound. A
Modes,” discuss a similar con- tentative proposal is offered that
cept. They reinforce the points psychoacoustic evidence would
about the problems of equalizing suggest it is dangerous to introduce
the magnitude response at the lis- filters that have a notch greater
tening position, and they also than 6 dB (which therefore reduce
claim that it is preferable to con- the decay time of a room mode by
sider the way in which the loud- more than half), but this needs
speaker interacts with room more testing to verify.
modes. They summarize some of Waterfall plots from experi-
the problems noted in relation to mental work suggest a smoothed
loudspeaker-room mode interac- modal response resulting from
tion as follows: this system, as shown in Fig. 10.
1. Some frequencies are empha- Measurements of mode decay
sized or deemphasized, therefore time also suggest that the effect is
some notes in a series are too loud successful over a range of mea-
or missing. suring positions.
2. Some notes overhang or ring Fig. 9. System for measuring and controlling room Karjalainen et al. continue this
on much longer than others, con- modes (Figs. 9 and 10 courtesy Wilson et al.). theme in “Modal Equalization by
tributing to their dominance. Temporal Shaping of Room
3. The pitch changes during the the room between 500 and 2000 Hz. Response.” They too work on the prin-
decay of a note. Such correction can cause distortions ciple that traditional magnitude equal-
4. The pitch of short notes changes, to the direct sound signal from the ization (such as third-octave graphic EQ
such that the pitch heard is different loudspeaker, and so the question arises or basic loudspeaker equalization) can-
from the original. as to what primarily governs the per- not serve to control modal decay time
5. Echoes occur where a single tone
burst or note is changed to two or more,
shorter, tone bursts.
The energy decay rate at modal fre-
quencies tends to be longer than at non-
modal frequencies, causing overhang
and other nasty effects. One of the aims,
therefore, of the approach they describe
appears to be to create a filter such that
the combined response of the filter and
mode has a shorter decay time.
In order to devise the filters needed
to undertake this equalization, a mea-
surement to identify room modes is
required, finding those with the longest A
decay time (see Fig. 9). The authors
describe a novel method of achieving
this that attempts to overcome the prob-
lem that the measuring microphone
may be near a point of minimum SPL
in the modal response at the measuring
position. Using a long Hanning win-
dow, the impulse response is trans-
formed into the frequency domain and
even the smallest peaks are registered
as potential candidates for modes.
Information about the amplitude and
RT60 (reverberation time) of the modes
is used in a weighted fashion to deter-
mine the most critical modes for equal- B
ization. Most emphasis is placed on
RT60. Filters are then calculated with a Fig. 10. Waterfall plot showing room modes excited by the left loudspeaker (A). Plot
target RT based on the average RT in showing controlled excitation of modes from the same loudspeaker (B).
438 J. Audio Eng. Soc., Vol. 52, No. 4, 2004 April
DSP in Loudspeakers

A
Fig. 12. Decay envelopes (Schroeder-integrated) for the
original impulse response (upper solid line), for the AMK
equalized (dotted line), for the ARMA equalized (dashed
line), and for the windowing mode equalized response
(lower solid line).

Fig. 13. Decay time as function of frequency for original


Fig. 11. Waterfall plots showing initial (A) and reduced decay rate response (dash-dot line), the target curve (dashed line),
after modal equalization (B) with AMK filter (Figs. 11, 12, and 13 and for the equalized response (solid line) using the
courtesy Karjalainen et al.). windowing technique.

or the detailed structure of modal closely spaced modes and does not below 200 Hz (see Fig. 12), it was
response. So they resort to more sophis- require the prior estimation of modal found that the new windowing method
ticated forms of digital filtering in the decay rate. was more effective then either of the
time domain that attempt to alter the Fig. 11 shows the effect of such an original two methods, especially if no
detailed modal response of the loud- AMK filter, based on synthetically cre- hand-tuning was used with the AMK
speaker–room combination. ated room modes at frequencies of 50, and ARMA methods.
The principle of the approach is 55, 100, 130, and 180 Hz. The corre- The plot of decay time versus fre-
based on frequency-dependent win- sponding decay times were given as quency (see Fig. 13) shows that the
dowing of the impulse response of the 1.4, 0.8, 1.0, 0.8, and 0.7 s, and the decay time has been reduced success-
system. Essentially the impulse modal equalizer design target was to fully using the new windowing filter at
response at the listening position is reduce those decay times to 0.30, 0.30, most frequencies, close to the target
measured and the decay time deter- 0.26, 0.24, and 0.20 s (some LF rise is maximum (dashed line).
mined at each frequency, then the allowed as this is normal within room It is clear from the examples pre-
impulse response is filtered with an acoustics standards). The authors note sented in the papers summarized here
exponentially decaying function at that both types of filter tend to reduce that the use of DSP in loudspeakers
those frequencies where it is deemed the initial decay rate quite well but that can be an effective tool to counteract
necessary to reduce it. This is achieved then the decay slips back to being inadequacies in their physical charac-
by means of a relatively high-order FIR closer to the original (as noted at the teristics and in their interaction with
(finite impulse response) filter. They lower frequency end on these graphs), the room. It may also be that all-digi-
described two different filtering meth- particularly when the modes are closely tal reproduction chains will one day
ods, known as AMK and ARMA, in spaced (for instance at 50 and 55 Hz as become a reality if the problems asso-
previous papers, the second of which here). ciated with digital transducer arrays
they say is better suited to the fitting of Looking at the overall decay rate can be solved.
J. Audio Eng. Soc., Vol. 52, No. 4, 2004 April 439
SURROUND LIVE!
SUMMARY
Frederick Ampel
Technology Visions, Overland Park, KA, USA

n October 9, Graffy of Arup Acoustics, San

O 2003 a unique
day-long event
took place at the
Manhattan Center Studio
Francisco. Graffy challenged
the attendees to start thinking
about physioceptualsophicacous-
tics, a term he created to encom-
complex’s Grand Ballroom. pass the many complex aspects
Surround Live was the first- of defining and understanding
ever comprehensive event the multichannel audio environ-
devoted exclusively to the ment and its physical and
creation, production, and reproduction psychoacoustic aspects. His presenta-
of live performance audio in multi- tion first explored some of the previous
channel surround. literature on the topic, including contri-
Surround Live brought together butions from Durand Begault on 3-D
nearly 250 working professionals from sound, Jens Blauert on spatial hearing,
a cross section of the audio industry in the extensive literature published by
a one-day interactive workshop, in David Griesinger, Brian C. J. Moore’s
conjunction with the AES 115th work on masking, David R. Perrott’s
Convention in New York, to discuss papers on audio and visual localization
the issues and technological challenges and modalities, J. Robert Stuart’s
created by presenting music, drama, defining work on the psychoacoustics
and theater in full multichannel of multichannel audio, Nick
surround audio formats to a live Zacharov’s papers covering multichan-
audience. nel level alignment, and the AES 16th
In an interesting piece of historical Conference in 1999 in Rovaniemi,
juxtaposition, the Manhattan Center Finland on spatial sound reproduction,
complex is adjacent to and connected as well as other papers from the AES
Surround Live panelists: clockwise
by passageways to what is now the New from above, Fred Ampel, Kurt Graffy, 8th, 12th, and 15th Conferences. He
Yorker Hotel, which was the site of the Kurt Eric Fischer, Bruce Olsen, and then conducted a review and refresher
very first New York AES Convention Steve Schull. for attendees on the physiological
more than a half century ago. aspects such as head shape, torso,
The Surround Live event was orga- SHeDAISY, ZZ Top, and many pinnae; perceptual aspects including
nized into several distinct sections, others—handled the console and the loudness, pitch, localization, envelop-
with formal presentations and discus- external signal processing. A lively ment; the philosophical aspects includ-
sions taking up the morning and a and intense interaction ensued between ing reality modes, visual-auditory
portion of the time after lunch. the attendees, the band, and the mixing modalities; the real-world acoustical
The event then segued into a series team, discussing various ways to aspects such as spectrum, level, signal-
of demonstrations followed by a 90- present the music experience and vari- to-noise, directivity, and reverberant
minute segment with a live band on ous ways to use electronic reverb and level.
stage, showcasing various mixing and spatial effects, and then testing those His presentation then covered a
music-presentation ideas and concepts. concepts live with the band while range of other topics related to multi-
Legendary FOH mixing engineer listening and evaluating the results channel audio including perceptual
Buford Jones—whose 30-year career immediately. environmental distortions, perception
has included tours as a live-sound The live music performance was and depth, and formats and presenta-
mixer or engineer for such luminaries provided by the Surround Live tions of sonic imaging.
as Stevie Wonder, Eric Clapton, Pink Performers featuring guitarist Jeff After a very focused question-and-
Floyd (including a Platinum Record Golub (GRP Records, www.jeffgolub. answer discussion, Bruce Olson of
for “Delicate Sound of Thunder”), com). Olson Sound Design in Minneapolis,
David Bowie, Faith Hill, Jeff Beck, The formal presentation portion of presented a detailed look at theatrical
George Harrison, James Taylor, Surround Live was opened by Kurt sound concepts and formats incorporat-
440 J. Audio Eng. Soc., Vol. 52, No. 4, 2004 April
ing materials supplied by John Leonard Interaction
of Aura Sound Design in London. among the
attendees and
Issues raised by Olson included the
the panelists was
number of channels, loudspeaker loca- very lively
tion and position, audience perception throughout the
of sound effects, and spatial recreation. event.
It is hoped that a more extensive discus-
sion of this topic will be part of
Surround Live 2004 at the AES 117th
Convention in San Francisco with live
demonstrations of many of the points
from Leonard.
Next the attendees heard from Kurt
Eric Fischer of Attic Sound Design. His
sound design and production credits
include Tell Me on Sunday (Kennedy
Center), Marty (Huntington Theater (Longacre Theater), the world Following Fisher was Steve Schull
Company), Dorian (Denver Center premieres of Parade (Drama Desk of Acoustic Dimensions, Dallas,
Theater Company), Blue (Roundabout Nomination) and Whistle Down the Texas. Schull’s extensive background
Theater Company, Audelco Award), Wind, Sunset Boulevard, Jelly’s Last includes production credits on Les
Linda Eder (Carnegie Hall), Finian’s Jam, and Nick and Nora. Fisher Miserables, Grand Hotel, Cats, Lena
Rainbow (Coconut Grove Playhouse, expanded on many of the concepts Horne, Annie, Little Shop of Horrors,
The Cleveland Playhouse), Macbeth addressed by Olson, and added his prac- The Real Thing, Sophisticated Ladies,
(New York Shakespeare Festival), the tical understanding of production reali- Dreamgirls, Fences, and The Rocky
Broadway production and first national ties and the difficulties of achieving Horror Show. He has designed or
tour of Rent, Nine (Eugene O’Neill repeatability for a long running show or worked on aspects of the preservation
Theater), Jesus Christ Superstar (Ford an event that moves from location to and restoration of a number of historic
Center Theater), Fascinating Rhythm location, such as an on-the-road theaters including the New Victory
and Judgment at Nuremberg Broadway show or live music tour. Theater, the New Amsterdam ➥

J. Audio Eng. Soc., Vol. 52, No. 4, 2004 April


SURROUND LIVE SUMMARY
Theater, the New York State Theater at merging multichannel sound and the repeated several time by request. Many
Lincoln Center, the Vivian Beaumont other aspects of a show, using his questions about how things were done
Theater at Lincoln Center, and the experience as sound designer of live and how the production was
Opera House at the Brooklyn Academy Cyberjam, which had just opened at handled followed his presentation.
of Music. And he’s been involved in the the Queens Theatre in London’s West The formal presentations concluded
management of new construction of the End. The production makes extensive with a talk by Marvin Welkowitz from
Lucille Little Performing Arts Center at use of surround sound. Hood also used the NYU School of the Performing
Transylvania University, the Royal examples from his work as sound Arts, Music Technology Department.
Shakespeare Theatre, the Gallagher- designer for blast! (London, He spoke on the “why” of surround.
Bluedorn Performing Arts Center for the Broadway, and the U.S. Tour), and the Since much of the day had been spent
University of Northern Iowa, and the U.S. tour of Shockwave. He was looking at the “what” and “how,”
new performing arts center for Florida recently audio producer of the PBS Welkowitz provided a stimulating look
State University. Schull addressed the special programs “Robert Mirabal, at what benefits surround can offer to
still relatively new ideas of implement- Music from A Painted Cave” and various styles of live events, how
ing and using multichannel sound in “blast! An Explosive Musical sound levels can be better managed
religious facilities and some of the Celebration.” Both were mixed and when the audience is placed within the
more unexpected benefits and prob- released in 5.1 audio format. In his sound field, and how the musical or
lems that occur when surround sound presentation he also linked the concepts show experience can be improved for
is deployed in these facilities. The from live theater with recorded and everyone involved. He challenged both
rapid growth in megachurchs (some studio aspects of surround based on his the panel and the audience to think
have seating for 3000) was a focal work with John Mellencamp, George about how we move ahead with these
point for the presentation. A very Benson, David Sanborn, John Scofield, concepts and how more audio reality
active discussion followed Schull’s Zamfir, Glenn Branca, the Gizmos, can be created. Both the attendees and
talk, and it is planned that he will Suicide, the Cincinnati Pops, and the panelists reacted strongly to his
return to the panel at Surround 2004 hundreds of others. And he added a ideas, which carried over into the ques-
with recorded examples of many of the historical note to the proceedings when tion-and-answer session that followed,
ideas and concepts offered in New he mentioned that he still owns the which continued until the start of the
York. Many attendees were surprised matrix boxes to decode his quadra- live performance.
to learn of the magnitude of the phonic vinyl projects. Surround Live was made possible
megachurch market, in which many Later the attendees “got ready for through an extensive technical support
3000-seat churches now have full-scale some football,” from Randy Hoffner, infrastructure provided through the
rock-and-roll style flying sound manager of technical planning for the generous support of Meyer Sound,
systems, large-screen video systems, ABC Television Network. He began DiGiCo, Shure, and TC Electronics.
and demanding production require- his presentation by noting that while Very special thanks go to Tim
ments. A lively debate ensued on what others were dealing with potential Chapman and the entire Meyer Sound
audiences actually expect or demand at audiences of a few thousand people, and Sound Associates crew, Nick
megachurch services. the Monday Night Football production Woods of Shure, Bob Doyle and the
After a lunch break Duncan crew was supplying surround sound to DiGiCo team, Ed Simeone and the TC
Crundwell spoke about image-place- millions of television viewers every team, and the crew of professionals
ment techniques and level-based local- Monday night. He highlighted the from the Manhattan Center, without
ization technologies and methods. advance of HDTV and 5.1 audio by whom the event would not have been
Crundwell, president of 1602 Group showing two photographs of the same possible. The Surround Live concept
LLC, Alexandria, Virginia, brought to production location; one taken in 2000 and content was created and the event
bear his specific expertise in advanced with a huge analog production truck was produced by Frederick J. Ampel
music control systems from his 17 and small auxiliary digital van, and the of Technology Visions.
years at SSL in pro-audio technical 2003 version with the gigantic Additional special thanks go to the
sales, worldwide customer support, HDTV/surround audio production AES and Roger Furness for allowing
and IT management, and his work in vehicle all by itself at the stadium. the event of take place and supporting
studio design with Munro Associates Now the events are produced in HDTV the concept, and to Mark Herman,
in the UK. Several very interesting and digital sound and down converted Keith and Julie Clark of Live Sound
demonstrations were included using for the analog feeds, the reverse of just International, and ProSoundWeb for
the T-Max hardware/software plat- three years ago. their unwavering commitment and
form, and these produced an active Hoffner provided D5/HDTV 5.1 promotional support. The next install-
debate among attendees and panelists. surround tape of a MNF game that ment of Surround Live, More
The psychoacoustics of “fooling” the showcased the dramatic shift from the Surround! The Art of Multichannel
ear into believing that sounds were in preproduced opening credits in 4x3 Live Audio, is tentatively scheduled for
motion provided a key discussion point and analog stereo, to the dramatic the October 27, 2004, the day before
after this presentation. jump to 16x9 and full 5.1 surround as the opening of the AES 117th
Mark Hood of Echo Park Studios the game went live on air. The effect Convention, October 28–31, 2004 in
followed with a detailed look into truly startled the audience and was San Francisco.
442 J. Audio Eng. Soc., Vol. 52, No. 4, 2004 April
NEWS
OF THE

SECTIONS
We appreciate the assistance of the
section secretaries in providing the
information for the following reports.

Surround Sound
Tomlinson Holman spoke to Chicago Section members on November 6 about the history and future of surround sound. He presented the history of surround as an opera in many acts, with each era of surround being an act.
Holman began by tracing the history of surround sound from its inception centuries ago in antiphonal choirs and organs. He described the history of multichannel reproduction systems from the 1933 Bell Labs demonstrations using 3-channel telephony through current 10.2-channel experiments at TMH Labs. Holman then outlined the technical capabilities of past film sound formats, including Fantasound, Cinerama, Cinemascope, Todd-AO, and several analog Dolby formats on 35- and 70-mm film. He provided informative and often amusing descriptions of the rise and fall of these formats. He talked about the rationale behind the selection of 5.1 discrete channels for digital film audio and how this has been extended to 6.1 and 7.1 channels through the use of matrixing. He also touched on the history of multichannel sound for home, from early 4-channel analog matrixed sound on LP to current surround sound digital formats available on DVD.
According to Holman, using additional bits to increase the bandwidth, sample rate, or the number of bits per channel beyond current standards provides little or no demonstrable improvement in audio quality. However, using these same extra bits to increase the number of channels makes an easily perceptible improvement, even to untrained listeners. Holman explained that it is nearly impossible to ever have enough channels for full transparency. Furthermore, increases in the number of channels will continue to provide improvements in realism.
Holman said that specific shortcomings of the existing 5.1-channel system include poor side images, poor sense of envelopment (especially at low frequencies), poor rear-center imaging, and no vertical information. A system with more channels would address these problems but could still be down-mixed to be compatible with current systems having fewer channels. Data to control the down-mix can be included with the music data, resulting in a scalable system. The meeting concluded with informal discussions.
[Photo: Tom Holman (on the right) chats with Chicago Section members during the November meeting.]

Panel on Mastering
The Pacific Northwest Section’s September 30 meeting, held at the Shoreline Community College Music Building in Shoreline, Washington, featured a panel discussion with six local mastering engineers. Some 145 people—possibly a record—gathered to hear a discussion among panelists Ed Brooks and Rick Fisher of RFI/CD Mastering in Seattle; Mark Guenther of Seattle Disc Mastering; Max Rose of Discmakers’ Seattle office; Al Swanson of the Seattle Symphony and Location Recording; and Steve Turnidge of Ultraviolet Studios. Opening comments were made by PNW committeeman/moderator Bob Gudgel and section chair Dave Tosti-Lane.
Gudgel directed the first question to the panel: What is mastering? The responses reflected the fact that it is no longer composed of the steps needed for simply preparing an LP pressing. Some answers included:
Mastering is “shoehorning” the material into the CD format if it is not already in 16-bit/44.1-kHz;
Mastering is making a marketable CD from material in various states of polish;
Mastering is the last step to a finished product, during which all final details must be made;
Mastering is determining and eradicating the differences in classical material—songs may not have to be matched up, but other similar finish details are made;
Mastering is like a polishing process, or putting a package together to make a homogeneous complete CD.
Panelists addressed the question of whether mastering engineers typically do remixing. The consensus was that sometimes they do, or they may tell the client to go back and remix if the material is not yet acceptable. It was noted that mastering is not like mixing. Mastering engineers are sometimes involved in the sequencing of tracks. The consensus was that little automation is used for mastering.
After stories of ego clashes during recording sessions, Fisher remarked that mastering engineers have the wonderful advantage of not knowing where the “bodies are buried.” Their only concern is the music. Panelists agreed they have to be nonjudgmental about the music—they just shine it up, no matter how good or bad.

[Photo: Pacific Northwest Section members during the September meeting. Photo by Gary Louie.]
Do mastering engineers like to have the client on the premises during mastering? Al Swanson noted that for classical work, it was usually not a good idea to have a client present at the editing stage (what he called the tree stage), but it is okay at the assembly stage (or forest stage). Others recommended having the client there.
What makes an album finished rather than just one with a group of songs? A finished product must meet the expectations of the consumers; for example, when the volume level of each song is even and one does not have to dive for the volume knob.
What about the loudness wars of CDs? Al Swanson noted that even classical music has some issues with wanting it all to be loud. Others commented that most material that is squeezed/compressed and loud arrives for mastering like that. It is hard to make mastering changes if there is no headroom left with which to work.
Is there a preferred format to receive material? The general agreement was that 1/2-inch stereo analog sounds the best, followed by 1/4-inch analog. The exception was Al Swanson, who said he never uses analog anymore. However, others will always use an analog tape stage somewhere during mastering; it is akin to using a processor for the sound, even when the recording will end up digital.
What are the essential tools for mastering? Most agreed that the most important factor was a set of good loudspeakers. Also discussed were the room and signal path preferences. Nearly everyone tried to steer clear of the “magic bullet” gear syndrome, but everyone had something to say about the importance of the room and monitoring system. Fisher mentioned that this is probably the only place where professional audio meets the lunatic-fringe hi-fi gear.
How does one master for end-user playback environments, such as headphones, cars, and living rooms? Swanson noted that there is simply no standard for home environments. Interestingly, no one had much insight into this factor, although they noted that CDs were good for client test copies, as opposed to the old LP pressings and cassette days.
After a break for refreshments, the panelists played some mastering examples on a Genelec system. Delivery formats used to send the master to the pressing plant include: Audio CD-R, although it is not the best choice due to possible inaccuracy in data retrieval; Exabyte DDP, a data tape format that was once popular but is no longer made; ftp (file transfer protocol), which sends audio files straight to the plant via the Internet; and PMCD (Pre-Master CD), an audio CD-R made to full Red Book audio standards. The term originally described a proprietary format from Sony and Sonic Solutions but is now commonly understood as a general media format.
Each panelist briefly described his own mastering setup. Ed Brooks said he often converts material to analog and uses outboard processing. Panelists agreed that the material they receive needs a little headroom to work with. Normalized or clipped material is not good and severely limits the changes that can be made. This led to a discussion of loudness on CDs and broadcasts. One question asked was, “What happens to my mix when it gets played on the radio?” Mark Guenther had a CD of examples from Orban, manufacturer of the Optimod signal processor used by most mid- to large-market stations. He played examples demonstrating different settings of the device and how they affected the music.
Regarding the question of multiband compression, panelists said that they did not generally use it because it is hard to set up, but it depends on the source. Finally, someone asked: How does one select a mastering engineer? The general advice was to be comfortable with the engineer.
Special thanks go to the participating mastering engineers, who provided insight into their craft; to AES PNW committee member Gudgel, who organized the meeting; to Opus 4 Studios for the Genelec monitor system; to Uneeda Audio for the sound reinforcement system; and to First Choice Marketing and Mackie Designs for their donations of door prizes.
Gary Louie and Rick Chinn

A la Mode…
On December 9, members and guests of the British Section gathered to hear Rhonda Wilson and Michael Capp of Meridian Audio talk about “The Loudspeaker-Room Interface—Controlling Excitation of Room Modes.”
Wilson began by stating the aim of the paper: to identify the strongest room modes excited by each loudspeaker and to prefilter the signal fed to each loudspeaker in such a way that the adverse effects of dominant room modes are significantly reduced. Such a system must be spatially robust and simple to set up. Improvements at one listening position should create improvements, not degradations, at other listening positions.
Typically, customer room specifications constitute small-room acoustics. The reverberant field under these circumstances is very important. Wilson discussed various methods of correction, as well as methods to avoid, such as full-bandwidth magnitude response equalization and full-bandwidth impulse response equalization. These methods are very sensitive to position effects. Audio delay may be
necessary, which is not appropriate for DVD video playback.
Modal ranges cover 20 Hz to 200 Hz. Room effects include emphasis of some frequencies, ringing at resonant frequencies, pitch changes during decay, and beat and echo effects. The room resonant modes of rigid-walled rooms of certain dimensions can be calculated. Usually, both fundamental and harmonic modes are set up. Nonrigid walls cause the modes to decay exponentially. Wilson showed graphs of the decay of room resonance, beat frequency effects, and echo effects. Although the magnitude of a room mode varies with listening position, the decay time of a room mode is the same at different listening positions. There is no requirement to use a special microphone. Hence there is a focus on decay time.
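The meeting report does not reproduce the calculation, but for background, the resonance frequencies of an ideal rigid-walled rectangular room of dimensions Lx, Ly, Lz follow the standard textbook relation

$$ f_{n_x n_y n_z} = \frac{c}{2}\sqrt{\left(\frac{n_x}{L_x}\right)^{2} + \left(\frac{n_y}{L_y}\right)^{2} + \left(\frac{n_z}{L_z}\right)^{2}} $$

where c is the speed of sound (about 343 m/s) and the mode indices are non-negative integers, not all zero. For example, a 5-m room dimension places its first axial mode at roughly 343/(2 x 5) = 34 Hz, squarely within the 20 Hz to 200 Hz modal range mentioned above.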
The filter to control decay time uses original Q, target Q, and center frequency. The resulting notch filter has a gain and bandwidth that can be calculated based on the target decay time and the room mode decay time. Plots of level against delay time after correction showed some sensitivity to pole frequency errors. To avoid errors, 32-bit coefficients and double-precision implementations are also necessary.
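As a rough illustration of how such a corrective notch might be parameterized from the two decay times, the sketch below uses the common approximation T60 ≈ 2.2/B (B being the resonance's -3 dB bandwidth in Hz) and an RBJ-cookbook peaking/cut biquad. The depth rule and the function itself are simplified assumptions for illustration, not the design equations of the Meridian paper.

```python
import numpy as np

def mode_correction_biquad(f0, t60_room, t60_target, fs=48000.0):
    """Illustrative corrective notch for one room mode (hypothetical helper).

    Assumes T60 ~ 2.2 / bandwidth, so the mode's effective Q is
    f0 * T60 / 2.2.  The cut depth is taken from the ratio of target to
    measured decay time -- a simplification, not Meridian's method.
    """
    q_room = f0 * t60_room / 2.2                       # Q implied by measured decay
    gain_db = 20.0 * np.log10(t60_target / t60_room)   # negative value = cut

    # RBJ audio-EQ cookbook peaking (cut) biquad as the notch prototype
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q_room)                # bandwidth follows the room mode
    b = np.array([1.0 + alpha * A, -2.0 * np.cos(w0), 1.0 - alpha * A])
    a = np.array([1.0 + alpha / A, -2.0 * np.cos(w0), 1.0 - alpha / A])
    return b / a[0], a / a[0]                          # normalized coefficients

# Example: a 45-Hz mode decaying in 0.9 s, pulled toward a 0.4-s target
b, a = mode_correction_biquad(45.0, 0.9, 0.4)
```

The point about 32-bit coefficients and double-precision arithmetic is especially relevant here: at such low ratios of center frequency to sample rate, the pole positions of a biquad are very sensitive to coefficient quantization.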
Wilson said that it is not desirable to create an anechoic environment. Previous work in the measurement of the reverberation time of living rooms has yielded times of 0.4 to 1 second, quite long compared to studios.
Some work was done to make sure that the direct response was not overly corrupted, and plots showed that the direct response changes phase when the exciting signal is switched off. Music is much more complex than tones. All prefiltering is done below 250 Hz. The trade-off is to prefilter enough to reduce the reverberation without affecting the direct response too much. Notches up to 6 dB deep are okay; larger notches are more problematic.
Capp continued, saying that the standard process is used to characterize the decay time in the range 500 Hz to 2 kHz and then to set a target decay time for modes below 250 Hz, which increases as frequency decreases. A third-octave bandpass filter is used for measuring the decay time, followed by Schroeder integration. According to Capp, a more accurate process is required to determine the decay of individual room modes. Waterfall FFT plots do give better resolution. The Schroeder integration is prone to noise, so special attention is given to starting from the peak of the impulse and ending just above the noise floor. As an example, Capp showed several waterfall plots. Least-squares regression is used to give the optimum decay time.
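For readers unfamiliar with the measurement step just described, the following minimal sketch (an assumed helper, not the Meridian software) shows Schroeder backward integration of a band-limited impulse response followed by a least-squares line fit to estimate the decay time. The fit range starts near the peak and stops well above the noise floor, in the spirit of Capp's remarks.

```python
import numpy as np

def decay_time_schroeder(ir, fs, fit_range_db=(-5.0, -25.0)):
    """Estimate T60 from a (bandpass-filtered) impulse response.

    Steps: Schroeder backward integration of the squared IR, conversion
    to dB, then a least-squares line fit over `fit_range_db`; the slope
    is extrapolated to a 60-dB decay.  Illustrative sketch only.
    """
    ir = np.asarray(ir, dtype=float)
    start = int(np.argmax(np.abs(ir)))             # start at the peak of the impulse
    energy = ir[start:] ** 2
    edc = np.cumsum(energy[::-1])[::-1]            # Schroeder (backward) energy integral
    edc_db = 10.0 * np.log10(edc / edc[0] + 1e-12)

    t = np.arange(edc_db.size) / fs
    hi, lo = fit_range_db
    mask = (edc_db <= hi) & (edc_db >= lo)         # fit between -5 dB and -25 dB
    slope, intercept = np.polyfit(t[mask], edc_db[mask], 1)
    return -60.0 / slope                           # seconds per 60 dB of decay
```

Repeating the fit on many band-limited slices of the response is essentially how a waterfall plot yields the per-mode decay profiles discussed next.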
To identify the dominant room modes, all the peaks in the magnitude response must be measured. After the decay time at each room mode is found, the dominant modes are identified using the decay time. A more sophisticated method is to calculate many decay profiles from a waterfall plot. All the decay profiles are then summed, and the largest peaks represent the dominant modes. Example plots clearly showed the dominant modes.
Capp described the system implemented to make these measurements, which includes a microphone or SPL meter in the center of the room, a PC, and a surround decoder. An MLS test signal is used for each loudspeaker, from which the impulse response for each loudspeaker is calculated. Then the designer performs an FFT and decay time analysis, followed by design of the filters for each loudspeaker. Meridian has developed PC software driven by a user-friendly wizard to perform the measurements and correction.
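The impulse-response step can be illustrated with the classic MLS identity: the circular autocorrelation of a maximal-length sequence is very nearly an impulse, so circularly cross-correlating the captured loudspeaker response with the stimulus recovers the impulse response. The sketch below is a generic illustration of that property, using SciPy's max_len_seq; it is not the Meridian measurement software, and the function names are hypothetical.

```python
import numpy as np
from scipy.signal import max_len_seq

def mls_stimulus(nbits=15):
    """Return a +/-1 maximal-length sequence of length 2**nbits - 1."""
    seq, _ = max_len_seq(nbits)
    return 2.0 * seq - 1.0

def impulse_response_from_mls(recorded, stimulus):
    """Recover the periodic impulse response by circular cross-correlation.

    `recorded` is one period of the system's response to the periodically
    repeated MLS stimulus (same length as `stimulus`).  Because the MLS
    autocorrelation is N at lag 0 and -1 elsewhere, dividing by N + 1
    yields the impulse response up to a small constant offset.
    """
    n = stimulus.size
    spec = np.fft.rfft(recorded) * np.conj(np.fft.rfft(stimulus))
    return np.fft.irfft(spec, n) / (n + 1.0)
```

In practice several periods of the stimulus are averaged before correlation, and an FFT of the recovered response then feeds the decay-time analysis described above.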
Capp discussed a sample room and showed magnitude responses for different listening positions before correction. Postcorrection magnitude and decay responses showed clear improvement. The filters generated for one listening position improved the responses for many listening positions.
In conclusion, this new spatially robust filter design technique uses only one microphone position but controls the decay time at multiple positions. The filter design is automated and does not require specialist knowledge. According to Capp and Wilson, initial feedback from dealers and customers has so far been positive.
More information on this topic is available in the paper “The Loudspeaker-Room Interface—Controlling Excitation of Room Modes,” by Rhonda J. Wilson, Michael D. Capp, and J. Robert Stuart, Meridian Audio, AES 23rd International Conference, Copenhagen, 2003.
Steven Harris

Triple Treat in New York
Some 70 members and guests of the New York Section met on December 8 at Innovative Audio’s high-end showrooms on Manhattan’s East Side for a holiday party and annual triple treat meeting.
The evening’s format featured three presenters, Thom Cadley, Malcolm Addey, and Jim Anderson, who held court to share their engineering skills. Cadley played DVD-Audio 5.1 surround mixes from his work with artists Beyonce, Billy Joel, Stevie Ray Vaughan, and others. Addey brought in a beautifully spacious two-track remote recording of a small string orchestra he recorded in a West Side church. Anderson compared four release versions of Patricia Barber’s Companion album on CD, XRCD, SACD, and vinyl.
After a brief welcome and introduction by emcee Allan Tucker, the participants split up into three isolated sound rooms to spend the first of three half-hour sessions listening to the presenter of their choice. After thirty minutes, each participant switched to another room to experience a different show. In this way, everyone got to enjoy all three programs.
The central meeting area provided a perfect spot for relaxing, schmoozing, and snacking. Conversations casually overheard touched on topics such as the placement of Blumlein microphones, the difficult state of the recording business, and postulations on why each attendee could not go home with a complimentary plasma display. Maybe next year.
Allan Tucker

Webster Reports Growth
Students from the Webster University Student Section are pleased to report that a number of new students have applied for membership and

that the section appears to be steadily growing. The section welcomed some of these new members at a meeting held on January 13 at the university. Section chair John Jory talked to the 32 students present and described both the section’s achievements and hopes for the upcoming semester. The students worked on a radio spot to announce the surround sound demonstration on February 17. Eric Blackmer, chairman of Earthworks, agreed to give a lecture to the group sometime in the future. In addition, the section plans to hold elections for new committee members. Officers will remain in their positions for the next semester.
Andy Weidmann

Planar Transducers
The Los Angeles Section’s November 25 meeting was hosted by HPV Technologies in Costa Mesa. The company develops, manufactures, and markets concert loudspeaker systems using planar transducers.
Before entering the meeting, each person was required by company staff to sign a nondisclosure agreement.
Dragoslav Colich began his talk with the recent evolution of planar transducers. He spoke of his own journey in audio transducer research and development, including a description of the magnet arrangements used in such drivers.
HPV integrates these relatively small, flat drivers into large arrays, creating highly efficient yet remarkably lightweight systems. Colich explained the benefits and restrictions of arraying the drivers. Two end users, including the well-known event audio specialist Gary Hardesty, described their experience with using the arrayed systems in situations ranging from the Arco Arena in Sacramento, to a Broadway show on Times Square, to an outdoor concert in Washington, D.C.
After the presentations came the demonstration of what these loudspeaker systems can do, with segments of jazz, pop, and classical music. Some was quite loud and chest-thumping—especially the subwoofer-fed cannon shots in Tchaikovsky’s 1812 Overture Solennelle—yet devoid of audible distortion.
Because the presentations and demonstration ended late, the group didn’t get a chance to hear the concert-sized system that had been set up outdoors.
Bob Lee

Disney Concert Hall
Thanks to the efforts of section treasurer Kahne Krause, the section enjoyed a holiday concert on December 13 by the Canadian Brass at the new Walt Disney Concert Hall. At a pre-concert dinner at McCormick and Schmick’s, Yasuhisa Toyota, a principal designer for Nagata Acoustics, spoke to the group about the acoustical design and lineage of the hall.
Toyota began by explaining the characteristics of two leading types of concert hall architecture: the shoebox and the vineyard. He showed photos and diagrams of halls representative of the two philosophies. The shoebox, a compact and rectangular shape, offers excellent acoustics to the listening audience but is limited in usable size. To accommodate larger numbers of patrons, designers have increasingly turned to the vineyard approach, so named because the curving rows and fractured seating sections as viewed from the stage resemble the hills of a vineyard.
According to Toyota, a hurdle to be overcome in the vineyard design is the issue of primary reflecting surfaces. In a shoebox, the walls define the room boundaries and are fairly close to the listening audience, providing strong primary reflections to reinforce the direct sound from the stage. Vineyard designers incorporate numerous small, often curving walls that not only separate the seating sections from one another but also serve as primary reflectors. The Walt Disney Concert Hall is but the latest example of this vineyard design.
For most of the AES contingent, this concert was the first visit beneath and within the wavy stainless steel of architect Frank Gehry’s conception, which was certainly a treat. The Canadian Brass, a quintet of tuba, trombone, French horn, and two trumpets, entertained with a winning combination of virtuoso playing and humor. The program included sing-alongs of Christmas songs, a condensed and minimalist performance of Bizet’s Carmen, and a medley of Hanukkah tunes that featured Josef Burgstaller’s remarkable imitation of a klezmer clarinetist, performed on a piccolo trumpet.
The group thanked Kahne Krause for putting the program together. She did it almost single-handedly (with some help from her husband Ira), handling the tickets, reservations, dinner, and guest speaker.

Penn State Wrap-up
The Penn State Section announces membership growth due to the group’s successful attempts to reach out to the campus and community. The section is now on a campaign to gain more national members, as well as add more variety to meeting topics.
On September 29, 30 people gathered to hear Robin Miller of Filmmaker, Inc. Miller gave an abbreviated version of his talk for the 115th AES meeting in New York. The presentation outlined his patent-pending method of recording 3-D sound (PerAmbio 3D/2D) to more completely capture the sound field experienced by a listener in a live acoustic space. These recordings can be reproduced in 2-D on current 5.1/6.1 surround sound systems. The complete 3-D sound field can be reproduced with the help of a decoder.
On October 7, nine members joined life fellow and former AES governor (1977-1979) Geoffrey Wilson in a discussion entitled “British Audio—75 Years Ago.” In his talk, Wilson reminisced about his boyhood memories of his father, Percy, who was technical advisor to The Gramophone magazine from the late 1920s until the start of World War II in 1939. Most of his memories had to do with the large horns that Geoffrey’s father kept around the house, as well as his experiments with early phonographs—many of which were kept in their living room. The meeting served as a wonderful excursion into the history of audio.
Wilson showed many slides, most of which were taken from a 1974 paper, “Horn Theory and the Phonograph,” which historically detailed the

development of the acoustic theory of the phonograph.

Acoustics
On November 17, 23 members and guests of the Penn State Section gathered for a presentation by David Swanson, a Penn State acoustics faculty member. Swanson gave an overview of some topics to be covered in the upcoming acoustics class, Advanced Digital Audio Signal Processing. He briefly described many digital audio formats and algorithms, complete with examples, and touched on topics such as digital filtering, audio effects, MPEG-2 Layer 3 (MP3), and noise cancellation. The entire presentation can be found on the Penn State Web site at: www.clubs.psu.edu/up/aes/.
On December 8, 22 members gathered for the final meeting of the semester to hold elections. The new officers are as follows: Dan Valente, chair; Alexandra Loubeau, vice-chair; Mark Gramann, treasurer; Remy Gutierrez, secretary; members-at-large: Miguel Horta, Erin (Jinx) Horan, Liz Cosharek.
Valente expressed the need for people to be more active in the Society and outlined a plan devised by the officers to increase membership. At the core of this plan is the need to generate more publicity for meetings and provide variety in the subjects covered. Members discussed ideas for upcoming meetings, and former chair Kevin Bastyr suggested that smaller meetings should be held where members give 10- to 15-minute talks or lead discussions. This will help keep members active. Brian Tuttle, another former chair, suggested that members should brainstorm a new project (similar to the PVC loudspeaker designs) for the upcoming year.
Thanks to the new publicity campaign, the first meeting of the spring semester on January 19 was enormously successful. The section kicked off with Mike Sokol’s 2004 surround-sound tour, which is being presented by Fits & Starts Productions. Over 60 people showed up for this seminar/workshop. Sokol described techniques in surround sound engineering. During the session, he gave examples of the artistry of choosing each loudspeaker in a surround sound set-up. He continued with a discussion of the dos and don’ts of recording surround. The audience was very interested in the wide variety of choices an engineer has in deciding how to mix music in a surround environment. In the second half of the three-hour seminar, Sokol showcased some software and equipment from his sponsors: Tannoy, SRS, DTS, MaxVision, and Steinberg. In all, it was a highly enjoyable meeting.
Dan Valente

Czech Seminar
Forty-five members, guests, and students of the Czech Section attended a day-long seminar held in the Congress Hall of the Promopro Company of Prague on December 8. During the morning session, members from Prague and Brno presented several papers and held discussions and demonstrations. The papers included: “Present Trends in the Development of Digital Signal Processing,” by Professor Zdenek Smekal; “Digital Signal Processor for AM Transmitters,” by Ing. Pavel Stranak; “Systems for Multichannel Sound Recording and Reproduction,” by Ing. Richard Jejkal; and “Transducers Using Direct Digital-Analog Conversion,” by Libor Husnik.
After the technical symposium and lunch break, the section held a business meeting. Chair Pavel Baladran reviewed the events of the past year and asked that he not be nominated to continue as chair for the 2004 season. He plans to retire and move. Secretary Jiri Ocenasek discussed the financial report and made several recommendations regarding member databases and e-mail correspondence. There was also some time spent on planning transport and accommodation for those Czech Section members who will travel to the AES Convention in Berlin.
The election results for 2004 were: Libor Husnik from the Technical University of Prague will be chair. He will also maintain his role as faculty advisor for the student section. Ing. Zdenek Kesner will remain as vice chair, and Ocenasek as treasurer.
Jiri Ocenasek

Upcoming Meetings
2004 May 8-11: 116th AES Convention, Messe Berlin, Berlin, Germany. Contact: e-mail: 116th_chair@aes.org. See page 464 for details.
2004 May 17-21: International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2004), Montreal, Canada. On the Internet: www.icassp2004.com.
2004 June 17-19: 25th International Conference, London, UK, “Metadata for Audio.” Contact John Grant, chair, e-mail: 25th_chair@aes.org.
2004 July 5-8: 11th International Congress on Sound and Vibration (ICSV11), St. Petersburg, Russia. Contact Congress Secretariat at P. O. Box 08A9, 1st Krasnoarmeyskaya Str., 1, 190005, St. Petersburg, Russia, or e-mail: icsv11@rol.ru.
2004 July 12-14: Noise-Con 2004, Baltimore, MD, USA, INCE/USA, e-mail: ibo@inceusa.org.
2004 August 22-25: Internoise 2004, Prague, Czech Republic. On the Web: www.internoise2004.cz.
2004 October 1-3: 26th International Conference, “High Quality Analog Audio Processing,” Baarn, the Netherlands. See page 464.
2004 October 6-8: The NAB Radio Show, San Diego, CA, USA. For information call 800-342-2460 or 202-429-5419.
2004 October 28-31: 117th AES Convention, San Francisco, CA, USA. See page 464 for details.
2004 November 15-19: 148th Meeting of the Acoustical Society of America, San Diego, CA, USA, e-mail: asa@aip.org.


SOUND TRACK

ABOUT PEOPLE…
AES sustaining member Klipsch Audio Technologies has announced that Jim Breen has joined the company as business unit manager of Commercial Contracting and Corporate Development. In his new position, Breen is responsible for the overall direction and distribution of Klipsch brand audio products to hotels, restaurants, houses of worship, retail stores, theme parks, board and conference rooms, and other commercial establishments.
Over the past year Klipsch loudspeakers have been installed in Gap, Banana Republic, Old Navy, Ron Jon’s, and Bloomingdale’s stores, as well as at hundreds of BP gas stations across the country. Breen will work to maintain a steady stream of high-profile installations and increase Klipsch’s share of the commercial contracting market.
Before joining Klipsch, Breen worked at Bose Corporation, where he held a variety of positions over the course of 16 years. His previous titles include: regional business manager, Northeast territory manager, national accounts manager, and new products development. While working for Bose, Breen won several awards including Highest Attainment to Quota, Most Improved Region, Highest Total Sales, Largest Single Sale, and The President’s Award.

ABOUT COMPANIES…
Solid State Logic, an AES sustaining member, received the prestigious 2004 Technical Grammy Award at the recent 46th Annual Grammy Awards ceremony on February 8 in Los Angeles, CA. The award recognizes SSL for outstanding technical achievement in the design and production of audio mixing consoles. Since the company’s founding in 1969, SSL has pioneered advances in console technology to aid the audio production process.
Solid State Logic is a leading manufacturer of analog and digital audio consoles for broadcast, postproduction, music, and film facilities.
[Photo: Solid State Logic won a Technical Grammy at the 46th Awards ceremony on February 8. Shown (left to right): Rich Plushner, president of SSL; Phil Ramone, producer/engineer; and Phil Wagner, SSL’s senior vice president.]

CONFERENCES, MEETINGS…
Music China, International Exhibition for Musical Instruments and Services, will take place October 20-23, 2004, at the Shanghai New International Expo Centre. Last year, the exhibition attracted over 21,000 trade buyers and visitors from 49 countries and regions. This year, the exhibition will be held in conjunction with Prolight+Sound Shanghai and will include seminars, live shows, and demonstrations on the latest trends in the musical instrument and accessory business. Organizers of this year’s exhibition are Messe Frankfurt (HK) Ltd., INTEX Shanghai Co. Ltd., and the China Music Instrument Association (CMIA). For information on Music China, contact: Ms. Judy Cheung, Messe Frankfurt (HK) Ltd.; tel: 852-2238-9933, fax: 852-2598-8771, Internet: www.musikmesse.com or e-mail: judy.cheung@hongkong.messefrankfurt.com.

Noise-Con 2004 will be held July 12-14 at the Wyndham Inner Harbor Hotel in Baltimore, Maryland. This joint meeting of the Institute of Noise Control Engineering of the USA (INCE/USA) and the Transportation-Related Noise and Vibration Committee of the Transportation Research Board (TRB-A1F04) will feature technical sessions on all aspects of noise control engineering with an emphasis on transportation noise, and an exposition of measurement instrumentation and noise and vibration control products. There will also be receptions and a dinner cruise of the Baltimore Harbor.
The conference proceedings will be published on a CD-ROM and will be included as part of the package each attendee receives at the conference. In addition to general sessions, special sessions are being organized in the following areas: transportation noise, analysis and measurements, vendor products, and policies and markets.
For more information on INCE, the conference, and abstract submission, visit: www.inceusa.org. The call for papers can also be found on the INCE Web site: www.inceusa.org/NoiseCon04call.pdf.


NEW PRODUCTS AND DEVELOPMENTS
Product information is provided as a service to our readers. Contact manufacturers directly for additional information and please refer to the Journal of the Audio Engineering Society.

AES SUSTAINING MEMBER
BROADCAST LINE MICROPHONE is the BCM 104 for radio announcing. The microphone features a K 104 large-diaphragm condenser with cardioid directional pattern and switchable proximity effect compensation. A high-pass filter reduces frequencies below 100 Hz by 12 dB/octave. A second, pre-attenuation switch allows the sensitivity to be reduced by 14 dB to optimize performance for circuits designed for dynamic microphones. Both switches are internally mounted within the microphone housing. The BCM 104 head grille twists off easily for quick cleaning. Additionally, a fitted elastic mount prevents structure-borne noise and is compatible with standard broadcast-segment microphone arms. Neumann, 1 Enterprise Drive, Old Lyme, CT 06371, USA; tel. +1 860 434 5220; fax +1 860 434 3148; Web site www.neumannusa.com.

ACTIVE SUBWOOFER SYSTEM is designed for large-scale installations in either stereo or surround. The 7073A features four 12-in drivers, fast-acting low-distortion amplifiers, 124-dB SPL capability down to 19 Hz, and a 6.1-capable bass management system for flexibility. All electronics are integrated into the 7073A subwoofer cabinet, including active crossover filters, driver overload protection circuits, and power amplifiers. The bass management system features six inputs and outputs, LFE input, and summed signal output connectors. The dedicated LFE input has a low-pass filter selectable to 85 or 120 Hz, plus a +10 dB sensitivity switch. If the LFE channel includes frequencies above the crossover frequency, the system’s redirect function sends them to the center front output to ensure they are audible. Genelec Inc., 7 Tech Circle, Natick, MA 01760, USA; tel. +1 508 652 0900; fax +1 508 652 0909; e-mail genelec.usa@genelec.com; Web site www.genelec.com.
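For readers unfamiliar with bass management, the sketch below shows the generic idea behind such a redirect stage: bass below the selected crossover from all main channels (plus the LFE) is summed to the subwoofer feed, while LFE content above the crossover is folded into the center-front channel so it is not lost. This is a simplified, hypothetical illustration of the concept, not Genelec's implementation; filter types, slopes, channel names, and gain staging are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def bass_manage(channels, lfe, fs, fc=85.0):
    """Generic bass-management sketch for a 5.1 feed (illustrative only).

    `channels` maps channel names ('L', 'R', 'C', 'Ls', 'Rs') to arrays,
    `lfe` is the LFE channel.  Returns (managed_channels, sub_feed).
    Uses 4th-order Butterworth low/high splits at `fc`.
    """
    lp = butter(4, fc, btype="low", fs=fs, output="sos")
    hp = butter(4, fc, btype="high", fs=fs, output="sos")

    out = {}
    sub = sosfilt(lp, lfe)                     # LFE content below the crossover
    for name, x in channels.items():
        out[name] = sosfilt(hp, x)             # mains carry only the highs
        sub = sub + sosfilt(lp, x)             # their bass goes to the subwoofer

    out["C"] = out["C"] + sosfilt(hp, lfe)     # redirect: LFE above fc to center front
    return out, sub
```

A +10 dB LFE sensitivity switch such as the 7073A's would amount to an extra gain applied to the LFE signal ahead of a stage like this.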
REFERENCE/MIXING MONITOR incorporates the drivers from the highly acclaimed A-20 in a radically new cabinet design that disperses sound over a wider listening area. The new M-20 features a flat, wide baffle across the tweeter and a narrower baffle surrounding the woofer for more even dispersion and enhanced imaging. Unlike the A-20, there are no parallel surfaces inside the M-20’s cabinet, so internal standing waves remain random; frequency response is 45 Hz to 20 kHz +/- 5 dB. Cabinet dimensions are 15 in x 13 in x 10 in, and each monitor weighs 18 lbs. The M-80 is sold by the channel and includes a 250-W monaural control amplifier that fits in a single rack space. NHTPro, 6400 Goodyear Road, Benicia, CA 94510, USA; tel. 800-648-9993 (toll free); fax +1 707 747 1273; Web site www.nhtpro.com.

AES SUSTAINING MEMBER
WIRELESS SYSTEM for theater and event production is composed of the Artist Elite 5000 System, AT899 subminiature lavalier, and a transmitter system. The Artist Elite 5000 Wireless System offers frequency-agile operation with IntelliScan™ frequency selection for a choice of two hundred selectable UHF channels. The system also features PC-compatible control software, easy setup, and maximum operation flexibility. The AT899cW subminiature condenser microphone measures 5 mm in diameter and features a low-profile housing with internal construction designed to minimize noise from handling, clothing, costumes, and wind. The AEW-5111 UHF Wireless Dual-UniPak™ Transmitter System includes an AEW-R5200 dual receiver and two AEW-T1000 UniPak™ transmitters. Finally, at the heart of the system is the AEW-R5200 receiver, which combines two independent receivers in a single full-rack housing. The 5000 Series Wireless Systems are available with a choice of two condenser and two dynamic handheld Artist Elite microphone/transmitters. Audio-Technica U.S., Inc.,
1221 Commerce Drive, Stow, OH 44224, USA; tel. +1 330 686 2600; fax +1 330 688 3752; Web site www.audio-technica.com.

SOFTWARE OPTION complements the signal analyzers of the Rohde & Schwarz FSQ series as well as the company’s FSU and FSP spectrum analyzers. The software expands the applications with measurement functions in accordance with the 3GPP2 specifications for mobile phones. Thus, measurements in line with the 1xEV-DV standard are now feasible for the first time. The regular cdma2000 standard, which is supported particularly in Asia and North America, is also covered. Revision C of the 1xEV-DV standard enables a higher data transmission rate compared to the regular cdma2000 standard, allowing additional data services. Used in combination with software option FS-K83, the signal and spectrum analyzers are ideally suited for use in the development, production, and quality assurance of mobile phones that function in accordance with the cdma2000 and 1xEV-DV standards. Rohde & Schwarz GmbH & Co. KG, Mühldorfstr. 15, D-81671 Munich, Germany; tel. +49 89 4129 13779; fax +49 89 4129 13777; e-mail customersupport@rsd.rohde-schwarz.com; Web site www.rohde-schwarz.com.

AES SUSTAINING MEMBER
ANALOG AND DIGITAL CONVERTERS manage complex titles for high-resolution/multichannel music mastering. Meitner’s ADC8 provides accuracy in converting analog masters to the digital domain with the proper amount of warmth. The DAC8 sends audio from the digital to the analog arena with no loss of quality in the conversion process. During the mastering process, the user is able to control all of the audio sources using the Meitner Switchman source controller, which allows the user to A/B audio from the mastering console, analog tape decks, and DVD player. SADiE, Inc., 475 Craighead Street, Nashville, TN 37204, USA; tel. +1 615 327 1140; fax +1 615 327 1699; e-mail: sales@sadieus.com; Web site www.sadie.com.

AVAILABLE LITERATURE
The opinions expressed are those of the individual reviewers and are not necessarily endorsed by the Editors of the Journal.

CATALOGS, BROCHURES…
A new catalog of miniature microphones and accessories is now available. The 34-page, full-color catalog has details of DPA’s MSS6000 Microphone Summation System and the MPS6000 Microphone Power Supply, as well as all current miniature microphone products and solutions. Also featured is the company’s range of 33 connection adapters, which support more than 80 different wireless transmitter systems. A wide selection of mounts, grids, windscreens, clips, and magnets, including the popular MHS6001 Holder for Strings, and the EMK and FMK4071 microphone kits developed for film and broadcast use, are included. The illustrated catalog contains specifications as well as a detailed description of product applications. For a copy contact: DPA Microphones A/S, Gydevang 42-44, DK-3450 Alleroed, Denmark; tel: +45-4814-2828, fax: +45-4814-2700, Internet: www.dpamicrophones.com; e-mail: info@dpamicrophones.com.

IN BRIEF AND OF INTEREST…
Thomson ISI has announced that it is expanding its Web of Science® coverage with Century of Science™, a new initiative that will provide the research community with access to the world’s most influential scientific research throughout the 20th century. This comprehensive archive, Web of Science, The Century of Science, includes bibliographic data from the highest-impact scientific literature published between 1944 and 2000. It will add nearly 850 000 articles from approximately 200 journals. The Thomson ISI editorial team carefully selected journals based on criteria such as citation patterns, geographic origin, and meaningful balance across scientific disciplines. The Century of Science initiative will extend through 2004, with new material available in 2005.
Thomson ISI, The Thomson Corporation, is located in Stamford, Connecticut, USA; tel: 215-366-0100, ext. 1396, Internet: www.thomsonisi.com.

Soundspace: Architecture for Sound and Vision by Peter Grueneisen is published by Birkhauser Publishing Ltd., Basel, Switzerland. The 240-page book is written and compiled by studio bau:ton’s founder and principal architect. It contains issues central to building for the creation of sound, picture, and contemporary media. With over 700 illustrations, the book covers diverse topics aimed at professionals, students, and music and film lovers, as well as architecture aficionados.
The relationship between space, sound, and vision is explored. Recent developments in the music, film, and media industries have resulted in new types of buildings. These projects require a comprehensive approach from various disciplines to bridge architecture, art, and technology. The book is based on the experiences of studio bau:ton over 13 years in combining these disciplines.
Essays and projects by renowned experts in audio engineering, science, music, and architecture are included. A chapter on buildings and projects covers over forty projects, which include architectural competitions for museums and concert halls, music studios, film facilities, TV, and radio broadcast studios. Many of the buildings are presented in a beautifully illustrated portfolio. The format is accessible and can be enjoyed by beginners as well as professionals.
The hardcover book costs $77 and is available online at www.birkhauser.ch. ISBN: 3-7643-6975-2.



MEMBERSHIP
INFORMATION
Section symbols are: Aachen Student Section (AA), Adelaide (ADE), Alberta (AB), All-Russian State Institute of Cinematography
(ARSIC), American River College (ARC), American University (AMU), Appalachian State University (ASU), Argentina (RA),
Atlanta (AT), Austrian (AU), Ball State University (BSU), Belarus (BLS), Belgian (BEL), Belmont University (BU), Berklee
College of Music (BCM), Berlin Student (BNS), Bosnia-Herzegovina (BA), Boston (BOS), Brazil (BZ), Brigham Young University
(BYU), Brisbane (BRI), British (BR), Bulgarian (BG), Cal Poly San Luis Obispo State University (CPSLO), California State
University–Chico (CSU), Carnegie Mellon University (CMU), Central German (CG), Central Indiana (CI), Chicago (CH), Chile
(RCH), Cincinnati (CIN), Citrus College (CTC), Cogswell Polytechnical College (CPC), Colombia (COL), Colorado (CO),
Columbia College (CC), Conservatoire de Paris Student (CPS), Conservatory of Recording Arts and Sciences (CRAS), Croatian
(HR), Croatian Student (HRS), Czech (CR), Czech Republic Student (CRS), Danish (DA), Danish Student (DAS), Darmstadt
(DMS), Del Bosque University (DBU), Detmold Student (DS), Detroit (DET), District of Columbia (DC), Duquesne University
(DU), Düsseldorf (DF), Ecuador (ECU), Expression Center for New Media (ECNM), Finnish (FIN), Fredonia (FRE), French
(FR), Full Sail Real World Education (FS), Graz (GZ), Greek (GR), Hampton University (HPTU), Hong Kong (HK), Hungarian
(HU), I.A.V.Q. (IAVQ), Ilmenau (IM), India (IND), Institute of Audio Research (IAR), Israel (IS), Italian (IT), Italian Student
(ITS), Japan (JA), Javeriana University (JU), Kansas City (KC), Korea (RK), Lithuanian (LT), Long Beach/Student (LB/S), Los
Andes University (LAU), Los Angeles (LA), Louis Lumière (LL), Malaysia (MY), McGill University (MGU), Melbourne (MEL),
Mexican (MEX), Michigan Technological University (MTU), Middle Tennessee State University (MTSU), Moscow (MOS), Music
Tech (MT), Nashville (NA), Nebraska (NEB), Netherlands (NE), Netherlands Student (NES), New Orleans (NO), New York (NY),
New York University (NYU), North German (NG), Norwegian (NOR), Ohio University (OU), Orson Welles Institute (OWI),
Pacific Northwest (PNW), Peabody Institute of Johns Hopkins University (PI), Pennsylvania State University (PSU), Peru (PER),
Philadelphia (PHIL), Philippines (RP), Polish (POL), Portland (POR), Portugal (PT), Ridgewater College, Hutchinson Campus
(RC), Romanian (ROM), Russian Academy of Music, Moscow (RAM/S), SAE Nashville (SAENA), St. Louis (STL), St. Petersburg
(STP), St. Petersburg Student (STPS), San Buenaventura University (SBU), San Diego (SD), San Diego State University (SDSU),
San Francisco (SF), San Francisco State University (SFU), Serbia and Montenegro (SAM), Singapore (SGP), Slovakian Republic
(SR), Slovenian (SL), South German (SG), Spanish (SPA), Stanford University (SU), Swedish (SWE), Swiss (SWI), Sydney (SYD),
Taller de Arte Sonoro, Caracas (TAS), Technical University of Gdansk (TUG), Texas State University—San Marcos (TSU), The
Art Institute of Seattle (TAIS), Toronto (TOR), Turkey (TR), Ukrainian (UKR), University of Arkansas at Pine Bluff (UAPB),
University of Cincinnati (UC), University of Colorado at Denver (UCDEN), University of Hartford (UH), University of Illinois at
Urbana-Champaign (UIUC), University of Luleå-Piteå (ULP), University of Massachusetts–Lowell (UL), University of Miami
(UOM), University of Michigan (UMICH), University of North Carolina at Asheville (UNCA), University of Southern California
(USC), Upper Midwest (UMW), Uruguay (ROU), Utah (UT), Vancouver (BC), Vancouver Student (BCS), Venezuela (VEN),
Vienna (VI), Webster University (WEB), West Michigan (WM), William Paterson University (WPU), Worcester Polytechnic
Institute (WPI), Wroclaw University of Technology (WUT).

These listings represent new membership according to grade.


MEMBERS Roy Markowitz Patrick Narron
800 West End Ave. Ste. 12A, New York 38439 Oliver Way, Fremont, CA 94536 (SF)
Eric Liemley City, NY 10025 (NY)
3013 Findley Rd., Kensington, MD 20895 Bryan Neumeister
(DC) Raynald Masse 2226 N. 17th Ave., Phoenix, AZ 85007
Jeff Lipton 124 de l’Emeraude, St. Colomban, J5K 2C8,
Toshiyuki Nishiguchi
Peerless Mastering & Consulting, 1085 Quebec City, Canada (MGU)
NHK Science & Technical Research Labs.,
Commonwealth Ave. #322, Boston, MA Devin Maxwell Kinuta 1-10-11, Setagaya-ku, Tokyo 157-
02215 (BOS) 36 S. 4th St. #D9, Brooklyn, NY 11211 (NY) 8510, Japan (JA)
Wing Wah Lo Thomas P. McGinty Akira Nishimura
Rm. 1001 Chung Hwa Bldg., Ma Hang 800 Starbuck Ave. Ste. A103, Watertown, Tokyo Univeristy Information Sciences,
Chung Rd., Tokwawan, Kowloon, Hong NY 13601 (NY) Yatocho 1200-2, Wakaba-ku, Chiba-shi 265-
Kong (HK) 8501, Japan(JA)
Chris Meade
Brian Loudenslager
117 Dobbin St., Brooklyn, NY 11222 (NY) Marc Nutter
102 Eddieton Ct., Mebane, NC 27302 (AT)
275 S. Gilpin St., Denver, CO 80210 (CO)
Yan-Chen Lu Michael Modesto
6F No. 192 Lien-cheng Rd., Chung-ho, Tapei 2220 Ashwood, Nashville, TN 37212 (NA) Christopher O’Malley
County, Taiwan 12535 Pacific Ave. #2, Los Angeles, CA
Masayuki Morimoto
90066 (LA)
Ian Macbeth Kobe University Faculty of Engineering,
Dulverton House, Cedar Ave., Alsager, Rokkoudai-cho 1, Nada-ku, Kobe-shi, Richard Perry
Cheshire, ST7 2PH, UK (BR) Hyougo-shi 657-8501, Japan (JA) 200 Stoneacre Court, Peachtree City, GA
30269 (AT)
Gabriel Maga James Mullen
#201-3846 Carrigan Crt., Burnaby, V3N 44 Randolph Ave., Old Bridge, NJ 08857 Andrew Peters
4H9, British Columbia, Canada (BC) (NY) P. O. Box 36411, Tucson, AZ 85740 ➥
Gregory Ramos Jr. William Dave Scott Evans
215 Stagg Walk #2/B, Brooklyn, NY 11206 320 Honeylocust Ct., Bel Air, MD 21015 3003 Snelling Ave. North, St. Paul, MN
(NY) (DC) 55113 (UMW)
Stephen Rice Justin Davis Richard L. Faint
105 Lakeway Dr., Oxford, MS 38655 3351 S. Winston St., Aurora, CO 80013 (CO) 21 Redgum Ct., Bellbird Park, QLD 4300,
Todd A. Rosso Moska Davis Australia (BRI)
1090 Ashland Ave., St. Paul, MN 55104 212 E. Fulton St., Long Beach, NY 11561 James J. Falco
(UMW) (NY) 69-10B 188St. # 2B, Fresh Meadows, NY
Paul Saari Lawrence de Martin 11365 (NY)
P. O. Box 31115, Oakland, CA 94604 (SF) 2525 Arapahoe Ave. Ste. E4-272, Boulder, Nick Farina
Andres Schmidt CO 80302 (CO) 617 So. Lytle #3, Chicago, IL 60607 (CH)
2188 Tulip Rd., San Jose, CA 95128 (SF) Dan Dean
James Farina II
Yury Spitsyn 5 Lindley Rd., Mercer Island, WA 98040
3030 Flo Lor Dr. # 12, Youngstown, OH
Dartmouth College, 6242 Hallgarten Hall, (PNW)
44511 (DET)
Hanover, NH 03755 (BOS) Albertus C. den Brinker
Philips Research, WO 02, Prof. Holstlaan 4, Roberto A. Felarca
Tim Stambaugh
NL 5656 AA, Eindhoven, Netherlands (NE) 51 Zorra St. SFDM, Quezon City, MM 1105,
400 Belleville St., New Orleans, LA 70114
Philippines (RP)
(NO) Douglas Devitt
Paul Stevens 95 Newton St., Weston, MA 02493 (BOS) Dominique Fellot
188 Baddow Rd., Chelmsford, Essex CM2 6 bis rue Michel-Ange, FR 92160, Antony,
Jose Diaz-Betancourt France (FR)
9QW, UK (BR) 24 Street # J-10, Royal Town, Bayamon, PR
Albert T. Stokes Alvin Finnerty
David Dintenfass
940 Wadworth Blvd., Lakewood, CO 80214 P. O. Box 36, 104 Harrison St., Boon, MI
7549-27th Ave. NW, Seattle, WA 98117
(CO) 49618 (CH)
(PNW)
Makoto Tanase Daniel Dion Greg Fischer
Tokai Television Broadcasting Co. Ltd., 230 Rte. 195, St-Rene de Matane, G0J 3E0, 2550 So. 36th St., Quincy, IL 62305 (CH)
Higashi sakura 1-14-27, Nagoyo-shi, Aichi- Quebec City, Canada (MGU) Noel Flatt
ken, Japan (JA) 69 Jensen Rd., Watertown, MA 02472 (BOS)
Stacey Dockery
Ronald Thomas 3640 Eden Dr., Dallas, TX 75287
840 N. Hunter Dr. #4, W. Hollywood, CA Joel Foner
90069 (LA) James Dodd 4 Townsend Circle, Natick, MA 01760
260 Tower Rd., Marietta, GA 30060 (AT) (BOS)
Jeremy Todd
100 Memorial Dr. Apt. 8-15C, Cambridge, Andy Dolph Lorenzo Fontanesi
MA 02142 (BOS) 2A Waldron Ct., Dover, NH 03820 (BOS) via del Mercato 1, IT 42100, Reggio Emilia,
Thomas Domajnke Italy (IT)
Masaaki Tomita
Kitano 21, Tokororpzawa-shi, Saitama-ken 7400 N. U.S. Hwy. 1 #105, Cocoa, FL 32927 Philip Forchelli
359-1152, Japan (JA) Eric Dubowsky 859 So. Cottonwood Rd., Walnutport, PA
17 John St. #11D, New York, NY 10038 18088 (PHIL)
Thanh Tran
9707 Palacios Court, Houston, TX 77064 (NY) Tom Fowlston
Matthew Duda 315 N. Beckley Ave. , DeSoto, TX 75115
Kiyohito Uno
Muraakiyama 3-16-2, Izumi-Ku Sendai-shi, 41463 Crabtreet Ln., Plymouth, MI 48170 Jacob Freeman
Miyagi-ken 981-3205, Japan (JA) (DET) 55 E. 10th St. #506E, New York, NY 10003
Hiroshi Uto Mike Duff (NYU)
Hanazono 2-8-15 #101, Hanamigawa-ku, 2 Hively, Eureka Springs, AR 72632
Martin Frey
Chiba-shi, Chiba-ken 262-0025, Japan (JA) Steve Dupaix 500 Bonerwood Dr., Nashville, TN 37211
John M. Wallace IV P. O. Box 9660, Salt Lake City, UT 8410 (NA)
6615 English Hills Dr. 1B, Charlotte, NC (UT)
Terris Friedman
28212 (AT) Roderick Durham 415 E. 37th St., New York, NY 10016 (NY)
Terri Winston 21640 Frame Square, Ashburn, VA 20148
(DC) Olivier Fruttero
499 Alabama St. #108, San Francisco, CA P. O. Box 4283, Sabanitas, Colon, Panama
94110 (SF) Walter Dvorak
Simon Woollard Habichergasse 32, AT 1160, Vienna, Austria Brian Frye
14 Grosvenor Gardens, St. Neots, (AU) 4915 Ansley Ln., Cumming, GA 30040 (AT)
Cambridgeshire, PE19 1DL, UK (BR) H. Emerson Dwellingham John Frye
2411 Pleasant Hollow, Plainsboro, NJ 08536 3020 Willow Wisp Way, Cumming, GA
ASSOCIATES (NY) 30040 (AT)
Hermano N. Dantas Jr. Jonathan Erickson Miwa Fukino
Rua Gararu 726, Getulio Vargas, Aracaju, 4316 Sycamore, Dallas, TX 75204 Panasonic
Sergipe 49055300, Brazil (BZ) Andre Espindola 6F Kyobashi National Bldg., 2-13-10
Susan Darney Rua Tome de Souza 1234/1802, Belo Kyobashi, chuo-ku, Tokyo 104-0031, Japan
102 Thresher Ct., Cary, NC 27513 (AT) Horizonte, MG 30140131, Brazil (BZ) (JA)

Eric Gaalaas Pete Greg
18 Hartford St., Bedford, MA 01730 (BOS) AmeriCom Inc., 308 Lafayette Freeway, St.
Paul, MN 55107 UMW)
Jonathan Gardner
612 N. White Oak Ave., Midwest City, OK Gary Griffey
73130 1045 Homer Scott Rd., Bethpage, TN 37022
(NA)
Mauricio Gargel
Rua Oratorio 522 ap 601, Sao Paulo, SP Thomas Grub
03116-010, Brazil (BZ) 5 Brixton St., Flemington, Victoria 3031,
Australia (SYD)
Henri Gauci
6607 106A Ave., Edmonton, T6A 1K2, Chad Gumeringer
Alberta, Canada (AB) 418 Weatherby Way, Bismark, ND 58501
(UMW)
Hagai Gefen
6265 Variel Ave., Woodland Hills, CA Suha Gur
91367 (LA) 726 Nathan Hale Ave., Laurenceville, NJ
08648 (NY)
Otfried Geffert
Geffert Digital Engineering, Osterholder Str. Srikanth Gurrapu
17, DE 25482, Appen, Germany 5353 Keller Springs Rd. #923, Dallas, TX
75248
Stanley George
Rodney Gurule
P. O. Box 118773, Carrollton, TX 75011
5620 Blue Pine Rd. NW, Albuquerque, NM
Andreas J. Gerrits 87120
Philips Research, WO-02, Prof. Holstlaan 4,
Jari Hagqvist
NL 5656 AA, Eindhoven, Netherlands (NE)
Nokia Visiokatu 1, FI 33720, Tampere,
Joe Giardiello Finland (FIN)
61 Downing Dr., Wyomissing, PA 19610 Garrett Haines
(PHIL) 937 William Penn Ct., Pittsburgh, PA 15221
Allen Gibbs (PHIL)
P. O. Box 250064, Plano, TX 75025 Steve Hancock
Brad Gibbs 8215 Heather Ln., Mebane, NC 27302 (AT)
1500 Washington St. # 9J, Hoboken, NJ Zach Harper
07030 (NY) 7473 Harrow Dr., Nashville, TN 37221 (NA)
Enric Gine Guix Christopher Hastings
c/Arago 291 1er 2ona, ES 08009, Barcelona, 3783 Redwood Ave., Los Angeles, CA
Spain (SPA) 90066 (LA)
Leroy Gipson
P. O. Box 15414, Atlanta, GA 30333 (AT)
Karl Heilbron
5825 SW 8th St. 2nd Flr., Miami, FL 33144
Advertiser
Mark Gladden David Henderson
Signature Sight and Sound, P. O. Box 23567,
Charlotte, NC 28227 (AT)
Tonesoup Productions, 57 Cheltenham Way,
Avon, CT 06001 (NY) Internet
Andrew Goeppner David Hermann
8600 Frazier Dr., Plain City, OH 43064
(DET)
90 Wyndham St. S. # 909, Guelph, N1E 7H7,
Ontario, Canada (TOR) Directory
Ana Gonzalez Miguel Hernandez
C/Ramon Baudet Grandy 2, 3C, ES 38004, New Art Digital, Miguel Serrano 25A, Col
Santa Cruz de Tenerife, Spain (SPA) Del Valle, D.F. 01000, Mexico (MEX)
*Prism Media Products, Inc. ..............415
Rodrigo A. Gonzalez Osorio Stephan Herzog www.prismsound.com
Vargas Buston 566, San Miguel, Santiago, Friedrichstrasse 4, DE 67655, Kaiserslautern,
Chile (RCH) Germany *Neutrik, AG ........................................441
Christian Hessler www.neutrik.com
Johanne Goyette
3116 W. 112th Court, Unit F, Westminster,
9 Plac Cambrai, Outremont, H2V 1X4, *That Corporation...............................453
CO 80031 (CO)
Quebec, Canada (MGU) www.thatcorp.com
Rob Hislop
Charles Greene 101 W 23rd St. #2314, New York, NY 10011
4915 Broadway #3N, New York, NY 10034 (NY)
(NY)
Tim Hochstedler
Dale Greg 3203 SE Woodstock #1517, Portland, OR
AmeriCom Inc., 308 Lafayette Freeway, St. 97202 (POR)
Paul, MN 55107 (UMW)
Volker Hochwald
Jim Greg Alpine Electronics Europe GmbH, Vor dem *AES Sustaining Member.
AmeriCom Inc., 308 Lafayette Freeway, St. Lauch 14, DE 70567, Stuttgart, Germany
Paul, MN 55107 (UMW) (SG) ➥
Auke E. Hoekstra Insung Jung Roman E. Klun
Paets van Troostwijkstraat 135-A, NL 2522, 8674 Pinyon St., Buena Park, CA 90620 38 King Street E., P. O. Box 66600, Stoney
DP S Gravenhage, Netherlands (NE) (LA) Creek, L8G 5E6, Ontario, Canada (TOR)
David Horrocks Mark Kalinowski Boney Knox Lenox
340-19th St. NE, Calgary T2E 4W9, Alberta, 205 Jefferson Ave., Bristol, PA 19007 96 Vanderveer St., Brooklyn, NY 11207
Canada (AB) (PHIL) (NY)
Bill Houston Yogesh Kamat Mhamai Christopher Konovaliv
624 Georges Ln., Ardmore, PA 19003 Impulsesoft Pvt. Ltd. 690, 15th Cross llnd 1719 Jackson Ave. #1, Ann Arbor, MI 48103
(PHIL) Phase JP Nagar, Bangalore, Karnataka (DET)
Gerard Hranek 560078, India (IND) Danny Kopelson
18318 NE 146th Way, Woodinville, WA Hafthor S. Karisson 65 W. 112th St. #3H, New York, NY 10025
98072 (PNW) Karfavogur 54, IS 104, Reykjavik, Iceland (NY)
Austin Hu Brent Karley Rena N. Kozak
6337 Odessa Dr., West Bloomfield, MI 7117 Bending Oak Rd., Austin, TX 78735 2511 39th St. NE, Calgary, T1Y 3K9,
48324 (DET) Alberta, Canada (AB)
Georgios Karpouzas
Darin Hughes James Kramer
Delfwin 56, GR 54638, Thessaloniki, Greece
2625 Lafayette Ave., Winter Park, FL 32789 16265E. Balsam Dr., Fountain Hills, AZ
(GR)
85268
Seraphim Huling Pitsch Karrer
15 Violet Circle, Milford, MA 01757 (BOS) 108 Park Terrace East #4F, New York, NY Allan Kraut
10034 (NY) 25 Parkdale Ave., Boston, MA 02134 (BOS)
Jason Hurley
887 Oak Valley Ln., Nashville, TN 37220 Satoh Kazue Timothy P. Krips
(NA) Neya 676-73, Neyagawa-shi, Osaka-fu 572- 2413 Beach Rd., Walled Lake, MI 48390
0801, Japan (JA) (DET)
Aaron Hurst
157 Center Square Rd., Leola, PA 17540 Lew Kellogg V. Kumareswaran
(PHIL) 2008 Petworth Court, Raleigh, NC 27615 SAE Malaysia, No.10-1 Jalan USJ 9/5R,
(AT) Subang Business Centre, Subang Jaya,
Robert Iadanza Selangor 47620, Malaysia (MY)
4 Carvel Rd., Monroe, NY 10950 (NY) Antti P. Kelloniemi
Helsinki University of Technology, P. O. Box Toshifumi Kunimoto
Giovanni Imlach 2-28-17 Jouhoku, Hamamatsu-shi, Shizuoka
1000, FI 02015, TKK / Espoo, Finland (FIN)
227 E. 96th St. #2FE, New York, NY 10128 432-8011, Japan (JA)
(NY) Jason Kemmerer
646 Benson Way, Thousand Oaks, CA 91360 Chee Meng Kwan
Morgan Inman (LA) No.7 Jalan Nikmat, Happy Garden, Jalan
Man Made Music, 16 W. 46th St. 6th Flr., Kuchai Lama, Kuala Lumpur 58200,
New York, NY 10036 (NY) Gregory J. Kephart
11334 Niland Pass, Ft. Wayne, IN 46845 Malaysia (MY)
Kazuo Izumoya (CH) Geronimo Labrada
Takaido-higashi, 3-14-17, Suginami-ku, Exequiel Fernandez 1624 dpto 62-A,
Tokyo 168-0072, Japan (JA) James Keyes
194 S. Buckhout St., Irvington, NY 10533 Santiago, Chile (RCH)
Jerry Jackman (NY) Rodney Ladson
1061 E 1100 N., Orem, UT 84097 (UT) 4807 51st Pl., Hyattsville, MD 20781 (DC)
Gary Khan
Dean Jakubowicz 1231 W. Barry, Chicago, IL 60657 (CH) Rene Laflamme
AmeriCom Inc., 308 Lafayette Freeway, St. Youngtae Kim 9276 La Jeunesse St. 5, Montreal, H2M 1S2,
Paul, MN 55107 (UMW) SAIT, P. O. Box 111, Suwon, Korea (RK) Quebec, Canada, (MGU)
Eric Jensen Sung-Jun Kim Gary Lamb
20-53 19th St. #2A, Astoria, NY 11105 (NY) Northern Caribbean University, Music Baderstrasse 4, CH 8400, Winterthur,
Jin Soo. Jeon Department, Mandeville, Jamaica Switzerland (SWI)
203 Edgeland St., Rochester, NY 14609 Renzo Larrauri Cunza
Patrick Kimball
(NY) Hondurad 301, Lima 18, Peru (PER)
2738 Winston Rd., Oklahoma City, OK
Samuel Johnson 73120 Drew Lavyne
33 Pine Grove, Amherst, MA 01102 (BOS) 9 E. 13th St., Suite 5I, New York, NY 10003
Joshua King (NY)
Marty Johnson 8780 N. Mount Tabor Rd., Ellettsville, IN
143 Durham Hall, Dept. Mech. Eng. Virginia 47429 (CI) Sergio R. Ledesma
Tech., Blacksburg, VA 24061 (DC) Arquitectos 26-3, Colonia Escandon, D.F.
Charles King 01800, Mexico (MEX)
Eric Johnson 11 Oakwood Ln., Greenwich, CT 06830
1410 Midtown Place, Raleigh, NC 27609 (BOS) Hyuck J. Lee
Samsung Electronics Co. Ltd., 416 Maetan 3
Eric Johnson David Kingland - dong, Paldal-2u, Suwon, Korea (RK)
3401 J. St. # 3, Sacramento, CA 95816 (SF) 103 Shady Oaks Ln., Lake Mills, IA 50450
Michael Lee
Echo Jong (UMW)
3132 David Ave., Palo Alto, CA 94303 (SF)
23 Hsin Hua Rd., Taoyuan 330, Taiwan Bjarni B. Kjartansson David Lee
Larry Josephson Hringbraut 37, IS 107, Reykjavik, Iceland Concept Associates Sdn Bhd, 44 Jalan
Radio Foundation Inc., 1 W. 89th St., New Gregory Klas Sungkai, off Jalan Terengganu, Georgetown,
York, NY 10024 (NY) 8820 Sunset Dr., Colden, NY 14033 (NY) Penang 10460, Malaysia (MY)

Jamie Lendino, 128 Bay 13th St., Brooklyn, NY 11214 (NY)
Rich Leone, 12810 5th St. N, Lake Elmo, MN 55042 (UMW)
Robert Lessick, 178 Dougbeth Dr., Ronkonkoma, NY 11779 (NY)
Opal Leung, 123 Williams St., Jamaica Plain, MA 02130 (BOS)
Chin-Ping Liao, P. O. Box 10-35 Nei Hou, Taipei 114, Taiwan
Heung-Guy Lim, 881 Constitution Dr., Foster City, CA 94404 (SF)
Yat Fung Jim Yam, Flat D 33/F Tower 2 New Haven, Hong Kong (HK)
Geof Lipman, 425 Prospect Place Unit 5H, Brooklyn, NY 11238 (NY)
Jeff Lipton, Peerless Mastering, 1085 Commonwealth Ave. #322, Boston, MA 02215 (BOS)
Ed Littman, 250 W 10th St., New York, NY 10014 (NY)
David Logvin, 21 Glen Ave., Chelmsford, MA 01824 (BOS)
Richard Long, 2904 Austin Ave., Waco, TX 76710 (USF)
Robert Long, 1583 Semoran North Circle #203, Winter Park, FL 32792
Jason Love, 10719 Green Valley Rd., Union Bridge, MD 21791 (DC)
John Lucas, 2 Princeton Ct., Tinton Falls, NJ 07724 (NY)
Ricciano Lumpkins, Platinum Work Projects LLC, P. O. Box 948182, Maitland, FL 32794
Jennifer Lyons, 411 Camino del Rio South, Suite 301, San Diego, CA 92108 (SD)
Martin F. Macheiner, Gartengasse 45, AT 2721, Bad Fischau-Brunn, Austria (AU)
Anthony Magrath, Flat 4, 52 Bath St., Edinburgh EH15 1HF, UK (BR)
Dominick Maita, 235 E. 22nd St. 7L, New York, NY 10010 (NY)

STUDENTS

Jeff Siarto, 821 Bay Pointe Dr., Oxford, MI 48371 (MTU)
Marshall Simmons, 224 Varsity N., Bowling Green, OH 43402 (UC)
Steven Simpson, 4837 Perina Way, North Highlands, CA 95660 (ARC)
Jasmina Sironja, Bul. Umetnosti 11, YU 11070, Novi Beograd, Yugoslavia
Andy Skies, 5102 W. Piedmont, Laveen, AZ 85339 (CRAS)
Keith Smith, Naval Media Center, PSC 4 66 Box 14, FPO AP 96595, Virgin Islands
Ann Smith, 8210 Rambleton Way, Antelope, CA 95843 (ARC)
Cameron Snow, 50 Mill Ln., Huntington, NY 11743 (IAR)
Frederic Soulard, 27 avenue Secretan, FR 75019, Paris, France (CPS)
Austin Sousa, 2205 2nd Ave. #605, Seattle, WA 98121 (TAIS)
Esther L. Spence, 6696 W. Long Dr., Littleton, CO 80123 (UCDEN)
Joshua Srago, 1600 Holloway Ave., El Cerrito, CA 94132
Jason J. Steeber, 1810 Bauman Rd., Quakertown, PA 18951 (PSU)
Ise K. Steffensen, Oernevej 4, 3 Fl., DK 2400 NV, Copenhagen, Denmark (DAS)
Aaron T. Stefl, 7541 Quail Nest Pl., Citrus Heights, CA 95610 (ARC)
Kevin Steller, 6 Park Pl., Newburgh, NY 12550 (FRE)
Maya Stevens, 564 Price St., Daly City, CA 94014 (CPC)
Daniel Stover, 229 I-45 N. #102, Conroe, TX 77304
Gerda Strobl, Lessingstrasse 12/1, AT 8010, Graz, Austria (GZ)
Charles Stumbagh, 222 SE Main #5, Mesa, AZ 85213 (CRAS)
Shiva Sundaram, Signal and Image Processing Institute, University of Southern California, EEB400, 340 Mclintock Ave., Los Angeles, CA 90089 (USC)
Bobby Sweeney, 702 Douglas, Stanton, NE 68779
Anne-Marie Sylvestre, 6251 de Normanville, Montreal, H2S 2B7, Quebec, Canada (MGU)
Dennes K. Szilagyi, Dirk Vreekenstraat 53, NL 1019 DP, Amsterdam, Netherlands (NES)
Randy Tan, 23 Kent Pl. #6, Menlo Park, CA 94025 (SFU)
Anton Teljeback, Korsgatan 13 A, SE 903 36, Umea, Sweden
Nenad Terentic, Ralja, YU 11311, Smederevo B.B., Yugoslavia
Nicolas Thelliez, 146 avenue de Flandre, FR 75019, Paris, France (CPS)
Katrina Thomas, 3629 Sweet Grass Circle #6031, Winter Park, FL 32792 (FS)
Michael Thompson, 1639 Birch St., State College, PA 16801 (PSU)
Anastasia Timofeyeva, 5425 Kohler Rd., Sacramento, CA 95841 (ARC)
Donald E. Tonelli, 8040 Sunrise Blvd. #A, Citrus Heights, CA 95610 (ARC)
Milan Trifunovic, Kumanovska 28/1, YU 11420, Smederevska Palanka, Yugoslavia
Brandon Trowel, 7313 Holworthy Way, Sacramento, CA 95842 (ARC)
Miki Tsutsumi, 30-10 28th Rd. #2, Astoria, NY 11102 (IAR)
Ryan M. Tucker, 1402 Country Club Rd. #38, Norfolk, NE 68701
James Vajda, 168 Mass. Ave., SB# 2929, Boston, MA 02115 (BCM)
Alex Valdes, 5121 N. 40th St. #B-214, Phoenix, AZ 85018 (CRAS)
Dan Valente, 1033 Danby Rd. #1-02, Ithaca, NY 14850 (FRE)
Elisa Valentini, via delle Cornacchie 927/01, IT 55100, San Marco - Lucca, Italy (ITS)
Daniel Van Ampting, 901 DeKleva Dr., Apopka, FL 32712 (FS)
Raymond van Weeghel, Klinkenberg 4, NL 8091 GV, Wezep, Netherlands (NES)
Jeroen M. van Zaalen, Evert de Bruijnstraat 108, NL 1411 TE, Naarden, Netherlands (NES)
Vincent Verderame, 57 Springcress Dr., Delran, NJ 08075 (WPU)

Lisa G. Vitti, 145 Bay Ridge Parkway #2, Brooklyn, NY 11209 (IAR)
Barbara Vlahides, 301 E. 22nd St. #11D, New York, NY 10010 (IAR)
Aaron Vlasnik, 801 E. Benjamin, Dorm Box 218, Norfolk, NE 68701
Jelle Vlietstra, Appelhof 7, NL 9201 KT, Drachten, Netherlands (NES)
Robin Watkins, 467 Napoleon Ave., Columbus, OH 43213 (HPTU)
Bethany Watson, 2827 N. Spaulding #3, Chicago, IL 60618 (CC)
Felipe Wernet, Schrijnwerker 12, NL 3201 TK, Spijkenisse, Netherlands (NES)
Daniel White, 1135 S. 15th #4, Lincoln, NE 68502
Felton White, P.O. Box 5275, River Forest, IL 60305 (CC)
Dennis Wikstroem, Ankarskatavagen 83 A, SE 941 31, Pitea, Sweden (ULP)
Andreas Woerle, Zimmerplatzgasse 1, AT 8010, Graz, Austria (GZ)
Endale Worku, 1831 8th Ave. #502, Seattle, WA 98101 (TAIS)
Christopher Wraith, The Coach House, 46 Painshawfield Rd., Stocksfield, Northumberland, NE43 7QY, UK
Fan Wu, P. O. Box 147, Beijing Broadcasting Institute, Beijing 100024, Peoples Republic of China
Phillip Yarrow, 750 Font Blvd. #C332, San Francisco, CA 94132 (SFU)
Chris Yates, 297 Commissioners Rd. E., London, N6C 2T3, Ontario, Canada
Anthony N. Yeager, 813 Sullivan Dr., Lansdale, PA 19446
Emi S. Yonemura, 5312 Markwood Ln., Fair Oaks, CA 95628 (ARC)
Luis G. Zamora Manriquez, Avenida Chillan 27 85, Independencia, Santiago, Chile
Jose R. Zapata Gonzalez, cra 64A #39-15 Apto. 102, Medellin, Antioquia, Colombia (LAU)

In Memoriam

Glen Akins, AES life member, died of heart failure on November 14, 2003, in Los Angeles, CA. He was 87 years old. Akins was born in York, PA, in 1916. While attending Gettysburg College, he worked as a theater projectionist, which spurred his interest in all things audio. Trained as an electronics engineer, Glen was employed by the International Telephone and Telegraph in their short wave radio transmitter final test department. In 1942, he joined the Office of War Information, and after a series of harrowing transportation incidents spent four years in China setting up and maintaining radio transmitting equipment, which often involved dubious political consequences. This position involved associations with prominent journalists and political figures in the area, which led to an additional post of gathering and broadcasting news from China for the CBS radio network.

In 1948, Glen joined the sound department of RKO Studios in Hollywood, CA. When commercial television became a reality, he moved to ABC television, Hollywood, in 1951. Hired as a video projectionist, he rapidly advanced to positions involving the planning and maintenance of new equipment and facilities. He developed a working 3-D video system, which was demonstrated at the 1953 NAB convention.

In 1960, he designed the first video/audio/machine control routing switcher that served 24 sources to six control rooms. Automation of video and audio switching for the local station control room was another project that involved unique approaches. Many landmark broadcasts bore his fingerprints, such as the 1960 Democratic convention, the Nixon-Kennedy debates, and the Wide World of Sports. The conversion to color television under his aegis was accomplished with speed, innovation, and economy.

Akins retired in 1977 after leading his departments with a calm, wise, and humanistic statesmanship that belied the frenzy of television broadcasting. He then traveled the U.S. and worldwide with his wife Alice, visiting all continents but Antarctica.

Audio was equally important as video in Glen's mind. He encouraged and promoted the design and construction of production audio consoles and communication systems, which were not available commercially at the time. At home, he was an avid gardener, an ardent amateur radio operator, and a hi-fi audio enthusiast. Loudspeaker systems were his particular interest. He built much of his own equipment. Ham swap meets and conventions continued to be a part of his life. He never stopped learning.
Don McCroskey

Joseph Habig, AES life member, died of Parkinson's disease on September 21, 2003, in Tinton Falls, NJ. He was 79 years old. Born in New York City, Habig received a bachelor's degree in music education at the City College of New York. He also attended Juilliard and the Manhattan School of Music. Trained as a classical musician, he worked with many of the top classical artists during his career as a producer. He also recorded major popular and jazz artists.

A Grammy Award-winning record producer, Habig won the award for his recording of Stravinsky's Symphony of Psalms. For 10 years he played the trombone with various symphony orchestras. He later became an artists' and repertoire producer at RCA Victor Red Seal for 19 years (1954 to 1973). After that he worked as an executive producer at Reader's Digest Music Division for 16 years. He is survived by his wife, Virginia Harlow Habig.

AUDIO ENGINEERING SOCIETY 26th Conference
October 1–3

CALL for PAPERS


AES 26TH CONFERENCE, 2004
High-Quality Analog Audio Processing
Baarn, The Netherlands
Dates: October 1–3, 2004, Location: Baarn, The Netherlands
Chair: Ben Kok, Dorsserblesgraaf, NL, Email: 26th_chair@aes.org

The AES 26th International Conference intends to explore the new insights in analog audio technology that have contributed to the
overall increase in the subjective and objective quality of modern digital audio systems. The resolution of digital audio systems,
both in the time domain and in the amplitude domain, has undergone a spectacular improvement in recent years. Because nearly
all digital audio signals are derived from analog microphone signals, the recording industry has directed new efforts to the design
of low-level and line-level analog circuitry to keep up with the increasing demands of the digital audio world. In particular, analog
microphone amplifiers and associated circuits such as line drivers, power supplies, cables, etc., have to match the quality of the
modern high-resolution A-to-D converters. Recent years have also seen an increase in the attention paid to a system’s ability to
create an illusion of depth, of space surrounding the performers, and of the feeling of “being there.” The relationship of these sub-
jective experiences with aspects of the design of the equipment will be one of the main topics of this conference. Because of the
subjective nature of this field, a preference will be given to papers that combine a lecture with a listening demonstration. For these
demonstrations three identical first-class listening rooms equipped for stereo listening will be available at the Polyhymnia Studios.
The AES 26th Conference Committee invites submission of technical papers and proposals for demonstrations at the conference in
October 2004 in Baarn. By 2004 May 17, a proposed title, 60- to 120-word abstract, and 500- to 1000-word precis of the paper
should be submitted via the Internet to the AES 26th Conference paper-submission site at www.aes.org/26th_authors. You can
visit this site for more information and complete instructions for using the site anytime after 2004 March 19. The author’s informa-
tion, title, abstract, and precis should all be submitted online. The precis should describe the work performed, methods employed,
conclusion(s), and significance of the paper. Titles and abstracts should follow the guidelines in Information for Authors at
www.aes.org/journal/con_infoauth.html. Acceptance of papers will be determined by the 26th Conference review committee
based on an assessment of the abstract and precis.

PROPOSED TOPICS FOR PAPERS


New Circuits and Topologies
Feedback: Bane or Blessing
Power Supplies and Audible Effects
Discrete versus Integrated Circuits
Filters, Bandwidth Effects
Correlation Between Measurements and Listening Experiences
EMC Considerations
Passive Components
Tubes and Transistors
Connectors and Contacts
Transformers
Cables
Lay-out and Mechanical Design
SUBMISSION OF PAPERS
Please submit proposed title, abstract, and precis at www.aes.org/26th_authors no later than 2004 May 17. If you have any questions, contact:

PAPERS COCHAIRS
Tom Magchielse, Consultant, The Netherlands
Peter van Willenswaard, Audiomagic, The Netherlands
Email: 26th_papers@aes.org

SCHEDULE
Proposal deadline: 2004 May 17
Acceptance emailed: 2004 June 9
Paper deadline: 2004 July 19

Authors whose contributions have been accepted for presentation will receive additional instructions for submission of their manuscripts.



SECTIONS CONTACTS
DIRECTORY
The following is the latest information we have available for our sections contacts. If you
wish to change the listing for your section, please mail, fax or e-mail the new information
to: Mary Ellen Ilich, AES Publications Office, Audio Engineering Society, Inc., 60 East
42nd Street, Suite 2520, New York, NY 10165-2520, USA. Telephone +1 212 661 8528,
ext. 23. Fax +1 212 661 7829. E-mail MEI@aes.org.
Updated information that is received by the first of the month will be published in the
next month’s Journal. Please help us to keep this information accurate and timely.

EASTERN REGION, AES Student Section Fax +1 973 720 2217 PENNSYLVANIA
USA/CANADA Peabody Institute of Johns E-mail wpu@aes.org Carnegie Mellon University
Hopkins University Section (Student)
Vice President: Recording Arts & Science Dept. NEW YORK
Thomas Sullivan
Jim Anderson 2nd Floor Conservatory Bldg. Fredonia Section (Student) Faculty Advisor
12 Garfield Place 1 E. Mount Vernon Place Bernd Gottinger, Faculty Advisor AES Student Section
Brooklyn, NY 11215 Baltimore, MD 21202 AES Student Section Carnegie Mellon University
Tel. +1 718 369 7633 Tel. +1 410 659 8100 ext. 1226 SUNY–Fredonia University Center Box 122
Fax +1 718 669 7631 E-mail peabody@aes.org 1146 Mason Hall Pittsburgh, PA 15213
E-mail vp_eastern_usa@aes.org Fredonia, NY 14063 Tel. +1 412 268 3351
MASSACHUSETTS Tel. +1 716 673 4634 E-mail carnegie_mellon@aes.org
UNITED STATES OF Berklee College of Music Fax +1 716 673 3154
E-mail fredonia@aes.org Duquesne University Section
AMERICA Section (Student) (Student)
Eric Reuter, Faculty Advisor Institute of Audio Research Francisco Rodriguez
CONNECTICUT Berklee College of Music Section (Student) Faculty Advisor
University of Hartford Audio Engineering Society Noel Smith, Faculty Advisor AES Student Section
c/o Student Activities
Section (Student) AES Student Section Duquesne University
Timothy Britt 1140 Boylston St., Box 82 Institute of Audio Research School of Music
Faculty Advisor Boston, MA 02215 64 University Pl. 600 Forbes Ave.
AES Student Section Tel. +1 617 747 8251 New York, NY 10003 Pittsburgh, PA 15282
University of Hartford Fax +1 617 747 2179 Tel. +1 212 677 7580 Tel. +1 412 434 1630
Ward College of Technology E-mail berklee@aes.org Fax +1 212 677 6549 Fax +1 412 396 5479
200 Bloomfield Ave. E-mail iar@aes.org E-mail duquesne@aes.org
West Hartford, CT 06117 Boston Section
Tel. +1 860 768 5358 J. Nelson Chadderdon New York Section Pennsylvania State University
Fax +1 860 768 5074 c/o Oceanwave Consulting, Inc. Bill Siegmund Section (Student)
E-mail aes@hartfordaes.org 21 Old Town Rd. Digital Island Studios Dan Valente
Beverly, MA 01915 71 West 23rd Street Suite 504 AES Penn State Student Chapter
FLORIDA Tel. +1 978 232 9535 x201 New York, NY 10010 Graduate Program in Acoustics
Full Sail Real World Fax +1 978 232 9537 Tel. +1 212 243 9753 217 Applied Science Bldg.
Education Section (Student) E-mail boston@aes.org E-mail new_york@aes.org University Park, PA 16802
Bill Smith, Faculty Advisor New York University Section Home Tel. +1 814 863 8282
AES Student Section University of Massachusetts (Student) Fax +1 814 865 3119
Full Sail Real World Education –Lowell Section (Student) Robert Rowe, Faculty Advisor E-mail penn_state@aes.org
3300 University Blvd., Suite 160 John Shirley, Faculty Advisor Steinhardt School of Education
Winter Park, FL 32792 AES Student Chapter 35 West 4th St., 777G Philadelphia Section
Tel. +1 800 679 0100 University of Massachusetts–Lowell New York, NY 10012 Rebecca Mercuri
E-mail full_sail@aes.org Dept. of Music Tel. +1 212 998 5435 P. O. Box 1166.
35 Wilder St., Ste. 3 E-mail nyu@aes.org Philadelphia, PA 19105
University of Miami Section Lowell, MA 01854-3083 Tel. +1 215 327 7105
(Student) Tel. +1 978 934 3886 NORTH CAROLINA E-mail philly@aes.org
Ken Pohlmann, Faculty Advisor Fax +1 978 934 3034
AES Student Section E-mail umass_lowell@aes.org Appalachian State University VIRGINIA
University of Miami Section (Student) Hampton University Section
School of Music Worcester Polytechnic Michael S. Fleming (Student)
PO Box 248165 Institute Section (Student) Faculty Advisor Bob Ransom, Faculty Advisor
Coral Gables, FL 33124-7610 William Michalson Sonaura Sound AES Student Section
Tel. +1 305 284 6252 Faculty Advisor 152 Villafe Drive Hampton University
Fax +1 305 284 4448 AES Student Section Boone, NC 28607 Dept. of Music
E-mail miami@aes.org Worcester Polytechnic Institute Tel. +1 828 263 0454 Hampton, VA 23668
100 Institute Rd. E-mail appalachian@aes.org Office Tel. +1 757 727 5658,
GEORGIA Worcester, MA 01609 +1 757 727 5404
Tel. +1 508 831 5766 University of North Carolina
Atlanta Section at Asheville Section (Student) Home Tel. +1 757 826 0092
Robert Mason E-mail wpi@aes.org Fax +1 757 727 5084
Wayne J. Kirby
2712 Leslie Dr. Faculty Advisor E-mail hampton_u@aes.org
Atlanta, GA 30345 NEW JERSEY
AES Student Section
Tel./Fax +1 770 908 1833 University of North Carolina at WASHINGTON, DC
William Paterson University
E-mail atlanta@aes.org Section (Student) Asheville American University Section
David Kerzner, Faculty Advisor Dept. of Music (Student)
MARYLAND AES Student Section One University Heights Rebecca Stone-gordon
Peabody Institute of Johns William Paterson University Asheville, NC 28804 Faculty Advisor
Hopkins University Section 300 Pompton Rd. Tel. +1 828 251 6487 AES Student Section
(Student) Wayne, NJ 07470-2103 Fax +1 828 253 4573 American University
Neil Shade, Faculty Advisor Tel. +1 973 720 3198 E-mail north_carolina@aes.org 4400 Massachusetts Ave., N.W.
Washington, DC 20016 Faculty Advisor University of Michigan OHIO
Tel. +1 202 885 3242 AES Student Section Section (Student) Cincinnati Section
E-mail american_u@aes.org 676 N. LaSalle, Ste. 300 Jason Corey, Dan Scherbarth
Chicago, IL 60610 Faculty Advisor Digital Groove Productions
District of Columbia Section Tel. +1 312 344 7802 University of Michigan School
John W. Reiser 5392 Conifer Dr.
Fax +1 312 482 9083 of Music Mason, OH 45040
DC AES Section Secretary E-mail columbia@aes.org 1100 Baits Drive
P. O. Box 169 Tel. +1 513 325 5329
University of Illinois at Ann Arbor, MI 48109 E-mail cincinnati@aes.org
Mt. Vernon, VA 22121-0169 E-mail univ_michigan@aes.org
Tel. +1 703 780 4824 Urbana-Champaign Section Ohio University Section
Fax +1 703 780 4214 (Student) West Michigan Section (Student)
E-mail dc@aes.org Mark Hasegawa-Johnson, Carl Hordyk Erin M. Dawes
Faculty Advisor Calvin College AES Student Section
CANADA AES Student Section 3201 Burton S.E. Ohio University, RTVC Bldg.
University of Illinois, Urbana- Grand Rapids, MI 49546 9 S. College St.
McGill University Section Champaign Tel. +1 616 957 6279 Athens, OH 45701-2979
(Student) Urbana, IL 61801 Fax +1 616 957 6469 Home Tel. +1 740 597 6608
John Klepko, Faculty Advisor E-mail urbana@aes.org E-mail west_mich@aes.org E-mail ohio@aes.org
AES Student Section
McGill University INDIANA MINNESOTA University of Cincinnati
Sound Recording Studios Section (Student)
Ball State University Section Music Tech College Section Thomas A. Haines
Strathcona Music Bldg. (Student) (Student)
555 Sherbrooke St. W. Faculty Advisor
Michael Pounds, Faculty Advisor Michael McKern AES Student Section
Montreal, Quebec H3A 1E3 AES Student Section Faculty Advisor
Canada University of Cincinnati
Ball State University AES Student Section College-Conservatory of Music
Tel. +1 514 398 4535 ext. 0454 MET Studios Music Tech College
E-mail mcgill_u@aes.org M.L. 0003
2520 W. Bethel 19 Exchange Street East Cincinnati, OH 45221
Toronto Section Muncie, IN 47306 Saint Paul, MN 55101 Tel. +1 513 556 9497
Anne Reynolds Tel. +1 765 285 5537 Tel. +1 651 291 0177 Fax +1 513 556 0202
606-50 Cosburn Ave. Fax +1 765 285 8768 Fax +1 651 291 0366 E-mail Cincinnati@aes.org
Toronto, Ontario M4K 2G8 E-mail ball_state@aes.org E-mail
Canada musictech_student@aes.org TENNESSEE
Tel. +1 416 957 6204 Central Indiana Section Belmont University Section
James Latta Ridgewater College,
Fax +1 416 364 1310 Hutchinson Campus Section (Student)
E-mail toronto@aes.org Sound Around Wesley Bulla, Faculty Advisor
6349 Warren Ln. (Student)
Dave Igl, Faculty Advisor AES Student Section
Brownsburg, IN 46112 Belmont University
CENTRAL REGION, Office Tel. +1 317 852 8379 AES Student Section
F Ridgewater College, Hutchinson Nashville, TN 37212
USA/CANADA Fax +1 317 858 8105 E-mail Belmont@aes.org
E-mail central_indiana@aes.org Campus
2 Century Ave. S.E. Middle Tennessee State
Vice President: KANSAS Hutchinson, MN 55350 University Section (Student)
Frank Wells E-mail ridgewater@aes.org Phil Shullo, Faculty Advisor
Kansas City Section
2130 Creekwalk Drive Jim Mitchell AES Student Section
Murfreesboro, TN Upper Midwest Section Middle Tennessee State University
Custom Distribution Limited Greg Reierson
Tel. +1 615 848 1769 12301 Riggs Rd. 301 E. Main St., Box 21
Rare Form Mastering Murfreesboro, TN 37132
Fax +1 615 848 1108 Overland Park, KS 66209 4624 34th Avenue South
E-mail vp_central_usa@aes.org Tel. +1 913 661 0131 Tel. +1 615 898 2553
Minneapolis, MN 55406 E-mail mtsu@aes.org
Fax +1 913 663 5662 Tel. +1 612 327 8750
UNITED STATES OF E-mail upper_midwest@aes.org Nashville Section
AMERICA LOUISIANA Tom Edwards
New Orleans Section MISSOURI MTV Networks
ARKANSAS Joseph Doherty St. Louis Section 330 Commerce St.
University of Arkansas at Factory Masters John Nolan, Jr. Nashville, TN 37201
Pine Bluff Section (Student) 4611 Magazine St. 693 Green Forest Dr. Tel. +1 615 335 8520
Robert Elliott, Faculty Advisor New Orleans, LA 70115 Fenton, MO 63026 Fax +1 615 335 8625
AES Student Section Tel. +1 504 891 4424 Tel./Fax +1 636 343 4765 E-mail nashville@aes.org
Music Dept. Univ. of Arkansas Cell +1 504 669 4571 E-mail st_louis@aes.org SAE Nashville Section (Student)
at Pine Bluff Fax +1 504 899 9262 Larry Sterling, Faculty Advisor
1200 N. University Drive E-mail jdoherty@accesscom.net Webster University Section AES Student Section
Pine Bluff, AR 71601 (Student) 7 Music Circle N.
Tel. +1 870 575 8916 MICHIGAN Faculty Advisor: Nashville, TN 37203
Fax +1 870 543 8108 Detroit Section Gary Gottleib Tel. +1 615 244 5848
E-mail pinebluff@aes.org David Carlstrom Webster University Fax +1 615 244 3192
DaimlerChrysler 470 E. Lockwood Ave. E-mail saenash_student@aes.org
ILLINOIS E-mail detroit@aes.org Webster Groves, MO 63119
Tel. +1 961 2660 x7962 TEXAS
Chicago Section E-mail webster_st_louis@aes.org
Tom Miller Michigan Technological Texas State University—San
Knowles Electronics University Section (Student) NEBRASKA
Marcos (Student)
1151 Maplewood Dr. Greg Piper Mark C. Erickson
AES Student Section Nebraska Section Faculty Advisor
Itasca, IL 60143 Anthony D. Beardslee
Tel. +1 630 285 5882 Michigan Technological AES Student Section
University Northeast Community College Southwest Texas State University
Fax +1 630 250 0575 P.O. Box 469
E-mail chicago@aes.org 1400 Townsend Dr. 224 N. Guadalupe St.
121 EERC Building Norfolk, NE 68702 San Marcos, TX 78666
Columbia College Section Houghton, MI 49931 Tel. +1 402 844 7365 Tel. +1 512 245 8451
(Student) Tel. +1 906 482 3581 Fax +1 209 254 8282 Fax +1 512 396 1169
Dominique J. Chéenne E-mail michigan_tech@aes.org E-mail nebraska@aes.org E-mail tsu_sm@aes.org

AES Student Section Stanford AES Student Section Seattle, WA 98195
WESTERN REGION, Cogswell Polytechnical College Stanford University Office Tel. +1 206 543 1218
USA/CANADA Music Engineering Technology CCRMA/Dept. of Music Fax +1 206 685 9499
Vice President: 1175 Bordeaux Dr. Stanford, CA 94305-8180 E-mail pacific_nw@aes.org
Bob Moses Sunnyvale, CA 94089 Tel. +1 650 723 4971
Tel. +1 408 541 0100, ext. 130 Fax +1 650 723 8468 The Art Institute of Seattle
Island Digital Media Group, Section (Student)
LLC Fax +1 408 747 0764 E-mail stanford@aes.org
E-mail cogswell@aes.org David G. Christensen
26510 Vashon Highway S.W. University of Southern Faculty Advisor
Vashon, WA 98070 Expression Center for New California Section (Student) AES Student Section
Tel. +1 206 463 6667 Media Section (Student) Kenneth Lopez The Art Institute of Seattle
Fax +1 810 454 5349 Scott Theakston, Faculty Advisor Faculty Advisor 2323 Elliott Ave.
E-mail vp_western_usa@aes.org AES Student Section AES Student Section Seattle, WA 98121-1622
Ex’pression Center for New Media University of Southern California Tel. +1 206 448 0900
UNITED STATES OF 6601 Shellmount St. 840 W. 34th St. E-mail
AMERICA Emeryville, CA 94608 Los Angeles, CA 90089-0851 art_institute_seattle@aes.org
ARIZONA Tel. +1 510 654 2934 Tel. +1 213 740 3224
Fax +1 510 658 3414 Fax +1 213 740 3217 CANADA
Conservatory of The E-mail expression_center@aes.org E-mail usc@aes.org
Recording Arts and Sciences Alberta Section
Section (Student) Long Beach City College COLORADO Frank Lockwood
Glenn O’Hara, Faculty Advisor Section (Student) AES Alberta Section
Nancy Allen, Faculty Advisor Colorado Section
AES Student Section Roy Pritts Suite 404
Conservatory of The Recording AES Student Section 815 - 50 Avenue S.W.
Long Beach City College 2873 So. Vaughn Way
Arts and Sciences Aurora, CO 80014 Calgary, Alberta T2S 1H8
2300 E. Broadway Rd. 4901 E. Carson St. Canada
Long Beach, CA 90808 Tel. +1 303 369 9514
Tempe, AZ 85282 E-mail colorado@aes.org Home Tel. +1 403 703 5277
Tel. +1 480 858 9400, 800 562 Tel. +1 562 938 4312 Fax +1 403 762 6665
6383 (toll-free) Fax +1 562 938 4409 University of Colorado at E-mail alberta@aes.org
Fax +1 480 829 1332 E-mail long_beach@aes.org Denver Section (Student) Vancouver Section
E-mail Los Angeles Section Roy Pritts, Faculty Advisor David Linder
conservatory_RAS@aes.org Andrew Turner AES Student Section 93.7 JRfm/600am Radio, A
2311 Lakeview Ave. University of Colorado at Denver Division of the Jim Pattison
CALIFORNIA
Los Angeles, CA 90039 Dept. of Professional Studies Broadcast Group
American River College Tel. +1 323 663 1327 Campus Box 162 E-mail vancouver@aes.org
Section (Student) E-mail la@aes.org P.O. Box 173364
Eric Chun, Faculty Advisor Denver, CO 80217-3364 Vancouver Student Section
AES Student Section San Diego Section Tel. +1 303 556 2795 Gregg Gorrie, Faculty Advisor
American River College Chapter J. Russell Lemon Fax +1 303 556 2335 AES Greater Vancouver
4700 College Oak Dr. 2031 Ladera Ct. E-mail cu_denver@aes.org Student Section
Sacramento, CA 95841 Carlsbad, CA 92009-8521 Centre for Digital Imaging and
Tel. +1 916 484 8420 Home Tel. +1 760 753 2949 OREGON Sound
E-mail american_river@aes.org E-mail san_diego@aes.org PORTLAND SECTION 3264 Beta Ave.
Cal Poly San Luis Obispo San Diego State University Tony Dal Molin Burnaby, B.C. V5G 4K4, Canada
State University Section Section (Student) Audio Precision, Inc. Tel. +1 604 298 5400
(Student) John Kennedy, Faculty Advisor 5750 S.W. Arctic Dr. E-mail
Bryan J. Mealy AES Student Section Portland, OR 97005 vancouver_student @ aes.org
Faculty Advisor San Diego State University Tel. +1 503 627 0832
AES Student Section Electrical & Computer Fax +1 503 641 8906
California Polytechnic State Engineering Dept. E-mail portland@aes.org NORTHERN REGION,
University 5500 Campanile Dr. EUROPE
Dept. of Electrical Engineering San Diego, CA 92182-1309 UTAH
Tel. +1 619 594 1053 Brigham Young University Vice President:
San Luis Obispo, CA 93407 Søren Bech
Tel. +1 805 756 2300 Fax +1 619 594 2654 Section (Student)
E-mail sdsu@aes.org Timothy Leishman, Bang & Olufsen a/s
Fax +1 805 756 1458 CoreTech
E-mail san_luis_obispo@aes.org Faculty Advisor
San Francisco Section BYU-AES Student Section Peter Bangs Vej 15
California State University Conrad Cooke Department of Physics and DK-7600 Struer, Denmark
–Chico Section (Student) 1046 Nilda Ave. Astronomy Tel. +45 96 84 49 62
Keith Seppanen, Faculty Advisor Mountain View, CA 94040 Brigham Young University Fax +45 97 85 59 50
AES Student Section Office Tel. +1 650 846 1132 Provo, UT 84602 E-mail
California State University–Chico Home Tel. +1 650 321 0713 Tel. +1 801 422 4612 vp_northern_europe@aes.org
400 W. 1st St. E-mail san_francisco@aes.org E-mail brigham_young@aes.org
Chico, CA 95929-0805 BELGIUM
Tel. +1 530 898 5500 San Francisco State Utah Section
University Section (Student) Belgian Section
E-mail chico@aes.org Deward Timothy Hermann A. O. Wilms
John Barsotti, Faculty Advisor c/o Poll Sound
AES Student Section AES Europe Region Office
Citrus College Section (Student) 4026 S. Main Zevenbunderslaan 142, #9
Stephen O’Hara, Faculty Advisor San Francisco State University Salt Lake City, UT 84107
Broadcast and Electronic BE-1190 Vorst-Brussels, Belgium
AES Student Section Tel. +1 801 261 2500 Tel. +32 2 345 7971
Citrus College Communication Arts Dept. Fax +1 801 262 7379
1600 Halloway Ave. Fax +32 2 345 3419
Recording Arts
1000 W. Foothill Blvd. San Francisco, CA 94132 WASHINGTON DENMARK
Glendora, CA 91741-1899 Tel. +1 415 338 1507
Fax +1 626 852 8063 E-mail sfsu@aes.org Pacific Northwest Section Danish Section
Gary Louie Preben Kvist
Cogswells Polytechnical Stanford University Section University of Washington c/o DELTA Acoustics &
College Section (Student) (Student) School of Music Vibration
Tim Duncan, Faculty Sponsor Jay Kadis, Faculty Advisor P. O. Box 353450 Bygning 356
Akademivej Tel. +7 095 2502161, +7 095 80-952 Gdansk, Poland Central German Section
DK 2800 Lyngby, Denmark 1929011 Tel. +48 58 347 2717 Ernst-Joachim Völker
Tel. +45 61 33 45 81 Fax +7 095 9430006 Fax +48 58 347 1114 Institut für Akustik und
Fax +45 45 20 12 01 E-mail moscow@aes.org E-mail Bauphysik
E-mail denmark@aes.org vp_central_europe@aes.org Kiesweg 22-24
Russian Academy of Music DE-61440 Oberursel, Germany
Danish Student Section Student Section AUSTRIA Tel. +49 6171 75031
Preben Kvist Igor Petrovich Veprintsev Austrian Section Fax +49 6171 85483
c/o DELTA Acoustics & Faculty Advisor Franz Lechleitner E-mail c_german@aes.org
Vibration Sound Engineering Division Lainergasse 7-19/2/1
Bygning 356 30/36 Povarskaya Street AT-1230 Vienna, Austria Darmstadt Section (Student)
Akademivej RU 121069, Moscow, Russia Office Tel. +43 1 4277 29602 G. M. Sessler, Faculty Sponsor
DK 2800 Lyngby, Denmark Tel. +7 095 291 1532 Fax +43 1 4277 9296 AES Student Section
Tel. +45 61 33 45 81 E-mail russian_academy@aes.org E-mail austria @aes.org Technical University of
Fax +45 45 20 12 01 St. Petersburg Section Darmstadt
E-mail denmark@aes.org Irina A. Aldoshina Graz Section (Student) Institut für Übertragungstechnik
St. Petersburg University of Robert Höldrich Merkstr. 25
FINLAND Faculty Sponsor DE-64283 Darmstadt, Germany
Telecommunications
Finnish Section Gangutskaya St. 16, #31 Institut für Elektronische Musik Tel. +49 6151 162869
Kalle Koivuniemi RU-191187 St. Petersburg und Akustik E-mail
Nokia Research Center Russia Inffeldgasse 10 darmstadt_student@aes.org
P.O. Box 100 Tel. +7 812 272 4405 AT-8010 Graz, Austria
FI-33721 Tampere, Finland Tel. +43 316 389 3172 Detmold Section
Fax +7 812 316 1559 (Student)
Tel. +358 7180 35452 E-mail st_petersburg@aes.org Fax +43 316 389 3171
Fax +358 7180 35897 E-mail graz_student@aes.org Andreas Meyer, Faculty Sponsor
E-mail finland@aes.org St. Petersburg Student Section AES Student Section
Natalia V. Tyurina Vienna Section (Student) c/o Erich Thienhaus Institut
NETHERLANDS Faculty Advisor Jürg Jecklin, Faculty Sponsor Tonmeisterausbildung
Prosvescheniya pr., 41, 185 Vienna Student Section Hochschule für Musik
Netherlands Section Universität für Musik und
Rinus Boone RU-194291 St. Petersburg, Russia Detmold
Tel. +7 812 595 1730 Darstellende Kunst Wien Neustadt 22, DE-32756
Voorweg 105A Institut für Elektroakustik und
NL-2715 NG Zoetermeer Fax +7 812 316 1559 Detmold, Germany
E-mail Experimentelle Musik Tel/Fax +49 5231 975639
Netherlands Rienösslgasse 12
Tel. +31 15 278 14 71, +31 62 st_petersburg_student@aes.org E-mail detmold@aes.org
AT-1040 Vienna, Austria
127 36 51 SWEDEN Tel. +43 1 587 3478 Düsseldorf Section
Fax +31 79 352 10 08 Fax +43 1 587 3478 20 (Student)
E-mail netherlands@aes.org Swedish Section
Ingemar Ohlsson E-mail vienna_student@aes.org Ludwig Kugler
Netherlands Student Section AES Student Section
Audio Data Lab CZECH REPUBLIC Bilker Allee 126
Maurik van den Steen Katarinavägen 22
AES Student Section Czech Section DE-40217 Düsseldorf, Germany
SE-116 45 Stockholm, Sweden Tel. +49 211 3 36 80 38
Prins Willemstraat 26 Jiri Ocenasek
2584 HV Den Haag, Netherlands Tel. +46 8 644 58 65 E-mail
Dejvicka 36
Tel. +31 6 45702051 Fax +46 8 641 67 91 duesseldorf_student@aes.org
CZ-160 00 Prague 6, Czech
E-mail E-mail sweden@aes.org
Republic Ilmenau Section
netherlands_student@aes.org University of Luleå-Piteå Home Tel. +420 2 24324556 (Student)
NORWAY Section (Student) E-mail czech@aes.org Karlheinz Brandenburg
Lars Hallberg, Faculty Sponsor Faculty Advisor
Norwegian Section AES Student Section Czech Republic Student AES Student Section
Jan Erik Jensen University of Luleå-Piteå Section Fraunhofer Institute for Digital
Nøklesvingen 74 School of Music Libor Husník, Faculty Advisor Media Technology IDMT
NO-0689 Oslo, Norway Box 744 AES Student Section Langewiesener Str. 22
Office Tel. +47 22 24 07 52 S-94134 Piteå, Sweden Czech Technical Univ. at Prague DE-98693 Ilmenau, Germany
Home Tel. +47 22 26 36 13 Tel. +46 911 726 27 Technická 2, Tel. +49 3677 69 4340
Fax +47 22 24 28 06 Fax +46 911 727 10 CZ-116 27 Prague 6 E-mail ilmenau_student@aes.org
E-mail norway@aes.org E-mail lulea_pitea@aes.org Czech Republic
Tel. +420 2 2435 2115 North German Section
RUSSIA UNITED KINGDOM E-mail husnik@feld.cvut.cz Reinhard O. Sahr
All-Russian State Institute of British Section Eickhopskamp 3
GERMANY
Cinematography Section Heather Lane DE-30938 Burgwedel, Germany
(Student) Audio Engineering Society Aachen Section (Student) Tel. +49 5139 4978
Leonid Sheetov, Faculty Sponsor P. O. Box 645 Michael Vorländer Fax +49 5139 5977
AES Student Section Slough SL1 8BJ Faculty Advisor E-mail n_german@aes.org
All-Russian State Institute of United Kingdom Institut für Technische Akustik
Cinematography (VGIK) Tel. +44 1628 663725 RWTH Aachen South German Section
W. Pieck St. 3 Fax +44 1628 667002 Templergraben 55 Gerhard E. Picklapp
RU-129226 Moscow, Russia E-mail uk@aes.org D-52065 Aachen, Germany Landshuter Allee 162
Tel. +7 095 181 3868 Tel. +49 241 807985 DE-80637 Munich, Germany
Fax +7 095 187 7174 Fax +49 241 8888214 Tel. +49 89 15 16 17
E-mail all_russian_state@aes.org CENTRAL REGION, E-mail aachen@aes.org Fax +49 89 157 10 31
EUROPE E-mail s_german@aes.org
Moscow Section Berlin Section (Student)
Michael Lannie Vice President: Bernhard Güttler HUNGARY
Research Institute for Bozena Kostek Zionskirchstrasse 14
Television and Radio Multimedia Systems DE-10119 Berlin, Germany Hungarian Section
Acoustic Laboratory Department Tel. +49 30 4404 72 19 István Matók
12-79 Chernomorsky bulvar Gdansk University of Technology Fax +49 30 4405 39 03 Rona u. 102. II. 10
RU-113452 Moscow, Russia Ul. Narutowicza 11/12 E-mail berlin@aes.org HU-1149 Budapest, Hungary
Home Tel. +36 30 900 1802 Tel. +421 7 6478 0767 Tel. +385 1 6129 640 R. Paulo Renato 1, 2A
Fax +36 1 383 24 81 Fax. +421 7 6478 0042 Fax +385 1 6129 852 PT-2745-147 Linda-a-Velha
E-mail hungary@aes.org E-mail E-mail Portugal
slovakian_rep @aes.org croatian_student@aes.org Tel. +351 214145827
LITHUANIA E-mail portugal @ aes.org
Lithuanian Section SWITZERLAND FRANCE
Vytautas J. Stauskis ROMANIA
Swiss Section Conservatoire de Paris
Vilnius Gediminas Technical Joël Godel Section (Student) Romanian Section
University AES Swiss Section Alessandra Galleron Marcia Taiachin
Traku 1/26, Room 112 Sonnmattweg 6 36, Ave. Parmentier Radio Romania
LT-2001 Vilnius, Lithuania CH-5000 Aarau FR-75011 Paris, France 60-62 Grl. Berthelot St.
Tel. +370 5 262 91 78 Tel./Fax +41 26 670 2033 Tel. +33 1 43 38 15 94 RO-79756 Bucharest, Romania
Fax +370 5 261 91 44 Switzerland E-mail Paris_student@aes.org Tel. +40 1 303 12 07
E-mail lithuania@aes.org E-mail swiss@aes.org Fax +40 1 222 69 19
French Section E-mail romanian@aes.org
POLAND UKRAINE Michael Williams
Ile du Moulin SERBIA AND MONTENEGRO
Polish Section Ukrainian Section
Andrzej Dobrucki Dimitri Danyuk 62 bis Quai de l’Artois Serbia and Montenegro
Wroclaw University of 32-38 Artyoma St., Apt. 38 FR-94170 Le Perreux sur Section
Technology UA 04053 Kiev, Ukraine Marne, France Tomislav Stanojevic
Institute of Telecommunication E-mail ukrainian@aes.org Tel. +33 1 48 81 46 32 Sava centre
and Acoustics Fax +33 1 47 06 06 48 M. Popovica 9
Wybrzeze Wyspiannkiego 27 E-mail french@aes.org YU-11070 Belgrade, Yugoslavia
PL-50-370 Wroclaw, Poland SOUTHERN REGION, Tel. +381 11 311 1368Fax +381
Tel. +48 48 71 320 3068 EUROPE Louis Lumière Section 11 605 578
Fax +48 71 320 3189 (Student) E-mail
E-mail poland@aes.org Vice President: Alexandra Carr-Brown serbia_montenegro@aes.org
Ivan Stamac AES Student Section
Technical University of Gdansk Ivlje 4 Ecole Nationale Supérieure SLOVENIA
Section (Student) HR-10040 Zagreb, Croatia Louis Lumière
Pawel Zwan 7, allée du Promontoire, BP 22 Slovenian Section
Tel. +385 1 4822361 Tone Seliskar
AES Student Section FR-93161 Noisy Le Grand
Home Tel./Fax +385 1 4574403 Cedex, France RTV Slovenija
Technical University of Gdansk E-mail
Sound Engineering Dept. Tel. +33 6 18 57 84 41 Kolodvorska 2
ul. Narutowicza 11/12 vp_southern_europe@aes.org E-mail louis_lumiere@aes.org SI-1550 Ljubljana, Slovenia
PL-80 952 Gdansk, Poland Tel. +386 61 175 2708
BOSNIA-HERZEGOVINA Fax +386 61 175 2710
Home Tel. +48 58 347 23 98 GREECE
Office Tel. +4858 3471301 Bosnia-Herzegovina Section E-mail
Jozo Talajic Greek Section slovenian @ aes.org
Fax +48 58 3471114 Vassilis Tsakiris
E-mail gdansk_u @aes.org Bulevar Mese Selimovica 12
BA-71000 Sarajevo Crystal Audio SPAIN
Bosnia–Herzegovina Aiantos 3a Vrillissia
Wroclaw University of GR 15235 Athens, Greece Spanish Section
Technology Section (Student) Tel. +387 33 455 160 Juan Recio Morillas
Andrzej B. Dobrucki Fax +387 33 455 163 Tel. + 30 2 10 6134767 Spanish Section
Faculty Sponsor E-mail Fax + 30 2 10 6137010 C/Florencia 14 3oD
AES Student Section bosnia_herzegovina@aes.org E-mail greek@aes.org ES-28850 Torrejon de Ardoz
Institute of Telecommunications BULGARIA (Madrid), Spain
ISRAEL
and Acoustics Tel. +34 91 540 14 03
Wroclaw Univ.Technology Bulgarian Section Israel Section E-mail spanish @ aes.org
Wybrzeze Wyspianskiego 27 Konstantin D. Kounov Ben Bernfeld Jr.
PL-503 70 Wroclaw, Poland Bulgarian National Radio H. M. Acustica Ltd. TURKEY
Tel. +48 71 320 30 68 Technical Dept. 20G/5 Mashabim St. Turkish Section
Fax +48 71 320 31 89 4 Dragan Tzankov Blvd. IL-45201 Hod Hasharon, Israel Sorgun Akkor
E-mail BG-1040 Sofia, Bulgaria Tel./Fax +972 9 7444099 STD
wroclaw @ aes.org Tel. +359 2 65 93 37, +359 2 E-mail israel@aes.org Gazeteciler Sitesi, Yazarlar
9336 6 01 Sok. 19/6
REPUBLIC OF BELARUS Fax +359 2 963 1003 ITALY Esentepe 80300
E-mail Istanbul, Turkey
Belarus Section bulgarian @ aes.org Italian Section
Valery Shalatonin Carlo Perretta Tel. +90 212 2889825
Belarusian State University of CROATIA c/o AES Italian Section Fax +90 212 2889831
Informatics and Piazza Cantore 10 E-mail turkish@aes.org
Croatian Section
Radioelectronics Ivan Stamac IT-20134 Milan, Italy
vul. Petrusya Brouki 6 Ivlje 4 Tel. +39 338 9108768
BY-220027 Minsk Fax +39 02 58440640 LATIN AMERICAN REGION
HR-10040 Zagreb, Croatia
Republic of Belarus Tel. +385 1 4822361 E-mail italian@aes.org
Tel. +375 17 239 80 95 Home Tel./Fax +385 1 4574403 Vice President:
Fax +375 17 231 09 14 E-mail croatian@aes.org Italian Student Section Mercedes Onorato
E-mail Franco Grossi, Faculty Advisor Talcahuano 141
belarus @ aes.org Croatian Student Section AES Student Section Buenos Aires, Argentina
Hrvoje Domitrovic Viale San Daniele 29 Tel./Fax +5411 4 375 0116
SLOVAK REPUBLIC Faculty Advisor IT-33100 Udine, Italy E-mail
Slovakian Republic Section AES Student Section Tel. +39 0432227527 vp_latin_american@aes.org
Richard Varkonda Faculty of Electrical E-mail italian_student@aes.org
Engineering and Computing ARGENTINA
Centron Slovakia Ltd.
Dept. of Electroaocustics (X. Fl.) PORTUGAL
Podhaj 107 Argentina Section
SK-841 03 Bratislava Unska 3 Portugal Section German Olguin
Slovak Republic HR-10000 Zagreb, Croatia Rui Miguel Avelans Coelho Talcahuano 141
Buenos Aires, Argentina 1013 Tel./Fax +52 55 5240 1203 Fax +618 8 8384 3419 MALAYSIA
Tel./Fax +5411 4 375 0116 E-mail mexican @ aes.org E-mail adelaide@aes.org Malaysia Section
E-mail argentina@aes.org C. K. Ng
PERU
Brisbane Section
BRAZIL David Ringrose King Musical Industries
Orson Welles Institute Section AES Brisbane Section Sdn Bhd
Brazil Section (Student) P.O. Box 642 Lot 5, Jalan 13/2
José Carlos Giner Javier Antón Roma St. Post Office MY-46200 Kuala Lumpur
Rua Marechal Cantuária # 18 Av. Salaberry 3641, San Isidro Brisbane, Qld. AU-4003, Australia Malaysia
Urca-Rio de Janeiro Lima, Peru Office Tel. +61 7 3364 6510 Tel. +603 7956 1668
RJ-2291-060, Brazil Tel. +51 1 264 1773 E-mail brisbane@aes.org Fax +603 7955 4926
Tel. +55 21 2244 6530 Fax +51 1 264 1878 E-mail malaysia@aes.org
Fax +55 21 2244 7113 E-mail orsonwelles@aes.org Melbourne Section
E-mail brazil@aes.org Graham J. Haynes PHILIPPINES
PERU SECTION P.O. Box 5266
CHILE Wantirna South, Victoria Philippines Section
Armando Puente De La Vega AU-3152, Australia Dario (Dar) J. Quintos
Chile Section Av. Salaberry 3641 San Isidro Tel. +61 3 9887 3765 125 Regalia Park Tower
Andres Schmidt Lima, Peru Fax +61 3 9887 1688 P. Tuazon Blvd., Cubao
Hernan Cortes 2768 Tel. +51 1 264 1773 E-mail Quezon City, Philippines
Ñuñoa, Santiago de Chile Fax +51 1 264 1878 melbourne @ aes.org Tel./Fax +63 2 4211790, +63 2
Tel. +56 2 4249583 E-mail peru@aes.org 4211784
E-mail chile@aes.org Sydney Section E-mail philippines@aes.org
URUGUAY Howard Jones
COLOMBIA AES Sydney Section SINGAPORE
Uruguay Section P.O. Box 766
Colombia Section César Lamschtein Singapore Section
Sandra Carolina Hernandez Crows Nest, NSW AU-2065 Kenneth J. Delbridge
Universidad ORT Australia
CR 14 #87-25 Cuareim 1451 480B Upper East Coast Rd.
Bogotá, Colombia Tel. +61 2 9417 3200 Singapore 466518
Montevideo, Uruguay Fax +61 2 9417 3714
Tel. +57 1 622 1282 Tel. +59 1 902 1505 Tel. +65 9875 0877
Fax +57 1 629 7313 E-mail sydney@aes.org Fax +65 6220 0328
Fax +59 1 900 2952
E-mail colombia@aes.org E-mail uruguay@aes.org E-mail singapore@aes.org
HONG KONG
Javeriana University Section VENEZUELA
(Student) Hong Kong Section
Silvana Medrano Taller de Arte Sonoro, Henry Ma Chi Fai
Caracas Section (Student) HKAPA, School of Film and STUDENT DELEGATE
Carrera 7 #40-62
Bogota, Colombia Carmen Bell-Smythe de Leal Television ASSEMBLY
Tel./Fax +57 1 320 8320 Faculty Advisor 1 Gloucester Rd.
E-mail javeriana@aes.org AES Student Section Wanchai, Hong Kong
Tel. +852 2584 8824 NORTH/SOUTH
Taller de Arte Sonoro AMERICA REGIONS
Los Andes University Section Ave. Rio de Janeiro Fax +852 2588 1303
(Student) Qta. Tres Pinos E-mail Chair:
Jorge Oviedo Martinez Chuao, VE-1061 Caracas hong_kong @ aes.org Marie Desmarteau
Transversal 44 # 96-17 Venezuela McGill University Section
Bogota, Colombia Tel. +58 14 9292552 INDIA
(AES)
Tel./Fax +57 1 339 4949 ext. Tel./Fax +58 2 9937296 72 Delaware Avenue
2683 E-mail caracas@aes.org India Section
Avinash Oak Ottawa K2P 0Z3
E-mail losandes@aes.org Ontario, Canada
Venezuela Section Avisound
San Buenaventura University Elmar Leal Home Tel. +1 613 236 5411
A-20, Deepanjali Office Tel. +1 514 398 4535
Section (Student) Ave. Rio de Janeiro Shahaji Raje Marg
Nicolas Villamizar Qta. Tres Pinos E-mail
Vile Parle East tonmaestra@hotmail.com
Transversal 23 # 82-41 Apt. 703 Chuao, VE-1061 Caracas
Int.1 Venezuela Mumbai IN-400 057, India
Tel. +91 22 26827535 Vice Chair:
Bogota, Colombia Tel. +58 14 9292552 Felice Santos-Martin
Tel. +57 1 616 6593 Tel./Fax +58 2 9937296 E-mail
American River College (AES)
Fax +57 1 622 3123 E-mail venezuela@aes.org webmaster@aesindia.org Tel. +1 916 802 2084
E-mail sanbuenaventura@aes.org E-mail felicelazae@hotmail.com
JAPAN
ECUADOR INTERNATIONAL REGION
Japan Section
Ecuador Section Katsuya (Vic) Goh EUROPE/INTERNATIONAL
Juan Manuel Aguillo Vice President:
Neville Thiele 2-15-4 Tenjin-cho, Fujisawa-shi REGIONS
Av. La Prensa 4316 y Vaca de Kanagawa-ken 252-0814, Japan
Castro 10 Wycombe St.
Epping, NSW AU-2121, Tel./Fax +81 466 81 0681 Chair:
Quito, Ecuador E-mail aes_japan@aes.org Natalia Teplova
Tel./Fax +59 32 2598 889 Australia
Tel. +61 2 9876 2407 European Student Section
E-mail ecuador@aes.org KOREA Bratislavskaya Street 13-1-48
Fax +61 2 9876 2749
I.A.V.Q. Section (Student) E-mail vp_international@aes.org Moscow, RU 109 451, Russia
Korea Section Tel. +7 095 291 1532
Felipe Mardones Seong-Hoon Kang
315 Carrion y Plaza AUSTRALIA
Taejeon Health Science College Vice Chair:
Quito, Ecuador Adelaide Section Dept. of Broadcasting Martin Berggren
Tel./Fax +59 3 225 61221 David Murphy Technology European Student Section
E-mail iavq@aes.org Krix Loudspeakers 77-3 Gayang-dong Dong-gu Varvsgatan 35
MEXICO 14 Chapman Rd. Taejeon, Korea Arvika, SE 67133, Sweden
Hackham AU-5163 Tel. +82 42 630 5990 Home Tel. +46 0570 12018
Mexican Section South Australia Fax +82 42 628 1423 Office Tel. +46 0570 38500
Jorge Urbano Tel. +618 8 8384 3433 Fax +82 42 628 1423 Office Tel. +46 0570 38500
AES CONVENTIONS AND CONFERENCES
The latest details on the following events are posted on the AES Website: http://www.aes.org

116th Convention, Berlin, Germany
Date: 2004 May 8–11. Location: Messe Berlin, Berlin, Germany
Convention chair: Reinhard O. Sahr, Eickhopskamp 3, DE-30938 Burgwedel, Germany. Telephone: +49 5139 4978. Fax: +49 5139 5977. Email: 116th_chair@aes.org
Vice chair: Jörg Knothe, DeutschlandRadio. Email: 116th_vicechair@aes.org
Papers cochairs: Ben Bernfeld, Krozinger Str. 22, DE-79219 Staufen, Germany; Stephan Peus, Georg Neumann GmbH. Email: 116th_papers@aes.org
Exhibit information: Thierry Bergmans. Telephone: +32 2 345 7971. Fax: +32 2 345 3419. Email: 116th_exhibits@aes.org
Call for papers: Vol. 51, No. 7/8, pp. 768 (2003 July/August)
Convention preview: Vol. 52, No. 3, pp. 266–287 (2004 March)

25th International Conference, London, UK: "Metadata for Audio"
Date: 2004 June 17–19
Conference chair: John Grant, Nine Tiles Networks, Cambridge, UK. Email: 25th_chair@aes.org
Papers cochairs: Gerhard Stoll, IRT, Munich, Germany; Russell Mason, University of Surrey, Guildford, UK. Email: 25th_papers@aes.org
Call for papers: Vol. 51, No. 9, pp. 871 (2003 September)
Conference preview: This issue, pp. 402–411 (2004 April)

26th International Conference, Baarn, The Netherlands: "High-Quality Analog Audio Processing"
Date: 2004 October 1–3
Conference chair: Ben Kok, Dorsserblesgraaf, The Netherlands. Email: 26th_chair@aes.org
Papers cochairs: Tom Magchielse, Consultant, The Netherlands; Peter van Willenswaard, Audiomagic, The Netherlands. Email: 26th_papers@aes.org
Call for papers: This issue, pp. 457 (2004 April)

117th Convention, San Francisco, CA, USA
Date: 2004 October 28–31. Location: Moscone Center, San Francisco, CA, USA
Convention chair: John Strawn, S Systems, 15 Willow Avenue, Larkspur, CA 94939, USA. Telephone: +1 415 927 8856. Fax: +1 415 927 2935. Email: 117th_chair@aes.org
Papers cochairs: Brian Link, Dolby Laboratories; Rob Maher, Montana State University-Bozeman. Email: 117th_papers@aes.org
Exhibit information: Chris Plunkett/Donna Vivero. Telephone: +1 212 661 8528, ext. 30. Fax: +1 212 682 0477. Email: 117th_exhibits@aes.org
Call for papers: Vol. 52, No. 3, pp. 319 (2004 March)

All of the papers from AES conventions and conferences through 2003 are available on the 20-disk AES Electronic Library. The 2003 update disks for the Electronic Library are now available. For price and ordering information go to www.aes.org, send email to Andy Veloz at aav@aes.org, or call any AES office at +1 212 661 8528, ext. 39 (USA), +44 1628 663725 (UK), or +33 1 4881 4632 (Europe).

Reports of recent AES conventions and conferences are now available online; go to www.aes.org/events/reports.

INFORMATION FOR AUTHORS

Presentation
Manuscripts submitted should be typewritten on one side of ISO size A4 (210 x 297 mm) or 216-mm x 280-mm (8.5-inch x 11-inch) paper with 40-mm (1.5-inch) margins. All copies including abstract, text, references, figure captions, and tables should be double-spaced. Pages should be numbered consecutively. Authors should submit an original plus two copies of text and illustrations.

Review
Manuscripts are reviewed anonymously by members of the review board. After the reviewers' analysis and recommendation to the editors, the author is advised of either acceptance or rejection. On the basis of the reviewers' comments, the editor may request that the author make certain revisions which will allow the paper to be accepted for publication.

Content
Technical articles should be informative and well organized. They should cite original work or review previous work, giving proper credit. Results of actual experiments or research should be included. The Journal cannot accept unsubstantiated or commercial statements.

Organization
An informative and self-contained abstract of about 60 words must be provided. The manuscript should develop the main point, beginning with an introduction and ending with a summary or conclusion. Illustrations must have informative captions and must be referred to in the text.
References should be cited numerically in brackets in order of appearance in the text. Footnotes should be avoided, when possible, by making parenthetical remarks in the text.
Mathematical symbols, abbreviations, acronyms, etc., which may not be familiar to readers must be spelled out or defined the first time they are cited in the text.
Subheads are appropriate and should be inserted where necessary. Paragraph division numbers should be of the form 0 (only for introduction), 1, 1.1, 1.1.1, 2, 2.1, 2.1.1, etc.
References should be typed on a manuscript page at the end of the text in order of appearance. References to periodicals should include the authors' names, title of article, periodical title, volume, page numbers, year and month of publication. Book references should contain the names of the authors, title of book, edition (if other than first), name and location of publisher, publication year, and page numbers. References to AES convention preprints should be replaced with Journal publication citations if the preprint has been published.

Illustrations
Figure captions should be typed on a separate sheet following the references. Captions should be concise. All figures should be labeled with author's name and figure number.
Photographs should be black and white prints without a halftone screen, preferably 200 mm x 250 mm (8 inch by 10 inch).
Line drawings (graphs or sketches) can be original drawings on white paper, or high-quality photographic reproductions.
The size of illustrations when printed in the Journal is usually 82 mm (3.25 inches) wide, although 170 mm (6.75 inches) wide can be used if required. Letters on original illustrations (before reduction) must be large enough so that the smallest letters are at least 1.5 mm (1/16 inch) high when the illustrations are reduced to one of the above widths. If possible, letters on all original illustrations should be the same size.

Units and Symbols
Metric units according to the System of International Units (SI) should be used. For more details, see G. F. Montgomery, "Metric Review," JAES, Vol. 32, No. 11, pp. 890–893 (1984 Nov.) and J. G. McKnight, "Quantities, Units, Letter Symbols, and Abbreviations," JAES, Vol. 24, No. 1, pp. 40, 42, 44 (1976 Jan./Feb.). Following are some frequently used SI units and their symbols, some non-SI units that may be used with SI units (▲), and some non-SI units that are deprecated (■).

Unit Name  Unit Symbol
ampere  A
bit or bits  spell out
bytes  spell out
decibel  dB
degree (plane angle) (▲)  °
farad  F
gauss (■)  Gs
gram  g
henry  H
hertz  Hz
hour (▲)  h
inch (■)  in
joule  J
kelvin  K
kilohertz  kHz
kilohm  kΩ
liter (▲)  l, L
megahertz  MHz
meter  m
microfarad  µF
micrometer  µm
microsecond  µs
milliampere  mA
millihenry  mH
millimeter  mm
millivolt  mV
minute (time) (▲)  min
minute (plane angle) (▲)  ’
nanosecond  ns
oersted (■)  Oe
ohm  Ω
pascal  Pa
picofarad  pF
second (time)  s
second (plane angle) (▲)  ”
siemens  S
tesla  T
volt  V
watt  W
weber  Wb
AES sustaining member organizations
