
TECHNOLOGY TRENDS

Technology Trends
in Audio Engineering
A report by the AES Technical Council

INTRODUCTION

Technical Committees are centers of technical expertise within the AES. Coordinated by the AES Technical Council, these committees track trends in audio in order to recommend to the Society papers, workshops, tutorials, master classes, standards, projects, publications, conferences, and awards in their fields. The Technical Council serves the role of the CTO for the society. Currently there are 23 such groups of specialists within the council. Each consists of members from diverse backgrounds, countries, companies, and interests. The committees strive to foster wide-ranging points of view and approaches to technology. Please go to: http://www.aes.org/technical/ to learn more about the activities of each committee and to inquire about membership. Membership is open to all AES members as well as those with a professional interest in each field.

Technical Committee meetings and informal discussions held during regular conventions serve to identify the most current and upcoming issues in the specific technical domains concerning our Society. The TC meetings are open to all convention registrants. With the addition of an internet-based Virtual Office, committee members can conduct business at any time and from any place in the world.

One of the functions of the Technical Council and its committees is to track new, important research and technology trends in audio and report them to the Board of Governors and the Society’s membership. This information helps the governing bodies of the AES to focus on items of high priority. Supplying this information puts our technical expertise to a greater use for the Society. In the following pages you will find an edited compilation of the reports recently provided by many of the Technical Committees.

Francis Rumsey
Chair, AES Technical Council
Bob Schulein, Jürgen Herre, Michael Kelly
Vice Chairs

ARCHIVING, RESTORATION,
AND DIGITAL LIBRARIES
David Ackerman, Chair
Chris Lacinak, Vice Chair

Practical observations

Broadcast Wave File (BWF) format has become the de facto standard for preservation of audio content within the field, as has a digital audio resolution of 24 bit/96 kHz. Time-based metadata is also of particular interest, including time-stamped descriptive metadata and closed captions. Manufacturers have begun to enable preservation activities through additional metadata capabilities and support for open formats.

Sound for moving image is somewhat in limbo, currently being grouped with moving image preservation for the most part. Preservation of sound for moving image is a current focus for future attention of this committee. Moving image and sound preservation graduate programs are emerging throughout the world to support those who oversee and manage moving image and sound archives. This is an acknowledgment of the differing skill set from traditional paper and still image archivists.

IT and programming skills are an ever-growing need in the fulfilment of preservation. This is an emerging required understanding / skill for audio engineers. Requirements and specifications for digital repositories serving preservation and access roles are currently in development.

Selected significant projects and initiatives

The 131st AES Convention featured an archiving track that was well attended. We believe archiving will continue to grow as an area of interest to AES members.

In addition, the Technical Committee on Audio Recording and Mastering Systems has completed a study on the persistence and interoperability of metadata in WAV files, while Indiana University published “Meeting the challenge of media preservation; strategies and solutions.”

The following standards activities recently took place:

AES60-2011 AES standard for audio metadata - Core audio metadata was published September 22, 2011.

AES57-2011 AES standard for audio metadata - Audio object structures for preservation and restoration was published September 21, 2011.

AES SC-07-01 Working Group on audio metadata was formed this October. This group continues the work to complete AES-X98C, metadata for process history of audio objects.

AES SC-03 was retired this October.

The Federal Agencies Audio Visual Digitization Working Group (digitizationguidelines.gov) is investigating audio system evaluation tools for evaluating the performance of analog to digital converters and for detecting interstitial errors.

The Indiana University Archives of Traditional Music (ATM) and the Archive of World Music (AWM) at Harvard University have received a grant from the National Endowment for the Humanities to undertake a joint technical archiving project, a collaborative research and development initiative with tangible end results that will create best practices and test emerging standards for digital preservation of archival audio. This is known as Sound Directions.

The National Recording Preservation Board, mandated by the National Recording Preservation Act of 2000, is an advisory group bringing together a number of professional organizations and expert individuals concerned with the preservation of recorded sound. The group has published a report from the engineers’ roundtable (CLIR).

The National Digital Information Infrastructure and Preservation Program (NDIIPP) has the mission to develop a national strategy to collect, archive, and preserve the burgeoning amounts of digital content, especially materials that are created only in digital formats, for current and future generations.

Presto Center is a European effort to push the limits of current technology beyond the state of the art, bringing together industry, research institutes, and stakeholders to provide products and services for bringing effective automated preservation and access to Europe’s diverse audiovisual collections.

90 J. Audio Eng. Soc., Vol. 60, No. 1/2, 2012 January/February
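As an illustration of the 24 bit/96 kHz preservation target mentioned under Practical observations, the following is a minimal sketch that reads the basic format header of a WAV file with Python's standard `wave` module and compares it with that target. The function names, the target constants, and the silent demo file are assumptions made for this example. The sketch does not read the BWF `bext` metadata chunk, and it is not a tool described in the report.

```python
import os
import tempfile
import wave

# Assumed preservation target, taken from the 24 bit/96 kHz figure in the text.
TARGET_SAMPLE_WIDTH_BYTES = 3   # 24 bits per sample
TARGET_SAMPLE_RATE_HZ = 96_000


def describe_wav(path):
    """Return basic PCM parameters read from a WAV file header."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "sample_rate_hz": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def meets_target(info):
    """True if the file matches the assumed 24 bit/96 kHz target."""
    return (info["bits_per_sample"] == TARGET_SAMPLE_WIDTH_BYTES * 8
            and info["sample_rate_hz"] == TARGET_SAMPLE_RATE_HZ)


def write_silent_wav(path, seconds=1, channels=2):
    """Write a short silent 24 bit/96 kHz file to act as demo material."""
    frames = TARGET_SAMPLE_RATE_HZ * seconds
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(TARGET_SAMPLE_WIDTH_BYTES)
        wav.setframerate(TARGET_SAMPLE_RATE_HZ)
        # bytes(n) yields n zero bytes, i.e. digital silence.
        wav.writeframes(bytes(frames * channels * TARGET_SAMPLE_WIDTH_BYTES))


# Demo on a temporary file (hypothetical material, not a real archive item).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.wav")
    write_silent_wav(demo_path)
    demo_info = describe_wav(demo_path)

print(demo_info, meets_target(demo_info))
```

A repository ingest step could run a check of this kind over incoming files, although a real workflow would also need to validate the embedded metadata that the standards listed above address.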

AUDIO FOR GAMES


Michael Kelly and Steve Martz, Chairs
Kazutaka Someya, Vice Chair

Emerging trends in audio for games are driven by continuing advances in game technology and the diversity of devices and operating systems that are now considered gaming devices. Trends are summarized under the headings below.

A general move from hardware to software processing

Audio DSP is now performed in software on CPUs or programmable DSP processors. Even on lower-power platforms there is a move away from dedicated audio chips and memory although exceptions still exist.

Game platforms are diversifying

Console platforms are very dominant in large budget titles and a lot of memory and DSP is leveraged for audio on these platforms. Consoles remain a major driver in game-audio trends and games often target high-end consumer playback environments. Portable platforms, particularly iOS and Android devices, now also account for a large portion of gameplay and present new constraints, development approaches, and creative styles. Production methodologies for console and mobile gaming will increasingly merge as handheld devices become more powerful. More recently, cloud gaming is demonstrating itself as a viable platform and offers new challenges including potential latency and network delivery issues.

Peripherals and interaction

Social gaming and new platforms offer new ways to interact with games using handheld devices (e.g., Wii Remote, PlayStation Move Controller), touch screens (e.g., iOS and Android devices) and non-contact technology (e.g., Microsoft Kinect, PlayStation Eye). These are able to track player position or gestures and are beginning to find useful applications in game-audio. 3-D video is yet to demonstrate a new counterpart in audio.

Spatial audio

Console games are largely geared around 5.1 and 7.1 playback or legacy formats. Some commercial games (e.g., Race Driver: Grid) are now making use of Ambisonics. Portable platforms are generally targeted at headphone playback or device speaker playback, although many tablets are equipped with other methods such as HDMI outputs. There is a general trend toward scalability and adaptation to the consumer’s configuration, particularly as the line between console and portable platforms becomes blurred. The driver for spatial audio formats largely comes from outside the games industry and future conventions include features such as height-channels to augment current multichannel setups.

Audio input

Speech input is now used in a number of games and devices for character control or player-to-player communication. Speech analysis and processing is a key research area in game-audio. Analysis of singing and research in this area has been applied in a number of leading console game titles. Rhythm-based games (e.g., Rock Band, Guitar Hero) make use of varying degrees of instrument-style peripherals such as guitar controllers, piano keyboards, virtual drum kits; as well as motion controllers and touch screens. New technologies, such as those used in games like Rocksmith, permit the use of real instruments as game controllers.

DSP plugins and codecs

A move into software has made it possible for developers to write their own DSP plugins for use in games. There has been an increase in third-party companies providing DSP algorithms for licensing by game developers. Solutions often involve platform-specific optimized codecs and DSP for use in-game as well as PC versions in the form of VST plugins or similar for authoring. There has been a growth in use of algorithms such as convolution reverb and efforts to further R&D in improving audio DSP for use in game. There has also been a strong trend toward returning to synthesized sound in-game; this is partially driven by resource requirements of portable platforms, but also by the potential flexibility of synthesized sound as new R&D can provide improved quality for appropriate sounds. As well as low-level DSP, higher level systems such as intelligent or automatic mixing technologies are being used in games like the Battlefield series.

Tools and workflow

A number of studios now have extremely sophisticated tools for game audio content authoring, either developed in-house or licensed as middleware. Tools generally remain specific to the game domain. There are an increasing number of attempts through standards groups like the IASIG to increase interoperability between linear audio tools and game-tools.
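To make the convolution reverb mentioned under DSP plugins and codecs more concrete, here is a minimal sketch of the underlying operation: the dry signal is convolved with a room impulse response and blended with the original. The function names, the `wet_mix` parameter, and the toy data are assumptions for illustration only. Production implementations typically use FFT-based (often partitioned) convolution for efficiency rather than the direct form shown here.

```python
def convolve(dry, impulse_response):
    """Direct-form linear convolution of two lists of samples."""
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, x in enumerate(dry):
        if x == 0.0:
            continue  # skip silent samples; they contribute nothing
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out


def apply_reverb(dry, impulse_response, wet_mix=0.3):
    """Blend the dry signal with its convolved ("wet") version."""
    wet = convolve(dry, impulse_response)
    # The wet signal is longer than the dry one (reverb tail), so pad the dry.
    padded_dry = dry + [0.0] * (len(wet) - len(dry))
    return [(1.0 - wet_mix) * d + wet_mix * w
            for d, w in zip(padded_dry, wet)]


# Hypothetical toy data: a single click and a 4-tap "room" response.
click = [1.0, 0.0, 0.0, 0.0]
room = [1.0, 0.0, 0.5, 0.25]

wet_only = convolve(click, room)
mixed = apply_reverb(click, room, wet_mix=0.5)
print(wet_only)
print(mixed)
```

Convolving a unit click simply reproduces the impulse response, which is a convenient sanity check when testing such a routine.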


Education and standards

Standards activity continues in the games industry and becomes more relevant as the industry matures. Current standards activity includes: interoperable file formats, digital audio workstation design, and loudness levels in games. There has been recent growth in the number of educational institutions that offer game audio courses and interest from academia continues to grow. This is an important step as informed game audio programmers are still in short supply. The IASIG recently introduced game audio curriculum guidelines for interested institutions. Research from academia is also directly impacting game development and many titles feature the results of collaboration between academia and industry.

AUDIO FOR
TELECOMMUNICATIONS
Bob Zurek, Chair
Antti Kelloniemi, Vice Chair

The trend in mobile telecommunications has been toward moving advanced features down in price point to feature phones and using the more advanced mobile devices as personal computing and multimedia capture and playback devices. The typical feature phone today exhibits all of the characteristics of a top-of-the-line device of a few years ago with both private mode and hands-free audio, multiple microphones with advanced noise reduction capabilities, and Bluetooth, allowing a low-end feature phone to serve as the center of a personal communications network.

Wideband audio communications has been rolled out in many countries over both cellular and wireless VOIP (voice over internet protocol), doubling the audio bandwidth used in speech communications. Multiple VOIP clients are available for download on the major mobile operating systems and many devices come with at least one VOIP client preinstalled.

The last few years have shown smartphones and tablet devices becoming a larger percentage of the total mobile telecommunications devices. They are no longer the niche devices of the mid to late part of the last decade. The move to common operating systems with thousands of applications allows the user to customize their device in ways not possible a few years ago. The downloadable application environments of the major mobile operating systems have allowed different users to take the same hardware and customize it into very diverse devices to suit their needs, from business-oriented devices, to media and gaming devices, even as far as using the device as a configurable piece of test equipment.

The integration of sensing capabilities such as accelerometers, gyroscopes, light and infrared sensors into devices has allowed not only manufacturers but also creators of applications the ability to create more natural human interfaces to the device and has allowed the device to more accurately detect the environment that it is in. This allows the device to adapt its operation to best function in any environment, whether the device is being used for multimedia playback, communications, or computing. Voice control of communications devices has progressed to the point where networked voice recognition allows the use of natural language with larger vocabularies than previously possible on a standalone device.

Many people have replaced several individual pieces of mobile electronics with their portable communication device over the last few years. Integration of high quality optics has led to the replacement of still and video capture devices for some. Current devices are capable of both multi-megapixel still photography and HD video capture. Some of the devices feature multichannel audio capture capabilities. The combination of GPS and network connectivity has allowed the portable communications devices to become personal navigation devices with nearly continuous map updates and real time traffic information.

The enhanced processing capabilities of separate application processors coupled with the over-the-air download of applications have led to the use of portable communications devices for office productivity, multimedia playback, and authoring, as well as gaming, all in a single device.

Current 3G and 4G data rates allow the mobile devices to operate with bandwidths comparable to home-based high speed internet. This has led to the use of wireless devices as wi-fi hubs for a network of devices requiring internet access such as personal computers, gaming systems, automobiles, and televisions.

Many of the advances in handsets of a few years ago have migrated to the edge of the personal network, allowing headphones, headsets, and car-kits to achieve handset levels of uplink voice quality. Consumers can upgrade the call quality of older devices by adding new Bluetooth headsets or car kits that contain many of the same noise adaptive algorithms found in much newer devices. This includes both noise adaptive downlink and advanced noise and echo suppression in the uplink signal.

The move toward using the mobile device as a user’s main device for communication, computing, and media playback has led to the creation of a number of multimedia docks, computing docks, and accessories for the devices. In many cases the portable communication device can serve as the hub for the home multimedia system, when paired to or placed in docking systems connected to the home audio video system. It is not uncommon for smartphones and tablets to have HDMI output for media playback on HDMI compatible monitors or sound systems. The creation of Bluetooth mice and keyboards, and laptop docks, often in conjunction with HDMI video output, has allowed the user to quickly and effortlessly transition from using the communications device as a portable phone to using it as a home computer.

Software updating of not only the applications but also the operating systems allows the devices to grow in capability after purchase much as personal computers have in the past. No longer is a customer forced to live with the limitations that a device is shipped with for the life of the device or service provider contract. As new features are developed and integrated into operating systems, as long as the hardware still supports the new functionality, a user of a year-old device can update to many of the features being released in the latest devices.

Over the next few years, the rapid growth in capabilities of portable communication devices tied with ever-expanding application environments will allow portable communications devices to evolve into tools unimaginable a few short years ago.


AUDIO FORENSICS
Jeff M. Smith, Chair
Christopher Peltier, Vice Chair
Eddy Bøgh Brixen, Vice Chair

Enhancement

The enhancement of forensic audio recordings remains the most common task for forensic audio practitioners. The goal of forensic audio enhancement is to increase intelligibility of voice information or improve the signal to noise ratio of a target signal by reducing the effects or interferences that mask it. Many tools are available through various software developers with the most common being noise reduction, either adaptive or linear. Difficulties in this area are caused by lossy data compression common to small digital recorders, data compression and bandwidth-limited signals in telecommunications, and non-ideal recording environments common to surveillance and security. One growing area of research is the assessment of speech intelligibility with multiple papers presented on the topic at the AES 39th Conference on Audio Forensics in 2010.

Authentication

The majority of audio media presented to the forensic examiner are digital recordings on optical disc, HDD, flash memory, and solid-state recorders. However, the analysis of analog tape cassettes and microcassettes is still required of examiners. In the area of forensic media authentication, digitally recorded audio files may be subject to various kinds of manipulation that are harder to detect than those in the analog domain. This leaves the forensic audio examiner with new challenges regarding the authentication of these recordings. Many new techniques have been developed in recent years for use in these analyses. These techniques continue to be published and presented through the AES Journal and proceedings of AES Conferences and Conventions. Among these techniques is the analysis of the Electric Network Frequency (ENF) component of a recording. If present, the remains of the ENF may be compared to a database of ENF from the same grid to authenticate the date and time the recording was made. In addition to automatic database comparison, it is possible to learn several other things from ENF analysis including whether portions of the recording were removed, if an audio recording was digitized multiple times, how the recorder was powered, and more.

Recent developments in digital audio authentication also include the Compression Level Analysis of an audio recording to determine if an uncompressed file had been previously subject to data compression or if the compression level present is consistent with an authentic recording. Also, a technique for determining the presence of butt-splice edits has been presented. In the digital domain, as in the analog, auditory and spectral acoustic analysis continues to be necessary. However, it is also clear that analysis of the digital data that makes up a recorded audio file, including its header and file structure, must be exploited to ascertain a digital recording’s authenticity.

Speech and speaker analysis

The analysis of speech and speakers present on audio recordings is a large domain that intersects many industries including forensics and security. The analysis of speakers present in recordings to ascertain identity continues to be a common request of forensic audio examiners. However, “identifying” persons in a 1:1 comparison is not supported within the scientific community, which favors “recognition” of persons based on extracted features relative to a background model representing a population of speakers. Automatic systems based on cepstral coefficients, Gaussian Mixture Modeling, and likelihood ratios employ robust and validated techniques for speaker recognition. This quantitative approach better measures and takes into account intra- and inter-speaker variation. When used in a forensic environment where trained examiners base conclusions on likelihood ratios, this technique is valued greatly over other qualitative analyses.

The capability of a system to process multitudes of audio signals and sort them based on language, topic, speakers present, and acoustic environment continues to progress with many new advances. An interesting area of research and its application in audio forensics is Computational Auditory Scene Analysis (CASA). This field of audio processing is interested in developing machine systems that perform automatic signal separation using principles derived from the human auditory system’s perceptual abilities to understand audio scenes. CASA systems have already proven very useful as preprocessors for automatic speech recognition systems and in hearing aids. New areas of study include their use in audio forensics. Also, automatic speaker segmentation based on extracted spectral features and statistical modeling can help automated systems tasked with speech and speaker recognition.

Other considerations

Since the fundamental aspect of forensic audio is its application to law, with the litigation process benefitting from audio enhancement and analysis, it is important for the practitioner working with forensic audio to be aware of this process and the need for proper evidence handling and laboratory procedures. As digital audio proliferates so too has the identification of proper practices for imaging media, hashing file duplicates, and recovering and/or repairing corrupt or carved files. Additionally, it is not only common for forensic audio to be played in a courtroom but for typed transcripts of recorded conversations to be prepared for the individuals involved in a case: the lawyers, judge(s), and/or jury. Specific to these needs, there are developments in addressing the inherent bias present in the human preparation of these transcripts. Also, the forensic audio practitioner must be aware of the audio samples being presented, taking into consideration courtroom acoustics, psychoacoustics, and the hearing abilities of these individuals.

AES activities

Numerous papers on audio forensics appear in the Journal of AES and are presented at AES conventions each year. Additionally, there have been three AES conferences on audio forensics since 2005 (AES 26th, 33rd, and 39th) and the next will be in Denver, CO in 2012. Additionally, regular workshops and tutorials appear at AES conventions. At the AES 130th Convention in London there was a tutorial on forensic audio enhancement, and at the AES 131st Convention in New York there was a workshop on forensic audio authentication.
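The ENF analysis described under Authentication rests on tracking small deviations of the mains hum around its nominal frequency over time. The following is a minimal sketch of that first step only, assuming a 50 Hz grid, a recording already reduced to a low sample rate, and a brute-force frequency search. All function names, the frame length, and the synthetic test signal are assumptions for illustration. Comparing the resulting track with a grid database, as the text describes, is not shown, and practical systems use more refined estimators.

```python
import cmath
import math


def tone_power(samples, sample_rate, freq):
    """Power at `freq`, by correlating with a complex exponential."""
    step = -2j * math.pi * freq / sample_rate
    acc = sum(x * cmath.exp(step * n) for n, x in enumerate(samples))
    return abs(acc) ** 2


def estimate_enf(samples, sample_rate, nominal=50.0, span=0.5, resolution=0.01):
    """Return the frequency in nominal +/- span that carries the most power."""
    steps = int(round(2 * span / resolution))
    candidates = [nominal - span + i * resolution for i in range(steps + 1)]
    return max(candidates, key=lambda f: tone_power(samples, sample_rate, f))


def enf_track(samples, sample_rate, frame_seconds=2.0, **kwargs):
    """Estimate the hum frequency for each consecutive frame."""
    frame_len = int(frame_seconds * sample_rate)
    return [
        estimate_enf(samples[start:start + frame_len], sample_rate, **kwargs)
        for start in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def synth_hum(freq, seconds, sample_rate):
    """Synthetic mains hum used as stand-in material (not real evidence)."""
    count = int(seconds * sample_rate)
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(count)]


# Hypothetical demo: hum drifting from 50.03 Hz to 49.96 Hz.
SAMPLE_RATE = 1000
recording = synth_hum(50.03, 2.0, SAMPLE_RATE) + synth_hum(49.96, 2.0, SAMPLE_RATE)
track = enf_track(recording, SAMPLE_RATE)
print([round(f, 2) for f in track])
```

A sequence of such per-frame estimates is what would be matched against reference data from the same grid; discontinuities in the track are one possible indicator that material was removed.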


AUDIO RECORDING
AND MASTERING SYSTEMS
Kimio Hamasaki, Chair
Toru Kamekawa and Andres Mayo, Vice Chairs

The growth of multichannel audio recording and production is the most remarkable trend in audio recording and mastering systems. Recording and mastering using high resolution audio technology is also a notable trend in this area.

While 5.1 multichannel sound is widely applied in audio recording and mastering, audio recording and mastering using advanced multichannel sound formats such as 7.1, 9.1, and more channels has been increasing. Higher sampling frequencies such as 96 kHz and 192 kHz are also widely applied in audio recording and mastering. Most recording systems can now work at these higher sampling rates, including in some cases the very high rate used by DSD (Direct Stream Digital) systems. DXD (Digital eXtreme Definition), which samples multi-bit PCM at 352.8 kHz, is a new trend for digital recording. A-to-D converters and D-to-A converters for DXD are available, and some DAWs can record and edit DXD.

The digital audio workstation (DAW) is the principal tool for editing and mixing. Mixing consoles are becoming an interface for the DAW. Physical control surfaces for mixing are sometimes not used, but instead a virtual control surface on a PC display is often used for recording and mastering. DAWs use hard disks for storage, and music recording and mastering studios also intend to use server-based storage systems for recordings. Network attached storage (NAS) is widely used for audio recording. While removable hard disks have been widely used for audio recording and mastering, there is still no internationally standardized removable hard disk drive.

MADI (Multichannel Audio Digital Interface) has been gaining popularity in recording systems because multichannel sound recordings need many channels compared with 2-channel stereo recording. Stage boxes equipped with multichannel microphone preamps and A-to-D converters are now available with MADI output. Use of digital microphones according to the AES 42 standard is also gradually expanding in audio recording, and mixing consoles and stage boxes equipped with AES 42 I/O are now available. IP networking is very often used in audio recording and mastering. Growth of IP networking, especially considering the increase of data transfer rates, is essential for the improvement of recording and mastering systems.

It is common to use DSP (digital signal processing) in recording and mastering systems. A new trend can be seen in the application of FPGAs (field programmable gate arrays) instead of DSP, and DAWs working on FPGA are already available. A remarkable trend in mastering systems is the development of new plug-in audio processing software for mixing and mastering. DAWs equipped with plug-in audio processing software are widely used for audio production and can be purchased quite inexpensively. The availability of such DAWs has been changing the nature of music productions.
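The pressure that higher sampling rates and channel counts put on storage and networking can be seen with some simple arithmetic. The sketch below compares raw PCM data rates for the sampling rates named above. The 24-bit word length and the six-channel (5.1) layout are assumptions chosen for the example, not figures from this report, and the helper names are hypothetical.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Raw (uncompressed) PCM data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def gigabytes_per_hour(bit_rate):
    """Storage needed for one hour at the given bit rate (1 GB = 1e9 bytes)."""
    return bit_rate / 8 * 3600 / 1e9


# Sampling rates named in the text; DXD's 352.8 kHz equals 8 x 44.1 kHz.
SAMPLE_RATES_HZ = {
    "96 kHz": 96_000,
    "192 kHz": 192_000,
    "DXD 352.8 kHz": 352_800,
}

ASSUMED_BITS = 24      # assumption: 24-bit samples
ASSUMED_CHANNELS = 6   # assumption: a 5.1 recording

table = {
    name: pcm_bit_rate(rate, ASSUMED_BITS, ASSUMED_CHANNELS)
    for name, rate in SAMPLE_RATES_HZ.items()
}

for name, rate in table.items():
    print(f"{name}: {rate / 1e6:.1f} Mbit/s, "
          f"{gigabytes_per_hour(rate):.2f} GB per hour")
```

Under these assumptions a six-channel DXD session needs more than three times the throughput of the same session at 96 kHz, which is one way to read the report's emphasis on data transfer rates.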

AUTOMOTIVE AUDIO
Richard Stroud, Chair
Tim Nind, Vice Chair

Vehicles with built-in internet capability (via MP3 sources via the typically included USB mass reduction, have recently become
3G, etc.) could present numerous music and connection. SSDs (solid state devices) will much more expensive due to neodymium
talk selections at higher quality than most replace hard drives as preferred storage cost increases. Some reports indicate
other data-reduced sources. At least one when cost permits. Premium receivers are increases of as much as eight times their
OEM is working on personal audio to allow beginning to appear that do not include CD former prices. Vendors of smaller speakers
people to have the same data and source players. Increasingly larger USB drives are were offering neodymium magnet speakers
material that they have at home in the car. becoming a primary music storage medium, at prices similar to those of ferrite magnet
Connectivity may be based on the user’s along with Bluetooth-connected cell phones speakers but are struggling to do so at pres-
mobile phone. Some OEM’s are considering with their music libraries. Download of MP3 ent. Having a strong set of specifications
using a dedicated server to control quality. files into vehicles by home-based RF (radio will insure that sensitivity, Xmax and other
There is an interest in providing sounds frequency) links has been introduced. parameters are maintained in these speak-
for very quiet cars such as electric vehicles. Objective measurement is still battling ers. Planar style speakers are now found in
These include “engine start” and “engine subjective listening tests as a final authority vehicles. These are not totally flat, but have
running” sounds for inside the vehicle and for OEMs. SPL vs. distortion measurements profiles of 10 mm or less. Some examples
pedestrian safety sounds for outside the are quite good now, and directionally cor- have shown very low sensitivity.
vehicle. rect frequency response measurements are HD radio components are now for sale.
Hard disk drives are now used in pre- improving. Spatial measurement capability AM HD radio offers much higher fidelity
mium audio systems. These disks tend to be is being developed and evaluated. and FM HD offers additional program
smaller than state-of-art home disk drives The trend toward higher performance sources. Because of the fidelity difference
because of vibration requirements (40 to 80 audio systems is in direct conflict with on AM, rapid switching in fringe areas must
Gbyte drives are becoming available, and recent trends of cost and weight reduction be carefully managed.
larger drives are expected soon). Disk drive of components in automobiles. Increased There are an increasing number of center
usages include navigation data and music. application of neodymium magnets may speakers appearing in prestige class auto-
Systems allow storing of many CDs from help here. Neodymium magnet speakers, motive system designs. Speakers have also
on-board readers and music from available once attractive as an affordable means of appeared in the tops of front seats. “Sur-

94 J. Audio Eng. Soc., Vol. 60, No. 1/2, 2012 January/February


TECHNOLOGY TRENDS

round sound” is becoming mandatory in high-end automotive systems even when the source is limited to two channels (so this is implemented using upmix algorithms). Some listeners sense that some surround systems provide limited envelopment on both stereo and much “surround” source material.

There is almost universal branding of audio systems in luxury cars, and newer brands are emerging. The maximum number of speakers used in luxury vehicle systems seems to be leveling out at 18±2. Aftermarket audio now represents a very small part of the automotive audio market. There are still parts of the world where 5.1 and high-level premium audio are not featured in most vehicles’ audio line-ups. These systems can perhaps take advantage of inexpensive, powerful audio DSP systems to improve performance. Rear seat audio performance may be important in China and other countries, as some who can afford automobiles can also afford drivers.

Voice recognition systems for telephone and navigation functions are becoming more sophisticated and enjoy wider application. Automatic equalization is being offered for audio system tuning. Use of such automatic systems can significantly speed the tuning process but may not be ready to completely replace tuning for on-road performance by trained listeners. Active noise cancellation by the audio system is being used for exhaust drone under condition of cylinder deactivation. Active road noise and/or level compensation now enjoy a widespread market presence. Basic versions are available in many OEM head units while some high-end premium systems have more sophisticated implementations. Simple systems use the speedometer signal to apply predefined loudness curves. Others use microphones to measure the current cabin noise, after separating the music, allowing more targeted equalization or bass/level compression to be applied.

Switching audio is now commonly seen in automotive amplifiers. Switching audio costs are becoming comparable with older AB amplifiers, as the heat sink requirement is minimized. Important for electric vehicles is the low current draw under all audio bass power output conditions.
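
The simplest form of level compensation mentioned in this section, a predefined loudness curve driven by the speedometer signal, can be illustrated with a minimal sketch. The breakpoints in `SPEED_GAIN_CURVE` and both function names are hypothetical and chosen only for illustration; they are not taken from any real head unit.

```python
from bisect import bisect_right

# Hypothetical loudness curve: (vehicle speed in km/h, gain boost in dB).
# These breakpoints are illustrative assumptions, not values from a product.
SPEED_GAIN_CURVE = [(0.0, 0.0), (50.0, 2.0), (100.0, 5.0), (150.0, 8.0)]


def speed_compensation_db(speed_kmh, curve=SPEED_GAIN_CURVE):
    """Return the gain boost in dB for a speed, by linear interpolation."""
    speeds = [s for s, _ in curve]
    if speed_kmh <= speeds[0]:
        return curve[0][1]
    if speed_kmh >= speeds[-1]:
        return curve[-1][1]
    # Index of the first breakpoint strictly above the current speed.
    i = bisect_right(speeds, speed_kmh)
    (s0, g0), (s1, g1) = curve[i - 1], curve[i]
    return g0 + (g1 - g0) * (speed_kmh - s0) / (s1 - s0)


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)
```

A microphone-based system would replace the speed input with a measured cabin-noise estimate, which this sketch does not attempt.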

CODING OF AUDIO SIGNALS


Jürgen Herre and Schuyler Quackenbush, Chairs

Overview
Audio coding has emerged as a critical technology in numerous audio applications. In particular, it is a key component of mobile multimedia applications in the consumer market. Examples include wireless audio broadcast, internet radio and streaming music, music download, storage and playback, mobile audio recording, and Internet-based teleconferencing. Example platforms include digital audio broadcast radio receivers, portable music players, mobile phones, and personal computers. From this, a variety of implications and trends can be discerned.

Digital distribution of content is offered to the consumer in many formats with varying quality / bitrate trade-off, depending on application context. This ranges from very compact formats (e.g., MPEG HE-AACv2 and MPEG USAC) for wireless mobile distribution to perceptually transparent, scalable-to-lossless and lossless formats for regular IP-based distribution (e.g., MPEG AAC, HD-AAC and ALS).

The frontiers of compression have been pushed further, allowing carriage of full-bandwidth signals at very low bit rates to the point where recent coding systems are considered appropriate for some broadcasting applications, particularly relatively expensive wireless communication channels such as satellite or cellular channels. While such technology predominantly makes use of parametric approaches (at least in part) to achieve highest possible quality at lowest bit rates, they are typically not designed to deliver “transparent” audio quality (i.e., that original and encoded/decoded audio signal cannot be perceptually distinguished even under most rigorous circumstances). Nevertheless, “entertainment quality” services over wireless channels have been very successful. Examples of audio coding that facilitates these new markets include MPEG HE-AACv2 and MPEG USAC.

Transform-based audio coding schemes have been exploited to their full potential (quality vs. bitrate). As such, new paradigms will be exploited to gain further compression efficiency.

For broadcast-only applications where delay is not a constraint, there is the possibility to gain further compression efficiency by exploiting large algorithmic delays or even multi-pass algorithms in the case of “off-line” audio coding.

The role of higher-level psychoacoustics and perception is becoming increasingly important in audio coding. Detection of auditory objects in an audio stream, separation into auditory (as opposed to acoustic) objects, and storage and manipulation as auditory objects is beginning to play a role. This will be an important and ongoing area of research.

Hybrid and parametric coding
There is a consistent trend toward hybrid coding techniques that employ parametric modeling to represent aspects of a signal, where the parametric coding techniques are typically motivated by aspects of human perception. The core of most successful audio coders is still largely based on a classic filterbank based coding paradigm, in which the quantization noise is shaped in the time/frequency domain to exploit (primarily) simultaneous masking in the human auditory system. However, the recent success of parametric extensions to the core audio codec, in both market deployment and standardization, illustrates this tendency as follows.

Audio bandwidth extension technology substitutes the explicit transmission of the signal’s high-frequency part (e.g., by sending quantized spectral coefficients) by a parametric synthesis of high-frequency spectrum at the decoder side based on the transmitted low frequency part and some parametric side information that captures the most relevant aspects of the original high frequency spectrum. This exploits the lower perceptual acuity in the high-frequency region of the human auditory system. An example is MPEG HE-AAC.

Parametric stereo techniques enable rendering of several output channels at very low bit rates. Instead of a full transmission of all channel signals, the stereo / multichannel sound image is re-synthesized at the decoder side based on a transmitted downmix signal and parametric side information that describes the perceptual properties (cues) of the original stereo / multichannel sound scene. Examples are MPEG Parametric Stereo (for coding of two channels) and MPEG Surround (for full surround representation).
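
The downmix-plus-cues idea behind parametric stereo can be illustrated with a deliberately simplified sketch. It is not the MPEG Parametric Stereo algorithm: it assumes a single broadband inter-channel level difference per frame as the only cue, whereas real codecs work per frequency band and also transmit phase and coherence cues. The names `ps_encode` and `ps_decode` and the frame length are hypothetical.

```python
import math


def ps_encode(left, right, frame=4):
    """Return (mono downmix, per-frame level difference in dB)."""
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    ild_db = []
    for start in range(0, len(mono), frame):
        e_l = sum(x * x for x in left[start:start + frame])
        e_r = sum(x * x for x in right[start:start + frame])
        # A small floor avoids log(0) and division by zero on silent frames.
        ild_db.append(10.0 * math.log10((e_l + 1e-12) / (e_r + 1e-12)))
    return mono, ild_db


def ps_decode(mono, ild_db, frame=4):
    """Re-synthesize left/right from the downmix and the level cue."""
    left, right = [], []
    for i, m in enumerate(mono):
        c = 10.0 ** (ild_db[i // frame] / 20.0)  # amplitude ratio left/right
        # Gains keep the transmitted ratio and average to 1, so the
        # downmix of the decoded pair equals the transmitted downmix.
        left.append(m * 2.0 * c / (1.0 + c))
        right.append(m * 2.0 / (1.0 + c))
    return left, right
```

For a single amplitude-panned source this sketch reconstructs both channels almost exactly while transmitting one audio channel plus one number per frame; for general stereo material it only restores the level balance, which is why practical systems add further cues.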

J. Audio Eng. Soc., Vol. 60, No. 1/2, 2012 January/February 95



Parametric coding of audio object signals provides, similarly to parametric coding of multichannel audio, a very compact representation of a scene consisting of several audio objects (e.g., musical instruments, talkers, etc.). Rather than transmitting discrete object signals, the (downmixed) scene is transmitted, plus parametric side information describing the properties of the individual objects. At the decoder side, the scene can be modified by the user according to his/her preference, e.g., the level of a particular object can be attenuated or boosted. A recent example of such a technology is MPEG Spatial Audio Object Coding (SAOC).

There has been significant progress in the challenge of developing a truly universal coder that can deliver state-of-the-art performance for all kinds of input signals, including music and speech. Hybrid coders, such as MPEG USAC (Unified Speech and Audio Coding), have a structure combining elements from the speech and the audio coding architectures and, over a wide range of bit rates, perform better than coders designed for only speech or only audio.

Implications for technology and consumer applications
Solid-state and hard drive-based storage for audio has become extremely inexpensive and consumer internet connection speeds reach into the megabits per second range. When such resources are available, music streaming, download, and storage applications no longer require state-of-the-art audio compression. Instead, what is occurring in the marketplace is that consumers are operating well-known perceptual coders at higher bit rates (lower compression) to achieve “perceptually transparent” compression of music, since the additional increment in resources required for such operating points is relatively inexpensive. For example, consumers are opting to use MPEG Layer III (MP3) or MPEG AAC at rates of 256 kb/s or higher to code their music libraries for their portable music players.

Processor speed has continued to increase at a tremendous pace. Even with the low-power restrictions imposed by battery powered portable devices, the quantity of CPU cycles potentially available for audio processing is large. Present audio coders work in a fraction of available CPU capacity, even for multichannel coding, and new research may be needed to discover how to use the additional CPU cycles and memory space. Some possibilities are improved psychoacoustic models and sophisticated acoustic scene analysis.

Seen overall, the research in audio coding is moving to the extremes, both toward lowest bit rates (very lossy compression using parametric coding extensions) and highest bit rates (noiseless/lossless coding for high resolution audio at high sampling rates/resolutions), as well as the more complex high-level processing (scene analysis and sound field synthesis of various sorts).

Audio coding has successfully entered the world of telecommunication, providing low-delay high-quality codecs that enable natural sound for teleconferencing and video-conferencing. Such codecs deliver full bandwidth and high quality, not only for speech material but also for any type of music and environmental sound, enabling applications such as tele-teaching for music. They support spatial reproduction of sound (stereo or even surround), which can greatly increase the ease of communication in conferences between several partners.

There is considerable research activity exploring audio presentation that is more immersive than the pervasive consumer 5.1 channel audio systems. One might apply the label of “3-D Audio” to such explorations, since their common thread is the use of many loudspeakers positioned around, above, and below the listener. This might range from proposed 22.2 channel systems for the consumer to tens or hundreds of loudspeakers for research in, e.g., wave field synthesis. Of great interest is exploring the impact of loudspeakers positioned above or below the horizontal plane of the typical 5.1 channel system. When systems with a large number of loudspeakers are considered, efficient coding of the audio speaker signals is of paramount importance. In addition, a flexible rendering method that permits high-quality playback on a wide range of conceivable consumer loudspeaker arrangements would be very desirable. It may be that audio coding and rendering to arbitrary loudspeaker setups can be realized in a unified algorithm. This will be an interesting trend to watch.

Finally, after quite some time, the “digital deadlock” regarding the legitimate commercial dissemination of authorized digital audio content has been successfully resolved, and the business models of the music industry have embraced the Internet. Besides a number of (mostly legal) sources of audio (and audio-visual) content with very limited audio quality and free access, several successful major distribution platforms exist now for the electronic distribution of audio. These download stores offer digital audio content in a variety of formats, quality levels and protection levels.
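
The claim in this section that higher bit rates have become inexpensive can be made concrete with a little arithmetic. The sketch below assumes a constant bit rate and an arbitrary four-minute track; the function name and the duration are illustrative choices, and the 1411.2 kb/s figure is simply two channels of 16-bit samples at 44.1 kHz.

```python
def track_size_megabytes(bitrate_kbps, duration_s):
    """Size of a constant-bitrate stream in megabytes (10**6 bytes)."""
    return bitrate_kbps * 1000 * duration_s / 8 / 1e6


FOUR_MINUTES = 240  # seconds, an arbitrary example duration

size_256 = track_size_megabytes(256, FOUR_MINUTES)      # perceptual coder
size_cd = track_size_megabytes(1411.2, FOUR_MINUTES)    # uncompressed CD audio
```

Under these assumptions the 256 kb/s track occupies under 8 MB against roughly 42 MB for uncompressed CD audio, so moving from a very low bit rate to 256 kb/s costs only a few megabytes per track.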

FIBER OPTICS FOR AUDIO


Ronald G. Ajemian, Chair (USA)
Werner Bachmann, Chair (Europe)

It is clear that there are new current and future trends in the area of fiber optics for audio. It has been hard to ignore that more and more companies are deploying fiber optics in their audio/video systems. One can witness this especially in the broadcast field of audio/video. In the current economy where jobs are diminishing, there is growing demand for expertise in fiber optic-based audio/video systems. New start-up companies come to the AES Convention every year.

In the future, copper-based systems will be inadequate to drive the demands for higher bit rates and bandwidth. It is clear from just the telecommunication and broadcast companies that everything is becoming more integrated. Optical fiber cables can carry multiple signals (audio, video, clock sync/time codes, control data, etc.) all over a single strand of fiber or two or more if necessary. The proof is in the application that has been proven to eliminate common noise, radio-frequency interference, electromagnetic interference, and mains hum.

Other trends include the use of fiber optic snakes, links, networks and switchers, cables and connectors, microphone preamplifiers, and feeds for stage/theater live sound. Fiber over Cat 5 or Cat 6 is an option, and fiber is used in MADI. It is likely that fiber optics will affect every sector of audio/video and will eventually be ubiquitous.


HEARING AND HEARING LOSS PREVENTION

Robert Schulein, Chair
Michael Santucci and Jan Voetmann, Vice Chairs

Introduction
The AES TC on Hearing and Hearing Loss Prevention was established in 2005 with five initial goals focused on informing the membership as to important aspects of the hearing process and issues related to hearing loss, so as to promote engineering-based solutions to improve hearing and reduce hearing loss. Its aims include the following: raising AES member awareness of the normal and abnormal functions of the hearing process; raising AES member awareness of the risk and consequences of hearing loss resulting from excessive sound exposure; coordinating and providing technical guidance for the AES-supported hearing testing and consultation programs at U.S. and European conventions; facilitating the maintenance and refinement of a database of audiometric test results and exposure information on AES members; forging a cooperative union between AES members, audio equipment manufacturers, hearing instrument manufacturers, and the hearing conservation community for purposes of developing strategies, technologies, and tools to reduce and prevent hearing loss.

Measurement and diagnosis
Current technology in the field of audiology allows for the primary measurement of hearing loss by means of minimum sound pressure level audibility vs. frequency producing an audiogram record. Such a record is used to define hearing loss in dB vs. frequency. The industry also uses measurement of speech intelligibility masked by varying levels of speech noise. Such measurements allow individuals to compare their speech intelligibility signal-to-noise ratio performance to the normal population. Other tests are commonly used as well for diagnosis as to the cause of a given hearing loss and as a basis for treatment.

Within the past ten years, new tests have evolved for diagnosing the behavior of the cochlea by means of acoustical stimulation of hair cells and sensing their resulting motion. Minute sounds produced by such motions are referred to as otoacoustic emissions. Measurement systems developed to detect and record such emissions work by means of distortion product detection resulting from two-tone stimulations as well as hair cell transients produced from pulse-like stimulations. Test equipment designs for such measurements are now in common use for screening newborn children. Additional research is directed at using such test methods to detect early stages of hearing loss not yet detectable by hearing-threshold measurements. The committee is currently working to establish a cooperative relationship between researchers in this field and AES members, who will serve as evaluation subjects.

Emerging treatments and technology
Currently there is no known cure for what is referred to as sensorineural hearing loss, in that irreparable damage has been done to the hearing mechanism. Such loss is commonly associated with aging and prolonged exposure to loud sounds, although it is well established that not all individuals are affected to the same degree. Considerable research is ongoing with the purpose of devising therapies leading to the activation of cochlear stem cells in the inner ear to regenerate new hair cells. There are, however, drug therapies being introduced in oral form to prevent or reduce damage to the cilia portion of hair cells in cases where standard protection is not enough, such as in military situations. We are beginning to see the emergence of otoprotectant drug therapies, now in clinical trials, that show signs of reducing temporary threshold shift and tinnitus from short term high sound pressure levels. New stem cell therapies are also being developed with goals of regenerating damaged hair cells.

Hearing instruments are the only proven method by which sensorineural hearing loss is treated. In general the task of a hearing instrument is to use signal processing and electroacoustical means to compress the dynamic range of sounds in the real world to the now limited audible dynamic range of an impaired person. This requires the implementation of level-dependent compression circuits to selectively amplify low-level sounds and power amplification and high-performance microphone and receiver transducers fitted into miniature packages. Such circuitry is commonly implemented using digital signal processing techniques powered by miniature 1-volt zinc-air batteries.

In addition to dynamic-range improvements, hearing aids serve to improve the signal-to-noise ratio of desired sounds in the real world primarily for better speech intelligibility in noise. Currently miniature directional microphone systems with port spacings in the 5-mm range are being used to provide improvements in speech intelligibility in noise of 4 to 6 dB. Such microphones have become rather sophisticated, in that many designs have directional adaptation circuits designed to modify polar patterns to optimize the intelligibility of desired sounds. In addition some designs are capable of providing different directional patterns in different frequency bands. Furthermore, some hearing aid manufacturers have introduced products using second-order directional microphones operating above 1 kHz with some success.

In many situations traditional hearing aid technology is not able to provide adequate improvements in speech intelligibility. Under such circumstances wireless transmission and reception technology is being employed to essentially place microphones closer to talkers’ mouths and speakers closer to listeners’ ears. This trend appears to offer promise enabled by the evolution of smaller transmitter and receiver devices and available operating-frequency allocations. Practical devices using such technology are now being offered for use with cellular telephones. This is expected to be an area of considerable technology and product growth.

Tinnitus
Another hearing disorder, tinnitus, is commonly experienced by individuals, often as a result of ear infections, foreign objects or wax in the ear, and injury from loud noises. Tinnitus can be perceived in one or both ears or in the head. It is usually described as a ringing, buzzing noise, or a pure tone perception. Certain treatments for tinnitus have been developed for excessive conditions in the form of audio masking; however, most research is directed toward pharmaceutical solutions and prevention.

We are also seeing the emergence of electro-acoustic techniques for treating what is commonly referred to as idiopathic tinnitus or tinnitus with no known medical cause. About 95% of all tinnitus is considered idiopathic. These treatments involve prescriptive sound stimuli protocols based on the spectral content and intensity of the tinnitus. In Europe, psychological assistance to help individuals live with their tinnitus is a well established procedure.

Hearing loss prevention
Hearing-loss prevention has become a major focus of this committee due to the fact that a majority of AES members come in contact with high level sounds as a part of the production, creation, and reproduction of sound. In addition, this subject has become a major issue of consumer concern due to the increased availability of fixed and portable audio equipment capable of producing damaging sound levels as well as live sound performance attendance. One approach to dealing with this issue is education in the form of communicating acceptable exposure levels and time guidelines. Such measures are however of limited value, as users have little practical means of gauging exposure and exposure times. This situation represents a major need and consequent opportunity for this committee, audio equipment manufacturers, and the hearing and hearing-conservation communities. In recognition of the importance of hearing health to audio professionals engaged in the production and reproduction of music, this committee has scheduled its first conference devoted to technological solutions to hearing loss. The 47th AES International Conference on Music Induced Hearing Disorders will take place in Chicago, IL, USA from June 20–22, 2012. This conference will focus on new technologies for measurement and prevention.
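
The level-dependent compression that this section describes as the core task of a hearing instrument can be sketched as a static gain rule: full gain for quiet sounds and progressively less gain above a knee point. The parameter values (30 dB gain, 50 dB SPL knee, 2:1 ratio) and the function names are assumptions for illustration only; real fittings are prescribed per frequency band from the audiogram and include attack and release dynamics that are omitted here.

```python
def wdrc_gain_db(input_db_spl, gain_db=30.0, knee_db_spl=50.0, ratio=2.0):
    """Gain in dB for an input level: full gain below the knee, less above."""
    if input_db_spl <= knee_db_spl:
        return gain_db
    # Above the knee the output rises by only 1/ratio dB per input dB.
    return gain_db - (input_db_spl - knee_db_spl) * (1.0 - 1.0 / ratio)


def output_level_db_spl(input_db_spl, **kwargs):
    """Output level in dB SPL after applying the level-dependent gain."""
    return input_db_spl + wdrc_gain_db(input_db_spl, **kwargs)
```

With these illustrative settings, inputs between 50 and 90 dB SPL (a 40 dB range) map to outputs between 80 and 100 dB SPL (a 20 dB range), which is the sense in which real-world dynamics are squeezed into a reduced audible range.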

HIGH RESOLUTION AUDIO


Vicky Melchior and Josh Reiss, Chairs

Within the past decade, the types, distribution, and uses of audio have greatly diversified. Portables and internet sourcing have flourished and disc sales have fallen, although the balance between the two varies by country. High quality audio for formal listening has evolved simultaneously and mirrors many of the same influences.

There is a notable broad trend toward increasing quality in many aspects of audio, and together with promised developments such as cloud storage and HD streaming, digital audio including high quality formal listening will continue to grow and evolve.

Music sources
High resolution remains a mainstay of professional recording and archiving due to its extended headroom, precision, and frequency capture. In the consumer marketplace, the principal current high resolution sources are discs, especially Blu-ray, and internet downloads. The music for these releases reflects a range of eras and recording techniques as well as resolutions, and may have been remastered, transcoded, or upsampled. Thus the frequency extension and dynamic range in some cases is less than that of newer recordings made directly at high resolution.

The original high resolution disc formats have not achieved wide success although SACDs continue to be released in small numbers, notably in classical music. SACD-capable players continue to be available and today’s universal players may play Blu-ray Disc (BD), DVD, SACD, and CD. Some support for Direct Stream Digital, the single bit encoding technique behind SACD, can be found in professional recorders, players, and modern interfaces, but LPCM has largely supplanted single bit techniques as release and recording formats.

With the discontinuance of HD-DVD, BD is now the higher bandwidth successor to DVD and is well suited for high resolution multichannel audio, both alone and in combination with high definition (HD) video. The format provides an optional 8 channels of 96 kHz/24 bit audio or 2–6 channels of 192 kHz/24 bit. The great majority of current BDs include one or more of these optional formats. Audio-only discs are not yet common, but a nascent initiative exists on the part of several small companies to record audio-only high res multichannel on BD without the need for a TV monitor. Note that derivative HD discs also exist in some regions, for example China Blue HD in the Chinese market.

A rapid proliferation of BD-capable devices has resulted, encompassing players, laptops, external BD drives for PCs, PCI cards supporting 7.1 audio with BD decoding, recorders, and home theater processors. Many, though not all, support eight channels of high resolution audio. The retail industry in the U.S. also reports growing interest among ordinary consumers in BD and multichannel audio.

At least 40 websites ranging from large aggregators to individual orchestras and bands now exist and sell both new work and back catalog with resolutions from 192 kHz/24 bit to 44.1 kHz/16 bit. Tracks are principally stereo and favor classical music, although broader genre coverage is increasing. Websites currently sell without copy protection. Accordingly, few releases at the highest resolutions are available from the major labels.

The file formats of online downloads have coalesced around FLAC and WMA for lossless compression and WAV or AIFF for uncompressed LPCM. The popularity of FLAC relates to its free, open source nature and its compatibility with most computer operating systems. FLAC is not widely supported on mobile devices or in many lower priced home theater (HT) systems and can be difficult to route through an HT system without first transcoding.

Growth of computer and server-based audio
There is a strong trend toward adoption of computers and file servers into all areas of audio, especially evident in the U.S. and Far East. For high quality audio, there are excellent opportunities but a range of new technical and delivery issues. The term “computer audio” covers numerous configurations where the computer may act as front end disc player or file server; may output audio via a PCI sound card, external sound card, or motherboard ports; and may access downloads or streamed radio and AV from the internet. Files may be stored on hard drives, flash, network-attached storage (NAS), or redundant arrays with backup; and network file servers other than a computer may act as software players.

The traditional audiophile two-channel, music-only marketplace has embraced computers and file servers due to the convenience of file storage and downloads. In this market, which overlaps professional audio, the design ethos of low distortion, high quality engineering has spurred manufacturer research in identifying and eliminating technical problems associated with computers as front end devices. These include isolation of noisy computer power supplies, avoidance of jittered computer clocks, RFI shielding, special attention to computer layouts by makers of PCI sound cards, and design of digital interfaces to avoid contaminating an external DAC master clock with the jitter and noise from the PC. Examples of the latter include asynchronous USB, PLL chips in association with Firewire and SPDIF, and DAC-controlled data transmission. Much ongoing effort in computer related software aims to provide bit-accurate decoding, ripping, playback, and transcoding.

A trend to include computer audio in home theater is underway as well but with a greater mix of challenges for high quality audio. Home theater is above all a rapidly evolving and richly diverse area of wide price range and capability. HT components routinely support the lower resolution compressed formats streamed from the internet and cable, and variously the high resolution AV needed for DVD, BD, and HDTV. Support for the file types and resolutions typical of downloads, disc rips, and AV from other recording or non-movie sources may be absent. It continues to be challenging to transmit files without invoking unwanted sample rate conversion, unintended transcoding (e.g., FLAC to MP3), bit truncation, and loss of metadata.

Improving audio quality
Transmission of high quality, high bandwidth AV signals across networks and digital interfaces is a very active arena of work. In addition to advances in point-to-point interfaces discussed above, development continues on Ethernet and HDMI.

New Ethernet initiatives such as Audio Video Bridging (AVB) promise improved network attributes like bandwidth reservation, traffic shaping, phase synchronization across all channels, and low latency. AVB Ethernet is relevant to home and car systems, although the jitter performance of DAC clocks linked to the network will need to be assessed.

HDMI, the point-to-point connector required for BD and HD video, has excellent bandwidth and an Ethernet data link (HDMI 1.4), but lacks an audio clock. HDMI receivers must derive audio word clock from the video pixel clock, commonly resulting in very high jitter that affects quality and can be audible. Some high end receivers address the jitter and many companies are researching it but current solutions are expensive and uncommon.

Current wireless audio devices with few exceptions are limited to 48 kHz, but components and transmission protocols are underway that promise 96 kHz capability.

Convergence trends are strongly evident in AV design and will certainly continue in light of entertainment trends such as cloud storage and streamed HD live performance.

Research
High resolution formats in general are mature, although efforts to improve lossless compression continue. Inquiry continues into the perceptual characteristics and audibility of sample rates above 44.1 kHz/16 bit, and of the associated filtering and data conversion processes.

Design research continues on loudspeakers, class D amps, and microphones in support of the wide bandwidth, low distortion, wide dynamic range requirements of high resolution. Also, surround algorithms emphasizing enhanced spatial coding are an especially active research area that should be mentioned in context of high resolution because of the improved spatial resolution they afford.
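
The point made in this section about HDMI lacking an audio clock can be illustrated with the audio clock regeneration relation generally described for HDMI, in which the source sends two integers, N and CTS, and the sink rebuilds the audio clock from the video-related clock so that 128 × fs = f_clock × N / CTS. The sketch below assumes that relation and commonly cited N values (6144 for 48 kHz, 6272 for 44.1 kHz); both should be checked against the HDMI specification before being relied on, and the function names are hypothetical.

```python
from fractions import Fraction


def cts_for(tmds_clock_hz, sample_rate_hz, n):
    """Cycle time stamp (CTS) satisfying 128 * fs = f_tmds * N / CTS."""
    return Fraction(tmds_clock_hz) * n / (128 * sample_rate_hz)


def regenerated_sample_rate(tmds_clock_hz, n, cts):
    """Sample rate a sink recovers from the video-related clock and N/CTS."""
    return Fraction(tmds_clock_hz) * n / (128 * cts)


# Assumed example: 74.25 MHz video-related clock, 48 kHz audio, N = 6144.
example_cts = cts_for(74_250_000, 48_000, 6144)
```

Because the recovered audio clock is a synthesized ratio of a clock designed for video, any jitter on that clock or in the sink's clock recovery carries over to the audio word clock, which is consistent with the jitter concern raised in this section.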

HUMAN FACTORS IN AUDIO SYSTEMS
Michael Hlatky, Chair
William Martens, Vice Chair
Jörn Loviscach

The Technical Committee on Human Factors in Audio Systems provides an industry forum for questions concerning the design of user interaction for audio applications, the integration of audio in man-machine interfaces (such as warning sounds, data sonification and auditory feedback), and the design of interfaces for musical instruments.

Touch screens and mobile devices
With the recent advent of ubiquitous touch-controlled computing devices, the first topic in particular has gained considerable importance. Devices that provide touch-based on-screen manipulation such as smartphones and tablet PCs are heavily used to consume all things digital. Audio software on phones or tablets, however, is as yet mostly targeted at the consumption end of the audio commercialization chain. The reason why we are not yet commonly seeing professional audio workstations running on touch screens alone might be traced back to some of the obvious shortcomings of such devices when used to work with digital audio: Touch screens commonly lack pixel-precise navigation, parts of the screen will be visually obstructed by the user’s hand and arm when manipulating an on-screen control, and there is relatively little to no tactile feedback during the interaction process.

These three reasons alone make the design of, for instance, a touch-controlled on-screen fader quite cumbersome. While the precision achievable by touch manipulation of an on-screen fader might be enough to set the playback volume when listening to MP3s on a phone, it is far from sufficient for setting parameters when mixing music. Some manufacturers have therefore enabled swiping gestures on touch-controlled faders to increase precision; this does, however, take away direct controllability, as several micro-actions might be necessary to achieve a desired parameter value. Furthermore, the lack of pressure-sensitive touch screens on the mass market renders the expressive control of musical instruments with such devices nearly impossible. To enable an additional degree of freedom for expressive input, some software (for instance Apple’s GarageBand on the iPad) incorporates data from the device’s accelerometer sensor when the user plays virtual instruments with the on-screen keyboard.

The common smartphone’s collection of sensors such as the touch screen, accelerometer, compass, GPS, microphone, and ambient light sensor also provides a whole new range of input capabilities that can be leveraged in conjunction with digital audio. There is a collection of new audio applications that enable users to influence the presented audio using these sensors. Software such as Smule’s “I am T-Pain,” RjDj’s “Inception App,” or the Black Eyed Peas’ “BEP360” interactive music video introduces a whole new level of interactivity into the formerly lean-back experience of listening to music. In addition, such software raises the question whether music might in the future no longer be generally distributed as mere audio data, but as an application.

Interactive audio applications also pose a


new set of problems to designers of the munity and Microsoft itself. It took only a few able in the cloud, also completely new
common digital audio workstation (DAW). days after the Kinect came to market until approaches to user experiences inside a
How does a future digital audio workstation the first software was published by the DAW are possible. Already today, “UJAM”
that is targeted at producing audio for open-source community enabling control enables you to sing a few lines, from which
interactive applications integrate itself well of software instruments. it automatically generates a complete, pro-
into the development environments for the fessional-sounding song.
iPhone and its siblings? Hints might be The cloud A drawback of the browser-based DAWs,
taken from software employed to design Another trend to be observed at Music however, might be that the long-learned,
interactive music scores and dynamic Hackdays is the rise of web-based APIs known, and expected standard user inter-
sounds for computer games, such as Cry- (application programming interfaces). faces provided by the operating systems
tec’s “CryEngine,” or visual programming Whether it is finding new audio content, such as the default buttons or the behavior
languages such as Cycling ’74 Max, or Pure processing audio, or simply listening to of menus are not easily replicable inside a
Data. music, companies such as SoundCloud, The web browser. With the advent of HTML5 as
Echo Nest or Spotify have an API for that. a “kind of operating system” of a cloud-dri-
Novel game controllers Music discovery and recommendation via ven audio experience, such standards might
The experimental music scene has quickly interconnected web services are topics never exist again.
picked up off-the-shelf devices for natural taken on now by Facebook and Google, and
user interaction (NUI). Novel game con- even Pro Tools got in its tenth incarnation Modular hardware controllers
trollers such as the Microsoft Kinect Sensor equipped with a function to directly bounce Tangible interfaces with knobs and faders
or the older Nintendo Wii Remote have, a mix to SoundCloud. Even the DAW has are still a big topic, and it seems that
however, yet to arrive in the professional moved into the cloud, with, for instance, extreme modularity is the new trend in
audio industry. In the gaming market espe- PowerFX’s “Soundation Studio” or Ohm- hardware controllers. Steinberg’s “CMC
cially the Kinect has had a huge impact; Force’s “OhmStudio.” Series” or Euphonic’s “Artist Series” con-
Microsoft reported selling more than eight The key benefit of these new audio pro- trollers can be combined in any number,
million units within the first 60 days, mak- duction platforms are the enhanced possi- enabling the user to build a hardware con-
ing this the fastest selling consumer elec- bilities for remote collaboration in compar- troller setup for the bedroom studio or the
tronics device ever. ison to traditional DAWs. The move to the scoring stage, all employing the same
The Kinect controller enables data cloud does, however, also enable a whole components.
manipulation for multiple users by natural new approach to designing user interfaces Recent research in the HCI community
user interaction employing multiple users’ through so-called perpetual betas. As appli- has explored the combination of touch
whole bodies via skeleton tracking. This cations are running in the browser, update screen interfaces with superimposed physi-
means that for instance the positions of the cycles are frictionless, because each time cal controls in audio editing tasks, such as
users’ hands in three-dimensional space the user loads a session, a new version of for instance “Slap Widgets” by Malte Weiss
can be used to control parameters, or the the software can be delivered. Another fact and his coworkers or “Dynamic Mapping of
software can react directly to full-body ges- to keep in mind is that the computing Physical Controls for Tabletop Groupware”
tures. The hacking scene, such as the atten- power in the cloud is decentralized. A limit by Rebecca Fiebrink and her coworkers.
dants of the industry-sponsored Music Hack to the number of plugins running in paral- These approaches seem promising to unite
Days, embraces these devices. In the case of lel might be a problem of the past as soon the tactile controllability of physical input
the Kinect, this is fueled in particular by as audio processing has moved to the cloud. devices with the configurability of a touch
the SDKs provided by the open-source com- With all this computational power avail- screen.
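The idea from the game controller discussion, that tracked hand positions can drive audio parameters, can be illustrated with a minimal sketch. It does not use any real Kinect SDK: the `HandPosition` type, the coordinate ranges, and the parameter names `level_db` and `pan` are all assumptions made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class HandPosition:
    """Hypothetical tracked hand position in metres (not a real SDK type)."""
    x: float  # left/right, 0.0 is the centre of the sensor's view
    y: float  # height above a chosen reference level
    z: float  # distance from the sensor (unused in this sketch)


def scale(value, in_lo, in_hi, out_lo, out_hi):
    """Linearly map value from [in_lo, in_hi] to [out_lo, out_hi], clamping."""
    if in_hi == in_lo:
        raise ValueError("input range must not be empty")
    t = (value - in_lo) / (in_hi - in_lo)
    t = max(0.0, min(1.0, t))  # keep out-of-range gestures at the limits
    return out_lo + t * (out_hi - out_lo)


def hand_to_parameters(hand):
    """Map hand height to a fader level and left/right position to pan.

    The ranges (0..1 m height, -0.5..0.5 m width) are assumed values.
    """
    return {
        "level_db": scale(hand.y, 0.0, 1.0, -60.0, 0.0),
        "pan": scale(hand.x, -0.5, 0.5, -1.0, 1.0),
    }
```

In practice the raw tracking data would probably need smoothing before it is mapped, since a noisy hand position would otherwise translate directly into a noisy parameter.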

MICROPHONES
AND APPLICATIONS
Eddy B. Brixen, Chair
David Josephson, Vice Chair

The microphone is an amazing device. No other piece of audio equipment that is 20 to 50 years old would be considered a sensible choice for modern recording. However, that is to some degree the way microphones are regarded.

Oldies but goodies(?)
In the marketplace of today we find a lot of old designs still being produced. A high percentage of new products brought to market are in reality copies of aging technologies: ribbon microphones, tube microphones, and the like. The large number of these designs introduced to the market is better explained by the opportunity of making good business on the general assumption that exotic looking microphones provide exotic audio than it is by an increased level of research into understanding and improving these designs.

Transducer technology
There has been no major breakthrough in transducer technology in recent years. Microelectromechanical systems (MEMS) are not yet on the market for professional audio. However, in the near future the limited signal-to-noise ratios may no longer be a problem.

Digital adaptation
Innovation in the field of modern microphone technology is to some degree concentrated around adaptation to the digital age. In particular the interfacing problems are addressed. The continued updating of the AES42 standard is essential in this respect. Dedicated input/control stages for microphones with integrated interfaces are now available. However, different widely implemented “device-to-computer” standards like
USB and FireWire, which are not specifically reserved for audio, have also been applied in this field. Regarding the data streams, USB3 is fully satisfactory for most audio purposes, but USB microphones are outside the standards. However, they have reached a much higher level of popularity in semi-pro audio and home recording than AES42.

DSP-controlled microphones are still developing. This includes directional pattern control of multi-transducer units providing steering or multichannel output for surround recordings. These techniques are not necessarily applicable in professional audio. However, in the field of surveillance and security recordings the applications are obvious.

Other microphone developments
More attention has been paid to the reduction of EMC problems found in an environment of increasing high-frequency electromagnetic fields that are being picked up by microphones.

Higher order Ambisonics has taken a central position in the search for multi-format compatibility. Other dedicated formats for surround sound exist. However, it seems that the 9.1/13.1 formats are forcing many engineers to start reinventing arrays over again. This should not be necessary.

Some technologies earlier regarded as exotic are finding their way into practical applications. As an example, NASA has published technical briefs on a laser microphone technology that must be regarded as a serious solution.

Battery technology, especially for wireless microphones, is an area of great attention. Surprisingly many engineers still prefer replaceable batteries to rechargeable ones. This will change.

The difficulties of getting some of the rare earth materials for magnets may affect the microphone selection available on the market. In the future the effect of this might be realized as fewer dynamic microphones or rising prices.
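As an illustration of the directional pattern control mentioned for DSP-controlled multi-transducer microphones, the sketch below uses the standard first-order pattern equation, gain = a + (1 - a) cos(theta). The report does not describe any particular implementation; the assumption here is a pair of coincident omnidirectional and figure-of-eight capsule signals, and the function names are invented for this example.

```python
import math


def pattern_gain(a, theta):
    """First-order polar pattern: a = 1 is omni, 0.5 cardioid, 0 figure-of-eight.

    theta is the angle of incidence in radians, 0 being on-axis.
    """
    return a + (1.0 - a) * math.cos(theta)


def virtual_microphone(omni, figure8, a):
    """Mix coincident omni and figure-of-eight signals into one virtual pattern.

    Both inputs are equal-length sequences of samples (an assumed format).
    """
    if len(omni) != len(figure8):
        raise ValueError("signals must have the same length")
    return [a * w + (1.0 - a) * x for w, x in zip(omni, figure8)]
```

Because the pattern is set by a single mixing weight, it can be changed after the recording, which is what makes this kind of control attractive for steering and surround output.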

NETWORK AUDIO SYSTEMS


Kevin Gross, Chair
Umberto Zanghieri and Thomas Sporer, Vice Chairs
Tim Shuttleworth

This document is a compilation of contributions from numerous members of the Technical Committee on Networked Audio Systems. The committee has identified the following important topics related to emerging audio networking technologies. Technologies that have emerged since the last published Emerging Trends Report from the committee in 2007 are included. To provide structure to the report, items are discussed in order of their maturity; commercialized technologies implemented in products available for purchase are discussed first and embryonic concepts in early development come last. Other categorizations referred to in this document are consumer market orientation versus professional market focus, as well as media transport methods versus command and control protocols.

EBU N/ACIP
The European Broadcasting Union (EBU) together with many equipment manufacturers has defined a common framework for Audio Contribution over IP in order to achieve interoperability between products. The framework defines RTP as a common protocol and media payload type formats according to IETF definitions. SIP is used as signaling for call setup and control, along with SDP for the session description. The recommendation is currently published as document EBU Tech 3326-2008.

Audio video bridging
The Audio Video Bridging initiative is an effort by the IEEE 802.1 task group working within the IEEE standards organization that brings media-ready real-time performance to Ethernet networks. The IEEE is the organization that maintains Ethernet standards including wired and wireless Ethernet (principally 802.3 and 802.11 respectively). AVB adds several new services to Ethernet switches to bring this about. The new switches interoperate with existing Ethernet gear, but AVB-compliant media equipment interconnected through these switches enjoys performance currently only available from proprietary network systems.

AVB consists of a number of interacting standards:
802.1AS – Timing and Synchronization
802.1Qat – Stream Reservation Protocol
802.1Qav – Forwarding and Queuing
802.1BA – AVB System
IEEE 1722 – Layer 2 Transport Protocol
IEEE P1722.1 – Discovery, enumeration, connection management and control
IEEE 1733 – Layer 3 Transport Protocol

AVB standardization efforts began in earnest in late 2006. As of November 2011, all but the P1722.1 work has been ratified by the IEEE.

RAVENNA
A consortium of European audio companies has announced an initiative called RAVENNA for real-time distribution of audio and other media content in IP-based network environments. RAVENNA uses protocols from the IETF’s RTP suite for media transport. IEEE 1588-2008 is used for clock distribution. Performance and capacity scale with the capabilities of the underlying network architecture. RAVENNA emphasizes data transparency, tight synchronization, low latency, and reliability. It is aimed at applications in professional environments, where networks are planned and managed.

All protocols and mechanisms used within RAVENNA are based on widely deployed and established methods from the IT and audio industry or comply with standards as defined and maintained by international standardization organizations like IEEE, IETF, AES, and others. RAVENNA can be viewed as a collection of recommendations on how to combine existing standards to build a media streaming system with the designated features.

RAVENNA is an open technology standard without a proprietary licensing policy. The technology is defined and specified within the RAVENNA partner community, which is led by ALC NetworX and supported by numerous well-known companies from the pro audio market.

AES X192
Audio Engineering Society Standards Committee Task Group SC-02-12-H is developing an interoperability standard for high-performance media networking. The project has been designated “X192.”

High-performance media networks support professional quality audio (16 bit, 48 kHz and higher) with low latencies (less than 10 ms) compatible with live sound reinforcement. The level of network performance required to meet these requirements is achievable on enterprise-scale networks but generally not on wide-area networks or the public internet.

The most recent generation of these media networks uses a variety of proprietary and standard protocols (see Table 1). Despite
a common basis in Internet Protocol, the systems do not interoperate. This latest crop of technologies has not yet reached a level of maturity that precludes changes to improve interoperability.

Technology  Purveyor            Date introduced  Synchronization                                  Transport
RAVENNA     ALC NetworX         In development   IEEE 1588-2008                                   RTP
AVB         IEEE, AVnu          In development   IEEE 1588-2008 advanced profile (IEEE 802.1AS)   Ethernet, RTP
Q-LAN       QSC Audio Products  2009             IEEE 1588-2002                                   UDP
Dante       Audinate            2006             IEEE 1588-2002                                   UDP
LiveWire    Telos/Axia          2004             Proprietary (native),

Table 1 Media networks

The X192 project endeavors to identify the region of intersection between these technologies and to define an interoperability standard within that region. The initiative will focus on defining how existing protocols are used to create an interoperable system. It is believed that no new protocols need be developed to achieve this. Developing interoperability is therefore a relatively small investment with potentially huge return for users, audio equipment manufacturers, and network equipment providers.

While the immediate X192 objective is to define a common interoperability mode the different technologies may use to communicate with one another, it is believed that the mode will have the potential to eventually become the default mode for all systems. It will be compatible with and receive performance benefits from an AVB infrastructure. Use of the standard will allow AVB implementations to reach beyond Ethernet into wider area applications.

While the initial X192 target application is audio distribution, it is assumed that the framework developed by X192 will be substantially applicable to video and other types of media data.

Dante
Dante is a media networking solution developed by Audinate. In addition to providing basic synchronization and transport protocols it provides simple plug and play operation, PC sound card interfacing via software or hardware, glitch-free redundancy, support for AVB, and support for routed IP networks. The first Dante product arrived in 2008 via a firmware upgrade for the Dolby Lake Processor, and since then many professional audio and broadcast manufacturers have adopted Dante.

From the beginning Dante implementations have been fully IP based, using the IEEE 1588-2002 standard for synchronization and UDP/IP for audio transport, and are designed to exploit standard gigabit Ethernet switches and VoIP-style QoS (quality of service) technology (e.g., Diffserv). Dante is evolving with new networking standards. Audinate has produced versions of Dante that use the new Ethernet Audio Video Bridging (AVB) protocols, including IEEE 802.1AS for synchronization and RTP transport protocols. It is committed to supporting both IEEE 1733 and IEEE 1722. Existing Dante hardware devices can be firmware upgraded as Dante evolves, providing a migration path from existing equipment to new AVB-capable Ethernet equipment.

Recent developments include announced support for routing audio signals between IP subnets and the demonstration of low latency video. Audinate is a member of the AVnu Alliance and the AES X192 working group.

Q-LAN
Q-LAN is a third-generation networked media distribution technology providing high quality, low latency, and ample scalability, aimed primarily at commercial and professional audio systems. Q-LAN operates over gigabit and higher rate IP networks. Q-LAN is a central component of QSC’s Q-Sys integrated system platform. Q-Sys was introduced by QSC Audio Products in June 2009. Q-LAN carries up to 512 channels of uncompressed digital audio in floating point format with a latency of 1 millisecond.

WAN based telematic/distributed performance and postproduction
Telematic or distributed performances are events in which musicians perform together synchronously over wide area networks, often separated by thousands of miles. The main technical challenge associated with these events is maintaining sufficiently low latencies for the musicians to be able to play together, given the distances involved. Emerging enabling technologies such as the low latency codecs CELT, which stands for “Constrained Energy Lapped Transform,” Opus, a merging of CELT and SILK (a Skype codec), and ULD, which refers to “Ultra-Low-Delay,” allow streaming over DSL or cable end point connections rather than over the high-bandwidth managed networks, such as Internet2, that have more commonly been used until recently.

Another emerging wide area networked use case is streaming audio for cinema postproduction, in which studios and postproduction facilities are connected with one another via high-bandwidth managed fiber networks. This allows studios to see and hear the latest version of a film in postproduction without having to physically move the assets to the studio or use a file-transfer system. Real-time streaming of uncompressed audio and video also allows greater collaboration between directors and postproduction facilities and between different departments in the postproduction process.

Networked postproduction uses two methods (at present) for streaming audio. When audio is streamed independently of video, hardware Layer 3 uncompressed audio-over-IP devices are used. When audio is streamed along with video, it is embedded in an HD-SDI video stream, and the stream is networked using a video codec. The former case is primarily used for audio postproduction, in which the audio engineers are mixing to a poor-quality version of the video; the video is then sourced locally at all locations, and the audio synced to it. Control information is streamed between all nodes using high-definition KVM-over-IP devices, along with MIDI-based control surfaces connected via Apple’s MIDI Network Setup. KVM over IP is a server management technology. (Streaming of Ethernet-based control surfaces is forthcoming.) Video-conferencing to allow collaboration uses either H.323 devices or the same codec used to stream content video. Clock synchronization between nodes can be accomplished either with the hardware audio-over-IP devices, which usually stream clock information, or with GPS-based sync generators at each node.
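Two figures quoted in this section lend themselves to a quick back-of-envelope check: the payload rate of a 512-channel uncompressed stream, and the propagation delay that distance alone adds to a wide area performance. The sketch below is only illustrative. The 32-bit sample size, the 48 kHz sample rate, and the fiber velocity factor of two thirds are assumptions, not values taken from the report, and packet headers and equipment latency are ignored.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_VELOCITY_FACTOR = 2.0 / 3.0  # assumption: typical for optical fiber


def stream_bitrate_mbps(channels, sample_rate_hz, bits_per_sample):
    """Raw audio payload rate in Mbit/s, excluding any packet overhead."""
    return channels * sample_rate_hz * bits_per_sample / 1e6


def one_way_delay_ms(distance_km, velocity_factor=FIBER_VELOCITY_FACTOR):
    """Propagation delay over a cable of the given length, in milliseconds."""
    return distance_km / (SPEED_OF_LIGHT_KM_S * velocity_factor) * 1000.0


if __name__ == "__main__":
    # 512 channels of 32-bit samples at 48 kHz (assumed format)
    print(f"{stream_bitrate_mbps(512, 48_000, 32):.1f} Mbit/s payload")
    # 4000 km stands in for "thousands of miles" between performers
    print(f"{one_way_delay_ms(4000):.1f} ms one-way propagation delay")
```

Under these assumptions the payload stays below the capacity of a gigabit link, while the propagation delay over a few thousand kilometres already exceeds the 10 ms figure quoted for live sound reinforcement, which is why codec and network delay matter so much in the wide area case.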

XFN command and control protocol
XFN is an IP-based peer-to-peer audio network control protocol, in which any device on the network can send or receive connection management, control, and monitoring messages. The size and capability of devices on the network will vary. Some devices will be large, and will incorporate extensive functionality, while other devices will be small with limited functionality. The XFN protocol is undergoing standardization within the AES, and AES project X170 has been assigned to structure the standardization process. A draft standards document has been written and presented to the SC-02-12 working group for approval.

Home broadband audio-over-IP and home wireless LAN
Home broadband connections are increasing in speed, up to a typical rate, worldwide, of about 2 Mbps. This is sufficient for streaming audio services to perform well, mostly using 256 kbps WMA or AAC, which yields fairly good quality at a low bit rate.

Use of wireless LANs in the home, mostly WiFi with some proprietary systems, is increasing. IEEE802.11g routers and devices are realizing faster throughput rates, while IEEE802.11n achieves improved range, improved QoS, and speeds that exceed the needs of low bit rate compressed audio streaming. Two ecosystems coexist at the moment. The first is the Digital Living Network Alliance (DLNA), which focuses on interoperability between devices using UPnP (Universal Plug and Play) as the underlying connectivity technology. DLNA is becoming available in more and more devices, such as PC servers and players, digital televisions with network connectivity, network attached storage (NAS) drives, and other consumer devices. The second ecosystem is Apple AirPlay, which allows iTunes running on a PC or Mac to stream audio to multiple playback devices. AirPlay also supports streaming directly from an iOS device (iPhone, iPod touch, iPad) over WiFi to a networked audio playback device. Both ecosystems are driving the rapid acceptance of audio networking in the home.

Cloud computing, in particular cloud storage of audio content, is another emerging trend. The increasing popularity of premium audio services, for example Rhapsody, Pandora, Last.fm, and Napster, is driving a trend away from users needing to keep a copy of their favorite music in the home or on a portable device. Connection to the internet allows real time access to a large variety of content. Apple is also driving this trend with iCloud, released with iOS5. Consumer devices are becoming more complicated, and connecting devices to the network has been difficult for users, resulting in many calls to tech support.

The good news is that devices are becoming easier to set up. The WiFi Alliance has created an easy setup method called WiFi Protected Setup (WPS). This makes attaching a new device to the home network as easy as pressing a button or entering a simple numeric code.

Another trend driven by the adoption of home wireless LAN technologies is in the user interface (UI) of networked audio devices. More and more audio products are using the iPhone or iPad as the primary method of device control, via the home WiFi network. Some commentators are even announcing the death of the infrared remote control. Consumer audio/video receiver manufacturers such as Denon and Pioneer offer free iPhone/iPad apps that allow complete and intuitive control of their devices. This leads to another emerging trend, that of the display-less networked audio player. Once the player can be conveniently controlled from your smartphone, it may not be necessary for the device to continue to include an expensive display and user controls. Display-less high end audio players are already selling well (for example B&W Zeppelin Air). Such display-less networked audio players will become ubiquitous and be available for under $100.

Open Control Architecture Alliance (OCA)
The Open Control Architecture Alliance has been formed by a group of professional audio companies who are working in different product markets and represent a diverse cross section of vertical market positions and application use-cases. Each of the companies realized that relying solely on proprietary solutions for media networking system controls made interoperability with other manufacturers’ equipment or across application domains difficult.

The member companies agreed that an open standardized control architecture was not only possible, but should be created and made available as an open, public standard that could be available to any participant in the audio market in order to facilitate an improved environment for the entire AV industry. It is the stated mission of the OCA Alliance to secure the standardization of the Open Control Architecture (OCA) as a media networking system control standard for professional applications. OCA in its current form is a Layer 3 protocol that has been created by Bosch Communications based around the earlier (abandoned) command and control protocol AES-24. The Alliance has been formed to complete the technical definition of OCA, then to transfer its development to an accredited public standards organization.

The founding group of OCA members is proceeding to complete the OCA specification and prepare it for transfer to a public standards organization without inviting new active members but welcomes any interested parties to join as an Observer Member.

International Telecommunication Union: Future Networks
ITU-T Q21/13, Study Group SG13 is looking at “Future Networks,” which are expected to be deployed during 2015–2020. So far an objectives and design goals document has been published (Y.3001), and the study group is working on virtualization and energy saving (soon to be published as Y.3011 and Y.3021 respectively) and on identifiers. These deliberations are at a very early stage and a clear direction is not yet apparent. The underlying technology could be a “clean slate” design, or it could be a small increment to NGN (Next Generation Network, which is based on IPv6).

IEC/ISO: Future Network
ISO/IEC JTC1/SC6/WG7 is also working on Future Network, and also expects deployment during 2015–2020. Their system will be a “clean slate” design with a control protocol that is separate from the packet forwarding. It will support multiple networking technologies, both legacy technologies such as IPv4 and IPv6 and also new technologies able to provide a service suitable for the most demanding live audio applications.

It will carry two kinds of data, “synchronous” and “asynchronous.” For synchronous data there is a set-up process (part of the control protocol) during which resources can be reserved. The application requests QoS parameters (delay, throughput, etc.) appropriate to the data to be sent, and the network reports the service the underlying technology is able to provide.

Asynchronous data can use a similar set-up process, or can be routed in a similar way to Internet Protocol. Thus it will also be efficient at carrying protocols such as TCP and will interoperate with IP networks. This provides a migration path from current systems.
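The set-up process described for synchronous data, in which an application requests QoS parameters and the network reports what it can provide, can be pictured with a toy model. This is not the Future Network control protocol, whose details the report does not give; the `QosRequest`, `QosOffer`, and `Link` names and the single-link reservation logic are hypothetical and exist only to show the request/report/reserve pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QosRequest:
    """What the application asks for (hypothetical fields)."""
    max_delay_ms: float
    min_throughput_mbps: float


@dataclass(frozen=True)
class QosOffer:
    """What the network reports it can provide (hypothetical fields)."""
    delay_ms: float
    throughput_mbps: float


class Link:
    """Hypothetical network segment that tracks its unreserved capacity."""

    def __init__(self, delay_ms, capacity_mbps):
        self.delay_ms = delay_ms
        self.free_mbps = capacity_mbps

    def set_up(self, request):
        """Report the available service; reserve only if the request is met."""
        offer = QosOffer(
            delay_ms=self.delay_ms,
            throughput_mbps=min(request.min_throughput_mbps, self.free_mbps),
        )
        accepted = (
            offer.delay_ms <= request.max_delay_ms
            and offer.throughput_mbps >= request.min_throughput_mbps
        )
        if accepted:
            self.free_mbps -= request.min_throughput_mbps
        return accepted, offer
```

A caller whose request is refused still receives the offer, so it could in principle retry with a lower throughput or fall back to the asynchronous path.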

SEMANTIC AUDIO ANALYSIS


Mark Sandler, Chair
Dan Ellis and Jay LeBoeuf, Vice Chairs
Gyorgy Fazekas

The scope of Semantic Audio Analysis has undergone a dramatic expansion over the past few years. As seen by many researchers and practitioners, the area is now best defined as the confluence of a multitude of technologies. These include digital signal processing tools that enable the extraction of characteristic features from audio, machine learning tools that connect raw feature data with high-level semantic representations of musical content, information management tools that allow us to effectively link and organize this information, and knowledge representation tools that facilitate automatic data aggregation and high-level inferences, and thus the formulation of complex queries involving unique features of content as well as social metadata about musical recordings. Web-based applications that allow us to pose queries like “find me upbeat and catchy songs between 130–140 bpm, performed by artists collaborating in the London-Shoreditch area, and sort them by musical key” are now imminent. The TC is concerned with overseeing and coordinating developments, disseminating knowledge, and promoting novel interdisciplinary tools and applications in the light of emerging trends in Semantic Audio. The most important of these novel trends and applications include the following.

Multi-modality and the use of contextual information
The process by which human beings understand music and assign high-level semantic descriptions to musical events depends on a variety of information sources: percepts from different senses, memory, and expectations. As recently demonstrated during the first and highly successful AES conference on Semantic Audio Analysis in Ilmenau, Germany, researchers have started to recognize that semantic labelling of music relies on a number of different inputs, and have started to develop techniques that take contextual information into account. Contextual information may be defined as complementary data that improves the results but is not in itself sufficient for a particular information extraction or audio processing task. Examples of new methods include informed source separation, which works by encoding information about the mixing process into the stereo signal and enhances signal separation by using this data in the decoder, and informed music transcription, which takes prior information about the instruments into account. We can also observe the increasing use of studio stems, taking advantage of the multitrack format, and the use of multiple modalities, that is, the simultaneous analysis of audio, video, text, and other sources of information.

Ontologies and linked data
The heterogeneity and open-ended nature of musical data is often the chief obstacle in developing complex systems that use many sources of information. Recent developments in other disciplines, namely Web Science and the Semantic Web, help us in developing methods for associating musical data with explicit meaning in a machine-processable way. Technologies such as the Resource Description Framework (RDF) and Semantic Web ontologies enable us to represent information and knowledge in a uniform interoperable framework, and lead to the intelligent music processing tools of the future. Semantic Web ontologies such as the Music Ontology also provide the backbone of Linked Data, which eases linking and aggregation over disparate resources containing increasing amounts of editorial and social data about music.

Educational games
New advances in semantic audio technologies enable the creation of interactive educational games for music learners. It is now possible to analyze the sound played on real instruments and thus avoid the need for using MIDI controllers, extract symbolic information such as chords or note names, and align this information with musical scores in real time. Applications like Song2See demonstrate how semantic audio technologies may help to create content for music learners by using automated transcription, keep the user in the loop by allowing the correction of transcription errors, use the content to ease the learning process with fingering suggestions for each instrument, and provide real-time feedback about the quality of playing by means of sound analysis. The appearance of web-based platforms for content and metadata sharing, and advances in semantic analysis and recommendation technologies, also provide for creating novel applications for music education. There is a growing trend toward using community-created web content, including lead sheets and chord charts, and toward analyzing YouTube videos to enhance machine analyses or to create interactive games that are not limited to expert-generated content. The use of the web thus provides an advantage over games like Rock Band or Guitar Hero.

Intelligent music production tools
Finally, there is a recent increase in adapting semantic audio technologies for music production. Examples of these applications include navigating sound effect libraries by using similarity defined by proximity in a characteristic feature space, using automatic audio-to-score alignment in audio editing, and developing intelligent audio effects and automatic mixing techniques that rely on semantic audio analysis.
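Navigating a sound effect library by proximity in a feature space, as mentioned under intelligent music production tools, amounts to a nearest-neighbor search. The sketch below is a minimal illustration that assumes each sound has already been reduced to a short feature vector; the two-dimensional vectors, the sound names, and the choice of plain Euclidean distance are all assumptions, since the report names no specific features or metric.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(query, library, k=3):
    """Return the names of the k library sounds closest to the query vector.

    library maps a sound name to its feature vector.
    """
    ranked = sorted(library.items(), key=lambda item: euclidean(query, item[1]))
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    # Made-up two-dimensional features, e.g. (brightness, noisiness)
    library = {
        "door_slam": [0.9, 0.1],
        "glass_break": [0.8, 0.7],
        "rain": [0.1, 0.2],
    }
    print(most_similar([0.85, 0.15], library, k=2))
```

A linear scan like this is fine for a small library; a large one would more plausibly use a spatial index, and the features would normally be normalized so that no single dimension dominates the distance.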

SIGNAL PROCESSING
FOR AUDIO
Christoph Musialik, Chair
James Johnston, Vice Chair

Signal processing applications in audio engineering have grown enormously in recent years. This trend is particularly evident in digital signal processing (DSP) systems due to performance improvements in solid-state memory, disk drives, and microprocessor devices. The growth in audio signal processing applications leads to several observations.

Observations
First, DSP has emerged as a technical mainstay in the audio engineering field. Paper submissions on DSP are now among
the most popular topics at AES Conven- rection, and delay methods to synchronize clock recovery circuits, and output ampli-
tions, while just some years ago DSP ses- wavefront arrival times at a particular lis- fiers that match the specifications of the
sions were rare at AES. DSP is also a key tening position. digital components.
field for other professional conferences, In professional sound reinforcement,
including those sponsored by IEEE and loudspeakers with steerable radiation pat- Implications for technology
ASA. terns can provide a practical solution for All of the trends show a demand for ever
Second, the consumer and professional difficult rooms. Professional live audio greater computational power, memory
marketplaces continue to show growth in applications often demand low-latency sys- capacity, word length, and more sophisti-
signal processing applications, such as tems, which remain challenging for DSP cated signal processing algorithms. Never-
increasing number of discrete audio chan- because many algorithms and processors theless, the demand for greater processing
nels, increasing audio quality per channel are optimized for block processing instead capability will be constrained by the need
(both word width and sampling frequency, of sample-by-sample, and thus introduce to minimize power consumption, since a
and increasing quality of building block more latency. continuously growing part of audio signal
electronics, such as sampling rate Algorithmic developments will continue processing will be done in small, portable,
converters, ADCs and DACs, due to con- to occur in many other areas of audio wireless, battery-powered devices. On the
tinuously growing availability of con- engineering, including music synthesis, other hand, due to the increasing capabili-
sumer-ready DSP hardware processing and effect algorithms, intelli- ties of standard microprocessors, contem-
Third, there is growing interest in intel- gent noise reduction in cars, as well as porary personal computers are now fast
ligent signal processing for music infor- enhancement and restoration for archiv- enough to handle a large proportion of
mation retrieval (MIR), like tune query by ing and audio forensic purposes. Also, the standard studio processing algo-
humming, automatically generating improved algorithms for intelligent ambi- rithms. Advanced algorithms still exceed
playlists to mimic user preferences, or ent noise reduction echo cancellation, the capabilities of traditional processors,
searching large databases with semantic acoustical feedback suppression, and steer- so we see a trend in the design of future
queries such as style, genre, and aesthetic able microphone arrays are expected in processors to incorporate highly parallel
similarity. the audio teleconferencing field. architectures and the compiler tools nec-
Fourth, there are emerging algorithmic Fifth, switching amplifiers (like Class D) essary to exploit these capabilities using
methods designed to deliver an optimal continue to replace traditional analog high-level programming schemes. Due to
listening experience for the particular amplifiers in both low power and high the relentless price pressure in the con-
audio reproduction system chosen by the power applications. Yet even with Class D sumer industry, processors with limited
listener. These methods include transcod- systems, the design trends for load inde- resolution will still challenge algorithm
ing and up-converting of audio material to pendence and lowest distortion often developers to look for innovative solutions
take advantage of the available playback include significant analog signal process- in order to achieve the best price-perfor-
channels, numerical precision, frequency ing elements and negative feedback fea- mance ratio.
range, and spatial distribution of the play- tures. Due to advances in AD/DA converter The Committee is considering forging
back system. Other user benefits may technology, future quality improvements contacts with digital signal processor
include level matching for programs with will require the increasingly scarce pool of manufacturers to convey to them the
differing loudness, frequency filtering to skilled analog engineers to design input needs, experiences, and recommendations
match loudspeaker capabilities, room cor- stages like microphone preamps, analog from the audio community.
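The remark above that block processing adds latency can be made concrete with a small calculation. The following is a minimal sketch, not taken from the report: it assumes the only delay counted is the time needed to fill the processing buffers, and the function name `block_latency_ms` and the `num_buffers` parameter are hypothetical names chosen for illustration.

```python
def block_latency_ms(block_size: int, sample_rate_hz: float, num_buffers: int = 1) -> float:
    """Return the buffering delay in milliseconds for block-based processing.

    Assumption: latency is dominated by the time it takes to collect
    `num_buffers` blocks of `block_size` samples before they can be processed.
    Converter delay and algorithmic look-ahead are deliberately ignored.
    """
    if block_size <= 0 or sample_rate_hz <= 0 or num_buffers <= 0:
        raise ValueError("block_size, sample_rate_hz and num_buffers must be positive")
    return 1000.0 * block_size * num_buffers / sample_rate_hz


if __name__ == "__main__":
    # Compare sample-by-sample processing (block size 1) with typical block sizes.
    for size in (1, 64, 256, 1024):
        print(f"block of {size:4d} samples at 48 kHz: "
              f"{block_latency_ms(size, 48_000.0):.3f} ms")
```

Under these assumptions the delay grows linearly with block size, which is why live sound applications tend to favor small blocks even though larger blocks are usually more efficient for the processor.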

SPATIAL AUDIO
James Johnston and Sascha Spors, Chairs

Loudspeaker layouts
Nowadays surround sound is available in many households, where the 5.1 layout is the most widely deployed loudspeaker configuration. The production chain from recording, coding, and transmission to reproduction of surround sound for cinema is also well established. So far, the consumer market for surround sound has mainly been driven by movie titles; audio-only content is still quite rare. As successors to the 5.1 layout, various layouts with more loudspeakers arranged in a plane have been proposed, for instance the 7.1 layout. None of them has had the commercial success of the 5.1 layout. Layouts that allow for the reproduction of height seem to be the next natural step in the evolution of surround sound. A number of proposed layouts that include elevated loudspeakers, for instance 9.1, 10.2 or 22.2, are becoming ready for the market. It remains to be seen whether the market accepts the increased number of loudspeakers that have to be installed at home. From a perceptual point of view, including height cues in the reproduction has a clear benefit. However, the optimal speaker layout is subject to lively discussion within the community.

Novel recording, production, and coding techniques have to be developed and established for the new layouts including height. Upmixing algorithms for content that has been produced for systems not featuring height, for instance from stereo or 5.1 surround, to the novel formats including height are being developed. A number of proposals exist; however, there is still a lot of room for new developments, which can be expected in the future. In addition, new delivery methods that provide specific information related to height and distance, as well as horizontal angle, are being reported on at AES conventions.

3-D
With the increased spread of 3-D video in cinema and home cinema, new requirements must be met by spatial audio reproduction. While 3-D video adds depth to the image, this is not a straightforward task with stereophonic techniques. This holds especially for sound sources closer to the listener than the loudspeakers. Future spatial audio techniques have to provide solutions to the challenges imposed by 3-D video. First concepts have been presented.

Sound field synthesis
As alternatives to the traditional stereophonic systems, sound field synthesis techniques such as Wave Field Synthesis (WFS) and higher-order Ambisonics (HOA) are being deployed more and more. Sound field synthesis approaches are based on the principle of physically synthesizing a sound field. In the past two decades around 100 WFS systems have been installed worldwide, each with up to 832 channels. Large-scale Ambisonics systems are currently not so widespread, but it seems that such systems will show up in the near future. An active research community exists for both WFS and HOA that investigates various aspects and combinations of both approaches.

Psychoacoustic motivation
Upcoming trends in spatial audio reproduction besides traditional stereophony are multichannel reproduction systems that are psychoacoustically motivated. Several techniques have been developed on the basis of WFS that aim at spatial reproduction with almost arbitrary layouts using a decreased number of loudspeakers compared to traditional WFS. Such approaches are already commercially available. Multichannel time-frequency approaches use techniques from short-term signal analysis to analyze and synthesize sound fields. Directional Audio Coding (DirAC) and Binaural Cue Coding (BCC) are representatives of the latter techniques. Time-frequency processing seems to be a promising concept since its basic idea is related to the analysis of sound fields performed by the human auditory system.

The psychoacoustic mechanisms underlying the perception of synthetic sound fields have been investigated in quite some detail. However, there are still plenty of open issues in the field that should be researched in the future. Peer-reviewed, published perceptual research based on established psychoacoustic methodologies will definitely bring the community forward in this aspect.

Production and mixing techniques
So far, the traditional production and mixing techniques used in stereophony are channel-based. The different tracks are mixed in the studio for a particular target layout and then transmitted or stored. This process relies on the assumption that the setup at the receiver matches the setup used during production. Object-oriented audio, as an alternative approach, is based on the concept that each track or group of tracks forms an audio object. The signal(s) of the object, together with side information (position, room, effects), are then transmitted to the receiver. There the loudspeaker signals are generated by a suitable rendering algorithm on the basis of the transmitted side information. A major benefit of the object-oriented approach lies in the fact that it is almost independent of the setup used by the receiver. It seems that in the future a combination of both approaches might be promising, to cope with the needs of the producers on the one side and to allow setup-independent mixing and production on the other side. Currently several different formats for the transmission of the content and side information have been proposed; however, none has yet been commercially adopted in any significant fashion.

Headphone listening
Although spatial audio is routinely used by the gaming industry, advanced techniques with better quality and realism can be expected with further increases in processing power. This holds especially for mobile devices, where spatial audio is currently rarely deployed. Due to the general shift toward mobile devices, spatial audio will also be finding its way into the mobile world. As a consequence, an increasing number of individuals use headphones when listening to audio. In such scenarios the use of reproduction techniques based on head-related transfer functions provides truly three-dimensional spatial audio by relatively simple technical means. As a consequence, binaural audio is expected to play a more prominent role in the future. As an alternative to headphone listening, near-field loudspeaker playback with crosstalk cancellation may be used.

Diverse applications
Besides its traditional application fields, cinema and home cinema, spatial audio is increasingly being deployed in other areas, for instance in teleconferencing systems, in cars, and as an auditory system for advanced human-machine interfaces. Here the use of spatial audio is expected to provide a clear benefit in terms of naturalness and transport of additional information. Another important area of application is virtual concert hall and stage acoustics using active architecture systems, where spatial audio enhances the environment with which musicians and audiences interact during performance. Modern multichannel systems offer adjustability of acoustics and high sound quality suitable for live performance and recording.

Network standards
With respect to cabling and coping with the ever increasing number of loudspeakers, the new IEEE standards for Audio Video Bridging (AVB) seem promising. The standards are designed for the fully synchronized transmission of a high number of output/input channels via Ethernet. The standards have been developed and are currently supported by all major players in the field, and devices are expected to be available in the near future. Such standards that work with intelligent processing to detect the listening setup are expected to be proposed soon.

TRANSMISSION AND BROADCASTING


Kimio Hamasaki and Stephen Lyman, Chairs
Lars Jonsson and Neville Thiele, Vice Chairs

The growth of digital broadcasting is the most remarkable trend in this field. Digital terrestrial TV and radio broadcasting have been launched in several countries using the technology standards listed below. Analog broadcasting has ceased in some countries. The World Wide Web has become a more common, alternate source of streamed or downloadable programming.

Digital terrestrial TV broadcasting
In Europe DVB-T2 has been deployed in several countries for HD services. ATSC is used in USA, Canada, and Korea, while ISDB-T is employed in Japan and Brazil.

Digital terrestrial radio broadcasting
In Europe and Asia DAB+ is state of the art in the DAB Eureka 147 family. HD-Radio or IBOC fulfills this role in the USA and Canada, with ISDB-SB in Japan. Large broadcasting organizations in Europe and Asia, and major countries like India and Russia with large potential audiences, are committed to the introduction of DRM (Digital Radio Mondiale) services and it is to be expected that this will open the market for low-cost receivers.


Digital terrestrial TV broadcasting for mobile receivers
DVB-T2 Lite (Europe) is still under development, while ISDB-T is used in Japan. DMB is employed in Korea and there have been a few trials in Europe.

In the U.S., the Advanced Television Systems Committee (ATSC) has published final reports of two critical industry planning committees that have been investigating likely methods of enhancing broadcast TV with next-generation video compression, transmission and Internet Protocol technologies and developing scenarios for the transmission of three-dimensional (3-D) programs via local broadcast TV stations. The final reports of the ATSC Planning Teams on 3-D-TV (PT-1) and on ATSC 3.0 Next Generation Broadcast Television (PT-2) are available now for free download from the ATSC web site.

Benefits of digital broadcasting
The introduction of digital broadcasting has brought such benefits as High Definition TV (1080i, 720p). Due to the current availability of 5.1 surround sound in digital broadcasting, surround sound is an important trend in TV broadcasting. 5.1 surround sound is evolving with future extensions involving additional channels. Along with 3-D TV, several broadcasters are experimenting with 3-D audio (for instance 22.2, Ambisonic, wave-field synthesis, directional audio coding). Data broadcasting now includes additional information related to a program.

Digital broadcasting themes at AES conventions
Recent AES conventions have discussed the following digital broadcasting issues: strategies for the expansion of digital broadcasting; audio coding quality issues for digital broadcasting and transmission; the role and importance of audio in an era of digital multimedia broadcasting; new program-production schemes for audio in digital multimedia broadcasting; the future of radio including multicasting over the web and surround.

Internet streaming
The use of new methods for the distribution of signals to the home via the Internet with streaming services is an increasing trend. Web radio and IPTV are now getting audience figures that in a number of years from now will be closing in on the traditional systems. Distribution technologies with rapid growth in many countries are: ADSL/VDSL over copper or fiber, combined with Wi-Fi in homes; WiMAX and 3G/UMTS; 4G and Wi-Fi hot spots for distribution to handheld devices.

Loudness
Loudness and True Peak measurements are replacing the conventional VU/PPM methods of controlling program levels. This has largely eliminated significant differences in the loudness of different programs (and advertisements) and the need for listeners to keep adjusting their volume controls. Supporting international standards and operating practices have been published by several organizations such as ITU-R, EBU and ATSC, listed below. More and more broadcasters now apply these standards in their program production and transmission chains.

ITU-R: BS.1770: “Algorithms to measure audio programme loudness and true-peak audio level”; BS.1771: “Requirements for loudness and true-peak indicating meters.”

The following five documents provide the core of the EBU Loudness work:
EBU R128 Loudness Recommendation
EBU Tech 3341 Metering specification
EBU Tech 3342 Loudness Range descriptor
EBU Tech 3343 Production Guidelines
EBU Tech 3344 Distribution Guidelines

ATSC: A/85 “Techniques for Establishing and Maintaining Audio Loudness for Digital Television”.

Lip sync
The lip-sync issue remains unsolved, but is being discussed in digital broadcasting groups. Some international standards development organizations such as IEC and SMPTE are discussing new standards for measuring the time differences between audio and video.
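The loudness-based level control described under "Loudness" above reduces, at its simplest, to a gain offset between a measured program loudness and a target level. The sketch below assumes the integrated loudness and true peak have already been measured by a compliant meter; it does not implement the BS.1770 measurement itself. The default target of -23 LUFS and the -1 dBTP ceiling are the values commonly cited from EBU R128, and the function names are hypothetical.

```python
def normalization_gain_db(measured_lufs: float, target_lufs: float = -23.0) -> float:
    """Gain in dB that moves a program from its measured loudness to the target.

    Assumption: `measured_lufs` is an integrated loudness value obtained from
    a separate meter; this function performs no measurement of its own.
    """
    return target_lufs - measured_lufs


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def true_peak_ok(measured_true_peak_dbtp: float, gain_db: float,
                 max_true_peak_dbtp: float = -1.0) -> bool:
    """Check that the true peak stays at or below the ceiling after the gain is applied."""
    return measured_true_peak_dbtp + gain_db <= max_true_peak_dbtp


if __name__ == "__main__":
    # Hypothetical measurements for a program and a louder advertisement.
    for name, lufs, peak in (("program", -26.5, -4.0), ("advertisement", -17.0, -0.5)):
        gain = normalization_gain_db(lufs)
        print(f"{name}: apply {gain:+.1f} dB (factor {db_to_linear(gain):.3f}), "
              f"true peak within limit: {true_peak_ok(peak, gain)}")
```

Applying the same target to every item is what removes the level jumps between programs and advertisements mentioned in the text. Quiet material that needs a positive gain may exceed the true-peak ceiling and would then need limiting, which is outside this sketch.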

J. Audio Eng. Soc., Vol. 60, No. 1/2, 2012 January/February 107
