Telling The Difference Preference and Prediction

FEATURE ARTICLE
Telling the difference

Preference and prediction
Francis Rumsey
Consultant Technical Writer
The recent AES Virtual Vienna convention featured some interesting papers
on perception and sound quality evaluation. In particular an emphasis could
be found on listener preferences, clarity, and naturalness concepts in live and
reproduced sound, such as with artificial reverberation and in sound mixes.
There was also some revealing work on the effects of
different loudspeaker placements.
NATURALNESS OF ARTIFICIAL REVERBERATION
Artificial reverberation (reverb) is used widely in the audio industry to enhance mixes, simulate spaces in VR environments and games, and facilitate auralization in design applications. In an interesting paper, “Evaluation of the Perceived Naturalness of Artificial Reverberation Algorithms” (Paper 10353), Stojan Djordjevic and his colleagues looked into a number of different types of artificial reverb in order to discover how natural they sounded.

The authors point out that artificial reverb has been around for quite some time, having first been proposed by Schroeder some 60 years ago, and that many systems don’t explicitly aim to model the acoustics of spaces. The signal processing requirements for accurate room modelling can be quite high and may not be necessary to achieve a plausible effect, so various types of feedback delay network (FDN) have been employed quite widely in commercial systems. These are contrasted with recent ideas such as the scattering delay network (SDN), which attempts to simulate the physical parameters of spaces without too high a price in terms of signal processing. In the experiments reported here, reverb generated using such methods is compared with that modelled using advanced acoustic design software (CATT-Acoustic), and with that created using a convolution process. (Convolution reverb involves combining dry sounds with impulse responses recorded in real spaces.)

A number of anechoic recordings were used as the input to the various reverberators, including a cello, a door opening and closing, gunfire, and male speech. The aim was to simulate the reverberation of two real rooms (a lecture room and a small seminar room) by generating binaural room impulse responses (BRIRs) using each of the selected reverb algorithms. The RT30 values of the simulated reverb were matched to those of the real rooms as closely as possible, and there was also an attempt to match the size of the real rooms in terms of early reflections created by each of the reverbs. The six stimuli versions they ended up with (to be compared by listeners) consisted of those treated with the real-room BRIRs (convolution reverb), those synthesized by CATT-Acoustic (see the models in Fig. 1) and a simplified CATT model, the results of an SDN VST plug-in, the results of an FDN plug-in, and a low anchor consisting of high-pass filtered natural BRIRs. (The authors have put these signals online for those interested in comparing the results for themselves.)

It was decided to ask 28 listeners in a multiple comparison test to rate acoustic naturalness, which was defined as the degree to which samples corresponded to the experience of a sound source within a room. (Listeners had also been asked to rate “pleasantness,” but this was found to correlate strongly with naturalness.)
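
As a rough illustration of the convolution reverb idea described in this section (a dry signal combined with a measured impulse response), here is a minimal Python sketch. It is not the implementation used by Djordjevic and his colleagues: the function name `convolve` and the toy sample values are assumptions made purely for illustration. A practical system would normally use FFT-based convolution, and binaural output would need one impulse response per ear.

```python
def convolve(dry, impulse_response):
    """Direct-form linear convolution of two lists of samples.

    Every input sample triggers a scaled, delayed copy of the impulse
    response; the copies are summed to form the reverberated output.
    """
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for n, x in enumerate(dry):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


# Hypothetical toy data: a short "dry" signal and a three-tap impulse
# response with a direct sound and one quieter reflection two samples later.
dry = [1.0, 0.5, 0.0, 0.0]
ir = [1.0, 0.0, 0.25]
wet = convolve(dry, ir)
print(wet)
```

The output is longer than the input by the length of the impulse response minus one sample, which corresponds to the reverb tail ringing on after the dry sound has stopped.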
774 J. Audio Eng. Soc., Vol. 68, No. 10, 2020 October
Fig. 1. CATT-Acoustic models of two spaces used when comparing artificial reverberation methods. (Courtesy of Djordjevic et al.)

As the reverb had been rendered binaurally the tests were done on headphones. Interestingly, although the authors had expected the stimuli convolved with the real room impulse responses to come out top in terms of naturalness, the SDN network outperformed them slightly, and the difference was significant. Attempting to explain this, a number of ideas were put forward, including mismatches between the HRTFs (head-related transfer functions) used by the different systems, differences in perceived externalization, and differences in the directional characteristics of the sound sources used. (An ideal omni source was used in the SDN simulation, whereas a real loudspeaker was used for the recorded impulse responses.) Differences were also noticed between the simplified and standard CATT models, suggesting that the room details modelled in the standard setup might have acted to improve the naturalness of simulated results.

PREFERENCE FOR REVERB PLUGINS
In a related vein, Kevin Garland and Malachy Ronan tried to find out whether people’s professed preferences for particular reverb plugins used in sound mixing stand up to the test of unbiased comparison. (See the engineering brief “Investigating User Preference for Reverberation Plugins,” eBrief 578.) Introducing the topic, it’s said making accurate, unbiased comparisons between commercial reverb plugins is hard because of confounding factors such as interface design, branding, reviews, and marketing. People may suffer from so-called confirmation bias, whereby they tend to look for outcomes that support their existing hypothesis about the goodness or badness of different products. In order to challenge these biases the authors attempted to find out whether there were consistent preferences in a more controlled test where the settings of different plugins were closely matched in terms of objective parameters.

Four reverb plugins were used for the experiments, two of them paid, and two of them free, with the “hall” settings of one of the paid systems used as a reference for the others. Low frequencies below 100 Hz were removed from all four in order to eliminate differences in LF behavior. By using acoustic analysis software to analyze the spectra, early reflections, RT60, and late reflections of each plugin a number of iterative adjustments were made to the settings so as to get them to match as closely as possible. This turned out to be quite a challenging process, owing partly to the different ways that these were expressed and controlled. An interesting outcome was that even when a user interface was clearly calibrated for a parameter such as early reflection time, the measured result didn’t match the setting.

Anechoic instrumental recordings (cello and violin) were used as stimuli, the results being normalized in terms of loudness. The results were auditioned in a randomized paired comparison test on headphones by 10 relatively inexperienced listeners from a music technology course. The authors also managed to recruit an experienced recording engineer. The group of inexperienced listeners, when analyzed as a whole, displayed no significant preference for any plugin, except in one case where the reference paid plugin was compared to one of the free versions on the cello sample. The expert listener, though, showed more consistent preferences, with a strong preference for the paid reference plugin, for both stimuli. There was, therefore, at least some evidence that an experienced listener would consistently prefer a specific implementation even when a number of metrics were made similar to others.

LISTENER PREFERENCE BASED ON ACOUSTIC PARAMETERS
Peter Critchell and Ludovico Ausiello attempt “A New Approach to Predicting Listener’s Preference Based on Acoustical Parameters” (Paper 10378), in which they describe a meta-analysis of existing studies of relationships between acoustic metrics and listener preference. When they talk about acoustic parameters or metrics here they mean things like early decay time (EDT), bass ratio (BR), reverberation time (RT), and so forth, that are used to measure the characteristics of acoustic spaces such as halls. Knowing that Frick had earlier come up with a measure of “acoustic quality” based on six such metrics, they discuss trying to arrive at a simple “preference rating” (PR) that can be used in the context of rock and pop venues. An existing study of similar relationships by Adelman-Larsen et al. is acknowledged, and the authors decided to use the same data set, derived from 20 venues, which included a lot of acoustical parameters and subjective ratings.

In a meta-analysis one tries to bring together multiple studies and sources of data in order to discover trends and patterns among those studies. Here the authors pulled together all the parameters describing room acoustic quality that they could find in the literature, and then attempted to pool parameters that appeared to relate to the same category of phenomenon. An example of this is shown in Fig. 2, where a table presented in the paper groups parameters or variables in “acoustic contexts” and indicates the number of papers (out of 136 in three major journals) in which they occurred. Only the more frequently arising variables were selected as predictors for listener preference, unless there was only one in any context. It can be seen that “early
reflections” was eliminated from the shape/layout group, the argument being that room mode distribution can easily be determined from room dimensions, and that in rock and pop venues the listeners are often standing and moving around, making this parameter potentially less relevant. This selection process enabled the number of parameters used for a prediction model to be boiled down to eight.

Fig. 2. Relative frequency of various acoustical parameters discovered in papers about acoustic quality of spaces. Those included in a final preference algorithm are shown in bold. (Figs. 2–5 courtesy of Critchell and Ausiello)

In order to weight the values of each of these parameters in the prediction model the process shown in Fig. 3 was used. “Ideal” values for each parameter were inferred from the literature as a means of scoring the measured values, then the scores were weighted according to their relative importance (RIW). The RIW was determined, again from the literature surveyed, on the basis of the number of mentions occurring in the meta-analysis, specifically with importance for rock and pop venues in mind (Fig. 4). It’s proposed that if this were to be done again for, say, classical music, then the weightings and scores might be different.

Fig. 3. Preference rating (PR) prediction for rock and pop venues, algorithm flow chart

Fig. 4. Relative importance weighting (RIW) of metrics used to predict preference

To test the output of the new algorithm in terms of its effectiveness for predicting listener preference, the authors used all of the acoustic data from the Adelman-Larsen dataset mentioned above, which also included subjective preference ratings of all the rooms measured, by sound engineers and musicians. Some metrics needed for the new algorithm (the sound diffusivity index, SDI, and background noise) were not included in the dataset so they had to be estimated from other information.

The overall comparison between initial predicted preference ratings and the subjective data from the Adelman-Larsen study is shown in Fig. 5, where it can be seen that fairly good correspondence existed at the outset, especially when estimated values of SDI and noise were omitted from the prediction. A number of variants on the prediction metrics were then tried in order to improve the fit with the subjective data, the most successful one using a slightly different version of the acoustic parameter known as Definition (D50), averaged over a wider frequency range. After this tweaking the predicted and actual results lay within 5% of each other for 17 out of the 20 venues examined.

Fig. 5. Preference rating (PR) from the algorithm compared with “subjective data” (assumed to represent actual listener preference) from a previous study by Adelman-Larsen. (N.A.F. represents “no additional factors,” that is excluding estimated values for SDI and noise.)

MIX CLARITY PREDICTION
Mix clarity in music production may be said to be related to the “ease in which the timbres that constitute the piece of music can be separated and identified,” according to Andrew Parker and Steven Fenton in their paper “Mix Clarity Prediction Using Multiresolution Inter-Band Relationship Analysis” (Paper 10340). They say that the attribute can be used as a basis for automatic mixing systems, particularly when it is related to masking phenomena, working on the basis of minimizing the masking of one musical instrument by another in a mix. A number of factors that can contribute towards mix clarity are reviewed, pointing out that masking-based metrics need a fairly detailed knowledge or decomposition
of the various sources in a mix, and this is not always available. The authors’ previous work on inter-band relationship (IBR) is extended; that work involved measuring multi-band dynamic range to predict various aspects of music production quality. The tuning of window size in different frequency bands in order to optimize clarity prediction is explored.

Ten mixes of a multitrack recording were selected from those previously submitted for a mix competition at another university, ranging in their performance in the competition, and were hoped to have differing levels of mix clarity. Identical 10-second clips were then normalized in terms of their loudness. These were then rated by 18 listeners in relative terms in a multiple comparison test, including a low anchor stimulus. Thankfully the results of that test displayed a range of median clarity scores, as shown in Fig. 6.

Fig. 6. Median subjective clarity scores for a listening test comparing multiple mixes of the same multitrack content. (Figs. 6 and 7 courtesy of Parker and Fenton)

In a second listening test, 16 stimuli were selected from a music archive to represent examples of a range of genres and styles. In this case listeners were expected to make an absolute rating of the mix clarity of each item, without reference to others. This too resulted in a broad range of ratings across the scale employed.

The IBR analysis mentioned above is shown in block form in Fig. 7. Signals are divided into three frequency bands and the crest factor in each band is measured over a given time window, as an indicator of dynamic range. These can be compared over time to evaluate the changes that arise. Previously the authors had used a fixed 400 ms window with 75% overlap, inspired by momentary loudness measurement, but in this case the length of the window was varied between 5 and 550 ms, resulting in a selection of measurements at different lengths that could be compared for effectiveness. Using a grid search procedure the optimum combination of window sizes was sought so as to maximize the correlation of the prediction with subjective ratings. As far as can be ascertained, the best correlations with subjective results seemed to depend on whether one looks at the first or second sets of subjective data. The second set (the independent absolute ratings of different genres) seemed to benefit from a much wider range of window sizes (50–550 ms) between the different frequency bands, being smaller at the LF and HF ends than in the middle. The first set of subjective results seemed better predicted by a more consistent window size of 350–450 ms across the three bands. Overall the results with tuned measurement windows performed better than the original fixed window metrics.

Fig. 7. Multiresolution inter-band relationship (IBR) approach used for measuring mix clarity

EFFECTS OF LOUDSPEAKER PLACEMENT
Another group of papers presented at the convention dealt with various effects of loudspeaker placement on sound quality and spatial impression.

Although not strictly concerned directly with sound quality, it’s instructive to note the results of a study that was conducted by Craig Cieciura and colleagues from Surrey and the BBC investigating factors affecting people’s arrangement of loudspeakers in their homes. In “Understanding Users’ Choices and Constraints when Positioning Loudspeakers in Living Rooms” (eBrief 596), the authors asked a number of participants how they would arrange between one and eight compact wireless loudspeakers with the goal of enhancing their existing systems. This was done in the light of evidence that ownership of surround sound systems in UK homes is low (at around 11%), lower than soundbars and wireless loudspeakers (both around 17%). Consequently various studies have been
interested in what has been termed “device orchestration,” essentially a means by which ad hoc wireless and smart speakers can be utilized to enhance audio scene reproduction. The question therefore was where might people be willing to put these devices in their rooms?

Three themes emerged from the study: spatial balance and distribution, room aesthetics, and room functionality. Sadly, but perhaps not surprisingly for the audio community, aesthetics and functionality were prized above balance and distribution, with convenient surfaces such as tables and bookshelves inevitably being used for positioning. When asked to position eight such devices in their rooms most participants expressed “trepidation” or “doubt,” with one exclaiming “I wouldn’t have that many loudspeakers in my house!” This was even a factor for some people with four loudspeakers. This does not provide much encouragement for those that hope immersive audio systems will become adopted widely in the consumer arena.

Continuing the theme of how many loudspeakers are needed and where they should be put, Kamekawa and Marui look at whether full-range loudspeakers are necessary in all locations for 3D audio. (Paper 10362, “Are Full-Range Loudspeakers Necessary for the Top Layer of Three-Dimensional Audio.”) Here they were concerned with 22.2 multichannel sound and whether filtering the top layer speakers makes any difference. As they say, even though the various 3D audio system proponents hope for speakers with similar characteristics all round, “it is not easy to prepare an ideal reproduction environment in a consumer’s home audio…” If the overhead speakers could be smaller it could help in practical terms.

Using a variety of different types of program material, including classical and traditional Japanese music, as well as voice and noise, the authors attempted to discover what high-pass or low-pass filtering could be tolerated without the listeners noticing a difference from full-range content in the upper loudspeakers. Interestingly 75% of the listeners couldn’t distinguish any difference when the upper channels emitted nothing below 400 Hz. On the other hand, high frequency roll-offs had more variable effects. Listeners with greater 3D production experience were more sensitive to changes in high and low frequency response.

There’s also a paper on loudspeaker layout geometry for ambisonic rendering by Lukas Gölles and his colleagues (“Influence of Horizontal Loudspeaker Layout Geometry on Sweet Area Shape for Widened/Diffuse Frontal Sound,” Paper 10369.) They point out that the geometry of a loudspeaker array for ambisonic reproduction is often adapted to the rectangular shape of a space, even though a perfect cube or sphere might theoretically be the best layout. It may not be clear whether the listener is facing the long or short side of the rectangle, or which arrangement offers the biggest “sweet area.” After conducting listening experiments with widened and diffused sources, looking at the listening area beyond which images collapsed into the nearest loudspeaker, they concluded that in fact a wide rectangle layout yielded significantly better results than a long rectangle or a circular layout in most cases.

In an eBrief looking at “The Influence of Loudspeaker-Listener Distance on the Detection of Low-Bitrate Audio Coding Artifacts” (eBrief 576), Alan Pawlak and Hyunkook Lee wanted to know whether the recommended minimum distance of two metres was actually important. A distance of at least two metres from the loudspeakers seems to be specified in a number of standards, but the authors had not found a clear rationale for this, or much experimental evidence. Consequently they compared the results of a sound quality test on a low bitrate audio codec between headphone listening and listening at various loudspeaker distances, finding little influence of the listening distance on the ability to detect artifacts. Listeners were more affected by the reproduction system than they were by the listening distance, although the authors did note that these issues are potentially listener dependent.

CONCLUSION
Listener preference for one thing or another, be it reverberation, mix balance, loudspeaker arrangements or characteristics, has always been a thorny issue. It tends to depend on who you ask, although common trends have been observed in a number of studies. There is some evidence that consumer preferences for basic aspects of sound quality (such as frequency response of loudspeakers or headphones) tend to follow professional ones, just with much more noise in the results. Some of the papers discussed in this article bear out the idea that if you gather a group of relatively untutored listeners together and ask them what they like, or whether they can hear the differences between things, the results will be inconclusive, or you will conclude that it doesn’t matter. The more discerning and trained the listener is, the more likely it is that small differences will be noticed, and that these differences will matter for their preference. Whether the differences that highly trained listeners obsess about actually matter when it comes to implementing consumer systems may be a question overridden in some cases by practical constraints in the home, car or life situation. This is perhaps best exemplified by the reaction of one participant in the study on multiple loudspeakers, mentioned above: “I wouldn’t have that many loudspeakers in my house!”

Editor’s note: the papers discussed in this article and others from the same conference can be downloaded from the AES E-Library at http://www.aes.org/e-lib/. AES members get free access to the E-Library.

JAES offers Open Access
The AES Journal is pleased to offer an Open Access (OA) publishing option to its authors. A growing number of countries and research funding bodies require Open Access to publicly-funded research. If an AES paper has been made Open Access it will have the OA logo (above) next to it and will be freely downloadable from the AES E-Library by anyone, even if they’re not an AES member or E-Library subscriber. An OA paper can also be distributed by the author and by third parties. To find further information refer to: http://www.aes.org/openaccess/.
